This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit e217a8a

Add area to resized crop (#2537)
* Added downscale_area to ResizedCrop and SizedCrop
* Added downscale_area to ResizedCrop and SizedCrop
* Fix
* Typofix
* fix
1 parent 35b912f commit e217a8a

19 files changed

Lines changed: 512 additions & 100 deletions

.cursor/rules/albumentations-rules.mdc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,3 +9,4 @@ alwaysApply: true
99
- We use python 3.10+ typing. I.e. not Tuple, but tuple, not List, but list, not Optional, but | None
1010
- get_params_dependent_on_data should be minimal: small and clear, since it only calls other functions
1111
- we do not use fill_value, but fill. Not fill_mask_value, but fill_mask
12+
- We do not have ANY default values in the InitSchema class

.cursor/rules/coding-guidelines.mdc

Lines changed: 40 additions & 1 deletion
@@ -390,6 +390,44 @@ Each transform must include an `InitSchema` class that inherits from `BaseTransf
390390
self.brightness_coefficient = brightness_coefficient
391391
```
392392

393+
#### No Default Values in InitSchema
394+
395+
**InitSchema classes must not contain default values for their fields.** This ensures that all transform parameters are explicitly provided and validated at initialization time.
396+
397+
```python
398+
# Correct - no default values in InitSchema
399+
class MyTransform(ImageOnlyTransform):
400+
class InitSchema(BaseTransformInitSchema):
401+
brightness_range: tuple[float, float]
402+
contrast_range: tuple[float, float]
403+
404+
def __init__(self, brightness_range: tuple[float, float] = (0.8, 1.2),
405+
contrast_range: tuple[float, float] = (0.8, 1.2), p: float = 0.5):
406+
# Default values go in __init__, not InitSchema
407+
super().__init__(p=p)
408+
self.brightness_range = brightness_range
409+
self.contrast_range = contrast_range
410+
411+
# Incorrect - default values in InitSchema
412+
class MyTransform(ImageOnlyTransform):
413+
class InitSchema(BaseTransformInitSchema):
414+
brightness_range: tuple[float, float] = (0.8, 1.2) # ❌ No defaults in InitSchema
415+
contrast_range: tuple[float, float] = (0.8, 1.2) # ❌ No defaults in InitSchema
416+
```
417+
418+
##### Exception: Discriminator Fields
419+
420+
The only exception to this rule is discriminator fields used for Pydantic discriminated unions, where the default value must match the literal type:
421+
422+
```python
423+
# Correct - discriminator field with matching default
424+
class UniformParams(NoiseParamsBase):
425+
noise_type: Literal["uniform"] = "uniform" # ✅ Required for discriminated unions
426+
ranges: list[Sequence[float]]
427+
```
428+
429+
This rule is enforced by a pre-commit hook that will flag any violations during development.
430+
393431
### Coordinate Systems
394432

395433
#### Image Center Calculations
@@ -468,8 +506,9 @@ Every transform class that is a descendant of `ImageOnlyTransform`, `DualTransfo
468506

469507
4. **Target-Specific Requirements**:
470508
- For `ImageOnlyTransform`: Pass and demonstrate transformation of image data. Including how to get all transformed targets.
471-
- For `DualTransform`: Pass and demonstrate transformation of image, mask, bboxes, keypoints, bbox_labels, class_labels (where supported). Including how to get all transformed targets
509+
- For `DualTransform`: Pass and demonstrate transformation of image, mask, bboxes, keypoints, bbox_labels, class_labels (where supported). Including how to get all transformed targets including bbox_labels and keypoint_labels
472510
- For `Transform3D`: Pass and demonstrate transformation of volume and mask3d data. Including how to get all transformed targets.
511+
Including keypoint_labels
473512

474513
5. **For Base Classes**: Examples for base classes should show:
475514
- How to initialize a custom transform that inherits from the base class
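
The "No Default Values in InitSchema" rule above keeps defaults in `__init__` and leaves the schema purely declarative. The sketch below is a stdlib-only illustration of why nothing is lost by doing so: the defaults from the `__init__` signature are merged with the caller's arguments first, and only then checked against a schema that has no fallbacks of its own. The helpers `full_init_kwargs`, `validate_against_schema`, and `SchemaError` are hypothetical names, and this is not how albumentations or Pydantic implement validation.

```python
import inspect
from typing import Any


class SchemaError(ValueError):
    """Raised when a required schema field is missing (hypothetical name)."""


def validate_against_schema(schema_fields: set[str], kwargs: dict[str, Any]) -> dict[str, Any]:
    """Require every schema field to be present; the schema supplies no fallbacks."""
    missing = sorted(schema_fields - set(kwargs))
    if missing:
        raise SchemaError(f"missing fields: {missing}")
    return {name: kwargs[name] for name in schema_fields}


def full_init_kwargs(cls: type, **user_kwargs: Any) -> dict[str, Any]:
    """Merge the caller's arguments with the defaults declared on __init__."""
    bound = inspect.signature(cls.__init__).bind(None, **user_kwargs)
    bound.apply_defaults()
    bound.arguments.pop("self")
    return dict(bound.arguments)


class MyTransform:
    # Stand-in for InitSchema: field names only, no default values.
    schema_fields = {"brightness_range", "contrast_range", "p"}

    def __init__(self, brightness_range=(0.8, 1.2), contrast_range=(0.8, 1.2), p=0.5):
        # Default values live here, not in the schema.
        self.brightness_range = brightness_range
        self.contrast_range = contrast_range
        self.p = p


# The caller passes only `p`; the other values come from the __init__ defaults.
kwargs = full_init_kwargs(MyTransform, p=1.0)
validated = validate_against_schema(MyTransform.schema_fields, kwargs)
```

With this split, a parameter that is added to the schema but forgotten in `__init__` fails loudly instead of silently picking up a schema default.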

.pre-commit-config.yaml

Lines changed: 7 additions & 0 deletions
@@ -55,6 +55,13 @@ repos:
5555
files: ^albumentations/
5656
pass_filenames: true
5757
additional_dependencies: ["google-docstring-parser>=0.0.7"]
58+
- id: check-no-defaults-in-schemas
59+
name: Check no defaults in BaseModel schemas
60+
entry: python tools/check_no_defaults_in_schemas.py
61+
language: python
62+
types: [python]
63+
files: ^albumentations/
64+
pass_filenames: true
5865
- repo: local
5966
hooks:
6067
- id: check-albucore-version
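
The hook above runs `tools/check_no_defaults_in_schemas.py`, whose contents are not shown on this page. As a rough idea of what such a check could look like, here is a minimal sketch built on the standard `ast` module. Everything in it is an assumption: the function names, the suffix-based heuristic for recognising schema classes, and the handling of `Field(...)` and discriminator fields are illustrative, not the repository's actual script.

```python
import ast

# Hypothetical heuristic: a class counts as a schema if a base name ends with one of these.
SCHEMA_BASE_SUFFIXES = ("InitSchema", "BaseModel")


def _sets_default(value: ast.expr) -> bool:
    """Field(ge=1) only adds constraints; Field(0) or Field(default=0) sets a default."""
    if isinstance(value, ast.Call) and ast.unparse(value.func).endswith("Field"):
        positional = [
            arg for arg in value.args
            if not (isinstance(arg, ast.Constant) and arg.value is Ellipsis)
        ]
        keywords = {kw.arg for kw in value.keywords}
        return bool(positional) or bool(keywords & {"default", "default_factory"})
    return True


def _is_discriminator(stmt: ast.AnnAssign) -> bool:
    """Allow `kind: Literal["x"] = "x"`, the exception described in the guidelines."""
    ann = stmt.annotation
    return (
        isinstance(ann, ast.Subscript)
        and ast.unparse(ann.value).endswith("Literal")
        and isinstance(ann.slice, ast.Constant)
        and isinstance(stmt.value, ast.Constant)
        and ann.slice.value == stmt.value.value
    )


def find_schema_defaults(source: str) -> list[tuple[str, str, int]]:
    """Return (class name, field name, line number) for each field with a default."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        bases = [ast.unparse(base) for base in node.bases]
        if not any(base.endswith(SCHEMA_BASE_SUFFIXES) for base in bases):
            continue
        for stmt in node.body:
            if (
                isinstance(stmt, ast.AnnAssign)
                and isinstance(stmt.target, ast.Name)
                and stmt.value is not None
                and _sets_default(stmt.value)
                and not _is_discriminator(stmt)
            ):
                violations.append((node.name, stmt.target.id, stmt.lineno))
    return sorted(violations, key=lambda item: item[2])


SAMPLE = '''
class Resize(DualTransform):
    class InitSchema(BaseTransformInitSchema):
        height: int = Field(ge=1)
        area_for_downscale: Literal[None, "image", "image_mask"] = None

class UniformParams(BaseModel):
    noise_type: Literal["uniform"] = "uniform"
    ranges: list[Sequence[float]]
'''

violations = find_schema_defaults(SAMPLE)
```

On the sample, only `area_for_downscale` is reported: `Field(ge=1)` carries a constraint rather than a default, and the single-value `Literal` field matches the discriminator exception.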

albumentations/augmentations/crops/transforms.py

Lines changed: 72 additions & 5 deletions
@@ -1498,6 +1498,12 @@ class _BaseRandomSizedCrop(DualTransform):
14981498
for image resizing. Default: cv2.INTER_LINEAR.
14991499
mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation
15001500
algorithm for mask resizing. Default: cv2.INTER_NEAREST.
1501+
area_for_downscale (Literal[None, "image", "image_mask"]): Controls automatic use of INTER_AREA interpolation
1502+
for downscaling. Options:
1503+
- None: No automatic interpolation selection, always use the specified interpolation method
1504+
- "image": Use INTER_AREA when downscaling images, retain specified interpolation for upscaling and masks
1505+
- "image_mask": Use INTER_AREA when downscaling both images and masks
1506+
Default: None.
15011507
p (float): Probability of applying the transform. Default: 1.0.
15021508
15031509
Targets:
@@ -1510,6 +1516,8 @@ class _BaseRandomSizedCrop(DualTransform):
15101516
This class is not meant to be used directly. Instead, use derived transforms
15111517
like RandomSizedCrop or RandomResizedCrop that implement specific crop selection
15121518
strategies.
1519+
When area_for_downscale is set, INTER_AREA interpolation will be used automatically for
1520+
downscaling (when the crop is larger than the target size), which provides better quality for size reduction.
15131521
15141522
Examples:
15151523
>>> import numpy as np
@@ -1524,12 +1532,14 @@ class _BaseRandomSizedCrop(DualTransform):
15241532
... custom_parameter=0.5,
15251533
... interpolation=cv2.INTER_LINEAR,
15261534
... mask_interpolation=cv2.INTER_NEAREST,
1535+
... area_for_downscale="image",
15271536
... p=1.0
15281537
... ):
15291538
... super().__init__(
15301539
... size=size,
15311540
... interpolation=interpolation,
15321541
... mask_interpolation=mask_interpolation,
1542+
... area_for_downscale=area_for_downscale,
15331543
... p=p,
15341544
... )
15351545
... self.custom_parameter = custom_parameter
@@ -1560,7 +1570,7 @@ class _BaseRandomSizedCrop(DualTransform):
15601570
>>>
15611571
>>> # Create a pipeline with our custom transform
15621572
>>> transform = A.Compose(
1563-
... [CustomRandomCrop(size=(64, 64), custom_parameter=0.6)],
1573+
... [CustomRandomCrop(size=(64, 64), custom_parameter=0.6, area_for_downscale="image")],
15641574
... bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
15651575
... keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels'])
15661576
... )
@@ -1604,6 +1614,7 @@ class InitSchema(BaseRandomSizedCropInitSchema):
16041614
cv2.INTER_LANCZOS4,
16051615
cv2.INTER_LINEAR_EXACT,
16061616
]
1617+
area_for_downscale: Literal[None, "image", "image_mask"]
16071618

16081619
def __init__(
16091620
self,
@@ -1626,12 +1637,39 @@ def __init__(
16261637
cv2.INTER_LANCZOS4,
16271638
cv2.INTER_LINEAR_EXACT,
16281639
] = cv2.INTER_NEAREST,
1640+
area_for_downscale: Literal[None, "image", "image_mask"] = None,
16291641
p: float = 1.0,
16301642
):
16311643
super().__init__(p=p)
16321644
self.size = size
16331645
self.interpolation = interpolation
16341646
self.mask_interpolation = mask_interpolation
1647+
self.area_for_downscale = area_for_downscale
1648+
1649+
def _get_interpolation_for_resize(self, crop_shape: tuple[int, int], target_type: str) -> int:
1650+
"""Get the appropriate interpolation method for resizing.
1651+
1652+
Args:
1653+
crop_shape: Shape of the crop (height, width)
1654+
target_type: Either "image" or "mask" to determine base interpolation
1655+
1656+
Returns:
1657+
OpenCV interpolation flag
1658+
1659+
"""
1660+
crop_height, crop_width = crop_shape
1661+
target_height, target_width = self.size
1662+
1663+
# Determine if this is downscaling
1664+
is_downscale = (crop_height > target_height) or (crop_width > target_width)
1665+
1666+
# Use INTER_AREA for downscaling if configured
1667+
if is_downscale and ((target_type == "image" and self.area_for_downscale in ["image", "image_mask"]) or (
1668+
target_type == "mask" and self.area_for_downscale == "image_mask"
1669+
)):
1670+
return cv2.INTER_AREA
1671+
# Get base interpolation
1672+
return self.interpolation if target_type == "image" else self.mask_interpolation
16351673

16361674
def apply(
16371675
self,
@@ -1648,7 +1686,8 @@ def apply(
16481686
16491687
"""
16501688
crop = fcrops.crop(img, *crop_coords)
1651-
return fgeometric.resize(crop, self.size, self.interpolation)
1689+
interpolation = self._get_interpolation_for_resize(crop.shape[:2], "image")
1690+
return fgeometric.resize(crop, self.size, interpolation)
16521691

16531692
def apply_to_mask(
16541693
self,
@@ -1665,7 +1704,8 @@ def apply_to_mask(
16651704
16661705
"""
16671706
crop = fcrops.crop(mask, *crop_coords)
1668-
return fgeometric.resize(crop, self.size, self.mask_interpolation)
1707+
interpolation = self._get_interpolation_for_resize(crop.shape[:2], "mask")
1708+
return fgeometric.resize(crop, self.size, interpolation)
16691709

16701710
def apply_to_bboxes(
16711711
self,
@@ -1731,8 +1771,11 @@ def apply_to_images(
17311771
# First crop the volume using volume_crop_yx (reduces data size)
17321772
crop = fcrops.volume_crop_yx(images, *crop_coords)
17331773

1734-
# Then resize the smaller cropped volume using decorated helper method
1735-
return np.stack([fgeometric.resize(crop[i], self.size, self.interpolation) for i in range(images.shape[0])])
1774+
# Get interpolation method based on crop dimensions
1775+
interpolation = self._get_interpolation_for_resize(crop.shape[1:3], "image")
1776+
1777+
# Then resize the smaller cropped volume using the selected interpolation
1778+
return np.stack([fgeometric.resize(crop[i], self.size, interpolation) for i in range(images.shape[0])])
17361779

17371780
def apply_to_volume(
17381781
self,
@@ -1783,6 +1826,12 @@ class RandomSizedCrop(_BaseRandomSizedCrop):
17831826
mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
17841827
Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
17851828
Default: cv2.INTER_NEAREST.
1829+
area_for_downscale (Literal[None, "image", "image_mask"]): Controls automatic use of INTER_AREA interpolation
1830+
for downscaling. Options:
1831+
- None: No automatic interpolation selection, always use the specified interpolation method
1832+
- "image": Use INTER_AREA when downscaling images, retain specified interpolation for upscaling and masks
1833+
- "image_mask": Use INTER_AREA when downscaling both images and masks
1834+
Default: None.
17861835
p (float): Probability of applying the transform. Default: 1.0
17871836
17881837
Targets:
@@ -1799,6 +1848,8 @@ class RandomSizedCrop(_BaseRandomSizedCrop):
17991848
- Keypoints that end up outside the cropped area will be removed.
18001849
- This transform differs from RandomResizedCrop in that it allows more control over the crop size
18011850
through the 'min_max_height' parameter, rather than using a scale parameter.
1851+
- When area_for_downscale is set, INTER_AREA interpolation will be used automatically for
1852+
downscaling (when the crop is larger than the target size), which provides better quality for size reduction.
18021853
18031854
Mathematical Details:
18041855
1. A random crop height h is sampled from the range [min_max_height[0], min_max_height[1]].
@@ -1828,6 +1879,7 @@ class RandomSizedCrop(_BaseRandomSizedCrop):
18281879
... w2h_ratio=1.0,
18291880
... interpolation=cv2.INTER_LINEAR,
18301881
... mask_interpolation=cv2.INTER_NEAREST,
1882+
... area_for_downscale="image", # Use INTER_AREA for image downscaling
18311883
... p=1.0
18321884
... ),
18331885
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
@@ -1877,6 +1929,7 @@ class InitSchema(BaseTransformInitSchema):
18771929
min_max_height: OnePlusIntRangeType
18781930
w2h_ratio: Annotated[float, Field(gt=0)]
18791931
size: Annotated[tuple[int, int], AfterValidator(check_range_bounds(1, None))]
1932+
area_for_downscale: Literal[None, "image", "image_mask"]
18801933

18811934
def __init__(
18821935
self,
@@ -1901,12 +1954,14 @@ def __init__(
19011954
cv2.INTER_LANCZOS4,
19021955
cv2.INTER_LINEAR_EXACT,
19031956
] = cv2.INTER_NEAREST,
1957+
area_for_downscale: Literal[None, "image", "image_mask"] = None,
19041958
p: float = 1.0,
19051959
):
19061960
super().__init__(
19071961
size=size,
19081962
interpolation=interpolation,
19091963
mask_interpolation=mask_interpolation,
1964+
area_for_downscale=area_for_downscale,
19101965
p=p,
19111966
)
19121967
self.min_max_height = min_max_height
@@ -1960,6 +2015,12 @@ class RandomResizedCrop(_BaseRandomSizedCrop):
19602015
mask_interpolation (OpenCV flag): Flag that is used to specify the interpolation algorithm for mask.
19612016
Should be one of: cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4.
19622017
Default: cv2.INTER_NEAREST
2018+
area_for_downscale (Literal[None, "image", "image_mask"]): Controls automatic use of INTER_AREA interpolation
2019+
for downscaling. Options:
2020+
- None: No automatic interpolation selection, always use the specified interpolation method
2021+
- "image": Use INTER_AREA when downscaling images, retain specified interpolation for upscaling and masks
2022+
- "image_mask": Use INTER_AREA when downscaling both images and masks
2023+
Default: None.
19632024
p (float): Probability of applying the transform. Default: 1.0
19642025
19652026
Targets:
@@ -1976,6 +2037,8 @@ class RandomResizedCrop(_BaseRandomSizedCrop):
19762037
- Bounding boxes that end up fully outside the cropped area will be removed.
19772038
- Keypoints that end up outside the cropped area will be removed.
19782039
- After cropping, the result is resized to the specified size.
2040+
- When area_for_downscale is set, INTER_AREA interpolation will be used automatically for
2041+
downscaling (when the crop is larger than the target size), which provides better quality for size reduction.
19792042
19802043
Mathematical Details:
19812044
1. A target area A is sampled from the range [scale[0] * input_area, scale[1] * input_area].
@@ -2009,6 +2072,7 @@ class RandomResizedCrop(_BaseRandomSizedCrop):
20092072
... ratio=(0.75, 1.33), # Aspect ratio will vary from 3:4 to 4:3
20102073
... interpolation=cv2.INTER_LINEAR,
20112074
... mask_interpolation=cv2.INTER_NEAREST,
2075+
... area_for_downscale="image", # Use INTER_AREA for image downscaling
20122076
... p=1.0
20132077
... ),
20142078
... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
@@ -2062,6 +2126,7 @@ class InitSchema(BaseTransformInitSchema):
20622126
cv2.INTER_LANCZOS4,
20632127
cv2.INTER_LINEAR_EXACT,
20642128
]
2129+
area_for_downscale: Literal[None, "image", "image_mask"]
20652130

20662131
def __init__(
20672132
self,
@@ -2086,12 +2151,14 @@ def __init__(
20862151
cv2.INTER_LANCZOS4,
20872152
cv2.INTER_LINEAR_EXACT,
20882153
] = cv2.INTER_NEAREST,
2154+
area_for_downscale: Literal[None, "image", "image_mask"] = None,
20892155
p: float = 1.0,
20902156
):
20912157
super().__init__(
20922158
size=size,
20932159
interpolation=interpolation,
20942160
mask_interpolation=mask_interpolation,
2161+
area_for_downscale=area_for_downscale,
20952162
p=p,
20962163
)
20972164
self.scale = scale

albumentations/augmentations/geometric/resize.py

Lines changed: 2 additions & 2 deletions
@@ -375,7 +375,7 @@ class MaxSizeTransform(DualTransform):
375375
class InitSchema(BaseTransformInitSchema):
376376
max_size: int | list[int] | None
377377
max_size_hw: tuple[int | None, int | None] | None
378-
area_for_downscale: Literal[None, "image", "image_mask"] = None
378+
area_for_downscale: Literal[None, "image", "image_mask"]
379379
interpolation: Literal[
380380
cv2.INTER_NEAREST,
381381
cv2.INTER_NEAREST_EXACT,
@@ -833,7 +833,7 @@ class Resize(DualTransform):
833833
class InitSchema(BaseTransformInitSchema):
834834
height: int = Field(ge=1)
835835
width: int = Field(ge=1)
836-
area_for_downscale: Literal[None, "image", "image_mask"] = None
836+
area_for_downscale: Literal[None, "image", "image_mask"]
837837
interpolation: Literal[
838838
cv2.INTER_NEAREST,
839839
cv2.INTER_NEAREST_EXACT,

albumentations/augmentations/geometric/transforms.py

Lines changed: 2 additions & 2 deletions
@@ -1184,8 +1184,8 @@ class InitSchema(BaseTransformInitSchema):
11841184
cv2.BORDER_REFLECT_101,
11851185
]
11861186

1187-
fill: tuple[float, ...] | float = 0
1188-
fill_mask: tuple[float, ...] | float = 0
1187+
fill: tuple[float, ...] | float
1188+
fill_mask: tuple[float, ...] | float
11891189

11901190
shift_limit_x: tuple[float, float] | float | None
11911191
shift_limit_y: tuple[float, float] | float | None

albumentations/augmentations/mixing/domain_adaptation.py

Lines changed: 2 additions & 2 deletions
@@ -37,8 +37,8 @@
3737

3838
# Base class for Domain Adaptation Init Schema
3939
class BaseDomainAdaptationInitSchema(BaseTransformInitSchema):
40-
reference_images: Sequence[Any] | None = None
41-
read_fn: Callable[[Any], np.ndarray] | None = None
40+
reference_images: Sequence[Any] | None
41+
read_fn: Callable[[Any], np.ndarray] | None
4242
metadata_key: str
4343

4444
@model_validator(mode="after")
