`iohub.ngff.utils.apply_transform_to_tczyx_and_save` / `process_single_position` write results via `arr._impl.write_oindex(...)` in `_save_transformed` (`src/iohub/ngff/utils.py:190`). On OME-Zarr v0.5 (sharded) stores this can surface an upstream zarr-python bug: a shape mismatch in the sharding codec merge step.
An earlier draft of this issue implied that any sharded-v0.5 write path was broken. That was wrong. iohub's default code path is fine — the bug only surfaces in one specific corner case (details below).
## What actually triggers the bug
iohub pins `zarrs>=0.2.3` as a required dependency, so `zarrs.ZarrsCodecPipeline` is the active codec pipeline by default. `ZarrsCodecPipeline` handles most `oindex` writes into sharded arrays correctly, but it does not support every indexer pattern: when it can't lower a write to Rust, it raises `UnsupportedVIndexingError` and falls back to zarr-python's `BatchedCodecPipeline`. The sharding codec in the fallback pipeline has the bug tracked in zarr-developers/zarr-python#2834.
Concrete behavior on a sharded v0.5 store, measured under iohub's default config:

| Indexer pattern on `oindex[...] = value` | Result |
| --- | --- |
| Unique, contiguous indices, e.g. `([0, 1], [0])` | ✅ works |
| Unique, non-contiguous indices, e.g. `([0, 2], [0])` | ✅ works |
| Duplicate indices, e.g. `([0, 0], [0])` | ❌ `ValueError: shape mismatch` |
So the failure surface is narrow: an `oindex` write whose indexer contains duplicates (or hits one of the other patterns listed in the zarrs README that fall back to the default pipeline) on a sharded Zarr v3 array.
## How this surfaced in tests
`tests/ngff/test_ngff_utils.py::test_apply_transform_to_tczyx_and_save` was failing intermittently under the new #401 defaults. Root cause:

- The hypothesis strategy for `time_indices` / `channel_indices` doesn't set `unique=True`, so it sometimes draws duplicate integers (e.g. `[0, 0]`, `[2, 2, 2]`).
- #401 changes `create_empty_plate` to default to sharded v0.5 stores, so the output array now has a sharding codec.
- With duplicate indices + sharding, the write falls back to `BatchedCodecPipeline` and hits zarr-python#2834.
It wasn't caught on PR #403's CI because `max_examples=5`: with 5 random draws, duplicate-index cases sometimes weren't sampled. Locally, once hypothesis's database had recorded a failing example, it replayed it deterministically; a bump to `max_examples=200` reproduces the failure reliably.
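The test-side fix (option 1 below) is a one-argument change; a hedged sketch of a duplicate-free index strategy (the helper name and bounds are illustrative, not the actual fixture code in `plate_setup`):

```python
# Sketch: drawing index lists without duplicates. The strategy shape is
# illustrative; the real fixtures live in plate_setup /
# apply_transform_czyx_setup.
from hypothesis import strategies as st

def index_lists(max_index: int):
    # unique=True rules out draws like [0, 0] or [2, 2, 2] that would
    # trigger the duplicate-index fallback on sharded stores.
    return st.lists(
        st.integers(min_value=0, max_value=max_index),
        min_size=1,
        unique=True,
    )
```

With `unique=True`, hypothesis still explores contiguous and non-contiguous index combinations, which is the behavior callers actually exercise.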
## Upstream
Tracked at zarr-developers/zarr-python#2834 — "bug with setitem with oindex and sharding". Still open; a WIP fix exists but hasn't landed.
## Options
1. **Narrow the test strategy**: set `unique=True` on `time_indices` and `channel_indices` in `plate_setup` / `apply_transform_czyx_setup`. Semantically, writing the same transformed result to the same index twice isn't a real use case; this just aligns the strategy with what callers actually do. Low-risk, unblocks "Set version-specific default chunk and shard sizes in `create_empty_plate`" (#401).
2. **Guard `_save_transformed`**: detect duplicate indices and either deduplicate them or fall back to a per-(t, c) basic-slice write. Defensive, slight overhead, handles the case correctly even if a user hits it.
3. **Wait on upstream**: do nothing iohub-side, track zarr-python#2834.
Recommendation: (1) is sufficient for unblocking #401. (2) is worth doing if we want iohub to behave correctly even for pathological caller inputs.
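A minimal sketch of the guard in option (2), assuming duplicates can simply be dropped because the transform is deterministic (the helper name is hypothetical, not iohub API):

```python
# Sketch of option (2): deduplicate indices before the fancy oindex write.
# dedup_indices is a hypothetical helper, not part of iohub.
from typing import Sequence

def dedup_indices(time_indices: Sequence[int], channel_indices: Sequence[int]):
    """Return both index lists with duplicates removed, order preserved.

    Writing the same transformed block to the same (t, c) twice is a no-op
    for a deterministic transform, so dropping duplicates is safe.
    """
    def _unique(seq):
        seen = set()
        return [i for i in seq if not (i in seen or seen.add(i))]

    return _unique(time_indices), _unique(channel_indices)

print(dedup_indices([0, 0, 2], [1, 1]))  # → ([0, 2], [1])
```

Deduplicating keeps the single `oindex` write (and its batching benefits); the alternative per-(t, c) basic-slice fallback avoids fancy indexing entirely at the cost of more write calls.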
## Repro (isolates the duplicate-index fallback)