# Performance regression: `open_ome_zarr` in append/read mode parses all plate metadata on every call
## Summary
Upgrading from iohub 0.2.1 to 0.3.0a6 caused a ~50% slowdown in a parallelized workflow that repeatedly calls `open_ome_zarr()` on a large HCS plate store. The root cause is that `open_ome_zarr()` now eagerly parses the full plate metadata (all positions) into pydantic models on every call, even when only a single position is needed.
## Environment

| Package     | Old (working) | New (slow) |
| ----------- | ------------- | ---------- |
| iohub       | 0.2.1         | 0.3.0a6    |
| zarr-python | 2.17.2        | 3.1.5      |
| numpy       | 1.26.4        | 2.2.6      |
| Python      | 3.12          | 3.12       |
Platform: HPC cluster with network filesystem (GPFS/Lustre), SLURM scheduler.
## Reproduction
We have an HCS OME-Zarr plate store with 7,035 positions and run a joblib `Parallel` loop that processes each position independently. Each worker needs to read one source position and write to one destination position:
```python
import numpy as np
from iohub.ngff import open_ome_zarr
from joblib import Parallel, delayed

def _process_position(i):
    # position_name is derived from the worker index i (derivation elided)
    # Opens the ENTIRE plate store just to write one position
    ds_w = open_ome_zarr(dest_path, mode="a")  # <-- expensive
    with open_ome_zarr(source_path, mode="r") as ds:  # <-- expensive
        data = np.asarray(ds[position_name].data)
        # ... augment data ...
        ds_w[position_name]["0"][:] = output
    ds_w.close()

Parallel(n_jobs=63)(
    delayed(_process_position)(i) for i in range(7035)
)
```
With iohub 0.2.1 this ran at ~8 it/s. With iohub 0.3.0a6 it dropped to ~4 it/s.
## Root cause analysis
We traced the slowdown to `Plate._parse_meta()` in `iohub/ngff/nodes.py`:
```python
# nodes.py, ~lines 1815-1825
def _parse_meta(self):
    if plate_meta := self.maybe_wrapped_ome_attrs.get("plate"):
        self.metadata = PlateMeta(**plate_meta)  # instantiates WellIndexMeta for ALL positions
    for attr in ("_channel_names", "axes"):
        if not hasattr(self, attr):
            self._first_pos_attr(attr)
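For reference, the trace can be reproduced with a minimal profiling snippet like the one below (assuming `plate_path` points at a copy of the plate store); on a large plate the top cumulative entries sit under pydantic model construction:

```python
import cProfile
import pstats

from iohub.ngff import open_ome_zarr

plate_path = "/path/to/plate.zarr"  # hypothetical path to the HCS plate store

# Profile a single open of the plate store and show the hottest calls.
cProfile.run('open_ome_zarr(plate_path, mode="r").close()', "open_stats")
pstats.Stats("open_stats").sort_stats("cumulative").print_stats(10)
```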
Every call to `open_ome_zarr(plate_store, mode="a")`:

1. Reads the root `.zattrs` file (~500 KB of JSON containing metadata for all 7,035 positions)
2. Parses the entire plate metadata into a `PlateMeta` pydantic model, which instantiates 7,035 `WellIndexMeta` objects

Because the store lives on a network filesystem, step 1 incurs an I/O round-trip on every call.
In our workflow this means 7,035 calls × (one 500 KB network read + 7,035 pydantic object instantiations) ≈ 49 million pydantic model creations and ~3.4 GB of redundant `.zattrs` reads.
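For concreteness, the arithmetic behind those totals, using only the figures quoted above:

```python
# Back-of-envelope totals for 7,035 per-position opens of the full plate
n_positions = 7_035
zattrs_kb = 500

pydantic_models = n_positions * n_positions             # 49,491,225 (~49 million)
redundant_reads_gb = n_positions * zattrs_kb / 1024**2  # ~3.35 GB
```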
In iohub 0.2.1, `open_ome_zarr` used a lighter-weight internal representation that did not eagerly parse all positions into pydantic models, so this overhead did not exist.
## Impact
- ~50% throughput reduction for parallel per-position processing on large plates
- Scales quadratically with position count: O(N) calls, each doing O(N) metadata parsing, is O(N²) total work
- Particularly severe on network filesystems (HPC clusters), where each `.zattrs` read incurs latency
- Some workers stopped with memory warnings due to redundant metadata copies held in RAM
## Workaround
We bypassed iohub in the parallel workers and used `zarr.open()` directly to access individual position arrays:
```python
import zarr

def _process_position(i):
    # position is derived from the worker index i (derivation elided)
    # Direct zarr access - only opens the single position's array
    src_arr = zarr.open(str(source_path / position / "0"), mode="r")
    dest_arr = zarr.open(str(dest_path / position / "0"), mode="r+")
    # ... process and write ...
```
This eliminates the plate metadata parsing entirely and restores the original performance.
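An alternative that keeps iohub in the loop: since joblib's default loky backend reuses worker processes, memoizing the opened handle amortizes the metadata parse across all tasks in the same worker. A minimal sketch (the `_open_plate` helper is ours, not iohub API):

```python
from functools import lru_cache

from iohub.ngff import open_ome_zarr

@lru_cache(maxsize=None)
def _open_plate(path: str, mode: str):
    # The expensive metadata parse happens once per (path, mode) per worker
    # process; later calls in the same process reuse the cached handle.
    return open_ome_zarr(path, mode=mode)
```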
## Suggested fix
Consider one or more of:
- **Lazy metadata parsing:** Don't parse `PlateMeta` in `open_ome_zarr()`; parse it on first access to plate-level properties instead (see the sketch after this list).
- **Provide a lightweight position-only opener:** Something like `open_ome_zarr(path, mode="a", position="A/1/000042")` that skips plate-level metadata entirely and returns a `Position` handle directly.
- **Cache/memoize plate metadata:** If the same store is opened repeatedly (common in parallel workflows), cache the parsed `PlateMeta` so it's only read and parsed once per process.
- **Avoid pydantic for hot-path metadata:** The `PlateMeta(**plate_meta)` call with 7,035 nested `WellIndexMeta` objects is CPU-expensive. Consider using plain dicts or dataclasses for the internal representation when full validation isn't needed.
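For the first option, a minimal sketch of what lazy parsing could look like, mirroring the `_parse_meta()` code quoted above (this is a proposal, not current iohub code, and elides the rest of the class):

```python
from functools import cached_property

class Plate:
    # Simplified sketch of iohub's Plate node: metadata becomes a lazily
    # evaluated property instead of being parsed at open time.
    @cached_property
    def metadata(self):
        # PlateMeta (and its 7,035 WellIndexMeta children) is only built on
        # first access to a plate-level property; cached_property then stores
        # the result on the instance, so repeated access stays cheap.
        if plate_meta := self.maybe_wrapped_ome_attrs.get("plate"):
            return PlateMeta(**plate_meta)
        return None
```

These options also compose: lazy parsing removes the per-open cost, while a position-only opener avoids touching plate-level metadata at all in workflows like ours.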