perf: Lazy metadata access #381

Merged: gav-sturm merged 1 commit into main from perf/lazy-open on Feb 28, 2026

@srivarra (Collaborator) commented Feb 28, 2026

Why: open_ome_zarr() on HCS plates had a throughput regression caused by two things in the metadata parsing path:

  1. _first_pos_attr() used next(group.groups()) to find the first position's channel_names/axes, but zarr v3's groups() eagerly enumerates all children before yielding the first, so the cost scaled linearly with FOVs/well (0.2 ms for 1 FOV, 259 ms for 2,500)
  2. unique_validator() built pandas DataFrames just to check field uniqueness
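To see why next(group.groups()) scaled with the number of FOVs, here is a minimal, self-contained simulation, flattened to a single group level. EagerGroup is a hypothetical stand-in, not zarr's API: its groups() loads every child before yielding the first, mirroring the zarr v3 behavior described in item 1, and load_count counts simulated child metadata reads.

```python
from __future__ import annotations

from collections.abc import Iterator


class EagerGroup:
    """Hypothetical stand-in for a zarr group; NOT the real zarr API.

    groups() loads every child before yielding the first one, mimicking
    the eager enumeration described above.
    """

    def __init__(self, n_children: int) -> None:
        self._names = [str(i) for i in range(n_children)]
        self.load_count = 0  # number of simulated child metadata reads

    def _load(self, name: str) -> dict:
        # Pretend to read one child's metadata from storage.
        self.load_count += 1
        return {"channel_names": ["DAPI"], "axes": ["c", "y", "x"]}

    def groups(self) -> Iterator[tuple[str, dict]]:
        # Eager: all children are loaded before the first item is yielded.
        children = [(name, self._load(name)) for name in self._names]
        yield from children

    def __getitem__(self, name: str) -> dict:
        # Direct path access: only the requested child is loaded.
        return self._load(name)


def first_attrs_via_groups(group: EagerGroup) -> dict:
    # Old pattern: next() still pays for the full enumeration.
    _, attrs = next(group.groups())
    return attrs


def first_attrs_via_path(group: EagerGroup, name: str) -> dict:
    # New pattern: look up a known child name directly.
    return group[name]


fovs = 2500
old = EagerGroup(fovs)
first_attrs_via_groups(old)
print(old.load_count)  # 2500

new = EagerGroup(fovs)
first_attrs_via_path(new, "0")
print(new.load_count)  # 1
```

In OME-NGFF HCS metadata the plate lists its well paths and each well lists its image (position) paths, which is what makes a direct lookup like self.zgroup[well_path][pos_name] possible without listing any siblings.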

Changes:

  • Made axes and channel_names lazy and cached via functools.cached_property on NGFFNode: they resolve on first access rather than during __init__
  • On Plate, overrode both with cached_property implementations that use direct path access (self.zgroup[well_path][pos_name]) instead of groups() traversal
  • Replaced unique_validator's model_dump() + pandas DataFrame with a simple set-length check and removed the pandas import
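The three changes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: NGFFNode, Plate, axes, channel_names, zgroup, well_path, pos_name and unique_validator reuse names from the PR, but the constructors, the dict-based zgroup, the parse_count counter and the unique_validator(items, field) signature are hypothetical.

```python
from __future__ import annotations

from functools import cached_property
from types import SimpleNamespace


class NGFFNode:
    """Hypothetical simplified node: metadata is parsed on first access."""

    def __init__(self, attrs: dict) -> None:
        self._attrs = attrs  # raw metadata only; nothing is parsed here
        self.parse_count = 0  # counts parses so laziness is observable

    @cached_property
    def axes(self) -> list[str]:
        # Runs on first access only; the result is stored on the instance.
        self.parse_count += 1
        return list(self._attrs["axes"])

    @cached_property
    def channel_names(self) -> list[str]:
        self.parse_count += 1
        return list(self._attrs["channel_names"])


class Plate(NGFFNode):
    """Hypothetical plate; zgroup is a nested dict standing in for zarr groups."""

    def __init__(self, zgroup: dict, well_path: str, pos_name: str) -> None:
        super().__init__(attrs={})
        self.zgroup = zgroup
        self.well_path = well_path  # first well path (assumed known up front)
        self.pos_name = pos_name  # first position name in that well

    def _first_position_attrs(self) -> dict:
        # Direct path access: reads one position, no sibling enumeration.
        return self.zgroup[self.well_path][self.pos_name]

    @cached_property
    def axes(self) -> list[str]:
        self.parse_count += 1
        return list(self._first_position_attrs()["axes"])

    @cached_property
    def channel_names(self) -> list[str]:
        self.parse_count += 1
        return list(self._first_position_attrs()["channel_names"])


def unique_validator(items: list, field: str) -> list:
    """Hypothetical signature: raise if `field` repeats across `items`.

    A set-length comparison replaces building a DataFrame; values must be hashable.
    """
    values = [getattr(item, field) for item in items]
    if len(set(values)) != len(values):
        raise ValueError(f"{field!r} values must be unique, got {values}")
    return items


# The key "A/1" plays the role of a well path.
zgroup = {"A/1": {"0": {"axes": ["c", "y", "x"], "channel_names": ["DAPI", "GFP"]}}}
plate = Plate(zgroup, "A/1", "0")
print(plate.parse_count)  # 0 (nothing parsed in __init__)
print(plate.channel_names)  # ['DAPI', 'GFP']
print(plate.channel_names)  # ['DAPI', 'GFP'] (served from the cache)
print(plate.parse_count)  # 1

wells = [SimpleNamespace(path="A/1"), SimpleNamespace(path="A/2")]
print(len(unique_validator(wells, "path")))  # 2
```

cached_property stores its result in the instance __dict__, so it needs a class that has one (a __slots__-only class will not work). A subclass can override it simply by defining another cached_property with the same name, which is the mechanism the Plate override relies on.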

Before and after, measured on some test data:

| Plate | Wells | FOVs/well | Before (ms) | After (ms) | Speedup |
|---|---:|---:|---:|---:|---:|
| tiny | 1 | 1 | 3.3 | 0.63 | 5.2x |
| few_wells | 6 | 1 | 3.6 | 0.71 | 5.1x |
| 96w_single | 96 | 1 | 7.2 | 0.78 | 9.2x |
| 96w_tiled | 96 | 25 | 12.0 | 0.90 | 13.3x |
| 25w_100fov | 25 | 100 | 25.2 | 0.68 | 37.1x |
| 384w_single | 384 | 1 | 11.3 | 1.13 | 10.0x |
| dense_tile | 1 | 2,500 | 535.8 | 0.67 | 799x |
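As a quick arithmetic check, the Speedup column is the Before time divided by the After time. This sketch recomputes it from the table's own numbers only; no new measurements are involved.

```python
# Timings copied from the table above: plate -> (before_ms, after_ms).
timings = {
    "tiny": (3.3, 0.63),
    "few_wells": (3.6, 0.71),
    "96w_single": (7.2, 0.78),
    "96w_tiled": (12.0, 0.90),
    "25w_100fov": (25.2, 0.68),
    "384w_single": (11.3, 1.13),
    "dense_tile": (535.8, 0.67),
}

# Speedup is the ratio of the old time to the new time, rounded to 0.1x.
speedups = {name: round(before / after, 1) for name, (before, after) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name:<12} {ratio:>6.1f}x")
```

The recomputed ratios match the table; dense_tile comes out at about 799.7x, so the table's 799x appears to be truncated rather than rounded. The After column stays between 0.63 and 1.13 ms for every plate, while Before grows with the FOV count, which is consistent with the per-FOV enumeration cost having been removed.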

…ion and eliminate pandas from validation hot path

Signed-off-by: Sricharan Reddy Varra <sricharan.varra@biohub.org>
@srivarra requested a review from @gav-sturm on February 28, 2026 at 01:49
@srivarra marked this pull request as ready for review on February 28, 2026 at 01:49

@gav-sturm left a comment

This looks good to me. It has my okay, and thanks for such a rapid turnaround. This will benefit a lot of our runs.

@gav-sturm merged commit 610d8a4 into main on Feb 28, 2026
7 checks passed

Linked issue (may be closed by merging this pull request):

Potential slowdown found in open_ome_zarr() after upgrading from 0.2.1 to 0.3.0a6