Commit 2d1c5fb

Authored by ziw-liu, aliddell, and ieivanov
Parallel writing to shards (#311)

* update signature type hints
* clarify the type of position keys
* expose sharding in plate creation
* isort
* fix default version
* fix example code block format
* fix type hints
* utility to split indices by shards
* separate apply and save
* simplify random number generation
* wip: batched writing in time
* match storage keys with values
* removed unused argument
* fix import
* fix string formatting
* add more version testing
* use tensorstore instead
* control tensorstore concurrency
* memory management
* isort
* format
* remove platform check
* warning if shards is specified for 0.4
* set shards to none for 0.4
* Update acquire-zarr OME v0.5 fixture / aqz test to reflect new config API and downsampling behavior
* add notes about upstream issues
* add example of sharded plate
* explicitly add layout in open_ome_zarr
* Fix tensorstore empty array handling (#326)
  - Add validation for empty arrays in _save_transformed before tensorstore write
  - Skip write operations for empty arrays with warning messages
  - Add comprehensive error handling with detailed diagnostics for tensorstore failures
  - Improve error messages to include array shapes, sizes, and tensorstore details
  This resolves the "ValueError: Error aligning dimensions" issue when empty arrays are passed to tensorstore write operations.
* Add empty results check to prevent tensorstore alignment errors
  Adds validation in apply_transform_to_tczyx_and_save() to check for an empty results dictionary before calling _save_transformed(). When no valid time points are available, logs a diagnostic message and skips the write operation instead of attempting to write empty arrays to tensorstore, which causes alignment dimension mismatches.
* Revert "Fix tensorstore empty array handling" (reverts commit 65c9ddb)
* better handling of output_time_indices
* style
* bugfix and better type hints
* better messaging
* style
* raise error if attempting sharding along channel dimension

Co-authored-by: Alan Liddell <aliddell@chanzuckerberg.com>
Co-authored-by: Ivan Ivanov <ivan.ivanov@czbiohub.org>
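The empty-results guard described in the commit message (later partially reverted) can be sketched as a small helper. This is a minimal illustration, not iohub's actual internals: `save_if_nonempty`, `results`, and `save_fn` are hypothetical names standing in for `apply_transform_to_tczyx_and_save()` and `_save_transformed`.

```python
import logging

logger = logging.getLogger(__name__)


def save_if_nonempty(results: dict, save_fn) -> bool:
    """Skip the write when no valid time points were produced.

    Passing an empty mapping to a tensorstore write triggers
    dimension-alignment errors downstream, so log and bail out
    instead of attempting the write.
    """
    if not results:
        logger.warning("No valid time points to write; skipping save.")
        return False
    save_fn(results)
    return True
```

The design point is to validate before reaching tensorstore, so the failure mode is a clear log message rather than an opaque alignment error.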
1 parent 8713e9b commit 2d1c5fb

5 files changed: 561 additions & 124 deletions

File tree

iohub/ngff/nodes.py

Lines changed: 13 additions & 3 deletions
@@ -383,7 +383,7 @@ def dask_array(self):
     def downscale(self):
         raise NotImplementedError
 
-    def tensorstore(self):
+    def tensorstore(self, concurrency: int | None = None):
         """Open the zarr array as a TensorStore object.
 
         Needs the optional dependency ``tensorstore``.
 
@@ -396,10 +396,20 @@ def tensorstore(self):
 
         ts_spec = {
             "driver": "zarr2" if self.metadata.zarr_format == 2 else "zarr3",
-            "kvstore": (Path(self.store.root) / self.path.strip("/")).as_uri(),
+            "kvstore": {
+                "driver": "file",
+                "path": str(Path(self.store.root) / self.path.strip("/")),
+            },
         }
         zarr_dataset = ts.open(
-            ts_spec, read=True, write=not self.read_only
+            ts_spec,
+            read=True,
+            write=not self.read_only,
+            context=(
+                ts.Context({"data_copy_concurrency": {"limit": concurrency}})
+                if concurrency
+                else None
+            ),
         ).result()
         return zarr_dataset
