
Commit efff544

edyoshikun and claude authored
feat(tensorstore): expose recheck_cached_data on TensorStoreConfig (#406)
* feat(tensorstore): expose recheck_cached_data on TensorStoreConfig

Add ``recheck_cached_data`` to ``TensorStoreConfig`` and forward it into ``ts.open`` in ``TensorStoreImplementation.open_array``.

The option controls whether cached chunk data is revalidated on every read (the TensorStore driver default) or only at open time (``"open"``), which is the recommended setting for long-running read-heavy workloads on networked filesystems (NFS/VAST) where revalidation costs one stat/GETATTR per chunk per read.

``None`` (default) preserves existing behaviour by omitting the kwarg so the TensorStore driver keeps its own default. ``True``, ``False``, and ``"open"`` are forwarded verbatim.

Covered by a parametrized test that monkey-patches ``_ts_open`` to assert the kwarg reaches TensorStore for each configured value and is absent when unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* delete redundant text

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 40b5f2a commit efff544

2 files changed

Lines changed: 30 additions & 8 deletions


src/iohub/core/config.py

Lines changed: 21 additions & 1 deletion
@@ -34,7 +34,26 @@ class ZarrConfig(BaseModel):


 class TensorStoreConfig(BaseModel):
-    """Config for the TensorStore implementation."""
+    """Config for the TensorStore implementation.
+
+    Parameters
+    ----------
+    file_io_concurrency : int or None
+        Concurrency limit for TensorStore's ``file_io_concurrency``
+        resource. Raise above the default (32) on high-latency networked
+        filesystems (e.g. NFS) where the default under-saturates the link.
+    cache_pool_bytes : int or None
+        Aggregate byte budget for TensorStore's chunk cache pool. ``None``
+        disables caching.
+    recheck_cached_data : bool, "open" or None
+        Controls whether cached chunk data is re-validated on each read.
+        ``None`` (default) uses the TensorStore driver default, which
+        revalidates cached metadata on every access — one stat/GETATTR per
+        chunk. ``"open"`` checks freshness only when the array is opened
+        and trusts the cache thereafter — recommended for long-running
+        read-heavy workloads on NFS/VAST where the underlying zarr files
+        do not change. ``False`` disables freshness checks entirely.
+    """

     compressor: CompressorConfig = Field(default_factory=CompressorConfig)
     data_copy_concurrency: int = Field(default=4, ge=1)
@@ -43,6 +62,7 @@ class TensorStoreConfig(BaseModel):
     file_io_sync: bool = True
     file_io_locking: Literal["auto", "disabled"] = "auto"
     cache_pool_bytes: int | None = None
+    recheck_cached_data: bool | Literal["open"] | None = None
     extra_context: dict | None = None


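As a rough illustration of the value space of the new field, the sketch below uses a stdlib dataclass as a stand-in for the pydantic model. `TensorStoreConfigSketch` and `describe_recheck` are hypothetical names that are not part of iohub, and the manual check in `__post_init__` only approximates what pydantic validation would do for the `bool | Literal["open"] | None` annotation. The description of `True` follows the commit message's account of the driver default and should be read as an assumption.

```python
from dataclasses import dataclass
from typing import Literal, Union

# Mirrors the annotation added in the diff: bool | Literal["open"] | None.
RecheckOption = Union[bool, Literal["open"], None]


@dataclass(frozen=True)
class TensorStoreConfigSketch:
    """Hypothetical stdlib stand-in for the two cache-related fields."""

    cache_pool_bytes: Union[int, None] = None
    recheck_cached_data: RecheckOption = None

    def __post_init__(self) -> None:
        value = self.recheck_cached_data
        # Accept only None, a real bool, or the string "open". The real
        # model relies on pydantic, which has its own coercion rules.
        if not (value is None or isinstance(value, bool) or value == "open"):
            raise ValueError(f"unsupported recheck_cached_data: {value!r}")


def describe_recheck(value: RecheckOption) -> str:
    """Summarise each setting in the terms used by the docstring."""
    if value is None:
        return "driver default"
    if value == "open":
        return "check at open only"
    # Only True and False remain at this point.
    return "recheck on every read" if value else "never recheck"


cfg = TensorStoreConfigSketch(cache_pool_bytes=1 << 30, recheck_cached_data="open")
print(describe_recheck(cfg.recheck_cached_data))
```

With the real model, the equivalent call would presumably be `TensorStoreConfig(recheck_cached_data="open")`, leaving the other fields at their defaults.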
src/iohub/core/implementations/tensorstore.py

Lines changed: 9 additions & 7 deletions
@@ -193,13 +193,15 @@ def open_array(self, group: zarr.Group, name: str) -> ts.TensorStore:
             "driver": driver,
             "kvstore": {"driver": "file", "path": key},
         }
-        self._array_cache[key] = _ts_open(
-            spec,
-            open=True,
-            read=True,
-            write=writable,
-            context=self._context(),
-        )
+        open_kwargs: dict[str, Any] = {
+            "open": True,
+            "read": True,
+            "write": writable,
+            "context": self._context(),
+        }
+        if self.config.recheck_cached_data is not None:
+            open_kwargs["recheck_cached_data"] = self.config.recheck_cached_data
+        self._array_cache[key] = _ts_open(spec, **open_kwargs)
         return self._array_cache[key]

     # -- Array I/O ---------------------------------------------------------

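The forwarding logic in this hunk can be exercised without TensorStore. The sketch below is a minimal, stdlib-only approximation of the parametrized test described in the commit message, not the test that ships with the commit. `build_open_kwargs`, `check_forwarding`, and `fake_ts_open` are hypothetical names, and the `"zarr"` driver and the path in `spec` are placeholders.

```python
from typing import Any

_ABSENT = "<absent>"


def build_open_kwargs(
    writable: bool, recheck_cached_data: Any, context: Any = None
) -> dict[str, Any]:
    """Assemble keyword arguments the way the patched ``open_array`` does."""
    open_kwargs: dict[str, Any] = {
        "open": True,
        "read": True,
        "write": writable,
        "context": context,
    }
    # ``None`` means "not configured": omit the key so the driver keeps its
    # own default. ``False`` is a real setting and must still be forwarded,
    # hence the ``is not None`` comparison rather than a truthiness test.
    if recheck_cached_data is not None:
        open_kwargs["recheck_cached_data"] = recheck_cached_data
    return open_kwargs


def check_forwarding() -> list[tuple[Any, Any]]:
    """Record what a stand-in for ``_ts_open`` receives for each setting."""
    seen: list[Any] = []

    def fake_ts_open(spec: dict[str, Any], **kwargs: Any) -> None:
        seen.append(kwargs.get("recheck_cached_data", _ABSENT))

    # Placeholder spec; the real one is built from the zarr group and name.
    spec = {"driver": "zarr", "kvstore": {"driver": "file", "path": "/tmp/example"}}
    values = (None, True, False, "open")
    for value in values:
        fake_ts_open(spec, **build_open_kwargs(False, value))
    return list(zip(values, seen))


for configured, received in check_forwarding():
    print(f"{configured!r:>7} -> {received!r}")
```

Building the kwargs as a dict, rather than passing `recheck_cached_data=None` directly, keeps the call identical to the pre-change behaviour whenever the option is unset.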