Commit 1c739a2
refactor: add num_workers/use_threads to process_single_position (#410)
* refactor: add num_workers/use_threads to process_single_position
PR #396 replaced mp.Pool with ThreadPoolExecutor on the assumption that
the transforms passed to process_single_position release the GIL and
threads suffice. That holds for I/O-bound callers, but not for
tensor-heavy CPU torch workloads (deskew, register, deconvolve): under
threads, all concurrent task allocations live in one address space, and
torch's CPU caching allocator never returns memory to the OS, so peak
RSS climbs past the slurm cgroup limit. Process workers are still
needed for those cases.
Introduce two new public params and deprecate the old ones:
* num_workers (default 1) — replaces num_processes (#396 already
deprecated this) and num_threads. Both legacy names emit a
DeprecationWarning and forward to num_workers.
* use_threads (default False) — pick between ThreadPoolExecutor and
ProcessPoolExecutor.
Behaviour:
* num_workers <= 1 -> serial loop in the calling process (matches the
short-circuit added in #396).
* num_workers > 1, use_threads=True -> ThreadPoolExecutor (the #396
default).
* num_workers > 1, use_threads=False -> ProcessPoolExecutor with the
spawn context (the new default).
Two reasons to use ProcessPoolExecutor (and not mp.Pool, like before
#396):
1. Silent worker death — a slurm cgroup OOM-kill of one worker leaves
mp.Pool.starmap waiting forever for a result that never comes.
ProcessPoolExecutor surfaces this as BrokenProcessPool, so the
slurm job fails fast with a real traceback instead of hanging
until walltime.
2. Spawn (not fork) — tensorstore's internal C++ threads aren't
fork-safe (google/tensorstore#61), and multiprocessing defaults
to fork on Linux (on Python versions before 3.14).
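The dispatch rules and executor choice described above can be sketched roughly as follows. This is a minimal illustration, not the iohub implementation: `run_tasks` and its signature are hypothetical, and only the standard-library executors and the `spawn` context come from the commit text.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_tasks(func, args_list, num_workers=1, use_threads=False):
    """Hypothetical helper: run func(*args) for each args tuple, in order."""
    # num_workers <= 1: plain serial loop in the calling process.
    if num_workers <= 1:
        return [func(*args) for args in args_list]

    if use_threads:
        executor = ThreadPoolExecutor(max_workers=num_workers)
    else:
        # Spawn context, so workers start from a fresh interpreter
        # instead of forking a parent that may already own C++ threads.
        executor = ProcessPoolExecutor(
            max_workers=num_workers,
            mp_context=mp.get_context("spawn"),
        )

    with executor:
        futures = [executor.submit(func, *args) for args in args_list]
        # If a worker process dies abruptly, result() raises
        # concurrent.futures.process.BrokenProcessPool instead of blocking.
        return [future.result() for future in futures]
```

With the process path, `func` and its arguments must be picklable and importable from the worker, and the calling script needs the usual `if __name__ == "__main__":` guard under spawn.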
Verified end-to-end on a 57-timepoint deskew run (171 (T,C) tasks per
fov, 8 workers): both pool variants and the serial path produce
bit-identical output, and an intentional OOM under PPE fails within
~1 minute with BrokenProcessPool instead of hanging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: close input zarr handle and short-circuit nan/zero check
Two small cleanups in iohub.ngff.utils that surfaced while debugging
deskew memory pressure:
1. `_apply_transform_to_czyx` opens the input zarr without a context
manager, leaking the zarr group / metadata cache for the lifetime
of the worker. Wrap in `with open_ome_zarr(...)` so the handle is
released after each task. No measurable memory effect at the
cgroup level — file-handle hygiene fix; matters most for very long
task queues.
2. `_check_nan_n_zeros` materialised a full boolean mask of the input
volume (via `np.all(arr == 0)`) before reducing it. Replace with
`np.any(arr)`, which short-circuits in the numpy C reduction kernel
as soon as it sees a truthy element and does not allocate a temp
mask. The all-NaN branch only runs when `np.any` returned True
(i.e. the array contains content or NaNs); skip it entirely for
integer dtypes that can't represent NaN.
Behaviour-preserving: produces the same return value as the previous
implementation for all 3D and 4D inputs, including the per-channel
"any channel empty" semantics for 4D arrays. Verified end-to-end on
the deskew workload; bit-identical outputs.
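A minimal sketch of the check described in item 2, assuming a 3D volume per channel. The names `is_empty_volume` and `any_channel_empty` are hypothetical and this is not the actual `_check_nan_n_zeros` code; it only illustrates the `np.any` test followed by a float-only NaN test.

```python
import numpy as np


def is_empty_volume(arr: np.ndarray) -> bool:
    """Return True if the array is all zeros or all NaN (hypothetical helper)."""
    # np.any reduces without building an `arr == 0` mask first.
    # NaN counts as truthy, so an all-NaN array falls through to the next check.
    if not np.any(arr):
        return True
    # Only floating dtypes can hold NaN; integer arrays skip this branch.
    if np.issubdtype(arr.dtype, np.floating):
        return bool(np.all(np.isnan(arr)))
    return False


def any_channel_empty(czyx: np.ndarray) -> bool:
    """4D case: True if at least one channel is all zeros or all NaN."""
    return any(is_empty_volume(channel) for channel in czyx)
```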
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: parametrise test_process_single_position over num_workers/use_threads
Renames the hypothesis strategy from `num_threads` to `num_workers` to
match the new public API, and adds a `use_threads` boolean strategy
so the test exercises both the ProcessPoolExecutor (default) and
ThreadPoolExecutor paths. The old test only covered serial + threads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: assert num_processes/num_threads emit DeprecationWarning
Adds a parametrized regression test that asserts both legacy kwargs
trigger a DeprecationWarning when forwarded to num_workers. The
warnings are otherwise invisible at runtime under Python's default
filter (which suppresses DeprecationWarning raised from package code),
so this is the only practical way to catch a future accidental
removal of the shim.
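For illustration, a forwarding shim of the kind this test guards could look like the sketch below. `resolve_num_workers` is a hypothetical name and the message text is an assumption; only the kwarg names and the `DeprecationWarning` category come from the commit text.

```python
import warnings


def resolve_num_workers(num_workers=1, num_processes=None, num_threads=None):
    """Hypothetical shim: forward legacy kwargs to num_workers with a warning."""
    legacy = (("num_processes", num_processes), ("num_threads", num_threads))
    for name, value in legacy:
        if value is not None:
            warnings.warn(
                f"{name} is deprecated; use num_workers instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            num_workers = value
    return num_workers
```

A regression test can record warnings with `warnings.catch_warnings(record=True)` (or `pytest.warns(DeprecationWarning)`) and assert that one was emitted.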
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: put repo root on PYTHONPATH so spawn workers can import tests/
`test_process_single_position` parametrises over `use_threads ∈ {True,
False}`. With `use_threads=False`, iohub spins up a `ProcessPoolExecutor`
with the `spawn` context. Spawn children re-initialise sys.path from the
runtime defaults plus PYTHONPATH; they do not inherit pytest's
`--import-mode=importlib` sys.path manipulation. Unpickling the
test-local `dummy_transform` (which lives at
`tests.ngff.test_ngff_utils.dummy_transform`) therefore fails with
`ModuleNotFoundError: No module named 'tests'` and the worker dies,
surfacing as `BrokenProcessPool` in the parent.
Fix: prepend the repo root to PYTHONPATH (and to the parent's sys.path
for symmetry) in `tests/conftest.py`. Spawn children inherit PYTHONPATH
via the OS env, so they can now resolve `tests.ngff.test_ngff_utils` and
unpickle the function.
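A sketch of what such a conftest helper might look like. `expose_repo_root` is a hypothetical name, and the real `tests/conftest.py` may be structured differently.

```python
import os
import sys
from pathlib import Path


def expose_repo_root(repo_root, environ=os.environ, path_list=sys.path):
    """Hypothetical helper: make repo_root importable here and in spawn children."""
    root = str(Path(repo_root).resolve())
    # Spawned workers read PYTHONPATH from the inherited OS environment.
    parts = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if root not in parts:
        environ["PYTHONPATH"] = os.pathsep.join([root, *parts])
    # Keep the parent process consistent with what the children will see.
    if root not in path_list:
        path_list.insert(0, root)
    return root
```

In a conftest located at `tests/conftest.py`, this would be called at import time with `Path(__file__).resolve().parent.parent` as the repo root.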
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: honour SLURM_CPUS_PER_TASK when capping num_workers
`os.cpu_count()` reports the host's total CPUs, not the cgroup CPU
allocation. On a 128-core slurm node where the job was granted only
8 cores, capping `num_workers` at `os.cpu_count()` lets a caller
oversubscribe the cgroup. Add `_available_cpus()` that prefers the
`SLURM_CPUS_PER_TASK` env var when present and falls back to
`os.cpu_count()` otherwise.
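The fallback logic can be sketched as below. `available_cpus` and `cap_workers` are hypothetical stand-ins for `_available_cpus()` and the capping step; the handling of malformed values is an assumption, not something the commit text specifies.

```python
import os


def available_cpus(env=None) -> int:
    """Prefer SLURM_CPUS_PER_TASK, else fall back to os.cpu_count()."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")
    if value is not None:
        try:
            cpus = int(value)
            if cpus > 0:
                return cpus
        except ValueError:
            pass  # Assumption: ignore malformed values and fall back.
    # os.cpu_count() can return None, so guard with a floor of 1.
    return os.cpu_count() or 1


def cap_workers(requested: int, env=None) -> int:
    """Clamp a requested worker count to the CPUs actually available."""
    return max(1, min(requested, available_cpus(env)))
```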
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor!: drop num_processes and num_threads kwargs
Both were deprecated in the previous commit ('refactor: add
num_workers/use_threads to process_single_position'), with shims that
forwarded their values to num_workers. Drop the shims now — anything
still passing num_processes / num_threads gets a TypeError pointing at
the right argument name, which is more useful than a silent
DeprecationWarning that callers may never see (Python suppresses
DeprecationWarning raised from package code under the default filter).
Removes the corresponding regression test
(test_process_single_position_legacy_kwargs_deprecated) and the
unused 'warnings' import.
BREAKING CHANGE: callers of process_single_position must use
num_workers (and, optionally, use_threads) instead of num_processes /
num_threads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* revert conftest.py
* Revert "revert conftest.py"
This reverts commit 0f86c59.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f281ac3 commit 1c739a2
3 files changed
Lines changed: 120 additions & 43 deletions