Avoid eager C-order copy in NibabelReader (fixes #8107)#8825
Avoid eager C-order copy in NibabelReader (fixes #8107)#8825ericspod merged 6 commits intoProject-MONAI:devfrom
Conversation
Nibabel exposes NIfTI voxel buffers in their native Fortran layout, but MONAI was forcing np.asanyarray(img.dataobj, order="C") in NibabelReader._get_array_data(). For compressed .nii.gz inputs that adds a full dense memory reorder on top of the file read/decompression step, which is the hot path reported in issue Project-MONAI#8107. Drop the forced C-order conversion and keep nibabel's native array layout instead. Downstream MONAI conversion paths already handle contiguity when they actually need it, so the reader does not need to pay that cost eagerly at load time. Add a regression test that loads a small NIfTI image through NibabelReader and asserts the returned data is still correct while preserving the native F-contiguous layout. This guards against reintroducing the eager copy in the reader path. Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
Exercise both .nii and .nii.gz inputs in the tiny layout regression test so the reader path stays covered without adding a benchmark or a heavier fixture. Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRemoved the explicit Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/data/test_init_reader.py (1)
84-84: Add a docstring to the new test method.Line 84 introduces a new definition without a docstring; add a short Google-style docstring.
As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/data/test_init_reader.py` at line 84, Add a Google-style docstring to the test function test_nibabel_reader_avoids_eager_c_order_copy describing what the test verifies (that the nibabel reader avoids an eager C-order copy), include a short "Args:" section only if the test takes parameters (omit otherwise), and include a "Raises:" or "Returns:" section only if the test explicitly raises or returns something (omit otherwise); place the docstring immediately under the def line in triple quotes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/data/test_init_reader.py`:
- Line 84: Add a Google-style docstring to the test function
test_nibabel_reader_avoids_eager_c_order_copy describing what the test verifies
(that the nibabel reader avoids an eager C-order copy), include a short "Args:"
section only if the test takes parameters (omit otherwise), and include a
"Raises:" or "Returns:" section only if the test explicitly raises or returns
something (omit otherwise); place the docstring immediately under the def line
in triple quotes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: be4e5a1e-b10b-4097-a28d-5d3c9fda14f0
📒 Files selected for processing (2)
monai/data/image_reader.pytests/data/test_init_reader.py
atharvajoshi01
left a comment
There was a problem hiding this comment.
Removing the explicit order='C' avoids a potentially expensive copy when the underlying NIfTI data is already in Fortran order (which is the native layout for medical imaging volumes). NumPy's default behavior without the order arg will return the data in its native layout, which is what downstream code expects anyway. Test coverage looks good.
|
This change was originally added here explicitly for a fix. It's not stated and I'm struggling to refresh my memory but it could be MetaTensor or lazy transform related given the context. @wyli or @atbenmurray may remember better than I, but it seemed this served a purpose so more investigation is needed with transforms. |
|
Ahh okay, my bad. Should've looked into that. My internal testing showed this was better. Ill have a more deeper dive on this. |
Thanks for the effort, it's possible we can do this change now with other changes that have been done since, or add a constructor argument to choose ordering. If you could please look into it further that would be great. |
|
I wasn't involved in this one, I think |
|
|
|
Code to benchmark: import gc, os, statistics, tempfile, time, tracemalloc
import numpy as np
import nibabel as nib
from monai.data import NibabelReader
SIZES = [(128, 128, 128), (256, 256, 128), (256, 256, 256), (512, 512, 256)]
DTYPES = [np.int16, np.float32]
SUFFIXES = [".nii", ".nii.gz"]
N_TRIALS = 5
N_INNER = 3
class CReader(NibabelReader):
def _get_array_data(self, img, filename):
return np.asanyarray(img.dataobj, order="C")
class NativeReader(NibabelReader):
def _get_array_data(self, img, filename):
return np.asanyarray(img.dataobj)
def time_load(reader_cls, fn):
times = []
for _ in range(N_INNER):
gc.collect()
r = reader_cls(mmap=False)
t0 = time.perf_counter()
img = r.read(fn)
data, _ = r.get_data(img)
_ = data.shape
t1 = time.perf_counter()
times.append(t1 - t0)
del data, img, r
return min(times)
def peak_mem(reader_cls, fn):
gc.collect()
tracemalloc.start()
r = reader_cls(mmap=False)
img = r.read(fn)
data, _ = r.get_data(img)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
del data, img, r
return peak
print(f"{'Size':<20}{'dtype':<10}{'fmt':<8}{'C-order (ms)':>14}{'Native (ms)':>14}{'speedup':>10}{'peak C (MB)':>14}{'peak N (MB)':>14}")
print("-" * 120)
with tempfile.TemporaryDirectory() as tdir:
for size in SIZES:
for dtype in DTYPES:
arr = np.random.RandomState(0).randint(0, 1000, size=size).astype(dtype)
for suf in SUFFIXES:
fn = os.path.join(tdir, f"vol_{size}_{dtype.__name__}{suf}")
nib.save(nib.Nifti1Image(arr, np.eye(4)), fn)
with open(fn, "rb") as f: f.read() # warm OS cache
trials_c = [time_load(CReader, fn) for _ in range(N_TRIALS)]
trials_n = [time_load(NativeReader, fn) for _ in range(N_TRIALS)]
peak_c = peak_mem(CReader, fn)
peak_n = peak_mem(NativeReader, fn)
med_c = statistics.median(trials_c) * 1000
med_n = statistics.median(trials_n) * 1000
speedup = med_c / med_n if med_n else float("inf")
print(f"{str(size):<20}{dtype.__name__:<10}{suf:<8}"
f"{med_c:>14.2f}{med_n:>14.2f}{speedup:>9.2f}x"
f"{peak_c/1e6:>14.1f}{peak_n/1e6:>14.1f}")
os.remove(fn)Table:
|
|
@coderabbitai Is the statement about not needing |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 33345 🏁 Script executed: Length of output: 2707
# unconditional for ndim > 0 numpy arrays
data = np.ascontiguousarray(data)And The GPU path in One caveat worth noting: callers using |
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
|
Hi @aymuos15 I've made a very minor change to the documentation based on what Coderabbit said. As a sanity check it agrees with your assessment and since our tests pass we should be good. I'm aware a lot of our tests only check the shape of output arrays rather than contents so it's possible they are not thorough enough, but from our analysis I'm confident to merge this. |
|
Thank you very much for double checking everything. I did run quite a few personal tests apart from shapes to confirm. I am quite confident as well! Do you think this warrants another PR for stronger tests around this? |
For this specifically we're good I feel. There's still a lot of test refactoring to do, and tighter loading test criteria would be part of that bigger effort. If you wanted to have a look at things that wouldn't hurt, perhaps coordinate with @garciadias who was doing some of that earlier. |
|
I will speak to him then, thank you. |
Summary
NibabelReader._get_array_dataforcednp.asanyarray(img.dataobj, order="C"), triggering a full dense memory reorder on every load on top of the file read/decompression step. This is the hot path reported in Very Slow Loading of NIfTI (.nii.gz) Files Compared to SimpleITK #8107..reshape(data_shape, order="F").convert_to_tensor/convert_to_numpyinmonai/utils/type_conversion.py,monai/data/image_writer.py, recon utils) already callascontiguousarraywhere they actually need C-contiguous memory, so the reader does not need to pay that cost eagerly at load time.Biggest wins are on uncompressed
.nii, where nibabel's memmap view is returned lazily rather than being materialized by a forced reorder. Compressed.nii.gzstill pays the decompression cost but skips the subsequent reorder pass, which matches the "twice as long" observation from @ericspod in the issue thread.Compatibility note
The returned array's memory layout changes from C-contiguous to whatever nibabel provides (typically F-contiguous). Any external caller consuming
reader.get_data(...)[0]directly via.tobytes()or a raw C-extension buffer without first callingascontiguousarraywould see a different byte order. All in-repo consumers already guard themselves.Test plan
tests/data/test_init_reader.pyloads a small NIfTI throughNibabelReaderfor both.niiand.nii.gz, asserts array equality, and asserts the returned data is not C-contiguous (i.e. no eager C copy).pytest tests/data/test_init_reader.pypasses locally.runtests.sh-equivalent checks:ruff,black --skip-magic-trailing-comma --check,isort --check,pycln, pre-commit hooks — all clean on touched files.Fixes #8107