Performance comparison between tifffile and iohub's custom OME-TIFF implementation #66

@ziw-liu

Description

A custom OME-TIFF reader (iohub.multipagetiff.MicromanagerOmeTiffReader) was implemented because, historically, tifffile and AICSImageIO were slow when reading large OME-TIFF series generated by Micro-Manager acquisitions.

While debugging #65 I found that this implementation does not guarantee data integrity during reading. Before investing more time in fixing it, I think it is worth revisiting whether maintaining a custom OME-TIFF reader is justified at all, given that the more widely adopted solutions have evolved since waveorder.io was first designed.
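An integrity failure like this can be caught by comparing the same position read through both code paths. A minimal sketch of such a check (the helper name is hypothetical, and synthetic arrays stand in for the two readers' outputs):

```python
import numpy as np


def assert_same_voxels(a, b, position):
    """Hypothetical integrity check: fail loudly when two readers
    disagree on the voxels of the same position."""
    if a.shape != b.shape:
        raise AssertionError(
            f"shape mismatch at position {position}: {a.shape} vs {b.shape}"
        )
    if not np.array_equal(a, b):
        n_bad = int(np.count_nonzero(a != b))
        raise AssertionError(f"{n_bad} voxels differ at position {position}")


# Synthetic stand-in data; in practice the two arrays would come from
# tifffile and MicromanagerOmeTiffReader reading the same position.
rng = np.random.default_rng(0)
ref = rng.integers(0, 2**16, size=(2, 3, 16, 16), dtype=np.uint16)
assert_same_voxels(ref, ref.copy(), position=0)  # identical data passes
```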

Here is a simple read speed benchmark of tifffile and iohub's custom reader:

[Figure: benchmark plot of read times, tifffile vs. the iohub custom reader]

The test was done on a 123 GB dataset with PTCZYX = (8, 9, 3, 81, 2048, 2048) dimensions. Voxels from 2 non-sequential positions were read into RAM in each iteration (N = 5 iterations).

Test script:

Environment: Python 3.10.8, Linux 4.18 (x86_64, AMD EPYC 7H12@2.6GHz)

# %%
import os
from timeit import timeit
import zarr
import pandas as pd

# readers tested
from tifffile import TiffSequence  # 2023.2.3
from iohub.multipagetiff import MicromanagerOmeTiffReader  # 0.1.dev368+g3d62e6f


# %%
# 123 GB total
DATASET = (
    "/hpc/projects/comp_micro/rawdata/hummingbird/Soorya/"
    "2022_06_27_A549cellMembraneStained/"
    "A549_CellMaskDye_Well1_deltz0.25_63X_30s_2framemin/"
    "A549_CellMaskdye_Well1_30s_2framemin_1"
)

POSITIONS = (2, 0)


# %%
def read_tifffile():
    sequence = TiffSequence(os.scandir(DATASET))
    data = zarr.open(sequence.aszarr(), mode="r")
    for p in POSITIONS:
        _ = data[p]
    sequence.close()


# %%
def read_custom():
    reader = MicromanagerOmeTiffReader(DATASET)
    for p in POSITIONS:
        _ = reader.get_array(p)


# %%
def repeat(n=5):
    tf_times = []
    wo_times = []
    for _ in range(n):
        tf_times.append(
            timeit(
                "read_tifffile()", number=1, setup="from __main__ import read_tifffile"
            )
        )
        wo_times.append(
            timeit("read_custom()", number=1, setup="from __main__ import read_custom")
        )
    return pd.DataFrame({"tifffile": tf_times, "waveorder": wo_times})


# %%
timings = repeat()
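For reporting, the per-iteration timings can be reduced to a mean and standard deviation per reader. A minimal sketch, with made-up numbers in place of the measured values (the real frame comes from `repeat()` above):

```python
import pandas as pd

# Made-up per-iteration wall times in seconds (placeholders only);
# the real values are collected by repeat() in the benchmark script.
timings = pd.DataFrame(
    {
        "tifffile": [41.2, 39.8, 40.5, 40.1, 39.9],
        "waveorder": [88.3, 90.1, 87.6, 89.4, 88.8],
    }
)

# One row for the mean and one for the standard deviation of each column.
summary = timings.agg(["mean", "std"])
print(summary.round(2))
```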

At least in this test, the latest tifffile consistently outperforms the iohub implementation. A comprehensive benchmark will take more time (#57), but as long as a widely used library is not significantly slower, the reduced maintenance overhead and broader user testing make a strong case for reconsidering whether to keep the custom code in iohub.

Labels: area: readers (format readers, converters, data ingest), performance (speed and memory usage of the code)
