Performance comparison between tifffile and iohub's custom OME-TIFF implementation #66

@ziw-liu

Description

A custom OME-TIFF reader (iohub.multipagetiff.MicromanagerOmeTiffReader) was implemented because, historically, tifffile and AICSImageIO were slow when reading large OME-TIFF series generated by Micro-Manager acquisitions.

While debugging #65 I found that this implementation does not guarantee data integrity during reading. Before investing more time in fixing it, I think it is worth revisiting whether maintaining a custom OME-TIFF reader is justified at all, given that the more widely adopted solutions have evolved since waveorder.io was first designed.
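An integrity failure like this can be caught by comparing the same position read through both code paths. A minimal sketch of such a check (the helper name is hypothetical, and synthetic arrays stand in for the two readers' outputs):

```python
import numpy as np


def assert_same_voxels(a, b, position):
    """Hypothetical integrity check: fail loudly when two readers
    disagree on the voxels of the same position."""
    if a.shape != b.shape:
        raise AssertionError(
            f"shape mismatch at position {position}: {a.shape} vs {b.shape}"
        )
    if not np.array_equal(a, b):
        n_bad = int(np.count_nonzero(a != b))
        raise AssertionError(f"{n_bad} voxels differ at position {position}")


# Synthetic stand-in data; in practice the two arrays would come from
# tifffile and MicromanagerOmeTiffReader reading the same position.
rng = np.random.default_rng(0)
ref = rng.integers(0, 2**16, size=(2, 3, 16, 16), dtype=np.uint16)
assert_same_voxels(ref, ref.copy(), position=0)  # identical data passes
```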

Here is a simple read speed benchmark of tifffile and iohub's custom reader:

[Figure: benchmark plot of read times, tifffile vs. the iohub custom reader]

The test was done on a 123 GB dataset with PTCZYX = (8, 9, 3, 81, 2048, 2048) dimensions. Voxels from 2 non-sequential positions were read into RAM in each iteration (N = 5 iterations).

Test script:

Environment: Python 3.10.8, Linux 4.18 (x86_64, AMD EPYC 7H12@2.6GHz)

# %%
import os
from timeit import timeit
import zarr
import pandas as pd

# readers tested
from tifffile import TiffSequence  # 2023.2.3
from iohub.multipagetiff import MicromanagerOmeTiffReader  # 0.1.dev368+g3d62e6f


# %%
# 123 GB total
DATASET = (
    "/hpc/projects/comp_micro/rawdata/hummingbird/Soorya/"
    "2022_06_27_A549cellMembraneStained/"
    "A549_CellMaskDye_Well1_deltz0.25_63X_30s_2framemin/"
    "A549_CellMaskdye_Well1_30s_2framemin_1"
)

POSITIONS = (2, 0)


# %%
def read_tifffile():
    sequence = TiffSequence(os.scandir(DATASET))
    data = zarr.open(sequence.aszarr(), mode="r")
    for p in POSITIONS:
        _ = data[p]
    sequence.close()


# %%
def read_custom():
    reader = MicromanagerOmeTiffReader(DATASET)
    for p in POSITIONS:
        _ = reader.get_array(p)


# %%
def repeat(n=5):
    tf_times = []
    wo_times = []
    for _ in range(n):
        tf_times.append(
            timeit(
                "read_tifffile()", number=1, setup="from __main__ import read_tifffile"
            )
        )
        wo_times.append(
            timeit("read_custom()", number=1, setup="from __main__ import read_custom")
        )
    return pd.DataFrame({"tifffile": tf_times, "waveorder": wo_times})


# %%
timings = repeat()
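For reporting, the per-iteration timings can be reduced to a mean and standard deviation per reader. A minimal sketch, with made-up numbers in place of the measured values (the real frame comes from `repeat()` above):

```python
import pandas as pd

# Made-up per-iteration wall times in seconds (placeholders only);
# the real values are collected by repeat() in the benchmark script.
timings = pd.DataFrame(
    {
        "tifffile": [41.2, 39.8, 40.5, 40.1, 39.9],
        "waveorder": [88.3, 90.1, 87.6, 89.4, 88.8],
    }
)

# One row for the mean and one for the standard deviation of each column.
summary = timings.agg(["mean", "std"])
print(summary.round(2))
```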

At least in this test, the latest tifffile consistently outperforms the iohub implementation. A comprehensive benchmark will take more time (#57), but as long as a widely used library is not significantly slower, the reduced maintenance overhead and broader user testing make a strong case for reconsidering whether to keep the custom code in iohub.

Labels: area: readers (format readers, converters, data ingest), performance (speed and memory usage of the code)
