When #190 is complete we should see if any common code can be extracted into shared helper functions.
AI summary
Both backends implement the AnnData on-disk encoding spec, so read/write_h5ad_helpers.R and read/write_zarr_helpers.R contain duplicated pure-R logic that should live once in utils.R.
Six helper functions to extract:
┌─────────────────────────────────────────┬───────────────┬────────────────────────┐
│ New function │ Replaces │ Callers │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ resolve_dataframe_index(value, index) │ ~20 lines × 2 │ write_*_data_frame() │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ compute_column_order(value, index_name) │ ~5 lines × 2 │ write_*_data_frame() │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ factor_to_codes(value) │ ~3 lines × 2 │ write_*_categorical() │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ codes_to_factor(codes, levels, ordered) │ ~4 lines × 2 │ read_*_categorical() │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ apply_nullable_mask(data, mask) │ ~1 line × 2 │ read_*_nullable() │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ sparse_matrix_type(value, name) │ ~8 lines × 2 │ write_*_sparse_array() │
└─────────────────────────────────────────┴───────────────┴────────────────────────┘
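As a rough illustration of what three of the smaller helpers in the table might look like once moved to utils.R, here is a minimal sketch. The function names and arguments come from the table; the bodies are assumptions, not the package's current code. They assume the AnnData convention that categorical codes are 0-based with -1 marking a missing value, and that a nullable mask is TRUE where a value is missing.

```r
# Hypothetical bodies for three of the proposed shared helpers.
# Assumption: categorical codes are 0-based, with -1 meaning "missing".

# Convert an R factor to 0-based integer codes (-1 for NA).
factor_to_codes <- function(value) {
  codes <- as.integer(value) - 1L
  codes[is.na(codes)] <- -1L
  codes
}

# Rebuild an R factor from 0-based codes and a vector of levels.
codes_to_factor <- function(codes, levels, ordered = FALSE) {
  codes <- as.integer(codes)
  codes[codes == -1L] <- NA_integer_
  # R indexing is 1-based, so shift the codes before looking up levels.
  factor(levels[codes + 1L], levels = levels, ordered = ordered)
}

# Set masked entries to NA. Assumption: mask is TRUE where data is missing.
apply_nullable_mask <- function(data, mask) {
  data[as.logical(mask)] <- NA
  data
}

# Usage: a factor with a missing value survives a round trip.
f <- factor(c("b", NA, "a"), levels = c("a", "b"))
codes <- factor_to_codes(f)
restored <- codes_to_factor(codes, levels(f), is.ordered(f))
```

Because these functions touch only plain R vectors and no HDF5 or Zarr handles, both backends could call them unchanged.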
What NOT to extract: The top-level write_*_element() dispatch functions look similar but differ intentionally: H5AD treats only 1D integer arrays with NAs as nullable-integer (&& length(dim(value)) <= 1), while Zarr applies this to all integer arrays with NAs, regardless of dimension. These should stay separate but get a cross-reference comment explaining the asymmetry.
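The asymmetry can be sketched as two predicates. The predicate names below are hypothetical, and in the package the conditions sit inline in the write_*_element() dispatch functions. Only the `length(dim(value)) <= 1` condition is taken from the description above; the rest is an assumed simplification.

```r
# Hypothetical predicates illustrating the intentional difference.

# H5AD: only vectors and 1D arrays with NAs count as nullable-integer.
# NOTE: the Zarr counterpart deliberately omits the dimension check.
is_nullable_integer_h5ad <- function(value) {
  is.integer(value) && anyNA(value) && length(dim(value)) <= 1
}

# Zarr: any integer array with NAs counts as nullable-integer.
# NOTE: the H5AD counterpart additionally requires length(dim(value)) <= 1.
is_nullable_integer_zarr <- function(value) {
  is.integer(value) && anyNA(value)
}

v <- c(1L, NA, 3L)                       # plain vector, dim(v) is NULL
m <- matrix(c(1L, NA, 3L, 4L), nrow = 2) # 2D integer matrix

h5ad_vector <- is_nullable_integer_h5ad(v)
h5ad_matrix <- is_nullable_integer_h5ad(m)
zarr_matrix <- is_nullable_integer_zarr(m)
```

With these inputs the two backends agree on the vector and disagree on the matrix, which is the case a cross-reference comment should call out.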
Files changed: R/utils.R (+6 functions), R/write_h5ad_helpers.R, R/write_zarr_helpers.R, R/read_h5ad_helpers.R, R/read_zarr_helpers.R
Verification: No new tests needed — existing roundtrip, read, write, and cross-backend tests cover all affected code paths. All tests should pass unchanged.