
Extract shared encoding helpers from H5AD and Zarr backends #445

@lazappi

Description

When #190 is complete, we should check whether any common code can be extracted into shared helper functions.

AI summary

Both backends implement the AnnData on-disk encoding spec, so R/read_h5ad_helpers.R and R/write_h5ad_helpers.R duplicate pure-R logic that also appears in R/read_zarr_helpers.R and R/write_zarr_helpers.R. That logic should live once, in R/utils.R.

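Six helper functions to extract (the "Replaces" column gives the approximate number of duplicated lines per backend, × 2 for H5AD and Zarr):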

┌─────────────────────────────────────────┬───────────────┬────────────────────────┐
│              New function               │   Replaces    │        Callers         │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ resolve_dataframe_index(value, index)   │ ~20 lines × 2 │ write_*_data_frame()   │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ compute_column_order(value, index_name) │ ~5 lines × 2  │ write_*_data_frame()   │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ factor_to_codes(value)                  │ ~3 lines × 2  │ write_*_categorical()  │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ codes_to_factor(codes, levels, ordered) │ ~4 lines × 2  │ read_*_categorical()   │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ apply_nullable_mask(data, mask)         │ ~1 line × 2   │ read_*_nullable()      │
├─────────────────────────────────────────┼───────────────┼────────────────────────┤
│ sparse_matrix_type(value, name)         │ ~8 lines × 2  │ write_*_sparse_array() │
└─────────────────────────────────────────┴───────────────┴────────────────────────┘
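
A minimal sketch of what the six helpers might look like in R/utils.R. Only the names and signatures come from the table above; the bodies are assumptions based on the AnnData encoding spec (0-based categorical codes with -1 for missing values, a nullable mask that is TRUE where a value is missing, and the csc_matrix/csr_matrix encoding types). The list returned by resolve_dataframe_index(), the "_index" default, and the use of `name` only in the error message of sparse_matrix_type() are hypothetical choices, not current anndataR behaviour. The sparse mapping also assumes matrices keep the same orientation on disk as in memory.

```r
# Hypothetical shared helpers for R/utils.R. Signatures follow the table in
# this issue; the bodies are an illustrative sketch, not anndataR code.

# Resolve the index for a data frame. Assumes `index` is NULL (use row names),
# the name of a column, or a vector of index values.
resolve_dataframe_index <- function(value, index = NULL) {
  if (is.null(index)) {
    list(name = "_index", values = rownames(value))
  } else if (is.character(index) && length(index) == 1 &&
             index %in% colnames(value)) {
    list(name = index, values = as.character(value[[index]]))
  } else {
    stopifnot(length(index) == nrow(value))
    list(name = "_index", values = as.character(index))
  }
}

# Columns for the "column-order" attribute: every column except the index.
compute_column_order <- function(value, index_name) {
  cols <- colnames(value)
  cols[cols != index_name]
}

# Encode a factor as 0-based integer codes, using -1 for NA.
factor_to_codes <- function(value) {
  codes <- as.integer(value) - 1L
  codes[is.na(codes)] <- -1L
  codes
}

# Inverse of factor_to_codes(): negative codes become NA.
codes_to_factor <- function(codes, levels, ordered = FALSE) {
  codes <- as.integer(codes)
  codes[codes < 0L] <- NA_integer_
  factor(levels[codes + 1L], levels = levels, ordered = ordered)
}

# Set every element flagged TRUE in the mask to NA (type is preserved).
apply_nullable_mask <- function(data, mask) {
  data[as.logical(mask)] <- NA
  data
}

# Map a sparse matrix class to its AnnData encoding type.
sparse_matrix_type <- function(value, name) {
  if (inherits(value, "dgCMatrix")) {
    "csc_matrix"
  } else if (inherits(value, "dgRMatrix")) {
    "csr_matrix"
  } else {
    stop("Unsupported sparse matrix class for '", name, "': ", class(value)[1])
  }
}
```

Keeping the helpers free of any HDF5 or Zarr calls is what lets both backends share them: each backend still performs its own I/O and only delegates the pure-R transformations.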

What NOT to extract: the top-level write_*_element() dispatch functions look similar but differ in one meaningful, intentional way. H5AD treats an integer array containing NAs as nullable-integer only when it is 1D (&& length(dim(value)) <= 1), whereas Zarr does so for integer arrays of any dimensionality. These functions should stay separate, but each should get a cross-reference comment explaining the asymmetry.
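
To make the asymmetry concrete, the two conditions can be sketched as standalone predicates. The function names are hypothetical, and the checks assume the dispatch looks for an integer value containing NA; only the `length(dim(value)) <= 1` clause is taken from the description of the H5AD code.

```r
# Hypothetical predicates mirroring the nullable-integer checks in the two
# write_*_element() dispatchers. Sketch only, not the actual anndataR code.

# H5AD: only plain vectors or 1D arrays qualify. For a vector dim() is NULL,
# so length(dim(value)) is 0.
h5ad_is_nullable_integer <- function(value) {
  is.integer(value) && anyNA(value) && length(dim(value)) <= 1
}

# Zarr: integer arrays of any dimensionality qualify.
zarr_is_nullable_integer <- function(value) {
  is.integer(value) && anyNA(value)
}

m <- matrix(c(1L, NA, 3L, 4L), nrow = 2)
h5ad_is_nullable_integer(m)          # FALSE
zarr_is_nullable_integer(m)          # TRUE
h5ad_is_nullable_integer(c(1L, NA))  # TRUE
```

A 2D integer matrix with NAs is therefore nullable for Zarr but not for H5AD, while a plain integer vector with NAs is nullable for both, which is why a single shared dispatcher would change behaviour for one backend.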

Files changed: R/utils.R (+6 functions), R/write_h5ad_helpers.R, R/write_zarr_helpers.R, R/read_h5ad_helpers.R, R/read_zarr_helpers.R

Verification: no new tests are needed, because the existing roundtrip, read, write and cross-backend tests already cover every affected code path. All tests should pass unchanged.

Metadata

Assignees

No one assigned

Labels

h5ad (Issues related to H5AD files), zarr (Issues related to the Zarr backend)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests