Write Zarr strings as VLen-UTF8 #452

Open
Bisaloo wants to merge 3 commits into scverse:devel from Bisaloo:zarr-vlen-utf8

@Bisaloo Bisaloo commented Apr 29, 2026

Related to:

Description

  • This follows the anndata file format spec
  • This allows compatibility with zarr-python for Zarr version 3. In particular, the current behaviour is blocking the addition of Zarr v3 writing because round trips with Python fail.

Checklist

Before review

  • Update and regenerate man pages
  • Add/update tests
  • Add/update examples in vignettes
  • Pass CI checks

Before merge

  • Update NEWS
  • Bump devel version

Bisaloo added 3 commits April 24, 2026 17:00
- This follows the anndata file format spec
- This allows compatibility with zarr python for zarr version 3
Comment thread R/write_zarr_helpers.R
Comment on lines +411 to +433
if (zarr_version == 3L) {
  zarr_json_path <- file.path(store, name, "zarr.json")
  zarr_json <- jsonlite::read_json(zarr_json_path)
  zarr_json$data_type <- "string"
  # There should be only one bytes-array codec
  zarr_json$codecs <- lapply(
    zarr_json$codecs,
    function(codec) {
      if (codec$name == "bytes") {
        list(name = "vlen-utf8")
      } else {
        codec
      }
    }
  )
  jsonlite::write_json(
    zarr_json,
    zarr_json_path,
    auto_unbox = TRUE,
    pretty = TRUE,
    null = "null"
  )
}
Bisaloo (Contributor, Author) commented

This is not strictly needed for now because writing is only possible for zarr v2, but it lays the groundwork for an incoming PR from @Artur-man.

@lazappi lazappi left a comment

I think this looks good. I'm just wondering if we should wait for a fix upstream rather than add a workaround here?

If we do it here, maybe some of the repeated code could be moved to a helper function?
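
As a rough illustration of that suggestion, the metadata rewrite from the diff could be split into a pure function plus a thin I/O wrapper. This is only a sketch: the names `as_vlen_utf8_metadata()` and `rewrite_string_array_metadata()` are hypothetical and do not appear in the PR, and the wrapper assumes the same `store`/`name` layout and `jsonlite` calls as the reviewed code.

```r
# Hypothetical helper (name is illustrative, not from the PR):
# takes parsed zarr.json metadata as a list and returns it with the
# "string" data type and the bytes codec swapped for vlen-utf8.
as_vlen_utf8_metadata <- function(zarr_json) {
  zarr_json$data_type <- "string"
  zarr_json$codecs <- lapply(zarr_json$codecs, function(codec) {
    # identical() also copes with codec entries that lack a name
    if (identical(codec$name, "bytes")) list(name = "vlen-utf8") else codec
  })
  zarr_json
}

# Hypothetical wrapper doing the file I/O with the same jsonlite
# arguments as the diff above.
rewrite_string_array_metadata <- function(store, name) {
  path <- file.path(store, name, "zarr.json")
  zarr_json <- jsonlite::read_json(path)
  jsonlite::write_json(
    as_vlen_utf8_metadata(zarr_json),
    path,
    auto_unbox = TRUE,
    pretty = TRUE,
    null = "null"
  )
  invisible(path)
}
```

Keeping the list transformation separate from the file handling would let the codec swap be unit-tested without touching a store, and each call site would reduce to a single call to the wrapper.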

@lazappi lazappi changed the title from "Write strings as VLen-UTF8" to "Write Zarr strings as VLen-UTF8" May 6, 2026
Bisaloo commented May 6, 2026

> I'm just wondering if we should wait for a fix upstream rather than add a workaround here?

In theory, yes, I agree.

The reality is that I'm spread quite thin over a large number of projects and extending the writing capabilities of Rarr (as opposed to reading) is somewhat lower priority at the moment.

I could submit a follow-up PR once the new interface is in place (I'd love to get to it before the next release but I cannot say for sure).

@Artur-man

You guys let me know how you wanna move forward.

I can also build on top of these commits from Hugo and open the zarr v3 write PR immediately, so you would not need to work on it much, @Bisaloo.

Artur-man commented May 8, 2026

I will open a PR on top of @Bisaloo's commits, let's get this rolling.
