Write Zarr strings as VLen-UTF8 #452
- This follows the anndata file format spec
- This allows compatibility with zarr python for zarr version 3
    if (zarr_version == 3L) {
      zarr_json_path <- file.path(store, name, "zarr.json")
      zarr_json <- jsonlite::read_json(zarr_json_path)
      zarr_json$data_type <- "string"
      # There should be only one bytes-array codec
      zarr_json$codecs <- lapply(
        zarr_json$codecs,
        function(codec) {
          if (codec$name == "bytes") {
            list(name = "vlen-utf8")
          } else {
            codec
          }
        }
      )
      jsonlite::write_json(
        zarr_json,
        zarr_json_path,
        auto_unbox = TRUE,
        pretty = TRUE,
        null = "null"
      )
    }
This is not strictly needed for now because writing is only possible for Zarr v2, but it lays the groundwork for an incoming PR from @Artur-man.
lazappi left a comment
I think this looks good. I'm just wondering if we should wait for a fix upstream rather than add a workaround here?
If we do it here, maybe some of the repeated code could be moved to a helper function?
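For illustration, such a helper might look like the minimal sketch below. The name `set_vlen_utf8_metadata()` and its signature are hypothetical and not part of this PR; the body only wraps the `zarr.json` rewrite shown in the diff above, and the temporary store in the usage lines is made-up test data.

```r
library(jsonlite)

# Hypothetical helper (illustrative name, not an existing function):
# mark a Zarr v3 array as a variable-length UTF-8 string array by
# rewriting its zarr.json metadata in place.
set_vlen_utf8_metadata <- function(store, name) {
  zarr_json_path <- file.path(store, name, "zarr.json")
  zarr_json <- jsonlite::read_json(zarr_json_path)
  zarr_json$data_type <- "string"
  # Swap the "bytes" codec for "vlen-utf8"; leave other codecs untouched
  zarr_json$codecs <- lapply(zarr_json$codecs, function(codec) {
    if (identical(codec$name, "bytes")) list(name = "vlen-utf8") else codec
  })
  jsonlite::write_json(
    zarr_json,
    zarr_json_path,
    auto_unbox = TRUE,
    pretty = TRUE,
    null = "null"
  )
  invisible(zarr_json_path)
}

# Usage on a throwaway store with a minimal, made-up zarr.json
store <- tempfile()
dir.create(file.path(store, "x"), recursive = TRUE)
jsonlite::write_json(
  list(
    data_type = "uint8",
    codecs = list(list(name = "bytes"), list(name = "zstd"))
  ),
  file.path(store, "x", "zarr.json"),
  auto_unbox = TRUE
)
set_vlen_utf8_metadata(store, "x")
```

Each call site would then reduce to a single `if (zarr_version == 3L) set_vlen_utf8_metadata(store, name)` line.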
In theory, yes, I agree. The reality is that I'm spread quite thin over a large number of projects, and extending the writing capabilities of Rarr (as opposed to reading) is somewhat lower priority at the moment. I could submit a follow-up PR once the new interface is in place (I'd love to get to it before the next release, but I cannot say for sure).

You guys let me know how you want to move forward. I can also build on top of Hugo's commits and open the Zarr v3 write PR immediately, so you do not need to work on it much, @Bisaloo.

I will open a PR on top of @Bisaloo's commits. Let's get this rolling.
Related to:
Description
Checklist
Before review
- Update and regenerate man pages
- Add/update examples in vignettes
Before merge
- NEWS