If an HDF5 dataset has multiple anonymous dimensions of the same size, assume they are different dimensions #1525
DennisHeimbigner left a comment:
Is there a corresponding issue for this PR?
In any case, this also affects DAP2, DAP4, and Zarr, because they can also have anonymous dimensions.
I have been thinking about this from the POV of semantics.

PS: whatever we do here also needs to be done in DAP2, DAP4, and …
Well, that is a valid idea, but the existing behavior has already been built into a lot of programs, so I don't know that it can be changed. The reality on the ground is that, in most cases, the writer is using shared dimensions. That is, when you take some model output and put it in HDF5, you get N datasets with the same 3- or 4-dimensional shape, and they really are sharing dimensions. Of course there are counterexamples, but even these are not a problem in a read-only setting. (Read/write is expressly documented as dangerous, though it can work under careful circumstances.) The case that generated this ticket is something of a corner case. If there is to be a different interpretation, it would have to be activated with a mode flag (after we expand the size of the mode flags). If you decide to do that for the DAPs and Zarr, I'm happy to support it in the libhdf5 code. (But I don't think that should hold up the current PR, which fixes a real bug a user is seeing.)
I would be happier if the naming of anon dimensions was …
Any proposed change, with a mode flag, can be completely backward compatible. And if this is something you are encountering in DAP or Zarr, then it could be the default for those dispatch layers; as long as anonymous dimensions in plain old HDF5 files are assigned the same way as before, nothing will break. But I don't get what you mean:
The dimensions are named phony_dim_N, where N starts at 0 and is incremented for each new anon dim, that is, for each new length or (now) for a repeated length that is already used by the same var. So they are not very dependent on the variables. The current system was whacked together by a madman, temporarily restrained, and probably verging on the brink of ever greater madness. So it could be better. ;-) I welcome clarification and elaboration of a better way to assign these anon dimensions, as long as we don't break existing code.
Let me try to clarify: …
Fixes #1484
When reading an HDF5 file (i.e., not a netCDF-4/HDF5 file), the library has to make assumptions about dimensions.
Until now, each dataspace length got its own phony dimension, unless it was the same length as a previously assigned anon dimension.
That meant an HDF5 file with a 2-D dataspace of 100 x 100 would appear to netCDF to have a single dimension, phony_dim_0 = 100, used for both axes of the dataset.
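The old rule can be modeled with a short Python sketch. This is an illustration under simplifying assumptions, not the netCDF-C implementation: `assign_phony_dims_old` is a hypothetical name, all datasets are assumed to live in one group and to be read in order, and the distinction between unlimited and fixed-size dimensions is ignored. Each axis reuses the first existing phony dimension with the same length, so a single 100 x 100 dataset ends up using `phony_dim_0` twice.

```python
def assign_phony_dims_old(shapes):
    """Simplified model of the OLD phony-dimension rule (hypothetical helper).

    Assumptions: one group, datasets read in the given order, and no
    unlimited dimensions. `shapes` is a list of (dataset_name, lengths).
    Returns (dims, var_dims): dims maps phony name -> length, and
    var_dims maps dataset name -> tuple of phony names, one per axis.
    """
    dims = {}      # phony_dim_N -> length, in creation order
    var_dims = {}
    for name, shape in shapes:
        names = []
        for length in shape:
            # Old rule: reuse the first phony dimension with this length,
            # even if this same dataset is already using it.
            match = next((d for d, n in dims.items() if n == length), None)
            if match is None:
                match = f"phony_dim_{len(dims)}"
                dims[match] = length
            names.append(match)
        var_dims[name] = tuple(names)
    return dims, var_dims


dims, var_dims = assign_phony_dims_old([("data", (100, 100))])
print(dims)      # {'phony_dim_0': 100}
print(var_dims)  # {'data': ('phony_dim_0', 'phony_dim_0')}
```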
This is probably not what the data writer intended. The two dimensions of the dataset should be two different dimensions, even though they are (coincidentally) the same size.
Now, if an anon dimension of the same size is found, it is also checked to see if it was already used in this dataset. If so, we now assume that it is a new dimension.
So now an HDF5 file with two datasets of size 100 x 100 will look like this to netCDF:
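The new rule can be modeled the same way. Again this is a simplified sketch, not the netCDF-C code: `assign_phony_dims` is a hypothetical name, everything is assumed to be in one group, unlimited dimensions are ignored, and the lookup is assumed to scan all existing phony dimensions of the right length, skipping any this dataset already uses, before creating a new one. Under those assumptions each dataset gets two distinct dimensions, and the second dataset reuses `phony_dim_0` and `phony_dim_1` rather than creating more.

```python
def assign_phony_dims(shapes):
    """Simplified model of the NEW phony-dimension rule (hypothetical helper).

    Assumptions: one group, datasets read in the given order, no unlimited
    dimensions, and every existing phony dimension of the matching length
    is considered before a new one is created.
    """
    dims = {}      # phony_dim_N -> length, in creation order
    var_dims = {}
    for name, shape in shapes:
        names = []
        for length in shape:
            # New rule: reuse a phony dimension of this length only if this
            # dataset has not already used it for another axis.
            match = next(
                (d for d, n in dims.items() if n == length and d not in names),
                None,
            )
            if match is None:
                match = f"phony_dim_{len(dims)}"
                dims[match] = length
            names.append(match)
        var_dims[name] = tuple(names)
    return dims, var_dims


dims, var_dims = assign_phony_dims([("a", (100, 100)), ("b", (100, 100))])
print(dims)      # {'phony_dim_0': 100, 'phony_dim_1': 100}
print(var_dims)  # {'a': ('phony_dim_0', 'phony_dim_1'), 'b': ('phony_dim_0', 'phony_dim_1')}
```

Because a later dataset still reuses matching phony dimensions it has not yet used, the common case of many datasets with the same shape continues to share dimensions.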
Note: this PR also includes #1523