
If HDF5 dataset has multiple anonymous dimensions of the same size, assume they are different dimensions #1525

Merged

WardF merged 7 commits into Unidata:master from NetCDF-World-Domination-Council:ejh_anon_dims on Nov 14, 2019

@edhartnett (Contributor) commented Nov 14, 2019

Fixes #1484

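When reading an HDF5 file (i.e. not a netCDF-4/HDF5 file), the library has to make assumptions about dimensions: a dataset without attached dimension scales only records the length of each axis of its dataspace, so the library must invent "phony" dimensions for those anonymous axes.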

Until now, each dataspace length got its own phony dimension, unless it was the same length as a previously assigned anonymous dimension, in which case that dimension was reused.

That meant an HDF5 file with a 2-D dataset of size 100 x 100 would look like this:

```
netcdf dset {
dimensions:
        phony_dim_0 = 100 ;
variables:
        int dset(phony_dim_0, phony_dim_0) ;
}
```

This is probably not what the data writer intended. The two axes of the dataset should map to two different dimensions, even though they are (coincidentally) the same length.

Now, when an existing anonymous dimension of the same length is found, the library also checks whether the current dataset already uses it. If so, that dimension is skipped and the axis is assumed to be a different dimension.

So an HDF5 file with two datasets of size 100 x 100 now looks like this to netCDF:

```
netcdf tst_interops_dims {
dimensions:
	phony_dim_0 = 100 ;
	phony_dim_1 = 100 ;
variables:
	int dset(phony_dim_0, phony_dim_1) ;
	int dset2(phony_dim_0, phony_dim_1) ;
}
```

Note: this PR also includes #1523

@edhartnett edhartnett requested a review from WardF as a code owner November 14, 2019 14:30
@edhartnett edhartnett changed the title Now if HDF5 dataset has var that has multiple anonymous dimensions of the same size, assume they are different dimensions If HDF5 dataset has var that has multiple anonymous dimensions of the same size, assume they are different dimensions Nov 14, 2019
@edhartnett edhartnett changed the title If HDF5 dataset has var that has multiple anonymous dimensions of the same size, assume they are different dimensions If HDF5 dataset has dataset that has multiple anonymous dimensions of the same size, assume they are different dimensions Nov 14, 2019
@edhartnett edhartnett changed the title If HDF5 dataset has dataset that has multiple anonymous dimensions of the same size, assume they are different dimensions If HDF5 dataset has multiple anonymous dimensions of the same size, assume they are different dimensions Nov 14, 2019
@WardF WardF self-assigned this Nov 14, 2019
@WardF WardF merged commit 2462cda into Unidata:master Nov 14, 2019

@DennisHeimbigner (Collaborator) left a comment

Is there a corresponding issue for this PR? In any case, this also affects DAP2, DAP4, and Zarr, because they can also have anonymous dimensions.

@DennisHeimbigner (Collaborator)

I have been thinking about this from the point of view of semantics. In netcdf-c, using the same dimension name is intended to mean that the dimensions are known to be semantically the same. For anonymous dimensions, we have no such information. So I would think that every anonymous dimension should map to a different phony dimension. That means the example above should have four different phony dimensions.

@DennisHeimbigner (Collaborator)

PS: whatever we do here also needs to be done in DAP2, DAP4, and Zarr.

@edhartnett (Contributor, Author)

Well, that is a valid idea, but the existing behavior has already been built into a lot of programs, so I'm not sure it can be changed.

The reality on the ground is that in most cases, the writer is using shared dimensions. That is, when you take some model output and put it in HDF5, you get N datasets with the same 3- or 4-dimensional shape, and they really are sharing dimensions. Of course there are examples to the contrary, but even these are not a problem in a read-only setting. (And read/write is expressly documented as dangerous, though it can work under careful circumstances.)

The case that generated this ticket is kind of a corner case.

If there is to be a different interpretation, it would have to be activated with a mode flag (after we expand the size of the mode flags). If you decide to do that for DAP and Zarr, I'm happy to support it in the libhdf5 code.

(But I don't think that should hold up the current PR, which fixes a real bug a user is seeing.)

@DennisHeimbigner (Collaborator)

I would be happier if the naming of anonymous dimensions were independent of the variable in which they occur. Remember that until now, size => name (the length alone determined the phony dimension name), so any proposed change will break something.

@edhartnett (Contributor, Author)

Any proposed change, with a mode flag, can be completely backward compatible. And if this is something you are encountering in your DAP or Zarr code, it could be the default for those dispatch layers; as long as plain HDF5 files keep the current assignment, nothing will break.

But I don't get what you mean:

> I would be happier if the naming of anon dimensions was
> independent of the variable in which they occurred.

The dimensions are named phony_dim_N, where N starts at 0 and is incremented for each new anonymous dimension, that is, for each new length or (now) for a repeated length whose existing phony dimension is already used by the same variable. So the names are not very dependent on the variables.

The current system was whacked together by a madman, temporarily restrained, and probably verging on the brink of ever greater madness. So it could be better. ;-) I welcome clarification and elaboration of a better way to assign these anonymous dimensions, as long as we don't break existing code.

@DennisHeimbigner (Collaborator)

Let me try to clarify: given that we encounter an anonymous dimension associated with a variable, provide a written set of rules that tells us what to do with that dimension.


Development

Successfully merging this pull request may close these issues.

Reading HDF5 - control creation of anonymous dimensions

3 participants