If HDF5 dataset has multiple anonymous dimensions of the same size, assume they are different dimensions by edhartnett · Pull Request #1525 · Unidata/netcdf-c

edhartnett · 2019-11-14T14:30:02Z

When reading an HDF5 file (i.e. not a netCDF-4/HDF5 file), the library has to make assumptions about dimensions.

Until now, each dataspace length got its own phony dimension, unless it was the same length as a previously assigned anonymous dimension.

That meant an HDF5 file with a 2-D dataspace of 100x100 would look like this:

netcdf dset {
dimensions:
        phony_dim_0 = 100 ;
variables:
        int dset(phony_dim_0, phony_dim_0) ;
}

This is probably not what the data writer intended. The two dimensions of the dataset should be two different dimensions, even though they are (coincidentally) the same size.
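
As a rough illustration of the old rule, here is a minimal Python model (not the library's C code). The function name `assign_phony_dims_by_size` and the dict-of-shapes input are assumptions made for this sketch only; it simply maps each distinct length to one phony dimension name.

```python
def assign_phony_dims_by_size(shapes):
    """Model of the old rule: one phony dimension per distinct length.

    `shapes` maps a dataset name to its dataspace shape (hypothetical input
    format, chosen for this sketch).
    """
    size_to_name = {}  # length -> phony dimension name
    variables = {}
    for var, shape in shapes.items():
        names = []
        for length in shape:
            # Reuse the phony dimension already created for this length, if any.
            if length not in size_to_name:
                size_to_name[length] = f"phony_dim_{len(size_to_name)}"
            names.append(size_to_name[length])
        variables[var] = tuple(names)
    return size_to_name, variables


dims, variables = assign_phony_dims_by_size({"dset": (100, 100)})
print(dims)       # {100: 'phony_dim_0'}
print(variables)  # {'dset': ('phony_dim_0', 'phony_dim_0')}
```

Both axes of `dset` end up on the same dimension, which matches the CDL shown above.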

Now, if an anonymous dimension of the same size is found, it is also checked to see whether it was already used in this dataset. If so, we assume that a new dimension is needed.

So now an HDF5 file with two datasets of size 100 x 100 will look like this to netCDF:

netcdf tst_interops_dims {
dimensions:
	phony_dim_0 = 100 ;
	phony_dim_1 = 100 ;
variables:
	int dset(phony_dim_0, phony_dim_1) ;
	int dset2(phony_dim_0, phony_dim_1) ;
}
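
The new rule can be sketched with a small Python model. This is an assumption-laden illustration rather than the libhdf5 code: the function name `assign_phony_dims` and the input format are hypothetical, and the search order (first matching dimension in creation order) is a guess that happens to reproduce the output above.

```python
def assign_phony_dims(shapes):
    """Model of the new rule: reuse a same-length phony dimension only if
    it has not already been used by the current dataset."""
    dims = []       # (name, length) pairs in creation order
    variables = {}
    for var, shape in shapes.items():
        used = set()  # phony dimensions already assigned within this dataset
        names = []
        for length in shape:
            # Look for an existing dimension of this length not yet used here.
            match = next(
                (name for name, dim_len in dims
                 if dim_len == length and name not in used),
                None,
            )
            if match is None:
                # None available: assume this is a new dimension.
                match = f"phony_dim_{len(dims)}"
                dims.append((match, length))
            used.add(match)
            names.append(match)
        variables[var] = tuple(names)
    return dims, variables


dims, variables = assign_phony_dims({"dset": (100, 100), "dset2": (100, 100)})
print(dims)       # [('phony_dim_0', 100), ('phony_dim_1', 100)]
print(variables)  # both datasets map to ('phony_dim_0', 'phony_dim_1')
```

Under this model the second dataset shares both dimensions with the first, so only two phony dimensions are created for the file.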

Note: this PR also includes #1523

DennisHeimbigner

Is there a corresponding issue for this PR?
In any case, this also affects DAP2, DAP4, and Zarr, because
they can have anonymous dimensions as well.

DennisHeimbigner · 2019-11-15T20:29:50Z

I have been thinking about this from the POV of semantics.
In netcdf-c when you use the same dimension name, it is intended to mean
that the dimensions are known to be semantically the same.
For anonymous dimensions, we have no such information.
So, I would think that this means every anonymous dimension
should map to a different phony dim. This means that the above example
should have four different phony dimensions.
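
For comparison, the interpretation proposed here can be modeled in a few lines of Python. This is only a sketch of the idea as stated in this comment; the function name `assign_unique_phony_dims` and the input format are hypothetical, and nothing like it exists in the library.

```python
def assign_unique_phony_dims(shapes):
    """Model of the proposal: every anonymous dimension gets its own
    phony dimension, with no sharing based on length."""
    dims = []       # (name, length) pairs in creation order
    variables = {}
    for var, shape in shapes.items():
        names = []
        for length in shape:
            # Always create a fresh dimension; equal lengths imply nothing.
            name = f"phony_dim_{len(dims)}"
            dims.append((name, length))
            names.append(name)
        variables[var] = tuple(names)
    return dims, variables


dims, variables = assign_unique_phony_dims(
    {"dset": (100, 100), "dset2": (100, 100)}
)
print(len(dims))  # 4
print(variables)
```

For the two 100 x 100 datasets this yields `phony_dim_0` through `phony_dim_3`, i.e. the four dimensions mentioned above.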

DennisHeimbigner · 2019-11-15T20:30:39Z

PS, whatever we do here needs to be also done in DAP2, DAP4, and
Zarr.

edhartnett · 2019-11-15T21:13:45Z

Well that is a valid idea, but the existing usage has already been built into a lot of programs, so I don't know that this can be changed.

The reality on the ground is that in most cases, the writer is using shared dimensions. That is, when you take some model output and put it in HDF5, you get N datasets with the same 3 and 4 dimensional shape. And they really are sharing dimensions. Of course there are examples to the contrary, but even these are not a problem in a read-only setting. (And read/write is expressly documented as dangerous, though it can work under careful circumstances.)

The case that generated this ticket is kind of a corner case.

If there is to be a different interpretation, it would have to be activated with a mode flag (after we expand the size of the mode flags). If you decide to do that for the daps and zarr, I'm happy to support it in the libhdf5 code.

(But I don't think that should hold up the current PR, which fixes a real bug a user is seeing.)

DennisHeimbigner · 2019-11-15T22:16:08Z

I would be happier if the naming of anon dimensions was
independent of the variable in which they occurred.
Remember that up to this point, size => name, so any proposed change
will break something.

edhartnett · 2019-11-16T01:16:38Z

Any proposed change, with a mode flag, can be completely backward compatible. And if this is something you are encountering in your DAP or Zarr, then that could be the default for those dispatch layers; as long as plain old HDF5 files are assigned in the same way, then nothing will break.

But I don't get what you mean:

> I would be happier if the naming of anon dimensions was
> independent of the variable in which they occurred.

The dimensions are named phony_dim_N, where N starts at 0 and is incremented for each new anon dim, which means each different length, or (now) same length but assigned to the same var. So they are not very dependent on the variables.

The current system was whacked together by a madman, temporarily restrained, and probably verging on the brink of ever greater madness. So it could be better. ;-) I welcome clarification and elaboration of a better way to assign these anon dimensions, as long as we don't break existing codes.

DennisHeimbigner · 2019-11-16T22:23:01Z

Let me try to clarify:
Given we encounter an anonymous dimension associated with a variable,
provide a written set of rules that tell us what to do with that dimension.

edhartnett added 6 commits November 13, 2019 12:07

udf must take priority in NC_infermodel

0bbe91e

adding test for anonymous dims in HDF5 file

cebe841

adding test

6b9248c

now handle two anon dimensions of same size used in same HDF5 var

d73611d

another test for two anon dimensions of same size used in same HDF5 var

3e45fa1

modified release notes

af71852

edhartnett requested a review from WardF as a code owner November 14, 2019 14:30

edhartnett changed the title ~~Now if HDF5 dataset has var that has multiple anonymous dimensions of the same size, assume they are different dimensions~~ If HDF5 dataset has var that has multiple anonymous dimensions of the same size, assume they are different dimensions Nov 14, 2019

updated tutorial

73ed852

edhartnett changed the title ~~If HDF5 dataset has var that has multiple anonymous dimensions of the same size, assume they are different dimensions~~ If HDF5 dataset has dataset that has multiple anonymous dimensions of the same size, assume they are different dimensions Nov 14, 2019

edhartnett changed the title ~~If HDF5 dataset has dataset that has multiple anonymous dimensions of the same size, assume they are different dimensions~~ If HDF5 dataset has multiple anonymous dimensions of the same size, assume they are different dimensions Nov 14, 2019

WardF self-assigned this Nov 14, 2019

WardF approved these changes Nov 14, 2019


WardF merged commit 2462cda into Unidata:master Nov 14, 2019

DennisHeimbigner reviewed Nov 15, 2019


kmuehlbauer mentioned this pull request Dec 5, 2019

ENH: invent phony_dim names for unlabeled dimensions h5netcdf/h5netcdf#64

Merged

WardF merged 7 commits into Unidata:master from NetCDF-World-Domination-Council:ejh_anon_dims
