Skip to content

Make dimscale attachment to variables optional#2161

Merged
WardF merged 6 commits intoUnidata:mainfrom
gsjaardema:dimscale_attachement_optional
Apr 19, 2022
Merged

Make dimscale attachment to variables optional#2161
WardF merged 6 commits intoUnidata:mainfrom
gsjaardema:dimscale_attachement_optional

Conversation

@gsjaardema
Copy link
Copy Markdown
Contributor

Code to implement the proposal discussed in #2128.

@gsjaardema gsjaardema requested a review from WardF as a code owner December 2, 2021 22:20
@gsjaardema
Copy link
Copy Markdown
Contributor Author

There are a few whitespace changes -- my editor automatically strips trailing spaces and I didn't clean out those changes...

@gsjaardema
Copy link
Copy Markdown
Contributor Author

Apologies again for all the whitespace changes...

@gsjaardema gsjaardema changed the title Make dimscale attachement to variables optional Make dimscale attachment to variables optional Dec 9, 2021
@gsjaardema
Copy link
Copy Markdown
Contributor Author

What needs to be done to this to make it a consideration for the next release? @edhartnett @WardF

@edwardhartnett
Copy link
Copy Markdown
Contributor

It's up to Ward - as far as I am concerned, it's good to go.

Ward's been busy with the netcdf-fortran release, which had to come out before the next C release, since the C library now includes the quantize functions and netcdf-fortran now must be modified to include them. So Ward is catching up and we have to be patient. ;-)

However, many of us are eager for the next netcdf-c release, which will contain the quantize functions for lossy compression. Onward and upward!

@DennisHeimbigner
Copy link
Copy Markdown
Collaborator

DennisHeimbigner commented Jan 11, 2022

I would like to be sure I understand the consequences, sorry for being pedantic.
Are these statements correct?

  1. Netcdf-4 files written by 4.6.3 or earlier (with dimscales) can still be read by post-4.6.3 versions.
  2. 4.6.3 and earlier cannot necessarily read post 4.6.3 written files

My concern is mostly this:

If dimension scales are not used, then netCDF-4 can still edit the file, and will invent anonymous dimensions for each variable shape. This is done by iterating through the space of each dataset. As each space size is encountered, a phony dimension of that size is checked for. It it does not exist, a new phony dimension is created for that size. In this way, a HDF5 file with datasets that are using shared dimensions can be seen properly in netCDF-4

It seems to imply that the shared dimensions are lost in this case. Is this correct? Why is it so?

@WardF
Copy link
Copy Markdown
Member

WardF commented Jan 11, 2022

With netCDF-Fortran announced/published, I am able to turn my attention back to the C library and begin moving forward here again :). Also shoring up the docker-based CI work we've been using, but that's not relevant to this. I am curious to the answers to Dennis' question, however.

@edwardhartnett
Copy link
Copy Markdown
Contributor

@DennisHeimbigner that documentation was written before the dim info was always included in the file metadata (i.e. 4.6.3).

In other words, netCDF-4 will only invent dimensions if they can't be determined from dimscales or other file metadata. What Greg proposes here is to make the dimscale optional, but the other file metadata is still always present.

So this will be readable to any netCDF 4.6.3 or greater.

@DennisHeimbigner
Copy link
Copy Markdown
Collaborator

I see, thanks. Then I see no reason not to go ahead with this. For back compatibility,
we will unfortunately have to keep dimscale cognizant code in the library for the foreseeable
future, but that is ok.

@edwardhartnett
Copy link
Copy Markdown
Contributor

Yes, the dimscale code will always be there, but the library only resolves dimscales (the slow operation) on file open for files that don't have the extra file metadata. And all files created with 4.6.3 or later have that extra metadata, so the dimscale step can be skipped.

However, files created before 4.6.3 don't have the extra metadata (except for some very specific cases where dimscales never worked). So we will always need to be able to resolve the dimscales for that older data. ;-)

Dimscales have been a bit of a curse for netCDF.

@gsjaardema
Copy link
Copy Markdown
Contributor Author

I know things are busy, but just another ping to see if this can be included in the next release.

@WardF WardF added this to the 4.9.0 milestone Feb 16, 2022
@WardF
Copy link
Copy Markdown
Member

WardF commented Feb 16, 2022

@gsjaardema yes! :)

wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this pull request Mar 25, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this pull request Mar 25, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
@edwardhartnett
Copy link
Copy Markdown
Contributor

I think this PR looks good to merge. Nice work @gsjaardema as usual.

@WardF
Copy link
Copy Markdown
Member

WardF commented Apr 19, 2022

It looks like some of the tests timed out; sorting out re-running these and then will merge. Thanks!

@WardF WardF mentioned this pull request Apr 19, 2022
@WardF WardF merged commit 982b258 into Unidata:main Apr 19, 2022
@gsjaardema
Copy link
Copy Markdown
Contributor Author

Thanks. It is a pleasure to work and use the library. Easy to understand...

@gsjaardema gsjaardema deleted the dimscale_attachement_optional branch April 19, 2022 19:49
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this pull request Apr 22, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this pull request Apr 28, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this pull request Apr 28, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants