Use the built-in HDF5 byte-range reader, if available. #1849

Merged
WardF merged 1 commit into Unidata:master from DennisHeimbigner:hdf5ros3.dmh
Sep 24, 2020

Conversation

@DennisHeimbigner
Collaborator

@DennisHeimbigner DennisHeimbigner commented Sep 24, 2020

re: Issue #1848

The existing Virtual File Driver built to support byte-range
read-only file access is quite old. It turns out to be extremely
slow (reason unknown at the moment).

Starting with HDF5 1.10.6, the HDF5 library has its own version
of such a file driver. The HDF5 developers have better knowledge
about building such a driver and what incantations are needed to
get good performance.

This PR modifies the byte-range code in hdf5open.c so
that if the HDF5 file driver is available, then it is used
in preference to the one written by the Netcdf group.
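
The selection rule described above can be sketched as follows. This is only an illustration, not the code in hdf5open.c: the enum, both function names, and the `driver_built_in` flag are hypothetical, and the flag stands in for whatever build-time check reports that the HDF5 library was compiled with its byte-range driver.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers used in hdf5open.c. */
typedef enum {
    BYTERANGE_DRIVER_NETCDF, /* the older driver written by the netCDF group */
    BYTERANGE_DRIVER_HDF5    /* the driver shipped with the HDF5 library */
} byterange_driver;

/* True when the version (major.minor.release) is at least 1.10.6,
   the first release the PR text names as having the HDF5 driver. */
bool hdf5_version_has_byterange_driver(unsigned major, unsigned minor,
                                       unsigned release)
{
    if (major != 1)
        return major > 1;
    if (minor != 10)
        return minor > 10;
    return release >= 6;
}

/* Prefer the HDF5 driver when the library is new enough and was built
   with it (driver_built_in is an assumed build-time flag); otherwise
   fall back to the netCDF one. */
byterange_driver choose_byterange_driver(unsigned major, unsigned minor,
                                         unsigned release,
                                         bool driver_built_in)
{
    if (driver_built_in &&
        hdf5_version_has_byterange_driver(major, minor, release))
        return BYTERANGE_DRIVER_HDF5;
    return BYTERANGE_DRIVER_NETCDF;
}
```

In a real build the three version numbers could be obtained at run time from `H5get_libversion`, but how the check is actually wired into the library is not shown in this conversation.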

Note also that the HDF5 driver will work against any server that supports
byte-range. See the test case nc_test/test_byterange.sh.
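
For context, "supports byte-range" here is taken to mean that the server honors the standard HTTP `Range` request header. The sketch below shows how a read of `length` bytes at `offset` maps onto that header's value. The function name is hypothetical, and this is not code from either driver.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the Range header value "bytes=<first>-<last>"
   for a read of `length` bytes starting at `offset`.
   Returns 0 on success, -1 on an empty read, overflow, or a short buffer. */
int format_range_value(char *buf, size_t bufsize,
                       unsigned long long offset, unsigned long long length)
{
    if (buf == NULL || length == 0)
        return -1;
    if (length - 1 > ULLONG_MAX - offset)
        return -1; /* offset + length - 1 would overflow */

    /* HTTP byte ranges are inclusive at both ends. */
    unsigned long long last = offset + length - 1;

    int n = snprintf(buf, bufsize, "bytes=%llu-%llu", offset, last);
    if (n < 0 || (size_t)n >= bufsize)
        return -1; /* output was truncated */
    return 0;
}
```

A server that answers such a request with status 206 (Partial Content) and only the requested bytes is one that supports byte-range access in this sense.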

Misc. Other Changes:

  1. Moved all of nc4print code to ncdump to keep appveyor quiet.

Member

@WardF WardF left a comment

Looks good, thank you Dennis! I'm running the edge case tests and will merge this when they pass. Thanks!

@WardF WardF merged commit a96899a into Unidata:master Sep 24, 2020
@dopplershift
Member

So to be clear...that driver from HDF5 only works on S3, right? So we’ll still see the pathological behavior on e.g. THREDDS servers?

Also, I’m not sure you really want to rely on that HADCrut data that’s sitting in a file on S3. @rsignell-usgs can correct me if I’m wrong, but the URL even says testing so I’m not sure how long it will live there.

@DennisHeimbigner
Collaborator Author

Note also that the HDF5 driver will work against any server that supports
byte-range. See the test case nc_test/test_byterange.sh.

@DennisHeimbigner
Collaborator Author

re: HADCrut data that’s sitting in a file on S3.
Is there a good and more permanent file that is also netcdf-4?

@DennisHeimbigner DennisHeimbigner deleted the hdf5ros3.dmh branch September 25, 2020 19:06

Development

Successfully merging this pull request may close these issues.

Poor performance downloading using byte-range requests

3 participants