Allow users to specify the use of the DIRECT driver #2236

@hmaarrfk

This is a follow-up to #2177.

To achieve maximum performance when writing large data blocks, I had to do three things:

  1. Ensure the datasets were aligned in the user's RAM (out of scope for netcdf-c).
  2. Ensure the destination in the file was aligned to a block (see: Allow users to specify data alignment #2177).
  3. Use the DIRECT driver to bypass many of the caching mechanisms in the operating system.
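
For illustration, steps 1 and 2 both come down to rounding addresses and sizes up to a block boundary. The sketch below shows one way to do that on a POSIX system. The helper names (`round_up`, `alloc_aligned`, `is_aligned`) and the 4096-byte block size used in the usage note are assumptions made for this example; they are not part of netcdf-c or HDF5.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round n up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static size_t round_up(size_t n, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: allocate a zeroed buffer whose start address and
 * length are both multiples of alignment. The padded length is stored in
 * *padded when padded is not NULL. Returns NULL on failure; release the
 * buffer with free(). */
static void *alloc_aligned(size_t nbytes, size_t alignment, size_t *padded)
{
    void *buf = NULL;
    size_t total = round_up(nbytes, alignment);

    if (posix_memalign(&buf, alignment, total) != 0)
        return NULL;
    memset(buf, 0, total);
    if (padded != NULL)
        *padded = total;
    return buf;
}

/* Hypothetical helper: non-zero when p sits on an alignment boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

With a 4096-byte block, `alloc_aligned(1000, 4096, &padded)` returns a buffer that starts on a 4096-byte boundary and sets `padded` to 4096. The right block size depends on the device and filesystem, so treat 4096 as a placeholder.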

With #2177 close to being fixed (#2206), I'm hoping that we can allow users to select different virtual file drivers.

The documentation for the DIRECT driver fapl is:
https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_fapl_direct.htm

Without the DIRECT driver, on Linux, I'm limited to about 1 GB/s.
With the DIRECT driver, on Linux, I can reach read/write speeds of up to 3 GB/s, limited only by the PCIe interface and the SSD.

A secondary effect is that it bypasses the operating system file cache, so the operating system doesn't need to spend time evicting it (or other files) during large disk reads/writes.
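
As background, the cache bypass described above corresponds on Linux to the `O_DIRECT` flag of `open(2)`. The sketch below shows that OS-level mechanism in plain POSIX C; it is not HDF5's or netcdf-c's code. The function name `write_blocks_direct` and its parameters are hypothetical, and the fallback to buffered I/O is an assumption added so the example still runs on filesystems that reject `O_DIRECT`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write nblocks blocks of block_size bytes to path,
 * asking the kernel to bypass the page cache where possible.
 * block_size must be a power of two (and a multiple of sizeof(void *)).
 * Returns the number of bytes written, or -1 on error.
 * *used_direct is set to 1 if O_DIRECT stayed in effect, else 0. */
static long write_blocks_direct(const char *path, size_t block_size,
                                size_t nblocks, int *used_direct)
{
    void *buf = NULL;
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    int fd = -1;
    long total = 0;

    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    *used_direct = 0;

    /* Direct I/O needs a block-aligned user buffer. */
    if (posix_memalign(&buf, block_size, block_size) != 0)
        return -1;
    memset(buf, 0xAB, block_size);

#ifdef O_DIRECT
    fd = open(path, flags | O_DIRECT, 0644);
    if (fd >= 0)
        *used_direct = 1;
#endif
    /* Some filesystems reject O_DIRECT at open time: use buffered I/O. */
    if (fd < 0)
        fd = open(path, flags, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (size_t i = 0; i < nblocks; i++) {
        ssize_t n = write(fd, buf, block_size);
#ifdef O_DIRECT
        /* Others accept the flag but fail the write: clear it and retry. */
        if (n < 0 && errno == EINVAL && *used_direct) {
            int fl = fcntl(fd, F_GETFL);
            if (fl >= 0 && fcntl(fd, F_SETFL, fl & ~O_DIRECT) == 0) {
                *used_direct = 0;
                n = write(fd, buf, block_size);
            }
        }
#endif
        if (n != (ssize_t)block_size) { /* short writes count as errors */
            total = -1;
            break;
        }
        total += (long)n;
    }

    close(fd);
    free(buf);
    return total;
}
```

A call such as `write_blocks_direct("/tmp/direct_sketch.bin", 4096, 4, &used_direct)` writes four aligned blocks and reports through `used_direct` whether the page cache was actually bypassed. Throughput will depend on the block size, the filesystem, and the device.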

Thank you for considering this feature.

I presume that the change would have to add a code path similar to

        if (H5Pset_fapl_mpio(fapl_id, comm, info) < 0)
            BAIL(NC_EPARINIT);

in the files libhdf5/hdf5create.c and libhdf5/hdf5open.c.

Let me know how I can help get this through.

For reference, here is my PR to h5py that exposes the same feature: h5py/h5py#2041
