PIO_IOTYPE_NETCDF4P requires NC_NODIMSCALE_ATTACH option #501

@dqwu

With NetCDF 4.8.1 or later, some E3SM cases that use PIO_IOTYPE_NETCDF4P fail with HDF5 errors returned from nc_enddef() while creating a NetCDF4 file. The error message from a run on one of the ANL GCE nodes (ne4 F case, NetCDF 4.9.0) is shown below:

[4] PIO: FATAL ERROR: Aborting... FATAL ERROR: NetCDF: Problem with HDF5 dimscales. (file = F2010_ne4_oQU240_netcdf4p.eam.h0.0001-01-01-00000.nc) (/scratch/wuda/E3SM/externals/scorpio/src/clib/pioc_support.c: 4341)

The two NetCDF issues below have simple NETCDF4 test programs to reproduce the HDF5 errors:
Unidata/netcdf-c#2165
Unidata/netcdf-c#2251

The error is caused by failing H5DSattach_scale() calls inside the NetCDF library. It is returned from netcdf-c/libhdf5/nc4hdf.c, where the high-level DS API H5DSattach_scale is called repeatedly inside a loop:

 if (H5DSattach_scale(hdf5_var->hdf_datasetid, dsid, d) < 0) 
     return NC_EHDFERR; 

According to HDF5 developers, HDF5 does not test any of the HL DS APIs, such as H5DSattach_scale, in a parallel setting; these APIs are intended to be called by a single process (one process that creates or opens the file and calls the API).

In some cases, with enough iterations of the loop above, HDF5 might get out of step between the ranks, which causes the error (see Unidata/netcdf-c#1822).

Workaround:

NetCDF 4.9.0 introduced the NC_NODIMSCALE_ATTACH flag, passed when creating a file, which makes attaching dimscales to variables optional (see Unidata/netcdf-c#2161).

As a workaround, we can apply this new NetCDF option when creating files using PIO_IOTYPE_NETCDF4P to avoid calling H5DSattach_scale.
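
A minimal sketch of how the creation mode could be assembled for this workaround. It is not the actual SCORPIO change: the helper name netcdf4p_create_mode and the HAVE_NETCDF_H build macro are hypothetical, and the stand-in macro values exist only so the sketch compiles without NetCDF installed (real builds must take both macros from netcdf.h). The #ifdef guard keeps the code building against NetCDF versions older than 4.9.0, where the flag does not exist.

```c
#ifdef HAVE_NETCDF_H /* hypothetical build macro used only by this sketch */
#include <netcdf.h>
#else
/* Stand-ins so the sketch compiles without NetCDF installed. The values are
 * illustrative only; real builds must take both macros from <netcdf.h>. */
#define NC_NETCDF4 0x1000
#define NC_NODIMSCALE_ATTACH 0x10000
#endif

/* Hypothetical helper: returns the creation mode to use for a file written
 * with the parallel NetCDF4 iotype, starting from the caller's base mode. */
static int netcdf4p_create_mode(int base_mode)
{
    int mode = base_mode | NC_NETCDF4;
#ifdef NC_NODIMSCALE_ATTACH
    /* Flag available (NetCDF 4.9.0 or later): request that dimension scales
     * are not attached to variables, so the H5DSattach_scale() loop is
     * skipped when the file definition is finalized. */
    mode |= NC_NODIMSCALE_ATTACH;
#endif
    return mode;
}
```

In a real parallel build, the returned mode would be passed as the cmode argument of nc_create_par(path, cmode, comm, info, &ncid).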
