Allow users to apply zlib compression on parallel writes. NOAA needs this feature urgently, and other large data producers no doubt want it too.
Large data producers like NOAA are turning to netCDF-4 for its built-in compression. In HPC applications especially, they write very large files (tens of GB per file is normal, even with compression).
Since these files are produced on an HPC system, the data start out spread across many processors. Because the netcdf-c library currently cannot write compressed data in parallel, they must move all data to one processor and have that one processor do all compression and writing sequentially. This slows down their very expensive machine!
More importantly, it means that compromises must be made in what is saved. Instead of saving every value that the science team wants, only a subset of the data is saved. This makes the science harder but keeps operations workable.
Allowing users to write compressed data with parallel I/O will save them the step of moving all data to one processor, and will also spread the compression across many processors instead of just one. This will improve their write time by almost an order of magnitude.
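To make the "spread across many processors" idea concrete, here is a minimal sketch of how each processor could work out which rows of a variable it owns, so that it can compress and write only that slab. The helper name `slab_for_rank` and the even row-wise split are assumptions for illustration; they are not part of netcdf-c, and a real model would use its own domain decomposition.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of netcdf-c): split `total` rows of a
 * variable across `nprocs` processors as evenly as possible.
 * On return, *start and *count describe the rows owned by `rank`,
 * i.e. the start[0]/count[0] values a rank would pass to a hyperslab
 * write such as nc_put_vara_float().
 */
static void slab_for_rank(size_t total, int nprocs, int rank,
                          size_t *start, size_t *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    size_t base  = total / (size_t)nprocs;  /* rows every rank gets */
    size_t extra = total % (size_t)nprocs;  /* leftover rows to hand out */
    size_t r     = (size_t)rank;

    /* The first `extra` ranks take one additional row each. */
    *count = base + (r < extra ? 1 : 0);
    *start = r * base + (r < extra ? r : extra);
}
```

With 10 rows and 4 processors, ranks 0 and 1 each own 3 rows and ranks 2 and 3 each own 2 rows, so no data has to be gathered onto a single processor before writing.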
A decrease in I/O time leaves more time for computation, allowing the science team to increase resolution or use more intensive algorithms while still meeting operational requirements.
Thus, improving I/O times directly benefits the science.