NetCDF needs better and faster compression, especially for HPC applications #1545

@edwardhartnett

Description

This issue is for general discussion about improving the compression capability of netCDF. I will address specific ideas in their own issues.

Compression is a key feature of netCDF-4. Whenever I ask why netCDF-4 was selected, the answer is always that it's netCDF, with built-in compression. That is, the API is already understood and all the read programs keep working, and it's easy to turn on compression and suddenly save half your disk space. (The NOAA GFS compresses a 33 GB file down to 6 GB. Without compression, storing and moving that file would be much more expensive.)

However, we currently offer only zlib compression, which is neither the best nor the most commonly requested option. The filter work of @DennisHeimbigner offers the chance to expose other HDF5 compression filters.

And what about the classic formats? Is there no way to turn on compression for them? As a simple case, the entire file could be piped through gzip on write and through gunzip on read. That's easy to do, but there are much better approaches.
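To make the simple case concrete, here is a minimal sketch of wholesale compression of a classic-format file, assuming the gzip/gunzip utilities are on PATH. The filename "data.nc" and the helper names are illustrative, not part of any netCDF API; a real implementation would stream through zlib inside the library rather than shelling out.

```c
/* Sketch: compress/decompress a whole classic-format file with gzip.
 * Assumes the gzip and gunzip utilities are installed. */
#include <stdio.h>
#include <stdlib.h>

/* Compress the file wholesale; -k keeps the original, -f overwrites. */
static int compress_whole_file(const char *path)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd, "gzip -kf %s", path);
    return system(cmd); /* 0 on success */
}

/* Decompress so that existing readers see an ordinary classic file. */
static int decompress_whole_file(const char *gzpath)
{
    char cmd[512];
    snprintf(cmd, sizeof cmd, "gunzip -kf %s", gzpath);
    return system(cmd); /* 0 on success */
}
```

The obvious drawback, and why better approaches exist, is that the whole file must be decompressed before any subset of the data can be read.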

When and if we add more compression options, we must of course continue to honor the existing API and the code that has been written against it. Currently we do:

nc_def_var(...)
nc_def_var_deflate(...)

So we can imagine other compression methods getting their own nc_def_var_XXXX functions, which would be called after the variable is defined and before nc_enddef().
