
Add support for BitRound quantization by number-of-significant-bits? #2225

@czender

netCDF-C needs (IMHO) one more lossy codec: BitRound. BitRound takes the number of significant bits (NSB), not digits (NSD), as input. It quantizes each value with IEEE rounding so that only NSB mantissa bits, aka the "keep bits", are retained, and the remaining trailing bits become zero. BitRound is already used as the final step in GranularBR, so the heart of the BitRound algorithm is already in netCDF-dev. However, I think we need to make BitRound separately invocable so users can directly specify NSB (not NSD). BitRound will use the same NSB for all values of a given variable, unlike GBR, in which each value of a given variable can have a different (internally decided) NSB. I added a separately invocable BitRound to NCO and to CCR last month, in case anyone wants to try it.
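To make the keep-bits idea concrete, here is a minimal C sketch of a BitRound step for 32-bit floats. It is an illustration only, not the netCDF-C, NCO, or CCR implementation. The function name `bitround_f32` is hypothetical. The sketch assumes that NSB counts explicit mantissa bits (0 to 23) and that the discarded bits are rounded to nearest with ties to even, the IEEE default rounding mode.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the netCDF-C/NCO/CCR API): round a float so that
 * only `nsb` explicit mantissa bits (the "keep bits") survive, using
 * round-to-nearest, ties-to-even, on the discarded bits. The discarded
 * trailing mantissa bits end up zero. Assumes NSB counts explicit bits. */
static float bitround_f32(float x, int nsb)
{
    assert(nsb >= 0);                        /* keep-bits must be non-negative */
    if (nsb >= 23 || isnan(x) || isinf(x))
        return x;                            /* nothing to discard, or special value */

    uint32_t u;
    memcpy(&u, &x, sizeof u);                /* reinterpret the bits safely */

    int drop = 23 - nsb;                     /* trailing mantissa bits to zero */
    uint32_t half_ulp = (uint32_t)1 << (drop - 1);      /* half of the new ulp */
    uint32_t keep_mask = ~(((uint32_t)1 << drop) - 1u); /* clears dropped bits */
    uint32_t lsb_kept = (u >> drop) & 1u;    /* last kept bit decides ties */

    /* Adding half_ulp - 1 + lsb_kept rounds to nearest and breaks exact ties
     * toward an even last kept bit. A carry may propagate into the exponent,
     * which gives the correctly rounded result (e.g. 1.75 -> 2.0 at NSB = 1). */
    u += half_ulp - 1u + lsb_kept;
    u &= keep_mask;

    memcpy(&x, &u, sizeof x);
    return x;
}
```

For example, with one keep bit, `bitround_f32(1.75f, 1)` returns `2.0f` and `bitround_f32(1.25f, 1)` returns `1.0f` (both are ties, resolved toward an even last bit). A double-precision variant would follow the same pattern with `uint64_t` and 52 explicit mantissa bits.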

The difference in compression ratio (CR) between losslessly compressing (e.g., with Zstd) a uniformly quantized/rounded variable (e.g., with BitRound or BitGroom) and losslessly compressing a variable quantized/rounded value-by-value (e.g., with Granular BitRound) can be significant (~5%, though this needs testing). This is because lossless compressors can more easily recognize and compress the trailing zero bits in the mantissa when the number of those zero bits does not change from value to value. Setting NSB may appeal more to computer-science types than to domain researchers, who may be more comfortable setting NSD than NSB. BitRound is also used as the final (and easiest) step in the method of Klöwer et al. (2021), https://doi.org/10.1038/s43588-021-00156-2
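One way to inspect the trailing-zero property discussed above is to count the zero bits at the low end of each value's mantissa. The sketch below is a hypothetical diagnostic for 32-bit floats. The names `mantissa_trailing_zeros` and `trailing_zero_range` are illustrative and not part of netCDF-C or NCO. It only measures bit patterns and says nothing about actual compression ratios.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical diagnostic: number of trailing zero bits in the 23-bit
 * explicit mantissa of a float (23 when the mantissa is all zero). */
static int mantissa_trailing_zeros(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t m = u & 0x007FFFFFu;            /* explicit mantissa bits only */
    if (m == 0)
        return 23;
    int tz = 0;
    while ((m & 1u) == 0) {                  /* count from the least significant bit */
        m >>= 1;
        tz++;
    }
    return tz;
}

/* Smallest and largest trailing-zero counts across an array (23 and 0 if
 * n == 0). With one NSB for the whole variable, every value has at least
 * 23 - NSB trailing zeros, so min_tz >= 23 - NSB; with a per-value NSB the
 * guaranteed number of zero bits differs from value to value. */
static void trailing_zero_range(const float *v, size_t n, int *min_tz, int *max_tz)
{
    assert(v != NULL || n == 0);
    *min_tz = 23;
    *max_tz = 0;
    for (size_t i = 0; i < n; i++) {
        int tz = mantissa_trailing_zeros(v[i]);
        if (tz < *min_tz) *min_tz = tz;
        if (tz > *max_tz) *max_tz = tz;
    }
}
```

For instance, `mantissa_trailing_zeros(1.0f)` returns 23, `mantissa_trailing_zeros(1.5f)` returns 22, and `mantissa_trailing_zeros(1.25f)` returns 21.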

This issue is a place to discuss any related feedback. I have started to draft a PR and would appreciate it if @DennisHeimbigner could opine on the general idea, and also on one specific question: if BitRound were to go into netCDF-C, should the PR reuse the structure member var->nsd to also hold the NSB, add a new member var->nsb, rename the structure member, or do something else?

FWIW, I would recommend everyone use either GranularBR or BR, depending on their tastes. BitGroom just doesn't cut the mustard against these two, though it has been a useful starting point with a nice peer-reviewed journal reference. I'm also willing to craft this PR to replace BG with BR, thereby reducing the number of quantization codecs from 3 to 2. Feedback welcome!
