As discussed here Unidata/netcdf4-python#654 (comment), there is a need for conventions to specify the encoding of strings and character arrays in netCDF.
There is also a need to specify whether `char` arrays in NetCDF3 hold strings or arrays of individual characters.
@BobSimons addressed these issues in an enhancement to CF conventions that would specify `charset` for NetCDF3 and `_Encoding` for NetCDF4. The Unidata gang (@DennisHeimbigner, @WardF, @ethanrd and @cwardgar) agreed with the concept but suggested that it be handled in the NUG. We came up with this slightly different proposal, which would still accomplish Bob's goal of making it easy for software to figure out what is stuffed in those `char` or `string` arrays!
Proposal:
- Use the `_CharType` variable attribute, with allowed values `['STRING', 'CHAR_ARRAY']`, to specify whether a `char` array variable should be interpreted as a string or as an array of individual characters. If `_CharType` is missing, the default is `'STRING'`.
- Use the `_Encoding` variable attribute, with allowed values `['ISO-8859-1', 'ISO-8859-15', 'UTF-8']`, to specify the encoding. If `_Encoding` is missing for `_CharType='STRING'`, the default is `'UTF-8'`. If `_Encoding` is missing for `_CharType='CHAR_ARRAY'`, the default is `'ISO-8859-15'`.
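
To make the defaulting rules concrete, here is a rough sketch of how reading software might apply them. It is plain Python and does not use any netCDF library; the function names `resolve_char_attrs` and `decode_char_row`, the `attrs` dict standing in for a variable's attributes, the stripping of trailing NUL padding, and the byte-by-byte decoding for `CHAR_ARRAY` are all assumptions made for illustration, not part of the proposal.

```python
ALLOWED_CHAR_TYPES = ("STRING", "CHAR_ARRAY")
ALLOWED_ENCODINGS = ("ISO-8859-1", "ISO-8859-15", "UTF-8")


def resolve_char_attrs(attrs):
    """Return (char_type, encoding) for a variable, applying the proposed defaults.

    `attrs` is a hypothetical dict of the variable's attributes.
    """
    char_type = attrs.get("_CharType", "STRING")
    if char_type not in ALLOWED_CHAR_TYPES:
        raise ValueError(f"unsupported _CharType: {char_type!r}")

    # The default encoding depends on _CharType, as stated in the proposal.
    default_encoding = "UTF-8" if char_type == "STRING" else "ISO-8859-15"
    encoding = attrs.get("_Encoding", default_encoding)
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported _Encoding: {encoding!r}")
    return char_type, encoding


def decode_char_row(raw, attrs):
    """Decode one row of a char array (given as bytes) according to its attributes."""
    char_type, encoding = resolve_char_attrs(attrs)
    if char_type == "STRING":
        # Assumption: strings are NUL-padded to the row length, so strip the padding.
        return raw.rstrip(b"\x00").decode(encoding)
    # Assumption: for CHAR_ARRAY each element is one byte holding one character.
    # This works for the single-byte ISO encodings; with UTF-8 it only works
    # for ASCII bytes and raises UnicodeDecodeError otherwise.
    return [bytes([b]).decode(encoding) for b in raw]
```

With no attributes set, `decode_char_row(b"caf\xc3\xa9\x00\x00", {})` treats the row as a UTF-8 string, while passing `{"_CharType": "CHAR_ARRAY"}` yields a list of single characters decoded as ISO-8859-15.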