@@ -2,13 +2,12 @@ NetCDF-4 Filter Support
22============================
33<!-- double header is needed to workaround doxygen bug -->
44
5- NetCDF-4 Filter Support {#compress }
6- =================================
5+ NetCDF-4 Filter Support {#filters }
6+ ============================
77
88[ TOC]
99
10- Introduction {#compress_intro}
11- ==================
10+ # Introduction {#filters_intro}
1211
1312The HDF5 library (1.8.11 and later)
1413supports a general filter mechanism to apply various
@@ -37,8 +36,7 @@ can locate, load, and utilize the compressor.
3736These libraries are expected to be installed in a specific
3837directory.
3938
40- Enabling A Compression Filter {#Enable}
41- =============================
39+ # Enabling A Compression Filter {#filters_enable}
4240
4341In order to compress a variable, the netcdf-c library
4442must be given three pieces of information:
@@ -66,8 +64,7 @@ using __ncgen__, via an API call, or via command line parameters to __nccopy__.
6664In any case, remember that filtering also requires setting chunking, so the
6765variable must also be marked with chunking information.
6866
69- Using The API {#API}
70- -------------
67+ ## Using The API {#filters_API}
7168The necessary API methods are included in __ netcdf.h__ by default.
7269One API method is for setting the filter to be used
7370when writing a variable. The relevant signature is
@@ -90,8 +87,7 @@ __params__. As is usual with the netcdf API, one is expected to call
9087this function twice: the first time to get __nparams__ and the
9188second to get the parameters in client-allocated memory.
9289
93- Using ncgen {#NCGEN}
94- -------------
90+ ## Using ncgen {#filters_NCGEN}
9591
9692In a CDL file, compression of a variable can be specified
9793by annotating it with the following attribute:
@@ -107,8 +103,8 @@ This is a "special" attribute, which means that
107103it will normally be invisible when using
108104__ ncdump__ unless the -s flag is specified.
109105
110- Example CDL File (Data elided)
111- ------------------------------
106+ ### Example CDL File (Data elided)
107+
112108````
113109netcdf bzip2 {
114110dimensions:
@@ -123,8 +119,8 @@ data:
123119}
124120````
125121
126- Using nccopy {#NCCOPY }
127- -------------
122+ ## Using nccopy {#filters_NCCOPY }
123+
128124When copying a netcdf file using __ nccopy__ it is possible
129125to specify filter information for any output variable by
130126using the "-F" option on the command line; for example:
@@ -171,8 +167,7 @@ by this table.
171167<tr ><td >false<td >-Fvar,...<td >NA<td >use output filter
172168</table >
173169
174- Parameter Encoding {#ParamEncode}
175- ==========
170+ # Parameter Encode/Decode {#filters_paramcoding}
176171
177172The parameters passed to a filter are encoded internally as a vector
178173of 32-bit unsigned integers. It may be that the parameters
@@ -198,57 +193,83 @@ them between the local machine byte order and network byte
198193order.
199194
200195Parameters whose size is larger than 32-bits present a byte order problem.
201- This typically includes double precision floats and (signed or unsigned)
202- 64-bit integers. For these cases, the machine byte order must be
203- handled by the compression code. This is because HDF5 will treat,
196+ This specifically includes double precision floats and (signed or unsigned)
197+ 64-bit integers. For these cases, the machine byte order issue must be
198+ handled, in part, by the compression code. This is because HDF5 will treat,
204199for example, an unsigned long long as two 32-bit unsigned integers
205200and will convert each to network order separately. This means that
206201on a machine whose byte order is different than the machine in which
207- the parameters were initially created, the two integers are out of order
208- and must be swapped to get the correct unsigned long long value.
209- Consider this example. Suppose we have this little endian unsigned long long.
210-
211- 1000000230000004
212-
213- In network byte order, it will be stored as two 32-bit integers.
214-
215- 20000001 40000003
216-
217- On a big endian machine, this will be given to the filter in that form.
218-
219- 2000000140000003
220-
221- But note that the proper big endian unsigned long long form is this.
222-
223- 4000000320000001
224-
225- So, the two words need to be swapped.
226-
227- But consider the case when both original and final machines are big endian.
228-
229- 1 . 4000000320000001
230- 2 . 40000003 20000001
231- 3 . 40000003 20000001
232-
233- where #1 is the original number, #2 is the network order and
234- #3 is the what is given to the filter. In this case we do not
235- want to swap words.
236-
237- The solution is to forcibly encode the original number using some
238- specified endianness so that the filter always assumes it is getting
239- its parameters in that order and will always do swapping as needed.
240- This is irritating, but one needs to be aware of it. Since most
241- machines are little-endian. We choose to use that as the endianness
242- for handling 64 bit entities.
243-
244- Filter Specification Syntax {#Syntax}
245- ==========
202+ the parameters were initially created, the two integers will be separately
203+ endian converted. But this will be incorrect for 64-bit values.
204+
205+ So, we have this situation:
206+
207+ 1. The 8 bytes come in as native machine order for the machine
208+    doing the call to *nc_def_var_filter*.
209+ 2. HDF5 divides the 8 bytes into 2 four byte pieces and ensures that each piece
210+    is in network (big) endian order.
211+ 3. When the filter is called, the two pieces are returned in the same order
212+    but with the bytes in each piece consistent with the native machine order
213+    for the machine executing the filter.
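The three steps above can be modeled without HDF5 at all. The following is a minimal sketch, not HDF5 code: the names `word_from_mem`, `word_to_mem`, and `sim_transfer8` are hypothetical, and the model assumes only what the list states, namely that each 4-byte piece keeps its numeric value and its position while its bytes are laid out in the reader's native order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes of memory as a 32-bit value on a machine with the given
   byte order (big_endian != 0 means big endian). */
uint32_t word_from_mem(const unsigned char *m, int big_endian)
{
    if (big_endian)
        return ((uint32_t)m[0] << 24) | ((uint32_t)m[1] << 16)
             | ((uint32_t)m[2] << 8) | (uint32_t)m[3];
    return ((uint32_t)m[3] << 24) | ((uint32_t)m[2] << 16)
         | ((uint32_t)m[1] << 8) | (uint32_t)m[0];
}

/* Write a 32-bit value into 4 bytes of memory in the given byte order. */
void word_to_mem(uint32_t w, unsigned char *m, int big_endian)
{
    int i;
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        m[i] = (unsigned char)((w >> shift) & 0xFF);
    }
}

/* Hypothetical model of steps 1-3: the writer's 8 bytes are treated as two
   32-bit values, and each value reappears, in the same position, in the
   reader's native byte order. */
void sim_transfer8(const unsigned char in[8], int writer_be,
                   unsigned char out[8], int reader_be)
{
    uint32_t w0, w1;
    assert(in != NULL && out != NULL);
    w0 = word_from_mem(in, writer_be);
    w1 = word_from_mem(in + 4, writer_be);
    word_to_mem(w0, out, reader_be);
    word_to_mem(w1, out + 4, reader_be);
}
```

With a little endian writer holding 0x0102030405060708 (memory bytes 08 07 06 05 04 03 02 01), a big endian reader in this model sees 05 06 07 08 01 02 03 04. Each 4-byte piece is correct on its own, but the two pieces are in the wrong order for a big endian 64-bit value, which is the problem the encoding rules address.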
214+
215+ ## Encoding Algorithms
216+
217+ In order to properly extract the correct 8-byte value, we need to ensure
218+ that the values stored in the HDF5 file have a known format independent of
219+ the native format of the creating machine.
220+
221+ The idea is to do sufficient manipulation so that HDF5
222+ will store the 8-byte value as a little endian value
223+ divided into two 4-byte integers.
224+ Note that little-endian is used as the standard
225+ because it is the most common machine format.
226+ When read, the filter code needs to be aware of this convention
227+ and do the appropriate conversions.
228+
229+ This leads to the following set of rules.
230+
231+ ### Encoding
232+
233+ 1. Encode on a little endian (LE) machine: no special action is required.
234+    The 8-byte value is passed to HDF5 as two 4-byte integers. HDF5 byte
235+    swaps each integer and stores it in the file.
236+ 2. Encode on a big endian (BE) machine: several steps are required:
237+
238+    1. Do an 8-byte byte swap to convert the original value to little-endian
239+       format.
240+    2. Since the encoding machine is BE, HDF5 will just store the value.
241+       So it is necessary to simulate little endian encoding by byte-swapping
242+       each 4-byte integer separately.
243+    3. This doubly swapped pair of integers is then passed to HDF5 and is stored
244+       unchanged.
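The encoding rules can be sketched as follows. This is an illustration only, not the library's implementation: `swap_bytes` and `sketch_encode8` are hypothetical names, and the byte order of the encoding machine is passed in as a parameter (an assumption made so the sketch behaves the same on any host).

```c
#include <assert.h>
#include <stddef.h>

/* Reverse n bytes in place. */
void swap_bytes(unsigned char *mem, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char c = mem[i];
        mem[i] = mem[n - 1 - i];
        mem[n - 1 - i] = c;
    }
}

/* Hypothetical helper: apply the encoding rules to an 8-byte value held in
   the encoding machine's native order. big_endian != 0 describes that
   machine; it is a parameter here purely for illustration. */
void sketch_encode8(unsigned char *mem8, int big_endian)
{
    assert(mem8 != NULL);
    if (!big_endian)
        return;                  /* rule 1: LE needs no special action */
    swap_bytes(mem8, 8);         /* rule 2.1: 8-byte swap to LE format */
    swap_bytes(mem8, 4);         /* rule 2.2: swap each 4-byte integer */
    swap_bytes(mem8 + 4, 4);
    /* rule 2.3: mem8 is now ready to be passed to HDF5 */
}
```

On a big endian machine holding 0x0102030405060708 (bytes 01 02 03 04 05 06 07 08), the result is 05 06 07 08 01 02 03 04, so the first 4-byte integer carries the low half of the value and the second carries the high half.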
245+
246+ ### Decoding
247+
248+ 1. Decode on an LE machine: no special action is required.
249+    HDF5 will get the two 4-byte values from the file and byte-swap each
250+    separately. The concatenation of those two integers will be the expected
251+    LE value.
252+ 2. Decode on a big endian (BE) machine: the inverse of the encode case must
253+    be implemented.
254+
255+    1. HDF5 sends the two 4-byte values to the filter.
256+    2. The filter must then byte-swap each 4-byte value independently.
257+    3. The filter then must concatenate the two 4-byte values into a single
258+       8-byte value. Because of the encoding rules, this 8-byte value will
259+       be in LE format.
260+    4. The filter must finally do an 8-byte byte-swap on that 8-byte value
261+       to convert it to the desired BE format.
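The decoding rules can be sketched the same way. Again this is only an illustration: `reverse_in_place` and `sketch_decode8` are hypothetical names, the byte order of the decoding machine is an explicit parameter, and the two 4-byte values are assumed to sit next to each other in memory so that step 3 (concatenation) needs no extra work.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse n bytes in place. */
void reverse_in_place(unsigned char *mem, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char c = mem[i];
        mem[i] = mem[n - 1 - i];
        mem[n - 1 - i] = c;
    }
}

/* Hypothetical helper: apply the decoding rules to the two adjacent 4-byte
   values handed to a filter. big_endian != 0 describes the machine that is
   executing the filter. */
void sketch_decode8(unsigned char *mem8, int big_endian)
{
    assert(mem8 != NULL);
    if (!big_endian)
        return;                      /* rule 1: LE needs no special action */
    reverse_in_place(mem8, 4);       /* step 2: swap each 4-byte value */
    reverse_in_place(mem8 + 4, 4);
    /* step 3: the 8 adjacent bytes now hold the value in LE format */
    reverse_in_place(mem8, 8);       /* step 4: 8-byte swap to BE format */
}
```

For example, a filter on a big endian machine that receives the bytes 05 06 07 08 01 02 03 04 ends up with 01 02 03 04 05 06 07 08, which is 0x0102030405060708 in big endian form.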
262+
263+ To support these rules, some utility functions exist and are discussed in
264+ <a href="#filters_AppendixA">Appendix A</a>.
265+
266+ # Filter Specification Syntax {#filters_syntax}
246267
247268Both of the utilities
248269<a href="#filters_NCGEN">__ncgen__</a>
249270and
250271<a href="#filters_NCCOPY">__nccopy__</a>
251- allow the specification of filter parameters.
272+ allow the specification of filter parameters in text format.
252273These specifications consist of a sequence of comma
253274separated constants. The constants are converted
254275within the utility to a proper set of unsigned int
@@ -257,7 +278,7 @@ constants (see the <a href="#ParamEncode">parameter encoding section</a>).
257278To simplify things, various kinds of constants can be specified
258279rather than just simple unsigned integers. The utilities will encode
259280them properly using the rules specified in
260- the <a href =" #ParamEncode " >parameter encoding section </a >.
281+ the section on <a href =" #filters_paramcoding " >parameter encode/decode </a >.
261282
262283The currently supported constants are as follows.
263284<table >
@@ -270,9 +291,9 @@ The currently supported constants are as follows.
270291<tr ><td >77<td >implicit unsigned 32-bit integer<td >No tag<td >
271292<tr ><td >93U<td >explicit unsigned 32-bit integer<td >u|U<td >
272293<tr ><td >789f<td >32-bit float<td >f|F<td >
273- <tr ><td >12345678.12345678d<td >64-bit double<td >d|D<td >Network byte order
274- <tr ><td >-9223372036854775807L<td >64-bit signed long long<td >l|L<td >Network byte order
275- <tr ><td >18446744073709551615UL<td >64-bit unsigned long long<td >u|U l|L<td >Network byte order
294+ <tr ><td >12345678.12345678d<td >64-bit double<td >d|D<td >LE encoding
295+ <tr ><td >-9223372036854775807L<td >64-bit signed long long<td >l|L<td >LE encoding
296+ <tr ><td >18446744073709551615UL<td >64-bit unsigned long long<td >u|U l|L<td >LE encoding
276297</table >
277298Some things to note.
278299
@@ -283,18 +304,18 @@ Some things to note.
2833042 . For signed byte and short, the value is sign extended to 32 bits
284305 and then treated as an unsigned int value.
2853063 . For double, and signed|unsigned long long, they are converted
286- to network byte order and then treated as two unsigned int values.
287- This is consistent with the <a href =" #ParamEncode " >parameter encoding </a >.
307+ as specified in the section on
308+ <a href =" #filters_paramcoding " >parameter encode/decode </a >.
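As a rough illustration of how the narrower constants end up as 32-bit unsigned parameters, the sketch below shows sign extension for a signed short and a bit-for-bit copy for a 32-bit float. The function names are hypothetical, and the float handling is an assumption (the raw IEEE 754 bit pattern is copied); the sketch also assumes that `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed byte or short: sign extend to 32 bits, then treat the result
   as an unsigned int value. */
unsigned int param_from_short(short v)
{
    return (unsigned int)(int32_t)v;
}

/* 32-bit float: ASSUMPTION, the raw bit pattern is copied unchanged
   into the 32-bit parameter slot. */
unsigned int param_from_float(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return (unsigned int)bits;
}
```

Under these assumptions a short with value -17 becomes the parameter 0xFFFFFFEF, and the float constant 789f becomes 0x44454000.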
288309
289- Dynamic Loading Process {#Process }
310+ Dynamic Loading Process {#filters_Process }
290311==========
291312
292313The documentation [1,2] for HDF5 dynamic loading was (at the time
293314this was written) out-of-date with respect to the actual HDF5 code
294315(see H5PL.c). So, the following discussion is largely derived
295316from looking at the actual code. This means that it is subject to change.
296317
297- Plugin directory {#Plugindir }
318+ Plugin directory {#filters_Plugindir }
298319----------------
299320
300321The HDF5 loader expects plugins to be in a specified plugin directory.
@@ -306,7 +327,7 @@ The default directory is:
306327The default may be overridden using the environment variable
307328__ HDF5_PLUGIN_PATH__ .
308329
309- Plugin Library Naming {#Pluginlib }
330+ Plugin Library Naming {#filters_Pluginlib }
310331---------------------
311332
312333Given a plugin directory, HDF5 examines every file in that
@@ -320,7 +341,7 @@ as determined by the platform on which the library is being executed.
320341<tr halign =" left " ><td >Windows<td >*<td >.dll
321342</table >
322343
323- Plugin Verification {#Pluginverify }
344+ Plugin Verification {#filters_Pluginverify }
324345-------------------
325346For each dynamic library located using the previous patterns,
326347HDF5 attempts to load the library and attempts to obtain information
@@ -340,7 +361,7 @@ specified for the variable in __nc_def_var_filter__ in order to be used.
340361If plugin verification fails, then that plugin is ignored and
341362the search continues for another, matching plugin.
342363
343- Debugging {#Debug }
364+ Debugging {#filters_Debug }
344365-------
345366Debugging plugins can be very difficult. You will probably
346367need to use the old printf approach for debugging the filter itself.
@@ -356,7 +377,7 @@ Since ncdump is not being asked to access the data (the -h flag), it
356377can obtain the filter information without failures. Then it can print
357378out the filter id and the parameters (the -s flag).
358379
359- Test Case {#TestCase }
380+ Test Case {#filters_TestCase }
360381-------
361382Within the netcdf-c source tree, the directory
362383__ netcdf-c/nc_test4__ contains a test case (__ test_filter.c__ ) for
@@ -365,7 +386,7 @@ bzip2. Another test (__test_filter_misc.c__) validates
365386parameter passing. These tests are disabled if __ --enable-shared__
366387is not set or if __ --enable-netcdf-4__ is not set.
367388
368- Example {#Example }
389+ Example {#filters_Example }
369390-------
370391A slightly simplified version of the filter test case is also
371392available as an example within the netcdf-c source tree
@@ -444,54 +465,43 @@ has been known to work.
444465gcc -g -O0 -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
445466````
446467
447- Appendix A. Byte Swap Code {#AppendixA }
468+ Appendix A. Support Utilities {#filters_AppendixA }
448469==========
449- Since in some cases, it is necessary for a filter to
450- byte swap from little-endian to big-endian, This appendix
451- provides sample code for doing this. It also provides
452- a code snippet for testing if the machine the
453- endianness of a machine.
454470
455- Byte swap an 8-byte chunk of memory
456- -------
457- ````
458- static void
459- byteswap8(unsigned char* mem)
460- {
461- register unsigned char c;
462- c = mem[0];
463- mem[0] = mem[7];
464- mem[7] = c;
465- c = mem[1];
466- mem[1] = mem[6];
467- mem[6] = c;
468- c = mem[2];
469- mem[2] = mem[5];
470- mem[5] = c;
471- c = mem[3];
472- mem[3] = mem[4];
473- mem[4] = c;
474- }
475-
476- ````
477-
478- Test for Machine Endianness
479- -------
480- ````
481- static const unsigned char b[4] = {0x0,0x0,0x0,0x1}; /* value 1 in big-endian*/
482- int endianness = (1 == *(unsigned int*)b); /* 1=>big 0=>little endian
483- ````
484- References {#References}
485- ========================
471+ Two functions are exported from the netcdf-c library
472+ for use by client programs and by filter implementations.
473+
474+ 1. ````int NC_parsefilterspec(const char* spec, unsigned int* idp, size_t* nparamsp, unsigned int** paramsp);````
475+    * idp will contain the filter id value from the spec.
476+    * nparamsp will contain the number of 4-byte parameters.
477+    * paramsp will contain a pointer to the parsed parameters; the caller
478+      must free it.
479+    This function parses its first argument, a filter spec string as defined in
480+    the section on <a href="#filters_syntax">Filter Specification Syntax</a>,
481+    and returns the values above through the pointer arguments.
482+
483+ 2. ````int NC_filterfix8(unsigned char* mem8, int decode);````
484+    * mem8 is a pointer to the 8-byte value to fix.
485+    * decode is 1 if the function should apply the 8-byte decoding algorithm;
486+      otherwise it applies the encoding algorithm.
487+ This function implements the 8-byte conversion algorithms.
488+ Before calling * nc_def_var_filter* (unless * NC_parsefilterspec* was used),
489+ the client must call this function with the decode argument set to 0.
490+ Inside the filter code, this function should be called with the decode
491+ argument set to 1.
492+
493+ Examples of the use of these functions can be seen in the test program
494+ * nc_test4/tst_filterparser.c* .
495+
496+ # References {#filters_References}
486497
4874981 . https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
4884992 . https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf
4895003 . https://portal.hdfgroup.org/display/support/Contributions#Contributions-filters
4905014 . https://support.hdfgroup.org/services/contributions.html#filters
4915025 . https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html
492503
493- Point of Contact
494- ================
504+ # Point of Contact
495505
496506__ Author__ : Dennis Heimbigner<br >
497507__ Email__ : dmh at ucar dot edu