Commit 56f9489

Merge pull request #1302 from Unidata/bigend.dmh
Fix errors when building on big-endian machine
2 parents a260bbb + 8d59f4e commit 56f9489

19 files changed

Lines changed: 1207 additions & 609 deletions

docs/filters.md

Lines changed: 121 additions & 111 deletions
@@ -2,13 +2,12 @@ NetCDF-4 Filter Support
22
============================
33
<!-- double header is needed to workaround doxygen bug -->
44

5-
NetCDF-4 Filter Support {#compress}
6-
=================================
5+
NetCDF-4 Filter Support {#filters}
6+
============================
77

88
[TOC]
99

10-
Introduction {#compress_intro}
11-
==================
10+
# Introduction {#filters_intro}
1211

1312
The HDF5 library (1.8.11 and later)
1413
supports a general filter mechanism to apply various
@@ -37,8 +36,7 @@ can locate, load, and utilize the compressor.
3736
These libraries are expected to be installed in a specific
3837
directory.
3938

40-
Enabling A Compression Filter {#Enable}
41-
=============================
39+
# Enabling A Compression Filter {#filters_enable}
4240

4341
In order to compress a variable, the netcdf-c library
4442
must be given three pieces of information:
@@ -66,8 +64,7 @@ using __ncgen__, via an API call, or via command line parameters to __nccopy__.
6664
In any case, remember that filtering also requires setting chunking, so the
6765
variable must also be marked with chunking information.
6866

69-
Using The API {#API}
70-
-------------
67+
## Using The API {#filters_API}
7168
The necessary API methods are included in __netcdf.h__ by default.
7269
One API method is for setting the filter to be used
7370
when writing a variable. The relevant signature is
@@ -90,8 +87,7 @@ __params__. As is usual with the netcdf API, one is expected to call
9087
this function twice. The first time to get __nparams__ and the
9188
second to get the parameters in client-allocated memory.
9289

93-
Using ncgen {#NCGEN}
94-
-------------
90+
## Using ncgen {#filters_NCGEN}
9591

9692
In a CDL file, compression of a variable can be specified
9793
by annotating it with the following attribute:
@@ -107,8 +103,8 @@ This is a "special" attribute, which means that
107103
it will normally be invisible when using
108104
__ncdump__ unless the -s flag is specified.
109105

110-
Example CDL File (Data elided)
111-
------------------------------
106+
### Example CDL File (Data elided)
107+
112108
````
113109
netcdf bzip2 {
114110
dimensions:
@@ -123,8 +119,8 @@ data:
123119
}
124120
````
125121

126-
Using nccopy {#NCCOPY}
127-
-------------
122+
## Using nccopy {#filters_NCCOPY}
123+
128124
When copying a netcdf file using __nccopy__ it is possible
129125
to specify filter information for any output variable by
130126
using the "-F" option on the command line; for example:
@@ -171,8 +167,7 @@ by this table.
171167
<tr><td>false<td>-Fvar,...<td>NA<td>use output filter
172168
</table>
173169

174-
Parameter Encoding {#ParamEncode}
175-
==========
170+
# Parameter Encode/Decode {#filters_paramcoding}
176171

177172
The parameters passed to a filter are encoded internally as a vector
178173
of 32-bit unsigned integers. It may be that the parameters
@@ -198,57 +193,83 @@ them between the local machine byte order and network byte
198193
order.
199194

200195
Parameters whose size is larger than 32-bits present a byte order problem.
201-
This typically includes double precision floats and (signed or unsigned)
202-
64-bit integers. For these cases, the machine byte order must be
203-
handled by the compression code. This is because HDF5 will treat,
196+
This specifically includes double precision floats and (signed or unsigned)
197+
64-bit integers. For these cases, the machine byte order issue must be
198+
handled, in part, by the compression code. This is because HDF5 will treat,
204199
for example, an unsigned long long as two 32-bit unsigned integers
205200
and will convert each to network order separately. This means that
206201
on a machine whose byte order is different than the machine in which
207-
the parameters were initially created, the two integers are out of order
208-
and must be swapped to get the correct unsigned long long value.
209-
Consider this example. Suppose we have this little endian unsigned long long.
210-
211-
1000000230000004
212-
213-
In network byte order, it will be stored as two 32-bit integers.
214-
215-
20000001 40000003
216-
217-
On a big endian machine, this will be given to the filter in that form.
218-
219-
2000000140000003
220-
221-
But note that the proper big endian unsigned long long form is this.
222-
223-
4000000320000001
224-
225-
So, the two words need to be swapped.
226-
227-
But consider the case when both original and final machines are big endian.
228-
229-
1. 4000000320000001
230-
2. 40000003 20000001
231-
3. 40000003 20000001
232-
233-
where #1 is the original number, #2 is the network order and
234-
#3 is the what is given to the filter. In this case we do not
235-
want to swap words.
236-
237-
The solution is to forcibly encode the original number using some
238-
specified endianness so that the filter always assumes it is getting
239-
its parameters in that order and will always do swapping as needed.
240-
This is irritating, but one needs to be aware of it. Since most
241-
machines are little-endian. We choose to use that as the endianness
242-
for handling 64 bit entities.
243-
244-
Filter Specification Syntax {#Syntax}
245-
==========
202+
the parameters were initially created, the two integers will be separately
203+
endian converted. But this will be incorrect for 64-bit values.
204+
205+
So, we have this situation:
206+
207+
1. the 8 bytes come in as native machine order for the machine
208+
doing the call to *nc_def_var_filter*.
209+
2. HDF5 divides the 8 bytes into 2 four byte pieces and ensures that each piece
210+
is in network (big) endian order.
211+
3. When the filter is called, the two pieces are returned in the same order
212+
but with the bytes in each piece consistent with the native machine order
213+
for the machine executing the filter.
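
To see why converting the two halves separately is incorrect for a 64-bit value, the following minimal sketch compares a per-word byte swap with a true 8-byte swap. The function names are hypothetical and are not part of netcdf-c or HDF5; the code only simulates the byte movement described above.

```c
#include <stddef.h>

/* Reverse n bytes in place. */
static void reverse_bytes(unsigned char* p, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char c = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = c;
    }
}

/* Hypothetical: byte swap each 4-byte half on its own, which is
   what a separate endian conversion of each integer amounts to. */
static void swap_per_word(unsigned char* mem8)
{
    reverse_bytes(mem8, 4);
    reverse_bytes(mem8 + 4, 4);
}

/* Hypothetical: byte swap the whole 8-byte value. */
static void swap_whole(unsigned char* mem8)
{
    reverse_bytes(mem8, 8);
}
```

Starting from the same 8 bytes, the per-word swap leaves the two halves in their original positions, so its result differs from the whole-value swap by the order of the two 4-byte words.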
214+
215+
## Encoding Algorithms
216+
217+
In order to properly extract the correct 8-byte value, we need to ensure
218+
that the values stored in the HDF5 file have a known format independent of
219+
the native format of the creating machine.
220+
221+
The idea is to do sufficient manipulation so that HDF5
222+
will store the 8-byte value as a little endian value
223+
divided into two 4-byte integers.
224+
Note that little-endian is used as the standard
225+
because it is the most common machine format.
226+
When read, the filter code needs to be aware of this convention
227+
and do the appropriate conversions.
228+
229+
This leads to the following set of rules.
230+
231+
### Encoding
232+
233+
1. Encode on little endian (LE) machine: no special action is required.
234+
The 8-byte value is passed to HDF5 as two 4-byte integers. HDF5 byte
235+
swaps each integer and stores it in the file.
236+
2. Encode on a big endian (BE) machine: several steps are required:
237+
238+
1. Do an 8-byte byte swap to convert the original value to little-endian
239+
format.
240+
2. Since the encoding machine is BE, HDF5 will just store the value.
241+
So it is necessary to simulate little endian encoding by byte-swapping
242+
each 4-byte integer separately.
243+
3. This doubly swapped pair of integers is then passed to HDF5 and is stored
244+
unchanged.
245+
246+
### Decoding
247+
248+
1. Decode on LE machine: no special action is required.
249+
HDF5 will get the two 4-byte values from the file and byte-swap each
250+
separately. The concatenation of those two integers will be the expected
251+
LE value.
252+
2. Decode on a big endian (BE) machine: the inverse of the encode case must
253+
be implemented.
254+
255+
1. HDF5 sends the two 4-byte values to the filter.
256+
2. The filter must then byte-swap each 4-byte value independently.
257+
3. The filter then must concatenate the two 4-byte values into a single
258+
8-byte value. Because of the encoding rules, this 8-byte value will
259+
be in LE format.
260+
4. The filter must finally do an 8-byte byte-swap on that 8-byte value
261+
to convert it to desired BE format.
262+
263+
To support these rules, some utility programs exist and are discussed in
264+
<a href="#filters_AppendixA">Appendix A</a>.
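
The encoding and decoding rules above can be sketched as follows. This is an illustration only, not the library's implementation: the name *sketch_fix8* and the explicit *host_is_be* argument are assumptions, introduced so that both the LE and BE cases can be exercised on any machine.

```c
#include <stddef.h>

/* Reverse n bytes in place. */
static void reverse_bytes(unsigned char* p, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char c = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = c;
    }
}

/* Hypothetical sketch of the 8-byte rules.
   mem8:       the 8-byte value, modified in place
   decode:     nonzero => apply decoding rules, else encoding rules
   host_is_be: nonzero => treat the calling machine as big endian */
static void sketch_fix8(unsigned char* mem8, int decode, int host_is_be)
{
    if (!host_is_be)
        return;                      /* LE machine: no special action */
    if (decode) {
        reverse_bytes(mem8, 4);      /* swap each 4-byte value */
        reverse_bytes(mem8 + 4, 4);
        reverse_bytes(mem8, 8);      /* then the 8-byte swap to BE */
    } else {
        reverse_bytes(mem8, 8);      /* 8-byte swap to LE */
        reverse_bytes(mem8, 4);      /* then swap each 4-byte value */
        reverse_bytes(mem8 + 4, 4);
    }
}
```

In this sketch, both BE paths reduce to exchanging the two 4-byte words, so encoding followed by decoding restores the original bytes.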
265+
266+
# Filter Specification Syntax {#filters_syntax}
246267

247268
Both of the utilities
248269
<a href="#filters_NCGEN">__ncgen__</a>
249270
and
250271
<a href="#filters_NCCOPY">__nccopy__</a>
251-
allow the specification of filter parameters.
272+
allow the specification of filter parameters in text format.
252273
These specifications consist of a sequence of comma
253274
separated constants. The constants are converted
254275
within the utility to a proper set of unsigned int
@@ -257,7 +278,7 @@ constants (see the <a href="#ParamEncode">parameter encoding section</a>).
257278
To simplify things, various kinds of constants can be specified
258279
rather than just simple unsigned integers. The utilities will encode
259280
them properly using the rules specified in
260-
the <a href="#ParamEncode">parameter encoding section</a>.
281+
the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
261282

262283
The currently supported constants are as follows.
263284
<table>
@@ -270,9 +291,9 @@ The currently supported constants are as follows.
270291
<tr><td>77<td>implicit unsigned 32-bit integer<td>No tag<td>
271292
<tr><td>93U<td>explicit unsigned 32-bit integer<td>u|U<td>
272293
<tr><td>789f<td>32-bit float<td>f|F<td>
273-
<tr><td>12345678.12345678d<td>64-bit double<td>d|D<td>Network byte order
274-
<tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>Network byte order
275-
<tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>Network byte order
294+
<tr><td>12345678.12345678d<td>64-bit double<td>d|D<td>LE encoding
295+
<tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>LE encoding
296+
<tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>LE encoding
276297
</table>
277298
Some things to note.
278299

@@ -283,18 +304,18 @@ Some things to note.
283304
2. For signed byte and short, the value is sign extended to 32 bits
284305
and then treated as an unsigned int value.
285306
3. For double, and signed|unsigned long long, they are converted
286-
to network byte order and then treated as two unsigned int values.
287-
This is consistent with the <a href="#ParamEncode">parameter encoding</a>.
307+
as specified in the section on
308+
<a href="#filters_paramcoding">parameter encode/decode</a>.
288309

289-
Dynamic Loading Process {#Process}
310+
Dynamic Loading Process {#filters_Process}
290311
==========
291312

292313
The documentation[1,2] for the HDF5 dynamic loading was (at the time
293314
this was written) out-of-date with respect to the actual HDF5 code
294315
(see HDF5PL.c). So, the following discussion is largely derived
295316
from looking at the actual code. This means that it is subject to change.
296317

297-
Plugin directory {#Plugindir}
318+
Plugin directory {#filters_Plugindir}
298319
----------------
299320

300321
The HDF5 loader expects plugins to be in a specified plugin directory.
@@ -306,7 +327,7 @@ The default directory is:
306327
The default may be overridden using the environment variable
307328
__HDF5_PLUGIN_PATH__.
308329

309-
Plugin Library Naming {#Pluginlib}
330+
Plugin Library Naming {#filters_Pluginlib}
310331
---------------------
311332

312333
Given a plugin directory, HDF5 examines every file in that
@@ -320,7 +341,7 @@ as determined by the platform on which the library is being executed.
320341
<tr halign="left"><td>Windows<td>*<td>.dll
321342
</table>
322343

323-
Plugin Verification {#Pluginverify}
344+
Plugin Verification {#filters_Pluginverify}
324345
-------------------
325346
For each dynamic library located using the previous patterns,
326347
HDF5 attempts to load the library and attempts to obtain information
@@ -340,7 +361,7 @@ specified for the variable in __nc_def_var_filter__ in order to be used.
340361
If plugin verification fails, then that plugin is ignored and
341362
the search continues for another, matching plugin.
342363

343-
Debugging {#Debug}
364+
Debugging {#filters_Debug}
344365
-------
345366
Debugging plugins can be very difficult. You will probably
346367
need to use the old printf approach for debugging the filter itself.
@@ -356,7 +377,7 @@ Since ncdump is not being asked to access the data (the -h flag), it
356377
can obtain the filter information without failures. Then it can print
357378
out the filter id and the parameters (the -s flag).
358379

359-
Test Case {#TestCase}
380+
Test Case {#filters_TestCase}
360381
-------
361382
Within the netcdf-c source tree, the directory
362383
__netcdf-c/nc_test4__ contains a test case (__test_filter.c__) for
@@ -365,7 +386,7 @@ bzip2. Another test (__test_filter_misc.c__) validates
365386
parameter passing. These tests are disabled if __--enable-shared__
366387
is not set or if __--enable-netcdf-4__ is not set.
367388

368-
Example {#Example}
389+
Example {#filters_Example}
369390
-------
370391
A slightly simplified version of the filter test case is also
371392
available as an example within the netcdf-c source tree
@@ -444,54 +465,43 @@ has been known to work.
444465
gcc -g -O0 -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
445466
````
446467

447-
Appendix A. Byte Swap Code {#AppendixA}
468+
Appendix A. Support Utilities {#filters_AppendixA}
448469
==========
449-
Since in some cases, it is necessary for a filter to
450-
byte swap from little-endian to big-endian, This appendix
451-
provides sample code for doing this. It also provides
452-
a code snippet for testing if the machine the
453-
endianness of a machine.
454470

455-
Byte swap an 8-byte chunk of memory
456-
-------
457-
````
458-
static void
459-
byteswap8(unsigned char* mem)
460-
{
461-
register unsigned char c;
462-
c = mem[0];
463-
mem[0] = mem[7];
464-
mem[7] = c;
465-
c = mem[1];
466-
mem[1] = mem[6];
467-
mem[6] = c;
468-
c = mem[2];
469-
mem[2] = mem[5];
470-
mem[5] = c;
471-
c = mem[3];
472-
mem[3] = mem[4];
473-
mem[4] = c;
474-
}
475-
476-
````
477-
478-
Test for Machine Endianness
479-
-------
480-
````
481-
static const unsigned char b[4] = {0x0,0x0,0x0,0x1}; /* value 1 in big-endian*/
482-
int endianness = (1 == *(unsigned int*)b); /* 1=>big 0=>little endian
483-
````
484-
References {#References}
485-
========================
471+
Two functions are exported from the netcdf-c library
472+
for use by client programs and by filter implementations.
473+
474+
1. ````int NC_parsefilterspec(const char* spec, unsigned int* idp, size_t* nparamsp, unsigned int** paramsp);````
475+
* idp will contain the filter id value from the spec.
476+
* nparamsp will contain the number of 4-byte parameters
477+
* paramsp will contain a pointer to the parsed parameters -- the caller
478+
must free.
479+
This function can parse filter spec strings as defined in
480+
the section on <a href="#filters_syntax">Filter Specification Syntax</a>.
481+
This function parses the first argument and returns several values.
482+
483+
2. ````int NC_filterfix8(unsigned char* mem8, int decode);````
484+
* mem8 is a pointer to the 8-byte value to fix.
485+
* decode is 1 if the function should apply the 8-byte decoding algorithm
486+
else apply the encoding algorithm.
487+
This function implements the 8-byte conversion algorithms.
488+
Before calling *nc_def_var_filter* (unless *NC_parsefilterspec* was used),
489+
the client must call this function with the decode argument set to 0.
490+
Inside the filter code, this function should be called with the decode
491+
argument set to 1.
492+
493+
Examples of the use of these functions can be seen in the test program
494+
*nc_test4/tst_filterparser.c*.
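
As a rough illustration of where the 8-byte fix sits in client and filter code, the sketch below places a double into two consecutive 4-byte parameter slots and recovers it again. The helpers *pack_double* and *unpack_double* are hypothetical, the sketch assumes an 8-byte double and a 4-byte unsigned int, and the calls to *NC_filterfix8* are only indicated in comments so that the example compiles without the netcdf-c library.

```c
#include <string.h>

/* Hypothetical helper: copy an 8-byte double into two 4-byte
   parameter slots, as a client might do before nc_def_var_filter. */
static void pack_double(double d, unsigned int params[2])
{
    unsigned char mem8[8];
    memcpy(mem8, &d, 8);
    /* Per the text above, a client would call NC_filterfix8(mem8, 0) here. */
    memcpy(params, mem8, 8);
}

/* Hypothetical helper: recover the double inside the filter code. */
static double unpack_double(const unsigned int params[2])
{
    unsigned char mem8[8];
    double d;
    memcpy(mem8, params, 8);
    /* Per the text above, a filter would call NC_filterfix8(mem8, 1) here. */
    memcpy(&d, mem8, 8);
    return d;
}
```

Using memcpy rather than pointer casts avoids alignment and aliasing problems when moving between the byte buffer, the double, and the parameter array.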
495+
496+
# References {#filters_References}
486497

487498
1. https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
488499
2. https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf
489500
3. https://portal.hdfgroup.org/display/support/Contributions#Contributions-filters
490501
4. https://support.hdfgroup.org/services/contributions.html#filters
491502
5. https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html
492503

493-
Point of Contact
494-
================
504+
# Point of Contact
495505

496506
__Author__: Dennis Heimbigner<br>
497507
__Email__: dmh at ucar dot edu

include/Makefile.am

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
66

77
# Ed Hartnett, Dennis Heimbigner, Ward Fisher
88

9-
include_HEADERS = netcdf.h netcdf_meta.h netcdf_mem.h netcdf_aux.h
9+
include_HEADERS = netcdf.h netcdf_meta.h netcdf_mem.h netcdf_aux.h netcdf_filter.h
1010

1111
if BUILD_PARALLEL
1212
include_HEADERS += netcdf_par.h
@@ -17,7 +17,7 @@ ncuri.h ncutf8.h ncdispatch.h ncdimscale.h netcdf_f.h err_macros.h \
1717
ncbytes.h nchashmap.h ceconstraints.h rnd.h nclog.h ncconfigure.h \
1818
nc4internal.h nctime.h nc3internal.h onstack.h ncrc.h ncauth.h \
1919
ncoffsets.h nctestserver.h nc4dispatch.h nc3dispatch.h ncexternl.h \
20-
ncwinpath.h ncfilter.h ncindex.h hdf4dispatch.h hdf5internal.h \
20+
ncwinpath.h ncindex.h hdf4dispatch.h hdf5internal.h \
2121
nc_provenance.h hdf5dispatch.h
2222

2323
if USE_DAP
