We Need a Big-Endian CI Workflow for netCDF-C #3282

@edhartnett

Big-Endian CI Workflow for netCDF-C

Background and motivation

Most modern hardware (x86-64 and, in their common configurations, ARM and RISC-V) is
little-endian: multi-byte integers are stored with the least-significant byte first.
The netCDF-C library must also work correctly on big-endian platforms, where the
most-significant byte comes first. This matters because:

  1. netCDF file format correctness: The classic netCDF format stores data in
    big-endian (XDR) byte order on disk regardless of the host architecture. The
    conversion routines in libsrc/ncx.m4 (and the generated ncx.c) handle this
    translation. Bugs in those routines may manifest only on big-endian hosts,
    where no byte-swapping is needed and different code paths are taken.

  2. Real-world deployments: IBM z/Architecture (s390x) mainframes running Linux
    are big-endian and are used in production scientific and enterprise environments.
    netCDF-C should build and pass its test suite on these systems.

  3. Latent bugs: Some bugs in netCDF-C are exposed only on big-endian platforms
    because the little-endian code path is exercised far more often. Without a CI job
    that runs on a big-endian host, these bugs go undetected until a user reports them.
    One such bug was found during the development of this workflow: an undeclared
    fillp variable in the ncx_getn_float_float and ncx_getn_double_double
    fallback paths in libsrc/ncx.m4 (see "Bugs found and fixed during development" below).
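
To make point 1 concrete, here is a minimal C sketch of why a big-endian host takes a different path. It is not the netCDF-C implementation: host_is_big_endian, put_be32, get_be32 and put_be32_fast are hypothetical names invented for illustration, and the real routines in libsrc/ncx.m4 are considerably more involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* look at the byte stored first in memory */
    return first == 0x01;
}

/* Portable encoder: writes v most-significant byte first on any host. */
static void put_be32(unsigned char out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Portable decoder: reads a big-endian 32-bit value on any host. */
static uint32_t get_be32(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Hypothetical "fast path" encoder, shaped like a host-dependent branch. */
static void put_be32_fast(unsigned char out[4], uint32_t v)
{
    assert(out != NULL);
    if (host_is_big_endian()) {
        /* Memory layout already matches the on-disk order: no swap.
           A little-endian-only CI never executes this branch. */
        memcpy(out, &v, 4);
    } else {
        put_be32(out, v);
    }
}
```

Both encoders should produce identical bytes on every host. The point of the sketch is that the memcpy branch in put_be32_fast is reached only on a big-endian machine, so a mistake there stays invisible to x86-64 runners.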

Since no big-endian hardware is available in GitHub Actions, we use QEMU software
emulation to run an s390x (IBM Z) environment on a standard x86-64 runner.

Current state

Workflow file: .github/workflows/run_tests_bigendian.yml

Final design

Trigger

  • workflow_dispatch (manual)
  • pull_request targeting main

Architecture

  • s390x (IBM z/Architecture, big-endian) via uraimo/run-on-arch-action@v2
  • distro: ubuntu22.04 inside the QEMU container
  • ppc64 and ubuntu24.04 are not supported by the action

HDF5

  • Installed from apt (libhdf5-dev, libhdf5-103-1) — not built from source
  • Building from source fails because HDF5 configure tries to run compiled test binaries, which doesn't work under QEMU emulation
  • HDF5 serial library path: /usr/lib/s390x-linux-gnu/hdf5/serial
  • Headers: /usr/include/hdf5/serial
  • Must pass LIBS="-lhdf5_serial -lz" explicitly — the zlib check fails otherwise

Compiler

  • Native gcc/g++ installed inside the QEMU container (not cross-compilers)
  • Do NOT use --host=s390x-linux-gnu — the run: block executes natively inside the emulated s390x container, so no cross-compilation is needed
  • Cross-compiler packages (gcc-s390x-linux-gnu) caused GCC 11 ICEs under QEMU; native gcc is stable

Swap space

  • An 8 GB swap file is created with dd on the host runner before QEMU starts
  • Required because QEMU emulation is memory-intensive and GCC crashes (ICE/segfault) without sufficient memory

Configure flags

--enable-hdf5
--disable-dap
--disable-dap-remote-tests
--disable-nczarr
--disable-libxml2
--disable-shared
--enable-static
--disable-utilities      # skips ncgen/ncdump/ncrandom which triggered GCC ICEs

Make

  • -j 1 throughout (parallel builds triggered GCC ICEs under QEMU)

Bugs found and fixed during development

libsrc/ncx.m4: fillp undeclared (fixed in master)

  • The #else fallback paths of ncx_getn_float_float and ncx_getn_double_double
    called ncx_get_float_float(xp, tp, fillp) and ncx_get_double_double(xp, tp, fillp),
    but fillp is not a parameter of the enclosing getn functions, so the code did not compile
  • Only exposed when the #else branch is compiled (non-IEEE-754 or cross-compilation)
  • Fixed by replacing fillp with NULL in both calls
  • See /home/ed/ncx_m4_fillp_bug.md for full issue writeup
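
The shape of the bug and its fix can be sketched in a few lines of C. This is a simplified stand-in, not the code from libsrc/ncx.m4: demo_get_float and demo_getn_float are hypothetical names, the signatures are assumptions modelled on the calls quoted above, and the element getter here just copies host-order floats instead of performing any real conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the single-element getter. The third argument mirrors
   the fillp argument in the quoted calls; here it is accepted and ignored. */
static int demo_get_float(const void *xp, float *ip, const void *fillp)
{
    (void)fillp;
    memcpy(ip, xp, sizeof(float));
    return 0;
}

/* Stand-in for the n-element getter. Note that it has no fillp parameter. */
static int demo_getn_float(const void **xpp, size_t nelems, float *tp)
{
    const char *xp = (const char *)*xpp;
    int status = 0;

    assert(tp != NULL);
    for (size_t i = 0; i < nelems; i++, xp += sizeof(float), tp++) {
        /* Before the fix the call read demo_get_float(xp, tp, fillp),
           which cannot compile because fillp is not declared in this scope. */
        const int lstatus = demo_get_float(xp, tp, NULL);
        if (status == 0)
            status = lstatus;       /* remember the first error, keep going */
    }
    *xpp = xp;                      /* advance the caller's read position */
    return status;
}
```

Because the failing call sits in a branch that the preprocessor normally discards, the compiler never sees it on common platforms. That is why the error surfaced only once a build selected the fallback branch.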

Alternatives tried and rejected

  • ppc64 — not supported by uraimo/run-on-arch-action@v2
  • ubuntu24.04 — not supported by uraimo/run-on-arch-action@v2
  • bullseye (Debian) — apt mirrors incomplete/broken for s390x in the container
  • Cross-compiler (gcc-s390x-linux-gnu) — GCC 11 ICEs on random files under QEMU
