Fix misaligned memory access in LossyDctDecoder_execute HALF→FLOAT expansion (#2324)

cary-ilm · web-flow · commit 3ad9b29430f9 · 2026-03-31T07:49:09.000-07:00
* Fix misaligned memory access in `LossyDctDecoder_execute` HALF→FLOAT expansion

After DCT decoding, `LossyDctDecoder_execute()` expands FLOAT-type channels
from their intermediate HALF (16-bit) XDR representation back to FLOAT (32-bit)
XDR in place.  The expansion was done by casting `_rows[y]` (a `uint8_t *`)
directly to `float *` and `uint16_t *`, then reading and writing through those
typed pointers.

Because row buffers are assigned by advancing a byte pointer with no alignment
padding (`outBufferEnd += chan-&gt;width * chan-&gt;bytes_per_element` in
`internal_dwa_compressor.h`), a FLOAT channel that follows a HALF channel of
odd width receives a `_rows[y]` pointer that is 2-byte aligned but not 4-byte
aligned.  Dereferencing a `float *` cast from such a pointer is undefined
behavior under the C standard:

- On ARM, RISC-V, and MIPS (strict alignment) this crashes immediately.
- On x86 it is silently tolerated at the hardware level but remains UB:
  auto-vectorizing compilers (SSE/AVX) may assume aligned access and generate
  incorrect code.
- UBSan reports: `store to misaligned address ... for type 'float', which
  requires 4 byte alignment` at `internal_dwa_decoder.h:749`.

Fix: replace the cast-and-dereference pattern with the `unaligned_load16` /
`memcpy` / `unaligned_store32` helpers already used throughout the rest of
OpenEXRCore (`internal_xdr.h`, `unpack.c`, `pack.c`, `internal_pxr24.c`).
These helpers use `memcpy` internally, which the C standard guarantees is safe
for unaligned addresses and which compilers compile to a single load/store
instruction on architectures that support it.

The byte-order handling is preserved correctly:
- `unaligned_load16` reads 2 bytes via `memcpy` and applies `one_to_native16`
  (XDR → native), returning a native-endian HALF value.
- `half_to_float` converts native HALF → native float.
- `memcpy(&amp;bits, &amp;f, 4)` reinterprets the float's bit pattern as `uint32_t`
  without numeric conversion (the correct type-pun idiom in C).
- `unaligned_store32` applies `one_from_native32` (native → XDR) and writes
  4 bytes via `memcpy`, storing the result in XDR float format.

Made-with: Cursor
Signed-off-by: Cary Phillips &lt;cary@ilm.com&gt;

* add TODO comment

Signed-off-by: Cary Phillips &lt;cary@ilm.com&gt;

---------

Signed-off-by: Cary Phillips &lt;cary@ilm.com&gt;
diff --git a/src/lib/OpenEXRCore/internal_dwa_decoder.h b/src/lib/OpenEXRCore/internal_dwa_decoder.h
@@ -741,13 +741,22 @@ LossyDctDecoder_execute (
         /* process in place in reverse to avoid temporary buffer */
         for (int y = 0; y < d->_height; ++y)
         {
-            float*    floatXdrPtr = (float*) chanData[chan]->_rows[y];
-            uint16_t* halfXdr     = (uint16_t*) floatXdrPtr;
+            uint8_t* rowBytes = chanData[chan]->_rows[y];
 
             for (int x = d->_width - 1; x >= 0; --x)
             {
-                floatXdrPtr[x] = one_from_native_float (
-                    half_to_float (one_to_native16 (halfXdr[x])));
+                // TODO: make an unaligned_store32f that takes the float and
+                // packages up a one_from_native_float and calls memcpy
+                // instead of the two memcpy. We should look at the metrics
+                // for dwa and see if there's a performance difference to do
+                // so at some point. See:
+                // https://github.com/AcademySoftwareFoundation/openexr/pull/2324
+
+                uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));
+                float    f = half_to_float (h);
+                uint32_t bits;
+                memcpy (&bits, &f, sizeof (bits));
+                unaligned_store32 (rowBytes + x * sizeof (float), bits);
             }
         }
     }

Original file line number	Diff line number	Diff line change
`@@ -741,13 +741,22 @@ LossyDctDecoder_execute (`
`741`	`741`	`/* process in place in reverse to avoid temporary buffer */`
`742`	`742`	`for (int y = 0; y < d->_height; ++y)`
`743`	`743`	`{`
`744`		`- float* floatXdrPtr = (float*) chanData[chan]->_rows[y];`
`745`		`- uint16_t* halfXdr = (uint16_t*) floatXdrPtr;`
	`744`	`+ uint8_t* rowBytes = chanData[chan]->_rows[y];`
`746`	`745`
`747`	`746`	`for (int x = d->_width - 1; x >= 0; --x)`
`748`	`747`	`{`
`749`		`- floatXdrPtr[x] = one_from_native_float (`
`750`		`- half_to_float (one_to_native16 (halfXdr[x])));`
	`748`	`+ // TODO: make an unaligned_store32f that takes the float and`
	`749`	`+ // packages up a one_from_native_float and calls memcpy`
	`750`	`+ // instead of the two memcpy. We should look at the metrics`
	`751`	`+ // for dwa and see if there's a performance difference to do`
	`752`	`+ // so at some point. See:`
	`753`	`+ // https://github.com/AcademySoftwareFoundation/openexr/pull/2324`
	`754`	`+`
	`755`	`+ uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));`
	`756`	`+ float f = half_to_float (h);`
	`757`	`+ uint32_t bits;`
	`758`	`+ memcpy (&bits, &f, sizeof (bits));`
	`759`	`+ unaligned_store32 (rowBytes + x * sizeof (float), bits);`
`751`	`760`	`}`
`752`	`761`	`}`
`753`	`762`	`}`