Fix misaligned memory access in LossyDctDecoder_execute HALF→FLOAT expansion (#2324)

cary-ilm · musicinmybrain · commit 219a0870506f · 2026-04-07T13:40:49.000+01:00
* Fix misaligned memory access in `LossyDctDecoder_execute` HALF→FLOAT expansion

After DCT decoding, `LossyDctDecoder_execute()` expands FLOAT-type channels
from their intermediate HALF (16-bit) XDR representation back to FLOAT (32-bit)
XDR in place.  The expansion was done by casting `_rows[y]` (a `uint8_t *`)
directly to `float *` and `uint16_t *`, then reading and writing through those
typed pointers.

Because row buffers are assigned by advancing a byte pointer with no alignment
padding (`outBufferEnd += chan-&gt;width * chan-&gt;bytes_per_element` in
`internal_dwa_compressor.h`), a FLOAT channel that follows a HALF channel of
odd width receives a `_rows[y]` pointer that is 2-byte aligned but not 4-byte
aligned.  Dereferencing a `float *` cast from such a pointer is undefined
behavior under the C standard:

- On ARM, RISC-V, and MIPS (strict alignment) this crashes immediately.
- On x86 it is silently tolerated at the hardware level but remains UB:
  auto-vectorizing compilers (SSE/AVX) may assume aligned access and generate
  incorrect code.
- UBSan reports: `store to misaligned address ... for type 'float', which
  requires 4 byte alignment` at `internal_dwa_decoder.h:749`.

Fix: replace the cast-and-dereference pattern with the `unaligned_load16` /
`memcpy` / `unaligned_store32` helpers already used throughout the rest of
OpenEXRCore (`internal_xdr.h`, `unpack.c`, `pack.c`, `internal_pxr24.c`).
These helpers use `memcpy` internally, which the C standard guarantees is safe
for unaligned addresses and which compilers compile to a single load/store
instruction on architectures that support it.

The byte-order handling is preserved correctly:
- `unaligned_load16` reads 2 bytes via `memcpy` and applies `one_to_native16`
  (XDR → native), returning a native-endian HALF value.
- `half_to_float` converts native HALF → native float.
- `memcpy(&amp;bits, &amp;f, 4)` reinterprets the float's bit pattern as `uint32_t`
  without numeric conversion (the correct type-pun idiom in C).
- `unaligned_store32` applies `one_from_native32` (native → XDR) and writes
  4 bytes via `memcpy`, storing the result in XDR float format.

Made-with: Cursor
Signed-off-by: Cary Phillips &lt;cary@ilm.com&gt;

* add TODO comment

Signed-off-by: Cary Phillips &lt;cary@ilm.com&gt;

---------

Signed-off-by: Cary Phillips &lt;cary@ilm.com&gt;
diff --git a/pxr/imaging/plugin/hioOpenEXR/OpenEXR/OpenEXRCore/internal_dwa_decoder.h b/pxr/imaging/plugin/hioOpenEXR/OpenEXR/OpenEXRCore/internal_dwa_decoder.h
@@ -651,13 +651,22 @@ LossyDctDecoder_execute (
         /* process in place in reverse to avoid temporary buffer */
         for (int y = 0; y < d->_height; ++y)
         {
-            float*    floatXdrPtr = (float*) chanData[chan]->_rows[y];
-            uint16_t* halfXdr     = (uint16_t*) floatXdrPtr;
+            uint8_t* rowBytes = chanData[chan]->_rows[y];
 
             for (int x = d->_width - 1; x >= 0; --x)
             {
-                floatXdrPtr[x] = one_from_native_float (
-                    half_to_float (one_to_native16 (halfXdr[x])));
+                // TODO: make an unaligned_store32f that takes the float and
+                // packages up a one_from_native_float and calls memcpy
+                // instead of the two memcpy. We should look at the metrics
+                // for dwa and see if there's a performance difference to do
+                // so at some point. See:
+                // https://github.com/AcademySoftwareFoundation/openexr/pull/2324
+
+                uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));
+                float    f = half_to_float (h);
+                uint32_t bits;
+                memcpy (&bits, &f, sizeof (bits));
+                unaligned_store32 (rowBytes + x * sizeof (float), bits);
             }
         }
     }

Original file line number	Diff line number	Diff line change
`@@ -651,13 +651,22 @@ LossyDctDecoder_execute (`
`651`	`651`	`/* process in place in reverse to avoid temporary buffer */`
`652`	`652`	`for (int y = 0; y < d->_height; ++y)`
`653`	`653`	`{`
`654`		`- float* floatXdrPtr = (float*) chanData[chan]->_rows[y];`
`655`		`- uint16_t* halfXdr = (uint16_t*) floatXdrPtr;`
	`654`	`+ uint8_t* rowBytes = chanData[chan]->_rows[y];`
`656`	`655`
`657`	`656`	`for (int x = d->_width - 1; x >= 0; --x)`
`658`	`657`	`{`
`659`		`- floatXdrPtr[x] = one_from_native_float (`
`660`		`- half_to_float (one_to_native16 (halfXdr[x])));`
	`658`	`+ // TODO: make an unaligned_store32f that takes the float and`
	`659`	`+ // packages up a one_from_native_float and calls memcpy`
	`660`	`+ // instead of the two memcpy. We should look at the metrics`
	`661`	`+ // for dwa and see if there's a performance difference to do`
	`662`	`+ // so at some point. See:`
	`663`	`+ // https://github.com/AcademySoftwareFoundation/openexr/pull/2324`
	`664`	`+`
	`665`	`+ uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));`
	`666`	`+ float f = half_to_float (h);`
	`667`	`+ uint32_t bits;`
	`668`	`+ memcpy (&bits, &f, sizeof (bits));`
	`669`	`+ unaligned_store32 (rowBytes + x * sizeof (float), bits);`
`661`	`670`	`}`
`662`	`671`	`}`
`663`	`672`	`}`