Skip to content

Commit 219a087

Browse files
cary-ilmmusicinmybrain
authored andcommitted
Fix misaligned memory access in LossyDctDecoder_execute HALF→FLOAT expansion (#2324)
* Fix misaligned memory access in `LossyDctDecoder_execute` HALF→FLOAT expansion After DCT decoding, `LossyDctDecoder_execute()` expands FLOAT-type channels from their intermediate HALF (16-bit) XDR representation back to FLOAT (32-bit) XDR in place. The expansion was done by casting `_rows[y]` (a `uint8_t *`) directly to `float *` and `uint16_t *`, then reading and writing through those typed pointers. Because row buffers are assigned by advancing a byte pointer with no alignment padding (`outBufferEnd += chan->width * chan->bytes_per_element` in `internal_dwa_compressor.h`), a FLOAT channel that follows a HALF channel of odd width receives a `_rows[y]` pointer that is 2-byte aligned but not 4-byte aligned. Dereferencing a `float *` cast from such a pointer is undefined behavior under the C standard: - On ARM, RISC-V, and MIPS (strict alignment) this crashes immediately. - On x86 it is silently tolerated at the hardware level but remains UB: auto-vectorizing compilers (SSE/AVX) may assume aligned access and generate incorrect code. - UBSan reports: `store to misaligned address ... for type 'float', which requires 4 byte alignment` at `internal_dwa_decoder.h:749`. Fix: replace the cast-and-dereference pattern with the `unaligned_load16` / `memcpy` / `unaligned_store32` helpers already used throughout the rest of OpenEXRCore (`internal_xdr.h`, `unpack.c`, `pack.c`, `internal_pxr24.c`). These helpers use `memcpy` internally, which the C standard guarantees is safe for unaligned addresses and which compilers compile to a single load/store instruction on architectures that support it. The byte-order handling is preserved correctly: - `unaligned_load16` reads 2 bytes via `memcpy` and applies `one_to_native16` (XDR → native), returning a native-endian HALF value. - `half_to_float` converts native HALF → native float. - `memcpy(&bits, &f, 4)` reinterprets the float's bit pattern as `uint32_t` without numeric conversion (the correct type-pun idiom in C). - `unaligned_store32` applies `one_from_native32` (native → XDR) and writes 4 bytes via `memcpy`, storing the result in XDR float format. Made-with: Cursor Signed-off-by: Cary Phillips <cary@ilm.com> * add TODO comment Signed-off-by: Cary Phillips <cary@ilm.com> --------- Signed-off-by: Cary Phillips <cary@ilm.com>
1 parent a764742 commit 219a087

1 file changed

Lines changed: 13 additions & 4 deletions

File tree

pxr/imaging/plugin/hioOpenEXR/OpenEXR/OpenEXRCore/internal_dwa_decoder.h

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -651,13 +651,22 @@ LossyDctDecoder_execute (
651651
/* process in place in reverse to avoid temporary buffer */
652652
for (int y = 0; y < d->_height; ++y)
653653
{
654-
float* floatXdrPtr = (float*) chanData[chan]->_rows[y];
655-
uint16_t* halfXdr = (uint16_t*) floatXdrPtr;
654+
uint8_t* rowBytes = chanData[chan]->_rows[y];
656655

657656
for (int x = d->_width - 1; x >= 0; --x)
658657
{
659-
floatXdrPtr[x] = one_from_native_float (
660-
half_to_float (one_to_native16 (halfXdr[x])));
658+
// TODO: make an unaligned_store32f that takes the float and
659+
// packages up a one_from_native_float and calls memcpy
660+
// instead of the two memcpy. We should look at the metrics
661+
// for dwa and see if there's a performance difference to do
662+
// so at some point. See:
663+
// https://github.com/AcademySoftwareFoundation/openexr/pull/2324
664+
665+
uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));
666+
float f = half_to_float (h);
667+
uint32_t bits;
668+
memcpy (&bits, &f, sizeof (bits));
669+
unaligned_store32 (rowBytes + x * sizeof (float), bits);
661670
}
662671
}
663672
}

0 commit comments

Comments
 (0)