Skip to content

Commit 3ad9b29

Browse files
authored
Fix misaligned memory access in LossyDctDecoder_execute HALF→FLOAT expansion (#2324)
* Fix misaligned memory access in `LossyDctDecoder_execute` HALF→FLOAT expansion After DCT decoding, `LossyDctDecoder_execute()` expands FLOAT-type channels from their intermediate HALF (16-bit) XDR representation back to FLOAT (32-bit) XDR in place. The expansion was done by casting `_rows[y]` (a `uint8_t *`) directly to `float *` and `uint16_t *`, then reading and writing through those typed pointers. Because row buffers are assigned by advancing a byte pointer with no alignment padding (`outBufferEnd += chan->width * chan->bytes_per_element` in `internal_dwa_compressor.h`), a FLOAT channel that follows a HALF channel of odd width receives a `_rows[y]` pointer that is 2-byte aligned but not 4-byte aligned. Dereferencing a `float *` cast from such a pointer is undefined behavior under the C standard: - On ARM, RISC-V, and MIPS (strict alignment) this crashes immediately. - On x86 it is silently tolerated at the hardware level but remains UB: auto-vectorizing compilers (SSE/AVX) may assume aligned access and generate incorrect code. - UBSan reports: `store to misaligned address ... for type 'float', which requires 4 byte alignment` at `internal_dwa_decoder.h:749`. Fix: replace the cast-and-dereference pattern with the `unaligned_load16` / `memcpy` / `unaligned_store32` helpers already used throughout the rest of OpenEXRCore (`internal_xdr.h`, `unpack.c`, `pack.c`, `internal_pxr24.c`). These helpers use `memcpy` internally, which the C standard guarantees is safe for unaligned addresses and which compilers compile to a single load/store instruction on architectures that support it. The byte-order handling is preserved correctly: - `unaligned_load16` reads 2 bytes via `memcpy` and applies `one_to_native16` (XDR → native), returning a native-endian HALF value. - `half_to_float` converts native HALF → native float. - `memcpy(&bits, &f, 4)` reinterprets the float's bit pattern as `uint32_t` without numeric conversion (the correct type-pun idiom in C). - `unaligned_store32` applies `one_from_native32` (native → XDR) and writes 4 bytes via `memcpy`, storing the result in XDR float format. Made-with: Cursor Signed-off-by: Cary Phillips <cary@ilm.com> * add TODO comment Signed-off-by: Cary Phillips <cary@ilm.com> --------- Signed-off-by: Cary Phillips <cary@ilm.com>
1 parent fd6e400 commit 3ad9b29

1 file changed

Lines changed: 13 additions & 4 deletions

File tree

src/lib/OpenEXRCore/internal_dwa_decoder.h

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -741,13 +741,22 @@ LossyDctDecoder_execute (
741741
/* process in place in reverse to avoid temporary buffer */
742742
for (int y = 0; y < d->_height; ++y)
743743
{
744-
float* floatXdrPtr = (float*) chanData[chan]->_rows[y];
745-
uint16_t* halfXdr = (uint16_t*) floatXdrPtr;
744+
uint8_t* rowBytes = chanData[chan]->_rows[y];
746745

747746
for (int x = d->_width - 1; x >= 0; --x)
748747
{
749-
floatXdrPtr[x] = one_from_native_float (
750-
half_to_float (one_to_native16 (halfXdr[x])));
748+
// TODO: make an unaligned_store32f that takes the float and
749+
// packages up a one_from_native_float and calls memcpy
750+
// instead of the two memcpy. We should look at the metrics
751+
// for dwa and see if there's a performance difference to do
752+
// so at some point. See:
753+
// https://github.com/AcademySoftwareFoundation/openexr/pull/2324
754+
755+
uint16_t h = unaligned_load16 (rowBytes + x * sizeof (uint16_t));
756+
float f = half_to_float (h);
757+
uint32_t bits;
758+
memcpy (&bits, &f, sizeof (bits));
759+
unaligned_store32 (rowBytes + x * sizeof (float), bits);
751760
}
752761
}
753762
}

0 commit comments

Comments
 (0)