From 82c938f928de04a773128c45e039e59cc766bc20 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 22:13:08 -0700
Subject: [PATCH 01/54] Copy EXT -> KHR verbatim

---
 .../Khronos/KHR_meshopt_compression/README.md | 540 ++++++++++++++++++
 ...buffer.EXT_meshopt_compression.schema.json |  16 +
 ...erView.EXT_meshopt_compression.schema.json |  51 ++
 3 files changed, 607 insertions(+)
 create mode 100644 extensions/2.0/Khronos/KHR_meshopt_compression/README.md
 create mode 100644 extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json
 create mode 100644 extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
new file mode 100644
index 0000000000..153a413ba8
--- /dev/null
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -0,0 +1,540 @@
+# EXT\_meshopt\_compression
+
+## Contributors
+
+* Arseny Kapoulkine, [@zeuxcg](https://twitter.com/zeuxcg)
+* Jasper St. Pierre, [@JasperRLZ](https://twitter.com/JasperRLZ)
+
+## Status
+
+Complete, Ratified by the Khronos Group
+
+## Dependencies
+
+Written against the glTF 2.0 spec.
+
+## Overview
+
+glTF files come with a variety of binary data - vertex attribute data, index data, morph target deltas, animation inputs/outputs - that can be a substantial fraction of the overall transmission size. To optimize for delivery size, general-purpose compression such as gzip can be used - however, it often doesn't capture some common types of redundancy in glTF binary data.
+
+This extension provides a generic option for compressing binary data that is tailored to the common types of data seen in glTF buffers. The extension works on a bufferView level and as such is agnostic of how the data is used, supporting geometry (vertex and index data, including morph targets), animation (keyframe time and values) and other data, such as instance transforms for `EXT_mesh_gpu_instancing`.
+
+Similarly to supercompressed textures (see `KHR_texture_basisu`), this extension assumes that the buffer view data is optimized for GPU efficiency - using quantization and using optimal data order for GPU rendering - and provides a compression layer on top of bufferView data. Each bufferView is compressed in isolation, which allows loaders to decompress the data directly into GPU storage with minimal overhead.
+
+The compressed format is designed to have two properties beyond optimizing compression ratio - very fast decoding (using WebAssembly SIMD, the decoders run at \~1 GB/sec on modern desktop hardware), and byte-wise storage compatible with general-purpose compression. That is, instead of reducing the encoded size as much as possible, the bitstream is constructed in such a way that a general-purpose compressor can compress it further.
+
+This is beneficial for typical Web delivery scenarios, where all files are usually using gzip compression - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when gzip compression isn't available, and additionally reduces the performance impact of gzip decompression which is typically *much slower* than decoders proposed here).
+
+## Specifying compressed views
+
+As explained in the overview, this extension operates on bufferViews. This allows the loaders to directly decompress data into GPU memory and minimizes the JSON size impact of specifying compressed data. To specify the compressed representation, the `EXT_meshopt_compression` extension section overrides the source buffer index and specifies the buffer parameters and a compression mode/filter (detailed later in the specification):
+
+```json
+{
+	"buffer": 1,
+	"byteOffset": 0,
+	"byteLength": 2368,
+	"byteStride": 16,
+	"target": 34962,
+	"extensions": {
+		"EXT_meshopt_compression": {
+			"buffer": 0,
+			"byteOffset": 1024,
+			"byteLength": 347,
+			"byteStride": 16,
+			"mode": "ATTRIBUTES",
+			"count": 148
+		}
+	}
+}
+```
+
+In this example, the uncompressed buffer contents are stored in buffer 1 (this can be used by loaders that don't implement this extension). The compressed data is stored in a separate buffer, with its own byte range. Note that for decompressors to work, they need to know the compression `mode`, `filter` (for `"ATTRIBUTES"` mode), and additionally the layout of the encoded data - `count` elements with `byteStride` bytes each. This data is specified in the extension JSON; while in some cases `byteStride` is available on the parent `bufferView` declaration, JSON schema prohibits specifying this for some types of storage such as index data.
+
+## JSON schema updates
+
+Each `bufferView` can contain an extension object with the following properties:
+
+| Property | Type | Description | Required |
+|:---------|:--------------|:------------------------------------------| :--------------------------|
+| `buffer` | `integer` | The index of the buffer with compressed data. | :white_check_mark: Yes |
+| `byteOffset` | `integer` | The offset into the buffer in bytes. | No, default: `0` |
+| `byteLength` | `integer` | The length of the compressed data in bytes. | :white_check_mark: Yes |
+| `byteStride` | `integer` | The stride, in bytes. | :white_check_mark: Yes |
+| `count` | `integer` | The number of elements. | :white_check_mark: Yes |
+| `mode` | `string` | The compression mode. | :white_check_mark: Yes |
+| `filter` | `string` | The compression filter. | No, default: `"NONE"` |
+
+`mode` represents the compression mode using an enumerated value that must be one of `"ATTRIBUTES"`, `"TRIANGLES"`, `"INDICES"`.
+
+`filter` represents the post-decompression filter using an enumerated value that must be one of `"NONE"`, `"OCTAHEDRAL"`, `"QUATERNION"`, `"EXPONENTIAL"`.
+
+For the extension object to be valid, the following must hold:
+
+- When parent `bufferView` has `byteStride` defined, it matches `byteStride` in the extension JSON
+- The parent `bufferView.byteLength` is equal to `byteStride` times `count`
+- When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be <= 256.
+- When `mode` is `"TRIANGLES"`, `count` must be divisible by 3
+- When `mode` is `"TRIANGLES"` or `"INDICES"`, `byteStride` must be equal to 2 or 4
+- When `mode` is `"TRIANGLES"` or `"INDICES"`, `filter` must be equal to `"NONE"` or omitted
+- When `filter` is `"OCTAHEDRAL"`, `byteStride` must be equal to 4 or 8
+- When `filter` is `"QUATERNION"`, `byteStride` must be equal to 8
+- When `filter` is `"EXPONENTIAL"`, `byteStride` must be divisible by 4
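+
+**Non-normative** The constraints above can be sketched as a single validation routine. This is a minimal illustration rather than part of the extension: the `CompressedView` structure, the enum names and `isValidCompressedView` are hypothetical, a `parentByteStride` of 0 is assumed to mean "not defined", and overflow of `byteStride * count` is not handled.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+
+typedef enum { MODE_ATTRIBUTES, MODE_TRIANGLES, MODE_INDICES } Mode;
+typedef enum { FILTER_NONE, FILTER_OCTAHEDRAL, FILTER_QUATERNION, FILTER_EXPONENTIAL } Filter;
+
+/* Hypothetical container for the values involved in the checks above. */
+typedef struct {
+	size_t parentByteLength; /* bufferView.byteLength */
+	size_t parentByteStride; /* bufferView.byteStride, 0 when not defined */
+	size_t byteStride;       /* byteStride from the extension object */
+	size_t count;            /* count from the extension object */
+	Mode mode;
+	Filter filter;
+} CompressedView;
+
+bool isValidCompressedView(const CompressedView *v) {
+	/* parent stride, when present, must match the extension stride */
+	if (v->parentByteStride != 0 && v->parentByteStride != v->byteStride) return false;
+	/* parent view must hold exactly count elements of byteStride bytes */
+	if (v->parentByteLength != v->byteStride * v->count) return false;
+
+	if (v->mode == MODE_ATTRIBUTES) {
+		if (v->byteStride % 4 != 0 || v->byteStride > 256) return false;
+	} else {
+		/* TRIANGLES and INDICES store 16-bit or 32-bit indices without filters */
+		if (v->byteStride != 2 && v->byteStride != 4) return false;
+		if (v->filter != FILTER_NONE) return false;
+		if (v->mode == MODE_TRIANGLES && v->count % 3 != 0) return false;
+	}
+
+	switch (v->filter) {
+	case FILTER_OCTAHEDRAL: return v->byteStride == 4 || v->byteStride == 8;
+	case FILTER_QUATERNION: return v->byteStride == 8;
+	case FILTER_EXPONENTIAL: return v->byteStride % 4 == 0;
+	default: return true;
+	}
+}
+```
+
+With the values from the earlier JSON example (parent `byteLength` of 2368, `byteStride` of 16, `count` of 148, `"ATTRIBUTES"` mode), this sketch accepts the view, since 16 * 148 = 2368.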
+
+The type of compressed data must match the bitstream specification (note that each `mode` specifies a different bitstream format).
+
+The parent `bufferView` properties define a layout which can hold the data decompressed from the extension object.
+
+## Compression modes and filters
+
+Compression mode specifies the bitstream layout and the algorithm used to decompress the data, and can be one of:
+
+- Mode 0: attributes. Suitable for storing sequences of values of arbitrary size, relies on exploiting similarity between bytes of consecutive elements to reduce the size.
+- Mode 1: triangles. Suitable for storing indices that represent triangle lists, relies on exploiting topological redundancy of consecutive triangles.
+- Mode 2: indices. Suitable for storing indices that don't represent triangle lists, relies on exploiting similarity between consecutive elements.
+
+In all three modes, the resulting compressed byte sequence is typically noticeably smaller than the buffer view length, *and* can be additionally compressed by using a general purpose compression algorithm such as Deflate for the resulting glTF file (.glb/.bin).
+
+The format of the bitstream is specified in [Appendix A (Bitstream)](#appendix-a-bitstream).
+
+When using attribute encoding, for some types of data exploiting the redundancy between consecutive elements is not enough to achieve good compression ratio; quantization can help but isn't always sufficient either. To that end, when using mode 0, this extension allows a further use of a compression filter that transforms each element stored in the buffer view to make it more compressible with the attribute codec, and often allows trading precision for compressed size. Filters don't change the size of the output data, they merely improve the compressed size by reducing entropy; note that the use of a compression filter restricts `byteStride`, which effectively prohibits storing interleaved data.
+
+Filter specifies the algorithm used to transform the data after decompression, and can be one of:
+
+- Filter 0: none. Attribute data is used as is.
+- Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding.
+- Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding.
+- Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision.
+
+The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
+
+When using filters, the expectation is that the filter is applied after the attribute decoder on the contents of the resulting bufferView; the resulting data can then be used according to the referencing accessors without further modifications.
+
+**Non-normative** To decompress the data, [meshoptimizer](https://github.com/zeux/meshoptimizer) library may be used; it supports efficient decompression using C++ and/or WebAssembly, including fast SIMD implementation for attribute decoding.
+
+## Fallback buffers
+
+While the extension JSON specifies a separate buffer to source compressed data from, the parent `bufferView` must also have a valid `buffer` reference as per glTF 2.0 spec requirement. To produce glTF files that *require* support for this extension and don't have uncompressed data, the referenced buffer can contain no URI as follows:
+
+```json
+{ "byteLength": 1432878 }
+```
+
+The `byteLength` property of such a placeholder buffer **MUST** be sufficiently large to contain all uncompressed buffer views referencing it.
+
+When stored in a GLB file, the placeholder buffer should have index 1 or above, to avoid conflicts with GLB binary buffer.
+
+This extension allows buffers to be optionally tagged as fallback by using the `fallback` attribute as follows:
+
+```json
+{
+	"byteLength": 1432878,
+	"extensions": {
+		"EXT_meshopt_compression": {
+			"fallback": true
+		}
+	}
+}
+```
+
+This is useful to avoid confusion, and may also be used by loaders that support the extension to skip loading of these buffers.
+
+When a buffer is marked as a fallback buffer, the following must hold:
+
+- All references to the buffer must come from `bufferView`s that have a `EXT_meshopt_compression` extension specified
+- No references to the buffer may come from `EXT_meshopt_compression` extension JSON
+
+If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `EXT_meshopt_compression` must be a required extension.
+
+## Compressing geometry data
+
+> This section is non-normative.
+
+The codecs used by this extension can represent geometry exactly, replicating both vertex and index data without changes in contents or order. However, to get optimal compression, it's necessary to pre-process the data.
+
+To get optimal compression, encoders should optimize vertex and index data for locality of reference. Specifically:
+
+- Triangle order should be optimized to maximize the recency of previously encountered vertices; this is similar to optimizing meshes for vertex reuse aka post-transform cache in GPU hardware.
+- Vertex order should be linearized in the order that vertices appear in the index stream to get optimal index compression
+
+When index data is not available (e.g. point data sets) or represents topology with a lot of seams (e.g. each triangle has unique vertex indices because it specifies flat-shaded normal), encoders could additionally optimize vertex data for spatial locality, so that vertices close together in the vertex stream are close together in space.
+
+Vertex data should be quantized using the appropriate representation; this extension cleanly interacts with `KHR_mesh_quantization` by compressing already quantized data.
+
+Morph targets can be treated identically to other vertex attributes, as long as vertex order optimization is performed on all target streams at the same time. It is recommended to use quantized storage for morph target deltas, possibly with a narrower type than that used for baseline values.
+
+When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2. The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1.
+
+Using filter 1 (octahedral) for normal/tangent data may improve compression ratio further.
+
+## Compressing animation data
+
+> This section is non-normative.
+
+To minimize the size of animation data, it is important to reduce the number of stored keyframes and reduce the size of each keyframe.
+
+To reduce the number of keyframes, encoders can either selectively remove keyframes that don't contribute to the resulting movement, resulting in sparse input/output data, or resample the keyframes uniformly, resulting in uniformly dense data. Resampling can be beneficial since it means that all animation channels in the same animation can share the same input accessor, and provides a convenient quality vs size tradeoff, but it's up to the encoder to pick the optimal strategy.
+
+Additionally it's important to identify tracks with the same output value and use a single keyframe for these.
+
+To reduce the size of each keyframe, rotation data should be quantized using 16-bit normalized components; for additional compression, the use of filter 2 (quaternion) is recommended. Translation/scale data can be compressed using filter 3 (exponential) with the same exponent used for all three vector components.
+
+Note that animation inputs that specify time values require enough precision to avoid animation distortion. It's recommended to either not use any filters for animation inputs to avoid any precision loss (attribute encoder can still be efficient at reducing the size of animation input track even without filters when the inputs are uniformly spaced), or use filter 3 (exponential) with maximum mantissa bit count (23).
+
+After pre-processing, both input and output data should be stored using mode 0 (attributes).
+
+# Appendix A: Bitstream
+
+The following sections specify the format of the bitstream for compressed data for various modes.
+
+## Mode 0: attributes
+
+Attribute compression exploits similarity between consecutive elements of the buffer by encoding deltas. The deltas are stored for each separate byte which makes the codec more versatile since it can work with components of various sizes. Additionally, the elements are stored with bytes deinterleaved, which means that sequences of deltas are more easily compressible by some general purpose compressors that may run on the resulting data.
+
+To facilitate efficient decompression, deinterleaving and delta encoding are performed on attribute blocks instead of on the entire buffer; within each block, elements are processed in groups of 16.
+
+The encoded stream structure is as follows:
+
+- Header byte, which must be equal to `0xa0`
+- One or more attribute blocks, detailed below
+- Tail block, which consists of a baseline element stored verbatim, padded to 32 bytes
+
+Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
+
+Each attribute block stores a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
+
+```
+maxBlockElements = min((8192 / byteStride) & ~15, 256)
+blockElements = min(remainingElements, maxBlockElements)
+```
+
+Where `remainingElements` is the number of elements that have yet to be decoded.
+
+Each attribute block consists of `byteStride` "data blocks" (one for each byte of the element), and each "data block" contains deltas stored for groups of elements. Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
+
+```
+groupCount = ceil(blockElements / 16)
+```
+
+For example, a stream with a `byteStride` of 64 containing 200 elements would be broken up into two attribute blocks: one containing 128 elements, and the other containing 72 elements. And these blocks would have 8 and 5 groups, respectively.
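+
+**Non-normative** The block and group arithmetic above can be written out as a few small helpers. This is a sketch only; the function names are hypothetical and `byteStride` is assumed to already satisfy the constraints for `"ATTRIBUTES"` mode (non-zero, divisible by 4, at most 256).
+
+```c
+#include <stddef.h>
+
+/* Maximum number of elements stored in one attribute block for a given stride. */
+size_t maxBlockElements(size_t byteStride) {
+	size_t n = (8192 / byteStride) & ~(size_t)15;
+	return n < 256 ? n : 256;
+}
+
+/* Size of the next attribute block given the number of elements left to decode. */
+size_t nextBlockElements(size_t remainingElements, size_t byteStride) {
+	size_t maxElements = maxBlockElements(byteStride);
+	return remainingElements < maxElements ? remainingElements : maxElements;
+}
+
+/* Number of 16-element groups in a block, rounding up for a partial group. */
+size_t groupCount(size_t blockElements) {
+	return (blockElements + 15) / 16;
+}
+```
+
+For the stream described above, `nextBlockElements(200, 64)` yields 128 and `nextBlockElements(72, 64)` yields 72, with `groupCount` returning 8 and 5 for these blocks.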
+
+The structure of each "data block" breaks down as follows:
+- Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
+- Delta blocks, with variable number of bytes stored for each group
+
+Header bits are stored from least significant to most significant bit - header bits for 4 consecutive groups are packed in a byte together as follows:
+
+```
+(headerBitsForGroup0 << 0) | (headerBitsForGroup1 << 2) | (headerBitsForGroup2 << 4) | (headerBitsForGroup3 << 6)
+```
+
+The header bits establish the delta encoding mode (0-3) for each group of 16 elements that follows:
+
+- bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
+- bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+- bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
+
+When using the sentinel encoding, each delta is stored as a 2-bit or 4-bit value in a single 4-byte or 8-byte block, with deltas stored from most significant to least significant bit inside the byte. That is, the 2-bit encoding is packed as follows with 4 deltas per byte:
+
+```
+(delta3 << 0) | (delta2 << 2) | (delta1 << 4) | (delta0 << 6)
+```
+
+And the 4-bit encoding is packed as follows with 2 deltas per byte:
+
+```
+(delta1 << 0) | (delta0 << 4)
+```
+
+Note that this is not the same order as the packing of the header bits found above.
+
+A delta that has all bits set to 1 (corresponds to `3` for 2-bit encoding and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the 2-bit or 4-bit range, and is stored as a full byte after the bit deltas for this group.
+
+Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
+
+```
+encode(uint8_t v) = ((v & 0x80) != 0) ? ~(v << 1) : (v << 1)
+decode(uint8_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
+```
+
+For a complete example, assuming 4-bit sentinel coding, the following byte sequence:
+
+```
+0x17 0x5f 0xf0 0xbc 0x77 0xa9 0x21 0x00 0x34 0xb5
+```
+
+Encodes 16 deltas, where the first 8 bytes of the sequence specify 16 4-bit deltas, and the last 2 bytes of the sequence specify the explicit delta code values encoded for elements 3 and 4 (counting from 0) in the sequence, which are marked with the sentinel value `0xf`. After de-zigzagging, the decoded deltas look like:
+
+```
+-1 -4 -3 26 -91 0 -6 6 -4 -4 5 -5 1 -1 0 0
+```
+
+Finally, note that the deltas are computed in 8-bit integer space with wrap-around two's complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
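+
+**Non-normative** As an illustration of the 4-bit sentinel encoding, the following sketch decodes a single group of 16 deltas. The function names are hypothetical, the input is assumed to be well-formed and long enough, and applying the deltas to the previous element is left out.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Undo zigzag encoding of a byte delta (8-bit wrap-around arithmetic). */
+uint8_t unzigzag8(uint8_t v) {
+	return (v & 1) ? (uint8_t)~(v >> 1) : (uint8_t)(v >> 1);
+}
+
+/* Decodes one group of 16 deltas stored with the 4-bit sentinel encoding.
+ * Returns the number of input bytes consumed (between 8 and 24). */
+size_t decodeGroup4(const uint8_t *data, int8_t deltas[16]) {
+	/* full-byte deltas for sentinel entries follow the 8 packed bytes */
+	const uint8_t *extra = data + 8;
+
+	for (int i = 0; i < 16; ++i) {
+		/* even elements are in the high nibble, odd elements in the low nibble */
+		uint8_t nibble = (i % 2 == 0) ? (uint8_t)(data[i / 2] >> 4) : (uint8_t)(data[i / 2] & 0x0f);
+		uint8_t code = (nibble == 0x0f) ? *extra++ : nibble;
+
+		deltas[i] = (int8_t)unzigzag8(code);
+	}
+
+	return (size_t)(extra - data);
+}
+```
+
+Running `decodeGroup4` on the 10-byte sequence from the example above consumes all 10 bytes and produces the listed deltas.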
+
+## Mode 1: triangles
+
+Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte.
+
+The encoded stream structure is as follows:
+
+- Header byte, which must be equal to `0xe1`
+- Triangle codes, referred to as `code` below, with a single byte for each triangle
+- Extra data which is necessary to decode triangles that don't fit into a single byte, referred to as `data` below
+- Tail block, which consists of a 16-byte lookup table, referred to as `codeaux` below
+
+Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
+
+There are two limitations on the structure of the 16-byte lookup table:
+
+- The last two bytes must be 0
+- The remaining bytes must not contain any nibbles equal to `0xf`.
+
+During the decoding process, the decoder maintains four variables:
+
+- `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0
+- `last`: an integer referring to the last encoded index, starts at 0
+- `edgefifo`: a 16-entry FIFO with two vertex indices in each entry; initial contents is undefined
+- `vertexfifo`: a 16-entry FIFO with a vertex index in each entry; initial contents is undefined
+
+To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file.
+
+When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)), which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
+
+```
+0x7f => 0x7f
+0x81 0x04 => 0x201
+0xff 0xa0 0x05 => 0x1507f
+```
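+
+**Non-normative** A varint-7 reader can be sketched as follows. The name `readVarint7` and the offset-based interface are hypothetical, and the sketch assumes the caller guarantees that enough input bytes are available.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Reads one unsigned LEB128 value from data, advancing *offset past it. */
+uint32_t readVarint7(const uint8_t *data, size_t *offset) {
+	uint32_t result = 0;
+
+	/* a 32-bit value needs at most 5 groups of 7 bits */
+	for (int shift = 0; shift < 35; shift += 7) {
+		uint8_t byte = data[(*offset)++];
+
+		/* each byte contributes 7 bits, least significant group first */
+		result |= (uint32_t)(byte & 0x7f) << shift;
+
+		/* a byte with a clear top bit terminates the sequence */
+		if ((byte & 0x80) == 0)
+			break;
+	}
+
+	return result;
+}
+```
+
+For instance, the two-byte sequence `0x81 0x04` contributes `0x01` from the first byte and `0x04 << 7` from the second, giving `0x201`.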
+
+Instead of using the raw index value, a zigzag-encoded 32-bit delta from `last` is used:
+
+```
+uint32_t decodeIndex(uint32_t v) {
+	int32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
+
+	last += delta;
+	return last;
+}
+```
+
+The encoding for `code` is split into various cases, some of which are self-sufficient and some need to read extra data. The encoding is detailed below; after either path the triangle (a, b, c) is emitted to the output.
+
+- `0xX0`, where `X < 0xf`: Encodes a recently encountered edge and a `next` vertex.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is equal to `next` (which is then incremented).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex c is pushed to the vertex FIFO.
+
+- `0xXY`, where `X < 0xf` and `0 < Y < 0xd`: Encodes a recently encountered edge and a recently encountered vertex.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is read from the vertex FIFO at index Y (where 0 is the most recently added vertex; note that 0 is never actually read here, since `Y > 0`).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+
+- `0xXd` or `0xXe`, where `X < 0xf`: Encodes a recently encountered edge and a vertex that's adjacent to `last`.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is equal to `last-1` for `0xXd` and `last+1` for `0xXe`.
+
+`last` is set to `c` (effectively decrementing or incrementing it accordingly).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex c is pushed to the vertex FIFO.
+
+- `0xXf`, where `X < 0xf`: Encodes a recently encountered edge and a free-standing vertex encoded explicitly.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is decoded using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex c is pushed to the vertex FIFO.
+
+- `0xfY`, where `Y < 0xe`: Encodes three indices using `codeaux` table lookup and vertex FIFO.
+
+The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`.
+
+The first index, `a`, is equal to `next`; `next` is incremented to decode b/c correctly.
+The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), or is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
+The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), or is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
+
+Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W.
+
+Edge (b, a) is pushed to the edge FIFO.
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex a is pushed to the vertex FIFO.
+Vertex b is pushed to the vertex FIFO if `Z == 0`.
+Vertex c is pushed to the vertex FIFO if `W == 0`.
+
+- `0xfe` or `0xff`: Encodes three indices explicitly.
+
+This requires an extra byte that is read from `data`; let's assume that results in `0xZW`. Note that this is *not* an LEB128 value, just a single byte.
+
+If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing.
+
+The first index, `a`, is equal to `next` for `0xfe` encoding (`next` is then incremented), or, for `0xff` encoding, is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
+The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex) if `Z < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `Z == 0xf`.
+The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex) if `W < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `W == 0xf`.
+
+Edge (b, a) is pushed to the edge FIFO.
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex a is pushed to the vertex FIFO.
+Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`.
+Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`.
+
+At the end of the decoding, `data` is expected to be fully read by all the triangle codes and not contain any extra bytes.
+
+## Mode 2: indices
+
+Index compression exploits similarity between consecutive indices. Note that, unlike the triangle index compression (mode 1), this mode doesn't assume a specific topology and as such is less efficient in terms of the resulting size. However, unlike mode 1, this mode can be used to compress triangle strips, line lists and other types of mesh index data, and can additionally be used to compress non-mesh index data such as sparse indices for accessors.
+
+The encoded stream structure is as follows:
+
+- Header byte, which must be equal to `0xd1`
+- A sequence of index deltas, with encoding specified below
+- Tail block, which consists of 4 bytes that are reserved and should be set to 0
+
+Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
+
+To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
+
+```
+0x7f => 0x7f
+0x81 0x04 => 0x201
+0xff 0xa0 0x05 => 0x1507f
+```
+
+When decoding the deltas, the 32-bit value is read using the varint-7 encoding. The least significant bit of the value indicates one of the baseline values; the remaining bits specify a zigzag-encoded signed delta and can be decoded as follows:
+
+```
+uint32_t decode(uint32_t v) {
+	int32_t baseline = v & 1;
+	int32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);
+
+	last[baseline] += delta;
+	return last[baseline];
+}
+```
+
+It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with baseline bit always set to 0) as well as more complex bimodal encodings.
+
+Note that the zigzag-encoded delta must fit in a 31-bit integer; as such, deltas are limited to [-2^30..2^30-1].
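+
+**Non-normative** Putting the pieces of this mode together, a decoder could look like the sketch below. The function name, the error convention (0 on success, -1 on malformed input) and the 32-bit output type are assumptions made for illustration; a real decoder would also narrow the results when `byteStride` is 2.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Decodes count indices from a mode 2 stream of the given size in bytes. */
+int decodeIndexSequence(uint32_t *out, size_t count, const uint8_t *stream, size_t size) {
+	/* the header byte plus the 4-byte tail is the minimum stream size */
+	if (size < 5 || stream[0] != 0xd1)
+		return -1;
+
+	const uint8_t *data = stream + 1;
+	const uint8_t *end = stream + size - 4; /* the reserved tail is not decoded */
+	uint32_t last[2] = { 0, 0 };
+
+	for (size_t i = 0; i < count; ++i) {
+		/* varint-7: 7 bits per byte, least significant group first */
+		uint32_t v = 0;
+		int shift = 0;
+		uint8_t byte;
+
+		do {
+			if (data == end || shift > 28)
+				return -1;
+			byte = *data++;
+			v |= (uint32_t)(byte & 0x7f) << shift;
+			shift += 7;
+		} while (byte & 0x80);
+
+		/* bit 0 selects the baseline, the remaining bits are a zigzag delta */
+		uint32_t baseline = v & 1;
+		uint32_t delta = (v & 2) ? ~(v >> 2) : (v >> 2);
+
+		/* unsigned addition wraps, which matches two's complement deltas */
+		last[baseline] += delta;
+		out[i] = last[baseline];
+	}
+
+	return 0;
+}
+```
+
+As a usage example, the stream `0xd1 0x00 0x04 0x04 0x91 0x03 0x02` followed by four zero tail bytes decodes to the indices 0, 1, 2, 100, 1: the first three and the last use baseline 0, while the fourth uses baseline 1.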
+
+# Appendix B: Filters
+
+Filters are functions that transform each encoded attribute. For each filter, this document specifies the transformation used for decoding the data; it's up to the encoder to pick the parameters of the encoding for each element to balance quality and precision.
+
+For performance reasons the results of the decoding process are specified to one unit in last place (ULP) in terms of the decoded data, e.g. if a filter results in a 16-bit signed normalized integer, decoding may produce results within 1/32767 of specified value.
+
+## Filter 1: octahedral
+
+The octahedral filter allows encoding unit length 3D vectors (normals/tangents) using octahedral encoding, which results in a more optimal quality vs precision tradeoff compared to storing raw components.
+
+This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components.
+
+The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents).
+
+The encoding of the third component allows computing K for each vector independently from the bit representation, and must encode 1.0 precisely, which is equivalent to `(1 << (K - 1)) - 1` as an integer; values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid and the result of decoding such vectors is unspecified.
+
+When storing a K-bit integer in an 8-bit or 16-bit component when K is not 8 or 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+
+The output of the filter is three decoded unit vector components, stored as 8-bit or 16-bit normalized integers, and the last input component verbatim.
+
+```
+void decode(intN_t input[4], intN_t output[4]) {
+	// input[2] encodes a K-bit representation of 1.0
+	float32_t one = input[2];
+
+	float32_t x = input[0] / one;
+	float32_t y = input[1] / one;
+	float32_t z = 1.0 - abs(x) - abs(y);
+
+	// octahedral fixup for negative hemisphere
+	float32_t t = min(z, 0.0);
+
+	x -= copysign(t, x);
+	y -= copysign(t, y);
+
+	// renormalize (x, y, z)
+	float32_t len = sqrt(x * x + y * y + z * z);
+
+	x /= len;
+	y /= len;
+	z /= len;
+
+	output[0] = round(x * INTN_MAX);
+	output[1] = round(y * INTN_MAX);
+	output[2] = round(z * INTN_MAX);
+	output[3] = input[3];
+}
+```
+
+`INTN_MAX` is equal to 127 when using 8-bit components (N is 8) and equal to 32767 when using 16-bit components (N is 16).
+
+`copysign` behaves as specified in C99 and returns the value with the magnitude of the first argument and the sign of the second argument.
+
+## Filter 2: quaternion
+
+The quaternion filter encodes unit length quaternions using normalized 16-bit integers for all components, while allowing control over the precision used for the components and providing better quality compared to naively encoding each component one by one.
+
+This filter is only valid if `byteStride` is 8.
+
+The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
+
+When storing a K-bit integer in a 16-bit component when K is not 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+
+The output of the filter is four decoded quaternion components, stored as 16-bit normalized integers.
+
+After eliminating the maximum component, the maximum magnitude of the remaining components is 1/sqrt(2). Because of this the input components store the original component value scaled by sqrt(2.0) to increase precision.
+
+```
+void decode(int16_t input[4], int16_t output[4]) {
+	float32_t range = 1.0 / sqrt(2.0);
+
+	// input[3] encodes a K-bit representation of 1.0 except for bottom two bits
+	float32_t one = input[3] | 3;
+
+	float32_t x = input[0] / one * range;
+	float32_t y = input[1] / one * range;
+	float32_t z = input[2] / one * range;
+
+	float32_t w = sqrt(max(0.0, 1.0 - x * x - y * y - z * z));
+
+	int maxcomp = input[3] & 3;
+
+	// maxcomp specifies a cyclic rotation of the quaternion components
+	output[(maxcomp + 1) % 4] = round(x * 32767.0);
+	output[(maxcomp + 2) % 4] = round(y * 32767.0);
+	output[(maxcomp + 3) % 4] = round(z * 32767.0);
+	output[(maxcomp + 0) % 4] = round(w * 32767.0);
+}
+```
+
+## Filter 3: exponential
+
+The exponential filter encodes floating point values with a range close to the full range of a 32-bit floating point value, while allowing more control over the exponent/mantissa to trade quality for precision, and has a bit structure that is better aligned to the byte boundary to facilitate better compression.
+
+This filter is only valid if `byteStride` is a multiple of 4.
+
+The input to the filter is a sequence of 32-bit little-endian integers, with the most significant 8 bits specifying a (signed) exponent value, and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two's complement format.
+
+The result of the filter is 2^e * m:
+
+```
+float32_t decode(int32_t input) {
+	int32_t e = input >> 24;
+	int32_t m = (input << 8) >> 8;
+	return pow(2.0, e) * m;
+}
+```
+
+The valid range of `e` is [-100, +100], which facilitates performant implementations. Decoding out of range values results in unspecified behavior, and encoders are expected to clamp `e` to the valid range.
diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json
new file mode 100644
index 0000000000..98a8313e05
--- /dev/null
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json
@@ -0,0 +1,16 @@
+{
+    "$schema": "http://json-schema.org/draft-04/schema",
+    "title": "EXT_meshopt_compression buffer extension",
+    "type": "object",
+    "description": "Compressed data for bufferView.",
+    "allOf": [ { "$ref": "glTFProperty.schema.json" } ],
+    "properties": {
+        "fallback": {
+            "type": "boolean",
+            "description": "Set to true to indicate that the buffer is only referenced by bufferViews that have EXT_meshopt_compression extension and as such doesn't need to be loaded.",
+            "default": false
+        },
+        "extensions": { },
+        "extras": { }
+    }
+}
diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json
new file mode 100644
index 0000000000..85309a950b
--- /dev/null
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json
@@ -0,0 +1,51 @@
+{
+    "$schema": "http://json-schema.org/draft-04/schema",
+    "title": "EXT_meshopt_compression bufferView extension",
+    "type": "object",
+    "description": "Compressed data for bufferView.",
+    "allOf": [ { "$ref": "glTFProperty.schema.json" } ],
+    "properties": {
+        "buffer": {
+            "allOf": [ { "$ref": "glTFid.schema.json" } ],
+            "description": "The index of the buffer with compressed data."
+        },
+        "byteOffset": {
+            "type": "integer",
+            "description": "The offset into the buffer in bytes.",
+            "minimum": 0,
+            "default": 0
+        },
+        "byteLength": {
+            "type": "integer",
+            "description": "The length of the compressed data in bytes.",
+            "minimum": 1
+        },
+        "byteStride": {
+            "type": "integer",
+            "description": "The stride, in bytes.",
+            "minimum": 2,
+            "maximum": 256
+        },
+        "count": {
+            "type": "integer",
+            "description": "The number of elements.",
+            "minimum": 1
+        },
+        "mode": {
+            "type": "string",
+            "description": "The compression mode.",
+            "enum": [ "ATTRIBUTES", "TRIANGLES", "INDICES" ]
+        },
+        "filter": {
+            "type": "string",
+            "description": "The compression filter.",
+            "enum": [ "NONE", "OCTAHEDRAL", "QUATERNION", "EXPONENTIAL" ],
+            "default": "NONE"
+        },
+        "extensions": { },
+        "extras": { }
+    },
+    "required": [
+        "buffer", "byteLength", "byteStride", "count", "mode"
+    ]
+}
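
For reference, the exponential filter decode specified in Appendix B of the README above can be sketched in portable C. This is an illustration rather than normative text: the function name `decode_exponential` is hypothetical, the 32-bit word is assumed to have already been loaded from its little-endian bytes, and the sign extension is written with comparisons because shifting negative values, as the pseudocode does, is not portable in C.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one exponential-filter element as 2^e * m.
 * `bits` is the 32-bit word already loaded as an unsigned integer. */
float decode_exponential(uint32_t bits) {
    /* Top 8 bits: signed exponent, sign-extended by hand. */
    int e = (int)(bits >> 24);
    if (e >= 128) e -= 256;

    /* Low 24 bits: signed mantissa, sign-extended the same way. */
    int32_t m = (int32_t)(bits & 0xffffffu);
    if (m >= 0x800000) m -= 0x1000000;

    /* Out-of-range exponents are unspecified; this sketch rejects them. */
    assert(e >= -100 && e <= 100);

    /* Build 2^e by repeated scaling; every step is exact in binary floating point. */
    float scale = 1.0f;
    float step = e < 0 ? 0.5f : 2.0f;
    int n = e < 0 ? -e : e;
    for (int i = 0; i < n; ++i) scale *= step;

    return (float)m * scale;
}
```

With an exponent byte of `0xfe` (-2) and a mantissa of 3, the function returns 0.75; a production decoder would more likely build the power of two directly from the float bit pattern.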

From ddcb0e8597ac9326dbf8b1782bb4a14017cce855 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 22:14:56 -0700
Subject: [PATCH 02/54] Rename EXT to KHR

---
 .../2.0/Khronos/KHR_meshopt_compression/README.md  | 14 +++++++-------
 ... => buffer.KHR_meshopt_compression.schema.json} |  4 ++--
 ...bufferView.KHR_meshopt_compression.schema.json} |  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)
 rename extensions/2.0/Khronos/KHR_meshopt_compression/schema/{buffer.EXT_meshopt_compression.schema.json => buffer.KHR_meshopt_compression.schema.json} (80%)
 rename extensions/2.0/Khronos/KHR_meshopt_compression/schema/{bufferView.EXT_meshopt_compression.schema.json => bufferView.KHR_meshopt_compression.schema.json} (96%)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 153a413ba8..3082467dec 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -1,4 +1,4 @@
-# EXT\_meshopt\_compression
+# KHR\_meshopt\_compression
 
 ## Contributors
 
@@ -27,7 +27,7 @@ This is beneficial for typical Web delivery scenarios, where all files are usual
 
 ## Specifying compressed views
 
-As explained in the overview, this extension operates on bufferViews. This allows the loaders to directly decompress data into GPU memory and minimizes the JSON size impact of specifying compressed data. To specify the compressed representation, `EXT_meshopt_compression` extension section overrides the source buffer index as well as specifying the buffer parameters and a compression mode/filter (detailed later in the specification):
+As explained in the overview, this extension operates on bufferViews. This allows the loaders to directly decompress data into GPU memory and minimizes the JSON size impact of specifying compressed data. To specify the compressed representation, `KHR_meshopt_compression` extension section overrides the source buffer index as well as specifying the buffer parameters and a compression mode/filter (detailed later in the specification):
 
 ```json
 {
@@ -37,7 +37,7 @@ As explained in the overview, this extension operates on bufferViews. This allow
 	"byteStride": 16,
 	"target": 34962,
 	"extensions": {
-		"EXT_meshopt_compression": {
+		"KHR_meshopt_compression": {
 			"buffer": 0,
 			"byteOffset": 1024,
 			"byteLength": 347,
@@ -130,7 +130,7 @@ This extension allows buffers to be optionally tagged as fallback by using the `
 {
 	"byteLength": 1432878,
 	"extensions": {
-		"EXT_meshopt_compression": {
+		"KHR_meshopt_compression": {
 			"fallback": true
 		}
 	}
@@ -141,10 +141,10 @@ This is useful to avoid confusion, and may also be used by loaders that support
 
 When a buffer is marked as a fallback buffer, the following must hold:
 
-- All references to the buffer must come from `bufferView`s that have a `EXT_meshopt_compression` extension specified
-- No references to the buffer may come from `EXT_meshopt_compression` extension JSON
+- All references to the buffer must come from `bufferView`s that have a `KHR_meshopt_compression` extension specified
+- No references to the buffer may come from `KHR_meshopt_compression` extension JSON
 
-If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `EXT_meshopt_compression` must be a required extension.
+If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `KHR_meshopt_compression` must be a required extension.
 
 ## Compressing geometry data
 
diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.KHR_meshopt_compression.schema.json
similarity index 80%
rename from extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json
rename to extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.KHR_meshopt_compression.schema.json
index 98a8313e05..1afcc64aac 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.EXT_meshopt_compression.schema.json
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/buffer.KHR_meshopt_compression.schema.json
@@ -1,13 +1,13 @@
 {
     "$schema": "http://json-schema.org/draft-04/schema",
-    "title": "EXT_meshopt_compression buffer extension",
+    "title": "KHR_meshopt_compression buffer extension",
     "type": "object",
     "description": "Compressed data for bufferView.",
     "allOf": [ { "$ref": "glTFProperty.schema.json" } ],
     "properties": {
         "fallback": {
             "type": "boolean",
-            "description": "Set to true to indicate that the buffer is only referenced by bufferViews that have EXT_meshopt_compression extension and as such doesn't need to be loaded.",
+            "description": "Set to true to indicate that the buffer is only referenced by bufferViews that have KHR_meshopt_compression extension and as such doesn't need to be loaded.",
             "default": false
         },
         "extensions": { },
diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json
similarity index 96%
rename from extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json
rename to extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json
index 85309a950b..5582b81233 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.EXT_meshopt_compression.schema.json
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json
@@ -1,6 +1,6 @@
 {
     "$schema": "http://json-schema.org/draft-04/schema",
-    "title": "EXT_meshopt_compression bufferView extension",
+    "title": "KHR_meshopt_compression bufferView extension",
     "type": "object",
     "description": "Compressed data for bufferView.",
     "allOf": [ { "$ref": "glTFProperty.schema.json" } ],

From bfb9000179dc83c1d616569aba63d88c890c839f Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 22:25:45 -0700
Subject: [PATCH 03/54] Add color filter documentation

---
 .../Khronos/KHR_meshopt_compression/README.md | 51 ++++++++++++++++++-
 ...erView.KHR_meshopt_compression.schema.json |  2 +-
 2 files changed, 51 insertions(+), 2 deletions(-)
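
The color filter decode added in this patch can be sketched in C for the 8-bit case (`byteStride` of 4). The sketch follows the pseudocode step by step, with a few assumptions that the text does not state: Y and alpha are read as unsigned bytes, Co and Cg are read as two's complement bytes, and a zero alpha byte (no scale marker bit) is treated as invalid input. The names `decode_color8` and `first_bit_set` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit; v must be non-zero. */
static int first_bit_set(unsigned v) {
    int i = 0;
    while (v >>= 1) ++i;
    return i;
}

/* Sketch of the 8-bit color filter decode (byteStride == 4). */
void decode_color8(const uint8_t in[4], uint8_t out[4]) {
    assert(in[3] != 0); /* assumption: the scale marker bit must be present */

    /* Recover the scale from the highest set bit of the alpha component. */
    int as = (1 << (first_bit_set(in[3]) + 1)) - 1;

    /* Assumption: Y is unsigned, Co and Cg are two's complement. */
    int y = in[0];
    int co = in[1] >= 128 ? in[1] - 256 : in[1];
    int cg = in[2] >= 128 ? in[2] - 256 : in[2];

    /* Convert to RGB in fixed point. */
    int r = y + co - cg;
    int g = y + cg;
    int b = y - co - cg;

    /* Drop the marker bit, then widen alpha by one bit, replicating the last bit. */
    int a = in[3] & (as >> 1);
    a = (a << 1) | (a & 1);

    /* Rescale from the encoded range to the full 8-bit range. */
    float ss = 255.0f / (float)as;

    /* Rounded float->int; no clamping is applied, matching the pseudocode. */
    out[0] = (uint8_t)(int)((float)r * ss + 0.5f);
    out[1] = (uint8_t)(int)((float)g * ss + 0.5f);
    out[2] = (uint8_t)(int)((float)b * ss + 0.5f);
    out[3] = (uint8_t)(int)((float)a * ss + 0.5f);
}
```

For the input bytes `{100, 20, 246, 0xff}` the scale is 255, Cg is -10, and the decoded color is (130, 90, 90, 255).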

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 3082467dec..6515f9ce49 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -67,7 +67,7 @@ Each `bufferView` can contain an extension object with the following properties:
 
 `mode` represents the compression mode using an enumerated value that must be one of `"ATTRIBUTES"`, `"TRIANGLES"`, `"INDICES"`.
 
-`filter` represents the post-decompression filter using an enumerated value that must be one of `"NONE"`, `"OCTAHEDRAL"`, `"QUATERNION"`, `"EXPONENTIAL"`.
+`filter` represents the post-decompression filter using an enumerated value that must be one of `"NONE"`, `"OCTAHEDRAL"`, `"QUATERNION"`, `"EXPONENTIAL"`, `"COLOR"`.
 
 For the extension object to be valid, the following must hold:
 
@@ -80,6 +80,7 @@ For the extension object to be valid, the following must hold:
 - When `filter` is `"OCTAHEDRAL"`, `byteStride` must be equal to 4 or 8
 - When `filter` is `"QUATERNION"`, `byteStride` must be equal to 8
 - When `filter` is `"EXPONENTIAL"`, `byteStride` must be divisible by 4
+- When `filter` is `"COLOR"`, `byteStride` must be equal to 4 or 8
 
 The type of compressed data must match the bitstream specification (note that each `mode` specifies a different bitstream format).
 
@@ -105,6 +106,7 @@ Filter specifies the algorithm used to transform the data after decompression, a
 - Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding.
 - Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding.
 - Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision.
+- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg-R color space encoding.
 
 The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
 
@@ -538,3 +540,50 @@ float32_t decode(int32_t input) {
 ```
 
 The valid range of `e` is [-100, +100], which facilitates performant implementations. Decoding out of range values results in unspecified behavior, and encoders are expected to clamp `e` to the valid range.
+
+## Filter 4: color
+
+The color filter encodes color data using YCoCg-R color space transformation, which results in better compression for typical color data by exploiting correlation between color channels.
+
+This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
+
+The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value, the second component stores the Co (orange chrominance) value, the third component stores the Cg (green chrominance) value, and the fourth component stores the alpha value with the bit K used for scaling information.
+
+The transformation uses YCoCg-R encoding where:
+
+- Y = R/4 + G/2 + B/4
+- Co = (R - B) / 2
+- Cg = (G - (R + B) / 2) / 2
+
+The alpha component uses K-1 bits for the alpha value with the high bit set to 1, where K is the bit depth (8 or 16).
+
+The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
+
+```
+void decode(intN_t input[4], intN_t output[4]) {
+	// recover scale from alpha high bit
+	int as = (1 << (firstbitset(input[3]) + 1)) - 1;
+
+	// convert to RGB in fixed point
+	int y = input[0], co = input[1], cg = input[2];
+
+	int r = y + co - cg;
+	int g = y + cg;
+	int b = y - co - cg;
+
+	// expand alpha by one bit to match other components, replicating last bit
+	int a = input[3] & (as >> 1);
+	a = (a << 1) | (a & 1);
+
+	// compute scaling factor
+	float ss = INTN_MAX / float(as);
+
+	// rounded float->int
+	output[0] = int(float(r) * ss + 0.5f);
+	output[1] = int(float(g) * ss + 0.5f);
+	output[2] = int(float(b) * ss + 0.5f);
+	output[3] = int(float(a) * ss + 0.5f);
+}
+```
+
+`INTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16).
diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json
index 5582b81233..7f4fc117a9 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/schema/bufferView.KHR_meshopt_compression.schema.json
@@ -39,7 +39,7 @@
         "filter": {
             "type": "string",
             "description": "The compression filter.",
-            "enum": [ "NONE", "OCTAHEDRAL", "QUATERNION", "EXPONENTIAL" ],
+            "enum": [ "NONE", "OCTAHEDRAL", "QUATERNION", "EXPONENTIAL", "COLOR" ],
             "default": "NONE"
         },
         "extensions": { },

From 0ef3275e65f35567fdc745564a8edcb8f0687eb5 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 22:30:08 -0700
Subject: [PATCH 04/54] Add "differences from meshopt compression"

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 6515f9ce49..9374565aff 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -587,3 +587,12 @@ void decode(intN_t input[4], intN_t output[4]) {
 ```
 
 `INTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16).
+
+# Appendix C: Differences from EXT_meshopt_compression
+
+This extension is derived from `EXT_meshopt_compression` with the following changes:
+
+- Vertex data uses upgraded v1 format which provides more types of bit packing and delta encoding to compress data better
+- Added `COLOR` filter to support lossy color compression at smaller compression ratios
+
+These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.

From fba14b22dc7f740cd87a7da3115898af6d3f6b2c Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 22:53:57 -0700
Subject: [PATCH 05/54] Initial update of vertex encoding to v1

---
 .../Khronos/KHR_meshopt_compression/README.md | 67 ++++++++++++++++---
 1 file changed, 56 insertions(+), 11 deletions(-)
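
The bit-level helpers introduced by the v1 vertex format in this patch can be sketched in C as follows. The function names are hypothetical, and the rotation helper treats a rotation of 0 as the identity, which is an assumption: evaluated literally, the formula in the text would shift a 32-bit value by 32 bits, which is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 2-bit control mode for byte position k from the control header.
 * Each header byte holds four modes, least significant bits first. */
unsigned control_mode(const uint8_t *control, unsigned k) {
    return (control[k / 4] >> ((k % 4) * 2)) & 3u;
}

/* Zigzag encoding of a 16-bit delta (channel 1). */
uint16_t zigzag16_encode(uint16_t v) {
    return (uint16_t)((v & 0x8000u) ? ~(v << 1) : (v << 1));
}

/* Inverse of zigzag16_encode. */
uint16_t zigzag16_decode(uint16_t v) {
    return (uint16_t)((v & 1u) ? ~(v >> 1) : (v >> 1));
}

/* 32-bit left rotation (channel 2); r == 0 is handled separately
 * to avoid an out-of-range shift. */
uint32_t rotate32(uint32_t v, unsigned r) {
    assert(r < 32);
    return r ? (v << r) | (v >> (32 - r)) : v;
}
```

For example, a control byte of `0xe4` assigns modes 0, 1, 2 and 3 to the four bytes of a channel, and a 16-bit delta of -1 (`0xffff`) is zigzag-encoded as `1`.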

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 9374565aff..3b81808c2d 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -197,13 +197,13 @@ To facilitate efficient decompression, deinterleaving and delta encoding are per
 
 The encoded stream structure is as follows:
 
-- Header byte, which must be equal to `0xa0`
+- Header byte, which must be equal to `0xa1`
 - One or more attribute blocks, detailed below
-- Tail block, which consists of a baseline element stored verbatim, padded to 32 bytes
+- Tail block, which consists of a baseline element stored verbatim, followed by channel information, padded to 24 bytes
 
 Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
 
-Each attribute block stores a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
+Each attribute block encodes a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
 
 ```
 maxBlockElements = min((8192 / byteStride) & ~15, 256)
@@ -212,7 +212,11 @@ blockElements = min(remainingElements, maxBlockElements)
 
 Where `remainingElements` is the number of elements that have yet to be decoded.
 
-Each attribute block consists of `byteStride` "data blocks" (one for each byte of the element), and each "data block" contains deltas stored for groups of elements. Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
+Each attribute block consists of:
+- Control header: `byteStride / 4` bytes specifying 4 control modes for each 4-byte channel
+- `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements
+
+Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
 
 ```
 groupCount = ceil(blockElements / 16)
@@ -220,7 +224,20 @@ groupCount = ceil(blockElements / 16)
 
 For example, a stream with a `byteStride` of 64 containing 200 elements would be broken up into two attribute blocks: one containing 128 elements, and the other containing 72 elements. And these blocks would have 8 and 5 groups, respectively.
 
-The structure of each "data block" breaks down as follows:
+The control header contains 2 bits for each byte position, packed into bytes as follows:
+
+```
+controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2 << 4) | (controlForByte3 << 6)
+```
+
+The control bits specify the control mode for each byte:
+
+- bits 0: Use bit lengths `{0, 1, 2, 4}` for encoding
+- bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding
+- bits 2: All byte deltas are 0; no data is stored for this byte
+- bits 3: Literal encoding; byte deltas are stored uncompressed
+
+The structure of each "data block" (when not using control mode 2 or 3) breaks down as follows:
 - Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
 - Delta blocks, with variable number of bytes stored for each group
 
@@ -230,14 +247,27 @@ Header bits are stored from least significant to most significant bit - header b
 (headerBitsForGroup0 << 0) | (headerBitsForGroup1 << 2) | (headerBitsForGroup2 << 4) | (headerBitsForGroup3 << 6)
 ```
 
-The header bits establish the delta encoding mode (0-3) for each group of 16 elements that follows:
+The header bits establish the delta encoding mode for each group of 16 elements:
 
+For control mode 0:
 - bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
+- bits 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
+- bits 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- bits 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+
+For control mode 1:
+- bits 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
 - bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
 - bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
 - bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
 
-When using the sentinel encoding, each delta is stored as a 2-bit or 4-bit value in a single 4-byte or 8-byte block, with deltas stored from most significant to least significant bit inside the byte. That is, the 2-bit encoding is packed as follows with 4 deltas per byte:
+When using the sentinel encoding, each delta is stored as a 1-bit, 2-bit, or 4-bit value in packed bytes. For 2-bit and 4-bit encodings, deltas are stored from most significant to least significant bit inside the byte. For 1-bit encoding, deltas are stored from least significant to most significant bit to facilitate better reuse of lookup tables in efficient implementations. The 1-bit encoding is packed as follows with 8 deltas per byte:
+
+```
+(delta0 << 0) | (delta1 << 1) | (delta2 << 2) | (delta3 << 3) | (delta4 << 4) | (delta5 << 5) | (delta6 << 6) | (delta7 << 7)
+```
+
+The 2-bit encoding is packed as follows with 4 deltas per byte:
 
 ```
 (delta3 << 0) | (delta2 << 2) | (delta1 << 4) | (delta0 << 6)
@@ -246,14 +276,14 @@ When using the sentinel encoding, each delta is stored as a 2-bit or 4-bit value
 And the 4-bit encoding is packed as follows with 2 deltas per byte:
 
 ```
-(delta1 << 0) | (delta1 << 4)
+(delta1 << 0) | (delta0 << 4)
 ```
 
-Note that this is not the same order as the packing of the header bits found above.
+A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` for 2-bit encoding, and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the bit range, and is stored as a full byte after the bit deltas for this group.
 
-A delta that has all bits set to 1 (corresponds to `3` for 2-bit encoding and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the 2-bit or 4-bit range, and is stored as a full byte after the bit deltas for this group.
+Delta encoding varies by channel type (specified in the tail block):
 
-Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
+**Channel 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
 
 ```
 encode(uint8_t v) = ((v & 0x80) != 0) ? ~(v << 1) : (v << 1)
@@ -274,6 +304,21 @@ Encodes 16 deltas, where the first 8 bytes of the sequence specifies 16 4-bit de
 
 Finally, note that the deltas are computed in 8-bit integer space with wrap-around two-complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
 
+**Channel 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between consecutive 2-byte values:
+
+```
+encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1)
+decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
+```
+
+The deltas are computed in 16-bit integer space with wrap-around two's complement arithmetic.
+
+**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel specification:
+
+```
+rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
+```
+
 ## Mode 1: triangles
 
 Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte.

From 6f23bd051f57e26f6175588506fccbcfa9d2af35 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 22:55:33 -0700
Subject: [PATCH 06/54] Use Unix line endings

---
 .../Khronos/KHR_meshopt_compression/README.md | 1286 ++++++++---------
 1 file changed, 643 insertions(+), 643 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 3b81808c2d..31ae441bb7 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -1,643 +1,643 @@
-# KHR\_meshopt\_compression
-
-## Contributors
-
-* Arseny Kapoulkine, [@zeuxcg](https://twitter.com/zeuxcg)
-* Jasper St. Pierre, [@JasperRLZ](https://twitter.com/JasperRLZ)
-
-## Status
-
-Complete, Ratified by the Khronos Group
-
-## Dependencies
-
-Written against the glTF 2.0 spec.
-
-## Overview
-
-glTF files come with a variety of binary data - vertex attribute data, index data, morph target deltas, animation inputs/outputs - that can be a substantial fraction of the overall transmission size. To optimize for delivery size, general-purpose compression such as gzip can be used - however, it often doesn't capture some common types of redundancy in glTF binary data.
-
-This extension provides a generic option for compressing binary data that is tailored to the common types of data seen in glTF buffers. The extension works on a bufferView level and as such is agnostic of how the data is used, supporting geometry (vertex and index data, including morph targets), animation (keyframe time and values) and other data, such as instance transforms for `EXT_mesh_gpu_instancing`.
-
-Similarly to supercompressed textures (see `KHR_texture_basisu`), this extension assumes that the buffer view data is optimized for GPU efficiency - using quantization and using optimal data order for GPU rendering - and provides a compression layer on top of bufferView data. Each bufferView is compressed in isolation which allows the loaders to maximally efficiently decompress the data directly into GPU storage.
-
-The compressed format is designed to have two properties beyond optimizing compression ratio - very fast decoding (using WebAssembly SIMD, the decoders run at \~1 GB/sec on modern desktop hardware), and byte-wise storage compatible with general-purpose compression. That is, instead of reducing the encoded size as much as possible, the bitstream is constructed in such a way that general-purpose compressor can compress it further.
-
-This is beneficial for typical Web delivery scenarios, where all files are usually using gzip compression - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when gzip compression isn't available, and additionally reduces the performance impact of gzip decompression which is typically *much slower* than decoders proposed here).
-
-## Specifying compressed views
-
-As explained in the overview, this extension operates on bufferViews. This allows the loaders to directly decompress data into GPU memory and minimizes the JSON size impact of specifying compressed data. To specify the compressed representation, `KHR_meshopt_compression` extension section overrides the source buffer index as well as specifying the buffer parameters and a compression mode/filter (detailed later in the specification):
-
-```json
-{
-	"buffer": 1,
-	"byteOffset": 0,
-	"byteLength": 2368,
-	"byteStride": 16,
-	"target": 34962,
-	"extensions": {
-		"KHR_meshopt_compression": {
-			"buffer": 0,
-			"byteOffset": 1024,
-			"byteLength": 347,
-			"byteStride": 16,
-			"mode": "ATTRIBUTES",
-			"count": 148
-		}
-	}
-}
-```
-
-In this example, the uncompressed buffer contents is stored in buffer 1 (this can be used by loaders that don't implement this extension). The compressed data is stored in a separate buffer, specifying a separate byte range (with compressed data). Note that for compressors to work, they need to know the compression `mode`, `filter` (for `"ATTRIBUTES"` mode), and additionally the layout of the encoded data - `count` elements with `byteStride` bytes each. This data is specified in the extension JSON; while in some cases `byteStride` is available on the parent `bufferView` declaration, JSON schema prohibits specifying this for some types of storage such as index data.
-
-## JSON schema updates
-
-Each `bufferView` can contain an extension object with the following properties:
-
-| Property | Type | Description | Required |
-|:---------|:--------------|:------------------------------------------| :--------------------------|
-| `buffer` | `integer` | The index of the buffer with compressed data. | :white_check_mark: Yes |
-| `byteOffset` | `integer` | The offset into the buffer in bytes. | No, default: `0` |
-| `byteLength` | `integer` | The length of the compressed data in bytes. | :white_check_mark: Yes |
-| `byteStride` | `integer` | The stride, in bytes. | :white_check_mark: Yes |
-| `count` | `integer` | The number of elements. | :white_check_mark: Yes |
-| `mode` | `string` | The compression mode. | :white_check_mark: Yes |
-| `filter` | `string` | The compression filter. | No, default: `"NONE"` |
-
-`mode` represents the compression mode using an enumerated value that must be one of `"ATTRIBUTES"`, `"TRIANGLES"`, `"INDICES"`.
-
-`filter` represents the post-decompression filter using an enumerated value that must be one of `"NONE"`, `"OCTAHEDRAL"`, `"QUATERNION"`, `"EXPONENTIAL"`, `"COLOR"`.
-
-For the extension object to be valid, the following must hold:
-
-- When parent `bufferView` has `byteStride` defined, it matches `byteStride` in the extension JSON
-- The parent `bufferView.byteLength` is equal to `byteStride` times `count`
-- When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be <= 256.
-- When `mode` is `"TRIANGLES"`, `count` must be divisible by 3
-- When `mode` is `"TRIANGLES"` or `"INDICES"`, `byteStride` must be equal to 2 or 4
-- When `mode` is `"TRIANGLES"` or `"INDICES"`, `filter` must be equal to `"NONE"` or omitted
-- When `filter` is `"OCTAHEDRAL"`, `byteStride` must be equal to 4 or 8
-- When `filter` is `"QUATERNION"`, `byteStride` must be equal to 8
-- When `filter` is `"EXPONENTIAL"`, `byteStride` must be divisible by 4
-- When `filter` is `"COLOR"`, `byteStride` must be equal to 4 or 8
-
-The type of compressed data must match the bitstream specification (note that each `mode` specifies a different bitstream format).
-
-The parent `bufferView` properties define a layout which can hold the data decompressed from the extension object.
-
-## Compression modes and filters
-
-Compression mode specifies the bitstream layout and the algorithm used to decompress the data, and can be one of:
-
-- Mode 0: attributes. Suitable for storing sequences of values of arbitrary size, relies on exploiting similarity between bytes of consecutive elements to reduce the size.
-- Mode 1: triangles. Suitable for storing indices that represent triangle lists, relies on exploiting topological redundancy of consecutive triangles.
-- Mode 2: indices. Suitable for storing indices that don't represent triangle lists, relies on exploiting similarity between consecutive elements.
-
-In all three modes, the resulting compressed byte sequence is typically noticeably smaller than the buffer view length, *and* can be additionally compressed by using a general purpose compression algorithm such as Deflate for the resulting glTF file (.glb/.bin).
-
-The format of the bitstream is specified in [Appendix A (Bitstream)](#appendix-a-bitstream).
-
-When using attribute encoding, for some types of data exploiting the redundancy between consecutive elements is not enough to achieve a good compression ratio; quantization can help but isn't always sufficient either. To that end, when using mode 0, this extension allows the further use of a compression filter, which transforms each element stored in the buffer view to make it more compressible with the attribute codec and often allows trading precision for compressed size. Filters don't change the size of the output data, they merely improve the compressed size by reducing entropy; note that the use of a compression filter restricts `byteStride`, which effectively prohibits storing interleaved data.
-
-Filter specifies the algorithm used to transform the data after decompression, and can be one of:
-
-- Filter 0: none. Attribute data is used as is.
-- Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding.
-- Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding.
-- Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision.
-- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg-R color space encoding.
-
-The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
-
-When using filters, the expectation is that the filter is applied after the attribute decoder on the contents of the resulting bufferView; the resulting data can then be used according to the referencing accessors without further modifications.
-
-**Non-normative** To decompress the data, [meshoptimizer](https://github.com/zeux/meshoptimizer) library may be used; it supports efficient decompression using C++ and/or WebAssembly, including fast SIMD implementation for attribute decoding.
-
-## Fallback buffers
-
-While the extension JSON specifies a separate buffer to source compressed data from, the parent `bufferView` must also have a valid `buffer` reference as per glTF 2.0 spec requirement. To produce glTF files that *require* support for this extension and don't have uncompressed data, the referenced buffer can contain no URI as follows:
-
-```json
-{ "byteLength": 1432878 }
-```
-
-The `byteLength` property of such a placeholder buffer **MUST** be sufficiently large to contain all uncompressed buffer views referencing it.
-
-When stored in a GLB file, the placeholder buffer should have index 1 or above, to avoid conflicts with GLB binary buffer.
-
-This extension allows buffers to be optionally tagged as fallback by using the `fallback` attribute as follows:
-
-```json
-{
-	"byteLength": 1432878,
-	"extensions": {
-		"KHR_meshopt_compression": {
-			"fallback": true
-		}
-	}
-}
-```
-
-This is useful to avoid confusion, and may also be used by loaders that support the extension to skip loading of these buffers.
-
-When a buffer is marked as a fallback buffer, the following must hold:
-
-- All references to the buffer must come from `bufferView`s that have a `KHR_meshopt_compression` extension specified
-- No references to the buffer may come from `KHR_meshopt_compression` extension JSON
-
-If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `KHR_meshopt_compression` must be a required extension.
-
-## Compressing geometry data
-
-> This section is non-normative.
-
-The codecs used by this extension can represent geometry exactly, replicating both vertex and index data without changes in contents or order. However, to get optimal compression, it's necessary to pre-process the data.
-
-To get optimal compression, encoders should optimize vertex and index data for locality of reference. Specifically:
-
-- Triangle order should be optimized to maximize the recency of previously encountered vertices; this is similar to optimizing meshes for vertex reuse aka post-transform cache in GPU hardware.
-- Vertex order should be linearized in the order that vertices appear in the index stream to get optimal index compression
-
-When index data is not available (e.g. point data sets) or represents topology with a lot of seams (e.g. each triangle has unique vertex indices because it specifies flat-shaded normal), encoders could additionally optimize vertex data for spatial locality, so that vertices close together in the vertex stream are close together in space.
-
-Vertex data should be quantized using the appropriate representation; this extension cleanly interacts with `KHR_mesh_quantization` by compressing already quantized data.
-
-Morph targets can be treated identically to other vertex attributes, as long as vertex order optimization is performed on all target streams at the same time. It is recommended to use quantized storage for morph target deltas, possibly with a narrower type than that used for baseline values.
-
-When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2. The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1.
-
-Using filter 1 (octahedral) for normal/tangent data may improve compression ratio further.
-
-## Compressing animation data
-
-> This section is non-normative.
-
-To minimize the size of animation data, it is important to reduce the number of stored keyframes and reduce the size of each keyframe.
-
-To reduce the number of keyframes, encoders can either selectively remove keyframes that don't contribute to the resulting movement, resulting in sparse input/output data, or resample the keyframes uniformly, resulting in uniformly dense data. Resampling can be beneficial since it means that all animation channels in the same animation can share the same input accessor, and provides a convenient quality vs size tradeoff, but it's up to the encoder to pick the optimal strategy.
-
-Additionally it's important to identify tracks with the same output value and use a single keyframe for these.
-
-To reduce the size of each keyframe, rotation data should be quantized using 16-bit normalized components; for additional compression, the use of filter 2 (quaternion) is recommended. Translation/scale data can be compressed using filter 3 (exponential) with the same exponent used for all three vector components.
-
-Note that animation inputs that specify time values require enough precision to avoid animation distortion. It's recommended to either not use any filters for animation inputs to avoid any precision loss (attribute encoder can still be efficient at reducing the size of animation input track even without filters when the inputs are uniformly spaced), or use filter 3 (exponential) with maximum mantissa bit count (23).
-
-After pre-processing, both input and output data should be stored using mode 0 (attributes).
-
-# Appendix A: Bitstream
-
-The following sections specify the format of the bitstream for compressed data for various modes.
-
-## Mode 0: attributes
-
-Attribute compression exploits similarity between consecutive elements of the buffer by encoding deltas. The deltas are stored for each separate byte which makes the codec more versatile since it can work with components of various sizes. Additionally, the elements are stored with bytes deinterleaved, which means that sequences of deltas are more easily compressible by some general purpose compressors that may run on the resulting data.
-
-To facilitate efficient decompression, deinterleaving and delta encoding are performed on attribute blocks instead of on the entire buffer; within each block, elements are processed in groups of 16.
-
-The encoded stream structure is as follows:
-
-- Header byte, which must be equal to `0xa1`
-- One or more attribute blocks, detailed below
-- Tail block, which consists of a baseline element stored verbatim, followed by channel information, padded to 24 bytes
-
-Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
-
-Each attribute block encodes a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
-
-```
-maxBlockElements = min((8192 / byteStride) & ~15, 256)
-blockElements = min(remainingElements, maxBlockElements)
-```
-
-Where `remainingElements` is the number of elements that have yet to be decoded.
-
-Each attribute block consists of:
-- Control header: `byteStride / 4` bytes specifying 4 control modes for each 4-byte channel
-- `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements
-
-Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
-
-```
-groupCount = ceil(blockElements / 16)
-```
-
-For example, a stream with a `byteStride` of 64 containing 200 elements would be broken up into two attribute blocks: one containing 128 elements, and the other containing 72 elements. And these blocks would have 8 and 5 groups, respectively.
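
Annotation on the block-size rules above: the two formulas can be checked with a short sketch. The function name `attribute_block_layout` is hypothetical and is not part of the specification or of meshoptimizer; it only applies `maxBlockElements`, `blockElements` and `groupCount` as stated.

```python
def attribute_block_layout(count, byte_stride):
    """Return a list of (blockElements, groupCount) pairs for a mode 0 stream."""
    # maxBlockElements = min((8192 / byteStride) & ~15, 256)
    max_block_elements = min((8192 // byte_stride) & ~15, 256)
    blocks = []
    remaining = count
    while remaining > 0:
        block_elements = min(remaining, max_block_elements)
        # groupCount = ceil(blockElements / 16), computed with integer math
        group_count = (block_elements + 15) // 16
        blocks.append((block_elements, group_count))
        remaining -= block_elements
    return blocks


print(attribute_block_layout(200, 64))  # [(128, 8), (72, 5)]
```

The printed result matches the example in the text: 200 elements with a `byteStride` of 64 split into blocks of 128 and 72 elements, with 8 and 5 groups.
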
-
-The control header contains 2 bits for each byte position, packed into bytes as follows:
-
-```
-controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2 << 4) | (controlForByte3 << 6)
-```
-
-The control bits specify the control mode for each byte:
-
-- bits 0: Use bit lengths `{0, 1, 2, 4}` for encoding
-- bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding  
-- bits 2: All byte deltas are 0; no data is stored for this byte
-- bits 3: Literal encoding; byte deltas are stored uncompressed
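
Annotation on the control header layout above: a minimal sketch of unpacking one control byte into the four 2-bit control modes of a 4-byte channel. The helper name `unpack_control_byte` is an assumption made for illustration only.

```python
def unpack_control_byte(control_byte):
    """Split one control byte into the control modes for bytes 0..3 of a channel."""
    # Byte 0 occupies bits 0-1, byte 1 bits 2-3, byte 2 bits 4-5, byte 3 bits 6-7.
    return [(control_byte >> (2 * i)) & 3 for i in range(4)]


# 0b11_10_01_00: byte 0 uses mode 0, byte 1 mode 1, byte 2 mode 2, byte 3 mode 3.
print(unpack_control_byte(0b11100100))  # [0, 1, 2, 3]
```
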
-
-The structure of each "data block" (when not using control mode 2 or 3) breaks down as follows:
-- Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
-- Delta blocks, with variable number of bytes stored for each group
-
-Header bits are stored from least significant to most significant bit - header bits for 4 consecutive groups are packed in a byte together as follows:
-
-```
-(headerBitsForGroup0 << 0) | (headerBitsForGroup1 << 2) | (headerBitsForGroup2 << 4) | (headerBitsForGroup3 << 6)
-```
-
-The header bits establish the delta encoding mode for each group of 16 elements:
-
-For control mode 0:
-- bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
-- bits 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
-- bits 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
-- bits 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
-
-For control mode 1:
-- bits 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
-- bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
-- bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
-- bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
-
-When using the sentinel encoding, each delta is stored as a 1-bit, 2-bit, or 4-bit value in packed bytes. For 2-bit and 4-bit encodings, deltas are stored from most significant to least significant bit inside the byte. For 1-bit encoding, deltas are stored from least significant to most significant bit to facilitate better reuse of lookup tables in efficient implementations. The 1-bit encoding is packed as follows with 8 deltas per byte:
-
-```
-(delta0 << 0) | (delta1 << 1) | (delta2 << 2) | (delta3 << 3) | (delta4 << 4) | (delta5 << 5) | (delta6 << 6) | (delta7 << 7)
-```
-
-The 2-bit encoding is packed as follows with 4 deltas per byte:
-
-```
-(delta3 << 0) | (delta2 << 2) | (delta1 << 4) | (delta0 << 6)
-```
-
-And the 4-bit encoding is packed as follows with 2 deltas per byte:
-
-```
-(delta1 << 0) | (delta0 << 4)
-```
-
-A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` for 2-bit encoding, and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the bit range, and is stored as a full byte after the bit deltas for this group.
-
-Delta encoding varies by channel type (specified in the tail block):
-
-**Channel 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
-
-```
-encode(uint8_t v) = ((v & 0x80) != 0) ? ~(v << 1) : (v << 1)
-decode(uint8_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
-```
-
-For a complete example, assuming 4-bit sentinel coding, the following byte sequence:
-
-```
-0x17 0x5f 0xf0 0xbc 0x77 0xa9 0x21 0x00 0x34 0xb5
-```
-
-Encodes 16 deltas, where the first 8 bytes of the sequence specifies 16 4-bit deltas, and the last 2 bytes of the sequence specify the explicit delta code values encoded for elements 3 and 4 in the sequence. After de-zigzagging, the decoded deltas look like:
-
-```
--1 -4 -3 26 -91 0 -6 6 -4 -4 5 -5 1 -1 0 0
-```
-
-Finally, note that the deltas are computed in 8-bit integer space with wrap-around two's complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
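
Annotation on the worked example above: the 10-byte sequence can be decoded with a few lines of Python. This is a sketch for a single group with 4-bit sentinel coding only; `unzigzag8` and `decode_group_4bit` are hypothetical names, and a real decoder would also apply the deltas to the previous element with 8-bit wrap-around.

```python
def unzigzag8(v):
    """decode(v) from the text: odd codes map to negative deltas."""
    return ~(v >> 1) if v & 1 else v >> 1


def decode_group_4bit(data):
    """Decode 16 deltas: 8 packed bytes, then one full byte per sentinel (0xf)."""
    packed, extra = data[:8], iter(data[8:])
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # delta0 lives in the high nibble
        codes.append(byte & 0x0F)  # delta1 lives in the low nibble
    # A sentinel means the real zigzag code follows as a full byte after the packed bytes.
    return [unzigzag8(next(extra) if c == 0x0F else c) for c in codes]


group = bytes([0x17, 0x5F, 0xF0, 0xBC, 0x77, 0xA9, 0x21, 0x00, 0x34, 0xB5])
print(decode_group_4bit(group))
# [-1, -4, -3, 26, -91, 0, -6, 6, -4, -4, 5, -5, 1, -1, 0, 0]
```
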
-
-**Channel 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between consecutive 2-byte values:
-
-```
-encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1)
-decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
-```
-
-The deltas are computed in 16-bit integer space with wrap-around two's complement arithmetic.
-
-**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel specification:
-
-```
-rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
-```
-
-## Mode 1: triangles
-
-Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte.
-
-The encoded stream structure is as follows:
-
-- Header byte, which must be equal to `0xe1`
-- Triangle codes, referred to as `code` below, with a single byte for each triangle
-- Extra data which is necessary to decode triangles that don't fit into a single byte, referred to as `data` below
-- Tail block, which consists of a 16-byte lookup table, referred to as `codeaux` below
-
-Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
-
-There are two limitations on the structure of the 16-byte lookup table:
-
-- The last two bytes must be 0
-- The remaining bytes must not contain any nibbles equal to `0xf`.
-
-During the decoding process, the decoder maintains four variables:
-
-- `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0
-- `last`: an integer referring to the last encoded index, starts at 0
-- `edgefifo`: a 16-entry FIFO with two vertex indices in each entry; initial contents is undefined
-- `vertexfifo`: a 16-entry FIFO with a vertex index in each entry; initial contents is undefined
-
-To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file.
-
-When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)), which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
-
-```
-0x7f => 0x7f
-0x81 0x04 => 0x201
-0xff 0xa0 0x05 => 0x1507f
-```
-
-Instead of using the raw index value, a zigzag-encoded 32-bit delta from `last` is used:
-
-```
-uint32_t decodeIndex(uint32_t v) {
-	int32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
-
-	last += delta;
-	return last;
-}
-```
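
Annotation on the varint-7 and `decodeIndex` description above: a minimal sketch of both steps. The names `read_varint7` and `decode_index` are hypothetical, and passing `last` explicitly (instead of keeping it as decoder state) is a choice made here for brevity. The 32-bit mask mirrors the `uint32_t` arithmetic of the pseudocode.

```python
def read_varint7(data, pos):
    """Read one unsigned LEB128 value starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # 7 payload bits, least significant group first
        shift += 7
        if byte < 0x80:  # a clear most significant bit terminates the sequence
            return value, pos


def decode_index(v, last):
    """Apply the zigzag-encoded delta v to last and return the new index."""
    delta = ~(v >> 1) if v & 1 else v >> 1
    return (last + delta) & 0xFFFFFFFF  # wrap like uint32_t


value, next_pos = read_varint7(bytes([0x81, 0x04]), 0)
print(hex(value), next_pos)    # 0x201 2
print(decode_index(4, 10))     # 12 (delta +2)
print(decode_index(3, 10))     # 8  (delta -2)
```
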
-
-The encoding for `code` is split into various cases, some of which are self-sufficient and some need to read extra data. The encoding is detailed below; after either path the triangle (a, b, c) is emitted to the output.
-
-- `0xX0`, where `X < 0xf`: Encodes a recently encountered edge and a `next` vertex.
-
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is equal to `next` (which is then incremented).
-
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex c is pushed to the vertex FIFO.
-
-- `0xXY`, where `X < 0xf` and `0 < Y < 0xd`: Encodes a recently encountered edge and a recently encountered vertex.
-
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is read from the vertex FIFO at index Y (where 0 is the most recently added vertex; note that 0 is never actually read here, since `Y > 0`).
-
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-
-- `0xXd` or `0xXe`, where `X < 0xf`: Encodes a recently encountered edge and a vertex that's adjacent to `last`.
-
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is equal to `last-1` for `0xXd` and `last+1` for `0xXe`.
-
-`last` is set to `c` (effectively decrementing or incrementing it accordingly).
-
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex c is pushed to the vertex FIFO.
-
-- `0xXf`, where `X < 0xf`: Encodes a recently encountered edge and a free-standing vertex encoded explicitly.
-
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is decoded using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
-
-Edge (c, b) is pushed to edge FIFO.
-Edge (a, c) is pushed to edge FIFO.
-Vertex c is pushed to the vertex FIFO.
-
-- `0xfY`, where `Y < 0xe`: Encodes three indices using `codeaux` table lookup and vertex FIFO.
-
-The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`.
-
-The first index, `a`, is equal to `next`; `next` is incremented to decode b/c correctly.
-The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), or is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
-The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), or is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
-
-Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W.
-
-Edge (b, a) is pushed to the edge FIFO.
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex a is pushed to the vertex FIFO.
-Vertex b is pushed to the vertex FIFO if `Z == 0`.
-Vertex c is pushed to the vertex FIFO if `W == 0`.
-
-- `0xfe` or `0xff`: Encodes three indices explicitly.
-
-This requires an extra byte that is read from `data`; let's assume that results in `0xZW`. Note that this is *not* an LEB128 value, just a single byte.
-
-If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing.
-
-The first index, `a`, is equal to `next` for `0xfe` encoding (`next` is then incremented), or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
-The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex) if `Z < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `Z == 0xf`.
-The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex) if `W < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `W == 0xf`.
-
-Edge (b, a) is pushed to the edge FIFO.
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex a is pushed to the vertex FIFO.
-Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`.
-Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`.
-
-At the end of the decoding, `data` is expected to be fully read by all the triangle codes and not contain any extra bytes.
-
-## Mode 2: indices
-
-Index compression exploits similarity between consecutive indices. Note that, unlike the triangle index compression (mode 1), this mode doesn't assume a specific topology and as such is less efficient in terms of the resulting size. However, unlike mode 1, this mode can be used to compress triangle strips, line lists and other types of mesh index data, and can additionally be used to compress non-mesh index data such as sparse indices for accessors.
-
-The encoded stream structure is as follows:
-
-- Header byte, which must be equal to `0xd1`
-- A sequence of index deltas, with encoding specified below
-- Tail block, which consists of 4 bytes that are reserved and should be set to 0
-
-Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
-
-To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
-
-```
-0x7f => 0x7f
-0x81 0x04 => 0x201
-0xff 0xa0 0x05 => 0x1507f
-```
-
-When decoding the deltas, the 32-bit value is read using the varint-7 encoding. The least significant bit of the value indicates one of the baseline values; the remaining bits specify a zigzag-encoded signed delta and can be decoded as follows:
-
-```
-uint32_t decode(uint32_t v) {
-	int32_t baseline = v & 1;
-	int32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);
-
-	last[baseline] += delta;
-	return last[baseline];
-}
-```
-
-It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with baseline bit always set to 0) as well as more complex bimodal encodings.
-
-Note that the zigzag-encoded delta must fit in a 31-bit integer; as such, deltas are limited to [-2^30..2^30-1].
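
Annotation on the mode 2 description above: a compact sketch of a decoder for the whole stream (header byte, varint-7 deltas, 4-byte tail). The function name `decode_index_stream` and its error handling are assumptions, not taken from meshoptimizer; the sketch skips the reserved tail without inspecting it.

```python
def decode_index_stream(stream, count):
    """Decode `count` indices from a mode 2 (indices) stream."""
    if len(stream) < 5 or stream[0] != 0xD1:
        raise ValueError("not a mode 2 stream")
    data_end = len(stream) - 4  # the last 4 bytes are the reserved tail block
    pos = 1
    last = [0, 0]  # two baseline values, both starting at 0
    out = []
    for _ in range(count):
        v = 0
        shift = 0
        while True:  # varint-7 (unsigned LEB128)
            if pos >= data_end:
                raise ValueError("truncated stream")
            byte = stream[pos]
            pos += 1
            v |= (byte & 0x7F) << shift
            shift += 7
            if byte < 0x80:
                break
        baseline = v & 1  # bit 0 selects the baseline
        delta = ~(v >> 2) if v & 2 else v >> 2  # remaining bits: zigzag delta
        last[baseline] = (last[baseline] + delta) & 0xFFFFFFFF
        out.append(last[baseline])
    return out


# Indices 0, 1, 2, 5, 3 encoded against baseline 0 only.
stream = bytes([0xD1, 0x00, 0x04, 0x04, 0x0C, 0x06, 0, 0, 0, 0])
print(decode_index_stream(stream, 5))  # [0, 1, 2, 5, 3]
```
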
-
-# Appendix B: Filters
-
-Filters are functions that transform each encoded attribute. For each filter, this document specifies the transformation used for decoding the data; it's up to the encoder to pick the parameters of the encoding for each element to balance quality and precision.
-
-For performance reasons the results of the decoding process are specified to one unit in last place (ULP) in terms of the decoded data, e.g. if a filter results in a 16-bit signed normalized integer, decoding may produce results within 1/32767 of specified value.
-
-## Filter 1: octahedral
-
-The octahedral filter encodes unit length 3D vectors (normals/tangents) using octahedral encoding, which results in a better quality vs precision tradeoff compared to storing raw components.
-
-This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components.
-
-The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents).
-
-The encoding of the third component allows to compute K for each vector independently from the bit representation, and must encode 1.0 precisely which is equivalent to `(1 << (K - 1)) - 1` as an integer; values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid and the result of decoding such vectors is unspecified.
-
-When storing a K-bit integer in a 8-bit of 16-bit component when K is not 8 or 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
-
-The output of the filter is three decoded unit vector components, stored as 8-bit or 16-bit normalized integers, and the last input component verbatim.
-
-```
-void decode(intN_t input[4], intN_t output[4]) {
-	// input[2] encodes a K-bit representation of 1.0
-	float32_t one = input[2];
-
-	float32_t x = input[0] / one;
-	float32_t y = input[1] / one;
-	float32_t z = 1.0 - abs(x) - abs(y);
-
-	// octahedral fixup for negative hemisphere
-	float32_t t = min(z, 0.0);
-
-	x -= copysign(t, x);
-	y -= copysign(t, y);
-
-	// renormalize (x, y, z)
-	float32_t len = sqrt(x * x + y * y + z * z);
-
-	x /= len;
-	y /= len;
-	z /= len;
-
-	output[0] = round(x * INTN_MAX);
-	output[1] = round(y * INTN_MAX);
-	output[2] = round(z * INTN_MAX);
-	output[3] = input[3];
-}
-```
-
-`INTN_MAX` is equal to 127 when using 8-bit components (N is 8) and equal to 32767 when using 16-bit components (N is 16).
-
-`copysign` behaves as specified in C99 and returns the value with the magnitude of the first argument and the sign of the second argument.
-
-## Filter 2: quaternion
-
-Quaternion filter allows to encode unit length quaternions using normalized 16-bit integers for all components, but allows control over the precision used for the components and provides better quality compared to naively encoding each component one by one.
-
-This filter is only valid if `byteStride` is 8.
-
-The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
-
-When storing a K-bit integer in a 16-bit component when K is not 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
-
-The output of the filter is four decoded quaternion components, stored as 16-bit normalized integers.
-
-After eliminating the maximum component, the maximum magnitude of the remaining components is 1/sqrt(2). Because of this the input components store the original component value scaled by sqrt(2.0) to increase precision.
-
-```
-void decode(int16_t input[4], int16_t output[4]) {
-	float32_t range = 1.0 / sqrt(2.0);
-
-	// input[3] encodes a K-bit representation of 1.0 except for bottom two bits
-	float32_t one = input[3] | 3;
-
-	float32_t x = input[0] / one * range;
-	float32_t y = input[1] / one * range;
-	float32_t z = input[2] / one * range;
-
-	float32_t w = sqrt(max(0.0, 1.0 - x * x - y * y - z * z));
-
-	int maxcomp = input[3] & 3;
-
-	// maxcomp specifies a cyclic rotation of the quaternion components
-	output[(maxcomp + 1) % 4] = round(x * 32767.0);
-	output[(maxcomp + 2) % 4] = round(y * 32767.0);
-	output[(maxcomp + 3) % 4] = round(z * 32767.0);
-	output[(maxcomp + 0) % 4] = round(w * 32767.0);
-}
-```
-
-## Filter 3: exponential
-
-Exponential filter allows to encode floating point values with a range close to the full range of a 32-bit floating point value, but allows more control over the exponent/mantissa to trade quality for precision, and has a bit structure that is more optimally aligned to the byte boundary to facilitate better compression.
-
-This filter is only valid if `byteStride` is a multiple of 4.
-
-The input to the filter is a sequence of 32-bit little endian integers, with the most significant 8 bits specifying a (signed) exponent value, and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two-complement format.
-
-The result of the filter is 2^e * m:
-
-```
-float32_t decode(int32_t input) {
-	int32_t e = input >> 24;
-	int32_t m = (input << 8) >> 8;
-	return pow(2.0, e) * m;
-}
-```
-
-The valid range of `e` is [-100, +100], which facilitates performant implementations. Decoding out of range values results in unspecified behavior, and encoders are expected to clamp `e` to the valid range.
-
-## Filter 4: color
-
-Color filter allows to encode color data using YCoCg-R color space transformation, which results in better compression for typical color data by exploiting correlation between color channels.
-
-This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
-
-The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value, the second component stores the Co (orange chrominance) value, the third component stores the Cg (green chrominance) value, and the fourth component stores the alpha value with the bit K used for scaling information.
-
-The transformation uses YCoCg-R encoding where:
-
-- Y = R/2 + G/2
-- Co = (R - B) / 2
-- Cg = (G - (R + B) / 2) / 2
-
-The alpha component uses K-1 bits for the alpha value with the high bit set to 1, where K is the bit depth (8 or 16).
-
-The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
-
-```
-void decode(intN_t input[4], intN_t output[4]) {
-	// recover scale from alpha high bit
-	int as = (1 << (firstbitset(input[3]) + 1)) - 1;
-
-	// convert to RGB in fixed point
-	int y = input[0], co = input[1], cg = input[2];
-
-	int r = y + co - cg;
-	int g = y + cg;
-	int b = y - co - cg;
-
-	// expand alpha by one bit to match other components, replicating last bit
-	int a = input[3] & (as >> 1);
-	a = (a << 1) | (a & 1);
-
-	// compute scaling factor
-	float ss = INTN_MAX / float(as);
-
-	// rounded float->int
-	output[0] = int(float(r) * ss + 0.5f);
-	output[1] = int(float(g) * ss + 0.5f);
-	output[2] = int(float(b) * ss + 0.5f);
-	output[3] = int(float(a) * ss + 0.5f);
-}
-```
-
-`INTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16).
-
-# Appendix C: Differences from EXT_meshopt_compression
-
-This extension is derived from `EXT_meshopt_compression` with the following changes:
-
-- Vertex data uses upgraded v1 format which provides more types of bit packing and delta encoding to compress data better
-- Added `COLOR` filter to support lossy color compression at smaller compression ratios
-
-These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.
+# KHR\_meshopt\_compression
+
+## Contributors
+
+* Arseny Kapoulkine, [@zeuxcg](https://twitter.com/zeuxcg)
+* Jasper St. Pierre, [@JasperRLZ](https://twitter.com/JasperRLZ)
+
+## Status
+
+Complete, Ratified by the Khronos Group
+
+## Dependencies
+
+Written against the glTF 2.0 spec.
+
+## Overview
+
+glTF files come with a variety of binary data - vertex attribute data, index data, morph target deltas, animation inputs/outputs - that can be a substantial fraction of the overall transmission size. To optimize for delivery size, general-purpose compression such as gzip can be used - however, it often doesn't capture some common types of redundancy in glTF binary data.
+
+This extension provides a generic option for compressing binary data that is tailored to the common types of data seen in glTF buffers. The extension works on a bufferView level and as such is agnostic of how the data is used, supporting geometry (vertex and index data, including morph targets), animation (keyframe time and values) and other data, such as instance transforms for `EXT_mesh_gpu_instancing`.
+
+Similarly to supercompressed textures (see `KHR_texture_basisu`), this extension assumes that the buffer view data is optimized for GPU efficiency - using quantization and using optimal data order for GPU rendering - and provides a compression layer on top of bufferView data. Each bufferView is compressed in isolation, which allows loaders to decompress the data directly into GPU storage as efficiently as possible.
+
+The compressed format is designed to have two properties beyond optimizing compression ratio - very fast decoding (using WebAssembly SIMD, the decoders run at \~1 GB/sec on modern desktop hardware), and byte-wise storage compatible with general-purpose compression. That is, instead of reducing the encoded size as much as possible, the bitstream is constructed in such a way that a general-purpose compressor can compress it further.
+
+This is beneficial for typical Web delivery scenarios, where files are usually served with gzip compression - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when gzip compression isn't available, and additionally reduces the performance impact of gzip decompression, which is typically *much slower* than the decoders proposed here).
+
+## Specifying compressed views
+
+As explained in the overview, this extension operates on bufferViews. This allows the loaders to directly decompress data into GPU memory and minimizes the JSON size impact of specifying compressed data. To specify the compressed representation, the `KHR_meshopt_compression` extension section overrides the source buffer index and specifies the buffer parameters and a compression mode/filter (detailed later in the specification):
+
+```json
+{
+	"buffer": 1,
+	"byteOffset": 0,
+	"byteLength": 2368,
+	"byteStride": 16,
+	"target": 34962,
+	"extensions": {
+		"KHR_meshopt_compression": {
+			"buffer": 0,
+			"byteOffset": 1024,
+			"byteLength": 347,
+			"byteStride": 16,
+			"mode": "ATTRIBUTES",
+			"count": 148
+		}
+	}
+}
+```
+
+In this example, the uncompressed buffer contents are stored in buffer 1 (this can be used by loaders that don't implement this extension). The compressed data is stored in a separate buffer, specifying a separate byte range (with compressed data). Note that for decompressors to work, they need to know the compression `mode`, `filter` (for `"ATTRIBUTES"` mode), and additionally the layout of the encoded data - `count` elements with `byteStride` bytes each. This data is specified in the extension JSON; while in some cases `byteStride` is available on the parent `bufferView` declaration, JSON schema prohibits specifying this for some types of storage such as index data.
+
+## JSON schema updates
+
+Each `bufferView` can contain an extension object with the following properties:
+
+| Property | Type | Description | Required |
+|:---------|:--------------|:------------------------------------------| :--------------------------|
+| `buffer` | `integer` | The index of the buffer with compressed data. | :white_check_mark: Yes |
+| `byteOffset` | `integer` | The offset into the buffer in bytes. | No, default: `0` |
+| `byteLength` | `integer` | The length of the compressed data in bytes. | :white_check_mark: Yes |
+| `byteStride` | `integer` | The stride, in bytes. | :white_check_mark: Yes |
+| `count` | `integer` | The number of elements. | :white_check_mark: Yes |
+| `mode` | `string` | The compression mode. | :white_check_mark: Yes |
+| `filter` | `string` | The compression filter. | No, default: `"NONE"` |
+
+`mode` represents the compression mode using an enumerated value that must be one of `"ATTRIBUTES"`, `"TRIANGLES"`, `"INDICES"`.
+
+`filter` represents the post-decompression filter using an enumerated value that must be one of `"NONE"`, `"OCTAHEDRAL"`, `"QUATERNION"`, `"EXPONENTIAL"`, `"COLOR"`.
+
+For the extension object to be valid, the following must hold:
+
+- When parent `bufferView` has `byteStride` defined, it matches `byteStride` in the extension JSON
+- The parent `bufferView.byteLength` is equal to `byteStride` times `count`
+- When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be <= 256.
+- When `mode` is `"TRIANGLES"`, `count` must be divisible by 3
+- When `mode` is `"TRIANGLES"` or `"INDICES"`, `byteStride` must be equal to 2 or 4
+- When `mode` is `"TRIANGLES"` or `"INDICES"`, `filter` must be equal to `"NONE"` or omitted
+- When `filter` is `"OCTAHEDRAL"`, `byteStride` must be equal to 4 or 8
+- When `filter` is `"QUATERNION"`, `byteStride` must be equal to 8
+- When `filter` is `"EXPONENTIAL"`, `byteStride` must be divisible by 4
+- When `filter` is `"COLOR"`, `byteStride` must be equal to 4 or 8
+
+The type of compressed data must match the bitstream specification (note that each `mode` specifies a different bitstream format).
+
+The parent `bufferView` properties define a layout which can hold the data decompressed from the extension object.
+
+## Compression modes and filters
+
+Compression mode specifies the bitstream layout and the algorithm used to decompress the data, and can be one of:
+
+- Mode 0: attributes. Suitable for storing sequences of values of arbitrary size, relies on exploiting similarity between bytes of consecutive elements to reduce the size.
+- Mode 1: triangles. Suitable for storing indices that represent triangle lists, relies on exploiting topological redundancy of consecutive triangles.
+- Mode 2: indices. Suitable for storing indices that don't represent triangle lists, relies on exploiting similarity between consecutive elements.
+
+In all three modes, the resulting compressed byte sequence is typically noticeably smaller than the buffer view length, *and* can be additionally compressed by using a general purpose compression algorithm such as Deflate for the resulting glTF file (.glb/.bin).
+
+The format of the bitstream is specified in [Appendix A (Bitstream)](#appendix-a-bitstream).
+
+When using attribute encoding, for some types of data exploiting the redundancy between consecutive elements is not enough to achieve good compression ratio; quantization can help but isn't always sufficient either. To that end, when using mode 0, this extension allows the additional use of a compression filter that transforms each element stored in the buffer view to make it more compressible with the attribute codec, and often allows trading precision for compressed size. Filters don't change the size of the output data, they merely improve the compressed size by reducing entropy; note that the use of a compression filter restricts `byteStride` which effectively prohibits storing interleaved data.
+
+Filter specifies the algorithm used to transform the data after decompression, and can be one of:
+
+- Filter 0: none. Attribute data is used as is.
+- Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding.
+- Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding.
+- Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision.
+- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg-R color space encoding.
+
+The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
+
+When using filters, the expectation is that the filter is applied after the attribute decoder on the contents of the resulting bufferView; the resulting data can then be used according to the referencing accessors without further modifications.
+
+**Non-normative** To decompress the data, the [meshoptimizer](https://github.com/zeux/meshoptimizer) library may be used; it supports efficient decompression using C++ and/or WebAssembly, including a fast SIMD implementation for attribute decoding.
+
+## Fallback buffers
+
+While the extension JSON specifies a separate buffer to source compressed data from, the parent `bufferView` must also have a valid `buffer` reference as per glTF 2.0 spec requirement. To produce glTF files that *require* support for this extension and don't have uncompressed data, the referenced buffer can contain no URI as follows:
+
+```json
+{ "byteLength": 1432878 }
+```
+
+The `byteLength` property of such a placeholder buffer **MUST** be sufficiently large to contain all uncompressed buffer views referencing it.
+
+When stored in a GLB file, the placeholder buffer should have index 1 or above, to avoid conflicts with the GLB binary buffer.
+
+This extension allows buffers to be optionally tagged as fallback by using the `fallback` attribute as follows:
+
+```json
+{
+	"byteLength": 1432878,
+	"extensions": {
+		"KHR_meshopt_compression": {
+			"fallback": true
+		}
+	}
+}
+```
+
+This is useful to avoid confusion, and may also be used by loaders that support the extension to skip loading of these buffers.
+
+When a buffer is marked as a fallback buffer, the following must hold:
+
+- All references to the buffer must come from `bufferView`s that have a `KHR_meshopt_compression` extension specified
+- No references to the buffer may come from `KHR_meshopt_compression` extension JSON
+
+If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `KHR_meshopt_compression` must be a required extension.
+
+## Compressing geometry data
+
+> This section is non-normative.
+
+The codecs used by this extension can represent geometry exactly, replicating both vertex and index data without changes in contents or order. However, to get optimal compression, it's necessary to pre-process the data.
+
+To get optimal compression, encoders should optimize vertex and index data for locality of reference. Specifically:
+
+- Triangle order should be optimized to maximize the recency of previously encountered vertices; this is similar to optimizing meshes for vertex reuse aka post-transform cache in GPU hardware.
+- Vertex order should be linearized in the order that vertices appear in the index stream to get optimal index compression
+
+When index data is not available (e.g. point data sets) or represents topology with a lot of seams (e.g. each triangle has unique vertex indices because it specifies flat-shaded normal), encoders could additionally optimize vertex data for spatial locality, so that vertices close together in the vertex stream are close together in space.
+
+Vertex data should be quantized using the appropriate representation; this extension cleanly interacts with `KHR_mesh_quantization` by compressing already quantized data.
+
+Morph targets can be treated identically to other vertex attributes, as long as vertex order optimization is performed on all target streams at the same time. It is recommended to use quantized storage for morph target deltas, possibly with a narrower type than that used for baseline values.
+
+When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2. The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1.
+
+Using filter 1 (octahedral) for normal/tangent data may improve compression ratio further.
+
+## Compressing animation data
+
+> This section is non-normative.
+
+To minimize the size of animation data, it is important to reduce the number of stored keyframes and reduce the size of each keyframe.
+
+To reduce the number of keyframes, encoders can either selectively remove keyframes that don't contribute to the resulting movement, resulting in sparse input/output data, or resample the keyframes uniformly, resulting in uniformly dense data. Resampling can be beneficial since it means that all animation channels in the same animation can share the same input accessor, and provides a convenient quality vs size tradeoff, but it's up to the encoder to pick the optimal strategy.
+
+Additionally it's important to identify tracks with the same output value and use a single keyframe for these.
+
+To reduce the size of each keyframe, rotation data should be quantized using 16-bit normalized components; for additional compression, the use of filter 2 (quaternion) is recommended. Translation/scale data can be compressed using filter 3 (exponential) with the same exponent used for all three vector components.
+
+Note that animation inputs that specify time values require enough precision to avoid animation distortion. It's recommended to either not use any filters for animation inputs to avoid any precision loss (attribute encoder can still be efficient at reducing the size of animation input track even without filters when the inputs are uniformly spaced), or use filter 3 (exponential) with maximum mantissa bit count (23).
+
+After pre-processing, both input and output data should be stored using mode 0 (attributes).
+
+# Appendix A: Bitstream
+
+The following sections specify the format of the bitstream for compressed data for various modes.
+
+## Mode 0: attributes
+
+Attribute compression exploits similarity between consecutive elements of the buffer by encoding deltas. The deltas are stored for each separate byte which makes the codec more versatile since it can work with components of various sizes. Additionally, the elements are stored with bytes deinterleaved, which means that sequences of deltas are more easily compressible by some general purpose compressors that may run on the resulting data.
+
+To facilitate efficient decompression, deinterleaving and delta encoding are performed on attribute blocks instead of on the entire buffer; within each block, elements are processed in groups of 16.
+
+The encoded stream structure is as follows:
+
+- Header byte, which must be equal to `0xa1`
+- One or more attribute blocks, detailed below
+- Tail block, which consists of a baseline element stored verbatim, followed by channel information, padded to 24 bytes
+
+Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
+
+Each attribute block encodes a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
+
+```
+maxBlockElements = min((8192 / byteStride) & ~15, 256)
+blockElements = min(remainingElements, maxBlockElements)
+```
+
+Where `remainingElements` is the number of elements that have yet to be decoded.
+
+Each attribute block consists of:
+- Control header: `byteStride / 4` bytes specifying 4 control modes for each 4-byte channel
+- `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements
+
+Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other words:
+
+```
+groupCount = ceil(blockElements / 16)
+```
+
+For example, a stream with a `byteStride` of 64 containing 200 elements would be broken up into two attribute blocks: one containing 128 elements, and the other containing 72 elements. And these blocks would have 8 and 5 groups, respectively.
+
+The control header contains 2 bits for each byte position, packed into bytes as follows:
+
+```
+controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2 << 4) | (controlForByte3 << 6)
+```
+
+The control bits specify the control mode for each byte:
+
+- bits 0: Use bit lengths `{0, 1, 2, 4}` for encoding
+- bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding
+- bits 2: All byte deltas are 0; no data is stored for this byte
+- bits 3: Literal encoding; byte deltas are stored uncompressed
+
+The structure of each "data block" (when not using control mode 2 or 3) breaks down as follows:
+- Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
+- Delta blocks, with variable number of bytes stored for each group
+
+Header bits are stored from least significant to most significant bit - header bits for 4 consecutive groups are packed in a byte together as follows:
+
+```
+(headerBitsForGroup0 << 0) | (headerBitsForGroup1 << 2) | (headerBitsForGroup2 << 4) | (headerBitsForGroup3 << 6)
+```
+
+The header bits establish the delta encoding mode for each group of 16 elements:
+
+For control mode 0:
+- bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
+- bits 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
+- bits 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- bits 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+
+For control mode 1:
+- bits 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
+- bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+- bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
+
+When using the sentinel encoding, each delta is stored as a 1-bit, 2-bit, or 4-bit value in packed bytes. For 2-bit and 4-bit encodings, deltas are stored from most significant to least significant bit inside the byte. For 1-bit encoding, deltas are stored from least significant to most significant bit to facilitate better reuse of lookup tables in efficient implementations. The 1-bit encoding is packed as follows with 8 deltas per byte:
+
+```
+(delta0 << 0) | (delta1 << 1) | (delta2 << 2) | (delta3 << 3) | (delta4 << 4) | (delta5 << 5) | (delta6 << 6) | (delta7 << 7)
+```
+
+The 2-bit encoding is packed as follows with 4 deltas per byte:
+
+```
+(delta3 << 0) | (delta2 << 2) | (delta1 << 4) | (delta0 << 6)
+```
+
+And the 4-bit encoding is packed as follows with 2 deltas per byte:
+
+```
+(delta1 << 0) | (delta0 << 4)
+```
+
+A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` for 2-bit encoding, and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the bit range, and is stored as a full byte after the bit deltas for this group.
+
+Delta encoding varies by channel type (specified in the tail block):
+
+**Channel 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
+
+```
+encode(uint8_t v) = ((v & 0x80) != 0) ? ~(v << 1) : (v << 1)
+decode(uint8_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
+```
+
+For a complete example, assuming 4-bit sentinel coding, the following byte sequence:
+
+```
+0x17 0x5f 0xf0 0xbc 0x77 0xa9 0x21 0x00 0x34 0xb5
+```
+
+Encodes 16 deltas, where the first 8 bytes of the sequence specify 16 4-bit deltas, and the last 2 bytes of the sequence specify the explicit delta code values encoded for elements 3 and 4 (counting from 0) in the sequence. After de-zigzagging, the decoded deltas look like:
+
+```
+-1 -4 -3 26 -91 0 -6 6 -4 -4 5 -5 1 -1 0 0
+```
+
+Finally, note that the deltas are computed in 8-bit integer space with wrap-around two's complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
+
+**Channel 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between consecutive 2-byte values:
+
+```
+encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1)
+decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
+```
+
+The deltas are computed in 16-bit integer space with wrap-around two's complement arithmetic.
+
+**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel specification:
+
+```
+rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
+```
+
+## Mode 1: triangles
+
+Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte.
+
+The encoded stream structure is as follows:
+
+- Header byte, which must be equal to `0xe1`
+- Triangle codes, referred to as `code` below, with a single byte for each triangle
+- Extra data which is necessary to decode triangles that don't fit into a single byte, referred to as `data` below
+- Tail block, which consists of a 16-byte lookup table, referred to as `codeaux` below
+
+Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
+
+There are two limitations on the structure of the 16-byte lookup table:
+
+- The last two bytes must be 0
+- The remaining bytes must not contain any nibbles equal to `0xf`.
+
+During the decoding process, the decoder maintains four variables:
+
+- `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0
+- `last`: an integer referring to the last encoded index, starts at 0
+- `edgefifo`: a 16-entry FIFO with two vertex indices in each entry; initial contents are undefined
+- `vertexfifo`: a 16-entry FIFO with a vertex index in each entry; initial contents are undefined
+
+To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file.
+
+When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)), which encodes an integer as one or more bytes, least significant 7 bits first, with a byte whose most significant bit is 0 terminating the sequence:
+
+```
+0x7f => 0x7f
+0x81 0x04 => 0x201
+0xff 0xa0 0x05 => 0x1507f
+```
+
+Instead of using the raw index value, a zigzag-encoded 32-bit delta from `last` is used:
+
+```
+uint32_t decodeIndex(uint32_t v) {
+	int32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
+
+	last += delta;
+	return last;
+}
+```
+
+The encoding for `code` is split into several cases, some of which are self-sufficient and some of which need to read extra data. The encoding is detailed below; in every case the triangle (a, b, c) is emitted to the output.
+
+- `0xX0`, where `X < 0xf`: Encodes a recently encountered edge and a `next` vertex.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is equal to `next` (which is then incremented).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex c is pushed to the vertex FIFO.
+
+- `0xXY`, where `X < 0xf` and `0 < Y < 0xd`: Encodes a recently encountered edge and a recently encountered vertex.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is read from the vertex FIFO at index Y (where 0 is the most recently added vertex; note that 0 is never actually read here, since `Y > 0`).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+
+- `0xXd` or `0xXe`, where `X < 0xf`: Encodes a recently encountered edge and a vertex that's adjacent to `last`.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is equal to `last-1` for `0xXd` and `last+1` for `0xXe`.
+
+`last` is set to `c` (effectively decrementing or incrementing it accordingly).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex c is pushed to the vertex FIFO.
+
+- `0xXf`, where `X < 0xf`: Encodes a recently encountered edge and a free-standing vertex encoded explicitly.
+
+The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+The third index, `c`, is decoded using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
+
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex c is pushed to the vertex FIFO.
+
+- `0xfY`, where `Y < 0xe`: Encodes three indices using `codeaux` table lookup and vertex FIFO.
+
+The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`.
+
+The first index, `a`, is equal to `next`; `next` is incremented to decode b/c correctly.
+The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), or is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
+The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), or is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
+
+Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W.
+
+Edge (b, a) is pushed to the edge FIFO.
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex a is pushed to the vertex FIFO.
+Vertex b is pushed to the vertex FIFO if `Z == 0`.
+Vertex c is pushed to the vertex FIFO if `W == 0`.
+
+- `0xfe` or `0xff`: Encodes three indices explicitly.
+
+This requires an extra byte that is read from `data`; let's assume that results in `0xZW`. Note that this is *not* an LEB128 value, just a single byte.
+
+If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing.
+
+The first index, `a`, is equal to `next` for `0xfe` encoding (`next` is then incremented), or, for `0xff` encoding, is read using `decodeIndex` by reading extra bytes from `data` (which also updates `last`).
+The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex) if `Z < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `Z == 0xf`.
+The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex) if `W < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `W == 0xf`.
+
+Edge (b, a) is pushed to the edge FIFO.
+Edge (c, b) is pushed to the edge FIFO.
+Edge (a, c) is pushed to the edge FIFO.
+Vertex a is pushed to the vertex FIFO.
+Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`.
+Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`.
+
+At the end of the decoding, `data` is expected to be fully read by all the triangle codes and not contain any extra bytes.
+
+## Mode 2: indices
+
+Index compression exploits similarity between consecutive indices. Note that, unlike the triangle index compression (mode 1), this mode doesn't assume a specific topology and as such is less efficient in terms of the resulting size. However, unlike mode 1, this mode can be used to compress triangle strips, line lists and other types of mesh index data, and can additionally be used to compress non-mesh index data such as sparse indices for accessors.
+
+The encoded stream structure is as follows:
+
+- Header byte, which must be equal to `0xd1`
+- A sequence of index deltas, with encoding specified below
+- Tail block, which consists of 4 bytes that are reserved and should be set to 0
+
+Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
+
+To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used, which encodes an integer as one or more bytes, least significant 7-bit group first; a byte whose most significant bit is 0 terminates the sequence:
+
+```
+0x7f => 0x7f
+0x81 0x04 => 0x201
+0xff 0xa0 0x05 => 0x1507f
+```
+
+When decoding the deltas, the 32-bit value is read using the varint-7 encoding. The least significant bit of the value indicates one of the baseline values; the remaining bits specify a zigzag-encoded signed delta and can be decoded as follows:
+
+```
+uint32_t decode(uint32_t v) {
+	int32_t baseline = v & 1;
+	int32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);
+
+	last[baseline] += delta;
+	return last[baseline];
+}
+```
+
+It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with baseline bit always set to 0) as well as more complex bimodal encodings.
+
+Note that the zigzag-encoded delta must fit in a 31-bit integer; as such, deltas are limited to [-2^30..2^30-1].
+
+# Appendix B: Filters
+
+Filters are functions that transform each encoded attribute. For each filter, this document specifies the transformation used for decoding the data; it's up to the encoder to pick the parameters of the encoding for each element to balance quality and precision.
+
+For performance reasons, the results of the decoding process are specified to within one unit in the last place (ULP) of the decoded data; e.g. if a filter results in a 16-bit signed normalized integer, decoding may produce results within 1/32767 of the specified value.
+
+## Filter 1: octahedral
+
+The octahedral filter encodes unit length 3D vectors (normals/tangents) using octahedral encoding, which results in a better quality vs precision tradeoff compared to storing raw components.
+
+This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components.
+
+The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents).
+
+The encoding of the third component allows K to be computed for each vector independently from the bit representation. It must encode 1.0 precisely, which is equivalent to `(1 << (K - 1)) - 1` as an integer; values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid and the result of decoding such vectors is unspecified.
+
+When storing a K-bit integer in an 8-bit or 16-bit component when K is not 8 or 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+
+The output of the filter is three decoded unit vector components, stored as 8-bit or 16-bit normalized integers, and the last input component verbatim.
+
+```
+void decode(intN_t input[4], intN_t output[4]) {
+	// input[2] encodes a K-bit representation of 1.0
+	float32_t one = input[2];
+
+	float32_t x = input[0] / one;
+	float32_t y = input[1] / one;
+	float32_t z = 1.0 - abs(x) - abs(y);
+
+	// octahedral fixup for negative hemisphere
+	float32_t t = min(z, 0.0);
+
+	x -= copysign(t, x);
+	y -= copysign(t, y);
+
+	// renormalize (x, y, z)
+	float32_t len = sqrt(x * x + y * y + z * z);
+
+	x /= len;
+	y /= len;
+	z /= len;
+
+	output[0] = round(x * INTN_MAX);
+	output[1] = round(y * INTN_MAX);
+	output[2] = round(z * INTN_MAX);
+	output[3] = input[3];
+}
+```
+
+`INTN_MAX` is equal to 127 when using 8-bit components (N is 8) and equal to 32767 when using 16-bit components (N is 16).
+
+`copysign` behaves as specified in C99 and returns the value with the magnitude of the first argument and the sign of the second argument.
+
+## Filter 2: quaternion
+
+Quaternion filter allows to encode unit length quaternions using normalized 16-bit integers for all components, but allows control over the precision used for the components and provides better quality compared to naively encoding each component one by one.
+
+This filter is only valid if `byteStride` is 8.
+
+The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
+
+When storing a K-bit integer in a 16-bit component when K is not 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+
+The output of the filter is four decoded quaternion components, stored as 16-bit normalized integers.
+
+After eliminating the maximum component, the maximum magnitude of the remaining components is 1/sqrt(2). Because of this the input components store the original component value scaled by sqrt(2.0) to increase precision.
+
+```
+void decode(int16_t input[4], int16_t output[4]) {
+	float32_t range = 1.0 / sqrt(2.0);
+
+	// input[3] encodes a K-bit representation of 1.0 except for bottom two bits
+	float32_t one = input[3] | 3;
+
+	float32_t x = input[0] / one * range;
+	float32_t y = input[1] / one * range;
+	float32_t z = input[2] / one * range;
+
+	float32_t w = sqrt(max(0.0, 1.0 - x * x - y * y - z * z));
+
+	int maxcomp = input[3] & 3;
+
+	// maxcomp specifies a cyclic rotation of the quaternion components
+	output[(maxcomp + 1) % 4] = round(x * 32767.0);
+	output[(maxcomp + 2) % 4] = round(y * 32767.0);
+	output[(maxcomp + 3) % 4] = round(z * 32767.0);
+	output[(maxcomp + 0) % 4] = round(w * 32767.0);
+}
+```
+
+## Filter 3: exponential
+
+The exponential filter encodes floating point values with a range close to the full range of a 32-bit floating point value, but allows more control over the exponent/mantissa to trade quality for precision, and has a bit structure that is better aligned to byte boundaries to facilitate better compression.
+
+This filter is only valid if `byteStride` is a multiple of 4.
+
+The input to the filter is a sequence of 32-bit little endian integers, with the most significant 8 bits specifying a (signed) exponent value, and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two's complement format.
+
+The result of the filter is 2^e * m:
+
+```
+float32_t decode(int32_t input) {
+	int32_t e = input >> 24;
+	int32_t m = (input << 8) >> 8;
+	return pow(2.0, e) * m;
+}
+```
+
+The valid range of `e` is [-100, +100], which facilitates performant implementations. Decoding out of range values results in unspecified behavior, and encoders are expected to clamp `e` to the valid range.
+
+## Filter 4: color
+
+Color filter allows to encode color data using YCoCg-R color space transformation, which results in better compression for typical color data by exploiting correlation between color channels.
+
+This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
+
+The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value, the second component stores the Co (orange chrominance) value, the third component stores the Cg (green chrominance) value, and the fourth component stores the alpha value with the bit K used for scaling information.
+
+The transformation uses YCoCg-R encoding where:
+
+- Y = R/2 + G/2
+- Co = (R - B) / 2
+- Cg = (G - (R + B) / 2) / 2
+
+The alpha component uses K-1 bits for the alpha value with the high bit set to 1, where K is the bit depth (8 or 16).
+
+The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
+
+```
+void decode(intN_t input[4], intN_t output[4]) {
+	// recover scale from alpha high bit
+	int as = (1 << (firstbitset(input[3]) + 1)) - 1;
+
+	// convert to RGB in fixed point
+	int y = input[0], co = input[1], cg = input[2];
+
+	int r = y + co - cg;
+	int g = y + cg;
+	int b = y - co - cg;
+
+	// expand alpha by one bit to match other components, replicating last bit
+	int a = input[3] & (as >> 1);
+	a = (a << 1) | (a & 1);
+
+	// compute scaling factor
+	float ss = INTN_MAX / float(as);
+
+	// rounded float->int
+	output[0] = int(float(r) * ss + 0.5f);
+	output[1] = int(float(g) * ss + 0.5f);
+	output[2] = int(float(b) * ss + 0.5f);
+	output[3] = int(float(a) * ss + 0.5f);
+}
+```
+
+`INTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16).
+
+# Appendix C: Differences from EXT_meshopt_compression
+
+This extension is derived from `EXT_meshopt_compression` with the following changes:
+
+- Vertex data uses upgraded v1 format which provides more types of bit packing and delta encoding to compress data better
+- Added `COLOR` filter to support lossy color compression at smaller compression ratios
+
+These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.

From 84e57784d241abd967c76ecebe32a87f1f7f5a40 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 23:03:53 -0700
Subject: [PATCH 07/54] Update channel description to be more precise.

---
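For readers of this change, the channel mode byte described in the updated text can be unpacked roughly as follows. This is a minimal sketch and not part of the patch: `ChannelMode`, `split_channel_mode` and `rotate_left` are hypothetical names, and the `r == 0` special case is added here only because a 32-bit shift by 32 is undefined in C. The sketch implements the `rotate` formula exactly as written in the README and makes no claim about which direction the decoder applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one channel mode byte from the tail block. */
typedef struct {
    unsigned mode;     /* low 4 bits: 0 = byte deltas, 1 = 2-byte deltas, 2 = 4-byte XOR deltas */
    unsigned rotation; /* high 4 bits: rotation amount, only meaningful for mode 2 */
} ChannelMode;

static ChannelMode split_channel_mode(uint8_t b) {
    ChannelMode m;
    m.mode = b & 0x0fu;
    m.rotation = (unsigned)(b >> 4);
    return m;
}

/* rotate(v, r) = (v << r) | (v >> (32 - r)); r == 0 is handled separately
   to avoid shifting a 32-bit value by 32 bits. */
static uint32_t rotate_left(uint32_t v, unsigned r) {
    assert(r < 32);
    return r == 0 ? v : (v << r) | (v >> (32 - r));
}
```

With this layout, a mode byte of `0x52` would select mode 2 with a rotation of 5.
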
 .../2.0/Khronos/KHR_meshopt_compression/README.md      | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 31ae441bb7..db12d45d0a 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -199,7 +199,7 @@ The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xa1`
 - One or more attribute blocks, detailed below
-- Tail block, which consists of a baseline element stored verbatim, followed by channel information, padded to 24 bytes
+- Tail block, which consists of a baseline element stored verbatim, followed by channel modes, padded to 24 bytes
 
 Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
 
@@ -213,7 +213,7 @@ blockElements = min(remainingElements, maxBlockElements)
 Where `remainingElements` is the number of elements that have yet to be decoded.
 
 Each attribute block consists of:
-- Control header: `byteStride / 4` bytes specifying 4 control modes for each 4-byte channel
+- Control header: `byteStride / 4` bytes specifying 4 packed control modes for each byte of a 4-byte channel
 - `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements
 
 Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
@@ -281,7 +281,7 @@ And the 4-bit encoding is packed as follows with 2 deltas per byte:
 
 A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` for 2-bit encoding, and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the bit range, and is stored as a full byte after the bit deltas for this group.
 
-Delta encoding varies by channel type (specified in the tail block):
+To decode deltas into original values, the channel modes (specified in the tail block) are used. The encoded stride is split into `byteStride / 4` channels, and each channel specifies the mode in a single byte in the tail block, with the low 4 bits of the byte specifying the mode:
 
 **Channel 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
 
@@ -313,12 +313,14 @@ decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
 
 The deltas are computed in 16-bit integer space with wrap-around two-complement arithmetic.
 
-**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel specification:
+**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel mode byte:
 
 ```
 rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
 ```
 
+Because the channel mode defines encoding for 4 bytes at once, it's impossible to mix modes 0 and 1 within the same channel: if the first 2-byte group of an aligned 4-byte group uses 2-byte deltas, the second 2-byte group must use 2-byte deltas as well.
+
 ## Mode 1: triangles
 
 Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte.

From 46ee177038d622f6db9af1d90eed0141739c79a9 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 23:07:18 -0700
Subject: [PATCH 08/54] Specify endianness for 2/4 byte deltas.

---
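To illustrate the little-endian wording added here, the sketch below applies one 2-byte delta to the previous value. It is not part of the patch and rests on an assumption that the README does not spell out in this excerpt: that the zigzag-encoded 16-bit delta is split into two delta bytes, with the byte for the lower-addressed position holding the least significant 8 bits. `unzigzag16` and `apply_delta16` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Zigzag decode as given in the README: ((v & 1) != 0) ? ~(v >> 1) : (v >> 1) */
static uint16_t unzigzag16(uint16_t v) {
    return (v & 1) ? (uint16_t)~(v >> 1) : (uint16_t)(v >> 1);
}

/* Hypothetical helper: combine two delta bytes (least significant first),
   add the decoded delta to the previous little-endian value with 16-bit
   wrap-around, and store the result back as little-endian bytes. */
static void apply_delta16(const uint8_t prev[2], uint8_t delta_lo, uint8_t delta_hi, uint8_t out[2]) {
    assert(prev != 0 && out != 0);
    uint16_t p = (uint16_t)(prev[0] | (prev[1] << 8));
    uint16_t d = (uint16_t)(delta_lo | (delta_hi << 8));
    uint16_t v = (uint16_t)(p + unzigzag16(d)); /* wraps modulo 2^16 */
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)(v >> 8);
}
```

The carry from the low byte into the high byte is the reason byte order matters here: with 1-byte deltas each byte position is independent, while 2-byte deltas treat the pair as one integer.
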
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index db12d45d0a..5a90b5f827 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -311,7 +311,7 @@ encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1)
 decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
 ```
 
-The deltas are computed in 16-bit integer space with wrap-around two-complement arithmetic.
+The deltas are computed in 16-bit integer space with wrap-around two-complement arithmetic. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
 
 **Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel mode byte:
 
@@ -319,6 +319,8 @@ The deltas are computed in 16-bit integer space with wrap-around two-complement
 rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
 ```
 
+The deltas are computed in 32-bit integer space. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
+
 Because the channel mode defines encoding for 4 bytes at once, it's impossible to mix modes 0 and 1 within the same channel: if the first 2-byte group of an aligned 4-byte group uses 2-byte deltas, the second 2-byte group must use 2-byte deltas as well.
 
 ## Mode 1: triangles

From ebbbea5628f6593809af31b0739d8b6419312721 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 23:09:38 -0700
Subject: [PATCH 09/54] Update status to Draft

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 5a90b5f827..7680cb4287 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -7,7 +7,7 @@
 
 ## Status
 
-Complete, Ratified by the Khronos Group
+Draft
 
 ## Dependencies
 
@@ -233,7 +233,7 @@ controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2
 The control bits specify the control mode for each byte:
 
 - bits 0: Use bit lengths `{0, 1, 2, 4}` for encoding
-- bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding  
+- bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding
 - bits 2: All byte deltas are 0; no data is stored for this byte
 - bits 3: Literal encoding; byte deltas are stored uncompressed
 

From 93b4ac781499e4be6fd1640a3bf6adbe82ad56bc Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 23:16:05 -0700
Subject: [PATCH 10/54] Clean up color filter description with minor fixes.

---
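The new requirement that RGB be reconstructible in K-bit integer math can be checked on the encoder side with something like the sketch below. It is not part of the patch. The reconstruction formulas are the ones from the README's decode pseudocode; reading "without overflow or underflow" as "each of R, G, B lies in [0, 2^K - 1]" is an assumption, and `ycocg_to_rgb` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: reconstruct RGB from K-bit Y (unsigned) and Co/Cg (signed).
   Returns false if any channel leaves the assumed valid range [0, 2^K - 1]. */
static bool ycocg_to_rgb(int k, int y, int co, int cg, int rgb[3]) {
    assert(k >= 1 && k <= 16);
    int max = (1 << k) - 1;

    rgb[0] = y + co - cg; /* R */
    rgb[1] = y + cg;      /* G */
    rgb[2] = y - co - cg; /* B */

    for (int i = 0; i < 3; ++i) {
        if (rgb[i] < 0 || rgb[i] > max)
            return false;
    }
    return true;
}
```

An encoder could run such a check after quantization and adjust the chrominance values when it fails, so that integer and floating point decoders agree.
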
 .../Khronos/KHR_meshopt_compression/README.md  | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 7680cb4287..b523a5f0e7 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -106,7 +106,7 @@ Filter specifies the algorithm used to transform the data after decompression, a
 - Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding.
 - Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding.
 - Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision.
-- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg-R color space encoding.
+- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg color space encoding.
 
 The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
 
@@ -539,7 +539,7 @@ Quaternion filter allows to encode unit length quaternions using normalized 16-b
 
 This filter is only valid if `byteStride` is 8.
 
-The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
+The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (2 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
 
 When storing a K-bit integer in a 16-bit component when K is not 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
@@ -592,19 +592,15 @@ The valid range of `e` is [-100, +100], which facilitates performant implementat
 
 ## Filter 4: color
 
-Color filter allows to encode color data using YCoCg-R color space transformation, which results in better compression for typical color data by exploiting correlation between color channels.
+Color filter allows to encode color data using YCoCg color space transformation, which results in better compression for typical color data by exploiting correlation between color channels.
 
 This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
 
-The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value, the second component stores the Co (orange chrominance) value, the third component stores the Cg (green chrominance) value, and the fourth component stores the alpha value with the bit K used for scaling information.
+The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K set to 1. Here 1 <= K <= 16, and signed integers are stored in two's complement format.
 
-The transformation uses YCoCg-R encoding where:
+The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that RGB can be reconstructed using K-bit integer math without overflow or underflow.
 
-- Y = R/2 + G/2
-- Co = (R - B) / 2
-- Cg = (G - (R + B) / 2) / 2
-
-The alpha component uses K-1 bits for the alpha value with the high bit set to 1, where K is the bit depth (8 or 16).
+The alpha component uses K-1 bits for the alpha value with the high bit set to 1; note that K can be smaller than the bit depth. This allows the decoder to recover K and decode the color and alpha values correctly.
 
 The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
 
@@ -642,6 +638,6 @@ void decode(intN_t input[4], intN_t output[4]) {
 This extension is derived from `EXT_meshopt_compression` with the following changes:
 
 - Vertex data uses upgraded v1 format which provides more types of bit packing and delta encoding to compress data better
-- Added `COLOR` filter to support lossy color compression at smaller compression ratios
+- Added `COLOR` filter to support lossy color compression at smaller compression ratios using YCoCg encoding
 
 These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.

From bb0e8edbbf15d6dcd81d6126ee0588b3396ce2e4 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 23:23:10 -0700
Subject: [PATCH 11/54] More detail in tail block encoding

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index b523a5f0e7..04bb293508 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -199,7 +199,7 @@ The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xa1`
 - One or more attribute blocks, detailed below
-- Tail block, which consists of a baseline element stored verbatim, followed by channel modes, padded to 24 bytes
+- Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes), padded to 24 bytes
 
 Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
 

From 67ee72c40d20df56551d6c599cf80b35e675d732 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 31 Jul 2025 23:49:03 -0700
Subject: [PATCH 12/54] Change firstbitset to findMSB to be less ambiguous

---
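The renamed helper can be sketched as below, assuming `findMSB` is meant in the usual sense of "index of the most significant set bit, counting from 0 at the least significant bit". This is not part of the patch; `find_msb` and `alpha_scale` are hypothetical names, and the behavior for a zero input is left out because the alpha component always has its marker bit set.

```c
#include <assert.h>

/* Index of the most significant set bit (bit 0 is the least significant).
   The input must be non-zero. */
static int find_msb(unsigned v) {
    assert(v != 0);
    int i = 0;
    while (v >>= 1)
        ++i;
    return i;
}

/* The `as` value from the decode pseudocode: (1 << (findMSB(alpha) + 1)) - 1,
   which equals 2^K - 1 when the marker sits at bit K - 1. */
static int alpha_scale(unsigned alpha_component) {
    return (1 << (find_msb(alpha_component) + 1)) - 1;
}
```

For an 8-bit component with the top bit set this gives a scale of 255; a component whose highest set bit is bit 4 gives 31, which corresponds to K = 5.
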
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 04bb293508..691d790a01 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -607,7 +607,7 @@ The output of the filter is four decoded color components (R, G, B, A), stored a
 ```
 void decode(intN_t input[4], intN_t output[4]) {
 	// recover scale from alpha high bit
-	int as = (1 << (firstbitset(input[3]) + 1)) - 1;
+	int as = (1 << (findMSB(input[3]) + 1)) - 1;
 
 	// convert to RGB in fixed point
 	int y = input[0], co = input[1], cg = input[2];

From 2c610039f267b750961afad94aaf4967e359a579 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Fri, 1 Aug 2025 10:24:59 -0700
Subject: [PATCH 13/54] More precision for tail padding (padding goes before
 tail block)

Also use "delta bytes" instead of "byte deltas" when describing control
bits to reduce confusion with 1-byte vs 2-byte vs 4-byte deltas.
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 691d790a01..9272d20fed 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -199,7 +199,8 @@ The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xa1`
 - One or more attribute blocks, detailed below
-- Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes), padded to 24 bytes
+- Tail padding, which pads the size of the subsequent tail block to a minimum of 24 bytes (required for efficient decoding)
+- Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes)
 
 Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
 
@@ -234,8 +235,8 @@ The control bits specify the control mode for each byte:
 
 - bits 0: Use bit lengths `{0, 1, 2, 4}` for encoding
 - bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding
-- bits 2: All byte deltas are 0; no data is stored for this byte
-- bits 3: Literal encoding; byte deltas are stored uncompressed
+- bits 2: All delta bytes are 0; no data is stored for this byte
+- bits 3: Literal encoding; delta bytes are stored uncompressed with no header bits
 
 The structure of each "data block" (when not using control mode 2 or 3) breaks down as follows:
 - Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4

From 87af1514ce4d8884e3d33cccf0d370be1d0806f1 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Sat, 2 Aug 2025 15:11:02 -0700
Subject: [PATCH 14/54] Improve wording for channel deltas and difference
 summary.

---
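Illustration for this patch (not part of the change): a minimal Python sketch of the 16-bit zigzag formulas and the 32-bit `rotate` helper quoted in the hunks below. Python integers are unbounded, so the sketch masks explicitly to emulate `uint16_t` and `uint32_t`. The function names and the `delta16` helper are hypothetical, and the sketch does not show at which decoding step the rotation is applied.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def zigzag16_encode(v: int) -> int:
    """encode(v) = (v & 0x8000) ? ~(v << 1) : (v << 1), in 16-bit space."""
    v &= MASK16
    shifted = (v << 1) & MASK16
    return (~shifted & MASK16) if (v & 0x8000) else shifted


def zigzag16_decode(v: int) -> int:
    """decode(v) = (v & 1) ? ~(v >> 1) : (v >> 1), in 16-bit space."""
    v &= MASK16
    half = v >> 1
    return (~half & MASK16) if (v & 1) else half


def delta16(current: int, previous: int) -> int:
    """Wrap-around 16-bit difference to the previous element, zigzag-encoded."""
    return zigzag16_encode((current - previous) & MASK16)


def rotate32(v: int, r: int) -> int:
    """rotate(v, r) = (v << r) | (v >> (32 - r)), truncated to 32 bits."""
    v &= MASK32
    return ((v << r) | (v >> (32 - r))) & MASK32
```

A 16-bit value going from `0x0000` to `0xffff` is a wrap-around delta of `-1`, which `delta16` stores as `1`. This is the 2-byte analogue of the byte delta example in the text.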
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 9272d20fed..b64e1147f6 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -305,7 +305,7 @@ Encodes 16 deltas, where the first 8 bytes of the sequence specifies 16 4-bit de
 
 Finally, note that the deltas are computed in 8-bit integer space with wrap-around two-complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
 
-**Channel 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between consecutive 2-byte values:
+**Channel 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between 16-bit values of the element and the previous element in the same position; the zigzag encoding scheme works as follows:
 
 ```
 encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1)
@@ -314,7 +314,7 @@ decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
 
 The deltas are computed in 16-bit integer space with wrap-around two-complement arithmetic. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
 
-**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between consecutive 4-byte values, with an additional rotation applied based on the high 4 bits of the channel mode byte:
+**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between 32-bit values of the element and the previous element in the same position, with an additional rotation applied based on the high 4 bits of the channel mode byte:
 
 ```
 rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
@@ -638,7 +638,7 @@ void decode(intN_t input[4], intN_t output[4]) {
 
 This extension is derived from `EXT_meshopt_compression` with the following changes:
 
-- Vertex data uses upgraded v1 format which provides more types of bit packing and delta encoding to compress data better
-- Added `COLOR` filter to support lossy color compression at smaller compression ratios using YCoCg encoding
+- Vertex data uses upgraded v1 format which provides more granular bit packing (via control modes) and enhanced delta encoding (via channel modes) to compress data better
+- New `COLOR` filter supports lossy color compression at smaller compression ratios using YCoCg encoding
 
 These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.

From 4c7998c031bbb505f9c46e1a681e5183ea80db8f Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 6 Aug 2025 10:39:32 -0700
Subject: [PATCH 15/54] Update contributor list

Add Alexey and Don based on prior contributions to EXT_ (missed during
finalization) and advice for KHR_; update all links and profiles to
github.com for consistency.
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index b64e1147f6..61407d8658 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -2,8 +2,10 @@
 
 ## Contributors
 
-* Arseny Kapoulkine, [@zeuxcg](https://twitter.com/zeuxcg)
-* Jasper St. Pierre, [@JasperRLZ](https://twitter.com/JasperRLZ)
+* Arseny Kapoulkine, [@zeux](https://github.com/zeux)
+* Jasper St. Pierre, [@magcius](https://github.com/magcius)
+* Alexey Knyazev, [@lexaknyazev](https://github.com/lexaknyazev)
+* Don McCurdy, [@donmccurdy](https://github.com/donmccurdy)
 
 ## Status
 

From 117e768d1438ff6dcfdb34a3bb3a8eea3d52ad28 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 6 Aug 2025 10:54:32 -0700
Subject: [PATCH 16/54] Improve wording around gzip so that it's not suggesting
 it as a requirement for Web delivery.

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 61407d8658..524100dfee 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -25,7 +25,7 @@ Similarly to supercompressed textures (see `KHR_texture_basisu`), this extension
 
 The compressed format is designed to have two properties beyond optimizing compression ratio - very fast decoding (using WebAssembly SIMD, the decoders run at \~1 GB/sec on modern desktop hardware), and byte-wise storage compatible with general-purpose compression. That is, instead of reducing the encoded size as much as possible, the bitstream is constructed in such a way that general-purpose compressor can compress it further.
 
-This is beneficial for typical Web delivery scenarios, where all files are usually using gzip compression - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when gzip compression isn't available, and additionally reduces the performance impact of gzip decompression which is typically *much slower* than decoders proposed here).
+This is beneficial for typical Web delivery scenarios, where all files are usually using lossless general-purpose compression (gzip, Brotli, Zstandard) - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when general-purpose compression isn't available, and additionally reduces the performance impact of general-purpose decompression which is typically *much slower* than decoders proposed here).
 
 ## Specifying compressed views
 

From b008bb577c45ff8905a4bb2035ebb06e2eccae00 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 6 Aug 2025 10:54:45 -0700
Subject: [PATCH 17/54] Enhance recommendation on filters for geometry,
 animation and instanced data.

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 524100dfee..c12fd7ed8a 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -169,7 +169,9 @@ Morph targets can be treated identically to other vertex attributes, as long as
 
 When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2. The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1.
 
-Using filter 1 (octahedral) for normal/tangent data may improve compression ratio further.
+Using filter 1 (octahedral) for normal/tangent data, and filter 4 (color) for color data, may improve compression ratio further.
+
+While using quantized attributes is recommended for optimal compression, it's also possible to use non-quantized floating point attributes. To increase compression ratio in that case, filter 3 (exponential) is recommended - advanced encoders can additionally constrain the exponent to be the same for all components of a vector, or for all values of the same component across the entire mesh, which can further improve compression ratio.
 
 ## Compressing animation data
 
@@ -181,12 +183,14 @@ To reduce the number of keyframes, encoders can either selectively remove keyfra
 
 Additionally it's important to identify tracks with the same output value and use a single keyframe for these.
 
-To reduce the size of each keyframe, rotation data should be quantized using 16-bit normalized components; for additional compression, the use of filter 2 (quaternion) is recommended. Translation/scale data can be compressed using filter 3 (exponential) with the same exponent used for all three vector components.
+To reduce the size of each keyframe, rotation data should be quantized using 16-bit normalized components; for additional compression, the use of filter 2 (quaternion) is recommended. Translation/scale data can be compressed using filter 3 (exponential); for scale data, using the same exponent for all three vector components can enhance compression ratio further.
 
 Note that animation inputs that specify time values require enough precision to avoid animation distortion. It's recommended to either not use any filters for animation inputs to avoid any precision loss (attribute encoder can still be efficient at reducing the size of animation input track even without filters when the inputs are uniformly spaced), or use filter 3 (exponential) with maximum mantissa bit count (23).
 
 After pre-processing, both input and output data should be stored using mode 0 (attributes).
 
+When `EXT_mesh_gpu_instancing` extension is used, the instance transform data can also be compressed with the same techniques as animation data, using mode 0 (attributes) with filter 3 (exponential) for position and scale, and filter 2 (quaternion) for rotation.
+
 # Appendix A: Bitstream
 
 The following sections specify the format of the bitstream for compressed data for various modes.

From 81003a40098cc6749640a84e8e0e27f6e679a2ad Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 6 Aug 2025 13:23:43 -0700
Subject: [PATCH 18/54] Add explicit exclusion text to prevent use of EXT_ and
 KHR_ variant on the same view.

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index c12fd7ed8a..8f02857b6f 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -15,6 +15,10 @@ Draft
 
 Written against the glTF 2.0 spec.
 
+## Exclusions
+
+- This extension must not be used on a buffer view that also uses `EXT_meshopt_compression`.
+
 ## Overview
 
 glTF files come with a variety of binary data - vertex attribute data, index data, morph target deltas, animation inputs/outputs - that can be a substantial fraction of the overall transmission size. To optimize for delivery size, general-purpose compression such as gzip can be used - however, it often doesn't capture some common types of redundancy in glTF binary data.

From c6cd201dd6edbbf896a767389e8916794fb98590 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 14 Aug 2025 21:21:27 -0700
Subject: [PATCH 19/54] Update K range to 2..16 for octahedral filter

---
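Illustration for this patch (not part of the change): a minimal Python sketch of recovering K from the third component of an octahedral-encoded vector, using the rule that the component must equal `(1 << (K - 1)) - 1` for some K in the updated range 2..16. The function name `octahedral_bits` is hypothetical. Returning `None` is only this sketch's way of flagging an invalid input; the text leaves the decoding result for such inputs unspecified.

```python
from typing import Optional


def octahedral_bits(third_component: int) -> Optional[int]:
    """Return K such that third_component == (1 << (K - 1)) - 1, for 2 <= K <= 16."""
    for k in range(2, 17):
        if third_component == (1 << (k - 1)) - 1:
            return k
    return None  # not an exact encoding of 1.0 for any valid K
```

With the lower bound at 2, a third component of `1` is accepted as the smallest valid encoding of 1.0.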
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 8f02857b6f..83ef4f759c 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -503,7 +503,7 @@ Octahedral filter allows to encode unit length 3D vectors (normals/tangents) usi
 
 This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components.
 
-The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents).
+The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding encoded as signed normalized K-bit integers (2 <= K <= 16, integers are stored in two's complement format), the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents).
 
 The encoding of the third component allows to compute K for each vector independently from the bit representation, and must encode 1.0 precisely which is equivalent to `(1 << (K - 1)) - 1` as an integer; values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid and the result of decoding such vectors is unspecified.
 

From 3889e488510e83f796a52d77812c0bfeb56800f8 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 4 Sep 2025 10:25:34 -0700
Subject: [PATCH 20/54] Update bitstream description to allow version 0

This improves compatibility: an EXT_meshopt_compression asset can now be
upgraded to KHR_meshopt_compression by simply replacing the extension
name in the JSON blob.
---
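Illustration for this patch (not part of the change): a minimal Python sketch of how the header byte selects the version and how the tail padding follows from the tail block size. It assumes that version 0 stores no channel modes in the tail block, and that padding only brings the tail up to the stated minimum (24 bytes for version 1, 32 bytes for version 0). The names `version_from_header` and `tail_layout` are hypothetical.

```python
def version_from_header(header_byte: int) -> int:
    """Map the header byte to a bitstream version (0xa0 -> 0, 0xa1 -> 1)."""
    if header_byte == 0xA0:
        return 0
    if header_byte == 0xA1:
        return 1
    raise ValueError("unsupported header byte")


def tail_layout(byte_stride: int, version: int) -> dict:
    """Sizes, in bytes, of the tail padding and the tail block."""
    if version not in (0, 1):
        raise ValueError("version must be 0 or 1")
    if byte_stride % 4 != 0 or not 4 <= byte_stride <= 256:
        raise ValueError("byteStride must be a multiple of 4 in [4, 256]")
    # Assumption: channel modes are present only in version 1.
    channel_modes = byte_stride // 4 if version == 1 else 0
    tail_block = byte_stride + channel_modes
    minimum = 24 if version == 1 else 32
    padding = max(0, minimum - tail_block)
    return {"padding": padding, "tail_block": tail_block}
```

For a `byteStride` of 16, version 1 has a 20-byte tail block preceded by 4 bytes of padding, while version 0 has a 16-byte tail block preceded by 16 bytes of padding.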
 .../Khronos/KHR_meshopt_compression/README.md | 25 +++++++++++++------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 83ef4f759c..289409ec20 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -207,11 +207,13 @@ To facilitate efficient decompression, deinterleaving and delta encoding are per
 
 The encoded stream structure is as follows:
 
-- Header byte, which must be equal to `0xa1`
+- Header byte, which must be equal to `0xa1` (version 1) or `0xa0` (version 0)
 - One or more attribute blocks, detailed below
-- Tail padding, which pads the size of the subsequent tail block to a minimum of 24 bytes (required for efficient decoding)
+- Tail padding, which pads the size of the subsequent tail block to a minimum of 24 bytes for version 1 or 32 bytes for version 0 (required for efficient decoding)
 - Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes)
 
+**Non-normative** While using version 1 is preferred for better compression, version 0 is provided for binary compatibility with `EXT_meshopt_compression`. When using version 0, the bitstream is identical to that defined in `EXT_meshopt_compression`.
+
 Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
 
 Each attribute block encodes a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
@@ -224,7 +226,7 @@ blockElements = min(remainingElements, maxBlockElements)
 Where `remainingElements` is the number of elements that have yet to be decoded.
 
 Each attribute block consists of:
-- Control header: `byteStride / 4` bytes specifying 4 packed control modes for each byte of a 4-byte channel
+- Control header (only in version 1): `byteStride / 4` bytes specifying 4 packed control modes for each byte of a 4-byte channel
 - `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements
 
 Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
@@ -248,7 +250,7 @@ The control bits specify the control mode for each byte:
 - bits 2: All delta bytes are 0; no data is stored for this byte
 - bits 3: Literal encoding; delta bytes are stored uncompressed with no header bits
 
-The structure of each "data block" (when not using control mode 2 or 3) breaks down as follows:
+The structure of each "data block" (when using control mode 0 or 1, or when using version 0) breaks down as follows:
 - Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
 - Delta blocks, with variable number of bytes stored for each group
 
@@ -260,18 +262,24 @@ Header bits are stored from least significant to most significant bit - header b
 
 The header bits establish the delta encoding mode for each group of 16 elements:
 
-For control mode 0:
+For control mode 0 (version 1):
 - bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
 - bits 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
 - bits 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
 - bits 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
 
-For control mode 1:
+For control mode 1 (version 1):
 - bits 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
 - bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
 - bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
 - bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
 
+For version 0:
+- bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
+- bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+- bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
+
 When using the sentinel encoding, each delta is stored as a 1-bit, 2-bit, or 4-bit value in packed bytes. For 2-bit and 4-bit encodings, deltas are stored from most significant to least significant bit inside the byte. For 1-bit encoding, deltas are stored from least significant to most significant bit to facilitate better reuse of lookup tables in efficient implementations. The 1-bit encoding is packed as follows with 8 deltas per byte:
 
 ```
@@ -292,7 +300,7 @@ And the 4-bit encoding is packed as follows with 2 deltas per byte:
 
 A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` for 2-bit encoding, and `15` for 4-bit encoding, otherwise known as "sentinel") indicates that the real delta value is outside of the bit range, and is stored as a full byte after the bit deltas for this group.
 
-To decode deltas into original values, the channel modes (specified in the tail block) are used. The encoded stride is split into `byteStride / 4` channels, and each channel specifies the mode in a single byte in the tail block, with the low 4 bits of the byte specifying the mode:
+To decode deltas into original values, the channel modes (specified in the tail block for version 1) are used. When using version 0, the channel mode is assumed to be 0 (byte deltas); other modes can only be present in version 1. The encoded stride is split into `byteStride / 4` channels, and each channel specifies the mode in a single byte in the tail block, with the low 4 bits of the byte specifying the mode:
 
 **Channel 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
 
@@ -648,7 +656,8 @@ void decode(intN_t input[4], intN_t output[4]) {
 
 This extension is derived from `EXT_meshopt_compression` with the following changes:
 
-- Vertex data uses upgraded v1 format which provides more granular bit packing (via control modes) and enhanced delta encoding (via channel modes) to compress data better
+- Vertex data supports an upgraded v1 format which provides more granular bit packing (via control modes) and enhanced delta encoding (via channel modes) to compress data better
+- For compatibility, the v0 format (identical to `EXT_meshopt_compression` format) is still supported; however, use of v1 format is preferred
 - New `COLOR` filter supports lossy color compression at smaller compression ratios using YCoCg encoding
 
 These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.

From ad492f17e0c3ef76f168d8457d81669d79cdad55 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 4 Sep 2025 10:29:58 -0700
Subject: [PATCH 21/54] Update rounding formulas to use round() and define
 round() as well as copysign() without references to external standards.

---
 .../2.0/Khronos/KHR_meshopt_compression/README.md   | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 289409ec20..18cd4b32a2 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -550,7 +550,9 @@ void decode(intN_t input[4], intN_t output[4]) {
 
 `INTN_MAX` is equal to 127 when using 8-bit components (N is 8) and equal to 32767 when using 16-bit components (N is 16).
 
-`copysign` behaves as specified in C99 and returns the value with the magnitude of the first argument and the sign of the second argument.
+`copysign` returns the value with the magnitude of the first argument and the sign of the second argument.
+
+`round` returns the nearest integer value, rounding halfway cases away from zero.
 
 ## Filter 2: quaternion
 
@@ -642,11 +644,10 @@ void decode(intN_t input[4], intN_t output[4]) {
 	// compute scaling factor
 	float ss = INTN_MAX / float(as);
 
-	// rounded float->int
-	output[0] = int(float(r) * ss + 0.5f);
-	output[1] = int(float(g) * ss + 0.5f);
-	output[2] = int(float(b) * ss + 0.5f);
-	output[3] = int(float(a) * ss + 0.5f);
+	output[0] = round(float(r) * ss);
+	output[1] = round(float(g) * ss);
+	output[2] = round(float(b) * ss);
+	output[3] = round(float(a) * ss);
 }
 ```
 

From 916b004cb8aab3474e773ca1b82fc4d43bca14f8 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 4 Sep 2025 10:33:11 -0700
Subject: [PATCH 22/54] Add forgotten 'version 1' reference to channel modes in
 tail block.

---
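Illustration for this patch (not part of the change): a minimal Python sketch of the control byte packing shown in the second hunk, where each control byte of a version 1 attribute block carries four 2-bit control modes, one per byte of a 4-byte channel. The function names are hypothetical.

```python
from typing import List


def pack_control_byte(modes: List[int]) -> int:
    """Pack four 2-bit control modes, byte 0 in the lowest bits."""
    if len(modes) != 4 or any(not 0 <= m <= 3 for m in modes):
        raise ValueError("expected four control modes in the range 0..3")
    return modes[0] | (modes[1] << 2) | (modes[2] << 4) | (modes[3] << 6)


def unpack_control_byte(control_byte: int) -> List[int]:
    """Inverse of pack_control_byte: control modes for bytes 0..3 of the channel."""
    return [(control_byte >> shift) & 3 for shift in (0, 2, 4, 6)]
```

For instance, control modes 0, 1, 2, 3 for bytes 0 to 3 pack into `0xe4`.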
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 18cd4b32a2..c3eaf09dc1 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -210,7 +210,7 @@ The encoded stream structure is as follows:
 - Header byte, which must be equal to `0xa1` (version 1) or `0xa0` (version 0)
 - One or more attribute blocks, detailed below
 - Tail padding, which pads the size of the subsequent tail block to a minimum of 24 bytes for version 1 or 32 bytes for version 0 (required for efficient decoding)
-- Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes)
+- Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes, only in version 1)
 
 **Non-normative** While using version 1 is preferred for better compression, version 0 is provided for binary compatibility with `EXT_meshopt_compression`. When using version 0, the bitstream is identical to that defined in `EXT_meshopt_compression`.
 
@@ -237,7 +237,7 @@ groupCount = ceil(blockElements / 16)
 
 For example, a stream with a `byteStride` of 64 containing 200 elements would be broken up into two attribute blocks: one containing 128 elements, and the other containing 72 elements. And these blocks would have 8 and 5 groups, respectively.
 
-The control header contains 2 bits for each byte position, packed into bytes as follows:
+The control header (only present in version 1) contains 2 bits for each byte position, packed into bytes as follows:
 
 ```
 controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2 << 4) | (controlForByte3 << 6)

From 0c1e83f83c5e4c3d7b1ca2d9ef02a1c66f42c3af Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 20 Nov 2025 21:25:39 +0100
Subject: [PATCH 23/54] Add content guidelines section

---
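Illustration for this patch (not part of the change): a minimal Python sketch of the EXT to KHR conversion mentioned in the guidelines, operating on glTF JSON that has already been parsed into a `dict`. It assumes the extension name can appear in `extensionsUsed`, `extensionsRequired`, and in the `extensions` objects of `buffers` and `bufferViews`. The function name `rename_meshopt_extension` is hypothetical, and the binary buffer data is left untouched.

```python
import copy

EXT_NAME = "EXT_meshopt_compression"
KHR_NAME = "KHR_meshopt_compression"


def rename_meshopt_extension(gltf: dict) -> dict:
    """Return a copy of the glTF JSON with the EXT extension name replaced by KHR."""
    result = copy.deepcopy(gltf)
    for key in ("extensionsUsed", "extensionsRequired"):
        if key in result:
            renamed = []
            for name in result[key]:
                name = KHR_NAME if name == EXT_NAME else name
                if name not in renamed:  # avoid listing KHR twice
                    renamed.append(name)
            result[key] = renamed
    for collection in ("buffers", "bufferViews"):
        for item in result.get(collection, []):
            extensions = item.get("extensions")
            if extensions and EXT_NAME in extensions:
                if KHR_NAME in extensions:
                    # Both variants on one object is not something this sketch resolves.
                    raise ValueError("object already uses the KHR variant")
                extensions[KHR_NAME] = extensions.pop(EXT_NAME)
    return result
```

The input is copied rather than modified in place, so the original asset description stays available for comparison.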
 .../2.0/Khronos/KHR_meshopt_compression/README.md      | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index c3eaf09dc1..3721cfa521 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -195,6 +195,16 @@ After pre-processing, both input and output data should be stored using mode 0 (
 
 When `EXT_mesh_gpu_instancing` extension is used, the instance transform data can also be compressed with the same techniques as animation data, using mode 0 (attributes) with filter 3 (exponential) for position and scale, and filter 2 (quaternion) for rotation.
 
+## Content guidelines
+
+> This section is non-normative.
+
+This extension expands on the compression provided by the existing `EXT_meshopt_compression` extension. Since existing tools and pipelines already support that extension, and existing assets already use it, the following guidelines are recommended for content creators and tool authors:
+
+- Tools that already support `EXT_meshopt_compression` extension should keep supporting it alongside this extension to be able to read pre-existing assets.
+- For maximum compatibility, DCC tools should give users a choice to use either variant when exporting assets. The default option should be eventually switched to the KHR variant once most loaders support it.
+- Existing assets that use the EXT variant can be losslessly converted to KHR, if needed, by changing the extension strings inside glTF JSON.
+
 # Appendix A: Bitstream
 
 The following sections specify the format of the bitstream for compressed data for various modes.

From cef4841b236291acbf2f3dd9cdc99b2a9f3cf506 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 20 Nov 2025 21:31:59 +0100
Subject: [PATCH 24/54] Add a guideline wrt assets when loader supports both
 extensions

This clarifies that there are no tradeoffs involved in KHR+v1 vs EXT+v0
other than compatibility.
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 3721cfa521..2328b848b2 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -204,6 +204,7 @@ This extension expands the compression available by the existing extension `EXT_
 - Tools that already support `EXT_meshopt_compression` extension should keep supporting it alongside this extension to be able to read pre-existing assets.
 - For maximum compatibility, DCC tools should give users a choice to use either variant when exporting assets. The default option should be eventually switched to the KHR variant once most loaders support it.
 - Existing assets that use the EXT variant can be losslessly converted to KHR, if needed, by changing the extension strings inside glTF JSON.
+- When producing assets that target loaders supporting both extensions, using this extension with v1 format should be preferred since it provides better compression ratio at no additional runtime cost.
 
 # Appendix A: Bitstream
 

From bc8c083bd0e20b3e1de0b36b6b6b23b61795b371 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 15:28:39 -0800
Subject: [PATCH 25/54] Wording adjustments and clarifications

---
 .../Khronos/KHR_meshopt_compression/README.md | 68 +++++++++----------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 2328b848b2..34a8334b4b 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -79,7 +79,7 @@ For the extension object to be valid, the following must hold:
 
 - When parent `bufferView` has `byteStride` defined, it matches `byteStride` in the extension JSON
 - The parent `bufferView.byteLength` is equal to `byteStride` times `count`
-- When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be <= 256.
+- When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be >= 4 and <= 256.
 - When `mode` is `"TRIANGLES"`, `count` must be divisible by 3
 - When `mode` is `"TRIANGLES"` or `"INDICES"`, `byteStride` must be equal to 2 or 4
 - When `mode` is `"TRIANGLES"` or `"INDICES"`, `filter` must be equal to `"NONE"` or omitted
@@ -88,7 +88,7 @@ For the extension object to be valid, the following must hold:
 - When `filter` is `"EXPONENTIAL"`, `byteStride` must be divisible by 4
 - When `filter` is `"COLOR"`, `byteStride` must be equal to 4 or 8
 
-The type of compressed data must match the bitstream specification (note that each `mode` specifies a different bitstream format).
+The compressed bitstream format is defined by the value of the `mode` property.
 
 The parent `bufferView` properties define a layout which can hold the data decompressed from the extension object.
 
@@ -112,7 +112,7 @@ Filter specifies the algorithm used to transform the data after decompression, a
 - Filter 1: octahedral. Suitable for storing unit length vectors (normals/tangents) as 4-byte or 8-byte values with variable precision octahedral encoding.
 - Filter 2: quaternion. Suitable for storing rotation data for animations or instancing as 8-byte values with variable precision max-component encoding.
 - Filter 3: exponential. Suitable for storing floating point data as 4-byte values with variable mantissa precision.
-- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg color space encoding.
+- Filter 4: color. Suitable for storing color data as 4-byte or 8-byte values using variable precision YCoCg color model.
 
 The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
 
@@ -234,7 +234,7 @@ maxBlockElements = min((8192 / byteStride) & ~15, 256)
 blockElements = min(remainingElements, maxBlockElements)
 ```
 
-Where `remainingElements` is the number of elements that have yet to be decoded.
+Where `remainingElements` is the number of elements that have yet to be decoded (with the initial value of `count` extension property).
 
 Each attribute block consists of:
 - Control header (only in version 1): `byteStride / 4` bytes specifying 4 packed control modes for each byte of a 4-byte channel
@@ -256,10 +256,10 @@ controlByte = (controlForByte0 << 0) | (controlForByte1 << 2) | (controlForByte2
 
 The control bits specify the control mode for each byte:
 
-- bits 0: Use bit lengths `{0, 1, 2, 4}` for encoding
-- bits 1: Use bit lengths `{1, 2, 4, 8}` for encoding
-- bits 2: All delta bytes are 0; no data is stored for this byte
-- bits 3: Literal encoding; delta bytes are stored uncompressed with no header bits
+- control mode 0: Use bit lengths `{0, 1, 2, 4}` for encoding
+- control mode 1: Use bit lengths `{1, 2, 4, 8}` for encoding
+- control mode 2: All delta bytes are 0; no data is stored for this byte
+- control mode 3: Literal encoding; delta bytes are stored uncompressed with no header bits
 
 The structure of each "data block" (when using control mode 0 or 1, or when using version 0) breaks down as follows:
 - Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
@@ -274,22 +274,22 @@ Header bits are stored from least significant to most significant bit - header b
 The header bits establish the delta encoding mode for each group of 16 elements:
 
 For control mode 0 (version 1):
-- bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
-- bits 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
-- bits 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
-- bits 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+- delta encoding mode 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
+- delta encoding mode 1: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
+- delta encoding mode 2: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- delta encoding mode 3: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
 
 For control mode 1 (version 1):
-- bits 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
-- bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
-- bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
-- bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
+- delta encoding mode 0: Deltas are stored in 1-bit sentinel encoding; the size of the encoded block is [2..18] bytes
+- delta encoding mode 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- delta encoding mode 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+- delta encoding mode 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
 
 For version 0:
-- bits 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
-- bits 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
-- bits 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
-- bits 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
+- delta encoding mode 0: All 16 byte deltas are 0; the size of the encoded block is 0 bytes
+- delta encoding mode 1: Deltas are stored in 2-bit sentinel encoding; the size of the encoded block is [4..20] bytes
+- delta encoding mode 2: Deltas are stored in 4-bit sentinel encoding; the size of the encoded block is [8..24] bytes
+- delta encoding mode 3: All 16 byte deltas are stored as bytes; the size of the encoded block is 16 bytes
 
 When using the sentinel encoding, each delta is stored as a 1-bit, 2-bit, or 4-bit value in packed bytes. For 2-bit and 4-bit encodings, deltas are stored from most significant to least significant bit inside the byte. For 1-bit encoding, deltas are stored from least significant to most significant bit to facilitate better reuse of lookup tables in efficient implementations. The 1-bit encoding is packed as follows with 8 deltas per byte:
 
@@ -313,7 +313,7 @@ A delta that has all bits set to 1 (corresponds to `1` for 1-bit encoding, `3` f
 
 To decode deltas into original values, the channel modes (specified in the tail block for version 1) are used. When using version 0, the channel mode is assumed to be 0 (byte deltas); other modes can only be present in version 1. The encoded stride is split into `byteStride / 4` channels, and each channel specifies the mode in a single byte in the tail block, with the low 4 bits of the byte specifying the mode:
 
-**Channel 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
+**Channel mode 0 (byte deltas)**: Byte deltas are stored as zigzag-encoded differences between the byte values of the element and the byte values of the previous element in the same position; the zigzag encoding scheme works as follows:
 
 ```
 encode(uint8_t v) = ((v & 0x80) != 0) ? ~(v << 1) : (v << 1)
@@ -326,24 +326,24 @@ For a complete example, assuming 4-bit sentinel coding, the following byte seque
 0x17 0x5f 0xf0 0xbc 0x77 0xa9 0x21 0x00 0x34 0xb5
 ```
 
-Encodes 16 deltas, where the first 8 bytes of the sequence specifies 16 4-bit deltas, and the last 2 bytes of the sequence specify the explicit delta code values encoded for elements 3 and 4 in the sequence. After de-zigzagging, the decoded deltas look like:
+Encodes 16 deltas, where the first 8 bytes of the sequence specify 16 4-bit deltas, and the last 2 bytes of the sequence specify the explicit delta code values encoded for elements 3 and 4 in the sequence. After de-zigzagging, the decoded deltas look like:
 
 ```
 -1 -4 -3 26 -91 0 -6 6 -4 -4 5 -5 1 -1 0 0
 ```
 
-Finally, note that the deltas are computed in 8-bit integer space with wrap-around two-complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
+Finally, note that the deltas are computed in 8-bit integer space with wraparound two's complement arithmetic; for example, if the values of the first byte of two consecutive elements are `0x00` and `0xff`, the byte delta that is stored is `-1` (`1` after zigzag encoding).
 
-**Channel 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between 16-bit values of the element and the previous element in the same position; the zigzag encoding scheme works as follows:
+**Channel mode 1 (2-byte deltas)**: 2-byte deltas are computed as zigzag-encoded differences between 16-bit values of the element and the previous element in the same position; the zigzag encoding scheme works as follows:
 
 ```
 encode(uint16_t v) = ((v & 0x8000) != 0) ? ~(v << 1) : (v << 1)
 decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
 ```
 
-The deltas are computed in 16-bit integer space with wrap-around two-complement arithmetic. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
+The deltas are computed in 16-bit integer space with wraparound two's complement arithmetic. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
 
-**Channel 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between 32-bit values of the element and the previous element in the same position, with an additional rotation applied based on the high 4 bits of the channel mode byte:
+**Channel mode 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between 32-bit values of the element and the previous element in the same position, with an additional rotation applied based on the high 4 bits of the channel mode byte:
 
 ```
 rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
@@ -360,16 +360,16 @@ Triangle compression compresses triangle list indices by exploiting similarity b
 The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xe1`
-- Triangle codes, referred to as `code` below, with a single byte for each triangle
+- Triangle codes, referred to as `code` below, with a single byte for each triangle (for a total of `count` extension property divided by 3, since `count` counts index values)
 - Extra data which is necessary to decode triangles that don't fit into a single byte, referred to as `data` below
-- Tail block, which consists of a 16-byte lookup table, referred to as `codeaux` below
+- Tail block, which consists of a 16-byte lookup table (containing 16 one-byte values), referred to as `codeaux` below
 
 Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
 
 There are two limitations on the structure of the 16-byte lookup table:
 
 - The last two bytes must be 0
-- The remaining bytes must not contain any nibbles equal to `0xf`.
+- Neither high four bits nor low four bits of any of 16 bytes can be equal to `0xf`.
 
 During the decoding process, decoder maintains four variables:
 
@@ -481,7 +481,7 @@ Index compression exploits similarity between consecutive indices. Note that, un
 The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xd1`
-- A sequence of index deltas, with encoding specified below
+- A sequence of index deltas (with number of elements equal to `count` extension property), with encoding specified below
 - Tail block, which consists of 4 bytes that are reserved and should be set to 0
 
 Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
@@ -520,7 +520,7 @@ For performance reasons the results of the decoding process are specified to one
 
 Octahedral filter allows to encode unit length 3D vectors (normals/tangents) using octahedral encoding, which results in a more optimal quality vs precision tradeoff compared to storing raw components.
 
-This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components.
+This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit signed components, and when `byteStride` is 8, the input and output of this filter are four 16-bit signed components.
 
 The input to the filter is four 8-bit or 16-bit components, where the first two specify the X and Y components in octahedral encoding encoded as signed normalized K-bit integers (2 <= K <= 16, integers are stored in two's complement format), the third component explicitly encodes 1.0 as a signed normalized K-bit integer. The last component may contain arbitrary data which is passed through unfiltered (this can be useful for tangents).
 
@@ -608,7 +608,7 @@ Exponential filter allows to encode floating point values with a range close to
 
 This filter is only valid if `byteStride` is a multiple of 4.
 
-The input to the filter is a sequence of 32-bit little endian integers, with the most significant 8 bits specifying a (signed) exponent value, and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two-complement format.
+The input to the filter is a sequence of 32-bit little endian integers, with the most significant 8 bits specifying a (signed) exponent value, and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two's complement format.
 
 The result of the filter is 2^e * m:
 
@@ -624,13 +624,13 @@ The valid range of `e` is [-100, +100], which facilitates performant implementat
 
 ## Filter 4: color
 
-Color filter allows to encode color data using YCoCg color space transformation, which results in better compression for typical color data by exploiting correlation between color channels.
+Color filter allows to encode color data using YCoCg color model, which results in better compression for typical color data by exploiting correlation between color channels.
 
 This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
 
 The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K set to 1. 1 <= K <= 16, signed integers are stored in two's complement format.
 
-The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that RGB can be reconstructed using K-bit integer math without overflow or underflow.
+The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow.
 
 The alpha component uses K-1 bits for the alpha value with the high bit set to 1; note that K can be smaller than the bit depth. This allows decoder to recover K and decode the color and alpha values correctly.
 

From 6df650a3288fcc90e50c5b999b39102607407797 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 15:45:21 -0800
Subject: [PATCH 26/54] Fix K range for quaternion filter

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 34a8334b4b..e826caec80 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -571,13 +571,13 @@ Quaternion filter allows to encode unit length quaternions using normalized 16-b
 
 This filter is only valid if `byteStride` is 8.
 
-The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (2 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
+The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
 
 When storing a K-bit integer in a 16-bit component when K is not 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
 The output of the filter is four decoded quaternion components, stored as 16-bit normalized integers.
 
-After eliminating the maximum component, the maximum magnitude of the remaining components is 1/sqrt(2). Because of this the input components store the original component value scaled by sqrt(2.0) to increase precision.
+After eliminating the maximum component, the maximum magnitude of the remaining components is `1.0/sqrt(2.0)`. Because of this the input components store the original component value scaled by `sqrt(2.0)` to increase precision.
 
 ```
 void decode(int16_t input[4], int16_t output[4]) {

From 7a3039dba336e64d8ee93bb27ba1e2f14d4b6046 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 16:45:30 -0800
Subject: [PATCH 27/54] Adjust structural rules per review discussion

- Prohibit use of extension on buffers that use EXT variant to avoid
  fallback complexity
- Remove byteStride equality restriction as it doesn't affect decoding
  in practice and byteStride may be absent anyway
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index e826caec80..27d4dcdc9e 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -18,6 +18,7 @@ Written against the glTF 2.0 spec.
 ## Exclusions
 
 - This extension must not be used on a buffer view that also uses `EXT_meshopt_compression`.
+- This extension must not be used on a buffer that also uses `EXT_meshopt_compression` (see "Fallback buffers").
 
 ## Overview
 
@@ -77,7 +78,6 @@ Each `bufferView` can contain an extension object with the following properties:
 
 For the extension object to be valid, the following must hold:
 
-- When parent `bufferView` has `byteStride` defined, it matches `byteStride` in the extension JSON
 - The parent `bufferView.byteLength` is equal to `byteStride` times `count`
 - When `mode` is `"ATTRIBUTES"`, `byteStride` must be divisible by 4 and must be >= 4 and <= 256.
 - When `mode` is `"TRIANGLES"`, `count` must be divisible by 3

From 65c5ceefed7d48d00f93da1f2e3c10c7d93f50d6 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 16:51:55 -0800
Subject: [PATCH 28/54] Clarifications of LEB128, padding, FIFO sizes

---
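Reviewer note (non-normative, not part of the README change): a minimal
sketch of reading one varint-7 / unsigned LEB128 value with the 5-byte
cap added here. `read_varint7` is a hypothetical helper, not a function
from any existing library. Treating a missing terminator within 5 bytes
as an error, and discarding bits above bit 31 from the fifth byte, are
assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one unsigned LEB128 value from data[*offset .. size).
 * Returns 1 and stores the value in *out on success.
 * Returns 0 if the input ends early or no terminating byte (high bit 0)
 * appears within 5 bytes; *offset is left past the bytes that were read. */
static int read_varint7(const uint8_t *data, size_t size, size_t *offset, uint32_t *out)
{
    uint32_t value = 0;

    for (int i = 0; i < 5; ++i) {
        if (*offset >= size)
            return 0; /* truncated stream */

        uint8_t byte = data[(*offset)++];

        /* Low 7 bits carry payload, least significant group first. */
        value |= (uint32_t)(byte & 0x7f) << (7 * i);

        if ((byte & 0x80) == 0) {
            *out = value;
            return 1;
        }
    }

    return 0; /* more than 5 bytes */
}
```

With this shape, the single byte `0x7f` decodes to `0x7f`, matching the
first example in the README, and a run of five continuation bytes is
rejected rather than read indefinitely.
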
 .../2.0/Khronos/KHR_meshopt_compression/README.md      | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 27d4dcdc9e..0108b395a4 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -220,7 +220,7 @@ The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xa1` (version 1) or `0xa0` (version 0)
 - One or more attribute blocks, detailed below
-- Tail padding, which pads the size of the subsequent tail block to a minimum of 24 bytes for version 1 or 32 bytes for version 0 (required for efficient decoding)
+- Tail padding, which pads the size of the subsequent tail block with zero bytes to a minimum of 24 bytes for version 1 or 32 bytes for version 0 (required for efficient decoding)
 - Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes, only in version 1)
 
 **Non-normative** While using version 1 is preferred for better compression, version 0 is provided for binary compatibility with `EXT_meshopt_compression`. When using version 0, the bitstream is identical to that defined in `EXT_meshopt_compression`.
@@ -375,12 +375,12 @@ During the decoding process, decoder maintains four variables:
 
 - `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0
 - `last`: an integer referring to the last encoded index, starts at 0
-- `edgefifo`: a 16-entry FIFO with two vertex indices in each entry; initial contents is undefined
-- `vertexfifo`: a 16-entry FIFO with a vertex index in each entry; initial contents is undefined
+- `edgefifo`: a 16-entry FIFO with two `uint32_t` vertex indices in each entry; initial contents is undefined
+- `vertexfifo`: a 16-entry FIFO with a `uint32_t` vertex index in each entry; initial contents is undefined
 
 To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file.
 
-When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)), which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
+When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) with up to 5 bytes, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
 
 ```
 0x7f => 0x7f
@@ -486,7 +486,7 @@ The encoded stream structure is as follows:
 
 Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
 
-To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
+To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) with up to 5 bytes is used, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
 
 ```
 0x7f => 0x7f

From d8f9812efb6c0abdf85eac029cb6b7dde73aa001 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 16:56:20 -0800
Subject: [PATCH 29/54] Add validity restrictions wrt end of stream or
 unprocessed bytes

Also prohibit streams that require undefined FIFO reads
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 0108b395a4..0c82e508a5 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -225,7 +225,7 @@ The encoded stream structure is as follows:
 
 **Non-normative** While using version 1 is preferred for better compression, version 0 is provided for binary compatibility with `EXT_meshopt_compression`. When using version 0, the bitstream is identical to that defined in `EXT_meshopt_compression`.
 
-Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
+Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`) so that the tail block element can be found. If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid.
 
 Each attribute block encodes a sequence of deltas, with the first element in the first block using the deltas from the baseline element stored in the tail block, and each subsequent element using the deltas from the previous element. The attribute block always stores an integer number of elements, with that number computed as follows:
 
@@ -364,7 +364,7 @@ The encoded stream structure is as follows:
 - Extra data which is necessary to decode triangles that don't fit into a single byte, referred to as `data` below
 - Tail block, which consists of a 16-byte lookup table (containing 16 one-byte values), referred to as `codeaux` below
 
-Note that there is no way to calculate the length of a stream; instead, it is expected that the input stream is correctly sized (using `byteLength`) so that the tail block element can be found.
+Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`) so that the tail block element can be found. If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid.
 
 There are two limitations on the structure of the 16-byte lookup table:
 
@@ -401,6 +401,8 @@ uint32_t decodeIndex(uint32_t v) {
 
 The encoding for `code` is split into various cases, some of which are self-sufficient and some need to read extra data. The encoding is detailed below; after either path the triangle (a, b, c) is emitted to the output.
 
+Any streams that require the decoder to read an edge or vertex FIFO entry that was not previously written are invalid.
+
 - `0xX0`, where `X < 0xf`: Encodes a recently encountered edge and a `next` vertex.
 
 The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
@@ -484,6 +486,8 @@ The encoded stream structure is as follows:
 - A sequence of index deltas (with number of elements equal to `count` extension property), with encoding specified below
 - Tail block, which consists of 4 bytes that are reserved and should be set to 0
 
+Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`) so that the tail block element can be found. If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid.
+
 Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
 
 To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) with up to 5 bytes is used, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:

From d9c4fa0f4e29fd77226c2592ab8945a70e0518bc Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 17:16:00 -0800
Subject: [PATCH 30/54] Decoder needs to keep track of data section offset

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 0c82e508a5..cd30792ddd 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -371,8 +371,9 @@ There are two limitations on the structure of the 16-byte lookup table:
 - The last two bytes must be 0
 - Neither high four bits nor low four bits of any of 16 bytes can be equal to `0xf`.
 
-During the decoding process, decoder maintains four variables:
+During the decoding process, decoder maintains five variables:
 
+- current offset into `data` section
 - `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0
 - `last`: an integer referring to the last encoded index, starts at 0
 - `edgefifo`: a 16-entry FIFO with two `uint32_t` vertex indices in each entry; initial contents is undefined

From 4f1c7ee45305003a0c9af6a37b019cdaca0ed60d Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 17:19:51 -0800
Subject: [PATCH 31/54] Channel clarifications

- Remove misleading use of channels in relation to control modes; the
  fact that we use a byte per channel here is accidental. In reality
  it's just a packed sequence of 2-bit control modes.
- Streams that use channel mode 3+, or channel mode 0/1 with non-zero
  high bits, are invalid.
- Rotation takes an unsigned amount and uses a formula that doesn't
  trigger C undefined behavior when the amount is 0.
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
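
Illustrative note, not part of the README change: a minimal C sketch of the rotation formula and the new validity rule. The helper names are hypothetical. The sketch also assumes that the channel mode occupies the low 4 bits of the channel mode byte; the spec text in this patch only states that the high 4 bits carry the rotation.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate v left by r bits. Masking the right-shift amount keeps it in 0..31,
   so r == 0 never produces a shift by 32, which is undefined behavior in C. */
uint32_t rotate_left(uint32_t v, unsigned r) {
    assert(r < 32); /* r comes from a 4-bit field, so in practice r <= 15 */
    return (v << r) | (v >> ((32 - r) & 31));
}

/* Hypothetical helper. ASSUMPTION: the mode is stored in the low 4 bits of
   the channel mode byte and the rotation in the high 4 bits. */
int channel_byte_valid(uint8_t channel) {
    unsigned mode = channel & 0x0f;
    unsigned high = channel >> 4;

    if (mode > 2)
        return 0; /* channel mode 3 or above is invalid */
    if (mode < 2 && high != 0)
        return 0; /* modes 0 and 1 require zero high bits */
    return 1;
}
```

With the original `32 - r` shift, `r == 0` would shift a 32-bit value by 32 bits; the masked form returns `v` unchanged in that case.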

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index cd30792ddd..427bd7a1d6 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -237,7 +237,7 @@ blockElements = min(remainingElements, maxBlockElements)
 Where `remainingElements` is the number of elements that have yet to be decoded (with the initial value of `count` extension property).
 
 Each attribute block consists of:
-- Control header (only in version 1): `byteStride / 4` bytes specifying 4 packed control modes for each byte of a 4-byte channel
+- Control header (only in version 1): `byteStride / 4` bytes specifying a packed 2-bit control mode for each byte position of the element
 - `byteStride` "data blocks" (one for each byte of the element), each containing deltas stored for groups of elements
 
 Each group always contains 16 elements; when the number of elements that needs to be encoded isn't divisible by 16, it gets rounded up and the remaining elements are ignored after decoding. In other terms:
@@ -343,16 +343,18 @@ decode(uint16_t v) = ((v & 1) != 0) ? ~(v >> 1) : (v >> 1)
 
 The deltas are computed in 16-bit integer space with wraparound two's complement arithmetic. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
 
-**Channel mode 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between 32-bit values of the element and the previous element in the same position, with an additional rotation applied based on the high 4 bits of the channel mode byte:
+**Channel mode 2 (4-byte XOR deltas)**: 4-byte deltas are computed as XOR between 32-bit values of the element and the previous element in the same position, with an additional rotation `r` applied based on the high 4 bits of the channel mode byte:
 
 ```
-rotate(uint32_t v, int r) = (v << r) | (v >> (32 - r))
+rotate(uint32_t v, uint r) = (v << r) | (v >> ((32 - r) & 31))
 ```
 
 The deltas are computed in 32-bit integer space. Values are assumed to be little-endian, so the least significant byte is encoded before the most significant byte.
 
 Because the channel mode defines encoding for 4 bytes at once, it's impossible to mix modes 0 and 1 within the same channel: if the first 2-byte group of an aligned 4-byte group uses 2-byte deltas, the second 2-byte group must use 2-byte deltas as well.
 
+Streams that use channel mode 3 or above, as well as streams that use channel mode 0 or 1 with high 4 bits of the channel mode byte not equal to 0, are invalid.
+
 ## Mode 1: triangles
 
 Triangle compression compresses triangle list indices by exploiting similarity between consecutive triangles. Given a triangle stream that has been optimized for locality, very often subsequent triangles share an edge with the recently encoded triangle. The encoder uses a few other techniques to try to encode most triangles in optimized triangle lists into a single byte.

From 261045443e833f74328fcbd8967528080095f821 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 17:37:15 -0800
Subject: [PATCH 32/54] Update triangle decoding to be more algorithmic

---
 .../Khronos/KHR_meshopt_compression/README.md | 40 +++++++++++++++----
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 427bd7a1d6..393282f079 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -447,9 +447,18 @@ Vertex c is pushed to the vertex FIFO.
 
 The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`.
 
-The first index, `a`, is equal to `next`; `next` is incremented to decode b/c correctly.
-The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), or is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
-The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), or is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
+The triangle indices are set as follows:
+
+Set first index, `a`, to `next`.
+Increment `next`.
+If `Z == 0`:
+	Set second index, `b`, to `next`.
+	Increment `next`.
+Else read second index, `b`, from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
+If `W == 0`:
+	Set third index, `c`, to `next`.
+	Increment `next`.
+Else read third index, `c`, from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
 
 Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W.
 
@@ -462,13 +471,30 @@ Vertex c is pushed to the vertex FIFO if `W == 0`.
 
 - `0xfe` or `0xff`: Encodes three indices explicitly.
 
-This requires an extra byte that is read from `data`; let's assume that results in `0xZW`. Note that this is *not* an LEB128 value, just a single byte.
+Read one byte from `data` as-is, without using LEB128 decoding; let's assume that results in `0xZW`.
 
 If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing.
 
-The first index, `a`, is equal to `next` for `0xfe` encoding (`next` is then incremented), or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
-The second index, `b`, is equal to `next` if `Z == 0` (`next` is then incremented), is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex) if `Z < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `Z == 0xf`.
-The third index, `c`, is equal to `next` if `W == 0` (`next` is then incremented), is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex) if `W < 0xf`, or is read using `decodeIndex` by reading extra bytes from `data` (and also updates `last`) if `W == 0xf`.
+The triangle indices are set as follows:
+
+If using `0xfe` encoding:
+	Set first index, `a`, to `next`.
+	Increment `next`.
+Else read first index, `a`, using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
+
+If `Z == 0`:
+	Set second index, `b`, to `next`.
+	Increment `next`.
+Else if `Z < 0xf`:
+	Read second index, `b`, from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
+Else read second index, `b`, using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
+
+If `W == 0`:
+	Set third index, `c`, to `next`.
+	Increment `next`.
+Else if `W < 0xf`:
+	Read third index, `c`, from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
+Else read third index, `c`, using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
 
 Edge (b, a) is pushed to the edge FIFO.
 Edge (c, b) is pushed to the edge FIFO.

From 7ae3d991dfb9c0e965b8282e9f5f81b77021c906 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 17:50:31 -0800
Subject: [PATCH 33/54] Update to remove "must"

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 393282f079..6cb867b24b 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -539,9 +539,7 @@ uint32_t decode(uint32_t v) {
 }
 ```
 
-It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with baseline bit always set to 0) as well as more complex bimodal encodings.
-
-Note that the zigzag-encoded delta must fit in a 31-bit integer; as such, deltas are limited to [-2^30..2^30-1].
+It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with baseline bit always set to 0) as well as more complex bimodal encodings. Since zigzag-encoded delta uses a 31-bit integer, the deltas are limited to [-2^30..2^30-1].
 
 # Appendix B: Filters
 

From 7d6cfb67945fd3386f83e15353e98087c707b98c Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 26 Nov 2025 17:53:20 -0800
Subject: [PATCH 34/54] Clarify wraparound semantics

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 6cb867b24b..9613973a20 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -376,7 +376,7 @@ There are two limitations on the structure of the 16-byte lookup table:
 During the decoding process, decoder maintains five variables:
 
 - current offset into `data` section
-- `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0
+- `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0 and is incremented with unsigned 32-bit wraparound
 - `last`: an integer referring to the last encoded index, starts at 0
 - `edgefifo`: a 16-entry FIFO with two `uint32_t` vertex indices in each entry; initial contents is undefined
 - `vertexfifo`: a 16-entry FIFO with a `uint32_t` vertex index in each entry; initial contents is undefined
@@ -397,7 +397,7 @@ Instead of using the raw index value, a zigzag-encoded 32-bit delta from `last`
 uint32_t decodeIndex(uint32_t v) {
 	int32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
 
-	last += delta;
+	last += delta; // unsigned 32-bit wraparound
 	return last;
 }
 ```
@@ -534,7 +534,7 @@ uint32_t decode(uint32_t v) {
 	int32_t baseline = v & 1;
 	int32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);
 
-	last[baseline] += delta;
+	last[baseline] += delta; // unsigned 32-bit wraparound
 	return last[baseline];
 }
 ```

From b6fb8bca1eb056838751fe13db4d157861ea7fc0 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 27 Nov 2025 08:04:58 -0800
Subject: [PATCH 35/54] int32_t => uint32_t

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
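
Illustrative note, not part of the README change: a small C sketch of the mode 2 delta decoding after the switch to `uint32_t`. The function name and the explicit `last` parameter are assumptions made for the example; the README keeps `last` as implicit decoder state.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one mode 2 value that was already read as a varint-7 integer.
   last[] holds the two baselines; both start at 0. */
uint32_t decode_index_delta(uint32_t last[2], uint32_t v) {
    uint32_t baseline = v & 1;
    /* Zigzag decoding done entirely in unsigned space: a set sign bit
       yields ~(v >> 2), the two's complement encoding of a negative delta. */
    uint32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);

    assert(baseline < 2); /* always true; documents the array bound */
    last[baseline] += delta; /* unsigned 32-bit wraparound */
    return last[baseline];
}
```

Keeping `delta` unsigned means the addition is ordinary modular arithmetic, so a negative delta that takes a baseline below zero wraps around instead of overflowing a signed integer.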

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 9613973a20..4e16faae3c 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -395,8 +395,7 @@ Instead of using the raw index value, a zigzag-encoded 32-bit delta from `last`
 
 ```
 uint32_t decodeIndex(uint32_t v) {
-	int32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
-
+	uint32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
 	last += delta; // unsigned 32-bit wraparound
 	return last;
 }
@@ -531,9 +530,8 @@ When decoding the deltas, the 32-bit value is read using the varint-7 encoding.
 
 ```
 uint32_t decode(uint32_t v) {
-	int32_t baseline = v & 1;
-	int32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);
-
+	uint32_t baseline = v & 1;
+	uint32_t delta = (v & 2) != 0 ? ~(v >> 2) : (v >> 2);
 	last[baseline] += delta; // unsigned 32-bit wraparound
 	return last[baseline];
 }

From be62d33e070f271dd494da8d407e264d267dbaa0 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 27 Nov 2025 08:15:35 -0800
Subject: [PATCH 36/54] Add clarifications wrt uint32 wraparound and next/last
 being uint32

---
 .../2.0/Khronos/KHR_meshopt_compression/README.md      | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
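
Illustrative note, not part of the README change: a C sketch of reading a varint-7 value with unsigned 32-bit wraparound and applying it as a zigzag delta to `last`. The function names and the bounds-checked reader interface are hypothetical. Rejecting a fifth byte that still has its continuation bit set is an assumption of this sketch, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one varint-7 (unsigned LEB128) value of up to 5 bytes.
   Returns 1 on success and 0 if the data ends early.
   ASSUMPTION: a fifth byte with the continuation bit set is an error. */
int read_varint7(const uint8_t *data, size_t size, size_t *offset, uint32_t *out) {
    uint32_t result = 0;

    assert(data != NULL && offset != NULL && out != NULL);
    for (unsigned i = 0; i < 5; ++i) {
        if (*offset >= size)
            return 0;
        uint8_t byte = data[(*offset)++];
        /* Bits shifted past bit 31 are discarded (unsigned wraparound). */
        result |= (uint32_t)(byte & 0x7f) << (7 * i);
        if ((byte & 0x80) == 0) {
            *out = result;
            return 1;
        }
    }
    return 0;
}

/* Apply a zigzag-encoded delta to last and return the decoded index. */
uint32_t decode_index(uint32_t *last, uint32_t v) {
    uint32_t delta = (v & 1) != 0 ? ~(v >> 1) : (v >> 1);
    *last += delta; /* unsigned 32-bit wraparound */
    return *last;
}
```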

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 4e16faae3c..31cf2901b8 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -376,8 +376,8 @@ There are two limitations on the structure of the 16-byte lookup table:
 During the decoding process, decoder maintains five variables:
 
 - current offset into `data` section
-- `next`: an integer referring to the expected next unique index (also known as high-watermark), starts at 0 and is incremented with unsigned 32-bit wraparound
-- `last`: an integer referring to the last encoded index, starts at 0
+- `next`: a `uint32_t` referring to the expected next unique index (also known as high-watermark), starts at 0 and is incremented with unsigned 32-bit wraparound
+- `last`: a `uint32_t` referring to the last encoded index, starts at 0
 - `edgefifo`: a 16-entry FIFO with two `uint32_t` vertex indices in each entry; initial contents is undefined
 - `vertexfifo`: a 16-entry FIFO with a `uint32_t` vertex index in each entry; initial contents is undefined
 
@@ -391,7 +391,7 @@ When extra data is necessary to decode a triangle and it represents an index val
 0xff 0xa0 0x05 => 0x1507f
 ```
 
-Instead of using the raw index value, a zigzag-encoded 32-bit delta from `last` is used:
+When decoding the deltas, the 32-bit value is read using the varint-7 encoding (with unsigned 32-bit wraparound). The resulting value specifies a zigzag-encoded signed delta from `last` and can be decoded as follows:
 
 ```
 uint32_t decodeIndex(uint32_t v) {
@@ -526,7 +526,7 @@ To specify the index delta, the varint-7 encoding scheme (also known as [unsigne
 0xff 0xa0 0x05 => 0x1507f
 ```
 
-When decoding the deltas, the 32-bit value is read using the varint-7 encoding. The least significant bit of the value indicates one of the baseline values; the remaining bits specify a zigzag-encoded signed delta and can be decoded as follows:
+When decoding the deltas, the 32-bit value is read using the varint-7 encoding (with unsigned 32-bit wraparound). The least significant bit of the value indicates one of the baseline values; the remaining bits specify a zigzag-encoded signed delta and can be decoded as follows:
 
 ```
 uint32_t decode(uint32_t v) {
@@ -639,7 +639,7 @@ This filter is only valid if `byteStride` is a multiple of 4.
 
 The input to the filter is a sequence of 32-bit little endian integers, with the most significant 8 bits specifying a (signed) exponent value, and the remaining 24 bits specifying a (signed) mantissa value. The integers are stored in two's complement format.
 
-The result of the filter is 2^e * m:
+The output of the filter is a sequence of 32-bit floating point values, represented according to IEEE 754 standard. Each value is computed from the integer input as `2^e * m`:
 
 ```
 float32_t decode(int32_t input) {

From d700b37906b9dbfcbaa102334b18796deb5a3491 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 27 Nov 2025 08:25:25 -0800
Subject: [PATCH 37/54] Reformat index decoding as bulleted list

---
 .../Khronos/KHR_meshopt_compression/README.md | 148 +++++++++---------
 1 file changed, 74 insertions(+), 74 deletions(-)
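
Illustrative note, not part of the README change: a C sketch of the `0xfY` (`Y < 0xe`) case from the bulleted list, with the `codeaux` lookup already done by the caller. The struct layout, the ring-buffer representation of the FIFOs and all function names are assumptions made for the example. The sketch reports a read of a never-written vertex FIFO entry as an invalid stream.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t next;          /* high-watermark, starts at 0 */
    uint32_t vertexfifo[16];
    unsigned vertexoffset;  /* slot that the next vertex push writes to */
    unsigned vertexwritten; /* entries written so far, saturates at 16 */
    uint32_t edgefifo[16][2];
    unsigned edgeoffset;    /* slot that the next edge push writes to */
} TriangleState;

void push_vertex(TriangleState *s, uint32_t v) {
    s->vertexfifo[s->vertexoffset] = v;
    s->vertexoffset = (s->vertexoffset + 1) & 15;
    if (s->vertexwritten < 16)
        s->vertexwritten++;
}

void push_edge(TriangleState *s, uint32_t a, uint32_t b) {
    s->edgefifo[s->edgeoffset][0] = a;
    s->edgefifo[s->edgeoffset][1] = b;
    s->edgeoffset = (s->edgeoffset + 1) & 15;
}

/* Index 0 is the most recently pushed vertex.
   Returns 0 if the requested entry was never written. */
int read_vertex(const TriangleState *s, unsigned index, uint32_t *out) {
    if (index >= s->vertexwritten)
        return 0;
    *out = s->vertexfifo[(s->vertexoffset - 1 - index) & 15];
    return 1;
}

/* zw is the codeaux table entry for Y. Returns 1 on success, 0 if invalid. */
int decode_codeaux_triangle(TriangleState *s, uint8_t zw, uint32_t tri[3]) {
    unsigned z = zw >> 4, w = zw & 15;
    uint32_t a, b, c;

    assert(z != 0xf && w != 0xf); /* excluded by the codeaux table rules */

    a = s->next++;
    if (z == 0)
        b = s->next++;
    else if (!read_vertex(s, z - 1, &b))
        return 0;
    if (w == 0)
        c = s->next++;
    else if (!read_vertex(s, w - 1, &c))
        return 0;

    push_edge(s, b, a);
    push_edge(s, c, b);
    push_edge(s, a, c);

    push_vertex(s, a);
    if (z == 0)
        push_vertex(s, b);
    if (w == 0)
        push_vertex(s, c);

    tri[0] = a;
    tri[1] = b;
    tri[2] = c;
    return 1;
}
```

Both FIFO reads happen before any push for the current triangle, matching the order of the bullets: indices first, then edges, then vertices.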

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 31cf2901b8..04f0797fed 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -401,106 +401,106 @@ uint32_t decodeIndex(uint32_t v) {
 }
 ```
 
-The encoding for `code` is split into various cases, some of which are self-sufficient and some need to read extra data. The encoding is detailed below; after either path the triangle (a, b, c) is emitted to the output.
-
 Any streams that require the decoder to read an edge or vertex FIFO entry that was not previously written are invalid.
 
+The encoding for `code` is split into various cases, some of which are self-sufficient and some need to read extra data. The encoding is detailed below; after either path the triangle (a, b, c) is emitted to the output.
+
 - `0xX0`, where `X < 0xf`: Encodes a recently encountered edge and a `next` vertex.
+	- The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+	- The third index, `c`, is set to `next`.
+	- `next` is incremented.
 
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is equal to `next` (which is then incremented).
+	- Edge (c, b) is pushed to the edge FIFO.
+	- Edge (a, c) is pushed to the edge FIFO.
 
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex c is pushed to the vertex FIFO.
+	- Vertex c is pushed to the vertex FIFO.
 
 - `0xXY`, where `X < 0xf` and `0 < Y < 0xd`: Encodes a recently encountered edge and a recently encountered vertex.
+	- The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+	- The third index, `c`, is read from the vertex FIFO at index Y (where 0 is the most recently added vertex; note that 0 is never actually read here, since `Y > 0`).
 
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is read from the vertex FIFO at index Y (where 0 is the most recently added vertex; note that 0 is never actually read here, since `Y > 0`).
-
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
+	- Edge (c, b) is pushed to the edge FIFO.
+	- Edge (a, c) is pushed to the edge FIFO.
 
 - `0xXd` or `0xXe`, where `X < 0xf`: Encodes a recently encountered edge and a vertex that's adjacent to `last`.
 
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is equal to `last-1` for `0xXd` and `last+1` for `0xXe`.
+	- The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+	- The third index, `c`, is equal to `last-1` for `0xXd` and `last+1` for `0xXe`.
+	- `last` is set to `c` (effectively decrementing or incrementing it accordingly).
 
-`last` is set to `c` (effectively decrementing or incrementing it accordingly).
+	- Edge (c, b) is pushed to the edge FIFO.
+	- Edge (a, c) is pushed to the edge FIFO.
 
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex c is pushed to the vertex FIFO.
+	- Vertex c is pushed to the vertex FIFO.
 
 - `0xXf`, where `X < 0xf`: Encodes a recently encountered edge and a free-standing vertex encoded explicitly.
 
-The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
-The third index, `c`, is decoded using `decodeIndex` by reading extra bytes from `data` (and also updates `last`).
+	- The edge (a, b) is read from the edge FIFO at index X (where 0 is the most recently added edge).
+	- The third index, `c`, is decoded using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
 
-Edge (c, b) is pushed to edge FIFO.
-Edge (a, c) is pushed to edge FIFO.
-Vertex c is pushed to the vertex FIFO.
+	- Edge (c, b) is pushed to edge FIFO.
+	- Edge (a, c) is pushed to edge FIFO.
+
+	- Vertex c is pushed to the vertex FIFO.
 
 - `0xfY`, where `Y < 0xe`: Encodes three indices using `codeaux` table lookup and vertex FIFO.
 
-The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`.
+	- The table `codeaux` is used to read the element Y; let's assume that results in `0xZW`.
 
-The triangle indices are set as follows:
+	- The first index, `a`, is set to `next`.
+	- `next` is incremented.
 
-Set first index, `a`, to `next`.
-Increment `next`.
-If `Z == 0`:
-	Set second index, `b`, to `next`.
-	Increment `next`.
-Else read second index, `b`, from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
-If `W == 0`:
-	Set third index, `c`, to `next`.
-	Increment `next`.
-Else read third index, `c`, from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
+	- If `Z == 0`:
+		- The second index, `b`, is set to `next`.
+		- `next` is incremented.
+	- Otherwise:
+		- The second index, `b`, is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
 
-Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W.
+	- If `W == 0`:
+		- The third index, `c`, is set to `next`.
+		- `next` is incremented.
+	- Otherwise the third index, `c`, is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
 
-Edge (b, a) is pushed to the edge FIFO.
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex a is pushed to the vertex FIFO.
-Vertex b is pushed to the vertex FIFO if `Z == 0`.
-Vertex c is pushed to the vertex FIFO if `W == 0`.
+	- Note that in the process `next` is incremented from 1 to 3 times depending on values of Z/W.
 
-- `0xfe` or `0xff`: Encodes three indices explicitly.
+	- Edge (b, a) is pushed to the edge FIFO.
+	- Edge (c, b) is pushed to the edge FIFO.
+	- Edge (a, c) is pushed to the edge FIFO.
 
-Read one byte from `data` as-is, without using LEB128 decoding; let's assume that results in `0xZW`.
-
-If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing.
-
-The triangle indices are set as follows:
-
-If using `0xfe` encoding:
-	Set first index, `a`, to `next`.
-	Increment `next`.
-Else read first index, `a`, using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
-
-If `Z == 0`:
-	Set second index, `b`, to `next`.
-	Increment `next`.
-Else if `Z < 0xf`:
-	Read second index, `b`, from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
-Else read second index, `b`, using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
-
-If `W == 0`:
-	Set third index, `c`, to `next`.
-	Increment `next`.
-Else if `W < 0xf`:
-	Read third index, `c`, from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
-Else read third index, `c`, using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
-
-Edge (b, a) is pushed to the edge FIFO.
-Edge (c, b) is pushed to the edge FIFO.
-Edge (a, c) is pushed to the edge FIFO.
-Vertex a is pushed to the vertex FIFO.
-Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`.
-Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`.
+	- Vertex a is pushed to the vertex FIFO.
+	- Vertex b is pushed to the vertex FIFO if `Z == 0`.
+	- Vertex c is pushed to the vertex FIFO if `W == 0`.
+
+- `0xfe` or `0xff`: Encodes three indices explicitly.
+	- Read one byte from `data` as-is, without using LEB128 decoding; let's assume that results in `0xZW`.
+	- If `0xZW` == `0x00`, then `next` is reset to 0. This is a special mechanism used to restart the `next` sequence which is useful for concatenating independent triangle streams. This must be done before further processing.
+
+	- If using `0xfe` encoding:
+		- The first index, `a`, is set to `next`.
+		- `next` is incremented.
+	- Otherwise the first index, `a`, is read using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
+
+	- If `Z == 0`:
+		- The second index, `b`, is set to `next`.
+		- `next` is incremented.
+	- Else if `Z < 0xf`:
+		- The second index, `b`, is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
+	- Otherwise the second index, `b`, is read using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
+
+	- If `W == 0`:
+		- The third index, `c`, is set to `next`.
+		- `next` is incremented.
+	- Else if `W < 0xf`:
+		- The third index, `c`, is read from vertex FIFO at index `W-1` (where 0 is the most recently added vertex).
+	- Otherwise the third index, `c`, is read using `decodeIndex` by reading extra bytes from `data` (note, this also updates `last`).
+
+	- Edge (b, a) is pushed to the edge FIFO.
+	- Edge (c, b) is pushed to the edge FIFO.
+	- Edge (a, c) is pushed to the edge FIFO.
+
+	- Vertex a is pushed to the vertex FIFO.
+	- Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`.
+	- Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`.
 
 At the end of the decoding, `data` is expected to be fully read by all the triangle codes and not contain any extra bytes.
 

From db4eb56db454ed6b21133aa1a7a489a299ce5aaf Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 27 Nov 2025 08:26:30 -0800
Subject: [PATCH 38/54] Fix inconsistent structure

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 04f0797fed..a00bcbcb7f 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -453,8 +453,7 @@ The encoding for `code` is split into various cases, some of which are self-suff
 	- If `Z == 0`:
 		- The second index, `b`, is set to `next`.
 		- `next` is incremented.
-	- Otherwise:
-		- The second index, `b`, is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
+	- Otherwise the second index, `b`, is read from vertex FIFO at index `Z-1` (where 0 is the most recently added vertex).
 
 	- If `W == 0`:
 		- The third index, `c`, is set to `next`.

From 87050ff3cb3366baa91e14e950ea26835d3276e0 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 27 Nov 2025 08:47:54 -0800
Subject: [PATCH 39/54] More clarifications

- Header bits are padded to the byte boundary with zero bits
- Mode 1 triangle indices and mode 2 indices are emitted as 32-bit or
  16-bit values based on byteStride
- remainingElements is reduced by blockElements

Also split tail block description into a nested list
---
 .../2.0/Khronos/KHR_meshopt_compression/README.md    | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
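
Illustrative note, not part of the README change: a C sketch of three points this patch clarifies, namely the `remainingElements` bookkeeping, the size of the zero-padded header bits, and emitting a decoded index at the width selected by `byteStride`. All function names are hypothetical. Writing the output bytes in little-endian order is an assumption based on glTF buffers being little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* maxBlockElements = min((8192 / byteStride) & ~15, 256) */
size_t max_block_elements(size_t byte_stride) {
    assert(byte_stride > 0);
    size_t n = (8192 / byte_stride) & ~(size_t)15;
    return n < 256 ? n : 256;
}

/* Number of attribute blocks needed to decode count elements. */
size_t count_attribute_blocks(size_t count, size_t byte_stride) {
    size_t remaining = count, blocks = 0;
    size_t max_block = max_block_elements(byte_stride);

    while (remaining > 0) {
        size_t block = remaining < max_block ? remaining : max_block;
        remaining -= block; /* each decoded block reduces remainingElements */
        blocks++;
    }
    return blocks;
}

/* Header bits use 2 bits per 16-element group; the last byte is zero-padded
   when the group count is not divisible by 4. */
size_t header_bytes(size_t block_elements) {
    size_t group_count = (block_elements + 15) / 16;
    return (group_count + 3) / 4;
}

/* Emit index i as a 32-bit or 16-bit unsigned integer.
   ASSUMPTION: output bytes are little-endian. With byte_stride == 2 the
   upper 16 bits are dropped, which is the wraparound the README mentions. */
void emit_index(uint8_t *out, size_t i, size_t byte_stride, uint32_t value) {
    assert(byte_stride == 2 || byte_stride == 4);
    for (size_t k = 0; k < byte_stride; ++k)
        out[i * byte_stride + k] = (uint8_t)(value >> (8 * k));
}
```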

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index a00bcbcb7f..a861f26277 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -221,7 +221,9 @@ The encoded stream structure is as follows:
 - Header byte, which must be equal to `0xa1` (version 1) or `0xa0` (version 0)
 - One or more attribute blocks, detailed below
 - Tail padding, which pads the size of the subsequent tail block with zero bytes to a minimum of 24 bytes for version 1 or 32 bytes for version 0 (required for efficient decoding)
-- Tail block, which consists of a baseline element stored verbatim (`byteStride` bytes), followed by channel modes (`byteStride / 4` bytes, only in version 1)
+- Tail block, which consists of:
+	- Baseline element stored verbatim (`byteStride` bytes)
+	- Channel modes (`byteStride / 4` bytes, only in version 1)
 
 **Non-normative** While using version 1 is preferred for better compression, version 0 is provided for binary compatibility with `EXT_meshopt_compression`. When using version 0, the bitstream is identical to that defined in `EXT_meshopt_compression`.
 
@@ -234,7 +236,7 @@ maxBlockElements = min((8192 / byteStride) & ~15, 256)
 blockElements = min(remainingElements, maxBlockElements)
 ```
 
-Where `remainingElements` is the number of elements that have yet to be decoded (with the initial value of `count` extension property).
+Where `remainingElements` is the number of elements that have yet to be decoded (with the initial value of `count` extension property). Decoding the attribute block reduces `remainingElements` value by `blockElements`.
 
 Each attribute block consists of:
 - Control header (only in version 1): `byteStride / 4` bytes specifying a packed 2-bit control mode for each byte position of the element
@@ -262,7 +264,7 @@ The control bits specify the control mode for each byte:
 - control mode 3: Literal encoding; delta bytes are stored uncompressed with no header bits
 
 The structure of each "data block" (when using control mode 0 or 1, or when using version 0) breaks down as follows:
-- Header bits, with 2 bits for each group, aligned to the byte boundary if groupCount is not divisible by 4
+- Header bits, with 2 bits for each group, aligned to the byte boundary with zero padding if groupCount is not divisible by 4
 - Delta blocks, with variable number of bytes stored for each group
 
 Header bits are stored from least significant to most significant bit - header bits for 4 consecutive groups are packed in a byte together as follows:
@@ -501,6 +503,8 @@ The encoding for `code` is split into various cases, some of which are self-suff
 	- Vertex b is pushed to the vertex FIFO if `Z == 0` or `Z == 0xf`.
 	- Vertex c is pushed to the vertex FIFO if `W == 0` or `W == 0xf`.
 
+After decoding, the triangle indices a, b, c are emitted as 32-bit unsigned integers (if `byteStride == 4`) or 16-bit unsigned integers with wraparound (if `byteStride == 2`).
+
 At the end of the decoding, `data` is expected to be fully read by all the triangle codes and not contain any extra bytes.
 
 ## Mode 2: indices
@@ -536,6 +540,8 @@ uint32_t decode(uint32_t v) {
 }
 ```
 
+After decoding, the resulting value is emitted as a 32-bit unsigned integer (if `byteStride == 4`) or a 16-bit unsigned integer with wraparound (if `byteStride == 2`).
+
 It's up to the encoder to determine the optimal selection of the baseline for each index; this encoding scheme can be used to do basic delta encoding (with baseline bit always set to 0) as well as more complex bimodal encodings. Since zigzag-encoded delta uses a 31-bit integer, the deltas are limited to [-2^30..2^30-1].
 
 # Appendix B: Filters

From 179b0e85577f7d758200b6ef2438203919fccc42 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 27 Nov 2025 08:54:36 -0800
Subject: [PATCH 40/54] Suggestions

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index a861f26277..d59d6761b5 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -171,7 +171,7 @@ Vertex data should be quantized using the appropriate representation; this exten
 
 Morph targets can be treated identically to other vertex attributes, as long as vertex order optimization is performed on all target streams at the same time. It is recommended to use quantized storage for morph target deltas, possibly with a narrower type than that used for baseline values.
 
-When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2. The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1.
+When storing vertex data, mode 0 (attributes) should be used; for index data, mode 1 (triangles) or mode 2 (indices) should be used instead. Mode 1 only supports triangle list storage; indices of other topology types can be stored using mode 2 (indices). The use of triangle strip topology is not recommended since it's more efficient to store triangle lists using mode 1 (triangles). These are suggestions; the extension does not require any specific mode to be used for any specific type of data.
 
 Using filter 1 (octahedral) for normal/tangent data, and filter 4 (color) for color data, may improve compression ratio further.
 

From c0cf78861ac04dba76d29f0791814efff89961d7 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Sat, 29 Nov 2025 19:07:59 -0800
Subject: [PATCH 41/54] Clarify varint encoding and alpha encoding for color

Also raise the lower bound of K for color encoding from 1 to 2: a 0-bit
unsigned integer is not well defined, and K=2 allows 1-bit alpha.
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
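
Note for reviewers: the varint-7 wording in this patch can be read as the following minimal C sketch. The helper name `decode_varint7` and its return convention are hypothetical and not part of the specification. Rejecting truncated input and sequences longer than 5 bytes is an assumption drawn from the "must consist of 1-5 bytes" sentence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the specification): decodes one varint-7
 * value from data[0..size). Returns the number of bytes consumed (1-5),
 * or 0 if the input is truncated or a sixth byte would be required. */
static size_t decode_varint7(const uint8_t *data, size_t size, uint32_t *out)
{
	uint32_t value = 0;

	for (size_t i = 0; i < 5; ++i) {
		if (i >= size)
			return 0; /* ran out of input before the terminating byte */

		uint8_t byte = data[i];

		/* The low 7 bits of each byte are concatenated, least significant
		 * group first. Bits shifted past bit 31 (possible only for the
		 * fifth byte) are dropped, which is the "ignoring extra bits" rule. */
		value |= (uint32_t)(byte & 0x7f) << (7 * i);

		if ((byte & 0x80) == 0) {
			*out = value;
			return i + 1;
		}
	}

	return 0; /* the fifth byte still had its most significant bit set */
}
```

Under this reading the single byte `0x7f` decodes to `0x7f`, and the two bytes `0x81 0x04` decode to `0x201`.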

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index d59d6761b5..8e5a767fdf 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -385,7 +385,7 @@ During the decoding process, decoder maintains five variables:
 
 To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file.
 
-When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) with up to 5 bytes, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
+When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)). The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte should be 0 and the most significant bits of all prior bytes should be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, ignoring extra bits:
 
 ```
 0x7f => 0x7f
@@ -521,7 +521,7 @@ Note that there is no way to calculate the length of a stream; instead, the inpu
 
 Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
 
-To specify the index delta, the varint-7 encoding scheme (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) with up to 5 bytes is used, which encodes an integer as one or more bytes, with the byte with the 0 most significant bit terminating the sequence:
+To specify the index delta, the varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used. The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte should be 0 and the most significant bits of all prior bytes should be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, ignoring extra bits:
 
 ```
 0x7f => 0x7f
@@ -662,12 +662,10 @@ Color filter allows to encode color data using YCoCg color model, which results
 
 This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
 
-The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K set to 1. 1 <= K <= 16, signed integers are stored in two's complement format.
+The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K set to 1. K is defined by the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
 
 The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow.
 
-The alpha component uses K-1 bits for the alpha value with the high bit set to 1; note that K can be smaller than the bit depth. This allows decoder to recover K and decode the color and alpha values correctly.
-
 The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
 
 ```

From a5052b3f0881481100e8bd7a7cbf163f4f979fa6 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Sat, 29 Nov 2025 19:08:26 -0800
Subject: [PATCH 42/54] Require exact decoding of exponential filters to avoid
 issues with min/max bounds on positions.

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 8e5a767fdf..6ff067d602 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -548,7 +548,7 @@ It's up to the encoder to determine the optimal selection of the baseline for ea
 
 Filters are functions that transform each encoded attribute. For each filter, this document specifies the transformation used for decoding the data; it's up to the encoder to pick the parameters of the encoding for each element to balance quality and precision.
 
-For performance reasons the results of the decoding process are specified to one unit in last place (ULP) in terms of the decoded data, e.g. if a filter results in a 16-bit signed normalized integer, decoding may produce results within 1/32767 of specified value.
+For performance reasons the results of the decoding process are specified to one unit in last place (ULP) in terms of the decoded data, e.g. if a filter results in a 16-bit signed normalized integer, decoding may produce results within 1/32767 of specified value. The exponential filter is an exception and must be decoded exactly.
 
 ## Filter 1: octahedral
 

From 2aa636b718d116940c765570da7f7de51c794eb9 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Sat, 29 Nov 2025 19:11:09 -0800
Subject: [PATCH 43/54] Add notes re: fallback

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 6ff067d602..775a85b7ab 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -118,6 +118,8 @@ The filters are detailed further in [Appendix B (Filters)](#appendix-b-filters).
 
 When using filters, the expectation is that the filter is applied after the attribute decoder on the contents of the resulting bufferView; the resulting data can then be used according to the referencing accessors without further modifications.
 
+When compression filters are used, the decompressed data may not match the original uncompressed data exactly due to precision loss. When a buffer view using filters also has an uncompressed fallback, the `min` and `max` values in accessor bounds must be exact with respect to the uncompressed fallback data and may not be exact with respect to the compressed data.
+
 **Non-normative** To decompress the data, [meshoptimizer](https://github.com/zeux/meshoptimizer) library may be used; it supports efficient decompression using C++ and/or WebAssembly, including fast SIMD implementation for attribute decoding.
 
 ## Fallback buffers
@@ -154,6 +156,8 @@ When a buffer is marked as a fallback buffer, the following must hold:
 
 If a fallback buffer doesn't have a URI and doesn't refer to the GLB binary chunk, it follows that `KHR_meshopt_compression` must be a required extension.
 
+**Non-normative** To ensure consistency between compressed and uncompressed data, encoders should use the decompressed data to populate the fallback buffer view instead of using the original data. This reduces the chance of divergence between the two representations.
+
 ## Compressing geometry data
 
 > This section is non-normative.

From 73676f7025d06f2b21e84fd042c5a21c777dc145 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Sun, 30 Nov 2025 15:38:30 -0800
Subject: [PATCH 44/54] Use must instead of should for varint MSBs

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 775a85b7ab..11df7e3a3d 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -389,7 +389,7 @@ During the decoding process, decoder maintains five variables:
 
 To decode each triangle, the decoder needs to analyze the `code` byte, read additional bytes from `data` as necessary, and update the internal state correctly. The `code` byte encoding is optimized to reach a single byte per triangle in most common cases; the resulting data can often be compressed by a general purpose compressor running on the resulting .bin/.glb file.
 
-When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)). The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte should be 0 and the most significant bits of all prior bytes should be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, ignoring extra bits:
+When extra data is necessary to decode a triangle and it represents an index value, the decoder uses varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)). The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte must be 0 and the most significant bits of all prior bytes must be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, ignoring extra bits:
 
 ```
 0x7f => 0x7f
@@ -525,7 +525,7 @@ Note that there is no way to calculate the length of a stream; instead, the inpu
 
 Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
 
-To specify the index delta, the varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used. The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte should be 0 and the most significant bits of all prior bytes should be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, ignoring extra bits:
+To specify the index delta, the varint-7 encoding (also known as [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128)) is used. The encoding stores a 32-bit unsigned integer as a sequence of bytes, where each byte's most significant bit indicates whether more bytes follow. The sequence must consist of 1-5 bytes where the most significant bit of the last byte must be 0 and the most significant bits of all prior bytes must be 1. The value is reconstructed by concatenating the lower 7 bits of each byte, ignoring extra bits:
 
 ```
 0x7f => 0x7f

From ab3dd703cc72394b8c8e14592f8f5241ed70a1f2 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Sun, 30 Nov 2025 15:51:31 -0800
Subject: [PATCH 45/54] For color filter, bit K-1 is set to 1, which allows K
 to be determined

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
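
Note for reviewers: a minimal C sketch of how K could be recovered from the fourth component under the corrected wording (marker in bit K-1, counted from 0, with all more significant bits zero). The function names are hypothetical. Returning 0 for components that imply K < 2 is an assumption based on the stated range 2 <= K <= 16.

```c
#include <stdint.h>

/* Hypothetical helper: returns K (2..16) for an encoded alpha component,
 * or 0 if the component cannot correspond to a valid K. K equals the
 * 0-based position of the most significant set bit plus one. */
static int color_filter_k(uint16_t alpha_component)
{
	int k = 0;

	while (alpha_component != 0) {
		++k;
		alpha_component >>= 1;
	}

	return k >= 2 ? k : 0;
}

/* Hypothetical helper: extracts the (K-1)-bit alpha value stored below
 * the marker bit. Expects a K previously returned by color_filter_k. */
static uint16_t color_filter_alpha_bits(uint16_t alpha_component, int k)
{
	return alpha_component & (uint16_t)((1u << (k - 1)) - 1u);
}
```

For example, a component of `0x355` has its most significant set bit at position 9, so K is 10 and the stored 9-bit alpha value is `0x155`.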

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 11df7e3a3d..74d494796c 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -666,7 +666,7 @@ Color filter allows to encode color data using YCoCg color model, which results
 
 This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, then the input and output of this filter are four 8-bit components, and when `byteStride` is 8, the input and output of this filter are four 16-bit components.
 
-The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K set to 1. K is defined by the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
+The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K-1 set to 1 and more significant bits set to 0. K can be determined from the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
 
 The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow.
 

From 056da78ed77850505be23af18cdc03367928cb7a Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 3 Dec 2025 08:34:29 -0800
Subject: [PATCH 46/54] Don't use "reserved" to refer to tail padding and
 remove "can be found"

There's no need to find the tail block for mode 2 as it contains no
information to decode.
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 74d494796c..95392c4b55 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -519,9 +519,9 @@ The encoded stream structure is as follows:
 
 - Header byte, which must be equal to `0xd1`
 - A sequence of index deltas (with number of elements equal to `count` extension property), with encoding specified below
-- Tail block, which consists of 4 bytes that are reserved and should be set to 0
+- Tail block, which consists of 4 padding bytes that should be set to 0
 
-Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`) so that the tail block element can be found. If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid.
+Note that there is no way to calculate the length of a stream; instead, the input stream must be correctly sized (using `byteLength`). If the decoding procedure reaches the end of stream too early, or any unprocessed bytes remain after decoding and before tail, the stream is invalid.
 
 Instead of simply encoding deltas vs the previous index, the decoder tracks *two* baseline index values, that start at 0. Each delta is specified in relation to one of these values and updates it so that the next delta that references the same baseline uses the encoded index value as a reference. This encoding is more efficient at handling some types of bimodal sequences where two independent monotonic sequences are spliced together, which can occur for some common cases of triangle strips or line lists.
 

From 96990519b39cdd9f60e04f43bdcdc7a997ab3dd1 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 10 Dec 2025 10:04:45 -0800
Subject: [PATCH 47/54] Add overflow/underflow clarification

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 95392c4b55..873f5bac88 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -668,7 +668,7 @@ This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, the
 
 The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K-1 set to 1 and more significant bits set to 0. K can be determined from the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
 
-The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow.
+The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow in the final result.
 
 The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
 

From a0751ce2580ed323d1886a40df6079bf5989b10a Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 10 Dec 2025 10:09:50 -0800
Subject: [PATCH 48/54] Improve color decoding pseudocode

- Use uintN more consistently as outputs and two of four inputs are
  unsigned
- Define findMSB
- Clarify that the output is unsigned normalized
---
 .../Khronos/KHR_meshopt_compression/README.md | 20 ++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 873f5bac88..73e3ee21fd 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -670,15 +670,15 @@ The input to the filter is four 8-bit or 16-bit components, where the first comp
 
 The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow in the final result.
 
-The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit normalized integers.
+The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit unsigned normalized integers.
 
 ```
-void decode(intN_t input[4], intN_t output[4]) {
+void decode(uintN_t input[4], uintN_t output[4]) {
 	// recover scale from alpha high bit
 	int as = (1 << (findMSB(input[3]) + 1)) - 1;
 
 	// convert to RGB in fixed point
-	int y = input[0], co = input[1], cg = input[2];
+	int y = input[0], co = intN_t(input[1]), cg = intN_t(input[2]);
 
 	int r = y + co - cg;
 	int g = y + cg;
@@ -689,16 +689,26 @@ void decode(intN_t input[4], intN_t output[4]) {
 	a = (a << 1) | (a & 1);
 
 	// compute scaling factor
-	float ss = INTN_MAX / float(as);
+	float ss = UINTN_MAX / float(as);
 
 	output[0] = round(float(r) * ss);
 	output[1] = round(float(g) * ss);
 	output[2] = round(float(b) * ss);
 	output[3] = round(float(a) * ss);
 }
+
+// returns position of most significant bit set (0-based)
+int findMSB(uintN_t v) {
+	for (int i = N - 1; i >= 0; --i) {
+		if (v & (1u << i)) {
+			return i;
+		}
+	}
+	return -1;
+}
 ```
 
-`INTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16).
+`UINTN_MAX` is equal to 255 when using 8-bit components (N is 8) and equal to 65535 when using 16-bit components (N is 16).
 
 # Appendix C: Differences from EXT_meshopt_compression
 

From 80ae495988b19aa8e5c85dc4b15bc825a07d4f8d Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 10 Dec 2025 11:10:02 -0800
Subject: [PATCH 49/54] last => least significant

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 73e3ee21fd..da7beb12e4 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -684,7 +684,7 @@ void decode(uintN_t input[4], uintN_t output[4]) {
 	int g = y + cg;
 	int b = y - co - cg;
 
-	// expand alpha by one bit to match other components, replicating last bit
+	// expand alpha by one bit to match other components, replicating least significant bit
 	int a = input[3] & (as >> 1);
 	a = (a << 1) | (a & 1);
 

From 30889e65218a84c9feb2a576a70909fe22a806d0 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Wed, 10 Dec 2025 16:47:31 -0800
Subject: [PATCH 50/54] Color encoding clarifications

- Clarify top bits when K is smaller than maximum
- Clarify that the range for signed chrominance values is symmetric
- Clarify that the decoding using 32-bit integer math must result in
  K-bit value
---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
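
Note for reviewers: a minimal C sketch of a validity check for the signed Co/Cg components as described in this patch, for the 16-bit component case only. The function name is hypothetical. The sketch relies on the observation that, once the raw component is interpreted as a 16-bit two's complement value, the symmetric range test also rejects values whose top bits do not match the sign bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: checks that a 16-bit component holds a valid
 * K-bit signed chrominance value, i.e. a value in [-max, max] with
 * max = (1 << (K - 1)) - 1. Expects 2 <= k <= 16. */
static bool chroma_component_valid(uint16_t raw, int k)
{
	/* Interpret the raw bits as 16-bit two's complement without relying
	 * on implementation-defined narrowing conversions. */
	int32_t value = raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
	int32_t max = (1 << (k - 1)) - 1;

	return value >= -max && value <= max;
}
```

With K = 10, `0x01ff` (511) and `0xfe01` (-511) pass, while `0xfe00` (-512) fails because the range is symmetric, and `0x03ff` fails because its top 6 bits are not copies of the sign bit.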

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index da7beb12e4..4b3eb02b3f 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -668,7 +668,9 @@ This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, the
 
 The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K-1 set to 1 and more significant bits set to 0. K can be determined from the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
 
-The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using K-bit integer math without overflow or underflow in the final result.
+When storing a K-bit integer in a 8-bit of 16-bit component when K is not 8 or 16, the remaining bits (e.g. top 6 bits in case of K=10) must be zero for the first and fourth component, and equal to the sign bit for the second and third component, which are signed; the valid range of the two signed integers is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+
+The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using 32-bit signed integer math, with the final result fitting into a K-bit unsigned integer ([0..2^K-1]).
 
 The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit unsigned normalized integers.
 

From e676fbe946e2223d548cbb1caa82134767a4e1b2 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 11 Dec 2025 08:02:34 -0800
Subject: [PATCH 51/54] Adjust wording around extra bits and YCoCg guarantees

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 4b3eb02b3f..fbef0a4e1c 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -564,7 +564,7 @@ The input to the filter is four 8-bit or 16-bit components, where the first two
 
 The encoding of the third component allows to compute K for each vector independently from the bit representation, and must encode 1.0 precisely which is equivalent to `(1 << (K - 1)) - 1` as an integer; values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid and the result of decoding such vectors is unspecified.
 
-When storing a K-bit integer in a 8-bit of 16-bit component when K is not 8 or 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+When storing a K-bit integer in a 8-bit of 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
 The output of the filter is three decoded unit vector components, stored as 8-bit or 16-bit normalized integers, and the last input component verbatim.
 
@@ -611,7 +611,7 @@ This filter is only valid if `byteStride` is 8.
 
 The input to the filter is three quaternion components, excluding the component with the largest magnitude, encoded as signed normalized K-bit integers (4 <= K <= 16, integers are stored in two's complement format), and an index of the largest component that is omitted in the encoding. The largest component is assumed to always be positive (which is possible due to quaternion double-cover). To allow per-element control over K, the last input element must explicitly encode 1.0 as a signed normalized K-bit integer, except for the least significant 2 bits that store the index of the maximum component.
 
-When storing a K-bit integer in a 16-bit component when K is not 16, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+When storing a K-bit integer in a 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
 The output of the filter is four decoded quaternion components, stored as 16-bit normalized integers.
 
@@ -668,9 +668,9 @@ This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, the
 
 The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K-1 set to 1 and more significant bits set to 0. K can be determined from the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
 
-When storing a K-bit integer in a 8-bit of 16-bit component when K is not 8 or 16, the remaining bits (e.g. top 6 bits in case of K=10) must be zero for the first and fourth component, and equal to the sign bit for the second and third component, which are signed; the valid range of the two signed integers is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+When storing a K-bit integer in a 8-bit of 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be zero for the first and fourth component, and equal to the sign bit for the second and third component, which are signed; the valid range of the two signed integers is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
-The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The encoder must guarantee that original RGB values can be reconstructed using 32-bit signed integer math, with the final result fitting into a K-bit unsigned integer ([0..2^K-1]).
+The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The Y, Co and Cg values must be chosen so that the original RGB values can be reconstructed using 32-bit signed integer math, with the final result fitting into a K-bit unsigned integer ([0..2^K-1]).
 
 The output of the filter is four decoded color components (R, G, B, A), stored as 8-bit or 16-bit unsigned normalized integers.
 

From b62ceec373e59fc17c544060c1c5058dabaf134b Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 11 Dec 2025 08:14:06 -0800
Subject: [PATCH 52/54] Fix typo: of -> or

---
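Reviewer note (not part of the commit): the storage rule in the two paragraphs touched below can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference decoder; the helper names `max_for_bits`, `is_valid_signed` and `color_bits_from_alpha` are hypothetical and do not appear in the specification or in any library.

```python
def max_for_bits(k: int) -> int:
    """Largest magnitude of a signed K-bit value: (1 << (K - 1)) - 1."""
    return (1 << (k - 1)) - 1


def is_valid_signed(raw: int, k: int, width: int = 16) -> bool:
    """Check a raw `width`-bit component that stores a signed K-bit integer.

    The stored value must lie in [-max, max]. For a two's complement value in
    that range, the bits above K necessarily repeat the sign bit, so the range
    check also covers the sign-extension requirement.
    """
    if not 2 <= k <= width or not 0 <= raw < (1 << width):
        raise ValueError("k or raw is out of range for this component width")
    # Reinterpret the raw bit pattern as a two's complement integer.
    value = raw - (1 << width) if raw & (1 << (width - 1)) else raw
    m = max_for_bits(k)
    return -m <= value <= m


def color_bits_from_alpha(a: int) -> tuple[int, int]:
    """Recover (K, alpha) from the fourth component of the color filter input.

    Bit K-1 is set and all more significant bits are zero, so K equals the
    position of the most significant set bit plus one.
    """
    k = a.bit_length()
    if not 2 <= k <= 16:
        raise ValueError("fourth component does not encode a valid K")
    alpha = a & max_for_bits(k)  # lower K-1 bits hold the alpha value
    return k, alpha
```

For K=10 in a 16-bit component, `0x01FF` (511) and `0xFE01` (-511) pass the check, while `0xFE00` (-512) is sign-extended correctly but falls outside the valid range, so decoding it is unspecified.
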
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index fbef0a4e1c..20d2558678 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -564,7 +564,7 @@ The input to the filter is four 8-bit or 16-bit components, where the first two
 
 The encoding of the third component allows to compute K for each vector independently from the bit representation, and must encode 1.0 precisely which is equivalent to `(1 << (K - 1)) - 1` as an integer; values of the third component that aren't equal to `(1 << (K - 1)) - 1` for a valid `K` are invalid and the result of decoding such vectors is unspecified.
 
-When storing a K-bit integer in a 8-bit of 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+When storing a K-bit integer in a 8-bit or 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be equal to the sign bit; the valid range of the resulting integer is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
 The output of the filter is three decoded unit vector components, stored as 8-bit or 16-bit normalized integers, and the last input component verbatim.
 
@@ -668,7 +668,7 @@ This filter is only valid if `byteStride` is 4 or 8. When `byteStride` is 4, the
 
 The input to the filter is four 8-bit or 16-bit components, where the first component stores the Y (luma) value as a K-bit unsigned integer, the second and third components store Co/Cg (chrominance) values as K-bit signed integers, and the fourth component stores the alpha value as a K-1-bit unsigned integer with the bit K-1 set to 1 and more significant bits set to 0. K can be determined from the position of the most significant bit of the fourth component. 2 <= K <= 16, signed integers are stored in two's complement format.
 
-When storing a K-bit integer in a 8-bit of 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be zero for the first and fourth component, and equal to the sign bit for the second and third component, which are signed; the valid range of the two signed integers is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
+When storing a K-bit integer in a 8-bit or 16-bit component when K is less than the component's bit width, the remaining bits (e.g. top 6 bits in case of K=10) must be zero for the first and fourth component, and equal to the sign bit for the second and third component, which are signed; the valid range of the two signed integers is from `-max` to `max` where `max = (1 << (K - 1)) - 1`. The behavior of decoding values outside of that range is unspecified.
 
 The transformation uses YCoCg encoding; reconstruction of RGB values can be performed in integer space or in floating point space, depending on the implementation. The Y, Co and Cg values must be chosen so that the original RGB values can be reconstructed using 32-bit signed integer math, with the final result fitting into a K-bit unsigned integer ([0..2^K-1]).
 

From d2d5fba57c870dce4094a00a302ed4ad68fb3f7a Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Thu, 11 Dec 2025 09:19:40 -0800
Subject: [PATCH 53/54] Minor typo fixes

---
 .../2.0/Khronos/KHR_meshopt_compression/README.md      | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 20d2558678..0c115e8872 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -30,7 +30,7 @@ Similarly to supercompressed textures (see `KHR_texture_basisu`), this extension
 
 The compressed format is designed to have two properties beyond optimizing compression ratio - very fast decoding (using WebAssembly SIMD, the decoders run at \~1 GB/sec on modern desktop hardware), and byte-wise storage compatible with general-purpose compression. That is, instead of reducing the encoded size as much as possible, the bitstream is constructed in such a way that general-purpose compressor can compress it further.
 
-This is beneficial for typical Web delivery scenarios, where all files are usually using lossess general-purpose compression (gzip, Brotli, Zstandard) - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when general-purpose compression isn't available, and additionally reduces the performance impact of general-purpose decompression which is typically *much slower* than decoders proposed here).
+This is beneficial for typical Web delivery scenarios, where all files are usually using lossless general-purpose compression (gzip, Brotli, Zstandard) - instead of completely replacing it, the codecs here augment it, while still reducing the size (which is valuable to optimize delivery size when general-purpose compression isn't available, and additionally reduces the performance impact of general-purpose decompression which is typically *much slower* than decoders proposed here).
 
 ## Specifying compressed views
 
@@ -179,7 +179,7 @@ When storing vertex data, mode 0 (attributes) should be used; for index data, mo
 
 Using filter 1 (octahedral) for normal/tangent data, and filter 4 (color) for color data, may improve compression ratio further.
 
-While using quantized attributes is recommended for optimal compression, it's also possible to use non-quantized floating point attributes. To increase compression ratio in that case, filter 3 (exponential) is recommended - advanced encoders can additionally constraint the exponent to be the same for all components of a vector, or for all values of the same component across the entire mesh, which can further improve compression ratio.
+While using quantized attributes is recommended for optimal compression, it's also possible to use non-quantized floating point attributes. To increase compression ratio in that case, filter 3 (exponential) is recommended - advanced encoders can additionally constrain the exponent to be the same for all components of a vector, or for all values of the same component across the entire mesh, which can further improve compression ratio.
 
 ## Compressing animation data
 
@@ -203,10 +203,10 @@ When `EXT_mesh_gpu_instancing` extension is used, the instance transform data ca
 
 > This section is non-normative.
 
-This extension expands the compression available by the existing extension `EXT_meshopt_compression`. Since existing tools and pipelines already support that extension, and existing assets already use it, the following guidelines are recommended for content creators and tool authors:
+This extension expands the compression offered by the existing extension `EXT_meshopt_compression`. Since existing tools and pipelines already support that extension, and existing assets already use it, the following guidelines are recommended for content creators and tool authors:
 
 - Tools that already support `EXT_meshopt_compression` extension should keep supporting it alongside this extension to be able to read pre-existing assets.
-- For maximum compabitility, DCC tools should give users a choice to use either variant when exporting assets. The default option should be eventually switched to the KHR variant once most loaders support it.
+- For maximum compatibility, DCC tools should give users a choice to use either variant when exporting assets. The default option should be eventually switched to the KHR variant once most loaders support it.
 - Existing assets that use the EXT variant can be losslessly converted to KHR, if needed, by changing the extension strings inside glTF JSON.
 - When producing assets that target loaders supporting both extensions, using this extension with v1 format should be preferred since it provides better compression ratio at no additional runtime cost.
 
@@ -718,6 +718,6 @@ This extension is derived from `EXT_meshopt_compression` with the following chan
 
 - Vertex data supports an upgraded v1 format which provides more granular bit packing (via control modes) and enhanced delta encoding (via channel modes) to compress data better
 - For compatibility, the v0 format (identical to `EXT_meshopt_compression` format) is still supported; however, use of v1 format is preferred
-- New `COLOR` filter supports lossy color compression at smaller compression ratios using YCoCg encoding
+- New `COLOR` filter supports lossy color compression at higher compression ratios using YCoCg encoding
 
 These improvements achieve better compression ratios for typical glTF content while maintaining the same fast decompression performance.

From a507c991b399b2aeca50de2b38c6170d8e704c29 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine <arseny.kapoulkine@gmail.com>
Date: Fri, 23 Jan 2026 09:20:22 -0800
Subject: [PATCH 54/54] Update status to RC

---
 extensions/2.0/Khronos/KHR_meshopt_compression/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
index 0c115e8872..78bc55e3dc 100644
--- a/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
+++ b/extensions/2.0/Khronos/KHR_meshopt_compression/README.md
@@ -9,7 +9,7 @@
 
 ## Status
 
-Draft
+Release Candidate
 
 ## Dependencies