The prefix cache identifies KV blocks by a content-based hash. If two different token sequences produce the same hash (collision), get_cached_block returns the wrong block with no error, causing the model to attend over incorrect KV values and producing corrupted output.
The probability of this occurring is low but non-zero, and because nothing detects it, the failure is silent.
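To illustrate the failure mode, here is a minimal sketch of a cache keyed only by hash. It is not the actual prefix-cache code: HashOnlyPrefixCache, HashOnlyBlock and weak_prefix_hash are hypothetical names, and the hash is deliberately weak (a sum of token ids) so that a collision is easy to produce. With a real hash, collisions are far rarer, but the lookup path behaves the same way: the second sequence receives the first sequence's block, and nothing reports it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical, deliberately weak hash (sum of token ids) so a collision is
// easy to trigger; this is NOT the hash the real prefix cache uses.
std::size_t weak_prefix_hash(const std::vector<std::int64_t>& tokens) {
    return static_cast<std::size_t>(
        std::accumulate(tokens.begin(), tokens.end(), std::int64_t{0}));
}

// Stand-in for a cached block that records only its hash.
struct HashOnlyBlock {
    std::size_t hash;
    int block_id;  // which KV data this block holds
};

class HashOnlyPrefixCache {
public:
    void put(const std::vector<std::int64_t>& tokens, int block_id) {
        assert(!tokens.empty());
        const std::size_t h = weak_prefix_hash(tokens);
        m_blocks[h] = HashOnlyBlock{h, block_id};
    }

    // Plays the role of get_cached_block: the hash is the only key, so any
    // sequence with an equal hash is treated as a hit.
    std::optional<HashOnlyBlock> get(const std::vector<std::int64_t>& tokens) const {
        const auto it = m_blocks.find(weak_prefix_hash(tokens));
        if (it == m_blocks.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<std::size_t, HashOnlyBlock> m_blocks;
};
```

Storing {1, 2, 3} and then looking up {3, 2, 1} returns the block computed for {1, 2, 3}, because both sequences hash to 6; the caller has no way to tell.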
Cause
KVCacheBlock stores only m_hash. On a cache hit, there is no way to verify that the block actually corresponds to the expected tokens. The code already acknowledges this with a `// TODO: add tokens validation in case of hash collision` comment. I intend to work on this.
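One possible direction for the fix, sketched under assumptions: keep the token ids a block was built from next to its hash, compare them on a hit, and treat a mismatch as a miss. ValidatingPrefixCache, ValidatedBlock and lookup are hypothetical names (lookup stands in for get_cached_block), and the hash function is injected so a collision can be forced in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

using TokenIds = std::vector<std::int64_t>;

// Hypothetical block type: unlike a hash-only block, it also keeps the
// token ids it was computed from so that a hit can be verified.
struct ValidatedBlock {
    std::size_t hash;
    TokenIds tokens;
    int block_id;
};

// Hypothetical cache; lookup() stands in for get_cached_block().
class ValidatingPrefixCache {
public:
    using Hasher = std::function<std::size_t(const TokenIds&)>;

    explicit ValidatingPrefixCache(Hasher hasher) : m_hasher(std::move(hasher)) {
        assert(m_hasher && "a hash function is required");
    }

    void put(const TokenIds& tokens, int block_id) {
        assert(!tokens.empty());
        const std::size_t h = m_hasher(tokens);
        m_blocks[h] = std::make_shared<ValidatedBlock>(ValidatedBlock{h, tokens, block_id});
    }

    // Returns nullptr on a plain miss, and also when the hash matches but the
    // stored tokens differ (a collision), so the caller recomputes the KV
    // values instead of silently reusing the wrong block.
    std::shared_ptr<ValidatedBlock> lookup(const TokenIds& tokens) const {
        const auto it = m_blocks.find(m_hasher(tokens));
        if (it == m_blocks.end()) {
            return nullptr;
        }
        if (it->second->tokens != tokens) {
            return nullptr;  // hash collision detected
        }
        return it->second;
    }

private:
    Hasher m_hasher;
    std::unordered_map<std::size_t, std::shared_ptr<ValidatedBlock>> m_blocks;
};
```

The extra check costs one token comparison per hit plus memory for the stored ids. Whether to store only the block's own tokens or enough to validate the whole prefix (for example, also the parent block's hash) depends on how the real hash chains over the prefix, which this sketch does not model.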