[Bugfix] Zero recycled KV cache blocks for FullAttention models by AjAnubolu · Pull Request #39283 · vllm-project/vllm

AjAnubolu · 2026-04-08T08:50:26Z

Summary

Closes #39146. The KV block zeroing pipeline from #35219 was gated to Mamba-only models; enabling it for FullAttention prevents stale K/V in partial-block tail slots from propagating NaN through masked softmax.
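A toy illustration of the failure mode described above, in pure Python rather than vLLM's kernels. It assumes the stale tail slots of a recycled block hold NaN in V and that masking is applied by replacing out-of-range scores with -inf. The helper `masked_attention` and the block contents are hypothetical and exist only for this sketch. The point is that a masked slot gets softmax weight 0, but 0 × NaN is still NaN, so the weighted sum is poisoned unless the tail was zeroed.

```python
import math


def masked_attention(scores, values, valid_len):
    """Single-query attention over one block (hypothetical helper).

    Slots at index >= valid_len are masked to -inf before the softmax,
    so their weight is exactly 0.0.
    """
    masked = [s if i < valid_len else -math.inf for i, s in enumerate(scores)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum runs over every slot in the block, masked or not.
    return sum(w * v for w, v in zip(weights, values))


scores = [0.5, 0.5, 0.3, 0.1]
stale_block = [1.0, 2.0, math.nan, math.nan]  # tail left over from a prior request
zeroed_block = [1.0, 2.0, 0.0, 0.0]           # tail zeroed when the block was recycled

out_stale = masked_attention(scores, stale_block, valid_len=2)
out_zeroed = masked_attention(scores, zeroed_block, valid_len=2)

print(out_stale)   # nan
print(out_zeroed)  # 1.5
```

Whether a real attention backend reads the tail slots at all depends on the kernel, so this shows only why zeroing is a safe default, not how any particular backend behaves.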

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>

gemini-code-assist

Code Review

This pull request updates the needs_kv_cache_zeroing property to include models using FullAttentionSpec, preventing stale K/V data leakage in partial-block tail slots as identified in issue #39146. A regression test was added to verify this behavior. Feedback suggests using isinstance() for type checking to ensure compatibility with subclasses like MLAAttentionSpec and to follow PEP 8 guidelines.

gemini-code-assist · 2026-04-08T08:51:55Z

+        return self.has_mamba_layers or any(
+            type(g.kv_cache_spec) is FullAttentionSpec for g in self.kv_cache_groups
+        )


Using type(g.kv_cache_spec) is FullAttentionSpec is overly restrictive as it excludes subclasses like MLAAttentionSpec and SinkFullAttentionSpec. These variants of full attention likely suffer from the same stale K/V issues in partial blocks and should also benefit from zeroing. Following PEP 8 recommendations, object type comparisons should use isinstance() instead of comparing types directly, which also ensures consistency with the has_mamba_layers implementation.

Suggested change

-        return self.has_mamba_layers or any(
-            type(g.kv_cache_spec) is FullAttentionSpec for g in self.kv_cache_groups
-        )
+        return self.has_mamba_layers or any(
+            isinstance(g.kv_cache_spec, FullAttentionSpec) for g in self.kv_cache_groups
+        )
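The difference between the two checks can be seen in a small standalone sketch. The classes below are empty stand-ins named after the specs mentioned in the review, not vLLM's actual definitions, and `Group`, `needs_zeroing_exact`, and `needs_zeroing_isinstance` are hypothetical names used only here. The sketch assumes, as the review states, that `MLAAttentionSpec` subclasses `FullAttentionSpec`.

```python
from dataclasses import dataclass


# Stand-in classes for illustration only; not vLLM's real specs.
class FullAttentionSpec: ...
class MLAAttentionSpec(FullAttentionSpec): ...
class MambaSpec: ...


@dataclass
class Group:
    """Hypothetical stand-in for one KV cache group."""
    kv_cache_spec: object


def needs_zeroing_exact(groups):
    # Exact type match: subclasses of FullAttentionSpec are not matched.
    return any(type(g.kv_cache_spec) is FullAttentionSpec for g in groups)


def needs_zeroing_isinstance(groups):
    # isinstance match: FullAttentionSpec and all of its subclasses match.
    return any(isinstance(g.kv_cache_spec, FullAttentionSpec) for g in groups)


mla_only = [Group(MLAAttentionSpec())]
mamba_only = [Group(MambaSpec())]

exact_result = needs_zeroing_exact(mla_only)
inst_result = needs_zeroing_isinstance(mla_only)
unrelated_result = needs_zeroing_isinstance(mamba_only)

print(exact_result)      # False
print(inst_result)       # True
print(unrelated_result)  # False
```

With the exact-type check, a model whose only groups use a subclass spec would skip zeroing, which is the gap the review comment points out.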

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>

mergify · 2026-04-13T03:45:50Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AjAnubolu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-05-15T11:09:01Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AjAnubolu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Zero recycled KV blocks for FullAttention models (vllm-project#39146)

1ad6786

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>

AjAnubolu requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners April 8, 2026 08:50

mergify Bot added the v1 and bug labels Apr 8, 2026

gemini-code-assist Bot reviewed Apr 8, 2026

Use isinstance for FullAttentionSpec check (review)

01b9f9e

Signed-off-by: AjAnubolu <anuboluajay@gmail.com>

Yunzez mentioned this pull request Apr 11, 2026

[Bug]: KV Cache Read/Write Index Corruption Under Concurrent Prefill of Variable-Length Sequences (vLLM V1, FlashInfer) #39589

Open

1 task

parasol-aser mentioned this pull request Apr 11, 2026

[Bugfix] Zero block_table row tail to fix concurrent variable-length prefill non-determinism (#39589) #39591

Open

7 tasks

mergify Bot added the needs-rebase label Apr 13, 2026

mergify Bot removed the needs-rebase label May 15, 2026

mergify Bot added the needs-rebase label May 15, 2026

ranjitkumar5-at-acm-dot-org mentioned this pull request May 27, 2026

[Bugfix][V1] Zero recycled KV cache blocks for FullAttentionSpec to fix non-deterministic output at temperature=0 #43741

Open

AjAnubolu wants to merge 2 commits into vllm-project:main from AjAnubolu:fix/v1-kv-block-recycle-stale-state-no-prefix-cache
