Hi, thanks for open-sourcing this project. I was reading through the TP / KV-cache initialization path and noticed a potential correctness issue.
The initialization path seems to be:
- for each rank:
  - ModelRunner(config, ...) calls allocate_kv_cache()
  - allocate_kv_cache() computes config.num_kvcache_blocks from local GPU memory state (roughly as sketched below)
  - Scheduler(config) then initializes BlockManager(config.num_kvcache_blocks, ...)
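For concreteness, this is my rough reading of that per-rank step; the mem_get_info call, utilization factor, and block-size arithmetic below are my own placeholders, not the repo's actual code:

```python
import torch

def estimate_local_kvcache_blocks(block_bytes: int, gpu_mem_utilization: float = 0.9) -> int:
    """Placeholder for what allocate_kv_cache() appears to do on each rank."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    # The count is derived purely from *local* GPU memory, so two TP ranks
    # with different amounts of free memory can compute different block counts.
    return int(free_bytes * gpu_mem_utilization) // block_bytes
```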
However, under tensor parallelism, the logical KV-cache block space should be consistent across all TP ranks, since the same block ids are used logically for a sequence across ranks, while each rank stores only its local KV-head shard.
From the current code, num_kvcache_blocks seems to be computed locally on each rank during initialization and ultimately determined by rank 0, but I could not find a synchronization step (such as a distributed min across ranks) before this value is used to initialize BlockManager.
This seems risky because different ranks may estimate different local KV-cache capacities depending on memory state, while BlockManager requires a globally consistent logical block count.
Potential consequences:
- some block ids may exist on one rank but not another
- logical block allocation could become inconsistent with local physical KV-cache pools under TP
A possible fix might be:
- compute local block capacity on each rank
- synchronize using a distributed min across TP ranks
- use the synchronized global value both for local KV-cache allocation and for BlockManager initialization (rough sketch below)
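A minimal sketch of that fix, assuming the TP ranks already share a torch.distributed process group; the function name and call site below are mine, not the repo's:

```python
import torch
import torch.distributed as dist

def synced_num_kvcache_blocks(local_num_blocks: int) -> int:
    """Reduce the per-rank estimates to the minimum so every rank
    ends up with the same logical block count, bounded by the
    most memory-constrained rank."""
    t = torch.tensor([local_num_blocks], dtype=torch.int64, device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.MIN)
    return int(t.item())

# Each rank would then use the synchronized value, e.g.:
#   config.num_kvcache_blocks = synced_num_kvcache_blocks(local_estimate)
# before allocating its local KV-cache pool and before
# BlockManager(config.num_kvcache_blocks, ...) is constructed.
```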
I may be missing some context, but from the current code path this looks like a potential TP correctness bug.