
vtgate/buffer: reduce hot-path latency with lock-free shard lookup and atomic state #19801

@harshit-gangal


Summary

The VTGate query buffer is checked on every PRIMARY query. In the normal case (no failover in progress), the check should be near-zero cost. Currently the hot path takes two mutex acquisitions per query — one on the global buffer map and one on the per-shard buffer. Under parallel load this creates measurable contention.

Current Hot Path (no buffering active)

WaitForFailoverEnd()
  → buf.getOrCreateBuffer(keyspace, shard)     // buf.mu.Lock() — global write lock
  → sb.mu.RLock()                               // per-shard read lock
  → shouldBufferLocked() returns false
  → sb.mu.RUnlock()

Every query across all shards contends on buf.mu even though the shard buffer already exists and the operation is read-only.

Proposed Changes

1. Atomic state for fast-path idle check

Replace the sb.mu.RLock() + shouldBufferLocked() + sb.mu.RUnlock() sequence with an atomic state field. The idle check becomes a single lock-free atomic load (sb.state.Load() on an atomic.Int32).

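type bufferState int32 // see change 3: must be an integer type to live in an atomic

type shardBuffer struct {
    mu    sync.RWMutex
    state atomic.Int32 // read atomically on the hot path; written only while holding mu
    // mu still protects queue, timers, etc.
}

func (sb *shardBuffer) waitForFailoverEnd(..., err error) (...) {
    // A non-nil err means this request itself detected a failover, so it
    // must take the slow path even while the shard is still idle.
    failoverDetected := err != nil
    if !failoverDetected && bufferState(sb.state.Load()) == stateIdle {
        return nil, nil // fast path: no lock needed
    }
    // slow path: take sb.mu and run the existing shouldBufferLocked(failoverDetected) logic
}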

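State transitions (start buffering, drain) store the atomic while holding mu; they are already on the slow path, so this adds no hot-path cost. A request that races with a transition can observe the previous state, but that is no weaker than today, where the read lock is released before the request proceeds.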
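A minimal, runnable sketch of this pattern, under stated assumptions: the state names and integer values, the queue field, and the simplified slow-path rule in the hypothetical needsSlowPath helper are stand-ins, not the Vitess code. It shows the fast path as one atomic load, the slow path re-checking under the lock, and transitions storing the atomic while holding mu.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical integer states; names and values are illustrative.
type bufferState int32

const (
	stateIdle bufferState = iota // zero value: new buffers start idle
	stateBuffering
	stateDraining
)

type shardBuffer struct {
	mu    sync.RWMutex
	state atomic.Int32 // loaded lock-free on the hot path; stored only while holding mu
	queue []string     // hypothetical stand-in for the mu-protected fields
}

// setStateLocked publishes a new state. Caller must hold sb.mu for writing.
func (sb *shardBuffer) setStateLocked(s bufferState) {
	sb.state.Store(int32(s))
}

// startBuffering models a slow-path transition: mutate mu-protected fields,
// then publish the new state, all under the write lock.
func (sb *shardBuffer) startBuffering() {
	sb.mu.Lock()
	defer sb.mu.Unlock()
	sb.queue = sb.queue[:0] // reset the (stand-in) queue
	sb.setStateLocked(stateBuffering)
}

// needsSlowPath reports whether a request must go through the buffering logic.
// The rule below is a hypothetical simplification of shouldBufferLocked().
func (sb *shardBuffer) needsSlowPath(failoverDetected bool) bool {
	// Fast path: one atomic load, no lock.
	if !failoverDetected && bufferState(sb.state.Load()) == stateIdle {
		return false
	}
	// Slow path: re-check under the lock, since the state may have changed
	// between the atomic load and acquiring mu.
	sb.mu.RLock()
	defer sb.mu.RUnlock()
	return failoverDetected || bufferState(sb.state.Load()) != stateIdle
}

func main() {
	var sb shardBuffer
	fmt.Println(sb.needsSlowPath(false)) // false: idle, lock never taken
	sb.startBuffering()
	fmt.Println(sb.needsSlowPath(false)) // true: buffering in progress
}
```

Because stateIdle is the zero value, a freshly allocated shardBuffer is idle without explicit initialization. Note that atomic.Int32 requires Go 1.19 or later.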

2. Replace global mutex with sync.Map for shard lookup

The shard map is read on every query (hot) and written only when a shard is first seen (cold — happens once per shard at startup). This is the exact access pattern sync.Map is optimized for.

type Buffer struct {
    buffers sync.Map  // string → *shardBuffer, lock-free reads
}

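WaitForFailoverEnd and HandleKeyspaceEvent look up the shard buffer with Load(), with no lock. getOrCreateBuffer should also try Load() first and call LoadOrStore() only on a miss: LoadOrStore is lock-free when the key already exists, but its value argument is evaluated on every call, so calling it unconditionally would allocate a new *shardBuffer per query.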
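A sketch of that lookup, assuming a "keyspace/shard" string key and a trimmed-down, hypothetical shardBuffer. The hit path is a single sync.Map Load; only a miss reaches LoadOrStore, and concurrent misses for the same key all receive the one value that won the race.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, trimmed-down shard buffer.
type shardBuffer struct {
	keyspace, shard string
}

type Buffer struct {
	buffers sync.Map // "keyspace/shard" → *shardBuffer
}

// getOrCreateBuffer: Load first, so the common case neither locks nor
// allocates; fall back to LoadOrStore only when the shard is missing.
func (b *Buffer) getOrCreateBuffer(keyspace, shard string) *shardBuffer {
	key := keyspace + "/" + shard
	if v, ok := b.buffers.Load(key); ok {
		return v.(*shardBuffer)
	}
	// Miss: several goroutines may race here. LoadOrStore keeps the first
	// stored value and returns it to every caller; losing candidates are discarded.
	v, _ := b.buffers.LoadOrStore(key, &shardBuffer{keyspace: keyspace, shard: shard})
	return v.(*shardBuffer)
}

func main() {
	var b Buffer
	var wg sync.WaitGroup
	results := make([]*shardBuffer, 8)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = b.getOrCreateBuffer("commerce", "-80")
		}(i)
	}
	wg.Wait()

	same := true
	for _, sb := range results {
		same = same && sb == results[0]
	}
	fmt.Println(same) // true: every goroutine got the same *shardBuffer
}
```

One caveat: if constructing a shardBuffer has side effects (starting goroutines or registering stats, for example), candidates discarded by a lost LoadOrStore race would need cleanup. In that case the miss path could take a mutex instead, which still keeps the hit path lock-free.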

3. String state → integer state

bufferState is currently a string type. It has to become an integer type to be stored in an atomic.Int32, and integer comparison is also cheaper than string comparison on the hot path.

Expected Impact

We implemented these optimizations in a similar buffer system and benchmarked:

| Benchmark | Before | After | Change |
|---|---|---|---|
| Idle shard check (serial) | ~22 ns | ~14 ns | -36% |
| Idle shard check (parallel, 10 goroutines) | ~200 ns | ~1.7 ns | -99% |
| Multi-shard parallel (8 shards) | ~150 ns | ~2.3 ns | -98% |

The parallel gains are large because the idle path no longer acquires any mutex, so goroutines stop serializing on a shared lock; each one performs only lock-free reads (a sync.Map lookup and an atomic load).
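One way to measure this kind of change locally is a parallel benchmark driven by testing.Benchmark and b.RunParallel. The sketch below compares two minimal, hypothetical models of the idle check (global Mutex plus per-shard RWMutex, versus sync.Map plus an atomic) rather than the real Vitess types, so its numbers will not match the table above and will vary by machine.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

const key = "commerce/-80" // hypothetical keyspace/shard key

// Model of today's path: global exclusive lock, then per-shard read lock.
type lockedShard struct {
	mu   sync.RWMutex
	idle bool
}

type lockedBuffer struct {
	mu      sync.Mutex
	buffers map[string]*lockedShard
}

func (b *lockedBuffer) isIdle(k string) bool {
	b.mu.Lock()
	sb := b.buffers[k]
	b.mu.Unlock()
	sb.mu.RLock()
	defer sb.mu.RUnlock()
	return sb.idle
}

// Model of the proposal: sync.Map lookup, then one atomic load (0 == idle).
type atomicShard struct{ state atomic.Int32 }

type atomicBuffer struct{ buffers sync.Map }

func (b *atomicBuffer) isIdle(k string) bool {
	v, _ := b.buffers.Load(k)
	return v.(*atomicShard).state.Load() == 0
}

// benchIdle runs check from GOMAXPROCS goroutines in parallel.
func benchIdle(check func(string) bool) testing.BenchmarkResult {
	testing.Init() // registers testing flags outside "go test"; no-op if already called
	return testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				if !check(key) {
					panic("shard should be idle")
				}
			}
		})
	})
}

func main() {
	lb := &lockedBuffer{buffers: map[string]*lockedShard{key: {idle: true}}}
	ab := &atomicBuffer{}
	ab.buffers.Store(key, &atomicShard{})
	fmt.Println("Mutex + RWMutex:  ", benchIdle(lb.isIdle))
	fmt.Println("sync.Map + atomic:", benchIdle(ab.isIdle))
}
```

Running it with different GOMAXPROCS values shows how contention on the locked model grows with parallelism while the lock-free model stays roughly flat.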

Affected Code

  • go/vt/vtgate/buffer/buffer.go: Buffer.buffers map + mu mutex
  • go/vt/vtgate/buffer/shard_buffer.go: shardBuffer.state field + shouldBufferLocked()
  • go/vt/vtgate/tabletgateway.go: caller of WaitForFailoverEnd
