
NATS Leafnode Stale Connection Reproduction

Reported in #7615

Reproduces leafnode stale/reconnect cycling when hub-to-leaf traffic saturates a slow link.

Requirements

  • Docker with compose
  • NATS CLI (nats)
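
The compose file and server configs are not included in this snippet. A minimal sketch of the hub/leaf topology the commands below assume — container names and the `app` account are inferred from the commands, the `hub` JetStream domain from the `$JS.hub.API` prefix; the `leaf` domain, ports, and credentials are placeholders:

```
# hub.conf — sketch only; credentials are placeholders, not from the repro
server_name: nats-hub
port: 4222
jetstream {
  domain: hub              # matches the $JS.hub.API prefix used by the mirror
}
leafnodes {
  port: 7422               # hub accepts leafnode connections here
}
accounts {
  APP {
    jetstream: enabled
    users: [ { user: app, password: "<password>" } ]
  }
}

# leaf.conf — sketch only
server_name: nats-leaf
port: 4222                 # assumed published as 4223 on the host
jetstream {
  domain: leaf
}
leafnodes {
  remotes: [ { url: "nats://app:<password>@nats-hub:7422", account: APP } ]
}
```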

Steps

docker compose up -d && sleep 5

# Create stream with 50MB test data
nats -s nats://app:[email protected]:4222 stream add TEST \
  --subjects="test.>" --storage=file --replicas=1 --defaults
nats -s nats://app:[email protected]:4222 pub test.load \
  "$(head -c 1024 < /dev/zero | tr '\0' 'x')" --count=50000
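
Each published message is a 1 KiB payload of `x` characters, so 50,000 messages is roughly 50 MB of stream data. A quick sanity check of the payload construction:

```shell
# Build the same 1 KiB payload the pub command uses and confirm its size
payload="$(head -c 1024 < /dev/zero | tr '\0' 'x')"
printf '%s' "$payload" | wc -c     # 1024 bytes per message
echo $((1024 * 50000))             # 51200000 bytes total, ≈ 50 MB
```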

# Apply traffic shaping (50kbit/s)
docker exec nats-hub tc qdisc add dev eth0 root tbf rate 50kbit burst 16kbit latency 100ms
docker exec nats-leaf tc qdisc add dev eth0 root tbf rate 50kbit burst 16kbit latency 100ms
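
Back-of-envelope arithmetic (using the numbers from the steps above) for why this rate wedges the link: ~50 MB of replication data takes hours to drain at 50 kbit/s, while the hub's write deadline is only 10 seconds:

```shell
# ~50 MB of stream payload vs. a 50 kbit/s token-bucket link (headers excluded)
bytes=$((1024 * 50000))              # 51200000 bytes
rate_bytes_per_s=$((50 * 1000 / 8))  # 50 kbit/s = 6250 bytes/s
echo $((bytes / rate_bytes_per_s))   # 8192 seconds to drain; the 10s
                                     # WriteDeadline expires long before that
```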

# Create mirror (triggers replication)
cat > mirror.json << 'EOF'
{"name":"TEST_MIRROR","storage":"file","mirror":{"name":"TEST","external":{"api":"$JS.hub.API"}}}
EOF
nats -s nats://app:[email protected]:4223 stream add --config=mirror.json

# Watch for stale events (~30-60 seconds)
watch -n5 'docker logs nats-hub 2>&1 | grep -E "Stale|Slow" | tail -5'
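
The grep in the watch loop catches both expected log lines. A quick offline check of the filter against the sample lines from the Expected Output section:

```shell
# Verify the filter matches both the slow-consumer and stale-connection lines
printf '%s\n' \
  '[INF] Slow Consumer Detected: WriteDeadline of 10s exceeded' \
  "[ERR] Leafnode Error 'Stale Connection'" \
  | grep -cE "Stale|Slow"            # prints 2
```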

Expected Output

[INF] Slow Consumer Detected: WriteDeadline of 10s exceeded
[ERR] Leafnode Error 'Stale Connection'

The leafnode cycles through stale/reconnect repeatedly, and the mirror stays stuck at 0 messages.

What's Happening

The hub's write buffer fills with replication data. When the leaf sends a PING, the hub queues its PONG immediately, but the PONG sits behind buffered data that cannot drain fast enough over the shaped link. The leaf never receives the PONG, declares the connection stale, and disconnects. After reconnecting, the same sequence repeats.
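
The timing of the cycle is governed by a few server-side knobs. A sketch of the relevant settings (values shown are the NATS defaults, assumed here rather than taken from this repro; raising them delays the symptom but does not fix the buffering behavior):

```
# Relevant server timeouts (defaults; sketch only)
write_deadline: "10s"   # flushes exceeding this log "Slow Consumer Detected"
ping_interval: "2m"     # how often each side sends a protocol PING
ping_max: 2             # unanswered PINGs tolerated before "Stale Connection"
```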

Cleanup

docker compose down -v