Commit 4bfc250

feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

1 parent cd241b2 commit 4bfc250

12 files changed: 4291 additions & 386 deletions

Makefile

Lines changed: 39 additions & 1 deletion
@@ -2320,7 +2320,11 @@ load-test-agentgateway-mcp-server-time: ## Load test external MCP server (loc
 # help: load-test-mcp-protocol-heavy - MCP-only protocol heavy test (500 users, 5min)

 MCP_PROTOCOL_LOCUSTFILE ?= tests/loadtest/locustfile_mcp_protocol.py
-MCP_RATE_LIMITER_LOCUSTFILE ?= tests/loadtest/locustfile_rate_limiter.py
+MCP_RATE_LIMITER_LOCUSTFILE ?= tests/loadtest/locustfile_rate_limiter_backend_correctness.py
+MCP_RATE_LIMITER_SCALE_LOCUSTFILE ?= tests/loadtest/locustfile_rate_limiter_scale.py
+RL_ALGORITHM ?= fixed_window
+RL_USERS ?= 100
+RL_SPAWN_RATE ?= 10
 MCP_PROTOCOL_HOST ?= http://localhost:4444
 MCP_BENCHMARK_HOST ?= http://localhost:8080
 MCP_BENCHMARK_SERVER_ID ?= 9779b6698cbd4b4995ee04a4fab38737
@@ -2437,6 +2441,40 @@ benchmark-rate-limiter: ## Rate limiter correctness test (1
     --only-summary \
     RateLimitedUser || true'

+
+# help: benchmark-rate-limiter-scale - Multi-user scale test showing Redis memory divergence across algorithms
+.PHONY: benchmark-rate-limiter-scale
+RL_RUN_TIME ?= 300s
+benchmark-rate-limiter-scale: ## Scale test: 500 unique users, Redis memory timeline per algorithm
+    @echo "📈 Running rate limiter scale test (resource divergence)..."
+    @echo "   Algorithm: $(RL_ALGORITHM) (must match plugins/config.yaml)"
+    @echo "   Users:     $(RL_USERS) unique identities (each creates own Redis key)"
+    @echo "   Spawn:     $(RL_SPAWN_RATE) users/s"
+    @echo "   Limit:     $(RL_LIMIT_PER_MIN) req/min per user"
+    @echo "   Duration:  $(RL_RUN_TIME) (includes ~40s bootstrap for user registration)"
+    @echo ""
+    @echo "   Redis memory diverges between algorithms as users ramp up:"
+    @echo "     fixed_window:   ~0.1-0.3 KiB/key (single integer)"
+    @echo "     sliding_window: ~1-3 KiB/key (sorted set, $(RL_LIMIT_PER_MIN) entries)"
+    @echo "     token_bucket:   ~0.2 KiB/key (hash: tokens + last_refill)"
+    @test -d "$(VENV_DIR)" || $(MAKE) venv
+    @/bin/bash -eu -o pipefail -c 'source $(VENV_DIR)/bin/activate && \
+        LOCUST_LOG_LEVEL=ERROR \
+        RL_ALGORITHM=$(RL_ALGORITHM) \
+        RL_LIMIT_PER_MIN=$(RL_LIMIT_PER_MIN) \
+        RL_USERS=$(RL_USERS) \
+        RL_SPAWN_RATE=$(RL_SPAWN_RATE) \
+        RL_RUN_TIME=$(RL_RUN_TIME) \
+        MCP_SERVER_ID=$(MCP_BENCHMARK_SERVER_ID) \
+        locust -f $(MCP_RATE_LIMITER_SCALE_LOCUSTFILE) \
+            --host=$(MCP_BENCHMARK_HOST) \
+            --users=$(RL_USERS) \
+            --spawn-rate=$(RL_SPAWN_RATE) \
+            --run-time=$(RL_RUN_TIME) \
+            --headless \
+            --only-summary \
+            ScaleComparisonUser || true'
+
 .PHONY: benchmark-mcp-mixed-300
 benchmark-mcp-mixed-300: ## Distributed 300-user mixed MCP benchmark
     @echo "📊 Running distributed mixed MCP benchmark..."
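
The memory-divergence claim echoed by the scale target above can be sanity-checked with quick arithmetic. This is a hypothetical sketch, not part of the load test: the per-key sizes are midpoints of the ranges quoted in the Makefile, and `estimated_total_kib` is an illustrative helper name.

```python
# Midpoints of the per-key Redis memory ranges quoted in the Makefile target
# (assumption: actual sizes depend on Redis version and key/prefix lengths).
PER_KEY_KIB = {
    "fixed_window": 0.2,    # single integer counter per key
    "sliding_window": 2.0,  # sorted set: one entry per request in the window
    "token_bucket": 0.2,    # hash: tokens + last_refill
}

def estimated_total_kib(algorithm: str, users: int) -> float:
    """Estimated steady-state Redis memory for `users` unique rate-limit keys."""
    return PER_KEY_KIB[algorithm] * users
```

At the target's advertised 500 unique identities this predicts roughly 100 KiB for `fixed_window` versus about 1 MiB for `sliding_window`, which is the divergence the timeline is meant to surface.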

docs/docs/using/plugins/plugins.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ Plugins for improving system reliability, performance, and resource management.
 |--------|------|-------------|
 | [Circuit Breaker](https://github.com/IBM/mcp-context-forge/tree/main/plugins/circuit_breaker) | Native | Trips per-tool breaker on high error rates or consecutive failures and blocks during cooldown |
 | [Watchdog](https://github.com/IBM/mcp-context-forge/tree/main/plugins/watchdog) | Native | Enforces maximum runtime for tools with warn or block actions on threshold violations |
-| [Rate Limiter](https://github.com/IBM/mcp-context-forge/tree/main/plugins/rate_limiter) | Native | Fixed-window in-memory rate limiting by user, tenant, or tool |
+| [Rate Limiter](https://github.com/IBM/mcp-context-forge/tree/main/plugins/rate_limiter) | Native | Per-user, tenant, and tool rate limiting with selectable algorithms (fixed_window, sliding_window, token_bucket) and memory or Redis backends |
 | [Cached Tool Result](https://github.com/IBM/mcp-context-forge/tree/main/plugins/cached_tool_result) | Native | Caches idempotent tool results in-memory with configurable TTL and key fields |
 | [Response Cache by Prompt](https://github.com/IBM/mcp-context-forge/tree/main/plugins/response_cache_by_prompt) | Native | Advisory response cache using cosine similarity over prompt/input fields with configurable threshold |
 | [Retry with Backoff](https://github.com/IBM/mcp-context-forge/tree/main/plugins/retry_with_backoff) | Native | Annotates retry/backoff policy in metadata with exponential backoff on specific HTTP status codes |

mcpgateway/auth.py

Lines changed: 9 additions & 2 deletions
@@ -998,11 +998,12 @@ async def _set_auth_method_from_payload(payload: dict) -> None:
     # Get plugin contexts from request state if available
     global_context = getattr(request.state, "plugin_global_context", None) if request else None
     if not global_context:
-        # Create global context
+        # Propagate team_id → tenant_id for by_tenant rate limiting
+        team_id = getattr(getattr(request, "state", None), "team_id", None) if request else None
         global_context = GlobalContext(
             request_id=request_id,
             server_id=None,
-            tenant_id=None,
+            tenant_id=team_id,
         )

     context_table = getattr(request.state, "plugin_context_table", None) if request else None
@@ -1532,5 +1533,11 @@ def _inject_userinfo_instate(request: Optional[object] = None, user: Optional[Em
     global_context.user["is_admin"] = user.is_admin
     global_context.user["full_name"] = user.full_name

+    # Propagate team_id → tenant_id for by_tenant rate limiting (only when not already set)
+    if request and global_context.tenant_id is None:
+        team_id = getattr(getattr(request, "state", None), "team_id", None)
+        if team_id:
+            global_context.tenant_id = team_id
+
     if request and global_context:
         request.state.plugin_global_context = global_context
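
The fallback added above can be exercised in isolation. This sketch keeps only the nested-`getattr` chain and the "only when not already set" guard; `GlobalContext` is a stand-in for the gateway's plugin context class (assumed shape), and `SimpleNamespace` stands in for a FastAPI request.

```python
from types import SimpleNamespace

class GlobalContext:
    """Stand-in for the gateway's plugin GlobalContext (assumed shape)."""
    def __init__(self, request_id, server_id=None, tenant_id=None):
        self.request_id = request_id
        self.server_id = server_id
        self.tenant_id = tenant_id

def propagate_tenant(request, global_context):
    # Propagate team_id → tenant_id for by_tenant rate limiting,
    # but never clobber a tenant_id that is already set.
    if request and global_context.tenant_id is None:
        # Nested getattr: tolerates a missing request.state and a missing team_id.
        team_id = getattr(getattr(request, "state", None), "team_id", None)
        if team_id:
            global_context.tenant_id = team_id
    return global_context

# A request whose auth middleware stashed team_id on request.state:
req = SimpleNamespace(state=SimpleNamespace(team_id="team-42"))
ctx = propagate_tenant(req, GlobalContext(request_id="r1"))
kept = propagate_tenant(req, GlobalContext(request_id="r2", tenant_id="t-1"))
```

With no `team_id` on the request state, `tenant_id` stays `None`, which is exactly the case the `by_tenant` skip in this commit is guarding against.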

plugins/config.yaml

Lines changed: 1 addition & 0 deletions
@@ -222,6 +222,7 @@ plugins:
       redis_url: "redis://redis:6379/0"
       redis_key_prefix: "rl"
       redis_fallback: true
+      algorithm: "fixed_window"
       by_user: "30/m"
       by_tenant: "3000/m"
       by_tool: {}

plugins/rate_limiter/README.md

Lines changed: 64 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
 > Author: Mihai Criveti
 > Version: 0.1.0

-Enforces fixed-window rate limits per user, tenant, and tool across `tool_pre_invoke` and `prompt_pre_fetch` hooks. Supports an in-process memory backend (single-instance) and a Redis backend (shared across all gateway instances).
+Enforces rate limits per user, tenant, and tool across `tool_pre_invoke` and `prompt_pre_fetch` hooks. Supports pluggable counting algorithms (fixed window, sliding window, token bucket), an in-process memory backend (single-instance), and a Redis backend (shared across all gateway instances).

 ## Hooks

@@ -32,6 +32,9 @@ If any configured dimension is exceeded, the plugin returns a violation with HTT
       search: "10/m"
       summarise: "5/m"

+    # Algorithm — choose one (default: fixed_window)
+    algorithm: "fixed_window"  # fixed_window | sliding_window | token_bucket
+
     # Backend — choose one
     backend: "memory"          # default: single-process, resets on restart
     # backend: "redis"         # shared across all gateway instances
@@ -49,6 +52,7 @@ If any configured dimension is exceeded, the plugin returns a violation with HTT
 | `by_user` | string | `null` | Per-user rate limit, e.g. `"60/m"` |
 | `by_tenant` | string | `null` | Per-tenant rate limit, e.g. `"600/m"` |
 | `by_tool` | dict | `{}` | Per-tool overrides, e.g. `{"search": "10/m"}` |
+| `algorithm` | string | `"fixed_window"` | Counting algorithm: `"fixed_window"`, `"sliding_window"`, or `"token_bucket"` |
 | `backend` | string | `"memory"` | `"memory"` or `"redis"` |
 | `redis_url` | string | `null` | Redis connection URL (required when `backend: redis`) |
 | `redis_key_prefix` | string | `"rl"` | Prefix for all Redis keys |
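
The `by_user`/`by_tenant` values in the table above use a count/period shorthand ("60/m" means 60 requests per minute). The plugin's actual parsing helper is not part of this diff, so the following is an illustrative sketch: `parse_rate` and its suffix table are assumptions, not the plugin's API.

```python
# Parse "count/period" rate strings such as "60/m" into (count, window_seconds).
# Suffixes assumed here: s = second, m = minute, h = hour.
_PERIOD_SECONDS = {"s": 1, "m": 60, "h": 3600}

def parse_rate(rate: str) -> tuple[int, int]:
    count_str, _, period = rate.partition("/")
    count = int(count_str)
    if count <= 0 or period not in _PERIOD_SECONDS:
        raise ValueError(f"invalid rate string: {rate!r}")
    return count, _PERIOD_SECONDS[period]
```

Under this reading, `"600/m"` yields a limit of 600 with a 60-second window, which is the `count`/`window` pair the algorithm descriptions below refer to.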
@@ -69,18 +73,44 @@ Every request (allowed or blocked) includes:
 | `X-RateLimit-Reset` | Unix timestamp when the current window resets |
 | `Retry-After` | Seconds until the window resets (blocked requests only) |

+## Algorithms
+
+Three counting algorithms are available, selected via the `algorithm` config field.
+
+| Algorithm | Config value | Best for | Trade-off |
+|---|---|---|---|
+| Fixed window | `fixed_window` | General use, lowest overhead | Up to 2× the limit at window boundaries |
+| Sliding window | `sliding_window` | Smooth enforcement, no boundary burst | Higher memory: stores one timestamp per request per key |
+| Token bucket | `token_bucket` | Bursty workloads — allows short spikes up to capacity | Slightly higher Redis overhead: stores `{tokens, last_refill}` hash per key |
+
+### Fixed window (default)
+
+Counts requests in a fixed time slot (e.g. "minute 14:03"). Resets at the slot boundary. Simple and fast. The 2× burst at a boundary (N requests at the end of slot T, N requests at the start of T+1) is a known trade-off; use `by_user` with headroom if this matters.
+
+### Sliding window
+
+Stores a timestamp for every request in the current window. At each check, expired timestamps are discarded and the remaining count is compared against the limit. Prevents boundary bursts entirely. Memory usage grows with request volume — roughly one float per request per active key.
+
+### Token bucket
+
+Each identity (user, tenant, tool) has a bucket that holds up to `count` tokens. Tokens refill at a steady rate of `count/window`. A request consumes one token. Bursts up to the bucket capacity are allowed; sustained rate above `count/window` is rejected. Useful for APIs where short spikes are acceptable but sustained overload is not.
+
+**Redis support:** `token_bucket` with `backend: redis` is fully supported. The plugin stores `{tokens, last_refill}` in a Redis hash per key and uses an atomic Lua script to refill and consume tokens in a single round-trip — the same pattern as the other two algorithms. This means `token_bucket` enforces a true cluster-wide limit in multi-instance deployments.
+
 ## Backends

 ### Memory backend (default)

 - Counters are stored in a process-local dict (`_store`)
 - An `asyncio.Lock` serialises all counter reads and writes — safe under concurrent asyncio tasks
-- A background sweep task evicts expired windows every 0.5s — memory is bounded to active windows only
+- A background sweep task evicts expired windows every 0.5s — for `fixed_window` and `token_bucket`, expired entries are removed promptly; for `sliding_window`, keys with fully stale timestamps are evicted by the sweep
 - **Limitation:** state is not shared across processes or hosts. In a multi-instance deployment (e.g. 3 gateway instances behind nginx), each instance tracks its own counter — the effective limit is `N × configured_limit`

 ### Redis backend

-- Counters are stored in Redis using an atomic Lua `INCR`+`EXPIRE` script — a single Redis call per check with no race condition
+- `fixed_window`: atomic Lua `INCR`+`EXPIRE` — one Redis round-trip per check, no race condition
+- `sliding_window`: atomic Lua `ZADD`+`ZREMRANGEBYSCORE`+`ZCARD`+`EXPIRE` — one round-trip, no race condition
+- `token_bucket`: atomic Lua script — reads `{tokens, last_refill}` hash, refills proportionally, consumes 1 token, writes back — one round-trip, no race condition
 - All gateway instances share the same counter — the configured limit is the true cluster-wide limit
 - Requires `redis_url` to be set
 - If `redis_fallback: true` (default) and Redis is unavailable, the plugin falls back to the in-process `MemoryBackend` automatically — requests are never blocked due to Redis downtime
@@ -111,6 +141,34 @@ config:
       search: "10/m"
 ```

+### Sliding window (no boundary bursts)
+
+```yaml
+config:
+  algorithm: "sliding_window"
+  by_user: "30/m"
+  by_tenant: "300/m"
+```
+
+### Token bucket — memory backend (default)
+
+```yaml
+config:
+  algorithm: "token_bucket"
+  by_user: "30/m"   # bucket holds 30 tokens, refills at 30/min
+```
+
+### Token bucket — Redis backend (multi-instance)
+
+```yaml
+config:
+  algorithm: "token_bucket"
+  backend: "redis"
+  redis_url: "redis://redis:6379/0"
+  redis_fallback: true
+  by_user: "30/m"
+```
+
 ### Permissive mode (observe without blocking)

 ```yaml
@@ -126,7 +184,9 @@ In `permissive` mode the plugin records violations and emits `X-RateLimit-*` hea
 | Limitation | Severity | Status |
 |---|---|---|
 | Memory backend not shared across processes | HIGH | Use Redis backend for multi-instance deployments |
-| Fixed window allows up to 2× limit at window boundary | LOW | Deferred — use `by_user` with headroom as a workaround |
+| Fixed window allows up to 2× limit at window boundary | LOW | Use `sliding_window` algorithm, or use `by_user` with headroom |
+| `by_tool` matching is case-sensitive | LOW | Fixed — tool names are now normalised with `.strip().lower()` at init |
+| Whitespace-only user identity bypasses anonymous bucket | LOW | Documented gap; strip identities before passing to hooks |
 | No per-server limits (`server_id` dimension missing) | LOW | Not implemented |
 | No config hot-reload — rate string changes require restart | LOW | Not implemented |
 | Memory backend not safe under threaded workers (gunicorn `--threads`) | LOW | asyncio.Lock is loop-safe; use async workers (`-k uvicorn`) |
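
The 2× boundary-burst entry that the limitations table keeps for fixed windows is easy to demonstrate against minimal models of both algorithms. These are simplified stand-ins written for this comparison, not the plugin's backends; timestamps are passed in explicitly so the boundary can be placed exactly.

```python
class FixedWindow:
    """Counts requests per fixed time slot of `window` seconds."""

    def __init__(self, limit: int, window: int):
        self.limit, self.window = limit, window
        self.slot, self.count = None, 0

    def allow(self, now: float) -> bool:
        slot = int(now // self.window)
        if slot != self.slot:
            self.slot, self.count = slot, 0   # new slot: counter resets
        if self.count < self.limit:
            self.count += 1
            return True
        return False

class SlidingWindow:
    """Keeps a timestamp per request; expired entries swept before counting."""

    def __init__(self, limit: int, window: int):
        self.limit, self.window = limit, window
        self.hits: list[float] = []

    def allow(self, now: float) -> bool:
        self.hits = [t for t in self.hits if t > now - self.window]  # sweep expired
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False

def boundary_burst(limiter, limit: int, window: int) -> int:
    """Requests accepted in the ~2 seconds straddling a window boundary."""
    accepted = sum(limiter.allow(window - 1.0) for _ in range(limit))   # end of slot T
    accepted += sum(limiter.allow(window + 1.0) for _ in range(limit))  # start of slot T+1
    return accepted
```

With a `"30/m"` limit, the fixed-window model accepts 60 requests across the boundary (2× the limit, since the counter resets at the slot edge), while the sliding-window model accepts exactly 30 — the behaviour the "no boundary bursts" example above is selecting for.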
