feat(rate-limiter): pluggable algorithms with Rust-backed execution engine, benchmarks, and validation by gandhipratik203 · Pull Request #3809 · IBM/mcp-context-forge

gandhipratik203 · 2026-03-23T21:30:45Z

Summary

Consolidates and extends the rate limiter plugin with a Rust-backed execution engine, carrying forward the Python-side foundation work from #3783 (pluggable algorithms, tenant isolation, correctness fixes) and replacing the Python algorithm/backend implementation with a high-performance Rust engine for both memory and redis backends.

The Rust engine is the preferred execution path — all algorithm execution, backend dispatch, result aggregation, and response construction live in Rust when the rate_limiter_rust PyO3 extension is installed. Python retains ownership of plugin lifecycle, hook integration, request-context extraction, and config validation. The full Python algorithm and backend implementation is retained as a fallback when the Rust extension is unavailable or when RATE_LIMITER_FORCE_PYTHON=1 is set.

The Rust engine exposes a high-level check() API that reduces the Python-Rust boundary to a single call returning pre-built response dicts. This keeps the existing plugin integration model intact while reducing request-path overhead and preserving shared-counter semantics for multi-instance deployments.

Gaps closed

Python foundation

Gap 1 (HIGH) — No algorithm choice: only fixed_window was available — no way to handle boundary bursts or bursty workloads. Fixed by introducing a strategy pattern with three selectable algorithms (fixed_window, sliding_window, token_bucket) configurable via the algorithm: field. All three run on both memory and redis backends; the Redis backend was extended with atomic Lua scripts for sliding_window and token_bucket to maintain cluster-wide enforcement across instances.

test_fixed_window_burst_at_boundary (xfail) documents the boundary burst as a known trade-off for users who stay on fixed_window; sliding_window eliminates it entirely.
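
The boundary burst can be reproduced with a minimal sketch. The two functions below are illustrative stand-ins, not the plugin's implementation; the names `fixed_window_allow` and `sliding_window_allow`, the key `user:alice`, and the 10-per-60-seconds limit are assumptions chosen for the example.

```python
from collections import deque

def fixed_window_allow(counters, key, limit, window, now):
    """Count requests per (key, window-bucket); the counter resets at each boundary."""
    bucket = int(now // window)
    slot = (key, bucket)
    count = counters.get(slot, 0)
    if count >= limit:
        return False
    counters[slot] = count + 1
    return True

def sliding_window_allow(logs, key, limit, window, now):
    """Keep a timestamp log per key; only requests inside the last `window` seconds count."""
    log = logs.setdefault(key, deque())
    while log and log[0] <= now - window:
        log.popleft()
    if len(log) >= limit:
        return False
    log.append(now)
    return True

# Hypothetical burst: 10 requests just before the 60 s boundary, 10 just after.
times = [59.9] * 10 + [60.1] * 10
counters, logs = {}, {}
fixed_allowed = sum(fixed_window_allow(counters, "user:alice", 10, 60, t) for t in times)
sliding_allowed = sum(sliding_window_allow(logs, "user:alice", 10, 60, t) for t in times)
print(fixed_allowed, sliding_allowed)
```

With these inputs the fixed window admits all 20 requests within 0.2 seconds because the counter resets at t=60, while the sliding window admits only the first 10.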

Gap 2 (HIGH) — by_tenant cross-throttle: tenant_id=None fell back to a shared "tenant:default" bucket, causing unrelated users to throttle each other across the entire deployment. Fixed in two places: (1) the plugin now skips the by_tenant check entirely when tenant_id is None instead of inventing a phantom bucket; (2) mcpgateway/auth.py now propagates request.state.team_id → global_context.tenant_id unconditionally via _propagate_tenant_id(), called at all four return paths in get_current_user(), so single-team API tokens correctly populate tenant context for rate limiting regardless of the include_user_info setting.
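
The first half of the fix (skipping the by_tenant dimension when no tenant is known) could look roughly like the sketch below. The function name `build_dimension_keys` and the config shape are assumptions for illustration; only the `user:X`, `tenant:Y`, `tool:Z` key prefixes and the skip-on-None behaviour come from this PR.

```python
from typing import Dict, List, Optional, Tuple

def build_dimension_keys(
    user: str,
    tenant_id: Optional[str],
    tool: str,
    config: Dict[str, object],
) -> List[Tuple[str, str]]:
    """Return (key, rate) pairs for every dimension that applies to this request."""
    keys: List[Tuple[str, str]] = []
    if "by_user" in config:
        keys.append((f"user:{user}", str(config["by_user"])))
    # No tenant context: skip the dimension instead of falling back to a
    # shared "tenant:default" bucket that unrelated users would compete for.
    if "by_tenant" in config and tenant_id is not None:
        keys.append((f"tenant:{tenant_id}", str(config["by_tenant"])))
    by_tool = config.get("by_tool") or {}
    tool_name = tool.strip().lower()
    if tool_name in by_tool:
        keys.append((f"tool:{tool_name}", str(by_tool[tool_name])))
    return keys

# Hypothetical config for the example.
config = {"by_user": "60/m", "by_tenant": "600/m", "by_tool": {"search": "10/m"}}
no_tenant = build_dimension_keys("alice", None, "Search ", config)
with_tenant = build_dimension_keys("alice", "team-a", "search", config)
```

A request without a tenant is then limited only by its user and tool dimensions.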

Gap 3 (MEDIUM) — Sliding window memory leak in sweep(): stale-but-non-empty timestamp lists were never evicted from _store because sweep() only removed empty lists. Fixed by embedding the window size in each store key ("{key}:{window}") so staleness is computable at sweep time, and rewriting sweep() to evict any key where all timestamps fall outside the window.
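
A minimal sketch of the eviction rule, assuming the store is a plain dict from `"{key}:{window}"` strings to timestamp lists and that the window is an integer number of seconds. The function name and return value are illustrative, not the plugin's actual signature.

```python
from typing import Dict, List

def sweep(store: Dict[str, List[float]], now: float) -> int:
    """Evict every key whose timestamps all fall outside its own window."""
    stale = []
    for store_key, timestamps in store.items():
        # The window size is the last ":"-separated field of the store key.
        _, window_str = store_key.rsplit(":", 1)
        cutoff = now - int(window_str)
        # all() over an empty list is True, so empty lists are evicted too.
        if all(ts <= cutoff for ts in timestamps):
            stale.append(store_key)
    for store_key in stale:
        del store[store_key]
    return len(stale)

store = {
    "user:alice:60": [0.0, 10.0],  # stale but non-empty: leaked before the fix
    "user:bob:60": [100.0],        # still inside the window
    "user:carol:60": [],           # empty: was already evicted before the fix
}
evicted = sweep(store, now=120.0)
```

Collecting stale keys first and deleting afterwards avoids mutating the dict while iterating over it.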

Rust execution engine

Gap 4 (HIGH) — Python hot path overhead: every request still paid Python-side rate evaluation costs, including per-dimension orchestration, repeated backend calls, and response metadata construction via individual PyO3 attribute accesses. Fixed by introducing a Rust RateLimiterEngine with a single check() call per hook invocation. The engine builds dimension keys internally from its pre-parsed config, evaluates all active dimensions in one batch, and returns pre-built Python dicts for HTTP headers and plugin metadata — eliminating ~18-25 PyO3 boundary crossings per request (depending on the number of active dimensions).

Gap 5 (HIGH) — Rust acceleration was limited to the in-memory backend: Redis-backed deployments, which are the correctness-critical path for multi-instance enforcement, still relied on the Python backend. Fixed by adding a Redis backend to the Rust engine. Rust now owns the Redis connection and executes batch Lua scripts directly, preserving shared-counter behavior across replicas.

Gap 6 (MEDIUM) — Rate-limit computation, response shaping, and plugin policy handling were still interleaved in the Python implementation. Fixed by making the Python wrapper policy-only: it validates config, normalizes context, and invokes the engine. Algorithm execution, backend dispatch, result aggregation, and response dict construction now all live in Rust. The Python hot path is reduced to a single check() call plus result dispatch.

Gap 7 (LOW) — The Rust+Redis path needed end-to-end proof under a real multi-instance deployment. Validated with the backend-correctness load test against 3 gateways behind nginx with Redis shared state. Result: 60 allowed, 60 rate-limited, 49.6% blocked, matching the expected ~50% shared-counter behavior.

Additional hardening

_parse_rate robustness — was rate.split("/") with no error handling; now uses maxsplit=1 + try/except with a clear message showing the bad input
prompt_pre_fetch normalisation — prompt_id now normalised with .strip().lower(), consistent with how tool_pre_invoke handles tool names
_normalised_by_tool pre-computed — by_tool key lowercasing was re-done on every hook call; now computed once at init in _validate_config
Unconditional tenant propagation — _propagate_tenant_id() copies request.state.team_id → global_context.tenant_id at every get_current_user() return path, not just inside _inject_userinfo_instate() which is gated by include_user_info (default False)
Sliding-window Retry-After >= 1 — int truncation of (oldest_ts + window - now) could produce 0 when the oldest timestamp plus the window rounded down to int(now), telling clients to retry immediately while still over limit; fixed on both memory and Redis paths with max(1, reset_in)
Token-bucket first-request memory/Redis parity — the memory path hard-coded time_to_full = window on first request, while Redis derived it from tokens_needed / refill_rate, causing metadata divergence between backends (e.g. 60 vs ~6 for a 10/m limit); both paths now use tokens_needed / refill_rate
Config pre-parsing at startup — rate strings are parsed once during plugin initialization so the request path does not re-parse limits on every hook invocation
Rolling-upgrade compatibility — the Rust Redis backend uses the same key format as the Python Redis backend so both implementations share counters correctly during mixed deployments
Async Redis execution path — Redis-backed Rust evaluation now uses a multiplexed async Redis connection and an async PyO3 boundary so Python async hooks can await the Redis path directly
Rust engine preferred with Python fallback — when the rate_limiter_rust PyO3 extension is installed, the Rust engine handles all rate evaluation; when unavailable or when RATE_LIMITER_FORCE_PYTHON=1 is set, the plugin falls back to the full Python algorithm and backend implementation
RATE_LIMITER_FORCE_PYTHON env var — allows operators to force the Python backend for A/B comparisons or debugging; Makefile targets benchmark-rate-limiter-capacity-rust and benchmark-rate-limiter-capacity-python exercise both paths
Reduced lock contention on memory hot path — the Rust MemoryStore now acquires one read lock on the outer map + one write lock on the per-key state per steady-state request (previously two outer read locks), eliminating a redundant lock cycle
Response construction in Rust — HTTP headers and plugin metadata dicts are built inside the Rust engine after GIL reacquisition, replacing ~18-25 individual PyO3 attribute accesses from the Python side (depending on dimension count)
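
Three of the items above (rate parsing, the Retry-After floor, and the token-bucket first-request value) reduce to small pieces of arithmetic. The sketch below is a hedged illustration: the function names, the supported units `s`/`m`/`h`, and the error message wording are assumptions, while `maxsplit=1`, `max(1, reset_in)`, and `tokens_needed / refill_rate` come from the descriptions above.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}  # assumed unit set

def parse_rate(rate):
    """Parse '<count>/<unit>' into (count, window_seconds), naming the bad input on failure."""
    try:
        count_str, unit = rate.split("/", maxsplit=1)
        return int(count_str), _UNIT_SECONDS[unit.strip().lower()]
    except (ValueError, KeyError, AttributeError) as exc:
        raise ValueError(f"invalid rate {rate!r}; expected '<count>/<s|m|h>'") from exc

def retry_after(oldest_ts, window, now):
    """Seconds until the oldest entry expires, never less than 1."""
    reset_in = int(oldest_ts + window - now)  # int() truncates 0.5 to 0
    return max(1, reset_in)

def time_to_full(tokens_needed, limit, window):
    """Seconds for a token bucket to refill `tokens_needed` tokens."""
    refill_rate = limit / window  # tokens per second
    return tokens_needed / refill_rate

limit, window = parse_rate("10/m")
first_request_reset = time_to_full(1, limit, window)  # one token spent on the first request
```

For a 10/m limit the refill rate is 10/60 tokens per second, so refilling the single token spent by the first request takes about 6 seconds rather than the full 60-second window.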

Architecture

The plugin now delegates all rate evaluation to Rust:

- Python owns plugin lifecycle, hook integration, config validation, and request-context extraction
- Rust owns dimension selection, algorithm execution, backend dispatch, result aggregation, and response dict construction
- Backend selection is an engine concern:
  - memory → Rust in-process store via the existing sync engine path
  - redis → Rust async Redis execution via a multiplexed connection + batch Lua execution
- When the rate_limiter_rust PyO3 extension is installed, Rust handles all rate evaluation; otherwise the plugin falls back to the Python backend
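
The preferred-engine-with-fallback rule can be sketched as a small selection function. This is an assumption-laden illustration: `select_engine` is a hypothetical helper, and the real plugin may perform the import and the environment check differently. Only the module name `rate_limiter_rust` and the `RATE_LIMITER_FORCE_PYTHON=1` switch come from the PR text.

```python
import importlib
import os

def select_engine(module_name="rate_limiter_rust", env=None):
    """Return 'rust' when the extension imports cleanly and is not overridden, else 'python'."""
    env = os.environ if env is None else env
    if env.get("RATE_LIMITER_FORCE_PYTHON") == "1":
        return "python"  # operator override for A/B comparisons or debugging
    try:
        importlib.import_module(module_name)
    except ImportError:
        return "python"  # extension not installed: use the Python implementation
    return "rust"

forced = select_engine(env={"RATE_LIMITER_FORCE_PYTHON": "1"})
missing = select_engine(module_name="definitely_not_installed_rl_ext", env={})
# "json" is only a stand-in for an importable module in this example.
available = select_engine(module_name="json", env={})
```

Passing `env` explicitly keeps the decision testable without mutating the process environment.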

Plugin internals: request flow, wrapper responsibilities, Rust engine, backends, and response shaping

┌──────────────────────────────────────────────────────────────────────┐
│                         RateLimiterPlugin                           │
│                                                                      │
│  Hooks:  tool_pre_invoke            prompt_pre_fetch                │
│          (tool name + context)      (prompt_id + context)           │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              │  Python responsibilities:
                              │  - validate config (at init)
                              │  - normalize by_tool keys (at init)
                              │  - extract user / tenant / tool|prompt
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     RustRateLimiterEngine wrapper                    │
│                        (PyO3 boundary)                               │
│                                                                      │
│  Python calls the Rust engine once per request:                     │
│  - check(user, tenant, tool, now_unix) for memory                   │
│  - await check_async(user, tenant, tool, now_unix) for redis        │
│                                                                      │
│  Returns: (allowed, headers_dict, meta_dict)                        │
│  Rust builds dimension keys, evaluates, and constructs response     │
│  dicts internally — no per-attribute PyO3 accesses needed.          │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        RateLimiterEngine (Rust)                      │
│                                                                      │
│  Owns:                                                               │
│  - parsed config (by_user, by_tenant, by_tool)                      │
│  - dimension key construction (user:X, tenant:Y, tool:Z)            │
│  - backend selection + dispatch                                      │
│  - algorithm execution                                               │
│  - per-dimension result calculation                                  │
│  - aggregated EvalResult (most restrictive dimension)               │
│  - HTTP header dict construction (X-RateLimit-*, Retry-After)       │
│  - plugin metadata dict construction (remaining, reset_in, dims)    │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
              ┌───────────────┴────────────────┐
              │                                │
              ▼                                ▼
┌───────────────────────────────┐   ┌──────────────────────────────────┐
│      Memory backend (Rust)    │   │       Redis backend (Rust)      │
│                               │   │                                  │
│  in-process counters          │   │  owns multiplexed Redis conn   │
│  fixed_window                 │   │  batch Lua execution            │
│  sliding_window               │   │  shared counters across replicas│
│  token_bucket                 │   │  compatible key format with     │
│  single read-lock hot path    │   │  Python Redis backend           │
└───────────────────────────────┘   └──────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     Python result dispatch                           │
│                                                                      │
│  allowed -> PromptPrehookResult / ToolPreInvokeResult + headers     │
│  blocked -> PluginViolation(429) + headers + meta                   │
│                                                                      │
│  (headers and meta are pre-built dicts from Rust — no conversion)   │
└──────────────────────────────────────────────────────────────────────┘
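
The aggregation and response-shaping step in the diagram can be approximated in Python as follows. This is a sketch under stated assumptions, not the Rust engine's logic: the `DimResult` type, the exact header names under the `X-RateLimit-*` prefix, and the tie-breaking rules (longest wait among blocked dimensions, lowest remaining otherwise) are all illustrative choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DimResult:
    name: str       # "user", "tenant", or "tool"
    allowed: bool
    limit: int
    remaining: int
    reset_in: int   # seconds until this dimension resets

def aggregate(results: List[DimResult]) -> Tuple[bool, Dict[str, str], Dict[str, object]]:
    """Collapse per-dimension results into (allowed, headers_dict, meta_dict)."""
    if not results:
        return True, {}, {}
    blocked = [r for r in results if not r.allowed]
    if blocked:
        # Assumption: report the blocked dimension the client must wait longest for.
        worst = max(blocked, key=lambda r: r.reset_in)
    else:
        # Assumption: otherwise report the dimension closest to its limit.
        worst = min(results, key=lambda r: r.remaining)
    allowed = not blocked
    headers = {
        "X-RateLimit-Limit": str(worst.limit),
        "X-RateLimit-Remaining": str(worst.remaining),
        "X-RateLimit-Reset": str(worst.reset_in),
    }
    if not allowed:
        headers["Retry-After"] = str(max(1, worst.reset_in))
    meta = {
        "remaining": worst.remaining,
        "reset_in": worst.reset_in,
        "dims": [r.name for r in results],
    }
    return allowed, headers, meta

blocked_case = aggregate([
    DimResult("user", True, 10, 4, 30),
    DimResult("tenant", False, 100, 0, 12),
    DimResult("tool", True, 5, 2, 45),
])
allowed_case = aggregate([
    DimResult("user", True, 10, 4, 30),
    DimResult("tool", True, 5, 2, 45),
])
```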

Test results

Validation for this PR is intentionally comprehensive: unit tests, Rust tests, micro-benchmarks, hook-path A/B comparisons, multi-instance Redis correctness, sustained load tests, algorithm comparison, and Fyre VM deployment validation are all included below.

The highest-signal correctness and unit-test summaries remain visible. The more detailed benchmark and deployment sections are preserved in expandables so the data stays in the PR without dominating the main narrative.

Test results summary

| # | Area | Headline result |
|---|------|-----------------|
| 1 | Backend correctness | `49.6%` blocked vs expected `~50%`, `0` infra failures |
| 2 | Rust micro-benchmarks | `17-21 ns` single-key, `49 ns` three-dimension |
| 3 | Hook-path comparison | `1.7x-1.9x` for most 3-dim memory paths, `28x` for `sliding_window` |
| 4 | Redis capacity benchmark | p99 `60 ms` vs `100 ms`, `28%` less gateway memory |
| 5 | Multi-user scale test | `47.6%` blocked, `0` infra failures across `28,650` requests |
| 6 | Algorithm comparison | All 3 algorithms at `~47-49%` blocked across `~28,600` requests each |
| 7 | Fyre VM validation | Rust engine validated on x86 Fyre |
| 8 | Unit test suite | `93` plugin tests passed, `47/47` Rust tests passed |

1. Backend correctness (multi-instance Redis)

Multi-instance Redis correctness matched the expected shared-counter behavior: 49.6% blocked, 0 infrastructure failures, and an exact 60 allowed / 60 rate-limited split in the single-user validation run.

Full backend correctness results

Topology: nginx → 3 gateways → shared Redis, 1 user at 60 req/min against a 30/min limit.

| Metric | Result |
|--------|--------|
| Total tool calls | `121` |
| Allowed | `60` |
| Rate-limited | `60` |
| Blocked percentage | `49.6%` |
| Infrastructure failures | `0` |
| Avg response time | `49.7 ms` |
| p90 / p99 | `76 ms` / `88 ms` |

Verdict: REDIS BACKEND — limit correctly enforced. The shared Redis counter produced the expected ~50% blocked rate across three gateway instances. Zero infrastructure failures.
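
Why a shared counter yields roughly 50% blocked, and why per-instance counters would not, can be shown with a toy simulation. Everything here is an assumption for illustration: a plain dict stands in for Redis, the load balancer is modelled as strict round-robin, the algorithm is a fixed window, and the run lasts exactly two windows. It is not the load test used in this PR.

```python
def simulate(shared, gateways=3, limit=30, window=60, duration=120):
    """Send one request per second (60 req/min) through round-robin gateways."""
    shared_store = {}
    # With shared=True every gateway points at the same counter store.
    stores = [shared_store if shared else {} for _ in range(gateways)]
    allowed = blocked = 0
    for t in range(duration):
        store = stores[t % gateways]   # round-robin dispatch
        bucket = t // window           # fixed-window bucket index
        count = store.get(bucket, 0)
        if count < limit:
            store[bucket] = count + 1
            allowed += 1
        else:
            blocked += 1
    return allowed, blocked

shared_result = simulate(shared=True)
local_result = simulate(shared=False)
print(shared_result, local_result)
```

With a shared store, 30 of every 60 requests per window are admitted. With per-instance stores, each gateway sees only 20 requests per window and never reaches the limit, so nothing is blocked.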

2. Rust micro-benchmarks (criterion, Apple Silicon M-series, release build)

Criterion benchmarks on Apple Silicon show nanosecond-scale in-process memory-backend costs: 17-21 ns for single-key operations, 49 ns for three-dimension evaluation, and 22-67 µs for the multi-threaded contention scenarios (2 to 8 threads) included here.

Full micro-benchmark results

In-process memory backend, direct MemoryStore calls (no PyO3 overhead):

| Benchmark | Median latency | Notes |
|-----------|----------------|-------|
| `fixed_window/single_key` | 19 ns | single dimension, window reset each iteration |
| `token_bucket/single_key` | 21 ns | |
| `sliding_window/single_key` | 17 ns | |
| `fixed_window/three_dims` | 49 ns | user + tenant + tool in one call |
| `fixed_window/hot_counter` | 16 ns | counter near limit (steady-state path) |
| `fixed_window/blocked_path` | 16 ns | counter past limit (reject path) |
| `fixed_window/many_keys_10k` | 185 ns | HashMap with 10,000 distinct keys |
| `fixed_window/concurrent_2t` | 22 µs | 2 threads via `std::thread::scope` |
| `fixed_window/concurrent_4t` | 35 µs | 4 threads |
| `fixed_window/concurrent_8t` | 67 µs | 8 threads |

3. Hook-path comparison (Python pytest-benchmark, memory backend)

Measured per-hook Python→Rust round-trip overhead via pytest-benchmark, comparing the Rust engine check() path against the older Python-only algorithm+backend stack. Numbers reflect the full hook path including PyO3 crossing, dict construction, and Python result dispatch.

Full hook-path comparison results

All benchmarks: fixed_window, sliding_window, token_bucket × single dimension, three dimensions.

| Config | Python mean | Rust mean | Speedup |
|--------|-------------|-----------|---------|
| `fixed_window` / 1 dim | 5.1 µs | 2.8 µs | 1.8× |
| `fixed_window` / 3 dim | 13.2 µs | 7.2 µs | 1.8× |
| `sliding_window` / 1 dim | 14.3 µs | 2.9 µs | 4.9× |
| `sliding_window` / 3 dim | 81.3 µs | 2.9 µs | 28× |
| `token_bucket` / 1 dim | 5.6 µs | 2.8 µs | 2.0× |
| `token_bucket` / 3 dim | 14.9 µs | 7.8 µs | 1.9× |

The sliding_window improvement is the most dramatic because the Python implementation walks and filters timestamp lists in pure Python on every check, while Rust uses a VecDeque with amortized O(1) front pops.
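
The cost difference can be illustrated by counting the work each approach does. The sketch below uses Python's `collections.deque` as an analogue of Rust's `VecDeque`; both functions, the 100-per-10-seconds limit, and the 20 req/s load are assumptions for the example rather than code from either implementation.

```python
from collections import deque

def check_filter(timestamps, limit, window, now):
    """List-based check: rescans every stored timestamp on each call."""
    scanned = len(timestamps)
    live = [ts for ts in timestamps if ts > now - window]
    allowed = len(live) < limit
    if allowed:
        live.append(now)
    return allowed, live, scanned

def check_deque(log, limit, window, now):
    """Deque-based check: pops expired entries from the front only."""
    popped = 0
    while log and log[0] <= now - window:
        log.popleft()
        popped += 1
    allowed = len(log) < limit
    if allowed:
        log.append(now)
    return allowed, popped

timestamps, log = [], deque()
total_scanned = total_popped = 0
decisions_match = True
for i in range(1000):
    now = i * 0.05  # 20 requests per second for 50 seconds
    a1, timestamps, scanned = check_filter(timestamps, 100, 10, now)
    a2, popped = check_deque(log, 100, 10, now)
    decisions_match = decisions_match and (a1 == a2)
    total_scanned += scanned
    total_popped += popped
print(decisions_match, total_scanned, total_popped)
```

Both versions make the same allow/deny decisions, but each timestamp is popped from the deque at most once over its lifetime, whereas the list version touches every live timestamp on every check.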

4–8. Remaining validation

Redis capacity benchmark, multi-user scale, algorithm comparison, Fyre VM, unit tests

4. Redis capacity benchmark — p99 latency 60 ms (vs 100 ms Python), 28% less gateway RSS. Three gateways, shared Redis, 100 users at 0.25 rps for 5 minutes.

5. Multi-user scale test — 47.6% blocked across 28,650 requests with 0 infrastructure failures. 100 concurrent users, 5-minute sustained load.

6. Algorithm comparison — All three algorithms (fixed_window, sliding_window, token_bucket) converged to ~47-49% blocked across ~28,600 requests each, confirming equivalent enforcement behavior.

7. Fyre VM validation — Rust engine validated on x86 Fyre VM deployment.

8. Unit test suite — 93 Python plugin tests passed, 47/47 Rust tests passed.

lucarlig

I found one issue to address before merge: the new tenant_id propagation still does not make by_tenant limits work for session-authenticated users.

lucarlig · 2026-03-26T15:32:09Z

https://docs.rs/pyo3-log/latest/pyo3_log/
Please add logging with pyo3_log.

Two optimizations informed by the rate limiter PR (#3809) patterns:

1. Batch list processing: for all-string lists in truncate mode, extract all &str borrows in one pass, process in a tight Rust loop, build output PyList in a single pass. Better cache locality and avoids per-item path string formatting and interleaved append calls.
2. Pre-sized String::with_capacity(): eliminate reallocation during truncation by pre-computing body + ellipsis size.

Results:
- Short list passthrough: 13.6x → 18.9x faster
- List 10x10KB: 2.6x → 3.0x faster
- Deep nested dict: 7.1x → 7.0x faster (stable)
- Wide nested dict: 8.4x → 8.5x faster (stable)
- 331 Python tests + 47 Rust tests pass

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

jonpspri · 2026-03-30T08:05:50Z

GTG

dima-zakharov

GitHub shows no new commits after the review.

…cale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

…and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

…ity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke, reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity) in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter, retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>

…e/ralph-loop.local.md - Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which was accidentally committed — this is a local Claude Code loop state file and should never have been checked in. - Fix trailing whitespace in plugins_rust/rate_limiter/python/ rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks. Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

Update .secrets.baseline after adding test_extra_sensitive_keywords in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains a fake credential string that triggers the Secret Keyword detector. All new entries are false positives (test data). Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

The baseline regeneration reset is_secret to null for entries whose line numbers shifted. Mark all 17 unaudited entries as is_secret=false (test data, example configs, fake credentials) to pass the --fail-on-unaudited pre-commit check. Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

brian-hussey

Approving as this has undergone several rounds of review and has now passed all CI checks.


…ngine, benchmarks, and validation (#3809) * feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test - Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket - Add Redis backend for shared cross-instance rate limiting - Fix tenant isolation: skip by_tenant when tenant_id is None - Fix sliding window: sweep expired timestamps before counting - Fix backend validation: restore _validate_config check - Fix token bucket memory path: apply max(1,...) guard to reset timestamp - Add Redis integration tests for all three algorithms - Add direct regression tests for get_current_user tenant_id fallback - Add scale load test with Redis memory timeline and live algorithm detection - Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection - Remove redundant algorithm locustfile; scale file is canonical - Correct stale comments and README limitations Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * feat(rate-limiter): add Rust-backed engine, check() API, benchmarks, and validation - Rust-backed sliding window engine with pyo3-log integration - check() API with tenant propagation, sweep/retry-after support - Eliminate redundant ZRANGE in sliding window Lua script - Fix detect-secrets baseline for rate limiter load tests - Clarify memory backend is single-instance only in docs Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * chore: regenerate detect-secrets baseline after rebase Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * refactor(rate-limiter): review fixes, Redis hardening, key-format parity tests - Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke, reducing each hook to a single-line wrapper - Elevate Redis val_i64/val_f64 parse-error logging from warn to error so silent fail-open degradation surfaces in operator dashboards - Clamp sliding-window reset_timestamp with .max(1) so it is always strictly in the future even when the oldest entry expires in 
< 1 s - Add 5 s tokio::time::timeout around Redis connection establishment to prevent indefinite blocking on network partition - Replace silent except-pass in EVALSHA SHA tracking with logger.debug - Document dual Lua-script invariant (rolling-upgrade key-format parity) in both Python RedisBackend docstring and Rust redis_backend.rs header - Add 7 parametrized test_redis_key_format_parity_* tests validating that Python and Rust produce identical Redis keys for the same inputs - Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter, retry_with_backoff, and secrets_detection Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: strip trailing whitespace in pyi stubs, remove accidental .claude/ralph-loop.local.md - Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which was accidentally committed — this is a local Claude Code loop state file and should never have been checked in. - Fix trailing whitespace in plugins_rust/rate_limiter/python/ rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks. Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * chore: regenerate detect-secrets baseline for new exfil test strings Update .secrets.baseline after adding test_extra_sensitive_keywords in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains a fake credential string that triggers the Secret Keyword detector. All new entries are false positives (test data). Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * chore: audit new detect-secrets baseline entries as false positives The baseline regeneration reset is_secret to null for entries whose line numbers shifted. Mark all 17 unaudited entries as is_secret=false (test data, example configs, fake credentials) to pass the --fail-on-unaudited pre-commit check. 
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> --------- Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Jonathan Springer <jps@s390x.com>


…3965)

* refactor(plugins): replace in-tree rate_limiter with cpex-rate-limiter package

Remove the in-tree rate_limiter plugin and replace it with the cpex-rate-limiter PyPI package, a compiled Rust extension providing the same RateLimiterPlugin class with additional algorithms (sliding-window, token-bucket) alongside the original fixed-window.

- Add cpex-rate-limiter>=0.0.2 as a [plugins] optional dependency
- Update Containerfile.lite to install the plugins extra
- Remove plugins/rate_limiter/ source directory
- Remove unit and integration tests that imported plugin internals
- Update all config files to use cpex_rate_limiter.RateLimiterPlugin
- Disable RateLimiterPlugin in test fixture config (package not available in unit test environment)
- Update documentation to reflect the external package

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* feat(rate-limiter): pluggable algorithms with Rust-backed execution engine, benchmarks, and validation (#3809)

* feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* feat(rate-limiter): add Rust-backed engine, check() API, benchmarks, and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline after rebase

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* refactor(rate-limiter): review fixes, Redis hardening, key-format parity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke, reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity) in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter, retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>

* fix: strip trailing whitespace in pyi stubs, remove accidental .claude/ralph-loop.local.md

- Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md, which was accidentally committed; this is a local Claude Code loop state file and should never have been checked in.
- Fix trailing whitespace in plugins_rust/rate_limiter/python/rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline for new exfil test strings

Update .secrets.baseline after adding test_extra_sensitive_keywords in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains a fake credential string that triggers the Secret Keyword detector. All new entries are false positives (test data).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: audit new detect-secrets baseline entries as false positives

The baseline regeneration reset is_secret to null for entries whose line numbers shifted. Mark all 17 unaudited entries as is_secret=false (test data, example configs, fake credentials) to pass the --fail-on-unaudited pre-commit check.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

---------

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* feat(discovery): add automatic tool discovery with hot/cold classification (#3839)

Implement automatic tool discovery for upstream MCP servers via usage-aware adaptive polling. The gateway can now continuously synchronise tool lists from registered servers without manual intervention.

Server classification (hot/cold):
- Classify servers based on MCP session pool usage patterns
- Hot servers (top 20% by recent usage): polled at 1x base interval
- Cold servers (remaining 80%): polled at 3x base interval
- Classification is deterministic: sorted by recency, active sessions, use count, and URL for tie-breaking
- Leader election via Redis with TTL renewal for multi-worker coordination
- Falls back to local-only operation without Redis

Integration with GatewayService:
- Health checks respect hot/cold classification intervals
- Auto-refresh of tools/resources/prompts respects classification
- Fail-open on classification errors (poll anyway)
- Poll timestamps tracked via Redis with TTL expiry
- Uses base gateway URL (pre-auth) for classification lookups to avoid leaking query-param auth secrets to Redis

Configuration:
- AUTO_REFRESH_SERVERS=true enables automatic tool sync (default: false)
- GATEWAY_AUTO_REFRESH_INTERVAL=300 sets base polling interval
- HOT_COLD_CLASSIFICATION_ENABLED=false (opt-in, requires Redis)

Includes comprehensive tests with 100% coverage on the new ServerClassificationService and integration tests for the GatewayService hot/cold polling paths.

Closes #3734

Signed-off-by: Lang-Akshay <akshay.shinde26@ibm.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* refactor(plugins): replace in-tree rate_limiter with cpex-rate-limiter package

Remove the in-tree rate_limiter plugin and replace it with the cpex-rate-limiter PyPI package, a compiled Rust extension providing the same RateLimiterPlugin class with additional algorithms (sliding-window, token-bucket) alongside the original fixed-window.

- Add cpex-rate-limiter>=0.0.2 as a [plugins] optional dependency
- Update Containerfile.lite to install the plugins extra
- Remove plugins/rate_limiter/ source directory
- Remove unit and integration tests that imported plugin internals
- Update all config files to use cpex_rate_limiter.RateLimiterPlugin
- Disable RateLimiterPlugin in test fixture config (package not available in unit test environment)
- Update documentation to reflect the external package

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* refactor(plugins): update build, CI, and docs for PyPI plugin migration

Remove all plugins_rust/ build infrastructure and update references across Containerfiles, Makefile, CI workflows, pre-commit configs, CODEOWNERS, and documentation to reflect that plugins are now distributed as PyPI packages (cpex-*) via the [plugins] optional extra.

- Remove Rust plugin builder stages from all Containerfiles
- Remove ~100 lines of rust-* plugin Makefile targets (keep mcp-runtime)
- Add --extra plugins to CI pytest workflow
- Add [plugins] extra to install-dev Makefile target
- Update tool_service.py import to use cpex_retry_with_backoff
- Update plugin kind paths in 7 doc files to cpex_pii_filter.*
- Clean up pre-commit, CODEOWNERS, MANIFEST.in, whitesource, .gitignore

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* fix(plugins): address PR review findings on PyPI plugin migration

Round 1 (blockers + high):
- Restore exclude-newer = "10 days" in pyproject.toml; replace stale langchain/requests pins with cpex-* per-package overrides anchored to 2026-04-09 so the plugins resolve newer than the global window
- Guard cpex_retry_with_backoff import in tool_service.py with try/except ImportError; falls back to (None, True) for the Python pipeline when the optional [plugins] extra is not installed
- Delete orphaned .github/workflows/rust-plugins.yml and the associated test cases in tests/unit/test_rust_plugins_workflow.py; drop the workflow card from docs/docs/architecture/explorer.html
- Delete orphaned docs/docs/using/plugins/rust-plugins.md and remove it from docs/docs/using/plugins/.pages mkdocs nav
- Harden docker-entrypoint.sh install_plugin_requirements: canonicalize /app and the resolved requirements path with readlink -f and require the path to live under /app/, log non-comment lines from the requirements file before pip runs, and skip cleanly on validation failure
- Delete PLUGIN-MIGRATION-PLAN.md (one-time planning doc)
- Add COPY plugins/requirements.txt to Containerfile.scratch (the layered Containerfile.lite already had it; the broad COPY . in Containerfile already includes it)

Round 2 (medium + low):
- Bump cpex-* version pin floors in pyproject.toml [plugins] to match resolved versions in uv.lock (cpex-rate-limiter>=0.0.3, cpex-encoded-exfil-detection>=0.2.0, cpex-pii-filter>=0.2.0, cpex-url-reputation>=0.1.1)
- Add Prerequisites section to tests/performance/PLUGIN_PROFILING.md documenting the [plugins] extra requirement
- Add Status: Partially superseded note to ADR-041 explaining that plugins_rust/ was removed when in-tree Rust plugins migrated to PyPI packages
- Document upgrade semantics in plugins/requirements.txt header (pip without --upgrade skips already-satisfied constraints)
- Add importlib.util.find_spec() precheck to tests/performance/test_plugins_performance.py main(); the script now skips cleanly with an actionable message if any of the five cpex packages referenced by the perf config are missing
- Rename tests/unit/test_rust_plugins_workflow.py to test_go_toolchain_pinning.py to match its remaining contents (Go workflow pin and Makefile toolchain assertion)

Follow-ups tracked in #4116 and IBM/cpex-plugins#21 for the longer-term tool_service.py refactor that will eliminate the cross-package import entirely.

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* revert: restore tests changes from PR #3965

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* fix(ci): align plugin tests with PyPI migration

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: remove legacy plugin test skip infrastructure

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: align packaged plugin tests with rust shims

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: cover retry policy import path in tool service

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* fix: harden cpex plugin migration paths

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: cover retry policy parser branches

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: cover plugin requirements entrypoint path

Signed-off-by: lucarlig <luca.carlig@ibm.com>

---------

Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Pratik Gandhi <gandhipratik203@gmail.com>
Co-authored-by: Lang-Akshay <akshay.shinde26@ibm.com>
Co-authored-by: lucarlig <luca.carlig@ibm.com>
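
Two of the sliding-window fixes named in the commit log ("sweep expired timestamps before counting" and "clamp sliding-window reset_timestamp with .max(1)") are easier to follow with a small example. The following is a minimal in-memory sketch in Python, not the plugin's actual code: the class name `SlidingWindowLimiter`, the `check()` signature, and the `(allowed, remaining, reset_in_seconds)` return shape are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Hypothetical single-process sliding-window limiter (illustration only)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        # One deque of request timestamps per key (e.g. per user or tenant).
        self._hits: Dict[str, Deque[float]] = {}

    def check(self, key: str, now: Optional[float] = None) -> Tuple[bool, int, int]:
        """Return (allowed, remaining, reset_in_seconds) for one request."""
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(key, deque())

        # Sweep expired timestamps BEFORE counting, so stale entries
        # cannot cause a false rejection.
        cutoff = now - self.window_s
        while hits and hits[0] <= cutoff:
            hits.popleft()

        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)

        # The window frees a slot when the oldest surviving entry expires.
        # Clamp to at least 1 second so the reported reset is never "now"
        # even when that entry expires in under a second.
        oldest = hits[0] if hits else now
        reset_in = max(1, int(oldest + self.window_s - now))
        remaining = max(0, self.limit - len(hits))
        return allowed, remaining, reset_in
```

With `limit=2` and `window_s=10.0`, a request at `now=9.5` that follows requests at `0.0` and `1.0` is rejected with a reset of 1 second rather than 0, which is the effect the clamp is meant to guarantee. A Redis-backed variant would need the same sweep-then-count order inside one atomic script to stay correct across instances.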

gandhipratik203 marked this pull request as ready for review March 23, 2026 21:32

gandhipratik203 requested review from araujof, crivetimihai, jonpspri, kevalmahajan, madhav165 and terylt as code owners March 23, 2026 21:32

gandhipratik203 marked this pull request as draft March 24, 2026 08:15

gandhipratik203 marked this pull request as ready for review March 24, 2026 13:10

gandhipratik203 requested review from dima-zakharov and lucarlig as code owners March 24, 2026 13:10

gandhipratik203 force-pushed the feat/rate-limiter-rust branch 4 times, most recently from d287cc3 to f48fc3a Compare March 24, 2026 15:51

lucarlig requested changes Mar 24, 2026

Comment thread mcpgateway/auth.py

gandhipratik203 marked this pull request as draft March 24, 2026 16:07

gandhipratik203 force-pushed the feat/rate-limiter-rust branch from 6b517ff to 60f07cb Compare March 24, 2026 19:35

gandhipratik203 marked this pull request as ready for review March 24, 2026 22:38

gandhipratik203 changed the title ~~feat(rate-limiter): add rust-backed execution engine and benchmarks~~ feat(rate-limiter): add Rust-backed execution engine and benchmarks Mar 25, 2026

lucarlig self-requested a review March 26, 2026 15:05

gandhipratik203 force-pushed the feat/rate-limiter-rust branch from d44cd1d to 5514d53 Compare March 27, 2026 08:06

gandhipratik203 added the enhancement (New feature or request), performance (Performance related items), plugins, wxo (wxo integration), and MUST (P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe) labels Mar 27, 2026

gandhipratik203 added this to the Release 1.0.0 milestone Mar 27, 2026

gandhipratik203 added the release-fix (Critical bugfix required for the release) label Mar 27, 2026

gandhipratik203 force-pushed the feat/rate-limiter-rust branch from a433395 to 08cd2ad Compare March 28, 2026 19:40

jonpspri self-assigned this Mar 30, 2026

dima-zakharov previously approved these changes Mar 30, 2026

jonpspri assigned crivetimihai and unassigned jonpspri Mar 30, 2026

lucarlig previously approved these changes Mar 30, 2026

lucarlig mentioned this pull request Apr 1, 2026

Add Pratik Gandhi attribution to rate limiter import IBM/cpex-plugins#7

Merged

gandhipratik203 and others added 4 commits April 2, 2026 15:25

chore: regenerate detect-secrets baseline after rebase

8e632ea

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

jonpspri dismissed stale reviews from lucarlig and dima-zakharov via 8617367 April 2, 2026 14:26

jonpspri force-pushed the feat/rate-limiter-rust branch from 8aa04e1 to 8617367 Compare April 2, 2026 14:26

gandhipratik203 added 3 commits April 2, 2026 16:15

brian-hussey approved these changes Apr 2, 2026

brian-hussey merged commit b24e6cb into main Apr 2, 2026
34 checks passed

brian-hussey deleted the feat/rate-limiter-rust branch April 2, 2026 15:51

This was referenced Apr 13, 2026

[FEATURE][PLUGINS]: Rust-backed rate limiter execution engine for hot-path acceleration #3864

Closed

[FEATURE][RUST]: Rust implementation for rate_limiter plugin (parity + fallback + benchmarks) #3231

Closed

brian-hussey merged 7 commits into main from feat/rate-limiter-rust

lucarlig commented Mar 26, 2026

jonpspri commented Mar 30, 2026

gandhipratik203 commented Mar 23, 2026 (edited by jonpspri)

Summary

Gaps closed

Python foundation

Rust execution engine

Additional hardening

Architecture

Test results

Test results summary

1. Backend correctness (multi-instance Redis)

2. Rust micro-benchmarks (criterion, Apple Silicon M-series, release build)

3. Hook-path comparison (Python pytest-benchmark, memory backend)

4–8. Remaining validation
