## Summary
The rate limiter plugin enforces limits correctly across multi-instance deployments with pluggable algorithms and Redis shared state. However, the entire hot path — dimension orchestration, rate evaluation, result aggregation, and response header/metadata construction — runs in Python on every request. This issue tracks introducing a Rust execution engine for the rate limiter hot path to reduce per-request overhead.
## Area Affected
- `plugins/rate_limiter/`
- `plugins_rust/rate_limiter/` (new)
- `tests/unit/mcpgateway/plugins/plugins/rate_limiter/`
- `tests/loadtest/locustfile_rate_limiter_*.py`
## Context / Rationale
Every hook invocation (`tool_pre_invoke`, `prompt_pre_fetch`) executes this sequence in Python:
- Extract user identity, tenant, tool name from context
- Build per-dimension check tuples (f-string key construction, dict lookups)
- Evaluate each dimension against the backend (asyncio.Lock per call for memory, Redis round-trip for redis)
- Aggregate results across dimensions (select most restrictive)
- Construct HTTP response headers (`X-RateLimit-*`, `Retry-After`) and plugin metadata dicts
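The per-dimension portion of that sequence can be sketched roughly as follows (illustrative names and a toy fixed-window counter, not the plugin's actual code — the real backend is pluggable per algorithm):

```python
import asyncio
import time

# Illustrative in-memory state for a fixed-window counter: key -> (window_start, count).
_state: dict[str, tuple[float, int]] = {}
_lock = asyncio.Lock()  # acquired once per dimension, as in the memory backend

async def check_dimension(key: str, limit: int, window_s: int) -> tuple[bool, int]:
    """Evaluate one dimension; returns (allowed, remaining)."""
    async with _lock:
        now = time.monotonic()
        start, count = _state.get(key, (now, 0))
        if now - start >= window_s:  # window expired: reset the counter
            start, count = now, 0
        count += 1
        _state[key] = (start, count)
        return count <= limit, max(limit - count, 0)

async def evaluate(user: str, tenant: str, tool: str) -> list[tuple[bool, int]]:
    # f-string key construction, then one backend call per configured dimension
    checks = [
        (f"user:{user}", 100, 60),
        (f"tenant:{tenant}", 1000, 60),
        (f"tool:{tool}", 50, 60),
    ]
    return [await check_dimension(key, limit, win) for key, limit, win in checks]

results = asyncio.run(evaluate("alice", "acme", "search"))
```

Each dimension pays for an `asyncio.Lock` acquisition and interpreter-level dict/string work, which is the overhead this issue targets.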
For multi-dimension configs (user + tenant + tool), the Python path makes 3 separate backend calls, each acquiring an asyncio.Lock (memory) or executing a Redis script (redis). Response construction involves multiple dict allocations per request.
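The aggregation and response-construction step amounts to selecting the most restrictive dimension and materializing header dicts per request. A minimal sketch (the result-dict shape is assumed for illustration; the header names follow the `X-RateLimit-*` convention noted above):

```python
def build_response(results: list[dict]) -> dict:
    """Pick the most restrictive dimension and build HTTP rate-limit headers.

    Each result is assumed to look like {"limit": int, "remaining": int, "reset_s": int}.
    """
    most_restrictive = min(results, key=lambda r: r["remaining"])
    headers = {
        "X-RateLimit-Limit": str(most_restrictive["limit"]),
        "X-RateLimit-Remaining": str(most_restrictive["remaining"]),
        "X-RateLimit-Reset": str(most_restrictive["reset_s"]),
    }
    if most_restrictive["remaining"] <= 0:
        headers["Retry-After"] = str(most_restrictive["reset_s"])
    return headers

headers = build_response([
    {"limit": 100, "remaining": 42, "reset_s": 30},
    {"limit": 50, "remaining": 3, "reset_s": 12},
])
```

Every call allocates fresh dicts and converts integers to strings — cheap individually, but it runs on every hook invocation.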
| # | Gap | Impact |
|---|-----|--------|
| 1 | Rate evaluation runs entirely in Python | Per-request CPU cost includes interpreter overhead, asyncio scheduling, and GIL contention under concurrent load |
| 2 | Multi-dimension evaluation requires N separate backend calls | 3 `asyncio.Lock` acquisitions or 3 Redis round-trips per request when user + tenant + tool are all configured |
| 3 | Response header/metadata dicts built in Python per request | Multiple dict allocations and string conversions on every hook call |
| 4 | No way to compare Python vs accelerated path | Cannot measure improvement without code changes and container rebuilds |
| 5 | Load test response classification conflates rate-limited responses with infrastructure errors | Automated correctness verdicts are unreliable |
| 6 | No multi-dimension or per-algorithm parity tests | Correctness coverage limited to single-dimension single-algorithm sequences |
Opportunity: A Rust execution engine can batch all dimensions in a single call, use lock-free-friendly concurrency primitives for in-memory state, build response dicts directly, and release the Python GIL during computation. The Python wrapper becomes a thin dispatch layer: extract context, call Rust once, return the result.
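The resulting Python wrapper could look roughly like this (the module name `rate_limiter_rust` and its `check_all` signature are hypothetical placeholders for the proposed PyO3 extension; the fallback stands in for the existing Python path):

```python
try:
    import rate_limiter_rust  # hypothetical PyO3 extension; batches all dimensions
except ImportError:
    rate_limiter_rust = None  # accelerated path unavailable; use the Python path

def _python_fallback(dimensions: list[tuple[str, int, int]]) -> dict:
    # Stand-in for the existing Python implementation: always allows in this sketch.
    return {"allowed": True, "headers": {"X-RateLimit-Limit": str(dimensions[0][1])}}

def check(user: str, tenant: str, tool: str) -> dict:
    """Thin dispatch layer: extract context, call the engine once, return the result."""
    dimensions = [
        (f"user:{user}", 100, 60),
        (f"tenant:{tenant}", 1000, 60),
        (f"tool:{tool}", 50, 60),
    ]
    if rate_limiter_rust is not None:
        # One FFI call evaluates every dimension and builds the response
        # dicts on the Rust side, releasing the GIL during computation.
        return rate_limiter_rust.check_all(dimensions)
    return _python_fallback(dimensions)
```

Keeping a Python fallback behind the same entry point also addresses gap 4: the two paths can be compared without code changes or container rebuilds.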
## Acceptance Criteria
### Correctness
### Performance
### Load Testing
### Documentation
### Hardening
### CI
- `cargo fmt`, `cargo clippy -D warnings`, and `cargo test` pass
## Additional Context
This is a performance optimization for the rate limiter plugin. The current Python implementation is functionally correct — this issue is about reducing per-request overhead by moving the hot path into Rust while preserving the existing plugin integration model.
Sub-issue of #3735.