[FEATURE][PLUGINS]: Rust-backed rate limiter execution engine for hot-path acceleration #3864

@gandhipratik203

Description

Summary

The rate limiter plugin enforces limits correctly across multi-instance deployments with pluggable algorithms and Redis shared state. However, the entire hot path — dimension orchestration, rate evaluation, result aggregation, and response header/metadata construction — runs in Python on every request. This issue tracks introducing a Rust execution engine for the rate limiter hot path to reduce per-request overhead.


Area Affected

  • plugins/rate_limiter/
  • plugins_rust/rate_limiter/ (new)
  • tests/unit/mcpgateway/plugins/plugins/rate_limiter/
  • tests/loadtest/locustfile_rate_limiter_*.py

Context / Rationale

Every hook invocation (tool_pre_invoke, prompt_pre_fetch) executes this sequence in Python:

  1. Extract user identity, tenant, tool name from context
  2. Build per-dimension check tuples (f-string key construction, dict lookups)
  3. Evaluate each dimension against the backend (asyncio.Lock per call for memory, Redis round-trip for redis)
  4. Aggregate results across dimensions (select most restrictive)
  5. Construct HTTP response headers (X-RateLimit-*, Retry-After) and plugin metadata dicts

For multi-dimension configs (user + tenant + tool), the Python path makes 3 separate backend calls, each acquiring an asyncio.Lock (memory) or executing a Redis script (redis). Response construction involves multiple dict allocations per request.
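The per-request cost of that sequence can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the backend class, key format, and result dict shape are all assumptions; only the per-dimension asyncio.Lock acquisition and most-restrictive aggregation mirror the behavior described above.

```python
import asyncio
import time

class MemoryFixedWindow:
    """Hypothetical in-memory fixed-window backend: a single Lock-guarded
    counter table, acquired once per dimension check."""
    def __init__(self):
        self._lock = asyncio.Lock()
        self._counters = {}  # key -> (window_start, count)

    async def check(self, key: str, limit: int, window_s: int):
        async with self._lock:  # one acquisition per dimension, per request
            now = time.monotonic()
            start, count = self._counters.get(key, (now, 0))
            if now - start >= window_s:
                start, count = now, 0
            count += 1
            self._counters[key] = (start, count)
            return {"allowed": count <= limit, "remaining": max(0, limit - count)}

async def evaluate(backend, dims, limit=100, window_s=60):
    # Steps 2-4: f-string key construction, one backend call per dimension,
    # then keep the most restrictive result (blocked beats allowed,
    # lowest remaining wins among allowed).
    results = [await backend.check(f"rl:{name}:{value}", limit, window_s)
               for name, value in dims]
    return min(results, key=lambda r: (r["allowed"], r["remaining"]))

backend = MemoryFixedWindow()
verdict = asyncio.run(evaluate(
    backend, [("user", "alice"), ("tenant", "acme"), ("tool", "search")]))
headers = {  # Step 5: response headers rebuilt on every hook call
    "X-RateLimit-Remaining": str(verdict["remaining"]),
}
```

A 3-dimension config pays three lock round-trips plus the dict and string allocations in steps 2 and 5 on every request, which is exactly the overhead the Rust engine targets.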

| # | Gap | Impact |
|---|-----|--------|
| 1 | Rate evaluation runs entirely in Python | Per-request CPU cost includes interpreter overhead, asyncio scheduling, and GIL contention under concurrent load |
| 2 | Multi-dimension evaluation requires N separate backend calls | 3 asyncio.Lock acquisitions or 3 Redis round-trips per request when user + tenant + tool are all configured |
| 3 | Response header/metadata dicts built in Python per request | Multiple dict allocations and string conversions on every hook call |
| 4 | No way to compare Python vs. accelerated path | Cannot measure improvement without code changes and container rebuilds |
| 5 | Load test response classification conflates rate-limited responses with infrastructure errors | Automated correctness verdicts are unreliable |
| 6 | No multi-dimension or per-algorithm parity tests | Correctness coverage limited to single-dimension, single-algorithm sequences |

Opportunity: A Rust execution engine can batch all dimensions in a single call, use lock-free-friendly concurrency primitives for in-memory state, build response dicts directly, and release the Python GIL during computation. The Python wrapper becomes a thin dispatch layer: extract context, call Rust once, return the result.
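The thin-dispatch shape could look like the sketch below. Every name here (`rate_limiter_rs`, `RateLimiterEngine`, `check_batch`) is a hypothetical placeholder, not the actual module or plugin API; the point is the structure: one batched FFI call when the engine is present, and the preserved Python path otherwise.

```python
import asyncio

# Hypothetical pyo3-backed extension module; the import guard doubles as
# the fallback switch when the Rust engine is unavailable.
try:
    from rate_limiter_rs import RateLimiterEngine
    _HAVE_RUST = True
except ImportError:
    _HAVE_RUST = False

class RateLimiterPlugin:
    def __init__(self, config, python_impl):
        self._python = python_impl  # preserved, independently testable fallback
        self._engine = RateLimiterEngine(config) if _HAVE_RUST else None

    async def tool_pre_invoke(self, context):
        dims = [("user", context["user"]), ("tenant", context["tenant"]),
                ("tool", context["tool"])]
        if self._engine is not None:
            # One call batches all dimensions; the engine can release the
            # GIL while it evaluates and build the result dict itself.
            return self._engine.check_batch(dims)
        return await self._python.check(dims)

class _PythonFallback:
    """Stub standing in for the existing Python implementation."""
    async def check(self, dims):
        return {"allowed": True, "checked": len(dims)}

plugin = RateLimiterPlugin(config={}, python_impl=_PythonFallback())
result = asyncio.run(plugin.tool_pre_invoke(
    {"user": "alice", "tenant": "acme", "tool": "search"}))
```

Keeping the switch at construction time (rather than per request) means the hot path pays no branching or import cost once the engine is selected.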


Acceptance Criteria

Correctness

  • All existing rate limiter unit tests pass (no regressions)
  • Rust and Python paths produce identical allow/block sequences for all 3 algorithms
  • Multi-dimension parity verified (user + tenant + tool evaluated together)
  • Remaining-count parity verified for all 3 algorithms (not just fixed_window)
  • Fail-open behavior preserved — engine errors must not block requests
  • Rolling-upgrade safety — Rust and Python Redis backends must share counters correctly
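The fail-open requirement above reduces to a small invariant worth pinning down: an engine failure yields an allow verdict, never a block. A minimal sketch of the assumed semantics (the verdict fields are illustrative, not the plugin's actual schema):

```python
def check_with_fail_open(engine_check, key):
    """Wrap an engine call so limiter failures never block requests."""
    try:
        return engine_check(key)
    except Exception as exc:
        # Log only the failure class -- per the hardening criteria,
        # no user or tenant identifiers may appear in engine-path logs.
        return {"allowed": True, "degraded": True, "error": type(exc).__name__}

def broken_engine(key):
    raise ConnectionError("backend unreachable")

verdict = check_with_fail_open(broken_engine, "rl:user:alice")
```

A unit test asserting this invariant against a deliberately failing engine would cover the "engine errors must not block requests" criterion directly.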

Performance

  • Hook-path latency measured for both implementations across all 3 algorithms and both backends
  • Multi-dimension (3-dim) latency measured separately from single-dimension
  • Concurrent throughput measured under async load
  • Criterion micro-benchmarks covering realistic access patterns (hot-counter, blocked-path, many-keys, contention)
  • Results documented in the PR description with methodology

Load Testing

  • Backend correctness validated on multi-instance deployment (nginx + 3 gateways + Redis)
  • Response classification correctly distinguishes rate-limited from infrastructure errors
  • Redis capacity benchmark supports automated comparison between implementations
  • Scale test validates rate-limiting accuracy under sustained multi-user load
  • Makefile targets for reproducible A/B comparison
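The response-classification criterion amounts to treating HTTP 429 as a correct rate-limit verdict rather than a failure, while transport errors and 5xx responses count as infrastructure problems. A sketch of that bucketing (status-code boundaries and bucket names are assumptions):

```python
def classify(status_code=None, transport_error=False):
    """Bucket a load-test response for automated correctness verdicts."""
    if transport_error:
        return "infra_error"    # connection refused, timeout, etc.
    if status_code == 429:
        return "rate_limited"   # expected limiter behavior, not a failure
    if 200 <= status_code < 300:
        return "allowed"
    if status_code >= 500:
        return "infra_error"    # gateway or backend fault
    return "unexpected"

counts = {}
for code, err in [(200, False), (429, False), (502, False), (None, True)]:
    bucket = classify(code, err)
    counts[bucket] = counts.get(bucket, 0) + 1
```

With this split, an A/B run can compute rate-limiting accuracy from `allowed` vs. `rate_limited` alone, without infrastructure noise polluting the verdict.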

Documentation

  • Plugin README updated with Rust engine details and backend trade-offs
  • Configuration reference updated (new env var, engine selection behavior)
  • Benchmark methodology and results documented in PR description

Hardening

  • Config validation at startup — malformed rates and invalid algorithms rejected at init, not at request time
  • Fail-open behavior verified and documented — engine errors do not block requests
  • No user or tenant identifiers in log output from the engine path
  • Python fallback path preserved and independently testable when Rust engine is unavailable
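Startup config validation can be as simple as a fail-fast parse at plugin init. The rate-string format and algorithm names below are assumptions for illustration (the issue confirms only that fixed_window is one of the three algorithms):

```python
import re

_RATE_RE = re.compile(r"^(\d+)/(second|minute|hour|day)$")
_ALGORITHMS = {"fixed_window", "sliding_window", "token_bucket"}  # assumed set

def validate_config(rate: str, algorithm: str):
    """Reject malformed rates and unknown algorithms at init time,
    so errors surface at startup rather than per request."""
    m = _RATE_RE.match(rate)
    if not m:
        raise ValueError(f"malformed rate {rate!r}; expected e.g. '100/minute'")
    if algorithm not in _ALGORITHMS:
        raise ValueError(f"unknown algorithm {algorithm!r}")
    return int(m.group(1)), m.group(2)

limit, period = validate_config("100/minute", "token_bucket")
try:
    validate_config("lots", "fixed_window")
    rejected = False
except ValueError:
    rejected = True
```

Doing this once at init also lets the Rust engine receive pre-validated, typed config, keeping parse logic out of the hot path entirely.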

CI

  • All CI jobs pass (linting, tests, security scan, builds)
  • Rust code passes cargo fmt, cargo clippy -D warnings, and cargo test

Additional Context

This is a performance optimization for the rate limiter plugin. The current Python implementation is functionally correct — this issue is about reducing per-request overhead by moving the hot path into Rust while preserving the existing plugin integration model.


Sub-issue of #3735.

Metadata

Labels

  • MUST
  • P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe
  • enhancement: New feature or request
  • performance: Performance related items
  • plugins
  • ready: Validated, ready-to-work-on items
  • release-fix: Critical bugfix required for the release
  • rust: Rust programming
  • wxo: wxo integration
