## Summary
The rate limiter plugin enforces limits correctly across multi-instance deployments with pluggable algorithms and Redis shared state. However, the entire hot path — dimension orchestration, rate evaluation, result aggregation, and response header/metadata construction — runs in Python on every request. This issue tracks introducing a Rust execution engine for the rate limiter hot path to reduce per-request overhead.
## Area Affected
- `plugins/rate_limiter/`
- `plugins_rust/rate_limiter/` (new)
- `tests/unit/mcpgateway/plugins/plugins/rate_limiter/`
- `tests/loadtest/locustfile_rate_limiter_*.py`
## Context / Rationale
Every hook invocation (`tool_pre_invoke`, `prompt_pre_fetch`) executes this sequence in Python:
- Extract user identity, tenant, tool name from context
- Build per-dimension check tuples (f-string key construction, dict lookups)
- Evaluate each dimension against the backend (asyncio.Lock per call for memory, Redis round-trip for redis)
- Aggregate results across dimensions (select most restrictive)
- Construct HTTP response headers (`X-RateLimit-*`, `Retry-After`) and plugin metadata dicts
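The per-dimension portion of that sequence can be sketched roughly as follows (illustrative names and a toy fixed-window counter, not the plugin's actual code — the real backend is pluggable per algorithm):

```python
import asyncio
import time

# Illustrative in-memory state for a fixed-window counter: key -> (window_start, count).
_state: dict[str, tuple[float, int]] = {}
_lock = asyncio.Lock()  # acquired once per dimension, as in the memory backend

async def check_dimension(key: str, limit: int, window_s: int) -> tuple[bool, int]:
    """Evaluate one dimension; returns (allowed, remaining)."""
    async with _lock:
        now = time.monotonic()
        start, count = _state.get(key, (now, 0))
        if now - start >= window_s:  # window expired: reset the counter
            start, count = now, 0
        count += 1
        _state[key] = (start, count)
        return count <= limit, max(limit - count, 0)

async def evaluate(user: str, tenant: str, tool: str) -> list[tuple[bool, int]]:
    # f-string key construction, then one backend call per configured dimension
    checks = [
        (f"user:{user}", 100, 60),
        (f"tenant:{tenant}", 1000, 60),
        (f"tool:{tool}", 50, 60),
    ]
    return [await check_dimension(key, limit, win) for key, limit, win in checks]

results = asyncio.run(evaluate("alice", "acme", "search"))
```

Each dimension pays for an `asyncio.Lock` acquisition and interpreter-level dict/string work, which is the overhead this issue targets.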
For multi-dimension configs (user + tenant + tool), the Python path makes 3 separate backend calls, each acquiring an asyncio.Lock (memory) or executing a Redis script (redis). Response construction involves multiple dict allocations per request.
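The aggregation and response-construction step amounts to selecting the most restrictive dimension and materializing header dicts per request. A minimal sketch (the result-dict shape is assumed for illustration; the header names follow the `X-RateLimit-*` convention noted above):

```python
def build_response(results: list[dict]) -> dict:
    """Pick the most restrictive dimension and build HTTP rate-limit headers.

    Each result is assumed to look like {"limit": int, "remaining": int, "reset_s": int}.
    """
    most_restrictive = min(results, key=lambda r: r["remaining"])
    headers = {
        "X-RateLimit-Limit": str(most_restrictive["limit"]),
        "X-RateLimit-Remaining": str(most_restrictive["remaining"]),
        "X-RateLimit-Reset": str(most_restrictive["reset_s"]),
    }
    if most_restrictive["remaining"] <= 0:
        headers["Retry-After"] = str(most_restrictive["reset_s"])
    return headers

headers = build_response([
    {"limit": 100, "remaining": 42, "reset_s": 30},
    {"limit": 50, "remaining": 3, "reset_s": 12},
])
```

Every call allocates fresh dicts and converts integers to strings — cheap individually, but it runs on every hook invocation.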
| # | Gap | Impact |
|---|-----|--------|
| 1 | Rate evaluation runs entirely in Python | Per-request CPU cost includes interpreter overhead, asyncio scheduling, and GIL contention under concurrent load |
| 2 | Multi-dimension evaluation requires N separate backend calls | 3 `asyncio.Lock` acquisitions or 3 Redis round-trips per request when user + tenant + tool are all configured |
| 3 | Response header/metadata dicts built in Python per request | Multiple dict allocations and string conversions on every hook call |
| 4 | No way to compare Python vs accelerated path | Cannot measure improvement without code changes and container rebuilds |
| 5 | Load test response classification conflates rate-limited responses with infrastructure errors | Automated correctness verdicts are unreliable |
| 6 | No multi-dimension or per-algorithm parity tests | Correctness coverage limited to single-dimension single-algorithm sequences |
Opportunity: A Rust execution engine can batch all dimensions in a single call, use lock-free-friendly concurrency primitives for in-memory state, build response dicts directly, and release the Python GIL during computation. The Python wrapper becomes a thin dispatch layer: extract context, call Rust once, return the result.
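The resulting Python wrapper could look roughly like this (the module name `rate_limiter_rust` and its `check_all` signature are hypothetical placeholders for the proposed PyO3 extension; the fallback stands in for the existing Python path):

```python
try:
    import rate_limiter_rust  # hypothetical PyO3 extension; batches all dimensions
except ImportError:
    rate_limiter_rust = None  # accelerated path unavailable; use the Python path

def _python_fallback(dimensions: list[tuple[str, int, int]]) -> dict:
    # Stand-in for the existing Python implementation: always allows in this sketch.
    return {"allowed": True, "headers": {"X-RateLimit-Limit": str(dimensions[0][1])}}

def check(user: str, tenant: str, tool: str) -> dict:
    """Thin dispatch layer: extract context, call the engine once, return the result."""
    dimensions = [
        (f"user:{user}", 100, 60),
        (f"tenant:{tenant}", 1000, 60),
        (f"tool:{tool}", 50, 60),
    ]
    if rate_limiter_rust is not None:
        # One FFI call evaluates every dimension and builds the response
        # dicts on the Rust side, releasing the GIL during computation.
        return rate_limiter_rust.check_all(dimensions)
    return _python_fallback(dimensions)
```

Keeping a Python fallback behind the same entry point also addresses gap 4: the two paths can be compared without code changes or container rebuilds.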
## Acceptance Criteria
### Correctness
### Performance
### Load Testing
### Documentation
### Hardening
### CI
- `cargo fmt`, `cargo clippy -D warnings`, and `cargo test` pass
## Additional Context
This is a performance optimization for the rate limiter plugin. The current Python implementation is functionally correct — this issue is about reducing per-request overhead by moving the hot path into Rust while preserving the existing plugin integration model.
Sub-issue of #3735.