feat(rate-limiter): pluggable algorithms with Rust-backed execution engine, benchmarks, and validation#3809

Merged: brian-hussey merged 7 commits into main from feat/rate-limiter-rust on Apr 2, 2026
@gandhipratik203 (Collaborator) commented Mar 23, 2026

Summary

Consolidates and extends the rate limiter plugin with a Rust-backed execution engine, carrying forward the Python-side foundation work from #3783 (pluggable algorithms, tenant isolation, correctness fixes) and replacing the Python algorithm/backend implementation with a high-performance Rust engine for both memory and redis backends.

The Rust engine is the preferred execution path — all algorithm execution, backend dispatch, result aggregation, and response construction live in Rust when the rate_limiter_rust PyO3 extension is installed. Python retains ownership of plugin lifecycle, hook integration, request-context extraction, and config validation. The full Python algorithm and backend implementation is retained as a fallback when the Rust extension is unavailable or when RATE_LIMITER_FORCE_PYTHON=1 is set.

The Rust engine exposes a high-level check() API that reduces the Python-Rust boundary to a single call returning pre-built response dicts. This keeps the existing plugin integration model intact while reducing request-path overhead and preserving shared-counter semantics for multi-instance deployments.


Gaps closed

Python foundation

Gap 1 (HIGH) — No algorithm choice: only fixed_window was available — no way to handle boundary bursts or bursty workloads. Fixed by introducing a strategy pattern with three selectable algorithms (fixed_window, sliding_window, token_bucket) configurable via the algorithm: field. All three run on both memory and redis backends; the Redis backend was extended with atomic Lua scripts for sliding_window and token_bucket to maintain cluster-wide enforcement across instances.

test_fixed_window_burst_at_boundary (xfail) documents the boundary burst as a known trade-off for users who stay on fixed_window; sliding_window eliminates it entirely.
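
The boundary burst can be reproduced with a few lines of Python. The sketch below is illustrative only and is not the plugin's implementation: the class names, the `allow()` signature, and the explicit `now` argument are assumptions made so the example is deterministic. It sends 10 requests just before a window boundary and 10 just after, against a limit of 10 per 60 seconds.

```python
from collections import defaultdict


class FixedWindow:
    """Counts requests per aligned window [k*window, (k+1)*window)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._counts = defaultdict(int)

    def allow(self, key, now):
        bucket = (key, int(now // self.window))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


class SlidingWindow:
    """Keeps timestamps and counts those inside the trailing window."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._stamps = defaultdict(list)

    def allow(self, key, now):
        # Keep only timestamps newer than (now - window).
        live = [t for t in self._stamps[key] if t > now - self.window]
        allowed = len(live) < self.limit
        if allowed:
            live.append(now)
        self._stamps[key] = live
        return allowed


def burst_allowed(limiter):
    """Send 10 requests at t=59 and 10 at t=60; return how many pass."""
    times = [59.0] * 10 + [60.0] * 10
    return sum(limiter.allow("user:alice", t) for t in times)


fixed_total = burst_allowed(FixedWindow(limit=10, window=60))
sliding_total = burst_allowed(SlidingWindow(limit=10, window=60))
print(fixed_total, sliding_total)  # 20 10
```

The fixed window admits 20 requests within about one second because the counter resets at t=60, while the sliding window still sees the 10 requests from t=59 and rejects the second batch.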

Gap 2 (HIGH) — by_tenant cross-throttle: tenant_id=None fell back to a shared "tenant:default" bucket, causing unrelated users to throttle each other across the entire deployment. Fixed in two places: (1) the plugin now skips the by_tenant check entirely when tenant_id is None instead of inventing a phantom bucket; (2) mcpgateway/auth.py now propagates request.state.team_id → global_context.tenant_id unconditionally via _propagate_tenant_id(), called at all four return paths in get_current_user(), so single-team API tokens correctly populate tenant context for rate limiting regardless of the include_user_info setting.
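
A minimal sketch of the "skip instead of default" rule, assuming a hypothetical helper named `build_dimension_keys`. The `user:X`, `tenant:Y`, `tool:Z` key shapes follow the architecture diagram in this description; skipping a missing user or tool in the same way is an assumption of this sketch, since the PR only states the behaviour for `tenant_id`.

```python
from typing import List, Optional


def build_dimension_keys(
    user: Optional[str], tenant_id: Optional[str], tool: Optional[str]
) -> List[str]:
    """Return the counter keys to evaluate for one request.

    A dimension with no identity is skipped rather than mapped to a shared
    placeholder such as "tenant:default".
    """
    keys = []
    if user is not None:
        keys.append(f"user:{user}")
    if tenant_id is not None:
        keys.append(f"tenant:{tenant_id}")
    if tool is not None:
        # Tool names are normalised the same way on every call.
        keys.append(f"tool:{tool.strip().lower()}")
    return keys


no_tenant = build_dimension_keys("alice", None, " Search ")
with_tenant = build_dimension_keys("alice", "team-7", "search")
print(no_tenant)    # ['user:alice', 'tool:search']
print(with_tenant)  # ['user:alice', 'tenant:team-7', 'tool:search']
```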

Gap 3 (MEDIUM) — Sliding window memory leak in sweep(): stale-but-non-empty timestamp lists were never evicted from _store because sweep() only removed empty lists. Fixed by embedding the window size in each store key ("{key}:{window}") so staleness is computable at sweep time, and rewriting sweep() to evict any key where all timestamps fall outside the window.
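
The eviction rule can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the store is a plain dict, keys are assumed to end in `:<window_seconds>` as in the `"{key}:{window}"` format above, and the function signature is hypothetical.

```python
def sweep(store, now):
    """Evict every key whose timestamps all fall outside its window.

    The window is recovered from the key suffix, so no side table is needed.
    Returns the number of evicted keys.
    """
    stale = []
    for store_key, stamps in store.items():
        window = int(store_key.rsplit(":", 1)[1])
        # all() is True for an empty list, so empty lists are evicted too.
        if all(t <= now - window for t in stamps):
            stale.append(store_key)
    for store_key in stale:
        del store[store_key]
    return len(stale)


store = {
    "user:alice:60": [10.0, 20.0],  # non-empty, but every entry is expired
    "user:bob:60": [10.0, 95.0],    # one timestamp still inside the window
    "tool:search:60": [],           # empty list
}
evicted = sweep(store, now=100.0)
print(evicted, sorted(store))  # 2 ['user:bob:60']
```

The first entry is the leak case described above: a sweep that only removes empty lists would keep it forever.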

Rust execution engine

Gap 4 (HIGH) — Python hot path overhead: every request still paid Python-side rate evaluation costs, including per-dimension orchestration, repeated backend calls, and response metadata construction via individual PyO3 attribute accesses. Fixed by introducing a Rust RateLimiterEngine with a single check() call per hook invocation. The engine builds dimension keys internally from its pre-parsed config, evaluates all active dimensions in one batch, and returns pre-built Python dicts for HTTP headers and plugin metadata — eliminating ~18-25 PyO3 boundary crossings per request (depending on the number of active dimensions).

Gap 5 (HIGH) — Rust acceleration was limited to the in-memory backend: Redis-backed deployments, which are the correctness-critical path for multi-instance enforcement, still relied on the Python backend. Fixed by adding a Redis backend to the Rust engine. Rust now owns the Redis connection and executes batch Lua scripts directly, preserving shared-counter behavior across replicas.

Gap 6 (MEDIUM) — Rate-limit computation, response shaping, and plugin policy handling were still interleaved in the Python implementation. Fixed by making the Python wrapper policy-only: it validates config, normalizes context, and invokes the engine. Algorithm execution, backend dispatch, result aggregation, and response dict construction now all live in Rust. The Python hot path is reduced to a single check() call plus result dispatch.

Gap 7 (LOW) — The Rust+Redis path needed end-to-end proof under a real multi-instance deployment. Validated with the backend-correctness load test against 3 gateways behind nginx with Redis shared state. Result: 60 allowed, 60 rate-limited, 49.6% blocked, matching the expected ~50% shared-counter behavior.


Additional hardening

  • _parse_rate robustness — was rate.split("/") with no error handling; now uses maxsplit=1 + try/except with a clear message showing the bad input
  • prompt_pre_fetch normalisation — prompt_id now normalised with .strip().lower(), consistent with how tool_pre_invoke handles tool names
  • _normalised_by_tool pre-computed — by_tool key lowercasing was re-done on every hook call; now computed once at init in _validate_config
  • Unconditional tenant propagation — _propagate_tenant_id() copies request.state.team_id → global_context.tenant_id at every get_current_user() return path, not just inside _inject_userinfo_instate() which is gated by include_user_info (default False)
  • Sliding-window Retry-After >= 1 — int truncation of (oldest_ts + window - now) could produce 0 when the oldest timestamp plus the window rounded down to int(now), telling clients to retry immediately while still over limit; fixed on both memory and Redis paths with max(1, reset_in)
  • Token-bucket first-request memory/Redis parity — the memory path hard-coded time_to_full = window on first request, while Redis derived it from tokens_needed / refill_rate, causing metadata divergence between backends (e.g. 60 vs ~6 for a 10/m limit); both paths now use tokens_needed / refill_rate
  • Config pre-parsing at startup — rate strings are parsed once during plugin initialization so the request path does not re-parse limits on every hook invocation
  • Rolling-upgrade compatibility — the Rust Redis backend uses the same key format as the Python Redis backend so both implementations share counters correctly during mixed deployments
  • Async Redis execution path — Redis-backed Rust evaluation now uses a multiplexed async Redis connection and an async PyO3 boundary so Python async hooks can await the Redis path directly
  • Rust engine preferred with Python fallback — when the rate_limiter_rust PyO3 extension is installed, the Rust engine handles all rate evaluation; when unavailable or when RATE_LIMITER_FORCE_PYTHON=1 is set, the plugin falls back to the full Python algorithm and backend implementation
  • RATE_LIMITER_FORCE_PYTHON env var — allows operators to force the Python backend for A/B comparisons or debugging; Makefile targets benchmark-rate-limiter-capacity-rust and benchmark-rate-limiter-capacity-python exercise both paths
  • Reduced lock contention on memory hot path — the Rust MemoryStore now acquires one read lock on the outer map + one write lock on the per-key state per steady-state request (previously two outer read locks), eliminating a redundant lock cycle
  • Response construction in Rust — HTTP headers and plugin metadata dicts are built inside the Rust engine after GIL reacquisition, replacing ~18-25 individual PyO3 attribute accesses from the Python side (depending on dimension count)
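
Three of the items above reduce to small pieces of arithmetic and parsing, sketched here in Python. The function names, the accepted unit suffixes (`s`, `m`, `h`), and the error wording are assumptions for illustration; only `maxsplit=1`, the `max(1, reset_in)` clamp, and the `tokens_needed / refill_rate` formula come from the list above.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}  # assumed unit suffixes


def parse_rate(rate):
    """Parse "10/m" into (10, 60); raise ValueError that shows the bad input."""
    try:
        count_str, unit = rate.split("/", maxsplit=1)
        count = int(count_str)
        window = _UNIT_SECONDS[unit.strip().lower()]
    except (ValueError, KeyError, AttributeError) as exc:
        raise ValueError(f"invalid rate string: {rate!r}") from exc
    if count <= 0:
        raise ValueError(f"invalid rate string: {rate!r}")
    return count, window


def sliding_retry_after(oldest_ts, window, now):
    """Seconds until the oldest timestamp leaves the window, never below 1."""
    reset_in = int(oldest_ts + window - now)  # truncation can yield 0
    return max(1, reset_in)


def token_bucket_time_to_full(tokens_needed, limit, window):
    """Seconds to refill tokens_needed at limit/window tokens per second."""
    refill_rate = limit / window
    return tokens_needed / refill_rate


limit, window = parse_rate("10/m")
retry = sliding_retry_after(oldest_ts=40.2, window=60, now=100.0)
refill = token_bucket_time_to_full(tokens_needed=1, limit=limit, window=window)
print(limit, window, retry, round(refill, 3))  # 10 60 1 6.0
```

With a 10/m limit, one consumed token refills in about 6 seconds, which is the "~6" figure quoted above; hard-coding the full window would report 60 instead.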

Architecture

The plugin now delegates all rate evaluation to Rust:

  • Python owns plugin lifecycle, hook integration, config validation, and request-context extraction
  • Rust owns dimension selection, algorithm execution, backend dispatch, result aggregation, and response dict construction
  • Backend selection is an engine concern:
    • memory → Rust in-process store via the existing sync engine path
    • redis → Rust async Redis execution via a multiplexed connection + batch Lua execution
  • When the rate_limiter_rust PyO3 extension is installed, Rust handles all rate evaluation; otherwise the plugin falls back to the Python backend
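
The selection rule in the last bullet can be sketched as below. The helper name `select_engine` and its return shape are hypothetical; the module name `rate_limiter_rust` and the `RATE_LIMITER_FORCE_PYTHON=1` switch are taken from this description. The plugin's real wiring may differ.

```python
import importlib
import os


def select_engine(module_name="rate_limiter_rust"):
    """Return ("rust", module) when usable, otherwise ("python", None)."""
    # Operators can force the Python path for A/B comparisons or debugging.
    if os.environ.get("RATE_LIMITER_FORCE_PYTHON") == "1":
        return "python", None
    try:
        return "rust", importlib.import_module(module_name)
    except ImportError:
        # Extension not installed: fall back to the Python implementation.
        return "python", None


engine_name, engine_module = select_engine()
print(engine_name)
```

The module name is a parameter only so the sketch can be exercised without the extension installed.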
Plugin internals: request flow, wrapper responsibilities, Rust engine, backends, and response shaping
┌──────────────────────────────────────────────────────────────────────┐
│                         RateLimiterPlugin                           │
│                                                                      │
│  Hooks:  tool_pre_invoke            prompt_pre_fetch                │
│          (tool name + context)      (prompt_id + context)           │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              │  Python responsibilities:
                              │  - validate config (at init)
                              │  - normalize by_tool keys (at init)
                              │  - extract user / tenant / tool|prompt
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     RustRateLimiterEngine wrapper                    │
│                        (PyO3 boundary)                               │
│                                                                      │
│  Python calls the Rust engine once per request:                     │
│  - check(user, tenant, tool, now_unix) for memory                   │
│  - await check_async(user, tenant, tool, now_unix) for redis        │
│                                                                      │
│  Returns: (allowed, headers_dict, meta_dict)                        │
│  Rust builds dimension keys, evaluates, and constructs response     │
│  dicts internally — no per-attribute PyO3 accesses needed.          │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        RateLimiterEngine (Rust)                      │
│                                                                      │
│  Owns:                                                               │
│  - parsed config (by_user, by_tenant, by_tool)                      │
│  - dimension key construction (user:X, tenant:Y, tool:Z)            │
│  - backend selection + dispatch                                      │
│  - algorithm execution                                               │
│  - per-dimension result calculation                                  │
│  - aggregated EvalResult (most restrictive dimension)               │
│  - HTTP header dict construction (X-RateLimit-*, Retry-After)       │
│  - plugin metadata dict construction (remaining, reset_in, dims)    │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
              ┌───────────────┴────────────────┐
              │                                │
              ▼                                ▼
┌───────────────────────────────┐   ┌──────────────────────────────────┐
│      Memory backend (Rust)    │   │       Redis backend (Rust)      │
│                               │   │                                  │
│  in-process counters          │   │  owns multiplexed Redis conn   │
│  fixed_window                 │   │  batch Lua execution            │
│  sliding_window               │   │  shared counters across replicas│
│  token_bucket                 │   │  compatible key format with     │
│  single read-lock hot path    │   │  Python Redis backend           │
└───────────────────────────────┘   └──────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     Python result dispatch                           │
│                                                                      │
│  allowed -> PromptPrehookResult / ToolPreInvokeResult + headers     │
│  blocked -> PluginViolation(429) + headers + meta                   │
│                                                                      │
│  (headers and meta are pre-built dicts from Rust — no conversion)   │
└──────────────────────────────────────────────────────────────────────┘
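
The aggregation step in the diagram (most restrictive dimension, then header and metadata dicts) can be approximated in Python as follows. This is a sketch under assumptions, not the Rust engine: `DimResult` and `aggregate` are hypothetical names, the exact `X-RateLimit-*` header names and the meaning of the reset value are assumed, and "most restrictive" is interpreted here as "blocked first, then fewest remaining".

```python
from typing import Dict, List, NamedTuple, Tuple


class DimResult(NamedTuple):
    name: str       # "user", "tenant", or "tool"
    allowed: bool
    limit: int
    remaining: int
    reset_in: int   # seconds until this dimension resets


def aggregate(results: List[DimResult]) -> Tuple[bool, Dict[str, str], dict]:
    """Fold per-dimension results into (allowed, headers, meta)."""
    if not results:
        return True, {}, {}
    allowed = all(r.allowed for r in results)
    # False sorts before True, so blocked dimensions win; ties go to the
    # dimension with the fewest remaining requests.
    worst = min(results, key=lambda r: (r.allowed, r.remaining))
    headers = {
        "X-RateLimit-Limit": str(worst.limit),
        "X-RateLimit-Remaining": str(worst.remaining),
        "X-RateLimit-Reset": str(worst.reset_in),
    }
    if not allowed:
        headers["Retry-After"] = str(max(1, worst.reset_in))
    meta = {
        "remaining": worst.remaining,
        "reset_in": worst.reset_in,
        "dims": [r.name for r in results],
    }
    return allowed, headers, meta


allowed, headers, meta = aggregate([
    DimResult("user", True, 30, 5, 40),
    DimResult("tenant", False, 100, 0, 12),
    DimResult("tool", True, 10, 9, 55),
])
print(allowed, headers["Retry-After"], meta["dims"])
```

Returning ready-made dicts is what lets the caller dispatch on `allowed` without touching per-dimension attributes.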

Test results

Validation for this PR is intentionally comprehensive: unit tests, Rust tests, micro-benchmarks, hook-path A/B comparisons, multi-instance Redis correctness, sustained load tests, algorithm comparison, and Fyre VM deployment validation are all included below.

The highest-signal correctness and unit-test summaries remain visible. The more detailed benchmark and deployment sections are preserved in expandables so the data stays in the PR without dominating the main narrative.

Test results summary

| # | Area | Headline result |
|---|------|-----------------|
| 1 | Backend correctness | 49.6% blocked vs expected ~50%, 0 infra failures |
| 2 | Rust micro-benchmarks | 17-21 ns single-key, 49 ns three-dimension |
| 3 | Hook-path comparison | 1.7x-1.9x for most 3-dim memory paths, 28x for sliding_window |
| 4 | Redis capacity benchmark | p99 60 ms vs 100 ms, 28% less gateway memory |
| 5 | Multi-user scale test | 47.6% blocked, 0 infra failures across 28,650 requests |
| 6 | Algorithm comparison | All 3 algorithms at ~47-49% blocked across ~28,600 requests each |
| 7 | Fyre VM validation | Rust engine validated on x86 Fyre |
| 8 | Unit test suite | 93 plugin tests passed, 47/47 Rust tests passed |

1. Backend correctness (multi-instance Redis)

Multi-instance Redis correctness matched the expected shared-counter behavior: 49.6% blocked, 0 infrastructure failures, and an exact 60 allowed / 60 rate-limited split in the single-user validation run.

Full backend correctness results

Topology: nginx → 3 gateways → shared Redis, 1 user at 60 req/min against a 30/min limit.

| Metric | Result |
|--------|--------|
| Total tool calls | 121 |
| Allowed | 60 |
| Rate-limited | 60 |
| Blocked percentage | 49.6% |
| Infrastructure failures | 0 |
| Avg response time | 49.7 ms |
| p90 / p99 | 76 ms / 88 ms |

Verdict: REDIS BACKEND — limit correctly enforced. The shared Redis counter produced the expected ~50% blocked rate across three gateway instances. Zero infrastructure failures.
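
The expected figure follows from simple arithmetic, sketched below. The helper name is hypothetical, and the "isolated counters" case assumes nginx spreads the single user's requests evenly across the three gateways, which the PR does not state explicitly.

```python
def expected_blocked_fraction(request_rate, limit_rate):
    """Fraction of requests rejected at steady state for a given limit."""
    if request_rate <= limit_rate:
        return 0.0
    return 1.0 - limit_rate / request_rate


# One shared counter: 60 req/min against a 30/min limit.
shared = expected_blocked_fraction(request_rate=60, limit_rate=30)

# Three independent per-gateway counters would behave like a 90/min limit.
isolated = expected_blocked_fraction(request_rate=60, limit_rate=3 * 30)

observed = 60 / 121  # rate-limited calls / total tool calls from the table
print(f"shared {shared:.1%}, isolated {isolated:.1%}, observed {observed:.1%}")
# shared 50.0%, isolated 0.0%, observed 49.6%
```

Under that assumption, an observed rate near 50% rather than near 0% is what distinguishes shared Redis counters from per-instance memory counters.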

2. Rust micro-benchmarks (criterion, Apple Silicon M-series, release build)

Criterion benchmarks on Apple Silicon show nanosecond-scale in-process memory-backend costs: 17-21 ns for single-key operations, 49 ns for three-dimension evaluation, and 22-67 µs in the contention scenarios included here.

Full micro-benchmark results

In-process memory backend, direct MemoryStore calls (no PyO3 overhead):

| Benchmark | Median latency | Notes |
|-----------|----------------|-------|
| fixed_window/single_key | 19 ns | single dimension, window reset each iteration |
| token_bucket/single_key | 21 ns | |
| sliding_window/single_key | 17 ns | |
| fixed_window/three_dims | 49 ns | user + tenant + tool in one call |
| fixed_window/hot_counter | 16 ns | counter near limit (steady-state path) |
| fixed_window/blocked_path | 16 ns | counter past limit (reject path) |
| fixed_window/many_keys_10k | 185 ns | HashMap with 10,000 distinct keys |
| fixed_window/concurrent_2t | 22 µs | 2 threads via std::thread::scope |
| fixed_window/concurrent_4t | 35 µs | 4 threads |
| fixed_window/concurrent_8t | 67 µs | 8 threads |

3. Hook-path comparison (Python pytest-benchmark, memory backend)

Measured per-hook Python→Rust round-trip overhead via pytest-benchmark, comparing the Rust engine check() path against the older Python-only algorithm+backend stack. Numbers reflect the full hook path including PyO3 crossing, dict construction, and Python result dispatch.

Full hook-path comparison results

All benchmarks: fixed_window, sliding_window, token_bucket × single dimension, three dimensions.

| Config | Python mean | Rust mean | Speedup |
|--------|-------------|-----------|---------|
| fixed_window / 1 dim | 5.1 µs | 2.8 µs | 1.8× |
| fixed_window / 3 dim | 13.2 µs | 7.2 µs | 1.8× |
| sliding_window / 1 dim | 14.3 µs | 2.9 µs | 4.9× |
| sliding_window / 3 dim | 81.3 µs | 2.9 µs | 28× |
| token_bucket / 1 dim | 5.6 µs | 2.8 µs | 2.0× |
| token_bucket / 3 dim | 14.9 µs | 7.8 µs | 1.9× |

The sliding_window improvement is the most dramatic because the Python implementation walks and filters timestamp lists in pure Python on every check, while Rust uses a VecDeque with amortized O(1) front pops.
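
The difference between the two eviction strategies can be illustrated in Python, using `collections.deque` as a stand-in for Rust's `VecDeque`. Both functions are hypothetical sketches rather than code from either implementation, and the deque variant relies on timestamps being appended in non-decreasing order.

```python
from collections import deque


def live_with_filter(stamps, now, window):
    """Rebuild the live list on every check: touches every stored entry."""
    return [t for t in stamps if t > now - window]


def live_with_deque(stamps, now, window):
    """Pop expired entries from the front: work is proportional to expiries."""
    while stamps and stamps[0] <= now - window:
        stamps.popleft()
    return stamps


history = [float(t) for t in range(0, 100, 10)]  # 0, 10, ..., 90 (sorted)
live_list = live_with_filter(history, now=100.0, window=60.0)
live_deque = live_with_deque(deque(history), now=100.0, window=60.0)
print(live_list == list(live_deque), len(live_list))  # True 5
```

Both produce the same live set; the deque version stops at the first unexpired timestamp instead of scanning the whole list.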

4–8. Remaining validation

Redis capacity benchmark, multi-user scale, algorithm comparison, Fyre VM, unit tests

4. Redis capacity benchmark — p99 latency 60 ms (vs 100 ms Python), 28% less gateway RSS. Three gateways, shared Redis, 100 users at 0.25 rps for 5 minutes.

5. Multi-user scale test — 47.6% blocked across 28,650 requests with 0 infrastructure failures. 100 concurrent users, 5-minute sustained load.

6. Algorithm comparison — All three algorithms (fixed_window, sliding_window, token_bucket) converged to ~47-49% blocked across ~28,600 requests each, confirming equivalent enforcement behavior.

7. Fyre VM validation — Rust engine validated on x86 Fyre VM deployment.

8. Unit test suite — 93 Python plugin tests passed, 47/47 Rust tests passed.

@gandhipratik203 marked this pull request as ready for review March 23, 2026 21:32
@gandhipratik203 marked this pull request as draft March 24, 2026 08:15
@gandhipratik203 marked this pull request as ready for review March 24, 2026 13:10
@gandhipratik203 force-pushed the feat/rate-limiter-rust branch 4 times, most recently from d287cc3 to f48fc3a March 24, 2026 15:51
@lucarlig (Collaborator) left a comment:

I found one issue to address before merge: the new tenant_id propagation still does not make by_tenant limits work for session-authenticated users.

Comment thread mcpgateway/auth.py
@gandhipratik203 marked this pull request as draft March 24, 2026 16:07
@gandhipratik203 force-pushed the feat/rate-limiter-rust branch from 6b517ff to 60f07cb March 24, 2026 19:35
@gandhipratik203 marked this pull request as ready for review March 24, 2026 22:38
@gandhipratik203 changed the title from "feat(rate-limiter): add rust-backed execution engine and benchmarks" to "feat(rate-limiter): add backed execution engine and benchmarks" Mar 25, 2026
@lucarlig self-requested a review March 26, 2026 15:05
@lucarlig (Collaborator) commented:

https://docs.rs/pyo3-log/latest/pyo3_log/
Please add logging with pyo3_log.

@gandhipratik203 force-pushed the feat/rate-limiter-rust branch from d44cd1d to 5514d53 March 27, 2026 08:06
@gandhipratik203 added the labels enhancement (New feature or request), performance (Performance related items), plugins, wxo (wxo integration), and MUST (P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe) Mar 27, 2026
@gandhipratik203 added this to the Release 1.0.0 milestone Mar 27, 2026
@gandhipratik203 added the release-fix (Critical bugfix required for the release) label Mar 27, 2026
@gandhipratik203 force-pushed the feat/rate-limiter-rust branch from a433395 to 08cd2ad March 28, 2026 19:40
gandhipratik203 added a commit that referenced this pull request Mar 30, 2026
Two optimizations informed by the rate limiter PR (#3809) patterns:

1. Batch list processing: for all-string lists in truncate mode, extract
   all &str borrows in one pass, process in a tight Rust loop, build
   output PyList in a single pass. Better cache locality and avoids
   per-item path string formatting and interleaved append calls.

2. Pre-sized String::with_capacity(): eliminate reallocation during
   truncation by pre-computing body + ellipsis size.

Results:
  - Short list passthrough: 13.6x → 18.9x faster
  - List 10x10KB:           2.6x → 3.0x faster
  - Deep nested dict:       7.1x → 7.0x faster (stable)
  - Wide nested dict:       8.4x → 8.5x faster (stable)
  - 331 Python tests + 47 Rust tests pass
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@jonpspri self-assigned this Mar 30, 2026
@jonpspri (Collaborator) commented:

GTG

@dima-zakharov (Collaborator) previously approved these changes Mar 30, 2026 and left a comment:

GitHub shows no new commits after review.

@jonpspri assigned crivetimihai and unassigned jonpspri Mar 30, 2026
lucarlig previously approved these changes Mar 30, 2026
gandhipratik203 and others added 4 commits April 2, 2026 15:25
…cale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke,
  reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so
  silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly
  in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to
  prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity)
  in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that
  Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter,
  retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>
@jonpspri dismissed stale reviews from lucarlig and dima-zakharov via 8617367 April 2, 2026 14:26
@jonpspri force-pushed the feat/rate-limiter-rust branch from 8aa04e1 to 8617367 April 2, 2026 14:26
…e/ralph-loop.local.md

- Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which
  was accidentally committed — this is a local Claude Code loop state
  file and should never have been checked in.
- Fix trailing whitespace in plugins_rust/rate_limiter/python/
  rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Update .secrets.baseline after adding test_extra_sensitive_keywords
in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains
a fake credential string that triggers the Secret Keyword detector.
All new entries are false positives (test data).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
The baseline regeneration reset is_secret to null for entries whose
line numbers shifted. Mark all 17 unaudited entries as is_secret=false
(test data, example configs, fake credentials) to pass the
--fail-on-unaudited pre-commit check.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@brian-hussey (Member) left a comment:

Approving as this has undergone several rounds of review and has now passed all CI checks.

@brian-hussey merged commit b24e6cb into main Apr 2, 2026
34 checks passed
@brian-hussey deleted the feat/rate-limiter-rust branch April 2, 2026 15:51
msureshkumar88 pushed a commit that referenced this pull request Apr 9, 2026
jonpspri added a commit that referenced this pull request Apr 10, 2026
…ngine, benchmarks, and validation (#3809)

* feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* feat(rate-limiter): add Rust-backed engine, check() API, benchmarks, and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline after rebase

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* refactor(rate-limiter): review fixes, Redis hardening, key-format parity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke,
  reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so
  silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly
  in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to
  prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity)
  in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that
  Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter,
  retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>

* fix: strip trailing whitespace in pyi stubs, remove accidental .claude/ralph-loop.local.md

- Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which
  was accidentally committed — this is a local Claude Code loop state
  file and should never have been checked in.
- Fix trailing whitespace in plugins_rust/rate_limiter/python/
  rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline for new exfil test strings

Update .secrets.baseline after adding test_extra_sensitive_keywords
in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains
a fake credential string that triggers the Secret Keyword detector.
All new entries are false positives (test data).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: audit new detect-secrets baseline entries as false positives

The baseline regeneration reset is_secret to null for entries whose
line numbers shifted. Mark all 17 unaudited entries as is_secret=false
(test data, example configs, fake credentials) to pass the
--fail-on-unaudited pre-commit check.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

---------

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
jonpspri added a commit that referenced this pull request Apr 10, 2026
…ngine, benchmarks, and validation (#3809)

* feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* feat(rate-limiter): add Rust-backed engine, check() API, benchmarks, and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline after rebase

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* refactor(rate-limiter): review fixes, Redis hardening, key-format parity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke,
  reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so
  silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly
  in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to
  prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity)
  in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that
  Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter,
  retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>
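
The reset-timestamp clamp described above is implemented in Rust with `.max(1)`. A Python analogue of the idea, under assumed names (`reset_timestamp`, `oldest_hit_s`, `window_s`) and an assumed whole-second representation, might look like this:

```python
def reset_timestamp(now_s: float, oldest_hit_s: float, window_s: int) -> int:
    """Hypothetical helper: whole-second time at which the window frees a slot."""
    # Seconds until the oldest entry expires; int() truncates 0.4 s to 0.
    remaining = int(oldest_hit_s + window_s - now_s)
    # Clamp to at least 1 s so the result is never "now" or in the past.
    return int(now_s) + max(1, remaining)
```

Without the clamp, an entry expiring in under one second would produce a reset time equal to the current second, which a client could read as "retry immediately".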

* fix: strip trailing whitespace in pyi stubs, remove accidental .claude/ralph-loop.local.md

- Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which
  was accidentally committed — this is a local Claude Code loop state
  file and should never have been checked in.
- Fix trailing whitespace in plugins_rust/rate_limiter/python/
  rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline for new exfil test strings

Update .secrets.baseline after adding test_extra_sensitive_keywords
in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains
a fake credential string that triggers the Secret Keyword detector.
All new entries are false positives (test data).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: audit new detect-secrets baseline entries as false positives

The baseline regeneration reset is_secret to null for entries whose
line numbers shifted. Mark all 17 unaudited entries as is_secret=false
(test data, example configs, fake credentials) to pass the
--fail-on-unaudited pre-commit check.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

---------

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
lucarlig pushed a commit that referenced this pull request Apr 10, 2026
…ngine, benchmarks, and validation (#3809)

Signed-off-by: lucarlig <luca.carlig@ibm.com>
brian-hussey pushed a commit that referenced this pull request Apr 10, 2026
…3965)

* refactor(plugins): replace in-tree rate_limiter with cpex-rate-limiter package

Remove the in-tree rate_limiter plugin and replace it with the
cpex-rate-limiter PyPI package, a compiled Rust extension providing
the same RateLimiterPlugin class with additional algorithms
(sliding-window, token-bucket) alongside the original fixed-window.

- Add cpex-rate-limiter>=0.0.2 as a [plugins] optional dependency
- Update Containerfile.lite to install the plugins extra
- Remove plugins/rate_limiter/ source directory
- Remove unit and integration tests that imported plugin internals
- Update all config files to use cpex_rate_limiter.RateLimiterPlugin
- Disable RateLimiterPlugin in test fixture config (package not
  available in unit test environment)
- Update documentation to reflect the external package

Signed-off-by: Jonathan Springer <jps@s390x.com>

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* feat(rate-limiter): pluggable algorithms with Rust-backed execution engine, benchmarks, and validation (#3809)

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* feat(discovery): add automatic tool discovery with hot/cold classification (#3839)

Implement automatic tool discovery for upstream MCP servers via
usage-aware adaptive polling. The gateway can now continuously
synchronise tool lists from registered servers without manual
intervention.

Server classification (hot/cold):
- Classify servers based on MCP session pool usage patterns
- Hot servers (top 20% by recent usage): polled at 1x base interval
- Cold servers (remaining 80%): polled at 3x base interval
- Classification is deterministic: sorted by recency, active sessions,
  use count, and URL for tie-breaking
- Leader election via Redis with TTL renewal for multi-worker
  coordination
- Falls back to local-only operation without Redis
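
The classification rules above can be sketched as a deterministic sort followed by a split. This is an illustration under assumptions: the `ServerUsage` fields, the `classify` and `poll_interval` names, and rounding the hot set up with `ceil` are not taken from the commit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerUsage:
    """Hypothetical usage record for one upstream server."""
    url: str
    last_used: float       # timestamp of the most recent use
    active_sessions: int
    use_count: int


def classify(servers: list[ServerUsage], hot_fraction: float = 0.2) -> dict[str, str]:
    # Sort by recency, active sessions, use count, then URL as the tie-breaker.
    ranked = sorted(
        servers,
        key=lambda s: (-s.last_used, -s.active_sessions, -s.use_count, s.url),
    )
    hot_n = math.ceil(len(ranked) * hot_fraction)  # rounding rule is an assumption
    return {s.url: ("hot" if i < hot_n else "cold") for i, s in enumerate(ranked)}


def poll_interval(base_s: int, tier: str) -> int:
    # Hot servers poll at 1x the base interval, cold servers at 3x.
    return base_s * (1 if tier == "hot" else 3)
```

Because the sort key ends in the URL, two servers with identical usage always land in the same order, so repeated runs over the same data give the same classification.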

Integration with GatewayService:
- Health checks respect hot/cold classification intervals
- Auto-refresh of tools/resources/prompts respects classification
- Fail-open on classification errors (poll anyway)
- Poll timestamps tracked via Redis with TTL expiry
- Uses base gateway URL (pre-auth) for classification lookups to
  avoid leaking query-param auth secrets to Redis

Configuration:
- AUTO_REFRESH_SERVERS=true enables automatic tool sync (default: false)
- GATEWAY_AUTO_REFRESH_INTERVAL=300 sets base polling interval
- HOT_COLD_CLASSIFICATION_ENABLED=false (opt-in, requires Redis)

Includes comprehensive tests with 100% coverage on the new
ServerClassificationService and integration tests for the
GatewayService hot/cold polling paths.

Closes #3734

Signed-off-by: Lang-Akshay <akshay.shinde26@ibm.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* refactor(plugins): update build, CI, and docs for PyPI plugin migration

Remove all plugins_rust/ build infrastructure and update references
across Containerfiles, Makefile, CI workflows, pre-commit configs,
CODEOWNERS, and documentation to reflect that plugins are now
distributed as PyPI packages (cpex-*) via the [plugins] optional extra.

- Remove Rust plugin builder stages from all Containerfiles
- Remove ~100 lines of rust-* plugin Makefile targets (keep mcp-runtime)
- Add --extra plugins to CI pytest workflow
- Add [plugins] extra to install-dev Makefile target
- Update tool_service.py import to use cpex_retry_with_backoff
- Update plugin kind paths in 7 doc files to cpex_pii_filter.*
- Clean up pre-commit, CODEOWNERS, MANIFEST.in, whitesource, .gitignore

Signed-off-by: Jonathan Springer <jps@s390x.com>
Signed-off-by: lucarlig <luca.carlig@ibm.com>

* fix(plugins): address PR review findings on PyPI plugin migration

Round 1 (blockers + high):
- Restore exclude-newer = "10 days" in pyproject.toml; replace stale
  langchain/requests pins with cpex-* per-package overrides anchored
  to 2026-04-09 so the plugins resolve newer than the global window
- Guard cpex_retry_with_backoff import in tool_service.py with
  try/except ImportError; falls back to (None, True) for the Python
  pipeline when the optional [plugins] extra is not installed
- Delete orphaned .github/workflows/rust-plugins.yml and the
  associated test cases in tests/unit/test_rust_plugins_workflow.py;
  drop the workflow card from docs/docs/architecture/explorer.html
- Delete orphaned docs/docs/using/plugins/rust-plugins.md and remove
  it from docs/docs/using/plugins/.pages mkdocs nav
- Harden docker-entrypoint.sh install_plugin_requirements:
  canonicalize /app and the resolved requirements path with
  readlink -f and require the path to live under /app/, log
  non-comment lines from the requirements file before pip runs,
  and skip cleanly on validation failure
- Delete PLUGIN-MIGRATION-PLAN.md (one-time planning doc)
- Add COPY plugins/requirements.txt to Containerfile.scratch (the
  layered Containerfile.lite already had it; the broad COPY . in
  Containerfile already includes it)
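
The entrypoint hardening above is done in shell with `readlink -f`. The same containment check can be sketched in Python, assuming a hypothetical helper name `resolve_under_root`; `os.path.realpath` plays the role of `readlink -f` here.

```python
import os
from typing import Optional


def resolve_under_root(path: str, root: str = "/app") -> Optional[str]:
    """Return the canonical path if it lives under root, else None (hypothetical helper)."""
    # Canonicalize both sides so symlinks and ".." segments cannot hide an escape.
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    # The resolved path must sit strictly below the resolved root.
    if real_path == real_root:
        return None
    if os.path.commonpath([real_root, real_path]) != real_root:
        return None
    return real_path
```

A caller would skip the install step when the helper returns `None`, matching the "skip cleanly on validation failure" behaviour described in the bullet.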

Round 2 (medium + low):
- Bump cpex-* version pin floors in pyproject.toml [plugins] to
  match resolved versions in uv.lock (cpex-rate-limiter>=0.0.3,
  cpex-encoded-exfil-detection>=0.2.0, cpex-pii-filter>=0.2.0,
  cpex-url-reputation>=0.1.1)
- Add Prerequisites section to tests/performance/PLUGIN_PROFILING.md
  documenting the [plugins] extra requirement
- Add Status: Partially superseded note to ADR-041 explaining that
  plugins_rust/ was removed when in-tree Rust plugins migrated to
  PyPI packages
- Document upgrade semantics in plugins/requirements.txt header
  (pip without --upgrade skips already-satisfied constraints)
- Add importlib.util.find_spec() precheck to
  tests/performance/test_plugins_performance.py main(); the script
  now skips cleanly with an actionable message if any of the five
  cpex packages referenced by the perf config are missing
- Rename tests/unit/test_rust_plugins_workflow.py to
  test_go_toolchain_pinning.py to match its remaining contents
  (Go workflow pin and Makefile toolchain assertion)
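
The `importlib.util.find_spec()` precheck mentioned above can be sketched as follows. The helper name `missing_packages` and the message text are assumptions, and only three of the five packages are listed because the commit message does not name the full set.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    # find_spec returns None for an absent top-level module instead of raising.
    return [name for name in names if importlib.util.find_spec(name) is None]


def main() -> int:
    # Import names taken from the commit text; the real perf config may differ.
    required = ["cpex_rate_limiter", "cpex_pii_filter", "cpex_retry_with_backoff"]
    missing = missing_packages(required)
    if missing:
        print(f"Skipping: missing {', '.join(missing)}; install the [plugins] extra.")
        return 0
    return 0  # the real script would run the benchmarks here
```

Checking with `find_spec` avoids importing the compiled extensions just to learn whether they are installed.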

Follow-ups tracked in #4116 and
IBM/cpex-plugins#21 for the longer-term tool_service.py refactor
that will eliminate the cross-package import entirely.

Signed-off-by: Jonathan Springer <jps@s390x.com>

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* revert: restore tests changes from PR #3965

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* fix(ci): align plugin tests with PyPI migration

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: remove legacy plugin test skip infrastructure

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: align packaged plugin tests with rust shims

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: cover retry policy import path in tool service

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* fix: harden cpex plugin migration paths

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: cover retry policy parser branches

Signed-off-by: lucarlig <luca.carlig@ibm.com>

* test: cover plugin requirements entrypoint path

Signed-off-by: lucarlig <luca.carlig@ibm.com>

---------

Signed-off-by: lucarlig <luca.carlig@ibm.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Pratik Gandhi <gandhipratik203@gmail.com>
Co-authored-by: Lang-Akshay <akshay.shinde26@ibm.com>
Co-authored-by: lucarlig <luca.carlig@ibm.com>
claudia-gray pushed a commit that referenced this pull request Apr 13, 2026
…3965)


Labels

- enhancement (New feature or request)
- MUST (P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe)
- performance (Performance related items)
- plugins
- release-fix (Critical bugfix required for the release)
- wxo (wxo integration)

6 participants