feat(rate-limiter): pluggable algorithms with Rust-backed execution engine, benchmarks, and validation #3809
Merged
brian-hussey merged 7 commits into main on Apr 2, 2026
d287cc3 to
f48fc3a
Compare
lucarlig
requested changes
Mar 24, 2026
Collaborator
lucarlig
left a comment
I found one issue to address before merge: the new tenant_id propagation still does not make by_tenant limits work for session-authenticated users.
6b517ff to
60f07cb
Compare
Collaborator
https://docs.rs/pyo3-log/latest/pyo3_log/
d44cd1d to
5514d53
Compare
a433395 to
08cd2ad
Compare
gandhipratik203
added a commit
that referenced
this pull request
Mar 30, 2026
Two optimizations informed by the rate limiter PR (#3809) patterns:

1. Batch list processing: for all-string lists in truncate mode, extract all &str borrows in one pass, process in a tight Rust loop, build output PyList in a single pass. Better cache locality and avoids per-item path string formatting and interleaved append calls.
2. Pre-sized String::with_capacity(): eliminate reallocation during truncation by pre-computing body + ellipsis size.

Results:
- Short list passthrough: 13.6x → 18.9x faster
- List 10x10KB: 2.6x → 3.0x faster
- Deep nested dict: 7.1x → 7.0x faster (stable)
- Wide nested dict: 8.4x → 8.5x faster (stable)
- 331 Python tests + 47 Rust tests pass

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Collaborator
GTG
dima-zakharov
previously approved these changes
Mar 30, 2026
Collaborator
dima-zakharov
left a comment
GitHub shows no new commits since the review.
lucarlig
previously approved these changes
Mar 30, 2026
…cale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
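The sliding-window fix listed above (sweep expired timestamps before counting) can be illustrated with a minimal in-memory sketch. This is not the plugin's code: the class name `SlidingWindowLimiter`, the `allow()` method, and the explicit `now` argument are hypothetical names chosen for the example, and it covers only the single-process memory path, not Redis.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical in-memory sliding-window limiter (single process only)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        # Maps a rate-limit key to a deque of request timestamps, oldest first.
        self._hits = {}

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        # Sweep first: drop timestamps that have left the window, so stale
        # entries can never be counted against the caller.
        cutoff = now - self.window_s
        while hits and hits[0] <= cutoff:
            hits.popleft()
        # Only then compare the live count with the limit.
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would typically supply `time.monotonic()` or a wall-clock timestamp.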
…and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…ity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke, reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity) in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter, retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>
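The reset_timestamp clamp described above can be sketched in Python as follows. The commit applies `.max(1)` in Rust; this is an assumed equivalent, and the function name `reset_timestamp`, its parameters, and the whole-second truncation are illustrative choices rather than the plugin's actual signature.

```python
def reset_timestamp(oldest_hit: float, window_s: float, now: float) -> int:
    """Return a Unix timestamp (whole seconds) at which the window frees up.

    Hypothetical helper: truncating the remaining time to whole seconds yields
    0 when the oldest entry expires in under a second, so the offset is
    clamped to at least 1 to keep the result strictly in the future.
    """
    remaining_s = int(oldest_hit + window_s - now)  # truncates toward zero
    return int(now) + max(1, remaining_s)
```

Without the clamp, a caller that receives a reset time equal to the current second could retry immediately and be rejected again.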
8aa04e1 to
8617367
Compare
…e/ralph-loop.local.md

- Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which was accidentally committed; this is a local Claude Code loop state file and should never have been checked in.
- Fix trailing whitespace in plugins_rust/rate_limiter/python/rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Update .secrets.baseline after adding test_extra_sensitive_keywords in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains a fake credential string that triggers the Secret Keyword detector. All new entries are false positives (test data). Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
The baseline regeneration reset is_secret to null for entries whose line numbers shifted. Mark all 17 unaudited entries as is_secret=false (test data, example configs, fake credentials) to pass the --fail-on-unaudited pre-commit check. Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
brian-hussey
approved these changes
Apr 2, 2026
Member
brian-hussey
left a comment
Approving as this has undergone several rounds of review and has now passed all CI checks.
msureshkumar88
pushed a commit
that referenced
this pull request
Apr 9, 2026
jonpspri
added a commit
that referenced
this pull request
Apr 10, 2026
…ngine, benchmarks, and validation (#3809)

* feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test

- Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket
- Add Redis backend for shared cross-instance rate limiting
- Fix tenant isolation: skip by_tenant when tenant_id is None
- Fix sliding window: sweep expired timestamps before counting
- Fix backend validation: restore _validate_config check
- Fix token bucket memory path: apply max(1,...) guard to reset timestamp
- Add Redis integration tests for all three algorithms
- Add direct regression tests for get_current_user tenant_id fallback
- Add scale load test with Redis memory timeline and live algorithm detection
- Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection
- Remove redundant algorithm locustfile; scale file is canonical
- Correct stale comments and README limitations

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* feat(rate-limiter): add Rust-backed engine, check() API, benchmarks, and validation

- Rust-backed sliding window engine with pyo3-log integration
- check() API with tenant propagation, sweep/retry-after support
- Eliminate redundant ZRANGE in sliding window Lua script
- Fix detect-secrets baseline for rate limiter load tests
- Clarify memory backend is single-instance only in docs

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline after rebase

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* refactor(rate-limiter): review fixes, Redis hardening, key-format parity tests

- Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke, reducing each hook to a single-line wrapper
- Elevate Redis val_i64/val_f64 parse-error logging from warn to error so silent fail-open degradation surfaces in operator dashboards
- Clamp sliding-window reset_timestamp with .max(1) so it is always strictly in the future even when the oldest entry expires in < 1 s
- Add 5 s tokio::time::timeout around Redis connection establishment to prevent indefinite blocking on network partition
- Replace silent except-pass in EVALSHA SHA tracking with logger.debug
- Document dual Lua-script invariant (rolling-upgrade key-format parity) in both Python RedisBackend docstring and Rust redis_backend.rs header
- Add 7 parametrized test_redis_key_format_parity_* tests validating that Python and Rust produce identical Redis keys for the same inputs
- Revert unrelated .pyi stub changes for encoded_exfil_detection, pii_filter, retry_with_backoff, and secrets_detection

Signed-off-by: Jonathan Springer <jps@s390x.com>

* fix: strip trailing whitespace in pyi stubs, remove accidental .claude/ralph-loop.local.md

- Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which was accidentally committed; this is a local Claude Code loop state file and should never have been checked in.
- Fix trailing whitespace in plugins_rust/rate_limiter/python/rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: regenerate detect-secrets baseline for new exfil test strings

Update .secrets.baseline after adding test_extra_sensitive_keywords in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains a fake credential string that triggers the Secret Keyword detector. All new entries are false positives (test data).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

* chore: audit new detect-secrets baseline entries as false positives

The baseline regeneration reset is_secret to null for entries whose line numbers shifted. Mark all 17 unaudited entries as is_secret=false (test data, example configs, fake credentials) to pass the --fail-on-unaudited pre-commit check.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>

---------

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Signed-off-by: Jonathan Springer <jps@s390x.com>
Co-authored-by: Jonathan Springer <jps@s390x.com>
lucarlig
pushed a commit
that referenced
this pull request
Apr 10, 2026
…ngine, benchmarks, and validation (#3809)

Signed-off-by: lucarlig <luca.carlig@ibm.com>
brian-hussey
pushed a commit
that referenced
this pull request
Apr 10, 2026
…3965) * refactor(plugins): replace in-tree rate_limiter with cpex-rate-limiter package Remove the in-tree rate_limiter plugin and replace it with the cpex-rate-limiter PyPI package, a compiled Rust extension providing the same RateLimiterPlugin class with additional algorithms (sliding-window, token-bucket) alongside the original fixed-window. - Add cpex-rate-limiter>=0.0.2 as a [plugins] optional dependency - Update Containerfile.lite to install the plugins extra - Remove plugins/rate_limiter/ source directory - Remove unit and integration tests that imported plugin internals - Update all config files to use cpex_rate_limiter.RateLimiterPlugin - Disable RateLimiterPlugin in test fixture config (package not available in unit test environment) - Update documentation to reflect the external package Signed-off-by: Jonathan Springer <jps@s390x.com> Signed-off-by: lucarlig <luca.carlig@ibm.com> * feat(rate-limiter): pluggable algorithms with Rust-backed execution engine, benchmarks, and validation (#3809) * feat(rate-limiter): pluggable algorithms, tenant isolation fix, and scale load test - Add pluggable algorithm strategy: fixed_window, sliding_window, token_bucket - Add Redis backend for shared cross-instance rate limiting - Fix tenant isolation: skip by_tenant when tenant_id is None - Fix sliding window: sweep expired timestamps before counting - Fix backend validation: restore _validate_config check - Fix token bucket memory path: apply max(1,...) 
guard to reset timestamp - Add Redis integration tests for all three algorithms - Add direct regression tests for get_current_user tenant_id fallback - Add scale load test with Redis memory timeline and live algorithm detection - Add RL_PACE_MULTIPLIER for near-limit pace testing and boundary burst detection - Remove redundant algorithm locustfile; scale file is canonical - Correct stale comments and README limitations Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * feat(rate-limiter): add Rust-backed engine, check() API, benchmarks, and validation - Rust-backed sliding window engine with pyo3-log integration - check() API with tenant propagation, sweep/retry-after support - Eliminate redundant ZRANGE in sliding window Lua script - Fix detect-secrets baseline for rate limiter load tests - Clarify memory backend is single-instance only in docs Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * chore: regenerate detect-secrets baseline after rebase Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * refactor(rate-limiter): review fixes, Redis hardening, key-format parity tests - Extract _dispatch_hook() shared by prompt_pre_fetch and tool_pre_invoke, reducing each hook to a single-line wrapper - Elevate Redis val_i64/val_f64 parse-error logging from warn to error so silent fail-open degradation surfaces in operator dashboards - Clamp sliding-window reset_timestamp with .max(1) so it is always strictly in the future even when the oldest entry expires in < 1 s - Add 5 s tokio::time::timeout around Redis connection establishment to prevent indefinite blocking on network partition - Replace silent except-pass in EVALSHA SHA tracking with logger.debug - Document dual Lua-script invariant (rolling-upgrade key-format parity) in both Python RedisBackend docstring and Rust redis_backend.rs header - Add 7 parametrized test_redis_key_format_parity_* tests validating that Python and Rust produce identical Redis keys for the same inputs - Revert 
unrelated .pyi stub changes for encoded_exfil_detection, pii_filter, retry_with_backoff, and secrets_detection Signed-off-by: Jonathan Springer <jps@s390x.com> * fix: strip trailing whitespace in pyi stubs, remove accidental .claude/ralph-loop.local.md - Remove plugins_rust/rate_limiter/.claude/ralph-loop.local.md which was accidentally committed — this is a local Claude Code loop state file and should never have been checked in. - Fix trailing whitespace in plugins_rust/rate_limiter/python/ rate_limiter_rust/__init__.pyi docstrings to pass pre-commit hooks. Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * chore: regenerate detect-secrets baseline for new exfil test strings Update .secrets.baseline after adding test_extra_sensitive_keywords in plugins_rust/encoded_exfil_detection/src/lib.rs:969 which contains a fake credential string that triggers the Secret Keyword detector. All new entries are false positives (test data). Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> * chore: audit new detect-secrets baseline entries as false positives The baseline regeneration reset is_secret to null for entries whose line numbers shifted. Mark all 17 unaudited entries as is_secret=false (test data, example configs, fake credentials) to pass the --fail-on-unaudited pre-commit check. Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> --------- Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Jonathan Springer <jps@s390x.com> Signed-off-by: lucarlig <luca.carlig@ibm.com> * feat(discovery): add automatic tool discovery with hot/cold classification (#3839) Implement automatic tool discovery for upstream MCP servers via usage-aware adaptive polling. The gateway can now continuously synchronise tool lists from registered servers without manual intervention. 
Server classification (hot/cold): - Classify servers based on MCP session pool usage patterns - Hot servers (top 20% by recent usage): polled at 1x base interval - Cold servers (remaining 80%): polled at 3x base interval - Classification is deterministic: sorted by recency, active sessions, use count, and URL for tie-breaking - Leader election via Redis with TTL renewal for multi-worker coordination - Falls back to local-only operation without Redis Integration with GatewayService: - Health checks respect hot/cold classification intervals - Auto-refresh of tools/resources/prompts respects classification - Fail-open on classification errors (poll anyway) - Poll timestamps tracked via Redis with TTL expiry - Uses base gateway URL (pre-auth) for classification lookups to avoid leaking query-param auth secrets to Redis Configuration: - AUTO_REFRESH_SERVERS=true enables automatic tool sync (default: false) - GATEWAY_AUTO_REFRESH_INTERVAL=300 sets base polling interval - HOT_COLD_CLASSIFICATION_ENABLED=false (opt-in, requires Redis) Includes comprehensive tests with 100% coverage on the new ServerClassificationService and integration tests for the GatewayService hot/cold polling paths. Closes #3734 Signed-off-by: Lang-Akshay <akshay.shinde26@ibm.com> Signed-off-by: Mihai Criveti <crivetimihai@gmail.com> Signed-off-by: lucarlig <luca.carlig@ibm.com> * refactor(plugins): replace in-tree rate_limiter with cpex-rate-limiter package Remove the in-tree rate_limiter plugin and replace it with the cpex-rate-limiter PyPI package, a compiled Rust extension providing the same RateLimiterPlugin class with additional algorithms (sliding-window, token-bucket) alongside the original fixed-window. 
- Add cpex-rate-limiter>=0.0.2 as a [plugins] optional dependency - Update Containerfile.lite to install the plugins extra - Remove plugins/rate_limiter/ source directory - Remove unit and integration tests that imported plugin internals - Update all config files to use cpex_rate_limiter.RateLimiterPlugin - Disable RateLimiterPlugin in test fixture config (package not available in unit test environment) - Update documentation to reflect the external package Signed-off-by: Jonathan Springer <jps@s390x.com> Signed-off-by: lucarlig <luca.carlig@ibm.com> * refactor(plugins): update build, CI, and docs for PyPI plugin migration Remove all plugins_rust/ build infrastructure and update references across Containerfiles, Makefile, CI workflows, pre-commit configs, CODEOWNERS, and documentation to reflect that plugins are now distributed as PyPI packages (cpex-*) via the [plugins] optional extra. - Remove Rust plugin builder stages from all Containerfiles - Remove ~100 lines of rust-* plugin Makefile targets (keep mcp-runtime) - Add --extra plugins to CI pytest workflow - Add [plugins] extra to install-dev Makefile target - Update tool_service.py import to use cpex_retry_with_backoff - Update plugin kind paths in 7 doc files to cpex_pii_filter.* - Clean up pre-commit, CODEOWNERS, MANIFEST.in, whitesource, .gitignore Signed-off-by: Jonathan Springer <jps@s390x.com> Signed-off-by: lucarlig <luca.carlig@ibm.com> * fix(plugins): address PR review findings on PyPI plugin migration Round 1 (blockers + high): - Restore exclude-newer = "10 days" in pyproject.toml; replace stale langchain/requests pins with cpex-* per-package overrides anchored to 2026-04-09 so the plugins resolve newer than the global window - Guard cpex_retry_with_backoff import in tool_service.py with try/except ImportError; falls back to (None, True) for the Python pipeline when the optional [plugins] extra is not installed - Delete orphaned .github/workflows/rust-plugins.yml and the associated test cases in 
tests/unit/test_rust_plugins_workflow.py; drop the workflow card from docs/docs/architecture/explorer.html - Delete orphaned docs/docs/using/plugins/rust-plugins.md and remove it from docs/docs/using/plugins/.pages mkdocs nav - Harden docker-entrypoint.sh install_plugin_requirements: canonicalize /app and the resolved requirements path with readlink -f and require the path to live under /app/, log non-comment lines from the requirements file before pip runs, and skip cleanly on validation failure - Delete PLUGIN-MIGRATION-PLAN.md (one-time planning doc) - Add COPY plugins/requirements.txt to Containerfile.scratch (the layered Containerfile.lite already had it; the broad COPY . in Containerfile already includes it) Round 2 (medium + low): - Bump cpex-* version pin floors in pyproject.toml [plugins] to match resolved versions in uv.lock (cpex-rate-limiter>=0.0.3, cpex-encoded-exfil-detection>=0.2.0, cpex-pii-filter>=0.2.0, cpex-url-reputation>=0.1.1) - Add Prerequisites section to tests/performance/PLUGIN_PROFILING.md documenting the [plugins] extra requirement - Add Status: Partially superseded note to ADR-041 explaining that plugins_rust/ was removed when in-tree Rust plugins migrated to PyPI packages - Document upgrade semantics in plugins/requirements.txt header (pip without --upgrade skips already-satisfied constraints) - Add importlib.util.find_spec() precheck to tests/performance/test_plugins_performance.py main(); the script now skips cleanly with an actionable message if any of the five cpex packages referenced by the perf config are missing - Rename tests/unit/test_rust_plugins_workflow.py to test_go_toolchain_pinning.py to match its remaining contents (Go workflow pin and Makefile toolchain assertion) Follow-ups tracked in #4116 and IBM/cpex-plugins#21 for the longer-term tool_service.py refactor that will eliminate the cross-package import entirely. 
Signed-off-by: Jonathan Springer <jps@s390x.com> Signed-off-by: lucarlig <luca.carlig@ibm.com> * revert: restore tests changes from PR #3965 Signed-off-by: lucarlig <luca.carlig@ibm.com> * fix(ci): align plugin tests with PyPI migration Signed-off-by: lucarlig <luca.carlig@ibm.com> * test: remove legacy plugin test skip infrastructure Signed-off-by: lucarlig <luca.carlig@ibm.com> * test: align packaged plugin tests with rust shims Signed-off-by: lucarlig <luca.carlig@ibm.com> * test: cover retry policy import path in tool service Signed-off-by: lucarlig <luca.carlig@ibm.com> * fix: harden cpex plugin migration paths Signed-off-by: lucarlig <luca.carlig@ibm.com> * test: cover retry policy parser branches Signed-off-by: lucarlig <luca.carlig@ibm.com> * test: cover plugin requirements entrypoint path Signed-off-by: lucarlig <luca.carlig@ibm.com> --------- Signed-off-by: lucarlig <luca.carlig@ibm.com> Signed-off-by: Jonathan Springer <jps@s390x.com> Co-authored-by: Pratik Gandhi <gandhipratik203@gmail.com> Co-authored-by: Lang-Akshay <akshay.shinde26@ibm.com> Co-authored-by: lucarlig <luca.carlig@ibm.com>
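The hot/cold classification described in the discovery commit message above (top 20% by recent usage polled at 1x the base interval, the rest at 3x, with a deterministic ordering) could be sketched roughly as follows. The `ServerUsage` fields, the function names, and the round-up rule for the hot cutoff are assumptions made for illustration; the commit message does not specify them, and this is not the ServerClassificationService code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerUsage:
    """Hypothetical usage record for one upstream server."""
    url: str
    last_used: float      # Unix timestamp of the most recent use
    active_sessions: int
    use_count: int


def classify(servers, hot_percent=20):
    """Label each server URL "hot" or "cold".

    Ordering follows the commit message: recency, then active sessions,
    then use count, with the URL as the final tie-breaker.
    """
    ranked = sorted(
        servers,
        key=lambda s: (-s.last_used, -s.active_sessions, -s.use_count, s.url),
    )
    # Assumption: the hot cutoff rounds up, so any non-empty list has a hot server.
    hot_n = (len(ranked) * hot_percent + 99) // 100
    return {s.url: ("hot" if i < hot_n else "cold") for i, s in enumerate(ranked)}


def poll_interval(label, base_interval_s=300):
    """Hot servers poll at 1x the base interval, cold servers at 3x."""
    return base_interval_s if label == "hot" else 3 * base_interval_s
```

The default of 300 seconds mirrors the GATEWAY_AUTO_REFRESH_INTERVAL=300 value quoted in the commit message.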
This was referenced Apr 13, 2026
Closed
claudia-gray
pushed a commit
that referenced
this pull request
Apr 13, 2026
Summary
Consolidates and extends the rate limiter plugin with a Rust-backed execution engine, carrying forward the Python-side foundation work from #3783 (pluggable algorithms, tenant isolation, correctness fixes) and replacing the Python algorithm/backend implementation with a high-performance Rust engine for both `memory` and `redis` backends.
The Rust engine is the preferred execution path: all algorithm execution, backend dispatch, result aggregation, and response construction live in Rust when the `rate_limiter_rust` PyO3 extension is installed. Python retains ownership of plugin lifecycle, hook integration, request-context extraction, and config validation. The full Python algorithm and backend implementation is retained as a fallback when the Rust extension is unavailable or when `RATE_LIMITER_FORCE_PYTHON=1` is set.
The Rust engine exposes a high-level `check()` API that reduces the Python-Rust boundary to a single call returning pre-built response dicts. This keeps the existing plugin integration model intact while reducing request-path overhead and preserving shared-counter semantics for multi-instance deployments.
Gaps closed
Python foundation
Gap 1 (HIGH): No algorithm choice. Only `fixed_window` was available, with no way to handle boundary bursts or bursty workloads. Fixed by introducing a strategy pattern with three selectable algorithms (`fixed_window`, `sliding_window`, `token_bucket`) configurable via the `algorithm:` field. All three run on both `memory` and `redis` backends; the Redis backend was extended with atomic Lua scripts for `sliding_window` and `token_bucket` to maintain cluster-wide enforcement across instances. `test_fixed_window_burst_at_boundary` (xfail) documents the boundary burst as a known trade-off for users who stay on `fixed_window`; `sliding_window` eliminates it entirely.
Gap 2 (HIGH): `by_tenant` cross-throttle. `tenant_id=None` fell back to a shared `"tenant:default"` bucket, causing unrelated users to throttle each other across the entire deployment. Fixed in two places: (1) the plugin now skips the `by_tenant` check entirely when `tenant_id` is `None` instead of inventing a phantom bucket; (2) `mcpgateway/auth.py` now propagates `request.state.team_id` → `global_context.tenant_id` unconditionally via `_propagate_tenant_id()`, called at all four return paths in `get_current_user()`, so single-team API tokens correctly populate tenant context for rate limiting regardless of the `include_user_info` setting.
Gap 3 (MEDIUM): Sliding window memory leak in `sweep()`. Stale-but-non-empty timestamp lists were never evicted from `_store` because `sweep()` only removed empty lists. Fixed by embedding the window size in each store key (`"{key}:{window}"`) so staleness is computable at sweep time, and rewriting `sweep()` to evict any key where all timestamps fall outside the window.
Rust execution engine
Gap 4 (HIGH): Python hot path overhead. Every request still paid Python-side rate evaluation costs, including per-dimension orchestration, repeated backend calls, and response metadata construction via individual PyO3 attribute accesses. Fixed by introducing a Rust `RateLimiterEngine` with a single `check()` call per hook invocation. The engine builds dimension keys internally from its pre-parsed config, evaluates all active dimensions in one batch, and returns pre-built Python dicts for HTTP headers and plugin metadata, eliminating ~18-25 PyO3 boundary crossings per request (depending on the number of active dimensions).
Gap 5 (HIGH): Rust acceleration was limited to the in-memory backend. Redis-backed deployments, which are the correctness-critical path for multi-instance enforcement, still relied on the Python backend. Fixed by adding a Redis backend to the Rust engine. Rust now owns the Redis connection and executes batch Lua scripts directly, preserving shared-counter behavior across replicas.
Gap 6 (MEDIUM): Rate-limit computation, response shaping, and plugin policy handling were still interleaved in the Python implementation. Fixed by making the Python wrapper policy-only: it validates config, normalizes context, and invokes the engine. Algorithm execution, backend dispatch, result aggregation, and response dict construction now all live in Rust. The Python hot path is reduced to a single `check()` call plus result dispatch.
Gap 7 (LOW): The Rust+Redis path needed end-to-end proof under a real multi-instance deployment. Validated with the backend-correctness load test against 3 gateways behind nginx with Redis shared state. Result: 60 allowed, 60 rate-limited, 49.6% blocked, matching the expected ~50% shared-counter behavior.
Additional hardening
- `_parse_rate` robustness: was `rate.split("/")` with no error handling; now uses `maxsplit=1` + `try/except` with a clear message showing the bad input
- `prompt_pre_fetch` normalisation: `prompt_id` now normalised with `.strip().lower()`, consistent with how `tool_pre_invoke` handles tool names
- `_normalised_by_tool` pre-computed: `by_tool` key lowercasing was re-done on every hook call; now computed once at init in `_validate_config`
- `_propagate_tenant_id()` copies `request.state.team_id` → `global_context.tenant_id` at every `get_current_user()` return path, not just inside `_inject_userinfo_instate()`, which is gated by `include_user_info` (default `False`)
- `Retry-After >= 1`: `int` truncation of `(oldest_ts + window - now)` could produce `0` when the oldest timestamp plus the window rounded down to `int(now)`, telling clients to retry immediately while still over limit; fixed on both memory and Redis paths with `max(1, reset_in)`
- Token bucket `time_to_full`: the memory path used `time_to_full = window` on first request, while Redis derived it from `tokens_needed / refill_rate`, causing metadata divergence between backends (e.g. `60` vs `~6` for a `10/m` limit); both paths now use `tokens_needed / refill_rate`
- When the `rate_limiter_rust` PyO3 extension is installed, the Rust engine handles all rate evaluation; when unavailable or when `RATE_LIMITER_FORCE_PYTHON=1` is set, the plugin falls back to the full Python algorithm and backend implementation
- `RATE_LIMITER_FORCE_PYTHON` env var: allows operators to force the Python backend for A/B comparisons or debugging; Makefile targets `benchmark-rate-limiter-capacity-rust` and `benchmark-rate-limiter-capacity-python` exercise both paths
- `MemoryStore` now acquires one read lock on the outer map + one write lock on the per-key state per steady-state request (previously two outer read locks), eliminating a redundant lock cycle
Architecture
The plugin now delegates all rate evaluation to Rust:
- `memory` → Rust in-process store via the existing sync engine path
- `redis` → Rust async Redis execution via a multiplexed connection + batch Lua execution
- When the `rate_limiter_rust` PyO3 extension is installed, Rust handles all rate evaluation; otherwise the plugin falls back to the Python backend
Plugin internals: request flow, wrapper responsibilities, Rust engine, backends, and response shaping
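A minimal Python sketch of the request flow described above, under stated assumptions. Only the `rate_limiter_rust` module name, the `RATE_LIMITER_FORCE_PYTHON` variable, the `check()` name, and the rule that `by_tenant` is skipped when `tenant_id` is `None` come from this description. `select_engine_name`, `build_dimension_keys`, `PythonFallbackEngine`, the key formats, and the header names are hypothetical and are not the plugin's actual code.

```python
import importlib.util
import os
import time
from typing import Callable, Mapping, Optional


def select_engine_name(
    environ: Mapping[str, str] = os.environ,
    find_spec: Callable[[str], object] = importlib.util.find_spec,
) -> str:
    """Pick 'rust' when the extension is importable, unless Python is forced."""
    if environ.get("RATE_LIMITER_FORCE_PYTHON") == "1":
        return "python"
    return "rust" if find_spec("rate_limiter_rust") is not None else "python"


def build_dimension_keys(
    user: Optional[str], tenant_id: Optional[str], tool: Optional[str]
) -> list[str]:
    """Return one key per active dimension (key formats are illustrative only)."""
    keys = []
    if user:
        keys.append(f"user:{user}")
    # by_tenant is skipped when tenant_id is None, so no shared phantom bucket exists.
    if tenant_id is not None:
        keys.append(f"tenant:{tenant_id}")
    if tool:
        keys.append(f"tool:{tool.strip().lower()}")
    return keys


class PythonFallbackEngine:
    """Hypothetical stand-in engine: fixed window, in-memory, single instance."""

    def __init__(self, limit: int, window_s: int):
        self.limit, self.window_s = limit, window_s
        self._counts: dict[tuple[str, int], int] = {}

    def check(self, keys: list[str], now: Optional[float] = None) -> dict:
        """Evaluate all dimensions in one call and return a pre-built result dict."""
        now = time.time() if now is None else now
        bucket = int(now // self.window_s)
        # Clamp so a blocked client is never told to retry in 0 seconds.
        reset_in = max(1, int((bucket + 1) * self.window_s - now))
        allowed = all(self._counts.get((k, bucket), 0) < self.limit for k in keys)
        if allowed:  # sketch choice: count the request only if every dimension allows it
            for k in keys:
                self._counts[(k, bucket)] = self._counts.get((k, bucket), 0) + 1
        remaining = min(
            (self.limit - self._counts.get((k, bucket), 0) for k in keys),
            default=self.limit,
        )
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, remaining)),
        }
        if not allowed:
            headers["Retry-After"] = str(reset_in)
        return {"allowed": allowed, "headers": headers}
```

In this sketch the wrapper's only per-request work is to build the keys and call `check()` once; counting a request only when every dimension allows it is a choice made for the example, not a statement about how the real engine commits counters.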
Test results
Validation for this PR is intentionally comprehensive: unit tests, Rust tests, micro-benchmarks, hook-path A/B comparisons, multi-instance Redis correctness, sustained load tests, algorithm comparison, and Fyre VM deployment validation are all included below.
The highest-signal correctness and unit-test summaries remain visible. The more detailed benchmark and deployment sections are preserved in expandables so the data stays in the PR without dominating the main narrative.
Test results summary
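- Backend correctness: 49.6% blocked vs expected ~50%, 0 infra failures
- Rust micro-benchmarks: 17-21 ns single-key, 49 ns three-dimension
- Hook-path comparison: 1.7x-1.9x for most 3-dim memory paths, 28x for `sliding_window`
- Redis capacity benchmark: p99 latency 60 ms vs 100 ms, 28% less gateway memory
- Multi-user scale test: 47.6% blocked, 0 infra failures across 28,650 requests
- Algorithm comparison: ~47-49% blocked across ~28,600 requests each
- Unit test suite: 93 plugin tests passed, 47/47 Rust tests passed
1. Backend correctness (multi-instance Redis)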
Multi-instance Redis correctness matched the expected shared-counter behavior: 49.6% blocked, 0 infrastructure failures, and an exact 60 allowed / 60 rate-limited split in the single-user validation run.
Full backend correctness results
Topology: nginx → 3 gateways → shared Redis, 1 user at 60 req/min against a 30/min limit.
Results: 121 total requests, 60 allowed, 60 rate-limited, 49.6% blocked, 0 infrastructure failures; latency 49.7 ms, 76 ms / 88 ms.
Verdict: REDIS BACKEND - limit correctly enforced. The shared Redis counter produced the expected ~50% blocked rate across three gateway instances. Zero infrastructure failures.
2. Rust micro-benchmarks (criterion, Apple Silicon M-series, release build)
Criterion benchmarks on Apple Silicon show nanosecond-scale in-process memory-backend costs: 17-21 ns for single-key operations, 49 ns for three-dimension evaluation, and 16-67 µs in the contention scenarios included here.
Full micro-benchmark results
In-process memory backend, direct `MemoryStore` calls (no PyO3 overhead).
Benchmarks: `fixed_window/single_key`, `token_bucket/single_key`, `sliding_window/single_key`, `fixed_window/three_dims`, `fixed_window/hot_counter`, `fixed_window/blocked_path`, `fixed_window/many_keys_10k`, `fixed_window/concurrent_2t` (`std::thread::scope`), `fixed_window/concurrent_4t`, `fixed_window/concurrent_8t`.
3. Hook-path comparison (Python pytest-benchmark, memory backend)
Measured per-hook Python→Rust round-trip overhead via `pytest-benchmark`, comparing the Rust engine `check()` path against the older Python-only algorithm+backend stack. Numbers reflect the full hook path including PyO3 crossing, dict construction, and Python result dispatch.
Full hook-path comparison results
All benchmarks: `fixed_window`, `sliding_window`, `token_bucket` × single dimension, three dimensions.
Benchmark cases: `fixed_window` / 1 dim, `fixed_window` / 3 dim, `sliding_window` / 1 dim, `sliding_window` / 3 dim, `token_bucket` / 1 dim, `token_bucket` / 3 dim.
The `sliding_window` improvement is the most dramatic because the Python implementation walks and filters timestamp lists in pure Python on every check, while Rust uses a `VecDeque` with amortized O(1) front pops.
4–8. Remaining validation
Redis capacity benchmark, multi-user scale, algorithm comparison, Fyre VM, unit tests
4. Redis capacity benchmark: p99 latency 60 ms (vs 100 ms Python), 28% less gateway RSS. Three gateways, shared Redis, 100 users at 0.25 rps for 5 minutes.
5. Multi-user scale test: 47.6% blocked across 28,650 requests with 0 infrastructure failures. 100 concurrent users, 5-minute sustained load.
6. Algorithm comparison: All three algorithms (`fixed_window`, `sliding_window`, `token_bucket`) converged to ~47-49% blocked across ~28,600 requests each, confirming equivalent enforcement behavior.
7. Fyre VM validation: Rust engine validated on x86 Fyre VM deployment.
8. Unit test suite: 93 Python plugin tests passed, 47/47 Rust tests passed.
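The blocked rates near 50% follow from driving a limit at about twice its rate. A deterministic, in-memory Python sketch of the three algorithms can illustrate that arithmetic. It is a toy model under stated assumptions, not the plugin's Python or Rust implementation: the class names, `parse_rate`, `blocked_fraction`, and the simulated clock are hypothetical, while the `maxsplit=1` parsing and the `max(1, ...)` clamp mirror the hardening notes in this description.

```python
import math
from collections import deque

UNITS = {"s": 1, "m": 60, "h": 3600}  # assumed unit suffixes for this sketch


def parse_rate(rate: str) -> tuple[int, int]:
    """Parse 'count/unit' (e.g. '30/m') into (limit, window_seconds)."""
    try:
        count, unit = rate.split("/", maxsplit=1)
        return int(count), UNITS[unit.strip().lower()]
    except (ValueError, KeyError) as exc:
        raise ValueError(f"invalid rate {rate!r}; expected e.g. '30/m'") from exc


class FixedWindow:
    def __init__(self, limit, window):
        self.limit, self.window, self.counts = limit, window, {}

    def allow(self, now):
        bucket = int(now // self.window)
        if self.counts.get(bucket, 0) >= self.limit:
            return False, max(1, int((bucket + 1) * self.window - now))
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True, 0


class SlidingWindow:
    def __init__(self, limit, window):
        self.limit, self.window, self.stamps = limit, window, deque()

    def allow(self, now):
        # Drop timestamps that have left the window; popleft is O(1).
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            # Clamp so Retry-After is never 0 while still over the limit.
            return False, max(1, int(self.stamps[0] + self.window - now))
        self.stamps.append(now)
        return True, 0


class TokenBucket:
    def __init__(self, limit, window):
        self.capacity, self.rate = limit, limit / window
        self.tokens, self.last = float(limit), None

    def allow(self, now):
        if self.last is not None:
            refill = (now - self.last) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens < 1:
            return False, max(1, math.ceil((1 - self.tokens) / self.rate))
        self.tokens -= 1
        return True, 0


def blocked_fraction(limiter, requests_per_second, seconds):
    """Replay evenly spaced requests on a simulated clock; return the blocked share."""
    total = requests_per_second * seconds
    blocked = 0
    for i in range(total):
        allowed, _retry_after = limiter.allow(i / requests_per_second)
        blocked += not allowed
    return blocked / total
```

With a `30/m` limit driven at 1 request per second for 300 simulated seconds, the fixed and sliding windows in this toy model block exactly half of the requests. The token bucket blocks somewhat less over the same run because it starts full and admits an initial burst; its blocked share approaches one half as the run lengthens.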