
test(rate-limiter): toggle convergence-time probe + TTL-expiry cycle #4511

Draft

gandhipratik203 wants to merge 7 commits into main from test/rate-limiter-toggle-convergence

Conversation

gandhipratik203 (Collaborator) commented Apr 29, 2026

Summary

Two integration tests covering rate-limiter behaviour under a runtime mode toggle via the admin API. Marked draft because they raise design questions about the rate limiter's toggle-latency contract that need team input before a final shape lands.

The rate limiter is the only stateful plugin we ship — counter state lives in Redis, mode is embedded in a per-worker cached plugin instance. That makes its response to a runtime mode toggle eventually consistent, not instantaneous, in a way the other supported plugins (Output Length Guard, Secrets Detection) aren't.

What's in this PR

Two tests, both currently passing:

TestRateLimiterToggleAfterTTLExpiry — correctness contract

tests/integration/test_rate_limiter_dynamic_behavior.py

Three-phase cycle: enforce → wait past Redis counter TTL → disabled → wait → enforce. Validates that re-enforcement works after a quiet period long enough for counters to age out. Tagged @pytest.mark.slow because of two ~75s sleeps.
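A condensed sketch of the cycle, assuming a _set_plugin_mode admin helper and a simplified burst helper that returns allowed/blocked counts (illustrative only; the real _send_tool_burst takes server_id and tool_name):

import time

TTL_WAIT_S = 75  # assumed to exceed the Redis counter-window TTL

def test_toggle_after_ttl_expiry():
    _set_plugin_mode("enforce")
    assert _send_tool_burst(n=10).blocked > 0   # phase 1: limiter enforces
    time.sleep(TTL_WAIT_S)                      # counters age out of Redis
    _set_plugin_mode("disabled")
    assert _send_tool_burst(n=10).blocked == 0  # phase 2: nothing blocked
    time.sleep(TTL_WAIT_S)
    _set_plugin_mode("enforce")
    assert _send_tool_burst(n=10).blocked > 0   # phase 3: re-enforcement works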

TestRateLimiterToggleLiveState::test_measure_convergence_time_for_mode_toggle — latency probe

tests/integration/test_rate_limiter_toggle_live_state.py

Flips mode, then polls every second until traffic actually reflects the new mode, recording the elapsed time and the per-poll trajectory. No threshold-based assertion on the convergence time — the test fails only if convergence never happens within MAX_CONVERGENCE_S. Useful both as a regression pin (sudden slowdown caught) and as a measurement tool.
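In outline, the probe loop looks like this (a sketch; result shape and constants simplified from the actual test):

import time

MAX_CONVERGENCE_S = 120  # assumed ceiling; exceeding it is the only failure mode

def measure_convergence(expected_blocked):
    start = time.monotonic()
    trajectory = []
    while (elapsed := time.monotonic() - start) < MAX_CONVERGENCE_S:
        r = _send_tool_burst(n=6)               # simplified burst helper
        trajectory.append((elapsed, r.allowed, r.blocked))
        if r.blocked == expected_blocked:       # traffic reflects the new mode
            return elapsed, trajectory
        time.sleep(1.0)                         # poll every second
    raise AssertionError(f"no convergence within {MAX_CONVERGENCE_S}s")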

Sample observation

One local run, single replica × 24 workers, image built from current main:

enforce → disabled : converged in 22.4s
  t= 0.00s   4/6 allowed,  2/6 blocked
  t= 2.84s   2/6 allowed,  4/6 blocked
  t= 7.20s   1/6 allowed,  5/6 blocked  (got worse before better)
  t=11.30s   0/6 allowed,  6/6 blocked
  t=15.42s   5/6 allowed,  1/6 blocked  (oscillates)
  t=20.86s   4/6 allowed,  2/6 blocked
  t=22.37s   6/6 allowed,  0/6 blocked  ← converged

disabled → enforce : converged in 9.7s

The ~22s disabled-convergence with mid-trajectory oscillation is suspicious — it looks like pub/sub-driven invalidation reaches most workers within a few seconds, but a fraction takes much longer to converge, possibly relying on the 30s TTL-based cache safety net rather than the pub/sub eviction. Convergence happens; just not as fast as expected.

Why this is different from stateless plugins

Stateless plugins decide based only on the current request — the moment a worker's mode flips to disabled, every subsequent request reflects it. Rate limiter decides based on (mode, persistent counter state). When mode flips, the counter is still hot in Redis, so any worker still serving requests under stale-mode-enforce will still block them. The convergence time is a property of how long it takes every worker's plugin-manager cache to rebuild with the new mode.
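Schematically (pure illustration, not the plugins' real code):

def violates(request):                     # stand-in per-request check
    return len(request) > 100

def stateless_decision(cached_mode, request):
    # Output Length Guard / Secrets Detection: the verdict depends only on
    # the current request, so nothing survives a toggle
    return cached_mode == "enforce" and violates(request)

def rate_limiter_decision(cached_mode, redis_count, limit):
    # the verdict couples the per-worker cached mode (stale until that
    # worker's plugin-manager cache rebuilds) with counter state that
    # persists in Redis across the toggle
    return cached_mode == "enforce" and redis_count > limit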

Open design questions for review

  1. Toggle latency expectation. What's the SLA we want to commit to? "Within N seconds" feels right, but what's N? Operators flipping a plugin off in an incident response need to know.
  2. Counter window on disabled → enforce. Today we don't reset the window — flipping back to enforce on an already-hot bucket immediately blocks. Is that the right call? Some operators might expect a fresh window on re-enable. Either is defensible.
  3. Should disabled mode actually reset Redis state? Today it stops checking counters but doesn't clear them. Same question from the other direction.
  4. Eventual-consistency contract for stateful plugins more generally. Rate limiter is the only stateful one today, but cpex-plugins#49 has follow-ups (quota accounting, cascade limits) that would be similarly stateful. Worth deciding the convergence-time SLA at the framework level once.

What this PR is not

Not a code fix. The rate limiter works; the question is how it behaves under toggle and what we want that behaviour to be. The tests document current behaviour and provide a measurement harness for any future fix.

How to run

Prerequisites

  • Local docker-compose stack from this repo brought up via make compose-up (3 gateway replicas behind nginx on :8080, Redis, Postgres, fast-test-server registered).
  • Gateway image built from current main — run make docker-build if your local image is stale.
  • Default dev credentials and JWT secret from your local .env (the .env.example defaults work for this test).
  • uv sync --extra plugins so cpex-rate-limiter is importable in the test env.

Run

# convergence probe — ~1 minute, emits the trajectory on every run
uv run pytest tests/integration/test_rate_limiter_toggle_live_state.py \
  --with-integration -s -v

# TTL-expiry cycle — ~3 minutes due to two ~75s sleeps; tagged @pytest.mark.slow
uv run pytest \
  tests/integration/test_rate_limiter_dynamic_behavior.py::TestRateLimiterToggleAfterTTLExpiry \
  --with-integration -s -v

Both tests skip cleanly if http://localhost:8080/health isn't reachable.
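The guard is effectively a module-level skipif, along these lines (httpx chosen for illustration; any HTTP client works):

import httpx
import pytest

def _gateway_up() -> bool:
    try:
        return httpx.get("http://localhost:8080/health", timeout=2.0).status_code == 200
    except httpx.HTTPError:
        return False

pytestmark = pytest.mark.skipif(not _gateway_up(), reason="gateway stack not reachable on :8080")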


Update — empirical evidence and layered coverage

Since the original draft, this branch has grown into a layered toggle-behaviour story. This update is appended rather than edited in; the original sections remain the right snapshot of the problem statement and findings at the time the PR was opened.

Layered toggle-behaviour coverage now on this branch

| Layer | Pin | File |
| --- | --- | --- |
| A | reload_tenant → manager.shutdown invocation | tests/unit/mcpgateway/plugins/framework/test_tenant_plugin_manager_tool_scoped.py |
| A-init | reload_tenant → new manager.initialize invocation | same |
| B-0 | manager-mediated plugin wipe via direct factory.reload_tenant | tests/integration/test_plugin_manager_wipe_on_disable.py |
| B-0.5 | publisher↔subscriber round-trip via real Redis pub/sub | same |
| C (existing) | end-to-end convergence probe via HTTP+MCP | tests/integration/test_rate_limiter_toggle_live_state.py |
| Multi-replica probe | end-to-end convergence via Redis pub/sub only, no HTTP amplification | tests/integration/test_rate_limiter_toggle_via_redis_pubsub.py |

Empirical answer to design question #3

Question #3 asked whether disabled mode should clear Redis counter state. The answer landed on "yes, clear them," and the implementation lives on cpex-plugins branch feat/rate-limiter-wipe-on-disable-only. Running the new multi-replica pub/sub probe against a wipe-enabled gateway image (3 replicas × 24 workers, plain docker-compose stack):

Convergence trajectory (counter_exists per poll):
  t=  0.00s  counter_exists=True
  t=  0.11s  counter_exists=False  ← converged

✓ Wipe converged in 0.11s

vs. the ~22s baseline observed in the existing live-state probe at the top of this PR — a ~170× improvement on the disabled-direction convergence. Note the wipe doesn't accelerate the cache rebuild; it makes the rebuild's user-visible latency irrelevant — a stale-mode worker with empty counters cannot block any request.

Pre-conditions discovered during empirical runs

Two things had to be in place for the wipe path to actually fire from a clean stack — worth flagging on the cpex-plugins wipe-on-disable PR description as deployment caveats:

  1. Plugin must be loaded for the wipe to fire. plugins/config.yaml ships rate-limiter with mode: "disabled", and the framework loader skips disabled plugins. The probe first toggles to enforce via the admin API so the plugin is genuinely loaded.
  2. At least one cached manager required. factory.invalidate_all() iterates only cached managers; a fresh gateway with zero traffic has zero cached managers, so the broadcast triggers no reload_tenant and no plugin.shutdown. The probe sends a 20-request warm-up burst before the disable to ride gunicorn's load balancer across multiple workers. Both steps are condensed in the sketch below.
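Condensed, using the helper signatures quoted in the commits below (the wait helper and constant are hypothetical):

_set_plugin_mode("enforce")                  # 1. load the plugin (ships disabled in config.yaml)
_send_tool_burst(server_id, tool_name, 20)   # 2. warm managers across the 72 workers
_set_plugin_mode("disabled")                 # broadcast now reaches cached managers; wipe fires
elapsed = wait_until_counter_gone()          # hypothetical poll helper, cf. trajectory above
assert elapsed < MAX_CONVERGENCE_S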

Status of original design questions

Question #3 is answered empirically above (yes: wipe counter state on disable). Questions #1, #2, and #4 remain open for review.
Two complementary integration tests covering rate-limiter behaviour when
the plugin's mode is toggled at runtime via the admin API. The rate
limiter is the only stateful plugin in the gateway's catalog (counter
state lives in Redis; mode is embedded into a per-worker cached plugin
instance), so its response to a runtime mode toggle is eventually
consistent rather than instantaneous. These tests describe both halves
of that contract.

TestRateLimiterToggleAfterTTLExpiry (in test_rate_limiter_dynamic_behavior.py)
is the *correctness* test: enforce -> wait past Redis counter TTL ->
disabled -> wait -> enforce. Validates that re-enforcement works after a
quiet period long enough for counters to age out. Tagged @pytest.mark.slow
because of the two ~75s sleeps. Acts as a regression pin for the happy
path.

TestRateLimiterToggleLiveState::test_measure_convergence_time_for_mode_toggle
(new file) is the *latency* probe: flip enforce -> disabled, then poll
every second until traffic actually reflects the new mode, recording the
elapsed time. Same pattern for disabled -> enforce. No threshold-based
assertion on the convergence time — the test only fails if convergence
never happens within MAX_CONVERGENCE_S. Records the per-poll observation
trajectory so the actual convergence path (and any oscillation) is
visible in the test output. Useful both as a regression pin (sudden
slowdown caught) and as a measurement tool for informing a future
toggle-latency SLA on the rate-limiter plugin.

Also fixes _send_tool_burst in test_rate_limiter_dynamic_behavior.py to
do an MCP initialize handshake (required by the streamable HTTP
transport), set the MCP-Protocol-Version + Accept: text/event-stream
headers, and use the trailing-slash form of /servers/{id}/mcp/ to avoid
a 307 redirect.
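For reference, the handshake shape (a sketch against the MCP streamable HTTP transport; auth elided, protocol version illustrative, server_id/tool_name supplied by the test):

import httpx

url = f"http://localhost:8080/servers/{server_id}/mcp/"   # trailing slash avoids the 307
headers = {"Accept": "application/json, text/event-stream",
           "MCP-Protocol-Version": "2025-06-18"}

with httpx.Client(headers=headers) as c:
    init = c.post(url, json={"jsonrpc": "2.0", "id": 1, "method": "initialize",
                             "params": {"protocolVersion": "2025-06-18", "capabilities": {},
                                        "clientInfo": {"name": "probe", "version": "0"}}})
    c.headers["mcp-session-id"] = init.headers["mcp-session-id"]   # session issued by server
    c.post(url, json={"jsonrpc": "2.0", "method": "notifications/initialized"})
    c.post(url, json={"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                      "params": {"name": tool_name, "arguments": {}}})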

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
When an operator flips a plugin's mode via the admin API, the rebuild
path runs through TenantPluginManagerFactory.reload_tenant(), which
evicts the cached manager, awaits old.shutdown() (propagating to
plugin.shutdown(), where stateful plugins react), and builds a new
manager.

The "new manager built" step is already pinned (test_factory_reload_-
tool_context).  The "old.shutdown() invoked" step is *not* — only its
exception-handling behaviour is.  test_factory_reload_shutdown_-
exception and test_factory_build_manager_old_shutdown_fails would
both pass if reload_tenant silently dropped the shutdown call
entirely.  test_factory_shutdown_propagates_exceptions pins shutdown
invocation, but on the whole-factory shutdown path, not the toggle
path.

A regression at this boundary would silently break the operator-
visible toggle contract for stateful plugins — wipe-on-disable
(#4576), audit logging on disable, etc. would
no-op despite every surrounding test staying green.  This commit
fills that gap with a single test that wraps the manager's shutdown
in an AsyncMock(wraps=...) spy and asserts it was awaited exactly
once on reload.
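A minimal shape of that test (fixture plumbing elided; exact factory method signatures assumed):

import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio
async def test_reload_tenant_invokes_old_manager_shutdown(factory):
    old = await factory.get_manager("tenant-1")      # build and cache a manager
    old.shutdown = AsyncMock(wraps=old.shutdown)     # spy preserving real behaviour
    await factory.reload_tenant("tenant-1")          # the operator-toggle path
    old.shutdown.assert_awaited_once()               # fails if reload drops the call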

Layered toggle-behaviour coverage on this branch (after this commit):
  Layer A — framework boundary (reload_tenant lifecycle)   [unit, this commit]
  Layer C — convergence measurement harness                [integration, existing]

Layer B (plugin-side propagation: manager.shutdown -> plugin.shutdown
fires for the rate limiter under the manager) and Layer D (convergence
threshold under wipe-on-disable) will land as follow-on commits on
this same branch as their prerequisites are met.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Symmetric counterpart to the previous commit
(test_factory_reload_tenant_invokes_old_manager_shutdown).  That commit
pinned the disable-direction boundary — old manager's shutdown fires
on reload.  This commit pins the enable-direction boundary — new
manager's initialize fires on reload.

test_factory_reload_tool_context already proves a new manager
instance is constructed (manager2 is not manager1), but a regression
that constructed without initializing would still pass that test —
instances exist, they would just be unable to dispatch hooks.
Without initialize firing, every plugin under the new manager would
silently no-op after a toggle to enforce: hooks wouldn't run,
mode-change-on-init logic (e.g. the rate limiter re-arming its
counter window after wipe-on-disable) wouldn't fire, and the
operator-visible "I just re-enabled this plugin" promise would
silently break.

Captures via an instance list rather than AsyncMock because the
assertion needs to identify *which* manager was initialized (must
match manager2), not just count calls — the initial get_manager
above already triggers one initialize that we deliberately exclude
by installing the spy *after* it.
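In outline (manager class name and fixtures assumed):

initialized = []
real_initialize = TenantPluginManager.initialize     # class name assumed

async def spying_initialize(self):
    initialized.append(self)                         # record WHICH instance initialized
    return await real_initialize(self)

manager1 = await factory.get_manager("tenant-1")     # fires one initialize we exclude
monkeypatch.setattr(TenantPluginManager, "initialize", spying_initialize)
await factory.reload_tenant("tenant-1")
manager2 = await factory.get_manager("tenant-1")
assert initialized == [manager2]                     # the new manager, exactly once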

Layered toggle-behaviour coverage on this branch (after this commit):
  Layer A  — framework boundary, shutdown side    [unit, prev commit]
  Layer Ai — framework boundary, initialize side  [unit, this commit]
  Layer C  — convergence measurement harness      [integration, existing]

Layer B (single-replica integration: admin toggle -> rate-limiter
plugin shutdown fires -> counter cleared) and Layer D (convergence
threshold under wipe-on-disable) are next on the ladder.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@gandhipratik203 gandhipratik203 marked this pull request as ready for review May 3, 2026 07:59
… real plugin + Redis

Adds the integration-test rung between the framework-boundary unit
tests in test_tenant_plugin_manager_tool_scoped.py (which use stub
plugins) and the full HTTP-driven tests in
test_rate_limiter_dynamic_behavior.py (which require a running gateway
with admin auth).

This test loads a real cpex-rate-limiter plugin into a real
TenantPluginManagerFactory against a real Redis, deposits counter
keys via manager.invoke_hook, simulates an operator mode toggle to
"disabled" by setting the framework's mode key in Redis directly,
then drives factory.reload_tenant and asserts the counter keys were
wiped — proving the rebuild path reaches the plugin's wipe-on-disable
code path end-to-end without HTTP, pubsub, or admin-handler
involvement.

What this pins (the boundary between Layer A and the HTTP layer):
  * factory.reload_tenant -> manager.shutdown -> registry.shutdown ->
    plugin.shutdown all fire in sequence against a real plugin.
  * The rate-limiter plugin's wipe-on-disable code path is reached,
    reads the mode key, and clears its counters from Redis.
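The flow, condensed (hook name, payload, mode-key and counter-key patterns are assumptions; the real test derives them from the plugin config):

import redis.asyncio as aioredis

r = aioredis.from_url("redis://localhost:16380")           # the fixture's hermetic Redis
manager = await factory.get_manager("tenant-1")            # real factory, real plugin
await manager.invoke_hook("tool_pre_invoke", payload)      # deposits counter keys
await r.set("plugins:mode:RateLimiterPlugin", "disabled")  # simulate the toggle directly
await factory.reload_tenant("tenant-1")                    # shutdown chain reaches the plugin
assert not await r.keys("ratelimit:*")                     # wipe-on-disable cleared counters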

Out of scope (covered by existing tests at adjacent layers):
  * Admin HTTP -> reload_tenant pathway.
  * Pubsub-driven multi-worker fan-out.
  * Convergence timing across multiple workers.

Layered toggle-behaviour coverage on this branch (after this commit):
  Layer A   — framework boundary, shutdown side    [unit, prev commit]
  Layer Ai  — framework boundary, initialize side  [unit, prev commit]
  Layer B-0 — manager-mediated plugin wipe         [integration, this commit]
  Layer C   — convergence measurement harness      [integration, existing]

Layer D (convergence threshold under wipe-on-disable) is the next
rung once empirical validation against a wipe-enabled gateway image
confirms the hypothesis.

CI behaviour: this test lives in tests/integration/ and is therefore
auto-skipped by the pytest_integration_mark plugin unless the runner
passes --with-integration.  CI today does NOT pass that flag, so the
test does not run automatically on PR CI — same as every other
integration test in this directory.  It is exercised locally by
developers who explicitly opt in.  Once the cpex-plugins
wipe-on-disable PR merges and the wipe-enabled wheel is published,
local --with-integration runs will start exercising the wipe
assertions; CI inclusion would require wiring up an integration-test
job, which is framework-level work tracked separately.

Skip-guards:
  * Skips when docker is unavailable (cannot stand up the test Redis).
  * Skips when the installed cpex-rate-limiter wheel does not include
    the wipe-on-disable code path (no _wipe_my_counters attribute on
    the plugin class) — i.e. when running against the PyPI baseline
    before the cpex-plugins wipe PR merges.

Test infrastructure additions:
  * tests/integration/fixtures/configs/rate_limiter_redis_only.yaml
    — minimal plugin config loading just the rate limiter with
    backend=redis, sourcing redis_url via "{{ env.REDIS_URL }}"
    (Jinja substitution at load time; matches the pattern the
    gateway itself uses for its own Redis URL).
  * Dedicated host port 16380 so the test container is hermetic and
    never collides with a developer's local Redis on 6379 or with
    other smoke-test stacks.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Adds B-0.5 — the next rung past B-0 — which exercises the real
publish/subscribe round-trip that B-0 deliberately skipped.  B-0
sets the mode key in Redis directly and calls factory.reload_tenant
directly; this test calls publish_plugin_mode_change (the function
the admin handler invokes), captures the resulting message off the
real Redis channel, and feeds it through _handle_invalidation_message
(the function the gateway's listener calls).

What B-0.5 pins that B-0 cannot:

  * The publisher's actual wire format on the channel — channel name,
    JSON keys, value types — matches what _handle_invalidation_message
    accepts and routes to invalidate_all_plugin_managers ->
    factory.invalidate_all -> reload_tenant -> manager.shutdown ->
    plugin.shutdown -> wipe.
  * A regression where publish_plugin_mode_change and the handler
    drift on format (e.g. one renames a JSON key without the other)
    would still pass every existing mocked-JSON pubsub test, because
    those tests construct the message bytes themselves.  This test
    PUBLISHes via the real publisher and receives off the real
    channel, so any drift breaks the round-trip.
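Condensed round-trip (channel name and handler signature assumed; publish_plugin_mode_change and _handle_invalidation_message are the real functions under test):

import redis.asyncio as aioredis

INVALIDATION_CHANNEL = "plugins:invalidate"              # channel name assumed

r = aioredis.from_url("redis://localhost:16380")
pubsub = r.pubsub()
await pubsub.subscribe(INVALIDATION_CHANNEL)
await publish_plugin_mode_change("RateLimiterPlugin", "disabled")   # real publisher
msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=5.0)
await _handle_invalidation_message(msg["data"])          # real listener entry point
assert not await r.keys("ratelimit:*")                   # wipe fired through the chain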

Wiring fix discovered while writing this test: the framework's
shared-redis access goes through a dependency-inversion shim
(set_shared_redis_provider in mcpgateway/plugins/framework/_redis.py),
not via mcpgateway.utils.redis_client._client.  Documented in the
test's comment so future readers don't go down the wrong path.

Layered toggle-behaviour coverage on this branch (after this commit):
  Layer A    — framework boundary, shutdown side    [unit]
  Layer Ai   — framework boundary, initialize side  [unit]
  Layer B-0  — manager-mediated plugin wipe         [integration]
  Layer B-0.5 — pubsub round-trip wipe              [integration, this commit]
  Layer C    — convergence measurement harness      [integration, existing]

Layer D (convergence threshold under wipe-on-disable) remains gated
on empirical validation against a wipe-enabled gateway image.

CI behaviour and skip-guards: identical to B-0 (auto-skipped by the
pytest_integration_mark plugin unless --with-integration is passed;
skips when cpex-rate-limiter wheel lacks the wipe-on-disable code
path; runs locally against a wipe-enabled wheel).

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…lti-replica wipe probe via Redis pubsub

Two coupled changes:

1. Align the test fixtures' plugin name with the gateway's own
   plugins/config.yaml convention.  Before this commit, the
   wipe-on-disable yaml fixture and the boundary tests used
   name="RateLimiter" while the gateway's production plugin uses
   name="RateLimiterPlugin", creating two disjoint mode-key
   namespaces.  Touches the yaml fixture and the explicit mode-key
   writes in B-0 / B-0.5.  No behavioural change to either test —
   they were internally consistent before; this just makes them
   match the gateway convention.

2. Add tests/integration/test_rate_limiter_toggle_via_redis_pubsub.py
   — a multi-replica wipe-on-disable convergence probe that bypasses
   HTTP entirely.  Talks only to Redis: SETs the mode key, PUBLISHes
   the invalidation frame, then polls until the counter key
   disappears.  Records elapsed time as the convergence measurement.
   Companion to test_rate_limiter_toggle_live_state.py (Layer C),
   which drives the same toggle via HTTP+MCP and is therefore
   amplified by the STREAMABLEHTTP transport.  This new test
   measures the wipe-on-disable claim itself, not "the claim plus
   measurement overhead."  (Core loop sketched at the end of this
   commit message.)

   Skip behaviour: skips when the gateway stack isn't running or
   Redis isn't reachable.  Does not skip when wipe-on-disable code
   is missing from the running gateway — fails with a clear
   timeout message instead, which is the right semantics for a
   deployment that has wipe enabled.

   Known caveat (not fixed by this commit, surfaced empirically
   while testing): factory.invalidate_all() iterates *cached*
   managers only.  A fresh gateway with no traffic has zero
   cached managers, so the wipe-on-disable path is unreachable
   until a request warms a manager for some context.  Worth
   flagging in the cpex-plugins wipe-on-disable PR description as
   a deployment caveat.  Following commits will add a one-request
   warm-up step to the test so it can run from a clean stack;
   this commit lands the test as written and the rename together.
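The probe's core, roughly (key names, frame payload, and the ceiling are assumptions; the real frame format is owned by publish_plugin_mode_change):

import json
import time
import redis

MODE_KEY = "plugins:mode:RateLimiterPlugin"        # assumed
INVALIDATION_CHANNEL = "plugins:invalidate"        # assumed
COUNTER_KEY = "ratelimit:demo"                     # assumed
MAX_CONVERGENCE_S = 30                             # generous given the ~0.11s observation
frame = {"plugin": "RateLimiterPlugin", "mode": "disabled"}   # keys assumed

r = redis.Redis(host="localhost", port=6379)
r.set(MODE_KEY, "disabled")
r.publish(INVALIDATION_CHANNEL, json.dumps(frame))
start = time.monotonic()
while r.exists(COUNTER_KEY):                       # poll until the wipe lands
    if time.monotonic() - start > MAX_CONVERGENCE_S:
        raise AssertionError("wipe never converged; is wipe-on-disable in the image?")
    time.sleep(0.05)
print(f"wipe converged in {time.monotonic() - start:.2f}s")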

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
araujof (Member) commented May 3, 2026

DO NOT MERGE before #3754 is merged.
It touches files under mcpgateway/plugins.

…stack

Adds the two pre-condition steps the test needed to actually exercise
the wipe path against a fresh gateway, discovered empirically while
running the test against the wipe-enabled multi-replica stack:

  1. _set_plugin_mode("enforce") via the existing admin-API helper.
     plugins/config.yaml ships with mode=disabled for the rate limiter,
     and the framework loader does not instantiate disabled plugins
     under any cached manager.  No plugin instance ⇒ no shutdown to
     fire ⇒ no wipe path to run, regardless of how many subscribers
     the broadcast reaches.  Toggling to enforce first writes the
     mode key, publishes the invalidation frame, and primes every
     worker's local override map so the next manager-build
     instantiates the plugin in enforce mode.

  2. _send_tool_burst(server_id, tool_name, 20).  factory.invalidate_all()
     iterates only *cached* managers.  Each gunicorn worker has
     its own factory; without prior traffic on a worker, that
     factory's _managers dict is empty and the broadcast triggers no
     reload_tenant.  20 MCP tools/call requests ride gunicorn's load
     balancer across many of the 72 workers (3 replicas × 24 workers),
     so multiple workers cache a manager — one warm worker firing its
     plugin's shutdown is enough since the wipe runs SCAN+DEL against
     shared Redis, but more warm workers raises the floor against
     transient races.  These are amplified by the STREAMABLEHTTP
     transport but are a one-off pre-condition; the convergence
     measurement loop below them stays HTTP-free.

Empirically confirmed against a multi-replica wipe-enabled stack:
the wipe converges in ~0.11s — vs. the ~22s baseline observed in
this PR's existing live-state probe.  ~170x improvement, consistent
with the hypothesis that wipe-on-disable doesn't speed up the cache
rebuild but makes the rebuild's user-visible latency irrelevant
(stale-mode workers with empty counters cannot block any request).

It took two iterations to land this:
  * 1-warm-up run: didn't fire (only 1 of 72 workers warmed; not
    determinative on its own — see below).
  * 20-warm-ups + enforce-pre-condition run: ✓ converged in ~0.11s.

The 1-warm-up failure was almost certainly because mode=disabled in
plugins/config.yaml prevented the plugin from being loaded under
the warmed manager (path B in the iterative diagnosis), not because
1 warm worker was insufficient.  The enforce pre-condition was the
load-bearing fix; the bump from 1 to 20 warm-ups is defensive
against load-balancing races.

Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
@gandhipratik203 gandhipratik203 marked this pull request as draft May 3, 2026 12:30
gandhipratik203 (Collaborator, Author) commented:
@araujof Noted, won't merge with mcpgateway/plugins changes. I'll remove the files under mcpgateway/plugins so the PR only contains the tests/integration changes.
