test(rate-limiter): toggle convergence-time probe + TTL-expiry cycle #4511
Draft
gandhipratik203 wants to merge 7 commits into main from
Conversation
Two complementary integration tests covering rate-limiter behaviour when
the plugin's mode is toggled at runtime via the admin API. The rate
limiter is the only stateful plugin in the gateway's catalog (counter
state lives in Redis; mode is embedded into a per-worker cached plugin
instance), so its response to a runtime mode toggle is eventually
consistent rather than instantaneous. These tests describe both halves
of that contract.
TestRateLimiterToggleAfterTTLExpiry (in test_rate_limiter_dynamic_behavior.py)
is the *correctness* test: enforce -> wait past Redis counter TTL ->
disabled -> wait -> enforce. Validates that re-enforcement works after a
quiet period long enough for counters to age out. Tagged @pytest.mark.slow
because of the two ~75s sleeps. Acts as a regression pin for the happy
path.
TestRateLimiterToggleLiveState::test_measure_convergence_time_for_mode_toggle
(new file) is the *latency* probe: flip enforce -> disabled, then poll
every second until traffic actually reflects the new mode, recording the
elapsed time. Same pattern for disabled -> enforce. No threshold-based
assertion on the convergence time — the test only fails if convergence
never happens within MAX_CONVERGENCE_S. Records the per-poll observation
trajectory so the actual convergence path (and any oscillation) is
visible in the test output. Useful both as a regression pin (sudden
slowdown caught) and as a measurement tool for informing a future
toggle-latency SLA on the rate-limiter plugin.
Also fixes _send_tool_burst in test_rate_limiter_dynamic_behavior.py to
do an MCP initialize handshake (required by the streamable HTTP
transport), set the MCP-Protocol-Version + Accept: text/event-stream
headers, and use the trailing-slash form of /servers/{id}/mcp/ to avoid
a 307 redirect.
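The fixed helper's shape, as a hedged sketch using stdlib urllib (the real helper may use a different HTTP client; constant and function names here are illustrative, and session-id plumbing is elided):

```python
import json
import urllib.request

MCP_HEADERS = {
    # Streamable HTTP transport requires the SSE accept type and an
    # explicit protocol version header.
    "Accept": "application/json, text/event-stream",
    "MCP-Protocol-Version": "2025-03-26",
    "Content-Type": "application/json",
}

def mcp_url(base: str, server_id: str) -> str:
    # Trailing slash avoids the gateway's 307 redirect.
    return f"{base}/servers/{server_id}/mcp/"

def mcp_request(url: str, method: str, params: dict, req_id: int) -> urllib.request.Request:
    # Build a JSON-RPC 2.0 POST with the MCP headers attached.
    body = json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params}).encode()
    return urllib.request.Request(url, data=body, headers=MCP_HEADERS)

def send_tool_burst(base: str, server_id: str, tool_name: str, n: int) -> list:
    url = mcp_url(base, server_id)
    # Initialize handshake first; required by the streamable HTTP transport.
    init = mcp_request(url, "initialize", {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "burst-probe", "version": "0"},
    }, 0)
    statuses = []
    with urllib.request.urlopen(init) as resp:
        statuses.append(resp.status)
    for i in range(n):
        call = mcp_request(url, "tools/call",
                           {"name": tool_name, "arguments": {}}, i + 1)
        with urllib.request.urlopen(call) as resp:
            statuses.append(resp.status)
    return statuses
```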
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
When an operator flips a plugin's mode via the admin API, the rebuild path runs through TenantPluginManagerFactory.reload_tenant(), which evicts the cached manager, awaits old.shutdown() (propagating to plugin.shutdown(), where stateful plugins react), and builds a new manager.
The "new manager built" step is already pinned (test_factory_reload_tool_context). The "old.shutdown() invoked" step is *not* — only its exception-handling behaviour is. test_factory_reload_shutdown_exception and test_factory_build_manager_old_shutdown_fails would both pass if reload_tenant silently dropped the shutdown call entirely. test_factory_shutdown_propagates_exceptions pins shutdown invocation, but on the whole-factory shutdown path, not the toggle path.
A regression at this boundary would silently break the operator-visible toggle contract for stateful plugins — wipe-on-disable (#4576), audit logging on disable, etc. would no-op despite every surrounding test staying green. This commit fills that gap with a single test that wraps the manager's shutdown in an AsyncMock(wraps=...) spy and asserts it was awaited exactly once on reload.
Layered toggle-behaviour coverage on this branch (after this commit):
Layer A — framework boundary (reload_tenant lifecycle) [unit, this commit]
Layer C — convergence measurement harness [integration, existing]
Layer B (plugin-side propagation: manager.shutdown -> plugin.shutdown fires for the rate limiter under the manager) and Layer D (convergence threshold under wipe-on-disable) will land as follow-on commits on this same branch as their prerequisites are met.
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
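The spy pattern the commit describes can be sketched against a minimal stand-in factory (FakeManager/FakeFactory are illustrative, not the real TenantPluginManagerFactory; only the evict-then-shutdown shape of reload_tenant is modelled):

```python
import asyncio
from unittest.mock import AsyncMock

class FakeManager:
    """Minimal stand-in for a tenant plugin manager."""
    async def shutdown(self):
        pass

class FakeFactory:
    """Illustrative stand-in for the factory's cache + reload_tenant."""
    def __init__(self):
        self._managers = {}

    async def get_manager(self, tenant):
        if tenant not in self._managers:
            self._managers[tenant] = FakeManager()
        return self._managers[tenant]

    async def reload_tenant(self, tenant):
        old = self._managers.pop(tenant, None)   # evict the cached manager
        if old is not None:
            await old.shutdown()                 # the boundary the test pins
        return await self.get_manager(tenant)    # build the new manager

async def demo():
    factory = FakeFactory()
    old = await factory.get_manager("t1")
    spy = AsyncMock(wraps=old.shutdown)
    old.shutdown = spy                  # install the spy on the cached manager
    new = await factory.reload_tenant("t1")
    spy.assert_awaited_once()           # old manager's shutdown fired exactly once
    assert new is not old               # and a fresh manager was built

asyncio.run(demo())
```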
Symmetric counterpart to the previous commit (test_factory_reload_tenant_invokes_old_manager_shutdown). That commit pinned the disable-direction boundary — old manager's shutdown fires on reload. This commit pins the enable-direction boundary — new manager's initialize fires on reload.
test_factory_reload_tool_context already proves a new manager instance is constructed (manager2 is not manager1), but a regression that constructed without initializing would still pass that test — instances exist, they would just be unable to dispatch hooks. Without initialize firing, every plugin under the new manager would silently no-op after a toggle to enforce: hooks wouldn't run, mode-change-on-init logic (e.g. the rate limiter re-arming its counter window after wipe-on-disable) wouldn't fire, and the operator-visible "I just re-enabled this plugin" promise would silently break.
Captures via an instance list rather than AsyncMock because the assertion needs to identify *which* manager was initialized (must match manager2), not just count calls — the initial get_manager above already triggers one initialize that we deliberately exclude by installing the spy *after* it.
Layered toggle-behaviour coverage on this branch (after this commit):
Layer A — framework boundary, shutdown side [unit, prev commit]
Layer Ai — framework boundary, initialize side [unit, this commit]
Layer C — convergence measurement harness [integration, existing]
Layer B (single-replica integration: admin toggle -> rate-limiter plugin shutdown fires -> counter cleared) and Layer D (convergence threshold under wipe-on-disable) are next on the ladder.
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
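The instance-list capture described above, as a sketch against an illustrative stand-in factory (not the real TenantPluginManagerFactory; the spy subclass is installed only after the warm-up call, so the warm-up initialize is excluded):

```python
import asyncio

class FakeManager:
    def __init__(self):
        self.initialized = False

    async def initialize(self):
        self.initialized = True

    async def shutdown(self):
        pass

class FakeFactory:
    """Illustrative stand-in; models build + initialize on cache miss."""
    manager_cls = FakeManager   # swap point for the recording subclass

    def __init__(self):
        self._managers = {}

    async def get_manager(self, tenant):
        if tenant not in self._managers:
            mgr = self.manager_cls()
            await mgr.initialize()   # the boundary this commit's test pins
            self._managers[tenant] = mgr
        return self._managers[tenant]

    async def reload_tenant(self, tenant):
        old = self._managers.pop(tenant, None)
        if old is not None:
            await old.shutdown()
        return await self.get_manager(tenant)

async def demo():
    factory = FakeFactory()
    manager1 = await factory.get_manager("t1")   # first initialize: excluded

    created = []
    class Recording(FakeManager):
        # Record *which* instance was initialized, not just a call count.
        async def initialize(self):
            created.append(self)
            await super().initialize()
    factory.manager_cls = Recording              # spy installed after warm-up

    manager2 = await factory.reload_tenant("t1")
    assert created == [manager2]                 # the initialized instance IS manager2
    assert manager2 is not manager1

asyncio.run(demo())
```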
… real plugin + Redis
Adds the integration-test rung between the framework-boundary unit
tests in test_tenant_plugin_manager_tool_scoped.py (which use stub
plugins) and the full HTTP-driven tests in
test_rate_limiter_dynamic_behavior.py (which require a running gateway
with admin auth).
This test loads a real cpex-rate-limiter plugin into a real
TenantPluginManagerFactory against a real Redis, deposits counter
keys via manager.invoke_hook, simulates an operator mode toggle to
"disabled" by setting the framework's mode key in Redis directly,
then drives factory.reload_tenant and asserts the counter keys were
wiped — proving the rebuild path reaches the plugin's wipe-on-disable
code path end-to-end without HTTP, pubsub, or admin-handler
involvement.
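The test's flow can be sketched with a plain dict standing in for the dockerised Redis (the plugin class and key names below are invented for the sketch; the real test drives a real cpex-rate-limiter instance through factory.reload_tenant):

```python
import asyncio

REDIS: dict = {}   # plain dict standing in for the real (dockerised) Redis

class FakeRateLimiter:
    """Illustrative stand-in for the plugin's wipe-on-disable path."""
    MODE_KEY = "plugins:mode:RateLimiterPlugin"

    async def invoke_hook(self, bucket: str):
        # Hook invocation deposits a counter key, as manager.invoke_hook
        # does in the real test.
        REDIS[f"ratelimit:{bucket}"] = REDIS.get(f"ratelimit:{bucket}", 0) + 1

    async def shutdown(self):
        # reload_tenant -> manager.shutdown -> registry.shutdown ends here;
        # the plugin reads the mode key and wipes its counters if disabled.
        if REDIS.get(self.MODE_KEY) == "disabled":
            for key in [k for k in REDIS if k.startswith("ratelimit:")]:
                del REDIS[key]

async def b0_flow() -> list:
    plugin = FakeRateLimiter()
    await plugin.invoke_hook("tenant1:toolA")        # 1. deposit counter keys
    REDIS[FakeRateLimiter.MODE_KEY] = "disabled"     # 2. mode key written directly
    await plugin.shutdown()                          # 3. driven by factory.reload_tenant
    return [k for k in REDIS if k.startswith("ratelimit:")]

assert asyncio.run(b0_flow()) == []                  # 4. counters were wiped
```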
What this pins (the boundary between Layer A and the HTTP layer):
* factory.reload_tenant -> manager.shutdown -> registry.shutdown ->
plugin.shutdown all fire in sequence against a real plugin.
* The rate-limiter plugin's wipe-on-disable code path is reached,
reads the mode key, and clears its counters from Redis.
Out of scope (covered by existing tests at adjacent layers):
* Admin HTTP -> reload_tenant pathway.
* Pubsub-driven multi-worker fan-out.
* Convergence timing across multiple workers.
Layered toggle-behaviour coverage on this branch (after this commit):
Layer A — framework boundary, shutdown side [unit, prev commit]
Layer Ai — framework boundary, initialize side [unit, prev commit]
Layer B-0 — manager-mediated plugin wipe [integration, this commit]
Layer C — convergence measurement harness [integration, existing]
Layer D (convergence threshold under wipe-on-disable) is the next
rung once empirical validation against a wipe-enabled gateway image
confirms the hypothesis.
CI behaviour: this test lives in tests/integration/ and is therefore
auto-skipped by the pytest_integration_mark plugin unless the runner
passes --with-integration. CI today does NOT pass that flag, so the
test does not run automatically on PR CI — same as every other
integration test in this directory. It is exercised locally by
developers who explicitly opt in. Once the cpex-plugins
wipe-on-disable PR merges and the wipe-enabled wheel is published,
local --with-integration runs will start exercising the wipe
assertions; CI inclusion would require wiring up an integration-test
job, which is framework-level work tracked separately.
Skip-guards:
* Skips when docker is unavailable (cannot stand up the test Redis).
* Skips when the installed cpex-rate-limiter wheel does not include
the wipe-on-disable code path (no _wipe_my_counters attribute on
the plugin class) — i.e. when running against the PyPI baseline
before the cpex-plugins wipe PR merges.
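The two guards can be sketched as stdlib helpers (the cpex_rate_limiter import path and class name are guesses; the real test imports the actual plugin class and checks for the _wipe_my_counters attribute):

```python
import importlib
import shutil

def docker_available() -> bool:
    # Guard 1: without docker on PATH we cannot stand up the test Redis.
    return shutil.which("docker") is not None

def wheel_has_wipe(module_name: str = "cpex_rate_limiter",
                   class_name: str = "RateLimiterPlugin") -> bool:
    # Guard 2: the installed wheel must expose the wipe-on-disable path.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    plugin_cls = getattr(module, class_name, None)
    return plugin_cls is not None and hasattr(plugin_cls, "_wipe_my_counters")

# In the test module these would feed pytest, e.g.:
#   pytestmark = pytest.mark.skipif(not docker_available(), reason="docker unavailable")
```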
Test infrastructure additions:
* tests/integration/fixtures/configs/rate_limiter_redis_only.yaml
— minimal plugin config loading just the rate limiter with
backend=redis, sourcing redis_url via "{{ env.REDIS_URL }}"
(Jinja substitution at load time; matches the pattern the
gateway itself uses for its own Redis URL).
* Dedicated host port 16380 so the test container is hermetic and
never collides with a developer's local Redis on 6379 or with
other smoke-test stacks.
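A plausible shape for the fixture, assuming the gateway's usual plugin-config schema (field names here are illustrative; only the backend=redis setting and the Jinja-substituted redis_url are stated in this commit):

```yaml
# rate_limiter_redis_only.yaml — hypothetical sketch, field names illustrative
plugins:
  - name: "RateLimiterPlugin"
    mode: "enforce"
    config:
      backend: "redis"
      redis_url: "{{ env.REDIS_URL }}"   # Jinja substitution at load time
```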
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Adds B-0.5 — the next rung past B-0 — which exercises the real
publish/subscribe round-trip that B-0 deliberately skipped. B-0
sets the mode key in Redis directly and calls factory.reload_tenant
directly; this test calls publish_plugin_mode_change (the function
the admin handler invokes), captures the resulting message off the
real Redis channel, and feeds it through _handle_invalidation_message
(the function the gateway's listener calls).
What B-0.5 pins that B-0 cannot:
* The publisher's actual wire format on the channel — channel name,
JSON keys, value types — matches what _handle_invalidation_message
accepts and routes to invalidate_all_plugin_managers ->
factory.invalidate_all -> reload_tenant -> manager.shutdown ->
plugin.shutdown -> wipe.
* A regression where publish_plugin_mode_change and the handler
drift on format (e.g. one renames a JSON key without the other)
would still pass every existing mocked-JSON pubsub test, because
those tests construct the message bytes themselves. This test
PUBLISHes via the real publisher and receives off the real
channel, so any drift breaks the round-trip.
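The round-trip shape, sketched with an asyncio.Queue standing in for the real Redis channel (function names mirror the real publisher and handler, but the bus, channel name, and frame keys are illustrative):

```python
import asyncio
import json

CHANNEL = "plugin-mode-changes"   # illustrative channel name

async def publish_plugin_mode_change(bus: asyncio.Queue, plugin_name: str, mode: str) -> None:
    # Publisher side: this frame's wire format is exactly what the test pins.
    frame = json.dumps({"plugin": plugin_name, "mode": mode}).encode()
    await bus.put((CHANNEL, frame))

async def _handle_invalidation_message(payload: bytes, invalidated: list) -> None:
    # Listener side: parses the real wire bytes; a renamed JSON key on
    # either side breaks this round-trip.
    msg = json.loads(payload)
    invalidated.append((msg["plugin"], msg["mode"]))

async def round_trip() -> list:
    bus: asyncio.Queue = asyncio.Queue()   # stands in for the real Redis channel
    invalidated: list = []
    await publish_plugin_mode_change(bus, "RateLimiterPlugin", "disabled")
    _channel, payload = await bus.get()    # receive off the (fake) channel
    await _handle_invalidation_message(payload, invalidated)
    return invalidated

assert asyncio.run(round_trip()) == [("RateLimiterPlugin", "disabled")]
```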
Wiring fix discovered while writing this test: the framework's
shared-redis access goes through a dependency-inversion shim
(set_shared_redis_provider in mcpgateway/plugins/framework/_redis.py),
not via mcpgateway.utils.redis_client._client. Documented in the
test's comment so future readers don't go down the wrong path.
Layered toggle-behaviour coverage on this branch (after this commit):
Layer A — framework boundary, shutdown side [unit]
Layer Ai — framework boundary, initialize side [unit]
Layer B-0 — manager-mediated plugin wipe [integration]
Layer B-0.5 — pubsub round-trip wipe [integration, this commit]
Layer C — convergence measurement harness [integration, existing]
Layer D (convergence threshold under wipe-on-disable) remains gated
on empirical validation against a wipe-enabled gateway image.
CI behaviour and skip-guards: identical to B-0 (auto-skipped by the
pytest_integration_mark plugin unless --with-integration is passed;
skips when cpex-rate-limiter wheel lacks the wipe-on-disable code
path; runs locally against a wipe-enabled wheel).
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
…lti-replica wipe probe via Redis pubsub
Two coupled changes:
1. Align the test fixtures' plugin name with the gateway's own plugins/config.yaml convention. Before this commit, the wipe-on-disable yaml fixture and the boundary tests used name="RateLimiter" while the gateway's production plugin uses name="RateLimiterPlugin", creating two disjoint mode-key namespaces. Touches the yaml fixture and the explicit mode-key writes in B-0 / B-0.5. No behavioural change to either test — they were internally consistent before; this just makes them match the gateway convention.
2. Add tests/integration/test_rate_limiter_toggle_via_redis_pubsub.py — a multi-replica wipe-on-disable convergence probe that bypasses HTTP entirely. Talks only to Redis: SETs the mode key, PUBLISHes the invalidation frame, then polls until the counter key disappears. Records elapsed time as the convergence measurement. Companion to test_rate_limiter_toggle_live_state.py (Layer C), which drives the same toggle via HTTP+MCP and is therefore amplified by the STREAMABLEHTTP transport. This new test measures the wipe-on-disable claim itself, not "the claim plus measurement overhead."
Skip behaviour: skips when the gateway stack isn't running or Redis isn't reachable. Does not skip when wipe-on-disable code is missing from the running gateway — fails with a clear timeout message instead, which is the right semantics for a deployment that has wipe enabled.
Known caveat (not fixed by this commit, surfaced empirically while testing): factory.invalidate_all() iterates *cached* managers only. A fresh gateway with no traffic has zero cached managers, so the wipe-on-disable path is unreachable until a request warms a manager for some context. Worth flagging in the cpex-plugins wipe-on-disable PR description as a deployment caveat. Following commits will add a one-request warm-up step to the test so it can run from a clean stack; this commit lands the test as written and the rename together.
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Member
DO NOT MERGE before #3754 is merged.
…stack
Adds the two pre-condition steps the test needed to actually exercise
the wipe path against a fresh gateway, discovered empirically while
running the test against the wipe-enabled multi-replica stack:
1. _set_plugin_mode("enforce") via the existing admin-API helper.
plugins/config.yaml ships with mode=disabled for the rate limiter,
and the framework loader does not instantiate disabled plugins
under any cached manager. No plugin instance ⇒ no shutdown to
fire ⇒ no wipe path to run, regardless of how many subscribers
the broadcast reaches. Toggling to enforce first writes the
mode key, publishes the invalidation frame, and primes every
worker's local override map so the next manager-build
instantiates the plugin in enforce mode.
2. _send_tool_burst(server_id, tool_name, 20). factory.invalidate_all()
iterates only *cached* managers. Each gunicorn worker has
its own factory; without prior traffic on a worker, that
its own factory; without prior traffic on a worker, that
factory's _managers dict is empty and the broadcast triggers no
reload_tenant. 20 MCP tools/call requests ride gunicorn's load
balancer across many of the 72 workers (3 replicas × 24 workers),
so multiple workers cache a manager — one warm worker firing its
plugin's shutdown is enough since the wipe runs SCAN+DEL against
shared Redis, but more warm workers raises the floor against
transient races. These are amplified by the STREAMABLEHTTP
transport but are a one-off pre-condition; the convergence
measurement loop below them stays HTTP-free.
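The measurement core of the probe can be sketched as a polling helper (the helper name and the commented surrounding calls are illustrative; redis client wiring is elided):

```python
import time

def poll_until(predicate, timeout_s: float = 30.0, interval_s: float = 0.05) -> float:
    """Poll `predicate` until it returns True; return the elapsed seconds.

    In the probe, the predicate is "the counter key no longer EXISTS in
    Redis", checked after SETting the mode key and PUBLISHing the
    invalidation frame.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if predicate():
            return time.monotonic() - start
        time.sleep(interval_s)
    raise TimeoutError(f"no convergence within {timeout_s}s")

# Shape of the probe around it (names illustrative):
#   _set_plugin_mode("enforce")                  # pre-condition 1: load the plugin
#   _send_tool_burst(server_id, tool_name, 20)   # pre-condition 2: warm worker caches
#   r.set(MODE_KEY, "disabled"); r.publish(CHANNEL, frame)
#   elapsed = poll_until(lambda: not r.exists(COUNTER_KEY))
```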
Empirically confirmed against a multi-replica wipe-enabled stack:
the wipe converges in ~0.11s — vs. the ~22s baseline observed in
this PR's existing live-state probe. ~170x improvement, consistent
with the hypothesis that wipe-on-disable doesn't speed up the cache
rebuild but makes the rebuild's user-visible latency irrelevant
(stale-mode workers with empty counters cannot block any request).
It took two iterations to land this:
* 1-warm-up run: didn't fire (only 1 of 72 workers warmed; not
determinative on its own — see below).
* 20-warm-ups + enforce-pre-condition run: ✓ converged in ~0.11s.
The 1-warm-up failure was almost certainly because mode=disabled in
plugins/config.yaml prevented the plugin from being loaded under
the warmed manager (path B in the iterative diagnosis), not because
1 warm worker was insufficient. The enforce pre-condition was the
load-bearing fix; the bump from 1 to 20 warm-ups is defensive
against load-balancing races.
Signed-off-by: Pratik Gandhi <gandhipratik203@gmail.com>
Collaborator (Author)
@araujof Noted. Won't merge with mcpgateway/plugins changes. Will be removing the files under mcpgateway/plugins. So the PR will only have the changes for tests/integration.
Summary
Two integration tests covering rate-limiter behaviour under runtime mode toggle via the admin API. Marked draft because they raise design questions about the rate limiter's toggle-latency contract that need team input before a final shape lands.
The rate limiter is the only stateful plugin we ship — counter state lives in Redis, mode is embedded in a per-worker cached plugin instance. That makes its response to a runtime mode toggle eventually consistent, not instantaneous, in a way the other supported plugins (Output Length Guard, Secrets Detection) aren't.
What's in this PR
Two tests, both currently passing:
TestRateLimiterToggleAfterTTLExpiry — correctness contract
tests/integration/test_rate_limiter_dynamic_behavior.py
Three-phase cycle: enforce → wait past Redis counter TTL → disabled → wait → enforce. Validates that re-enforcement works after a quiet period long enough for counters to age out. Tagged @pytest.mark.slow because of two ~75s sleeps.
TestRateLimiterToggleLiveState::test_measure_convergence_time_for_mode_toggle — latency probe
tests/integration/test_rate_limiter_toggle_live_state.py
Flips mode, then polls every second until traffic actually reflects the new mode, recording the elapsed time and the per-poll trajectory. No threshold-based assertion on the convergence time — the test fails only if convergence never happens within MAX_CONVERGENCE_S. Useful both as a regression pin (sudden slowdown caught) and as a measurement tool.
Sample observation
One local run, single replica × 24 workers, post-current-main image:
The ~22s disabled-convergence with mid-trajectory oscillation is suspicious — it looks like pub/sub-driven invalidation reaches most workers within a few seconds, but a fraction takes much longer to converge, possibly relying on the 30s TTL-based cache safety net rather than the pub/sub eviction. Convergence happens; just not as fast as expected.
Why this is different from stateless plugins
Stateless plugins decide based only on the current request — the moment a worker's mode flips to disabled, every subsequent request reflects it. The rate limiter decides based on (mode, persistent counter state). When mode flips, the counter is still hot in Redis, so any worker still serving requests under stale-mode-enforce will still block them. The convergence time is a property of how long it takes every worker's plugin-manager cache to rebuild with the new mode.
Open design questions for review
* disabled → enforce: today we don't reset the window — flipping back to enforce on an already-hot bucket immediately blocks. Is that the right call? Some operators might expect a fresh window on re-enable. Either is defensible.
* Should disabled mode actually reset Redis state? Today it stops checking counters but doesn't clear them. Same question from the other direction.
What this PR is not
Not a code fix. The rate limiter works; the question is how it behaves under toggle and what we want that behaviour to be. The tests document current behaviour and provide a measurement harness for any future fix.
How to run
Prerequisites
* make compose-up (3 gateway replicas behind nginx on :8080, Redis, Postgres, fast-test-server registered).
* make docker-build if your local image is stale.
* .env (the .env.example defaults work for this test).
* uv sync --extra plugins so cpex-rate-limiter is importable in the test env.
Run
Both tests skip cleanly if http://localhost:8080/health isn't reachable.
Update — empirical evidence and layered coverage
Since the original draft, this branch has grown into a layered toggle-behaviour story. Adding here, not editing the above — the original sections remain the right snapshot of the problem statement and findings at the time the PR was opened.
Layered toggle-behaviour coverage now on this branch
* Layer A — reload_tenant → manager.shutdown invocation [unit] — tests/unit/mcpgateway/plugins/framework/test_tenant_plugin_manager_tool_scoped.py
* Layer Ai — reload_tenant → new manager.initialize invocation [unit] — same file
* Layer B-0 — manager-mediated plugin wipe via factory.reload_tenant [integration] — tests/integration/test_plugin_manager_wipe_on_disable.py
* Layer C — convergence measurement harness [integration] — tests/integration/test_rate_limiter_toggle_live_state.py
* Layer B-0.5 — pubsub round-trip wipe [integration] — tests/integration/test_rate_limiter_toggle_via_redis_pubsub.py
Empirical answer to design question #3
Question #3 asked whether disabled mode should clear Redis counter state. The answer landed on "yes, clear them," and the implementation lives on cpex-plugins branch feat/rate-limiter-wipe-on-disable-only. Running the new multi-replica pub/sub probe against a wipe-enabled gateway image (3 replicas × 24 workers, plain docker-compose stack), the wipe converges in ~0.11s vs. the ~22s baseline observed in the existing live-state probe at the top of this PR — a ~170× improvement on the disabled-direction convergence. Note the wipe doesn't accelerate the cache rebuild; it makes the rebuild's user-visible latency irrelevant — a stale-mode worker with empty counters cannot block any request.
Pre-conditions discovered while empirically running
Two things had to be in place for the wipe path to actually fire from a clean stack — worth flagging on the cpex-plugins wipe-on-disable PR description as deployment caveats:
* plugins/config.yaml ships the rate limiter with mode: "disabled", and the framework loader skips disabled plugins. The probe first toggles to enforce via the admin API so the plugin is genuinely loaded.
* factory.invalidate_all() iterates only cached managers; a fresh gateway with zero traffic has zero cached managers, so the broadcast triggers no reload_tenant and no plugin.shutdown. The probe sends a 20-request warm-up burst before the disable to ride gunicorn's load balancer across multiple workers.
Status of original design questions
The enforce → disabled direction converges in ~0.1s. The user-visible SLA could plausibly be "< 1 second" instead of "within N seconds" — but the disabled → enforce direction is unchanged by the wipe and still depends on cache rebuild propagation. Worth team input on whether one SLA covers both.