Benchmark docker compose deployment fix #4003
Signed-off-by: Dmitry Zakharov <zakharov@ibm.com>
madhu-mohan-jaishankar
left a comment
This fix correctly addresses the stale cache issue after gateway registration by invalidating the registry and tool lookup caches immediately after db.flush().
LGTM
Lang-Akshay
left a comment
Thank you @dima-zakharov for the PR. Please implement the following changes.
Finding #1: Race Condition in Cache Invalidation (CWE-362)
Problem
Cache invalidation broadcasts fire before the database transaction commits in register_gateway(). Other workers receiving the invalidation signal may query the DB before the commit completes, potentially caching empty/stale data.
Current Flow:
- db.flush(): writes to DB (uncommitted)
- Cache invalidation + Redis broadcast (lines 485–495)
- Return to caller
- db.commit() happens later in get_db() dependency
Caution
Impact: Workers can cache stale data during the window between invalidation broadcast and commit.
Solution
Move cache invalidation to occur after db.commit(), matching the pattern already used in update_gateway() (lines 1634–1646). Either:
- Add an explicit db.commit() before invalidation in register_gateway()
- Use post-commit hooks/events to trigger invalidation
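The first option can be sketched as follows. This is a minimal illustration of the ordering only, not the project's code: `FakeSession`, `FakeCache`, and this simplified `register_gateway()` signature are hypothetical stand-ins that record the order of calls. If the real session is a SQLAlchemy `Session`, its `after_commit` event would be one candidate for the post-commit hook in the second option.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a DB session; records call order."""

    def __init__(self, log):
        self.log = log

    def flush(self):
        self.log.append("flush")

    def commit(self):
        self.log.append("commit")


class FakeCache:
    """Hypothetical stand-in for the registry / tool lookup caches."""

    def __init__(self, log):
        self.log = log

    async def invalidate_gateway(self, gateway_id):
        # In a real deployment this is where the Redis broadcast would go.
        self.log.append(f"invalidate:{gateway_id}")


async def register_gateway(db, cache, gateway_id):
    db.flush()   # rows written, still invisible to other workers
    db.commit()  # make the rows visible before anyone is told to refetch
    await cache.invalidate_gateway(gateway_id)  # broadcast only after commit


def run_demo():
    log = []
    asyncio.run(register_gateway(FakeSession(log), FakeCache(log), "gw-1"))
    return log
```

With this ordering, a worker that reacts to the broadcast by querying the database sees the committed gateway rather than the pre-registration state.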
Finding #2: Unconditional Expensive Cache Operations (CWE-400)
Problem
invalidate_gateway() is called unconditionally on every gateway registration, even when the gateway has zero tools. This triggers O(N) Redis operations (key scans + pub/sub broadcast) unnecessarily.
Caution
Impact:
- Wasted resources on empty gateways
- No rate limiting on POST /gateways
- Potential thundering herd on Redis pub/sub channel
Solution
Quick fix: Only call invalidate_gateway() when tools exist:
    if tools:
        await tool_lookup_cache.invalidate_gateway(str(db_gateway.id))
Additional improvements:
- Add rate limiting to gateway registration endpoint
- Optimize invalidate_gateway() to check for cached data before scanning
- Consider lazy invalidation strategies
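One way to read "check for cached data before scanning" is to keep a per-gateway index of cached entries, so that invalidation can return early when nothing is cached. The sketch below is an assumption-laden illustration: the `ToolLookupCache` class, its `put()` method, and the `broadcasts` counter are hypothetical and in-memory only. A multi-worker setup would need the index in a shared store, since other workers may hold entries this worker knows nothing about.

```python
import asyncio


class ToolLookupCache:
    """Hypothetical in-memory cache keyed by (gateway_id, tool_name)."""

    def __init__(self):
        self._entries = {}
        self._by_gateway = {}  # gateway_id -> set of cached tool names
        self.broadcasts = 0    # stands in for pub/sub publishes

    def put(self, gateway_id, tool_name, value):
        self._entries[(gateway_id, tool_name)] = value
        self._by_gateway.setdefault(gateway_id, set()).add(tool_name)

    async def invalidate_gateway(self, gateway_id):
        names = self._by_gateway.pop(gateway_id, None)
        if not names:
            return False  # nothing cached: skip the scan and the broadcast
        for name in names:
            self._entries.pop((gateway_id, name), None)
        self.broadcasts += 1
        return True
```

The index turns invalidation into work proportional to the gateway's own entries instead of a scan over all keys, at the cost of keeping the index consistent on every write.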
Add tests for this
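A possible shape for such tests, using only the standard library. The helper `invalidate_if_tools()` is hypothetical and merely mirrors the `if tools:` guard from the quick fix; real tests would exercise `register_gateway()` itself with the project's fixtures.

```python
import asyncio
from unittest.mock import AsyncMock


async def invalidate_if_tools(cache, gateway_id, tools):
    """Hypothetical helper mirroring the reviewer's `if tools:` guard."""
    if tools:
        await cache.invalidate_gateway(str(gateway_id))


def test_skips_invalidation_for_empty_gateway():
    cache = AsyncMock()
    asyncio.run(invalidate_if_tools(cache, 42, []))
    cache.invalidate_gateway.assert_not_awaited()


def test_invalidates_when_tools_exist():
    cache = AsyncMock()
    asyncio.run(invalidate_if_tools(cache, 42, ["echo"]))
    cache.invalidate_gateway.assert_awaited_once_with("42")
```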
Thanks for identifying this stale-cache issue, @dima-zakharov — the root cause you found was real and impactful for benchmark workflows. However, this fix has already been superseded by #3839 (
After rebasing this branch onto current @Lang-Akshay — both of your review findings are also addressed on
Closing as already fixed. Thanks again for the contribution!
🔗 Related Issue
The deployment and benchmark setup creates a situation where tools are not found. As a result, one cannot test with:
make benchmark-mcp-tools
📝 Summary
This PR fixes the gateway deployment issue where the tools list is empty when deployed with:
make testing-up-rust-full
🏷️ Type of Change
🧪 Verification
make lint
make test
make coverage
✅ Checklist
make black isort pre-commit)
📓 Notes (optional)