
Commit 9975f03

Lang-Akshay, jonpspri
authored and committed
feat(discovery): add automatic tool discovery with hot/cold classification (#3839)
Implement automatic tool discovery for upstream MCP servers via usage-aware adaptive polling. The gateway can now continuously synchronise tool lists from registered servers without manual intervention.

Server classification (hot/cold):
- Classify servers based on MCP session pool usage patterns
- Hot servers (top 20% by recent usage): polled at 1x base interval
- Cold servers (remaining 80%): polled at 3x base interval
- Classification is deterministic: sorted by recency, active sessions, use count, and URL for tie-breaking
- Leader election via Redis with TTL renewal for multi-worker coordination
- Falls back to local-only operation without Redis

Integration with GatewayService:
- Health checks respect hot/cold classification intervals
- Auto-refresh of tools/resources/prompts respects classification
- Fail-open on classification errors (poll anyway)
- Poll timestamps tracked via Redis with TTL expiry
- Uses base gateway URL (pre-auth) for classification lookups to avoid leaking query-param auth secrets to Redis

Configuration:
- AUTO_REFRESH_SERVERS=true enables automatic tool sync (default: false)
- GATEWAY_AUTO_REFRESH_INTERVAL=300 sets base polling interval
- HOT_COLD_CLASSIFICATION_ENABLED=false (opt-in, requires Redis)

Includes comprehensive tests with 100% coverage on the new ServerClassificationService and integration tests for the GatewayService hot/cold polling paths.

Closes #3734

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
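The classification rules in the commit message can be illustrated with a small standalone sketch. This is not the code from ServerClassificationService: the `ServerUsage` record, the `classify` and `poll_interval` helpers, and the round-up rule for the hot set are all assumptions made for illustration. Only the 20% threshold, the sort order (recency, active sessions, use count, URL), and the 1x/3x multipliers come from the message.

```python
from dataclasses import dataclass

HOT_PERCENT = 20      # top 20% by recent usage are "hot" (from the commit message)
COLD_MULTIPLIER = 3   # cold servers are polled at 3x the base interval


@dataclass(frozen=True)
class ServerUsage:
    """Hypothetical usage record for one upstream server."""
    url: str
    last_used: float       # epoch seconds of the most recent session use
    active_sessions: int
    use_count: int


def classify(servers):
    """Return {url: "hot" | "cold"} using a deterministic ranking."""
    # Most recent first, then more active sessions, then higher use count;
    # the URL is the final tie-breaker so the result never depends on input order.
    ranked = sorted(
        servers,
        key=lambda s: (-s.last_used, -s.active_sessions, -s.use_count, s.url),
    )
    # Assumption: round the hot set up so a non-empty fleet has at least one
    # hot server. Integer arithmetic avoids float rounding surprises.
    hot_count = (len(ranked) * HOT_PERCENT + 99) // 100
    return {s.url: ("hot" if i < hot_count else "cold") for i, s in enumerate(ranked)}


def poll_interval(classification, base_interval):
    """Hot servers use the base interval; cold servers use 3x."""
    if classification == "hot":
        return base_interval
    return base_interval * COLD_MULTIPLIER
```

With a base interval of 300 seconds, `poll_interval("cold", 300)` gives 900 seconds. Because the URL is the last sort key, two workers that see the same usage data in a different order still produce the same classification.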
1 parent ad15e8a commit 9975f03

11 files changed

Lines changed: 3896 additions & 85 deletions

.env.example

Lines changed: 20 additions & 5 deletions
@@ -2145,13 +2145,28 @@ OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
 # Maximum concurrent health checks per worker (default: 10)
 # MAX_CONCURRENT_HEALTH_CHECKS=10
 
-# Enable automatic tools/prompts/resources refresh from the mcp servers during health checks (default: false)
-# If the tools/prompts/resources in the mcp servers are not updated frequently, it is recommended to keep this disabled to reduce load on the servers
+# -----------------------------------------------------------------------------
+# Auto-Refresh / Polling (requires health checks above)
+# -----------------------------------------------------------------------------
+# Automatically re-fetch tools, prompts, and resources from upstream MCP
+# servers during health-check cycles. When disabled (default), tool lists
+# are only updated on manual refresh or gateway registration.
+#
 # AUTO_REFRESH_SERVERS=false
+# GATEWAY_AUTO_REFRESH_INTERVAL=300 # interval in seconds (minimum: 60)
 
-# Default refresh interval in seconds for gateway tools/resources/prompts sync
-# Minimum: 60 seconds
-# GATEWAY_AUTO_REFRESH_INTERVAL=300
+# -----------------------------------------------------------------------------
+# Hot/Cold Server Classification (requires auto-refresh + Redis)
+# -----------------------------------------------------------------------------
+# Classifies upstream servers by MCP session pool usage:
+# hot (top 20% by recent usage) → polled at 1x GATEWAY_AUTO_REFRESH_INTERVAL
+# cold (remaining 80%) → polled at 3x GATEWAY_AUTO_REFRESH_INTERVAL
+#
+# Requires Redis for multi-worker leader election and state sharing.
+# Falls back to local-only (always-poll) in single-worker mode (make dev).
+# Poll intervals are auto-derived — no additional config needed.
+#
+# HOT_COLD_CLASSIFICATION_ENABLED=false
 
 # File lock name for gateway service leader election
 # Used to coordinate multiple gateway instances when running in cluster mode
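To make the "auto-derived" intervals in the comments above concrete, here is a rough sketch of how the three settings could be read and combined. The `load_polling_config` and `_env_bool` helpers, the set of accepted truthy strings, and the choice to raise on values below the 60-second minimum are assumptions for illustration; the gateway's real settings loader is not shown in this diff and may behave differently. The Redis requirement is not checked here.

```python
import os

MIN_INTERVAL_SECONDS = 60   # minimum stated in .env.example
COLD_MULTIPLIER = 3         # cold servers polled at 3x the base interval


def _env_bool(env, name, default):
    """Parse a boolean setting; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_polling_config(env=None):
    """Hypothetical helper: derive hot/cold intervals from the environment."""
    env = os.environ if env is None else env
    base = int(env.get("GATEWAY_AUTO_REFRESH_INTERVAL", "300"))
    if base < MIN_INTERVAL_SECONDS:
        # Assumption: reject rather than clamp values below the minimum.
        raise ValueError(
            f"GATEWAY_AUTO_REFRESH_INTERVAL must be >= {MIN_INTERVAL_SECONDS}"
        )
    auto_refresh = _env_bool(env, "AUTO_REFRESH_SERVERS", False)
    # Classification only has an effect when auto-refresh is also enabled.
    hot_cold = auto_refresh and _env_bool(env, "HOT_COLD_CLASSIFICATION_ENABLED", False)
    return {
        "auto_refresh": auto_refresh,
        "hot_cold": hot_cold,
        "hot_interval": base,
        "cold_interval": base * COLD_MULTIPLIER if hot_cold else base,
    }
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in isolation, for example `load_polling_config({"AUTO_REFRESH_SERVERS": "true", "HOT_COLD_CLASSIFICATION_ENABLED": "true"})`.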
