Show & Tell: FederationHealthMonitor — weighted health score, SSE stream, and cluster circuit breaker #316
web3guru888 started this conversation in Show and tell
Overview
FederationHealthMonitor is the final sub-phase of Phase 9: the observability layer that ties together all four federation components into a single health signal, with SSE streaming and a cluster-level circuit breaker.
Filed as: Issue #315
Component Map
Health Score Computation
On every poll tick (default: 5 seconds), the monitor calls snapshot() on all four components and computes a score for each component plus a weighted overall score.
Component scoring logic
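The actual scoring formulas are not reproduced in this post, so the following is only a hypothetical sketch of per-component scores feeding a weighted overall score. Every function name, the thresholds HEALTHY_AT and DEGRADED_AT, the "unhealthy" label, the blackboard drift budget and the penalty slopes are assumptions. The router and consensus slopes were picked only so that the example payload's router entry (14% fallback, score 0.72) and consensus entry (4% aborts, score 0.88) come out the same.

```python
# Hypothetical cut-offs for turning a numeric score into a health label.
HEALTHY_AT = 0.80
DEGRADED_AT = 0.50


def clamp01(x: float) -> float:
    """Clamp a value into the [0, 1] score range."""
    return max(0.0, min(1.0, x))


def score_gateway(healthy_peers: int, total_peers: int) -> float:
    # Assumption: a gateway with no known peers counts as fully unhealthy.
    if total_peers == 0:
        return 0.0
    return clamp01(healthy_peers / total_peers)


def score_blackboard(drift_ms: float, max_drift_ms: float = 1000.0) -> float:
    # Assumed linear penalty on replication drift against a 1 s budget.
    return clamp01(1.0 - drift_ms / max_drift_ms)


def score_router(fallback_ratio: float) -> float:
    # Assumed slope: each point of fallback costs two points of score.
    return clamp01(1.0 - 2.0 * fallback_ratio)


def score_consensus(abort_ratio: float) -> float:
    # Assumed slope: each point of aborts costs three points of score.
    return clamp01(1.0 - 3.0 * abort_ratio)


def health_label(score: float) -> str:
    """Map a score to a label using the hypothetical cut-offs above."""
    if score >= HEALTHY_AT:
        return "healthy"
    if score >= DEGRADED_AT:
        return "degraded"
    return "unhealthy"


def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over the components that reported a score."""
    total_w = sum(weights[name] for name in scores)
    if total_w <= 0:
        return 0.0
    return sum(scores[name] * weights[name] for name in scores) / total_w
```

With equal weights of 0.25, the four component scores from the example payload average to 0.8675, which would be consistent with the overall_score of 0.868 shown there after rounding; the real weights may well differ.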
Circuit Breaker
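The trip condition is not spelled out in this post. The consecutive_low field in the event payload suggests the breaker opens after several consecutive low polls, so here is a minimal sketch under that assumption. The class name, the 0.5 threshold, the trip count of 3 and the optional timed reset (modelled on the auto_reset_after_s=300 open question) are all hypothetical.

```python
import time
from typing import Optional


class ClusterCircuitBreaker:
    """Opens after N consecutive low overall scores (hypothetical sketch)."""

    def __init__(self, low_threshold: float = 0.5, trip_after: int = 3,
                 auto_reset_after_s: Optional[float] = None) -> None:
        self.low_threshold = low_threshold
        self.trip_after = trip_after
        self.auto_reset_after_s = auto_reset_after_s  # None means manual reset only
        self.consecutive_low = 0
        self.circuit_open = False
        self.trips_total = 0
        self._opened_at: Optional[float] = None

    def record(self, overall_score: float, now: Optional[float] = None) -> bool:
        """Feed one poll result; return whether the circuit is open afterwards."""
        now = time.monotonic() if now is None else now
        # Optional timed reset, checked before the new score is counted.
        if self.circuit_open and self.auto_reset_after_s is not None:
            if now - self._opened_at >= self.auto_reset_after_s:
                self.reset()
        # Count consecutive low polls; any healthy poll clears the streak.
        if overall_score < self.low_threshold:
            self.consecutive_low += 1
        else:
            self.consecutive_low = 0
        if not self.circuit_open and self.consecutive_low >= self.trip_after:
            self.circuit_open = True
            self._opened_at = now
            self.trips_total += 1
        return self.circuit_open

    def reset(self) -> None:
        """Close the circuit and clear the low-score streak."""
        self.circuit_open = False
        self.consecutive_low = 0
        self._opened_at = None
```

Passing now explicitly keeps the logic deterministic in tests; the monitor would presumably call record() once per poll tick with the overall score. Note that a healthy poll does not close an open circuit here: only reset(), or the optional timeout, does.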
While the circuit is open:
- FederatedTaskRouter.route() returns RoutingDecision.FALLBACK immediately.
- CognitiveCycle._phase_federation() writes federation.circuit_open = True to the Blackboard.
SSE Health Stream
Each subscriber gets a dedicated asyncio.Queue(maxsize=100). Slow consumers are evicted rather than blocked, which prevents the poll loop from being held up by a slow HTTP client.
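A minimal sketch of the evict-instead-of-block fan-out, assuming the poll loop calls a synchronous publish() after each tick. HealthStream, publish(), sse_frame() and the "evict on QueueFull" rule are illustrative names and choices, not the monitor's actual API.

```python
import asyncio
import json


class HealthStream:
    """Fans health events out to per-subscriber queues (hypothetical sketch)."""

    def __init__(self, maxsize: int = 100) -> None:
        self._maxsize = maxsize
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue(maxsize=self._maxsize)
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.discard(q)

    def publish(self, event: dict) -> int:
        """Deliver without awaiting; evict subscribers whose queue is full.

        Returns the number of subscribers evicted on this call.
        """
        evicted = []
        for q in self._subscribers:
            try:
                q.put_nowait(event)  # never blocks the poll loop
            except asyncio.QueueFull:
                evicted.append(q)
        # Remove after iterating so the set is not mutated mid-loop.
        for q in evicted:
            self._subscribers.discard(q)
        return len(evicted)

    @property
    def subscriber_count(self) -> int:
        return len(self._subscribers)


def sse_frame(event: dict) -> str:
    """Encode one Server-Sent Events message: a data line plus a blank line."""
    return f"data: {json.dumps(event)}\n\n"
```

An SSE handler would loop on await queue.get() and write sse_frame(event) to the response. How the handler learns that it was evicted (for example a sentinel value or a closed flag) is left out of this sketch.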
FederationHealthEvent Data Model
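The field names and types below are read off the example event payload in this section; the ComponentHealth class name and the to_json()/from_dict() helpers are hypothetical. This is one plausible shape for FederationHealthEvent, not its actual definition.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ComponentHealth:
    name: str     # "gateway", "blackboard", "router" or "consensus"
    health: str   # e.g. "healthy" or "degraded"
    score: float  # 0.0 to 1.0
    details: str  # short human-readable summary, e.g. "drift=43ms"


@dataclass(frozen=True)
class FederationHealthEvent:
    timestamp_ms: int
    components: List[ComponentHealth]
    overall_score: float
    circuit_open: bool
    consecutive_low: int

    def to_json(self) -> str:
        # asdict() recurses into the nested ComponentHealth dataclasses.
        return json.dumps(asdict(self))

    @classmethod
    def from_dict(cls, d: dict) -> "FederationHealthEvent":
        return cls(
            timestamp_ms=int(d["timestamp_ms"]),
            components=[ComponentHealth(**c) for c in d["components"]],
            overall_score=float(d["overall_score"]),
            circuit_open=bool(d["circuit_open"]),
            consecutive_low=int(d["consecutive_low"]),
        )
```

Because asdict() preserves field order, the JSON keys come out in the same order as in the example payload.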
Example event payload (JSON)
```json
{
  "timestamp_ms": 1744497000000,
  "components": [
    {"name": "gateway", "health": "healthy", "score": 0.92, "details": "11/12 healthy peers"},
    {"name": "blackboard", "health": "healthy", "score": 0.95, "details": "drift=43ms"},
    {"name": "router", "health": "degraded", "score": 0.72, "details": "fallback=14.0%"},
    {"name": "consensus", "health": "healthy", "score": 0.88, "details": "abort_ratio=4.0%"}
  ],
  "overall_score": 0.868,
  "circuit_open": false,
  "consecutive_low": 0
}
```
Prometheus Metrics
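To show how the metrics in this section map onto scraped samples, here is a stdlib-only sketch that renders them in the Prometheus text exposition format. The metric types (counters for the _total names, gauges otherwise), the one-series-per-state encoding of asi_federation_component_health and the "unhealthy" state are assumptions. A real exporter would more likely use the prometheus_client library, and this sketch does not escape label values.

```python
from typing import Dict, Iterable, Tuple

HEALTH_STATES = ("healthy", "degraded", "unhealthy")  # assumed label values


def _labels(pairs: Iterable[Tuple[str, str]]) -> str:
    """Format label pairs as {k="v",...}; values are not escaped here."""
    body = ",".join(f'{k}="{v}"' for k, v in pairs)
    return "{" + body + "}" if body else ""


def render_metrics(scores: Dict[str, float], health: Dict[str, str],
                   polls_total: int, trips_total: int, circuit_open: bool) -> str:
    lines = [
        "# TYPE asi_federation_health_score gauge",
        *(f"asi_federation_health_score{_labels([('component', c)])} {s}"
          for c, s in scores.items()),
        "# TYPE asi_federation_health_polls_total counter",
        f"asi_federation_health_polls_total {polls_total}",
        "# TYPE asi_federation_circuit_trips_total counter",
        f"asi_federation_circuit_trips_total {trips_total}",
        "# TYPE asi_federation_circuit_open gauge",
        f"asi_federation_circuit_open {1 if circuit_open else 0}",
        "# TYPE asi_federation_component_health gauge",
    ]
    # One series per (component, health) pair; 1 marks the current state.
    for c, current in health.items():
        for state in HEALTH_STATES:
            lines.append("asi_federation_component_health"
                         f"{_labels([('component', c), ('health', state)])} "
                         f"{1 if state == current else 0}")
    return "\n".join(lines) + "\n"
```

The one-series-per-state layout lets an alert match on health="degraded" directly instead of decoding a numeric state.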
| Metric | Labels |
|---|---|
| asi_federation_health_score | component |
| asi_federation_health_polls_total | |
| asi_federation_circuit_trips_total | |
| asi_federation_circuit_open | |
| asi_federation_component_health | component, health |

PromQL Alerts
12 Test Targets
- test_score_gateway_all_healthy
- test_score_gateway_no_peers
- test_score_gateway_degraded
- test_score_blackboard_high_drift
- test_score_router_high_fallback
- test_score_consensus_high_abort
- test_poll_loop_emits_events
- test_circuit_breaker_trips
- test_circuit_breaker_reset
- test_health_stream_subscriber
- test_snapshot_reflects_state
- test_build_health_monitor_factory

Open Questions
- Circuit breaker reset: automatic after a timeout (e.g. auto_reset_after_s=300) vs. always manual?
- Subscriber buffers: a deque with maxlen instead of a Queue?
- … sepolia-exporter: should Phase 9.5 add an asi_federation_commits_onchain_total counter anchored to CommitCertificate hashes?
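The deque option changes what happens to a slow consumer. This small standard-library sketch contrasts the two bounded buffers; the sizes are arbitrary and fill_queue() is a hypothetical helper.

```python
import asyncio
from collections import deque

# Option A: a bounded deque silently drops the OLDEST event when full.
ring = deque(maxlen=3)
for n in range(5):
    ring.append(n)
# ring now holds only the three most recent events.


# Option B: a bounded asyncio.Queue rejects the NEWEST event when full,
# which is the signal a monitor can use to evict the subscriber.
async def fill_queue() -> bool:
    q: asyncio.Queue = asyncio.Queue(maxsize=3)
    for n in range(5):
        try:
            q.put_nowait(n)
        except asyncio.QueueFull:
            return False  # caller would evict this subscriber
    return True


queue_kept_up = asyncio.run(fill_queue())
```

A deque(maxlen=...) keeps the subscriber connected but gives it gaps in the event sequence, and it has no awaitable get(), so an SSE handler would need something like an asyncio.Event to wait for new data. A bounded Queue surfaces the overflow as QueueFull, which makes eviction an explicit decision.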