Show & Tell: Phase 10.4 — ExecutionMonitor (async event queue, health scoring, and stall detection) #331
web3guru888 started this conversation in Show and tell
What is ExecutionMonitor?
`ExecutionMonitor` is the observability layer for Phase 10's goal execution pipeline. While `PlanExecutor` (#326) coordinates task dispatch, `ExecutionMonitor` watches every state transition in real time, computes a plan health score, detects stalled sub-tasks, and surfaces a `MonitorView` that the `CognitiveCycle` uses to decide whether to trigger the `ReplanningEngine` (Phase 10.5).

Architecture
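Below is a minimal sketch of the push path. It assumes that `PlanExecutor` calls `ExecutionMonitor.ingest()` and that the monitor drains an `asyncio.Queue` in a background task. The enum members and transitions follow the EventType table in this post. The `Event` dataclass, the `TaskState` values and the `run()` loop are hypothetical illustrations, not the spec in #330.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    TASK_STARTED = "task_started"
    TASK_COMPLETED = "task_completed"
    TASK_FAILED = "task_failed"
    TASK_RETRIED = "task_retried"
    TASK_SKIPPED = "task_skipped"
    PLAN_COMPLETED = "plan_completed"


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


# Event -> resulting task state, per the transition table.
# PLAN_COMPLETED is plan-level and handled separately.
TRANSITIONS = {
    EventType.TASK_STARTED: TaskState.RUNNING,
    EventType.TASK_COMPLETED: TaskState.SUCCESS,
    EventType.TASK_FAILED: TaskState.FAILED,
    EventType.TASK_RETRIED: TaskState.RUNNING,
    EventType.TASK_SKIPPED: TaskState.SKIPPED,
}


@dataclass
class Event:
    type: EventType
    task_id: Optional[str] = None  # None for plan-level events (hypothetical shape)


class ExecutionMonitor:
    """Hypothetical sketch: consumes events pushed onto an asyncio.Queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self.states: dict = {}  # task_id -> TaskState
        self.is_complete = False

    def ingest(self, event: Event) -> None:
        # Producer side (e.g. PlanExecutor): non-blocking push, never awaits.
        self._queue.put_nowait(event)

    async def run(self) -> None:
        # Consumer loop: apply each event in arrival order until the plan completes.
        while not self.is_complete:
            event = await self._queue.get()
            if event.type is EventType.PLAN_COMPLETED:
                self.is_complete = True
            else:
                self.states[event.task_id] = TRANSITIONS[event.type]
            self._queue.task_done()


async def main() -> ExecutionMonitor:
    monitor = ExecutionMonitor()  # created inside the running loop
    consumer = asyncio.create_task(monitor.run())
    for ev in [
        Event(EventType.TASK_STARTED, "A"),
        Event(EventType.TASK_COMPLETED, "A"),
        Event(EventType.TASK_STARTED, "B"),
        Event(EventType.TASK_FAILED, "B"),
        Event(EventType.PLAN_COMPLETED),
    ]:
        monitor.ingest(ev)
    await consumer
    return monitor


monitor = asyncio.run(main())
print({task: state.name for task, state in monitor.states.items()})
```

Because `ingest()` only calls `put_nowait()`, the producer never waits on the monitor. The cost is the coupling raised under Open Questions: the executor must hold a reference to the monitor.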
EventType Enum — 6 State Transitions
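| Event | Effect |
|---|---|
| `TASK_STARTED` | `RUNNING` |
| `TASK_COMPLETED` | `SUCCESS` |
| `TASK_FAILED` | `FAILED` |
| `TASK_RETRIED` | `RUNNING` |
| `TASK_SKIPPED` | `SKIPPED` |
| `PLAN_COMPLETED` | (… `is_complete=True`) |

Health Score Formula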
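Below is a minimal sketch of one formula consistent with the rules described in this section. It assumes health = SUCCESS / (SUCCESS + FAILED + RUNNING). Two further choices are assumptions of this sketch rather than statements of the spec in #330:

- `SKIPPED` tasks are excluded from the score.
- A plan with no started tasks scores 1.0.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


def health_score(states: Iterable[TaskState]) -> float:
    """Assumed formula: SUCCESS / (SUCCESS + FAILED + RUNNING).

    RUNNING counts against health because it may still fail; PENDING is excluded.
    Excluding SKIPPED is an assumption of this sketch.
    """
    counts = Counter(states)
    denominator = (
        counts[TaskState.SUCCESS] + counts[TaskState.FAILED] + counts[TaskState.RUNNING]
    )
    if denominator == 0:
        return 1.0  # assumption: nothing has started, so nothing is unhealthy yet
    return counts[TaskState.SUCCESS] / denominator


# 2 successes out of 4 started tasks (the PENDING task is ignored): 2 / 4
print(health_score([
    TaskState.SUCCESS, TaskState.SUCCESS, TaskState.FAILED,
    TaskState.RUNNING, TaskState.PENDING,
]))
```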
`RUNNING` tasks count against health because they are uncertain: a running task might still fail. `PENDING` tasks (not yet started) are excluded.

Example: Diamond DAG
Execution events arrive in topological order.
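Below is a minimal sketch of the event stream for a diamond DAG, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The diamond has task A fanning out to B and C, which both feed D. The task names and the `event_stream` helper are hypothetical. The sketch also assumes every task succeeds and tasks start one at a time.

```python
from graphlib import TopologicalSorter

# Hypothetical diamond DAG: each key maps a task to the tasks it depends on.
DIAMOND = {"B": ("A",), "C": ("A",), "D": ("B", "C")}


def event_stream(graph):
    """Yield (event_type, task_id) pairs for a run where every task succeeds.

    Assumes sequential execution in topological order; a real executor may
    run B and C concurrently and interleave their events.
    """
    for task in TopologicalSorter(graph).static_order():
        yield ("TASK_STARTED", task)
        yield ("TASK_COMPLETED", task)
    yield ("PLAN_COMPLETED", None)


events = list(event_stream(DIAMOND))
for event_type, task in events:
    print(event_type, task or "")
```

A always comes first and D always comes last. B and C come out of `static_order()` in the same ready batch, so the monitor should not depend on their relative order.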
Stall Detection Code
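Below is a minimal sketch of the stall check. It assumes the monitor keeps a per-task record holding the current state and a monotonic start time in milliseconds. Several choices here are hypothetical and not taken from the spec:

- the names `TrackedTask` and `detect_stalls`;
- the plain-string states;
- resetting `started_at_ms` on `TASK_RETRIED`.

The 30-second default and the strict `>` comparison follow the stall rule in this section.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

DEFAULT_STALL_THRESHOLD_MS = 30_000  # 30 seconds, the documented default


@dataclass
class TrackedTask:
    # Hypothetical per-task record kept by the monitor.
    state: str            # e.g. "RUNNING", "SUCCESS"
    started_at_ms: float  # monotonic ms of the last TASK_STARTED (or TASK_RETRIED, assumed)


def now_ms() -> float:
    # Monotonic clock, so wall-clock adjustments cannot fake or hide a stall.
    return time.monotonic() * 1000.0


def detect_stalls(
    tasks: Dict[str, TrackedTask],
    stall_threshold_ms: float = DEFAULT_STALL_THRESHOLD_MS,
    current_ms: Optional[float] = None,
) -> List[str]:
    """Return ids of tasks where state == RUNNING and elapsed_ms > stall_threshold_ms."""
    current = now_ms() if current_ms is None else current_ms
    return [
        task_id
        for task_id, task in tasks.items()
        if task.state == "RUNNING" and current - task.started_at_ms > stall_threshold_ms
    ]


tasks = {
    "fetch": TrackedTask("RUNNING", started_at_ms=0.0),       # 45 s elapsed
    "parse": TrackedTask("RUNNING", started_at_ms=20_000.0),  # 25 s elapsed
    "plan": TrackedTask("SUCCESS", started_at_ms=0.0),        # finished
}
print(detect_stalls(tasks, current_ms=45_000.0))  # ['fetch']
```

Passing `current_ms` explicitly keeps the check deterministic in tests. In production it defaults to the monotonic clock.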
A stall is any task where `state == RUNNING` and `elapsed_ms > stall_threshold_ms`. The default threshold is 30 seconds.

5 Prometheus Metrics
| Metric | Labels |
|---|---|
| `execmon_events_total` | `event_type` |
| `execmon_plans_tracked` | |
| `execmon_stalls_detected_total` | |
| `execmon_health_score` | `plan_id` |
| `execmon_view_latency_seconds` | |

PromQL
Open Questions
1. Push vs Pull: Should `PlanExecutor` push events directly to `ExecutionMonitor.ingest()`, or should `ExecutionMonitor` poll `PlanExecutor.snapshot()` on a timer? An async queue push has lower latency but couples `PlanExecutor` to the monitor, since the executor must hold a reference to it.
2. Multi-monitor fan-out: Can multiple `ExecutionMonitor` instances subscribe to the same `PlanExecutor` for different consumers (e.g., one for the UI, one for replanning)?
3. Persistence: Should completed `MonitorView` snapshots be written to the `Blackboard` for post-mortem analysis, or only kept in memory with TTL eviction?

Full spec: #330