Add autonomous agent system with Discord bot integration by fluidnumericsJoe · Pull Request #5 · FluidNumerics/spectre-ensembles

fluidnumericsJoe · 2026-03-29T13:49:30Z

Summary

This PR introduces a complete autonomous agent system for MITgcm simulation orchestration on the Spectre (Franklin) cluster, with Discord bot integration for bidirectional user communication. The system uses Claude AI agents via the Claude Agent SDK to manage the full simulation lifecycle: configuration validation, job submission, failure diagnosis, and recovery.

Key Changes

Core Agent System

Agent Framework (spectre_agents/agents/): Nine specialized agents with distinct responsibilities, plus a shared base module:
- orchestrator.py: Top-level lifecycle manager coordinating sub-agents
- workflow_runner.py: SLURM job submission and process management
- stdout_diagnostics.py: MITgcm STDOUT failure classification and diagnosis
- model_output_review.py: Physical plausibility assessment of simulation output
- namelist_validator.py: Pre-run configuration validation
- forcing_data_qc.py: EXF and OBC binary file validation
- dashboard_manager.py: Monitoring infrastructure lifecycle
- notify.py: Discord notification delivery
- web_research.py: Technical research capability
- base.py: Common agent infrastructure and tool registration

Tool Ecosystem

File I/O (tools/file_io.py): Read, write, edit, glob, grep operations
SLURM (tools/slurm.py): Job submission, status queries, cancellation
MITgcm (tools/mitgcm.py): STDOUT parsing, monitor stats extraction, CFL analysis
Forcing Data (tools/forcing.py): EXF and OBC binary validation with physical range checks
Namelist (tools/namelist.py): Fortran namelist parsing and cross-validation
Dashboard (tools/dashboard.py): Monitoring stack health checks and lifecycle
Bash (tools/bash.py): Safe subprocess execution with denylist protection
Discord (tools/discord_notify.py): Message posting, image uploads, interactive decisions
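
As a rough illustration of the denylist protection mentioned for the Bash tool, the sketch below rejects a command before it reaches the shell. The patterns, the `DENYLIST` name, and the `is_denied` / `run_safe` helpers are assumptions made for this example; the actual patterns in tools/bash.py are not shown in this description.

```python
import re
import subprocess

# Hypothetical patterns for illustration; not the patterns shipped in the PR.
DENYLIST = [
    # "rm" with a recursive flag aimed at the filesystem root
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*\s+/(\s|$)"),
    # any mkfs variant (mkfs, mkfs.ext4, ...)
    re.compile(r"\bmkfs(\.\w+)?\b"),
    # the classic shell fork bomb  :(){ :|:& };:
    re.compile(r":\(\)\s*\{.*\}\s*;\s*:"),
]


def is_denied(command: str) -> bool:
    """Return True if any denylist pattern matches the command string."""
    return any(pattern.search(command) for pattern in DENYLIST)


def run_safe(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run a shell command unless the denylist blocks it."""
    if is_denied(command):
        raise PermissionError(f"command blocked by denylist: {command!r}")
    return subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
```

A denylist of this kind is a best-effort guard: it catches known destructive forms but cannot anticipate every dangerous command.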

Discord Bot Integration

Bot Core (discord_bot/bot.py): Discord client with command tree and decision queue processing
Slash Commands (discord_bot/commands.py): /run group (start, status, stop, resubmit), /diagnose, /validate, /dashboard commands
Rich Embeds (discord_bot/embeds.py): Color-coded status, failure, health, and decision embeds
Interactive Views (discord_bot/views.py): Decision buttons for user approval flows
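
The interplay between the decision queue and the thread pool can be sketched without any Discord dependency. In the sketch below, `DecisionQueue`, `long_running_agent_call`, and the decision ID are hypothetical names; a timer callback stands in for the Discord button click, and the real bot's classes may be organized differently.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class DecisionQueue:
    """Hypothetical helper mapping a decision ID to a pending asyncio Future."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    def request(self, decision_id: str) -> asyncio.Future:
        # Create a Future on the running loop; the caller awaits it.
        future = asyncio.get_running_loop().create_future()
        self._pending[decision_id] = future
        return future

    def resolve(self, decision_id: str, choice: str) -> None:
        # Called from the button callback (simulated below by a timer).
        future = self._pending.pop(decision_id)
        if not future.done():
            future.set_result(choice)


def long_running_agent_call(job_id: str) -> str:
    # Stand-in for a blocking agent invocation.
    return f"diagnosis for job {job_id}"


async def demo() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    queue = DecisionQueue()

    # Blocking work runs in a worker thread so the event loop stays free.
    with ThreadPoolExecutor(max_workers=2) as pool:
        diagnosis = await loop.run_in_executor(pool, long_running_agent_call, "12345")

    # Ask for approval, then wait until the simulated click resolves it.
    pending = queue.request("resubmit-12345")
    loop.call_later(0.01, queue.resolve, "resubmit-12345", "approve")
    choice = await pending
    return diagnosis, choice


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the Future has to be resolved from a worker thread rather than from the event loop, `loop.call_soon_threadsafe` would be the appropriate way to hand the result over.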

Configuration & Context

Config System (config.py): YAML-based configuration with environment variable overrides, per-agent model selection
Agent Context (context.py): Shared state between bot and agents, decision queue, simulation state persistence
Type Definitions (types.py): Enums and dataclasses for failures, health status, validation results
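
A minimal sketch of how simulation state might be persisted for restart resilience follows. The file name `.spectre-agents-state.json` and the tracked quantities (job ID, status, model days, CFL) come from this description; the `SimulationState` field names, the helper functions, and the write-then-rename approach are assumptions for illustration.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional

STATE_FILE = Path(".spectre-agents-state.json")  # file name from the PR description


@dataclass
class SimulationState:
    # Field names are hypothetical; the PR lists the quantities, not the schema.
    job_id: Optional[str] = None
    status: str = "idle"
    model_days: float = 0.0
    cfl: Optional[float] = None


def save_state(state: SimulationState, path: Path = STATE_FILE) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = path.resolve().parent
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(asdict(state), handle, indent=2)
    # os.replace swaps the file in one step, so a crash cannot leave it half-written.
    os.replace(tmp_name, path)


def load_state(path: Path = STATE_FILE) -> SimulationState:
    """Return the saved state, or a default state if no file exists yet."""
    if not path.exists():
        return SimulationState()
    return SimulationState(**json.loads(path.read_text()))
```

On daemon startup, `load_state()` would restore the last known job, and `save_state()` would be called after each status change.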

Entry Point & Documentation

Main (__main__.py): Async entry point orchestrating bot and agent runner
Setup Guide (docs/discord-setup.md): Complete walkthrough for Discord bot creation, server setup, secrets configuration, local testing, and systemd service installation
Systemd Service (systemd/spectre-agents.service): Service unit for daemon deployment

Notable Implementation Details

Thread Pool Execution: Agent invocations run in a ThreadPoolExecutor to keep the Discord bot responsive during long-running operations
Decision Queue: Interactive approval flows use asyncio Futures resolved by Discord button clicks, blocking the orchestrator until user input
State Persistence: Simulation state (job ID, status, model days, CFL) is saved to .spectre-agents-state.json for daemon restart resilience
Safety Denylist: Bash tool includes regex patterns to prevent destructive commands (rm -rf /, mkfs, fork bombs, etc.)
Physical Range Validation: Forcing data QC checks EXF variables against expected ranges (e.g., atemp 240-320K, aqh 0-0.025 kg/kg)
**CFL-Based
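
The physical range check on forcing data can be illustrated with a short sketch. The two ranges are the ones quoted above; the function names, the result layout, and the assumption that EXF files hold big-endian 32-bit floats (a common MITgcm setup, not confirmed by this description) are illustrative only.

```python
import struct
from typing import Dict, List, Tuple

# Ranges quoted in the PR description: atemp in K, aqh in kg/kg.
EXPECTED_RANGES: Dict[str, Tuple[float, float]] = {
    "atemp": (240.0, 320.0),
    "aqh": (0.0, 0.025),
}


def read_float32_be(raw: bytes) -> List[float]:
    """Decode raw bytes as big-endian float32 values (assumed file layout)."""
    count = len(raw) // 4
    return list(struct.unpack(f">{count}f", raw[: count * 4]))


def check_range(name: str, values: List[float]) -> dict:
    """Count values outside the expected range; NaN also counts as out of range."""
    lo, hi = EXPECTED_RANGES[name]
    n_bad = sum(1 for v in values if not (lo <= v <= hi))
    return {
        "variable": name,
        "min": min(values),
        "max": max(values),
        "n_out_of_range": n_bad,
        "ok": n_bad == 0,
    }


if __name__ == "__main__":
    raw = struct.pack(">3f", 280.0, 300.0, 350.0)  # synthetic data, one bad value
    print(check_range("atemp", read_float32_be(raw)))
```

For full-size forcing fields, a NumPy array read with a big-endian dtype would be the more practical choice; the pure-Python version here only keeps the example dependency-free.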

https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f

Convert all 9 .claude/agents/*.md definitions to standalone Python agents using the Claude Agent SDK, with a Discord bot replacing Slack for bidirectional simulation operations.

New spectre_agents package:
- 9 agent classes mirroring .md agents (orchestrator, workflow-runner, stdout-diagnostics, model-output-review, namelist-validator, forcing-data-qc, dashboard-manager, notify, web-research)
- 8 tool modules (bash, file_io, slurm, mitgcm, forcing, namelist, dashboard, discord_notify)
- Discord bot with slash commands (/run, /diagnose, /review, /validate, /qc, /dashboard, /ensemble, /config)
- Interactive decision views (buttons) for orchestrator halting triggers
- Systemd service for daemon deployment on Spectre cluster
- Complete setup documentation in docs/discord-setup.md

https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f

Add a knowledge handler that listens in #ask-mitgcm for natural language questions about MITgcm, ERA5, oceanography, and the codebase. Uses Claude (Sonnet) with a comprehensive system prompt derived from CLAUDE.md, plus WebSearch/WebFetch for live documentation lookups. Runs on the same bot instance as the simulation ops bot — no second token needed. Long answers auto-create threads to keep the channel clean. https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f

claude added 2 commits March 29, 2026 13:48

fluidnumericsJoe wants to merge 2 commits into main from claude/markdown-to-python-agents-XW9Kv


2 participants
