Add autonomous agent system with Discord bot integration #5

Open
fluidnumericsJoe wants to merge 2 commits into main from claude/markdown-to-python-agents-XW9Kv

@fluidnumericsJoe
Member

Summary

This PR introduces a complete autonomous agent system for MITgcm simulation orchestration on the Spectre (Franklin) cluster, with Discord bot integration for bidirectional user communication. The system uses Claude AI agents via the Claude Agent SDK to manage the full simulation lifecycle: configuration validation, job submission, failure diagnosis, and recovery.

Key Changes

Core Agent System

  • Agent Framework (spectre_agents/agents/): Nine specialized agents with distinct responsibilities, plus a shared base module:
    • orchestrator.py: Top-level lifecycle manager coordinating sub-agents
    • workflow_runner.py: SLURM job submission and process management
    • stdout_diagnostics.py: MITgcm STDOUT failure classification and diagnosis
    • model_output_review.py: Physical plausibility assessment of simulation output
    • namelist_validator.py: Pre-run configuration validation
    • forcing_data_qc.py: EXF and OBC binary file validation
    • dashboard_manager.py: Monitoring infrastructure lifecycle
    • notify.py: Discord notification delivery
    • web_research.py: Technical research capability
    • base.py: Common agent infrastructure and tool registration

Tool Ecosystem

  • File I/O (tools/file_io.py): Read, write, edit, glob, grep operations
  • SLURM (tools/slurm.py): Job submission, status queries, cancellation
  • MITgcm (tools/mitgcm.py): STDOUT parsing, monitor stats extraction, CFL analysis
  • Forcing Data (tools/forcing.py): EXF and OBC binary validation with physical range checks
  • Namelist (tools/namelist.py): Fortran namelist parsing and cross-validation
  • Dashboard (tools/dashboard.py): Monitoring stack health checks and lifecycle
  • Bash (tools/bash.py): Safe subprocess execution with denylist protection
  • Discord (tools/discord_notify.py): Message posting, image uploads, interactive decisions
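The physical range checks in the Forcing Data tool can be illustrated with a minimal sketch. Only the atemp and aqh limits below come from this PR; the function names, the report layout, and the assumption that the files hold big-endian float32 values are hypothetical and may not match tools/forcing.py.

```python
import struct

# Limits for atemp and aqh are the ones quoted in this PR.
# Everything else in this sketch is a hypothetical illustration.
EXPECTED_RANGES = {
    "atemp": (240.0, 320.0),  # air temperature, K
    "aqh": (0.0, 0.025),      # specific humidity, kg/kg
}


def read_big_endian_float32(raw: bytes) -> list:
    """Decode a byte string as big-endian float32 values (assumed layout)."""
    n = len(raw) // 4
    return list(struct.unpack(f">{n}f", raw[: n * 4]))


def check_range(name: str, values: list) -> dict:
    """Count values outside the expected range; NaN also counts as bad."""
    lo, hi = EXPECTED_RANGES[name]
    bad = [i for i, v in enumerate(values) if not (lo <= v <= hi)]
    return {
        "variable": name,
        "n_values": len(values),
        "n_out_of_range": len(bad),
        "first_bad_index": bad[0] if bad else None,
        "ok": not bad,
    }


# Usage: three air temperatures, the last one unphysically warm.
raw = struct.pack(">3f", 280.0, 300.0, 350.0)
report = check_range("atemp", read_big_endian_float32(raw))
```

Writing the comparison as `not (lo <= v <= hi)` means NaN values fail the check instead of slipping through, which is usually the desired behavior for forcing fields.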

Discord Bot Integration

  • Bot Core (discord_bot/bot.py): Discord client with command tree and decision queue processing
  • Slash Commands (discord_bot/commands.py): /run group (start, status, stop, resubmit), /diagnose, /validate, /dashboard commands
  • Rich Embeds (discord_bot/embeds.py): Color-coded status, failure, health, and decision embeds
  • Interactive Views (discord_bot/views.py): Decision buttons for user approval flows
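The approval flow behind the decision buttons can be sketched with plain asyncio, without discord.py. The `DecisionQueue` class and its method names are hypothetical, and the button click is simulated with a timer. The real implementation also has to hand the result across to the agent thread, which this sketch leaves out.

```python
import asyncio


class DecisionQueue:
    """Hypothetical stand-in: one asyncio Future per pending decision."""

    def __init__(self):
        self._pending = {}

    async def ask(self, decision_id: str, timeout: float = None) -> str:
        """Register a pending decision and wait until someone resolves it."""
        fut = asyncio.get_running_loop().create_future()
        self._pending[decision_id] = fut
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self._pending.pop(decision_id, None)

    def resolve(self, decision_id: str, choice: str) -> bool:
        """Called from a button callback; returns False if nothing is waiting."""
        fut = self._pending.get(decision_id)
        if fut is None or fut.done():
            return False
        fut.set_result(choice)
        return True


async def demo() -> str:
    queue = DecisionQueue()
    loop = asyncio.get_running_loop()
    # Simulate a user clicking "approve" shortly after the prompt is posted.
    loop.call_later(0.01, queue.resolve, "resubmit-job", "approve")
    return await queue.ask("resubmit-job", timeout=1.0)
```

The timeout in `ask` is an addition of this sketch; without one, an unanswered prompt would hold the caller indefinitely.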

Configuration & Context

  • Config System (config.py): YAML-based configuration with environment variable overrides, per-agent model selection
  • Agent Context (context.py): Shared state between bot and agents, decision queue, simulation state persistence
  • Type Definitions (types.py): Enums and dataclasses for failures, health status, validation results
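Simulation state persistence could look roughly like the sketch below. The file name .spectre-agents-state.json and the tracked quantities (job ID, status, model days, CFL) are taken from this PR; the dataclass, its field names, and the write-then-rename strategy are assumptions made for illustration.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SimulationState:
    """Hypothetical field names for the quantities listed in the PR."""
    job_id: Optional[str] = None
    status: str = "idle"
    model_days: float = 0.0
    max_cfl: Optional[float] = None


def save_state(state: SimulationState, path) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(asdict(state), fh, indent=2)
    os.replace(tmp, path)  # readers never see a half-written file


def load_state(path) -> SimulationState:
    """Return the saved state, or defaults if no state file exists yet."""
    path = Path(path)
    if not path.exists():
        return SimulationState()
    return SimulationState(**json.loads(path.read_text()))
```

Renaming a finished temp file over the target means a daemon killed mid-write leaves the previous state file intact.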

Entry Point & Documentation

  • Main (__main__.py): Async entry point orchestrating bot and agent runner
  • Setup Guide (docs/discord-setup.md): Complete walkthrough for Discord bot creation, server setup, secrets configuration, local testing, and systemd service installation
  • Systemd Service (systemd/spectre-agents.service): Service unit for daemon deployment
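An async entry point that runs blocking agent calls without stalling the bot might follow the pattern below. This is a sketch only: `run_agent_blocking` stands in for a synchronous agent invocation, and `heartbeat` stands in for the Discord client's event handling. Neither name comes from this PR.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_agent_blocking(prompt: str) -> str:
    """Hypothetical stand-in for a long-running synchronous agent call."""
    time.sleep(0.2)
    return f"done: {prompt}"


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Stand-in for bot activity; it only ticks while the event loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The blocking call runs in a worker thread; the loop keeps serving heartbeat.
        result = await loop.run_in_executor(pool, run_agent_blocking, "validate namelists")
    beat.cancel()
    return result, len(ticks)
```

If `run_agent_blocking` were called directly inside `main`, the heartbeat would stop for the full duration of the call, which is the unresponsiveness that off-loading to a thread pool avoids.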

Notable Implementation Details

  • Thread Pool Execution: Agent invocations run in a ThreadPoolExecutor to keep the Discord bot responsive during long-running operations
  • Decision Queue: Interactive approval flows use asyncio Futures resolved by Discord button clicks, blocking the orchestrator until the user responds
  • State Persistence: Simulation state (job ID, status, model days, CFL) is saved to .spectre-agents-state.json for daemon restart resilience
  • Safety Denylist: Bash tool includes regex patterns to prevent destructive commands (rm -rf /, mkfs, fork bombs, etc.)
  • Physical Range Validation: Forcing data QC checks EXF variables against expected ranges (e.g., atemp 240-320 K, aqh 0-0.025 kg/kg)
  • **CFL-Based
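A regex denylist of the kind described for the Bash tool can be sketched as follows. The three patterns cover the examples named above (rm -rf /, mkfs, fork bombs) but are illustrative guesses, not the patterns in tools/bash.py. The choice to run commands without a shell via `shlex.split` is likewise an assumption of this sketch.

```python
import re
import shlex
import subprocess

# Hypothetical patterns for the three examples named in the PR.
DENYLIST = [
    re.compile(r"\brm\s+(?:-\w+\s+)+/(?:\s|$)"),  # rm with flags targeting "/"
    re.compile(r"\bmkfs(?:\.\w+)?\b"),            # mkfs, mkfs.ext4, ...
    re.compile(r":\(\)\s*\{.*\};\s*:"),           # classic shell fork bomb
]


def is_denied(command: str):
    """Return the first matching denylist pattern, or None if the command passes."""
    for pattern in DENYLIST:
        if pattern.search(command):
            return pattern.pattern
    return None


def safe_run(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Refuse denylisted commands, otherwise run without invoking a shell."""
    hit = is_denied(command)
    if hit is not None:
        raise PermissionError(f"command blocked by denylist pattern: {hit}")
    return subprocess.run(
        shlex.split(command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

A denylist like this is a best-effort guard against obvious mistakes. Equivalent destructive commands can be spelled in ways no fixed pattern list anticipates, so it should not be treated as a security boundary.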

https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f

claude added 2 commits March 29, 2026 13:48
Convert all 9 .claude/agents/*.md definitions to standalone Python agents
using the Claude Agent SDK, with a Discord bot replacing Slack for
bidirectional simulation operations.

New spectre_agents package:
- 9 agent classes mirroring .md agents (orchestrator, workflow-runner,
  stdout-diagnostics, model-output-review, namelist-validator,
  forcing-data-qc, dashboard-manager, notify, web-research)
- 8 tool modules (bash, file_io, slurm, mitgcm, forcing, namelist,
  dashboard, discord_notify)
- Discord bot with slash commands (/run, /diagnose, /review, /validate,
  /qc, /dashboard, /ensemble, /config)
- Interactive decision views (buttons) for orchestrator halting triggers
- Systemd service for daemon deployment on Spectre cluster
- Complete setup documentation in docs/discord-setup.md

https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f

Add a knowledge handler that listens in #ask-mitgcm for natural language
questions about MITgcm, ERA5, oceanography, and the codebase. Uses Claude
(Sonnet) with a comprehensive system prompt derived from CLAUDE.md, plus
WebSearch/WebFetch for live documentation lookups.

Runs on the same bot instance as the simulation ops bot — no second token
needed. Long answers auto-create threads to keep the channel clean.

https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f