Commit 1648687

Add autonomous Python agent system with Discord bot integration
Convert all 9 .claude/agents/*.md definitions to standalone Python agents using the Claude Agent SDK, with a Discord bot replacing Slack for bidirectional simulation operations.

New spectre_agents package:
- 9 agent classes mirroring .md agents (orchestrator, workflow-runner, stdout-diagnostics, model-output-review, namelist-validator, forcing-data-qc, dashboard-manager, notify, web-research)
- 8 tool modules (bash, file_io, slurm, mitgcm, forcing, namelist, dashboard, discord_notify)
- Discord bot with slash commands (/run, /diagnose, /review, /validate, /qc, /dashboard, /ensemble, /config)
- Interactive decision views (buttons) for orchestrator halting triggers
- Systemd service for daemon deployment on Spectre cluster
- Complete setup documentation in docs/discord-setup.md

https://claude.ai/code/session_01WNamUYvvru6xmxpPLqqW4f
1 parent e9e7030 commit 1648687

34 files changed

Lines changed: 3308 additions & 2 deletions

docs/discord-setup.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# SPECTRE Agent System: Complete Setup Guide

This guide covers setting up the autonomous Python agent system with Discord bot integration on the Spectre (Franklin) cluster.

## Prerequisites

- Python 3.11+ on the cluster login/utility node
- `uv` package manager installed
- Access to SLURM commands (`sbatch`, `sacct`, `squeue`)
- BeeGFS mounted at `/mnt/beegfs/`
- An Anthropic API key with access to Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5
- A Discord account with permission to create bots

---

## Step 1: Create a Discord Bot

1. Go to https://discord.com/developers/applications
2. Click **New Application** and name it `SPECTRE Bot`
3. Go to the **Bot** tab:
   - Click **Add Bot** (if not already created)
   - Copy the **Token** and store it securely; Step 3 writes it to the secrets file
   - Enable **Message Content Intent** (required for reading messages)
   - Enable **Server Members Intent**
4. Go to the **OAuth2 → URL Generator** tab:
   - **Scopes**: select `bot` and `applications.commands`
   - **Bot Permissions**: select:
     - Send Messages
     - Embed Links
     - Attach Files
     - Use Application Commands (labelled "Use Slash Commands" in older versions of the portal)
     - Read Message History
     - Create Public Threads
   - Copy the generated URL
5. Open the URL in your browser and add the bot to your Discord server

## Step 2: Set Up Discord Server Channels

Create these channels in your Discord server:

| Channel | Purpose |
|---------|---------|
| `#simulation-status` | Automated status updates, milestones |
| `#decisions` | Interactive decision requests with buttons |
| `#alerts` | Failure alerts and critical warnings |
| `#plots` | Surface field PNGs, convergence plots |
| `#logs` | Verbose agent activity (optional) |

**Get your Guild (Server) ID:**
- Enable Developer Mode in Discord (Settings → Advanced → Developer Mode)
- Right-click your server name → Copy Server ID

## Step 3: Configure Secrets

Create the secrets file on the cluster:

```bash
sudo mkdir -p /etc/spectre-agents
# Discard tee's echo so the secrets are not printed to the terminal
sudo tee /etc/spectre-agents/env > /dev/null << 'EOF'
ANTHROPIC_API_KEY=sk-ant-your-key-here
DISCORD_BOT_TOKEN=your-bot-token-here
DISCORD_GUILD_ID=your-guild-id-here
EOF
sudo chmod 600 /etc/spectre-agents/env
# The owner must be the account that runs the service (joe in this deployment)
sudo chown joe:joe /etc/spectre-agents/env
```
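
Before the file is used by the service, it can help to confirm that it parses and that no key is missing or empty. The following is a minimal sketch, assuming the plain `KEY=VALUE` layout shown in the block above; `parse_env_text` and `missing_keys` are hypothetical helpers and are not part of `spectre_agents`.

```python
# Hypothetical helpers for checking a KEY=VALUE secrets file.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "DISCORD_BOT_TOKEN", "DISCORD_GUILD_ID")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Example input: one key set, one empty, one absent.
sample = "ANTHROPIC_API_KEY=sk-ant-your-key-here\nDISCORD_BOT_TOKEN=\n"
print(missing_keys(parse_env_text(sample)))
# prints ['DISCORD_BOT_TOKEN', 'DISCORD_GUILD_ID']
```

To check the real file, pass the text read from `/etc/spectre-agents/env` by the account that owns it.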

## Step 4: Install the Agent System

```bash
cd /mnt/beegfs/spectre-150-ensembles

# Create virtual environment
uv venv .venv

# Install dependencies (includes spectre_agents package)
uv sync

# Verify the package loads
.venv/bin/python -c "from spectre_agents.config import load_config; print('OK')"
```

## Step 5: Test the Bot Locally

Before installing as a service, test interactively:

```bash
cd /mnt/beegfs/spectre-150-ensembles

# Source the secrets
source /etc/spectre-agents/env
export ANTHROPIC_API_KEY DISCORD_BOT_TOKEN DISCORD_GUILD_ID

# Run the agent system
.venv/bin/python -m spectre_agents --config spectre_agents_config.yaml
```

You should see:
```
SPECTRE Agent System starting...
Bot connected as SPECTRE Bot#1234 (ID: ...)
Synced commands to guild ...
```

In Discord, the bot should post "SPECTRE Agent System online" in `#simulation-status`.

Test slash commands:
- `/run status`: should show current (idle) status
- `/validate`: should run namelist validation
- `/dashboard status`: should check dashboard health

Press `Ctrl+C` to stop.

## Step 6: Install as a Systemd Service

```bash
# Copy the service file
sudo cp systemd/spectre-agents.service /etc/systemd/system/

# Reload systemd
sudo systemctl daemon-reload

# Enable and start the service
sudo systemctl enable spectre-agents
sudo systemctl start spectre-agents

# Check status
sudo systemctl status spectre-agents

# View logs
journalctl -u spectre-agents -f
```

## Step 7: Verify Everything Works

1. In Discord, run `/run status`: the bot should respond with a status embed
2. Run `/validate`: should trigger namelist validation and return results
3. Run `/dashboard status`: should report dashboard component health
4. Run `/run start`: should validate, submit a SLURM job, and start monitoring

---

## Architecture Overview

```
┌─────────────────────────────────────────────┐
│            Spectre Cluster Node             │
│                                             │
│  systemd: spectre-agents.service            │
│  ┌───────────────────────────────────────┐  │
│  │  python -m spectre_agents             │  │
│  │                                       │  │
│  │  Discord Bot (asyncio event loop)     │  │
│  │  ├── Slash commands → Agent runner    │  │
│  │  ├── Decision queue ← Orchestrator    │  │
│  │  └── Status embeds → Discord          │  │
│  │                                       │  │
│  │  Agent Runner (ThreadPoolExecutor)    │  │
│  │  ├── Orchestrator (Opus)              │  │
│  │  │     delegates to:                  │  │
│  │  ├── WorkflowRunner (Haiku)           │  │
│  │  ├── StdoutDiagnostics (Sonnet)       │  │
│  │  ├── ModelOutputReview (Sonnet)       │  │
│  │  ├── NamelistValidator (Sonnet)       │  │
│  │  ├── ForcingDataQC (Sonnet)           │  │
│  │  ├── DashboardManager (Haiku)         │  │
│  │  ├── DiscordNotifier (Haiku)          │  │
│  │  └── WebResearch (Sonnet)             │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  SLURM     ←→ sbatch/sacct/squeue           │
│  BeeGFS    ←→ /mnt/beegfs/spectre-*         │
│  Tailscale ←→ Dashboard proxy               │
└─────────────────────────────────────────────┘
```
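
The split between the asyncio event loop and the thread pool in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `run_agent_blocking` and `handle_command` are hypothetical names, the pairing of commands with agents is a guess, and the agent call is replaced by a short sleep.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_agent_blocking(name: str, prompt: str) -> str:
    """Stand-in for a blocking agent run (hypothetical)."""
    time.sleep(0.05)
    return f"{name}: handled {prompt!r}"


async def handle_command(pool: ThreadPoolExecutor, name: str, prompt: str) -> str:
    """Offload the blocking call so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, run_agent_blocking, name, prompt)


async def demo() -> list[str]:
    # Two commands run in worker threads while the loop remains free.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(
            handle_command(pool, "NamelistValidator", "/validate"),
            handle_command(pool, "DashboardManager", "/dashboard status"),
        )


results = asyncio.run(demo())
print(results)
```

Running agents in worker threads keeps slow model calls from blocking the event loop that serves Discord interactions.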

## Discord Commands Reference

| Command | Description |
|---------|-------------|
| `/run start` | Validate config, submit simulation, start monitoring |
| `/run status` | Show job state, model days, CFL, throughput |
| `/run stop` | Cancel SLURM job, stop monitoring |
| `/run resubmit` | Clear run dir, resubmit from pickup |
| `/diagnose [job_id]` | Run STDOUT failure diagnostics |
| `/review` | Model output physical plausibility check |
| `/validate` | Pre-flight namelist validation |
| `/qc forcing` | EXF forcing data QC |
| `/qc obc` | OBC boundary data QC |
| `/dashboard start` | Start monitoring stack |
| `/dashboard status` | Health-check all components |
| `/dashboard restart [component]` | Restart dashboard/converter/plotter |
| `/ensemble start` | Begin bred vector generation |
| `/ensemble status` | Show ensemble convergence |
| `/config [param]` | Show simulation configuration |

## Agent Autonomy Levels

The system operates with **high autonomy**:

**Autonomous actions (no Discord approval needed):**
- Resubmit after SLURM walltime exceeded
- Restart dead dashboard/plotter/converter processes
- Clear run directory before resubmit
- Rebuild container image if not found

**Requires Discord approval (posts interactive buttons):**
- Timestep changes (CFL approaching 0.45)
- Ambiguous failure with multiple fix options
- Physics parameter changes (viscosity, diffusion)
- First-time configuration submission
- Bred vector cycle completion review
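
The two lists above amount to an approval policy. One way to express such a policy is an allow-list of autonomous actions, with everything else routed to a human. The sketch below is illustrative only; the `Action` enum and `requires_approval` are hypothetical names and do not come from the `spectre_agents` source.

```python
from enum import Enum, auto


class Action(Enum):
    """Actions named in the autonomy lists (identifiers are hypothetical)."""
    RESUBMIT_AFTER_WALLTIME = auto()
    RESTART_DASHBOARD_PROCESS = auto()
    CLEAR_RUN_DIRECTORY = auto()
    REBUILD_CONTAINER_IMAGE = auto()
    CHANGE_TIMESTEP = auto()
    FIX_AMBIGUOUS_FAILURE = auto()
    CHANGE_PHYSICS_PARAMETER = auto()
    FIRST_TIME_SUBMISSION = auto()
    REVIEW_BRED_VECTOR_CYCLE = auto()


# Only the four actions listed as autonomous may proceed without a human.
AUTONOMOUS = frozenset({
    Action.RESUBMIT_AFTER_WALLTIME,
    Action.RESTART_DASHBOARD_PROCESS,
    Action.CLEAR_RUN_DIRECTORY,
    Action.REBUILD_CONTAINER_IMAGE,
})


def requires_approval(action: Action) -> bool:
    """Anything not explicitly allow-listed waits for Discord approval."""
    return action not in AUTONOMOUS


print(requires_approval(Action.CHANGE_TIMESTEP))          # prints True
print(requires_approval(Action.RESUBMIT_AFTER_WALLTIME))  # prints False
```

Using an allow-list means an action added later defaults to requiring approval until someone deliberately marks it autonomous.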

## Troubleshooting

### Bot doesn't respond to commands
- Check `journalctl -u spectre-agents -f` for errors
- Verify `DISCORD_BOT_TOKEN` and `DISCORD_GUILD_ID` are correct
- Ensure the bot has the required permissions in your server
- Globally synced commands can take up to 1 hour to appear; guild-scoped sync is immediate

### "Claude Agent SDK not found" error
- Ensure `claude-agent-sdk` is installed: `.venv/bin/pip list | grep claude`
- The Claude Code CLI must be installed on the system: `which claude`
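
The two checks for the "Claude Agent SDK not found" error can also be run from Python inside the virtual environment. This is a rough sketch: `check_sdk_prerequisites` is a hypothetical helper, and the import name `claude_agent_sdk` is an assumption about how the `claude-agent-sdk` distribution exposes its module.

```python
import importlib.util
import shutil


def check_sdk_prerequisites(
    module: str = "claude_agent_sdk",  # assumed import name for claude-agent-sdk
    cli: str = "claude",
) -> dict[str, bool]:
    """Report whether the Python package and the CLI binary are both visible."""
    return {
        "python_package": importlib.util.find_spec(module) is not None,
        "cli_on_path": shutil.which(cli) is not None,
    }


print(check_sdk_prerequisites())
```

Run it with `.venv/bin/python` so that the result reflects the environment the service uses.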

### Agent times out
- Check `ANTHROPIC_API_KEY` is valid and has quota
- Increase `max_turns` in `spectre_agents_config.yaml` if agents need more steps
- Check network connectivity from the cluster node

### SLURM commands fail
- Verify the service runs as the correct user (joe)
- Check that SLURM is accessible from the node running the service
- Ensure the working directory exists: `/mnt/beegfs/spectre-150-ensembles`

## Cost Estimates

| Agent | Model | Approx. cost per invocation (USD) |
|-------|-------|-----------------------------------|
| Orchestrator | Opus 4.6 | $0.10 – $0.50 |
| Diagnostics/Review/Validator/QC | Sonnet 4.6 | $0.02 – $0.10 |
| WorkflowRunner/Dashboard/Notify | Haiku 4.5 | $0.005 – $0.02 |

A typical run-diagnose-fix-restart cycle costs approximately $0.50 – $1.00.
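
The per-invocation ranges in the table can be combined into a rough bound for a whole cycle. The sketch below shows the arithmetic only; the number of invocations per model is a made-up example, since the guide does not state how many calls a cycle makes, and `estimate_cost` is a hypothetical helper.

```python
# (low, high) USD per invocation, copied from the cost table.
COST_PER_INVOCATION = {
    "opus": (0.10, 0.50),
    "sonnet": (0.02, 0.10),
    "haiku": (0.005, 0.02),
}


def estimate_cost(invocations: dict[str, int]) -> tuple[float, float]:
    """Sum the low and high bounds over the given invocation counts."""
    low = sum(COST_PER_INVOCATION[m][0] * n for m, n in invocations.items())
    high = sum(COST_PER_INVOCATION[m][1] * n for m, n in invocations.items())
    return round(low, 3), round(high, 3)


# Hypothetical mix for one run-diagnose-fix-restart cycle.
cycle = {"opus": 2, "sonnet": 3, "haiku": 4}
print(estimate_cost(cycle))
```

With this assumed mix the bounds come to about $0.28 and $1.38, which bracket the $0.50 – $1.00 figure quoted above.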

pyproject.toml

Lines changed: 8 additions & 2 deletions
@@ -33,11 +33,17 @@ dependencies = [
     "xarray==2025.6.1",
     "xgcm==0.8.1",
     "zarr==3.0.8",
-    "MetPy==1.7.1"
+    "MetPy==1.7.1",
+    "claude-agent-sdk",
+    "discord.py>=2.3.0",
+    "anyio>=4.0.0",
 ]
 [project.urls]
 Homepage = "https://github.com/ocean-spectre/spectra-150-ensembles"
 Issues = "https://github.com/fluidnumerics/spectre_utils/issues"
 
+[project.scripts]
+spectre-agents = "spectre_agents.__main__:cli"
+
 [tool.setuptools]
-packages = ["spectre_utils"]
+packages = ["spectre_utils", "spectre_agents", "spectre_agents.tools", "spectre_agents.agents", "spectre_agents.discord_bot"]

spectre_agents/__init__.py

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
"""SPECTRE Simulation Agent System.

Autonomous Python agents for MITgcm ocean simulation orchestration,
with Discord bot integration for bidirectional communication.
"""

__version__ = "0.1.0"

spectre_agents/__main__.py

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
"""Entry point for the SPECTRE agent system.

Usage:
    python -m spectre_agents [--config PATH]

Starts the Discord bot and agent runner as concurrent asyncio tasks.
"""

from __future__ import annotations

import argparse
import asyncio
import logging
import signal
import sys
from pathlib import Path

from spectre_agents.config import load_config
from spectre_agents.context import AgentContext
from spectre_agents.discord_bot.bot import run_bot
from spectre_agents.tools.discord_notify import set_agent_context

logger = logging.getLogger("spectre_agents")


def setup_logging() -> None:
    """Configure structured logging to stderr."""
    fmt = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(fmt)

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(handler)

    # Suppress noisy discord.py debug logs
    logging.getLogger("discord").setLevel(logging.WARNING)
    logging.getLogger("discord.http").setLevel(logging.WARNING)


async def main(config_path: str | None = None) -> None:
    """Main async entry point."""
    setup_logging()

    config = load_config(config_path)
    logger.info("Loaded config: base_dir=%s, sim_dir=%s", config.base_dir, config.sim_dir)

    # Validate required secrets
    if not config.anthropic_api_key:
        logger.error("ANTHROPIC_API_KEY not set. Set it in /etc/spectre-agents/env or environment.")
        sys.exit(1)
    if not config.discord_bot_token:
        logger.error("DISCORD_BOT_TOKEN not set. Set it in /etc/spectre-agents/env or environment.")
        sys.exit(1)

    # Initialize shared context
    ctx = AgentContext(base_dir=config.base_dir)
    ctx.load_state()
    logger.info("Loaded state: status=%s, job=%s", ctx.simulation.status, ctx.simulation.active_job_id)

    # Wire up Discord tools with the context
    set_agent_context(ctx)

    # Handle shutdown signals. Inside a coroutine, get_running_loop() is the
    # supported way to reach the loop that asyncio.run() created.
    loop = asyncio.get_running_loop()
    stop_event = asyncio.Event()

    def signal_handler(sig: signal.Signals) -> None:
        logger.info("Received signal %s, shutting down...", sig)
        stop_event.set()

    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, signal_handler, sig)

    # Run the Discord bot until it exits on its own or a shutdown signal
    # arrives. Without waiting on stop_event, SIGINT/SIGTERM would be
    # swallowed by the handler above and the bot would keep running.
    logger.info("Starting SPECTRE Agent System...")
    bot_task = asyncio.create_task(run_bot(config, ctx))
    stop_task = asyncio.create_task(stop_event.wait())
    try:
        await asyncio.wait({bot_task, stop_task}, return_when=asyncio.FIRST_COMPLETED)
        if bot_task.done():
            bot_task.result()  # re-raise any exception from the bot
    except asyncio.CancelledError:
        pass
    finally:
        for task in (bot_task, stop_task):
            task.cancel()
        await asyncio.gather(bot_task, stop_task, return_exceptions=True)
        ctx.save_state()
        logger.info("SPECTRE Agent System stopped.")


def cli() -> None:
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="SPECTRE Simulation Agent System with Discord bot"
    )
    parser.add_argument(
        "--config",
        type=str,
        default=None,
        help="Path to spectre_agents_config.yaml",
    )
    args = parser.parse_args()
    asyncio.run(main(args.config))


if __name__ == "__main__":
    cli()
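
The shutdown path in `main` relies on an `asyncio.Event` that a signal handler sets. For the event to have any effect, something has to wait on it alongside the bot. The pattern can be sketched in isolation as follows; this is an illustration under assumptions, with `fake_bot` standing in for `run_bot` and a timer standing in for a real signal.

```python
import asyncio


async def fake_bot() -> None:
    """Stand-in for run_bot: runs until cancelled (hypothetical)."""
    await asyncio.Event().wait()


async def run_until_stopped(stop_after: float) -> str:
    loop = asyncio.get_running_loop()
    stop_event = asyncio.Event()
    # A real service would register loop.add_signal_handler(sig, stop_event.set);
    # here a timer plays the role of the signal.
    loop.call_later(stop_after, stop_event.set)

    bot_task = asyncio.create_task(fake_bot())
    stop_task = asyncio.create_task(stop_event.wait())
    done, pending = await asyncio.wait(
        {bot_task, stop_task}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel whichever task is still running and wait for it to unwind.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return "stopped by signal" if stop_task in done else "bot exited"


outcome = asyncio.run(run_until_stopped(0.05))
print(outcome)  # prints "stopped by signal"
```

Racing the bot task against the event lets cleanup such as saving state run on a normal code path after either one finishes.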
