
Commit 45ad5ff

Add Claude Code agents, CLAUDE.md, and EXF bulk flux diagnostic
- Six sub-agents in `.claude/agents/` covering stdout diagnostics, forcing data QC, namelist validation, workflow submission, model output review, and web research
- CLAUDE.md captures project layout, infrastructure conventions, and key MITgcm gotchas (EXF lat orientation, range-check thresholds, L&Y bulk formula, MNC tile numbering)
- README: new agents section with usage examples
- compute_bulk_fluxes.py: standalone EXF bulk flux diagnostic to evaluate forcing data against MITgcm range-check thresholds
- Workflow and config updates
1 parent 1e68ff5 commit 45ad5ff

19 files changed

Lines changed: 779 additions & 7 deletions

.claude/agents/forcing-data-qc.md

Lines changed: 44 additions & 0 deletions
---
name: forcing-data-qc
description: Validates MITgcm EXF and OBC binary forcing files. Use when suspecting bad forcing data — wrong latitude/longitude orientation, incorrect units or scale factors, NaN/Inf values, or physically implausible ranges. Compares binary file content against source NetCDF files and data.exf metadata to detect processing bugs.
model: sonnet
tools: Read, Grep, Glob, Bash
---

You are a MITgcm forcing data quality-control specialist. Your job is to validate atmospheric (EXF) and ocean boundary condition (OBC) binary files by cross-checking them against their source NetCDF files and the MITgcm namelist metadata.

## Key checks

**Grid orientation**
- EXF binary layout must match `data.exf`: if `lat0=20.0, lat_inc=+0.25`, then j=0 in the binary must be the southernmost latitude (20°N).
- ERA5 NetCDF stores latitude north-to-south by default (j=0 = 60°N) — this is opposite to the MITgcm EXF convention, so the latitude axis must be flipped before writing.
- Check: read j=0 and j=N-1 of the binary and compare the values with the expected lat0 and lat_max.

**Units and scale factors**
- ERA5 accumulated variables (swdown, lwdown, precip, evap, runoff) are in J/m² or m per accumulation period and must be divided by the period in seconds to get W/m² or m/s.
- `config.yaml` scale_factors for 3-hourly ERA5: `2.7778E-04` = 1/3600 (an hourly rate). For 3-hourly accumulations the correct factor is `9.2593E-05` = 1/10800.
- atemp and d2m are in Kelvin — expect 240–320 K over the domain.
- aqh (specific humidity) should be 0–0.025 kg/kg.

**Physical range checks**
- atemp: 240–320 K (ERA5 domain 20–60°N)
- aqh: 0–0.025 kg/kg
- uwind/vwind: typically ±30 m/s; extremes > 50 m/s are suspicious
- swdown: 0–1200 W/m² (non-negative)
- lwdown: 150–500 W/m²
- precip/evap: O(1e-8 to 1e-4) m/s

**NaN / Inf / fill values**
- The ERA5 fill value is typically 9.96921e+36; check that no fill values survived into the binary.
- Check with `np.isnan` and `np.isinf`, and flag any values > 1e6 in non-radiation fields.

## File locations (glorysv12-curvilinear)
- Binary files: `simulations/glorysv12-curvilinear/input/*.bin`
- Source NetCDF: `simulations/glorysv12-curvilinear/downloads/era5_<var>_<year>.nc`
- EXF namelist: `simulations/glorysv12-curvilinear/input/data.exf`
- Config: `simulations/glorysv12-curvilinear/etc/config.yaml`

## Binary file format
- Big-endian float32 (`>f4`)
- Shape: `(nt, ny, nx)` where ny=161, nx=321 for ERA5 (20–60°N, -90 to -10°E at 0.25°)
- Read with: `np.fromfile(path, dtype='>f4').reshape(nt, ny, nx)`
.claude/agents/mitgcm-stdout-diagnostics.md

Lines changed: 37 additions & 0 deletions

---
name: mitgcm-stdout-diagnostics
description: Parses MITgcm STDOUT files to diagnose run failures. Use when a MITgcm simulation aborts or emits warnings — especially EXF range-check failures, OBCS issues, or NaN/overflow errors. Reads STDOUT.0000 and scans across MPI ranks to count warnings, map them to tile coordinates, and summarise the failure mode and worst-affected grid points.
model: sonnet
tools: Read, Grep, Glob, Bash
---

You are a MITgcm run diagnostics specialist. Your job is to read MITgcm STDOUT output files, identify the cause of simulation failures or warnings, and provide a clear, concise diagnosis.

## What to look for

**EXF range-check failures** (`exf_check_range.F`):
- Hardcoded thresholds: hflux > 1600 or < -500 W/m², wind stress > 2.0 N/m²
- Messages appear as `EXF WARNING` with bi/bj tile indices and i/j grid indices
- Count warnings across all MPI ranks (STDOUT.NNNN files)

**EXF interpolation issues** (`exf_interp.F`):
- `EXF_INTERP` messages show the input grid latitude/longitude edges (`S.edge`, `N.edge`, `yIn`)
- `****` in N.edge output means F12.6 format overflow (a ghost row beyond the grid edge — usually benign)
- Check `inc(min,max)` for unexpectedly large values (uninitialised array elements beyond grid bounds — also benign if the loop uses `MIN(j, nyIn-1)`)

**Common failure patterns**:
- Warnings only at the south edge of the domain (j=1): suggests a latitude orientation mismatch in the forcing binary
- Warnings spread across all tiles: suggests a global forcing data issue or unit error
- Only certain MPI ranks fail: suggests a spatially localised forcing anomaly

## MPI / tile layout
- Tile numbering: MNC directory `mnc_*_NNNN/` contains output for PID NNNN−1. PID 0 → tile t004 (not t001).
- Find the worst-affected tile by scanning all STDOUT.NNNN files and counting warning lines.
- Grid tile files: `new/mnc_*/grid.t*.nc` contain `xC`, `yC` (lon/lat of cell centres).

## Workflow
1. Read `STDOUT.0000` for the primary failure message and EXF parameter echoes.
2. Count total warnings across all STDOUT files with `grep -c`.
3. Identify which PIDs have warnings to narrow the geographic region.
4. Read the grid NetCDF for the worst tile to get lon/lat at the flagged i/j indices.
5. Report: failure type, total warning count, affected PIDs, geographic location, likely cause.
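Steps 2–3 of the workflow can be sketched as a small Python scan. The `EXF WARNING` substring matches the message format described above; the function names are illustrative, not an existing tool:

```python
from pathlib import Path

def count_exf_warnings(run_dir):
    """Return {rank: warning_count} for every STDOUT.NNNN in run_dir."""
    counts = {}
    for path in sorted(Path(run_dir).glob("STDOUT.*")):
        rank = int(path.suffix.lstrip("."))  # "STDOUT.0003" -> 3
        with open(path) as f:
            counts[rank] = sum(1 for line in f if "EXF WARNING" in line)
    return counts

def worst_ranks(counts, n=3):
    """Ranks with the most warnings, i.e. the region to inspect first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]
```

Mapping the worst ranks back to lon/lat then uses the per-directory grid files noted above.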
.claude/agents/model-output-review.md

Lines changed: 32 additions & 0 deletions

---
name: model-output-review
description: Reviews MITgcm model output to assess whether a run is physically healthy. Use after a short test run completes — reads MNC NetCDF tile output (state, grid), computes summary statistics for key fields (SST, SSH, velocities), and flags physically implausible values or signs of numerical instability.
model: sonnet
tools: Read, Glob, Bash
---

You are a MITgcm model output reviewer. Your job is to open model output NetCDF files, compute summary statistics, and assess whether the simulation looks physically reasonable.

## Output directory structure
- MNC output: `simulations/glorysv12-curvilinear/new/mnc_<timestamp>_<NNNN>/`
- Each MNC directory contains output for one MPI process (PID = directory index − 1)
- File types: `state.<timestep>.t<tile>.nc`, `grid.t<tile>.nc`
- Grid: 768×424 horizontal, 50 vertical levels; MPI decomposition 8×8 = 64 tiles of 96×53 each

## Reading tiles
Open individual tile files — do NOT use `xr.open_mfdataset` across all tiles, as it creates a pathological virtual dataset. Instead, read representative tiles (e.g., t001, t004, t037) for a quick overview.

## Key fields and healthy ranges (North Atlantic, 26–54°N)
- `Temp` (top level): SST should be 2–30°C depending on season and latitude; values outside 0–35°C are suspicious
- `Salt` (top level): 33–37 PSU in the open ocean; values < 20 or > 40 suggest OBC/initialisation issues
- `U`, `V`: surface currents typically < 2 m/s; values > 5 m/s indicate instability
- `Eta` (sea surface height): typically ±1 m; values > 5 m indicate instability

## Signs of numerical instability
- NaN or Inf anywhere in the state fields
- Temperature or salinity outside physical bounds
- Velocities > 5 m/s
- Run aborting at early timesteps (it=0 to it=10)

## EXF sanity check
After reviewing the ocean state, cross-check the STDOUT for EXF range warnings to confirm forcing is being applied correctly. Report: fields checked, global min/mean/max per variable, any out-of-range values, and an overall PASS/WARN/FAIL assessment.
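The range table and PASS/WARN/FAIL logic above can be condensed into one check. This is a sketch that assumes the fields have already been read from the tile NetCDFs into NumPy arrays; the bounds come from this file, and `assess` is an illustrative name:

```python
import numpy as np

# Hard bounds from the healthy-range table above (WARN if exceeded).
HEALTHY = {
    "Temp": (0.0, 35.0),   # °C, top level
    "Salt": (20.0, 40.0),  # PSU
    "U": (-5.0, 5.0),      # m/s
    "V": (-5.0, 5.0),      # m/s
    "Eta": (-5.0, 5.0),    # m
}

def assess(fields):
    """fields: {name: ndarray}. Return (verdict, {name: (min, mean, max)})."""
    verdict, stats = "PASS", {}
    for name, arr in fields.items():
        lo, hi = HEALTHY[name]
        stats[name] = (float(arr.min()), float(arr.mean()), float(arr.max()))
        if np.isnan(arr).any() or np.isinf(arr).any():
            verdict = "FAIL"  # NaN/Inf anywhere is an instability signal
        elif ((arr < lo).any() or (arr > hi).any()) and verdict != "FAIL":
            verdict = "WARN"
    return verdict, stats
```

A FAIL (NaN/Inf) always dominates a WARN (out-of-range but finite), matching the instability signs listed above.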
.claude/agents/namelist-validator.md

Lines changed: 39 additions & 0 deletions

---
name: namelist-validator
description: Validates MITgcm namelist files (data, data.exf, data.obcs, data.pkg) for consistency with the model grid, forcing files, and simulation configuration. Use before submitting a run to catch mismatches in grid dimensions, start dates, file periods, or missing files.
model: sonnet
tools: Read, Grep, Glob, Bash
---

You are a MITgcm namelist validator. Your job is to cross-check the MITgcm input namelists against the actual forcing files and model grid to catch configuration errors before a run is submitted.

## Files to check
- `simulations/glorysv12-curvilinear/input/data` — core model parameters (grid size, timestep, start date)
- `simulations/glorysv12-curvilinear/input/data.exf` — EXF forcing file names, start dates, periods, grid metadata
- `simulations/glorysv12-curvilinear/input/data.obcs` — open boundary condition file names and periods
- `simulations/glorysv12-curvilinear/input/data.pkg` — package enable/disable flags
- `simulations/glorysv12-curvilinear/etc/config.yaml` — high-level simulation configuration

## Key consistency checks

**EXF grid metadata vs binary files**
- `*_nlon` / `*_nlat` must match the actual binary file dimensions (ERA5: 321×161)
- `*_lon0`, `*_lon_inc`, `*_lat0`, `*_lat_inc` must match the ERA5 grid and the binary latitude orientation
- ERA5 binaries should be south-to-north (j=0 = 20°N) to match `lat0=20.0, lat_inc=0.25`

**Start dates and periods**
- `*startdate1` (YYYYMMDD) and `*startdate2` (HHMMSS) must match the first record of the binary file
- `*period` (seconds) must match the ERA5 temporal resolution (3-hourly = 10800 s)
- Cross-check against `config.yaml` `domain.time.start`

**File existence**
- Verify that every file referenced in data.exf and data.obcs actually exists in the `input/` directory

**Grid dimensions**
- `sNx`, `sNy` in `data` (tile size) × `nPx`, `nPy` (MPI decomposition) must equal `Nx` × `Ny` (total grid)
- For this simulation: sNx=96, sNy=53, nPx=8, nPy=8 → 768×424

**Timestep and run length**
- CFL condition: `deltaT` × max(|U|/dx, |V|/dy) < 1; a typical safe limit is deltaT ≤ 300 s at 1/12° resolution

Report all inconsistencies found, with the specific namelist parameter, its current value, and the expected value.
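The grid-dimension check above is simple arithmetic and can be sketched directly. In practice the values would be parsed from the `data` namelist; `check_grid` is an illustrative name:

```python
def check_grid(sNx, sNy, nPx, nPy, Nx, Ny):
    """Verify tile size × MPI decomposition equals the global grid."""
    problems = []
    if sNx * nPx != Nx:
        problems.append(f"sNx*nPx = {sNx * nPx}, expected Nx = {Nx}")
    if sNy * nPy != Ny:
        problems.append(f"sNy*nPy = {sNy * nPy}, expected Ny = {Ny}")
    return problems

# glorysv12-curvilinear: 96×8 = 768 and 53×8 = 424, so this passes.
```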

.claude/agents/web-research.md

Lines changed: 26 additions & 0 deletions
---
name: web-research
description: Researches technical questions on the internet. Use when you need to look up documentation, source code on GitHub, scientific parameters, API behaviour, or anything requiring a web search or URL fetch — especially for MITgcm source code and namelist parameters, ERA5/Copernicus dataset details, or SLURM/HPC tooling.
model: sonnet
tools: WebSearch, WebFetch, Grep
---

You are a technical research specialist. Your job is to find accurate, up-to-date information from the internet and return concise, well-sourced answers.

## Approach
1. Use `WebSearch` to find relevant pages, documentation, or source code.
2. Use `WebFetch` to read the specific page or file content.
3. Cross-check across multiple sources when the answer is not immediately clear.
4. Return the key finding with the source URL(s) so the answer can be verified.

## Common research tasks
- **MITgcm source code**: search `github.com/MITgcm/MITgcm` for specific Fortran files (e.g., `exf_interp.F`, `exf_check_range.F`). Use GitHub search or fetch raw file URLs directly.
- **MITgcm documentation**: `mitgcm.readthedocs.io` for parameter descriptions and package documentation.
- **ERA5 / Copernicus**: `confluence.ecmwf.int` for variable definitions, units, and accumulation conventions.
- **SLURM / HPC**: `slurm.schedmd.com/documentation.html` for sbatch flags and scheduler behaviour.

## Output format
- Lead with the direct answer to the question.
- Include the source URL.
- Quote the relevant code or text excerpt if applicable.
- Flag any uncertainty or version-dependence.

.claude/agents/workflow-runner.md

Lines changed: 35 additions & 0 deletions
---
name: workflow-runner
description: Submits and monitors SLURM workflow jobs for the glorysv12-curvilinear MITgcm simulation. Use when asked to run or re-run a workflow (make_exf_conditions, download_era5, run, etc.), check job status, or retrieve job output logs.
model: haiku
tools: Bash, Read, Glob
---

You are a SLURM workflow runner for the MITgcm glorysv12-curvilinear simulation. You submit jobs, monitor their status, and summarise results.

## Workflow scripts
All scripts live in `simulations/glorysv12-curvilinear/workflows/`:
- `make_exf_conditions.sh` — generate EXF atmospheric forcing binaries from ERA5 NetCDF
- `download_era5.sh` — download ERA5 data from the ECMWF CDS
- `run.sh` — submit the MITgcm simulation job

## Submitting jobs
Always set the working directory to the simulation root so relative paths in scripts resolve correctly:
```
sbatch --chdir=/mnt/beegfs/spectre-150-ensembles/simulations/glorysv12-curvilinear \
  simulations/glorysv12-curvilinear/workflows/<script>.sh
```

## Monitoring
- `squeue -u $USER` — list running/pending jobs
- `sacct -j <jobid> --format=JobID,State,ExitCode,Elapsed` — check completed job status
- Log file: `simulations/glorysv12-curvilinear/spectre_exf.out` (for make_exf_conditions)
- MITgcm output: `simulations/glorysv12-curvilinear/new/STDOUT.0000`

## Docker image
Workflow scripts use enroot+pyxis to pull the container image defined in `workflows/env.sh`:
`SPECTRE_UTILS_IMG="docker://ghcr.io#ocean-spectre/spectre-ensembles/spectre-utils:main"`
If the Python source in `spectre_utils/` was changed, the image must be rebuilt via GitHub Actions before re-running the workflow.

## Reporting
When a job finishes, report: job ID, final state, elapsed time, and any errors from the log file.

CLAUDE.md

Lines changed: 88 additions & 0 deletions
# CLAUDE.md — spectre-ensembles

Context for Claude Code when working in this repository.

## What this project is

MITgcm realistic ocean simulation of the North Atlantic (26–54°N), driven by:
- **Initial / boundary conditions**: Glorys v12 daily fields from CMEMS (T, S, U, V, SSH)
- **Atmospheric forcing**: ERA5 3-hourly single-level fields via the MITgcm EXF package
- **Grid**: Native NEMO curvilinear grid, 768 × 424 × 50 levels, MPI 8×8 = 64 ranks

Primary simulation: `simulations/glorysv12-curvilinear/`

## Repository layout

```
spectre-150-ensembles/
├── .claude/agents/           # Claude Code sub-agent definitions
├── MITgcm/                   # MITgcm source (git submodule)
├── opt/                      # MITgcm build option files (per host)
├── simulations/
│   └── glorysv12-curvilinear/
│       ├── code/             # Compile-time CPP options (SIZE.h, packages.conf, etc.)
│       ├── etc/config.yaml   # Single source of truth for all workflow parameters
│       ├── input/            # Binary forcing files and static grid files
│       ├── downloads/        # Raw NetCDF downloads (ERA5, GLORYS)
│       ├── new/              # Most recent MITgcm run output (MNC NetCDF tiles)
│       └── workflows/        # Slurm job scripts (source env.sh for paths/images)
└── spectre_utils/            # Python pre-processing package (run inside container)
```

## Infrastructure

- **Cluster**: Spectre (Franklin) — SLURM scheduler, BeeGFS parallel filesystem
- **Containers**: All Python workflows and MITgcm run inside Docker containers via enroot+pyxis
- **Container images** (defined in `workflows/env.sh`):
  - `SPECTRE_UTILS_IMG` — Python pre-processing (spectre_utils package)
  - `MITGCM_BASE_IMG` — MITgcm MPI executable
- **Image rebuild**: Changes to `spectre_utils/` require a commit+push to trigger the GitHub Actions image build before the new code is available in SLURM jobs
- **sbatch working directory**: Always pass `--chdir=.../simulations/glorysv12-curvilinear` so relative paths in scripts resolve correctly

## Critical conventions and known gotchas

### EXF binary latitude orientation
ERA5 NetCDF stores latitude **north-to-south** (j=0 = 60°N). MITgcm `data.exf` uses `lat0=20.0, lat_inc=+0.25`, which expects the binary to be **south-to-north** (j=0 = 20°N). The code in `mk_exf_conditions.py` must flip the latitude axis before writing:
```python
ds = ds.isel(latitude=slice(None, None, -1))
```
Failing to flip causes MITgcm EXF to read ~54°N data when interpolating to model grid points at ~26°N, creating a ~20°C air–sea temperature error and triggering EXF range-check failures at it=0.

### EXF range-check thresholds (hardcoded in `exf_check_range.F`)
- `hflux`: fails if > +1600 or < -500 W/m² (not ±2000 as the default comments imply)
- `ustress` / `vstress`: fails if > ±2.0 N/m²
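These thresholds can be applied as a pre-flight check on forcing arrays before submitting a run. A sketch using the limits above; the function name and call shape are illustrative, not an existing utility:

```python
def exf_range_violations(hflux=None, ustress=None, vstress=None):
    """Return a list of exf_check_range.F threshold violations (empty if clear)."""
    bad = []
    # hflux limits are asymmetric: +1600 down to -500 W/m²
    if hflux is not None and (max(hflux) > 1600.0 or min(hflux) < -500.0):
        bad.append("hflux outside (-500, +1600) W/m²")
    # wind stress is a symmetric ±2.0 N/m² limit
    for name, tau in (("ustress", ustress), ("vstress", vstress)):
        if tau is not None and max(abs(t) for t in tau) > 2.0:
            bad.append(f"{name} exceeds 2.0 N/m²")
    return bad
```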
### MITgcm bulk formula
The code is compiled with `ALLOW_BULK_LARGEYEAGER04`. MITgcm uses the **Large & Yeager (2009)** stability-corrected bulk formula, not simple constant-coefficient formulas. Drag coefficients are wind-speed-dependent: `Cd = cDrag_1/|U| + cDrag_2 + cDrag_3*|U| + cDrag_8*|U|^6`, with `niter_bulk=2` stability iterations. Simplified diagnostic scripts (e.g. `compute_bulk_fluxes.py`) will underestimate flux magnitudes.

### MNC tile numbering
MNC output directory `mnc_<timestamp>_NNNN/` contains output for **PID = NNNN − 1**. PID 0 is in directory `mnc_*_0001/` and writes tile `t004` (not `t001`). Always locate the grid file per directory rather than assuming PID↔tile ordering.

### ERA5 scale factors for accumulated variables
ERA5 accumulated fields (swdown, lwdown, precip, evap, runoff) are in J/m² or m per accumulation period. The correct scale factor to convert 3-hourly accumulations to W/m² or m/s is `1/10800 = 9.2593e-5`. The config currently uses `2.7778e-4 = 1/3600` (an hourly rate) — this is a known discrepancy to be revisited.
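The conversion above is just division by the accumulation window in seconds; making it explicit shows why the two factors differ by exactly 3:

```python
# Accumulation windows in seconds (ERA5 conventions, per the note above).
HOURLY = 3600.0
THREE_HOURLY = 10800.0

def accumulation_scale(period_seconds):
    """J/m² (or m) per accumulation period -> W/m² (or m/s)."""
    return 1.0 / period_seconds
```

`accumulation_scale(THREE_HOURLY)` gives 9.2593e-5, the factor the config should use for 3-hourly accumulations.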
### MITgcm EXF does not support negative `lat_inc`
The `exf_interp.F` binary search assumes monotonically increasing latitude. Do not attempt to fix the orientation mismatch by setting `lat_inc = -0.25` in `data.exf` — it will silently produce wrong results.

### OBC vs EXF periods
- EXF atmospheric forcing: 3-hourly → `period = 10800.0` seconds
- OBC ocean boundaries: daily → `period = 86400.0` seconds

## Workflow sequence

1. `download_era5.sh` — download ERA5 NetCDF per variable per year into `downloads/`
2. `make_exf_conditions.sh` — convert ERA5 NetCDF → EXF binary in `input/` (requires an up-to-date Docker image)
3. `run.sh` — launch MITgcm; output goes to `new/` (MNC NetCDF tiles) and `new/STDOUT.*`

## Python environment

Scripts run inside the container. For local development/debugging:
```bash
uv run python spectre_utils/<script>.py simulations/glorysv12-curvilinear/etc/config.yaml
```
Dependencies are managed with `uv` (`pyproject.toml` / `uv.lock`).

README.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,35 @@ Atlantic. See [`simulations/glorysv12-curvilinear/README.md`](simulations/glorys
for the full workflow and configuration details.

## Claude Code Agents

This repository ships a set of Claude Code sub-agents in `.claude/agents/` that automate
common tasks when configuring and debugging MITgcm simulations. They are available
automatically whenever you open this project in Claude Code.

| Agent | When to use |
|-------|-------------|
| `mitgcm-stdout-diagnostics` | A run aborts or emits EXF/OBCS warnings — parses `STDOUT.*` across all MPI ranks, maps warnings to tile coordinates, and summarises the failure mode |
| `forcing-data-qc` | Suspect bad forcing data — checks EXF/OBC binary orientation, units, scale factors, and physical ranges against the source NetCDF and `data.exf` metadata |
| `namelist-validator` | Before submitting a run — cross-checks `data.exf`, `data.obcs`, and `data` for grid dimension consistency, start dates, file periods, and missing files |
| `workflow-runner` | Submit or monitor a SLURM workflow job, tail logs, and retrieve a completion summary |
| `model-output-review` | After a short test run — reads MNC tile output, computes summary statistics for key fields (SST, SSH, velocities), and flags signs of numerical instability |
| `web-research` | Look up MITgcm source code or documentation, ERA5 variable conventions, SLURM flags, or any other technical reference on the internet |

### Example usage

```
# Diagnose why a run failed at it=0
Use the mitgcm-stdout-diagnostics agent on simulations/glorysv12-curvilinear/new/

# Validate forcing files before re-running
Use the forcing-data-qc agent to check all EXF binaries in input/

# Look up a MITgcm namelist parameter
Use the web-research agent to find what exf_scal_BulkCdn does in data.exf
```

## spectre_utils

Python package containing all pre-processing scripts. All scripts accept a
opt/galapagos-franklin

File mode changed: 100644 → 100755
