| 1 | +# Bred Vector Ensemble Generation |
| 2 | + |
| 3 | +This directory contains the configuration and member directories for generating |
| 4 | +50 independent ensemble initial conditions using the **bred vector method**. |
| 5 | + |
| 6 | +## Methodology |
| 7 | + |
| 8 | +### Why bred vectors? |
| 9 | + |
| 10 | +Ocean models are chaotic — small differences in initial conditions grow |
| 11 | +exponentially, projecting onto the system's fastest-growing instability modes. |
| 12 | +The bred vector method exploits this by: |
| 13 | + |
| 14 | +1. Perturbing a control state with small noise |
| 15 | +2. Running the model forward to let perturbations grow |
| 16 | +3. Rescaling the perturbation back to a target amplitude |
| 17 | +4. Repeating until the perturbation "locks on" to the dominant growing modes |
| 18 | + |
| 19 | +After several breeding cycles, the perturbations are no longer random noise — |
| 20 | +they represent physically meaningful, dynamically balanced uncertainty structures |
| 21 | +(Gulf Stream meanders, baroclinic eddies, frontal instabilities). These are |
| 22 | +far more realistic initial condition perturbations than random noise alone. |
| 23 | + |
| 24 | +### Breeding cycle |
| 25 | + |
| 26 | +``` |
| 27 | +Control: ─────────────────────────────────────────────► |
| 28 | + │ │ |
| 29 | + │ perturb │ rescale & perturb |
| 30 | + ▼ ▼ |
| 31 | +Member i: ─────●════════════════════●════════════════════●───► |
| 32 | + cycle 0 cycle 1 cycle 2 ... |
| 33 | + (30 days) (30 days) |
| 34 | +``` |
| 35 | + |
| 36 | +Each cycle: |
| 37 | +1. Member starts from `control_state + perturbation` |
| 38 | +2. Runs forward for 30 days (configurable) |
| 39 | +3. At end: `bred_vector = member_state - control_state` |
| 40 | +4. Rescale factor = `target_RMS / actual_RMS` (computed from temperature) |
| 41 | +5. **Same rescale factor applied to ALL variables** (T, S, U, V, SSH) to preserve |
| 42 | + geostrophic and hydrostatic balance |
| 43 | +6. New member state: `control_state + rescale_factor × bred_vector` |
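
The rescaling in steps 3 to 6 can be sketched as follows. This is a minimal illustration in plain Python, not the code in `breed_vectors.py`: the function name `rescale_member`, the dict-of-lists state layout, and the two-point sample values are all assumptions made for the example.

```python
import math


def rms(values):
    """Root-mean-square of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rescale_member(control, member, target_rms, ref_var="T"):
    """Return the next-cycle member state and the rescale factor.

    control, member: dicts mapping variable name -> flat list of values
    (hypothetical layout; real model states are multi-dimensional arrays).
    """
    # Step 3: bred vector = member - control, for every variable
    bred = {
        name: [m - c for m, c in zip(member[name], control[name])]
        for name in control
    }
    # Step 4: one factor, computed from the reference variable only
    actual_rms = rms(bred[ref_var])
    if actual_rms == 0.0:
        raise ValueError("bred vector is zero; cannot rescale")
    factor = target_rms / actual_rms
    # Steps 5-6: apply the same factor to all variables, then add to control
    new_state = {
        name: [c + factor * b for c, b in zip(control[name], bred[name])]
        for name in control
    }
    return new_state, factor


# Made-up two-point fields, for illustration only
control = {"T": [10.0, 12.0], "U": [0.1, 0.2]}
member = {"T": [10.3, 12.4], "U": [0.13, 0.24]}
new_state, factor = rescale_member(control, member, target_rms=0.05)
```

Because `factor` multiplies every variable, the ratio between the U and T perturbations at each point is the same before and after rescaling, which is the balance-preserving property described in step 5.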
| 44 | + |
| 45 | +### Design choices |
| 46 | + |
| 47 | +**50 independent streams**: Each member has its own random seed and evolves |
| 48 | +independently. This maximizes the diversity of growing modes captured. |
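
As a rough sketch of what "its own random seed" could look like, the snippet below draws reproducible noise per member. The choice of Gaussian noise, the use of the member index as the seed, and the name `initial_perturbation` are assumptions for illustration; the actual `init` step may differ.

```python
import math
import random


def initial_perturbation(member_id, n_points, target_rms=0.05):
    """Seeded noise field, normalized to the target RMS amplitude."""
    rng = random.Random(member_id)  # assumption: seed = member index
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    actual_rms = math.sqrt(sum(x * x for x in noise) / n_points)
    return [x * target_rms / actual_rms for x in noise]


p1 = initial_perturbation(1, 1000)
p2 = initial_perturbation(2, 1000)
```

Different seeds give different starting perturbations (`p1 != p2`), while rerunning with the same seed reproduces a member exactly.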
| 49 | + |
| 50 | +**Single rescaling factor from temperature**: Rather than rescaling each variable |
| 51 | +independently (which would break dynamical consistency), we compute one factor |
| 52 | +from the temperature field and apply it uniformly. The bred vector's internal |
| 53 | +balance between T, S, U, V, and SSH is preserved. |
| 54 | + |
| 55 | +**Target amplitude 0.05°C RMS**: This is the standard for mesoscale-resolving |
| 56 | +North Atlantic ensembles — large enough to seed growing instabilities but small |
| 57 | +enough to remain in the linear growth regime. |
| 58 | + |
| 59 | +**30-day cycle length**: Captures mesoscale eddy growth and Gulf Stream |
| 60 | +instabilities (Rossby deformation timescale). Shorter cycles (7 days) emphasize |
| 61 | +fast barotropic modes; longer cycles (60+ days) allow slower baroclinic modes. |
| 62 | +Configurable in `breed_config.yaml`. |
| 63 | + |
| 64 | +**8 breeding cycles**: Empirically sufficient for convergence. Monitor with |
| 65 | +`breed_vectors.py status` — the per-variable RMS should stabilize by cycle 5–6. |
| 66 | + |
| 67 | +## Workflow |
| 68 | + |
| 69 | +### Prerequisites |
| 70 | + |
| 71 | +- Completed control spinup run with at least one permanent pickup file |
| 72 | +- Set `pickup_iter` in `breed_config.yaml` (or leave null to auto-detect latest) |
| 73 | + |
| 74 | +### Steps |
| 75 | + |
| 76 | +```bash |
| 77 | +cd simulations/glorysv12-curvilinear |
| 78 | + |
| 79 | +# 1. Initialize 50 perturbed pickups from the control state |
| 80 | +uv run python ../../spectre_utils/breed_vectors.py init ensemble/breed_config.yaml |
| 81 | + |
| 82 | +# 2. Run all 50 members for one breeding cycle (SLURM array job) |
| 83 | +sbatch --chdir=$(pwd) workflows/breed_vectors.sh |
| 84 | + |
| 85 | +# 3. After all members complete — compute bred vectors and rescale |
| 86 | +uv run python ../../spectre_utils/breed_vectors.py rescale ensemble/breed_config.yaml --cycle 1 |
| 87 | + |
| 88 | +# 4. Check convergence (per-variable RMS table) |
| 89 | +uv run python ../../spectre_utils/breed_vectors.py status ensemble/breed_config.yaml --cycle 1 |
| 90 | + |
| 91 | +# 5. Repeat steps 2–4 for each cycle |
| 92 | +# Update --cycle 2, 3, ... 8 |
| 93 | +``` |
| 94 | + |
| 95 | +### GCP deployment |
| 96 | + |
| 97 | +Each member directory (`member_001/` through `member_050/`) is self-contained: |
| 98 | +- Perturbed pickup file (`.data` + `.meta`) |
| 99 | +- `nIter0.txt` with the starting iteration |
| 100 | + |
| 101 | +To run on GCP: |
| 102 | +1. Copy the input deck and member pickups to each compute node's local disk |
| 103 | +2. Each member runs standard MITgcm with the member's pickup as the restart file |
| 104 | +3. After all members finish, copy pickups back and run the `rescale` step |
| 105 | + |
| 106 | +## GCP Cost Estimate |
| 107 | + |
| 108 | +### Cluster configuration |
| 109 | + |
| 110 | +| Component | Machine type | Count | Purpose | |
| 111 | +|-----------|-------------|-------|---------| |
| 112 | +| Compute | h4d-standard-192 | 17 | 3 simulations per node (64 cores each) | |
| 113 | +| Login | n1-standard-2 | 1 | SSH access, job submission | |
| 114 | +| Controller | n1-standard-2 | 1 | Slurm controller | |
| 115 | + |
| 116 | +### Compute requirements per cycle |
| 117 | + |
| 118 | +- 50 members ÷ 3 per node = 16.7, rounded up to **17 nodes** per cycle |
| 119 | +- 30 sim-days at 12–20 sim-days/wall-hr = **1.5–2.5 wall hours** per cycle |
| 120 | +- 8 cycles × 2.5 hrs = **~20 hours** total wall time (plus ~30 min rescaling between cycles) |
| 121 | +- Total compute: 17 nodes × 20 hrs = **340 node-hours** (conservative) |
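
The arithmetic above can be reproduced with a few lines of Python, which is handy when changing the member count or cycle length. The variable names are illustrative; the numbers are the ones quoted in this section.

```python
import math

n_members = 50
members_per_node = 3           # 192 cores / 64 cores per simulation
n_cycles = 8
cycle_length_days = 30
throughput_slow = 12           # sim-days per wall-hour, slow end
throughput_fast = 20           # sim-days per wall-hour, fast end

nodes = math.ceil(n_members / members_per_node)        # partial node rounds up
hours_fast = cycle_length_days / throughput_fast
hours_slow = cycle_length_days / throughput_slow
total_wall_hours = n_cycles * hours_slow               # conservative estimate
node_hours = nodes * total_wall_hours

print(nodes, hours_fast, hours_slow, total_wall_hours, node_hours)
# → 17 1.5 2.5 20.0 340.0
```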
| 122 | + |
| 123 | +### Local disk per node |
| 124 | + |
| 125 | +| Data | Size | |
| 126 | +|------|------| |
| 127 | +| EXF forcing (8 variables × 54 GB) | 432 GB | |
| 128 | +| OBC boundary files | 20 GB | |
| 129 | +| Grid, bathymetry, initial conditions | 2 GB | |
| 130 | +| Pickup files (3 members) | 6 GB | |
| 131 | +| Output headroom (diagnostics, pickups) | 40 GB | |
| 132 | +| **Total** | **~500 GB** | |
| 133 | + |
| 134 | +Recommend **1 TB pd-ssd** per compute node, or local NVMe SSD if available |
| 135 | +on the machine type. |
| 136 | + |
| 137 | +### Cost breakdown |
| 138 | + |
| 139 | +| Item | On-demand | Spot (~70% discount) | |
| 140 | +|------|-----------|---------------------| |
| 141 | +| h4d-standard-192 × 340 node-hrs @ $9.64/hr | $3,278 | $983 | |
| 142 | +| pd-ssd 1 TB × 17 nodes × 20 hrs @ $0.23/hr | $78 | $78 | |
| 143 | +| n1-standard-2 × 2 × 24 hrs @ $0.095/hr | $5 | $5 | |
| 144 | +| **Total** | **~$3,400** | **~$1,100** | |
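
The totals in the table can be recomputed with the sketch below. It assumes, as the table does, that the spot discount applies only to the compute nodes and not to disks or the login/controller VMs. The function name `estimate_cost` is hypothetical, and the rates are the ones listed above, not live pricing.

```python
def estimate_cost(node_hours=340, compute_rate=9.64, spot_discount=0.70,
                  n_nodes=17, disk_hours=20, disk_rate=0.23,
                  n_small_vms=2, small_vm_hours=24, small_vm_rate=0.095):
    """Return (on_demand_total, spot_total) in USD."""
    compute = node_hours * compute_rate
    disk = n_nodes * disk_hours * disk_rate            # 1 TB pd-ssd per node
    small_vms = n_small_vms * small_vm_hours * small_vm_rate
    on_demand = compute + disk + small_vms
    spot = compute * (1 - spot_discount) + disk + small_vms
    return on_demand, spot


on_demand, spot = estimate_cost()
print(f"on-demand: ${on_demand:,.0f}  spot: ${spot:,.0f}")
```

With the default arguments this gives roughly $3,360 on-demand and $1,066 on spot, which the table rounds to ~$3,400 and ~$1,100.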
| 145 | + |
| 146 | +### Notes |
| 147 | + |
| 148 | +- Spot/preemptible instances are viable since each breeding cycle is only |
| 149 | + 1.5–2.5 hours — short enough to avoid most preemptions |
| 150 | +- The 30-min rescaling step between cycles runs on a single node and is |
| 151 | + negligible cost |
| 152 | +- Data transfer: ~500 GB input deck upload (one-time) + ~100 MB pickups per |
| 153 | + cycle (negligible) |
| 154 | +- The control run must also advance 30 days per cycle to provide the reference |
| 155 | + state — this can run on one of the 17 compute nodes |
| 156 | + |
| 157 | +## Configuration |
| 158 | + |
| 159 | +All parameters are in `breed_config.yaml`: |
| 160 | + |
| 161 | +```yaml |
| 162 | +breeding: |
| 163 | + n_members: 50 |
| 164 | + n_cycles: 8 |
| 165 | + cycle_length_days: 30 # configurable |
| 166 | + target_amplitude: |
| 167 | + temperature_rms: 0.05 # °C |
| 168 | +``` |
| 169 | + |
| 170 | +## Convergence monitoring |
| 171 | + |
| 172 | +Run `breed_vectors.py status` after each cycle. You should see: |
| 173 | + |
| 174 | +- **Cycles 1–3**: RMS ratios between variables shift as perturbations reorganize |
| 175 | +- **Cycles 4–6**: Per-variable RMS stabilizes — bred vectors are converging |
| 176 | +- **Cycles 7–8**: Minimal change — bred vectors have locked onto growing modes |
| 177 | + |
| 178 | +If the per-variable RMS doesn't stabilize by cycle 8, increase `n_cycles` or |
| 179 | +consider a shorter `cycle_length_days` to accelerate convergence. |
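
One way to turn "stabilizes" into a concrete check is to compare each variable's RMS against the previous cycle. The sketch below is an assumption-laden illustration: the function `converged_at`, the 5% tolerance, and the sample RMS values are all made up for the example and are not output from `breed_vectors.py status`.

```python
def relative_change(prev, curr):
    """Fractional change between two successive RMS values (prev must be non-zero)."""
    return abs(curr - prev) / abs(prev)


def converged_at(rms_history, tol=0.05):
    """Return the first cycle (1-based) where every variable's RMS changed by
    less than `tol` relative to the previous cycle, or None if never.

    rms_history: list of dicts, one per cycle, mapping variable -> RMS.
    """
    for i in range(1, len(rms_history)):
        prev, curr = rms_history[i - 1], rms_history[i]
        if all(relative_change(prev[v], curr[v]) < tol for v in curr):
            return i + 1
    return None


# Illustrative (made-up) per-variable RMS values for four cycles
history = [
    {"S": 0.010, "U": 0.020},
    {"S": 0.014, "U": 0.030},
    {"S": 0.016, "U": 0.034},
    {"S": 0.0163, "U": 0.0345},
]
print(converged_at(history))  # → 4
```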
| 180 | + |
| 181 | +## Directory structure |
| 182 | + |
| 183 | +``` |
| 184 | +ensemble/ |
| 185 | +├── breed_config.yaml # Breeding parameters |
| 186 | +├── README.md # This file |
| 187 | +├── member_001/ # Member 1 |
| 188 | +│ ├── pickup.NNNNNNNNNN.data |
| 189 | +│ ├── pickup.NNNNNNNNNN.meta |
| 190 | +│ └── nIter0.txt |
| 191 | +├── member_002/ |
| 192 | +│ └── ... |
| 193 | +└── member_050/ |
| 194 | + └── ... |
| 195 | +``` |