Add enhanced LOLA eval compatibility by sgreenbury · Pull Request #409 · alan-turing-institute/autocast

sgreenbury · 2026-06-05T08:43:53Z

Summary

Contributes towards Run latest pipeline with Rayleigh-Bénard dataset #283.

- Extend LOLA-compatible metrics with VMSE/VRMSE v2, spectral PSRMSE/PSCC bands, spread, skill, and spread-skill ratio outputs.
- Add LOLA-aligned rollout controls, including eval.rollout_start, autoencoded-target metric CSVs, spatial transpose/aspect controls, and metric skip flags for cheaper rerenders.
- Add ensemble diagnostics for selected deterministic metric members, member-average deterministic metrics, and optional rollout member visualizations.
- Update wrapped LOLA encoder/decoder eval paths to preserve conditioning and support chunked forwards for high-resolution rollout evaluation.
- Add a Rayleigh-Bénard FM eval submitter for LOLA autoencoder-backed encode-once evaluation.
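
For readers unfamiliar with the spread, skill, and spread-skill ratio outputs mentioned above, here is a minimal sketch of one common definition. It is not taken from this PR: the function name `spread_skill` and the use of flat Python lists in place of tensors are assumptions made to keep the example dependency-free, and the PR's implementation may reduce over different axes or apply a finite-ensemble correction.

```python
import math


def spread_skill(members, target):
    """Return (spread, skill, spread / skill) for one ensemble forecast.

    members: list of ensemble members, each a flat list of floats.
    target:  flat list of floats with the same length as each member.
    Requires at least two members and a non-zero skill.
    """
    m = len(members)
    n = len(target)
    # Ensemble mean at each grid point.
    mean = [sum(member[i] for member in members) / m for i in range(n)]
    # Skill: RMSE of the ensemble mean against the target.
    skill = math.sqrt(sum((mean[i] - target[i]) ** 2 for i in range(n)) / n)
    # Spread: square root of the mean unbiased ensemble variance.
    var = [
        sum((member[i] - mean[i]) ** 2 for member in members) / (m - 1)
        for i in range(n)
    ]
    spread = math.sqrt(sum(var) / n)
    return spread, skill, spread / skill
```

A ratio near 1 is usually read as a well-calibrated ensemble; values below 1 suggest the ensemble is under-dispersed relative to its error.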

Testing

Not run in this PR-prep pass.

sgreenbury added 15 commits May 23, 2026 18:16

Add initial eval workflow

e04523b

Introduce the first encoder-processor-decoder eval path for saved runs so RB experiments can render and score rollouts offline.

Add eval script entrypoint

33efd5f

Add the command-line eval script and config wiring needed to run encoder-processor-decoder evaluations outside training.

Add chunked apply utility

e40a17c

Support applying expensive tensor functions over smaller chunks. This keeps memory-heavy eval metrics usable on large rollouts.
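
The idea behind a chunked apply can be sketched as follows. This is an illustration only: `chunked_apply` and its signature are hypothetical, and a plain sequence stands in for a tensor sliced along its leading axis.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked_apply(
    fn: Callable[[Sequence[T]], Sequence[R]],
    items: Sequence[T],
    chunk_size: int,
) -> List[R]:
    """Apply fn to consecutive slices of items and concatenate the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out: List[R] = []
    for start in range(0, len(items), chunk_size):
        # Only one slice is passed to fn at a time, which bounds peak memory.
        out.extend(fn(items[start:start + chunk_size]))
    return out
```

This only gives the same answer as a single call when `fn` treats each element along the chunked axis independently; reductions across that axis would need to be combined separately.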

Add energy score metric

517765c

Include energy score with the memory-intensive metrics so eval can report ensemble quality alongside deterministic diagnostics.
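
For context, a minimal sketch of the standard sample energy score follows. It uses the textbook estimator on flat Python lists rather than the PR's tensor code, and the name `energy_score` is an assumption. The pairwise term over all member pairs is presumably what makes the metric memory-intensive once vectorised.

```python
import math


def energy_score(members, target):
    """Sample energy score for one forecast (lower is better).

    members: list of ensemble members, each a flat list of floats.
    target:  flat list of floats with the same length as each member.
    """
    def dist(a, b):
        # Euclidean distance between two flattened fields.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    m = len(members)
    # Mean distance from each member to the observation.
    term_obs = sum(dist(x, target) for x in members) / m
    # Mean distance over all m * m ordered member pairs.
    term_pair = sum(dist(a, b) for a in members for b in members) / (m * m)
    return term_obs - 0.5 * term_pair
```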

Refactor metric chunking hooks

560d0c0

Move optional chunking support into the metric base classes. This keeps metric implementations smaller and gives eval one API.
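
One way to picture chunking living in a base class is sketched below. The class names `ChunkedMetric` and `AbsoluteError`, the `_score` hook, and the `chunk_size` argument are all hypothetical and are not the classes in this repository.

```python
class ChunkedMetric:
    """Hypothetical base class: subclasses implement _score for a batch."""

    def __init__(self, chunk_size=None):
        # None means the whole batch is scored in a single call.
        self.chunk_size = chunk_size

    def _score(self, preds, targets):
        raise NotImplementedError

    def __call__(self, preds, targets):
        if self.chunk_size is None:
            return self._score(preds, targets)
        scores = []
        for start in range(0, len(preds), self.chunk_size):
            stop = start + self.chunk_size
            scores.extend(self._score(preds[start:stop], targets[start:stop]))
        return scores


class AbsoluteError(ChunkedMetric):
    def _score(self, preds, targets):
        # One score per sample; no chunking logic needed here.
        return [abs(p - t) for p, t in zip(preds, targets)]
```

With this layout the caller chooses `chunk_size` once, and each metric only describes how to score a batch.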

Fix scalar metric logging

adafe70

Ensure scalar eval values are logged when metric objects return plain tensors, preserving summary output for RB comparisons.

Add spatial transpose option

6819fb8

Allow eval rendering and metrics to transpose spatial axes so LOLA outputs can be compared in the expected Rayleigh-Benard layout.
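
As a small illustration of what transposing spatial axes means, the sketch below swaps the two axes of a 2D field stored as a list of rows. The helper name `transpose_spatial` is hypothetical; the PR operates on tensors with additional batch, time, and channel axes.

```python
def transpose_spatial(field):
    """Swap the two spatial axes of a 2D field stored as a list of rows."""
    return [list(column) for column in zip(*field)]
```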

Add LOLA VMSE2 and VRMSE2 metrics

27f6ab8

Add deterministic metrics matching the LOLA evaluation convention so RB eval outputs can be compared against existing baselines.
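
For orientation, a sketch of a variance-normalised MSE and its square root is given below, using the commonly used definition (MSE divided by the target variance plus a small epsilon). This is an assumption: the VMSE2/VRMSE2 variants added in this commit may differ, for example in which axes are reduced or where the square root is taken, and that detail is not stated here.

```python
import math


def vmse(pred, target, eps=1e-7):
    """Mean squared error divided by the variance of the target field."""
    n = len(target)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    mean_t = sum(target) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    # eps guards against division by zero for constant targets.
    return mse / (var_t + eps)


def vrmse(pred, target, eps=1e-7):
    """Square root of the variance-normalised MSE."""
    return math.sqrt(vmse(pred, target, eps))
```

Under this definition, predicting the target's spatial mean everywhere scores approximately 1, which gives a natural baseline for comparison.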

Update eval script controls

a62291b

Extend the eval script with the controls needed by the RB workflow, including output selection and rendering options.

Add ensemble member rendering

2eda9e3

Render individual ensemble members during eval so comparisons can inspect member spread instead of only aggregate summaries.

Add metric skip controls

932270c

Allow eval runs to skip metric and benchmark calculations. This supports render-only checks when full scoring is too expensive.

Update eval metric skipping

3eef34b

Wire the skip controls through the eval script so render-only RB jobs can avoid unnecessary metric and benchmark work.

Fix eval plotting merge

b9763df

Restore current origin/main snapshot plotting utilities and reapply the eval-specific member rows and transpose arguments on top.

Fix input channel guard

67c8a88

Only auto-fill encoder input channels when that key exists. This preserves configs without in_channels while keeping RB auto setup.
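
The guard described here amounts to something like the following sketch. A plain dict stands in for the real config object, and `autofill_in_channels` is a hypothetical name.

```python
def autofill_in_channels(encoder_cfg: dict, n_channels: int) -> dict:
    """Fill in_channels only when the config already declares that key."""
    cfg = dict(encoder_cfg)  # shallow copy so the caller's config is untouched
    if "in_channels" in cfg:
        cfg["in_channels"] = n_channels
    return cfg
```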

Add LOLA rollout eval metrics

ad4e5c3

Update spectral metrics to match LOLA ensemble handling and add optional AE-relative rollout outputs. Add eval.rollout_start as generic eval configuration, interpreting it as the final conditioning timestep for LOLA-compatible rollout starts.
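
Reading eval.rollout_start as the final conditioning timestep implies an indexing convention along these lines. This is a sketch under stated assumptions: `split_rollout` and `n_steps_input` are hypothetical names, and a list of timesteps stands in for a trajectory tensor.

```python
def split_rollout(trajectory, rollout_start, n_steps_input):
    """Split a trajectory into a conditioning window and rollout targets.

    rollout_start is the index of the last conditioning timestep, so the
    first predicted timestep is rollout_start + 1.
    """
    first = rollout_start - n_steps_input + 1
    if first < 0:
        raise ValueError("not enough timesteps before rollout_start")
    if rollout_start >= len(trajectory) - 1:
        raise ValueError("no timesteps left to roll out")
    conditioning = trajectory[first:rollout_start + 1]
    targets = trajectory[rollout_start + 1:]
    return conditioning, targets
```

For example, with ten timesteps, `rollout_start=3`, and two input steps, the conditioning window is timesteps 2 and 3 and the targets are timesteps 4 through 9.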

sgreenbury mentioned this pull request Jun 5, 2026

Application to Rayleigh-Benard #369

Closed

1 task

sgreenbury wants to merge 15 commits into 283/rb-core-support from 283/rb-eval-lola
