Add enhanced LOLA eval compatibility #409

Draft
sgreenbury wants to merge 15 commits into 283/rb-core-support from 283/rb-eval-lola

@sgreenbury

Contributor

Summary

  • Contributes towards #283 (Run latest pipeline with Rayleigh-Bénard dataset).
  • Extend LOLA-compatible metrics with VMSE/VRMSE v2, spectral
    PSRMSE/PSCC bands, spread, skill, and spread-skill ratio outputs.
  • Add LOLA-aligned rollout controls, including eval.rollout_start,
    autoencoded-target metric CSVs, spatial transpose/aspect controls, and
    metric skip flags for cheaper rerenders.
  • Add ensemble diagnostics for selected deterministic metric members,
    member-average deterministic metrics, and optional rollout member
    visualizations.
  • Update wrapped LOLA encoder/decoder eval paths to preserve conditioning
    and support chunked forwards for high-resolution rollout evaluation.
  • Add a Rayleigh-Bénard FM eval submitter for LOLA autoencoder-backed
    encode-once evaluation.
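
For readers less familiar with the spread and skill outputs listed above, here is a minimal sketch of one common convention, not the code in this PR. Skill is taken as the RMSE of the ensemble mean, spread as the square root of the mean unbiased ensemble variance, and the ratio as spread divided by skill. The function name `spread_skill`, the use of NumPy arrays, and the members-first layout are assumptions made for illustration; the PR's implementation, and any finite-ensemble correction it applies, may differ.

```python
import numpy as np


def spread_skill(ensemble, target):
    """Return (spread, skill, ratio) for one ensemble forecast.

    ensemble: array-like of shape (members, ...), assumed layout.
    target:   array-like of shape (...), matching one member.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    target = np.asarray(target, dtype=float)
    if ensemble.shape[0] < 2:
        raise ValueError("need at least two members to estimate spread")
    if ensemble.shape[1:] != target.shape:
        raise ValueError("target must match the shape of one member")

    # Skill: RMSE of the ensemble mean against the target.
    ensemble_mean = ensemble.mean(axis=0)
    skill = float(np.sqrt(np.mean((ensemble_mean - target) ** 2)))

    # Spread: square root of the unbiased ensemble variance,
    # averaged over all non-member axes.
    spread = float(np.sqrt(np.mean(ensemble.var(axis=0, ddof=1))))

    # Ratio is undefined when the ensemble mean is exact.
    ratio = spread / skill if skill > 0 else float("nan")
    return spread, skill, ratio


# Usage: two members, two grid points.
spread, skill, ratio = spread_skill([[0.0, 2.0], [2.0, 4.0]], [0.0, 2.0])
```

Under this convention a ratio near 1 suggests a well-calibrated ensemble, a ratio below 1 an under-dispersed one, and a ratio above 1 an over-dispersed one.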

Testing

  • Not run in this PR-prep pass.

sgreenbury added 15 commits May 23, 2026 18:16

  • Introduce the first encoder-processor-decoder eval path for saved
    runs so RB experiments can render and score rollouts offline.
  • Add the command-line eval script and config wiring needed to run
    encoder-processor-decoder evaluations outside training.
  • Support applying expensive tensor functions over smaller chunks.
    This keeps memory-heavy eval metrics usable on large rollouts.
  • Include energy score with the memory-intensive metrics so eval can
    report ensemble quality alongside deterministic diagnostics.
  • Move optional chunking support into the metric base classes. This
    keeps metric implementations smaller and gives eval one API.
  • Ensure scalar eval values are logged when metric objects return
    plain tensors, preserving summary output for RB comparisons.
  • Allow eval rendering and metrics to transpose spatial axes so LOLA
    outputs can be compared in the expected Rayleigh-Benard layout.
  • Add deterministic metrics matching the LOLA evaluation convention
    so RB eval outputs can be compared against existing baselines.
  • Extend the eval script with the controls needed by the RB workflow,
    including output selection and rendering options.
  • Render individual ensemble members during eval so comparisons can
    inspect member spread instead of only aggregate summaries.
  • Allow eval runs to skip metric and benchmark calculations. This
    supports render-only checks when full scoring is too expensive.
  • Wire the skip controls through the eval script so render-only RB
    jobs can avoid unnecessary metric and benchmark work.
  • Restore current origin/main snapshot plotting utilities and reapply
    the eval-specific member rows and transpose arguments on top.
  • Only auto-fill encoder input channels when that key exists. This
    preserves configs without in_channels while keeping RB auto setup.
  • Update spectral metrics to match LOLA ensemble handling and add optional
    AE-relative rollout outputs. Add eval.rollout_start as generic eval
    configuration, interpreting it as the final conditioning timestep for
    LOLA-compatible rollout starts.
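
The `eval.rollout_start` convention described in the last commit is easy to get off by one, so a small sketch may help. This is one reading of "final conditioning timestep", not the code in this PR: `split_rollout` and `n_cond` are hypothetical names, and the trajectory is assumed to be a plain time-indexed sequence. Under that reading, the frame at index `rollout_start` is the last one the model conditions on, and the first predicted frame is at `rollout_start + 1`.

```python
def split_rollout(frames, rollout_start, n_cond=1):
    """Split a time-indexed sequence into conditioning and target frames.

    rollout_start is treated as the index of the final conditioning
    frame (an assumed interpretation); n_cond is a hypothetical
    parameter giving the number of conditioning frames.
    """
    if n_cond < 1:
        raise ValueError("n_cond must be at least 1")
    first_cond = rollout_start - n_cond + 1
    if first_cond < 0:
        raise ValueError("not enough frames before rollout_start")
    if rollout_start >= len(frames) - 1:
        raise ValueError("no target frames after rollout_start")

    # Conditioning window ends at rollout_start, inclusive.
    conditioning = frames[first_cond : rollout_start + 1]
    # Targets begin on the frame after the final conditioning step.
    targets = frames[rollout_start + 1 :]
    return conditioning, targets


# Usage: ten frames, condition on frames 2 and 3, score frames 4 to 9.
conditioning, targets = split_rollout(list(range(10)), rollout_start=3, n_cond=2)
```

Validating the bounds up front keeps a misconfigured start index from silently producing an empty conditioning window or an empty target set.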
sgreenbury mentioned this pull request Jun 5, 2026