Add enhanced LOLA eval compatibility #409
Draft
sgreenbury wants to merge 15 commits into
Introduce the first encoder-processor-decoder eval path for saved runs so RB experiments can render and score rollouts offline.
Add the command-line eval script and config wiring needed to run encoder-processor-decoder evaluations outside training.
Support applying expensive tensor functions over smaller chunks. This keeps memory-heavy eval metrics usable on large rollouts.
Include energy score with the memory-intensive metrics so eval can report ensemble quality alongside deterministic diagnostics.
Move optional chunking support into the metric base classes. This keeps metric implementations smaller and gives eval one API.
Ensure scalar eval values are logged when metric objects return plain tensors, preserving summary output for RB comparisons.
Allow eval rendering and metrics to transpose spatial axes so LOLA outputs can be compared in the expected Rayleigh-Benard layout.
Add deterministic metrics matching the LOLA evaluation convention so RB eval outputs can be compared against existing baselines.
Extend the eval script with the controls needed by the RB workflow, including output selection and rendering options.
Render individual ensemble members during eval so comparisons can inspect member spread instead of only aggregate summaries.
Allow eval runs to skip metric and benchmark calculations. This supports render-only checks when full scoring is too expensive.
Wire the skip controls through the eval script so render-only RB jobs can avoid unnecessary metric and benchmark work.
Restore current origin/main snapshot plotting utilities and reapply the eval-specific member rows and transpose arguments on top.
Only auto-fill encoder input channels when that key exists. This preserves configs without in_channels while keeping RB auto setup.
Update spectral metrics to match LOLA ensemble handling and add optional AE-relative rollout outputs. Add eval.rollout_start as generic eval configuration, interpreting it as the final conditioning timestep for LOLA-compatible rollout starts.
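The chunking commits above describe applying memory-heavy tensor functions over smaller pieces of a rollout. A minimal sketch of that idea follows, using NumPy for portability. The helper name `apply_in_chunks`, its signature, and the example metric are hypothetical and are not taken from this PR. The sketch also assumes the function acts independently on each entry along the leading axis.

```python
import numpy as np


def apply_in_chunks(fn, x, chunk_size):
    """Apply fn to slices of x along axis 0 and concatenate the results.

    Hypothetical helper: assumes fn treats entries along axis 0
    independently, so chunked and unchunked results agree.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if x.shape[0] == 0:
        raise ValueError("x must have at least one entry along axis 0")
    outputs = []
    for start in range(0, x.shape[0], chunk_size):
        # Only one slice is passed to fn at a time, which bounds the
        # size of any intermediate arrays that fn allocates.
        outputs.append(fn(x[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)


def per_sample_energy(batch):
    # Illustrative stand-in for an expensive metric: mean squared value
    # over the two trailing spatial axes, one number per sample.
    return (batch ** 2).mean(axis=(1, 2))


rng = np.random.default_rng(0)
rollout = rng.normal(size=(10, 4, 4))
chunked = apply_in_chunks(per_sample_energy, rollout, chunk_size=3)
full = per_sample_energy(rollout)
```

The last chunk may be shorter than `chunk_size`. Slicing handles that without special casing.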
Summary
PSRMSE/PSCC bands, spread, skill, and spread-skill ratio outputs.
eval.rollout_start, autoencoded-target metric CSVs, spatial transpose/aspect controls, and
metric skip flags for cheaper rerenders.
member-average deterministic metrics, and optional rollout member
visualizations.
and support chunked forwards for high-resolution rollout evaluation.
encode-once evaluation.
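For readers unfamiliar with the spread, skill, and spread-skill ratio outputs listed in this summary, here is a sketch of one common definition. It is an assumption that the PR follows this convention: the function name `spread_skill` is hypothetical, and the exact normalization (for example, finite-ensemble correction factors) used by the LOLA-compatible metrics is not shown in this conversation.

```python
import numpy as np


def spread_skill(ensemble, target):
    """Return (spread, skill, spread / skill) for one ensemble forecast.

    ensemble: array of shape (members, ...); target: array of shape (...).
    Assumed convention (not confirmed by the PR):
      skill  = RMSE of the ensemble mean against the target
      spread = square root of the mean unbiased ensemble variance
    """
    if ensemble.shape[0] < 2:
        raise ValueError("need at least two ensemble members")
    ens_mean = ensemble.mean(axis=0)
    skill = np.sqrt(np.mean((ens_mean - target) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    # The ratio is undefined when skill is zero (a perfect ensemble mean).
    return spread, skill, spread / skill


members = np.array([[0.0, 0.0], [2.0, 2.0]])  # two members, two grid points
truth = np.array([0.0, 0.0])
spread, skill, ratio = spread_skill(members, truth)
```

Under this convention, a ratio near 1 suggests the ensemble spread is consistent with the error of its mean. Values well below 1 indicate an under-dispersive ensemble.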
Testing