AutoCast

Installation

Prerequisites

uv: for running scripts and managing virtual environments
ffmpeg: optional, for video generation during evaluation

Usage

If you'd just like to use the code in autocast:

# Clone the repo
git clone https://github.com/alan-turing-institute/autocast.git
cd autocast

# Install dependencies
uv sync

After this, you can invoke the CLI with uv run autocast from within the repository.

Development

If you want to contribute to the autocast codebase, the following will get you set up:

# Clone the repo
git clone https://github.com/alan-turing-institute/autocast.git
cd autocast

# Install development dependencies
uv sync --extra dev

# Set up pre-commit checks, so that any pushed commits pass CI
uv run pre-commit install

Introduction

autocast is primarily meant to be used as a CLI tool.

The autocast CLI is built on top of Hydra. This means that configurations are specified in YAML files and can be composed together to quickly switch between different datasets and model architectures.

Base configurations and subcommands

The 'base' configurations for datasets and model architectures are stored in src/autocast/configs. In particular, autocast comes with some subcommands for training standard model stacks:

Command	Description	Default config
`uv run autocast ae`	Train an autoencoder	`src/autocast/configs/autoencoder.yaml`
`uv run autocast cache-latents`	Cache latents from an encoder	`src/autocast/configs/cache_latents.yaml`
`uv run autocast processor`	Train a processor (frozen encoder/decoder)	`src/autocast/configs/processor.yaml`
`uv run autocast epd`	Train an encoder-processor-decoder	`src/autocast/configs/encoder_processor_decoder.yaml`
`uv run autocast eval`	Evaluate a trained model	`src/autocast/configs/eval/encoder_processor_decoder.yaml`

Notice that each of these YAML files in turn refers to a number of other YAML files. For example, src/autocast/configs/autoencoder.yaml specifies (amongst other things):

defaults:
  - model: autoencoder
  - logging: wandb

which in turn point to src/autocast/configs/model/autoencoder.yaml and src/autocast/configs/logging/wandb.yaml respectively. In this way, configurations can be built up from smaller pieces in a modular way.
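The mapping from a defaults entry to a file can be sketched in a few lines of Python. This is only an illustration of the naming convention described above, not Hydra's actual resolution logic (which also handles search paths, packages, and `_self_`); the helper name `resolve_defaults` is hypothetical.

```python
from pathlib import PurePosixPath

# Root of the packaged configs, as named in the text above.
CONFIG_ROOT = PurePosixPath("src/autocast/configs")


def resolve_defaults(defaults: list[dict[str, str]]) -> list[str]:
    """Map each `group: option` entry to the YAML file it selects.

    Hypothetical helper: a simplified picture of the convention only.
    """
    paths = []
    for entry in defaults:
        for group, option in entry.items():
            paths.append(str(CONFIG_ROOT / group / f"{option}.yaml"))
    return paths


# The two entries quoted from autoencoder.yaml.
defaults = [{"model": "autoencoder"}, {"logging": "wandb"}]
for path in resolve_defaults(defaults):
    print(path)
```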

Adding and overriding configurations

Hydra allows you to add or override any configuration value from the command line. See the Hydra documentation for more details. As an example, to override the number of training epochs for the ae command, you can run:

uv run autocast ae trainer.max_epochs=5

Note that this only works if the trainer.max_epochs key is already defined in the default configuration for that command. If the key is not defined, you have to prefix it with + to tell Hydra to add it (conversely, + fails if the key already exists):

uv run autocast ae +trainer.max_epochs=5

If you want to specify the option regardless of whether it is defined in the default config or not, you can use ++:

uv run autocast ae ++trainer.max_epochs=5
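The difference between the three prefixes can be sketched with a plain dictionary. This is a toy model of the behaviour described above, not Hydra's implementation: the function `apply_override` and the key `trainer.new_key` are hypothetical, and values are kept as strings, whereas Hydra parses them into typed values.

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply one `key=value`, `+key=value`, or `++key=value` override in place.

    Toy model only: values stay strings and nested nodes are plain dicts.
    """
    lhs, value = override.split("=", 1)
    if lhs.startswith("++"):
        mode, dotted = "force", lhs[2:]
    elif lhs.startswith("+"):
        mode, dotted = "add", lhs[1:]
    else:
        mode, dotted = "set", lhs

    *parents, leaf = dotted.split(".")
    node = cfg
    for name in parents:
        if name not in node:
            if mode == "set":
                raise KeyError(f"{dotted} is not in the config; use + or ++")
            node[name] = {}  # + and ++ may create missing parent nodes
        node = node[name]

    if mode == "set" and leaf not in node:
        raise KeyError(f"{dotted} is not in the config; use + or ++")
    if mode == "add" and leaf in node:
        raise KeyError(f"{dotted} already exists; drop the + or use ++")
    node[leaf] = value


cfg = {"trainer": {"max_epochs": "100"}}
apply_override(cfg, "trainer.max_epochs=5")    # key exists: plain override
apply_override(cfg, "+trainer.new_key=1")      # key missing: must be added
apply_override(cfg, "++trainer.max_epochs=7")  # works in either case
print(cfg)  # {'trainer': {'max_epochs': '7', 'new_key': '1'}}
```

In this model, `trainer.missing=1` and `+trainer.max_epochs=9` both raise, which is why ++ is the safest choice when you do not know whether the default config defines the key.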

Data modules

By default, these commands all point to their own data modules, which specify the dataset and how it gets loaded. The data module configurations are stored in src/autocast/configs/datamodule/.

autocast can read datasets stored in two different formats:

Format	Appropriate setting for `datamodule`	Override...
`.pt` files from autosim	`datamodule=advection_diffusion`	`datamodule.data_path`
HDF5 files from The Well	`datamodule=the_well`	`datamodule.well_base_path` and `datamodule.well_dataset_name`

For example, suppose you use autosim to generate a dataset of advection-diffusion simulations. We'll keep the size of the spatial grid (simulator.n) and the number of trajectories (dataset.n_...) small for this example. We'll also manually specify the output directory for the generated dataset, so that we can point autocast to it later (otherwise autosim will automatically generate a directory for you by default):

# See the autosim repository for more information on this.

uv run autosim simulator=advection_diffusion \
    simulator.n=16 dataset.n_train=10 dataset.n_valid=2 dataset.n_test=2 \
    dataset.output_dir=/path/to/dataset

You can then train an autoencoder on that dataset (with all other settings inherited from the default configuration) with:

uv run autocast ae \
    datamodule=advection_diffusion \
    datamodule.data_path=/path/to/dataset \
    +trainer.max_epochs=5

Output paths

To specify the output directory for an experiment, you can either use the --workdir flag:

uv run autocast ae \
    --workdir /path/to/output/directory \
    datamodule=advection_diffusion \
    datamodule.data_path=/path/to/dataset \
    +trainer.max_epochs=5

or alternatively, specify --run-group and --run-id:

uv run autocast ae \
    --run-group MYGROUP \
    --run-id MYID \
    datamodule=advection_diffusion \
    datamodule.data_path=/path/to/dataset \
    +trainer.max_epochs=5

and autocast will automatically use outputs/MYGROUP/MYID as the output directory. If logging to Weights and Biases is enabled, the run ID will also be used for the default W&B run name.

--run-group defaults to the current date, and --run-id defaults to a legacy-style run id (a concatenation of dataset/model/hash/uuid).
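The rules above can be summarised in a short Python sketch. It is not AutoCast's code: the function `resolve_output_dir` is hypothetical, and it assumes that an explicit --workdir takes precedence over the group/id pair and that the default group is an ISO-formatted date (the real format may differ). The run id is a required argument here because the legacy default id is not specified in enough detail to reproduce.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def resolve_output_dir(
    run_id: str,
    workdir: Optional[str] = None,
    run_group: Optional[str] = None,
    today: Optional[date] = None,
) -> str:
    """Return the output directory implied by the flags described above."""
    # Assumption: an explicit --workdir wins over --run-group/--run-id.
    if workdir is not None:
        return workdir
    if run_group is None:
        # Assumption: ISO date format; the real default format may differ.
        run_group = (today or date.today()).isoformat()
    return str(PurePosixPath("outputs") / run_group / run_id)


print(resolve_output_dir("MYID", run_group="MYGROUP"))  # outputs/MYGROUP/MYID
```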

Making your own configurations

If you find yourself making the same overrides repeatedly, it is probably worth making a new YAML configuration that specifies these overrides. If they are generally useful for the package, these can be stored in src/autocast/configs/experiments/<myexpt>.yaml and specified on the command line with +experiment=<myexpt>.

Some of the configurations we used for our experiments are stored in the local_hydra/local_experiment folder. These are not in the main src/autocast/configs folder because they are not meant to be distributed as part of the package. These configuration files can, however, still be used by setting local_experiment=<name> on the command line.

(This works because autocast makes sure to add local_hydra to Hydra's search path, allowing you to load configurations from there even though they are outside the package.)

Running experiments on SLURM

autocast supports running experiments on SLURM clusters by adding the --mode slurm flag. This automatically generates a submission Bash script and submits it to the cluster, so you don't have to worry about writing your own submission scripts.

For example, to run the same autoencoder training as above, but on SLURM, you can run:

uv run autocast ae --mode slurm \
    datamodule=advection_diffusion \
    datamodule.data_path=/path/to/dataset \
    +trainer.max_epochs=5

Evaluating models

To run a series of preset evaluation tests on a saved model checkpoint, including single-step predictions and autoregressive rollout, you can use the eval subcommand and set --workdir to the run folder containing the configuration and model checkpoint to evaluate.

uv run autocast eval \
    --workdir /path/to/outputs

Some useful Hydra options for further controlling the evaluation are:

autoencoder_checkpoint: the path to the autoencoder checkpoint to use for evaluation (if applicable). This is used if you trained a standalone processor (i.e., uv run autocast processor) in latent space. If you trained a full encoder-processor-decoder stack (i.e., uv run autocast epd), the autoencoder is already part of the model checkpoint, so does not need to be supplied separately.
eval.metrics: a list of metrics to compute during evaluation.
eval.n_members: the number of members to use for ensemble evaluation. Increasing this allows you to get a smoother estimate of the model's uncertainty.
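To see why a larger eval.n_members gives a smoother uncertainty estimate, here is a toy Python sketch. It does not use AutoCast at all: Gaussian draws stand in for ensemble members, and the function `spread_estimates` is hypothetical. It repeats the spread (sample standard deviation) estimate many times and reports how much that estimate itself fluctuates for different ensemble sizes.

```python
import random
import statistics


def spread_estimates(n_members: int, n_trials: int = 200, seed: int = 0) -> list[float]:
    """Estimate the ensemble spread n_trials times, using n_members draws each.

    Toy stand-in: each "member" is a draw from a unit Gaussian, not a model run.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        members = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        estimates.append(statistics.stdev(members))
    return estimates


for n in (4, 16, 64):
    # Variability of the spread estimate across repeated trials.
    noise = statistics.stdev(spread_estimates(n))
    print(f"n_members={n:>2}: variability of the spread estimate = {noise:.3f}")
```

The printed variability shrinks as the number of members grows, at the cost of proportionally more forward passes per evaluation sample.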

Other useful configuration flags

Performing a dry run

Add --dry-run to print the commands that will be executed without actually running them.

Using multiple GPUs / nodes

Multi-GPU and multi-node SLURM runs are supported through distributed presets (found in src/autocast/configs/distributed/):

To use 4 GPUs on a single node with DDP, add ++distributed=ddp_4gpu_slurm
To use 8 GPUs across 2 nodes with DDP, add ++distributed=ddp_4gpu_2node_slurm
To use 12 GPUs across 3 nodes with DDP, add ++distributed=ddp_4gpu_2node_slurm ++trainer.num_nodes=3 ++eval.num_nodes=3 ++hydra.launcher.nodes=3

The preset configurations set both Lightning trainer.devices/trainer.num_nodes and the matching SLURM hydra.launcher.nodes/gpus_per_node/tasks_per_node values. For more fine-grained control, you can also explicitly override these configuration values:

uv run autocast epd --mode slurm \
	datamodule=reaction_diffusion \
	trainer.devices=4 trainer.num_nodes=2 trainer.strategy=ddp \
	hydra.launcher.nodes=2 hydra.launcher.gpus_per_node=4 \
	hydra.launcher.tasks_per_node=4
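When overriding these values by hand, it helps to check the arithmetic. The Python sketch below is an illustration, not part of AutoCast: the helper names are hypothetical, and the consistency rules (launcher nodes equal to trainer.num_nodes, and both gpus_per_node and tasks_per_node equal to trainer.devices) are an assumption read off the example above.

```python
def world_size(devices: int, num_nodes: int) -> int:
    """Total number of GPUs (one DDP process each) used by a run."""
    return devices * num_nodes


def launcher_mismatches(trainer: dict, launcher: dict) -> list[str]:
    """List the launcher settings that disagree with the trainer settings.

    Assumed rules, inferred from the example command above.
    """
    problems = []
    if launcher["nodes"] != trainer["num_nodes"]:
        problems.append("hydra.launcher.nodes != trainer.num_nodes")
    if launcher["gpus_per_node"] != trainer["devices"]:
        problems.append("hydra.launcher.gpus_per_node != trainer.devices")
    if launcher["tasks_per_node"] != trainer["devices"]:
        problems.append("hydra.launcher.tasks_per_node != trainer.devices")
    return problems


# Values from the example command above: 4 GPUs per node on 2 nodes.
trainer = {"devices": 4, "num_nodes": 2}
launcher = {"nodes": 2, "gpus_per_node": 4, "tasks_per_node": 4}
print(world_size(**trainer))                   # 8
print(launcher_mismatches(trainer, launcher))  # []
```

The same arithmetic gives 4 GPUs for the single-node preset and 12 GPUs for the three-node variant listed earlier.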

Resuming from a checkpoint

The following extra CLI options can be passed to the ae, epd, and train-eval subcommands (or added to configuration files):

To resume from a saved checkpoint, pass --resume-from. The default is a full-state resume, which restores the model together with the optimizer, scheduler, and trainer loop state:
```
--resume-from path/to/encoder_processor_decoder.ckpt
```

To additionally reset the timer budget:

--resume-from path/to/encoder_processor_decoder.ckpt ++trainer.max_time="00:04:00:00" ++train_eval.reset_resume_time_budget=true

To restore only the model weights and generate a fresh optimizer/trainer state:
```
--resume-from path/to/encoder_processor_decoder.ckpt ++trainer.max_time="00:04:00:00" ++train_eval.resume_weights_only=true
```
In conjunction with trainer.max_time, this allows you to continue training with a fresh timer budget. Note that if resume_weights_only=true is set without a checkpoint, AutoCast raises an error.

Weights & Biases logging

AutoCast optionally integrates with Weights & Biases; this is driven by the Hydra config in src/autocast/configs/logging/wandb.yaml.

Logging to W&B can be enabled (or disabled) with ++logging.wandb.enabled=true (or false respectively).

The W&B project name can be set with ++logging.wandb.project=MYPROJECT.

By default the --run-id is used as the W&B run name. You can override this with ++logging.wandb.name=MYNAME.

If you are resuming from a previous run and want to continue logging to the same W&B run, you need to set:

++logging.wandb.id=EXISTING_RUN_ID (the run ID of the existing W&B run)
++logging.wandb.resume=allow (or must)

Without this combination, W&B will create a new run even if the logging.wandb.name matches an existing run name.

Direct usage of lower-level Hydra scripts

The autocast CLI is a convenient wrapper around the lower-level Hydra scripts in src/autocast/scripts/. Here are some example invocations:

Train autoencoder script

uv run train_autoencoder \
	hydra.run.dir=outputs/rd/00 \
	datamodule.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
	datamodule.use_simulator=false \
	optimizer.learning_rate=0.00005 \
	trainer.max_epochs=10 \
	logging.wandb.enabled=true

Train processor script

uv run train_encoder_processor_decoder \
	hydra.run.dir=outputs/rd/00 \
	datamodule.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
	datamodule.use_simulator=false \
	optimizer.learning_rate=0.0001 \
	trainer.max_epochs=10 \
	logging.wandb.enabled=true \
	'autoencoder_checkpoint=outputs/rd/00/autoencoder.ckpt'

Evaluation script

uv run evaluate_encoder_processor_decoder \
	hydra.run.dir=outputs/rd/00/eval \
	eval.checkpoint=outputs/rd/00/encoder_processor_decoder.ckpt \
	eval.batch_indices=[0,1,2,3] \
	eval.video_dir=outputs/rd/00/eval/videos \
	datamodule.data_path=$AUTOCAST_DATASETS/reaction_diffusion \
	datamodule.use_simulator=false

Ethical guidance

See ETHICAL_GUIDANCE.md for guidance on the intended scope and use of AutoCast.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Jason McEwen 🤔 📆	Radka Jersakova 🤔 📆 💻 👀	Paolo Conti 🤔 💻 👀	Marjan Famili 🤔 💻 👀	Christopher Iliffe Sprague 🤔 💻 👀	Edwin 🤔 💻 👀	Sam Greenbury 🤔 📆 💻 👀
QC 🤔 💻 🐛	Penelope Yong 🤔 💻 🐛 👀	farhanferoz 🤔 💻 🐛 👀

This project follows the all-contributors specification. Contributions of any kind welcome!
