
Commit 0074339

lbluque and mshuaibii authored
Evaluation and benchmark docs (#1402)
* evaluation doc page
* cleanup evaluation docs
* title nits
* benchmark docs
* remove extra title

Co-authored-by: Muhammed Shuaibi <45150244+mshuaibii@users.noreply.github.com>
1 parent 4c6d01e commit 0074339

6 files changed

Lines changed: 165 additions & 7 deletions


docs/core/common_tasks/ase_calculator.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ kernelspec:
  name: python3
  ---

- Inference using ASE and Predictor Interface
+ Inference using ASE and Predictor interface
  ------------------

Inference is done using [MLIPPredictUnit](https://github.com/facebookresearch/fairchem/blob/main/src/fairchem/core/units/mlip_unit/mlip_unit.py#L867). The [FairchemCalculator](https://github.com/facebookresearch/fairchem/blob/main/src/fairchem/core/calculate/ase_calculator.py#L3) (an ASE calculator) is simply a convenience wrapper around the MLIPPredictUnit.
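For context, here is a minimal usage sketch of this wrapper. It assumes the `pretrained_mlip.get_predict_unit` helper, the `FAIRChemCalculator` class (the calculator referred to above), the placeholder checkpoint name `uma-s-1`, and the task name `omat`; the exact import names and arguments may differ in your fairchem version.

```python
from ase.build import bulk

from fairchem.core import FAIRChemCalculator, pretrained_mlip

# Load a predict unit (MLIPPredictUnit) for a pretrained checkpoint; the name is a placeholder.
predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cuda")

# Wrap it in the ASE calculator; task_name selects which task the model should emulate.
calc = FAIRChemCalculator(predictor, task_name="omat")

atoms = bulk("Cu")
atoms.calc = calc
print(atoms.get_potential_energy())
```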

docs/core/common_tasks/ase_dataset_creation.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
- # FAIRChem & Custom Datasets
+ # FAIRChem & custom datasets

  ## Datasets in `fairchem`:
`fairchem` provides training and evaluation code for tasks and models that take arbitrary
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
## Running model benchmarks

Model benchmarks evaluate a model on downstream property predictions that require several model evaluations to compute a single property or a set of related properties, for example structure relaxations, elastic tensors, phonons, or adsorption energies.

To benchmark UMA models on standard datasets, you can find benchmark configuration files in `configs/uma/benchmark`. Example files include:
- `adsorbml.yaml`
- `hea-is2re.yaml`
- `kappa103.yaml`
- `matbench-discovery-discovery.yaml`
- `mdr-phonon.yaml`

Note that to run these UMA benchmarks you will need to obtain the target data.
1. **Run the Benchmark Script**

   Use the same runner script, specifying the benchmark config:

   ```bash
   fairchem --config configs/uma/benchmark/benchmark.yaml
   ```

   Replace `benchmark.yaml` with the desired benchmark config file.

2. **Output**

   Benchmark results are saved to a *results* directory under the *run_dir* specified in the configuration file. Additionally, benchmark metrics are logged using the specified logger. We currently only support Weights and Biases.
## Benchmark Configuration File Format

Benchmark configuration files are written in Hydra YAML format and specify how a benchmark should be run. The UMA benchmark configuration files in `configs/uma/benchmark/` can be used as templates to benchmark other models if needed.

### Top-Level Keys

The benchmark configuration files follow the same format as model training and evaluation configuration files, with the addition of a **reducer** key that specifies how final metrics are calculated from the results of a given benchmark calculation protocol.

A benchmark configuration file should define the following top-level keys (a minimal sketch follows the list):

- **job**: Contains all settings related to the evaluation job itself, including model, data, and logger configuration. For additional details see the description given in the Evaluation page.
- **runner**: Contains settings for a `CalculateRunner`, which implements a downstream property calculation or simulation.
- **reducer**: Contains the settings for a `BenchmarkReducer` class, which defines how to aggregate the results calculated by the `CalculateRunner` and compute metrics against given target values.
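The sketch below is illustrative only: the values, and in particular the runner and reducer targets, are hypothetical placeholders. Refer to the files in `configs/uma/benchmark/` for real examples.

```yaml
# Illustrative layout only -- the keys under runner and reducer vary per benchmark.
job:
  device_type: CUDA
  run_dir: /path/to/run_dir          # benchmark results are written under <run_dir>/results
  logger:
    type: wandb                      # Weights and Biases is currently the only supported logger
    project: my-benchmark-project

runner:
  _target_: fairchem.core.components.calculate.RelaxationRunner   # hypothetical target
  # model checkpoint, input structures, and calculation options go here

reducer:
  _target_: fairchem.core.components.benchmark.JsonDFReducer      # hypothetical target
  # target data and metric settings go here
```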
#### `CalculateRunner`s:
The benchmark details, including the type of calculations and the model checkpoint, are specified under the runner key. The specific benchmark calculations depend on the chosen `CalculateRunner` (for example, a `RelaxationRunner`). Several `CalculateRunner` implementations are found in the `fairchem.core.components.calculate` submodule.

### Implementing new calculations in a `CalculateRunner`
It is straightforward to write your own calculations in a `CalculateRunner`. Although the implementation is very flexible and open-ended, we suggest that you have a look at the interface set up by the `CalculateRunner` base class. At a minimum you will need to implement the following methods:

```python
def calculate(self, job_num: int = 0, num_jobs: int = 1) -> R:
    """Implement your calculations here by iterating over the self.input_data attribute"""

def write_results(
    self, results: R, results_dir: str, job_num: int = 0, num_jobs: int = 1
) -> None:
    """Write the results returned by your calculations in the method above"""
```

You will also see `save_state` and `load_state` abstract methods that you can use to checkpoint calculations; however, in most cases, if calculations are fast enough, you won't need them and can simply implement them as empty methods. A minimal subclass sketch is shown below.
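The following is a minimal, hypothetical sketch of such a subclass. It assumes the `CalculateRunner` base class is importable from `fairchem.core.components.calculate`, that `self.input_data` is an iterable of ASE `Atoms` objects, and that a `self.calculator` attribute holds an ASE calculator; check the base class for the actual constructor, attributes, and method signatures.

```python
from __future__ import annotations

import os

import pandas as pd

# Assumed import path for the base class; check fairchem.core.components.calculate.
from fairchem.core.components.calculate import CalculateRunner


class SinglePointRunner(CalculateRunner):
    """Hypothetical runner computing single-point energies for a set of structures."""

    def calculate(self, job_num: int = 0, num_jobs: int = 1) -> list[dict]:
        results = []
        # self.input_data is assumed to be an iterable of ASE Atoms objects, and
        # self.calculator an ASE calculator configured elsewhere (both are assumptions).
        for sid, atoms in enumerate(self.input_data):
            if sid % num_jobs != job_num:
                continue  # simple round-robin split of structures across jobs
            atoms.calc = self.calculator
            results.append({"sid": sid, "energy": atoms.get_potential_energy()})
        return results

    def write_results(
        self, results: list[dict], results_dir: str, job_num: int = 0, num_jobs: int = 1
    ) -> None:
        # One file per job so a BenchmarkReducer can glob and join them later.
        path = os.path.join(results_dir, f"singlepoint_{job_num}-{num_jobs}.json.gz")
        pd.DataFrame(results).to_json(path)

    # Checkpointing is unnecessary for cheap calculations; the exact signatures of these
    # methods should match the CalculateRunner base class.
    def save_state(self, *args, **kwargs):
        pass

    def load_state(self, *args, **kwargs):
        pass
```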
#### `BenchmarkReducer`s:
A `CalculateRunner` will run calculations over a given set of structures and write out results. In order to compute benchmark metrics, a `BenchmarkReducer` is used to aggregate all these results, compute metrics, and report them. Implementations of `BenchmarkReducer` classes are found in the `fairchem.core.components.benchmark` submodule.

### Implementing metrics in a `BenchmarkReducer`

If you want to implement your own benchmark metric calculation, you can write a `BenchmarkReducer` class. At a minimum, you will need to implement the following methods:

```python
def join_results(self, results_dir: str, glob_pattern: str) -> R:
    """Join your results from multiple files into a single result object."""

def save_results(self, results: R, results_dir: str) -> None:
    """Save joined results to a single file"""

def compute_metrics(self, results: R, run_name: str) -> M:
    """Compute metrics using the joined results and target data in your BenchmarkReducer."""

def save_metrics(self, metrics: M, results_dir: str) -> None:
    """Save the computed metrics to a file."""

def log_metrics(self, metrics: M, run_name: str):
    """Log metrics to the configured logger."""
```

If it makes sense for your benchmark metrics, and you are happy working with dictionaries and pandas `DataFrame`s, a lot of boilerplate code is already implemented in the `JsonDFReducer`. We recommend that you start there by deriving your class from it and focusing only on implementing the `compute_metrics` method.
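As a minimal illustration of that approach, the sketch below assumes, purely for illustration, that the joined results arrive as a pandas `DataFrame` with an `energy` column and that the reducer holds a matching `self.target_data` frame; the real `JsonDFReducer` constructor, attributes, and column names may differ.

```python
from __future__ import annotations

import pandas as pd

# Assumed import path; check fairchem.core.components.benchmark for the actual class.
from fairchem.core.components.benchmark import JsonDFReducer


class EnergyMAEReducer(JsonDFReducer):
    """Hypothetical reducer reporting the mean absolute error of predicted energies."""

    def compute_metrics(self, results: pd.DataFrame, run_name: str) -> pd.DataFrame:
        # Assumes the joined results and self.target_data are DataFrames sharing an index
        # and an "energy" column; both the attribute and the column names are assumptions.
        errors = (results["energy"] - self.target_data["energy"]).abs()
        return pd.DataFrame(
            {"energy_mae": [errors.mean()], "num_structures": [len(errors)]},
            index=[run_name],
        )
```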
Lines changed: 79 additions & 2 deletions
@@ -1,3 +1,80 @@
- # Evaluation
+ # Evaluating pretrained models

- This repo provides a number of methods used to benchmark and evaluate the UMA models that will be helpful for apples-to-apples comparisons with the paper results. More details to be provided here soon.
+ `fairchemV2` provides a number of methods used to benchmark and evaluate the UMA models that will be helpful for apples-to-apples comparisons with the paper results. More details to be provided here soon.
## Running Model Evaluations

To evaluate a UMA model using a pre-existing configuration file, follow these steps. Example configuration files used to evaluate UMA models are stored in `configs/uma/evaluate`.

1. **Run the Evaluation Script**

   To run an evaluation, simply run:

   ```bash
   fairchem --config evaluation_config.yaml
   ```

   Replace `evaluation_config.yaml` with the desired config file, for example `configs/uma/evaluate/uma_conserving.yaml`.

2. **Output**

   Results will be logged according to the specified logger. We currently only support Weights and Biases.
## Evaluation Configuration File Format

Evaluation configuration files are written in Hydra YAML format and specify how a model evaluation should be run. UMA evaluation configuration files, which can be used as templates to evaluate other models if needed, are located in `configs/uma/evaluate/`.

### Top-Level Keys

Similar to training configuration files, the only allowed top-level keys are the `job` and `runner` keys, as well as interpolation keys that are resolved at runtime.

- **job**: Contains all settings related to the evaluation job itself, including model, data, and logger configuration.
- **runner**: Contains settings for the evaluation runner, such as which script to use and runtime options.

Important configuration options are nested under these keys as follows (a minimal layout sketch follows these lists):
#### Under `job`:
Specifications of how to run the actual job. The configuration options are the same here as those in a training job. Some notable flags are detailed below:

- `device_type`: The device to run model inference on (i.e. CUDA or CPU)
- `scheduler`: The compute scheduler specifications
- `logger`: Configuration for logging results.
  - `type`: Logger type (e.g., `wandb`).
  - `project`: Logging project name.
  - `entity`: (Optional) Logger entity/user.
- `run_dir`: Directory where results and logs will be saved.

#### Under `runner`:
The actual evaluation details, such as the model checkpoint and the dataset, are specified under the runner key. An evaluation run should use the `EvalRunner` class, which relies on an `MLIPEvalUnit` to run inference using a pretrained model.

- `dataloader`: Dataloader specification for the evaluation dataset.
- `eval_unit`: The specification of the `MLIPEvalUnit` to be used.
  - `tasks`: The prediction task configuration. In almost all cases, these should be loaded from a model checkpoint using the `fairchem.core.units.mlip_unit.utils.load_tasks` function.
  - `model`: Defines how to load a pretrained model. We recommend using the `fairchem.core.units.mlip_unit.mlip_unit.load_inference_model` function to do so.
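The layout sketch below is illustrative only; all values are placeholders, and the exact keys and targets should be taken from the files in `configs/uma/evaluate/`.

```yaml
# Illustrative layout only -- see configs/uma/evaluate/ for real values.
job:
  device_type: CUDA
  run_dir: /path/to/run_dir
  logger:
    type: wandb
    project: my-eval-project
    entity: my-team              # optional

runner:
  # should instantiate the EvalRunner class (see the UMA configs for the exact target)
  dataloader:
    # evaluation dataset location, batch size, etc.
  eval_unit:
    tasks:
      # typically loaded from the checkpoint via fairchem.core.units.mlip_unit.utils.load_tasks
    model:
      # typically loaded via fairchem.core.units.mlip_unit.mlip_unit.load_inference_model
```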
### Using the `defaults` key to define config groups

The `defaults` key is a Hydra feature that allows you to compose configuration files from modular config groups. Each entry under `defaults` refers to a config group (such as `model`, `data`, or other reusable components) that is merged into the final configuration at runtime. This makes it easy to swap out models, datasets, or other settings without duplicating configuration code.
For example, in the UMA evaluation configs we have set up the following config groups and defaults:

```yaml
defaults:
  - _self_
  - model: omc_conserving
  - data: my_eval_data
```
This will include the configuration from `configs/uma/evaluate/model/omc_conserving.yaml` and `configs/uma/evaluate/data/my_eval_data.yaml` into the main config. The `_self_` entry ensures the current file's contents are included.

You can create new config groups or override existing ones by changing the entries under `defaults`.

```yaml
defaults:
  - cluster: Configuration settings for a particular compute cluster
  - dataset: Configuration settings for the evaluation dataset
  - checkpoint: Configuration settings of the pretrained model checkpoint
  - _self_
```

Using config groups makes it easy to override defaults from the CLI. For example,

```bash
fairchem --config evaluation_config.yaml cluster=cluster_config checkpoint=checkpoint_config
```

where `cluster_config` and `checkpoint_config` are cluster and checkpoint configuration files written to directories under `cluster` and `checkpoint`, respectively. See the files in `configs/uma/evaluate` for a full example.

docs/core/common_tasks/training.md

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
- # Training
+ # Training models from scratch

This repo is used to train large, state-of-the-art graph neural networks from scratch on datasets like OC20, OMol25, or OMat24, among others. We now provide a simple CLI to handle this using your own custom datasets, but we suggest fine-tuning one of the existing checkpoints before attempting from-scratch training.
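For orientation only, training is launched through the same `fairchem` CLI shown in the evaluation docs; `my_training_config.yaml` below is a hypothetical placeholder for your own training config.

```bash
# Launch a training run from a Hydra YAML config (hypothetical file name).
fairchem --config my_training_config.yaml
```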

docs/core/common_tasks/workflows.md

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ kernelspec:
  name: python3
  ---

- Workflows
- ------------------
+ Calculation workflows with FAIRChem models
+ ------------------------------------------

This repo is integrated with workflow tools like [QuAcc](https://github.com/Quantum-Accelerators/quacc) to make complex molecular simulation workflows easy. You can use any MLP recipe (relaxations, single-points, elastic calculations, etc.) and simply specify the `fairchem` model type. Below is an example that uses the default elastic_tensor_flow flow.
