FAIRChem is Meta FAIR Chemistry's ML framework for atomistic simulations. Core abstractions: foundation models (UMA) with backbone+heads architecture, ASE calculator integration, Hydra-based config, and multi-task training via TorchTNT.
```shell
# Install
pip install -e packages/fairchem-core[dev]

# Tests (always pass -c flag)
pytest tests -c packages/fairchem-core/pyproject.toml
pytest tests/core/models/test_uma.py -vv
pytest tests/core -m "not gpu"

# Lint & format — REQUIRED for every modified file before committing
pre-commit run --files path/to/modified_file.py

# CLI
fairchem -c config.yaml [overrides...]
```

IMPORTANT: You MUST run `pre-commit run --files /path/to/modified_file.py` on every file you modify, before considering the task complete. No exceptions.
Every file must start with:

```python
"""
Copyright (c) Meta Platforms, Inc. and affiliates.

This source code is licensed under the MIT license found in the
LICENSE file in the root directory of this source tree.
"""

from __future__ import annotations
```

Line length: 88 characters. Linter: Ruff (config in `ruff.toml`).
Docstrings use Google convention. No text on opening/closing quote lines:

```python
# WRONG
"""This is wrong."""

# RIGHT
"""
Short description.
"""

# RIGHT (with args)
"""
Short description.

Args:
    x: The input tensor.

Returns:
    The processed tensor.
"""
```

Imports: isort enforced via Ruff. `fairchem.core` is known-first-party. Use `TYPE_CHECKING` for type-only imports:
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from fairchem.core.datasets.atomic_data import AtomicData
```

All tests go in `tests/` (mirrors `src/fairchem/` structure). Always run with:

```shell
pytest tests -c packages/fairchem-core/pyproject.toml
```

Markers:
- `@pytest.mark.gpu`: GPU-only (auto-skipped when CUDA unavailable)
- `@pytest.mark.cpu_and_gpu`: Runs on both CPU and GPU
- `@pytest.mark.dgl`: Requires `fairchem_cpp`
- `@pytest.mark.inference_check`: Inference validation (skipped by default)
Root conftest (`tests/conftest.py`):
- `seed_fixture` (function): Seeds all RNGs to 42
- `water_xyz_file` (session): Path to a minimal 3-atom water XYZ file
- `compile_reset_state` (function): Resets `torch.compiler` before/after tests
- `setup_before_each_test` (autouse): Cleans up Ray, GPU memory, distributed state
Core conftest (`tests/core/conftest.py`):
- `dummy_binary_dataset` (session, parametrized): ASE dataset in both LMDB and CIF formats
- `fake_uma_dataset` (session): Full UMA training dataset path + config
- `direct_checkpoint` (session): Trained model checkpoint (inference + resume)
- `direct_mole_checkpoint` (session): Trained MOLE checkpoint
- `torch_deterministic` (function): Enables deterministic algorithms
- `snapshot` (function): Syrupy snapshot with approximate numpy comparison (`Approx`)
Dataset conftest (`tests/core/datasets/conftest.py`):
- `structures` (module): List of test atoms [H2O molecule, Cu bulk, Pt slab]
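The function-scoped `seed_fixture` above is what makes individual tests deterministic. A stdlib-only sketch of the idea (simplified: the real fixture also seeds the numpy and torch RNGs):

```python
import random


def seed_all(seed: int = 42) -> None:
    """
    Sketch of what a per-test seed fixture does (stdlib random only;
    the real seed_fixture also seeds numpy and torch RNGs).
    """
    random.seed(seed)


# Re-seeding before each test makes random draws reproducible:
seed_all()
first_draw = random.random()
seed_all()
assert random.random() == first_draw
```

Because the fixture has function scope, every test that requests it starts from the same RNG state regardless of test ordering.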
GPU/CPU dual tests:

```python
@pytest.mark.gpu
def test_something_gpu():
    _test_something("cuda")


def test_something_cpu():
    _test_something("cpu")


def _test_something(device):
    # shared implementation
    ...
```

Snapshot testing with approximate comparison:

```python
def test_values(snapshot):
    result = compute_something()
    assert pytest.approx(result.numpy(), abs=1e-3) == snapshot
```

Integration tests using the CLI:
```python
from tests.core.testing_utils import launch_main


def test_training(fake_uma_dataset):
    sys_args = [
        "--config",
        "tests/core/units/mlip_unit/test_mlip_train.yaml",
        f"datasets.data_root_dir={fake_uma_dataset}",
        "job.device_type=CPU",
        "max_steps=2",
    ]
    launch_main(sys_args)
```

Models use HydraModel (registered as "hydra"): one backbone extracts features, multiple heads predict properties.
```python
BackboneInterface.forward(data: AtomicData) -> dict[str, Tensor]         # features
HeadInterface.forward(data: AtomicData, emb: dict) -> dict[str, Tensor]  # predictions
```
Primary backbone: escnmd_backbone (SO(3)-equivariant eSCN with MD modifications).
Heads: MLP_Energy_Head, Linear_Force_Head, DatasetSpecificSingleHeadWrapper.
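The backbone-plus-heads split can be illustrated with a plain-Python sketch (all names below are hypothetical stand-ins, not FAIRChem classes; real features and predictions are tensors, not lists):

```python
# Hypothetical sketch of the HydraModel pattern: one backbone pass,
# shared embeddings, one output dict per head.

def toy_backbone(data: dict) -> dict:
    # A real backbone returns equivariant feature tensors; here we just
    # derive one number per atom from its position.
    return {"node_feats": [sum(p) for p in data["pos"]]}


def energy_head(data: dict, emb: dict) -> dict:
    # Each head consumes the shared embeddings and emits one property.
    return {"energy": sum(emb["node_feats"])}


def force_head(data: dict, emb: dict) -> dict:
    return {"forces": [[-f, 0.0, 0.0] for f in emb["node_feats"]]}


def hydra_forward(data: dict, heads: list) -> dict:
    emb = toy_backbone(data)  # backbone runs once...
    out: dict = {}
    for head in heads:        # ...and every head reuses its features
        out.update(head(data, emb))
    return out


preds = hydra_forward(
    {"pos": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]},
    [energy_head, force_head],
)
```

Running this yields a dict containing both "energy" and "forces", mirroring how HydraModel merges the outputs of its heads.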
Components are registered for dynamic Hydra instantiation:

```python
@registry.register_model("my_backbone")
class MyBackbone(nn.Module, BackboneInterface): ...
```

Available decorators: `register_model`, `register_dataset`, `register_loss`, `register_task`, `register_logger`, `register_trainer`.
Lookup: `registry.get_model_class("my_backbone")` or by full import path `"fairchem.core.models.my_module.MyBackbone"`.
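The decorator mechanism behind this is the standard registry pattern; a simplified stdlib sketch (not the real `fairchem.core.common.registry` implementation, which also supports the other component kinds and import-path lookup):

```python
# Simplified registry sketch: the decorator factory records the class
# under a name, then returns the class unchanged.

class Registry:
    def __init__(self):
        self._models: dict[str, type] = {}

    def register_model(self, name: str):
        def wrap(cls: type) -> type:
            self._models[name] = cls
            return cls
        return wrap

    def get_model_class(self, name: str) -> type:
        return self._models[name]


registry = Registry()


@registry.register_model("my_backbone")
class MyBackbone:
    ...
```

After registration, `registry.get_model_class("my_backbone")` returns the `MyBackbone` class itself, which Hydra can then instantiate from config.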
ASE Atoms -> AtomicData.from_ase() -> graph generation -> backbone -> heads -> predictions
AtomicData required fields: pos, atomic_numbers, cell, pbc, natoms, edge_index, cell_offsets, nedges, charge, spin, fixed, tags.
Optional targets: energy, forces, stress.
Batching via atomicdata_list_to_batch(). Multi-task collation via MTCollater (fills missing targets with inf for loss masking).
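The inf-filling trick can be sketched without torch (hypothetical helper names; the real MTCollater operates on AtomicData batches, not plain dicts):

```python
import math

# Sketch of multi-task collation: samples from different datasets carry
# different targets, so missing ones are filled with inf and masked out
# of the loss later.

def collate_multitask(samples: list, target_keys: tuple) -> dict:
    batch: dict = {k: [] for k in target_keys}
    for sample in samples:
        for key in target_keys:
            batch[key].append(sample.get(key, math.inf))
    return batch


def masked_mse(pred: list, target: list) -> float:
    # inf-valued targets are excluded from the loss.
    pairs = [(p, t) for p, t in zip(pred, target) if not math.isinf(t)]
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


batch = collate_multitask(
    [{"energy": -1.0, "forces": 0.5}, {"energy": -2.0}],  # 2nd sample lacks forces
    ("energy", "forces"),
)
loss = masked_mse([0.4, 0.4], batch["forces"])  # only the 1st entry contributes
```

The inf sentinel survives batching cheaply and is trivial to mask with `isinf`, which is why missing targets are filled rather than dropped.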
YAML configs use `_target_` keys for component instantiation:

```yaml
runner:
  _target_: fairchem.core.components.train.train_runner.TrainEvalRunner
  train_eval_unit:
    _target_: fairchem.core.units.mlip_unit.mlip_unit.MLIPTrainEvalUnit
    model:
      _target_: fairchem.core.models.base.HydraModel
      backbone: ${backbone}
```

Config sections: `job`, `runner`, `datasets`, `tasks`, `backbone`, `optimizer`.
Default configs in `configs/`. Overrides via CLI: `fairchem -c config.yaml key=value`.
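The `_target_` mechanism is Hydra-style instantiation: split the dotted path into module and attribute, import it, and call it with the remaining keys. A minimal stdlib re-implementation (the `instantiate` helper here is illustrative; real Hydra/omegaconf also handle recursion and `${...}` interpolation):

```python
import importlib

# Minimal sketch of Hydra-style _target_ instantiation.

def instantiate(config: dict):
    module_path, _, attr_name = config["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), attr_name)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return cls(**kwargs)


# Builds collections.Counter(a=2) purely from a config dict:
counter = instantiate({"_target_": "collections.Counter", "a": 2})
```

This is why any registered component can be swapped by editing YAML alone: the config names the class, and the framework resolves and constructs it at runtime.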
Task names:
- `oc20`: Catalysis (Open Catalyst)
- `omat`: Inorganic materials
- `omol`: Molecules
- `odac`: Metal-organic frameworks
- `omc`: Molecular crystals
TrainEvalRunner orchestrates training via TorchTNT's fit(). Core unit: MLIPTrainEvalUnit (handles forward pass, loss, metrics, EMA, gradient clipping). Checkpoints use DCP (PyTorch Distributed Checkpoint) with dcp_to_torch_save() for inference export.
```python
from fairchem.core import pretrained_mlip, FAIRChemCalculator

predictor = pretrained_mlip.get_predict_unit("uma-s-1p1", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="oc20")
atoms.calc = calc
```

```
src/fairchem/core/
├── models/       # Backbones and heads (UMA, eSCN-MD, GemNet)
├── datasets/     # Data loading (LMDB, ASE), collaters, samplers
├── components/   # Runner components (train, evaluate, calculate)
├── units/        # TorchTNT train/eval/predict units
├── modules/      # Loss, schedulers, normalizers, evaluators
├── launchers/    # Local, Ray, SLURM job launchers
├── common/       # Registry, distributed utils, logging
├── graph/        # Graph generation, neighbor finding with PBC
└── _cli.py       # CLI entry point
tests/            # All tests (mirrors src structure)
packages/         # Installable packages (fairchem-core, fairchem-data-*)
configs/          # Hydra YAML configs (datasets, tasks, backbone, optimizer)
```
Key dependencies:
- `torch~=2.8.0`, `e3nn>=0.5`: PyTorch + equivariant neural networks
- `ase>=3.26.0`: Atomic Simulation Environment
- `torchtnt`: PyTorch training framework (TrainUnit/EvalUnit)
- `hydra-core` + `omegaconf`: Configuration management
- `lmdb`: Dataset storage format
- `ray[serve]>=2.53.0`: Distributed computing
Anytime we learn something that could be beneficial in future coding sessions, automatically add it to CLAUDE.md.
This includes:
- Gotchas that are not obvious
- Subtle bugs that manifest under specific conditions
- Repeat corrections I make to the output of coding agents