There are multiple ways to train and evaluate FAIRChem models on data other than OC20 and OC22. Writing an LMDB is the most performant option. However, ASE-based dataset formats are also included as a convenience for people with existing data who simply want to try fairchem tools without needing to learn about LMDBs.
docs/core/uma.md: 4 additions & 4 deletions
@@ -34,10 +34,10 @@ UMA is trained on 5 different DFT datasets with different levels of theory. An U
| Task | Dataset | DFT Level of Theory | Relevant applications | Usage Notes |
| ------- | ------- | ----- | ------ | ----- |
-
| omol |[Omol25](https://arxiv.org/abs/2505.08762)| wB97M-V/def2-TZVPD as implemented in ORCA6, including non-local dispersion. All solvation should be explicit. | Biology, organic chemistry, protein folding, small-molecule pharmaceuticals, organic liquid properties, homogeneous catalysis | total charge and spin multiplicity. If you don't know what these are, you should be very careful if modeling charged or open-shell systems. This can be used to study radical chemistry or understand the impact of magnetic states on the structure of a molecule. All training data is aperiodic, so any periodic systems should be treated with some caution. Probably won't work well for inorganic materials. |
-
| omc |Omc25| PBE+D3 as implemented in VASP. | Pharmaceutical packaging, bio-inspired materials, organic electronics, organic LEDs | UMA has not seen varying charge or spin multiplicity for the OMC task, and expects total_charge=0 and spin multiplicity=0 as model inputs. |
-
| omat |[Omat24](https://arxiv.org/abs/2410.12771)| PBE/PBE+U as implemented in VASP using Materials Project suggested settings, except with VASP 54 pseudopotentials. No dispersion. | Inorganic materials discovery, solar photovoltaics, advanced alloys, superconductors, electronic materials, optical materials | UMA has not seen varying charge or spin multiplicity for the OMat task, and expects total_charge=0 and spin multiplicity=0 as model inputs. Spin polarization effects are included, but you can't select the magnetic state. Further, OMat24 did not fully sample possible spin states in the training data. |
+
| omol |[OMol25](https://arxiv.org/abs/2505.08762)| wB97M-V/def2-TZVPD as implemented in ORCA6, including non-local dispersion. All solvation should be explicit. | Biology, organic chemistry, protein folding, small-molecule pharmaceuticals, organic liquid properties, homogeneous catalysis | total charge and spin multiplicity. If you don't know what these are, you should be very careful if modeling charged or open-shell systems. This can be used to study radical chemistry or understand the impact of magnetic states on the structure of a molecule. All training data is aperiodic, so any periodic systems should be treated with some caution. Probably won't work well for inorganic materials. |
+
| omc |[OMC25](https://arxiv.org/abs/2508.02651)| PBE+D3 as implemented in VASP. | Pharmaceutical packaging, bio-inspired materials, organic electronics, organic LEDs | UMA has not seen varying charge or spin multiplicity for the OMC task, and expects total_charge=0 and spin multiplicity=0 as model inputs. |
+
| omat |[OMat24](https://arxiv.org/abs/2410.12771)| PBE/PBE+U as implemented in VASP using Materials Project suggested settings, except with VASP 54 pseudopotentials. No dispersion. | Inorganic materials discovery, solar photovoltaics, advanced alloys, superconductors, electronic materials, optical materials | UMA has not seen varying charge or spin multiplicity for the OMat task, and expects total_charge=0 and spin multiplicity=0 as model inputs. Spin polarization effects are included, but you can't select the magnetic state. Further, OMat24 did not fully sample possible spin states in the training data. |
| oc20 |[OC20*](https://arxiv.org/abs/2010.09990)| RPBE as implemented in VASP, with VASP5.4 pseudopotentials. No dispersion. | Renewable energy, catalysis, fuel cells, energy conversion, sustainable fertilizer production, chemical refining, plastics synthesis/upcycling | UMA has not seen varying charge or spin multiplicity for the OC20 task, and expects total_charge=0 and spin multiplicity=0 as model inputs. No oxides or explicit solvents are included in OC20. The model works surprisingly well for transition state searches given the nature of the training data, but you should be careful. RPBE works well for small molecules, but dispersion will be important for larger molecules on surfaces. |
-
| odac |[ODac23](https://arxiv.org/abs/2311.00341)| PBE+D3 as implemented in VASP, with VASP5.4 pseudopotentials. | Direct air capture, carbon capture and storage, CO2 conversion, catalysis | UMA has not seen varying charge or spin multiplicity for the ODAC task, and expects total_charge=0 and spin multiplicity=0 as model inputs. The ODAC23 dataset only contains CO2/H2O water absorption, so anything more than might be inaccurate (e.g. hydrocarbons in MOFs). Further, there is a limited number of bare-MOF structures in the training data, so you should be careful if you are using a new MOF structure. |
+
| odac |[ODAC23](https://arxiv.org/abs/2311.00341)| PBE+D3 as implemented in VASP, with VASP5.4 pseudopotentials. | Direct air capture, carbon capture and storage, CO2 conversion, catalysis | UMA has not seen varying charge or spin multiplicity for the ODAC task, and expects total_charge=0 and spin multiplicity=0 as model inputs. The ODAC23 dataset only contains CO2/H2O adsorption, so anything beyond that might be inaccurate (e.g. hydrocarbons in MOFs). Further, there is a limited number of bare-MOF structures in the training data, so you should be careful if you are using a new MOF structure. |
*Note: OC20 was updated from the original OC20 and recomputed to produce total energies instead of adsorption energies.
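The usage notes in the table can be turned into a small pre-flight check. The sketch below is illustrative only: `check_inputs` and the two task sets are hypothetical names, not part of fairchem, and they encode just what the table states (only `omol` was trained with varying total charge and spin multiplicity; the other tasks expect both set to 0).

```python
# Hypothetical helper; not part of the fairchem API.
# Encodes the "Usage Notes" column: only omol accepts varying charge/spin.
TASKS_WITH_FIXED_INPUTS = {"omc", "omat", "oc20", "odac"}
ALL_TASKS = TASKS_WITH_FIXED_INPUTS | {"omol"}


def check_inputs(task: str, total_charge: int = 0, spin_multiplicity: int = 0) -> None:
    """Raise ValueError if the inputs fall outside what the table says UMA has seen."""
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(ALL_TASKS)}")
    if task in TASKS_WITH_FIXED_INPUTS and (total_charge != 0 or spin_multiplicity != 0):
        raise ValueError(
            f"task {task!r} expects total_charge=0 and spin multiplicity=0, "
            f"got charge={total_charge}, spin={spin_multiplicity}"
        )


check_inputs("omol", total_charge=-1, spin_multiplicity=2)  # accepted: omol varies both
check_inputs("omat")                                        # accepted: defaults are 0
```

A check like this only guards the inputs; it says nothing about whether a given system is within the chemistry each task was trained on.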
The Open Molecular Crystals 2025 (OMC25) dataset was announced along with UMA, and comprises ~25 million calculations of organic molecular crystals from random packing of OE62 structures into various 3D unit cells. It is calculated at the PBE+D3 level of theory via VASP. More details and download information coming!
+
The Open Molecular Crystals 2025 (OMC25) dataset comprises >25 million structures of organic molecular crystals from relaxation trajectories of random packings of OE62 molecules into various 3D unit cells using the Genarris 3.0 package. The structures are labeled with total energy (eV), forces (eV/Å), and stress (eV/Å^3) computed with VASP.
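Since the stress labels are in eV/Å^3, it can help to convert them to GPa when comparing against values reported in other units. The snippet below is a minimal sketch using only the exact SI value of the electronvolt; the name `stress_to_gpa` is a hypothetical helper, not something provided by the dataset or by fairchem.

```python
# 1 eV = 1.602176634e-19 J (exact in the 2019 SI); 1 Å^3 = 1e-30 m^3; 1 GPa = 1e9 Pa.
EV_PER_A3_TO_GPA = 1.602176634e-19 / 1e-30 / 1e9  # about 160.2176634


def stress_to_gpa(stress_ev_per_a3: float) -> float:
    """Convert a stress component from eV/Å^3 to GPa (hypothetical helper)."""
    return stress_ev_per_a3 * EV_PER_A3_TO_GPA
```

For example, a stress component of 0.01 eV/Å^3 corresponds to roughly 1.6 GPa.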
+
The training and validation splits of the OMC25 dataset are available for download from HuggingFace at https://huggingface.co/facebook/OMC25 under the CC BY 4.0 license, after applying for repository access on HuggingFace.
+
## Dataset format
+
The dataset is provided in ASE DB compatible lmdb files (*.aselmdb).
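After downloading, a first step is usually to collect the `*.aselmdb` files in a split directory. The sketch below uses only the Python standard library; `find_aselmdb_files` is a hypothetical helper name, and the directory layout (files possibly nested in subfolders) is an assumption rather than something the dataset description specifies.

```python
from pathlib import Path


def find_aselmdb_files(root: str) -> list[Path]:
    """Return all *.aselmdb files under root, sorted for a reproducible order."""
    # rglob searches root and every subdirectory; non-matching files are ignored.
    return sorted(Path(root).rglob("*.aselmdb"))
```

The resulting paths can then be handed to whichever reader you use for ASE DB compatible LMDB files.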
+
## Level of theory
+
OMC25 was calculated at the PBE+D3 level via VASP. To reproduce the calculations, please use `fairchem.data.omc.scripts.create_vasp_inputs.py` to write compatible VASP inputs.
+
## Citing
+
We encourage users to cite this paper when using the OMC25 dataset or pretrained models for molecular crystals in their research.
title={Open Molecular Crystals 2025 (OMC25) Dataset and Models},
+
author={Vahe Gharakhanyan and Luis Barroso-Luque and Yi Yang and Muhammed Shuaibi and Kyle Michel and Daniel S. Levine and Misko Dzamba and Xiang Fu and Meng Gao and Xingyu Liu and Haoran Ni and Keian Noori and Brandon M. Wood and Matt Uyttendaele and Arman Boromand and C. Lawrence Zitnick and Noa Marom and Zachary W. Ulissi and Anuroop Sriram},
docs/molecules/models.md: 28 additions & 9 deletions
@@ -2,32 +2,51 @@
**2025 recommendation:** We suggest using the [UMA model](../core/uma), trained on all of the FAIR chemistry datasets, before using one of the checkpoints below. The UMA model has a number of advantages over the previous checkpoints:
1. It is state-of-the-art in out-of-domain prediction accuracy
-
2. The UMA small model is an energy conserving and smooth checkpoint, so should work much better for vibrational calculations, molecular dynamics, etc.
+
2. The UMA small model is an energy-conserving and smooth checkpoint, so it should work much better for vibrational calculations, molecular dynamics, etc.
3. The UMA model is most likely to be updated in the future.
## Baseline models in the OMol25 paper
As part of the OMol25 release, we released two sets of models:
1. [preferred] UMA models trained on a range of FAIR chemistry datasets, available at [HuggingFace](https://huggingface.co/facebook/UMA)
2. eSEN models trained only on OMol25, available at [HuggingFace](https://huggingface.co/facebook/OMol25/tree/main)
-
The UMA models will continue to be updated regularly and we expect those to remain the default and performant option for the forseeable future. The OMol25-only eSEN models are provided mostly as a base-line for models trained only on OMol25.
-
## License
-
Both models require users to agree to the FAIR Chemistry License as part of the HuggingFace model gating process.
+
The UMA models will continue to be updated regularly, and we expect them to remain the default and performant option for the foreseeable future. The OMol25-only eSEN models are provided mostly as a baseline for models trained only on OMol25.
## Citing
-
If you use the OMol25-trained eSEN models, please cite the following paper.
+
If you use the OMol25-trained eSEN models, please cite the following paper.
```bib
@misc{levine2025openmolecules2025omol25,
-
title={The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models},
+
title={The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models},
author={Daniel S. Levine and Muhammed Shuaibi and Evan Walter Clark Spotte-Smith and Michael G. Taylor and Muhammad R. Hasyim and Kyle Michel and Ilyes Batatia and Gábor Csányi and Misko Dzamba and Peter Eastman and Nathan C. Frey and Xiang Fu and Vahe Gharakhanyan and Aditi S. Krishnapriyan and Joshua A. Rackers and Sanjeev Raja and Ammar Rizvi and Andrew S. Rosen and Zachary Ulissi and Santiago Vargas and C. Lawrence Zitnick and Samuel M. Blau and Brandon M. Wood},
year={2025},
eprint={2505.08762},
archivePrefix={arXiv},
primaryClass={physics.chem-ph},
-
url={https://arxiv.org/abs/2505.08762},
+
url={https://arxiv.org/abs/2505.08762},
+
}
+
```
+
## Baseline models in the OMC25 paper
+
As part of the OMC25 release, we released an eSEN model trained only on OMC25, available at [HuggingFace](https://huggingface.co/facebook/OMC25). The [preferred] UMA models, trained on a range of FAIR chemistry datasets, are available at [HuggingFace](https://huggingface.co/facebook/UMA).
+
## Citing
+
We encourage users to cite this paper when using the OMC25 dataset or pretrained models for molecular crystals in their research.
title={Open Molecular Crystals 2025 (OMC25) Dataset and Models},
+
author={Vahe Gharakhanyan and Luis Barroso-Luque and Yi Yang and Muhammed Shuaibi and Kyle Michel and Daniel S. Levine and Misko Dzamba and Xiang Fu and Meng Gao and Xingyu Liu and Haoran Ni and Keian Noori and Brandon M. Wood and Matt Uyttendaele and Arman Boromand and C. Lawrence Zitnick and Noa Marom and Zachary W. Ulissi and Anuroop Sriram},
+
year={2025},
+
eprint={2508.02651},
+
archivePrefix={arXiv},
+
primaryClass={physics.chem-ph},
+
url={https://arxiv.org/abs/2508.02651},
}
```
+
## License
+
All models require users to agree to the FAIR Chemistry License as part of the HuggingFace model gating process.