Commit ea08011

add to toc (#1518)
1 parent 86c51ea commit ea08011

2 files changed

Lines changed: 6 additions & 5 deletions

docs/_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ parts:
 - file: molecules/datasets/summary
 sections:
 - file: molecules/datasets/omol25
+- file: molecules/datasets/omol25_elec
 - file: molecules/datasets/omc25
 - file: molecules/models
 - file: molecules/leaderboard

docs/molecules/datasets/omol25_elec.md

Lines changed: 5 additions & 5 deletions
@@ -1,18 +1,18 @@
-# Open Molecule 2025 Electronic Structures Dataset
+# OMol25 Electronic Structures

 The Open Molecules 2025 (OMol25) dataset represents the largest dataset of its kind, with more than 100 million density functional theory (DFT) calculations at the ωB97M-V/def2-TZVPD level of theory, spanning several chemical domains including small molecules, biomolecules, metal complexes, and electrolytes.

 At release, the OMol25 dataset provided structure energies, per-atom forces, and Lowdin/Mulliken charges and spins, where available. These properties were sufficient to train state-of-the-art machine learning interatomic potentials (MLIPs) and are already demonstrating incredible performance across a wide range of applications. However, to maximize the community benefit of these calculations, we have partnered with the [Department of Energy’s Argonne National Laboratory](https://www.anl.gov/) to provide access to the raw DFT outputs and additional files for the OMol25 dataset.

-By releasing the ORCA output files, users will be able to parse NBO orbital/bonding information, reduced orbital populations, Fock matrices, and more. By releasing the ORCA GBW files, users will be able to run electronic structure post-processing in order to obtain higher quality partial charges and partial spins and a variety of more advanced electronic features that could be extremely valuable for physics-informed ML models. Finally, the release will provide critical high quality data for nascent ML models that train directly on electron densities.
+By releasing the [ORCA](https://www.faccts.de/docs/orca/6.0/manual/) output files, users will be able to parse NBO orbital/bonding information, reduced orbital populations, Fock matrices, and more. By releasing the ORCA GBW files, users will be able to run electronic structure post-processing in order to obtain higher quality partial charges and partial spins and a variety of more advanced electronic features that could be extremely valuable for physics-informed ML models. Finally, the release will provide critical high quality data for nascent ML models that train directly on electron densities.

 ## Data Description

-The OMol25 dataset is broken into several training splits - All and 4M. The 4M split corresponds to a randomly sampled 4M subset of the full OMol25 dataset. Given the size of the full dataset, O(petabytes), we are first releasing all electronic structure and ORCA output data for the 4M split. Based on community interest, we will work to provide the full dataset.
+The OMol25 dataset is broken into several training splits - All and 4M. The 4M split corresponds to a randomly sampled 4M subset of the full OMol25 dataset. Given the size of the full dataset, O(petabytes), we are first releasing all electronic structure and ORCA output data for the 4M split. Based on community interest, we will work to provide the full dataset.

 For each calculation, the following data is available:

-* **orca.tar.zst**: Bundle of the raw ORCA outputs - including (orca.out, orca.inp orca.engrad, orca_property.txt, orca.xyz). To open:
+* **orca.tar.zst**: Bundle of the raw [ORCA](https://www.faccts.de/docs/orca/6.0/manual/) outputs - including (orca.out, orca.inp orca.engrad, orca_property.txt, orca.xyz). To open:

 ```
 >> tar --zstd -xvf orca.tar.zst
@@ -61,7 +61,7 @@ argonne_paths = []
 for idx in indices:
     # ASE Atoms object that can be visualized/examined
     atoms = dataset.get_atoms(idx)
-    # Check if this is a system you care about.
+    # Check if this is a system you care about.
     is_relevant = is_atoms_object_relevant(atoms)
     if is_relevant:
         # Extract the relative path that matches the Argonne cluster
