Commit 6ba77f5

Merge branch 'main' into formation-energy-calculator
2 parents 2e54dd7 + 8255e37 commit 6ba77f5

15 files changed

Lines changed: 309 additions & 23 deletions

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+include-paths:
+  - src/fairchem/data/omc
+  - packages/fairchem-data-omc
+tag-prefix: fairchem_data_omc
+tag-template: 'fairchem_data_omc-$RESOLVED_VERSION'
+name-template: 'fairchem_data_omc-$RESOLVED_VERSION'
+exclude-contributors: [github-actions]
+categories:
+  - title: New Features / Enhancements
+    labels: [enhancement]
+  - title: Bug Fixes
+    labels: [bug]
+  - title: Documentation
+    labels: [documentation]
+  - title: Tests
+    labels: [test]
+  - title: Deprecations
+    labels: [deprecation]
+  - title: Dependencies
+    labels: [dependencies]
+  - title: Other Changes
+    labels: ["*"]
+version-resolver:
+  major:
+    labels:
+      - 'major'
+  minor:
+    labels:
+      - 'minor'
+  patch:
+    labels:
+      - 'patch'
+  default: patch
+template: |
+  ## What’s Changed
+
+  $CHANGES

.github/workflows/build.yml

Lines changed: 9 additions & 3 deletions
@@ -26,7 +26,7 @@ jobs:
       - name: Build
         run: |
           # add packages that are supposed to be built to this list
-          for package in fairchem-core fairchem-data-oc fairchem-demo-ocpapi fairchem-applications-cattsunami fairchem-data-omol fairchem-data-omat fairchem-lammps
+          for package in fairchem-core fairchem-data-oc fairchem-demo-ocpapi fairchem-applications-cattsunami fairchem-data-omol fairchem-data-omat fairchem-data-omc fairchem-lammps
           do
             pushd packages/$package
             hatch build
@@ -68,5 +68,11 @@ jobs:
       - name: Upload omat artifact
         uses: actions/upload-artifact@v5
         with:
-          name: dist-omat
-          path: dist-omat/*
+          name: dist-data-omat
+          path: dist-data-omat/*
+
+      - name: Upload omc artifact
+        uses: actions/upload-artifact@v5
+        with:
+          name: dist-data-omc
+          path: dist-data-omc/*
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: Release Drafter - fairchem-data-omc
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'src/fairchem/data/omc/**'
+      - 'packages/fairchem-data-omc/**'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  update_release_draft:
+    permissions:
+      # write permission is required to create a github release
+      contents: write
+      pull-requests: read
+    runs-on: ubuntu-latest
+    steps:
+      - uses: release-drafter/release-drafter@v6
+        with:
+          disable-autolabeler: true
+          config-name: release-drafter-data-omc.yml
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

.github/workflows/release.yml

Lines changed: 50 additions & 0 deletions
@@ -178,3 +178,53 @@ jobs:
           packages-dir: dist-lammps/
           skip-existing: true
           verbose: true
+
+  release-data-omat:
+    needs: [ build ]
+    runs-on: ubuntu-latest
+    if: |
+      ( github.event.inputs.release-pypi == 'true' && startsWith(github.ref_name, 'fairchem_data_omat-') ) || startsWith(github.event.release.tag_name, 'fairchem_data_omat-')
+
+    environment:
+      name: pypi
+      url: https://pypi.org/p/fairchem-data-omat/
+
+    permissions:
+      id-token: write
+
+    steps:
+      - uses: actions/download-artifact@v6
+        with:
+          name: dist-data-omat
+          path: dist-data-omat
+
+      - uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          verbose: true
+          packages-dir: dist-data-omat/
+          skip-existing: true
+
+  release-data-omc:
+    needs: [ build ]
+    runs-on: ubuntu-latest
+    if: |
+      ( github.event.inputs.release-pypi == 'true' && startsWith(github.ref_name, 'fairchem_data_omc-') ) || startsWith(github.event.release.tag_name, 'fairchem_data_omc-')
+
+    environment:
+      name: pypi
+      url: https://pypi.org/p/fairchem-data-omc/
+
+    permissions:
+      id-token: write
+
+    steps:
+      - uses: actions/download-artifact@v6
+        with:
+          name: dist-data-omc
+          path: dist-data-omc
+
+      - uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          verbose: true
+          packages-dir: dist-data-omc/
+          skip-existing: true

docs/_toc.yml

Lines changed: 2 additions & 0 deletions
@@ -51,10 +51,12 @@ parts:
     - file: catalysts/datasets/summary
       sections:
         - file: catalysts/datasets/oc20
+        - file: catalysts/datasets/oc20_mads
         - file: catalysts/datasets/oc22
         - file: catalysts/datasets/oc20dense
         - file: catalysts/datasets/oc20neb
         - file: catalysts/datasets/ocx24
+        - file: catalysts/datasets/oc25
     - file: catalysts/models
     - file: catalysts/examples_tutorials/summary
       sections:
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+
+# Open Catalyst 2020 Multi-Adsorbate (mAds) Dataset
+
+## Overview
+The OC20-mAds dataset is a training set expanding the original OC20 dataset to include multi-adsorbate and coverage effects on catalyst surfaces. Adsorbates are randomly sampled from the list of OC20 adsorbates, up to 5 maximum adsorbates. For a small fraction of the dataset, all adsorbates on the surface may be identical. OC20-mAds is introduced in the [UMA paper](https://arxiv.org/pdf/2506.23971).
+## File Contents and Download
+|Splits |Size | MD5 checksum (download link) |
+|--- |--- |--- |
+|Train | 21,804,758 | [6435960ba5ad1a7c949bd2f2b51825bc](https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20mAds/oc20_multiads_train.tar.gz) |
+
+The following metadata can be accessed in the respective `atoms.info` entry:
+
+- `bulk_id`: Bulk identifier
+- `millers`: 3-tuple of integers indicating the Miller indices of the surface.
+- `shift`: C-direction shift used to determine cutoff for the surface (c-direction is following the nomenclature from Pymatgen).
+- `top`: Boolean indicating whether the chosen surface was at the top or bottom of the originally enumerated surface.
+- `adsorbates`: List of adsorbates sampled and their respective placements.
+- `sid`: Unique system identifier.
+- `fid`: Frame index along the relaxation/AIMD trajectory.
+- `results_path`: Internal results location.
+- `fmax`: Max per-atom force.

src/fairchem/core/launchers/cluster/ray_cluster.py

Lines changed: 4 additions & 2 deletions
@@ -326,6 +326,7 @@ def __init__(
     def start_head(
         self,
         requirements: dict[str, int | str],
+        name: str = "default",
         executor: str = "slurm",
         payload: Optional[Callable[..., PayloadReturnT]] = None,
         **kwargs,
@@ -341,7 +342,7 @@ def start_head(
             cluster=executor,
         )
         s_executor.update_parameters(
-            name=f"ray_head_{self.state.cluster_id}", # TODO name should probably include more details (cluster_id)
+            name=f"ray_head_{name}_{self.state.cluster_id}", # TODO name should probably include more details (cluster_id)
             **requirements,
         )
         head_job = s_executor.submit(
@@ -360,6 +361,7 @@ def start_workers(
         self,
         num_workers: int,
         requirements: dict[str, int | str],
+        name: str = "default",
         executor: str = "slurm",
     ) -> list[str]:
         """
@@ -370,7 +372,7 @@ def start_workers(
         # start the workers
         s_executor = submitit.AutoExecutor(folder=str(self.log_dir), cluster=executor)
         s_executor.update_parameters(
-            name=f"ray_worker_{self.num_worker_groups}_{self.state.cluster_id}", # TODO name should probably include more details (cluster_id)
+            name=f"ray_worker_{name}_{self.num_worker_groups}_{self.state.cluster_id}", # TODO name should probably include more details (cluster_id)
             **requirements,
         )
 
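
The effect of the new `name` argument on the submitted Slurm job names can be illustrated with a small standalone sketch. The helper functions `head_job_name` and `worker_job_name` are hypothetical and are not part of fairchem; they only reproduce the f-strings shown in the diff above, and the run name and cluster id in the usage lines are made-up values.

```python
def head_job_name(name: str, cluster_id: str) -> str:
    # Hypothetical helper mirroring the new head-job f-string in the diff.
    return f"ray_head_{name}_{cluster_id}"


def worker_job_name(name: str, worker_group: int, cluster_id: str) -> str:
    # Hypothetical helper mirroring the new worker-job f-string in the diff.
    return f"ray_worker_{name}_{worker_group}_{cluster_id}"


if __name__ == "__main__":
    # Illustrative values only; in the diff the run name comes from config.job.run_name.
    print(head_job_name("uma_train", "abc123"))
    print(worker_job_name("uma_train", 0, "abc123"))
```

With the default `name="default"`, callers that do not pass a run name would still get a valid job name, just with `default` embedded in it.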

src/fairchem/core/launchers/ray_on_slurm_launch.py

Lines changed: 2 additions & 0 deletions
@@ -202,6 +202,7 @@ def ray_on_slurm_launch(config: DictConfig, log_dir: str):
 
     all_job_ids = []
     head_job_id = cluster.start_head(
+        name=config.job.run_name,
         requirements=cluster_reqs
         | {
             "nodes": 1,
@@ -220,6 +221,7 @@ def ray_on_slurm_launch(config: DictConfig, log_dir: str):
     if worker_nodes > 0:
         worker_ids = cluster.start_workers(
             1,
+            name=config.job.run_name,
             requirements=cluster_reqs
             | {
                 "nodes": worker_nodes,

src/fairchem/core/models/uma/escn_md.py

Lines changed: 53 additions & 4 deletions
@@ -48,7 +48,51 @@
     from fairchem.core.datasets.atomic_data import AtomicData
 
 
-ESCNMD_DEFAULT_EDGE_CHUNK_SIZE = 1024 * 128
+ESCNMD_DEFAULT_EDGE_ACTIVATION_CHECKPOINT_CHUNK_SIZE = 1024 * 128
+
+
+def add_n_empty_edges(graph_dict: dict, edges_to_add: int, cutoff: float):
+    graph_dict["edge_index"] = torch.cat(
+        (
+            graph_dict["edge_index"].new_ones(2, edges_to_add)
+            * graph_dict["node_offset"],
+            graph_dict["edge_index"],
+        ),
+        dim=1,
+    )
+
+    self_edge_distance_vec = graph_dict["edge_distance_vec"].new_ones(1, 3) + cutoff
+    graph_dict["edge_distance_vec"] = torch.cat(
+        (
+            self_edge_distance_vec.expand(edges_to_add, 3),
+            graph_dict["edge_distance_vec"],
+        ),
+        dim=0,
+    )
+
+    edge_distance = torch.linalg.norm(self_edge_distance_vec, dim=-1, keepdim=False)
+    graph_dict["edge_distance"] = torch.cat(
+        (edge_distance.expand(edges_to_add), graph_dict["edge_distance"]), dim=0
+    )
+
+
+@torch.compiler.disable
+def pad_edges(graph_dict, edge_chunk_size: int, cutoff: float):
+    n_edges = n_edges_post = graph_dict["edge_index"].shape[1]
+
+    if edge_chunk_size > 0 and n_edges_post % edge_chunk_size != 0:
+        # make sure we have a multiple of self.edge_chunk_size edges
+        n_edges_post += edge_chunk_size - n_edges_post % edge_chunk_size
+
+    n_edges_post = max(n_edges_post, 1) # at least 1 edge to avoid empty "edge" case
+    if n_edges_post > n_edges:
+        # We append synthetic padding edges whose distance vector has norm > cutoff
+        # (see add_n_empty_edges where distance_vec is set to 1+cutoff). The radial
+        # polynomial envelope returns 0 for distances >= cutoff, so these edges never
+        # contribute to embeddings or message passing; they only ensure the edge count
+        # is a multiple of edge_chunk_size (or at least one edge), aiding chunked
+        # activation checkpointing and avoiding empty tensor edge cases.
+        add_n_empty_edges(graph_dict, n_edges_post - n_edges, cutoff)
 
 
 @registry.register_model("escnmd_backbone")
@@ -88,6 +132,7 @@ def __init__(
         use_cuda_graph_wigner: bool = False,
         radius_pbc_version: int = 1,
         always_use_pbc: bool = True,
+        edge_chunk_size: int | None = None,
     ) -> None:
         super().__init__()
         self.max_num_elements = max_num_elements
@@ -116,7 +161,10 @@ def __init__(
         activation_checkpoint_chunk_size = None
         if activation_checkpointing:
             # The size of edge blocks to use in activation checkpointing
-            activation_checkpoint_chunk_size = ESCNMD_DEFAULT_EDGE_CHUNK_SIZE
+            activation_checkpoint_chunk_size = (
+                ESCNMD_DEFAULT_EDGE_ACTIVATION_CHECKPOINT_CHUNK_SIZE
+            )
+        self.edge_chunk_size = edge_chunk_size
 
         # related to charge spin dataset system embedding
         self.chg_spin_emb_type = chg_spin_emb_type
@@ -401,6 +449,9 @@ def _generate_graph(self, data_dict):
             ]
             data_dict["batch"] = data_dict["batch_full"][graph_dict["node_partition"]]
 
+        if self.edge_chunk_size is not None:
+            pad_edges(graph_dict, self.edge_chunk_size, self.cutoff)
+
         return graph_dict
 
     @conditional_grad(torch.enable_grad())
@@ -533,7 +584,6 @@ def _init_gp_partitions(self, graph_dict, atomic_numbers_full):
             torch.arange(len(atomic_numbers_full)).to(atomic_numbers_full.device),
             gp_utils.get_gp_world_size(),
         )[gp_utils.get_gp_rank()]
-
         assert (
             node_partition.numel() > 0
         ), "Looks like there is no atoms in this graph paralell partition. Cannot proceed"
@@ -551,7 +601,6 @@ def _init_gp_partitions(self, graph_dict, atomic_numbers_full):
         graph_dict["edge_distance_vec"] = graph_dict["edge_distance_vec"][
             edge_partition
         ]
-
         return graph_dict
 
     @property
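
The padding arithmetic in `pad_edges` can be checked without PyTorch. The following is a minimal sketch, not the fairchem implementation: `padded_edge_count` and `padding_edge_norm` are hypothetical helper names that mirror the integer logic of `pad_edges` and the distance assigned in `add_n_empty_edges` (every component of the padding vector is `1 + cutoff`, so its norm is `sqrt(3) * (1 + cutoff)`). The cutoff value in the usage lines is an assumed example.

```python
import math


def padded_edge_count(n_edges: int, edge_chunk_size: int) -> int:
    # Round the edge count up to the next multiple of edge_chunk_size
    # (when edge_chunk_size > 0), then enforce a minimum of one edge.
    n_edges_post = n_edges
    if edge_chunk_size > 0 and n_edges_post % edge_chunk_size != 0:
        n_edges_post += edge_chunk_size - n_edges_post % edge_chunk_size
    return max(n_edges_post, 1)


def padding_edge_norm(cutoff: float) -> float:
    # Norm of a 3-vector whose components all equal 1 + cutoff.
    return math.sqrt(3.0) * (1.0 + cutoff)


if __name__ == "__main__":
    for n_edges, chunk in [(5, 4), (8, 4), (0, 4), (3, 0)]:
        print(n_edges, chunk, padded_edge_count(n_edges, chunk))
    cutoff = 6.0  # assumed example value, not taken from the source
    print(padding_edge_norm(cutoff) > cutoff)
```

One consequence visible in this sketch: with zero real edges the result is a single padding edge, which is not a multiple of `edge_chunk_size` when the chunk size is larger than 1, because the minimum is applied after the rounding step.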

src/fairchem/core/units/mlip_unit/api/inference.py

Lines changed: 2 additions & 0 deletions
@@ -87,6 +87,8 @@ class InferenceSettings:
     # Number of internal torch threads to use for inference
     torch_num_threads: int | None = None
 
+    edge_chunk_size: int | None = None
+
 
 # this is most general setting that works for most systems and models,
 # not optimized for speed
