Implement training and evaluation orchestration for AlphaFold2
- Added a test for the training/evaluation metrics preference when both loaders are present.
- Introduced training utilities for optimization, metrics, checkpointing, and runtime in `training/__init__.py`.
- Developed checkpoint save and restore utilities in `training/checkpoints.py`, including model serialization and state tracking.
- Created efficient metrics computation for RMSD, TM-score, and GDT-TS in `training/efficient_metrics.py`.
- Implemented evaluation utilities for AlphaFold2-like runs in `training/eval_one_epoch.py`.
- Added parallel training helpers for data, model, and hybrid execution modes in `training/train_parallel/__init__.py`.
- Developed distributed and data-parallel helpers for multi-GPU training in `training/train_parallel/data_parallel.py`.
- Created two-stage model-parallel wrappers for the AlphaFold2 model in `training/train_parallel/model_parallel.py`.
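The metrics listed above (RMSD, TM-score, GDT-TS) have well-known closed forms. The repository's `training/efficient_metrics.py` is described as a batched, efficient implementation; the NumPy sketch below only illustrates the underlying math for two of them (Kabsch-aligned RMSD, and a simplified GDT-TS that assumes the structures are already superposed), and is not the repository's actual API.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)                      # center both point clouds
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)           # SVD of the 3x3 covariance
    d = np.sign(np.linalg.det(U @ Vt))          # reflection correction
    R = U @ np.diag([1.0, 1.0, d]) @ Vt         # optimal rotation (Kabsch)
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

def gdt_ts(P: np.ndarray, Q: np.ndarray) -> float:
    """Simplified GDT-TS on already-superposed coordinates: mean fraction of
    atoms within the standard 1/2/4/8 Angstrom cutoffs of the reference."""
    dist = np.linalg.norm(P - Q, axis=1)
    return float(np.mean([(dist <= c).mean() for c in (1.0, 2.0, 4.0, 8.0)]))
```

A rigid rotation plus translation of a structure should leave its Kabsch RMSD at numerical zero, which is a convenient sanity check for any implementation.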
@@ -70,7 +71,7 @@ To make experimentation easier to reproduce, the repository follows a **manifest
 - **Config-Driven Experiments:** Main settings such as model size, depth, learning rate, and EMA can be adjusted through YAML files.
 - **Feature-Rich Loader:** The current dataloader returns sequence/MSA tensors plus `extra_msa_feat`, `extra_msa_mask`, `template_angle_feat`, `template_pair_feat`, and `template_mask` when those artifacts are present in the Foldbench assets.
 - **Data Inspection Utilities:** Provides simple CLI tools to inspect manifests, preview A3M files, and visualize CA distance maps before training.
-- **Notebook-Friendly Workflow:** The main walkthrough notebook is [Alpha_Fold_English.ipynb](notebooks/Alpha_Fold_English.ipynb), and a local training-focused version is available in [notebooks\train_model_setup_examples.ipynb](notebooks/train_model_local.ipynb).
+- **Notebook-Friendly Workflow:** The main walkthrough notebook is [Alpha_Fold_English.ipynb](notebooks/Alpha_Fold_English.ipynb), and a local training-focused walkthrough is available in [train_model_setup_examples.ipynb](notebooks/train_model_setup_examples.ipynb).
 
 ---
 
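The loader contract in the feature list above can be made concrete with a dummy batch. The key names are taken from the README; the trailing feature dimensions (25/57/88) follow common AlphaFold2 conventions and are assumptions here, not a statement about this repository's exact tensor shapes.

```python
import numpy as np

# Dummy batch illustrating the extra-MSA and template features the loader
# is documented to return. Shapes are illustrative AlphaFold2-style
# conventions, not the repository's guaranteed layout.
n_res, n_extra, n_templ = 64, 128, 4
batch = {
    "extra_msa_feat": np.zeros((n_extra, n_res, 25), dtype=np.float32),
    "extra_msa_mask": np.ones((n_extra, n_res), dtype=np.float32),
    "template_angle_feat": np.zeros((n_templ, n_res, 57), dtype=np.float32),
    "template_pair_feat": np.zeros((n_templ, n_res, n_res, 88), dtype=np.float32),
    "template_mask": np.ones((n_templ,), dtype=np.float32),
}
for name, tensor in batch.items():
    print(name, tensor.shape)
```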
@@ -84,7 +85,8 @@ To make experimentation easier to reproduce, the repository follows a **manifest
 ├── data/ # manifest-based data pipeline plus a tiny bundled showcase subset
 │ ├── download_data.sh
 │ ├── foldbench.py
-│ ├── preproces_data.py
+│ ├── preprocess_data.py
+│ ├── loader_wrappers.py
 │ ├── dataloaders.py
 │ ├── collate_proteins.py
 │ ├── visualize_data.py
@@ -94,7 +96,7 @@ To make experimentation easier to reproduce, the repository follows a **manifest
 │ └── losses/
 ├── training/ # single-device training loop, ablation registry, AMP, EMA, checkpoints, and metrics
 │ ├── ablations/ # predefined architecture and loss ablation presets
-│ └── train_paralel/ # DDP and model-parallel helpers
+│ └── train_parallel/ # DDP and model-parallel helpers
 ├── scripts/ # operational CLIs for data prep, validation, and training
 │ ├── prepare_data.py
 │ ├── inspect_data.py
@@ -108,6 +110,7 @@ To make experimentation easier to reproduce, the repository follows a **manifest
 ├── notebooks/ # interactive experiments for Colab or local exploration
 ├── paper/ # reference material from the AlphaFold paper and notes
 ├── assets/ # README visuals and showcase media
+├── pyproject.toml
 ├── requirements.txt
 ├── Dockerfile
 └── README.md
@@ -116,7 +119,8 @@ To make experimentation easier to reproduce, the repository follows a **manifest
 ### Key files
 
 - [data/download_data.sh](data/download_data.sh) — downloads the Foldbench subset from a target list or CSV input.
-- [data/preproces_data.py](data/preproces_data.py) — rebuilds manifests, normalizes local paths, and emits YAML summaries.
+- [data/preprocess_data.py](data/preprocess_data.py) — rebuilds manifests, normalizes local paths, and emits YAML summaries.
+- [data/loader_wrappers.py](data/loader_wrappers.py) — convenience builders for plain dataloaders and deterministic train/eval splits over one dataset.
 - [data/dataloaders.py](data/dataloaders.py) — dataset layer that maps manifests, mmCIF structures, MSA files, and torsion targets into tensors.
 - [scripts/prepare_data.py](scripts/prepare_data.py) — high-level CLI for downloading data, refreshing manifests, and smoke-testing loaders.
 - [model/alphafold2.py](model/alphafold2.py) — top-level AlphaFold2-like model that wires embeddings, Evoformer, structure, recycling, and heads.
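One detail worth unpacking from the key files: `data/loader_wrappers.py` is described as providing deterministic train/eval splits over one dataset. A common pattern for that is a seeded index permutation, sketched below; the function name and signature are illustrative, not the repository's actual API.

```python
import numpy as np

def deterministic_split(n_items: int, eval_fraction: float = 0.1, seed: int = 0):
    """Split indices 0..n_items-1 into (train, eval) index arrays.

    A fixed seed makes the split reproducible across runs, so the same
    proteins always land in the evaluation set. Names are hypothetical.
    """
    rng = np.random.default_rng(seed)       # fixed seed => identical permutation
    order = rng.permutation(n_items)
    n_eval = max(1, int(round(n_items * eval_fraction)))
    return np.sort(order[n_eval:]), np.sort(order[:n_eval])

# Example: an 80/20 split over 100 manifest entries.
train_idx, eval_idx = deterministic_split(100, eval_fraction=0.2, seed=42)
```

Sorting the returned indices keeps dataset access in manifest order within each split, which tends to play nicely with sequential file readers.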
@@ -125,11 +129,12 @@ To make experimentation easier to reproduce, the repository follows a **manifest
 - [model/alphafold2_full_loss.py](model/alphafold2_full_loss.py) — full training loss orchestrator combining FAPE, distogram, pLDDT, and torsion supervision.
 - [model/losses/](model/losses/) — component losses and helpers for geometry-aware supervision.
 - [training/train_one_epoch.py](training/train_one_epoch.py) — per-epoch optimization routine with AMP, recycling, logging, and metric collection.
+- [training/eval_one_epoch.py](training/eval_one_epoch.py) — evaluation loop that mirrors training-time logging without optimizer steps.
 - [training/train_alphafold2.py](training/train_alphafold2.py) — full training orchestrator for checkpointing, resume, monitoring, and epoch scheduling.
 - [training/ablations/catalog.py](training/ablations/catalog.py) — registry of prebuilt architecture and loss ablations resolved on top of a base experiment config.
 - [training/ablations/runtime.py](training/ablations/runtime.py) — resolves baseline or named ablations into a safe config variant without changing the default training path.
-- [training/train_paralel/data_parallel.py](training/train_paralel/data_parallel.py) — DDP utilities, distributed samplers, and rank synchronization helpers.
-- [training/train_paralel/model_parallel.py](training/train_paralel/model_parallel.py) — two-stage model-parallel wrapper for splitting AlphaFold2 across GPUs.
+- [training/train_parallel/data_parallel.py](training/train_parallel/data_parallel.py) — DDP utilities, distributed samplers, and rank synchronization helpers.
+- [training/train_parallel/model_parallel.py](training/train_parallel/model_parallel.py) — two-stage model-parallel wrapper for splitting AlphaFold2 across GPUs.
 - [scripts/train_model.py](scripts/train_model.py) — standard config-driven single-device training launcher.
 - [scripts/train_parallel.py](scripts/train_parallel.py) — multi-GPU launcher for DDP, model parallelism, and hybrid setups.
 - [scripts/train_ablation.py](scripts/train_ablation.py) — single-device launcher for named architecture and loss ablations.
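On the checkpointing side referenced in the key files, a robust save utility typically writes atomically so a crash mid-write never leaves a truncated checkpoint. The sketch below shows that pattern with stdlib `pickle`; the repository's `training/checkpoints.py` is torch-based, so treat the names and format here as hypothetical.

```python
import os
import pickle
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    """Write `state` atomically: dump to a temp file in the same directory,
    then rename over `path` (os.replace is atomic on POSIX and Windows)."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)               # old checkpoint survives any crash before this
    except BaseException:
        os.unlink(tmp)                      # clean up the partial temp file
        raise

def load_checkpoint(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)
```

The same shape carries over directly to `torch.save`/`torch.load` with a state dict holding model weights, optimizer state, and the epoch counter.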
@@ -150,6 +155,9 @@ The repository includes a tiny downloaded test subset under [data/af_subset_show
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
+
+# Editable install with package metadata and CLI entry points
-The full notebook [notebooks/train_model_local.ipynb](notebooks/train_model_local.ipynb) exposes many knobs, but the smallest useful training setup looks like this:
+The full notebook [notebooks/train_model_setup_examples.ipynb](notebooks/train_model_setup_examples.ipynb) exposes many knobs, but the smallest useful training setup looks like this:
 
 ```python
 import torch
@@ -443,7 +451,7 @@ Low-VRAM preset for Colab-class GPUs in the `15-20 GB` range, using a reduced tr
 
 This file is a **reference document**, not a statement that the current code already consumes every field end-to-end.
 
-Its role is to provide a structured target for future extension and to document the broader AlphaFold/OpenFold design space.
+Its role is to provide a structured target for future extension and to document the broader AlphaFold/OpenFold design space. It also includes a `current_repo_alignment` section that maps the nested reference schema to the flat config fields consumed by the current codebase.