Commit 80d7345

Add Autoresearch + Trackio research example (#449)
1 parent 4681292 commit 80d7345

3 files changed

Lines changed: 1345 additions & 0 deletions

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
Run a series of autonomous ML experiments using the autoresearch training script (adapted from https://github.com/karpathy/autoresearch). The script trains a small GPT model for a fixed 5-minute wall-clock budget and reports val_bpb (validation bits per byte; lower is better).
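
For reference, bits per byte is the summed negative log-likelihood over the validation text, converted from nats to bits and divided by the number of bytes in that text. The sketch below shows only that conversion; the function name and arguments are illustrative and are not taken from train_autoresearch.py, which may compute the metric differently.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte.

    Illustrative helper, not the training script's own code.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by the raw text length instead of the token count.
    return total_nll_nats / (math.log(2) * total_bytes)


# Example: 1,000 tokens at 3.0 nats each, covering 4,200 bytes of text.
example = bits_per_byte(total_nll_nats=3.0 * 1000, total_bytes=4200)
print(f"val_bpb: {example:.4f}")
```

Because the metric is normalized per byte, it stays comparable across runs even if an experiment changes the tokenizer or vocabulary size.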

The script has Trackio integrated for real-time monitoring and alerting. If the key metrics start behaving badly (loss spike, NaN, stagnation), a Trackio alert fires and the run terminates early to save compute, so a run may finish in under 5 minutes. Watch the alerts: they tell you which experiments failed and why.
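
To make the three failure modes concrete, here is a minimal sketch of the kind of check that could sit behind such alerts. It is plain Python and does not call Trackio; the function name, thresholds, and window size are assumptions chosen for illustration, and the checks in train_autoresearch.py may differ.

```python
import math
from typing import Optional, Sequence


def check_loss(
    history: Sequence[float],
    spike_factor: float = 2.0,  # assumed threshold, not from the script
    window: int = 5,            # assumed window size, not from the script
    min_delta: float = 1e-3,    # assumed minimum improvement per window
) -> Optional[str]:
    """Return a reason string if the latest loss looks bad, else None."""
    if not history:
        return None
    latest = history[-1]

    # NaN or infinite loss: training has diverged.
    if math.isnan(latest) or math.isinf(latest):
        return "non-finite loss"

    # Spike: the latest loss is far above the recent average.
    previous = history[:-1][-window:]
    if previous:
        baseline = sum(previous) / len(previous)
        if latest > spike_factor * baseline:
            return "loss spike"

    # Stagnation: the best loss in the last window barely beats the
    # best loss in the window before it.
    if len(history) >= 2 * window:
        older = min(history[-2 * window:-window])
        recent = min(history[-window:])
        if older - recent < min_delta:
            return "stagnation"
    return None


print(check_loss([3.0, 2.9, 2.8, 9.0]))
```

In a training loop, a non-None result would be the point at which to raise an alert and stop the run.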

Run the train_autoresearch.py script on Hugging Face Jobs, using my locally logged-in Hugging Face token, like this:

hf jobs uv run \
    --flavor a100-large \
    --timeout 10m \
    --secrets HF_TOKEN \
    --with 'torch>=2.9' \
    --with 'kernels>=0.11.7' \
    --with pyarrow \
    --with rustbpe \
    --with tiktoken \
    --with trackio \
    --with requests \
    train_autoresearch.py
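
If you submit many jobs, it can help to assemble the command above programmatically so the flags stay identical between experiments. The sketch below only builds and prints the argument list; `build_job_command` and its `script_args` parameter are hypothetical helpers, and it assumes that anything placed after the script path is passed through to the script.

```python
import shlex

# Dependencies copied from the command above.
DEPENDENCIES = [
    "torch>=2.9",
    "kernels>=0.11.7",
    "pyarrow",
    "rustbpe",
    "tiktoken",
    "trackio",
    "requests",
]


def build_job_command(script="train_autoresearch.py", script_args=()):
    """Return the `hf jobs uv run` invocation as an argv list (hypothetical helper)."""
    argv = [
        "hf", "jobs", "uv", "run",
        "--flavor", "a100-large",
        "--timeout", "10m",
        "--secrets", "HF_TOKEN",
    ]
    for dep in DEPENDENCIES:
        argv += ["--with", dep]
    argv.append(script)
    # Assumption: arguments after the script path are forwarded to the script.
    argv += list(script_args)
    return argv


# Print a shell-safe version of the command for inspection.
print(shlex.join(build_job_command()))
```

To actually submit the job, the same list can be handed to `subprocess.run(argv, check=True)`.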

The first run should be the baseline: run the script as-is, without modifications.

After the baseline, start experimenting: change hyperparameters (learning rates, batch size, weight decay, depth, etc.) in the HYPERPARAMETERS section of the script, or change the model architecture. Edit the file, then submit a new HF Job for each experiment.

For each experiment:
1. Edit train_autoresearch.py (only the HYPERPARAMETERS section or model architecture)
2. Submit the job with a descriptive --run-name (e.g. `--run-name "depth-12"`)
3. Check the output for val_bpb or early termination alerts
4. If val_bpb improved (lower), keep the change. If not, revert.
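
The keep-or-revert rule in step 4 can be sketched as a small function. This is an illustration under assumptions: it treats a run with no val_bpb (for example, one that terminated early) as a failure, and the run names and numbers in the example are made up, not real results.

```python
from typing import Optional


def decide(best_bpb: float, new_bpb: Optional[float]) -> str:
    """Return "keep" if the experiment beat the best val_bpb so far, else "revert"."""
    # Assumption: a run that produced no val_bpb counts as a failed experiment.
    if new_bpb is None:
        return "revert"
    return "keep" if new_bpb < best_bpb else "revert"


# Made-up numbers, for illustration only.
best = 1.00  # hypothetical baseline val_bpb
experiments = [("depth-12", 0.98), ("lr-high", None), ("wd-0.2", 1.01)]
for run_name, bpb in experiments:
    action = decide(best, bpb)
    if action == "keep":
        best = bpb  # the kept change becomes the new reference point
    print(run_name, action)
```

Comparing against the best result so far, rather than always against the baseline, means each kept change raises the bar for the next experiment.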

Key metrics to watch in the output:
- `val_bpb:` is the main metric (lower is better)
- `TERMINATED EARLY` means Trackio detected bad metrics and killed the run
- Trackio alerts print to stdout, so they appear in the job logs
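
The two markers above can be pulled out of a job log with a few lines of Python. This sketch assumes the log prints the metric as `val_bpb:` followed by a number and prints the literal text `TERMINATED EARLY`; the exact log layout is an assumption, and `summarize_log` is a hypothetical helper.

```python
import re

# Assumed format: "val_bpb:" followed by a plain or scientific-notation number.
VAL_BPB_RE = re.compile(r"val_bpb:\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def summarize_log(log_text: str) -> dict:
    """Extract the last reported val_bpb and the early-termination flag."""
    matches = VAL_BPB_RE.findall(log_text)
    return {
        # Use the last match in case the metric is printed more than once.
        "val_bpb": float(matches[-1]) if matches else None,
        "terminated_early": "TERMINATED EARLY" in log_text,
    }


# Made-up log text, for illustration only.
sample = "step 100 loss 3.21\nval_bpb: 1.2345\n"
print(summarize_log(sample))
```

A result with `val_bpb` set to None and `terminated_early` set to True indicates a failed experiment, and the alert lines in the log should explain why.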