
Commit 6beaa25

committed changes
1 parent b3d76cc commit 6beaa25

2 files changed: 18 additions & 6 deletions


Lines changed: 9 additions & 1 deletion
@@ -1,3 +1,11 @@
 Run a series of experiments sequentially as an autonomous machine learning researcher. Start with learning rates of 1, then 0.5, then 0.1, and so on. The idea is to find the largest learning rate that doesn't lead to wild oscillations in validation loss. So keep watching the Trackio Alerts. If you see instability, then just terminate the job and lower the learning rate, and keep going until you have stable training.
 
-Run the train_nanogpt.py script using Hugging Face Jobs, using my locally logged in Hugging Face token.
+Run the train_nanogpt.py script using Hugging Face Jobs, using my locally logged in Hugging Face token, like this:
+
+hf jobs uv run \
+    --flavor a100-large \
+    --timeout 10m \
+    --secrets HF_TOKEN \
+    --with torch \
+    --with numpy \
+    train_nanogpt.py
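The search loop the prompt describes (try descending learning rates, terminate any run whose validation loss oscillates, stop at the first stable one) can be sketched in Python. This is a minimal illustration, not part of the commit; `run_is_stable` is a hypothetical stand-in for launching a Jobs run and watching the Trackio alerts:

```python
def find_stable_lr(candidates, run_is_stable):
    """Return the largest learning rate in `candidates` that trains stably.

    candidates: learning rates in descending order (e.g. 1, 0.5, 0.1, ...).
    run_is_stable: callable taking a learning rate and returning True when
        the run's validation loss stayed stable (no wild oscillations).
    Returns None if every candidate is unstable.
    """
    for lr in candidates:
        if run_is_stable(lr):
            return lr
        # Instability observed: the real workflow would terminate the
        # Jobs run here, then retry with the next (lower) rate.
    return None


# Toy stand-in for a real run: pretend rates above 0.5 oscillate.
print(find_stable_lr([1.0, 0.5, 0.1], lambda lr: lr <= 0.5))  # → 0.5
```

Because the candidates are tried largest-first, the first stable run is also the largest stable rate, which is exactly the stopping condition the prompt asks for.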

autonomous-experiments/01_finding_the_best_learning_rate/train_nanogpt.py

Lines changed: 9 additions & 5 deletions
@@ -22,11 +22,15 @@
 Data: FineWeb (pre-tokenized with GPT-2 tokenizer, auto-downloaded from HF Hub).
 Downloads ~1.8GB for 9 training shards (~900M tokens) + validation.
 
-Examples:
-    python train_nanogpt.py                    # default: Muon + compile
-    python train_nanogpt.py --optimizer adamw  # compare with AdamW
-    python train_nanogpt.py --max_steps 10000  # train longer
-    python train_nanogpt.py --batch_size 32 --no_compile  # debug without compile
+Run with Hugging Face Jobs like this:
+
+hf jobs uv run \
+    --flavor a100-large \
+    --timeout 10m \
+    --secrets HF_TOKEN \
+    --with torch \
+    --with numpy \
+    train_nanogpt.py
 """
 
 import glob
