Commit e2ce0cf

Make checkpoint saving folder clear in the config (meta-pytorch#444)
1 parent 9499c29 commit e2ce0cf

5 files changed: +15 -10 lines changed

apps/grpo/qwen3_1_7b.yaml

Lines changed: 3 additions & 2 deletions
@@ -74,8 +74,9 @@ trainer:
     disable_loss_parallel: true
   checkpoint:
     enable: true
-    initial_load_path: hf://${model}
-    initial_load_in_hf: true
+    folder: ./checkpoint # The folder to save checkpoints to.
+    initial_load_path: hf://${model} # The path to load the initial checkpoint from. Ignored if `folder` exists.
+    initial_load_in_hf: true # If true, interpret initial_load_path as a HuggingFace model repo
     last_save_in_hf: true
     interval: 500
     async_mode: "disabled"
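The new comments describe a load precedence: an existing `folder` wins, otherwise weights come from `initial_load_path`, read as a HuggingFace repo when `initial_load_in_hf` is true. A minimal Python sketch of that rule follows. It assumes `folder` is a plain local path (the real trainer may resolve it relative to another output directory) and that `hf://` is only a prefix marking a repo id. `resolve_initial_load` is a hypothetical helper, not the trainer's actual code, and `Qwen/Qwen3-1.7B` is just an example value for `${model}`.

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_initial_load(
    folder: str,
    initial_load_path: Optional[str],
    initial_load_in_hf: bool,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: decide where a run loads its weights from.

    Follows the precedence stated in the config comments.
    Returns (source_kind, location).
    """
    if os.path.isdir(folder):
        # Resume: the save folder already exists, so initial_load_path is ignored.
        return ("resume", folder)
    if initial_load_path is None:
        # Nothing configured to load from.
        return ("none", None)
    if initial_load_in_hf:
        # Assumption: "hf://" only marks a HuggingFace repo id, so strip it.
        prefix = "hf://"
        if initial_load_path.startswith(prefix):
            return ("hf", initial_load_path[len(prefix):])
        return ("hf", initial_load_path)
    # Otherwise treat the path as a checkpoint in the trainer's own format.
    return ("native", initial_load_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "checkpoint")
        # First run: no save folder yet, so the HF repo is used.
        print(resolve_initial_load(ckpt, "hf://Qwen/Qwen3-1.7B", True))  # ('hf', 'Qwen/Qwen3-1.7B')
        os.makedirs(ckpt)
        # Later run: the save folder exists, so it takes precedence.
        print(resolve_initial_load(ckpt, "hf://Qwen/Qwen3-1.7B", True))
```

Giving `folder` precedence is what lets a restarted job pick up its own saves instead of silently starting over from the base model.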

apps/grpo/qwen3_32b.yaml

Lines changed: 3 additions & 2 deletions
@@ -77,8 +77,9 @@ trainer:
     disable_loss_parallel: true
   checkpoint:
     enable: true
-    initial_load_path: hf://${model}
-    initial_load_in_hf: true
+    folder: ./checkpoint # The folder to save checkpoints to.
+    initial_load_path: hf://${model} # The path to load the initial checkpoint from. Ignored if `folder` exists.
+    initial_load_in_hf: true # If true, interpret initial_load_path as a HuggingFace model repo
     last_save_in_hf: true
     interval: 500
     async_mode: "disabled"

apps/grpo/qwen3_8b.yaml

Lines changed: 3 additions & 2 deletions
@@ -70,8 +70,9 @@ trainer:
     disable_loss_parallel: true
   checkpoint:
     enable: true
-    initial_load_path: hf://${model}
-    initial_load_in_hf: true
+    folder: ./checkpoint # The folder to save checkpoints to.
+    initial_load_path: hf://${model} # The path to load the initial checkpoint from. Ignored if `folder` exists.
+    initial_load_in_hf: true # If true, interpret initial_load_path as a HuggingFace model repo
     last_save_in_hf: true
     interval: 500
     async_mode: "disabled"

apps/sft/llama3_8b.yaml

Lines changed: 3 additions & 2 deletions
@@ -45,8 +45,9 @@ parallelism:
 
 checkpoint:
   enable: true
-  initial_load_path: hf://${model_name}
-  initial_load_in_hf: true
+  folder: ./checkpoint # The folder to save checkpoints to.
+  initial_load_path: hf://${model} # The path to load the initial checkpoint from. Ignored if `folder` exists.
+  initial_load_in_hf: true # If true, interpret initial_load_path as a HuggingFace model repo
   last_save_in_hf: true
   interval: 500
   async_mode: "disabled"

apps/sft/qwen3_8b.yaml

Lines changed: 3 additions & 2 deletions
@@ -44,8 +44,9 @@ parallelism:
 
 checkpoint:
   enable: true
-  initial_load_path: hf://${model_name}
-  initial_load_in_hf: true
+  folder: ./checkpoint # The folder to save checkpoints to.
+  initial_load_path: hf://${model} # The path to load the initial checkpoint from. Ignored if `folder` exists.
+  initial_load_in_hf: true # If true, interpret initial_load_path as a HuggingFace model repo
   last_save_in_hf: true
   interval: 500
   async_mode: "disabled"
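In both SFT configs this commit also changes the reference inside `initial_load_path` from `${model_name}` to `${model}`. The `${...}` syntax looks like OmegaConf-style interpolation (an assumption; the config loader is not part of this diff), which resolves only if the referenced key is defined in the config. Python's stdlib `string.Template` uses the same `${name}` placeholder form and raises `KeyError` for an undefined name, so it serves as a rough analog. `resolve_refs` and the example values are hypothetical.

```python
from string import Template
from typing import Mapping


def resolve_refs(value: str, top_level: Mapping[str, str]) -> str:
    """Stdlib analog of `${key}` interpolation against top-level config keys.

    Raises KeyError when the referenced key is not defined, much as an
    unresolvable interpolation would fail when the config is loaded.
    """
    return Template(value).substitute(top_level)


if __name__ == "__main__":
    # Hypothetical top-level keys of an SFT config.
    config_vars = {"model": "example-org/example-model"}
    print(resolve_refs("hf://${model}", config_vars))  # hf://example-org/example-model
    try:
        resolve_refs("hf://${model_name}", config_vars)
    except KeyError as err:
        print("unresolved reference:", err)  # unresolved reference: 'model_name'
```

Under this analogy, the renamed reference only works if each SFT config defines a top-level `model` key.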
