Description
Source: https://x.com/kuchaev/status/1903118540519153724?s=46&t=cTanq0q3I5HBE3Uj2hYikw
This dataset has 15M samples and supports improvements in math, code, general reasoning, and instruction-following capabilities. HF Link: https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1
What has been done:
Infrastructure
[✔] Changed the marin tokenizer to map <|start_think|> and <|end_think|> to reserved tokens
[✔] Changed the adapter configuration to accept keyword replacement. This allows us to standardize different keywords to our marin standard, e.g. <think> --> <|start_think|> or <begin_think> --> <|start_think|> (see the sketch after this list).
[✔] Merged the latest executor and updated the functions.
[✔] Fixed lm-eval (vllm) script
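As a rough, non-authoritative sketch of the keyword standardization above: the mapping and function names below are hypothetical (not the actual marin adapter API), and the closing-tag entries are assumed analogues of the start-tag mapping.

```python
import re

# Hypothetical mapping from dataset-specific reasoning delimiters to the
# marin-standard reserved tokens (illustrative; the real mapping lives in the adapter config).
THINK_KEYWORD_MAP = {
    "<think>": "<|start_think|>",
    "</think>": "<|end_think|>",
    "<begin_think>": "<|start_think|>",
    "<end_think>": "<|end_think|>",
}

def standardize_think_tokens(text: str, keyword_map: dict = THINK_KEYWORD_MAP) -> str:
    """Replace source-specific thinking delimiters with the marin reserved tokens."""
    pattern = re.compile("|".join(re.escape(k) for k in keyword_map))
    return pattern.sub(lambda m: keyword_map[m.group(0)], text)

# Example: an OpenThoughts-style response is rewritten to the marin standard.
print(standardize_think_tokens("<think>reason step by step</think> final answer"))
# -> "<|start_think|>reason step by step<|end_think|> final answer"

# Sanity check for the tokenizer change (path is a placeholder): after remapping,
# each reserved token should encode to a single token id.
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("<marin-tokenizer-path>")
# assert len(tok.encode("<|start_think|>", add_special_tokens=False)) == 1
```

A single compiled alternation keeps the replacement to one pass over each string, which matters when the same adapter is reused across multi-million-row datasets.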
Datasets
[✔] Converted and tokenized nvidia/Llama-Nemotron-Post-Training-Dataset-v1-SFT. This differs from the previous implementation in that we do not filter any data (a conversion sketch follows this list).
[✔] Converted and tokenized open-thoughts/OpenThoughts3-1.2M
[✔] Added download and adapter scripts for nvidia/Nemotron-Post-Training-Dataset-v2-SFT
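For context on the conversion step, here is a minimal sketch of mapping one of these SFT datasets to a messages-style record, reusing the hypothetical standardize_think_tokens above. The column names ("input"/"output") and the split/config arguments are assumptions and may need adjusting to the actual dataset schema.

```python
from datasets import load_dataset

def to_chat_example(row: dict) -> dict:
    """Map a raw SFT row to a messages-style record with standardized think tokens."""
    return {
        "messages": [
            {"role": "user", "content": row["input"]},          # assumed column name
            {"role": "assistant", "content": standardize_think_tokens(row["output"])},  # assumed column name
        ]
    }

# Streaming keeps memory bounded for multi-million-row datasets; no rows are
# filtered, matching the no-filtering choice noted above.
raw = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset-v1",
    split="train",        # placeholder; the SFT subset/split name may differ
    streaming=True,
)
chat = raw.map(to_chat_example)
```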
Training and evaluation
[✔] Initialized from the base model (tootsie_8b_deeper_starling: step-1419967) and fine-tuned on a mixture comprising the mixture from exp916 + Nemotron + OpenThoughts3. We trained for close to 2.5 epochs at the default learning rate of 1e-4 (Wandb).
[✔] We evaluated on lm-eval. However, running the full suite is impractical: the runtime is long and the TPU instance gets pre-empted. Nicolo from HessianFree helped evaluate the model on his instance (reported below); a sketch of the vLLM-backed invocation follows this list.
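The sketch below shows roughly the shape of an lm-eval run through the vLLM backend; the checkpoint path and task list are placeholders, not the actual evaluation setup.

```python
import lm_eval

# Placeholder checkpoint path and illustrative task list.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=/path/to/marin-8b-sft-checkpoint,"
        "dtype=bfloat16,gpu_memory_utilization=0.85"
    ),
    tasks=["gsm8k", "ifeval", "mmlu"],
    batch_size="auto",
)
print(results["results"])
```

This mirrors the CLI form `lm_eval --model vllm --model_args ... --tasks ...`.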
Artifacts
Scripts
Data download, conversion, and tokenization: exp905a (see PR)
Models & checkpoints
Results
1. We see slight regressions compared to the prior marin-8b-instruct model. However, these benchmarks do not use/support thinking tokens, which is the whole point of adding OpenThoughts3.
Possible TODOs:
Related future TODOs: