Please install torch, vllm and ray according to your own environment configuration.
# install torch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip install vllm==0.6.3
pip install ray
Next, install verl from the current project, followed by flash attention.
# verl
cd verl
pip install -e .
# flash attention 2
pip install flash-attn --no-build-isolation
First clone the ToolRL repo, then copy its dataset directory into verl-GDPO:
git clone git@github.com:qiancheng0/ToolRL.git
cp -r ToolRL/dataset verl-GDPO/
Before starting, configure your API keys and Hugging Face cache path:
export WANDB_API_KEY="Your API KEY"
export HF_TOKEN="Your API KEY"
export HF_HOME="YOUR HF CACHE ADDRESS"
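Because a missing key would typically only surface once training is already starting up, it may be worth checking the three variables before launching. The snippet below is a minimal sketch of such a check; `check_env` is a hypothetical helper that is not part of this repo, and it only tests that each variable is non-empty, not that the key itself is valid.

```shell
# Hypothetical helper (not part of this repo): verify that the variables
# exported above are set to non-empty values before starting a run.
check_env() {
    missing=0
    for name in WANDB_API_KEY HF_TOKEN HF_HOME; do
        # Indirect lookup via eval so the function also works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    # Return 0 when everything is set, 1 otherwise.
    return "$missing"
}
```

With this defined, `check_env && bash train_gdpo.sh` would launch training only when all three variables are present.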
For GRPO and GDPO training, set the configuration in train_grpo.sh and train_gdpo.sh, then run:
bash train_gdpo.sh # For GDPO Training
bash train_grpo.sh # For GRPO Training
FYI: training Qwen2.5-1.5B-Instruct on a single node with 8xA100 GPUs takes about an hour.
Please see line 175 in verl-GDPO/verl/trainer/ppo/ray_trainer.py
