GDPO vs. GRPO on tool calling RL training (GDPO Implementation based on verl)

Installation

Please install torch, vllm and ray according to your own environment configuration.

# install torch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip install vllm==0.6.3
pip install ray

Next, install the verl package bundled with this project, then FlashAttention.

# verl
cd verl
pip install -e .

# flash attention 2
pip install flash-attn --no-build-isolation
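
After the installs above, a quick import check can catch a broken environment before a long training run. The following is a minimal sketch, not part of the repository: the helper name `check_modules` and the `PYTHON_BIN` override are hypothetical, and it assumes the interpreter on your PATH is called `python`. Note that the `flash-attn` package is imported as `flash_attn`.

```shell
# Report whether each Python module can be imported by the chosen interpreter.
# PYTHON_BIN is a hypothetical override; it defaults to "python".
check_modules() {
    failed=0
    for mod in "$@"; do
        if "${PYTHON_BIN:-python}" -c "import $mod" >/dev/null 2>&1; then
            echo "ok:      $mod"
        else
            echo "missing: $mod"
            failed=1
        fi
    done
    return "$failed"
}

# Check the packages installed in this section.
check_modules torch vllm ray verl flash_attn || echo "Some packages are not importable yet."
```

The function returns a non-zero status if any import fails, so it can also gate a launch script.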

Dataset

First clone the ToolRL repo, then copy its dataset into this folder:

git clone git@github.com:qiancheng0/ToolRL.git
cp -r ToolRL/dataset verl-GDPO/

Training

Before starting, configure your API keys and Hugging Face cache path:

export WANDB_API_KEY="YOUR WANDB API KEY"
export HF_TOKEN="YOUR HF TOKEN"
export HF_HOME="YOUR HF CACHE PATH"
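
A missing variable may only surface partway through a run, so it can help to check all three up front. This is a minimal sketch under that assumption; the helper name `require_env` is hypothetical and not part of the repository.

```shell
# Return non-zero if any of the named environment variables is unset or empty.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (names are trusted here).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_env WANDB_API_KEY HF_TOKEN HF_HOME || echo "Set the variables above before training." >&2
```

The check only tests for empty values; it does not verify that a key or token is actually valid.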

For GDPO and GRPO training, set the configuration in train_gdpo.sh and train_grpo.sh, respectively:

bash train_gdpo.sh # For GDPO Training
bash train_grpo.sh # For GRPO Training
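
To compare the two runs afterwards, it may be convenient to keep each script's output in its own log file. The sketch below assumes nothing about the scripts beyond being runnable with `bash`; the helper name `run_logged` and the log file names are hypothetical.

```shell
# Run a script with bash and capture stdout and stderr in a log file.
# The function's exit status is the script's own exit status.
run_logged() {
    script=$1
    log=$2
    bash "$script" >"$log" 2>&1
}

# Example, left commented out because each run occupies the GPUs:
# run_logged train_gdpo.sh gdpo.log && run_logged train_grpo.sh grpo.log
```

Chaining with `&&` starts the GRPO run only if the GDPO run exits successfully, so the two do not compete for the same GPUs.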

For reference, training Qwen2.5-1.5B-Instruct on a single node with 8×A100 GPUs takes about an hour.

GDPO Implementation details

Please see line 175 in verl-GDPO/verl/trainer/ppo/ray_trainer.py
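
To read that part of the file from a terminal, a small helper can print a window of lines around a given line number. This is a sketch only: `show_lines` is a hypothetical helper, the default radius of 10 lines is an arbitrary choice, and the line number may drift if the file changes.

```shell
# Print lines [center - radius, center + radius] of a file, each prefixed
# with its line number. The radius defaults to 10.
show_lines() {
    file=$1
    center=$2
    radius=${3:-10}
    start=$((center - radius))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((center + radius))
    awk -v s="$start" -v e="$end" 'NR >= s && NR <= e { printf "%d: %s\n", NR, $0 }' "$file"
}

# Show the code around line 175 if the file is present.
trainer=verl-GDPO/verl/trainer/ppo/ray_trainer.py
if [ -f "$trainer" ]; then
    show_lines "$trainer" 175
fi
```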
