hengzzzhou/ReSo

ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks

Paper HuggingFace Models HuggingFace Dataset


🎉 News

  • [2025-08-21] ReSo has been accepted to EMNLP 2025! See you in Suzhou in November.

📖 Introduction

ReSo is a comprehensive framework for multi-step mathematical and scientific reasoning. It combines a self-organizing multi-agent architecture with reward-driven optimization to plan, solve, and refine solutions iteratively.

[Figure: ReSo intro]

Key capabilities:

  • Agent graph for task decomposition and collaboration
  • Reward modeling for iterative self-optimization
  • Modular LLM backends and configurable pipelines

[Figure: ReSo pipeline]

📊 Main Results

ReSo achieves 30% higher accuracy than other frameworks. Experimental results demonstrate its superior performance on challenging tasks.

[Figure: ReSo results]

✨ Getting Started

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended)
  • Git

Installation

  1. Clone the repository
git clone <repository-url>
cd ReSo/
  2. Create and activate environment
conda create -n ReSo python=3.10 -y
conda activate ReSo
pip install -r requirements.txt
  3. Configure API keys (optional, if using external LLMs)

Create and edit your environment file:

cp .env.template .env

Fill .env with your credentials:

# OpenAI
OAI_API_KEY=your_openai_api_key
OAI_BASE_URL=https://api.openai.com/v1

# Qwen
QWEN_API_KEY=your_qwen_api_key
QWEN_BASE_URL=your_qwen_base_url

# Claude
CLAUDE_API_KEY=your_claude_api_key
CLAUDE_BASE_URL=your_claude_base_url

# Gemini
GEMINI_API_KEY=your_gemini_api_key
GEMINI_BASE_URL=your_gemini_base_url

# DeepSeek
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_BASE_URL=your_deepseek_base_url
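
Before launching a run, it can help to check which providers in .env are actually filled in. The following is a minimal sketch, not part of ReSo: the helper names parse_env_file and configured_providers are hypothetical, and it assumes the template placeholders keep their your_ prefix until replaced.

```python
from pathlib import Path

# Provider prefixes taken from the .env template above.
PROVIDERS = ("OAI", "QWEN", "CLAUDE", "GEMINI", "DEEPSEEK")


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def configured_providers(values):
    """Return the prefixes whose API key and base URL are both filled in."""
    ready = []
    for prefix in PROVIDERS:
        key = values.get(f"{prefix}_API_KEY", "")
        url = values.get(f"{prefix}_BASE_URL", "")
        # Assumption: unedited template values still start with "your_".
        if key and url and not key.startswith("your_") and not url.startswith("your_"):
            ready.append(prefix)
    return ready


if __name__ == "__main__" and Path(".env").exists():
    print(configured_providers(parse_env_file(".env")))
```

Run from the repository root, this prints the list of provider prefixes that look ready to use.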

Project Structure

ReSo/
├── ReSo/                    # Core framework modules
│   ├── agent_graph/         # Agent graph implementation
│   ├── llm_agent/           # LLM agent components
│   ├── model/               # Custom model implementations
│   └── task_graph/          # Task graph management
├── datasets/                # Data synthesis and storage
│   ├── data_gen.py          # Complex problem generator
│   ├── get_answer.py        # Answer extraction utilities
│   ├── sub_question/        # Base sub-question datasets
│   ├── MATH-MAS/            # MATH MAS datasets
│   └── Scibench-MAS/        # Science benchmark datasets
├── experiments/             # Training and evaluation scripts
├── reward_model/            # Reward model training & usage
├── config.ini               # Model & agent configuration
├── config_hyper.ini         # Training hyperparameters
└── requirements.txt         # Python dependencies

🎯 Usage

Training

Train on your dataset:

python experiments/train_ReSo.py --dataset_path <path_to_training_data>

Notes:

  • Configure training hyperparameters in config_hyper.ini.
  • Adjust model/agent settings in config.ini.

Evaluation

MATH-MAS benchmarks:

# Easy
python experiments/test_ReSo.py --dataset_path datasets/MATH-MAS/MATH-MAS-Easy.json --plan_mode gt

# Medium
python experiments/test_ReSo.py --dataset_path datasets/MATH-MAS/MATH-MAS-Medium.json --plan_mode gt

# Hard
python experiments/test_ReSo.py --dataset_path datasets/MATH-MAS/MATH-MAS-Hard.json --plan_mode gt

GSM8K:

python experiments/test_gsm8k.py --dataset_path <gsm8k_dataset_path>

Common flags:

  • --dataset_path: Path to dataset file
  • --plan_mode: Planning mode (gt for ground truth)
  • --random_select: Randomized selection (optional)
  • --error_tolerance: Error threshold (optional)
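
To run all three MATH-MAS splits in one go, the commands above can be assembled from Python. This is a small sketch under stated assumptions: it uses only the script path and flags shown above, the helper names build_eval_command and build_all_commands are hypothetical, and it prints the commands rather than launching them.

```python
import sys

# Dataset paths copied from the MATH-MAS commands above.
MATH_MAS_SPLITS = {
    "Easy": "datasets/MATH-MAS/MATH-MAS-Easy.json",
    "Medium": "datasets/MATH-MAS/MATH-MAS-Medium.json",
    "Hard": "datasets/MATH-MAS/MATH-MAS-Hard.json",
}


def build_eval_command(dataset_path, plan_mode="gt", python=sys.executable):
    """Return the argument list for one experiments/test_ReSo.py run."""
    return [
        python,
        "experiments/test_ReSo.py",
        "--dataset_path", dataset_path,
        "--plan_mode", plan_mode,
    ]


def build_all_commands():
    """Map each MATH-MAS split name to its evaluation command."""
    return {split: build_eval_command(path) for split, path in MATH_MAS_SPLITS.items()}


if __name__ == "__main__":
    for split, cmd in build_all_commands().items():
        print(f"[{split}] {' '.join(cmd)}")
```

Each argument list can be passed to subprocess.run(cmd, check=True) from the repository root to actually start the evaluation.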

📊 Data Generation

Create complex multi-step problems using the generator.

1) Prepare base sub-questions

Location: datasets/sub_question/

  • math_test.json (math)
  • scibench.json (science)

Each entry contains a prompt, answer, variables, and metadata.

2) Generate complex problems

python datasets/data_gen.py -n <num_questions> -c <complexity_level> [-o <output_file>]

Examples:

# 100 questions, 3 sub-questions
python datasets/data_gen.py -n 100 -c 3

# 50 questions, 5 sub-questions, custom output
python datasets/data_gen.py -n 50 -c 5 -o datasets/mixed/complex_dataset.json

3) How it works

  1. DAG construction for dependency structure
  2. Linking sub-questions via variables/answers
  3. Integration into a final composite task
  4. Validation for consistency and solvability

See datasets/README.md for details.
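
The four steps under "How it works" can be illustrated with a toy generator. This is a sketch of the general idea only, not the logic in datasets/data_gen.py: the three arithmetic sub-questions, the function names, and the "sum of all step answers" integration rule are all assumptions made for illustration.

```python
import random

# Toy sub-questions: a prompt template with one variable {x} and its solver.
SUB_QUESTIONS = [
    ("Add 3 to {x}.", lambda x: x + 3),
    ("Double {x}.", lambda x: 2 * x),
    ("Square {x}.", lambda x: x * x),
]


def build_dag(num_nodes, rng):
    """Step 1: each node after the first depends on one earlier node."""
    return {n: (rng.randrange(n) if n else None) for n in range(num_nodes)}


def validate(parents, answers):
    """Step 4: edges must point backwards (acyclic) and every node needs an answer."""
    acyclic = all(p is None or 0 <= p < n for n, p in parents.items())
    return acyclic and set(answers) == set(parents)


def compose(num_nodes, seed=0):
    """Build one composite question from num_nodes linked sub-questions."""
    rng = random.Random(seed)
    parents = build_dag(num_nodes, rng)
    lines, answers = [], {}
    for node in range(num_nodes):  # index order is already a topological order
        template, solve = rng.choice(SUB_QUESTIONS)
        parent = parents[node]
        if parent is None:
            value = rng.randint(1, 9)
            shown = str(value)
        else:
            # Step 2: the parent's answer becomes this node's input variable.
            value = answers[parent]
            shown = f"the answer to step {parent + 1}"
        answers[node] = solve(value)
        lines.append(f"Step {node + 1}: " + template.format(x=shown))
    # Step 3: integrate the sub-answers into one final question.
    lines.append("Report the sum of all step answers.")
    if not validate(parents, answers):
        raise ValueError("generated task is inconsistent")
    return {
        "question": "\n".join(lines),
        "answer": sum(answers.values()),
        "parents": parents,
    }


if __name__ == "__main__":
    print(compose(3, seed=42)["question"])
```

Because every dependency points to an earlier node, solving the steps in index order always has the inputs it needs, which is the property the validation step checks.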

🤖 Pre-trained Models

We provide fine-tuned models on Hugging Face:

  • Plan model for multi-step planning
  • CRM (Collaborative Reward Model) for evaluation and optimization

Browse: https://huggingface.co/henggg/ReSo/tree/main

🔬 Key Features

Multi-Agent Reasoning

  • Agent graph for structured collaboration
  • Automatic task decomposition
  • Coordinated solving across agents

Self-Optimization

  • Reward modeling for quality assessment
  • Iterative refinement and error detection
  • Supports custom reward models
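
As a rough illustration of reward-guided refinement, the loop below scores each candidate answer and keeps the best one. It is a generic sketch, not ReSo's implementation: refine, toy_propose, and toy_reward are hypothetical names, and the stubs stand in for an LLM agent and a reward model.

```python
from typing import Callable, List, Tuple


def refine(
    question: str,
    propose: Callable[[str, List[str]], str],
    reward: Callable[[str, str], float],
    max_rounds: int = 3,
    threshold: float = 0.9,
) -> Tuple[str, float]:
    """Keep the best-scoring candidate; stop once the reward passes the threshold."""
    best_answer, best_score = "", float("-inf")
    history: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(question, history)
        score = reward(question, candidate)
        history.append(candidate)
        if score > best_score:
            best_answer, best_score = candidate, score
        if best_score >= threshold:
            break
    return best_answer, best_score


def toy_propose(question: str, history: List[str]) -> str:
    # Stand-in for an LLM agent: each round guesses the next integer.
    return str(len(history) + 2)


def toy_reward(question: str, answer: str) -> float:
    # Stand-in for a reward model: 1.0 only for the correct answer to 2 + 2.
    return 1.0 if answer == "4" else 0.0


if __name__ == "__main__":
    print(refine("What is 2 + 2?", toy_propose, toy_reward))  # ('4', 1.0)
```

A custom reward model would plug in through the reward callable, as long as it returns a comparable score for a question and a candidate answer.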

Flexible Architecture

  • Modular design for new models/strategies
  • Multiple LLM providers (OpenAI, Claude, Gemini, Qwen, DeepSeek, etc.)
  • Configurable pipelines and behaviors

📈 Performance

ReSo shows strong performance on MATH, GSM8K, and science benchmarks. Refer to the paper for full metrics.

🛠️ Development

Add a new model

  1. Implement interface in ReSo/llm_agent/
  2. Add options in config.ini
  3. Register in ReSo/llm_agent/model_info.py
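
One common way to structure the "implement, then register" steps is a name-to-class registry. The sketch below shows that pattern in isolation. It is an assumption about the general shape, not the contents of ReSo/llm_agent/model_info.py: BaseLLM, MODEL_REGISTRY, register_model, and build_model are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class BaseLLM(ABC):
    """Hypothetical minimal interface; ReSo's real base class may differ."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


MODEL_REGISTRY: Dict[str, Type[BaseLLM]] = {}


def register_model(name: str):
    """Decorator that records a backend class under a config-friendly name."""
    def decorator(cls: Type[BaseLLM]) -> Type[BaseLLM]:
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("echo")
class EchoLLM(BaseLLM):
    """Toy backend that returns the prompt unchanged."""

    def generate(self, prompt: str) -> str:
        return prompt


def build_model(name: str) -> BaseLLM:
    """Instantiate the backend selected by name, e.g. a value read from a config file."""
    return MODEL_REGISTRY[name]()


if __name__ == "__main__":
    print(build_model("echo").generate("hello"))
```

With this layout, adding a backend means writing one class and one decorator line, and the name used in the decorator is what a configuration file would refer to.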

Custom reward models

  1. Define architecture in ReSo/model/
  2. Implement training in reward_model/train.py
  3. Add evaluation in reward_model/test.py

Extend data generation

  1. Add formats in datasets/sub_question/
  2. Update logic in datasets/data_gen.py
  3. Update parsing in datasets/get_answer.py

🤝 Contributing

Issues and PRs are welcome. Please follow standard code style, add tests when changing behavior, and update docs when relevant.

📨 Contact

🎈 Citation

If you find ReSo helpful, please cite our paper:

@article{zhou2025reso,
  title={{ReSo}: A Reward-driven Self-organizing {LLM}-based Multi-Agent System for Reasoning Tasks},
  author={Zhou, Heng and Geng, Hejia and Xue, Xiangyuan and Kang, Li and Qin, Yiran and Wang, Zhiyong and Yin, Zhenfei and Bai, Lei},
  journal={arXiv preprint arXiv:2503.02390},
  year={2025}
}
