DeepScaleR demonstrates that incrementally increasing the context length (8K -> 16K -> 24K) during RL training allows even a small 1.5B-parameter LLM to outperform O1 models on mathematical tasks while significantly reducing FLOPs compared to full-context RL. Might their multi-stage reinforcement learning training approach be worth experimenting with for our project?
https://github.com/agentica-project/deepscaler
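To make the idea concrete, here's a rough sketch of what a staged context-length schedule could look like. The `trainer` object, its methods, and the step counts are placeholders I made up for illustration, not code from the DeepScaleR repo:

```python
# Hypothetical sketch of a staged context-length curriculum for RL training.
# Stage boundaries, step counts, and the trainer API are assumptions, not
# taken from the DeepScaleR codebase.

from dataclasses import dataclass


@dataclass
class Stage:
    max_context_tokens: int  # response-length cap for this stage
    num_steps: int           # RL update steps to run before moving on


# Incrementally raise the context cap instead of training at 24K from the start.
SCHEDULE = [
    Stage(max_context_tokens=8_192,  num_steps=1_000),
    Stage(max_context_tokens=16_384, num_steps=500),
    Stage(max_context_tokens=24_576, num_steps=500),
]


def train(trainer):
    """Run the RL trainer stage by stage, widening the context each time."""
    for stage in SCHEDULE:
        # Cap generation length for this stage; shorter caps keep early
        # rollouts cheap, later stages allow longer reasoning chains.
        trainer.set_max_response_length(stage.max_context_tokens)
        for _ in range(stage.num_steps):
            trainer.step()  # one RL update at the current context cap
```

The appeal is that most of the early updates happen at the short cap, so the expensive long-context rollouts are only paid for once the policy is already reasonably strong.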