
Commit d3eb3bf

docs: clarify RL training GPU requirements by model config (#764)

1 parent: 3b233c1

File tree

4 files changed: +8, -9 lines

README.md

Lines changed: 1 addition & 1 deletion

@@ -79,7 +79,7 @@ pixi run install

 > **Note:** We are actively working on enabling pure `uv` installation. Currently, Conda is the recommended approach. `uv` support is not fully working at the moment but is being tracked in [issue #494](https://github.com/meta-pytorch/torchforge/issues/494).

-After install, you can run the following command and should see output confirming GRPO training is running (you need a minimum 3 GPU devices):
+After install, you can run the following command and should see output confirming GRPO training is running (you need a minimum 2 GPU devices):

 ```
 python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

apps/grpo/README.md

Lines changed: 2 additions & 2 deletions

@@ -40,12 +40,12 @@ Since each can covers 2 square meters, we need to divide the total wall area by

 ## Quick Start

-**Llama 3.1 8B** (recommended for learning, requires 5 GPUs as is, not optimized):
+**Llama 3.1 8B** (recommended for learning, requires 4 GPUs as is, not optimized):
 ```bash
 python -m apps.grpo.main --config apps/grpo/llama3_8b.yaml
 ```

-**Qwen3 1.7B** (NOTE: Qwen3 is already saturated on GSM8K, so rewards will **not** increase. Requires 3 GPUs, not optimized):
+**Qwen3 1.7B** (NOTE: Qwen3 is already saturated on GSM8K, so rewards will **not** increase. Requires 2 GPUs, not optimized):
 ```bash
 python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
 ```
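The saturation note in the diff above follows from how GRPO builds its learning signal: rewards are normalized within each group of completions sampled for the same prompt, so a task the model already solves perfectly yields near-zero advantages. A minimal stdlib-only sketch of that normalization (a simplified illustration, not torchforge's implementation; the function name is hypothetical):

```python
def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style group normalization: a_i = (r_i - mean(r)) / (std(r) + eps).

    If every completion in the group already earns the maximum reward
    (a saturated task, like Qwen3 on GSM8K), the advantages all collapse
    to ~0 and the policy gradient carries no signal.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Saturated group: identical rewards -> zero advantages, nothing to learn.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0, 0.0]

# Mixed group: correct completions are pushed up, incorrect ones down.
print(group_relative_advantages([0.0, 1.0]))
```

This is why the Qwen3 config is still useful for verifying the training loop end to end, even though the reward curve stays flat.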

docs/source/getting_started.md

Lines changed: 4 additions & 5 deletions

@@ -11,7 +11,7 @@ Before installing TorchForge, ensure your system meets the following requirement

 | **Operating System** | Linux (Fedora/Ubuntu/Debian) | MacOS and Windows not currently supported |
 | **Python** | 3.10 or higher | Python 3.11 recommended |
 | **GPU** | NVIDIA with CUDA support | AMD GPUs not currently supported |
-| **Minimum GPUs** | 2+ for SFT, 3+ for GRPO | More GPUs enable larger models |
+| **Minimum GPUs** | 2+ for SFT; 2+ for GRPO | More GPUs enable training larger models; GRPO with KL (`beta > 0`) requires a reference model and increases the GPU requirement. |
 | **CUDA** | 12.8 | Required for GPU training |
 | **RAM** | 32GB+ recommended | Depends on model size |
 | **Disk Space** | 50GB+ free | For models, datasets, and checkpoints |

@@ -150,7 +150,7 @@ hf download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir /tmp/Meta-Llama-3.

 uv run forge run --nproc_per_node 2 \
   apps/sft/main.py --config apps/sft/llama3_8b.yaml

-# Run GRPO training (requires 3+ GPUs)
+# Run GRPO training (requires 2+ GPUs)
 python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
 ```

@@ -181,16 +181,15 @@ Fine-tune Llama 3 8B on your data. **Requires: 2+ GPUs**

 ### Example 2: GRPO Training

-Train a model using reinforcement learning with GRPO. **Requires: 3+ GPUs**
+Train a model using reinforcement learning with GRPO. **Requires: 2+ GPUs**

 ```bash
 python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
 ```

 **What's Happening:**
 - GPU 0: Trainer model (being trained, powered by TorchTitan)
-- GPU 1: Reference model (frozen baseline, powered by TorchTitan)
-- GPU 2: Policy model (scoring outputs, powered by vLLM)
+- GPU 1: Policy model (scoring outputs, powered by vLLM)
 - **Monarch** orchestrates all three components
 - **TorchStore** handles weight synchronization from training to inference
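Since the requirements in this diff hinge on how many GPUs the launched process can actually see, a quick pre-flight check can save a confusing mid-launch failure. Below is a hypothetical stdlib-only helper (not part of torchforge; `visible_gpu_count` is an invented name) that interprets `CUDA_VISIBLE_DEVICES` the way CUDA does:

```python
import os


def visible_gpu_count(env=None):
    """Count GPUs a CUDA process may use, per CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (all driver-visible GPUs
    are available); otherwise the number of device entries listed.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unrestricted; query the driver for the real count
    return len([d for d in raw.split(",") if d.strip()])


# Example: confirm at least 2 GPUs are visible before launching GRPO.
print(visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1"}))  # 2
```

When PyTorch is installed, `torch.cuda.device_count()` is the authoritative check; the helper above only reflects the environment-variable restriction.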

docs/source/index.md

Lines changed: 1 addition & 1 deletion

@@ -185,7 +185,7 @@ Before starting significant work, signal your intention in the issue tracker to

 [Monarch](https://meta-pytorch.org/monarch), [vLLM](https://docs.vllm.ai/en/latest/),
 and [TorchTitan](https://github.com/pytorch/torchtitan).
 * **Multi-GPU Support**: Designed for distributed training
-  with minimum 3 GPU requirement for GRPO training
+  with minimum 2 GPU requirement for GRPO training
 * **Model Support**: Includes pre-configured setups for popular models
   like Llama3 8B and Qwen3.1 7B
