Commit ab840ed

Update Tutorial Docs: PolicyActor -> Policy Actor (#436)
1 parent 75df074 commit ab840ed

File tree

3 files changed: +19 −19 lines changed


docs/source/tutorial_sources/zero-to-forge/1_RL_and_Forge_Fundamentals.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -88,7 +88,7 @@ graph LR
 subgraph Services["TorchForge Services (Real Classes)"]
 direction TB
 S1["DatasetActor"]
-S2["Policy"]
+S2["Generator"]
 S3["RewardActor"]
 S4["ReferenceModel"]
 S5["ReplayBuffer"]
@@ -290,7 +290,7 @@ TorchForge handles behind the scenes:
 ### Independent Scaling
 ```python

-from forge.actors.policy import Policy
+from forge.actors.generator import Generator as Policy
 from forge.actors.replay_buffer import ReplayBuffer
 from forge.actors.reference_model import ReferenceModel
 from forge.actors.trainer import RLTrainer
````
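The change above renames the class but re-exposes it under its old name via an import alias, so existing call sites keep working. A minimal sketch of that pattern in plain Python (the `Generator` class below is a stand-in for illustration, not the real `forge.actors.generator.Generator`):

```python
# Stand-in for the renamed class (formerly Policy) -- illustration only.
class Generator:
    def __init__(self, model: str):
        self.model = model

# Equivalent in effect to:
#   from forge.actors.generator import Generator as Policy
Policy = Generator

# Old code written against the name "Policy" runs unchanged:
policy = Policy(model="Qwen/Qwen3-1.7B")
print(policy.model)         # Qwen/Qwen3-1.7B
print(Policy is Generator)  # True -- one class, two names
```

This keeps the rename non-breaking: the alias is just a second binding to the same class object, so `isinstance` checks and constructor signatures are unaffected.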

docs/source/tutorial_sources/zero-to-forge/2_Forge_Internals.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -73,7 +73,7 @@ The service creation automatically handles:
 - Message routing and serialization

 ```python
-from forge.actors.policy import Policy
+from forge.actors.generator import Generator as Policy

 model = "Qwen/Qwen3-1.7B"

@@ -560,7 +560,7 @@ Now let's see how services coordinate in a real training loop:

 import asyncio
 import torch
-from forge.actors.policy import Policy
+from forge.actors.generator import Generator as Policy
 from forge.actors.reference_model import ReferenceModel
 from forge.actors.replay_buffer import ReplayBuffer
 from forge.actors.trainer import RLTrainer
````

docs/source/tutorial_sources/zero-to-forge/3_Monarch_101.md

Lines changed: 15 additions & 15 deletions

````diff
@@ -18,15 +18,15 @@ graph TD
 end

 subgraph MonarchLayer["3. Monarch Actor Layer"]
-ActorMesh["ActorMesh PolicyActor: 4 instances, Different GPUs, Message passing"]
+ActorMesh["ActorMesh Policy Actor: 4 instances, Different GPUs, Message passing"]
 ProcMesh["ProcMesh: 4 processes, GPU topology 0,1,2,3, Network interconnect"]
 end

 subgraph Hardware["4. Physical Hardware"]
-GPU0["GPU 0: PolicyActor #1, vLLM Engine, Model Weights"]
-GPU1["GPU 1: PolicyActor #2, vLLM Engine, Model Weights"]
-GPU2["GPU 2: PolicyActor #3, vLLM Engine, Model Weights"]
-GPU3["GPU 3: PolicyActor #4, vLLM Engine, Model Weights"]
+GPU0["GPU 0: Policy Actor #1, vLLM Engine, Model Weights"]
+GPU1["GPU 1: Policy Actor #2, vLLM Engine, Model Weights"]
+GPU2["GPU 2: Policy Actor #3, vLLM Engine, Model Weights"]
+GPU3["GPU 3: Policy Actor #4, vLLM Engine, Model Weights"]
 end

 Call --> ServiceInterface
@@ -154,17 +154,17 @@ await procs.stop()

 **ActorMesh** is created when you spawn actors across a ProcMesh. Key points:

-- **One actor instance per process**: `mesh.spawn("policy", PolicyActor)` creates one PolicyActor in each process
+- **One actor instance per process**: `mesh.spawn("policy", Policy)` creates one Policy Actor in each process
 - **Same constructor arguments**: All instances get the same initialization parameters
 - **Independent state**: Each actor instance maintains its own state and memory
 - **Message routing**: You can send messages to one actor or all actors using different methods

 ```python
 # Simple example:
 procs = spawn_procs(per_host={"gpus": 4})  # 4 processes
-policy_actors = procs.spawn("policy", PolicyActor, model="Qwen/Qwen3-7B")
+policy_actors = procs.spawn("policy", Policy, model="Qwen/Qwen3-7B")

-# Now you have 4 PolicyActor instances, one per GPU
+# Now you have 4 Policy Actor instances, one per GPU
 # All initialized with the same model parameter
 ```

@@ -177,29 +177,29 @@ Now the key insight: **TorchForge services are ServiceActors that manage ActorMe
 ```mermaid
 graph TD
 subgraph ServiceCreation["Service Creation Process"]
-Call["await PolicyActor.options(num_replicas=4, procs=1).as_service(model='Qwen')"]
+Call["await Policy.options(num_replicas=4, procs=1).as_service(model='Qwen')"]

 ServiceActor["ServiceActor: Manages 4 replicas, Health checks, Routes calls"]

 subgraph Replicas["4 Independent Replicas"]
 subgraph R0["Replica 0"]
 PM0["ProcMesh: 1 process, GPU 0"]
-AM0["ActorMesh<br/>1 PolicyActor"]
+AM0["ActorMesh<br/>1 Policy Actor"]
 end

 subgraph R1["Replica 1"]
 PM1["ProcMesh: 1 process, GPU 1"]
-AM1["ActorMesh<br/>1 PolicyActor"]
+AM1["ActorMesh<br/>1 Policy Actor"]
 end

 subgraph R2["Replica 2"]
 PM2["ProcMesh: 1 process, GPU 2"]
-AM2["ActorMesh<br/>1 PolicyActor"]
+AM2["ActorMesh<br/>1 Policy Actor"]
 end

 subgraph R3["Replica 3"]
 PM3["ProcMesh: 1 process, GPU 3"]
-AM3["ActorMesh<br/>1 PolicyActor"]
+AM3["ActorMesh<br/>1 Policy Actor"]
 end
 end

@@ -232,9 +232,9 @@ graph TD

 ServiceActor["ServiceActor: Selects healthy replica, Load balancing, Failure handling"]

-SelectedReplica["Selected Replica #2: ProcMesh 1 process, ActorMesh 1 PolicyActor"]
+SelectedReplica["Selected Replica #2: ProcMesh 1 process, ActorMesh 1 Policy Actor"]

-PolicyActor["PolicyActor Instance: Loads model, Runs vLLM inference"]
+PolicyActor["Policy Actor Instance: Loads model, Runs vLLM inference"]

 GPU["GPU 2: vLLM engine, Model weights, KV cache, CUDA kernels"]
````
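The spawn semantics this tutorial diff describes can be illustrated with a toy sketch in plain Python (not Monarch): N actor instances are created with identical constructor arguments, yet each holds independent state. The `Policy` class and `spawn` helper below are hypothetical stand-ins, not the real forge or Monarch APIs:

```python
# Toy illustration of "one actor instance per process" -- hypothetical names.
class Policy:
    def __init__(self, model: str):
        self.model = model            # same init args for every instance
        self.requests_served = 0      # independent per-instance state

def spawn(name: str, cls, count: int, **kwargs):
    """Mimics mesh.spawn: build `count` instances with identical kwargs."""
    return [cls(**kwargs) for _ in range(count)]

# Analogous to: procs.spawn("policy", Policy, model="Qwen/Qwen3-7B") on 4 GPUs
actors = spawn("policy", Policy, count=4, model="Qwen/Qwen3-7B")

actors[0].requests_served += 1        # mutate one instance only
print([a.requests_served for a in actors])           # [1, 0, 0, 0]
print(all(a.model == "Qwen/Qwen3-7B" for a in actors))  # True
```

The point of the sketch: shared configuration, isolated state. In the real system each instance additionally lives in its own process on its own GPU, so state isolation is enforced by process boundaries rather than by convention.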
