rLLM's `@rllm.rollout` decorator makes it easy to turn any agent into an RL training loop. But when rollouts execute LLM-generated code (e.g., coding tasks, tool use, reward evaluation), each episode runs with full host access: filesystem, network, environment variables.
At scale this creates two problems:
- Safety: A single malicious or buggy rollout can corrupt the host, leak data, or interfere with other rollouts running in parallel.
- Reproducibility: Rollouts that depend on shared mutable state (temp files, network, system time) produce non-deterministic rewards, as the sketch below illustrates.
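To make the reproducibility problem concrete, here is a minimal sketch of a reward check that races on shared state (the `reward` function and scratch path are invented for illustration). Run two of these in parallel and the scores depend on scheduling, not on the episodes:

```python
import json
import time
from pathlib import Path

SCRATCH = Path("/tmp/rollout_scratch.json")  # shared by all parallel rollouts

def reward(episode_output: str) -> float:
    # Write-then-read on a shared file: another rollout may overwrite
    # SCRATCH between these two lines, flipping the reward at random.
    SCRATCH.write_text(json.dumps({"out": episode_output, "t": time.time()}))
    cached = json.loads(SCRATCH.read_text())
    return 1.0 if cached["out"] == episode_output else 0.0
```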
Container-based isolation (Docker, Firecracker) works but adds significant overhead per episode:
| Approach | Startup per episode | Overhead at 100K episodes |
|---|---|---|
| Docker | ~200ms | 5.6 hours |
| Firecracker (E2B) | ~150ms | 4.2 hours |
| sandlock | ~5ms | 8 minutes |
| sandlock COW fork | ~0.5ms | 50 seconds |
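(The overhead column is just per-episode startup cost times the episode count: e.g. ~200 ms × 100,000 ≈ 5.6 hours spent on sandbox startup alone.)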
sandlock provides kernel-level process isolation using Landlock + seccomp. No containers, no VMs, no root. Each rollout gets its own filesystem, network, memory, and process limits enforced by the kernel.
Example integration with a rollout function:
```python
from sandlock import Sandbox, Policy

policy = Policy(
    fs_readable=["/usr", "/lib", "/lib64", "/etc"],
    fs_writable=["/tmp/rollout"],
    net_allow_hosts=[],                 # no network
    max_memory="512M",
    max_processes=10,
    random_seed=episode_id,             # deterministic randomness
    time_start="2025-01-01T00:00:00",   # frozen time
)

result = Sandbox(policy).run(["python3", "-c", generated_code])
# result.stdout, result.exit_code
```
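For context, here is a rough sketch of how this could slot into an `@rllm.rollout` function. The exact decorator signature, the per-episode scratch directory, and the exit-code-based reward are assumptions for illustration, not rLLM's actual API:

```python
# Hypothetical wiring; rollout arguments and reward logic are placeholders.
import rllm
from sandlock import Sandbox, Policy

@rllm.rollout
def rollout(episode_id: int, generated_code: str) -> float:
    policy = Policy(
        fs_readable=["/usr", "/lib", "/lib64", "/etc"],
        fs_writable=[f"/tmp/rollout/{episode_id}"],  # assumed per-episode scratch dir
        net_allow_hosts=[],
        max_memory="512M",
        max_processes=10,
        random_seed=episode_id,                      # deterministic per episode
        time_start="2025-01-01T00:00:00",
    )
    result = Sandbox(policy).run(["python3", "-c", generated_code])
    # Placeholder reward: did the generated program exit cleanly?
    return 1.0 if result.exit_code == 0 else 0.0
```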
For batch rollouts, COW fork initializes once and clones cheaply:
```python
mapper = Sandbox(policy, init_fn=load_env, work_fn=run_episode)
clones = mapper.fork(1000)  # ~530ms for 1000 clones, shared memory pages
```
Each clone inherits kernel-enforced isolation, and `CLONE_ID` (0..N-1) is set automatically.
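A minimal sketch of the `init_fn`/`work_fn` pair, assuming `CLONE_ID` is delivered to each clone as an environment variable (the `EPISODES` list and prompt are illustrative):

```python
import os

EPISODES = [f"task-{i}" for i in range(1000)]  # loaded once, shared COW

def load_env():
    # Runs once before the fork; anything built here is shared
    # copy-on-write across all clones.
    global MODEL_PROMPT
    MODEL_PROMPT = "solve: "

def run_episode():
    # Assuming CLONE_ID arrives as an environment variable, each clone
    # selects its own episode from the shared list.
    idx = int(os.environ["CLONE_ID"])
    return f"{MODEL_PROMPT}{EPISODES[idx]}"
```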
Would the rLLM team be interested in a sandbox backend for rollout execution? Happy to help with integration or answer questions.