Sandboxed code execution for RL rollouts #468

@congwang-mk

rLLM's @rllm.rollout decorator makes it easy to turn any agent into an RL training loop. But when rollouts execute LLM-generated code (e.g., coding tasks, tool use, reward evaluation), each episode runs with full access to the host: filesystem, network, and environment variables.

At scale this creates two problems:

  1. Safety: A single malicious or buggy rollout can corrupt the host, leak data, or interfere with other rollouts running in parallel.
  2. Reproducibility: Rollouts that depend on shared mutable state (temp files, network, system time) produce non-deterministic rewards.

Container- or microVM-based isolation (Docker, Firecracker) works but adds significant startup overhead per episode:

Approach            Startup per episode   Overhead for 100K episodes
Docker              ~200ms                5.6 hours
Firecracker (E2B)   ~150ms                4.2 hours
sandlock            ~5ms                  8 minutes
sandlock COW fork   ~0.5ms                50 seconds
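The overhead column follows directly from the startup column. As a quick check, here is a minimal sketch that recomputes it, assuming the 100K episodes are launched one after another; with parallel workers the wall-clock cost would be proportionally lower. The function names are illustrative only.

```python
# Per-episode startup latencies in milliseconds, copied from the table above.
STARTUP_MS = {
    "Docker": 200.0,
    "Firecracker (E2B)": 150.0,
    "sandlock": 5.0,
    "sandlock COW fork": 0.5,
}


def total_overhead_seconds(startup_ms: float, episodes: int = 100_000) -> float:
    """Total startup cost if episodes are launched sequentially (an assumption)."""
    return startup_ms / 1000.0 * episodes


def humanize(seconds: float) -> str:
    """Render a duration in the same units the table uses."""
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.0f} minutes"
    return f"{seconds:.0f} seconds"


if __name__ == "__main__":
    for name, ms in STARTUP_MS.items():
        print(f"{name}: {humanize(total_overhead_seconds(ms))}")
```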

sandlock provides kernel-level process isolation using Landlock + seccomp. No containers, no VMs, no root. Each rollout gets its own filesystem, network, memory, and process limits enforced by the kernel.

Example integration with a rollout function:

from sandlock import Sandbox, Policy

episode_id = 0                      # placeholder: index of the current episode
generated_code = "print('hello')"   # placeholder: code produced by the LLM

policy = Policy(
    fs_readable=["/usr", "/lib", "/lib64", "/etc"],
    fs_writable=["/tmp/rollout"],
    net_allow_hosts=[],       # no network
    max_memory="512M",
    max_processes=10,
    random_seed=episode_id,   # deterministic randomness
    time_start="2025-01-01T00:00:00",  # frozen time
)
result = Sandbox(policy).run(["python3", "-c", generated_code])
# result.stdout, result.exit_code
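Since reward evaluation is one of the use cases, here is a rough sketch of how a rollout might turn such a result into a scalar reward. It uses only the two fields mentioned above (stdout and exit_code). The RunResult class is a hypothetical stand-in so the snippet runs without sandlock installed, and it assumes stdout is a str; the real return type may differ.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical stand-in for whatever Sandbox.run() returns."""
    stdout: str
    exit_code: int


def reward_from_result(result: RunResult, expected_stdout: str) -> float:
    """Binary reward: 1.0 only for a clean exit with the expected output."""
    if result.exit_code != 0:
        return 0.0  # crashes, limit violations, and blocked syscalls score zero
    # Ignore leading/trailing whitespace such as the final newline.
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0


if __name__ == "__main__":
    print(reward_from_result(RunResult(stdout="42\n", exit_code=0), "42"))
```

Because the policy pins the random seed and the clock, the same generated code should map to the same reward on every run, which is the reproducibility property described earlier.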

For batch rollouts, copy-on-write (COW) fork runs initialization once and then clones the initialized process cheaply:

mapper = Sandbox(policy, init_fn=load_env, work_fn=run_episode)
clones = mapper.fork(1000)  # ~530ms for 1000 clones, shared memory pages

Each clone inherits kernel-enforced isolation. CLONE_ID=0..N-1 is set automatically.
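One way a work function could use the clone index is to pick its own slice of the episode list. The sketch below assumes CLONE_ID is exposed to each clone as an environment variable, which the description suggests but does not confirm; the helper names and the round-robin sharding scheme are illustrative, not part of sandlock.

```python
import os
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def current_clone_id(default: int = 0) -> int:
    """Read the clone index. Assumption: CLONE_ID is an environment variable."""
    return int(os.environ.get("CLONE_ID", default))


def shard(items: Sequence[T], clone_id: int, num_clones: int) -> List[T]:
    """Round-robin sharding: clone k takes items k, k + N, k + 2N, ..."""
    if not 0 <= clone_id < num_clones:
        raise ValueError("clone_id must be in range(num_clones)")
    return list(items[clone_id::num_clones])


if __name__ == "__main__":
    episodes = list(range(10))
    print(shard(episodes, current_clone_id(), num_clones=4))
```

With this scheme every episode is handled by exactly one clone, and no coordination between clones is needed.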

Would the rLLM team be interested in a sandbox backend for rollout execution? Happy to help with integration or answer questions.
