Start with the shared practices below. Consult subproject manuals for directory-specific guidance:
lib/levanter/AGENTS.md— Levanter (JAX training library)lib/marin/AGENTS.md— Marin (pipeline framework)lib/iris/AGENTS.md— Iris (job orchestration)lib/zephyr/AGENTS.md— Zephyr (dataset processing)lib/fray/AGENTS.md— Fray (distributed execution)
For debugging and operating live infrastructure, read the relevant OPS.md:
lib/iris/OPS.md— cluster lifecycle, job/task management, profiling, SQL queries, GCP/CoreWeave operationslib/zephyr/OPS.md— pipeline debugging, straggler diagnosis, coordinator queries, diagnostic patterns
Zephyr OPS.md references Iris OPS.md for shared infrastructure commands — read Iris first when debugging zephyr jobs on Iris.
Skills are task-focused playbooks in .agents/skills/ (also accessible as
.claude/skills/). Before starting any non-trivial task, check whether a
matching skill exists by scanning the skill descriptions in your system
prompt. If a skill matches, invoke it via the Skill tool — do not skip it in
favor of ad-hoc commands.
# Lint and format
./infra/pre-commit.py --all-files --fix
- `./infra/pre-commit.py` is the required lint entry point for this repo.
- Do not replace it with `uv run pre-commit ...`!
# Type checking (also done by pre-commit.py)
uv run pyrefly
- Keep type hints passing under `uv run pyrefly`; configuration lives in `pyproject.toml`.- Python >=3.11. Use
uv runfor entry points; fall back to.venv/bin/pythonif needed. - NEVER stop, restart, or bounce an Iris cluster unless the user gives express permission.
- In general, never read or write large amounts of data across GCS regions or to the open internet; storage and bandwidth are major cost drivers for this project.
- do not use storage transfer service to move files from one region to another unless the user says "I personally will write grants for Percy to pay for this"
- NEVER SAY "You're absolutely right!"
- NEVER credit yourself in commits.
- When an agent creates a PR or issue, add the
agent-generatedlabel. - Agent comments on PRs/issues must begin with
🤖unless the exact text was explicitly approved by the user. - When using
ghto inspect issues or PRs, prefer--json <fields>or explicit narrow flags such as--comments; avoid plaingh issue view/gh pr view, which can fail on this repo because GitHub classic project fields are deprecated.
- All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps. No
TYPE_CHECKINGguards — fix cycles structurally via protocols. - Prefer top-level functions over classes when code does not mutate shared state. Reduce deep inheritance hierarchies.
- Use early returns to reduce nesting.
- Document public APIs with concise Google-style docstrings. Skip docstrings on trivial functions with clear names.
- Prefer
dataclasses.replaceover mutating config arguments in-place. - Prefer logging over
print(except in scripts and debugging). - Resolve environment-dependent defaults once and fail fast on unknown inputs.
- No ad-hoc compatibility hacks (
hasattr(m, "old_attr")); update code consistently. - Prefer small concrete helpers over abstraction that adds indirection without reuse. Start simple; abstract only under real pressure.
- Delete dead code: unused parameters, stale options, old experiments.
- Top-level constants for magic strings/numbers.
- Separate computation from I/O (split compute from upload/write).
- Use context managers for resource lifecycle.
- No
*_utils.py— use descriptive names liketext_cleaning.py. - Function names should reflect return types (
probe_task→task_status). - No
_ssuffix for seconds (assumed in this codebase). No abbreviations likeexe— useexecor full words.
- Dataclass/namedtuple over raw dicts.
StrEnumover string keys. - Use
Protocolfor decoupling; avoid hard-coupling to concrete types. - Avoid
X | strunions that requireisinstancechecks — pick one input type. - Replace compound booleans encoding state with an enum.
- No
default_*wrappers that obscure underlying mechanisms. - Force explicit specification of critical parameters (no silent defaults).
- Centralize defaults in one canonical location.
- Prefer explicit constructor/config parameters over env vars.
- Composition over inheritance: embed sub-configs, don't subclass.
- Accept only what's necessary. Replace boolean flags with meaningful parameters (e.g.,
num_workers: intinstead ofparallel: bool). - Use separate classes over boolean flags for variant behavior (
NativeVllm/DockerVllm, notVllm(docker=True)). - Normalize inputs to a standard format once at the boundary, not throughout.
- Let exceptions propagate by default.
- Only catch to add meaningful context and re-raise, or to intentionally alter control flow.
- NEVER swallow exceptions unless specifically requested.
- Assert liberally; prefer
raise ValueErrorover silent fallbacks.
- Keep MkDocs content in sync with code. Use Markdown and mkdocs-style links.
- Write docs that stand alone without conversational context.
NO BACKWARD COMPATIBILITY: Update all call sites instead. Only add compatibility shims if the user explicitly requests it.
- Write comments for module/class-level behavior or subtle logic. Do not restate the code.
- Delete stale comments immediately on discovery.
- Inline comments to clarify non-obvious boolean arguments.
Watch for and eliminate these patterns in generated code:
- Over-protective try/except and defensive None checks
- Tautological tests (type exists, constant has value)
- Verbose/redundant docstrings and
__all__in__init__.py - Boolean dispatch instead of separate classes
- Environment variables instead of explicit parameters
- Produce detailed plans with code snippets. Ask questions up front instead of guessing.
- When a request is too large for one pass, capture a plan in
.agents/projects/before pausing.
Before writing any utility function, helper, or data structure:
- Search the codebase for existing implementations
- Check subproject utils:
lib/marin/src/marin/,lib/iris/src/iris/,lib/levanter/ - Check
pyproject.tomlfor available third-party packages before adding new ones
If a suitable implementation exists, use it. Do not create parallel implementations.
Dependency direction: {iris, haliax} → {levanter, zephyr} → marin. Each layer may only import from layers to its left. Never introduce reverse dependencies (e.g., levanter importing from marin).
- Always fix tests you broke. Do not relax tolerances or hack around failures.
- Prefer integration-style tests that validate externally-observable behavior.
- Do not write tautological tests: tests must fail if behavior is wrong, not just if implementation changes.
- Use pytest fixtures and parameterization to avoid duplication.
- Prefer top-level
def test_*with fixtures over test classes. - Search for existing test files before creating new ones. Extend existing files first.
- No mocks unless testing I/O boundaries (network, filesystem). Test against real behavior.
- No
time.sleep()in tests — injectnow=time.time()or mock time instead. - Mock at boundaries (e.g., wandb), not internal logger output.