CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Repository Overview

BioNeMo Framework is a comprehensive suite for building and training biological foundation models at scale. The repository is split into three main areas:

bionemo-recipes/ - Lightweight, self-contained training recipes using FSDP (Fully Sharded Data Parallel) with TransformerEngine and megatron-FSDP
sub-packages/ - 5D parallel models (tensor/pipeline/context parallel) using NeMo/Megatron-Core, and dataloading/processing tools
3rdparty/ - Git submodules for NeMo and Megatron-LM dependencies

Code Architecture

bionemo-recipes Structure

The recipes directory contains two types of components:

Models (bionemo-recipes/models/): HuggingFace-compatible PreTrainedModel classes with TransformerEngine layers. These are:

Distributed via Hugging Face Hub (e.g., nvidia/esm2_t48_15B_UR50D)
Drop-in replacements for standard transformers, compatible with AutoModel.from_pretrained()
Each model includes: conversion utilities (HF ↔ TE), golden value tests, checkpoint export scripts, and open-source license

Recipes (bionemo-recipes/recipes/): Self-contained Docker environments demonstrating training patterns. Each recipe:

Is completely isolated with no shared dependencies between recipes
Contains everything needed: Dockerfile, training scripts, Hydra configs, tests, sample data
Prioritizes KISS (Keep It Simple) over DRY - code duplication is preferred for clarity
Follows naming: {model_name}_{framework}_{features}/ (e.g., esm2_native_te/)

Essential Commands

Pre-commit and Linting

CRITICAL: Always run pre-commit hooks after making changes:

# After editing files
pre-commit run --all-files
# Or for modified files only
pre-commit run

Do not copy license headers from other files, instead allow the license-check.py script to add the license header during pre-commit to ensure the proper year is used.

Pre-commit includes:

Ruff linting/formatting (line-length: 119, Google-style docstrings)
Markdown formatting (mdformat)
License header checks
Trailing whitespace/EOF fixes
YAML validation
Secret detection

If pre-commit fails, fix issues before considering task complete. Tasks are NOT complete until all linter errors are resolved and pre-commit passes.

Testing

For bionemo-recipes:

# Test a specific recipe/model inside the devcontainer
cd bionemo-recipes/recipes/{recipe_name}  # or models/{model_name}
pytest -v .

Development Workflow

Working on bionemo-recipes

Navigate to the recipe/model: cd bionemo-recipes/recipes/{name} or cd bionemo-recipes/models/{name}
Make changes - remember each recipe is self-contained
Test locally inside the devcontainer:
```
pytest -v .
```
Run pre-commit: pre-commit run --files $(git ls-files -m)

Code Quality Standards

Line length: 119 characters
Docstrings: Google-style (pydocstyle convention)
Import sorting: isort configuration (2 lines after imports)
Linting: Ruff for Python, Pyright for type checking
Test files and __init__.py: Have relaxed linting rules as configured in pyproject.toml

Key Configuration Files

.pre-commit-config.yaml: Pre-commit hook configuration
.cursorrules: Cursor AI coding guidelines
pyproject.toml (per sub-package): Individual package configuration

Important Patterns

bionemo-recipes Philosophy

Self-contained: No cross-recipe dependencies, no imports from other recipes
Educational: Code is documentation - prioritize readability over abstraction
KISS over DRY: Duplicate code if it improves clarity
One concept per recipe: Don't try to demonstrate every feature in one script

Model Conversion Pattern (bionemo-recipes/models)

Models require:

Golden value tests proving TE model matches reference model
Bidirectional conversion functions: convert_hf_to_te() and convert_te_to_hf()
Export script (export.py) bundling all files for Hugging Face Hub
Open-source license

Pre-commit Failures

Fix all issues before committing
Ruff will auto-fix many issues - verify fixes are appropriate
Some files have per-file ignores in pyproject.toml - respect those

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Repository Overview

Code Architecture

bionemo-recipes Structure

Essential Commands

Pre-commit and Linting

Testing

Development Workflow

Working on bionemo-recipes

Code Quality Standards

Key Configuration Files

Important Patterns

bionemo-recipes Philosophy

Model Conversion Pattern (bionemo-recipes/models)

Pre-commit Failures

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Repository Overview

Code Architecture

bionemo-recipes Structure

Essential Commands

Pre-commit and Linting

Testing

Development Workflow

Working on bionemo-recipes

Code Quality Standards

Key Configuration Files

Important Patterns

bionemo-recipes Philosophy

Model Conversion Pattern (bionemo-recipes/models)

Pre-commit Failures