[feat] Add fastvideo-port agent skill with Cosmos 2.5 lessons #1241
Mister-Raggs wants to merge 4 commits into hao-ai-lab:main
Conversation
Skills:
- .agents/skills/fastvideo-port/SKILL.md: 9-phase port workflow with credentials checklist, HF repo timing fix, alignment settings table, and daVinci-MagiHuman as worked example
- recon.py: GitHub API recon -> structured JSON without cloning
- auto_mapper.py: first-pass param_names_mapping via string similarity
- alignment_test.py: per-component numerical parity template (atol=1e-4)
- ssim_test.py: SSIM regression test template vs official reference video
- eval_harness.py: diffs agent port vs ground truth, emits lesson candidates

Lessons from the Cosmos 2.5 port (8, generalised for any FastVideo port):
- ReplicatedLinear required for LoRA injection (critical)
- fp32/bf16 dtype mismatch at projection layers (critical)
- Hardcoded T5-XXL shape in utils.py breaks non-Wan encoders (important)
- Preprocessing entrypoint not always ported to the new system (important)
- Multimodal VLM processor needs inner tokenizer unwrap (important)
- bf16 embeddings need .float() before .numpy() (important)
- Scheduler shift overwritten by base class train() (important)
- AutoTokenizer fails on multimodal processors (important)
- add check_prereqs.sh: validates GPU, env vars, HF tokens, gating status, weights on disk, and pre-commit before any port work starts
- add a Recommended patterns table: opinionated defaults for training pipeline subclassing, attention layers, ReplicatedLinear, port order, and dtype strategy, with explicit reasoning for each
- add a Hard stop conditions table: 6 observable failure signals that indicate a root-cause problem; the agent must stop and fix before continuing rather than debugging symptoms further downstream
- wire check_prereqs.sh into Step 0 as the first action
- generalise two lesson Prevention sections (autotokenizer, lora)
Code Review
This pull request introduces the 'fastvideo-port' skill, which establishes a standardized workflow and a suite of utility scripts for porting video generation models. The contribution includes documented lessons from previous ports, a comprehensive skill guide, and tools for reconnaissance, weight mapping, and numerical alignment. Review feedback identified a critical command injection vulnerability in the prerequisite script, logic errors in the reconnaissance script regarding URL parsing and branch detection, and fragile parsing logic in the evaluation harness.
```bash
GATED=$(python - <<PY 2>/dev/null || echo "error"
from huggingface_hub import model_info
try:
    info = model_info("$hf_id", token="${HF_TOKEN:-None}")
    gated = getattr(info, "gated", False)
    print("gated" if gated else "open")
except Exception as e:
    print(f"error: {e}")
PY
)
```
The embedded Python script is vulnerable to command injection because it uses shell variable expansion ($hf_id) directly inside a Python string literal. A maliciously crafted model ID could execute arbitrary Python code. Additionally, passing the string "None" as a token will cause authentication failures when HF_TOKEN is unset. It is safer to pass these values as environment variables and access them via os.environ in Python, while using a quoted heredoc ('PY') to prevent shell expansion.
Suggested change:

```diff
-GATED=$(python - <<PY 2>/dev/null || echo "error"
-from huggingface_hub import model_info
-try:
-    info = model_info("$hf_id", token="${HF_TOKEN:-None}")
-    gated = getattr(info, "gated", False)
-    print("gated" if gated else "open")
-except Exception as e:
-    print(f"error: {e}")
-PY
+GATED=$(HF_ID="$hf_id" python - <<'PY' 2>/dev/null || echo "error"
+import os
+from huggingface_hub import model_info
+try:
+    hf_id = os.environ.get("HF_ID")
+    token = os.environ.get("HF_TOKEN")
+    info = model_info(hf_id, token=token)
+    gated = getattr(info, "gated", False)
+    print("gated" if gated else "open")
+except Exception as e:
+    print(f"error: {e}")
+PY
```
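As a minimal, self-contained illustration of why the quoted heredoc plus environment variable matters (the "model id" below is fabricated): data passed through the environment comes back verbatim instead of being interpreted as program text.

```python
import os
import subprocess
import sys

# Hypothetical malicious "model id": spliced into Python source by an
# unquoted heredoc it would execute; passed through the environment it
# stays inert data.
hf_id = 'x"); __import__("os").system("echo pwned"); ("'

out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['HF_ID'])"],
    env={**os.environ, "HF_ID": hf_id},
    capture_output=True, text=True, check=True,
).stdout.strip()

print(out)  # the payload comes back verbatim, never executed
```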
```python
except Exception:
    # Try HEAD branch
    try:
        url2 = url.replace(f"/{ref}/", "/HEAD/")
```
The fallback to /HEAD/ in the raw.githubusercontent.com URL is invalid and will result in a 404 error. This service does not resolve HEAD as a symbolic reference for the default branch; it requires a specific branch name (e.g., main or master). To support repositories with different default branch names, you should query the repository metadata via the GitHub API to determine the correct default_branch before attempting to fetch raw files.
```python
m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
if not m:
    raise ValueError(f"Cannot parse GitHub URL: {url}")
return m.group(1).rstrip("/")
```
The regular expression used to parse the repository URL incorrectly excludes dots (\.), which are valid characters in GitHub repository names. This will cause the script to truncate names like owner/repo.name to owner/repo. A more robust approach would be to match all non-whitespace and non-separator characters and then explicitly handle the optional .git suffix.
Suggested change:

```diff
-m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
-if not m:
-    raise ValueError(f"Cannot parse GitHub URL: {url}")
-return m.group(1).rstrip("/")
+m = re.search(r"github\.com[:/]([^/\s?#]+/[^/\s?#]+)", url)
+if not m:
+    raise ValueError(f"Cannot parse GitHub URL: {url}")
+repo = m.group(1)
+return repo.removesuffix(".git").rstrip("/")
```
```python
if "cross_attn" in source or "cross_attention" in source:
    if "self_attn" in source or "self_attention" in source:
        result["attention_type"] = "self+cross"
    else:
        result["attention_type"] = "cross_attn"
elif "self_attn" in source or "SelfAttention" in source:
    result["attention_type"] = "self_attn"
```
The heuristics for detecting attention types are case-sensitive and may miss implementations using CamelCase (e.g., CrossAttention). Converting the source to lowercase before performing these checks would improve reliability.
Suggested change:

```diff
-if "cross_attn" in source or "cross_attention" in source:
-    if "self_attn" in source or "self_attention" in source:
-        result["attention_type"] = "self+cross"
-    else:
-        result["attention_type"] = "cross_attn"
-elif "self_attn" in source or "SelfAttention" in source:
-    result["attention_type"] = "self_attn"
+source_lc = source.lower()
+if "cross_attn" in source_lc or "cross_attention" in source_lc:
+    if "self_attn" in source_lc or "self_attention" in source_lc:
+        result["attention_type"] = "self+cross"
+    else:
+        result["attention_type"] = "cross_attn"
+elif "self_attn" in source_lc or "selfattention" in source_lc:
+    result["attention_type"] = "self_attn"
```
```python
        "what_happened": f"Agent implementation differs from ground truth in `{current_file}`.",
        "diff_preview": "\n".join(current_hunk[:20]),
    })
current_file = line.split(" b/")[-1] if " b/" in line else line
```
Pull request overview
Adds a new .agents/skills/fastvideo-port skill intended to automate recurring/mechanical parts of porting new diffusion video models into FastVideo, extracted/generalized from the Cosmos 2.5 port workflow.
Changes:
- Introduces a 9-phase porting workflow document (SKILL.md) plus templates/scripts for prereq checks, recon, weight key auto-mapping, alignment testing, SSIM testing, and an eval harness.
- Adds a prereqs checker (check_prereqs.sh) and GitHub recon script (recon.py) to reduce setup friction before porting.
- Adds eight "lessons learned" markdowns from the Cosmos 2.5 port to guide future ports.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `.agents/skills/index.jsonl` | Registers the new fastvideo-port skill in the skill index. |
| `.agents/skills/fastvideo-port/SKILL.md` | Defines the end-to-end port workflow, checklists, and references. |
| `.agents/skills/fastvideo-port/scripts/check_prereqs.sh` | Adds a pre-port environment/credential/weights sanity check. |
| `.agents/skills/fastvideo-port/scripts/recon.py` | Adds GitHub API-based repo "recon" to generate structured JSON guidance. |
| `.agents/skills/fastvideo-port/scripts/auto_mapper.py` | Adds a state-dict key similarity tool to draft param_names_mapping rules. |
| `.agents/skills/fastvideo-port/scripts/alignment_test.py` | Provides a parity/alignment test template for component-level numerical checks. |
| `.agents/skills/fastvideo-port/scripts/ssim_test.py` | Provides an SSIM regression test template to gate end-to-end parity. |
| `.agents/skills/fastvideo-port/scripts/eval_harness.py` | Adds a harness to compare agent-generated ports against a ground-truth port and draft lesson candidates. |
| `.agents/lessons/2026-04-09_scheduler-shift-not-wired-to-training.md` | Documents a training scheduler shift pitfall from Cosmos 2.5. |
| `.agents/lessons/2026-04-09_preprocessing-entrypoint-not-always-ported.md` | Documents the preprocessing entrypoint mismatch pitfall. |
| `.agents/lessons/2026-04-09_multimodal-processor-needs-inner-tokenizer.md` | Documents the multimodal processor tokenizer unwrapping requirement. |
| `.agents/lessons/2026-04-09_lora-requires-replicated-linear.md` | Documents the LoRA injection requirement for ReplicatedLinear. |
| `.agents/lessons/2026-04-09_hardcoded-text-encoder-shape-in-utils.md` | Documents the hardcoded text-encoder shape pitfall in dataset utils. |
| `.agents/lessons/2026-04-09_fp32-bf16-dtype-mismatch-at-projection.md` | Documents the fp32/bf16 projection dtype mismatch pitfall. |
| `.agents/lessons/2026-04-09_bf16-embeddings-need-float-before-numpy.md` | Documents the bf16-to-numpy conversion pitfall in preprocessing. |
| `.agents/lessons/2026-04-09_autotokenizer-fails-on-multimodal-processor.md` | Documents the AutoTokenizer failure mode on multimodal processors. |
```python
    else:
        captures.append(re.escape(op))
        replacements.append(fp)
pat = r"^" + r"\." .join(captures) + r"$"
```
The stray whitespace in `r"\." .join(captures)` reads like a typo: Python does parse it (whitespace is allowed before the attribute dot), but it obscures that this is a method call on the string literal and invites confusion under pre-commit/ruff review. Remove the space so it reads as a normal method call.
Suggested change:

```diff
-pat = r"^" + r"\." .join(captures) + r"$"
+pat = r"^" + r"\.".join(captures) + r"$"
```
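For reference, the corrected line builds an anchored regex over a dotted parameter name; a tiny sketch with invented key parts:

```python
import re

# Each literal part is escaped, then the parts are rejoined with a
# literal dot and anchored at both ends, so only the exact dotted key
# matches.
captures = [re.escape(p) for p in ["blocks", "0", "attn", "to_q"]]
pat = r"^" + r"\.".join(captures) + r"$"

print(pat)  # ^blocks\.0\.attn\.to_q$
```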
```python
    return run(["git", "diff", f"{base}...{head}", "--", "fastvideo/", "examples/", "tests/"],
               cwd=repo)


def get_changed_files(repo: str, base: str, head: str) -> list[str]:
    out = run(["git", "diff", "--name-only", f"{base}...{head}"], cwd=repo)
```
git diff {base}...{head} uses the three-dot form (merge-base to head), which can hide changes that exist only on base (agent branch). For an eval harness that compares agent vs ground truth, use a two-dot diff (or generate diffs in both directions) so discrepancies aren’t missed.
Suggested change:

```diff
-    return run(["git", "diff", f"{base}...{head}", "--", "fastvideo/", "examples/", "tests/"],
-               cwd=repo)
+    return run(["git", "diff", f"{base}..{head}", "--", "fastvideo/", "examples/", "tests/"],
+               cwd=repo)


 def get_changed_files(repo: str, base: str, head: str) -> list[str]:
-    out = run(["git", "diff", "--name-only", f"{base}...{head}"], cwd=repo)
+    out = run(["git", "diff", "--name-only", f"{base}..{head}"], cwd=repo)
```
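The difference is easy to demonstrate in a throwaway repo (requires a git binary; branch names here are invented): a change that exists only on the ground-truth branch shows up in the two-dot diff but not in the three-dot one.

```python
import os
import subprocess
import tempfile


def git(*args: str, cwd: str) -> str:
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout


with tempfile.TemporaryDirectory() as repo:
    git("init", "-q", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)
    git("config", "user.name", "demo", cwd=repo)
    with open(os.path.join(repo, "a.txt"), "w") as f:
        f.write("base\n")
    git("add", ".", cwd=repo)
    git("commit", "-qm", "base", cwd=repo)

    # Ground-truth branch edits a.txt.
    git("checkout", "-qb", "truth", cwd=repo)
    with open(os.path.join(repo, "a.txt"), "w") as f:
        f.write("truth version\n")
    git("commit", "-aqm", "truth change", cwd=repo)

    # Agent branch forks from the base commit and only adds b.txt.
    git("checkout", "-qb", "agent", "HEAD~1", cwd=repo)
    with open(os.path.join(repo, "b.txt"), "w") as f:
        f.write("agent file\n")
    git("add", ".", cwd=repo)
    git("commit", "-qm", "agent change", cwd=repo)

    three_dot = git("diff", "truth...agent", cwd=repo)  # merge-base..agent
    two_dot = git("diff", "truth..agent", cwd=repo)     # truth tip vs agent tip

    # The truth-only edit to a.txt is invisible to the three-dot form.
    print("a.txt in three-dot:", "a.txt" in three_dot)  # False
    print("a.txt in two-dot:", "a.txt" in two_dot)      # True
```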
```python
import re
import sys
from urllib.request import urlopen, Request
from urllib.error import HTTPError
```
Unused import HTTPError will fail ruff (F401) since .agents/ is linted by pre-commit. Remove it or catch HTTPError explicitly in the request helpers.
Suggested change:

```diff
-from urllib.error import HTTPError
```
```python
import os
from pathlib import Path

import numpy as np
import pytest
```
os and numpy are currently unused (the only np usage is in commented code), which will fail ruff (F401) under pre-commit. Remove the unused imports, or restructure the template so the imports are actually referenced.
```bash
# Hard failures (exit 1): missing GPU, uv not installed, FastVideo not installed
# Warnings (continue): missing tokens, weights not downloaded yet
```
Header comment says missing uv is a hard failure, but the script only issues a warning when uv is absent. Either change the comment to match behavior or make missing uv increment FAIL and exit non-zero.
Suggested change:

```diff
-# Hard failures (exit 1): missing GPU, uv not installed, FastVideo not installed
-# Warnings (continue): missing tokens, weights not downloaded yet
+# Hard failures (exit 1): missing GPU, FastVideo not installed
+# Warnings (continue): uv not installed, missing tokens, weights not downloaded yet
```
```python
for line in diff.splitlines():
    if line.startswith("diff --git"):
        if current_file and current_hunk:
            hunk_text = "\n".join(current_hunk)
            if len(current_hunk) > 5:  # ignore trivial diffs
```
extract_lesson_candidates() never emits a candidate for the final file in the diff because it only flushes current_hunk when it sees the next diff --git header. Add a flush after the loop to process the last accumulated hunk.
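A standalone sketch of the fix (the helper name is mine), which also parses the `diff --git` header with a regex rather than splitting on `" b/"`:

```python
import re


def extract_hunks(diff: str) -> dict[str, str]:
    """Group diff body lines by file; flush after the loop so the final
    file's hunk is not dropped."""
    hunks: dict[str, str] = {}
    current_file, current = "", []

    def flush():
        if current_file and current:
            hunks[current_file] = "\n".join(current)

    for line in diff.splitlines():
        if line.startswith("diff --git"):
            flush()
            m = re.search(r" b/(\S+)$", line)
            current_file = m.group(1) if m else line
            current = []
        else:
            current.append(line)
    flush()  # emit the last accumulated hunk
    return hunks
```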
```python
    return candidates


def format_lesson_md(candidate: dict, model_name: str) -> str:
```
format_lesson_md() is annotated as returning str, but it actually returns a (filename, body) tuple. This will fail mypy; update the return type annotation to a tuple type.
Suggested change:

```diff
-def format_lesson_md(candidate: dict, model_name: str) -> str:
+def format_lesson_md(candidate: dict, model_name: str) -> tuple[str, str]:
```
```python
m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
if not m:
    raise ValueError(f"Cannot parse GitHub URL: {url}")
return m.group(1).rstrip("/")
```
parse_repo_url() currently disallows . in repo names ([^/\s\.]+), so URLs like https://github.com/org/my.repo will be parsed incorrectly. Consider allowing dots in the repo name and separately stripping a trailing .git if present.
Suggested change:

```diff
-m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
-if not m:
-    raise ValueError(f"Cannot parse GitHub URL: {url}")
-return m.group(1).rstrip("/")
+m = re.search(r"github\.com[:/]([^/]+/[^/\s]+)", url)
+if not m:
+    raise ValueError(f"Cannot parse GitHub URL: {url}")
+repo = m.group(1).rstrip("/")
+if repo.endswith(".git"):
+    repo = repo[:-4]
+return repo
```
```python
def main():
    parser = argparse.ArgumentParser(description="Recon a GitHub repo for FastVideo porting")
    parser.add_argument("repo_url", help="GitHub URL, e.g. https://github.com/org/repo")
    parser.add_argument("--output", "-o", default=None, help="JSON output file (default: stdout)")
    parser.add_argument("--token", default=None, help="GitHub personal access token (optional)")
```
The docs/check_prereqs.sh flow expects GITHUB_TOKEN to be set, but recon.py only accepts --token and doesn’t default to the environment variable. Consider defaulting --token to os.environ.get('GITHUB_TOKEN') (or checking it when --token is omitted) so the advertised workflow works without extra flags.
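A sketch of the suggested default (argument wiring only; the surrounding recon.py code is not reproduced, and the token value below is a placeholder):

```python
import argparse
import os

# Demo value only; in real use the caller exports GITHUB_TOKEN beforehand.
os.environ.setdefault("GITHUB_TOKEN", "ghp_example")

parser = argparse.ArgumentParser(description="Recon a GitHub repo")
parser.add_argument(
    "--token",
    default=os.environ.get("GITHUB_TOKEN"),
    help="GitHub personal access token (defaults to $GITHUB_TOKEN)",
)

args = parser.parse_args([])  # no --token flag: falls back to the env var
print(args.token)
```

Note that the default is captured when `add_argument` runs, so the environment variable must be set before the parser is built.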
```markdown
- FastVideo repo cloned; `uv pip install -e .[dev]` complete.
- `official_weights/<model_name>/` populated (download manually — large files
  time out in agents; see Step 0).
- Official repo cloned under `FastVideo/<model_name>/` (Step 0).
```
Prereqs say the official repo should be cloned under FastVideo/<model_name>/, but Step 0’s command clones it into <model_name>/ in the current directory. Please make these paths consistent so users/agents don’t put the repo in two different places.
Suggested change:

```diff
-- Official repo cloned under `FastVideo/<model_name>/` (Step 0).
+- Official repo cloned locally in the location specified in Step 0.
```
- check_prereqs.sh: fix command injection in the HF gating check (use an env var plus a quoted heredoc so $hf_id is never expanded inside Python)
- recon.py: fix the /HEAD/ fallback (query the API for default_branch instead); fix the URL regex to allow dots in repo names (.git suffix stripped); lowercase the source before the attention-type heuristic checks
- eval_harness.py: use a regex to parse the "diff --git" header instead of the fragile split on " b/"
Purpose
Adds a .agents/ skill for porting new diffusion models to FastVideo automatically. Extracted from the Cosmos 2.5 port (PR #1227) and generalised to work for any future model. Goal is >90% port work automation so an agent can handle the mechanical phases without hand-holding.
Changes
- Added .agents/skills/fastvideo-port/SKILL.md: 9-phase port workflow (recon → model impl → alignment → pipeline → SSIM → training), with recommended patterns table, hard stop conditions, credentials checklist, and alignment settings checklist
- Added scripts/check_prereqs.sh: validates GPU/VRAM, attention backend, HF tokens, gating status, and weights on disk before starting a port
- Added scripts/recon.py: GitHub API recon (no clone needed); outputs structured JSON with architecture, components, and HF repo status
- Added scripts/auto_mapper.py: first-pass param_names_mapping via string similarity between official and FastVideo state dicts
- Added scripts/alignment_test.py: per-component numerical parity template (atol=1e-4, SEED=42)
- Added scripts/ssim_test.py: SSIM regression test template vs reference video (gate: mean ≥ 0.85)
- Added scripts/eval_harness.py: diffs agent port branch vs ground truth, outputs lesson candidates
- Added 8 lessons in .agents/lessons/ from the Cosmos 2.5 port covering critical failure patterns
Test Plan
Validate the prereqs script runs without error:

```bash
bash .agents/skills/fastvideo-port/scripts/check_prereqs.sh --model test-model
```

Validate recon.py produces valid JSON against a public repo:

```bash
python .agents/skills/fastvideo-port/scripts/recon.py \
    --repo GAIR-NLP/daVinci-MagiHuman --output /tmp/davinci_recon.json
python -c "import json; d=json.load(open('/tmp/davinci_recon.json')); print(list(d.keys()))"
```
Test Results
Test output
check_prereqs.sh (no GPU on the dev machine, so the GPU check is an expected FAIL):

```text
=================================================
FastVideo Port — Prerequisites Check
[ Environment ]
[OK] fastvideo importable
[OK] uv available
[ GPU ]
[FAIL] No CUDA GPU detected — alignment tests and training require GPU
=================================================
PASS: 2 WARN: 0 FAIL: 1
```

The script correctly exits 1 on the hard failure.