Skip to content

[feat] Add fastvideo-port agent skill with Cosmos 2.5 lessons#1241

Open
Mister-Raggs wants to merge 4 commits intohao-ai-lab:mainfrom
Mister-Raggs:feat/agent-port-skill
Open

[feat] Add fastvideo-port agent skill with Cosmos 2.5 lessons#1241
Mister-Raggs wants to merge 4 commits intohao-ai-lab:mainfrom
Mister-Raggs:feat/agent-port-skill

Conversation

@Mister-Raggs
Copy link
Copy Markdown

Purpose
Adds a .agents/ skill for porting new diffusion models to FastVideo automatically. Extracted from the Cosmos 2.5 port (PR #1227) and generalised to work for any future model. Goal is >90% port work automation so an agent can handle the mechanical phases without hand-holding.

Changes
Added .agents/skills/fastvideo-port/SKILL.md — 9-phase port workflow (recon → model impl → alignment → pipeline → SSIM → training), with recommended patterns table, hard stop conditions, credentials checklist, and alignment settings checklist
Added scripts/check_prereqs.sh — validates GPU/VRAM, attention backend, HF tokens, gating status, and weights on disk before starting a port
Added scripts/recon.py — GitHub API recon (no clone needed), outputs structured JSON with architecture, components, HF repo status
Added scripts/auto_mapper.py — first-pass param_names_mapping via string similarity between official and FastVideo state dicts
Added scripts/alignment_test.py — per-component numerical parity template (atol=1e-4, SEED=42)
Added scripts/ssim_test.py — SSIM regression test template vs reference video (gate: mean ≥ 0.85)
Added scripts/eval_harness.py — diffs agent port branch vs ground truth, outputs lesson candidates
Added 8 lessons in .agents/lessons/ from the Cosmos 2.5 port covering critical failure patterns
Test Plan

Validate prereqs script runs without error

bash .agents/skills/fastvideo-port/scripts/check_prereqs.sh --model test-model

Validate recon.py produces valid JSON against a public repo

python .agents/skills/fastvideo-port/scripts/recon.py
--repo GAIR-NLP/daVinci-MagiHuman --output /tmp/davinci_recon.json
python -c "import json; d=json.load(open('/tmp/davinci_recon.json')); print(list(d.keys()))"
Test Results

Test output

check_prereqs.sh (no GPU on dev machine — expected FAIL for GPU check)

=================================================
FastVideo Port — Prerequisites Check

[ Environment ]
[OK] fastvideo importable
[OK] uv available

[ GPU ]
[FAIL] No CUDA GPU detected — alignment tests and training require GPU

=================================================
PASS: 2 WARN: 0 FAIL: 1

Script correctly exits 1 on hard failure

Checklist I ran pre-commit run --all-files and fixed all issues I added or updated tests for my changes I updated documentation if needed I considered GPU memory impact of my changes

Mister-Raggs and others added 3 commits April 17, 2026 14:13
Skills:
- .agents/skills/fastvideo-port/SKILL.md: 9-phase port workflow with
  credentials checklist, HF repo timing fix, alignment settings table,
  and daVinci-MagiHuman as worked example
- recon.py: GitHub API recon -> structured JSON without cloning
- auto_mapper.py: first-pass param_names_mapping via string similarity
- alignment_test.py: per-component numerical parity template (atol=1e-4)
- ssim_test.py: SSIM regression test template vs official reference video
- eval_harness.py: diffs agent port vs ground truth, emits lesson candidates

Lessons from Cosmos 2.5 port (8, generalised for any FastVideo port):
- ReplicatedLinear required for LoRA injection (critical)
- fp32/bf16 dtype mismatch at projection layers (critical)
- Hardcoded T5-XXL shape in utils.py breaks non-Wan encoders (important)
- Preprocessing entrypoint not always ported to new system (important)
- Multimodal VLM processor needs inner tokenizer unwrap (important)
- bf16 embeddings need .float() before .numpy() (important)
- Scheduler shift overwritten by base class train() (important)
- AutoTokenizer fails on multimodal processors (important)
- add check_prereqs.sh: validates GPU, env vars, HF tokens, gating
  status, weights on disk, and pre-commit before any port work starts
- add Recommended patterns table: opinionated defaults for training
  pipeline subclassing, attention layers, ReplicatedLinear, port order,
  and dtype strategy — with explicit reasoning for each
- add Hard stop conditions table: 6 observable failure signals that
  indicate a root cause problem; agent must stop and fix before continuing
  rather than debugging symptoms further downstream
- wire check_prereqs.sh into Step 0 as the first action
- generalise two lesson Prevention sections (autotokenizer, lora)
Copilot AI review requested due to automatic review settings April 18, 2026 05:54
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Apr 18, 2026

⚠️ PR title format required

Your PR title must start with a type tag in brackets. Examples:

  • [feat] Add new model support
  • [bugfix] Fix VAE tiling corruption
  • [refactor] Restructure training pipeline
  • [perf] Optimize attention kernel
  • [ci] Update test infrastructure
  • [docs] Add inference guide
  • [misc] Clean up configs
  • [new-model] Port Flux2 to FastVideo

Valid tags: feat, feature, bugfix, fix, refactor, perf, ci, doc, docs, misc, chore, kernel, new-model

Please update your PR title and the merge protection check will pass automatically.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Apr 18, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for:

  • #approved-reviews-by>=1
  • check-success=full-suite-passed
  • check-success~=pre-commit
This rule is failing.
  • #approved-reviews-by>=1
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • check-success=fastcheck-passed
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model)\]

@Mister-Raggs Mister-Raggs changed the title Feat/agent port skill [feat] Add fastvideo-port agent skill with Cosmos 2.5 lessonsFeat/agent port skill Apr 18, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the 'fastvideo-port' skill, which establishes a standardized workflow and a suite of utility scripts for porting video generation models. The contribution includes documented lessons from previous ports, a comprehensive skill guide, and tools for reconnaissance, weight mapping, and numerical alignment. Review feedback identified a critical command injection vulnerability in the prerequisite script, logic errors in the reconnaissance script regarding URL parsing and branch detection, and fragile parsing logic in the evaluation harness.

Comment on lines +111 to +119
GATED=$(python - <<PY 2>/dev/null || echo "error"
from huggingface_hub import model_info
try:
info = model_info("$hf_id", token="${HF_TOKEN:-None}")
gated = getattr(info, "gated", False)
print("gated" if gated else "open")
except Exception as e:
print(f"error: {e}")
PY
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The embedded Python script is vulnerable to command injection because it uses shell variable expansion ($hf_id) directly inside a Python string literal. A maliciously crafted model ID could execute arbitrary Python code. Additionally, passing the string "None" as a token will cause authentication failures when HF_TOKEN is unset. It is safer to pass these values as environment variables and access them via os.environ in Python, while using a quoted heredoc ('PY') to prevent shell expansion.

Suggested change
GATED=$(python - <<PY 2>/dev/null || echo "error"
from huggingface_hub import model_info
try:
info = model_info("$hf_id", token="${HF_TOKEN:-None}")
gated = getattr(info, "gated", False)
print("gated" if gated else "open")
except Exception as e:
print(f"error: {e}")
PY
GATED=$(HF_ID="$hf_id" python - <<'PY' 2>/dev/null || echo "error"
import os
from huggingface_hub import model_info
try:
hf_id = os.environ.get("HF_ID")
token = os.environ.get("HF_TOKEN")
info = model_info(hf_id, token=token)
gated = getattr(info, "gated", False)
print("gated" if gated else "open")
except Exception as e:
print(f"error: {e}")
PY

except Exception:
# Try HEAD branch
try:
url2 = url.replace(f"/{ref}/", "/HEAD/")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback to /HEAD/ in the raw.githubusercontent.com URL is invalid and will result in a 404 error. This service does not resolve HEAD as a symbolic reference for the default branch; it requires a specific branch name (e.g., main or master). To support repositories with different default branch names, you should query the repository metadata via the GitHub API to determine the correct default_branch before attempting to fetch raw files.

Comment on lines +113 to +116
m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
if not m:
raise ValueError(f"Cannot parse GitHub URL: {url}")
return m.group(1).rstrip("/")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regular expression used to parse the repository URL incorrectly excludes dots (\.), which are valid characters in GitHub repository names. This will cause the script to truncate names like owner/repo.name to owner/repo. A more robust approach would be to match all non-whitespace and non-separator characters and then explicitly handle the optional .git suffix.

Suggested change
m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
if not m:
raise ValueError(f"Cannot parse GitHub URL: {url}")
return m.group(1).rstrip("/")
m = re.search(r"github\.com[:/]([^/\s?#]+/[^/\s?#]+)", url)
if not m:
raise ValueError(f"Cannot parse GitHub URL: {url}")
repo = m.group(1)
return repo.removesuffix(".git").rstrip("/")

Comment on lines +158 to +164
if "cross_attn" in source or "cross_attention" in source:
if "self_attn" in source or "self_attention" in source:
result["attention_type"] = "self+cross"
else:
result["attention_type"] = "cross_attn"
elif "self_attn" in source or "SelfAttention" in source:
result["attention_type"] = "self_attn"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The heuristics for detecting attention types are case-sensitive and may miss implementations using CamelCase (e.g., CrossAttention). Converting the source to lowercase before performing these checks would improve reliability.

Suggested change
if "cross_attn" in source or "cross_attention" in source:
if "self_attn" in source or "self_attention" in source:
result["attention_type"] = "self+cross"
else:
result["attention_type"] = "cross_attn"
elif "self_attn" in source or "SelfAttention" in source:
result["attention_type"] = "self_attn"
source_lc = source.lower()
if "cross_attn" in source_lc or "cross_attention" in source_lc:
if "self_attn" in source_lc or "self_attention" in source_lc:
result["attention_type"] = "self+cross"
else:
result["attention_type"] = "cross_attn"
elif "self_attn" in source_lc or "selfattention" in source_lc:
result["attention_type"] = "self_attn"

"what_happened": f"Agent implementation differs from ground truth in `{current_file}`.",
"diff_preview": "\n".join(current_hunk[:20]),
})
current_file = line.split(" b/")[-1] if " b/" in line else line
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Parsing the diff header by splitting on " b/" is fragile and will fail if a filename contains that specific sequence. While acceptable for an internal tool, using a regular expression to match the diff --git format would be more robust.

@mergify mergify bot added the type: feat New feature or capability label Apr 18, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new .agents/skills/fastvideo-port skill intended to automate recurring/mechanical parts of porting new diffusion video models into FastVideo, extracted/generalized from the Cosmos 2.5 port workflow.

Changes:

  • Introduces a 9-phase porting workflow document (SKILL.md) plus templates/scripts for prereq checks, recon, weight key auto-mapping, alignment testing, SSIM testing, and an eval harness.
  • Adds a prereqs checker (check_prereqs.sh) and GitHub recon script (recon.py) to reduce setup friction before porting.
  • Adds eight “lessons learned” markdowns from the Cosmos 2.5 port to guide future ports.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
.agents/skills/index.jsonl Registers the new fastvideo-port skill in the skill index.
.agents/skills/fastvideo-port/SKILL.md Defines the end-to-end port workflow, checklists, and references.
.agents/skills/fastvideo-port/scripts/check_prereqs.sh Adds a pre-port environment/credential/weights sanity check.
.agents/skills/fastvideo-port/scripts/recon.py Adds GitHub API-based repo “recon” to generate structured JSON guidance.
.agents/skills/fastvideo-port/scripts/auto_mapper.py Adds a state-dict key similarity tool to draft param_names_mapping rules.
.agents/skills/fastvideo-port/scripts/alignment_test.py Provides a parity/alignment test template for component-level numerical checks.
.agents/skills/fastvideo-port/scripts/ssim_test.py Provides an SSIM regression test template to gate end-to-end parity.
.agents/skills/fastvideo-port/scripts/eval_harness.py Adds a harness to compare agent-generated ports against a ground-truth port and draft lesson candidates.
.agents/lessons/2026-04-09_scheduler-shift-not-wired-to-training.md Documents a training scheduler shift pitfall from Cosmos 2.5.
.agents/lessons/2026-04-09_preprocessing-entrypoint-not-always-ported.md Documents preprocessing entrypoint mismatch pitfall.
.agents/lessons/2026-04-09_multimodal-processor-needs-inner-tokenizer.md Documents multimodal processor tokenizer unwrapping requirement.
.agents/lessons/2026-04-09_lora-requires-replicated-linear.md Documents LoRA injection requirement for ReplicatedLinear.
.agents/lessons/2026-04-09_hardcoded-text-encoder-shape-in-utils.md Documents hardcoded text-encoder shape pitfall in dataset utils.
.agents/lessons/2026-04-09_fp32-bf16-dtype-mismatch-at-projection.md Documents fp32/bf16 projection dtype mismatch pitfall.
.agents/lessons/2026-04-09_bf16-embeddings-need-float-before-numpy.md Documents bf16-to-numpy conversion pitfall in preprocessing.
.agents/lessons/2026-04-09_autotokenizer-fails-on-multimodal-processor.md Documents AutoTokenizer failure mode on multimodal processors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

else:
captures.append(re.escape(op))
replacements.append(fp)
pat = r"^" + r"\." .join(captures) + r"$"
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a Python syntax error here: r"\." .join(captures) has whitespace before the dot, so the file won’t parse and pre-commit/ruff will fail. Remove the space so it becomes a normal method call on the string.

Suggested change
pat = r"^" + r"\." .join(captures) + r"$"
pat = r"^" + r"\.".join(captures) + r"$"

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +85
return run(["git", "diff", f"{base}...{head}", "--", "fastvideo/", "examples/", "tests/"],
cwd=repo)


def get_changed_files(repo: str, base: str, head: str) -> list[str]:
out = run(["git", "diff", "--name-only", f"{base}...{head}"], cwd=repo)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git diff {base}...{head} uses the three-dot form (merge-base to head), which can hide changes that exist only on base (agent branch). For an eval harness that compares agent vs ground truth, use a two-dot diff (or generate diffs in both directions) so discrepancies aren’t missed.

Suggested change
return run(["git", "diff", f"{base}...{head}", "--", "fastvideo/", "examples/", "tests/"],
cwd=repo)
def get_changed_files(repo: str, base: str, head: str) -> list[str]:
out = run(["git", "diff", "--name-only", f"{base}...{head}"], cwd=repo)
return run(["git", "diff", f"{base}..{head}", "--", "fastvideo/", "examples/", "tests/"],
cwd=repo)
def get_changed_files(repo: str, base: str, head: str) -> list[str]:
out = run(["git", "diff", "--name-only", f"{base}..{head}"], cwd=repo)

Copilot uses AI. Check for mistakes.
import re
import sys
from urllib.request import urlopen, Request
from urllib.error import HTTPError
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import HTTPError will fail ruff (F401) since .agents/ is linted by pre-commit. Remove it or catch HTTPError explicitly in the request helpers.

Suggested change
from urllib.error import HTTPError

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +21
import os
from pathlib import Path

import numpy as np
import pytest
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os and numpy are currently unused (the only np usage is in commented code), which will fail ruff (F401) under pre-commit. Remove the unused imports, or restructure the template so the imports are actually referenced.

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +11
# Hard failures (exit 1): missing GPU, uv not installed, FastVideo not installed
# Warnings (continue): missing tokens, weights not downloaded yet
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Header comment says missing uv is a hard failure, but the script only issues a warning when uv is absent. Either change the comment to match behavior or make missing uv increment FAIL and exit non-zero.

Suggested change
# Hard failures (exit 1): missing GPU, uv not installed, FastVideo not installed
# Warnings (continue): missing tokens, weights not downloaded yet
# Hard failures (exit 1): missing GPU, FastVideo not installed
# Warnings (continue): uv not installed, missing tokens, weights not downloaded yet

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +131
for line in diff.splitlines():
if line.startswith("diff --git"):
if current_file and current_hunk:
hunk_text = "\n".join(current_hunk)
if len(current_hunk) > 5: # ignore trivial diffs
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_lesson_candidates() never emits a candidate for the final file in the diff because it only flushes current_hunk when it sees the next diff --git header. Add a flush after the loop to process the last accumulated hunk.

Copilot uses AI. Check for mistakes.
return candidates


def format_lesson_md(candidate: dict, model_name: str) -> str:
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format_lesson_md() is annotated as returning str, but it actually returns a (filename, body) tuple. This will fail mypy; update the return type annotation to a tuple type.

Suggested change
def format_lesson_md(candidate: dict, model_name: str) -> str:
def format_lesson_md(candidate: dict, model_name: str) -> tuple[str, str]:

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +116
m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
if not m:
raise ValueError(f"Cannot parse GitHub URL: {url}")
return m.group(1).rstrip("/")
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_repo_url() currently disallows . in repo names ([^/\s\.]+), so URLs like https://github.com/org/my.repo will be parsed incorrectly. Consider allowing dots in the repo name and separately stripping a trailing .git if present.

Suggested change
m = re.search(r"github\.com[:/]([^/]+/[^/\s\.]+)", url)
if not m:
raise ValueError(f"Cannot parse GitHub URL: {url}")
return m.group(1).rstrip("/")
m = re.search(r"github\.com[:/]([^/]+/[^/\s]+)", url)
if not m:
raise ValueError(f"Cannot parse GitHub URL: {url}")
repo = m.group(1).rstrip("/")
if repo.endswith(".git"):
repo = repo[:-4]
return repo

Copilot uses AI. Check for mistakes.
Comment on lines +246 to +250
def main():
parser = argparse.ArgumentParser(description="Recon a GitHub repo for FastVideo porting")
parser.add_argument("repo_url", help="GitHub URL, e.g. https://github.com/org/repo")
parser.add_argument("--output", "-o", default=None, help="JSON output file (default: stdout)")
parser.add_argument("--token", default=None, help="GitHub personal access token (optional)")
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs/check_prereqs.sh flow expects GITHUB_TOKEN to be set, but recon.py only accepts --token and doesn’t default to the environment variable. Consider defaulting --token to os.environ.get('GITHUB_TOKEN') (or checking it when --token is omitted) so the advertised workflow works without extra flags.

Copilot uses AI. Check for mistakes.
- FastVideo repo cloned; `uv pip install -e .[dev]` complete.
- `official_weights/<model_name>/` populated (download manually — large files
time out in agents; see Step 0).
- Official repo cloned under `FastVideo/<model_name>/` (Step 0).
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prereqs say the official repo should be cloned under FastVideo/<model_name>/, but Step 0’s command clones it into <model_name>/ in the current directory. Please make these paths consistent so users/agents don’t put the repo in two different places.

Suggested change
- Official repo cloned under `FastVideo/<model_name>/` (Step 0).
- Official repo cloned locally in the location specified in Step 0.

Copilot uses AI. Check for mistakes.
@Mister-Raggs Mister-Raggs changed the title [feat] Add fastvideo-port agent skill with Cosmos 2.5 lessonsFeat/agent port skill [feat] Add fastvideo-port agent skill with Cosmos 2.5 lessons Apr 19, 2026
- check_prereqs.sh: fix command injection in HF gating check (use env
  var + quoted heredoc so $hf_id is never expanded inside Python)
- recon.py: fix /HEAD/ fallback (query API for default_branch instead);
  fix URL regex to allow dots in repo names (.git suffix stripped);
  lowercase source before attention-type heuristic checks
- eval_harness.py: use regex to parse diff --git header instead of
  fragile split-on-" b/"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: feat New feature or capability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants