[feat] Add Cosmos 2.5 T2W training pipeline (LoRA + full fine-tune) #1227
Mister-Raggs wants to merge 1 commit into hao-ai-lab:main
Conversation
New files:
- `fastvideo/training/cosmos2_5_training_pipeline.py`: `Cosmos25TrainingPipeline` subclassing `TrainingPipeline`. Handles flow matching (shift=5.0), 18-channel input (16 latent channels + condition/padding masks), Reason1 100352-dim embeddings, and skips latent normalisation (it is applied inside the VAE encoder).
- `examples/training/finetune/cosmos2_5/`: README, preprocessing script, full fine-tune script, LoRA fine-tune script, and `validation.json`.

Bug fixes:
- `cosmos2_5.py`: Replace `nn.Linear` with `ReplicatedLinear` in all attention projections so LoRA adapters are correctly injected. Fix fp32/bf16 dtype mismatches in `Cosmos25PatchEmbed.proj` and `crossattn_proj` that crashed validation inference.
- `dataset/utils.py`: The CFG dropout zero tensor was hardcoded as (512, 4096) (the T5-XXL shape); use the actual embedding shape from the schema instead.
- `v1_preprocess.py`: The VAE config was overwritten with `WanVAEConfig`, replacing the Cosmos 2.5 VAE config. Set `load_encoder`/`load_decoder` on the existing config.
- `preprocess_pipeline_base.py`: Add `.float()` before `.numpy()` to handle bf16 text embeddings (NumPy does not support bfloat16).
- `text_encoding.py`: Guard against empty strings with the Qwen2 tokenizer; unwrap `Qwen2_5_VLProcessor` to its inner tokenizer for text-only encoding.
- `preprocessing_datasets.py`: Wrap `AutoTokenizer.from_pretrained` in try/except for multimodal processors (Qwen2.5-VL) that raise on plain loading.
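The `dataset/utils.py` fix can be illustrated with a minimal sketch. The helper name is hypothetical; the real code reads the embedding shape from the stored schema rather than hardcoding the T5-XXL shape:

```python
import numpy as np

def cfg_dropout_embedding(embedding: np.ndarray, drop: bool) -> np.ndarray:
    """Zero out a text embedding for classifier-free-guidance dropout.

    The zero tensor must match the stored embedding's shape. A hardcoded
    (512, 4096) zero tensor (the T5-XXL shape) breaks models like
    Cosmos 2.5, whose Reason1 embeddings are 100352-dim.
    """
    if drop:
        # zeros_like picks up shape AND dtype from the actual embedding.
        return np.zeros_like(embedding)
    return embedding
```

This keeps the dropout path shape-agnostic, so any encoder's embeddings round-trip correctly.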
Welcome to FastVideo! Thanks for your first pull request.
How our CI works:
PRs run a two-tier CI system:
- Pre-commit — formatting (yapf), linting (ruff), type checking (mypy). Runs immediately on every PR.
- Fastcheck — core GPU tests (encoders, VAEs, transformers, kernels, unit tests). Runs automatically via Buildkite on relevant file changes (~10-15 min).
- Full Suite — integration tests, training pipelines, SSIM regression. Runs only when a reviewer adds the `ready` label.
Before your PR is reviewed:
- `pre-commit run --all-files` passes locally
- You've added or updated tests for your changes
- The PR description explains what and why
If pre-commit fails, a bot comment will explain how to fix it. Fastcheck and Full Suite results appear in the Checks section below.
Useful links:
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid. 🔴 PR merge requirements — Waiting for:
This rule is failing.
Code Review
This pull request introduces support for fine-tuning the Cosmos 2.5 text-to-world model, including the necessary training pipeline, preprocessing scripts, and configuration files. It also includes several robustness improvements, such as handling potential tokenizer errors, ensuring correct data shapes during training, and fixing precision mismatches.

Regarding the review feedback, I have kept the comment about the overly broad exception handling in the tokenizer initialization, as it represents a potential risk for debugging. The comment about the dictionary construction in the preprocessing script was removed, as the current implementation is concise and idiomatic.
```python
try:
    tokenizer = AutoTokenizer.from_pretrained(
        tokenizer_path, cache_dir=args.cache_dir)
except (ValueError, OSError):
    pass
```
Pull request overview
Adds first-class Cosmos-Predict2.5-2B (Cosmos 2.5) text-to-world fine-tuning support to FastVideo by introducing a dedicated training pipeline, example scripts, and several compatibility fixes across preprocessing, dataset loading, and the Cosmos 2.5 DiT implementation.
Changes:
- Added a Cosmos 2.5 training pipeline supporting both full fine-tuning and LoRA.
- Fixed multiple Cosmos 2.5 blockers in preprocessing/text encoding/dataset utilities (dtype handling, tokenizer edge cases, CFG dropout shape).
- Updated Cosmos 2.5 DiT internals to make LoRA attach correctly and to fix fp32/bf16 mismatches.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| fastvideo/training/cosmos2_5_training_pipeline.py | New Cosmos 2.5-specific training pipeline wrapper and input kwarg construction. |
| fastvideo/pipelines/stages/text_encoding.py | Tokenizer edge-case handling for empty strings and multimodal processors. |
| fastvideo/pipelines/preprocess/v1_preprocess.py | Avoids overwriting model-specific VAE config during preprocessing. |
| fastvideo/pipelines/preprocess/preprocess_pipeline_base.py | Casts bf16 embeddings to float before NumPy conversion. |
| fastvideo/models/dits/cosmos2_5.py | Enables LoRA on attention projections and fixes dtype mismatches in patch/cross-attn projections. |
| fastvideo/dataset/utils.py | Makes CFG-dropout zero embeddings match the stored embedding shape. |
| fastvideo/dataset/preprocessing_datasets.py | Makes tokenizer initialization more robust to multimodal processor loading errors. |
| examples/training/finetune/cosmos2_5/README.md | Adds user-facing instructions for Cosmos 2.5 T2W preprocessing and training. |
| examples/training/finetune/cosmos2_5/preprocess_cosmos2_5_t2w.sh | Adds an end-to-end preprocessing script for Cosmos 2.5 T2W. |
| examples/training/finetune/cosmos2_5/finetune_t2w.sh | Adds a reference full fine-tune launch script. |
| examples/training/finetune/cosmos2_5/finetune_t2w_lora.sh | Adds a reference LoRA fine-tune launch script. |
| examples/training/finetune/cosmos2_5/validation.json | Adds sample validation prompts/config for periodic validation runs. |
```diff
 # If tokenizer is a multimodal processor (e.g. Qwen2_5_VLProcessor),
 # use its inner tokenizer for text-only encoding.
 tok = getattr(tokenizer, "tokenizer", tokenizer)

 if encoder_config.is_chat_model:
     text_inputs = tokenizer.apply_chat_template(processed_texts, **tok_kwargs).to(target_device)
 else:
-    text_inputs = tokenizer(processed_texts, **tok_kwargs).to(target_device)
+    text_inputs = tok(processed_texts, **tok_kwargs).to(target_device)
```
In the is_chat_model branch, this still calls tokenizer.apply_chat_template(...) instead of using the unwrapped tok. When tokenizer is a multimodal processor (e.g., Qwen2_5_VLProcessor), apply_chat_template typically exists on the inner tokenizer, so this will raise an AttributeError (or apply the wrong preprocessing). Use tok.apply_chat_template(...) here for consistency with the text-only path.
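The suggested fix can be sketched as follows. The function and parameter names (`encode_texts`, `is_chat_model`, `tok_kwargs`) are stand-ins for the real stage code, not FastVideo's actual API:

```python
def encode_texts(tokenizer, processed_texts, is_chat_model, **tok_kwargs):
    """Encode text with either a plain tokenizer or a multimodal processor.

    Unwraps processors such as Qwen2_5_VLProcessor to their inner text
    tokenizer, and uses the unwrapped object on BOTH branches so the
    chat-template path does not raise AttributeError.
    """
    # Plain tokenizers have no .tokenizer attribute and pass through.
    tok = getattr(tokenizer, "tokenizer", tokenizer)
    if is_chat_model:
        return tok.apply_chat_template(processed_texts, **tok_kwargs)
    return tok(processed_texts, **tok_kwargs)
```

Routing both branches through `tok` keeps the chat and text-only paths consistent regardless of which wrapper type was loaded.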
```diff
+tokenizer = None
 if os.path.exists(tokenizer_path):
-    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path,
-                                              cache_dir=args.cache_dir)
-else:
-    tokenizer = None
+    try:
+        tokenizer = AutoTokenizer.from_pretrained(
+            tokenizer_path, cache_dir=args.cache_dir)
+    except (ValueError, OSError):
+        pass
```
The tokenizer load failure is swallowed silently (except ...: pass), which can make real configuration/path issues hard to diagnose and will later disable text encoding without explanation. Consider logging a warning (including the exception) and/or narrowing the exception handling to the known multimodal-processor case you want to ignore.
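One way to surface the failure, sketched with a generic `load_fn` standing in for `AutoTokenizer.from_pretrained` (the helper name and signature are illustrative, not the PR's code):

```python
import logging

logger = logging.getLogger(__name__)

def load_tokenizer_or_warn(load_fn, tokenizer_path):
    """Load a tokenizer, logging failures instead of swallowing them.

    Returning None still disables text encoding downstream, but the
    root cause now appears in the logs instead of vanishing.
    """
    try:
        return load_fn(tokenizer_path)
    except (ValueError, OSError) as exc:
        logger.warning(
            "Failed to load tokenizer from %s; text encoding will be "
            "disabled: %s", tokenizer_path, exc)
        return None
```

The exception tuple could also be narrowed further once the exact error raised by multimodal processors is known.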
```python
def initialize_pipeline(self, fastvideo_args: FastVideoArgs):
    """Create the flow-matching scheduler with Cosmos 2.5's shift=5.0."""
    self.modules["scheduler"] = FlowUniPCMultistepScheduler(
        shift=fastvideo_args.pipeline_config.flow_shift)
```
initialize_pipeline() sets self.modules["scheduler"] with flow_shift, but TrainingPipeline.train() later overwrites self.noise_scheduler with a new FlowMatchEulerDiscreteScheduler() (default args). That means the flow_shift you set here won’t affect timestep/sigma sampling during training. If Cosmos 2.5 training depends on shift=5.0, consider overriding initialize_training_pipeline()/train() (or otherwise wiring self.noise_scheduler) so training uses the configured shift.
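A minimal sketch of the wiring this comment suggests, using stub stand-ins for the FastVideo/diffusers classes (all class internals below are assumptions, kept only to make the override self-contained):

```python
from types import SimpleNamespace

class FlowMatchEulerDiscreteScheduler:
    """Stub: the real scheduler lives in diffusers."""
    def __init__(self, shift: float = 1.0):
        self.shift = shift

class TrainingPipeline:
    """Stub base: train() installs a default-constructed scheduler,
    which ignores any flow_shift set on self.modules["scheduler"]."""
    def initialize_training_pipeline(self, training_args):
        self.noise_scheduler = FlowMatchEulerDiscreteScheduler()

class Cosmos25TrainingPipeline(TrainingPipeline):
    def initialize_training_pipeline(self, training_args):
        super().initialize_training_pipeline(training_args)
        # Re-create the noise scheduler so training-time timestep/sigma
        # sampling actually uses Cosmos 2.5's flow shift (5.0).
        self.noise_scheduler = FlowMatchEulerDiscreteScheduler(
            shift=training_args.pipeline_config.flow_shift)

args = SimpleNamespace(pipeline_config=SimpleNamespace(flow_shift=5.0))
pipe = Cosmos25TrainingPipeline()
pipe.initialize_training_pipeline(args)
```

Overriding `initialize_training_pipeline()` (rather than `initialize_pipeline()`) is one way to guarantee the configured shift survives into training; reusing the validation scheduler object would be another.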
Purpose
Adds end-to-end LoRA and full fine-tuning support for Cosmos-Predict2.5-2B (text-to-world) in the FastVideo training framework, along with the preprocessing and example scripts needed to run it. Also fixes several bugs in existing infrastructure that blocked Cosmos 2.5 from working with `v1_preprocess.py` and the shared text encoding stage.
Changes
New files:
Bug fixes:
Test Plan
Test Results
Ran 2000-step LoRA training on the `wlsaidhi/crush-smol-merged` dataset (47 hydraulic press videos, 480×832, 49 frames) on a single RTX PRO 6000 (94GB VRAM): loss 0.067, grad norm 0.007, ~2.1 s/step.

Training output (final steps)
Checklist
- Ran `pre-commit run --all-files` and fixed all issues

Notes:
- `v1_preprocess.py`. Port to `v1_preprocessing_new` is a follow-up (~1-2h).