The HunyuanCustom model fails when using multiple GPUs (--nproc_per_node > 1), throwing a shape mismatch error in the attention computation:
RuntimeError: shape '[2, 65631, -1]' is invalid for input of size 204171264
✅ Works in single-GPU mode (--nproc_per_node=1)
❌ Fails in multi-GPU mode (--nproc_per_node >= 2)
Error Location:
hymm_sp/modules/models.py, line 181 (attn.view() reshape op)
Root Cause:
The attention reshape doesn't account for distributed tensor partitioning across GPUs
Under multi-GPU execution each rank holds only a slice of the sequence, so the local tensor's element count no longer matches the global batch/sequence shape passed to view()
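The numbers in the traceback are consistent with this diagnosis. A view(2, 65631, -1) is only legal when the tensor's element count is divisible by 2 * 65631, and 204171264 is not. A minimal sketch (the shape values are taken from the error message; the divisibility rule is standard torch.Tensor.view behavior):

```python
# Values from the traceback: view(2, 65631, -1) on a tensor of 204171264 elements.
batch = 2
full_seq = 65631          # global sequence length the reshape expects
numel = 204171264         # actual element count of this rank's local tensor

# view(batch, full_seq, -1) can only infer the last dim if numel is
# divisible by batch * full_seq; here it is not, hence the RuntimeError.
remainder = numel % (batch * full_seq)
print(remainder)  # non-zero -> the reshape cannot succeed

# Deriving the sequence length from the local tensor itself (e.g.
# attn.view(batch, attn.shape[1], -1)) would sidestep the mismatch,
# assuming the rank-local layout is otherwise correct.
```

This suggests the view() call at models.py line 181 is using a global sequence length while the tensor it receives has already been partitioned across ranks.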
Repro Commands:
torchrun --nproc_per_node=2 hymm_sp/sample_batch.py --use-fp8 # Fails
torchrun --nproc_per_node=1 hymm_sp/sample_batch.py --use-fp8 # Works