Multi-GPU Model Loading Issues - Issue & Code Fix #36

With 2 GPUs, each loading step (the torch model, the checkpoints, etc.) takes 10-20 minutes, and the time grows as more GPUs are added. With 1 GPU the same loading takes only a couple of minutes. I suspect contention: all GPU processes try to read the same model files at the same time.
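If contention is the cause, one thing to try is letting the ranks load one at a time instead of all at once. Below is a minimal sketch of that idea, not the repository's code: `load_in_turn` is a hypothetical helper, and `barrier` and `load_fn` are whatever the caller passes in. It only helps if simultaneous reads really are the bottleneck, which this issue has not confirmed.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_in_turn(
    rank: int,
    world_size: int,
    barrier: Callable[[], object],
    load_fn: Callable[[], T],
) -> Optional[T]:
    """Run load_fn on exactly one rank at a time, in rank order.

    Hypothetical helper: `barrier` must block until every rank has
    called it (for example a process-group barrier), and `load_fn`
    is the expensive load that should not run concurrently.
    """
    result: Optional[T] = None
    for turn in range(world_size):
        if turn == rank:
            # Only the rank whose turn it is touches the files.
            result = load_fn()
        # Every rank waits here until the current turn has finished.
        barrier()
    return result
```

In a script launched with `torchrun`, the call might look like `load_in_turn(dist.get_rank(), dist.get_world_size(), dist.barrier, lambda: torch.load(path, map_location="cpu"))` with `import torch.distributed as dist`. Ranks then wait at the barrier for as long as another rank is loading, so the process group timeout may need to be raised if a single load takes several minutes.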

Example log
Look at the gap after "Loading torch model" (about 9 minutes, 18:33:47 to 18:42:56) and after "Loading text encoder model" (about 21 minutes, 18:46:44 to 19:08:08).

```
2025-03-16 18:33:46.289 | INFO     | hyvideo.inference:from_pretrained:170 - Got text-to-video model root path: ckpts
2025-03-16 18:33:46.289 | INFO     | hyvideo.inference:from_pretrained:170 - Got text-to-video model root path: ckpts
DEBUG 03-16 18:33:46 [parallel_state.py:200] world_size=2 rank=0 local_rank=-1 distributed_init_method=env:// backend=nccl
DEBUG 03-16 18:33:46 [parallel_state.py:200] world_size=2 rank=1 local_rank=-1 distributed_init_method=env:// backend=nccl
2025-03-16 18:33:46.352 | INFO     | hyvideo.inference:from_pretrained:192 - Building model...
2025-03-16 18:33:46.352 | INFO     | hyvideo.inference:from_pretrained:192 - Building model...
2025-03-16 18:33:47.023 | INFO     | hyvideo.inference:load_state_dict:334 - Loading torch model ckpts/hunyuan-video-i2v-720p/transformers/mp_rank_00_model_states.pt...
2025-03-16 18:33:47.024 | INFO     | hyvideo.inference:load_state_dict:334 - Loading torch model ckpts/hunyuan-video-i2v-720p/transformers/mp_rank_00_model_states.pt...
2025-03-16 18:42:56.412 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-i2v-720p/vae
2025-03-16 18:42:56.453 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-i2v-720p/vae
2025-03-16 18:46:43.852 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2025-03-16 18:46:43.962 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2025-03-16 18:46:43.973 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (llm-i2v) from: ./ckpts/text_encoder_i2v
2025-03-16 18:46:44.075 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (llm-i2v) from: ./ckpts/text_encoder_i2v
Loading checkpoint shards:  25%|███████████████████████████████████████▊                                                                                                                       | 1/4 [08:59<26:57, 539.10s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [20:05<00:00, 301.49s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [20:06<00:00, 301.52s/it]
2025-03-16 19:08:08.187 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:11.073 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (llm-i2v) from: ./ckpts/text_encoder_i2v
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-03-16 19:08:11.618 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:15.042 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:17.000 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:17.035 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:17.248 | INFO     | hyvideo.inference:predict:596 - Input (height, width, video_length) = (720, 720, 129)
2025-03-16 19:08:19.307 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (llm-i2v) from: ./ckpts/text_encoder_i2v
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-03-16 19:08:19.639 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:19.996 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:20.028 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:20.149 | INFO     | hyvideo.inference:predict:596 - Input (height, width, video_length) = (720, 720, 129)
```
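The stage durations can be read straight off the timestamps in the log. A small sketch for doing that arithmetic follows; `elapsed_seconds` is a hypothetical helper, and the timestamps are copied from the log lines above.

```python
from datetime import datetime

# Timestamp layout used by the log lines above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# "Loading torch model" until the next stage ("Loading 3D VAE model").
torch_model = elapsed_seconds("2025-03-16 18:33:47.023", "2025-03-16 18:42:56.412")
# "Loading text encoder model (llm-i2v)" until "Text encoder to dtype".
text_encoder = elapsed_seconds("2025-03-16 18:46:43.973", "2025-03-16 19:08:08.187")

print(f"torch model:  {torch_model / 60:.1f} min")   # 9.2 min
print(f"text encoder: {text_encoder / 60:.1f} min")  # 21.4 min
```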
