
Multi-GPU Model Loading Issues - Issue & Code Fix #36

@pftq

Description


When using 2 GPUs, loading the torch model, the checkpoints, etc. takes 10-20 minutes each, and the time grows with more GPUs. With a single GPU, loading takes only a couple of minutes. I suspect this is caused by I/O contention, with all GPUs trying to read the model files at the same time.
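The code fix referenced in the title is not part of this text. As a hypothetical mitigation for the suspected contention, the ranks could take turns loading, so only one process reads the checkpoint files at a time. The sketch below assumes every rank calls the same helper with the same `world_size`, and that `barrier` blocks until all ranks reach it. In a real multi-GPU run this would presumably be `torch.distributed.barrier` after the process group is initialized; here it is injected so the logic can be exercised with threads. `load_one_rank_at_a_time` is an illustrative name, not a hyvideo function.

```python
from typing import Any, Callable


def load_one_rank_at_a_time(rank: int, world_size: int,
                            barrier: Callable[[], Any],
                            load_fn: Callable[[], Any]) -> Any:
    """Hypothetical helper: run load_fn on each rank in turn.

    Assumes all ranks call this with the same world_size and that
    `barrier` blocks until every rank has called it (for example,
    torch.distributed.barrier in a distributed run).
    """
    result = None
    for turn in range(world_size):
        if turn == rank:
            # Only the rank whose turn it is touches the model files.
            result = load_fn()
        # Every rank waits here until this turn's loader has finished.
        barrier()
    return result
```

This trades parallel reads for serialized ones, so total wall time is roughly `world_size` times a single-rank load. It is only worthwhile if contention, rather than something like memory pressure, is the real cause of the slowdown.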

Example log
Look at the lines for "Loading torch model" (~10 minutes) and "Loading text encoder model" (~20 minutes):

2025-03-16 18:33:46.289 | INFO     | hyvideo.inference:from_pretrained:170 - Got text-to-video model root path: ckpts
2025-03-16 18:33:46.289 | INFO     | hyvideo.inference:from_pretrained:170 - Got text-to-video model root path: ckpts
DEBUG 03-16 18:33:46 [parallel_state.py:200] world_size=2 rank=0 local_rank=-1 distributed_init_method=env:// backend=nccl
DEBUG 03-16 18:33:46 [parallel_state.py:200] world_size=2 rank=1 local_rank=-1 distributed_init_method=env:// backend=nccl
2025-03-16 18:33:46.352 | INFO     | hyvideo.inference:from_pretrained:192 - Building model...
2025-03-16 18:33:46.352 | INFO     | hyvideo.inference:from_pretrained:192 - Building model...
2025-03-16 18:33:47.023 | INFO     | hyvideo.inference:load_state_dict:334 - Loading torch model ckpts/hunyuan-video-i2v-720p/transformers/mp_rank_00_model_states.pt...
2025-03-16 18:33:47.024 | INFO     | hyvideo.inference:load_state_dict:334 - Loading torch model ckpts/hunyuan-video-i2v-720p/transformers/mp_rank_00_model_states.pt...
2025-03-16 18:42:56.412 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-i2v-720p/vae
2025-03-16 18:42:56.453 | INFO     | hyvideo.vae:load_vae:29 - Loading 3D VAE model (884-16c-hy) from: ./ckpts/hunyuan-video-i2v-720p/vae
2025-03-16 18:46:43.852 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2025-03-16 18:46:43.962 | INFO     | hyvideo.vae:load_vae:55 - VAE to dtype: torch.float16
2025-03-16 18:46:43.973 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (llm-i2v) from: ./ckpts/text_encoder_i2v
2025-03-16 18:46:44.075 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (llm-i2v) from: ./ckpts/text_encoder_i2v
Loading checkpoint shards:  25%|███████████████████████████████████████▊                                                                                                                       | 1/4 [08:59<26:57, 539.10s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [20:05<00:00, 301.49s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [20:06<00:00, 301.52s/it]
2025-03-16 19:08:08.187 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:11.073 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (llm-i2v) from: ./ckpts/text_encoder_i2v
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-03-16 19:08:11.618 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:15.042 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:17.000 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:17.035 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:17.248 | INFO     | hyvideo.inference:predict:596 - Input (height, width, video_length) = (720, 720, 129)
2025-03-16 19:08:19.307 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (llm-i2v) from: ./ckpts/text_encoder_i2v
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-03-16 19:08:19.639 | INFO     | hyvideo.text_encoder:load_text_encoder:35 - Loading text encoder model (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:19.996 | INFO     | hyvideo.text_encoder:load_text_encoder:61 - Text encoder to dtype: torch.float16
2025-03-16 19:08:20.028 | INFO     | hyvideo.text_encoder:load_tokenizer:75 - Loading tokenizer (clipL) from: ./ckpts/text_encoder_2
2025-03-16 19:08:20.149 | INFO     | hyvideo.inference:predict:596 - Input (height, width, video_length) = (720, 720, 129)
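To quantify the stalls, a small script can compute the gap between consecutive timestamped lines in a log like the one above and report the steps followed by a long pause. This is a sketch that assumes the loguru-style prefix shown in the log (`YYYY-MM-DD HH:MM:SS.mmm | LEVEL | source - message`). Lines in other formats, such as the `DEBUG` and progress-bar lines, are skipped. Because the two ranks' lines are interleaved and nearly simultaneous, each gap only approximates the time spent in that stage.

```python
import re
from datetime import datetime

# Matches lines like:
# "2025-03-16 18:33:47.023 | INFO     | hyvideo.inference:load_state_dict:334 - Loading torch model ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \| \w+\s+\| (?P<src>\S+) - (?P<msg>.*)$"
)


def parse_events(lines):
    """Return (timestamp, source, message) for every line in the loguru format."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group("src"), m.group("msg")))
    return events


def slow_steps(lines, min_seconds=60.0):
    """List (message, seconds) for each event followed by a gap of at least min_seconds."""
    events = parse_events(lines)
    slow = []
    for (t0, _, msg), (t1, _, _) in zip(events, events[1:]):
        gap = (t1 - t0).total_seconds()
        if gap >= min_seconds:
            slow.append((msg, gap))
    return slow
```

Applied to the full log above, this flags the "Loading torch model" step (about 9 minutes), the VAE load (about 4 minutes), and the `llm-i2v` text encoder (about 21 minutes).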
