Training with DistributedSampler #78

Thank you for sharing!

I'm training on a single machine with 8 GPUs. When I set the sampler to `DistributedSampler`, the calculated `max_train_steps` is **one-eighth** of the value I get with `RandomSampler`. This suggests that `DistributedSampler` splits the dataset into **eight** shards, one per process, so each dataloader iterates over only one-eighth of the data. I'm not sure whether this still gives full coverage of the training data across all GPUs. Is this setup correct?

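```
from torch.utils.data import DataLoader, DistributedSampler

# Each process draws its own shard of train_dataset (1/num_processes of the data).
# Call train_sampler.set_epoch(epoch) at the start of every epoch to reshuffle.
train_sampler = DistributedSampler(
    train_dataset,
    num_replicas=accelerator.num_processes,
    rank=accelerator.process_index,
    shuffle=True,
    drop_last=True,
)
train_dataloader = DataLoader(
    train_dataset,
    batch_size=cfg.per_gpu_batch_size,
    sampler=train_sampler,  # shuffling is handled by the sampler
    num_workers=cfg.dataloader_num_workers,
    pin_memory=True,
    drop_last=True,
)
```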
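To check the coverage question, here is a minimal pure-Python sketch of the strided split that a distributed sampler performs. It only approximates the behavior of `DistributedSampler` with `drop_last=True` (the real class shuffles with a `torch.Generator`, so the permutation differs). The helper name `shard_indices` and the dataset size of 1003 are hypothetical, chosen for illustration.

```python
import random


def shard_indices(dataset_len, num_replicas, rank, seed=0, epoch=0):
    """Approximate the per-rank index split with shuffle=True, drop_last=True."""
    indices = list(range(dataset_len))
    # Every rank uses the same seed, so all ranks build the same permutation.
    random.Random(seed + epoch).shuffle(indices)
    # drop_last=True: discard the tail so the data divides evenly across ranks.
    num_samples = dataset_len // num_replicas
    indices = indices[: num_samples * num_replicas]
    # Strided slice: rank r takes positions r, r + num_replicas, ...
    return indices[rank::num_replicas]


# Hypothetical example: 1003 samples split across 8 processes.
shards = [shard_indices(1003, 8, rank) for rank in range(8)]
covered = set().union(*shards)
print(len(shards[0]), len(covered))  # 125 1000
```

Under these assumptions the eight shards are disjoint and together cover every sample except the few dropped to make the split even (3 of 1003 here), so one epoch across all GPUs visits almost the whole dataset once.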

I trained both versions on the same data. The model trained with `DistributedSampler` performed better, even though its `max_train_steps` was one-eighth of the value used with `RandomSampler`. This puzzles me, and I'd appreciate your help!
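The factor of eight in the step counts can be sketched with simple arithmetic. This assumes `drop_last=True`, no gradient accumulation, and that in the `RandomSampler` case each process iterates over the full dataset when `max_train_steps` is computed. The function `steps_per_epoch` and the numbers below are hypothetical, not values from my run.

```python
def steps_per_epoch(dataset_len, per_gpu_batch_size, num_processes, sharded):
    """Steps one process takes per epoch, assuming drop_last=True everywhere."""
    # A sharded sampler hands each process 1/num_processes of the dataset.
    samples_per_process = dataset_len // num_processes if sharded else dataset_len
    return samples_per_process // per_gpu_batch_size


# Hypothetical values for illustration only.
dataset_len, per_gpu_batch_size, num_processes = 80_000, 10, 8

sharded_steps = steps_per_epoch(dataset_len, per_gpu_batch_size, num_processes, True)
unsharded_steps = steps_per_epoch(dataset_len, per_gpu_batch_size, num_processes, False)

# Each step consumes one batch on every process.
global_batch = per_gpu_batch_size * num_processes

print(sharded_steps, unsharded_steps)        # 1000 8000
print(sharded_steps * global_batch)          # 80000 samples drawn per epoch
print(unsharded_steps * global_batch)        # 640000 samples drawn per epoch
```

If these assumptions hold, one-eighth of the steps with the sharded sampler already draws as many samples as the dataset contains, while the unsharded count corresponds to eight passes over the data per nominal epoch.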
