Tests fail at collection due to torch/NCCL version mismatch #209

@alessiodevoto

Bug

Tests fail at collection with:

ImportError: libtorch_cuda.so: undefined symbol: ncclDevCommDestroy

torch 2.10.0 requires NCCL 2.19+, which is unavailable on the runner. Downgrading torch is not straightforward, since the prebuilt flash-attn wheels (mjun0812/flash-attention-prebuild-wheels v0.7.12) are only built for torch 2.10.
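
Because `import torch` itself fails here, the installed versions have to be read without importing the package. A minimal sketch of one way to do that from package metadata follows. The distribution names `nvidia-nccl-cu12`, `nvidia-nccl-cu13`, and `flash-attn` are assumptions about how the runner's environment was installed, and a system-wide NCCL that was not installed through pip will not show up in this listing.

```python
from importlib import metadata

# Assumed distribution names; the NCCL wheel name depends on the CUDA build in use.
CANDIDATES = ("torch", "nvidia-nccl-cu12", "nvidia-nccl-cu13", "flash-attn")


def installed_versions(names=CANDIDATES):
    """Map each distribution name to its installed version, or None if absent.

    Only package metadata is read, so this works even when importing torch fails.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on the failing runner and on a working machine should make it easier to see which of the two libraries differs.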

To Reproduce

See example here

Repository version

v0.5.2

Labels: bug
