- Remove redundant argsort in _sparse_index_exchange: the caller
(build_gp_context) already sorts needed_atoms by source rank, so
the exchange function can use them directly as the send buffer.
- Batch send_counts and recv_counts into a single GPU→CPU transfer
(torch.stack + .cpu()) instead of separate .sum().item() and
.tolist() calls, eliminating 2 GPU→CPU synchronization points.
- Remove now-unused needed_from_ranks parameter from the function.
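
A minimal sketch of the batched transfer described above, not the actual code from this commit. The helper name `counts_to_host` and the example tensors are hypothetical, and it assumes both count tensors have the same shape and dtype so they can be stacked:

```python
import torch


def counts_to_host(send_counts: torch.Tensor, recv_counts: torch.Tensor):
    """Hypothetical helper: copy both count tensors to the host in one transfer.

    Calling .sum().item() and .tolist() separately on GPU tensors makes each
    call wait on the device. Stacking first means a single .cpu() copy carries
    both tensors, and later reads work on host memory.
    """
    both = torch.stack([send_counts, recv_counts]).cpu()  # the only device-to-host copy
    send_list, recv_list = both.tolist()  # already on the CPU, no further sync
    return send_list, recv_list


# Falls back to the CPU when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
send = torch.tensor([3, 0, 2, 1], device=device)
recv = torch.tensor([1, 4, 0, 2], device=device)

send_list, recv_list = counts_to_host(send, recv)
total_send = sum(send_list)  # computed on the host instead of send.sum().item()
```

Totals such as `total_send` can then be derived from the host-side lists without touching the device again.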