Pull requests: awslabs/awsome-distributed-training

Pull requests list

Adding OpenEnv Wordle GRPO sample
#1063 opened Apr 14, 2026 by allela-roy Contributor
Bump transformers from 4.53.0 to 5.0.0rc3 in /3.test_cases/pytorch/distillation/src (labels: dependencies, python)
#1060 opened Apr 8, 2026 by dependabot bot
Bump transformers from 4.53.0 to 5.0.0rc3 in /3.test_cases/pytorch/FSDP/src (labels: dependencies, python)
#1059 opened Apr 8, 2026 by dependabot bot
Bump transformers from 4.48.0 to 5.0.0rc3 in /3.test_cases/pytorch/nvrx (labels: dependencies, python)
#1057 opened Apr 8, 2026 by dependabot bot
Add veRL GRPO training recipe for gpt-oss-20b on g5.12xlarge
#1054 opened Apr 4, 2026 by nkumaraws Contributor
Add CPU support for PyTorch DDP training
#1040 opened Mar 27, 2026 by aagallo
7 tasks done
Bump requests from 2.32.3 to 2.33.0 in /3.test_cases/pytorch/nvrx (labels: dependencies, python)
#1036 opened Mar 25, 2026 by dependabot bot
Add V-JEPA 2 (Meta FAIR) distributed training test case
#1035 opened Mar 23, 2026 by paragao Contributor
Add DeepSpeed CI regression tests for QLoRA and GPT-103B
#1029 opened Mar 20, 2026 by paragao Contributor
Add NeMo RL GRPO training on P5en with EFA RDMA
#1025 opened Mar 17, 2026 by dmvevents
5 of 7 tasks
fix: overhaul CI workflows for FSDP regression tests
#1024 opened Mar 17, 2026 by paragao Contributor
Updating hyperpod-elastic-agent (HPEA) to v1.1.2 to support torch v2.6+
#1022 opened Mar 13, 2026 by aravneelaws Contributor
7 tasks done
docs: comprehensive instance hardware profiles (16 families)
#1021 opened Mar 13, 2026 by KeitaW Collaborator Draft
4 tasks
Add OSMO AMR Navigation test case
#1018 opened Mar 12, 2026 by KeitaW Collaborator
1 of 3 tasks
Add NeMo RL GRPO training with fault tolerance (NVRx) on EKS
#1010 opened Mar 9, 2026 by dmvevents
6 tasks
Add LeRobot pi0-FAST DROID multi-node training test case
#1003 opened Feb 26, 2026 by KeitaW Collaborator Draft
7 tasks
Updating CF stack for GB200 local zone deployments
#968 opened Feb 17, 2026 by KeitaW Collaborator
Syntax improvements and code quality enhancements for EFA node exporter
#966 opened Feb 17, 2026 by KeitaW Collaborator