This issue tracks follow-up enhancements after initial support for the Deepseek V3 model. Please feel free to chime in and contribute!

- [x] Follow-up #11523: enhance testing with shapes of production models and run it regularly on H100.
  - Solving via CUTLASS blockwise quantization kernels.
- [x] Follow-up #11502:
  - [x] Test and enable torch.compile (see the sketch after this list)
  - [ ] ~Refactor MoEMethodBase to unify and clean up the extra arguments of `scoring_func` and `e_correction_bias`~
- [x] Kernel tuning for 8xH200, MI300X, H100 (TP16 and TP8PP2 cases)
  - Use https://github.com/vllm-project/vllm/blob/main/benchmarks/kernels/benchmark_moe.py, but adapt it for the w8a8 fused MoE kernel.
- [x] CUDA Graph support
- [x] MLA #10927 @simon-mo
- [ ] Support nextn prediction heads ([EAGLE](https://arxiv.org/abs/2401.15077)-style prediction heads)
  - Original PR for EAGLE support: #6830; perf: #9565; discussion: #11126; docs: #11417
- [ ] Support expert parallelism for MoE.
- [ ] Support data parallelism for MLA.
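For the torch.compile item above, here is a minimal, self-contained sketch of what enabling compilation on a module looks like in plain PyTorch. It is illustrative only and does not reflect vLLM's actual integration; the `Mlp` module, its names, and its sizes are invented for the example.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy module standing in for a transformer MLP block; the
# names and sizes here are made up for illustration, not taken from vLLM.
class Mlp(torch.nn.Module):
    def __init__(self, hidden: int = 128) -> None:
        super().__init__()
        self.up = torch.nn.Linear(hidden, 4 * hidden)
        self.down = torch.nn.Linear(4 * hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU activation between the up- and down-projections.
        return self.down(F.silu(self.up(x)))

mlp = Mlp()
# torch.compile traces the module and JIT-compiles fused kernels.
# "reduce-overhead" additionally captures CUDA graphs where possible to
# cut kernel-launch overhead, which ties into the CUDA Graph item above.
compiled_mlp = torch.compile(mlp, mode="reduce-overhead")
out = compiled_mlp(torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 128])
```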