## Triton Backend

@ispobock @pankajroark

- [x] [refactor triton backend 1](https://github.com/sgl-project/sglang/pull/3292), [2](https://github.com/sgl-project/sglang/pull/3309)
- [x] [support custom mask](https://github.com/sgl-project/sglang/pull/3317)
- [x] [support EAGLE 2](https://github.com/sgl-project/sglang/pull/3466)
- [x] [compatible with CUDA Graph](https://github.com/sgl-project/sglang/pull/3500)
- [x] [support nextn I (single MTP head)](https://github.com/sgl-project/sglang/pull/3582)
- [x] support nextn II (multi MTP heads) (WIP @pankajroark)

## FlashInfer Backend

@zhyncs @yzh119

- [x] compatible with MLA disabled
- [x] support FlashInfer nightly MLA ragged prefill and CUDA Core MLA decoding
- [x] support FlashInfer v0.2.0.post3 MLA ragged prefill, paged prefill, and decoding (@zhyncs @yzh119)
- [x] nextn parts can be shared with the Triton Backend

## EAGLE 2

@zhyncs @Ying1123

- [x] implement sampling kernel in [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) (drop cutex): [kernel part](https://github.com/sgl-project/sglang/pull/3373), [python part](https://github.com/sgl-project/sglang/pull/3378)
- [x] a bunch of fixes: [non greedy fix](https://github.com/sgl-project/sglang/pull/3407), [disable cuda graph fix 1](https://github.com/sgl-project/sglang/pull/3412), [fix 2](https://github.com/sgl-project/sglang/pull/3411), [cleanup 1](https://github.com/sgl-project/sglang/pull/3415), [cleanup 2](https://github.com/sgl-project/sglang/pull/3422), [fix cuda graph capture failure](https://github.com/sgl-project/sglang/pull/3430), [fix 2](https://github.com/sgl-project/sglang/pull/3431), [reduce one draft forward](https://github.com/sgl-project/sglang/pull/3468)
- [x] compatible with radix cache and chunked prefill (WIP @Ying1123)