Support FlashMLA backend #4472

Merged
zhyncs merged 7 commits into sgl-project:main from sleepcoo:support-flashmla
Mar 16, 2025

Conversation

Collaborator

@sleepcoo sleepcoo commented Mar 16, 2025

Motivation

Integrate FlashMLA for decoding; the accuracy test currently passes. The current implementation is quite simple: FlashMLA is integrated directly as the attention backend. Later we need to abstract a fastmla_backend that uses FA3 for prefill and FlashMLA for decode.

Modifications

  • FlashMLABackend inherits from FlashInferMLAAttnBackend, using FlashInferMLAAttnBackend for prefill and FlashMLABackend for decoding.
  • Add the create_flashmla_kv_indices_triton function to produce KV indices compatible with FlashMLA's block-table format (see the sketch after this list).
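
For illustration only, here is a minimal pure-PyTorch sketch of the kind of block table create_flashmla_kv_indices_triton needs to produce. The actual PR uses a Triton kernel; the helper name, the req_to_page layout, and the page_size default below are assumptions, not the real sglang API.

import torch

def build_flashmla_block_table(seq_lens, req_to_page, page_size=64, pad_value=-1):
    """Sketch: lay out each request's KV-cache page indices row-wise,
    padded to the longest request, as a (bs, max_pages) int32 table."""
    num_pages = (seq_lens + page_size - 1) // page_size  # pages used per request
    bs, max_pages = seq_lens.numel(), int(num_pages.max())
    table = torch.full((bs, max_pages), pad_value, dtype=torch.int32)
    for i in range(bs):
        n = int(num_pages[i])
        table[i, :n] = req_to_page[i, :n].to(torch.int32)
    return table

# Two requests with 100 and 70 cached tokens (page_size=64) -> 2 pages each.
seq_lens = torch.tensor([100, 70])
req_to_page = torch.tensor([[3, 7, 0], [5, 9, 0]])
print(build_flashmla_block_table(seq_lens, req_to_page))
# tensor([[3, 7],
#         [5, 9]], dtype=torch.int32)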

Command

 python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --trust-remote --tp 8 --page-size 64 --disable-cuda-graph --enable-flashmla

Todo

  • Support FlashMLA decode with cudagraph
  • Enable speculative sampling in FlashMLA
  • Add unit test
  • Performance analysis and optimization
  • Integrate FA3 prefill

@sleepcoo sleepcoo changed the title from "Support flashmla" to "Support FlashMLA backend" on Mar 16, 2025
@sleepcoo sleepcoo marked this pull request as ready for review March 16, 2025 10:09
@zhyncs zhyncs merged commit a53fe42 into sgl-project:main Mar 16, 2025
assert self.chunked_prefill_size % self.page_size == 0

if self.enable_flashmla is True:
    assert self.page_size == 64, "FlashMLA only support page_size=64"
Contributor

automatically set this
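
As a sketch of this suggestion (the helper name is hypothetical and the attribute names only mirror the excerpt above), the server could override the value with a warning instead of asserting:

import logging

logger = logging.getLogger(__name__)

def adjust_args_for_flashmla(args):
    # Hypothetical helper: force page_size to 64 when FlashMLA is enabled,
    # rather than failing the assertion above.
    if args.enable_flashmla and args.page_size != 64:
        logger.warning(
            "FlashMLA requires page_size=64; overriding page_size=%d.", args.page_size
        )
        args.page_size = 64
    assert args.chunked_prefill_size % args.page_size == 0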

flashmla_index = torch.full(
    (bs, max_seqlen_pad), -1, dtype=torch.int32, device=q.device
)
create_flashmla_kv_indices_triton[(bs,)](
Contributor

This metadata is the same for all layers, so we can compute it only once in init_forward_metadata.
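
A rough sketch of that suggestion (class and method names here are illustrative, not the actual sglang backend interface): allocate and fill the index buffer once per forward batch, then let every layer's decode read the shared buffer.

import torch

class FlashMLAMetadataSketch:
    def init_forward_metadata(self, bs, max_seqlen_pad, device):
        # Built once per batch; in the PR this buffer would be filled by the
        # create_flashmla_kv_indices_triton kernel shown (truncated) above.
        self.flashmla_index = torch.full(
            (bs, max_seqlen_pad), -1, dtype=torch.int32, device=device
        )

    def forward_decode(self, layer_idx):
        # Every layer reuses the same table; nothing is rebuilt per layer.
        return self.flashmla_index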

Collaborator Author

I will fix this in the follow-up CUDA graph PR.

@sleepcoo sleepcoo deleted the support-flashmla branch March 17, 2025 03:34

lishicheng1996 commented Mar 17, 2025

Hi, I tested --enable-flashmla on H20*16, and it is slower than the normal version. Do you have data on the end-to-end inference speed gain with FlashMLA? Thanks very much!

@sleepcoo
Collaborator Author

Hi, I tested --enable-flashmla on H20*16, and it is slower than the normal version. Do you have data on the end-to-end inference speed gain with FlashMLA? Thanks very much!

Wait for my CUDA graph implementation.
