I've made a fused Qwen3 MoE layer for faster fine-tuning #2890

woct0rdho · 2025-07-06T12:17:27Z

https://github.com/woct0rdho/transformers-qwen3-moe-fused

A few months ago there was a PR that introduced fused MoE kernels (#2465), but if I understand correctly, they are not actually used when fine-tuning an MoE model in Unsloth. So I set out to actually use them while staying compatible with the HF Transformers ecosystem.
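For readers unfamiliar with what "fused" means here, the following is a minimal PyTorch sketch of the general idea, not the code from the linked repository or from PR #2465. It assumes each expert is a single linear map (real Qwen3 experts are gated MLPs), and the function names `moe_unfused` and `moe_batched` are made up for illustration. The first version launches one small matmul per expert. The second stacks all expert weights in one tensor and does a single batched contraction. A real fused kernel would go further, typically sorting tokens by expert and running a grouped GEMM without materializing the gathered weights.

```python
import torch

def moe_unfused(x, router_logits, expert_weights, top_k):
    """Reference MoE: Python loop over experts, one small matmul each.

    x: (tokens, in_features)
    router_logits: (tokens, num_experts)
    expert_weights: (num_experts, out_features, in_features)
    """
    probs = torch.softmax(router_logits, dim=-1)
    top_p, top_idx = torch.topk(probs, top_k, dim=-1)  # both (tokens, top_k)
    out = torch.zeros(x.shape[0], expert_weights.shape[1], dtype=x.dtype)
    for e in range(expert_weights.shape[0]):
        # Which (token, slot) pairs were routed to expert e?
        token_idx, slot_idx = torch.where(top_idx == e)
        if token_idx.numel() == 0:
            continue
        y = x[token_idx] @ expert_weights[e].T
        out.index_add_(0, token_idx, y * top_p[token_idx, slot_idx].unsqueeze(-1))
    return out

def moe_batched(x, router_logits, expert_weights, top_k):
    """Same math, but one batched contraction over all (token, expert) pairs.

    Illustrative only: gathering expert_weights per pair costs memory that a
    real grouped-GEMM kernel avoids.
    """
    probs = torch.softmax(router_logits, dim=-1)
    top_p, top_idx = torch.topk(probs, top_k, dim=-1)
    flat_expert = top_idx.reshape(-1)                                 # (tokens * top_k,)
    flat_token = torch.arange(x.shape[0]).repeat_interleave(top_k)    # (tokens * top_k,)
    flat_p = top_p.reshape(-1)
    w = expert_weights[flat_expert]                                   # (pairs, out, in)
    y = torch.einsum("pi,poi->po", x[flat_token], w)                  # (pairs, out)
    out = torch.zeros(x.shape[0], expert_weights.shape[1], dtype=x.dtype)
    out.index_add_(0, flat_token, y * flat_p.unsqueeze(-1))
    return out
```

Both functions should agree up to floating-point error on the same inputs. The difference is only in how the work is scheduled, which is where the GPU utilization gain of a fused layer comes from.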

I now provide an example of fine-tuning the fused Qwen3-30B-A3B with LoRA and 4-bit quantization. On a single GPU with 24 GB of VRAM, it reaches 100% GPU usage and a 5x speedup compared to the unfused model. Unsloth optimizations such as fast attention and fast LoRA (on the non-MoE linear layers), RMSNorm, and gradient checkpointing can be applied automatically.
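As a rough sanity check on why 4-bit quantization matters for a 24 GB card, here is a back-of-the-envelope estimate of weight memory alone. It assumes "30B" means roughly 30 billion parameters and ignores quantization constants, LoRA adapters, activations, and optimizer state, so real usage is higher. The helper name `weight_memory_gib` is made up for this sketch.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights only, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

# Assumption: ~30 billion parameters in total (all experts included).
params = 30e9

print(round(weight_memory_gib(params, 4), 1))   # 4-bit weights  -> 14.0
print(round(weight_memory_gib(params, 16), 1))  # 16-bit weights -> 55.9
```

Under these assumptions the 4-bit weights leave some headroom within 24 GB, while 16-bit weights would not fit on such a GPU at all.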

There is still room for further optimization, such as supporting fast LoRA on the MoE layer. (Update: This is done!)

Do you have any ideas on how this could be integrated into Unsloth? I guess the MoE kernels will only get some visibility if we enable them by default.

danielhanchen · 2025-07-06T12:44:32Z
Maintainer

Hey @woct0rdho! Nice work! We haven't yet started to integrate everything re MoE kernels. We first wanted to compartmentalize stuff, which is why there is some code for MoE kernels that we haven't enabled yet.

I took a look at your repo - fantastic work! Would you be interested in making a PR? In fact, are you interested in joining Unsloth full time / part time to work on this? :)

6 replies

woct0rdho Jul 6, 2025
Author

OK I see. I can try to make a PR, but I still need to read up on Unsloth's code architecture (to be honest I think it's pretty atypical for a Python package, so I started by modifying Transformers rather than Unsloth). In the next few days I'll try to implement the fast LoRA and see how to add my code to Unsloth.

Also, thank you for the job invitation, but I have a job elsewhere :)

danielhanchen Jul 6, 2025
Maintainer

Oh a shame :( Well the door is always open! I remember you're also the one who maintains triton-windows :) I'm actually working on an even faster fused LoRA kernel, but it may be a few weeks before it lands in the repo

danielhanchen Jul 6, 2025
Maintainer

Oh I forgot to mention re licensing - we decided to make Unsloth dual licensed, so all code under the kernels folder is AGPLv3 licensed - there's a license file in the folder :)

The primary reason is that many other packages and companies plagiarize from Unsloth without any credit (i.e. no acknowledgements and no license or copyright mentions). We tried enforcing linking via LGPLv3 with no success, since people would sneakily fork the LGPL package and link to their fork.

We'll be updating the main package license and readme in the coming days to use two licenses and to explain that using the code under kernels is optional!

woct0rdho Jul 6, 2025
Author

Good point. I've added the AGPLv3 license.

danielhanchen Jul 7, 2025
Maintainer

Thank you! :) I'll look through your repo today and see how we can directly make MoE LoRA super fast - again appreciate your work!

If you wanna do a joint PR, more than happy to collab!

c3-semihasaj · 2025-08-26T17:09:40Z

woct0rdho Aug 27, 2025
Author

@zenyanbo I don't think there is any technical difficulty. It's just that no one has had the time to do it yet. You can do the quantization yourself if you need it.

Update: There is https://huggingface.co/bash99/Qwen3-30B-A3B-Instruct-2507-fused-bnb-4bit
