Hi, I'm trying to implement a simpler version of the Switch Transformer following your work. However, the details of switch_gate, such as limit_by_capacity, are not visible. My implementation produces slightly different results from fastmoe.
Could you release the detailed code for switch_gate?
Thanks.