
Question about the gate code #24

@mioadxll

Description


The code at https://github.com/Tencent/Tencent-Hunyuan-Large/blob/main/models/modeling_hunyuan.py#L136 reads:
capacity = gates.shape[0]
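I don't quite understand this line.
As I read it, doesn't this set the capacity to the global number of tokens? 🤔
If so, isn't the capacity effectively a no-op? Every expert would be guaranteed to have room for all the tokens, so the top-k selection and the subsequent expert reassignment wouldn't actually take effect either, would they?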
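To make the concern concrete, here is a minimal pure-Python sketch, not the Hunyuan implementation. It assumes `gates` is laid out as `[num_tokens, num_experts]` (so `gates.shape[0]` is the token count); whether that holds in modeling_hunyuan.py is an assumption here. `capacity_with_factor` is a hypothetical GShard/Switch-Transformer-style sizing rule added only for contrast, and `route_with_capacity` is a simplified, hypothetical top-k router that drops assignments once an expert is full.

```python
import math
from typing import List, Tuple


def capacity_from_gates(gates: List[List[float]]) -> int:
    # Mirrors `capacity = gates.shape[0]`, ASSUMING gates is laid out as
    # [num_tokens, num_experts]: capacity is simply the token count.
    return len(gates)


def capacity_with_factor(num_tokens: int, num_experts: int,
                         top_k: int, capacity_factor: float) -> int:
    # Hypothetical GShard/Switch-style sizing: each expert gets its even
    # share of the num_tokens * top_k assignments, scaled by capacity_factor.
    return math.ceil(capacity_factor * num_tokens * top_k / num_experts)


def route_with_capacity(gates: List[List[float]], top_k: int,
                        capacity: int) -> Tuple[List[List[int]], int]:
    """Send each token to its top_k experts; drop assignments that overflow.

    Returns (token ids kept per expert, number of dropped assignments).
    Simplification: tokens are served in index order, with no priority
    given to first choices over second choices.
    """
    num_experts = len(gates[0])
    kept: List[List[int]] = [[] for _ in range(num_experts)]
    dropped = 0
    for token_id, scores in enumerate(gates):
        # The top_k distinct experts with the highest scores for this token.
        chosen = sorted(range(num_experts), key=lambda e: scores[e],
                        reverse=True)[:top_k]
        for expert in chosen:
            if len(kept[expert]) < capacity:
                kept[expert].append(token_id)  # a slot is still free
            else:
                dropped += 1                   # expert full: assignment dropped
    return kept, dropped


if __name__ == "__main__":
    # Toy scores: every token prefers expert 0 (a maximally unbalanced router).
    gates = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]

    cap = capacity_from_gates(gates)  # 4, the number of tokens
    print(route_with_capacity(gates, top_k=1, capacity=cap))
    # ([[0, 1, 2, 3], []], 0) -> nothing dropped

    cap = capacity_with_factor(len(gates), num_experts=2, top_k=1,
                               capacity_factor=1.0)  # 2
    print(route_with_capacity(gates, top_k=1, capacity=cap))
    # ([[0, 1], []], 2) -> two tokens dropped
```

Because top-k picks distinct experts for each token, a single expert can receive at most `num_tokens` assignments. A capacity equal to the token count therefore never triggers the drop branch, whatever the scores are; dropping (and any reassignment built on it) only comes into play once the capacity is smaller than the token count.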

The above is only my own understanding; if anything is wrong, please correct me.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
