Mixtral #223

@nivibilla

Description

Not an issue, but a question: seeing that EXL2 2-bit quants of a 70B model can fit on a single 24GB GPU, I'm wondering if it's possible to run a quantized version of Mixtral 8x7B on a single 24GB GPU, and whether that's something ExLlamaV2 could support or would need a completely different project.

Mistral MoE 8x7B model announcement:
https://twitter.com/MistralAI/status/1733150512395038967?t=6jDOugc19MUNyOV1KK6Ing&s=19
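For a rough sense of whether it could fit, here's a back-of-envelope VRAM estimate. The numbers are assumptions for illustration, not measurements: Mixtral 8x7B is taken as roughly 47B total parameters (the experts share the attention layers, so it's less than 8 × 7B), the quant as averaging ~2.5 bits per weight, and the KV-cache/activation overhead as a flat ~2 GiB:

```python
# Back-of-envelope VRAM estimate for a quantized Mixtral 8x7B.
# All figures below are assumptions for illustration, not measurements.

TOTAL_PARAMS = 46.7e9   # assumed total parameter count (experts share attention layers)
BITS_PER_WEIGHT = 2.5   # assumed average EXL2-style bitrate
OVERHEAD_GB = 2.0       # assumed rough allowance for KV cache + activations

weights_gib = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1024**3

print(f"Weights:            ~{weights_gib:.1f} GiB")
print(f"Weights + overhead: ~{weights_gib + OVERHEAD_GB:.1f} GiB")
# ~13.6 GiB of weights, ~15.6 GiB total under these assumptions,
# i.e. plausibly within a 24GB card -- the same ballpark that lets
# 2-bit 70B quants squeeze in.
```

Under those assumptions it looks feasible on paper; the open question is whether the MoE routing layers can be supported in the ExLlamaV2 kernels, not whether the weights fit.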
