Not an issue, just a question: seeing that EXL2 2-bit quants of a 70B model can fit on a single 24 GB GPU, I'm wondering whether it's possible to run a quantized version of Mixtral 8x7B on a single 24 GB GPU, and whether that's something ExLlamaV2 could support or would require a completely different project.
Mistral MoE 8x7B model announcement:
https://twitter.com/MistralAI/status/1733150512395038967?t=6jDOugc19MUNyOV1KK6Ing&s=19
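For a rough sense of feasibility, here's a back-of-envelope weights-only estimate (just a sketch, assuming Mixtral 8x7B's widely cited ~46.7B total parameter count; KV cache and activation overhead would add on top of this):

```python
# Rough weights-only VRAM estimate for a quantized Mixtral 8x7B.
# Assumption: ~46.7B total parameters (all experts are resident in
# memory even though only 2 are active per token). KV cache and
# activations are NOT included here.

def weights_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)

for bpw in (2.0, 2.5, 3.0, 4.0):
    print(f"{bpw:.1f} bpw -> ~{weights_gib(46.7, bpw):.1f} GiB")
# 2.0 bpw -> ~10.9 GiB
# 2.5 bpw -> ~13.6 GiB
# 3.0 bpw -> ~16.3 GiB
# 4.0 bpw -> ~21.7 GiB
```

If those numbers are in the right ballpark, something in the 2-3 bpw range would leave headroom for context on a 24 GB card, which is what makes the question seem plausible in the first place.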