Question
Hi, I ran into a training issue when using rLLM with a VL model on the vLLM backend stack, and wanted to check whether this is a known compatibility problem on the rLLM side.
RuntimeError: This flash attention build does not support headdim not being a multiple of 32.
Context
I have read https://github.com/vllm-project/vllm/issues/26989, but what is suggested there does not seem to work with rLLM.
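In case it helps to clarify what I mean: my reading of that issue is that the workaround amounts to forcing vLLM onto a non-FlashAttention backend so the head-dim restriction never applies. The sketch below is only illustrative; the backend value and the model name are assumptions, not my actual rLLM training config.

```python
import os

# Sketch only: select a non-FlashAttention backend before vLLM is imported,
# so the attention kernel that rejects head dims not divisible by 32 is never
# chosen. "XFORMERS" is an assumed value; other backends may also work.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM  # imported after setting the variable so it takes effect

# Placeholder VL model, not the model from my actual training run.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True)
```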
Relevant Code / Config
Environment
No response