- Install the required packages.

  ```bash
  pip install datasets
  ```
- If you proceed directly to the third step, you may encounter the following problem:

  ```
  RuntimeError: Currently, quantification and calibration of Qwen2_5_VLTextModel are not supported. The supported model types are InternLMForCausalLM, InternLM2ForCausalLM, InternLM3ForCausalLM, QWenLMHeadModel, Qwen2ForCausalLM, Qwen3ForCausalLM, BaiChuanForCausalLM, BaichuanForCausalLM, LlamaForCausalLM, LlavaLlamaForCausalLM, MGMLlamaForCausalLM, InternLMXComposer2ForCausalLM, Phi3ForCausalLM, ChatGLMForConditionalGeneration, MixtralForCausalLM, Qwen2VLForConditionalGeneration, Qwen2_5_VLForConditionalGeneration, MistralForCausalLM.
  ```

  This happens because the following code (lines 255-258) in the `calibrate.py` file of the lmdeploy library replaces `model` with `vl_model.language_model`, causing `model_type` to become `Qwen2_5_VLTextModel` instead of the supported `Qwen2_5_VLForConditionalGeneration`:

  ```python
  if hasattr(vl_model, 'language_model'):  # deepseek-vl, ...
      model = vl_model.language_model
  if hasattr(vl_model, 'llm'):  # MiniCPMV, ...
      model = vl_model.llm
  ```

  Find these lines and comment them out:

  ```python
  # if hasattr(vl_model, 'language_model'):  # deepseek-vl, ...
  #     model = vl_model.language_model
  # if hasattr(vl_model, 'llm'):  # MiniCPMV, ...
  #     model = vl_model.llm
  ```

  You can use the following command to view the directory of the lmdeploy library:

  ```bash
  python -c "import lmdeploy; import os; print(os.path.dirname(lmdeploy.__file__))"
  ```

  The relative location of `calibrate.py` is `lmdeploy/lite/apis/calibrate.py`.
  Alternatively, you can download `tools/fix_qwen2_5_vl_awq.py` and run it in your environment:

  ```bash
  python tools/fix_qwen2_5_vl_awq.py patch
  ```

  Note: this command modifies LMDeploy's source code in your environment. To undo the changes, simply run:

  ```bash
  python tools/fix_qwen2_5_vl_awq.py restore
  ```
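To illustrate the idea behind such a patch, here is a minimal sketch of how a script could comment out the offending lines programmatically. This is an assumption-laden illustration, not the actual contents of `tools/fix_qwen2_5_vl_awq.py`; the `comment_out` helper and `TARGETS` tuple are hypothetical names.

```python
# Hedged sketch: one way a patch script could comment out the offending
# lines in lmdeploy/lite/apis/calibrate.py. NOT the real contents of
# tools/fix_qwen2_5_vl_awq.py, just an illustration of the technique.

# Code snippets to neutralize (from the block shown above).
TARGETS = (
    "if hasattr(vl_model, 'language_model')",
    "model = vl_model.language_model",
    "if hasattr(vl_model, 'llm')",
    "model = vl_model.llm",
)

def comment_out(source: str) -> str:
    """Prefix every line containing a target snippet with '# ',
    preserving indentation; already-commented lines are left alone,
    so the transform is safe to run twice."""
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if any(t in line for t in TARGETS) and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = indent + "# " + stripped
        out.append(line)
    return "".join(out)
```

A restore step would simply keep a backup copy of the original file and move it back, which is roughly what the `restore` subcommand above undoes.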
- Enter the following in the terminal:

  ```bash
  lmdeploy lite auto_awq \
    ./model_weight/Recognition \
    --calib-dataset 'ptb' \
    --calib-samples 64 \
    --calib-seqlen 1024 \
    --w-bits 4 \
    --w-group-size 128 \
    --batch-size 1 \
    --work-dir ./monkeyocr_quantization
  ```

  Wait for the quantization to complete.
- If the quantization process is killed, check whether your system has sufficient memory.
- For reference, the maximum VRAM usage for quantization with these parameters is approximately 6.47GB.
- You might encounter the following error:

  ```
  RuntimeError: Error(s) in loading state_dict for Linear: size mismatch for bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1280]).
  ```

  This is because your installed version of LMDeploy is not yet compatible with Qwen2.5-VL. You need to install the latest development version from the GitHub repository:

  ```bash
  pip install git+https://github.com/InternLM/lmdeploy.git
  ```
After the installation is complete, try quantizing again.
- After quantization is complete, replace the `Recognition` folder:

  ```bash
  mv model_weight/Recognition Recognition_backup
  mv monkeyocr_quantization model_weight/Recognition
  ```
Then, you can try running the program again.
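If you prefer to script the folder swap, the two `mv` commands above can be sketched in Python with some guard rails, so a half-finished move cannot leave the model directory broken. The `swap_in_quantized` helper is a name invented for this example; the paths mirror the commands above.

```python
# Hedged sketch of the folder swap above, with checks so the model
# directory is never left half-moved. Equivalent to:
#   mv model_weight/Recognition Recognition_backup
#   mv monkeyocr_quantization model_weight/Recognition
# Note: Path.rename fails across filesystems, unlike mv.
from pathlib import Path

def swap_in_quantized(model_dir="model_weight/Recognition",
                      quant_dir="monkeyocr_quantization",
                      backup_dir="Recognition_backup"):
    model, quant, backup = Path(model_dir), Path(quant_dir), Path(backup_dir)
    if not quant.is_dir():
        raise FileNotFoundError(f"quantized weights not found: {quant}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    model.rename(backup)   # back up the original weights first
    quant.rename(model)    # then drop the quantized weights in place
```

Running it twice raises instead of clobbering the backup, which makes it safe to retry after fixing a path typo.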