
Commit c06b183

Chanun Chirattikanon authored and claude committed
Deploy and benchmark MiMo V2.5 4-bit 130-expert pruned variant
Adds Xiaomi's `MiMoV2ForCausalLM` to the stack via vllm-mlx, with the architecture file vendored from open PR ml-explore/mlx-lm#1219 (single 556-line `mimo_v2.py` drop-in; mlx-lm 0.31.3 only ships `mimo.py` and `mimo_v2_flash.py`).

The Ling thread-local-stream + inline-gen patches already in the venv are sufficient — no MiMo-specific patches needed. Loads cleanly. Tool-call format is Hermes-style XML (same body as Qwen3-Coder, wrapped in `<tool_call>` tags), so `--tool-call-parser qwen3_coder` + `--reasoning-parser qwen3` is the right combo.

OpenCode bench result: not viable as an agent backbone in this stack.
- Browse: 1/3 runs emit an invalid tool call; 2/3 hit the 8K output cap reasoning into oblivion.
- Search: 0/3 runs ever call a tool — the model burns all 8192 output tokens in `<think>` and never exits to a tool call.

Raw `/v1/chat/completions` with a single tool *does* emit valid tool calls cleanly, so the issue is the heavily-pruned 130-expert variant choking on OpenCode's 10-tool catalog with thinking on, not the server config.

Adds MiMo as a ⚠ / ⛔ row in the cross-model OpenCode end-to-end table with the failure mode in the notes column. Full deployment guide and caveats in `docs/models/model-summary-mimo-v2.5.md`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
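The tool-call format described above — a Qwen3-Coder-style `<function=…>` / `<parameter=…>` body wrapped in Hermes `<tool_call>` tags — can be illustrated with a minimal, self-contained parsing sketch. This is not the actual vLLM `qwen3_coder` parser; the sample payload and `extract_tool_calls` helper are hypothetical, shown only to make the wire format concrete:

```python
import re

def extract_tool_calls(text: str):
    """Parse Qwen3-Coder-style tool-call bodies wrapped in Hermes <tool_call> tags.

    Returns a list of {"name": ..., "arguments": {...}} dicts; malformed
    calls (like the invalid ones seen in the Browse runs) are skipped.
    """
    calls = []
    for body in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL):
        m = re.search(r"<function=([^>]+)>(.*?)</function>", body, re.DOTALL)
        if m is None:
            continue  # no well-formed <function=...> block inside the tags
        name, params_blob = m.group(1), m.group(2)
        args = {
            key: value.strip()
            for key, value in re.findall(
                r"<parameter=([^>]+)>(.*?)</parameter>", params_blob, re.DOTALL
            )
        }
        calls.append({"name": name, "arguments": args})
    return calls

# Hypothetical model output in the format the commit describes.
sample = """<tool_call>
<function=web_search>
<parameter=query>
mlx-lm MiMo V2 architecture
</parameter>
</function>
</tool_call>"""

print(extract_tool_calls(sample))
# → [{'name': 'web_search', 'arguments': {'query': 'mlx-lm MiMo V2 architecture'}}]
```

A Search-mode failure run, by contrast, would produce only `<think>` content with no `<tool_call>` block at all, so this extractor would return an empty list.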
1 parent: be3b44c

3 files changed: 544 additions & 2 deletions


0 commit comments
