Commit c06b183
Deploy and benchmark MiMo V2.5 4-bit 130-expert pruned variant
Adds Xiaomi's `MiMoV2ForCausalLM` to the stack via vllm-mlx, with
the architecture file vendored from open PR
ml-explore/mlx-lm#1219 (single 556-line `mimo_v2.py` drop-in;
mlx-lm 0.31.3 only ships `mimo.py` and `mimo_v2_flash.py`). The
Ling thread-local-stream + inline-gen patches that are already in
the venv are sufficient — no MiMo-specific patches needed.
Loads cleanly. Tool-call format is Hermes-style XML (same body as
Qwen3-coder, wrapped in `<tool_call>` tags) so `--tool-call-parser
qwen3_coder` + `--reasoning-parser qwen3` is the right combo.
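To make the format claim concrete, here is a minimal parsing sketch of the Qwen3-coder-style body described above. This is my own illustrative helper, not vllm-mlx's actual `qwen3_coder` parser; the tag shapes (`<function=NAME>`, `<parameter=KEY>`) are assumed from the Qwen3-coder convention.

```python
import re

# Assumed shape of a Hermes-style call with a Qwen3-coder body:
# <tool_call><function=NAME><parameter=KEY>VALUE</parameter>...</function></tool_call>
FUNC_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)

def parse_tool_call(text: str):
    """Return (name, {param: value}) for the first tool call, or None."""
    func = FUNC_RE.search(text)
    if func is None:
        return None
    name, body = func.group(1), func.group(2)
    params = {m.group(1): m.group(2).strip() for m in PARAM_RE.finditer(body)}
    return name, params

sample = (
    "<tool_call>\n"
    "<function=read_file>\n"
    "<parameter=path>docs/models/model-summary-mimo-v2.5.md</parameter>\n"
    "</function>\n"
    "</tool_call>"
)
```

`parse_tool_call(sample)` yields `("read_file", {"path": "docs/models/model-summary-mimo-v2.5.md"})`; the real server-side parser additionally validates names and argument types against the tool schema.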
OpenCode bench result: not viable as an agent backbone in this
stack. Browse: 1 of 3 runs emits an invalid tool call, the other 2
hit the 8K output cap reasoning into oblivion. Search: 0 of 3 runs
ever calls a tool — the model burns all 8192 output tokens in
`<think>` and never
exits to a tool call. Raw `/v1/chat/completions` with a single
tool *does* emit valid tool calls cleanly, so the issue is the
heavily-pruned 130-expert variant choking on OpenCode's 10-tool
catalog with thinking-on, not the server config.
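The failure modes above can be triaged mechanically. The sketch below is a hypothetical helper (the names and thresholds are mine, not the bench harness's): it buckets a raw completion into the categories the runs fell into, using the unclosed `<think>` block plus the 8K cap as the overflow signal.

```python
OUTPUT_CAP = 8192  # the 8K output token cap from the bench runs

def classify_run(text: str, n_output_tokens: int) -> str:
    """Bucket one raw completion by the failure modes seen in the bench."""
    has_call = "<tool_call>" in text and "</tool_call>" in text
    stuck_thinking = "<think>" in text and "</think>" not in text
    if stuck_thinking and n_output_tokens >= OUTPUT_CAP:
        # e.g. the Search runs: the whole budget spent inside <think>
        return "think_overflow"
    if not has_call:
        return "no_tool_call"
    return "tool_call"
```

Distinguishing a well-formed `<tool_call>` wrapper with an invalid body (the Browse 1-of-3 case) would additionally require parsing the body against the tool schema.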
Adds MiMo as a ⚠ / ⛔ row in the cross-model OpenCode end-to-end
table with the failure mode in the notes column. Full deployment
guide and caveats in `docs/models/model-summary-mimo-v2.5.md`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>