
Add MiMo V2.5 #1219

Open
kernelpool wants to merge 7 commits into ml-explore:main from kernelpool:add-mimo-v2


kernelpool (Contributor) commented Apr 28, 2026

Models

MiMo-V2.5-Pro (distributed)

mlx.launch --verbose --backend jaccl --hostfile hosts-jaccl.json --env MLX_METAL_FAST_SYNCH=1 -- /Users/optimus/repo/mlx-lm/mlx_lm/examples/sharded_generate.py --model /Users/optimus/models/catalyst/MiMo-V2.5-Pro-4bit --prompt "Who is Albert Einstein?" -m 1024                                 
[INFO] Running /Users/optimus/repo/mlx-lm/.venv/bin/python /Users/optimus/repo/mlx-lm/mlx_lm/examples/sharded_generate.py --model /Users/optimus/models/catalyst/MiMo-V2.5-Pro-4bit --prompt 'Who is Albert Einstein?' -m 1024 
[jaccl] Connection attempt 0 waiting 1000 ms
/Users/optimus/repo/mlx-lm/.venv/lib/python3.12/site-packages/transformers/modeling_rope_utils.py:936: FutureWarning: `rope_config_validation` is deprecated and has been removed. Its functionality has been moved to RotaryEmbeddingConfigMixin.validate_rope method. PreTrainedConfig inherits this class, so please call self.validate_rope() instead. Also, make sure to use the new rope_parameters syntax. You can call self.standardize_rope_params() in the meantime.
  warnings.warn(
/Users/optimus/repo/mlx-lm/.venv/lib/python3.12/site-packages/transformers/modeling_rope_utils.py:936: FutureWarning: `rope_config_validation` is deprecated and has been removed. Its functionality has been moved to RotaryEmbeddingConfigMixin.validate_rope method. PreTrainedConfig inherits this class, so please call self.validate_rope() instead. Also, make sure to use the new rope_parameters syntax. You can call self.standardize_rope_params() in the meantime.
  warnings.warn(
<think>The user is asking about Albert Einstein, one of the most famous scientists in history. This is a straightforward factual question.</think># Albert Einstein (1879–1955)

Albert Einstein was a German-born theoretical physicist, widely regarded as one of the most influential scientists in history. Here are some key highlights:

## Major Contributions

- **Special Theory of Relativity (1905):** Introduced the famous equation **E = mc²**, showing the relationship between mass and energy.
- **General Theory of Relativity (1915):** Redefined our understanding of gravity as the curvature of spacetime caused by mass and energy.
- **Photoelectric Effect:** Demonstred to quantum mechanics by explaining light as discrete packets (photons). This work earned him the **Nobel Prize in Physics in 1921**.
- **Brownian Motion:** Provided empirical evidence for the existence of atoms.

## Personal Life

- Born on **March 14, 1879**, in **Ulm, Germany**.
- He worked at the Swiss Patent Office before entering academia.
- Due to the rise of Nazism, he emigrated to the **United States** in 1933 and joined the **Institute for Advanced Study** in Princeton, New Jersey.
- He became a U.S. citizen in 1940.

## Legacy

Einstein's name has become synonymous with **genius**. His theories laid the groundwork for modern physics, influencing everything from GPS satellites to black hole research. Beyond science, he was also a vocal advocate for **civil rights, pacifism, and nuclear disarmament**.

He passed away on **April 18, 1955**, in Princeton, New Jersey.

Would you like to know more about any specific aspect of his life or work? 😊
==========
Prompt: 256 tokens, 62.765 tokens-per-sec
Generation: 384 tokens, 30.303 tokens-per-sec
Peak memory: 297.086 GB
[INFO] Node with rank 1 completed 
[INFO] Node with rank 0 completed 

MiMo-V2.5 (single node)

mlx_lm.generate --model /Volumes/WD_EXTRA/temp/MiMo-V2.5-4bit --prompt "Who is Ablert Einstein?" -m 1024
The repository /Volumes/WD_EXTRA/temp/MiMo-V2.5-4bit contains custom code which must be executed to correctly load the model. You can inspect the repository content at /Volumes/WD_EXTRA/temp/MiMo-V2.5-4bit .
 You can inspect the repository content at https://hf.co//Volumes/WD_EXTRA/temp/MiMo-V2.5-4bit.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
/Users/optimus/repo/mlx-lm/.venv/lib/python3.12/site-packages/transformers/modeling_rope_utils.py:936: FutureWarning: `rope_config_validation` is deprecated and has been removed. Its functionality has been moved to RotaryEmbeddingConfigMixin.validate_rope method. PreTrainedConfig inherits this class, so please call self.validate_rope() instead. Also, make sure to use the new rope_parameters syntax. You can call self.standardize_rope_params() in the meantime.
  warnings.warn(
==========
<think>The user is asking about Albert Einstein (they misspelled it as "Ablert"). I'll provide a comprehensive overview of who he was.</think># Albert Einstein (1879–1955)

Albert Einstein was a **German-born theoretical physicist**, widely regarded as one of the most influential scientists in history. Here are the key highlights of his life and work:

## Major Contributions
- **Special Theory of Relativity (1905)** – Introduced the famous equation **E = mc²**, showing the equivalence of mass and energy.
- **General Theory of Relativity (1915)** – Redefined our understanding of gravity as the curvature of space-time.
- **Photoelectric Effect** – Helped establish quantum theory by showing that light can behave as particles (photons). This work earned him the **Nobel Prize in Physics in 1921**.
- **Brownian Motion** – Provided evidence for the existence of atoms.

## Life Highlights
- **Born:** March 14, 1879, in Ulm, Germany
- **Education:** Studied at the Swiss Federal Polytechnic in Zurich
- **1905 "Miracle Year":** Published four groundbreaking papers that revolutionized physics
- **Emigrated to the U.S.** in 1933, fleeing the rise of Nazism, and joined the **Institute for Advanced Study** in Princeton, New Jersey
- **1939:** Signed a letter to President Roosevelt warning about the potential for atomic weapons, which contributed to the Manhattan Project (though Einstein himself did not work on it)
- **Died:** April 18, 1955, in Princeton, New Jersey

## Legacy
Einstein's work fundamentally changed our understanding of the universe — from the nature of light and gravity to the fabric of space and time. He remains a cultural icon of **genius and creativity**, and his name is virtually synonymous with intelligence.
==========
Prompt: 31 tokens, 23.129 tokens-per-sec
Generation: 413 tokens, 48.367 tokens-per-sec
Peak memory: 173.935 GB
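The throughput lines in the two logs above can be converted into rough wall-clock times per phase. The sketch below is only an illustration: the helper name `phase_seconds` is hypothetical, the regular expression assumes the exact `Prompt:` / `Generation:` line format printed in these logs, and the two runs use different models and prompt lengths, so the numbers are not a like-for-like comparison.

```python
import re

# Matches lines such as "Prompt: 256 tokens, 62.765 tokens-per-sec".
STAT_RE = re.compile(r"^(Prompt|Generation): (\d+) tokens, ([\d.]+) tokens-per-sec$")


def phase_seconds(log_text):
    """Return {phase: seconds} computed as tokens / tokens-per-sec."""
    seconds = {}
    for line in log_text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            phase = match.group(1)
            tokens = int(match.group(2))
            tokens_per_sec = float(match.group(3))
            seconds[phase] = tokens / tokens_per_sec
    return seconds


# Figures copied from the two logs in this comment.
distributed_log = """Prompt: 256 tokens, 62.765 tokens-per-sec
Generation: 384 tokens, 30.303 tokens-per-sec"""
single_node_log = """Prompt: 31 tokens, 23.129 tokens-per-sec
Generation: 413 tokens, 48.367 tokens-per-sec"""

distributed = phase_seconds(distributed_log)
single_node = phase_seconds(single_node_log)

for name, result in (("distributed", distributed), ("single node", single_node)):
    for phase, secs in result.items():
        print(f"{name:12s} {phase:10s} {secs:6.2f} s")
```

By this estimate the distributed MiMo-V2.5-Pro run spends roughly 4 s on the prompt and 13 s generating, while the single-node MiMo-V2.5 run spends roughly 1 s and 9 s.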


phpmac commented Apr 28, 2026

Awesome!

chanunc pushed a commit to chanunc/local-llm-mac-studio that referenced this pull request Apr 30, 2026
Adds Xiaomi's `MiMoV2ForCausalLM` to the stack via vllm-mlx, with
the architecture file vendored from open PR
ml-explore/mlx-lm#1219 (single 556-line `mimo_v2.py` drop-in;
mlx-lm 0.31.3 only ships `mimo.py` and `mimo_v2_flash.py`). The
Ling thread-local-stream + inline-gen patches that are already in
the venv are sufficient — no MiMo-specific patches needed.

Loads cleanly. Tool-call format is Hermes-style XML (same body as
Qwen3-coder, wrapped in `<tool_call>` tags) so `--tool-call-parser
qwen3_coder` + `--reasoning-parser qwen3` is the right combo.

OpenCode bench result: not viable as an agent backbone in this
stack. Browse: 1/3 runs emit invalid tool call, 2/3 hit the 8K
output cap reasoning into oblivion. Search: 0/3 runs ever call a
tool — model burns all 8192 output tokens in `<think>` and never
exits to a tool call. Raw `/v1/chat/completions` with a single
tool *does* emit valid tool calls cleanly, so the issue is the
heavily-pruned 130-expert variant choking on OpenCode's 10-tool
catalog with thinking-on, not the server config.

Adds MiMo as a ⚠ / ⛔ row in the cross-model OpenCode end-to-end
table with the failure mode in the notes column. Full deployment
guide and caveats in `docs/models/model-summary-mimo-v2.5.md`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
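The failure modes described in the commit message above (valid tool call, invalid tool call, or output that never leaves `<think>`) can be told apart with a small classifier. This is a minimal sketch, not the parser used by vllm-mlx or OpenCode: the function name `classify_output` is hypothetical, and the exact tag layout (`<function=NAME>` and `<parameter=KEY>` inside `<tool_call>`) is an assumption based on the commit's description of the format as a Qwen3-coder body wrapped in `<tool_call>` tags.

```python
import re

# Assumed layout (not confirmed by the PR):
# <tool_call><function=NAME><parameter=KEY>VALUE</parameter></function></tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def classify_output(text):
    """Return (status, calls) for one raw model completion.

    status is one of "truncated_thinking", "tool_call",
    "invalid_tool_call", or "no_tool_call".
    """
    # Reasoning was opened but never closed, e.g. the output cap was hit.
    if "<think>" in text and "</think>" not in text:
        return "truncated_thinking", []

    # Only look for tool calls after the reasoning block, if there is one.
    visible = text.split("</think>", 1)[-1]

    calls = []
    for block in TOOL_CALL_RE.findall(visible):
        function = FUNCTION_RE.search(block)
        if function:
            arguments = {
                key: value.strip()
                for key, value in PARAM_RE.findall(function.group(2))
            }
            calls.append({"name": function.group(1), "arguments": arguments})

    if calls:
        return "tool_call", calls
    if "<tool_call>" in visible:
        return "invalid_tool_call", []
    return "no_tool_call", []


example = (
    "<think>Need the weather.</think><tool_call>\n"
    "<function=get_weather>\n<parameter=city>\nParis\n</parameter>\n"
    "</function>\n</tool_call>"
)
status, calls = classify_output(example)
print(status, calls)
```

Counting the `truncated_thinking` results over a batch of runs would give a quick measure of how often the model exhausts its output budget before calling a tool.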

yohann-bearzi commented May 2, 2026

Tested on Mac Studio M3 Ultra 512 GB, mlx-lm installed from this branch.

Quantized MiMo-V2-Flash to 8-bit via `mlx_lm.convert -q --q-bits 8` and ran generation:

  • Prompt: "The capital of France is"
  • Output: "Paris. \nThe capital of Germany is Berlin. \nThe capital of Italy"

Forward pass works, generation is coherent. Architecture instantiates cleanly from MiMo-V2-Flash's config (313 GB BF16 source).

Thanks for the contribution @kernelpool — great to have MiMo on Apple Silicon!
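For a sense of scale, the size of a quantized copy of the 313 GB BF16 source mentioned above can be estimated on the back of an envelope. The sketch assumes group-wise affine quantization with one 16-bit scale and one 16-bit bias per group of 64 weights, and it assumes every weight is quantized. Neither assumption is confirmed in this thread, and layers left unquantized would make the real output larger. The helper name `quantized_size_gb` is hypothetical.

```python
def quantized_size_gb(bf16_size_gb, q_bits, group_size=64, overhead_bits=32):
    """Estimate quantized checkpoint size from a BF16 (16 bits/weight) size.

    overhead_bits is the per-group cost of the scale and bias
    (assumed 16 bits each), amortized over group_size weights.
    """
    bits_per_weight = q_bits + overhead_bits / group_size
    return bf16_size_gb * bits_per_weight / 16


size_8bit = quantized_size_gb(313, 8)  # 8.5 effective bits per weight
size_4bit = quantized_size_gb(313, 4)  # 4.5 effective bits per weight
print(f"8-bit: ~{size_8bit:.1f} GB, 4-bit: ~{size_4bit:.1f} GB")
```

Under these assumptions the 8-bit conversion comes to roughly 166 GB, which fits comfortably within the 512 GB of the test machine.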


3 participants