
Fix server XTC crash from heterogeneous xtc_special_tokens#1258

Draft
odysa wants to merge 1 commit into ml-explore:main from odysa:fix/server-xtc-heterogeneous-list

Conversation

@odysa odysa commented May 7, 2026

Summary

mlx_lm.server._make_sampler builds xtc_special_tokens as a heterogeneous list — [int, list[int]] — which the MLX fancy indexing in apply_xtc cannot handle. As a result, every chat-completion request with xtc_probability > 0 crashes the generation worker with ValueError: Initialization encountered extra dimension, and the client just sees a dropped connection.
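The indexing step in question is `mask[..., xtc_special_tokens] = False`, which only works when the index is a flat sequence of ints. A pure-Python sketch of the same semantics (vocabulary size and token ids chosen for illustration; the real code operates on MLX arrays):

```python
# Sketch of the masking step mask[..., xtc_special_tokens] = False,
# using a plain Python list to stand in for the MLX boolean mask.
vocab_size = 200                    # illustrative; real vocab is much larger
mask = [True] * vocab_size

# A flat list[int] indexes cleanly: each special token is unmasked once.
xtc_special_tokens = [107, 1, 106]
for tok in xtc_special_tokens:
    mask[tok] = False

# Exactly the three special-token positions were cleared.
cleared = [i for i, keep in enumerate(mask) if not keep]
```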

This fix flattens the list to match the construction already used in generate.py:2070 and chat.py:156, and switches to eos_token_ids for multi-EOS tokenizers.

- xtc_special_tokens=[
-     tokenizer.eos_token_id,
-     tokenizer.encode("\n"),
- ],
+ xtc_special_tokens=tokenizer.encode("\n", add_special_tokens=False)
+ + list(tokenizer.eos_token_ids),

For Gemma the produced list goes from [1, [2, 107]] (broken) to [107, 1, 106] (works).
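The before/after constructions can be sketched with a stand-in tokenizer whose ids mirror the Gemma example above (the class and its values are illustrative, not mlx_lm's real tokenizer API, though `eos_token_id`, `eos_token_ids`, and `encode` are the attributes the PR relies on):

```python
# Stand-in tokenizer mimicking Gemma's ids: 2 = BOS, 107 = "\n", 1/106 = EOS.
class FakeTokenizer:
    eos_token_id = 1
    eos_token_ids = [1, 106]  # multi-EOS tokenizers expose several ids

    def encode(self, text, add_special_tokens=True):
        ids = {"\n": [107]}[text]
        return ([2] + ids) if add_special_tokens else ids  # prepend BOS

tok = FakeTokenizer()

# Old construction: heterogeneous [int, list[int]] — the broken shape.
before = [tok.eos_token_id, tok.encode("\n")]

# New construction: flat list[int], no BOS, all EOS ids covered.
after = tok.encode("\n", add_special_tokens=False) + list(tok.eos_token_ids)
```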

Reproduction (before this PR)

import mlx.core as mx
from mlx_lm.sample_utils import apply_xtc
apply_xtc(mx.zeros((1, 100)), 0.5, 0.1, xtc_special_tokens=[1, [2, 107]])
# ValueError: Initialization encountered extra dimension.
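A defensive alternative (a hypothetical helper, not part of this PR, which instead fixes the construction site) would be to flatten whatever mix apply_xtc receives before indexing:

```python
def flatten_token_ids(tokens):
    """Flatten a possibly-nested list of token ids into a flat list[int].

    Hypothetical guard: accepts the broken [int, list[int]] shape as well
    as an already-flat list, and always returns flat ints.
    """
    flat = []
    for t in tokens:
        if isinstance(t, (list, tuple)):
            flat.extend(int(x) for x in t)
        else:
            flat.append(int(t))
    return flat
```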

End-to-end via the server:

mlx_lm.server --model mlx-community/gemma-3-1b-it-4bit-DWQ --port 8080 &
curl -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"default_model","messages":[{"role":"user","content":"hi"}],"xtc_probability":0.5,"xtc_threshold":0.1,"max_tokens":5}'
# server stderr: ValueError: Initialization encountered extra dimension.

Test plan

  • Repro apply_xtc crash on main.
  • Confirm fix produces a flat list[int] for Gemma 3 / Gemma 4 tokenizers.
  • Run sampler with the new list — no crash.
  • End-to-end server smoke test with xtc_probability > 0.
  • Existing tests pass.

`_make_sampler` constructed `xtc_special_tokens` as `[int, list[int]]`
(scalar `eos_token_id` + nested `tokenizer.encode("\n")`). MLX fancy-
indexing in `apply_xtc` (`mask[..., xtc_special_tokens] = False`) cannot
handle the nested list and raises `ValueError: Initialization
encountered extra dimension` on the first sampling step, crashing every
chat-completion request with `xtc_probability > 0`.

Match the flat-list construction already used in generate.py:2070 and
chat.py:156, and pass `add_special_tokens=False` so BOS isn't included.
Also covers multi-EOS tokenizers via `tokenizer.eos_token_ids`.
