When attempting to load Qwen models (e.g., Qwen2.5 / Qwen3) using nanovllm, the initialization crashes during the RoPE (Rotary Position Embedding) setup. This occurs even if the model's config.json explicitly sets "rope_scaling": null.
The crash happens in two sequential stages, depending on how the rope_scaling parameter is passed:
First, it throws TypeError: unhashable type: 'dict' because the rope_scaling dictionary is passed to get_rope, which is wrapped by a cache decorator that must hash its arguments.
Second, even if the dict gets past the cache (for example, by being made hashable), it hits an AssertionError in rotary_embedding.py because nanovllm currently does not support rope_scaling.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/chat_cli.py", line 74, in <module>
[rank0]:     main()
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/chat_cli.py", line 9, in main
[rank0]:     llm = LLM(
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/engine/llm_engine.py", line 30, in __init__
[rank0]:     self.model_runner = ModelRunner(config, 0, self.events)
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/engine/model_runner.py", line 30, in __init__
[rank0]:     self.model = Qwen3ForCausalLM(hf_config,config)
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/models/qwen3.py", line 216, in __init__
[rank0]:     self.model = Qwen3Model(config,vllm_config)
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/models/qwen3.py", line 185, in __init__
[rank0]:     self.layers = nn.ModuleList([Qwen3DecoderLayer(config,vllm_config) for _ in range(config.num_hidden_layers)])
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/models/qwen3.py", line 185, in <listcomp>
[rank0]:     self.layers = nn.ModuleList([Qwen3DecoderLayer(config,vllm_config) for _ in range(config.num_hidden_layers)])
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/models/qwen3.py", line 132, in __init__
[rank0]:     self.self_attn = Qwen3Attention(
[rank0]:   File "/mnt/e/study/nanovllm/nano-kvllm-main/KvChat/models/qwen3.py", line 55, in __init__
[rank0]:     self.rotary_emb = get_rope(
[rank0]: TypeError: unhashable type: 'dict'
Implicit Mutation by transformers: In recent versions of transformers (e.g., 4.51.0) or specific Qwen config implementations, even if "rope_scaling": null is set in the config.json, the Qwen3Config object initializes config.rope_scaling to a default dictionary (or dynamically parses one for long-context models).
Cache Hashing Failure: nanovllm/layers/rotary_embedding.py likely wraps the get_rope function in a caching decorator (such as @lru_cache). Passing the rope_scaling dict straight from the config makes the cache key unhashable and crashes with unhashable type: 'dict' (see the sketch after this list).
Lack of RoPE Scaling Implementation: Even if the dict is sanitized, rotary_embedding.py at line 59 explicitly asserts assert rope_scaling is None. This indicates that RoPE scaling algorithms (like YaRN or Dynamic NTK) are not yet implemented in nanovllm.
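For illustration, here is a minimal, self-contained sketch of the first failure mode. It is not nanovllm's actual code; the cached function below is a hypothetical stand-in for get_rope. functools.lru_cache builds a hashable key from every argument, so a dict argument raises TypeError before the function body even runs.

from functools import lru_cache

@lru_cache(maxsize=None)
def get_rope_cached(head_size: int, base: float = 10000.0, rope_scaling=None):
    # A real implementation would construct and return a rotary embedding module here.
    return ("rope", head_size, base, rope_scaling)

get_rope_cached(128, rope_scaling=None)                    # OK: None is hashable
get_rope_cached(128, rope_scaling={"rope_type": "yarn"})   # TypeError: unhashable type: 'dict'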
Since nanovllm currently does not support RoPE scaling, we should explicitly discard the rope_scaling parameter during Qwen3Attention initialization to allow the model to load properly for standard context lengths.
self.rotary_emb = get_rope(
self.head_dim,
rotary_dim=self.head_dim,
max_position=max_position,
base=rope_theta,
# Explicitly set to None to bypass unhashable dict error
# and satisfy the assertion in rotary_embedding.py
rope_scaling=None,
)
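If silently hard-coding None feels too blunt, the drop can be centralized in a small helper that warns whenever a non-trivial scaling config is being ignored. This is a hypothetical sketch, not existing nanovllm API; the helper name is illustrative.

import logging

def drop_rope_scaling(rope_scaling):
    # transformers may populate config.rope_scaling with a dict even when
    # config.json says "rope_scaling": null; until nanovllm implements RoPE
    # scaling (YaRN, Dynamic NTK, ...), ignore it but make the drop visible.
    if rope_scaling:
        logging.warning("rope_scaling=%s is not supported by nanovllm and will be ignored", rope_scaling)
    return None

The call site then becomes rope_scaling=drop_rope_scaling(getattr(config, "rope_scaling", None)) instead of a literal None.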
Environment:
Models: Qwen3 / Qwen2.5 series
Transformers version: 4.51.0