In Qwen3Attention, the code appears to support RoPE scaling by forwarding a rope_scaling argument into get_rope:
```python
self.rotary_emb = get_rope(
    self.head_dim,
    rotary_dim=self.head_dim,
    max_position=max_position,
    base=rope_theta,
    rope_scaling=rope_scaling,
)
```
However, in layers/rotary_embedding.py, get_rope immediately enforces:
```python
assert rope_scaling is None
```
These two pieces of code seem to encode contradictory assumptions: the attention layer forwards whatever `rope_scaling` config it receives, while `get_rope` rejects anything other than `None`.
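The mismatch can be reproduced with a minimal standalone sketch (simplified, hypothetical stand-ins for the real classes; only the `rope_scaling` flow is modeled, and the numeric arguments are illustrative):

```python
# Hypothetical, simplified stand-ins -- not the actual library code.

def get_rope(head_dim, rotary_dim, max_position, base, rope_scaling=None):
    # Mirrors layers/rotary_embedding.py: any scaling config is rejected outright.
    assert rope_scaling is None
    return ("rope", head_dim, rotary_dim, max_position, base)

class Qwen3Attention:
    def __init__(self, head_dim, max_position, rope_theta, rope_scaling):
        # Mirrors the attention layer: rope_scaling is forwarded unconditionally,
        # so any non-None config trips the assert above.
        self.rotary_emb = get_rope(
            head_dim,
            rotary_dim=head_dim,
            max_position=max_position,
            base=rope_theta,
            rope_scaling=rope_scaling,
        )

# rope_scaling=None constructs fine...
Qwen3Attention(128, 32768, 1_000_000, rope_scaling=None)

# ...but any model config that actually uses RoPE scaling fails at load time.
try:
    Qwen3Attention(128, 32768, 1_000_000,
                   rope_scaling={"rope_type": "yarn", "factor": 4.0})
except AssertionError:
    print("AssertionError: get_rope requires rope_scaling is None")
```

So the `rope_scaling` parameter on the attention side is effectively dead: it only works when it is `None`, which is exactly the case where forwarding it is unnecessary.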