
Add Qwen3.5 hybrid model support#3592

Closed
zhaohb wants to merge 11 commits into openvinotoolkit:master from zhaohb:support_qwen3.5

Conversation

@zhaohb
Contributor

@zhaohb zhaohb commented Mar 27, 2026

No description provided.

Co-authored-by: gitpqLee <pengqiang.li@intel.com>
Copilot AI review requested due to automatic review settings March 27, 2026 04:11
@zhaohb zhaohb marked this pull request as draft March 27, 2026 04:12
Contributor

Copilot AI left a comment


Pull request overview

Adds Qwen3.5 (hybrid) Visual Language Model support by wiring a new VLMModelType through config parsing and factory creation, plus implementing a Qwen3.5-specific InputsEmbedder that reuses Qwen3-VL vision encoding while adapting merger/extra-input handling. Also updates pipeline + KV-cache utilities to better handle hybrid models that may not expose position_ids or may include non-KV state tensors.

Changes:

  • Add VLMModelType::QWEN3_5 and parse "qwen3_5" from config.json.
  • Introduce visual_language/qwen3_5 implementation and connect it in VisionEncoder / InputsEmbedder factories.
  • Make VLMPipeline pass position_ids only when the compiled language model exposes that input; adjust KV-cache detection/trimming to skip non-KV states.
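The first bullet, registering a new model-type string, can be sketched in isolation. The enum values and the "qwen3_5" key come from the PR's description; the helper below is an illustrative stand-in for the real vlm_config.cpp parsing, not the actual code:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Illustrative sketch: map the "model_type" string read from config.json
// onto an enum that the VisionEncoder / InputsEmbedder factories switch on.
// VLMModelType and "qwen3_5" mirror the PR; to_vlm_model_type is hypothetical.
enum class VLMModelType { QWEN3_VL, QWEN3_5, GEMMA3 };

inline VLMModelType to_vlm_model_type(const std::string& value) {
    static const std::map<std::string, VLMModelType> model_type_map = {
        {"qwen3_vl", VLMModelType::QWEN3_VL},
        {"qwen3_5", VLMModelType::QWEN3_5},
        {"gemma3", VLMModelType::GEMMA3},
    };
    auto it = model_type_map.find(value);
    if (it == model_type_map.end()) {
        throw std::runtime_error("Unsupported model_type: " + value);
    }
    return it->second;
}
```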

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

  • src/cpp/src/visual_language/vlm_config.hpp: Adds the new VLMModelType::QWEN3_5 enum value.
  • src/cpp/src/visual_language/vlm_config.cpp: Adds the "qwen3_5" string mapping to the new model type.
  • src/cpp/src/visual_language/vision_encoder.cpp: Routes QWEN3_5 to VisionEncoderQwen3_5.
  • src/cpp/src/visual_language/inputs_embedder.cpp: Routes QWEN3_5 to InputsEmbedderQwen3_5.
  • src/cpp/src/visual_language/qwen3_5/classes.hpp: Declares the Qwen3.5 vision encoder wrapper and inputs embedder overrides.
  • src/cpp/src/visual_language/qwen3_5/classes.cpp: Implements the Qwen3.5 merger path and disables LM extra inputs.
  • src/cpp/src/visual_language/pipeline.cpp: Conditionally computes/passes position_ids based on compiled model inputs.
  • src/cpp/src/utils.cpp: Improves KV-cache axis detection and trimming to skip non-KV hybrid states.

Comment thread src/cpp/src/visual_language/qwen3_5/classes.cpp Outdated
Comment on lines +668 to +683
```cpp
// Only compute and pass position_ids if the language model accepts them.
// Hybrid models (e.g. Qwen3.5) compute rotary embeddings internally.
std::optional<ov::Tensor> position_ids;
std::optional<int64_t> rope_delta;
bool has_position_ids_input = false;
for (const auto& input : m_language.get_compiled_model().inputs()) {
    if (input.get_any_name() == "position_ids") {
        has_position_ids_input = true;
        break;
    }
}
if (has_position_ids_input) {
    auto [pos_ids, delta] = m_inputs_embedder->get_position_ids(inputs_embeds_size, history_size);
    position_ids = std::move(pos_ids);
    rope_delta = delta;
}
```

Copilot AI Mar 27, 2026


This change introduces a new execution path where position_ids are omitted when the compiled LM lacks that input. Since the repo has extensive VLM pipeline coverage in tests/python_tests/test_vlm_pipeline.py, please add/extend tests to exercise a model without position_ids (intended for Qwen3.5) to ensure generation works and no attempt is made to set position_ids/rope_delta.

Copilot generated this review using guidance from repository custom instructions.
Comment thread src/cpp/src/utils.cpp Outdated
Comment on lines +467 to +472
```cpp
// Only accept ReadValue nodes with a zero-dim (growing seq_len axis),
// which identifies actual KV-cache states. Hybrid models (e.g. Qwen3.5)
// may have fixed-size conv/ssm states without a zero-dim; skip those.
if (has_zero_dim) {
    break;
}
```

Copilot AI Mar 27, 2026


get_kv_axes_pos() now ignores ReadValue states without a zero-length axis to avoid treating hybrid conv/SSM states as KV-cache. Please add a regression test that covers a model with both types of states to ensure KV axes detection remains correct and stable across model variants.
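The zero-dim detection rule this comment asks to cover can be modeled without OpenVINO types. StateShape and the helpers below are hypothetical stand-ins that mirror only the classification logic, with a dimension of 0 standing for the growing seq_len axis:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of the rule under review: a ReadValue state counts as
// KV-cache only if its shape contains a zero-length (growing) dimension;
// fixed-size conv/SSM states of hybrid models have none and are skipped.
struct StateShape { std::vector<size_t> dims; };

inline bool is_kv_cache_state(const StateShape& state) {
    for (size_t d : state.dims) {
        if (d == 0) return true;  // zero-dim => growing KV seq_len axis
    }
    return false;
}

inline size_t count_kv_states(const std::vector<StateShape>& states) {
    size_t n = 0;
    for (const auto& s : states) n += is_kv_cache_state(s) ? 1 : 0;
    return n;
}
```

A regression test along the lines suggested would feed a mixed list of shapes and check that only the zero-dim ones are classified as KV-cache.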

Comment thread src/cpp/src/utils.cpp Outdated
Comment on lines +509 to +514
```cpp
// Skip non-KV-cache states (e.g. conv/ssm states in hybrid models like Qwen3.5).
// KV-cache states have at least seq_length_axis+1 dimensions and enough tokens to trim.
if (shape.size() <= kv_cache_state.seq_length_axis ||
    shape[kv_cache_state.seq_length_axis] < kv_cache_state.num_tokens_to_trim) {
    continue;
}
```

Copilot AI Mar 27, 2026


trim_kv_cache() now skips states that don't look like KV-cache tensors (e.g., fixed-size conv/SSM states). Please add a regression test that constructs an InferRequest with mixed state shapes and verifies only KV-cache tensors are trimmed while others are left unchanged.
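A minimal stand-alone model of this trimming guard, using plain shape vectors in place of real InferRequest state tensors (trim_states is an illustrative name, not the utils.cpp function):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the guard under review: only shapes that actually have the
// seq_length axis and hold enough tokens get trimmed; other states (e.g.
// fixed-size conv/SSM states of hybrid models) are left untouched.
inline void trim_states(std::vector<std::vector<size_t>>& shapes,
                        size_t seq_length_axis,
                        size_t num_tokens_to_trim) {
    for (auto& shape : shapes) {
        // Mirror of the skip condition in the diff above.
        if (shape.size() <= seq_length_axis ||
            shape[seq_length_axis] < num_tokens_to_trim) {
            continue;
        }
        shape[seq_length_axis] -= num_tokens_to_trim;
    }
}
```

The test Copilot requests would assert exactly this behavior: the KV-shaped state shrinks along the sequence axis while the short, fixed-size state is unchanged.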

@sund00bie

Would be good to get this merged now that openvinotoolkit/openvino#34481 has been pushed to nightly.

Collaborator

@rkazants rkazants left a comment


@zhaohb, how did you generate IRs for this model?

@rkazants rkazants requested a review from apaniukov April 1, 2026 13:10
@savvadesogle

Cooooool! Thank youuu!

@SearchSavior

Very exciting!

@sund00bie

> Very exciting!

Very excited to see this in OpenArc.

@sund00bie

and now qwen3.6 is out :o

@savvadesogle

@sund00bie not yet

@zhaohb
Contributor Author

zhaohb commented Apr 7, 2026

> @zhaohb, how did you generate IRs for this model?

Hi @rkazants, I generated the IR with this PR: huggingface/optimum-intel#1634

Copilot AI review requested due to automatic review settings April 10, 2026 08:22
zhaohb and others added 2 commits April 10, 2026 16:24
Co-authored-by: gitpqLee <pengqiang.li@intel.com>
@sund00bie

@zhaohb 👏🏾👏🏾

This comment was marked as abuse.

Copilot AI review requested due to automatic review settings April 10, 2026 08:54
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment on lines +737 to +752
```cpp
// Only compute and pass position_ids if the language model accepts them.
// Hybrid models (e.g. Qwen3.5) compute rotary embeddings internally.
std::optional<ov::Tensor> position_ids;
std::optional<int64_t> rope_delta;
bool has_position_ids_input = false;
for (const auto& input : m_language.get_compiled_model().inputs()) {
    if (input.get_any_name() == "position_ids") {
        has_position_ids_input = true;
        break;
    }
}
if (has_position_ids_input) {
    auto [pos_ids, delta] = m_inputs_embedder->get_position_ids(inputs_embeds_size, history_size);
    position_ids = std::move(pos_ids);
    rope_delta = delta;
}
```

Copilot AI Apr 10, 2026


This scans compiled_model().inputs() each time this block runs, which is likely on a hot path during generation. Cache has_position_ids_input once (e.g., as a VLMPipelineImpl member initialized after model compilation) and reuse it to avoid repeated linear scans.
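The suggested caching could look roughly like this. PipelineSketch and its string-based input list are stand-ins for the real VLMPipelineImpl and ov::CompiledModel::inputs(); the point is only that the scan happens once, at construction:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch of the caching Copilot suggests: scan the compiled
// model's input names once after compilation, store the result in a member,
// and reuse it on every generation step instead of re-scanning.
class PipelineSketch {
public:
    explicit PipelineSketch(std::vector<std::string> input_names)
        : m_has_position_ids(std::any_of(
              input_names.begin(), input_names.end(),
              [](const std::string& n) { return n == "position_ids"; })) {}

    // Called on the hot path; no per-step linear scan of the inputs.
    bool has_position_ids_input() const { return m_has_position_ids; }

private:
    bool m_has_position_ids;  // computed once at construction
};
```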

Copilot uses AI. Check for mistakes.
Comment on lines +87 to +105
```cpp
size_t video_tokens = calc_vec_tokens_num(reordered_videos_grid_thw);
size_t image_tokens = calc_vec_tokens_num(reordered_images_grid_thw);
size_t total_tokens = video_tokens + image_tokens;

size_t video_token_count = 0;
if (total_tokens > 0) {
    video_token_count = vision_embeds_shape[0] * video_tokens / total_tokens;
}
size_t image_token_count = vision_embeds_shape[0] - video_token_count;

ov::Tensor video_embeds{vision_embeds.get_element_type(), {video_token_count, vision_embeds_shape[1]}};
ov::Tensor image_embeds{vision_embeds.get_element_type(), {image_token_count, vision_embeds_shape[1]}};

std::memcpy(video_embeds.data(), vision_embeds.data(), video_embeds.get_byte_size());
std::memcpy(image_embeds.data(),
            static_cast<uint8_t*>(vision_embeds.data()) + video_embeds.get_byte_size(),
            image_embeds.get_byte_size());

return {video_embeds, image_embeds};
```

Copilot AI Apr 10, 2026


Splitting vision_embeds by proportional ratio can silently produce incorrect boundaries (and rounding artifacts) if the merger output token dimension differs from video_tokens + image_tokens or if the model changes tokenization behavior. If the output is expected to preserve token count/order, split deterministically using video_tokens and image_tokens (and validate vision_embeds_shape[0] == total_tokens); otherwise, this needs an explicit, model-defined mapping rather than a proportional guess.
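The deterministic alternative this comment describes, sketched over a flat row-major float buffer instead of ov::Tensor (split_embeds is an illustrative name; the validation mirrors the suggested vision_embeds_shape[0] == total_tokens check):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Sketch of a deterministic split: take exactly video_tokens rows first,
// then image_tokens rows, and validate that the merger output matches the
// expected total instead of guessing a proportional boundary.
inline std::pair<std::vector<float>, std::vector<float>>
split_embeds(const std::vector<float>& vision_embeds,
             size_t hidden_size, size_t video_tokens, size_t image_tokens) {
    const size_t total_tokens = video_tokens + image_tokens;
    if (vision_embeds.size() != total_tokens * hidden_size) {
        throw std::runtime_error("merger output does not match video_tokens + image_tokens");
    }
    const auto split_point =
        vision_embeds.begin() + static_cast<std::ptrdiff_t>(video_tokens * hidden_size);
    return {std::vector<float>(vision_embeds.begin(), split_point),
            std::vector<float>(split_point, vision_embeds.end())};
}
```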

@zhaohb
Contributor Author

zhaohb commented Apr 10, 2026

Could you please take a look at the implementation and verify whether the approach is reasonable?
@apaniukov @yatarkan
Thank you very much.

Copilot AI review requested due to automatic review settings April 10, 2026 10:14
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines 288 to 292
```cpp
} else if (vlm_config.model_type == VLMModelType::QWEN3_VL) {
    m_impl = std::make_shared<InputsEmbedderQwen3VL>(vlm_config, model_dir, device, device_config);
} else if (vlm_config.model_type == VLMModelType::QWEN3_5) {
    m_impl = std::make_shared<InputsEmbedderQwen3_5>(vlm_config, model_dir, device, device_config);
} else if (vlm_config.model_type == VLMModelType::GEMMA3) {
```

Copilot AI Apr 10, 2026


New QWEN3_5 model branch is introduced here, but there is no corresponding functional coverage in the existing VLM pipeline test suite (e.g., tests/python_tests/test_vlm_pipeline.py enumerates supported tiny-random VLMs and currently has no Qwen3.5 entry). Please add at least one test case exercising this code path (including the hybrid behavior where the LM may omit position_ids and Qwen3.5 returns empty extra inputs).

@yatarkan
Contributor

yatarkan commented Apr 15, 2026

Closing in favor of #3717, as it includes proper position_ids input handling, supports video inputs, and handles new-style processor configs (introduced with transformers v5).
Thanks for the initial enablement proposal; it was helpful as a base for the implementation.

@yatarkan yatarkan closed this Apr 15, 2026

Labels

category: visual language Visual language pipeline


8 participants