Add Qwen3.5 hybrid model support #3592

zhaohb wants to merge 11 commits into openvinotoolkit:master from
Conversation
Co-authored-by: gitpqLee <pengqiang.li@intel.com>
Pull request overview
Adds Qwen3.5 (hybrid) Visual Language Model support by wiring a new VLMModelType through config parsing and factory creation, plus implementing a Qwen3.5-specific InputsEmbedder that reuses Qwen3-VL vision encoding while adapting merger/extra-input handling. Also updates pipeline + KV-cache utilities to better handle hybrid models that may not expose position_ids or may include non-KV state tensors.
Changes:
- Add `VLMModelType::QWEN3_5` and parse `"qwen3_5"` from `config.json`.
- Introduce the `visual_language/qwen3_5` implementation and connect it in the `VisionEncoder`/`InputsEmbedder` factories.
- Make `VLMPipeline` pass `position_ids` only when the compiled language model exposes that input; adjust KV-cache detection/trimming to skip non-KV states.
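The first item above can be sketched as follows. This is a hypothetical mirror of the config-parsing dispatch, not the actual code in `vlm_config.cpp`; the enum values and the helper name `to_vlm_model_type` are illustrative.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Illustrative subset of the model types; the real enum lives in
// src/cpp/src/visual_language/vlm_config.hpp.
enum class VLMModelType { QWEN3_VL, QWEN3_5, GEMMA3 };

// Hypothetical mapping from the "model_type" string in config.json
// to the enum, with QWEN3_5 being the mapping this PR adds.
VLMModelType to_vlm_model_type(const std::string& value) {
    if (value == "qwen3_vl") return VLMModelType::QWEN3_VL;
    if (value == "qwen3_5")  return VLMModelType::QWEN3_5;  // new in this PR
    if (value == "gemma3")   return VLMModelType::GEMMA3;
    throw std::runtime_error("Unsupported model_type: " + value);
}
```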
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/cpp/src/visual_language/vlm_config.hpp | Adds new VLMModelType::QWEN3_5 enum value. |
| src/cpp/src/visual_language/vlm_config.cpp | Adds "qwen3_5" string mapping to the new model type. |
| src/cpp/src/visual_language/vision_encoder.cpp | Routes QWEN3_5 to VisionEncoderQwen3_5. |
| src/cpp/src/visual_language/inputs_embedder.cpp | Routes QWEN3_5 to InputsEmbedderQwen3_5. |
| src/cpp/src/visual_language/qwen3_5/classes.hpp | Declares Qwen3.5 vision encoder wrapper + inputs embedder overrides. |
| src/cpp/src/visual_language/qwen3_5/classes.cpp | Implements Qwen3.5 merger path and disables LM extra inputs. |
| src/cpp/src/visual_language/pipeline.cpp | Conditionally computes/passes position_ids based on compiled model inputs. |
| src/cpp/src/utils.cpp | Improves KV-cache axis detection + trimming to skip non-KV hybrid states. |
```diff
 // Only compute and pass position_ids if the language model accepts them.
 // Hybrid models (e.g. Qwen3.5) compute rotary embeddings internally.
 std::optional<ov::Tensor> position_ids;
 std::optional<int64_t> rope_delta;
-std::tie(position_ids, rope_delta) = m_inputs_embedder->get_position_ids(inputs_embeds_size, history_size);
+bool has_position_ids_input = false;
+for (const auto& input : m_language.get_compiled_model().inputs()) {
+    if (input.get_any_name() == "position_ids") {
+        has_position_ids_input = true;
+        break;
+    }
+}
+if (has_position_ids_input) {
+    auto [pos_ids, delta] = m_inputs_embedder->get_position_ids(inputs_embeds_size, history_size);
+    position_ids = std::move(pos_ids);
+    rope_delta = delta;
+}
```
This change introduces a new execution path where position_ids are omitted when the compiled LM lacks that input. Since the repo has extensive VLM pipeline coverage in tests/python_tests/test_vlm_pipeline.py, please add/extend tests to exercise a model without position_ids (intended for Qwen3.5) to ensure generation works and no attempt is made to set position_ids/rope_delta.
```cpp
// Only accept ReadValue nodes with a zero-dim (growing seq_len axis),
// which identifies actual KV-cache states. Hybrid models (e.g. Qwen3.5)
// may have fixed-size conv/ssm states without a zero-dim; skip those.
if (has_zero_dim) {
    break;
}
```
get_kv_axes_pos() now ignores ReadValue states without a zero-length axis to avoid treating hybrid conv/SSM states as KV-cache. Please add a regression test that covers a model with both types of states to ensure KV axes detection remains correct and stable across model variants.
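The zero-dim heuristic referred to above can be sketched with plain dimension vectors standing in for OpenVINO shapes. This is an illustrative assumption about the detection logic, not the actual `get_kv_axes_pos()` implementation: a dimension of size 0 in the initial state shape is taken to mark the growing seq_len axis of a KV-cache state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: a state tensor "looks like" KV cache if its initial shape has a
// zero-sized (growing) dimension; fixed-size conv/SSM states in hybrid
// models like Qwen3.5 have no such dimension and are skipped.
bool looks_like_kv_cache(const std::vector<int64_t>& initial_shape) {
    for (int64_t dim : initial_shape) {
        if (dim == 0) {
            return true;  // zero-dim => growing seq_len axis => KV cache
        }
    }
    return false;  // fixed-size state (e.g. conv/SSM), not KV cache
}
```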
```cpp
// Skip non-KV-cache states (e.g. conv/ssm states in hybrid models like Qwen3.5).
// KV-cache states have at least seq_length_axis+1 dimensions and enough tokens to trim.
if (shape.size() <= kv_cache_state.seq_length_axis ||
    shape[kv_cache_state.seq_length_axis] < kv_cache_state.num_tokens_to_trim) {
    continue;
}
```
trim_kv_cache() now skips states that don't look like KV-cache tensors (e.g., fixed-size conv/SSM states). Please add a regression test that constructs an InferRequest with mixed state shapes and verifies only KV-cache tensors are trimmed while others are left unchanged.
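The guard quoted above can be isolated into a small predicate for such a test. This is a sketch with illustrative names, assuming the same rank/length conditions as the quoted hunk, not the actual `trim_kv_cache()` code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the trim guard: a state is eligible for trimming only if it has
// a dimension at seq_length_axis and that dimension holds at least
// num_tokens_to_trim tokens; conv/SSM states fail one of these checks.
bool should_trim(const std::vector<size_t>& shape,
                 size_t seq_length_axis,
                 size_t num_tokens_to_trim) {
    if (shape.size() <= seq_length_axis) {
        return false;  // too few dims: not a KV-cache tensor
    }
    if (shape[seq_length_axis] < num_tokens_to_trim) {
        return false;  // not enough tokens along the sequence axis
    }
    return true;
}
```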
would be good to get this merged now that openvinotoolkit/openvino#34481 has been pushed to nightly

Cooooool! Thank youuu!

Very exciting!
Very excited to see this in OpenArc
and now qwen3.6 is out :o |
@sund00bie not yet
Hi @rkazants, I generated the IR with that PR: huggingface/optimum-intel#1634
@zhaohb 👏🏾👏🏾
```diff
 // Only compute and pass position_ids if the language model accepts them.
 // Hybrid models (e.g. Qwen3.5) compute rotary embeddings internally.
 std::optional<ov::Tensor> position_ids;
 std::optional<int64_t> rope_delta;
-std::tie(position_ids, rope_delta) = m_inputs_embedder->get_position_ids(inputs_embeds_size, history_size);
+bool has_position_ids_input = false;
+for (const auto& input : m_language.get_compiled_model().inputs()) {
+    if (input.get_any_name() == "position_ids") {
+        has_position_ids_input = true;
+        break;
+    }
+}
+if (has_position_ids_input) {
+    auto [pos_ids, delta] = m_inputs_embedder->get_position_ids(inputs_embeds_size, history_size);
+    position_ids = std::move(pos_ids);
+    rope_delta = delta;
+}
```
This scans compiled_model().inputs() each time this block runs, which is likely on a hot path during generation. Cache has_position_ids_input once (e.g., as a VLMPipelineImpl member initialized after model compilation) and reuse it to avoid repeated linear scans.
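The caching suggestion can be sketched as follows. `PipelineImpl` and the constructor signature are stand-ins for the real pipeline type; the point is only that the `position_ids` lookup happens once, after compilation, instead of on every generation step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: cache the result of scanning the compiled model's input names
// once at construction time, then reuse the flag on the hot path.
class PipelineImpl {
    bool m_has_position_ids_input = false;
public:
    // input_names stands in for m_language.get_compiled_model().inputs().
    explicit PipelineImpl(const std::vector<std::string>& input_names) {
        for (const auto& name : input_names) {
            if (name == "position_ids") {
                m_has_position_ids_input = true;
                break;
            }
        }
    }
    // O(1) check during generation instead of a repeated linear scan.
    bool has_position_ids_input() const { return m_has_position_ids_input; }
};
```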
```cpp
size_t video_tokens = calc_vec_tokens_num(reordered_videos_grid_thw);
size_t image_tokens = calc_vec_tokens_num(reordered_images_grid_thw);
size_t total_tokens = video_tokens + image_tokens;

size_t video_token_count = 0;
if (total_tokens > 0) {
    video_token_count = vision_embeds_shape[0] * video_tokens / total_tokens;
}
size_t image_token_count = vision_embeds_shape[0] - video_token_count;

ov::Tensor video_embeds{vision_embeds.get_element_type(), {video_token_count, vision_embeds_shape[1]}};
ov::Tensor image_embeds{vision_embeds.get_element_type(), {image_token_count, vision_embeds_shape[1]}};

std::memcpy(video_embeds.data(), vision_embeds.data(), video_embeds.get_byte_size());
std::memcpy(image_embeds.data(),
            static_cast<uint8_t*>(vision_embeds.data()) + video_embeds.get_byte_size(),
            image_embeds.get_byte_size());

return {video_embeds, image_embeds};
```
Splitting vision_embeds by proportional ratio can silently produce incorrect boundaries (and rounding artifacts) if the merger output token dimension differs from video_tokens + image_tokens or if the model changes tokenization behavior. If the output is expected to preserve token count/order, split deterministically using video_tokens and image_tokens (and validate vision_embeds_shape[0] == total_tokens); otherwise, this needs an explicit, model-defined mapping rather than a proportional guess.
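The deterministic alternative suggested above can be sketched as a small helper. The name `split_counts` is illustrative; it validates that the merger output row count matches `video_tokens + image_tokens` and then cuts exactly at the video boundary, avoiding the proportional rounding in the quoted code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>

// Sketch: instead of splitting rows by the ratio video_tokens/total_tokens,
// require that the merger preserved token count and split exactly at
// video_tokens. Throws if the output rows do not match expectations.
std::pair<size_t, size_t> split_counts(size_t rows,
                                       size_t video_tokens,
                                       size_t image_tokens) {
    if (rows != video_tokens + image_tokens) {
        throw std::runtime_error("merger output rows do not match expected token count");
    }
    return {video_tokens, image_tokens};  // exact boundary, no rounding
}
```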
Could you please take a look at the implementation and verify whether the approach is reasonable?
```cpp
} else if (vlm_config.model_type == VLMModelType::QWEN3_VL) {
    m_impl = std::make_shared<InputsEmbedderQwen3VL>(vlm_config, model_dir, device, device_config);
} else if (vlm_config.model_type == VLMModelType::QWEN3_5) {
    m_impl = std::make_shared<InputsEmbedderQwen3_5>(vlm_config, model_dir, device, device_config);
} else if (vlm_config.model_type == VLMModelType::GEMMA3) {
```
New QWEN3_5 model branch is introduced here, but there is no corresponding functional coverage in the existing VLM pipeline test suite (e.g., tests/python_tests/test_vlm_pipeline.py enumerates supported tiny-random VLMs and currently has no Qwen3.5 entry). Please add at least one test case exercising this code path (including the hybrid behavior where the LM may omit position_ids and Qwen3.5 returns empty extra inputs).
Closing in favor of #3717 as it includes proper

No description provided.