Pull request overview
This PR adds initial support for the Gemma4 visual-language model in the C++ VLM pipeline, extending the existing model-type routing, preprocessing config, and LM input wiring to handle Gemma4’s vision encoder + optional per-layer text embeddings.
Changes:
- Add `GEMMA4` model type and wire it into the `VisionEncoder` and `InputsEmbedder` factories.
- Introduce Gemma4-specific vision preprocessing + prompt normalization and optional `per_layer_inputs` handling.
- Extend LM generation to recompute and feed `per_layer_inputs` during the generation phase when available.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/cpp/src/visual_language/vlm_config.hpp | Adds GEMMA4 model type and Gemma4 prompt token fields. |
| src/cpp/src/visual_language/vlm_config.cpp | Maps "gemma4" string to VLMModelType::GEMMA4. |
| src/cpp/src/visual_language/vision_encoder.cpp | Routes Gemma4 to VisionEncoderGemma4. |
| src/cpp/src/visual_language/processor_config.hpp | Adds Gemma4 processor params (pooling_kernel_size, max_soft_tokens). |
| src/cpp/src/visual_language/processor_config.cpp | Reads Gemma4 processor params from JSON when present. |
| src/cpp/src/visual_language/pipeline.cpp | Passes per-layer embedding queue to LM encoding; extends SDPA requirement to Gemma4. |
| src/cpp/src/visual_language/inputs_embedder.hpp | Exposes optional per-layer embeddings queue from embedders. |
| src/cpp/src/visual_language/inputs_embedder.cpp | Instantiates InputsEmbedderGemma4 and forwards queue accessor. |
| src/cpp/src/visual_language/gemma4/classes.hpp | Declares Gemma4 vision encoder + inputs embedder. |
| src/cpp/src/visual_language/gemma4/classes.cpp | Implements Gemma4 preprocessing, prompt normalization, and per-layer embeddings model support. |
| src/cpp/src/lm_encoding.hpp | Extends get_lm_encoded_results() signature to accept per-layer embeddings request queue. |
| src/cpp/src/lm_encoding.cpp | Recomputes per_layer_inputs on each generation step when configured. |
Comment on lines +109 to +110:

```cpp
// fixme: there seems to be an issue with how image_token is replaced; unified_prompt.find needs a search_offset.
// refer to gemma4 implementation.
```
Comment on lines +116 to +127:
```cpp
// 5. Extract patches: (num_patches, patch_size*patch_size*3)
const size_t num_patches_h = target_height / config.patch_size;
const size_t num_patches_w = target_width / config.patch_size;
const size_t patch_dim = config.patch_size * config.patch_size * 3;

// Create padded pixel_values tensor [1, max_patches, patch_dim]
ov::Tensor pixel_values(ov::element::f32, {1, max_patches, patch_dim});
float* pv_data = pixel_values.data<float>();
std::fill(pv_data, pv_data + max_patches * patch_dim, 0.0f);

extract_patches(float_image, config.patch_size, pv_data, num_patches_h, num_patches_w);
```
```cpp
ov::Tensor input_ids = get_encoded_input_ids(prompt, metrics);

m_lm_extra_inputs["per_layer_inputs"] = get_per_layer_embeddings(input_ids);
```
Description
Depends on: huggingface/optimum-intel#1688
The optimum-intel PR depends on transformers v5 (update: transformers v5 support has been merged into optimum-intel).
WWB Accuracy:
- genai vs optimum-intel: 0.9682357
- genai vs transformers: 0.94821364
- optimum-intel vs transformers: 0.9387633
Fixes: #3653
Checklist: