
Support Gemma4 model #3644

Open
as-suvorov wants to merge 41 commits into openvinotoolkit:master from as-suvorov:as/vlm_enable_1

@as-suvorov (Collaborator) commented Apr 2, 2026

Description

Depends on: huggingface/optimum-intel#1688
That optimum-intel PR in turn depends on transformers v5 (update: transformers v5 support has been merged into optimum-intel).

WWB (who_what_benchmark) accuracy:

  • genai vs optimum-intel: 0.9682357
  • genai vs transformers: 0.94821364
  • optimum-intel vs transformers: 0.9387633

Fixes: #3653

Checklist:

  • This PR follows GenAI Contributing guidelines.
  • Tests have been updated or added to cover the new code.
  • This PR fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings April 2, 2026 17:26
github-actions (bot) added the category: visual language and category: LLM labels Apr 2, 2026
Copilot AI left a comment

Pull request overview

This PR adds initial support for the Gemma4 visual-language model in the C++ VLM pipeline, extending the existing model-type routing, preprocessing config, and LM input wiring to handle Gemma4’s vision encoder + optional per-layer text embeddings.

Changes:

  • Add GEMMA4 model type and wire it into VisionEncoder and InputsEmbedder factories.
  • Introduce Gemma4-specific vision preprocessing + prompt normalization and optional per_layer_inputs handling.
  • Extend LM generation to recompute and feed per_layer_inputs during the generation phase when available.
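To make the routing in the first bullet concrete, here is a minimal, self-contained sketch of a string-to-enum mapping that feeds a vision-encoder factory. Only the "gemma4" → VLMModelType::GEMMA4 mapping and the VisionEncoderGemma4 class name come from this PR; the trimmed-down enum, the name() method, and the to_vlm_model_type / create_vision_encoder helpers are hypothetical and are not the GenAI sources.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical, trimmed-down model-type enum; only GEMMA4 is taken from the PR.
enum class VLMModelType { GEMMA3, GEMMA4 };

// Maps the "model_type" string from the model config to the enum.
VLMModelType to_vlm_model_type(const std::string& value) {
    static const std::unordered_map<std::string, VLMModelType> mapping = {
        {"gemma3", VLMModelType::GEMMA3},
        {"gemma4", VLMModelType::GEMMA4},
    };
    auto it = mapping.find(value);
    if (it == mapping.end()) {
        throw std::runtime_error("Unsupported model_type: " + value);
    }
    return it->second;
}

// Hypothetical encoder hierarchy standing in for the real VisionEncoder classes.
struct VisionEncoder {
    virtual ~VisionEncoder() = default;
    virtual std::string name() const = 0;
};
struct VisionEncoderGemma3 : VisionEncoder {
    std::string name() const override { return "VisionEncoderGemma3"; }
};
struct VisionEncoderGemma4 : VisionEncoder {
    std::string name() const override { return "VisionEncoderGemma4"; }
};

// Factory: one case per model type, so adding a model adds one case here.
std::unique_ptr<VisionEncoder> create_vision_encoder(VLMModelType type) {
    switch (type) {
        case VLMModelType::GEMMA3: return std::make_unique<VisionEncoderGemma3>();
        case VLMModelType::GEMMA4: return std::make_unique<VisionEncoderGemma4>();
    }
    throw std::runtime_error("Unhandled VLMModelType");
}
```

Keeping the string mapping and the factory switch separate mirrors the split between vlm_config.cpp and vision_encoder.cpp in the per-file summary.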

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Summary per file:

  • src/cpp/src/visual_language/vlm_config.hpp: Adds GEMMA4 model type and Gemma4 prompt token fields.
  • src/cpp/src/visual_language/vlm_config.cpp: Maps "gemma4" string to VLMModelType::GEMMA4.
  • src/cpp/src/visual_language/vision_encoder.cpp: Routes Gemma4 to VisionEncoderGemma4.
  • src/cpp/src/visual_language/processor_config.hpp: Adds Gemma4 processor params (pooling_kernel_size, max_soft_tokens).
  • src/cpp/src/visual_language/processor_config.cpp: Reads Gemma4 processor params from JSON when present.
  • src/cpp/src/visual_language/pipeline.cpp: Passes per-layer embedding queue to LM encoding; extends SDPA requirement to Gemma4.
  • src/cpp/src/visual_language/inputs_embedder.hpp: Exposes optional per-layer embeddings queue from embedders.
  • src/cpp/src/visual_language/inputs_embedder.cpp: Instantiates InputsEmbedderGemma4 and forwards queue accessor.
  • src/cpp/src/visual_language/gemma4/classes.hpp: Declares Gemma4 vision encoder + inputs embedder.
  • src/cpp/src/visual_language/gemma4/classes.cpp: Implements Gemma4 preprocessing, prompt normalization, and per-layer embeddings model support.
  • src/cpp/src/lm_encoding.hpp: Extends get_lm_encoded_results() signature to accept per-layer embeddings request queue.
  • src/cpp/src/lm_encoding.cpp: Recomputes per_layer_inputs on each generation step when configured.
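The processor_config entries say the Gemma4 parameters are read from JSON only when present. A minimal sketch of that pattern, assuming a flat key-to-integer map as a stand-in for the parsed preprocessor config (the real code parses JSON); the struct, the helper names, and every default value are hypothetical placeholders, not Gemma4's actual defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, trimmed-down processor config. Defaults are placeholders only.
struct ProcessorConfig {
    std::size_t patch_size = 16;
    std::size_t pooling_kernel_size = 1;
    std::size_t max_soft_tokens = 0;
};

// Stand-in for a parsed preprocessor config: key -> integer value.
using ParsedJson = std::map<std::string, std::size_t>;

// Overwrite the field only if the key exists, so configs of other models
// that lack the Gemma4-specific keys keep the defaults.
void read_if_present(const ParsedJson& json, const std::string& key, std::size_t& field) {
    auto it = json.find(key);
    if (it != json.end()) {
        field = it->second;
    }
}

ProcessorConfig load_processor_config(const ParsedJson& json) {
    ProcessorConfig config;
    read_if_present(json, "patch_size", config.patch_size);
    read_if_present(json, "pooling_kernel_size", config.pooling_kernel_size);
    read_if_present(json, "max_soft_tokens", config.max_soft_tokens);
    return config;
}
```

Treating the keys as optional keeps one ProcessorConfig type usable across models instead of requiring a Gemma4-only config struct.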

Comment thread src/cpp/src/visual_language/gemma4/classes.cpp
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp Outdated
Comment thread src/cpp/src/visual_language/pipeline.cpp Outdated
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp Outdated
Comment thread src/cpp/src/visual_language/pipeline.cpp Outdated
Copilot AI review requested due to automatic review settings April 9, 2026 11:55
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Comment thread src/cpp/src/visual_language/gemma4/classes.cpp Outdated
Comment thread src/cpp/src/visual_language/vision_encoder.cpp
Comment thread src/cpp/src/visual_language/pipeline.cpp
Comment thread src/cpp/src/visual_language/pipeline.cpp Outdated
Copilot AI review requested due to automatic review settings April 9, 2026 13:38
github-actions (bot) added the category: GH Pages Docs label Apr 9, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comment thread src/cpp/src/visual_language/pipeline.cpp
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp
Comment thread src/cpp/src/visual_language/processor_config.cpp
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp
Copilot AI review requested due to automatic review settings April 22, 2026 10:35
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/manylinux_2_28.yml Outdated
Comment thread site/docs/supported-models/_components/vlm-models-table/models.ts
Comment thread src/cpp/src/visual_language/gemma3/classes.cpp
Comment thread .github/workflows/linux.yml Outdated
Copilot AI review requested due to automatic review settings April 22, 2026 12:07
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comment thread site/docs/supported-models/_components/vlm-models-table/models.ts
Comment thread src/cpp/src/visual_language/gemma3/classes.cpp
Comment thread .github/workflows/linux.yml Outdated
Comment thread .github/workflows/manylinux_2_28.yml Outdated
Copilot AI review requested due to automatic review settings April 22, 2026 13:05
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.

Comment thread src/cpp/src/visual_language/gemma3/classes.cpp
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp
Comment thread .github/workflows/linux.yml Outdated
Comment thread .github/workflows/manylinux_2_28.yml Outdated
Comment thread site/docs/supported-models/_components/vlm-models-table/models.ts
Comment thread src/cpp/src/visual_language/vision_encoder.cpp
Comment thread src/cpp/src/visual_language/inputs_embedder.cpp
Comment thread src/cpp/src/visual_language/gemma3/classes.cpp
Comment thread src/cpp/src/visual_language/gemma4/classes.cpp
Copilot AI review requested due to automatic review settings April 22, 2026 14:02
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comment thread src/cpp/src/visual_language/inputs_embedder.cpp
Comment thread src/cpp/src/visual_language/gemma3/classes.cpp
Comment thread src/cpp/src/visual_language/gemma3/classes.cpp
@as-suvorov as-suvorov requested a review from yatarkan April 22, 2026 14:48
@as-suvorov as-suvorov marked this pull request as ready for review April 22, 2026 14:56
@as-suvorov as-suvorov requested a review from sgonorov as a code owner April 22, 2026 14:56
Copilot AI review requested due to automatic review settings April 22, 2026 14:56
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comment on lines +109 to +110
```cpp
// fixme: there seems to be an issue with how image_token is replaced: unified_prompt.find needs a search_offset.
// Refer to the gemma4 implementation.
```
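The fixme points at a classic pitfall: if the text that replaces image_token itself contains image_token, searching from the start again re-matches the freshly inserted token. A minimal sketch of the fix the comment hints at, resuming the search from an offset past the inserted text; replace_all_tokens is a hypothetical helper, not the Gemma3/Gemma4 code, and the token strings used with it are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of `token` in `prompt` with `replacement`.
// The search resumes after the inserted text; restarting from 0 would
// re-match `token` whenever `replacement` contains it.
std::string replace_all_tokens(std::string prompt, const std::string& token,
                               const std::string& replacement) {
    assert(!token.empty());
    std::size_t search_offset = 0;
    while ((search_offset = prompt.find(token, search_offset)) != std::string::npos) {
        prompt.replace(search_offset, token.size(), replacement);
        search_offset += replacement.size();  // skip past what was just inserted
    }
    return prompt;
}
```

For example, with token "<img>" and a replacement that wraps it as "<boi><img><eoi>", each original placeholder is expanded exactly once.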
Comment on lines +116 to +127
```cpp
// 5. Extract patches: (num_patches, patch_size*patch_size*3)
const size_t num_patches_h = target_height / config.patch_size;
const size_t num_patches_w = target_width / config.patch_size;
const size_t patch_dim = config.patch_size * config.patch_size * 3;

// Create padded pixel_values tensor [1, max_patches, patch_dim]
ov::Tensor pixel_values(ov::element::f32, {1, max_patches, patch_dim});
float* pv_data = pixel_values.data<float>();
std::fill(pv_data, pv_data + max_patches * patch_dim, 0.0f);

extract_patches(float_image, config.patch_size, pv_data, num_patches_h, num_patches_w);
```
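The excerpt zero-fills a [1, max_patches, patch_dim] buffer and then calls extract_patches. A minimal sketch of what such a helper could look like, assuming a contiguous HWC float image, patches ordered row-major over the patch grid, and (row, column, channel) order inside each patch; the signature differs from the one in the diff, and both function names here are illustrative. It also asserts that the patch grid fits into max_patches, a check the excerpt itself does not show.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies non-overlapping patch_size x patch_size RGB patches from an HWC image
// into `out`, one row of patch_size*patch_size*3 floats per patch.
void extract_patches(const std::vector<float>& image_hwc, std::size_t image_width,
                     std::size_t patch_size, float* out,
                     std::size_t num_patches_h, std::size_t num_patches_w) {
    constexpr std::size_t channels = 3;
    const std::size_t patch_dim = patch_size * patch_size * channels;
    assert(num_patches_w * patch_size <= image_width);
    assert(image_hwc.size() >= num_patches_h * patch_size * image_width * channels);
    for (std::size_t ph = 0; ph < num_patches_h; ++ph) {
        for (std::size_t pw = 0; pw < num_patches_w; ++pw) {
            float* dst = out + (ph * num_patches_w + pw) * patch_dim;
            for (std::size_t y = 0; y < patch_size; ++y) {
                // Each patch row is one contiguous run of patch_size * 3 floats.
                const std::size_t src_row = ph * patch_size + y;
                const float* src =
                    image_hwc.data() + (src_row * image_width + pw * patch_size) * channels;
                std::copy(src, src + patch_size * channels, dst + y * patch_size * channels);
            }
        }
    }
}

// Zero-padded [max_patches, patch_dim] buffer; unused trailing rows stay 0.
std::vector<float> make_padded_patches(const std::vector<float>& image_hwc, std::size_t height,
                                       std::size_t width, std::size_t patch_size,
                                       std::size_t max_patches) {
    const std::size_t num_patches_h = height / patch_size;
    const std::size_t num_patches_w = width / patch_size;
    const std::size_t patch_dim = patch_size * patch_size * 3;
    assert(num_patches_h * num_patches_w <= max_patches);  // avoid writing past the buffer
    std::vector<float> buffer(max_patches * patch_dim, 0.0f);
    extract_patches(image_hwc, width, patch_size, buffer.data(), num_patches_h, num_patches_w);
    return buffer;
}
```

Padding to a fixed max_patches keeps the encoder input shape static, which is why the unused rows are zero-filled rather than trimmed.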


```cpp
ov::Tensor input_ids = get_encoded_input_ids(prompt, metrics);

m_lm_extra_inputs["per_layer_inputs"] = get_per_layer_embeddings(input_ids);
```
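get_per_layer_embeddings(input_ids) maps token ids to extra per-layer inputs, and the overview says per_layer_inputs is recomputed on every generation step. The sketch below models this as a plain table gather, which is an assumption: the PR describes a separate per-layer embeddings model, and PerLayerEmbeddingTable with its [vocab_size, num_layers * per_layer_dim] layout is hypothetical. During prefill the whole prompt is looked up; on each decode step only the newly sampled token is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-layer embedding table: for every token id it stores
// num_layers vectors of size per_layer_dim, laid out contiguously.
struct PerLayerEmbeddingTable {
    std::size_t vocab_size;
    std::size_t num_layers;
    std::size_t per_layer_dim;
    std::vector<float> weights;  // [vocab_size, num_layers * per_layer_dim]

    // Gathers one row per id; result is [seq_len, num_layers, per_layer_dim].
    std::vector<float> lookup(const std::vector<std::int64_t>& input_ids) const {
        const std::size_t row = num_layers * per_layer_dim;
        assert(weights.size() == vocab_size * row);
        std::vector<float> out;
        out.reserve(input_ids.size() * row);
        for (std::int64_t id : input_ids) {
            assert(id >= 0 && static_cast<std::size_t>(id) < vocab_size);
            const float* src = weights.data() + static_cast<std::size_t>(id) * row;
            out.insert(out.end(), src, src + row);
        }
        return out;
    }
};
```

Calling lookup with the prompt ids yields the prefill input; calling it with a single id yields the one-token slice a decode step would pass alongside input_ids.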

Labels

  • category: GGUF · GGUF file reader
  • category: GH Pages Docs · Github Pages documentation
  • category: GHA · CI based on Github actions
  • category: LLM · LLM pipeline (stateful, static)
  • category: visual language · Visual language pipeline
  • no-match-files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Gemma 4 support

6 participants