[Bug]: Qwen3-Coder-30B-A3B-Instruct-int4-ov runs on CPU but triggers OOM on iGPU #34415

@ikirsh

OpenVINO Version

OpenVINO Model Server 2026.0.0.4d3933c5c, OpenVINO backend 2026.0.0.0rc3

Operating System

Other (Please specify in description)

Device used for inference

GPU

Framework

None

Model used

Qwen3-Coder-30B-A3B-Instruct

Issue description

I tried running the Qwen3-Coder-30B-A3B-Instruct model on an Intel Core Ultra 7 265 under Ubuntu 25.10.

The model runs on the CPU but fails on the GPU. During the GPU load attempt, system memory usage climbs until it fills all 96 GB of RAM and nearly all of the 32 GB swap file. I tried to mitigate this with various parameters that limit the KV cache size, but the memory leak or exhaustion persists regardless of these settings.
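To put numbers on the memory growth described above, one option is to sample the server's resident memory and the system's free memory once per second while the model loads. The following is a minimal sketch that reads only /proc on Linux; the function names sample_mem and watch_mem are hypothetical helpers, not part of OVMS or OpenVINO.

```shell
# Hypothetical helpers (not part of OVMS); they read only /proc on Linux.

# Print one sample: epoch seconds, process RSS, MemAvailable, SwapFree (all KiB).
sample_mem() {
  pid="$1"
  rss=$(awk '/^VmRSS:/ {print $2}' "/proc/${pid}/status" 2>/dev/null)
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  swap=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
  printf '%s %s %s %s\n' "$(date +%s)" "${rss:-0}" "${avail}" "${swap}"
}

# Sample once per second for as long as the process reports a VmRSS line,
# which stops being true once it has exited or been killed.
watch_mem() {
  pid="$1"
  while grep -q '^VmRSS:' "/proc/${pid}/status" 2>/dev/null; do
    sample_mem "$pid"
    sleep 1
  done
}
```

A possible use is to start the failing GPU command in the background and then run watch_mem "$!" > ovms_gpu_mem.log (the log file name is arbitrary). The resulting log shows how quickly RSS grows and at what point swap is exhausted.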

Step-by-step reproduction

I pulled the pre-optimized model directly with:

/opt/openvino/ovms/bin/ovms --pull \
  --source_model "OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov" \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
  --task text_generation

It worked on the CPU with:

/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device CPU

It failed on the GPU (the process was killed by Linux) with:

/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device GPU
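To confirm that the process was terminated by the kernel's OOM killer, and to capture which process was chosen, the kernel log can be filtered for the usual OOM messages. This is a sketch under the assumption that the system uses systemd-journald; oom_lines is a hypothetical helper name, and the pattern covers only the common kernel message forms.

```shell
# Hypothetical helper: keep only lines that look like kernel OOM-killer messages.
oom_lines() {
  grep -Ei 'out of memory|oom-kill|killed process'
}
```

It can be fed from the kernel log of the current boot, for example journalctl -k -b | oom_lines, or from dmesg where that is readable. Matching lines normally name the killed process and its memory footprint at the time.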

Relevant log output

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
