OpenVINO Version
OpenVINO Model Server 2026.0.0.4d3933c5c, OpenVINO backend 2026.0.0.0rc3
Operating System
Other (Please specify in description)
Device used for inference
GPU
Framework
None
Model used
Qwen3-Coder-30B-A3B-Instruct
Issue description
I tried running the Qwen3-Coder-30B-A3B-Instruct model on Intel Core Ultra 7 265, Ubuntu 25.10.
It worked on the CPU but failed on the GPU: during the GPU load attempt, system memory usage climbs until it fills all 96 GB of RAM and exhausts nearly all of the 32 GB swap file, at which point the process is killed. I attempted to mitigate this by applying various parameters to limit the KV cache size, but the memory leak/exhaustion persists regardless of these settings.
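For reference, the kind of KV cache limit I tried looked like the sketch below (this is an assumption about the relevant flag; `--cache_size` takes a size in GB in recent OVMS releases, and the value 4 here is arbitrary):

```shell
# Hypothetical mitigation attempt: cap the continuous-batching KV cache at 4 GB.
# Did not prevent the RAM/swap exhaustion on GPU in my runs.
/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device GPU \
  --cache_size 4
```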
Step-by-step reproduction
I pulled the optimized model directly via:
/opt/openvino/ovms/bin/ovms --pull \
--source_model "OpenVINO/Qwen3-Coder-30B-A3B-Instruct-int4-ov" \
--model_repository_path /opt/openvino/models \
--model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
--task text_generation
It worked with:
/opt/openvino/ovms/bin/ovms \
--model_repository_path /opt/openvino/models \
--model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
--task text_generation \
--port 9001 \
--rest_port 8000 \
--target_device CPU
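To confirm the CPU run actually served requests, I verified generation through the REST endpoint. A minimal sketch, assuming OVMS's OpenAI-compatible `/v3/chat/completions` route on the `--rest_port` above:

```shell
# Sanity check against the running CPU server (port 8000 = --rest_port).
curl -s http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Coder-30B-A3B-Instruct-int4-ov",
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 16
      }'
```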
and failed (process killed by the Linux OOM killer) with:
/opt/openvino/ovms/bin/ovms \
--model_repository_path /opt/openvino/models \
--model_name Qwen3-Coder-30B-A3B-Instruct-int4-ov \
--task text_generation \
--port 9001 \
--rest_port 8000 \
--target_device GPU
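To capture the memory climb and confirm the kill, I watched memory in a second terminal during the GPU load attempt; a sketch of the monitoring commands (standard Linux tools, nothing OVMS-specific):

```shell
# Observe RAM/swap filling while ovms loads the model on GPU.
watch -n 1 free -h

# After the process dies, check the kernel log for OOM-killer evidence.
sudo dmesg | grep -i -e oom -e "killed process"
```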
Relevant log output
Issue submission checklist