gemini-3.1-flash-image-preview generation times are significantly slower than gemini-3-pro-image-preview #1544

#### Environment details

- Programming language: JavaScript / TypeScript
- Operating systems: macOS and Windows
- Runtime: Node.js v20+ / browser environment
- Package version: `@google/genai@1.51.0`

#### Steps to reproduce

1. Initialize the `@google/genai` client.
2. Send an image generation request to `gemini-3.1-flash-image-preview` using a minimal configuration: `imageSize: "1K"` and `thinkingLevel: "minimal"`.
3. Send the same request to `gemini-3-pro-image-preview`.
4. Measure the latency for each request, specifically the time until the first image data is returned through the stream.
5. Observe that `gemini-3.1-flash-image-preview` consistently takes significantly longer to return an image than `gemini-3-pro-image-preview`, despite being the Flash variant.
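Because a single request can be skewed by a cold start or network jitter, step 4 is more convincing when repeated several times per model. The following is a minimal sketch of how the collected samples could be summarized; `summarizeLatencies` and `LatencySummary` are hypothetical names introduced here for illustration and are not part of `@google/genai`.

```typescript
// Hypothetical helper (not part of @google/genai): summarizes repeated
// latency samples, given in milliseconds, for one model.
interface LatencySummary {
  count: number;
  minMs: number;
  medianMs: number;
  maxMs: number;
}

function summarizeLatencies(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error("At least one latency sample is required.");
  }
  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // The median is less sensitive to a single slow outlier than the mean.
  const medianMs =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    count: sorted.length,
    minMs: sorted[0],
    medianMs,
    maxMs: sorted[sorted.length - 1],
  };
}
```

Collecting one sample per run for each model and comparing the two summaries would show whether the gap holds across runs or is driven by a few outliers.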

#### Minimal reproducible code

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function testLatency(modelName: string): Promise<void> {
  const startTime = Date.now();
  console.log(`Testing ${modelName}...`);

  const response = await ai.models.generateContentStream({
    model: modelName,
    contents: "A futuristic city skyline at sunset.",
    config: {
      responseModalities: ["IMAGE"],
      imageConfig: { imageSize: "1K" },
      thinkingConfig: { thinkingLevel: "minimal" }
    }
  });

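  // Stop at the first chunk that carries image data in any part, not only the first part.
  for await (const chunk of response) {
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    if (parts.some((part) => part.inlineData)) {
      console.log(`${modelName} returned first image data after ${(Date.now() - startTime) / 1000} seconds.`);
      return;
    }
  }
  // Reached only if the stream ended without any image data.
  console.log(`${modelName} returned no image data after ${(Date.now() - startTime) / 1000} seconds.`);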
}

// Run sequentially to compare latency under the same conditions (top-level await requires an ES module).
await testLatency('gemini-3-pro-image-preview');
await testLatency('gemini-3.1-flash-image-preview');
```

#### Expected behavior and additional context

When using the API to generate a baseline 1K image with minimal thinking enabled, I would expect `gemini-3.1-flash-image-preview` to return results faster than the Pro image preview model, or at least within a comparable latency range. Instead, the Flash endpoint is taking considerably longer than `gemini-3-pro-image-preview` under the same request conditions.

My understanding was that the 3.1 Flash Image Preview model is intended to be faster, lighter-weight, and better suited for rapid iteration. The observed behavior appears to contradict that expectation, especially when the payload is intentionally minimal and no higher-resolution image settings or heavier thinking configuration are being used.

Is there currently a known warm-up delay, routing bottleneck, capacity issue, or backend initialization cost affecting the `gemini-3.1-flash-image-preview` endpoint? Alternatively, could this be related to how the SDK handles streaming responses for this specific model, where image data may not be emitted until later in the generation process even if the request itself has begun processing earlier?
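To help separate those two possibilities, the sketch below records three timings from a stream: when the first chunk of any kind arrives, when the first chunk containing image data arrives, and when the stream ends. `measureStreamTimings`, `ChunkLike`, and `StreamTimings` are hypothetical names introduced for illustration. `ChunkLike` describes only the fields read in the reproduction code above, and it is an assumption that the chunks yielded by `generateContentStream` fit that shape.

```typescript
// Minimal structural type covering only the fields this sketch reads.
// Assumption: chunks yielded by generateContentStream match this shape.
interface ChunkLike {
  candidates?: { content?: { parts?: { inlineData?: unknown }[] } }[];
}

interface StreamTimings {
  firstChunkMs: number | null; // first chunk of any kind
  firstImageMs: number | null; // first chunk with inlineData in any part
  totalMs: number;             // time until the stream is exhausted
  chunkCount: number;
}

// Hypothetical helper: consumes the whole stream and records when each
// milestone occurred, relative to startTime. The clock is injectable so the
// function can be exercised without calling the API.
async function measureStreamTimings(
  stream: AsyncIterable<ChunkLike>,
  startTime: number,
  now: () => number = Date.now,
): Promise<StreamTimings> {
  let firstChunkMs: number | null = null;
  let firstImageMs: number | null = null;
  let chunkCount = 0;

  for await (const chunk of stream) {
    chunkCount += 1;
    const elapsed = now() - startTime;
    if (firstChunkMs === null) {
      firstChunkMs = elapsed;
    }
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    if (firstImageMs === null && parts.some((p) => p.inlineData !== undefined)) {
      firstImageMs = elapsed;
    }
  }

  return { firstChunkMs, firstImageMs, totalMs: now() - startTime, chunkCount };
}
```

In the reproduction above, this could be called as `measureStreamTimings(response, startTime)` in place of the `for await` loop. If `firstChunkMs` is small while `firstImageMs` is large, the request is being accepted promptly and the delay lies in when image data is emitted. If both are large, the delay occurs before any streaming begins.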
