gemini-3.1-flash-image-preview generation times are significantly slower than gemini-3-pro-image-preview #1544

@fractaldna22

Description

Environment details

  • Programming language: JavaScript / TypeScript
  • Operating systems: macOS and Windows
  • Runtime: Node.js v20+ / browser environment
  • Package version: @google/genai@1.51.0

Steps to reproduce

  1. Initialize the @google/genai client.
  2. Send an image generation request to gemini-3.1-flash-image-preview using a minimal configuration: imageSize: "1K" and thinkingLevel: "minimal".
  3. Send the same request to gemini-3-pro-image-preview.
  4. Measure the latency for each request, specifically the time until the first image data is returned through the stream.
  5. Observe that gemini-3.1-flash-image-preview consistently takes significantly longer to return an image than gemini-3-pro-image-preview, despite being the Flash variant.

Minimal reproducible code

import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function testLatency(modelName) {
  const startTime = Date.now();
  console.log(`Testing ${modelName}...`);

  const response = await ai.models.generateContentStream({
    model: modelName,
    contents: "A futuristic city skyline at sunset.",
    config: {
      responseModalities: ["IMAGE"],
      imageConfig: { imageSize: "1K" },
      thinkingConfig: { thinkingLevel: "minimal" }
    }
  });

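  for await (const chunk of response) {
    // Check every part, not only parts[0]; text or interim thought parts may precede the image.
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    if (parts.some((part) => part.inlineData && !part.thought)) {
      console.log(`${modelName} returned its first image in ${(Date.now() - startTime) / 1000} seconds.`);
      break;
    }
  }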
}

// Run sequentially to compare latency under the same conditions.
// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json).
await testLatency('gemini-3-pro-image-preview');
await testLatency('gemini-3.1-flash-image-preview');

Expected behavior and additional context

When using the API to generate a baseline 1K image with minimal thinking enabled, I would expect gemini-3.1-flash-image-preview to return results faster than the Pro image preview model, or at least within a comparable latency range. Instead, the Flash endpoint is taking considerably longer than gemini-3-pro-image-preview under the same request conditions.

My understanding was that the 3.1 Flash Image Preview model is intended to be faster, lighter-weight, and better suited for rapid iteration. The observed behavior appears to contradict that expectation, especially when the payload is intentionally minimal and no higher-resolution image settings or heavier thinking configuration are being used.

Is there currently a known warm-up delay, routing bottleneck, capacity issue, or backend initialization cost affecting the gemini-3.1-flash-image-preview endpoint? Alternatively, could this be related to how the SDK handles streaming responses for this specific model, where image data may not be emitted until later in the generation process even if the request itself has begun processing earlier?
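A single sequential run always sends the Pro request first, so a one-off warm-up or routing delay could land on whichever model happens to run second. One way to probe the warm-up question is to repeat the comparison several times, rotate which model goes first in each round, discard the first round, and compare medians. The sketch below is a hypothetical helper (`benchmark` and `runOnce` are illustrative names, not SDK functions); `runOnce(model)` is assumed to perform one request and resolve to a latency in milliseconds.

```javascript
// Hypothetical benchmarking helper, not part of @google/genai.
// runOnce(model) must resolve to one latency sample in milliseconds.
async function benchmark(models, runOnce, { trials = 5, warmup = 1 } = {}) {
  const samples = Object.fromEntries(models.map((m) => [m, []]));

  for (let round = 0; round < warmup + trials; round++) {
    // Rotate the starting model each round so no model always runs first.
    const order = models.map((_, j) => models[(round + j) % models.length]);
    for (const model of order) {
      const ms = await runOnce(model);
      if (round >= warmup) samples[model].push(ms); // discard warm-up rounds
    }
  }

  // Median is less sensitive than the mean to a single slow request.
  const median = (xs) => {
    const s = [...xs].sort((a, b) => a - b);
    const mid = Math.floor(s.length / 2);
    return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
  };

  return Object.fromEntries(
    models.map((m) => [m, { median: median(samples[m]), samples: samples[m] }])
  );
}

// Usage sketch (assumes the `ai` client and request config from the repro):
// const results = await benchmark(
//   ["gemini-3-pro-image-preview", "gemini-3.1-flash-image-preview"],
//   async (model) => {
//     const start = performance.now();
//     const stream = await ai.models.generateContentStream({ model, contents, config });
//     for await (const chunk of stream) {
//       const parts = chunk.candidates?.[0]?.content?.parts ?? [];
//       if (parts.some((p) => p.inlineData && !p.thought)) break;
//     }
//     return performance.now() - start;
//   },
//   { trials: 5, warmup: 1 }
// );
```

If the Flash model is still slower on the median after rotation and warm-up, an ordering or cold-start effect on the client side becomes a less likely explanation.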

Metadata

Labels

  • api:gemini-api
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
