Environment details
- Programming language: JavaScript / TypeScript
- Operating systems: macOS and Windows
- Runtime: Node.js v20+ / browser environment
- Package version: @google/genai@1.51.0
Steps to reproduce
- Initialize the @google/genai client.
- Send an image generation request to gemini-3.1-flash-image-preview using a minimal configuration: imageSize: "1K" and thinkingLevel: "minimal".
- Send the same request to gemini-3-pro-image-preview.
- Measure the latency for each request, specifically the time until the first image data is returned through the stream.
- Observe that gemini-3.1-flash-image-preview consistently takes significantly longer to return an image than gemini-3-pro-image-preview, despite being the Flash variant.
Minimal reproducible code
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Logs the time from sending the request until the first streamed chunk that carries image data.
async function testLatency(modelName) {
  const startTime = Date.now();
  console.log(`Testing ${modelName}...`);
  const response = await ai.models.generateContentStream({
    model: modelName,
    contents: "A futuristic city skyline at sunset.",
    config: {
      responseModalities: ["IMAGE"],
      imageConfig: { imageSize: "1K" },
      thinkingConfig: { thinkingLevel: "minimal" }
    }
  });
  for await (const chunk of response) {
    // Check every part: the image is not guaranteed to be the first part of a chunk.
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    if (parts.some((part) => part.inlineData)) {
      console.log(`${modelName} returned first image data after ${(Date.now() - startTime) / 1000} seconds.`);
      break;
    }
  }
}

// Run sequentially to compare latency under the same conditions.
await testLatency('gemini-3-pro-image-preview');
await testLatency('gemini-3.1-flash-image-preview');
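A single run per model can be skewed by a cold start, so it may help to repeat each measurement and compare medians. The following is a minimal sketch of that idea, not part of the SDK: `repeatMeasure` is a hypothetical helper, and it assumes the measurement function returns the elapsed time as a number (the `testLatency` function above only logs it, so it would need to return the value as well).

```javascript
// Hypothetical helper: runs an async measurement several times, one after
// another, and summarizes the numeric results it returned.
async function repeatMeasure(measure, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i += 1) {
    samples.push(await measure());
  }
  // Sort a copy so the original run order stays visible in `samples`.
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { samples, median, min: sorted[0], max: sorted[sorted.length - 1] };
}
```

Assuming `testLatency` is changed to return the elapsed seconds, a call such as `await repeatMeasure(() => testLatency('gemini-3.1-flash-image-preview'))` would show whether only the first run is slow or whether every run is.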
Expected behavior and additional context
When using the API to generate a baseline 1K image with minimal thinking enabled, I would expect gemini-3.1-flash-image-preview to return results faster than the Pro image preview model, or at least within a comparable latency range. Instead, the Flash endpoint is taking considerably longer than gemini-3-pro-image-preview under the same request conditions.
My understanding was that the 3.1 Flash Image Preview model is intended to be faster, lighter-weight, and better suited for rapid iteration. The observed behavior appears to contradict that expectation, especially when the payload is intentionally minimal and no higher-resolution image settings or heavier thinking configuration are being used.
Is there currently a known warm-up delay, routing bottleneck, capacity issue, or backend initialization cost affecting the gemini-3.1-flash-image-preview endpoint? Alternatively, could this be related to how the SDK handles streaming responses for this specific model, where image data may not be emitted until later in the generation process even if the request itself has begun processing earlier?
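To help separate the two possibilities above, one option is to record both the time to the first streamed chunk of any kind and the time to the first chunk that contains image data. The sketch below is a rough illustration under stated assumptions: `timeStream` is a hypothetical helper (not an SDK function), and it assumes streamed chunks have the same `candidates[0].content.parts[].inlineData` shape used in the reproduction code.

```javascript
// Hypothetical helper: consumes a streamed response and records
// (a) when the first chunk of any kind arrived and
// (b) when the first chunk carrying image bytes (inlineData) arrived.
// `start` is the timestamp taken just before the request was sent.
async function timeStream(stream, start, now = Date.now) {
  const timings = { firstChunkMs: null, firstImageMs: null, chunks: 0 };
  for await (const chunk of stream) {
    timings.chunks += 1;
    if (timings.firstChunkMs === null) {
      timings.firstChunkMs = now() - start;
    }
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    if (parts.some((part) => part.inlineData)) {
      timings.firstImageMs = now() - start;
      break; // Stop at the first image, as in the reproduction code.
    }
  }
  return timings;
}

// Possible usage with the client from the reproduction code:
//   const start = Date.now();
//   const stream = await ai.models.generateContentStream({ /* same request */ });
//   console.log(await timeStream(stream, start));
```

If `firstChunkMs` is already large, the delay occurs before anything is streamed at all, which would point toward the backend (queueing, warm-up, or generation time). If `firstChunkMs` is small but `firstImageMs` is large, the request is being processed early and the image is simply emitted late in the stream.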