
feat: add dynamic batching to Mistral and OpenAI embedders #36461

Merged
bjorncs merged 4 commits into master from bjorncs/openai-embedder on Apr 20, 2026
Conversation

@bjorncs
Member

@bjorncs bjorncs commented Apr 20, 2026

Summary

  • Opt the Mistral and OpenAI embedders into the framework-level dynamic batching mechanism already used by the VoyageAI embedder (EmbedExpression + DynamicBatcher), so concurrent per-document embed() calls are coalesced into a single multi-input API request.
  • Extend the mistral-embedder and openai-embedder config definitions with batching.maxSize / batching.maxDelayMillis (default 0 = disabled) and expose the matching <batching max-size="…" max-delay="…"/> XML element via common.rnc (example after this list).
  • Consolidate builder-forwarding logic across all three embedder implementations (VoyageAI, Mistral, OpenAI) via a new EmbedderBatchingConfig.applyTo(maxSizeSetter, maxDelayMillisSetter) helper.
  • Fold in a pre-existing fix (fix: record actual 5xx status code when embedder retries are exhausted) that's already on this branch.
  • Add unit, config-model XML, and integration tests covering the new batching path.
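For illustration, a hedged services.xml sketch of the new element; the surrounding component declaration and the values are assumptions, only the `<batching>` element and its two attributes come from this PR:

```xml
<!-- Hypothetical embedder component wiring; the <batching> element and its
     max-size / max-delay attributes are what this PR adds. With the default
     of 0 for both, batching stays disabled. -->
<component id="openai" type="openai-embedder">
    <batching max-size="16" max-delay="10"/>
</component>
```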

@bjorncs bjorncs requested a review from glebashnik April 20, 2026 10:53
bjorncs added 2 commits April 20, 2026 13:09

fix: record actual 5xx status code when embedder retries are exhausted

The RetryInterceptor used to throw an IOException with the final status
code encoded in its message when retries were exhausted on a retryable
5xx response. The outer doHttpRequest's IOException catch then sampled
the failure metric as status 0 (connection error), so dashboards/alerts
that distinguish network failures from server overload misclassified
exhausted-retry failures across all HTTP embedders (OpenAI, Mistral,
VoyageAI).

Return the final retryable 5xx response from the interceptor instead of
throwing. The existing non-2xx path in doHttpRequest then samples the
real code (500/502/503/504) and maps to the appropriate exception.
True connection failures still propagate as IOException and sample 0,
as intended.
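
A hedged sketch of the new interceptor shape, as an OkHttp application interceptor; MAX_RETRIES, isRetryable and the class name are illustrative, not the actual Vespa source:

```java
import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.Response;

// Illustrative retry interceptor: on exhausted retries it returns the final
// retryable 5xx response instead of throwing, so doHttpRequest's non-2xx
// path can sample the real status code (500/502/503/504). True connection
// failures still propagate as IOException and sample status 0.
class RetryInterceptorSketch implements Interceptor {

    private static final int MAX_RETRIES = 3; // assumed; the real value comes from config

    @Override
    public Response intercept(Chain chain) throws IOException {
        Response response = chain.proceed(chain.request());
        for (int attempt = 0; attempt < MAX_RETRIES && isRetryable(response); attempt++) {
            response.close(); // release the previous attempt before retrying
            response = chain.proceed(chain.request());
        }
        // Previously this threw IOException with the status encoded in the
        // message; now the last response is returned as-is.
        return response;
    }

    private static boolean isRetryable(Response response) {
        int code = response.code();
        return code == 500 || code == 502 || code == 503 || code == 504;
    }
}
```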
feat: add dynamic batching to Mistral and OpenAI embedders

Opt the Mistral and OpenAI embedders into the framework-level dynamic
batching mechanism already used by the VoyageAI embedder. Under
concurrent load, EmbedExpression + DynamicBatcher now accumulates
per-document embed() calls into a single multi-input API request,
reducing request count and improving throughput.

- Add batching.maxSize / batching.maxDelayMillis to the mistral- and
  openai-embedder config definitions (default 0 = disabled).
- Initialize Embedder.Batching in the runtime embedders and expose it
  via batchingConfig().
- Parse the <batching max-size="..." max-delay="..."/> XML element in
  the config-model component builders via the existing
  EmbedderBatchingConfig helper; extend EmbedderBatchingConfig with an
  applyTo(maxSizeSetter, maxDelayMillisSetter) method so all three
  embedders (VoyageAI, Mistral, OpenAI) forward the parsed values to
  the generated config builder through a single shared call (see the
  sketch after this list).
- Add EmbedderBatchingParams to the RELAX NG schema for both element
  types.
- Extend unit, config-model XML, and integration tests to cover the
  new batching config path.
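
A rough sketch of the shared forwarding helper named above, assuming IntConsumer-shaped setters; the real class and the generated builder APIs may differ:

```java
import java.util.function.IntConsumer;

// Illustrative shape of EmbedderBatchingConfig.applyTo: the config-model
// component for each embedder (VoyageAI, Mistral, OpenAI) hands over its
// generated builder's two batching setters, so the parsed <batching> values
// flow through one shared call instead of three copies of the same code.
record EmbedderBatchingConfigSketch(int maxSize, int maxDelayMillis) {

    void applyTo(IntConsumer maxSizeSetter, IntConsumer maxDelayMillisSetter) {
        maxSizeSetter.accept(maxSize);
        maxDelayMillisSetter.accept(maxDelayMillis);
    }
}
```

A call site would then look roughly like batching.applyTo(builder.batching::maxSize, builder.batching::maxDelayMillis), where the builder member names are assumptions about the generated config classes.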
Contributor

Copilot AI left a comment

Pull request overview

This PR opts the OpenAI and Mistral embedders into Vespa’s existing framework-level dynamic batching (used by EmbedExpression + DynamicBatcher) by exposing batching configuration via embedder configs and returning that configuration from the embedder implementations. It also incorporates a retry/metrics fix to ensure the final 5xx status code is recorded when retries are exhausted.

Changes:

  • Add batching.maxSize / batching.maxDelayMillis config to OpenAI and Mistral embedders, wire through XML (<batching max-size="…" max-delay="…"/>) and model config builders.
  • Expose batching configuration from OpenAIEmbedder and MistralEmbedder via Embedder#batchingConfig() so EmbedExpression can enable dynamic batching (sketched after this list).
  • Adjust retry behavior/tests so exhausted retries surface the actual last HTTP status code and sample it in metrics.
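A minimal sketch of that exposure, with the Batching value type's shape assumed from the two config fields; Vespa's real Embedder.Batching may differ:

```java
// Illustrative runtime-side wiring: the embedder surfaces its configured
// batching parameters so EmbedExpression can decide whether to route
// embed() calls through DynamicBatcher.
public class OpenAIEmbedderSketch {

    public record Batching(int maxSize, int maxDelayMillis) {}

    private final Batching batching;

    public OpenAIEmbedderSketch(int maxSize, int maxDelayMillis) {
        // Defaults of 0/0 mean batching stays disabled, preserving the old
        // one-request-per-document behavior.
        this.batching = new Batching(maxSize, maxDelayMillis);
    }

    public Batching batchingConfig() {
        return batching;
    }
}
```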

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| model-integration/src/main/java/ai/vespa/embedding/OpenAIEmbedder.java | Exposes batchingConfig() based on the new config fields. |
| model-integration/src/main/java/ai/vespa/embedding/MistralEmbedder.java | Exposes batchingConfig() based on the new config fields. |
| model-integration/src/main/java/ai/vespa/embedding/AbstractHttpEmbedder.java | Retry interceptor now returns the final retryable response so callers can record the actual status code. |
| configdefinitions/src/vespa/openai-embedder.def | Adds batching config fields with defaults (disabled). |
| configdefinitions/src/vespa/mistral-embedder.def | Adds batching config fields with defaults (disabled). |
| config-model/src/main/resources/schema/common.rnc | Allows <batching …/> for the OpenAI and Mistral embedders. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/EmbedderBatchingConfig.java | Adds the applyTo(...) helper for builder forwarding. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/OpenAIEmbedder.java | Parses <batching> and forwards to the config builder. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/MistralEmbedder.java | Parses <batching> and forwards to the config builder. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/VoyageAIEmbedder.java | Refactors batching forwarding to use the new helper. |
| config-model/src/test/java/com/yahoo/vespa/model/container/xml/OpenAIEmbedderTest.java | Verifies batching config is present in the generated config. |
| config-model/src/test/java/com/yahoo/vespa/model/container/xml/MistralEmbedderTest.java | Verifies batching config is present in the generated config. |
| config-model/src/test/cfg/application/openai-embedder/services.xml | Adds <batching .../> to the test services.xml. |
| config-model/src/test/cfg/application/mistral-embedder/services.xml | Adds <batching .../> to the test services.xml. |
| model-integration/src/test/java/ai/vespa/embedding/OpenAIEmbedderTest.java | Adds a unit test for batching config exposure, plus a minor builder helper refactor. |
| model-integration/src/test/java/ai/vespa/embedding/MistralEmbedderTest.java | Adds a batch embedding unit test, plus a minor builder helper refactor. |
| model-integration/src/test/java/ai/vespa/embedding/OpenAIEmbedderIntegrationTest.java | Adds an integration test covering batching config and an embed call. |
| model-integration/src/test/java/ai/vespa/embedding/MistralEmbedderIntegrationTest.java | Adds an integration test covering batching config and an embed call. |
| model-integration/src/test/java/ai/vespa/embedding/AbstractHttpEmbedderTest.java | Updates tests to assert the actual status code is surfaced and sampled after retries. |


bjorncs added a commit to vespa-engine/documentation that referenced this pull request Apr 20, 2026
Add the `batching` element row to the OpenAI and Mistral reference
config tables. Follow-up to the review of PR #4655: dynamic batching is
being added to these embedders in vespa-engine/vespa#36461.
glebashnik previously approved these changes Apr 20, 2026
Member

@glebashnik glebashnik left a comment


Looks good.
Minor suggestions.

Comment thread on configdefinitions/src/vespa/mistral-embedder.def

Comment thread on model-integration/src/main/java/ai/vespa/embedding/AbstractHttpEmbedder.java (outdated)
The RetryInterceptor's catch(IOException) branch, which retries
transport-level failures that occur after a request has been sent
(e.g. the connection dropped while awaiting the response), was not
covered by tests. OkHttp's retryOnConnectionFailure(true) transparently
handles connection-establishment failures, so simulating those would
not exercise this path; use SocketPolicy.DISCONNECT_AFTER_REQUEST to
force a mid-flight IOException instead, as in the sketch below.
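
A hedged sketch of that setup with okhttp's MockWebServer; the URL path, the second response, and the printouts are illustrative:

```java
import java.io.IOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.mockwebserver.MockResponse;
import okhttp3.mockwebserver.MockWebServer;
import okhttp3.mockwebserver.SocketPolicy;

// DISCONNECT_AFTER_REQUEST lets the server read the full request and then
// drop the socket, producing an IOException after the request was sent
// (mid-flight) rather than a connection-establishment failure.
class MidFlightDisconnectSketch {

    public static void main(String[] args) throws IOException {
        MockWebServer server = new MockWebServer();
        server.enqueue(new MockResponse().setSocketPolicy(SocketPolicy.DISCONNECT_AFTER_REQUEST));
        server.enqueue(new MockResponse().setResponseCode(200).setBody("ok")); // served if a retry recovers
        server.start();

        OkHttpClient client = new OkHttpClient(); // the embedder's retry interceptor would be added here
        Request request = new Request.Builder().url(server.url("/v1/embeddings")).build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println("status after recovery: " + response.code());
        } catch (IOException e) {
            System.out.println("mid-flight failure surfaced: " + e.getMessage());
        } finally {
            server.shutdown();
        }
    }
}
```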
Hoist the "max retries exceeded" log+return out of the combined
early-return condition into its own early-return. Equivalent
behavior since `retryable` is only true for 5xx (mutually
exclusive with `isSuccessful()`), and separating the exhausted-
retries path makes the intent explicit.
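
Roughly, the refactor being suggested; condition ordering and names are illustrative, not the actual code:

```java
import java.util.logging.Logger;
import okhttp3.Response;

// Illustrative shape of the hoisted early-return: success/non-retryable and
// exhausted-retries now return on separate, explicit paths. Returning null
// here stands in for "caller should retry".
class EarlyReturnSketch {

    private static final Logger log = Logger.getLogger(EarlyReturnSketch.class.getName());

    Response resolve(Response response, boolean retryable, int attempt, int maxRetries) {
        // retryable is only true for 5xx, so it never overlaps isSuccessful().
        if (response.isSuccessful() || !retryable) return response;
        // The exhausted-retries case gets its own early-return with a log line,
        // instead of being folded into the combined condition above.
        if (attempt >= maxRetries) {
            log.warning("Max retries exceeded; returning status " + response.code());
            return response;
        }
        return null; // signal: retry
    }
}
```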
@bjorncs bjorncs requested a review from glebashnik April 20, 2026 12:32
Member

@glebashnik glebashnik left a comment


Looks good!

@bjorncs bjorncs merged commit d78a07f into master Apr 20, 2026
3 checks passed
@bjorncs bjorncs deleted the bjorncs/openai-embedder branch April 20, 2026 12:47
