feat: add dynamic batching to Mistral and OpenAI embedders #36461
Merged
Conversation
The RetryInterceptor used to throw an IOException with the final status code encoded in its message when retries were exhausted on a retryable 5xx response. The outer doHttpRequest's IOException catch then sampled the failure metric as status 0 (connection error), so dashboards/alerts that distinguish network failures from server overload misclassified exhausted-retry failures across all HTTP embedders (OpenAI, Mistral, VoyageAI). Return the final retryable 5xx response from the interceptor instead of throwing. The existing non-2xx path in doHttpRequest then samples the real code (500/502/503/504) and maps to the appropriate exception. True connection failures still propagate as IOException and sample 0, as intended.
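The metrics-visible effect of this change can be modeled with a dependency-free simulation (a sketch only: the method names, retry limit, and loop shape below are illustrative, not the actual Vespa/OkHttp interceptor code). The key point is that when retries are exhausted, the last 5xx response is returned so its real status code reaches the failure metric, instead of an IOException collapsing it to status 0:

```java
import java.util.List;

public class RetrySampling {
    static final int MAX_RETRIES = 3;  // illustrative; not the actual configured limit

    // Simulates the fixed behavior: retry on 5xx, and when retries are
    // exhausted, RETURN the last response instead of throwing, so the
    // caller's non-2xx path samples the real status code.
    static int finalSampledStatus(List<Integer> responses) {
        int last = 0;  // 0 would mean "connection error" in the metric
        for (int attempt = 0; attempt <= MAX_RETRIES && attempt < responses.size(); attempt++) {
            last = responses.get(attempt);
            if (last < 500) return last;  // 2xx/4xx: non-retryable, hand back immediately
        }
        return last;  // retries exhausted: surface the final 5xx, not 0
    }

    public static void main(String[] args) {
        // Four 503s in a row: previously sampled as 0 (connection error),
        // now the real 503 reaches the dashboards.
        System.out.println(finalSampledStatus(List.of(503, 503, 503, 503))); // 503
        // A 503 followed by a successful retry returns the 200.
        System.out.println(finalSampledStatus(List.of(503, 200)));           // 200
    }
}
```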
Opt the Mistral and OpenAI embedders into the framework-level dynamic batching mechanism already used by the VoyageAI embedder. Under concurrent load, `EmbedExpression` + `DynamicBatcher` now accumulates per-document `embed()` calls into a single multi-input API request, reducing request count and improving throughput.

- Add `batching.maxSize` / `batching.maxDelayMillis` to the mistral- and openai-embedder config definitions (default 0 = disabled).
- Initialize `Embedder.Batching` in the runtime embedders and expose it via `batchingConfig()`.
- Parse the `<batching max-size="..." max-delay="..."/>` XML element in the config-model component builders via the existing `EmbedderBatchingConfig` helper; extend `EmbedderBatchingConfig` with an `applyTo(maxSizeSetter, maxDelayMillisSetter)` method so all three embedders (VoyageAI, Mistral, OpenAI) forward the parsed values to the generated config builder through a single shared call.
- Add `EmbedderBatchingParams` to the RELAX NG schema for both element types.
- Extend unit, config-model XML, and integration tests to cover the new batching config path.
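Based on the element shape described above, enabling batching in an application's services.xml might look like the following (a hypothetical fragment: the component id, type string, and attribute values are illustrative, not taken from this PR's test fixtures):

```xml
<!-- Hypothetical services.xml fragment; id and values are illustrative. -->
<component id="openai" type="openai-embedder">
  <!-- Coalesce up to 16 concurrent embed() calls, waiting at most 10 ms. -->
  <!-- Omitting the element keeps the default (0 = batching disabled). -->
  <batching max-size="16" max-delay="10"/>
</component>
```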
9e138dc to 49e166c
Contributor
Pull request overview
This PR opts the OpenAI and Mistral embedders into Vespa’s existing framework-level dynamic batching (used by EmbedExpression + DynamicBatcher) by exposing batching configuration via embedder configs and returning that configuration from the embedder implementations. It also incorporates a retry/metrics fix to ensure the final 5xx status code is recorded when retries are exhausted.
Changes:
- Add `batching.maxSize`/`batching.maxDelayMillis` config to OpenAI and Mistral embedders, wired through XML (`<batching max-size="…" max-delay="…"/>`) and the model config builders.
- Expose batching configuration from `OpenAIEmbedder` and `MistralEmbedder` via `Embedder#batchingConfig()` so `EmbedExpression` can enable dynamic batching.
- Adjust retry behavior and tests so exhausted retries surface the actual last HTTP status code and sample it in metrics.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.
Summary per file:
| File | Description |
|---|---|
| model-integration/src/main/java/ai/vespa/embedding/OpenAIEmbedder.java | Exposes batchingConfig() based on new config fields. |
| model-integration/src/main/java/ai/vespa/embedding/MistralEmbedder.java | Exposes batchingConfig() based on new config fields. |
| model-integration/src/main/java/ai/vespa/embedding/AbstractHttpEmbedder.java | Retry interceptor now returns the final retryable response so callers can record the actual status code. |
| configdefinitions/src/vespa/openai-embedder.def | Adds batching config fields with defaults (disabled). |
| configdefinitions/src/vespa/mistral-embedder.def | Adds batching config fields with defaults (disabled). |
| config-model/src/main/resources/schema/common.rnc | Allows <batching …/> for OpenAI and Mistral embedders. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/EmbedderBatchingConfig.java | Adds applyTo(...) helper for builder forwarding. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/OpenAIEmbedder.java | Parses <batching> and forwards to config builder. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/MistralEmbedder.java | Parses <batching> and forwards to config builder. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/VoyageAIEmbedder.java | Refactors batching forwarding to use the new helper. |
| config-model/src/test/java/com/yahoo/vespa/model/container/xml/OpenAIEmbedderTest.java | Verifies batching config is present in generated config. |
| config-model/src/test/java/com/yahoo/vespa/model/container/xml/MistralEmbedderTest.java | Verifies batching config is present in generated config. |
| config-model/src/test/cfg/application/openai-embedder/services.xml | Adds <batching .../> to test services.xml. |
| config-model/src/test/cfg/application/mistral-embedder/services.xml | Adds <batching .../> to test services.xml. |
| model-integration/src/test/java/ai/vespa/embedding/OpenAIEmbedderTest.java | Adds unit test for batching config exposure + minor builder helper refactor. |
| model-integration/src/test/java/ai/vespa/embedding/MistralEmbedderTest.java | Adds batch embedding unit test + minor builder helper refactor. |
| model-integration/src/test/java/ai/vespa/embedding/OpenAIEmbedderIntegrationTest.java | Adds integration test covering batching config + embed call. |
| model-integration/src/test/java/ai/vespa/embedding/MistralEmbedderIntegrationTest.java | Adds integration test covering batching config + embed call. |
| model-integration/src/test/java/ai/vespa/embedding/AbstractHttpEmbedderTest.java | Updates tests to assert actual status code is surfaced and sampled after retries. |
bjorncs added a commit to vespa-engine/documentation that referenced this pull request on Apr 20, 2026:
Add the `batching` element row to the OpenAI and Mistral reference config tables. Follow-up to the PR #4655 review: dynamic batching is being added to these embedders in vespa-engine/vespa#36461.
glebashnik previously approved these changes on Apr 20, 2026.

glebashnik (Member) left a comment:
Looks good.
Minor suggestions.
The RetryInterceptor's catch(IOException) branch — which retries transport-level failures that occur after a request has been sent (e.g. connection dropped while awaiting response) — was uncovered. OkHttp's retryOnConnectionFailure(true) transparently handles connection-establishment failures, so simulating those would not exercise this path; use SocketPolicy.DISCONNECT_AFTER_REQUEST to force a mid-flight IOException instead.
Hoist the "max retries exceeded" log+return out of the combined early-return condition into its own early return. Equivalent behavior, since `retryable` is only true for 5xx (mutually exclusive with `isSuccessful()`), and separating the exhausted-retries path makes the intent explicit.
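The suggested restructuring might look like the following dependency-free sketch (all names here, `Response`, `retryable`, `execute`, are illustrative stand-ins, not the actual Vespa interceptor code):

```java
import java.util.List;

public class RetryLoopSketch {
    // Minimal stand-in for an HTTP response; only the status code matters here.
    record Response(int code) {
        boolean isSuccessful() { return code >= 200 && code < 300; }
    }

    // Retryable is 5xx-only, so it is mutually exclusive with isSuccessful().
    static boolean retryable(Response r) { return r.code() >= 500; }

    // Before the suggestion, the loop exited via one combined condition like:
    //   if (response.isSuccessful() || !retryable(response) || attempt == maxRetries) return response;
    // After: the exhausted-retries case gets its own explicit early return.
    static Response execute(List<Response> attempts, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            Response response = attempts.get(attempt);
            if (!retryable(response)) return response;  // 2xx/4xx: done immediately
            if (attempt == maxRetries) {
                System.err.println("Max retries exceeded; returning final " + response.code());
                return response;  // explicit exhausted-retries path
            }
            // otherwise: fall through and retry
        }
    }

    public static void main(String[] args) {
        // Two 503s with maxRetries=1: exhausted, final 503 is returned.
        System.out.println(execute(List.of(new Response(503), new Response(503)), 1).code()); // 503
        // 503 then 200: the successful retry is returned.
        System.out.println(execute(List.of(new Response(503), new Response(200)), 3).code()); // 200
    }
}
```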
Summary
- Opt the Mistral and OpenAI embedders into the framework-level dynamic batching mechanism (`EmbedExpression` + `DynamicBatcher`), so concurrent per-document `embed()` calls are coalesced into a single multi-input API request.
- Extend the `mistral-embedder` and `openai-embedder` config definitions with `batching.maxSize`/`batching.maxDelayMillis` (default 0 = disabled) and expose the matching `<batching max-size="…" max-delay="…"/>` XML element via `common.rnc`.
- Share the XML parsing and builder forwarding across all three embedders through the `EmbedderBatchingConfig.applyTo(maxSizeSetter, maxDelayMillisSetter)` helper.
- Includes the retry/metrics fix (`fix: record actual 5xx status code when embedder retries are exhausted`) that's already on this branch.