
feat: add dynamic batching to Mistral and OpenAI embedders #36461

Merged
bjorncs merged 4 commits into master from bjorncs/openai-embedder on Apr 20, 2026
Conversation

@bjorncs
Member

@bjorncs bjorncs commented Apr 20, 2026

Summary

  • Opt the Mistral and OpenAI embedders into the framework-level dynamic batching mechanism already used by the VoyageAI embedder (EmbedExpression + DynamicBatcher), so concurrent per-document embed() calls are coalesced into a single multi-input API request.
  • Extend the mistral-embedder and openai-embedder config definitions with batching.maxSize / batching.maxDelayMillis (default 0 = disabled) and expose the matching <batching max-size="…" max-delay="…"/> XML element via common.rnc (example after this list).
  • Consolidate builder-forwarding logic across all three embedder implementations (VoyageAI, Mistral, OpenAI) via a new EmbedderBatchingConfig.applyTo(maxSizeSetter, maxDelayMillisSetter) helper.
  • Fold in a pre-existing fix (fix: record actual 5xx status code when embedder retries are exhausted) that's already on this branch.
  • Add unit, config-model XML, and integration tests covering the new batching path.
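For illustration, a hedged services.xml sketch of the new element; the surrounding component declaration and the values are assumptions, only the `<batching>` element and its two attributes come from this PR:

```xml
<!-- Hypothetical embedder component wiring; the <batching> element and its
     max-size / max-delay attributes are what this PR adds. With the default
     of 0 for both, batching stays disabled. -->
<component id="openai" type="openai-embedder">
    <batching max-size="16" max-delay="10"/>
</component>
```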

@bjorncs bjorncs requested a review from glebashnik April 20, 2026 10:53
bjorncs added 2 commits April 20, 2026 13:09

fix: record actual 5xx status code when embedder retries are exhausted

The RetryInterceptor used to throw an IOException with the final status
code encoded in its message when retries were exhausted on a retryable
5xx response. The outer doHttpRequest's IOException catch then sampled
the failure metric as status 0 (connection error), so dashboards/alerts
that distinguish network failures from server overload misclassified
exhausted-retry failures across all HTTP embedders (OpenAI, Mistral,
VoyageAI).

Return the final retryable 5xx response from the interceptor instead of
throwing. The existing non-2xx path in doHttpRequest then samples the
real code (500/502/503/504) and maps to the appropriate exception.
True connection failures still propagate as IOException and sample 0,
as intended.
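
A hedged sketch of the new interceptor shape, as an OkHttp application interceptor; MAX_RETRIES, isRetryable and the class name are illustrative, not the actual Vespa source:

```java
import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.Response;

// Illustrative retry interceptor: on exhausted retries it returns the final
// retryable 5xx response instead of throwing, so doHttpRequest's non-2xx
// path can sample the real status code (500/502/503/504). True connection
// failures still propagate as IOException and sample status 0.
class RetryInterceptorSketch implements Interceptor {

    private static final int MAX_RETRIES = 3; // assumed; the real value comes from config

    @Override
    public Response intercept(Chain chain) throws IOException {
        Response response = chain.proceed(chain.request());
        for (int attempt = 0; attempt < MAX_RETRIES && isRetryable(response); attempt++) {
            response.close(); // release the previous attempt before retrying
            response = chain.proceed(chain.request());
        }
        // Previously this threw IOException with the status encoded in the
        // message; now the last response is returned as-is.
        return response;
    }

    private static boolean isRetryable(Response response) {
        int code = response.code();
        return code == 500 || code == 502 || code == 503 || code == 504;
    }
}
```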
feat: add dynamic batching to Mistral and OpenAI embedders

Opt the Mistral and OpenAI embedders into the framework-level dynamic
batching mechanism already used by the VoyageAI embedder. Under
concurrent load, EmbedExpression + DynamicBatcher now accumulates
per-document embed() calls into a single multi-input API request,
reducing request count and improving throughput.

- Add batching.maxSize / batching.maxDelayMillis to the mistral- and
  openai-embedder config definitions (default 0 = disabled).
- Initialize Embedder.Batching in the runtime embedders and expose it
  via batchingConfig().
- Parse the <batching max-size="..." max-delay="..."/> XML element in
  the config-model component builders via the existing
  EmbedderBatchingConfig helper; extend EmbedderBatchingConfig with an
  applyTo(maxSizeSetter, maxDelayMillisSetter) method so all three
  embedders (VoyageAI, Mistral, OpenAI) forward the parsed values to
  the generated config builder through a single shared call (see the
  sketch after this list).
- Add EmbedderBatchingParams to the RELAX NG schema for both element
  types.
- Extend unit, config-model XML, and integration tests to cover the
  new batching config path.
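
A rough sketch of the shared forwarding helper named above, assuming IntConsumer-shaped setters; the real class and the generated builder APIs may differ:

```java
import java.util.function.IntConsumer;

// Illustrative shape of EmbedderBatchingConfig.applyTo: the config-model
// component for each embedder (VoyageAI, Mistral, OpenAI) hands over its
// generated builder's two batching setters, so the parsed <batching> values
// flow through one shared call instead of three copies of the same code.
record EmbedderBatchingConfigSketch(int maxSize, int maxDelayMillis) {

    void applyTo(IntConsumer maxSizeSetter, IntConsumer maxDelayMillisSetter) {
        maxSizeSetter.accept(maxSize);
        maxDelayMillisSetter.accept(maxDelayMillis);
    }
}
```

A call site would then look roughly like batching.applyTo(builder.batching::maxSize, builder.batching::maxDelayMillis), where the builder member names are assumptions about the generated config classes.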
Contributor

Copilot AI left a comment

Pull request overview

This PR opts the OpenAI and Mistral embedders into Vespa’s existing framework-level dynamic batching (used by EmbedExpression + DynamicBatcher) by exposing batching configuration via embedder configs and returning that configuration from the embedder implementations. It also incorporates a retry/metrics fix to ensure the final 5xx status code is recorded when retries are exhausted.

Changes:

  • Add batching.maxSize / batching.maxDelayMillis config to OpenAI and Mistral embedders, wire through XML (<batching max-size="…" max-delay="…"/>) and model config builders.
  • Expose batching configuration from OpenAIEmbedder and MistralEmbedder via Embedder#batchingConfig() so EmbedExpression can enable dynamic batching (sketched after this list).
  • Adjust retry behavior/tests so exhausted retries surface the actual last HTTP status code and sample it in metrics.
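A minimal sketch of that exposure, with the Batching value type's shape assumed from the two config fields; Vespa's real Embedder.Batching may differ:

```java
// Illustrative runtime-side wiring: the embedder surfaces its configured
// batching parameters so EmbedExpression can decide whether to route
// embed() calls through DynamicBatcher.
public class OpenAIEmbedderSketch {

    public record Batching(int maxSize, int maxDelayMillis) {}

    private final Batching batching;

    public OpenAIEmbedderSketch(int maxSize, int maxDelayMillis) {
        // Defaults of 0/0 mean batching stays disabled, preserving the old
        // one-request-per-document behavior.
        this.batching = new Batching(maxSize, maxDelayMillis);
    }

    public Batching batchingConfig() {
        return batching;
    }
}
```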

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| model-integration/src/main/java/ai/vespa/embedding/OpenAIEmbedder.java | Exposes batchingConfig() based on the new config fields. |
| model-integration/src/main/java/ai/vespa/embedding/MistralEmbedder.java | Exposes batchingConfig() based on the new config fields. |
| model-integration/src/main/java/ai/vespa/embedding/AbstractHttpEmbedder.java | Retry interceptor now returns the final retryable response so callers can record the actual status code. |
| configdefinitions/src/vespa/openai-embedder.def | Adds batching config fields with defaults (disabled). |
| configdefinitions/src/vespa/mistral-embedder.def | Adds batching config fields with defaults (disabled). |
| config-model/src/main/resources/schema/common.rnc | Allows <batching …/> for the OpenAI and Mistral embedders. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/EmbedderBatchingConfig.java | Adds the applyTo(...) helper for builder forwarding. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/OpenAIEmbedder.java | Parses <batching> and forwards to the config builder. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/MistralEmbedder.java | Parses <batching> and forwards to the config builder. |
| config-model/src/main/java/com/yahoo/vespa/model/container/component/VoyageAIEmbedder.java | Refactors batching forwarding to use the new helper. |
| config-model/src/test/java/com/yahoo/vespa/model/container/xml/OpenAIEmbedderTest.java | Verifies batching config is present in the generated config. |
| config-model/src/test/java/com/yahoo/vespa/model/container/xml/MistralEmbedderTest.java | Verifies batching config is present in the generated config. |
| config-model/src/test/cfg/application/openai-embedder/services.xml | Adds <batching .../> to the test services.xml. |
| config-model/src/test/cfg/application/mistral-embedder/services.xml | Adds <batching .../> to the test services.xml. |
| model-integration/src/test/java/ai/vespa/embedding/OpenAIEmbedderTest.java | Adds a unit test for batching config exposure, plus a minor builder helper refactor. |
| model-integration/src/test/java/ai/vespa/embedding/MistralEmbedderTest.java | Adds a batch embedding unit test, plus a minor builder helper refactor. |
| model-integration/src/test/java/ai/vespa/embedding/OpenAIEmbedderIntegrationTest.java | Adds an integration test covering batching config and an embed call. |
| model-integration/src/test/java/ai/vespa/embedding/MistralEmbedderIntegrationTest.java | Adds an integration test covering batching config and an embed call. |
| model-integration/src/test/java/ai/vespa/embedding/AbstractHttpEmbedderTest.java | Updates tests to assert the actual status code is surfaced and sampled after retries. |


bjorncs added a commit to vespa-engine/documentation that referenced this pull request Apr 20, 2026
Add the `batching` element row to the OpenAI and Mistral reference
config tables. Follow-up to the review of PR #4655: dynamic batching is
being added to these embedders in vespa-engine/vespa#36461.
glebashnik previously approved these changes Apr 20, 2026
Member

@glebashnik glebashnik left a comment


Looks good.
Minor suggestions.

Comment thread on configdefinitions/src/vespa/mistral-embedder.def

Comment thread on model-integration/src/main/java/ai/vespa/embedding/AbstractHttpEmbedder.java (outdated)
The RetryInterceptor's catch(IOException) branch, which retries
transport-level failures that occur after a request has been sent
(e.g. the connection dropped while awaiting the response), was not
covered by tests. OkHttp's retryOnConnectionFailure(true) transparently
handles connection-establishment failures, so simulating those would
not exercise this path; use SocketPolicy.DISCONNECT_AFTER_REQUEST to
force a mid-flight IOException instead, as in the sketch below.
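
A hedged sketch of that setup with okhttp's MockWebServer; the URL path, the second response, and the printouts are illustrative:

```java
import java.io.IOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.mockwebserver.MockResponse;
import okhttp3.mockwebserver.MockWebServer;
import okhttp3.mockwebserver.SocketPolicy;

// DISCONNECT_AFTER_REQUEST lets the server read the full request and then
// drop the socket, producing an IOException after the request was sent
// (mid-flight) rather than a connection-establishment failure.
class MidFlightDisconnectSketch {

    public static void main(String[] args) throws IOException {
        MockWebServer server = new MockWebServer();
        server.enqueue(new MockResponse().setSocketPolicy(SocketPolicy.DISCONNECT_AFTER_REQUEST));
        server.enqueue(new MockResponse().setResponseCode(200).setBody("ok")); // served if a retry recovers
        server.start();

        OkHttpClient client = new OkHttpClient(); // the embedder's retry interceptor would be added here
        Request request = new Request.Builder().url(server.url("/v1/embeddings")).build();
        try (Response response = client.newCall(request).execute()) {
            System.out.println("status after recovery: " + response.code());
        } catch (IOException e) {
            System.out.println("mid-flight failure surfaced: " + e.getMessage());
        } finally {
            server.shutdown();
        }
    }
}
```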
Hoist the "max retries exceeded" log+return out of the combined
early-return condition into its own early-return. Equivalent
behavior since `retryable` is only true for 5xx (mutually
exclusive with `isSuccessful()`), and separating the exhausted-
retries path makes the intent explicit.
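
Roughly, the refactor being suggested; condition ordering and names are illustrative, not the actual code:

```java
import java.util.logging.Logger;
import okhttp3.Response;

// Illustrative shape of the hoisted early-return: success/non-retryable and
// exhausted-retries now return on separate, explicit paths. Returning null
// here stands in for "caller should retry".
class EarlyReturnSketch {

    private static final Logger log = Logger.getLogger(EarlyReturnSketch.class.getName());

    Response resolve(Response response, boolean retryable, int attempt, int maxRetries) {
        // retryable is only true for 5xx, so it never overlaps isSuccessful().
        if (response.isSuccessful() || !retryable) return response;
        // The exhausted-retries case gets its own early-return with a log line,
        // instead of being folded into the combined condition above.
        if (attempt >= maxRetries) {
            log.warning("Max retries exceeded; returning status " + response.code());
            return response;
        }
        return null; // signal: retry
    }
}
```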
@bjorncs bjorncs requested a review from glebashnik April 20, 2026 12:32
Member

@glebashnik glebashnik left a comment


Looks good!

@bjorncs bjorncs merged commit d78a07f into master Apr 20, 2026
3 checks passed
@bjorncs bjorncs deleted the bjorncs/openai-embedder branch April 20, 2026 12:47
