3.0.x fails to boot if embedding model name has uppercase characters #9746

@fejesd

Description

On the first boot after upgrading from v2.x.x to v3.0.x, the api_server attempts to create an OpenSearch index whose name is derived from the embedding model name.

If the embedding model name contains uppercase characters, such as BAAI/bge-m3, the derived index name also contains uppercase characters. For example:

danswer_chunk_BAAI_bge_m3

OpenSearch does not allow uppercase letters in index names, so index creation fails during startup. As a result, the API server fails to start, and recovery is non-trivial.

Detail | Value
Affected version | v3.0.4
Embedding model | BAAI/bge-m3
Root cause | Index name derived from model name contains uppercase letters; OpenSearch rejects it
Error | Fatal error at API server boot during OpenSearch index creation
Suggested fix | Apply .lower() to the generated index name before passing it to OpenSearch

Reproduction

  1. Start from a working v2.x.x installation.
  2. Configure the embedding model as BAAI/bge-m3.
  3. Upgrade the deployment to v3.0.x (confirmed on v3.0.4).
  4. Boot the system for the first time after the upgrade.
  5. Observe that api_server attempts to create an OpenSearch index with a name derived directly from the model name, resulting in something like:
    danswer_chunk_BAAI_bge_m3
  6. OpenSearch rejects the index creation because uppercase characters are not allowed in index names.
  7. api_server fails to start.
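
Before upgrading, it may help to check whether the configured model name would produce an index name containing uppercase characters. The following is only a rough sketch: `derive_index_name` and `is_affected` are hypothetical helpers, and the derivation rule (a `danswer_chunk_` prefix plus the model name with non-alphanumeric characters replaced by underscores) is an assumption inferred from the example name in this report, not the project's actual code.

```python
import re

# Prefix taken from the example index name in this report.
INDEX_PREFIX = "danswer_chunk_"


def derive_index_name(model_name: str) -> str:
    """Hypothetical stand-in for the v3.0.x derivation (no lowercasing)."""
    # Assumption: every character outside [A-Za-z0-9] becomes an underscore.
    return INDEX_PREFIX + re.sub(r"[^A-Za-z0-9]", "_", model_name)


def is_affected(model_name: str) -> bool:
    """Return True if the derived index name contains uppercase characters."""
    return any(ch.isupper() for ch in derive_index_name(model_name))


if __name__ == "__main__":
    for name in ("BAAI/bge-m3", "some-org/lowercase-model"):
        status = "affected" if is_affected(name) else "ok"
        print(f"{name}: {derive_index_name(name)} ({status})")
```

An all-lowercase model name passes this check, which matches the observation that only models with uppercase characters in their names trigger the failure.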

Impact

This issue causes a fatal startup failure after upgrade for installations using embedding models with uppercase characters in their names.

The impact is significant because:

  • the API server cannot start successfully,
  • the failure happens during the upgrade path from v2.x.x to v3.0.x,
  • the recovery process is not straightforward for affected users,
  • the issue can block production upgrades until manual intervention is performed.

Recommended fix

Normalize the generated OpenSearch index name to lowercase before passing it to OpenSearch.

Recommended change: when deriving the index name from the embedding model name, apply lowercase normalization to the final index string, for example with .lower().

Example:

  • current: danswer_chunk_BAAI_bge_m3
  • expected: danswer_chunk_baai_bge_m3

This should prevent OpenSearch from rejecting the index name and allow api_server to start successfully after upgrade.
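
A minimal sketch of the proposed normalization, for illustration only. `build_index_name` is a hypothetical function name, and the sanitization step (replacing non-alphanumeric characters with underscores) is assumed from the example names above; the real code path in api_server may look different. The only change being suggested is the final `.lower()`.

```python
import re

# Prefix taken from the example index name in this report.
INDEX_PREFIX = "danswer_chunk_"


def build_index_name(model_name: str) -> str:
    """Hypothetical index-name builder with lowercase normalization applied."""
    # Assumption: characters outside [A-Za-z0-9] are replaced with underscores.
    sanitized = re.sub(r"[^A-Za-z0-9]", "_", model_name)
    # Proposed fix: lowercase the final string before it reaches OpenSearch.
    return (INDEX_PREFIX + sanitized).lower()


if __name__ == "__main__":
    print(build_index_name("BAAI/bge-m3"))  # danswer_chunk_baai_bge_m3
```

Lowercasing the complete string, rather than only the model-name portion, also covers the case where the prefix itself ever gains uppercase characters.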

More context

Link: https://discord.com/channels/1119886360506023946/1466924839062470922/threads/1487471967446499339
