3.0.x fails to boot if the embedding model name has uppercase characters #9746
Description
On the first boot after upgrading from v2.x.x to v3.0.x, the api_server attempts to create an OpenSearch index whose name is derived from the embedding model name.
If the embedding model name contains uppercase characters, such as BAAI/bge-m3, the derived index name also contains uppercase characters. For example:
danswer_chunk_BAAI_bge_m3
OpenSearch does not allow uppercase letters in index names, so index creation fails during startup. As a result, the API server fails to start, and recovery requires manual intervention.
| Detail | Value |
|---|---|
| Affected version | v3.0.4 |
| Embedding model | BAAI/bge-m3 |
| Root cause | Index name derived from model name contains uppercase letters; OpenSearch rejects it |
| Error | Fatal error at API server boot during OpenSearch index creation |
| Suggested fix | Apply .lower() to the generated index name before passing it to OpenSearch |
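The minimal Python sketch below makes the failure concrete by modelling the naming step and checking the result against OpenSearch's index-name rules. It assumes the index name is built as `danswer_chunk_` plus the model name with `/`, `-` and `.` replaced by `_`, because that reproduces the name quoted above. `clean_model_name`, `derive_index_name` and `index_name_problems` are hypothetical helpers, not the actual api_server code. The forbidden-character set follows OpenSearch's documented naming restrictions and is not exhaustive. For example, it does not check length limits.

```python
# Characters OpenSearch documents as not allowed in index names
# (space, comma, and : " * + / \ | ? # > <).
_FORBIDDEN_CHARS = set(' ,:"*+/\\|?#><')


def clean_model_name(model_name: str) -> str:
    # Hypothetical helper (assumption): "/", "-" and "." become "_",
    # which matches the index name reported in this issue.
    return model_name.replace("/", "_").replace("-", "_").replace(".", "_")


def derive_index_name(model_name: str, prefix: str = "danswer_chunk") -> str:
    # Hypothetical: mirrors the current behaviour, with no case normalization.
    return f"{prefix}_{clean_model_name(model_name)}"


def index_name_problems(name: str) -> list[str]:
    """Return the OpenSearch naming rules that `name` violates (partial check)."""
    problems = []
    if name != name.lower():
        problems.append("contains uppercase characters")
    if name.startswith(("_", "-")):
        problems.append("starts with '_' or '-'")
    bad = sorted(set(name) & _FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + "".join(bad))
    return problems


if __name__ == "__main__":
    name = derive_index_name("BAAI/bge-m3")
    print(name)                       # danswer_chunk_BAAI_bge_m3
    print(index_name_problems(name))  # ['contains uppercase characters']
```

Under these assumptions, the only rule the derived name breaks is the lowercase requirement. That is consistent with the error described in this issue.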
Reproduction
- Start from a working v2.x.x installation.
- Configure the embedding model as BAAI/bge-m3.
- Upgrade the deployment to v3.0.x (confirmed on v3.0.4).
- Boot the system for the first time after the upgrade.
- Observe that api_server attempts to create an OpenSearch index with a name derived directly from the model name, resulting in something like:
  danswer_chunk_BAAI_bge_m3
- OpenSearch rejects the index creation because uppercase characters are not allowed in index names.
- api_server fails to start.
Impact
This issue causes a fatal startup failure after upgrade for installations using embedding models with uppercase characters in their names.
The impact is significant because:
- the API server cannot start successfully,
- the failure happens during the upgrade path from v2.x.x to v3.0.x,
- the recovery process is not straightforward for affected users,
- the issue can block production upgrades until manual intervention is performed.
Recommended fix
Normalize the generated OpenSearch index name to lowercase before passing it to OpenSearch.
Recommended change: when deriving the index name from the embedding model name, apply lowercase normalization to the final index string, for example with .lower().
Example:
- current: danswer_chunk_BAAI_bge_m3
- expected: danswer_chunk_baai_bge_m3
This should prevent OpenSearch from rejecting the index name and allow api_server to start successfully after the upgrade.
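The proposed fix can be sketched in Python as follows. It assumes the same naming scheme as the reported name (`danswer_chunk_` prefix, with `/`, `-` and `.` replaced by `_`). `clean_model_name` and `derive_index_name` are hypothetical helpers for illustration, and the real api_server code may be structured differently. The key point is that `.lower()` is applied to the final string, not only to the model part, so the prefix and the model-derived suffix both comply with OpenSearch's lowercase rule.

```python
def clean_model_name(model_name: str) -> str:
    # Hypothetical helper (assumption): "/", "-" and "." become "_".
    return model_name.replace("/", "_").replace("-", "_").replace(".", "_")


def derive_index_name(model_name: str, prefix: str = "danswer_chunk") -> str:
    # Proposed fix: lowercase the final index string before it is
    # passed to OpenSearch.
    return f"{prefix}_{clean_model_name(model_name)}".lower()


if __name__ == "__main__":
    print(derive_index_name("BAAI/bge-m3"))          # danswer_chunk_baai_bge_m3
    print(derive_index_name("intfloat/e5-base-v2"))  # danswer_chunk_intfloat_e5_base_v2
```

Names that were already lowercase are unaffected by this change, so only models with uppercase characters in their names get a different index name.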
More context
Link: https://discord.com/channels/1119886360506023946/1466924839062470922/threads/1487471967446499339