Commit e197bec
feat(embeddings): add support for multiple embedding providers
This change adds support for embedding model providers beyond the default, OpenAI. Users can select Google Vertex AI, Google Gemini, AWS Bedrock, or Azure OpenAI through the `DOCS_MCP_EMBEDDING_MODEL` environment variable, using the format `provider:model_name`.
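As an illustration of the `provider:model_name` format, here is a minimal parsing sketch. The function name `parseEmbeddingModelSpec`, the `ModelSpec` shape, and the fallback to OpenAI when no prefix is given are assumptions for illustration, not the code in `EmbeddingFactory.ts`. The one design point worth copying is splitting on the *first* colon only, because some model IDs (for example Bedrock's `amazon.titan-embed-text-v2:0`) contain a colon themselves.

```typescript
// Hypothetical sketch: the real EmbeddingFactory.ts may parse this differently.
type Provider = "openai" | "vertex" | "gemini" | "aws" | "microsoft";

const PROVIDERS: readonly Provider[] = ["openai", "vertex", "gemini", "aws", "microsoft"];

interface ModelSpec {
  provider: Provider;
  model: string;
}

function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

// Parses "provider:model_name". Only the FIRST colon is treated as the
// separator, so model IDs that contain colons survive unchanged.
// Assumption: a value with no colon is treated as an OpenAI model name.
function parseEmbeddingModelSpec(raw: string): ModelSpec {
  const value = raw.trim();
  const sep = value.indexOf(":");
  if (sep === -1) {
    return { provider: "openai", model: value };
  }
  const prefix = value.slice(0, sep).toLowerCase();
  const model = value.slice(sep + 1);
  if (!isProvider(prefix)) {
    throw new Error(`Unsupported embedding provider "${prefix}" in "${raw}"`);
  }
  if (model === "") {
    throw new Error(`Missing model name in "${raw}"`);
  }
  return { provider: prefix, model };
}
```

In the application the input would come from `process.env.DOCS_MCP_EMBEDDING_MODEL`; rejecting unknown prefixes early gives a clear configuration error instead of a failed API call later.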

**Key Changes:**

* **Embedding Factory (`EmbeddingFactory.ts`):**
  * Refactored to instantiate LangChain embedding classes dynamically based on the specified provider.
  * Added support for the `vertex`, `gemini`, `aws`, and `microsoft` providers.
  * Checks that the environment variables required by each provider are set.
* **Dimension Handling (`FixedDimensionEmbeddings.ts`):**
  * Introduced a new `FixedDimensionEmbeddings` wrapper class.
  * The wrapper ensures every vector matches the database's fixed dimension of 1536.
  * Pads vectors shorter than 1536 dimensions with zeros.
  * Truncates vectors longer than 1536 dimensions *only* if `allowTruncate` is true (currently enabled for the `gemini` provider, which supports Matryoshka Representation Learning (MRL)).
  * Throws a `DimensionError` if a model that does not allow truncation produces vectors longer than 1536 dimensions.
* **Factory Integration:**
  * Updated `EmbeddingFactory.ts` to wrap the `gemini` provider's embeddings in `FixedDimensionEmbeddings(..., allowTruncate: true)`.
* **Configuration (`.env.example`, `Dockerfile`):**
  * Added the environment variables required by all supported providers.
  * Updated examples and comments.
* **Testing:**
  * Added comprehensive tests for `FixedDimensionEmbeddings.ts`.
  * Updated the `EmbeddingFactory.ts` tests to cover the new providers and the wrapper integration.
* **Documentation (`README.md`, `ARCHITECTURE.md`):**
  * Updated `README.md` to list the supported providers and their required environment variables, and simplified the vector dimension explanation.
  * Updated `ARCHITECTURE.md` with details on the embedding factory, the `FixedDimensionEmbeddings` wrapper, and the dimension handling logic (padding, MRL truncation, and errors).
  * Removed examples that use unsupported large-dimension models (e.g., `text-embedding-3-large`).

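The pad / truncate / reject rules for `FixedDimensionEmbeddings` can be sketched as follows. This is a minimal illustration, not the committed class: `EmbeddingsLike` is a hypothetical stand-in for the two LangChain `Embeddings` methods being wrapped, and the helper `fitToDimension`, the constructor parameter order, and the configurable `target` (the commit fixes it at 1536) are assumptions made so the logic can be shown and tested in isolation.

```typescript
// Minimal stand-in for the two LangChain Embeddings methods this sketch wraps.
interface EmbeddingsLike {
  embedQuery(text: string): Promise<number[]>;
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// Fixed vector width of the existing database schema (from the commit message).
const DB_DIMENSION = 1536;

class DimensionError extends Error {
  readonly modelName: string;
  readonly actual: number;
  readonly expected: number;

  constructor(modelName: string, actual: number, expected: number) {
    super(`Model "${modelName}" produced ${actual}-dimensional vectors, but the database expects ${expected}.`);
    this.name = "DimensionError";
    this.modelName = modelName;
    this.actual = actual;
    this.expected = expected;
    // Keep `instanceof` working when compiling to ES5.
    Object.setPrototypeOf(this, DimensionError.prototype);
  }
}

// Hypothetical helper: pads, truncates, or rejects a single vector.
function fitToDimension(vector: number[], target: number, allowTruncate: boolean, modelName: string): number[] {
  if (vector.length === target) {
    return vector;
  }
  if (vector.length < target) {
    // Trailing zeros leave norms and dot products unchanged, so cosine
    // similarity between two vectors padded this way is preserved.
    return vector.concat(new Array<number>(target - vector.length).fill(0));
  }
  if (allowTruncate) {
    // MRL-trained models concentrate information in the leading components,
    // so keeping the first `target` values still yields a usable embedding.
    return vector.slice(0, target);
  }
  throw new DimensionError(modelName, vector.length, target);
}

class FixedDimensionEmbeddings implements EmbeddingsLike {
  private readonly inner: EmbeddingsLike;
  private readonly target: number;
  private readonly modelName: string;
  private readonly allowTruncate: boolean;

  // Hypothetical constructor signature; the real class may differ.
  constructor(inner: EmbeddingsLike, target: number, modelName: string, allowTruncate = false) {
    this.inner = inner;
    this.target = target;
    this.modelName = modelName;
    this.allowTruncate = allowTruncate;
  }

  async embedQuery(text: string): Promise<number[]> {
    const vector = await this.inner.embedQuery(text);
    return fitToDimension(vector, this.target, this.allowTruncate, this.modelName);
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    const vectors = await this.inner.embedDocuments(texts);
    return vectors.map((v) => fitToDimension(v, this.target, this.allowTruncate, this.modelName));
  }
}
```

Because the wrapper itself implements the same interface, the factory can return it wherever a plain embeddings object is expected. Making truncation opt-in means an oversized, non-MRL model fails loudly with `DimensionError` instead of silently storing degraded vectors.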
This enhancement provides greater flexibility in choosing embedding models while maintaining compatibility with the existing database schema.
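Taken together, the per-provider environment checks and the Gemini-only truncation setting described above amount to a small policy table. The sketch below is hypothetical: `POLICIES`, `missingEnv`, and `checkProvider` are illustrative names, and the environment variable names follow common SDK and LangChain conventions rather than the project's actual `.env.example`.

```typescript
// Hypothetical policy table. The variable names follow common SDK and LangChain
// conventions; they are assumptions, not copied from the project's .env.example.
type Provider = "openai" | "vertex" | "gemini" | "aws" | "microsoft";

interface ProviderPolicy {
  requiredEnv: readonly string[];
  // Only providers whose models support MRL may truncate (per the commit: gemini).
  allowTruncate: boolean;
}

const POLICIES: Record<Provider, ProviderPolicy> = {
  openai: { requiredEnv: ["OPENAI_API_KEY"], allowTruncate: false },
  vertex: { requiredEnv: ["GOOGLE_APPLICATION_CREDENTIALS"], allowTruncate: false },
  gemini: { requiredEnv: ["GOOGLE_API_KEY"], allowTruncate: true },
  aws: { requiredEnv: ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"], allowTruncate: false },
  microsoft: {
    requiredEnv: [
      "AZURE_OPENAI_API_KEY",
      "AZURE_OPENAI_API_INSTANCE_NAME",
      "AZURE_OPENAI_API_DEPLOYMENT_NAME",
      "AZURE_OPENAI_API_VERSION",
    ],
    allowTruncate: false,
  },
};

type Env = Record<string, string | undefined>;

// Returns the required variables that are unset or empty.
function missingEnv(provider: Provider, env: Env): string[] {
  return POLICIES[provider].requiredEnv.filter((name) => !env[name]);
}

// Fails fast, before any API call, if the provider is not fully configured.
function checkProvider(provider: Provider, env: Env): ProviderPolicy {
  const missing = missingEnv(provider, env);
  if (missing.length > 0) {
    throw new Error(`Embedding provider "${provider}" requires: ${missing.join(", ")}`);
  }
  return POLICIES[provider];
}
```

A factory would call something like `checkProvider(provider, process.env)` once at startup and use the returned `allowTruncate` flag when deciding how to wrap the provider's embeddings. Keeping both facts in one table makes adding a provider a single-entry change.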
Implements #281

Parent commit: 636978f
13 files changed (+5431, -2470 lines)