-A `Document` holds a dense embedding vector, optional raw text for BM25, and any metadata you want returned with results.
+A `Document` holds a dense embedding vector, optional raw text for BM25, and any metadata you want returned with results. The `id` field is optional — if omitted, a random UUID v4 is assigned automatically.
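As a rough illustration of the `id` fallback described in the changed line, the sketch below generates a random UUID v4 with `random_bytes()`. The function names `uuidV4()` and `resolveId()` are hypothetical and are not taken from PHPVector; the library may well generate its IDs differently.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not PHPVector's API): build a random UUID v4 string.
function uuidV4(): string
{
    $bytes = random_bytes(16);

    // Set the four version bits to 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the two variant bits to 10 (RFC 4122 variant).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters split into 8 groups of 4, joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical helper: keep a caller-supplied id, otherwise assign a UUID v4.
function resolveId(?string $id): string
{
    return $id ?? uuidV4();
}

$explicit  = resolveId('doc-1'); // caller-supplied id is kept as-is
$generated = resolveId(null);    // no id given, so a random UUID v4 is assigned
```

Under this assumption, callers that need stable IDs across rebuilds would still pass their own `id`, since a random UUID differs on every run.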
+        text: 'Hybrid search with Reciprocal Rank Fusion',
+    ),
 ]);
 ```
 
@@ -180,76 +186,108 @@ $db = new VectorDatabase(
 
 ## Persistence
 
-The full database state — HNSW graph, BM25 index, and all documents — can be saved to a single binary file and restored in one call. The format (`PHPV`) uses raw `pack/unpack` for float arrays and integer sequences, so reads and writes are fast even for large indexes.
+PHPVector uses a **folder-based** persistence model. Each database lives in its own directory containing separate files for the HNSW graph, the BM25 index, and one file per document. This design has two key advantages:
+
+- **Low memory footprint on load** — only the HNSW graph and BM25 index are loaded into memory. Individual document files (`docs/{n}.bin`) are read lazily, only for the documents that appear in search results.
+- **Low insert latency** — document files are written to disk asynchronously in a forked child process (requires `ext-pcntl`), so `addDocument()` returns immediately.
+
+### Folder layout
+
+```
+/var/data/mydb/
+    meta.json    — distance metric, dimension, document ID map
+    hnsw.bin     — HNSW graph (vectors + connections)
+    bm25.bin     — BM25 inverted index
+    docs/
+        0.bin    — document 0 (id, text, metadata)
+        1.bin    — document 1
+        …
+```
 
 ### Saving
 
+Pass a `path` to the constructor to enable persistence. Each `addDocument()` call writes the document file to `docs/` (asynchronously when `ext-pcntl` is available). Call `save()` once to flush the HNSW graph and BM25 index — it waits for any outstanding async writes before proceeding.
+// Flush HNSW graph + BM25 index to disk (document files already written).
+$db->save();
 ```
 
 ### Loading
 
-Pass the same `HNSWConfig` (including the same `distance` metric) that was used when building the index. The method throws `\RuntimeException` if the distance codes do not match.
+Use `VectorDatabase::open()` to load a previously saved folder. Only `hnsw.bin` and `bm25.bin` are read into memory; document files are loaded on demand after search.
+
+Pass the same `HNSWConfig` (including the same `distance` metric) that was used when building the index — a `RuntimeException` is thrown on mismatch.
-If the index was built with non-default settings, pass the same config objects to `load()`:
+### Custom configuration on open
 
 ```php
-$db = VectorDatabase::load(
-    path: '/var/data/myindex.phpv',
+use PHPVector\BM25\Config as BM25Config;
+use PHPVector\Distance;
+use PHPVector\HNSW\Config as HNSWConfig;
+use PHPVector\VectorDatabase;
+
+$db = VectorDatabase::open(
+    path: '/var/data/mydb',
     hnswConfig: new HNSWConfig(
         M: 16,
         efSearch: 100,
-        distance: Distance::Euclidean, // must match what was used on persist
+        distance: Distance::Euclidean, // must match the value used on save()
     ),
     bm25Config: new BM25Config(k1: 1.2, b: 0.8),
     tokenizer: new MyCustomTokenizer(),
 );
 ```
 
-> **Note:** Only `efSearch` and `bm25Config`/`tokenizer` affect query-time behaviour and can differ from build time. `distance` and the graph parameters (`M`, `efConstruction`) are fixed at build time — `distance` is validated on load and must match.
+> **Note:** Only `efSearch` and `bm25Config`/`tokenizer` affect query-time behaviour and can differ from build time. `distance` and the graph parameters (`M`, `efConstruction`) are fixed at build time — `distance` is validated on `open()` and must match.
+
+### Incremental updates
+
+You can add new documents to a database that was loaded from disk, then call `save()` again. The existing document files are left in place; only the new ones are written along with updated index files.
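The one-file-per-document idea behind lazy loading and incremental updates can be sketched in a few lines. This is a minimal sketch under stated assumptions, not PHPVector's implementation: the `LazyDocStore` class and its methods are hypothetical, and `serialize()` stands in for the real `docs/{n}.bin` binary format, which the diff does not describe.

```php
<?php
declare(strict_types=1);

// Hypothetical store (not PHPVector's API): one file per document under
// docs/{n}.bin, read back only when a document is actually requested.
final class LazyDocStore
{
    /** @var array<int, array> documents already read from disk */
    private array $cache = [];

    public function __construct(private string $dir)
    {
        if (!is_dir("$dir/docs") && !mkdir("$dir/docs", 0777, true)) {
            throw new RuntimeException("Cannot create $dir/docs");
        }
    }

    // Write a single document file; existing files are left untouched.
    public function write(int $n, array $doc): void
    {
        // Assumption: serialize() replaces the library's real binary layout.
        file_put_contents("{$this->dir}/docs/{$n}.bin", serialize($doc));
    }

    // Read a document on demand and keep it in memory for later lookups.
    public function read(int $n): array
    {
        if (!isset($this->cache[$n])) {
            $raw = file_get_contents("{$this->dir}/docs/{$n}.bin");
            if ($raw === false) {
                throw new RuntimeException("Missing document file $n");
            }
            $this->cache[$n] = unserialize($raw, ['allowed_classes' => false]);
        }
        return $this->cache[$n];
    }

    public function loadedCount(): int
    {
        return count($this->cache);
    }
}

$dir = sys_get_temp_dir() . '/phpvector-sketch-' . bin2hex(random_bytes(4));

$store = new LazyDocStore($dir);
$store->write(0, ['id' => 'a', 'text' => 'first', 'metadata' => []]);
$store->write(1, ['id' => 'b', 'text' => 'second', 'metadata' => []]);

// A fresh instance simulates reopening the folder: nothing is loaded yet.
$reopened = new LazyDocStore($dir);
$reopened->write(2, ['id' => 'c', 'text' => 'third', 'metadata' => []]); // incremental add
$hit = $reopened->read(1); // only docs/1.bin is read from disk
```

In this sketch, adding document 2 after reopening touches only `docs/2.bin`, and reading one search hit loads one file, which mirrors the memory and update behaviour the README text describes.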
composer.json (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
 {
     "name": "ezimuel/phpvector",
-    "description": "A fast vector database in PHP implementing HNSW for approximate nearest-neighbor search and BM25 for hybrid full-text + vector retrieval.",
+    "description": "A vector database in PHP implementing HNSW for approximate nearest-neighbor search and BM25 for hybrid full-text + vector retrieval.",