Metadata filter - Adds metadata filtering support across all search methods #4
danielebarbaro wants to merge 9 commits into ezimuel:main
Pull request overview
Adds first-class metadata filtering and metadata-only querying to VectorDatabase, including a filter DSL (MetadataFilter), an evaluator with AND/OR group semantics, configurable over-fetching for filtered searches, and metadata patching without reindexing.
Changes:
- Introduce `MetadataFilter`, `MetadataFilterEvaluator`, and `SortDirection` to express and evaluate metadata predicates (incl. AND/OR groups).
- Integrate metadata filtering into `vectorSearch`, `textSearch`, and `hybridSearch` via over-fetch + post-filter; add `metadataSearch()` for metadata-only queries with sort/limit.
- Add `patchMetadata()` to update persisted/in-memory metadata without touching HNSW/BM25 indexes; expand test coverage and README docs.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/VectorDatabaseTest.php | Adds extensive coverage for filtered searches, metadataSearch(), and patchMetadata() behavior (incl. persistence). |
| tests/MetadataFilterTest.php | Adds unit tests for MetadataFilter construction and factory methods (currently brittle vs implementation messages). |
| tests/Metadata/MetadataFilterEvaluatorTest.php | Adds operator + AND/OR logic tests for MetadataFilterEvaluator. |
| src/VectorDatabase.php | Implements filtering integration, metadataSearch(), patchMetadata(), and configurable overFetchMultiplier. |
| src/Metadata/MetadataFilter.php | Adds filter value object with operator validation and factory methods. |
| src/Metadata/MetadataFilterEvaluator.php | Implements AND/OR evaluation and operator semantics. |
| src/Metadata/SortDirection.php | Adds enum for sort direction in metadata-only search. |
| src/HNSW/Config.php | Minor whitespace cleanup. |
| README.md | Documents metadata filtering, over-fetch tuning, metadata patching, and metadata-only search. |
    public function testConstructorThrowsOnUnknownOperator(): void
    {
        $this->expectException(InvalidArgumentException::class);
        $this->expectExceptionMessage('Unknown operator "invalid"');
The expected exception message is too strict and doesn't match the actual message thrown by MetadataFilter (it includes the list of valid operators). Update the expected message (or switch to expectExceptionMessageMatches/contains) so this test reflects the real error text.
Suggested change:

    - $this->expectExceptionMessage('Unknown operator "invalid"');
    + $this->expectExceptionMessageMatches('/^Unknown operator "invalid"/');
🤦🏼 pleeasee Copilot.
            $this->expectException(InvalidArgumentException::class);
            $this->expectExceptionMessage('Operator "in" requires an array value');

            new MetadataFilter('key', 'not-an-array', 'in');
        }

        public function testConstructorThrowsWhenNotInOperatorGetsNonArrayValue(): void
        {
            $this->expectException(InvalidArgumentException::class);
            $this->expectExceptionMessage('Operator "not_in" requires an array value');

            new MetadataFilter('key', 'not-an-array', 'not_in');
        }
These expected exception messages don't match the current MetadataFilter constructor messages (they include trailing periods). Align the expected strings with the implementation, or use a partial/regex match to avoid brittle tests.
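For context on the two test comments above: as far as the PHPUnit documentation describes it, `expectExceptionMessage()` checks that the actual message contains the expected string rather than requiring an exact match. The sketch below emulates that check with `str_contains()` in plain PHP, without PHPUnit. The message text is hypothetical and only assumed to end with a period, as the comment describes; the real `MetadataFilter` message is not shown in this thread.

```php
<?php
declare(strict_types=1);

// Hypothetical message, assumed to end with a trailing period.
$actualMessage = 'Operator "in" requires an array value.';

// Expected string as written in the test, without the trailing period.
$expected = 'Operator "in" requires an array value';

// A substring check tolerates the extra period; an exact comparison does not.
$containsMatch = str_contains($actualMessage, $expected);
$exactMatch = $actualMessage === $expected;

var_dump($containsMatch, $exactMatch);
```

Under that assumption, the tests as written would still pass, and a regex or exact match is only needed if the test is meant to pin down the full message.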
    ### Strict type comparison

    Metadata filtering uses **strict type comparison** (PHP `===`). This means:
    - String `'5'` does NOT match integer `5`
    - Float `1.0` does NOT match integer `1`

    ```php
    // Document with metadata: ['year' => 2024] (integer)
    MetadataFilter::eq('year', 2024); // ✓ matches
    MetadataFilter::eq('year', '2024'); // ✗ does not match (string vs int)

    // Document with metadata: ['rating' => 4.5] (float)
    MetadataFilter::gt('rating', 4); // ✓ matches (4.5 > 4)
    MetadataFilter::eq('rating', 4.5); // ✓ matches
    MetadataFilter::eq('rating', '4.5'); // ✗ does not match (string vs float)
The docs claim metadata filtering uses strict type comparison (===), but the implementation only uses strict comparison for '=', '!=', 'in', 'not_in', and 'contains'. The ordering operators (<, <=, >, >=) use PHP's normal comparisons which can coerce types (e.g., numeric strings). Please clarify this section or enforce strict typing for comparison operators.
Suggested change:

    - ### Strict type comparison
    - Metadata filtering uses **strict type comparison** (PHP `===`). This means:
    - - String `'5'` does NOT match integer `5`
    - - Float `1.0` does NOT match integer `1`
    - ```php
    - // Document with metadata: ['year' => 2024] (integer)
    - MetadataFilter::eq('year', 2024); // ✓ matches
    - MetadataFilter::eq('year', '2024'); // ✗ does not match (string vs int)
    - // Document with metadata: ['rating' => 4.5] (float)
    - MetadataFilter::gt('rating', 4); // ✓ matches (4.5 > 4)
    - MetadataFilter::eq('rating', 4.5); // ✓ matches
    - MetadataFilter::eq('rating', '4.5'); // ✗ does not match (string vs float)
    + ### Metadata comparison semantics
    + Equality-style metadata filters use **strict type comparison** (PHP `===`). This applies to exact-match operators such as `eq`, `neq`, `in`, `notIn`, and `contains`. This means:
    + - String `'5'` does NOT match integer `5`
    + - Float `1.0` does NOT match integer `1`
    + Ordering operators such as `gt`, `gte`, `lt`, and `lte` use PHP's normal comparison rules rather than strict type comparison. In practice, that means comparable scalar values such as numeric strings may be coerced during comparison.
    + ```php
    + // Document with metadata: ['year' => 2024] (integer)
    + MetadataFilter::eq('year', 2024); // ✓ matches
    + MetadataFilter::eq('year', '2024'); // ✗ does not match (string vs int)
    + // Document with metadata: ['rating' => 4.5] (float)
    + MetadataFilter::eq('rating', 4.5); // ✓ matches
    + MetadataFilter::eq('rating', '4.5'); // ✗ does not match (string vs float)
    + MetadataFilter::gt('rating', 4); // ✓ matches (4.5 > 4)
    + // Ordering comparisons follow PHP's normal comparison semantics
    + // Example: a numeric string may be compared numerically
    + MetadataFilter::gt('year', '2023'); // may match metadata value 2024
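The difference between the two operator families can be reproduced with PHP's own operators, independent of this library. The sketch below is illustrative only: `compareValues` is a hypothetical helper, not the PR's `MetadataFilterEvaluator`, and it covers just one operator from each family.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: 'eq' uses strict identity, 'gt' uses PHP's
 * ordinary ordering comparison, which may coerce numeric strings.
 */
function compareValues(mixed $actual, string $operator, mixed $expected): bool
{
    return match ($operator) {
        'eq' => $actual === $expected,
        'gt' => $actual > $expected,
        default => throw new InvalidArgumentException("Unsupported operator: {$operator}"),
    };
}

$eqIntVsString = compareValues(2024, 'eq', '2024'); // strict: types differ
$eqFloatVsInt = compareValues(1.0, 'eq', 1);        // strict: types differ
$gtIntVsString = compareValues(2024, 'gt', '2023'); // numeric string compared as a number
$gtFloatVsInt = compareValues(4.5, 'gt', 4);        // float and int compared numerically
```

Note that `declare(strict_types=1)` affects only scalar type declarations on function calls; it does not make `>` or `<` type-strict.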
- `MetadataFilter` value object with 11 operators (eq, neq, lt, lte, gt, gte, in, notIn, contains, exists, notExists)
- `MetadataFilterEvaluator` with AND/OR group logic
- `SortDirection` enum
- Filtering in `vectorSearch`, `textSearch`, `hybridSearch` via over-fetch + post-filter
- `metadataSearch()` for metadata-only queries with sort and limit
- `patchMetadata()` for updating metadata without re-indexing
- `overFetchMultiplier` on VectorDatabase constructor
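As a rough illustration of AND/OR group logic, the sketch below evaluates a list of groups where predicates inside a group are ANDed and the groups themselves are ORed. This convention is an assumption; the PR's actual grouping rules and the `MetadataFilterEvaluator` API are not shown in this thread, and `matchesAnyGroup`, `$eq`, and `$gte` are hypothetical names.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical evaluator: a document matches when ANY group matches (OR),
 * and a group matches when ALL of its predicates match (AND).
 *
 * @param array<string, mixed> $metadata
 * @param list<list<callable(array<string, mixed>): bool>> $groups
 */
function matchesAnyGroup(array $metadata, array $groups): bool
{
    foreach ($groups as $group) {
        $allMatch = true;
        foreach ($group as $predicate) {
            if (!$predicate($metadata)) {
                $allMatch = false;
                break; // one failed predicate rules out this group
            }
        }
        if ($allMatch) {
            return true; // one fully matching group is enough
        }
    }

    return false;
}

// Hypothetical predicate factories: strict equality and an ordering comparison.
$eq = fn (string $key, mixed $value): callable =>
    fn (array $m): bool => array_key_exists($key, $m) && $m[$key] === $value;
$gte = fn (string $key, int|float $value): callable =>
    fn (array $m): bool => array_key_exists($key, $m) && $m[$key] >= $value;

// (lang = 'en' AND year >= 2023) OR (featured = true)
$groups = [
    [$eq('lang', 'en'), $gte('year', 2023)],
    [$eq('featured', true)],
];

$a = matchesAnyGroup(['lang' => 'en', 'year' => 2024], $groups);                     // first group matches
$b = matchesAnyGroup(['lang' => 'it', 'year' => 2020, 'featured' => true], $groups); // second group matches
$c = matchesAnyGroup(['lang' => 'en', 'year' => 2020], $groups);                     // no group matches
```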