
Metadata filter - Adds metadata filtering support across all search methods #4

Open
danielebarbaro wants to merge 9 commits into ezimuel:main from danielebarbaro:feat/metadata


@danielebarbaro (Collaborator) commented Mar 24, 2026

  • MetadataFilter value object with 11 operators (eq, neq, lt, lte, gt, gte, in, notIn, contains, exists, notExists)
  • MetadataFilterEvaluator with AND/OR group logic
  • SortDirection enum
  • Filtering integrated into vectorSearch, textSearch, hybridSearch via over-fetch + post-filter
  • metadataSearch() for metadata-only queries with sort and limit
  • patchMetadata() for updating metadata without re-indexing
  • Configurable overFetchMultiplier on VectorDatabase constructor
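
The over-fetch + post-filter approach in the list above can be sketched in plain PHP. This is a minimal illustration under stated assumptions, not the code in this PR: `topK()` and `filteredSearch()` are hypothetical helpers, the candidate list stands in for an index result, and the multiplier plays the role of `overFetchMultiplier`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for an index lookup: returns the first $k entries
 * of a list that is already sorted by descending score.
 */
function topK(array $ranked, int $k): array
{
    return array_slice($ranked, 0, $k);
}

/**
 * Over-fetch + post-filter: request $k * $multiplier candidates, drop the
 * ones whose metadata fails the predicate, then keep the first $k survivors.
 */
function filteredSearch(array $ranked, int $k, int $multiplier, callable $predicate): array
{
    $candidates = topK($ranked, $k * $multiplier);
    $matching = array_filter(
        $candidates,
        static fn (array $doc): bool => $predicate($doc['metadata'])
    );

    // array_filter preserves keys; array_values gives a clean list again.
    return array_slice(array_values($matching), 0, $k);
}

$ranked = [
    ['id' => 1, 'score' => 0.95, 'metadata' => ['year' => 2021]],
    ['id' => 2, 'score' => 0.90, 'metadata' => ['year' => 2024]],
    ['id' => 3, 'score' => 0.80, 'metadata' => ['year' => 2020]],
    ['id' => 4, 'score' => 0.70, 'metadata' => ['year' => 2024]],
    ['id' => 5, 'score' => 0.60, 'metadata' => ['year' => 2024]],
];

$isFrom2024 = static fn (array $metadata): bool => ($metadata['year'] ?? null) === 2024;

$narrow = filteredSearch($ranked, 2, 1, $isFrom2024); // fetches only 2 candidates
$wide   = filteredSearch($ranked, 2, 3, $isFrom2024); // fetches up to 6 candidates
```

With a multiplier of 1 the filter can leave fewer than `$k` results, which is the trade-off a configurable multiplier is meant to tune: a larger value costs more candidate work but is less likely to under-fill the result set.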

@danielebarbaro self-assigned this Mar 24, 2026
@danielebarbaro changed the title from "WIP - Metadata filter" to "[WIP] - Metadata filter" Mar 28, 2026
@danielebarbaro reopened this Apr 1, 2026
@danielebarbaro changed the title from "[WIP] - Metadata filter" to "Metadata filter - Adds metadata filtering support across all search methods" Apr 3, 2026
@danielebarbaro marked this pull request as ready for review April 3, 2026 07:53
@danielebarbaro requested a review from ezimuel April 3, 2026 07:53
@ezimuel requested a review from Copilot April 4, 2026 10:04
Copilot AI (Contributor) left a comment

Pull request overview

Adds first-class metadata filtering and metadata-only querying to VectorDatabase, including a filter DSL (MetadataFilter), an evaluator with AND/OR group semantics, configurable over-fetching for filtered searches, and metadata patching without reindexing.

Changes:

  • Introduce MetadataFilter, MetadataFilterEvaluator, and SortDirection to express and evaluate metadata predicates (incl. AND/OR groups).
  • Integrate metadata filtering into vectorSearch, textSearch, and hybridSearch via over-fetch + post-filter; add metadataSearch() for metadata-only queries with sort/limit.
  • Add patchMetadata() to update persisted/in-memory metadata without touching HNSW/BM25 indexes; expand test coverage and README docs.
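
The AND/OR group semantics mentioned above can be illustrated with a small standalone sketch. The shape used here (a mode string plus a list of closures) is an assumption for illustration only; `matchesGroup()` is a hypothetical helper and does not reflect the actual `MetadataFilterEvaluator` API.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate a group of predicates against one document's metadata.
 * An "and" group needs every predicate to pass; an "or" group needs at
 * least one. (Hypothetical shape, not the evaluator from this PR.)
 */
function matchesGroup(string $mode, array $predicates, array $metadata): bool
{
    foreach ($predicates as $predicate) {
        $passed = $predicate($metadata);
        if ($mode === 'and' && !$passed) {
            return false; // one failure sinks an AND group
        }
        if ($mode === 'or' && $passed) {
            return true; // one success satisfies an OR group
        }
    }

    // No early exit: AND saw no failures (true), OR saw no successes (false).
    return $mode === 'and';
}

$isPhp    = static fn (array $m): bool => ($m['lang'] ?? null) === 'php';
$isRecent = static fn (array $m): bool => isset($m['year']) && $m['year'] >= 2024;

$doc = ['lang' => 'php', 'year' => 2020];

$allMatch = matchesGroup('and', [$isPhp, $isRecent], $doc);
$anyMatch = matchesGroup('or', [$isPhp, $isRecent], $doc);
```

In this sketch an empty AND group matches everything and an empty OR group matches nothing; whether the real evaluator makes the same choice is not stated in this PR description.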

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/VectorDatabaseTest.php | Adds extensive coverage for filtered searches, metadataSearch(), and patchMetadata() behavior (incl. persistence). |
| tests/MetadataFilterTest.php | Adds unit tests for MetadataFilter construction and factory methods (currently brittle vs implementation messages). |
| tests/Metadata/MetadataFilterEvaluatorTest.php | Adds operator + AND/OR logic tests for MetadataFilterEvaluator. |
| src/VectorDatabase.php | Implements filtering integration, metadataSearch(), patchMetadata(), and configurable overFetchMultiplier. |
| src/Metadata/MetadataFilter.php | Adds filter value object with operator validation and factory methods. |
| src/Metadata/MetadataFilterEvaluator.php | Implements AND/OR evaluation and operator semantics. |
| src/Metadata/SortDirection.php | Adds enum for sort direction in metadata-only search. |
| src/HNSW/Config.php | Minor whitespace cleanup. |
| README.md | Documents metadata filtering, over-fetch tuning, metadata patching, and metadata-only search. |


    public function testConstructorThrowsOnUnknownOperator(): void
    {
        $this->expectException(InvalidArgumentException::class);
        $this->expectExceptionMessage('Unknown operator "invalid"');

Copilot AI Apr 4, 2026

The expected exception message is too strict and doesn't match the actual message thrown by MetadataFilter (it includes the list of valid operators). Update the expected message (or switch to expectExceptionMessageMatches/contains) so this test reflects the real error text.

Suggested change
- $this->expectExceptionMessage('Unknown operator "invalid"');
+ $this->expectExceptionMessageMatches('/^Unknown operator "invalid"/');

@danielebarbaro (Collaborator, Author) replied:

🤦🏼 pleeasee Copilot.
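
For context on this thread: to my understanding, PHPUnit's `expectExceptionMessage()` passes when the expected string is contained in the thrown message, so a prefix such as `Unknown operator "invalid"` would still match a longer message. The plain-PHP sketch below mirrors the three comparison styles without depending on PHPUnit; `$actual` is a made-up message, since the real text thrown by `MetadataFilter` is not quoted here.

```php
<?php
declare(strict_types=1);

// Hypothetical full message; the real wording is an assumption.
$actual = 'Unknown operator "invalid". Valid operators: eq, neq, lt';

// Substring check, which is how expectExceptionMessage() is understood to compare.
$containsMatch = str_contains($actual, 'Unknown operator "invalid"');

// Anchored regex, as in the suggested expectExceptionMessageMatches() call.
$regexMatch = preg_match('/^Unknown operator "invalid"/', $actual) === 1;

// Exact equality is the only variant that the longer message would break.
$exactMatch = $actual === 'Unknown operator "invalid"';
```

Under that assumption both the original assertion and the suggested regex accept the longer message, and only an exact comparison would fail.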

Comment on lines +34 to +46
        $this->expectException(InvalidArgumentException::class);
        $this->expectExceptionMessage('Operator "in" requires an array value');

        new MetadataFilter('key', 'not-an-array', 'in');
    }

    public function testConstructorThrowsWhenNotInOperatorGetsNonArrayValue(): void
    {
        $this->expectException(InvalidArgumentException::class);
        $this->expectExceptionMessage('Operator "not_in" requires an array value');

        new MetadataFilter('key', 'not-an-array', 'not_in');
    }

Copilot AI Apr 4, 2026

These expected exception messages don't match the current MetadataFilter constructor messages (they include trailing periods). Align the expected strings with the implementation, or use a partial/regex match to avoid brittle tests.

Comment thread README.md
Comment on lines +538 to +552
### Strict type comparison

Metadata filtering uses **strict type comparison** (PHP `===`). This means:
- String `'5'` does NOT match integer `5`
- Float `1.0` does NOT match integer `1`

```php
// Document with metadata: ['year' => 2024] (integer)
MetadataFilter::eq('year', 2024); // ✓ matches
MetadataFilter::eq('year', '2024'); // ✗ does not match (string vs int)

// Document with metadata: ['rating' => 4.5] (float)
MetadataFilter::gt('rating', 4); // ✓ matches (4.5 > 4)
MetadataFilter::eq('rating', 4.5); // ✓ matches
MetadataFilter::eq('rating', '4.5'); // ✗ does not match (string vs float)
```

Copilot AI Apr 4, 2026

The docs claim metadata filtering uses strict type comparison (===), but the implementation only uses strict comparison for '=', '!=', 'in', 'not_in', and 'contains'. The ordering operators (<, <=, >, >=) use PHP's normal comparisons which can coerce types (e.g., numeric strings). Please clarify this section or enforce strict typing for comparison operators.

Suggested change

Current:

### Strict type comparison

Metadata filtering uses **strict type comparison** (PHP `===`). This means:
- String `'5'` does NOT match integer `5`
- Float `1.0` does NOT match integer `1`

```php
// Document with metadata: ['year' => 2024] (integer)
MetadataFilter::eq('year', 2024); // ✓ matches
MetadataFilter::eq('year', '2024'); // ✗ does not match (string vs int)

// Document with metadata: ['rating' => 4.5] (float)
MetadataFilter::gt('rating', 4); // ✓ matches (4.5 > 4)
MetadataFilter::eq('rating', 4.5); // ✓ matches
MetadataFilter::eq('rating', '4.5'); // ✗ does not match (string vs float)
```

Proposed:

### Metadata comparison semantics

Equality-style metadata filters use **strict type comparison** (PHP `===`). This applies to exact-match operators such as `eq`, `neq`, `in`, `notIn`, and `contains`. This means:
- String `'5'` does NOT match integer `5`
- Float `1.0` does NOT match integer `1`

Ordering operators such as `gt`, `gte`, `lt`, and `lte` use PHP's normal comparison rules rather than strict type comparison. In practice, that means comparable scalar values such as numeric strings may be coerced during comparison.

```php
// Document with metadata: ['year' => 2024] (integer)
MetadataFilter::eq('year', 2024); // ✓ matches
MetadataFilter::eq('year', '2024'); // ✗ does not match (string vs int)

// Document with metadata: ['rating' => 4.5] (float)
MetadataFilter::eq('rating', 4.5); // ✓ matches
MetadataFilter::eq('rating', '4.5'); // ✗ does not match (string vs float)
MetadataFilter::gt('rating', 4); // ✓ matches (4.5 > 4)

// Ordering comparisons follow PHP's normal comparison semantics
// Example: a numeric string may be compared numerically
MetadataFilter::gt('year', '2023'); // may match metadata value 2024
```

@ezimuel (Owner) replied:

@danielebarbaro can you check this? Thanks.
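
The distinction raised in this thread can be reproduced with plain PHP operators. The sketch below is independent of the library: it shows `===` rejecting mixed types while `>` coerces a numeric string, and it adds `strictGreaterThan()`, a hypothetical helper showing one way ordering could be made type-aware if strict typing were enforced. It is not taken from the PR.

```php
<?php
declare(strict_types=1);

// Strict equality never coerces types.
$strictEq   = '2024' === 2024; // string vs int
$floatVsInt = 1.0 === 1;       // float vs int

// Ordering operators coerce numeric strings before comparing.
$looseGt = 2024 > '2023';

/**
 * Hypothetical type-aware ordering: compare only when both operands are
 * numbers (int or float) or both are strings; otherwise report no match.
 */
function strictGreaterThan(mixed $actual, mixed $expected): bool
{
    $bothNumbers = (is_int($actual) || is_float($actual))
        && (is_int($expected) || is_float($expected));
    $bothStrings = is_string($actual) && is_string($expected);

    return ($bothNumbers || $bothStrings) && $actual > $expected;
}

$numericOk  = strictGreaterThan(4.5, 4);       // float vs int is still allowed
$mixedTypes = strictGreaterThan(2024, '2023'); // int vs string is rejected
```

Allowing int-versus-float comparisons keeps the README's `gt('rating', 4)` example working against a stored `4.5`; rejecting int-versus-string is the part that would change behavior relative to PHP's default rules.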

3 participants