Adding translator for filter query#2667
Adding translator for filter query#2667nagarajg17 wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2667 +/- ##
============================================
+ Coverage 72.52% 72.54% +0.01%
Complexity 90 90
============================================
Files 709 711 +2
Lines 32988 33011 +23
Branches 2823 2831 +8
============================================
+ Hits 23925 23948 +23
Misses 7784 7784
Partials 1279 1279
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Please look into the code coverage and integration test failures. |
| match: (ctx) => ctx.params.has('fq'), | ||
| apply: (ctx) => { | ||
| // Get all fq values (can be multiple) | ||
| const fqValues = ctx.params.getAll('fq'); |
There was a problem hiding this comment.
- Should we need to filter out empty or whitespace-only values before processing? If this is handled at the parser please make sure we have sufficient test coverage to confirm the handling.
- Solr allows filter(condition) syntax inside queries to cache sub-clauses independently. Should we detect and normalize filter(...) constructs?
There was a problem hiding this comment.
- Added validations to filter out empty or whitespace-only values and tests
- Actually it is a separate syntax or query itself from
fq. This needs PEG changes, separate translate transformation rule etc, so had excluded in this PR scope. Anyway have added it now
| if (fqValues.length === 0) return; | ||
|
|
||
| // Parse each fq into OpenSearch DSL | ||
| const filters = fqValues.map(parseFq); |
There was a problem hiding this comment.
-
Solr allows complex boolean expressions within a single fq (e.g., +A +B) which are semantically equivalent to multiple fq params. The current implementation does not explicitly account for intra-fq boolean structure. If translateQ does not normalize +A +B into must, behavior may diverge. Please ensure translateQ preserves boolean semantics consistently between multiple fq and single fq with boolean clauses.
eg:fq=+A +Bshould behave same asfq=A&fq=B -
Callout: Currently, if one
fqfails to parse, the entire transformation will fail. We will need to consider the translation modes for later phases to handle these failures.
There was a problem hiding this comment.
- Good point. The parser does handle +/- prefix syntax (see prefixExpr in solr.pegjs). This applies to not just fq even to q param. When
+/-is used and there are 2 or more terms, it is implicit AND instead of OR. Currently PEG works with implicit OR which is true for rest of all other usecases. We can fix at PEG level or before calling PEG itself to sendq.op=ANDfor such cases. I'm tracking the fix in fine tune task. Once we add the fix it will work for both q and fq, no explicit changes will be needed for fq - Ack, it supports fail fast now
There was a problem hiding this comment.
- Once the fix is implemented, please ensure we have explicit test coverage validating equivalence between multiple fq params and single fq with + (required) clauses. Please add a code level TODO to ensure this is not slipped through the crack.
| if (existingQuery) { | ||
| boolQuery.set('must', [existingQuery]); | ||
| } |
There was a problem hiding this comment.
Wrapping existingQuery in an array is correct, but if existingQuery is already a bool query, this may lead to unnecessary nesting. Can we flatten when possible to avoid deep nesting and improve query performance/readability.
| } | ||
|
|
||
| // Add filter clauses | ||
| boolQuery.set('filter', filters); |
There was a problem hiding this comment.
This overwrites any existing filter clause if the original query already had one (e.g., from previous transforms like query-q ). Should we merge with existing filters instead of overwriting?
const existingFilter = existingQuery?.get?.('bool')?.get?.('filter') || [];
boolQuery.set('filter', [...existingFilter, ...filters]);
There was a problem hiding this comment.
We are not overriding it. We are wrapping existing query and adding filter on top of it. For merging filters is handled as part of above comment of nesting
const boolQuery = new Map<string, any>();
if (existingQuery) {
boolQuery.set('must', [existingQuery]);
}
boolQuery.set('filter', filters);
| boolQuery.set('filter', filters); | ||
|
|
||
| // Replace query with the new bool wrapper | ||
| ctx.body.set('query', new Map([['bool', boolQuery]])); |
There was a problem hiding this comment.
I would like to understand why we are doing this replacement? Will it impact any transformations from query-q?
This replaces the entire query with a new bool structure, which may discard other clauses such as should or must_not that exist in the original query. I would recommend merging into the existing bool query when present instead of replacing it.
There was a problem hiding this comment.
Answered in above comments. We are wrapping existing query and adding filter on it. It won't impact anything. If existing query from query-q is bool query, have merged filter to that now
| return result.dsl; | ||
| } | ||
|
|
||
| export const request: MicroTransform<RequestContext> = { |
There was a problem hiding this comment.
Solr supports negation in fq (e.g., -domainStatus:deleted). These should be mapped to must_not instead of filter.
There was a problem hiding this comment.
Filter will have any kind of query, must_not is inside that. Translation produces
{
"query": {
"bool": {
"must": [{ "match_all": {} }],
"filter": [
{
"bool": {
"must_not": [{ "match": { "category": { "query": "software" } } }]
}
}
]
}
}
}
There was a problem hiding this comment.
This introduces an extra level of nesting, whereas OpenSearch allows must_not directly at the top-level bool, which is more natural and avoids unnecessary wrapping. Consider lifting must_not to the top-level bool.must_not when the fq is a pure negation, for a flatter and more idiomatic query structure.
| * Parse a single fq value using the same query engine as q. | ||
| * Creates a params map with the fq value as 'q' for translateQ. | ||
| */ | ||
| function parseFq(fq: string): Map<string, any> { |
There was a problem hiding this comment.
Solr fq supports local params (e.g., {!frange}, {!geofilt}, {!cache=false}), which are not standard query syntax. Passing raw fq into translateQ may fail or misinterpret {!frange l=10 u=100}field, {!cache=false}field:value.
Introduce a preprocessing step to parse/extract local params and route to appropriate DSL constructs will help.
const { localParams, query } = parseLocalParams(fq);
// Example mapping
// {!frange l=10 u=100}price → range query
There was a problem hiding this comment.
Local params is not P0 and tracked separately. It needs separate translation mechanism. It is out of scope for this PR. Just like we are calling supporting all query types of fq by calling translateQ, it would be the same for local params as well when we add support for it. It isn't strictly coupled to this task
| @@ -0,0 +1,65 @@ | |||
| /** | |||
There was a problem hiding this comment.
- A key feature of fq is that filters are cached independently for performance. OpenSearch does not have identical semantics, but filter context is cache-friendly and must is not. Should we support cache hints and map them appropriately (if possible).
// fq={!cache=false}status:active
// → may require bypassing filter context or marking differently
-
Solr supports cost for ordering filter execution, especially for expensive filters and post-filters. At minimum we must parse and ignore explicitly with documentation, or optionally reorder filters based on cost.
-
Solr distinguishes between normal filters (pre-query) and post filters (executed after main query when cost >= 100). All filters are currently treated equally in OpenSearch. We must either document or support mapping high-cost filters → post_filter (OpenSearch equivalent concept)
{
"query": { ... },
"post_filter": { ... }
}
- Solr fq semantics are strictly:multiple fq → intersection (AND) but union (OR) can be achieved via boolean queries inside fq or filter(condition) tricks. The current implementation assumes all fq → AND filters only. Should we ensure OR logic inside fq is preserved via should?
- Solr explicitly recommends single fq with multiple clauses when reused together and multiple fq when independent (for caching efficiency). The transform loses this distinction where everything becomes a flat filter array. Should we optionally preserve grouping information for better optimization?
- Solr allows function queries inside fq (e.g., mul(popularity,price)). These require special handling in OpenSearch (e.g., script queries). Please ensure translateQ supports function queries in filter context.
{
"script": {
"script": "doc['popularity'] * doc['price'] > 10"
}
}
There was a problem hiding this comment.
- Similar to other comment, local params out of scope for this PR
- Cost is a local parameter used with non-cached filter queries (fq) to hint at the order they should be evaluated. This also requires local params. Using fq for non caching scenarios will be very rare and not a general usecase. Not much research I've done but I know OpenSearch doesn't have same equivalent, could be for same reason. This IMO we must implement only if we ever receive a customer request
- Above point would suffice I think
- We are processing each fq param and the adding result/transformation query in filter array. If fq is using boolean or range or any query inside it, it is processed appropriately
- Good observation. OpenSearch does not cache the final combined result of the entire filter array. It typically caches each filter clause (or reusable filter bitset) independently, and then combines them at query time
- Functional queries as discussed with product as part of other tasks, its not P0. More over it is independent task. When we add functional queries support,
fqalso supports automatically just like we are supporting all other query types now
There was a problem hiding this comment.
On 1, 2 and 3, please update the fq rule transform to flag unsupported constructs, aligned with validations introduced in PR #2688. Additionally document all unsupported fq features (cache, cost., etc) in the LIMITATIONS file.
-
Thanks for the confirmation. Can we add a targeted test for plain OR within a single fq to explicitly validate the contract?
-
Solr’s distinction between single fq with multiple clauses (treated as one logical unit) and multiple fq params (independent filters) is not just about caching, but also about preserving logical grouping and intent. I'd still see treat this as a limitation. Any thoughts on documenting that this distinction is not preserved, or optionally preserving grouping for closer semantic parity/debuggability.
-
Same comment as 1 for function queries in
fq.
|
|
||
| export const request: MicroTransform<RequestContext> = { | ||
| name: 'filter-query-fq', | ||
| match: (ctx) => ctx.params.has('fq'), |
There was a problem hiding this comment.
Solr requires proper escaping of special characters in fq. If ctx.params does not already decode values, queries may break or mis-parse. Please make sure decoding is applied before translateQ. Please add sufficient test coverage for this as well
There was a problem hiding this comment.
URL decoding is handled by URLSearchParams which is used to construct ctx.params. This applies equally to q, fq, and all other params - it's not specific to this PR's changes. If there's a specific escaping scenario you're concerned about which only applies to fq, happy to add a test case, but the decoding layer is upstream of the transform pipeline and not scope of this task
There was a problem hiding this comment.
The intent of the comment was to ensure that escaped/special characters in fq are correctly preserved and parsed through translateQ, since fq often contains ranges, phrases, and special syntax. Adding test coverage for fq values with escaped/special characters to validate end to end behavior through the transform will future proof the transform rule for fq. Thoughts?
| @@ -0,0 +1,218 @@ | |||
| import { describe, it, expect } from 'vitest'; | |||
There was a problem hiding this comment.
I'll review the UTs in the next revision.
Signed-off-by: Nagaraj G <narajg@amazon.com>
| return obj; | ||
| } | ||
|
|
||
| describe('filter-query-fq request transform', () => { |
There was a problem hiding this comment.
Any thoughts on adding below test coverage
- Multiple merge scenario. The current test covers scenario of merging with a single fq, but not with multiple fq values when existing filters are present.
- No op when match passes but apply skips
| const result = mapToObject(ctx.body.get('query')); | ||
| expect(result.bool.filter).toHaveLength(1); | ||
| // The fq itself becomes a bool query | ||
| expect(result.bool.filter[0].bool).toBeDefined(); |
There was a problem hiding this comment.
This only checks presence of bool, not correctness of structure (must, should, etc.). Any thoughts on adding assertions to validate actual semantics?
| @@ -0,0 +1,65 @@ | |||
| /** | |||
There was a problem hiding this comment.
On 1, 2 and 3, please update the fq rule transform to flag unsupported constructs, aligned with validations introduced in PR #2688. Additionally document all unsupported fq features (cache, cost., etc) in the LIMITATIONS file.
-
Thanks for the confirmation. Can we add a targeted test for plain OR within a single fq to explicitly validate the contract?
-
Solr’s distinction between single fq with multiple clauses (treated as one logical unit) and multiple fq params (independent filters) is not just about caching, but also about preserving logical grouping and intent. I'd still see treat this as a limitation. Any thoughts on documenting that this distinction is not preserved, or optionally preserving grouping for closer semantic parity/debuggability.
-
Same comment as 1 for function queries in
fq.
| return result.dsl; | ||
| } | ||
|
|
||
| export const request: MicroTransform<RequestContext> = { |
There was a problem hiding this comment.
This introduces an extra level of nesting, whereas OpenSearch allows must_not directly at the top-level bool, which is more natural and avoids unnecessary wrapping. Consider lifting must_not to the top-level bool.must_not when the fq is a pure negation, for a flatter and more idiomatic query structure.
|
|
||
| export const request: MicroTransform<RequestContext> = { | ||
| name: 'filter-query-fq', | ||
| match: (ctx) => ctx.params.has('fq'), |
There was a problem hiding this comment.
The intent of the comment was to ensure that escaped/special characters in fq are correctly preserved and parsed through translateQ, since fq often contains ranges, phrases, and special syntax. Adding test coverage for fq values with escaped/special characters to validate end to end behavior through the transform will future proof the transform rule for fq. Thoughts?
| filterFunc | ||
| = "filter(" _ expr:query _ ")" boost:boost? { | ||
| const node = { type: 'filter', child: expr }; | ||
| if (boost !== null) return { type: 'boost', child: node, value: boost }; |
There was a problem hiding this comment.
Wrapping filter() with a boost node is syntactically correct, but semantically misleading since filter() is a non-scoring construct. Boost on a filter wrapper is effectively a no-op in Solr/OpenSearch filter context. Can we normalise or explicitly documenting that boost on filter() has no effect.
| // bool.filter for equivalent non-scoring behavior. | ||
| // See: https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html | ||
| filterFunc | ||
| = "filter(" _ expr:query _ ")" boost:boost? { |
There was a problem hiding this comment.
- The grammar allows any query inside filter() without validation. While flexible, this may allow unsupported or unintended constructs (e.g., nested filter, scoring queries).
- The rule assumes query always produces a valid expression. Should we ensure grammar rejects empty expressions
|
|
||
| const emptyParams = new Map<string, string>(); | ||
|
|
||
| describe('FilterNode parsing', () => { |
There was a problem hiding this comment.
Any thoughts on addong below test coverage
- nested filter() coverage
- boost + boolean
- Precedence validations
- prefix + boost
| transformChild: TransformChild, | ||
| ): Map<string, any> => { | ||
| // Transform the child node | ||
| const childResult = transformChild(node.child); |
There was a problem hiding this comment.
- If childResult is already a bool query with a filter clause, this creates unnecessary nesting.
- In filter-query-fq, filters are merged into an existing bool.filter. Here, we always create a new bool. This results in results in slightly different shapes vs fq. If this is intended, good to document this. Similar comment for boost on filter nodes.
- The rule assumes transformChild(node.child) always returns a valid query. Is this intentional?
|
|
||
| /** Helper to convert nested Maps to plain objects for easier assertion. */ | ||
| function mapToObject(map: Map<string, any>): any { |
There was a problem hiding this comment.
Should we add test coverage for following aspects to future proof the rule - nested filters, negation and boost.
Description
This translates Solr filter query to OpenSearch equivalent
Testing
UTs and Integ
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.