Adding translator for filter query #2667

Open
nagarajg17 wants to merge 1 commit into opensearch-project:main from nagarajg17:feature/fq

nagarajg17 (Collaborator) commented Apr 9, 2026

Description

This PR translates Solr filter queries (fq) into their OpenSearch equivalents.

Testing

Unit tests and integration tests.

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@nagarajg17 nagarajg17 had a problem deploying to migrations-cicd-require-approval April 9, 2026 06:05 — with GitHub Actions Error
@codecov

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.54%. Comparing base (b3bb603) to head (fbbf0b4).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2667      +/-   ##
============================================
+ Coverage     72.52%   72.54%   +0.01%     
  Complexity       90       90              
============================================
  Files           709      711       +2     
  Lines         32988    33011      +23     
  Branches       2823     2831       +8     
============================================
+ Hits          23925    23948      +23     
  Misses         7784     7784              
  Partials       1279     1279              
Flag Coverage Δ
gradle 68.74% <ø> (ø)
node 92.71% <100.00%> (+0.07%) ⬆️
python 76.56% <ø> (ø)

Flags with carried forward coverage won't be shown.

@k-rooot
Collaborator

k-rooot commented Apr 9, 2026

Please look into the code coverage and integration test failures.

@nagarajg17 nagarajg17 had a problem deploying to migrations-cicd-require-approval April 9, 2026 15:38 — with GitHub Actions Error
match: (ctx) => ctx.params.has('fq'),
apply: (ctx) => {
// Get all fq values (can be multiple)
const fqValues = ctx.params.getAll('fq');
Collaborator

  1. Should we filter out empty or whitespace-only values before processing? If this is handled in the parser, please make sure we have sufficient test coverage to confirm the handling.
  2. Solr allows filter(condition) syntax inside queries so that sub-clauses are cached independently. Should we detect and normalize filter(...) constructs?

Collaborator Author

nagarajg17, Apr 11, 2026

  1. Added validation to filter out empty or whitespace-only values, along with tests.
  2. filter(...) is actually a separate syntax, a query of its own rather than part of fq. It needs PEG changes, a separate translation rule, etc., so I had excluded it from this PR's scope. I have added it now anyway.
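The empty-value validation from item 1 could be sketched roughly as below. This is not the PR's code: `getNonEmptyFqValues` is a hypothetical name, and the sketch assumes `ctx.params` behaves like a `URLSearchParams`, which the author states later in this thread.

```typescript
// Hypothetical helper: collect every fq value, trim it, and drop entries
// that are empty or whitespace-only before they reach the query parser.
function getNonEmptyFqValues(params: URLSearchParams): string[] {
  return params
    .getAll('fq')
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
}

// Three fq params, two of which carry no real filter (%20 decodes to a space).
const params = new URLSearchParams('fq=inStock:true&fq=&fq=%20%20');
console.log(getNonEmptyFqValues(params)); // [ 'inStock:true' ]
```

Filtering before parsing keeps an empty fq from becoming a match-nothing or parse-error clause.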

if (fqValues.length === 0) return;

// Parse each fq into OpenSearch DSL
const filters = fqValues.map(parseFq);
Collaborator

  1. Solr allows complex boolean expressions within a single fq (e.g., +A +B) which are semantically equivalent to multiple fq params. The current implementation does not explicitly account for intra-fq boolean structure. If translateQ does not normalize +A +B into must, behavior may diverge. Please ensure translateQ preserves boolean semantics consistently between multiple fq and single fq with boolean clauses.
    e.g., fq=+A +B should behave the same as fq=A&fq=B.

  2. Callout: Currently, if one fq fails to parse, the entire transformation will fail. We will need to consider the translation modes for later phases to handle these failures.
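For item 1, one way to make both forms produce identical output, which would make the requested equivalence test straightforward, is to lift the clauses of an fq that translates to an all-required bool into the outer filter array. This is a hypothetical sketch, not part of the PR. `liftRequiredClauses` is an illustrative name, it uses plain objects instead of the `Map`s the transform works with, and it assumes that once the PEG fix lands, `+A +B` translates to `{ bool: { must: [A, B] } }`.

```typescript
// Plain-object view of a query DSL clause (the transform itself uses Map<string, any>).
type Dsl = Record<string, any>;

const toArray = (v: unknown): Dsl[] =>
  v === undefined ? [] : Array.isArray(v) ? v : [v as Dsl];

// Hypothetical helper: if a translated fq is a bool whose only keys are
// `must` and/or `filter`, every clause is required, so the clauses can be
// lifted into the outer filter array. Anything else (should, must_not,
// boost, minimum_should_match, ...) is kept as a single filter clause.
function liftRequiredClauses(fqDsl: Dsl): Dsl[] {
  const bool = fqDsl.bool;
  if (!bool || Object.keys(fqDsl).length !== 1) return [fqDsl];
  const keys = Object.keys(bool);
  if (keys.length === 0 || !keys.every((k) => k === 'must' || k === 'filter')) {
    return [fqDsl];
  }
  return [...toArray(bool.must), ...toArray(bool.filter)];
}

const a = { term: { inStock: true } };
const b = { range: { price: { gte: 10, lte: 100 } } };
const multi = [a, b].flatMap(liftRequiredClauses); // fq=A&fq=B
const single = liftRequiredClauses({ bool: { must: [a, b] } }); // fq=+A +B (assumed shape)
console.log(JSON.stringify(multi) === JSON.stringify(single)); // true
```

Because the whole fq already runs in filter context, `must` versus `filter` inside it makes no scoring difference, which is why lifting `must` clauses into `filter` does not change results.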

Collaborator Author

nagarajg17, Apr 12, 2026

  1. Good point. The parser does handle +/- prefix syntax (see prefixExpr in solr.pegjs). This applies not just to fq but also to the q param. When +/- is used with two or more terms, they combine with implicit AND instead of OR. Currently the PEG grammar applies implicit OR, which is correct for all other use cases. We can fix this at the PEG level, or before invoking the parser by sending q.op=AND for such cases. I'm tracking the fix in the fine-tuning task. Once it lands, it will work for both q and fq; no fq-specific changes will be needed.
  2. Acknowledged; it now fails fast.
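A fail-fast loop over fq values might look like the sketch below, where the first value that cannot be translated aborts the whole request with an error naming it. `FqTranslationError` and `translateAllFq` are hypothetical names, and `parse` stands in for the PR's `parseFq`; this is not the PR's implementation.

```typescript
// Hypothetical error type that records which fq value failed and why.
class FqTranslationError extends Error {
  readonly fq: string;
  readonly index: number;
  constructor(fq: string, index: number, cause: unknown) {
    const reason = cause instanceof Error ? cause.message : String(cause);
    super(`Failed to translate fq[${index}] "${fq}": ${reason}`);
    this.name = 'FqTranslationError';
    this.fq = fq;
    this.index = index;
  }
}

// Translate every fq value; the first failure stops processing immediately.
function translateAllFq<T>(fqValues: string[], parse: (fq: string) => T): T[] {
  return fqValues.map((fq, index) => {
    try {
      return parse(fq);
    } catch (err) {
      throw new FqTranslationError(fq, index, err);
    }
  });
}
```

Wrapping the original error keeps the parser's message while adding the fq position, which should make failures easier to trace once lenient translation modes are considered in later phases.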

Collaborator

  1. Once the fix is implemented, please ensure we have explicit test coverage validating equivalence between multiple fq params and a single fq with + (required) clauses. Please add a code-level TODO so this doesn't slip through the cracks.

Comment on lines +55 to +57
if (existingQuery) {
boolQuery.set('must', [existingQuery]);
}
Collaborator

Wrapping existingQuery in an array is correct, but if existingQuery is already a bool query, this may lead to unnecessary nesting. Can we flatten when possible to avoid deep nesting and improve query performance and readability?

Collaborator Author

Done

}

// Add filter clauses
boolQuery.set('filter', filters);
Collaborator

This overwrites any existing filter clause if the original query already had one (e.g., from previous transforms like query-q). Should we merge with existing filters instead of overwriting?

const existingFilter = existingQuery?.get?.('bool')?.get?.('filter') || [];
boolQuery.set('filter', [...existingFilter, ...filters]);

Collaborator Author

nagarajg17, Apr 11, 2026

We are not overwriting it. We wrap the existing query and add the filter on top of it. Merging filters is handled as part of the nesting fix from the comment above.

const boolQuery = new Map<string, any>();
if (existingQuery) {
  boolQuery.set('must', [existingQuery]);
}
boolQuery.set('filter', filters);

boolQuery.set('filter', filters);

// Replace query with the new bool wrapper
ctx.body.set('query', new Map([['bool', boolQuery]]));
Collaborator

I'd like to understand why we are doing this replacement. Will it impact any transformations from query-q?

This replaces the entire query with a new bool structure, which may discard other clauses such as should or must_not that exist in the original query. I would recommend merging into the existing bool query when present instead of replacing it.
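The merge-when-bool, wrap-otherwise approach discussed in this thread could be sketched as below. It is illustrative only: `addFilters` is a hypothetical name, not the PR's function. It also guards a pitfall documented for the OpenSearch bool query: `minimum_should_match` defaults to 1 only when a bool has `should` clauses and no `must` or `filter` clauses, so appending a filter to a should-only bool would silently make its `should` clauses optional unless the value is pinned.

```typescript
type QueryMap = Map<string, any>;

// Normalize an optional clause value to an array of clauses.
const asArray = (value: unknown): any[] =>
  value === undefined ? [] : Array.isArray(value) ? value : [value];

// Hypothetical helper: attach fq filters to the query produced by earlier
// transforms without discarding any of its clauses.
function addFilters(existing: QueryMap | undefined, filters: QueryMap[]): QueryMap {
  const existingBool = existing && existing.size === 1 ? existing.get('bool') : undefined;

  if (existingBool instanceof Map) {
    // Copy so must, should, must_not, boost, etc. are all carried over.
    const merged: QueryMap = new Map(existingBool);
    // Should-only bool: default minimum_should_match is 1, but it drops to 0
    // once a filter clause exists. Pin it to keep the original semantics.
    const onlyShould = merged.has('should') && !merged.has('must') && !merged.has('filter');
    if (onlyShould && !merged.has('minimum_should_match')) {
      merged.set('minimum_should_match', 1);
    }
    merged.set('filter', [...asArray(existingBool.get('filter')), ...filters]);
    return new Map<string, any>([['bool', merged]]);
  }

  // Not a bool: keep it as a scoring clause and add the filters alongside it.
  const bool: QueryMap = new Map();
  if (existing) bool.set('must', [existing]);
  bool.set('filter', filters);
  return new Map<string, any>([['bool', bool]]);
}
```

Merging into an existing `filter` array also covers the multiple-fq-with-existing-filters scenario raised later in the review.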

Collaborator Author

nagarajg17, Apr 11, 2026

Answered in the comments above. We wrap the existing query and add the filter on top of it, so it won't impact anything. If the existing query from query-q is a bool query, the filter is now merged into it.

return result.dsl;
}

export const request: MicroTransform<RequestContext> = {
Collaborator

Solr supports negation in fq (e.g., -domainStatus:deleted). These should be mapped to must_not instead of filter.

Collaborator Author

The filter array can hold any kind of query; the must_not sits inside it. For a negated fq on category, the translation produces:

{
  "query": {
    "bool": {
      "must": [{ "match_all": {} }],
      "filter": [
        {
          "bool": {
            "must_not": [{ "match": { "category": { "query": "software" } } }]
          }
        }
      ]
    }
  }
}

Collaborator

This introduces an extra level of nesting, whereas OpenSearch allows must_not directly at the top-level bool, which is more natural and avoids unnecessary wrapping. Consider lifting must_not to the top-level bool.must_not when the fq is a pure negation, for a flatter and more idiomatic query structure.
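Lifting pure negations could be done as a post-processing step over the translated fq clauses, as in this hypothetical sketch (`partitionFilters` is an illustrative name, and it uses plain objects like the JSON above rather than the transform's `Map`s). In an OpenSearch bool query, `must_not` clauses already execute in filter context, so moving them from a nested `filter` bool to the top-level `must_not` changes the shape, not the matching or scoring.

```typescript
type Dsl = Record<string, any>;

// Hypothetical helper: route each translated fq either to the outer
// bool.filter or, when it is a pure negation, to the outer bool.must_not.
function partitionFilters(fqDsls: Dsl[]): { filter: Dsl[]; mustNot: Dsl[] } {
  const filter: Dsl[] = [];
  const mustNot: Dsl[] = [];
  for (const dsl of fqDsls) {
    const bool = dsl.bool;
    // Pure negation: { bool: { must_not: ... } } with no other keys.
    const isPureNegation =
      Object.keys(dsl).length === 1 &&
      bool !== null &&
      typeof bool === 'object' &&
      Object.keys(bool).length === 1 &&
      bool.must_not !== undefined;
    if (isPureNegation) {
      const clauses = Array.isArray(bool.must_not) ? bool.must_not : [bool.must_not];
      mustNot.push(...clauses);
    } else {
      filter.push(dsl);
    }
  }
  return { filter, mustNot };
}

const negated = { bool: { must_not: [{ match: { category: { query: 'software' } } }] } };
const inStock = { term: { inStock: true } };
console.log(JSON.stringify(partitionFilters([negated, inStock])));
// {"filter":[{"term":{"inStock":true}}],"mustNot":[{"match":{"category":{"query":"software"}}}]}
```

Negations lifted from several fq params still combine as "none of these match", which is the same intersection the nested form expressed.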

* Parse a single fq value using the same query engine as q.
* Creates a params map with the fq value as 'q' for translateQ.
*/
function parseFq(fq: string): Map<string, any> {
Collaborator

Solr fq supports local params (e.g., {!frange}, {!geofilt}, {!cache=false}), which are not standard query syntax. Passing a raw fq such as {!frange l=10 u=100}field or {!cache=false}field:value into translateQ may fail or be misinterpreted.

Introducing a preprocessing step that parses/extracts local params and routes them to the appropriate DSL constructs would help.

const { localParams, query } = parseLocalParams(fq);

// Example mapping
// {!frange l=10 u=100}price → range query
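The preprocessing step suggested above could start from something like the sketch below. `parseLocalParams` is the reviewer's hypothetical name, not an existing helper in the codebase, and this minimal version deliberately ignores quoted values, `$param` references, and the `v=` parameter that full Solr local-params syntax supports.

```typescript
interface LocalParamsResult {
  type?: string; // parser name, e.g. "frange" in {!frange l=10 u=100}
  localParams: Map<string, string>;
  query: string; // text after the closing brace
}

// Minimal sketch: recognizes a leading {!...} block of whitespace-separated
// tokens. Quoted values, $param references and v= are NOT handled.
function parseLocalParams(fq: string): LocalParamsResult {
  const match = /^\s*\{!([^}]*)\}([\s\S]*)$/.exec(fq);
  if (!match) return { localParams: new Map(), query: fq };

  const localParams = new Map<string, string>();
  let type: string | undefined;
  const tokens = match[1].trim().split(/\s+/).filter((t) => t.length > 0);
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    const eq = token.indexOf('=');
    if (eq === -1) {
      // Only a bare first token is the parser-type shorthand; others are ignored.
      if (i === 0) type = token;
    } else {
      localParams.set(token.slice(0, eq), token.slice(eq + 1));
    }
  }
  // {!type=frange ...} is the long form of {!frange ...}.
  if (type === undefined) type = localParams.get('type');
  return { type, localParams, query: match[2] };
}
```

For example, `parseLocalParams('{!frange l=10 u=100}price')` would yield type `frange`, params `l=10` and `u=100`, and query `price`, which a later step could map to a range query.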

Collaborator Author

nagarajg17, Apr 11, 2026

Local params are not P0 and are tracked separately; they need a separate translation mechanism and are out of scope for this PR. Just as fq supports all query types by calling translateQ, fq will pick up local params in the same way once support is added. It isn't strictly coupled to this task.

@@ -0,0 +1,65 @@
/**
Collaborator

  1. A key feature of fq is that filters are cached independently for performance. OpenSearch does not have identical semantics, but filter context is cache-friendly and must is not. Should we support cache hints and map them appropriately (if possible)?
     // fq={!cache=false}status:active
     // → may require bypassing filter context or marking differently
  2. Solr supports cost for ordering filter execution, especially for expensive filters and post filters. At a minimum we must parse it and explicitly ignore it (with documentation), or optionally reorder filters based on cost.
  3. Solr distinguishes between normal filters (pre-query) and post filters (executed after the main query when cost >= 100). All filters are currently treated equally in OpenSearch. We must either document this or support mapping high-cost filters to post_filter (the closest OpenSearch concept):
     {
       "query": { ... },
       "post_filter": { ... }
     }
  4. Solr fq semantics are strictly: multiple fq → intersection (AND), but union (OR) can be achieved via boolean queries inside an fq or filter(condition) tricks. The current implementation assumes all fq → AND filters only. Should we ensure OR logic inside an fq is preserved via should?
  5. Solr explicitly recommends a single fq with multiple clauses when the clauses are reused together, and multiple fq params when they are independent (for caching efficiency). The transform loses this distinction because everything becomes a flat filter array. Should we optionally preserve grouping information for better optimization?
  6. Solr allows function queries inside fq (e.g., mul(popularity,price)). These require special handling in OpenSearch (e.g., script queries). Please ensure translateQ supports function queries in filter context:
     {
       "script": {
         "script": {
           "source": "doc['popularity'].value * doc['price'].value > 10",
           "lang": "painless"
         }
       }
     }

Collaborator Author

nagarajg17, Apr 11, 2026

  1. As noted in the other comment, local params are out of scope for this PR.
  2. cost is a local parameter used with non-cached filter queries (fq) to hint at the order in which they should be evaluated, so it also requires local params. Using fq for non-caching scenarios should be rare and is not a general use case. I haven't researched it deeply, but OpenSearch doesn't have a direct equivalent, possibly for the same reason. IMO we should implement this only if we receive a customer request.
  3. I think the point above covers this as well.
  4. We process each fq param and add the translated query to the filter array. If an fq uses a boolean, range, or any other query inside it, it is processed accordingly.
  5. Good observation. OpenSearch does not cache the final combined result of the entire filter array. It typically caches each filter clause (or reusable filter bitset) independently and then combines them at query time.
  6. As discussed with product as part of other tasks, function queries are not P0, and they are an independent task. Once we add function query support, fq will support them automatically, just as it supports all other query types now.

Collaborator

On 1, 2 and 3, please update the fq rule transform to flag unsupported constructs, aligned with the validations introduced in PR #2688. Additionally, document all unsupported fq features (cache, cost, etc.) in the LIMITATIONS file.

  4. Thanks for the confirmation. Can we add a targeted test for plain OR within a single fq to explicitly validate the contract?

  5. Solr's distinction between a single fq with multiple clauses (treated as one logical unit) and multiple fq params (independent filters) is not just about caching; it also preserves logical grouping and intent. I'd still treat this as a limitation. Any thoughts on documenting that this distinction is not preserved, or optionally preserving grouping for closer semantic parity and debuggability?

  6. Same comment as 1, for function queries in fq.
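Flagging unsupported constructs could be sketched as a pre-check over the fq values. The validation format introduced in PR #2688 is not shown in this thread, so the `FqWarning` shape and the `flagUnsupportedFq` name are assumptions. The sketch flags any leading local-params block (which covers `cache`, `cost`, `frange`, `geofilt`, etc.) and Solr's `_val_` hook, which embeds a function query in standard query syntax.

```typescript
// Assumed warning shape; align with the real validation type from PR #2688.
interface FqWarning {
  fq: string;
  reason: string;
}

// Hypothetical pre-check: report fq constructs the transform does not translate yet.
function flagUnsupportedFq(fqValues: string[]): FqWarning[] {
  const warnings: FqWarning[] = [];
  for (const fq of fqValues) {
    const localParams = /^\s*\{!([^}]*)\}/.exec(fq);
    if (localParams) {
      // Covers {!cache=false}, {!cost=200}, {!frange ...}, {!geofilt ...}, etc.
      warnings.push({ fq, reason: `local params not supported: {!${localParams[1].trim()}}` });
    }
    if (/(^|[\s(+-])_val_:/.test(fq)) {
      // Solr's _val_ hook embeds a function query in standard query syntax.
      warnings.push({ fq, reason: 'function queries (_val_) not supported' });
    }
  }
  return warnings;
}
```

Reporting every offending value, rather than stopping at the first, would let the LIMITATIONS documentation and the runtime messages list the same constructs.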


export const request: MicroTransform<RequestContext> = {
name: 'filter-query-fq',
match: (ctx) => ctx.params.has('fq'),
Collaborator

Solr requires proper escaping of special characters in fq. If ctx.params does not already decode values, queries may break or mis-parse. Please make sure decoding is applied before translateQ, and please add sufficient test coverage for this as well.

Collaborator Author

nagarajg17, Apr 11, 2026

URL decoding is handled by URLSearchParams, which is used to construct ctx.params. This applies equally to q, fq, and all other params; it's not specific to this PR's changes. If there's a specific escaping scenario you're concerned about that only applies to fq, I'm happy to add a test case, but the decoding layer is upstream of the transform pipeline and not in scope for this task.

Collaborator

The intent of the comment was to ensure that escaped/special characters in fq are correctly preserved and parsed through translateQ, since fq often contains ranges, phrases, and special syntax. Adding test coverage for fq values with escaped/special characters to validate end to end behavior through the transform will future proof the transform rule for fq. Thoughts?
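One escaping case worth pinning in such a test is the `+` prefix discussed earlier in this thread. `URLSearchParams` implements `application/x-www-form-urlencoded` parsing, where an unencoded `+` decodes to a space, so a required-clause marker sent raw never reaches translateQ. The fq value below is a hypothetical test input.

```typescript
// Hypothetical test input combining required-clause prefixes, a range and a phrase.
const fq = '+inStock:true +price:[10 TO 100] title:"solar panel"';

// Correctly encoded (encodeURIComponent turns '+' into %2B), the value round-trips.
const encoded = new URLSearchParams(`fq=${encodeURIComponent(fq)}`);
console.log(encoded.get('fq') === fq); // true

// Building the query string with URLSearchParams itself also encodes '+' safely.
const built = new URLSearchParams({ fq }).toString();
console.log(new URLSearchParams(built).get('fq') === fq); // true

// Sent unencoded, form decoding turns each '+' into a space, so the
// required-clause markers silently disappear before translation.
const raw = new URLSearchParams(`fq=${fq}`);
console.log(raw.get('fq')?.includes('+')); // false
```

An end-to-end test could feed both the encoded and the raw form through the transform and assert on the resulting DSL, which documents the client-side encoding requirement.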

@@ -0,0 +1,218 @@
import { describe, it, expect } from 'vitest';
Collaborator

I'll review the UTs in the next revision.

Signed-off-by: Nagaraj G <narajg@amazon.com>
@nagarajg17 nagarajg17 requested a deployment to migrations-cicd-require-approval April 12, 2026 10:21 — with GitHub Actions Waiting
return obj;
}

describe('filter-query-fq request transform', () => {
Collaborator

Any thoughts on adding the following test coverage?

  1. Multiple merge scenario. The current test covers merging a single fq, but not multiple fq values when existing filters are present.
  2. No-op when match passes but apply skips.

const result = mapToObject(ctx.body.get('query'));
expect(result.bool.filter).toHaveLength(1);
// The fq itself becomes a bool query
expect(result.bool.filter[0].bool).toBeDefined();
Collaborator

This only checks presence of bool, not correctness of structure (must, should, etc.). Any thoughts on adding assertions to validate actual semantics?

filterFunc
= "filter(" _ expr:query _ ")" boost:boost? {
const node = { type: 'filter', child: expr };
if (boost !== null) return { type: 'boost', child: node, value: boost };
Collaborator

Wrapping filter() with a boost node is syntactically correct, but semantically misleading, since filter() is a non-scoring construct. Boost on a filter wrapper is effectively a no-op in Solr/OpenSearch filter context. Can we normalize this, or explicitly document that boost on filter() has no effect?

// bool.filter for equivalent non-scoring behavior.
// See: https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html
filterFunc
= "filter(" _ expr:query _ ")" boost:boost? {
Collaborator

  1. The grammar allows any query inside filter() without validation. While flexible, this may allow unsupported or unintended constructs (e.g., nested filter, scoring queries).
  2. The rule assumes query always produces a valid expression. Should we ensure the grammar rejects empty expressions?


const emptyParams = new Map<string, string>();

describe('FilterNode parsing', () => {
Collaborator

Any thoughts on adding the following test coverage?

  1. Nested filter() coverage
  2. Boost + boolean
  3. Precedence validations
  4. Prefix + boost

transformChild: TransformChild,
): Map<string, any> => {
// Transform the child node
const childResult = transformChild(node.child);
Collaborator

  1. If childResult is already a bool query with a filter clause, this creates unnecessary nesting.
  2. In filter-query-fq, filters are merged into an existing bool.filter, but here we always create a new bool. This results in slightly different shapes than fq. If this is intended, it would be good to document it. The same applies to boost on filter nodes.
  3. The rule assumes transformChild(node.child) always returns a valid query. Is this intentional?
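Items 1 and 3 could both be handled in the FilterNode transform with a small guard, as in this hypothetical sketch (`wrapInFilter` is an illustrative name, not the PR's function). It rejects an empty child result instead of emitting `{ bool: { filter: [{}] } }`, and it returns a child that is already a filter-only bool unchanged rather than wrapping it again.

```typescript
type QueryMap = Map<string, any>;

// Hypothetical sketch of the FilterNode transform step (not the PR's code).
function wrapInFilter(childResult: QueryMap): QueryMap {
  // Item 3: fail loudly instead of emitting a filter around an empty query.
  if (childResult.size === 0) {
    throw new Error('filter() child translated to an empty query');
  }
  // Item 1: a bool that holds only filter clauses is already non-scoring,
  // so wrapping it again would just add a level of nesting.
  const childBool = childResult.size === 1 ? childResult.get('bool') : undefined;
  if (childBool instanceof Map && childBool.size === 1 && childBool.has('filter')) {
    return childResult;
  }
  return new Map<string, any>([['bool', new Map<string, any>([['filter', [childResult]]])]]);
}
```

With this guard, nested `filter(filter(x))` collapses to a single `bool.filter`, which is also a convenient case for the nested-filter test coverage requested above.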

Comment on lines +5 to +7

/** Helper to convert nested Maps to plain objects for easier assertion. */
function mapToObject(map: Map<string, any>): any {
Collaborator

Should we add test coverage for following aspects to future proof the rule - nested filters, negation and boost.


2 participants