[Proposal] Priority-Based Ordering for System-Generated Search Pipeline Processors #21176

@bzhangam

Is your feature request related to a problem? Please describe

Proposal: Priority-Based Ordering for System-Generated Search Pipeline Processors


Summary

Lift the current one-processor-per-stage-per-type limit in the system-generated search pipeline framework by introducing a getPriority() method on SystemGeneratedProcessor. This enables multiple plugins to register system-generated processors at the same execution stage without conflicts, while guaranteeing deterministic execution order.


Motivation

The system-generated search pipeline framework (introduced in OpenSearch 3.3) currently enforces a hard limit of one system-generated processor per processor type per execution stage. This limit was introduced to avoid non-deterministic ordering: factory iteration uses HashMap.entrySet(), which provides no ordering guarantees.

As the framework is adopted by more plugins, this limit is becoming a blocker. Concretely:

  • The mmr_rerank processor (k-NN plugin) occupies the SearchResponse PRE slot
  • A proposed hybrid-explain processor (neural-search plugin) also needs the SearchResponse PRE slot
  • Both slots in the SearchResponse type (PRE and POST) are now occupied, with no room for future processors

The limit must be lifted in a way that:

  1. Preserves deterministic execution order
  2. Does not require plugin authors to modify OpenSearch core to register a new processor
  3. Fails fast and clearly when two processors conflict
  4. Scales to future processors without renumbering

Describe the solution you'd like

Proposed Solution

1. Add getPriority() to SystemGeneratedProcessor (Core Change)

Add a single abstract method to the SystemGeneratedProcessor interface:

public interface SystemGeneratedProcessor extends Processor {

    /**
     * Execution order within the same processor type and execution stage.
     * Lower value runs first. Must be a unique positive integer among all
     * system-generated processors of the same type at the same execution stage
     * for a given search request.
     *
     * Plugin authors must choose a value that reflects when in the pipeline
     * lifecycle their processor needs to run relative to others.
     * See the OpenSearch documentation for recommended priority bands.
     *
     * @return a positive integer representing execution priority
     */
    int getPriority();

    // existing methods unchanged
    default ExecutionStage getExecutionStage() { return ExecutionStage.POST_USER_DEFINED; }
    default void evaluateConflicts(ProcessorConflictEvaluationContext ctx) {}
}

getPriority() is abstract (no default) to force every implementor to make a conscious, explicit decision about execution timing. A silent default would mask ordering bugs.
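
For illustration, a plugin-side implementation might look like the sketch below. The nested interface is a simplified stand-in for the proposed core type (the real one extends Processor), and ExplainProcessorSketch is a hypothetical class name, not the actual neural-search implementation.

```java
public class GetPrioritySketch {

    // Simplified stand-in for the proposed core interface (assumption: the
    // real SystemGeneratedProcessor extends Processor and has more methods).
    interface SystemGeneratedProcessor {
        String getType();

        int getPriority(); // no default: every implementor must choose a value
    }

    // Hypothetical SearchResponse processor. 150 falls in the 100-299
    // "Score-Derived Computation" band of the convention in this proposal.
    static final class ExplainProcessorSketch implements SystemGeneratedProcessor {
        @Override
        public String getType() {
            return "hybrid-explain";
        }

        @Override
        public int getPriority() {
            return 150;
        }
    }

    public static void main(String[] args) {
        SystemGeneratedProcessor processor = new ExplainProcessorSketch();
        System.out.println(processor.getType() + " runs at priority " + processor.getPriority());
    }
}
```

Omitting getPriority() from ExplainProcessorSketch would make the class fail to compile, which is the behavior the proposal relies on.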

2. Replace ensureSingleProcessor with ensureNoDuplicatePriority (Core Change)

In SystemGeneratedPipelineWithMetrics, replace the current hard size check:

// BEFORE (current):
private static <T extends Processor> void ensureSingleProcessor(...) {
    if (processors.size() > 1) {
        throw new IllegalArgumentException("Cannot support more than one...");
    }
}

// AFTER (proposed):
private static <T extends Processor> void ensureNoDuplicatePriority(
    String typeName,
    SystemGeneratedProcessor.ExecutionStage stage,
    List<T> processors
) {
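    // Group processor type names by declared priority. LinkedHashMap keeps
    // first-seen order, so the reported conflict is stable across runs.
    Map<Integer, List<String>> byPriority = new LinkedHashMap<>();
    for (T processor : processors) {
        int priority = ((SystemGeneratedProcessor) processor).getPriority();
        byPriority.computeIfAbsent(priority, k -> new ArrayList<>()).add(processor.getType());
    }
    // Fail fast on the first priority claimed by more than one processor.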
    for (Map.Entry<Integer, List<String>> entry : byPriority.entrySet()) {
        if (entry.getValue().size() > 1) {
            throw new IllegalArgumentException(String.format(
                Locale.ROOT,
                "System-generated %s processors %s at stage %s share the same priority %d. " +
                "Each processor must declare a unique priority. " +
                "See the OpenSearch documentation for recommended priority bands.",
                typeName, entry.getValue(), stage, entry.getKey()
            ));
        }
    }
}
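
The behavior of the proposed check can be tried out in isolation with the sketch below. It reuses the same grouping logic but swaps in a hypothetical Proc stand-in for the real processor types; the name "other_processor" is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicatePriorityCheckSketch {

    // Hypothetical stand-in for a system-generated processor.
    static final class Proc {
        final String type;
        final int priority;

        Proc(String type, int priority) {
            this.type = type;
            this.priority = priority;
        }
    }

    // Same grouping logic as the proposed check, reduced to the stand-in type.
    static void ensureNoDuplicatePriority(List<Proc> processors) {
        Map<Integer, List<String>> byPriority = new LinkedHashMap<>();
        for (Proc p : processors) {
            byPriority.computeIfAbsent(p.priority, k -> new ArrayList<>()).add(p.type);
        }
        for (Map.Entry<Integer, List<String>> entry : byPriority.entrySet()) {
            if (entry.getValue().size() > 1) {
                throw new IllegalArgumentException(
                    "Processors " + entry.getValue() + " share the same priority " + entry.getKey());
            }
        }
    }

    // Returns true when the check rejects the given processors.
    static boolean isRejected(List<Proc> processors) {
        try {
            ensureNoDuplicatePriority(processors);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Proc> distinct = Arrays.asList(new Proc("hybrid-explain", 150), new Proc("mmr_rerank", 350));
        List<Proc> clashing = Arrays.asList(new Proc("hybrid-explain", 150), new Proc("other_processor", 150));
        System.out.println("distinct rejected: " + isRejected(distinct)); // false
        System.out.println("clashing rejected: " + isRejected(clashing)); // true
    }
}
```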

3. Sort Processors by Priority Before Building the Pipeline (Core Change)

In generateProcessors(), after collecting processors into lists.pre and lists.post, sort them:

lists.pre.sort(Comparator.comparingInt(p -> ((SystemGeneratedProcessor) p).getPriority()));
lists.post.sort(Comparator.comparingInt(p -> ((SystemGeneratedProcessor) p).getPriority()));

This makes execution order deterministic and explicit: the processor with the lower priority number always runs first, regardless of factory registration order.
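
A minimal sketch of the effect, assuming a hypothetical Proc stand-in rather than the real processor types: the processors are added in the "wrong" order and the sort restores the order implied by their priorities.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortByPrioritySketch {

    // Hypothetical stand-in for a system-generated processor.
    static final class Proc {
        final String type;
        final int priority;

        Proc(String type, int priority) {
            this.type = type;
            this.priority = priority;
        }
    }

    // Sorts in place, lowest priority number first, and returns the
    // processor type names in the resulting execution order.
    static List<String> executionOrder(List<Proc> processors) {
        processors.sort(Comparator.comparingInt(p -> p.priority));
        return processors.stream().map(p -> p.type).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Proc> pre = new ArrayList<>();
        pre.add(new Proc("mmr_rerank", 350));     // collected first
        pre.add(new Proc("hybrid-explain", 150)); // collected second
        System.out.println(executionOrder(pre));  // [hybrid-explain, mmr_rerank]
    }
}
```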


Priority Band Convention (Documentation, Not Code)

Priority bands are a documentation convention only — no constants are defined in core. This ensures plugin authors never need to modify core to introduce a new processor. The bands are defined per processor type, since each type operates on different data at a different point in the search lifecycle.

SearchRequest Processor Bands

| Band | Name | Meaning |
|---|---|---|
| 1–99 | Query Rewriting | Fundamentally rewrites the query structure (expansion, neural translation) |
| 100–299 | Query Parameter Injection | Injects/modifies parameters (size, k) without changing query structure |
| 300–499 | Query Validation/Routing | Validates query or adjusts routing |
| 500–699 | General Request Enrichment | Adds context/metadata; no strong ordering dependency |
| 700–999 | Request Finalization | Last-chance modifications before dispatch |

SearchPhaseResults Processor Bands

| Band | Name | Meaning |
|---|---|---|
| 1–99 | Score Normalization | Normalizes raw shard scores to a common scale |
| 100–299 | Score Combination | Combines normalized scores from multiple sub-queries |
| 300–499 | Candidate Selection | Selects which candidates proceed to FETCH phase |
| 500–699 | Phase Result Enrichment | Attaches metadata/context for downstream response processors |
| 700–999 | Phase Result Finalization | Final adjustments before FETCH phase |

SearchResponse Processor Bands

| Band | Name | Meaning |
|---|---|---|
| 1–99 | Score Modification | Changes numeric score values in the response |
| 100–299 | Score-Derived Computation | Reads finalized scores to produce derived data (explanation, stats) |
| 300–499 | Result Set Shaping | Changes which hits are in the result set or their order (reranking, deduplication, truncation) |
| 500–699 | General Response Enrichment | Adds metadata/annotations; no strong ordering dependency |
| 700–899 | Content Transformation | Transforms hit content/fields (highlighting, field rewriting) |
| 900–999 | Output Formatting | Final response structure shaping |

Scoping: Priority uniqueness is enforced within (processorType, executionStage). A SearchRequest processor with priority 150 and a SearchResponse processor with priority 150 are completely independent and never conflict.
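
The scoping rule can be sketched as follows, assuming a hypothetical Proc stand-in that carries the processor type and stage as plain strings. This is only an illustration of the rule, not how core groups processors internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScopedPrioritySketch {

    // Hypothetical stand-in: processor type and stage are plain strings here.
    static final class Proc {
        final String processorType; // e.g. "SearchRequest" or "SearchResponse"
        final String stage;         // e.g. "PRE" or "POST"
        final int priority;

        Proc(String processorType, String stage, int priority) {
            this.processorType = processorType;
            this.stage = stage;
            this.priority = priority;
        }
    }

    // Returns true when two processors in the same (processorType, stage)
    // scope declare the same priority.
    static boolean hasConflict(List<Proc> processors) {
        Set<String> seen = new HashSet<>();
        for (Proc p : processors) {
            String scopedKey = p.processorType + "/" + p.stage + "/" + p.priority;
            if (!seen.add(scopedKey)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same priority, different processor types: independent scopes.
        List<Proc> independent = Arrays.asList(
            new Proc("SearchRequest", "POST", 150),
            new Proc("SearchResponse", "PRE", 150));
        // Same priority, same type and stage: conflict.
        List<Proc> conflicting = Arrays.asList(
            new Proc("SearchResponse", "PRE", 150),
            new Proc("SearchResponse", "PRE", 150));
        System.out.println(hasConflict(independent)); // false
        System.out.println(hasConflict(conflicting)); // true
    }
}
```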


Impact on Existing Processors

All existing SystemGeneratedProcessor implementations must be updated to implement getPriority(). Proposed assignments:

| Processor | Plugin | Type | Stage | Priority | Band |
|---|---|---|---|---|---|
| mmr_over_sample | k-NN | SearchRequest | POST | 150 | Query Parameter Injection |
| hybrid-explain (proposed) | neural-search | SearchResponse | PRE | 150 | Score-Derived Computation |
| mmr_rerank | k-NN | SearchResponse | PRE | 350 | Result Set Shaping |
| semantic-highlighter | neural-search | SearchResponse | POST | 750 | Content Transformation |

With this ordering, the Response PRE stage executes as:

priority 150: hybrid-explain  → enriches explanation before any hits are removed
priority 350: mmr_rerank      → reorders and trims the result set

This order is semantically correct. Explanation enrichment runs first, while every candidate hit is still in the response; MMR reranking then reorders and trims hits that already carry their explanation data.


Changes Required

OpenSearch Core (opensearch-project/OpenSearch):

  • SystemGeneratedProcessor.java — add abstract getPriority() method
  • SystemGeneratedPipelineWithMetrics.java — replace ensureSingleProcessor with ensureNoDuplicatePriority, add sort by priority
  • Documentation — add priority band tables to the system-generated search processors page

neural-search plugin (opensearch-project/neural-search):

  • SemanticHighlightingProcessor.java — implement getPriority() returning 750
  • New SystemExplanationProcessor.java — implement getPriority() returning 150, getExecutionStage() returning PRE_USER_DEFINED

k-NN plugin (opensearch-project/k-NN):

  • mmr_rerank processor — implement getPriority() returning 350
  • mmr_over_sample processor — implement getPriority() returning 150

Backward Compatibility

Since getPriority() is abstract, this is a compile-time breaking change for any existing SystemGeneratedProcessor implementation. All known implementations are in first-party OpenSearch plugins and will be updated in the same release. Third-party plugin authors will receive a compile error that directs them to add getPriority() — this is intentional, as it forces explicit ordering decisions.

The runtime behavior for a single processor per stage is unchanged: a list of one element sorted by priority is still a list of one element.


Alternatives Considered

Keep the 1-per-stage limit and use a different execution stage: Not viable — both Response PRE and POST slots are now occupied by existing processors.

Define ExecutionPriority constants in core: Rejected — this would require plugin authors to modify core to add new bands, defeating the purpose of a plugin-extensible system.

Use a default priority value: Rejected — a default silently masks ordering bugs. Making getPriority() abstract forces explicit, documented decisions.

Use Integer.MAX_VALUE as default: Rejected for the same reason as above, and additionally because two processors both defaulting to MAX_VALUE would conflict at runtime rather than compile time.

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response

Labels: Search, enhancement, untriaged
