[Proposal] Priority-Based Ordering for System-Generated Search Pipeline Processors

### Is your feature request related to a problem? Please describe

## Proposal: Priority-Based Ordering for System-Generated Search Pipeline Processors

---

### Summary

Lift the current one-processor-per-stage-per-type limit in the system-generated search pipeline framework by introducing a `getPriority()` method on `SystemGeneratedProcessor`. This enables multiple plugins to register system-generated processors at the same execution stage without conflicts, while guaranteeing deterministic execution order.

---

### Motivation

The system-generated search pipeline framework (introduced in OpenSearch 3.3) currently enforces a hard limit of one system-generated processor per processor type per execution stage. This limit was introduced to avoid non-deterministic ordering, since factory iteration uses `HashMap.entrySet()`, which provides no ordering guarantees.

As the framework is adopted by more plugins, this limit is becoming a blocker. Concretely:

- The `mmr_rerank` processor (k-NN plugin) occupies the **SearchResponse PRE** slot
- A proposed [`hybrid-explain` processor (neural-search plugin)](https://github.com/opensearch-project/neural-search/issues/1812) also needs the **SearchResponse PRE** slot
- The **SearchResponse POST** slot is held by `semantic-highlighter` (neural-search plugin), so both SearchResponse slots (PRE and POST) are occupied, with no room for future processors

The limit must be lifted in a way that:
1. Preserves deterministic execution order
2. Does not require plugin authors to modify OpenSearch core to register a new processor
3. Fails fast and clearly when two processors conflict
4. Scales to future processors without renumbering

---


### Describe the solution you'd like


### Proposed Solution

#### 1. Add `getPriority()` to `SystemGeneratedProcessor` (Core Change)

Add a single abstract method to the `SystemGeneratedProcessor` interface:

```java
public interface SystemGeneratedProcessor extends Processor {

    /**
     * Execution order within the same processor type and execution stage.
     * Lower value runs first. Must be a unique positive integer among all
     * system-generated processors of the same type at the same execution stage
     * for a given search request.
     *
     * Plugin authors must choose a value that reflects when in the pipeline
     * lifecycle their processor needs to run relative to others.
     * See the OpenSearch documentation for recommended priority bands.
     *
     * @return a positive integer representing execution priority
     */
    int getPriority();

    // existing methods unchanged
    default ExecutionStage getExecutionStage() { return ExecutionStage.POST_USER_DEFINED; }
    default void evaluateConflicts(ProcessorConflictEvaluationContext ctx) {}
}
```

`getPriority()` is **abstract** (no default) to force every implementor to make a conscious, explicit decision about execution timing. A silent default would mask ordering bugs.
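As a rough illustration of what this means for a plugin author, the sketch below implements the proposed method on a hypothetical processor. The `Processor` and `SystemGeneratedProcessor` types here are simplified stand-ins so the example compiles on its own; `ExampleEnrichmentProcessor`, its type string, and the value `550` are assumptions made for illustration, not part of any plugin.

```java
// Simplified stand-ins for the OpenSearch interfaces, reduced to what this sketch needs.
interface Processor {
    String getType();
}

interface SystemGeneratedProcessor extends Processor {
    enum ExecutionStage { PRE_USER_DEFINED, POST_USER_DEFINED }

    // Proposed method: abstract, so every implementor must choose a value.
    int getPriority();

    default ExecutionStage getExecutionStage() {
        return ExecutionStage.POST_USER_DEFINED;
    }
}

// Hypothetical plugin processor; the class name, type string, and priority are illustrative.
final class ExampleEnrichmentProcessor implements SystemGeneratedProcessor {
    // A named constant keeps the ordering decision visible and easy to review.
    static final int PRIORITY = 550;

    @Override
    public String getType() {
        return "example_enrichment";
    }

    @Override
    public int getPriority() {
        return PRIORITY;
    }
}
```

Omitting `getPriority()` from `ExampleEnrichmentProcessor` would fail compilation, which is the behavior the proposal relies on.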

#### 2. Replace `ensureSingleProcessor` with `ensureNoDuplicatePriority` (Core Change)

In `SystemGeneratedPipelineWithMetrics`, replace the current hard size check:

```java
// BEFORE (current):
private static <T extends Processor> void ensureSingleProcessor(...) {
    if (processors.size() > 1) {
        throw new IllegalArgumentException("Cannot support more than one...");
    }
}

// AFTER (proposed):
private static <T extends Processor> void ensureNoDuplicatePriority(
    String typeName,
    SystemGeneratedProcessor.ExecutionStage stage,
    List<T> processors
) {
    Map<Integer, List<String>> byPriority = new LinkedHashMap<>();
    for (T processor : processors) {
        int priority = ((SystemGeneratedProcessor) processor).getPriority();
        byPriority.computeIfAbsent(priority, k -> new ArrayList<>()).add(processor.getType());
    }
    for (Map.Entry<Integer, List<String>> entry : byPriority.entrySet()) {
        if (entry.getValue().size() > 1) {
            throw new IllegalArgumentException(String.format(
                Locale.ROOT,
                "System-generated %s processors %s at stage %s share the same priority %d. " +
                "Each processor must declare a unique priority. " +
                "See the OpenSearch documentation for recommended priority bands.",
                typeName, entry.getValue(), stage, entry.getKey()
            ));
        }
    }
}
```

#### 3. Sort Processors by Priority Before Building the Pipeline (Core Change)

In `generateProcessors()`, after collecting processors into `lists.pre` and `lists.post`, sort them:

```java
lists.pre.sort(Comparator.comparingInt(p -> ((SystemGeneratedProcessor) p).getPriority()));
lists.post.sort(Comparator.comparingInt(p -> ((SystemGeneratedProcessor) p).getPriority()));
```

This makes execution order **deterministic and explicit**: the processor with the lower priority number always runs first, regardless of factory registration order.
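The combined effect of the duplicate check and the sort can be tried outside OpenSearch with a minimal sketch. It assumes a stand-in `Proc` class in place of real processors, and the names `early`, `late`, and `executionOrder` are hypothetical; it is not the code in `SystemGeneratedPipelineWithMetrics`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PriorityOrderingSketch {

    /** Stand-in for a system-generated processor: a type name plus its declared priority. */
    static final class Proc {
        final String type;
        final int priority;

        Proc(String type, int priority) {
            this.type = type;
            this.priority = priority;
        }
    }

    /** Rejects duplicate priorities, then returns processor types in ascending priority order. */
    static List<String> executionOrder(List<Proc> registered) {
        Set<Integer> seen = new HashSet<>();
        for (Proc p : registered) {
            if (!seen.add(p.priority)) {
                throw new IllegalArgumentException("Duplicate priority " + p.priority);
            }
        }
        List<Proc> sorted = new ArrayList<>(registered); // leave the caller's list untouched
        sorted.sort(Comparator.comparingInt(p -> p.priority));
        List<String> order = new ArrayList<>();
        for (Proc p : sorted) {
            order.add(p.type);
        }
        return order;
    }

    public static void main(String[] args) {
        Proc early = new Proc("early", 150);
        Proc late = new Proc("late", 350);
        // Both registration orders yield the same execution order.
        System.out.println(executionOrder(Arrays.asList(late, early)));
        System.out.println(executionOrder(Arrays.asList(early, late)));
    }
}
```

Whichever order the two processors are registered in, `early` is listed before `late`; registering a second processor with priority 150 raises `IllegalArgumentException` instead of silently picking an order.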

---

### Priority Band Convention (Documentation, Not Code)

Priority bands are a **documentation convention only** — no constants are defined in core. This ensures plugin authors never need to modify core to introduce a new processor. The bands are defined per processor type, since each type operates on different data at a different point in the search lifecycle.

#### SearchRequest Processor Bands

| Band | Name | Meaning |
|---|---|---|
| 1–99 | Query Rewriting | Fundamentally rewrites the query structure (expansion, neural translation) |
| 100–299 | Query Parameter Injection | Injects/modifies parameters (size, k) without changing query structure |
| 300–499 | Query Validation/Routing | Validates query or adjusts routing |
| 500–699 | General Request Enrichment | Adds context/metadata; no strong ordering dependency |
| 700–999 | Request Finalization | Last-chance modifications before dispatch |

#### SearchPhaseResults Processor Bands

| Band | Name | Meaning |
|---|---|---|
| 1–99 | Score Normalization | Normalizes raw shard scores to a common scale |
| 100–299 | Score Combination | Combines normalized scores from multiple sub-queries |
| 300–499 | Candidate Selection | Selects which candidates proceed to FETCH phase |
| 500–699 | Phase Result Enrichment | Attaches metadata/context for downstream response processors |
| 700–999 | Phase Result Finalization | Final adjustments before FETCH phase |

#### SearchResponse Processor Bands

| Band | Name | Meaning |
|---|---|---|
| 1–99 | Score Modification | Changes numeric score values in the response |
| 100–299 | Score-Derived Computation | Reads finalized scores to produce derived data (explanation, stats) |
| 300–499 | Result Set Shaping | Changes which hits are in the result set or their order (reranking, deduplication, truncation) |
| 500–699 | General Response Enrichment | Adds metadata/annotations; no strong ordering dependency |
| 700–899 | Content Transformation | Transforms hit content/fields (highlighting, field rewriting) |
| 900–999 | Output Formatting | Final response structure shaping |
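Because the bands exist only in documentation, a plugin could keep a small test-side helper to confirm that a declared priority still falls in the intended band. The sketch below is one hypothetical way to do that for the SearchResponse table above; `ResponseBandSketch` and `bandFor` are illustrative names and would live in the plugin's own tests, not in core.

```java
final class ResponseBandSketch {

    /** Returns the documented SearchResponse band name for a priority, or null outside 1 to 999. */
    static String bandFor(int priority) {
        if (priority < 1 || priority > 999) {
            return null; // outside every documented band
        }
        if (priority <= 99) {
            return "Score Modification";
        }
        if (priority <= 299) {
            return "Score-Derived Computation";
        }
        if (priority <= 499) {
            return "Result Set Shaping";
        }
        if (priority <= 699) {
            return "General Response Enrichment";
        }
        if (priority <= 899) {
            return "Content Transformation";
        }
        return "Output Formatting";
    }

    public static void main(String[] args) {
        System.out.println(bandFor(150));
        System.out.println(bandFor(350));
    }
}
```

A plugin test might assert that `bandFor(processor.getPriority())` equals the band the author intended, so an accidental renumbering is caught in review.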

**Scoping:** Priority uniqueness is enforced within `(processorType, executionStage)`. A SearchRequest processor with priority 150 and a SearchResponse processor with priority 150 are completely independent and never conflict.
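The scoping rule can be sketched as a uniqueness check over a composite key. This is a minimal illustration under assumptions: the `ProcessorType` enum, the `Registration` class, and `hasConflict` are hypothetical names introduced here, and the real framework groups processors differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PriorityScopeSketch {

    // Hypothetical enums standing in for the processor type and execution stage.
    enum ProcessorType { SEARCH_REQUEST, SEARCH_PHASE_RESULTS, SEARCH_RESPONSE }
    enum Stage { PRE_USER_DEFINED, POST_USER_DEFINED }

    /** One registered processor, reduced to the three values that define its scope. */
    static final class Registration {
        final ProcessorType type;
        final Stage stage;
        final int priority;

        Registration(ProcessorType type, Stage stage, int priority) {
            this.type = type;
            this.stage = stage;
            this.priority = priority;
        }
    }

    /** True when two registrations share processor type, stage, and priority. */
    static boolean hasConflict(List<Registration> registrations) {
        Set<String> seen = new HashSet<>();
        for (Registration r : registrations) {
            String key = r.type + "/" + r.stage + "/" + r.priority;
            if (!seen.add(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Registration request150 = new Registration(ProcessorType.SEARCH_REQUEST, Stage.POST_USER_DEFINED, 150);
        Registration response150 = new Registration(ProcessorType.SEARCH_RESPONSE, Stage.PRE_USER_DEFINED, 150);
        Registration clash = new Registration(ProcessorType.SEARCH_RESPONSE, Stage.PRE_USER_DEFINED, 150);
        System.out.println(hasConflict(Arrays.asList(request150, response150)));
        System.out.println(hasConflict(Arrays.asList(response150, clash)));
    }
}
```

The first call reports no conflict because the two registrations differ in processor type; the second reports a conflict because type, stage, and priority all match.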

---

### Impact on Existing Processors

All existing `SystemGeneratedProcessor` implementations must be updated to implement `getPriority()`. Proposed assignments:

| Processor | Plugin | Type | Stage | Priority | Band |
|---|---|---|---|---|---|
| `mmr_over_sample` | k-NN | SearchRequest | POST | `150` | Query Parameter Injection |
| `hybrid-explain` *(proposed)* | neural-search | SearchResponse | PRE | `150` | Score-Derived Computation |
| `mmr_rerank` | k-NN | SearchResponse | PRE | `350` | Result Set Shaping |
| `semantic-highlighter` | neural-search | SearchResponse | POST | `750` | Content Transformation |

With this ordering, the Response PRE stage executes as:
```
priority 150: hybrid-explain  → enriches explanation before any hits are removed
priority 350: mmr_rerank      → reorders and trims the result set
```

This ordering is semantically correct: explanation enrichment must run before result-set trimming so that every hit still in the response at that point, including hits that MMR reranking later removes, has its explanation data attached.

---

### Changes Required

**OpenSearch Core (`opensearch-project/OpenSearch`):**
- `SystemGeneratedProcessor.java` — add abstract `getPriority()` method
- `SystemGeneratedPipelineWithMetrics.java` — replace `ensureSingleProcessor` with `ensureNoDuplicatePriority`, add sort by priority
- Documentation — add priority band tables to the system-generated search processors page

**neural-search plugin (`opensearch-project/neural-search`):**
- `SemanticHighlightingProcessor.java` — implement `getPriority()` returning `750`
- New `SystemExplanationProcessor.java` — implement `getPriority()` returning `150`, `getExecutionStage()` returning `PRE_USER_DEFINED`

**k-NN plugin (`opensearch-project/k-NN`):**
- `mmr_rerank` processor — implement `getPriority()` returning `350`
- `mmr_over_sample` processor — implement `getPriority()` returning `150`

---

### Backward Compatibility

Since `getPriority()` is abstract, this is a **compile-time breaking change** for any existing `SystemGeneratedProcessor` implementation. All known implementations are in first-party OpenSearch plugins and will be updated in the same release. Third-party plugin authors will receive a compile error that directs them to add `getPriority()` — this is intentional, as it forces explicit ordering decisions.

The runtime behavior for a single processor per stage is unchanged: a list of one element sorted by priority is still a list of one element.

---

### Alternatives Considered

**Keep the 1-per-stage limit and use a different execution stage:** Not viable — both Response PRE and POST slots are now occupied by existing processors.

**Define `ExecutionPriority` constants in core:** Rejected — this would require plugin authors to modify core to add new bands, defeating the purpose of a plugin-extensible system.

**Use a default priority value:** Rejected — a default silently masks ordering bugs. Making `getPriority()` abstract forces explicit, documented decisions.

**Use `Integer.MAX_VALUE` as default:** Rejected for the same reason as above, and additionally because two processors both defaulting to `MAX_VALUE` would conflict at runtime rather than compile time.


### Related component

Search



[Proposal] Priority-Based Ordering for System-Generated Search Pipeline Processors #21176
