PPL Engine V3 (for 3.0.0)

1. Motivations

Previously, we developed SQL engine V2 to support both SQL and PPL queries. However, as the complexity of supported SQL and PPL increased, the engine's limitations became increasingly apparent. Two major issues emerged:

Insufficient support for complex SQL/PPL queries: The development cycle for new commands such as Join and Subquery was lengthy, and it was difficult to achieve high robustness.
Lack of advanced query plan optimization: The V2 engine only supports a few pushdown optimization for certain operators and lacks of mature optimization rules and cost-based optimizer like those found in traditional databases. Query performance and scalability are core to design of PPL, enabling it to efficiently handle high-performance queries and scale to support large datasets and complex queries.

Why Apache Calcite?

Introducing Apache Calcite brings serval significant advantages:

Enhanced query plan optimization capabilities: Calcite's optimizer can effectively optimize execution plans for both complex SQL and PPL queries.
Simplified development of new commands and functions: Expanding PPL commands is one of the key targets to enhancing the PPL language. Calcite helps streamline the development cycle for new commands and functions.
Decoupled execution layer: Calcite can be used for both query optimization and execution, or solely for query optimization while delegating execution to other backends such as DataFusion or Velox."

Find more details in V3 Architecture.

2. What's New

In the initial release of the V3 engine (3.0.0), the main new features focus on enhancing the PPL language while maintaining maximum compatibility with V2 behavior.

Join Command
Lookup Command
Subquery Command

3.What are Changed

3.1 Breaking Changes

Because of implementation changed internally, following behaviors are changed from 3.0.0. (Behaviors in V3 is correct)

Item	V2	V3
Return type of `timestampdiff`	timestamp	int
Return type of `regexp`	int	boolean
Return type of `count`,`dc`,`distinct_count`	int	bigint
Return type of `ceiling`,`floor`,`sign`	int	same type with input
like(firstname, 'Ambe_') on value "Amber JOHnny"	true	false
like(firstname, 'Ambe*') on value "Amber JOHnny"	true	false
cast(firstname as boolean)	false	null
Sum multiple `null` values when pushdown enabled	0	null

3.2 Fallback Mechanism

As v3 engine is experimental in 3.0.0, not all PPL commands could work under this new engine. Those unsupported queries will be forwarded to V2 engine by fallback mechanism. To avoid impact on your side, normally you won't see any difference in a query response. If you want to check if and why your query falls back to be handled by V2 engine, please check OpenSearch log for "Fallback to V2 query engine since ...".

3.3 Limitations

For the following commands or functions, we add some defensive restrictions to ensure security.

New Restrictions

EVAL won't allow to use Metadata Fields of OpenSearch as the fields
RENAME won't allow renaming to a Metadata Fields of OpenSearch
as won't allow to use Metadata Fields of OpenSearch as the alias name

For the following functionalities in V3 engine, the query will be forwarded to the V2 query engine and thus you cannot use new features in 2. What's New.

Unsupported functionalities

All SQL queries
dedup with consecutive=true
Search relevant commands
- AD
- ML
- Kmeans
Commands with fetch_size parameter
Search relevant functions
- match
- match_phrase
- match_bool_prefix
- match_phrase_prefix
- simple_query_string
- query_string
- multi_match
Existed limitations of V2

4.How it's Implemented

If you're interested in the new query engine, please find more details in V3 Architecture.

5. What's Next

The following items are on our roadmap with high priority:

Resolve the V3 limitation.
Advancing pushdown optimization and benchmarking
Unified the PPL syntax between PPL-on-OpenSearch and PPL-on-Spark
Support more DSL aggregation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PPL Engine V3 (for 3.0.0)

1. Motivations

Why Apache Calcite?

2. What's New

3.What are Changed

3.1 Breaking Changes

3.2 Fallback Mechanism

3.3 Limitations

New Restrictions

Unsupported functionalities

4.How it's Implemented

5. What's Next

FilesExpand file tree

intro-v3-engine.md

Latest commit

History

intro-v3-engine.md

File metadata and controls

PPL Engine V3 (for 3.0.0)

1. Motivations

Why Apache Calcite?

2. What's New

3.What are Changed

3.1 Breaking Changes

3.2 Fallback Mechanism

3.3 Limitations

New Restrictions

Unsupported functionalities

4.How it's Implemented

5. What's Next