Previously, we developed SQL engine V2 to support both SQL and PPL queries. However, as the complexity of supported SQL and PPL increased, the engine's limitations became increasingly apparent. Two major issues emerged:
-
Insufficient support for complex SQL/PPL queries: The development cycle for new commands such as
JoinandSubquerywas lengthy, and it was difficult to achieve high robustness. -
Lack of advanced query plan optimization: The V2 engine only supports a few pushdown optimization for certain operators and lacks of mature optimization rules and cost-based optimizer like those found in traditional databases. Query performance and scalability are core to design of PPL, enabling it to efficiently handle high-performance queries and scale to support large datasets and complex queries.
Introducing Apache Calcite brings serval significant advantages:
-
Enhanced query plan optimization capabilities: Calcite's optimizer can effectively optimize execution plans for both complex SQL and PPL queries.
-
Simplified development of new commands and functions: Expanding PPL commands is one of the key targets to enhancing the PPL language. Calcite helps streamline the development cycle for new commands and functions.
-
Decoupled execution layer: Calcite can be used for both query optimization and execution, or solely for query optimization while delegating execution to other backends such as DataFusion or Velox."
Find more details in V3 Architecture.
In the initial release of the V3 engine (3.0.0), the main new features focus on enhancing the PPL language while maintaining maximum compatibility with V2 behavior.
Because of implementation changed internally, following behaviors are changed from 3.0.0. (Behaviors in V3 is correct)
| Item | V2 | V3 |
|---|---|---|
Return type of timestampdiff |
timestamp | int |
Return type of regexp |
int | boolean |
Return type of count,dc,distinct_count |
int | bigint |
Return type of ceiling,floor,sign |
int | same type with input |
| like(firstname, 'Ambe_') on value "Amber JOHnny" | true | false |
| like(firstname, 'Ambe*') on value "Amber JOHnny" | true | false |
| cast(firstname as boolean) | false | null |
Sum multiple null values when pushdown enabled |
0 | null |
As v3 engine is experimental in 3.0.0, not all PPL commands could work under this new engine. Those unsupported queries will be forwarded to V2 engine by fallback mechanism. To avoid impact on your side, normally you won't see any difference in a query response. If you want to check if and why your query falls back to be handled by V2 engine, please check OpenSearch log for "Fallback to V2 query engine since ...".
For the following commands or functions, we add some defensive restrictions to ensure security.
EVALwon't allow to use Metadata Fields of OpenSearch as the fieldsRENAMEwon't allow renaming to a Metadata Fields of OpenSearchaswon't allow to use Metadata Fields of OpenSearch as the alias name
For the following functionalities in V3 engine, the query will be forwarded to the V2 query engine and thus you cannot use new features in 2. What's New.
- All SQL queries
dedupwithconsecutive=true- Search relevant commands
- AD
- ML
- Kmeans
- Commands with
fetch_sizeparameter - Search relevant functions
- match
- match_phrase
- match_bool_prefix
- match_phrase_prefix
- simple_query_string
- query_string
- multi_match
- Existed limitations of V2
If you're interested in the new query engine, please find more details in V3 Architecture.
The following items are on our roadmap with high priority:
- Resolve the V3 limitation.
- Advancing pushdown optimization and benchmarking
- Unified the PPL syntax between PPL-on-OpenSearch and PPL-on-Spark
- Support more DSL aggregation