Skip to content

fold column equal conditions when simplifying outer join #7679

@eurekaka

Description

@eurekaka

Feature Request

Is your feature request related to a problem? Please describe:

Currently, for the following 4 kinds of query, outer join is generated, but they can be converted to inner join semantically.

create table t1(id primary key, a int, b int);
create table t2(id primary key, a int, b int);

mysql> explain select * from t1 left join t2 on t1.a=t2.a where t1.b = t2.b and t1.b = 2;
+---------------------------+----------+------+-------------------------------------------------------------------------+
| id                        | count    | task | operator info                                                           |
+---------------------------+----------+------+-------------------------------------------------------------------------+
| Selection_7               | 10.00    | root | eq(test.t1.b, test.t2.b)                                                |
| └─HashLeftJoin_8          | 12.50    | root | left outer join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
|   ├─TableReader_11        | 10.00    | root | data:Selection_10                                                       |
|   │ └─Selection_10        | 10.00    | cop  | eq(test.t1.b, 2)                                                        |
|   │   └─TableScan_9       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo             |
|   └─TableReader_13        | 10000.00 | root | data:TableScan_12                                                       |
|     └─TableScan_12        | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo             |
+---------------------------+----------+------+-------------------------------------------------------------------------+
7 rows in set (0.01 sec)

mysql> explain select * from t1 left join t2 on t1.a = 1 where t1.a = t2.a;
+-------------------------+--------------+------+---------------------------------------------------------------------+
| id                      | count        | task | operator info                                                       |
+-------------------------+--------------+------+---------------------------------------------------------------------+
| Selection_7             | 80000000.00  | root | eq(test.t1.a, test.t2.a)                                            |
| └─HashLeftJoin_8        | 100000000.00 | root | left outer join, inner:TableReader_12, left cond:[eq(test.t1.a, 1)] |
|   ├─TableReader_10      | 10000.00     | root | data:TableScan_9                                                    |
|   │ └─TableScan_9       | 10000.00     | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo         |
|   └─TableReader_12      | 10000.00     | root | data:TableScan_11                                                   |
|     └─TableScan_11      | 10000.00     | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo         |
+-------------------------+--------------+------+---------------------------------------------------------------------+
6 rows in set (0.00 sec)

mysql> explain select * from t1 left join t2 on t1.a = t2.a where t1.b = t2.b and t1.b > 1;
+---------------------------+----------+------+-------------------------------------------------------------------------+
| id                        | count    | task | operator info                                                           |
+---------------------------+----------+------+-------------------------------------------------------------------------+
| Selection_7               | 3333.33  | root | eq(test.t1.b, test.t2.b)                                                |
| └─HashLeftJoin_8          | 4166.67  | root | left outer join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
|   ├─TableReader_11        | 3333.33  | root | data:Selection_10                                                       |
|   │ └─Selection_10        | 3333.33  | cop  | gt(test.t1.b, 1)                                                        |
|   │   └─TableScan_9       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo             |
|   └─TableReader_13        | 10000.00 | root | data:TableScan_12                                                       |
|     └─TableScan_12        | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo             |
+---------------------------+----------+------+-------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> explain select * from t1 left join t2 on t1.a > 1 where t1.a = t2.a;
+-------------------------+--------------+------+---------------------------------------------------------------------+
| id                      | count        | task | operator info                                                       |
+-------------------------+--------------+------+---------------------------------------------------------------------+
| Selection_7             | 80000000.00  | root | eq(test.t1.a, test.t2.a)                                            |
| └─HashLeftJoin_8        | 100000000.00 | root | left outer join, inner:TableReader_12, left cond:[gt(test.t1.a, 1)] |
|   ├─TableReader_10      | 10000.00     | root | data:TableScan_9                                                    |
|   │ └─TableScan_9       | 10000.00     | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo         |
|   └─TableReader_12      | 10000.00     | root | data:TableScan_11                                                   |
|     └─TableScan_11      | 10000.00     | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo         |
+-------------------------+--------------+------+---------------------------------------------------------------------+
6 rows in set (0.00 sec)

If we can convert them to inner join, we can apply more predicate pushdown optimizations, and we have more choices when exploring physical join operators.

Describe the feature you'd like:

rewrite the query to a semantically equivalent query by constant propagation, then this new query is able to be converted to inner join in function simplifyOuterJoin. For the above 4 kinds of query, the query after rewrite is:

select * from t1 left join t2 on t1.a=t2.a where t1.b = t2.b and t1.b = 2;
=> select * from t1 left join t2 on t1.a = t2.a where t2.b = 2 and t1.b = 2;

select * from t1 left join t2 on t1.a = 1 where t1.a = t2.a;
=> select * from t1 left join t2 on t1.a = 1 where 1 = t2.a;

select * from t1 left join t2 on t1.a = t2.a where t1.b = t2.b and t1.b > 1;
=> select * from t1 left join t2 on t1.a = t2.a where t1.b = t2.b and t1.b > 1 and t2.b > 1;

select * from t1 left join t2 on t1.a > 1 where t1.a = t2.a;
=> select * from t1 left join t2 on t1.a > 1 where t1.a = t2.a and t2.a > 1;

This enhancement should be a small step forward to tackle #7559

Metadata

Metadata

Assignees

Labels

sig/plannerSIG: Plannertype/enhancementThe issue or PR belongs to an enhancement.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions