Currently, for the following 4 kinds of query, outer join is generated, but they can be converted to inner join semantically.
create table t1(id primary key, a int, b int);
create table t2(id primary key, a int, b int);
mysql> explain select * from t1 left join t2 on t1.a=t2.a where t1.b = t2.b and t1.b = 2;
+---------------------------+----------+------+-------------------------------------------------------------------------+
| id | count | task | operator info |
+---------------------------+----------+------+-------------------------------------------------------------------------+
| Selection_7 | 10.00 | root | eq(test.t1.b, test.t2.b) |
| └─HashLeftJoin_8 | 12.50 | root | left outer join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_11 | 10.00 | root | data:Selection_10 |
| │ └─Selection_10 | 10.00 | cop | eq(test.t1.b, 2) |
| │ └─TableScan_9 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_13 | 10000.00 | root | data:TableScan_12 |
| └─TableScan_12 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------------+----------+------+-------------------------------------------------------------------------+
7 rows in set (0.01 sec)
mysql> explain select * from t1 left join t2 on t1.a = 1 where t1.a = t2.a;
+-------------------------+--------------+------+---------------------------------------------------------------------+
| id | count | task | operator info |
+-------------------------+--------------+------+---------------------------------------------------------------------+
| Selection_7 | 80000000.00 | root | eq(test.t1.a, test.t2.a) |
| └─HashLeftJoin_8 | 100000000.00 | root | left outer join, inner:TableReader_12, left cond:[eq(test.t1.a, 1)] |
| ├─TableReader_10 | 10000.00 | root | data:TableScan_9 |
| │ └─TableScan_9 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_12 | 10000.00 | root | data:TableScan_11 |
| └─TableScan_11 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+-------------------------+--------------+------+---------------------------------------------------------------------+
6 rows in set (0.00 sec)
mysql> explain select * from t1 left join t2 on t1.a = t2.a where t1.b = t2.b and t1.b > 1;
+---------------------------+----------+------+-------------------------------------------------------------------------+
| id | count | task | operator info |
+---------------------------+----------+------+-------------------------------------------------------------------------+
| Selection_7 | 3333.33 | root | eq(test.t1.b, test.t2.b) |
| └─HashLeftJoin_8 | 4166.67 | root | left outer join, inner:TableReader_13, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_11 | 3333.33 | root | data:Selection_10 |
| │ └─Selection_10 | 3333.33 | cop | gt(test.t1.b, 1) |
| │ └─TableScan_9 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_13 | 10000.00 | root | data:TableScan_12 |
| └─TableScan_12 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------------+----------+------+-------------------------------------------------------------------------+
7 rows in set (0.00 sec)
mysql> explain select * from t1 left join t2 on t1.a > 1 where t1.a = t2.a;
+-------------------------+--------------+------+---------------------------------------------------------------------+
| id | count | task | operator info |
+-------------------------+--------------+------+---------------------------------------------------------------------+
| Selection_7 | 80000000.00 | root | eq(test.t1.a, test.t2.a) |
| └─HashLeftJoin_8 | 100000000.00 | root | left outer join, inner:TableReader_12, left cond:[gt(test.t1.a, 1)] |
| ├─TableReader_10 | 10000.00 | root | data:TableScan_9 |
| │ └─TableScan_9 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_12 | 10000.00 | root | data:TableScan_11 |
| └─TableScan_11 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+-------------------------+--------------+------+---------------------------------------------------------------------+
6 rows in set (0.00 sec)
If we can convert them to inner join, we can apply more predicate pushdown optimizations, and we have more choices when exploring physical join operators.
rewrite the query to a semantically equivalent query by constant propagation, then this new query is able to be converted to inner join in function simplifyOuterJoin. For the above 4 kinds of query, the query after rewrite is:
select * from t1 left join t2 on t1.a=t2.a where t1.b = t2.b and t1.b = 2;
=> select * from t1 left join t2 on t1.a = t2.a where t2.b = 2 and t1.b = 2;
select * from t1 left join t2 on t1.a = 1 where t1.a = t2.a;
=> select * from t1 left join t2 on t1.a = 1 where 1 = t2.a;
select * from t1 left join t2 on t1.a = t2.a where t1.b = t2.b and t1.b > 1;
=> select * from t1 left join t2 on t1.a = t2.a where t1.b = t2.b and t1.b > 1 and t2.b > 1;
select * from t1 left join t2 on t1.a > 1 where t1.a = t2.a;
=> select * from t1 left join t2 on t1.a > 1 where t1.a = t2.a and t2.a > 1;
Feature Request
Is your feature request related to a problem? Please describe:
Currently, for the following 4 kinds of query, outer join is generated, but they can be converted to inner join semantically.
If we can convert them to inner join, we can apply more predicate pushdown optimizations, and we have more choices when exploring physical join operators.
Describe the feature you'd like:
rewrite the query to a semantically equivalent query by constant propagation, then this new query is able to be converted to inner join in function
simplifyOuterJoin. For the above 4 kinds of query, the query after rewrite is:This enhancement should be a small step forward to tackle #7559