Skip to content

planner: utilize prefix of columns in order-limit clause to avoid full table scan in join queries #63280

@qw4990

Description

@qw4990

Enhancement

See the query and plan below, we have to full scan these 2 tables since columns in the order-limit clause are from different sides, then we can't push the Limit down through the Join.

create table t1 (a int, b int, c int, d int, key b(b));
create table t2 (a int, b int, c int, d int, key a(a));

explain select *  from t1 join t2 on t1.a=t2.a
order by t1.b, t2.b limit 100;
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
| id                             | estRows  | task      | access object | operator info                                |
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
| TopN_14                        | 100.00   | root      |               | test.t1.b, test.t2.b, offset:0, count:100    |
| └─HashJoin_32                  | 12487.50 | root      |               | inner join, equal:[eq(test.t1.a, test.t2.a)] |
|   ├─TableReader_43(Build)      | 9990.00  | root      |               | data:Selection_42                            |
|   │ └─Selection_42             | 9990.00  | cop[tikv] |               | not(isnull(test.t2.a))                       |
|   │   └─TableFullScan_41       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo               |
|   └─TableReader_36(Probe)      | 9990.00  | root      |               | data:Selection_35                            |
|     └─Selection_35             | 9990.00  | cop[tikv] |               | not(isnull(test.t1.a))                       |
|       └─TableFullScan_34       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo               |
+--------------------------------+----------+-----------+---------------+----------------------------------------------+

But actually, we could first use some prefix columns of order-limit clause to avoid full table scan, and then order the final results by 2 columns at the end, which is like:

-- it seems like we rewrite the query above to the below query:
explain select * from (
  select /*+ order_index(t1, b) */ 
  t1.b as t1b, t2.b as t2b from t1 join t2 on t1.a=t2.a
  order by t1.b limit 100
) tt order by tt.t1b, tt.t2b limit 100;

+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
| id                                      | estRows | task      | access object        | operator info                                                                                                   |
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
| TopN_18                                 | 100.00  | root      |                      | test.t1.b, test.t2.b, offset:0, count:100                                                                       |
| └─Limit_29                              | 100.00  | root      |                      | offset:0, count:100                                                                                             |
|   └─IndexHashJoin_51                    | 100.00  | root      |                      | inner join, inner:IndexLookUp_60, outer key:test.t1.a, inner key:test.t2.a, equal cond:eq(test.t1.a, test.t2.a) |
|     ├─Projection_56(Build)              | 80.00   | root      |                      | test.t1.a, test.t1.b                                                                                            |
|     │ └─IndexLookUp_55                  | 80.00   | root      |                      |                                                                                                                 |
|     │   ├─IndexFullScan_52(Build)       | 179.28  | cop[tikv] | table:t1, index:b(b) | keep order:true, stats:pseudo                                                                                   |
|     │   └─Selection_54(Probe)           | 80.00   | cop[tikv] |                      | not(isnull(test.t1.a))                                                                                          |
|     │     └─TableRowIDScan_53           | 179.28  | cop[tikv] | table:t1             | keep order:false, stats:pseudo                                                                                  |
|     └─IndexLookUp_60(Probe)             | 100.00  | root      |                      |                                                                                                                 |
|       ├─Selection_59(Build)             | 100.00  | cop[tikv] |                      | not(isnull(test.t2.a))                                                                                          |
|       │ └─IndexRangeScan_57             | 100.10  | cop[tikv] | table:t2, index:a(a) | range: decided by [eq(test.t2.a, test.t1.a)], keep order:false, stats:pseudo                                    |
|       └─TableRowIDScan_58(Probe)        | 100.00  | cop[tikv] | table:t2             | keep order:false, stats:pseudo                                                                                  |
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+

One thing we have to be careful with is that we need to read all rows in t1 that have the same value on column t1.b with the 100th row.
In the below example, we have to read 1~102 rows, since row-101 and row-102 have the same t1.b value with row-100.
So the limit 100 in the sub-query is not exactly accurate.

row_id    t1.b
1         1
2         1
3         2
...
99        13
100       13
101       13
102       13
103       14

Metadata

Metadata

Assignees

Labels

affects-8.5This bug affects the 8.5.x(LTS) versions.plan-rewritereport/customerCustomers have encountered this bug.sig/plannerSIG: Plannertype/enhancementThe issue or PR belongs to an enhancement.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions