planner: utilize prefix of columns in order-limit clause to avoid full table scan in join queries

## Enhancement
See the query and plan below, we have to full scan these 2 tables since columns in the order-limit clause are from different sides, then we can't push the Limit down through the Join.
```
create table t1 (a int, b int, c int, d int, key b(b));
create table t2 (a int, b int, c int, d int, key a(a));

explain select *  from t1 join t2 on t1.a=t2.a
order by t1.b, t2.b limit 100;
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
| id                             | estRows  | task      | access object | operator info                                |
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
| TopN_14                        | 100.00   | root      |               | test.t1.b, test.t2.b, offset:0, count:100    |
| └─HashJoin_32                  | 12487.50 | root      |               | inner join, equal:[eq(test.t1.a, test.t2.a)] |
|   ├─TableReader_43(Build)      | 9990.00  | root      |               | data:Selection_42                            |
|   │ └─Selection_42             | 9990.00  | cop[tikv] |               | not(isnull(test.t2.a))                       |
|   │   └─TableFullScan_41       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo               |
|   └─TableReader_36(Probe)      | 9990.00  | root      |               | data:Selection_35                            |
|     └─Selection_35             | 9990.00  | cop[tikv] |               | not(isnull(test.t1.a))                       |
|       └─TableFullScan_34       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo               |
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
```

But actually, we could first use some prefix columns of order-limit clause to avoid full table scan, and then order the final results by 2 columns at the end, which is like:
```
-- it seems like we rewrite the query above to the below query:
explain select * from (
  select /*+ order_index(t1, b) */ 
  t1.b as t1b, t2.b as t2b from t1 join t2 on t1.a=t2.a
  order by t1.b limit 100
) tt order by tt.t1b, tt.t2b limit 100;

+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
| id                                      | estRows | task      | access object        | operator info                                                                                                   |
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
| TopN_18                                 | 100.00  | root      |                      | test.t1.b, test.t2.b, offset:0, count:100                                                                       |
| └─Limit_29                              | 100.00  | root      |                      | offset:0, count:100                                                                                             |
|   └─IndexHashJoin_51                    | 100.00  | root      |                      | inner join, inner:IndexLookUp_60, outer key:test.t1.a, inner key:test.t2.a, equal cond:eq(test.t1.a, test.t2.a) |
|     ├─Projection_56(Build)              | 80.00   | root      |                      | test.t1.a, test.t1.b                                                                                            |
|     │ └─IndexLookUp_55                  | 80.00   | root      |                      |                                                                                                                 |
|     │   ├─IndexFullScan_52(Build)       | 179.28  | cop[tikv] | table:t1, index:b(b) | keep order:true, stats:pseudo                                                                                   |
|     │   └─Selection_54(Probe)           | 80.00   | cop[tikv] |                      | not(isnull(test.t1.a))                                                                                          |
|     │     └─TableRowIDScan_53           | 179.28  | cop[tikv] | table:t1             | keep order:false, stats:pseudo                                                                                  |
|     └─IndexLookUp_60(Probe)             | 100.00  | root      |                      |                                                                                                                 |
|       ├─Selection_59(Build)             | 100.00  | cop[tikv] |                      | not(isnull(test.t2.a))                                                                                          |
|       │ └─IndexRangeScan_57             | 100.10  | cop[tikv] | table:t2, index:a(a) | range: decided by [eq(test.t2.a, test.t1.a)], keep order:false, stats:pseudo                                    |
|       └─TableRowIDScan_58(Probe)        | 100.00  | cop[tikv] | table:t2             | keep order:false, stats:pseudo                                                                                  |
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
```

One thing we have to be careful with is that we need to read all rows in t1 that have the same value on column t1.b with the 100th row.
In the below example, we have to read 1~102 rows, since row-101 and row-102 have the same t1.b value with row-100.
So the `limit 100` in the sub-query is not exactly accurate.
```
row_id    t1.b
1         1
2         1
3         2
...
99        13
100       13
101       13
102       13
103       14
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

planner: utilize prefix of columns in order-limit clause to avoid full table scan in join queries #63280

Enhancement

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

planner: utilize prefix of columns in order-limit clause to avoid full table scan in join queries #63280

Description

Enhancement

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions