Enhancement
See the query and plan below, we have to full scan these 2 tables since columns in the order-limit clause are from different sides, then we can't push the Limit down through the Join.
create table t1 (a int, b int, c int, d int, key b(b));
create table t2 (a int, b int, c int, d int, key a(a));
explain select * from t1 join t2 on t1.a=t2.a
order by t1.b, t2.b limit 100;
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
| TopN_14 | 100.00 | root | | test.t1.b, test.t2.b, offset:0, count:100 |
| └─HashJoin_32 | 12487.50 | root | | inner join, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_43(Build) | 9990.00 | root | | data:Selection_42 |
| │ └─Selection_42 | 9990.00 | cop[tikv] | | not(isnull(test.t2.a)) |
| │ └─TableFullScan_41 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─TableReader_36(Probe) | 9990.00 | root | | data:Selection_35 |
| └─Selection_35 | 9990.00 | cop[tikv] | | not(isnull(test.t1.a)) |
| └─TableFullScan_34 | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+--------------------------------+----------+-----------+---------------+----------------------------------------------+
But actually, we could first use some prefix columns of order-limit clause to avoid full table scan, and then order the final results by 2 columns at the end, which is like:
-- it seems like we rewrite the query above to the below query:
explain select * from (
select /*+ order_index(t1, b) */
t1.b as t1b, t2.b as t2b from t1 join t2 on t1.a=t2.a
order by t1.b limit 100
) tt order by tt.t1b, tt.t2b limit 100;
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
| TopN_18 | 100.00 | root | | test.t1.b, test.t2.b, offset:0, count:100 |
| └─Limit_29 | 100.00 | root | | offset:0, count:100 |
| └─IndexHashJoin_51 | 100.00 | root | | inner join, inner:IndexLookUp_60, outer key:test.t1.a, inner key:test.t2.a, equal cond:eq(test.t1.a, test.t2.a) |
| ├─Projection_56(Build) | 80.00 | root | | test.t1.a, test.t1.b |
| │ └─IndexLookUp_55 | 80.00 | root | | |
| │ ├─IndexFullScan_52(Build) | 179.28 | cop[tikv] | table:t1, index:b(b) | keep order:true, stats:pseudo |
| │ └─Selection_54(Probe) | 80.00 | cop[tikv] | | not(isnull(test.t1.a)) |
| │ └─TableRowIDScan_53 | 179.28 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
| └─IndexLookUp_60(Probe) | 100.00 | root | | |
| ├─Selection_59(Build) | 100.00 | cop[tikv] | | not(isnull(test.t2.a)) |
| │ └─IndexRangeScan_57 | 100.10 | cop[tikv] | table:t2, index:a(a) | range: decided by [eq(test.t2.a, test.t1.a)], keep order:false, stats:pseudo |
| └─TableRowIDScan_58(Probe) | 100.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
+-----------------------------------------+---------+-----------+----------------------+-----------------------------------------------------------------------------------------------------------------+
One thing we have to be careful with is that we need to read all rows in t1 that have the same value on column t1.b with the 100th row.
In the below example, we have to read 1~102 rows, since row-101 and row-102 have the same t1.b value with row-100.
So the limit 100 in the sub-query is not exactly accurate.
row_id t1.b
1 1
2 1
3 2
...
99 13
100 13
101 13
102 13
103 14
Enhancement
See the query and plan below, we have to full scan these 2 tables since columns in the order-limit clause are from different sides, then we can't push the Limit down through the Join.
But actually, we could first use some prefix columns of order-limit clause to avoid full table scan, and then order the final results by 2 columns at the end, which is like:
One thing we have to be careful with is that we need to read all rows in t1 that have the same value on column t1.b with the 100th row.
In the below example, we have to read 1~102 rows, since row-101 and row-102 have the same t1.b value with row-100.
So the
limit 100in the sub-query is not exactly accurate.