Optimizer: Support Partial Order TopN Optimization with Prefix Index
Background
Current Limitation
Currently, TiDB's optimizer cannot effectively utilize prefix indexes for ORDER BY ... LIMIT queries. When a query uses ORDER BY on a column with a prefix index, the optimizer either:
- Performs a full table scan - Reading all rows, sorting them, and then applying the limit
- Ignores the prefix index entirely - Missing the opportunity to leverage the index's partial ordering property
This leads to significant performance issues, especially for large tables with high-cardinality columns like email addresses, URLs, or text descriptions.
Concrete Example
Table Schema:
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255),
name VARCHAR(100),
created_at TIMESTAMP,
INDEX idx_email_prefix (email(10)) -- Prefix index on first 10 bytes
);
-- Sample data: 10 million rows
INSERT INTO users VALUES
(1, 'alice@example.com', 'Alice', '2024-01-01'),
(2, 'alice@company.com', 'Alice Smith', '2024-01-02'),
(3, 'bob@example.com', 'Bob', '2024-01-03'),
(4, 'bob@another.com', 'Bob Jones', '2024-01-04'),
...
(10000000, 'zoe@test.com', 'Zoe', '2024-12-31');
Problematic Query:
SELECT * FROM users ORDER BY email LIMIT 10;
Current Execution Plan (Inefficient):
TopN_7 | 10.00 | root | offset:0, count:10
└─TableReader_15 | 10.00 | root | data:TopN_14
└─TopN_14 | 10.00 | cop[tikv] | offset:0, count:10
└─TableFullScan_13 | 10000000.00 | cop[tikv] | table:users | keep order:false
Performance Issues:
- Scans all 10 million rows from storage
- Sorts all rows in memory (CPU and memory intensive)
- Network transfer overhead for large dataset
- Query time: 10+ seconds (depending on data volume)
Why Can't Current Optimizer Use the Prefix Index?
The prefix index idx_email_prefix (email(10)) provides partial ordering:
- Rows with the same 10-byte prefix (e.g.,
alice@exam, alice@comp) are grouped together
- Within each prefix group, rows are unordered
- The optimizer doesn't recognize this partial ordering property
Real-World Impact
This limitation affects common use cases:
-
Pagination queries:
SELECT * FROM users ORDER BY email LIMIT 100 OFFSET 0;
SELECT * FROM users ORDER BY email LIMIT 100 OFFSET 100;
-
API responses with sorting:
SELECT id, email, name FROM products ORDER BY description LIMIT 20;
-
Admin dashboards showing sorted lists:
SELECT * FROM logs ORDER BY url LIMIT 50;
All these queries suffer from full table scans when prefix indexes are used.
Proposed Solution: Partial Order TopN Optimization
Key Insight
When a prefix index exists on the ORDER BY column, we can leverage its partial ordering property:
- The prefix index provides global ordering by prefix
- For TopN queries, we only need to read N + X rows, where:
N = desired result count
X = extra rows with the same prefix as the Nth row
This avoids reading the entire table while still guaranteeing correct results.
Example of Partial Ordering
With idx_email_prefix (email(10)):
Prefix (10 bytes) | Full Email | Logical Group
------------------|-----------------------|---------------
alice@exam | alice@example.com | Group A (2 rows)
alice@exam | alice@example.org |
alice@comp | alice@company.com | Group B (1 row)
bob@exampl | bob@example.com | Group C (3 rows)
bob@exampl | bob@example.org |
bob@exampl | bob@example.net |
carol@test | carol@test.com | Group D (1 row)
...
For LIMIT 3:
- Without optimization: Read all rows (millions)
- With optimization: Read Group A + Group B ≈ 3 + X rows (X = group size variance)
Optimized Execution Plan
Query:
SELECT * FROM users ORDER BY email LIMIT 10;
New Execution Plan (Two-Layer Mode):
TopN_7 | 10.00 | root | order by email, offset:0, count:10, partial_order_limit:10, prefix_col_id:2, prefix_len:10
└─IndexLookUp_14 | 10.00 | root |
├─Limit_13(Build) | 10.00 | cop[tikv] | count:10, partial_order_limit:10, prefix_col_id:2, prefix_len:10
│ └─IndexRangeScan_11 | 10.00 | cop[tikv] | table:users, index:idx_email_prefix(email), range:[NULL,+inf], keep order:true
└─TableRowIDScan_12(Probe) | 10.00 | cop[tikv] | table:users, keep order:false
Key Improvements:
- IndexRangeScan with KeepOrder - Reads index in sorted order
- Partial Ordered Limit on TiKV - Stops reading after N+X rows based on prefix matching
- Partial Ordered TopN on TiDB - Final sorting and short-circuiting based on prefix
Performance Gains:
- Scans ~10-20 rows instead of 10 million rows
- Query time: < 10ms (1000x improvement)
- Reduced network transfer
- Reduced memory usage
Algorithm Details
Short-Circuit Logic:
Read rows in index order:
For each row R:
If result set size < N:
Add R to result set
Else if R.prefix == result[N-1].prefix:
Add R to result set (extra row X)
Else:
STOP reading (short-circuit)
Sort result set by full column value
Return top N rows
Example Execution:
SELECT * FROM users ORDER BY email LIMIT 3;
-
Read from idx_email_prefix in order:
Row 1: alice@example.com (prefix: alice@exam) → Add to result [1]
Row 2: alice@example.org (prefix: alice@exam) → Add to result [2]
Row 3: alice@company.com (prefix: alice@comp) → Add to result [3]
Row 4: bob@example.com (prefix: bob@exampl) → STOP (different prefix, N=3 reached)
-
Sort 3 rows by full email:
alice@company.com
alice@example.com
alice@example.org
-
Return top 3 results
Statistics-Based Estimation:
The optimizer estimates X using:
NDV (number of distinct values) in index statistics
TopN statistics (most frequent prefixes)
- Safety factor: cap at 10% of total rows (configurable)
Two-Layer vs One-Layer Plans
Two-Layer Mode (when pushdown is possible):
- TiKV Layer: Partial ordered Limit (with prefix-based short-circuit)
- TiDB Layer: Partial ordered TopN (final sorting + short-circuit)
- Condition: No Selection filters on table scan
One-Layer Mode (when pushdown is not possible):
- TiDB Layer: Only partial ordered TopN
- Condition: Selection filters exist on table scan (may reorder rows)
Benefits
1. Performance Improvement
For queries with prefix indexes on ORDER BY columns:
- 100-1000x faster for large tables (millions of rows)
- Reduced I/O from O(N) table scan to O(LIMIT + X)
- Lower memory usage (sorting 10s of rows instead of millions)
2. Better Resource Utilization
- Network bandwidth: Transfer ~10 rows instead of millions
- CPU: Sort ~10 rows instead of millions
- Memory: Buffer ~10 rows instead of millions
3. Backward Compatible
- Controlled by system variable
tidb_opt_partial_ordered_index_for_topn (default OFF)
- No impact on existing queries when disabled
- Gradually enable for specific workloads
4. Cost-Based Decision
- Optimizer estimates
X using statistics
- Falls back to normal plan if estimated cost is higher
- Configurable via
tidb_opt_partial_ordered_max_limit_percent
Use Cases
1. Pagination APIs
-- First page
SELECT * FROM products ORDER BY name LIMIT 20 OFFSET 0;
-- Second page
SELECT * FROM products ORDER BY name LIMIT 20 OFFSET 20;
2. Admin Dashboards
-- Show recent users by email
SELECT id, email, created_at FROM users ORDER BY email LIMIT 100;
-- Show logs sorted by URL
SELECT * FROM access_logs ORDER BY url LIMIT 50;
3. Prefix Index Use Cases
Prefix indexes are commonly used for:
- Long VARCHAR columns (email, URL, description)
- Reducing index size while maintaining sort capability
- Balancing storage vs query performance
This optimization makes prefix indexes much more practical for ORDER BY queries.
Optimizer: Support Partial Order TopN Optimization with Prefix Index
Background
Current Limitation
Currently, TiDB's optimizer cannot effectively utilize prefix indexes for
ORDER BY ... LIMITqueries. When a query usesORDER BYon a column with a prefix index, the optimizer either:This leads to significant performance issues, especially for large tables with high-cardinality columns like email addresses, URLs, or text descriptions.
Concrete Example
Table Schema:
Problematic Query:
Current Execution Plan (Inefficient):
Performance Issues:
Why Can't Current Optimizer Use the Prefix Index?
The prefix index
idx_email_prefix (email(10))provides partial ordering:alice@exam,alice@comp) are grouped togetherReal-World Impact
This limitation affects common use cases:
Pagination queries:
API responses with sorting:
Admin dashboards showing sorted lists:
All these queries suffer from full table scans when prefix indexes are used.
Proposed Solution: Partial Order TopN Optimization
Key Insight
When a prefix index exists on the
ORDER BYcolumn, we can leverage its partial ordering property:N= desired result countX= extra rows with the same prefix as the Nth rowThis avoids reading the entire table while still guaranteeing correct results.
Example of Partial Ordering
With
idx_email_prefix (email(10)):For
LIMIT 3:Optimized Execution Plan
Query:
New Execution Plan (Two-Layer Mode):
Key Improvements:
Performance Gains:
Algorithm Details
Short-Circuit Logic:
Example Execution:
Read from
idx_email_prefixin order:Sort 3 rows by full email:
Return top 3 results
Statistics-Based Estimation:
The optimizer estimates
Xusing:NDV(number of distinct values) in index statisticsTopNstatistics (most frequent prefixes)Two-Layer vs One-Layer Plans
Two-Layer Mode (when pushdown is possible):
One-Layer Mode (when pushdown is not possible):
Benefits
1. Performance Improvement
For queries with prefix indexes on ORDER BY columns:
2. Better Resource Utilization
3. Backward Compatible
tidb_opt_partial_ordered_index_for_topn(default OFF)4. Cost-Based Decision
Xusing statisticstidb_opt_partial_ordered_max_limit_percentUse Cases
1. Pagination APIs
2. Admin Dashboards
3. Prefix Index Use Cases
Prefix indexes are commonly used for:
This optimization makes prefix indexes much more practical for ORDER BY queries.