planner: Allow Optimizer to Use Prefix Index for ORDER BY / LIMIT to Avoid Full Scan

# Optimizer: Support Partial Order TopN Optimization with Prefix Index

## Background

### Current Limitation

Currently, TiDB's optimizer cannot effectively utilize prefix indexes for `ORDER BY ... LIMIT` queries. When a query uses `ORDER BY` on a column with a prefix index, the optimizer either:

1. **Performs a full table scan** - Reading all rows, sorting them, and then applying the limit
2. **Ignores the prefix index entirely** - Missing the opportunity to leverage the index's partial ordering property

This leads to significant performance issues, especially for large tables with high-cardinality columns like email addresses, URLs, or text descriptions.

### Concrete Example

**Table Schema:**
```sql
CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(255),
    name VARCHAR(100),
    created_at TIMESTAMP,
    INDEX idx_email_prefix (email(10))  -- Prefix index on first 10 bytes
);

-- Sample data: 10 million rows
INSERT INTO users VALUES
    (1, 'alice@example.com', 'Alice', '2024-01-01'),
    (2, 'alice@company.com', 'Alice Smith', '2024-01-02'),
    (3, 'bob@example.com', 'Bob', '2024-01-03'),
    (4, 'bob@another.com', 'Bob Jones', '2024-01-04'),
    ...
    (10000000, 'zoe@test.com', 'Zoe', '2024-12-31');
```

**Problematic Query:**
```sql
SELECT * FROM users ORDER BY email LIMIT 10;
```

**Current Execution Plan (Inefficient):**
```
TopN_7                    | 10.00  | root      | offset:0, count:10
└─TableReader_15          | 10.00  | root      | data:TopN_14
  └─TopN_14               | 10.00  | cop[tikv] | offset:0, count:10
    └─TableFullScan_13    | 10000000.00 | cop[tikv] | table:users | keep order:false
```

**Performance Issues:**
- Scans **all 10 million rows** from storage
- Sorts all rows in memory (CPU and memory intensive)
- Network transfer overhead for large dataset
- Query time: **10+ seconds** (depending on data volume)

**Why Can't Current Optimizer Use the Prefix Index?**

The prefix index `idx_email_prefix (email(10))` provides **partial ordering**:
- Rows with the same 10-byte prefix (e.g., `alice@exam`, `alice@comp`) are grouped together
- Within each prefix group, rows are unordered
- The optimizer doesn't recognize this partial ordering property

### Real-World Impact

This limitation affects common use cases:

1. **Pagination queries:**
   ```sql
   SELECT * FROM users ORDER BY email LIMIT 100 OFFSET 0;
   SELECT * FROM users ORDER BY email LIMIT 100 OFFSET 100;
   ```

2. **API responses with sorting:**
   ```sql
   SELECT id, email, name FROM products ORDER BY description LIMIT 20;
   ```

3. **Admin dashboards showing sorted lists:**
   ```sql
   SELECT * FROM logs ORDER BY url LIMIT 50;
   ```

All these queries suffer from full table scans when prefix indexes are used.

---

## Proposed Solution: Partial Order TopN Optimization

### Key Insight

When a prefix index exists on the `ORDER BY` column, we can leverage its **partial ordering** property:

1. The prefix index provides **global ordering by prefix**
2. For TopN queries, we only need to read **N + X** rows, where:
   - `N` = desired result count
   - `X` = extra rows with the same prefix as the Nth row

This avoids reading the entire table while still guaranteeing correct results.

### Example of Partial Ordering

With `idx_email_prefix (email(10))`:

```
Prefix (10 bytes) | Full Email            | Logical Group
------------------|-----------------------|---------------
alice@exam        | alice@example.com     | Group A (2 rows)
alice@exam        | alice@example.org     | 
alice@comp        | alice@company.com     | Group B (1 row)
bob@exampl        | bob@example.com       | Group C (3 rows)
bob@exampl        | bob@example.org       |
bob@exampl        | bob@example.net       |
carol@test        | carol@test.com        | Group D (1 row)
...
```

For `LIMIT 3`:
- Without optimization: Read **all rows** (millions)
- With optimization: Read **Group A + Group B** ≈ **3 + X rows** (X = group size variance)

### Optimized Execution Plan

**Query:**
```sql
SELECT * FROM users ORDER BY email LIMIT 10;
```

**New Execution Plan (Two-Layer Mode):**
```
TopN_7                                     | 10.00  | root      | order by email, offset:0, count:10, partial_order_limit:10, prefix_col_id:2, prefix_len:10
└─IndexLookUp_14                           | 10.00  | root      | 
  ├─Limit_13(Build)                        | 10.00  | cop[tikv] | count:10, partial_order_limit:10, prefix_col_id:2, prefix_len:10
  │ └─IndexRangeScan_11                    | 10.00  | cop[tikv] | table:users, index:idx_email_prefix(email), range:[NULL,+inf], keep order:true
  └─TableRowIDScan_12(Probe)               | 10.00  | cop[tikv] | table:users, keep order:false
```

**Key Improvements:**

1. **IndexRangeScan with KeepOrder** - Reads index in sorted order
2. **Partial Ordered Limit on TiKV** - Stops reading after N+X rows based on prefix matching
3. **Partial Ordered TopN on TiDB** - Final sorting and short-circuiting based on prefix

**Performance Gains:**
- Scans **~10-20 rows** instead of 10 million rows
- Query time: **< 10ms** (1000x improvement)
- Reduced network transfer
- Reduced memory usage

### Algorithm Details

**Short-Circuit Logic:**

```
Read rows in index order:
  For each row R:
    If result set size < N:
      Add R to result set
    Else if R.prefix == result[N-1].prefix:
      Add R to result set (extra row X)
    Else:
      STOP reading (short-circuit)
      
Sort result set by full column value
Return top N rows
```

**Example Execution:**
```sql
SELECT * FROM users ORDER BY email LIMIT 3;
```

1. Read from `idx_email_prefix` in order:
   ```
   Row 1: alice@example.com   (prefix: alice@exam) → Add to result [1]
   Row 2: alice@example.org   (prefix: alice@exam) → Add to result [2]
   Row 3: alice@company.com   (prefix: alice@comp) → Add to result [3]
   Row 4: bob@example.com     (prefix: bob@exampl) → STOP (different prefix, N=3 reached)
   ```

2. Sort 3 rows by full email:
   ```
   alice@company.com
   alice@example.com
   alice@example.org
   ```

3. Return top 3 results

**Statistics-Based Estimation:**

The optimizer estimates `X` using:
- `NDV` (number of distinct values) in index statistics
- `TopN` statistics (most frequent prefixes)
- Safety factor: cap at 10% of total rows (configurable)

### Two-Layer vs One-Layer Plans

**Two-Layer Mode** (when pushdown is possible):
- **TiKV Layer**: Partial ordered Limit (with prefix-based short-circuit)
- **TiDB Layer**: Partial ordered TopN (final sorting + short-circuit)
- Condition: No Selection filters on table scan

**One-Layer Mode** (when pushdown is not possible):
- **TiDB Layer**: Only partial ordered TopN
- Condition: Selection filters exist on table scan (may reorder rows)

---

## Benefits

### 1. Performance Improvement

**For queries with prefix indexes on ORDER BY columns:**
- **100-1000x faster** for large tables (millions of rows)
- Reduced I/O from O(N) table scan to O(LIMIT + X)
- Lower memory usage (sorting 10s of rows instead of millions)

### 2. Better Resource Utilization

- **Network bandwidth**: Transfer ~10 rows instead of millions
- **CPU**: Sort ~10 rows instead of millions
- **Memory**: Buffer ~10 rows instead of millions

### 3. Backward Compatible

- Controlled by system variable `tidb_opt_partial_ordered_index_for_topn` (default OFF)
- No impact on existing queries when disabled
- Gradually enable for specific workloads

### 4. Cost-Based Decision

- Optimizer estimates `X` using statistics
- Falls back to normal plan if estimated cost is higher
- Configurable via `tidb_opt_partial_ordered_max_limit_percent`

---

## Use Cases

### 1. Pagination APIs

```sql
-- First page
SELECT * FROM products ORDER BY name LIMIT 20 OFFSET 0;
-- Second page
SELECT * FROM products ORDER BY name LIMIT 20 OFFSET 20;
```

### 2. Admin Dashboards

```sql
-- Show recent users by email
SELECT id, email, created_at FROM users ORDER BY email LIMIT 100;

-- Show logs sorted by URL
SELECT * FROM access_logs ORDER BY url LIMIT 50;
```

### 3. Prefix Index Use Cases

Prefix indexes are commonly used for:
- Long VARCHAR columns (email, URL, description)
- Reducing index size while maintaining sort capability
- Balancing storage vs query performance

This optimization makes prefix indexes much more practical for ORDER BY queries.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

planner: Allow Optimizer to Use Prefix Index for ORDER BY / LIMIT to Avoid Full Scan #65813

Optimizer: Support Partial Order TopN Optimization with Prefix Index

Background

Current Limitation

Concrete Example

Real-World Impact

Proposed Solution: Partial Order TopN Optimization

Key Insight

Example of Partial Ordering

Optimized Execution Plan

Algorithm Details

Two-Layer vs One-Layer Plans

Benefits

1. Performance Improvement

2. Better Resource Utilization

3. Backward Compatible

4. Cost-Based Decision

Use Cases

1. Pagination APIs

2. Admin Dashboards

3. Prefix Index Use Cases

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

planner: Allow Optimizer to Use Prefix Index for ORDER BY / LIMIT to Avoid Full Scan #65813

Description

Optimizer: Support Partial Order TopN Optimization with Prefix Index

Background

Current Limitation

Concrete Example

Real-World Impact

Proposed Solution: Partial Order TopN Optimization

Key Insight

Example of Partial Ordering

Optimized Execution Plan

Algorithm Details

Two-Layer vs One-Layer Plans

Benefits

1. Performance Improvement

2. Better Resource Utilization

3. Backward Compatible

4. Cost-Based Decision

Use Cases

1. Pagination APIs

2. Admin Dashboards

3. Prefix Index Use Cases

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions