Commit 9286563
fix: validate inter-file ordering in eq_properties() (apache#20329)
## Summary
Discovered this bug while working on apache#19724.
TLDR: just because the files themselves are sorted doesn't mean the
partition streams are sorted.
- **`eq_properties()` in `FileScanConfig` blindly trusted
`output_ordering`** (set from Parquet `sorting_columns` metadata)
without verifying that files within a group are in the correct
inter-file order
- `EnforceSorting` then removed `SortExec` based on this unvalidated
ordering, producing **wrong results** when filesystem order didn't match
data order
- Added `validated_output_ordering()` that filters orderings using
`MinMaxStatistics::new_from_files()` + `is_sorted()` to verify
inter-file sort order before reporting them to the optimizer
## Changes
### `datafusion/datasource/src/file_scan_config.rs`
- Added `validated_output_ordering()` method on `FileScanConfig` that
validates each output ordering against actual file group statistics
- Changed `eq_properties()` to call `self.validated_output_ordering()`
instead of `self.output_ordering.clone()`
### `datafusion/sqllogictest/test_files/sort_pushdown.slt`
Added 8 new regression tests (Tests 4-11):
| Test | Scenario | Key assertion |
|------|----------|---------------|
| **4** | Reversed filesystem order (inferred ordering) | SortExec
retained — wrong inter-file order detected |
| **5** | Overlapping file ranges (inferred ordering) | SortExec
retained — overlapping ranges detected |
| **6** | `WITH ORDER` + reversed filesystem order | SortExec retained
despite explicit ordering |
| **7** | Correctly ordered multi-file group (positive) | SortExec
eliminated — validation passes |
| **8** | DESC ordering with wrong inter-file DESC order | SortExec
retained for DESC direction |
| **9** | Multi-column sort key (overlapping vs non-overlapping) |
Conservative rejection with overlapping stats; passes with clean
boundaries |
| **10** | Correctly ordered + `WITH ORDER` (positive) | SortExec
eliminated — both ordering and stats agree |
| **11** | Multiple partitions (one file per group) |
`SortPreservingMergeExec` merges; no per-partition sort needed |
## Test plan
- [x] `cargo test --test sqllogictests -- sort_pushdown` — all new +
existing tests pass
- [x] `cargo test -p datafusion-datasource` — 97 unit tests + 6 doc
tests pass
- [x] Existing Test 1 (single-file sort pushdown with `WITH ORDER`)
still eliminates SortExec (no regression)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>1 parent 016e2ae commit 9286563
5 files changed
Lines changed: 660 additions & 47 deletions
File tree
- datafusion
- core/tests/physical_optimizer
- datasource/src
- sqllogictest/test_files
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
864 | 864 | | |
865 | 865 | | |
866 | 866 | | |
867 | | - | |
| 867 | + | |
868 | 868 | | |
869 | 869 | | |
870 | 870 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
665 | 665 | | |
666 | 666 | | |
667 | 667 | | |
668 | | - | |
| 668 | + | |
669 | 669 | | |
670 | 670 | | |
671 | 671 | | |
| |||
853 | 853 | | |
854 | 854 | | |
855 | 855 | | |
| 856 | + | |
| 857 | + | |
| 858 | + | |
| 859 | + | |
| 860 | + | |
| 861 | + | |
| 862 | + | |
| 863 | + | |
| 864 | + | |
| 865 | + | |
| 866 | + | |
| 867 | + | |
| 868 | + | |
| 869 | + | |
| 870 | + | |
| 871 | + | |
| 872 | + | |
| 873 | + | |
| 874 | + | |
| 875 | + | |
| 876 | + | |
| 877 | + | |
| 878 | + | |
| 879 | + | |
| 880 | + | |
| 881 | + | |
| 882 | + | |
| 883 | + | |
| 884 | + | |
| 885 | + | |
| 886 | + | |
| 887 | + | |
| 888 | + | |
| 889 | + | |
856 | 890 | | |
857 | 891 | | |
858 | 892 | | |
| |||
1202 | 1236 | | |
1203 | 1237 | | |
1204 | 1238 | | |
| 1239 | + | |
| 1240 | + | |
| 1241 | + | |
| 1242 | + | |
| 1243 | + | |
| 1244 | + | |
| 1245 | + | |
| 1246 | + | |
| 1247 | + | |
| 1248 | + | |
| 1249 | + | |
| 1250 | + | |
| 1251 | + | |
| 1252 | + | |
| 1253 | + | |
| 1254 | + | |
| 1255 | + | |
| 1256 | + | |
| 1257 | + | |
| 1258 | + | |
| 1259 | + | |
| 1260 | + | |
| 1261 | + | |
| 1262 | + | |
| 1263 | + | |
| 1264 | + | |
| 1265 | + | |
| 1266 | + | |
| 1267 | + | |
| 1268 | + | |
| 1269 | + | |
| 1270 | + | |
| 1271 | + | |
| 1272 | + | |
| 1273 | + | |
| 1274 | + | |
| 1275 | + | |
| 1276 | + | |
| 1277 | + | |
| 1278 | + | |
| 1279 | + | |
| 1280 | + | |
| 1281 | + | |
| 1282 | + | |
| 1283 | + | |
1205 | 1284 | | |
1206 | 1285 | | |
1207 | 1286 | | |
| |||
1268 | 1347 | | |
1269 | 1348 | | |
1270 | 1349 | | |
1271 | | - | |
1272 | | - | |
1273 | | - | |
1274 | | - | |
1275 | | - | |
1276 | | - | |
1277 | | - | |
1278 | | - | |
1279 | | - | |
1280 | | - | |
1281 | | - | |
1282 | | - | |
1283 | | - | |
1284 | | - | |
1285 | | - | |
1286 | | - | |
1287 | | - | |
1288 | | - | |
1289 | | - | |
1290 | | - | |
1291 | | - | |
| 1350 | + | |
| 1351 | + | |
| 1352 | + | |
| 1353 | + | |
| 1354 | + | |
| 1355 | + | |
| 1356 | + | |
| 1357 | + | |
| 1358 | + | |
| 1359 | + | |
| 1360 | + | |
1292 | 1361 | | |
1293 | | - | |
1294 | | - | |
1295 | | - | |
1296 | | - | |
1297 | | - | |
1298 | | - | |
1299 | | - | |
1300 | | - | |
1301 | | - | |
1302 | | - | |
1303 | | - | |
1304 | | - | |
1305 | | - | |
1306 | | - | |
1307 | | - | |
1308 | | - | |
1309 | | - | |
1310 | | - | |
1311 | | - | |
| 1362 | + | |
| 1363 | + | |
| 1364 | + | |
| 1365 | + | |
| 1366 | + | |
| 1367 | + | |
| 1368 | + | |
| 1369 | + | |
| 1370 | + | |
| 1371 | + | |
| 1372 | + | |
| 1373 | + | |
| 1374 | + | |
| 1375 | + | |
| 1376 | + | |
| 1377 | + | |
| 1378 | + | |
| 1379 | + | |
| 1380 | + | |
| 1381 | + | |
| 1382 | + | |
| 1383 | + | |
| 1384 | + | |
| 1385 | + | |
| 1386 | + | |
| 1387 | + | |
| 1388 | + | |
1312 | 1389 | | |
1313 | | - | |
1314 | | - | |
1315 | 1390 | | |
1316 | | - | |
1317 | 1391 | | |
1318 | 1392 | | |
1319 | 1393 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
266 | 266 | | |
267 | 267 | | |
268 | 268 | | |
| 269 | + | |
269 | 270 | | |
270 | 271 | | |
271 | 272 | | |
272 | 273 | | |
273 | | - | |
| 274 | + | |
274 | 275 | | |
275 | 276 | | |
276 | 277 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
274 | 274 | | |
275 | 275 | | |
276 | 276 | | |
277 | | - | |
| 277 | + | |
0 commit comments