Commit d090bf8
feat: Support reading Parquet ENUM logical type as String (#7805)
Closes #7723
## Background
Parquet's `ENUM` logical type is physically identical to `STRING`: both
annotate a `BINARY` column with UTF-8 encoded bytes. The only difference
is the label. External tools such as Spark and PyArrow use `ENUM` to
indicate a column holds a finite set of string values, but the wire
format is the same.
Deephaven's read pipeline dispatches on the logical type in three stages.
None of the three supported `EnumLogicalTypeAnnotation`, so reading an
ENUM-annotated column from an externally produced file failed.
## Changes
### Stage 1 — Schema to Java type (`ParquetSchemaReader`)
**Before:** `visit(EnumLogicalTypeAnnotation)` set an error string and
returned `Optional.empty()`, so the column was unresolvable.
**After:** Returns `Optional.of(String.class)`, the same result as
`visit(StringLogicalTypeAnnotation)`.
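The Stage 1 fix is a one-method change in a visitor. The sketch below models that pattern with simplified, hypothetical stand-ins (`Annotation`, `StringAnnotation`, `EnumAnnotation`, `AnnotationVisitor`) rather than the real `org.apache.parquet.schema.LogicalTypeAnnotation` classes. It assumes, as described above, that an annotation a visitor does not handle resolves to `Optional.empty()`. It contrasts a visitor without an ENUM mapping with one that maps ENUM to `String.class`.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins for Parquet's logical type annotations.
interface Annotation {
    <T> Optional<T> accept(AnnotationVisitor<T> visitor);
}

final class StringAnnotation implements Annotation {
    @Override
    public <T> Optional<T> accept(AnnotationVisitor<T> visitor) {
        return visitor.visit(this);
    }
}

final class EnumAnnotation implements Annotation {
    @Override
    public <T> Optional<T> accept(AnnotationVisitor<T> visitor) {
        return visitor.visit(this);
    }
}

// Annotations a visitor does not override resolve to Optional.empty().
interface AnnotationVisitor<T> {
    default Optional<T> visit(StringAnnotation a) { return Optional.empty(); }
    default Optional<T> visit(EnumAnnotation a) { return Optional.empty(); }
}

// "Before": only STRING is mapped, so ENUM falls through to the empty default.
class BeforeFixVisitor implements AnnotationVisitor<Class<?>> {
    @Override
    public Optional<Class<?>> visit(StringAnnotation a) { return Optional.of(String.class); }
}

// "After": ENUM resolves to the same Java type as STRING.
class AfterFixVisitor extends BeforeFixVisitor {
    @Override
    public Optional<Class<?>> visit(EnumAnnotation a) { return Optional.of(String.class); }
}

class SchemaMappingDemo {
    public static void main(String[] args) {
        Annotation enumColumn = new EnumAnnotation();
        System.out.println(enumColumn.accept(new BeforeFixVisitor())); // Optional.empty
        System.out.println(enumColumn.accept(new AfterFixVisitor()));  // Optional[class java.lang.String]
    }
}
```

Because the ENUM case returns exactly what the STRING case returns, every downstream consumer sees an ordinary `String` column.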
### Stage 2 — Column data to chunk (`ParquetColumnLocation`)
**Before:** No `visit(EnumLogicalTypeAnnotation)` override existed, so
the visitor returned `Optional.empty()` and the read failed at runtime.
**After:** A new override delegates to `ToStringPage.create(...)`, the
same decoder used for `STRING` columns.
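Stage 2 routes two annotations to one decoder. The sketch below models that routing with a hypothetical `LogicalType` enum and a `toStringPage` helper standing in for `ToStringPage.create(...)`. It assumes each value is a complete UTF-8 byte array and ignores page-level details such as dictionary encoding and null handling.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class StringPageDemo {
    // Hypothetical labels for the annotation on a BINARY column.
    enum LogicalType { STRING, ENUM, OTHER }

    // Stand-in for a string page decoder: every value is UTF-8 bytes.
    static String[] toStringPage(List<byte[]> binaryValues) {
        String[] out = new String[binaryValues.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = new String(binaryValues.get(i), StandardCharsets.UTF_8);
        }
        return out;
    }

    static String[] decode(LogicalType type, List<byte[]> binaryValues) {
        switch (type) {
            case STRING:
            case ENUM: // identical wire format, so ENUM reuses the STRING decoder
                return toStringPage(binaryValues);
            default:
                throw new UnsupportedOperationException("no decoder for " + type);
        }
    }

    public static void main(String[] args) {
        List<byte[]> page = Arrays.asList(
                "RED".getBytes(StandardCharsets.UTF_8),
                "GREEN".getBytes(StandardCharsets.UTF_8),
                "RED".getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(decode(LogicalType.ENUM, page))); // [RED, GREEN, RED]
    }
}
```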
### Stage 3 — Pushdown statistics (`MinMaxFromStatistics`)
**Before:** `getMinMaxForStrings` only accepted
`StringLogicalTypeAnnotation`, so ENUM columns returned `false` and
forced a full scan on every filter.
**After:** The condition now also accepts `EnumLogicalTypeAnnotation` via an
additional `instanceof` check, enabling min/max pushdown for ENUM columns.
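Min/max pushdown lets a reader skip a row group when its statistics rule out a filter. The sketch below shows the idea for an equality filter once ENUM is treated as string-like. `BinaryStats`, `LogicalType`, `hasStringStatistics`, and `canSkipEquals` are hypothetical names, not Deephaven's `MinMaxFromStatistics` API. The sketch compares UTF-8 bytes as unsigned values, which assumes the file's statistics use byte-wise ordering. It requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MinMaxPushdownDemo {
    // Hypothetical labels for the annotation on a BINARY column.
    enum LogicalType { STRING, ENUM, OTHER }

    // Hypothetical row-group statistics: raw min/max bytes (null if absent).
    static final class BinaryStats {
        final byte[] min;
        final byte[] max;

        BinaryStats(byte[] min, byte[] max) {
            this.min = min;
            this.max = max;
        }
    }

    // Mirrors the fixed condition: ENUM qualifies alongside STRING.
    static boolean hasStringStatistics(LogicalType type) {
        return type == LogicalType.STRING || type == LogicalType.ENUM;
    }

    // True only if the statistics prove no value in the row group equals 'value'.
    // False means "cannot rule it out", so the row group must be scanned.
    static boolean canSkipEquals(LogicalType type, BinaryStats stats, String value) {
        if (!hasStringStatistics(type) || stats.min == null || stats.max == null) {
            return false;
        }
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        // Unsigned byte-wise comparison of the UTF-8 encodings (assumed statistics order).
        return Arrays.compareUnsigned(v, stats.min) < 0
                || Arrays.compareUnsigned(v, stats.max) > 0;
    }

    public static void main(String[] args) {
        BinaryStats stats = new BinaryStats(
                "BLUE".getBytes(StandardCharsets.UTF_8),
                "RED".getBytes(StandardCharsets.UTF_8));
        System.out.println(canSkipEquals(LogicalType.ENUM, stats, "YELLOW"));  // true: above max
        System.out.println(canSkipEquals(LogicalType.ENUM, stats, "GREEN"));   // false: in range
        System.out.println(canSkipEquals(LogicalType.OTHER, stats, "YELLOW")); // false: no pushdown
    }
}
```

Before the fix, an ENUM column behaves like `OTHER` here: the check never passes, so every row group is read.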
## Tests
- `MinMaxFromStatisticsTest.enumLogicalStatisticsAreMaterialised` — unit
test verifying ENUM statistics are extracted as strings.
- `ParquetTableReadWriteTest.testReadEnumLogicalTypeAsString` —
end-to-end test that writes a Parquet file with a `BINARY+ENUM` column
and reads it back, verifying the column materializes as `String` with
correct values.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Parent commit: 69e6ea8
5 files changed, 87 additions and 19 deletions
Affected directories (under `extensions/parquet/table/src`):
- `main/java/io/deephaven/parquet/table` and its `location` subdirectory
- `test/java/io/deephaven/parquet/table` and its `location` subdirectory