Skip to content

Commit 9d06228

Browse files
docs: DOC-1353: Document Parquet ENUM logical type mapping to String (#7954)
1 parent b92332d commit 9d06228

2 files changed

Lines changed: 12 additions & 0 deletions

File tree

docs/groovy/how-to-guides/data-import-export/parquet-formats.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,12 @@ Note that if Deephaven does not know how to parse a value, it becomes a string.
3434

3535
Deephaven supports optional metadata files that let you specify the types of your partitioning columns, which may not be obvious otherwise. Top-level metadata files can supply the full table schema and information about partitioning columns, while leaf-level files provide additional information to the engine (such as grouping/indexing). Deephaven can discover your Parquet files without looking at the entire file system, and all metadata is loaded at once.
3636

37+
## Logical type support
38+
39+
Deephaven maps Parquet [logical types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) to Deephaven column types on read. The following non-obvious mappings are supported:
40+
41+
- **`ENUM`**: Read as `String`. The `ENUM` logical type is physically identical to `STRING` (both are `BINARY` with UTF-8 encoding) and is commonly used by tools such as Apache Spark and PyArrow to annotate columns with a finite set of string values. Deephaven reads `ENUM`-annotated columns transparently as `String`.
42+
3743
## Related documentation
3844

3945
- [Import Parquet into Deephaven video](https://youtu.be/k4gI6hSZ2Jc)

docs/python/how-to-guides/data-import-export/parquet-formats.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,12 @@ Note that if Deephaven does not know how to parse a value, it becomes a string.
3434

3535
Deephaven supports optional metadata files that let you specify the types of your partitioning columns, which may not be obvious otherwise. Top-level metadata files can supply the full table schema and information about partitioning columns, while leaf-level files provide additional information to the engine (such as grouping/indexing). Deephaven can discover your Parquet files without looking at the entire file system, and all metadata is loaded at once.
3636

37+
## Logical type support
38+
39+
Deephaven maps Parquet [logical types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) to Deephaven column types on read. The following non-obvious mappings are supported:
40+
41+
- **`ENUM`**: Read as `String`. The `ENUM` logical type is physically identical to `STRING` (both are `BINARY` with UTF-8 encoding) and is commonly used by tools such as Apache Spark and PyArrow to annotate columns with a finite set of string values. Deephaven reads `ENUM`-annotated columns transparently as `String`.
42+
3743
## Related documentation
3844

3945
- [Import Parquet into Deephaven video](https://youtu.be/k4gI6hSZ2Jc)

0 commit comments

Comments
 (0)