feat: DH-18399: Add ParquetColumnResolver by devinrsmith · Pull Request #6558 · deephaven/deephaven-core

devinrsmith · 2025-01-13T19:53:29Z

No description provided.

rcaudy · 2025-01-14T20:11:21Z

+        } else {
+            final ColumnDescriptor columnDescriptor = resolver.mapping().get(columnName);
+            if (columnDescriptor == null) {
+                nameList = List.of(); // empty, will not resolve


Let's make sure this actually results in supplying ColumnRegion.Null implementations when a user tries to access the column.

Yes, verified.

rcaudy · 2025-01-14T20:23:09Z

+            // There are different resolution strategies that could all be reasonable. We could consider using only the
+            // field id closest to the leaf. This version, however, takes the most general approach and considers field
+            // ids wherever they appear; ultimately, only being resolvable if the field id mapping is unambiguous.
+            for (Type type : path) {


What happens if we do encounter a nested type here? That is, what's the current outcome?

I'm not sure I understand the question. There may well be "nested types" here, but path represents the full path to a single leaf (primitive) field.
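To make the "only resolvable if the field id mapping is unambiguous" idea concrete, here is a minimal sketch. It does not use Parquet types: `Node`, `FieldIdResolution`, and `resolve` are hypothetical names, and each leaf path is modeled as a list of nodes that may or may not carry a field id. It also assumes that a field id appearing more than once within the same path counts as a single match, which the PR may handle differently.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for one Parquet Type along a path: a name plus an optional field id.
final class Node {
    final String name;
    final Integer fieldId; // null when the type carries no field id

    Node(String name, Integer fieldId) {
        this.name = name;
        this.fieldId = fieldId;
    }
}

final class FieldIdResolution {
    // Returns the single leaf path that contains the field id anywhere along it,
    // or empty when no path matches or more than one path matches (ambiguous).
    static Optional<List<Node>> resolve(List<List<Node>> leafPaths, int fieldId) {
        List<Node> match = null;
        for (List<Node> path : leafPaths) {
            for (Node type : path) {
                if (type.fieldId != null && type.fieldId == fieldId) {
                    if (match != null) {
                        return Optional.empty(); // a second path matched: ambiguous
                    }
                    match = path;
                    break; // assumption: one match per path is enough
                }
            }
        }
        return Optional.ofNullable(match);
    }
}
```

In this model, a field id placed on a group shared by two leaves does not resolve, while a field id placed on a single leaf does.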

rcaudy · 2025-01-14T20:23:58Z

+         * @param tableLocationKey the Parquet TLK
+         * @return the Parquet column resolver
+         */
+        ParquetColumnResolver init(TableKey tableKey, ParquetTableLocationKey tableLocationKey);


init -> resolver?

used generic naming of

rcaudy · 2025-01-15T19:21:49Z

+     * The map from Deephaven column name to {@link ColumnDescriptor}. The {@link #schema()} must contain each column
+     * descriptor.
+     */
+    public abstract Map<String, ColumnDescriptor> mapping();


Do we need this to keep ColumnDescriptor as part of the interface of the implementation? Do we want the Iceberg implementation using this?

I'm very much in favor of keeping this implementation strongly-typed with the safety checks. Iceberg should be able to use this implementation (and even use the same Factory to create it) in some situations.

Changed my mind; the map implementation is now very generic, no Parquet specific types.

rcaudy

.

malhotrashivam

Minor comments

Also update relevant tests to check against full MessageType schema instead of pulling out the ColumnDescriptors

malhotrashivam · 2025-01-21T20:45:21Z

+    public static List<String[]> getPaths(MessageType schema) {
+        final List<String[]> out = new ArrayList<>();


It would be better to add a comment here explaining what these methods actually do, because there are no comments on the MessageType methods and they are a bit tricky.
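For readers unfamiliar with what a "paths" helper of this kind produces, here is a rough sketch of the general idea: one `String[]` per leaf field, holding the field names from the top of the schema down to that leaf. This is not the PR's implementation; `SchemaNode` and `LeafPaths` are hypothetical stand-ins for the Parquet schema types, and a node without children is treated as a leaf.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a schema node; a node with no children is treated as a leaf.
final class SchemaNode {
    final String name;
    final List<SchemaNode> children;

    SchemaNode(String name, SchemaNode... children) {
        this.name = name;
        this.children = List.of(children);
    }
}

final class LeafPaths {
    // Collects one String[] per leaf: the field names from the root's children down to the leaf.
    // The root's own name is not part of any path.
    static List<String[]> getPaths(SchemaNode root) {
        final List<String[]> out = new ArrayList<>();
        final Deque<String> current = new ArrayDeque<>();
        for (SchemaNode child : root.children) {
            visit(child, current, out);
        }
        return out;
    }

    // Depth-first walk that keeps the names on the current path in a deque.
    private static void visit(SchemaNode node, Deque<String> current, List<String[]> out) {
        current.addLast(node.name);
        if (node.children.isEmpty()) {
            out.add(current.toArray(new String[0]));
        } else {
            for (SchemaNode child : node.children) {
                visit(child, current, out);
            }
        }
        current.removeLast();
    }
}
```

For a schema with a leaf `a` and a group `b` containing leaves `c` and `d`, this yields the paths `[a]`, `[b, c]`, and `[b, d]`.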

malhotrashivam · 2025-01-21T20:59:36Z

+                return null;
+            }
+            primitiveType = field.asPrimitiveType();
+            if (isRepeated(primitiveType)) {


You might be able to merge the two if branches, because isRequired and isRepeated only take a Type, not a PrimitiveType or GroupType.

malhotrashivam · 2025-01-21T21:03:24Z

+    }
+
+    /**
+     * A more efficient implementation of {@link MessageType#getColumnDescription(String[])}.


I am not sure I understand the -1 in the calculation of max repetition level and max definition level in MessageType class.

The -1 in the upstream code is there because they are counting the MessageType itself as part of the calculation (and it always appears to be REPEATED).
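A small worked sketch of that explanation, under the usual definitions (max repetition level = number of REPEATED types along the path, max definition level = number of non-REQUIRED types along the path). The `Repetition` enum and `Levels` class below are hypothetical stand-ins, not the Parquet classes; the point is only that counting a REPEATED root and then subtracting 1 gives the same answer as counting the fields alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Parquet's repetition kinds.
enum Repetition { REQUIRED, OPTIONAL, REPEATED }

final class Levels {
    // Number of REPEATED types along the given path.
    static int countRepeated(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r == Repetition.REPEATED) {
                level++;
            }
        }
        return level;
    }

    // Number of non-REQUIRED types along the given path.
    static int countNonRequired(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r != Repetition.REQUIRED) {
                level++;
            }
        }
        return level;
    }

    // Prepends a REPEATED root, as if the message type were part of the path.
    static List<Repetition> withRepeatedRoot(List<Repetition> fieldPath) {
        final List<Repetition> out = new ArrayList<>();
        out.add(Repetition.REPEATED);
        out.addAll(fieldPath);
        return out;
    }

    // Root-inclusive counting, corrected by -1 because the root is REPEATED
    // (and therefore also non-REQUIRED), so it adds exactly 1 to each count.
    static int maxRepetitionLevelViaRoot(List<Repetition> fieldPath) {
        return countRepeated(withRepeatedRoot(fieldPath)) - 1;
    }

    static int maxDefinitionLevelViaRoot(List<Repetition> fieldPath) {
        return countNonRequired(withRepeatedRoot(fieldPath)) - 1;
    }
}
```

For a field path of OPTIONAL, REPEATED, REQUIRED, both approaches give a max repetition level of 1 and a max definition level of 2.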

devinrsmith · 2025-01-22T02:47:37Z

nightlies all pass

feat: DH-18399: Add ParquetColumnResolver

d379760

devinrsmith added the parquet, NoDocumentationNeeded, and ReleaseNotesNeeded labels Jan 13, 2025

devinrsmith added this to the 0.38.0 milestone Jan 13, 2025

devinrsmith self-assigned this Jan 13, 2025

Fix test location

ae60022

devinrsmith requested a review from malhotrashivam January 13, 2025 20:44

rcaudy reviewed Jan 14, 2025

Comment thread ...s/parquet/table/src/main/java/io/deephaven/parquet/table/location/ParquetColumnResolver.java Outdated

rcaudy reviewed Jan 14, 2025

Comment thread ...s/parquet/table/src/main/java/io/deephaven/parquet/table/location/ParquetColumnResolver.java

devinrsmith added 2 commits January 14, 2025 18:23

add tests, refactor

554ada6

small generalization

964b2b6

rcaudy reviewed Jan 15, 2025

devinrsmith added 2 commits January 15, 2025 16:40

Merge remote-tracking branch 'upstream/main' into DH-18399-parquet-co…

de1ad29

…lumn-resolver

documentation and testing

59b467f

devinrsmith marked this pull request as ready for review January 16, 2025 01:57

devinrsmith added 2 commits January 16, 2025 14:44

f

67be4a0

fix failure

c4b42b4

malhotrashivam reviewed Jan 17, 2025

Comment thread ...ns/parquet/table/src/main/java/io/deephaven/parquet/table/location/ParquetTableLocation.java Outdated

Comment thread extensions/parquet/table/src/main/java/io/deephaven/parquet/table/location/ParquetUtil.java Outdated

small rename, refactor

e1f6cb0

devinrsmith requested a review from malhotrashivam January 20, 2025 17:34

Make ParquetSchemaUtil an internal library

4433acd

Also update relevant tests to check against full MessageType schema instead of pulling out the ColumnDescriptors

malhotrashivam reviewed Jan 21, 2025

responses

aff378d

devinrsmith requested a review from malhotrashivam January 22, 2025 02:47

malhotrashivam reviewed Jan 22, 2025

devinrsmith requested a review from malhotrashivam January 22, 2025 16:10

responses

898514b

malhotrashivam approved these changes Jan 22, 2025

devinrsmith merged commit 4b3ea4b into deephaven:main Jan 22, 2025

devinrsmith deleted the DH-18399-parquet-column-resolver branch January 22, 2025 18:07

github-actions Bot locked and limited conversation to collaborators Jan 22, 2025

3 participants