Skip to content

[Variant] Add primitive type timestamp_nanos(with&without timezone) and uuid#8149

Merged
alamb merged 3 commits intoapache:mainfrom
klion26:add_remaining_primitive
Aug 20, 2025
Merged

[Variant] Add primitive type timestamp_nanos(with&without timezone) and uuid#8149
alamb merged 3 commits intoapache:mainfrom
klion26:add_remaining_primitive

Conversation

@klion26
Copy link
Copy Markdown
Member

@klion26 klion26 commented Aug 15, 2025

Which issue does this PR close?

Rationale for this change

This PR adds remaining variant primitive types(timestamp_nanos/timestampntz_nanos/uuid)

What changes are included in this PR?

  • Add primitive variant types for timestamp_nanos/timestampntz_nanos/uuid

Are these changes tested?

Added some tests and reusing existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 15, 2025
Comment thread parquet-variant/src/variant.rs Outdated
@klion26 klion26 force-pushed the add_remaining_primitive branch from e1c7f40 to 82489a3 Compare August 15, 2025 09:57
@klion26
Copy link
Copy Markdown
Member Author

klion26 commented Aug 15, 2025

spec said that UUID is stored as big-endian
Tests failed because of UUID decoding/encoding, I need to verify it. Currently, use from_slice_le and to_bytes_le( from UUID crate, seems the result is different from the bytes generated from the iceberg code.

Comment thread parquet-variant/src/builder.rs Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UUID is stored as big-endian

@klion26 klion26 force-pushed the add_remaining_primitive branch 3 times, most recently from 335ed92 to fc6e1b7 Compare August 15, 2025 13:40
@klion26
Copy link
Copy Markdown
Member Author

klion26 commented Aug 18, 2025

@alamb @superserious-dev Could you please help review this when you're free, thanks.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @klion26 🙏 ❤️

I left some comments -- the only one I think is important to change is the as_uuid_string --> as_uuid

Otherwise this is looking very nice

Comment thread parquet-variant/src/decoder.rs Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread parquet-variant/src/decoder.rs Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the implementation. When I implemented this, I did not notice that the DateTime::from_timestamp_nanos would never fail, added the comment to show why we use DateTime::from_timestamp instead of DateTime::from_timestamp_nanos

Comment thread parquet-variant/src/variant.rs Outdated
Comment thread parquet-variant/src/variant.rs Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having this be as_uuid(&self) -> Option<Uuid> would be both more consistent with the other methods that access variant data, as well as being more performant -- the caller could decide if they wanted to pay the cost to convert to string as well

I think the example is great, and we could show it as a way to convert to strings

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb Thanks for your review, I've updated the pr, please take another look when you're free. thanks.

Comment thread parquet-variant/src/decoder.rs Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the implementation. When I implemented this, I did not notice that the DateTime::from_timestamp_nanos would never fail, added the comment to show why we use DateTime::from_timestamp instead of DateTime::from_timestamp_nanos

Comment thread parquet-variant/src/variant.rs Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@klion26 klion26 requested a review from alamb August 19, 2025 05:31
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @klion26 -- this looks good to me

@klion26 klion26 force-pushed the add_remaining_primitive branch from a7df3ca to 41b3990 Compare August 20, 2025 02:31
@klion26 klion26 force-pushed the add_remaining_primitive branch from 41b3990 to c885cb6 Compare August 20, 2025 02:48
@klion26
Copy link
Copy Markdown
Member Author

klion26 commented Aug 20, 2025

@alamb thanks for the review. Updated the code to resolve the conflicts. The CI failed because of Error: Unable to locate package libpython3.11-dev, it seems it's irrelevant to the current pr

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Aug 20, 2025

@alamb thanks for the review. Updated the code to resolve the conflicts. The CI failed because of Error: Unable to locate package libpython3.11-dev, it seems it's irrelevant to the current pr

Yes I think that is is also happening on main -- I filed a ticket to track

@alamb alamb merged commit 4ac3114 into apache:main Aug 20, 2025
13 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Aug 20, 2025

Thank you @klion26

@klion26 klion26 deleted the add_remaining_primitive branch August 21, 2025 02:59
@klion26
Copy link
Copy Markdown
Member Author

klion26 commented Aug 21, 2025

@alamb thank you for the review and merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Add support the remaing primitive type(timestamp_nanos/timestampntz_nanos/uuid) for parquet variant

2 participants