[Variant] Fix broken metadata builder rollback #8135

Merged
alamb merged 8 commits into apache:main from scovich:variant-nested-rollback on Aug 18, 2025
Conversation

Contributor

@scovich scovich commented Aug 13, 2025

Which issue does this PR close?

Rationale for this change

New unit tests demonstrate that variant builder rollback was broken, producing various validation failures. The problem was subtle -- using buffer length instead of field count when rolling back metadata builder state.
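
To make the failure mode concrete, here is a minimal sketch of how restoring to a saved byte length instead of a saved field count can silently keep entries that should have been discarded. `ToyMetadataBuilder` and its methods are hypothetical stand-ins invented for this illustration, not the parquet-variant types, and the real bug may differ in detail.

```rust
/// Toy stand-in for a field-name dictionary builder.
/// Hypothetical type for illustration only; not the parquet-variant API.
#[derive(Debug, Default, Clone)]
struct ToyMetadataBuilder {
    field_names: Vec<String>,
    /// Total bytes the field names would occupy in a serialized buffer.
    byte_len: usize,
}

impl ToyMetadataBuilder {
    /// Returns the id of `name`, inserting it if it is not present yet.
    fn upsert(&mut self, name: &str) -> usize {
        if let Some(id) = self.field_names.iter().position(|n| n == name) {
            return id;
        }
        self.field_names.push(name.to_string());
        self.byte_len += name.len();
        self.field_names.len() - 1
    }

    fn num_fields(&self) -> usize {
        self.field_names.len()
    }

    /// Rolls back to an earlier state. The argument must be a field COUNT.
    fn rollback_to(&mut self, saved_num_fields: usize) {
        self.field_names.truncate(saved_num_fields);
        self.byte_len = self.field_names.iter().map(|n| n.len()).sum();
    }
}

fn main() {
    let mut md = ToyMetadataBuilder::default();
    md.upsert("id");
    md.upsert("name");

    // Save point taken before a nested builder starts adding field names.
    let saved_fields = md.num_fields(); // 2 entries
    let saved_bytes = md.byte_len; // 6 bytes ("id" + "name")

    // The nested builder adds names and is then abandoned.
    md.upsert("abandoned_a");
    md.upsert("abandoned_b");

    // Wrong unit: a byte length passed where a count is expected.
    // Truncating 4 entries to "6" is a no-op, so the stale names survive.
    let mut buggy = md.clone();
    buggy.rollback_to(saved_bytes);
    assert_eq!(buggy.num_fields(), 4);

    // Right unit: the saved field count restores the earlier dictionary.
    md.rollback_to(saved_fields);
    assert_eq!(md.num_fields(), 2);
}
```

Both values are plain `usize`, so the compiler cannot catch the mix-up; a descriptive name for the saved value is the main guard against it.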

What changes are included in this PR?

Fix the bug, and fix two existing unit tests that expected wrong behavior.

While we're at it, add a human-readable impl Debug for Variant, which gives a convenient way of comparing two variant values. Also add the missing VariantBuilder::[try_]with_value methods that the other two builders already had.
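
For readers unfamiliar with the `with_value` / `try_with_value` pairing, the sketch below shows the general shape of such chaining methods on a toy builder. `ToyValue`, `ToyListBuilder`, and the "empty text is invalid" rule are assumptions made up for this example; only the method names mirror the ones mentioned in this PR.

```rust
/// Hypothetical value type for illustration; not `parquet_variant::Variant`.
#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Int(i64),
    Text(String),
}

impl From<i64> for ToyValue {
    fn from(v: i64) -> Self {
        ToyValue::Int(v)
    }
}

impl From<&str> for ToyValue {
    fn from(v: &str) -> Self {
        ToyValue::Text(v.to_string())
    }
}

/// Hypothetical builder that collects values into a list.
#[derive(Debug, Default)]
struct ToyListBuilder {
    values: Vec<ToyValue>,
}

impl ToyListBuilder {
    /// Fallible, by-reference append. The validation rule is invented.
    fn try_append_value<T: Into<ToyValue>>(&mut self, value: T) -> Result<(), String> {
        let value = value.into();
        if let ToyValue::Text(s) = &value {
            if s.is_empty() {
                return Err("empty text is not allowed".to_string());
            }
        }
        self.values.push(value);
        Ok(())
    }

    /// Fallible chaining form: consumes the builder and returns it on success.
    fn try_with_value<T: Into<ToyValue>>(mut self, value: T) -> Result<Self, String> {
        self.try_append_value(value)?;
        Ok(self)
    }

    /// Panicking chaining form, implemented on top of the fallible one.
    fn with_value<T: Into<ToyValue>>(self, value: T) -> Self {
        self.try_with_value(value).expect("invalid value")
    }
}

fn main() {
    // Chaining keeps construction in a single expression.
    let list = ToyListBuilder::default().with_value(1i64).with_value("a");
    assert_eq!(list.values.len(), 2);

    // The fallible form surfaces the error instead of panicking.
    assert!(ToyListBuilder::default().try_with_value("").is_err());
}
```

The panicking form suits tests and literals known to be valid, while the `try_` form is the safer choice for untrusted input.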

Are these changes tested?

New and existing unit tests cover the changes.

Are there any user-facing changes?

Output of impl Debug for Variant changed.

Two new VariantBuilder methods.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 13, 2025
@scovich scovich force-pushed the variant-nested-rollback branch from 59f9e60 to 27b3cdb Compare August 14, 2025 03:04
@scovich scovich changed the title [Variant] Nested builder rollback is broken [Variant] Fix broken metadata builder rollback Aug 14, 2025
Comment thread parquet-variant/src/variant.rs Outdated
Comment thread parquet-variant/src/builder.rs Outdated
Contributor

@alamb alamb left a comment

Thank you @scovich -- this is a very nice find. I had some suggestions, the most important of which, I think, are the naming of metadata_current_offset and the Debug display.

Comment thread parquet-variant/src/builder.rs
///
/// This method will panic if the variant contains duplicate field names in objects
/// when validation is enabled. For a fallible version, use [`ListBuilder::try_with_value`].
pub fn with_value<'m, 'd, T: Into<Variant<'m, 'd>>>(mut self, value: T) -> Self {
Contributor

nice!

Comment thread parquet-variant/src/builder.rs Outdated
Comment thread parquet-variant/src/builder.rs Outdated
Comment thread parquet-variant/src/variant.rs Outdated
impl std::fmt::Debug for Variant<'_, '_> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Variant::Null => write!(f, "null"),
Contributor

I think the debug output should also include the enum variant name; otherwise one couldn't tell the difference between Variant::Int8(42) and Variant::Int16(42), as both would be rendered as 42.

So perhaps we can keep

Suggested change
- Variant::Null => write!(f, "null"),
+ Variant::Null => write!(f, "Variant::Null"),

And then apply some nicer formatting to the binary / object ones
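
A small sketch of the ambiguity, using a hypothetical `ToyVariant` enum rather than the real `Variant` type: a value-only rendering collapses the two integer widths into the same string, while Rust's derived `Debug` keeps the variant name.

```rust
use std::fmt;

/// Hypothetical stand-in for illustration; not `parquet_variant::Variant`.
/// The derived `Debug` prints the variant name along with the payload.
#[derive(Debug)]
enum ToyVariant {
    Null,
    Int8(i8),
    Int16(i16),
}

/// Wrapper that renders only the value, dropping the variant name.
struct Bare<'a>(&'a ToyVariant);

impl fmt::Display for Bare<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            ToyVariant::Null => write!(f, "null"),
            ToyVariant::Int8(v) => write!(f, "{v}"),
            ToyVariant::Int16(v) => write!(f, "{v}"),
        }
    }
}

fn main() {
    let a = ToyVariant::Int8(42);
    let b = ToyVariant::Int16(42);

    // Value-only output: both render as "42" and cannot be told apart.
    assert_eq!(Bare(&a).to_string(), Bare(&b).to_string());

    // Derived Debug keeps the variant name: "Int8(42)" vs "Int16(42)".
    assert_eq!(format!("{:?}", a), "Int8(42)");
    assert_eq!(format!("{:?}", b), "Int16(42)");
    assert_eq!(format!("{:?}", ToyVariant::Null), "Null");
}
```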

Contributor Author

I pulled out the impl Debug as its own PR that should merge first:

Let's pick up the conversation there as needed; this PR will shrink a lot after rebasing on top of it.

Comment thread parquet-variant/src/builder.rs Outdated
@scovich scovich requested a review from alamb August 14, 2025 15:29
Contributor

@alamb alamb left a comment

Thank you -- I think this looks great @scovich

Contributor

alamb commented Aug 14, 2025

FYI @klion26 @abacef and @friendlymatthew as you may be interested in this PR

Member

@klion26 klion26 left a comment

Thank you for the fix, LGTM

list.finish();
}
}
if i % skip != 0 {
Member

Nice test!

Do we need to make the finish conditions for the inner and outer objects different, so that the field names added by the inner and outer objects are different?

Contributor Author

Each loop has a different i (they shadow each other). No field name is used twice, and skipped field names are not used at all.

Contributor Author

scovich commented Aug 15, 2025

Any reason not to merge this?

Contributor

alamb commented Aug 18, 2025

> Any reason not to merge this?

Nope -- sorry -- was behind on alerts. Merged!

@alamb alamb merged commit 7d90679 into apache:main Aug 18, 2025
12 checks passed

Labels

parquet-variant parquet-variant* crates

Development

Successfully merging this pull request may close these issues.

[Variant] Nested builder rollback is broken

3 participants