Skip to content

[Variant] Add human-readable impl Debug for Variant#8140

Merged
alamb merged 1 commit intoapache:mainfrom
scovich:variant-debug
Aug 14, 2025
Merged

[Variant] Add human-readable impl Debug for Variant#8140
alamb merged 1 commit intoapache:mainfrom
scovich:variant-debug

Conversation

@scovich
Copy link
Copy Markdown
Contributor

@scovich scovich commented Aug 14, 2025

Which issue does this PR close?

Rationale for this change

Unit tests need a way to verify two Variant are logically equivalent, and Debug is a good way to achieve that without wading into the complexities of a proper PartialEq implementation that would become part of the public API.

More generally, byte slices are not very easy for humans to interpret, so it makes sense for Debug to do something nicer.

What changes are included in this PR?

Manually impl Debug for Variant, maintaining a traditional look but with nicer handling of Variant::Binary, Variant::Object and Variant::List subtypes. The latter two use fallible iteration to avoid potential panics, since Debug is likely to be used when formatting error messages.

Are these changes tested?

New unit test

Are there any user-facing changes?

The debug formatting of Variant has changed.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 14, 2025
Comment on lines +1524 to +1528
"decimal16": Decimal16(
VariantDecimal16 {
integer: 123456789012345678901234567890,
scale: 4,
},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was very tempting to do something more clever for these decimal subtypes, but ultimately I decided that this is Debug which generally matches the physical layout of the objects even when it's a bit verbose.

Ditto for ShortString.

Happy to reconsider tho!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good

Comment on lines +1289 to +1292
// helper to print <invalid> instead of "<invalid>" in debug mode when a VariantObject or VariantList contains invalid values.
struct InvalidVariant;

impl std::fmt::Debug for InvalidVariant {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea? Bad? Terrible?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer if you just used the the string directly.

So instead of

                        Err(_) => map.entry(&InvalidVariant, &InvalidVariant),

do

                        Err(_) => map.entry(&"invalid", &"invalid"),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I had at first, but it would print e.g.

{
    "invalid": "invalid",
}

Since "invalid": "invalid" is valid JSON, it seemed better to have something unambiguous:

{
    <invalid>: <invalid>,
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there's certainly an argument to be made that this is overkill for an error case?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine. Let's go with this and if someone else comes up with something better we can use that

Comment on lines +1298 to +1303
// helper to print binary data in hex format in debug mode, as space-separated hex byte values.
struct HexString<'a>(&'a [u8]);

impl<'a> std::fmt::Debug for HexString<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if let Some((first, rest)) = self.0.split_first() {
Copy link
Copy Markdown
Contributor Author

@scovich scovich Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most hex editors use this format, 01 02 03 04 de ad be ef as more compact and readable than something comma-separated.

I intentionally chose not to honor alt syntax for this one because IMO both of these look weird:

Binary(
),
Binary(
    01 02 03 04 de ad be ef,
),

vs.

Binary(),
Binary(01 02 03 04 de ad be ef),

Happy to consider other perspectives tho!

(honestly, I wish the alt-printing for tuple types would always avoid line breaks when there's only a single value, but oh well)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this is good I think

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind this is a clear improvement -- thank you @scovich

I had some comments about InvalidVariant but I also think it is fine the way you have it too

Comment on lines +1289 to +1292
// helper to print <invalid> instead of "<invalid>" in debug mode when a VariantObject or VariantList contains invalid values.
struct InvalidVariant;

impl std::fmt::Debug for InvalidVariant {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer if you just used the the string directly.

So instead of

                        Err(_) => map.entry(&InvalidVariant, &InvalidVariant),

do

                        Err(_) => map.entry(&"invalid", &"invalid"),

Comment on lines +1298 to +1303
// helper to print binary data in hex format in debug mode, as space-separated hex byte values.
struct HexString<'a>(&'a [u8]);

impl<'a> std::fmt::Debug for HexString<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if let Some((first, rest)) = self.0.split_first() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this is good I think

Comment on lines +1524 to +1528
"decimal16": Decimal16(
VariantDecimal16 {
integer: 123456789012345678901234567890,
scale: 4,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good

@alamb alamb merged commit 4bb9127 into apache:main Aug 14, 2025
13 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Aug 14, 2025

Thank you @scovich

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants