fix: support to_json on Spark 4.0 #4036

Merged
mbutrovich merged 1 commit into apache:main from andygrove:fix-3920-to-json-spark4 on Apr 22, 2026
@andygrove
Member

Which issue does this PR close?

Closes #3920.

Rationale for this change

In Spark 4.0, StructsToJson was changed to extend RuntimeReplaceable. Its replacement is Invoke(Literal(StructsToJsonEvaluator), "evaluate", ...), so by the time Comet's serde walks the optimized plan it sees only the Invoke node. Comet has no Invoke handler, so the four to_json tests fell back with "COMET: invoke is not supported" and were skipped on Spark 4.0 via assume(!isSpark40Plus).
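The failure mode described above can be illustrated with a small, self-contained toy model. This is only a sketch: the types below (`Expr`, `Attr`, `Literal`, `Invoke`, `StructsToJson`, `StructsToJsonEvaluator`) are simplified stand-ins written for this example and are not Spark's or Comet's real classes, and `ReplacementDemo`, `replace`, and `serialize` are hypothetical names.

```scala
// Toy model only: simplified stand-ins, not Spark's or Comet's real classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Literal(value: Any) extends Expr
case class Invoke(target: Expr, method: String, args: Seq[Expr]) extends Expr
case class StructsToJson(
    options: Map[String, String],
    child: Expr,
    timeZoneId: Option[String]) extends Expr

// Stand-in for the evaluator object carried inside the replacement as a literal.
case class StructsToJsonEvaluator(
    options: Map[String, String],
    timeZoneId: Option[String])

object ReplacementDemo {
  // Mimics the optimizer swapping a runtime-replaceable expression
  // for its Invoke-based replacement.
  def replace(e: Expr): Expr = e match {
    case StructsToJson(opts, child, tz) =>
      Invoke(Literal(StructsToJsonEvaluator(opts, tz)), "evaluate", Seq(child))
    case other => other
  }

  // Mimics a serde that understands StructsToJson but has no Invoke handler.
  def serialize(e: Expr): Either[String, String] = e match {
    case StructsToJson(_, Attr(n), _) => Right(s"to_json($n)")
    case _: Invoke                    => Left("invoke is not supported")
    case other                        => Left(s"unsupported: $other")
  }

  def main(args: Array[String]): Unit = {
    val original = StructsToJson(Map.empty, Attr("s"), Some("UTC"))
    println(serialize(original))          // Right(to_json(s))
    println(serialize(replace(original))) // Left(invoke is not supported)
  }
}
```

In this model the serde handles the original expression but rejects the replaced one, which mirrors why the tests fell back once the plan contained only the Invoke node.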

What changes are included in this PR?

  • Add a Spark 4-only matcher in CometExprShim.versionSpecificExprToProtoInternal that detects Invoke(Literal(_: StructsToJsonEvaluator, _), "evaluate", Seq(child)), reconstructs the original StructsToJson(options, child, timeZoneId) from the evaluator's accessors, and recurses through exprToProtoInternal so CometStructsToJson's incompat / support-level checks still apply.
  • Re-enable the four previously-skipped to_json tests in CometExpressionSuite and CometJsonExpressionSuite.
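The matcher in the first bullet can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: every type here is a simplified stand-in for the Spark class of the same name, `ShimDemo` and `versionSpecific` are hypothetical names, `exprToProto` returns a string instead of a protobuf, and the "options must be empty" rule is an invented placeholder for CometStructsToJson's real support-level checks.

```scala
// Toy model only: simplified stand-ins, not Spark's or Comet's real classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Literal(value: Any) extends Expr
case class Invoke(target: Expr, method: String, args: Seq[Expr]) extends Expr
case class StructsToJson(
    options: Map[String, String],
    child: Expr,
    timeZoneId: Option[String]) extends Expr
case class StructsToJsonEvaluator(
    options: Map[String, String],
    timeZoneId: Option[String])

object ShimDemo {
  // Version-specific hook (hypothetical): recognizes only the exact Invoke
  // shape, rebuilds the original expression, and recurses into the normal path.
  def versionSpecific(e: Expr): Option[Either[String, String]] = e match {
    case Invoke(Literal(ev: StructsToJsonEvaluator), "evaluate", Seq(child)) =>
      Some(exprToProto(StructsToJson(ev.options, child, ev.timeZoneId)))
    case _ => None
  }

  // Normal serde path. Because the shim recurses here, the same checks run
  // whether the expression arrived directly or was reconstructed.
  def exprToProto(e: Expr): Either[String, String] =
    versionSpecific(e).getOrElse(e match {
      // Placeholder support-level check, invented for this sketch.
      case StructsToJson(opts, _, _) if opts.nonEmpty =>
        Left("to_json with options is not supported")
      case StructsToJson(_, Attr(n), _) => Right(s"to_json($n)")
      case _: Invoke                    => Left("invoke is not supported")
      case other                        => Left(s"unsupported: $other")
    })

  def main(args: Array[String]): Unit = {
    val ok = Invoke(
      Literal(StructsToJsonEvaluator(Map.empty, Some("UTC"))), "evaluate", Seq(Attr("s")))
    val withOpts = Invoke(
      Literal(StructsToJsonEvaluator(Map("pretty" -> "true"), None)), "evaluate", Seq(Attr("s")))
    val unrelated = Invoke(Literal("something else"), "evaluate", Seq(Attr("s")))

    println(exprToProto(ok))        // Right(to_json(s))
    println(exprToProto(withOpts))  // Left(to_json with options is not supported)
    println(exprToProto(unrelated)) // Left(invoke is not supported)
  }
}
```

Matching on the evaluator's type, the method name, and a single argument keeps the shim narrow, so any other Invoke still falls back as before.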

How are these changes tested?

The four tests that were guarded by assume(!isSpark40Plus) now run and pass on Spark 4.0:

  • CometExpressionSuite: to_json
  • CometExpressionSuite: to_json escaping of field names and string values
  • CometExpressionSuite: to_json unicode
  • CometJsonExpressionSuite: to_json - all supported types

Also verified the same tests still pass on Spark 3.5 (default profile) and that the rest of CometJsonExpressionSuite (the from_json tests) is unaffected on Spark 4.

In Spark 4.0, StructsToJson is a RuntimeReplaceable whose replacement is
Invoke(Literal(StructsToJsonEvaluator), "evaluate", ...). Comet's serde
saw the post-replacement Invoke and fell back with "invoke is not
supported". Add a Spark 4-only matcher in CometExprShim that detects this
specific Invoke shape, reconstructs StructsToJson from the evaluator's
options, child, and timeZoneId, and recurses through exprToProtoInternal
so support-level checks still apply. Re-enable the four to_json tests
that were skipped on Spark 4.

Closes apache#3920
@andygrove
Member Author

@kazantsev-maksim fyi

Contributor

@mbutrovich mbutrovich left a comment

Thanks @andygrove!

@mbutrovich mbutrovich merged commit 6e6a2de into apache:main Apr 22, 2026
133 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for to_json compatibility with Spark 4.0.0.
