
[SPARK-56388][CONNECT] Add XML support to Spark Connect Parse protocol#55262

Open
hvanhovell wants to merge 1 commit into apache:master from hvanhovell:SPARK-56388

@hvanhovell

What changes were proposed in this pull request?

Added PARSE_FORMAT_XML to the Parse.ParseFormat enum in the Spark Connect proto
(relations.proto) and wired it through the full stack:

  • Client (DataFrameReader.scala): xml(Dataset[String]) now sends PARSE_FORMAT_XML
    instead of PARSE_FORMAT_UNSPECIFIED
  • Server (SparkConnectPlanner.scala): handles PARSE_FORMAT_XML by dispatching to
    dataFrameReader.xml(ds)
  • Generated proto files: regenerated the Python proto bindings
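The client-to-server flow above can be modeled in a few lines. This is a minimal Python sketch, not the actual Scala code in DataFrameReader.scala or SparkConnectPlanner.scala. `Parse`, `client_xml`, and `plan_parse` are hypothetical stand-ins, and the numeric enum values (in particular PARSE_FORMAT_XML = 3) are assumptions. It shows why a request tagged PARSE_FORMAT_UNSPECIFIED is rejected and how the new enum value routes the request to the XML reader.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Dict, List, Optional


class ParseFormat(IntEnum):
    # Models Parse.ParseFormat in relations.proto.
    # ASSUMPTION: numeric values are illustrative; check the regenerated proto.
    PARSE_FORMAT_UNSPECIFIED = 0
    PARSE_FORMAT_CSV = 1
    PARSE_FORMAT_JSON = 2
    PARSE_FORMAT_XML = 3


@dataclass
class Parse:
    # Hypothetical stand-in for the Parse relation the client sends.
    input: List[str]
    format: ParseFormat
    schema: Optional[str] = None
    options: Dict[str, str] = field(default_factory=dict)


def client_xml(ds: List[str], schema: Optional[str] = None) -> Parse:
    # With this change the client tags the request as XML
    # instead of leaving the format unspecified.
    return Parse(input=ds, format=ParseFormat.PARSE_FORMAT_XML, schema=schema)


def plan_parse(rel: Parse, readers: Dict[str, Callable[[List[str]], str]]) -> str:
    # Hypothetical planner dispatch: one branch per supported format,
    # analogous to calling dataFrameReader.csv/json/xml(ds) on the server.
    if rel.format == ParseFormat.PARSE_FORMAT_CSV:
        return readers["csv"](rel.input)
    if rel.format == ParseFormat.PARSE_FORMAT_JSON:
        return readers["json"](rel.input)
    if rel.format == ParseFormat.PARSE_FORMAT_XML:
        return readers["xml"](rel.input)
    # An unspecified format has no reader, so the request fails.
    raise ValueError(f"Unsupported parse format: {rel.format.name}")
```

In this model, `plan_parse(client_xml(rows), readers)` reaches the `"xml"` reader, while a `Parse` built with `PARSE_FORMAT_UNSPECIFIED` raises `ValueError`, mirroring the failure the old client triggered.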

Why are the changes needed?

DataFrameReader.xml(Dataset[String]) sent PARSE_FORMAT_UNSPECIFIED to the server, which has no
handler for that format, so the call failed. This change adds proper XML support to the Connect
Parse protocol.

Does this PR introduce any user-facing change?

Yes. spark.read.xml(dataset) now works correctly in Spark Connect (previously it threw an error).

How was this patch tested?

  • Added ClientE2ETestSuite tests: "xml from Dataset[String] inferSchema",
    "xml from Dataset[String] with schema", and "xml from Dataset[String] with invalid schema"
  • Added PlanGenerationTestSuite test: "xml from dataset"
  • Added query plan golden files: xml_from_dataset.json / xml_from_dataset.proto.bin
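The golden files pin the serialized plan, so any accidental change to how the client builds the Parse relation shows up as a test diff. The following is a minimal Python sketch of such a golden-file comparison, assuming the plan is represented as a plain dict. The `plan_to_json` and `check_golden` helpers and the field names are hypothetical and do not reflect the exact layout of xml_from_dataset.json.

```python
import json
from typing import Any, Dict


def plan_to_json(plan: Dict[str, Any]) -> str:
    # Canonical form: sorted keys and fixed indentation keep the output stable,
    # so unrelated key-ordering changes do not register as diffs.
    return json.dumps(plan, sort_keys=True, indent=2)


def check_golden(plan: Dict[str, Any], golden: str) -> None:
    # Compare the freshly generated plan with the checked-in golden copy,
    # normalizing both sides the same way before comparing.
    actual = plan_to_json(plan)
    expected = plan_to_json(json.loads(golden))
    if actual != expected:
        raise AssertionError(f"plan mismatch:\n{actual}\n!=\n{expected}")
```

Normalizing both sides through the same serializer is the design choice that matters here: the check fails only when the plan content changes, for example if the format reverted to PARSE_FORMAT_UNSPECIFIED.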

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

Add PARSE_FORMAT_XML to the Connect Parse proto enum and wire it through
DataFrameReader.xml(Dataset[String]) on the client and SparkConnectPlanner
on the server. Includes E2E tests and plan generation test coverage.

Co-authored-by: Herman van Hövell