[SPARK-56403] Refactor kafka test so it's skipped when dependency is not available by gaogaotiantian · Pull Request #55266 · apache/spark

gaogaotiantian · 2026-04-08T22:14:47Z

What changes were proposed in this pull request?

Add a util method has_dependencies to KafkaUtils
Move all the module-level code of the kafka test inside classes
Skip the kafka test when a dependency is not available, instead of raising an error
Restore os.environ after the test
Use the testing utility in __main__ instead of the old entry point
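
The skip-and-restore pattern in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `KafkaUtilsSketch`, `KafkaStreamingSketchTests`, and the `SKETCH_KAFKA_FLAG` variable are hypothetical names, and the body of `has_dependencies` is an assumption about how such a check might be written.

```python
import importlib.util
import os
import unittest


class KafkaUtilsSketch:
    """Hypothetical stand-in for KafkaUtils; not the PR's implementation."""

    # Module names taken from the review discussion.
    REQUIRED_MODULES = ("testcontainers", "kafka")

    @classmethod
    def has_dependencies(cls) -> bool:
        # find_spec returns None when a top-level module cannot be located,
        # so nothing is imported and nothing is raised for a missing package.
        return all(
            importlib.util.find_spec(name) is not None
            for name in cls.REQUIRED_MODULES
        )


@unittest.skipIf(
    not KafkaUtilsSketch.has_dependencies(),
    "Kafka test dependencies are not installed",
)
class KafkaStreamingSketchTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Snapshot the environment before the test class changes it.
        cls._saved_environ = dict(os.environ)
        os.environ["SKETCH_KAFKA_FLAG"] = "1"  # hypothetical variable

    @classmethod
    def tearDownClass(cls):
        # Put os.environ back exactly as it was before setUpClass ran.
        os.environ.clear()
        os.environ.update(cls._saved_environ)

    def test_env_is_set(self):
        self.assertEqual(os.environ.get("SKETCH_KAFKA_FLAG"), "1")
```

Because the setup lives in `setUpClass` rather than at module level, importing the test module has no side effects, and a skipped class never touches the environment at all.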

Why are the changes needed?

We don't want a test to fail when an optional dependency is not available. It's breaking our CI: https://github.com/apache/spark/actions/runs/24128422095 . The test itself should not have much module-level code, and should have minimal side effects on the environment.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Confirmed the test was correctly skipped when the dependency was not available. CI should confirm whether the test still works properly.

Was this patch authored or co-authored using generative AI tooling?

No.

gaogaotiantian · 2026-04-08T22:15:06Z

@jerrypeng as the original author.

zhengruifeng · 2026-04-09T05:35:25Z

python/pyspark/sql/tests/streaming/kafka_utils.py

        self.broker = None

+    @classmethod
+    def has_dependencies(cls) -> bool:


can we add

    have_testcontainers = have_package("testcontainers")
    testcontainers_requirement_message = (
        "" if have_testcontainers else "No module named 'testcontainers'"
    )
    have_kafka = have_package("kafka")
    kafka_requirement_message = "" if have_kafka else "No module named 'kafka'"

in python/pyspark/testing/utils.py?

these deps were centralized there
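
A rough sketch of how such centralized flags might feed a skip decorator is shown below. The flag and message names come from the comment above; the body of `have_package` and the `KafkaDependentSketchTests` class are assumptions made for illustration, not the contents of python/pyspark/testing/utils.py.

```python
import importlib.util
import unittest


def have_package(name: str) -> bool:
    # Assumed implementation: report whether a top-level module can be
    # located, without actually importing it.
    return importlib.util.find_spec(name) is not None


have_testcontainers = have_package("testcontainers")
testcontainers_requirement_message = (
    "" if have_testcontainers else "No module named 'testcontainers'"
)

have_kafka = have_package("kafka")
kafka_requirement_message = "" if have_kafka else "No module named 'kafka'"


@unittest.skipIf(
    not (have_testcontainers and have_kafka),
    # Report the first missing dependency as the skip reason.
    testcontainers_requirement_message or kafka_requirement_message,
)
class KafkaDependentSketchTests(unittest.TestCase):  # hypothetical test class
    def test_dependencies_present(self):
        # Only runs when both packages were found.
        self.assertTrue(have_testcontainers and have_kafka)
```

Keeping the flags in one shared module means every test file skips with the same reason string instead of re-implementing the check.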

viirya

I wonder whether we should install the dependency and let the pipeline run the tests instead of skipping them?

jerrypeng · 2026-04-09T05:46:39Z

@gaogaotiantian I see the test is passing without problems for other PRs. What is special about this run (https://github.com/apache/spark/actions/runs/24128422095) that causes it to fail due to missing dependencies?

As @viirya mentioned, can we just install the dependencies instead of skipping the tests?

jerrypeng · 2026-04-09T05:50:17Z

What was the point of the other PR https://github.com/apache/spark/pull/55270/changes ? It didn't solve the issue?

zhengruifeng · 2026-04-09T06:09:23Z

@jerrypeng @viirya
there are special testing envs for different purposes, e.g.

https://github.com/apache/spark/actions/workflows/build_python_minimum.yml to check the mandatory dependencies with old versions;
https://github.com/apache/spark/actions/workflows/build_python_3.12_macos26.yml to test on macos
https://github.com/apache/spark/actions/workflows/build_python_3.12_pandas_3.yml to check the pandas 3 compatibility

you can add the new dependencies to them if it makes sense and makes CI pass

viirya · 2026-04-09T06:28:09Z

I'm okay to skip them on test pipelines with special purposes such as dependencies with old versions, etc.

Refactor kafka test so it won't break CI

873335c

HyukjinKwon approved these changes Apr 9, 2026

HeartSaVioR mentioned this pull request Apr 9, 2026

[SPARK-55306][PYTHON][TESTS][FOLLOW-UP] Skip Kafka streaming RTM tests when dependencies are not installed #55270

Closed

zhengruifeng reviewed Apr 9, 2026

viirya reviewed Apr 9, 2026

gaogaotiantian wants to merge 1 commit into apache:master from gaogaotiantian:refactor-kafka-tests
5 participants