
[SPARK-56403] Refactor kafka test so it's skipped when dependency is not available #55266

Open
gaogaotiantian wants to merge 1 commit into apache:master from gaogaotiantian:refactor-kafka-tests

@gaogaotiantian
Contributor

What changes were proposed in this pull request?

  • Add a utility method has_dependencies to KafkaUtils
  • Move all module-level code of the Kafka test inside classes
  • Skip the Kafka test when a dependency is not available, instead of raising an error
  • Restore os.environ after the test
  • Use the testing utility in __main__ instead of the old entry point
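
As a rough illustration of the first three bullets, the sketch below shows one way a dependency check on a helper class can turn a missing optional package into a skip rather than an error. It is not the code from this PR: `KafkaUtilsSketch`, `KafkaStreamingTestSketch`, `run_sketch`, and the `REQUIRED_MODULES` tuple are hypothetical names, and the two module names are taken from the review discussion further down.

```python
import importlib.util
import unittest


class KafkaUtilsSketch:
    """Hypothetical stand-in for the PR's KafkaUtils helper."""

    # Assumed module names; the real helper may check something different.
    REQUIRED_MODULES = ("testcontainers", "kafka")

    @classmethod
    def has_dependencies(cls) -> bool:
        # find_spec returns None when a top-level module cannot be located,
        # and it does not import the module.
        return all(
            importlib.util.find_spec(name) is not None
            for name in cls.REQUIRED_MODULES
        )


class KafkaStreamingTestSketch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Raising SkipTest here skips the whole class instead of failing it.
        if not KafkaUtilsSketch.has_dependencies():
            raise unittest.SkipTest("Kafka test dependencies are not installed")
        # Optional imports and broker start-up would go here,
        # not at module level.

    def test_placeholder(self):
        self.assertTrue(KafkaUtilsSketch.has_dependencies())


def run_sketch() -> unittest.TestResult:
    # Run the class in-process and hand back the collected result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KafkaStreamingTestSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_sketch()` on a machine without the optional packages should report one skipped entry and no errors; with the packages installed, the placeholder test runs instead.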

Why are the changes needed?

We don't want a test to fail when an optional dependency is not available. It is breaking our CI (https://github.com/apache/spark/actions/runs/24128422095). The test itself should not have much module-level code and should have minimal side effects on the environment.
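
The point about side effects can be illustrated with a small snapshot-and-restore pattern for `os.environ`. This is only a sketch under assumptions: the class name `EnvRestoringTestSketch` and the variable `SKETCH_KAFKA_FLAG` are hypothetical, and the PR may restore the environment differently.

```python
import os
import unittest


class EnvRestoringTestSketch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Snapshot the environment before the test class mutates it.
        cls._saved_environ = dict(os.environ)
        os.environ["SKETCH_KAFKA_FLAG"] = "1"  # hypothetical variable

    @classmethod
    def tearDownClass(cls):
        # Restore in place so existing references to os.environ stay valid.
        os.environ.clear()
        os.environ.update(cls._saved_environ)

    def test_flag_visible(self):
        # The variable is visible while the class is running.
        self.assertEqual(os.environ["SKETCH_KAFKA_FLAG"], "1")
```

Restoring in `tearDownClass` keeps the mutation scoped to the test class, so later tests in the same process see the environment they started with.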

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Confirmed that the test was correctly skipped when the dependency was not available. CI should confirm whether the test still works properly.

Was this patch authored or co-authored using generative AI tooling?

No.

@gaogaotiantian
Contributor Author

@jerrypeng as the original author.

self.broker = None

@classmethod
def has_dependencies(cls) -> bool:
Contributor

Can we add

have_testcontainers = have_package("testcontainers")
testcontainers_requirement_message = (
    "" if have_testcontainers else "No module named 'testcontainers'"
)

have_kafka = have_package("kafka")
kafka_requirement_message = "" if have_kafka else "No module named 'kafka'"

in python/pyspark/testing/utils.py?

These deps are centralized there.
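
A minimal sketch of how such centralized flags are typically consumed by a test class follows. The `have_package` body here is an assumption (a plain `importlib.util.find_spec` check) rather than a quote of the helper in python/pyspark/testing/utils.py, and `KafkaFlagTestSketch` is a hypothetical class name.

```python
import importlib.util
import unittest


def have_package(name: str) -> bool:
    # Assumed behaviour: True when the top-level module can be located.
    # Nothing is imported by this check.
    return importlib.util.find_spec(name) is not None


have_testcontainers = have_package("testcontainers")
testcontainers_requirement_message = (
    "" if have_testcontainers else "No module named 'testcontainers'"
)

have_kafka = have_package("kafka")
kafka_requirement_message = "" if have_kafka else "No module named 'kafka'"


@unittest.skipIf(
    not have_testcontainers or not have_kafka,
    # The first non-empty message becomes the skip reason.
    testcontainers_requirement_message or kafka_requirement_message,
)
class KafkaFlagTestSketch(unittest.TestCase):  # hypothetical test class
    def test_placeholder(self):
        self.assertTrue(have_testcontainers and have_kafka)
```

Keeping the flags and messages in one shared module means every test file can reuse the same skip condition and reason string instead of re-implementing the check.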

Member

@viirya left a comment

I wonder whether we should install the dependency and let the pipeline run the tests instead of skipping them.

@jerrypeng
Contributor

jerrypeng commented Apr 9, 2026

@gaogaotiantian I see the test passing without problems on other PRs. What is special about this run (https://github.com/apache/spark/actions/runs/24128422095) that causes it to fail due to missing dependencies?

As @viirya mentioned, can we just install the dependencies instead of skipping the tests?

@jerrypeng
Contributor

What was the point of the other PR, https://github.com/apache/spark/pull/55270/changes ? Did it not solve the issue?

@zhengruifeng
Contributor

@jerrypeng @viirya
There are special testing environments for different purposes, e.g.

You can add the new dependencies to them if that makes sense and makes CI pass.

@viirya
Member

viirya commented Apr 9, 2026

I'm okay with skipping them on test pipelines with special purposes, such as those that pin old dependency versions.


5 participants