Commit d51b39b

gaogaotiantian authored and HyukjinKwon committed
[SPARK-56186][PYTHON] Retire pypy
### What changes were proposed in this pull request?

We retire PyPy:

* Remove all PyPy-related code in PySpark (the only part that really mattered is a simple traceback helper, so it will probably still work)
* Remove all PyPy skips for tests
* Remove master CI for PyPy. **branch-4.0 and branch-4.1 tests are kept**
* Remove the PyPy 3.11 docker image (3.10 is kept for testing)
* Remove PyPy from the docs (we should probably do the same for the actual Spark website too)

### Why are the changes needed?

We had a discussion in https://lists.apache.org/thread/glcq0zgr33sozo7y4y7jqph24yh3m92p about dropping support for PyPy, with many +1s and no -1s. `numpy` has dropped support for PyPy, and PyPy is not really in active development.

### Does this PR introduce _any_ user-facing change?

Yes, we no longer officially support PyPy. We still expect most of the old PyPy code to work, but we make no promises.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54988 from gaogaotiantian/retire-pypy.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
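The PyPy-specific code paths being removed were typically gated on the running interpreter's implementation name. A minimal sketch of that kind of gate, using only the standard library (`should_skip_under` is an illustrative name, not a function from the Spark codebase):

```python
import platform

def should_skip_under(excluded_implementations):
    """Return True when the current interpreter appears in the exclusion list.

    platform.python_implementation() returns e.g. "CPython" or "PyPy",
    which is how test exclusions like excluded_python_implementations=["PyPy"]
    were matched against the running interpreter.
    """
    return platform.python_implementation() in excluded_implementations

# Under CPython, a PyPy-only exclusion does not trigger a skip.
skip_ml_tests = should_skip_under(["PyPy"])
```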
1 parent 1010dab, commit d51b39b

File tree

18 files changed: +15 −268 lines

.github/workflows/build_infra_images_cache.yml

Lines changed: 0 additions & 14 deletions
@@ -33,7 +33,6 @@ on:
       - 'dev/spark-test-image/python-minimum/Dockerfile'
       - 'dev/spark-test-image/python-ps-minimum/Dockerfile'
       - 'dev/spark-test-image/pypy-310/Dockerfile'
-      - 'dev/spark-test-image/pypy-311/Dockerfile'
       - 'dev/spark-test-image/python-310/Dockerfile'
       - 'dev/spark-test-image/python-311/Dockerfile'
       - 'dev/spark-test-image/python-312/Dockerfile'
@@ -154,19 +153,6 @@ jobs:
       - name: Image digest (PySpark with PyPy 3.10)
         if: hashFiles('dev/spark-test-image/pypy-310/Dockerfile') != ''
         run: echo ${{ steps.docker_build_pyspark_pypy_310.outputs.digest }}
-      - name: Build and push (PySpark with PyPy 3.11)
-        if: hashFiles('dev/spark-test-image/pypy-311/Dockerfile') != ''
-        id: docker_build_pyspark_pypy_311
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8
-        with:
-          context: ./dev/spark-test-image/pypy-311/
-          push: true
-          tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-pypy-311-cache:${{ github.ref_name }}-static
-          cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-pypy-311-cache:${{ github.ref_name }}
-          cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-pypy-311-cache:${{ github.ref_name }},mode=max
-      - name: Image digest (PySpark with PyPy 3.11)
-        if: hashFiles('dev/spark-test-image/pypy-311/Dockerfile') != ''
-        run: echo ${{ steps.docker_build_pyspark_pypy_311.outputs.digest }}
       - name: Build and push (PySpark with Python 3.10)
         if: hashFiles('dev/spark-test-image/python-310/Dockerfile') != ''
         id: docker_build_pyspark_python_310

.github/workflows/build_python_pypy3.10.yml

Lines changed: 0 additions & 47 deletions
This file was deleted.

.github/workflows/build_python_pypy3.11.yml

Lines changed: 0 additions & 47 deletions
This file was deleted.

.github/workflows/maven_test.yml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ on:
         type: string
         default: ubuntu-latest
       arch:
-        description: The target architecture (x86, x64, arm64) of the Python or PyPy interpreter.
+        description: The target architecture (x86, x64, arm64) of the Python interpreter.
         required: false
         type: string
         default: x64

.github/workflows/python_hosted_runner_test.yml

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ on:
         type: string
         default: macos-15
       arch:
-        description: The target architecture (x86, x64, arm64) of the Python or PyPy interpreter.
+        description: The target architecture (x86, x64, arm64) of the Python interpreter.
         required: false
         type: string
         default: arm64

README.md

Lines changed: 0 additions & 2 deletions
@@ -42,10 +42,8 @@ This README file only contains basic setup instructions.
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_maven_java21_macos26.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_maven_java21_macos26.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_maven_java21_arm.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_maven_java21_arm.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_coverage.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_coverage.yml) |
-| | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_pypy3.10.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_pypy3.10.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.10.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.10.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.11.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.11.yml) |
-| | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_pypy3.11.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_pypy3.11.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.12_classic_only.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.12_classic_only.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.12_arm.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.12_arm.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.12_macos26.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.12_macos26.yml) |

dev/spark-test-image/pypy-311/Dockerfile

Lines changed: 0 additions & 67 deletions
This file was deleted.

dev/sparktestsupport/modules.py

Lines changed: 0 additions & 36 deletions
@@ -700,9 +700,6 @@ def __hash__(self):
         "pyspark.sql.tests.pandas.streaming.test_transform_with_state_state_variable_checkpoint_v2",
         "pyspark.sql.tests.pandas.streaming.test_tws_tester",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy and it isn't available there
-    ],
 )

 pyspark_mllib = Module(
@@ -733,9 +730,6 @@ def __hash__(self):
         "pyspark.mllib.tests.test_streaming_algorithms",
         "pyspark.mllib.tests.test_util",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy and it isn't available there
-    ],
 )

@@ -799,9 +793,6 @@ def __hash__(self):
         "pyspark.ml.tests.test_regression",
         "pyspark.ml.tests.test_clustering",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy and it isn't available there
-    ],
 )

 pyspark_install = Module(
@@ -978,10 +969,6 @@ def __hash__(self):
         "pyspark.pandas.tests.frame.test_asfreq",
         "pyspark.pandas.tests.frame.test_asof",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and
-        # they aren't available there
-    ],
 )

 pyspark_pandas_slow = Module(
@@ -1112,10 +1099,6 @@ def __hash__(self):
         "pyspark.pandas.tests.diff_frames_ops.test_groupby_rolling_adv",
         "pyspark.pandas.tests.diff_frames_ops.test_groupby_rolling_count",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and
-        # they aren't available there
-    ],
 )

 pyspark_connect = Module(
@@ -1219,10 +1202,6 @@ def __hash__(self):
         "pyspark.sql.tests.connect.pandas.test_parity_pandas_udf_grouped_agg",
         "pyspark.sql.tests.connect.pandas.test_parity_pandas_udf_window",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and
-        # they aren't available there
-    ],
 )

 pyspark_structured_streaming_connect = Module(
@@ -1244,9 +1223,6 @@ def __hash__(self):
         "pyspark.sql.tests.connect.pandas.streaming.test_parity_transform_with_state",
         "pyspark.sql.tests.connect.pandas.streaming.test_parity_transform_with_state_state_variable",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy and it isn't available there
-    ],
 )

@@ -1284,10 +1260,6 @@ def __hash__(self):
         "pyspark.ml.tests.connect.test_parity_ovr",
         "pyspark.ml.tests.connect.test_parity_stat",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and
-        # they aren't available there
-    ],
 )

@@ -1427,10 +1399,6 @@ def __hash__(self):
         "pyspark.pandas.tests.connect.frame.test_parity_asfreq",
         "pyspark.pandas.tests.connect.frame.test_parity_asof",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and
-        # they aren't available there
-    ],
 )

 pyspark_pandas_slow_connect = Module(
@@ -1559,10 +1527,6 @@ def __hash__(self):
         "pyspark.pandas.tests.connect.diff_frames_ops.test_parity_groupby_shift",
         "pyspark.pandas.tests.connect.diff_frames_ops.test_parity_groupby_transform",
     ],
-    excluded_python_implementations=[
-        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and
-        # they aren't available there
-    ],
 )

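The `excluded_python_implementations=[...]` entries removed above fed an interpreter filter in the test-support tooling. A simplified sketch of how such filtering might work (this is not the actual `dev/run-tests` code; `Module` here is a stand-in dataclass mirroring only the fields shown in the diff):

```python
import platform
from dataclasses import dataclass, field

@dataclass
class Module:
    """Stand-in for the richer Module class in dev/sparktestsupport/modules.py."""
    name: str
    python_test_goals: list = field(default_factory=list)
    excluded_python_implementations: list = field(default_factory=list)

def runnable_modules(modules):
    """Drop modules whose exclusion list names the current interpreter."""
    impl = platform.python_implementation()  # "CPython", "PyPy", ...
    return [m for m in modules if impl not in m.excluded_python_implementations]

mods = [
    Module("pyspark-core"),
    Module("pyspark-ml", excluded_python_implementations=["PyPy"]),
]
# On CPython both modules run; on PyPy, pyspark-ml would be filtered out.
names = [m.name for m in runnable_modules(mods)]
```

With the exclusions deleted, every module's list is empty, so the filter becomes a no-op regardless of interpreter.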
docs/rdd-programming-guide.md

Lines changed: 1 addition & 2 deletions
@@ -40,7 +40,7 @@ along with if you launch Spark's interactive shell -- either `bin/spark-shell` f
 <div data-lang="python" markdown="1">

 Spark {{site.SPARK_VERSION}} works with Python 3.10+. It can use the standard CPython interpreter,
-so C libraries like NumPy can be used. It also works with PyPy 7.3.6+.
+so C libraries like NumPy can be used.

 Spark applications in Python can either be run with the `bin/spark-submit` script which includes Spark at runtime, or by including it in your setup.py as:

@@ -71,7 +71,6 @@ you can specify which version of Python you want to use by `PYSPARK_PYTHON`, for

 {% highlight bash %}
 $ PYSPARK_PYTHON=python3.8 bin/pyspark
-$ PYSPARK_PYTHON=/path-to-your-pypy/pypy bin/spark-submit examples/src/main/python/pi.py
 {% endhighlight %}

 </div>
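As the guide notes, `PYSPARK_PYTHON` selects the interpreter Spark launches. A hedged sketch of how a launcher might resolve it (this mimics typical behavior, not Spark's actual launch scripts; `resolve_pyspark_python` is an illustrative name):

```python
import os
import shutil
import sys

def resolve_pyspark_python():
    """Return the interpreter path PYSPARK_PYTHON points at.

    Falls back to the current executable when the variable is unset,
    and resolves bare names (e.g. "python3.12") against PATH.
    """
    candidate = os.environ.get("PYSPARK_PYTHON")
    if candidate is None:
        return sys.executable
    return shutil.which(candidate) or candidate

interp = resolve_pyspark_python()
```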

python/packaging/classic/setup.py

Lines changed: 0 additions & 1 deletion
@@ -390,7 +390,6 @@ def run(self):
         "Programming Language :: Python :: 3.13",
         "Programming Language :: Python :: 3.14",
         "Programming Language :: Python :: Implementation :: CPython",
-        "Programming Language :: Python :: Implementation :: PyPy",
         "Typing :: Typed",
     ],
     cmdclass={
