Commit a32559a

[SPARK-54046][INFRA] Upgrade PyArrow to 22.0.0
### What changes were proposed in this pull request?

This PR aims to upgrade `PyArrow` to 22.0.0.

### Why are the changes needed?

To test against the latest `PyArrow` version. `PyArrow 22.0.0` is the first version to support `Python 3.14`.

- https://pypi.org/project/pyarrow/22.0.0/ (2025-10-24)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52748 from dongjoon-hyun/SPARK-54046.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent d16b128 commit a32559a

10 files changed: 20 additions and 10 deletions

.github/workflows/python_hosted_runner_test.yml

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ jobs:
 run: |
 python${{matrix.python}} -m pip install --ignore-installed 'blinker>=1.6.2'
 python${{matrix.python}} -m pip install --ignore-installed 'six==1.16.0'
-python${{matrix.python}} -m pip install numpy 'pyarrow>=21.0.0' 'six==1.16.0' 'pandas==2.3.3' scipy 'plotly<6.0.0' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' unittest-xml-reporting && \
+python${{matrix.python}} -m pip install numpy 'pyarrow>=22.0.0' 'six==1.16.0' 'pandas==2.3.3' scipy 'plotly<6.0.0' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' unittest-xml-reporting && \
 python${{matrix.python}} -m pip install 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'googleapis-common-protos==1.71.0' 'graphviz==0.20.3' && \
 python${{matrix.python}} -m pip cache purge
 - name: List Python packages

dev/spark-test-image/lint/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ RUN python3.11 -m pip install \
 'pandas' \
 'pandas-stubs==1.2.0.53' \
 'plotly>=4.8' \
-'pyarrow>=21.0.0' \
+'pyarrow>=22.0.0' \
 'pytest-mypy-plugins==1.9.3' \
 'pytest==7.1.3' \
 && python3.11 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu \

dev/spark-test-image/numpy-213/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ RUN apt-get update && apt-get install -y \


 # Pin numpy==2.1.3
-ARG BASIC_PIP_PKGS="numpy==2.1.3 pyarrow>=21.0.0 six==1.16.0 pandas==2.2.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy==2.1.3 pyarrow>=22.0.0 six==1.16.0 pandas==2.2.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 graphviz==0.20.3"

dev/spark-test-image/python-310/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=21.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 graphviz==0.20.3"

dev/spark-test-image/python-311-classic-only/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=21.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2"
 ARG TEST_PIP_PKGS="coverage unittest-xml-reporting"

 # Install Python 3.11 packages

dev/spark-test-image/python-311/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=21.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 graphviz==0.20.3"

dev/spark-test-image/python-312/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=21.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 graphviz==0.20.3"

dev/spark-test-image/python-313-nogil/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=21.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 graphviz==0.20.3"

dev/spark-test-image/python-313/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=21.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 graphviz==0.20.3"

python/pyspark/pandas/tests/io/test_feather.py

Lines changed: 11 additions & 1 deletion
@@ -20,6 +20,7 @@
 import sys

 from pyspark import pandas as ps
+from pyspark.loose_version import LooseVersion
 from pyspark.testing.pandasutils import PandasOnSparkTestCase, TestUtils


@@ -35,7 +36,16 @@ def pdf(self):
     def psdf(self):
         return ps.from_pandas(self.pdf)

-    @unittest.skipIf(sys.version_info > (3, 13), "SPARK-54068")
+    has_arrow_21_or_below = False
+    try:
+        import pyarrow as pa
+
+        if LooseVersion(pa.__version__) < LooseVersion("22.0.0"):
+            has_arrow_21_or_below = True
+    except ImportError:
+        pass
+
+    @unittest.skipIf(not has_arrow_21_or_below, "SPARK-54068")
     def test_to_feather(self):
         with self.temp_dir() as dirpath:
             path1 = f"{dirpath}/file1.feather"
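
The version gate added to test_feather.py can be sketched in isolation as below. This is a minimal illustration, not the code from the commit: `version_tuple` and `pyarrow_below` are hypothetical helper names, and the plain integer-tuple comparison is an assumed stand-in for Spark's `LooseVersion`. As in the diff, the test runs only when PyArrow is importable and older than 22.0.0, and is skipped otherwise, including when PyArrow is not installed.

```python
import re
import unittest


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric components of a version string.

    Hypothetical helper: '21.0.0.dev5' -> (21, 0, 0). Pre-release suffixes
    such as 'rc1' are ignored, so '22.0.0rc1' compares equal to '22.0.0'.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so '22.0' and '22.0.0' compare equal.
    return tuple(parts + [0] * (3 - len(parts)))


def pyarrow_below(threshold: str) -> bool:
    """Return True only if pyarrow is importable and older than `threshold`."""
    try:
        import pyarrow as pa
    except ImportError:
        return False
    return version_tuple(pa.__version__) < version_tuple(threshold)


class FeatherGateExample(unittest.TestCase):
    # Skipped on PyArrow >= 22.0.0 and when PyArrow is missing.
    @unittest.skipIf(not pyarrow_below("22.0.0"), "SPARK-54068")
    def test_gated(self):
        self.assertTrue(pyarrow_below("22.0.0"))
```

Evaluating the condition once, when the class is defined, keeps the decorator argument a plain boolean, which is the same shape as the `has_arrow_21_or_below` flag in the diff.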
