Commit 65e5948

zhengruifeng authored and Yicong-Huang committed
[SPARK-55414][PYTHON][INFRA] Upgrade Python 3.12 test images for classic-only and pandas 3 to Ubuntu 24.04
### What changes were proposed in this pull request?
Upgrade the Python 3.12 test images for classic-only and pandas 3 to Ubuntu 24.04.

### Why are the changes needed?
To test with a newer OS.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
PR builder with

```
default: '{"PYSPARK_IMAGE_TO_TEST": "python-312-classic-only", "PYTHON_TO_TEST": "python3.12"}'
```

https://github.com/zhengruifeng/spark/actions/runs/21777398247/job/62836232446 passed.

```
default: '{"PYSPARK_IMAGE_TO_TEST": "python-312-pandas-3", "PYTHON_TO_TEST": "python3.12"}'
```

https://github.com/zhengruifeng/spark/actions/runs/21778886479/job/62840558934 failed as expected, since pandas 3 support is still WIP.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#54201 from zhengruifeng/u24_py_312_313.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 204bab9 commit 65e5948

2 files changed: 25 additions, 24 deletions


dev/spark-test-image/python-312-classic-only/Dockerfile

Lines changed: 12 additions & 11 deletions
@@ -15,16 +15,16 @@
 # limitations under the License.
 #

-# Image for building and testing Spark branches. Based on Ubuntu 22.04.
+# Image for building and testing Spark branches. Based on Ubuntu 24.04.
 # See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:jammy-20240911.1
+FROM ubuntu:noble
 LABEL org.opencontainers.image.authors="Apache Spark project <dev@spark.apache.org>"
 LABEL org.opencontainers.image.licenses="Apache-2.0"
 LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark Classic with Python 3.12"
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260203
+ENV FULL_REFRESH_DATE=20260207

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -41,26 +41,27 @@ RUN apt-get update && apt-get install -y \
     libopenblas-dev \
     libssl-dev \
     openjdk-17-jdk-headless \
+    python3.12 \
     pkg-config \
     tzdata \
     software-properties-common \
-    zlib1g-dev
-
-# Install Python 3.12
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
-    python3.12 \
+    zlib1g-dev \
     && apt-get autoremove --purge -y \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*

+# Setup virtual environment
+ENV VIRTUAL_ENV=/opt/spark-venv
+RUN python3.12 -m venv --without-pip $VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+
+# Install Python 3.12 packages
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12

 ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 ARG TEST_PIP_PKGS="coverage unittest-xml-reporting"

-# Install Python 3.12 packages
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-RUN python3.12 -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
 RUN python3.12 -m pip install $BASIC_PIP_PKGS $TEST_PIP_PKGS && \
     python3.12 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
     python3.12 -m pip install deepspeed torcheval && \

dev/spark-test-image/python-312-pandas-3/Dockerfile

Lines changed: 13 additions & 13 deletions
@@ -18,16 +18,16 @@
 # Note this is a temporary image file for development with Pandas 3,
 # and will be remvoed after PySpark is fully compatible with Pandas 3.

-# Image for building and testing Spark branches. Based on Ubuntu 22.04.
+# Image for building and testing Spark branches. Based on Ubuntu 24.04.
 # See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:jammy-20240911.1
+FROM ubuntu:noble
 LABEL org.opencontainers.image.authors="Apache Spark project <dev@spark.apache.org>"
 LABEL org.opencontainers.image.licenses="Apache-2.0"
 LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark with Python 3.12 and Pandas 3"
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260127
+ENV FULL_REFRESH_DATE=20260207

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -44,27 +44,27 @@ RUN apt-get update && apt-get install -y \
     libopenblas-dev \
     libssl-dev \
     openjdk-17-jdk-headless \
+    python3.12 \
     pkg-config \
     tzdata \
     software-properties-common \
-    zlib1g-dev
-
-# Install Python 3.12
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
-    python3.12 \
+    zlib1g-dev \
     && apt-get autoremove --purge -y \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*

+# Setup virtual environment
+ENV VIRTUAL_ENV=/opt/spark-venv
+RUN python3.12 -m venv --without-pip $VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+
+# Install Python 3.12 packages
 # Note that mlflow is execluded since it requires pandas<3
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
+
 ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas>=3 scipy plotly<6.0.0 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
-# Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"

-# Install Python 3.12 packages
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-# RUN python3.12 -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
 RUN python3.12 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS lxml && \
     python3.12 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
     python3.12 -m pip install torcheval && \
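
Both Dockerfiles now create a virtual environment with `--without-pip`, put its `bin` directory first on `PATH`, and bootstrap pip afterwards with `get-pip.py`. The commit does not state the motivation; one plausible reason is that Ubuntu 24.04 marks its system Python as externally managed (PEP 668), which makes a venv the natural place for pip installs. The sketch below reproduces only the venv and `PATH` steps locally as a sanity check. It assumes a `python3` interpreter with the `venv` module is available, and `VENV_DIR` is a hypothetical temporary stand-in for `/opt/spark-venv`.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for /opt/spark-venv used in the Dockerfiles.
VENV_DIR="$(mktemp -d)/spark-venv"

# Create the venv without pip, mirroring
# `RUN python3.12 -m venv --without-pip $VIRTUAL_ENV`.
python3 -m venv --without-pip "$VENV_DIR"

# Put the venv first on PATH, mirroring
# `ENV PATH="$VIRTUAL_ENV/bin:$PATH"`.
PATH="$VENV_DIR/bin:$PATH"
export PATH

# With the venv first on PATH, `python3` should resolve to the venv's interpreter.
resolved="$(command -v python3)"
echo "python3 resolves to: $resolved"

# Inside a venv, sys.prefix differs from sys.base_prefix.
in_venv="$(python3 -c 'import sys; print(sys.prefix != sys.base_prefix)')"
echo "running inside a venv: $in_venv"
```

In the Dockerfiles the same `PATH` ordering means the later `python3.12 -m pip install` commands resolve `python3.12` inside the venv, so packages are installed there rather than into the system interpreter.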
