Commit 8b26f49

[SPARK-55141][PYTHON][INFRA] Set up a scheduled workflow for Pandas 3
### What changes were proposed in this pull request?

Set up a scheduled builder for Pandas 3.

### Why are the changes needed?

For development purposes, to monitor how compatible PySpark is with Pandas 3.

### Does this PR introduce _any_ user-facing change?

No, infra-only.

### How was this patch tested?

Tested the image build with the PR builder. The image was successfully built in https://github.com/zhengruifeng/spark/actions/runs/21272805063/job/61226373282

```
Successfully installed contourpy-1.3.3 coverage-7.13.1 cycler-0.12.1 et-xmlfile-2.0.0 fonttools-4.61.1 googleapis-common-protos-1.71.0 graphviz-0.20.3 grpcio-1.76.0 grpcio-status-1.76.0 joblib-1.5.3 kiwisolver-1.4.9 lxml-6.0.2 matplotlib-3.10.8 memory-profiler-0.61.0 numpy-2.4.1 openpyxl-3.1.5 packaging-26.0 pandas-3.0.0 pillow-12.1.0 plotly-5.24.1 protobuf-6.33.0 psutil-7.2.1 pyarrow-23.0.0 pyparsing-3.3.2 python-dateutil-2.9.0.post0 scikit-learn-1.8.0 scipy-1.17.0 tenacity-9.1.2 threadpoolctl-3.6.0 typing-extensions-4.15.0 unittest-xml-reporting-4.0.0 zstandard-0.25.0
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53926 from zhengruifeng/infra_pandas_3.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
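As a quick sanity check (a sketch, not part of the patch), the pip output above can be grepped to confirm the image resolved the `pandas>=3` requirement to an actual 3.x release:

```python
import re

# Excerpt of the pip log from the test image build above
log = "Successfully installed numpy-2.4.1 pandas-3.0.0 pyarrow-23.0.0"

# Extract the installed pandas version and check its major component
match = re.search(r"\bpandas-(\d+)\.(\d+)\.(\d+)\b", log)
major = int(match.group(1))
print(major)  # → 3
```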
1 parent 9c4d962 commit 8b26f49

File tree

4 files changed: +146 −0 lines changed

.github/workflows/build_infra_images_cache.yml

Lines changed: 14 additions & 0 deletions

```diff
@@ -38,6 +38,7 @@ on:
       - 'dev/spark-test-image/python-311/Dockerfile'
       - 'dev/spark-test-image/python-311-classic-only/Dockerfile'
       - 'dev/spark-test-image/python-312/Dockerfile'
+      - 'dev/spark-test-image/python-312-pandas-3/Dockerfile'
       - 'dev/spark-test-image/python-313/Dockerfile'
       - 'dev/spark-test-image/python-313-nogil/Dockerfile'
       - 'dev/spark-test-image/python-314/Dockerfile'
@@ -219,6 +220,19 @@ jobs:
       - name: Image digest (PySpark with Python 3.12)
         if: hashFiles('dev/spark-test-image/python-312/Dockerfile') != ''
         run: echo ${{ steps.docker_build_pyspark_python_312.outputs.digest }}
+      - name: Build and push (PySpark with Python 3.12 Pandas 3)
+        if: hashFiles('dev/spark-test-image/python-312-pandas-3/Dockerfile') != ''
+        id: docker_build_pyspark_python_312_pandas_3
+        uses: docker/build-push-action@v6
+        with:
+          context: ./dev/spark-test-image/python-312-pandas-3/
+          push: true
+          tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-312-pandas-3-cache:${{ github.ref_name }}-static
+          cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-312-pandas-3-cache:${{ github.ref_name }}
+          cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-312-pandas-3-cache:${{ github.ref_name }},mode=max
+      - name: Image digest (PySpark with Python 3.12 Pandas 3)
+        if: hashFiles('dev/spark-test-image/python-312-pandas-3/Dockerfile') != ''
+        run: echo ${{ steps.docker_build_pyspark_python_312_pandas_3.outputs.digest }}
       - name: Build and push (PySpark with Python 3.13)
         if: hashFiles('dev/spark-test-image/python-313/Dockerfile') != ''
         id: docker_build_pyspark_python_313
```
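Each step in the hunk above is guarded by `hashFiles(...) != ''`, so the image is only built on branches where the corresponding Dockerfile exists. A rough Python analogue of that guard (a sketch of the idea, not Actions' actual implementation):

```python
import hashlib
from pathlib import Path


def hash_files(pattern: str, root: str = ".") -> str:
    # Rough analogue of GitHub Actions' hashFiles(): returns the empty
    # string when no file matches the pattern, otherwise a digest over
    # the matched files' contents.
    matches = sorted(Path(root).glob(pattern))
    if not matches:
        return ""
    digest = hashlib.sha256()
    for path in matches:
        digest.update(path.read_bytes())
    return digest.hexdigest()


# Guard: the build-and-push step runs only when the Dockerfile is present
should_run = hash_files("dev/spark-test-image/python-312-pandas-3/Dockerfile") != ""
print(should_run)
```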
.github/workflows/build_python_3.12_pandas_3.yml

Lines changed: 47 additions & 0 deletions (new file)

```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Python-only (master, Python 3.12, Pandas 3)"

on:
  schedule:
    - cron: '0 21 * * *'
  workflow_dispatch:

jobs:
  run-build:
    permissions:
      packages: write
    name: Run
    uses: ./.github/workflows/build_and_test.yml
    if: github.repository == 'apache/spark'
    with:
      java: 17
      branch: master
      hadoop: hadoop3
      envs: >-
        {
          "PYSPARK_IMAGE_TO_TEST": "python-312-pandas-3",
          "PYTHON_TO_TEST": "python3.12"
        }
      jobs: >-
        {
          "pyspark": "true",
          "pyspark-pandas": "true"
        }
```
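The `envs` and `jobs` inputs above are JSON objects embedded as YAML folded scalars and parsed downstream by the reusable `build_and_test.yml` workflow. A minimal sanity check (payloads copied from the workflow; the check itself is a sketch, not part of the patch) that both strings are valid JSON:

```python
import json

# The two folded-scalar inputs from the workflow, as plain strings
envs = '{"PYSPARK_IMAGE_TO_TEST": "python-312-pandas-3", "PYTHON_TO_TEST": "python3.12"}'
jobs = '{"pyspark": "true", "pyspark-pandas": "true"}'

# Both must parse as JSON objects for the reusable workflow to consume them
print(json.loads(envs)["PYSPARK_IMAGE_TO_TEST"])  # → python-312-pandas-3
print(sorted(json.loads(jobs)))                   # → ['pyspark', 'pyspark-pandas']
```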

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -42,6 +42,7 @@ This README file only contains basic setup instructions.
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.11_arm.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.11_arm.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.11_macos26.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.11_macos26.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.12.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.12.yml) |
+| | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.12_pandas_3.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.12_pandas_3.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.13.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.13.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.13_nogil.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.13_nogil.yml) |
 | | [![GitHub Actions Build](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_python_3.14.yml) |
```
dev/spark-test-image/python-312-pandas-3/Dockerfile

Lines changed: 84 additions & 0 deletions (new file)

```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Note this is a temporary image file for development with Pandas 3,
# and will be removed after PySpark is fully compatible with Pandas 3.

# Image for building and testing Spark branches. Based on Ubuntu 22.04.
# See also in https://hub.docker.com/_/ubuntu
FROM ubuntu:jammy-20240911.1
LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
LABEL org.opencontainers.image.licenses="Apache-2.0"
LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark with Python 3.12 and Pandas 3"
# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
LABEL org.opencontainers.image.version=""

ENV FULL_REFRESH_DATE=20260110

ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true

RUN apt-get update && apt-get install -y \
    build-essential \
    ca-certificates \
    curl \
    gfortran \
    git \
    gnupg \
    libcurl4-openssl-dev \
    libfontconfig1-dev \
    libfreetype6-dev \
    libfribidi-dev \
    libgit2-dev \
    libharfbuzz-dev \
    libjpeg-dev \
    liblapack-dev \
    libopenblas-dev \
    libpng-dev \
    libpython3-dev \
    libssl-dev \
    libtiff5-dev \
    libwebp-dev \
    libxml2-dev \
    openjdk-17-jdk-headless \
    pkg-config \
    qpdf \
    tzdata \
    software-properties-common \
    wget \
    zlib1g-dev

# Install Python 3.12
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y \
    python3.12 \
    && apt-get autoremove --purge -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Note that mlflow is excluded since it requires pandas<3
ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas>=3 scipy plotly<6.0.0 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.0 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"

# Install Python 3.12 packages
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
# RUN python3.12 -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
RUN python3.12 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS lxml && \
    python3.12 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
    python3.12 -m pip install torcheval && \
    python3.12 -m pip cache purge
```
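The Dockerfile comment notes that mlflow is excluded because it pins `pandas<3`, which cannot be satisfied alongside this image's `pandas>=3`. A toy sketch of why those two specifiers conflict (a hypothetical helper, not pip's real resolver):

```python
def conflicts(lower_inclusive: int, upper_exclusive: int) -> bool:
    # Toy model: "pkg>=lower_inclusive" and "pkg<upper_exclusive" on the
    # same package conflict when no major version satisfies both bounds.
    return lower_inclusive >= upper_exclusive

print(conflicts(3, 3))  # pandas>=3 together with mlflow's pandas<3 → True
print(conflicts(2, 3))  # pandas>=2 together with pandas<3 → False
```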
