Commit b03c69c
gaogaotiantian authored and HyukjinKwon committed
[SPARK-55346][INFRA][PYTHON] Upgrade pystack version to 1.6.0 and install it on all major images
### What changes were proposed in this pull request?

* Upgrade pystack to >= 1.6.0 because it now supports 3.13t
* Install it (and psutil) on all major docker images

### Why are the changes needed?

pystack used to lack 3.13t wheels, so we had to skip 3.13 in the requirements. Now that it supports 3.13t, we no longer need this special rule.

`pystack` has proven very useful for finding hanging issues (#53783). Enabling it not only on master but also on the other scheduled tests could help us diagnose more hanging issues (note that master is using 3.12 now, so we are not even using it on master).

For example, https://github.com/apache/spark/actions/runs/21645825351/job/62398366525 is a hanging issue but we have no information from it. https://github.com/apache/spark/actions/runs/21648052684/job/62405320893 also timed out without useful information.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It has been working well with 3.11 without causing issues. It helped us figure out a very difficult race condition.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54124 from gaogaotiantian/upgrade-pystack.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 663a6c4 commit b03c69c
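
The commit message motivates pystack as a way to get stack information out of a hanging test process. As a rough illustration only, the sketch below shows how a test harness might call the `pystack remote <pid>` command on a stuck process. The helper names `build_pystack_command` and `dump_stack` are hypothetical and are not taken from Spark's test runner; the sketch assumes pystack is installed and on `PATH`, and returns `None` otherwise.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pystack_command(pid: int, native: bool = False) -> List[str]:
    """Build the argv for dumping the stack of a running Python process.

    Hypothetical helper: only the `pystack remote` subcommand and its
    `--native` flag come from pystack itself.
    """
    cmd = ["pystack", "remote", str(pid)]
    if native:
        # Also report native (C/C++) frames, which helps when the hang
        # is inside an extension module rather than Python code.
        cmd.append("--native")
    return cmd


def dump_stack(pid: int, timeout: float = 30.0) -> Optional[str]:
    """Return pystack's report for `pid`, or None if it cannot be produced."""
    if shutil.which("pystack") is None:
        # pystack is Linux-only and may not be installed in every image.
        return None
    try:
        result = subprocess.run(
            build_pystack_command(pid),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout if result.returncode == 0 else None
```

A harness could call `dump_stack(worker_pid)` just before killing a timed-out worker and append the result to the job log. Attaching to another process usually requires ptrace permission, so whether this works inside a given CI container is an assumption to verify.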

8 files changed: 15 additions and 15 deletions

dev/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ graphviz==0.20.3
 flameprof==0.4
 viztracer
 debugpy
-pystack>=1.5.1; python_version!='3.13' and sys_platform=='linux' # no 3.13t wheels
+pystack>=1.6.0; sys_platform=='linux'
 psutil

 # TorchDistributor dependencies
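
The change to `dev/requirements.txt` above removes the `python_version!='3.13'` clause from the environment marker. To make the effect concrete, the sketch below re-expresses the old and new markers as plain Python predicates. This is a hand-written approximation for illustration, not pip's marker evaluator, and the function names `old_rule` and `new_rule` are hypothetical.

```python
def old_rule(python_version: str, sys_platform: str) -> bool:
    # Mirrors: pystack>=1.5.1; python_version!='3.13' and sys_platform=='linux'
    return python_version != "3.13" and sys_platform == "linux"


def new_rule(python_version: str, sys_platform: str) -> bool:
    # Mirrors: pystack>=1.6.0; sys_platform=='linux'
    return sys_platform == "linux"


# Compare which environments would install pystack under each rule.
for version, platform in [("3.12", "linux"), ("3.13", "linux"), ("3.13", "darwin")]:
    print(
        f"python {version} on {platform}: "
        f"old={old_rule(version, platform)} new={new_rule(version, platform)}"
    )
```

Under the old marker, every Python 3.13 environment on Linux skipped pystack, not only the free-threaded 3.13t build that lacked wheels; the new marker installs it on any Linux interpreter.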

dev/spark-test-image/python-310/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -56,7 +56,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"

dev/spark-test-image/python-311/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack psutil"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"

dev/spark-test-image/python-312-classic-only/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark Cl
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260127
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 pandas==2.3.3 plotly<6.0.0 matplotlib openpyxl memory-profiler>=0.61.0 mlflow>=2.8.1 scipy scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 ARG TEST_PIP_PKGS="coverage unittest-xml-reporting"

 # Install Python 3.12 packages

dev/spark-test-image/python-312/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"

dev/spark-test-image/python-313/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"

dev/spark-test-image/python-314-nogil/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260127
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -64,5 +64,5 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.14t
 # TODO: Add BASIC_PIP_PKGS and CONNECT_PIP_PKGS when it supports Python 3.14 free threaded
 # TODO: Add lxml, grpcio, grpcio-status back when they support Python 3.14 free threaded
 RUN python3.14t -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
-RUN python3.14t -m pip install 'numpy>=2.1' 'pyarrow>=19.0.0' 'six==1.16.0' 'pandas==2.3.3' scipy coverage matplotlib openpyxl jinja2 && \
+RUN python3.14t -m pip install 'numpy>=2.1' 'pyarrow>=19.0.0' 'six==1.16.0' 'pandas==2.3.3' 'pystack>=1.6.0' scipy coverage matplotlib openpyxl jinja2 psutil && \
 python3.14t -m pip cache purge

dev/spark-test-image/python-314/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20260124
+ENV FULL_REFRESH_DATE=20260203

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -55,7 +55,7 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*


-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
