
Commit 6ab3c80

zhengruifeng authored and HyukjinKwon committed
[SPARK-55358][PYTHON][INFRA][FOLLOW-UP] Do not apt-get install python3-xxx
### What changes were proposed in this pull request?

Do not `apt-get install` any `python3-xxx` packages.

### Why are the changes needed?

On Ubuntu 24, `apt-get install python3-xxx` also installs `python3.12` as a dependency. This is error-prone and does not work with other Python versions from `deadsnakes`, so Python packages should always be installed via pip.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54197 from zhengruifeng/ubuntu_24_py_12_fu.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 668b2c5


1 file changed, +12 -9 lines changed


dev/spark-test-image/python-312/Dockerfile

Lines changed: 12 additions & 9 deletions
```diff
@@ -42,21 +42,24 @@ RUN apt-get update && apt-get install -y \
     libssl-dev \
     openjdk-17-jdk-headless \
     python3.12 \
-    python3-pip \
-    python3-venv \
     pkg-config \
     tzdata \
     software-properties-common \
-    zlib1g-dev
+    zlib1g-dev \
+    && apt-get autoremove --purge -y \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
 
-ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
-# Python deps for Spark Connect
-ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
+# Setup virtual environment
+ENV VIRTUAL_ENV=/opt/spark-venv
+RUN python3.12 -m venv --without-pip $VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"
 
 # Install Python 3.12 packages
-ENV VIRTUAL_ENV /opt/spark-venv
-RUN python3.12 -m venv $VIRTUAL_ENV
-ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
+
+ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 pystack>=1.6.0 psutil"
+ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
 
 RUN python3.12 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting $CONNECT_PIP_PKGS lxml && \
     python3.12 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu && \
```
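The same pip-only setup can be sketched for an interpreter from the `deadsnakes` PPA, which the commit message names as the case that breaks with `python3-xxx` apt packages. This is an untested sketch, not part of the commit. `PYTHON_VERSION` is a hypothetical build argument. The sketch also assumes three things: the deadsnakes PPA is already enabled, `curl` is installed by an earlier step, and the chosen interpreter can run `get-pip.py` without extra apt packages, which may not hold for every version.

```dockerfile
# Hypothetical build argument; not defined in the Spark Dockerfile.
ARG PYTHON_VERSION=3.11

# Assumes the deadsnakes PPA is already enabled (e.g. via add-apt-repository).
# Only the versioned interpreter is installed. There is no python3-pip or
# python3-venv, which on Ubuntu 24 would also pull in the system python3.12.
RUN apt-get update && apt-get install -y python${PYTHON_VERSION} \
    && apt-get autoremove --purge -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Create the virtual environment without ensurepip, then bootstrap pip into it.
# The venv's bin directory is put first on PATH, so python${PYTHON_VERSION}
# in the curl step resolves to the venv interpreter and get-pip.py installs there.
ENV VIRTUAL_ENV=/opt/spark-venv
RUN python${PYTHON_VERSION} -m venv --without-pip $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python${PYTHON_VERSION}

# Optional sanity check: fail the build if the interpreter is not the venv one.
RUN python${PYTHON_VERSION} -c "import sys; assert sys.prefix != sys.base_prefix" \
    && python${PYTHON_VERSION} -m pip --version
```

Under the same assumptions, `docker build --build-arg PYTHON_VERSION=3.10 .` would select a different interpreter without changing the rest of the file.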
