# Remove PyTorch/CUDA dependency and slim production image to <1.5GB (#1500)
## Conversation
…er reranker (#1494)

Switches compose/{production,local}/django/Dockerfile from pytorch/pytorch:2.7.1-cuda12.6-cudnn9-runtime (~6 GB) to python:3.11-slim-bookworm with strict multi-stage build/runtime separation. Removes the optional in-process CrossEncoderReranker and its torch-pulling side effects so the new slim base is sufficient for every code path that ships in the image; reranking remains available via MicroserviceReranker and CohereReranker.

Companion cleanups:

- Delete cross_encoder_reranker.py (auto-discovered, so no registry edits).
- Delete the CrossEncoderRerankerTest / CrossEncoderRerankerTests stub-based test classes and their now-unused MagicMock import.
- Drop the tokenizers pin from requirements/base.txt (orphaned once sentence-transformers is gone; nothing imports it directly).
- Drop the SENTENCE_TRANSFORMER_MODELS_PATH setting and its production-stack.yml env var (the only consumer was the deleted backend).
- Tighten .dockerignore to exclude frontend/, docs/, fixtures/, locale/, cloudflare-og-worker/, tools/, model_preloaders/, .github/, editor state, top-level READMEs, and Python caches.
- Add a CI image-size budget (1.5 GiB) to docker-build-release.yml that fails the release build on regression.
- Delete the docker-build-cuda.yml workflow and docs/deployment/docker-gpu-setup.md (and its mkdocs nav entry); both documented the now-removed CUDA path.

Closes #1494
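The "auto-discovered, so no registry edits" point relies on the pipeline package importing all of its submodules at load time, so each module registers itself as a side effect of import and a deleted module simply disappears. A hedged sketch of that pattern, using the stdlib `logging` package as a stand-in (none of these names are OpenContracts code):

```python
import importlib
import logging
import pkgutil


def discover(package):
    """Import every submodule of `package` and return their names.

    In a plugin registry, each import registers the classes it defines
    as a side effect, so removing a module file needs no registry edit.
    """
    found = []
    for info in pkgutil.iter_modules(package.__path__):
        importlib.import_module(f"{package.__name__}.{info.name}")
        found.append(info.name)
    return sorted(found)


# Stdlib `logging` stands in for a plugin package with submodules.
print(discover(logging))  # ['config', 'handlers']
```

Because discovery walks the package directory at runtime, deleting cross_encoder_reranker.py is all that is needed to stop it (and its torch import) from ever loading.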
The pytorch base image masked these dependencies because Ubuntu+CUDA already shipped them; on python:3.11-slim-bookworm, `import cv2` fails with an ImportError on libgthread-2.0.so.0 / libgomp.so.1 until libglib2.0-0 and libgomp1 are installed. Adds the two missing apt packages to both Dockerfiles, plus a `python -c "import cv2"` smoke test so that a future opencv release introducing a new shared-library dependency fails the build immediately instead of silently shipping a broken image. #1494
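The smoke-test idea generalizes: fail the build the moment any required import (and the shared libraries it drags in) is missing. A minimal sketch, with stdlib `json` standing in for `cv2` so the script runs anywhere; the `REQUIRED` list and exit behavior are illustrative, not the PR's actual check:

```python
import importlib
import sys

# Modules the image must be able to import. In the real Dockerfile this
# would be ["cv2"], whose ImportError surfaces missing .so dependencies
# such as libgthread-2.0.so.0 or libgomp.so.1.
REQUIRED = ["json"]

failures = []
for name in REQUIRED:
    try:
        importlib.import_module(name)
    except ImportError as exc:  # a missing shared library lands here
        failures.append(f"{name}: {exc}")

if failures:
    # Non-zero exit fails the docker build step immediately.
    sys.exit("smoke test failed:\n" + "\n".join(failures))
print("smoke test passed")
```

Running this as a `RUN` step keeps the failure at build time rather than at first request in production.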
## Code Review: Remove PyTorch/CUDA dependency and slim production image

This is a well-motivated, impactful change. Reducing the production image from ~6.3 GB to ≤1.5 GB is a significant operational improvement. The PR is generally clean and thorough, but there are a few issues worth addressing before merge.

### Critical / Potential Breaking

1. The updated Django … uses the files in … Before merging, please verify: does the …

### Medium Issues

2. The new size-budget step in docker-build-release.yml uses:

   ```bash
   human_size="$(numfmt --to=iec-i --suffix=B "$size_bytes")"
   human_budget="$(numfmt --to=iec-i --suffix=B "$budget")"
   ```

   A fallback keeps the step working where numfmt is unavailable:

   ```bash
   human_size="$(numfmt --to=iec-i --suffix=B "$size_bytes" 2>/dev/null || echo "${size_bytes} bytes")"
   ```

3. Inconsistent spaCy download strategy between local and production Dockerfiles

   The local runtime stage uses:

   ```dockerfile
   RUN wget --retry-connrefused --waitretry=1 --read-timeout=20 --timeout=15 -t 5 \
       https://...en_core_web_lg...whl \
       && pip install en_core_web_lg...
   ```

   The production runtime stage uses the simpler but less resilient form:

   ```dockerfile
   RUN python -m spacy download en_core_web_sm \
       && python -m spacy download en_core_web_lg
   ```

   The production build is the one that runs in CI and on release builds, where transient network failures are more costly to retry manually. Consider applying the same retry-hardened approach there.

4. Production runtime correctly uses …

### Minor / Observations

5. CLAUDE.md attribution rule: the PR description ends with …

6. `load: ${{ matrix.image == 'django' }}` — this is correct: …

7. Production Dockerfile removes …
### What's Done Well
### Summary

The core of this PR (image slimming + CrossEncoder removal) is solid.
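The retry-hardened download the review asks for (wget with `--retry-connrefused -t 5 --waitretry=1`) boils down to bounded retries with a wait between attempts. A generic sketch of that shape; `flaky_download` is a stand-in for the network fetch, not anything in the PR:

```python
import time


def retry(fn, attempts=5, wait=0.0):
    """Call fn, retrying up to `attempts` times on OSError,
    roughly mirroring wget's -t 5 / --waitretry behavior."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OSError:
            if attempt == attempts:
                raise  # budget exhausted: surface the failure
            time.sleep(wait)


calls = {"n": 0}


def flaky_download():
    # Fails twice, then succeeds -- simulates a transient network error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection refused")
    return b"en_core_web_lg wheel bytes"


print(retry(flaky_download))
```

In a Dockerfile the same effect comes from wget's own flags; the point is that CI release builds should not fail on a single transient refusal.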
## Code Review — PR #1500: Remove PyTorch/CUDA dependency, slim production image

This is a well-scoped, well-motivated PR. The primary goal (drop the 4–5 GB PyTorch/CUDA layer) is clearly achieved, the motivations are well-documented in both the PR description and CHANGELOG, and the layer-separation pattern (build stage vs. runtime stage) is correct. Below are some issues and suggestions, roughly ordered by severity.

### Potential Bug: …
## Codecov Report

✅ All modified and coverable lines are covered by tests.
The previous PyTorch base image shipped `brotli` incidentally via conda,
so httpx registered a `br` decoder and transparently decompressed
brotli-encoded responses. The new `python:3.11-slim-bookworm` base does
not, leaving httpx's SUPPORTED_DECODERS at {identity, gzip, deflate,
zstd}. Without a brotli decoder, httpx hands the still-compressed bytes
to callers and the OpenAI SDK raises
`json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)` when
it tries to parse them.
This surfaced as `test_structured_response_uses_document_tools`
failing — its VCR cassette is the only one in the repo recorded with
`Content-Encoding: br`. Live OpenAI traffic was equally affected
whenever the API negotiated brotli.
Adding `brotli>=1.1.0` re-enables httpx's `br` decoder for both VCR
replay and production HTTP.
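The failure chain can be reproduced with stdlib gzip standing in for brotli: when the client lacks a matching decoder, the still-compressed bytes reach the caller, decode to mojibake text, and `json.loads` raises the same "Expecting value" error the OpenAI SDK surfaced. A self-contained sketch (not httpx itself):

```python
import gzip
import json

payload = json.dumps({"ok": True}).encode()
body = gzip.compress(payload)  # stand-in for a brotli-encoded response body

# Without a registered decoder, the client hands compressed bytes through;
# decoding them as text yields garbage that json cannot parse.
text = body.decode("utf-8", errors="replace")
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("undecoded:", exc)  # Expecting value: line 1 column 1 (char 0)

# With the right decoder installed, the same body parses fine.
print("decoded:", json.loads(gzip.decompress(body)))
```

This is why the fix is a single dependency addition: installing `brotli` lets httpx register its `br` decoder at import time, and the decompression happens before any JSON parsing.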
## Code Review: PR #1500 — Remove PyTorch/CUDA and slim production image

Overall: This is a well-scoped, operationally important PR. The rationale is sound (eliminating ~4–5 GB of PyTorch/CUDA bloat that only served one optional in-process reranker), the implementation is clean, and the CHANGELOG entries are exemplary. A few issues worth addressing before merge:

### Potential Bugs / Issues

1. …

2. The diff adds …

3. …

   ```bash
   grep -r "ffmpeg\|subprocess.*ffmpeg\|import ffmpeg" opencontractserver/ --include="*.py"
   ```

   If any parser or pipeline component shells out to ffmpeg, …

### Code Quality Observations

4. Inconsistent spaCy download method between local and production Dockerfiles

   The …

5. Python version not pinned to a patch version

   Both Dockerfiles now use python:3.11-slim-bookworm, …

6. …

### Positive Highlights

### Summary

One real issue (the locale/ exclusion) and one workflow concern (the load+push interaction) should be verified before merge. The Python version pinning and spaCy consistency items are lower priority but improve reproducibility. The overall approach is sound and the operational win (~4.8 GB image reduction) is substantial.
CHANGELOG.md: combined the "Changed" entries — both the image-slim entry (branch) and the Celery at-least-once delivery entry (main). Screenshots in docs/assets/images/screenshots/auto/ were forced to the branch versions via `git checkout HEAD --`. The two new `discover--search-results--*.png` files added on main are additions, not conflicts, and are kept.
## Summary
This PR removes the PyTorch and CUDA dependencies from the Django/Celery production image, reducing the uncompressed image size from ~6.3 GB to a target ≤1.5 GB. The changes eliminate unnecessary GPU-related infrastructure that was only used by an optional in-process cross-encoder reranker component, which has been removed entirely.
## Key Changes
### Docker Images
- Switched the base image from `pytorch/pytorch:2.7.1-cuda12.6-cudnn9-runtime` to `python:3.11-slim-bookworm` for both the production and local Django Dockerfiles
- Removed the CUDA environment variables `CUDA_MODULE_LOADING`, `TORCH_CUDA_ARCH_LIST`, `CUDA_VISIBLE_DEVICES`, and `PYTORCH_CUDA_ALLOC_CONF`, which are no longer applicable

### Code Removals
- `CrossEncoderReranker` class (`opencontractserver/pipeline/rerankers/cross_encoder_reranker.py`): the in-process cross-encoder reranker that required sentence-transformers and torch
- `CrossEncoderRerankerTest` and `CrossEncoderRerankerTests` from the test files
- `docs/deployment/docker-gpu-setup.md` and the CUDA-specific GitHub Actions workflow (`docker-build-cuda.yml`)
- `SENTENCE_TRANSFORMER_MODELS_PATH` setting from `config/settings/base.py`

### CI/CD Updates
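The 1.5 GiB image-size budget added to docker-build-release.yml amounts to comparing the built image's byte size against a threshold and failing the build on regression. A hedged Python sketch of that logic; the byte count passed in is a stand-in for the output of `docker image inspect --format '{{.Size}}'`:

```python
BUDGET_BYTES = int(1.5 * 1024**3)  # 1.5 GiB, the CI budget


def check_budget(size_bytes: int, budget: int = BUDGET_BYTES) -> None:
    """Fail (non-zero exit) when the image exceeds the size budget,
    mirroring how the CI step fails the release build on regression."""
    gib = size_bytes / 1024**3
    if size_bytes > budget:
        raise SystemExit(
            f"image is {gib:.2f} GiB, over the {budget / 1024**3:.1f} GiB budget"
        )
    print(f"image is {gib:.2f} GiB, within budget")


check_budget(1_200_000_000)  # stand-in for the inspected image size
```

Keeping the check in the release workflow means a dependency that quietly reintroduces a multi-gigabyte layer is caught at build time, not after deployment.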
### Documentation
- `CHANGELOG.md` updated to document the image size reduction and removal of GPU support

### Implementation Details
https://claude.ai/code/session_01YPhM1iPb5kRAiie4VXT5hu