
Remove PyTorch/CUDA dependency and slim production image to <1.5GB #1500

Merged
JSv4 merged 5 commits into main from claude/fix-issue-1494-1oW7Q on May 4, 2026

Conversation

JSv4 (Collaborator) commented May 3, 2026

Summary

This PR removes the PyTorch and CUDA dependencies from the Django/Celery production image, reducing the uncompressed image size from ~6.3 GB to a target ≤1.5 GB. The changes eliminate unnecessary GPU-related infrastructure that was only used by an optional in-process cross-encoder reranker component, which has been removed entirely.

Key Changes

Docker Images

  • Base image change: Switched from pytorch/pytorch:2.7.1-cuda12.6-cudnn9-runtime to python:3.11-slim-bookworm for both production and local Django Dockerfiles
  • Removed CUDA environment variables: Eliminated CUDA_MODULE_LOADING, TORCH_CUDA_ARCH_LIST, CUDA_VISIBLE_DEVICES, and PYTORCH_CUDA_ALLOC_CONF which are no longer applicable
  • Streamlined system dependencies: Removed build-time packages (tesseract-ocr, libtesseract-dev, ffmpeg, etc.) that were only needed for the cross-encoder; kept only runtime essentials (libpq5, gettext, poppler-utils, tesseract-ocr, opencv transitive deps)
  • Improved layer organization: Added clear section comments separating build and runtime stages; consolidated RUN commands to reduce layer count (a rough sketch of the resulting two-stage layout follows this list)
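
A rough, hypothetical sketch of the two-stage layout described above. Stage names, package lists, and paths are illustrative, not the verbatim contents of compose/production/django/Dockerfile:

ARG PYTHON_VERSION=3.11-slim-bookworm

# --- Build stage: full toolchain, used only to build wheels ---
FROM python:${PYTHON_VERSION} AS python-build-stage
ENV PIP_NO_CACHE_DIR=1
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libpq-dev \
    && rm -rf /var/lib/apt/lists/*
COPY requirements/ /requirements/
RUN pip wheel --wheel-dir /wheels -r /requirements/production.txt

# --- Runtime stage: slim base plus runtime libraries only ---
FROM python:${PYTHON_VERSION} AS python-run-stage
ENV PIP_NO_CACHE_DIR=1
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       libpq5 gettext poppler-utils tesseract-ocr libglib2.0-0 libgomp1 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=python-build-stage /wheels /wheels
RUN pip install --no-index --find-links=/wheels /wheels/* \
    && rm -rf /wheels
# Fail fast if opencv's shared-library deps are missing.
RUN python -c "import cv2"
COPY . /app
WORKDIR /app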

Code Removals

  • Removed CrossEncoderReranker class (opencontractserver/pipeline/rerankers/cross_encoder_reranker.py): The in-process cross-encoder reranker that required sentence-transformers and torch
  • Removed cross-encoder tests: Deleted test classes CrossEncoderRerankerTest and CrossEncoderRerankerTests from test files
  • Removed GPU documentation: Deleted docs/deployment/docker-gpu-setup.md and the CUDA-specific GitHub Actions workflow (docker-build-cuda.yml)
  • Removed configuration: Deleted SENTENCE_TRANSFORMER_MODELS_PATH setting from config/settings/base.py

CI/CD Updates

  • Updated release workflow: Added image size budget enforcement (1.5 GiB limit) to the Docker release build process to prevent future regressions
  • Removed CUDA workflow: Deleted the dedicated CUDA image build workflow since GPU support is no longer provided

Documentation

  • Updated CHANGELOG.md to document the image size reduction and removal of GPU support
  • Removed GPU setup guide from mkdocs navigation

Implementation Details

  • The production image now uses a multi-stage build with a slim Python base, keeping only the minimal runtime dependencies needed for the Django application
  • System package cleanup (apt-get purge and rm -rf /var/lib/apt/lists/*) is applied consistently to reduce layer bloat
  • The image size budget is enforced at build time via a shell script that validates the uncompressed image size against the 1.5 GiB limit (a minimal sketch of the check follows this list)
  • Cold pod pulls should now complete in seconds rather than 5+ minutes, significantly improving recovery time when nodes reschedule
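
The enforcement step is roughly the following shape (an illustrative sketch only; the budget variable name appears in the workflow, but tag resolution is simplified here):

# Illustrative sketch of the size gate, not the verbatim workflow script.
budget="${DJANGO_IMAGE_BUDGET_BYTES:-1610612736}"   # 1.5 GiB
size_bytes="$(docker image inspect --format='{{.Size}}' "$IMAGE_TAG")"
if [ "$size_bytes" -gt "$budget" ]; then
  echo "Django image is ${size_bytes} bytes, over the ${budget}-byte budget" >&2
  exit 1
fi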

https://claude.ai/code/session_01YPhM1iPb5kRAiie4VXT5hu

claude added 2 commits May 3, 2026 20:38
…er reranker (#1494)

Switches compose/{production,local}/django/Dockerfile from
pytorch/pytorch:2.7.1-cuda12.6-cudnn9-runtime (~6 GB) to
python:3.11-slim-bookworm with strict multi-stage build/runtime separation.
Removes the optional in-process CrossEncoderReranker and its torch-pulling
side-effects so the new slim base is sufficient for every code path that
ships in the image; reranking remains available via MicroserviceReranker
and CohereReranker.

Companion cleanups:
- Delete cross_encoder_reranker.py (auto-discovered, so no registry edits).
- Delete CrossEncoderRerankerTest / CrossEncoderRerankerTests stub-based
  test classes and their now-unused MagicMock import.
- Drop tokenizers pin from requirements/base.txt (orphaned once
  sentence-transformers is gone; nothing imports it directly).
- Drop SENTENCE_TRANSFORMER_MODELS_PATH setting and its
  production-stack.yml env var (only consumer was the deleted backend).
- Tighten .dockerignore to exclude frontend/, docs/, fixtures/, locale/,
  cloudflare-og-worker/, tools/, model_preloaders/, .github/, editor
  state, top-level READMEs, and Python caches.
- Add CI image-size budget (1.5 GiB) to docker-build-release.yml that
  fails the release build on regression.
- Delete docker-build-cuda.yml workflow and docs/deployment/docker-gpu-setup.md
  (and its mkdocs nav entry); both documented the now-removed CUDA path.

Closes #1494

The pytorch base image masked these because Ubuntu+CUDA shipped them
already; on python:3.11-slim-bookworm 'import cv2' fails with
ImportError on libgthread-2.0.so.0 / libgomp.so.1 until libglib2.0-0
and libgomp1 are installed.

Adds the two missing apt packages to both Dockerfiles plus a
'python -c "import cv2"' smoke test so a future opencv release that
introduces a new shared-library dep fails the build immediately
instead of silently shipping a broken image.

#1494
claude (Bot) commented May 3, 2026

Code Review: Remove PyTorch/CUDA dependency and slim production image

This is a well-motivated, impactful change. Reducing the production image from ~6.3 GB to ≤1.5 GB is a significant operational improvement. The PR is generally clean and thorough, but there are a few issues worth addressing before merge.


Critical / Potential Breaking

1. locale/ excluded from build context — may break Django translations

The updated .dockerignore adds:

locale/

Django uses the files in locale/ for compiled translation messages (.po/.mo). The gettext system package is still installed in the runtime image, but if the translation files are never COPY-ed in (because they're excluded from the build context), compilemessages and all runtime _() calls will silently fall back to the raw msgid strings.

Before merging, please verify: does the locale/ directory contain anything that the running server reads? If so, this entry needs to be removed from .dockerignore. If translations are compiled and the artifacts live elsewhere (e.g. in staticfiles/), a clarifying comment here would help.
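
A quick way to check this from the repository root (the paths follow the layout referenced elsewhere in this PR; adjust if it differs):

# Any translation catalogs in the excluded directory?
find locale/ -name '*.po' -o -name '*.mo'
# Anything in settings pointing Django at them?
grep -rn "LOCALE_PATHS\|USE_I18N" config/settings/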


Medium Issues

2. numfmt may not be installed on ubuntu-latest runners

The new size-budget step in .github/workflows/docker-build-release.yml calls:

human_size="$(numfmt --to=iec-i --suffix=B "$size_bytes")"
human_budget="$(numfmt --to=iec-i --suffix=B "$budget")"

numfmt is part of GNU coreutils and is present on current ubuntu-latest GitHub-hosted runners, but this is not guaranteed to remain true across future runner image updates. The enforcement logic itself ([ "$size_bytes" -gt "$budget" ]) doesn't depend on numfmt, so a safer approach is to make numfmt calls optional:

human_size="$(numfmt --to=iec-i --suffix=B "$size_bytes" 2>/dev/null || echo "${size_bytes} bytes")"

3. Inconsistent spacy download strategy between local and production Dockerfiles

Local runtime stage uses wget with explicit retry flags:

RUN wget --retry-connrefused --waitretry=1 --read-timeout=20 --timeout=15 -t 5 \
      https://...en_core_web_lg...whl \
    && pip install en_core_web_lg...

Production runtime stage uses the simpler but less resilient form:

RUN python -m spacy download en_core_web_sm \
    && python -m spacy download en_core_web_lg

The production build is the one that runs in CI and on release builds, where transient network failures are more costly to retry manually. Consider applying the same wget-with-retry pattern to the production Dockerfile for the large model, or at minimum using pip install --retries N.
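
A hedged sketch of what the retry-safe form could look like in the production Dockerfile, assuming wget is available in that stage; the wheel URL and version are placeholders, not values taken from this PR:

# Placeholder URL/version: pin to the en_core_web_lg release matching the installed spaCy.
ARG EN_CORE_WEB_LG_WHL=https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl
RUN wget --retry-connrefused --waitretry=1 --read-timeout=20 --timeout=15 -t 5 \
      "${EN_CORE_WEB_LG_WHL}" -O /tmp/en_core_web_lg.whl \
    && pip install /tmp/en_core_web_lg.whl \
    && rm /tmp/en_core_web_lg.whl \
    && python -m spacy download en_core_web_sm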

4. libpq-dev (dev headers) still used in local runtime stage

Production runtime correctly uses libpq5 (the shared library only). Local runtime still lists libpq-dev (which pulls in headers and static libs). Since wheels are pre-built in the builder stage, the runtime stage has no use for the -dev package. Switching local to libpq5 would make the two Dockerfiles consistent and trim the local image too.


Minor / Observations

5. CLAUDE.md attribution rule

The PR description ends with https://claude.ai/code/session_01YPhM1iPb5kRAiie4VXT5hu. The project's CLAUDE.md explicitly states: "Never credit Claude or Claude Code in commit messages, PR messages, comments, or any other artifacts." Please remove this link before merge.

6. load: true only for django image — correct, but worth a comment

load: ${{ matrix.image == 'django' }}

This is correct — load only makes sense when docker image inspect is needed immediately after. A one-line comment explaining why the condition exists would prevent a future maintainer from removing it thinking it's a bug.

7. Production Dockerfile removes git from runtime deps — confirm it's not needed

git was removed from the production runtime install list (it was present in the old image). This is almost certainly fine, but it's worth a quick check that no installed package's runtime path invokes git for version detection (some libraries do this). A grep -r 'subprocess.*git\|os.system.*git' /app sanity-check inside the old image would confirm.


What's Done Well

  • Clean, complete removal: every trace of CrossEncoderReranker — the class, tests, settings key, env var, CI workflow, and docs page — is gone. No dead code left behind.
  • Multi-stage build hygiene: build-only packages are correctly confined to python-build-stage; the runtime stage only installs what's needed at execution time.
  • Layer consolidation: chaining sed and chmod into single RUN statements is a solid improvement.
  • PIP_NO_CACHE_DIR=1 as an env var: cleaner than repeating --no-cache-dir on every pip install call.
  • .dockerignore expansion: excluding .cursor/, .claude/, .github/, docs/, fixtures/, cloudflare-og-worker/, model_preloaders/, etc., is the right call — the build context was leaking a lot of non-runtime material.
  • CI size budget: enforcing the 1.5 GiB ceiling in the release workflow is exactly the right way to prevent regressions from sneaking back in.
  • CHANGELOG entries: detailed, follow the Keep a Changelog format, and include file paths/line numbers per project convention.
  • tokenizers pin removal: the comment in requirements/base.txt correctly identified it as a transitive constraint for sentence-transformers; removing it now is appropriate cleanup.

Summary: The core of this PR (image slimming + CrossEncoder removal) is solid. The locale/ exclusion in .dockerignore is the one item that could cause a silent production regression and should be verified before merge. The other items are lower-severity but worth addressing while the Dockerfiles are being touched.

claude (Bot) commented May 3, 2026

Code Review — PR #1500: Remove PyTorch/CUDA dependency, slim production image

This is a well-scoped, well-motivated PR. The primary goal (drop the 4–5 GB PyTorch/CUDA layer) is clearly achieved, the motivations are well-documented in both the PR description and CHANGELOG, and the layer-separation pattern (build stage vs. runtime stage) is correct. Below are some issues and suggestions, roughly ordered by severity.


Potential Bug: locale/ excluded from Docker build context

.dockerignore now excludes the locale/ directory:

locale/

However, the Dockerfiles still install gettext and the Django compilemessages management command is called in the entrypoint on some deployments. If any .po/.mo translation files live under locale/ (even unused ones), they will be silently absent from the image and any compilemessages run will be a no-op. Worth confirming whether locale/ is empty or already excluded from VCS before merging; if it's in use this is a silent regression.


Minor Issue: libpq-dev vs libpq5 inconsistency between local and production runtime stages

Production runtime stage (correct):

libpq5 \

Local runtime stage (installs the dev package):

libpq-dev \

libpq-dev includes the headers and the client library; libpq5 is the runtime-only package. Since psycopg2 is installed from a prebuilt wheel in the runtime stage, the -dev headers are never needed there and add unnecessary weight to the local image. The local Dockerfile comment says "Mirror production with the addition of wget" but diverges here. Suggest changing libpq-dev → libpq5 in the local runtime stage (line ~82 after the diff applies).


Minor Issue: first_tag extraction in the size-budget script is fragile

first_tag="$(echo '${{ steps.meta.outputs.tags }}' | head -n1)"
size_bytes="$(docker image inspect --format='{{.Size}}' "$first_tag")"

steps.meta.outputs.tags is injected by GitHub Actions before the shell sees it. If the tag list contains a newline immediately after expansion, the single-quoting approach may produce unexpected results on some runners, and the {{.Size}} Go template inside a double-quoted string can confuse some shells. Since the build step now has id: build, a more robust alternative is:

image_id="${{ steps.build.outputs.imageid }}"
size_bytes="$(docker image inspect --format='{{.Size}}' "$image_id")"

steps.build.outputs.imageid is the image digest returned directly by docker/build-push-action — it's guaranteed to be a single-line value regardless of how many tags were applied.


Suggestion: Add image size gate to the PR / main CI run, not only the release build

The budget enforcement step is in docker-build-release.yml, which only fires on release publication. A size regression introduced in a feature branch will pass all CI checks and won't be caught until the next release build. Consider adding a similar step, either blocking or advisory, to the main CI workflow (e.g., production-stack.yml) that builds the Django image and checks its size on every push to main, or at minimum on every change to compose/production/django/Dockerfile or requirements/.
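
A rough sketch of what that could look like as extra steps in the main CI job (tag name and budget handling are illustrative, not wording from this PR):

- name: Build Django image for size check
  run: docker build -f compose/production/django/Dockerfile -t django:size-check .
- name: Enforce image size budget
  run: |
    budget=1610612736   # 1.5 GiB uncompressed
    size_bytes="$(docker image inspect --format='{{.Size}}' django:size-check)"
    if [ "$size_bytes" -gt "$budget" ]; then
      echo "Django image is ${size_bytes} bytes, over the ${budget}-byte budget" >&2
      exit 1
    fi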


Nit: Production spaCy download less resilient than local

Local correctly uses wget --retry-connrefused + a retry count for the large spaCy model, which is good for flaky CI runners. Production uses python -m spacy download en_core_web_lg which internally calls pip and inherits the PIP_RETRIES=10 env var — so it does retry, but the failure messages are less transparent. Not a blocking issue, just worth being aware of if production builds start failing intermittently on model downloads.


Positive callouts

  • Smoke-test import cv2 during build: Excellent practice. This has caught real regressions where a new opencv-python-headless release gained a shared-library dependency that wasn't installed.
  • Multi-stage build isolation: Build toolchain (build-essential, cmake, libtesseract-dev, libpq-dev) correctly stays in the build stage and never leaks to the runtime image.
  • PIP_NO_CACHE_DIR=1: Correct in both stages — ensures pip's HTTP cache doesn't inflate layer size.
  • Consolidated RUN commands for scripts (sed + chmod combined): reduces layer count without sacrificing readability.
  • CHANGELOG entry is detailed and accurately cross-references the files/settings removed.
  • Auto-discovery means no registry wiring needed: The comment noting that removing the file is sufficient because the pipeline registry auto-discovers rerankers is worth keeping — future contributors will understand why there's no explicit deregistration call.

Summary: The locale/ exclusion is the one item worth confirming before merge (it could silently break translations). The libpq-dev → libpq5 fix in the local Dockerfile is a straightforward cleanup. Everything else is suggestions or nits. The core image-slim work is solid.

codecov (Bot) commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


The previous PyTorch base image shipped `brotli` incidentally via conda,
so httpx registered a `br` decoder and transparently decompressed
brotli-encoded responses. The new `python:3.11-slim-bookworm` base does
not, leaving httpx's SUPPORTED_DECODERS at {identity, gzip, deflate,
zstd}. Without a brotli decoder, httpx hands the still-compressed bytes
to callers and the OpenAI SDK raises
`json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)` when
it tries to parse them.

This surfaced as `test_structured_response_uses_document_tools`
failing — its VCR cassette is the only one in the repo recorded with
`Content-Encoding: br`. Live OpenAI traffic was equally affected
whenever the API negotiated brotli.

Adding `brotli>=1.1.0` re-enables httpx's `br` decoder for both VCR
replay and production HTTP.
claude (Bot) commented May 3, 2026

Code Review: PR #1500 — Remove PyTorch/CUDA and slim production image

Overall: This is a well-scoped, operationally important PR. The rationale is sound (eliminating ~4–5 GB of PyTorch/CUDA bloat that only served one optional in-process reranker), the implementation is clean, and the CHANGELOG entries are exemplary. A few issues worth addressing before merge:


Potential Bugs / Issues

1. locale/ excluded from Docker build context (.dockerignore line 56)

locale/ is now excluded from the Docker build context entirely. If compilemessages is called at container startup (or during the Django entrypoint), the .po/.mo files won't be present in the image and the command will silently produce nothing — or in some configurations fail. The production runtime stage still installs gettext, which implies compiled translations are expected. Please verify whether any Celery/Django startup script runs compilemessages, and if so, either:

  • Remove locale/ from .dockerignore, or
  • Pre-compile .mo files in the build stage and COPY them in (see the sketch after this list), or
  • Document that translations are intentionally disabled.
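
If the pre-compile option is chosen, a minimal sketch of the Dockerfile changes; it assumes locale/ remains available to the build context and that gettext's msgfmt is installed in the build stage (stage name and paths are illustrative):

# Build stage: locale/ still present in the build context here.
COPY locale/ /app/locale/
RUN find /app/locale -name '*.po' \
      -exec sh -c 'msgfmt "$0" -o "${0%.po}.mo"' {} \;

# Runtime stage: copy only the compiled catalogs.
COPY --from=python-build-stage /app/locale/ /app/locale/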

2. load: true + push: true in release workflow (.github/workflows/docker-build-release.yml)

The diff adds load: ${{ matrix.image == 'django' }} to the docker/build-push-action step. If the existing workflow also has push: true (which is expected for a release workflow), both flags are active simultaneously for the Django image. While docker/build-push-action@v7 with Buildx ≥0.10 supports this for single-platform builds via --output fanout, it's not well-documented and some runner/daemon configurations reject it. Recommend verifying this works end-to-end in a test build before merge, or restructuring as:

  1. Build and load (no push) for size inspection.
  2. Push using the already-built image or a separate docker push step (sketched below).
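
A sketch of that second shape, assuming the build step carries id: build and the existing DJANGO_IMAGE_BUDGET_BYTES value is exposed to the step as an environment variable (action version, step names, and tag handling are illustrative):

- name: Build and load Django image (no push)
  id: build
  uses: docker/build-push-action@v6
  with:
    context: .
    file: compose/production/django/Dockerfile
    push: false
    load: true
    tags: ${{ steps.meta.outputs.tags }}
- name: Enforce image size budget
  run: |
    size_bytes="$(docker image inspect --format='{{.Size}}' "${{ steps.build.outputs.imageid }}")"
    [ "$size_bytes" -le "$DJANGO_IMAGE_BUDGET_BYTES" ] || { echo "Image exceeds budget" >&2; exit 1; }
- name: Push the already-built tags
  run: |
    for tag in ${{ steps.meta.outputs.tags }}; do
      docker push "$tag"
    done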

3. ffmpeg removal — verify no other consumers exist

ffmpeg was removed because it was "only needed for the cross-encoder." Before merging, a quick grep should confirm there are no other consumers:

grep -r "ffmpeg\|subprocess.*ffmpeg\|import ffmpeg" opencontractserver/ --include="*.py"

If any parser or pipeline component shells out to ffmpeg (e.g., for audio/video extraction or PDF rendering), it will silently fail at runtime. The PR description asserts this is safe, but it's worth a callout that this was verified.


Code Quality Observations

4. Inconsistent spaCy download method between local and production Dockerfiles

  • Local (compose/local/django/Dockerfile): Uses pip install <direct GitHub URL> + wget with retry logic for the large model.
  • Production (compose/production/django/Dockerfile): Uses python -m spacy download en_core_web_sm / python -m spacy download en_core_web_lg.

The spacy download command is less deterministic — it resolves the exact wheel URL at build time and could silently drift to a different patch version. The local pip install <pinned-URL> approach is more reproducible. Consider using the same pinned-URL pattern in production (or adding wget to the production build stage and using the same retry-safe wget approach).

5. Python version not pinned to a patch version

Both Dockerfiles now use ARG PYTHON_VERSION=3.11-slim-bookworm. This floats on the latest Python 3.11.x patch, meaning rebuilds on different dates can pick up new CPython releases that change behavior. Consider pinning to a specific patch (e.g., 3.11.12-slim-bookworm) for reproducible builds.

6. brotli addition is well-justified, but worth a note on the version floor

brotli>=1.1.0 is the right fix for the httpx br decoder issue described in the CHANGELOG. One minor nit: brotli ≥1.0.0 already exposed the C-extension API that httpx uses; 1.1.0 is fine, just verify the floor was chosen deliberately (e.g., a CVE or API change) rather than incidentally from the local environment.


Positive Highlights

  • CI image size budget (DJANGO_IMAGE_BUDGET_BYTES) is an excellent regression guard — surfacing this at release time rather than at pod cold-start is exactly the right place to catch it.
  • Multi-stage build clarity: The # --- Build stage --- / # --- Runtime stage --- section comments and the per-package comments in the runtime apt-get block make the intent of every dependency explicit.
  • cv2 smoke test (python -c "import cv2; print(...)") is a good practice for catching missing shared-lib deps early in the build.
  • .dockerignore expansion: Excluding .cursor/, .claude/, tools/, model_preloaders/, and similar dev-only paths is correct — these had no business in the runtime image and could leak development context.
  • Clean test removal: The CrossEncoderRerankerTest / CrossEncoderRerankerTests deletions correctly follow the "no dead code" rule, and the orphaned MagicMock import cleanup is thorough.
  • CHANGELOG entries are detailed, well-structured, and include file paths and line numbers per project conventions.

Summary

One real issue (locale/ exclusion) and one workflow concern (load+push interaction) should be verified before merge. The Python version pinning and spaCy consistency items are lower priority but improve reproducibility. The overall approach is sound and the operational win (~4.8 GB image reduction) is substantial.

github-actions Bot and others added 2 commits May 4, 2026 00:27
CHANGELOG.md: combined "Changed" entries — both the image-slim entry
(branch) and the Celery at-least-once delivery entry (main).

Screenshots in docs/assets/images/screenshots/auto/: forced to branch
versions via `git checkout HEAD --`. The two new
`discover--search-results--*.png` files added on main are new files
(not conflicts) and are kept.
JSv4 merged commit 1236d57 into main on May 4, 2026
11 of 12 checks passed