
Add Docker + docker-compose for reproducible pipeline execution #11

@DogInfantry

Summary

Add a Dockerfile and docker-compose.yml so the pipeline runs identically on any machine without manual environment setup.

Motivation

The current setup requires Anaconda, a specific Python version, and system dependencies for WeasyPrint (Cairo, Pango). This is the most common friction point for new users. A Docker image bundles all three, so the only prerequisite is Docker itself.

Proposed Structure

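# Dockerfile
FROM python:3.11-slim

# WeasyPrint system deps; skip recommended packages and drop the apt lists to keep the image small
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcairo2 libpango-1.0-0 libpangocairo-1.0-0 \
    libgdk-pixbuf2.0-0 libffi-dev shared-mime-info \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
# Copy requirements first so the dependency layer stays cached when only source files change
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

ENTRYPOINT ["python", "main_v2.py"]

# docker-compose.yml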
services:
  research-engine:
    build: .
    environment:
      - SEC_USER_AGENT=${SEC_USER_AGENT}
    volumes:
      - ./outputs:/app/outputs
      - ./data:/app/data
    command: build-all --as-of 2026-03-27
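
Docker Compose fills in ${SEC_USER_AGENT} from the shell environment or from a .env file next to docker-compose.yml. A minimal sketch of such a file follows. The value is a placeholder, and the "name plus contact email" format is an assumption based on SEC EDGAR's fair-access guidance, not something this issue specifies.

```
# .env (hypothetical example; replace the placeholder with your own contact details)
SEC_USER_AGENT="Example Research admin@example.com"
```

With this file in place, the compose service receives the variable without it being hardcoded in the image or in docker-compose.yml.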

Acceptance Criteria

  • Dockerfile builds successfully
  • docker-compose up runs a full build-all and writes output to ./outputs/
  • SEC_USER_AGENT passed via environment variable (not hardcoded)
  • .dockerignore added to exclude outputs/, data/cache/, .venv/
  • Docker instructions added to README Quick Start
  • Image size under 800MB
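
A possible starting point for the .dockerignore criterion is sketched below. The first three entries come from the list above. The remaining entries are assumptions about what a typical Python repository contains and may not apply to this project.

```
# .dockerignore
# Entries named in the acceptance criteria
outputs/
data/cache/
.venv/

# Assumed extras (hypothetical; adjust to the actual repo layout)
.git/
__pycache__/
*.pyc
.env
```

Because ./outputs and ./data are bind-mounted at runtime in the proposed docker-compose.yml, excluding outputs/ and data/cache/ from the build context should not hide them from the running container.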

Labels: enhancement, help wanted
