Docker hardening follow-ups (Phase 3) #1551

@brunobuddy

Follow-up polish items from the Docker release audit. The critical correctness fixes (routing status honesty, provider whitelist, tier override validation, pricing cache health UI, .env.example wiring, MANIFEST_TRUST_LAN docs, OpenClaw branding) are being shipped in the companion PR. This issue tracks the remaining items that didn't make that pass.

Compose hardening leftovers (docker/docker-compose.yml)

  • Add a size limit to the `/tmp` tmpfs: `tmpfs: - /tmp:size=64m`. Today a misbehaving upload or stream buffer can grow unbounded into container memory.
  • Raise `pids_limit` from 256 to 512. Node + Better Auth + proxy fanout can easily burst past 100 pids, so 256 leaves little headroom and can cause surprise `EAGAIN` failures.
  • Replace the healthcheck `wget -qO-` with `node -e "fetch('http://127.0.0.1:3001/api/v1/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"`. BusyBox `wget` support for `-qO-` is shaky across Alpine variants; `node` is guaranteed to be present in the runtime image.

HTTP status codes for non-chat error responses

  • In `packages/backend/src/routing/proxy/proxy-exception.filter.ts`, detect the request intent (e.g. `Accept: application/json` vs. an SSE or OpenAI-style chat client) and return real 4xx statuses (401 for auth errors, 400 for body errors) when the caller clearly isn't a chat UI. Keep the current friendly HTTP 200 envelope for chat clients. Today the blanket 200 misleads CI pipelines and monitors into treating broken configs as healthy.
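A minimal sketch of the intent check described above, not the actual filter code. The helper names `isClearlyNotChat` and `statusForProxyError`, the lowercase header keys, and the use of a `stream` flag in the request body as a chat signal are all assumptions made for illustration.

```typescript
// Hypothetical helpers; the real proxy-exception.filter.ts internals are not shown in this issue.
type HeaderMap = Record<string, string | undefined>; // assumes lowercase header names
type ProxyErrorKind = "auth" | "body";

// A caller is "clearly not chat" only if it asks for JSON and shows no streaming signal.
function isClearlyNotChat(headers: HeaderMap, body?: { stream?: boolean }): boolean {
  const accept = (headers["accept"] ?? "").toLowerCase();
  const wantsJson = accept.includes("application/json");
  const wantsStream = accept.includes("text/event-stream") || body?.stream === true;
  return wantsJson && !wantsStream;
}

// Chat clients (and ambiguous callers) keep the friendly 200 envelope;
// everyone else gets a real 401 (auth) or 400 (body) status.
function statusForProxyError(
  kind: ProxyErrorKind,
  headers: HeaderMap,
  body?: { stream?: boolean },
): number {
  if (!isClearlyNotChat(headers, body)) return 200;
  return kind === "auth" ? 401 : 400;
}
```

Defaulting ambiguous callers to the 200 envelope keeps the change conservative: only requests that explicitly ask for JSON and do not stream see the new 4xx behavior.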

Gate `/api/v1/public/usage` behind an opt-in env var

  • `packages/backend/src/public-stats/public-stats.controller.ts` — only serve the public usage stats when `MANIFEST_PUBLIC_STATS=true`. Default off. Loopback binding in the Phase 1 PR already mitigates the default exposure, but anyone running Manifest behind a public reverse proxy today leaks aggregate message counts to unauthenticated callers.
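One way the opt-in gate could look, as a sketch under stated assumptions. Only `MANIFEST_PUBLIC_STATS` comes from this issue; the helper names and the choice of 404 for the disabled case are hypothetical.

```typescript
// Hypothetical helpers; only the MANIFEST_PUBLIC_STATS variable name comes from the issue.
type Env = Record<string, string | undefined>;

// Strict opt-in: anything other than the literal string "true" leaves the endpoint off.
function publicStatsEnabled(env: Env): boolean {
  return env["MANIFEST_PUBLIC_STATS"] === "true";
}

// Assumption: a disabled endpoint answers 404 so it is indistinguishable from a missing route.
function publicUsageStatus(env: Env): 200 | 404 {
  return publicStatsEnabled(env) ? 200 : 404;
}
```

In the controller this would be called with `process.env`. Comparing against the exact string `"true"` means values such as `1`, `yes`, or an unset variable all keep the default-off behavior.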

Verify + demote Better Auth "User not found" log level

  • Reproduce by attempting a login with an unknown email and grepping the backend logs for `User not found`. The string is confirmed absent from our own source, so it's likely emitted upstream by Better Auth. If so, wrap their logger or file an issue upstream. If it turns up in our own code, demote it to WARN: it's an enumeration aid and floods alerting pipelines that watch for ERROR-level spikes.

Parameterize `og:url` / `og:image` from `BETTER_AUTH_URL`

  • `packages/frontend/index.html` currently hardcodes `https://app.manifest.build` for `og:url` and `og:image`. Inject them at boot time from `BETTER_AUTH_URL` (server-side template substitution) so that links shared by self-hosters carry their own branding instead of Cloud Manifest's.
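The substitution could be sketched roughly as follows. The function name `rewriteOgTags` is hypothetical, and the sketch assumes `BETTER_AUTH_URL` holds a bare base URL (for example `https://manifest.example.com`) and that the tags are written as `<meta property="og:..." ...>`.

```typescript
// Origin currently hardcoded in packages/frontend/index.html, per this issue.
const HARDCODED_ORIGIN = "https://app.manifest.build";

// Hypothetical helper: rewrites only the og:url and og:image meta tags,
// leaving every other occurrence of the hardcoded origin untouched.
function rewriteOgTags(html: string, betterAuthUrl: string | undefined): string {
  if (!betterAuthUrl) return html; // nothing configured: serve the file as-is
  const base = betterAuthUrl.replace(/\/+$/, ""); // drop trailing slashes
  const ogTag = /<meta\b[^>]*\bproperty="og:(?:url|image)"[^>]*>/g;
  return html.replace(ogTag, (tag) => tag.split(HARDCODED_ORIGIN).join(base));
}
```

Restricting the rewrite to the two meta tags avoids accidentally changing documentation links or other references to the hosted product elsewhere in the page.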

CI smoke test for `read_only: true` compose boot

  • Add `.github/workflows/docker-smoke.yml` that runs `docker compose -f docker/docker-compose.yml up -d`, waits for `/api/v1/health`, runs a login + agent-create e2e path via curl, and tears down. Guards against any future code that silently writes to disk (which would crash the read-only production container).
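The "waits for `/api/v1/health`" step is essentially a bounded retry loop. A minimal sketch, assuming a hypothetical `waitForHealthy` helper that takes the probe as a parameter so it can be exercised without a running container:

```typescript
// Hypothetical helper for the smoke test; the probe is injected so the loop is testable offline.
type Probe = () => Promise<boolean>;

// Calls the probe up to `attempts` times, sleeping `delayMs` between tries.
// A thrown error (e.g. connection refused while the container boots) counts as "not ready yet".
async function waitForHealthy(probe: Probe, attempts: number, delayMs: number): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await probe()) return true;
    } catch {
      // Container not accepting connections yet; fall through and retry.
    }
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

In the workflow the probe could be `() => fetch("http://127.0.0.1:3001/api/v1/health").then((r) => r.ok)`, using the same address as the compose healthcheck. A `false` result should fail the job before the login and agent-create steps run.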

Narrow the Dockerfile `*.md` cleanup inside node_modules

  • `docker/Dockerfile` currently runs `find . -path "*/node_modules/*" -name "*.md" -delete`. It is rare, but some packages read markdown at runtime (e.g. `js-yaml`'s schema docs). Scope the delete to package roots only (a `-maxdepth`-bounded `find` or explicit README targeting).

Minor — filter diagnostic/error `agent_messages` rows out of the "messages today" counters

  • This filter already landed in the Phase 1/2 PR for the workspace-card and per-agent analytics, but not for `/api/v1/messages` itself. Consider adding a `status` filter parameter to `/api/v1/messages` so the UI can offer an "errors only" toggle without a schema change.
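A rough sketch of how a `status` query parameter might be validated and applied. The allowed values `"ok"` and `"error"`, the row shape, and both helper names are hypothetical; the issue does not specify what `agent_messages.status` actually contains.

```typescript
// Hypothetical status values; the real column values are not specified in this issue.
const ALLOWED_STATUSES = ["ok", "error"] as const;
type MessageStatus = (typeof ALLOWED_STATUSES)[number];

// Parses the raw query value. Absent or empty means "no filter";
// an unknown value is rejected rather than silently ignored.
function parseStatusFilter(raw: string | undefined): MessageStatus | undefined {
  if (raw === undefined || raw === "") return undefined;
  const match = ALLOWED_STATUSES.find((s) => s === raw);
  if (match === undefined) throw new RangeError(`unsupported status filter: ${raw}`);
  return match;
}

// Applies the optional filter to already-loaded rows (a real endpoint would filter in the query).
function filterByStatus<T extends { status: string }>(
  rows: readonly T[],
  status: MessageStatus | undefined,
): T[] {
  return status === undefined ? [...rows] : rows.filter((r) => r.status === status);
}
```

Because the parameter is optional and defaults to no filtering, existing callers of `/api/v1/messages` would keep their current behavior.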
