
fix(isolation): reuse host image instead of re-downloading inside DinD (#1879) #1880

Merged
konard merged 7 commits into main from issue-1879-16535c90dd74
Jun 11, 2026

Conversation

@konard

@konard konard commented Jun 9, 2026

Contributor

Summary

Fixes #1879 — in the Docker-in-Docker (DinD) deployment, --isolation docker tasks launched a docker run konard/hive-mind-dind:latest … against the container's nested Docker daemon, whose image store starts empty. Docker reported Unable to find image … locally and pulled a fresh multi-gigabyte copy on every first task, even though the host already had that exact image.

Root cause

docker run always targets the daemon it's pointed at. Inside the DinD container that is the nested dockerd, not the host daemon. The provisioning script wipes /var/lib/docker before docker commit, so the nested store is empty and Docker's default missing pull policy pulls the whole image. The host's copy lives in a different daemon and is invisible to the nested run.
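
As a minimal sketch of the reasoning above, the pull decision can be modelled as a function of the policy and of whether the image exists in the daemon that receives the command. The function name and the two `Set` stores are illustrative assumptions, not code from this PR; only the three policy names come from Docker's `--pull` flag.

```javascript
// Simplified model of the `--pull` decision made for `docker run`.
// `imagePresent` means "present in the daemon that receives the command".
function pullDecision(policy, imagePresent) {
  switch (policy) {
    case 'always':
      return 'pull';
    case 'never':
      return imagePresent ? 'use-local' : 'fail';
    case 'missing': // Docker's default
      return imagePresent ? 'use-local' : 'pull';
    default:
      throw new Error(`unknown pull policy: ${policy}`);
  }
}

// Two independent image stores (illustrative contents only).
const hostStore = new Set(['konard/hive-mind-dind:latest']);
const nestedStore = new Set(); // empty after /var/lib/docker is wiped

const image = 'konard/hive-mind-dind:latest';
console.log(pullDecision('missing', hostStore.has(image)));   // use-local
console.log(pullDecision('missing', nestedStore.has(image))); // pull
```

The second call is the situation described in #1879: the host's copy does not count because the nested daemon is the one being asked.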

This is not caused by link-foundation/start force-pulling (it only pulls when the image is missing locally), and Hive Mind builds its own docker run rather than using start's backend — so "simplifying via start#133" would not change the behavior. Full analysis with evidence in docs/case-studies/issue-1879/.

Durable fix: box v2.2.0 native host-image passthrough

The nested-daemon-starts-empty behavior originates in the DinD base image (konard/box-dind). The upstream request link-foundation/box#94 was implemented in box PR#95 and shipped in box v2.2.0, which adds native host-image passthrough. This PR bumps the base images to v2.2.0, so the DinD deployment can now seed the nested daemon from the host automatically at container startup:

docker run -d --name hive-mind --privileged \
  -v /var/run/docker.sock:/var/run/host-docker.sock:ro \
  -e DIND_HOST_PASSTHROUGH=public \
  konard/hive-mind-dind:<tag>

public mode copies only images with a RepoDigest from an allowlisted public registry, so the host's already-pulled konard/hive-mind{,-dind} land in the nested daemon while private images and credentials stay on the host. This is now the primary approach; the preload helper below remains as an exact per-image fallback.
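
For deployments started from a Node script rather than a shell, the command above could be assembled as an argument array. This is only a sketch: `buildDindRunArgs` is a hypothetical helper, and the `latest` tag in the usage line is the default named in this PR, used here as a stand-in for a pinned `<tag>`.

```javascript
// Hypothetical helper: builds the argv for the DinD container start shown above.
// The flags and the socket mount path are taken from this PR's runbook.
function buildDindRunArgs({ name, image, passthroughMode }) {
  return [
    'run', '-d', '--name', name, '--privileged',
    // Host socket mounted read-only at the path the runbook uses.
    '-v', '/var/run/docker.sock:/var/run/host-docker.sock:ro',
    '-e', `DIND_HOST_PASSTHROUGH=${passthroughMode}`,
    image,
  ];
}

const args = buildDindRunArgs({
  name: 'hive-mind',
  image: 'konard/hive-mind-dind:latest', // stand-in; pin a real tag in practice
  passthroughMode: 'public',
});
console.log(['docker', ...args].join(' '));
```

Keeping the arguments as an array means they can be handed to `child_process.execFile('docker', args)` without shell quoting concerns.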

Changes

  • Dockerfile / Dockerfile.dind / coolify/Dockerfile — bump base images to konard/box:2.2.0 / konard/box-dind:2.2.0 (first release with native passthrough). docs/UBUNTU-SERVER*.md examples bumped to match. release.yml extracts these FROM tags automatically.
  • src/isolation-runner.lib.mjs — image tag pinning via HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG (default latest, unchanged), and a --pull policy via HIVE_MIND_DOCKER_ISOLATION_PULL (always|missing|never; invalid ignored). Verbose mode now logs the resolved image and pull policy.
  • scripts/preload-dind-isolation-image.mjs — seeds the nested daemon from the host (docker save <image> | docker exec -i <container> docker load), skipping the copy if already present. Exact per-image fallback when mounting the host socket is undesirable.
  • .env.example — documents the isolation image/tag/pull controls.
  • tests/test-issue-1879-docker-image-reuse.mjs: 18 regression tests (tag resolution, image composition, pull-policy parsing, docker run construction). The existing #1860 suite (tests/test-issue-1860-docker-isolation.mjs) still passes.
  • docs/case-studies/issue-1879/ — deep case study: evidence, timeline, requirements, root causes, online facts, libraries considered, solution, operator runbook, and the upstream follow-ups below.
  • .changeset/issue-1879-docker-image-reuse.md — patch bump.
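
The environment handling described for src/isolation-runner.lib.mjs can be pictured roughly as follows. This is a sketch under assumptions, not the code in the PR: the `sketch…` function names are hypothetical, and the trimming, lowercasing, and use of `null` for "emit no `--pull` flag" are guesses at what "invalid ignored" means in practice.

```javascript
const VALID_PULL_POLICIES = ['always', 'missing', 'never'];

// Tag resolution: fall back to `latest` when the variable is unset or blank.
function sketchResolveImageTag(env = process.env) {
  const tag = (env.HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG || '').trim();
  return tag || 'latest';
}

// Pull policy: return a valid policy, or null so that no `--pull` flag is added
// and Docker's own default applies (assumed meaning of "invalid ignored").
function sketchGetPullPolicy(env = process.env) {
  const raw = (env.HIVE_MIND_DOCKER_ISOLATION_PULL || '').trim().toLowerCase();
  return VALID_PULL_POLICIES.includes(raw) ? raw : null;
}

console.log(sketchResolveImageTag({}));                                        // latest
console.log(sketchGetPullPolicy({ HIVE_MIND_DOCKER_ISOLATION_PULL: 'never' })); // never
console.log(sketchGetPullPolicy({ HIVE_MIND_DOCKER_ISOLATION_PULL: 'bogus' })); // null
```

Passing the environment as a parameter keeps both functions easy to test without mutating `process.env`.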

Operator runbook (full reuse, no re-download)

Primary (box v2.2.0): run the container with -v /var/run/docker.sock:/var/run/host-docker.sock:ro -e DIND_HOST_PASSTHROUGH=public (above). Then pin the tag and export HIVE_MIND_DOCKER_ISOLATION_PULL=never so tasks reuse the seeded image and fail fast instead of re-downloading.

Fallback (exact scope / no host socket): node scripts/preload-dind-isolation-image.mjs --container hive-mind --image konard/hive-mind-dind:<tag>, then HIVE_MIND_DOCKER_ISOLATION_PULL=never.
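
The fallback amounts to "check the nested daemon, copy only if needed". A rough sketch of that plan is below; `sketchPreloadPlan` is a hypothetical name, and using `docker image inspect` as the presence check is an assumption (the real script may detect presence differently). The `docker save | docker exec -i … docker load` pipeline is the one named in this PR.

```javascript
// Conservative pattern so values are safe to interpolate into a shell pipeline.
const SAFE_REF = /^[A-Za-z0-9][A-Za-z0-9_.\/:@-]*$/;

// Returns the presence check (as argv) and the copy pipeline (as a shell string).
// `copy` is null when the image is already present, which keeps the step idempotent.
function sketchPreloadPlan({ container, image, alreadyPresent }) {
  for (const value of [container, image]) {
    if (!SAFE_REF.test(value)) throw new Error(`unsafe value: ${value}`);
  }
  return {
    // Assumed check: exits non-zero when the nested daemon lacks the image.
    check: ['docker', 'exec', container, 'docker', 'image', 'inspect', image],
    copy: alreadyPresent
      ? null
      : `docker save ${image} | docker exec -i ${container} docker load`,
  };
}

const plan = sketchPreloadPlan({
  container: 'hive-mind',
  image: 'konard/hive-mind-dind:latest',
  alreadyPresent: false,
});
console.log(plan.copy);
```

The copy step is a pipeline, so it needs a shell (for example `child_process.execSync(plan.copy)`); that is why the sketch validates both values before interpolating them.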

Upstream reports

  • box#94 — pre-seed the nested daemon. RESOLVED, shipped in box v2.2.0 (PR#95).
  • box#96: public-mode passthrough test is a false positive. box's tests/dind/example-preload-images.sh only asserts the negative case (local fixture not copied) and that the mode log line appears; it never asserts a genuinely public image is copied, so a "public mode copies nothing" regression would pass CI. Found while auditing release run 27277556456 for false positives. Includes a suggested fix.
  • box#97: per-repository passthrough allowlist. box can narrow passthrough by registry but not by image name, so there is no way to pass through only konard/hive-mind{,-dind}. public mode (copies all public host images) is the working, secret-safe default until a DIND_HOST_PASSTHROUGH_IMAGES allowlist ships.
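
To make the two follow-ups concrete, here is a small model of the selection logic they discuss. It is an illustration only and does not reflect how box implements passthrough: the function names, the image objects, and the `repositories` option (standing in for the requested, not yet shipped, DIND_HOST_PASSTHROUGH_IMAGES allowlist) are all assumptions. The digests are placeholders.

```javascript
// Split a RepoDigest such as "ghcr.io/org/app@sha256:…" into registry and repository.
// A first path component with "." or ":" (or "localhost") is treated as a registry
// host; otherwise the image is assumed to come from Docker Hub (docker.io).
function parseRepoDigest(repoDigest) {
  const name = repoDigest.split('@')[0];
  const slash = name.indexOf('/');
  const first = slash === -1 ? '' : name.slice(0, slash);
  const hasRegistry = first.includes('.') || first.includes(':') || first === 'localhost';
  return hasRegistry
    ? { registry: first, repository: name.slice(slash + 1) }
    : { registry: 'docker.io', repository: name };
}

// Keep images with at least one RepoDigest from an allowed registry and,
// when a repository allowlist is given, from an allowed repository.
function sketchSelectPassthrough(images, { registries, repositories = null }) {
  return images.filter((img) =>
    img.repoDigests.some((digest) => {
      const { registry, repository } = parseRepoDigest(digest);
      if (!registries.includes(registry)) return false;
      return repositories === null || repositories.includes(repository);
    }));
}

const hostImages = [
  { name: 'konard/hive-mind-dind', repoDigests: ['konard/hive-mind-dind@sha256:<digest>'] },
  { name: 'ubuntu', repoDigests: ['ubuntu@sha256:<digest>'] },
  { name: 'local-fixture', repoDigests: [] }, // built locally, never pushed
];

const publicMode = sketchSelectPassthrough(hostImages, { registries: ['docker.io'] });
const scoped = sketchSelectPassthrough(hostImages, {
  registries: ['docker.io'],
  repositories: ['konard/hive-mind', 'konard/hive-mind-dind'],
});
console.log(publicMode.map((i) => i.name)); // [ 'konard/hive-mind-dind', 'ubuntu' ]
console.log(scoped.map((i) => i.name));     // [ 'konard/hive-mind-dind' ]
```

A test in the spirit of box#96 would check both directions on `publicMode`: the local fixture is absent (negative case) and a genuinely public image is present (positive case).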

Reproduction & verification

  • Reproduction: the Unable to find image … locally / Pulling from … sequence in docs/case-studies/issue-1879/raw/e8c6d542-task-execution.log.
  • node tests/test-issue-1879-docker-image-reuse.mjs → 18 passed, 0 failed.
  • node tests/test-issue-1860-docker-isolation.mjs → 26 passed, 0 failed.

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #1879
@konard konard self-assigned this Jun 9, 2026
konard added 2 commits June 9, 2026 20:34
#1879)

Docker-isolated Telegram tasks launched from the DinD deployment re-downloaded
konard/hive-mind-dind:latest on first run because the `docker run` targets the
container's nested Docker daemon, whose image store starts empty (the deploy
wipes /var/lib/docker before commit). The host's copy lives in a different
daemon and is invisible to the nested run.

- src/isolation-runner.lib.mjs:
  - resolveDockerIsolationImageTag() + HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG to
    pin the isolation image tag (default unchanged: latest).
  - getDockerIsolationPullPolicy() + HIVE_MIND_DOCKER_ISOLATION_PULL emits
    `docker run --pull always|missing|never` so operators can force reuse.
  - verbose logging now reports the resolved image and pull policy.
- scripts/preload-dind-isolation-image.mjs: seed the nested daemon from the
  host via `docker save | docker exec -i <container> docker load`.
- .env.example: document the Docker isolation image/pull controls.
- tests/test-issue-1879-docker-image-reuse.mjs: regression coverage (18 cases).
- docs/case-studies/issue-1879: deep case study (logs, timeline, root causes,
  solution survey, operator runbook).
- remove placeholder .gitkeep; add changeset.
@konard konard changed the title from "[WIP] Can we reuse installed image on host machine, so we don't download copy inside of it?" to "fix(isolation): reuse host image instead of re-downloading inside DinD (#1879)" Jun 9, 2026
@konard konard marked this pull request as ready for review June 9, 2026 20:37
@konard

konard commented Jun 9, 2026

Contributor Author

Working session summary

Everything is complete. Here's the final summary.

Done — Issue #1879 solved, PR #1880 ready and green

PR: #1880 (ready for review, mergeable, CI ✅ completed/success)

Root cause (established with evidence)

The docker run konard/hive-mind-dind:latest … for --isolation docker targets the DinD container's nested dockerd, whose image store starts empty (the deploy wipes /var/lib/docker before docker commit). Docker's default missing policy therefore re-pulls the full image even though the host daemon already has it — the two daemons don't share an image store. It is not caused by start force-pulling, and switching to start#133's backend would not fix it.

What shipped (single PR, branch issue-1879-16535c90dd74)

  • src/isolation-runner.lib.mjs — tag pinning (HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG, default latest) + --pull policy (HIVE_MIND_DOCKER_ISOLATION_PULL = always|missing|never); verbose now logs resolved image + pull policy.
  • scripts/preload-dind-isolation-image.mjs — seeds the nested daemon from the host via docker save | docker exec -i … docker load (idempotent).
  • .env.example — documents all isolation image/tag/pull knobs.
  • tests/test-issue-1879-docker-image-reuse.mjs: 18 passed, 0 failed; the #1860 suite (tests/test-issue-1860-docker-isolation.mjs) still 26 passed, 0 failed; npm run lint clean.
  • docs/case-studies/issue-1879/ — full case study (evidence, timeline, requirements, root causes, online facts, libraries, runbook) + raw logs/JSON.
  • .changeset/issue-1879-docker-image-reuse.md — patch bump.

Upstream

Filed the durable fix request at link-foundation/box#94 (pre-seed nested daemon / entrypoint pre-load hook), with reproduction, docker save | docker load workaround, and suggested fix.

Operator runbook (full reuse, no re-download)

  1. export HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG=<version>
  2. node scripts/preload-dind-isolation-image.mjs --container hive-mind --image konard/hive-mind-dind:<tag>
  3. export HIVE_MIND_DOCKER_ISOLATION_PULL=never
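
With the tag pinned and the pull policy set to never, the task's `docker run` would carry an explicit `--pull never` and a fixed image reference. A hedged sketch of that composition follows; `sketchBuildIsolationRunArgs` is a hypothetical name, the real command built by src/isolation-runner.lib.mjs has further arguments that are not shown in this PR, and the `echo ok` command is a placeholder.

```javascript
// Hypothetical composition of the isolated task's `docker run` argv.
// `pullPolicy` of null means "add no --pull flag" (Docker's default applies).
function sketchBuildIsolationRunArgs({ repository, tag, pullPolicy, command }) {
  const args = ['run'];
  if (pullPolicy) args.push('--pull', pullPolicy);
  args.push(`${repository}:${tag}`, ...command);
  return args;
}

const runArgs = sketchBuildIsolationRunArgs({
  repository: 'konard/hive-mind-dind',
  tag: 'latest',              // the default; a pinned <version> is recommended above
  pullPolicy: 'never',
  command: ['echo', 'ok'],    // placeholder task command
});
console.log(['docker', ...runArgs].join(' '));
// docker run --pull never konard/hive-mind-dind:latest echo ok
```

With `never`, a missing image makes the run fail immediately instead of starting a multi-gigabyte download, which is the fail-fast behaviour the runbook aims for.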

Working tree is clean, branch is 0 behind main, and all CI checks pass.


This summary was automatically extracted from the AI working session output.

@konard

konard commented Jun 9, 2026

Contributor Author

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $4.906299

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

  1. 118.0K / 1M (12%) input tokens, 39.4K / 128K (31%) output tokens
  2. 46.1K / 1M (5%) input tokens, 6.6K / 128K (5%) output tokens

Total: (23.5K new + 207.7K cache writes + 4.5M cache reads) input tokens, 50.5K output tokens, $4.906299 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2307KB)


The working session has now ended; feel free to review the solution draft and add any feedback.

@konard

konard commented Jun 9, 2026

Contributor Author

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@konard

konard commented Jun 10, 2026

Contributor Author

Check https://github.com/link-foundation/box/actions/runs/27277556456 for false positives, make sure we use the latest box images everywhere, and use the image passthrough feature correctly, but only for the hive-mind-dind and hive-mind images (that is, only official images downloaded from Docker Hub).

If any features are still missing from box, report an issue.

@konard konard marked this pull request as draft June 10, 2026 14:13
@konard

konard commented Jun 10, 2026

Contributor Author

🤖 AI Work Session Started

Starting automated work session at 2026-06-10T14:13:36.807Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

konard added 3 commits June 10, 2026 14:21
Box v2.2.0 ships native nested-daemon image reuse (preload + host-image
passthrough, the upstream fix for box#94 / this issue). Bump every FROM and
the docs pull examples so we run on the release that contains it.
 (#1879)

- box#94 shipped in box v2.2.0 (PR#95); document native DIND_HOST_PASSTHROUGH
  passthrough and make it the primary runbook approach (mount host socket
  read-only + DIND_HOST_PASSTHROUGH=public), preload script kept as fallback.
- Record box#96 (public-mode passthrough test is a false positive) found while
  auditing release run 27277556456.
- Record box#97 (request DIND_HOST_PASSTHROUGH_IMAGES per-repository allowlist
  to scope passthrough to only konard/hive-mind{,-dind}).
- Changeset: note the box base-image bump and the upstream follow-ups.
@konard

konard commented Jun 10, 2026

Contributor Author

Addressed all four points from the review:

1. Checked box release run 27277556456 for false positives. That run is the box v2.2.0 release (the merge of box PR#95, which implements box#94). It passed, but auditing the suite turned up a real false positive in the public-mode passthrough test (tests/dind/example-preload-images.sh): it only asserts the negative case (the local fixture with no RepoDigest is not copied) plus that the "mode=public" log line appears. The throwaway host daemon in that scenario contains only that fixture, so there is no image that should pass through — the positive path ("a genuinely public image is copied into the inner daemon") is never asserted. A regression making public mode copy nothing would still go green. Reported with a suggested fix → link-foundation/box#96.

2. Latest box images everywhere. Bumped to konard/box:2.2.0 / konard/box-dind:2.2.0 in Dockerfile, Dockerfile.dind, coolify/Dockerfile, and the docs/UBUNTU-SERVER*.md (en/ru/zh/hi) examples. v2.2.0 is the first release carrying native host-image passthrough; release.yml extracts the FROM tags automatically so this flows through to published images.

3. Use the passthrough feature correctly, scoped to official Docker Hub images. The runbook now recommends box's native passthrough as the primary fix — run the container with -v /var/run/docker.sock:/var/run/host-docker.sock:ro -e DIND_HOST_PASSTHROUGH=public. public mode copies only images with a RepoDigest from an allowlisted public registry, so the host's already-pulled konard/hive-mind / konard/hive-mind-dind (official Docker Hub images) land in the nested daemon while private images and credentials stay on the host. The preload helper is kept as an exact per-image fallback.

4. Reported a missing feature. box's passthrough can be narrowed by registry but not by repository/image name, so there's currently no way to pass through only konard/hive-mind{,-dind}; public mode copies every public host image. Requested a DIND_HOST_PASSTHROUGH_IMAGES per-repository allowlist → link-foundation/box#97. Until it ships, public is the working, secret-safe default.

The case study (docs/case-studies/issue-1879/README.md) records all of the above; the changeset and PR description are updated.

Verification: tests/test-issue-1879-docker-image-reuse.mjs → 18/0, tests/test-issue-1860-docker-isolation.mjs → 26/0, npm run lint clean. main merged into the branch.

@konard konard marked this pull request as ready for review June 10, 2026 14:32
@konard

konard commented Jun 10, 2026

Contributor Author

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $4.974335

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

  1. 116.0K / 1M (12%) input tokens, 27.4K / 128K (21%) output tokens
  2. 79.9K / 1M (8%) input tokens, 17.7K / 128K (14%) output tokens

Total: (25.1K new + 237.5K cache writes + 4.2M cache reads) input tokens, 50.9K output tokens, $4.974335 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2379KB)


The working session has now ended; feel free to review the solution draft and add any feedback.

@konard

konard commented Jun 10, 2026

Contributor Author

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@konard konard added and then removed the "ready" label (Is ready to be merged) Jun 10, 2026
@konard konard merged commit 4275683 into main Jun 11, 2026
25 checks passed

Development

Successfully merging this pull request may close these issues.

Can we reuse installed image on host machine, so we don't download copy inside of it?

1 participant