fix(isolation): reuse host image instead of re-downloading inside DinD (#1879) #1880
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #1879
#1879) Docker-isolated Telegram tasks launched from the DinD deployment re-downloaded konard/hive-mind-dind:latest on first run because the `docker run` targets the container's nested Docker daemon, whose image store starts empty (the deploy wipes /var/lib/docker before commit). The host's copy lives in a different daemon and is invisible to the nested run.

- src/isolation-runner.lib.mjs:
  - resolveDockerIsolationImageTag() + HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG to pin the isolation image tag (default unchanged: latest).
  - getDockerIsolationPullPolicy() + HIVE_MIND_DOCKER_ISOLATION_PULL emits `docker run --pull always|missing|never` so operators can force reuse.
  - verbose logging now reports the resolved image and pull policy.
- scripts/preload-dind-isolation-image.mjs: seed the nested daemon from the host via `docker save | docker exec -i <container> docker load`.
- .env.example: document the Docker isolation image/pull controls.
- tests/test-issue-1879-docker-image-reuse.mjs: regression coverage (18 cases).
- docs/case-studies/issue-1879: deep case study (logs, timeline, root causes, solution survey, operator runbook).
- remove placeholder .gitkeep; add changeset.
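The tag and pull-policy controls described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in src/isolation-runner.lib.mjs: the two function names and the environment variable names come from the commit message, while the function bodies, the `buildDockerRunArgs` helper, the `--rm` flag, the case-insensitive matching, and the example tag `1.2.3` are assumptions made for the example.

```javascript
// Hypothetical sketch: only the function and variable names are taken from the PR.
const ISOLATION_IMAGE_REPO = 'konard/hive-mind-dind';
const VALID_PULL_POLICIES = new Set(['always', 'missing', 'never']);

// Resolve the image tag, falling back to the unchanged default `latest`.
function resolveDockerIsolationImageTag(env = process.env) {
  const tag = (env.HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG || '').trim();
  return tag || 'latest';
}

// Return a valid pull policy, or null when the variable is unset or invalid.
// Lowercasing the value is an assumption of this sketch.
function getDockerIsolationPullPolicy(env = process.env) {
  const raw = (env.HIVE_MIND_DOCKER_ISOLATION_PULL || '').trim().toLowerCase();
  return VALID_PULL_POLICIES.has(raw) ? raw : null;
}

// Assemble `docker run` arguments; `--pull` is added only for a valid policy,
// so an invalid value leaves Docker's default behavior in place.
function buildDockerRunArgs(env = process.env) {
  const args = ['run', '--rm'];
  const policy = getDockerIsolationPullPolicy(env);
  if (policy) args.push('--pull', policy);
  args.push(`${ISOLATION_IMAGE_REPO}:${resolveDockerIsolationImageTag(env)}`);
  return args;
}

// Example: pin a (hypothetical) tag and forbid pulling.
console.log(
  buildDockerRunArgs({
    HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG: '1.2.3',
    HIVE_MIND_DOCKER_ISOLATION_PULL: 'never',
  }).join(' ')
);
```

With `never`, a missing image makes `docker run` fail immediately instead of starting a download, which is the fail-fast behavior operators opt into.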
Working session summary

Everything is complete. Here's the final summary.

Done: Issue #1879 solved, PR #1880 ready and green

PR: #1880 (ready for review, mergeable, CI ✅

Root cause (established with evidence)

The

What shipped (single PR, branch
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $4.906299

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)
Total: (23.5K new + 207.7K cache writes + 4.5M cache reads) input tokens, 50.5K output tokens, $4.906299 cost

🤖 Models used:

📎 Log file uploaded as Gist (2307KB)

The working session has now ended; feel free to review and add any feedback on the solution draft.
✅ Ready to merge

This pull request is now ready to be merged:

Monitored by hive-mind with --auto-restart-until-mergeable flag
Check https://github.com/link-foundation/box/actions/runs/27277556456 for false positives, make sure we use the latest box images everywhere, and use the image passthrough feature correctly, but only for the hive-mind-dind and hive-mind images (that is, only official images downloaded from Docker Hub). If box is still missing some features, report an issue.
🤖 AI Work Session Started

Starting automated work session at 2026-06-10T14:13:36.807Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.
Box v2.2.0 ships native nested-daemon image reuse (preload + host-image passthrough, the upstream fix for box#94 / this issue). Bump every FROM and the docs pull examples so we run on the release that contains it.
(#1879)

- box#94 shipped in box v2.2.0 (PR#95); document native DIND_HOST_PASSTHROUGH passthrough and make it the primary runbook approach (mount host socket read-only + DIND_HOST_PASSTHROUGH=public), preload script kept as fallback.
- Record box#96 (public-mode passthrough test is a false positive) found while auditing release run 27277556456.
- Record box#97 (request DIND_HOST_PASSTHROUGH_IMAGES per-repository allowlist to scope passthrough to only konard/hive-mind{,-dind}).
- Changeset: note the box base-image bump and the upstream follow-ups.
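To make the box#97 request concrete, here is a rough sketch of what per-repository scoping would mean for a list of host image references. It is purely illustrative and is not box's implementation: `repositoryOf` and `filterByRepository` are hypothetical helpers, the reference parsing is deliberately simplified, and the sample references (including the `sha256:abc` digest) are placeholders.

```javascript
// Hypothetical helpers illustrating a per-repository allowlist; not part of box.

// Extract the repository name from an image reference by dropping the digest,
// the tag, and a leading Docker Hub registry prefix.
function repositoryOf(ref) {
  let name = ref.split('@')[0]; // drop "@sha256:..." if present
  const lastSlash = name.lastIndexOf('/');
  const lastColon = name.lastIndexOf(':');
  // A colon after the last slash is a tag; an earlier colon is a registry port.
  if (lastColon > lastSlash) name = name.slice(0, lastColon);
  return name.replace(/^(index\.)?docker\.io\//, '');
}

// Keep only references whose repository is on the allowlist.
function filterByRepository(refs, allowlist) {
  const allowed = new Set(allowlist);
  return refs.filter((ref) => allowed.has(repositoryOf(ref)));
}

// konard/hive-mind{,-dind} expands to these two repositories.
const exampleAllowlist = ['konard/hive-mind', 'konard/hive-mind-dind'];
const exampleHostImages = [
  'konard/hive-mind:latest',
  'docker.io/konard/hive-mind-dind@sha256:abc',
  'ubuntu:24.04',
  'localhost:5000/konard/hive-mind:dev',
];
console.log(filterByRepository(exampleHostImages, exampleAllowlist));
```

In this sketch the image from the private registry at `localhost:5000` is excluded even though its path matches, because the registry prefix is treated as part of the repository name.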
Addressed all four points from the review:

1. Checked box release run
2. Latest box images everywhere. Bumped to
3. Use the passthrough feature correctly, scoped to official Docker Hub images. The runbook now recommends box's native passthrough as the primary fix: run the container with
4. Reported a missing feature. box's passthrough can be narrowed by registry but not by repository/image name, so there's currently no way to pass through only

The case study (

Verification:
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $4.974335

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)
Total: (25.1K new + 237.5K cache writes + 4.2M cache reads) input tokens, 50.9K output tokens, $4.974335 cost

🤖 Models used:

📎 Log file uploaded as Gist (2379KB)

The working session has now ended; feel free to review and add any feedback on the solution draft.
✅ Ready to merge

This pull request is now ready to be merged:

Monitored by hive-mind with --auto-restart-until-mergeable flag
Summary
Fixes #1879: in the Docker-in-Docker (DinD) deployment, `--isolation docker` tasks launched a `docker run konard/hive-mind-dind:latest …` against the container's nested Docker daemon, whose image store starts empty. Docker reported `Unable to find image … locally` and pulled a fresh multi-gigabyte copy on every first task, even though the host already had that exact image.

Root cause
`docker run` always targets the daemon it's pointed at. Inside the DinD container that is the nested `dockerd`, not the host daemon. The provisioning script wipes `/var/lib/docker` before `docker commit`, so the nested store is empty and Docker's default `missing` pull policy pulls the whole image. The host's copy lives in a different daemon and is invisible to the nested run.

This is not caused by `link-foundation/start` force-pulling (it only pulls when the image is missing locally), and Hive Mind builds its own `docker run` rather than using start's backend, so "simplifying via start#133" would not change the behavior. Full analysis with evidence in `docs/case-studies/issue-1879/`.

Durable fix: box v2.2.0 native host-image passthrough
The nested-daemon-starts-empty behavior originates in the DinD base image (`konard/box-dind`). The upstream request link-foundation/box#94 was implemented in box PR#95 and shipped in box v2.2.0, which adds native host-image passthrough. This PR bumps the base images to v2.2.0, so the DinD deployment can now seed the nested daemon from the host automatically at container startup:

`public` mode copies only images with a RepoDigest from an allowlisted public registry, so the host's already-pulled `konard/hive-mind{,-dind}` land in the nested daemon while private images and credentials stay on the host. This is now the primary approach; the preload helper below remains as an exact per-image fallback.

Changes
- `Dockerfile` / `Dockerfile.dind` / `coolify/Dockerfile`: bump base images to `konard/box:2.2.0` / `konard/box-dind:2.2.0` (first release with native passthrough). `docs/UBUNTU-SERVER*.md` examples bumped to match. `release.yml` extracts these `FROM` tags automatically.
- `src/isolation-runner.lib.mjs`: image tag pinning via `HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG` (default `latest`, unchanged), and a `--pull` policy via `HIVE_MIND_DOCKER_ISOLATION_PULL` (`always|missing|never`; invalid values are ignored). Verbose mode now logs the resolved image and pull policy.
- `scripts/preload-dind-isolation-image.mjs`: seeds the nested daemon from the host (`docker save <image> | docker exec -i <container> docker load`), skipping the copy if already present. Exact per-image fallback when mounting the host socket is undesirable.
- `.env.example`: documents the isolation image/tag/pull controls.
- `tests/test-issue-1879-docker-image-reuse.mjs`: 18 regression tests (tag resolution, image composition, pull-policy parsing, `docker run` construction). The `--isolation docker` failed (#1860) suite still passes.
- `docs/case-studies/issue-1879/`: deep case study covering evidence, timeline, requirements, root causes, online facts, libraries considered, solution, operator runbook, and the upstream follow-ups below.
- `.changeset/issue-1879-docker-image-reuse.md`: patch bump.

Operator runbook (full reuse, no re-download)
Primary (box v2.2.0): run the container with `-v /var/run/docker.sock:/var/run/host-docker.sock:ro -e DIND_HOST_PASSTHROUGH=public` (above). Then pin the tag and `export HIVE_MIND_DOCKER_ISOLATION_PULL=never` so tasks reuse the seeded image and fail fast instead of re-downloading.

Fallback (exact scope / no host socket): `node scripts/preload-dind-isolation-image.mjs --container hive-mind --image konard/hive-mind-dind:<tag>`, then `HIVE_MIND_DOCKER_ISOLATION_PULL=never`.

Upstream reports
- `tests/dind/example-preload-images.sh` only asserts the negative case (local fixture not copied) and that the mode log line appears; it never asserts a genuinely public image is copied, so a "public mode copies nothing" regression would pass CI. Found while auditing release run `27277556456` for false positives. Includes a suggested fix.
- `konard/hive-mind{,-dind}`. `public` mode (copies all public host images) is the working, secret-safe default until a `DIND_HOST_PASSTHROUGH_IMAGES` allowlist ships.

Reproduction & verification
- `Unable to find image … locally` / `Pulling from …` sequence in `docs/case-studies/issue-1879/raw/e8c6d542-task-execution.log`.
- `node tests/test-issue-1879-docker-image-reuse.mjs` → 18 passed, 0 failed.
- `node tests/test-issue-1860-docker-isolation.mjs` → 26 passed, 0 failed.
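For readers who want to see the shape of the fallback preload step, here is a minimal sketch that only builds the shell commands involved. It is not the contents of `scripts/preload-dind-isolation-image.mjs`: the `shellQuote` and `buildPreloadPlan` helpers are hypothetical, and using `docker image inspect` as the "already present" check is an assumption, since the PR only states that the copy is skipped when the image is already there.

```javascript
// Hypothetical sketch of the preload flow; it builds commands without running them.

// Quote a value for a POSIX shell by wrapping it in single quotes.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Build the two commands of the preload flow for a container and an image.
function buildPreloadPlan({ container, image }) {
  const c = shellQuote(container);
  const i = shellQuote(image);
  return {
    // Assumed presence check: exit code 0 would mean the nested daemon
    // already has the image, so the copy can be skipped.
    check: `docker exec ${c} docker image inspect ${i}`,
    // Stream the host's image into the nested daemon, as described in the PR.
    copy: `docker save ${i} | docker exec -i ${c} docker load`,
  };
}

const examplePlan = buildPreloadPlan({
  container: 'hive-mind',
  image: 'konard/hive-mind-dind:latest',
});
console.log(examplePlan.check);
console.log(examplePlan.copy);
```

A real script would run `check` first and run `copy` only when the check fails, then rely on `HIVE_MIND_DOCKER_ISOLATION_PULL=never` to keep tasks from pulling.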