
Commit b59a473

Merge pull request #1926 from link-assistant/issue-1914-0db0f6530c5c
fix(isolation): prevent DinD disk blowups and child image drift
2 parents 657c767 + a6a2102 commit b59a473

28 files changed

Lines changed: 1362 additions & 96 deletions
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+---
+'@link-assistant/hive-mind': patch
+---
+
+fix(isolation): default nested Docker daemon to fuse-overlayfs so multi-GB images fit on disk + add storage-driver/disk preflight diagnostics (#1914)
+
+`--isolation docker` was reopened after PR #1915: native Docker isolation and
+host-image passthrough now work, but the first isolated task on the >30 GB
+`konard/hive-mind-dind` image still died with:
+
+```
+failed to register layer: no space left on device
+```
+
+even though most layers reported `Already exists` (the daemon was correctly
+seeded — passthrough is working). The failure was during layer **registration**,
+not download.
+
+**Root cause (in this repo).** `Dockerfile.dind` baked `ENV
+DIND_STORAGE_DRIVER="vfs"` (commit 44d2c29e). `vfs` performs **no copy-on-write**:
+it materializes a full, independent copy of the entire filesystem for *every*
+layer, so a multi-GB image's on-disk footprint becomes the *sum* of all
+cumulative layer sizes — many times the image size — and overflows the disk.
+Worse, pinning the env var **defeated box-dind's storage-driver auto-detection**
+(`overlay2 → fuse-overlayfs → vfs`, with graceful fallback): box would otherwise
+have picked a copy-on-write driver here. `/dev/fuse` is present (the dind
+container runs `--privileged`), the `fuse-overlayfs` binary ships in box-dind,
+and `overlay` is in `/proc/filesystems` — so copy-on-write was available the
+whole time but was being bypassed by the `vfs` pin.
+
+**Fix.** `Dockerfile.dind` now pins `ENV DIND_STORAGE_DRIVER="fuse-overlayfs"` — a
+copy-on-write driver that also works overlay-on-overlay (the compatibility reason
+`vfs` was originally chosen; `overlay2` can fail on the overlay-backed hosts our
+deploys run on). Under `fuse-overlayfs`, registering a 498 MB top layer on a
+~30 GB base costs ~498 MB instead of ~30 GB, so the image fits. Empirically
+verified in the box-dind environment (`docs/case-studies/issue-1914/data/fuse-overlayfs-capability-proof.log`).
+
+**Self-diagnosing preflight.** `src/isolation-runner.lib.mjs` gained two probes —
+`checkDockerStorageDriver()` and `checkDockerDiskSpace()` — wired into
+`preflightDockerIsolation()`. Before running an isolated task it now warns, with
+an actionable remedy, when the nested daemon is on `vfs` (even if the image is
+already present) or when free space at the Docker data root is below 40 GiB, so
+the next operator hitting this gets a clear breadcrumb instead of a cryptic
+`no space left on device`. Both probes are best-effort and never throw.
+
+Added `tests/test-issue-1914-storage-driver-diagnostics.mjs` (34 assertions),
+extended `tests/test-issue-1914-preflight-passthrough.mjs` and
+`tests/test-docker-dind-variant.mjs`, refreshed `docs/DOCKER*.md`, and expanded
+the `docs/case-studies/issue-1914` case study with the reopen timeline, refined
+root-cause analysis, captured evidence, and an upstream observability request
+(link-foundation/box#104: warn when the nested daemon lands on `vfs`).
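The disk amplification described in the root-cause paragraph can be checked with a little arithmetic. The sketch below is illustrative only: `estimateFootprintMB` is a hypothetical helper, and the three 10,000 MB base layers are made-up numbers chosen to add up to the ~30 GB base mentioned in the changeset. It also ignores files that a later layer deletes or overwrites.

```javascript
// Hypothetical helper (not part of the repository): estimate the on-disk
// footprint of an image under vfs versus a copy-on-write driver.
// layerSizesMB[i] is the amount of data that layer i adds on its own.
function estimateFootprintMB(layerSizesMB) {
  let cumulative = 0; // size of the full filesystem after layers 0..i
  let vfs = 0;        // vfs stores one full filesystem copy per layer
  let cow = 0;        // a copy-on-write driver stores each layer's delta once
  for (const size of layerSizesMB) {
    cumulative += size;
    vfs += cumulative;
    cow += size;
  }
  return { vfs, cow };
}

// Assumed layout: a ~30 GB base in three equal layers plus a 498 MB top layer.
const { vfs, cow } = estimateFootprintMB([10000, 10000, 10000, 498]);
console.log(`vfs ~ ${vfs} MB, copy-on-write ~ ${cow} MB`);
// prints "vfs ~ 90498 MB, copy-on-write ~ 30498 MB"
```

Under these assumptions the top layer alone costs 30,498 MB under `vfs` (a full copy of everything beneath it) but only 498 MB under a copy-on-write driver, which matches the ~30 GB versus ~498 MB figures quoted in the changeset.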

Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@
 
 FROM konard/box:2.3.2
 ARG HIVE_MIND_VERSION=latest
+# Release builds pass the exact published package version here. Bake it as the
+# default child isolation image tag so a parent started via :latest still runs
+# Docker-isolated tasks on the same immutable release image.
+ENV HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG="${HIVE_MIND_VERSION}"
 
 # --- Environment variables ---
 # Set environment variables EARLY so they're available in subsequent RUN commands
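A minimal sketch of how a consumer might turn the baked variable into a child image reference. This is an assumption about usage, not the repository's code: `resolveIsolationImage` is a hypothetical name, the repository is passed in explicitly because this diff does not show which one the runner uses, and `1.2.3` is a placeholder version.

```javascript
// Hypothetical helper: build the child isolation image reference from the
// environment that the Dockerfile bakes via HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG.
function resolveIsolationImage(env, repository) {
  const tag = (env.HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG ?? '').trim();
  // Fall back to "latest" when the variable is unset or empty, the same
  // behaviour as the shell idiom ${TAG:-latest}.
  return `${repository}:${tag || 'latest'}`;
}

// Placeholder values for illustration only.
console.log(resolveIsolationImage(
  { HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG: '1.2.3' }, 'konard/hive-mind-dind'));
console.log(resolveIsolationImage({}, 'konard/hive-mind-dind'));
```

Because the `ARG` defaults to `latest`, a non-release build would bake `latest` and behave as before; only release builds that pass an exact version pin the child to an immutable tag.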

Dockerfile.dind

Lines changed: 33 additions & 3 deletions
@@ -20,9 +20,39 @@ ARG HIVE_MIND_VERSION=latest
 # --- Environment variables ---
 ENV HOME=/home/box
 ENV HIVE_MIND_IMAGE_VARIANT=dind
-# Prefer compatibility for nested Docker. overlay2 can fail on common
-# overlay-backed hosts; users can override this to overlay2 or fuse-overlayfs.
-ENV DIND_STORAGE_DRIVER="vfs"
+# Release builds pass the exact published package version here. Bake it as the
+# default child isolation image tag so a parent started via :latest still runs
+# Docker-isolated tasks on the same immutable release image.
+ENV HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG="${HIVE_MIND_VERSION}"
+# Nested-Docker storage driver. MUST be a copy-on-write driver for these images.
+#
+# The Hive Mind images are multiple gigabytes with many layers. `vfs` performs
+# NO copy-on-write: it materializes a full, independent copy of the entire
+# filesystem for every layer, so the on-disk footprint is the *sum* of all
+# cumulative layer sizes — many times the image size. On a >30 GB image that
+# overflows the disk and the first isolated `--isolation docker` task dies with
+# `failed to register layer: no space left on device` (issue #1914 reopen).
+#
+# `fuse-overlayfs` is the right default: it is copy-on-write (so the image
+# costs roughly its real size once, not a per-layer multiple) AND it works
+# overlay-on-overlay, which is the compatibility `vfs` was originally chosen
+# for — `overlay2` can fail on the overlay-backed hosts our deploys run on.
+# box-dind ships the `fuse-overlayfs` binary and Hive Mind launches the dind
+# container with `--privileged`, so /dev/fuse is available at runtime.
+#
+# NOTE: box-dind already auto-detects a driver (overlay2 -> fuse-overlayfs ->
+# vfs, with graceful fallback). Setting this env var OVERRIDES that detection
+# and pins one driver. The previous `vfs` value here defeated the auto-detect
+# and forced the no-copy-on-write driver — the exact cause of the #1914 reopen.
+# We pin `fuse-overlayfs` (not blank/auto) for determinism: it skips the
+# overlay2-on-overlay attempt that tends to fail in our nested deploys and
+# guarantees a copy-on-write daemon.
+#
+# Override only if you know your host: `-e DIND_STORAGE_DRIVER=overlay2` is
+# faster where nested overlay mounts are supported; `-e DIND_STORAGE_DRIVER=vfs`
+# is a last-resort compatibility fallback but uses many times the disk and is
+# the configuration that caused issue #1914.
+ENV DIND_STORAGE_DRIVER="fuse-overlayfs"
 ENV NVM_DIR="/home/box/.nvm"
 ENV PYENV_ROOT="/home/box/.pyenv"
 ENV BUN_INSTALL="/home/box/.bun"
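The interaction between the pin and auto-detection that the comment describes can be modelled in a few lines. This is a simplified sketch of the behaviour as stated in the comment, not box-dind's actual logic: `selectStorageDriver` and the `isUsable` probe callback are hypothetical names.

```javascript
// Detection order quoted in the Dockerfile.dind comment.
const AUTO_DETECT_ORDER = ['overlay2', 'fuse-overlayfs', 'vfs'];

// Hypothetical model: `pinned` stands for the DIND_STORAGE_DRIVER value and
// `isUsable` is an assumed callback reporting whether a driver works here.
function selectStorageDriver(pinned, isUsable) {
  if (pinned) return pinned; // an explicit pin bypasses auto-detection entirely
  for (const driver of AUTO_DETECT_ORDER) {
    if (isUsable(driver)) return driver;
  }
  return 'vfs'; // last-resort fallback
}

// Assumed host: overlay2 fails (overlay-on-overlay), fuse-overlayfs works.
const usable = (driver) => driver !== 'overlay2';
console.log(selectStorageDriver('', usable));               // fuse-overlayfs
console.log(selectStorageDriver('vfs', usable));            // vfs
console.log(selectStorageDriver('fuse-overlayfs', usable)); // fuse-overlayfs
```

The second call shows the #1914 situation: a copy-on-write driver was usable, but the `vfs` pin was returned before any detection ran. The third call shows the new default, which reaches the same driver as auto-detection on this host while skipping the failing `overlay2` attempt.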

coolify/Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -20,6 +20,10 @@
 
 FROM konard/box:2.3.2
 ARG HIVE_MIND_VERSION=latest
+# Release builds pass the exact published package version here. Bake it as the
+# default child isolation image tag so a parent started via :latest still runs
+# Docker-isolated tasks on the same immutable release image.
+ENV HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG="${HIVE_MIND_VERSION}"
 
 # --- Environment variables ---
 # Set environment variables EARLY so they're available in subsequent RUN commands

docs/DOCKER.hi.md

Lines changed: 33 additions & 10 deletions
@@ -53,7 +53,12 @@ docker info
 docker run hello-world
 ```
 
-This image runs the inner Docker daemon with `DIND_STORAGE_DRIVER=vfs` by default for compatibility with overlay-backed hosts. On hosts that support nested overlay mounts, pass `-e DIND_STORAGE_DRIVER=overlay2` for faster local runs.
+This image runs the inner Docker daemon with `DIND_STORAGE_DRIVER=fuse-overlayfs` by default. It is a **copy-on-write** driver, so the multi-gigabyte Hive Mind images take up roughly their real size on disk (once), whereas `vfs` makes a full copy for every layer and inflates the on-disk footprint to many times the image size, which fills the disk with `failed to register layer: no space left on device` ([issue #1914](https://github.com/link-assistant/hive-mind/issues/1914)). `fuse-overlayfs` also works overlay-on-overlay (the same compatibility for which `vfs` was originally chosen), the `fuse-overlayfs` binary already ships in the image, and Hive Mind launches the DinD container with `--privileged`, so `/dev/fuse` is available. Override options:
+
+- `-e DIND_STORAGE_DRIVER=overlay2`: faster on hosts that support nested overlay mounts, but can fail on overlay-backed hosts;
+- `-e DIND_STORAGE_DRIVER=vfs`: last resort only (compatibility fallback); it uses many times more disk and is the configuration that caused issue #1914.
+
+> **Container already running on an old `vfs` image?** Add `-e DIND_STORAGE_DRIVER=fuse-overlayfs` to the bot container's `docker run` and recreate the container; no image rebuild is needed.
 
 On shared hosts, prefer the Sysbox runtime when available:
 
@@ -65,13 +70,16 @@ The DinD image is published separately from `konard/hive-mind:latest` …
 
 #### Host-image passthrough (avoid re-downloading multi-GB images)
 
-When the bot runs with `--isolation docker` inside the DinD image, every task launches as a _nested_
-`docker run konard/hive-mind-dind:latest …`. That nested `docker run`
-talks to the **inner** dockerd, whose image store starts out **empty** (the deploy
-wipes `/var/lib/docker` before `docker commit`). Docker therefore
-reports `Unable to find image '…' locally` and pulls a fresh copy. Hive Mind
-images are several gigabytes, so the first isolated task can spend a long time re-downloading
-(or run out of disk on) an image that **the host already has**. See
+When the bot runs with `--isolation docker` inside the release DinD image, every task launches as a _nested_
+`docker run konard/hive-mind-dind:<release-tag> …`. Release images
+bake `HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG` from the published `HIVE_MIND_VERSION`, so
+a parent container started from `konard/hive-mind-dind:latest` also uses the same
+immutable release tag for its child containers. That nested `docker run` talks to the **inner** dockerd,
+whose image store starts out **empty** (the deploy wipes `/var/lib/docker` before
+`docker commit`). Docker therefore reports `Unable to find image '…' locally` and pulls a
+fresh copy. Hive Mind images are several gigabytes, so the first isolated task
+can spend a long time re-downloading (or run out of disk on) an image that
+**the host already has**. See
 [issue #1914](https://github.com/link-assistant/hive-mind/issues/1914) and
 [#1879](https://github.com/link-assistant/hive-mind/issues/1879)
 
@@ -101,22 +109,37 @@ In the default `public` mode, only those images are copied …
 so the host copy must be a pulled/pushed image (an image built only by a local `docker build`, without a
 `RepoDigest`, will be skipped; push it first or use `all`).
 
+In release deployments, the exact child tag must also be present on the host before the final bot
+container starts. Pulling only `:latest` is no longer enough, because the release image
+pins `HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG`:
+
+```bash
+TAG="$(docker image inspect konard/hive-mind-dind:latest \
+  --format '{{range .Config.Env}}{{println .}}{{end}}' \
+  | sed -n 's/^HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG=//p' \
+  | tail -1)"
+docker pull "konard/hive-mind-dind:${TAG:-latest}"
+```
+
 **Startup preflight.** When `--isolation docker` is enabled, the bot probes the inner daemon at startup
 and logs the result, so that a misconfiguration surfaces immediately instead of becoming a surprise pull
 in the middle of a task:
 
 - ✅ image already present → isolated tasks reuse it (no pull);
 - ⚠️ socket is **not** mounted → it tells you to add the socket mount + allowlist;
-- ⚠️ socket is mounted but the image is still absent → it tells you to check the passthrough mode/allowlist/digest.
+- ⚠️ socket is mounted but the image is still absent → it tells you to check the passthrough mode/allowlist/digest;
+- ⚠️ inner daemon is on the `vfs` storage driver → it tells you to switch to `fuse-overlayfs` (the disk-amplification root cause of issue #1914);
+- ⚠️ low free space at the Docker data root and the image is still absent → it warns that the upcoming pull may exhaust the disk.
 
 For the underlying `docker image inspect` traces, run the bot with `--verbose` (or `TELEGRAM_BOT_VERBOSE=true`).
 
 **Manual fallback.** To seed an already-running container immediately (or when you cannot change the
 deployment), copy the host image into the inner daemon:
 
 ```bash
+TAG="$(docker exec hive-mind printenv HIVE_MIND_DOCKER_ISOLATION_IMAGE_TAG || true)"
 node scripts/preload-dind-isolation-image.mjs \
-  --container hive-mind --image konard/hive-mind-dind:latest
+  --container hive-mind --image "konard/hive-mind-dind:${TAG:-latest}"
 ```
 
 This streams `docker save … | docker exec -i <container> docker load` so that the tarball never … on disk
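The two new preflight warnings (storage driver and free disk space) can be sketched as one pure decision function. This is a hypothetical illustration, not the code behind `checkDockerStorageDriver()` or `checkDockerDiskSpace()`: the name `evaluatePreflight`, the shape of its input, and the message wording are all assumptions. The `driver` value could come from `docker info --format '{{.Driver}}'`, and the 40 GiB threshold is the one named in the changeset.

```javascript
const GIB = 1024 ** 3;

// Hypothetical evaluator: turns already-collected facts into warnings.
// Missing or malformed inputs produce no warning, so it never throws.
function evaluatePreflight(facts, minFreeGiB = 40) {
  const { driver, freeBytes, imagePresent } = facts ?? {};
  const warnings = [];
  if (driver === 'vfs') {
    // Raised even when the image is already present.
    warnings.push('inner daemon uses vfs; set DIND_STORAGE_DRIVER=fuse-overlayfs');
  }
  const lowDisk = Number.isFinite(freeBytes) && freeBytes < minFreeGiB * GIB;
  if (lowDisk && imagePresent === false) {
    warnings.push(`under ${minFreeGiB} GiB free at the Docker data root; the next pull may fill the disk`);
  }
  return warnings;
}

// Illustrative inputs only.
console.log(evaluatePreflight({ driver: 'vfs', freeBytes: 12 * GIB, imagePresent: false }));
console.log(evaluatePreflight({ driver: 'fuse-overlayfs', freeBytes: 200 * GIB, imagePresent: true }));
```

Keeping the decision separate from the `docker` calls is one way to make such checks easy to unit-test; the first call returns two warnings and the second returns an empty list.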
