What's wrong
No way to validate infrastructure changes (Dockerfiles, compose configs, volume mounts, cross-service networking) before deploying. PR #216 needed 4 iterative fixes on the live server — all were runtime issues invisible to unit tests and config validation:
- .ssh dir owned by root (container runs as uid 1000)
- SSH ignores HOME env var, uses /etc/passwd for home directory resolution
- No git user identity configured in ephemeral container
- git reset --hard wiping files written before the reset
The current workaround — SSH into the server, checkout a branch, build, and test — is risky and not reproducible.
What done looks like
A mise run test:smoke command validates that changed stacks actually work before merging. Specifically:
- scripts/smoke-test.sh <stack> builds, starts, health-checks, and tears down a stack in an isolated Docker Compose project (-p test-<stack>)
- Test stacks join a shared test-homelab Docker network for cross-stack communication via DNS (replacing Tailscale IP references)
- compose.test.yaml overrides per stack: no host port binds, dummy env vars, shared test network
- A GHA workflow runs smoke tests on PRs touching stacks/{agents,observability,knowledge}/
- ADR-018 documents the strategy and limitations
Focused on 3 stacks under active development: agents, observability, knowledge. Pre-built stacks (HA, MQTT, CrowdSec, flight-tracker) are out of scope.
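The build, start, health-check, teardown flow listed above could be sketched roughly as follows. This is a minimal sketch, not the final scripts/smoke-test.sh: the function names project_name, compose, and smoke_test are illustrative, and it assumes each stack's base file is stacks/<stack>/compose.yaml and that services define healthchecks, so that docker compose up --wait has something to wait on.

```shell
# Sketch only, written for POSIX sh. All function names are illustrative.

# Compose project name for a stack, following the -p test-<stack> convention.
project_name() {
  printf 'test-%s\n' "$1"
}

# Run docker compose for a stack with its test override layered on top.
# Assumption: the base file is stacks/<stack>/compose.yaml.
compose() {
  stack="$1"
  shift
  docker compose -p "$(project_name "$stack")" \
    -f "stacks/$stack/compose.yaml" \
    -f "stacks/$stack/compose.test.yaml" "$@"
}

# Build and start a stack, wait for its health checks, tear down on exit.
smoke_test() {
  # The EXIT trap runs teardown even if build or startup fails.
  trap "compose '$1' down --volumes --remove-orphans" EXIT
  compose "$1" build &&
    # --wait blocks until services are running/healthy, so it relies on
    # healthcheck definitions being present in the compose files.
    compose "$1" up --detach --wait
}
```

Calling smoke_test agents at the end of the script would run the agents stack under the test-agents project. The trap fires when the shell exits, so teardown happens whether or not the earlier steps succeeded.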
What the agent can't discover
Cross-stack dependencies (from analysis):
- Alloy config has exactly 2 Tailscale IP references that need overriding for test mode:
  - 100.100.146.119:6060 → CrowdSec metrics scrape target
  - 100.100.146.119:8585 → Agent service metrics scrape target
- These need a config.test.alloy that uses Docker DNS names on the shared test-homelab network
- All other inter-service communication stays inside a single compose project (Grafana→Prometheus, knowledge ingest→postgres)
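Since the two Tailscale references are the only known cross-stack addresses, a cheap guard is to fail the smoke test if the test config still contains that address. The sketch below assumes the test config lives at stacks/observability/config.test.alloy; the function name check_no_tailscale_refs is hypothetical.

```shell
# Sketch only: fail if a config file still references the Tailscale address.
# Prints the offending lines (with line numbers) so they are easy to find.
check_no_tailscale_refs() {
  if grep -n -F '100.100.146.119' "$1"; then
    echo "error: $1 still references the Tailscale address" >&2
    return 1
  fi
}
```

A possible call site is check_no_tailscale_refs stacks/observability/config.test.alloy, run before the observability stack is started.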
Docker Compose project isolation:
- The -p test-<stack> flag automatically prefixes volume names (e.g., test-agents_repo-cache), preventing data conflicts
- Port conflicts are impossible — test stacks don't bind to host ports at all
- Container names are prefixed too, so no naming collisions
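The shared test-homelab network is the one resource that sits outside any single Compose project, so the per-project prefixing above does not cover it. One option, assuming compose.test.yaml declares the network as external, is to create it idempotently before the first stack starts. The function name ensure_test_network is hypothetical.

```shell
# Sketch only: create the shared test network unless it already exists.
# Assumption: compose.test.yaml references the network as external.
ensure_test_network() {
  network="${1:-test-homelab}"
  docker network inspect "$network" >/dev/null 2>&1 ||
    docker network create "$network" >/dev/null
}
```

Running it twice is harmless: the second call finds the network via docker network inspect and skips the create.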
Constraints:
- Agent worker spawning shares the Docker daemon with production — smoke tests should verify the API starts and responds to health checks, not test full worker lifecycle
- The GHA workflow must be human-authored (agents cannot modify .github/workflows/)
- Server has ~12GB free RAM; production containers use ~2GB total. A full parallel test stack is well within budget.
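To keep the agents check limited to "the API starts and reports healthy" rather than exercising workers, the script could poll a single container's health status. This is a sketch under the assumption that the API container defines a Docker healthcheck; the function name wait_healthy and the 60-second default are illustrative.

```shell
# Sketch only: poll a container's Docker health status until it reports
# "healthy" or the timeout (in seconds, default 60) runs out.
wait_healthy() {
  container="$1"
  timeout="${2:-60}"
  elapsed=0
  status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # The inspect call fails if the container is missing or has no
    # healthcheck; treat that as "not healthy yet" and keep polling.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "error: $container not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

The container ID can be obtained with docker compose -p test-agents ps -q followed by the service name, which this issue does not specify.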
What must not break
- Production stacks must not be affected (no shared ports, no shared volumes, no shared container names)
- mise run ci stays fast (~60s); smoke tests are a separate workflow/task
- Deploy workflow is unchanged
- Teardown must be bulletproof (trap in script, always step in GHA) — leftover test containers waste resources
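A shell trap does not run if the process is killed outright, so a final check for leftovers is a reasonable backstop. The sketch below relies on the com.docker.compose.project label that Docker Compose attaches to the containers it creates; the function name assert_no_leftovers is hypothetical.

```shell
# Sketch only: fail if any containers from the test project still exist.
# Compose labels its containers with com.docker.compose.project=<project>.
assert_no_leftovers() {
  leftovers=$(docker ps --all --quiet \
    --filter "label=com.docker.compose.project=test-$1")
  if [ -n "$leftovers" ]; then
    echo "error: leftover containers for test-$1:" >&2
    echo "$leftovers" >&2
    return 1
  fi
}
```

For example, assert_no_leftovers agents could run as the last step of the script and again in the GHA always step.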
Deliverables
- docs/decisions/018-isolated-test-environments.md: ADR
- stacks/agents/compose.test.yaml: test overrides
- stacks/observability/compose.test.yaml + config.test.alloy: test overrides with DNS-based scrape targets
- stacks/knowledge/compose.test.yaml: test overrides
- scripts/smoke-test.sh: build/start/health-check/teardown orchestrator
- mise.toml: test:smoke and test:smoke:<stack> tasks
- .github/workflows/smoke-test.yaml: PR-triggered smoke tests (human-authored, see below)
Out of scope (human follow-up)
- .github/workflows/smoke-test.yaml: agents cannot modify workflow files. Write this after the smoke-test script is validated.
- Agent self-validation (future: agents run scripts/smoke-test.sh from their worktree before creating PRs)
- Home Assistant, MQTT, CrowdSec, flight-tracker smoke tests (pre-built images, rarely break)
- Cloudflare tunnel testing (requires real token)
- Full end-to-end tests (SSH push to GitHub, Copilot CLI calls)