This document explains how Matchlock tracks VM lifecycle state, which resources are owned by a VM, and how to recover from leaked host resources.
Sandbox shutdown can fail partway through (host signal, process crash, network permission issues, etc.). Matchlock now persists lifecycle state so cleanup can be resumed safely and audited later.
Each VM has persistent lifecycle rows in:
- ~/.matchlock/state.db
- table: vm_lifecycle
The record includes:
- current lifecycle phase
- last lifecycle error (if any)
- known resource identifiers/paths (rootfs, subnet allocation, TAP and nftables table names)
- per-step cleanup status snapshot for close/reconcile operations
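As a rough illustration of an append-only lifecycle record, the sketch below writes one row per phase change. The column names come from the inspection queries later in this document; the column types, the primary key, and the `append_phase` helper are assumptions, not Matchlock's actual schema or code.

```python
import json
import sqlite3

# Hypothetical schema: column names match the inspection queries in this
# document, but the types and constraints are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vm_lifecycle (
        vm_id        TEXT NOT NULL,
        version      INTEGER NOT NULL,
        phase        TEXT NOT NULL,
        updated_at   TEXT NOT NULL,
        last_error   TEXT,
        cleanup_json TEXT,
        PRIMARY KEY (vm_id, version)
    )
    """
)


def append_phase(conn, vm_id, phase, last_error=None, cleanup=None):
    """Append a new lifecycle row instead of updating the previous one."""
    with conn:  # commits on success, rolls back on error
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM vm_lifecycle WHERE vm_id = ?",
            (vm_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO vm_lifecycle VALUES (?, ?, ?, datetime('now'), ?, ?)",
            (vm_id, latest + 1, phase, last_error, json.dumps(cleanup or {})),
        )
    return latest + 1


append_phase(conn, "vm-abc12345", "creating")
append_phase(conn, "vm-abc12345", "created")
history = conn.execute(
    "SELECT version, phase FROM vm_lifecycle WHERE vm_id = ? ORDER BY version",
    ("vm-abc12345",),
).fetchall()
```

Because rows are only ever added, the full phase history of a VM stays available for later auditing.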
Lifecycle phase/audit data, VM runtime metadata, and subnet allocation metadata are stored in the same SQLite DB:
- ~/.matchlock/state.db
- tables: vms, subnet_allocations, vm_lifecycle, schema_migrations
Image metadata is stored in a separate SQLite DB:
- ~/.cache/matchlock/images/metadata.db
- tables: images, schema_migrations
- images.scope distinguishes local vs registry metadata
Large artifacts remain filesystem-based:
- VM directories, logs, sockets, and per-VM rootfs copies
- image rootfs files under image cache directories
Both DBs use automatic startup migrations with:
- journal_mode=WAL
- foreign_keys=ON
- busy_timeout
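The following sketch shows one way to apply these three settings when opening a SQLite database. It is written in Python for illustration only; the `open_state_db` helper and the 5000 ms timeout are assumptions, since this document does not state the value Matchlock uses.

```python
import os
import sqlite3
import tempfile


def open_state_db(path, busy_timeout_ms=5000):
    """Open a SQLite DB and apply the three connection settings named above."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    # The timeout value is an assumption; the document does not specify it.
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


# Use a throwaway file: WAL mode is not available for in-memory databases.
db_path = os.path.join(tempfile.mkdtemp(), "state.db")
conn = open_state_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Note that `foreign_keys` and `busy_timeout` are per-connection settings, so they have to be reapplied on every open, whereas the WAL journal mode persists in the database file.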
Migration behavior:
- applied versions are tracked in schema_migrations
- each migration runs in a transaction
- before pending migrations, Matchlock creates a pre-migration backup
- if migration fails, Matchlock restores from that backup
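The migration behavior above can be sketched roughly as follows. This is an illustrative Python outline, not Matchlock's implementation: the `migrate` function, the `.pre-migration` backup suffix, and the example migrations are all hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile

# Hypothetical migrations as (version, SQL); the real schema is not shown here.
MIGRATIONS = [
    (1, "CREATE TABLE vms (id TEXT PRIMARY KEY, status TEXT)"),
    (2, "ALTER TABLE vms ADD COLUMN pid INTEGER"),
]


def migrate(path, migrations):
    """Apply pending migrations; restore the backup if any of them fails."""
    conn = sqlite3.connect(path, isolation_level=None)  # manual transactions
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    pending = [(v, sql) for v, sql in sorted(migrations) if v not in applied]
    conn.close()
    if not pending:
        return []

    # Back up before touching the schema. A plain file copy is adequate here
    # only because the connection is closed and this sketch does not use WAL.
    backup = path + ".pre-migration"
    shutil.copyfile(path, backup)

    conn = sqlite3.connect(path, isolation_level=None)
    try:
        for version, sql in pending:
            conn.execute("BEGIN")
            try:
                conn.execute(sql)
                conn.execute("INSERT INTO schema_migrations VALUES (?)", (version,))
                conn.execute("COMMIT")
            except sqlite3.Error:
                conn.execute("ROLLBACK")
                raise
    except sqlite3.Error:
        conn.close()
        shutil.copyfile(backup, path)  # undo every migration from this run
        raise
    conn.close()
    return [v for v, _ in pending]


db_path = os.path.join(tempfile.mkdtemp(), "state.db")
first_run = migrate(db_path, MIGRATIONS)   # applies versions 1 and 2
second_run = migrate(db_path, MIGRATIONS)  # nothing pending, no backup taken
```

Restoring the whole file on failure also reverts migrations that committed earlier in the same run, so the database never ends up in a half-migrated state.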
Phases are validated through an allowed-transition state machine.
Primary phases:
- creating
- created
- starting
- running
- stopping
- stopped
- cleaning
- cleaned
Failure phases:
- create_failed
- start_failed
- stop_failed
- cleanup_failed
Typical success path:
- creating -> created
- created -> starting -> running
- running -> stopping -> cleaning -> cleaned
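A minimal sketch of an allowed-transition check is shown below. The success-path edges are the ones listed above; the failure edges are an assumption (each in-progress phase moving to its matching failure phase), and the real transition table may allow more edges, for example recovery out of a failure phase or through the stopped phase.

```python
# Edges on the documented success path.
SUCCESS_PATH = [
    ("creating", "created"),
    ("created", "starting"),
    ("starting", "running"),
    ("running", "stopping"),
    ("stopping", "cleaning"),
    ("cleaning", "cleaned"),
]
# Assumed failure edges: each in-progress phase may move to its *_failed phase.
FAILURE_EDGES = [
    ("creating", "create_failed"),
    ("starting", "start_failed"),
    ("stopping", "stop_failed"),
    ("cleaning", "cleanup_failed"),
]

ALLOWED = {}
for src, dst in SUCCESS_PATH + FAILURE_EDGES:
    ALLOWED.setdefault(src, set()).add(dst)


def validate_transition(current, target):
    """Return target if current -> target is an allowed edge, else raise."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"invalid lifecycle transition: {current} -> {target}")
    return target


# Walk the typical success path from start to finish.
phase = "creating"
for nxt in ["created", "starting", "running", "stopping", "cleaning", "cleaned"]:
    phase = validate_transition(phase, nxt)
```

Rejecting unknown edges at write time means a crashed or buggy caller cannot record, say, `creating -> cleaned` and hide the steps that were skipped.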
The latest lifecycle snapshot tracks resources needed for deterministic cleanup:
- VM state directory: ~/.matchlock/vms/<vm-id>/
- per-VM rootfs copy: rootfs.ext4 under VM state dir
- subnet allocation row in ~/.matchlock/state.db (subnet_allocations table)
- Linux-only network artifacts:
  - TAP interface (fc-<suffix>)
  - nftables tables (matchlock_<tap>, matchlock_nat_<tap>)
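To make the naming concrete, the sketch below derives the tracked resource names from a VM ID and a TAP interface name. The `vm_resources` helper is hypothetical, and the TAP name is passed in rather than computed because this document does not say how the `fc-<suffix>` suffix is derived.

```python
from pathlib import Path


def vm_resources(vm_id, tap_name=None, home=None):
    """List the per-VM resources named above for a given VM ID."""
    base = Path(home) if home else Path.home()
    state_dir = base / ".matchlock" / "vms" / vm_id
    resources = {
        "state_dir": state_dir,
        "rootfs": state_dir / "rootfs.ext4",
    }
    if tap_name:  # Linux-only network artifacts
        resources["tap"] = tap_name
        resources["nft_tables"] = [
            f"matchlock_{tap_name}",
            f"matchlock_nat_{tap_name}",
        ]
    return resources


# The TAP name here is a made-up example value.
res = vm_resources("vm-abc12345", tap_name="fc-abc12345", home="/home/user")
```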
Sandbox.Close() now reports cleanup failures instead of silently ignoring
them. Failures are stored in lifecycle cleanup entries.
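The general pattern of reporting cleanup failures rather than swallowing them might look like the sketch below. It is a Python illustration of the idea, not the actual Sandbox.Close() code; `CleanupError`, `close_sandbox`, and the step names are hypothetical.

```python
class CleanupError(Exception):
    """Raised when at least one cleanup step fails; carries every step's status."""

    def __init__(self, status):
        failed = [name for name, result in status.items() if result != "ok"]
        super().__init__("cleanup failed for: " + ", ".join(failed))
        self.status = status


def close_sandbox(steps):
    """Run every cleanup step, then report failures instead of hiding them."""
    status = {}
    for name, step in steps:
        try:
            step()
            status[name] = "ok"
        except Exception as exc:  # keep going so later steps still run
            status[name] = f"failed: {exc}"
    if any(result != "ok" for result in status.values()):
        raise CleanupError(status)
    return status


def delete_tap():
    raise PermissionError("operation not permitted")


# Hypothetical step names; the middle step simulates a permissions failure.
steps = [
    ("release_subnet", lambda: None),
    ("delete_tap", delete_tap),
    ("remove_rootfs", lambda: None),
]

try:
    close_sandbox(steps)
except CleanupError as err:
    snapshot = err.status  # per-step status that could be persisted
```

Running all steps before raising keeps one failed step from blocking the others, and the per-step status gives a later reconcile pass a precise list of what still needs cleaning.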
CLI behavior now preserves cleanup semantics:
- command exit codes are propagated without bypassing deferred cleanup
- run --rm=false -it keeps the VM alive until a signal is received, then performs close cleanup
Use gc to clean leaked resources for stopped/crashed VMs:
# Reconcile one VM
matchlock gc vm-abc12345
# Reconcile all VMs
matchlock gc
# Also reconcile currently running VMs (dangerous; use sparingly)
matchlock gc --force-running
gc reconciles:
- subnet allocation row
- rootfs copy
- Linux: TAP + nftables artifacts
If a VM is still running, reconciliation is skipped unless --force-running
is provided.
rm/prune now run reconciliation before removing VM metadata:
- if reconcile succeeds, VM state can be removed
- if reconcile fails, removal is aborted and an error is returned
This prevents losing VM metadata while leaking host resources.
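The ordering guarantee can be illustrated with a small sketch in which metadata is deleted only after reconciliation returns without error. The `remove_vm` function and the in-memory `metadata` dict are stand-ins for illustration, not Matchlock's API.

```python
def remove_vm(vm_id, metadata, reconcile):
    """Delete a VM's metadata only after reconcile(vm_id) succeeds."""
    reconcile(vm_id)  # raises on failure, so the metadata below stays intact
    del metadata[vm_id]


def failing_reconcile(vm_id):
    raise RuntimeError(f"cannot delete TAP interface for {vm_id}")


metadata = {"vm-abc12345": {"status": "stopped"}}
try:
    remove_vm("vm-abc12345", metadata, failing_reconcile)
except RuntimeError as err:
    removal_error = str(err)  # the VM record is still present for a retry
```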
Subnet allocation is now DB-backed with:
- unique subnet octet constraint at the DB level
- atomic writes via SQLite transactions
- WAL mode for robust cross-process concurrency
This replaces the previous lock-file + JSON-scan allocator model.
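A DB-backed allocator along these lines could be sketched as follows. The column names match the subnet_allocations query in this document, but the column types, the octet range, the `192.168.<octet>.0/24` layout, and the `allocate_subnet` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute(
    "CREATE TABLE subnet_allocations ("
    " vm_id TEXT PRIMARY KEY,"
    " octet INTEGER NOT NULL UNIQUE,"  # DB-level uniqueness guard
    " subnet TEXT NOT NULL)"
)


def allocate_subnet(conn, vm_id, first=1, last=254):
    """Pick the lowest free octet and record it in a single transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        used = {o for (o,) in conn.execute("SELECT octet FROM subnet_allocations")}
        octet = next((o for o in range(first, last + 1) if o not in used), None)
        if octet is None:
            raise RuntimeError("no free subnet octets")
        subnet = f"192.168.{octet}.0/24"  # hypothetical address layout
        conn.execute(
            "INSERT INTO subnet_allocations VALUES (?, ?, ?)",
            (vm_id, octet, subnet),
        )
        conn.execute("COMMIT")
        return octet
    except Exception:
        conn.execute("ROLLBACK")
        raise


first_octet = allocate_subnet(conn, "vm-a")
second_octet = allocate_subnet(conn, "vm-b")
```

`BEGIN IMMEDIATE` serializes writers so two processes cannot both read the same free octet, and the `UNIQUE` constraint remains as a backstop if a caller bypasses the helper.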
- Linux: reconciles subnet/rootfs/TAP/nftables.
- macOS and non-Linux platforms: reconciles subnet/rootfs; platform-specific network artifact reconciliation is currently a no-op.
- Inspect VM states:
matchlock list
- Inspect lifecycle history (append-only):
sqlite3 ~/.matchlock/state.db "select vm_id,version,phase,updated_at,last_error from vm_lifecycle where vm_id='<vm-id>' order by version;"
sqlite3 ~/.matchlock/state.db "select version,cleanup_json from vm_lifecycle where vm_id='<vm-id>' order by version;"
- (Optional) Inspect VM/subnet DB metadata:
sqlite3 ~/.matchlock/state.db 'select id,status,pid from vms;'
sqlite3 ~/.matchlock/state.db 'select vm_id,octet,subnet from subnet_allocations;'
- (Optional) Inspect image metadata:
sqlite3 ~/.cache/matchlock/images/metadata.db 'select scope,tag,digest,size from images;'
- Reconcile leaked resources:
matchlock gc <vm-id>
- Remove stopped VM metadata after successful reconcile:
matchlock rm <vm-id>
If gc still fails, check failed cleanup steps in vm_lifecycle.cleanup_json
for recent versions and fix host permissions/network prerequisites before
retrying.