Purpose: This document is the single source of truth for coding Ignite from its current v1.1 state to the production-grade v2.0 release. Every section is a direct coding directive — no marketing, no vision statements. Follow this phase by phase.
- Current State Audit — What v1.1 Has and What's Wrong
- Repository & Crate Architecture Target
- Phase 1 — v1.2: Critical Fixes (8 weeks)
- Phase 2 — v1.3: Foundation Hardening (6 weeks)
- Phase 3 — v1.5: Power Features (12 weeks)
- Phase 4 — v2.0: Revolutionary Features (16 weeks)
- Cross-Cutting: Testing Strategy
- Cross-Cutting: Packaging & Distribution
- Deprecated Decisions & Migration Guide
| Crate | Binary | Status | Notes |
|---|---|---|---|
| `crates/ign` | `/usr/bin/ign` | ✅ Working | CLI, ~20 commands |
| `crates/ignited` | `/usr/bin/ignited` | ✅ Working | Daemon, runs as root |
| `crates/ignite-core` | library | ✅ Working | OCI, storage, network, vmm |
| `crates/ignite-compose` | library | ✅ Working | compose YAML parsing, `ign up/down` |
| `crates/ui` | — | ✅ Working | TypeScript/React dashboard at :3000 |
- `ign run <image> --vcpu N --memory MB -p H:V -v H:V --name N`
- `ign ps`, `ign stop`, `ign rm`, `ign start`, `ign restart`
- `ign logs -f`, `ign exec <id> <cmd>`
- `ign pull`, `ign build -t <tag> .` (FROM, RUN, COPY only)
- `ign snapshot`, `ign restore`, `ign export`, `ign import`
- `ign network create/ls/rm`
- `ign up -d`, `ign down`, `ign scale <svc>=N`
- `ign swarm init`, `ign swarm join <ip>`, `ign swarm ls`
- `ign doctor`
- OCI pull from Docker Hub (custom reqwest client, handles Index vs V2 Manifest)
- Device Mapper snapshots for instant clone (dm-snapshot via `dmsetup` CLI calls)
- TAP + Linux bridge (`ign0`) networking with NAT
- CNI integration (bridge/ptp plugins)
- Internal DNS on gateway IP (172.16.0.1:53)
- Virtiofs volume mounts (external `virtiofsd` binary required)
- State persistence across daemon restarts (`.ignite/state/` JSON files)
- VXLAN overlay networking skeleton for Swarm
- `.deb`/`.rpm` packaging with bundled Firecracker binary
- `systemd` service unit for `ignited`
- CMD/ENTRYPOINT/ENV not parsed — Ignitefile `FROM`/`RUN`/`COPY` work, but `CMD`, `ENTRYPOINT`, and `ENV` are silently dropped. VMs start `/bin/sh`. Most Docker Hub images are non-functional.
- virtiofsd is an unmanaged external dep — `-v` silently fails on clean installs. The "Batteries Included" promise is broken.
- ignited runs as full root — ADR-019 adopted the Docker model but never constrained capabilities. Should be a dedicated `ignite` system user with `CAP_NET_ADMIN` + `CAP_SYS_ADMIN` only, not a full root process.
- No WAL / crash recovery — ADR-008 acknowledged that a daemon restart loses running VM control handles. ADR consequence: "Future task: State persistence/recovery" — never implemented properly. JSON state files partially address this but have no crash-safe write path.
- Compose schema `version: "1.0"` is incompatible with Docker Compose v3 — users cannot `ign up` from an existing `docker-compose.yml` without manual edits. The `networks:` top-level key is not supported; all services share one default bridge.
- Swarm VXLAN traffic is unencrypted plaintext — no WireGuard or any encryption on the overlay. Multi-tenant or cloud deployment is insecure.
- Swarm uses a seed-node model — single point of failure, with no documented recovery if the seed goes down.
- Git-based time travel is a hack — using `git commit` on snapshot files works, but: (a) the git binary is an external dep, (b) it has no delta storage, (c) checkout semantics are wrong for VM state, (d) it is not atomic.
- CLI system calls (ADR-002) — `std::process::Command` wrapping `dmsetup`, `losetup`, `iptables`, `ip link` works but is fragile. Error handling is string parsing of stderr.
- No gRPC interface — the REST-only API blocks a Kubernetes CRI implementation.
- No VMIF image format — images are raw OCI converted at runtime on every pull. No signing, no caching spec, no defined format for the Hub.
- `ign commit`, `ign save`, `ign load` missing — referenced in docs but not implemented.
- `ign rm` behavior unclear — must clean up: DM snapshot, loop device, COW file, tap device, state JSON.
- Environment variable injection — the `environment:` key in compose YAML is WIP (noted in README).
Add these crates progressively through the phases. Each gets its own crates/<name>/ directory and entry in the workspace members array.
# Cargo.toml (workspace root) — Target state
[workspace]
members = [
"crates/ign", # CLI binary
"crates/ignited", # Daemon binary
"crates/ignite-core", # VM lifecycle, OCI, storage, network, vmm
"crates/ignite-compose", # Compose YAML schema + orchestrator
"crates/ignite-net", # Network management (Phase 1 refactor)
"crates/ignite-storage", # Storage layer refactor (Phase 2)
"crates/ignite-image", # VMIF format + OCI bridge (Phase 3)
"crates/ignite-agent", # In-VM agent binary, musl target (Phase 4)
"crates/ignite-teleport", # Pre-copy live migration (Phase 3)
"crates/ignite-proto", # Protobuf definitions for gRPC (Phase 3)
]
[workspace.dependencies]
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
serde_yaml = "0.9"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
anyhow = "1"
thiserror = "1"
axum = "0.7"
reqwest = { version = "0.11", features = ["json", "stream"] }
clap = { version = "4", features = ["derive"] }
uuid = { version = "1", features = ["v4"] }

micro-vm-ecosystem/
├── crates/
│ ├── ign/ # CLI
│ │ └── src/
│ │ ├── main.rs
│ │ ├── commands/ # One file per command group
│ │ │ ├── lifecycle.rs (run, stop, start, restart, rm, ps)
│ │ │ ├── image.rs (pull, build, push, images, rmi, tag)
│ │ │ ├── network.rs (network create/ls/rm/connect)
│ │ │ ├── volume.rs (volume create/ls/rm)
│ │ │ ├── snapshot.rs (snapshot, restore, history, time-travel, branch, diff)
│ │ │ ├── compose.rs (up, down, scale, logs)
│ │ │ ├── swarm.rs (swarm init/join/ls, service create/ls/update)
│ │ │ └── system.rs (doctor, stats, inspect)
│ │ └── client.rs # HTTP+gRPC client for ignited
│ │
│ ├── ignited/ # Daemon
│ │ └── src/
│ │ ├── main.rs
│ │ ├── api/ # REST handlers (axum routes)
│ │ │ ├── vms.rs
│ │ │ ├── images.rs
│ │ │ ├── networks.rs
│ │ │ ├── volumes.rs
│ │ │ ├── snapshots.rs
│ │ │ └── swarm.rs
│ │ ├── state/ # WAL + persistent state
│ │ │ ├── wal.rs # Write-ahead log using sled
│ │ │ ├── store.rs # State store abstraction
│ │ │ └── recovery.rs # WAL replay on startup
│ │ ├── vm_manager.rs # VM lifecycle FSM
│ │ ├── firecracker.rs # Firecracker API wrapper
│ │ └── metrics.rs # Prometheus exporter
│ │
│ ├── ignite-core/ # Existing — refactored
│ │ └── src/
│ │ ├── oci.rs # OCI client (keep custom impl, improve)
│ │ ├── layers.rs # Layer unpacking
│ │ ├── vmm.rs # Firecracker HTTP client
│ │ └── types.rs # Shared domain types
│ │
│ ├── ignite-net/ # NEW — extracted from ignite-core
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── bridge.rs # Linux bridge management via rtnetlink
│ │ ├── tap.rs # TAP device creation
│ │ ├── ipam.rs # IP allocation (deterministic subnet leases)
│ │ ├── dns.rs # Embedded DNS server
│ │ ├── nat.rs # iptables NAT management
│ │ └── wireguard.rs # boringtun integration (Phase 3)
│ │
│ ├── ignite-storage/ # NEW — extracted + improved
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── dm.rs # Device Mapper via devicemapper-rs crate
│ │ ├── cow.rs # CoW layer management
│ │ ├── ext4.rs # ext4 image creation and population
│ │ └── snapshot_tree.rs # CoW delta history tree (TimeMachine backend)
│ │
│ ├── ignite-image/ # NEW — VMIF format
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── vmif.rs # VMIF struct (kernel + rootfs + metadata)
│ │ ├── hub_bridge.rs # Docker Hub → VMIF conversion
│ │ └── signing.rs # Ed25519 signing (Phase 4)
│ │
│ ├── ignite-compose/ # Existing — extended
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── schema_v3.rs # Docker Compose v3 schema (replaces v1.0)
│ │ └── orchestrator.rs
│ │
│ ├── ignite-proto/ # NEW — gRPC definitions (Phase 3)
│ │ ├── proto/
│ │ │ ├── vm.proto
│ │ │ ├── image.proto
│ │ │ └── cri.proto # Kubernetes CRI v1
│ │ └── build.rs
│ │
│ ├── ignite-teleport/ # NEW — live migration (Phase 3)
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── sender.rs # Source node: dirty page tracking
│ │ ├── receiver.rs # Destination node: page receiver
│ │ └── protocol.rs # Wire protocol for memory transfer
│ │
│ └── ignite-agent/ # NEW — in-VM binary (Phase 4)
│ └── src/
│ └── main.rs # Compiled for musl, ~400KB static binary
│
├── services/
│ └── ignite-hub/ # NEW — OCI registry + Docker Hub bridge (Phase 3)
│ └── src/
│ ├── main.rs
│ ├── registry.rs # OCI registry protocol
│ ├── bridge.rs # Docker Hub pull + VMIF conversion
│ └── cache.rs # Converted image cache
│
├── vk8s/ # NEW — Kubernetes CRI plugin in Go (Phase 4)
│ ├── cmd/vk8s-shim/
│ └── pkg/cri/
│
├── ui/ # Existing TypeScript dashboard
│ └── src/
│
├── tests/
│ ├── integration/ # Existing
│ ├── chaos/ # NEW — WAL crash recovery tests (Phase 1)
│ ├── compat/ # NEW — Docker Hub image compat matrix (Phase 2)
│ └── bench/ # NEW — perf benchmarks (Phase 2)
│
├── kernels/ # NEW — slim kernel build configs (Phase 3)
├── packaging/
│ ├── systemd/
│ │ └── ignited.service # systemd unit with capability constraints
│ ├── deb/
│ └── rpm/
├── scripts/
├── bin/ # Bundled Firecracker binary
└── Cargo.toml
Duration: 8 weeks
Goal: Make ign run actually work for the vast majority of Docker Hub images. Make the system production-safe at the privilege level. Make the daemon crash-safe.
Crate: ignite-core (oci.rs, layers.rs) + crates/ignited (vm startup path)
Priority: P0 — highest impact fix in the entire project
When ign pull <image> or ign build runs, extract the OCI image config JSON and persist it alongside the ext4 rootfs. When ign run starts a VM, inject the config values as a small init wrapper.
Step 1: Extend the OCI client in ignite-core/src/oci.rs to parse the image config blob.
// ignite-core/src/oci.rs
use std::collections::HashMap;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OciImageConfig {
/// e.g. ["/usr/sbin/nginx", "-g", "daemon off;"]
pub entrypoint: Option<Vec<String>>,
/// e.g. ["-c", "/etc/nginx/nginx.conf"]
pub cmd: Option<Vec<String>>,
/// e.g. ["PATH=/usr/local/bin:/usr/bin", "PORT=8080"]
pub env: Option<Vec<String>>,
/// Working directory inside the VM
pub working_dir: Option<String>,
/// Exposed ports metadata (informational only)
pub exposed_ports: Option<HashMap<String, serde_json::Value>>,
/// User to run as (informational, used to set up init script)
pub user: Option<String>,
}
impl OciImageConfig {
/// Produce the full command to exec: ENTRYPOINT + CMD combined
pub fn full_command(&self) -> Vec<String> {
let mut cmd = vec![];
if let Some(ep) = &self.entrypoint {
cmd.extend_from_slice(ep);
}
if let Some(c) = &self.cmd {
cmd.extend_from_slice(c);
}
if cmd.is_empty() {
cmd.push("/bin/sh".to_string());
}
cmd
}
}

Step 2: After layer flattening, write ignite-config.json into the image cache directory.
~/.ignite/images/<image-hash>/
rootfs.ext4 # Read-only base image
ignite-config.json # NEW: parsed OCI config
manifest.json
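The ENTRYPOINT + CMD merge rule that `full_command` implements can be exercised in isolation. The sketch below trims the struct to the two relevant fields; the nginx-style values are sample data for illustration, not taken from a real pull:

```rust
// Standalone sketch of the ENTRYPOINT + CMD merge rule (fields trimmed).
struct Config {
    entrypoint: Option<Vec<String>>,
    cmd: Option<Vec<String>>,
}

fn full_command(c: &Config) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(ep) = &c.entrypoint {
        out.extend(ep.iter().cloned()); // ENTRYPOINT comes first
    }
    if let Some(cmd) = &c.cmd {
        out.extend(cmd.iter().cloned()); // CMD is appended as arguments
    }
    if out.is_empty() {
        out.push("/bin/sh".to_string()); // fallback matches current v1.1 behavior
    }
    out
}

fn main() {
    // Typical nginx-like image: ENTRYPOINT is prepended to CMD, as docker run does
    let nginx = Config {
        entrypoint: Some(vec!["/docker-entrypoint.sh".into()]),
        cmd: Some(vec!["nginx".into(), "-g".into(), "daemon off;".into()]),
    };
    assert_eq!(
        full_command(&nginx),
        ["/docker-entrypoint.sh", "nginx", "-g", "daemon off;"]
    );
    // Neither set: fall back to /bin/sh
    let bare = Config { entrypoint: None, cmd: None };
    assert_eq!(full_command(&bare), ["/bin/sh"]);
    println!("ok");
}
```

This mirrors Docker's semantics: `--entrypoint`-style overrides replace the whole vector, never merge element-wise.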
Step 3: In crates/ignited/src/vm_manager.rs, before calling the Firecracker boot API, generate /sbin/ignite-init inside the VM's rootfs COW layer and set it as the kernel init= boot parameter.
// crates/ignited/src/vm_manager.rs
fn generate_init_script(config: &OciImageConfig) -> String {
let mut script = String::from("#!/bin/sh\n");
// Inject environment variables
if let Some(env_vars) = &config.env {
for var in env_vars {
// var is "KEY=VALUE"
script.push_str(&format!("export {}\n", var));
}
}
// Set working directory
if let Some(wd) = &config.working_dir {
script.push_str(&format!("cd {}\n", wd));
}
// Exec the actual command (ENTRYPOINT + CMD)
let full_cmd = config.full_command();
let exec_line = full_cmd.iter()
.map(|s| shell_escape(s))
.collect::<Vec<_>>()
.join(" ");
script.push_str(&format!("exec {}\n", exec_line));
script
}
/// Inject /sbin/ignite-init into the COW layer before VM boot.
/// We use debugfs to write the script without mounting the filesystem.
fn inject_init_script(cow_path: &Path, config: &OciImageConfig) -> Result<()> {
    let script = generate_init_script(config);
    let script_path = temp_file_with_content(&script)?;
    // Command::stdin takes a Stdio, not a string: stage the debugfs command
    // stream in a temp file and pass it with -f instead
    let cmd_file = temp_file_with_content(&format!(
        "write {} /sbin/ignite-init\nchmod 0755 /sbin/ignite-init\n",
        script_path.display()
    ))?;
    let status = Command::new("debugfs")
        .args(["-w", "-f", cmd_file.to_str().unwrap(), cow_path.to_str().unwrap()])
        .status()?;
    anyhow::ensure!(status.success(), "debugfs failed to inject /sbin/ignite-init");
    Ok(())
}

Step 4: Pass init=/sbin/ignite-init in the kernel boot arguments when launching Firecracker.
// In the kernel boot args construction:
let boot_args = format!(
"console=ttyS0 reboot=k panic=1 pci=off \
ip={vm_ip}::172.16.0.1:255.255.0.0::eth0:on \
hostname={name} \
init=/sbin/ignite-init"
);

Step 5: For the Ignitefile CMD, ENTRYPOINT, ENV directives in ign build:
// crates/ignite-core/src/oci.rs - Ignitefile parser extension
pub enum IgnitefileInstruction {
From(String),
Run(String),
Copy { src: String, dst: String },
Cmd(Vec<String>), // NEW
Entrypoint(Vec<String>), // NEW
Env { key: String, val: String }, // NEW
WorkDir(String), // NEW
Expose(u16), // NEW (informational)
// VM-specific extensions
VmKernel(String),
VmVcpus(u32),
VmMemory(u64),
VmSnapshotPolicy(String),
VmIopsLimit(u64),
}

After ign build completes, serialize the accumulated CMD/ENTRYPOINT/ENV into ignite-config.json in the image cache.
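Note that `generate_init_script` calls a `shell_escape` helper this plan never defines. A minimal stdlib-only sketch (a hypothetical helper, using POSIX single-quote escaping) could look like:

```rust
// Hypothetical shell_escape used by generate_init_script: leave obviously-safe
// tokens bare, otherwise wrap in single quotes and escape embedded single
// quotes as '\'' (POSIX sh rules).
fn shell_escape(s: &str) -> String {
    let safe = !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || "-_./=:".contains(c));
    if safe {
        return s.to_string(); // no quoting needed
    }
    format!("'{}'", s.replace('\'', r"'\''"))
}

fn main() {
    assert_eq!(shell_escape("nginx"), "nginx"); // bare token passes through
    assert_eq!(shell_escape("daemon off;"), "'daemon off;'"); // space + ';' quoted
    assert_eq!(shell_escape("it's"), r"'it'\''s'"); // embedded quote escaped
    println!("ok");
}
```

Without this, a CMD like `["-g", "daemon off;"]` would split incorrectly inside the generated `exec` line.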
Crate: packaging/deb/, packaging/rpm/, crates/ignited/src/
Priority: P0
Add the static virtiofsd binary to bin/ alongside firecracker. The packaging scripts must include it in the .deb/.rpm at /usr/lib/ignite/virtiofsd.
# packaging/scripts/download_deps.sh
VIRTIOFSD_VERSION="v1.11.1"
VIRTIOFSD_URL="https://gitlab.com/virtio-fs/virtiofsd/-/releases/${VIRTIOFSD_VERSION}/downloads/virtiofsd-x86_64"
curl -L -o bin/virtiofsd "${VIRTIOFSD_URL}"
chmod +x bin/virtiofsd

// crates/ignited/src/vm_manager.rs
fn find_virtiofsd() -> Option<PathBuf> {
// Priority order: bundled > system PATH
let candidates = [
PathBuf::from("/usr/lib/ignite/virtiofsd"),
PathBuf::from("/usr/bin/virtiofsd"),
PathBuf::from("/usr/local/bin/virtiofsd"),
];
candidates.into_iter().find(|p| p.exists())
}
fn start_virtiofsd(host_path: &Path, socket: &Path) -> Result<Child> {
let binary = find_virtiofsd()
.ok_or_else(|| anyhow!("virtiofsd not found. This is a packaging bug."))?;
Command::new(binary)
.args([
&format!("--socket-path={}", socket.display()),
"--shared-dir", host_path.to_str().unwrap(),
"--cache=auto",
])
.spawn()
.map_err(Into::into)
}// crates/ign/src/commands/system.rs
fn check_virtiofsd(results: &mut Vec<DoctorCheck>) {
let found = ["/usr/lib/ignite/virtiofsd", "/usr/bin/virtiofsd"]
.iter()
.any(|p| Path::new(p).exists());
results.push(DoctorCheck {
name: "virtiofsd".to_string(),
status: if found { CheckStatus::Ok } else { CheckStatus::Warning },
message: if found {
"virtiofsd found — volume mounts enabled".to_string()
} else {
"virtiofsd not found — volume mounts (-v) will fail. Run: sudo apt install virtiofsd".to_string()
},
});
}

Files: packaging/systemd/ignited.service, ignited/src/main.rs
Priority: P0
Replace the current blanket-root service with a dedicated user and explicit capabilities:
# packaging/systemd/ignited.service
[Unit]
Description=Ignite MicroVM Daemon
Documentation=https://github.com/Subeshrock/micro-vm-ecosystem
After=network.target
Wants=network.target
[Service]
Type=notify
ExecStart=/usr/bin/ignited
ExecReload=/bin/kill -s HUP $MAINPID
# ── Privilege Model ──────────────────────────────────────────────
# Run as dedicated ignite system user, NOT root
User=ignite
Group=ignite
# Grant only the capabilities actually needed:
# CAP_NET_ADMIN — create/manage TAP devices, bridges, iptables rules
# CAP_SYS_ADMIN — mount operations, device mapper, cgroups v2 delegation
# CAP_NET_RAW — raw socket access for VXLAN
# CAP_SETUID — for jailer subprocess (Firecracker)
# CAP_SETGID — for jailer subprocess
AmbientCapabilities=CAP_NET_ADMIN CAP_SYS_ADMIN CAP_NET_RAW CAP_SETUID CAP_SETGID
CapabilityBoundingSet=CAP_NET_ADMIN CAP_SYS_ADMIN CAP_NET_RAW CAP_SETUID CAP_SETGID
# Lock down the rest. NoNewPrivileges stays off so the jailer subprocess
# can switch credentials for Firecracker (CAP_SETUID/CAP_SETGID above).
NoNewPrivileges=false
PrivateTmp=true
ProtectHome=read-only
ProtectSystem=false # Must manage /var/lib/ignite
# Runtime directory
RuntimeDirectory=ignite
RuntimeDirectoryMode=0750
StateDirectory=ignite
StateDirectoryMode=0750
# Socket group allows ign CLI (user) to connect
SocketGroup=ignite
[Install]
WantedBy=multi-user.target

# packaging/scripts/postinstall.sh
#!/bin/sh
# Create ignite system user if it doesn't exist
if ! id ignite >/dev/null 2>&1; then
useradd --system --no-create-home --shell /usr/sbin/nologin \
--comment "Ignite MicroVM Daemon" ignite
fi
# Add ignite to kvm group for /dev/kvm access
usermod -aG kvm ignite
# Set socket permissions so ign CLI users can connect
# Users must be in the 'ignite' group to use ign CLI
chown root:ignite /run/ignite/ignite.sock 2>/dev/null || true
chmod 0660 /run/ignite/ignite.sock 2>/dev/null || true
# /dev/kvm access
chmod 0660 /dev/kvm 2>/dev/null || true
chown root:kvm /dev/kvm 2>/dev/null || true
systemctl daemon-reload
systemctl enable ignited

// crates/ignited/src/main.rs
use std::os::unix::fs::PermissionsExt; // for Permissions::from_mode
use std::os::unix::net::UnixListener;

fn create_api_socket(path: &Path) -> Result<UnixListener> {
if path.exists() {
std::fs::remove_file(path)?;
}
let listener = UnixListener::bind(path)?;
// Set socket to group-writable so 'ignite' group members can connect
std::os::unix::fs::chown(path, None, Some(get_gid("ignite")?))?;
std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o660))?;
Ok(listener)
}

Crate: crates/ignited/src/state/ (new module)
Priority: P1 — prevents data loss on daemon crash
Use the sled embedded database for the WAL. Sled is pure Rust, has atomic batch writes, and does not require a separate process.
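The write-ahead discipline sled enables can be illustrated with plain files: append and fsync the log entry first, mutate in-memory state second, and replay the log on startup. A stdlib-only sketch (demo paths and entry strings, not the real WAL format):

```rust
// Sketch of the write-ahead discipline. The entry hits durable storage
// BEFORE in-memory state changes, so a crash between the two steps leaves a
// replayable record rather than a silent gap.
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};

fn append_entry(log: &mut File, entry: &str) -> std::io::Result<()> {
    writeln!(log, "{entry}")?;
    log.sync_all() // fsync, the crash-safety step sled's flush() performs
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("ignite-wal-demo.log");
    let _ = std::fs::remove_file(&path); // fresh demo log
    let mut log = OpenOptions::new().create(true).append(true).open(&path)?;

    let mut vms: Vec<String> = Vec::new(); // stand-in for in-memory state

    // 1. WAL first, fsync'd ...
    append_entry(&mut log, "VmCreating web-1")?;
    // 2. ... only then mutate memory
    vms.push("web-1".to_string());

    // Recovery = replay the log from disk
    let replayed: Vec<String> = BufReader::new(File::open(&path)?)
        .lines()
        .collect::<Result<_, _>>()?;
    assert_eq!(replayed, ["VmCreating web-1"]);
    println!("replayed {} entries", replayed.len());
    Ok(())
}
```

sled replaces the hand-rolled file handling here with atomic batches and monotonic `generate_id()` sequence numbers, but the ordering guarantee is the same.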
# crates/ignited/Cargo.toml
[dependencies]
sled = "0.34"

Every mutation to daemon state writes a WAL entry before changing in-memory state:
// crates/ignited/src/state/wal.rs
use sled::{Db, Tree};
use serde::{Serialize, Deserialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum WalEntry {
VmCreating { vm_id: String, config: VmConfig },
VmStarted { vm_id: String, pid: u32, ip: String, fc_socket: String },
VmStopping { vm_id: String },
VmStopped { vm_id: String },
VmDestroyed { vm_id: String },
SnapshotCreated { vm_id: String, snap_id: String, path: String },
VolumeAttached { vm_id: String, host_path: String, vm_path: String },
NetworkCreated { net_id: String, name: String, subnet: String, bridge: String },
NetworkDeleted { net_id: String },
}
pub struct Wal {
db: Db,
log: Tree,
state: Tree,
}
impl Wal {
pub fn open(data_dir: &Path) -> Result<Self> {
let db = sled::open(data_dir.join("wal.db"))?;
let log = db.open_tree("log")?;
let state = db.open_tree("state")?;
Ok(Self { db, log, state })
}
/// Append an entry to the log and flush synchronously (O_SYNC semantics)
pub fn append(&self, entry: &WalEntry) -> Result<u64> {
let seq = self.db.generate_id()?;
let key = seq.to_be_bytes();
let value = serde_json::to_vec(entry)?;
self.log.insert(key, value)?;
self.log.flush()?; // fsync — critical for crash safety
Ok(seq)
}
/// Write committed state (used after a transition is complete)
pub fn commit_state(&self, vm_id: &str, state: &VmState) -> Result<()> {
let value = serde_json::to_vec(state)?;
self.state.insert(vm_id.as_bytes(), value)?;
self.state.flush()?;
Ok(())
}
/// Remove a VM from committed state
pub fn remove_state(&self, vm_id: &str) -> Result<()> {
self.state.remove(vm_id.as_bytes())?;
self.state.flush()?;
Ok(())
}
/// Iterate all committed states — called on daemon startup for recovery
pub fn all_states(&self) -> Result<Vec<VmState>> {
self.state
.iter()
.map(|r| {
let (_, v) = r?;
Ok(serde_json::from_slice(&v)?)
})
.collect()
}
}

// crates/ignited/src/state/recovery.rs
#[derive(Debug, Default)]
pub struct RecoveryResult {
pub adopted: Vec<String>, // VMs successfully re-adopted
pub orphaned: Vec<String>, // VMs whose Firecracker process is gone
pub cleaned: Vec<String>, // Resources cleaned up
}
pub async fn recover_from_wal(wal: &Wal, fc_manager: &mut VmManager) -> Result<RecoveryResult> {
let mut result = RecoveryResult::default();
for state in wal.all_states()? {
        match &state.status {
            VmStatus::Running { pid, fc_socket, ip } => {
                // Check if the Firecracker process is still alive
                if process_exists(*pid) {
                    // Re-adopt the running VM — reconnect to Firecracker socket
                    match fc_manager.adopt(state.vm_id.clone(), *pid, fc_socket.clone(), ip.clone()).await {
Ok(_) => result.adopted.push(state.vm_id),
Err(e) => {
tracing::warn!("Failed to adopt VM {}: {}", state.vm_id, e);
cleanup_orphan(&state, wal).await?;
result.orphaned.push(state.vm_id);
}
}
} else {
// Process is gone — clean up DM, TAP, COW resources
cleanup_orphan(&state, wal).await?;
result.orphaned.push(state.vm_id);
}
}
VmStatus::Creating => {
// Was in the middle of creation — incomplete, clean up
cleanup_orphan(&state, wal).await?;
result.cleaned.push(state.vm_id);
}
VmStatus::Stopped => {
// Just restore stopped VM metadata into memory (no process to adopt)
fc_manager.restore_stopped_metadata(state);
}
}
}
Ok(result)
}
fn process_exists(pid: u32) -> bool {
Path::new(&format!("/proc/{}", pid)).exists()
}

Crate: crates/ignite-compose/src/
Priority: P1
Create schema_v3.rs that parses standard Docker Compose v3 YAML. The existing ignite-compose.yml with version: "1.0" must still parse for backward compatibility, then internally convert to the v3 schema struct.
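The version-string mapping noted in the schema comment ("1.0", "3", "3.8", "3.x" all normalize to v3) might be implemented as follows; the function name, enum, and error type are illustrative, not the final API:

```rust
// Sketch of compose version normalization: the legacy "1.0" Ignite schema and
// any "3"/"3.x" Docker Compose string are accepted and mapped to one internal
// representation; anything else is rejected with the offending value named.
#[derive(Debug, PartialEq)]
enum SchemaVersion {
    V3,
}

fn normalize_version(v: Option<&str>) -> Result<SchemaVersion, String> {
    match v {
        // Modern docker-compose omits `version:` entirely — treat as v3
        None => Ok(SchemaVersion::V3),
        // Legacy Ignite schema, still parsed for backward compatibility
        Some("1.0") => Ok(SchemaVersion::V3),
        Some(s) if s == "3" || s.starts_with("3.") => Ok(SchemaVersion::V3),
        Some(other) => Err(format!("unsupported compose version: {other}")),
    }
}

fn main() {
    assert_eq!(normalize_version(Some("3.8")), Ok(SchemaVersion::V3));
    assert_eq!(normalize_version(Some("1.0")), Ok(SchemaVersion::V3));
    assert_eq!(normalize_version(None), Ok(SchemaVersion::V3));
    assert!(normalize_version(Some("2.4")).is_err());
    println!("ok");
}
```

Rejecting v2 explicitly (rather than best-effort parsing) gives users a clear migration error instead of silently dropped keys.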
// crates/ignite-compose/src/schema_v3.rs
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
/// Root compose file — compatible with Docker Compose v3.x
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ComposeFile {
/// Accept "1.0", "3", "3.8", "3.x" — map all to v3 internally
#[serde(default)]
pub version: Option<String>,
#[serde(default)]
pub services: HashMap<String, ServiceConfig>,
#[serde(default)]
pub networks: HashMap<String, NetworkConfig>,
#[serde(default)]
pub volumes: HashMap<String, VolumeConfig>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceConfig {
pub image: Option<String>,
pub build: Option<BuildConfig>,
#[serde(default)]
pub ports: Vec<PortMapping>, // ["8080:80", "443:443"]
#[serde(default)]
pub volumes: Vec<VolumeMount>, // ["./data:/var/lib/data"]
#[serde(default)]
pub environment: EnvSpec, // Map or list form
#[serde(default)]
pub networks: Vec<String>, // ["frontend", "backend"]
#[serde(default)]
pub depends_on: DependsOnSpec,
#[serde(default)]
pub healthcheck: Option<HealthCheck>,
pub deploy: Option<DeployConfig>, // for 'replicas'
/// VM-specific extension block (Ignite-only, ignored by docker-compose)
pub vm: Option<VmExtension>,
    // Carried over from the v1.0 schema ('cpus'/'memory'), kept for backward compat
pub cpus: Option<f64>,
pub memory: Option<u64>,
pub command: Option<CommandSpec>, // Override CMD
pub entrypoint: Option<CommandSpec>, // Override ENTRYPOINT
pub hostname: Option<String>,
pub restart: Option<String>, // "no", "always", "on-failure", "unless-stopped"
}
/// Ignite-specific VM configuration — silently ignored by docker-compose
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct VmExtension {
pub kernel: Option<String>, // "ignite/kernels:6.1-slim"
pub vcpus: Option<u32>,
pub memory: Option<u64>, // MiB
pub iops_limit: Option<u32>,
pub snapshot_interval: Option<String>, // "1h", "30m"
pub volume_encryption: Option<String>, // "aes256-xts"
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct NetworkConfig {
pub driver: Option<String>, // "bridge", "overlay", "host", "none"
pub driver_opts: Option<HashMap<String, String>>,
#[serde(default)]
pub internal: bool, // No external connectivity
pub ipam: Option<IpamConfig>,
/// Ignite overlay will use WireGuard automatically when driver="overlay"
pub external: Option<bool>,
}
/// For DependsOn — support both list form and condition form
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]
pub enum DependsOnSpec {
List(Vec<String>),
Map(HashMap<String, DependsOnCondition>),
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DependsOnCondition {
pub condition: String, // "service_started", "service_healthy", "service_completed_successfully"
}
/// Support both map form and list form for environment
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]
pub enum EnvSpec {
Map(HashMap<String, String>),
List(Vec<String>), // "KEY=VALUE" or "KEY" (inherit from host)
}

When a compose stack specifies networks:, create a separate Linux bridge per named network. Services without an explicit network assignment go on the default bridge.
// crates/ignite-compose/src/orchestrator.rs
async fn provision_compose_networks(
    stack_name: &str, // used for bridge naming below; was missing from the signature
    compose: &ComposeFile,
    daemon_client: &DaemonClient,
) -> Result<HashMap<String, CreatedNetwork>> {
let mut created = HashMap::new();
for (name, config) in &compose.networks {
let driver = config.driver.as_deref().unwrap_or("bridge");
let req = CreateNetworkRequest {
name: format!("{}_{}_{}", stack_name, name, random_suffix()),
driver: driver.to_string(),
internal: config.internal,
subnet: config.ipam.as_ref()
.and_then(|i| i.config.first())
.and_then(|c| c.subnet.clone()),
};
let net = daemon_client.create_network(req).await?;
created.insert(name.clone(), net);
}
// Ensure default network exists for services without explicit network config
if !created.contains_key("default") {
let default_net = daemon_client.create_network(CreateNetworkRequest {
name: format!("{}_default", stack_name),
driver: "bridge".to_string(),
..Default::default()
}).await?;
created.insert("default".to_string(), default_net);
}
Ok(created)
}

Crate: crates/ignited/src/api/vms.rs and vm_manager.rs
Priority: P1 — current behavior leaves dangling resources
// crates/ignited/src/vm_manager.rs
pub async fn destroy_vm(&self, vm_id: &str) -> Result<()> {
let state = self.get_state(vm_id)?;
    // 1. Ensure VM is stopped
    if matches!(state.status, VmStatus::Running { .. }) {
        self.stop_vm(vm_id).await?;
    }
// 2. Kill virtiofsd processes for this VM
for vol in &state.volumes {
if let Some(pid) = vol.virtiofsd_pid {
let _ = signal::kill(Pid::from_raw(pid as i32), Signal::SIGTERM);
}
}
// 3. Remove Device Mapper snapshot
if let Some(dm_name) = &state.dm_device_name {
Command::new("dmsetup").args(["remove", dm_name]).status()?;
}
// 4. Detach loop device for COW file
if let Some(loop_dev) = &state.cow_loop_dev {
Command::new("losetup").args(["-d", loop_dev]).status()?;
}
// 5. Delete COW file
if let Some(cow_path) = &state.cow_path {
let _ = std::fs::remove_file(cow_path);
}
// 6. Delete TAP device
if let Some(tap_name) = &state.tap_device {
Command::new("ip")
.args(["link", "delete", tap_name])
.status()?;
}
// 7. Release IP back to IPAM pool
self.ipam.release(&state.ip)?;
// 8. Remove Firecracker socket and working directory
if let Some(fc_dir) = &state.firecracker_dir {
let _ = std::fs::remove_dir_all(fc_dir);
}
// 9. Remove from WAL and in-memory state
self.wal.remove_state(vm_id)?;
self.vms.lock().await.remove(vm_id);
Ok(())
}

Crate: crates/ign/src/commands/image.rs, crates/ignited/src/api/images.rs
Priority: P2
ign commit pauses the VM, flushes the COW delta onto the base image layer to create a new read-only image, then resumes.
// crates/ignited/src/api/images.rs
pub async fn commit_vm(vm_id: &str, tag: &str, vm_manager: &VmManager) -> Result<ImageId> {
// 1. Pause VM via Firecracker API
vm_manager.pause_vm(vm_id).await?;
// 2. Merge COW layer into new base image
// Use dmsetup to read the merged snapshot into a new ext4 file
let src_dm = vm_manager.get_dm_device(vm_id)?;
let new_image_path = images_dir().join(format!("{}.ext4", tag_to_path(tag)));
// dd the device mapper device into a new file
Command::new("dd")
.args([
&format!("if={}", src_dm),
&format!("of={}", new_image_path.display()),
"bs=4M",
])
.status()?;
// 3. Write ignite-config.json for the new image
let config = vm_manager.get_vm_config(vm_id)?;
std::fs::write(
new_image_path.with_extension("json"),
serde_json::to_vec_pretty(&config)?,
)?;
// 4. Resume VM
vm_manager.resume_vm(vm_id).await?;
// 5. Register in local image store
let image_id = register_local_image(tag, &new_image_path)?;
Ok(image_id)
}

Bundle the ext4 image file + ignite-config.json into a compressed tar. This is a simpler version of the existing ign export / ign import, which operates on VM snapshots.
// crates/ign/src/commands/image.rs
pub async fn cmd_save(image: &str, output: &Path, client: &Client) -> Result<()> {
let resp = client.get_image_export(image).await?;
let file = File::create(output)?;
let gz = GzEncoder::new(file, Compression::default());
let mut tar = TarBuilder::new(gz);
tar.append_data(&mut resp.ext4_reader(), "rootfs.ext4")?;
tar.append_data(&mut Cursor::new(&resp.config_json), "ignite-config.json")?;
tar.append_data(&mut Cursor::new(&resp.manifest_json), "manifest.json")?;
tar.finish()?;
println!("Saved {} to {}", image, output.display());
Ok(())
}
pub async fn cmd_load(input: &Path, client: &Client) -> Result<()> {
let file = File::open(input)?;
let gz = GzDecoder::new(file);
let mut archive = Archive::new(gz);
let resp = client.post_image_import(archive.entries()?).await?;
println!("Loaded image: {}", resp.tag);
Ok(())
}

Duration: 6 weeks
Goal: Refactor the brittle CLI-subprocess internals into proper Rust-native library calls. Build the chaos test framework. Expand the Docker Hub compatibility test matrix.
Priority: P1 — std::process::Command("dmsetup") error handling is string parsing; brittle.
Extract ignite-core/src/storage.rs into crates/ignite-storage/ and replace the CLI subprocess calls with the devicemapper Rust crate.
# crates/ignite-storage/Cargo.toml
[dependencies]
devicemapper = "0.34" # Safe Rust bindings for libdevmapper
loopdev = "0.4"       # Safe Rust bindings for loop devices

// crates/ignite-storage/src/dm.rs
use devicemapper::{DM, DevId, DmName, DmOptions, LinearDev, Sector, Segment, SnapshotDev};
pub struct DmSnapshots {
dm: DM,
}
impl DmSnapshots {
pub fn new() -> Result<Self> {
Ok(Self { dm: DM::new()? })
}
/// Create a snapshot of base_dev, writing changes to cow_dev
pub fn create_snapshot(
&self,
name: &str,
base_dev: &Path, // loop device of base ext4
cow_dev: &Path, // loop device of sparse COW file
) -> Result<DmDevice> {
let dm_name = DmName::new(name)?;
// Origin target wraps the read-only base
let origin = LinearDev::setup(
&self.dm,
&DmName::new(&format!("{}-origin", name))?,
None,
vec![Segment::new(base_dev, Sector(0), device_size(base_dev)?)],
)?;
// Snapshot combines origin + COW
let snap = SnapshotDev::setup(
&self.dm,
&dm_name,
None,
&origin,
cow_dev,
true, // persistent=true
)?;
Ok(DmDevice { name: name.to_string(), path: snap.path()? })
}
pub fn remove_snapshot(&self, name: &str) -> Result<()> {
self.dm.device_remove(&DevId::Name(DmName::new(name)?), &DmOptions::default())?;
Ok(())
}
}

// crates/ignite-storage/src/cow.rs
use loopdev::{LoopControl, LoopDevice};
pub fn attach_loop_device(file: &Path) -> Result<LoopDevice> {
let control = LoopControl::open()?;
let dev = control.next_free()?;
dev.with().read_only(false).attach(file)?;
Ok(dev)
}
pub fn detach_loop_device(dev: &LoopDevice) -> Result<()> {
dev.detach()?;
Ok(())
}
pub fn create_cow_file(path: &Path, size_bytes: u64) -> Result<()> {
// Create a sparse file (no actual disk allocation until written)
let file = File::create(path)?;
file.set_len(size_bytes)?;
Ok(())
}

Extract ignite-core/src/network.rs into crates/ignite-net/. Replace the ip link/brctl/iptables subprocess calls with rtnetlink.
# crates/ignite-net/Cargo.toml
[dependencies]
rtnetlink = "0.13"
netlink-packet-route = "0.17"
ipnetwork = "0.20"

// crates/ignite-net/src/bridge.rs
use rtnetlink::{new_connection, Handle};
pub struct BridgeManager {
handle: Handle,
}
impl BridgeManager {
pub async fn new() -> Result<Self> {
let (conn, handle, _) = new_connection()?;
tokio::spawn(conn);
Ok(Self { handle })
}
pub async fn create_bridge(&self, name: &str) -> Result<u32> {
self.handle
.link()
.add()
.bridge(name.to_string())
.execute()
.await?;
// Get the interface index
let link = self.get_link_by_name(name).await?;
// Set UP
self.handle.link().set(link.header.index).up().execute().await?;
Ok(link.header.index)
}
    pub async fn add_tap_to_bridge(&self, tap_name: &str, bridge_idx: u32) -> Result<()> {
        // TAP devices are created via the /dev/net/tun TUNSETIFF ioctl in tap.rs;
        // rtnetlink cannot create them, so this only attaches the existing device
        let tap_link = self.get_link_by_name(tap_name).await?;
// Attach to bridge
self.handle
.link()
.set(tap_link.header.index)
.controller(bridge_idx)
.execute()
.await?;
// Set UP
self.handle.link().set(tap_link.header.index).up().execute().await?;
Ok(())
}
}
Crate: crates/ignite-storage/src/snapshot_tree.rs
Priority: P1 — git is an external dep, wrong abstraction, and breaks CoW delta efficiency.
The git-based approach (ADR in the codebase) must be removed and replaced with a proper snapshot graph backed by sled.
// crates/ignite-storage/src/snapshot_tree.rs
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SnapshotNode {
pub id: String, // UUID
pub vm_id: String,
pub parent_id: Option<String>, // None = root snapshot
pub created_at: u64, // Unix timestamp
pub label: Option<String>, // User-defined e.g. "pre-deploy"
pub tag: Option<String>, // e.g. "snap:6"
pub memory_path: PathBuf, // Firecracker .mem file
pub snapshot_path: PathBuf, // Firecracker .snap file
pub cow_delta_path: PathBuf, // COW diff since parent
pub cow_delta_size: u64, // Bytes
pub memory_size: u64,
}
pub struct SnapshotTree {
db: sled::Tree,
}
impl SnapshotTree {
/// Create a new snapshot, optionally parenting off another
pub fn create(&self, node: &SnapshotNode) -> Result<()> {
let key = node.id.as_bytes().to_vec();
let value = serde_json::to_vec(node)?;
self.db.insert(key, value)?;
self.db.flush()?;
Ok(())
}
/// List all snapshots for a VM in chronological order
pub fn history(&self, vm_id: &str) -> Result<Vec<SnapshotNode>> {
let mut nodes: Vec<SnapshotNode> = self.db
.iter()
.filter_map(|r| r.ok())
.filter_map(|(_, v)| serde_json::from_slice(&v).ok())
.filter(|n: &SnapshotNode| n.vm_id == vm_id)
.collect();
nodes.sort_by_key(|n| n.created_at);
Ok(nodes)
}
/// Fork a new VM from a historical snapshot (like git checkout -b)
pub fn branch(&self, snap_id: &str, new_vm_id: &str) -> Result<SnapshotNode> {
let parent = self.get(snap_id)?;
// Create a new COW layer that reads from the snapshot's state
// The new VM will start from the snapshot's disk + memory state
let new_node = SnapshotNode {
id: uuid::Uuid::new_v4().to_string(),
vm_id: new_vm_id.to_string(),
parent_id: Some(snap_id.to_string()),
created_at: unix_now(),
label: Some(format!("branched-from-{}", snap_id)),
..parent.clone()
};
self.create(&new_node)?;
Ok(new_node)
}
/// Compute filesystem diff between two snapshots
pub fn diff(&self, snap_a: &str, snap_b: &str) -> Result<SnapshotDiff> {
// Mount both COW layers read-only and run a recursive diff
let a = self.get(snap_a)?;
let b = self.get(snap_b)?;
// Use debugfs to list changed inodes without mounting
compute_ext4_diff(&a.cow_delta_path, &b.cow_delta_path)
}
}
Location: tests/chaos/
Priority: P1 — required to validate WAL recovery
// tests/chaos/wal_recovery_test.rs
#[tokio::test]
#[ignore = "requires KVM and root"]
async fn test_recovery_after_sigkill_during_create() {
let daemon = TestDaemon::start().await;
    // Begin VM creation on a background task (move a client handle in,
    // since `daemon` itself is still needed below)
    let client = daemon.client();
    let _create_handle = tokio::spawn(async move {
        client.create_vm("alpine:latest").await
    });
// Kill daemon immediately after WAL write but before VM boot
tokio::time::sleep(Duration::from_millis(50)).await;
daemon.kill_sigkill().await;
// Restart daemon
let daemon2 = TestDaemon::start_with_existing_data(daemon.data_dir()).await;
// VMs that were mid-creation should be cleaned up (not left in broken state)
let vms = daemon2.client().list_vms().await.unwrap();
assert!(vms.is_empty(), "Half-created VM should have been cleaned up");
// No dangling loop devices
assert!(!daemon2.has_dangling_loop_devices().await);
// No dangling DM snapshots
assert!(!daemon2.has_dangling_dm_devices().await);
}
#[tokio::test]
#[ignore = "requires KVM and root"]
async fn test_running_vm_survives_daemon_restart() {
let daemon = TestDaemon::start().await;
let vm_id = daemon.client().run_vm("alpine:latest").await.unwrap();
// Wait for VM to be running
daemon.wait_for_status(&vm_id, VmStatus::Running).await;
// Gracefully restart the daemon
daemon.restart().await;
// The running VM should still be there
let vms = daemon.client().list_vms().await.unwrap();
assert_eq!(vms.len(), 1);
assert_eq!(vms[0].id, vm_id);
assert_eq!(vms[0].status, VmStatus::Running);
}
Location: tests/compat/
Priority: P2
Build an automated nightly test that pulls the top-N Docker Hub images and verifies:
- Pull succeeds (OCI → ext4 conversion)
- `ignite-config.json` is correctly extracted (CMD/ENTRYPOINT/ENV)
- VM boots (`ign run` returns Running status)
- Healthcheck passes (if declared in the OCI config)
- Main process responds (TCP probe on exposed port if declared)
// tests/compat/docker_hub_matrix.rs
const TEST_IMAGES: &[(&str, &str, Option<u16>)] = &[
("nginx:alpine", "nginx: master process", Some(80)),
("redis:7", "Ready to accept connections", Some(6379)),
("postgres:15", "database system is ready", Some(5432)),
("ubuntu:22.04", "", None),
("alpine:latest", "", None),
("python:3.11-slim", "", None),
("node:20-alpine", "", None),
("golang:1.21", "", None),
("rust:1.75", "", None),
("debian:bookworm", "", None),
// ... expand to 100+ images
];
#[tokio::test]
#[ignore = "nightly CI only — requires KVM"]
async fn test_docker_hub_compatibility() {
let mut results = vec![];
for (image, expected_log, port) in TEST_IMAGES {
let result = test_single_image(image, expected_log, *port).await;
results.push(CompatResult { image, result });
}
let pass_count = results.iter().filter(|r| r.result.is_ok()).count();
let total = results.len();
let pass_rate = pass_count as f64 / total as f64;
println!("Compat matrix: {}/{} passed ({:.1}%)", pass_count, total, pass_rate * 100.0);
// Assert 98% pass rate
assert!(pass_rate >= 0.98, "Compat rate below 98%: {:.1}%", pass_rate * 100.0);
}
Duration: 12 weeks
Goal: WireGuard-encrypted Swarm, Raft consensus, live VM migration (Teleport), Ignite Hub, gRPC interface, Prometheus metrics.
Priority: P0 for multi-tenant Swarm — all VXLAN traffic is currently plaintext.
Use boringtun (pure Rust WireGuard implementation) embedded in ignited. No external wg binary dependency.
# crates/ignite-net/Cargo.toml
[dependencies]
boringtun = "0.6"
// crates/ignite-net/src/wireguard.rs
use boringtun::crypto::{X25519PublicKey, X25519SecretKey};
use boringtun::device::drop_privileges;
use boringtun::device::{DeviceConfig, DeviceHandle};
pub struct WireGuardNode {
secret_key: X25519SecretKey,
public_key: X25519PublicKey,
handle: DeviceHandle,
}
impl WireGuardNode {
/// Called on `ign swarm init` or `ign swarm join` — generates keypair,
/// creates a WireGuard interface, and listens for peer configurations.
pub fn new(listen_port: u16) -> Result<Self> {
let secret_key = X25519SecretKey::new();
let public_key = secret_key.public_key();
let config = DeviceConfig {
n_threads: 2,
use_connected_socket: true,
..Default::default()
};
let handle = DeviceHandle::new("ignite-wg0", config)?;
Ok(Self { secret_key, public_key, handle })
}
pub fn public_key_base64(&self) -> String {
base64::encode(self.public_key.as_bytes())
}
/// Add a peer (called when a new Swarm node joins)
pub fn add_peer(&self, public_key_b64: &str, endpoint: SocketAddr, allowed_ips: &[IpNetwork]) -> Result<()> {
let pk_bytes = base64::decode(public_key_b64)?;
let pk = X25519PublicKey::from(pk_bytes.as_slice());
self.handle.add_peer(
pk,
Some(endpoint),
allowed_ips,
None, // preshared_key
Some(25), // keepalive seconds
)?;
Ok(())
}
}
When `ign swarm init` is called:
- Generate WireGuard keypair, store in `/var/lib/ignite/wg.key`
- Start listening on UDP port 51820
- Advertise public key + endpoint in the swarm gossip state

When `ign swarm join <seed-ip>` is called:
- Generate WireGuard keypair
- POST to seed node's `/api/v1/swarm/join` with `{ public_key, endpoint, subnet_lease_request }`
- Seed responds with its public key + all existing peer public keys
- Both nodes call `add_peer()` on each other
- VXLAN traffic flows inside WireGuard tunnel
Crate: crates/ignited/src/swarm/
Priority: P1
Replace the current seed-based approach with Raft via the openraft crate.
# crates/ignited/Cargo.toml
[dependencies]
openraft = { version = "0.9", features = ["serde"] }
// crates/ignited/src/swarm/raft.rs
use openraft::{Config, Raft, RaftMetrics};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SwarmCommand {
RegisterNode { node_id: u64, ip: String, public_key: String, subnet: String },
DeregisterNode { node_id: u64 },
UpdateVmPlacement { vm_id: String, node_id: u64 },
RemoveVmPlacement { vm_id: String },
CreateService { name: String, spec: ServiceSpec },
UpdateService { name: String, spec: ServiceSpec },
DeleteService { name: String },
}
pub type IgniteRaft = Raft<IgniteTypeConfig>;
pub async fn create_raft_node(
node_id: u64,
config: Arc<Config>,
network: Arc<IgniteNetwork>,
storage: Arc<IgniteStorage>,
) -> Result<IgniteRaft> {
let raft = Raft::new(node_id, config, network, storage).await?;
Ok(raft)
}
/// Called on `ign swarm init` — bootstrap a single-node cluster
pub async fn bootstrap_cluster(raft: &IgniteRaft, node_id: u64, addr: String) -> Result<()> {
let members = BTreeMap::from([(node_id, BasicNode { addr })]);
raft.initialize(members).await?;
Ok(())
}
/// Called on `ign swarm join` — add this node to existing cluster
pub async fn join_cluster(
raft: &IgniteRaft,
leader_addr: &str,
node_id: u64,
my_addr: String,
) -> Result<()> {
// Contact the leader and request to be added
let client = SwarmClient::new(leader_addr);
client.add_learner(node_id, my_addr.clone()).await?;
client.change_membership(node_id).await?;
Ok(())
}
Crate: crates/ignite-teleport/
Priority: P2 — flagship feature
Pre-copy memory migration protocol:
Source Node Destination Node
----------- ----------------
1. Mark all memory pages as "dirty" 1. Allocate memory buffer
via KVM_GET_DIRTY_LOG ioctl
2. Bulk copy ALL pages over WireGuard 2. Write pages to buffer
overlay network
3. Repeat: copy only dirty pages 3. Receive dirty pages
(pages written since last round)
Iterate until dirty rate < threshold
4. Pause VM (pause call to the         4. Receive final delta
   Firecracker /pause API)
5. Copy final dirty pages + CPU state 5. Reconstruct VM memory
(Firecracker snapshot files)
6. Notify destination: "start VM" 6. Load snapshot into new
Firecracker instance
7. Update overlay network routing 7. VM resumes
(VM IP now routes to dest node)
8. Destroy source VM
// crates/ignite-teleport/src/sender.rs
use kvm_ioctls::{Kvm, VmFd};
pub struct MigrationSender {
vm_fd: VmFd,
fc_client: FirecrackerClient,
wg_stream: TcpStream, // WireGuard-encrypted stream to destination
}
impl MigrationSender {
pub async fn migrate(
mut self,
vm_id: &str,
dest_addr: SocketAddr,
) -> Result<MigrationStats> {
let total_pages = self.vm_fd.get_num_pages()?;
let page_size = 4096u64;
// Phase 1: Enable dirty tracking
self.vm_fd.enable_dirty_log()?;
// Phase 2: Initial bulk transfer
let initial_dirty = self.get_all_pages()?;
self.send_pages(&initial_dirty).await?;
// Phase 3: Iterative refinement
let mut round = 0;
loop {
let dirty = self.get_dirty_pages()?;
let dirty_count = dirty.count_ones() as u64;
tracing::debug!("Migration round {}: {} dirty pages", round, dirty_count);
if dirty_count < MIGRATION_THRESHOLD_PAGES {
break; // Dirty rate low enough to do final pause
}
self.send_pages_by_bitmap(&dirty).await?;
round += 1;
}
// Phase 4: Final pause + snapshot
self.fc_client.pause_vm().await?;
// Send final dirty pages
let final_dirty = self.get_dirty_pages()?;
self.send_pages_by_bitmap(&final_dirty).await?;
// Send Firecracker snapshot (CPU state)
let snap = self.fc_client.create_snapshot().await?;
self.send_snapshot(snap).await?;
// Signal destination to resume
self.send_signal(MigrationSignal::Resume).await?;
// Update routing in Swarm overlay
update_vm_routing(vm_id, dest_addr).await?;
Ok(MigrationStats { rounds: round + 1, total_pages })
}
fn get_dirty_pages(&self) -> Result<BitVec> {
// KVM_GET_DIRTY_LOG ioctl
self.vm_fd.get_dirty_log(0, todo!("slot size"))
.map(|bitmap| BitVec::from_vec(bitmap))
.map_err(Into::into)
}
}
Priority: P1 — required for vk8s CRI (Phase 4)
// crates/ignite-proto/proto/vm.proto
syntax = "proto3";
package ignite.v1;
service VmService {
rpc CreateVm (CreateVmRequest) returns (CreateVmResponse);
rpc StartVm (VmIdRequest) returns (VmStatusResponse);
rpc StopVm (VmIdRequest) returns (VmStatusResponse);
rpc DeleteVm (VmIdRequest) returns (google.protobuf.Empty);
rpc ListVms (ListVmsRequest) returns (ListVmsResponse);
rpc GetVm (VmIdRequest) returns (VmInfo);
rpc ExecCommand (ExecRequest) returns (stream ExecOutput);
rpc StreamLogs (LogRequest) returns (stream LogLine);
rpc CreateSnapshot (SnapshotRequest) returns (SnapshotInfo);
rpc RestoreSnapshot (RestoreRequest) returns (VmInfo);
rpc MigrateVm (MigrateRequest) returns (stream MigrationProgress);
}
message VmInfo {
string id = 1;
string image = 2;
string status = 3;
string ip = 4;
uint32 vcpus = 5;
uint64 memory_mb = 6;
repeated PortMapping ports = 7;
int64 created_at = 8;
}
# crates/ignite-proto/Cargo.toml
[dependencies]
tonic = "0.11"
prost = "0.12"
[build-dependencies]
tonic-build = "0.11"
Add gRPC server alongside the existing axum REST server in `ignited`:
// crates/ignited/src/main.rs
#[tokio::main]
async fn main() -> Result<()> {
let state = Arc::new(DaemonState::new().await?);
// REST API on Unix socket (for ign CLI)
let rest_server = start_rest_server(state.clone());
// gRPC on TCP (for vk8s CRI plugin, SDK)
let grpc_server = Server::builder()
.add_service(VmServiceServer::new(GrpcVmService::new(state.clone())))
.serve("[::1]:7071".parse()?);
tokio::select! {
_ = rest_server => {},
_ = grpc_server => {},
}
Ok(())
}
Priority: P2
// crates/ignited/src/metrics.rs
use prometheus::{Registry, Gauge, Counter, Histogram, GaugeVec, Encoder, TextEncoder};
pub struct IgniteMetrics {
pub vms_running: Gauge,
pub vms_total: Counter,
pub vm_boot_duration: Histogram,
pub vm_memory_usage: GaugeVec, // labels: vm_id
pub vm_cpu_usage: GaugeVec, // labels: vm_id
pub snapshot_count: GaugeVec, // labels: vm_id
}
// Expose at GET /metrics in axum router
async fn metrics_handler() -> impl IntoResponse {
let encoder = TextEncoder::new();
let metric_families = prometheus::gather();
let mut buffer = Vec::new();
encoder.encode(&metric_families, &mut buffer).unwrap();
(
[(header::CONTENT_TYPE, "text/plain; charset=utf-8")],
buffer,
)
}
Priority: P2
// crates/ignite-image/src/vmif.rs
/// VMIF (VM Image Format) — the stable on-disk format for Ignite images
/// Layout (OCI-compatible artifact stored in any OCI registry):
/// ignite.toml — image metadata
/// rootfs.sqfs — squashfs root filesystem (read-only, compressed)
/// kernel.vmlinuz — guest kernel (optional, uses bundled default if absent)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VmifManifest {
pub schema_version: u32, // 1
pub created: String, // RFC3339 timestamp
pub arch: String, // "amd64", "arm64"
pub kernel: Option<String>, // OCI digest of kernel layer
pub rootfs: String, // OCI digest of rootfs layer
pub config: OciImageConfig, // CMD, ENTRYPOINT, ENV, etc.
pub labels: HashMap<String, String>,
pub size_bytes: u64, // Uncompressed rootfs size
}
// crates/ignite-image/src/hub_bridge.rs
/// Convert a Docker Hub OCI image to VMIF format.
/// This is the core of the "Ignite Hub bridge" feature.
/// Called once per image tag; result cached forever.
pub async fn convert_docker_hub_to_vmif(
image_ref: &str,
kernel_ref: Option<&str>,
) -> Result<VmifManifest> {
// 1. Pull OCI layers from Docker Hub
let oci_client = OciClient::new();
let (manifest, config, layers) = oci_client.pull_all(image_ref).await?;
// 2. Unpack layers into staging directory (existing layer flattening logic)
let staging_dir = temp_dir();
unpack_layers(&layers, &staging_dir).await?;
    // 3. Convert the unpacked tree → squashfs for better compression + read-only
    //    semantics. The output must live OUTSIDE staging_dir — writing it inside
    //    would make mksquashfs pack its own output file into the image.
    let sqfs_path = staging_dir.with_extension("sqfs");
    let status = Command::new("mksquashfs")
        .args([staging_dir.to_str().unwrap(), sqfs_path.to_str().unwrap(),
               "-comp", "zstd", "-Xcompression-level", "9"])
        .status()?;
    anyhow::ensure!(status.success(), "mksquashfs failed for {}", image_ref);
// 4. Build ignite.toml metadata
let vmif = VmifManifest {
schema_version: 1,
created: chrono::Utc::now().to_rfc3339(),
arch: "amd64".to_string(),
kernel: kernel_ref.map(str::to_string),
rootfs: sha256_of_file(&sqfs_path)?,
config: parse_oci_config(&config)?,
labels: manifest.annotations.unwrap_or_default(),
size_bytes: file_size(&sqfs_path)?,
};
Ok(vmif)
}
Location: ui/src/
Priority: P2
Extend the existing TypeScript dashboard. Add these views:
- TimeMachine View — horizontal scrollable snapshot timeline per VM. Each snapshot is a node. Click to preview metadata. Drag two nodes for a diff view. Button to restore.
- Network Topology View — D3.js force-directed graph. Nodes = VMs, edges = network connections. Color by compose stack. Click a VM for an inline stats panel.
- Compose Editor — Monaco editor (same as VS Code) with YAML schema validation for `ignite-compose.yml`. Live-validate against the Docker Compose v3 JSON schema. One-click deploy button.
- Hub Browser — Search box. Hit the Ignite Hub API (local cache) first, fall back to the Docker Hub bridge. Shows a conversion status badge.
Duration: 16 weeks
Goal: TimeMachine (full git-for-runtime), Hibernation, Kubernetes CRI (vk8s), Trusted Boot, in-VM agent, SDK.
Crate: crates/ignite-storage/src/snapshot_tree.rs (extends Phase 2 foundation)
Priority: P1 — headline v2.0 feature
The snapshot tree is already built in Phase 2. This phase wires it to the CLI commands.
// crates/ign/src/commands/snapshot.rs
pub async fn cmd_history(vm_id: &str, client: &Client) -> Result<()> {
let history = client.get_snapshot_history(vm_id).await?;
println!("{:<6} {:<20} {:<10} {:<20} {}",
"TAG", "ID", "DELTA", "CREATED", "LABEL");
for (i, snap) in history.iter().enumerate() {
println!("{:<6} {:<20} {:<10} {:<20} {}",
format!("snap:{}", history.len() - i - 1),
&snap.id[..8],
human_bytes(snap.cow_delta_size),
format_relative_time(snap.created_at),
snap.label.as_deref().unwrap_or("-"),
);
}
Ok(())
}

pub async fn cmd_time_travel(vm_id: &str, target: &str, client: &Client) -> Result<()> {
// Parse "snap:N" to get the snapshot index
let index: usize = target.strip_prefix("snap:")
.ok_or_else(|| anyhow!("Invalid snapshot ref. Use snap:N"))?
.parse()?;
let history = client.get_snapshot_history(vm_id).await?;
let snap = history.get(history.len() - 1 - index)
.ok_or_else(|| anyhow!("Snapshot snap:{} not found", index))?;
println!("Stopping {} and restoring to {}...", vm_id, snap.label.as_deref().unwrap_or(&snap.id));
client.stop_vm(vm_id).await?;
client.restore_snapshot_to_vm(vm_id, &snap.id).await?;
client.start_vm(vm_id).await?;
println!("✓ Time-traveled to snap:{}", index);
Ok(())
}
// crates/ignited/src/vm_manager.rs
pub struct AutoSnapshotTask {
vm_id: String,
interval: Duration,
retain_count: usize,
}
impl AutoSnapshotTask {
/// Spawned as a tokio task when a VM starts with VM_SNAPSHOT_POLICY set
pub async fn run(self, wal: Arc<Wal>, tree: Arc<SnapshotTree>) {
let mut interval_timer = tokio::time::interval(self.interval);
interval_timer.tick().await; // Skip first immediate tick
loop {
interval_timer.tick().await;
match take_snapshot(&self.vm_id, None, &wal).await {
Ok(snap) => {
tracing::info!("Auto-snapshot {} for VM {}", snap.id, self.vm_id);
prune_old_snapshots(&self.vm_id, self.retain_count, &tree).await;
}
Err(e) => tracing::warn!("Auto-snapshot failed for {}: {}", self.vm_id, e),
}
}
}
}
Crate: crates/ignited/src/vm_manager.rs
Priority: P1
pub async fn hibernate_vm(&self, vm_id: &str) -> Result<HibernationInfo> {
let state = self.get_running_vm(vm_id)?;
// 1. Firecracker snapshot (CPU + memory state to files)
let hib_dir = hibernation_dir().join(vm_id);
std::fs::create_dir_all(&hib_dir)?;
self.fc_client.pause_vm(&state.fc_socket).await?;
self.fc_client.create_snapshot(
&state.fc_socket,
&hib_dir.join("vm.snap"),
&hib_dir.join("vm.mem"),
SnapshotType::Full,
).await?;
// 2. Stop the Firecracker process (releases vCPUs and memory)
let _ = state.fc_process.kill();
state.fc_process.wait()?;
// 3. Detach TAP device (releases network slot)
// Keep the TAP device OBJECT so we can re-attach; but disable it
Command::new("ip").args(["link", "set", &state.tap_device, "down"]).status()?;
// 4. Release IP back to IPAM (optional — keep IP for fast resume)
// For hibernation we KEEP the IP reserved so VM resumes with same address
// 5. Update WAL state
self.wal.commit_state(vm_id, &VmState {
status: VmStatus::Hibernated {
hib_dir: hib_dir.clone(),
snap_path: hib_dir.join("vm.snap"),
mem_path: hib_dir.join("vm.mem"),
},
..state.clone()
})?;
// 6. Remove from in-memory VM map (resources truly freed)
self.vms.lock().await.remove(vm_id);
Ok(HibernationInfo {
vm_id: vm_id.to_string(),
hib_dir,
preserved_ip: state.ip,
})
}
pub async fn resume_vm_from_hibernation(&self, vm_id: &str) -> Result<()> {
let state = self.wal.get_state(vm_id)?;
let VmStatus::Hibernated { snap_path, mem_path, .. } = &state.status else {
return Err(anyhow!("VM {} is not hibernated", vm_id));
};
// 1. Re-enable TAP device
Command::new("ip").args(["link", "set", &state.tap_device, "up"]).status()?;
// 2. Start new Firecracker process
let fc_socket = new_fc_socket_path(vm_id);
let fc_process = spawn_firecracker(vm_id, &fc_socket)?;
// 3. Load snapshot (Firecracker resumes from exact state)
self.fc_client.load_snapshot(
&fc_socket,
snap_path,
mem_path,
&state.tap_device,
&state.dm_device_path,
).await?;
// 4. Resume execution
self.fc_client.resume_vm(&fc_socket).await?;
// 5. Update WAL
self.wal.commit_state(vm_id, &VmState {
status: VmStatus::Running { pid: fc_process.id(), fc_socket },
..state
})?;
self.vms.lock().await.insert(vm_id.to_string(), Arc::new(fc_process));
Ok(())
}
Location: vk8s/
Language: Go (CRI spec is Go-native, generated from protobuf)
Priority: P2
// vk8s/pkg/cri/runtime.go
package cri
import (
	"context"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	pb "k8s.io/cri-api/pkg/apis/runtime/v1"

	ignite "github.com/Subeshrock/micro-vm-ecosystem/sdk/go"
)
// IgniteCriServer implements the CRI RuntimeService and ImageService
type IgniteCriServer struct {
client *ignite.Client // gRPC client to ignited
pb.UnimplementedRuntimeServiceServer
pb.UnimplementedImageServiceServer
}
// RunPodSandbox — called when kubelet creates a new Pod
// Each Pod = one Ignite MicroVM
func (s *IgniteCriServer) RunPodSandbox(
ctx context.Context,
req *pb.RunPodSandboxRequest,
) (*pb.RunPodSandboxResponse, error) {
config := req.Config
// Create VM configuration from pod spec
vmConfig := &ignite.CreateVmRequest{
Name: config.Metadata.Name,
Namespace: config.Metadata.Namespace,
Vcpus: uint32(config.Linux.Resources.CpuQuota / 100000),
MemoryMb: uint64(config.Linux.Resources.MemoryLimitInBytes / 1024 / 1024),
Labels: map[string]string{
"k8s.io/pod-name": config.Metadata.Name,
"k8s.io/pod-namespace": config.Metadata.Namespace,
"k8s.io/pod-uid": config.Metadata.Uid,
},
}
resp, err := s.client.CreateVm(ctx, vmConfig)
if err != nil {
return nil, status.Errorf(codes.Internal, "failed to create VM: %v", err)
}
return &pb.RunPodSandboxResponse{PodSandboxId: resp.VmId}, nil
}
// CreateContainer — called for each container in a pod
// Containers within a pod share the same VM via namespaces
func (s *IgniteCriServer) CreateContainer(
ctx context.Context,
req *pb.CreateContainerRequest,
) (*pb.CreateContainerResponse, error) {
// The VM is already running (from RunPodSandbox).
// "Containers" within a pod are processes inside the VM.
// We use `ign exec` semantics to run the container command.
	vmID := req.PodSandboxId
	cmd := append(req.Config.Command, req.Config.Args...)
	execReq := &ignite.ExecRequest{
		VmId:    vmID,
Command: cmd,
Env: req.Config.Envs,
WorkDir: req.Config.WorkingDir,
}
	execResp, err := s.client.StartExec(ctx, execReq)
	if err != nil {
		return nil, status.Errorf(codes.Internal, "failed to start exec: %v", err)
	}
	// Returns a container ID = "vmid/process_id"
	return &pb.CreateContainerResponse{ContainerId: execResp.ExecId}, nil
}
# Kubernetes setup
# kubelet config (containerd-style):
containerRuntimeEndpoint: unix:///var/run/ignite-cri.sock
# Pod spec to use Ignite MicroVM isolation:
spec:
runtimeClassName: ignite-microvm
containers:
- name: web
image: nginx:alpine
ports:
    - containerPort: 80
Crate: crates/ignited/src/ and crates/ignite-image/src/signing.rs
Priority: P3
// crates/ignite-image/src/signing.rs
use ed25519_dalek::{SigningKey, VerifyingKey, Signer, Verifier, Signature};
use sha2::{Sha256, Digest};
/// Sign a VMIF manifest with an Ed25519 key
pub fn sign_manifest(manifest: &VmifManifest, key: &SigningKey) -> Result<SignedManifest> {
let manifest_bytes = serde_json::to_vec(manifest)?;
let signature = key.sign(&manifest_bytes);
Ok(SignedManifest {
manifest: manifest.clone(),
signature: signature.to_bytes().to_vec(),
public_key: key.verifying_key().to_bytes().to_vec(),
})
}
/// Verify a signed VMIF manifest before booting
pub fn verify_manifest(signed: &SignedManifest, trusted_key: &VerifyingKey) -> Result<()> {
let manifest_bytes = serde_json::to_vec(&signed.manifest)?;
let signature = Signature::from_bytes(&signed.signature.as_slice().try_into()?);
trusted_key.verify(&manifest_bytes, &signature)
.map_err(|e| anyhow!("Image signature verification failed: {}", e))
}
# Enforce signature policy via ignited config
# /etc/ignite/config.toml
[security]
require_signed_images = true
trusted_keys = [
"/etc/ignite/trusted-keys/ci.pub",
"/etc/ignite/trusted-keys/hub.pub",
]
Crate: crates/ignite-agent/
Target: x86_64-unknown-linux-musl — static binary, ~400KB
Priority: P2
// crates/ignite-agent/src/main.rs
// This binary is injected into every VMIF image at build time.
// It runs as PID 2 (alongside the actual workload) via a wrapper init.
use vsock::{VsockAddr, VsockListener, VsockStream, VMADDR_CID_ANY};
const VSOCK_PORT: u32 = 9999;
#[tokio::main]
async fn main() -> Result<()> {
    // Listen on vsock for daemon communication. Bind with VMADDR_CID_ANY:
    // the agent is the listener inside the guest and the host-side daemon
    // dials in (CID_HOST names the host's address, not ours).
    let listener = VsockListener::bind(&VsockAddr::new(VMADDR_CID_ANY, VSOCK_PORT))?;
loop {
let (stream, _) = listener.accept()?;
tokio::spawn(handle_connection(stream));
}
}
async fn handle_connection(stream: VsockStream) -> Result<()> {
let mut framed = LengthDelimitedCodec::new().framed(stream);
while let Some(frame) = framed.next().await {
let request: AgentRequest = serde_json::from_slice(&frame?)?;
let response = match request {
AgentRequest::ProcessList => {
AgentResponse::ProcessList(collect_process_tree().await?)
}
AgentRequest::ExecCommand { cmd, env, workdir } => {
AgentResponse::ExecStarted(exec_command(cmd, env, workdir).await?)
}
AgentRequest::GetMetrics => {
AgentResponse::Metrics(collect_metrics().await?)
}
AgentRequest::FileRead { path } => {
AgentResponse::FileContent(std::fs::read(&path)?)
}
};
framed.send(Bytes::from(serde_json::to_vec(&response)?)).await?;
}
Ok(())
}
/// Collect /proc stats and return structured metrics
async fn collect_metrics() -> Result<VmMetrics> {
    // Parse each /proc file once instead of re-reading it per field
    let stat = read_proc_stat()?;
    let mem = read_proc_meminfo()?;
    Ok(VmMetrics {
        cpu_user_ms: stat.user_time,
        cpu_system_ms: stat.system_time,
        mem_used_kb: mem.mem_total - mem.mem_free,
        mem_total_kb: mem.mem_total,
        load_avg_1: read_loadavg()?.one,
        process_count: read_proc_count()?,
    })
}
Priority: P3 — enables ecosystem growth
The Go SDK is auto-generated from the protobuf definitions in ignite-proto. The Rust and Python SDKs are thin ergonomic wrappers.
// sdk/go/client.go — Auto-generated from proto, then wrap with ergonomic API
package ignite
import (
"google.golang.org/grpc"
pb "github.com/Subeshrock/micro-vm-ecosystem/ignite-proto/gen/go"
)
type Client struct {
conn *grpc.ClientConn
vm pb.VmServiceClient
}
func NewClient(addr string) (*Client, error) {
conn, err := grpc.Dial(addr, grpc.WithInsecure())
if err != nil { return nil, err }
return &Client{conn: conn, vm: pb.NewVmServiceClient(conn)}, nil
}
func (c *Client) Run(ctx context.Context, image string, opts ...RunOption) (*Vm, error) {
cfg := &RunConfig{Image: image, Vcpus: 1, MemoryMb: 512}
for _, o := range opts { o(cfg) }
resp, err := c.vm.CreateVm(ctx, &pb.CreateVmRequest{
Image: cfg.Image,
Vcpus: cfg.Vcpus,
MemoryMb: cfg.MemoryMb,
})
if err != nil { return nil, err }
return &Vm{Id: resp.VmId, client: c}, nil
}
| Level | Crate | Command | CI Trigger | KVM Required |
|---|---|---|---|---|
| Unit | all | `cargo test` | Every PR | No |
| Component (mocked) | ignited, ignite-core | `cargo test -p ignited` | Every PR | No |
| Integration (real FC) | tests/integration | `./scripts/test_integration.sh` | Every merge to main | Yes |
| Chaos | tests/chaos | `cargo test --test chaos -- --ignored` | Nightly | Yes |
| Compat matrix | tests/compat | `cargo test --test compat -- --ignored` | Nightly | Yes |
| Performance | tests/bench | `cargo bench` | Weekly | Yes |
| Metric | Target | Regression Alert |
|---|---|---|
| Cold boot time (alpine:latest) | < 150ms | > 200ms |
| VM creation (DM snapshot) | < 10ms | > 50ms |
| Memory overhead (idle VM) | < 8MB | > 20MB |
| 100 concurrent idle VMs memory | < 800MB total | > 2GB |
| Block I/O overhead vs bare dm-dev | < 3% | > 10% |
| Network throughput overhead | < 2% | > 5% |
| Teleport downtime (512MB VM) | < 100ms | > 500ms |
| Hibernate + resume cycle | < 200ms | > 500ms |
For unit tests that test ignited behavior without KVM:
// tests/mocks/firecracker_server.rs
pub struct MockFirecracker {
addr: SocketAddr,
state: Arc<Mutex<MockVmState>>,
}
impl MockFirecracker {
pub async fn start() -> Self {
let state = Arc::new(Mutex::new(MockVmState::default()));
let app = Router::new()
.route("/", axum::routing::put(mock_machine_config))
.route("/boot-source", axum::routing::put(mock_boot_source))
.route("/drives/:id", axum::routing::put(mock_drive))
.route("/network-interfaces/:id", axum::routing::put(mock_net))
.route("/actions", axum::routing::put(mock_actions))
.route("/snapshot/create", axum::routing::put(mock_snapshot_create))
.route("/snapshot/load", axum::routing::put(mock_snapshot_load))
.with_state(state.clone());
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
tokio::spawn(axum::serve(listener, app));
Self { addr, state }
}
}
The .deb produced by CI must contain:
/usr/bin/ignited # Daemon binary
/usr/bin/ign # CLI binary
/usr/lib/ignite/firecracker # Bundled Firecracker VMM
/usr/lib/ignite/virtiofsd # Bundled virtiofs daemon (Phase 1)
/usr/lib/ignite/kernels/ # Pre-built minimal kernels (Phase 3)
ignite-6.1-slim.vmlinuz
ignite-6.1-io_uring.vmlinuz
/etc/systemd/system/ignited.service
/var/lib/ignite/ # Runtime state directory (created by postinstall)
/var/log/ignite/ # Log directory
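A packaging smoke test can assert this layout against an unpacked .deb root. A sketch checking a subset of the files listed above (`verify_deb_layout` is a hypothetical helper, not an existing test):

```rust
use std::path::Path;

/// Return the required .deb payload paths that are missing under `root`
/// (a subset of the package listing; kernels arrive in Phase 3 and the
/// /var directories are created by postinstall, so neither is checked here).
fn verify_deb_layout(root: &Path) -> Vec<String> {
    const REQUIRED: &[&str] = &[
        "usr/bin/ignited",
        "usr/bin/ign",
        "usr/lib/ignite/firecracker",
        "usr/lib/ignite/virtiofsd",
        "etc/systemd/system/ignited.service",
    ];
    REQUIRED
        .iter()
        .filter(|p| !root.join(p).exists())
        .map(|p| p.to_string())
        .collect()
}
```

Running this against the extraction of `dpkg-deb -x` output would catch a CI packaging regression before release.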
| Version | User | Capabilities |
|---|---|---|
| v1.1 (current) | root | ALL |
| v1.2 (Phase 1) | ignite | CAP_NET_ADMIN, CAP_SYS_ADMIN, CAP_NET_RAW, CAP_SETUID, CAP_SETGID |
| v2.0 (target) | ignite | Same — no regression in capability set |
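The v1.2 row implies dropping root in the systemd unit. A sketch of the relevant `ignited.service` directives (standard systemd options; the exact unit shipped in the .deb may differ):

```ini
# Sketch — hardening for /etc/systemd/system/ignited.service (v1.2 target)
[Service]
User=ignite
Group=ignite
# Grant exactly the v1.2 capability set, nothing more
AmbientCapabilities=CAP_NET_ADMIN CAP_SYS_ADMIN CAP_NET_RAW CAP_SETUID CAP_SETGID
CapabilityBoundingSet=CAP_NET_ADMIN CAP_SYS_ADMIN CAP_NET_RAW CAP_SETUID CAP_SETGID
```

`AmbientCapabilities` is what lets a non-root `User=` process keep the listed capabilities after exec; the bounding set caps what any child (Firecracker, virtiofsd) can ever regain.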
# .github/workflows/ci.yml
name: CI
on:
  push:
  pull_request:
  schedule:
    - cron: "0 2 * * *"   # nightly — required for the nightly-compat job below
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo test
      - run: cargo clippy -- -D warnings
      - run: cargo fmt --check
  integration-tests:
    runs-on: [self-hosted, kvm]   # Bare-metal runner with KVM access
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - run: cargo build --release
      - run: sudo ./scripts/test_integration.sh
  nightly-compat:
    runs-on: [self-hosted, kvm]
    if: github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4
      - run: cargo build --release
      - run: cargo test --test compat -- --ignored
This section documents every decision in the existing codebase that must be changed, and the exact migration path.
Current: The daemon calls git init, git add, git commit on snapshot directories. Lives in ignited/src/ and uses std::process::Command("git").
Remove: All git calls from the daemon. Delete any .git directories in .ignite/vms/.
Replace with: The SnapshotTree implemented in ignite-storage/src/snapshot_tree.rs (Phase 2). The sled-backed snapshot tree provides all the same functionality (history, branching, time-travel) without the external git binary dependency and with proper CoW delta semantics.
Migration for existing snapshots: Write a one-time migration script that reads existing git history and converts it to sled snapshot tree entries. Run on first daemon startup after upgrade.
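A sketch of the parsing half of that one-time migration script. The `<hash> <unix-timestamp> <label>` line format is an assumption for illustration — the real script reads whatever format the v1.x daemon committed:

```rust
/// One legacy snapshot commit, as recovered from `git log` output.
#[derive(Debug, PartialEq)]
struct LegacySnapshot {
    commit: String,
    created_at: u64,
    label: String,
}

/// Parse a "<hash> <unix-timestamp> <label>" line; returns None on any
/// malformed field so the migration can log and skip bad entries.
fn parse_git_log_line(line: &str) -> Option<LegacySnapshot> {
    let mut parts = line.splitn(3, ' ');
    Some(LegacySnapshot {
        commit: parts.next()?.to_string(),
        created_at: parts.next()?.parse().ok()?,
        label: parts.next().unwrap_or("").to_string(),
    })
}
```

Each recovered `LegacySnapshot` would then be written as a `SnapshotNode` into the sled tree, with the git parent chain mapped onto `parent_id`.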
Current: `Arc<Mutex<HashMap<String, Arc<tokio::sync::Mutex<VmmManager>>>>>` in `ignited`. ADR-008 acknowledged this loses state on restart.
Replace with: WAL + sled store from Phase 1 (Section 3.4). The in-memory HashMap becomes a cache of the WAL-persisted state; on startup, WAL replay rebuilds the HashMap.
Migration: Non-breaking. The new store is additive. Existing state JSON files in `.ignite/state/` can be imported as the initial WAL state on first upgrade.
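The replay step is the core of the design and can be sketched in a few lines. This is a minimal in-memory sketch; the record variants (`WalRecord`, `VmState`) are hypothetical names, and the real store appends records to an on-disk WAL before applying them:

```rust
use std::collections::HashMap;

/// Hypothetical WAL record types for VM lifecycle events.
#[derive(Clone, Debug)]
enum WalRecord {
    VmCreated { id: String, image: String },
    VmStopped { id: String },
    VmRemoved { id: String },
}

#[derive(Clone, Debug, PartialEq)]
enum VmState { Running, Stopped }

/// Replay the WAL in order to rebuild the in-memory VM table.
/// On daemon startup this reconstructs the state the old
/// in-memory-only HashMap lost on every restart.
fn replay(wal: &[WalRecord]) -> HashMap<String, VmState> {
    let mut vms = HashMap::new();
    for rec in wal {
        match rec {
            WalRecord::VmCreated { id, .. } => { vms.insert(id.clone(), VmState::Running); }
            WalRecord::VmStopped { id } => { vms.insert(id.clone(), VmState::Stopped); }
            WalRecord::VmRemoved { id } => { vms.remove(id); }
        }
    }
    vms
}
```

Because every mutation goes through the WAL first, the HashMap is always a pure function of the log, which is what makes the import of the old `.ignite/state/` JSON files a one-time seeding of the log rather than a schema migration.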
Current: ADR-002 chose CLI-subprocess wrapping as MVP approach with explicit note to migrate to native crates for production.
Migration path:
- Phase 2: Migrate `storage.rs` to the `devicemapper` + `loopdev` crates in the `ignite-storage` crate.
- Phase 2: Migrate `network.rs` to the `rtnetlink` crate in the `ignite-net` crate.
- Keep `iptables` subprocess calls for now (the `iptables` Rust crate is less mature).
- Do NOT migrate `debugfs` calls — they're already the correct approach for rootless file population.
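One way to stage this migration without a flag-day rewrite is to put both paths behind a backend trait and land the native implementation crate by crate. A sketch with hypothetical names (`NetBackend` is not an existing Ignite trait, and both bodies are stubs that return a description instead of performing the operation):

```rust
/// Hypothetical abstraction letting the subprocess path and the
/// rtnetlink-native path coexist while Phase 2 lands crate by crate.
trait NetBackend {
    fn create_bridge(&self, name: &str) -> Result<String, String>;
}

/// v1.1 behavior: shell out to `ip`. (Stub: real code would run
/// std::process::Command::new("ip") and check the exit status.)
struct SubprocessBackend;
impl NetBackend for SubprocessBackend {
    fn create_bridge(&self, name: &str) -> Result<String, String> {
        Ok(format!("subprocess: ip link add {} type bridge", name))
    }
}

/// Phase 2 target: the same operation through the rtnetlink crate.
/// (Stub: the real impl needs a netlink socket and CAP_NET_ADMIN.)
struct NativeBackend;
impl NetBackend for NativeBackend {
    fn create_bridge(&self, name: &str) -> Result<String, String> {
        Ok(format!("rtnetlink: new bridge {}", name))
    }
}

/// Callers depend only on the trait, so swapping backends is a
/// construction-time decision, not a code change.
fn bring_up(backend: &dyn NetBackend, name: &str) -> Result<String, String> {
    backend.create_bridge(name)
}
```

The same shape applies to the `devicemapper`/`loopdev` migration in `ignite-storage`: one trait per subsystem, with the subprocess impl deleted once the native impl passes the integration suite.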
Current: Custom `version: "1.0"` YAML schema in `ignite-compose`.
Migration: The new parser (Phase 1, Section 3.5) accepts both `version: "1.0"` (old) and `version: "3.x"` (new) by branching in the deserialization path. Old files continue to work. New documentation always shows the Docker Compose v3 format.
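The branch itself is a small dispatch on the `version` field before full deserialization. A sketch with hypothetical names (`ComposeDialect`, `dialect_for`); the real parser does this inside serde against the YAML document:

```rust
#[derive(Debug, PartialEq)]
enum ComposeDialect { IgniteV1, DockerV3 }

/// Pick the deserialization path from the file's `version` field:
/// "1.0" selects the legacy Ignite schema, "3" / "3.x" the Docker
/// Compose v3 schema; anything else is rejected with a clear error.
fn dialect_for(version: &str) -> Result<ComposeDialect, String> {
    if version == "1.0" {
        Ok(ComposeDialect::IgniteV1)
    } else if version == "3" || version.starts_with("3.") {
        Ok(ComposeDialect::DockerV3)
    } else {
        Err(format!("unsupported compose version: {}", version))
    }
}
```

Rejecting unknown versions explicitly (rather than defaulting to one dialect) keeps the error message actionable when a user feeds in a v2 Docker Compose file.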
Current: ADR-019 explicitly demoted rootless mode to "Experimental/Alpha" because:
- Device Mapper (instant clones) requires root
- VXLAN overlay (Swarm) requires root
- User namespace restrictions cause crashes in standard shells
Decision stands: Do NOT reintroduce rootless mode as a first-class feature in v1.2-v2.0. The privileged service model with a constrained ignite user (Phase 1, Section 3.3) is the correct security model — same as Docker.
Mark rootless in docs as "not recommended for production" and remove it from the test matrix to reduce CI surface area.
Current: ADR-003 chose a custom reqwest-based OCI client instead of the `oci-distribution` crate.
Decision stands: Keep the custom implementation. The `oci-distribution` crate issues (v0.9.4 compatibility) were valid, and our client handles the OCI Index vs Docker V2 Manifest distinction correctly.
Improvements needed (Phase 2):
- Replace string-based error messages with a typed `OciError` enum
- Add retry logic with exponential backoff for transient 429/503 from Docker Hub
- Add support for `~/.docker/config.json` auth (already partially in changelog — verify completeness)
- Add `Bearer` token caching to avoid re-authenticating on every layer pull
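The retry policy can be expressed as two pure functions, which keeps it unit-testable independently of the HTTP client. A sketch, assuming a 500 ms base and 32 s cap (tuning values are illustrative; the real client should also add jitter and a total retry budget):

```rust
use std::time::Duration;

/// Exponential backoff with a cap, for transient 429/503 responses
/// from Docker Hub: 0.5s, 1s, 2s, ... doubling up to 32s.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms = 500u64;
    let delay = base_ms.saturating_mul(1u64 << attempt.min(6));
    Duration::from_millis(delay.min(32_000))
}

/// Retry only rate-limit (429) and service-unavailable (503) statuses,
/// and only while the attempt budget lasts. 4xx client errors such as
/// 404 must fail immediately.
fn should_retry(status: u16, attempt: u32, max_attempts: u32) -> bool {
    attempt < max_attempts && matches!(status, 429 | 503)
}
```

The caller loops: on a retryable status it sleeps `backoff_delay(attempt)` and re-issues the request; anything else surfaces as a typed `OciError` variant.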
Current: ADR-009 chose a Tokio userspace proxy for port mapping. This is correct and should stay.
Improvement (Phase 2): Add per-port-mapping metrics on proxy throughput. Add configurable `SO_REUSEPORT` to allow zero-downtime port re-binding during VM restart.
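For the throughput metrics, lock-free atomic counters keyed by port mapping fit the proxy's copy loops well: each loop bumps its own counter, and the metrics endpoint reads without blocking any data path. A sketch with hypothetical names (`ProxyMetrics` is not an existing type):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Per-port-mapping byte counters for the Tokio userspace proxy.
/// Each proxy copy loop holds an Arc to its counter and bumps it as
/// bytes flow; readers observe totals without taking a lock.
#[derive(Default)]
struct ProxyMetrics {
    bytes: HashMap<String, Arc<AtomicU64>>, // key: e.g. "host:8080->vm:80"
}

impl ProxyMetrics {
    /// Get (or lazily create) the counter for one port mapping.
    fn counter(&mut self, mapping: &str) -> Arc<AtomicU64> {
        self.bytes
            .entry(mapping.to_string())
            .or_insert_with(|| Arc::new(AtomicU64::new(0)))
            .clone()
    }

    /// Total bytes proxied for a mapping so far (0 if unknown).
    fn throughput(&self, mapping: &str) -> u64 {
        self.bytes.get(mapping).map_or(0, |c| c.load(Ordering::Relaxed))
    }
}
```

`Relaxed` ordering is sufficient here because the counters carry no synchronization responsibility; they are monotonic tallies sampled by the metrics endpoint.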
| Version | Focus | Duration | Key Deliverable |
|---|---|---|---|
| v1.1 | Current state | — | Baseline |
| v1.2 | Critical fixes | 8 weeks | CMD/ENTRYPOINT works, virtiofsd bundled, constrained privileges, WAL, Compose v3 |
| v1.3 | Hardening | 6 weeks | Storage/net refactor to Rust-native crates, chaos tests, 98% Docker Hub compat |
| v1.5 | Power features | 12 weeks | WireGuard Swarm, Raft consensus, Teleport, gRPC, Ignite Hub, VMIF, Studio v2 |
| v2.0 | Revolutionary | 16 weeks | TimeMachine, Hibernation, vk8s CRI, Trusted Boot, ignite-agent, SDK |