This playbook defines a fast, repeatable recovery process for HomeDir when a VPS is lost and a replacement VM is already provisioned.
- Rebuild runtime from:
  - GitHub repository (`platform/` automation assets)
  - Quay image tags
  - encrypted backup artifact of host data directory
  - secure env/secrets file from secret storage
- Keep secrets out of git and avoid plaintext backup handling.
For attack-time first-level containment (before full DR), use:

Gaps in the previous process and the improvements that address them:

- Gap: bootstrap steps were documented but mostly manual.
  - Improvement: `platform/scripts/homedir-dr-recover.sh` automates install + restore + deploy + healthcheck.
- Gap: backup generation was not standardized with encryption/integrity metadata.
  - Improvement: `platform/scripts/homedir-dr-backup.sh` creates encrypted artifacts + sha256 + metadata.
- Gap: restore path traversal/symlink protections were not available at host level.
  - Improvement: `platform/scripts/homedir-dr-restore.py` performs safe archive extraction.
- Gap: recovery sequence was split across scripts/runbooks.
  - Improvement: one orchestrator script drives the end-to-end DR flow.
- Gap: post-recovery host hardening checks were inconsistent.
  - Improvement: `platform/scripts/homedir-security-hardening.sh` adds repeatable baseline `audit` and `apply` controls.
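
The path traversal and symlink protections listed above can be illustrated with a small pre-extraction check. This is not the logic of `homedir-dr-restore.py`; it is a minimal shell approximation under stated assumptions. The function names `check_member_names` and `check_member_types` are hypothetical, and the type check assumes GNU `tar`, whose verbose listing starts each line with a type character (`l` for symlinks, `h` for hard links).

```shell
#!/bin/sh
# Hypothetical helper: reject member names that could escape the restore
# directory. Reads one archive member name per line on stdin.
check_member_names() {
  while IFS= read -r name; do
    case "$name" in
      /*|..|../*|*/..|*/../*)
        echo "unsafe member: $name" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Hypothetical helper: reject symlink and hard-link members, based on the
# type character at the start of each `tar -tv` line (GNU tar assumed).
check_member_types() {
  while IFS= read -r line; do
    case "$line" in
      l*|h*)
        echo "link member rejected: $line" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Usage sketch (archive path and restore_dir are placeholders):
#   tar -tzf backup.tar.gz  | check_member_names &&
#   tar -tvzf backup.tar.gz | check_member_types &&
#   tar -xzf backup.tar.gz -C "$restore_dir" --no-same-owner
```

Checking the listing before extracting means a hostile archive is rejected before any file touches the host data directory.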
- Secrets:
  - `homedir.env` must come from a secure secret store (never from git).
  - Recommended format for transport: `*.age`, decrypted only in-memory/on-host temp files.
- Backups:
  - Recommended format: `*.tar.gz.age` plus `*.sha256`.
  - Validate checksum before restore.
- Webhook/deploy channel:
  - Require signed webhook requests (`WEBHOOK_REQUIRE_SIGNATURE=true`).
  - Keep webhook status endpoint token protected (`WEBHOOK_STATUS_TOKEN`).
- Local safety:
  - DR scripts run with `umask 077`.
  - Temporary decrypted files are removed.
  - Existing data dir is preserved as `*.pre-dr-<timestamp>` before replacement.
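
The local safety controls above might look roughly like the following in a shell script. This is a sketch, not the code of the DR scripts: the helper names `with_private_workdir` and `preserve_existing_dir` are hypothetical, and the timestamp format used for the `.pre-dr-<timestamp>` suffix is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a private scratch directory that is
# removed when the command finishes, whether it succeeds or fails.
with_private_workdir() {
  (
    umask 077                          # new files 0600, new directories 0700
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$workdir"' EXIT      # cleanup runs when the subshell exits
    "$@" "$workdir"                    # scratch path is passed as the last argument
  )
}

# Hypothetical helper: move an existing data directory aside before restore.
# The timestamp format is an assumption, not taken from the DR scripts.
preserve_existing_dir() {
  data_dir=$1
  if [ -d "$data_dir" ]; then
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    mv "$data_dir" "${data_dir}.pre-dr-${stamp}"
  fi
}
```

Running the sensitive work inside a subshell keeps the restrictive `umask` and the cleanup trap scoped to that work, so the calling script's own settings are not changed.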

```
/usr/local/bin/homedir-dr-backup.sh \
  --age-recipient <AGE_PUBLIC_RECIPIENT> \
  --retain-count 28 \
  --output-dir /var/backups/homedir-dr
```

Outputs:

- encrypted archive (`.tar.gz.age`)
- integrity file (`.sha256`)
- metadata file (`.metadata.json`)
- automatic pruning of older backup sets beyond `--retain-count` (`28` by default)
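
Retention pruning of the kind described in the last bullet can be sketched as follows. This is not the implementation in `homedir-dr-backup.sh`. The function name `prune_backup_sets` is hypothetical, the archive naming follows the `homedir-data-<timestamp>.tar.gz.age` pattern used elsewhere in this playbook, and the exact name of the metadata companion file is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest $2 backup sets in directory $1.
# Relies on timestamped names sorting chronologically in glob order.
prune_backup_sets() {
  dir=$1
  keep=$2
  total=0
  for archive in "$dir"/homedir-data-*.tar.gz.age; do
    if [ -e "$archive" ]; then
      total=$((total + 1))
    fi
  done
  excess=$((total - keep))
  # Glob order is oldest first, so the first $excess archives are removed.
  for archive in "$dir"/homedir-data-*.tar.gz.age; do
    [ "$excess" -gt 0 ] || break
    [ -e "$archive" ] || continue
    # Metadata file name is an assumption about how companions are named.
    rm -f "$archive" "$archive.sha256" "${archive%.tar.gz.age}.metadata.json"
    excess=$((excess - 1))
  done
}
```

For example, `prune_backup_sets /var/backups/homedir-dr 28` would leave the 28 most recent sets in place.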

```
/usr/local/bin/homedir-dr-recover.sh \
  --env-file /secure/homedir.env.age \
  --age-identity /root/.config/age/keys.txt \
  --backup-file /secure/homedir-data-YYYYMMDDTHHMMSSZ.tar.gz.age \
  --backup-sha256-file /secure/homedir-data-YYYYMMDDTHHMMSSZ.tar.gz.age.sha256 \
  --apply-hardening
```

Optional flags:

- `--repo-ref <tag|branch>` to recover from a specific release reference.
- `--deploy-tag vX.Y.Z` to force a specific Quay tag.
- `--skip-nginx` if nginx is managed externally.
- `--enable-webhook` to restore the webhook listener service.
- `--skip-data-restore` for stateless recovery drills.
- `--apply-hardening` to execute the VPS/app hardening baseline right after recovery.
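
The checksum validation that `--backup-sha256-file` enables can be sketched as below. This is an illustration rather than the script's own code: the function name `verify_sha256` is hypothetical, and the sketch assumes the `.sha256` file uses the `sha256sum` output format, with the hex digest as the first field.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 digest with the digest stored
# in a companion file (assumed to be in `sha256sum` output format).
verify_sha256() {
  file=$1
  sum_file=$2
  # The first whitespace-separated field of the first line is the expected digest.
  expected=$(awk 'NR == 1 { print $1 }' "$sum_file")
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  if [ -z "$expected" ] || [ "$expected" != "$actual" ]; then
    echo "checksum mismatch for $file" >&2
    return 1
  fi
}
```

Comparing digests directly, instead of running `sha256sum -c`, avoids depending on the file path recorded inside the `.sha256` file, which may differ between the backup host and the recovery host.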

`homedir-dr-recover.sh` performs the following steps:

- Clones the repository from GitHub (`--repo-url`, `--repo-ref`).
- Installs platform scripts into `/usr/local/bin`.
- Installs systemd units into `/etc/systemd/system` and reloads the daemon.
- Optionally installs nginx configs and the maintenance page.
- Installs the validated env file to `/etc/homedir.env` with mode `0600`.
- Restores the backup archive safely to the host data directory.
- Enables `homedir-auto-deploy.timer`.
- Deploys the requested tag or the latest semver tag from Quay.
- Optionally applies baseline hardening (`homedir-security-hardening.sh apply`).
- Waits for local healthcheck success (`/q/health`).
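
Selecting the latest semver tag, as in the deploy step above, can be sketched as a filter over a list of tag names. How the recovery script obtains the tag list from Quay is not covered here, and the function name `latest_semver_tag` is hypothetical. The sketch assumes release tags follow the `vX.Y.Z` form and ignores anything else, including pre-release suffixes.

```shell
#!/bin/sh
# Hypothetical helper: read tag names on stdin (one per line) and print the
# highest vX.Y.Z tag. Prints nothing when no tag matches.
latest_semver_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' |
    sed 's/^v//' |
    sort -t. -k1,1n -k2,2n -k3,3n |   # numeric sort on major, minor, patch
    tail -n 1 |
    sed 's/^/v/'
}

# Usage sketch:
#   printf 'v1.2.3\nv1.10.0\nlatest\n' | latest_semver_tag
```

Sorting each field numerically matters because a plain text sort would rank `v1.9.9` above `v1.10.0`.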

To rehearse the full recovery path:

- Run a backup with `homedir-dr-backup.sh`.
- Start a fresh pre-provisioned VM.
- Execute `homedir-dr-recover.sh` with the encrypted env file and backup.
- Verify:
  - `/q/health` returns 200
  - `/`, `/comunidad`, `/eventos`, `/proyectos` return 200
  - `homedir-cfp-traffic-guard.timer` is enabled and active
  - `/usr/local/bin/homedir-security-hardening.sh audit` reports zero FAIL checks
  - admin backup page can list/download/restore as expected
- Record elapsed recovery time and issues.
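
The HTTP checks and the elapsed-time measurement can be scripted along these lines. This is a sketch with hypothetical helper names (`wait_until_ok`, `http_ok`); the base URL and port in the usage comment are assumptions and should be replaced with the values for the recovered host.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command until it succeeds or the
# attempts run out. Arguments: attempts, delay in seconds, probe command.
wait_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical probe: succeeds only when the URL answers with HTTP 200.
http_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}

# Usage sketch (base URL and port are assumptions):
#   start=$(date +%s)
#   wait_until_ok 30 2 http_ok "http://127.0.0.1:8080/q/health" || exit 1
#   for path in / /comunidad /eventos /proyectos; do
#     http_ok "http://127.0.0.1:8080$path" || echo "FAIL $path"
#   done
#   echo "elapsed: $(( $(date +%s) - start ))s"
```

Keeping the probe separate from the retry loop lets the same loop wait on other conditions, such as a systemd timer becoming active.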