# Migration runbook — nullstone → cobblestone

Goal: relocate the Docker stack (~28 containers, ~227 GiB state) from **nullstone** (Debian 13, 192.168.0.100, AMD Ryzen 5 2600X / 32 GiB / 477 GiB NVMe, no LUKS) to **cobblestone** (Debian, fresh, LAN, hardware TBD by operator), and close audit regression **F4 (no LUKS at rest)** in the same window.

This runbook is read-only on both hosts until cutover (section 4). Sections 1–3 are inventory + planning; section 4 is the destructive cutover; sections 5–7 are follow-through.

## Things we don't know about cobblestone yet — operator to fill in

| Question | Why it matters | Default if unset |
|---|---|---|
| CPU model / cores / threads | Sizing for parallel postgres + Ollama + MC | Assume ≥ Ryzen 5 2600X parity |
| RAM | nullstone's 32 GiB peaks at ~50 % utilization; less RAM = trim MC + Ollama | Require ≥ 32 GiB |
| Storage layout (LVM? ZFS? plain?) | Decides LUKS strategy in 3a | Assume single NVMe, plain ext4 |
| GPU present (any) | Ollama / vLLM / Misskey thumb GPU helpers | Assume none, leave Ollama on friend RTX 4080 |
| LUKS already enabled at install? | If no → reinstall window or LUKS-on-file fallback | Assume **no** (act accordingly) |
| Static IP allocated? | Cutover plan needs a parking IP | Assume DHCP, target `.101` for cutover |
| DE installed? | Strip vs keep debate | Confirmed installed; default = strip |
| User account name + uid | Bind-mount permissions on /home/docker | Assume `user`, uid 1000 (mirror nullstone) |

Update this table before running section 3.

---

## 1 — Pre-migration audit (run on nullstone)

All commands read-only. SSH as `user@192.168.0.100` (per `feedback_nullstone_ssh_user.md` — `admin@` is rejected).

### 1.1 Container inventory

```bash
ssh user@192.168.0.100 'docker ps -a --format "{{json .}}"' \
  > nullstone-containers-$(date +%F).jsonl
ssh user@192.168.0.100 'docker inspect $(docker ps -aq)' \
  > nullstone-inspect-$(date +%F).json
```

Parse for `Names`, `Image`, `Mounts[].Source`, `NetworkSettings.Networks`, `HostConfig.RestartPolicy`, `Config.Labels` (Traefik routers). A jq sketch for this parse step sits at the end of this section.

### 1.2 Volumes (size estimate)

```bash
ssh user@192.168.0.100 'docker volume ls --format "{{.Name}}"' \
  | xargs -I {} ssh user@192.168.0.100 \
      "docker run --rm -v {}:/v alpine du -sh /v 2>/dev/null | sed 's|/v|{}|'"
```

Cross-reference with `/home/user/docker-data/100000.100000/volumes/` (userns-remapped path) for per-volume bytes.

### 1.3 Network

```bash
ssh user@192.168.0.100 'docker network ls; \
  ss -tlnp 2>/dev/null | grep LISTEN; \
  iptables-save 2>/dev/null; nft list ruleset 2>/dev/null'
```

Capture Traefik vhosts:

```bash
ssh user@192.168.0.100 'cd /opt/docker/traefik && \
  ls dynamic/; cat dynamic/*.yml | grep -E "rule:|sourceRange:"'
```

### 1.4 Cron + scheduled tasks

```bash
ssh user@192.168.0.100 'sudo cat /etc/crontab /etc/cron.d/* 2>/dev/null; \
  for u in $(cut -d: -f1 /etc/passwd); do \
    crontab -u $u -l 2>/dev/null && echo "(user $u)"; done'
```

Known: `/etc/cron.d/docker-backup` runs `/opt/docker/backup.sh` daily at 02:00 — **broken** (F-backup-1, fix in section 5).

### 1.5 Systemd

```bash
ssh user@192.168.0.100 'systemctl list-unit-files \
  --state=enabled --type=service --no-pager'
```

Watch for: `docker.service`, `tailscaled.service`, `ollama.service` (Ollama runs on host, not in Docker), `chrony.service`, `ssh.service`.
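To turn the 1.1 inspect dump into a per-container worksheet (name, image, bind sources, networks, restart policy), a jq sketch — field paths follow `docker inspect` output; the TSV filename is a suggestion:

```bash
# one row per container: name, image, bind sources, networks, restart policy
jq -r '.[] | [
    .Name,
    .Config.Image,
    ([.Mounts[].Source] | join(",")),
    (.NetworkSettings.Networks | keys | join(",")),
    .HostConfig.RestartPolicy.Name
  ] | @tsv' nullstone-inspect-$(date +%F).json \
  > nullstone-matrix.tsv
```

Sorting by mount source is a quick way to spot bind mounts living outside `/opt/docker` and `/home/docker` before the 4.1 tar is scoped.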
### 1.6 Disk + memory + cpu baseline

```bash
ssh user@192.168.0.100 'df -hT; \
  sudo du -sh /home/docker/* /opt/docker/* /opt/backups 2>/dev/null; \
  free -h; lscpu | head -20; nproc'
```

Reference (2026-05-06 spot check): `/` 30 G (37 %) · `/var` 12 G (17 %) · `/home` 399 G (60 %, 226 G used). Most state is on `/home`.

### 1.7 Daemon config

```bash
ssh user@192.168.0.100 'cat /etc/docker/daemon.json /etc/subuid /etc/subgid; \
  sudo cat /etc/systemd/system/docker.service.d/override.conf 2>/dev/null'
```

Known good (carry forward except possibly userns-remap, see 3c):

```json
{
  "log-driver": "json-file",
  "log-opts": {"max-size": "10m", "max-file": "3"},
  "live-restore": true,
  "icc": false,
  "userns-remap": "default",
  "default-address-pools": [{"base": "172.20.0.0/16", "size": 24}],
  "storage-driver": "overlay2",
  "no-new-privileges": true
}
```

---

## 2 — Secret + state catalog

Anything in these tables that is **lost** or **corrupted** during transfer forces re-issuance, re-pinning, or re-handshake. Grouped by criticality.

### Tier 0 — irreplaceable (lose this and external systems break)

| Path | Bytes (est.) | Restore cost if lost |
|---|---|---|
| `/opt/docker/step-ca/data/secrets/` + `/opt/docker/step-ca/.env` | < 1 MiB | Re-issue every internal cert; reinstall `veilor-root.crt` on every device that uses `*.veilor` / internal-CA chains. Hard. |
| `/opt/docker/traefik/data/acme.json` (LE prod) | < 1 MiB | Hits LE rate-limit (5 dupe certs/wk per FQDN, 50 certs/wk per registered domain). Could lock cert issuance for a full week. |
| `/opt/docker/traefik/data/acme-internal.json` (step-ca chain) | < 1 MiB | Step-ca re-issues fast, but every leaf reissue invalidates pinned trust anchors. |
| `/opt/docker/headscale/config/private.key` + `/opt/docker/headscale/data/db.sqlite` | < 50 MiB | Loss = every node re-enrolls; preauthkeys, routes, ACLs reset. Friend GPU node identity churn. |
| `/etc/ssh/ssh_host_*` | < 1 MiB | Either copy → keep TOFU pinning intact, OR rotate → all clients hit "key changed" warning (acceptable but noisy). |
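Everything in the Tier 0 table fits comfortably on a USB stick, so it is worth its own encrypted bundle on top of the 4.1 snapshot. A sketch assuming GnuPG with a symmetric passphrase (the passphrase then lives in the password manager); produce it during the 4.1 window:

```bash
# run on nullstone — bundle every Tier 0 path into one encrypted archive
sudo tar -czf - \
  /opt/docker/step-ca/data/secrets /opt/docker/step-ca/.env \
  /opt/docker/traefik/data/acme.json \
  /opt/docker/traefik/data/acme-internal.json \
  /opt/docker/headscale/config/private.key \
  /opt/docker/headscale/data/db.sqlite \
  /etc/ssh/ssh_host_* \
  | gpg --symmetric --cipher-algo AES256 -o tier0-$(date +%F).tar.gz.gpg
```

Decryption is `gpg -d tier0-<date>.tar.gz.gpg | tar -xz` — test it once on another machine before trusting the copy.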
### Tier 1 — application secrets (loss → password reset cascade)

| Path | Bytes (est.) | Notes |
|---|---|---|
| `/opt/docker/forgejo/data/gitea/conf/app.ini` (note: file is `app.ini` under `gitea/conf/` even on Forgejo) | ~10 KiB | `SECRET_KEY`, `INTERNAL_TOKEN`, `JWT_SECRET`, `LFS_JWT_SECRET`, OAuth client secrets. |
| `/opt/docker/authentik/.env` + authentik PG dump | tens of MiB | `AUTHENTIK_SECRET_KEY`, `PG_PASS`. Any service trusting Authentik OIDC needs `client_secret` re-handover. |
| `/opt/docker/misskey/.env` + misskey PG dump | < 1 MiB env | `id`, `db.user/pass`, `redis.pass`, master key. |
| `/opt/docker/n8n/.env` + n8n PG dump | < 1 MiB env | Encryption key for credentials at rest — **lose this and stored creds inside n8n flows are unrecoverable**. |
| `/opt/docker/rocketchat/.env` + Mongo dump (currently stopped — see 4.1) | < 1 MiB env | First-admin still unclaimed (audit risk item). |
| `/opt/docker/tuwunel*/etc/tuwunel.toml` | < 1 MiB | Server signing key seed; lose = federation re-onboard from zero. |
| `/opt/docker/livekit/livekit.yaml` | < 1 KiB | `keys:` map (api-key→secret); JWT minter (`lk-jwt-service`) shares this. |
| `/opt/docker/pihole/etc-pihole/` | ~50 MiB | Adlists + custom DNS; rebuildable in 30 min if lost. |
| Gandi PAT (`GANDIV5_PERSONAL_ACCESS_TOKEN` in `/opt/docker/traefik/.env`) | < 1 KiB | Re-issuable from Gandi UI; LiveDNS-only scope (per `reference_gandi_api.md`). |
| Tailscale auth keys (Headscale) | — | Regenerate via `headscale preauthkeys create`; OK to regenerate. |

### Tier 2 — bulk data (large, but reproducible OR low-stakes)

| Path | Bytes (est.) | Notes |
|---|---|---|
| Misskey `/files/` (S3-style local) | tens of GiB | User uploads — irreplaceable to users. Dedup-friendly. |
| Forgejo `/home/docker/forgejo/data/git/` | ~5 GiB now | Git repos; also mirrored to GH per `project_forgejo_nullstone.md`, so partial DR exists. |
| `dl-veilor` static files | ~1 GiB | Public ISO downloads; rebuildable from veilor-os pipeline. |
| n8n flows (in `n8n_n8n_data`) | < 1 GiB | Encrypted with key from Tier 1; export JSON via UI as belt-and-braces. |
| Minecraft world (`/home/docker/minecraft/data/`) | ~10–30 GiB | Players will riot if lost. |
| Ollama models (`/home/user/models/ollama/`) | ~17 GiB | Re-downloadable from registry; not blocking. |
| Postgres dumps (authentik, misskey-db, n8n-postgres) | — | Covered by `pg_dumpall` in 4.1. |
| MongoDB dump (rocketchat-mongodb) | — | Covered by `mongodump` in 4.1. Container is **stopped** today — start, dump, stop. |

### Tier 3 — config-as-code (safely re-deployable from `~/ai-lab/_github/`)

- All `/opt/docker/*/docker-compose.yml` — committed under `~/ai-lab/_github/infra/repos/` and `~/ai-lab/nullstone-server/`.
- Traefik `dynamic/*.yml` middleware files.
- Treat the repo as authoritative; copy from repo to cobblestone, not from nullstone. Diff old-compose vs repo-compose during section 3d to catch any uncommitted drift.

---

## 3 — Cobblestone install plan

### 3a — OS layer

Verify base:

```bash
ssh user@cobblestone 'cat /etc/debian_version; uname -r; lsb_release -a'
```

**LUKS2 (mandatory — closes F4):**

- **Path A (preferred):** reinstall with full-disk LUKS2 from the Debian installer (`/`, `/home`, swap all on encrypted PVs). Set up TPM2 unattended unlock post-install:

  ```bash
  systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+7 /dev/nvmeXnYpZ
  ```

  PCR 0+7 binds to firmware + secure-boot state; unlock breaks after a firmware update → fall back to the passphrase.

- **Path B (fallback if reinstall blocked):** LUKS-on-file loopback for the high-value subset only:
  - `/opt/docker/step-ca/`
  - `/opt/docker/traefik/data/acme*.json`
  - `/opt/docker/headscale/`
  - postgres data dirs
  - Mongo keyfile volume

  This is **strictly worse** than Path A (rest of disk still cleartext, including misskey uploads and forgejo repos), but it closes the highest-value subset. Document as accepted risk.

Hostname + base packages:

```bash
sudo hostnamectl set-hostname cobblestone
sudo apt update && sudo apt install -y \
  curl ca-certificates gnupg jq ufw fail2ban chrony \
  rsync restic tmux htop iotop ncdu
```

**DE strip vs keep — recommendation: STRIP.** Cost of keeping: ~500 MiB RAM, ~5 GiB disk, larger attack surface (CUPS, avahi, polkit, GUI daemons on localhost). Benefit: local browser for vhost testing, on-keyboard recovery if SSH wedges.

- **Default (strip):** `sudo apt purge '*-desktop' '*xorg*' lightdm sddm gdm3 'plymouth*' 'libreoffice-*' && sudo apt autoremove --purge`. Install Cockpit for web admin behind Traefik + `no-guest@file`.
- **Keep:** lock SDDM/GDM local-only via PAM, disable XDMCP, mask `cups-browsed`. No auto-login.

Operator picks; document choice in SYSTEM.md.

### 3b — Network

**IP allocation during cutover** — use `192.168.0.101` for cobblestone while nullstone stays on `.100`. Flip DNS / port-forwards last (section 4.5). Avoids ARP collisions and keeps rollback trivial.
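A minimal way to park cobblestone on `.101` — a sketch assuming Debian's default ifupdown; the interface name `enp1s0` and the `.1` gateway are placeholders (confirm with `ip -br a` and `ip route`). If the DE and NetworkManager are kept, configure the static address there instead:

```bash
sudo tee /etc/network/interfaces.d/lan <<'EOF'
auto enp1s0
iface enp1s0 inet static
    address 192.168.0.101/24
    gateway 192.168.0.1
EOF
sudo systemctl restart networking
```

For the Path B IP takeover in 4.5, the same file is edited to `.100` and networking restarted — which is what keeps rollback a two-line operation.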
**nftables ruleset** (mirror nullstone pattern — read live ruleset off nullstone in 1.3, replay on cobblestone):

```bash
sudo systemctl enable --now nftables
# Drop in /etc/nftables.conf with:
# - default policy drop on input
# - accept established/related
# - accept lo
# - accept 22 (SSH) from LAN + tailnet
# - accept 80/443 (Traefik) from anywhere
# - accept 222 (Forgejo SSH) from LAN + tailnet
# - accept 25565 (Minecraft) from anywhere
# - log+drop everything else
```

**Forwarding sysctls:** the audit reports nullstone has `net.ipv4.ip_forward=1` (F30). That was an *unintended carryover* from a Tailscale subnet-router experiment. **Do NOT** copy `/etc/sysctl.d/` from nullstone wholesale. Instead, set explicitly:

```bash
sudo tee /etc/sysctl.d/99-cobblestone.conf <<'EOF'
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
EOF
sudo sysctl --system
```

If a Headscale or Tailscale subnet-router is wired later, re-enable `ip_forward` with an explicit comment + audit note.

**Tailscale + Headscale node identity:**

- Cleanest path: re-enroll cobblestone from scratch. New node, new node-key; list `cobblestone` separately from `nullstone` in Headscale during cutover week.
- Alternative: copy `/var/lib/tailscale/` from nullstone → cobblestone to inherit the existing identity. Saves one ACL update but conflates audit history. Not recommended.

### 3c — Docker

Install via official repo:

```bash
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/debian $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update && sudo apt install -y \
  docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

**`/etc/docker/daemon.json` — userns-remap decision.** Two paths; operator decides. Document choice in SYSTEM.md.

**Path 1 — DROP userns-remap (recommended):** same JSON as nullstone minus the `userns-remap` line (full file sketched at the end of this subsection).

- Pros: no more `chown 101000` dance; the nsenter trick (`feedback_docker_sudo_bypass.md`) drops the `--userns=host` flag; the Mongo keyfile pattern from `project_nullstone_docker_userns.md` becomes unnecessary; `docker exec` UIDs match the host 1:1.
- Cons: container root → host uid 0. Compensated by `no-new-privileges`, `icc=false`, per-compose CAP drops, read-only root FS where compatible.

Net: small regression in defense-in-depth, large workflow simplification.

**Path 2 — KEEP userns-remap:** carry `/etc/subuid` + `/etc/subgid` identically (`user:100000:65536`). Existing on-disk ownership at uid `101000` transfers without rechown. Cost: persisting the daily friction the operator has been hitting for months.

**Default: Path 1.** If chosen, after the 4.2 transfer:

```bash
sudo chown -R user:user /home/docker /opt/docker
# Then per-service to the container uid (forgejo 1000, postgres 999,
# mongo 999, traefik 0).
```

Networks (must exist before Traefik comes up):

```bash
docker network create proxy
docker network create socket-proxy-net
docker network create misskey-frontend
```
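For the Path 1 daemon.json: a sketch of the resulting file — the 1.7 baseline with the `userns-remap` line removed, written with the same `tee` pattern used for the sysctls above. Re-check against the live 1.7 capture before applying:

```bash
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {"max-size": "10m", "max-file": "3"},
  "live-restore": true,
  "icc": false,
  "default-address-pools": [{"base": "172.20.0.0/16", "size": 24}],
  "storage-driver": "overlay2",
  "no-new-privileges": true
}
EOF
sudo systemctl restart docker
docker info --format '{{.SecurityOptions}}'   # should no longer list userns
```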
### 3d — Service redeploy order

Topological. Each step depends only on its predecessors. Verification command and rollback at each stage; a gating-helper sketch follows the table.

| # | Stack | Depends on | Verify | Rollback |
|---|---|---|---|---|
| 1 | networks (`proxy`, `socket-proxy-net`, `misskey-frontend`) | docker daemon | `docker network ls` | `docker network rm` |
| 2 | `socket-proxy` | network `socket-proxy-net` | `docker logs socket-proxy` shows API filter active | down compose |
| 3 | `traefik` | socket-proxy + acme.json/acme-internal.json carryover + Gandi PAT in .env | `curl -k https://sys.s8n.ru` returns dashboard auth challenge; `docker logs traefik` shows resolver init OK; cert files repopulate without LE call (acme.json reuse) | down compose; acme.json restore from backup |
| 4 | `step-ca` | traefik (for ACME-back) | `docker exec step-ca step ca health`; Traefik internal-CA resolver issues a cert against `https://step-ca:9000/acme/acme/directory` | down compose; revert traefik resolver config |
| 5 | `headscale` | traefik | `curl https://hs.s8n.ru/health`; `docker exec headscale headscale nodes list` shows existing nodes (db.sqlite carryover) | down compose; restore db.sqlite snapshot |
| 6 | authentik (`postgres → redis → server → worker`) | traefik | `curl https://auth.s8n.ru/-/health/ready/`; OIDC discovery doc loads | per-component down |
| 7 | `forgejo` | traefik (+ optional authentik, currently unwired) | `curl https://git.s8n.ru/api/v1/version`; `git clone ssh://git@cobblestone:222/...` | down compose; data dir tar-revert |
| 8 | misskey (`db → redis → misskey → x-source`) | traefik, network `misskey-frontend` | `curl https://x.veilor/api/meta` returns JSON; signup page renders | down compose; pg dump restore |
| 9 | `tuwunel` + `tuwunel-txt` | traefik | `curl https://matrix.veilor.uk/_matrix/federation/v1/version` and `https://mx.s8n.ru/_matrix/federation/v1/version` | down compose; data tar-revert |
| 10 | `cinny-txt` + `commet-web` + `signup-page` + `signup-txt` | tuwunel reachable, traefik | `curl -I https://txt.s8n.ru` 200; static assets 200 | down compose |
| 11 | `livekit-server` + `lk-jwt-service` | traefik (TURN over HTTPS) | `wscat -c wss://livekit.veilor.uk`; jwt service `/healthz` | down compose |
| 12 | n8n (`postgres → n8n`) | traefik, restored encryption key | `curl https://n8n.s8n.ru/healthz`; UI loads with existing flows | pg dump restore |
| 13 | `pihole` | traefik | `dig @cobblestone s8n.ru +short` answers; admin UI auth | down compose |
| 14 | `forgejo-runner` | forgejo (#7) reachable on internal name | `docker logs forgejo-runner` shows `Runner registered successfully` | down compose; regenerate token via `forgejo actions generate-runner-token` |
| 15 | `minecraft-mc` | traefik (only for filebrowser-mc), router port-forward 25565 | `mcstatus mc.racked.ru status` (or `nc -zv cobblestone 25565`) | down compose; world tar-revert |
| 16 | `dl-veilor` + `filebrowser-mc` | traefik | `curl https://dl.veilor.org/v0.2.0/veilor-root.crt` | down compose |
| 17 | `anythingllm` | traefik **with `no-guest@file` middleware applied** OR LAN-only bind — must NOT bring up like nullstone (port 3001 publicly exposed, audit F-anythingllm-1) | `curl -I -H 'Host: ai.s8n.ru' https://cobblestone` from off-LAN must 403 | down compose |
| 18 | RocketChat (`mongodb → rocketchat`) | **operator decision** — currently stopped on nullstone; if not retired, restore from mongodump produced in 4.1 | `curl https://rc.s8n.ru/api/info`; first-admin claim if still pending | leave stopped (matches today's state) |
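Section 4.3 walks this table strictly top to bottom. A minimal gating helper for that walk — a sketch, not part of the repo; the two example rows reuse the Verify column above (the Traefik check resolves to localhost because DNS still points at nullstone at this stage, and 401 is the expected dashboard auth challenge):

```bash
#!/usr/bin/env bash
set -euo pipefail

# deploy_step STACK_DIR VERIFY_CMD — bring up one stack under /opt/docker,
# run its verify command, and abort the walk on first failure (don't batch).
deploy_step() {
  local dir="$1" verify="$2"
  (cd "/opt/docker/$dir" && docker compose up -d)
  sleep 10   # crude settle time; tune per stack
  if ! bash -c "$verify"; then
    echo "VERIFY FAILED: $dir — stop here, see the rollback column" >&2
    exit 1
  fi
  echo "OK: $dir"
}

deploy_step traefik \
  'curl -ksI --resolve sys.s8n.ru:443:127.0.0.1 https://sys.s8n.ru | grep -q 401'
deploy_step step-ca 'docker exec step-ca step ca health'
# ...extend one line per table row, in order.
```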
---

## 4 — Cutover sequence

### 4.1 — Snapshot state on nullstone

```bash
NS=user@192.168.0.100
TS=$(date +%F-%H%M)
DEST=/opt/snap/$TS
mkdir -p $DEST   # local — dumps and tarball land on the operator machine

# Postgres dumps
for pg in authentik-postgres misskey-db n8n-postgres-1; do
  ssh $NS "docker exec $pg pg_dumpall -U postgres" \
    | gzip > $DEST/$pg.sql.gz
done

# Mongo (start, dump, stop again — currently stopped per audit)
ssh $NS 'cd /opt/docker/rocketchat && docker compose up -d rocketchat-mongodb && sleep 15'
ssh $NS 'docker exec rocketchat-mongodb mongodump \
  --username root \
  --password "$(grep MONGO_INITDB_ROOT_PASSWORD /opt/docker/rocketchat/.env | cut -d= -f2)" \
  --authenticationDatabase admin --archive' \
  | gzip > $DEST/rocketchat.archive.gz
ssh $NS 'cd /opt/docker/rocketchat && docker compose stop rocketchat-mongodb'

# Forgejo full dump (covers DB + repos + LFS + attachments)
ssh $NS 'docker exec -u 1000 forgejo \
  forgejo dump --type tar.zst --file /tmp/forgejo-dump.tar.zst'
# docker cp to stdout emits a tar stream; tar -xO unwraps the single file
ssh $NS 'docker cp forgejo:/tmp/forgejo-dump.tar.zst - | tar -xO' \
  > $DEST/forgejo-dump.tar.zst

# Stop everything before tar (consistency)
ssh $NS 'for d in /opt/docker/*/; do \
  [ -f "$d/docker-compose.yml" ] && \
  (cd "$d" && docker compose down); done'

# Bulk state tar
ssh $NS "sudo tar --acls --xattrs -cpf - /opt/docker /home/docker /opt/backups" \
  | zstd -T0 -19 > $DEST.tar.zst

# Manifest
ssh $NS 'sudo find /opt/docker /home/docker -type f -print0 \
  | sudo xargs -0 sha256sum' > $DEST.sha256
```

Hold the tarball plus dumps in two places: the cobblestone target host and an offline USB. `acme.json` and step-ca secrets get an *additional* armored copy to the password manager (the Tier 0 bundle in section 2 covers this).

### 4.2 — Copy snapshot to cobblestone

After the tarball lands, repopulate cobblestone:

```bash
COBB=user@192.168.0.101
scp $DEST.tar.zst $COBB:/tmp/snap.tar.zst
ssh $COBB 'sudo mkdir -p /opt/docker /home/docker /opt/backups && \
  sudo zstd -d /tmp/snap.tar.zst -o /tmp/snap.tar && \
  sudo tar --acls --xattrs -xpf /tmp/snap.tar -C /'

# If userns-remap dropped (Path 1 in 3c):
ssh $COBB 'sudo chown -R user:user /opt/docker /home/docker'
```

### 4.3 — Bring up services on cobblestone

Walk the section 3d table top to bottom. **Stop and verify** at each row before the next. Don't batch — one bad startup cascades.

For services that store internal hostnames (Tuwunel `server_name`, Headscale `server_url`, Forgejo `ROOT_URL`), the values stay the same because public DNS still resolves to the WAN IP — only the internal LAN target changes. No app config edits needed for cutover.

### 4.4 — Verify per vhost

```bash
for host in sys.s8n.ru git.s8n.ru auth.s8n.ru pihole.s8n.ru \
            signup.txt.s8n.ru hs.s8n.ru rc.s8n.ru n8n.s8n.ru \
            txt.s8n.ru mx.s8n.ru x.veilor matrix.veilor.uk \
            chat.veilor.uk livekit.veilor.uk signup.veilor.uk \
            dl.veilor.org; do
  echo -n "$host: "
  curl --resolve $host:443:192.168.0.101 -sI https://$host | head -1
done
```

Then push key flows:

- `git push nullstone-remote` (alias still works because DNS is unchanged) — Forgejo CI runs.
- Matrix federation: `curl https://federationtester.matrix.org/api/report?server_name=veilor.uk`.
- Misskey signup: hit the invite-gated form, complete a signup, federation test post.
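Before picking a path below: Path A step 1 lowers TTLs a week ahead — confirm the drop actually landed before committing to day-of timing. A sketch (`dig` ships in Debian's `bind9-dnsutils`; list trimmed to public names, since internal-CA hosts like `x.veilor` don't resolve in public DNS):

```bash
for host in sys.s8n.ru git.s8n.ru auth.s8n.ru \
            matrix.veilor.uk livekit.veilor.uk dl.veilor.org; do
  dig +noall +answer "$host" A   # second field is the remaining TTL in seconds
done
```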
### 4.5 — Cutover network

Two paths; operator picks based on appetite.

**Path A — DNS swing** (lower risk, slower propagation):

1. Lower `*.s8n.ru` and `*.veilor*` A-record TTLs to 60 s **a week before** cutover (Gandi UI; can't be done via API per `reference_gandi_api.md`).
2. Day-of: change A records from `82.31.156.86` (assumed unchanged public IP) only if the WAN NAT target has changed (e.g. router port-forwards now point at `.101`). If the WAN IP and port-forwards stay the same and you swap LAN IPs (`.100` → `.101`), no public DNS edit is needed — only edit `/etc/hosts` on internal clients (per `feedback_s8n_hosts_override.md`).

**Path B — IP takeover** (faster, higher rollback friction):

- Bring nullstone down on `.100`, change cobblestone from `.101` → `.100`, restart networking. Public DNS + router port-forwards unchanged. Rollback = swap IPs back.

Update the onyx `/etc/hosts` long pin line **last**:

```
192.168.0. rc.s8n.ru n8n.s8n.ru pihole.s8n.ru sys.s8n.ru \
  mx.s8n.ru txt.s8n.ru signup.txt.s8n.ru git.s8n.ru x.veilor \
  dl.veilor.org
```

### 4.6 — Update memory + ai-lab docs

- `~/ai-lab/CLAUDE.md` — Device Registry: add `cobblestone` row, mark `nullstone` as `decom 2026-MM-DD`.
- `~/ai-lab/SYSTEM.md` — replace nullstone hardware/network blocks with cobblestone equivalents; keep nullstone as "cold spare" until wipe.
- `~/ai-lab/README.md` — device table one-liner.
- `~/ai-lab/security/` — create `cobblestone-server/` folder; first audit due within 7 days of cutover.
- Memory files to update: `project_nullstone_docker_userns.md` (mark **superseded** if userns-remap dropped), `project_forgejo_nullstone.md`, `project_rocketchat_nullstone.md`, `project_tailscale_mesh.md`, `feedback_nullstone_ssh_user.md`, `feedback_s8n_hosts_override.md` (new IP).

### 4.7 — Cold spare + wipe

- Hold nullstone powered-off but cabled, 7 days minimum.
- If no rollback triggered, wipe: full LUKS reformat (or `nvme format -s1` for crypto-erase if the drive supports it), then either donate or repurpose as a cobblestone backup target (Restic destination — closes audit recommendation #6).
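If nullstone becomes the Restic destination (previous bullet, and "Restic + autorestic" in section 5), the bootstrap is short. A sketch — the SFTP transport and the `/opt/restic` repo path are assumptions, and the password-file location is a placeholder:

```bash
# on cobblestone — one-time repo init on the wiped spare
export RESTIC_PASSWORD_FILE=/root/.restic-pass   # passphrase also goes to the password manager
restic -r sftp:user@nullstone:/opt/restic init

# candidate nightly job — wire into cron only after F-backup-1 is fixed (section 5)
restic -r sftp:user@nullstone:/opt/restic backup /opt/docker /home/docker
restic -r sftp:user@nullstone:/opt/restic check
```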
---

## 5 — Post-migration immediate fixes

Carried over from `nullstone-server/audit-report-2026-05-05.md`:

- **F-backup-1 — fix `/opt/docker/backup.sh`:** remove the dead `matrix-postgres` block (Synapse retired); correct the `rocketchat-mongodb` container name; replace the literal `CHANGE_ME_MONGO_ADMIN_PASSWORD` with a read from `/opt/docker/rocketchat/.env`. Verify the next 02:00 run produces non-zero RC + Mongo dumps.
- **no-guest@file ACL:** populate `sourceRange` to cover LAN (`192.168.0.0/24`) + tailnet (`100.64.0.0/10`) + IPv6 equivalents. Verify the XFF chain restores the client IP at the entryPoint level (`forwardedHeaders.trustedIPs`).
- **anythingllm:** front via Traefik with `no-guest@file` OR bind LAN-only. Must not repeat the 0.0.0.0:3001 nullstone state.
- **LUKS:** done at install (3a). Verify via `cryptsetup status` + `systemd-cryptenroll --tpm2-device=list` post-cutover.
- **Restic + autorestic** to B2/Wasabi or to nullstone-as-spare, with a restore drill scheduled.
- **Vaultwarden** to centralize the secrets currently sprayed across `.env` files.
- **Gatus** with cert-expiry checks + ntfy/Matrix alerts.
- **CrowdSec** with the bouncer plugin at Traefik for the public HTTP attack surface.
- **Beszel** for one-pane host metrics.

---

## 6 — Open questions (operator decisions)

| Question | Default if undecided |
|---|---|
| Strip DE on cobblestone? | **Strip + Cockpit.** Easier to defend; remote admin via web UI through Traefik + no-guest@file. |
| userns-remap on cobblestone? | **Off (Path 1 in 3c).** Operator pain outweighs the marginal isolation. Document tradeoff. |
| Move Headscale + step-ca to a $4 VPS? | **Defer (phase 2).** Keep on cobblestone for now; revisit once Restic + Gatus are running. SPOF mitigation is real but adds attack surface; do it once monitoring is in place. |
| RocketChat: bring back up or retire? | **Retire if not used in 30 days.** Currently stopped; first-admin still unclaimed. Mongo dump captured in 4.1, then drop the stack from the cobblestone redeploy. Keeps `rc.s8n.ru` DNS for future revival. |
| Tailscale identity copy vs re-enroll for cobblestone? | **Re-enroll** (cleaner audit trail; Headscale ACLs need a one-line edit). |
| SSH host keys copy vs rotate? | **Copy.** TOFU pinning intact; one less "is this MITM?" prompt for clients. Add rotation to a follow-up cron. |
| Authentik wiring during cutover or after? | **After.** Authentik is currently mostly unwired (audit). Cutover is not the time to add new auth dependencies. |

---

## 7 — Risks (severity-tagged)

- 🔴 **acme.json mishandling = LE rate-limit.** Mitigation: copy `acme.json` + `acme-internal.json` BEFORE bringing up Traefik on cobblestone. Never let cobblestone Traefik issue a fresh batch of certs. Hold a backup of both files in two locations.
- 🔴 **step-ca root key loss = full re-issuance.** Mitigation: triple-copy `/opt/docker/step-ca/.env` + `data/secrets/` (cobblestone, USB, password manager). Test that the encrypted root key decrypts on cobblestone before tearing down nullstone.
- 🔴 **anythingllm reintroduces public 0.0.0.0:3001.** Mitigation: do NOT bring it up before the middleware is in place. Test from an off-LAN IP.
- 🟠 **PostgreSQL major-version skew.** Mitigation: pin the same major on cobblestone (`postgres:16-alpine` already pinned; do NOT use `:latest`). If a major upgrade is desired, do it as a separate step *after* cutover settles, with a fresh pg_dumpall as safety net.
- 🟠 **Headscale node identity churn** if `db.sqlite` is not copied. All nodes (onyx, friend RTX 4080 PC, office) re-enroll. Mitigation: copy `db.sqlite` + `private.key`; verify `headscale nodes list` matches pre-cutover before flipping DNS.
- 🟡 **chrony NTS peers** may need re-trust on the new host (NTS-KE binds to hostname). Mitigation: copy the chrony config verbatim; the first `chronyc tracking` should show a stratum within 5 minutes.
- 🟡 **Authentik OIDC `client_secret`s.** Today: mostly unwired (audit). Risk small. If Forgejo/RC/n8n were wired through Authentik, each `client_secret` would need re-handover. Defer Authentik wiring until post-cutover.
- 🟡 **Misskey AGPL §13 source endpoint** (`x-source`). Per `project_x_misskey_fork.md`, the AGPL link must keep serving source — and per the same memo, mute is acceptable for short windows. Cutover downtime budget: **≤ 2 h**. If exceeded, post a banner on `x.veilor` linking to `https://git.s8n.ru/s8n-ru/x` for the duration.
- 🟡 **Backup script broken on copy.** Audit F-backup-1 still applies if you copy `/opt/docker/backup.sh` verbatim. Fix during section 5, not before — but do not let it run on cobblestone before the fix (disable the cron entry until corrected).

---

## Appendix — quick reference

- nullstone: `user@192.168.0.100`, Debian 13, 32 GiB / 477 GiB, ~28 containers, no LUKS (F4).
- cobblestone: `user@192.168.0.101` during cutover, swing to `.100` post-validation.
- LE wildcard `*.s8n.ru` + `*.veilor.uk` via Gandi DNS-01. Internal CA via step-ca, Traefik resolver `internal-ca`.
- Out of scope: office workstation install, friend GPU re-enrollment, veilor-os ISO build pipeline.

---

**Path:** `/home/admin/ai-lab/_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md`

Two-line summary: pre-migration audit + secret catalog + cobblestone install plan (LUKS2, optional userns-remap drop, 18-step topological service redeploy) + cutover script + post-migration fixes carried over from the 2026-05-05 audit.
Operator must fill the "things we don't know about cobblestone" table and decide on userns-remap / DE / RC retirement before section 3 runs.