Migration runbook — nullstone → cobblestone

Goal: relocate the Docker stack (~28 containers, ~227 GiB state) from nullstone (Debian 13, 192.168.0.100, AMD Ryzen 5 2600X / 32 GiB / 477 GiB NVMe, no LUKS) to cobblestone (Debian, fresh, LAN, hardware TBD by operator), and close audit regression F4 (no LUKS at rest) in the same window.

This runbook is read-only on both hosts until cutover (section 4). Sections 1–3 are inventory + planning; section 4 is the destructive cutover; sections 5–7 are follow-through.

Things we don't know about cobblestone yet — operator to fill in

| Question | Why it matters | Default if unset |
|---|---|---|
| CPU model / cores / threads | Sizing for parallel postgres + Ollama + MC | Assume ≥ Ryzen 5 2600X parity |
| RAM | The 32 GiB nullstone runs ~50 % util at peak; less = trim MC + Ollama | Require ≥ 32 GiB |
| Storage layout (LVM? ZFS? plain?) | Decides LUKS strategy in 3a | Assume single NVMe, plain ext4 |
| GPU present (any) | Ollama / vLLM / Misskey thumb GPU helpers | Assume none; leave Ollama on friend RTX 4080 |
| LUKS already enabled at install? | If no → reinstall window or LUKS-on-file fallback | Assume no (act accordingly) |
| Static IP allocated? | Cutover plan needs a parking IP | Assume DHCP; target .101 for cutover |
| DE installed? | Strip-vs-keep debate | Confirmed installed; default = strip |
| User account name + uid | Bind-mount permissions on /home/docker | Assume user, uid 1000 (mirror nullstone) |

Update this table before running section 3.


1 — Pre-migration audit (run on nullstone)

All commands read-only. SSH as user@192.168.0.100 (per feedback_nullstone_ssh_user.md — admin@ is rejected).

1.1 Container inventory

ssh user@192.168.0.100 'docker ps -a --format "{{json .}}"' \
  > nullstone-containers-$(date +%F).jsonl
ssh user@192.168.0.100 'docker inspect $(docker ps -aq)' \
  > nullstone-inspect-$(date +%F).json

Parse for Names, Image, Mounts[].Source, NetworkSettings.Networks, HostConfig.RestartPolicy, Config.Labels (Traefik routers).
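
A jq sketch for flattening the inspect dump into exactly those fields (field paths follow standard `docker inspect` output; the filename mirrors the command above):

jq -r '.[] | [
    .Name,
    .Config.Image,
    ([.Mounts[]?.Source] | join(",")),
    ((.NetworkSettings.Networks // {}) | keys | join(",")),
    .HostConfig.RestartPolicy.Name
  ] | @tsv' nullstone-inspect-$(date +%F).json > nullstone-matrix.tsv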

1.2 Volumes (size estimate)

ssh user@192.168.0.100 'docker volume ls --format "{{.Name}}"' \
  | xargs -I {} ssh user@192.168.0.100 \
    "docker run --rm -v {}:/v alpine du -sh /v 2>/dev/null | sed 's|/v|{}|'"

Cross-reference with /home/user/docker-data/100000.100000/volumes/ (userns-remapped path) for per-volume bytes.

1.3 Network

ssh user@192.168.0.100 'docker network ls; \
  ss -tlnp 2>/dev/null | grep LISTEN; \
  iptables-save 2>/dev/null; nft list ruleset 2>/dev/null'

Capture Traefik vhosts:

ssh user@192.168.0.100 'cd /opt/docker/traefik && \
  ls dynamic/; cat dynamic/*.yml | grep -E "rule:|sourceRange:"'

1.4 Cron + scheduled tasks

ssh user@192.168.0.100 'sudo cat /etc/crontab /etc/cron.d/* 2>/dev/null; \
  for u in $(cut -d: -f1 /etc/passwd); do \
    crontab -u $u -l 2>/dev/null && echo "(user $u)"; done'

Known: /etc/cron.d/docker-backup runs /opt/docker/backup.sh daily at 02:00 — broken (F-backup-1, fix in section 5).

1.5 Systemd

ssh user@192.168.0.100 'systemctl list-unit-files \
  --state=enabled --type=service --no-pager'

Watch for: docker.service, tailscaled.service, ollama.service (Ollama runs on host, not in Docker), chrony.service, ssh.service.

1.6 Disk + memory + cpu baseline

ssh user@192.168.0.100 'df -hT; \
  sudo du -sh /home/docker/* /opt/docker/* /opt/backups 2>/dev/null; \
  free -h; lscpu | head -20; nproc'

Reference (2026-05-06 spot check): / 30 G (37 %) · /var 12 G (17 %) · /home 399 G (60 %, 226 G used). Most state is on /home.

1.7 Daemon config

ssh user@192.168.0.100 'cat /etc/docker/daemon.json /etc/subuid /etc/subgid; \
  sudo cat /etc/systemd/system/docker.service.d/override.conf 2>/dev/null'

Known good (carry forward except possibly userns-remap, see 3c):

{
  "log-driver": "json-file",
  "log-opts": {"max-size": "10m", "max-file": "3"},
  "live-restore": true,
  "icc": false,
  "userns-remap": "default",
  "default-address-pools": [{"base": "172.20.0.0/16", "size": 24}],
  "storage-driver": "overlay2",
  "no-new-privileges": true
}
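
Before this config ever restarts a daemon on cobblestone, it can be sanity-checked offline (the --validate flag exists in Docker ≥ 23; treat as a convenience, not a gate):

sudo dockerd --validate --config-file /etc/docker/daemon.json
# prints "configuration OK" and exits 0 on success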

2 — Secret + state catalog

Anything in the tables below that is lost or corrupted during transfer forces re-issuance / re-pinning / re-handshake. Grouped by criticality.

Tier 0 — irreplaceable (lose this and external systems break)

| Path | Bytes (est.) | Restore cost if lost |
|---|---|---|
| /opt/docker/step-ca/data/secrets/ + /opt/docker/step-ca/.env | < 1 MiB | Re-issue every internal cert; reinstall veilor-root.crt on every device that uses *.veilor / internal-CA chains. Hard. |
| /opt/docker/traefik/data/acme.json (LE prod) | < 1 MiB | Hits LE rate-limit (5 dupe certs/wk per FQDN, 50 certs/wk per registered domain). Could lock cert issuance for a full week. |
| /opt/docker/traefik/data/acme-internal.json (step-ca chain) | < 1 MiB | step-ca re-issues fast, but every leaf reissue invalidates pinned trust anchors. |
| /opt/docker/headscale/config/private.key + /opt/docker/headscale/data/db.sqlite | < 50 MiB | Loss = every node re-enrolls; preauthkeys, routes, ACLs reset. Friend GPU node identity churn. |
| /etc/ssh/ssh_host_* | < 1 MiB | Either copy → TOFU pinning stays intact, OR rotate → every client hits a "key changed" warning (acceptable but noisy). |

Tier 1 — application secrets (loss → password reset cascade)

| Path | Bytes (est.) | Notes |
|---|---|---|
| /opt/docker/forgejo/data/gitea/conf/app.ini (note: the file is app.ini under gitea/conf/ even on Forgejo) | ~10 KiB | SECRET_KEY, INTERNAL_TOKEN, JWT_SECRET, LFS_JWT_SECRET, OAuth client secrets. |
| /opt/docker/authentik/.env + authentik PG dump | tens of MiB | AUTHENTIK_SECRET_KEY, PG_PASS. Any service trusting Authentik OIDC needs client_secret re-handover. |
| /opt/docker/misskey/.env + misskey PG dump | < 1 MiB (env) | id, db.user/pass, redis.pass, master key. |
| /opt/docker/n8n/.env + n8n PG dump | < 1 MiB (env) | Encryption key for credentials at rest — lose this and stored creds inside n8n flows are unrecoverable. |
| /opt/docker/rocketchat/.env + Mongo dump (currently stopped — see 4.1) | < 1 MiB (env) | First-admin still unclaimed (audit risk item). |
| /opt/docker/tuwunel*/etc/tuwunel.toml | < 1 MiB | Server signing key seed; lose it = federation re-onboard from zero. |
| /opt/docker/livekit/livekit.yaml | < 1 KiB | keys: map (api-key → secret); the JWT minter (lk-jwt-service) shares this. |
| /opt/docker/pihole/etc-pihole/ | ~50 MiB | Adlists + custom DNS; rebuildable in 30 min if lost. |
| Gandi PAT (GANDIV5_PERSONAL_ACCESS_TOKEN in /opt/docker/traefik/.env) | < 1 KiB | Re-issuable from Gandi UI; LiveDNS-only scope (per reference_gandi_api.md). |
| Tailscale auth keys (Headscale) | — | Regenerate via headscale preauthkeys create; OK to regenerate. |

Tier 2 — bulk data (large, but reproducible OR low-stakes)

| Path | Bytes (est.) | Notes |
|---|---|---|
| Misskey /files/ (S3-style local) | tens of GiB | User uploads — irreplaceable to users. Dedup-friendly. |
| Forgejo /home/docker/forgejo/data/git/ | ~5 GiB now | Git repos; also mirrored to GH per project_forgejo_nullstone.md, so partial DR exists. |
| dl-veilor static files | ~1 GiB | Public ISO downloads; rebuildable from the veilor-os pipeline. |
| n8n flows (in n8n_n8n_data) | < 1 GiB | Encrypted with the key from Tier 1; export JSON via the UI as belt-and-braces. |
| Minecraft world (/home/docker/minecraft/data/) | ~10–30 GiB | Players will riot if lost. |
| Ollama models (/home/user/models/ollama/) | ~17 GiB | Re-downloadable from the registry; not blocking. |
| Postgres dumps (authentik, misskey-db, n8n-postgres) | — | Covered by pg_dumpall in 4.1. |
| MongoDB dump (rocketchat-mongodb) | — | Covered by mongodump in 4.1. Container is stopped today — start, dump, stop. |

Tier 3 — config-as-code (safely re-deployable from ~/ai-lab/_github/)

  • All /opt/docker/*/docker-compose.yml — committed under ~/ai-lab/_github/infra/repos/ and ~/ai-lab/nullstone-server/.
  • Traefik dynamic/*.yml middleware files.
  • Treat the repo as authoritative; copy from the repo to cobblestone, not from nullstone. Diff old-compose vs repo-compose during section 3d to catch any uncommitted drift (sketch below).
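
A hypothetical drift-check loop for that diff step (run wherever both the repo checkout and a copy of /opt/docker are visible; the repo path mirrors the bullet above):

for d in /opt/docker/*/; do
  name=$(basename "$d")
  repo=~/ai-lab/_github/infra/repos/$name/docker-compose.yml
  [ -f "$repo" ] && [ -f "$d/docker-compose.yml" ] && \
    diff -u "$repo" "$d/docker-compose.yml"   # empty output = no drift
done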

3 — Cobblestone install plan

3a — OS layer

Verify base:

ssh user@cobblestone 'cat /etc/debian_version; uname -r; lsb_release -a'

LUKS2 (mandatory — closes F4):

  • Path A (preferred): reinstall with full-disk LUKS2 from the Debian installer (/, /home, swap all on encrypted PVs). Set up TPM2 unattended unlock post-install:
    systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+7 /dev/nvmeXnYpZ
    
    PCR 0+7 binds to firmware + secure-boot state; a firmware update breaks auto-unlock → fall back to passphrase.
  • Path B (fallback if reinstall blocked): LUKS-on-file loopback for the high-value subset only:
    • /opt/docker/step-ca/
    • /opt/docker/traefik/data/acme*.json
    • /opt/docker/headscale/
    • postgres data dirs
    • Mongo keyfile volume
    Strictly worse than Path A (the rest of the disk stays cleartext, including Misskey uploads and Forgejo repos), but it covers the highest-value subset. Document as accepted risk. A loopback sketch follows.
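
A minimal loopback sketch for Path B, assuming a 20 GiB file and ext4 (size, paths, and mount point are illustrative):

sudo fallocate -l 20G /opt/vault.img
sudo cryptsetup luksFormat --type luks2 /opt/vault.img
sudo cryptsetup open /opt/vault.img vault
sudo mkfs.ext4 /dev/mapper/vault
sudo mkdir -p /opt/docker-vault
sudo mount /dev/mapper/vault /opt/docker-vault
# Then bind-mount the high-value paths into the vault, e.g.:
#   mount --bind /opt/docker-vault/step-ca /opt/docker/step-ca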

Hostname + base packages:

sudo hostnamectl set-hostname cobblestone
sudo apt update && sudo apt install -y \
  curl ca-certificates gnupg jq ufw fail2ban chrony \
  rsync restic tmux htop iotop ncdu

DE strip vs keep — recommendation: STRIP.

Cost of keeping: ~500 MiB RAM, ~5 GiB disk, larger attack surface (CUPS, avahi, polkit, GUI daemons on localhost). Benefit: local browser for vhost testing, on-keyboard recovery if SSH wedges.

  • Default (strip): sudo apt purge '*-desktop' '*xorg*' lightdm sddm gdm3 'plymouth*' 'libreoffice-*' && sudo apt autoremove --purge. Install Cockpit for web admin behind Traefik + no-guest@file.
  • Keep: lock SDDM/GDM local-only via PAM, disable XDMCP, mask cups-browsed. No auto-login.

Operator picks; document choice in SYSTEM.md.

3b — Network

IP allocation during cutover — use 192.168.0.101 for cobblestone while nullstone stays on .100. Flip DNS / port-forwards last (section 4.5). This avoids ARP collisions and keeps rollback trivial.

nftables ruleset (mirror nullstone pattern — read live ruleset off nullstone in 1.3, replay on cobblestone):

sudo systemctl enable --now nftables
# Drop in /etc/nftables.conf with:
#   - default policy drop on input
#   - accept established/related
#   - accept lo
#   - accept 22 (SSH) from LAN + tailnet
#   - accept 80/443 (Traefik) from anywhere
#   - accept 222 (Forgejo SSH) from LAN + tailnet
#   - accept 25565 (Minecraft) from anywhere
#   - log+drop everything else
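
A minimal /etc/nftables.conf sketch matching those comments (LAN and tailnet CIDRs assumed as 192.168.0.0/24 and 100.64.0.0/10; ICMP/ICMPv6 housekeeping rules omitted for brevity):

#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    tcp dport 22 ip saddr { 192.168.0.0/24, 100.64.0.0/10 } accept
    tcp dport { 80, 443 } accept
    tcp dport 222 ip saddr { 192.168.0.0/24, 100.64.0.0/10 } accept
    tcp dport 25565 accept
    log prefix "input-drop: " drop
  }
}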

IP forwarding: the audit reports nullstone has net.ipv4.ip_forward=1 (F30) — an unintended carryover from a Tailscale subnet-router experiment. Do NOT copy /etc/sysctl.d/ from nullstone wholesale. Instead, set explicitly:

sudo tee /etc/sysctl.d/99-cobblestone.conf <<'EOF'
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
EOF
sudo sysctl --system

If Headscale or Tailscale subnet-router is wired later, re-enable ip_forward with explicit comment + audit note.

Tailscale + Headscale node identity:

  • Cleanest path: re-enroll cobblestone from scratch. New node, new node-key, list cobblestone separately from nullstone in Headscale during cutover week.
  • Alternative: copy /var/lib/tailscale/ from nullstone → cobblestone to inherit the existing identity. Saves one ACL update but conflates audit history. Not recommended.
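
Re-enrollment sketch for the recommended path (CLI verbs per current Headscale/Tailscale docs — verify flags against the deployed versions; the user name "infra" is illustrative):

# On the Headscale host:
docker exec headscale headscale preauthkeys create --user infra --expiration 1h
# On cobblestone:
sudo tailscale up --login-server https://hs.s8n.ru --auth-key <key-from-above>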

3c — Docker

Install via official repo:

curl -fsSL https://download.docker.com/linux/debian/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/debian $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update && sudo apt install -y \
  docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

/etc/docker/daemon.json — userns-remap decision.

Two paths; operator decides. Document choice in SYSTEM.md.

Path 1 — DROP userns-remap (recommended): same JSON as nullstone minus the userns-remap line.

  • Pros: no more chown 101000 dance; nsenter trick (feedback_docker_sudo_bypass.md) drops the --userns=host flag; Mongo keyfile pattern from project_nullstone_docker_userns.md becomes unnecessary; docker exec UIDs match host 1:1.
  • Cons: container root → host uid 0. Compensated by no-new-privileges, icc=false, per-compose CAP drops, read-only root FS where compatible. Net: small regression in defense-in-depth, large workflow simplification.

Path 2 — KEEP userns-remap: carry /etc/subuid + /etc/subgid identically (user:100000:65536). Existing on-disk ownership at uid 101000 transfers without rechown. Cost: persisting the daily friction the operator has been hitting for months.

Default: Path 1. If chosen, after rsync:

sudo chown -R user:user /home/docker /opt/docker
# Then per-service to the container uid (forgejo 1000, postgres 999,
# mongo 999, traefik 0).
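
After the daemon is up, confirm the config stuck (quick check; expected values per the 1.7 reference config):

docker info --format '{{.LoggingDriver}} / {{.Driver}}'   # expect: json-file / overlay2
docker info --format '{{json .SecurityOptions}}'
# Path 1: no "name=userns" entry should appear; Path 2: it must.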

Networks (must exist before Traefik comes up):

docker network create proxy
docker network create socket-proxy-net
docker network create misskey-frontend

3d — Service redeploy order

Topological. Each step depends only on its predecessors. Verification command and rollback at each stage.

| # | Stack | Depends on | Verify | Rollback |
|---|---|---|---|---|
| 1 | networks (proxy, socket-proxy-net, misskey-frontend) | docker daemon | docker network ls | docker network rm |
| 2 | socket-proxy | network socket-proxy-net | docker logs socket-proxy shows API filter active | down compose |
| 3 | traefik | socket-proxy + acme.json/acme-internal.json carryover + Gandi PAT in .env | curl -k https://sys.s8n.ru returns dashboard auth challenge; docker logs traefik shows resolver init OK; cert files repopulate without LE call (acme.json reuse) | down compose; restore acme.json from backup |
| 4 | step-ca | traefik (for ACME-back) | docker exec step-ca step ca health; Traefik internal-CA resolver issues a cert against https://step-ca:9000/acme/acme/directory | down compose; revert traefik resolver config |
| 5 | headscale | traefik | curl https://hs.s8n.ru/health; docker exec headscale headscale nodes list shows existing nodes (db.sqlite carryover) | down compose; restore db.sqlite snapshot |
| 6 | authentik (postgres → redis → server → worker) | traefik | curl https://auth.s8n.ru/-/health/ready/; OIDC discovery doc loads | per-component down |
| 7 | forgejo | traefik (+ optional authentik, currently unwired) | curl https://git.s8n.ru/api/v1/version; git clone ssh://git@cobblestone:222/... | down compose; data dir tar-revert |
| 8 | misskey (db → redis → misskey → x-source) | traefik, network misskey-frontend | curl https://x.veilor/api/meta returns JSON; signup page renders | down compose; pg dump restore |
| 9 | tuwunel + tuwunel-txt | traefik | curl https://matrix.veilor.uk/_matrix/federation/v1/version and https://mx.s8n.ru/_matrix/federation/v1/version | down compose; data tar-revert |
| 10 | cinny-txt + commet-web + signup-page + signup-txt | tuwunel reachable, traefik | curl -I https://txt.s8n.ru 200; static assets 200 | down compose |
| 11 | livekit-server + lk-jwt-service | traefik (TURN over HTTPS) | wscat wss://livekit.veilor.uk/; jwt service /healthz | down compose |
| 12 | n8n (postgres → n8n) | traefik, restored encryption key | curl https://n8n.s8n.ru/healthz; UI loads with existing flows | pg dump restore |
| 13 | pihole | traefik | dig @cobblestone head; admin UI auth | — |
| 14 | forgejo-runner | forgejo (#7) reachable on internal name | docker logs forgejo-runner shows "Runner registered successfully" | down compose; regenerate token via forgejo actions generate-runner-token |
| 15 | minecraft-mc | traefik (only for filebrowser-mc), router port-forward 25565 | mcstatus mc.racked.ru (or nc -zv cobblestone 25565) | down compose; world tar-revert |
| 16 | dl-veilor + filebrowser-mc | traefik | curl https://dl.veilor.org/v0.2.0/veilor-root.crt | down compose |
| 17 | anythingllm | traefik with no-guest@file middleware applied OR LAN-only bind — must NOT come up like on nullstone (port 3001 publicly exposed, audit F-anythingllm-1) | curl -I -H 'Host: ai.s8n.ru' https://cobblestone from off-LAN must 403 | down compose |
| 18 | rocketchat (mongodb → rocketchat) | operator decision — currently stopped on nullstone; if not retired, restore from the mongodump produced in 4.1 | curl https://rc.s8n.ru/api/info; first-admin claim if still pending | leave stopped (matches today's state) |
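
A hypothetical gate-walk helper for the table (directory names are illustrative — match them to the real /opt/docker layout, and keep the manual verify at every row):

for svc in socket-proxy traefik step-ca headscale authentik forgejo \
           misskey tuwunel cinny livekit n8n pihole forgejo-runner \
           minecraft dl-veilor; do
  (cd /opt/docker/$svc && docker compose up -d) || break
  read -rp "verify $svc per the table above, then Enter to continue " _
done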

4 — Cutover sequence

4.1 — Snapshot state on nullstone

NS=user@192.168.0.100
TS=$(date +%F-%H%M)
DEST=/opt/snap/$TS
mkdir -p $DEST   # local staging dir on the operator workstation —
                 # every dump below streams back over ssh into it

# Postgres dumps
for pg in authentik-postgres misskey-db n8n-postgres-1; do
  ssh $NS "docker exec $pg pg_dumpall -U postgres" \
    | gzip > $DEST/$pg.sql.gz
done

# Mongo (start, dump, stop again — currently stopped per audit)
ssh $NS 'cd /opt/docker/rocketchat && docker compose up -d rocketchat-mongodb && sleep 15'
ssh $NS 'docker exec rocketchat-mongodb mongodump \
  --username root \
  --password "$(grep MONGO_INITDB_ROOT_PASSWORD /opt/docker/rocketchat/.env | cut -d= -f2)" \
  --authenticationDatabase admin --archive' \
  | gzip > $DEST/rocketchat.archive.gz
ssh $NS 'cd /opt/docker/rocketchat && docker compose stop rocketchat-mongodb'

# Forgejo full dump (covers DB + repos + LFS + attachments)
ssh $NS 'docker exec -u 1000 forgejo \
  forgejo dump --type tar.zst --file /tmp/forgejo-dump.tar.zst'
ssh $NS 'docker cp forgejo:/tmp/forgejo-dump.tar.zst -' \
  | tar -xOf - > $DEST/forgejo-dump.tar.zst
# (docker cp to stdout emits a tar stream; tar -xOf - unwraps the file)

# Stop everything before tar (consistency)
ssh $NS 'for d in /opt/docker/*/; do \
  [ -f "$d/docker-compose.yml" ] && \
    (cd "$d" && docker compose down) ; \
done'

# Bulk state tar
ssh $NS "sudo tar --acls --xattrs -cpf - /opt/docker /home/docker /opt/backups" \
  | zstd -T0 -19 > $DEST.tar.zst

# Manifest
ssh $NS "find /opt/docker /home/docker -type f -print0 \
  | xargs -0 sha256sum" > $DEST.sha256

Hold the tarball plus dumps in two places: cobblestone target host and an offline USB. acme.json and step-ca secrets get an additional armored copy to the password manager.
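
A sketch of those armored copies (gpg symmetric encryption, passphrase prompted interactively; paste the .asc output into the password manager):

ssh $NS 'sudo cat /opt/docker/traefik/data/acme.json' \
  | gpg --symmetric --armor > acme-$TS.json.asc
ssh $NS 'sudo tar -C /opt/docker/step-ca -cz data/secrets .env' \
  | gpg --symmetric --armor > step-ca-secrets-$TS.tgz.asc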

4.2 — rsync to cobblestone

After the tarball lands, repopulate cobblestone:

COBB=user@192.168.0.101
scp $DEST.tar.zst $COBB:/tmp/snap.tar.zst
ssh $COBB 'sudo mkdir -p /opt/docker /home/docker /opt/backups && \
  zstd -dc /tmp/snap.tar.zst | sudo tar --acls --xattrs -xpf - -C /'
# If userns-remap dropped (Path 1 in 3c):
ssh $COBB 'sudo chown -R user:user /opt/docker /home/docker'

4.3 — Bring up services on cobblestone

Walk section 3d table top to bottom. Stop and verify at each row before the next. Don't batch — one bad startup cascades.

For services that store internal hostnames (Tuwunel server_name, Headscale server_url, Forgejo ROOT_URL), the values stay the same because public DNS still resolves to the WAN IP — only the internal LAN target changes. No app config edits needed for cutover.

4.4 — Verify per vhost

for host in sys.s8n.ru git.s8n.ru auth.s8n.ru pihole.s8n.ru \
            signup.txt.s8n.ru hs.s8n.ru rc.s8n.ru n8n.s8n.ru \
            txt.s8n.ru mx.s8n.ru x.veilor matrix.veilor.uk \
            chat.veilor.uk livekit.veilor.uk signup.veilor.uk \
            dl.veilor.org; do
  echo -n "$host: "
  curl --resolve $host:443:192.168.0.101 -sI https://$host | head -1
done

Then push key flows:

  • git push nullstone-remote (alias still works because DNS is unchanged) — Forgejo CI runs.
  • Matrix federation: curl https://federationtester.matrix.org/api/report?server_name=veilor.uk.
  • Misskey signup: hit invite-gated form, complete signup, federation test post.

4.5 — Cutover network

Two paths; operator picks based on appetite.

Path A — DNS swing (lower risk, slower propagation):

  1. Lower *.s8n.ru and *.veilor* A-record TTLs to 60 s a week before cutover (Gandi UI; can't be done via API per reference_gandi_api.md).
  2. Day-of: the A records (82.31.156.86, assumed-unchanged public IP) only need editing if the WAN NAT target changes (e.g. router port-forwards now point at .101). If the WAN IP and port-forwards stay the same and you swap the LAN IPs (.100 ↔ .101), no public DNS edit is needed — only edit /etc/hosts on internal clients (per feedback_s8n_hosts_override.md).

Path B — IP takeover (faster, higher rollback friction):

  • Bring nullstone down on .100, change cobblestone from .101 → .100, restart networking. Public DNS + router port-forwards unchanged. Rollback = swap IPs back.

Update the long pin line in onyx's /etc/hosts last:

192.168.0.<new> rc.s8n.ru n8n.s8n.ru pihole.s8n.ru sys.s8n.ru \
  mx.s8n.ru txt.s8n.ru signup.txt.s8n.ru git.s8n.ru x.veilor \
  dl.veilor.org

4.6 — Update memory + ai-lab docs

  • ~/ai-lab/CLAUDE.md — Device Registry: add cobblestone row, mark nullstone as decom 2026-MM-DD.
  • ~/ai-lab/SYSTEM.md — replace nullstone hardware/network blocks with cobblestone equivalents; keep nullstone as "cold spare" until wipe.
  • ~/ai-lab/README.md — device table one-liner.
  • ~/ai-lab/security/ — create cobblestone-server/ folder; first audit due within 7 days of cutover.
  • Memory files to update: project_nullstone_docker_userns.md (mark superseded if userns-remap dropped), project_forgejo_nullstone.md, project_rocketchat_nullstone.md, project_tailscale_mesh.md, feedback_nullstone_ssh_user.md, feedback_s8n_hosts_override.md (new IP).

4.7 — Cold spare + wipe

  • Hold nullstone powered-off but cabled, 7 days minimum.
  • If no rollback is triggered, wipe: full LUKS reformat (or nvme format --ses=2 for a crypto-erase, if the drive supports it), then either donate or repurpose as cobblestone's backup target (Restic destination — closes audit recommendation #6). A sketch follows.
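
Crypto-erase sketch (destructive; device names illustrative — confirm support in the drive's FNA/SANICAP output first):

sudo nvme id-ctrl /dev/nvme0 -H | grep -i 'crypto erase'
sudo nvme format /dev/nvme0n1 --ses=2   # SES 2 = cryptographic erase; 1 = user-data erase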

5 — Post-migration immediate fixes

Carried over from nullstone-server/audit-report-2026-05-05.md:

  • F-backup-1 — fix /opt/docker/backup.sh: remove the dead matrix-postgres block (Synapse retired); correct the rocketchat-mongodb container name; replace the literal CHANGE_ME_MONGO_ADMIN_PASSWORD with a read from /opt/docker/rocketchat/.env (sketch after this list). Verify the next 02:00 run produces non-empty dumps, including the RocketChat Mongo archive.
  • no-guest@file ACL: populate sourceRange to cover LAN (192.168.0.0/24) + tailnet (100.64.0.0/10) + IPv6 equivalents. Verify XFF chain restores client IP at the entryPoint level (forwardedHeaders.trustedIPs).
  • anythingllm: front via Traefik with no-guest@file OR bind LAN-only. Must not repeat the 0.0.0.0:3001 nullstone state.
  • LUKS: done at install (3a). Verify via cryptsetup status + systemd-cryptenroll --tpm2-device=list post-cutover.
  • Restic + autorestic to B2/Wasabi or to nullstone-as-spare, with restore drill scheduled.
  • Vaultwarden to centralize the secrets currently sprayed across .env files.
  • Gatus with cert-expiry checks + ntfy/Matrix alerts.
  • CrowdSec with bouncer plugin at Traefik for the public HTTP attack surface.
  • Beszel for one-pane host metrics.

6 — Open questions (operator decisions)

| Question | Default if undecided |
|---|---|
| Strip DE on cobblestone? | Strip + Cockpit. Easier to defend; remote admin via web UI through Traefik + no-guest@file. |
| userns-remap on cobblestone? | Off (Path 1 in 3c). Operator pain outweighs the marginal isolation. Document the tradeoff. |
| Move Headscale + step-ca to a $4 VPS? | Defer (phase 2). Keep on cobblestone for now; revisit once Restic + Gatus are running. SPOF mitigation is real but adds attack surface; do it once monitoring is in place. |
| RocketChat: bring back up or retire? | Retire if not used in 30 days. Currently stopped; first-admin still unclaimed. Mongo dump captured in 4.1, then drop the stack from the cobblestone redeploy. Keep rc.s8n.ru DNS for future revival. |
| Tailscale identity: copy vs re-enroll for cobblestone? | Re-enroll (cleaner audit trail; Headscale ACLs need a one-line edit). |
| SSH host keys: copy vs rotate? | Copy. TOFU pinning stays intact; one less "is this MITM?" prompt for clients. Add rotation to a follow-up cron. |
| Authentik wiring during cutover or after? | After. Authentik is currently mostly unwired (audit). Cutover is not the time to add new auth dependencies. |

7 — Risks (severity-tagged)

  • 🔴 acme.json mishandling = LE rate-limit. Mitigation: copy acme.json + acme-internal.json BEFORE bringing up Traefik on cobblestone. Never let cobblestone Traefik issue a fresh batch of certs. Hold a backup of both files in two locations.
  • 🔴 step-ca root key loss = full re-issuance. Mitigation: triple-copy /opt/docker/step-ca/.env + data/secrets/ (cobblestone, USB, password manager). Test that the encrypted root key decrypts on cobblestone before tearing down nullstone.
  • 🔴 anythingllm reintroduces public 0.0.0.0:3001. Mitigation: do NOT bring it up before middleware is in place. Test from off-LAN IP.
  • 🟠 PostgreSQL major-version skew. Mitigation: pin same major on cobblestone (postgres:16-alpine already pinned; do NOT use :latest). If a major upgrade is desired, do it as a separate step after cutover settles, with a fresh pg_dumpall as safety net.
  • 🟠 Headscale node identity churn if db.sqlite not copied. All nodes (onyx, friend RTX 4080 PC, office) re-enroll. Mitigation: copy db.sqlite + private.key; verify headscale nodes list matches pre-cutover before flipping DNS.
  • 🟡 chrony NTS peers may need re-trust on new host (NTS-KE binds to hostname). Mitigation: chrony config copy verbatim; first chronyc tracking should show stratum within 5 minutes.
  • 🟡 Authentik OIDC client_secrets. Today: mostly unwired (audit). Risk small. If Forgejo/RC/n8n were wired through Authentik, each client_secret would need re-handover. Defer Authentik wiring until post-cutover.
  • 🟡 Misskey AGPL §13 source endpoint (x-source). Per project_x_misskey_fork.md, the AGPL link must keep serving source — and per the same memo, mute is acceptable for short windows. Cutover downtime budget: ≤ 2 h. If exceeded, post a banner on x.veilor linking to https://git.s8n.ru/s8n-ru/x for the duration.
  • 🟡 Backup script broken on copy. Audit F-backup-1 still applies if you copy /opt/docker/backup.sh verbatim. Fix during section 5, not before — but do not let it run on cobblestone before fix (disable the cron entry until corrected).

Appendix — quick reference

  • nullstone: user@192.168.0.100, Debian 13, 32 GiB / 477 GiB, ~28 containers, no LUKS (F4).
  • cobblestone: user@192.168.0.101 during cutover, swing to .100 post-validation.
  • LE wildcard *.s8n.ru + *.veilor.uk via Gandi DNS-01. Internal CA via step-ca, Traefik resolver internal-ca.
  • Out of scope: office workstation install, friend GPU re-enrollment, veilor-os ISO build pipeline.

Path: /home/admin/ai-lab/_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md

Two-line summary: pre-migration audit + secret catalog + cobblestone install plan (LUKS2, optional userns-remap drop, 18-step topological service redeploy) + cutover script + post-migration fixes carried over from the 2026-05-05 audit. The operator must fill in the "things we don't know about cobblestone" table and decide on userns-remap / DE / RC retirement before section 3 runs.