Sourced from previous audits + agent-wave outputs (2026-05-05):
- AUDIT-2026-05-05.md — 5-agent stack synthesis
- forgejo/DEPLOY.md — git.s8n.ru deploy runbook
- forgejo/forgejo-compose.yml — production compose
- forgejo/runner-compose.yml — forgejo-runner
- forgejo/migration-report-... — GH→Forgejo migration audit (6/6 green)
- runbooks/MIGRATION-... — nullstone→cobblestone runbook
- runbooks/DE-DECISION-... — keep-vs-strip DE on cobblestone
- repos/REPO-AUDIT-2026-05-05.md — repo trees + ownership
Migration runbook — nullstone → cobblestone
Goal: relocate the Docker stack (~28 containers, ~227 GiB state) from nullstone (Debian 13, 192.168.0.100, AMD Ryzen 5 2600X / 32 GiB / 477 GiB NVMe, no LUKS) to cobblestone (Debian, fresh, LAN, hardware TBD by operator), and close audit regression F4 (no LUKS at rest) in the same window.
This runbook is read-only on both hosts until cutover (section 4). Sections 1–3 are inventory + planning; section 4 is the destructive cutover; sections 5–7 are follow-through.
Things we don't know about cobblestone yet — operator to fill in
| Question | Why it matters | Default if unset |
|---|---|---|
| CPU model / cores / threads | Sizing for parallel postgres + Ollama + MC | Assume ≥ Ryzen 5 2600X parity |
| RAM | 32 GiB nullstone runs 50 % util peak; less = trim MC + Ollama | Require ≥ 32 GiB |
| Storage layout (LVM? ZFS? plain?) | Decides LUKS strategy in 3a | Assume single NVMe, plain ext4 |
| GPU present (any) | Ollama / vLLM / Misskey thumb GPU helpers | Assume none, leave Ollama on friend RTX 4080 |
| LUKS already enabled at install? | If no → reinstall window or LUKS-on-file fallback | Assume no (act accordingly) |
| Static IP allocated? | Cutover plan needs a parking IP | Assume DHCP, target .101 for cutover |
| DE installed? | Strip vs keep debate | Confirmed installed; default = strip |
| User account name + uid | Bind-mount permissions on /home/docker | Assume user, uid 1000 (mirror nullstone) |
Update this table before running section 3.
1 — Pre-migration audit (run on nullstone)
All commands read-only. SSH as user@192.168.0.100
(per feedback_nullstone_ssh_user.md — admin@ is rejected).
1.1 Container inventory
ssh user@192.168.0.100 'docker ps -a --format "{{json .}}"' \
> nullstone-containers-$(date +%F).jsonl
ssh user@192.168.0.100 'docker inspect $(docker ps -aq)' \
> nullstone-inspect-$(date +%F).json
Parse for Names, Image, Mounts[].Source, NetworkSettings.Networks,
HostConfig.RestartPolicy, Config.Labels (Traefik routers).
1.2 Volumes (size estimate)
ssh user@192.168.0.100 'docker volume ls --format "{{.Name}}"' \
| xargs -I {} ssh user@192.168.0.100 \
"docker run --rm -v {}:/v alpine du -sh /v 2>/dev/null | sed 's|/v|{}|'"
Cross-reference with /home/user/docker-data/100000.100000/volumes/
(userns-remapped path) for per-volume bytes.
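To sanity-check the loop's output against the ~227 GiB total, the per-volume `du -sh` lines can be summed with a small awk helper (a sketch; `sum_du` is a name invented here, not part of the runbook's tooling):

```shell
# sum_du: read "du -sh"-style lines (SIZE<TAB>PATH) and total them in GiB.
# Handles K/M/G/T suffixes; bare numbers are treated as plain bytes.
sum_du() {
  awk '{ n = $1; u = substr(n, length(n), 1); v = n + 0
         if      (u == "K") v /= 1048576
         else if (u == "M") v /= 1024
         else if (u == "T") v *= 1024
         else if (u != "G") v /= 1073741824   # no suffix: bytes
         s += v }
       END { printf "%.1f GiB total\n", s }'
}

# usage: <volume-size listing from 1.2> | sum_du
```

A total far below ~227 GiB means some state lives outside named volumes (bind mounts under /home/docker, /opt/docker) — cross-check against the 1.6 `du` pass.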
1.3 Network
ssh user@192.168.0.100 'docker network ls; \
ss -tlnp 2>/dev/null | grep LISTEN; \
iptables-save 2>/dev/null; nft list ruleset 2>/dev/null'
Capture Traefik vhosts:
ssh user@192.168.0.100 'cd /opt/docker/traefik && \
ls dynamic/; cat dynamic/*.yml | grep -E "rule:|sourceRange:"'
1.4 Cron + scheduled tasks
ssh user@192.168.0.100 'sudo cat /etc/crontab /etc/cron.d/* 2>/dev/null; \
for u in $(cut -d: -f1 /etc/passwd); do \
crontab -u $u -l 2>/dev/null && echo "(user $u)"; done'
Known: /etc/cron.d/docker-backup runs /opt/docker/backup.sh daily at
02:00 — broken (F-backup-1, fix in section 5).
1.5 Systemd
ssh user@192.168.0.100 'systemctl list-unit-files \
--state=enabled --type=service --no-pager'
Watch for: docker.service, tailscaled.service, ollama.service
(Ollama runs on host, not in Docker), chrony.service, ssh.service.
1.6 Disk + memory + cpu baseline
ssh user@192.168.0.100 'df -hT; \
sudo du -sh /home/docker/* /opt/docker/* /opt/backups 2>/dev/null; \
free -h; lscpu | head -20; nproc'
Reference (2026-05-06 spot check):
/ 30 G (37 %) · /var 12 G (17 %) · /home 399 G (60 %, 226 G used).
Most state is on /home.
1.7 Daemon config
ssh user@192.168.0.100 'cat /etc/docker/daemon.json /etc/subuid /etc/subgid; \
sudo cat /etc/systemd/system/docker.service.d/override.conf 2>/dev/null'
Known good (carry forward except possibly userns-remap, see 3c):
{
"log-driver": "json-file",
"log-opts": {"max-size": "10m", "max-file": "3"},
"live-restore": true,
"icc": false,
"userns-remap": "default",
"default-address-pools": [{"base": "172.20.0.0/16", "size": 24}],
"storage-driver": "overlay2",
"no-new-privileges": true
}
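Whichever variant lands on cobblestone (with or without userns-remap, see 3c), validate the file before restarting the daemon — a syntax error in daemon.json keeps dockerd from starting at all. A minimal checker (hypothetical helper name; uses stock `python3 -m json.tool`):

```shell
# check_daemon_json: refuse to restart dockerd on a syntactically broken config
check_daemon_json() {
  f=${1:-/etc/docker/daemon.json}
  if python3 -m json.tool "$f" >/dev/null 2>&1; then
    echo "valid JSON: $f"
  else
    echo "INVALID JSON: $f — fix before systemctl restart docker" >&2
    return 1
  fi
}
```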
2 — Secret + state catalog
Anything in this table that is lost or corrupted during transfer forces re-issuance / re-pinning / re-handshake. Group by criticality.
Tier 0 — irreplaceable (lose this and external systems break)
| Path | Bytes (est.) | Restore cost if lost |
|---|---|---|
| `/opt/docker/step-ca/data/secrets/` + `/opt/docker/step-ca/.env` | < 1 MiB | Re-issue every internal cert; reinstall veilor-root.crt on every device that uses `*.veilor` / internal-CA chains. Hard. |
| `/opt/docker/traefik/data/acme.json` (LE prod) | < 1 MiB | Hits the LE rate limit (5 duplicate certs/week per FQDN, 50 certs/week per registered domain). Could lock cert issuance for a full week. |
| `/opt/docker/traefik/data/acme-internal.json` (step-ca chain) | < 1 MiB | step-ca re-issues fast, but every leaf reissue invalidates pinned trust anchors. |
| `/opt/docker/headscale/config/private.key` + `/opt/docker/headscale/data/db.sqlite` | < 50 MiB | Loss = every node re-enrolls; preauthkeys, routes, ACLs reset. Friend GPU node identity churn. |
| `/etc/ssh/ssh_host_*` | < 1 MiB | Either copy → keep TOFU pinning intact, OR rotate → all clients hit a "key changed" warning (acceptable but noisy). |
Tier 1 — application secrets (loss → password reset cascade)
| Path | Bytes (est.) | Notes |
|---|---|---|
| `/opt/docker/forgejo/data/gitea/conf/app.ini` (note: the file is `app.ini` under `gitea/conf/` even on Forgejo) | ~10 KiB | SECRET_KEY, INTERNAL_TOKEN, JWT_SECRET, LFS_JWT_SECRET, OAuth client secrets. |
| `/opt/docker/authentik/.env` + authentik PG dump | tens of MiB | AUTHENTIK_SECRET_KEY, PG_PASS. Any service trusting Authentik OIDC needs a client_secret re-handover. |
| `/opt/docker/misskey/.env` + misskey PG dump | < 1 MiB env | id, db.user/pass, redis.pass, master key. |
| `/opt/docker/n8n/.env` + n8n PG dump | < 1 MiB env | Encryption key for credentials at rest — lose this and the creds stored inside n8n flows are unrecoverable. |
| `/opt/docker/rocketchat/.env` + Mongo dump (currently stopped — see 4.1) | < 1 MiB env | First admin still unclaimed (audit risk item). |
| `/opt/docker/tuwunel*/etc/tuwunel.toml` | < 1 MiB | Server signing key seed; lose it = federation re-onboard from zero. |
| `/opt/docker/livekit/livekit.yaml` | < 1 KiB | `keys:` map (api-key → secret); the JWT minter (lk-jwt-service) shares this. |
| `/opt/docker/pihole/etc-pihole/` | ~50 MiB | Adlists + custom DNS; rebuildable in 30 min if lost. |
| Gandi PAT (`GANDIV5_PERSONAL_ACCESS_TOKEN` in `/opt/docker/traefik/.env`) | < 1 KiB | Re-issuable from the Gandi UI; LiveDNS-only scope (per reference_gandi_api.md). |
| Tailscale auth keys (Headscale) | n/a | OK to regenerate via `headscale preauthkeys create`. |
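Before transfer, a redacted inventory of secret *names* (never values) makes the Tier 1 catalog diffable pre/post migration. A sketch — `list_env_keys` is a name invented here:

```shell
# list_env_keys: print the KEY names defined in one or more .env files,
# sorted and deduplicated. Values are never read past the '=' sign.
list_env_keys() {
  grep -hoE '^[A-Za-z_][A-Za-z0-9_]*=' "$@" 2>/dev/null | sed 's/=$//' | sort -u
}

# usage: list_env_keys /opt/docker/*/.env > secret-names-$(date +%F).txt
```

Run it on nullstone before 4.1 and on cobblestone after 4.2; the two lists must be identical.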
Tier 2 — bulk data (large, but reproducible OR low-stakes)
| Path | Bytes (est.) | Notes |
|---|---|---|
| Misskey `/files/` (S3-style local) | tens of GiB | User uploads — irreplaceable to users. Dedup-friendly. |
| Forgejo `/home/docker/forgejo/data/git/` | ~5 GiB now | Git repos; also mirrored to GH per project_forgejo_nullstone.md, so partial DR exists. |
| dl-veilor static files | ~1 GiB | Public ISO downloads; rebuildable from the veilor-os pipeline. |
| n8n flows (in `n8n_n8n_data`) | < 1 GiB | Encrypted with the key from Tier 1; export JSON via the UI as belt-and-braces. |
| Minecraft world (`/home/docker/minecraft/data/`) | ~10–30 GiB | Players will riot if lost. |
| Ollama models (`/home/user/models/ollama/`) | ~17 GiB | Re-downloadable from the registry; not blocking. |
| Postgres dumps (authentik, misskey-db, n8n-postgres) | covered by `pg_dumpall` in 4.1 | |
| MongoDB dump (rocketchat-mongodb) | covered by `mongodump` in 4.1 | Container is stopped today — start, dump, stop. |
Tier 3 — config-as-code (safely re-deployable from `~/ai-lab/_github/`)
- All `/opt/docker/*/docker-compose.yml` — committed under `~/ai-lab/_github/infra/repos/` and `~/ai-lab/nullstone-server/`.
- Traefik `dynamic/*.yml` middleware files.
- Treat the repo as authoritative: copy compose files from the repo to cobblestone, not from nullstone. Diff old compose vs repo compose during section 3d to catch any uncommitted drift.
3 — Cobblestone install plan
3a — OS layer
Verify base:
ssh user@cobblestone 'cat /etc/debian_version; uname -r; lsb_release -a'
LUKS2 (mandatory — closes F4):
- Path A (preferred): reinstall with full-disk LUKS2 from the Debian installer (`/`, `/home`, swap all on encrypted PVs). Set up TPM2 unattended unlock post-install: `systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+7 /dev/nvmeXnYpZ`. PCR 0+7 binds to firmware + secure-boot state; a firmware update bricks the auto-unlock → fall back to passphrase.
- Path B (fallback if reinstall is blocked): LUKS-on-file loopback for the high-value subset only:
  - `/opt/docker/step-ca/`
  - `/opt/docker/traefik/data/acme*.json`
  - `/opt/docker/headscale/`
  - postgres data dirs
  - Mongo keyfile volume

  This is strictly worse than Path A (the rest of the disk stays cleartext, including misskey uploads and forgejo repos), but it covers the highest-value subset. Document as accepted risk.
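Path B can be sketched roughly as follows. Assumptions invented for illustration: a 20 GiB container file at `/var/crypt/highvalue.img`, mapper name `highvalue`, mount point `/mnt/highvalue` — adjust sizes and paths to the actual subset:

```shell
# Create a sparse container file and put LUKS2 on it (run as root).
truncate -s 20G /var/crypt/highvalue.img
cryptsetup luksFormat --type luks2 /var/crypt/highvalue.img   # prompts for passphrase
cryptsetup open /var/crypt/highvalue.img highvalue
mkfs.ext4 /dev/mapper/highvalue
mkdir -p /mnt/highvalue
mount /dev/mapper/highvalue /mnt/highvalue

# Then move the high-value dirs in and symlink back, e.g.:
#   mv /opt/docker/step-ca /mnt/highvalue/step-ca
#   ln -s /mnt/highvalue/step-ca /opt/docker/step-ca

# Caveat: the mapping must be opened (passphrase) before `docker compose up`
# on every boot — add a systemd unit for it or accept the manual step.
```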
Hostname + base packages:
sudo hostnamectl set-hostname cobblestone
sudo apt update && sudo apt install -y \
curl ca-certificates gnupg jq ufw fail2ban chrony \
rsync restic tmux htop iotop ncdu
DE strip vs keep — recommendation: STRIP.
Cost of keeping: ~500 MiB RAM, ~5 GiB disk, larger attack surface (CUPS, avahi, polkit, GUI daemons on localhost). Benefit: local browser for vhost testing, on-keyboard recovery if SSH wedges.
- Default (strip): `sudo apt purge '*-desktop' '*xorg*' lightdm sddm gdm3 'plymouth*' libreoffice-* && sudo apt autoremove --purge`. Install Cockpit for web admin behind Traefik + `no-guest@file`.
- Keep: lock SDDM/GDM local-only via PAM, disable XDMCP, mask cups-browsed. No auto-login.
Operator picks; document choice in SYSTEM.md.
3b — Network
IP allocation during cutover — use 192.168.0.101 for
cobblestone while nullstone stays on .100. Flip DNS / port-forwards
last (section 4.5). Avoids ARP collisions and keeps rollback trivial.
nftables ruleset (mirror nullstone pattern — read live ruleset off nullstone in 1.3, replay on cobblestone):
sudo systemctl enable --now nftables
# Drop in /etc/nftables.conf with:
# - default policy drop on input
# - accept established/related
# - accept lo
# - accept 22 (SSH) from LAN + tailnet
# - accept 80/443 (Traefik) from anywhere
# - accept 222 (Forgejo SSH) from LAN + tailnet
# - accept 25565 (Minecraft) from anywhere
# - log+drop everything else
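The commented skeleton above, written out as a minimal `/etc/nftables.conf` sketch. The LAN range (192.168.0.0/24) and tailnet range (100.64.0.0/10, the Headscale default) are taken from elsewhere in this runbook; verify both against the live 1.3 capture. No drop policy is set on forward here because Docker manages its own forward/NAT chains via iptables-nft:

```nft
#!/usr/sbin/nft -f
flush ruleset

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;

    ct state established,related accept
    ct state invalid drop
    iifname "lo" accept

    # SSH + Forgejo SSH from LAN + tailnet only
    tcp dport { 22, 222 } ip saddr { 192.168.0.0/24, 100.64.0.0/10 } accept

    # Traefik + Minecraft from anywhere
    tcp dport { 80, 443, 25565 } accept

    log prefix "nft-drop: " counter drop
  }
}
```

Load with `sudo nft -f /etc/nftables.conf` from a console session (not SSH) the first time, in case the SSH accept rule is wrong.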
IPv6: audit reports nullstone has net.ipv4.ip_forward=1 (F30).
That was an unintended carryover from a Tailscale subnet-router
experiment. Do NOT copy /etc/sysctl.d/ from nullstone wholesale.
Instead, set explicitly:
sudo tee /etc/sysctl.d/99-cobblestone.conf <<'EOF'
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
EOF
sudo sysctl --system
If Headscale or Tailscale subnet-router is wired later, re-enable
ip_forward with explicit comment + audit note.
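To confirm the values actually took (and that nothing re-enabled forwarding later), a small checker sketched as a POSIX-sh helper — `check_sysctl` and the `GETTER` override are inventions for this runbook; `GETTER` exists only so the logic is testable without root:

```shell
# check_sysctl: read "key = value" pairs on stdin and compare each against
# the live kernel value (sysctl -n). Non-zero exit on any mismatch.
check_sysctl() {
  getter="${GETTER:-sysctl -n}"
  rc=0
  while IFS='=' read -r key want; do
    key=$(printf '%s' "$key" | tr -d ' ')
    want=$(printf '%s' "$want" | tr -d ' ')
    [ -z "$key" ] && continue
    got=$($getter "$key" 2>/dev/null | tr -d ' ')
    if [ "$got" = "$want" ]; then
      echo "OK $key=$got"
    else
      echo "FAIL $key want=$want got=$got"
      rc=1
    fi
  done
  return $rc
}

# usage: grep -v '^#' /etc/sysctl.d/99-cobblestone.conf | check_sysctl
```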
Tailscale + Headscale node identity:
- Cleanest path: re-enroll cobblestone from scratch. New node, new node-key; list `cobblestone` separately from `nullstone` in Headscale during cutover week.
- Alternative: copy `/var/lib/tailscale/` from nullstone → cobblestone to inherit the existing identity. Saves one ACL update but conflates audit history. Not recommended.
3c — Docker
Install via official repo:
curl -fsSL https://download.docker.com/linux/debian/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/debian $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update && sudo apt install -y \
docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
/etc/docker/daemon.json — userns-remap decision.
Two paths; operator decides. Document choice in SYSTEM.md.
Path 1 — DROP userns-remap (recommended): same JSON as nullstone
minus the userns-remap line.
- Pros: no more `chown 101000` dance; the nsenter trick (feedback_docker_sudo_bypass.md) drops the `--userns=host` flag; the Mongo keyfile pattern from project_nullstone_docker_userns.md becomes unnecessary; `docker exec` UIDs match the host 1:1.
- Cons: container root → host uid 0. Compensated by `no-new-privileges`, `icc=false`, per-compose CAP drops, read-only root FS where compatible. Net: small regression in defense-in-depth, large workflow simplification.
Path 2 — KEEP userns-remap: carry /etc/subuid + /etc/subgid
identically (user:100000:65536). Existing on-disk ownership at uid
101000 transfers without rechown. Cost: persisting the daily
friction the operator has been hitting for months.
Default: Path 1. If chosen, after rsync:
sudo chown -R user:user /home/docker /opt/docker
# Then per-service to the container uid (forgejo 1000, postgres 999,
# mongo 999, traefik 0).
Networks (must exist before Traefik comes up):
docker network create proxy
docker network create socket-proxy-net
docker network create misskey-frontend
3d — Service redeploy order
Topological. Each step depends only on its predecessors. Verification command and rollback at each stage.
| # | Stack | Depends on | Verify | Rollback |
|---|---|---|---|---|
| 1 | networks (proxy, socket-proxy-net, misskey-frontend) | docker daemon | `docker network ls` | `docker network rm` |
| 2 | socket-proxy | network socket-proxy-net | `docker logs socket-proxy` shows API filter active | down compose |
| 3 | traefik | socket-proxy + acme.json/acme-internal.json carryover + Gandi PAT in .env | `curl -k https://sys.s8n.ru` returns dashboard auth challenge; `docker logs traefik` shows resolver init OK; cert files repopulate without an LE call (acme.json reuse) | down compose; restore acme.json from backup |
| 4 | step-ca | traefik (for ACME-back) | `docker exec step-ca step ca health`; Traefik internal-CA resolver issues a cert against `https://step-ca:9000/acme/acme/directory` | down compose; revert traefik resolver config |
| 5 | headscale | traefik | `curl https://hs.s8n.ru/health`; `docker exec headscale headscale nodes list` shows existing nodes (db.sqlite carryover) | down compose; restore db.sqlite snapshot |
| 6 | authentik (postgres → redis → server → worker) | traefik | `curl https://auth.s8n.ru/-/health/ready/`; OIDC discovery doc loads | per-component down |
| 7 | forgejo | traefik (+ optional authentik, currently unwired) | `curl https://git.s8n.ru/api/v1/version`; `git clone ssh://git@cobblestone:222/...` | down compose; data dir tar-revert |
| 8 | misskey (db → redis → misskey → x-source) | traefik, network misskey-frontend | `curl https://x.veilor/api/meta` returns JSON; signup page renders | down compose; pg dump restore |
| 9 | tuwunel + tuwunel-txt | traefik | `curl https://matrix.veilor.uk/_matrix/federation/v1/version` and `https://mx.s8n.ru/_matrix/federation/v1/version` | down compose; data tar-revert |
| 10 | cinny-txt + commet-web + signup-page + signup-txt | tuwunel reachable, traefik | `curl -I https://txt.s8n.ru` 200; static assets 200 | down compose |
| 11 | livekit-server + lk-jwt-service | traefik (TURN over HTTPS) | `wscat wss://livekit.veilor.uk/`; jwt service `/healthz` | down compose |
| 12 | n8n (postgres → n8n) | traefik, restored encryption key | `curl https://n8n.s8n.ru/healthz`; UI loads with existing flows | pg dump restore |
| 13 | pihole | traefik | `dig @cobblestone \| head`; admin UI auth | down compose |
| 14 | forgejo-runner | forgejo (#7) reachable on internal name | `docker logs forgejo-runner` shows "Runner registered successfully" | down compose; regenerate token via `forgejo actions generate-runner-token` |
| 15 | minecraft-mc | traefik (only for filebrowser-mc), router port-forward 25565 | `mcstatus mc.racked.ru` (or `nc -zv cobblestone 25565`) | down compose; world tar-revert |
| 16 | dl-veilor + filebrowser-mc | traefik | `curl https://dl.veilor.org/v0.2.0/veilor-root.crt` | down compose |
| 17 | anythingllm | traefik with `no-guest@file` middleware applied OR LAN-only bind — must NOT come up like on nullstone (port 3001 publicly exposed, audit F-anythingllm-1) | `curl -I -H 'Host: ai.s8n.ru' https://cobblestone` from off-LAN must 403 | down compose |
| 18 | RocketChat (mongodb → rocketchat) | operator decision — currently stopped on nullstone; if not retired, restore from the mongodump produced in 4.1 | `curl https://rc.s8n.ru/api/info`; first-admin claim if still pending | leave stopped (matches today's state) |
4 — Cutover sequence
4.1 — Snapshot state on nullstone
NS=user@192.168.0.100
TS=$(date +%F-%H%M)
DEST=/opt/snap/$TS
# NB: $DEST is on the machine running this section (the operator
# workstation) — every redirect below lands locally, not on nullstone.
sudo mkdir -p "$DEST" && sudo chown "$(id -un):$(id -gn)" "$DEST"
# Postgres dumps
for pg in authentik-postgres misskey-db n8n-postgres-1; do
ssh $NS "docker exec $pg pg_dumpall -U postgres" \
| gzip > $DEST/$pg.sql.gz
done
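A zero-byte or truncated dump is worse than an obvious failure — verify each archive before moving on (a sketch; `verify_dumps` is a name invented here; the minimum-size threshold is arbitrary and should be raised to match real dump sizes):

```shell
# verify_dumps: every *.gz under $1 must pass gzip's integrity check and
# exceed a minimum byte size ($2, default 100). Non-zero exit on any failure.
verify_dumps() {
  dir=$1; min=${2:-100}
  rc=0
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || { echo "no dumps in $dir"; return 1; }
    size=$(wc -c < "$f")
    if gzip -t "$f" 2>/dev/null && [ "$size" -ge "$min" ]; then
      echo "OK $f ($size bytes)"
    else
      echo "BAD $f"
      rc=1
    fi
  done
  return $rc
}

# usage: verify_dumps "$DEST" 1000000   # expect PG dumps well over 1 MB
```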
# Mongo (start, dump, stop again — currently stopped per audit)
ssh $NS 'cd /opt/docker/rocketchat && docker compose up -d rocketchat-mongodb && sleep 15'
ssh $NS 'docker exec rocketchat-mongodb mongodump \
--username root \
--password "$(grep MONGO_INITDB_ROOT_PASSWORD /opt/docker/rocketchat/.env | cut -d= -f2)" \
--authenticationDatabase admin --archive' \
| gzip > $DEST/rocketchat.archive.gz
ssh $NS 'cd /opt/docker/rocketchat && docker compose stop rocketchat-mongodb'
# Forgejo full dump (covers DB + repos + LFS + attachments)
ssh $NS 'docker exec -u 1000 forgejo \
forgejo dump --type tar.zst --file /tmp/forgejo-dump.tar.zst'
# `docker cp … -` emits a tar stream, not the raw file — unwrap it with tar -xO
ssh $NS 'docker cp forgejo:/tmp/forgejo-dump.tar.zst - | tar -xO' \
> $DEST/forgejo-dump.tar.zst
# Stop everything before tar (consistency)
ssh $NS 'for d in /opt/docker/*/; do \
[ -f "$d/docker-compose.yml" ] && \
(cd "$d" && docker compose down) ; \
done'
# Bulk state tar
ssh $NS "sudo tar --acls --xattrs -cpf - /opt/docker /home/docker /opt/backups" \
| zstd -T0 -19 > $DEST.tar.zst
# Manifest
ssh $NS "find /opt/docker /home/docker -type f -print0 \
| xargs -0 sha256sum" > $DEST.sha256
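The manifest gets replayed on cobblestone after the 4.2 extraction — the paths are absolute, so it can run from anywhere. Sketched as a helper (hypothetical `check_manifest` name; GNU coreutils `sha256sum -c`):

```shell
# check_manifest: replay a sha256sum manifest; abort the runbook on any mismatch.
# --quiet suppresses the per-file "OK" lines, so success output stays readable.
check_manifest() {
  if sha256sum --quiet -c "$1"; then
    echo "manifest clean"
  else
    echo "TRANSFER CORRUPTION — do not proceed to 4.3" >&2
    return 1
  fi
}

# usage (on cobblestone, after 4.2): check_manifest /tmp/nullstone.sha256
```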
Hold the tarball plus dumps in two places: cobblestone target host
and an offline USB. acme.json and step-ca secrets get an
additional armored copy to the password manager.
4.2 — rsync to cobblestone
After the tarball lands, repopulate cobblestone:
COBB=user@192.168.0.101
scp $DEST.tar.zst $COBB:/tmp/snap.tar.zst
ssh $COBB 'sudo mkdir -p /opt/docker /home/docker /opt/backups && \
sudo zstd -d /tmp/snap.tar.zst -o /tmp/snap.tar && \
sudo tar --acls --xattrs -xpf /tmp/snap.tar -C /'
# If userns-remap dropped (Path 1 in 3c):
ssh $COBB 'sudo chown -R user:user /opt/docker /home/docker'
4.3 — Bring up services on cobblestone
Walk section 3d table top to bottom. Stop and verify at each row before the next. Don't batch — one bad startup cascades.
For services that store internal hostnames (Tuwunel server_name,
Headscale server_url, Forgejo ROOT_URL), the values stay the same
because public DNS still resolves to the WAN IP — only the internal LAN
target changes. No app config edits needed for cutover.
4.4 — Verify per vhost
for host in sys.s8n.ru git.s8n.ru auth.s8n.ru pihole.s8n.ru \
signup.txt.s8n.ru hs.s8n.ru rc.s8n.ru n8n.s8n.ru \
txt.s8n.ru mx.s8n.ru x.veilor matrix.veilor.uk \
chat.veilor.uk livekit.veilor.uk signup.veilor.uk \
dl.veilor.org; do
echo -n "$host: "
curl --resolve $host:443:192.168.0.101 -sI https://$host | head -1
done
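Eyeballing 16 status lines is error-prone; pipe the loop's output through a summarizer (a sketch — `summarize_vhosts` is a name invented here; `curl -sI | head -1` leaves a trailing CR, hence the `tr`):

```shell
# summarize_vhosts: read "host: HTTP/x NNN ..." lines and flag anything
# that did not answer with a 2xx/3xx status.
summarize_vhosts() {
  tr -d '\r' | awk -F': ' '
    !/ (2[0-9][0-9]|3[0-9][0-9])( |$)/ { bad++; print "FAIL " $1 }
    END { print (bad ? bad : 0) " failing vhost(s)" }'
}

# usage: <the for-loop above> | summarize_vhosts
```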
Then push key flows:
- `git push nullstone-remote` (the alias still works because DNS is unchanged) — Forgejo CI runs.
- Matrix federation: `curl https://federationtester.matrix.org/api/report?server_name=veilor.uk`.
- Misskey signup: hit the invite-gated form, complete signup, federation test post.
4.5 — Cutover network
Two paths; operator picks based on appetite.
Path A — DNS swing (lower risk, slower propagation):
- Lower `*.s8n.ru` and `*.veilor*` A-record TTLs to 60 s a week before cutover (Gandi UI; can't be done via the API per reference_gandi_api.md).
- Day-of: change A records from `82.31.156.86` (assumed unchanged public IP) only if the WAN NAT target has changed (e.g. router port-forwards now point at `.101`). If the WAN IP and port-forwards stay the same and you swap LAN IPs (`.100` → `.101`), no public DNS edit is needed — only edit `/etc/hosts` on internal clients (per feedback_s8n_hosts_override.md).

Path B — IP takeover (faster, higher rollback friction):
- Bring nullstone down on `.100`, change cobblestone from `.101` → `.100`, restart networking. Public DNS + router port-forwards unchanged. Rollback = swap the IPs back.
Update onyx /etc/hosts long pin line last:
192.168.0.<new> rc.s8n.ru n8n.s8n.ru pihole.s8n.ru sys.s8n.ru \
mx.s8n.ru txt.s8n.ru signup.txt.s8n.ru git.s8n.ru x.veilor \
dl.veilor.org
4.6 — Update memory + ai-lab docs
- `~/ai-lab/CLAUDE.md` — Device Registry: add a `cobblestone` row, mark `nullstone` as `decom 2026-MM-DD`.
- `~/ai-lab/SYSTEM.md` — replace nullstone hardware/network blocks with cobblestone equivalents; keep nullstone as "cold spare" until wipe.
- `~/ai-lab/README.md` — device table one-liner.
- `~/ai-lab/security/` — create a `cobblestone-server/` folder; first audit due within 7 days of cutover.
- Memory files to update: project_nullstone_docker_userns.md (mark superseded if userns-remap dropped), project_forgejo_nullstone.md, project_rocketchat_nullstone.md, project_tailscale_mesh.md, feedback_nullstone_ssh_user.md, feedback_s8n_hosts_override.md (new IP).
4.7 — Cold spare + wipe
- Hold nullstone powered off but cabled, 7 days minimum.
- If no rollback is triggered, wipe: full LUKS reformat (or `nvme format -s1` for crypto-erase if the drive supports it), then either donate or repurpose as a cobblestone backup target (Restic destination — closes audit recommendation #6).
5 — Post-migration immediate fixes
Carried over from nullstone-server/audit-report-2026-05-05.md:
- F-backup-1 — fix `/opt/docker/backup.sh`: remove the dead `matrix-postgres` block (Synapse retired); correct the `rocketchat-mongodb` container name; replace the literal `CHANGE_ME_MONGO_ADMIN_PASSWORD` with a read from `/opt/docker/rocketchat/.env`. Verify the next 02:00 run produces non-zero RC + Mongo dumps.
- `no-guest@file` ACL: populate `sourceRange` to cover LAN (192.168.0.0/24) + tailnet (100.64.0.0/10) + IPv6 equivalents. Verify the XFF chain restores the client IP at the entryPoint level (`forwardedHeaders.trustedIPs`).
- anythingllm: front via Traefik with `no-guest@file` OR bind LAN-only. Must not repeat the 0.0.0.0:3001 nullstone state.
- LUKS: done at install (3a). Verify via `cryptsetup status` + `systemd-cryptenroll --tpm2-device=list` post-cutover.
- Restic + autorestic to B2/Wasabi or to nullstone-as-spare, with a restore drill scheduled.
- Vaultwarden to centralize the secrets currently sprayed across `.env` files.
- Gatus with cert-expiry checks + ntfy/Matrix alerts.
- CrowdSec with a bouncer plugin at Traefik for the public HTTP attack surface.
- Beszel for one-pane host metrics.
6 — Open questions (operator decisions)
| Question | Default if undecided |
|---|---|
| Strip DE on cobblestone? | Strip + Cockpit. Easier to defend; remote admin via web UI through Traefik + no-guest@file. |
| userns-remap on cobblestone? | Off (Path 1 in 3c). Operator pain outweighs the marginal isolation. Document tradeoff. |
| Move Headscale + step-ca to a $4 VPS? | Defer (phase 2). Keep on cobblestone for now; revisit once Restic + Gatus are running. SPOF mitigation is real but adds attack surface; do it once monitoring is in place. |
| RocketChat: bring back up or retire? | Retire if not used in 30 days. Currently stopped; first-admin still unclaimed. Mongo dump captured in 4.1, then drop the stack from cobblestone redeploy. Keeps rc.s8n.ru DNS for future revival. |
| Tailscale identity copy vs re-enroll for cobblestone? | Re-enroll (cleaner audit trail; Headscale ACLs need a one-line edit). |
| SSH host keys copy vs rotate? | Copy. TOFU pinning intact; one less "is this MITM?" prompt for clients. Add rotation to a follow-up cron. |
| Authentik wiring during cutover or after? | After. Authentik is currently mostly unwired (audit). Cutover is not the time to add new auth dependencies. |
7 — Risks (severity-tagged)
- 🔴 acme.json mishandling = LE rate limit. Mitigation: copy `acme.json` + `acme-internal.json` BEFORE bringing up Traefik on cobblestone. Never let cobblestone Traefik issue a fresh batch of certs. Hold a backup of both files in two locations.
- 🔴 step-ca root key loss = full re-issuance. Mitigation: triple-copy `/opt/docker/step-ca/.env` + `data/secrets/` (cobblestone, USB, password manager). Test that the encrypted root key decrypts on cobblestone before tearing down nullstone.
- 🔴 anythingllm reintroduces public 0.0.0.0:3001. Mitigation: do NOT bring it up before the middleware is in place. Test from an off-LAN IP.
- 🟠 PostgreSQL major-version skew. Mitigation: pin the same major on cobblestone (`postgres:16-alpine` already pinned; do NOT use `:latest`). If a major upgrade is desired, do it as a separate step after cutover settles, with a fresh `pg_dumpall` as a safety net.
- 🟠 Headscale node identity churn if `db.sqlite` is not copied. All nodes (onyx, friend RTX 4080 PC, office) re-enroll. Mitigation: copy `db.sqlite` + `private.key`; verify `headscale nodes list` matches pre-cutover before flipping DNS.
- 🟡 chrony NTS peers may need re-trust on the new host (NTS-KE binds to hostname). Mitigation: copy the chrony config verbatim; the first `chronyc tracking` should show a stratum within 5 minutes.
- 🟡 Authentik OIDC `client_secret`s. Today: mostly unwired (audit), so the risk is small. If Forgejo/RC/n8n were wired through Authentik, each `client_secret` would need re-handover. Defer Authentik wiring until post-cutover.
- 🟡 Misskey AGPL §13 source endpoint (`x-source`). Per project_x_misskey_fork.md, the AGPL link must keep serving source — and per the same memo, mute is acceptable for short windows. Cutover downtime budget: ≤ 2 h. If exceeded, post a banner on `x.veilor` linking to `https://git.s8n.ru/s8n-ru/x` for the duration.
- 🟡 Backup script broken on copy. Audit F-backup-1 still applies if you copy `/opt/docker/backup.sh` verbatim. Fix during section 5, not before — but do not let it run on cobblestone before the fix (disable the cron entry until corrected).
Appendix — quick reference
- nullstone: `user@192.168.0.100`, Debian 13, 32 GiB / 477 GiB, ~28 containers, no LUKS (F4).
- cobblestone: `user@192.168.0.101` during cutover, swing to `.100` post-validation.
- LE wildcard `*.s8n.ru` + `*.veilor.uk` via Gandi DNS-01. Internal CA via step-ca, Traefik resolver `internal-ca`.
- Out of scope: office workstation install, friend GPU re-enrollment, veilor-os ISO build pipeline.
Path: /home/admin/ai-lab/_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md
Two-line summary: pre-migration audit + secret catalog + cobblestone install plan (LUKS2, optional userns-remap drop, 18-step topological service redeploy) + cutover script + post-migration fixes carried over from the 2026-05-05 audit. Operator must fill in the "things we don't know about cobblestone" table and decide on userns-remap / DE / RC retirement before section 3 runs.