infra/runbooks/MIGRATION-nullstone-to-cobblestone.md
s8n 09d80a63f6 init: nullstone deploys + runbooks + audits
Sourced from previous audits + agent-wave outputs (2026-05-05):
  AUDIT-2026-05-05.md           — 5-agent stack synthesis
  forgejo/DEPLOY.md             — git.s8n.ru deploy runbook
  forgejo/forgejo-compose.yml   — production compose
  forgejo/runner-compose.yml    — forgejo-runner
  forgejo/migration-report-...  — GH→Forgejo migration audit (6/6 green)
  runbooks/MIGRATION-...        — nullstone→cobblestone runbook
  runbooks/DE-DECISION-...      — keep-vs-strip DE on cobblestone
  repos/REPO-AUDIT-2026-05-05.md — repo trees + ownership
2026-05-06 10:02:28 +01:00


<!--
Migration runbook: nullstone → cobblestone
Audience: P M (operator), nullstone Runtime Owner.
Status: DRAFT — pre-cutover. Read sections 1–3 first; sections 4–7 are
executed only on cutover day.
Source-of-truth audits referenced:
- ~/ai-lab/SYSTEM.md
- ~/ai-lab/nullstone-server/audit-report-2026-05-05.md
- ~/ai-lab/nullstone-server/forgejo/deploy-runbook.md
Last updated: 2026-05-06
-->
# Migration runbook — nullstone → cobblestone
Goal: relocate the Docker stack (~28 containers, ~227 GiB state) from
**nullstone** (Debian 13, 192.168.0.100, AMD Ryzen 5 2600X / 32 GiB /
477 GiB NVMe, no LUKS) to **cobblestone** (Debian, fresh, LAN, hardware
TBD by operator), and close audit regression **F4 (no LUKS at rest)**
in the same window.
This runbook is read-only on both hosts until cutover (section 4).
Sections 1–3 are inventory + planning; section 4 is the destructive
cutover; sections 5–7 are follow-through.
## Things we don't know about cobblestone yet — operator to fill in
| Question | Why it matters | Default if unset |
|---|---|---|
| CPU model / cores / threads | Sizing for parallel postgres + Ollama + MC | Assume ≥ Ryzen 5 2600X parity |
| RAM | 32 GiB nullstone runs 50 % util peak; less = trim MC + Ollama | Require ≥ 32 GiB |
| Storage layout (LVM? ZFS? plain?) | Decides LUKS strategy in 3a | Assume single NVMe, plain ext4 |
| GPU present (any) | Ollama / vLLM / Misskey thumb GPU helpers | Assume none, leave Ollama on friend RTX 4080 |
| LUKS already enabled at install? | If no → reinstall window or LUKS-on-file fallback | Assume **no** (act accordingly) |
| Static IP allocated? | Cutover plan needs a parking IP | Assume DHCP, target `.101` for cutover |
| DE installed? | Strip vs keep debate | Confirmed installed; default = strip |
| User account name + uid | Bind-mount permissions on /home/docker | Assume `user`, uid 1000 (mirror nullstone) |
Update this table before running section 3.
---
## 1 — Pre-migration audit (run on nullstone)
All commands read-only. SSH as `user@192.168.0.100`
(per `feedback_nullstone_ssh_user.md`; `admin@` is rejected).
### 1.1 Container inventory
```bash
ssh user@192.168.0.100 'docker ps -a --format "{{json .}}"' \
> nullstone-containers-$(date +%F).jsonl
ssh user@192.168.0.100 'docker inspect $(docker ps -aq)' \
> nullstone-inspect-$(date +%F).json
```
Parse for `Names`, `Image`, `Mounts[].Source`, `NetworkSettings.Networks`,
`HostConfig.RestartPolicy`, `Config.Labels` (Traefik routers).
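One way to pull those fields out of the inspect dump is a small `jq` helper (field names as emitted by `docker inspect`; a sketch, not part of the original runbook):

```bash
# Summarize an inspect dump: one TSV line per container with
# name, image, restart policy, bind-mount sources, attached networks.
summarize_inspect() {
  jq -r '.[] | [
      .Name,
      .Config.Image,
      .HostConfig.RestartPolicy.Name,
      ([.Mounts[].Source] | join(",")),
      (.NetworkSettings.Networks | keys | join(","))
    ] | @tsv' "$1"
}
# e.g.: summarize_inspect nullstone-inspect-2026-05-06.json
```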
### 1.2 Volumes (size estimate)
```bash
ssh user@192.168.0.100 'docker volume ls --format "{{.Name}}"' \
| xargs -I {} ssh user@192.168.0.100 \
"docker run --rm -v {}:/v alpine du -sh /v 2>/dev/null | sed 's|/v|{}|'"
```
Cross-reference with `/home/user/docker-data/100000.100000/volumes/`
(userns-remapped path) for per-volume bytes.
### 1.3 Network
```bash
ssh user@192.168.0.100 'docker network ls; \
ss -tlnp 2>/dev/null | grep LISTEN; \
iptables-save 2>/dev/null; nft list ruleset 2>/dev/null'
```
Capture Traefik vhosts:
```bash
ssh user@192.168.0.100 'cd /opt/docker/traefik && \
ls dynamic/; cat dynamic/*.yml | grep -E "rule:|sourceRange:"'
```
### 1.4 Cron + scheduled tasks
```bash
ssh user@192.168.0.100 'sudo cat /etc/crontab /etc/cron.d/* 2>/dev/null; \
for u in $(cut -d: -f1 /etc/passwd); do \
crontab -u $u -l 2>/dev/null && echo "(user $u)"; done'
```
Known: `/etc/cron.d/docker-backup` runs `/opt/docker/backup.sh` daily at
02:00 — **broken** (F-backup-1, fix in section 5).
### 1.5 Systemd
```bash
ssh user@192.168.0.100 'systemctl list-unit-files \
--state=enabled --type=service --no-pager'
```
Watch for: `docker.service`, `tailscaled.service`, `ollama.service`
(Ollama runs on host, not in Docker), `chrony.service`, `ssh.service`.
### 1.6 Disk + memory + cpu baseline
```bash
ssh user@192.168.0.100 'df -hT; \
sudo du -sh /home/docker/* /opt/docker/* /opt/backups 2>/dev/null; \
free -h; lscpu | head -20; nproc'
```
Reference (2026-05-06 spot check):
`/` 30 G (37 %) · `/var` 12 G (17 %) · `/home` 399 G (60 %, 226 G used).
Most state is on `/home`.
### 1.7 Daemon config
```bash
ssh user@192.168.0.100 'cat /etc/docker/daemon.json /etc/subuid /etc/subgid; \
sudo cat /etc/systemd/system/docker.service.d/override.conf 2>/dev/null'
```
Known good (carry forward except possibly userns-remap, see 3c):
```json
{
"log-driver": "json-file",
"log-opts": {"max-size": "10m", "max-file": "3"},
"live-restore": true,
"icc": false,
"userns-remap": "default",
"default-address-pools": [{"base": "172.20.0.0/16", "size": 24}],
"storage-driver": "overlay2",
"no-new-privileges": true
}
```
---
## 2 — Secret + state catalog
Anything in this table that is **lost** or **corrupted** during transfer
forces re-issuance / re-pinning / re-handshake. Group by criticality.
### Tier 0 — irreplaceable (lose this and external systems break)
| Path | Bytes (est.) | Restore cost if lost |
|---|---|---|
| `/opt/docker/step-ca/data/secrets/` + `/opt/docker/step-ca/.env` | < 1 MiB | Re-issue every internal cert; reinstall `veilor-root.crt` on every device that uses `*.veilor` / internal-CA chains. Hard. |
| `/opt/docker/traefik/data/acme.json` (LE prod) | < 1 MiB | Hits LE rate-limit (5 dupe certs/wk per FQDN, 50 certs/wk per registered domain). Could lock cert issuance for a full week. |
| `/opt/docker/traefik/data/acme-internal.json` (step-ca chain) | < 1 MiB | Step-ca re-issues fast, but every leaf reissue invalidates pinned trust anchors. |
| `/opt/docker/headscale/config/private.key` + `/opt/docker/headscale/data/db.sqlite` | < 50 MiB | Loss = every node re-enrolls; preauthkeys, routes, ACLs reset. Friend GPU node identity churn. |
| `/etc/ssh/ssh_host_*` | < 1 MiB | Either copy (TOFU pinning stays intact), OR rotate (all clients hit a "key changed" warning; acceptable but noisy). |
### Tier 1 — application secrets (loss → password reset cascade)
| Path | Bytes (est.) | Notes |
|---|---|---|
| `/opt/docker/forgejo/data/gitea/conf/app.ini` (note: file is `app.ini` under `gitea/conf/` even on Forgejo) | ~10 KiB | `SECRET_KEY`, `INTERNAL_TOKEN`, `JWT_SECRET`, `LFS_JWT_SECRET`, OAuth client secrets. |
| `/opt/docker/authentik/.env` + authentik PG dump | tens of MiB | `AUTHENTIK_SECRET_KEY`, `PG_PASS`. Any service trusting Authentik OIDC needs `client_secret` re-handover. |
| `/opt/docker/misskey/.env` + misskey PG dump | < 1 MiB env | `id`, `db.user/pass`, `redis.pass`, master key. |
| `/opt/docker/n8n/.env` + n8n PG dump | < 1 MiB env | Encryption key for credentials at rest; **lose this and stored creds inside n8n flows are unrecoverable**. |
| `/opt/docker/rocketchat/.env` + Mongo dump (currently stopped; see 4.1) | < 1 MiB env | First-admin still unclaimed (audit risk item). |
| `/opt/docker/tuwunel*/etc/tuwunel.toml` | < 1 MiB | Server signing key seed; lose = federation re-onboard from zero. |
| `/opt/docker/livekit/livekit.yaml` | < 1 KiB | `keys:` map (api-key → secret); the JWT minter (`lk-jwt-service`) shares this. |
| `/opt/docker/pihole/etc-pihole/` | ~50 MiB | Adlists + custom DNS; rebuildable in 30 min if lost. |
| Gandi PAT (`GANDIV5_PERSONAL_ACCESS_TOKEN` in `/opt/docker/traefik/.env`) | <1 KiB | Re-issuable from Gandi UI; LiveDNS-only scope (per `reference_gandi_api.md`). |
| Tailscale auth keys (Headscale) | regenerate via `headscale preauthkeys create` | OK to regenerate. |
### Tier 2 — bulk data (large, but reproducible OR low-stakes)
| Path | Bytes (est.) | Notes |
|---|---|---|
| Misskey `/files/` (S3-style local) | tens of GiB | User uploads irreplaceable to users. Dedup-friendly. |
| Forgejo `/home/docker/forgejo/data/git/` | ~5 GiB now | Git repos; also mirrored to GH per `project_forgejo_nullstone.md`, so partial DR exists. |
| `dl-veilor` static files | ~1 GiB | Public ISO downloads; rebuildable from veilor-os pipeline. |
| n8n flows (in `n8n_n8n_data`) | < 1 GiB | Encrypted with key from Tier 1; export JSON via UI as belt-and-braces. |
| Minecraft world (`/home/docker/minecraft/data/`) | ~10–30 GiB | Players will riot if lost. |
| Ollama models (`/home/user/models/ollama/`) | ~17 GiB | Re-downloadable from registry; not blocking. |
| Postgres dumps (authentik, misskey-db, n8n-postgres) | covered by `pg_dumpall` in 4.1 | |
| MongoDB dump (rocketchat-mongodb) | covered by `mongodump` in 4.1 | Container is **stopped** today; start, dump, stop again. |
### Tier 3 — config-as-code (safely re-deployable from `~/ai-lab/_github/`)
- All `/opt/docker/*/docker-compose.yml` committed under
`~/ai-lab/_github/infra/repos/` and `~/ai-lab/nullstone-server/`.
- Traefik `dynamic/*.yml` middleware files.
- Treat as authoritative in repo; copy from repo to cobblestone, not
from nullstone. Diff old-compose vs repo-compose during section 3d to
catch any uncommitted drift.
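The 3d drift check could be sketched as a helper like the one below (the `$repo_root/<service>/docker-compose.yml` layout is an assumption; adjust to the actual repo tree):

```bash
# Report uncommitted drift between live compose files and repo copies.
# Non-empty diff output = drift to reconcile before redeploy.
compose_drift() {
  local repo_root=$1 live_root=$2 d svc live repo
  for d in "$live_root"/*/; do
    svc=$(basename "$d")
    live="$d/docker-compose.yml"
    repo="$repo_root/$svc/docker-compose.yml"
    [ -f "$live" ] || continue
    if [ -f "$repo" ]; then
      diff -u "$repo" "$live" || true   # show drift, keep scanning
    else
      echo "NOT IN REPO: $svc"
    fi
  done
}
# e.g.: compose_drift ~/ai-lab/_github/infra/repos /opt/docker
```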
---
## 3 — Cobblestone install plan
### 3a — OS layer
Verify base:
```bash
ssh user@cobblestone 'cat /etc/debian_version; uname -r; lsb_release -a'
```
**LUKS2 (mandatory — closes F4):**
- **Path A (preferred):** reinstall with full-disk LUKS2 from the
Debian installer (`/`, `/home`, swap all on encrypted PVs). Set up
TPM2 unattended unlock post-install:
```bash
systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+7 /dev/nvmeXnYpZ
```
PCR 0+7 binds to firmware + secure-boot state; a firmware update
breaks auto-unlock, so keep the passphrase as fallback.
- **Path B (fallback if reinstall blocked):** LUKS-on-file loopback
for the high-value subset only:
- `/opt/docker/step-ca/`
- `/opt/docker/traefik/data/acme*.json`
- `/opt/docker/headscale/`
- postgres data dirs
- Mongo keyfile volume
This is **strictly worse** than Path A (rest of disk still
cleartext, including misskey uploads and forgejo repos), but it
closes the highest-value subset. Document as accepted risk.
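A Path B container for one of the listed directories might be created like this (sizes, paths, and mapper names are illustrative; everything here needs root and is destructive to the target file):

```bash
# Create a 10 GiB LUKS2 container file and mount it at /opt/docker/step-ca.
sudo fallocate -l 10G /crypt/step-ca.img
sudo cryptsetup luksFormat --type luks2 /crypt/step-ca.img
sudo cryptsetup open /crypt/step-ca.img step-ca-crypt
sudo mkfs.ext4 /dev/mapper/step-ca-crypt
sudo mkdir -p /opt/docker/step-ca
sudo mount /dev/mapper/step-ca-crypt /opt/docker/step-ca
# Persist across boots via /etc/crypttab + /etc/fstab, e.g.:
#   step-ca-crypt /crypt/step-ca.img none luks
#   /dev/mapper/step-ca-crypt /opt/docker/step-ca ext4 defaults,nofail 0 2
```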
Hostname + base packages:
```bash
sudo hostnamectl set-hostname cobblestone
sudo apt update && sudo apt install -y \
curl ca-certificates gnupg jq ufw fail2ban chrony \
rsync restic tmux htop iotop ncdu
```
**DE strip vs keep — recommendation: STRIP.**
Cost of keeping: ~500 MiB RAM, ~5 GiB disk, larger attack surface
(CUPS, avahi, polkit, GUI daemons on localhost). Benefit: local
browser for vhost testing, on-keyboard recovery if SSH wedges.
- **Default (strip):** `sudo apt purge '*-desktop' '*xorg*' lightdm
sddm gdm3 'plymouth*' libreoffice-* && sudo apt autoremove --purge`.
Install Cockpit for web admin behind Traefik + `no-guest@file`.
- **Keep:** lock SDDM/GDM local-only via PAM, disable XDMCP, mask
`cups-browsed`. No auto-login.
Operator picks; document choice in SYSTEM.md.
### 3b — Network
**IP allocation during cutover:** use `192.168.0.101` for
cobblestone while nullstone stays on `.100`. Flip DNS / port-forwards
last (section 4.5). This avoids ARP collisions and keeps rollback trivial.
**nftables ruleset** (mirror the nullstone pattern: read the live
ruleset off nullstone in 1.3, replay on cobblestone):
```bash
sudo systemctl enable --now nftables
# Drop in /etc/nftables.conf with:
# - default policy drop on input
# - accept established/related
# - accept lo
# - accept 22 (SSH) from LAN + tailnet
# - accept 80/443 (Traefik) from anywhere
# - accept 222 (Forgejo SSH) from LAN + tailnet
# - accept 25565 (Minecraft) from anywhere
# - log+drop everything else
```
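One way to realize the comment list as an actual `/etc/nftables.conf` (a sketch; the LAN/tailnet CIDRs are taken from the `no-guest@file` notes in section 5, and ICMP/ICMPv6 accepts are added so ping and IPv6 neighbor discovery survive the drop policy):

```
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    icmp type echo-request accept
    icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert, nd-router-advert, echo-request } accept
    tcp dport 22 ip saddr { 192.168.0.0/24, 100.64.0.0/10 } accept
    tcp dport { 80, 443 } accept
    tcp dport 222 ip saddr { 192.168.0.0/24, 100.64.0.0/10 } accept
    tcp dport 25565 accept
    log prefix "nft-drop: " drop
  }
}
```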
**IPv6:** audit reports nullstone has `net.ipv4.ip_forward=1` (F30).
That was an *unintended carryover* from a Tailscale subnet-router
experiment. **Do NOT** copy `/etc/sysctl.d/` from nullstone wholesale.
Instead, set explicitly:
```bash
sudo tee /etc/sysctl.d/99-cobblestone.conf <<'EOF'
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
EOF
sudo sysctl --system
```
If Headscale or Tailscale subnet-router is wired later, re-enable
`ip_forward` with explicit comment + audit note.
**Tailscale + Headscale node identity:**
- Cleanest path: re-enroll cobblestone from scratch. New node, new
node-key, list `cobblestone` separately from `nullstone` in
Headscale during cutover week.
- Alternative: copy `/var/lib/tailscale/` from nullstone to
cobblestone to inherit the existing identity. Saves one ACL update
but conflates audit history. Not recommended.
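The recommended re-enroll path, sketched with the stock Headscale and Tailscale CLIs (flag names per recent Headscale releases; the `--user` value and key lifetime are assumptions):

```bash
# On the current Headscale host: mint a short-lived preauth key.
docker exec headscale headscale preauthkeys create --user s8n --expiration 1h
# On cobblestone: join as a brand-new node against hs.s8n.ru.
sudo tailscale up --login-server https://hs.s8n.ru \
  --authkey <key-from-previous-step> --hostname cobblestone
# Confirm both hosts are listed side by side during cutover week.
docker exec headscale headscale nodes list
```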
### 3c — Docker
Install via official repo:
```bash
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/debian $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update && sudo apt install -y \
docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
**`/etc/docker/daemon.json` userns-remap decision.**
Two paths; operator decides. Document choice in SYSTEM.md.
**Path 1 — DROP userns-remap (recommended):** same JSON as nullstone
minus the `userns-remap` line.
- Pros: no more `chown 101000` dance; nsenter trick
(`feedback_docker_sudo_bypass.md`) drops the `--userns=host` flag;
Mongo keyfile pattern from `project_nullstone_docker_userns.md`
becomes unnecessary; `docker exec` UIDs match host 1:1.
- Cons: container root = host uid 0. Compensated by
`no-new-privileges`, `icc=false`, per-compose CAP drops, read-only
root FS where compatible. Net: small regression in defense-in-depth,
large workflow simplification.
**Path 2 — KEEP userns-remap:** carry `/etc/subuid` + `/etc/subgid`
identically (`user:100000:65536`). Existing on-disk ownership at uid
`101000` transfers without rechown. Cost: persisting the daily
friction the operator has been hitting for months.
**Default: Path 1.** If chosen, after rsync:
```bash
sudo chown -R user:user /home/docker /opt/docker
# Then per-service to the container uid (forgejo 1000, postgres 999,
# mongo 999, traefik 0).
```
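Under Path 1, the resulting `/etc/docker/daemon.json` is simply the known-good nullstone config from 1.7 with the `userns-remap` line dropped:

```json
{
  "log-driver": "json-file",
  "log-opts": {"max-size": "10m", "max-file": "3"},
  "live-restore": true,
  "icc": false,
  "default-address-pools": [{"base": "172.20.0.0/16", "size": 24}],
  "storage-driver": "overlay2",
  "no-new-privileges": true
}
```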
Networks (must exist before Traefik comes up):
```bash
docker network create proxy
docker network create socket-proxy-net
docker network create misskey-frontend
```
### 3d — Service redeploy order
Topological. Each step depends only on its predecessors. Verification
command and rollback at each stage.
| # | Stack | Depends on | Verify | Rollback |
|---|---|---|---|---|
| 1 | networks (`proxy`, `socket-proxy-net`, `misskey-frontend`) | docker daemon | `docker network ls` | `docker network rm` |
| 2 | `socket-proxy` | network `socket-proxy-net` | `docker logs socket-proxy` shows API filter active | down compose |
| 3 | `traefik` | socket-proxy + acme.json/acme-internal.json carryover + Gandi PAT in .env | `curl -k https://sys.s8n.ru` returns dashboard auth challenge; `docker logs traefik` shows resolver init OK; cert files repopulate without LE call (acme.json reuse) | down compose; acme.json restore from backup |
| 4 | `step-ca` | traefik (for ACME-back) | `docker exec step-ca step ca health`; Traefik internal-CA resolver issues a cert against `https://step-ca:9000/acme/acme/directory` | down compose; revert traefik resolver config |
| 5 | `headscale` | traefik | `curl https://hs.s8n.ru/health`; `docker exec headscale headscale nodes list` shows existing nodes (db.sqlite carryover) | down compose; restore db.sqlite snapshot |
| 6 | authentik (`postgres redis server worker`) | traefik | `curl https://auth.s8n.ru/-/health/ready/`; OIDC discovery doc loads | per-component down |
| 7 | `forgejo` | traefik (+ optional authentik, currently unwired) | `curl https://git.s8n.ru/api/v1/version`; `git clone ssh://git@cobblestone:222/...` | down compose; data dir tar-revert |
| 8 | misskey (`db redis misskey x-source`) | traefik, network `misskey-frontend` | `curl https://x.veilor/api/meta` returns JSON; signup page renders | down compose; pg dump restore |
| 9 | `tuwunel` + `tuwunel-txt` | traefik | `curl https://matrix.veilor.uk/_matrix/federation/v1/version` and `https://mx.s8n.ru/_matrix/federation/v1/version` | down compose; data tar-revert |
| 10 | `cinny-txt` + `commet-web` + `signup-page` + `signup-txt` | tuwunel reachable, traefik | `curl -I https://txt.s8n.ru` 200; static assets 200 | down compose |
| 11 | `livekit-server` + `lk-jwt-service` | traefik (TURN over HTTPS) | `wscat -c wss://livekit.veilor.uk`; jwt service `/healthz` | down compose |
| 12 | n8n (`postgres n8n`) | traefik, restored encryption key | `curl https://n8n.s8n.ru/healthz`; UI loads with existing flows | pg dump restore |
| 13 | `pihole` | traefik | `dig @cobblestone | head`; admin UI auth | down compose |
| 14 | `forgejo-runner` | forgejo (#7) reachable on internal name | `docker logs forgejo-runner` shows `Runner registered successfully` | down compose; regenerate token via `forgejo actions generate-runner-token` |
| 15 | `minecraft-mc` | traefik (only for filebrowser-mc), router port-forward 25565 | `mcstatus mc.racked.ru` (or `nc -zv cobblestone 25565`) | down compose; world tar-revert |
| 16 | `dl-veilor` + `filebrowser-mc` | traefik | `curl https://dl.veilor.org/v0.2.0/veilor-root.crt` | down compose |
| 17 | `anythingllm` | traefik **with `no-guest@file` middleware applied** OR LAN-only bind; must NOT come up like on nullstone (port 3001 publicly exposed, audit F-anythingllm-1) | `curl -I -H 'Host: ai.s8n.ru' https://cobblestone` from off-LAN must 403 | down compose |
| 18 | RocketChat (`mongodb rocketchat`) | **operator decision**: currently stopped on nullstone; if not retired, restore from the mongodump produced in 4.1 | `curl https://rc.s8n.ru/api/info`; first-admin claim if still pending | leave stopped (matches today's state) |
---
## 4 — Cutover sequence
### 4.1 — Snapshot state on nullstone
```bash
NS=user@192.168.0.100
TS=$(date +%F-%H%M)
DEST=/opt/snap/$TS
mkdir -p $DEST    # local staging dir; the dumps below are written locally
ssh $NS "sudo mkdir -p $DEST && sudo chown user:user $DEST"
# Postgres dumps
for pg in authentik-postgres misskey-db n8n-postgres-1; do
ssh $NS "docker exec $pg pg_dumpall -U postgres" \
| gzip > $DEST/$pg.sql.gz
done
# Mongo (start, dump, stop again — currently stopped per audit)
ssh $NS 'cd /opt/docker/rocketchat && docker compose up -d rocketchat-mongodb && sleep 15'
ssh $NS 'docker exec rocketchat-mongodb mongodump \
--username root \
--password "$(grep MONGO_INITDB_ROOT_PASSWORD /opt/docker/rocketchat/.env | cut -d= -f2)" \
--authenticationDatabase admin --archive' \
| gzip > $DEST/rocketchat.archive.gz
ssh $NS 'cd /opt/docker/rocketchat && docker compose stop rocketchat-mongodb'
# Forgejo full dump (covers DB + repos + LFS + attachments)
ssh $NS 'docker exec -u 1000 forgejo \
forgejo dump --type tar.zst --file /tmp/forgejo-dump.tar.zst'
# (docker cp to '-' emits a tar stream; exec + cat gives the raw file)
ssh $NS 'docker exec forgejo cat /tmp/forgejo-dump.tar.zst' \
> $DEST/forgejo-dump.tar.zst
# Stop everything before tar (consistency)
ssh $NS 'for d in /opt/docker/*/; do \
[ -f "$d/docker-compose.yml" ] && \
(cd "$d" && docker compose down) ; \
done'
# Bulk state tar
ssh $NS "sudo tar --acls --xattrs -cpf - /opt/docker /home/docker /opt/backups" \
| zstd -T0 -19 > $DEST.tar.zst
# Manifest
ssh $NS "find /opt/docker /home/docker -type f -print0 \
| xargs -0 sha256sum" > $DEST.sha256
```
Hold the tarball plus dumps in two places: cobblestone target host
and an offline USB. `acme.json` and step-ca secrets get an
*additional* armored copy to the password manager.
### 4.2 — rsync to cobblestone
After the tarball lands, repopulate cobblestone:
```bash
COBB=user@192.168.0.101
scp $DEST.tar.zst $COBB:/tmp/snap.tar.zst
ssh $COBB 'sudo mkdir -p /opt/docker /home/docker /opt/backups && \
sudo zstd -d /tmp/snap.tar.zst -o /tmp/snap.tar && \
sudo tar --acls --xattrs -xpf /tmp/snap.tar -C /'
# If userns-remap dropped (Path 1 in 3c):
ssh $COBB 'sudo chown -R user:user /opt/docker /home/docker'
```
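After the extract, the manifest from 4.1 can confirm the tree landed intact. A small helper (a sketch; run it as root on cobblestone so every file in the manifest is readable):

```bash
# Verify a restored tree against the sha256 manifest from 4.1.
# sha256sum -c exits non-zero on any mismatch or missing file.
verify_manifest() {
  sha256sum --quiet -c "$1" \
    && echo "manifest OK" \
    || echo "MANIFEST MISMATCH: investigate before starting services"
}
# e.g.: scp $DEST.sha256 $COBB:/tmp/ then, on cobblestone:
#       verify_manifest /tmp/<ts>.sha256
```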
### 4.3 — Bring up services on cobblestone
Walk the section 3d table top to bottom. **Stop and verify** at each row
before the next. Don't batch; one bad startup cascades.
For services that store internal hostnames (Tuwunel `server_name`,
Headscale `server_url`, Forgejo `ROOT_URL`), the values stay the same
because public DNS still resolves to the WAN IP; only the internal LAN
target changes. No app config edits are needed for cutover.
### 4.4 — Verify per vhost
```bash
for host in sys.s8n.ru git.s8n.ru auth.s8n.ru pihole.s8n.ru \
signup.txt.s8n.ru hs.s8n.ru rc.s8n.ru n8n.s8n.ru \
txt.s8n.ru mx.s8n.ru x.veilor matrix.veilor.uk \
chat.veilor.uk livekit.veilor.uk signup.veilor.uk \
dl.veilor.org; do
echo -n "$host: "
curl --resolve $host:443:192.168.0.101 -sI https://$host | head -1
done
```
Then push key flows:
- `git push nullstone-remote` (alias still works because DNS is
unchanged); confirm Forgejo CI runs.
- Matrix federation: `curl https://federationtester.matrix.org/api/report?server_name=veilor.uk`.
- Misskey signup: hit invite-gated form, complete signup, federation
test post.
### 4.5 — Cutover network
Two paths; operator picks based on appetite.
**Path A — DNS swing** (lower risk, slower propagation):
1. Lower `*.s8n.ru` and `*.veilor*` A-record TTLs to 60 s **a week
before** cutover (Gandi UI; can't be done via API per
`reference_gandi_api.md`).
2. Day-of: change A records from `82.31.156.86` (assumed unchanged
public IP) only if the WAN NAT target has changed (e.g. router
port-forwards now point at `.101`). If WAN IP and port-forwards
stay the same and you swap LAN IPs (`.100` ↔ `.101`), no public
DNS edit is needed; only edit `/etc/hosts` on internal clients (per
`feedback_s8n_hosts_override.md`).
**Path B — IP takeover** (faster, higher rollback friction):
- Bring nullstone down on `.100`, change cobblestone from `.101` to
`.100`, restart networking. Public DNS + router port-forwards stay
unchanged. Rollback = swap IPs back.
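If cobblestone uses classic Debian ifupdown (an assumption; adjust for NetworkManager or systemd-networkd), the Path B flip might look like:

```bash
# Interface name and gateway are illustrative; check with `ip link`.
sudo tee /etc/network/interfaces.d/lan <<'EOF'
auto enp3s0
iface enp3s0 inet static
    address 192.168.0.100/24
    gateway 192.168.0.1
EOF
sudo systemctl restart networking
ip -4 addr show enp3s0    # confirm .100 before re-pointing clients
```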
Update the long pin line in onyx `/etc/hosts` **last** (one physical
line; `/etc/hosts` has no continuation syntax):
```
192.168.0.<new> rc.s8n.ru n8n.s8n.ru pihole.s8n.ru sys.s8n.ru mx.s8n.ru txt.s8n.ru signup.txt.s8n.ru git.s8n.ru x.veilor dl.veilor.org
```
### 4.6 — Update memory + ai-lab docs
- `~/ai-lab/CLAUDE.md` Device Registry: add a `cobblestone` row, mark
`nullstone` as `decom 2026-MM-DD`.
- `~/ai-lab/SYSTEM.md`: replace nullstone hardware/network blocks
with cobblestone equivalents; keep nullstone as "cold spare" until
wipe.
- `~/ai-lab/README.md`: update the device-table one-liner.
- `~/ai-lab/security/`: create a `cobblestone-server/` folder; first
audit due within 7 days of cutover.
- Memory files to update: `project_nullstone_docker_userns.md`
(mark **superseded** if userns-remap dropped),
`project_forgejo_nullstone.md`,
`project_rocketchat_nullstone.md`, `project_tailscale_mesh.md`,
`feedback_nullstone_ssh_user.md`, `feedback_s8n_hosts_override.md`
(new IP).
### 4.7 — Cold spare + wipe
- Hold nullstone powered-off but cabled, 7 days minimum.
- If no rollback triggered, wipe: full LUKS reformat (or `nvme
format -s1` for crypto-erase if drive supports it), then either
donate or repurpose as cobblestone backup target (Restic
destination closes audit recommendation #6).
---
## 5 — Post-migration immediate fixes
Carried over from `nullstone-server/audit-report-2026-05-05.md`:
- **F-backup-1 (fix `/opt/docker/backup.sh`):** remove dead
`matrix-postgres` block (Synapse retired); correct
`rocketchat-mongodb` container name; replace literal
`CHANGE_ME_MONGO_ADMIN_PASSWORD` with read from
`/opt/docker/rocketchat/.env`. Verify next 02:00 run produces
non-zero RC + Mongo dumps.
- **no-guest@file ACL:** populate `sourceRange` to cover LAN
(`192.168.0.0/24`) + tailnet (`100.64.0.0/10`) + IPv6 equivalents.
Verify XFF chain restores client IP at the entryPoint level
(`forwardedHeaders.trustedIPs`).
- **anythingllm:** front via Traefik with `no-guest@file` OR bind
LAN-only. Must not repeat the 0.0.0.0:3001 nullstone state.
- **LUKS:** done at install (3a). Verify via `cryptsetup status` +
`systemd-cryptenroll --tpm2-device=list` post-cutover.
- **Restic + autorestic** to B2/Wasabi or to nullstone-as-spare,
with restore drill scheduled.
- **Vaultwarden** to centralize the secrets currently sprayed across
`.env` files.
- **Gatus** with cert-expiry checks + ntfy/Matrix alerts.
- **CrowdSec** with bouncer plugin at Traefik for the public
HTTP attack surface.
- **Beszel** for one-pane host metrics.
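The `.env` read for the F-backup-1 fix (first bullet above) can be sketched as a small helper; the variable name matches the compose convention already used in 4.1:

```bash
# Read a KEY=value secret from an .env file; the value may itself
# contain "=" (hence cut -f2-).
env_get() { grep "^$2=" "$1" | head -1 | cut -d= -f2-; }
# In the fixed backup.sh, replace the literal password with e.g.:
#   MONGO_PASS=$(env_get /opt/docker/rocketchat/.env MONGO_INITDB_ROOT_PASSWORD)
#   docker exec rocketchat-mongodb mongodump --username root \
#     --password "$MONGO_PASS" --authenticationDatabase admin --archive ...
```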
---
## 6 — Open questions (operator decisions)
| Question | Default if undecided |
|---|---|
| Strip DE on cobblestone? | **Strip + Cockpit.** Easier to defend; remote admin via web UI through Traefik + no-guest@file. |
| userns-remap on cobblestone? | **Off (Path 1 in 3c).** Operator pain outweighs the marginal isolation. Document tradeoff. |
| Move Headscale + step-ca to a $4 VPS? | **Defer (phase 2).** Keep on cobblestone for now; revisit once Restic + Gatus are running. SPOF mitigation is real but adds attack surface; do it once monitoring is in place. |
| RocketChat: bring back up or retire? | **Retire if not used in 30 days.** Currently stopped; first-admin still unclaimed. Mongo dump captured in 4.1, then drop the stack from cobblestone redeploy. Keeps `rc.s8n.ru` DNS for future revival. |
| Tailscale identity copy vs re-enroll for cobblestone? | **Re-enroll** (cleaner audit trail; Headscale ACLs need a one-line edit). |
| SSH host keys copy vs rotate? | **Copy.** TOFU pinning intact; one less "is this MITM?" prompt for clients. Add rotation to a follow-up cron. |
| Authentik wiring during cutover or after? | **After.** Authentik is currently mostly unwired (audit). Cutover is not the time to add new auth dependencies. |
---
## 7 — Risks (severity-tagged)
- 🔴 **acme.json mishandling = LE rate-limit.** Mitigation: copy
`acme.json` + `acme-internal.json` BEFORE bringing up Traefik on
cobblestone. Never let cobblestone Traefik issue a fresh batch of
certs. Hold a backup of both files in two locations.
- 🔴 **step-ca root key loss = full re-issuance.** Mitigation:
triple-copy `/opt/docker/step-ca/.env` + `data/secrets/`
(cobblestone, USB, password manager). Test that the encrypted root
key decrypts on cobblestone before tearing down nullstone.
- 🔴 **anythingllm reintroduces public 0.0.0.0:3001.** Mitigation: do
NOT bring it up before middleware is in place. Test from off-LAN
IP.
- 🟠 **PostgreSQL major-version skew.** Mitigation: pin same major on
cobblestone (`postgres:16-alpine` already pinned; do NOT use
`:latest`). If a major upgrade is desired, do it as a separate
step *after* cutover settles, with a fresh pg_dumpall as safety
net.
- 🟠 **Headscale node identity churn** if `db.sqlite` not copied. All
nodes (onyx, friend RTX 4080 PC, office) re-enroll. Mitigation:
copy `db.sqlite` + `private.key`; verify `headscale nodes list`
matches pre-cutover before flipping DNS.
- 🟡 **chrony NTS peers** may need re-trust on new host (NTS-KE binds
to hostname). Mitigation: chrony config copy verbatim; first
`chronyc tracking` should show stratum within 5 minutes.
- 🟡 **Authentik OIDC `client_secret`s.** Today: mostly unwired
(audit). Risk small. If Forgejo/RC/n8n were wired through
Authentik, each `client_secret` would need re-handover. Defer
Authentik wiring until post-cutover.
- 🟡 **Misskey AGPL §13 source endpoint** (`x-source`). Per
`project_x_misskey_fork.md`, the AGPL link must keep serving
source; per the same memo, a short outage is acceptable. Cutover
downtime budget: **≤ 2 h**. If exceeded, post a banner on
`x.veilor` linking to `https://git.s8n.ru/s8n-ru/x` for the
duration.
- 🟡 **Backup script broken on copy.** Audit F-backup-1 still applies
if you copy `/opt/docker/backup.sh` verbatim. Fix during section 5,
not before, but do not let it run on cobblestone before the fix
(disable the cron entry until corrected).
---
## Appendix — quick reference
- nullstone: `user@192.168.0.100`, Debian 13, 32 GiB / 477 GiB, ~28
containers, no LUKS (F4).
- cobblestone: `user@192.168.0.101` during cutover, swing to `.100`
post-validation.
- LE wildcard `*.s8n.ru` + `*.veilor.uk` via Gandi DNS-01. Internal CA
via step-ca, Traefik resolver `internal-ca`.
- Out of scope: office workstation install, friend GPU re-enrollment,
veilor-os ISO build pipeline.
---
**Path:** `/home/admin/ai-lab/_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md`
Two-line summary: pre-migration audit + secret catalog + cobblestone
install plan (LUKS2, optional userns-remap drop, 18-step topological
service redeploy) + cutover script + post-migration fixes carried over
from the 2026-05-05 audit. Operator must fill the "things we don't know
about cobblestone" table and decide on userns-remap / DE / RC retirement
before section 3 runs.