Two-node primary/secondary architecture with per-service replication: ZFS send/recv 15min for volumes, postgres streaming replication for DBs, Redis Sentinel, Tailscale mesh. Phased plan from cobblestone intake to eventual K3s/Nomad cluster at 3+ nodes. Service placement table, failure-scenario RTO/RPO matrix, open decisions documented.
Runbook — distribute load + sync data across nodes
Goal: spread compute across 2+ nodes (nullstone + cobblestone, more later) on the Tailscale mesh, with stateful data replicated so that any single-host loss means ≤15min RPO and ≤5min RTO.
Operator vision (2026-05-07): "down to distribute the load but sync the data". Not a real K8s cluster — a deliberate primary/secondary pair with per-service replication that grows into a 3-node scheduler later.
Status: PLANNING. Build cobblestone first, then phase migration.
North-star architecture
┌─────────────────────┐ ┌─────────────────────┐
│ nullstone │ ◄─────► │ cobblestone │
│ tailscale 100.64.0.2│ mesh │ tailscale 100.64.0.5│
│ Debian 13 │ │ Debian 13 │
│ ZFS root │ │ ZFS root │
└─────────────────────┘ └─────────────────────┘
▲ ▲
│ ZFS send/recv every 15min
│ Postgres streaming replication
│ Forgejo runner per node
│ Tailscale-mediated mTLS
└────────────► both serve from synced state
(one primary, one warm)
Phase 4 (3+ nodes) → swap manual placement for K3s or Nomad. Until then, manual node selection via systemd unit on each host.
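Until then, the pinning mechanism can be as small as a systemd drop-in gated on the hostname. A minimal sketch, assuming a service packaged as misskey.service (the unit name is illustrative):
```
# Hypothetical drop-in: the unit starts only where ConditionHost matches,
# so the warm secondary keeps it installed but never active.
mkdir -p /etc/systemd/system/misskey.service.d
cat >/etc/systemd/system/misskey.service.d/pin-node.conf <<'EOF'
[Unit]
ConditionHost=nullstone
EOF
systemctl daemon-reload
```
Failover then means flipping one drop-in on the secondary and starting the unit; no orchestrator involved.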
Per-layer sync mechanism
| Layer | Tool | RPO | Notes |
|---|---|---|---|
| Network | Tailscale + Headscale (already up) | 0 | hs.s8n.ru on nullstone now; move to cobblestone phase 3 |
| Filesystem volumes | ZFS snapshots + zfs send/recv | 15min | zrepl daemon handles the scheduling + retention policy |
| Postgres DBs (matrix, misskey, authentik, n8n) | streaming replication | <1s lag | hot standby per primary |
| Redis (misskey-redis, authentik-redis) | Redis Sentinel | <1s | auto-failover; needs 3 sentinel processes |
| Static files (assets, wallpapers, configs) | Syncthing | seconds | already a known pattern in this stack |
| Object/blob (uploads, attachments) | rclone bisync nightly + ZFS | 15min | for large media that doesn't fit Syncthing's index well |
| Secrets / .env | sops + age + git | per-commit | encrypted at rest in Forgejo, decrypted on host via systemd-creds (sketch below) |
| DNS | Gandi LiveDNS, TTL 60s | n/a | swing record in failover |
| Backups (offsite) | Restic to B2/Wasabi | nightly | both nodes back up to same bucket; dedupe |
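For the secrets row, a minimal sops + age sketch (file names and key handling are illustrative; wiring the decrypted file into systemd-creds is a separate step not shown):
```
# Encrypt with the node's age public key before committing to Forgejo
export SOPS_AGE_RECIPIENTS=age1example...      # placeholder recipient
sops --encrypt misskey.env > misskey.env.enc   # commit only the .enc
# On the host, decrypt at deploy time (sops reads the private key from
# ~/.config/sops/age/keys.txt or $SOPS_AGE_KEY_FILE)
sops --decrypt misskey.env.enc > /run/secrets/misskey.env
```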
Service placement plan
Pinned services run on one node only; the secondary holds a warm (installed but stopped) copy.
| Service | Primary | Secondary | Notes |
|---|---|---|---|
| Forgejo (git.s8n.ru) | cobblestone | nullstone (warm) | Move off nullstone so buildah on runner doesn't impact MC |
| Forgejo runner #1 | nullstone | n/a | Runs alongside MC; cgroup CPU quota limit |
| Forgejo runner #2 | cobblestone | n/a | Bigger jobs land here |
| Tuwunel (matrix.veilor.uk) | nullstone | cobblestone (warm) | DB streamed; warm Tuwunel on cobblestone w/ tuwunel-state-pgsync.timer |
| Tuwunel-txt + Cinny (mx.s8n.ru) | cobblestone | nullstone (warm) | Symmetric — operator-redundant |
| Misskey (x.veilor) | nullstone | cobblestone (warm) | Heavy. Move when cobblestone has >32GB |
| Authentik (auth.s8n.ru) | cobblestone | nullstone (warm) | SSO is critical — keeps both up |
| Headscale (hs.s8n.ru) | cobblestone (Phase 3) | nullstone | Control plane = single. Use cobblestone (more reliable hw target) |
| Pi-hole DNS | nullstone | cobblestone | DNS = critical — both must answer; clients get the Pi-hole pair via DHCP |
| Traefik | both, separate certs | — | active/active; Cloudflare or DNS round-robin in front |
| Step-CA | nullstone | cold standby | Internal PKI; rare write; backup root key offline |
| Minecraft (mc.racked.ru) | nullstone | none | Pinned. World data ZFS-replicated for DR only |
| anythingllm + dl-veilor | cobblestone | nullstone (warm) | Light load; place where ZFS has more capacity |
| n8n | cobblestone | nullstone (warm) | Cron-driven; can run on either |
| Filebrowser-mc | nullstone | n/a | Pinned to MC for chunkfix volume access |
Phase plan
Phase 1 — bring cobblestone alive (1–2 evenings)
- Install Debian 13 on cobblestone with ZFS root + ECC RAM
- LUKS2 + argon2id (matches our nullstone target post-migration)
- Install: openssh, tailscale, docker, zfsutils-linux, postgresql-client (for replication setup)
- Join the Tailscale mesh under `100.64.0.5` via Headscale on nullstone
- SSH in via tailnet; firewall = drop everything except inbound 22 from the tailnet (see the sketch after this list)
- Mirror nullstone's `/opt/docker/` layout
- Initial seed via `zfs send tank/home/docker@seed | ssh cobblestone zfs recv tank/home/docker`
- Bring up read-only services (Forgejo runner #2, optional warm Misskey)
- Verify each node can resolve neighbour hostnames (use `100.64.0.x` directly, not DNS, for replication)
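A hedged sketch of the join + firewall step (the pre-auth key, interface name, and an otherwise-empty nftables ruleset are assumptions):
```
# Join the Headscale-controlled tailnet
tailscale up --login-server https://hs.s8n.ru --authkey tskey-auth-...
# Drop-by-default inbound, allowing SSH only from the tailscale interface
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0 ; policy drop ; }'
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input iifname "lo" accept
nft add rule inet filter input iifname "tailscale0" tcp dport 22 accept
nft add rule inet filter input udp dport 41641 accept  # direct WireGuard path; DERP relays work without it
```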
Phase 2 — replicate stateful (1 weekend)
- Set up a Postgres primary on nullstone and a replica on cobblestone for each DB: `matrix-postgres`, `misskey-postgres`, `authentik-postgres`, `n8n-postgres` (see the sketch after this list)
- Test replication: `pg_stat_replication` shows <1s lag
- Set up Redis Sentinel for `misskey-redis` and `authentik-redis` (3 sentinel processes minimum: nullstone, cobblestone, friend-PC laptop)
- `zrepl` daemon installed on both sides; policy: snapshot every 15min, retain hourly for 24h, daily for 30d, weekly for 12w
- Syncthing for static assets / configs
- DR drill: stop nullstone postgres, verify the cobblestone replica can be promoted
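A sketch of one replica bootstrap, assuming bare-metal-style access to the data dirs (hostnames, role name, and paths are assumptions; the real per-container steps belong in the Phase 2 runbook):
```
# On the nullstone primary: create a replication role
# (pg_hba.conf must also allow replication connections from 100.64.0.5 — not shown)
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'changeme';"
# On cobblestone: seed the standby over the tailnet; -R writes standby.signal
# and primary_conninfo so the instance comes up as a hot standby
pg_basebackup -h 100.64.0.2 -U replicator -D /var/lib/postgresql/data -R -X stream
# Back on the primary: confirm the replica is attached and current
psql -U postgres -c "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"
```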
Phase 3 — selective primary moves
- Move Forgejo to cobblestone (operator pain: builds no longer compete with MC)
- Forgejo runner on nullstone keeps running; it registers with the `nullstone` label
- Forgejo runner on cobblestone registers with the `cobblestone` label
- DNS swing: `git.s8n.ru` A record → cobblestone tailnet IP via the Gandi LiveDNS API (see the sketch after this list)
- Move Headscale to cobblestone (the control plane needs a reliable host)
- Move Authentik to cobblestone
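The DNS swing can be a one-liner against the LiveDNS API; a sketch assuming a personal access token in $GANDI_PAT:
```
# Point git.s8n.ru at cobblestone's tailnet IP
curl -X PUT "https://api.gandi.net/v5/livedns/domains/s8n.ru/records/git/A" \
  -H "Authorization: Bearer $GANDI_PAT" \
  -H "Content-Type: application/json" \
  -d '{"rrset_values": ["100.64.0.5"]}'
```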
Phase 4 — when 3+ nodes (deferred)
- Stand up K3s on the three nodes with embedded etcd (see the sketch after this list)
- Longhorn for distributed PVs (or NFS w/ Heartbeat for a simpler setup)
- Migrate stateless services to Deployment+Service
- Stateful services stay on labeled nodes (StatefulSet w/ nodeSelector)
- Drop bespoke ZFS replication for services managed by Longhorn
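For later reference, the K3s shape is roughly this (token handling and labels are assumptions):
```
# First node: embedded etcd via --cluster-init
curl -sfL https://get.k3s.io | sh -s - server --cluster-init
# Remaining nodes join over the tailnet
# (token lives in /var/lib/rancher/k3s/server/node-token on the first node)
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server --server https://100.64.0.2:6443
# Label nodes so stateful workloads can pin via nodeSelector
kubectl label node nullstone workload=stateful
```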
Concrete first-step commands
When cobblestone is racked and SSH-able from nullstone:
```
# On nullstone — initial ZFS seed
zfs snapshot tank/home/docker@cobblestone-seed
zfs send -R tank/home/docker@cobblestone-seed | \
  ssh root@100.64.0.5 zfs recv -F tank/home/docker

# On both — install zrepl
curl -L https://zrepl.github.io/install.sh | bash
# Configure /etc/zrepl/zrepl.yml — primary push job + secondary sink job (sketch below)
# Cron-equivalent: zrepl is its own daemon
systemctl enable --now zrepl
```
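A hedged sketch of the push half of that config (the port and the grid spec matching the 24h/30d/12w retention above are assumptions; cobblestone needs a matching sink job, not shown):
```
cat >/etc/zrepl/zrepl.yml <<'EOF'
jobs:
  - name: push_docker
    type: push
    connect:
      type: tcp
      address: "100.64.0.5:8888"   # cobblestone's sink listener
    filesystems:
      "tank/home/docker<": true    # this dataset and everything below it
    snapshotting:
      type: periodic
      interval: 15m
      prefix: zrepl_
    pruning:
      keep_sender:
        - type: not_replicated     # never prune what hasn't been sent
        - type: grid
          grid: 24x1h | 30x1d | 12x7d
          regex: "^zrepl_"
      keep_receiver:
        - type: grid
          grid: 24x1h | 30x1d | 12x7d
          regex: "^zrepl_"
EOF
```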
Postgres replication setup is per-DB; document it in `runbooks/POSTGRES-streaming-replication.md` when Phase 2 lands.
Failure scenarios + RTO
| Scenario | RTO | RPO | Action |
|---|---|---|---|
| nullstone reboot (planned) | 5min | 0 | DNS swing services to cobblestone before reboot, swing back after |
| nullstone hardware fail | ~30min manual | 15min | Promote the postgres replica (sketch below), swing all DNS, restart warm services on cobblestone |
| cobblestone hardware fail | ~5min auto | 15min | nullstone still running primaries; just lose runner #2 + Authentik until rebuild |
| Both nodes fail | hours | hours | Restic restore from B2/Wasabi to a temp box |
| Network partition | seconds | n/a | Tailscale heals; postgres replication catches up; brief degraded UX |
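For the hardware-fail row, the promotion itself is one call; a sketch (service unit names are assumptions):
```
# On cobblestone: promote the streaming replica to read-write primary
psql -U postgres -c "SELECT pg_promote();"   # PostgreSQL 12+; older: pg_ctl promote -D <datadir>
# Then swing DNS (Phase 3 sketch) and start the warm services
systemctl start tuwunel misskey authentik
```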
Open decisions
- ECC: cobblestone needs Pro chipset (B650E for AM5) for verified ECC. Cheap-out option = ECC unverified on consumer board (still works on AMD).
- Distros: stay on Debian 13 for both? OR put veilor-os on cobblestone once stable?
- Recommendation: cobblestone runs Debian 13 (server OS, server kernel) with ZFS root. veilor-os is an end-user desktop spin — wrong fit for a production server role. Don't conflate.
- Headscale move: Phase 3, OR sooner if we don't trust nullstone? Defer until we can prove cobblestone uptime.
- Minecraft on cobblestone instead? No — moving MC = downtime + map mismatch risk. Stay pinned.
- K3s vs Nomad: revisit at Phase 4. Nomad simpler for small ops; K3s mainstream + ecosystem bigger.
Related
- `runbooks/MIGRATION-nullstone-to-cobblestone.md` — original migration draft
- `runbooks/COBBLESTONE-INTAKE.md` — hardware spec template
- `STATE.md` — current node + service state
- Memory: `project_tailscale_mesh.md` — mesh shape today