infra/runbooks/HA-CLUSTER-distribute-and-sync.md

Runbook — distribute load + sync data across nodes

Goal: spread compute across 2+ nodes (nullstone + cobblestone + later) on Tailscale mesh, with stateful data replicated so any single-host loss = ≤15min RPO, ≤5min RTO.

Operator vision (2026-05-07): "down to distribute the load but sync the data". Not a real K8s cluster — a deliberate primary/secondary pair with per-service replication that grows into a 3-node scheduler later.

Status: PLANNING. Build cobblestone first, then phase migration.


North-star architecture

┌─────────────────────┐         ┌─────────────────────┐
│ nullstone           │ ◄─────► │ cobblestone         │
│ tailscale 100.64.0.2│   mesh  │ tailscale 100.64.0.5│
│ Debian 13           │         │ Debian 13           │
│ ZFS root            │         │ ZFS root            │
└─────────────────────┘         └─────────────────────┘
        ▲                                ▲
        │           ZFS send/recv every 15min
        │           Postgres streaming replication
        │           Forgejo runner per node
        │           Tailscale-mediated mTLS
        └────────────► both serve from synced state
                      (one primary, one warm)

Phase 4 (3+ nodes) → swap manual placement for K3s or Nomad. Until then, manual node selection via systemd unit on each host.
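
Minimal sketch of that manual placement, assuming each stack is wrapped in a systemd unit (the misskey.service unit name below is a placeholder): the same unit file can be deployed to both hosts, and a ConditionHost= drop-in decides which node actually starts it.

# Hypothetical drop-in: pin the misskey stack to nullstone (unit name is an assumption)
mkdir -p /etc/systemd/system/misskey.service.d
cat > /etc/systemd/system/misskey.service.d/pin-node.conf <<'EOF'
[Unit]
# On any other hostname the unit is skipped at start, not failed
ConditionHost=nullstone
EOF
systemctl daemon-reload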


Per-layer sync mechanism

| Layer | Tool | RPO | Notes |
|---|---|---|---|
| Network | Tailscale + Headscale (already up) | 0 | hs.s8n.ru on nullstone now; move to cobblestone in Phase 3 |
| Filesystem volumes | ZFS snapshots + zfs send/recv | 15min | zrepl daemon for the cron + retention policy |
| Postgres DBs (matrix, misskey, authentik, n8n) | Streaming replication | <1s lag | Hot standby per primary |
| Redis (misskey-redis, authentik-redis) | Redis Sentinel | <1s | Auto-failover; needs 3 sentinel processes |
| Static files (assets, wallpapers, configs) | Syncthing | seconds | Already a known pattern in this stack |
| Object/blob (uploads, attachments) | rclone bisync (nightly) + ZFS | 15min | For large media that doesn't fit Syncthing's index well |
| Secrets / .env | sops + age + git | per-commit | Encrypted at rest in Forgejo, decrypted on host via systemd-creds (sketch below) |
| DNS | Gandi LiveDNS, TTL 60s | n/a | Swing record in failover |
| Backups (offsite) | Restic to B2/Wasabi | nightly | Both nodes back up to the same bucket; dedupe |
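
Rough sketch of the sops + age flow from the secrets row (paths, file names, and the recipient-key variable are placeholders; the systemd-creds hand-off on the host is not shown):

# Encrypt an env file against a host's age public key before committing it to Forgejo
sops --encrypt --input-type dotenv --output-type dotenv \
  --age "$COBBLESTONE_AGE_PUBKEY" /opt/docker/n8n/.env > /opt/docker/n8n/.env.sops

# On the host: decrypt with the node's private age key before (re)starting the stack
SOPS_AGE_KEY_FILE=/etc/age/node.key sops --decrypt \
  --input-type dotenv --output-type dotenv /opt/docker/n8n/.env.sops > /opt/docker/n8n/.env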

Service placement plan

Pinned services run on one node only; secondary runs warm-stopped.

| Service | Primary | Secondary | Notes |
|---|---|---|---|
| Forgejo (git.s8n.ru) | cobblestone | nullstone (warm) | Move off nullstone so buildah on the runner doesn't impact MC |
| Forgejo runner #1 | nullstone | n/a | Runs alongside MC; cgroup CPU quota limit (sketch after this table) |
| Forgejo runner #2 | cobblestone | n/a | Bigger jobs land here |
| Tuwunel (matrix.veilor.uk) | nullstone | cobblestone (warm) | DB streamed; warm Tuwunel on cobblestone w/ tuwunel-state-pgsync.timer |
| Tuwunel-txt + Cinny (mx.s8n.ru) | cobblestone | nullstone (warm) | Symmetric; operator-redundant |
| Misskey (x.veilor) | nullstone | cobblestone (warm) | Heavy; move when cobblestone has >32GB |
| Authentik (auth.s8n.ru) | cobblestone | nullstone (warm) | SSO is critical; keeps both up |
| Headscale (hs.s8n.ru) | cobblestone (Phase 3) | nullstone | Control plane = single; use cobblestone (more reliable hw target) |
| Pi-hole DNS | nullstone | cobblestone | DNS = critical; both must answer; clients use Pi-hole pair via DHCP |
| Traefik | both, separate certs | — | Active/active; Cloudflare or DNS round-robin in front |
| Step-CA | nullstone | cold standby | Internal PKI; rare write; backup root key offline |
| Minecraft (mc.racked.ru) | nullstone | none | Pinned; world data ZFS-replicated for DR only |
| anythingllm + dl-veilor | cobblestone | nullstone (warm) | Light load; place where ZFS has more capacity |
| n8n | cobblestone | nullstone (warm) | Cron-driven; can run on either |
| Filebrowser-mc | nullstone | n/a | Pinned to MC for chunkfix volume access |
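
Sketch of the cgroup cap on runner #1 (forgejo-runner.service is an assumed unit name; the 200% quota is a guess to tune against MC tick timings):

# Cap the runner on nullstone so CI builds can't starve Minecraft (200% = two cores' worth)
systemctl set-property forgejo-runner.service CPUQuota=200% CPUWeight=50
# Confirm the limits landed on the unit's cgroup
systemctl show forgejo-runner.service -p CPUQuotaPerSecUSec -p CPUWeight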

Phase plan

Phase 1 — bring cobblestone alive (1–2 evenings)

  • Install Debian 13 on cobblestone with ZFS root + ECC RAM
  • LUKS2 + argon2id (matches our nullstone target post-migration)
  • Install: openssh, tailscale, docker, zfsutils-linux, postgresql-client (for replication setup)
  • Join Tailscale mesh under 100.64.0.5 via Headscale on nullstone
  • SSH in via tailnet; firewall = drop everything except inbound 22 from the tailnet (see sketch after this list)
  • Mirror nullstone's /opt/docker/ layout
  • Initial seed via zfs send tank/home/docker@seed | ssh cobblestone zfs recv tank/home/docker
  • Bring up read-only services (Forgejo runner #2, optional warm Misskey)
  • Verify each can resolve neighbour hostnames (use 100.64.0.x direct, not DNS, for replication)
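
Sketch of the join + firewall bullets above (the pre-auth key, and doing the firewall with nftables, are assumptions):

# On cobblestone: join the mesh against Headscale on nullstone
tailscale up --login-server https://hs.s8n.ru --authkey <headscale-preauth-key> --hostname cobblestone

# Default-drop inbound; allow SSH only via the tailnet interface
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; policy drop; }'
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input iif lo accept
nft add rule inet filter input iifname "tailscale0" tcp dport 22 accept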

Phase 2 — replicate stateful (1 weekend)

  • Set up Postgres primary on nullstone, replica on cobblestone, for each DB: matrix-postgres, misskey-postgres, authentik-postgres, n8n-postgres
  • Test replication: pg_stat_replication shows < 1s lag
  • Set up Redis Sentinel for misskey-redis and authentik-redis (3 sentinel processes minimum: nullstone, cobblestone, friend-PC laptop)
  • zrepl daemon installed both sides; replication policy 15min snapshot, hourly retention 24h, daily 30d, weekly 12w
  • Syncthing for static assets / configs
  • Test DR drill: stop nullstone postgres, verify the read-only replica on cobblestone can be promoted (see sketch below)
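
Verification sketch for the replication and drill bullets (assumes PostgreSQL 12+; roles and connection details are whatever each container uses):

# On nullstone (primary): every replica should show state=streaming and sub-second replay_lag
psql -U postgres -c "SELECT application_name, client_addr, state, replay_lag FROM pg_stat_replication;"

# On cobblestone (replica): should return t while it is a read-only standby
psql -U postgres -c "SELECT pg_is_in_recovery();"

# DR drill: promote the standby (one-way; it stops following the old primary)
psql -U postgres -c "SELECT pg_promote();"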

Phase 3 — selective primary moves

  • Move Forgejo to cobblestone (operator pain: builds no longer compete with MC)
  • Forgejo runner ON nullstone keeps running — registers as nullstone label
  • Forgejo runner ON cobblestone registers as cobblestone label
  • DNS swing: git.s8n.ru A record → cobblestone tailnet IP via Gandi LiveDNS API (see sketch after this list)
  • Move Headscale to cobblestone (control plane needs reliable host)
  • Move Authentik to cobblestone
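
DNS-swing sketch for the git.s8n.ru move (the token variable and record layout are assumptions; double-check the endpoint against current Gandi LiveDNS docs before scripting failover around it):

# Point git.s8n.ru at cobblestone's tailnet IP (TTL already 60s per the sync table)
curl -X PUT "https://api.gandi.net/v5/livedns/domains/s8n.ru/records/git/A" \
  -H "Authorization: Bearer $GANDI_PAT" \
  -H "Content-Type: application/json" \
  -d '{"rrset_ttl": 60, "rrset_values": ["100.64.0.5"]}'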

Phase 4 — when 3+ nodes (deferred)

  • Stand up K3s on the three nodes with embedded etcd
  • Longhorn for distributed PVs (or NFS with Heartbeat, if we want something simpler)
  • Migrate stateless services to Deployment+Service
  • Stateful services stay on labeled nodes (StatefulSet w/ nodeSelector; see sketch after this list)
  • Drop bespoke ZFS replication for services managed by Longhorn
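
Deferred with the rest of Phase 4, but the labeled-node idea in shorthand (label key and manifest fragment are placeholders):

# Label the node that owns the stateful data, then pin the workload to it
kubectl label node nullstone workload/minecraft=true
# StatefulSet fragment (placeholder):
#   spec:
#     template:
#       spec:
#         nodeSelector:
#           workload/minecraft: "true"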

Concrete first-step commands

When cobblestone is racked and SSH-able from nullstone:

# On nullstone — initial ZFS seed
zfs snapshot tank/home/docker@cobblestone-seed
zfs send -R tank/home/docker@cobblestone-seed | \
  ssh root@100.64.0.5 zfs recv -F tank/home/docker

# On both — install zrepl (Debian packages via zrepl's apt repo, or a release binary; see the zrepl docs)
apt install zrepl
# Configure /etc/zrepl/zrepl.yml — primary push job + secondary sink job

# Cron-equivalent: zrepl is its own daemon
systemctl enable --now zrepl
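
A minimal zrepl.yml sketch for that push + sink pair (job names, the TCP port, and the sink root_fs are assumptions; transport here is plain TCP over the tailnet rather than zrepl's TLS options):

# On nullstone: push job, 15min snapshots, grid pruning roughly matching the Phase 2 policy
cat > /etc/zrepl/zrepl.yml <<'EOF'
jobs:
  - name: docker-push
    type: push
    connect:
      type: tcp
      address: "100.64.0.5:8888"   # cobblestone over the tailnet
    filesystems:
      "tank/home/docker<": true
    snapshotting:
      type: periodic
      prefix: zrepl_
      interval: 15m
    pruning:
      keep_sender:
        - type: not_replicated
        - type: grid
          grid: 24x1h | 30x1d | 12x7d
          regex: "^zrepl_"
      keep_receiver:
        - type: grid
          grid: 24x1h | 30x1d | 12x7d
          regex: "^zrepl_"
EOF

# On cobblestone: sink job; received datasets land under root_fs/<client identity>
cat > /etc/zrepl/zrepl.yml <<'EOF'
jobs:
  - name: docker-sink
    type: sink
    serve:
      type: tcp
      listen: ":8888"
      clients:
        "100.64.0.2": "nullstone"
    root_fs: "tank/sink"
EOF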

Postgres replication setup is per-DB; document in runbooks/POSTGRES-streaming-replication.md when Phase 2 lands.
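
Until that runbook exists, a bare-metal-flavoured outline of one standby bootstrap (role name, password handling, and the Debian 17/main paths are assumptions; the dockerized DBs need the same steps against their own volumes; -R needs PostgreSQL 12+):

# On nullstone (primary): replication role + pg_hba entry for cobblestone
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';"
echo "host replication replicator 100.64.0.5/32 scram-sha-256" >> /etc/postgresql/17/main/pg_hba.conf
systemctl reload postgresql

# On cobblestone: clone the primary as a streaming standby (-R writes standby.signal + primary_conninfo)
systemctl stop postgresql
rm -rf /var/lib/postgresql/17/main
sudo -u postgres pg_basebackup -h 100.64.0.2 -U replicator -D /var/lib/postgresql/17/main -R -X stream -P
systemctl start postgresql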


Failure scenarios + RTO

| Scenario | RTO | RPO | Action |
|---|---|---|---|
| nullstone reboot (planned) | 5min | 0 | DNS-swing services to cobblestone before reboot, swing back after |
| nullstone hardware fail | ~30min (manual) | 15min | Promote postgres replica, swing all DNS, restart warm services on cobblestone |
| cobblestone hardware fail | ~5min (auto) | 15min | nullstone still running primaries; just lose runner #2 + Authentik until rebuild |
| Both nodes fail | hours | hours | Restic restore from B2/Wasabi to a temp box |
| Network partition | seconds | n/a | Tailscale heals; postgres replication catches up; brief degraded UX |

Open decisions

  • ECC: cobblestone needs Pro chipset (B650E for AM5) for verified ECC. Cheap-out option = ECC unverified on consumer board (still works on AMD).
  • Distros: stay on Debian 13 for both? OR put veilor-os on cobblestone once stable?
    • Recommendation: cobblestone runs Debian 13 (server OS, server kernel) with ZFS root. veilor-os is an end-user desktop spin — wrong fit for a production server role. Don't conflate.
  • Headscale move: Phase 3, OR sooner if we don't trust nullstone? Defer until we can prove cobblestone uptime.
  • Minecraft on cobblestone instead? No — moving MC = downtime + map mismatch risk. Stay pinned.
  • K3s vs Nomad: revisit at Phase 4. Nomad simpler for small ops; K3s mainstream + ecosystem bigger.

Related docs

  • runbooks/MIGRATION-nullstone-to-cobblestone.md — original migration draft
  • runbooks/COBBLESTONE-INTAKE.md — hardware spec template
  • STATE.md — current node + service state
  • Memory: project_tailscale_mesh.md — mesh shape today