infra/AUDIT-2026-05-05.md
s8n 09d80a63f6 init: nullstone deploys + runbooks + audits
Sourced from previous audits + agent-wave outputs (2026-05-05):
  AUDIT-2026-05-05.md           — 5-agent stack synthesis
  forgejo/DEPLOY.md             — git.s8n.ru deploy runbook
  forgejo/forgejo-compose.yml   — production compose
  forgejo/runner-compose.yml    — forgejo-runner
  forgejo/migration-report-...  — GH→Forgejo migration audit (6/6 green)
  runbooks/MIGRATION-...        — nullstone→cobblestone runbook
  runbooks/DE-DECISION-...      — keep-vs-strip DE on cobblestone
  repos/REPO-AUDIT-2026-05-05.md — repo trees + ownership
2026-05-06 10:02:28 +01:00

5-Agent Audit Report — 2026-05-05

Synthesis of the outputs of 5 parallel agents covering GitHub→Forgejo migration, ai-lab structure, nullstone services, stack rating, and recommended additions.

Source agent outputs:

  1. Migration agent → nullstone-server/forgejo/migration-report-2026-05-05.md
  2. ai-lab structural audit
  3. nullstone services + deployment audit
  4. Stack rating (10 axes)
  5. Recommended service additions

TL;DR

  • GH → Forgejo migration: complete. 6/6 repos mirrored (5× s8n-ru/* + veilor-org/veilor-os). All HEADs match, branches match, tags match, push-mirrors back to GH all green. Repaired one default-branch metadata drift on s8n-ru/x. Zero failures.
  • Stack rating: 7/10. Above-average self-hosted setup. Audit discipline + identity/CA story unusually strong. Fragile on monitoring + offsite backup + single-host.
  • Top 5 weaknesses (severity-ordered): F4 no LUKS on nullstone (regression), no monitoring/alerting, backups local-only with silently broken script, :latest floats on most stacks, single point of failure (nullstone + home WAN).
  • Top 5 services to add (priority): Restic+autorestic, Vaultwarden, Gatus, CrowdSec, Beszel.
  • Anti-recommendations (5): Nextcloud, full LGTM stack, Mastodon, HashiCorp Vault, Authelia.

1 — GitHub repo migration

Status: complete. Per migration agent's report.

  • 6 repos enumerated under s8n-ru user + admin'd orgs.
  • 6 mirrored to git.s8n.ru (Forgejo); 5 fresh, 1 already pre-migrated (veilor-org/veilor-os).
  • HEADs / branches / tags match GH for all 6.
  • Push-mirrors Forgejo → GH configured (8h interval + sync-on-commit), all green.
  • One repair: s8n-ru/x default branch was stuck on KisaragiEffective-patch-1 from Misskey upstream; PATCHed to master.
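The "HEADs / branches / tags match" check above can be sketched as a comparison of `git ls-remote` output from both remotes. This is a minimal illustration, not the migration agent's actual method (which this report does not describe); the function names are hypothetical and the input is assumed to be the usual `<sha><whitespace><ref>` text that `git ls-remote` prints.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> commit SHA from `git ls-remote` text output."""
    refs: dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, ref = line.split(None, 1)  # SHA and ref are whitespace-separated
        refs[ref] = sha
    return refs


def diff_refs(source: dict[str, str], mirror: dict[str, str]) -> dict[str, list[str]]:
    """Report refs missing on either side or pointing at different commits."""
    shared = set(source) & set(mirror)
    return {
        "missing_on_mirror": sorted(set(source) - set(mirror)),
        "extra_on_mirror": sorted(set(mirror) - set(source)),
        "mismatched": sorted(r for r in shared if source[r] != mirror[r]),
    }


def is_green(report: dict[str, list[str]]) -> bool:
    """A repo counts as matching only when every bucket is empty."""
    return not any(report.values())
```

Feeding it the `git ls-remote` output of the GitHub remote and of the git.s8n.ru remote for each repo would give one report per repo; a stale default branch like the one repaired on s8n-ru/x is repository metadata and would need a separate check.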

Detail: nullstone-server/forgejo/migration-report-2026-05-05.md.


2 — ai-lab structural audit

Devices

| codename | type | OS | role |
|---|---|---|---|
| onyx | laptop | Fedora 43 KDE | Dev workstation (DHCP .28, registry says .6: drift) |
| nullstone | server | Debian 13 | Infra host: Docker stack, mesh, Matrix/Misskey/RC |
| office | workstation | Fedora 43 KDE (pending install since 2026-04-19) | Office/sales (.5) |

External: friend PC 100.64.0.3 (RTX 4080, vLLM in WSL2).

Active projects (_github/)

| repo | purpose | status |
|---|---|---|
| veilor-os | Hardened Fedora 43 KDE remix | actively iterating, BlueBuild + kickstart |
| auth-limbo | Paper plugin (racked.ru AuthMe fix) | active, released jars |
| minecraft-launcher | Custom MC launcher (PrismLauncher fork) | active, v1 build script |
| minecraft-server | Purpur MC at mc.racked.ru:25565 | live in prod |
| minecraft-client | racked.ru MC client (FO 11.3.2 fork) | active |

Per-device security audit cadence

| device | last audit | folder |
|---|---|---|
| nullstone | 2026-05-05 (ACL hardening); full 2026-05-02 | security/nullstone-server/ (9 reports) |
| onyx | 2026-04-15 | security/onyx-laptop/ (2 reports) |
| office | never | security/office-workstation/ (empty) |

Memory record (31 files, 1 index)

  • 2 user, 7 feedback, 1 reference, 21 project memos.
  • Top-active: matrix_veilor, txt_cinny, x_misskey_fork, tailscale_mesh, friend_gpu, org_charter, brand_separation, simplex_org_chat.

What this lab is

The operator runs a small home-lab/3-member CTO-style org (P M=CTO, nullstone=Runtime Owner, onyx-ai=Research/Review) split cleanly across two brands (per project_brand_separation.md):

  1. racked.ru — privacy-first Minecraft platform (MC server + client + custom launcher + AuthLimbo plugin)
  2. veilor — security company stack (veilor-os hardened Fedora ISO, veilor-server-bootstrap Debian preseed, Matrix at veilor.uk, Misskey-fork at x.veilor)

All self-hosted on nullstone behind Traefik+Headscale+Pi-hole. Mesh includes friend's RTX 4080 for remote LLM inference via Tailscale.

Drift / gaps

  • office-workstation/ registered in CLAUDE.md but install pending since 2026-04-19; no audit folder populated.
  • README onyx IP .6 vs actual DHCP .28.
  • README folder tree doesn't match real repo (lists _project_code/ + scripts/; reality has _github/, _projects/, _archive/, archive/, github/, several .sync-conflict-* files, 30 MB binary re at root).
  • Two parallel nullstone-server/ and server/ device folders — drift candidate.
  • MEMORY.md index missing entry for project_forgejo_nullstone.md (file present, index not updated).
  • Sync-conflict files for CLAUDE.md / README.md / SYSTEM.md from Syncthing merge never resolved.
  • SYSTEM.md still mentions Jitsi/coturn and the MAS Element X test, both retired per project_matrix_veilor.md; the TODO list has not been pruned.

3 — nullstone services + deployment audit

Hardware

  • CPU: AMD Ryzen 5 2600X (6c/12t)
  • RAM: 32 GiB (15 used, 15 free, 24 GiB swap, 256 KiB used)
  • GPU: GTX 1660 Ti 6 GB (Ollama)
  • Disk: 477 GiB NVMe, LVM (keystone-vg):
    • root 30 G (35% used)
    • var 12 G (15%)
    • home 399 G (60%, 227 G used / 153 G free) — watch growth
    • tmp 2.7 G, swap 24 G
  • OS: Debian 13, kernel 6.12.85+deb13
  • Docker: v29.4.2, overlay2, userns-remap=default, live-restore=true, icc=false, no-new-privileges=true. Data root symlinked /var/lib/docker → /home/user/docker-data.

Active services (28 containers)

Including: traefik, socket-proxy, authentik (server+worker+pg+redis), forgejo + forgejo-runner, misskey + db + redis, x-source nginx, rocketchat + mongodb, tuwunel + tuwunel-txt, cinny-txt, commet-web, signup-page + signup-txt, livekit + lk-jwt-service, dl-veilor, pihole, headscale, n8n + postgres, step-ca, filebrowser-mc, minecraft-mc, anythingllm, plus 2 stale alpine:3 shells from userns-host bypass.

Domain → service map (all on *.s8n.ru or *.veilor[.uk])

sys.s8n.ru (traefik dash), git.s8n.ru (forgejo, NEW), auth.s8n.ru (authentik), pihole.s8n.ru, signup.txt.s8n.ru, hs.s8n.ru (headscale), rc.s8n.ru (rocketchat), n8n.s8n.ru, txt.s8n.ru (cinny), mx.s8n.ru (tuwunel-txt), x.veilor (misskey), matrix.veilor.uk, chat.veilor.uk (commet), livekit.veilor.uk, signup.veilor.uk, dl.veilor.org.

Deployment patterns

  • Compose: /opt/docker/<svc>/docker-compose.yml
  • Data: named docker volumes under /home/user/docker-data/100000.100000/volumes/ + per-service bind mounts. Newer services (forgejo, forgejo-runner, minecraft) on /home/docker/<svc>/ to dodge 30 G root.
  • userns-remap quirk: container UIDs shifted +100000. Workaround: alpine root container or chown to 101000.
  • Docker socket exposure: traefik does NOT mount docker.sock; goes via tecnativa/docker-socket-proxy on socket-proxy-net.
  • Networks: proxy + socket-proxy-net + misskey-frontend + per-stack internals (authentik-internal, misskey-internal, etc.).
  • Middleware chain: trusted-only@file → security-headers@file → rate-limit@file → <service-specific> with no-guest@file for routers needing tailnet+LAN but blocking public.
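The userns-remap quirk above is plain offset arithmetic, sketched here so the `chown` target can be computed rather than remembered. The base of 100000 comes from the +100000 shift noted above; the range size of 65536 is an assumption (a common `/etc/subuid` allocation) and should be checked against the host's actual entry. The function name is hypothetical.

```python
SUBUID_BASE = 100000   # from the +100000 shift described above
SUBUID_COUNT = 65536   # ASSUMPTION: typical /etc/subuid range size; verify on the host


def host_uid(container_uid: int, base: int = SUBUID_BASE, count: int = SUBUID_COUNT) -> int:
    """Translate a UID seen inside a container to the UID that owns files on the host."""
    if not 0 <= container_uid < count:
        raise ValueError(f"container UID {container_uid} is outside the remapped range")
    return base + container_uid


if __name__ == "__main__":
    # Container root owns host files as 100000; an in-container UID 1000 maps to 101000.
    for uid in (0, 1000):
        print(uid, "->", host_uid(uid))
```

This matches the `100000.100000` directory under the docker data root (container root) and the `chown` to 101000 workaround (in-container UID 1000).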

Auth patterns

  • Authentik (auth.s8n.ru) — central OIDC, all 4 components healthy. Currently mostly unwired. Forgejo runs native auth (no OAUTH section in app.ini). RC, n8n, anythingllm, filebrowser likely local-auth too. Authentik present but underused.
  • Forgejo — local users + PAT, admin s8n-ru, SSH 222.
  • Headscale — preauthkey enrollment + headscale-deny-leaks@file.
  • Traefik dashboard — basicauth + trusted-only@file.

Backup state

  • /etc/cron.d/docker-backup runs /opt/docker/backup.sh at 02:00 daily, 7-day rotation to /opt/backups/.
  • Script silently broken (HIGH): matrix-postgres container is gone (Synapse retired); rocketchat-mongodb name mismatch (script expects mongodb); Mongo password reads literal CHANGE_ME_MONGO_ADMIN_PASSWORD. So Rocket.Chat + (former) Matrix dumps not happening. Misskey side-script works.
  • No off-host replication. Single NVMe = total loss on disk failure.
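The two failure modes above (a placeholder secret left in the script, and dumps that are never produced) are cheap to detect after each run. The following is a minimal sketch under stated assumptions: the dump file names passed in are hypothetical, since this report does not list what `/opt/docker/backup.sh` writes, and the `CHANGE_ME` prefix is taken from the placeholder quoted above.

```python
from pathlib import Path

PLACEHOLDER_PREFIX = "CHANGE_ME"  # prefix of the literal placeholder quoted above


def find_placeholders(script_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every line still carrying a placeholder secret."""
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if PLACEHOLDER_PREFIX in line
    ]


def empty_or_missing_dumps(backup_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected dump names that are absent or zero bytes in backup_dir."""
    bad = []
    for name in expected:
        path = backup_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad
```

Run after the 02:00 job against `/opt/backups/`, a non-empty result from either function would be the signal to alert on; a restore drill is still needed to prove the dumps are usable.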

Drift / risk register

  • 🔴 Backup script broken (RC + ex-Matrix not dumping)
  • 🔴 anythingllm listens 0.0.0.0:3001 with no traefik label, bypasses entire L7 trust model. Either bind LAN-only or front via traefik.
  • 🟠 Resource limits: only minecraft-mc has memory/CPU limits. 30 other containers unbounded — runaway can OOM-kill neighbours.
  • 🟠 No service-level health checks on ~half the containers.
  • 🔴 no-guest@file IPAllowList stub: declares only sourceRange: ["127.0.0.0/8"]. Routers chained with no-guest reject everything except loopback unless XFF restores client IP. Verify entryPoint forwardedHeaders.trustedIPs + middleware ipStrategy.depth — misconfig either 403s real users or accepts spoofed XFF.
  • 🟡 office (100.64.0.4) not in trusted-only@file despite tag:infra per SYSTEM.md.
  • 🟠 RocketChat: first-admin setup still pending — wizard endpoint takeover risk until claimed.
  • 🟡 Stale alpine:3 shell containers (userns-host bypass leftovers). docker rm -f after each one-shot.
  • 🟡 Archived compose dirs (pocket-id.archived-*, matrix-old) contain secrets — move off docker tree.
  • 🟡 /home 60% with growing volumes (Ollama, mongo, postgres ×3). No quotas.
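The resource-limit finding in the register above can be re-checked mechanically from `docker inspect` output. A minimal sketch, assuming the JSON array that `docker inspect` prints, where `HostConfig.Memory` and `HostConfig.NanoCpus` are 0 when no limit is set; the function name is hypothetical, and limits set through `CpuQuota` rather than `NanoCpus` would need an extra check.

```python
import json


def unbounded_containers(inspect_json: str) -> dict[str, list[str]]:
    """List container names lacking a memory limit or a CPU limit."""
    no_memory, no_cpu = [], []
    for container in json.loads(inspect_json):
        name = container.get("Name", "?").lstrip("/")  # inspect prefixes names with "/"
        host_config = container.get("HostConfig", {})
        if not host_config.get("Memory"):    # 0 or missing means unlimited
            no_memory.append(name)
        if not host_config.get("NanoCpus"):  # 0 or missing means unlimited
            no_cpu.append(name)
    return {"no_memory_limit": sorted(no_memory), "no_cpu_limit": sorted(no_cpu)}
```

One way to feed it would be to capture the output of `docker inspect` for all running containers and pass the text in; the result gives a concrete list to work through when adding limits to compose files.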

Mem pressure: none right now

Top consumer: minecraft at 9.35 GiB of its 18 GiB cap (52% of cap, ~30% of host RAM). All other containers are below 2.2%. Headroom is good.


4 — Stack rating (10 axes)

| Axis | Score | Top weakness |
|---|---|---|
| Architectural coherence | 8 | Drift artifacts (sync-conflict files, parallel _archive/archive) |
| Security posture | 7 | F4 no LUKS on server (regression); F30 ip_forward=1; F12 partial revert |
| Reproducibility | 6 | Most stacks on :latest; no IaC; admin bootstrap uncoded |
| Operational maturity | 4 | No metrics/alerts; backups untested; on-call="user reads logs" |
| Cost discipline | 9 | Single residential ISP + single home server is "cheap because fragile" |
| Threat model clarity | 6 | No written THREAT_MODEL.md; AGPL §13 source endpoint deferred |
| Update hygiene | 5 | :latest floats; no staged rollout; recovery = "edit compose, restart" |
| Documentation quality | 8 | SYSTEM.md is 979 lines; CV + team-msg.txt + sync-conflicts in repo root |
| Network resilience | 5 | Single residential WAN; control + data plane same box; no Tor/SOCKS fallback |
| Branding/product discipline | 9 | "X" rebrand close to veilor, easy to confuse in logs/docs |

Overall: 7/10

Above-average self-hosted stack. Better-documented than 90% of homelabs, with audit discipline most small SaaS shops don't reach, and a coherent identity/CA story (own root CA via step-ca, own VPN control plane via Headscale, own Matrix homeserver). Loses points on operational maturity (no monitoring, no offsite/tested backups, no rollback), one critical regression (no LUKS on nullstone), and inherent fragility from single-host single-ISP design.

The gap between known weaknesses and fixed weaknesses is the limiting factor: the operator clearly can fix these (the audit closed 27/35 findings in 3 days) but has not done so yet.
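As a sanity check on the headline number, the ten axis scores can be averaged. The report does not say how the overall 7/10 was derived, so the unweighted mean below is an assumption, shown only to confirm that the figure is consistent with the per-axis scores.

```python
AXIS_SCORES = {
    "Architectural coherence": 8,
    "Security posture": 7,
    "Reproducibility": 6,
    "Operational maturity": 4,
    "Cost discipline": 9,
    "Threat model clarity": 6,
    "Update hygiene": 5,
    "Documentation quality": 8,
    "Network resilience": 5,
    "Branding/product discipline": 9,
}


def overall(scores: dict[str, int]) -> float:
    """Unweighted mean of the axis scores (ASSUMPTION: the report may weight axes)."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    mean = overall(AXIS_SCORES)
    print(mean, "->", round(mean))  # 6.7 -> 7
    print("lowest axis:", min(AXIS_SCORES, key=AXIS_SCORES.get))  # Operational maturity
```

The lowest axis, operational maturity at 4, is the one the priority list targets first (backups, monitoring).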

Comparison

  • vs Stock Fedora desktop + GitHub: wins decisively (8 vs 3) on network/identity/AGPL discipline.
  • vs secureblue + GH Actions: stronger on server-side sovereignty; weaker on client posture and CI. Roughly tied overall, different axes.
  • vs Hetzner-VPS hobbyist stack: loses on resilience + update hygiene, wins on cost + GPU inference + identity depth. This stack more ambitious; Hetzner more boring-and-reliable.
  • vs Cloudflare/Workers managed: wins on sovereignty + GPU + Matrix ability. Loses on uptime + DDoS + zero-patching. This stack's whole reason to exist is the inverse tradeoff — and it makes that tradeoff coherently.


5 — Recommended service additions

Top 5 priority (deploy in this order)

| # | Service | Why now | Effort | Maintenance |
|---|---|---|---|---|
| 1 | Restic + autorestic | Single biggest gap. nullstone NVMe failure = total loss right now. Encrypted incremental to B2/Wasabi or to onyx. | M | S |
| 2 | Vaultwarden | N services with N storage methods for secrets. Centralize before count grows. | S | S |
| 3 | Gatus | Otherwise you find out about a downed service from a friend on Matrix. Cert-expiry alone catches the silent killer. Alerts via Tuwunel webhook or ntfy. | S | S |
| 4 | CrowdSec | Pi-hole only sees DNS layer. Public Matrix fed candidates + RC + Misskey + signup pages = HTTP attack surface. Bouncer plugin blocks at Traefik. | M | S |
| 5 | Beszel | Once Restic is filling disk + CrowdSec flagging IPs, you want one dashboard. | S | S |

Anti-recommendations

| Service | Why NOT |
|---|---|
| Nextcloud | Heavy (1.5 GB+ RAM idle), notorious upgrade pain. Use Seafile if you need files. |
| Full LGTM stack (Grafana+Prom+Loki+Alertmanager) | Five services to do what Beszel+Gatus do for solo-op. |
| Mastodon | You already run Misskey-fork. Federating two ActivityPub silos doubles moderation. |
| HashiCorp Vault | Complexity-to-benefit ratio terrible for one operator. Infisical or pass-with-git enough. |
| Authelia | Duplicates Authentik. Pick one. |

Consolidation suggestions

  • Cinny + various Element/Commet forks: pick one web client per Matrix instance. Each fork = separate audit + CSP + branding burden.
  • n8n: if only used for 2-3 simple flows, replace with shell scripts in Forgejo Actions cron. n8n's value is the GUI for non-coders; you're a coder.
  • Step-CA + Let's Encrypt: confirm zero overlap. If step-ca only issues one cert, kill it.
  • dl-veilor + signup pages: if static, fold into single Caddy container behind Traefik. Two containers for static HTML is two too many.

Other notable picks (lower priority)

  • Seafile CE — file sync (much lighter than Nextcloud)
  • Karakeep (formerly Hoarder) — bookmarks/RSS/read-later, AI tags via your local Ollama / friend RTX 4080
  • ntfy — formalize the push-notification target you're already using ad-hoc
  • Forgejo Packages — already implicit, just enable for container registry + npm/cargo/maven/generic

6 — Action items (severity-ordered)

Ship-blocking (do this week)

  1. Fix /opt/docker/backup.sh — remove dead matrix-postgres, correct rocketchat-mongodb container name, replace literal CHANGE_ME_MONGO_ADMIN_PASSWORD. Verify next 02:00 run produces non-zero RC + Mongo dumps.
  2. Bind anythingllm to LAN-only OR add traefik front with no-guest@file. Currently public on :3001.
  3. Verify no-guest@file ACL — confirm sourceRange covers LAN + tailnet, not just loopback. Verify XFF chain restores real client IP.
  4. Claim RocketChat first-admin — takeover risk until then.
  5. Enable LUKS2 on nullstone (F4 regression) — schedule reinstall window with TPM2 unlock; or until then, LUKS-on-file loopback for step-ca root key + acme.json + Mongo keyfile.
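Item 3 above has two parts: whether `sourceRange` admits the intended clients, and which address Traefik treats as the client when `ipStrategy.depth` is set. The sketch below models both in plain Python. It assumes the documented depth semantics (count from the right of `X-Forwarded-For`; an out-of-range depth yields no client IP); the LAN CIDR is a hypothetical placeholder, and `100.64.0.0/10` is assumed for the tailnet because the mesh addresses in this report fall inside it.

```python
import ipaddress
from typing import Optional

STUB_RANGES = ["127.0.0.0/8"]                        # the stub described in the risk register
INTENDED_RANGES = [
    "127.0.0.0/8",
    "192.168.1.0/24",   # ASSUMPTION: hypothetical LAN CIDR, replace with the real one
    "100.64.0.0/10",    # ASSUMPTION: tailnet range covering 100.64.0.x mesh addresses
]


def client_ip_from_xff(xff: str, depth: int) -> Optional[str]:
    """Pick the depth-th address counting from the right of X-Forwarded-For."""
    hops = [hop.strip() for hop in xff.split(",") if hop.strip()]
    if depth < 1 or depth > len(hops):
        return None  # modelled as "no client IP", which an allow-list rejects
    return hops[-depth]


def is_allowed(ip: str, source_ranges: list[str]) -> bool:
    """True when ip falls inside any CIDR in source_ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in source_ranges)


if __name__ == "__main__":
    # A tailnet client is rejected by the stub but accepted by the intended ranges.
    print(is_allowed("100.64.0.4", STUB_RANGES), is_allowed("100.64.0.4", INTENDED_RANGES))
    # Spoofing risk: the client sends "127.0.0.1", the proxy appends the real peer.
    header = "127.0.0.1, 198.51.100.7"
    print(client_ip_from_xff(header, 1), client_ip_from_xff(header, 2))
```

With one trusted proxy hop, a depth of 1 selects the address the proxy appended; a depth of 2 on the same header selects the client-supplied `127.0.0.1`, which the loopback-only stub would accept. That is the "accepts spoofed XFF" failure named in the risk register.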

High-value next (do this month)

  1. Deploy Restic + autorestic with B2/Wasabi target + restore drill.
  2. Deploy Vaultwarden + migrate secrets out of .env files.
  3. Deploy Gatus with cert-expiry checks + Matrix/ntfy alerts.
  4. Resolve sync-conflict files at ai-lab repo root.
  5. Pin docker images by digest for critical stacks (already done for Misskey; do tuwunel/livekit/cinny/pihole/RC/Traefik next).
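Progress on item 5 above can be tracked by scanning compose files for image references that are not digest-pinned. A minimal sketch that uses a regular expression instead of a YAML parser to stay dependency-free; it assumes one `image:` key per line, and the image names in the usage example are hypothetical.

```python
import re

# Matches "image: <ref>" lines, with optional quotes around the reference.
IMAGE_LINE = re.compile(r"^[ \t]*image:[ \t]*['\"]?([^'\"\s#]+)", re.MULTILINE)


def classify_image(ref: str) -> str:
    """Return 'digest', 'tag', or 'floating' for an image reference."""
    if "@sha256:" in ref:
        return "digest"
    last_component = ref.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
    if ":" not in last_component or last_component.endswith(":latest"):
        return "floating"  # no tag means an implicit :latest
    return "tag"


def unpinned_images(compose_text: str) -> list[tuple[str, str]]:
    """Return (reference, classification) for every image not pinned by digest."""
    return [
        (ref, classify_image(ref))
        for ref in IMAGE_LINE.findall(compose_text)
        if classify_image(ref) != "digest"
    ]


if __name__ == "__main__":
    example = "services:\n  web:\n    image: example/web:latest\n  db:\n    image: example/db:16.2\n"
    for ref, kind in unpinned_images(example):
        print(kind, ref)
```

Run across the compose files under `/opt/docker/<svc>/`, the `floating` entries are the ones the update-hygiene axis penalizes; `tag` entries are better but can still move if the upstream tag is re-pushed.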

Defer / planned

  • Office workstation install + first audit
  • Fold dl-veilor + signup pages into single Caddy
  • Replace n8n with Forgejo Actions cron (if usage <5 flows)
  • Move Headscale + step-ca to $4/mo VPS for SPOF mitigation

7 — File index

| Output | Path |
|---|---|
| This synthesis | ~/ai-lab/nullstone-server/audit-report-2026-05-05.md |
| Migration detail | ~/ai-lab/nullstone-server/forgejo/migration-report-2026-05-05.md |
| Forgejo runbook | ~/ai-lab/nullstone-server/forgejo/deploy-runbook.md |
| Forgejo memory | ~/.claude/projects/-home-admin-ai-lab/memory/project_forgejo_nullstone.md |
| veilor-os strategy | ~/ai-lab/_github/veilor-os/docs/STRATEGY.md |
| veilor-os roadmap | ~/ai-lab/_github/veilor-os/docs/ROADMAP.md |
| veilor-os threat model | ~/ai-lab/_github/veilor-os/docs/THREAT-MODEL.md |