infra/AUDIT-2026-05-05.md
s8n 09d80a63f6 init: nullstone deploys + runbooks + audits
Sourced from previous audits + agent-wave outputs (2026-05-05):
  AUDIT-2026-05-05.md           — 5-agent stack synthesis
  forgejo/DEPLOY.md             — git.s8n.ru deploy runbook
  forgejo/forgejo-compose.yml   — production compose
  forgejo/runner-compose.yml    — forgejo-runner
  forgejo/migration-report-...  — GH→Forgejo migration audit (6/6 green)
  runbooks/MIGRATION-...        — nullstone→cobblestone runbook
  runbooks/DE-DECISION-...      — keep-vs-strip DE on cobblestone
  repos/REPO-AUDIT-2026-05-05.md — repo trees + ownership
2026-05-06 10:02:28 +01:00


# 5-Agent Audit Report — 2026-05-05
Synthesis of the outputs from 5 parallel agents, covering: GitHub→Forgejo
migration, ai-lab structure, nullstone services, stack rating and
recommended service additions.
Source agent outputs:
1. Migration agent → `nullstone-server/forgejo/migration-report-2026-05-05.md`
2. ai-lab structural audit
3. nullstone services + deployment audit
4. Stack rating (10 axes)
5. Recommended service additions
---
## TL;DR
- **GH → Forgejo migration: complete.** 6/6 repos mirrored
(5× s8n-ru/* + veilor-org/veilor-os). All HEADs match, branches
match, tags match, push-mirrors back to GH all green. Repaired one
default-branch metadata drift on `s8n-ru/x`. Zero failures.
- **Stack rating: 7/10.** Above-average self-hosted setup. Audit
discipline + identity/CA story unusually strong. Fragile on
monitoring + offsite backup + single-host.
- **Top 5 weaknesses (severity-ordered):** F4 no LUKS on nullstone
(regression), no monitoring/alerting, backups local-only with
silently broken script, `:latest` floats on most stacks, single
point of failure (nullstone + home WAN).
- **Top 5 services to add (priority):** Restic+autorestic, Vaultwarden,
Gatus, CrowdSec, Beszel.
- **Top 4 anti-recommendations:** Nextcloud, full LGTM stack, Mastodon,
HashiCorp Vault.
---
## 1 — GitHub repo migration
**Status: complete.** Per migration agent's report.
- 6 repos enumerated under `s8n-ru` user + admin'd orgs.
- 6 mirrored to `git.s8n.ru` (Forgejo); 5 fresh, 1 already pre-migrated
(`veilor-org/veilor-os`).
- HEADs / branches / tags match GH for all 6.
- Push-mirrors Forgejo → GH configured (8h interval + sync-on-commit),
all green.
- One repair: `s8n-ru/x` default branch was stuck on
`KisaragiEffective-patch-1` from Misskey upstream; PATCHed to
`master`.
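
The default-branch repair is a single repo-edit API call. This is a minimal sketch, assuming it went through Forgejo's Gitea-compatible `PATCH /api/v1/repos/{owner}/{repo}` endpoint with a personal access token (the report only says "PATCHed"). The helper name and the `<PAT>` value are placeholders. It builds the request without sending it, so it can be inspected first; passing the result to `urllib.request.urlopen` would apply it.

```python
import json
import urllib.request

FORGEJO_URL = "https://git.s8n.ru"  # the Forgejo instance from this report


def build_default_branch_patch(owner: str, repo: str, branch: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a repo-edit request that sets `default_branch`.

    Hypothetical helper; assumes the Gitea-compatible Forgejo API and PAT auth.
    """
    body = json.dumps({"default_branch": branch}).encode("utf-8")
    return urllib.request.Request(
        f"{FORGEJO_URL}/api/v1/repos/{owner}/{repo}",
        data=body,
        method="PATCH",
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
    )


# Inspect before sending; urllib.request.urlopen(req) would apply the change.
req = build_default_branch_patch("s8n-ru", "x", "master", token="<PAT>")
print(req.get_method(), req.full_url)
```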
Detail: `nullstone-server/forgejo/migration-report-2026-05-05.md`.
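
You can re-check the "HEADs / branches / tags match" result per repo by diffing `git ls-remote` output from both remotes. This is a sketch and not necessarily the migration agent's method. `--heads --tags` restricts the listing to branches and tags, so GitHub-only refs such as `refs/pull/*` don't appear as false mismatches. The helper names are hypothetical, and the clone URLs in the example comment assume the usual `host/owner/repo.git` form.

```python
import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> object SHA from `git ls-remote` output ("<sha>\t<ref>" per line)."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split("\t", 1)
            refs[ref.strip()] = sha.strip()
    return refs


def ls_remote(url: str) -> dict[str, str]:
    """Branches and tags only; skips GitHub-specific refs such as refs/pull/*."""
    out = subprocess.run(
        ["git", "ls-remote", "--heads", "--tags", url],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out)


def diff_refs(github: dict[str, str], forgejo: dict[str, str]) -> dict[str, list[str]]:
    """Refs missing on either side, plus refs on both sides that point at different objects."""
    return {
        "missing_on_forgejo": sorted(github.keys() - forgejo.keys()),
        "missing_on_github": sorted(forgejo.keys() - github.keys()),
        "sha_mismatch": sorted(r for r in github.keys() & forgejo.keys() if github[r] != forgejo[r]),
    }


# Example (needs network access; all three lists should come back empty):
# print(diff_refs(ls_remote("https://github.com/s8n-ru/x.git"),
#                 ls_remote("https://git.s8n.ru/s8n-ru/x.git")))
```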
---
## 2 — ai-lab structural audit
### Devices
| codename | type | OS | role |
|---|---|---|---|
| onyx | laptop | Fedora 43 KDE | Dev workstation (DHCP `.28`, registry says `.6` — drift) |
| nullstone | server | Debian 13 | Infra host — Docker stack, mesh, Matrix/Misskey/RC |
| office | workstation | Fedora 43 KDE (pending install since 2026-04-19) | Office/sales (.5) |
External: friend PC `100.64.0.3` (RTX 4080, vLLM in WSL2).
### Active projects (`_github/`)
| repo | purpose | status |
|---|---|---|
| veilor-os | Hardened Fedora 43 KDE remix | actively iterating, BlueBuild + kickstart |
| auth-limbo | Paper plugin (racked.ru AuthMe fix) | active, released jars |
| minecraft-launcher | Custom MC launcher (PrismLauncher fork) | active, v1 build script |
| minecraft-server | Purpur MC at `mc.racked.ru:25565` | live in prod |
| minecraft-client | racked.ru MC client (FO 11.3.2 fork) | active |
### Per-device security audit cadence
| device | last audit | folder |
|---|---|---|
| nullstone | 2026-05-05 (ACL hardening); full 2026-05-02 | `security/nullstone-server/` (9 reports) |
| onyx | 2026-04-15 | `security/onyx-laptop/` (2 reports) |
| office | never | `security/office-workstation/` (empty) |
### Memory record (31 files, 1 index)
- 2 user, 7 feedback, 1 reference, 21 project memos.
- Top-active: matrix_veilor, txt_cinny, x_misskey_fork, tailscale_mesh,
friend_gpu, org_charter, brand_separation, simplex_org_chat.
### What this lab is
The operator runs a small home-lab/3-member CTO-style org
(`P M=CTO, nullstone=Runtime Owner, onyx-ai=Research/Review`) split
cleanly across **two brands** (per `project_brand_separation.md`):
1. **racked.ru** — privacy-first Minecraft platform (MC server +
client + custom launcher + AuthLimbo plugin)
2. **veilor** — security company stack (veilor-os hardened Fedora
ISO, veilor-server-bootstrap Debian preseed, Matrix at veilor.uk,
Misskey-fork at x.veilor)
All self-hosted on nullstone behind Traefik+Headscale+Pi-hole. Mesh
includes friend's RTX 4080 for remote LLM inference via Tailscale.
### Drift / gaps
- `office-workstation/` registered in CLAUDE.md but install pending
since 2026-04-19; no audit folder populated.
- README onyx IP `.6` vs actual DHCP `.28`.
- README folder tree doesn't match real repo (lists `_project_code/`
+ `scripts/`; reality has `_github/`, `_projects/`, `_archive/`,
`archive/`, `github/`, several `.sync-conflict-*` files, 30 MB
binary `re` at root).
- Two parallel `nullstone-server/` and `server/` device folders —
drift candidate.
- `MEMORY.md` index missing entry for `project_forgejo_nullstone.md`
(file present, index not updated).
- Sync-conflict files for CLAUDE.md / README.md / SYSTEM.md from
Syncthing merge never resolved.
- SYSTEM.md still lists Jitsi/coturn and the MAS/Element X test, all
retired per `project_matrix_veilor.md`; TODO list not pruned.
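
Two of these drift items (unresolved Syncthing conflict copies and the stale `MEMORY.md` index) are mechanical enough to check on every audit. This sketch assumes Syncthing's `<name>.sync-conflict-<date>-<time>-<device><ext>` naming and that the index references memos by filename. The helper names are hypothetical. Point it at `~/ai-lab` and at the memory directory listed in the file index in section 7.

```python
from pathlib import Path


def find_sync_conflicts(root: Path) -> list[Path]:
    """Syncthing conflict copies are named '<stem>.sync-conflict-<date>-<time>-<device><ext>'."""
    return sorted(p for p in root.rglob("*.sync-conflict-*") if p.is_file())


def unindexed_memos(memory_dir: Path, index_name: str = "MEMORY.md") -> list[str]:
    """Memo files whose filename never appears anywhere in the index file."""
    index_text = (memory_dir / index_name).read_text(encoding="utf-8")
    return sorted(
        p.name for p in memory_dir.glob("*.md")
        if p.name != index_name and p.name not in index_text
    )
```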
---
## 3 — nullstone services + deployment audit
### Hardware
- **CPU:** AMD Ryzen 5 2600X (6c/12t)
- **RAM:** 32 GiB (15 GiB used, 15 GiB free); swap 24 GiB (256 KiB used)
- **GPU:** GTX 1660 Ti 6 GB (Ollama)
- **Disk:** 477 GiB NVMe, LVM (`keystone-vg`):
  - root 30 G (35% used)
  - var 12 G (15%)
  - **home 399 G (60%, 227 G used / 153 G free)**, watch growth
  - tmp 2.7 G, swap 24 G
- **OS:** Debian 13, kernel 6.12.85+deb13
- **Docker:** v29.4.2, overlay2, **userns-remap=default**,
live-restore=true, icc=false, no-new-privileges=true. Data root
symlinked `/var/lib/docker → /home/user/docker-data`.
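
Re-checking the daemon settings above on each audit means a silent revert shows up as drift. This is a minimal sketch. It assumes these options are set in `/etc/docker/daemon.json`; they can also be passed as `dockerd` flags, which this check would not see. The key names are the standard `daemon.json` ones. Feed it the file contents, e.g. `daemon_drift(open("/etc/docker/daemon.json").read())`. An empty dict means no drift.

```python
import json

# Hardening baseline as recorded in this audit (section 3, Hardware).
EXPECTED = {
    "userns-remap": "default",
    "live-restore": True,
    "icc": False,
    "no-new-privileges": True,
}


def daemon_drift(config_text: str) -> dict[str, tuple[object, object]]:
    """Return {key: (expected, actual)} for each baseline setting that differs or is absent."""
    config = json.loads(config_text)
    return {
        key: (want, config.get(key))
        for key, want in EXPECTED.items()
        if config.get(key) != want
    }
```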
### Active services (28 containers)
Including: traefik, socket-proxy, authentik (server+worker+pg+redis),
forgejo + forgejo-runner, misskey + db + redis, x-source nginx,
rocketchat + mongodb, tuwunel + tuwunel-txt, cinny-txt, commet-web,
signup-page + signup-txt, livekit + lk-jwt-service, dl-veilor, pihole,
headscale, n8n + postgres, step-ca, filebrowser-mc, minecraft-mc,
anythingllm, plus 2 stale `alpine:3` shells from userns-host bypass.
### Domain → service map (`*.s8n.ru`, `*.veilor[.uk]` and `dl.veilor.org`)
- `sys.s8n.ru` (traefik dash)
- `git.s8n.ru` (forgejo, NEW)
- `auth.s8n.ru` (authentik)
- `pihole.s8n.ru`
- `signup.txt.s8n.ru`
- `hs.s8n.ru` (headscale)
- `rc.s8n.ru` (rocketchat)
- `n8n.s8n.ru`
- `txt.s8n.ru` (cinny)
- `mx.s8n.ru` (tuwunel-txt)
- `x.veilor` (misskey)
- `matrix.veilor.uk`
- `chat.veilor.uk` (commet)
- `livekit.veilor.uk`
- `signup.veilor.uk`
- `dl.veilor.org`
### Deployment patterns
- Compose: `/opt/docker/<svc>/docker-compose.yml`
- Data: named docker volumes under
`/home/user/docker-data/100000.100000/volumes/` + per-service
bind mounts. Newer services (forgejo, forgejo-runner, minecraft)
keep their data under `/home/docker/<svc>/` to stay off the 30 G root LV.
- userns-remap quirk: container UIDs are shifted by +100000 on the host.
Workaround: use a root alpine container, or chown to 101000 (container UID 1000).
- Docker socket exposure: traefik does NOT mount docker.sock; goes
via tecnativa/docker-socket-proxy on socket-proxy-net.
- Networks: `proxy` + `socket-proxy-net` + `misskey-frontend` +
per-stack internals (authentik-internal, misskey-internal, etc.).
- Middleware chain: `trusted-only@file → security-headers@file
→ rate-limit@file → <service-specific>` with `no-guest@file`
for routers needing tailnet+LAN but blocking public.
### Auth patterns
- **Authentik (auth.s8n.ru)** — central OIDC, all 4 components healthy.
**Currently mostly unwired.** Forgejo runs native auth (no OAUTH
section in app.ini). RC, n8n, anythingllm, filebrowser likely
local-auth too. Authentik present but underused.
- **Forgejo** — local users + PAT, admin `s8n-ru`, SSH 222.
- **Headscale** — preauthkey enrollment + `headscale-deny-leaks@file`.
- **Traefik dashboard** — basicauth + trusted-only@file.
### Backup state
- `/etc/cron.d/docker-backup` runs `/opt/docker/backup.sh` at 02:00
daily, 7-day rotation to `/opt/backups/`.
- **Script silently broken (HIGH):** matrix-postgres container is
gone (Synapse retired); rocketchat-mongodb name mismatch (script
expects `mongodb`); Mongo password reads literal
`CHANGE_ME_MONGO_ADMIN_PASSWORD`. So Rocket.Chat + (former) Matrix
dumps **not happening**. Misskey side-script works.
- **No off-host replication.** Single NVMe = total loss on disk
failure.
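
All three failures above are detectable without waiting for a 02:00 run. The approach is to compare the containers the script `docker exec`s into against what is actually running, and to grep for unreplaced placeholders. This sketch uses hypothetical helper names and only understands literal container names (not `$VARS`). The running set would come from `docker ps --format '{{.Names}}'`.

```python
import re
import shlex

PLACEHOLDER = re.compile(r"CHANGE_ME_[A-Z0-9_]+")
# docker exec flags that consume the following token as their value
FLAGS_WITH_VALUE = {"-e", "--env", "--env-file", "-u", "--user", "-w", "--workdir"}


def exec_targets(script: str) -> set[str]:
    """Container names passed to `docker exec` in a shell script (literal names only)."""
    targets = set()
    for line in script.splitlines():
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:  # unbalanced quotes: skip the line rather than guess
            continue
        for i in range(len(tokens) - 1):
            if tokens[i] == "docker" and tokens[i + 1] == "exec":
                j = i + 2
                while j < len(tokens) and tokens[j].startswith("-"):
                    j += 2 if tokens[j] in FLAGS_WITH_VALUE else 1
                if j < len(tokens):
                    targets.add(tokens[j])
    return targets


def audit_backup_script(script: str, running: set[str]) -> list[str]:
    """Placeholder secrets, then exec targets that are not currently running."""
    problems = [f"placeholder secret: {p}" for p in sorted(set(PLACEHOLDER.findall(script)))]
    problems += [f"docker exec target not running: {c}" for c in sorted(exec_targets(script) - running)]
    return problems
```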
### Drift / risk register
- 🔴 Backup script broken (RC + ex-Matrix not dumping)
- 🔴 `anythingllm` listens on 0.0.0.0:3001 with no traefik labels, so it
bypasses the entire L7 trust model. Either bind it LAN-only or front it
via traefik.
- 🟠 Resource limits: only minecraft-mc has memory/CPU limits.
Every other container is unbounded; a runaway can OOM-kill neighbours.
- 🟠 No service-level health checks on ~half the containers.
- 🔴 `no-guest@file` IPAllowList stub: declares only
`sourceRange: ["127.0.0.0/8"]`. Routers chained with `no-guest`
reject everything except loopback unless XFF restores client IP.
**Verify** entryPoint forwardedHeaders.trustedIPs + middleware
ipStrategy.depth — misconfig either 403s real users or accepts
spoofed XFF.
- 🟡 office (100.64.0.4) not in `trusted-only@file` despite
`tag:infra` per SYSTEM.md.
- 🟠 RocketChat: first-admin setup still pending — wizard endpoint
takeover risk until claimed.
- 🟡 Stale `alpine:3` shell containers (userns-host bypass leftovers).
`docker rm -f` after each one-shot.
- 🟡 Archived compose dirs (`pocket-id.archived-*`, `matrix-old`)
contain secrets — move off docker tree.
- 🟡 `/home` 60% with growing volumes (Ollama, mongo, postgres ×3).
No quotas.
### Mem pressure: none right now
Top consumer minecraft 9.35 / 18 GiB cap (52% of cap, ~30% host).
All others < 2.2% of host RAM. Headroom good.
---
## 4 — Stack rating (10 axes)
| Axis | Score | Top weakness |
|---|---|---|
| Architectural coherence | 8 | Drift artifacts (sync-conflict files, parallel `_archive`/`archive`) |
| Security posture | 7 | F4 no LUKS on server (regression); F30 ip_forward=1; F12 partial revert |
| Reproducibility | 6 | Most stacks on `:latest`; no IaC; admin bootstrap uncoded |
| Operational maturity | **4** | **No metrics/alerts; backups untested; on-call="user reads logs"** |
| Cost discipline | 9 | Single residential ISP + single home server is "cheap because fragile" |
| Threat model clarity | 6 | No written THREAT_MODEL.md; AGPL §13 source endpoint deferred |
| Update hygiene | 5 | `:latest` floats; no staged rollout; recovery = "edit compose, restart" |
| Documentation quality | 8 | SYSTEM.md is 979 lines; CV + team-msg.txt + sync-conflicts in repo root |
| Network resilience | 5 | Single residential WAN; control + data plane same box; no Tor/SOCKS fallback |
| Branding/product discipline | 9 | "X" rebrand close to veilor easy to confuse in logs/docs |
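
The report doesn't say how the overall score was derived. As a cross-check only, the unweighted mean of the ten axis scores is 6.7, which rounds to the stated 7/10. The weakest axis is operational maturity.

```python
from statistics import mean

# Axis scores copied from the table above.
SCORES = {
    "Architectural coherence": 8,
    "Security posture": 7,
    "Reproducibility": 6,
    "Operational maturity": 4,
    "Cost discipline": 9,
    "Threat model clarity": 6,
    "Update hygiene": 5,
    "Documentation quality": 8,
    "Network resilience": 5,
    "Branding/product discipline": 9,
}

avg = mean(SCORES.values())
weakest = min(SCORES, key=SCORES.get)
print(f"mean {avg:.1f} -> {round(avg)}/10, weakest: {weakest}")
```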
### Overall: **7/10**
Above-average self-hosted stack. Better-documented than 90% of
homelabs, with audit discipline most small SaaS shops don't reach,
and a coherent identity/CA story (own root CA via step-ca, own VPN
control plane via Headscale, own Matrix homeserver). Loses points on
operational maturity (no monitoring, no offsite/tested backups, no
rollback), one critical regression (no LUKS on nullstone), and
inherent fragility from single-host single-ISP design.
The gap between **known weaknesses** and **fixed weaknesses** is the
limiting factor: the operator clearly *can* fix these (the audit closed
27/35 findings in 3 days); they just haven't yet.
### Comparison
- vs **Stock Fedora desktop + GitHub:** wins decisively (8 vs 3) on
network/identity/AGPL discipline.
- vs **secureblue + GH Actions:** stronger on server-side sovereignty;
weaker on client posture and CI. Roughly tied overall, different axes.
- vs **Hetzner-VPS hobbyist stack:** loses on resilience + update
hygiene, wins on cost + GPU inference + identity depth. This stack
more ambitious; Hetzner more boring-and-reliable.
- vs **Cloudflare/Workers managed:** wins on sovereignty + GPU + Matrix
ability. Loses on uptime + DDoS + zero-patching. This stack's whole
reason to exist is the inverse tradeoff and it makes that tradeoff
coherently.
---
## 5 — Recommended service additions
### Top 5 priority (deploy in this order)
| # | Service | Why now | Effort | Maintenance |
|---|---|---|---|---|
| 1 | **Restic + autorestic** | Single biggest gap. nullstone NVMe failure = total loss right now. Encrypted incremental to B2/Wasabi or to onyx. | M | S |
| 2 | **Vaultwarden** | N services with N storage methods for secrets. Centralize before count grows. | S | S |
| 3 | **Gatus** | Otherwise you find out about a downed service from a friend on Matrix. Cert-expiry alone catches the silent killer. Alerts via Tuwunel webhook or ntfy. | S | S |
| 4 | **CrowdSec** | Pi-hole only sees DNS layer. Public Matrix fed candidates + RC + Misskey + signup pages = HTTP attack surface. Bouncer plugin blocks at Traefik. | M | S |
| 5 | **Beszel** | Once Restic is filling disk + CrowdSec flagging IPs, you want one dashboard. | S | S |
### Anti-recommendations
| Service | Why NOT |
|---|---|
| **Nextcloud** | Heavy (1.5 GB+ RAM idle), notorious upgrade pain. Use Seafile if you need files. |
| **Full LGTM stack** (Grafana+Prom+Loki+Alertmanager) | Five services to do what Beszel+Gatus do for solo-op. |
| **Mastodon** | You already run Misskey-fork. Federating two ActivityPub silos doubles moderation. |
| **HashiCorp Vault** | Complexity-to-benefit ratio terrible for one operator. Infisical or pass-with-git enough. |
| **Authelia** | Duplicates Authentik. Pick one. |
### Consolidation suggestions
- **Cinny + various Element/Commet forks:** pick **one** web client
per Matrix instance. Each fork = separate audit + CSP + branding burden.
- **n8n:** if only used for 2-3 simple flows, replace with shell
scripts in Forgejo Actions cron. n8n's value is the GUI for
non-coders; you're a coder.
- **Step-CA + Let's Encrypt:** confirm zero overlap. If step-ca only
issues one cert, kill it.
- **dl-veilor + signup pages:** if static, fold into single Caddy
container behind Traefik. Two containers for static HTML is two
too many.
### Other notable picks (lower priority)
- **Seafile CE**: file sync (much lighter than Nextcloud)
- **Karakeep** (formerly Hoarder): bookmarks/RSS/read-later, AI tags
via your local Ollama / friend RTX 4080
- **ntfy**: formalize the push-notification target you're already
using ad-hoc
- **Forgejo Packages**: already implicit; just enable it for container
registry + npm/cargo/maven/generic
---
## 6 — Action items (severity-ordered)
### Ship-blocking (do this week)
1. **Fix `/opt/docker/backup.sh`:** remove the dead matrix-postgres step,
correct the rocketchat-mongodb container name, and replace the literal
`CHANGE_ME_MONGO_ADMIN_PASSWORD`. Verify the next 02:00 run produces
non-zero RC + Mongo dumps.
2. **Bind anythingllm to LAN-only** OR add a traefik front with
`no-guest@file`. Currently public on :3001.
3. **Verify `no-guest@file` ACL:** confirm `sourceRange` covers
LAN + tailnet, not just loopback. Verify the XFF chain restores the
real client IP.
4. **Claim RocketChat first-admin:** takeover risk until then.
5. **Enable LUKS2 on nullstone** (F4 regression): schedule a reinstall
window with TPM2 unlock; until then, use a LUKS-on-file loopback
for the step-ca root key + acme.json + Mongo keyfile.
### High-value next (do this month)
6. Deploy **Restic + autorestic** with B2/Wasabi target + restore drill.
7. Deploy **Vaultwarden** + migrate secrets out of `.env` files.
8. Deploy **Gatus** with cert-expiry checks + Matrix/ntfy alerts.
9. Resolve **sync-conflict files** at ai-lab repo root.
10. **Pin docker images by digest** for critical stacks (already done
for Misskey; do tuwunel/livekit/cinny/pihole/RC/Traefik next).
### Defer / planned
- Office workstation install + first audit
- Fold dl-veilor + signup pages into single Caddy
- Replace n8n with Forgejo Actions cron (if usage <5 flows)
- Move Headscale + step-ca to $4/mo VPS for SPOF mitigation
---
## 7 — File index
| Output | Path |
|---|---|
| This synthesis | `~/ai-lab/nullstone-server/audit-report-2026-05-05.md` |
| Migration detail | `~/ai-lab/nullstone-server/forgejo/migration-report-2026-05-05.md` |
| Forgejo runbook | `~/ai-lab/nullstone-server/forgejo/deploy-runbook.md` |
| Forgejo memory | `~/.claude/projects/-home-admin-ai-lab/memory/project_forgejo_nullstone.md` |
| veilor-os strategy | `~/ai-lab/_github/veilor-os/docs/STRATEGY.md` |
| veilor-os roadmap | `~/ai-lab/_github/veilor-os/docs/ROADMAP.md` |
| veilor-os threat model | `~/ai-lab/_github/veilor-os/docs/THREAT-MODEL.md` |