infra: STATE.md + cobblestone intake template

STATE.md = source-of-truth for current state + pending decisions.
Append to changelog when state changes. Don't rewrite history.

COBBLESTONE-INTAKE.md = template the operator fills before agent A2
runs the cobblestone audit. Captures network/SSH/hardware/OS/docker
state + operator-driven migration decisions (LUKS, DE, userns-remap,
RC revive-or-retire, Headscale SPOF, cockpit).
s8n 2026-05-06 10:12:50 +01:00
parent 09d80a63f6
commit f59a6e90d0
2 changed files with 224 additions and 0 deletions

STATE.md

@@ -0,0 +1,145 @@
# Infra state — 2026-05-06
Source-of-truth for **what is true now** + **what is pending**.
When state changes, append to top of "Changelog" and edit the
relevant table/section. Don't rewrite history.
## Forge
**Primary git host: <https://git.s8n.ru/> (Forgejo).** GitHub is a
push-mirror only since 2026-05-05. When the operator says "my git",
they mean Forgejo.
- Forgejo: <https://git.s8n.ru/> (LE cert, `no-guest@file` ACL)
- Forgejo SSH: `ssh://git@192.168.0.100:222/<owner>/<repo>.git`
(LAN only; router port-forward 222 not yet configured)
- Admin user: `s8n-ru` (NOT `admin` — reserved by Forgejo)
- Push-mirror to GH: every commit + 8h interval, all repos green
- Forgejo runner: registered on nullstone, labels `ubuntu-24.04` and
`nullstone` (privileged Fedora 43 for ISO builds)
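A minimal sketch of building the LAN-only SSH remote from the pattern listed above. The helper name `forgejo_ssh_url` is hypothetical, and `veilor-org/infra` is used only because it is one of the repos named in this file.

```shell
# forgejo_ssh_url OWNER REPO: print the Forgejo SSH remote URL.
# Hypothetical helper; host and port are the LAN values documented above.
forgejo_ssh_url() {
    owner=$1
    repo=$2
    printf 'ssh://git@192.168.0.100:222/%s/%s.git\n' "$owner" "$repo"
}

# Example, run inside an existing clone while on the LAN:
#   git remote set-url origin "$(forgejo_ssh_url veilor-org infra)"
#   git remote get-url origin
```

This only works from the LAN until the router port-forward for 222 exists.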
## Hosts
| codename | role | LAN IP | OS | LUKS | Status |
|---|---|---|---|---|---|
| onyx | dev workstation | 192.168.0.28 (DHCP, registry says .6 — drift) | Fedora 43 KDE | yes | active |
| nullstone | infra (migrating off) | 192.168.0.100 | Debian 13 | **NO** ⚠️ | active until cutover |
| office | workstation | 192.168.0.5 | Fedora 43 KDE (pending install since 2026-04-19) | tbd | not yet on net |
| **cobblestone** | **infra (target)** | **TBD** | **Debian, has DE** | **TBD — install with LUKS** | **fresh, awaiting access details** |
Mesh:
- Tailscale + Headscale (`hs.s8n.ru` on nullstone) — control plane
moves to cobblestone with the migration. Identity continuity =
carry `/var/lib/tailscale/state` OR re-enroll.
- Friend PC (`100.64.0.3`, RTX 4080) — vLLM in WSL2 over tailnet
for remote LLM inference.
## Repos (8 total)
| Repo | Owner | Forgejo | GH mirror | Notes |
|---|---|---|---|---|
| veilor-os | veilor-org | ✅ primary | ✅ | hardened Fedora KDE remix |
| veilor-server | veilor-org | ✅ primary | ✅ NEW | Debian preseed bootstrap |
| infra | veilor-org | ✅ primary | ✅ NEW | this repo |
| x | s8n-ru | ✅ primary | ✅ | private Misskey fork |
| minecraft-launcher | s8n-ru | ✅ primary | ✅ | racked.ru launcher |
| minecraft-server | s8n-ru | ✅ primary | ✅ | racked.ru MC server |
| minecraft-client | s8n-ru | ✅ primary | ✅ | racked.ru MC client config |
| auth-limbo | s8n-ru | ✅ primary | ✅ | Paper plugin (AuthMe fix) |
**No repos on GH that aren't mirrored from Forgejo.**
⚠️ **`racked-team` GH org does NOT exist** per `gh api`. Memory says
it's the Minecraft brand org — drift to reconcile. Either:
- Move all `s8n-ru/minecraft-*` repos under `racked-team` org (create
it, transfer)
- OR drop the `racked-team` mention from memory (it was aspirational)
## Service inventory (nullstone, current)
28 active containers. Categorized:
```
MESH         headscale, pihole
GIT          forgejo, forgejo-runner
IDENTITY     authentik-server, -worker, -postgres, -redis, step-ca
CHAT         tuwunel (matrix.veilor.uk), tuwunel-txt (mx.s8n.ru),
             cinny-txt, commet-web, signup-page, signup-txt,
             livekit-server, lk-jwt-service
SOCIAL       misskey, misskey-db, misskey-redis, x-source nginx
ADMIN        traefik, socket-proxy
AUTOMATION   n8n-n8n-1, n8n-postgres
HOST APPS    minecraft-mc, anythingllm, dl-veilor, filebrowser-mc
DOWN         rocketchat, rocketchat-mongodb (volumes preserved)
EPHEMERAL    alpine:3 shells (userns-host bypass leftovers; clean up)
```
## Pending decisions (waiting on operator)
| Decision | Recommendation | Status |
|---|---|---|
| Cobblestone IP + SSH access | hand over from operator | ⏳ blocked |
| Cobblestone hardware specs | hand over from operator | ⏳ blocked |
| LUKS on cobblestone | **mandatory** (fixes F4) | ⏳ blocked on access |
| DE on cobblestone | **30-day soak then strip**; install cockpit today | ⏳ runbook drafted |
| userns-remap on cobblestone | **drop** (simpler bind-mounts; lose 1 layer defense) | ⏳ runbook drafted |
| Headscale + step-ca SPOF mitigation | phase-2: move to $4/mo VPS | ⏳ deferred |
| RocketChat revive or retire | 30-day timer; if unused, retire and free volumes | ⏳ stopped 2026-05-06 |
| anythingllm public binding | bind LAN-only or front via traefik+no-guest | ⏳ open issue |
| /opt/docker/backup.sh fixes | matrix-postgres + rocketchat-mongodb + literal CHANGE_ME pw | ⏳ open issue |
| `no-guest@file` ACL config | populate sourceRange beyond loopback; verify XFF chain | ⏳ open issue |
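The anythingllm row can be verified before and after the fix by checking what the port is actually bound to. A minimal sketch, assuming `ss -ltn` output with the local address in `addr:port` form; `binds_all_interfaces` is a hypothetical helper name, and port 3001 comes from the audit list in this file.

```shell
# binds_all_interfaces PORT: read `ss -ltn` output on stdin and succeed
# if PORT is listening on a wildcard address (0.0.0.0, [::] or *).
# Hypothetical helper; it only pattern-matches the text it is given.
binds_all_interfaces() {
    port=$1
    grep -Eq "(^|[[:space:]])(0\.0\.0\.0|\[::\]|\*):${port}([[:space:]]|$)"
}

# Example, on nullstone:
#   ss -ltn | binds_all_interfaces 3001 && echo "3001 is publicly bound"
```

A LAN-only bind such as `192.168.0.100:3001` makes the function return non-zero.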
## Pending audits / ratings (from 5-agent wave)
Stack rating: **7/10** ([AUDIT-2026-05-05.md](./AUDIT-2026-05-05.md)).
Top 5 weaknesses (severity):
1. 🔴 No LUKS on nullstone (regression)
2. 🔴 backup.sh broken silently (RC + ex-Matrix not dumping)
3. 🔴 no-guest@file stub (loopback-only sourceRange)
4. 🔴 anythingllm public on 0.0.0.0:3001
5. 🟠 No off-host backup replication (single-NVMe SPOF)
Top 5 services to add (priority order):
1. Restic + autorestic → B2/Wasabi (encrypted, dedup, incremental)
2. Vaultwarden (centralize secrets out of `.env` files)
3. Gatus (uptime + cert-expiry; alerts via Tuwunel/ntfy)
4. CrowdSec (HTTP/SSH layer block at Traefik)
5. Beszel (lightweight observability)
## Pending tracked work
### v0.5.32 ship (veilor-os)
Per `_github/veilor-os/docs/ROADMAP.md`. The last CI attempt failed on a
GH runner shortage; flip the workflow to `runs-on: nullstone` to use the
Forgejo runner instead.
### v0.7 BlueBuild spike (veilor-os)
Branch: `v0.7-bluebuild-spike` on Forgejo. Recipe ready, kickstart
ready, GH Actions wired (won't trigger now since main host moved).
Adapt to Forgejo Actions — should be drop-in with `runs-on:
ubuntu-24.04` since runner has that label.
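Both items above come down to changing the `runs-on:` value in a workflow file. A minimal sketch, assuming a `sed` that supports `-E`; `set_runs_on` is a hypothetical helper, and the workflow path is passed as an argument because this file does not name it.

```shell
# set_runs_on FILE LABEL: rewrite every `runs-on:` line in FILE to LABEL.
# Hypothetical helper. LABEL must not contain "/" (the sed delimiter).
set_runs_on() {
    file=$1
    label=$2
    tmp=$(mktemp) || return 1
    # Keep the original indentation and key; replace only the value.
    if sed -E "s/^([[:space:]]*runs-on:).*/\1 ${label}/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example: set_runs_on path/to/workflow.yml nullstone
```

Review the resulting diff before committing, since every job in the file is switched to the same label.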
## Changelog
### 2026-05-06
- Created `veilor-org/infra` Forgejo repo + GH push-mirror
- Stopped RocketChat (`docker compose stop`); volumes preserved
- 5-agent stack audit shipped (`AUDIT-2026-05-05.md`)
- Cobblestone deployed (fresh Debian + DE) — awaiting access details
- This STATE.md created
### 2026-05-05
- Forgejo + forgejo-runner deployed on nullstone at git.s8n.ru
- 6 GH repos migrated to Forgejo with push-mirrors back to GH
- Admin pw rotated; SSH key for s8n-ru added; PAT generated
- veilor-os v0.5.31 four-bug fix shipped
- 9-agent research wave on veilor-os v0.5.32 blockers
- secureblue layering strategy locked (`STRATEGY.md`)
- THREAT-MODEL.md drafted
### 2026-05-04 (and earlier)
- See `_github/veilor-os/docs/ROADMAP.md` "Lessons learned" section
- See `~/.claude/projects/-home-admin-ai-lab/memory/MEMORY.md` for
per-project memos

COBBLESTONE-INTAKE.md

@@ -0,0 +1,79 @@
# Cobblestone intake — operator hand-off
When operator brings cobblestone online for migration prep, fill in
this template, then unblock agent A2 (cobblestone audit).
## Network
| Field | Value | Notes |
|---|---|---|
| LAN IP | TBD | static recommended; reservation in router OR static `/etc/network/interfaces` |
| Hostname | `cobblestone` | matches CLAUDE.md device registry |
| Tailscale IP | TBD (when joined) | preserve via `/var/lib/tailscale/state` carry-over OR re-enroll |
| MAC | TBD | |
| Router port-forwards | TBD: 80, 443, 25565, ?222 | `222` for Forgejo SSH (long-deferred fix from nullstone era) |
## SSH
| Field | Value |
|---|---|
| Default user | TBD (Debian default = first-install user) |
| ssh key from onyx authorized? | TBD (if no, run `ssh-copy-id <user>@<ip>`) |
| sshd config | TBD (hardened? nullstone pattern: `AllowUsers user`, no root, no pw auth) |
After hand-over, add to `~/.ssh/config` on onyx:
```
Host cobblestone
    HostName <IP>
    User user
    IdentityFile ~/.ssh/id_ed25519
```
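The stanza above can also be appended by script so that repeated runs do not duplicate it. A minimal sketch; `add_cobblestone_host` is a hypothetical helper, and the IP is an argument because it is still TBD.

```shell
# add_cobblestone_host CONFIG IP: append the Host block once.
# Hypothetical helper; User and IdentityFile mirror the stanza above.
add_cobblestone_host() {
    config=$1
    ip=$2
    # Do nothing if a "Host cobblestone" stanza already exists.
    if [ -f "$config" ] &&
        grep -Eq '^[[:space:]]*Host[[:space:]]+cobblestone[[:space:]]*$' "$config"; then
        return 0
    fi
    cat >> "$config" <<EOF
Host cobblestone
    HostName $ip
    User user
    IdentityFile ~/.ssh/id_ed25519
EOF
}

# Example, on onyx once the IP is known:
#   add_cobblestone_host ~/.ssh/config <IP>
#   ssh cobblestone true
```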
## Hardware
| Field | Value |
|---|---|
| CPU | TBD (model + cores) |
| RAM | TBD (GB) |
| Disk(s) | TBD (NVMe? SATA SSD? size?) |
| GPU | TBD (none / iGPU / discrete) |
| TPM2 chip | TBD (`ls /dev/tpm*`) |
## OS state
| Field | Value |
|---|---|
| Debian version | TBD (`cat /etc/debian_version`) |
| Kernel | TBD (`uname -r`) |
| LUKS at install | TBD (`lsblk -f` looking for `crypto_LUKS`) ⚠️ |
| Desktop env | TBD (XFCE / GNOME / KDE / MATE / Cinnamon) |
| Display manager | TBD (`systemctl status display-manager`) |
⚠️ **If LUKS=NO at install**: see [DE-DECISION-cobblestone.md](DE-DECISION-cobblestone.md)
section "post-install LUKS-on-file fallback". Better to reinstall
with LUKS2 from scratch — this is the F4 regression fix.
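The LUKS check from the table above can be scripted. A minimal sketch, assuming `lsblk -f` reports `crypto_LUKS` in its FSTYPE column for encrypted partitions; `luks_state` is a hypothetical helper name.

```shell
# luks_state: read `lsblk -f` output on stdin; print "yes" if any block
# device reports the crypto_LUKS filesystem type, otherwise "NO".
# Hypothetical helper for filling in the "LUKS at install" field.
luks_state() {
    if grep -qw 'crypto_LUKS'; then
        echo yes
    else
        echo NO
    fi
}

# Example, on cobblestone:
#   lsblk -f | luks_state
```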
## Docker
| Field | Value |
|---|---|
| Docker installed | TBD |
| Version | TBD |
| daemon.json | not yet — match nullstone pattern |
| userns-remap | **DROP** per migration recommendation |
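Whether userns-remap is really dropped can be confirmed from the daemon config. A minimal sketch, assuming the default `/etc/docker/daemon.json` location; `userns_remap_state` is a hypothetical helper that does a plain text match, not a JSON parse.

```shell
# userns_remap_state [FILE]: print "enabled" if FILE sets a non-empty
# "userns-remap" value, otherwise "dropped" (also when FILE is missing).
# Hypothetical helper; text match only, so unusual formatting may fool it.
userns_remap_state() {
    file=${1:-/etc/docker/daemon.json}
    if [ -f "$file" ] &&
        grep -Eq '"userns-remap"[[:space:]]*:[[:space:]]*"[^"]+"' "$file"; then
        echo enabled
    else
        echo dropped
    fi
}

# Example, on cobblestone:
#   userns_remap_state
```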
## Operator-driven decisions (fill before cutover)
- [ ] LUKS reinstall: yes / LUKS-on-file fallback / accept-no-LUKS
- [ ] DE: strip-now / 30-day soak then strip / keep-forever
- [ ] userns-remap: drop / keep
- [ ] RocketChat: revive on cobblestone / retire (delete volumes)
- [ ] Headscale + step-ca: keep on cobblestone / move to $4 VPS
- [ ] cockpit web admin: install / skip
## Once filled in
Commit + push this file. Then say "agent A2 go": A2 SSHes into
cobblestone, runs the audit commands from `MIGRATION-...md` section
1, and writes `COBBLESTONE-AUDIT-<date>.md` next to this file.