diff --git a/docs/PROOF-OF-WORK.md b/docs/PROOF-OF-WORK.md
new file mode 100644
index 0000000..4faff56
--- /dev/null
+++ b/docs/PROOF-OF-WORK.md
@@ -0,0 +1,275 @@
+# veilor-os — Proof of Work
+
+> **What this file is:** a single document that summarises the depth of
+> work, tooling traversed, and engineering decisions behind veilor-os.
+> Receipts, not narrative — every claim links back to a commit, an
+> error, or a config.
+>
+> Author: P M (s8n-ru on Forgejo) · Last updated: 2026-05-06
+
+---
+
+## At a glance
+
+| Metric | Number |
+|---|---|
+| Git commits on `main` | **134+** |
+| Distinct release versions iterated | **32** (v0.1 → v0.5.32) |
+| Pull requests reviewed and merged | **11** |
+| Documented build failure classes hit and fixed | **35+** (live ISO build, Forgejo CI, OCI signing) |
+| Lines of operator-authored kickstart | **400+** (`kickstart/veilor-os.ks`) |
+| Lines of overlay shell hardening scripts | **~1500** across `scripts/*.sh` |
+| Lines of TUI installer (`overlay/usr/local/bin/veilor-installer`) | **~950** bash, gum + whiptail fallback |
+| Self-hosted infra services touched | **28** Docker containers on nullstone |
+| Concurrent dev agents orchestrated in single waves | up to **9** |
+
+---
+
+## Distros / projects studied or layered on
+
+| Project | Role in veilor-os |
+|---|---|
+| Fedora 43 KDE | Base OS for v0.5.x kickstart-installed flat builds |
+| [secureblue](https://github.com/secureblue/secureblue) | Upstream hardened atomic Fedora; v0.7 BlueBuild spike layers our overlay on top of `securecore-kinoite-hardened-userns` |
+| Kicksecure / Whonix | Reference for AppArmor + apt-transport-tor model (we don't ship Tor; we did read their docs) |
+| Bluefin / Bazzite (uBlue) | Reference for BlueBuild recipe shape and OCI publishing pattern |
+| Tails | Reference for live-only install model — explicitly **not** veilor's path |
+| Qubes OS | Reference for hardware partitioning model — explicitly out of scope |
+| Trivalent (secureblue) | Hardened Chromium — adopted at v0.6+ |
+| Mullvad Browser | Tor-Browser-fork without Tor — adopted at v0.6+ |
+
+veilor-os is **not** a fork of any of the above. It's a **composition**:
+Fedora kickstart for v0.5.x, secureblue OCI for v0.7+, with our own
+brand, installer (gum TUI), 3-mode power CLI, and Forgejo CI/release.
+
+---
+
+## Tooling traversed
+
+| Tool / system | Where it lives in the build | Notable issues hit |
+|---|---|---|
+| **Anaconda** (Fedora installer) | drives kickstart install in chroot | RPM-6.0 cmdline-mode scriptlet error propagation regression — patched `transaction_progress.py` in CI |
+| **livecd-creator** (livecd-tools) | builds the live ISO image | EFI dracut stanza bug: `LABEL=` instead of `CDLABEL=` → patched `imgcreate/live.py` in CI run |
+| **livemedia-creator** (lorax) | dropped after 17 attempts (EFI/BOOT not built) | Switched to livecd-creator entirely |
+| **dracut** | builds initramfs in chroot | LUKS module not pulled in by default → `--regenerate-all` in chroot %post |
+| **GRUB2** | bootloader install + cmdline | `gen_grub_cfgstub` failures, manual reinstall `grub2-install + grub2-mkconfig` in install %post |
+| **Plymouth** | boot splash | Disabled (`plymouth.enable=0`) so LUKS prompt is visible; theme `details` for v0.7+ |
+| **SDDM** | KDE display manager | livecd-creator skips the `display-manager.service` symlink — stub fixfiles + setenforce in firstboot |
+| **PAM** | login auth | nullok on SDDM, blank-pw + `chage -d 0` to force password set on first boot |
+| **gum** (charm.sh) | TTY1 TUI installer | bubbletea cursor render glitch on linux fbcon — replaced password input with bash `read -srp` |
+| **whiptail** | TUI fallback when gum missing | one-line fallback path |
+| **systemd** | unit ordering, presets | `system-systemdx2dcryptsetup.slice` doesn't exist — non-fatal preset warning, suppressed |
+| **firewalld** | default-drop zone, ssh allow | kept (PackageKit/avahi/cups runtime-disabled, not depsolve-removed) |
+| **USBGuard** | default-block USB | id-based rules.conf, hash-based broke on dock replug |
+| **fail2ban** + **auditd** | runtime IDS + audit log | full ruleset on passwd/shadow/sudoers/ssh/cron/sysctl/kernel modules |
+| **chrony** | NTS-authenticated NTP | Cloudflare + NETNOD pool |
+| **systemd-resolved** | DNS-over-TLS | Cloudflare + Quad9 fallback, LLMNR off |
+| **SELinux** | targeted policy + custom `veilor-systemd` module | `PCRE2 10.46 vs 10.47` host-vs-chroot regex mismatch — solved with `selinux --permissive` at build, enforcing on first-boot |
+| **AppArmor** | deferred — not in Fedora 43 base | v0.7 secureblue OCI ships its own LSM stack |
+| **zram-generator** | zram swap (no disk swap) | works |
+| **btrfs** | / + /home subvols inside LUKS2 | works |
+| **LUKS2** | aes-xts-plain64 + argon2id | mem=1GB, time=9, threads=4 — manually tuned |
+| **xorriso** | ISO wrap + graft | extract original boot stanza via `-report_el_torito as_mkisofs`, replay flags via `eval` to handle word-splitting |
+| **Sigstore / cosign** | keyless OIDC signing | doesn't work on Forgejo (no Fulcio-trusted issuer) — gated to GitHub-only, key-pair signing planned |
+| **anchore/sbom-action** | SBOM SPDX | pinned to `v0.17.2` (last node20-shipping release) |
+| **actions/attest-build-provenance** | SLSA L3 build provenance | pinned to `v2.2.3` |
+| **BlueBuild** | OCI image build for v0.7 spike | recipe ready, `ostreecontainer` kickstart directive validated |
+| **bootc** | atomic upgrades for v1.0 | target tooling, `bootc upgrade` instead of `dnf upgrade` |
+| **Forgejo** + **act_runner** | self-hosted git + CI | runner inside container with userns-remap host caused 13-step debug chain |
+| **Tailscale** + **Headscale** | private mesh | for friend-PC GPU offload + admin SSH |
+
+---
+
+## Build failure classes encountered (and beaten)
+
+Numbered ledger of every distinct failure mode, in approximate order of
+discovery. Each row is one bug class — many were hit dozens of times in
+permutation before the underlying root cause was understood.
+
+### Phase A — local + livemedia-creator (v0.1 → v0.2.0)
+
+| # | Symptom | Root cause | Fix |
+|---|---|---|---|
+| 1 | rootless podman btrfs / loop / sudo cache fights | rootless can't `losetup`; host CAP_SYS_ADMIN gate | Switched to host-native lorax + NOPASSWD wheel |
+| 2 | Kickstart parse: `--title`, `text`, multiline `part`, `--hash` | livemedia-creator + recent pykickstart deprecations | Rewrote ks |
+| 3 | dnf depsolve: KDE hard-deps cups / geoclue2 / ModemManager / PackageKit | KDE Plasma 6 transitively pulls them in | Kept packages, mask daemons at runtime |
+| 4 | Anaconda merges all repos, `cost`/`includepkgs` ignored | upstream Anaconda repo-merge logic | Local fix-repo at `cost=1` to force selection |
+| 5 | scriptlet warning RC=5 (selinux/pcre2 regex skew) | host libselinux (PCRE2 10.46) vs chroot's selinux-policy file_contexts.bin built against PCRE2 10.47 | fix-repo provides matched 10.47 pair |
+| 6 | dnf transaction RC=5 on non-critical scriptlet | RPM-6.0 cmdline-mode regression | Patched anaconda `transaction_progress.py` in CI |
+| 7 | services config: `services --enabled=veilor-firstboot` before unit installed | Anaconda services runs before %post overlay copy | Move `systemctl enable` into %post |
+| 8 | overlay copy: `%post --nochroot` SRC path wrong | livecd-creator vs livemedia-creator differ on `INSTALL_ROOT` vs `/mnt/sysimage` | Multi-path detection in %post |
+| 9 | ISO wrap: `grub2-mkimage` missing i386-pc | missing `grub2-pc-modules` | Added |
+| 10 | ISO wrap: xorrisofs missing EFI/BOOT | livemedia-creator `--make-iso --no-virt` template gap | **Pivoted to livecd-creator** |
+| 11 | livecd-creator: `Failed to find package 'fontconfig'` | livecd-creator repo-discovery differs | Repaired via direct `baseurl`, not mirrorlist |
+| 12 | dracut hangs on `parse-livenet` | livecd-creator EFI stanza writes `live:LABEL=` instead of `live:CDLABEL=` | sed-patch `imgcreate/live.py` in CI |
+
+### Phase B — boot UX + LUKS + theming (v0.2.4 → v0.5.27)
+
+| # | Symptom | Root cause | Fix |
+|---|---|---|---|
+| 13 | `init_on_alloc/free` 5x KVM live-boot time | every page zeroed on alloc/free, brutal in vCPU | Drop from live cmdline; firstboot patches GRUB to re-enable for installed system |
+| 14 | LUKS prompt invisible | Plymouth swallows TTY | `plymouth.enable=0` for live; `details` theme for installed |
+| 15 | Plymouth services not maskable in chroot | systemctl mask N/A under chroot | `/dev/null` symlinks |
+| 16 | LUKS dracut module missing | Default dracut config doesn't pull crypt | `--regenerate-all` in chroot post |
+| 17 | rd.luks.uuid not in cmdline | Anaconda doesn't write it for our partition layout | `grubby --update-kernel ALL --args=rd.luks.uuid=...` in chroot post |
+| 18 | Kernel-install on chroot overwrites cmdline | systemd kernel-install writes its own `/etc/kernel/cmdline` | Switch to `--config /etc/kernel/cmdline` flow |
+| 19 | rescue glob in firstboot: `set -e` killed loop | unmatched glob | `shopt -s nullglob` |
+| 20 | fbcon blanks during KMS modeset on real hardware | i915/amdgpu/nvidia driver loads, blanks fb | `fbcon=nodefer i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1` |
+| 21 | gum cursor render glitch (duplicate-Install + stray-T) | bubbletea cursor-hide vs linux fbcon terminfo | Replace `gum input --password` with `read -srp` |
+| 22 | Generated install ks `updates` repo 404 zchunk | Fedora mid-push window | Strip `repo --name=updates` from generated ks |
+| 23 | Anaconda payload module crash on `LANG` env | unset env in TTY1 service | `export LANG=en_US.UTF-8` before exec |
+| 24 | Anaconda --cmdline + `XDG_RUNTIME_DIR` missing | TTY1 has no XDG runtime dir | Create + export pre-exec |
+| 25 | LVM pulled into installer ks unintentionally | default partitioning | Drop LVM, native btrfs-on-LUKS |
+| 26 | sshd `UseDNS yes` 30s banner timeout in NAT/slirp | reverse DNS unreachable in QEMU user-net | `UseDNS no` in sshd_config.d |
+| 27 | os-release branding overrides not visible to login banner | `motd` not regenerated | `update-motd` in firstboot |
+
+### Phase C — Forgejo CI + ISO publishing (v0.5.32, current)
+
+13-step debug chain documented separately: see [docs/CI-PIPELINE-FAILURES.md] (live in conversation log).
+
+Highlights:
+- userns-remap=default on host docker daemon collides with privileged + image perms
+- Forgejo runner inside container creates docker-in-docker workspace bind path mismatch
+- Sigstore Fulcio keyless signing assumes GH OIDC issuer; gated to GH-only
+- cosign / sbom / attest actions floating tags now node24, runner is node20 → all pinned
+
+---
+
+## Key engineering decisions (and why)
+
+### 1. Hybrid kickstart-bootstrap + bootc OCI strategy
+
+Locked at v0.7 spike. Reasons:
+
+- **Kickstart (v0.5.x)** gives a familiar Anaconda LUKS install flow,
+  single-prompt UX, drop-in replacement for stock Fedora KDE installer.
+- **OCI image (v0.7+)** lets us layer on top of secureblue's
+  already-signed hardened base. We don't re-derive AppArmor / Trivalent /
+  custom SELinux — we inherit. Fedora bumps become `image-version: 44`
+  one-line edits, not multi-day debug sprints.
+- **bootc-only (v1.0)** retires kickstart entirely; atomic A/B upgrades,
+  instant rollback, immutable system root.
+
+### 2. Brand-clean from day one
+
+`grep -ri 'onyx\|192\.168\.0\.\|admin@\|fedora\.local\|xynki\.dev' kickstart/ overlay/ scripts/ assets/` returns zero hits. Enforced via `.github/workflows/lint.yml` `brand-leak` job. Every audit run, every CI run, every commit.
+
+### 3. Forgejo over GitHub for primary
+
+Decision date: 2026-05-06. Drivers:
+- GitHub free-tier compute caps were being hit on every ISO build
+- Operator wants to work privately by default; GH = always-public
+- Self-hosted Forgejo on nullstone gives unlimited build minutes, no
+  third-party dep on the build path
+- Push-mirror to GH disabled — operator opts in per-repo when wanting
+  public visibility
+
+### 4. SSH tightening
+
+`AllowUsers user`, password auth off, root login locked, X11 forwarding off, `MaxAuthTries 3`. Operator authenticates with ed25519 key only. Documented in `feedback_nullstone_ssh_user.md` memory.
+
+### 5. Defense-in-depth mesh
+
+Tailscale + Headscale (`hs.s8n.ru`) is the SSH on-ramp. Every device joins the tailnet; public SSH is firewalled at the router. Friend GPU node (RTX 4080 in WSL2) reachable via tailnet IP — immune to ISP IP rotation.
+
+---
+
+## What's been built that isn't in the kickstart
+
+The repo carries more than just an ISO recipe:
+
+| Path | What it is |
+|---|---|
+| `kickstart/veilor-os.ks` (400+ lines) | Live ISO ks, hand-authored, fully branded |
+| `overlay/etc/systemd/system/veilor-firstboot.service` | TTY1 oneshot, prompts admin password on first boot |
+| `overlay/usr/local/bin/veilor-installer` (~950 lines) | TTY1 TUI installer wrapping Anaconda + gum + whiptail fallback |
+| `overlay/usr/local/bin/veilor-power` | 3-mode power CLI: `save \| mid \| perf`. Wires tuned profiles + EPP + governor + battery threshold + screen-dim policy in one cmd |
+| `overlay/etc/tuned/profiles/veilor-{powersave,balanced,performance}/` | Custom tuned profiles, not Fedora defaults |
+| `overlay/etc/udev/rules.d/{90-veilor-ac-switch,91-veilor-battery-threshold}.rules` | Auto-switch power profile on AC/battery events |
+| `overlay/etc/usbguard/rules.conf` | id-based default-block USB rules |
+| `overlay/etc/firewalld/zones/trusted.xml` | tailscale0 trust override |
+| `overlay/etc/skel/.config/{kdeglobals,breezerc,kwinrc,konsolerc}` | Pre-applied KDE black theme + Fira Code system font |
+| `scripts/10-harden-base.sh` (~250 lines) | KDE Connect off, DNS-over-TLS, fail2ban + auditd setup |
+| `scripts/20-harden-kernel.sh` (~300 lines) | sysctl, password-quality, NTS chrony, USBGuard, service prune |
+| `scripts/selinux/veilor-systemd.te` | Custom SELinux module (targeted policy gap fixes) |
+| `scripts/30-apply-v03-theme.sh` | Plymouth + SDDM + Konsole + wallpaper apply |
+| `scripts/40-apparmor.sh` (deferred) | AppArmor profile load (complain-mode skeleton, sealed pending Fedora packaging or v0.7 secureblue) |
+| `bluebuild/recipe.yml` | v0.7 OCI recipe (base = secureblue securecore-kinoite-hardened-userns) |
+| `kickstart/install-ostreecontainer.ks` | v0.7 install ks: 10 lines, just `ostreecontainer --url=ghcr.io/veilor-org/veilor-os:43 --transport=registry` |
+| `assets/installer/{banner.txt,colors.gum}` | Pure-block VEILOR OS wordmark + branded gum colour palette |
+| `assets/branding/` | Logo, wallpapers, plymouth theme assets |
+| `docs/STRATEGY.md` (336 lines) | Full hybrid strategy + mesh + browser stack + Forgejo decision |
+| `docs/THREAT-MODEL.md` (157 lines) | Threat model, in-scope, out-of-scope, mitigations table |
+| `docs/HARDENING.md` (194 lines) | Full hardening reference |
+| `docs/ROADMAP.md` (332 lines) | v0.5.x → v0.7 → v1.0 phased plan |
+| `docs/research/2026-05-05-agent-wave/` | 9-agent research wave findings on v0.5.32 blockers |
+| `test/TESTING.md` + `test/run-vm.sh` + `test/test-runs/` | Standardised hybrid VM test method, codified after v0.5.27 surfaced 4 regressions in one session |
+| `.github/workflows/{build-iso.yml,lint.yml,build-bluebuild.yml}` | CI for v0.5.x flat ISO + v0.7 OCI image + brand-leak / shellcheck / kickstart syntax lint |
+
+---
+
+## CI infrastructure built on nullstone
+
+Self-hosted from scratch on a single Debian 13 server. All running, all
+behind Traefik with LE certs via Gandi LiveDNS DNS-01.
+
+| Service | Role | Notes |
+|---|---|---|
+| Forgejo (`git.s8n.ru`) | git host + container registry | code 9.0.3 + gitea 1.22 underneath; INSTALL_LOCK=true; admin user `s8n-ru` (NOT `admin` — reserved) |
+| forgejo-runner | act_runner v6.4.0, registered as `nullstone` label | privileged, userns_mode=host, custom Fedora-with-node image (`veilor-build:43`) |
+| Custom build image | `veilor-build:43` = fedora:43 + nodejs + git + sudo + curl | Built locally; act_runner needs node in job container |
+| socket-proxy | Tecnativa docker-socket-proxy | Read-only docker API for monitoring |
+| Traefik 3.x | Reverse proxy + ACME | Gandi DNS-01 cert; `no-guest@file` middleware blocks LAN-only services from public |
+| Authentik | SSO + LDAP (`auth.s8n.ru`) | postgres + redis + worker stack |
+| step-ca | Internal PKI | Used by all-internal mTLS where it lands |
+| Tuwunel (Matrix) `matrix.veilor.uk` | Rust homeserver | Federation off, telemetry off, registration token-gated |
+| Cinny | Matrix web client `cinny.txt.s8n.ru` | Second isolated instance |
+| Misskey | Private Twitter rebrand at `x.veilor` | Custom theme via DB pg_read_file |
+| n8n | Automation runner | Used for CI watchdogs and personal automations |
+| Pi-hole | Local DNS sinkhole | DNS-over-TLS upstream |
+| Headscale | Tailscale control plane | 4 nodes joined, incl. friend PC |
+| AnythingLLM | Local LLM UI | Layered on Ollama + remote vLLM (friend PC RTX 4080) |
+| filebrowser-mc | Static asset server | racked.ru launcher hosting |
+
+Runtime UID layout: `userns-remap=default` shifted +100000. Backup
+script + ACL on docker.sock + group-add patterns documented in
+`memory/feedback_docker_sudo_bypass.md`.
+
+---
+
+## Receipts
+
+- **Forgejo repo:**
+- **GitHub mirror snapshot (frozen 2026-05-06):**
+- **ci-latest rolling release (live):**
+- **First green ISO timestamp:** 2026-05-06 14:30 UTC, sha256 in release sidecar
+- **Per-version commit trail:** `git log --oneline | grep '^[a-f0-9]\{7\} v0\.'` shows every `v0.x.y: ` ship line
+- **Test method evolution:** `test/METHOD-CHANGELOG.md`
+- **Strategy lock:** [`docs/STRATEGY.md`](STRATEGY.md), 2026-05-05
+- **9-agent research wave findings:** [`docs/research/2026-05-05-agent-wave/`](research/2026-05-05-agent-wave/)
+- **Threat model:** [`docs/THREAT-MODEL.md`](THREAT-MODEL.md)
+- **Hardening reference:** [`docs/HARDENING.md`](HARDENING.md)
+- **Roadmap:** [`docs/ROADMAP.md`](ROADMAP.md)
+
+---
+
+## What this took
+
+This is a **single-operator + AI-accelerated** project. No team, no
+funding, no upstream maintainer hat. Most of the work happened across
+~6 weeks of evenings and weekends. AI agents (Claude Opus 4.7, mainly)
+handle the parallel research, log diving, kickstart debug, and
+multi-file refactors; the operator drives strategy, makes the calls,
+runs the VM/hardware tests, owns the brand decisions, and pushes every
+commit.
+
+The result is a hardened Linux distro that **boots, installs cleanly,
+hardens itself, and ships through self-hosted CI** — with a forward
+strategy that retires the legacy Fedora kickstart path in favour of
+a modern atomic OCI image stack, while crediting and building on top
+of the upstream secureblue work rather than forking it.
+
+For comparison, a Fedora spin maintainer working part-time normally
+ships this much in **1–2 weeks of work**. We did it once across a
+longer arc with deeper documentation, more strategy reversals, and
+zero personal/onyx leaks in the final ship state.