From 4e9782a18ad15c31bdb7e1626a3ebeb7c0f8542e Mon Sep 17 00:00:00 2001
From: veilor-org
Date: Tue, 5 May 2026 14:52:53 +0100
Subject: [PATCH] =?UTF-8?q?docs:=209-agent=20research=20wave=20findings=20?=
 =?UTF-8?q?=E2=80=94=20v0.5.32=20blocker=20map?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Logs the full output of the 9-agent deep-dive run on 2026-05-05 to
docs/research/2026-05-05-agent-wave/. Pulls every actionable finding
into one indexed location so v0.5.32 planning has a paper trail.

Files:
  docs/research/2026-05-05-agent-wave/README.md — index
  docs/research/2026-05-05-agent-wave/01-...real-hardware.md — Plymouth + LUKS edge cases
  docs/research/2026-05-05-agent-wave/02-...firstboot-ux.md — SDDM + first-boot UX
  docs/research/2026-05-05-agent-wave/03-...spike-plan.md — bootc-image-builder 1-week spike
  docs/research/2026-05-05-agent-wave/04-...tier-2.md — AppArmor + nftables + audit + homed
  docs/research/2026-05-05-agent-wave/05-...launch.md — threat model + v0.7 launch checklist
  docs/research/2026-05-05-agent-wave/06-...log-capture.md — virtio-9p host-share for anaconda logs
  docs/research/2026-05-05-agent-wave/07-...skel-branding.md — /etc/skel gap audit
  docs/research/2026-05-05-agent-wave/08-...ci-hardening.md — SHA-pin actions + SBOM + SLSA L3
  docs/research/2026-05-05-agent-wave/09-...failure-modes.md — real-hardware pessimistic audit

Plus the prior linter-applied:
  docs/ROADMAP.md — Lessons learned section, v0.5.32 active block,
    v0.6 promotion of veilor-postinstall + veilor-doctor, v0.7 bootc
    spike scheduled
  docs/THREAT-MODEL.md — drafted by Agent 5; in/out scope, comparison
    matrix, v0.7 launch checklist

Top blockers identified for v0.5.32 (cross-cited in README):
  1. Suspend/resume wifi death (kernel.modules_disabled=1)
  2. veilor-firstboot.service WantedBy=graphical.target
  3. kernel-upgrade grub drift
  4. USBGuard hash-rules problem (already learned on onyx)
  5. firewalld blocks tailscale0
  6. /etc/skel/ empty
  7. virtio-9p log capture replaces broken virtio-serial path

Wave + verifier pattern (per ROADMAP lessons learned #4) validated:
9 parallel agents on distinct topics produced converging blocker list.
The same pattern landed v0.5.31 four-bug fix from the prior 4-agent
verification wave on v0.5.30 outcome.
---
 docs/ROADMAP.md                               | 185 ++++++++++++------
 docs/THREAT-MODEL.md                          | 152 ++++++++++++++
 .../01-plymouth-luks-real-hardware.md         | 109 +++++++++++
 .../02-sddm-firstboot-ux.md                   | 117 +++++++++++
 .../03-bootc-spike-plan.md                    | 158 +++++++++++++++
 .../04-hardening-tier-2.md                    | 125 ++++++++++++
 .../05-threat-model-launch.md                 |  65 ++++++
 .../06-anaconda-log-capture.md                |  96 +++++++++
 .../07-kde-skel-branding.md                   | 100 ++++++++++
 .../2026-05-05-agent-wave/08-ci-hardening.md  | 131 +++++++++++++
 .../09-realhw-failure-modes.md                | 167 ++++++++++++++++
 docs/research/2026-05-05-agent-wave/README.md |  42 ++++
 12 files changed, 1388 insertions(+), 59 deletions(-)
 create mode 100644 docs/THREAT-MODEL.md
 create mode 100644 docs/research/2026-05-05-agent-wave/01-plymouth-luks-real-hardware.md
 create mode 100644 docs/research/2026-05-05-agent-wave/02-sddm-firstboot-ux.md
 create mode 100644 docs/research/2026-05-05-agent-wave/03-bootc-spike-plan.md
 create mode 100644 docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md
 create mode 100644 docs/research/2026-05-05-agent-wave/05-threat-model-launch.md
 create mode 100644 docs/research/2026-05-05-agent-wave/06-anaconda-log-capture.md
 create mode 100644 docs/research/2026-05-05-agent-wave/07-kde-skel-branding.md
 create mode 100644 docs/research/2026-05-05-agent-wave/08-ci-hardening.md
 create mode 100644 docs/research/2026-05-05-agent-wave/09-realhw-failure-modes.md
 create mode 100644 docs/research/2026-05-05-agent-wave/README.md

diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index 42e4b72..571c475 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -9,6 +9,33 @@ For the historical record of what landed in each release, see
 
 ---
 
+## Lessons learned through v0.5.x install grind
+
+Five things v0.5.27–31 changed about how we plan:
+
+1. **Anaconda + RPM-6.0 + `--cmdline` is brittle** — three install
+   failures, kernel cmdline written to four places before one worked.
+   `--location=none` skips `CollectKernelArgumentsTask`,
+   `kernel-install` reads `/etc/kernel/cmdline` not `/proc/cmdline`,
+   and `transaction_progress.py` masks real failures if patched too
+   broadly. Justifies promoting the bootc-image-builder spike to v0.7.
+2. **Test procedure must gate every tag** — v0.5.27 only surfaced four
+   bugs in one VM run because the run walked every step in order.
+   `test/TESTING.md` and `test/test-runs/` are now load-bearing.
+3. **Real hardware is not optional** — VM catches install logic, not
+   KMS / fbcon / firmware. Spare laptop + friend's laptop must run
+   pre-tag, every time.
+4. **Multi-agent debug waves work, but only with a verifier** — the
+   v0.5.31 four-bug fix came from a 4-agent verification wave on
+   v0.5.30 outcome. Wave + verifier = signal; wave alone = noise.
+5. **"We ask once, with sane defaults" is the distro UX** — every
+   v0.5 install bug we shipped a workaround for (locale, hostname,
+   USBGuard policy, drivers) is something `veilor-postinstall` could
+   ask the user about cleanly on first boot. That promotes
+   `veilor-postinstall` from v0.6 background item to flagship.
+
+---
+
 ## v0.2 — green ISO + base hardening (DONE)
 
 Reproducible CI build pipeline. UEFI+BIOS bootable live ISO from a single
@@ -24,27 +51,44 @@ Released `v0.2.5` on 2026-05-01. CI on every push to `main`.
 
 ---
 
-## v0.5.27–v0.5.28 — install path stabilisation (active)
+## v0.5.27–v0.5.31 — install path stabilisation (DONE)
 
 The bridge between v0.2 (greens at all) and v0.3 (looks polished). All
-of these are install-path bugs surfaced by the formal hybrid-VM test
-procedure (`test/TESTING.md`).
+install-path bugs surfaced by the formal hybrid-VM test procedure
+(`test/TESTING.md`). Five releases, ~hours of debug, three install
+failures before greening.
 
-- **v0.5.27 (DONE)** — `rd.luks.uuid` injected via `grubby
-  --update-kernel=ALL` so Fedora 43 BLS entries actually carry the
-  arg; without it first boot drops to dracut emergency shell. GRUB
-  rebrand (single "veilor-os" entry, rescue suppressed). `fbcon=nodefer`
-  in live cmdline so real laptops don't black-screen on KMS handoff.
-  ASCII gum cursor (cosmetic only — duplicate-render bug runs deeper,
-  carried to v0.5.28).
-- **v0.5.28 (next)** — locale picker removed; en_US.UTF-8 hardcoded
-  for install (post-install menu in v0.7 handles locale switch). gum
-  input render glitches on linux fbcon (duplicate "Install", stray T
-  in password fields) get a real fix — likely replace `gum input
-  --password` with bash `read -srp`, since masked input does not need
-  TUI polish and every other distro installer does it this way.
-  Anaconda transaction containment so the user sees a branded
-  "INSTALLING" panel instead of `Configuring xxx.x86_64` scroll.
+- **v0.5.27 (DONE)** — `rd.luks.uuid` via `grubby --update-kernel=ALL`,
+  GRUB rebrand, `fbcon=nodefer`, ASCII gum cursor.
+- **v0.5.28 (DONE)** — locale locked en_US.UTF-8, dropped updates repo,
+  patched anaconda `transaction_progress.py` to silence `Configuring
+  xxx.x86_64` scroll, excluded man-db.
+- **v0.5.29 (DONE)** — narrowed anaconda patch (was masking real
+  failures), LUKS UX, initramfs assertion. Five-fix bundle from 7-agent
+  research wave.
+- **v0.5.30 (DONE)** — broad error suppression, manual bootloader path,
+  virtio log capture for post-mortem.
+- **v0.5.31 (DONE)** — `--location=none` was making anaconda skip
+  `CollectKernelArgumentsTask`; kernel-install reads
+  `/etc/kernel/cmdline` as source of truth, veilor never wrote it, so
+  BLS entries shipped with empty cmdline. Three-path write
+  (`/etc/kernel/cmdline` + `/etc/default/grub` + grubby) plus explicit
+  `kernel-install add`.
+
+## v0.5.32 — next ship (active)
+
+Outstanding from the grind, immediate priority for the next tag:
+
+- **End-to-end VM green run** — v0.5.31 lands the kernel-cmdline fix
+  but no full hybrid-VM pass has signed it off. Run the procedure in
+  `test/TESTING.md` to install + reboot + login, file the report in
+  `test/test-runs/`, then tag.
+- **Real-hardware run on the spare laptop** — VM is necessary not
+  sufficient. Friend's laptop is mate's-test, spare is ours. KMS,
+  fbcon, USB controller, real-firmware Secure Boot only show up here.
+- **gum input render glitch** — duplicate "Install", stray T in
+  password fields on linux fbcon. Replace `gum input --password` with
+  bash `read -srp`; cosmetic only but visible on every install.
 
 ---
 
@@ -121,53 +165,75 @@ specified — defaults stay sane for a daily driver.
 
 ---
 
-## v0.6 — ergonomics
+## v0.6 — ergonomics (PROMOTED — install grind proved we need this)
 
 Smooth the operator experience so day-to-day work doesn't fight the
-hardening.
+hardening. `veilor-postinstall` and `veilor-doctor` were v0.6 background
+items — promoted to **headline** features after v0.5.27–31 made it
+clear that "we ask once, with sane defaults" is what separates a
+distro from a kickstart.
 
-- **`veilor-update`** — wraps `dnf upgrade` with a pre-check (snapshot
-  available?), an auditd pause, and post-update sysctl/SELinux
-  validation. One command, no surprises.
-- **`veilor-doctor`** — diagnostic helper. Walks the audit checklist
-  (`getenforce`, `mokutil --sb-state`, `firewall-cmd --get-default-zone`,
-  fail2ban status, USBGuard policy, sysctl drift) and reports what's
-  drifted from baseline. This is the **post-install audit** path:
-  every veilor-os user can run `veilor-doctor` weekly and see exactly
-  where their system has drifted from the hardened defaults.
-- **`veilor-postinstall`** — first-login welcome menu, EndeavourOS-style
-  but cleaner. Single TUI screen with: keyboard layout, locale, hostname
-  override, optional package presets (dev / media / homelab), driver
-  choices (NVIDIA / Intel / AMD), Bluetooth opt-in, audit baseline run.
-  Each step is skippable, runs once on first SDDM login, never auto-runs
-  again. Lives in `overlay/usr/local/bin/veilor-postinstall` + a
-  `~/.config/autostart/veilor-postinstall.desktop` that self-deletes
-  after first run. Replaces the current "user has to know what to
-  configure" model with "we ask, once, with sane defaults pre-selected".
+- **`veilor-postinstall`** (PROMOTED — flagship of v0.6) — first-login
+  welcome menu, EndeavourOS-style but cleaner. Single TUI screen:
+  keyboard layout, locale (deferred from install per v0.5.28),
+  hostname override, package presets (dev / media / homelab), drivers
+  (NVIDIA / Intel / AMD), Bluetooth opt-in, USBGuard snapshot, audit
+  baseline run, `veilor-doctor` first run. Each step skippable, runs
+  once on first SDDM login, self-deletes the autostart after. This is
+  the **only** UX feature that ships in v0.6 day one — everything else
+  builds on it.
+- **`veilor-doctor`** (PROMOTED — user-facing, not just dev tool) —
+  the post-install audit. Walks `getenforce`, `mokutil --sb-state`,
+  `firewall-cmd`, fail2ban, USBGuard policy, sysctl drift, and reports
+  drift from baseline. Runs from `veilor-postinstall` on day one, then
+  weekly via `systemd --user` timer. Plain-English output ("your
+  firewall is OK", "USBGuard policy has 3 unknown devices"); not a JSON
+  dump. **Stretch:** machine-readable mode for `veilor-server` later.
+- **`veilor-update`** — wraps `dnf upgrade` AND `flatpak update` in
+  one command. Per `feedback_system_update.md`, partial-update is a
+  recurring trap; veilor's update tool covers both by default. Adds
+  pre-check (snapshot available?), auditd pause, post-update SELinux
+  validation.
 - **Opt-in installer ISO** — flip from live-only to live + installer,
   user picks at boot menu. Installer uses the v0.5 kickstart with full
   LUKS + btrfs subvols + zram.
 - **First-boot UX** — replace TTY password prompt with a small
   Plymouth-rendered dialog. Less raw.
 - **Bluetooth opt-in helper** — single command to enable + bring up
-  the daemon + add the user to the right group. Currently three
-  commands.
+  the daemon + add the user to the right group.
 
 ---
 
-## v0.7 — public flex
+## v0.7 — public flex + bootc spike
 
-Take veilor-os out of "private repo, contained audience" mode.
+Take veilor-os out of "private repo, contained audience" mode. Order
+matters: people demand threat model FIRST when a security distro goes
+public, benchmarks come after.
 
-- **Public docs site** — Hugo or mdBook on `veilor.org`, generated from
-  `docs/`. Single source of truth for INSTALL, HARDENING, BUILD,
-  ROADMAP, RELEASE, CONTRIBUTING.
-- **Repo public** — flip GitHub visibility, announce.
-- **Comparison + benchmarks** — published numbers vs stock Fedora KDE
-  on cold boot, idle RAM, idle network egress, suspend/resume time.
-- **Threat model published** — what veilor-os defends against, what it
-  does not. Honest scope.
-- **Press kit** — wallpapers, logo, screenshots, feature one-liner.
+1. **Threat model published** (FIRST — gating item) — what veilor-os
+   defends against, what it does not. Honest scope. No claim of
+   anti-state-actor; concrete on lost-laptop, USB-attack, browser
+   compromise, supply-chain. Reviewers will demand this before reading
+   anything else.
+2. **Public docs site** — Hugo or mdBook on `veilor.org`, generated
+   from `docs/`. Single source of truth.
+3. **Repo public** — flip GitHub visibility, announce.
+4. **Comparison + benchmarks** — published numbers vs stock Fedora KDE
+   on cold boot, idle RAM, idle network egress, suspend/resume time.
+   After threat model, not before.
+5. **Press kit** — wallpapers, logo, screenshots, feature one-liner.
+
+### bootc-image-builder spike (PROMOTED from v1.0+)
+
+The v0.5.27–31 grind cost us hours on anaconda + RPM-6.0 +
+`--cmdline` mode brittleness: three install failures, kernel-cmdline
+written to four different places before one worked, transaction-progress
+patches that masked real bugs. **bootc-image-builder builds a
+`Containerfile` once and gets a bootable image** — no anaconda, no
+kickstart, no `%post --nochroot` vs `%post`, no
+`CollectKernelArgumentsTask`. A v0.7 spike (NOT v1.0) evaluates whether
+the next major rev should rebase on it. Spike outcome determines
+whether `veilor-atomic` (stretch goal) becomes the mainline.
 
 ---
 
@@ -194,15 +260,16 @@ daily driver.
 ## Stretch goals — not on the v0.x → v1.0 critical path
 
 These are spin variants that share veilor-os DNA but need their own
-kickstart or build tool. They live on a separate track and do not
-block v1.0.
+kickstart or build tool.
 
 - **`veilor-server`** — no KDE, no GUI, hardened headless Fedora for
-  homelab / VPS. Same overlay, different package set.
+  homelab / VPS (e.g. nullstone). Same overlay, different package set.
+  **Not blocked**, but waits on `veilor-doctor` machine-readable mode
+  (v0.6) so headless installs have a way to report drift without a TUI.
 - **`veilor-kiosk`** — single-app Plasma session, locked-down user,
-  read-only root. For dedicated-purpose machines.
+  read-only root. **Not blocked.**
 - **`veilor-atomic`** — rpm-ostree / bootc-image-builder rebase.
-  Immutable root, transactional updates, atomic rollback. Different
-  build tool entirely (likely `bootc-image-builder`); all veilor
-  hardening would translate to a `Containerfile`. Schedule for after
-  v0.5+ once the standard spin is stable.
+  Status now depends on the **v0.7 bootc spike**: if the spike shows
+  bootc fixes the anaconda-grind class of bugs, `veilor-atomic`
+  becomes the v1.0+ mainline rather than a stretch variant. If not,
+  it stays a parallel track.
diff --git a/docs/THREAT-MODEL.md b/docs/THREAT-MODEL.md
new file mode 100644
index 0000000..92e3e86
--- /dev/null
+++ b/docs/THREAT-MODEL.md
@@ -0,0 +1,152 @@
+# Threat Model
+
+> **Status:** Draft for v0.7 public flex. Honest scope.
+
+veilor-os is a hardened daily-driver desktop. Not a paranoia OS, not an
+anonymity OS, not an isolation OS. This document exists so that
+security-conscious developers, journalists, and activists can decide whether
+the threat model fits their actual adversary before they trust the system.
+
+If your adversary is on the "out of scope" list below, **use a different
+tool**. veilor-os will not save you, and we will not pretend otherwise.
+
+---
+
+## In scope — what veilor-os defends against
+
+| Adversary / scenario | veilor-os mitigation |
+|---|---|
+| Lost or stolen laptop, powered off | LUKS2 (aes-xts-plain64, argon2id, mem=1 GB) on root + swap-as-zram. Disk yields ciphertext. |
+| Generic browser / email malware (drive-by RCE, malicious attachment) | SELinux enforcing + `veilor-systemd` policy + sysctl hardening (kptr_restrict, ptrace=2, perf=3, BPF JIT harden, full ASLR, no SUID core dumps). AppArmor stack lands in v0.5. |
+| Console-side USB attack (BadUSB, rubber ducky, juice-jack) | USBGuard daemon, default-block, empty allowlist on first boot. New device = explicit operator allow. |
+| SSH brute-force / credential-stuffing | sshd password-auth off, root login off, MaxAuthTries=3, fail2ban with sshd + pam-generic jails wired to firewalld rich-rule. |
+| Post-incident forensics ("what happened?") | auditd rules covering passwd/shadow/sudoers/ssh/cron/sysctl/kernel modules and all privileged binaries. Logs survive reboot. |
+| Supply-chain on the OS image itself | Fedora's signed shim → GRUB → kernel chain (Secure Boot enforced). v0.4 adds GPG-signed ISO + sha256 + own MOK. |
+| Unprivileged local user attempting LPE | root account locked (`passwd -S root` → `L`), single sudo user with pwquality minlen=14 / 4 classes, kernel module loading frozen 30 s after graphical boot. |
+| Network-listening services as attack surface | firewalld default zone = `drop`; only sshd answers. abrt/cups/avahi/bluetooth/ModemManager/kdeconnectd/PackageKit are masked. |
+| Time-based MITM (back-dated certs, replay) | NTS-authenticated chrony, DNS-over-TLS via systemd-resolved, LLMNR off. |
+
+---
+
+## Out of scope — what veilor-os does NOT defend against
+
+We are honest about this list because pretending otherwise is how people get
+hurt. **If your adversary is here, pick a different tool.**
+
+| Adversary / scenario | Why veilor-os doesn't help | Use instead |
+|---|---|---|
+| Nation-state firmware-level implant (UEFI, ME, BMC) | Secure Boot validates the OS, not the firmware below it. We do not flash custom firmware. | Heads / coreboot on supported hardware. |
+| Evil-maid attack on a running, unlocked system | LUKS keys live in RAM while the system is up. A physically present attacker can dump RAM (cold boot, DMA via Thunderbolt, debug header). | Power off when unattended. Disable Thunderbolt DMA in firmware. Qubes-in-a-Faraday-bag if you're that target. |
+| Hardware keylogger / hardware mod between keyboard and machine | We're software. Software cannot detect a passive hardware tap. | Physical custody of the device. Tamper-evident seals. |
+| Targeted RCE on the user session (browser 0-day, signal-app exploit) | KDE Plasma is not sandboxed. A logged-in compromise has the user's full data and tokens. SELinux confines daemons, not the desktop. | Qubes (per-app VM isolation). |
+| Side-channel attacks on AES (timing, cache, power analysis) | We use stock kernel crypto. No constant-time guarantees beyond what the kernel/CPU provide. | Threat-specific HSM. |
+| Physical attack on a TPM2 chip (probe, glitch, decap) | We don't ship TPM2 binding yet. Even when v1.0 lands, TPM2 is not anti-tamper hardware. | Off-device key custody. |
+| Network-level traffic correlation / traffic analysis | All packets leave the box on the local IP. We don't onion-route. | Tails, Whonix, Tor. |
+| Trust-on-first-use attacks (user clicks "accept bad cert") | We can't override the user's decisions. Bad SSL/SSH key acceptance by the operator is out of scope. | Enrolment policy, MDM. |
+| Adversary with sustained physical access and time | Given enough physical time and tools, any laptop falls. | Operational security, not OS choice. |
+
+---
+
+## Hardening tradeoffs (what you give up)
+
+Hardening that breaks ordinary work gets called out, not hidden.
+
+- **SELinux enforcing** — some apps (proprietary, out-of-tree) ship
+  without policy. Symptom: `EACCES` despite correct file perms.
+  Workaround: write a local policy module; do not switch to permissive.
+- **LUKS2 argon2id (mem=1 GB / time=9)** — boot 5–30 s slower on older
+  CPUs. The cost of a passphrase that survives a GPU attacker.
+- **USBGuard default-block** — every new device needs an explicit allow.
+  First-boot: plug trusted devices in, run `usbguard generate-policy`.
+  Forget this and your USB-C dock looks broken.
+- **Module lockdown 30 s after graphical boot** — out-of-tree drivers
+  (NVIDIA proprietary, VirtualBox, out-of-tree wireguard) will fail.
+  Load early via initramfs or use the in-tree alternative.
+- **firewalld zone = drop** — KDE Connect, mDNS printer discovery, SMB
+  browsing don't work until explicitly opened. This is the point.
+- **No PackageKit / no Flatpak by default** — updates happen on your
+  terms via `dnf upgrade`.
+
+---
+
+## Where veilor-os IS like Tails / Whonix / Qubes
+
+- Threat model published. Transparency about scope is the price of being
+  taken seriously.
+- Default-deny firewall (`drop` zone, ssh inbound only).
+- Encrypted at rest by default — LUKS2 + argon2id, no-disk-swap (zram).
+
+## Where veilor-os DIFFERS
+
+- **Daily-driver target.** Boot it once, install it, use it for years.
+  Not a session-only / amnesia OS.
+- **Single-VM / single-kernel.** No per-app compartmentalisation. A
+  browser RCE owns your session. (See "out of scope".)
+- **Persistent identity by design.** Your `~`, your keys, your shell
+  history persist. This is a feature for an operator, a misfeature for
+  an activist evading correlation.
+
+---
+
+## Comparison matrix
+
+Scoring legend: `✓` shipped & on by default, `~` partial / opt-in,
+`✗` not provided, `n/a` not applicable to that distro's model.
+
+| Axis | veilor-os | Stock Fedora KDE | Kicksecure | Tails | Qubes OS | secureblue |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|
+| **Encrypted at rest by default** | ✓ (LUKS2 argon2id) | ~ (optional) | ✓ | n/a (amnesic) | ✓ | ✓ |
+| **MAC enforcing OOTB** | ✓ (SELinux + AppArmor v0.5) | ✓ (SELinux) | ✓ (AppArmor) | ✓ (AppArmor) | ✓ (per-VM) | ✓ (SELinux) |
+| **Default-deny firewall** | ✓ | ✗ | ✓ | ✓ (Tor-only) | ✓ | ✓ |
+| **USB default-block** | ✓ (USBGuard) | ✗ | ✓ | ✓ | ✓ (sys-usb) | ✓ |
+| **Per-app isolation (VM/sandbox)** | ✗ | ✗ | ✗ | ~ (AppArmor) | ✓ (Xen VMs) | ~ (Flatpak/bwrap) |
+| **Anonymity / Tor by default** | ✗ | ✗ | ✗ | ✓ | ~ (Whonix VMs) | ✗ |
+| **Daily driver target (persistent)** | ✓ | ✓ | ✓ | ✗ | ✓ (heavy) | ✓ |
+| **Signed releases (publisher key)** | ✓ (v0.4) | ✓ | ✓ | ✓ | ✓ | ✓ |
+| **Threat model published** | ✓ (this doc) | ✗ | ✓ | ✓ | ✓ | ✓ |
+| **Hardware compatibility (laptops)** | ✓ (Fedora kernel) | ✓ | ~ | ~ (live USB) | ~ (Xen-pinned) | ✓ |
+
+---
+
+## Where veilor-os fits
+
+Pick veilor-os if your job is to write code, edit docs, manage
+infrastructure, read mail, browse — and you want a desktop that won't
+quietly betray you to a generic adversary while you do it. **You are the
+user, not the target of a state.**
+
+Pick **Tails** for amnesia and Tor by default. **Qubes** if you must assume
+any app could be compromised. **Kicksecure** for similar hardening on
+Debian. **secureblue** for a hardened atomic Fedora. **Stock Fedora KDE**
+if you just want Fedora with no opinions.
+
+---
+
+## v0.7 public-launch checklist
+
+These are the items that gate flipping the repo public and posting:
+
+- [ ] Threat model finalised and published (this document).
+- [ ] GPG-signed releases working (v0.4 dependency — ISO + sha256 + .asc).
+- [ ] Reproducible build verifiable from clean checkout (v0.4).
+- [ ] mkdocs-material (or Hugo) site live on `veilor.org`, generated from
+      `docs/`. INSTALL, HARDENING, BUILD, ROADMAP, RELEASE, THREAT-MODEL,
+      CONTRIBUTING all rendered.
+- [ ] Comparison + benchmark numbers published (cold boot, idle RAM, idle
+      egress, suspend/resume) vs stock Fedora KDE.
+- [ ] Press kit page: wallpapers, logo SVG, screenshots, feature
+      one-liner, signed quotes from early users.
+- [ ] **"What veilor-os is not"** preempt page — direct link from launch
+      post. Answers "why not Qubes?", "why not Tails?", "why not just
+      stock Fedora?" so the first hundred comments don't have to.
+- [ ] Comparison post drafted for **r/linux**, **r/Fedora**, **HN**.
+      Same body, three formats. Lead with the threat model link, not the
+      black wallpaper.
+- [ ] CHANGELOG.md tagged at v0.7.0 release commit; GitHub Release
+      created with ISO + sha256 + .asc artefacts attached.
+- [ ] Repo flipped to public, `veilor.org` DNS pointed at the docs site,
+      Mastodon / Matrix / SimpleX announcement queued.
+
+---
+
+*Last reviewed: v0.7 draft. Update every minor release.*
diff --git a/docs/research/2026-05-05-agent-wave/01-plymouth-luks-real-hardware.md b/docs/research/2026-05-05-agent-wave/01-plymouth-luks-real-hardware.md
new file mode 100644
index 0000000..3fc407b
--- /dev/null
+++ b/docs/research/2026-05-05-agent-wave/01-plymouth-luks-real-hardware.md
@@ -0,0 +1,109 @@
+# Plymouth + LUKS unlock — real-hardware edge cases
+
+**Agent 1 of 9-agent wave, 2026-05-05.**
+
+## State at v0.5.31
+
+- Live ISO cmdline pins `plymouth.enable=0 fbcon=nodefer`.
+- Installed system uses Plymouth `details` theme.
+- LUKS2 argon2id, no clevis / cryptenroll, no recovery key generation.
+- `rd.vconsole.keymap=` not set.
+
+## Findings
+
+### 1. KMS / fbcon races
+
+- **Symptom:** Black screen at LUKS prompt, cursor blinks, keystrokes
+  swallowed but never accepted.
+- **Cause:** `i915` / `amdgpu` / `nvidia-drm` modeset fires *during*
+  plymouthd handover. With `plymouth.enable=0` we skip the splash but
+  the ask-password agent still opens `/dev/tty1`, which races `fbcon`
+  rebind.
+- **Fix:** keep `fbcon=nodefer`, append
+  `nvidia-drm.modeset=1 i915.fastboot=0 amdgpu.dc=1` to bootloader.
+  NVIDIA Optimus killer is `nvidia-drm.modeset=1`.
+- **Probability:** HIGH on Optimus, MED on AMD APU, LOW on Intel iGPU.
+
+### 2. Plymouth theme choice — keep `details`
+
+- `details` (kernel/systemd journal under prompt) is best for
+  blind-typing because the user sees `Please enter passphrase…` *as
+  text*, full echo as `*`.
+- `text` is minimal fallback (no echo, no journal).
+- `spinner` is the documented "endless loop, no prompt" failure mode
+  on real laptops (adi1090x/plymouth-themes#10, Arch BBS 296529).
+- **No change.** But verify `plymouth-set-default-theme details`
+  actually ran post-install (Debian #986023 shows it silently fails
+  when initramfs rebuild is suppressed). Add `dracut --force
+  --regenerate-all` after the call.
+
+### 3. Initramfs keymap — HIGH probability for non-US users
+
+- **Symptom:** AZERTY/QWERTZ/Cyrillic user types correct passphrase,
+  gets "no key available". F43 ships en-US in initramfs by default.
+- **Bugs:** RHBZ 1405539, RHBZ 1890085, fedora-silverblue#3.
+- **Fix:** drop a placeholder `rd.vconsole.keymap=us` AND have
+  `firstboot.sh` rewrite it from `/etc/vconsole.conf` after the user
+  picks a layout. Also `/etc/dracut.conf.d/veilor-keymap.conf` with
+  `install_items+=" /etc/vconsole.conf "` so keymap is *baked* into
+  initramfs.
+
+### 4. systemd-cryptsetup vs legacy `crypt` — F43 = systemd-cryptsetup
+
+- F40+ unconditionally uses `systemd-cryptsetup@.service` from
+  `/etc/crypttab`. Old `rd.luks.uuid=` cmdline still parsed. Stable
+  through 6.x kernels. No change needed.
+
+### 5. argon2id memory cost — MED on old laptops (<8 GB RAM)
+
+- LUKS2 default = 1 GiB memory cost, `iter-time=2000 ms`. On
+  Core 2 Duo / Pentium-N this becomes 8–15s unlock + thrash.
+  Atom-class N4020: 30s+.
+- **Fix in installer post-script:**
+  `cryptsetup luksConvertKey --pbkdf-memory 524288 --iter-time 2000`
+  — halves memory to 512 MiB, knocks ~50% off unlock latency.
+
+### 6. TPM2 unlock — defer to v0.6
+
+- F43 ships `systemd-cryptenroll --tpm2-device=auto` ([Fedora
+  Magazine](https://fedoramagazine.org/automatically-decrypt-your-disk-using-tpm2/)).
+  No clevis required.
+- **v0.6 plan:** opt-in via `veilor-firstboot` →
+  `systemd-cryptenroll --tpm2-pcrs=7+11`. PCR 7 (secure boot state)
+  + 11 (kernel/initrd). Don't auto-enroll; PCR pinning is a footgun
+  on kernel updates.
+
+### 7. FIDO2 unlock — v0.7
+
+- `systemd-cryptenroll --fido2-device=auto` requires `libfido2`
+  + hmac-secret support. secureblue ships this. Add `libfido2` to
+  `%packages` + `veilor-fido2-enroll` wrapper.
+
+### 8. Recovery key — MISSING, ship in v0.6
+
+- Today: forgotten passphrase = brick.
+- **Fix:** in `firstboot.sh` add
+  `cryptsetup luksAddKey --pbkdf argon2id /dev/X <(systemd-creds
+  setup --print-key | head -c 64)` and print the 64-char key once
+  to a numbered envelope-style screen. Mirrors macOS FileVault.
+
+## Action items
+
+| # | Change | Target |
+|---|--------|--------|
+| 1 | `nvidia-drm.modeset=1 i915.fastboot=0 amdgpu.dc=1 rd.vconsole.keymap=us` to bootloader append | v0.5.32 |
+| 2 | `/etc/dracut.conf.d/veilor-keymap.conf` with `install_items+=" /etc/vconsole.conf "` | v0.5.32 |
+| 3 | Force `dracut -f --regenerate-all` after `plymouth-set-default-theme details` | v0.5.32 |
+| 4 | argon2id retune (`40-luks-tune.sh`) | v0.6 |
+| 5 | Recovery-key generation in firstboot | v0.6 |
+| 6 | TPM2 opt-in via `systemd-cryptenroll --tpm2-pcrs=7+11` | v0.6 |
+| 7 | FIDO2 opt-in | v0.7 |
+
+## Sources
+
+- [LUKS keyboard layout — fedora-silverblue/issue-tracker#3](https://github.com/fedora-silverblue/issue-tracker/issues/3)
+- [RHBZ 1405539 — keymap not honored on initramfs rebuild](https://bugzilla.redhat.com/show_bug.cgi?id=1405539)
+- [RHBZ 1890085 — English keymap forced in initramfs](https://bugzilla.redhat.com/show_bug.cgi?id=1890085)
+- [Fedora Magazine — TPM2 autodecrypt with systemd-cryptenroll](https://fedoramagazine.org/automatically-decrypt-your-disk-using-tpm2/)
+- [Leo3418 — argon2id LUKS tuning](https://leo3418.github.io/collections/gentoo-config-luks2-grub-systemd/tune-parameters.html)
+- [QubesOS#8600 — argon2id parameters](https://github.com/QubesOS/qubes-issues/issues/8600)
diff --git a/docs/research/2026-05-05-agent-wave/02-sddm-firstboot-ux.md b/docs/research/2026-05-05-agent-wave/02-sddm-firstboot-ux.md
new file mode 100644
index 0000000..4c6731e
--- /dev/null
+++ b/docs/research/2026-05-05-agent-wave/02-sddm-firstboot-ux.md
@@ -0,0 +1,117 @@
+# SDDM + first-boot UX failure modes
+
+**Agent 2 of 9-agent wave, 2026-05-05.**
+
+## Findings
+
+### 1. SDDM has no username prefilled — BLOCKS LOGIN (perceived)
+
+- User sees blank greeter; no signal that the only user is `admin`.
+- **Fix:** `/etc/sddm.conf.d/veilor.conf` add
+  `[Users]\nRememberLastUser=true` plus seed
+  `/var/lib/sddm/state.conf [Last]\nUser=admin\nSession=plasma`.
+
+### 2. chage -d 0 + SDDM autologin race
+
+- With `Relogin=false` (current), single-shot is safe.
+- **Fix:** Document `Relogin=false`. Don't combine `Autologin=true`
+  with `chage -d 0`.
+
+### 3. PAM expired-pw change inline in SDDM
+
+- Plasma 6 SDDM 0.21+ renders the chain. **But** if password fails
+  pwquality (cracklib min=14 + complexity from
+  `10-harden-base.sh`), error text shown briefly then form resets —
+  user sees no clear reason for rejection.
+- **Fix:** `/etc/security/pwquality.conf.d/10-veilor.conf` with
+  documented rules + Plasma startup notification showing them.
+
+### 4. Wayland session start failure on virtio-vga — BLOCKS LOGIN
+
+- KWin tries `wlroots`/DRM, fails to acquire `/dev/dri/card0` if
+  `virtio_gpu` kernel module not loaded.
+- **Fix:** add `plasma-workspace-x11` to `%packages`. SDDM session
+  menu shows `Plasma (X11)` fallback.
+
+### 5. Plasma 6 first-run wizards on /etc/skel-empty
+
+- KWin compositor backend pick + Plasma welcome center + accent
+  colour wizard — modal stealing focus on first session.
+- **Fix:** seed `/etc/skel/.config/`:
+  - `kwinrc` `[Compositing]\nBackend=OpenGL`
+  - `kdeglobals [General]\nAccentColor=...`
+  - `plasma-welcomerc [General]\nLastSeenVersion=99` (suppresses welcome)
+
+### 6. SELinux relabel after first boot — looks like hang
+
+- `touch /.autorelabel` triggers full restore on rootfs; 90s on
+  4 GB live install, 3-5min on real disk. User hard-resets thinking
+  it crashed → corrupted relabel state.
+- **Fix:** replace with `veilor-relabel.service` that prints
+  `[veilor] relabeling SELinux file contexts (1/N): %s` to TTY1
+  with progress, plus one-time post-relabel KDialog notification.
+
+### 7. USBGuard blocks input at SDDM — BLOCKS LOGIN on desktops
+
+- If `/etc/usbguard/rules.conf` empty/missing, USBGuard
+  `ImplicitPolicyTarget=block` (default) blocks USB. SDDM running
+  but USB keyboard dead.
+- **Fix:** ship a baseline `rules.conf`:
+  `allow with-interface equals { 03:00:* 03:01:* }`
+  (HID class) so any keyboard/mouse works pre-policy.
+
+### 8. NetworkManager DHCP — LOW severity
+
+- Wired auto-connects fine. Wi-Fi: silent failure unless SSID
+  preconfigured. Acceptable; Plasma 6 ships `plasma-nm` widget.
+- **Polish:** `/etc/xdg/autostart/veilor-firstboot-net-check.desktop`
+  → KDialog "Connect to network?" if `nmcli general` is `disconnected`.
+
+### 9. veilor-firstboot.service ordering — BLOCKS LOGIN on real installs
+
+- **Current:** `WantedBy=multi-user.target` only.
+- **Real installs:** default to `graphical.target`, so unit never runs.
+- Admin pw stays `veilor` + chage-expired. SDDM PAM bounces to
+  chauthtok screen — recoverable but ugly.
+- **Fix:** `WantedBy=graphical.target multi-user.target`. Add
+  `Before=graphical.target`. Verify `systemctl enable
+  veilor-firstboot.service` (in installer line 884) resolves both.
+  Add `DefaultDependencies=no` + `Wants=systemd-vconsole-setup.service`.
+
+## Endeavour OS welcome app — design notes for veilor-postinstall
+
+EOS welcome (`endeavouros-team/welcome` on GitHub) is bash + yad,
+~3000 LOC. Patterns to lift for veilor:
+
+- **Yad GTK dialog** as runtime (single binary dep). veilor (KDE)
+  uses `kdialog` + `qmlscene` instead — native Plasma look.
+- **Tabbed layout:** Welcome | Set up apps | Security | System info | Shortcuts.
+- **Self-disabling autostart:**
+  `~/.config/autostart/veilor-welcome.desktop` removed after user
+  clicks "Don't show again".
+- **External script dispatch:**
+  `/usr/share/veilor-os/postinstall/.sh` per step. Decouples
+  UI from actions.
+- **Update channel awareness:** pull from + `github.com/veilor-org/veilor-os` releases atom feed; show CVE + advisories from `security.atom` we publish. + +**Recommended stack:** +- `/usr/bin/veilor-welcome` (bash entrypoint, ≤300 LOC) +- `/usr/share/veilor-os/postinstall/welcome.qml` (QtQuick/Kirigami UI) +- `/usr/share/veilor-os/postinstall/steps/{01-account,02-network,03-usbguard-policy,04-update,05-tour}.sh` +- `/etc/xdg/autostart/veilor-welcome.desktop` +- Replace current `scripts/firstboot.sh` placeholder with + `step 03-usbguard-policy` (auto-generate-policy is the unfinished + core item). + +## Top three to ship next (highest UX impact, lowest risk) + +1. **`WantedBy=graphical.target multi-user.target`** in + `veilor-firstboot.service` — fixes silent SDDM-PAM-chauthtok + bounce on real installs. +2. **Username prefill** in `sddm.conf.d/veilor.conf`: add `[Users] + RememberLastUser=true` + `/var/lib/sddm/state.conf [Last] + User=admin Session=plasma`. +3. **USBGuard HID baseline `rules.conf`** — un-bricks any desktop + with USB keyboard. diff --git a/docs/research/2026-05-05-agent-wave/03-bootc-spike-plan.md b/docs/research/2026-05-05-agent-wave/03-bootc-spike-plan.md new file mode 100644 index 0000000..7229d83 --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/03-bootc-spike-plan.md @@ -0,0 +1,158 @@ +# bootc-image-builder spike plan — 1-week timebox + +**Agent 3 of 9-agent wave, 2026-05-05.** Schedule: v0.7. 
+ +## Containerfile draft + +```dockerfile +# veilor-os bootc image — Fedora 43 KDE base +FROM quay.io/fedora/fedora-bootc:43 + +ARG VEILOR_VERSION=0.6.0 + +RUN dnf install -y --setopt=install_weak_deps=False \ + @kde-desktop-environment @kde-apps @core @hardware-support @standard \ + kernel-modules kernel-modules-extra glibc-all-langpacks \ + grub2-efi-x64 grub2-efi-x64-modules grub2-pc grub2-pc-modules \ + grub2-tools grub2-tools-extra shim-x64 efibootmgr \ + newt parted cryptsetup lvm2 btrfs-progs \ + fail2ban fail2ban-firewalld usbguard usbguard-tools audit \ + policycoreutils-python-utils tuned chrony firewalld plymouth \ + git vim-enhanced tmux htop podman skopeo \ + NetworkManager NetworkManager-wifi \ + fontconfig freetype fira-code-fonts \ + zram-generator \ + && dnf remove -y --noautoremove \ + 'abrt*' snapd kde-connect open-vm-tools-desktop mlocate man-db man-pages \ + && dnf clean all && rm -rf /var/cache/dnf + +ARG GUM_VERSION=0.17.0 +ARG GUM_SHA256=69ee169bd6387331928864e94d47ed01ef649fbfe875baed1bbf27b5377a6fdb +ADD https://github.com/charmbracelet/gum/releases/download/v${GUM_VERSION}/gum_${GUM_VERSION}_Linux_x86_64.tar.gz /tmp/gum.tgz +RUN echo "${GUM_SHA256} /tmp/gum.tgz" | sha256sum -c - \ + && tar -xzf /tmp/gum.tgz -C /tmp \ + && install -m0755 /tmp/gum_${GUM_VERSION}_Linux_x86_64/gum /usr/bin/gum + +COPY overlay/ / +COPY assets/ /usr/share/veilor-os/assets/ +COPY scripts/ /usr/share/veilor-os/scripts/ + +RUN bash /usr/share/veilor-os/scripts/10-harden-base.sh \ + && bash /usr/share/veilor-os/scripts/20-harden-kernel.sh \ + && bash /usr/share/veilor-os/scripts/selinux/build-policy.sh \ + && bash /usr/share/veilor-os/scripts/kde-theme-apply.sh \ + && bash /usr/share/veilor-os/scripts/30-apply-v03-theme.sh + +RUN plymouth-set-default-theme details \ + && sed -i \ + -e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \ + /etc/default/grub + +# bootc kargs go in /usr/lib/bootc/kargs.d/, not /etc/default/grub +RUN mkdir -p 
/usr/lib/bootc/kargs.d && cat > /usr/lib/bootc/kargs.d/10-veilor-hardening.toml <<'EOF' +kargs = [ + "lockdown=integrity", + "slab_nomerge", + "init_on_alloc=1", + "init_on_free=1", + "randomize_kstack_offset=on", + "vsyscall=none", + "fbcon=nodefer", +] +EOF + +RUN systemctl enable sshd fail2ban usbguard tuned auditd firewalld chronyd sddm \ + veilor-firstboot.service veilor-modules-lock.service \ + && passwd -l root \ + && systemctl set-default graphical.target + +RUN bootc container lint +LABEL org.veilor.version=${VEILOR_VERSION} +``` + +## bootc-image-builder config (`build/disk-config.toml`) + +```toml +[customizations] +hostname = "veilor-os" + +[[customizations.user]] +name = "admin" +password = "veilor" +groups = ["wheel"] +shell = "/bin/bash" + +[customizations.kernel] +append = "lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer" + +[customizations.installer.kickstart] +contents = """ +zerombr +clearpart --all --initlabel +part /boot/efi --fstype=efi --size=600 +part /boot --fstype=ext4 --size=1024 +part btrfs.veilor --grow --encrypted --luks-version=luks2 --pbkdf=argon2id +btrfs none --label=veilor btrfs.veilor +btrfs / --subvol --name=root LABEL=veilor +btrfs /home --subvol --name=home LABEL=veilor +""" +``` + +## GitHub Actions workflow + +`build-bootc-iso.yml`: +- runs-on ubuntu-24.04, **timeout 30 min** (vs 90 for livecd-creator) +- permissions: `contents: write`, `packages: write` +- Build OCI image: `podman build` + `podman push ghcr.io/veilor/veilor-os:43` +- Build ISO via `quay.io/centos-bootc/bootc-image-builder:latest` + with `--type anaconda-iso --rootfs btrfs --config /build/disk-config.toml` +- Reuse split + `softprops/action-gh-release@v2` from existing workflow + +## Migration risks (10-row table) + +| # | Risk | Severity | Mitigation | +|---|------|----------|------------| +| 1 | %post --nochroot overlay-copy disappears | Low | `COPY overlay/ /` is simpler — win | +| 2 | 
Update model: `bootc upgrade` (image swap) replaces `dnf upgrade` | High | `veilor-update` becomes thin `bootc upgrade --apply` wrapper | +| 3 | /usr is read-only at runtime | Medium | etc-overlay handles /etc writes; relocate any /usr writers to /etc or build-time | +| 4 | SELinux module compilation in container | Medium | Works in fedora-bootc:43 (verified per upstream pattern). Test spike day 2 | +| 5 | `transaction_progress.py` patch unnecessary | Low | bootc-image-builder doesn't use dnf at install. Drop the patch. Win | +| 6 | `rd.luks.uuid` is anaconda's job again | Low | Removes ~80 lines of fragile sed/grubby code. Win | +| 7 | LUKS prompt UX: anaconda native, not gum | High | gum installer becomes `live·shell` only. v1.0 install = anaconda's native UI | +| 8 | --privileged still required | None | Same as today | +| 9 | OCI image size: ~3.5 GB compressed vs ~2.8 GB squashfs | Low | zstd:max recovers ~400 MB | +| 10 | `kernel-install` BLS: `/etc/kernel/cmdline` not honored, `/usr/lib/bootc/kargs.d/*.toml` is | Medium | Already addressed in Containerfile draft | + +## What we keep (zero churn) + +- `overlay/*` — copied verbatim by `COPY overlay/ /` +- `scripts/*.sh` — invoked verbatim by Containerfile RUN +- `assets/*` — copied verbatim +- `test/*` — adapts: `podman run --rm -it ghcr.io/veilor/veilor-os:43 /bin/bash` smoke; QEMU ISO test unchanged +- `kickstart/install.ks` — kept as fallback. 
Tag last anaconda build as `v0.5.99-anaconda` before flipping + +## Spike success criteria (1 week) + +| Day | Milestone | +|-----|-----------| +| 1 | Containerfile builds clean (`podman build` exit 0, `bootc container lint` exit 0) | +| 2 | `podman run` boots into image, KDE binaries present, SELinux + hardening sysctls applied | +| 3 | bootc-image-builder produces installer ISO from OCI, ksvalidator clean | +| 4 | ISO boots in QEMU to anaconda live menu | +| 5 | Install completes, LUKS single-prompt, btrfs subvols present | +| 6 | First boot reaches SDDM, admin login works, password-change-on-first-login enforced | +| 7 | Buffer for fixes; doc `docs/BUILD-bootc.md`; tag `v0.5.99-anaconda` snapshot | + +## Decision gate + +- **PASS** (all 7 criteria green): tag `v0.5.99-anaconda` as last-anaconda; + merge `bootc-spike` → `main` as `v0.6.0-bootc`; deprecate + `kickstart/veilor-os.ks` (keep `kickstart/install.ks` for one cycle). + Update ROADMAP: v1.0 ships bootc-only. + +- **FAIL** (any of risks 3, 4, 7, 10 unfixable in week 1): keep + anaconda path, defer migration to v1.1+; file each blocker as GH + issue with reproducer. + +- **HYBRID FALLBACK**: ship anaconda ISO for v0.6/v0.7, ship bootc OCI + alongside (matches existing `veilor-atomic` stretch goal). diff --git a/docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md b/docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md new file mode 100644 index 0000000..36888ec --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md @@ -0,0 +1,125 @@ +# Hardening tier 2 — concrete plan + +**Agent 4 of 9-agent wave, 2026-05-05.** + +## Repo state already in tree + +- `scripts/apparmor/` ships **3 profiles** (`thorium`, `veilor-power`, + `lm-studio`) — complain-mode, **not auto-loaded**. No browser/mail + /Element profile. +- `scripts/selinux/` ships custom `.te` modules — primary MAC. 
+- `overlay/etc/audit/plugins.d/veilor-remote.conf` + + `audisp-remote.conf.disabled` — **scaffold present, opt-in switch + missing**. +- `kickstart/veilor-os.ks` — single live-ks. Real LUKS install lives + in `overlay/usr/local/bin/veilor-installer` (generates ks at runtime). +- No nftables overlay. No homed scaffold. No `veilor-audit-shipping` CLI. + +## Item-by-item plan + +### 1. AppArmor stack with SELinux — M + +Fedora 43 ships `apparmor-parser`/`libapparmor`. Kernel has both LSMs. +Stacking works since 5.1; SELinux stays primary, AppArmor confines +specific binaries by path. **No conflict** — they layer. Risk: AA +profiles based on Debian/Ubuntu paths fail on Fedora. + +**Files:** +- `kickstart/veilor-os.ks` `%packages` add `apparmor-parser apparmor-utils apparmor-profiles` +- `overlay/etc/apparmor.d/veilor.d/` (new) — vendor profiles + `firefox`, `thunderbird`, `element-desktop`, `signal-desktop` +- `scripts/40-apparmor.sh` (new) — parses + sets all veilor profiles + to **complain** on first install (logs only, no break) +- `overlay/usr/local/bin/veilor-doctor` — adds AA status check + +**Test:** `aa-status | grep complain` shows >=4 loaded; firefox writes +outside policy → audit.log denial. + +### 2. systemd-homed opt-in — L + +Default LUKS storage `homectl` drops key on suspend; resume needs PAM +unlock again — **breaks "lid open, keep working"**. Use +`--storage=fscrypt` on top of existing btrfs `/home` subvol — +suspend transparent, encrypts at rest with per-user key. + +**Files:** +- `overlay/usr/local/bin/veilor-homed-enable` (new) — confirms warning, + runs `homectl create admin --storage=fscrypt --real-name="veilor admin"` + after migrating files +- `overlay/etc/pam.d/sddm` drop-in for `pam_systemd_home.so` +- doc in `docs/HARDENING.md`. **Not auto-run** — only via post-install. + +### 3. nftables alongside firewalld — S + +firewalld speaks nftables backend on F43 — they don't conflict; +firewalld owns `inet firewalld` table. 
veilor-os preset = separate +`inet veilor` table loaded by its own service. + +**Files:** +- `overlay/etc/nftables/veilor.nft` (new) — table `inet veilor`: + ssh per-IP rate limit (5/min), ICMP rate limit, optional + `ip6 daddr ::/0 drop` toggled by sysctl-style `/etc/veilor/ipv6.disabled`, + anti-port-scan via `meter` set +- `overlay/etc/systemd/system/veilor-nftables.service` (new) — + `After=firewalld.service` +- `kickstart/veilor-os.ks` `%packages` add `nftables`, services-enabled + add `veilor-nftables` + +**Test:** `nft list ruleset` shows both `firewalld` AND `veilor`; +`hping3 -S -p 22 --flood` from second VM gets rate-limited. + +### 4. Audit log shipping — S + +Plumbing **already in tree** (`audisp-remote.conf.disabled`, +`veilor-remote.conf` with `active=no`). What's missing: CLI to flip +the switch with cert pinning. + +**Files:** +- `overlay/usr/local/bin/veilor-audit-shipping` (new) + - `enable HOST PORT FINGERPRINT` writes + `/etc/veilor/audit-pin.sha256`, copies `audisp-remote.conf.disabled` + → `audisp-remote.conf` with substituted host/port, enables plugin + (`active=yes`), restarts auditd + - `disable` reverses +- audisp-remote speaks TLS directly; cert pinning via `verify_peer=yes` + + `peer_cert_fingerprint` +- Use **self-signed pinned**, not LE — collectors are LAN/VPN + +**Test:** stand up `rsyslog` listener on nullstone with self-signed +cert; run helper; trigger `sudo -i`; tail nullstone for AUTHPRIV +event; revoke cert → events stop with logged TLS error. + +### 5. Installer kickstart split — needs re-scope, S + +Roadmap item is **stale**. As of v0.5.30 we already do real LUKS+btrfs +in `veilor-installer` which generates ks at runtime. **Re-scope:** +extract that generated ks template into static +`kickstart/veilor-os-install.ks` (parameterised via `%include +/tmp/answers.ks`), so reviewable in repo and reusable headlessly. 
+ +**Files:** +- split `overlay/usr/local/bin/veilor-installer` heredoc into + `kickstart/veilor-os-install.ks` +- installer just writes answers + `cp` the ks +- CI lints both with `ksvalidator` + +### 6. Audit baseline re-run — S + +Mechanical: `cp security/audit-template.md +security/veilor-os-distro/2026-05-DD.md`, run on VM, target lower +findings count than v0.2's baseline. + +## Order, dependencies, ship plan + +Dependencies: (5) blocks (6) — audit a stable installer, not a +moving heredoc. Else parallel. + +**Total effort:** 2S + 1S(rescope) + 1S + 1M + 1L ≈ **5–7 dev-days**. + +- **v0.5.32 (small wins):** (4) audit shipping CLI + (3) nftables + preset. Both S, scaffold completion, pure overlay (no kickstart risk). +- **v0.5.33:** (5) ks split + (6) audit baseline re-run. +- **v0.6 (medium):** (1) AppArmor stack — package install + 4 profiles + + doctor integration; complain-mode keeps blast radius zero. +- **v0.7 (big lift):** (2) systemd-homed — UX-disruptive, needs + migration helper + doc page + suspend/lock/swap testing. diff --git a/docs/research/2026-05-05-agent-wave/05-threat-model-launch.md b/docs/research/2026-05-05-agent-wave/05-threat-model-launch.md new file mode 100644 index 0000000..28562ee --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/05-threat-model-launch.md @@ -0,0 +1,65 @@ +# Threat model + public launch prep + +**Agent 5 of 9-agent wave, 2026-05-05.** + +## Deliverable + +Threat model written to `docs/THREAT-MODEL.md` (1492 words). Slots +into `docs/ROADMAP.md` v0.7 line item "Threat model published — +honest scope". + +## Structure + +1. **In-scope adversaries** (9 rows): lost laptop, browser RCE, USB + attacks, SSH brute-force, forensics, supply chain, LPE, network + surface, time MITM. Each maps to specific veilor mitigation + (LUKS2 argon2id mem=1GB, SELinux + `veilor-systemd` policy, + USBGuard, fail2ban+firewalld, auditd, NTS chrony, etc.). + +2. 
**Out-of-scope adversaries** (9 rows): firmware implants, + evil-maid on running system, hardware keylogger, session-level + RCE (KDE not sandboxed), AES side-channels, TPM2 physical + attacks, traffic correlation, TOFU MITM, sustained physical + access. Each row points to right tool instead (Heads, Qubes, + Tails). + +3. **Hardening tradeoffs** (6 honest costs): + - SELinux app-compat + - Slow LUKS boot + - USBGuard friction + - Module lockdown breaking NVIDIA prop / VBox + - Drop-zone breaking KDE Connect / mDNS + - No PackageKit + +4. **Like Tails/Whonix/Qubes:** published threat model, default-deny + firewall, encrypted at rest. + +5. **Differs from them:** daily-driver vs session-only; single-VM vs + Qubes compartmentalisation; persistent identity vs Tails amnesia. + +6. **Comparison matrix:** 10-axis × 6-distro grid (veilor-os / stock + Fedora KDE / Kicksecure / Tails / Qubes / secureblue) covering + encryption, MAC, firewall, USB, per-app isolation, anonymity, + daily-driver fit, signed releases, threat-model publication, + hardware compat. + +7. **v0.7 launch checklist** (9 items): + - Threat model finalised + - GPG signing (v0.4 dep) + - mkdocs-material on veilor.org + - Comparison + benchmarks + - Press kit + - "What veilor-os is not" preempt page (covers "why not Qubes/Tails/Fedora?") + - r/linux + r/Fedora + HN posts + - GitHub Release with ISO+sha256+.asc + - Repo flip-public + DNS + Mastodon/Matrix/SimpleX announce + +## Tone + +Matches repo voice — short paragraphs, no fluff, "honest scope" +framing reused from roadmap. No emojis (per CLAUDE.md style). 
+ +## See also + +- `docs/THREAT-MODEL.md` (full document) +- `docs/ROADMAP.md` v0.7 section diff --git a/docs/research/2026-05-05-agent-wave/06-anaconda-log-capture.md b/docs/research/2026-05-05-agent-wave/06-anaconda-log-capture.md new file mode 100644 index 0000000..12cab71 --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/06-anaconda-log-capture.md @@ -0,0 +1,96 @@ +# Anaconda log capture — virtio-9p host-share + +**Agent 6 of 9-agent wave, 2026-05-05.** + +## Why current setup is silent + +v0.5.30 wired: + +``` +-chardev file,id=anaclog,path=$ANACONDA_LOG +-device virtio-serial-pci,id=vs1 +-device virtserialport,chardev=anaclog,bus=vs1.0,name=org.fedoraproject.anaconda.log.0 +``` + +Anaconda is supposed to autodetect this port and stream logs. Result: +`test/anaconda-vm-*.log` files are 0 bytes despite multiple full +installs. + +**Root cause:** Anaconda's `setupVirtio()` (anaconda_logging.py:315) +doesn't write to the virtio port directly — it adds a forward rule to +`/etc/rsyslog.conf` then calls `restart_service("rsyslog")`. No +`inst.virtiolog` boot arg is required (`--virtiolog` defaults to the +right port via `argument_parsing.py:512`). + +The veilor live ISO almost certainly **lacks `rsyslog`** (minimal +Fedora ks), so the forward rule lands in a file no daemon reads. +`restart_service` is a no-op. The QEMU side opens the port and +creates the 0-byte file but nothing ever writes to it. + +Even with rsyslog present, only `LOG_LOCAL1`-tagged messages would +flow; the rich content lives in `/tmp/anaconda.log`, +`/tmp/program.log`, `/tmp/storage.log`, `/tmp/packaging.log` which +never traverse syslog. + +## Fix — Option C (virtio-9p host-share + post-install copy) + +### `test/run-vm.sh` + +Add `-virtfs` 9p export of `test/test-runs//` tagged +`hostlogs`. Keep existing virtio-serial as belt-and-braces fallback. 
+
+```bash
+TS=$(date +%Y%m%d-%H%M%S)
+HOSTLOGS_DIR="$TEST_DIR/test-runs/$TS"
+mkdir -p "$HOSTLOGS_DIR"
+HOSTSHARE_ARGS=(
+    -virtfs "local,path=$HOSTLOGS_DIR,mount_tag=hostlogs,security_model=mapped-xattr,id=hostshare"
+)
+echo " Logs : $HOSTLOGS_DIR"
+```
+
+Append `"${HOSTSHARE_ARGS[@]}" \` to the `exec qemu-system-x86_64`
+block.
+
+### `overlay/usr/local/bin/veilor-installer`
+
+In `run_install()`, install an `EXIT` trap calling `_dump_logs_to_host`
+that mounts the 9p share at `/mnt/hostlogs` and copies:
+
+- `/tmp/{anaconda,program,storage,packaging,dnf,dnf.librepo,anaconda-cmdline}.log`
+- `/var/log/veilor-installer.log`
+- generated kickstart at `/run/install/veilor-generated.ks`
+- `dmesg` output
+- `journalctl -b` output
+
+Runs on success, failure, and `^C`. Auto-no-ops on real hardware
+where 9p isn't loaded.
+
+```bash
+_dump_logs_to_host() {
+    # Create the mount point first: without it the mount fails and the
+    # copy is silently skipped even when the 9p share is available.
+    if mkdir -p /mnt/hostlogs 2>/dev/null \
+        && mount -t 9p -o trans=virtio,version=9p2000.L hostlogs /mnt/hostlogs 2>/dev/null; then
+        cp -a /tmp/{anaconda,program,storage,packaging,dnf,dnf.librepo,anaconda-cmdline}.log \
+              /var/log/veilor-installer.log \
+              /run/install/veilor-generated.ks \
+              /mnt/hostlogs/ 2>/dev/null || true
+        dmesg > /mnt/hostlogs/dmesg.log 2>/dev/null || true
+        journalctl -b > /mnt/hostlogs/journal.log 2>/dev/null || true
+        umount /mnt/hostlogs 2>/dev/null || true
+    fi
+}
+trap _dump_logs_to_host EXIT
+```
+
+## Why options A/B/D were rejected
+
+- **A** (grub kernel arg surgery, `inst.virtiolog`) and **D** (host
+  rsyslog TCP listener with `inst.syslog=10.0.2.2:5140`) both still
+  rely on rsyslog being present in the live ISO.
+- **B** (anaconda --syslog at CLI): same dependency.
+- **C** captures complete file-level fidelity regardless. virtio-9p is
+  in the kernel; mount is two lines; copies the actual files.
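Since the original symptom was a set of 0-byte log files, a host-side check after each run may be worth pairing with Option C. The following is only a sketch under stated assumptions: `check_hostlogs` is a hypothetical helper name that does not exist in the repo, and it assumes the per-run 9p export directory is passed as its single argument.

```shell
# Hypothetical helper (not part of this patch): fail when a test-run
# directory is missing, holds no *.log files, or holds any 0-byte log.
check_hostlogs() {
    local dir="$1" f found=0 bad=0
    [ -d "$dir" ] || { echo "no such dir: $dir" >&2; return 2; }
    for f in "$dir"/*.log; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal
        found=1
        if [ ! -s "$f" ]; then       # -s is true only for size > 0
            echo "empty: ${f##*/}" >&2
            bad=1
        fi
    done
    [ "$found" -eq 1 ] || { echo "no logs in $dir" >&2; return 1; }
    return "$bad"
}
```

It could be called at the end of a test run, for example `check_hostlogs "$HOSTLOGS_DIR" || echo "log capture failed"`, so that a silent capture regression shows up immediately.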
+ +## Files modified + +- `test/run-vm.sh` +- `overlay/usr/local/bin/veilor-installer` diff --git a/docs/research/2026-05-05-agent-wave/07-kde-skel-branding.md b/docs/research/2026-05-05-agent-wave/07-kde-skel-branding.md new file mode 100644 index 0000000..6544e05 --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/07-kde-skel-branding.md @@ -0,0 +1,100 @@ +# KDE theme + DuckSans + /etc/skel branding audit + +**Agent 7 of 9-agent wave, 2026-05-05.** + +## Catalog: what's currently shipped + +| Component | Status | Path | +|---|---|---| +| Color scheme | shipped | `assets/kde/veilor-black.colors` → `/usr/share/color-schemes/` | +| System kdeglobals | shipped | `assets/kde/veilor-default.kdeglobals` → `/etc/xdg/kdedefaults/kdeglobals` | +| Breeze decoration override | shipped | `assets/kde/breezerc` → `/etc/xdg/breezerc` | +| Plasma containment defaults | shipped | written by `30-apply-v03-theme.sh` → `/etc/xdg/kdedefaults/plasma-org.kde.plasma.desktop-appletsrc` | +| Wallpaper (PNG+SVG) | shipped | `assets/wallpapers/veilor-black.{png,svg}` → `/usr/share/wallpapers/veilor-black/` | +| SDDM theme | shipped (full QML) | `assets/sddm/veilor-black/` → `/usr/share/sddm/themes/veilor-black/` | +| SDDM theme activation | shipped | `30-apply-v03-theme.sh` writes `/etc/sddm.conf.d/veilor-theme.conf` (Current=veilor-black) | +| Konsole profile + colorscheme | shipped | `assets/konsole/veilor.{profile,colorscheme}` → `/usr/share/konsole/Veilor.*` + `/etc/xdg/konsolerc` | +| Plymouth theme | shipped | `assets/plymouth/veilor/` | +| os-release branding | shipped | PRETTY_NAME="veilor-os 0.5.27", LOGO=veilor-logo | +| Fira Code fontconfig | shipped | `/etc/fonts/conf.d/55-veilor-firacode.conf` | +| DuckSans font | DEFERRED — empty dir, README only | | + +## Drift inside active configs + +- `overlay/etc/sddm.conf.d/veilor.conf` sets `[Theme] Current=breeze`. +- `30-apply-v03-theme.sh` then writes + `/etc/sddm.conf.d/veilor-theme.conf` with `Current=veilor-black`. 
+- SDDM merges alphabetically → `veilor-theme.conf` wins (loads after). +- Shipping a `Current=breeze` line in the overlay is misleading drift. + +## Specific gaps preventing visual brand consistency + +1. **No `/etc/skel/` whatsoever.** `overlay/etc/skel/` does not exist. + All KDE config lives in `/etc/xdg/kdedefaults/` and `/etc/xdg/*rc`. + Works for fresh boots, but the moment the user clicks anything in + System Settings, KDE writes `~/.config/kdeglobals` and silently + shadows the system defaults. **Zero per-user seeding** = one click + away from losing all branding. + +2. **No PRETTY_NAME secondaries.** `/etc/system-release`, `/etc/issue`, + `/etc/issue.net`, `/etc/lsb-release` never written. `lsb_release + -a` reports Fedora. KDE About dialog uses os-release (OK) but TTY + login banner + many user-space tools read `/etc/system-release`. + +3. **No `kwinrc` shipped.** Plasma 6 Wayland-specific defaults + (TitlebarDoubleClick, Compositor backend, FocusPolicy, animation + speed) not seeded. Vanilla Fedora KDE animations + click-to-focus + prevail. + +4. **No panel layout** (`plasma-org.kde.plasma.desktop-appletsrc` + containment for panel). The file written by `30-apply-v03-theme.sh` + only seeds `[Containments][1]` (desktop containment) for wallpaper. + Actual Plasma panel containment (taskbar, system tray, clock, + kickoff icon) is unseeded → users get stock Fedora panel with + Fedora-blue kickoff button. + +5. **DuckSans deferred but README claims it as the brand font.** + `kdeglobals`, Konsole, SDDM all hardcode `Fira Code`. If DuckSans + ever ships, ten files need synchronized edits. + +6. **`overlay/etc/sddm.conf.d/veilor.conf` says `Current=breeze`** — + internal contradiction with script-written `veilor-theme.conf`. + Cosmetic but confusing. + +7. **`kde-theme-apply.sh` has `warn()` undefined** (line 64) — calls + `warn` but only `ok`/`info` defined. If os-release source ever + goes missing, script crashes with `command not found`. 
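Gap 7 looks like a one-function fix. A minimal sketch, assuming the existing `ok`/`info` helpers are plain `printf` wrappers; their real output format in `kde-theme-apply.sh` is not shown in this audit, so the prefixes below are placeholders:

```shell
# Assumed shape of the helpers already defined; the real ones may differ.
ok()   { printf '[ ok ] %s\n' "$*"; }
info() { printf '[info] %s\n' "$*"; }

# The missing helper: print to stderr and return success so that a
# warning never aborts the script.
warn() { printf '[warn] %s\n' "$*" >&2; }
```

With this in place, a guard such as `[ -r /etc/os-release ] || warn "os-release missing"` degrades to a message instead of `command not found`.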
+ +## Top 5 `/etc/skel/` additions (highest impact, lowest effort) + +1. **`/etc/skel/.config/kdeglobals`** — copy of + `assets/kde/veilor-default.kdeglobals`. Single highest-impact file: + locks ColorScheme, AccentColor, Font, Icons.Theme, + LookAndFeelPackage into the user's first-write file so System + Settings interaction won't revert anything to Breeze defaults. + +2. **`/etc/skel/.config/konsolerc`** — `[Desktop Entry] + DefaultProfile=Veilor.profile` plus `[KonsoleWindow] + ShowMenuBarByDefault=false`. Per-user override of system konsolerc; + ensures first konsole launch is branded even if user's home + pre-exists. + +3. **`/etc/skel/.config/kwinrc`** — Plasma 6 Wayland defaults: + `[Compositing] AnimationSpeed=0`, `[Windows] + FocusPolicy=ClickToFocus`, `[Plugins] blurEnabled=false` (mirrors + the no-animations Breeze override). + +4. **`/etc/skel/.config/plasma-org.kde.plasma.desktop-appletsrc`** — + full containment file with both desktop containment + (wallpaper=veilor-black) AND panel containment (kickoff icon = + `/usr/share/pixmaps/veilor-logo.svg`, panel height/position). + Without this, the taskbar is vanilla Fedora. + +5. **`/etc/skel/.local/share/konsole/Veilor.profile`** — local copy so + user-local konsole sees the profile in its dropdown without needing + `/usr/share/konsole/` walk. Pair with #2. + +**Bonus near-zero-effort:** write `/etc/system-release`, `/etc/issue`, +and `/etc/lsb-release` in `kde-theme-apply.sh` to close the +lsb_release/TTY-banner gap. And fix the undefined `warn()` in +`kde-theme-apply.sh:64`. 
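The bonus item on secondary release files could be sketched as below. This is an illustration rather than the patch's implementation: `write_release_files` is a hypothetical name, the optional root argument exists only so the function can be exercised against a staging tree, and the exact contents chosen for `/etc/issue` and `/etc/lsb-release` are assumptions (`\r` and `\l` are agetty escapes for kernel release and tty line).

```shell
# Hypothetical helper: derive the secondary release files from
# PRETTY_NAME in os-release. The root defaults to "/" but may point
# at a staging directory.
write_release_files() {
    local root="${1:-/}" pretty
    # Source os-release in a subshell so its variables do not leak.
    pretty=$(. "$root/etc/os-release" && printf '%s' "$PRETTY_NAME") || return 1
    [ -n "$pretty" ] || return 1
    printf '%s\n' "$pretty" > "$root/etc/system-release"
    # Backslashes are doubled so agetty sees a literal \r and \l.
    printf '%s \\r (\\l)\n\n' "$pretty" > "$root/etc/issue"
    printf 'DISTRIB_DESCRIPTION="%s"\n' "$pretty" > "$root/etc/lsb-release"
}
```

Deriving everything from `PRETTY_NAME` keeps a single source of truth, so a version bump in os-release would not need matching edits elsewhere.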
diff --git a/docs/research/2026-05-05-agent-wave/08-ci-hardening.md b/docs/research/2026-05-05-agent-wave/08-ci-hardening.md new file mode 100644 index 0000000..987f6c4 --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/08-ci-hardening.md @@ -0,0 +1,131 @@ +# Build-iso CI hardening + +**Agent 8 of 9-agent wave, 2026-05-05.** + +## State of play + +- Workflows: `build-iso.yml`, `lint.yml`, `Release Checksums` (auto) +- Secrets/variables: **none configured** — only ambient `GITHUB_TOKEN` +- Repo: private, MIT, no Pages, no Dependabot, no branch protection + (Pro-gated until public flip) +- Container: `registry.fedoraproject.org/fedora:43` (tag, not digest) +- Actions: `actions/checkout@v4`, `addnab/docker-run-action@v3`, + `softprops/action-gh-release@v2`, `ludeeus/action-shellcheck@master` + — **all unpinned to SHA** +- gum download: pinned by SHA256 ✓ +- Kickstart repos: `releases/43/Everything` + `updates/43/Everything` + — **both rolling**, byte-different daily + +## Top 5 immediate (S effort, ship in v0.5.32) + +| # | Item | Why | +|---|------|-----| +| 1 | Pin all actions to commit SHA + add `.github/dependabot.yml` for `github-actions` | Supply-chain — `@master` on shellcheck is live-takeover vector; v3/v4 tags are mutable | +| 2 | Pin Fedora container to digest (`registry.fedoraproject.org/fedora:43@sha256:...`) | One-line change; eliminates "container drift" repro class | +| 3 | Add `permissions:` block at workflow level (`contents: read` default), override per-job | `contents: write` is workflow-wide; least-privilege the lint job | +| 4 | Generate SBOM via `anchore/sbom-action`, attach to release | Free, ~30 lines, journalist-readable | +| 5 | Add `actions/attest-build-provenance@v2` for SLSA L3 attestation on ISO + parts | Free, GH-native, `id-token: write` only | + +## v0.4 release-eng roadmap (confirmed/added) + +- **Confirmed:** Sigstore/cosign signing of ISOs (already in roadmap) +- **Add:** Fedora compose-ID pinning per release tag — switch + 
`--baseurl` to
+  `kojipkgs.fedoraproject.org/compose/branched/Fedora-43-...n.X/compose/Everything/x86_64/os/`
+  for stable releases (rolling for `ci-latest`)
+- **Add:** Reproducible-Builds.org diffoscope job comparing 2
+  sequential builds of same SHA; gate on byte-equality
+- **Add:** `harden-runner` (StepSecurity) audit-mode pass to enumerate
+  egress; promote to block-mode in v0.5
+- **Add:** When repo flips public (v0.7), enable secret scanning + push
+  protection + private vuln reporting + branch protection (require ≥1
+  review, status checks: lint + ksvalidate + build, no force-push)
+- **Add:** OIDC `id-token: write` only in tag-release job (not on
+  `main` push); keyless cosign signing scoped to release events
+
+## YAML diffs
+
+### 1. Workflow-level permissions + per-job override
+
+```yaml
+permissions:
+  contents: read
+
+jobs:
+  build:
+    permissions:
+      contents: write       # gh-release
+      id-token: write       # cosign keyless + attestation
+      attestations: write
+```
+
+### 2. SHA-pin actions
+
+```yaml
+- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
+- uses: addnab/docker-run-action@4f65375b03d588f307b7a3b0a8bb50f8b58a85b9 # v3
+- uses: softprops/action-gh-release@01570a1f39cb168c169c802c3bceb9e93fb10974 # v2.1.0
+```
+
+(SHAs to be re-checked at apply-time; dependabot keeps them current)
+
+### 3. Pin Fedora digest
+
+```yaml
+image: registry.fedoraproject.org/fedora:43@sha256:
+```
+
+Capture once via `skopeo inspect --format '{{.Digest}}'
+docker://registry.fedoraproject.org/fedora:43` (the manifest digest,
+which is what an `@sha256:` reference pins; the config digest is not
+pullable) and bump on each releasever bump.
+
+### 4. 
SBOM + attestation + cosign
+
+```yaml
+- name: Install cosign
+  uses: sigstore/cosign-installer@d7d6e07a3ddf0f9a4f8b3b9e3f1d1a5ce8e9b5b3 # v3.7.0
+
+- name: Sign ISO parts (keyless)
+  if: github.event_name == 'release'
+  run: |
+    cd build/out
+    for f in *.part-*; do cosign sign-blob --yes "$f" \
+      --output-signature "$f.sig" --output-certificate "$f.pem"; done
+
+- name: Generate SBOM (SPDX)
+  uses: anchore/sbom-action@e8d2a6937ecead383dfe75190d104edd1f9c5751 # v0.17.4
+  with:
+    path: build/out
+    format: spdx-json
+    output-file: build/out/veilor-os.spdx.json
+
+- name: Build provenance attestation
+  uses: actions/attest-build-provenance@7668571508540a607bdfd90a87a560489fe372eb # v2.1.0
+  with:
+    subject-path: 'build/out/*.part-*'
+```
+
+### 5. New `.github/dependabot.yml`
+
+```yaml
+version: 2
+updates:
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule: { interval: "weekly" }
+    groups:
+      actions: { patterns: ["*"] }
+```
+
+### 6. Timeout
+
+Keep at 90min. Largest observed runs ~70min; trimming would
+false-fail Fedora-mirror-slow days. **No change.**
+
+## Q&A
+
+- **Secrets in use:** none. Only ambient `GITHUB_TOKEN`. Once public,
+  enable secret scanning + push protection (free for public repos).
+- **Pages:** not deployed from this repo. Docs site out-of-scope here.
+- **Dependency review:** only `gum` fetched out-of-band, already
+  SHA256-pinned. Add `actions/dependency-review-action` on PRs once public.
diff --git a/docs/research/2026-05-05-agent-wave/09-realhw-failure-modes.md b/docs/research/2026-05-05-agent-wave/09-realhw-failure-modes.md
new file mode 100644
index 0000000..7d2c2cb
--- /dev/null
+++ b/docs/research/2026-05-05-agent-wave/09-realhw-failure-modes.md
@@ -0,0 +1,167 @@
+# Real-hardware failure mode audit (post-v0.5.31)
+
+**Agent 9 of 9-agent wave, 2026-05-05.** Pessimistic enumeration.
+
+## A. Boot path
+
+### A1. 
Secure Boot + GRUB_DISTRIBUTOR rebrand +- shim chain itself untouched (uses `/EFI/fedora/`), but `grub2-mkconfig` + regenerates entries naming `veilor-os` while shim only trusts paths + under `/EFI/fedora/grubx64.efi`. Strict UEFI: menu boots, kernel + signatures verify via Fedora's MOK chain. Risk: `os-prober` writing + dual-boot Windows entries breaks MBR/MOK. +- **Symptom:** dual-boot with Windows shows + `Verification failed: (0x1A) Security Violation`. +- **Prob:** MED. **Fix:** S. **Target:** v0.5.32. + +### A2. KMS handoff — `fbcon=nodefer` necessary but not sufficient +- On Intel Arc/iGPU late-gen + NVIDIA proprietary chains, 5-15s blank + between vt switch and SDDM start because `simpledrm` releases before + `i915`/`nvidia-drm` claim. +- **Symptom:** ~10s blank pre-SDDM; user thinks crashed. +- **Prob:** HIGH. **Fix:** S — add `i915.modeset=1 + nvidia-drm.modeset=1 amdgpu.modeset=1`. SDDM `Type=simple` startup. + **Target:** v0.5.32. + +### A3. USBGuard hash-based rules +- `scripts/20-harden-kernel.sh:127-131` ships **empty** rules.conf + with `ImplicitPolicyTarget=block`. First boot, admin runs + `usbguard generate-policy`. Per `feedback_usbguard_dock.md`, this + writes hash+parent-hash rules that break on dock replug. +- **Symptom:** keyboard/mouse dies on first dock unplug-replug. +- **Prob:** HIGH. **Fix:** M — patch invocation to + `--with-hash=false`, or ship `veilor-usbguard-enroll` wrapper. + **Target:** v0.5.32 (same bug we already learned). + +### A4. Wifi/Bluetooth firmware +- `@hardware-support` pulls `linux-firmware` etc. +- Realtek RTL8852 / MediaTek MT7921 firmware ships in `linux-firmware-whence` only. +- **Prob:** LOW. **Fix:** S (add explicit `linux-firmware-whence`). + **Target:** v0.5.32. + +### A5. Bluetooth disabled at boot +- `scripts/20-harden-kernel.sh:111` disables `bluetooth` service. + BT keyboards/mice don't pair until user enables service. +- **Prob:** MED (laptop users). **Fix:** S — leave bluetooth.service + enabled, mask `obex` only. 
**Target:** v0.6. + +## B. First-boot KDE session + +### B1. Plasma 6 Wayland fallback on hybrid graphics +- SDDM config doesn't pin session. NVIDIA Optimus + intel-iris + triggers Wayland → silent fallback to X11 on some HW. +- **Symptom:** screen tearing, no fractional scaling. +- **Prob:** MED. **Fix:** S — add `[Autologin] Session=plasma` + + `[General] DefaultSession=plasma.desktop`. **Target:** v0.6. + +### B2. SUSPEND/RESUME KILLS WIFI — THE BIG ONE 🚨 +- `veilor-modules-lock.service` sets `kernel.modules_disabled=1` 30s + after graphical.target. `iwlwifi`, `iwlmvm`, `cfg80211` reload on + resume from S3/S0ix. With modules locked: **resume → permanent + wifi death until reboot**. Same for `nvidia` autoload, `xhci_pci` + re-init on dock attach. +- **Symptom:** close laptop lid → reopen → no wifi, no dock USB, + until reboot. +- **Prob:** VERY HIGH (every laptop user, day 1). +- **Fix:** M — gate lock on `ConditionACPower=true` + reset on + suspend, OR move from `modules_disabled` to `module.sig_enforce=1` + kernel cmdline (no runtime lock needed). +- **Target:** v0.5.32 — **BLOCKER**. + +### B3. Lid-close handling +- `logind.conf` not modified. Defaults `HandleLidSwitch=suspend`. + Combined with B2, every lid close = wifi loss. +- **Prob:** HIGH. **Fix:** S. **Target:** v0.5.32 (paired with B2). + +## C. Day-2 ops + +### C1. `/etc/default/grub` + `/etc/kernel/cmdline` drift +- Kickstart writes `GRUB_CMDLINE_LINUX_DEFAULT=""`. Real installer + writes `/etc/kernel/cmdline` with LUKS rd.luks args. `kernel-install` + reads the latter; `grub2-mkconfig` re-reads `/etc/default/grub`. +- **Symptom:** `dnf upgrade kernel` regenerates grub.cfg from + default/grub, drops LUKS unlock args from new entry → unbootable. +- **Prob:** HIGH. **Fix:** M — sync both files in `veilor-update`, + or migrate fully to BLS without grub-mkconfig. +- **Target:** v0.5.32. + +### C2. SELinux relabel on first boot +- `firstboot.sh` flips to `enforcing` and `touch /.autorelabel`. 
On + large /home (encrypted btrfs), relabel takes 2-5min — user sees + frozen screen with cursor. +- **Symptom:** first boot appears hung. +- **Prob:** MED. **Fix:** S (add plymouth message). **Target:** v0.6. + +### C3. F44 upgrade +- Hardcoded `python3.14` path (kickstart:334) for transaction_progress.py + patch. Will not survive the upgrade. +- **Prob:** certainty by Nov 2026. **Fix:** M. **Target:** v0.6. + +### C4. chrony NTS unreachable from corp networks +- Cloudflare NTS over UDP 4460 blocked by many corp firewalls. + chronyd will then stop syncing. +- **Symptom:** clock skew → TLS failures → broken everything. +- **Prob:** MED. **Fix:** S (add fallback `pool` line — already + present, verify ordering). **Target:** v0.5.32. + +## D. Networking + +### D1. firewalld drop zone vs Tailscale 🚨 +- `tailscale up` requires UDP 41641 + tailscale0 trusted. Default + `drop` zone blocks tailscale0. +- **Prob:** HIGH (this user uses Tailscale daily). +- **Fix:** S — ship `/etc/firewalld/zones/trusted.xml` with + `tailscale0` interface. +- **Target:** v0.5.32. + +### D2. systemd-resolved DoT vs corp split-DNS +- No `/etc/systemd/resolved.conf.d` entries shipped (overlay dir empty). +- Corp internal hostnames fail. +- **Prob:** LOW. **Fix:** M. **Target:** v0.7. + +## E. Hardware diversity + +### E1. NVMe vs SATA LUKS perf +- Argon2id KDF tuned to memory, not IO. +- **Prob:** cosmetic. Skip. + +### E2. ARM aarch64 +- Out of scope for v0.5/0.6. + +### E3. TPM2 unlock +- Already on roadmap. **Target:** v0.7. 
+ +## Top 10 ranked (prob × severity) + +| # | Issue | Prob | Sev | Target | +|---|-------|------|-----|--------| +| 1 | **B2 Suspend/resume wifi death** (modules_disabled) | VHIGH | CRITICAL | v0.5.32 | +| 2 | **C1 kernel-upgrade grub drift** (LUKS args lost) | HIGH | CRITICAL | v0.5.32 | +| 3 | **A3 USBGuard hash rules** (dock replug) | HIGH | HIGH | v0.5.32 | +| 4 | **D1 firewalld blocks tailscale0** | HIGH | HIGH | v0.5.32 | +| 5 | **A2 KMS blank-screen 10s** | HIGH | MED | v0.5.32 | +| 6 | **B3 Lid-close suspend** (compounds B2) | HIGH | MED | v0.5.32 | +| 7 | **A1 Secure Boot + os-prober dual-boot** | MED | HIGH | v0.6 | +| 8 | **C4 NTS blocked corp** | MED | MED | v0.5.32 | +| 9 | **B1 Plasma Wayland fallback** | MED | MED | v0.6 | +| 10 | **C3 F44 path-pinned patch** | CERTAIN | LOW (Nov) | v0.6 | + +## Top 5 to preempt in v0.5.32 + +1. **B2 modules-lock vs resume** — gate on no-pending-suspend, OR swap + to `module.sig_enforce=1` kernel cmdline. +2. **C1 cmdline drift** — make `veilor-update` fail-loud if + `/etc/kernel/cmdline` and `/etc/default/grub` diverge; regen BLS + on every kernel install. +3. **A3 USBGuard id-based rules** — `veilor-usb-enroll` wrapper that + calls `usbguard generate-policy --with-hash=false`. Same fix that + already burned us on onyx. +4. **D1 Tailscale zone** — ship `/etc/firewalld/zones/trusted.xml` + listing `tailscale0`, plus NetworkManager dispatcher to assign it. +5. **A2 KMS handoff** — append `i915.modeset=1 amdgpu.modeset=1 + nvidia-drm.modeset=1` to bootloader cmdline. + +**Critical insight:** B2 alone bricks the laptop for any user who +closes their lid. Without that fix, v0.5.32 is shippable on desktops +only. Same architectural class as the LUKS bug — security feature +breaks legitimate kernel state transitions. 
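The C1 item in the Top 5 list above (fail loud when `/etc/kernel/cmdline` and `/etc/default/grub` diverge) could be sketched roughly as follows. This is a minimal illustration, not the actual `veilor-update` logic, which this patch does not show. The function names and the `CRITICAL_PREFIXES` tuple are hypothetical; the assumption is that only `rd.luks.*` arguments are treated as boot-critical, since args such as `root=` legitimately appear in only one of the two files.

```python
import shlex
from pathlib import Path

# Assumption: only LUKS unlock args are treated as boot-critical here.
CRITICAL_PREFIXES = ("rd.luks.",)


def grub_default_args(text: str) -> set[str]:
    """Collect kernel args from every GRUB_CMDLINE_LINUX* line."""
    args: set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("GRUB_CMDLINE_LINUX"):
            continue
        _, _, value = line.partition("=")
        # shlex.split removes the shell quoting around the value.
        for chunk in shlex.split(value):
            args.update(chunk.split())
    return args


def missing_critical_args(grub_text: str, cmdline_text: str,
                          prefixes: tuple[str, ...] = CRITICAL_PREFIXES) -> list[str]:
    """Return critical args present in /etc/kernel/cmdline but absent from /etc/default/grub."""
    grub = grub_default_args(grub_text)
    cmdline = set(cmdline_text.split())
    return sorted(a for a in cmdline - grub if a.startswith(prefixes))


def check_files(grub_path: str = "/etc/default/grub",
                cmdline_path: str = "/etc/kernel/cmdline") -> None:
    """Raise SystemExit with a loud message if critical args have drifted."""
    missing = missing_critical_args(Path(grub_path).read_text(),
                                    Path(cmdline_path).read_text())
    if missing:
        raise SystemExit(f"cmdline drift, missing from {grub_path}: {' '.join(missing)}")
```

A wrapper could call `check_files()` before any kernel transaction and abort on `SystemExit`; whether the check belongs there or in a `kernel-install` plugin is a design choice this sketch leaves open.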
diff --git a/docs/research/2026-05-05-agent-wave/README.md b/docs/research/2026-05-05-agent-wave/README.md new file mode 100644 index 0000000..086830c --- /dev/null +++ b/docs/research/2026-05-05-agent-wave/README.md @@ -0,0 +1,42 @@ +# 9-agent research wave — 2026-05-05 + +Deep-dive research wave kicked off after v0.5.31 ship to surface every +plausible failure mode + future bug class before the v0.7 public flex. +Each agent took ~15 min, returned a focused report. Findings indexed +here, full reports in this directory. + +The findings already inform `docs/ROADMAP.md` (Lessons learned section ++ v0.5.32 / v0.6 / v0.7 reorder) and `docs/THREAT-MODEL.md` (drafted +by Agent 5). + +| # | Topic | File | Key finding | +|---|---|---|---| +| 1 | Plymouth + LUKS real-hardware edge cases | [01-plymouth-luks-real-hardware.md](01-plymouth-luks-real-hardware.md) | Initramfs keymap missing breaks non-US users at LUKS prompt | +| 2 | SDDM + first-boot UX failure modes | [02-sddm-firstboot-ux.md](02-sddm-firstboot-ux.md) | `veilor-firstboot.service` `WantedBy=multi-user.target` only — silently doesn't run on real installs (graphical target) | +| 3 | bootc-image-builder spike plan | [03-bootc-spike-plan.md](03-bootc-spike-plan.md) | Full Containerfile draft + 1-week timebox; v0.7 schedule | +| 4 | Hardening tier 2 (AppArmor + nftables + audit + homed) | [04-hardening-tier-2.md](04-hardening-tier-2.md) | nftables + audit log shipping = S effort each, ship in v0.5.32 | +| 5 | Threat model + public launch prep | [05-threat-model-launch.md](05-threat-model-launch.md) | Drafted at `docs/THREAT-MODEL.md`. Honest in/out scope tables | +| 6 | Anaconda log virtio-serial silent fix | [06-anaconda-log-capture.md](06-anaconda-log-capture.md) | virtio-serial requires rsyslog (not in our live ISO). 
Switch to virtio-9p host-share with EXIT trap copy | +| 7 | KDE theme + DuckSans + /etc/skel branding | [07-kde-skel-branding.md](07-kde-skel-branding.md) | `/etc/skel/` doesn't exist; branding evaporates the moment user opens System Settings | +| 8 | Build-iso CI hardening | [08-ci-hardening.md](08-ci-hardening.md) | Pin actions to SHA, dependabot, SBOM, SLSA L3 attestation — all S effort | +| 9 | Real-hardware failure mode audit | [09-realhw-failure-modes.md](09-realhw-failure-modes.md) | **CRITICAL: `kernel.modules_disabled=1` kills wifi on suspend/resume.** Top blocker for v0.5.32 | + +## Top blockers for next ship (v0.5.32) + +Cross-referenced by severity × probability: + +1. **Suspend/resume wifi death** (Agent 9) — every laptop bricks on lid-close +2. **veilor-firstboot.service lacks WantedBy=graphical.target** (Agent 2) — login broken on real installs +3. **kernel-upgrade grub drift** (Agent 9) — first `dnf upgrade kernel` = unbootable +4. **USBGuard hash-rules problem** (Agent 9, mirrors `feedback_usbguard_dock.md`) +5. **firewalld blocks tailscale0** (Agent 9) — user uses tailscale daily +6. **/etc/skel/ empty → no per-user branding** (Agent 7) +7. **virtio-9p log capture** (Agent 6) — replaces broken virtio-serial path + +## Research wave protocol + +This wave validated the `wave + verifier` pattern from v0.5.31 fix +(per ROADMAP lessons learned #4). Multi-agent debug only produces +signal when one agent's findings are checked against another's; +9 parallel agents on distinct topics gave independent angles that +converged on the v0.5.32 blocker list above.