veilor-os/docs/ROADMAP.md
veilor-org 21c0bbd120 docs: refine strategy — ostreecontainer install + mesh stack + browser stack
Refines docs/STRATEGY.md per parent-operator handoff (2026-05-05).
Locks in five things the original draft didn't cover, and corrects
one mistake.

## Refinement: ostreecontainer install path

The original draft proposed a two-step install: Anaconda partitions
+ kickstart, then on first boot a `veilor-firstboot-rebase.service`
runs `bootc rebase ghcr.io/veilor/veilor-os:43`. This commit drops
that step.

Anaconda's `ostreecontainer --url=... --transport=registry`
directive populates the root filesystem directly from the OCI image
during the install pass. No first-boot rebase, no transition
window, no second reboot. Same end state, simpler path.

Stay on `ostreecontainer` through v0.8. Do NOT migrate to the new
`bootc` kickstart command until v1.0 — it blocks multi-disk and
authenticated registries. Do NOT use `bootc-image-builder
anaconda-iso` output — deprecated in image-builder v44+. Produce
the OCI image and the bootstrap ISO as separate artifacts.

This compresses the v0.7 BlueBuild spike from 2 days → 1 day.

## Correction: keep Trivalent as default

The original strategy.md treated Trivalent (secureblue's hardened
Chromium) as an override-and-remove. That was wrong: Trivalent's
COPR tracks upstream M147+ within hours, ships hardened_malloc +
JIT-less + Drumbrake WASM. Default browser pick.

Mullvad Browser layered alongside for anti-fingerprint. Thorium
remains opt-in via `ujust install-thorium` only — its CVE lag is
months and contradicts the threat model. Never default.

## Mesh stack baked in

Three-layer warm-stack documented in STRATEGY.md:
- L3a Tailscale + Headscale (Day 1, daily driver)
- L3b Yggdrasil-go (Day 1, idle warm-fallback, AllowedPublicKeys mode)
- L3c Reticulum/RetiNet AGPL fork (opt-in via ujust install-reticulum)

Threat floor table: ISP-DNS-block (i, Day 1), ISP-Tailscale-block
(ii, Phase 2 promote Yggdrasil), internet-down (iii, opt-in RetiNet
+ RNode).

Tier model: tag:admin / tag:infra / tag:guest with failsafe pre-auth
key on yubikey + paper + Authentik OIDC group.

## Onboarding

Token paste / QR (user picks). Misskey signup mints reusable
24h-TTL pre-auth key. NOT auto-OIDC at first boot.

## Iroh seeding daemon stub (v0.8 / Phase 2)

`veilor-seed.service` documented but NOT implemented until Iroh hits
1.0 (current 0.96–0.98 RC, Q1 2026 target slipped). BLAKE3 +
iroh-gossip per-service topic. Static media only — DEFER DB
replication forever.

## External dependency tracked

nullstone Traefik `no-guest@file` ACL is currently 0.0.0.0/0
allow-all (XFF chain breakage 2026-05-03). Must be fixed before
veilor-os first-public-ISO ships, otherwise tag:guest provisioning
leaks the full vhost surface to every veilor user. Parent operator
owns the fix; explicitly out of veilor-os scope.

## Files

- docs/STRATEGY.md — full refinement
- docs/ROADMAP.md — v0.7 spike entry now reflects ostreecontainer
  + mesh stack + 1-day spike target
- README.md — drops the "v0.2.5 pre-release" badge + status box
  (out of date), adds bootc/atomic trajectory paragraph

## What did NOT change

- v0.5.x main branch is untouched. The ostreecontainer swap belongs
  in the v0.7 spike branch, NOT v0.5.32.
- nullstone Traefik config is untouched. Out of scope.
- The kickstart and overlay code is untouched.
2026-05-05 15:15:52 +01:00


Roadmap

Versioned roadmap for veilor-os. Targets are intentionally short and testable. No fluff. Items in earlier versions are blockers for later ones unless explicitly noted.

For the historical record of what landed in each release, see ../CHANGELOG.md.


Lessons learned through v0.5.x install grind

Five things v0.5.27–31 changed about how we plan:

  1. Anaconda + RPM-6.0 + --cmdline is brittle — three install failures, kernel cmdline written to four places before one worked. --location=none skips CollectKernelArgumentsTask, kernel-install reads /etc/kernel/cmdline not /proc/cmdline, and transaction_progress.py masks real failures if patched too broadly. Justifies promoting the bootc-image-builder spike to v0.7.
  2. Test procedure must gate every tag — v0.5.27 only surfaced four bugs in one VM run because the run walked every step in order. test/TESTING.md and test/test-runs/ are now load-bearing.
  3. Real hardware is not optional — VM catches install logic, not KMS / fbcon / firmware. Spare laptop + friend's laptop must run pre-tag, every time.
  4. Multi-agent debug waves work, but only with a verifier — the v0.5.31 four-bug fix came from a 4-agent verification wave on v0.5.30 outcome. Wave + verifier = signal; wave alone = noise.
  5. "We ask once, with sane defaults" is the distro UX — every v0.5 install bug we shipped a workaround for (locale, hostname, USBGuard policy, drivers) is something veilor-postinstall could ask the user about cleanly on first boot. That promotes veilor-postinstall from v0.6 background item to flagship.

v0.2 — green ISO + base hardening (DONE)

Reproducible CI build pipeline. UEFI+BIOS bootable live ISO from a single kickstart. Single-prompt LUKS install. First-boot admin password flow. Full overlay applied (sysctl, sshd, sudoers, tuned profiles, KDE black theme, Fira Code, branded /etc/os-release). SELinux enforcing. firewalld drop zone. fail2ban + auditd + USBGuard active. The build chased five real bugs (DEST hardcoded, set -eu killing cp, os-release symlink, missing admin user, LABEL= vs CDLABEL= in livecd-tools) before greening.

Released v0.2.5 on 2026-05-01. CI on every push to main.


v0.5.27–v0.5.31 — install path stabilisation (DONE)

The bridge between v0.2 (greens at all) and v0.3 (looks polished). All install-path bugs surfaced by the formal hybrid-VM test procedure (test/TESTING.md). Five releases, ~hours of debug, three install failures before greening.

  • v0.5.27 (DONE) — rd.luks.uuid via grubby --update-kernel=ALL, GRUB rebrand, fbcon=nodefer, ASCII gum cursor.
  • v0.5.28 (DONE) — locale locked en_US.UTF-8, dropped updates repo, patched anaconda transaction_progress.py to silence Configuring xxx.x86_64 scroll, excluded man-db.
  • v0.5.29 (DONE) — narrowed anaconda patch (was masking real failures), LUKS UX, initramfs assertion. Five-fix bundle from 7-agent research wave.
  • v0.5.30 (DONE) — broad error suppression, manual bootloader path, virtio log capture for post-mortem.
  • v0.5.31 (DONE) — --location=none was making anaconda skip CollectKernelArgumentsTask; kernel-install reads /etc/kernel/cmdline as source of truth, veilor never wrote it, so BLS entries shipped with empty cmdline. Three-path write (/etc/kernel/cmdline + /etc/default/grub + grubby) plus explicit kernel-install add.
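
The v0.5.31 failure mode (BLS entries written with an empty cmdline) is cheap to assert after an install. A minimal sketch in Python, assuming entries are `*.conf` files carrying an `options` line as in the Boot Loader Specification; the function names and the default `rd.luks.uuid=` prefix are illustrative and not part of veilor's tooling:

```python
from pathlib import Path


def entry_options(text: str) -> str:
    """Return the joined value of every `options` line in one BLS entry."""
    parts = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        # A bare `options` key with no value yields one field and is skipped.
        if len(fields) == 2 and fields[0] == "options":
            parts.append(fields[1].strip())
    return " ".join(parts)


def entries_missing_arg(entries_dir, required_prefix="rd.luks.uuid="):
    """List entry files whose kernel cmdline lacks an argument with this prefix."""
    missing = []
    for conf in sorted(Path(entries_dir).glob("*.conf")):
        args = entry_options(conf.read_text()).split()
        if not any(arg.startswith(required_prefix) for arg in args):
            missing.append(conf.name)
    return missing
```

Pointed at `/boot/loader/entries` on an installed system (path assumed, Fedora's usual location), `entries_missing_arg` should return an empty list; any name it returns is an entry that would boot without the LUKS argument. The sketch does not expand GRUB variables such as `$kernelopts`.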

v0.5.32 — next ship (active)

Outstanding from the grind, immediate priority for the next tag:

  • End-to-end VM green run — v0.5.31 lands the kernel-cmdline fix but no full hybrid-VM pass has signed it off. Run the procedure in test/TESTING.md to install + reboot + login, file the report in test/test-runs/, then tag.
  • Real-hardware run on the spare laptop — a VM pass is necessary but not sufficient. The friend's laptop is the outside-tester run; the spare is our own. KMS, fbcon, USB controller, and real-firmware Secure Boot issues only show up here.
  • gum input render glitch — duplicate "Install" and a stray T in password fields on the Linux fbcon. Replace gum input --password with bash read -srp; cosmetic only, but visible on every install.

v0.3 — UX polish (in progress)

The visible polish layer that v0.2 deferred for build velocity.

  • Plymouth black theme — boot splash matching the desktop. No Fedora drum, no white flash. assets/plymouth/veilor/.
  • SDDM theme — black login background, single-user prompt with admin pre-filled, no userlist.
  • Konsole profile — black background, Fira Code, transparent panel off (no compositor cost on resume).
  • Wallpaper SVG — flat black with subtle veilor wordmark, 1080p + 4K + ultrawide variants.
  • Re-enable memory hygiene on installed system. v0.2.5 stripped init_on_alloc=1 init_on_free=1 from the live cmdline because they 5x'd KVM boot time. Re-add post-install via veilor-firstboot so the installed system gets the protection without the ISO penalty.
  • USBGuard auto-snapshot on first boot. Currently the operator runs usbguard generate-policy manually. v0.3 wires this into veilor-firstboot after the password step (with a clear "plug in trusted devices first" prompt).

Target: this month. None of it is a kickstart change — pure overlay work.


v0.4 — distribution + signing

Get veilor-os to a state where the ISO is downloadable, verifiable, and trusted by Secure Boot without user shenanigans.

  • GPG-signed releases. Tag → CI builds → CI signs ISO + sha256 with veilor.org release key → GitHub Release artifact carries .iso.asc.
  • Reproducible builds. Pin Fedora compose ID, lock package versions via dnf snapshot or equivalent, document how to verify two builds match.
  • Own MOK (Machine Owner Key) + sbsign for Secure Boot. Currently veilor-os relies on Fedora's signed shim chain. v0.4 ships our own MOK, signs the kernel + initramfs at build time, optionally enrols the MOK on first boot for users who want a cleaner trust path.
  • ISO download mirror — static download page on veilor.org with current + previous release, sha256, gpg signature. Not an RPM mirror — veilor-os does not ship its own packages, only the spin configuration.
  • Release process documented — tagging, CI, signing, mirror sync in docs/RELEASE.md.
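
The checksum half of the release and reproducibility items can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, and signature checking is left to `gpg --verify` against the `.iso.asc` file rather than reimplemented here.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024) -> str:
    """Stream a file through SHA-256 so a multi-GB ISO never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex: str) -> bool:
    """Compare a downloaded ISO against the published sha256 string."""
    return sha256_of(path) == published_hex.strip().lower()


def builds_match(iso_a, iso_b) -> bool:
    """Two builds are bit-for-bit reproducible only if their digests are equal."""
    return sha256_of(iso_a) == sha256_of(iso_b)
```

`builds_match` is the strictest possible definition of "two builds match"; if timestamps or build IDs make bit-identical ISOs impractical, the documented verification procedure would need a looser comparison.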

v0.5 — hardening tier 2

Hardening that builds on v0.2's foundation. Each item is opt-in unless specified — defaults stay sane for a daily driver.

  • AppArmor profiles in addition to SELinux. Stack-not-replace. Targeted at the browser, the mail client, and anything that touches attacker-controlled data. SELinux remains the primary MAC.
  • systemd-homed — encrypted-per-user ~, suspend-aware, key unlocked at PAM login. Optional, opt-in via post-install helper.
  • nftables ruleset alongside firewalld defaults. Default firewalld policy stays drop; nftables provides advanced filtering for users who want it.
  • Audit log shipping — opt-in auditd -> remote syslog over TLS, for users running a central log aggregator.
  • Installer kickstart split — separate veilor-os-install.ks for installer ISO (real LUKS partitioning, not the live-rootfs simplification used in v0.2). Lets users install veilor-os as the primary OS without going through the live boot first.
  • Audit baseline — re-run the security audit (template in security/audit-template.md) and target a lower risk score than v0.2.

v0.6 — ergonomics (PROMOTED — install grind proved we need this)

Smooth the operator experience so day-to-day work doesn't fight the hardening. veilor-postinstall and veilor-doctor were v0.6 background items; they were promoted to headline features after v0.5.27–31 made it clear that "we ask once, with sane defaults" is what separates a distro from a kickstart.

  • veilor-postinstall (PROMOTED — flagship of v0.6) — first-login welcome menu, EndeavourOS-style but cleaner. Single TUI screen: keyboard layout, locale (deferred from install per v0.5.28), hostname override, package presets (dev / media / homelab), drivers (NVIDIA / Intel / AMD), Bluetooth opt-in, USBGuard snapshot, audit baseline run, veilor-doctor first run. Each step skippable, runs once on first SDDM login, self-deletes the autostart after. This is the only UX feature that ships in v0.6 day one — everything else builds on it.
  • veilor-doctor (PROMOTED — user-facing, not just dev tool) — the post-install audit. Walks getenforce, mokutil --sb-state, firewall-cmd, fail2ban, USBGuard policy, sysctl drift, and reports drift from baseline. Runs from veilor-postinstall on day one, then weekly via systemd --user timer. Plain-English output ("your firewall is OK", "USBGuard policy has 3 unknown devices"); not a JSON dump. Stretch: machine-readable mode for veilor-server later.
  • veilor-update — wraps dnf upgrade AND flatpak update in one command. Per feedback_system_update.md, partial-update is a recurring trap; veilor's update tool covers both by default. Adds pre-check (snapshot available?), auditd pause, post-update SELinux validation.
  • Opt-in installer ISO — flip from live-only to live + installer, user picks at boot menu. Installer uses the v0.5 kickstart with full LUKS + btrfs subvols + zram.
  • First-boot UX — replace TTY password prompt with a small Plymouth-rendered dialog. Less raw.
  • Bluetooth opt-in helper — single command to enable + bring up the daemon + add the user to the right group.
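
One slice of veilor-doctor, the sysctl drift check with plain-English output, might look like the sketch below. The baseline keys and values are hypothetical placeholders rather than veilor's actual overlay, and the function names are invented for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical baseline: illustrative keys and values only.
BASELINE: Dict[str, str] = {
    "kernel.kptr_restrict": "2",
    "kernel.dmesg_restrict": "1",
}


def read_sysctl(key: str) -> str:
    """Read a live value from /proc/sys (dots in the key become path separators)."""
    return Path("/proc/sys", *key.split(".")).read_text().strip()


def sysctl_drift(baseline: Dict[str, str],
                 reader: Callable[[str], str] = read_sysctl) -> List[str]:
    """Return one human-readable line per key that differs from the baseline."""
    report = []
    for key, expected in sorted(baseline.items()):
        try:
            actual = reader(key)
        except OSError:
            report.append(f"{key} could not be read")
            continue
        if actual != expected:
            report.append(f"{key} is {actual}, baseline expects {expected}")
    return report


def summarize(report: List[str]) -> str:
    """Collapse the drift report into the one-line verdict shown to the user."""
    if not report:
        return "your sysctl settings are OK"
    return f"{len(report)} sysctl setting(s) drifted from baseline"
```

Passing the reader in as a parameter keeps the check testable without root and leaves room for a later machine-readable mode to reuse `sysctl_drift` and skip `summarize`.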

v0.7 — public flex + bootc spike

Take veilor-os out of "private repo, contained audience" mode. Order matters: people demand threat model FIRST when a security distro goes public, benchmarks come after.

  1. Threat model published (FIRST — gating item) — what veilor-os defends against, what it does not. Honest scope. No claim of anti-state-actor; concrete on lost-laptop, USB-attack, browser compromise, supply-chain. Reviewers will demand this before reading anything else.
  2. Public docs site — Hugo or mdBook on veilor.org, generated from docs/. Single source of truth.
  3. Repo public — flip GitHub visibility, announce.
  4. Comparison + benchmarks — published numbers vs stock Fedora KDE on cold boot, idle RAM, idle network egress, suspend/resume time. After threat model, not before.
  5. Press kit — wallpapers, logo, screenshots, feature one-liner.

Hybrid bootc spike — layer on secureblue, install via ostreecontainer (REVISED 2026-05-05)

The original v0.7 entry called for a Containerfile-from-scratch spike on quay.io/fedora/fedora-bootc:43. Research on 2026-05-05 (see docs/STRATEGY.md and docs/research/2026-05-05-agent-wave/), then a parent-operator refinement same day, locked the path: layer veilor's branding + threat model + UX on top of secureblue's already-shipping securecore-kinoite-hardened-userns OCI image via a BlueBuild recipe, and install it directly during the Anaconda pass via the ostreecontainer kickstart directive (no first-boot rebase).

Reasoning:

  • secureblue has 30 active contributors, 940 stars, 56 commits in the last 5 weeks. They've already implemented the hardening surface we'd need to build alone (sysctl + kargs + SELinux custom policy + USBGuard + hardened-malloc + Unbound DoT + cosign-signed OCI build pipeline).
  • Containerfile-from-scratch spike: 1 week to first ISO. BlueBuild recipe extending secureblue: ~2 days. With the ostreecontainer swap (no veilor-firstboot-rebase.service, no transition window): ~1 day.
  • secureblue does NOT publish a threat model. Athena OS does (their main differentiator, only public threat model in hardened-Linux 2026). Our docs/THREAT-MODEL.md (drafted) gets us ahead of both on the one axis that matters most for a security-branded distro.

Hybrid path locked:

  • Kickstart ISO stays as the bootstrap installer (Anaconda's LUKS UX is mature).
  • %packages is replaced with ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry so the install pass populates / directly from the OCI image — no first-boot rebase, no second reboot.
  • From boot one onward, bootc upgrade is the update channel.
  • v1.0 deprecates the kickstart entirely.
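
The `%packages` swap described above is a small text transformation on the kickstart. A hedged sketch, assuming a single `%packages … %end` section; the helper name is hypothetical, and only the `--url` and `--transport` options named in this roadmap are emitted:

```python
import re

# Matches the first %packages section through its closing %end line.
PACKAGES_SECTION = re.compile(
    r"^%packages\b.*?^%end[ \t]*\n?", re.DOTALL | re.MULTILINE
)


def swap_packages_for_ostreecontainer(kickstart: str, url: str,
                                      transport: str = "registry") -> str:
    """Replace the %packages section with an ostreecontainer directive."""
    directive = f"ostreecontainer --url={url} --transport={transport}\n"
    # A callable replacement avoids backslash processing of the directive text.
    new_text, count = PACKAGES_SECTION.subn(lambda _m: directive, kickstart, count=1)
    if count == 0:
        raise ValueError("no %packages section found")
    return new_text
```

Called with `ghcr.io/veilor/veilor-os:43`, this yields the directive quoted in the list above and leaves partitioning, LUKS, and `%post` sections untouched, which matches the intent of keeping Anaconda as the bootstrap installer.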

Stay on ostreecontainer through v0.8. Do NOT migrate to the new bootc kickstart command until v1.0 — it blocks multi-disk and authenticated registries (likely needed eventually). Do NOT use bootc-image-builder anaconda-iso output — deprecated in image-builder v44+. Produce OCI image and bootstrap ISO as separate artifacts.

Overrides over secureblue: keep Trivalent as default (their COPR tracks upstream M147+ within hours; reverses earlier draft that treated it as override-and-remove); add Mullvad Browser alongside; gate Thorium behind ujust install-thorium with CVE-lag warning; restore sudo (revert run0-only); re-enable Xwayland.

Mesh stack baked in: Tailscale (Day 1, daily driver), Yggdrasil-go (Day 1, idle warm-fallback), Reticulum/RetiNet AGPL fork (opt-in via ujust install-reticulum). See docs/STRATEGY.md mesh stack section for the layer breakdown and threat-floor table.

Full plan: docs/STRATEGY.md. Spike will land in bluebuild/recipe.yml plus .github/workflows/build-bluebuild.yml, on a separate branch — does NOT land in v0.5.x main.

External dependency tracked: Traefik no-guest@file ACL on nullstone is currently a 0.0.0.0/0 allow-all stub. Must be fixed before veilor-os first-public-ISO ships, otherwise tag:guest provisioning leaks the full vhost surface to every veilor user. Parent operator owns the fix; not in veilor-os scope.
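
Because the fix lives outside this repo, a release gate on the veilor side can only check the outcome. A minimal sketch, assuming the ACL is expressed as a list of CIDR source ranges (the roadmap does not specify the middleware's exact shape); the function name and the example ranges are illustrative:

```python
import ipaddress


def allow_all_ranges(source_ranges):
    """Return the entries that match every address (prefix length 0)."""
    flagged = []
    for entry in source_ranges:
        network = ipaddress.ip_network(entry, strict=False)
        if network.prefixlen == 0:
            flagged.append(entry)
    return flagged
```

A release checklist step could fail the first-public-ISO tag whenever `allow_all_ranges` returns anything for the ranges the parent operator reports for `no-guest@file`.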


v1.0 — production

The line where veilor-os is recommended for a non-author user as a daily driver.

  • Multi-arch. x86_64 today; v1.0 ships aarch64 ISO too (laptops on ARM are real now). Build matrix in CI.
  • LTS commitment — major versions tied to Fedora's release cadence, patch releases for security only, documented EOL per major.
  • Recovery ISO — minimal rescue image with veilor tools (LUKS unlock, btrfs scrub, sysctl reset, fail2ban unban) for "I cannot log in to my system" days.
  • TPM2 integration — sealed LUKS unlock against TPM2 PCRs (opt-in, default stays password). Ships as helper script, not silent default.
  • Signed update channel — beyond GPG-signed ISOs, a signed metadata repo so veilor-doctor can detect available updates without trusting Fedora's mirrorlists alone.

Stretch goals — not on the v0.x → v1.0 critical path

These are spin variants that share veilor-os DNA but need their own kickstart or build tool.

  • veilor-server — no KDE, no GUI, hardened headless Fedora for homelab / VPS (e.g. nullstone). Same overlay, different package set. Not blocked, but waits on veilor-doctor machine-readable mode (v0.6) so headless installs have a way to report drift without a TUI.
  • veilor-kiosk — single-app Plasma session, locked-down user, read-only root. Not blocked.
  • veilor-atomic — rpm-ostree / bootc-image-builder rebase. Status now depends on the v0.7 bootc spike: if the spike shows bootc fixes the anaconda-grind class of bugs, veilor-atomic becomes the v1.0+ mainline rather than a stretch variant. If not, it stays a parallel track.