veilor-org 7060d9aa6b docs: refine strategy — ostreecontainer install + mesh stack + browser stack
Refines docs/STRATEGY.md per parent-operator handoff (2026-05-05).
Locks in five things the original draft didn't cover, and corrects
one mistake.

## Refinement: ostreecontainer install path

The original draft proposed a two-step install: Anaconda partitions
+ kickstart, then on first boot a `veilor-firstboot-rebase.service`
runs `bootc rebase ghcr.io/veilor/veilor-os:43`. This commit drops
that step.

Anaconda's `ostreecontainer --url=... --transport=registry`
directive populates the root filesystem directly from the OCI image
during the install pass. No first-boot rebase, no transition
window, no second reboot. Same end state, simpler path.

Stay on `ostreecontainer` through v0.8. Do NOT migrate to the new
`bootc` kickstart command until v1.0: it does not yet support multi-disk
installs or authenticated registries. Do NOT use `bootc-image-builder
anaconda-iso` output, which is deprecated in image-builder v44+. Produce
the OCI image and the bootstrap ISO as separate artifacts.

This compresses the v0.7 BlueBuild spike from 2 days → 1 day.

## Correction: keep Trivalent as default

The original strategy.md treated Trivalent (secureblue's hardened
Chromium) as an override-and-remove. That was wrong: Trivalent's
COPR tracks upstream M147+ within hours, ships hardened_malloc +
JIT-less + Drumbrake WASM. Default browser pick.

Mullvad Browser layered alongside for anti-fingerprint. Thorium
remains opt-in via `ujust install-thorium` only — its CVE lag is
months and contradicts the threat model. Never default.

## Mesh stack baked in

Three-layer warm-stack documented in STRATEGY.md:
- L3a Tailscale + Headscale (Day 1, daily driver)
- L3b Yggdrasil-go (Day 1, idle warm-fallback, AllowedPublicKeys mode)
- L3c Reticulum/RetiNet AGPL fork (opt-in via ujust install-reticulum)

Threat floor table: ISP-DNS-block (i, Day 1), ISP-Tailscale-block
(ii, Phase 2 promote Yggdrasil), internet-down (iii, opt-in RetiNet
+ RNode).

Tier model: tag:admin / tag:infra / tag:guest with failsafe pre-auth
key on yubikey + paper + Authentik OIDC group.

## Onboarding

Token paste / QR (user picks). Misskey signup mints reusable
24h-TTL pre-auth key. NOT auto-OIDC at first boot.

## Iroh seeding daemon stub (v0.8 / Phase 2)

`veilor-seed.service` documented but NOT implemented until Iroh hits
1.0 (current 0.96–0.98 RC, Q1 2026 target slipped). BLAKE3 +
iroh-gossip per-service topic. Static media only — DEFER DB
replication forever.

## External dependency tracked

nullstone Traefik `no-guest@file` ACL is currently 0.0.0.0/0
allow-all (XFF chain breakage 2026-05-03). Must be fixed before
veilor-os first-public-ISO ships, otherwise tag:guest provisioning
leaks the full vhost surface to every veilor user. Parent operator
owns the fix; explicitly out of veilor-os scope.

## Files

- docs/STRATEGY.md — full refinement
- docs/ROADMAP.md — v0.7 spike entry now reflects ostreecontainer
  + mesh stack + 1-day spike target
- README.md — drops the "v0.2.5 pre-release" badge + status box
  (out of date), adds bootc/atomic trajectory paragraph

## What did NOT change

- v0.5.x main branch is untouched. The ostreecontainer swap belongs
  in the v0.7 spike branch, NOT v0.5.32.
- nullstone Traefik config is untouched. Out of scope.
- The kickstart and overlay code is untouched.
2026-05-05 15:15:52 +01:00


veilor-os Strategy — Hybrid kickstart bootstrap + bootc OCI

Decision date: 2026-05-05 (refined the same day from the parent-operator handoff; locks the ostreecontainer install path, mesh stack bake-in, browser stack, Iroh seeding roadmap, and threat floor table).

Locked at: v0.5.31 → v0.7 spike → v1.0

TL;DR

  • Keep the Anaconda-driven kickstart ISO as the bootstrap installer (LUKS UX is mature, single passphrase prompt, custom partitioning works).
  • Anaconda's ostreecontainer directive populates the root filesystem directly from a veilor-os OCI image (built via BlueBuild on top of secureblue's securecore-kinoite-hardened-userns) during the install pass — no first-boot rebase, no mutable→atomic transition.
  • All future updates flow through bootc upgrade — atomic A/B, instant rollback, cosign-signed.
  • The kickstart-driven mutable-root path is deprecated at v1.0; kept alive as fallback through v0.7.

Why hybrid, not pure pivot

A pure pivot to bootc-from-scratch (Agent 3's spike plan) was estimated at 1 week to first ISO. A pure pivot to layering on secureblue is estimated at 2 days to first ISO, because the hardening work is already done. The ostreecontainer refinement compresses that to 1 day by eliminating the first-boot rebase choreography (no veilor-firstboot-rebase.service, no second reboot, no transition window where the system is half-mutable, half-atomic).

Both pure-pivot paths require throwing away the partitioning UX we already have working in Anaconda. Hybrid keeps it.

Hybrid:

  • Day-zero install: Anaconda kickstart + custom partitioning + LUKS prompt (what we have today). User experience = unchanged.
  • End of install pass: ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry populates / from the OCI image. Transition is invisible.
  • First boot: veilor OCI tree, no rebase, no special service.
  • Day-2: bootc upgrade cadence for everything from then on.

We keep what works, pivot the part that doesn't.

ostreecontainer directive (refinement, locked)

Replace the %packages block in the install kickstart with:

ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry

Keep the existing part/LUKS encryption block verbatim — Anaconda partitions before ostreecontainer populates root.
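The following minimal sketch shows the ordering of the install.ks body under this scheme. It is written in Python so the ordering can be checked. Partitioning comes first, then the ostreecontainer line, with no %packages section at all. The partitioning lines are hypothetical placeholders, not veilor's real part/LUKS block, which stays verbatim. The image URL and transport are the ones locked above.

```python
# Sketch: render the install.ks body for the ostreecontainer install path.
# HYPOTHETICAL: the PARTITIONING lines are placeholders; the real veilor
# part/LUKS block is copied verbatim from the existing kickstart.

IMAGE_URL = "ghcr.io/veilor/veilor-os:43"

# Anaconda executes partitioning before ostreecontainer populates root.
# `--encrypted` without `--passphrase` makes an interactive install stop
# and prompt for the LUKS passphrase (the single prompt users already see).
PARTITIONING = [
    "clearpart --all --initlabel --disklabel=gpt",
    "reqpart --add-boot",
    "part / --fstype=xfs --grow --encrypted --luks-version=luks2",
]


def render_install_ks(image_url: str = IMAGE_URL) -> str:
    """Return kickstart text: partitioning, then ostreecontainer.

    There is deliberately no %packages section: the OCI image supplies
    the whole root filesystem.
    """
    lines = list(PARTITIONING)
    lines.append(f"ostreecontainer --url={image_url} --transport=registry")
    return "\n".join(lines) + "\n"


print(render_install_ks(), end="")
```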

Stay on ostreecontainer through v0.8. Do NOT migrate to the new bootc kickstart command until v1.0: that command does not yet support multi-disk installs or authenticated registries, and we will likely need both.

Do NOT use bootc-image-builder anaconda-iso output — deprecated in image-builder v44+. Produce the OCI image and the bootstrap ISO as separate artifacts:

  • OCI image: BlueBuild recipe → cosign-signed image at ghcr.io/veilor/veilor-os:43
  • Bootstrap ISO: Anaconda kickstart with ostreecontainer directive pointing at the OCI image

Reference: https://docs.fedoraproject.org/en-US/bootc/; pykickstart docs for ostreecontainer.

Why secureblue underneath

| Question | Answer |
|---|---|
| Maintainers | secureblue: 30 contributors, 56 commits/5wks. veilor-os: solo. |
| Hardening surface | secureblue ships sysctl + kargs + SELinux + USBGuard + hardened-malloc + DoT, far more than we'd build alone. |
| Build pipeline | BlueBuild → cosign-signed OCI in GH Actions (build-all.yml, trivy.yml). |
| Update model | bootc upgrade with A/B + instant rollback + signed image chain. |
| Variants | kinoite-hardened-userns is the KDE+Wayland+SELinux variant we'd want. |
| License | Apache-2.0 (compatible with our MIT). |

What we override in our recipe:

  • run0 instead of sudo: revert. Breaks too many workflows.
  • Xwayland disabled: revert. Some apps still need it.
  • Veilor branding: theme, KDE color scheme, Plymouth, SDDM, font, os-release. All overlay/* ports verbatim from current repo.

(Browser stack is its own section below — Trivalent is now a kept default, not an override.)

Browser stack

| Role | Pick | Source |
|---|---|---|
| Default browser | Trivalent (secureblue's hardened Chromium) | Fedora COPR secureblue/trivalent: tracks upstream M147+ within hours, ships hardened_malloc + JIT-less + Drumbrake WASM |
| Anti-fingerprint companion | Mullvad Browser | Clearnet, no Tor, layered alongside Trivalent for pseudonymous browsing |
| Opt-in only | Thorium | ujust install-thorium only; WARN users of months-long CVE lag (LTS Chromium base, ~9 milestones behind upstream stable as of 2026-05) |

DO NOT default to Thorium under any circumstances: it contradicts the threat model. Trivalent's COPR keeps our patch latency within hours of upstream. Thorium is multi-month-stale and is a perf/media profile choice, not a security choice.

The earlier draft of this doc treated Trivalent as an override-and-remove. That was wrong: Trivalent is exactly the level of hardening we want for a default browser. Keep it. Add Mullvad alongside. Move Thorium behind an explicit opt-in.

Mesh stack — three-layer warm-stack

Day 1 ships layers 1 (Tailscale) and 2 (Yggdrasil idle). Layer 3 (Reticulum) is opt-in via ujust.

Layer 1 — Tailscale + Headscale (daily driver)

  • Headscale already runs on nullstone at hs.s8n.ru, with OIDC via Authentik.
  • Veilor OS ships tailscale-1.94.2+ from official Fedora repo.
  • Service unit pre-disabled at install time.
  • First-boot prompt: "join Veilor mesh? [paste / QR]". On accept: tailscale up --login-server=https://hs.s8n.ru with the user's pre-auth key.
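The following minimal sketch shows the first-boot "join Veilor mesh?" handler. It assumes the pasted or scanned key reaches `tailscale up` through its `--authkey` flag. The function names and the `run` switch are hypothetical. The login server is the one given above. The daemon unit shipped by the tailscale package is `tailscaled.service`.

```python
# Sketch: first-boot "join Veilor mesh?" handler for Layer 1.
# HYPOTHETICAL: function names and the `run` switch; only --login-server
# and --authkey are tailscale's own flags.
import subprocess

LOGIN_SERVER = "https://hs.s8n.ru"  # Headscale on nullstone


def tailscale_up_argv(preauth_key: str) -> list[str]:
    """Build the `tailscale up` command for the user's pre-auth key."""
    key = preauth_key.strip()
    if not key:
        raise ValueError("empty pre-auth key")
    return ["tailscale", "up", f"--login-server={LOGIN_SERVER}", f"--authkey={key}"]


def join_mesh(preauth_key: str, run: bool = False) -> list[list[str]]:
    """Enable the pre-disabled daemon, then join; only executes if run=True."""
    cmds = [
        ["systemctl", "enable", "--now", "tailscaled.service"],
        tailscale_up_argv(preauth_key),
    ]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return cmds
```

By default `run` is False, so the handler only returns the command list. This keeps it testable and lets the UI show what will execute.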

Layer 2 — Yggdrasil-go (warm fallback, idle by default)

  • yggdrasil-go 0.5.13+ from COPR / dnf.
  • Decentralized IPv6 in 200::/7.
  • systemd unit enabled but config = empty Listen[], one Public peer (e.g. vpn.itrus.su or another EU peer), AllowedPublicKeys allowlist mode (no allow-all).
  • WSS:443 transport for ISP DPI evasion.
  • Generates its Ed25519 keypair on first boot via systemd-tmpfiles or a firstboot script.
  • Survives ISP-level Tailscale block (threat floor (ii)).
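The following minimal sketch shows the idle Layer 2 policy. It assumes the base config comes from `yggdrasil -genconf -json` at first boot, which also generates the keypair, and that the policy is overlaid on top of it. The peer URI (scheme, host, port) and the allowlisted key used below are placeholders. The empty-allowlist check matters because Yggdrasil treats an empty AllowedPublicKeys list as "accept any inbound peer".

```python
# Sketch: overlay the veilor idle policy onto a genconf-generated config.
# HYPOTHETICAL: function names; peer URI and keys passed in are placeholders.
import copy
import ipaddress

YGGDRASIL_NET = ipaddress.ip_network("200::/7")  # Yggdrasil address space


def apply_idle_policy(base: dict, peer_uri: str, allowed_keys: list[str]) -> dict:
    """Return a new config: no listeners, one public peer, allowlist mode."""
    if not allowed_keys:
        # An empty AllowedPublicKeys list means allow-all; forbidden here.
        raise ValueError("AllowedPublicKeys must not be empty")
    cfg = copy.deepcopy(base)  # keep PrivateKey etc. from genconf untouched
    cfg["Listen"] = []
    cfg["Peers"] = [peer_uri]
    cfg["AllowedPublicKeys"] = list(allowed_keys)
    return cfg


def is_yggdrasil_addr(addr: str) -> bool:
    """True if addr falls inside Yggdrasil's 200::/7 range."""
    return ipaddress.ip_address(addr) in YGGDRASIL_NET
```

The result can be written back with `json.dumps`. HJSON is a superset of JSON, so Yggdrasil reads it unchanged.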

Layer 3 — Reticulum (opt-in)

  • RetiNet AGPL fork (NOT upstream RNS — upstream has an anti-AI license clause incompatible with our governance). Sourced from the Codeberg AGPL fork.
  • Sideband (Android/desktop messenger built on RNS).
  • Install via ujust install-reticulum. NOT auto-started until RetiNet stabilizes.
  • Default config when enabled: AutoInterface (LAN multicast) + 12 TCP backbone peers.
  • RNode hardware (LoRa transceiver) bundle as separate ujust install-reticulum-rnode.
  • Survives total internet outage (threat floor (iii)) when paired with RNode.
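The following minimal sketch shows the interfaces section that `ujust install-reticulum` could write. It assumes the RetiNet fork keeps upstream RNS's config format: `AutoInterface`, plus `TCPClientInterface` with `target_host` / `target_port`. This doc does not list the 12 backbone hosts, so the caller supplies them. The function name is hypothetical.

```python
# Sketch: render the [interfaces] section of an RNS-style config.
# ASSUMPTION: RetiNet reads the same config format as upstream RNS.


def render_interfaces(backbone: list[tuple[str, int]]) -> str:
    """AutoInterface for LAN multicast plus one TCP client per backbone peer."""
    lines = [
        "[interfaces]",
        "  [[Default Interface]]",
        "    type = AutoInterface",
        "    enabled = yes",
    ]
    for i, (host, port) in enumerate(backbone, start=1):
        lines += [
            f"  [[Backbone {i:02d}]]",
            "    type = TCPClientInterface",
            "    enabled = yes",
            f"    target_host = {host}",
            f"    target_port = {port}",
        ]
    return "\n".join(lines) + "\n"
```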

Onboarding model

Token-based (paste OR QR, user picks). The Misskey signup page mints a reusable pre-auth key (TTL=24h), regenerated per signup so each key belongs to exactly one signup. First boot of the Veilor ISO accepts a hex paste OR a QR scan of the same key.

NOT auto-OIDC at first boot — too much Authentik exposure for day-zero users.
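The following minimal sketch covers both ends of the token flow.

* **Minting side.** It builds a Headscale CLI call with a 24h expiry, assuming the signup hook shells out to `headscale preauthkeys create`. Flag spellings such as `--user` vary across Headscale releases, so check them against the installed version.
* **First-boot side.** It normalizes a paste or a decoded QR payload into the same key. This assumes a lowercase-hex key of 48 characters, a hypothetical length.

The function names are hypothetical.

```python
# Sketch: mint (signup side) and accept (first-boot side) a pre-auth key.
# HYPOTHETICAL: function names and the 48-hex-char key format.
import re

KEY_TTL = "24h"
HEX_KEY = re.compile(r"[0-9a-f]{48}")  # assumed key format


def mint_argv(user: str) -> list[str]:
    """Headscale CLI call the Misskey signup hook would run for one signup."""
    return [
        "headscale", "preauthkeys", "create",
        "--user", user,
        "--reusable",
        "--expiration", KEY_TTL,
        "--tags", "tag:guest",  # new signups land in the guest tier
    ]


def normalize_key(raw: str) -> str:
    """Accept a pasted or QR-decoded key: strip whitespace, lowercase, validate."""
    key = "".join(raw.split()).lower()
    if not HEX_KEY.fullmatch(key):
        raise ValueError("not a valid pre-auth key")
    return key
```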

Tier model — three-tier

  • tag:admin — onyx + failsafe. Full mesh, *:*.
  • tag:infra — nullstone, office. Mesh among themselves; admin inbound only.
  • tag:guest — Veilor OS users + friend. ONLY x.veilor:443 reachable + future seeded service hostnames whitelisted.
  • Failsafe — pre-baked admin pre-auth key on yubikey + printed paper + Authentik OIDC group tailnet-admin as second auth path.
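The following minimal sketch expresses the three tiers as a Headscale ACL policy. It is built as a Python dict and dumped to JSON; Headscale reads HuJSON, a superset of JSON. The `x-veilor` host alias, its IP, and the group member are hypothetical placeholders. Giving the OIDC group the same reach as `tag:admin` is an assumption about how the failsafe path is wired. Anything the rules do not accept is denied by default.

```python
# Sketch: three-tier Headscale policy. HYPOTHETICAL: alias name, IP, members.
import json


def build_policy(x_veilor_ip: str, admin_members: list[str]) -> dict:
    return {
        "groups": {"group:tailnet-admin": admin_members},
        "tagOwners": {
            "tag:admin": ["group:tailnet-admin"],
            "tag:infra": ["group:tailnet-admin"],
            "tag:guest": ["group:tailnet-admin"],
        },
        # Alias for the x.veilor service; the IP is a placeholder.
        "hosts": {"x-veilor": f"{x_veilor_ip}/32"},
        "acls": [
            # tag:admin (and the OIDC failsafe group): full mesh, *:*.
            {"action": "accept", "src": ["tag:admin", "group:tailnet-admin"], "dst": ["*:*"]},
            # tag:infra: mesh among themselves; admin inbound is the rule above.
            {"action": "accept", "src": ["tag:infra"], "dst": ["tag:infra:*"]},
            # tag:guest: x.veilor on 443 only; seeded services get added later.
            {"action": "accept", "src": ["tag:guest"], "dst": ["x-veilor:443"]},
        ],
    }


print(json.dumps(build_policy("100.64.0.10", ["admin@"]), indent=2))
```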

Threat floor table

| Floor | Attack | Day 1 (v0.7 ship) | Phase 2 (v0.8) |
|---|---|---|---|
| (i) | ISP blocks s8n.ru DNS | Tailscale dies, Yggdrasil survives | YES (documented failover) |
| (ii) | ISP blocks Tailscale protocol | Yggdrasil-WSS:443 survives | YES |
| (iii) | Internet unreachable | RNS over LoRa survives | OPT-IN (RetiNet + RNode) |

Day 1 must hold floor (i). Floors (ii) and (iii) become P2 once Yggdrasil is promoted from idle to documented failover.
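The threat floor table reduces to a simple precedence rule. The following minimal sketch assumes some health probe (hypothetical, not specified here) reports three things: whether s8n.ru DNS resolves, whether the Tailscale control path works, and whether the internet is reachable at all.

```python
# Sketch: which mesh layer carries traffic at each threat floor.
# HYPOTHETICAL: the probe booleans and the function name.
from enum import Enum
from typing import Optional


class Layer(Enum):
    TAILSCALE = "L1 Tailscale + Headscale"
    YGGDRASIL = "L2 Yggdrasil over WSS:443"
    RETICULUM = "L3 RNS over LoRa (opt-in)"


def pick_layer(dns_ok: bool, tailscale_ok: bool, internet_ok: bool,
               rnode_installed: bool) -> Optional[Layer]:
    if not internet_ok:  # floor (iii): only RNS over LoRa survives
        return Layer.RETICULUM if rnode_installed else None
    if dns_ok and tailscale_ok:  # normal operation
        return Layer.TAILSCALE
    return Layer.YGGDRASIL  # floors (i) and (ii)
```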

Iroh seeding daemon (Phase 2 / v0.8)

  • veilor-seed.service systemd unit, runs as _veilor-seed user.
  • Watches /var/lib/<service>/files/ blob store directories.
  • BLAKE3-hashes new blobs, registers with local iroh node.
  • Publishes tickets on per-service iroh-gossip topic.
  • LRU local cache, default 10 GB.
  • Sidecar mirrors service blob stores: Misskey /files/, Matrix media, dl.veilor downloads.
  • Other Veilor nodes pull lazily on cache miss.
  • DEFER DB replication forever. Static media only.

DOCUMENT but DO NOT IMPLEMENT until Iroh hits 1.0 (currently 0.96–0.98 RC season; 1.0 target Q1 2026 slipped, watching).
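The following minimal sketch shows the daemon's local cache: content-addressed blobs with LRU eviction under a byte budget (10 GB by default). It makes two swaps. First, `hashlib.blake2b` stands in for BLAKE3, which iroh-blobs uses but the Python stdlib lacks. Second, blobs are held in memory rather than on disk. Publishing tickets to iroh-gossip and pulling from peers are left out. The class name is hypothetical.

```python
# Sketch: veilor-seed local cache.
# SWAPPED IN: blake2b instead of BLAKE3; an in-memory store instead of disk.
import hashlib
from collections import OrderedDict
from typing import Optional

DEFAULT_BUDGET = 10 * 10**9  # 10 GB


class BlobCache:
    """Content-addressed blobs, evicting least-recently-used past the budget."""

    def __init__(self, budget_bytes: int = DEFAULT_BUDGET):
        self.budget = budget_bytes
        self.used = 0
        self._blobs: "OrderedDict[str, bytes]" = OrderedDict()

    def add(self, data: bytes) -> str:
        """Hash and store a new blob; return its hex digest."""
        digest = hashlib.blake2b(data, digest_size=32).hexdigest()
        if digest in self._blobs:
            self._blobs.move_to_end(digest)  # already cached: mark as used
            return digest
        if len(data) > self.budget:
            raise ValueError("blob larger than the whole cache budget")
        self._blobs[digest] = data
        self.used += len(data)
        while self.used > self.budget:  # evict oldest until under budget
            _, old = self._blobs.popitem(last=False)
            self.used -= len(old)
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        """Return the blob, or None on a miss (caller then pulls from a peer)."""
        data = self._blobs.get(digest)
        if data is not None:
            self._blobs.move_to_end(digest)
        return data
```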

Reference: https://github.com/n0-computer/iroh-blobs/blob/main/DESIGN.md.

External dependency — Phase 0 (NOT veilor-os scope)

Real ACL gap on nullstone Traefik right now: friend on tag:guest can reach nullstone:443 → SNI-routes to ALL Traefik vhosts (sys.s8n.ru, pihole.s8n.ru, hs.s8n.ru, auth.s8n.ru, n8n, rc, mx, …). Only per-vhost auth blocks them. The no-guest@file Traefik middleware that should fix this is currently a 0.0.0.0/0 allow-all stub (neutralized 2026-05-03 after XFF chain breakage).

veilor-os does NOT fix this. Tracked here as an external dependency: ACL fix on nullstone Traefik required before veilor-os first-public-ISO ships, otherwise tag:guest provisioning leaks the full vhost surface to every veilor user. Parent operator owns it.

Strategic credibility win

secureblue does NOT publish a threat model. Athena OS does, and it's their main differentiator. We've already drafted docs/THREAT-MODEL.md (Agent 5 of 2026-05-05 wave). Publishing that before the v0.7 launch positions veilor-os ahead of secureblue and Athena on the one axis that matters most for a security-branded distro: honest, scoped, public threat model.

Roadmap implications

| Version | Status | Path |
|---|---|---|
| v0.5.31 | shipped | Anaconda kickstart, mutable root |
| v0.5.32 | active: top blockers from 9-agent wave | Anaconda kickstart |
| v0.5.x → v0.6 | maintenance | Anaconda kickstart, ergonomics + UX polish |
| v0.7 spike | 1-day BlueBuild prototype (was 2 days; ostreecontainer removes first-boot-rebase work) | First veilor OCI image extending securecore-kinoite-hardened-userns |
| v0.7 ship | ISO bootstraps install, ostreecontainer populates from OCI in-pass | Hybrid path live |
| v0.8 | Iroh seeding (P2P static media), Yggdrasil promoted from idle to documented failover, RetiNet stabilization watch | bootc-only direction |
| v1.0 | bootc-only, kickstart deprecated; possibly migrate ostreecontainer → new bootc kickstart command if multi-disk + auth-registry blockers are resolved upstream | bootc upgrade for all updates |

The Containerfile-from-scratch spike plan (Agent 3 of 2026-05-05 wave) is superseded by this hybrid: don't build a Containerfile from scratch on fedora-bootc:43. Instead, write a BlueBuild recipe on securecore-kinoite-hardened-userns. With the ostreecontainer swap, the spike compresses from 1 week to 1 day.

Next concrete steps

v0.5.32 — current (no strategy change)

Ship the 7 blockers from docs/research/2026-05-05-agent-wave/: suspend/resume wifi fix, firstboot WantedBy, USBGuard id-rules, firewalld tailscale0 zone, KMS modeset, /etc/skel branding, virtio-9p log capture.

ostreecontainer swap does NOT land in v0.5.32 main. It belongs in the v0.7 spike branch only.

v0.7-spike (1 day, separate branch)

  1. New repo dir: bluebuild/recipe.yml.
  2. base-image: ghcr.io/secureblue/securecore-kinoite-hardened-userns, image-version: latest.
  3. Override modules:
    • type: files — stamp our overlay/* tree (branding, themes, veilor scripts, sddm theme, plymouth theme).
    • type: rpm-ostree — install Mullvad Browser + restore Xwayland + re-enable sudo (revert run0).
    • Keep Trivalent as default (was wrongly marked for removal in the first draft of this doc).
    • type: brand — PRETTY_NAME, GRUB_DISTRIBUTOR, distributor URL.
    • type: files — pre-disabled tailscale.service, idle yggdrasil.service, ujust install-reticulum and ujust install-thorium recipes.
  4. .github/workflows/build-bluebuild.yml — pull BlueBuild action, build + cosign sign + push to GHCR.
  5. kickstart/install.ks — replace %packages block with ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry. Keep existing partitioning + LUKS block verbatim. Drop all planned veilor-firstboot-rebase.service work — no longer needed.
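The following minimal sketch generates the recipe.yml from steps 1 to 3 in Python. BlueBuild expresses the base as `base-image` plus `image-version`. The sketch relies on JSON being valid YAML 1.2, so the dumped text can be read as a recipe. It makes three assumptions:

* `overlay` is the overlay/* tree, placed under BlueBuild's files/ directory.
* `mullvad-browser` is a placeholder package name, and its repo setup is omitted.
* The Xwayland, sudo, branding, and ujust steps are left out, because their exact modules are not pinned down here.

```python
# Sketch: build bluebuild/recipe.yml as JSON (valid YAML 1.2).
# HYPOTHETICAL: the "overlay" source dir and the mullvad-browser package entry.
import json


def build_recipe() -> dict:
    return {
        "name": "veilor-os",
        "description": "veilor-os layered on securecore-kinoite-hardened-userns",
        "base-image": "ghcr.io/secureblue/securecore-kinoite-hardened-userns",
        "image-version": "latest",
        "modules": [
            # Stamp the overlay tree (branding, themes, scripts) onto /.
            {"type": "files", "files": [{"source": "overlay", "destination": "/"}]},
            # Add Mullvad Browser; Trivalent is inherited and NOT removed.
            {"type": "rpm-ostree", "install": ["mullvad-browser"]},
        ],
    }


print(json.dumps(build_recipe(), indent=2))
```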

v1.0 — bootc-only

  • Drop kickstart/veilor-os.ks, drop livecd-creator workflow.
  • Bootstrap ISO is built as a separate artifact (NOT via bootc-image-builder anaconda-iso, which was deprecated in image-builder v44).
  • The OCI image is the source of truth.
  • veilor-update becomes a thin bootc upgrade --apply wrapper.
  • Migrate ostreecontainer directive → new bootc kickstart command IF multi-disk + authenticated-registry support has landed upstream by then.
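The following minimal sketch shows the thin wrapper. It assumes the wrapper only forwards to `bootc upgrade`. The `--check` flag reports whether a newer image is available without staging it. The `--apply` flag stages the update and reboots into it if one was found. The function names and the `--check` pass-through are hypothetical.

```python
# Sketch: veilor-update as a thin bootc wrapper (v1.0 plan).
# HYPOTHETICAL: function names and the --check pass-through.
import subprocess


def bootc_argv(check_only: bool = False) -> list[str]:
    # --check: look for an update without staging it.
    # --apply: stage the update, then reboot into the new deployment.
    return ["bootc", "upgrade", "--check" if check_only else "--apply"]


def main(argv: list[str]) -> int:
    """Entry point: `veilor-update [--check]`; returns bootc's exit code."""
    check_only = "--check" in argv[1:]
    return subprocess.run(bootc_argv(check_only)).returncode
```

Wired up as `sys.exit(main(sys.argv))`, the wrapper adds nothing that bootc does not already do, which is the point: the OCI image stays the source of truth.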

Open questions

  • Does secureblue accept upstream contributions? If yes, send our USBGuard id-based-rules fix and our threat-model framework.
  • Recovery flow when ostreecontainer install pass fails — Anaconda should abort cleanly; verify in spike that no half-installed state is bootable.
  • Iroh 1.0 timing — currently 0.96–0.98 RC; Q1 2026 target slipped. Re-evaluate Phase 2 schedule when 1.0 lands.
  • RetiNet upstream stabilization — track Codeberg fork for releases. If it stalls > 6 months we re-evaluate Layer 3.
  • Fedora 44 transition: secureblue tracks Fedora releases (current v4.9 on F44). If we follow, we get F44 for free at the same time upstream does.

See also