veilor-os/docs/STRATEGY.md
veilor-org b40e89a3cb
docs: STRATEGY.md — primary git host moved to git.s8n.ru (Forgejo)
Self-hosted Forgejo + forgejo-runner on nullstone now primary.
GitHub becomes public mirror (Forgejo push-mirrors every commit
+ every 8h). 0 GH Actions minutes consumed.

Runner labels:
  ubuntu-24.04 — drop-in for existing build-iso.yml workflow
  nullstone    — privileged Fedora 43 (opt-in via runs-on: nullstone)

Deploy artifacts: ~/ai-lab/nullstone-server/forgejo/.

External TODO (parent operator owns):
  - router port-forward 222 → nullstone:222 for public SSH push
  - no-guest@file allowlist update for external web UI access
2026-05-06 02:01:06 +01:00


veilor-os Strategy — Hybrid kickstart bootstrap + bootc OCI

Decision date: 2026-05-05 (refined the same day after the parent-operator handoff; the refinement locks the ostreecontainer install path, mesh stack bake-in, browser stack, Iroh seeding roadmap, and threat floor table).

Locked at: v0.5.31 → v0.7 spike → v1.0

TL;DR

  • Keep the Anaconda-driven kickstart ISO as the bootstrap installer (LUKS UX is mature, single passphrase prompt, custom partitioning works).
  • Anaconda's ostreecontainer directive populates the root filesystem directly from a veilor-os OCI image (built via BlueBuild on top of secureblue's securecore-kinoite-hardened-userns) during the install pass — no first-boot rebase, no mutable→atomic transition.
  • All future updates flow through bootc upgrade — atomic A/B, instant rollback, cosign-signed.
  • The kickstart-driven mutable-root path is deprecated at v1.0; kept alive as fallback through v0.7.

Why hybrid, not pure pivot

Pure pivot to bootc-from-scratch (Agent 3's spike plan) was 1 week to first ISO. Pure pivot to layering on secureblue is 2 days to first ISO because the hardening work is already done. The ostreecontainer refinement compresses that to 1 day by eliminating the first-boot rebase choreography (no veilor-firstboot-rebase.service, no second reboot, no transition window where the system is half-mutable, half-atomic).

Both pure-pivot paths require throwing away the partitioning UX we already have working in Anaconda. Hybrid keeps it.

Hybrid:

  • Day-zero install: Anaconda kickstart + custom partitioning + LUKS prompt (what we have today). User experience = unchanged.
  • End of install pass: ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry populates / from the OCI image. Transition is invisible.
  • First boot: veilor OCI tree, no rebase, no special service.
  • Day-2: bootc upgrade cadence for everything from then on.

We keep what works, pivot the part that doesn't.

ostreecontainer directive (refinement, locked)

Replace the %packages block in the install kickstart with:

ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry

Keep the existing part/LUKS encryption block verbatim — Anaconda partitions before ostreecontainer populates root.
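
The swap can be sketched as a small filter that drops the %packages section and emits the directive in its place, leaving every other line (including the part/LUKS block) untouched. This is only an illustration: the function name is hypothetical, and it assumes the kickstart has a single %packages section closed by %end.

```shell
# Sketch: replace the %packages ... %end section of a kickstart with one
# ostreecontainer directive. All other lines pass through unchanged.
# Assumption: exactly one %packages section, closed by %end.
swap_packages_for_ostreecontainer() {
    # $1 = input kickstart path, $2 = OCI image reference
    awk -v image="$2" '
        /^%packages/ {
            in_pkgs = 1
            print "ostreecontainer --url=" image " --transport=registry"
            next
        }
        in_pkgs && /^%end/ { in_pkgs = 0; next }   # drop the closing marker
        !in_pkgs { print }                          # keep everything else
    ' "$1"
}
```

For example, `swap_packages_for_ostreecontainer kickstart/install.ks ghcr.io/veilor/veilor-os:43 > install.new.ks` would write the rewritten file next to the original so the two can be diffed before committing.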

Stay on ostreecontainer through v0.8. Do NOT migrate to the new bootc kickstart command until v1.0: that command does not yet support multi-disk installs or authenticated registries, both of which we'll likely need.

Do NOT use bootc-image-builder anaconda-iso output — deprecated in image-builder v44+. Produce the OCI image and the bootstrap ISO as separate artifacts:

  • OCI image: BlueBuild recipe → cosign-signed image at ghcr.io/veilor/veilor-os:43
  • Bootstrap ISO: Anaconda kickstart with ostreecontainer directive pointing at the OCI image

Reference: https://docs.fedoraproject.org/en-US/bootc/; pykickstart docs for ostreecontainer.

Why secureblue underneath

| Question | Answer |
| --- | --- |
| Maintainers | secureblue: 30 contributors, 56 commits/5wks. veilor-os: solo. |
| Hardening surface | secureblue ships sysctl + kargs + SELinux + USBGuard + hardened-malloc + DoT, far more than we'd build alone. |
| Build pipeline | BlueBuild → cosign-signed OCI in GH Actions (build-all.yml, trivy.yml). |
| Update model | bootc upgrade with A/B + instant rollback + signed image chain. |
| Variants | kinoite-hardened-userns is the KDE+Wayland+SELinux variant we'd want. |
| License | Apache-2.0 (compatible with our MIT). |

What we override in our recipe:

  • run0 instead of sudo: revert. Breaks too many workflows.
  • Xwayland disabled: revert. Some apps still need it.
  • Veilor branding: theme, KDE color scheme, Plymouth, SDDM, font, os-release. All overlay/* ports verbatim from current repo.

(Browser stack is its own section below — Trivalent is now a kept default, not an override.)

Browser stack

| Role | Pick | Source |
| --- | --- | --- |
| Default browser | Trivalent (secureblue's hardened Chromium) | Fedora COPR secureblue/trivalent; tracks upstream M147+ within hours, ships hardened_malloc + JIT-less + Drumbrake WASM |
| Anti-fingerprint companion | Mullvad Browser | Clearnet, no Tor, layered alongside Trivalent for pseudonymous browsing |
| Optional opt-in | Thorium | ujust install-thorium only; WARN users of months-long CVE lag (LTS Chromium base, ~9 milestones behind upstream stable as of 2026-05) |

DO NOT default to Thorium under any circumstances: it contradicts the threat model. Trivalent's COPR keeps us within hours of upstream patch releases; Thorium runs months behind and is a perf/media profile choice, not a security choice.

The earlier draft of this doc treated Trivalent as an override-and-remove. That was wrong: Trivalent is exactly the level of hardening we want for a default browser. Keep it. Add Mullvad alongside. Move Thorium behind an explicit opt-in.

Mesh stack — three-layer warm-stack

Day 1 ships layers 1 (Tailscale) and 2 (Yggdrasil idle). Layer 3 (Reticulum) is opt-in via ujust.

Layer 1 — Tailscale + Headscale (daily driver)

  • Already running on nullstone, hs.s8n.ru. OIDC via Authentik.
  • Veilor OS ships tailscale-1.94.2+ from official Fedora repo.
  • Service unit pre-disabled at install time.
  • First-boot prompt: "join Veilor mesh? [paste / QR]". On accept: tailscale up --login-server=https://hs.s8n.ru with the user's pre-auth key.
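
The accept path of the first-boot prompt could look roughly like the sketch below. It assumes the key has already been pasted or decoded from the QR code into a hex string, as the onboarding model describes; the function name and the TAILSCALE override (used only for dry runs) are hypothetical.

```shell
# Sketch of the accept path for the first-boot mesh prompt.
# Assumptions: the pre-auth key is a plain hex string; TAILSCALE may be
# overridden (e.g. TAILSCALE=echo) to dry-run without the real client.
LOGIN_SERVER="https://hs.s8n.ru"

join_veilor_mesh() {
    key=$1
    # Reject empty input and anything that is not purely hexadecimal.
    case "$key" in
        ''|*[!0-9A-Fa-f]*)
            echo "invalid pre-auth key" >&2
            return 1
            ;;
    esac
    # --login-server and --auth-key are standard `tailscale up` flags.
    "${TAILSCALE:-tailscale}" up --login-server="$LOGIN_SERVER" --auth-key="$key"
}
```

Because the unit ships disabled, a real first-boot script would presumably start the daemon (tailscaled.service) before calling this function.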

Layer 2 — Yggdrasil-go (warm fallback, idle by default)

  • yggdrasil-go 0.5.13+ from COPR / dnf.
  • Decentralized IPv6 in 200::/7.
  • The systemd unit is enabled, but the config keeps the node idle: an empty Listen list, one public peer (e.g. vpn.itrus.su or another EU peer), and AllowedPublicKeys in allowlist mode (no allow-all).
  • WSS:443 transport for ISP DPI evasion.
  • Generates ECC keypair on first boot via systemd-tmpfiles or firstboot script.
  • Survives ISP-level Tailscale block (threat floor (ii)).
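
As a rough illustration of the idle configuration, the helper below writes a minimal config fragment. Peers, Listen, and AllowedPublicKeys are standard yggdrasil.conf fields; the function name, the exact peer URI, and the key value are assumptions supplied by the caller. PrivateKey is deliberately left out, since the keypair is generated on first boot.

```shell
# Sketch: write an idle Layer 2 config fragment (HJSON, as Yggdrasil uses).
# Assumptions: peer URI and allowlisted key are passed in by the caller;
# PrivateKey is added separately by the first-boot key generation step.
write_idle_yggdrasil_conf() {
    # $1 = output path, $2 = peer URI, $3 = allowlisted public key (hex)
    cat > "$1" <<EOF
{
  # One outbound public peer, reached over WSS on port 443.
  Peers: ["$2"]
  # No listeners: the node accepts no inbound peerings while idle.
  Listen: []
  # Allowlist for inbound peerings; an empty list would allow everyone.
  AllowedPublicKeys: ["$3"]
}
EOF
}
```

A call such as `write_idle_yggdrasil_conf /etc/yggdrasil.conf wss://vpn.itrus.su:443 "$ADMIN_KEY"` shows the intended shape; the peer URI form is inferred from the WSS:443 bullet and should be checked against the peer's published address.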

Layer 3 — Reticulum (opt-in)

  • RetiNet AGPL fork (NOT upstream RNS — upstream has an anti-AI license clause incompatible with our governance). Sourced from the Codeberg AGPL fork.
  • Sideband (Android/desktop messenger built on RNS).
  • Install via ujust install-reticulum. NOT auto-started until RetiNet stabilizes.
  • Default config when enabled: AutoInterface (LAN multicast) + 12 TCP backbone peers.
  • RNode hardware (LoRa transceiver) bundle as separate ujust install-reticulum-rnode.
  • Survives total internet outage (threat floor (iii)) when paired with RNode.

Onboarding model

Token-based (paste OR QR, user picks). The Misskey signup page mints a pre-auth key (TTL=24h, single-use, regenerated per signup). First boot of the Veilor ISO accepts a hex paste OR a QR scan of the same key.

NOT auto-OIDC at first boot — too much Authentik exposure for day-zero users.
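
On the server side, the minting step might reduce to one Headscale CLI call per signup. The sketch below takes the single-use, 24h reading of the key parameters and tags the node as a guest. The function name, the owning user, and the HEADSCALE override are hypothetical, and newer Headscale releases expect a numeric user ID for --user.

```shell
# Sketch: mint one pre-auth key per signup.
# Assumptions: $1 names (or, on newer Headscale, numbers) the owning user;
# HEADSCALE may be overridden (e.g. HEADSCALE=echo) for a dry run.
mint_guest_key() {
    # No --reusable flag, so the key is single-use; it expires after 24h
    # and nodes that join with it are tagged tag:guest.
    "${HEADSCALE:-headscale}" preauthkeys create \
        --user "$1" \
        --expiration 24h \
        --tags tag:guest
}
```

The signup page would then render the printed key both as text and as a QR code, so either first-boot input path receives the same value.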

Tier model — three-tier

  • tag:admin — onyx + failsafe. Full mesh, *:*.
  • tag:infra — nullstone, office. Mesh among themselves; admin inbound only.
  • tag:guest — Veilor OS users + friend. ONLY x.veilor:443 reachable + future seeded service hostnames whitelisted.
  • Failsafe — pre-baked admin pre-auth key on yubikey + printed paper + Authentik OIDC group tailnet-admin as second auth path.
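
The three tiers map fairly directly onto Tailscale-style ACL rules, which Headscale also accepts. The sketch below writes such a policy as an illustration only: the function name, the host alias x-veilor, and the tailnet IP passed in are hypothetical, and a deployed policy would likely also need a tagOwners section, omitted here because the owners are not stated above.

```shell
# Sketch: emit a policy file expressing the three tiers.
# Assumptions: "x-veilor" is a hypothetical host alias for x.veilor and
# $2 is its tailnet IP; tagOwners is intentionally left out.
write_tier_policy() {
    # $1 = output path, $2 = tailnet IP of the guest-facing service
    cat > "$1" <<EOF
{
  "hosts": { "x-veilor": "$2/32" },
  "acls": [
    { "action": "accept", "src": ["tag:admin"], "dst": ["*:*"] },
    { "action": "accept", "src": ["tag:infra"], "dst": ["tag:infra:*"] },
    { "action": "accept", "src": ["tag:guest"], "dst": ["x-veilor:443"] }
  ]
}
EOF
}
```

Since ACLs are default-deny, nothing but tag:admin can reach tag:infra from outside, and tag:guest sees only port 443 on the one aliased host; future seeded service hostnames would be added as further hosts entries and dst items.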

Threat floor table

| Floor | Attack | Day 1 (v0.7 ship) | Phase 2 (v0.8) |
| --- | --- | --- | --- |
| (i) | ISP blocks s8n.ru DNS | Tailscale dies, Yggdrasil survives | YES (documented failover) |
| (ii) | ISP blocks Tailscale protocol | Yggdrasil-WSS:443 survives | YES |
| (iii) | Internet unreachable | RNS over LoRa survives | OPT-IN (RetiNet + RNode) |

Day 1 must hold floor (i). Floors (ii) and (iii) become P2 once Yggdrasil is promoted from idle to documented failover.

Iroh seeding daemon (Phase 2 / v0.8)

  • veilor-seed.service systemd unit, runs as _veilor-seed user.
  • Watches /var/lib/<service>/files/ blob store directories.
  • BLAKE3-hashes new blobs, registers with local iroh node.
  • Publishes tickets on per-service iroh-gossip topic.
  • LRU local cache, default 10 GB.
  • Sidecar mirrors service blob stores: Misskey /files/, Matrix media, dl.veilor downloads.
  • Other Veilor nodes pull lazily on cache miss.
  • DEFER DB replication forever. Static media only.

DOCUMENT but DO NOT IMPLEMENT until Iroh hits 1.0 (currently in the 0.96 to 0.98 RC season; the Q1 2026 target for 1.0 slipped, watching).

Reference: https://github.com/n0-computer/iroh-blobs/blob/main/DESIGN.md.

External dependency — Phase 0 (NOT veilor-os scope)

There is a real ACL gap on nullstone Traefik right now: a friend on tag:guest can reach nullstone:443, which SNI-routes to ALL Traefik vhosts (sys.s8n.ru, pihole.s8n.ru, hs.s8n.ru, auth.s8n.ru, n8n, rc, mx, …). Only per-vhost auth blocks them. The no-guest@file Traefik middleware that should fix this is currently a 0.0.0.0/0 allow-all stub (neutralized on 2026-05-03 after XFF chain breakage).

veilor-os does NOT fix this. Tracked here as an external dependency: ACL fix on nullstone Traefik required before veilor-os first-public-ISO ships, otherwise tag:guest provisioning leaks the full vhost surface to every veilor user. Parent operator owns it.

Strategic credibility win

secureblue does NOT publish a threat model. Athena OS does, and it's their main differentiator. We've already drafted docs/THREAT-MODEL.md (Agent 5 of 2026-05-05 wave). Publishing that before the v0.7 launch positions veilor-os ahead of secureblue and Athena on the one axis that matters most for a security-branded distro: honest, scoped, public threat model.

Roadmap implications

| Version | Status | Path |
| --- | --- | --- |
| v0.5.31 | shipped | Anaconda kickstart, mutable root |
| v0.5.32 | active: top blockers from 9-agent wave | Anaconda kickstart |
| v0.5.x → v0.6 | maintenance | Anaconda kickstart, ergonomics + UX polish |
| v0.7 spike | 1-day BlueBuild prototype (was 2 days; ostreecontainer removes first-boot-rebase work) | First veilor OCI image extending secureblue-kinoite-hardened |
| v0.7 ship | ISO bootstraps install, ostreecontainer populates from OCI in-pass | Hybrid path live |
| v0.8 | Iroh seeding (P2P static media), Yggdrasil promoted from idle to documented failover, RetiNet stabilization watch | bootc-only direction |
| v1.0 | bootc-only, kickstart deprecated, possibly migrate ostreecontainer → new bootc kickstart command if multi-disk + auth-registry blockers resolved upstream | bootc upgrade for all updates |

The Containerfile-from-scratch spike plan (Agent 3 of 2026-05-05 wave) is superseded by this hybrid: don't build a Containerfile from scratch on fedora-bootc:43. Instead, write a BlueBuild recipe on securecore-kinoite-hardened-userns. With ostreecontainer swap, spike compresses 1 week → 1 day.

Next concrete steps

v0.5.32 — current (no strategy change)

Ship the 7 blockers from docs/research/2026-05-05-agent-wave/: suspend/resume wifi fix, firstboot WantedBy, USBGuard id-rules, firewalld tailscale0 zone, KMS modeset, /etc/skel branding, virtio-9p log capture.

ostreecontainer swap does NOT land in v0.5.32 main. It belongs in the v0.7 spike branch only.

v0.7-spike (1 day, separate branch)

  1. New repo dir: bluebuild/recipe.yml.
  2. from: ghcr.io/secureblue/securecore-kinoite-hardened-userns:latest.
  3. Override modules:
    • type: files — stamp our overlay/* tree (branding, themes, veilor scripts, sddm theme, plymouth theme).
    • type: rpm-ostree — install Mullvad Browser + restore Xwayland + re-enable sudo (revert run0).
    • Keep Trivalent as default (was wrongly marked for removal in the first draft of this doc).
    • type: brand — PRETTY_NAME, GRUB_DISTRIBUTOR, distributor URL.
    • type: files — pre-disabled tailscale.service, idle yggdrasil.service, ujust install-reticulum and ujust install-thorium recipes.
  4. .github/workflows/build-bluebuild.yml — pull BlueBuild action, build + cosign sign + push to GHCR.
  5. kickstart/install.ks — replace %packages block with ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry. Keep existing partitioning + LUKS block verbatim. Drop all planned veilor-firstboot-rebase.service work — no longer needed.

v1.0 — bootc-only

  • Drop kickstart/veilor-os.ks, drop livecd-creator workflow.
  • Bootstrap ISO is built as a separate artifact (NOT via bootc-image-builder anaconda-iso, which was deprecated in image-builder v44).
  • The OCI image is the source of truth.
  • veilor-update becomes thin bootc upgrade --apply wrapper.
  • Migrate ostreecontainer directive → new bootc kickstart command IF multi-disk + authenticated-registry support has landed upstream by then.
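
A thin veilor-update wrapper in this sense could be as small as the sketch below. The --check passthrough and the BOOTC override (for dry runs on machines without bootc) are assumptions added for illustration; `bootc upgrade --apply` and `bootc upgrade --check` are existing bootc options.

```shell
# Sketch of veilor-update as a thin wrapper around bootc.
# Assumption: BOOTC may be overridden (e.g. BOOTC=echo) for a dry run.
veilor_update() {
    if [ "${1:-}" = "--check" ]; then
        # Only ask the registry whether a newer image exists.
        "${BOOTC:-bootc}" upgrade --check
    else
        # Stage the new image and reboot into it in one step.
        "${BOOTC:-bootc}" upgrade --apply
    fi
}
```

Keeping the wrapper this small means rollback stays a plain bootc concern rather than something veilor-update has to reimplement.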

Open questions

  • Does secureblue accept upstream contributions? If yes, send our USBGuard id-based-rules fix and our threat-model framework.
  • Recovery flow when ostreecontainer install pass fails — Anaconda should abort cleanly; verify in spike that no half-installed state is bootable.
  • Iroh 1.0 timing: currently 0.96 to 0.98 RC; the Q1 2026 target slipped. Re-evaluate Phase 2 schedule when 1.0 lands.
  • RetiNet upstream stabilization — track Codeberg fork for releases. If it stalls > 6 months we re-evaluate Layer 3.
  • Fedora 44 transition: secureblue tracks Fedora releases (current v4.9 on F44). If we follow, we get F44 for free at the same time upstream does.

Self-hosted git + CI (locked 2026-05-05)

Primary git host moved off github.com. Forgejo runs on nullstone at git.s8n.ru, with forgejo-runner doing the build work. The GH free-tier minute quota was hammering veilor-os iteration; we self-host now.

  • Primary remote: ssh://git@192.168.0.100:222/veilor-org/veilor-os.git (Forgejo, LAN-only until router port-forward 222 → nullstone:222 added — TODO; or use tailnet hostname once tailscale logged in).
  • Public mirror: https://github.com/veilor-org/veilor-os.git. Forgejo push-mirrors every commit + every 8h, so GH stays in sync without consuming GH minutes.
  • Runner labels: ubuntu-24.04 (catthehacker image — works for our current build-iso.yml unmodified) and nullstone (privileged Fedora 43 container — opt-in via runs-on: nullstone).
  • Build cost: 0 GH minutes. Disk: ~80 GB workspace on /home/docker.
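
For an existing clone, switching to the layout above might look like this sketch. The remote name github, the function name, and the GIT override (for dry runs) are assumptions; the URLs are the ones listed in the bullets.

```shell
# Sketch: repoint an existing clone at Forgejo and keep GitHub as a
# secondary remote. Assumptions: the clone already has an "origin";
# GIT may be overridden (e.g. GIT=echo) to print instead of execute.
set_veilor_remotes() {
    # $1 = path to the local clone
    "${GIT:-git}" -C "$1" remote set-url origin \
        ssh://git@192.168.0.100:222/veilor-org/veilor-os.git &&
    "${GIT:-git}" -C "$1" remote add github \
        https://github.com/veilor-org/veilor-os.git
}
```

Pushes should go to origin only; the github remote is there for fetching and comparison, since Forgejo's push-mirror is what keeps GitHub in sync.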

Deploy artifacts: ~/ai-lab/nullstone-server/forgejo/. Runbook in same dir.

See also