# veilor-os Strategy — Hybrid kickstart bootstrap + bootc OCI

Decision date: **2026-05-05** (refined the same day from the parent-operator
handoff, which locks the `ostreecontainer` install path, mesh stack
bake-in, browser stack, Iroh seeding roadmap, and threat floor table).

Locked at: **v0.5.31 → v0.7 spike → v1.0**

## TL;DR

- Keep the Anaconda-driven kickstart ISO as the **bootstrap installer**
  (LUKS UX is mature, single passphrase prompt, custom partitioning
  works).
- Anaconda's `ostreecontainer` directive populates the root filesystem
  directly from a **veilor-os OCI image** (built via BlueBuild on top
  of secureblue's `securecore-kinoite-hardened-userns`) **during the
  install pass — no first-boot rebase, no mutable→atomic transition**.
- All future updates flow through `bootc upgrade` — atomic A/B,
  instant rollback, cosign-signed.
- The kickstart-driven mutable-root path is deprecated at v1.0; kept
  alive as fallback through v0.7.

## Why hybrid, not pure pivot

Pure pivot to bootc-from-scratch (Agent 3's spike plan) was **1 week
to first ISO**. Pure pivot to layering on secureblue is **2 days to
first ISO** because the hardening work is already done. The
`ostreecontainer` refinement compresses that to **1 day** by
eliminating the first-boot rebase choreography (no
`veilor-firstboot-rebase.service`, no second reboot, no transition
window where the system is half-mutable, half-atomic).

Both pure-pivot paths require throwing away the partitioning UX we
already have working in Anaconda. Hybrid keeps it.

Hybrid:

- **Day-zero install:** Anaconda kickstart + custom partitioning +
  LUKS prompt (what we have today). The user experience is unchanged.
- **End of install pass:**
  `ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry`
  populates `/` from the OCI image. The transition is invisible.
- **First boot:** veilor OCI tree, no rebase, no special service.
- **Day-2:** `bootc upgrade` cadence for everything from then on.

We keep what works, pivot the part that doesn't.

## ostreecontainer directive (refinement, locked)

Replace the `%packages` block in the install kickstart with:

```
ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry
```

Keep the existing `part`/LUKS encryption block verbatim — Anaconda
partitions the disk before `ostreecontainer` populates root.

**Stay on `ostreecontainer` through v0.8.** Do NOT migrate to the new
`bootc` kickstart command until v1.0 — the `bootc` command does not
support multi-disk installs or authenticated registries, both of which
we'll likely need.

**Do NOT use** `bootc-image-builder anaconda-iso` output — it is
deprecated in image-builder v44+. Produce the OCI image and the
bootstrap ISO as **separate artifacts**:

- OCI image: BlueBuild recipe → cosign-signed image at
  `ghcr.io/veilor/veilor-os:43`
- Bootstrap ISO: Anaconda kickstart with the `ostreecontainer` directive
  pointing at the OCI image

Reference: <https://docs.fedoraproject.org/en-US/bootc/>; pykickstart
docs for `ostreecontainer`.

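The swap is a mechanical text edit on the kickstart. A minimal Python sketch, assuming the install kickstart has exactly one `%packages` section closed by `%end` and that every other line (the `part`/LUKS lines included) should pass through untouched; `swap_to_ostreecontainer` is a hypothetical helper, not existing repo tooling:

```python
"""Sketch: swap an install kickstart's %packages section for the
ostreecontainer directive, leaving every other line untouched."""
import re

OCI_URL = "ghcr.io/veilor/veilor-os:43"

# One %packages section (with any options) through its closing %end line.
_PACKAGES_RE = re.compile(r"^%packages\b.*?^%end[ \t]*$\n?", re.MULTILINE | re.DOTALL)


def swap_to_ostreecontainer(ks_text: str, url: str = OCI_URL) -> str:
    sections = _PACKAGES_RE.findall(ks_text)
    if len(sections) != 1:
        # Refuse to guess: zero or several sections means a malformed input.
        raise ValueError(f"expected exactly one %packages section, found {len(sections)}")
    directive = f"ostreecontainer --url={url} --transport=registry\n"
    # A callable replacement avoids any backslash handling in the URL.
    return _PACKAGES_RE.sub(lambda _m: directive, ks_text, count=1)
```

Failing loudly on zero or several `%packages` sections keeps a malformed kickstart from silently producing a mutable-root install.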
## Why secureblue underneath

| Question | Answer |
|---|---|
| Maintainers | secureblue: 30 contributors, 56 commits/5wks. veilor-os: solo. |
| Hardening surface | secureblue ships sysctl + kargs + SELinux + USBGuard + hardened-malloc + DoT — far more than we'd build alone. |
| Build pipeline | BlueBuild → cosign-signed OCI in GH Actions (`build-all.yml`, `trivy.yml`). |
| Update model | bootc upgrade with A/B + instant rollback + signed image chain. |
| Variants | `kinoite-hardened-userns` is the KDE+Wayland+SELinux variant we'd want. |
| License | Apache-2.0 (compatible with our MIT). |

What we override in our recipe:

- **`run0` instead of sudo**: revert. Breaks too many workflows.
- **Xwayland disabled**: revert. Some apps still need it.
- **Veilor branding**: theme, KDE color scheme, Plymouth, SDDM, font,
  os-release. All `overlay/*` content ports verbatim from the current repo.

(Browser stack is its own section below — Trivalent is now a *kept*
default, not an override.)

## Browser stack

| Role | Pick | Source / notes |
|---|---|---|
| **Default browser** | **Trivalent** (secureblue's hardened Chromium) | Fedora COPR `secureblue/trivalent` — tracks upstream M147+ within hours, ships hardened_malloc + JIT-less + Drumbrake WASM |
| **Anti-fingerprint companion** | **Mullvad Browser** | Clearnet, no Tor, layered alongside Trivalent for pseudonymous browsing |
| **Optional opt-in** | **Thorium** | `ujust install-thorium` only — WARN users of months-long CVE lag (LTS Chromium base, ~9 milestones behind upstream stable as of 2026-05) |

**DO NOT default to Thorium under any circumstances** — it contradicts
the threat model. Trivalent's COPR keeps our patch latency within hours
of upstream; Thorium is multi-month-stale and is a perf/media
profile choice, not a security choice.

The earlier draft of this doc treated Trivalent as an override to
remove. That was wrong: Trivalent is exactly the level of hardening
we want for a default browser. Keep it. Add Mullvad alongside.
Move Thorium behind an explicit opt-in.

## Mesh stack — three-layer warm-stack

Day 1 ships layers 1 (Tailscale) and 2 (Yggdrasil, idle). Layer 3
(Reticulum) is opt-in via `ujust`.

### Layer 1 — Tailscale + Headscale (daily driver)

- Headscale already runs on `nullstone` at `hs.s8n.ru`, with OIDC via
  Authentik.
- Veilor OS ships `tailscale-1.94.2+` from the official Fedora repo.
- Service unit **pre-disabled** at install time.
- First-boot prompt: "join Veilor mesh? [paste / QR]". On accept:
  `tailscale up --login-server=https://hs.s8n.ru` with the user's
  pre-auth key.

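On accept, the prompt has two jobs: start the daemon that shipped disabled, then register against Headscale. A minimal Python sketch, assuming the Fedora package's unit is `tailscaled.service` and the key arrives already pasted or decoded from the QR; `join_commands` and `join_mesh` are hypothetical names, and `dry_run` defaults to on so the sketch only returns the commands:

```python
"""Sketch of the first-boot "join Veilor mesh?" accept path."""
import subprocess
from typing import List

LOGIN_SERVER = "https://hs.s8n.ru"


def join_commands(preauth_key: str, login_server: str = LOGIN_SERVER) -> List[List[str]]:
    key = preauth_key.strip()
    if not key:
        raise ValueError("empty pre-auth key")
    return [
        # The unit ships disabled; enable and start it only after the user accepts.
        ["systemctl", "enable", "--now", "tailscaled.service"],
        # Register this node with Headscale using the pasted/scanned key.
        ["tailscale", "up", f"--login-server={login_server}", f"--authkey={key}"],
    ]


def join_mesh(preauth_key: str, dry_run: bool = True) -> List[List[str]]:
    cmds = join_commands(preauth_key)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return cmds
```

Passing the key on the command line makes it briefly visible in the process list; for a single-use, short-lived key that is a reasonable trade-off, but it is a trade-off.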
### Layer 2 — Yggdrasil-go (warm fallback, idle by default)

- `yggdrasil-go` 0.5.13+ from COPR / dnf.
- Decentralized IPv6 in `200::/7`.
- systemd unit **enabled**, but the config keeps `Listen` empty, has a
  single public peer in `Peers` (e.g. `vpn.itrus.su` or another EU
  peer), and runs `AllowedPublicKeys` in allowlist mode (no allow-all).
- WSS:443 transport for ISP DPI evasion.
- Generates its ECC keypair on first boot via systemd-tmpfiles or a
  firstboot script.
- Survives an ISP-level Tailscale block (threat floor (ii)).

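The idle config can be derived from the per-node config Yggdrasil generates. A minimal Python sketch, assuming first boot runs `yggdrasil -genconf -json` (fresh `PrivateKey`, JSON output) and then overrides three keys; `idle_config` is a hypothetical helper and `PUBLIC_PEER` is a placeholder, not the peer named above:

```python
"""Sketch: derive the idle Layer 2 config from `yggdrasil -genconf -json` output."""
import json
from typing import Dict, List

# Hypothetical placeholder; substitute the chosen EU public peer.
PUBLIC_PEER = "wss://peer.example.net:443"


def idle_config(genconf: Dict, peer: str, allowed_keys: List[str]) -> Dict:
    if not allowed_keys:
        # Yggdrasil treats an empty AllowedPublicKeys as "accept any peer".
        raise ValueError("refusing allow-all: AllowedPublicKeys must be non-empty")
    cfg = dict(genconf)                 # keeps the per-node PrivateKey from first boot
    cfg["Listen"] = []                  # no inbound listeners
    cfg["Peers"] = [peer]               # one outbound WSS:443 peer
    cfg["AllowedPublicKeys"] = list(allowed_keys)
    return cfg
```

Writing `json.dumps(cfg, indent=2)` to the unit's config file should work, since Yggdrasil reads HJSON and plain JSON is valid HJSON. The empty-list guard is the important design choice: it is what keeps this layer out of allow-all mode.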
### Layer 3 — Reticulum (opt-in)

- **RetiNet AGPL fork** (NOT upstream RNS — upstream has an anti-AI
  license clause incompatible with our governance). Sourced from the
  Codeberg AGPL fork.
- Sideband (Android/desktop messenger built on RNS).
- Install via `ujust install-reticulum`. NOT auto-started until
  RetiNet stabilizes.
- Default config when enabled: `AutoInterface` (LAN multicast) +
  1–2 TCP backbone peers.
- RNode hardware (LoRa transceiver) support ships as a separate
  `ujust install-reticulum-rnode` recipe.
- Survives a total internet outage (threat floor (iii)) when paired
  with RNode.

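The `ujust install-reticulum` recipe could render the default config from a short peer list. A minimal Python sketch, assuming RetiNet keeps upstream RNS's ConfigObj-style format (an `[interfaces]` section with `[[...]]` subsections, `AutoInterface` and `TCPClientInterface` types, `target_host`/`target_port` keys); `render_rns_config` is hypothetical and the backbone host used below is a placeholder:

```python
"""Sketch: render the default opt-in Reticulum config (AutoInterface plus
1 to 2 TCP backbone peers) in upstream RNS's config syntax."""
from typing import List, Tuple


def render_rns_config(backbone: List[Tuple[str, int]]) -> str:
    if not 1 <= len(backbone) <= 2:
        raise ValueError("expected 1-2 TCP backbone peers")
    lines = [
        "[reticulum]",
        "  enable_transport = False",
        "  share_instance = Yes",
        "",
        "[interfaces]",
        # LAN multicast discovery, no configuration needed.
        "  [[Default Interface]]",
        "    type = AutoInterface",
        "    enabled = Yes",
    ]
    for i, (host, port) in enumerate(backbone, start=1):
        # One outbound TCP client interface per backbone peer.
        lines += [
            "",
            f"  [[Backbone {i}]]",
            "    type = TCPClientInterface",
            "    enabled = Yes",
            f"    target_host = {host}",
            f"    target_port = {port}",
        ]
    return "\n".join(lines) + "\n"
```

For example, `render_rns_config([("rns.example.net", 4242)])` yields one `AutoInterface` and one backbone block. Upstream RNS reads its config from `~/.reticulum/config` by default; RNode interfaces would be appended by the separate rnode recipe.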
## Onboarding model

Token-based (paste OR QR, user picks). The Misskey signup page mints a
**pre-auth key** (TTL=24h, single-use, regenerated per signup). On
first boot, Veilor accepts a hex paste OR a QR scan of the same key.

**NOT auto-OIDC at first boot** — too much Authentik exposure for
day-zero users.

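Both ends of that flow are thin. A minimal Python sketch, assuming the signup backend shells out to the Headscale CLI and that keys are plain hex as described above; `mint_cmd` and `normalize_pasted_key` are hypothetical helpers, and depending on the Headscale version `--user` takes a user name or a numeric user ID:

```python
"""Sketch of the onboarding key path: server-side minting argv and
client-side normalization of a pasted key."""
import re
import string
from typing import List

KEY_TTL = "24h"


def mint_cmd(user: str) -> List[str]:
    # No --reusable flag, so the minted key is single-use.
    return ["headscale", "preauthkeys", "create", "--user", user, "--expiration", KEY_TTL]


def normalize_pasted_key(raw: str) -> str:
    # Pastes from mail/chat may carry spaces or wrapped lines: drop all whitespace.
    key = re.sub(r"\s+", "", raw).lower()
    if not key or any(c not in string.hexdigits for c in key):
        raise ValueError("pre-auth key must be a hex string")
    return key
```

The QR path can feed its decoded text through the same `normalize_pasted_key`, so both input methods end in one validation step.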
## Tier model — three-tier

- `tag:admin` — onyx + failsafe. Full mesh, `*:*`.
- `tag:infra` — nullstone, office. Mesh among themselves; admin
  inbound only.
- `tag:guest` — Veilor OS users + friend. ONLY `x.veilor:443`
  reachable, plus future seeded-service hostnames as they are
  allowlisted.
- **Failsafe** — pre-baked admin pre-auth key on a YubiKey + printed
  paper, with the Authentik OIDC group `tailnet-admin` as a second
  auth path.

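As a Headscale ACL policy (the Tailscale-style `groups`/`tagOwners`/`hosts`/`acls` layout), the three tiers reduce to three accept rules on top of default-deny. A Python sketch of that policy plus a toy evaluator to check the intent; the `x-veilor` alias, its `100.64.0.10` address and the group contents are hypothetical placeholders, and `reachable` only understands the three `dst` forms used here:

```python
"""Sketch: the three-tier tailnet policy for Headscale, as a Python dict."""
import json

POLICY = {
    # Hypothetical membership; the real list would come from Authentik.
    "groups": {"group:tailnet-admin": []},
    "tagOwners": {
        "tag:admin": ["group:tailnet-admin"],
        "tag:infra": ["group:tailnet-admin"],
        "tag:guest": ["group:tailnet-admin"],
    },
    # Hypothetical alias and address for x.veilor.
    "hosts": {"x-veilor": "100.64.0.10"},
    "acls": [
        # tag:admin: full mesh.
        {"action": "accept", "src": ["tag:admin"], "dst": ["*:*"]},
        # tag:infra: mesh among themselves; admin reaches them via the rule above.
        {"action": "accept", "src": ["tag:infra"], "dst": ["tag:infra:*"]},
        # tag:guest: x.veilor:443 only; seeded services get appended here later.
        {"action": "accept", "src": ["tag:guest"], "dst": ["x-veilor:443"]},
    ],
}


def reachable(policy: dict, src_tag: str, dst: str) -> bool:
    """Toy evaluator for this policy only: '*:*', 'X:*' and exact 'X:port'."""
    for rule in policy["acls"]:
        if src_tag not in rule["src"]:
            continue
        for allowed in rule["dst"]:
            if allowed in ("*:*", dst):
                return True
            if allowed.endswith(":*") and dst.rsplit(":", 1)[0] == allowed[:-2]:
                return True
    return False  # default-deny


policy_json = json.dumps(POLICY, indent=2)  # body of the policy file
```

No rule lets `tag:infra` reach admin and only the admin rule reaches infra from outside, which is how "admin inbound only" falls out of default-deny.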
## Threat floor table

| Floor | Attack | Day 1 (v0.7 ship) | Phase 2 (v0.8) |
|-------|--------|---|---|
| (i) | ISP blocks `s8n.ru` DNS | Tailscale dies, Yggdrasil survives | YES (documented failover) |
| (ii) | ISP blocks Tailscale protocol | Yggdrasil-WSS:443 survives | YES |
| (iii) | Internet unreachable | RNS over LoRa survives | OPT-IN (RetiNet + RNode) |

Day 1 must hold floor (i). Floors (ii) and (iii) become Phase 2
targets once Yggdrasil is promoted from idle to documented failover.

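The table reads as a dependency check: a layer survives a floor when nothing it depends on is taken away. A Python sketch of that model; the dependency sets are our reading of this doc (Tailscale needs `s8n.ru` DNS, its own protocol and the internet; Yggdrasil-WSS needs only the internet because its public peer is not under `s8n.ru`; RNS over RNode needs none), not measured behavior, and `survivors` is a hypothetical helper:

```python
"""Sketch: the threat-floor table as a dependency check."""
from typing import Dict, List, Set

# What each layer needs to keep working (our reading of this doc).
LAYER_DEPS: Dict[str, Set[str]] = {
    "tailscale": {"s8n.ru DNS", "tailscale protocol", "internet"},
    "yggdrasil-wss": {"internet"},   # public peer is not under s8n.ru
    "reticulum-lora": set(),         # RNode radio link, no internet needed
}

# What each floor's attack takes away.
FLOORS: Dict[str, Set[str]] = {
    "(i)": {"s8n.ru DNS"},
    "(ii)": {"tailscale protocol"},
    "(iii)": {"internet"},
}


def survivors(floor: str, enabled: Set[str]) -> List[str]:
    """Enabled layers whose dependencies are all untouched by this floor."""
    broken = FLOORS[floor]
    return sorted(layer for layer in enabled if not (LAYER_DEPS[layer] & broken))


DAY1 = {"tailscale", "yggdrasil-wss"}  # Reticulum is opt-in
```

With the Day 1 set, floors (i) and (ii) leave Yggdrasil standing and floor (iii) leaves nothing until the Reticulum opt-in is added, matching the table.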
## Iroh seeding daemon (Phase 2 / v0.8)

- `veilor-seed.service` systemd unit, runs as the `_veilor-seed` user.
- Watches `/var/lib/<service>/files/` blob store directories.
- BLAKE3-hashes new blobs and registers them with the local iroh node.
- Publishes tickets on a per-service `iroh-gossip` topic.
- LRU local cache, default 10 GB.
- Sidecar mirrors service blob stores: Misskey `/files/`, Matrix
  media, `dl.veilor` downloads.
- Other Veilor nodes pull lazily on cache miss.
- **DEFER DB replication forever.** Static media only.

DOCUMENT but DO NOT IMPLEMENT until **Iroh hits 1.0** (currently in the
0.96–0.98 RC series; the Q1 2026 1.0 target has slipped, and we are
watching it).

Reference: <https://github.com/n0-computer/iroh-blobs/blob/main/DESIGN.md>.

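Of those pieces, the LRU cache has an invariant worth pinning down before Iroh 1.0: content-addressed puts, and least-recently-used eviction that never lets the cache exceed its byte budget. A minimal Python sketch with a hypothetical `BlobCache` class; Python's `hashlib` has no BLAKE3, so SHA-256 stands in for the BLAKE3 hashes iroh uses, and blobs live in memory rather than on disk:

```python
"""Sketch of the seed daemon's local cache: content-addressed blobs with
LRU eviction under a byte budget. SHA-256 stands in for BLAKE3."""
import hashlib
from collections import OrderedDict
from typing import Optional

DEFAULT_BUDGET = 10 * 1000**3  # 10 GB


class BlobCache:
    def __init__(self, budget: int = DEFAULT_BUDGET) -> None:
        self.budget = budget
        self.used = 0
        self._blobs: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()  # stand-in for BLAKE3
        if digest in self._blobs:
            self._blobs.move_to_end(digest)  # already cached: just refresh recency
            return digest
        if len(data) > self.budget:
            raise ValueError("blob larger than cache budget")
        self._blobs[digest] = data
        self.used += len(data)
        while self.used > self.budget:
            # Evict least recently used; never the new blob, since it fits alone.
            _, evicted = self._blobs.popitem(last=False)
            self.used -= len(evicted)
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        data = self._blobs.get(digest)
        if data is not None:
            self._blobs.move_to_end(digest)  # a hit counts as a use
        return data
```

In the daemon, eviction would only drop the local copy; other nodes can still fetch the blob by hash on their own cache miss.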
## External dependency — Phase 0 (NOT veilor-os scope)

Real ACL gap on nullstone Traefik right now: a friend on `tag:guest`
can reach `nullstone:443`, which SNI-routes to ALL Traefik vhosts
(`sys.s8n.ru`, `pihole.s8n.ru`, `hs.s8n.ru`, `auth.s8n.ru`, n8n, rc,
mx, …). Only per-vhost auth blocks them. The `no-guest@file` Traefik
middleware that should fix this is currently a `0.0.0.0/0`
allow-all stub (neutralized 2026-05-03 after the X-Forwarded-For
chain broke).

**veilor-os does NOT fix this.** Tracked here as an external
dependency: ACL fix on nullstone Traefik **required before veilor-os
first-public-ISO ships**, otherwise `tag:guest` provisioning leaks
the full vhost surface to every veilor user. Parent operator owns it.

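For the parent operator's fix, the decision `no-guest@file` has to make is: which address is the client, and is it in the allowlist. A Python sketch of that decision as we understand Traefik's IP allowlist middleware with an `ipStrategy.depth`: depth 0 uses the TCP peer address, depth N takes the Nth `X-Forwarded-For` entry from the right, and a depth past the end of the header yields no IP, so nothing can match. The ranges and the guest address in the test are hypothetical placeholders:

```python
"""Sketch: the decision a working no-guest@file middleware has to make."""
import ipaddress
from typing import Optional, Sequence

# Hypothetical allowlist: LAN plus the admin/infra tailnet addresses.
SOURCE_RANGE = ("192.168.0.0/24", "100.64.0.1/32", "100.64.0.2/32")


def effective_client_ip(remote_addr: str, xff: str, depth: int) -> Optional[str]:
    """Pick the client IP the way an ipStrategy depth does (as we understand it)."""
    if depth <= 0:
        return remote_addr  # no depth: trust the TCP peer address
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    if depth > len(hops):
        return None  # header too short: no client IP
    return hops[-depth]  # Nth entry counting from the right


def allowed(ip: Optional[str], source_range: Sequence[str] = SOURCE_RANGE) -> bool:
    if ip is None:
        return False  # an unknown client never matches the allowlist
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in source_range)
```

Returning `None` for a short header, rather than falling back to the TCP peer, keeps a truncated XFF chain from turning into an allow.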
## Strategic credibility win

secureblue does NOT publish a threat model. Athena OS does, and it's
their main differentiator. We've already drafted
`docs/THREAT-MODEL.md` (Agent 5 of the 2026-05-05 wave). Publishing it
*before* the v0.7 launch positions veilor-os ahead of secureblue and
Athena on the one axis that matters most for a security-branded
distro: **an honest, scoped, public threat model**.

## Roadmap implications

| Version | Status | Path |
|---|---|---|
| v0.5.31 | shipped | Anaconda kickstart, mutable root |
| v0.5.32 | active — top blockers from 9-agent wave | Anaconda kickstart |
| v0.5.x → v0.6 | maintenance | Anaconda kickstart, ergonomics + UX polish |
| **v0.7 spike** | **1-day BlueBuild prototype** (was 2 days; `ostreecontainer` removes first-boot-rebase work) | First veilor OCI image extending `securecore-kinoite-hardened-userns` |
| v0.7 ship | ISO bootstraps install, `ostreecontainer` populates from OCI in-pass | Hybrid path live |
| v0.8 | Iroh seeding (P2P static media), Yggdrasil promoted from idle to documented failover, RetiNet stabilization watch | bootc-only direction |
| **v1.0** | **bootc-only**, kickstart deprecated, possibly migrate `ostreecontainer` → new `bootc` kickstart command if multi-disk + auth-registry blockers are resolved upstream | `bootc upgrade` for all updates |

The Containerfile-from-scratch spike plan (Agent 3 of the 2026-05-05
wave) is **superseded** by this hybrid: don't build a Containerfile
from scratch on `fedora-bootc:43`. Instead, write a BlueBuild recipe
on `securecore-kinoite-hardened-userns`. With the `ostreecontainer`
swap, the spike compresses from 1 week to 1 day.

## Next concrete steps

### v0.5.32 — current (no strategy change)

Ship the 7 blockers from `docs/research/2026-05-05-agent-wave/`:
suspend/resume Wi-Fi fix, firstboot `WantedBy`, USBGuard id-rules,
firewalld `tailscale0` zone, KMS modeset, `/etc/skel` branding,
virtio-9p log capture.

The `ostreecontainer` swap **does NOT land in v0.5.32 main.** It belongs
in the v0.7 spike branch only.

### v0.7-spike (1 day, separate branch)

1. New repo dir: `bluebuild/recipe.yml`.
2. Base: `base-image: ghcr.io/secureblue/securecore-kinoite-hardened-userns`
   with `image-version: latest`.
3. Override modules:
   - `type: files` — stamp our `overlay/*` tree (branding, themes,
     veilor scripts, SDDM theme, Plymouth theme).
   - `type: rpm-ostree` — install Mullvad Browser, restore Xwayland,
     and re-enable sudo (revert run0).
   - **Keep Trivalent** as default (it was wrongly marked for removal in
     the first draft of this doc).
   - `type: brand` — PRETTY_NAME, GRUB_DISTRIBUTOR, distributor URL.
   - `type: files` — pre-disabled `tailscaled.service`, idle
     `yggdrasil.service`, and the `ujust install-reticulum` and
     `ujust install-thorium` recipes.
4. `.github/workflows/build-bluebuild.yml` — pull the BlueBuild action,
   build, cosign-sign, and push to GHCR.
5. `kickstart/install.ks` — replace the `%packages` block with
   `ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry`.
   Keep the existing partitioning + LUKS block verbatim. **Drop** all
   planned `veilor-firstboot-rebase.service` work; it is no longer needed.

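Steps 1 to 3 amount to a short recipe. A Python sketch that builds it as a dict and emits JSON, which is also valid YAML 1.2 and so should parse as `recipe.yml`; the top-level keys follow BlueBuild's recipe layout, but the module contents are assumptions: the package names are hypothetical and need checking against the repos secureblue enables, and the `brand` module and unit-file drops are left out:

```python
"""Sketch: bluebuild/recipe.yml for the v0.7 spike, built as a dict.
Package names are hypothetical placeholders, not verified."""
import json

RECIPE = {
    "name": "veilor-os",
    "description": "veilor-os on secureblue securecore-kinoite-hardened-userns",
    "base-image": "ghcr.io/secureblue/securecore-kinoite-hardened-userns",
    "image-version": "latest",
    "modules": [
        # Stamp the overlay tree: branding, themes, veilor scripts, SDDM/Plymouth.
        {"type": "files", "files": [{"source": "overlay", "destination": "/"}]},
        # Mullvad Browser, Xwayland back, sudo back (reverting run0).
        # Deliberately no "remove" list: Trivalent stays the default browser.
        {
            "type": "rpm-ostree",
            "install": ["mullvad-browser", "xorg-x11-server-Xwayland", "sudo"],
        },
    ],
}


def render(recipe: dict) -> str:
    """Serialize the recipe; JSON output is also valid YAML 1.2."""
    return json.dumps(recipe, indent=2) + "\n"
```

Keeping the recipe as data makes the "Trivalent is kept" rule checkable: a test can assert that no module carries a `remove` list.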
### v1.0 — bootc-only

- Drop `kickstart/veilor-os.ks`, drop the `livecd-creator` workflow.
- The bootstrap ISO is built as a **separate artifact** (NOT via
  `bootc-image-builder anaconda-iso`, which was deprecated in
  image-builder v44).
- The OCI image is the source of truth.
- `veilor-update` becomes a thin `bootc upgrade --apply` wrapper.
- Migrate the `ostreecontainer` directive → new `bootc` kickstart
  command IF multi-disk + authenticated-registry support has landed
  upstream by then.

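A thin wrapper can be very thin. A minimal Python sketch of `veilor-update`, assuming only `bootc upgrade --check` (ask whether a newer image exists) and `bootc upgrade --apply` (stage it and reboot into it); the `--dry-run` flag and the function names are hypothetical additions for testing:

```python
"""Sketch: veilor-update as a thin wrapper over bootc (v1.0)."""
import argparse
import subprocess
from typing import List, Optional


def build_argv(check_only: bool) -> List[str]:
    # --check only queries for a newer image; --apply stages it and reboots.
    if check_only:
        return ["bootc", "upgrade", "--check"]
    return ["bootc", "upgrade", "--apply"]


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="veilor-update")
    parser.add_argument("--check", action="store_true",
                        help="only report whether an update is available")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the bootc command instead of running it")
    args = parser.parse_args(argv)
    cmd = build_argv(args.check)
    if args.dry_run:
        print(" ".join(cmd))
        return 0
    # bootc's exit status becomes ours; no retry or rollback logic here.
    return subprocess.run(cmd).returncode
```

Wiring `main` up as the installed entry point (`sys.exit(main())`) is left to packaging; rollback stays with `bootc rollback` itself rather than the wrapper.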
## Open questions

- Does secureblue accept upstream contributions? If yes, send our
  USBGuard id-based-rules fix and our threat-model framework.
- Recovery flow when the `ostreecontainer` install pass fails: Anaconda
  should abort cleanly; verify in the spike that no half-installed
  state is bootable.
- Iroh 1.0 timing: currently 0.96–0.98 RC; the Q1 2026 target slipped.
  Re-evaluate the Phase 2 schedule when 1.0 lands.
- RetiNet upstream stabilization: track the Codeberg fork for releases.
  If it stalls for more than 6 months, we re-evaluate Layer 3.
- Fedora 44 transition: secureblue tracks Fedora releases (currently
  `v4.9` on F44). If we follow, we get F44 for free at the same time
  upstream does.

## Self-hosted git + CI (locked 2026-05-05)

The primary git host has moved off github.com. **Forgejo** runs on
nullstone at `git.s8n.ru`, with **forgejo-runner** doing the build
work. The GH free-tier minute quota was throttling veilor-os iteration,
so we self-host now.

- Primary remote: `ssh://git@192.168.0.100:222/veilor-org/veilor-os.git`
  (Forgejo; LAN-only until the router port-forward 222 → nullstone:222
  is added (TODO), or use the tailnet hostname once Tailscale is
  logged in).
- Public mirror: `https://github.com/veilor-org/veilor-os.git`. Forgejo
  push-mirrors every commit + every 8h, so GH stays in sync without
  consuming GH minutes.
- Runner labels: `ubuntu-24.04` (catthehacker image; works for our
  current `build-iso.yml` unmodified) and `nullstone` (privileged
  Fedora 43 container, opt-in via `runs-on: nullstone`).
- Build cost: 0 GH minutes. Disk: ~80 GB workspace on `/home/docker`.
- External TODO (parent operator owns): router port-forward
  222 → nullstone:222 for public SSH push, and a `no-guest@file`
  allowlist update for external web UI access.

Deploy artifacts: `~/ai-lab/nullstone-server/forgejo/`. Runbook in the
same dir.

## See also

- `docs/THREAT-MODEL.md` — drafted, needs publishing for v0.7
- `docs/ROADMAP.md` — updated to reflect this strategy
- `docs/research/2026-05-05-agent-wave/03-bootc-spike-plan.md` —
  superseded by this hybrid (kept as reference for the
  Containerfile-from-scratch alternative)
- secureblue: <https://github.com/secureblue/secureblue>
- BlueBuild: <https://blue-build.org>
- bootc / ostreecontainer docs: <https://docs.fedoraproject.org/en-US/bootc/>
- Yggdrasil: <https://github.com/yggdrasil-network/yggdrasil-go>
- Reticulum manual: <https://reticulum.network/manual/>
- Iroh blobs design: <https://github.com/n0-computer/iroh-blobs/blob/main/DESIGN.md>