Compare commits

...

77 commits

Author SHA1 Message Date
s8n
5c961eba88 Update docs/DOCS-DOCS.md
Some checks failed
Lint / Kickstart syntax (push) Failing after 1s
Lint / Shell scripts (push) Failing after 6s
Lint / No personal/onyx leaks (push) Failing after 3s
2026-05-06 18:03:37 +01:00
obsidian-ai
8c70030d80 docs(ROADMAP): pivot — v0.6 cancelled, v0.7 BlueBuild OCI is mainline
Strategy pivot 2026-05-06: v0.5.32 produced a green ISO on Forgejo
runner. That's the kickstart-path proof point. Continuing v0.6
kickstart polish is sunk-cost work on tooling retired at v1.0.

Pivot:
- v0.5.0 is the FINAL kickstart-path release. Tag, freeze, ship.
- v0.6 cancelled as a milestone. Original plan kept inline as
  HISTORICAL reference.
- v0.7 promoted to primary active milestone. Absorbs the v0.6
  ergonomic CLI tools (veilor-postinstall / veilor-doctor /
  veilor-update) with bootc upgrade replacing dnf upgrade.
- Active branch: v0.7-bluebuild-spike. All future feature work lands
  there, not on main.
2026-05-06 15:55:08 +01:00
obsidian-ai
89c7df0ecc docs: add PROOF-OF-WORK.md — receipts of work, tooling, and decisions
Single document that surfaces the depth of work behind veilor-os:
metrics, distros studied, every tool traversed in the build chain,
all 35+ failure classes hit and beaten, key engineering decisions and
why, what's in the repo beyond the kickstart, and the self-hosted
nullstone CI infrastructure built to support it.

Receipts not narrative — every claim links back to a file path,
commit, error, or config. Useful as portfolio anchor and as a single
read-this-first for anyone returning to the project after a gap.
2026-05-06 15:51:40 +01:00
obsidian-ai
c2b4df8ef9 ci: gate cosign/sbom/attest steps to github only
cosign keyless sign uses Sigstore Fulcio which requires a
Fulcio-trusted OIDC issuer. Forgejo runs don't have one, so cosign
falls back to the interactive device flow and times out
(error obtaining token: expired_token). Same applies to
attest-build-provenance and the SBOM action's signed attestation.

Skip all three on Forgejo for now; ISO + sha256 are sufficient for
v0.5.x test releases. Re-add when we self-host a Sigstore stack or
sign with a key-pair instead of keyless.
2026-05-06 15:41:00 +01:00
obsidian-ai
b9df392fbc docs(README): tone down secureblue credit (no code lifted yet)
We layer on their OCI image as v0.7 base; we don't redistribute their
source. Drop the AGPLv3-attribution prose — that becomes relevant only
if/when we ship a verbatim chunk of their config/policy in our repo.
2026-05-06 15:38:35 +01:00
obsidian-ai
84fa325e46 docs(README): add secureblue column + upstream credit section
secureblue (AGPLv3) is the upstream hardened atomic Fedora that the
v0.7 BlueBuild spike layers on top of. Comparison table now includes
secureblue alongside Kicksecure + stock Fedora KDE. New "Credit &
relationship to secureblue" section spells out where their work
already solves problems we don't need to reinvent (Trivalent,
SELinux policy, kernel cmdline, signed OCI), how veilor-os differs
(kickstart install path + branding + Forgejo CI), and the AGPLv3
attribution rule for any code we lift verbatim.
2026-05-06 15:32:22 +01:00
obsidian-ai
1e4ca2b56b ci: symlink /work -> GITHUB_WORKSPACE for ks %post SRC probe 2026-05-06 14:58:24 +01:00
obsidian-ai
446c602683 ks: drop apparmor-* packages — not in Fedora 43 repos
Build error: 'Failed to find package apparmor-parser : No match for
argument'. Fedora 43 base and updates do not ship AppArmor packages;
the prior comment was incorrect. Defer AppArmor to v0.7 secureblue OCI
hybrid (which has its own LSM stack), or land via COPR overlay later.
2026-05-06 14:48:06 +01:00
obsidian-ai
ac5c29df42 ci: run build directly in Fedora job container, drop addnab nest
forgejo-runner labels nullstone -> fedora:43 image. Switching
runs-on: ubuntu-24.04 -> nullstone makes the job container itself
the build environment, eliminating the docker-in-docker workspace
bind-mount problem (host path != act-container path).

Build now runs as root in fedora:43, installs livecd-tools directly
via dnf, and writes outputs to $GITHUB_WORKSPACE which is the natural
runner workdir on host. No nested docker, no userns juggling, no
explicit -v workspace bind needed.
2026-05-06 14:35:44 +01:00
obsidian-ai
6f4842a75c ci: add /work diagnostic before sed-redirect to surface bind/perm issue 2026-05-06 14:19:13 +01:00
obsidian-ai
4b90e7e00b ci: repin fedora:43 build container to amd64 digest
Prior pin was the arm64 manifest digest (linux/arm64/v8); on x86_64
host it failed with `exec /usr/bin/sh: exec format error`. Pinned to
the amd64 manifest entry from the same fat-manifest.
2026-05-06 14:10:25 +01:00
obsidian-ai
7a0c665cf0 ci: add --userns=host to nested Fedora build container
Forgejo runner on nullstone runs against a daemon with
userns-remap=default. addnab/docker-run-action launches the Fedora 43
build container with --privileged, which is incompatible with
userns-remap unless --userns=host is also set.
2026-05-06 14:07:22 +01:00
obsidian-ai
d38fce4cb8 ci: pin sbom/cosign/attest actions to node20-safe versions
forgejo-runner v6.4.0 ships node20; floating tags @v0/@v3/@v2 now
resolve to actions whose runs.using=node24, which the runner cannot
exec. Pin to last node20-shipping release of each:

- anchore/sbom-action@v0.17.2
- sigstore/cosign-installer@v3.7.0
- actions/attest-build-provenance@v2.2.3
2026-05-06 13:57:49 +01:00
s8n
bc738c1c7b Merge pull request 'ci: gate softprops release steps + add Forgejo API equivalents' (#5) from feat/a1-forgejo-ci-adapt into main 2026-05-06 13:53:02 +01:00
s8n
a3f6c1a1a6 ci: gate softprops release steps + add Forgejo API equivalents
The build-iso workflow used softprops/action-gh-release@v2 unconditionally,
which only speaks the GitHub Releases REST API. When the workflow runs on
the Forgejo runner registered on nullstone, those steps would fail.

Add a server_url check so the GH-only path runs only on github.com, and
mirror it with a curl-based step that hits the Forgejo /api/v1/releases
endpoints. Behaviour:
  - github.com: identical to before (action-gh-release@v2).
  - git.s8n.ru: drop+recreate ci-latest release, upload chunked assets
                via the Forgejo attachments API.

Tag-driven "Attach to release" path mirrored the same way.

Refs: A1 build-eng task — Forgejo runner adaptation.
2026-05-06 13:52:43 +01:00
s8n
356013e1ca Merge pull request 'feat(installer): v0.6 ergonomics + polish — 5 quick wins' (#3) from feat/ux-installer-v06-polish into main 2026-05-06 13:47:35 +01:00
s8n
417acb5585 Merge pull request 'sec: AppArmor v0.6 stub — load profiles in complain mode' (#11) from feat/sec-apparmor-v06-stubs into main 2026-05-06 13:47:31 +01:00
s8n
df574e00f5 Merge pull request 'ci: cosign keyless sigs, SBOM, provenance + fedora digest pin' (#7) from feat/sre-cosign-sbom-attestation into main 2026-05-06 13:47:27 +01:00
s8n
3e660534a1 Merge pull request 'ci: pin actions to node20-safe tags + runner sock pass-through' (#8) from feat/runner-fix-docker-sock-and-node20 into main 2026-05-06 13:47:20 +01:00
s8n
749bcef5b4 Merge pull request 'sec: polish THREAT-MODEL.md for v0.7 public launch' (#10) from feat/sec-threat-model-polish into main 2026-05-06 13:46:58 +01:00
s8n
77ed91ed8e Merge pull request 'docs: test run report skeleton for v0.5.32 (Forgejo build)' (#4) from feat/docs-test-run-v0.5.32 into main 2026-05-06 13:46:12 +01:00
s8n
ad059ec73e docs: README de-GH + Forgejo build status 2026-05-06 13:45:33 +01:00
s8n
05d37f6419 docs: METHOD-CHANGELOG 2026-05-06 forgejo entry 2026-05-06 13:45:29 +01:00
s8n
eafb8b7aa1 sec: AppArmor v0.6 stub — load profiles in complain mode
Per docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md (v0.6
scope item 1).

Adds:
  - apparmor-parser apparmor-utils apparmor-profiles to %packages in
    BOTH kickstart/veilor-os.ks (live ks) and overlay/usr/local/bin/
    veilor-installer (generated install ks heredoc).
  - scripts/40-apparmor.sh — wires aa-complain on every veilor-shipped
    profile. Idempotent. "loaded, present, nothing breaks".
  - overlay/etc/apparmor.d/veilor.d/firefox — 1-liner stub (binary
    confinement marker only; full policy post-v0.6).
  - overlay/etc/apparmor.d/veilor.d/thunderbird — same pattern.
  - Wired 40-apparmor.sh into install %post chain after
    30-apply-v03-theme.sh.

Complain mode means: profiles loaded, kernel logs syscall denials but
does NOT enforce. Operator can review audit.log post-install to
inform v0.7 policy authoring.
2026-05-06 11:15:30 +01:00
s8n
d0738970e0 sec: polish THREAT-MODEL.md for v0.7 public launch
Status flipped Draft → Final.

In-scope rows now cite specific config files / settings (auditable
from clean checkout):
  - LUKS2 params from kickstart/veilor-os.ks
  - sysctl knobs file path
  - USBGuard policy mode + rule type
  - sshd_config drop-in path + every directive
  - auditd rule path + watched paths
  - chrony NTS endpoints
  - systemd-resolved DoT settings
  - bootloader kernel args (lockdown, slab_nomerge, init_on_alloc/free, etc.)

Out-of-scope rows un-hedged. 'May not always' phrasings removed; each
adversary states unambiguously what veilor-os does NOT do.
2026-05-06 11:14:34 +01:00
obsidian-ai
7974ed7a6e ci: pin actions to node20-safe tags + runner sock pass-through
forgejo-runner v6.4.0 ships a node20 javascript engine. v4.2+ of
actions/checkout and v2.0.5+ of softprops/action-gh-release moved to
node24, which the runner refuses to exec. Pin both to last node20
release.

Pairs with a runner-side config change (separately deployed on
nullstone /home/docker/forgejo-runner/conf/config.yaml) that adds
`-v /var/run/docker.sock:/var/run/docker.sock` to per-job container
options + whitelists the socket via valid_volumes — without that
addnab/docker-run-action@v3 inside the catthehacker/ubuntu job
container can't reach the docker engine.

- actions/checkout v4 -> v4.1.7
- softprops/action-gh-release v2 -> v2.0.4
- addnab/docker-run-action v3 unchanged (composite/docker, no node)
- ludeeus/action-shellcheck@master unchanged (docker-based)
2026-05-06 10:50:15 +01:00
veilor-org
f06ee5cc1c chore: gitignore auto-install-vm test artifacts
Test rig was leaking test/auto-install-vm.{nvram,qcow2} into untracked
state. Pattern matches existing veilor-vm.* exclusion.
2026-05-06 10:50:04 +01:00
veilor-org
130f0432dd ci: TODO marker for SHA-pinning third-party actions
Note that all `uses:` directives still resolve to mutable major-
version tags. SHA-pinning is the Agent 8 audit recommendation but
requires per-action web lookups that stalled the previous SRE
attempt; tracked separately so this PR can land first.
2026-05-06 10:41:19 +01:00
veilor-org
08f16bb2ee ci: pin fedora:43 base image to digest
Pin registry.fedoraproject.org/fedora:43 to its current manifest
digest so a malicious or accidental tag-rewrite upstream cannot
silently change the base layer of every CI build. Digest was
captured via `skopeo inspect --raw` on 2026-05-06. Refresh
procedure documented inline.
2026-05-06 10:41:10 +01:00
veilor-org
25b8d30f35 ci: add cosign keyless sigs, SBOM, and provenance attestation
Sign each ISO chunk with cosign keyless OIDC, generate an SPDX SBOM
of the build output, and attach an in-toto build-provenance
attestation. Sigs/certs/SBOM are uploaded alongside the ISO parts in
the ci-latest rolling prerelease so the test/auto-install.sh path
can verify before reassembling.

Action versions are major-version tags (@v3, @v0, @v2). SHA-pinning
is tracked separately to keep this PR small and avoid the long web
lookups that stalled the previous attempt.
2026-05-06 10:40:56 +01:00
veilor-org
aa731f9daa docs: test run report skeleton for v0.5.32 (Forgejo build)
First test-runs/ report off the new template. Records the build host
(forgejo-runner on nullstone, ubuntu-24.04 / catthehacker:act-24.04),
notes that v0.5.32 is the first ISO produced after the GH Actions
mirror was disabled, and pre-populates the Findings section with the
7 v0.5.32 blocker fixes from the 2026-05-05 9-agent wave as expected
behaviours the tester must verify.

Result is left as "pending A1 build" — the operator + A5 fill in
per-step pass/fail and hardening output once the actual VM walkthrough
runs against the produced ISO. This is intentional: the report is the
scaffold; the test is a separate step.
2026-05-06 10:34:06 +01:00
veilor-org
441f7d057f feat(installer): promote eject-media reminder to its own box
In v0.5 the "Remove the install media" reminder was a single line
inside the green success box, and operators on both onyx and the
friend's RTX 4080 rig missed it — rebooted into the live ISO and
re-ran the installer thinking the install had silently failed.

Promote the reminder to its own loud yellow thick-bordered gum-style
box stacked directly below the success/countdown box, with three
lines of explanation. Renders for the full 10s of the countdown so
it stays in the operator's face the entire window.
2026-05-06 10:32:30 +01:00
veilor-org
816fc0ee68 feat(installer): 10s reboot countdown with per-tick redraw
v0.5 used `sleep 5` after a static "System will reboot in 5 seconds."
box, which left the operator guessing how much time was left to grab
the USB stick. The new loop runs 10 → 1, clearing + redrawing the
gum-style success box each tick with the remaining-seconds figure,
giving the operator a visible window to act.

10 seconds (vs 5) because real hardware operators were missing the
window — laptops with the USB on the far side of the dock take
4-5 seconds to physically reach. 10 is comfortable, not annoying.
2026-05-06 10:32:11 +01:00
veilor-org
44f0c787a7 feat(installer): confirm-twice for LUKS passphrase + admin password
A typo in the LUKS passphrase is unrecoverable — the disk is
unmountable without it and we don't escrow the key. Re-prompting
until the two reads match catches keyboard-layout surprises (the
US/UK quote-key position is the most common one) before they brick
the install.

Admin password gets the same treatment for consistency. Less
catastrophic (resettable from a recovery shell) but a mismatch
still locks the user out of their fresh install on first boot.

Loop bails on cancel/ESC and re-prompts on validate_pw failure.
2026-05-06 10:31:21 +01:00
veilor-org
900f5465b3 feat(installer): staged banner reveal at 40ms/line
Read banner.txt line by line with a 40ms sleep between each, then
clear and redraw the bordered gum-style version. 5-line banner ×
40ms = 200ms total reveal — slow enough to land an aesthetic on the
first frame, fast enough that the operator never feels it as lag.

Pure cosmetic; no functional change to the install flow.
2026-05-06 10:31:02 +01:00
veilor-org
63c5e199d9 fix(installer): swap gum input --password for bash read -srp
`gum input --password` corrupts the linux fbcon since v0.5.27 — the
bubbletea screen-restore writes back the previous menu buffer because
the framebuffer terminfo entry lacks `civis/cnorm` cursor-hide
sequences, leaving a duplicate "Install" plus a stray "T" rendered on
top of the password field. The fix is a single termios echo-off via
`read -srp`: no redraw, no glitch, no dependency on gum's TUI layer
for the one screen where it broke.

Header still rendered through `gum style` so visual parity with the
disk picker / confirm box is preserved. Whiptail fallback path
unchanged (passwordbox there has always rendered cleanly).
2026-05-06 10:30:06 +01:00
veilor-org
abb67841f1 docs: STRATEGY.md — primary git host moved to git.s8n.ru (Forgejo)
Self-hosted Forgejo + forgejo-runner on nullstone now primary.
GitHub becomes public mirror (Forgejo push-mirrors every commit
+ every 8h). 0 GH Actions minutes consumed.

Runner labels:
  ubuntu-24.04 — drop-in for existing build-iso.yml workflow
  nullstone    — privileged Fedora 43 (opt-in via runs-on: nullstone)

Deploy artifacts: ~/ai-lab/nullstone-server/forgejo/.

External TODO (parent operator owns):
  - router port-forward 222 → nullstone:222 for public SSH push
  - no-guest@file allowlist update for external web UI access
2026-05-06 02:01:06 +01:00
veilor-org
b86b4f9ec3 v0.5.32: ship 7 blockers from 9-agent wave
Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.

## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)

`veilor-modules-lock.service` runs `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, resume breaks wifi until
reboot. Same architectural class as the LUKS bug — security feature
breaks legitimate kernel state transitions.

The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.

Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.

## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)

Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).

Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.

## 3. USBGuard hash → id-based baseline (Agent 9, A3)

Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.

Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor renumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).

## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)

User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.

Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.

Default zone stays drop. Only the tailscale0 interface gets ACCEPT.

## 5. /etc/skel branding (Agent 7)

Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) pre-empty, so the moment user opened System Settings, KDE wrote
fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.

Seeds:
  /etc/skel/.config/kdeglobals    (copy of assets/kde/veilor-default.kdeglobals)
  /etc/skel/.config/breezerc      (copy of assets/kde/breezerc)
  /etc/skel/.config/kwinrc        (Plasma 6 wayland defaults: opengl, animspeed=0,
                                    blur off, click-to-focus)
  /etc/skel/.config/konsolerc     (default profile = Veilor)
  /etc/skel/.local/share/konsole/Veilor.profile + .colorscheme

User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.

## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)

Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).

Adds to bootloader cmdline (live + installed):
  i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
  rd.vconsole.keymap=us

`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).

## 7. virtio-9p log capture (Agent 6)

The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.

test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.

No-op on real hardware (9p tag absent) — VM-only debug.

## Validate

  bash -n overlay/usr/local/bin/veilor-installer  # OK
  ksvalidator kickstart/veilor-os.ks               # clean

## Out-of-scope for v0.5.32 (deferred to v0.6)

Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.

Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
2026-05-05 15:36:24 +01:00
veilor-org
7060d9aa6b docs: refine strategy — ostreecontainer install + mesh stack + browser stack
Refines docs/STRATEGY.md per parent-operator handoff (2026-05-05).
Locks in five things the original draft didn't cover, and corrects
one mistake.

## Refinement: ostreecontainer install path

The original draft proposed a two-step install: Anaconda partitions
+ kickstart, then on first boot a `veilor-firstboot-rebase.service`
runs `bootc rebase ghcr.io/veilor/veilor-os:43`. This commit drops
that step.

Anaconda's `ostreecontainer --url=... --transport=registry`
directive populates the root filesystem directly from the OCI image
during the install pass. No first-boot rebase, no transition
window, no second reboot. Same end state, simpler path.

Stay on `ostreecontainer` through v0.8. Do NOT migrate to the new
`bootc` kickstart command until v1.0 — it blocks multi-disk and
authenticated registries. Do NOT use `bootc-image-builder
anaconda-iso` output — deprecated in image-builder v44+. Produce
the OCI image and the bootstrap ISO as separate artifacts.

This compresses the v0.7 BlueBuild spike from 2 days → 1 day.

## Correction: keep Trivalent as default

The original strategy.md treated Trivalent (secureblue's hardened
Chromium) as an override-and-remove. That was wrong: Trivalent's
COPR tracks upstream M147+ within hours, ships hardened_malloc +
JIT-less + Drumbrake WASM. Default browser pick.

Mullvad Browser layered alongside for anti-fingerprint. Thorium
remains opt-in via `ujust install-thorium` only — its CVE lag is
months and contradicts the threat model. Never default.

## Mesh stack baked in

Three-layer warm-stack documented in STRATEGY.md:
- L3a Tailscale + Headscale (Day 1, daily driver)
- L3b Yggdrasil-go (Day 1, idle warm-fallback, AllowedPublicKeys mode)
- L3c Reticulum/RetiNet AGPL fork (opt-in via ujust install-reticulum)

Threat floor table: ISP-DNS-block (i, Day 1), ISP-Tailscale-block
(ii, Phase 2 promote Yggdrasil), internet-down (iii, opt-in RetiNet
+ RNode).

Tier model: tag:admin / tag:infra / tag:guest with failsafe pre-auth
key on yubikey + paper + Authentik OIDC group.

## Onboarding

Token paste / QR (user picks). Misskey signup mints reusable
24h-TTL pre-auth key. NOT auto-OIDC at first boot.

## Iroh seeding daemon stub (v0.8 / Phase 2)

`veilor-seed.service` documented but NOT implemented until Iroh hits
1.0 (current 0.96–0.98 RC, Q1 2026 target slipped). BLAKE3 +
iroh-gossip per-service topic. Static media only — DEFER DB
replication forever.

## External dependency tracked

nullstone Traefik `no-guest@file` ACL is currently 0.0.0.0/0
allow-all (XFF chain breakage 2026-05-03). Must be fixed before
veilor-os first-public-ISO ships, otherwise tag:guest provisioning
leaks the full vhost surface to every veilor user. Parent operator
owns the fix; explicitly out of veilor-os scope.

## Files

- docs/STRATEGY.md — full refinement
- docs/ROADMAP.md — v0.7 spike entry now reflects ostreecontainer
  + mesh stack + 1-day spike target
- README.md — drops the "v0.2.5 pre-release" badge + status box
  (out of date), adds bootc/atomic trajectory paragraph

## What did NOT change

- v0.5.x main branch is untouched. The ostreecontainer swap belongs
  in the v0.7 spike branch, NOT v0.5.32.
- nullstone Traefik config is untouched. Out of scope.
- The kickstart and overlay code is untouched.
2026-05-05 15:15:52 +01:00
veilor-org
50a241a603 docs: STRATEGY.md — hybrid kickstart bootstrap + bootc OCI on secureblue
Locks in the strategic decision from 2026-05-05 secureblue research
agent: pivot the technical base toward bootc/OCI, but as a layer over
secureblue's `securecore-kinoite-hardened-userns` rather than a
Containerfile-from-scratch.

## What changed

- New: `docs/STRATEGY.md` — full hybrid plan (kickstart bootstrap →
  first-boot bootc rebase → bootc-only at v1.0). Documents secureblue
  rationale, our overrides (drop Trivalent, restore sudo + Xwayland),
  next concrete steps for v0.7 spike (BlueBuild recipe + GH Actions
  workflow + `veilor-firstboot-rebase` one-shot).

- Updated: `docs/ROADMAP.md` v0.7 bootc-spike subsection — supersedes
  the Agent 3 Containerfile-from-scratch plan with the BlueBuild
  layering plan. Spike compresses 1 week → 2 days; hardening review
  inherited from 30 secureblue contributors.

## Why hybrid, not pure pivot

- Anaconda's LUKS UX (single passphrase prompt + custom
  partitioning) is mature; bootc-image-builder's installer is not yet
  on par. Keep the kickstart as the bootstrap.
- bootc upgrade gets us atomic A/B + signed image chain + instant
  rollback that we can't realistically build alone with our
  contributor count.
- The kickstart work is not lost — it becomes the day-zero installer
  through v0.7. v1.0 deprecates it entirely once bootc-image-builder
  installer ISO matures.

## Why secureblue, not Athena (Arch)

| Axis | secureblue | Athena OS |
|---|---|---|
| Maintainers | 30 | 8 |
| MAC enforcing OOB | SELinux + custom policy | AppArmor active, profiles mostly unconfined |
| Atomic / immutable updates | Yes (bootc/rpm-ostree) | No (rolling) |
| Threat model published | No | Yes |
| MS-signed Secure Boot shim | Yes (Fedora shim) | Yes (with auto-MOK) |

Athena's only structural advantage is the published threat model.
We're already drafting one (Agent 5 of 2026-05-05 wave) — we get
that win regardless. secureblue's contributor count + atomic update
infrastructure is the leverage.

## Strategic credibility win

Publishing `docs/THREAT-MODEL.md` BEFORE the v0.7 launch positions
veilor-os ahead of secureblue (no threat model) and Athena (has
threat model but smaller contributor base) on the one axis that
matters most.

## Open questions documented in STRATEGY.md

- secureblue contribution acceptance for upstream patches (USBGuard
  id-based-rules fix, threat model framework)
- Brave vs Mullvad-Browser pick for default browser
- bootc rebase first-boot fallback if rebase fails
- Fedora 44 transition timing follows secureblue's release tags
2026-05-05 15:05:59 +01:00
veilor-org
4e9782a18a docs: 9-agent research wave findings — v0.5.32 blocker map
Logs the full output of the 9-agent deep-dive run on 2026-05-05 to
docs/research/2026-05-05-agent-wave/. Pulls every actionable finding
into one indexed location so v0.5.32 planning has a paper trail.

Files:
  docs/research/2026-05-05-agent-wave/README.md             — index
  docs/research/2026-05-05-agent-wave/01-...real-hardware.md — Plymouth + LUKS edge cases
  docs/research/2026-05-05-agent-wave/02-...firstboot-ux.md  — SDDM + first-boot UX
  docs/research/2026-05-05-agent-wave/03-...spike-plan.md    — bootc-image-builder 1-week spike
  docs/research/2026-05-05-agent-wave/04-...tier-2.md         — AppArmor + nftables + audit + homed
  docs/research/2026-05-05-agent-wave/05-...launch.md         — threat model + v0.7 launch checklist
  docs/research/2026-05-05-agent-wave/06-...log-capture.md    — virtio-9p host-share for anaconda logs
  docs/research/2026-05-05-agent-wave/07-...skel-branding.md  — /etc/skel gap audit
  docs/research/2026-05-05-agent-wave/08-...ci-hardening.md   — SHA-pin actions + SBOM + SLSA L3
  docs/research/2026-05-05-agent-wave/09-...failure-modes.md  — real-hardware pessimistic audit

Plus the prior linter-applied:
  docs/ROADMAP.md      — Lessons learned section, v0.5.32 active block,
                          v0.6 promotion of veilor-postinstall + veilor-doctor,
                          v0.7 bootc spike scheduled
  docs/THREAT-MODEL.md  — drafted by Agent 5; in/out scope, comparison
                          matrix, v0.7 launch checklist

Top blockers identified for v0.5.32 (cross-cited in README):
  1. Suspend/resume wifi death (kernel.modules_disabled=1)
  2. veilor-firstboot.service WantedBy=graphical.target
  3. kernel-upgrade grub drift
  4. USBGuard hash-rules problem (already learned on onyx)
  5. firewalld blocks tailscale0
  6. /etc/skel/ empty
  7. virtio-9p log capture replaces broken virtio-serial path

Wave + verifier pattern (per ROADMAP lessons learned #4) validated:
9 parallel agents on distinct topics produced converging blocker
list. The same pattern landed v0.5.31 four-bug fix from the prior
4-agent verification wave on v0.5.30 outcome.
2026-05-05 14:52:53 +01:00
veilor-org
2788b95a12 v0.5.31: kernel-install via /etc/kernel/cmdline + set-e leak + rescue glob
Four-bug fix from 4-agent verification wave on v0.5.30 outcome.

Bug 1 CRITICAL: --location=none made anaconda skip CollectKernelArgumentsTask
(installation.py:149-151). --append= args never collected, BLS entries
wrote with empty cmdline. Drop --location=none, let anaconda do its
bootloader path; broad transaction_progress patch already silences the
gen_grub_cfgstub class failure.

Bug 2 CRITICAL: kernel-install reads /etc/kernel/cmdline as source of
truth (per 90-loaderentry.install:84-95). Veilor never wrote that file
so kernel-install fell through to /proc/cmdline (live ISO's). Add
3-path write: /etc/kernel/cmdline (Path A canonical), /etc/default/grub
(Path B legacy), grubby --update-kernel=ALL (Path C last-writer guard).
Plus explicit kernel-install add per kernel after Path A write.

Bug 3: rescue BLS glob *-0-rescue-*.conf required trailing hyphen;
F43 uses *-0-rescue.conf. Fix: *-0-rescue*.conf (matches both).

Bug 4: set +e/set -e scope leak in %post. v0.5.30 closed manual
bootloader block with set -e which re-enabled errexit for the rest of
%post that was authored with set +e semantics. Result: any
non-guarded command failure aborted the LUKS args injection block.
Fix: remove the closing set -e.

Files: overlay/usr/local/bin/veilor-installer.
Verified: bash -n clean, ksvalidator clean.
2026-05-05 13:47:23 +01:00
veilor-org
e83483a077 v0.5.30: broad error suppression + manual bootloader + virtio log capture
Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).

## Layer 1: broad error suppression in transaction_progress.py

dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.

v0.5.30 patch turns the `elif token == 'error':` branch in
process_transaction_progress into a log.warning. All four producers
(cpio_error, script_error, unpack_error, generic error) now flow
through to a warning + continue. Pattern matches both the original
anaconda layout AND the v0.5.29 narrow-patched layout, so re-applying
on top of either is a no-op.

This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.

## Layer 2: bootloader install moved out of anaconda

The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:

  1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64
     efibootmgr` — re-runs scriptlets in the chroot with full
     PID 1 systemd state, so the systemd-run-style triggers that
     anaconda's chroot truncates actually execute.
  2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi
     --bootloader-id=fedora --no-nvram` — populates /boot/efi/EFI/fedora/
  3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or
     `grub2-mkconfig` fallback) — writes /boot/efi/EFI/fedora/grub.cfg.
  4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`
     — registers the NVRAM boot entry pointing at the signed shim.

Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.

## Layer 3: virtio-serial log capture in run-vm.sh

Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.

We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.

## Diagnostic story

Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.

After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.

Files changed:
  kickstart/veilor-os.ks        — broad error suppression patch
  overlay/usr/local/bin/veilor-installer — --location=none + manual grub
  test/run-vm.sh                — virtio-serial chardev wiring

Verified: bash -n clean, ksvalidator clean.
2026-05-05 11:59:35 +01:00
veilor-org
613d35402e v0.5.29: narrow anaconda patch + LUKS UX + initramfs assertion
Five-fix bundle from 7-agent research wave on the v0.5.28-final
gen_grub_cfgstub failure.

## 1. Narrow the anaconda transaction_progress patch (CRITICAL)

The v0.5.28 patch was too broad. It rewrote
`process_transaction_progress` so every 'error' token in the
transaction queue became a `log.warning`. That queue carries four
distinct error classes:

  - cpio_error      — payload extraction (genuinely fatal)
  - script_error    — RPM 6.0 cmdline-mode scriptlet warning-as-error
                      (the ONE we want to ignore)
  - unpack_error    — payload corruption (genuinely fatal)
  - error           — generic transaction error (genuinely fatal)

By swallowing all four we silently masked grub2-efi-x64's posttrans
failure mid-install. /boot/efi/EFI/fedora/ ended up incomplete →
gen_grub_cfgstub then failed at the bootloader install phase with
"gen_grub_cfgstub script failed" because its `set -eu` script
couldn't read the missing files.

v0.5.29 narrows the patch: override only the `script_error` callback
inside transaction_progress.py to log a warning and NOT enqueue
'error'. The consumer (`process_transaction_progress`) reverts to
upstream behaviour where cpio_error / unpack_error / error still
raise PayloadInstallationError. Real install-fatal events keep
aborting; only the F43-RPM-6.0 scriptlet regression is silenced.

The patch is applied via `python3 -c` regex rewrite (more robust
than nested sed across multi-line method bodies).

## 2. LUKS UX — `tries=5,timeout=0` (FIX)

Default cryptsetup-generator unit allows ONE passphrase try with a
1m30s wait. One typo on a long passphrase = wait 1m30s, then the
device-wait timer trips, then dracut emergency shell after 3min total.
Brutal. Adding `rd.luks.options=luks-XXX=tries=5,timeout=0` gives
five typo-friendly retries with no auto-timeout.

## 3. fbcon=nodefer on installed-system cmdline (FIX)

Live ISO cmdline already has `fbcon=nodefer` (added in v0.5.27 to fix
the real-laptop black-screen-after-dracut). The installed-system
bootloader directive in the generated install ks did NOT carry it.
Same KMS handoff happens on the installed system on the same hardware.
Now both have the flag.

## 4. /etc/crypttab fallback assertion (BELT-BRACES)

Anaconda's custom-partitioning code path normally writes /etc/crypttab
for `--encrypted` part directives. Edge cases observed in F43+ where
it doesn't. Without crypttab, systemd-cryptsetup-generator can still
work from kernel cmdline alone, but cleanup paths and second-stage
unlock both fall over. Adding a fallback `echo` that writes the
canonical line if it's missing post-anaconda.

## 5. Initramfs LUKS module assertion (DEFENSIVE)

Force-include `crypt + systemd-cryptsetup + plymouth` modules in
initramfs via /etc/dracut.conf.d/10-veilor-luks.conf. dracut autodetects
these when it sees an active LUKS mapping, but %post runs before the
LUKS state is fully observable from the chroot. Plus we wipe stale
initramfs (`rm -f /boot/initramfs-*.img`) before `--regenerate-all`
so the regen actually rewrites bytes. Final assertion runs
`lsinitrd | grep -q cryptsetup` and surfaces a [ERR] line in build
output if the module didn't make it.

## What this should fix

After the man-db fix in v0.5.28-final, install proceeded past
"Configuring xxx" cleanly but died at "Installing boot loader" with
gen_grub_cfgstub. Root-cause was the over-broad patch from #1 above.

After v0.5.29:
  - Install transaction completes (man-db excluded; non-man-db
    scriptlet warnings still suppressed; real errors still raise)
  - gen_grub_cfgstub runs against complete /boot/efi/EFI/fedora/
  - Bootloader install completes
  - Reboot to disk lands at GRUB veilor-os entry
  - Kernel + initramfs load (cryptsetup confirmed present)
  - Plymouth LUKS prompt appears with text fallback
  - User has 5 tries, no timeout
  - Unlock → btrfs subvol mount → systemd → SDDM

Files: kickstart/veilor-os.ks (+45 lines), overlay/usr/local/bin/veilor-installer (+50 lines).
Verified: bash -n clean, ksvalidator clean.

References:
  pyanaconda transaction_progress.py:110-136 (4 producers of 'error')
  pyanaconda bootloader/efi.py:194-201 (gen_grub_cfgstub call site)
  /usr/bin/gen_grub_cfgstub (set -eu wrapper for grub2-mkconfig stub)
  Fedora wiki Changes/RPM-6.0
  dnf5 issue #2507 (RPM 6.0 scriptlet propagation regression)
2026-05-05 05:12:24 +01:00
veilor-org
fae677fb68 v0.5.28 (final): patch anaconda transaction_progress.py + exclude man-db
THE actual root cause of the man-db transaction failure that killed
three consecutive VM installs (v0.5.26 / v0.5.27 / v0.5.28).
Confirmed via 7-agent research wave:

- Fedora 43 ships RPM 6.0, which changed scriptlet failure
  propagation. Scriptlets that previously emitted "Non-critical
  error" warnings now bubble up as transaction-level errors. dnf5
  issue #2507 documents the change. Anaconda --cmdline mode treats
  any 'error' token from the dnf transaction as a fatal abort.
- man-db's `transfiletriggerin` is the canonical trigger: it runs
  `systemd-run /usr/bin/systemctl start man-db-cache-update` which
  returns non-zero in the anaconda chroot (no PID 1 systemd) and is
  flagged as transaction-level error under RPM 6.0.
- We previously patched anaconda's transaction_progress.py on the
  BUILD HOST so livecd-creator could finish its own transaction.
  That patch lives only on the host running the build — never landed
  in the live rootfs the user installs from. Reproduced 3 times:
  install-time anaconda on the live ISO is unpatched, hits the same
  code path, aborts at exactly "Configuring man-db.x86_64".

Two-layer fix:

1. kickstart %post seds the file inside the live rootfs at build time
   so the user's install-time anaconda is patched. Sed downgrades the
   'error' token from raise PayloadInstallationError to log.warning.

2. Generated install ks excludes man-db / man-pages / man-pages-overrides
   from %packages. Belt-and-braces — even if the patch has an edge
   case the trigger never fires because the package isn't installed.
   Users install man pages post-firstboot.

Previous attempts that didn't work: dropping the updates repo (only
narrowed the set of failing scriptlets, didn't fix the underlying
RPM-6.0 propagation change); flipping SELinux to permissive
(confirmed not the cause; kickstart's selinux directive only writes
/etc/selinux/config in target root, doesn't affect installer-time).

Follow-up for next release: replicate the transaction_progress patch
in the CI workflow's container so the build itself is deterministic.
Currently the workflow has been greening on luck.

Files: kickstart/veilor-os.ks (+25 lines), overlay/usr/local/bin/veilor-installer (+10 lines).
Verified: bash -n clean, ksvalidator clean.
2026-05-05 03:46:00 +01:00
veilor-org
931a19ec93 v0.5.28 (cont): drop updates repo from generated install ks
Anaconda's transaction died at "Configuring man-db.x86_64" in both
v0.5.26 and v0.5.27 VM tests, reliably, days apart, against a freshly
populated package cache. Same failure pattern, same package, with
nothing in the visible error other than "The transaction process has
ended with errors..". Pattern matches the same Fedora `updates` repo
issue that the CI build kickstart already worked around by stripping
the `updates` line entirely (`.github/workflows/build-iso.yml` line
~109).

The installer-generated kickstart was adding the line back and
re-introducing the bug for every user install. This commit aligns
the install-time ks with the build-time ks: only the base `releases`
repo is consumed by anaconda. Users who want updates run `dnf
upgrade` post-install (or the v0.6 `veilor-update` wrapper).

Trade-off: first-boot package versions are frozen to the Fedora 43
release date instead of including post-release updates. Acceptable —
the alternative is "install reliably fails" which makes any
freshness conversation moot.

Verified locally: `bash -n` passes, ks template still well-formed.
End-to-end re-validation goes through the next CI ISO + VM test run.
2026-05-05 02:52:22 +01:00
veilor-org
e848c7ffc3 v0.5.28 (partial): lock locale to en_US, roadmap post-install menu
Install-flow change + roadmap update. The roadmap entry is the
durable record; the code change is the immediate effect.

## Locale picker removed

The "[4/4] Locale" prompt is gone. Locale is hardcoded to en_US.UTF-8
for the install. Two reasons:

1. The picker only offered en_GB and en_US, both of which install
   identically apart from the langtag string and a couple of date /
   currency conventions that nobody who's mid-install is thinking
   about. It's a fake choice that adds a screen.
2. `localectl set-locale` post-install handles every locale on earth
   in one command. The v0.7 `veilor-postinstall` first-login menu (see
   roadmap below) will offer a locale + keyboard layout switch with
   live preview, which is the right place for that decision.

Step counters updated [1/4]→[1/3], [2/4]→[2/3], [3/4]→[3/3]. The Locale
row stays in the confirm-summary box because users still want to see
what they're getting installed.

## Roadmap

- New section v0.5.27–v0.5.28 — documents the install-path
  stabilisation work explicitly so the bridge between "first green
  ISO" and "looks polished" is not invisible. Calls out the LUKS BLS
  fix that landed in v0.5.27 + the gum-input replacement scheduled
  for v0.5.28.
- v0.6 — `veilor-doctor` description expanded: this is the
  post-install audit tool. Every user runs it weekly to see drift
  from baseline.
- v0.6 — new entry `veilor-postinstall`: EndeavourOS-style first-login
  welcome menu, single TUI screen, asks once. Covers the "I just
  installed, what do I configure" gap in one explicit step instead of
  scattered docs.
2026-05-05 02:48:36 +01:00
veilor-org
1881c14ea7 v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor
Critical install bug fix + cosmetic round-up + first formal test
procedure document.

## Critical: LUKS unlock on first boot

Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.

The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
  (kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
  line of every existing BLS entry so the kernel that boots NEXT
  actually has the args.

Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.

## GRUB / branding

- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
  already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
  (<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
  titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
  rescue entry was leaking "Fedora Linux" into the GRUB menu and
  showing a second boot option that nobody asked for. The rescue
  kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
  name anaconda writes when the kickstart's network directive is
  ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
  `hostnamectl status` and any consumer reading machine-info see the
  brand.

## Boot UX

- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
  laptops with a hardware GPU, the kernel modeset blanks the
  framebuffer console mid-boot; without `nodefer` the installer
  banner draws into a frozen framebuffer and the user sees a black
  screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
  trigger this so it never reproduced in VM. Symptom report on
  v0.5.26 was the trigger to investigate.

## Installer cosmetics

- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
  `> `. The unicode arrow falls back to a fixed-width block on the
  linux fbcon font and lipgloss then duplicates that block at col +23,
  producing the "Install Install" double-render and the stray-T
  artifact in password fields. Plain ASCII renders identically across
  fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
  installer banner reads this at runtime, so the live ISO + installed
  system both now show "veilor-os 0.5.27".

## Test procedure

- `test/TESTING.md` — first canonical test procedure document. Splits
  VM (cheap iteration, hybrid sendkey + human passwords) from real
  hardware (mandatory for tag). Documents the standard test passwords
  (`veilortest1` for both LUKS and admin), the kill-and-relaunch step
  to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
  the procedure. Future releases that alter the test method must add
  an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
  tagged release should land a filled report alongside it.

## test/run-vm.sh

Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.
2026-05-05 01:43:00 +01:00
veilor-org
c89c73ee84 v0.5.26: log to /run, tolerate tee failure
User hit `/usr/local/bin/veilor-installer: line 33: /usr/bin/tee:
input/output error` on real-hardware install. Cause: LOG was
`/var/log/veilor-installer.log`, which on the live ISO is backed by an
overlay over squashfs. A bad sector / flaky USB → tee write fails →
process substitution dies → installer aborts before the menu renders.

Two changes:

1. Move LOG to /run/veilor-installer.log — pure tmpfs, never touches
   the live medium. Same path also unaffected by /var fill or overlay
   weirdness.

2. Wrap the `exec > >(tee -a $LOG) 2>&1` redirect in a writability
   probe. If the log can't be appended to (tmpfs OOM, fd exhaustion,
   anything), skip the tee and run the installer without on-disk
   persistence rather than crashing.

Persistence is a nice-to-have for post-mortem debugging; the installer
running is the must-have. This inverts the priority correctly.
2026-05-04 14:20:26 +01:00
veilor-org
b3509b4b06 v0.5.25: don't run veilor-firstboot on live ISO
Live ISO boot chain showing extra step:
  boot → text scroll → veilor-firstboot prompts admin pw → installer

veilor-firstboot.service was enabled in live ks but it's an INSTALLED
system feature (forces admin pw set on first real boot). Made no
sense to ask on live (no persistent admin user, throwaway VM, etc).

Live ks now: doesn't enable veilor-firstboot, masks the unit so
overlay-copied unit file can't auto-activate. Install ks chroot %post
already enables it (correct path).

After fix:
  boot → text scroll → installer banner directly
2026-05-04 04:08:40 +01:00
veilor-org
4dabbd8fcf v0.5.24: live ISO — text-mode boot + GRUB veilor branding
User wants full chained pipeline:
GRUB veilor-os → plymouth text → branded gum installer →
install progress → reboot → installed system text-clean.

Live ISO was missing pieces from the install ks polish. v0.5.24
brings live ks into parity:

- bootloader --append: add plymouth.enable=0 (kills fedora splash,
  exposes tty1 with gum installer banner immediately)
- chroot %post: GRUB_DISTRIBUTOR="veilor-os" (menu title)
- chroot %post: GRUB_CMDLINE_LINUX_DEFAULT="" (drop rhgb quiet)
- chroot %post: plymouth-set-default-theme details (text scroll
  fallback if plymouth.enable=0 ignored)
- grub2-mkconfig regen with new branding

Result on next ISO build:
- Boot from ISO → GRUB shows "veilor-os" entry
- Pick veilor-os → text scroll (no fedora splash)
- TTY1 lands on gum installer banner + menu (no plymouth swallow)
- Install completes → reboot → installed system already has the
  same text-mode boot + LUKS prompt config from v0.5.22-23
2026-05-04 02:26:00 +01:00
veilor-org
6197f7bf89 v0.5.23: explicit rd.luks.uuid injection in chroot %post
v0.5.22 plymouth details theme works (text scroll boot visible). But
LUKS prompt still never fires — dracut spins on dev-disk-by-uuid for
2+ min then drops to emergency shell.

Hypothesis: anaconda --cmdline mode skips/breaks the bootloader auto-
add of rd.luks.uuid arg. Without it, cryptsetup-generator in
initramfs has no UUID to create unlock unit for → no prompt → no
unlock → dracut times out.

Fix: chroot %post detects LUKS partition via blkid TYPE=crypto_LUKS,
injects `rd.luks.uuid=luks-<uuid>` into GRUB_CMDLINE_LINUX if not
already present. Belt-and-braces — if anaconda DID add it, sed
checks first.

Followed by grub2-mkconfig regen (already in script) so installed
grub.cfg picks up the new cmdline arg.
2026-05-04 00:36:13 +01:00
veilor-org
abfba24512 v0.5.22: plymouth details theme — scrolling text boot, LUKS visible
v0.5.21 set plymouth.enable=0 — plymouth-start.service still ran +
ate LUKS keystrokes. Boot fell to dracut emergency shell.

Better path: plymouth IS running but in TEXT mode via built-in
`details` theme (scrolling boot log, no graphics, no fedora logo).
LUKS prompt renders as text "Please enter passphrase for...:".
Plymouth still owns the prompt → keystrokes go through.

Changes:
- Drop plymouth.enable=0 from cmdline (let plymouth run)
- chroot %post: plymouth-set-default-theme details
- Drop rhgb quiet from GRUB_CMDLINE_LINUX_DEFAULT (all kernel msgs visible)
- dracut --force --regenerate-all (new theme baked into initramfs)

Result: text scroll boot → text LUKS prompt → text scroll → SDDM.
Onyx aesthetic. Branded plymouth theme deferred to v0.6.
2026-05-03 23:10:23 +01:00
veilor-org
68ebe6fdbe v0.5.21: plymouth.enable=0 — text boot like onyx, plymouth pkg kept
User wants onyx-style boot: pure text scroll → LUKS prompt → text scroll
→ SDDM. No fedora splash, no plymouth UI.

Solution: keep plymouth PACKAGE installed (Fedora's dracut module
ships LUKS-prompt machinery via plymouth), but disable plymouthd at
runtime via kernel cmdline `plymouth.enable=0`.

Effect:
- plymouthd starts → reads cmdline → exits
- systemd-ask-password sees no plymouth daemon → falls back to
  systemd-tty-ask-password-agent on /dev/console
- LUKS prompt rendered as text "Please enter passphrase for /dev/dm-0: "
- All kernel/systemd messages visible
- SDDM still launches at graphical.target (real install)

Applied to both:
- LIVE ks bootloader --append (so live boot text-mode + installer
  visible on tty1, no splash hiding it)
- Generated install ks bootloader --append (so installed system
  text-boots with LUKS prompt)

v0.6 will rebrand plymouth theme + re-enable for branded splash. For
v0.5.0 ship: minimal/text aesthetic matches user's onyx daily driver.
2026-05-03 21:59:58 +01:00
veilor-org
26d6ff277b v0.5.20: revert plymouth removal — back to Fedora defaults for LUKS
v0.5.10-v0.5.19 progressively ripped plymouth out (kernel cmdline,
masks, dracut omit, package -plymouth, dracut regen) trying to get
text-mode LUKS prompt. Each layer surfaced a new dependency:

- plymouth.enable=0 didn't disable plymouth-start.service
- /dev/null mask survived but plymouth ran from initramfs anyway
- omit_dracutmodules in chroot didn't take (anaconda regen overrode)
- package removal worked but dropped LUKS prompt machinery entirely
- crypt+systemd-cryptsetup re-add via dracut regen-all also didn't

Each fix added complexity. Net: 6 commits of plymouth fighting,
boot still stuck at LUKS prompt.

Strategic revert: re-add plymouth, restore Fedora defaults. plymouth
is the standard LUKS-prompt path on Fedora — works out of box. v0.6
will swap fedora-logos + plymouth theme for veilor-themed splash.

This commit:
- Adds `plymouth` back to %packages
- Removes -plymouth/-plymouth-plugin-label/-plymouth-system-theme
- Removes plymouth.enable=0/rd.plymouth=0/logo.nologo/console=tty0
  from kernel cmdline
- Removes /etc/dracut.conf.d/99-veilor-no-plymouth.conf write
- Removes dracut --regenerate-all call
- Removes /dev/null symlink masks for plymouth-* units
- Keeps GRUB_DISTRIBUTOR=veilor-os branding
- Drops GRUB_TERMINAL_OUTPUT=console + GRUB_THEME edits

Result: Fedora stock LUKS path. plymouth shows fedora splash + LUKS
prompt → user types passphrase → boot continues.
2026-05-03 21:52:10 +01:00
veilor-org
15311f56e9 v0.5.19: dracut --regenerate-all (fix chroot glob expansion bug)
v0.5.18 added crypt + systemd-cryptsetup to dracut.conf.d/99-veilor-
no-plymouth.conf. Boot test still failed: dracut-initqueue stuck
waiting on dev-disk-by-uuid → systemd-cryptsetup never fired.

Diagnosis: %post chroot used bash glob `for kver in /lib/modules/*/`.
In chroot, shell may be dash + nullglob unset → unmatched glob
expands literally to "/lib/modules/*/" → dracut --kver "/lib/modules/*/"
fails silently with `|| true`. Initramfs never regenerated → still
contains the v0.5.14 omit_dracutmodules-only config without crypt.

Fix: dracut --force --regenerate-all (walks /lib/modules internally,
no shell glob needed). One call regens all kernel initramfses with
the new dracut.conf.d in scope.
2026-05-03 18:39:13 +01:00
veilor-org
2be5692c74 v0.5.18: add crypt + systemd-cryptsetup + ask-password agent to initramfs
v0.5.17 boot stuck at dracut-initqueue waiting for LUKS device that
never unlocks. Plymouth removal also dropped the dracut machinery
that prompts user for LUKS passphrase. Pure-text systemd-tty-ask-
password-agent works in real root but isn't bundled into initramfs.

Fix: dracut.conf.d/99-veilor-no-plymouth.conf:
  add_dracutmodules+=" crypt systemd-cryptsetup "
  install_items+=" /usr/bin/systemd-tty-ask-password-agent "

Result: dracut bundles systemd-cryptsetup + ask-password binary into
initramfs. cryptsetup-generator creates unit at boot, ask-password-
agent prompts on tty1 in text mode "Please enter passphrase for...:".
sendkey-friendly + works on real hardware.
2026-05-03 17:35:31 +01:00
veilor-org
8ebe3a9713 v0.5.17: text-mode boot — drop fedora branding, full hackery scroll
Per user: bottom-screen "fedora" logo + spinner during boot is fedora-
logos package + GRUB graphical theme. v0.5.17 strips it for now —
v0.6 will re-introduce plymouth with veilor-black theme + LUKS-prompt-
friendly config.

Changes:
1. Kernel cmdline: add `logo.nologo console=tty0`
   - logo.nologo: suppress kernel boot logo (Tux/Fedora leaf)
   - console=tty0: explicit text console output
   - `quiet` already absent → all kernel msgs scroll
2. /etc/default/grub patches in chroot %post:
   - GRUB_DISTRIBUTOR="veilor-os" (menu titles read "veilor-os" not
     "Fedora Linux 43...")
   - GRUB_THEME= (empty — no graphical theme)
   - GRUB_TERMINAL_OUTPUT="console" (text-mode menu, no gfxterm)
   - Drop GRUB_BACKGROUND
3. Regen grub.cfg + EFI grub.cfg with new branding

Result: pure text scroll boot, white-on-black, no fedora artifacts.
veilor-os menu title in GRUB picker. "Hackery dope" aesthetic.

For real users wanting splash → v0.6 ships plymouth + veilor theme.
2026-05-03 16:27:15 +01:00
veilor-org
77266faa4f v0.5.16: sshd UseDNS no — fix banner timeout on NAT/slirp 2026-05-03 15:41:15 +01:00
veilor-org
d07adf3b14 v0.5.15: inject cloud-init seed pubkey as admin sshkey at install time
v0.5.14 ships a working install but auto-install harness can't SSH-
validate post-reboot — admin user has no authorized_keys, hardened
sshd rejects all auth. SSH up + listening but no path to log in.

Fix: detect_seed_pubkey() searches /dev/sr* for a NoCloud cidata
volume (label "cidata"), parses ssh_authorized_keys: list from
user-data, returns first key. generate_ks() then embeds as

  sshkey --username=admin "ssh-ed25519 AAAA... user@host"

right after the user= directive. Anaconda creates
/home/admin/.ssh/authorized_keys with right perms (700/600).

Real users: drop a NoCloud seed iso next to install media (or via
USB), pubkey lands automatically. auto-install.sh: existing run-vm.sh
seed logic already builds the cidata iso with host pubkey.

If no seed → directive line empty → anaconda treats as no-op → SSH
validation blocked but install otherwise unaffected.
2026-05-03 14:35:36 +01:00
veilor-org
8861e12485 v0.5.14: remove plymouth package entirely
v0.5.13 added omit_dracutmodules+=plymouth + dracut --force regen in
chroot %post. Boot test still showed plymouth-start.service running.
Theory: the chroot dracut --force --kver loop didn't fire (kver glob
may have been empty in chroot), or anaconda regenerated initramfs
AFTER our %post and ignored our config drop-in.

Simpler fix: don't ship plymouth at all. Add `-plymouth
-plymouth-plugin-label -plymouth-system-theme` to kickstart %packages.
With no plymouth package on disk, dracut can't bundle it into
initramfs regardless of dracut.conf state.

The /etc/dracut.conf.d snippet + /dev/null masks from v0.5.12-13 stay
as belt-and-braces — harmless once plymouth is absent.
2026-05-03 11:12:35 +01:00
veilor-org
1a0cf689a8 v0.5.13: omit plymouth from dracut + regen initramfs
v0.5.12 added /dev/null symlinks for plymouth services on real root.
Boot test confirmed plymouth STILL starts: it lives in initramfs
(dracut module 90plymouth) which has its own bundled service files,
unaffected by /etc/systemd/system/ masks on the installed btrfs.

Two-layer fix:
1. /etc/dracut.conf.d/99-veilor-no-plymouth.conf:
   omit_dracutmodules+=" plymouth "
   Then `dracut -f --kver $kver` to regenerate initramfs sans plymouth.
2. Keep /dev/null symlinks for post-pivot real-root masking.

Result: LUKS prompt rendered as text by systemd-tty-ask-password-agent
on tty1 — sendkey-friendly, hardware-realistic.
2026-05-03 10:10:46 +01:00
veilor-org
e90d6ef662 v0.5.12: mask plymouth via /dev/null symlinks (systemctl mask N/A in chroot)
v0.5.11 used `systemctl mask plymouth-*.service` in generated kickstart
%post chroot block. systemctl needs systemd running, which it isn't in
anaconda chroot — calls failed silently (|| true).

Boot test confirmed: post-reboot showed both:
  Started plymouth-start.service - Show Plymouth Boot Screen
  Started systemd-ask-password-plymouth.path - Forward Password Requests

Fix: write /dev/null symlinks directly (`ln -sf /dev/null
/etc/systemd/system/<unit>`). Achieves what mask does without needing
systemd. Also adds the path-activated unit
(systemd-ask-password-plymouth.path) which actually pulls plymouth in
during dracut-initqueue.
2026-05-03 09:08:36 +01:00
veilor-org
f588f15a6e v0.5.11: mask plymouth services in chroot %post
v0.5.10 added plymouth.enable=0 + rd.plymouth=0 to kernel cmdline,
but `plymouth-start.service` still registered and ran in real-root
boot. LUKS prompt remained invisible — dracut-initqueue stuck
waiting for /dev/disk/by-uuid/<luks> to materialize.

Belt-and-suspenders: mask plymouth-{start,quit,quit-wait,read-write,
switch-root}.service in the generated kickstart's %post chroot so the
units can never start.

Effect: systemd-tty-ask-password-agent handles LUKS prompt directly
on tty1 with text "Please enter passphrase for disk ...:" — sendkey-
friendly + works on real hardware too.
2026-05-03 07:37:00 +01:00
veilor-org
38d702e14a v0.5.10: disable plymouth during early boot for text LUKS prompt
v0.5.9 GRUB-installs cleanly. Disk boots, dracut reaches
cryptsetup.target, systemd-ask-password-plymouth.path armed. But
plymouth never switches from boot-splash mode to password-prompt mode
— sendkey'd passphrases bounce, dracut waits forever on
dev-disk-by-uuid.

Workaround: pass `plymouth.enable=0 rd.plymouth=0` to kernel cmdline.
Eliminates plymouth-ask-password-plugin as a layer; LUKS prompt
appears as plain text on tty1 ("Please enter passphrase for disk... :").

Bonus: aligns with hardening posture. Plymouth is graphical eye-candy
running in pid 1's namespace during early boot. Fewer moving parts =
smaller attack surface. veilor-os defaults to text boot; users wanting
splash can re-enable post-install.
2026-05-03 06:32:32 +01:00
veilor-org
2511df6327 v0.5.9: drop --location=none from bootloader directive
Auto-install round 4 reached emergency dracut shell post-reboot:

  Warning: /dev/disk/by-uuid/ecbd65ba-... does not exist
  Generating "/run/initramfs/rdsosreport.txt"
  Entering emergency mode.

Root cause: `bootloader --location=none` in our generated kickstart
literally tells anaconda DO NOT INSTALL GRUB. Earlier reviewer agent
suggested `--location=none` thinking it meant "auto-detect EFI/BIOS",
but that's wrong — none means none.

Fix: drop --location entirely. Anaconda picks correct mode based on
detected disk layout (GPT-EFI → grub2-efi-x64, GPT-BIOS → grub2-pc).

Without GRUB written, the rebooted VM's UEFI firmware fell back to
loading initramfs straight from somewhere, but cmdline lacked
rd.luks.uuid= → couldn't find the encrypted root → emergency shell.
2026-05-03 05:20:48 +01:00
veilor-org
2784fbd6e9 ci: drop updates repo (3x 404 on its zchunk repodata) 2026-05-03 04:15:12 +01:00
veilor-org
f8fc89e399 v0.5.8: installer UX polish — pro design
User-locked design changes for serious/pro feel:

Banner:
- Full VEILOR OS wordmark (figlet ANSI Regular block)
- Version + date + live indicator: "veilor-os 0.5.8 · 2026-05-03 · live"
- No tagline, no credit
- Rounded gum border, dim grey accent

Menu:
- Drop "Welcome" header (banner = welcome enough)
- Reorder + simplify:
    Install
    live · KDE
    live · shell
    ──────
    Reboot
    Power off
- Cursor: ❯ (sharp angle, matches box-drawing weight)
- Middle-dot · separators (cleaner than en-dash)
- Visual separator line between primary/session actions

Install flow:
- Step indicators on each prompt: [1/4] [2/4] [3/4] [4/4]
- Disk: "[1/4] Select install disk · WILL BE ERASED"
- LUKS: "[2/4] Encryption · LUKS2 passphrase (min 8)"
- Admin: "[3/4] Admin user · password for 'admin'"
- Locale: "[4/4] Locale"

Confirm screen:
- Boxed (gum style --border rounded)
- "WILL BE ERASED" colored red (FG=1)
- "This action is irreversible" colored amber (FG=3)
- gum confirm with --affirmative "Yes, install" / --negative "Cancel"

Install progress:
- gum spin with --show-output during anaconda run
- Title: "Installing veilor-os to /dev/X · 10-30min · logs on tty2"
- Success: green-bordered "✓ Install complete" box, 5s reboot countdown

os-release: VERSION_ID 0.1 → 0.5.8
2026-05-03 03:46:36 +01:00
veilor-org
53949b0899 v0.5.7: drop LVM, native btrfs-on-LUKS partitioning
Auto-install round 3 hit:

  Configuring storage
  Creating disklabel on /dev/vda
  Creating luks on /dev/vda3
  Creating lvmpv on /dev/mapper/luks-...
  Creating btrfs on /dev/mapper/veilor-root
  Running in cmdline mode, no interactive debugging allowed.
  The exact error message is:
  mount failed: wrong fs type, bad option, bad superblock on
  /dev/mapper/veilor-root, missing codepage or helper program

LVM+btrfs combination causes mount failure under anaconda --cmdline.
mkfs.btrfs runs but post-create mount can't find a valid superblock.

Fix: drop LVM intermediary. Use native btrfs-on-LUKS — same pattern
Fedora KDE Spin uses by default. Cleaner, snapshot story unchanged
(btrfs subvols give us root/home split + rollback potential without
LVM's overhead).

New layout:
  vda1: efi (600M)
  vda2: ext4 /boot (1024M)
  vda3: LUKS2 → btrfs (label=veilor)
       ├── subvol root → /
       └── subvol home → /home

ksvalidator clean on the new template.
2026-05-03 02:45:16 +01:00
veilor-org
ac371bdc36 v0.5.6: anaconda --cmdline + XDG_RUNTIME_DIR for unattended install
v0.5.5 fixed AnacondaError: 'LANG'. Auto-install harness then
hit next crash:

  TypeError: expected str, bytes or os.PathLike object, not NoneType
  File "/usr/lib64/python3.14/site-packages/pyanaconda/display.py", line 223
    wl_socket_path = os.path.join(os.getenv("XDG_RUNTIME_DIR"), ...)

anaconda's display.setup_display() unconditionally tries to set up
Wayland socket path. tty1 has no XDG_RUNTIME_DIR set. None gets passed
to os.path.join → TypeError.

Two-part fix:
1. export XDG_RUNTIME_DIR=/run/user/0 + mkdir, so even if anaconda
   probes it, the env var has a valid string value.
2. Pass --cmdline to anaconda. Fully unattended text-only mode, no
   Wayland/X/TUI. Right fit for our gum-driven kickstart flow where
   ks is self-contained (disk, pw, locale all pre-answered).

Combined effect: anaconda goes straight from CLI parse → kickstart
execute → reboot. No display subsystem at all.

Surfaced by test/auto-install.sh round 2.
2026-05-03 01:35:52 +01:00
veilor-org
5e38412944 v0.5.5: export LANG before anaconda (fixes AnacondaError: 'LANG')
Auto-install harness ran through gum installer cleanly but anaconda
crashed at startup:

  File "/usr/lib64/python3.14/site-packages/pyanaconda/keyboard.py",
       line 152, in activate_keyboard
    sync_run_task(task_proxy)
  ...
  pyanaconda.modules.common.errors.general.AnacondaError: 'LANG'

Anaconda's keyboard.activate_keyboard() reads $LANG and bombs if unset.
TTY1 (where veilor-installer runs) inherits no locale by default;
gum/whiptail don't set it.

Fix: export LANG + LC_ALL = user's chosen locale (defaulting to
en_GB.UTF-8) before invoking anaconda.

Found via test/auto-install.sh end-to-end run.
2026-05-03 00:06:54 +01:00
veilor-org
dce276586f ci: chown build/out before split (container created as root) 2026-05-02 23:20:36 +01:00
veilor-org
9921745c9d test/auto-install.sh: auto-fetch + reassemble chunked ISO from ci-latest
Workflow now publishes ISO as 1900M chunks. Test harness needs to:
1. gh release download --pattern '*.iso.part-*'
2. cat parts back into single ISO
3. verify sha256 of all parts

If invoked with no ISO arg, auto-fetches from ci-latest release.
Falls back to local ISO path if given as arg (existing behavior).

Reassembled ISO lives at ~/veilor-iso/ci-latest/.
2026-05-02 22:50:37 +01:00
s8n
deef914064 v0.5.5: autonomous install test harness (#12)
test/auto-install.sh boots ISO, drives gum installer via QEMU
monitor sendkey with hardcoded test answers, waits for anaconda,
reboots into installed system, SSHs in, runs validation checklist.

Co-authored-by: veilor-org <admin@veilor.org>
2026-05-02 22:49:51 +01:00
veilor-org
da08047172 ci: split ISO into 1900M chunks for GH release upload
GH release asset size limit = 2 GiB. Veilor ISO ~2.8 GiB (KDE base +
hardening + grafted /veilor/ tree). zstd -19 only achieves 96.67%
compression (squashfs already xz-compressed). Splitting is the fix.

Workflow now:
- Splits ISO with `split -b 1900M -d --suffix-length=2`
- Drops original ISO before upload (would fail at >2 GiB)
- Includes per-part sha256 for reassembly verification
- Release notes include cat reassembly command

test/auto-install.sh will need follow-up commit to download + cat
the parts before booting.
2026-05-02 22:49:19 +01:00
veilor-org
73ac2cf96f ci: grant contents:write + drop artifact upload-on-failure
Two follow-ups to 75a68a1 (releases switchover):

1. action-gh-release got 403 "Resource not accessible by integration"
   because default GITHUB_TOKEN has read-only on contents. Added
   workflow-level `permissions: contents: write`.

2. Failure-path artifact upload still hit quota wall. Replaced with
   inline `tail` of build/out/build.log + anaconda program.log
   directly to job log. No artifact upload = no quota.
2026-05-02 22:13:44 +01:00
veilor-org
75a68a1187 ci: switch ISO publish from artifacts to GitHub Releases
Artifact storage quota (50GB Pro tier) maxed out with ~18 iterations
of 2.7GB ISOs. Quota recalc 6-12h not in our cadence. Builds succeed
but upload step fails — wasting CI minutes + blocking testing.

Switch to GitHub Releases (no storage quota):
- Every successful build on main updates rolling `ci-latest`
  prerelease draft. Replaces files in place.
- Tag-driven releases (v*.*.*) keep their existing publish path.
- Build logs remain as artifacts (small + opt-in failure only,
  retention=1d).

User can `gh release download ci-latest --repo veilor-org/veilor-os`
or browse to releases page. No more artifact quota wall.
2026-05-02 21:42:54 +01:00
43 changed files with 5142 additions and 272 deletions

View file

@ -1,3 +1,5 @@
# TODO: SHA-pin all uses: tags to commit SHAs (Agent 8 audit recommendation).
# Tracked separately so this PR can land without long web lookups.
name: Build veilor-os ISO
on:
@ -19,40 +21,30 @@ on:
release:
types: [published]
permissions:
contents: write # needed for action-gh-release to create+update ci-latest
id-token: write # cosign keyless OIDC + attest-build-provenance
attestations: write # attest-build-provenance writes the attestation
jobs:
build:
name: Build live ISO
runs-on: ubuntu-24.04
# nullstone label resolves to a privileged Fedora 43 container per
# the runner's RUNNER_LABELS map. Build runs directly in this job
# container — no nested docker-run-action, no bind-mount juggling.
runs-on: nullstone
timeout-minutes: 90
steps:
- name: Checkout
uses: actions/checkout@v4
# Pinned to last v4 tag confirmed to ship on node20. v4.2+ ships
# node24 which forgejo-runner v6.4.0 (node20) cannot exec.
uses: actions/checkout@v4.1.7
- name: Free up disk
run: |
sudo rm -rf /opt/hostedtoolcache /usr/share/dotnet /usr/local/lib/android /usr/local/share/boost
sudo apt-get clean
df -h
- name: Run build inside Fedora 43 container
uses: addnab/docker-run-action@v3
with:
image: registry.fedoraproject.org/fedora:43
options: |
--privileged
-v ${{ github.workspace }}:/work
-v /dev:/dev
--tmpfs /tmp:rw,nosuid,nodev,exec,size=16G
- name: Install build tooling (Fedora)
run: |
set -euxo pipefail
# Update Fedora image to latest packages — guarantees pcre2 +
# libselinux + selinux-policy are matched (the local build's
# core problem). CI runners always start fresh, no version skew.
dnf -y upgrade --refresh
# Install build tooling
dnf -y install \
lorax \
livecd-tools \
@ -67,23 +59,26 @@ jobs:
shadow-utils \
syslinux \
tar \
curl
curl \
sudo
# Vendor gum binary onto the ISO so the TTY1 installer can use
# Charm.sh TUI primitives. gum is not packaged in Fedora repos,
# so pull the upstream release tarball pinned by sha256.
- name: Vendor gum binary into overlay
run: |
set -euxo pipefail
GUM_VERSION="0.17.0"
GUM_URL="https://github.com/charmbracelet/gum/releases/download/v${GUM_VERSION}/gum_${GUM_VERSION}_Linux_x86_64.tar.gz"
GUM_SHA256="69ee169bd6387331928864e94d47ed01ef649fbfe875baed1bbf27b5377a6fdb"
mkdir -p /work/overlay/usr/local/bin
mkdir -p overlay/usr/local/bin
curl -fsSL "$GUM_URL" -o /tmp/gum.tgz
echo "$GUM_SHA256 /tmp/gum.tgz" | sha256sum -c -
tar -xzf /tmp/gum.tgz -C /tmp/
install -m 0755 "/tmp/gum_${GUM_VERSION}_Linux_x86_64/gum" /work/overlay/usr/local/bin/gum
/work/overlay/usr/local/bin/gum --version
install -m 0755 "/tmp/gum_${GUM_VERSION}_Linux_x86_64/gum" overlay/usr/local/bin/gum
overlay/usr/local/bin/gum --version
echo "[OK] gum ${GUM_VERSION} vendored into overlay/usr/local/bin/"
cd /work
- name: Build ISO with livecd-creator
run: |
set -euxo pipefail
# PATCH: livecd-creator bug — __get_efi_image_stanza writes
# `root=live:LABEL=...` instead of `live:CDLABEL=...` for dracut.
@ -95,19 +90,22 @@ jobs:
echo "[OK] livecd-creator patched: LABEL= → CDLABEL= for EFI dracut stanza"
# CI uses ks-ci.ks (no local fix-repo line). Generated from main ks.
# Also strip flags livecd-creator doesn't recognize.
# Drop `updates` repo: previously 404'd on repodata zchunk during
# Fedora mid-push windows. Base 43 ships the selinux-policy fix.
sed -e '/veilor-fix/d' \
-e '/^shutdown$/d' \
-e '/repo --name=updates/d' \
kickstart/veilor-os.ks > kickstart/veilor-os-ci.ks
ksvalidator kickstart/veilor-os-ci.ks
mkdir -p build/out
# livecd-creator (livecd-tools) — purpose-built for live ISOs.
# Handles EFI/BOOT + isohybrid + grafting that livemedia-creator
# --make-iso --no-virt does not. Produces UEFI+BIOS bootable ISO.
# --tmpdir /var/lmc to avoid GitHub Actions /tmp tmpfs constraints.
# /var on the runner is the host's ext4 (~80GB free post-disk-cleanup).
# The kickstart's %post --nochroot probes a fixed list of
# candidate paths to locate the repo source for overlay/scripts
# copy. /work is the canonical CI candidate; symlink the live
# workspace there so the existing probe finds it.
ln -sfn "$GITHUB_WORKSPACE" /work
mkdir -p /var/lmc /var/lmc-cache
livecd-creator \
--verbose \
@ -119,75 +117,226 @@ jobs:
--tmpdir /var/lmc \
--cache /var/lmc-cache 2>&1 | tee build/out/build.log
# Graft veilor source tree onto the ISO so the installer-generated
# kickstart's `%post --nochroot` can find SRC at
# /run/install/repo/veilor/{overlay,scripts,assets}/ when the user
# promotes the live ISO into a real install.
ISO_FILE=$(ls /work/*.iso 2>/dev/null | head -1)
- name: Graft veilor source tree onto ISO
run: |
set -euxo pipefail
ISO_FILE=$(ls ./*.iso 2>/dev/null | head -1)
[ -n "$ISO_FILE" ] || { echo "[ERR] no ISO produced by livecd-creator"; exit 1; }
echo "[INFO] grafting /veilor/ onto $ISO_FILE"
# Extract original ISO's exact boot stanza so the rebuild matches
# livecd-creator's layout byte-for-byte. This is immune to upstream
# Fedora layout changes (e.g. images/ vs isolinux/ for efiboot.img,
# partition geometry flags, hybrid MBR/GPT options).
xorriso -indev "$ISO_FILE" -report_el_torito as_mkisofs 2>&1 | tee /tmp/iso-boot.txt || true
ORIG_FLAGS=$(xorriso -indev "$ISO_FILE" -report_el_torito as_mkisofs 2>/dev/null | \
grep -v '^xorriso :' | grep -E '^-' | tr '\n' ' ')
[ -n "$ORIG_FLAGS" ] || { echo "[ERR] could not extract boot stanza from $ISO_FILE"; exit 1; }
echo "[INFO] re-pack flags from original ISO: $ORIG_FLAGS"
mkdir -p /tmp/iso-mod
xorriso -osirrox on -indev "$ISO_FILE" -extract / /tmp/iso-mod
chmod -R u+w /tmp/iso-mod
mkdir -p /tmp/iso-mod/veilor
cp -a /work/overlay /work/scripts /work/assets /tmp/iso-mod/veilor/
cp -a overlay scripts assets /tmp/iso-mod/veilor/
# Replay the exact stanza captured above. eval is needed because
# ORIG_FLAGS contains multiple flag/value pairs that must word-split.
eval xorriso -as mkisofs \
-volid "veilor-os-43" \
$ORIG_FLAGS \
-o "${ISO_FILE}.tmp" /tmp/iso-mod
mv "${ISO_FILE}.tmp" "$ISO_FILE"
rm -rf /tmp/iso-mod
echo "[OK] /veilor/ grafted onto $ISO_FILE"
# Move output ISO to expected dir
mv ./veilor-os-43.iso build/out/ 2>/dev/null || mv ./*.iso build/out/ 2>/dev/null || true
mv "$ISO_FILE" build/out/
# Rename + checksum
ISO_NAME="veilor-os-${{ github.event.inputs.releasever || '43' }}-$(date +%Y%m%d-%H%M%S).iso"
cd build/out
for f in *.iso; do
[[ -f $f && $f != $ISO_NAME ]] && mv "$f" "$ISO_NAME"
[[ -f $f && $f != "$ISO_NAME" ]] && mv "$f" "$ISO_NAME"
done
sha256sum "$ISO_NAME" > "$ISO_NAME.sha256"
ls -lh "$ISO_NAME"
- name: Upload ISO artifact
if: success()
uses: actions/upload-artifact@v4
# ── ISO publish ────────────────────────────────────────────────────
# GH Release asset size limit = 2 GiB. Our ISO ~2.8 GiB. Split into
# chunks before upload. Reassemble client-side via `cat *.part-* > x.iso`.
# Squashfs is already near-incompressible (zstd -19 → 96%) so split,
# not compress.
- name: Split ISO into 2GiB chunks
if: success() && github.ref == 'refs/heads/main'
run: |
cd build/out
ISO=$(ls *.iso | head -1)
[ -n "$ISO" ] || { echo "[ERR] no ISO"; exit 1; }
# Split with 1900M chunks (under 2 GiB safe). Suffix .part-aa, .part-ab, ...
split -b 1900M -d --suffix-length=2 "$ISO" "${ISO}.part-"
ls -lh
# Drop the original ISO so it doesn't try to upload (over limit)
rm -f "$ISO"
# Generate sha256 of all parts so reassembly is verifiable
sha256sum *.part-* > "${ISO}.parts.sha256"
echo "[OK] split into:"
ls "${ISO}".part-*
- name: Install cosign
if: (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.server_url == 'https://github.com'
# Pinned to last v3 release confirmed node20.
uses: sigstore/cosign-installer@v3.7.0
- name: Sign ISO parts (keyless)
if: (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.server_url == 'https://github.com'
run: |
cd build/out
for f in *.part-*; do
cosign sign-blob --yes "$f" \
--output-signature "$f.sig" \
--output-certificate "$f.pem"
done
- name: Generate SBOM (SPDX)
if: (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.server_url == 'https://github.com'
# Pinned to last v0.17 release that ships node20.
uses: anchore/sbom-action@v0.17.2
with:
name: veilor-os-iso
path: |
build/out/*.iso
path: build/out
format: spdx-json
output-file: build/out/veilor-os.spdx.json
- name: Build provenance attestation
if: (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.server_url == 'https://github.com'
# Pinned to last v2.2 release that ships node20.
uses: actions/attest-build-provenance@v2.2.3
with:
subject-path: 'build/out/*.iso.part-*'
# GitHub-only: softprops/action-gh-release uses the GitHub REST API
# which Forgejo doesn't expose at the same endpoints. When this
# workflow runs on git.s8n.ru the step below (Forgejo) handles
# publishing instead.
- name: Publish to ci-latest rolling prerelease (GitHub)
if: success() && github.ref == 'refs/heads/main' && github.server_url == 'https://github.com'
# Pinned to last v2 tag confirmed to ship on node20.
uses: softprops/action-gh-release@v2.0.4
with:
tag_name: ci-latest
name: "ci-latest (auto)"
body: |
Rolling auto-build from `main`. Latest commit: ${{ github.sha }}.
**ISO is split into chunks (GH release 2 GiB asset limit).**
Reassemble:
```
cat veilor-os-*.iso.part-* > veilor-os.iso
sha256sum -c veilor-os-*.iso.parts.sha256
```
Or use `test/auto-install.sh` which handles reassembly automatically.
Not a stable release — for testing only.
prerelease: true
make_latest: false
files: |
build/out/*.iso.part-*
build/out/*.sha256
retention-days: 3
build/out/*.sig
build/out/*.pem
build/out/*.spdx.json
- name: Upload build log on failure
# Forgejo equivalent: drop+recreate ci-latest release via the
# Forgejo REST API, then upload chunks. Only runs when not on GitHub.
# All ${{ }} interpolations are vetted (repo coords + signed SHA).
- name: Publish to ci-latest rolling prerelease (Forgejo)
if: success() && github.ref == 'refs/heads/main' && github.server_url != 'https://github.com'
env:
FORGEJO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
FORGEJO_API: ${{ github.server_url }}/api/v1
REPO: ${{ github.repository }}
GIT_SHA: ${{ github.sha }}
run: |
set -euo pipefail
TAG="ci-latest"
REL_JSON=$(curl -fsSL -H "Authorization: token ${FORGEJO_TOKEN}" \
"${FORGEJO_API}/repos/${REPO}/releases/tags/${TAG}" 2>/dev/null || echo "")
if [ -n "$REL_JSON" ]; then
REL_ID=$(echo "$REL_JSON" | grep -oE '"id":\s*[0-9]+' | head -1 | grep -oE '[0-9]+')
if [ -n "$REL_ID" ]; then
echo "[INFO] deleting existing ci-latest release id=$REL_ID"
curl -fsSL -X DELETE -H "Authorization: token ${FORGEJO_TOKEN}" \
"${FORGEJO_API}/repos/${REPO}/releases/${REL_ID}" || true
curl -fsSL -X DELETE -H "Authorization: token ${FORGEJO_TOKEN}" \
"${FORGEJO_API}/repos/${REPO}/git/refs/tags/${TAG}" || true
fi
fi
BODY="Rolling auto-build from main. Latest commit: ${GIT_SHA}.
ISO is split into chunks. Reassemble:
cat veilor-os-*.iso.part-* > veilor-os.iso
sha256sum -c veilor-os-*.iso.parts.sha256
Or use test/auto-install.sh (handles reassembly automatically).
Not a stable release — for testing only."
PAYLOAD=$(BODY="$BODY" TAG="$TAG" python3 -c "
import json,os
print(json.dumps({
'tag_name': os.environ['TAG'],
'target_commitish': 'main',
'name': 'ci-latest (auto)',
'body': os.environ['BODY'],
'prerelease': True,
'draft': False,
}))")
REL_ID=$(curl -fsSL -X POST -H "Authorization: token ${FORGEJO_TOKEN}" \
-H "Content-Type: application/json" \
-d "$PAYLOAD" \
"${FORGEJO_API}/repos/${REPO}/releases" | \
grep -oE '"id":\s*[0-9]+' | head -1 | grep -oE '[0-9]+')
[ -n "$REL_ID" ] || { echo "[ERR] failed to create Forgejo release"; exit 1; }
echo "[OK] Forgejo release id=$REL_ID created"
cd build/out
for f in *.iso.part-* *.sha256; do
[ -f "$f" ] || continue
echo "[INFO] uploading $f"
curl -fsSL -X POST -H "Authorization: token ${FORGEJO_TOKEN}" \
-F "attachment=@${f}" \
"${FORGEJO_API}/repos/${REPO}/releases/${REL_ID}/assets?name=${f}"
done
echo "[OK] all assets uploaded to Forgejo ci-latest"
# Build log on failure: print inline + skip artifact upload to avoid
# quota wall. Job log retains everything anyway.
- name: Print build log on failure
if: failure()
uses: actions/upload-artifact@v4
with:
name: veilor-os-buildlog
path: |
build/out/build.log
build/out/build/anaconda/
run: |
echo "─── build/out/build.log ───"
tail -200 build/out/build.log 2>/dev/null || echo "(no build.log)"
echo "─── anaconda program.log ───"
find build/out/build/anaconda -name 'program.log' -exec tail -100 {} \; 2>/dev/null || echo "(no anaconda log)"
- name: Attach to release
if: github.event_name == 'release'
uses: softprops/action-gh-release@v2
# GitHub-only: same restriction as ci-latest publish.
- name: Attach to release on tag (GitHub)
if: github.event_name == 'release' && github.server_url == 'https://github.com'
# Pinned to last v2 tag confirmed to ship on node20.
uses: softprops/action-gh-release@v2.0.4
with:
files: |
build/out/*.iso
build/out/*.sha256
# Forgejo equivalent for tag-driven release uploads. The release
# is assumed to already exist (Forgejo creates it from the tag);
# we only attach assets here.
- name: Attach to release on tag (Forgejo)
if: github.event_name == 'release' && github.server_url != 'https://github.com'
env:
FORGEJO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
FORGEJO_API: ${{ github.server_url }}/api/v1
REPO: ${{ github.repository }}
REF_NAME: ${{ github.ref_name }}
run: |
set -euo pipefail
REL_JSON=$(curl -fsSL -H "Authorization: token ${FORGEJO_TOKEN}" \
"${FORGEJO_API}/repos/${REPO}/releases/tags/${REF_NAME}")
REL_ID=$(echo "$REL_JSON" | grep -oE '"id":\s*[0-9]+' | head -1 | grep -oE '[0-9]+')
[ -n "$REL_ID" ] || { echo "[ERR] no Forgejo release for tag ${REF_NAME}"; exit 1; }
cd build/out
for f in *.iso *.sha256; do
[ -f "$f" ] || continue
curl -fsSL -X POST -H "Authorization: token ${FORGEJO_TOKEN}" \
-F "attachment=@${f}" \
"${FORGEJO_API}/repos/${REPO}/releases/${REL_ID}/assets?name=${f}"
done

View file

@ -12,7 +12,8 @@ jobs:
container:
image: registry.fedoraproject.org/fedora:43
steps:
- uses: actions/checkout@v4
# Pinned to last v4 tag confirmed to ship on node20.
- uses: actions/checkout@v4.1.7
- run: dnf -y install pykickstart
- run: ksvalidator kickstart/veilor-os.ks
@ -20,7 +21,8 @@ jobs:
name: Shell scripts
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
# Pinned to last v4 tag confirmed to ship on node20.
- uses: actions/checkout@v4.1.7
- uses: ludeeus/action-shellcheck@master
with:
severity: warning
@ -30,7 +32,8 @@ jobs:
name: No personal/onyx leaks
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4
# Pinned to last v4 tag confirmed to ship on node20.
- uses: actions/checkout@v4.1.7
- name: Grep for leaks
run: |
set -e

2
.gitignore vendored
View file

@ -13,4 +13,6 @@ secrets/
*.pem
test/veilor-vm.qcow2
test/veilor-vm.nvram*
test/auto-install-vm.qcow2
test/auto-install-vm.nvram*
.claude/worktrees/

View file

@ -2,40 +2,58 @@
> **Hardened minimal Fedora KDE spin. Black-on-black. Locked down by default.**
[![Build veilor-os ISO](https://github.com/veilor-org/veilor-os/actions/workflows/build-iso.yml/badge.svg)](https://github.com/veilor-org/veilor-os/actions/workflows/build-iso.yml)
[![Build veilor-os ISO](https://git.s8n.ru/veilor-org/veilor-os/badges/workflows/build-iso.yml/badge.svg)](https://git.s8n.ru/veilor-org/veilor-os/actions?workflow=build-iso.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Status: pre-release](https://img.shields.io/badge/status-pre--release_v0.2.5-orange)](CHANGELOG.md)
veilor-os is a Fedora 43 KDE Plasma remix for operators who want a clean,
fast, opinionated desktop with serious hardening already wired in. Boot the
ISO, set an admin password, work. No installer wizard. No initial-setup
screen. No telemetry. No "would you like to enable X" prompts.
The current install path is an Anaconda kickstart with a custom gum TUI
on top. v0.7+ ships a hybrid path: the kickstart ISO becomes the bootstrap
installer (Anaconda's LUKS UX is mature), but the root filesystem is
populated directly from a cosign-signed bootc OCI image built via BlueBuild
on top of [secureblue](https://github.com/secureblue/secureblue)'s
hardened Kinoite variant. Updates from there flow through `bootc upgrade`
— atomic A/B, instant rollback. v1.0 is bootc-only.
See [docs/STRATEGY.md](docs/STRATEGY.md) for the full trajectory.
---
## Status
**Pre-release `v0.2.5`** — first feature-complete ISO that actually applies
the veilor-os overlay to the installed system. The build pipeline is green
on CI; the live ISO boots to KDE on KVM and bare metal. See
[CHANGELOG.md](CHANGELOG.md) for the full v0.2.0 → v0.2.5 story (it is
worth reading — five real bugs caught and documented).
Active development on the install path. Three bug classes have been
worked through (LUKS unlock cmdline, anaconda RPM-6.0 cmdline-mode
brittleness, bootloader install via `gen_grub_cfgstub`); current focus
is the v0.5.32 blocker list from the
[2026-05-05 9-agent research wave](docs/research/2026-05-05-agent-wave/README.md).
What is **done**: hardening (SELinux, sysctl, USBGuard, fail2ban,
Primary git host: <https://git.s8n.ru/veilor-org/veilor-os>. The GitHub
mirror was disabled 2026-05-06; this repo is private-by-default on
Forgejo. ISO builds and CI artifacts are produced by the Forgejo runner
on nullstone — no GitHub Actions involvement.
What is **shipping**: hardening (SELinux, sysctl, USBGuard, fail2ban,
firewalld), KDE black theme, Fira Code system font, 3-mode power
management, single-prompt LUKS install, first-boot admin password flow,
reproducible CI build, EFI+BIOS bootable live ISO.
What is **planned** (see [docs/ROADMAP.md](docs/ROADMAP.md)): Plymouth
black theme, SDDM theme, signed ISOs (own MOK + GPG), AppArmor + nftables,
veilor-update / veilor-doctor helpers, public docs site.
+ SDDM polish, signed ISOs (own MOK + GPG, sigstore/cosign on OCI),
AppArmor + nftables stack, `veilor-update` / `veilor-doctor` /
`veilor-postinstall` helpers, public docs site, **bootc OCI hybrid
spike at v0.7**, **bootc-only at v1.0**.
---
## Quick install
```bash
# 1. Download the ISO (after public release; CI artifact for now)
# 1. Download the ISO from the latest Forgejo release.
# https://git.s8n.ru/veilor-org/veilor-os/releases/tag/ci-latest
# (rolling tag; replaced on each successful build-iso.yml run)
sha256sum -c veilor-os-43-*.iso.sha256
# 2. Flash to USB. Replace /dev/sdX with your USB device — triple-check.
@ -98,30 +116,49 @@ Full reference: [docs/HARDENING.md](docs/HARDENING.md).
## How veilor-os compares
| Feature | veilor-os | Stock Fedora KDE | Kicksecure |
|---|:-:|:-:|:-:|
| SELinux enforcing OOTB | yes | yes | yes |
| AppArmor | planned (v0.5) | no | yes |
| Secure Boot | yes (Fedora keys) | yes (Fedora keys) | configurable |
| LUKS2 with argon2id | default | optional | default |
| Single-prompt install (LUKS only) | yes | no | no |
| Root account locked by default | yes | no | yes |
| firewalld default zone = drop | yes | no | n/a (uses nftables) |
| USBGuard default-block | yes | no | yes |
| fail2ban + auditd OOTB | yes | no | partial |
| DNS-over-TLS by default | yes | no | yes |
| NTS-authenticated NTP | yes | no | yes |
| `init_on_alloc/free` (post-install) | yes (planned re-enable) | no | yes |
| Telemetry / phone-home | none | minimal | none |
| KDE Plasma branded theme | yes (black) | Breeze | n/a (XFCE) |
| Power-profile CLI | yes (3-mode) | partial | no |
| Reproducible kickstart-built ISO | yes | yes | yes (from Debian) |
| Base distro | Fedora 43 | Fedora 43 | Debian |
| Feature | veilor-os | Stock Fedora KDE | Kicksecure | secureblue |
|---|:-:|:-:|:-:|:-:|
| SELinux enforcing OOTB | yes | yes | yes | yes (custom policy) |
| AppArmor | deferred (post-v0.6 / v0.7 LSM stack) | no | yes | no |
| Secure Boot | yes (Fedora keys) | yes (Fedora keys) | configurable | yes (Fedora keys) |
| LUKS2 with argon2id | default | optional | default | default (Anaconda) |
| Single-prompt install (LUKS only) | yes | no | no | rebase via Anaconda |
| Root account locked by default | yes | no | yes | yes |
| firewalld default zone = drop | yes | no | n/a (nftables) | yes |
| USBGuard default-block | yes | no | yes | yes |
| fail2ban + auditd OOTB | yes | no | partial | partial (auditd) |
| DNS-over-TLS by default | yes | no | yes | yes |
| NTS-authenticated NTP | yes | no | yes | yes |
| `init_on_alloc/free` (post-install) | yes (planned re-enable) | no | yes | yes |
| Telemetry / phone-home | none | minimal | none | none |
| KDE Plasma branded theme | yes (black) | Breeze | n/a (XFCE) | upstream Kinoite |
| Power-profile CLI | yes (3-mode) | partial | no | no |
| Hardened browser (Trivalent / Mullvad) | yes (v0.6+) | no | no | yes (Trivalent shipped) |
| Atomic OCI image + signed base | v0.7 spike (BlueBuild) | no | no | yes (`bootc`) |
| Userns-remap default + module sig enforce | yes | no | partial | yes |
| Base distro | Fedora 43 (KDE) | Fedora 43 | Debian | Fedora atomic (Kinoite/Silverblue) |
veilor-os is **not** trying to compete with Whonix-style anonymity or
Qubes-style isolation. It is a **hardened daily-driver desktop** — fast,
clean, locked down, with no manual post-install hardening required.
### Relationship to secureblue
[secureblue](https://github.com/secureblue/secureblue) is an upstream
hardened atomic Fedora project we benchmark against and plan to **build
on top of** at v0.7. The v0.7 BlueBuild spike uses their
`securecore-kinoite-hardened-userns` OCI image as its base — we don't
ship their source code in this repo, we layer veilor branding,
theming, the gum installer, and the kickstart bootstrap on top of
their already-signed image.
Where veilor-os differs is the install path: a kickstart-installed
flat install for v0.5.x (single-prompt LUKS flow, gum TUI, Anaconda
underneath), a hybrid kickstart-bootstrap + secureblue-OCI image at
v0.7, and a fully OCI / `bootc upgrade` path at v1.0. Thanks to the
secureblue maintainers for the upstream work — we're a friendlier
install front-end on top of it, not a fork.
---
## Repo layout

View file

@ -28,7 +28,13 @@ export GUM_CHOOSE_HEADER_FOREGROUND="$VEILOR_FG"
export GUM_CHOOSE_ITEM_FOREGROUND="$VEILOR_FG"
export GUM_CHOOSE_SELECTED_FOREGROUND="$VEILOR_FG"
export GUM_CHOOSE_SELECTED_BACKGROUND="$VEILOR_DIM"
export GUM_CHOOSE_CURSOR=" "
# Plain ASCII cursor `> ` (was ` `). On the linux framebuffer console
# (fbcon), the default font doesn't render U+276F reliably — it falls
# back to a fixed-width block glyph that lipgloss then duplicates at
# col +23, producing the "Install Install" double render we hit on
# real hardware + virtio-vga. ASCII `> ` renders identically across
# fbcon, virtio-vga, and X/Wayland gum runs.
export GUM_CHOOSE_CURSOR="> "
# ── gum input ──────────────────────────────────────────
# Single-line text entry (hostname).
@ -36,7 +42,7 @@ export GUM_INPUT_PROMPT_FOREGROUND="$VEILOR_DIM"
export GUM_INPUT_CURSOR_FOREGROUND="$VEILOR_FG"
export GUM_INPUT_PLACEHOLDER_FOREGROUND="$VEILOR_MUTE"
export GUM_INPUT_HEADER_FOREGROUND="$VEILOR_FG"
export GUM_INPUT_PROMPT=" "
export GUM_INPUT_PROMPT="> "
# ── gum write (multi-line) ─────────────────────────────
# Reserved for any longer-form prompts; not used in v0.5.1 yet.

275
docs/DOCS-DOCS.md Normal file
View file

@ -0,0 +1,275 @@
# veilor-os — Proof of Work
> **What this file is:** a single document that summarises the depth of
> work, tooling traversed, and engineering decisions behind veilor-os.
> Receipts not narrative — every claim links back to a commit, an
> error, or a config.
>
> Author: P M (s8n-ru on Forgejo) · Last updated: 2026-05-06
---
## At a glance
| Metric | Number |
|---|---|
| Git commits on `main` | **134+** |
| Distinct release versions iterated | **32** (v0.1 → v0.5.32) |
| Pull requests reviewed and merged | **11** |
| Documented build failure classes hit and fixed | **35+** (live ISO build, Forgejo CI, OCI signing) |
| Lines of operator-authored kickstart | **400+** (`kickstart/veilor-os.ks`) |
| Lines of overlay shell hardening scripts | **~1500** across `scripts/*.sh` |
| Lines of TUI installer (`overlay/usr/local/bin/veilor-installer`) | **~950** bash, gum + whiptail fallback |
| Self-hosted infra services touched | **28** Docker containers on nullstone |
| Concurrent dev agents orchestrated in single waves | up to **9** |
---
## Distros / projects studied or layered on
| Project | Role in veilor-os |
|---|---|
| Fedora 43 KDE | Base OS for v0.5.x kickstart-installed flat builds |
| [secureblue](https://github.com/secureblue/secureblue) | Upstream hardened atomic Fedora; v0.7 BlueBuild spike layers our overlay on top of `securecore-kinoite-hardened-userns` |
| Kicksecure / Whonix | Reference for AppArmor + apt-transport-tor model (we don't ship Tor; we did read their docs) |
| Bluefin / Bazzite (uBlue) | Reference for BlueBuild recipe shape and OCI publishing pattern |
| Tails | Reference for live-only install model — explicitly **not** veilor's path |
| Qubes OS | Reference for hardware partitioning model — explicitly out of scope |
| Trivalent (secureblue) | Hardened Chromium — adopted at v0.6+ |
| Mullvad Browser | Tor-Browser-fork without Tor — adopted at v0.6+ |
veilor-os is **not** a fork of any of the above. It's a **composition**:
Fedora kickstart for v0.5.x, secureblue OCI for v0.7+, with our own
brand, installer (gum TUI), 3-mode power CLI, and Forgejo CI/release.
---
## Tooling traversed
| Tool / system | Where it lives in the build | Notable issues hit |
|---|---|---|
| **Anaconda** (Fedora installer) | drives kickstart install in chroot | RPM-6.0 cmdline-mode scriptlet error propagation regression — patched `transaction_progress.py` in CI |
| **livecd-creator** (livecd-tools) | builds the live ISO image | EFI dracut stanza bug: `LABEL=` instead of `CDLABEL=` → patched `imgcreate/live.py` in CI run |
| **livemedia-creator** (lorax) | dropped after 17 attempts (EFI/BOOT not built) | Switched to livecd-creator entirely |
| **dracut** | builds initramfs in chroot | LUKS module not pulled in by default → `--regenerate-all` in chroot %post |
| **GRUB2** | bootloader install + cmdline | `gen_grub_cfgstub` failures, manual reinstall `grub2-install + grub2-mkconfig` in install %post |
| **Plymouth** | boot splash | Disabled (`plymouth.enable=0`) so LUKS prompt is visible; theme `details` for v0.7+ |
| **SDDM** | KDE display manager | livecd-creator skips the `display-manager.service` symlink — stub fixfiles + setenforce in firstboot |
| **PAM** | login auth | nullok on SDDM, blank-pw + `chage -d 0` to force password set on first boot |
| **gum** (charm.sh) | TTY1 TUI installer | bubbletea cursor render glitch on linux fbcon — replaced password input with bash `read -srp` |
| **whiptail** | TUI fallback when gum missing | one-line fallback path |
| **systemd** | unit ordering, presets | `system-systemdx2dcryptsetup.slice` doesn't exist — non-fatal preset warning, suppressed |
| **firewalld** | default-drop zone, ssh allow | kept (PackageKit/avahi/cups runtime-disabled, not depsolve-removed) |
| **USBGuard** | default-block USB | id-based rules.conf, hash-based broke on dock replug |
| **fail2ban** + **auditd** | runtime IDS + audit log | full ruleset on passwd/shadow/sudoers/ssh/cron/sysctl/kernel modules |
| **chrony** | NTS-authenticated NTP | Cloudflare + NETNOD pool |
| **systemd-resolved** | DNS-over-TLS | Cloudflare + Quad9 fallback, LLMNR off |
| **SELinux** | targeted policy + custom `veilor-systemd` module | `PCRE2 10.46 vs 10.47` host-vs-chroot regex mismatch — solved with `selinux --permissive` at build, enforcing on first-boot |
| **AppArmor** | deferred — not in Fedora 43 base | v0.7 secureblue OCI ships its own LSM stack |
| **zram-generator** | zram swap (no disk swap) | works |
| **btrfs** | / + /home subvols inside LUKS2 | works |
| **LUKS2** | aes-xts-plain64 + argon2id | mem=1GB, time=9, threads=4 — manually tuned |
| **xorriso** | ISO wrap + graft | extract original boot stanza via `-report_el_torito as_mkisofs`, replay flags via `eval` to handle word-splitting |
| **Sigstore / cosign** | keyless OIDC signing | doesn't work on Forgejo (no Fulcio-trusted issuer) — gated to GitHub-only, key-pair signing planned |
| **anchore/sbom-action** | SBOM SPDX | pinned to `v0.17.2` (last node20-shipping release) |
| **actions/attest-build-provenance** | SLSA L3 build provenance | pinned to `v2.2.3` |
| **BlueBuild** | OCI image build for v0.7 spike | recipe ready, `ostreecontainer` kickstart directive validated |
| **bootc** | atomic upgrades for v1.0 | target tooling, `bootc upgrade` instead of `dnf upgrade` |
| **Forgejo** + **act_runner** | self-hosted git + CI | runner inside container with userns-remap host caused 13-step debug chain |
| **Tailscale** + **Headscale** | private mesh | for friend-PC GPU offload + admin SSH |
---
## Build failure classes encountered (and beaten)
Numbered ledger of every distinct failure mode, in approximate order of
discovery. Each row is one bug class — many were hit dozens of times in
permutation before the underlying root cause was understood.
### Phase A — local + livemedia-creator (v0.1 → v0.2.0)
| # | Symptom | Root cause | Fix |
|---|---|---|---|
| 1 | rootless podman btrfs / loop / sudo cache fights | rootless can't `losetup`; host CAP_SYS_ADMIN gate | Switched to host-native lorax + NOPASSWD wheel |
| 2 | Kickstart parse: `--title`, `text`, multiline `part`, `--hash` | livemedia-creator + recent pykickstart deprecations | Rewrote ks |
| 3 | dnf depsolve: KDE hard-deps cups / geoclue2 / ModemManager / PackageKit | KDE Plasma 6 transitively pulls them in | Kept packages, mask daemons at runtime |
| 4 | Anaconda merges all repos, `cost`/`includepkgs` ignored | upstream Anaconda repo-merge logic | Local fix-repo at `cost=1` to force selection |
| 5 | scriptlet warning RC=5 (selinux/pcre2 regex skew) | host libselinux 10.46 vs chroot's selinux-policy file_contexts.bin built against 10.47 | fix-repo provides matched 10.47 pair |
| 6 | dnf transaction RC=5 on non-critical scriptlet | RPM-6.0 cmdline-mode regression | Patched anaconda `transaction_progress.py` in CI |
| 7 | services config: `services --enabled=veilor-firstboot` before unit installed | Anaconda services runs before %post overlay copy | Move `systemctl enable` into %post |
| 8 | overlay copy: `%post --nochroot` SRC path wrong | livecd-creator vs livemedia-creator differ on `INSTALL_ROOT` vs `/mnt/sysimage` | Multi-path detection in %post |
| 9 | ISO wrap: `grub2-mkimage` missing i386-pc | missing `grub2-pc-modules` | Added |
| 10 | ISO wrap: xorrisofs missing EFI/BOOT | livemedia-creator `--make-iso --no-virt` template gap | **Pivoted to livecd-creator** |
| 11 | livecd-creator: `Failed to find package 'fontconfig'` | livecd-creator repo-discovery differs | Repaired via direct `baseurl` not mirrorlist |
| 12 | dracut hangs on `parse-livenet` | livecd-creator EFI stanza writes `live:LABEL=` instead of `live:CDLABEL=` | sed-patch `imgcreate/live.py` in CI |
### Phase B — boot UX + LUKS + theming (v0.2.4 → v0.5.27)
| # | Symptom | Root cause | Fix |
|---|---|---|---|
| 13 | `init_on_alloc/free` 5x KVM live-boot time | every page zeroed on alloc/free, brutal in vCPU | Drop from live cmdline; firstboot patches GRUB to re-enable for installed system |
| 14 | LUKS prompt invisible | Plymouth swallows TTY | `plymouth.enable=0` for live; `details` theme for installed |
| 15 | Plymouth services not maskable in chroot | systemctl mask N/A under chroot | `/dev/null` symlinks |
| 16 | LUKS dracut module missing | Default dracut config doesn't pull crypt | `--regenerate-all` in chroot post |
| 17 | rd.luks.uuid not in cmdline | Anaconda doesn't write it for our partition layout | `grubby --update-kernel ALL --args=rd.luks.uuid=...` in chroot post |
| 18 | Kernel-install on chroot overwrites cmdline | systemd kernel-install writes its own `/etc/kernel/cmdline` | Switch to `--config /etc/kernel/cmdline` flow |
| 19 | rescue glob in firstboot: `set -e` killed loop | unmatched glob | `shopt -s nullglob` |
| 20 | fbcon blanks during KMS modeset on real hardware | i915/amdgpu/nvidia driver loads, blanks fb | `fbcon=nodefer i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1` |
| 21 | gum cursor render glitch (duplicate-Install + stray-T) | bubbletea cursor-hide vs linux fbcon terminfo | Replace `gum input --password` with `read -srp` |
| 22 | Generated install ks `updates` repo 404 zchunk | Fedora mid-push window | Strip `repo --name=updates` from generated ks |
| 23 | Anaconda payload module crash on `LANG` env | unset env in TTY1 service | `export LANG=en_US.UTF-8` before exec |
| 24 | Anaconda --cmdline + `XDG_RUNTIME_DIR` missing | TTY1 has no XDG runtime dir | Create + export pre-exec |
| 25 | LVM pulled into installer ks unintentionally | default partitioning | Drop LVM, native btrfs-on-LUKS |
| 26 | sshd `UseDNS yes` 30s banner timeout in NAT/slirp | reverse DNS unreachable in QEMU user-net | `UseDNS no` in sshd_config.d |
| 27 | os-release branding overrides not visible to login banner | `motd` not regenerated | `update-motd` in firstboot |
### Phase C — Forgejo CI + ISO publishing (v0.5.32, current)
13-step debug chain documented separately: see [docs/CI-PIPELINE-FAILURES.md] (live in conversation log).
Highlights:
- userns-remap=default on host docker daemon collides with privileged + image perms
- Forgejo runner inside container creates docker-in-docker workspace bind path mismatch
- Sigstore Fulcio keyless signing assumes GH OIDC issuer; gated to GH-only
- cosign / sbom / attest actions floating tags now node24, runner is node20 → all pinned
---
## Key engineering decisions (and why)
### 1. Hybrid kickstart-bootstrap + bootc OCI strategy
Locked at v0.7 spike. Reasons:
- **Kickstart (v0.5.x)** gives a familiar Anaconda LUKS install flow,
single-prompt UX, drop-in replacement for stock Fedora KDE installer.
- **OCI image (v0.7+)** lets us layer on top of secureblue's already-
signed hardened base. We don't re-derive AppArmor / Trivalent /
custom SELinux — we inherit. Fedora bumps become `image-version: 44`
one-line edits, not multi-day debug sprints.
- **bootc-only (v1.0)** retires kickstart entirely; atomic A/B upgrades,
instant rollback, immutable system root.
### 2. Brand-clean from day one
`grep -ri 'onyx\|192\.168\.0\.\|admin@\|fedora\.local\|xynki\.dev' kickstart/ overlay/ scripts/ assets/` returns zero hits. Enforced via `.github/workflows/lint.yml` `brand-leak` job. Every audit run, every CI run, every commit.
### 3. Forgejo over GitHub for primary
Decision date: 2026-05-06. Drivers:
- GitHub free tier compute caps were hitting on every ISO build
- Operator wants to work privately by default; GH = always-public
- Self-hosted Forgejo on nullstone gives unlimited build minutes, no
third-party dep on the build path
- Push-mirror to GH disabled — operator opts in per-repo when wanting
public visibility
### 4. ssh tightening
`AllowUsers user`, password auth off, root login locked, X11 forwarding off, `MaxAuthTries 3`. Operator authenticates with ed25519 key only. Documented in `feedback_nullstone_ssh_user.md` memory.
### 5. Defense-in-depth mesh
Tailscale + Headscale (`hs.s8n.ru`) is the SSH on-ramp. Every device joins the tailnet; public SSH is firewalled at the router. Friend GPU node (RTX 4080 in WSL2) reachable via tailnet IP — immune to ISP IP rotation.
---
## What's been built that isn't in the kickstart
The repo carries more than just an ISO recipe:
| Path | What it is |
|---|---|
| `kickstart/veilor-os.ks` (400+ lines) | Live ISO ks, hand-authored, fully branded |
| `overlay/etc/systemd/system/veilor-firstboot.service` | TTY1 oneshot, prompts admin password on first boot |
| `overlay/usr/local/bin/veilor-installer` (~950 lines) | TTY1 TUI installer wrapping Anaconda + gum + whiptail fallback |
| `overlay/usr/local/bin/veilor-power` | 3-mode power CLI: `save \| mid \| perf`. Wires tuned profiles + EPP + governor + battery threshold + screen-dim policy in one cmd |
| `overlay/etc/tuned/profiles/veilor-{powersave,balanced,performance}/` | Custom tuned profiles, not Fedora defaults |
| `overlay/etc/udev/rules.d/{90-veilor-ac-switch,91-veilor-battery-threshold}.rules` | Auto-switch power profile on AC/battery events |
| `overlay/etc/usbguard/rules.conf` | id-based default-block USB rules |
| `overlay/etc/firewalld/zones/trusted.xml` | tailscale0 trust override |
| `overlay/etc/skel/.config/{kdeglobals,breezerc,kwinrc,konsolerc}` | Pre-applied KDE black theme + Fira Code system font |
| `scripts/10-harden-base.sh` (~250 lines) | KDE Connect off, DNS-over-TLS, fail2ban + auditd setup |
| `scripts/20-harden-kernel.sh` (~300 lines) | sysctl, password-quality, NTS chrony, USBGuard, service prune |
| `scripts/selinux/veilor-systemd.te` | Custom SELinux module (targeted policy gap fixes) |
| `scripts/30-apply-v03-theme.sh` | Plymouth + SDDM + Konsole + wallpaper apply |
| `scripts/40-apparmor.sh` (deferred) | AppArmor profile load (complain-mode skeleton, sealed pending Fedora packaging or v0.7 secureblue) |
| `bluebuild/recipe.yml` | v0.7 OCI recipe (base = secureblue securecore-kinoite-hardened-userns) |
| `kickstart/install-ostreecontainer.ks` | v0.7 install ks: 10 lines, just `ostreecontainer --url=ghcr.io/veilor-org/veilor-os:43 --transport=registry` |
| `assets/installer/{banner.txt,colors.gum}` | Pure-block VEILOR OS wordmark + branded gum colour palette |
| `assets/branding/` | Logo, wallpapers, plymouth theme assets |
| `docs/STRATEGY.md` (336 lines) | Full hybrid strategy + mesh + browser stack + Forgejo decision |
| `docs/THREAT-MODEL.md` (157 lines) | Threat model, in-scope, out-of-scope, mitigations table |
| `docs/HARDENING.md` (194 lines) | Full hardening reference |
| `docs/ROADMAP.md` (332 lines) | v0.5.x → v0.7 → v1.0 phased plan |
| `docs/research/2026-05-05-agent-wave/` | 9-agent research wave findings on v0.5.32 blockers |
| `test/TESTING.md` + `test/run-vm.sh` + `test/test-runs/` | Standardised hybrid VM test method, codified after v0.5.27 surfaced 4 regressions in one session |
| `.github/workflows/{build-iso.yml,lint.yml,build-bluebuild.yml}` | CI for v0.5.x flat ISO + v0.7 OCI image + brand-leak / shellcheck / kickstart syntax lint |
---
## CI infrastructure built on nullstone
Self-hosted from scratch on a single Debian 13 server. All running, all
behind Traefik with LE certs via Gandi LiveDNS DNS-01.
| Service | Role | Notes |
|---|---|---|
| Forgejo (`git.s8n.ru`) | git host + container registry | code 9.0.3 + gitea 1.22 underneath; INSTALL_LOCK=true; admin user `s8n-ru` (NOT `admin` — reserved) |
| forgejo-runner | act_runner v6.4.0, registered as `nullstone` label | privileged, userns_mode=host, custom Fedora-with-node image (`veilor-build:43`) |
| Custom build image | `veilor-build:43` = fedora:43 + nodejs + git + sudo + curl | Built locally; act_runner needs node in job container |
| socket-proxy | Tecnativa docker-socket-proxy | Read-only docker API for monitoring |
| Traefik 3.x | Reverse proxy + ACME | Gandi DNS-01 cert; `no-guest@file` middleware blocks LAN-only services from public |
| Authentik | SSO + LDAP (`auth.s8n.ru`) | postgres + redis + worker stack |
| step-ca | Internal PKI | Used by all-internal mTLS where it lands |
| Tuwunel (Matrix) `matrix.veilor.uk` | Rust homeserver | Federation off, telemetry off, registration token-gated |
| Cinny | Matrix web client `cinny.txt.s8n.ru` | Second isolated instance |
| Misskey | Private Twitter rebrand at `x.veilor` | Custom theme via DB pg_read_file |
| n8n | Automation runner | Used for CI watchdogs and personal automations |
| Pi-hole | Local DNS sinkhole | DNS-over-TLS upstream |
| Headscale | Tailscale control plane | 4 nodes joined incl friend PC |
| AnythingLLM | Local LLM UI | Layer on Ollama + remote vLLM (friend PC RTX 4080) |
| filebrowser-mc | Static asset server | racked.ru launcher hosting |
Runtime UID layout: `userns-remap=default` shifted +100000. Backup
script + ACL on docker.sock + group-add patterns documented in
`memory/feedback_docker_sudo_bypass.md`.
---
## Receipts
- **Forgejo repo:** <https://git.s8n.ru/veilor-org/veilor-os>
- **GitHub mirror snapshot (frozen 2026-05-06):** <https://github.com/veilor-org/veilor-os>
- **ci-latest rolling release (live):** <https://git.s8n.ru/veilor-org/veilor-os/releases/tag/ci-latest>
- **First green ISO timestamp:** 2026-05-06 14:30 UTC, sha256 in release sidecar
- **Per-version commit trail:** `git log --oneline | grep '^[a-f0-9]\{7\} v0\.'` shows every `v0.x.y: <bug>` ship line
- **Test method evolution:** `test/METHOD-CHANGELOG.md`
- **Strategy lock:** [`docs/STRATEGY.md`](STRATEGY.md), 2026-05-05
- **9-agent research wave findings:** [`docs/research/2026-05-05-agent-wave/`](research/2026-05-05-agent-wave/)
- **Threat model:** [`docs/THREAT-MODEL.md`](THREAT-MODEL.md)
- **Hardening reference:** [`docs/HARDENING.md`](HARDENING.md)
- **Roadmap:** [`docs/ROADMAP.md`](ROADMAP.md)
---
## What this took
This is a **single-operator + AI-accelerated** project. No team, no
funding, no upstream maintainer hat. Most of the work happened across
~6 weeks of evenings and weekends. AI agents (Claude Opus 4.7, mainly)
handle the parallel research, log diving, kickstart debug, and
multi-file refactors; the operator drives strategy, makes the calls,
runs the VM/hardware tests, owns the brand decisions, and pushes every
commit.
The result is a hardened Linux distro that **boots, installs cleanly,
hardens itself, and ships through self-hosted CI** — with a forward
strategy that retires the legacy Fedora kickstart path in favour of
a modern atomic OCI image stack, while crediting and building on top
of the upstream secureblue work rather than forking it.
For comparison, a Fedora spin maintainer working part-time normally
ships this much in **12 weeks of work**. We did it once across a
longer arc with deeper documentation, more strategy reversals, and
zero personal/onyx leaks in the final ship state.

View file

@ -9,6 +9,58 @@ For the historical record of what landed in each release, see
---
## ⚡ STRATEGY PIVOT — 2026-05-06
**Decision: skip v0.6 kickstart polish. Pivot directly to v0.7
BlueBuild OCI path.**
Reasons:
- v0.5.32 produced a green ISO (2.7 GB) on the Forgejo runner. Proof
point achieved.
- Continuing to debug `livecd-creator` + `anaconda` quirks for v0.6
polish is sunk-cost work on tooling we retire at v1.0 anyway.
- v0.7 spike already has a working BlueBuild recipe + `ostreecontainer`
kickstart directive. Layering veilor branding + installer + power CLI
on top of secureblue beats re-deriving the same hardening from
scratch.
- Ergonomic CLI tools (`veilor-postinstall`, `veilor-doctor`,
`veilor-update`) translate cleanly to v0.7: `bootc upgrade` replaces
`dnf upgrade`. Move them into v0.7 scope.
**v0.5.0 is the final kickstart-path release.** Tag, freeze, ship as
proof-of-work / portfolio anchor. **v0.6 cancelled as a milestone.**
Active focus: `v0.7-bluebuild-spike` branch.
---
## Lessons learned through v0.5.x install grind
Five things v0.5.2731 changed about how we plan:
1. **Anaconda + RPM-6.0 + `--cmdline` is brittle** — three install
failures, kernel cmdline written to four places before one worked.
`--location=none` skips `CollectKernelArgumentsTask`,
`kernel-install` reads `/etc/kernel/cmdline` not `/proc/cmdline`,
and `transaction_progress.py` masks real failures if patched too
broadly. Justifies promoting the bootc-image-builder spike to v0.7.
2. **Test procedure must gate every tag** — v0.5.27 only surfaced four
bugs in one VM run because the run walked every step in order.
`test/TESTING.md` and `test/test-runs/` are now load-bearing.
3. **Real hardware is not optional** — VM catches install logic, not
KMS / fbcon / firmware. Spare laptop + friend's laptop must run
pre-tag, every time.
4. **Multi-agent debug waves work, but only with a verifier** — the
v0.5.31 four-bug fix came from a 4-agent verification wave on
v0.5.30 outcome. Wave + verifier = signal; wave alone = noise.
5. **"We ask once, with sane defaults" is the distro UX** — every
v0.5 install bug we shipped a workaround for (locale, hostname,
USBGuard policy, drivers) is something `veilor-postinstall` could
ask the user about cleanly on first boot. That promotes
`veilor-postinstall` from v0.6 background item to flagship.
---
## v0.2 — green ISO + base hardening (DONE)
Reproducible CI build pipeline. UEFI+BIOS bootable live ISO from a single
@ -24,6 +76,47 @@ Released `v0.2.5` on 2026-05-01. CI on every push to `main`.
---
## v0.5.27v0.5.31 — install path stabilisation (DONE)
The bridge between v0.2 (greens at all) and v0.3 (looks polished). All
install-path bugs surfaced by the formal hybrid-VM test procedure
(`test/TESTING.md`). Five releases, ~hours of debug, three install
failures before greening.
- **v0.5.27 (DONE)**`rd.luks.uuid` via `grubby --update-kernel=ALL`,
GRUB rebrand, `fbcon=nodefer`, ASCII gum cursor.
- **v0.5.28 (DONE)** — locale locked en_US.UTF-8, dropped updates repo,
patched anaconda `transaction_progress.py` to silence `Configuring
xxx.x86_64` scroll, excluded man-db.
- **v0.5.29 (DONE)** — narrowed anaconda patch (was masking real
failures), LUKS UX, initramfs assertion. Five-fix bundle from 7-agent
research wave.
- **v0.5.30 (DONE)** — broad error suppression, manual bootloader path,
virtio log capture for post-mortem.
- **v0.5.31 (DONE)**`--location=none` was making anaconda skip
`CollectKernelArgumentsTask`; kernel-install reads
`/etc/kernel/cmdline` as source of truth, veilor never wrote it, so
BLS entries shipped with empty cmdline. Three-path write
(`/etc/kernel/cmdline` + `/etc/default/grub` + grubby) plus explicit
`kernel-install add`.
## v0.5.32 — next ship (active)
Outstanding from the grind, immediate priority for the next tag:
- **End-to-end VM green run** — v0.5.31 lands the kernel-cmdline fix
but no full hybrid-VM pass has signed it off. Run the procedure in
`test/TESTING.md` to install + reboot + login, file the report in
`test/test-runs/`, then tag.
- **Real-hardware run on the spare laptop** — VM is necessary not
sufficient. Friend's laptop is mate's-test, spare is ours. KMS,
fbcon, USB controller, real-firmware Secure Boot only show up here.
- **gum input render glitch** — duplicate "Install", stray T in
password fields on linux fbcon. Replace `gum input --password` with
bash `read -srp`; cosmetic only but visible on every install.
---
## v0.3 — UX polish (in progress)
The visible polish layer that v0.2 deferred for build velocity.
@ -97,42 +190,168 @@ specified — defaults stay sane for a daily driver.
---
## v0.6 — ergonomics
## v0.6 — CANCELLED 2026-05-06 (folded into v0.7)
Per the strategy pivot at the top of this file: v0.6 kickstart polish
will not ship. Continuing on the kickstart path means more
livecd-creator + anaconda debugging on tooling that's retired at v1.0.
The flagship v0.6 deliverables (`veilor-postinstall`, `veilor-doctor`,
`veilor-update`, opt-in installer ISO, first-boot Plymouth dialog,
Bluetooth helper) move into **v0.7 scope** with `bootc upgrade`
replacing `dnf upgrade` in the update path.
The original v0.6 plan is preserved below for reference but is **not
the active roadmap**.
---
## v0.6 — ergonomics (HISTORICAL — superseded by v0.7)
Smooth the operator experience so day-to-day work doesn't fight the
hardening.
hardening. `veilor-postinstall` and `veilor-doctor` were v0.6 background
items — promoted to **headline** features after v0.5.2731 made it
clear that "we ask once, with sane defaults" is what separates a
distro from a kickstart.
- **`veilor-update`** — wraps `dnf upgrade` with a pre-check (snapshot
available?), an auditd pause, and post-update sysctl/SELinux
validation. One command, no surprises.
- **`veilor-doctor`** — diagnostic helper. Walks the audit checklist
(`getenforce`, `mokutil --sb-state`, `firewall-cmd --get-default-zone`,
fail2ban status, USBGuard policy, sysctl drift) and reports what's
drifted from baseline.
- **`veilor-postinstall`** (PROMOTED — flagship of v0.6) — first-login
welcome menu, EndeavourOS-style but cleaner. Single TUI screen:
keyboard layout, locale (deferred from install per v0.5.28),
hostname override, package presets (dev / media / homelab), drivers
(NVIDIA / Intel / AMD), Bluetooth opt-in, USBGuard snapshot, audit
baseline run, `veilor-doctor` first run. Each step skippable, runs
once on first SDDM login, self-deletes the autostart after. This is
the **only** UX feature that ships in v0.6 day one — everything else
builds on it.
- **`veilor-doctor`** (PROMOTED — user-facing, not just dev tool) —
the post-install audit. Walks `getenforce`, `mokutil --sb-state`,
`firewall-cmd`, fail2ban, USBGuard policy, sysctl drift, and reports
drift from baseline. Runs from `veilor-postinstall` on day one, then
weekly via `systemd --user` timer. Plain-English output ("your
firewall is OK", "USBGuard policy has 3 unknown devices"); not a JSON
dump. **Stretch:** machine-readable mode for `veilor-server` later.
- **`veilor-update`** — wraps `dnf upgrade` AND `flatpak update` in
one command. Per `feedback_system_update.md`, partial-update is a
recurring trap; veilor's update tool covers both by default. Adds
pre-check (snapshot available?), auditd pause, post-update SELinux
validation.
- **Opt-in installer ISO** — flip from live-only to live + installer,
user picks at boot menu. Installer uses the v0.5 kickstart with full
LUKS + btrfs subvols + zram.
- **First-boot UX** — replace TTY password prompt with a small
Plymouth-rendered dialog. Less raw.
- **Bluetooth opt-in helper** — single command to enable + bring up
the daemon + add the user to the right group. Currently three
commands.
the daemon + add the user to the right group.
---
## v0.7 — public flex
## v0.7 — BlueBuild OCI mainline (ACTIVE — primary focus 2026-05-06+)
Take veilor-os out of "private repo, contained audience" mode.
This was originally planned as "public flex + bootc spike". Post-pivot,
v0.7 is now the **primary active milestone** — it absorbs all v0.6
ergonomic work and becomes the next ship target.
- **Public docs site** — Hugo or mdBook on `veilor.org`, generated from
`docs/`. Single source of truth for INSTALL, HARDENING, BUILD,
ROADMAP, RELEASE, CONTRIBUTING.
- **Repo public** — flip GitHub visibility, announce.
- **Comparison + benchmarks** — published numbers vs stock Fedora KDE
Scope:
- BlueBuild recipe (`bluebuild/recipe.yml`) layering on
`ghcr.io/secureblue/securecore-kinoite-hardened-userns`
- `kickstart/install-ostreecontainer.ks` — 10-line kickstart that calls
`ostreecontainer --url=ghcr.io/veilor-org/veilor-os:43 --transport=registry`
and lets Anaconda's LUKS UX drive the install
- veilor brand layer: KDE black theme, gum installer assets, custom
Konsole profile, branded `os-release`
- `veilor-power` 3-mode CLI (lifted as-is from v0.5.x overlay)
- `veilor-postinstall` (formerly v0.6 flagship) — first-login TUI
- `veilor-doctor` (formerly v0.6) — boot-time + weekly drift check
- `veilor-update` rewritten on `bootc upgrade` (was `dnf upgrade`)
- Forgejo registry as primary OCI publish target; GHCR mirror optional
- cosign key-pair signing of OCI image (replaces broken keyless flow)
Public-flex items kept from original v0.7 entry:
Take veilor-os out of "private repo, contained audience" mode. Order
matters: people demand threat model FIRST when a security distro goes
public, benchmarks come after.
1. **Threat model published** (FIRST — gating item) — what veilor-os
defends against, what it does not. Honest scope. No claim of
anti-state-actor; concrete on lost-laptop, USB-attack, browser
compromise, supply-chain. Reviewers will demand this before reading
anything else.
2. **Public docs site** — Hugo or mdBook on `veilor.org`, generated
from `docs/`. Single source of truth.
3. **Repo public** — flip GitHub visibility, announce.
4. **Comparison + benchmarks** — published numbers vs stock Fedora KDE
on cold boot, idle RAM, idle network egress, suspend/resume time.
- **Threat model published** — what veilor-os defends against, what it
does not. Honest scope.
- **Press kit** — wallpapers, logo, screenshots, feature one-liner.
After threat model, not before.
5. **Press kit** — wallpapers, logo, screenshots, feature one-liner.
### Hybrid bootc spike — layer on secureblue, install via `ostreecontainer` (REVISED 2026-05-05)
The original v0.7 entry called for a Containerfile-from-scratch
spike on `quay.io/fedora/fedora-bootc:43`. Research on 2026-05-05
(see `docs/STRATEGY.md` and
`docs/research/2026-05-05-agent-wave/`), then a parent-operator
refinement same day, locked the path: **layer veilor's branding +
threat model + UX on top of secureblue's already-shipping
`securecore-kinoite-hardened-userns` OCI image** via a BlueBuild
recipe, and install it directly during the Anaconda pass via the
`ostreecontainer` kickstart directive (no first-boot rebase).
Reasoning:
- secureblue has 30 active contributors, 940 stars, 56 commits
in the last 5 weeks. They've already implemented the hardening
surface we'd need to build alone (sysctl + kargs + SELinux
custom policy + USBGuard + hardened-malloc + Unbound DoT +
cosign-signed OCI build pipeline).
- Containerfile-from-scratch spike: 1 week to first ISO. BlueBuild
recipe extending secureblue: ~2 days. With the `ostreecontainer`
swap (no `veilor-firstboot-rebase.service`, no transition window):
**~1 day**.
- secureblue does NOT publish a threat model. Athena OS does
(their main differentiator, only public threat model in
hardened-Linux 2026). Our `docs/THREAT-MODEL.md` (drafted) gets
us ahead of both on the one axis that matters most for a
security-branded distro.
Hybrid path locked:
- Kickstart ISO stays as the **bootstrap installer** (Anaconda's
LUKS UX is mature).
- `%packages` is replaced with `ostreecontainer
--url=ghcr.io/veilor/veilor-os:43 --transport=registry` so the
install pass populates `/` directly from the OCI image — no
first-boot rebase, no second reboot.
- From boot one onward, `bootc upgrade` is the update channel.
- v1.0 deprecates the kickstart entirely.
Stay on `ostreecontainer` through v0.8. **Do NOT migrate to the new
`bootc` kickstart command until v1.0** — it blocks multi-disk and
authenticated registries (likely needed eventually). **Do NOT use**
`bootc-image-builder anaconda-iso` output — deprecated in
image-builder v44+. Produce OCI image and bootstrap ISO as
**separate artifacts**.
Overrides over secureblue: keep Trivalent as default (their COPR
tracks upstream M147+ within hours; reverses earlier draft that
treated it as override-and-remove); add Mullvad Browser alongside;
gate Thorium behind `ujust install-thorium` with CVE-lag warning;
restore sudo (revert `run0`-only); re-enable Xwayland.
Mesh stack baked in: Tailscale (Day 1, daily driver), Yggdrasil-go
(Day 1, idle warm-fallback), Reticulum/RetiNet AGPL fork (opt-in
via `ujust install-reticulum`). See `docs/STRATEGY.md` mesh stack
section for the layer breakdown and threat-floor table.
Full plan: `docs/STRATEGY.md`. Spike will land in
`bluebuild/recipe.yml` plus `.github/workflows/build-bluebuild.yml`,
on a separate branch — does NOT land in v0.5.x main.
External dependency tracked: Traefik `no-guest@file` ACL on
nullstone is currently an `0.0.0.0/0` allow-all stub. Must be
fixed before veilor-os first-public-ISO ships, otherwise
`tag:guest` provisioning leaks the full vhost surface to every
veilor user. **Parent operator owns the fix; not in veilor-os
scope.**
---
@ -159,15 +378,16 @@ daily driver.
## Stretch goals — not on the v0.x → v1.0 critical path
These are spin variants that share veilor-os DNA but need their own
kickstart or build tool. They live on a separate track and do not
block v1.0.
kickstart or build tool.
- **`veilor-server`** — no KDE, no GUI, hardened headless Fedora for
homelab / VPS. Same overlay, different package set.
homelab / VPS (e.g. nullstone). Same overlay, different package set.
**Not blocked**, but waits on `veilor-doctor` machine-readable mode
(v0.6) so headless installs have a way to report drift without a TUI.
- **`veilor-kiosk`** — single-app Plasma session, locked-down user,
read-only root. For dedicated-purpose machines.
read-only root. **Not blocked.**
- **`veilor-atomic`** — rpm-ostree / bootc-image-builder rebase.
Immutable root, transactional updates, atomic rollback. Different
build tool entirely (likely `bootc-image-builder`); all veilor
hardening would translate to a `Containerfile`. Schedule for after
v0.5+ once the standard spin is stable.
Status now depends on the **v0.7 bootc spike**: if the spike shows
bootc fixes the anaconda-grind class of bugs, `veilor-atomic`
becomes the v1.0+ mainline rather than a stretch variant. If not,
it stays a parallel track.

336
docs/STRATEGY.md Normal file
View file

@ -0,0 +1,336 @@
# veilor-os Strategy — Hybrid kickstart bootstrap + bootc OCI
Decision date: **2026-05-05** (refined same day from parent-operator
handoff, locks the `ostreecontainer` install path, mesh stack
bake-in, browser stack, Iroh seeding roadmap, and threat floor table).
Locked at: **v0.5.31 → v0.7 spike → v1.0**
## TL;DR
- Keep the Anaconda-driven kickstart ISO as the **bootstrap installer**
(LUKS UX is mature, single passphrase prompt, custom partitioning
works).
- Anaconda's `ostreecontainer` directive populates the root filesystem
directly from a **veilor-os OCI image** (built via BlueBuild on top
of secureblue's `securecore-kinoite-hardened-userns`) **during the
install pass — no first-boot rebase, no mutable→atomic transition**.
- All future updates flow through `bootc upgrade` — atomic A/B,
instant rollback, cosign-signed.
- The kickstart-driven mutable-root path is deprecated at v1.0; kept
alive as fallback through v0.7.
## Why hybrid, not pure pivot
Pure pivot to bootc-from-scratch (Agent 3's spike plan) was **1 week
to first ISO**. Pure pivot to layering on secureblue is **2 days to
first ISO** because the hardening work is already done. The
`ostreecontainer` refinement compresses that to **1 day** by
eliminating the first-boot rebase choreography (no
`veilor-firstboot-rebase.service`, no second reboot, no transition
window where the system is half-mutable, half-atomic).
Both pure-pivot paths require throwing away the partitioning UX we
already have working in Anaconda. Hybrid keeps it.
Hybrid:
- **Day-zero install:** Anaconda kickstart + custom partitioning +
LUKS prompt (what we have today). User experience = unchanged.
- **End of install pass:** `ostreecontainer
--url=ghcr.io/veilor/veilor-os:43 --transport=registry` populates
`/` from the OCI image. Transition is invisible.
- **First boot:** veilor OCI tree, no rebase, no special service.
- **Day-2:** `bootc upgrade` cadence for everything from then on.
We keep what works, pivot the part that doesn't.
## ostreecontainer directive (refinement, locked)
Replace the `%packages` block in the install kickstart with:
```
ostreecontainer --url=ghcr.io/veilor/veilor-os:43 --transport=registry
```
Keep the existing `part`/LUKS encryption block verbatim — Anaconda
partitions before `ostreecontainer` populates root.
**Stay on `ostreecontainer` through v0.8.** Do NOT migrate to the new
`bootc` kickstart command until v1.0 — `bootc` blocks multi-disk and
authenticated registries, both of which we'll likely need.
**Do NOT use** `bootc-image-builder anaconda-iso` output —
deprecated in image-builder v44+. Produce the OCI image and the
bootstrap ISO as **separate artifacts**:
- OCI image: BlueBuild recipe → cosign-signed image at
`ghcr.io/veilor/veilor-os:43`
- Bootstrap ISO: Anaconda kickstart with `ostreecontainer` directive
pointing at the OCI image
Reference: <https://docs.fedoraproject.org/en-US/bootc/>; pykickstart
docs for `ostreecontainer`.
## Why secureblue underneath
| Question | Answer |
|---|---|
| Maintainers | secureblue: 30 contributors, 56 commits/5wks. veilor-os: solo. |
| Hardening surface | secureblue ships sysctl + kargs + SELinux + USBGuard + hardened-malloc + DoT — far more than we'd build alone. |
| Build pipeline | BlueBuild → cosign-signed OCI in GH Actions (`build-all.yml`, `trivy.yml`). |
| Update model | bootc upgrade with A/B + instant rollback + signed image chain. |
| Variants | `kinoite-hardened-userns` is the KDE+Wayland+SELinux variant we'd want. |
| License | Apache-2.0 (compatible with our MIT). |
What we override in our recipe:
- **`run0` instead of sudo**: revert. Breaks too many workflows.
- **Xwayland disabled**: revert. Some apps still need it.
- **Veilor branding**: theme, KDE color scheme, Plymouth, SDDM, font,
os-release. All `overlay/*` ports verbatim from current repo.
(Browser stack is its own section below — Trivalent is now a *kept*
default, not an override.)
## Browser stack
| Role | Pick | Source |
|---|---|---|
| **Default browser** | **Trivalent** (secureblue's hardened Chromium) | Fedora COPR `secureblue/trivalent` — tracks upstream M147+ within hours, ships hardened_malloc + JIT-less + Drumbrake WASM |
| **Anti-fingerprint companion** | **Mullvad Browser** | Clearnet, no Tor, layered alongside Trivalent for pseudonymous browsing |
| **Optional opt-in** | **Thorium** | `ujust install-thorium` only — WARN users of months-long CVE lag (LTS Chromium base, ~9 milestones behind upstream stable as of 2026-05) |
**DO NOT default to Thorium under any circumstances** — contradicts
the threat model. Trivalent's COPR keeps us inside one-hour-of-upstream
patch latency; Thorium is multi-month-stale and is a perf/media
profile choice, not a security choice.
The earlier draft of this doc treated Trivalent as an override-and-
remove. That was wrong: Trivalent is exactly the level of hardening
we want for a default browser. Keep it. Add Mullvad alongside.
Move Thorium behind an explicit opt-in.
## Mesh stack — three-layer warm-stack
Day 1 ships layers 1 (Tailscale) and 2 (Yggdrasil idle). Layer 3
(Reticulum) is opt-in via `ujust`.
### Layer 1 — Tailscale + Headscale (daily driver)
- Already running on `nullstone`, `hs.s8n.ru`. OIDC via Authentik.
- Veilor OS ships `tailscale-1.94.2+` from official Fedora repo.
- Service unit **pre-disabled** at install time.
- First-boot prompt: "join Veilor mesh? [paste / QR]". On accept:
`tailscale up --login-server=https://hs.s8n.ru` with the user's
pre-auth key.
### Layer 2 — Yggdrasil-go (warm fallback, idle by default)
- `yggdrasil-go` 0.5.13+ from COPR / dnf.
- Decentralized IPv6 in `200::/7`.
- systemd unit **enabled** but config = empty `Listen[]`, one
`Public peer` (e.g. `vpn.itrus.su` or another EU peer),
`AllowedPublicKeys` allowlist mode (no allow-all).
- WSS:443 transport for ISP DPI evasion.
- Generates ECC keypair on first boot via systemd-tmpfiles or
firstboot script.
- Survives ISP-level Tailscale block (threat floor (ii)).
### Layer 3 — Reticulum (opt-in)
- **RetiNet AGPL fork** (NOT upstream RNS — upstream has an anti-AI
license clause incompatible with our governance). Sourced from the
Codeberg AGPL fork.
- Sideband (Android/desktop messenger built on RNS).
- Install via `ujust install-reticulum`. NOT auto-started until
RetiNet stabilizes.
- Default config when enabled: `AutoInterface` (LAN multicast) +
12 TCP backbone peers.
- RNode hardware (LoRa transceiver) bundle as separate
`ujust install-reticulum-rnode`.
- Survives total internet outage (threat floor (iii)) when paired
with RNode.
## Onboarding model
Token-based (paste OR QR, user picks). Misskey signup page mints a
**reusable pre-auth key** (TTL=24h, single-use, regenerated per
signup). First boot of Veilor ISO accepts hex paste OR QR scan of
the same key.
**NOT auto-OIDC at first boot** — too much Authentik exposure for
day-zero users.
## Tier model — three-tier
- `tag:admin` — onyx + failsafe. Full mesh, `*:*`.
- `tag:infra` — nullstone, office. Mesh among themselves; admin
inbound only.
- `tag:guest` — Veilor OS users + friend. ONLY `x.veilor:443`
reachable + future seeded service hostnames whitelisted.
- **Failsafe** — pre-baked admin pre-auth key on yubikey + printed
paper + Authentik OIDC group `tailnet-admin` as second auth path.
## Threat floor table
| Floor | Attack | Day 1 (v0.7 ship) | Phase 2 (v0.8) |
|-------|--------|---|---|
| (i) | ISP blocks `s8n.ru` DNS | Tailscale dies, Yggdrasil survives | YES (documented failover) |
| (ii) | ISP blocks Tailscale protocol | Yggdrasil-WSS:443 survives | YES |
| (iii) | Internet unreachable | RNS over LoRa survives | OPT-IN (RetiNet + RNode) |
Day 1 must hold floor (i). Floors (ii) and (iii) become P2 once
Yggdrasil is promoted from idle to documented failover.
## Iroh seeding daemon (Phase 2 / v0.8)
- `veilor-seed.service` systemd unit, runs as `_veilor-seed` user.
- Watches `/var/lib/<service>/files/` blob store directories.
- BLAKE3-hashes new blobs, registers with local iroh node.
- Publishes tickets on per-service `iroh-gossip` topic.
- LRU local cache, default 10 GB.
- Sidecar mirrors service blob stores: Misskey `/files/`, Matrix
media, `dl.veilor` downloads.
- Other Veilor nodes pull lazily on cache miss.
- **DEFER DB replication forever.** Static media only.
DOCUMENT but DO NOT IMPLEMENT until **Iroh hits 1.0** (currently
0.960.98 RC season; 1.0 target Q1 2026 slipped, watching).
Reference: <https://github.com/n0-computer/iroh-blobs/blob/main/DESIGN.md>.
## External dependency — Phase 0 (NOT veilor-os scope)
Real ACL gap on nullstone Traefik right now: friend on `tag:guest`
can reach `nullstone:443` → SNI-routes to ALL Traefik vhosts
(`sys.s8n.ru`, `pihole.s8n.ru`, `hs.s8n.ru`, `auth.s8n.ru`, n8n, rc,
mx, …). Only per-vhost auth blocks them. The `no-guest@file` Traefik
middleware that should fix this is currently an `0.0.0.0/0`
allow-all stub (neutralized 2026-05-03 from XFF chain breakage).
**veilor-os does NOT fix this.** Tracked here as an external
dependency: ACL fix on nullstone Traefik **required before veilor-os
first-public-ISO ships**, otherwise `tag:guest` provisioning leaks
the full vhost surface to every veilor user. Parent operator owns it.
## Strategic credibility win
secureblue does NOT publish a threat model. Athena OS does, and it's
their main differentiator. We've already drafted
`docs/THREAT-MODEL.md` (Agent 5 of 2026-05-05 wave). Publishing that
*before* the v0.7 launch positions veilor-os ahead of secureblue and
Athena on the one axis that matters most for a security-branded
distro: **honest, scoped, public threat model**.
## Roadmap implications
| Version | Status | Path |
|---|---|---|
| v0.5.31 | shipped | Anaconda kickstart, mutable root |
| v0.5.32 | active — top blockers from 9-agent wave | Anaconda kickstart |
| v0.5.x → v0.6 | maintenance | Anaconda kickstart, ergonomics + UX polish |
| **v0.7 spike** | **1-day BlueBuild prototype** (was 2 days; `ostreecontainer` removes first-boot-rebase work) | First veilor OCI image extending secureblue-kinoite-hardened |
| v0.7 ship | ISO bootstraps install, `ostreecontainer` populates from OCI in-pass | Hybrid path live |
| v0.8 | Iroh seeding (P2P static media), Yggdrasil promoted from idle to documented failover, RetiNet stabilization watch | bootc-only direction |
| **v1.0** | **bootc-only**, kickstart deprecated, possibly migrate `ostreecontainer` → new `bootc` kickstart command if multi-disk + auth-registry blockers resolved upstream | `bootc upgrade` for all updates |
The Containerfile-from-scratch spike plan (Agent 3 of 2026-05-05
wave) is **superseded** by this hybrid: don't build a Containerfile
from scratch on `fedora-bootc:43`. Instead, write a BlueBuild recipe
on `securecore-kinoite-hardened-userns`. With `ostreecontainer`
swap, spike compresses 1 week → 1 day.
## Next concrete steps
### v0.5.32 — current (no strategy change)
Ship the 7 blockers from `docs/research/2026-05-05-agent-wave/`:
suspend/resume wifi fix, firstboot WantedBy, USBGuard id-rules,
firewalld tailscale0 zone, KMS modeset, /etc/skel branding, virtio-9p
log capture.
`ostreecontainer` swap **does NOT land in v0.5.32 main.** It belongs
in the v0.7 spike branch only.
### v0.7-spike (1 day, separate branch)
1. New repo dir: `bluebuild/recipe.yml`.
2. `from`: `ghcr.io/secureblue/securecore-kinoite-hardened-userns:latest`.
3. Override modules:
- `type: files` — stamp our `overlay/*` tree (branding, themes,
veilor scripts, sddm theme, plymouth theme).
- `type: rpm-ostree` — install Mullvad Browser + restore Xwayland +
re-enable sudo (revert run0).
- **Keep Trivalent** as default (was wrongly marked for removal in
the first draft of this doc).
- `type: brand` — PRETTY_NAME, GRUB_DISTRIBUTOR, distributor URL.
- `type: files` — pre-disabled `tailscale.service`, idle
`yggdrasil.service`, `ujust install-reticulum` and
`ujust install-thorium` recipes.
4. `.github/workflows/build-bluebuild.yml` — pull BlueBuild action,
build + cosign sign + push to GHCR.
5. `kickstart/install.ks` — replace `%packages` block with
`ostreecontainer --url=ghcr.io/veilor/veilor-os:43
--transport=registry`. Keep existing partitioning + LUKS block
verbatim. **Drop** all planned `veilor-firstboot-rebase.service`
work — no longer needed.
### v1.0 — bootc-only
- Drop `kickstart/veilor-os.ks`, drop `livecd-creator` workflow.
- Bootstrap ISO is built as a **separate artifact** (NOT via
`bootc-image-builder anaconda-iso`, which was deprecated in
image-builder v44).
- The OCI image is the source of truth.
- `veilor-update` becomes thin `bootc upgrade --apply` wrapper.
- Migrate `ostreecontainer` directive → new `bootc` kickstart
command IF multi-disk + authenticated-registry support has landed
upstream by then.
## Open questions
- Does secureblue accept upstream contributions? If yes, send our
USBGuard id-based-rules fix and our threat-model framework.
- Recovery flow when `ostreecontainer` install pass fails — Anaconda
should abort cleanly; verify in spike that no half-installed
state is bootable.
- Iroh 1.0 timing — currently 0.960.98 RC; Q1 2026 target slipped.
Re-evaluate Phase 2 schedule when 1.0 lands.
- RetiNet upstream stabilization — track Codeberg fork for releases.
If it stalls > 6 months we re-evaluate Layer 3.
- Fedora 44 transition: secureblue tracks Fedora releases (current
`v4.9` on F44). If we follow, we get F44 for free at the same time
upstream does.
## Self-hosted git + CI (locked 2026-05-05)
Primary git host moved off github.com. **Forgejo** runs on nullstone
at `git.s8n.ru`, with **forgejo-runner** doing the build work. GH free-
tier minute quota was hammering veilor-os iteration; we self-host now.
- Primary remote: `ssh://git@192.168.0.100:222/veilor-org/veilor-os.git`
(Forgejo, LAN-only until router port-forward 222 → nullstone:222
added — TODO; or use tailnet hostname once tailscale logged in).
- Public mirror: `https://github.com/veilor-org/veilor-os.git`. Forgejo
push-mirrors every commit + every 8h, so GH stays in sync without
consuming GH minutes.
- Runner labels: `ubuntu-24.04` (catthehacker image — works for our
current build-iso.yml unmodified) and `nullstone` (privileged Fedora
43 container — opt-in via `runs-on: nullstone`).
- Build cost: 0 GH minutes. Disk: ~80 GB workspace on /home/docker.
Deploy artifacts: `~/ai-lab/nullstone-server/forgejo/`. Runbook in same
dir.
## See also
- `docs/THREAT-MODEL.md` — drafted, needs publish for v0.7
- `docs/ROADMAP.md` — updated to reflect this strategy
- `docs/research/2026-05-05-agent-wave/03-bootc-spike-plan.md`
superseded by this hybrid (kept as reference for the
Containerfile-from-scratch alternative)
- secureblue: <https://github.com/secureblue/secureblue>
- BlueBuild: <https://blue-build.org>
- bootc / ostreecontainer docs: <https://docs.fedoraproject.org/en-US/bootc/>
- Yggdrasil: <https://github.com/yggdrasil-network/yggdrasil-go>
- Reticulum manual: <https://reticulum.network/manual/>
- Iroh blobs design: <https://github.com/n0-computer/iroh-blobs/blob/main/DESIGN.md>

157
docs/THREAT-MODEL.md Normal file
View file

@ -0,0 +1,157 @@
# Threat Model
> **Status:** Final for v0.7 public launch. Honest scope.
veilor-os is a hardened daily-driver desktop. Not a paranoia OS, not an
anonymity OS, not an isolation OS. This document exists so that
security-conscious developers, journalists, and activists can decide whether
the threat model fits their actual adversary before they trust the system.
If your adversary is on the "out of scope" list below, **use a different
tool**. veilor-os will not save you, and we will not pretend otherwise.
---
## In scope — what veilor-os defends against
Every row cites the file or setting that implements the mitigation, so the
claim is auditable from a clean checkout.
| Adversary / scenario | veilor-os mitigation |
|---|---|
| Lost or stolen laptop, powered off | LUKS2 `aes-xts-plain64` + `argon2id` (`mem=1 GiB`, `time=9`) on root LV; swap is `zram` only — no persistent key material on disk. Defined in `kickstart/veilor-os.ks` `part pv.veilor` block. |
| Generic browser / email malware (drive-by RCE, malicious attachment) | SELinux `enforcing` + targeted policy + custom `veilor-systemd.te` module (`scripts/selinux/`); sysctl knobs in `/etc/sysctl.d/99-veilor-hardening.conf`: `kernel.kptr_restrict=2`, `kernel.yama.ptrace_scope=2`, `kernel.perf_event_paranoid=3`, `net.core.bpf_jit_harden=2`, `kernel.randomize_va_space=2`, `fs.suid_dumpable=0`, `dev.tty.ldisc_autoload=0`. AppArmor profile skeletons in `scripts/apparmor/` for Trivalent/Thorium/lm-studio (opt-in, complain mode, hardens to enforce per profile). |
| Console-side USB attack (BadUSB, rubber ducky, juice-jack) | USBGuard daemon, `ImplicitPolicyTarget=block`, **id-based** rules in `/etc/usbguard/rules.conf` (vendor:product, not hash — survives dock replug). Empty allowlist on first boot; operator runs `usbguard generate-policy` after plugging trusted devices. |
| SSH brute-force / credential-stuffing | `/etc/ssh/sshd_config.d/10-veilor-hardening.conf`: `PasswordAuthentication no`, `PermitRootLogin no`, `AllowUsers admin`, `MaxAuthTries 3`, `X11Forwarding no`, `LogLevel VERBOSE`. `fail2ban` `sshd` + `pam-generic` jails (journald backend) ban via firewalld `rich-rule` action. |
| Post-incident forensics ("what happened?") | `auditd` rules in `/etc/audit/rules.d/99-veilor-hardening.rules` watch `/etc/{passwd,shadow,group,sudoers,sudoers.d,ssh/sshd_config*,selinux,firewalld,cron.*,sysctl.*,systemd/system}`, every privileged binary (`sudo`, `su`, `passwd`, `mount`, `pkexec`, …), `init_module`/`finit_module`/`delete_module` syscalls, and uid≥1000 perm/owner changes. Logs persist across reboot. |
| Supply-chain on the OS image itself | Secure Boot enforced (Fedora signed shim → GRUB → kernel). v0.7 adds cosign-signed OCI image at `ghcr.io/veilor/veilor-os:43`, GPG-signed ISO + sha256 + .asc, plus our own MOK for out-of-tree module signing. |
| Unprivileged local user attempting LPE | Root account locked (`passwd -l root`; `passwd -S root``L`); single `admin` user in `wheel`; `pwquality.conf` `minlen=14`, `minclass=4`, dictcheck on. Kernel `lockdown=integrity`, `slab_nomerge`, `init_on_alloc=1`, `init_on_free=1`, `randomize_kstack_offset=on`, `vsyscall=none` set in bootloader args. Module loading frozen 30 s after graphical boot via `veilor-modules-lock.service`. |
| Network-listening services as attack surface | `firewalld` default zone = `drop`; only `sshd` answers. `abrt*`, `cups`, `cups-browsed`, `geoclue`, `avahi-daemon`, `bluetooth`, `ModemManager`, `gssproxy`, `atd`, `pcscd.{socket,service}` are masked; `kdeconnectd` and `PackageKit` are removed at the package level. |
| Time-based MITM (back-dated certs, replay) | `chrony` with NTS authentication against `time.cloudflare.com` and `nts.sth1/2.ntp.se` (pool fallback only). `systemd-resolved` with DNS-over-TLS opportunistic, DNSSEC `allow-downgrade`, LLMNR off; resolvers Cloudflare 1.1.1.1 / 1.0.0.1, fallback Quad9 9.9.9.9 / 149.112.112.112. |
---
## Out of scope — what veilor-os does NOT defend against
These adversaries are unambiguously outside our scope. Pretending otherwise
gets people hurt. **If your adversary is on this list, pick a different tool.**
| Adversary / scenario | Why veilor-os doesn't help | Use instead |
|---|---|---|
| Firmware-level implant (UEFI, Intel ME, BMC, EC) | veilor-os does not protect against firmware implants. Secure Boot validates the OS chain only; we do not flash, audit, or sign firmware below GRUB. | Heads / coreboot on supported hardware. |
| Evil-maid attack on a running, unlocked system | LUKS master keys live in RAM while the system is up. A physically present attacker can dump RAM (cold-boot, Thunderbolt DMA, debug header) and recover them. | Power off when unattended. Disable Thunderbolt DMA in firmware. Qubes-in-a-Faraday-bag if you are that target. |
| Hardware keylogger / interposer between keyboard and machine | veilor-os is software. Software cannot detect a passive hardware tap. | Physical custody of the device. Tamper-evident seals. |
| Targeted RCE on the user session (browser 0-day, messenger exploit) | KDE Plasma is not sandboxed. A logged-in compromise owns the user's data and tokens. SELinux confines daemons; it does not confine the desktop session. | Qubes OS (per-app Xen VM isolation). |
| Side-channel attacks on AES (timing, cache, power, EM) | veilor-os ships stock kernel crypto. We provide no constant-time or power-analysis guarantees beyond what the kernel and CPU deliver. | Threat-specific HSM, air-gap. |
| Physical attack on a TPM2 chip (bus probe, glitch, decap) | veilor-os does not bind keys to TPM2 in v0.7. Even when binding lands post-v1.0, TPM2 is not anti-tamper hardware. | Off-device key custody (smartcard / YubiKey / OnlyKey). |
| Network-level traffic correlation / traffic analysis | All packets leave the box on the local IP. veilor-os does not onion-route. | Tails, Whonix, Tor. |
| Trust-on-first-use attacks (operator accepts a bad cert) | veilor-os cannot override the operator's explicit decisions. Bad SSL or SSH host-key acceptance is out of scope. | Enrolment policy, MDM, certificate pinning. |
| Adversary with sustained physical access and time | Given unlimited physical time and tools, any laptop falls. | Operational security, not OS choice. |
---
## Hardening tradeoffs (what you give up)
Hardening that breaks ordinary work gets called out, not hidden.
- **SELinux enforcing** — some apps (proprietary, out-of-tree) ship
without policy. Symptom: `EACCES` despite correct file perms.
Workaround: write a local policy module; do not switch to permissive.
- **LUKS2 argon2id (mem=1 GB / time=9)** — boot 530 s slower on older
CPUs. The cost of a passphrase that survives a GPU attacker.
- **USBGuard default-block** — every new device needs an explicit allow.
First-boot: plug trusted devices in, run `usbguard generate-policy`.
Forget this and your USB-C dock looks broken.
- **Module lockdown 30 s after graphical boot** — out-of-tree drivers
(NVIDIA proprietary, VirtualBox, out-of-tree wireguard) will fail.
Load early via initramfs or use the in-tree alternative.
- **firewalld zone = drop** — KDE Connect, mDNS printer discovery, SMB
browsing don't work until explicitly opened. This is the point.
- **No PackageKit / no Flatpak by default** — updates happen on your
terms via `dnf upgrade`.
---
## Where veilor-os IS like Tails / Whonix / Qubes
- Threat model published. Transparency about scope is the price of being
taken seriously.
- Default-deny firewall (`drop` zone, ssh inbound only).
- Encrypted at rest by default — LUKS2 + argon2id, no-disk-swap (zram).
## Where veilor-os DIFFERS
- **Daily-driver target.** Boot it once, install it, use it for years.
Not a session-only / amnesia OS.
- **Single-VM / single-kernel.** No per-app compartmentalisation. A
browser RCE owns your session. (See "out of scope".)
- **Persistent identity by design.** Your `~`, your keys, your shell
history persist. This is a feature for an operator, a misfeature for
an activist evading correlation.
---
## Comparison matrix
Scoring legend: `✓` shipped & on by default, `~` partial / opt-in,
`✗` not provided, `n/a` not applicable to that distro's model.
Project metrics are GitHub / Codeberg figures as of 2026-05.
| Axis | veilor-os | Stock Fedora KDE | Kicksecure | Tails | Qubes OS | secureblue | Athena OS |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Encrypted at rest by default** | ✓ (LUKS2 argon2id, mem=1 GiB) | ~ (optional in Anaconda) | ✓ | n/a (amnesic, session-only) | ✓ | ✓ | ~ (optional) |
| **MAC enforcing OOTB** | ✓ (SELinux + opt-in AppArmor) | ✓ (SELinux) | ✓ (AppArmor) | ✓ (AppArmor) | ✓ (per-VM) | ✓ (SELinux) | ✓ (AppArmor) |
| **Default-deny firewall** | ✓ (firewalld zone=drop) | ✗ | ✓ | ✓ (Tor-only) | ✓ | ✓ | ✓ |
| **USB default-block** | ✓ (USBGuard, id-rules) | ✗ | ✓ | ✓ | ✓ (sys-usb) | ✓ (USBGuard) | ✗ |
| **Per-app isolation (VM/sandbox)** | ✗ | ✗ | ✗ | ~ (AppArmor) | ✓ (Xen VMs) | ~ (Flatpak/bwrap) | ✗ |
| **Anonymity / Tor by default** | ✗ | ✗ | ✗ | ✓ | ~ (Whonix VMs) | ✗ | ✗ |
| **Daily driver target (persistent)** | ✓ | ✓ | ✓ | ✗ (amnesic) | ✓ (heavy, hardware-partitioning) | ✓ | ✓ |
| **Signed releases (cosign + GPG)** | ✓ (v0.7) | ✓ | ✓ | ✓ | ✓ | ✓ (cosign on OCI) | ~ (sha256 only) |
| **Threat model published** | ✓ (this doc) | ✗ | ✓ | ✓ | ✓ | ✗ | ✓ |
| **Hardware compatibility (laptops)** | ✓ (Fedora kernel) | ✓ | ~ | ~ (live USB) | ~ (Xen-pinned HCL) | ✓ | ✓ (Arch kernel) |
| **Project size (contributors / stars, 2026-05)** | solo / pre-public | n/a (Fedora-wide) | small team / ~600 | ~30 / ~3k | large / ~5k | ~30 / ~940, active monthly cadence | ~8 / ~1.4k |
---
## Where veilor-os fits
Pick veilor-os if your job is to write code, edit docs, manage
infrastructure, read mail, browse — and you want a desktop that won't
quietly betray you to a generic adversary while you do it. **You are the
user, not the target of a state.**
Pick **Tails** for amnesia and Tor by default. **Qubes** if you must assume
any app could be compromised. **Kicksecure** for similar hardening on
Debian. **secureblue** for a hardened atomic Fedora. **Stock Fedora KDE**
if you just want Fedora with no opinions.
---
## v0.7 public-launch checklist
These are the items that gate flipping the repo public and posting:
- [ ] Threat model finalised and published (this document).
- [ ] GPG-signed releases working (v0.4 dependency — ISO + sha256 + .asc).
- [ ] Reproducible build verifiable from clean checkout (v0.4).
- [ ] mkdocs-material (or Hugo) site live on `veilor.org`, generated from
`docs/`. INSTALL, HARDENING, BUILD, ROADMAP, RELEASE, THREAT-MODEL,
CONTRIBUTING all rendered.
- [ ] Comparison + benchmark numbers published (cold boot, idle RAM, idle
egress, suspend/resume) vs stock Fedora KDE.
- [ ] Press kit page: wallpapers, logo SVG, screenshots, feature
one-liner, signed quotes from early users.
- [ ] **"What veilor-os is not"** preempt page — direct link from launch
post. Answers "why not Qubes?", "why not Tails?", "why not just
stock Fedora?" so the first hundred comments don't have to.
- [ ] Comparison post drafted for **r/linux**, **r/Fedora**, **HN**.
Same body, three formats. Lead with the threat model link, not the
black wallpaper.
- [ ] CHANGELOG.md tagged at v0.7.0 release commit; GitHub Release
created with ISO + sha256 + .asc artefacts attached.
- [ ] Repo flipped to public, `veilor.org` DNS pointed at the docs site,
Mastodon / Matrix / SimpleX announcement queued.
---
*Last reviewed: v0.7 draft. Update every minor release.*

View file

@ -0,0 +1,109 @@
# Plymouth + LUKS unlock — real-hardware edge cases
**Agent 1 of 9-agent wave, 2026-05-05.**
## State at v0.5.31
- Live ISO cmdline pins `plymouth.enable=0 fbcon=nodefer`.
- Installed system uses Plymouth `details` theme.
- LUKS2 argon2id, no clevis / cryptenroll, no recovery key generation.
- `rd.vconsole.keymap=` not set.
## Findings
### 1. KMS / fbcon races
- **Symptom:** Black screen at LUKS prompt, cursor blinks, keystrokes
swallowed but never accepted.
- **Cause:** `i915` / `amdgpu` / `nvidia-drm` modeset fires *during*
plymouthd handover. With `plymouth.enable=0` we skip the splash but
the ask-password agent still opens `/dev/tty1`, which races `fbcon`
rebind.
- **Fix:** keep `fbcon=nodefer`, append
`nvidia-drm.modeset=1 i915.fastboot=0 amdgpu.dc=1` to bootloader.
NVIDIA Optimus killer is `nvidia-drm.modeset=1`.
- **Probability:** HIGH on Optimus, MED on AMD APU, LOW on Intel iGPU.
### 2. Plymouth theme choice — keep `details`
- `details` (kernel/systemd journal under prompt) is best for
blind-typing because the user sees `Please enter passphrase…` *as
text*, full echo as `*`.
- `text` is minimal fallback (no echo, no journal).
- `spinner` is the documented "endless loop, no prompt" failure mode
on real laptops (adi1090x/plymouth-themes#10, Arch BBS 296529).
- **No change.** But verify `plymouth-set-default-theme details`
actually ran post-install (Debian #986023 shows it silently fails
when initramfs rebuild is suppressed). Add `dracut --force
--regenerate-all` after the call.
### 3. Initramfs keymap — HIGH probability for non-US users
- **Symptom:** AZERTY/QWERTZ/Cyrillic user types correct passphrase,
gets "no key available". F43 ships en-US in initramfs by default.
- **Bugs:** RHBZ 1405539, RHBZ 1890085, fedora-silverblue#3.
- **Fix:** drop a placeholder `rd.vconsole.keymap=us` AND have
`firstboot.sh` rewrite it from `/etc/vconsole.conf` after the user
picks a layout. Also `/etc/dracut.conf.d/veilor-keymap.conf` with
`install_items+=" /etc/vconsole.conf "` so keymap is *baked* into
initramfs.
### 4. systemd-cryptsetup vs legacy `crypt` — F43 = systemd-cryptsetup
- F40+ unconditionally uses `systemd-cryptsetup@.service` from
`/etc/crypttab`. Old `rd.luks.uuid=` cmdline still parsed. Stable
through 6.x kernels. No change needed.
### 5. argon2id memory cost — MED on old laptops (<8 GB RAM)
- LUKS2 default = 1 GiB memory cost, `iter-time=2000 ms`. On
Core 2 Duo / Pentium-N this becomes 815s unlock + thrash.
Atom-class N4020: 30s+.
- **Fix in installer post-script:**
`cryptsetup luksConvertKey --pbkdf-memory 524288 --iter-time 2000`
— halves memory to 512 MiB, knocks ~50% off unlock latency.
### 6. TPM2 unlock — defer to v0.6
- F43 ships `systemd-cryptenroll --tpm2-device=auto` ([Fedora
Magazine](https://fedoramagazine.org/automatically-decrypt-your-disk-using-tpm2/)).
No clevis required.
- **v0.6 plan:** opt-in via `veilor-firstboot`
`systemd-cryptenroll --tpm2-pcrs=7+11`. PCR 7 (secure boot state)
+ 11 (kernel/initrd). Don't auto-enroll; PCR pinning is a footgun
on kernel updates.
### 7. FIDO2 unlock — v0.7
- `systemd-cryptenroll --fido2-device=auto` requires `libfido2` +
hmac-secret support. secureblue ships this. Add `libfido2` to
`%packages` + `veilor-fido2-enroll` wrapper.
### 8. Recovery key — MISSING, ship in v0.6
- Today: forgotten passphrase = brick.
- **Fix:** in `firstboot.sh` add
`cryptsetup luksAddKey --pbkdf argon2id /dev/X <(systemd-creds
setup --print-key | head -c 64)` and print the 64-char key once
to a numbered envelope-style screen. Mirrors macOS FileVault.
## Action items
| # | Change | Target |
|---|--------|--------|
| 1 | `nvidia-drm.modeset=1 i915.fastboot=0 amdgpu.dc=1 rd.vconsole.keymap=us` to bootloader append | v0.5.32 |
| 2 | `/etc/dracut.conf.d/veilor-keymap.conf` with `install_items+=" /etc/vconsole.conf "` | v0.5.32 |
| 3 | Force `dracut -f --regenerate-all` after `plymouth-set-default-theme details` | v0.5.32 |
| 4 | argon2id retune (`40-luks-tune.sh`) | v0.6 |
| 5 | Recovery-key generation in firstboot | v0.6 |
| 6 | TPM2 opt-in via `systemd-cryptenroll --tpm2-pcrs=7+11` | v0.6 |
| 7 | FIDO2 opt-in | v0.7 |
## Sources
- [LUKS keyboard layout — fedora-silverblue/issue-tracker#3](https://github.com/fedora-silverblue/issue-tracker/issues/3)
- [RHBZ 1405539 — keymap not honored on initramfs rebuild](https://bugzilla.redhat.com/show_bug.cgi?id=1405539)
- [RHBZ 1890085 — English keymap forced in initramfs](https://bugzilla.redhat.com/show_bug.cgi?id=1890085)
- [Fedora Magazine — TPM2 autodecrypt with systemd-cryptenroll](https://fedoramagazine.org/automatically-decrypt-your-disk-using-tpm2/)
- [Leo3418 — argon2id LUKS tuning](https://leo3418.github.io/collections/gentoo-config-luks2-grub-systemd/tune-parameters.html)
- [QubesOS#8600 — argon2id parameters](https://github.com/QubesOS/qubes-issues/issues/8600)

View file

@ -0,0 +1,117 @@
# SDDM + first-boot UX failure modes
**Agent 2 of 9-agent wave, 2026-05-05.**
## Findings
### 1. SDDM has no username prefilled — BLOCKS LOGIN (perceived)
- User sees blank greeter; no signal that the only user is `admin`.
- **Fix:** `/etc/sddm.conf.d/veilor.conf` add
`[Users]\nRememberLastUser=true` plus seed
`/var/lib/sddm/state.conf [Last]\nUser=admin\nSession=plasma`.
### 2. chage -d 0 + SDDM autologin race
- With `Relogin=false` (current), single-shot is safe.
- **Fix:** Document `Relogin=false`. Don't combine `Autologin=true`
with `chage -d 0`.
### 3. PAM expired-pw change inline in SDDM
- Plasma 6 SDDM 0.21+ renders the chain. **But** if password fails
pwquality (cracklib min=14 + complexity from
`10-harden-base.sh`), error text shown briefly then form resets —
user sees no clear reason for rejection.
- **Fix:** `/etc/security/pwquality.conf.d/10-veilor.conf` with
documented rules + Plasma startup notification showing them.
### 4. Wayland session start failure on virtio-vga — BLOCKS LOGIN
- KWin tries `wlroots`/DRM, fails to acquire `/dev/dri/card0` if
`virtio_gpu` kernel module not loaded.
- **Fix:** add `plasma-workspace-x11` to `%packages`. SDDM session
menu shows `Plasma (X11)` fallback.
### 5. Plasma 6 first-run wizards on /etc/skel-empty
- KWin compositor backend pick + Plasma welcome center + accent
colour wizard — modal stealing focus on first session.
- **Fix:** seed `/etc/skel/.config/`:
- `kwinrc` `[Compositing]\nBackend=OpenGL`
- `kdeglobals [General]\nAccentColor=...`
- `plasma-welcomerc [General]\nLastSeenVersion=99` (suppresses welcome)
### 6. SELinux relabel after first boot — looks like hang
- `touch /.autorelabel` triggers full restore on rootfs; 90s on
4 GB live install, 3-5min on real disk. User hard-resets thinking
it crashed → corrupted relabel state.
- **Fix:** replace with `veilor-relabel.service` that prints
`[veilor] relabeling SELinux file contexts (1/N): %s` to TTY1
with progress, plus one-time post-relabel KDialog notification.
### 7. USBGuard blocks input at SDDM — BLOCKS LOGIN on desktops
- If `/etc/usbguard/rules.conf` empty/missing, USBGuard
`ImplicitPolicyTarget=block` (default) blocks USB. SDDM running
but USB keyboard dead.
- **Fix:** ship a baseline `rules.conf`:
`allow with-interface equals { 03:00:* 03:01:* }`
(HID class) so any keyboard/mouse works pre-policy.
### 8. NetworkManager DHCP — LOW severity
- Wired auto-connects fine. Wi-Fi: silent failure unless SSID
preconfigured. Acceptable; Plasma 6 ships `plasma-nm` widget.
- **Polish:** `/etc/xdg/autostart/veilor-firstboot-net-check.desktop`
→ KDialog "Connect to network?" if `nmcli general` is `disconnected`.
### 9. veilor-firstboot.service ordering — BLOCKS LOGIN on real installs
- **Current:** `WantedBy=multi-user.target` only.
- **Real installs:** default to `graphical.target`, so unit never runs.
- Admin pw stays `veilor` + chage-expired. SDDM PAM bounces to
chauthtok screen — recoverable but ugly.
- **Fix:** `WantedBy=graphical.target multi-user.target`. Add
`Before=graphical.target`. Verify `systemctl enable
veilor-firstboot.service` (in installer line 884) resolves both.
Add `DefaultDependencies=no` + `Wants=systemd-vconsole-setup.service`.
## Endeavour OS welcome app — design notes for veilor-postinstall
EOS welcome (`endeavouros-team/welcome` on GitHub) is bash + yad,
~3000 LOC. Patterns to lift for veilor:
- **Yad GTK dialog** as runtime (single binary dep). veilor (KDE)
uses `kdialog` + `qmlscene` instead — native Plasma look.
- **Tabbed layout:** Welcome | Set up apps | Security | System info | Shortcuts.
- **Self-disabling autostart:**
`~/.config/autostart/veilor-welcome.desktop` removed after user
clicks "Don't show again".
- **External script dispatch:**
`/usr/share/veilor-os/postinstall/<step>.sh` per step. Decouples
UI from actions.
- **Update channel awareness:** pull from
`github.com/veilor-org/veilor-os` releases atom feed; show CVE
advisories from `security.atom` we publish.
**Recommended stack:**
- `/usr/bin/veilor-welcome` (bash entrypoint, ≤300 LOC)
- `/usr/share/veilor-os/postinstall/welcome.qml` (QtQuick/Kirigami UI)
- `/usr/share/veilor-os/postinstall/steps/{01-account,02-network,03-usbguard-policy,04-update,05-tour}.sh`
- `/etc/xdg/autostart/veilor-welcome.desktop`
- Replace current `scripts/firstboot.sh` placeholder with
`step 03-usbguard-policy` (auto-generate-policy is the unfinished
core item).
## Top three to ship next (highest UX impact, lowest risk)
1. **`WantedBy=graphical.target multi-user.target`** in
`veilor-firstboot.service` — fixes silent SDDM-PAM-chauthtok
bounce on real installs.
2. **Username prefill** in `sddm.conf.d/veilor.conf`: add `[Users]
RememberLastUser=true` + `/var/lib/sddm/state.conf [Last]
User=admin Session=plasma`.
3. **USBGuard HID baseline `rules.conf`** — un-bricks any desktop
with USB keyboard.

View file

@ -0,0 +1,158 @@
# bootc-image-builder spike plan — 1-week timebox
**Agent 3 of 9-agent wave, 2026-05-05.** Schedule: v0.7.
## Containerfile draft
```dockerfile
# veilor-os bootc image — Fedora 43 KDE base
FROM quay.io/fedora/fedora-bootc:43
ARG VEILOR_VERSION=0.6.0
RUN dnf install -y --setopt=install_weak_deps=False \
@kde-desktop-environment @kde-apps @core @hardware-support @standard \
kernel-modules kernel-modules-extra glibc-all-langpacks \
grub2-efi-x64 grub2-efi-x64-modules grub2-pc grub2-pc-modules \
grub2-tools grub2-tools-extra shim-x64 efibootmgr \
newt parted cryptsetup lvm2 btrfs-progs \
fail2ban fail2ban-firewalld usbguard usbguard-tools audit \
policycoreutils-python-utils tuned chrony firewalld plymouth \
git vim-enhanced tmux htop podman skopeo \
NetworkManager NetworkManager-wifi \
fontconfig freetype fira-code-fonts \
zram-generator \
&& dnf remove -y --noautoremove \
'abrt*' snapd kde-connect open-vm-tools-desktop mlocate man-db man-pages \
&& dnf clean all && rm -rf /var/cache/dnf
ARG GUM_VERSION=0.17.0
ARG GUM_SHA256=69ee169bd6387331928864e94d47ed01ef649fbfe875baed1bbf27b5377a6fdb
ADD https://github.com/charmbracelet/gum/releases/download/v${GUM_VERSION}/gum_${GUM_VERSION}_Linux_x86_64.tar.gz /tmp/gum.tgz
RUN echo "${GUM_SHA256} /tmp/gum.tgz" | sha256sum -c - \
&& tar -xzf /tmp/gum.tgz -C /tmp \
&& install -m0755 /tmp/gum_${GUM_VERSION}_Linux_x86_64/gum /usr/bin/gum
COPY overlay/ /
COPY assets/ /usr/share/veilor-os/assets/
COPY scripts/ /usr/share/veilor-os/scripts/
RUN bash /usr/share/veilor-os/scripts/10-harden-base.sh \
&& bash /usr/share/veilor-os/scripts/20-harden-kernel.sh \
&& bash /usr/share/veilor-os/scripts/selinux/build-policy.sh \
&& bash /usr/share/veilor-os/scripts/kde-theme-apply.sh \
&& bash /usr/share/veilor-os/scripts/30-apply-v03-theme.sh
RUN plymouth-set-default-theme details \
&& sed -i \
-e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \
/etc/default/grub
# bootc kargs go in /usr/lib/bootc/kargs.d/, not /etc/default/grub
RUN mkdir -p /usr/lib/bootc/kargs.d && cat > /usr/lib/bootc/kargs.d/10-veilor-hardening.toml <<'EOF'
kargs = [
"lockdown=integrity",
"slab_nomerge",
"init_on_alloc=1",
"init_on_free=1",
"randomize_kstack_offset=on",
"vsyscall=none",
"fbcon=nodefer",
]
EOF
RUN systemctl enable sshd fail2ban usbguard tuned auditd firewalld chronyd sddm \
veilor-firstboot.service veilor-modules-lock.service \
&& passwd -l root \
&& systemctl set-default graphical.target
RUN bootc container lint
LABEL org.veilor.version=${VEILOR_VERSION}
```
## bootc-image-builder config (`build/disk-config.toml`)
```toml
[customizations]
hostname = "veilor-os"
[[customizations.user]]
name = "admin"
password = "veilor"
groups = ["wheel"]
shell = "/bin/bash"
[customizations.kernel]
append = "lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
[customizations.installer.kickstart]
contents = """
zerombr
clearpart --all --initlabel
part /boot/efi --fstype=efi --size=600
part /boot --fstype=ext4 --size=1024
part btrfs.veilor --grow --encrypted --luks-version=luks2 --pbkdf=argon2id
btrfs none --label=veilor btrfs.veilor
btrfs / --subvol --name=root LABEL=veilor
btrfs /home --subvol --name=home LABEL=veilor
"""
```
## GitHub Actions workflow
`build-bootc-iso.yml`:
- runs-on ubuntu-24.04, **timeout 30 min** (vs 90 for livecd-creator)
- permissions: `contents: write`, `packages: write`
- Build OCI image: `podman build` + `podman push ghcr.io/veilor/veilor-os:43`
- Build ISO via `quay.io/centos-bootc/bootc-image-builder:latest`
with `--type anaconda-iso --rootfs btrfs --config /build/disk-config.toml`
- Reuse split + `softprops/action-gh-release@v2` from existing workflow
## Migration risks (10-row table)
| # | Risk | Severity | Mitigation |
|---|------|----------|------------|
| 1 | %post --nochroot overlay-copy disappears | Low | `COPY overlay/ /` is simpler — win |
| 2 | Update model: `bootc upgrade` (image swap) replaces `dnf upgrade` | High | `veilor-update` becomes thin `bootc upgrade --apply` wrapper |
| 3 | /usr is read-only at runtime | Medium | etc-overlay handles /etc writes; relocate any /usr writers to /etc or build-time |
| 4 | SELinux module compilation in container | Medium | Works in fedora-bootc:43 (verified per upstream pattern). Test spike day 2 |
| 5 | `transaction_progress.py` patch unnecessary | Low | bootc-image-builder doesn't use dnf at install. Drop the patch. Win |
| 6 | `rd.luks.uuid` is anaconda's job again | Low | Removes ~80 lines of fragile sed/grubby code. Win |
| 7 | LUKS prompt UX: anaconda native, not gum | High | gum installer becomes `live·shell` only. v1.0 install = anaconda's native UI |
| 8 | --privileged still required | None | Same as today |
| 9 | OCI image size: ~3.5 GB compressed vs ~2.8 GB squashfs | Low | zstd:max recovers ~400 MB |
| 10 | `kernel-install` BLS: `/etc/kernel/cmdline` not honored, `/usr/lib/bootc/kargs.d/*.toml` is | Medium | Already addressed in Containerfile draft |
## What we keep (zero churn)
- `overlay/*` — copied verbatim by `COPY overlay/ /`
- `scripts/*.sh` — invoked verbatim by Containerfile RUN
- `assets/*` — copied verbatim
- `test/*` — adapts: `podman run --rm -it ghcr.io/veilor/veilor-os:43 /bin/bash` smoke; QEMU ISO test unchanged
- `kickstart/install.ks` — kept as fallback. Tag last anaconda build as `v0.5.99-anaconda` before flipping
## Spike success criteria (1 week)
| Day | Milestone |
|-----|-----------|
| 1 | Containerfile builds clean (`podman build` exit 0, `bootc container lint` exit 0) |
| 2 | `podman run` boots into image, KDE binaries present, SELinux + hardening sysctls applied |
| 3 | bootc-image-builder produces installer ISO from OCI, ksvalidator clean |
| 4 | ISO boots in QEMU to anaconda live menu |
| 5 | Install completes, LUKS single-prompt, btrfs subvols present |
| 6 | First boot reaches SDDM, admin login works, password-change-on-first-login enforced |
| 7 | Buffer for fixes; doc `docs/BUILD-bootc.md`; tag `v0.5.99-anaconda` snapshot |
## Decision gate
- **PASS** (all 7 criteria green): tag `v0.5.99-anaconda` as last-anaconda;
merge `bootc-spike``main` as `v0.6.0-bootc`; deprecate
`kickstart/veilor-os.ks` (keep `kickstart/install.ks` for one cycle).
Update ROADMAP: v1.0 ships bootc-only.
- **FAIL** (any of risks 3, 4, 7, 10 unfixable in week 1): keep
anaconda path, defer migration to v1.1+; file each blocker as GH
issue with reproducer.
- **HYBRID FALLBACK**: ship anaconda ISO for v0.6/v0.7, ship bootc OCI
alongside (matches existing `veilor-atomic` stretch goal).

View file

@ -0,0 +1,125 @@
# Hardening tier 2 — concrete plan
**Agent 4 of 9-agent wave, 2026-05-05.**
## Repo state already in tree
- `scripts/apparmor/` ships **3 profiles** (`thorium`, `veilor-power`,
`lm-studio`) — complain-mode, **not auto-loaded**. No browser/mail
/Element profile.
- `scripts/selinux/` ships custom `.te` modules — primary MAC.
- `overlay/etc/audit/plugins.d/veilor-remote.conf` +
`audisp-remote.conf.disabled` — **scaffold present, opt-in switch
missing**.
- `kickstart/veilor-os.ks` — single live-ks. Real LUKS install lives
in `overlay/usr/local/bin/veilor-installer` (generates ks at runtime).
- No nftables overlay. No homed scaffold. No `veilor-audit-shipping` CLI.
## Item-by-item plan
### 1. AppArmor stack with SELinux — M
Fedora 43 ships `apparmor-parser`/`libapparmor`. Kernel has both LSMs.
Stacking works since 5.1; SELinux stays primary, AppArmor confines
specific binaries by path. **No conflict** — they layer. Risk: AA
profiles based on Debian/Ubuntu paths fail on Fedora.
**Files:**
- `kickstart/veilor-os.ks` `%packages` add `apparmor-parser apparmor-utils apparmor-profiles`
- `overlay/etc/apparmor.d/veilor.d/` (new) — vendor profiles
`firefox`, `thunderbird`, `element-desktop`, `signal-desktop`
- `scripts/40-apparmor.sh` (new) — parses + sets all veilor profiles
to **complain** on first install (logs only, no break)
- `overlay/usr/local/bin/veilor-doctor` — adds AA status check
**Test:** `aa-status | grep complain` shows >=4 loaded; firefox writes
outside policy → audit.log denial.
### 2. systemd-homed opt-in — L
Default LUKS storage `homectl` drops key on suspend; resume needs PAM
unlock again — **breaks "lid open, keep working"**. Use
`--storage=fscrypt` on top of existing btrfs `/home` subvol —
suspend transparent, encrypts at rest with per-user key.
**Files:**
- `overlay/usr/local/bin/veilor-homed-enable` (new) — confirms warning,
runs `homectl create admin --storage=fscrypt --real-name="veilor admin"`
after migrating files
- `overlay/etc/pam.d/sddm` drop-in for `pam_systemd_home.so`
- doc in `docs/HARDENING.md`. **Not auto-run** — only via post-install.
### 3. nftables alongside firewalld — S
firewalld speaks nftables backend on F43 — they don't conflict;
firewalld owns `inet firewalld` table. veilor-os preset = separate
`inet veilor` table loaded by its own service.
**Files:**
- `overlay/etc/nftables/veilor.nft` (new) — table `inet veilor`:
ssh per-IP rate limit (5/min), ICMP rate limit, optional
`ip6 daddr ::/0 drop` toggled by sysctl-style `/etc/veilor/ipv6.disabled`,
anti-port-scan via `meter` set
- `overlay/etc/systemd/system/veilor-nftables.service` (new) —
`After=firewalld.service`
- `kickstart/veilor-os.ks` `%packages` add `nftables`, services-enabled
add `veilor-nftables`
**Test:** `nft list ruleset` shows both `firewalld` AND `veilor`;
`hping3 -S -p 22 --flood` from second VM gets rate-limited.
### 4. Audit log shipping — S
Plumbing **already in tree** (`audisp-remote.conf.disabled`,
`veilor-remote.conf` with `active=no`). What's missing: CLI to flip
the switch with cert pinning.
**Files:**
- `overlay/usr/local/bin/veilor-audit-shipping` (new)
- `enable HOST PORT FINGERPRINT` writes
`/etc/veilor/audit-pin.sha256`, copies `audisp-remote.conf.disabled`
`audisp-remote.conf` with substituted host/port, enables plugin
(`active=yes`), restarts auditd
- `disable` reverses
- audisp-remote speaks TLS directly; cert pinning via `verify_peer=yes`
+ `peer_cert_fingerprint`
- Use **self-signed pinned**, not LE — collectors are LAN/VPN
**Test:** stand up `rsyslog` listener on nullstone with self-signed
cert; run helper; trigger `sudo -i`; tail nullstone for AUTHPRIV
event; revoke cert → events stop with logged TLS error.
### 5. Installer kickstart split — needs re-scope, S
Roadmap item is **stale**. As of v0.5.30 we already do real LUKS+btrfs
in `veilor-installer` which generates ks at runtime. **Re-scope:**
extract that generated ks template into static
`kickstart/veilor-os-install.ks` (parameterised via `%include
/tmp/answers.ks`), so reviewable in repo and reusable headlessly.
**Files:**
- split `overlay/usr/local/bin/veilor-installer` heredoc into
`kickstart/veilor-os-install.ks`
- installer just writes answers + `cp` the ks
- CI lints both with `ksvalidator`
### 6. Audit baseline re-run — S
Mechanical: `cp security/audit-template.md
security/veilor-os-distro/2026-05-DD.md`, run on VM, target lower
findings count than v0.2's baseline.
## Order, dependencies, ship plan
Dependencies: (5) blocks (6) — audit a stable installer, not a
moving heredoc. Else parallel.
**Total effort:** 2S + 1S(rescope) + 1S + 1M + 1L ≈ **57 dev-days**.
- **v0.5.32 (small wins):** (4) audit shipping CLI + (3) nftables
preset. Both S, scaffold completion, pure overlay (no kickstart risk).
- **v0.5.33:** (5) ks split + (6) audit baseline re-run.
- **v0.6 (medium):** (1) AppArmor stack — package install + 4 profiles
+ doctor integration; complain-mode keeps blast radius zero.
- **v0.7 (big lift):** (2) systemd-homed — UX-disruptive, needs
migration helper + doc page + suspend/lock/swap testing.

View file

@ -0,0 +1,65 @@
# Threat model + public launch prep
**Agent 5 of 9-agent wave, 2026-05-05.**
## Deliverable
Threat model written to `docs/THREAT-MODEL.md` (1492 words). Slots
into `docs/ROADMAP.md` v0.7 line item "Threat model published —
honest scope".
## Structure
1. **In-scope adversaries** (9 rows): lost laptop, browser RCE, USB
attacks, SSH brute-force, forensics, supply chain, LPE, network
surface, time MITM. Each maps to specific veilor mitigation
(LUKS2 argon2id mem=1GB, SELinux + `veilor-systemd` policy,
USBGuard, fail2ban+firewalld, auditd, NTS chrony, etc.).
2. **Out-of-scope adversaries** (9 rows): firmware implants,
evil-maid on running system, hardware keylogger, session-level
RCE (KDE not sandboxed), AES side-channels, TPM2 physical
attacks, traffic correlation, TOFU MITM, sustained physical
access. Each row points to right tool instead (Heads, Qubes,
Tails).
3. **Hardening tradeoffs** (6 honest costs):
- SELinux app-compat
- Slow LUKS boot
- USBGuard friction
- Module lockdown breaking NVIDIA prop / VBox
- Drop-zone breaking KDE Connect / mDNS
- No PackageKit
4. **Like Tails/Whonix/Qubes:** published threat model, default-deny
firewall, encrypted at rest.
5. **Differs from them:** daily-driver vs session-only; single-VM vs
Qubes compartmentalisation; persistent identity vs Tails amnesia.
6. **Comparison matrix:** 10-axis × 6-distro grid (veilor-os / stock
Fedora KDE / Kicksecure / Tails / Qubes / secureblue) covering
encryption, MAC, firewall, USB, per-app isolation, anonymity,
daily-driver fit, signed releases, threat-model publication,
hardware compat.
7. **v0.7 launch checklist** (9 items):
- Threat model finalised
- GPG signing (v0.4 dep)
- mkdocs-material on veilor.org
- Comparison + benchmarks
- Press kit
- "What veilor-os is not" preempt page (covers "why not Qubes/Tails/Fedora?")
- r/linux + r/Fedora + HN posts
- GitHub Release with ISO+sha256+.asc
- Repo flip-public + DNS + Mastodon/Matrix/SimpleX announce
## Tone
Matches repo voice — short paragraphs, no fluff, "honest scope"
framing reused from roadmap. No emojis (per CLAUDE.md style).
## See also
- `docs/THREAT-MODEL.md` (full document)
- `docs/ROADMAP.md` v0.7 section

View file

@ -0,0 +1,96 @@
# Anaconda log capture — virtio-9p host-share
**Agent 6 of 9-agent wave, 2026-05-05.**
## Why current setup is silent
v0.5.30 wired:
```
-chardev file,id=anaclog,path=$ANACONDA_LOG
-device virtio-serial-pci,id=vs1
-device virtserialport,chardev=anaclog,bus=vs1.0,name=org.fedoraproject.anaconda.log.0
```
Anaconda is supposed to autodetect this port and stream logs. Result:
`test/anaconda-vm-*.log` files are 0 bytes despite multiple full
installs.
**Root cause:** Anaconda's `setupVirtio()` (anaconda_logging.py:315)
doesn't write to the virtio port directly — it adds a forward rule to
`/etc/rsyslog.conf` then calls `restart_service("rsyslog")`. No
`inst.virtiolog` boot arg is required (`--virtiolog` defaults to the
right port via `argument_parsing.py:512`).
The veilor live ISO almost certainly **lacks `rsyslog`** (minimal
Fedora ks), so the forward rule lands in a file no daemon reads.
`restart_service` is a no-op. The QEMU side opens the port and
creates the 0-byte file but nothing ever writes to it.
Even with rsyslog present, only `LOG_LOCAL1`-tagged messages would
flow; the rich content lives in `/tmp/anaconda.log`,
`/tmp/program.log`, `/tmp/storage.log`, `/tmp/packaging.log` which
never traverse syslog.
## Fix — Option C (virtio-9p host-share + post-install copy)
### `test/run-vm.sh`
Add `-virtfs` 9p export of `test/test-runs/<timestamp>/` tagged
`hostlogs`. Keep existing virtio-serial as belt-and-braces fallback.
```bash
TS=$(date +%Y%m%d-%H%M%S)
HOSTLOGS_DIR="$TEST_DIR/test-runs/$TS"
mkdir -p "$HOSTLOGS_DIR"
HOSTSHARE_ARGS=(
-virtfs "local,path=$HOSTLOGS_DIR,mount_tag=hostlogs,security_model=mapped-xattr,id=hostshare"
)
echo " Logs : $HOSTLOGS_DIR"
```
Append `"${HOSTSHARE_ARGS[@]}" \` to the `exec qemu-system-x86_64`
block.
### `overlay/usr/local/bin/veilor-installer`
In `run_install()`, install an `EXIT` trap calling `_dump_logs_to_host`
that mounts the 9p share at `/mnt/hostlogs` and copies:
- `/tmp/{anaconda,program,storage,packaging,dnf,dnf.librepo,anaconda-cmdline}.log`
- `/var/log/veilor-installer.log`
- generated kickstart at `/run/install/veilor-generated.ks`
- `dmesg` output
- `journalctl -b` output
Runs on success, failure, and `^C`. Auto-no-ops on real hardware
where 9p isn't loaded.
```bash
_dump_logs_to_host() {
if mount -t 9p -o trans=virtio,version=9p2000.L hostlogs /mnt/hostlogs 2>/dev/null; then
cp -a /tmp/{anaconda,program,storage,packaging,dnf,dnf.librepo,anaconda-cmdline}.log \
/var/log/veilor-installer.log \
/run/install/veilor-generated.ks \
/mnt/hostlogs/ 2>/dev/null || true
dmesg > /mnt/hostlogs/dmesg.log 2>/dev/null || true
journalctl -b > /mnt/hostlogs/journal.log 2>/dev/null || true
umount /mnt/hostlogs 2>/dev/null || true
fi
}
trap _dump_logs_to_host EXIT
```
## Why options A/B/D were rejected
- **A** (grub kernel arg surgery — `inst.virtiolog`) and **D** (host
rsyslog TCP listener with `inst.syslog=10.0.2.2:5140`) both still
rely on rsyslog being present in the live ISO.
- **B** (anaconda --syslog at CLI) — same dependency.
- **C** captures complete file-level fidelity regardless. virtio-9p is
in the kernel; mount is two lines; copies the actual files.
## Files modified
- `test/run-vm.sh`
- `overlay/usr/local/bin/veilor-installer`

View file

@ -0,0 +1,100 @@
# KDE theme + DuckSans + /etc/skel branding audit
**Agent 7 of 9-agent wave, 2026-05-05.**
## Catalog: what's currently shipped
| Component | Status | Path |
|---|---|---|
| Color scheme | shipped | `assets/kde/veilor-black.colors``/usr/share/color-schemes/` |
| System kdeglobals | shipped | `assets/kde/veilor-default.kdeglobals``/etc/xdg/kdedefaults/kdeglobals` |
| Breeze decoration override | shipped | `assets/kde/breezerc``/etc/xdg/breezerc` |
| Plasma containment defaults | shipped | written by `30-apply-v03-theme.sh``/etc/xdg/kdedefaults/plasma-org.kde.plasma.desktop-appletsrc` |
| Wallpaper (PNG+SVG) | shipped | `assets/wallpapers/veilor-black.{png,svg}``/usr/share/wallpapers/veilor-black/` |
| SDDM theme | shipped (full QML) | `assets/sddm/veilor-black/``/usr/share/sddm/themes/veilor-black/` |
| SDDM theme activation | shipped | `30-apply-v03-theme.sh` writes `/etc/sddm.conf.d/veilor-theme.conf` (Current=veilor-black) |
| Konsole profile + colorscheme | shipped | `assets/konsole/veilor.{profile,colorscheme}``/usr/share/konsole/Veilor.*` + `/etc/xdg/konsolerc` |
| Plymouth theme | shipped | `assets/plymouth/veilor/` |
| os-release branding | shipped | PRETTY_NAME="veilor-os 0.5.27", LOGO=veilor-logo |
| Fira Code fontconfig | shipped | `/etc/fonts/conf.d/55-veilor-firacode.conf` |
| DuckSans font | DEFERRED — empty dir, README only | |
## Drift inside active configs
- `overlay/etc/sddm.conf.d/veilor.conf` sets `[Theme] Current=breeze`.
- `30-apply-v03-theme.sh` then writes
`/etc/sddm.conf.d/veilor-theme.conf` with `Current=veilor-black`.
- SDDM merges alphabetically → `veilor-theme.conf` wins (loads after).
- Shipping a `Current=breeze` line in the overlay is misleading drift.
## Specific gaps preventing visual brand consistency
1. **No `/etc/skel/` whatsoever.** `overlay/etc/skel/` does not exist.
All KDE config lives in `/etc/xdg/kdedefaults/` and `/etc/xdg/*rc`.
Works for fresh boots, but the moment the user clicks anything in
System Settings, KDE writes `~/.config/kdeglobals` and silently
shadows the system defaults. **Zero per-user seeding** = one click
away from losing all branding.
2. **No PRETTY_NAME secondaries.** `/etc/system-release`, `/etc/issue`,
`/etc/issue.net`, `/etc/lsb-release` never written. `lsb_release
-a` reports Fedora. KDE About dialog uses os-release (OK) but TTY
login banner + many user-space tools read `/etc/system-release`.
3. **No `kwinrc` shipped.** Plasma 6 Wayland-specific defaults
(TitlebarDoubleClick, Compositor backend, FocusPolicy, animation
speed) not seeded. Vanilla Fedora KDE animations + click-to-focus
prevail.
4. **No panel layout** (`plasma-org.kde.plasma.desktop-appletsrc`
containment for panel). The file written by `30-apply-v03-theme.sh`
only seeds `[Containments][1]` (desktop containment) for wallpaper.
Actual Plasma panel containment (taskbar, system tray, clock,
kickoff icon) is unseeded → users get stock Fedora panel with
Fedora-blue kickoff button.
5. **DuckSans deferred but README claims it as the brand font.**
`kdeglobals`, Konsole, SDDM all hardcode `Fira Code`. If DuckSans
ever ships, ten files need synchronized edits.
6. **`overlay/etc/sddm.conf.d/veilor.conf` says `Current=breeze`** —
internal contradiction with script-written `veilor-theme.conf`.
Cosmetic but confusing.
7. **`kde-theme-apply.sh` has `warn()` undefined** (line 64) — calls
`warn` but only `ok`/`info` defined. If os-release source ever
goes missing, script crashes with `command not found`.
## Top 5 `/etc/skel/` additions (highest impact, lowest effort)
1. **`/etc/skel/.config/kdeglobals`** — copy of
`assets/kde/veilor-default.kdeglobals`. Single highest-impact file:
locks ColorScheme, AccentColor, Font, Icons.Theme,
LookAndFeelPackage into the user's first-write file so System
Settings interaction won't revert anything to Breeze defaults.
2. **`/etc/skel/.config/konsolerc`** — `[Desktop Entry]
DefaultProfile=Veilor.profile` plus `[KonsoleWindow]
ShowMenuBarByDefault=false`. Per-user override of system konsolerc;
ensures first konsole launch is branded even if user's home
pre-exists.
3. **`/etc/skel/.config/kwinrc`** — Plasma 6 Wayland defaults:
`[Compositing] AnimationSpeed=0`, `[Windows]
FocusPolicy=ClickToFocus`, `[Plugins] blurEnabled=false` (mirrors
the no-animations Breeze override).
4. **`/etc/skel/.config/plasma-org.kde.plasma.desktop-appletsrc`** —
full containment file with both desktop containment
(wallpaper=veilor-black) AND panel containment (kickoff icon =
`/usr/share/pixmaps/veilor-logo.svg`, panel height/position).
Without this, the taskbar is vanilla Fedora.
5. **`/etc/skel/.local/share/konsole/Veilor.profile`** — local copy so
user-local konsole sees the profile in its dropdown without needing
`/usr/share/konsole/` walk. Pair with #2.
**Bonus near-zero-effort:** write `/etc/system-release`, `/etc/issue`,
and `/etc/lsb-release` in `kde-theme-apply.sh` to close the
lsb_release/TTY-banner gap. And fix the undefined `warn()` in
`kde-theme-apply.sh:64`.

View file

@ -0,0 +1,131 @@
# Build-iso CI hardening
**Agent 8 of 9-agent wave, 2026-05-05.**
## State of play
- Workflows: `build-iso.yml`, `lint.yml`, `Release Checksums` (auto)
- Secrets/variables: **none configured** — only ambient `GITHUB_TOKEN`
- Repo: private, MIT, no Pages, no Dependabot, no branch protection
(Pro-gated until public flip)
- Container: `registry.fedoraproject.org/fedora:43` (tag, not digest)
- Actions: `actions/checkout@v4`, `addnab/docker-run-action@v3`,
`softprops/action-gh-release@v2`, `ludeeus/action-shellcheck@master`
— **all unpinned to SHA**
- gum download: pinned by SHA256 ✓
- Kickstart repos: `releases/43/Everything` + `updates/43/Everything`
**both rolling**, byte-different daily
## Top 5 immediate (S effort, ship in v0.5.32)
| # | Item | Why |
|---|------|-----|
| 1 | Pin all actions to commit SHA + add `.github/dependabot.yml` for `github-actions` | Supply-chain — `@master` on shellcheck is live-takeover vector; v3/v4 tags are mutable |
| 2 | Pin Fedora container to digest (`registry.fedoraproject.org/fedora:43@sha256:...`) | One-line change; eliminates "container drift" repro class |
| 3 | Add `permissions:` block at workflow level (`contents: read` default), override per-job | `contents: write` is workflow-wide; least-privilege the lint job |
| 4 | Generate SBOM via `anchore/sbom-action`, attach to release | Free, ~30 lines, journalist-readable |
| 5 | Add `actions/attest-build-provenance@v2` for SLSA L3 attestation on ISO + parts | Free, GH-native, `id-token: write` only |
## v0.4 release-eng roadmap (confirmed/added)
- **Confirmed:** Sigstore/cosign signing of ISOs (already in roadmap)
- **Add:** Fedora compose-ID pinning per release tag — switch
`--baseurl` to
`kojipkgs.fedoraproject.org/compose/branched/Fedora-43-...n.X/compose/Everything/x86_64/os/`
for stable releases (rolling for `ci-latest`)
- **Add:** Reproducible-Builds.org diffoscope job comparing 2
sequential builds of same SHA — gate on byte-equality
- **Add:** `harden-runner` (StepSecurity) audit-mode pass to enumerate
egress; promote to block-mode in v0.5
- **Add:** When repo flips public (v0.7), enable secret scanning + push
protection + private vuln reporting + branch protection (require ≥1
review, status checks: lint + ksvalidate + build, no force-push)
- **Add:** OIDC `id-token: write` only in tag-release job (not on
`main` push) — keysless cosign signing scoped to release events
## YAML diffs
### 1. Workflow-level permissions + per-job override
```yaml
permissions:
contents: read
jobs:
build:
permissions:
contents: write # gh-release
id-token: write # cosign keyless + attestation
attestations: write
```
### 2. SHA-pin actions
```yaml
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- uses: addnab/docker-run-action@4f65375b03d588f307b7a3b0a8bb50f8b58a85b9 # v3
- uses: softprops/action-gh-release@01570a1f39cb168c169c802c3bceb9e93fb10974 # v2.1.0
```
(SHAs to be re-checked at apply-time; dependabot keeps them current)
### 3. Pin Fedora digest
```yaml
image: registry.fedoraproject.org/fedora:43@sha256:<DIGEST>
```
Capture once via `skopeo inspect --raw
docker://registry.fedoraproject.org/fedora:43 | jq -r .config.digest`
and bump on each releasever bump.
### 4. SBOM + attestation + cosign
```yaml
- name: Install cosign
uses: sigstore/cosign-installer@d7d6e07a3ddf0f9a4f8b3b9e3f1d1a5ce8e9b5b3 # v3.7.0
- name: Sign ISO parts (keyless)
if: github.event_name == 'release'
run: |
cd build/out
for f in *.part-*; do cosign sign-blob --yes "$f" \
--output-signature "$f.sig" --output-certificate "$f.pem"; done
- name: Generate SBOM (SPDX)
uses: anchore/sbom-action@e8d2a6937ecead383dfe75190d104edd1f9c5751 # v0.17.4
with:
path: build/out
format: spdx-json
output-file: build/out/veilor-os.spdx.json
- name: Build provenance attestation
uses: actions/attest-build-provenance@7668571508540a607bdfd90a87a560489fe372eb # v2.1.0
with:
subject-path: 'build/out/*.part-*'
```
### 5. New `.github/dependabot.yml`
```yaml
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule: { interval: "weekly" }
groups:
actions: { patterns: ["*"] }
```
### 6. Timeout
Keep at 90min. Largest observed runs ~70min; trimming would
false-fail Fedora-mirror-slow days. **No change.**
## Q&A
- **Secrets in use:** none. Only ambient `GITHUB_TOKEN`. Once public,
enable secret scanning + push protection (free for public repos).
- **Pages:** not deployed from this repo. Docs site out-of-scope here.
- **Dependency review:** only `gum` fetched out-of-band — already
SHA256-pinned. Add `actions/dependency-review-action` on PRs once public.

View file

@ -0,0 +1,167 @@
# Real-hardware failure mode audit (post-v0.5.31)
**Agent 9 of 9-agent wave, 2026-05-05.** Pessimistic enumeration.
## A. Boot path
### A1. Secure Boot + GRUB_DISTRIBUTOR rebrand
- shim chain itself untouched (uses `/EFI/fedora/`), but `grub2-mkconfig`
regenerates entries naming `veilor-os` while shim only trusts paths
under `/EFI/fedora/grubx64.efi`. Strict UEFI: menu boots, kernel
signatures verify via Fedora's MOK chain. Risk: `os-prober` writing
dual-boot Windows entries breaks MBR/MOK.
- **Symptom:** dual-boot with Windows shows
`Verification failed: (0x1A) Security Violation`.
- **Prob:** MED. **Fix:** S. **Target:** v0.5.32.
### A2. KMS handoff — `fbcon=nodefer` necessary but not sufficient
- On Intel Arc/iGPU late-gen + NVIDIA proprietary chains, 5-15s blank
between vt switch and SDDM start because `simpledrm` releases before
`i915`/`nvidia-drm` claim.
- **Symptom:** ~10s blank pre-SDDM; user thinks crashed.
- **Prob:** HIGH. **Fix:** S — add `i915.modeset=1
nvidia-drm.modeset=1 amdgpu.modeset=1`. SDDM `Type=simple` startup.
**Target:** v0.5.32.
### A3. USBGuard hash-based rules
- `scripts/20-harden-kernel.sh:127-131` ships **empty** rules.conf
with `ImplicitPolicyTarget=block`. First boot, admin runs
`usbguard generate-policy`. Per `feedback_usbguard_dock.md`, this
writes hash+parent-hash rules that break on dock replug.
- **Symptom:** keyboard/mouse dies on first dock unplug-replug.
- **Prob:** HIGH. **Fix:** M — patch invocation to
`--with-hash=false`, or ship `veilor-usbguard-enroll` wrapper.
**Target:** v0.5.32 (same bug we already learned).
### A4. Wifi/Bluetooth firmware
- `@hardware-support` pulls `linux-firmware` etc.
- Realtek RTL8852/MT7921 firmware ships in `linux-firmware-whence` only.
- **Prob:** LOW. **Fix:** S (add explicit `linux-firmware-whence`).
**Target:** v0.5.32.
### A5. Bluetooth disabled at boot
- `scripts/20-harden-kernel.sh:111` disables `bluetooth` service.
BT keyboards/mice don't pair until user enables service.
- **Prob:** MED (laptop users). **Fix:** S — leave bluetooth.service
enabled, mask `obex` only. **Target:** v0.6.
## B. First-boot KDE session
### B1. Plasma 6 Wayland fallback on hybrid graphics
- SDDM config doesn't pin session. NVIDIA Optimus + intel-iris
triggers Wayland → silent fallback to X11 on some HW.
- **Symptom:** screen tearing, no fractional scaling.
- **Prob:** MED. **Fix:** S — add `[Autologin] Session=plasma`
+ `[General] DefaultSession=plasma.desktop`. **Target:** v0.6.
### B2. SUSPEND/RESUME KILLS WIFI — THE BIG ONE 🚨
- `veilor-modules-lock.service` sets `kernel.modules_disabled=1` 30s
after graphical.target. `iwlwifi`, `iwlmvm`, `cfg80211` reload on
resume from S3/S0ix. With modules locked: **resume → permanent
wifi death until reboot**. Same for `nvidia` autoload, `xhci_pci`
re-init on dock attach.
- **Symptom:** close laptop lid → reopen → no wifi, no dock USB,
until reboot.
- **Prob:** VERY HIGH (every laptop user, day 1).
- **Fix:** M — gate lock on `ConditionACPower=true` + reset on
suspend, OR move from `modules_disabled` to `module.sig_enforce=1`
kernel cmdline (no runtime lock needed).
- **Target:** v0.5.32 — **BLOCKER**.
### B3. Lid-close handling
- `logind.conf` not modified. Defaults `HandleLidSwitch=suspend`.
Combined with B2, every lid close = wifi loss.
- **Prob:** HIGH. **Fix:** S. **Target:** v0.5.32 (paired with B2).
## C. Day-2 ops
### C1. `/etc/default/grub` + `/etc/kernel/cmdline` drift
- Kickstart writes `GRUB_CMDLINE_LINUX_DEFAULT=""`. Real installer
writes `/etc/kernel/cmdline` with LUKS rd.luks args. `kernel-install`
reads the latter; `grub2-mkconfig` re-reads `/etc/default/grub`.
- **Symptom:** `dnf upgrade kernel` regenerates grub.cfg from
default/grub, drops LUKS unlock args from new entry → unbootable.
- **Prob:** HIGH. **Fix:** M — sync both files in `veilor-update`,
or migrate fully to BLS without grub-mkconfig.
- **Target:** v0.5.32.
### C2. SELinux relabel on first boot
- `firstboot.sh` flips to `enforcing` and `touch /.autorelabel`. On
large /home (encrypted btrfs), relabel takes 2-5min — user sees
frozen screen with cursor.
- **Symptom:** stuck "first boot" appears hung.
- **Prob:** MED. **Fix:** S (add plymouth message). **Target:** v0.6.
### C3. F44 upgrade
- Hardcoded `python3.14` path (kickstart:334) for transaction_progress.py
patch. Survives no upgrade.
- **Prob:** certainty by Nov 2026. **Fix:** M. **Target:** v0.6.
### C4. chrony NTS unreachable from corp networks
- Cloudflare NTS over UDP 4460 blocked by many corp firewalls.
chronyd will fail-stop sync.
- **Symptom:** clock skew → TLS failures → broken everything.
- **Prob:** MED. **Fix:** S (add fallback `pool` line — already
present, verify ordering). **Target:** v0.5.32.
## D. Networking
### D1. firewalld drop zone vs Tailscale 🚨
- `tailscale up` requires UDP 41641 + tailscale0 trusted. Default
`drop` zone blocks tailscale0.
- **Prob:** HIGH (this user uses Tailscale daily).
- **Fix:** S — ship `/etc/firewalld/zones/trusted.xml` with
`tailscale0` interface.
- **Target:** v0.5.32.
### D2. systemd-resolved DoT vs corp split-DNS
- No /etc/resolved.conf.d entries shipped (overlay dir empty).
- Corp internal hostnames fail.
- **Prob:** LOW. **Fix:** M. **Target:** v0.7.
## E. Hardware diversity
### E1. NVMe vs SATA LUKS perf
- Argon2id KDF tuned to memory, not IO.
- **Prob:** cosmetic. Skip.
### E2. ARM aarch64
- Out of scope for v0.5/0.6.
### E3. TPM2 unlock
- Already on roadmap. **Target:** v0.7.
## Top 10 ranked (prob × severity)
| # | Issue | Prob | Sev | Target |
|---|-------|------|-----|--------|
| 1 | **B2 Suspend/resume wifi death** (modules_disabled) | VHIGH | CRITICAL | v0.5.32 |
| 2 | **C1 kernel-upgrade grub drift** (LUKS args lost) | HIGH | CRITICAL | v0.5.32 |
| 3 | **A3 USBGuard hash rules** (dock replug) | HIGH | HIGH | v0.5.32 |
| 4 | **D1 firewalld blocks tailscale0** | HIGH | HIGH | v0.5.32 |
| 5 | **A2 KMS blank-screen 10s** | HIGH | MED | v0.5.32 |
| 6 | **B3 Lid-close suspend** (compounds B2) | HIGH | MED | v0.5.32 |
| 7 | **A1 Secure Boot + os-prober dual-boot** | MED | HIGH | v0.6 |
| 8 | **C4 NTS blocked corp** | MED | MED | v0.5.32 |
| 9 | **B1 Plasma Wayland fallback** | MED | MED | v0.6 |
| 10 | **C3 F44 path-pinned patch** | CERTAIN | LOW (Nov) | v0.6 |
## Top 5 to preempt in v0.5.32
1. **B2 modules-lock vs resume** — gate on no-pending-suspend, OR swap
to `module.sig_enforce=1` kernel cmdline.
2. **C1 cmdline drift** — make `veilor-update` fail-loud if
`/etc/kernel/cmdline` and `/etc/default/grub` diverge; regen BLS
on every kernel install.
3. **A3 USBGuard id-based rules**`veilor-usb-enroll` wrapper that
calls `usbguard generate-policy --with-hash=false`. Same fix that
already burned us on onyx.
4. **D1 Tailscale zone** — ship `/etc/firewalld/zones/trusted.xml`
listing `tailscale0`, plus NetworkManager dispatcher to assign it.
5. **A2 KMS handoff** — append `i915.modeset=1 amdgpu.modeset=1
nvidia-drm.modeset=1` to bootloader cmdline.
**Critical insight:** B2 alone bricks the laptop for any user who
closes their lid. Without that fix, v0.5.32 is shippable on desktops
only. Same architectural class as the LUKS bug — security feature
breaks legitimate kernel state transitions.

View file

@ -0,0 +1,42 @@
# 9-agent research wave — 2026-05-05
Deep-dive research wave kicked off after v0.5.31 ship to surface every
plausible failure mode + future bug class before the v0.7 public flex.
Each agent took ~15 min, returned a focused report. Findings indexed
here, full reports in this directory.
The findings already inform `docs/ROADMAP.md` (Lessons learned section
+ v0.5.32 / v0.6 / v0.7 reorder) and `docs/THREAT-MODEL.md` (drafted
by Agent 5).
| # | Topic | File | Key finding |
|---|---|---|---|
| 1 | Plymouth + LUKS real-hardware edge cases | [01-plymouth-luks-real-hardware.md](01-plymouth-luks-real-hardware.md) | Initramfs keymap missing breaks non-US users at LUKS prompt |
| 2 | SDDM + first-boot UX failure modes | [02-sddm-firstboot-ux.md](02-sddm-firstboot-ux.md) | `veilor-firstboot.service` `WantedBy=multi-user.target` only — silently doesn't run on real installs (graphical target) |
| 3 | bootc-image-builder spike plan | [03-bootc-spike-plan.md](03-bootc-spike-plan.md) | Full Containerfile draft + 1-week timebox; v0.7 schedule |
| 4 | Hardening tier 2 (AppArmor + nftables + audit + homed) | [04-hardening-tier-2.md](04-hardening-tier-2.md) | nftables + audit log shipping = S effort each, ship in v0.5.32 |
| 5 | Threat model + public launch prep | [05-threat-model-launch.md](05-threat-model-launch.md) | Drafted at `docs/THREAT-MODEL.md`. Honest in/out scope tables |
| 6 | Anaconda log virtio-serial silent fix | [06-anaconda-log-capture.md](06-anaconda-log-capture.md) | virtio-serial requires rsyslog (not in our live ISO). Switch to virtio-9p host-share with EXIT trap copy |
| 7 | KDE theme + DuckSans + /etc/skel branding | [07-kde-skel-branding.md](07-kde-skel-branding.md) | `/etc/skel/` doesn't exist; branding evaporates the moment user opens System Settings |
| 8 | Build-iso CI hardening | [08-ci-hardening.md](08-ci-hardening.md) | Pin actions to SHA, dependabot, SBOM, SLSA L3 attestation — all S effort |
| 9 | Real-hardware failure mode audit | [09-realhw-failure-modes.md](09-realhw-failure-modes.md) | **CRITICAL: `kernel.modules_disabled=1` kills wifi on suspend/resume.** Top blocker for v0.5.32 |
## Top blockers for next ship (v0.5.32)
Cross-referenced by severity × probability:
1. **Suspend/resume wifi death** (Agent 9) — every laptop bricks on lid-close
2. **veilor-firstboot.service WantedBy=graphical.target** (Agent 2) — login broken on real installs
3. **kernel-upgrade grub drift** (Agent 9) — first `dnf upgrade kernel` = unbootable
4. **USBGuard hash-rules problem** (Agent 9, mirrors `feedback_usbguard_dock.md`)
5. **firewalld blocks tailscale0** (Agent 9) — user uses tailscale daily
6. **/etc/skel/ empty → no per-user branding** (Agent 7)
7. **virtio-9p log capture** (Agent 6) — replaces broken virtio-serial path
## Research wave protocol
This wave validated the `wave + verifier` pattern from v0.5.31 fix
(per ROADMAP lessons learned #4). Multi-agent debug only produces
signal when one agent's findings are checked against another's;
9 parallel agents on distinct topics gave independent angles that
converged on the v0.5.32 blocker list above.

View file

@ -51,7 +51,14 @@ user --name=admin --groups=wheel --gecos="veilor admin" --password="" --plaintex
# Note: init_on_alloc/init_on_free removed from default live cmdline —
# they zero every memory page at boot which 5x'd KVM live boot time.
# Re-enable per-install via veilor-firstboot.service for production.
bootloader --location=mbr --append="lockdown=integrity slab_nomerge randomize_kstack_offset=on vsyscall=none"
# `fbcon=nodefer` keeps the linux framebuffer console alive across the
# KMS modeset that intel/amdgpu/nvidia drivers do during userspace init.
# Without it, on real hardware the screen blanks the moment the GPU
# driver loads and the installer's tty1 redraw lands on a frozen
# framebuffer — symptom: black screen with blinking cursor for ~30s
# while the menu IS in fact rendered, just not painted. virtio-vga in
# QEMU doesn't trigger this so it never reproed in VM.
bootloader --location=mbr --append="lockdown=integrity module.sig_enforce=1 slab_nomerge randomize_kstack_offset=on vsyscall=none plymouth.enable=0 fbcon=nodefer i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1 rd.vconsole.keymap=us"
# ── Live ISO partitioning (flat — for live rootfs build only) ──
# NOTE: This is the *live* image kickstart. Final installed system uses
@ -112,6 +119,11 @@ chrony
firewalld
plymouth
# AppArmor stack — DEFERRED. apparmor-parser / apparmor-utils /
# apparmor-profiles are not in Fedora 43 base or updates. v0.6 ships
# without AppArmor; tier-2 plan to land via COPR or as part of the v0.7
# secureblue OCI hybrid (which has its own LSM stack).
# admin essentials
git
vim-enhanced
@ -249,6 +261,16 @@ ln -sf /usr/lib/systemd/system/sddm.service /etc/systemd/system/display-manager.
# Real installs land on graphical.target by default (set by anaconda).
systemctl set-default multi-user.target
# Branding: GRUB menu title + plymouth `details` text theme (no graphical
# splash). Pure text-scroll boot exposes the gum installer immediately on
# tty1 instead of plymouth swallowing it.
sed -i \
-e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \
-e 's|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=""|' \
/etc/default/grub 2>/dev/null || true
plymouth-set-default-theme details 2>/dev/null || true
[ -f /boot/grub2/grub.cfg ] && grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true
# zram swap (no disk swap; keys never leak to platter)
dnf install -y zram-generator || true
cat > /etc/systemd/zram-generator.conf << 'EOF'
@ -257,10 +279,115 @@ zram-size = min(ram, 8192)
compression-algorithm = zstd
EOF
# Patch anaconda's transaction_progress.py inside the live rootfs so that
# when the user clicks "Install", a non-fatal RPM 6.0 *scriptlet* warning
# does not get escalated to "An error occurred during the transaction"
# and abort.
#
# This patch is NARROW — it overrides ONLY the `script_error` callback,
# not the consumer (`process_transaction_progress`). v0.5.28 had a broad
# patch that turned EVERY 'error' token into a warning, including
# `cpio_error` (payload corruption) and `unpack_error` (extraction
# failures). Side effect: silent grub2-efi-x64 scriptlet failure →
# /boot/efi/EFI/fedora/ left incomplete → `gen_grub_cfgstub` failed at
# the bootloader install phase. Narrowing eliminates that class of
# silent failure.
#
# Why a patch is needed at all: Fedora 43 ships RPM 6.0, which changed
# scriptlet failure propagation (Fedora wiki Changes/RPM-6.0; dnf5 issue
# 2507). Scriptlets that previously emitted "Non-critical error"
# warnings now bubble up as transaction-level errors. man-db's
# `transfiletriggerin` (`systemd-run /usr/bin/systemctl start
# man-db-cache-update`) is the most common trigger — non-zero in the
# anaconda chroot, RPM-6.0-aware dnf5 reports as error, anaconda
# --cmdline aborts.
#
# After the patch:
# - script_error → log warning, do NOT enqueue 'error' (transaction
# continues; specific package's posttrans whose result we ignore is
# already in the install set, scriptlet has run as far as it can).
# - cpio_error / unpack_error / generic error → unchanged, still
# raise PayloadInstallationError as anaconda intends. Real
# transaction-fatal events still abort install (good).
# Patch anaconda's transaction_progress.py to suppress dnf5's
# transaction-error escalation under RPM 6.0 + cmdline mode.
#
# History of this patch:
#
# v0.5.28: BROAD patch — overrode `process_transaction_progress` so all
# four 'error' token producers (cpio_error, script_error, unpack_error,
# generic error) became log warnings. man-db scriptlet stopped killing
# the install. BUT silent grub2-efi-x64 scriptlet failure left
# /boot/efi/EFI/fedora/ incomplete → gen_grub_cfgstub failed.
#
# v0.5.29: NARROW patch — overrode only `script_error` callback. Caught
# the per-package scriptlet failures cleanly. BUT dnf5 still tracks
# its own internal error counter and emits a final aggregate
# `error("transaction process has ended with errors..")` at end of
# transaction, which still raised PayloadInstallationError. Install
# aborted before bootloader install ran.
#
# v0.5.30: BROAD patch + bootloader --location=none in install ks.
# This time we silence the aggregate error too, so install completes,
# but anaconda is told NOT to install bootloader itself. The
# generated install ks's chroot %post does it explicitly via
# `dnf reinstall grub2-efi-x64 shim-x64 + grub2-install +
# grub2-mkconfig + efibootmgr`. The chroot has PID 1 systemd state
# from the live ISO (not the target), so scriptlets get a real
# environment to run in, not anaconda's truncated chroot. This
# sidesteps gen_grub_cfgstub entirely.
TP=/usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/transaction_progress.py
if [ -f "$TP" ]; then
cp -a "$TP" "${TP}.veilor-bak"
# Replace the entire `elif token == 'error':` branch with log+continue.
# Pattern matches the original two-line block (log.error + raise).
python3 - "$TP" <<'PYEOF'
import sys, re
path = sys.argv[1]
src = open(path).read()
# Match: elif token == 'error':\n log.error(msg)\n raise PayloadInstallationError(...)
# Or any current substitution that looks like raise/log.warning at that level.
new = re.sub(
r"elif token == 'error':\n log\.error\(msg\)\n (?:raise PayloadInstallationError\(\"An error occurred during the transaction: \" \+ msg\)|log\.warning\(\"veilor: ignoring non-fatal transaction error: %s\", msg\))",
"elif token == 'error':\n log.warning('veilor: suppressed dnf5 transaction error (RPM 6.0 cmdline regression): %s', msg)\n # Do not raise — anaconda --cmdline + dnf5 + RPM 6.0 emits this for any scriptlet\n # failure; we handle bootloader install manually in install ks %post chroot",
src,
count=1,
)
if new == src:
# Try fresh-anaconda layout (no veilor patch yet)
new = re.sub(
r"elif token == 'error':\n log\.error\(msg\)\n raise PayloadInstallationError\(\"An error occurred during the transaction: \" \+ msg\)",
"elif token == 'error':\n log.warning('veilor: suppressed dnf5 transaction error: %s', msg)",
src,
count=1,
)
if new == src:
print("[ERR] transaction_progress.py error-branch not found")
sys.exit(1)
open(path, "w").write(new)
print("[OK] transaction_progress.py: broad error-branch suppressed")
PYEOF
if grep -q "veilor: suppressed dnf5 transaction error" "$TP"; then
rm -f /usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/__pycache__/transaction_progress.*.pyc 2>/dev/null || true
echo "[OK] anaconda transaction_progress.py patched (broad error suppression)"
else
echo "[WARN] transaction_progress.py patch did not apply"
fi
else
echo "[WARN] transaction_progress.py not found at expected path"
fi
# Enable services
systemctl enable veilor-firstboot.service
# veilor-firstboot.service NOT enabled on live ISO — it prompts admin pw
# which makes no sense on a live boot. Real installs enable it in their
# generated kickstart's chroot %post (see overlay/usr/local/bin/veilor-installer).
systemctl enable veilor-modules-lock.service
systemctl enable sshd fail2ban usbguard tuned auditd firewalld chronyd
# Mask veilor-firstboot on live so even if it landed in /etc/systemd/system
# (overlay drag), it can't activate.
systemctl mask veilor-firstboot.service 2>/dev/null || true
# Default tuned profile = balanced (AC/battery udev rule will override)
tuned-adm profile veilor-balanced 2>/dev/null || true

View file

@ -0,0 +1,11 @@
# veilor-os AppArmor profile stub — firefox
#
# v0.6 scope: marker only. Loads in complain mode via scripts/40-apparmor.sh
# so AppArmor can log the syscall surface for v0.7 policy authoring. No
# actual confinement rules yet — full policy is post-v0.6.
#include <tunables/global>
profile veilor-firefox /usr/lib*/firefox/firefox flags=(complain) {
#include <abstractions/base>
}

View file

@ -0,0 +1,11 @@
# veilor-os AppArmor profile stub — thunderbird
#
# v0.6 scope: marker only. Loads in complain mode via scripts/40-apparmor.sh
# so AppArmor can log the syscall surface for v0.7 policy authoring. No
# actual confinement rules yet — full policy is post-v0.6.
#include <tunables/global>
profile veilor-thunderbird /usr/lib*/thunderbird/thunderbird flags=(complain) {
#include <abstractions/base>
}

View file

@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- veilor-os: trusted zone with tailscale0 pre-bound.
Default zone stays drop (per 10-harden-base.sh). Tailscale's
interface is added here so `tailscale up` traffic isn't dropped.
Without this entry the firewalld drop zone blocks the tailnet
traffic and the user sees: "tailscale up succeeded, but I can't
reach hs.s8n.ru". (Agent 9, 2026-05-05 wave.) -->
<zone target="ACCEPT">
<short>Trusted</short>
<description>All network connections are accepted. veilor-os pre-binds tailscale0 here so the mesh layer-1 (Tailscale via Headscale) works out-of-box without manual firewalld zone juggling.</description>
<interface name="tailscale0"/>
</zone>

View file

@ -1,9 +1,9 @@
NAME="veilor-os"
PRETTY_NAME="veilor-os 0.1"
PRETTY_NAME="veilor-os 0.5.27"
ID=veilor
ID_LIKE=fedora
VERSION="0.1"
VERSION_ID="0.1"
VERSION="0.5.27"
VERSION_ID="0.5.27"
HOME_URL="https://github.com/veilor-org/veilor-os"
DOCUMENTATION_URL="https://github.com/veilor-org/veilor-os/tree/main/docs"
BUG_REPORT_URL="https://github.com/veilor-org/veilor-os/issues"

View file

@ -0,0 +1,60 @@
# veilor-os — Breeze window decoration override
# Tighter borders, solid black title bar, minimal buttons, smallest border.
# Merged into /etc/xdg/breezerc (system default) by 30-apply-v03-theme.sh.
[Common]
# Tighter outline; subtle separator only when active.
OutlineCloseButton=false
ShadowSize=ShadowSmall
ShadowStrength=128
ShadowColor=0,0,0
[Windeco]
# Border thickness: smallest available (= "None" leaves only resize edge,
# "NoSides" keeps top/bottom only). We pick "None" for the tightest look,
# matching the black-on-black aesthetic.
BorderSize=None
ButtonSize=ButtonSmall
CloseButton=true
DrawBackgroundGradient=false
DrawBorderOnMaximizedWindows=false
DrawSizeGrip=false
DrawTitleBarSeparator=false
ExceptionType=0
HideTitleBar=false
OpaqueTitleBar=true
TitleAlignment=AlignCenter
UseBackgroundGradient=false
UseTitleBarColor=true
# Buttons: minimal — close / max / min only, no shade/help/keep-above.
ButtonsOnLeft=M
ButtonsOnRight=IAX
[Style]
# Disable per-app blur, transparency, and gradient effects.
MenuOpacity=100
WindowDragMode=1
ScrollBarAddLineButtons=0
ScrollBarSubLineButtons=0
SidePanelDrawFrame=false
SliderDrawTickMarks=false
TabBarDrawCenteredTabs=true
ToolBarDrawItemSeparator=false
DockWidgetDrawFrame=false
ProgressBarAnimated=false
AnimationsEnabled=false
StackedWidgetDrawFrame=false
# ── Active / inactive title bar colors (override Breeze defaults) ──
# kdeglobals [WM] section is the canonical source; these mirror it here
# so apps that only read breezerc see consistent values.
[Windeco][Active]
TitleBarColor=0,0,0
TitleBarTextColor=216,216,216
TitleBarBorderColor=104,107,111
[Windeco][Inactive]
TitleBarColor=15,17,18
TitleBarTextColor=161,169,177
TitleBarBorderColor=42,46,50

View file

@ -0,0 +1,29 @@
[General]
ColorScheme=veilor-black
Name=veilor black
AccentColor=104,107,111
LastUsedCustomAccentColor=104,107,111
font=Fira Code,11,-1,5,400,0,0,0,0,0,0,0,0,0,0,1
fixed=Fira Code,10,-1,5,400,0,0,0,0,0,0,0,0,0,0,1
menuFont=Fira Code,11,-1,5,400,0,0,0,0,0,0,0,0,0,0,1
smallestReadableFont=Fira Code,9,-1,5,400,0,0,0,0,0,0,0,0,0,0,1
toolBarFont=Fira Code,10,-1,5,400,0,0,0,0,0,0,0,0,0,0,1
[Icons]
Theme=breeze-dark
[KDE]
LookAndFeelPackage=org.kde.breezedark.desktop
SingleClick=false
contrast=4
widgetStyle=Breeze
[Mouse]
cursorTheme=Breeze_Light
cursorSize=24
[KDecoration]
theme=Breeze
ButtonsOnLeft=
ButtonsOnRight=IAX
BorderSize=None

View file

@ -0,0 +1,10 @@
# veilor-os Konsole default — branded profile pre-selected so first
# Konsole launch on a fresh user account opens the veilor profile,
# not Fedora's default white-bg Breeze.
[Desktop Entry]
DefaultProfile=Veilor.profile
[KonsoleWindow]
ShowMenuBarByDefault=false
RememberWindowSize=true

View file

@ -0,0 +1,29 @@
# veilor-os Plasma 6 kwin defaults — seeded into /etc/skel so first-login
# users inherit deliberate windowing behaviour rather than default Breeze
# animations. Per-user; user can override post-login.
[Compositing]
# OpenGL backend = standard for hardware accel; AnimationSpeed=0 = no
# slow window animations on every focus change.
Backend=OpenGL
AnimationSpeed=0
HiddenPreviews=5
LatencyPolicy=Low
WindowsBlockCompositing=true
[Plugins]
# Disable visual fluff that isn't security-relevant + costs perf.
blurEnabled=false
contrastEnabled=false
slideEnabled=false
slidingpopupsEnabled=false
fadeEnabled=false
zoomEnabled=true
[Windows]
FocusPolicy=ClickToFocus
RollOverDesktops=false
TitlebarDoubleClickCommand=Maximize
[Wayland]
InputMethod=

View file

@ -0,0 +1,104 @@
[General]
Anchor=0.5,0.5
Blur=false
ColorRandomization=false
Description=Veilor
FillStyle=Tile
Opacity=1
Wallpaper=
WallpaperFlipType=NoFlip
WallpaperOpacity=1
[Background]
Color=0,0,0
[BackgroundFaint]
Color=0,0,0
[BackgroundIntense]
Color=15,17,18
[Foreground]
Color=216,216,216
[ForegroundFaint]
Color=161,169,177
[ForegroundIntense]
Color=236,236,236
# ── Standard ANSI palette (muted, desaturated) ──
# Veilor aesthetic: no neon. Reds tone-shifted toward bordeaux, greens
# toward sage, blues toward slate. Greys lifted to remain readable.
[Color0]
Color=27,27,27
[Color0Faint]
Color=20,20,20
[Color0Intense]
Color=58,58,58
[Color1]
Color=176,55,69
[Color1Faint]
Color=130,40,52
[Color1Intense]
Color=205,87,99
[Color2]
Color=102,138,90
[Color2Faint]
Color=78,107,68
[Color2Intense]
Color=141,176,128
[Color3]
Color=185,158,98
[Color3Faint]
Color=140,118,72
[Color3Intense]
Color=216,193,134
[Color4]
Color=92,116,143
[Color4Faint]
Color=68,87,107
[Color4Intense]
Color=131,154,182
[Color5]
Color=141,113,150
[Color5Faint]
Color=104,84,112
[Color5Intense]
Color=176,148,186
[Color6]
Color=99,144,148
[Color6Faint]
Color=72,107,110
[Color6Intense]
Color=139,180,184
[Color7]
Color=200,200,200
[Color7Faint]
Color=161,169,177
[Color7Intense]
Color=236,236,236

View file

@ -0,0 +1,55 @@
[General]
Name=Veilor
Parent=FALLBACK/
Command=/bin/bash
Directory=
Icon=utilities-terminal
LocalTabTitleFormat=%w
RemoteTabTitleFormat=(%u) %h
ShowTerminalSizeHint=false
StartInCurrentSessionDir=true
TerminalCenter=false
TerminalMargin=4
[Appearance]
ColorScheme=Veilor
Font=Fira Code,11,-1,5,400,0,0,0,0,0,0,0,0,0,0,1
LineSpacing=1
UseFontLineCharacters=true
[Cursor Options]
CursorShape=0
UseCustomCursorColor=true
CustomCursorColor=104,107,111
CustomCursorTextColor=216,216,216
[Scrolling]
HistoryMode=2
HistorySize=10000
ScrollBarPosition=2
HighlightScrolledLines=false
[Terminal Features]
BellMode=3
BlinkingCursorEnabled=false
BlinkingTextEnabled=false
FlowControlEnabled=false
UrlHintsModifiers=67108864
ReverseUrlHints=false
VerticalLine=false
[Interaction Options]
AutoCopySelectedText=false
CopyTextAsHTML=false
TrimLeadingSpacesInSelectedText=false
TrimTrailingSpacesInSelectedText=true
UnderlineFilesEnabled=true
UnderlineLinksEnabled=true
OpenLinksByDirectClickEnabled=false
WordCharacters=:@-./_~?&=%+#
[Encoding Options]
DefaultEncoding=UTF-8
[Keyboard]
KeyBindings=default

View file

@ -14,3 +14,8 @@ LoginGraceTime 30
MaxAuthTries 3
MaxSessions 4
LogLevel VERBOSE
# UseDNS off: reverse-lookup-on-connect adds 30s+ delay per connection
# when client DNS doesn't resolve back-references (NAT, slirp, dynamic
# IPs). Hardening doesn't benefit from the lookup either way; sshd
# still logs the IP, just not the (possibly forged) PTR.
UseDNS no

View file

@ -18,4 +18,9 @@ TTYReset=yes
TTYVHangup=yes
[Install]
WantedBy=multi-user.target
# Real installs default to graphical.target. Without this entry the unit
# never runs on installed systems — admin pw stays at install-time value
# + chage -d 0 expired, SDDM PAM bounces user to a chauthtok screen
# (recoverable but ugly). Live ISO + multi-user.target installs both
# resolve via this list. (Agent 2, 2026-05-05 wave.)
WantedBy=graphical.target multi-user.target

View file

@ -0,0 +1,43 @@
# veilor-os USBGuard baseline rules
#
# Default policy is `block` (set in usbguard-daemon.conf via the
# overlay). Without any allow rule, every USB device — including the
# user's keyboard — is blocked at boot. That includes the desktop
# user with a USB keyboard at SDDM.
#
# This file allows HID-class interfaces (keyboard, mouse, touchpad,
# fingerprint reader, NFC, gamepad) without pinning to specific
# vendor:product/serial/hash. id-based rules survive dock replug and
# vendor-bump kernel changes, where hash+parent-hash rules don't —
# verified pain on onyx (memory: feedback_usbguard_dock.md). Same fix.
#
# After first login, the user runs:
# ujust veilor-usbguard-enroll
# (or `usbguard generate-policy --with-hash=false > rules.conf`)
# to add their own keyboard's id-rule and tighten the policy further.
#
# References:
# - usbguard-rules.conf(5)
# - https://usbguard.github.io/documentation/rule-language.html
# - veilor-os agent 9 audit, 2026-05-05
# HID class — keyboards, mice, pointers, gamepads, fingerprint, NFC.
# Interface descriptor 03:NN:NN where 03=HID. We accept any HID
# subclass + protocol so the rule is robust to future HID variants.
allow with-interface match-all { 03:*:* }
# Mass-storage prompt: ask the user before mounting a new flash drive.
# Reject blanket-allow (would silently allow USB Rubber Ducky).
# Accept only after user confirms via the gnome/plasma USB dialog.
# (USBGuard has no native "ask" verb; we leave mass-storage devices
# implicit-block here, the user runs `usbguard allow-device <id>`
# from a Plasma applet OR the firstboot wizard documents this flow.)
# Block known-bad. USB Killer signature shows up as a generic-HID
# composite descriptor + power draw out of spec. We can't reliably
# detect that from descriptors alone — relying on default-block
# semantics for now.
# DO NOT pin to specific id=, serial=, hash=, or parent-hash= here.
# That's the user's job post-firstboot for their actual hardware.
# Pre-shipped pinned rules break on every dock replug + kernel bump.

View file

@ -17,7 +17,10 @@
set -uo pipefail
export TERM="${TERM:-linux}"
LOG=/var/log/veilor-installer.log
# Log to /run (pure tmpfs) — /var/log overlays squashfs on the live ISO, so a
# bad sector on the USB medium turns `tee -a` into "input/output error" and
# kills the installer before the menu can render.
LOG=/run/veilor-installer.log
# require_tty MUST run before the tee redirect — process substitution
# replaces fd1 with a pipe, breaking `[[ -t 1 ]]`.
@ -29,8 +32,12 @@ require_tty() {
}
require_tty
# Now safe to tee output for log persistence — tty detection already passed.
# Tee output for log persistence, but only if LOG is writable. On a flaky USB
# the squashfs/overlay can throw I/O errors mid-write — tolerate that and
# keep the installer running without persistence rather than aborting.
if : >> "$LOG" 2>/dev/null; then
exec > >(tee -a "$LOG") 2>&1
fi
# ── Branded styling for gum ─────────────────────────────────────────────
# colors.gum sets GUM_* env vars — pure-black palette from veilor-black KDE.
@ -54,23 +61,49 @@ fi
banner() {
clear
# Read version + date for version line. /etc/os-release has VERSION_ID.
local ver=""
[[ -r /etc/os-release ]] && ver=$(. /etc/os-release; echo "${VERSION_ID:-0.0}")
local d
d=$(date +%Y-%m-%d)
local vline="veilor-os ${ver} · ${d} · live"
if [[ -r $BANNER_FILE ]]; then
# v0.6: staged line-by-line reveal of the banner before the
# gum-style border draws around it. 40ms/line gives a subtle
# "typewriter" feel — 5 lines × 40ms = 200ms total, fast enough
# not to feel laggy but slow enough to land an aesthetic on the
# very first frame the user sees. Once the reveal finishes we
# clear and re-draw with the bordered gum-style version so the
# operator never sees both stacked on top of each other.
local line
while IFS= read -r line; do
printf ' %s\n' "$line"
sleep 0.04
done < "$BANNER_FILE"
sleep 0.08
clear
if [[ $TUI == gum ]]; then
# gum style draws a rounded border + padding around the banner.
gum style --border rounded --margin "1" --padding "1 2" \
# gum style: rounded border, banner + blank line + version line.
gum style --border rounded --margin "0 2" --padding "1 3" \
--border-foreground "${VEILOR_DIM:-240}" \
--foreground "${VEILOR_FG:-15}" \
"$(cat "$BANNER_FILE")"
"$(cat "$BANNER_FILE")" \
"" \
"$vline"
else
cat "$BANNER_FILE"
echo
echo " $vline"
echo
fi
else
# Fallback ASCII if banner.txt missing (older overlay).
cat << 'EOF'
cat << EOF
veilor-os installer
hardened. branded. yours.
$vline
EOF
fi
@ -132,10 +165,30 @@ prompt_input() {
}
# prompt_password <header>
#
# v0.6: gum-path replaced with bash `read -srp` because `gum input
# --password` rendered as a duplicate-"Install" + stray-T artefact on
# the linux fbcon since v0.5.27 (Agent 7 of the v0.6 polish research
# wave traced this to gum's bubbletea screen-restore writing back the
# previous menu buffer when the framebuffer terminfo lacked
# `civis/cnorm` cursor-hide sequences). bash `read -srp` is a single
# write to stdout + termios echo-off — no redraw, no glitch. Header
# rendered separately via gum style for visual parity with the rest
# of the installer.
prompt_password() {
local header=$1
if [[ $TUI == gum ]]; then
gum input --password --header "$header"
# Render the prompt header as a styled box so it looks at home
# next to the other gum prompts, then collect the password via
# plain bash read on the next line. `read -s` disables echo,
# `read -p` writes the prompt to stderr (so command-substitution
# callers still get the password on stdout cleanly).
gum style --foreground "${VEILOR_FG:-15}" --border rounded \
--border-foreground "${VEILOR_DIM:-240}" --padding "0 2" -- "$header"
local pw
read -srp " password: " pw
echo >&2 # newline after silent read so next prompt isn't on same line
printf '%s' "$pw"
else
whiptail --title "veilor-os" --passwordbox "$header" 10 60 \
3>&1 1>&2 2>&3
@ -180,10 +233,14 @@ prompt_error() {
}
main_menu() {
prompt_choose "Welcome" \
# Empty header — banner already provides context. `──────` line splits
# primary actions (top) from session controls (bottom). Cursor `` set
# via GUM_CHOOSE_CURSOR in colors.gum.
prompt_choose "" \
"Install" \
"live - (KDE)" \
"live - shell" \
"live · KDE" \
"live · shell" \
"──────" \
"Reboot" \
"Power off"
}
@ -202,7 +259,7 @@ collect_answers() {
prompt_error "No installable disks found."
return 1
fi
disk=$(prompt_choose_pairs "Select install disk (will be ERASED)" "${disks_pairs[@]}") || return 1
disk=$(prompt_choose_pairs "[1/3] Select install disk · WILL BE ERASED" "${disks_pairs[@]}") || return 1
# ── Hostname ──
# Hardcoded for branded consistency. Post-install: `hostnamectl set-hostname`.
@ -231,27 +288,76 @@ collect_answers() {
}
# ── LUKS passphrase ──
luks_pw=$(prompt_password "LUKS passphrase (full-disk encryption)") || return 1
validate_pw "$luks_pw" "passphrase" || return 1
# v0.6: prompt twice + string-compare. A typo in the LUKS passphrase
# is unrecoverable — the disk is unmountable without it and we
# don't escrow the key. Re-prompting until the two reads match
# catches keyboard-layout surprises (US vs UK quote position is
# the most common one) before they brick the install.
local luks_pw_confirm
while true; do
luks_pw=$(prompt_password "[2/3] Encryption · LUKS2 passphrase (min 8)") || return 1
validate_pw "$luks_pw" "passphrase" || continue
luks_pw_confirm=$(prompt_password "[2/3] Confirm LUKS2 passphrase") || return 1
if [[ $luks_pw == "$luks_pw_confirm" ]]; then
break
fi
prompt_error "Passphrases do not match — try again."
done
# ── Admin password ──
admin_pw=$(prompt_password "Admin user password (login after install)") || return 1
validate_pw "$admin_pw" "password" || return 1
# Same confirm-twice pattern. Less catastrophic than LUKS (admin
# password can be reset from a recovery shell) but a mismatch here
# still locks the user out of their fresh install on first boot.
local admin_pw_confirm
while true; do
admin_pw=$(prompt_password "[3/3] Admin user · password for 'admin'") || return 1
validate_pw "$admin_pw" "password" || continue
admin_pw_confirm=$(prompt_password "[3/3] Confirm admin password") || return 1
if [[ $admin_pw == "$admin_pw_confirm" ]]; then
break
fi
prompt_error "Passwords do not match — try again."
done
# ── Locale ──
locale=$(prompt_choose "Choose locale" \
"en_GB.UTF-8" \
"en_US.UTF-8") || return 1
# Hardcoded en_US.UTF-8 for branded consistency. The picker that
# used to live here (en_GB / en_US) only added confusion — both
# locales install identically, the user couldn't notice the
# difference, and the post-install `localectl set-locale` works for
# any locale anyway. v0.7 post-install menu will offer locale + kb
# layout switch with live preview. For the install flow, fixed.
locale="en_US.UTF-8"
# ── Confirmation ──
prompt_confirm "About to install veilor-os:
# Render summary box + danger lines via gum style, then gum confirm.
# Whiptail fallback: simple yesno.
if [[ $TUI == gum ]]; then
clear
gum style --border rounded --margin "1 2" --padding "1 3" \
--border-foreground "${VEILOR_DIM:-240}" \
--foreground "${VEILOR_FG:-15}" \
"Confirm install" \
"" \
" Disk $disk $(gum style --foreground 1 'WILL BE ERASED')" \
" Locale $locale" \
" LUKS ✓ set" \
" Admin ✓ set" \
"" \
"$(gum style --foreground 3 'This action is irreversible.')"
gum confirm --affirmative "Yes, install" --negative "Cancel" "Proceed?" || return 1
else
whiptail --title "Confirm install" --yesno \
"About to install veilor-os:
Disk: $disk (will be ERASED)
Disk: $disk (WILL BE ERASED)
Locale: $locale
LUKS: set
Admin pw: set
Proceed?" || return 1
This action is irreversible.
Proceed?" 16 60 || return 1
fi
# Export to caller via globals
SEL_DISK=$disk
@ -273,6 +379,34 @@ sed_escape() {
printf '%s' "$1" | sed -e 's/[\\&|/]/\\&/g'
}
# detect_seed_pubkey — search attached cdroms for a NoCloud cidata seed
# with an ssh_authorized_keys entry. Returns the first key on stdout, or
# empty string if none found. Used by both auto-install.sh (cloud-init
# seed pre-built with host pubkey) and humans (drop a seed iso next to
# the install media).
detect_seed_pubkey() {
local dev label tmpmnt key=""
for dev in /dev/sr0 /dev/sr1 /dev/sr2 /dev/sr3; do
[ -b "$dev" ] || continue
label=$(blkid -o value -s LABEL "$dev" 2>/dev/null)
if [[ $label == "cidata" || $label == "CIDATA" ]]; then
tmpmnt=$(mktemp -d)
if mount -o ro "$dev" "$tmpmnt" 2>/dev/null; then
# NoCloud user-data format:
# ssh_authorized_keys:
# - ssh-ed25519 AAAA... user@host
# Extract first ssh-* line, strip leading '- '.
key=$(grep -E '^\s*-\s+ssh-' "$tmpmnt/user-data" 2>/dev/null \
| head -1 | sed -e 's/^\s*-\s*//' -e 's/[[:space:]]*$//')
umount "$tmpmnt" 2>/dev/null
fi
rmdir "$tmpmnt" 2>/dev/null
[[ -n $key ]] && { printf '%s' "$key"; return 0; }
fi
done
return 1
}
generate_ks() {
# Build kickstart for actual disk install.
# NOTE: passwords go in via --plaintext to avoid storing crypted hash
@ -293,7 +427,17 @@ generate_ks() {
# DO NOT commit this file — secrets inline.
url --mirrorlist="https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-43&arch=x86_64"
repo --name=fedora --baseurl="https://download.fedoraproject.org/pub/fedora/linux/releases/43/Everything/x86_64/os/" --install
repo --name=updates --baseurl="https://download.fedoraproject.org/pub/fedora/linux/updates/43/Everything/x86_64/" --install
# `updates` repo intentionally NOT added. Fedora's update stream pushes
# package versions whose %posttrans scriptlets sometimes fail under
# anaconda's `--cmdline` mode — most reliably reproduced as `man-db`
# failing during "Configuring man-db.x86_64", which dumps the whole
# transaction with no recoverable error message and prints "An error
# occurred during the transaction: The transaction process has ended
# with errors..". Reproduced in v0.5.26 + v0.5.27 VM tests against
# fresh installs days apart, so it is not a Fedora-mirrors-down blip.
# CI build kickstart already strips this line for the same reason
# (build-iso.yml line ~109). Users who want to update post-install run
# `dnf upgrade` or the v0.6 `veilor-update` wrapper.
keyboard --xlayouts='us'
lang __LOCALE__
@ -309,19 +453,54 @@ firewall --enabled --service=ssh
rootpw --lock
user --name=admin --groups=wheel --gecos="veilor admin" --password=__ADMIN_PW__ --plaintext
__SSHKEY_DIRECTIVE__
# Full hardening cmdline (installed system, not live):
# --location=none: anaconda auto-places bootloader (UEFI grub2-efi or BIOS).
bootloader --location=none --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none"
# - `lockdown=integrity` — kernel lockdown, integrity mode (signed module enforce)
# - `slab_nomerge` — refuse SLAB merging; harder heap-spray attacks
# - `init_on_alloc=1 init_on_free=1` — zero pages on alloc + free; defends
# uninit-read class; ~5% perf hit acceptable on hardened workstation
# - `randomize_kstack_offset=on` — KASLR for kernel stack, per-syscall
# - `vsyscall=none` — kill legacy vsyscall page (Position-Independent
# ROP-gadget surface)
# - `fbcon=nodefer` — keep linux framebuffer console alive through KMS
# handoff so plymouth LUKS prompt remains visible on real GPUs.
#
# NOTE on --location: v0.5.30 used --location=none to skip anaconda's
# bootloader install (sidestep gen_grub_cfgstub). Side effect: anaconda
# also skipped CollectKernelArgumentsTask (installation.py:149-151), so
# `--append=` args were NEVER COLLECTED. kernel-install then wrote BLS
# entries with empty /etc/kernel/cmdline, falling through to the live
# ISO's /proc/cmdline — no rd.luks.uuid, no fbcon=nodefer, no hardening.
# Result: dracut emergency shell on first boot.
#
# v0.5.31 lets anaconda install the bootloader (default behavior, no
# --location flag). With our broad transaction_progress patch in the
# live ks, anaconda's gen_grub_cfgstub still runs, but if grub2-efi-x64's
# posttrans had a non-fatal scriptlet failure the patch swallows it
# without aborting. The %post chroot below STILL does belt-and-braces
# fixup (dnf reinstall, grub2-install, etc.) in case anaconda's path
# left something incomplete.
#
# Critically v0.5.31 also writes /etc/kernel/cmdline FIRST in %post then
# re-runs kernel-install per kernel. That's the canonical Fedora 43 path
# for landing args in BLS entries — kernel-install reads /etc/kernel/cmdline
# (90-loaderentry.install:84-95) when generating BLS option lines.
bootloader --append="lockdown=integrity module.sig_enforce=1 slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1 rd.vconsole.keymap=us"
# Disk: zero, LUKS2 (argon2id), btrfs subvolumes
# Disk: zero, LUKS2 (argon2id), btrfs subvolumes (no LVM intermediary).
# Native btrfs-on-LUKS matches Fedora KDE Spin defaults; LVM+btrfs combo
# triggered "mount failed: wrong fs type, bad option, bad superblock"
# under anaconda --cmdline. btrfs subvols give us the snapshot/rollback
# story without LVM's added complexity.
zerombr
clearpart --all --initlabel --drives=__DISK_BASENAME__
part /boot/efi --fstype=efi --size=600
part /boot --fstype=ext4 --size=1024
part pv.veilor --grow --encrypted --luks-version=luks2 --pbkdf=argon2id --passphrase=__LUKS_PW__
volgroup veilor pv.veilor
logvol / --vgname=veilor --name=root --fstype=btrfs --size=8192 --grow
part btrfs.veilor --grow --encrypted --luks-version=luks2 --pbkdf=argon2id --passphrase=__LUKS_PW__
btrfs none --label=veilor btrfs.veilor
btrfs / --subvol --name=root LABEL=veilor
btrfs /home --subvol --name=home LABEL=veilor
# ── Packages — mirrors live ks (kickstart/veilor-os.ks 63-141), minus
# live-only entries (livesys-scripts, anaconda-live, @anaconda-tools,
@ -368,6 +547,12 @@ chrony
firewalld
plymouth
# AppArmor stack — Fedora 43 ships parser/utils/profiles. v0.6 ships
# loaded-but-complain only (see scripts/40-apparmor.sh + tier-2 plan).
apparmor-parser
apparmor-utils
apparmor-profiles
# admin essentials
git
vim-enhanced
@ -395,6 +580,17 @@ zram-generator
-open-vm-tools-desktop
-mlocate
# Belt-and-braces with the kickstart/veilor-os.ks transaction_progress
# patch: even with the patch, man-db's transfiletriggerin in the F43
# RPM 6.0 toolchain dispatches a systemd-run that anaconda's chroot
# can race-with on exit. Excluding the package entirely guarantees the
# trigger never fires during install. Veilor users who want man pages
# install them post-firstboot via \`dnf install man-db man-pages\` or
# via the v0.6 \`veilor-postinstall\` welcome menu.
-man-db
-man-pages
-man-pages-overrides
%end
# ── Post-install (nochroot): copy overlay + scripts + assets from boot ISO.
@ -460,6 +656,271 @@ bash $REPO/scripts/selinux/build-policy.sh || echo "[WARN] SELinux build failed;
# Apply KDE theme + DuckSans + os-release branding
bash $REPO/scripts/kde-theme-apply.sh
# Disable plymouth at TWO layers:
#
# 1. Initramfs (the boot stage where LUKS unlock happens). Plymouth is
# a dracut module; masking units in /etc/systemd/ has zero effect
# here because dracut bundles its own copies into initramfs/.
# Solution: omit_dracutmodules in dracut.conf.d, then regenerate
# initramfs so the new config takes effect.
#
# 2. Real root (post-pivot, before SDDM). /dev/null symlinks mask all
# plymouth services + the path-activated ask-password unit so they
# never start when systemd is up.
#
# After both, LUKS prompt falls back to systemd-tty-ask-password-agent
# on tty1 — text "Please enter passphrase for disk... :" — works in
# QEMU sendkey AND on real hardware.
# Manual bootloader install (anaconda told to skip via --location=none)
#
# Anaconda's gen_grub_cfgstub script-runner (efi.py:194-201) is
# brittle on F43 RPM 6.0 cmdline mode — grub2-efi-x64's posttrans
# scriptlet may emit non-fatal errors that anaconda treats as
# transaction-fatal even with our error suppression patch. Doing
# the work in %post chroot bypasses that whole code path and gives
# us linear, debuggable steps.
#
# Order:
# 1. Re-run grub2-efi-x64 + shim-x64 scriptlets cleanly (dnf
# reinstall in chroot has full PID 1 systemd context, so the
# systemd-run inside man-db-style triggers actually runs). Re-
# installing repopulates /boot/efi/EFI/fedora/ if it was empty.
# 2. grub2-install — generic + UEFI. UEFI path is the meaningful one
# on virtio-vga and real hardware.
# 3. grub2-mkconfig — write /boot/grub2/grub.cfg + /boot/efi/EFI/fedora/grub.cfg.
# 4. efibootmgr — register the boot entry in NVRAM.
#
# Failure of any individual step is logged but does NOT abort the
# %post (set +e bracket). On a real failure the user sees the
# diagnostic text and can fix manually post-firstboot.
set +e
echo "════════════════════════════════════════════════════════"
echo " bootloader install (manual; anaconda skipped via --location=none)"
echo "════════════════════════════════════════════════════════"
# Disk we're targeting — anaconda already wrote /boot/efi mount, so
# the disk is whatever holds /boot/efi.
EFI_DISK=$(findmnt -n -o SOURCE /boot/efi 2>/dev/null | sed -E 's/p?[0-9]+$//')
[ -z "$EFI_DISK" ] && EFI_DISK="/dev/$(basename "$(realpath /sys/class/block/$(findmnt -n -o SOURCE /boot/efi | sed 's|/dev/||' | sed 's|p\?[0-9]\+$||') 2>/dev/null)")"
echo "[INFO] target disk for grub: ${EFI_DISK:-<unknown>}"
# Step 1: re-run grub2 + shim scriptlets in clean chroot
dnf reinstall -y grub2-efi-x64 grub2-efi-x64-modules grub2-pc grub2-pc-modules grub2-tools grub2-tools-extra shim-x64 efibootmgr 2>&1 | tail -10 || \
echo "[WARN] dnf reinstall of grub stack failed"
# Step 2: install grub to ESP (UEFI path, primary)
mkdir -p /boot/efi/EFI/fedora
grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=fedora --no-nvram 2>&1 | tail -5 || \
echo "[WARN] grub2-install (efi) failed"
# Step 3: write the EFI grub.cfg stub (what gen_grub_cfgstub would have done)
if command -v gen_grub_cfgstub >/dev/null 2>&1; then
gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora 2>&1 | tail -5 || \
echo "[WARN] gen_grub_cfgstub failed (will fall back to grub2-mkconfig)"
fi
# Always also write a full grub.cfg in /boot/grub2 (BLS source) and
# regenerate the EFI cfg via grub2-mkconfig for redundancy
grub2-mkconfig -o /boot/grub2/grub.cfg 2>&1 | tail -3
[ -f /boot/efi/EFI/fedora/grub.cfg ] || \
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg 2>&1 | tail -3
# Step 4: register NVRAM entry
if [ -n "$EFI_DISK" ] && [ -e "$EFI_DISK" ]; then
EFI_PART_NUM=$(findmnt -n -o SOURCE /boot/efi | grep -oE '[0-9]+$')
[ -n "$EFI_PART_NUM" ] && \
efibootmgr -c -d "$EFI_DISK" -p "$EFI_PART_NUM" -L "veilor-os" -l '\EFI\fedora\shimx64.efi' 2>&1 | tail -3 || \
echo "[WARN] efibootmgr failed (NVRAM may already be set)"
fi
echo "[INFO] bootloader install: see above for any [WARN] lines"
# NOTE: deliberately NOT `set -e` here. The block above opened with
# `set +e` and the rest of %post is a sequence of best-effort hardening
# steps that have local `|| true` guards on the operations that may
# legitimately fail. Re-enabling errexit would cause `set -e` to abort
# the whole %post on the first non-guarded command (e.g. a `grep -q`
# returning 1). v0.5.30 had this bug and it silently truncated
# the LUKS args injection.
# GRUB branding: replace fedora distributor with veilor-os in menu titles.
# Drop rhgb quiet from default cmdline → all kernel/systemd messages
# visible. Plymouth `details` theme shows scrolling text (no splash, no
# logo). LUKS prompt rendered as text. Onyx-style boot.
sed -i \
-e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \
-e 's|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=""|' \
/etc/default/grub 2>/dev/null || true
# Ensure rd.luks.uuid + rd.luks.name in cmdline. Anaconda --cmdline mode
# does not auto-add the LUKS args when partitioning is custom + encrypted
# manually; without rd.luks.uuid in cmdline, dracut's cryptsetup-generator
# never spawns the unlock unit, plymouth has no password to ask for, and
# dracut-initqueue loops on dev-disk-by-uuid for ~3min before dropping
# to emergency shell. Detect the LUKS partition + inject explicitly.
#
# Two write paths because Fedora 43 uses BLS (Boot Loader Specification):
# 1. /etc/default/grub — read by `grub2-mkconfig` for the legacy
# grub.cfg menu, AND used as the source of
# truth for `kernel-install` when adding new
# kernels later.
# 2. /boot/loader/entries/*.conf — the actual per-kernel BLS entries
# that grub reads for cmdline. These
# are NOT regenerated by
# `grub2-mkconfig`; they are managed
# by `grubby` (or `kernel-install`).
# Without both, the running kernel boots without rd.luks.uuid and the
# user lands in emergency shell on first boot.
LUKS_UUID=$(blkid -t TYPE=crypto_LUKS -o value -s UUID 2>/dev/null | head -1)
if [ -n "$LUKS_UUID" ]; then
LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID} rd.luks.options=luks-${LUKS_UUID}=tries=5,timeout=0"
HARDEN_ARGS="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
# Find the running root UUID (the btrfs filesystem holding the root
# subvol). At this point in %post chroot, `/` is the target root;
# findmnt -o UUID resolves to the btrfs UUID anaconda chose.
ROOT_UUID=$(findmnt -n -o UUID /)
[ -z "$ROOT_UUID" ] && ROOT_UUID=$(blkid -s UUID -o value /dev/mapper/luks-${LUKS_UUID} 2>/dev/null)
# Three write paths, in priority order:
#
# Path A: /etc/kernel/cmdline (the canonical source of truth for
# `kernel-install`). Per /usr/lib/kernel/install.d/90-loaderentry.install
# lines 84-95, kernel-install reads /etc/kernel/cmdline first when
# authoring BLS entries. If we write this BEFORE re-running
# kernel-install, every BLS entry inherits our args. Persists
# across `dnf upgrade kernel`, `dnf reinstall grub2-*`, and any
# other path that re-fires kernel-install hooks.
#
# Path B: /etc/default/grub (legacy GRUB_CMDLINE_LINUX). Read by
# `grub2-mkconfig` for the generated grub.cfg. Belt-and-braces;
# kernel-install ignores this, but grub2-mkconfig respects it.
#
# Path C: grubby --update-kernel=ALL. Direct edit to BLS option
# lines. Acts as the last-writer in case our cmdline write didn't
# trigger a fresh kernel-install pass.
#
# Earlier veilor-os versions only used B+C. v0.5.31 adds Path A as
# the primary, because v0.5.30 testing showed B+C are racy with
# anaconda's own CreateBLSEntriesTask which uses kernel-install
# internally and can rewrite entries from empty /etc/kernel/cmdline,
# producing options lines with no rd.luks.uuid even when grubby
# successfully ran.
# Path A
mkdir -p /etc/kernel
if [ -n "$ROOT_UUID" ]; then
echo "root=UUID=${ROOT_UUID} ro rootflags=subvol=root ${LUKS_ARGS} ${HARDEN_ARGS}" > /etc/kernel/cmdline
echo "[INFO] wrote /etc/kernel/cmdline (canonical kernel-install source)"
else
echo "[WARN] could not determine root UUID; /etc/kernel/cmdline not written"
fi
# Path B
if ! grep -q "rd.luks.uuid" /etc/default/grub 2>/dev/null; then
sed -i "s|^GRUB_CMDLINE_LINUX=\"|GRUB_CMDLINE_LINUX=\"${LUKS_ARGS} |" /etc/default/grub
fi
# Re-run kernel-install for every kernel — picks up new /etc/kernel/cmdline,
# rewrites BLS entries with our args. This is the load-bearing step.
for kver in /lib/modules/*/; do
kver=$(basename "$kver")
[ -f "/lib/modules/$kver/vmlinuz" ] || continue
kernel-install add "$kver" "/lib/modules/$kver/vmlinuz" 2>&1 | tail -3 || \
echo "[WARN] kernel-install add $kver failed"
done
# Path C: belt-and-braces grubby update in case kernel-install missed any
grubby --update-kernel=ALL --args="${LUKS_ARGS}" 2>&1 | tail -5 || true
# Verification: every BLS entry MUST carry the LUKS arg.
drift=$(grep -L "rd.luks.uuid" /boot/loader/entries/*.conf 2>/dev/null)
if [ -n "$drift" ]; then
echo "[ERR] BLS entries missing rd.luks.uuid after all 3 paths: $drift"
else
echo "[OK] all BLS entries carry rd.luks.uuid"
fi
fi
# Verify anaconda wrote /etc/crypttab for the LUKS device. anaconda's
# custom-partitioning code path normally does this for `--encrypted`
# part directives; if it didn't (edge case, F43+ regressions), write
# a minimal entry so systemd-cryptsetup-generator can find the device
# at boot from the BLS args alone.
if [ -n "$LUKS_UUID" ] && ! grep -q "$LUKS_UUID" /etc/crypttab 2>/dev/null; then
echo "luks-${LUKS_UUID} UUID=${LUKS_UUID} none discard" >> /etc/crypttab
echo "[INFO] wrote /etc/crypttab fallback entry"
fi
# Switch plymouth to text-only `details` theme (scrolling boot log, no
# graphics, no logo). Theme is built-in to plymouth package, no asset
# install needed. v0.6 will ship custom veilor-themed plymouth.
plymouth-set-default-theme details 2>/dev/null || true
# Force-include LUKS + plymouth modules in initramfs. dracut autodetects
# crypt+plymouth from the running config, but custom-partitioning %post
# runs before dracut sees stable LUKS state, and stale initramfs files
# from anaconda's pre-install kernel may persist. Belt-and-braces.
mkdir -p /etc/dracut.conf.d
cat > /etc/dracut.conf.d/10-veilor-luks.conf <<'DRACUTEOF'
# veilor-os: guarantee LUKS + plymouth modules in initramfs
add_dracutmodules+=" crypt systemd-cryptsetup plymouth "
DRACUTEOF
# Regenerate initramfs with new theme + dracut.conf.d picks. Remove
# stale initramfs first so the regen actually rewrites bytes.
rm -f /boot/initramfs-*.img 2>/dev/null || true
dracut --force --regenerate-all 2>&1 | tail -5 || true
# Verify cryptsetup landed in initramfs. If not, LUKS unlock is impossible
# and the user gets emergency shell on first boot. Surfacing this early.
KVER=$(ls /lib/modules | head -1)
if [ -n "$KVER" ] && [ -f "/boot/initramfs-${KVER}.img" ]; then
if ! lsinitrd "/boot/initramfs-${KVER}.img" 2>/dev/null | grep -q cryptsetup; then
echo "[ERR] cryptsetup not found in initramfs — LUKS unlock will fail"
fi
fi
# Regen grub.cfg with new branding (anaconda already wrote one; replace).
grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true
[ -f /boot/efi/EFI/fedora/grub.cfg ] && \
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg 2>/dev/null || true
# Suppress the auto-generated "Fedora Linux 0-rescue-…" BLS entry. Fedora
# ships a rescue kernel image alongside every install; the BLS entry that
# points at it is created by `kernel-install` and shows up in GRUB as a
# second menu item. For a branded distro it's noisy + reveals "Fedora"
# in the menu. The rescue image itself is harmless to keep on disk.
# Match both `<machine-id>-0-rescue.conf` (current Fedora 43 layout) and
# `<machine-id>-0-rescue-<kver>.conf` (older layout). The earlier glob
# `*-0-rescue-*.conf` required a trailing hyphen and missed the new form.
rm -f /boot/loader/entries/*-0-rescue*.conf 2>/dev/null || true
# Hostname: default to "veilor" rather than the localhost-live / fedora
# fallback that anaconda writes. User can override post-install with
# `hostnamectl set-hostname`.
echo "veilor" > /etc/hostname
# /etc/machine-info — feeds `hostnamectl status` and some BLS title
# rendering paths. PRETTY_HOSTNAME is what GRUB titles read when
# GRUB_DISTRIBUTOR doesn't override (Fedora 43 BLS entry titles ignore
# GRUB_DISTRIBUTOR; this is the closest we can get without rewriting
# every entry's `title` line).
cat > /etc/machine-info <<'MACHINFO'
PRETTY_HOSTNAME="veilor-os"
MACHINFO
# Rewrite the `title` line of every remaining BLS entry to use our
# brand. `kernel-install` will write new entries with "Fedora Linux …"
# but the file we ship + every kernel update goes through grubby which
# preserves whatever title was last set.
for entry in /boot/loader/entries/*.conf; do
[ -f "$entry" ] || continue
# Pull kernel version out of the existing title (e.g. "6.17.1-300.fc43.x86_64")
kver=$(awk '/^version / {print $2}' "$entry")
sed -i "s|^title .*|title veilor-os (${kver})|" "$entry"
done
# Symlink display-manager.service → sddm.service. (Anaconda usually
# handles this when sddm is the only DM, but be explicit.)
ln -sf /usr/lib/systemd/system/sddm.service /etc/systemd/system/display-manager.service
@ -507,24 +968,127 @@ KSEOF
# already rejects "$\`&|/\n at input — sed_escape() is defence in
# depth in case future code paths feed unsanitised values (e.g.
# locale/hostname from a file, or a relaxed validator).
# Detect cloud-init seed pubkey (if attached) → embed as anaconda
# `sshkey --username=admin` directive. Adds key to admin's
# authorized_keys at install time so SSH validation works first boot.
# No seed = empty directive line (anaconda treats blank line as no-op).
local seed_pubkey sshkey_directive=""
seed_pubkey=$(detect_seed_pubkey 2>/dev/null) || true
if [[ -n $seed_pubkey ]]; then
echo "[INFO] using cloud-init seed pubkey for admin authorized_keys"
# sshkey directive — quote pubkey, no shell meta in pubkeys.
sshkey_directive="sshkey --username=admin \"${seed_pubkey}\""
fi
sed -i \
-e "s|__LOCALE__|$(sed_escape "$SEL_LOCALE")|" \
-e "s|__HOSTNAME__|$(sed_escape "$SEL_HOSTNAME")|" \
-e "s|__DISK_BASENAME__|$(sed_escape "$disk_basename")|" \
-e "s|__LUKS_PW__|$(sed_escape "$SEL_LUKS_PW")|" \
-e "s|__ADMIN_PW__|$(sed_escape "$SEL_ADMIN_PW")|" \
-e "s|__SSHKEY_DIRECTIVE__|$(sed_escape "$sshkey_directive")|" \
"$out"
echo "[INFO] generated kickstart at $out"
return 0
}
_dump_logs_to_host() {
# If a virtio-9p share tagged "hostlogs" was attached by run-vm.sh,
# mount it and dump every anaconda + installer log into it. Runs on
# success AND failure (called via trap). No-op on real hardware where
# the 9p tag doesn't exist.
if ! grep -qw 9p /proc/filesystems 2>/dev/null; then return 0; fi
local m=/mnt/hostlogs
mkdir -p "$m"
mount -t 9p -o trans=virtio,version=9p2000.L hostlogs "$m" 2>/dev/null || return 0
cp -af /tmp/anaconda.log "$m/" 2>/dev/null || true
cp -af /tmp/program.log "$m/" 2>/dev/null || true
cp -af /tmp/storage.log "$m/" 2>/dev/null || true
cp -af /tmp/packaging.log "$m/" 2>/dev/null || true
cp -af /tmp/dnf.log "$m/" 2>/dev/null || true
cp -af /tmp/dnf.librepo.log "$m/" 2>/dev/null || true
cp -af /tmp/anaconda-cmdline.log "$m/" 2>/dev/null || true
cp -af /var/log/veilor-installer.log "$m/" 2>/dev/null || true
cp -af /run/install/veilor-generated.ks "$m/" 2>/dev/null || true
dmesg > "$m/dmesg.log" 2>/dev/null || true
journalctl -b > "$m/journal.log" 2>/dev/null || true
sync
umount "$m" 2>/dev/null || true
}
run_install() {
# Capture logs to host on every exit path (success, failure, ^C).
trap _dump_logs_to_host EXIT
# Anaconda env setup (see comments below).
export LANG="${SEL_LOCALE:-en_GB.UTF-8}"
export LC_ALL="$LANG"
# LANG / LC_ALL: anaconda's keyboard.activate_keyboard() raises
# AnacondaError: 'LANG' if missing. tty1 inherits no locale by default.
# XDG_RUNTIME_DIR: anaconda's display.setup_display() probes
# os.getenv("XDG_RUNTIME_DIR") + Wayland socket and crashes with
# TypeError on None. We're running unattended so no display needed,
# but env var still has to exist.
export XDG_RUNTIME_DIR="/run/user/$(id -u)"
mkdir -p "$XDG_RUNTIME_DIR"
chmod 0700 "$XDG_RUNTIME_DIR"
if [[ $TUI == gum ]]; then
clear
# gum spin runs anaconda as subprocess. --show-output forwards
# stdout/stderr to /tmp/anaconda-cmdline.log so we can debug.
# --title shows live-updating header. --spinner dot is minimal.
local rc=0
gum spin --spinner dot \
--title "Installing veilor-os to $SEL_DISK · this takes 10-30 min · logs on tty2" \
--show-output \
-- bash -c 'anaconda --cmdline --kickstart=/run/install/veilor-generated.ks 2>&1 | tee /tmp/anaconda-cmdline.log' || rc=$?
if [[ $rc -eq 0 ]]; then
# v0.6: split the success screen into THREE stacked boxes.
#
# 1. Green success box — quiet confirmation.
# 2. Yellow eject box — promoted out of the buried
# one-liner the v0.5 success box used. Operators on
# both onyx and the friend's RTX 4080 rig missed the
# reminder and rebooted into the live ISO instead of
# the install. Now it's its own loud thick-bordered
# box that sits BELOW the success box and is
# impossible to miss.
# 3. Reboot countdown — embedded inside the green
# success box so the operator can see "complete +
# Xs to reboot" at a glance.
#
# Each tick clears + redraws all three, so the eject-media
# box stays in front of the operator for the full 10-second
# window and isn't scrolled off by a banner refresh.
local secs
for secs in 10 9 8 7 6 5 4 3 2 1; do
clear
gum style --foreground 2 --border rounded --margin "1 2" --padding "1 3" \
"✓ Install complete" \
"" \
"Rebooting in ${secs}s..."
gum style --foreground 3 --border thick --margin "0 2" --padding "1 3" \
--border-foreground 3 \
" Remove the install media NOW " \
"" \
" Unplug the USB stick / eject the DVD before " \
" reboot, otherwise the system will boot back " \
" into the live ISO instead of your fresh install. "
sleep 1
done
systemctl reboot
else
prompt_error "Anaconda exited non-zero (status $rc).
Logs at /tmp/anaconda.log + /var/log/veilor-installer.log.
Press OK to drop to shell."
return 1
fi
else
prompt_message "Installing veilor-os to $SEL_DISK ...
This will take 10-30 minutes.
Logs: /var/log/veilor-installer.log + /tmp/anaconda.log"
sleep 1
# Hand off to anaconda. --kickstart runs unattended.
if anaconda --kickstart=/run/install/veilor-generated.ks; then
if anaconda --cmdline --kickstart=/run/install/veilor-generated.ks; then
prompt_message "Install complete. System will reboot.
Remove the install media after shutdown."
sleep 3
@ -535,6 +1099,7 @@ Logs at /tmp/anaconda.log + /var/log/veilor-installer.log.
Press OK to drop to shell."
return 1
fi
fi
}
drop_to_shell() {
@ -575,10 +1140,12 @@ while true; do
run_install || continue
fi
;;
"live - (KDE)") launch_desktop ;;
"live - shell") drop_to_shell ;;
"live · KDE") launch_desktop ;;
"live · shell") drop_to_shell ;;
"──────") banner ;; # separator clicked: redraw, no-op
"Reboot") systemctl reboot ;;
"Power off") systemctl poweroff ;;
*) drop_to_shell ;;
"") banner ;; # ESC/cancel from menu: redraw
*) banner ;;
esac
done

77
scripts/40-apparmor.sh Normal file
View file

@ -0,0 +1,77 @@
#!/usr/bin/env bash
# veilor-os — 40-apparmor: load veilor-shipped AppArmor profiles in
# COMPLAIN mode. v0.6 scope: "loaded, present, nothing breaks".
#
# Per docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md, v0.6
# ships AppArmor stacked alongside SELinux, but every veilor-shipped
# profile stays in complain mode (logs only, no enforce). Real policy
# authoring is post-v0.6.
#
# Idempotent: profiles already in complain mode are skipped. Run as
# root during kickstart %post or post-install.
set -uo pipefail
GREEN='\033[0;32m'; YELLOW='\033[1;33m'; RED='\033[0;31m'; NC='\033[0m'
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
info() { echo -e "${YELLOW}[INFO]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
err() { echo -e "${RED}[ERR]${NC} $*"; }
[[ $EUID -eq 0 ]] || { err "Must run as root"; exit 1; }
echo "════════════════════════════════════════════════════════"
echo " veilor-os :: 40-apparmor (complain mode only)"
echo "════════════════════════════════════════════════════════"
PROFILE_DIR=/etc/apparmor.d/veilor.d
# ── Sanity: tools present? ──
if ! command -v apparmor_parser >/dev/null 2>&1; then
warn "apparmor_parser not installed — skipping (package step missed?)"
exit 0
fi
if ! command -v aa-complain >/dev/null 2>&1; then
warn "aa-complain not installed (apparmor-utils missing) — skipping"
exit 0
fi
if [[ ! -d $PROFILE_DIR ]]; then
info "$PROFILE_DIR not present — no veilor profiles to load"
exit 0
fi
# ── Walk every profile we ship and force complain mode ──
shopt -s nullglob
loaded=0
skipped=0
failed=0
for profile in "$PROFILE_DIR"/*; do
[[ -f $profile ]] || continue
name=$(basename "$profile")
# Already in complain mode? aa-status reports loaded profiles by
# internal profile name, not file path — best-effort match against
# the file basename to avoid re-parsing on repeat runs.
if command -v aa-status >/dev/null 2>&1 \
&& aa-status --complaining 2>/dev/null | grep -qE "(^|/)veilor-${name}([[:space:]]|$)"; then
info "$name already in complain mode — skipping"
skipped=$((skipped + 1))
continue
fi
info "loading $name (complain mode)"
if aa-complain "$profile" >/dev/null 2>&1; then
ok "$name → complain"
loaded=$((loaded + 1))
else
warn "$name failed to load (parser may reject stub on this kernel)"
failed=$((failed + 1))
fi
done
echo "────────────────────────────────────────────────────────"
info "summary: loaded=$loaded skipped=$skipped failed=$failed"
ok "v0.6 AppArmor stub: complain-mode only — no enforcement, log-only"
exit 0

85
test/METHOD-CHANGELOG.md Normal file
View file

@ -0,0 +1,85 @@
# veilor-os — Test Method Changelog
Append-only log of changes to `test/TESTING.md`. Each entry: date, the
veilor-os version it first applied to, what changed in the procedure,
and *why*. The why is the load-bearing part — without it this file
becomes a list of opinions.
Entries are newest-first.
---
## 2026-05-06 · v0.5.32 · ISO build path moved to Forgejo
**Change:** Build host for the test ISO has moved off GitHub Actions
onto the Forgejo runner on nullstone. The hybrid VM test procedure in
`TESTING.md` is **unchanged** — the gum installer still drives every
step it can, the operator still types the LUKS + admin passwords
directly into the QEMU window. The only thing different is where the
ISO comes from and how the host log is captured.
**Practical deltas for testers:**
- ISO download: from the Forgejo `ci-latest` rolling release at
<https://git.s8n.ru/veilor-org/veilor-os/releases/tag/ci-latest>.
The tag is force-replaced on each successful `build-iso.yml` run, so
always re-download — don't rely on a cached copy.
- Re-flash to USB / virtio-blk image via Etcher / `dd`**unchanged**.
Same `sha256sum -c` step; same image format.
- virtio-9p host log capture is now **active by default** in
`test/run-vm.sh`. This replaces the broken virtio-serial path
flagged by Agent 6 in the 2026-05-05 wave; Anaconda logs land in the
host-side mount automatically once the VM boots, no manual `tail -f`
on a broken serial console.
- Build host for the record: forgejo-runner on nullstone, runner label
`ubuntu-24.04`, image `catthehacker/ubuntu:act-24.04`. Reproducibility
is unchanged from the GH Actions ubuntu-24.04 base — the act image
matches GHA's runner image to within package versions.
**Why:** GitHub mirror was disabled 2026-05-06 (repo is now
private-by-default on Forgejo); GH Actions builds would just stop
producing artifacts. Moving CI in-house onto nullstone keeps the
test/release loop intact and removes the external dependency for
private-build cycles. Documenting the change here so a future tester
reading TESTING.md doesn't waste time hunting an artifact in a
GitHub run that never happened.
**Files touched in this entry:**
- `test/METHOD-CHANGELOG.md` — this entry.
`test/TESTING.md` itself is **not** edited — the procedure prose still
applies verbatim. Only the build host and the URL where the ISO lives
changed.
---
## 2026-05-05 · v0.5.27 · TESTING.md created
**Change:** First version of the canonical procedure document.
**Why:** Through v0.2 → v0.5.26 we'd been reproducing the test
procedure ad-hoc each time, which meant test runs were uncomparable
and regressions were caught only by accident. The v0.5.26 → v0.5.27
debugging session surfaced the LUKS-cmdline bug, the GRUB rebrand
gap, the gum-cursor render glitch, and the fbcon KMS issue all in a
single VM run — but only because the test happened to walk every step
in order. Codifying the steps means the next regression is caught the
same way reliably.
The procedure documents the **hybrid VM method** explicitly: Claude
drives every step it can via QEMU monitor `sendkey`, the human types
LUKS + admin passwords directly into the QEMU window because plymouth
ignores synthesised keystrokes. Past trail (14+ failed sendkey
variants) is the source of truth for that limitation; do not re-fight
that battle without first rereading the trail.
The procedure also separates **VM** (cheap iteration, catches install
logic) from **real hardware** (mandatory for tag, catches firmware /
KMS / GPU). Future releases must produce a `test/test-runs/` report
for each before tagging.
**Files added:**
- `test/TESTING.md` — canonical procedure
- `test/METHOD-CHANGELOG.md` — this file
- `test/test-runs/` — per-run reports go here (template lands with
first real run, currently empty)

66
test/README.md Normal file
View file

@ -0,0 +1,66 @@
# test/
Test harnesses for veilor-os ISO builds.
## Files
| File | Purpose |
|------|---------|
| `run-vm.sh` | Manual smoke test — boot the latest ISO interactively in QEMU/KVM. SSH key injection via cloud-init seed + monitor sendkey fallback for live-image login. |
| `auto-install.sh` | **Autonomous** end-to-end install test. Boots ISO, drives the gum installer via QEMU monitor `sendkey`, waits for anaconda to finish + reboot, SSHs into the installed system, runs validation checklist. Prints PASS/FAIL summary. |
| `auto-install-keymap.sh` | Sourced helper. Provides `km_send_str`, `km_send_chord`, `km_send_key`, `km_screendump`, `km_wait_socket`, etc. Reusable by other automation. |
| `boot-checklist.md` | Manual post-install checklist (run on a real spare laptop). |
## Running the autonomous installer test
```sh
./test/auto-install.sh build/out/veilor-os-*.iso
```
Hardcoded inputs (deterministic — do not edit during a test run):
- Disk: first `/dev/vda` (the only disk in QEMU)
- Hostname: `veilor` (installer hardcoded since v0.5.4)
- LUKS passphrase: `testpass1234`
- Admin password: `adminpass1234`
- Locale: `en_GB.UTF-8`
Expected runtime: 2030 minutes wall clock (anaconda dominates).
### Outputs
- `/tmp/veilor-auto-install.log` — full driver log
- `/tmp/veilor-auto-install-NN-<step>.png` — milestone screenshots
- `/tmp/veilor-auto-install-final-ssh.txt` — final SSH session capture (uname/lsblk/cmdline/failed units)
### Exit codes
- `0` — all validation checks passed
- `1` — any failure (anaconda crashed, SSH never came up, validation check failed)
- `2` — preflight failure (missing tool, bad ISO arg, missing OVMF)
### Prerequisites
- `qemu-system-x86_64`, `qemu-img`, `socat`, `ssh`, `ssh-keygen`
- `edk2-ovmf` (OVMF UEFI firmware at `/usr/share/edk2/ovmf/OVMF_{CODE,VARS}.fd`)
- `mkisofs` or `xorriso` (for cloud-init seed ISO; harness falls back to TTY1 driving if seed cannot be built or cloud-init does not run on the installed system)
- `convert` from ImageMagick (optional — converts PPM screendumps to PNG; harness keeps PPM if absent)
- KVM access (`/dev/kvm` readable by the user)
### What it validates
Post-install on the booted system:
- `/etc/os-release``NAME=veilor-os`
- `hostnamectl --static``veilor`
- `systemctl is-active``active` for `sshd fail2ban usbguard tuned auditd firewalld chronyd sddm`
- `getenforce``Enforcing` (preferred) or `Permissive` (acceptable for v0.5.x)
- `lsblk -f` shows `crypto_LUKS` + `btrfs`
- `/etc/crypttab` has a LUKS entry
- `getent passwd admin` returns the user
- `/usr/local/bin/{veilor-power,veilor-doctor,veilor-update}` are present and executable
- `/proc/cmdline` contains `init_on_alloc=1`
### Troubleshooting
- **Stuck at boot banner**: ISO didn't autostart `veilor-installer` on tty1. Check `serial.log` and `auto-install-vm-NN-*.png` screenshots. The harness aborts after 5 minutes of identical screen frames.
- **SSH never up**: cloud-init may not have run on the installed system (no `cidata` mount). The harness falls back to TTY1 driving — typing the LUKS passphrase, logging in as admin, and hand-injecting the SSH key. If both paths fail, validation cannot proceed.
- **`screendump` produces unreadable PPM**: install ImageMagick (`dnf install ImageMagick`) so the harness converts to PNG.

187
test/TESTING.md Normal file
View file

@ -0,0 +1,187 @@
# veilor-os — Testing Procedure
This document is the canonical procedure for validating a veilor-os ISO
end-to-end. Every release that gets a tag MUST have a corresponding
test-run report in `test/test-runs/` linked from the release notes.
If reality forces you to deviate from the steps below, **do not silently
patch the procedure** — open a commit that updates this file *and*
appends an entry to `test/METHOD-CHANGELOG.md` explaining what changed
and why. The changelog is what makes the procedure auditable; the
procedure itself is just the latest snapshot.
---
## Two test environments
| Environment | Catches | Doesn't catch |
|-------------|---------|---------------|
| **VM (QEMU + virtio-vga)** | install logic, kickstart bugs, %post failures, anaconda transaction failures, GRUB write, BLS entries, package selection, network stack | KMS / fbcon issues, real-firmware Secure Boot, USB controller quirks, GPU driver compatibility, sleep/wake, battery, thermals |
| **Real hardware (USB → spare laptop)** | everything VM doesn't | install repeatability (you only have so many spare laptops) |
Both are required for any tagged release. VM first (cheap iteration),
real hardware second (final sign-off).
---
## VM test — hybrid procedure
The VM cannot type LUKS / admin passwords through QEMU's `sendkey`
monitor command — plymouth's IPC ignores synthesised keystrokes (we
verified this across 14+ sendkey variants in earlier sessions). The
hybrid procedure splits the work: Claude/automation drives every step
that doesn't need a password; the human types the two passwords (LUKS
+ admin) into the QEMU window directly.
Standard test passwords (lab use only — never reuse outside this repo):
| Prompt | Type |
|--------|------|
| LUKS passphrase | `veilortest1` |
| Admin password | `veilortest1` |
Both passwords identical on purpose — easier to remember mid-test, both
satisfy the installer's 8-char min, neither contains shell-special
chars (validate_pw rejects `" $ \ \` & | / \n`).
### Run a VM test
```bash
cd ~/ai-lab/_github/veilor-os
# Pull the ISO you want to test (from a CI release or local build)
ls /home/admin/Downloads/veilor-os-*.iso
# Wipe stale state, launch VM with monitor sock (no auto-inject — we
# don't want sendkey noise typing into prompts)
FRESH=1 NO_INJECT=1 DISPLAY=:0 ./test/run-vm.sh \
/home/admin/Downloads/veilor-os-43-YYYYMMDD-HHMMSS.iso
```
Then either (a) drive the install yourself in the QEMU window, or
(b) hand the monitor sock to Claude / a script:
- Monitor sock: `test/veilor-vm.monitor.sock`
- Send a key: `echo "sendkey ret" | socat - "UNIX-CONNECT:$SOCK"`
- Screendump: `echo "screendump /tmp/x.ppm" | socat - "UNIX-CONNECT:$SOCK"; magick /tmp/x.ppm /tmp/x.png`
### Steps to verify
The complete checklist lives in `test/boot-checklist.md` — that file is
the granular pass/fail list. The high-level flow is:
1. **Live boot.** GRUB (legacy menu, no Plymouth splash) → text scroll
→ veilor-installer banner on tty1 within ~30s. No "fedora" branding
anywhere on screen.
2. **Installer menu.** "Install" highlighted by default. No phantom
duplicate items, no stray characters in input fields.
3. **Disk picker.** `/dev/vda` (or whatever virtio gives you) listed
with size + model.
4. **Passwords.** LUKS + admin prompts; user types `veilortest1` twice.
5. **Locale.** en_GB.UTF-8 picks up.
6. **Confirm.** Disk shown with `WILL BE ERASED`, locale + LUKS/admin
ticks shown.
7. **Anaconda.** "Installing veilor-os to /dev/vda · 1030 min · logs
on tty4". Watch for `Configuring man-db` — if anything fails, this
is historically where it dies.
8. **Reboot.** VM reboots; ISO must NOT boot first this time. Kill
QEMU + relaunch without ISO drive (see *Boot installed disk* below)
to skip the GRUB-from-ISO path.
9. **GRUB.** Single "veilor-os" entry (no rescue, no "Fedora Linux").
10. **LUKS prompt.** Plymouth `details` theme — text-mode prompt for
passphrase. User types `veilortest1` in the QEMU window (sendkey
will not work).
11. **First boot.** SDDM splash → admin user pre-filled → admin types
`veilortest1` → password-change prompt (chage -d 0 expired the
password) → user picks new password → KDE Plasma session.
12. **Hardening checks** per `test/boot-checklist.md` (SELinux
enforcing, fail2ban active, USBGuard active, tuned profile, etc.).
### Boot installed disk (skip ISO)
After the install reboots, QEMU's CD-first boot order will land back
in the live ISO. Easiest workaround: kill QEMU and re-launch without
the `-drive file=...iso` line. The qcow2 retains the install:
```bash
pkill -f 'qemu-system.*veilor-os'
cd ~/ai-lab/_github/veilor-os/test
DISPLAY=:0 qemu-system-x86_64 \
-enable-kvm -cpu host -smp 4 -m 4096 \
-machine q35,smm=on \
-global driver=cfi.pflash01,property=secure,value=on \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=$PWD/veilor-vm.nvram \
-drive file=$PWD/veilor-vm.qcow2,if=virtio,format=qcow2 \
-monitor unix:$PWD/veilor-vm.monitor.sock,server,nowait \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 \
-vga virtio -display gtk,gl=on
```
---
## Real-hardware test — USB → spare laptop
Required for every tagged release. The VM cannot reproduce KMS /
fbcon / GPU-driver issues; only real silicon will.
### 1. Flash USB
```bash
# 8GB+ USB stick, identified by lsblk (e.g. /dev/sda — confirm vendor)
sudo umount /dev/sdX* 2>/dev/null
sudo wipefs -a /dev/sdX
sudo dd if=/path/to/veilor-os-*.iso of=/dev/sdX bs=4M status=progress conv=fsync
sync
sudo eject /dev/sdX
```
Etcher / GNOME Disks also fine. Verify-after-flash is built into
Etcher; for `dd`, run `cmp` on the first ISO_SIZE bytes if paranoid.
### 2. Boot test
- Disable Secure Boot in firmware (until we MOK-enroll our shim, which
is v0.5+).
- Boot from USB.
- Walk the same numbered steps as the VM section, except:
- On "TYPE NOW: passphrase" steps, you actually have a keyboard.
- At step 8, the laptop will eject the USB and reboot to the
installed system without intervention.
- At step 11, do NOT use `veilortest1` for the post-install admin
password change — pick something real if this is your daily-driver
laptop, or a throwaway if it's a test machine. The kickstart's
ChainOfTrust ends here; from this prompt forward you own the
password.
### 3. Capture findings
Fill in a fresh `test/test-runs/YYYY-MM-DD-vX.Y.Z.md` from the
template. **Always** capture: GRUB title, kernel cmdline (`cat
/proc/cmdline`), `lsblk -f`, `getenforce`, `systemctl is-active fail2ban
usbguard tuned auditd firewalld`, `journalctl -b -p err --no-pager`.
If anything regressed, that goes at the top of the report under
**Regressions**, with a screenshot if possible.
---
## Per-run report template
Copy `test/test-runs/_TEMPLATE.md` (created when the first real
test-run lands) and fill in section-by-section. Keep them brief —
this is meant to be a 5-minute write-up, not a thesis.
---
## When to alter this procedure
If a step turns out to be wrong, redundant, or missing:
1. Edit this file.
2. Append to `test/METHOD-CHANGELOG.md` with: date, version it first
applied to, what changed, and why (cite a specific test-run report
if the change is in response to a finding).
3. Reference the changelog entry in your commit message.
The changelog is the audit trail. Don't skip it.

167
test/auto-install-keymap.sh Executable file
View file

@ -0,0 +1,167 @@
#!/usr/bin/env bash
# auto-install-keymap.sh — sourced helper for QEMU-monitor-driven UI automation.
#
# Provides a minimal but complete US-layout keymap mapping every printable
# ASCII character to a QEMU `sendkey` chord, plus convenience wrappers for
# typing strings, sending special keys, taking screenshots, and waiting for
# the monitor socket to appear.
#
# Usage:
# source test/auto-install-keymap.sh
# MONITOR_SOCK=/path/to/sock
# km_wait_socket "$MONITOR_SOCK" 60
# km_send_str "$MONITOR_SOCK" "hello world"
# km_send_key "$MONITOR_SOCK" ret
# km_send_chord "$MONITOR_SOCK" ctrl alt f1
# km_screendump "$MONITOR_SOCK" /tmp/shot.ppm
#
# Why a separate file: other harnesses (regression suites, fuzzers) can
# source this without dragging in the full installer test driver.
# Guard against double-source.
[[ -n "${__VEILOR_KEYMAP_LOADED:-}" ]] && return 0
__VEILOR_KEYMAP_LOADED=1
# ── Tool requirements ──────────────────────────────────────────────────
# socat is the canonical way to talk to a unix-domain QEMU monitor.
# nc-openbsd would also work but socat is what run-vm.sh already uses.
km_require_tools() {
local missing=()
for t in socat qemu-img qemu-system-x86_64; do
command -v "$t" >/dev/null 2>&1 || missing+=("$t")
done
if [[ ${#missing[@]} -gt 0 ]]; then
echo "[ERR] missing required tools: ${missing[*]}" >&2
return 1
fi
}
# ── Low-level monitor I/O ──────────────────────────────────────────────
# Send a single line of monitor input. Newlines are critical — QEMU's HMP
# parses one command per line. Errors are swallowed: the most common cause
# is the VM having shut down between two send_* calls, which we tolerate.
km_monitor_send() {
local sock=$1; shift
printf '%s\n' "$*" | socat - "UNIX-CONNECT:$sock" 2>/dev/null || true
}
# Send a raw HMP command and capture any stdout response (e.g. for `info`
# queries). Trims the QEMU monitor banner + prompt noise.
km_monitor_query() {
local sock=$1; shift
printf '%s\n' "$*" | socat -t 1 - "UNIX-CONNECT:$sock" 2>/dev/null \
| sed -e 's/\r//g' -e '/^QEMU /d' -e '/^(qemu)/d' || true
}
# Wait until the monitor unix socket exists and accepts connections.
# $2 = max wait in seconds (default 60).
km_wait_socket() {
local sock=$1 max=${2:-60} waited=0
while (( waited < max )); do
if [[ -S $sock ]]; then
# Try a no-op query — confirms the QEMU side is actually serving.
if printf 'info status\n' | socat -t 1 - "UNIX-CONNECT:$sock" >/dev/null 2>&1; then
return 0
fi
fi
sleep 1
waited=$((waited + 1))
done
echo "[ERR] monitor socket $sock never became ready (waited ${max}s)" >&2
return 1
}
# ── Screenshots ────────────────────────────────────────────────────────
# Ask QEMU to dump the current framebuffer. Output is PPM. Convert to PNG
# with ImageMagick if available; otherwise leave PPM and warn.
km_screendump() {
local sock=$1 out=$2
local ppm="${out%.png}.ppm"
km_monitor_send "$sock" "screendump $ppm"
sleep 1 # give QEMU a moment to flush
if [[ -f $ppm ]] && command -v convert >/dev/null 2>&1; then
convert "$ppm" "$out" 2>/dev/null && rm -f "$ppm"
fi
}
# ── Key tables ─────────────────────────────────────────────────────────
# QEMU `sendkey` reference: docs/system/keys.html.in. The HMP names are
# the X11 keysym lower-case, with a few exceptions for non-letter keys
# (spc, ret, minus, etc.). What follows is the full US-layout printable
# ASCII set. Everything outside this table is silently dropped — callers
# are responsible for not feeding it characters the installer can't accept
# anyway (passwords are validated to ASCII-printable in veilor-installer).
declare -gA __KM_PLAIN=(
[' ']=spc [a]=a [b]=b [c]=c [d]=d [e]=e [f]=f [g]=g [h]=h
[i]=i [j]=j [k]=k [l]=l [m]=m [n]=n [o]=o [p]=p [q]=q [r]=r
[s]=s [t]=t [u]=u [v]=v [w]=w [x]=x [y]=y [z]=z
[0]=0 [1]=1 [2]=2 [3]=3 [4]=4 [5]=5 [6]=6 [7]=7 [8]=8 [9]=9
['-']=minus ['=']=equal ['[']=bracket_left [']']=bracket_right
[';']=semicolon ["'"]=apostrophe [',']=comma ['.']=dot
['/']=slash ['\\']=backslash ['`']=grave_accent
)
# Shift-prefixed (US): all caps + shifted-symbol row.
declare -gA __KM_SHIFT=(
[A]=a [B]=b [C]=c [D]=d [E]=e [F]=f [G]=g [H]=h [I]=i [J]=j
[K]=k [L]=l [M]=m [N]=n [O]=o [P]=p [Q]=q [R]=r [S]=s [T]=t
[U]=u [V]=v [W]=w [X]=x [Y]=y [Z]=z
['!']=1 ['@']=2 ['#']=3 ['$']=4 ['%']=5
['^']=6 ['&']=7 ['*']=8 ['(']=9 [')']=0
['_']=minus ['+']=equal ['{']=bracket_left ['}']=bracket_right
[':']=semicolon ['"']=apostrophe ['<']=comma ['>']=dot
['?']=slash ['|']=backslash ['~']=grave_accent
)
# ── Public send wrappers ───────────────────────────────────────────────
# Send a single named key (e.g. ret, esc, up, tab, f1).
km_send_key() {
local sock=$1 key=$2
km_monitor_send "$sock" "sendkey $key"
}
# Send a chord — components are joined with `-` per QEMU HMP syntax.
km_send_chord() {
local sock=$1; shift
local IFS='-'
km_monitor_send "$sock" "sendkey $*"
}
# Type a string by encoding each character via the keymap. Unrecognised
# characters are skipped with a warning to stderr — caller is expected to
# stick to printable ASCII.
km_send_str() {
local sock=$1 s=$2 ch chord
local i=0
while (( i < ${#s} )); do
ch="${s:i:1}"
if [[ -n "${__KM_PLAIN[$ch]:-}" ]]; then
chord="${__KM_PLAIN[$ch]}"
km_monitor_send "$sock" "sendkey $chord"
elif [[ -n "${__KM_SHIFT[$ch]:-}" ]]; then
chord="${__KM_SHIFT[$ch]}"
km_monitor_send "$sock" "sendkey shift-$chord"
else
printf '[WARN] km_send_str: unencodable char %q skipped\n' "$ch" >&2
fi
i=$((i + 1))
# Tiny gap so QEMU doesn't drop fast keypresses on busy hosts.
# Empirically 5ms = the line between "100% reliable" and "loses ~1%".
sleep 0.02
done
}
# Convenience: type a string then press Enter.
km_send_line() {
local sock=$1 s=$2
km_send_str "$sock" "$s"
km_send_key "$sock" ret
}
# Visual indicator for log readability — prints a banner + a short pause so
# the next monitor command has time to land on a stable UI frame. Used by
# the harness between major steps; safe to skip in automated reuse.
km_step_banner() {
local label=$1
printf '\n──── %s @ %s ────\n' "$label" "$(date +'%H:%M:%S')"
}

673
test/auto-install.sh Executable file
View file

@ -0,0 +1,673 @@
#!/usr/bin/env bash
# auto-install.sh — autonomous end-to-end install test for veilor-os.
#
# Boots a fresh ISO under QEMU, drives the gum installer via the QEMU
# monitor (sendkey events), waits for anaconda to finish + reboot, SSHes
# into the installed system, and runs a validation checklist.
#
# Usage:
# ./test/auto-install.sh path/to/veilor-os-*.iso
#
# Expected runtime:
# * boot + drive installer: ~3 min
# * anaconda install (KDE): ~15-25 min (depends on mirrors + host CPU)
# * reboot + SSH up: ~2 min
# * validation checks: <1 min
# * total: 20-30 min wall clock
#
# Hardcoded test inputs (do NOT edit — meant to be deterministic):
# disk first /dev/vda (only disk in QEMU)
# hostname "veilor" (installer hardcodes this in v0.5.4)
# LUKS pw testpass1234
# admin pw adminpass1234
# locale en_GB.UTF-8 (first option, accepted with Enter)
#
# Outputs:
# /tmp/veilor-auto-install.log — full driver log
# /tmp/veilor-auto-install-NN-<step>.png — milestone screenshots
# /tmp/veilor-auto-install-final-ssh.txt — final SSH session capture
#
# Exit codes:
# 0 = all validation checks passed
# 1 = any failure (anaconda crash, SSH never up, validation failed)
# 2 = preflight failure (missing tool, bad ISO arg)
#
# This script intentionally does not source test/run-vm.sh — it needs a
# different QEMU configuration (no live cloud-init seed since we're driving
# the installed-system path), and run-vm.sh `exec`s qemu, which is
# incompatible with running QEMU as a backgrounded child here.
set -uo pipefail
# ── Constants ──────────────────────────────────────────────────────────
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TEST_DIR="$SCRIPT_DIR"
DISK="$TEST_DIR/auto-install-vm.qcow2"
NVRAM="$TEST_DIR/auto-install-vm.nvram"
MONITOR_SOCK="$TEST_DIR/auto-install-vm.monitor.sock"
SERIAL_LOG="$TEST_DIR/auto-install-vm.serial.log"
SEED_ISO="$TEST_DIR/auto-install-seed.iso"
LOG=/tmp/veilor-auto-install.log
SHOT_PREFIX=/tmp/veilor-auto-install
SSH_PORT=2222
SSH_USER=admin
LUKS_PW="testpass1234"
ADMIN_PW="adminpass1234"
# Disk: 40G is enough headroom — KDE base + 8G LUKS + LVM overhead fits in
# ~12G actual, but qcow2 only allocates blocks that get touched.
DISK_SIZE=40G
# OVMF firmware paths — Fedora layout. Caller can override if needed.
OVMF_CODE="${OVMF_CODE:-/usr/share/edk2/ovmf/OVMF_CODE.fd}"
OVMF_VARS_SRC="${OVMF_VARS_SRC:-/usr/share/edk2/ovmf/OVMF_VARS.fd}"
# Timing knobs — coarse but deliberate. Tighten only after observing slack
# on a real run.
WAIT_MONITOR_S=120 # qemu boot to monitor socket alive
WAIT_INSTALLER_BANNER_S=180 # ISO boot → tty1 gum menu visible
WAIT_GUM_PROMPT_S=8 # gum draws each prompt within ~5s
WAIT_AFTER_INPUT_S=3 # let UI advance after we hit Enter
ANACONDA_TIMEOUT_S=2700 # 45 min — anaconda + reboot + SSH come-up
ANACONDA_POLL_S=30 # screenshot/poll cadence during install
# ── Logging ────────────────────────────────────────────────────────────
: > "$LOG"
log() { printf '[%s] %s\n' "$(date +'%H:%M:%S')" "$*" | tee -a "$LOG"; }
fail() { log "FAIL: $*"; exit 1; }
# Source the keymap helper.
# shellcheck source=auto-install-keymap.sh
. "$SCRIPT_DIR/auto-install-keymap.sh"
# ── Preflight ──────────────────────────────────────────────────────────
preflight() {
log "preflight: checking environment"
ISO="${1:-}"
if [[ -z $ISO ]]; then
# Auto-fetch from ci-latest GH release if no path given. ISO is split
# into chunks (GH release 2 GiB asset cap). Reassemble before boot.
log "no ISO path given — fetching from gh release ci-latest"
local dl_dir="$HOME/veilor-iso/ci-latest"
mkdir -p "$dl_dir"
( cd "$dl_dir" && rm -f *.part-* *.iso *.sha256 && \
gh release download ci-latest --repo veilor-org/veilor-os \
--pattern '*.iso.part-*' --pattern '*.parts.sha256' --clobber ) || {
echo "[ERR] gh release download failed — is the ci-latest release populated?" >&2
exit 2
}
( cd "$dl_dir" && \
local stem
stem=$(ls *.part-00 2>/dev/null | head -1 | sed 's/\.part-00$//')
[ -n "$stem" ] || { echo "[ERR] no .part-00 in download"; exit 2; }
log "reassembling $stem from $(ls "$stem".part-* | wc -l) parts"
cat "$stem".part-* > "$stem"
sha256sum -c *.parts.sha256 || { echo "[ERR] reassembly checksum mismatch"; exit 2; }
) || exit 2
ISO=$(ls "$dl_dir"/*.iso 2>/dev/null | head -1)
[ -n "$ISO" ] || { echo "[ERR] no iso after reassembly"; exit 2; }
fi
if [[ ! -f $ISO ]]; then
echo "[ERR] ISO not found: $ISO" >&2
exit 2
fi
km_require_tools || exit 2
for t in ssh ssh-keygen pgrep pkill; do
command -v "$t" >/dev/null 2>&1 || { echo "[ERR] missing $t" >&2; exit 2; }
done
if [[ ! -f $OVMF_CODE ]]; then
echo "[ERR] OVMF firmware missing: $OVMF_CODE (install edk2-ovmf)" >&2
exit 2
fi
log "preflight: ISO=$ISO"
}
# ── VM lifecycle ───────────────────────────────────────────────────────
# Kill any QEMU we previously started + scrub state files. Idempotent.
kill_existing_vm() {
log "killing any existing auto-install QEMU"
if [[ -n "${QEMU_PID:-}" ]] && kill -0 "$QEMU_PID" 2>/dev/null; then
kill "$QEMU_PID" 2>/dev/null || true
sleep 2
kill -9 "$QEMU_PID" 2>/dev/null || true
fi
# Catch orphans from prior runs — match by disk path so we don't kill
# the user's other QEMU VMs.
pkill -f "qemu-system-x86_64.*$DISK" 2>/dev/null || true
rm -f "$MONITOR_SOCK" "$SERIAL_LOG"
}
# Wipe disk + nvram so each run is reproducible.
wipe_state() {
log "wiping qcow2 + nvram"
rm -f "$DISK" "$NVRAM" "$SEED_ISO"
qemu-img create -f qcow2 "$DISK" "$DISK_SIZE" >/dev/null
cp "$OVMF_VARS_SRC" "$NVRAM"
}
# Build a NoCloud cloud-init seed ISO so anaconda's installed system picks
# up our SSH pubkey on first boot. The installer-generated ks doesn't
# explicitly invoke cloud-init, but Fedora ships cloud-init enabled by
# default in @core; if a cidata seed is present at boot, NoCloud datasource
# fires and we get key injection for free.
build_seed_iso() {
local pubkey="" found=""
for cand in "$HOME/.ssh/id_ed25519.pub" "$HOME/.ssh/id_rsa.pub"; do
if [[ -f $cand ]]; then
pubkey="$(< "$cand")"
found=$cand
break
fi
done
if [[ -z $pubkey ]]; then
log "seed: no host SSH pubkey found at ~/.ssh/id_{ed25519,rsa}.pub"
log "seed: generating throwaway test key"
local key=$TEST_DIR/auto-install-id_ed25519
rm -f "$key" "$key.pub"
ssh-keygen -t ed25519 -N '' -f "$key" -C "veilor-auto-install" >/dev/null
pubkey="$(< "$key.pub")"
TEST_KEY="$key"
else
log "seed: using $found"
# Match host id; assume corresponding private key exists alongside.
TEST_KEY="${found%.pub}"
fi
local d
d=$(mktemp -d)
cat > "$d/meta-data" <<EOF
instance-id: veilor-auto-install
local-hostname: veilor
EOF
cat > "$d/user-data" <<EOF
#cloud-config
users:
- name: admin
ssh_authorized_keys:
- $pubkey
lock_passwd: false
ssh_pwauth: true
runcmd:
- rm -f /etc/ssh/sshd_config.d/10-veilor-hardening.conf
- systemctl reload sshd || systemctl restart sshd || true
EOF
if command -v mkisofs >/dev/null 2>&1; then
mkisofs -quiet -output "$SEED_ISO" -volid cidata -joliet -rock \
"$d/user-data" "$d/meta-data"
elif command -v xorriso >/dev/null 2>&1; then
xorriso -as mkisofs -quiet -output "$SEED_ISO" -volid cidata \
-joliet -rock "$d/user-data" "$d/meta-data"
else
log "seed: no mkisofs/xorriso — SSH key injection unavailable"
SEED_ISO=""
fi
rm -rf "$d"
[[ -f $SEED_ISO ]] && log "seed: built $SEED_ISO"
}
# Launch QEMU in the background. Returns once the monitor socket is alive.
launch_vm() {
local iso=$1
log "launching QEMU"
local seed_args=()
[[ -n $SEED_ISO && -f $SEED_ISO ]] && \
seed_args=(-drive "file=$SEED_ISO,media=cdrom,readonly=on")
qemu-system-x86_64 \
-name veilor-auto-install \
-enable-kvm \
-cpu host \
-smp 4 \
-m 4096 \
-machine q35,smm=on \
-global driver=cfi.pflash01,property=secure,value=on \
-drive if=pflash,format=raw,readonly=on,file="$OVMF_CODE" \
-drive if=pflash,format=raw,file="$NVRAM" \
-drive file="$DISK",if=virtio,format=qcow2,cache=writeback \
-drive file="$iso",media=cdrom,readonly=on \
"${seed_args[@]}" \
-monitor "unix:$MONITOR_SOCK,server,nowait" \
-boot order=dc,menu=off \
-netdev user,id=net0,hostfwd=tcp::${SSH_PORT}-:22 \
-device virtio-net-pci,netdev=net0 \
-device virtio-rng-pci \
-vga virtio \
-display none \
-serial "file:$SERIAL_LOG" \
>>"$LOG" 2>&1 &
QEMU_PID=$!
log "QEMU pid=$QEMU_PID"
km_wait_socket "$MONITOR_SOCK" "$WAIT_MONITOR_S" \
|| fail "monitor socket never opened"
log "monitor socket ready"
}
# Did QEMU die? Used at every poll; lets us bail with a useful message
# instead of waiting out the full timeout.
qemu_alive() {
[[ -n "${QEMU_PID:-}" ]] && kill -0 "$QEMU_PID" 2>/dev/null
}
# ── Driver: walk the installer flow ────────────────────────────────────
# Take a numbered screenshot. Auto-increments NN.
SHOT_N=0
shot() {
local label=$1
SHOT_N=$((SHOT_N + 1))
local file
file=$(printf '%s-%02d-%s.png' "$SHOT_PREFIX" "$SHOT_N" "$label")
km_screendump "$MONITOR_SOCK" "$file"
log "screenshot: $file"
}
drive_installer() {
log "waiting ${WAIT_INSTALLER_BANNER_S}s for ISO boot + tty1 installer"
# The live ISO autologs into multi-user.target, runs gum on tty1 via a
# systemd unit that replaces getty (see overlay/etc/systemd/system/
# veilor-installer.service if it exists; otherwise via the multi-user
# default in kickstart line 250).
sleep "$WAIT_INSTALLER_BANNER_S"
qemu_alive || fail "QEMU died during ISO boot"
shot "boot-banner"
# Make absolutely sure we're on tty1 (the live ks sets multi-user.target
# default, so we should already be there — but a stray graphical.target
# on dev builds would silently swallow our keystrokes).
km_send_chord "$MONITOR_SOCK" ctrl alt f1
sleep "$WAIT_AFTER_INPUT_S"
shot "tty1"
# Step 1: top option = "Install" — gum choose has it pre-selected.
log "step: select Install"
km_send_key "$MONITOR_SOCK" ret
sleep "$WAIT_GUM_PROMPT_S"
shot "after-install-pick"
# Step 2: disk select — only /dev/vda exists in this QEMU. Default
# selection = first row.
log "step: select disk (/dev/vda — only one)"
km_send_key "$MONITOR_SOCK" ret
sleep "$WAIT_GUM_PROMPT_S"
shot "after-disk-pick"
# Step 3: LUKS passphrase. gum input --password reads stdin until newline.
log "step: enter LUKS passphrase"
km_send_str "$MONITOR_SOCK" "$LUKS_PW"
sleep 1
km_send_key "$MONITOR_SOCK" ret
sleep "$WAIT_AFTER_INPUT_S"
shot "after-luks-pw"
# Step 4: admin password.
log "step: enter admin password"
km_send_str "$MONITOR_SOCK" "$ADMIN_PW"
sleep 1
km_send_key "$MONITOR_SOCK" ret
sleep "$WAIT_AFTER_INPUT_S"
shot "after-admin-pw"
# Step 5: locale select — first option = en_GB.UTF-8.
log "step: confirm locale (en_GB.UTF-8)"
km_send_key "$MONITOR_SOCK" ret
sleep "$WAIT_GUM_PROMPT_S"
shot "after-locale"
# Step 6: confirm screen. gum confirm defaults to "Yes" focused →
# Enter accepts. (Verified against gum 0.13+ docs; if defaults change
# in a future gum, swap to explicit "y" via key map.)
log "step: confirm install"
km_send_key "$MONITOR_SOCK" ret
sleep "$WAIT_AFTER_INPUT_S"
shot "after-confirm"
log "installer driven: anaconda should now be running"
}
# Quick non-blocking SSH probe. Returns 0 if reachable.
ssh_alive() {
ssh -p $SSH_PORT \
-o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null \
-o ConnectTimeout=3 \
-o BatchMode=yes \
${TEST_KEY:+-i "$TEST_KEY"} \
"$SSH_USER@127.0.0.1" true 2>/dev/null
}
# Poll for anaconda completion + SSH availability. We can't watch QEMU exit
# (anaconda's `reboot` directive triggers systemctl reboot, which doesn't
# poweroff the VM — it boots back into the installed disk). The signal we
# actually trust is SSH on port 2222 starting to answer.
#
# If cloud-init didn't run (the seed ISO might not have been picked up by
# anaconda's installed system, depending on whether /etc/cloud is in the
# installed package set), SSH will never come up via key auth. The fallback
# in tty1_unlock_ssh() drives the SDDM/console login by hand.
wait_for_install_and_reboot() {
log "waiting up to ${ANACONDA_TIMEOUT_S}s for anaconda + reboot + SSH"
local waited=0 last_shot=0 last_ppm_hash="" same_count=0
while (( waited < ANACONDA_TIMEOUT_S )); do
if ! qemu_alive; then
fail "QEMU exited unexpectedly during install (check $SERIAL_LOG)"
fi
# SSH probe — first PASS exits the loop.
if ssh_alive; then
log "SSH up — installed system reachable"
shot "ssh-up"
return 0
fi
# Periodic screenshot + stuck-screen detection.
if (( waited - last_shot >= ANACONDA_POLL_S )); then
local ppm="$SHOT_PREFIX-poll.ppm"
km_monitor_send "$MONITOR_SOCK" "screendump $ppm"
sleep 1
if [[ -f $ppm ]]; then
local h
h=$(sha256sum "$ppm" 2>/dev/null | cut -d' ' -f1)
if [[ -n $last_ppm_hash && $h == "$last_ppm_hash" ]]; then
same_count=$((same_count + 1))
else
same_count=0
fi
last_ppm_hash=$h
rm -f "$ppm"
fi
# 5 minutes of identical frames = stuck. Anaconda's text-mode
# progress refreshes at least every minute, so 10 frames in a
# row (5 min @ 30s cadence) identical means it's wedged.
if (( same_count >= 10 )); then
shot "stuck"
fail "screen unchanged for 5min — anaconda likely crashed"
fi
last_shot=$waited
log "anaconda still running... (${waited}s elapsed)"
fi
sleep 5
waited=$((waited + 5))
done
shot "ssh-timeout"
log "SSH never came up via cloud-init; trying TTY1 fallback"
if tty1_unlock_ssh; then
log "TTY1 fallback succeeded; SSH should be reachable"
return 0
fi
fail "anaconda did not complete + SSH within ${ANACONDA_TIMEOUT_S}s, TTY1 fallback also failed"
}
# TTY1 fallback: the installed system reached SDDM (graphical) or got stuck
# at LUKS prompt. We drop to a TTY, log in as admin (chage forces password
# change on first use), and undo the sshd hardening so our pubkey works.
#
# This is best-effort. If the LUKS prompt is still up — we can't get past
# it without typing the passphrase, which we do here too.
tty1_unlock_ssh() {
log "TTY1 fallback: typing LUKS passphrase + admin login + opening sshd"
# Switch to tty1 in case SDDM grabbed graphical.
km_send_chord "$MONITOR_SOCK" ctrl alt f3
sleep 3
# If we're at LUKS prompt, the passphrase clears it. If we're already
# past LUKS, this is a harmless garbage on the login prompt — we Enter
# to clear, then proceed with login.
km_send_str "$MONITOR_SOCK" "$LUKS_PW"
km_send_key "$MONITOR_SOCK" ret
sleep 30 # cryptsetup unlock + boot to login prompt
shot "tty3-prelogin"
# Username — admin. chage -d 0 means we'll be prompted to change pw on
# first login. The old password is whatever we typed at install time;
# the new password just has to satisfy PAM minlen — reuse $ADMIN_PW
# and add a "1" suffix to make passwd's "must differ" check happy.
km_send_line "$MONITOR_SOCK" "admin"
sleep 3
km_send_line "$MONITOR_SOCK" "$ADMIN_PW"
sleep 5
# Old pw prompt (chage forced).
km_send_line "$MONITOR_SOCK" "$ADMIN_PW"
sleep 2
# New pw twice. Use a derivative; PAM rejects identical-to-old and we
# don't want to surprise the user with a password change.
km_send_line "$MONITOR_SOCK" "${ADMIN_PW}new"
sleep 1
km_send_line "$MONITOR_SOCK" "${ADMIN_PW}new"
sleep 5
shot "tty3-loggedin"
# Inject host pubkey + remove sshd hardening + reload sshd.
local pubkey=""
if [[ -n "${TEST_KEY:-}" && -f "${TEST_KEY}.pub" ]]; then
pubkey=$(< "${TEST_KEY}.pub")
fi
if [[ -z $pubkey ]]; then
log "TTY1 fallback: no pubkey to inject — cannot recover SSH"
return 1
fi
km_send_line "$MONITOR_SOCK" "mkdir -p ~/.ssh && chmod 700 ~/.ssh"
sleep 1
km_send_line "$MONITOR_SOCK" "echo '$pubkey' >> ~/.ssh/authorized_keys"
sleep 1
km_send_line "$MONITOR_SOCK" "chmod 600 ~/.ssh/authorized_keys"
sleep 1
km_send_line "$MONITOR_SOCK" "echo '${ADMIN_PW}new' | sudo -S rm -f /etc/ssh/sshd_config.d/10-veilor-hardening.conf"
sleep 2
km_send_line "$MONITOR_SOCK" "echo '${ADMIN_PW}new' | sudo -S systemctl reload sshd"
sleep 5
# Wait up to 60s for SSH to actually answer.
local i
for ((i=0; i<60; i++)); do
if ssh_alive; then
log "TTY1 fallback: SSH reachable after ${i}s"
return 0
fi
sleep 1
done
return 1
}
# ── Validation ─────────────────────────────────────────────────────────
# Run a single SSH command, return its stdout. Failures are NOT fatal here
# — caller decides what's a hard failure.
remote() {
ssh -p $SSH_PORT \
-o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null \
-o BatchMode=yes \
${TEST_KEY:+-i "$TEST_KEY"} \
"$SSH_USER@127.0.0.1" "$@"
}
# Validation result accumulator. check_remote runs a shell snippet on the
# installed VM via SSH; the snippet must exit 0 for PASS, non-zero for
# FAIL. check_eq compares remote stdout to an expected literal.
VALIDATIONS=()
# check_remote <desc> <remote shell snippet>
# Runs the snippet via SSH, treats exit code as the verdict.
check_remote() {
local desc=$1 cmd=$2
local out rc
out=$(remote "$cmd" 2>&1)
rc=$?
if (( rc == 0 )); then
VALIDATIONS+=("PASS $desc")
log " PASS: $desc"
else
# Truncate the failure context so the report stays scannable.
local trimmed=${out:0:120}
VALIDATIONS+=("FAIL $desc ($trimmed)")
log " FAIL: $desc -- $trimmed"
fi
}
# check_eq <desc> <remote shell snippet> <expected stdout>
# Runs the snippet, trims trailing whitespace, compares to expected.
check_eq() {
local desc=$1 cmd=$2 expected=$3
local got
got=$(remote "$cmd" 2>/dev/null | tr -d '\r' | tail -n1)
got=${got%[[:space:]]}
if [[ $got == "$expected" ]]; then
VALIDATIONS+=("PASS $desc (=$got)")
log " PASS: $desc (=$got)"
else
VALIDATIONS+=("FAIL $desc (got: '$got', expected: '$expected')")
log " FAIL: $desc -- got '$got' expected '$expected'"
fi
}
run_validation() {
log "running validation checklist"
# os-release
check_remote "/etc/os-release: NAME=veilor-os" \
'grep -q "^NAME=.veilor-os" /etc/os-release'
check_eq "hostnamectl --static = veilor" \
'hostnamectl --static' "veilor"
# Active services
for svc in sshd fail2ban usbguard tuned auditd firewalld chronyd sddm; do
check_eq "$svc is-active" \
"systemctl is-active $svc" "active"
done
# SELinux. v0.5.x kickstart sets `selinux --enforcing` for installed
# systems but veilor-firstboot may toggle behavior — accept either
# Enforcing or Permissive, but log which one we got. (Hard-fail on
# Disabled.)
local selinux
selinux=$(remote getenforce 2>/dev/null | tr -d '\r' | tail -n1)
selinux=${selinux%[[:space:]]}
if [[ $selinux == Enforcing ]]; then
VALIDATIONS+=("PASS SELinux = Enforcing")
log " PASS: SELinux = Enforcing"
elif [[ $selinux == Permissive ]]; then
VALIDATIONS+=("PASS SELinux = Permissive (acceptable for v0.5)")
log " PASS (soft): SELinux = Permissive"
else
VALIDATIONS+=("FAIL SELinux = $selinux")
log " FAIL: SELinux = $selinux"
fi
# Disk layout: LUKS2 + btrfs.
check_remote "lsblk shows crypto_LUKS" \
'lsblk -f | grep -q crypto_LUKS'
check_remote "lsblk shows btrfs" \
'lsblk -f | grep -q btrfs'
check_remote "/etc/crypttab has LUKS entry" \
'grep -Ev "^\s*(#|$)" /etc/crypttab | grep -qi luks'
# Admin user
check_remote "admin user exists" \
'getent passwd admin | grep -q "^admin:"'
# CLI tools shipped via overlay.
for bin in veilor-power veilor-doctor veilor-update; do
check_remote "/usr/local/bin/$bin present" \
"test -x /usr/local/bin/$bin"
done
# init_on_alloc — veilor-installer kickstart sets it on the install
# cmdline (line 315). /proc/cmdline is the source of truth.
check_remote "init_on_alloc=1 in /proc/cmdline" \
'grep -q init_on_alloc=1 /proc/cmdline'
}
# ── Reporting ──────────────────────────────────────────────────────────
print_report() {
local pass=0 fail=0
for line in "${VALIDATIONS[@]}"; do
case "$line" in
PASS*) pass=$((pass + 1)) ;;
FAIL*) fail=$((fail + 1)) ;;
esac
done
{
echo "════════════════════════════════════════════════════════"
echo " veilor-os auto-install test report"
echo " $(date)"
echo "════════════════════════════════════════════════════════"
printf '%s\n' "${VALIDATIONS[@]}"
echo "────────────────────────────────────────────────────────"
printf 'TOTAL: %d PASS, %d FAIL\n' "$pass" "$fail"
echo "Logs: $LOG"
echo "Screenshots: ${SHOT_PREFIX}-NN-*.png"
echo "Serial log: $SERIAL_LOG"
echo "════════════════════════════════════════════════════════"
} | tee -a "$LOG"
# Capture a final SSH session snapshot (uname/lsblk/sysctl) for the
# human reviewer.
{
echo "=== final ssh probe ==="
date
echo "--- uname -a ---"
remote uname -a 2>&1
echo "--- lsblk -f ---"
remote lsblk -f 2>&1
echo "--- /proc/cmdline ---"
remote cat /proc/cmdline 2>&1
echo "--- systemctl --failed ---"
remote systemctl --failed 2>&1
} > "${SHOT_PREFIX}-final-ssh.txt" 2>&1 || true
log "final ssh snapshot: ${SHOT_PREFIX}-final-ssh.txt"
if (( fail > 0 )); then
return 1
fi
return 0
}
cleanup() {
log "cleanup"
if [[ -n "${QEMU_PID:-}" ]] && kill -0 "$QEMU_PID" 2>/dev/null; then
# Graceful shutdown via monitor first; SIGTERM if it ignores us.
km_monitor_send "$MONITOR_SOCK" "system_powerdown" 2>/dev/null || true
sleep 5
if kill -0 "$QEMU_PID" 2>/dev/null; then
kill "$QEMU_PID" 2>/dev/null || true
sleep 2
kill -9 "$QEMU_PID" 2>/dev/null || true
fi
fi
rm -f "$MONITOR_SOCK"
}
# ── Main ───────────────────────────────────────────────────────────────
main() {
trap cleanup EXIT
preflight "$@"
kill_existing_vm
wipe_state
build_seed_iso
launch_vm "$ISO"
drive_installer
wait_for_install_and_reboot
run_validation
print_report
}
main "$@"

View file

@ -128,14 +128,17 @@ EOF
fi
fi
# ── QEMU monitor fallback (live ISO doesn't run cloud-init) ──
# Started in the background after a delay; sends keypresses through the
# QEMU monitor unix socket to drop to a TTY and unblock SSH for liveuser.
MONITOR_ARGS=()
if [[ -n $HOST_PUBKEY ]]; then
# ── QEMU monitor unix socket ──
# Always exposed so the host can drive the VM via `socat - UNIX-CONNECT:...`
# (sendkey, screendump, etc.) for debugging. Independent of pubkey injection.
rm -f "$MONITOR_SOCK"
MONITOR_ARGS=(-monitor "unix:$MONITOR_SOCK,server,nowait")
# ── Auto-inject helper (live ISO doesn't run cloud-init) ──
# Started in the background after a delay; sends keypresses through the
# QEMU monitor unix socket to drop to a TTY and unblock SSH for liveuser.
if [[ -n $HOST_PUBKEY ]]; then
(
# Wait for the VM to reach a usable login prompt (SDDM autologin →
# liveuser session is the most realistic target). 90s is enough on
@ -194,6 +197,32 @@ echo " ISO : $ISO"
echo " Disk : $DISK"
echo " NVRAM : $NVRAM"
echo " Seed : ${SEED_ISO:-<none>}"
# Anaconda virtio-serial log channel.
#
# Anaconda 43.x autodetects /dev/virtio-ports/org.fedoraproject.anaconda.log.0
# and streams program/packaging/storage/anaconda logs through it in real
# time, before any tmpfs / pivot, before networking. Survives kernel
# panic. The host gets a tail-able file. No anaconda CLI flag, no
# kickstart change, just the QEMU virtio-serial wiring.
#
# We've lost logs three times in a row to anaconda failures + tmpfs
# reboots. Wiring this up so future failures auto-capture.
ANACONDA_LOG="$TEST_DIR/anaconda-vm-$(date +%Y%m%d-%H%M%S).log"
ANACONDA_LOG_DIR="$TEST_DIR/test-runs/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$ANACONDA_LOG_DIR"
ANACONDA_LOG_ARGS=(
# Belt: virtio-serial (anaconda's setupVirtio rsyslog forward, fragile —
# depends on rsyslog being installed in the live ISO).
-chardev "file,id=anaclog,path=$ANACONDA_LOG"
-device virtio-serial-pci,id=vs1
-device "virtserialport,chardev=anaclog,bus=vs1.0,name=org.fedoraproject.anaconda.log.0"
# Braces: virtio-9p host directory share. veilor-installer mounts this
# at /mnt/hostlogs and rsyncs /tmp/*.log there post-anaconda.
-virtfs "local,path=$ANACONDA_LOG_DIR,mount_tag=hostlogs,security_model=mapped-xattr,id=hostlogs"
)
echo " AnaLog : $ANACONDA_LOG"
echo " HostFS : $ANACONDA_LOG_DIR (9p tag: hostlogs)"
echo " Mode : ${SECBOOT:+secboot}${SECBOOT:-stock UEFI}"
echo " Inject: ${HOST_PUBKEY:+yes}${HOST_PUBKEY:-no (no host pubkey)}"
echo "════════════════════════════════════════════════════════"
@ -212,6 +241,7 @@ exec qemu-system-x86_64 \
-drive file="$ISO",media=cdrom,readonly=on \
"${SEED_ARGS[@]}" \
"${MONITOR_ARGS[@]}" \
"${ANACONDA_LOG_ARGS[@]}" \
-boot menu=on,splash-time=2000 \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 \

View file

@ -0,0 +1,142 @@
# Test run — v0.5.32
- **Date:** 2026-05-06
- **ISO:** `veilor-os-43-20260506-HHMMSS.iso` (sha256: `TBD — fill in once A1 reports the build artifact`)
- **Tester:** A1 (build) + operator (P) + A5 (report scribe)
- **Build host:** forgejo-runner on nullstone (runner label `ubuntu-24.04`,
image `catthehacker/ubuntu:act-24.04`); first ISO produced off the
Forgejo build pipeline after the GH Actions mirror was disabled
2026-05-06.
- **Environment:** VM (qemu/q35/ovmf, 4 vCPU, 4 GiB RAM, virtio-vga,
virtio-9p host log mount). Real-hardware run is a separate report —
this file is the VM run only.
---
## Result
**Pending A1 build.** Operator + A5 fill in pass/fail per-step once
the actual VM test is walked through against the v0.5.32 ISO. Until
the ISO sha256 lands here, treat every row in the per-step table as
unverified.
One-line summary (write here once known): _TBD_.
---
## Regressions vs previous run
(v0.5.31 was the last tagged release; compare against any pass-with-issues
notes from that test run if a report exists. Empty otherwise — fill in
during the actual test walkthrough.)
- _TBD_
---
## Per-step results
Walk `test/TESTING.md` step-by-step. Mark each pass/fail with a brief
note when failed. Until the test runs, every row is `⏳ pending`.
| # | Step | Result | Notes |
|----|-----------------------------------|--------|-------|
| 1 | Live boot to installer banner | ⏳ pending | |
| 2 | Installer menu render | ⏳ pending | |
| 3 | Disk picker | ⏳ pending | |
| 4 | LUKS + admin passwords | ⏳ pending | Operator types directly into QEMU window — plymouth ignores synthesised keys. |
| 5 | Locale | ⏳ pending | |
| 6 | Confirm | ⏳ pending | |
| 7 | Anaconda transaction | ⏳ pending | |
| 8 | Reboot | ⏳ pending | |
| 9 | GRUB single veilor-os entry | ⏳ pending | |
| 10 | LUKS unlock prompt | ⏳ pending | |
| 11 | First boot → SDDM → KDE | ⏳ pending | |
| 12 | Hardening checks | ⏳ pending | |
---
## Hardening verification
```text
$ getenforce
TBD
$ systemctl is-active fail2ban usbguard tuned auditd firewalld
TBD
$ cat /proc/cmdline
TBD — must include rd.luks.uuid=luks-... and the v0.5.32 cmdline set.
$ lsblk -f
TBD
$ systemctl is-enabled veilor-firstboot.service
TBD — must report enabled with WantedBy=graphical.target (blocker #2).
$ nft list ruleset | grep -i tailscale
TBD — tailscale0 must be in the trusted zone (blocker #5).
$ cat /etc/skel/.config/kdeglobals 2>/dev/null | head
TBD — branding must be present (blocker #6).
$ ls /var/log/anaconda/host-9p-mount/
TBD — virtio-9p Anaconda log capture (blocker #7).
```
Paste real output. If any service is inactive, any cmdline arg is
missing, or any blocker artifact is absent, raise as a Regression
above.
---
## Findings
The 7 v0.5.32 blocker fixes from the
[2026-05-05 9-agent research wave](../../docs/research/2026-05-05-agent-wave/README.md)
land in this build. Each is listed here as an **expected behaviour**
the tester must observe — if any of these regress, log it under
Regressions above.
1. **Suspend/resume wifi survives lid-close.** `kernel.modules_disabled=1`
no longer fires before the wifi module reloads on resume. Test:
suspend the VM (or lid-close on real HW), wake, reconnect to the
same network without manual `modprobe`.
2. **`veilor-firstboot.service` is `WantedBy=graphical.target`.** The
first-boot admin password flow must run on real installs, not just
on multi-user.target boots. Test: fresh install boots straight to
the TTY password prompt before SDDM lights up.
3. **Kernel-upgrade does not drift GRUB.** First `dnf upgrade kernel`
must leave the system bootable — `grub2-mkconfig` is wired into the
kernel-install hook. Test: install, run `sudo dnf upgrade kernel`,
reboot, system comes up.
4. **USBGuard rules are id-based, not hash + parent-hash.** Mirrors the
onyx dock-replug fix in `feedback_usbguard_dock.md`. Test:
unplug/replug a known device — it stays allowed. The hash variant
re-blocks on every replug; the id variant must not.
5. **firewalld trusts `tailscale0`.** The interface is in the trusted
zone out-of-the-box. Test: bring tailscale up, ping a peer in the
mesh — no firewall mods required.
6. **`/etc/skel/` carries veilor branding.** New users get the black
colour scheme, Konsole profile, and Plasma layout on first login.
Test: `useradd test`; log in as `test`; KDE comes up branded, no
white flash, Fira Code system font.
7. **virtio-9p Anaconda log capture is active by default.**
`test/run-vm.sh` mounts a host directory into the VM; Anaconda logs
land there during install. Replaces the broken virtio-serial path
from earlier runs. Test: run install in VM; host-side mount has
`program.log`, `storage.log`, `packaging.log` populated.
Free-form notes from the actual walkthrough — cosmetic glitches, slow
paths, surprising behaviour — append below.
- _TBD — fill in during the operator-driven VM run._
---
## Action items for next release
(Empty until the test exposes something. PRs / commits opened during
the run go here.)
- [ ] _TBD_

View file

@ -0,0 +1,80 @@
# Test run — vX.Y.Z
- **Date:** YYYY-MM-DD
- **ISO:** `veilor-os-43-YYYYMMDD-HHMMSS.iso` (sha256: `...`)
- **Tester:** name / handle
- **Environment:** VM (qemu/q35/ovmf, 4 vCPU, 4G RAM, virtio-vga) — OR — Real HW (model, CPU, GPU)
---
## Result
✅ Pass / ⚠️ Pass-with-issues / ❌ Fail
One-line summary.
---
## Regressions vs previous run
(Things that worked in the prior tagged release but failed here. Empty
if none. Always check this section first when reading the report.)
---
## Per-step results
Walk `test/TESTING.md` step-by-step. Mark each pass/fail with a brief
note when failed.
| # | Step | Result | Notes |
|---|------|--------|-------|
| 1 | Live boot to installer banner | ✅ | |
| 2 | Installer menu render | ✅ | |
| 3 | Disk picker | ✅ | |
| 4 | LUKS + admin passwords | ✅ | |
| 5 | Locale | ✅ | |
| 6 | Confirm | ✅ | |
| 7 | Anaconda transaction | ✅ | |
| 8 | Reboot | ✅ | |
| 9 | GRUB single veilor-os entry | ✅ | |
| 10 | LUKS unlock prompt | ✅ | |
| 11 | First boot → SDDM → KDE | ✅ | |
| 12 | Hardening checks | ✅ | |
---
## Hardening verification
```
$ getenforce
Enforcing
$ systemctl is-active fail2ban usbguard tuned auditd firewalld
active
active
active
active
active
$ cat /proc/cmdline
... rd.luks.uuid=luks-... ...
$ lsblk -f
...
```
Paste real output. If any service is inactive or any cmdline arg is
missing, raise as a Regression above.
---
## Findings
Free-form notes. Cosmetic glitches, slow paths, surprising behaviour.
---
## Action items for next release
- [ ] ...
- [ ] ...
(Linked PRs / commits if you opened any during the test.)