Commit graph

137 commits

Author SHA1 Message Date
s8n
5c961eba88 Update docs/DOCS-DOCS.md
Some checks failed
Lint / Kickstart syntax (push) Failing after 1s
Lint / Shell scripts (push) Failing after 6s
Lint / No personal/onyx leaks (push) Failing after 3s
2026-05-06 18:03:37 +01:00
obsidian-ai
8c70030d80 docs(ROADMAP): pivot — v0.6 cancelled, v0.7 BlueBuild OCI is mainline
Strategy pivot 2026-05-06: v0.5.32 produced a green ISO on Forgejo
runner. That's the kickstart-path proof point. Continuing v0.6
kickstart polish is sunk-cost work on tooling retired at v1.0.

Pivot:
- v0.5.0 is the FINAL kickstart-path release. Tag, freeze, ship.
- v0.6 cancelled as a milestone. Original plan kept inline as
  HISTORICAL reference.
- v0.7 promoted to primary active milestone. Absorbs the v0.6
  ergonomic CLI tools (veilor-postinstall / veilor-doctor /
  veilor-update) with bootc upgrade replacing dnf upgrade.
- Active branch: v0.7-bluebuild-spike. All future feature work lands
  there, not on main.
2026-05-06 15:55:08 +01:00
obsidian-ai
89c7df0ecc docs: add PROOF-OF-WORK.md — receipts of work, tooling, and decisions
Single document that surfaces the depth of work behind veilor-os:
metrics, distros studied, every tool traversed in the build chain,
all 35+ failure classes hit and beaten, key engineering decisions and
why, what's in the repo beyond the kickstart, and the self-hosted
nullstone CI infrastructure built to support it.

Receipts not narrative — every claim links back to a file path,
commit, error, or config. Useful as portfolio anchor and as a single
read-this-first for anyone returning to the project after a gap.
2026-05-06 15:51:40 +01:00
obsidian-ai
c2b4df8ef9 ci: gate cosign/sbom/attest steps to github only
cosign keyless sign uses Sigstore Fulcio which requires a
Fulcio-trusted OIDC issuer. Forgejo runs don't have one, so cosign
falls back to the interactive device flow and times out
(error obtaining token: expired_token). Same applies to
attest-build-provenance and the SBOM action's signed attestation.

Skip all three on Forgejo for now; ISO + sha256 are sufficient for
v0.5.x test releases. Re-add when we self-host a Sigstore stack or
sign with a key-pair instead of keyless.
2026-05-06 15:41:00 +01:00
obsidian-ai
b9df392fbc docs(README): tone down secureblue credit (no code lifted yet)
We layer on their OCI image as v0.7 base; we don't redistribute their
source. Drop the AGPLv3-attribution prose — that becomes relevant only
if/when we ship a verbatim chunk of their config/policy in our repo.
2026-05-06 15:38:35 +01:00
obsidian-ai
84fa325e46 docs(README): add secureblue column + upstream credit section
secureblue (AGPLv3) is the upstream hardened atomic Fedora that the
v0.7 BlueBuild spike layers on top of. Comparison table now includes
secureblue alongside Kicksecure + stock Fedora KDE. New "Credit &
relationship to secureblue" section spells out where their work
already solves problems we don't need to reinvent (Trivalent,
SELinux policy, kernel cmdline, signed OCI), how veilor-os differs
(kickstart install path + branding + Forgejo CI), and the AGPLv3
attribution rule for any code we lift verbatim.
2026-05-06 15:32:22 +01:00
obsidian-ai
1e4ca2b56b ci: symlink /work -> GITHUB_WORKSPACE for ks %post SRC probe 2026-05-06 14:58:24 +01:00
obsidian-ai
446c602683 ks: drop apparmor-* packages — not in Fedora 43 repos
Build error: 'Failed to find package apparmor-parser : No match for
argument'. Fedora 43 base and updates do not ship AppArmor packages;
the prior comment was incorrect. Defer AppArmor to v0.7 secureblue OCI
hybrid (which has its own LSM stack), or land via COPR overlay later.
2026-05-06 14:48:06 +01:00
obsidian-ai
ac5c29df42 ci: run build directly in Fedora job container, drop addnab nest
forgejo-runner labels nullstone -> fedora:43 image. Switching
runs-on: ubuntu-24.04 -> nullstone makes the job container itself
the build environment, eliminating the docker-in-docker workspace
bind-mount problem (host path != act-container path).

Build now runs as root in fedora:43, installs livecd-tools directly
via dnf, and writes outputs to $GITHUB_WORKSPACE which is the natural
runner workdir on host. No nested docker, no userns juggling, no
explicit -v workspace bind needed.
2026-05-06 14:35:44 +01:00
obsidian-ai
6f4842a75c ci: add /work diagnostic before sed-redirect to surface bind/perm issue 2026-05-06 14:19:13 +01:00
obsidian-ai
4b90e7e00b ci: repin fedora:43 build container to amd64 digest
Prior pin was the arm64 manifest digest (linux/arm64/v8); on x86_64
host it failed with `exec /usr/bin/sh: exec format error`. Pinned to
the amd64 manifest entry from the same fat-manifest.
2026-05-06 14:10:25 +01:00
obsidian-ai
7a0c665cf0 ci: add --userns=host to nested Fedora build container
Forgejo runner on nullstone runs against a daemon with
userns-remap=default. addnab/docker-run-action launches the Fedora 43
build container with --privileged, which is incompatible with
userns-remap unless --userns=host is also set.
2026-05-06 14:07:22 +01:00
obsidian-ai
d38fce4cb8 ci: pin sbom/cosign/attest actions to node20-safe versions
forgejo-runner v6.4.0 ships node20; floating tags @v0/@v3/@v2 now
resolve to actions whose runs.using=node24, which the runner cannot
exec. Pin to last node20-shipping release of each:

- anchore/sbom-action@v0.17.2
- sigstore/cosign-installer@v3.7.0
- actions/attest-build-provenance@v2.2.3
2026-05-06 13:57:49 +01:00
s8n
bc738c1c7b Merge pull request 'ci: gate softprops release steps + add Forgejo API equivalents' (#5) from feat/a1-forgejo-ci-adapt into main 2026-05-06 13:53:02 +01:00
s8n
a3f6c1a1a6 ci: gate softprops release steps + add Forgejo API equivalents
The build-iso workflow used softprops/action-gh-release@v2 unconditionally,
which only speaks the GitHub Releases REST API. When the workflow runs on
the Forgejo runner registered on nullstone, those steps would fail.

Add a server_url check so the GH-only path runs only on github.com, and
mirror it with a curl-based step that hits the Forgejo /api/v1/releases
endpoints. Behaviour:
  - github.com: identical to before (action-gh-release@v2).
  - git.s8n.ru: drop+recreate ci-latest release, upload chunked assets
                via the Forgejo attachments API.

Tag-driven "Attach to release" path mirrored the same way.

Refs: A1 build-eng task — Forgejo runner adaptation.
2026-05-06 13:52:43 +01:00
s8n
356013e1ca Merge pull request 'feat(installer): v0.6 ergonomics + polish — 5 quick wins' (#3) from feat/ux-installer-v06-polish into main 2026-05-06 13:47:35 +01:00
s8n
417acb5585 Merge pull request 'sec: AppArmor v0.6 stub — load profiles in complain mode' (#11) from feat/sec-apparmor-v06-stubs into main 2026-05-06 13:47:31 +01:00
s8n
df574e00f5 Merge pull request 'ci: cosign keyless sigs, SBOM, provenance + fedora digest pin' (#7) from feat/sre-cosign-sbom-attestation into main 2026-05-06 13:47:27 +01:00
s8n
3e660534a1 Merge pull request 'ci: pin actions to node20-safe tags + runner sock pass-through' (#8) from feat/runner-fix-docker-sock-and-node20 into main 2026-05-06 13:47:20 +01:00
s8n
749bcef5b4 Merge pull request 'sec: polish THREAT-MODEL.md for v0.7 public launch' (#10) from feat/sec-threat-model-polish into main 2026-05-06 13:46:58 +01:00
s8n
77ed91ed8e Merge pull request 'docs: test run report skeleton for v0.5.32 (Forgejo build)' (#4) from feat/docs-test-run-v0.5.32 into main 2026-05-06 13:46:12 +01:00
s8n
ad059ec73e docs: README de-GH + Forgejo build status 2026-05-06 13:45:33 +01:00
s8n
05d37f6419 docs: METHOD-CHANGELOG 2026-05-06 forgejo entry 2026-05-06 13:45:29 +01:00
s8n
eafb8b7aa1 sec: AppArmor v0.6 stub — load profiles in complain mode
Per docs/research/2026-05-05-agent-wave/04-hardening-tier-2.md (v0.6
scope item 1).

Adds:
  - apparmor-parser apparmor-utils apparmor-profiles to %packages in
    BOTH kickstart/veilor-os.ks (live ks) and overlay/usr/local/bin/
    veilor-installer (generated install ks heredoc).
  - scripts/40-apparmor.sh — wires aa-complain on every veilor-shipped
    profile. Idempotent. "loaded, present, nothing breaks".
  - overlay/etc/apparmor.d/veilor.d/firefox — 1-liner stub (binary
    confinement marker only; full policy post-v0.6).
  - overlay/etc/apparmor.d/veilor.d/thunderbird — same pattern.
  - Wired 40-apparmor.sh into install %post chain after
    30-apply-v03-theme.sh.

Complain mode means: profiles loaded, kernel logs syscall denials but
does NOT enforce. Operator can review audit.log post-install to
inform v0.7 policy authoring.
2026-05-06 11:15:30 +01:00
s8n
d0738970e0 sec: polish THREAT-MODEL.md for v0.7 public launch
Status flipped Draft → Final.

In-scope rows now cite specific config files / settings (auditable
from clean checkout):
  - LUKS2 params from kickstart/veilor-os.ks
  - sysctl knobs file path
  - USBGuard policy mode + rule type
  - sshd_config drop-in path + every directive
  - auditd rule path + watched paths
  - chrony NTS endpoints
  - systemd-resolved DoT settings
  - bootloader kernel args (lockdown, slab_nomerge, init_on_alloc/free, etc.)

Out-of-scope rows un-hedged. 'May not always' phrasings removed; each
adversary states unambiguously what veilor-os does NOT do.
2026-05-06 11:14:34 +01:00
obsidian-ai
7974ed7a6e ci: pin actions to node20-safe tags + runner sock pass-through
forgejo-runner v6.4.0 ships a node20 javascript engine. v4.2+ of
actions/checkout and v2.0.5+ of softprops/action-gh-release moved to
node24, which the runner refuses to exec. Pin both to last node20
release.

Pairs with a runner-side config change (separately deployed on
nullstone /home/docker/forgejo-runner/conf/config.yaml) that adds
`-v /var/run/docker.sock:/var/run/docker.sock` to per-job container
options + whitelists the socket via valid_volumes — without that
addnab/docker-run-action@v3 inside the catthehacker/ubuntu job
container can't reach the docker engine.

- actions/checkout v4 -> v4.1.7
- softprops/action-gh-release v2 -> v2.0.4
- addnab/docker-run-action v3 unchanged (composite/docker, no node)
- ludeeus/action-shellcheck@master unchanged (docker-based)
2026-05-06 10:50:15 +01:00
veilor-org
f06ee5cc1c chore: gitignore auto-install-vm test artifacts
Test rig was leaking test/auto-install-vm.{nvram,qcow2} into untracked
state. Pattern matches existing veilor-vm.* exclusion.
2026-05-06 10:50:04 +01:00
veilor-org
130f0432dd ci: TODO marker for SHA-pinning third-party actions
Note that all `uses:` directives still resolve to mutable major-
version tags. SHA-pinning is the Agent 8 audit recommendation but
requires per-action web lookups that stalled the previous SRE
attempt; tracked separately so this PR can land first.
2026-05-06 10:41:19 +01:00
veilor-org
08f16bb2ee ci: pin fedora:43 base image to digest
Pin registry.fedoraproject.org/fedora:43 to its current manifest
digest so a malicious or accidental tag-rewrite upstream cannot
silently change the base layer of every CI build. Digest was
captured via `skopeo inspect --raw` on 2026-05-06. Refresh
procedure documented inline.
2026-05-06 10:41:10 +01:00
veilor-org
25b8d30f35 ci: add cosign keyless sigs, SBOM, and provenance attestation
Sign each ISO chunk with cosign keyless OIDC, generate an SPDX SBOM
of the build output, and attach an in-toto build-provenance
attestation. Sigs/certs/SBOM are uploaded alongside the ISO parts in
the ci-latest rolling prerelease so the test/auto-install.sh path
can verify before reassembling.

Action versions are major-version tags (@v3, @v0, @v2). SHA-pinning
is tracked separately to keep this PR small and avoid the long web
lookups that stalled the previous attempt.
2026-05-06 10:40:56 +01:00
veilor-org
aa731f9daa docs: test run report skeleton for v0.5.32 (Forgejo build)
First test-runs/ report off the new template. Records the build host
(forgejo-runner on nullstone, ubuntu-24.04 / catthehacker:act-24.04),
notes that v0.5.32 is the first ISO produced after the GH Actions
mirror was disabled, and pre-populates the Findings section with the
7 v0.5.32 blocker fixes from the 2026-05-05 9-agent wave as expected
behaviours the tester must verify.

Result is left as "pending A1 build" — the operator + A5 fill in
per-step pass/fail and hardening output once the actual VM walkthrough
runs against the produced ISO. This is intentional: the report is the
scaffold; the test is a separate step.
2026-05-06 10:34:06 +01:00
veilor-org
441f7d057f feat(installer): promote eject-media reminder to its own box
In v0.5 the "Remove the install media" reminder was a single line
inside the green success box, and operators on both onyx and the
friend's RTX 4080 rig missed it — rebooted into the live ISO and
re-ran the installer thinking the install had silently failed.

Promote the reminder to its own loud yellow thick-bordered gum-style
box stacked directly below the success/countdown box, with three
lines of explanation. Renders for the full 10s of the countdown so
it stays in the operator's face the entire window.
2026-05-06 10:32:30 +01:00
veilor-org
816fc0ee68 feat(installer): 10s reboot countdown with per-tick redraw
v0.5 used `sleep 5` after a static "System will reboot in 5 seconds."
box, which left the operator guessing how much time was left to grab
the USB stick. The new loop runs 10 → 1, clearing + redrawing the
gum-style success box each tick with the remaining-seconds figure,
giving the operator a visible window to act.

10 seconds (vs 5) because real hardware operators were missing the
window — laptops with the USB on the far side of the dock take
4-5 seconds to physically reach. 10 is comfortable, not annoying.
2026-05-06 10:32:11 +01:00
veilor-org
44f0c787a7 feat(installer): confirm-twice for LUKS passphrase + admin password
A typo in the LUKS passphrase is unrecoverable — the disk is
unmountable without it and we don't escrow the key. Re-prompting
until the two reads match catches keyboard-layout surprises (the
US/UK quote-key position is the most common one) before they brick
the install.

Admin password gets the same treatment for consistency. Less
catastrophic (resettable from a recovery shell) but a mismatch
still locks the user out of their fresh install on first boot.

Loop bails on cancel/ESC and re-prompts on validate_pw failure.
2026-05-06 10:31:21 +01:00
veilor-org
900f5465b3 feat(installer): staged banner reveal at 40ms/line
Read banner.txt line by line with a 40ms sleep between each, then
clear and redraw the bordered gum-style version. 5-line banner ×
40ms = 200ms total reveal — slow enough to land an aesthetic on the
first frame, fast enough that the operator never feels it as lag.

Pure cosmetic; no functional change to the install flow.
2026-05-06 10:31:02 +01:00
veilor-org
63c5e199d9 fix(installer): swap gum input --password for bash read -srp
`gum input --password` corrupts the linux fbcon since v0.5.27 — the
bubbletea screen-restore writes back the previous menu buffer because
the framebuffer terminfo entry lacks `civis/cnorm` cursor-hide
sequences, leaving a duplicate "Install" plus a stray "T" rendered on
top of the password field. The fix is a single termios echo-off via
`read -srp`: no redraw, no glitch, no dependency on gum's TUI layer
for the one screen where it broke.

Header still rendered through `gum style` so visual parity with the
disk picker / confirm box is preserved. Whiptail fallback path
unchanged (passwordbox there has always rendered cleanly).
2026-05-06 10:30:06 +01:00
veilor-org
abb67841f1 docs: STRATEGY.md — primary git host moved to git.s8n.ru (Forgejo)
Self-hosted Forgejo + forgejo-runner on nullstone now primary.
GitHub becomes public mirror (Forgejo push-mirrors every commit
+ every 8h). 0 GH Actions minutes consumed.

Runner labels:
  ubuntu-24.04 — drop-in for existing build-iso.yml workflow
  nullstone    — privileged Fedora 43 (opt-in via runs-on: nullstone)

Deploy artifacts: ~/ai-lab/nullstone-server/forgejo/.

External TODO (parent operator owns):
  - router port-forward 222 → nullstone:222 for public SSH push
  - no-guest@file allowlist update for external web UI access
2026-05-06 02:01:06 +01:00
veilor-org
b86b4f9ec3 v0.5.32: ship 7 blockers from 9-agent wave
Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.

## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)

`veilor-modules-lock.service` runs `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, resume breaks wifi until
reboot. Same architectural class as the LUKS bug — security feature
breaks legitimate kernel state transitions.

The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.

Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.

## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)

Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).

Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.

## 3. USBGuard hash → id-based baseline (Agent 9, A3)

Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.

Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor renumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).

## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)

User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.

Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.

Default zone stays drop. Only the tailscale0 interface gets ACCEPT.

## 5. /etc/skel branding (Agent 7)

Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) pre-empty, so the moment user opened System Settings, KDE wrote
fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.

Seeds:
  /etc/skel/.config/kdeglobals    (copy of assets/kde/veilor-default.kdeglobals)
  /etc/skel/.config/breezerc      (copy of assets/kde/breezerc)
  /etc/skel/.config/kwinrc        (Plasma 6 wayland defaults: opengl, animspeed=0,
                                    blur off, click-to-focus)
  /etc/skel/.config/konsolerc     (default profile = Veilor)
  /etc/skel/.local/share/konsole/Veilor.profile + .colorscheme

User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.

## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)

Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).

Adds to bootloader cmdline (live + installed):
  i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
  rd.vconsole.keymap=us

`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).

## 7. virtio-9p log capture (Agent 6)

The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.

test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.

No-op on real hardware (9p tag absent) — VM-only debug.

## Validate

  bash -n overlay/usr/local/bin/veilor-installer  # OK
  ksvalidator kickstart/veilor-os.ks               # clean

## Out-of-scope for v0.5.32 (deferred to v0.6)

Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.

Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
2026-05-05 15:36:24 +01:00
veilor-org
7060d9aa6b docs: refine strategy — ostreecontainer install + mesh stack + browser stack
Refines docs/STRATEGY.md per parent-operator handoff (2026-05-05).
Locks in five things the original draft didn't cover, and corrects
one mistake.

## Refinement: ostreecontainer install path

The original draft proposed a two-step install: Anaconda partitions
+ kickstart, then on first boot a `veilor-firstboot-rebase.service`
runs `bootc rebase ghcr.io/veilor/veilor-os:43`. This commit drops
that step.

Anaconda's `ostreecontainer --url=... --transport=registry`
directive populates the root filesystem directly from the OCI image
during the install pass. No first-boot rebase, no transition
window, no second reboot. Same end state, simpler path.

Stay on `ostreecontainer` through v0.8. Do NOT migrate to the new
`bootc` kickstart command until v1.0 — it blocks multi-disk and
authenticated registries. Do NOT use `bootc-image-builder
anaconda-iso` output — deprecated in image-builder v44+. Produce
the OCI image and the bootstrap ISO as separate artifacts.

This compresses the v0.7 BlueBuild spike from 2 days → 1 day.

## Correction: keep Trivalent as default

The original strategy.md treated Trivalent (secureblue's hardened
Chromium) as an override-and-remove. That was wrong: Trivalent's
COPR tracks upstream M147+ within hours, ships hardened_malloc +
JIT-less + Drumbrake WASM. Default browser pick.

Mullvad Browser layered alongside for anti-fingerprint. Thorium
remains opt-in via `ujust install-thorium` only — its CVE lag is
months and contradicts the threat model. Never default.

## Mesh stack baked in

Three-layer warm-stack documented in STRATEGY.md:
- L3a Tailscale + Headscale (Day 1, daily driver)
- L3b Yggdrasil-go (Day 1, idle warm-fallback, AllowedPublicKeys mode)
- L3c Reticulum/RetiNet AGPL fork (opt-in via ujust install-reticulum)

Threat floor table: ISP-DNS-block (i, Day 1), ISP-Tailscale-block
(ii, Phase 2 promote Yggdrasil), internet-down (iii, opt-in RetiNet
+ RNode).

Tier model: tag:admin / tag:infra / tag:guest with failsafe pre-auth
key on yubikey + paper + Authentik OIDC group.

## Onboarding

Token paste / QR (user picks). Misskey signup mints reusable
24h-TTL pre-auth key. NOT auto-OIDC at first boot.

## Iroh seeding daemon stub (v0.8 / Phase 2)

`veilor-seed.service` documented but NOT implemented until Iroh hits
1.0 (current 0.96–0.98 RC, Q1 2026 target slipped). BLAKE3 +
iroh-gossip per-service topic. Static media only — DEFER DB
replication forever.

## External dependency tracked

nullstone Traefik `no-guest@file` ACL is currently 0.0.0.0/0
allow-all (XFF chain breakage 2026-05-03). Must be fixed before
veilor-os first-public-ISO ships, otherwise tag:guest provisioning
leaks the full vhost surface to every veilor user. Parent operator
owns the fix; explicitly out of veilor-os scope.

## Files

- docs/STRATEGY.md — full refinement
- docs/ROADMAP.md — v0.7 spike entry now reflects ostreecontainer
  + mesh stack + 1-day spike target
- README.md — drops the "v0.2.5 pre-release" badge + status box
  (out of date), adds bootc/atomic trajectory paragraph

## What did NOT change

- v0.5.x main branch is untouched. The ostreecontainer swap belongs
  in the v0.7 spike branch, NOT v0.5.32.
- nullstone Traefik config is untouched. Out of scope.
- The kickstart and overlay code is untouched.
2026-05-05 15:15:52 +01:00
veilor-org
50a241a603 docs: STRATEGY.md — hybrid kickstart bootstrap + bootc OCI on secureblue
Locks in the strategic decision from 2026-05-05 secureblue research
agent: pivot the technical base toward bootc/OCI, but as a layer over
secureblue's `securecore-kinoite-hardened-userns` rather than a
Containerfile-from-scratch.

## What changed

- New: `docs/STRATEGY.md` — full hybrid plan (kickstart bootstrap →
  first-boot bootc rebase → bootc-only at v1.0). Documents secureblue
  rationale, our overrides (drop Trivalent, restore sudo + Xwayland),
  next concrete steps for v0.7 spike (BlueBuild recipe + GH Actions
  workflow + `veilor-firstboot-rebase` one-shot).

- Updated: `docs/ROADMAP.md` v0.7 bootc-spike subsection — supersedes
  the Agent 3 Containerfile-from-scratch plan with the BlueBuild
  layering plan. Spike compresses 1 week → 2 days; hardening review
  inherited from 30 secureblue contributors.

## Why hybrid, not pure pivot

- Anaconda's LUKS UX (single passphrase prompt + custom
  partitioning) is mature; bootc-image-builder's installer is not yet
  on par. Keep the kickstart as the bootstrap.
- bootc upgrade gets us atomic A/B + signed image chain + instant
  rollback that we can't realistically build alone with our
  contributor count.
- The kickstart work is not lost — it becomes the day-zero installer
  through v0.7. v1.0 deprecates it entirely once bootc-image-builder
  installer ISO matures.

## Why secureblue, not Athena (Arch)

| Axis | secureblue | Athena OS |
|---|---|---|
| Maintainers | 30 | 8 |
| MAC enforcing OOB | SELinux + custom policy | AppArmor active, profiles mostly unconfined |
| Atomic / immutable updates | Yes (bootc/rpm-ostree) | No (rolling) |
| Threat model published | No | Yes |
| MS-signed Secure Boot shim | Yes (Fedora shim) | Yes (with auto-MOK) |

Athena's only structural advantage is the published threat model.
We're already drafting one (Agent 5 of 2026-05-05 wave) — we get
that win regardless. secureblue's contributor count + atomic update
infrastructure is the leverage.

## Strategic credibility win

Publishing `docs/THREAT-MODEL.md` BEFORE the v0.7 launch positions
veilor-os ahead of secureblue (no threat model) and Athena (has
threat model but smaller contributor base) on the one axis that
matters most.

## Open questions documented in STRATEGY.md

- secureblue contribution acceptance for upstream patches (USBGuard
  id-based-rules fix, threat model framework)
- Brave vs Mullvad-Browser pick for default browser
- bootc rebase first-boot fallback if rebase fails
- Fedora 44 transition timing follows secureblue's release tags
2026-05-05 15:05:59 +01:00
veilor-org
4e9782a18a docs: 9-agent research wave findings — v0.5.32 blocker map
Logs the full output of the 9-agent deep-dive run on 2026-05-05 to
docs/research/2026-05-05-agent-wave/. Pulls every actionable finding
into one indexed location so v0.5.32 planning has a paper trail.

Files:
  docs/research/2026-05-05-agent-wave/README.md             — index
  docs/research/2026-05-05-agent-wave/01-...real-hardware.md — Plymouth + LUKS edge cases
  docs/research/2026-05-05-agent-wave/02-...firstboot-ux.md  — SDDM + first-boot UX
  docs/research/2026-05-05-agent-wave/03-...spike-plan.md    — bootc-image-builder 1-week spike
  docs/research/2026-05-05-agent-wave/04-...tier-2.md         — AppArmor + nftables + audit + homed
  docs/research/2026-05-05-agent-wave/05-...launch.md         — threat model + v0.7 launch checklist
  docs/research/2026-05-05-agent-wave/06-...log-capture.md    — virtio-9p host-share for anaconda logs
  docs/research/2026-05-05-agent-wave/07-...skel-branding.md  — /etc/skel gap audit
  docs/research/2026-05-05-agent-wave/08-...ci-hardening.md   — SHA-pin actions + SBOM + SLSA L3
  docs/research/2026-05-05-agent-wave/09-...failure-modes.md  — real-hardware pessimistic audit

Plus the prior linter-applied:
  docs/ROADMAP.md      — Lessons learned section, v0.5.32 active block,
                          v0.6 promotion of veilor-postinstall + veilor-doctor,
                          v0.7 bootc spike scheduled
  docs/THREAT-MODEL.md  — drafted by Agent 5; in/out scope, comparison
                          matrix, v0.7 launch checklist

Top blockers identified for v0.5.32 (cross-cited in README):
  1. Suspend/resume wifi death (kernel.modules_disabled=1)
  2. veilor-firstboot.service WantedBy=graphical.target
  3. kernel-upgrade grub drift
  4. USBGuard hash-rules problem (already learned on onyx)
  5. firewalld blocks tailscale0
  6. /etc/skel/ empty
  7. virtio-9p log capture replaces broken virtio-serial path

Wave + verifier pattern (per ROADMAP lessons learned #4) validated:
9 parallel agents on distinct topics produced converging blocker
list. The same pattern landed v0.5.31 four-bug fix from the prior
4-agent verification wave on v0.5.30 outcome.
2026-05-05 14:52:53 +01:00
veilor-org
2788b95a12 v0.5.31: kernel-install via /etc/kernel/cmdline + set-e leak + rescue glob
Four-bug fix from 4-agent verification wave on v0.5.30 outcome.

Bug 1 CRITICAL: --location=none made anaconda skip CollectKernelArgumentsTask
(installation.py:149-151). --append= args never collected, BLS entries
wrote with empty cmdline. Drop --location=none, let anaconda do its
bootloader path; broad transaction_progress patch already silences the
gen_grub_cfgstub class failure.

Bug 2 CRITICAL: kernel-install reads /etc/kernel/cmdline as source of
truth (per 90-loaderentry.install:84-95). Veilor never wrote that file
so kernel-install fell through to /proc/cmdline (live ISO's). Add
3-path write: /etc/kernel/cmdline (Path A canonical), /etc/default/grub
(Path B legacy), grubby --update-kernel=ALL (Path C last-writer guard).
Plus explicit kernel-install add per kernel after Path A write.

Bug 3: rescue BLS glob *-0-rescue-*.conf required trailing hyphen;
F43 uses *-0-rescue.conf. Fix: *-0-rescue*.conf (matches both).

Bug 4: set +e/set -e scope leak in %post. v0.5.30 closed manual
bootloader block with set -e which re-enabled errexit for the rest of
%post that was authored with set +e semantics. Result: any
non-guarded command failure aborted the LUKS args injection block.
Fix: remove the closing set -e.

Files: overlay/usr/local/bin/veilor-installer.
Verified: bash -n clean, ksvalidator clean.
2026-05-05 13:47:23 +01:00
veilor-org
e83483a077 v0.5.30: broad error suppression + manual bootloader + virtio log capture
Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).

## Layer 1: broad error suppression in transaction_progress.py

dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.

v0.5.30 patch turns the `elif token == 'error':` branch in
process_transaction_progress into a log.warning. All four producers
(cpio_error, script_error, unpack_error, generic error) now flow
through to a warning + continue. Pattern matches both the original
anaconda layout AND the v0.5.29 narrow-patched layout, so re-applying
on top of either is a no-op.

This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.

## Layer 2: bootloader install moved out of anaconda

The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:

  1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64
     efibootmgr` — re-runs scriptlets in the chroot with full
     PID 1 systemd state, so the systemd-run-style triggers that
     anaconda's chroot truncates actually execute.
  2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi
     --bootloader-id=fedora --no-nvram` — populates /boot/efi/EFI/fedora/
  3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or
     `grub2-mkconfig` fallback) — writes /boot/efi/EFI/fedora/grub.cfg.
  4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`
     — registers the NVRAM boot entry pointing at the signed shim.

Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.

## Layer 3: virtio-serial log capture in run-vm.sh

Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.

We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.

## Diagnostic story

Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.

After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.

Files changed:
  kickstart/veilor-os.ks        — broad error suppression patch
  overlay/usr/local/bin/veilor-installer — --location=none + manual grub
  test/run-vm.sh                — virtio-serial chardev wiring

Verified: bash -n clean, ksvalidator clean.
2026-05-05 11:59:35 +01:00
veilor-org
613d35402e v0.5.29: narrow anaconda patch + LUKS UX + initramfs assertion
Five-fix bundle from 7-agent research wave on the v0.5.28-final
gen_grub_cfgstub failure.

## 1. Narrow the anaconda transaction_progress patch (CRITICAL)

The v0.5.28 patch was too broad. It rewrote
`process_transaction_progress` so every 'error' token in the
transaction queue became a `log.warning`. That queue carries four
distinct error classes:

  - cpio_error      — payload extraction (genuinely fatal)
  - script_error    — RPM 6.0 cmdline-mode scriptlet warning-as-error
                      (the ONE we want to ignore)
  - unpack_error    — payload corruption (genuinely fatal)
  - error           — generic transaction error (genuinely fatal)

By swallowing all four we silently masked grub2-efi-x64's posttrans
failure mid-install. /boot/efi/EFI/fedora/ ended up incomplete →
gen_grub_cfgstub then failed at the bootloader install phase with
"gen_grub_cfgstub script failed" because its `set -eu` script
couldn't read the missing files.

v0.5.29 narrows the patch: override only the `script_error` callback
inside transaction_progress.py to log a warning and NOT enqueue
'error'. The consumer (`process_transaction_progress`) reverts to
upstream behaviour where cpio_error / unpack_error / error still
raise PayloadInstallationError. Real install-fatal events keep
aborting; only the F43-RPM-6.0 scriptlet regression is silenced.

The patch is applied via `python3 -c` regex rewrite (more robust
than nested sed across multi-line method bodies).

## 2. LUKS UX — `tries=5,timeout=0` (FIX)

Default cryptsetup-generator unit allows ONE passphrase try with a
1m30s wait. One typo on a long passphrase = wait 1m30s, then the
device-wait timer trips, then dracut emergency shell after 3min total.
Brutal. Adding `rd.luks.options=luks-XXX=tries=5,timeout=0` gives
five typo-friendly retries with no auto-timeout.

## 3. fbcon=nodefer on installed-system cmdline (FIX)

Live ISO cmdline already has `fbcon=nodefer` (added in v0.5.27 to fix
the real-laptop black-screen-after-dracut). The installed-system
bootloader directive in the generated install ks did NOT carry it.
Same KMS handoff happens on the installed system on the same hardware.
Now both have the flag.

## 4. /etc/crypttab fallback assertion (BELT-BRACES)

Anaconda's custom-partitioning code path normally writes /etc/crypttab
for `--encrypted` part directives. Edge cases observed in F43+ where
it doesn't. Without crypttab, systemd-cryptsetup-generator can still
work from kernel cmdline alone, but cleanup paths and second-stage
unlock both fall over. Adding a fallback `echo` that writes the
canonical line if it's missing post-anaconda.

## 5. Initramfs LUKS module assertion (DEFENSIVE)

Force-include `crypt + systemd-cryptsetup + plymouth` modules in
initramfs via /etc/dracut.conf.d/10-veilor-luks.conf. dracut autodetects
these when it sees an active LUKS mapping, but %post runs before the
LUKS state is fully observable from the chroot. Plus we wipe stale
initramfs (`rm -f /boot/initramfs-*.img`) before `--regenerate-all`
so the regen actually rewrites bytes. Final assertion runs
`lsinitrd | grep -q cryptsetup` and surfaces a [ERR] line in build
output if the module didn't make it.

## What this should fix

After the man-db fix in v0.5.28-final, install proceeded past
"Configuring xxx" cleanly but died at "Installing boot loader" with
gen_grub_cfgstub. Root-cause was the over-broad patch from #1 above.

After v0.5.29:
  - Install transaction completes (man-db excluded; non-man-db
    scriptlet warnings still suppressed; real errors still raise)
  - gen_grub_cfgstub runs against complete /boot/efi/EFI/fedora/
  - Bootloader install completes
  - Reboot to disk lands at GRUB veilor-os entry
  - Kernel + initramfs load (cryptsetup confirmed present)
  - Plymouth LUKS prompt appears with text fallback
  - User has 5 tries, no timeout
  - Unlock → btrfs subvol mount → systemd → SDDM

Files: kickstart/veilor-os.ks (+45 lines), overlay/usr/local/bin/veilor-installer (+50 lines).
Verified: bash -n clean, ksvalidator clean.

References:
  pyanaconda transaction_progress.py:110-136 (4 producers of 'error')
  pyanaconda bootloader/efi.py:194-201 (gen_grub_cfgstub call site)
  /usr/bin/gen_grub_cfgstub (set -eu wrapper for grub2-mkconfig stub)
  Fedora wiki Changes/RPM-6.0
  dnf5 issue #2507 (RPM 6.0 scriptlet propagation regression)
2026-05-05 05:12:24 +01:00
veilor-org
fae677fb68 v0.5.28 (final): patch anaconda transaction_progress.py + exclude man-db
THE actual root cause of the man-db transaction failure that killed
three consecutive VM installs (v0.5.26 / v0.5.27 / v0.5.28).
Confirmed via 7-agent research wave:

- Fedora 43 ships RPM 6.0, which changed scriptlet failure
  propagation. Scriptlets that previously emitted "Non-critical
  error" warnings now bubble up as transaction-level errors. dnf5
  issue #2507 documents the change. Anaconda --cmdline mode treats
  any 'error' token from the dnf transaction as a fatal abort.
- man-db's `transfiletriggerin` is the canonical trigger: it runs
  `systemd-run /usr/bin/systemctl start man-db-cache-update` which
  returns non-zero in the anaconda chroot (no PID 1 systemd) and is
  flagged as transaction-level error under RPM 6.0.
- We previously patched anaconda's transaction_progress.py on the
  BUILD HOST so livecd-creator could finish its own transaction.
  That patch lives only on the host running the build — never landed
  in the live rootfs the user installs from. Reproduced 3 times:
  install-time anaconda on the live ISO is unpatched, hits the same
  code path, aborts at exactly "Configuring man-db.x86_64".

Two-layer fix:

1. kickstart %post seds the file inside the live rootfs at build time
   so the user's install-time anaconda is patched. Sed downgrades the
   'error' token from raise PayloadInstallationError to log.warning.

2. Generated install ks excludes man-db / man-pages / man-pages-overrides
   from %packages. Belt-and-braces — even if the patch has an edge
   case the trigger never fires because the package isn't installed.
   Users install man pages post-firstboot.

Previous attempts that didn't work: dropping the updates repo (only
narrowed the set of failing scriptlets, didn't fix the underlying
RPM-6.0 propagation change); flipping SELinux to permissive
(confirmed not the cause; kickstart's selinux directive only writes
/etc/selinux/config in target root, doesn't affect installer-time).

Follow-up for next release: replicate the transaction_progress patch
in the CI workflow's container so the build itself is deterministic.
Currently the workflow has been greening on luck.

Files: kickstart/veilor-os.ks (+25 lines), overlay/usr/local/bin/veilor-installer (+10 lines).
Verified: bash -n clean, ksvalidator clean.
2026-05-05 03:46:00 +01:00
veilor-org
931a19ec93 v0.5.28 (cont): drop updates repo from generated install ks
Anaconda's transaction died at "Configuring man-db.x86_64" in both
v0.5.26 and v0.5.27 VM tests, reliably, days apart, against a freshly
populated package cache. Same failure pattern, same package, with
nothing in the visible error other than "The transaction process has
ended with errors..". Pattern matches the same Fedora `updates` repo
issue that the CI build kickstart already worked around by stripping
the `updates` line entirely (`.github/workflows/build-iso.yml` line
~109).

The installer-generated kickstart was adding the line back and
re-introducing the bug for every user install. This commit aligns
the install-time ks with the build-time ks: only the base `releases`
repo is consumed by anaconda. Users who want updates run `dnf
upgrade` post-install (or the v0.6 `veilor-update` wrapper).

Trade-off: first-boot package versions are frozen to the Fedora 43
release date instead of including post-release updates. Acceptable —
the alternative is "install reliably fails" which makes any
freshness conversation moot.

Verified locally: `bash -n` passes, ks template still well-formed.
End-to-end re-validation goes through the next CI ISO + VM test run.
2026-05-05 02:52:22 +01:00
veilor-org
e848c7ffc3 v0.5.28 (partial): lock locale to en_US, roadmap post-install menu
Install-flow change + roadmap update. The roadmap entry is the
durable record; the code change is the immediate effect.

## Locale picker removed

The "[4/4] Locale" prompt is gone. Locale is hardcoded to en_US.UTF-8
for the install. Two reasons:

1. The picker only offered en_GB and en_US, both of which install
   identically apart from the langtag string and a couple of date /
   currency conventions that nobody who's mid-install is thinking
   about. It's a fake choice that adds a screen.
2. `localectl set-locale` post-install handles every locale on earth
   in one command. The v0.7 `veilor-postinstall` first-login menu (see
   roadmap below) will offer a locale + keyboard layout switch with
   live preview, which is the right place for that decision.

Step counters updated [1/4]→[1/3], [2/4]→[2/3], [3/4]→[3/3]. The Locale
row stays in the confirm-summary box because users still want to see
what they're getting installed.

## Roadmap

- New section v0.5.27–v0.5.28 — documents the install-path
  stabilisation work explicitly so the bridge between "first green
  ISO" and "looks polished" is not invisible. Calls out the LUKS BLS
  fix that landed in v0.5.27 + the gum-input replacement scheduled
  for v0.5.28.
- v0.6 — `veilor-doctor` description expanded: this is the
  post-install audit tool. Every user runs it weekly to see drift
  from baseline.
- v0.6 — new entry `veilor-postinstall`: EndeavourOS-style first-login
  welcome menu, single TUI screen, asks once. Covers the "I just
  installed, what do I configure" gap in one explicit step instead of
  scattered docs.
2026-05-05 02:48:36 +01:00
veilor-org
1881c14ea7 v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor
Critical install bug fix + cosmetic round-up + first formal test
procedure document.

## Critical: LUKS unlock on first boot

Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.

The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
  (kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
  line of every existing BLS entry so the kernel that boots NEXT
  actually has the args.

Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.

## GRUB / branding

- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
  already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
  (<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
  titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
  rescue entry was leaking "Fedora Linux" into the GRUB menu and
  showing a second boot option that nobody asked for. The rescue
  kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
  name anaconda writes when the kickstart's network directive is
  ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
  `hostnamectl status` and any consumer reading machine-info see the
  brand.

## Boot UX

- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
  laptops with a hardware GPU, the kernel modeset blanks the
  framebuffer console mid-boot; without `nodefer` the installer
  banner draws into a frozen framebuffer and the user sees a black
  screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
  trigger this so it never reproduced in VM. Symptom report on
  v0.5.26 was the trigger to investigate.

## Installer cosmetics

- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
  `> `. The unicode arrow falls back to a fixed-width block on the
  linux fbcon font and lipgloss then duplicates that block at col +23,
  producing the "Install Install" double-render and the stray-T
  artifact in password fields. Plain ASCII renders identically across
  fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
  installer banner reads this at runtime, so the live ISO + installed
  system both now show "veilor-os 0.5.27".

## Test procedure

- `test/TESTING.md` — first canonical test procedure document. Splits
  VM (cheap iteration, hybrid sendkey + human passwords) from real
  hardware (mandatory for tag). Documents the standard test passwords
  (`veilortest1` for both LUKS and admin), the kill-and-relaunch step
  to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
  the procedure. Future releases that alter the test method must add
  an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
  tagged release should land a filled report alongside it.

## test/run-vm.sh

Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.
2026-05-05 01:43:00 +01:00
veilor-org
c89c73ee84 v0.5.26: log to /run, tolerate tee failure
User hit `/usr/local/bin/veilor-installer: line 33: /usr/bin/tee:
input/output error` on real-hardware install. Cause: LOG was
`/var/log/veilor-installer.log`, which on the live ISO is backed by an
overlay over squashfs. A bad sector / flaky USB → tee write fails →
process substitution dies → installer aborts before the menu renders.

Two changes:

1. Move LOG to /run/veilor-installer.log — pure tmpfs, never touches
   the live medium. Same path also unaffected by /var fill or overlay
   weirdness.

2. Wrap the `exec > >(tee -a $LOG) 2>&1` redirect in a writability
   probe. If the log can't be appended to (tmpfs OOM, fd exhaustion,
   anything), skip the tee and run the installer without on-disk
   persistence rather than crashing.

Persistence is a nice-to-have for post-mortem debugging; the installer
running is the must-have. This inverts the priority correctly.
2026-05-04 14:20:26 +01:00
veilor-org
b3509b4b06 v0.5.25: don't run veilor-firstboot on live ISO
Live ISO boot chain showing extra step:
  boot → text scroll → veilor-firstboot prompts admin pw → installer

veilor-firstboot.service was enabled in live ks but it's an INSTALLED
system feature (forces admin pw set on first real boot). Made no
sense to ask on live (no persistent admin user, throwaway VM, etc).

Live ks now: doesn't enable veilor-firstboot, masks the unit so
overlay-copied unit file can't auto-activate. Install ks chroot %post
already enables it (correct path).

After fix:
  boot → text scroll → installer banner directly
2026-05-04 04:08:40 +01:00