A3 inline (agent failed on API). Three CLIs ported / written for the
v0.7+ atomic system:
veilor-update — rewritten on bootc upgrade (was dnf upgrade --refresh).
Pre-checks bootc status, pauses auditd while staging, prints summary
and offers reboot. Returns 0/1/2/3 per legacy contract.
veilor-postinstall (NEW) — first-login TUI run via
veilor-postinstall.service oneshot. Asks once for keyboard, locale,
hostname, GPU drivers, package presets (dev/media/homelab),
bluetooth, USBGuard snapshot, then invokes veilor-doctor. Writes
/var/lib/veilor/postinstall-complete and self-disables on success.
veilor-doctor — Updates section rewritten to parse `bootc status
--json` (with jq) when available, falls back to dnf history /
check-update for legacy v0.5.x kickstart-installed systems.
Plus systemd units:
- veilor-postinstall.service (oneshot on graphical.target, gated on
absence of done-marker, runs on tty1)
- veilor-doctor.service + .timer (weekly drift check)
In v0.5 the "Remove the install media" reminder was a single line
inside the green success box, and operators on both onyx and the
friend's RTX 4080 rig missed it — rebooted into the live ISO and
re-ran the installer thinking the install had silently failed.
Promote the reminder to its own loud yellow thick-bordered gum-style
box stacked directly below the success/countdown box, with three
lines of explanation. Renders for the full 10s of the countdown so
it stays in the operator's face the entire window.
v0.5 used `sleep 5` after a static "System will reboot in 5 seconds."
box, which left the operator guessing how much time was left to grab
the USB stick. The new loop runs 10 → 1, clearing + redrawing the
gum-style success box each tick with the remaining-seconds figure,
giving the operator a visible window to act.
10 seconds (vs 5) because real hardware operators were missing the
window — laptops with the USB on the far side of the dock take
4-5 seconds to physically reach. 10 is comfortable, not annoying.
A typo in the LUKS passphrase is unrecoverable — the disk is
unmountable without it and we don't escrow the key. Re-prompting
until the two reads match catches keyboard-layout surprises (the
US/UK quote-key position is the most common one) before they brick
the install.
Admin password gets the same treatment for consistency. Less
catastrophic (resettable from a recovery shell) but a mismatch
still locks the user out of their fresh install on first boot.
Loop bails on cancel/ESC and re-prompts on validate_pw failure.
Read banner.txt line by line with a 40ms sleep between each, then
clear and redraw the bordered gum-style version. 5-line banner ×
40ms = 200ms total reveal — slow enough to land an aesthetic on the
first frame, fast enough that the operator never feels it as lag.
Pure cosmetic; no functional change to the install flow.
`gum input --password` corrupts the linux fbcon since v0.5.27 — the
bubbletea screen-restore writes back the previous menu buffer because
the framebuffer terminfo entry lacks `civis/cnorm` cursor-hide
sequences, leaving a duplicate "Install" plus a stray "T" rendered on
top of the password field. The fix is a single termios echo-off via
`read -srp`: no redraw, no glitch, no dependency on gum's TUI layer
for the one screen where it broke.
Header still rendered through `gum style` so visual parity with the
disk picker / confirm box is preserved. Whiptail fallback path
unchanged (passwordbox there has always rendered cleanly).
Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.
## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)
`veilor-modules-lock.service` runs `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, resume breaks wifi until
reboot. Same architectural class as the LUKS bug — security feature
breaks legitimate kernel state transitions.
The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.
Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.
## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)
Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).
Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.
## 3. USBGuard hash → id-based baseline (Agent 9, A3)
Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.
Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor renumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).
## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)
User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.
Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.
Default zone stays drop. Only the tailscale0 interface gets ACCEPT.
## 5. /etc/skel branding (Agent 7)
Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) pre-empty, so the moment user opened System Settings, KDE wrote
fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.
Seeds:
/etc/skel/.config/kdeglobals (copy of assets/kde/veilor-default.kdeglobals)
/etc/skel/.config/breezerc (copy of assets/kde/breezerc)
/etc/skel/.config/kwinrc (Plasma 6 wayland defaults: opengl, animspeed=0,
blur off, click-to-focus)
/etc/skel/.config/konsolerc (default profile = Veilor)
/etc/skel/.local/share/konsole/Veilor.profile + .colorscheme
User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.
## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)
Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).
Adds to bootloader cmdline (live + installed):
i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
rd.vconsole.keymap=us
`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).
## 7. virtio-9p log capture (Agent 6)
The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.
test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.
No-op on real hardware (9p tag absent) — VM-only debug.
## Validate
bash -n overlay/usr/local/bin/veilor-installer # OK
ksvalidator kickstart/veilor-os.ks # clean
## Out-of-scope for v0.5.32 (deferred to v0.6)
Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.
Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
Four-bug fix from 4-agent verification wave on v0.5.30 outcome.
Bug 1 CRITICAL: --location=none made anaconda skip CollectKernelArgumentsTask
(installation.py:149-151). --append= args never collected, BLS entries
wrote with empty cmdline. Drop --location=none, let anaconda do its
bootloader path; broad transaction_progress patch already silences the
gen_grub_cfgstub class failure.
Bug 2 CRITICAL: kernel-install reads /etc/kernel/cmdline as source of
truth (per 90-loaderentry.install:84-95). Veilor never wrote that file
so kernel-install fell through to /proc/cmdline (live ISO's). Add
3-path write: /etc/kernel/cmdline (Path A canonical), /etc/default/grub
(Path B legacy), grubby --update-kernel=ALL (Path C last-writer guard).
Plus explicit kernel-install add per kernel after Path A write.
Bug 3: rescue BLS glob *-0-rescue-*.conf required trailing hyphen;
F43 uses *-0-rescue.conf. Fix: *-0-rescue*.conf (matches both).
Bug 4: set +e/set -e scope leak in %post. v0.5.30 closed manual
bootloader block with set -e which re-enabled errexit for the rest of
%post that was authored with set +e semantics. Result: any
non-guarded command failure aborted the LUKS args injection block.
Fix: remove the closing set -e.
Files: overlay/usr/local/bin/veilor-installer.
Verified: bash -n clean, ksvalidator clean.
Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).
## Layer 1: broad error suppression in transaction_progress.py
dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.
v0.5.30 patch turns the `elif token == 'error':` branch in
process_transaction_progress into a log.warning. All four producers
(cpio_error, script_error, unpack_error, generic error) now flow
through to a warning + continue. Pattern matches both the original
anaconda layout AND the v0.5.29 narrow-patched layout, so re-applying
on top of either is a no-op.
This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.
## Layer 2: bootloader install moved out of anaconda
The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:
1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64
efibootmgr` — re-runs scriptlets in the chroot with full
PID 1 systemd state, so the systemd-run-style triggers that
anaconda's chroot truncates actually execute.
2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi
--bootloader-id=fedora --no-nvram` — populates /boot/efi/EFI/fedora/
3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or
`grub2-mkconfig` fallback) — writes /boot/efi/EFI/fedora/grub.cfg.
4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`
— registers the NVRAM boot entry pointing at the signed shim.
Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.
## Layer 3: virtio-serial log capture in run-vm.sh
Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.
We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.
## Diagnostic story
Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.
After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.
Files changed:
kickstart/veilor-os.ks — broad error suppression patch
overlay/usr/local/bin/veilor-installer — --location=none + manual grub
test/run-vm.sh — virtio-serial chardev wiring
Verified: bash -n clean, ksvalidator clean.
Five-fix bundle from 7-agent research wave on the v0.5.28-final
gen_grub_cfgstub failure.
## 1. Narrow the anaconda transaction_progress patch (CRITICAL)
The v0.5.28 patch was too broad. It rewrote
`process_transaction_progress` so every 'error' token in the
transaction queue became a `log.warning`. That queue carries four
distinct error classes:
- cpio_error — payload extraction (genuinely fatal)
- script_error — RPM 6.0 cmdline-mode scriptlet warning-as-error
(the ONE we want to ignore)
- unpack_error — payload corruption (genuinely fatal)
- error — generic transaction error (genuinely fatal)
By swallowing all four we silently masked grub2-efi-x64's posttrans
failure mid-install. /boot/efi/EFI/fedora/ ended up incomplete →
gen_grub_cfgstub then failed at the bootloader install phase with
"gen_grub_cfgstub script failed" because its `set -eu` script
couldn't read the missing files.
v0.5.29 narrows the patch: override only the `script_error` callback
inside transaction_progress.py to log a warning and NOT enqueue
'error'. The consumer (`process_transaction_progress`) reverts to
upstream behaviour where cpio_error / unpack_error / error still
raise PayloadInstallationError. Real install-fatal events keep
aborting; only the F43-RPM-6.0 scriptlet regression is silenced.
The patch is applied via `python3 -c` regex rewrite (more robust
than nested sed across multi-line method bodies).
## 2. LUKS UX — `tries=5,timeout=0` (FIX)
Default cryptsetup-generator unit allows ONE passphrase try with a
1m30s wait. One typo on a long passphrase = wait 1m30s, then the
device-wait timer trips, then dracut emergency shell after 3min total.
Brutal. Adding `rd.luks.options=luks-XXX=tries=5,timeout=0` gives
five typo-friendly retries with no auto-timeout.
## 3. fbcon=nodefer on installed-system cmdline (FIX)
Live ISO cmdline already has `fbcon=nodefer` (added in v0.5.27 to fix
the real-laptop black-screen-after-dracut). The installed-system
bootloader directive in the generated install ks did NOT carry it.
Same KMS handoff happens on the installed system on the same hardware.
Now both have the flag.
## 4. /etc/crypttab fallback assertion (BELT-BRACES)
Anaconda's custom-partitioning code path normally writes /etc/crypttab
for `--encrypted` part directives. Edge cases observed in F43+ where
it doesn't. Without crypttab, systemd-cryptsetup-generator can still
work from kernel cmdline alone, but cleanup paths and second-stage
unlock both fall over. Adding a fallback `echo` that writes the
canonical line if it's missing post-anaconda.
## 5. Initramfs LUKS module assertion (DEFENSIVE)
Force-include `crypt + systemd-cryptsetup + plymouth` modules in
initramfs via /etc/dracut.conf.d/10-veilor-luks.conf. dracut autodetects
these when it sees an active LUKS mapping, but %post runs before the
LUKS state is fully observable from the chroot. Plus we wipe stale
initramfs (`rm -f /boot/initramfs-*.img`) before `--regenerate-all`
so the regen actually rewrites bytes. Final assertion runs
`lsinitrd | grep -q cryptsetup` and surfaces a [ERR] line in build
output if the module didn't make it.
## What this should fix
After the man-db fix in v0.5.28-final, install proceeded past
"Configuring xxx" cleanly but died at "Installing boot loader" with
gen_grub_cfgstub. Root-cause was the over-broad patch from #1 above.
After v0.5.29:
- Install transaction completes (man-db excluded; non-man-db
scriptlet warnings still suppressed; real errors still raise)
- gen_grub_cfgstub runs against complete /boot/efi/EFI/fedora/
- Bootloader install completes
- Reboot to disk lands at GRUB veilor-os entry
- Kernel + initramfs load (cryptsetup confirmed present)
- Plymouth LUKS prompt appears with text fallback
- User has 5 tries, no timeout
- Unlock → btrfs subvol mount → systemd → SDDM
Files: kickstart/veilor-os.ks (+45 lines), overlay/usr/local/bin/veilor-installer (+50 lines).
Verified: bash -n clean, ksvalidator clean.
References:
pyanaconda transaction_progress.py:110-136 (4 producers of 'error')
pyanaconda bootloader/efi.py:194-201 (gen_grub_cfgstub call site)
/usr/bin/gen_grub_cfgstub (set -eu wrapper for grub2-mkconfig stub)
Fedora wiki Changes/RPM-6.0
dnf5 issue #2507 (RPM 6.0 scriptlet propagation regression)
THE actual root cause of the man-db transaction failure that killed
three consecutive VM installs (v0.5.26 / v0.5.27 / v0.5.28).
Confirmed via 7-agent research wave:
- Fedora 43 ships RPM 6.0, which changed scriptlet failure
propagation. Scriptlets that previously emitted "Non-critical
error" warnings now bubble up as transaction-level errors. dnf5
issue #2507 documents the change. Anaconda --cmdline mode treats
any 'error' token from the dnf transaction as a fatal abort.
- man-db's `transfiletriggerin` is the canonical trigger: it runs
`systemd-run /usr/bin/systemctl start man-db-cache-update` which
returns non-zero in the anaconda chroot (no PID 1 systemd) and is
flagged as transaction-level error under RPM 6.0.
- We previously patched anaconda's transaction_progress.py on the
BUILD HOST so livecd-creator could finish its own transaction.
That patch lives only on the host running the build — never landed
in the live rootfs the user installs from. Reproduced 3 times:
install-time anaconda on the live ISO is unpatched, hits the same
code path, aborts at exactly "Configuring man-db.x86_64".
Two-layer fix:
1. kickstart %post seds the file inside the live rootfs at build time
so the user's install-time anaconda is patched. Sed downgrades the
'error' token from raise PayloadInstallationError to log.warning.
2. Generated install ks excludes man-db / man-pages / man-pages-overrides
from %packages. Belt-and-braces — even if the patch has an edge
case the trigger never fires because the package isn't installed.
Users install man pages post-firstboot.
Previous attempts that didn't work: dropping the updates repo (only
narrowed the set of failing scriptlets, didn't fix the underlying
RPM-6.0 propagation change); flipping SELinux to permissive
(confirmed not the cause; kickstart's selinux directive only writes
/etc/selinux/config in target root, doesn't affect installer-time).
Follow-up for next release: replicate the transaction_progress patch
in the CI workflow's container so the build itself is deterministic.
Currently the workflow has been greening on luck.
Files: kickstart/veilor-os.ks (+25 lines), overlay/usr/local/bin/veilor-installer (+10 lines).
Verified: bash -n clean, ksvalidator clean.
Anaconda's transaction died at "Configuring man-db.x86_64" in both
v0.5.26 and v0.5.27 VM tests, reliably, days apart, against a freshly
populated package cache. Same failure pattern, same package, with
nothing in the visible error other than "The transaction process has
ended with errors..". Pattern matches the same Fedora `updates` repo
issue that the CI build kickstart already worked around by stripping
the `updates` line entirely (`.github/workflows/build-iso.yml` line
~109).
The installer-generated kickstart was adding the line back and
re-introducing the bug for every user install. This commit aligns
the install-time ks with the build-time ks: only the base `releases`
repo is consumed by anaconda. Users who want updates run `dnf
upgrade` post-install (or the v0.6 `veilor-update` wrapper).
Trade-off: first-boot package versions are frozen to the Fedora 43
release date instead of including post-release updates. Acceptable —
the alternative is "install reliably fails" which makes any
freshness conversation moot.
Verified locally: `bash -n` passes, ks template still well-formed.
End-to-end re-validation goes through the next CI ISO + VM test run.
Install-flow change + roadmap update. The roadmap entry is the
durable record; the code change is the immediate effect.
## Locale picker removed
The "[4/4] Locale" prompt is gone. Locale is hardcoded to en_US.UTF-8
for the install. Two reasons:
1. The picker only offered en_GB and en_US, both of which install
identically apart from the langtag string and a couple of date /
currency conventions that nobody who's mid-install is thinking
about. It's a fake choice that adds a screen.
2. `localectl set-locale` post-install handles every locale on earth
in one command. The v0.7 `veilor-postinstall` first-login menu (see
roadmap below) will offer a locale + keyboard layout switch with
live preview, which is the right place for that decision.
Step counters updated [1/4]→[1/3], [2/4]→[2/3], [3/4]→[3/3]. The Locale
row stays in the confirm-summary box because users still want to see
what they're getting installed.
## Roadmap
- New section v0.5.27–v0.5.28 — documents the install-path
stabilisation work explicitly so the bridge between "first green
ISO" and "looks polished" is not invisible. Calls out the LUKS BLS
fix that landed in v0.5.27 + the gum-input replacement scheduled
for v0.5.28.
- v0.6 — `veilor-doctor` description expanded: this is the
post-install audit tool. Every user runs it weekly to see drift
from baseline.
- v0.6 — new entry `veilor-postinstall`: EndeavourOS-style first-login
welcome menu, single TUI screen, asks once. Covers the "I just
installed, what do I configure" gap in one explicit step instead of
scattered docs.
Critical install bug fix + cosmetic round-up + first formal test
procedure document.
## Critical: LUKS unlock on first boot
Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.
The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
(kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
line of every existing BLS entry so the kernel that boots NEXT
actually has the args.
Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.
## GRUB / branding
- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
(<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
rescue entry was leaking "Fedora Linux" into the GRUB menu and
showing a second boot option that nobody asked for. The rescue
kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
name anaconda writes when the kickstart's network directive is
ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
`hostnamectl status` and any consumer reading machine-info see the
brand.
## Boot UX
- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
laptops with a hardware GPU, the kernel modeset blanks the
framebuffer console mid-boot; without `nodefer` the installer
banner draws into a frozen framebuffer and the user sees a black
screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
trigger this so it never reproduced in VM. Symptom report on
v0.5.26 was the trigger to investigate.
## Installer cosmetics
- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
`> `. The unicode arrow falls back to a fixed-width block on the
linux fbcon font and lipgloss then duplicates that block at col +23,
producing the "Install Install" double-render and the stray-T
artifact in password fields. Plain ASCII renders identically across
fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
installer banner reads this at runtime, so the live ISO + installed
system both now show "veilor-os 0.5.27".
## Test procedure
- `test/TESTING.md` — first canonical test procedure document. Splits
VM (cheap iteration, hybrid sendkey + human passwords) from real
hardware (mandatory for tag). Documents the standard test passwords
(`veilortest1` for both LUKS and admin), the kill-and-relaunch step
to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
the procedure. Future releases that alter the test method must add
an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
tagged release should land a filled report alongside it.
## test/run-vm.sh
Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.
User hit `/usr/local/bin/veilor-installer: line 33: /usr/bin/tee:
input/output error` on real-hardware install. Cause: LOG was
`/var/log/veilor-installer.log`, which on the live ISO is backed by an
overlay over squashfs. A bad sector / flaky USB → tee write fails →
process substitution dies → installer aborts before the menu renders.
Two changes:
1. Move LOG to /run/veilor-installer.log — pure tmpfs, never touches
the live medium. Same path also unaffected by /var fill or overlay
weirdness.
2. Wrap the `exec > >(tee -a $LOG) 2>&1` redirect in a writability
probe. If the log can't be appended to (tmpfs OOM, fd exhaustion,
anything), skip the tee and run the installer without on-disk
persistence rather than crashing.
Persistence is a nice-to-have for post-mortem debugging; the installer
running is the must-have. This inverts the priority correctly.
v0.5.22 plymouth details theme works (text scroll boot visible). But
LUKS prompt still never fires — dracut spins on dev-disk-by-uuid for
2+ min then drops to emergency shell.
Hypothesis: anaconda --cmdline mode skips/breaks the bootloader auto-
add of rd.luks.uuid arg. Without it, cryptsetup-generator in
initramfs has no UUID to create unlock unit for → no prompt → no
unlock → dracut times out.
Fix: chroot %post detects LUKS partition via blkid TYPE=crypto_LUKS,
injects `rd.luks.uuid=luks-<uuid>` into GRUB_CMDLINE_LINUX if not
already present. Belt-and-braces — if anaconda DID add it, sed
checks first.
Followed by grub2-mkconfig regen (already in script) so installed
grub.cfg picks up the new cmdline arg.
v0.5.21 set plymouth.enable=0 — plymouth-start.service still ran +
ate LUKS keystrokes. Boot fell to dracut emergency shell.
Better path: plymouth IS running but in TEXT mode via built-in
`details` theme (scrolling boot log, no graphics, no fedora logo).
LUKS prompt renders as text "Please enter passphrase for...:".
Plymouth still owns the prompt → keystrokes go through.
Changes:
- Drop plymouth.enable=0 from cmdline (let plymouth run)
- chroot %post: plymouth-set-default-theme details
- Drop rhgb quiet from GRUB_CMDLINE_LINUX_DEFAULT (all kernel msgs visible)
- dracut --force --regenerate-all (new theme baked into initramfs)
Result: text scroll boot → text LUKS prompt → text scroll → SDDM.
Onyx aesthetic. Branded plymouth theme deferred to v0.6.
User wants onyx-style boot: pure text scroll → LUKS prompt → text scroll
→ SDDM. No fedora splash, no plymouth UI.
Solution: keep plymouth PACKAGE installed (Fedora's dracut module
ships LUKS-prompt machinery via plymouth), but disable plymouthd at
runtime via kernel cmdline `plymouth.enable=0`.
Effect:
- plymouthd starts → reads cmdline → exits
- systemd-ask-password sees no plymouth daemon → falls back to
systemd-tty-ask-password-agent on /dev/console
- LUKS prompt rendered as text "Please enter passphrase for /dev/dm-0: "
- All kernel/systemd messages visible
- SDDM still launches at graphical.target (real install)
Applied to both:
- LIVE ks bootloader --append (so live boot text-mode + installer
visible on tty1, no splash hiding it)
- Generated install ks bootloader --append (so installed system
text-boots with LUKS prompt)
v0.6 will rebrand plymouth theme + re-enable for branded splash. For
v0.5.0 ship: minimal/text aesthetic matches user's onyx daily driver.
v0.5.18 added crypt + systemd-cryptsetup to dracut.conf.d/99-veilor-
no-plymouth.conf. Boot test still failed: dracut-initqueue stuck
waiting on dev-disk-by-uuid → systemd-cryptsetup never fired.
Diagnosis: %post chroot used bash glob `for kver in /lib/modules/*/`.
In chroot, shell may be dash + nullglob unset → unmatched glob
expands literally to "/lib/modules/*/" → dracut --kver "/lib/modules/*/"
fails silently with `|| true`. Initramfs never regenerated → still
contains the v0.5.14 omit_dracutmodules-only config without crypt.
Fix: dracut --force --regenerate-all (walks /lib/modules internally,
no shell glob needed). One call regens all kernel initramfses with
the new dracut.conf.d in scope.
v0.5.17 boot stuck at dracut-initqueue waiting for LUKS device that
never unlocks. Plymouth removal also dropped the dracut machinery
that prompts user for LUKS passphrase. Pure-text systemd-tty-ask-
password-agent works in real root but isn't bundled into initramfs.
Fix: dracut.conf.d/99-veilor-no-plymouth.conf:
add_dracutmodules+=" crypt systemd-cryptsetup "
install_items+=" /usr/bin/systemd-tty-ask-password-agent "
Result: dracut bundles systemd-cryptsetup + ask-password binary into
initramfs. cryptsetup-generator creates unit at boot, ask-password-
agent prompts on tty1 in text mode "Please enter passphrase for...:".
sendkey-friendly + works on real hardware.
Per user: bottom-screen "fedora" logo + spinner during boot is fedora-
logos package + GRUB graphical theme. v0.5.17 strips it for now —
v0.6 will re-introduce plymouth with veilor-black theme + LUKS-prompt-
friendly config.
Changes:
1. Kernel cmdline: add `logo.nologo console=tty0`
- logo.nologo: suppress kernel boot logo (Tux/Fedora leaf)
- console=tty0: explicit text console output
- `quiet` already absent → all kernel msgs scroll
2. /etc/default/grub patches in chroot %post:
- GRUB_DISTRIBUTOR="veilor-os" (menu titles read "veilor-os" not
"Fedora Linux 43...")
- GRUB_THEME= (empty — no graphical theme)
- GRUB_TERMINAL_OUTPUT="console" (text-mode menu, no gfxterm)
- Drop GRUB_BACKGROUND
3. Regen grub.cfg + EFI grub.cfg with new branding
Result: pure text scroll boot, white-on-black, no fedora artifacts.
veilor-os menu title in GRUB picker. "Hackery dope" aesthetic.
For real users wanting splash → v0.6 ships plymouth + veilor theme.
v0.5.14 ships a working install but auto-install harness can't SSH-
validate post-reboot — admin user has no authorized_keys, hardened
sshd rejects all auth. SSH up + listening but no path to log in.
Fix: detect_seed_pubkey() searches /dev/sr* for a NoCloud cidata
volume (label "cidata"), parses ssh_authorized_keys: list from
user-data, returns first key. generate_ks() then embeds as
sshkey --username=admin "ssh-ed25519 AAAA... user@host"
right after the user= directive. Anaconda creates
/home/admin/.ssh/authorized_keys with right perms (700/600).
Real users: drop a NoCloud seed iso next to install media (or via
USB), pubkey lands automatically. auto-install.sh: existing run-vm.sh
seed logic already builds the cidata iso with host pubkey.
If no seed → directive line empty → anaconda treats as no-op → SSH
validation blocked but install otherwise unaffected.
v0.5.13 added omit_dracutmodules+=plymouth + dracut --force regen in
chroot %post. Boot test still showed plymouth-start.service running.
Theory: the chroot dracut --force --kver loop didn't fire (kver glob
may have been empty in chroot), or anaconda regenerated initramfs
AFTER our %post and ignored our config drop-in.
Simpler fix: don't ship plymouth at all. Add `-plymouth
-plymouth-plugin-label -plymouth-system-theme` to kickstart %packages.
With no plymouth package on disk, dracut can't bundle it into
initramfs regardless of dracut.conf state.
The /etc/dracut.conf.d snippet + /dev/null masks from v0.5.12-13 stay
as belt-and-braces — harmless once plymouth is absent.
v0.5.12 added /dev/null symlinks for plymouth services on real root.
Boot test confirmed plymouth STILL starts: it lives in initramfs
(dracut module 90plymouth) which has its own bundled service files,
unaffected by /etc/systemd/system/ masks on the installed btrfs.
Two-layer fix:
1. /etc/dracut.conf.d/99-veilor-no-plymouth.conf:
omit_dracutmodules+=" plymouth "
Then `dracut -f --kver $kver` to regenerate initramfs sans plymouth.
2. Keep /dev/null symlinks for post-pivot real-root masking.
Result: LUKS prompt rendered as text by systemd-tty-ask-password-agent
on tty1 — sendkey-friendly, hardware-realistic.
v0.5.11 used `systemctl mask plymouth-*.service` in generated kickstart
%post chroot block. systemctl needs systemd running, which it isn't in
anaconda chroot — calls failed silently (|| true).
Boot test confirmed: post-reboot showed both:
Started plymouth-start.service - Show Plymouth Boot Screen
Started systemd-ask-password-plymouth.path - Forward Password Requests
Fix: write /dev/null symlinks directly (`ln -sf /dev/null
/etc/systemd/system/<unit>`). Achieves what mask does without needing
systemd. Also adds the path-activated unit
(systemd-ask-password-plymouth.path) which actually pulls plymouth in
during dracut-initqueue.
v0.5.10 added plymouth.enable=0 + rd.plymouth=0 to kernel cmdline,
but `plymouth-start.service` still registered and ran in real-root
boot. LUKS prompt remained invisible — dracut-initqueue stuck
waiting for /dev/disk/by-uuid/<luks> to materialize.
Belt-and-suspenders: mask plymouth-{start,quit,quit-wait,read-write,
switch-root}.service in the generated kickstart's %post chroot so the
units can never start.
Effect: systemd-tty-ask-password-agent handles LUKS prompt directly
on tty1 with text "Please enter passphrase for disk ...:" — sendkey-
friendly + works on real hardware too.
v0.5.9 GRUB-installs cleanly. Disk boots, dracut reaches
cryptsetup.target, systemd-ask-password-plymouth.path armed. But
plymouth never switches from boot-splash mode to password-prompt mode
— sendkey'd passphrases bounce, dracut waits forever on
dev-disk-by-uuid.
Workaround: pass `plymouth.enable=0 rd.plymouth=0` to kernel cmdline.
Eliminates plymouth-ask-password-plugin as a layer; LUKS prompt
appears as plain text on tty1 ("Please enter passphrase for disk... :").
Bonus: aligns with hardening posture. Plymouth is graphical eye-candy
running in pid 1's namespace during early boot. Fewer moving parts =
smaller attack surface. veilor-os defaults to text boot; users wanting
splash can re-enable post-install.
Auto-install round 3 hit:
Configuring storage
Creating disklabel on /dev/vda
Creating luks on /dev/vda3
Creating lvmpv on /dev/mapper/luks-...
Creating btrfs on /dev/mapper/veilor-root
Running in cmdline mode, no interactive debugging allowed.
The exact error message is:
mount failed: wrong fs type, bad option, bad superblock on
/dev/mapper/veilor-root, missing codepage or helper program
LVM+btrfs combination causes mount failure under anaconda --cmdline.
mkfs.btrfs runs but post-create mount can't find a valid superblock.
Fix: drop LVM intermediary. Use native btrfs-on-LUKS — same pattern
Fedora KDE Spin uses by default. Cleaner, snapshot story unchanged
(btrfs subvols give us root/home split + rollback potential without
LVM's overhead).
New layout:
vda1: efi (600M)
vda2: ext4 /boot (1024M)
vda3: LUKS2 → btrfs (label=veilor)
├── subvol root → /
└── subvol home → /home
ksvalidator clean on the new template.
v0.5.5 fixed AnacondaError: 'LANG'. Auto-install harness then
hit next crash:
TypeError: expected str, bytes or os.PathLike object, not NoneType
File "/usr/lib64/python3.14/site-packages/pyanaconda/display.py", line 223
wl_socket_path = os.path.join(os.getenv("XDG_RUNTIME_DIR"), ...)
anaconda's display.setup_display() unconditionally tries to set up
Wayland socket path. tty1 has no XDG_RUNTIME_DIR set. None gets passed
to os.path.join → TypeError.
Two-part fix:
1. export XDG_RUNTIME_DIR=/run/user/0 + mkdir, so even if anaconda
probes it, the env var has a valid string value.
2. Pass --cmdline to anaconda. Fully unattended text-only mode, no
Wayland/X/TUI. Right fit for our gum-driven kickstart flow where
ks is self-contained (disk, pw, locale all pre-answered).
Combined effect: anaconda goes straight from CLI parse → kickstart
execute → reboot. No display subsystem at all.
Surfaced by test/auto-install.sh round 2.
Auto-install harness ran through gum installer cleanly but anaconda
crashed at startup:
File "/usr/lib64/python3.14/site-packages/pyanaconda/keyboard.py",
line 152, in activate_keyboard
sync_run_task(task_proxy)
...
pyanaconda.modules.common.errors.general.AnacondaError: 'LANG'
Anaconda's keyboard.activate_keyboard() reads $LANG and bombs if unset.
TTY1 (where veilor-installer runs) inherits no locale by default;
gum/whiptail don't set it.
Fix: export LANG + LC_ALL = user's chosen locale (defaulting to
en_GB.UTF-8) before invoking anaconda.
Found via test/auto-install.sh end-to-end run.
User boot-tested v0.5.2 + in-VM patch. Requested polish:
- Banner: replace slant-figlet `veilor-os` + "hardened. branded. yours."
tagline with figlet ANSI Regular `VEILOR OS` wordmark (5-line block).
No tagline. Border preserved by gum style call.
- Menu header: "Welcome. What would you like to do?" → "Welcome"
- Menu labels:
"Install veilor-os to disk" → "Install"
"Try live — desktop (KDE Plasma)" → "live - (KDE)"
"Try live — shell" → "live - shell"
"Reboot" / "Power off" unchanged
- Hostname prompt removed — hardcoded to "veilor". User can change
post-install via hostnamectl. Cuts one prompt from install flow.
Confirmation summary drops the Hostname row.
- Locale options trimmed: en_GB.UTF-8, en_US.UTF-8 only (was 4 incl
de_DE, fr_FR). i18n not v0.5 priority.
Verified in-VM rendering of the menu changes via sed-patch on v0.5.2
ISO. ksvalidator + bash -n clean.
QEMU boot test of v0.5.2 found service still status=1/FAILURE despite
file present at /usr/local/bin/veilor-installer. Root cause via
`bash -x`: `exec > >(tee -a "$LOG") 2>&1` ran BEFORE require_tty
check; process substitution replaces fd1 with a pipe, so [[ -t 1 ]]
returns false → require_tty bails out with [ERR] message.
Order was self-inflicted bug from v0.5.0. Fix: move require_tty
function definition + call BEFORE the tee redirect. Drop the
redundant require_tty call in the entry block (would fail post-redirect).
QEMU boot test of v0.5.1 (commit 3cbffaf) revealed both scripts
missing from /usr/local/sbin/ on running system, despite being in
overlay/usr/local/sbin/ in the source tree.
Root cause: Fedora's filesystem package (or post-install scriptlet)
rewrites /usr/local/sbin → /usr/local/bin symlink AFTER kickstart
%post --nochroot's overlay copy runs. The cp -a placed files in
/usr/local/sbin/ as a real directory; the symlink replacement
deleted them.
Confirmed via tty diagnostic: `ls -la /usr/local` shows
`lrwxrwxrwx ... sbin -> bin` with bin mtime predating sbin symlink
ctime by ~5min — overlay copy ran first, scriptlet rewrote sbin
second.
Fix: move both binaries to overlay/usr/local/bin/ where they're
safe from the symlink rewrite. Update all references:
- kickstart/veilor-os.ks chmod path + chown + diagnostic ls
- overlay/etc/systemd/system/getty@tty1.service.d/veilor-installer.conf ExecStart
- overlay/etc/systemd/system/veilor-firstboot.service ExecStart
- scripts/selinux/build-policy.sh fcontext + restorecon paths
- generated install ks template inside veilor-installer
Service drop-in stays at /etc/systemd/system/getty@tty1.service.d/
unchanged. The veilor-installer binary in /usr/local/bin/ is
discoverable via $PATH same as before.
Two user-facing commands shipped in overlay/usr/local/bin/.
Wraps dnf+flatpak update flow and read-only health diagnostic.
Uses gum if available, plain output otherwise. No kickstart wiring
yet beyond chmod — full integration in v0.6.0 release.
Co-authored-by: veilor-org <admin@veilor.org>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* v0.5.1: gum installer + full veilor-os ks generation
Two changes, one commit (matches v0.5.1 milestone):
1. Swap whiptail → gum (charm.sh)
- Source /usr/share/veilor-os/assets/installer/colors.gum at top so all
prompts pick up branded GUM_* env vars.
- Render banner.txt via `gum style --border rounded`.
- Wrap every prompt behind prompt_choose / prompt_input / prompt_password
/ prompt_confirm / prompt_message / prompt_error helpers that dispatch
gum→whiptail based on `command -v gum`. Defensive: minimal images
without /usr/local/bin/gum still get a working TUI.
- Main menu items now use literal labels (case-matched), not 1..5 tags.
2. Generated kickstart now installs full veilor-os
Previously emitted a vanilla F43 KDE + ~12 hardening packages with no
overlay/scripts/branding. Now mirrors live ks (kickstart/veilor-os.ks
63-141) for %packages, plus:
- %post --nochroot copies overlay/, scripts/, assets/ from
/run/install/repo/veilor (single source — boot ISO mount path).
- %post (chroot) runs scripts/10-harden-base.sh, 20-harden-kernel.sh,
selinux/build-policy.sh, kde-theme-apply.sh.
- `chage -d 0 admin` so first login forces password change. (Account
itself is created by anaconda from the `user` directive — admin pw
collected via gum is passed through --plaintext.)
- `systemctl set-default graphical.target` (real install boots SDDM,
not the TTY1 installer like live).
- Drops live-only entries (livesys-scripts, anaconda-live, dracut-live,
isomd5sum, xorriso, livesys.service enables).
Tested: bash -n clean; ksvalidator on a substituted-placeholder copy
exits 0.
gum binary itself (/usr/local/bin/gum) is vendored by a separate
build-side change — not in this PR.
* fix: escape sed special chars + reject & | / in passwords
Reviewer found a password like aA1!@#%^&*()_-+={}[] becomes
aA1!@#%^__ADMIN_PW__*()_-+={}[] because sed expands & to matched
pattern. Two layers of defense:
1. validate_pw rejects & | / newline at input
2. sed_escape() helper escapes any remaining special chars before
substitution
---------
Co-authored-by: veilor-org <admin@veilor.org>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Pre-existing shellcheck failure blocking all PR merges. Standard
"double-quote array expansions" fix. No behavior change.
Co-authored-by: veilor-org <admin@veilor.org>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Bugs found by agent linter on v0.5.0-alpha:
1. logvol missing --size: ksvalidator rejected. Added --size=8192 --grow.
2. bootloader --location=mbr on UEFI: conflicts with /boot/efi part.
Switched to --location=none (anaconda auto-detects EFI vs BIOS).
3. lsblk awk truncated multi-word disk models ("WD PC SN740" → "WD").
Now collapses model spaces to underscores, preserves full string.
Also added mmcblk to disk regex (eMMC support).
4. Heredoc with $VAR expansion + passwords containing $/`/" corrupted
generated ks. Now: single-quoted heredoc + sed placeholder
substitution. Plus input validator rejects "$\` chars in passwords.
ksvalidator clean on sample generated ks.
bash -n clean.
CI build still in flight (3328ffb). This pushes a new commit; CI will
run again with these fixes. Net delay: zero (3328ffb's installer was
broken anyway, so its ISO unusable for install path).
CI builds in fresh Fedora 43 container — matched pcre2/libselinux/selinux-policy
versions, no fix-repo hack needed. Container starts every run from clean
state, no zombie collisions. Fastest path to first green ISO.