Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.
## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)
`veilor-modules-lock.service` runs `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, resume breaks wifi until
reboot. Same architectural class as the LUKS bug — security feature
breaks legitimate kernel state transitions.
The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.
Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.
## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)
Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).
Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.
## 3. USBGuard hash → id-based baseline (Agent 9, A3)
Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.
Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor renumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).
## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)
User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.
Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.
Default zone stays drop. Only the tailscale0 interface gets ACCEPT.
## 5. /etc/skel branding (Agent 7)
Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) pre-empty, so the moment user opened System Settings, KDE wrote
fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.
Seeds:
/etc/skel/.config/kdeglobals (copy of assets/kde/veilor-default.kdeglobals)
/etc/skel/.config/breezerc (copy of assets/kde/breezerc)
/etc/skel/.config/kwinrc (Plasma 6 wayland defaults: opengl, animspeed=0,
blur off, click-to-focus)
/etc/skel/.config/konsolerc (default profile = Veilor)
/etc/skel/.local/share/konsole/Veilor.profile + .colorscheme
User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.
## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)
Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).
Adds to bootloader cmdline (live + installed):
i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
rd.vconsole.keymap=us
`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).
## 7. virtio-9p log capture (Agent 6)
The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.
test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.
No-op on real hardware (9p tag absent) — VM-only debug.
## Validate
bash -n overlay/usr/local/bin/veilor-installer # OK
ksvalidator kickstart/veilor-os.ks # clean
## Out-of-scope for v0.5.32 (deferred to v0.6)
Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.
Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).
## Layer 1: broad error suppression in transaction_progress.py
dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.
v0.5.30 patch turns the `elif token == 'error':` branch in
process_transaction_progress into a log.warning. All four producers
(cpio_error, script_error, unpack_error, generic error) now flow
through to a warning + continue. Pattern matches both the original
anaconda layout AND the v0.5.29 narrow-patched layout, so re-applying
on top of either is a no-op.
This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.
## Layer 2: bootloader install moved out of anaconda
The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:
1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64
efibootmgr` — re-runs scriptlets in the chroot with full
PID 1 systemd state, so the systemd-run-style triggers that
anaconda's chroot truncates actually execute.
2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi
--bootloader-id=fedora --no-nvram` — populates /boot/efi/EFI/fedora/
3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or
`grub2-mkconfig` fallback) — writes /boot/efi/EFI/fedora/grub.cfg.
4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`
— registers the NVRAM boot entry pointing at the signed shim.
Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.
## Layer 3: virtio-serial log capture in run-vm.sh
Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.
We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.
## Diagnostic story
Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.
After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.
Files changed:
kickstart/veilor-os.ks — broad error suppression patch
overlay/usr/local/bin/veilor-installer — --location=none + manual grub
test/run-vm.sh — virtio-serial chardev wiring
Verified: bash -n clean, ksvalidator clean.
Critical install bug fix + cosmetic round-up + first formal test
procedure document.
## Critical: LUKS unlock on first boot
Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.
The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
(kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
line of every existing BLS entry so the kernel that boots NEXT
actually has the args.
Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.
## GRUB / branding
- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
(<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
rescue entry was leaking "Fedora Linux" into the GRUB menu and
showing a second boot option that nobody asked for. The rescue
kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
name anaconda writes when the kickstart's network directive is
ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
`hostnamectl status` and any consumer reading machine-info see the
brand.
## Boot UX
- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
laptops with a hardware GPU, the kernel modeset blanks the
framebuffer console mid-boot; without `nodefer` the installer
banner draws into a frozen framebuffer and the user sees a black
screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
trigger this so it never reproduced in VM. Symptom report on
v0.5.26 was the trigger to investigate.
## Installer cosmetics
- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
`> `. The unicode arrow falls back to a fixed-width block on the
linux fbcon font and lipgloss then duplicates that block at col +23,
producing the "Install Install" double-render and the stray-T
artifact in password fields. Plain ASCII renders identically across
fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
installer banner reads this at runtime, so the live ISO + installed
system both now show "veilor-os 0.5.27".
## Test procedure
- `test/TESTING.md` — first canonical test procedure document. Splits
VM (cheap iteration, hybrid sendkey + human passwords) from real
hardware (mandatory for tag). Documents the standard test passwords
(`veilortest1` for both LUKS and admin), the kill-and-relaunch step
to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
the procedure. Future releases that alter the test method must add
an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
tagged release should land a filled report alongside it.
## test/run-vm.sh
Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.