veilor-os/test
veilor-org e83483a077 v0.5.30: broad error suppression + manual bootloader + virtio log capture
Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).

## Layer 1: broad error suppression in transaction_progress.py

dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.

v0.5.30 patch turns the `elif token == 'error':` branch in
process_transaction_progress into a log.warning. All four producers
(cpio_error, script_error, unpack_error, generic error) now flow
through to a warning + continue. Pattern matches both the original
anaconda layout AND the v0.5.29 narrow-patched layout, so re-applying
on top of either is a no-op.

This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.

## Layer 2: bootloader install moved out of anaconda

The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:

  1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64
     efibootmgr` — re-runs scriptlets in the chroot with full
     PID 1 systemd state, so the systemd-run-style triggers that
     anaconda's chroot truncates actually execute.
  2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi
     --bootloader-id=fedora --no-nvram` — populates /boot/efi/EFI/fedora/
  3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or
     `grub2-mkconfig` fallback) — writes /boot/efi/EFI/fedora/grub.cfg.
  4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`
     — registers the NVRAM boot entry pointing at the signed shim.

Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.

## Layer 3: virtio-serial log capture in run-vm.sh

Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.

We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.

## Diagnostic story

Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.

After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.

Files changed:
  kickstart/veilor-os.ks        — broad error suppression patch
  overlay/usr/local/bin/veilor-installer — --location=none + manual grub
  test/run-vm.sh                — virtio-serial chardev wiring

Verified: bash -n clean, ksvalidator clean.
2026-05-05 11:59:35 +01:00
..
test-runs v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor 2026-05-05 01:43:00 +01:00
auto-install-keymap.sh v0.5.5: autonomous install test harness (#12) 2026-05-02 22:49:51 +01:00
auto-install.sh test/auto-install.sh: auto-fetch + reassemble chunked ISO from ci-latest 2026-05-02 22:50:37 +01:00
boot-checklist.md veilor-os v0.1 scaffold — kickstart + hardening + 3-mode power + DuckSans-ready KDE black theme 2026-04-30 03:43:33 +01:00
METHOD-CHANGELOG.md v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor 2026-05-05 01:43:00 +01:00
README.md v0.5.5: autonomous install test harness (#12) 2026-05-02 22:49:51 +01:00
run-vm.sh v0.5.30: broad error suppression + manual bootloader + virtio log capture 2026-05-05 11:59:35 +01:00
TESTING.md v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor 2026-05-05 01:43:00 +01:00

test/

Test harnesses for veilor-os ISO builds.

Files

File Purpose
run-vm.sh Manual smoke test — boot the latest ISO interactively in QEMU/KVM. SSH key injection via cloud-init seed + monitor sendkey fallback for live-image login.
auto-install.sh Autonomous end-to-end install test. Boots ISO, drives the gum installer via QEMU monitor sendkey, waits for anaconda to finish + reboot, SSHs into the installed system, runs validation checklist. Prints PASS/FAIL summary.
auto-install-keymap.sh Sourced helper. Provides km_send_str, km_send_chord, km_send_key, km_screendump, km_wait_socket, etc. Reusable by other automation.
boot-checklist.md Manual post-install checklist (run on a real spare laptop).

Running the autonomous installer test

./test/auto-install.sh build/out/veilor-os-*.iso

Hardcoded inputs (deterministic — do not edit during a test run):

  • Disk: first /dev/vda (the only disk in QEMU)
  • Hostname: veilor (installer hardcoded since v0.5.4)
  • LUKS passphrase: testpass1234
  • Admin password: adminpass1234
  • Locale: en_GB.UTF-8

Expected runtime: 2030 minutes wall clock (anaconda dominates).

Outputs

  • /tmp/veilor-auto-install.log — full driver log
  • /tmp/veilor-auto-install-NN-<step>.png — milestone screenshots
  • /tmp/veilor-auto-install-final-ssh.txt — final SSH session capture (uname/lsblk/cmdline/failed units)

Exit codes

  • 0 — all validation checks passed
  • 1 — any failure (anaconda crashed, SSH never came up, validation check failed)
  • 2 — preflight failure (missing tool, bad ISO arg, missing OVMF)

Prerequisites

  • qemu-system-x86_64, qemu-img, socat, ssh, ssh-keygen
  • edk2-ovmf (OVMF UEFI firmware at /usr/share/edk2/ovmf/OVMF_{CODE,VARS}.fd)
  • mkisofs or xorriso (for cloud-init seed ISO; harness falls back to TTY1 driving if seed cannot be built or cloud-init does not run on the installed system)
  • convert from ImageMagick (optional — converts PPM screendumps to PNG; harness keeps PPM if absent)
  • KVM access (/dev/kvm readable by the user)

What it validates

Post-install on the booted system:

  • /etc/os-releaseNAME=veilor-os
  • hostnamectl --staticveilor
  • systemctl is-activeactive for sshd fail2ban usbguard tuned auditd firewalld chronyd sddm
  • getenforceEnforcing (preferred) or Permissive (acceptable for v0.5.x)
  • lsblk -f shows crypto_LUKS + btrfs
  • /etc/crypttab has a LUKS entry
  • getent passwd admin returns the user
  • /usr/local/bin/{veilor-power,veilor-doctor,veilor-update} are present and executable
  • /proc/cmdline contains init_on_alloc=1

Troubleshooting

  • Stuck at boot banner: ISO didn't autostart veilor-installer on tty1. Check serial.log and auto-install-vm-NN-*.png screenshots. The harness aborts after 5 minutes of identical screen frames.
  • SSH never up: cloud-init may not have run on the installed system (no cidata mount). The harness falls back to TTY1 driving — typing the LUKS passphrase, logging in as admin, and hand-injecting the SSH key. If both paths fail, validation cannot proceed.
  • screendump produces unreadable PPM: install ImageMagick (dnf install ImageMagick) so the harness converts to PNG.