veilor-os/test/TESTING.md
veilor-org 1881c14ea7 v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor
Critical install bug fix + cosmetic round-up + first formal test
procedure document.

## Critical: LUKS unlock on first boot

Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.

The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
  (kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
  line of every existing BLS entry so the kernel that boots NEXT
  actually has the args.

Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.
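
A quick way to confirm the fix took effect is to check that every BLS entry carries the argument. The sketch below is not the installer's actual `%post` code: `check_bls_luks` is a hypothetical helper name, and the directory defaults to the `/boot/loader/entries` path named above.

```shell
# Hypothetical helper: succeed only if every BLS entry in the given
# directory has rd.luks.uuid= on its "options" line.
check_bls_luks() (
    dir=${1:-/boot/loader/entries}
    rc=0
    found=0
    for entry in "$dir"/*.conf; do
        [ -e "$entry" ] || continue          # glob matched nothing
        found=1
        if ! grep -Eq '^options[[:space:]].*rd\.luks\.uuid=' "$entry"; then
            echo "missing rd.luks.uuid: $entry" >&2
            rc=1
        fi
    done
    # Exit 2 when the directory holds no entries at all.
    [ "$found" -eq 1 ] || { echo "no BLS entries in $dir" >&2; exit 2; }
    exit "$rc"
)
```

On an installed system this would be run as `check_bls_luks && echo ok` (reading `/boot/loader/entries` may need root).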

## GRUB / branding

- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
  already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
  (<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
  titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
  rescue entry was leaking "Fedora Linux" into the GRUB menu and
  showing a second boot option that nobody asked for. The rescue
  kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
  name anaconda writes when the kickstart's network directive is
  ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
  `hostnamectl status` and any consumer reading machine-info see the
  brand.
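
The in-place title rewrite could look roughly like the following. This is a sketch under assumptions, not the shipped `%post` code: `rebrand_bls_titles` is a hypothetical name, and taking `<kver>` from each entry's own `version` line is one possible source for it.

```shell
# Hypothetical sketch: rewrite the "title" line of each BLS entry to
# "veilor-os (<kver>)", taking <kver> from the entry's "version" line.
rebrand_bls_titles() (
    dir=${1:-/boot/loader/entries}
    for entry in "$dir"/*.conf; do
        [ -e "$entry" ] || continue          # glob matched nothing
        kver=$(awk '$1 == "version" { print $2; exit }' "$entry")
        [ -n "$kver" ] || continue           # leave entries without a version alone
        # GNU sed in-place edit; "|" is the delimiter because a kernel
        # version string is not expected to contain one.
        sed -i "s|^title[[:space:]].*|title veilor-os ($kver)|" "$entry"
    done
)
```

Passing a scratch directory as the first argument lets the rewrite be tried on copies before touching `/boot`.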

## Boot UX

- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
  laptops with a hardware GPU, the kernel modeset blanks the
  framebuffer console mid-boot; without `nodefer` the installer
  banner draws into a frozen framebuffer and the user sees a black
  screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
  trigger this so it never reproduced in VM. Symptom report on
  v0.5.26 was the trigger to investigate.
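
Whether a given argument actually reached the running kernel can be checked from the live session. A minimal sketch, assuming a hypothetical helper named `has_karg` that matches whole whitespace-separated tokens:

```shell
# Hypothetical helper: succeed if the exact argument $1 appears as a
# whole token in the kernel command line (default: /proc/cmdline).
has_karg() {
    # Split on spaces/tabs into one token per line, then match the full line.
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}
```

For example, `has_karg fbcon=nodefer && echo present` on the booted live ISO.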

## Installer cosmetics

- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
  `> `. The unicode arrow falls back to a fixed-width block on the
  linux fbcon font and lipgloss then duplicates that block at col +23,
  producing the "Install Install" double-render and the stray-T
  artifact in password fields. Plain ASCII renders identically across
  fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
  installer banner reads this at runtime, so the live ISO + installed
  system both now show "veilor-os 0.5.27".
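
Reading the version at runtime can be as small as sourcing the os-release file in a subshell. The sketch below is illustrative only; `banner_version` is a hypothetical name and the real installer banner may read the value differently.

```shell
# Hypothetical sketch: print the banner string from an os-release file.
# The subshell body keeps the sourced variables out of the caller's shell.
# Pass a path containing a slash, since "." searches PATH otherwise.
banner_version() (
    . "${1:-/etc/os-release}" || exit 1
    printf 'veilor-os %s\n' "$VERSION_ID"
)
```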

## Test procedure

- `test/TESTING.md` — first canonical test procedure document. Splits
  VM (cheap iteration, hybrid sendkey + human passwords) from real
  hardware (mandatory for tag). Documents the standard test passwords
  (`veilortest1` for both LUKS and admin), the kill-and-relaunch step
  to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
  the procedure. Future releases that alter the test method must add
  an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
  tagged release should land a filled report alongside it.

## test/run-vm.sh

Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.
2026-05-05 01:43:00 +01:00


veilor-os — Testing Procedure

This document is the canonical procedure for validating a veilor-os ISO end-to-end. Every release that gets a tag MUST have a corresponding test-run report in test/test-runs/ linked from the release notes.

If reality forces you to deviate from the steps below, do not silently patch the procedure — open a commit that updates this file and appends an entry to test/METHOD-CHANGELOG.md explaining what changed and why. The changelog is what makes the procedure auditable; the procedure itself is just the latest snapshot.


Two test environments

| Environment | Catches | Doesn't catch |
|---|---|---|
| VM (QEMU + virtio-vga) | install logic, kickstart bugs, %post failures, anaconda transaction failures, GRUB write, BLS entries, package selection, network stack | KMS / fbcon issues, real-firmware Secure Boot, USB controller quirks, GPU driver compatibility, sleep/wake, battery, thermals |
| Real hardware (USB → spare laptop) | everything VM doesn't | install repeatability (you only have so many spare laptops) |

Both are required for any tagged release. VM first (cheap iteration), real hardware second (final sign-off).


VM test — hybrid procedure

The VM cannot type LUKS / admin passwords through QEMU's sendkey monitor command: plymouth's IPC ignores synthesised keystrokes (we verified this across 14+ sendkey variants in earlier sessions). The hybrid procedure splits the work: Claude/automation drives every step that doesn't need a password; the human types the two passwords (LUKS + admin) into the QEMU window directly.

Standard test passwords (lab use only — never reuse outside this repo):

| Prompt | Type |
|---|---|
| LUKS passphrase | veilortest1 |
| Admin password | veilortest1 |

Both passwords are identical on purpose: that is easier to remember mid-test, both satisfy the installer's 8-character minimum, and neither contains shell-special characters (validate_pw rejects " $ \ ` & | / and newline).
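
The password rule above can be sketched as a shell check. This is an illustration only, not the installer's validate_pw: the name pw_ok_sketch is hypothetical, and the rejected-character set (double quote, dollar, backslash, backtick, ampersand, pipe, slash, newline) is read from the sentence above and assumed to be complete.

```shell
# Hypothetical sketch of the password rule: at least 8 characters and
# none of the characters listed in the text above.
pw_ok_sketch() {
    [ "${#1}" -ge 8 ] || return 1
    case $1 in
        # Each quoted character is matched literally; the last
        # alternative is a literal newline.
        *'"'*|*'$'*|*'\'*|*'`'*|*'&'*|*'|'*|*/*|*'
'*) return 1 ;;
    esac
    return 0
}
```

With this sketch, `pw_ok_sketch veilortest1` succeeds, while a 5-character string or one containing `$` fails.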

Run a VM test

cd ~/ai-lab/_github/veilor-os
# Pull the ISO you want to test (from a CI release or local build)
ls /home/admin/Downloads/veilor-os-*.iso

# Wipe stale state, launch VM with monitor sock (no auto-inject — we
# don't want sendkey noise typing into prompts)
FRESH=1 NO_INJECT=1 DISPLAY=:0 ./test/run-vm.sh \
    /home/admin/Downloads/veilor-os-43-YYYYMMDD-HHMMSS.iso

Then either (a) drive the install yourself in the QEMU window, or (b) hand the monitor sock to Claude / a script:

  • Monitor sock: test/veilor-vm.monitor.sock
  • Send a key: echo "sendkey ret" | socat - "UNIX-CONNECT:$SOCK"
  • Screendump: echo "screendump /tmp/x.ppm" | socat - "UNIX-CONNECT:$SOCK"; magick /tmp/x.ppm /tmp/x.png
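
For scripted runs, the one-liners above can be wrapped in small helpers. A minimal sketch, assuming socat and ImageMagick are installed; the function names vm_mon, vm_key and vm_shot are hypothetical, and SOCK defaults to the monitor sock path given above.

```shell
SOCK=${SOCK:-test/veilor-vm.monitor.sock}

# Send one raw command line to the QEMU monitor socket.
vm_mon() {
    printf '%s\n' "$*" | socat - "UNIX-CONNECT:$SOCK"
}

# Press a single key, e.g. "vm_key ret".
vm_key() {
    vm_mon "sendkey $1"
}

# Dump the screen to a .ppm file and convert it to .png next to it.
vm_shot() {
    vm_mon "screendump $1" && magick "$1" "${1%.ppm}.png"
}
```

Usage would be along the lines of `vm_key ret; vm_shot /tmp/x.ppm`. These helpers do not change the limitation described earlier: the LUKS and admin passwords still have to be typed by hand in the QEMU window.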

Steps to verify

The complete checklist lives in test/boot-checklist.md — that file is the granular pass/fail list. The high-level flow is:

  1. Live boot. GRUB (legacy menu, no Plymouth splash) → text scroll → veilor-installer banner on tty1 within ~30s. No "fedora" branding anywhere on screen.
  2. Installer menu. "Install" highlighted by default. No phantom duplicate items, no stray characters in input fields.
  3. Disk picker. /dev/vda (or whatever virtio gives you) listed with size + model.
  4. Passwords. LUKS + admin prompts; user types veilortest1 twice.
  5. Locale. en_GB.UTF-8 picks up.
  6. Confirm. Disk shown with WILL BE ERASED, locale + LUKS/admin ticks shown.
  7. Anaconda. "Installing veilor-os to /dev/vda · 10-30 min · logs on tty4". Watch for Configuring man-db: if anything fails, this is historically where it dies.
  8. Reboot. VM reboots; ISO must NOT boot first this time. Kill QEMU + relaunch without ISO drive (see Boot installed disk below) to skip the GRUB-from-ISO path.
  9. GRUB. Single "veilor-os" entry (no rescue, no "Fedora Linux").
  10. LUKS prompt. Plymouth details theme — text-mode prompt for passphrase. User types veilortest1 in the QEMU window (sendkey will not work).
  11. First boot. SDDM splash → admin user pre-filled → admin types veilortest1 → password-change prompt (chage -d 0 expired the password) → user picks new password → KDE Plasma session.
  12. Hardening checks per test/boot-checklist.md (SELinux enforcing, fail2ban active, USBGuard active, tuned profile, etc.).

Boot installed disk (skip ISO)

After the install reboots, QEMU's CD-first boot order will land back in the live ISO. Easiest workaround: kill QEMU and re-launch without the -drive file=...iso line. The qcow2 retains the install:

pkill -f 'qemu-system.*veilor-os'
cd ~/ai-lab/_github/veilor-os/test
DISPLAY=:0 qemu-system-x86_64 \
    -enable-kvm -cpu host -smp 4 -m 4096 \
    -machine q35,smm=on \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.fd \
    -drive if=pflash,format=raw,file=$PWD/veilor-vm.nvram \
    -drive file=$PWD/veilor-vm.qcow2,if=virtio,format=qcow2 \
    -monitor unix:$PWD/veilor-vm.monitor.sock,server,nowait \
    -netdev user,id=net0,hostfwd=tcp::2222-:22 \
    -device virtio-net-pci,netdev=net0 \
    -vga virtio -display gtk,gl=on

Real-hardware test — USB → spare laptop

Required for every tagged release. The VM cannot reproduce KMS / fbcon / GPU-driver issues; only real silicon will.

1. Flash USB

# 8GB+ USB stick, identified by lsblk (e.g. /dev/sda — confirm vendor)
sudo umount /dev/sdX* 2>/dev/null
sudo wipefs -a /dev/sdX
# Give dd the exact ISO filename: the shell does not expand a glob inside if=...
sudo dd if=/path/to/veilor-os-43-YYYYMMDD-HHMMSS.iso of=/dev/sdX bs=4M status=progress conv=fsync
sync
sudo eject /dev/sdX

Etcher / GNOME Disks also fine. Verify-after-flash is built into Etcher; for dd, run cmp on the first ISO_SIZE bytes if paranoid.
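
The cmp check mentioned above can be sketched as follows, assuming GNU cmp (for -n) and GNU stat (for -c %s). The name verify_flash is hypothetical. Run it after dd and sync but before eject, with sudo if the device is not readable by your user.

```shell
# Hypothetical helper: compare the ISO against the first ISO-size bytes
# of the target device. Exits 0 only if they are identical.
verify_flash() (
    iso=$1
    dev=$2
    size=$(stat -c %s "$iso") || exit 1   # ISO size in bytes
    cmp -n "$size" "$iso" "$dev"          # limit comparison to that many bytes
)
```

For example: `sudo sh -c '. ./verify.sh; verify_flash /path/to/image.iso /dev/sdX' && echo "flash OK"`, where verify.sh is a hypothetical file holding the function.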

2. Boot test

  • Disable Secure Boot in firmware (until we MOK-enroll our shim, which is v0.5+).
  • Boot from USB.
  • Walk the same numbered steps as the VM section, except:
    • On "TYPE NOW: passphrase" steps, you actually have a keyboard.
    • At step 8, the laptop will eject the USB and reboot to the installed system without intervention.
    • At step 11, do NOT use veilortest1 for the post-install admin password change; pick something real if this is your daily-driver laptop, or a throwaway if it's a test machine. The kickstart's chain of trust ends here; from this prompt forward you own the password.

3. Capture findings

Fill in a fresh test/test-runs/YYYY-MM-DD-vX.Y.Z.md from the template. Always capture: GRUB title, kernel cmdline (cat /proc/cmdline), lsblk -f, getenforce, systemctl is-active fail2ban usbguard tuned auditd firewalld, journalctl -b -p err --no-pager.
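
The command outputs listed above can be collected in one pass on the installed system. A minimal sketch: capture_findings and its output filename are hypothetical, and the GRUB title still has to be noted by hand because it is only visible in the boot menu.

```shell
# Hypothetical helper: run each capture command and write its output,
# prefixed by the command line, into one text file.
capture_findings() (
    out=${1:-findings.txt}
    run() {
        printf '\n$ %s\n' "$*"
        # Keep going if a command fails or is missing; record its status.
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    }
    {
        run cat /proc/cmdline
        run lsblk -f
        run getenforce
        run systemctl is-active fail2ban usbguard tuned auditd firewalld
        run journalctl -b -p err --no-pager
    } > "$out"
)
```

The resulting file can then be pasted into the relevant sections of the test-run report.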

If anything regressed, that goes at the top of the report under Regressions, with a screenshot if possible.


Per-run report template

Copy test/test-runs/_TEMPLATE.md (created when the first real test-run lands) and fill in section-by-section. Keep them brief — this is meant to be a 5-minute write-up, not a thesis.


When to alter this procedure

If a step turns out to be wrong, redundant, or missing:

  1. Edit this file.
  2. Append to test/METHOD-CHANGELOG.md with: date, version it first applied to, what changed, and why (cite a specific test-run report if the change is in response to a finding).
  3. Reference the changelog entry in your commit message.

The changelog is the audit trail. Don't skip it.
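
Appending an entry can be scripted so the four required fields are never forgotten. The sketch below is only one possible layout: the entry format and the name changelog_entry are assumptions, not a format defined by test/METHOD-CHANGELOG.md.

```shell
# Hypothetical helper: append a dated entry to the method changelog.
# Arguments: file, version first applied to, what changed, why.
changelog_entry() (
    file=$1 version=$2 what=$3 why=$4
    # ">>" appends, so existing entries are never rewritten.
    printf '\n## %s (first applied to %s)\n\n- What changed: %s\n- Why: %s\n' \
        "$(date +%F)" "$version" "$what" "$why" >> "$file"
)
```

For example: `changelog_entry test/METHOD-CHANGELOG.md vX.Y.Z "what changed" "why, citing the test-run report"`.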