veilor-os/test/TESTING.md
veilor-org 1881c14ea7 v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor
Critical install bug fix + cosmetic round-up + first formal test
procedure document.

## Critical: LUKS unlock on first boot

Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.

The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
  (kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
  line of every existing BLS entry so the kernel that boots NEXT
  actually has the args.

Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.

## GRUB / branding

- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
  already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
  (<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
  titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
  rescue entry was leaking "Fedora Linux" into the GRUB menu and
  showing a second boot option that nobody asked for. The rescue
  kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
  name anaconda writes when the kickstart's network directive is
  ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
  `hostnamectl status` and any consumer reading machine-info see the
  brand.

## Boot UX

- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
  laptops with a hardware GPU, the kernel modeset blanks the
  framebuffer console mid-boot; without `nodefer` the installer
  banner draws into a frozen framebuffer and the user sees a black
  screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
  trigger this so it never reproduced in VM. Symptom report on
  v0.5.26 was the trigger to investigate.

## Installer cosmetics

- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
  `> `. The unicode arrow falls back to a fixed-width block on the
  linux fbcon font and lipgloss then duplicates that block at col +23,
  producing the "Install Install" double-render and the stray-T
  artifact in password fields. Plain ASCII renders identically across
  fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
  installer banner reads this at runtime, so the live ISO + installed
  system both now show "veilor-os 0.5.27".

## Test procedure

- `test/TESTING.md` — first canonical test procedure document. Splits
  VM (cheap iteration, hybrid sendkey + human passwords) from real
  hardware (mandatory for tag). Documents the standard test passwords
  (`veilortest1` for both LUKS and admin), the kill-and-relaunch step
  to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
  the procedure. Future releases that alter the test method must add
  an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
  tagged release should land a filled report alongside it.

## test/run-vm.sh

Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.
2026-05-05 01:43:00 +01:00

187 lines
7.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# veilor-os — Testing Procedure
This document is the canonical procedure for validating a veilor-os ISO
end-to-end. Every release that gets a tag MUST have a corresponding
test-run report in `test/test-runs/` linked from the release notes.
If reality forces you to deviate from the steps below, **do not silently
patch the procedure** — open a commit that updates this file *and*
appends an entry to `test/METHOD-CHANGELOG.md` explaining what changed
and why. The changelog is what makes the procedure auditable; the
procedure itself is just the latest snapshot.
---
## Two test environments
| Environment | Catches | Doesn't catch |
|-------------|---------|---------------|
| **VM (QEMU + virtio-vga)** | install logic, kickstart bugs, %post failures, anaconda transaction failures, GRUB write, BLS entries, package selection, network stack | KMS / fbcon issues, real-firmware Secure Boot, USB controller quirks, GPU driver compatibility, sleep/wake, battery, thermals |
| **Real hardware (USB → spare laptop)** | everything VM doesn't | install repeatability (you only have so many spare laptops) |
Both are required for any tagged release. VM first (cheap iteration),
real hardware second (final sign-off).
---
## VM test — hybrid procedure
The VM cannot type LUKS / admin passwords through QEMU's `sendkey`
monitor command — plymouth's IPC ignores synthesised keystrokes (we
verified this across 14+ sendkey variants in earlier sessions). The
hybrid procedure splits the work: Claude/automation drives every step
that doesn't need a password; the human types the two passwords (LUKS
+ admin) into the QEMU window directly.
Standard test passwords (lab use only — never reuse outside this repo):
| Prompt | Type |
|--------|------|
| LUKS passphrase | `veilortest1` |
| Admin password | `veilortest1` |
Both passwords identical on purpose — easier to remember mid-test, both
satisfy the installer's 8-char min, neither contains shell-special
chars (validate_pw rejects `" $ \ \` & | / \n`).
### Run a VM test
```bash
cd ~/ai-lab/_github/veilor-os
# Pull the ISO you want to test (from a CI release or local build)
ls /home/admin/Downloads/veilor-os-*.iso
# Wipe stale state, launch VM with monitor sock (no auto-inject — we
# don't want sendkey noise typing into prompts)
FRESH=1 NO_INJECT=1 DISPLAY=:0 ./test/run-vm.sh \
/home/admin/Downloads/veilor-os-43-YYYYMMDD-HHMMSS.iso
```
Then either (a) drive the install yourself in the QEMU window, or
(b) hand the monitor sock to Claude / a script:
- Monitor sock: `test/veilor-vm.monitor.sock`
- Send a key: `echo "sendkey ret" | socat - "UNIX-CONNECT:$SOCK"`
- Screendump: `echo "screendump /tmp/x.ppm" | socat - "UNIX-CONNECT:$SOCK"; magick /tmp/x.ppm /tmp/x.png`
### Steps to verify
The complete checklist lives in `test/boot-checklist.md` — that file is
the granular pass/fail list. The high-level flow is:
1. **Live boot.** GRUB (legacy menu, no Plymouth splash) → text scroll
→ veilor-installer banner on tty1 within ~30s. No "fedora" branding
anywhere on screen.
2. **Installer menu.** "Install" highlighted by default. No phantom
duplicate items, no stray characters in input fields.
3. **Disk picker.** `/dev/vda` (or whatever virtio gives you) listed
with size + model.
4. **Passwords.** LUKS + admin prompts; user types `veilortest1` twice.
5. **Locale.** en_GB.UTF-8 picks up.
6. **Confirm.** Disk shown with `WILL BE ERASED`, locale + LUKS/admin
ticks shown.
7. **Anaconda.** "Installing veilor-os to /dev/vda · 1030 min · logs
on tty4". Watch for `Configuring man-db` — if anything fails, this
is historically where it dies.
8. **Reboot.** VM reboots; ISO must NOT boot first this time. Kill
QEMU + relaunch without ISO drive (see *Boot installed disk* below)
to skip the GRUB-from-ISO path.
9. **GRUB.** Single "veilor-os" entry (no rescue, no "Fedora Linux").
10. **LUKS prompt.** Plymouth `details` theme — text-mode prompt for
passphrase. User types `veilortest1` in the QEMU window (sendkey
will not work).
11. **First boot.** SDDM splash → admin user pre-filled → admin types
`veilortest1` → password-change prompt (chage -d 0 expired the
password) → user picks new password → KDE Plasma session.
12. **Hardening checks** per `test/boot-checklist.md` (SELinux
enforcing, fail2ban active, USBGuard active, tuned profile, etc.).
### Boot installed disk (skip ISO)
After the install reboots, QEMU's CD-first boot order will land back
in the live ISO. Easiest workaround: kill QEMU and re-launch without
the `-drive file=...iso` line. The qcow2 retains the install:
```bash
pkill -f 'qemu-system.*veilor-os'
cd ~/ai-lab/_github/veilor-os/test
DISPLAY=:0 qemu-system-x86_64 \
-enable-kvm -cpu host -smp 4 -m 4096 \
-machine q35,smm=on \
-global driver=cfi.pflash01,property=secure,value=on \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=$PWD/veilor-vm.nvram \
-drive file=$PWD/veilor-vm.qcow2,if=virtio,format=qcow2 \
-monitor unix:$PWD/veilor-vm.monitor.sock,server,nowait \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 \
-vga virtio -display gtk,gl=on
```
---
## Real-hardware test — USB → spare laptop
Required for every tagged release. The VM cannot reproduce KMS /
fbcon / GPU-driver issues; only real silicon will.
### 1. Flash USB
```bash
# 8GB+ USB stick, identified by lsblk (e.g. /dev/sda — confirm vendor)
sudo umount /dev/sdX* 2>/dev/null
sudo wipefs -a /dev/sdX
sudo dd if=/path/to/veilor-os-*.iso of=/dev/sdX bs=4M status=progress conv=fsync
sync
sudo eject /dev/sdX
```
Etcher / GNOME Disks also fine. Verify-after-flash is built into
Etcher; for `dd`, run `cmp` on the first ISO_SIZE bytes if paranoid.
### 2. Boot test
- Disable Secure Boot in firmware (until we MOK-enroll our shim, which
is v0.5+).
- Boot from USB.
- Walk the same numbered steps as the VM section, except:
- On "TYPE NOW: passphrase" steps, you actually have a keyboard.
- At step 8, the laptop will eject the USB and reboot to the
installed system without intervention.
- At step 11, do NOT use `veilortest1` for the post-install admin
password change — pick something real if this is your daily-driver
laptop, or a throwaway if it's a test machine. The kickstart's
ChainOfTrust ends here; from this prompt forward you own the
password.
### 3. Capture findings
Fill in a fresh `test/test-runs/YYYY-MM-DD-vX.Y.Z.md` from the
template. **Always** capture: GRUB title, kernel cmdline (`cat
/proc/cmdline`), `lsblk -f`, `getenforce`, `systemctl is-active fail2ban
usbguard tuned auditd firewalld`, `journalctl -b -p err --no-pager`.
If anything regressed, that goes at the top of the report under
**Regressions**, with a screenshot if possible.
---
## Per-run report template
Copy `test/test-runs/_TEMPLATE.md` (created when the first real
test-run lands) and fill in section-by-section. Keep them brief —
this is meant to be a 5-minute write-up, not a thesis.
---
## When to alter this procedure
If a step turns out to be wrong, redundant, or missing:
1. Edit this file.
2. Append to `test/METHOD-CHANGELOG.md` with: date, version it first
applied to, what changed, and why (cite a specific test-run report
if the change is in response to a finding).
3. Reference the changelog entry in your commit message.
The changelog is the audit trail. Don't skip it.