2026-05-05 01:43:00 +01:00
# veilor-os — Testing Procedure
This document is the canonical procedure for validating a veilor-os ISO
end-to-end. Every release that gets a tag MUST have a corresponding
test-run report in `test/test-runs/` linked from the release notes.

If reality forces you to deviate from the steps below, **do not silently
patch the procedure**: open a commit that updates this file *and*
appends an entry to `test/METHOD-CHANGELOG.md` explaining what changed
and why. The changelog is what makes the procedure auditable; the
procedure itself is just the latest snapshot.

---
## Two test environments
| Environment | Catches | Doesn't catch |
|-------------|---------|---------------|
| **VM (QEMU + virtio-vga)** | install logic, kickstart bugs, %post failures, anaconda transaction failures, GRUB write, BLS entries, package selection, network stack | KMS / fbcon issues, real-firmware Secure Boot, USB controller quirks, GPU driver compatibility, sleep/wake, battery, thermals |
| **Real hardware (USB → spare laptop)** | everything VM doesn't | install repeatability (you only have so many spare laptops) |

Both are required for any tagged release. VM first (cheap iteration),
real hardware second (final sign-off).

---
## VM test — hybrid procedure
The VM cannot type LUKS / admin passwords through QEMU's `sendkey`
monitor command: plymouth's IPC ignores synthesised keystrokes (we
verified this across 14+ sendkey variants in earlier sessions). The
hybrid procedure splits the work: Claude/automation drives every step
that doesn't need a password; the human types the two passwords (LUKS
+ admin) into the QEMU window directly.

Standard test passwords (lab use only; never reuse outside this repo):

| Prompt | What to type |
|--------|--------------|
| LUKS passphrase | `veilortest1` |
| Admin password | `veilortest1` |

Both passwords are identical on purpose: easier to remember mid-test,
both satisfy the installer's 8-character minimum, and neither contains
shell-special characters (`validate_pw` rejects `` " $ \ ` & | / `` and
newlines).
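
A candidate lab password can be pre-checked on the host before it goes into a VM. The sketch below is a hypothetical `check_test_pw` helper, not the installer's `validate_pw`: it only mirrors the two rules stated above (the 8-character minimum and the rejected-character set) and assumes bash.

```bash
# Hypothetical pre-check for a lab password, mirroring the rules above.
# This is NOT the installer's validate_pw, only a sketch of its rules.
check_test_pw() {
    local pw=$1 i ch
    # Rejected characters: double quote, dollar, backslash, backtick,
    # ampersand, pipe, slash. Single quotes keep every one literal.
    local rejected='"$\`&|/'

    if (( ${#pw} < 8 )); then
        echo "too short: ${#pw} chars, need 8" >&2
        return 1
    fi
    if [[ $pw == *$'\n'* ]]; then
        echo "contains a newline" >&2
        return 1
    fi
    for (( i = 0; i < ${#rejected}; i++ )); do
        ch=${rejected:i:1}
        # Quoting "$ch" makes the glob match the character literally.
        if [[ $pw == *"$ch"* ]]; then
            echo "contains rejected character: $ch" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `check_test_pw veilortest1 && echo ok` prints `ok`, while a password containing `$` fails with a message on stderr.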
### Run a VM test
```bash
cd ~/ai-lab/_github/veilor-os
# Pull the ISO you want to test (CI release or local build), then
# check it is here
ls /home/admin/Downloads/veilor-os-*.iso
# Wipe stale state and launch the VM with the monitor sock. NO_INJECT=1
# turns off auto-inject: we don't want sendkey noise typing into prompts.
FRESH=1 NO_INJECT=1 DISPLAY=:0 ./test/run-vm.sh \
    /home/admin/Downloads/veilor-os-43-YYYYMMDD-HHMMSS.iso
```
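
Picking the newest download by hand is error-prone once several builds pile up. A hypothetical `latest_iso` helper is sketched below. It assumes the `veilor-os-43-YYYYMMDD-HHMMSS.iso` naming used above, in which lexical order is also chronological, and defaults to the same `/home/admin/Downloads` directory.

```bash
# Hypothetical helper: print the newest veilor-os ISO in a directory.
# Assumes fixed-width YYYYMMDD-HHMMSS timestamps in the filename, so the
# last glob match (bash sorts matches) is the newest build.
latest_iso() {
    local dir=${1:-/home/admin/Downloads}
    local -a isos=( "$dir"/veilor-os-*.iso )
    # An unmatched glob stays literal, so check that the first hit exists.
    if [[ ! -e ${isos[0]} ]]; then
        echo "no veilor-os ISO in $dir" >&2
        return 1
    fi
    printf '%s\n' "${isos[${#isos[@]}-1]}"
}

# Usage, from the repo root:
#   FRESH=1 NO_INJECT=1 DISPLAY=:0 ./test/run-vm.sh "$(latest_iso)"
```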
Then either (a) drive the install yourself in the QEMU window, or
(b) hand the monitor sock to Claude / a script:
- Monitor sock: `test/veilor-vm.monitor.sock` (set `SOCK` to this path
  for the commands below)
- Send a key: `echo "sendkey ret" | socat - "UNIX-CONNECT:$SOCK"`
- Screendump: `echo "screendump /tmp/x.ppm" | socat - "UNIX-CONNECT:$SOCK"; magick /tmp/x.ppm /tmp/x.png`
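
The two one-liners above can be wrapped as small shell functions for scripted runs. This is a sketch under assumptions: `qmon`, `send_keys`, `screenshot`, `MONITOR_DRY_RUN` and `KEY_DELAY` are hypothetical names, and the key arguments are QEMU `sendkey` key names such as `ret`, `down`, `tab` or `ctrl-alt-f4`. It deliberately does not type passwords; those still go in by hand.

```bash
# Hypothetical wrappers around the QEMU monitor sock (source from repo root).
SOCK=${SOCK:-$PWD/test/veilor-vm.monitor.sock}

# Send one monitor command. MONITOR_DRY_RUN=1 prints it instead, which is
# a cheap way to review a key sequence before aiming it at a live VM.
qmon() {
    if [[ ${MONITOR_DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "$*"
    else
        printf '%s\n' "$*" | socat - "UNIX-CONNECT:$SOCK"
    fi
}

# One sendkey per argument, with a short pause so the guest keeps up.
send_keys() {
    local key
    for key in "$@"; do
        qmon "sendkey $key"
        sleep "${KEY_DELAY:-0.2}"
    done
}

# Screendump to PPM, then convert with ImageMagick 7's `magick`. QEMU
# writes the file from its own working directory, so pass an absolute path.
screenshot() {
    local png=${1:-/tmp/veilor-vm.png}
    local ppm=${png%.png}.ppm
    qmon "screendump $ppm"
    [[ ${MONITOR_DRY_RUN:-0} == 1 ]] || magick "$ppm" "$png"
}

# Example: peek at the anaconda logs on tty4 during step 7, then go back.
#   send_keys ctrl-alt-f4; screenshot /tmp/tty4.png; send_keys ctrl-alt-f1
```

The dry-run switch makes it cheap to check a sequence first: `MONITOR_DRY_RUN=1 KEY_DELAY=0 send_keys down ret` only prints `sendkey down` and `sendkey ret`.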
### Steps to verify
The complete checklist lives in `test/boot-checklist.md` — that file is
the granular pass/fail list. The high-level flow is:
1. **Live boot.** GRUB (legacy menu, no Plymouth splash) → text scroll
→ veilor-installer banner on tty1 within ~30s. No "fedora" branding
anywhere on screen.
2. **Installer menu.** "Install" highlighted by default. No phantom
duplicate items, no stray characters in input fields.
3. **Disk picker.** `/dev/vda` (or whatever virtio gives you) listed
with size + model.
4. **Passwords.** LUKS + admin prompts; user types `veilortest1` twice.
5. **Locale.** `en_GB.UTF-8` is picked up.
6. **Confirm.** Disk shown with `WILL BE ERASED`, locale + LUKS/admin
ticks shown.
7. **Anaconda.** "Installing veilor-os to /dev/vda · 10–30 min · logs
on tty4". Watch for `Configuring man-db`: if anything fails, this
is historically where it dies.
8. **Reboot.** The VM reboots, but QEMU's CD-first boot order would
bring the live ISO back up. Kill QEMU and relaunch without the ISO
drive (see *Boot installed disk* below) so the installed disk boots.
9. **GRUB.** Single "veilor-os" entry (no rescue, no "Fedora Linux").
10. **LUKS prompt.** Plymouth `details` theme — text-mode prompt for
passphrase. User types `veilortest1` in the QEMU window (sendkey
will not work).
11. **First boot.** SDDM splash → admin user pre-filled → admin types
`veilortest1` → password-change prompt (`chage -d 0` expired the
password) → user picks new password → KDE Plasma session.
12. **Hardening checks** per `test/boot-checklist.md` (SELinux
enforcing, fail2ban active, USBGuard active, tuned profile, etc.).
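
For step 12, a hypothetical summariser can turn the service checks into PASS/FAIL lines. It assumes it runs inside the installed guest (a terminal in the Plasma session, or over SSH through the forwarded port 2222 if sshd is enabled there) and covers only the units this document names; `test/boot-checklist.md` stays the authoritative list. No expected tuned profile is assumed, so the profile is only printed.

```bash
# Hypothetical step-12 summariser; run inside the installed guest.
UNITS=(fail2ban usbguard tuned auditd firewalld)

# Read `systemctl is-active` output (one status per unit, in the same
# order as the arguments) and print a PASS/FAIL row per unit.
summarise_units() {
    local -a units=("$@")
    local status i=0 fails=0
    while IFS= read -r status; do
        if [[ $status == active ]]; then
            printf 'PASS  %-10s %s\n' "${units[i]}" "$status"
        else
            printf 'FAIL  %-10s %s\n' "${units[i]}" "$status"
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    (( fails == 0 ))
}

hardening_checks() {
    local rc=0 mode
    mode=$(getenforce 2>/dev/null || echo unknown)
    if [[ $mode == Enforcing ]]; then
        echo "PASS  selinux    $mode"
    else
        echo "FAIL  selinux    $mode"
        rc=1
    fi
    systemctl is-active "${UNITS[@]}" | summarise_units "${UNITS[@]}" || rc=1
    # Informational only: prints the active tuned profile.
    tuned-adm active || true
    return "$rc"
}
```

A non-zero return from `hardening_checks` means at least one SELinux or service line reads FAIL.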
### Boot installed disk (skip ISO)
After the install reboots, QEMU's CD-first boot order will land back
in the live ISO. Easiest workaround: kill QEMU and re-launch without
the `-drive file=...iso` line. The qcow2 retains the install:
```bash
pkill -f 'qemu-system.*veilor-os'
cd ~/ai-lab/_github/veilor-os/test
# No ISO drive this time: the qcow2 holds the install and the writable
# pflash (veilor-vm.nvram) holds the UEFI variables.
DISPLAY=:0 qemu-system-x86_64 \
    -enable-kvm -cpu host -smp 4 -m 4096 \
    -machine q35,smm=on \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.fd \
    -drive "if=pflash,format=raw,file=$PWD/veilor-vm.nvram" \
    -drive "file=$PWD/veilor-vm.qcow2,if=virtio,format=qcow2" \
    -monitor "unix:$PWD/veilor-vm.monitor.sock,server,nowait" \
    -netdev user,id=net0,hostfwd=tcp::2222-:22 \
    -device virtio-net-pci,netdev=net0 \
    -vga virtio -display gtk,gl=on
```
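
`pkill` only sends SIGTERM and returns immediately, so a fast relaunch can race the old QEMU, which may still hold its lock on `veilor-vm.qcow2`; QEMU then refuses to open the image with a `Failed to get "write" lock` error. A hypothetical `wait_for_exit` helper, sketched with `pgrep`, closes that gap. The 15-second default timeout is an arbitrary choice.

```bash
# Hypothetical helper: block until no process matches PATTERN, or give
# up after TIMEOUT seconds (default 15, an arbitrary choice).
# Note: pgrep -f matches full command lines, including a wrapper shell's
# if the pattern text appears in it.
wait_for_exit() {
    local pattern=$1 timeout=${2:-15} waited=0
    while pgrep -f "$pattern" >/dev/null; do
        if (( waited >= timeout )); then
            echo "still running after ${timeout}s: $pattern" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage, in place of the bare pkill above:
#   pkill -f 'qemu-system.*veilor-os'
#   wait_for_exit 'qemu-system.*veilor-os' || pkill -KILL -f 'qemu-system.*veilor-os'
```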
---
## Real-hardware test — USB → spare laptop
Required for every tagged release. The VM cannot reproduce KMS /
fbcon / GPU-driver issues; only real silicon will.
### 1. Flash USB
```bash
# 8GB+ USB stick. Identify it with lsblk (e.g. /dev/sda), confirm the
# vendor/model, then substitute it for sdX below: the stick is wiped.
lsblk -d -o NAME,SIZE,MODEL,TRAN,RM
sudo umount /dev/sdX* 2>/dev/null
sudo wipefs -a /dev/sdX
sudo dd if=/path/to/veilor-os-43-YYYYMMDD-HHMMSS.iso of=/dev/sdX bs=4M status=progress conv=fsync
sync
sudo eject /dev/sdX
```
Etcher / GNOME Disks also fine. Verify-after-flash is built into
Etcher; for `dd`, run `cmp` on the first ISO_SIZE bytes if paranoid.
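
The `cmp` check can be wrapped so it only compares the ISO's own length; a whole-device comparison would always fail with an EOF on the shorter ISO. `verify_flash` below is a hypothetical helper and assumes GNU `cmp` (for `-n`).

```bash
# Hypothetical check: compare the stick with the ISO over the ISO's
# length only. Returns 0 on match, 1 on mismatch, 2 on read trouble.
verify_flash() {
    local iso=$1 dev=$2 size rc=0
    if [[ ! -f $iso ]]; then
        echo "no such ISO: $iso" >&2
        return 2
    fi
    # $(( )) strips the padding some wc implementations print.
    size=$(( $(wc -c <"$iso") ))
    cmp -n "$size" "$iso" "$dev" || rc=$?
    case $rc in
        0) echo "OK: first $size bytes of $dev match $iso" ;;
        1) echo "MISMATCH: $dev differs from $iso" >&2 ;;
        *) echo "cmp failed to read an input (permissions?)" >&2 ;;
    esac
    return "$rc"
}
```

Run it after re-plugging the stick (the flash step ejects it), from a root shell or with read access to the device, e.g. `verify_flash /path/to/veilor-os-43-YYYYMMDD-HHMMSS.iso /dev/sdX`. Reading back after a re-plug also means the bytes come from the stick rather than from cached pages.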
### 2. Boot test
- Disable Secure Boot in firmware (until we MOK-enroll our shim, which
is v0.5+).
- Boot from USB.
- Walk the same numbered steps as the VM section, except:
  - On "TYPE NOW: passphrase" steps, you actually have a keyboard.
  - At step 8, the laptop will eject the USB and reboot to the
    installed system without intervention.
  - At step 11, do NOT use `veilortest1` for the post-install admin
    password change. Pick something real if this is your daily-driver
    laptop, or a throwaway if it's a test machine. The kickstart's
    chain of trust ends here; from this prompt forward you own the
    password.
### 3. Capture findings
Fill in a fresh `test/test-runs/YYYY-MM-DD-vX.Y.Z.md` from the
template. **Always** capture: GRUB title, kernel cmdline (`cat
/proc/cmdline`), `lsblk -f`, `getenforce`, `systemctl is-active fail2ban
usbguard tuned auditd firewalld`, `journalctl -b -p err --no-pager`.
If anything regressed, that goes at the top of the report under
**Regressions**, with a screenshot if possible.

---
## Per-run report template
Copy `test/test-runs/_TEMPLATE.md` (created when the first real
test-run lands) and fill in section-by-section. Keep them brief —
this is meant to be a 5-minute write-up, not a thesis.

---
## When to alter this procedure
If a step turns out to be wrong, redundant, or missing:
1. Edit this file.
2. Append to `test/METHOD-CHANGELOG.md` with: date, version it first
applied to, what changed, and why (cite a specific test-run report
if the change is in response to a finding).
3. Reference the changelog entry in your commit message.

The changelog is the audit trail. Don't skip it.
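
Step 2 can be scripted so every entry carries the same fields. `append_method_entry` and the `METHOD_CHANGELOG` override are hypothetical, and the markdown layout it writes is an assumption; match whatever format `test/METHOD-CHANGELOG.md` already uses.

```bash
# Hypothetical helper: append one entry with the fields listed above.
# The markdown layout is an assumption; match the existing file's format.
append_method_entry() {
    local version=$1 what=$2 why=$3 report=${4:-}
    local file=${METHOD_CHANGELOG:-test/METHOD-CHANGELOG.md}
    if [[ -z $version || -z $what || -z $why ]]; then
        echo "usage: append_method_entry VERSION WHAT WHY [REPORT]" >&2
        return 1
    fi
    # >> only ever appends, matching the append-only rule.
    {
        printf '\n## %s: first applied to %s\n\n' "$(date +%F)" "$version"
        printf -- '- **What changed:** %s\n' "$what"
        printf -- '- **Why:** %s\n' "$why"
        if [[ -n $report ]]; then
            printf -- '- **Report:** %s\n' "$report"
        fi
    } >>"$file"
}
```

Typical use: `append_method_entry vX.Y.Z "what changed" "why" test/test-runs/YYYY-MM-DD-vX.Y.Z.md`, then cite the new entry in the commit message.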