Critical install bug fix + cosmetic round-up + first formal test procedure document.

## Critical: LUKS unlock on first boot

Generated installer kickstart's %post was injecting `rd.luks.uuid=…` into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader Specification) entries in `/boot/loader/entries/*.conf`; those are NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without `rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the unlock unit, plymouth has no password to ask for, and dracut-initqueue loops on dev-disk-by-uuid for ~3min before dropping to emergency shell.

The fix layers both write paths:

- `/etc/default/grub`: keeps the args around for future kernels (kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...`: rewrites the `options` line of every existing BLS entry so the kernel that boots NEXT actually has the args.

Verified by reading `/proc/cmdline` from the dracut emergency shell on a v0.5.26 install; old cmdline had only `root=UUID=… ro rootflags=subvol=root` and was missing the LUKS arg entirely.

## GRUB / branding

- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was already there, kept).
- BLS entries' `title` line is rewritten in-place to `veilor-os (<kver>)` for every kernel; `grub2-mkconfig` does not touch BLS titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built rescue entry was leaking "Fedora Linux" into the GRUB menu and showing a second boot option that nobody asked for. The rescue kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live` name anaconda writes when the kickstart's network directive is ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so `hostnamectl status` and any consumer reading machine-info see the brand.

## Boot UX

- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real laptops with a hardware GPU, the kernel modeset blanks the framebuffer console mid-boot; without `nodefer` the installer banner draws into a frozen framebuffer and the user sees a black screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't trigger this so it never reproduced in VM. Symptom report on v0.5.26 was the trigger to investigate.

## Installer cosmetics

- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to `> `. The unicode arrow falls back to a fixed-width block on the linux fbcon font and lipgloss then duplicates that block at col +23, producing the "Install Install" double-render and the stray-T artifact in password fields. Plain ASCII renders identically across fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The installer banner reads this at runtime, so the live ISO + installed system both now show "veilor-os 0.5.27".

## Test procedure

- `test/TESTING.md`: first canonical test procedure document. Splits VM (cheap iteration, hybrid sendkey + human passwords) from real hardware (mandatory for tag). Documents the standard test passwords (`veilortest1` for both LUKS and admin), the kill-and-relaunch step to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md`: append-only audit trail for changes to the procedure. Future releases that alter the test method must add an entry here with the why.
- `test/test-runs/_TEMPLATE.md`: per-run report template. Each tagged release should land a filled report alongside it.

## test/run-vm.sh

Decoupled QEMU monitor sock setup from auto-inject. Previously `NO_INJECT=1` (used to suppress autotype noise into prompts) also killed the monitor sock, leaving the VM undriveable. Monitor sock is now always exposed; only the inject helper is gated on the pubkey detection.

# veilor-os: Testing Procedure

This document is the canonical procedure for validating a veilor-os ISO
end-to-end. Every release that gets a tag MUST have a corresponding
test-run report in `test/test-runs/` linked from the release notes.

If reality forces you to deviate from the steps below, **do not silently
patch the procedure**: open a commit that updates this file *and*
appends an entry to `test/METHOD-CHANGELOG.md` explaining what changed
and why. The changelog is what makes the procedure auditable; the
procedure itself is just the latest snapshot.

---

## Two test environments

| Environment | Catches | Doesn't catch |
|-------------|---------|---------------|
| **VM (QEMU + virtio-vga)** | install logic, kickstart bugs, %post failures, anaconda transaction failures, GRUB write, BLS entries, package selection, network stack | KMS / fbcon issues, real-firmware Secure Boot, USB controller quirks, GPU driver compatibility, sleep/wake, battery, thermals |
| **Real hardware (USB → spare laptop)** | everything the VM doesn't | install repeatability (you only have so many spare laptops) |

Both are required for any tagged release. VM first (cheap iteration),
real hardware second (final sign-off).

---

## VM test: hybrid procedure

Automation cannot type the LUKS / admin passwords through QEMU's
`sendkey` monitor command: plymouth's IPC ignores synthesised keystrokes
(we verified this across 14+ sendkey variants in earlier sessions). The
hybrid procedure splits the work: Claude/automation drives every step
that doesn't need a password; the human types the two passwords
(LUKS + admin) into the QEMU window directly.

Standard test passwords (lab use only; never reuse outside this repo):

| Prompt | Type |
|--------|------|
| LUKS passphrase | `veilortest1` |
| Admin password | `veilortest1` |

Both passwords are identical on purpose: they are easier to remember
mid-test, both satisfy the installer's 8-char minimum, and neither
contains shell-special characters (`validate_pw` rejects
`` " $ \ ` & | / `` and newline).

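
The password rules above can be sketched as a small shell check. This is an illustrative re-implementation, not the installer's actual `validate_pw`: the name `validate_pw_sketch` is hypothetical, and the rejected set is read from the list above as double quote, dollar, backslash, backtick, ampersand, pipe, slash, and newline.

```bash
# Hypothetical sketch of the rules described above; the real
# validate_pw in the installer may differ.
validate_pw_sketch() (
    pw=$1
    nl='
'
    # Installer minimum: 8 characters.
    [ "${#pw}" -ge 8 ] || exit 1
    # Reject shell-special characters and embedded newlines.
    case $pw in
        *'"'*|*'$'*|*'\'*|*'`'*|*'&'*|*'|'*|*'/'*|*"$nl"*) exit 1 ;;
    esac
    exit 0
)
```

Running `validate_pw_sketch 'veilortest1' && echo accepted` is a quick way to confirm that a candidate test password passes the same constraints before starting an install.
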
### Run a VM test

```bash
cd ~/ai-lab/_github/veilor-os
# Pull the ISO you want to test (from a CI release or local build)
ls /home/admin/Downloads/veilor-os-*.iso

# Wipe stale state, launch VM with monitor sock (no auto-inject: we
# don't want sendkey noise typing into prompts)
FRESH=1 NO_INJECT=1 DISPLAY=:0 ./test/run-vm.sh \
    /home/admin/Downloads/veilor-os-43-YYYYMMDD-HHMMSS.iso
```

Then either (a) drive the install yourself in the QEMU window, or
(b) hand the monitor sock to Claude / a script:

- Monitor sock: `test/veilor-vm.monitor.sock` (the commands below assume `SOCK` is set to this path)
- Send a key: `echo "sendkey ret" | socat - "UNIX-CONNECT:$SOCK"`
- Screendump: `echo "screendump /tmp/x.ppm" | socat - "UNIX-CONNECT:$SOCK"; magick /tmp/x.ppm /tmp/x.png`

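
The monitor one-liners above can be wrapped in a few helpers so a driving script does not repeat the `socat` plumbing. This is a minimal sketch: the function names `mon`, `mon_transport`, and `mon_shot` are hypothetical, and it assumes `socat` and `magick` are installed as in the commands above.

```bash
# Path from the list above; override SOCK if the VM was started elsewhere.
SOCK=${SOCK:-test/veilor-vm.monitor.sock}

# Transport is split out so it can be swapped (e.g. for a dry run).
mon_transport() {
    socat - "UNIX-CONNECT:$SOCK"
}

# Send one monitor command line, e.g. `mon sendkey ret`.
mon() {
    printf '%s\n' "$*" | mon_transport
}

# Dump the screen to /tmp/<name>.ppm and convert it to PNG.
mon_shot() {
    mon screendump "/tmp/$1.ppm" && magick "/tmp/$1.ppm" "/tmp/$1.png"
}
```

For a dry run without a VM, redefining `mon_transport() { cat; }` makes `mon` print the command it would have sent.
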
### Steps to verify

The complete checklist lives in `test/boot-checklist.md`; that file is
the granular pass/fail list. The high-level flow is:

1. **Live boot.** GRUB (legacy menu, no Plymouth splash) → text scroll
   → veilor-installer banner on tty1 within ~30s. No "fedora" branding
   anywhere on screen.
2. **Installer menu.** "Install" highlighted by default. No phantom
   duplicate items, no stray characters in input fields.
3. **Disk picker.** `/dev/vda` (or whatever virtio gives you) listed
   with size + model.
4. **Passwords.** LUKS + admin prompts; user types `veilortest1` twice.
5. **Locale.** `en_GB.UTF-8` is picked up.
6. **Confirm.** Disk shown with `WILL BE ERASED`, locale + LUKS/admin
   ticks shown.
7. **Anaconda.** "Installing veilor-os to /dev/vda · 10–30 min · logs
   on tty4". Watch for `Configuring man-db`: if anything fails, this
   is historically where it dies.
8. **Reboot.** VM reboots; ISO must NOT boot first this time. Kill
   QEMU + relaunch without ISO drive (see *Boot installed disk* below)
   to skip the GRUB-from-ISO path.
9. **GRUB.** Single "veilor-os" entry (no rescue, no "Fedora Linux").
10. **LUKS prompt.** Plymouth `details` theme: text-mode prompt for
    passphrase. User types `veilortest1` in the QEMU window (sendkey
    will not work).
11. **First boot.** SDDM splash → admin user pre-filled → admin types
    `veilortest1` → password-change prompt (chage -d 0 expired the
    password) → user picks new password → KDE Plasma session.
12. **Hardening checks** per `test/boot-checklist.md` (SELinux
    enforcing, fail2ban active, USBGuard active, tuned profile, etc.).

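
Steps 9 and 10 both depend on the BLS entries being correct, so they can be spot-checked from the installed system once it is up. The sketch below assumes the entries live in `/boot/loader/entries` and use `title` and `options` lines; the helper name `check_bls_entries` is hypothetical and not part of the repo.

```bash
# Hypothetical helper: check every BLS entry in a directory.
# Usage: check_bls_entries [entries_dir]   (reading /boot usually needs root)
check_bls_entries() (
    dir=${1:-/boot/loader/entries}
    status=0
    found=0
    for entry in "$dir"/*.conf; do
        [ -e "$entry" ] || continue
        found=1
        # Step 9: no rescue entry should be left in the menu.
        case $entry in
            *-0-rescue-*) echo "FAIL rescue entry present: $entry"; status=1 ;;
        esac
        # Step 9: every title should carry the veilor-os brand.
        grep -q '^title[[:space:]]\{1,\}veilor-os' "$entry" ||
            { echo "FAIL title not branded: $entry"; status=1; }
        # Step 10: without this arg the LUKS prompt never appears.
        grep -q '^options[[:space:]].*rd\.luks\.uuid=' "$entry" ||
            { echo "FAIL rd.luks.uuid missing: $entry"; status=1; }
    done
    [ "$found" -eq 1 ] || { echo "FAIL no entries in $dir"; status=1; }
    exit "$status"
)
```

A zero exit status with no output means every entry passed; any `FAIL` line is worth copying into the test-run report.
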
### Boot installed disk (skip ISO)

After the install reboots, QEMU's CD-first boot order will land back
in the live ISO. Easiest workaround: kill QEMU and re-launch without
the `-drive file=...iso` line. The qcow2 retains the install:

```bash
pkill -f 'qemu-system.*veilor-os'
cd ~/ai-lab/_github/veilor-os/test
DISPLAY=:0 qemu-system-x86_64 \
    -enable-kvm -cpu host -smp 4 -m 4096 \
    -machine q35,smm=on \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.fd \
    -drive if=pflash,format=raw,file=$PWD/veilor-vm.nvram \
    -drive file=$PWD/veilor-vm.qcow2,if=virtio,format=qcow2 \
    -monitor unix:$PWD/veilor-vm.monitor.sock,server,nowait \
    -netdev user,id=net0,hostfwd=tcp::2222-:22 \
    -device virtio-net-pci,netdev=net0 \
    -vga virtio -display gtk,gl=on
```

---

## Real-hardware test: USB → spare laptop

Required for every tagged release. The VM cannot reproduce KMS /
fbcon / GPU-driver issues; only real silicon will.

### 1. Flash USB

```bash
# 8GB+ USB stick, identified by lsblk (e.g. /dev/sda; confirm vendor)
sudo umount /dev/sdX* 2>/dev/null
sudo wipefs -a /dev/sdX
sudo dd if=/path/to/veilor-os-*.iso of=/dev/sdX bs=4M status=progress conv=fsync
sync
sudo eject /dev/sdX
```

Etcher / GNOME Disks are also fine. Verify-after-flash is built into
Etcher; for `dd`, compare the first `ISO_SIZE` bytes of the stick
against the ISO with `cmp` if paranoid.

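
The `cmp` check mentioned above can be sketched as follows. The helper name `verify_flash` is hypothetical; it compares only the first ISO-sized span of the target, since the stick is normally larger than the image. Reading a raw device usually requires root, so run it from a root shell, and before the `eject` step.

```bash
# Hypothetical helper: succeed only if the first <size of ISO> bytes
# of the device match the ISO exactly.
# Usage: verify_flash /path/to/image.iso /dev/sdX
verify_flash() (
    iso=$1
    dev=$2
    # Arithmetic expansion strips any padding some wc builds emit.
    size=$(( $(wc -c < "$iso") ))
    # cmp also fails if the device is shorter than the ISO.
    head -c "$size" "$dev" | cmp -s - "$iso"
)
```

A typical call is `verify_flash veilor-os.iso /dev/sdX && echo "flash verified"`, with the file name and device substituted for the real ones.
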
### 2. Boot test

- Disable Secure Boot in firmware (until we MOK-enroll our shim, which
  is v0.5+).
- Boot from USB.
- Walk the same numbered steps as the VM section, except:
  - On "TYPE NOW: passphrase" steps, you actually have a keyboard.
  - At step 8, the laptop will eject the USB and reboot to the
    installed system without intervention.
  - At step 11, do NOT use `veilortest1` for the post-install admin
    password change: pick something real if this is your daily-driver
    laptop, or a throwaway if it's a test machine. The kickstart's
    ChainOfTrust ends here; from this prompt forward you own the
    password.

### 3. Capture findings

Fill in a fresh `test/test-runs/YYYY-MM-DD-vX.Y.Z.md` from the
template. **Always** capture: GRUB title, kernel cmdline
(`cat /proc/cmdline`), `lsblk -f`, `getenforce`,
`systemctl is-active fail2ban usbguard tuned auditd firewalld`,
`journalctl -b -p err --no-pager`.

If anything regressed, that goes at the top of the report under
**Regressions**, with a screenshot if possible.

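
The capture list above lends itself to a small script so that no command is forgotten. This is a sketch only: `capture_section` and `capture_all` are hypothetical names, and the output layout (a heading followed by an indented block) is an assumption, not the layout of `_TEMPLATE.md`.

```bash
# Print one report section: a heading, then the command's output
# (stdout and stderr) indented as a Markdown code block.
capture_section() {
    title=$1
    shift
    printf '### %s\n\n' "$title"
    "$@" 2>&1 | sed 's/^/    /'
    printf '\n'
}

# Run every command from the "always capture" list.
capture_all() {
    capture_section 'Kernel cmdline' cat /proc/cmdline
    capture_section 'Block devices' lsblk -f
    capture_section 'SELinux mode' getenforce
    capture_section 'Services' systemctl is-active \
        fail2ban usbguard tuned auditd firewalld
    capture_section 'Errors this boot' journalctl -b -p err --no-pager
}
```

On the installed system, `capture_all >> test/test-runs/YYYY-MM-DD-vX.Y.Z.md` appends the sections to the report. The GRUB title is not visible from a running system in the same way, so it still has to be noted by hand or from a screenshot.
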

---

## Per-run report template

Copy `test/test-runs/_TEMPLATE.md` (created when the first real
test-run lands) and fill it in section by section. Keep reports brief:
this is meant to be a 5-minute write-up, not a thesis.

---

## When to alter this procedure

If a step turns out to be wrong, redundant, or missing:

1. Edit this file.
2. Append to `test/METHOD-CHANGELOG.md` with: date, version it first
   applied to, what changed, and why (cite a specific test-run report
   if the change is in response to a finding).
3. Reference the changelog entry in your commit message.

The changelog is the audit trail. Don't skip it.
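
Step 2 can be made harder to skip with a tiny append helper. The entry layout below is an assumption for illustration; the actual format of `test/METHOD-CHANGELOG.md` is whatever that file already uses, and the name `log_method_change` is hypothetical.

```bash
# Hypothetical helper; the entry layout is an assumption, not the
# changelog's actual format.
# Usage: log_method_change VERSION "what changed" "why" [changelog_path]
log_method_change() (
    version=$1
    what=$2
    why=$3
    file=${4:-test/METHOD-CHANGELOG.md}
    {
        # Date plus the version the change first applied to.
        printf '\n## %s (first applied to %s)\n\n' "$(date +%Y-%m-%d)" "$version"
        printf '%s %s\n' '- What changed:' "$what"
        printf '%s %s\n' '- Why:' "$why"
    } >> "$file"
)
```

The helper only ever appends, which matches the append-only intent of the changelog; citing a test-run report belongs in the "why" argument.
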