veilor-os/test/run-vm.sh
veilor-org b86b4f9ec3 v0.5.32: ship 7 blockers from 9-agent wave
Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.

## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)

`veilor-modules-lock.service` sets `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, the reload fails and wifi stays
dead until reboot. Same architectural class as the LUKS bug: a
security feature breaks legitimate kernel state transitions.

The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.

Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.
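
The self-skip depends on whether the exact token `module.sig_enforce=1` is present on the kernel command line. A minimal sketch of that check, for verifying a booted system by hand; the helper name `cmdline_has` is hypothetical and not part of the repo, and it only approximates what systemd's `ConditionKernelCommandLine=` evaluates:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): succeed if the exact token appears
# on a kernel command line. Defaults to the running kernel's /proc/cmdline.
cmdline_has() {
    local want="$1" cmdline="${2-$(cat /proc/cmdline 2>/dev/null)}" tok
    local -a toks
    read -ra toks <<<"$cmdline"
    for tok in "${toks[@]}"; do
        [[ $tok == "$want" ]] && return 0
    done
    return 1
}

if cmdline_has module.sig_enforce=1; then
    echo "sig_enforce on cmdline: veilor-modules-lock.service self-skips"
else
    echo "sig_enforce absent: runtime module lock would apply"
fi
```

Passing a second argument checks an arbitrary string instead of the live cmdline, which is handy for testing the generated bootloader directive before booting it.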

## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)

Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).

Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.
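
A quick way to confirm the change in a checkout is to read the `WantedBy=` line straight from the unit file. The sketch below assumes a hypothetical helper name `unit_wanted_by` and takes the unit file path as an argument, since the path inside the overlay is not stated here:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): succeed if <target> is listed on any
# WantedBy= line of <unit_file>.
unit_wanted_by() {
    local unit_file="$1" target="$2" line t
    local -a targets
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == WantedBy=* ]] || continue
        read -ra targets <<<"${line#WantedBy=}"
        for t in "${targets[@]}"; do
            [[ $t == "$target" ]] && return 0
        done
    done < "$unit_file"
    return 1
}
```

Calling `unit_wanted_by path/to/veilor-firstboot.service graphical.target` should succeed after this commit and fail before it.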

## 3. USBGuard hash → id-based baseline (Agent 9, A3)

Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.

Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor re-enumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).
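
The intent of the rule can be sketched in shell, assuming `match-all` means that every interface the device exposes must fall inside the listed set (here: class 03, any subclass and protocol). The function name `all_interfaces_hid` is hypothetical and only models the matching logic; it does not talk to USBGuard:

```shell
#!/usr/bin/env bash
# Hypothetical model of the rule's matching logic: succeed only if every
# interface triple (class:subclass:protocol) passed in is HID class 03.
all_interfaces_hid() {
    local iface
    (( $# > 0 )) || return 1          # a device with no interfaces never matches
    for iface in "$@"; do
        [[ $iface == 03:* ]] || return 1
    done
    return 0
}

all_interfaces_hid 03:01:01 03:00:00 && echo "keyboard: allowed"
all_interfaces_hid 03:01:01 08:06:50 || echo "HID + mass-storage composite: not matched"
```

Under that reading, a composite device that adds a mass-storage interface next to a HID one falls through to the implicit block, which is the behavior described above.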

## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)

User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.

Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.

Default zone stays drop. Only the tailscale0 interface gets ACCEPT.
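
A hedged sketch for checking the resulting state on an installed system. `firewall-cmd --get-default-zone` and `firewall-cmd --get-zone-of-interface=` are standard firewalld queries; the helper `check_tailscale_zone` is hypothetical and just encodes the expectation stated above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the default zone is "drop" and the
# tailscale0 interface is bound to "trusted".
check_tailscale_zone() {
    local default_zone="$1" iface_zone="$2"
    [[ $default_zone == drop && $iface_zone == trusted ]]
}

if command -v firewall-cmd >/dev/null 2>&1; then
    default_zone="$(firewall-cmd --get-default-zone 2>/dev/null || true)"
    iface_zone="$(firewall-cmd --get-zone-of-interface=tailscale0 2>/dev/null || true)"
    if check_tailscale_zone "$default_zone" "$iface_zone"; then
        echo "ok: default=drop, tailscale0=trusted"
    else
        echo "unexpected: default=${default_zone:-?}, tailscale0=${iface_zone:-none}"
    fi
fi
```

The query for `tailscale0` is only meaningful after `tailscale up` has created the interface.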

## 5. /etc/skel branding (Agent 7)

Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) did not exist, so the moment the user opened System Settings, KDE
wrote fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.

Seeds:
  /etc/skel/.config/kdeglobals    (copy of assets/kde/veilor-default.kdeglobals)
  /etc/skel/.config/breezerc      (copy of assets/kde/breezerc)
  /etc/skel/.config/kwinrc        (Plasma 6 wayland defaults: opengl, animspeed=0,
                                    blur off, click-to-focus)
  /etc/skel/.config/konsolerc     (default profile = Veilor)
  /etc/skel/.local/share/konsole/Veilor.profile + .colorscheme

User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.

## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)

Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).

Adds to bootloader cmdline (live + installed):
  i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
  rd.vconsole.keymap=us

`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).
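
How the planned rewrite could derive the argument from /etc/vconsole.conf can be sketched as follows. This is not the v0.6 implementation; the function name `keymap_arg_from_vconsole` and the fallback to `us` when no `KEYMAP=` line is found are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "rd.vconsole.keymap=<value>" using the KEYMAP=
# entry of a vconsole.conf file; falls back to "us" (assumed default).
keymap_arg_from_vconsole() {
    local conf="${1:-/etc/vconsole.conf}" line keymap=""
    local re='^KEYMAP="?([^"]*)"?$'
    if [[ -r $conf ]]; then
        while IFS= read -r line || [[ -n $line ]]; do
            # Last KEYMAP= line wins if the file contains more than one.
            [[ $line =~ $re ]] && keymap="${BASH_REMATCH[1]}"
        done < "$conf"
    fi
    echo "rd.vconsole.keymap=${keymap:-us}"
}

keymap_arg_from_vconsole
```

The printed token would replace the placeholder on the bootloader cmdline; how the cmdline is actually rewritten (for example via the bootloader's own tooling) is left open here.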

## 7. virtio-9p log capture (Agent 6)

The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.

test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.

No-op on real hardware (9p tag absent) — VM-only debug.
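
The guest side can be sketched roughly like this. It is not the body of `_dump_logs_to_host`; the names `copy_logs` and `dump_logs_sketch` and the output file names `dmesg.txt` and `journal.txt` are hypothetical, and plain `cp` stands in for the rsync call to keep the sketch dependency-free. The `mount -t 9p -o trans=virtio` invocation is the usual way to attach a QEMU `-virtfs` share by its mount tag:

```shell
#!/usr/bin/env bash
# copy_logs <dest> <file...>: copy whichever of the listed files exist.
# Always returns 0 so it is safe inside an EXIT trap.
copy_logs() {
    local dest="$1" f
    shift
    mkdir -p "$dest" 2>/dev/null || return 0
    for f in "$@"; do
        if [[ -f $f ]]; then
            cp -- "$f" "$dest/" 2>/dev/null || true
        fi
    done
    return 0
}

dump_logs_sketch() {
    local mnt=/mnt/hostlogs
    mkdir -p "$mnt" 2>/dev/null || return 0
    # On real hardware no 9p device carries this tag, so the mount fails
    # and the function returns without doing anything.
    mount -t 9p -o trans=virtio,version=9p2000.L hostlogs "$mnt" 2>/dev/null || return 0
    copy_logs "$mnt" /tmp/{anaconda,program,storage,packaging,dnf}.log \
        /var/log/veilor-installer.log
    dmesg > "$mnt/dmesg.txt" 2>/dev/null || true
    journalctl -b > "$mnt/journal.txt" 2>/dev/null || true
    umount "$mnt" 2>/dev/null || true
}

# In the installer this would be registered once, near the top:
#   trap dump_logs_sketch EXIT
```

Every step tolerates failure, so a missing log file or a missing share never changes the installer's own exit status.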

## Validate

  bash -n overlay/usr/local/bin/veilor-installer  # OK
  ksvalidator kickstart/veilor-os.ks               # clean

## Out-of-scope for v0.5.32 (deferred to v0.6)

Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.

Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
2026-05-05 15:36:24 +01:00

#!/usr/bin/env bash
# Boot veilor-os ISO in KVM/QEMU under UEFI.
# Usage:
# ./test/run-vm.sh # boots latest ISO from build/out
# ./test/run-vm.sh path/to.iso # specific ISO
# SECBOOT=1 ./test/run-vm.sh # use OVMF Secure Boot firmware
# FRESH=1 ./test/run-vm.sh # wipe disk + nvram, re-install from scratch
# NO_INJECT=1 ./test/run-vm.sh # skip SSH-key auto-injection
#
# SSH-key auto-injection (chosen approach: dual — cloud-init NoCloud + QEMU
# monitor sendkey fallback)
# ------------------------------------------------------------------
# Goal: previously each test required logging in at the QEMU console and
# running `passwd -d liveuser`, editing sshd_config, etc. before
# `ssh -p 2222 liveuser@localhost` worked. This script eliminates that.
#
# Primary path (works for the *installed* system, not the live image):
# * Detect host pubkey at ~/.ssh/id_ed25519.pub or ~/.ssh/id_rsa.pub
# * Build a NoCloud cloud-init ISO (user-data + meta-data) via mkisofs/xorriso
# * Mount it as a second virtual cdrom — Anaconda/cloud-init picks it up
# automatically when installing because the seed has the magic
# `cidata` volume label.
#
# Fallback path (works for the *live* image, which doesn't run cloud-init by
# default — dracut-live + livesys-scripts mount squashfs read-only and skip
# cloud-init.target):
# * Open a QEMU monitor unix socket (-monitor unix:...).
# * After ~90s (long enough for SDDM autologin → liveuser; tune via
# VM_BOOT_DELAY), background a helper that pipes a sequence of `sendkey`
# events to the monitor:
# Ctrl+Alt+F2 (drop to TTY)
# "liveuser\n" (log in; no password on the live image)
# "sudo passwd -d liveuser\n"
# "sudo systemctl reload sshd\n"
# This unblocks SSH on port 2222 without manual interaction.
#
# Both paths are best-effort; if the host has no pubkey, both are skipped
# and the script behaves exactly as before.
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
TEST_DIR="$REPO_ROOT/test"
DISK="$TEST_DIR/veilor-vm.qcow2"
NVRAM="$TEST_DIR/veilor-vm.nvram"
SEED_ISO="$TEST_DIR/cloud-init-seed.iso"
MONITOR_SOCK="$TEST_DIR/veilor-vm.monitor.sock"
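# `|| true`: with `set -e -o pipefail` a failing `ls` (no ISO built yet) would
# otherwise abort here, before the friendly error message below is printed.
ISO="${1:-$(ls -t "$REPO_ROOT"/build/out/*.iso 2>/dev/null | head -1 || true)}"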
[[ -n ${ISO:-} && -f $ISO ]] || { echo "[ERR] No ISO found. Build first: ./build/build-iso.sh"; exit 1; }
# OVMF firmware selection
if [[ "${SECBOOT:-0}" == "1" ]]; then
  OVMF_CODE=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd
  OVMF_VARS_SRC=/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd
  NVRAM="$TEST_DIR/veilor-vm.nvram.secboot"
else
  OVMF_CODE=/usr/share/edk2/ovmf/OVMF_CODE.fd
  OVMF_VARS_SRC=/usr/share/edk2/ovmf/OVMF_VARS.fd
fi
# Reset on FRESH=1
if [[ "${FRESH:-0}" == "1" ]]; then
  rm -f "$DISK" "$NVRAM" "$SEED_ISO"
fi
# Provision disk + per-VM nvram once
[[ -f $DISK ]] || qemu-img create -f qcow2 "$DISK" 40G
[[ -f $NVRAM ]] || cp "$OVMF_VARS_SRC" "$NVRAM"
# ── Locate host SSH pubkey (ed25519 preferred, rsa fallback) ──
HOST_PUBKEY=""
if [[ "${NO_INJECT:-0}" != "1" ]]; then
  for cand in "$HOME/.ssh/id_ed25519.pub" "$HOME/.ssh/id_rsa.pub"; do
    if [[ -f $cand ]]; then
      HOST_PUBKEY="$(< "$cand")"
      echo "[INFO] using host pubkey: $cand"
      break
    fi
  done
fi
# ── Build cloud-init NoCloud seed ISO (primary path) ──
SEED_ARGS=()
if [[ -n $HOST_PUBKEY ]]; then
  SEED_DIR="$(mktemp -d)"
  trap 'rm -rf "$SEED_DIR"' EXIT
  # Heredoc bodies stay flush-left: the YAML nesting below is significant.
  cat > "$SEED_DIR/meta-data" <<EOF
instance-id: veilor-test-vm
local-hostname: veilor-test
EOF
  cat > "$SEED_DIR/user-data" <<EOF
#cloud-config
users:
  - name: liveuser
    ssh_authorized_keys:
      - $HOST_PUBKEY
  - name: admin
    ssh_authorized_keys:
      - $HOST_PUBKEY
    lock_passwd: false
    passwd:
ssh_pwauth: true
runcmd:
  - rm -f /etc/ssh/sshd_config.d/10-veilor-hardening.conf
  - systemctl reload sshd || systemctl restart sshd || true
EOF
  # Build NoCloud ISO. Volume label MUST be "cidata" (case-insensitive)
  # for cloud-init's NoCloud datasource to pick it up.
  if command -v mkisofs >/dev/null 2>&1; then
    mkisofs -quiet -output "$SEED_ISO" \
      -volid cidata -joliet -rock \
      "$SEED_DIR/user-data" "$SEED_DIR/meta-data"
  elif command -v xorriso >/dev/null 2>&1; then
    xorriso -as mkisofs -quiet -output "$SEED_ISO" \
      -volid cidata -joliet -rock \
      "$SEED_DIR/user-data" "$SEED_DIR/meta-data"
  elif command -v cloud-localds >/dev/null 2>&1; then
    cloud-localds "$SEED_ISO" "$SEED_DIR/user-data" "$SEED_DIR/meta-data"
  else
    echo "[WARN] no mkisofs/xorriso/cloud-localds; skipping cloud-init seed"
    SEED_ISO=""
  fi
  if [[ -n $SEED_ISO && -f $SEED_ISO ]]; then
    echo "[INFO] cloud-init seed ISO: $SEED_ISO"
    SEED_ARGS=(-drive "file=$SEED_ISO,media=cdrom,readonly=on")
  fi
fi
# ── QEMU monitor unix socket ──
# Always exposed so the host can drive the VM via `socat - UNIX-CONNECT:...`
# (sendkey, screendump, etc.) for debugging. Independent of pubkey injection.
rm -f "$MONITOR_SOCK"
MONITOR_ARGS=(-monitor "unix:$MONITOR_SOCK,server,nowait")
# ── Auto-inject helper (live ISO doesn't run cloud-init) ──
# Started in the background after a delay; sends keypresses through the
# QEMU monitor unix socket to drop to a TTY and unblock SSH for liveuser.
if [[ -n $HOST_PUBKEY ]]; then
  (
    # Wait for the VM to reach a usable login prompt (SDDM autologin →
    # liveuser session is the most realistic target). 90s is enough on
    # KVM/4 vCPUs; tune via VM_BOOT_DELAY if needed.
    sleep "${VM_BOOT_DELAY:-90}"
    [[ -S $MONITOR_SOCK ]] || exit 0
    # send_chord <key1> [key2 ...]: keys are joined with '-' into a single
    # `sendkey` chord, which QEMU presses and releases as one event.
    send_chord() {
      local IFS='-'
      local chord="$*"
      printf 'sendkey %s\n' "$chord"
    }
    # send_str <text>: types one key per character. Only [a-zA-Z0-9], space
    # and - _ / . & are mapped (US layout); anything else is skipped with a
    # warning on stderr.
    send_str() {
      local s="$1" ch
      local i=0
      while (( i < ${#s} )); do
        ch="${s:i:1}"
        case "$ch" in
          ' ') printf 'sendkey spc\n' ;;
          [a-z0-9]) printf 'sendkey %s\n' "$ch" ;;
          [A-Z]) printf 'sendkey shift-%s\n' "${ch,,}" ;;
          '-') printf 'sendkey minus\n' ;;
          '_') printf 'sendkey shift-minus\n' ;;
          '/') printf 'sendkey slash\n' ;;
          '.') printf 'sendkey dot\n' ;;
          '&') printf 'sendkey shift-7\n' ;;
          *) echo "[WARN] send_str: no key mapping for '$ch', skipped" >&2 ;;
        esac
        i=$((i+1))
      done
    }
    {
      send_chord ctrl alt f2
      sleep 1
      # Type: liveuser <enter> (no password by default on live)
      send_str "liveuser"
      printf 'sendkey ret\n'
      sleep 2
      send_str "sudo passwd -d liveuser"
      printf 'sendkey ret\n'
      sleep 1
      send_str "sudo systemctl reload sshd"
      printf 'sendkey ret\n'
    } | socat - "UNIX-CONNECT:$MONITOR_SOCK" 2>/dev/null || true
  ) &
  INJECT_PID=$!
  trap 'kill $INJECT_PID 2>/dev/null || true; rm -f "$MONITOR_SOCK"; rm -rf "${SEED_DIR:-}"' EXIT
fi
echo "════════════════════════════════════════════════════════"
echo " veilor-os :: VM test"
echo " ISO : $ISO"
echo " Disk : $DISK"
echo " NVRAM : $NVRAM"
echo " Seed : ${SEED_ISO:-<none>}"
# Anaconda virtio-serial log channel.
#
# Anaconda 43.x autodetects /dev/virtio-ports/org.fedoraproject.anaconda.log.0
# and streams program/packaging/storage/anaconda logs through it in real
# time, before any tmpfs / pivot, before networking. Survives kernel
# panic. The host gets a tail-able file. No anaconda CLI flag, no
# kickstart change, just the QEMU virtio-serial wiring.
#
# We've lost logs three times in a row to anaconda failures + tmpfs
# reboots. Wiring this up so future failures auto-capture.
# Take the timestamp once so the log file and the run directory always match.
RUN_STAMP="$(date +%Y%m%d-%H%M%S)"
ANACONDA_LOG="$TEST_DIR/anaconda-vm-$RUN_STAMP.log"
ANACONDA_LOG_DIR="$TEST_DIR/test-runs/$RUN_STAMP"
mkdir -p "$ANACONDA_LOG_DIR"
ANACONDA_LOG_ARGS=(
  # Belt: virtio-serial (anaconda's setupVirtio rsyslog forward; fragile,
  # depends on rsyslog being installed in the live ISO).
  -chardev "file,id=anaclog,path=$ANACONDA_LOG"
  -device virtio-serial-pci,id=vs1
  -device "virtserialport,chardev=anaclog,bus=vs1.0,name=org.fedoraproject.anaconda.log.0"
  # Braces: virtio-9p host directory share. veilor-installer mounts this
  # at /mnt/hostlogs and rsyncs /tmp/*.log there post-anaconda.
  -virtfs "local,path=$ANACONDA_LOG_DIR,mount_tag=hostlogs,security_model=mapped-xattr,id=hostlogs"
)
echo " AnaLog : $ANACONDA_LOG"
echo " HostFS : $ANACONDA_LOG_DIR (9p tag: hostlogs)"
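# ${VAR:+a}${VAR:-b} would print "a" followed by the value of VAR whenever VAR
# is set (e.g. "secboot1", or "yes" plus the whole pubkey), so test explicitly.
echo " Mode : $([[ "${SECBOOT:-0}" == "1" ]] && echo secboot || echo "stock UEFI")"
echo " Inject: $([[ -n $HOST_PUBKEY ]] && echo yes || echo "no (no host pubkey or NO_INJECT=1)")"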
echo "════════════════════════════════════════════════════════"
# Deliberately not exec'd: exec replaces the shell, so the EXIT trap set above
# (stop the inject helper, remove the monitor socket and seed dir) would never run.
qemu-system-x86_64 \
  -name veilor-os \
  -enable-kvm \
  -cpu host \
  -smp 4 \
  -m 4096 \
  -machine q35,smm=on \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive if=pflash,format=raw,readonly=on,file="$OVMF_CODE" \
  -drive if=pflash,format=raw,file="$NVRAM" \
  -drive file="$DISK",if=virtio,format=qcow2,cache=writeback \
  -drive file="$ISO",media=cdrom,readonly=on \
  "${SEED_ARGS[@]}" \
  "${MONITOR_ARGS[@]}" \
  "${ANACONDA_LOG_ARGS[@]}" \
  -boot menu=on,splash-time=2000 \
  -netdev user,id=net0,hostfwd=tcp::2222-:22 \
  -device virtio-net-pci,netdev=net0 \
  -device virtio-rng-pci \
  -vga virtio \
  -display gtk,gl=on \
  -audiodev pa,id=snd0 \
  -device intel-hda \
  -device hda-output,audiodev=snd0