Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).
## Layer 1: broad error suppression in transaction_progress.py
dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.
The v0.5.30 patch turns the `elif token == 'error':` branch in
`process_transaction_progress` from a raise into a `log.warning`. All
four producers (cpio_error, script_error, unpack_error, generic error)
now end in a warning and the transaction continues. The patch pattern
matches both the original anaconda layout and the v0.5.29
narrow-patched layout, so it applies on top of either, and re-applying
it to an already patched file is a no-op.
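
The idempotency property can be illustrated with a marker-guarded `sed` rewrite. This is only a sketch: the commit does not say how the kickstart applies the patch, so the use of GNU `sed`, the marker text, and the names `PATCH_MARKER` and `patch_error_branch` are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only. PATCH_MARKER and patch_error_branch are hypothetical names;
# GNU sed (-i, -E) is assumed.
PATCH_MARKER='# veilor: transaction error downgraded to warning'

patch_error_branch() {
    local file="$1"

    # Idempotency guard: if the marker is present, the file is already
    # patched and is left untouched.
    if grep -qF -- "$PATCH_MARKER" "$file"; then
        return 0
    fi

    # Only inside the range that starts at the `elif token == 'error':`
    # line and ends at the next `raise`, rewrite `raise SomeError(args)`
    # into `log.warning(args)` and append the marker comment.
    sed -i -E \
        "/elif token == 'error':/,/raise /s/raise [A-Za-z_.]+\((.*)\)[[:space:]]*\$/log.warning(\1)  ${PATCH_MARKER}/" \
        "$file"
}
```

Running `patch_error_branch` twice on the same file leaves it byte-identical after the second run, because the guard sees the marker written by the first run. Raises in other branches (for example a `quit` branch) fall outside the `sed` range and are not touched.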
This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.
## Layer 2: bootloader install moved out of anaconda
The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:
1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64 efibootmgr`:
   re-runs scriptlets in the chroot with full PID 1 systemd state, so
   the systemd-run-style triggers that anaconda's chroot truncates
   actually execute.
2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=fedora --no-nvram`:
   populates /boot/efi/EFI/fedora/.
3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or the
   `grub2-mkconfig` fallback): writes /boot/efi/EFI/fedora/grub.cfg.
4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`:
   registers the NVRAM boot entry pointing at the signed shim.
Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.
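
A minimal sketch of the log-and-continue pattern for these four steps follows. The helper names `run_step`, `FAILED_STEPS` and `install_bootloader` are hypothetical, the `-y` flag on `dnf` is an assumption for non-interactive use, and the `grub2-mkconfig` fallback is left out. The real %post block in veilor-installer may be structured differently.

```bash
#!/usr/bin/env bash
# Sketch only: run_step, FAILED_STEPS and install_bootloader are
# hypothetical names, not taken from the installer.
FAILED_STEPS=0

# Run one step, log its exit code, and always return 0 so the caller
# carries on with the next step.
run_step() {
    local label="$1" rc=0
    shift
    echo "[post] >>> ${label}: $*"
    "$@" || rc=$?
    if (( rc != 0 )); then
        FAILED_STEPS=$((FAILED_STEPS + 1))
        echo "[post] !!! ${label} failed with exit code ${rc}, continuing"
    fi
    return 0
}

# The four steps from the list above. Disk and partition stay
# caller-supplied; nothing here guesses them.
install_bootloader() {
    local disk="$1" part="$2"
    run_step reinstall dnf -y reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64 efibootmgr
    run_step grub-install grub2-install --target=x86_64-efi --efi-directory=/boot/efi \
        --bootloader-id=fedora --no-nvram
    run_step cfgstub gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora
    run_step nvram efibootmgr -c -d "$disk" -p "$part" -L "veilor-os" -l '\EFI\fedora\shimx64.efi'
}
```

Capturing the exit code with `"$@" || rc=$?` keeps the helper safe even if the surrounding script runs under `set -e`, and the counter gives the end of %post one number to print in the install log.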
## Layer 3: virtio-serial log capture in run-vm.sh
Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.
We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.
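
Because every run writes a new timestamped file, a small helper for picking the most recent one can be handy on the host. The function name `latest_anaconda_log` is hypothetical, and the sketch relies on the `YYYYMMDD-HHMMSS` naming so that sorted glob order equals chronological order.

```bash
#!/usr/bin/env bash
# Sketch only: latest_anaconda_log is a hypothetical helper name.
# Prints the newest anaconda-vm-*.log in the given directory (default:
# test), or returns 1 if there is none.
latest_anaconda_log() {
    local dir="${1:-test}" f newest=""
    # Globs expand in sorted order, so the last existing match carries
    # the latest timestamp in its name.
    for f in "$dir"/anaconda-vm-*.log; do
        if [[ -e $f ]]; then newest="$f"; fi
    done
    [[ -n $newest ]] || return 1
    printf '%s\n' "$newest"
}
```

While a VM is running, `tail -F "$(latest_anaconda_log test)"` follows the current install from the repository root.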
## Diagnostic story
Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.
After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.
Files changed:
  kickstart/veilor-os.ks: broad error suppression patch
  overlay/usr/local/bin/veilor-installer: --location=none + manual grub
  test/run-vm.sh: virtio-serial chardev wiring
Verified: bash -n clean, ksvalidator clean.
245 lines
9.3 KiB
Bash
Executable file
#!/usr/bin/env bash
# Boot veilor-os ISO in KVM/QEMU under UEFI.
# Usage:
#   ./test/run-vm.sh                 # boots latest ISO from build/out
#   ./test/run-vm.sh path/to.iso     # specific ISO
#   SECBOOT=1 ./test/run-vm.sh       # use OVMF Secure Boot firmware
#   FRESH=1 ./test/run-vm.sh         # wipe disk + nvram, re-install from scratch
#   NO_INJECT=1 ./test/run-vm.sh     # skip SSH-key auto-injection
#
# SSH-key auto-injection (chosen approach: dual, cloud-init NoCloud + QEMU
# monitor sendkey fallback)
# ------------------------------------------------------------------
# Goal: previously each test required logging in at the QEMU console and
# running `passwd -d liveuser`, editing sshd_config, etc. before
# `ssh -p 2222 liveuser@localhost` worked. This script eliminates that.
#
# Primary path (works for the *installed* system, not the live image):
#   * Detect host pubkey at ~/.ssh/id_ed25519.pub or ~/.ssh/id_rsa.pub
#   * Build a NoCloud cloud-init ISO (user-data + meta-data) via mkisofs/xorriso
#   * Mount it as a second virtual cdrom; Anaconda/cloud-init picks it up
#     automatically when installing because the seed has the magic
#     `cidata` volume label.
#
# Fallback path (works for the *live* image, which doesn't run cloud-init by
# default: dracut-live + livesys-scripts mount squashfs read-only and skip
# cloud-init.target):
#   * Open a QEMU monitor unix socket (-monitor unix:...).
#   * After ~90s (long enough for SDDM autologin → liveuser), background a
#     helper that pipes a sequence of `sendkey` events to the monitor:
#       Ctrl+Alt+F2 (drop to TTY)
#       "sudo passwd -d liveuser && sudo systemctl reload sshd\n"
#     This unblocks SSH on port 2222 without manual interaction.
#
# Both paths are best-effort; if the host has no pubkey, both are skipped
# and the script behaves exactly as before.

set -euo pipefail

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
TEST_DIR="$REPO_ROOT/test"
DISK="$TEST_DIR/veilor-vm.qcow2"
NVRAM="$TEST_DIR/veilor-vm.nvram"
SEED_ISO="$TEST_DIR/cloud-init-seed.iso"
MONITOR_SOCK="$TEST_DIR/veilor-vm.monitor.sock"

# `|| true` keeps errexit + pipefail from killing the script inside the
# substitution when no ISO exists, so the [ERR] message below is reached.
ISO="${1:-$(ls -t "$REPO_ROOT"/build/out/*.iso 2>/dev/null | head -1 || true)}"
[[ -n ${ISO:-} && -f $ISO ]] || { echo "[ERR] No ISO found. Build first: ./build/build-iso.sh"; exit 1; }

# OVMF firmware selection
if [[ "${SECBOOT:-0}" == "1" ]]; then
    OVMF_CODE=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd
    OVMF_VARS_SRC=/usr/share/edk2/ovmf/OVMF_VARS.secboot.fd
    NVRAM="$TEST_DIR/veilor-vm.nvram.secboot"
else
    OVMF_CODE=/usr/share/edk2/ovmf/OVMF_CODE.fd
    OVMF_VARS_SRC=/usr/share/edk2/ovmf/OVMF_VARS.fd
fi

# Reset on FRESH=1
if [[ "${FRESH:-0}" == "1" ]]; then
    rm -f "$DISK" "$NVRAM" "$SEED_ISO"
fi

# Provision disk + per-VM nvram once
[[ -f $DISK ]] || qemu-img create -f qcow2 "$DISK" 40G
[[ -f $NVRAM ]] || cp "$OVMF_VARS_SRC" "$NVRAM"

# ── Locate host SSH pubkey (ed25519 preferred, rsa fallback) ──
HOST_PUBKEY=""
if [[ "${NO_INJECT:-0}" != "1" ]]; then
    for cand in "$HOME/.ssh/id_ed25519.pub" "$HOME/.ssh/id_rsa.pub"; do
        if [[ -f $cand ]]; then
            HOST_PUBKEY="$(< "$cand")"
            echo "[INFO] using host pubkey: $cand"
            break
        fi
    done
fi

# ── Build cloud-init NoCloud seed ISO (primary path) ──
SEED_ARGS=()
if [[ -n $HOST_PUBKEY ]]; then
    SEED_DIR="$(mktemp -d)"
    trap 'rm -rf "$SEED_DIR"' EXIT

    cat > "$SEED_DIR/meta-data" <<EOF
instance-id: veilor-test-vm
local-hostname: veilor-test
EOF

    cat > "$SEED_DIR/user-data" <<EOF
#cloud-config
users:
  - name: liveuser
    ssh_authorized_keys:
      - $HOST_PUBKEY
  - name: admin
    ssh_authorized_keys:
      - $HOST_PUBKEY
    lock_passwd: false
    passwd:
ssh_pwauth: true
runcmd:
  - rm -f /etc/ssh/sshd_config.d/10-veilor-hardening.conf
  - systemctl reload sshd || systemctl restart sshd || true
EOF

    # Build NoCloud ISO. Volume label MUST be "cidata" (case-insensitive)
    # for cloud-init's NoCloud datasource to pick it up.
    if command -v mkisofs >/dev/null 2>&1; then
        mkisofs -quiet -output "$SEED_ISO" \
            -volid cidata -joliet -rock \
            "$SEED_DIR/user-data" "$SEED_DIR/meta-data"
    elif command -v xorriso >/dev/null 2>&1; then
        xorriso -as mkisofs -quiet -output "$SEED_ISO" \
            -volid cidata -joliet -rock \
            "$SEED_DIR/user-data" "$SEED_DIR/meta-data"
    elif command -v cloud-localds >/dev/null 2>&1; then
        cloud-localds "$SEED_ISO" "$SEED_DIR/user-data" "$SEED_DIR/meta-data"
    else
        echo "[WARN] no mkisofs/xorriso/cloud-localds, skipping cloud-init seed"
        SEED_ISO=""
    fi

    if [[ -n $SEED_ISO && -f $SEED_ISO ]]; then
        echo "[INFO] cloud-init seed ISO: $SEED_ISO"
        SEED_ARGS=(-drive "file=$SEED_ISO,media=cdrom,readonly=on")
    fi
fi

# ── QEMU monitor unix socket ──
# Always exposed so the host can drive the VM via `socat - UNIX-CONNECT:...`
# (sendkey, screendump, etc.) for debugging. Independent of pubkey injection.
rm -f "$MONITOR_SOCK"
MONITOR_ARGS=(-monitor "unix:$MONITOR_SOCK,server,nowait")

# ── Auto-inject helper (live ISO doesn't run cloud-init) ──
# Started in the background after a delay; sends keypresses through the
# QEMU monitor unix socket to drop to a TTY and unblock SSH for liveuser.
if [[ -n $HOST_PUBKEY ]]; then

    (
        # Wait for the VM to reach a usable login prompt (SDDM autologin →
        # liveuser session is the most realistic target). 90s is enough on
        # KVM/4 vCPUs; tune via VM_BOOT_DELAY if needed.
        sleep "${VM_BOOT_DELAY:-90}"
        [[ -S $MONITOR_SOCK ]] || exit 0

        # send_chord <key1> [key2 ...]: chord released between calls
        send_chord() {
            local IFS='-'
            local chord="$*"
            printf 'sendkey %s\n' "$chord"
        }

        # send_str <text>: only ASCII printable + space + return
        send_str() {
            local s="$1" ch
            local i=0
            while (( i < ${#s} )); do
                ch="${s:i:1}"
                case "$ch" in
                    ' ')      printf 'sendkey spc\n' ;;
                    [a-z0-9]) printf 'sendkey %s\n' "$ch" ;;
                    [A-Z])    printf 'sendkey shift-%s\n' "${ch,,}" ;;
                    '-')      printf 'sendkey minus\n' ;;
                    '_')      printf 'sendkey shift-minus\n' ;;
                    '/')      printf 'sendkey slash\n' ;;
                    '.')      printf 'sendkey dot\n' ;;
                    '&')      printf 'sendkey shift-7\n' ;;
                esac
                i=$((i+1))
            done
        }

        {
            send_chord ctrl alt f2
            sleep 1
            # Type: liveuser <enter> (no password by default on live)
            send_str "liveuser"
            printf 'sendkey ret\n'
            sleep 2
            send_str "sudo passwd -d liveuser"
            printf 'sendkey ret\n'
            sleep 1
            send_str "sudo systemctl reload sshd"
            printf 'sendkey ret\n'
        } | socat - "UNIX-CONNECT:$MONITOR_SOCK" 2>/dev/null || true
    ) &
    INJECT_PID=$!
    trap 'kill $INJECT_PID 2>/dev/null || true; rm -f "$MONITOR_SOCK"; rm -rf "${SEED_DIR:-}"' EXIT
fi

echo "════════════════════════════════════════════════════════"
echo "  veilor-os :: VM test"
echo "  ISO   : $ISO"
echo "  Disk  : $DISK"
echo "  NVRAM : $NVRAM"
echo "  Seed  : ${SEED_ISO:-<none>}"
# Anaconda virtio-serial log channel.
#
# Anaconda 43.x autodetects /dev/virtio-ports/org.fedoraproject.anaconda.log.0
# and streams program/packaging/storage/anaconda logs through it in real
# time, before any tmpfs / pivot, before networking. Survives kernel
# panic. The host gets a tail-able file. No anaconda CLI flag, no
# kickstart change, just the QEMU virtio-serial wiring.
#
# We've lost logs three times in a row to anaconda failures + tmpfs
# reboots. Wiring this up so future failures auto-capture.
ANACONDA_LOG="$TEST_DIR/anaconda-vm-$(date +%Y%m%d-%H%M%S).log"
ANACONDA_LOG_ARGS=(
    -chardev "file,id=anaclog,path=$ANACONDA_LOG"
    -device virtio-serial-pci,id=vs1
    -device "virtserialport,chardev=anaclog,bus=vs1.0,name=org.fedoraproject.anaconda.log.0"
)
echo "  AnaLog: $ANACONDA_LOG"

# Plain if/else here: ${VAR:+a}${VAR:-b} also prints VAR's own value when
# it is set (e.g. "secboot1", or "yes" followed by the whole pubkey).
if [[ "${SECBOOT:-0}" == "1" ]]; then MODE="secboot"; else MODE="stock UEFI"; fi
if [[ -n $HOST_PUBKEY ]]; then INJECT="yes"; else INJECT="no (no host pubkey)"; fi
echo "  Mode  : $MODE"
echo "  Inject: $INJECT"
echo "════════════════════════════════════════════════════════"

# Deliberately not exec'd: exec replaces this shell, so the EXIT trap
# (inject-helper kill, monitor socket and seed dir cleanup) would never run.
qemu-system-x86_64 \
    -name veilor-os \
    -enable-kvm \
    -cpu host \
    -smp 4 \
    -m 4096 \
    -machine q35,smm=on \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive if=pflash,format=raw,readonly=on,file="$OVMF_CODE" \
    -drive if=pflash,format=raw,file="$NVRAM" \
    -drive file="$DISK",if=virtio,format=qcow2,cache=writeback \
    -drive file="$ISO",media=cdrom,readonly=on \
    "${SEED_ARGS[@]}" \
    "${MONITOR_ARGS[@]}" \
    "${ANACONDA_LOG_ARGS[@]}" \
    -boot menu=on,splash-time=2000 \
    -netdev user,id=net0,hostfwd=tcp::2222-:22 \
    -device virtio-net-pci,netdev=net0 \
    -device virtio-rng-pci \
    -vga virtio \
    -display gtk,gl=on \
    -audiodev pa,id=snd0 \
    -device intel-hda \
    -device hda-output,audiodev=snd0