From a0b0d02bf29110430bd23cacbdda078d7f3cbf13 Mon Sep 17 00:00:00 2001
From: veilor-org
Date: Tue, 5 May 2026 13:47:23 +0100
Subject: [PATCH] v0.5.31: kernel-install via /etc/kernel/cmdline + set-e leak + rescue glob

Four-bug fix from 4-agent verification wave on v0.5.30 outcome.

Bug 1 CRITICAL: --location=none made anaconda skip
CollectKernelArgumentsTask (installation.py:149-151). The --append= args
were never collected, so BLS entries were written with an empty cmdline.
Drop --location=none and let anaconda run its bootloader path; the broad
transaction_progress patch already silences the gen_grub_cfgstub class
of failure.

Bug 2 CRITICAL: kernel-install reads /etc/kernel/cmdline as its source
of truth (per 90-loaderentry.install:84-95). Veilor never wrote that
file, so kernel-install fell through to /proc/cmdline (the live ISO's).
Add a 3-path write: /etc/kernel/cmdline (Path A, canonical),
/etc/default/grub (Path B, legacy), grubby --update-kernel=ALL (Path C,
last-writer guard). Plus an explicit kernel-install add per kernel after
the Path A write.

Bug 3: the rescue BLS glob *-0-rescue-*.conf required a trailing hyphen;
F43 uses *-0-rescue.conf. Fix: *-0-rescue*.conf (matches both).

Bug 4: set +e/set -e scope leak in %post. v0.5.30 closed the manual
bootloader block with set -e, which re-enabled errexit for the rest of
%post even though that code was authored with set +e semantics. Result:
any non-guarded command failure aborted the LUKS args injection block.
Fix: remove the closing set -e.

Files: overlay/usr/local/bin/veilor-installer.
Verified: bash -n clean, ksvalidator clean.
---
 overlay/usr/local/bin/veilor-installer | 135 ++++++++++++++++---------
 1 file changed, 89 insertions(+), 46 deletions(-)

diff --git a/overlay/usr/local/bin/veilor-installer b/overlay/usr/local/bin/veilor-installer
index 2cfbe36..4a0d328 100644
--- a/overlay/usr/local/bin/veilor-installer
+++ b/overlay/usr/local/bin/veilor-installer
@@ -407,21 +407,27 @@ __SSHKEY_DIRECTIVE__
 # - `fbcon=nodefer` — keep linux framebuffer console alive through KMS
 #   handoff so plymouth LUKS prompt remains visible on real GPUs.
 #
-# `--location=none` — DO NOT let anaconda install the bootloader. v0.5.30
-# moved bootloader install to %post chroot below for two reasons:
-#   1. Anaconda's gen_grub_cfgstub script (efi.py:194-201) runs
-#      against an /boot/efi/EFI/fedora/ tree that grub2-efi-x64's
-#      posttrans scriptlet may not have populated yet — Fedora 43's
-#      RPM 6.0 + dnf5 + cmdline-mode anaconda combo is brittle here.
-#      Reproduced as "gen_grub_cfgstub script failed" twice.
-#   2. Running grub2-install + grub2-mkconfig directly in %post lets
-#      us pick up the env after anaconda finishes the package
-#      transaction, with all scriptlets' file artifacts settled, and
-#      gives clearer error messages if anything goes wrong.
-# We still install the packages (grub2-efi-x64, shim-x64, efibootmgr)
-# via %packages — anaconda just doesn't auto-invoke its bootloader code
-# path.
-bootloader --location=none --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
+# NOTE on --location: v0.5.30 used --location=none to skip anaconda's
+# bootloader install (sidestep gen_grub_cfgstub). Side effect: anaconda
+# also skipped CollectKernelArgumentsTask (installation.py:149-151), so
+# `--append=` args were NEVER COLLECTED. kernel-install then wrote BLS
+# entries with empty /etc/kernel/cmdline, falling through to the live
+# ISO's /proc/cmdline — no rd.luks.uuid, no fbcon=nodefer, no hardening.
+# Result: dracut emergency shell on first boot.
+#
+# v0.5.31 lets anaconda install the bootloader (default behavior, no
+# --location flag). With our broad transaction_progress patch in the
+# live ks, anaconda's gen_grub_cfgstub still runs, but if grub2-efi-x64's
+# posttrans had a non-fatal scriptlet failure the patch swallows it
+# without aborting. The %post chroot below STILL does belt-and-braces
+# fixup (dnf reinstall, grub2-install, etc.) in case anaconda's path
+# left something incomplete.
+#
+# Critically v0.5.31 also writes /etc/kernel/cmdline FIRST in %post then
+# re-runs kernel-install per kernel. That's the canonical Fedora 43 path
+# for landing args in BLS entries — kernel-install reads /etc/kernel/cmdline
+# (90-loaderentry.install:84-95) when generating BLS option lines.
+bootloader --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"

 # Disk: zero, LUKS2 (argon2id), btrfs subvolumes (no LVM intermediary).
 # Native btrfs-on-LUKS matches Fedora KDE Spin defaults; LVM+btrfs combo
@@ -664,7 +670,13 @@ if [ -n "$EFI_DISK" ] && [ -e "$EFI_DISK" ]; then
 fi

 echo "[INFO] bootloader install: see above for any [WARN] lines"
-set -e
+# NOTE: deliberately NOT `set -e` here. The block above opened with
+# `set +e` and the rest of %post is a sequence of best-effort hardening
+# steps that have local `|| true` guards on the operations that may
+# legitimately fail. Re-enabling errexit would cause `set -e` to abort
+# the whole %post on the first non-guarded command (e.g. a `grep -q`
+# returning 1). v0.5.30 had this bug and it silently truncated
+# the LUKS args injection.

 # GRUB branding: replace fedora distributor with veilor-os in menu titles.
 # Drop rhgb quiet from default cmdline → all kernel/systemd messages
@@ -696,45 +708,73 @@ sed -i \
 # user lands in emergency shell on first boot.
 LUKS_UUID=$(blkid -t TYPE=crypto_LUKS -o value -s UUID 2>/dev/null | head -1)
 if [ -n "$LUKS_UUID" ]; then
-    # Args:
-    #   rd.luks.uuid=luks-XXX — tells dracut to expect a LUKS device,
-    #       triggers cryptsetup-generator.
-    #   rd.luks.options=...=tries=5 — five typo retries before giving up
-    #       (default 1; one slip = emergency
-    #       shell after 3min, terrible UX).
-    #   rd.luks.options=...=timeout=0 — never time out unlock device wait
-    #       (default 1m30s; slow user typing
-    #       on a long passphrase still works).
-    #   fbcon=nodefer — keep linux framebuffer console alive
-    #       through KMS handoff. Without this on
-    #       real laptops the plymouth LUKS prompt
-    #       draws into a frozen framebuffer and
-    #       the user sees a black screen with a
-    #       blinking cursor. Already in the live
-    #       ISO bootloader cmdline; missing from
-    #       the installed-system bootloader line
-    #       in the generated install ks above
-    #       (also fixed there).
-    LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID} rd.luks.options=luks-${LUKS_UUID}=tries=5,timeout=0 fbcon=nodefer"
+    LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID} rd.luks.options=luks-${LUKS_UUID}=tries=5,timeout=0"
+    HARDEN_ARGS="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"

-    # Path 1: persist into /etc/default/grub so future kernels inherit.
+    # Find the running root UUID (the btrfs filesystem holding the root
+    # subvol). At this point in %post chroot, `/` is the target root;
+    # findmnt -o UUID resolves to the btrfs UUID anaconda chose.
+    ROOT_UUID=$(findmnt -n -o UUID /)
+    [ -z "$ROOT_UUID" ] && ROOT_UUID=$(blkid -s UUID -o value /dev/mapper/luks-${LUKS_UUID} 2>/dev/null)
+
+    # Three write paths, in priority order:
+    #
+    # Path A: /etc/kernel/cmdline (the canonical source of truth for
+    #   `kernel-install`). Per /usr/lib/kernel/install.d/90-loaderentry.install
+    #   lines 84-95, kernel-install reads /etc/kernel/cmdline first when
+    #   authoring BLS entries. If we write this BEFORE re-running
+    #   kernel-install, every BLS entry inherits our args. Persists
+    #   across `dnf upgrade kernel`, `dnf reinstall grub2-*`, and any
+    #   other path that re-fires kernel-install hooks.
+    #
+    # Path B: /etc/default/grub (legacy GRUB_CMDLINE_LINUX). Read by
+    #   `grub2-mkconfig` for the generated grub.cfg. Belt-and-braces;
+    #   kernel-install ignores this, but grub2-mkconfig respects it.
+    #
+    # Path C: grubby --update-kernel=ALL. Direct edit to BLS option
+    #   lines. Acts as the last-writer in case our cmdline write didn't
+    #   trigger a fresh kernel-install pass.
+    #
+    # Earlier veilor-os versions only used B+C. v0.5.31 adds Path A as
+    # the primary, because v0.5.30 testing showed B+C are racy with
+    # anaconda's own CreateBLSEntriesTask which uses kernel-install
+    # internally and can rewrite entries from empty /etc/kernel/cmdline,
+    # producing options lines with no rd.luks.uuid even when grubby
+    # successfully ran.
+
+    # Path A
+    mkdir -p /etc/kernel
+    if [ -n "$ROOT_UUID" ]; then
+        echo "root=UUID=${ROOT_UUID} ro rootflags=subvol=root ${LUKS_ARGS} ${HARDEN_ARGS}" > /etc/kernel/cmdline
+        echo "[INFO] wrote /etc/kernel/cmdline (canonical kernel-install source)"
+    else
+        echo "[WARN] could not determine root UUID; /etc/kernel/cmdline not written"
+    fi
+
+    # Path B
     if ! grep -q "rd.luks.uuid" /etc/default/grub 2>/dev/null; then
         sed -i "s|^GRUB_CMDLINE_LINUX=\"|GRUB_CMDLINE_LINUX=\"${LUKS_ARGS} |" /etc/default/grub
     fi

-    # Path 2: update existing BLS entries so the kernel that boots NEXT
-    # gets the args. grubby walks /boot/loader/entries/*.conf and edits
-    # the `options` line in-place.
+    # Re-run kernel-install for every kernel — picks up new /etc/kernel/cmdline,
+    # rewrites BLS entries with our args. This is the load-bearing step.
+    for kver in /lib/modules/*/; do
+        kver=$(basename "$kver")
+        [ -f "/lib/modules/$kver/vmlinuz" ] || continue
+        kernel-install add "$kver" "/lib/modules/$kver/vmlinuz" 2>&1 | tail -3 || \
+            echo "[WARN] kernel-install add $kver failed"
+    done
+
+    # Path C: belt-and-braces grubby update in case kernel-install missed any
     grubby --update-kernel=ALL --args="${LUKS_ARGS}" 2>&1 | tail -5 || true

-    # Verification: every BLS entry MUST carry the LUKS arg now. Empty
-    # output = success.
+    # Verification: every BLS entry MUST carry the LUKS arg.
     drift=$(grep -L "rd.luks.uuid" /boot/loader/entries/*.conf 2>/dev/null)
     if [ -n "$drift" ]; then
-        echo "[WARN] BLS entries missing rd.luks.uuid: $drift"
+        echo "[ERR] BLS entries missing rd.luks.uuid after all 3 paths: $drift"
+    else
+        echo "[OK] all BLS entries carry rd.luks.uuid"
     fi
-
-    echo "[INFO] injected ${LUKS_ARGS} into /etc/default/grub + BLS entries"
 fi

 # Verify anaconda wrote /etc/crypttab for the LUKS device. anaconda's
@@ -786,7 +826,10 @@ grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true
 # points at it is created by `kernel-install` and shows up in GRUB as a
 # second menu item. For a branded distro it's noisy + reveals "Fedora"
 # in the menu. The rescue image itself is harmless to keep on disk.
-rm -f /boot/loader/entries/*-0-rescue-*.conf 2>/dev/null || true
+# Match both `-0-rescue.conf` (current Fedora 43 layout) and
+# `-0-rescue-.conf` (older layout). The earlier glob
+# `*-0-rescue-*.conf` required a trailing hyphen and missed the new form.
+rm -f /boot/loader/entries/*-0-rescue*.conf 2>/dev/null || true

 # Hostname: default to "veilor" rather than the localhost-live / fedora
 # fallback that anaconda writes. User can override post-install with