v0.5.31: kernel-install via /etc/kernel/cmdline + set-e leak + rescue glob

Four-bug fix from 4-agent verification wave on v0.5.30 outcome.

Bug 1 CRITICAL: --location=none made anaconda skip CollectKernelArgumentsTask
(installation.py:149-151). --append= args were never collected, so BLS
entries were written with an empty cmdline. Drop --location=none and let
anaconda run its bootloader path; the broad transaction_progress patch
already silences the gen_grub_cfgstub class of failure.

Bug 2 CRITICAL: kernel-install reads /etc/kernel/cmdline as its source of
truth (per 90-loaderentry.install:84-95). Veilor never wrote that file,
so kernel-install fell through to /proc/cmdline, which during install is
the live ISO's. Add a 3-path write: /etc/kernel/cmdline (Path A,
canonical), /etc/default/grub (Path B, legacy), and
grubby --update-kernel=ALL (Path C, last-writer guard). Also run an
explicit kernel-install add per kernel after the Path A write.
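
A simplified model of the lookup order described in Bug 2 may make the
failure mode easier to see. This is a sketch, not the real
90-loaderentry.install logic: pick_cmdline, the scratch root directory,
and the sample arguments are all hypothetical, and only the two sources
named above are modeled.

```shell
#!/usr/bin/env bash
# Simplified model only: prefer <root>/etc/kernel/cmdline, otherwise
# fall back to <root>/proc/cmdline. pick_cmdline is a hypothetical helper.
pick_cmdline() {
    local root=$1
    if [ -f "$root/etc/kernel/cmdline" ]; then
        cat "$root/etc/kernel/cmdline"
    else
        cat "$root/proc/cmdline"
    fi
}

# Scratch directory standing in for the target root.
root=$(mktemp -d)
mkdir -p "$root/etc/kernel" "$root/proc"

# Stand-in for the live ISO's command line (made-up content).
echo "BOOT_IMAGE=/images/pxeboot/vmlinuz rd.live.image" > "$root/proc/cmdline"

# v0.5.30 situation: no /etc/kernel/cmdline, so the live args win.
before=$(pick_cmdline "$root")

# v0.5.31 Path A: write the file first, then regenerate entries.
echo "root=UUID=hypothetical rd.luks.uuid=luks-hypothetical" > "$root/etc/kernel/cmdline"
after=$(pick_cmdline "$root")

echo "before: $before"
echo "after:  $after"
rm -rf "$root"
```

The ordering is the point: the file has to exist before kernel-install
runs, otherwise the generated entries inherit whatever the installer
environment booted with.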

Bug 3: rescue BLS glob *-0-rescue-*.conf required trailing hyphen;
F43 uses *-0-rescue.conf. Fix: *-0-rescue*.conf (matches both).
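
The difference between the two globs can be checked against a scratch
directory. The file names below are made up for illustration; only the
two patterns come from this commit.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical BLS entry names.
tmp=$(mktemp -d)
touch "$tmp/abc123-0-rescue.conf" \
      "$tmp/abc123-0-rescue-6.17.1.conf" \
      "$tmp/abc123-6.17.1-300.fc43.x86_64.conf"

# Count the files in $tmp that match the glob passed as $1.
count_matches() {
    local n=0 f
    for f in "$tmp"/$1; do        # $1 left unquoted so the glob expands
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

old_count=$(count_matches '*-0-rescue-*.conf')   # v0.5.30 pattern
new_count=$(count_matches '*-0-rescue*.conf')    # v0.5.31 pattern

echo "old glob matches: $old_count"
echo "new glob matches: $new_count"
rm -rf "$tmp"
```

The regular kernel entry is matched by neither pattern, so the wider
glob only adds the un-suffixed rescue form.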

Bug 4: set +e/set -e scope leak in %post. v0.5.30 closed the manual
bootloader block with set -e, which re-enabled errexit for the rest of
%post even though that code was written assuming set +e. Result: any
unguarded command failure aborted %post and truncated the LUKS args
injection. Fix: remove the closing set -e.
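
A minimal reproduction of the scope leak, assuming a bash-compatible
%post interpreter. The run_post helper and the echoed marker are
hypothetical stand-ins for the real %post body.

```shell
#!/usr/bin/env bash
# Run a tiny %post-like script in a child shell.
# $1 = "leak" re-enables errexit after the block; anything else leaves it off.
run_post() {
    bash -c '
        set +e                       # block opens in best-effort mode
        false                        # tolerated failure inside the block
        [ "$1" = leak ] && set -e    # the v0.5.30-style closing line
        grep -q needle /dev/null     # unguarded command that returns 1
        echo "reached LUKS step"
    ' _ "$1"
}

leak_out=$(run_post leak) || true    # child exits at the grep; no output
fixed_out=$(run_post fixed) || true  # child continues past the grep

echo "with closing set -e:    '${leak_out}'"
echo "without closing set -e: '${fixed_out}'"
```

With errexit back on, the first command that returns non-zero ends the
script, so nothing after it runs and nothing is printed to say so.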

Files: overlay/usr/local/bin/veilor-installer.
Verified: bash -n clean, ksvalidator clean.
veilor-org 2026-05-05 13:47:23 +01:00
parent e83483a077
commit 2788b95a12

@@ -407,21 +407,27 @@ __SSHKEY_DIRECTIVE__
 # - `fbcon=nodefer` — keep linux framebuffer console alive through KMS
 # handoff so plymouth LUKS prompt remains visible on real GPUs.
 #
-# `--location=none` — DO NOT let anaconda install the bootloader. v0.5.30
-# moved bootloader install to %post chroot below for two reasons:
-# 1. Anaconda's gen_grub_cfgstub script (efi.py:194-201) runs
-# against an /boot/efi/EFI/fedora/ tree that grub2-efi-x64's
-# posttrans scriptlet may not have populated yet — Fedora 43's
-# RPM 6.0 + dnf5 + cmdline-mode anaconda combo is brittle here.
-# Reproduced as "gen_grub_cfgstub script failed" twice.
-# 2. Running grub2-install + grub2-mkconfig directly in %post lets
-# us pick up the env after anaconda finishes the package
-# transaction, with all scriptlets' file artifacts settled, and
-# gives clearer error messages if anything goes wrong.
-# We still install the packages (grub2-efi-x64, shim-x64, efibootmgr)
-# via %packages — anaconda just doesn't auto-invoke its bootloader code
-# path.
-bootloader --location=none --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
+# NOTE on --location: v0.5.30 used --location=none to skip anaconda's
+# bootloader install (sidestep gen_grub_cfgstub). Side effect: anaconda
+# also skipped CollectKernelArgumentsTask (installation.py:149-151), so
+# `--append=` args were NEVER COLLECTED. kernel-install then wrote BLS
+# entries with empty /etc/kernel/cmdline, falling through to the live
+# ISO's /proc/cmdline — no rd.luks.uuid, no fbcon=nodefer, no hardening.
+# Result: dracut emergency shell on first boot.
+#
+# v0.5.31 lets anaconda install the bootloader (default behavior, no
+# --location flag). With our broad transaction_progress patch in the
+# live ks, anaconda's gen_grub_cfgstub still runs, but if grub2-efi-x64's
+# posttrans had a non-fatal scriptlet failure the patch swallows it
+# without aborting. The %post chroot below STILL does belt-and-braces
+# fixup (dnf reinstall, grub2-install, etc.) in case anaconda's path
+# left something incomplete.
+#
+# Critically v0.5.31 also writes /etc/kernel/cmdline FIRST in %post then
+# re-runs kernel-install per kernel. That's the canonical Fedora 43 path
+# for landing args in BLS entries — kernel-install reads /etc/kernel/cmdline
+# (90-loaderentry.install:84-95) when generating BLS option lines.
+bootloader --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
 # Disk: zero, LUKS2 (argon2id), btrfs subvolumes (no LVM intermediary).
 # Native btrfs-on-LUKS matches Fedora KDE Spin defaults; LVM+btrfs combo
@@ -664,7 +670,13 @@ if [ -n "$EFI_DISK" ] && [ -e "$EFI_DISK" ]; then
 fi
 echo "[INFO] bootloader install: see above for any [WARN] lines"
-set -e
+# NOTE: deliberately NOT `set -e` here. The block above opened with
+# `set +e` and the rest of %post is a sequence of best-effort hardening
+# steps that have local `|| true` guards on the operations that may
+# legitimately fail. Re-enabling errexit would cause `set -e` to abort
+# the whole %post on the first non-guarded command (e.g. a `grep -q`
+# returning 1). v0.5.30 had this bug and it silently truncated
+# the LUKS args injection.
 # GRUB branding: replace fedora distributor with veilor-os in menu titles.
 # Drop rhgb quiet from default cmdline → all kernel/systemd messages
@@ -696,45 +708,73 @@ sed -i \
 # user lands in emergency shell on first boot.
 LUKS_UUID=$(blkid -t TYPE=crypto_LUKS -o value -s UUID 2>/dev/null | head -1)
 if [ -n "$LUKS_UUID" ]; then
-# Args:
-# rd.luks.uuid=luks-XXX — tells dracut to expect a LUKS device,
-# triggers cryptsetup-generator.
-# rd.luks.options=...=tries=5 — five typo retries before giving up
-# (default 1; one slip = emergency
-# shell after 3min, terrible UX).
-# rd.luks.options=...=timeout=0 — never time out unlock device wait
-# (default 1m30s; slow user typing
-# on a long passphrase still works).
-# fbcon=nodefer — keep linux framebuffer console alive
-# through KMS handoff. Without this on
-# real laptops the plymouth LUKS prompt
-# draws into a frozen framebuffer and
-# the user sees a black screen with a
-# blinking cursor. Already in the live
-# ISO bootloader cmdline; missing from
-# the installed-system bootloader line
-# in the generated install ks above
-# (also fixed there).
-LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID} rd.luks.options=luks-${LUKS_UUID}=tries=5,timeout=0 fbcon=nodefer"
+LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID} rd.luks.options=luks-${LUKS_UUID}=tries=5,timeout=0"
+HARDEN_ARGS="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
-# Path 1: persist into /etc/default/grub so future kernels inherit.
+# Find the running root UUID (the btrfs filesystem holding the root
+# subvol). At this point in %post chroot, `/` is the target root;
+# findmnt -o UUID resolves to the btrfs UUID anaconda chose.
+ROOT_UUID=$(findmnt -n -o UUID /)
+[ -z "$ROOT_UUID" ] && ROOT_UUID=$(blkid -s UUID -o value /dev/mapper/luks-${LUKS_UUID} 2>/dev/null)
+# Three write paths, in priority order:
+#
+# Path A: /etc/kernel/cmdline (the canonical source of truth for
+# `kernel-install`). Per /usr/lib/kernel/install.d/90-loaderentry.install
+# lines 84-95, kernel-install reads /etc/kernel/cmdline first when
+# authoring BLS entries. If we write this BEFORE re-running
+# kernel-install, every BLS entry inherits our args. Persists
+# across `dnf upgrade kernel`, `dnf reinstall grub2-*`, and any
+# other path that re-fires kernel-install hooks.
+#
+# Path B: /etc/default/grub (legacy GRUB_CMDLINE_LINUX). Read by
+# `grub2-mkconfig` for the generated grub.cfg. Belt-and-braces;
+# kernel-install ignores this, but grub2-mkconfig respects it.
+#
+# Path C: grubby --update-kernel=ALL. Direct edit to BLS option
+# lines. Acts as the last-writer in case our cmdline write didn't
+# trigger a fresh kernel-install pass.
+#
+# Earlier veilor-os versions only used B+C. v0.5.31 adds Path A as
+# the primary, because v0.5.30 testing showed B+C are racy with
+# anaconda's own CreateBLSEntriesTask which uses kernel-install
+# internally and can rewrite entries from empty /etc/kernel/cmdline,
+# producing options lines with no rd.luks.uuid even when grubby
+# successfully ran.
+# Path A
+mkdir -p /etc/kernel
+if [ -n "$ROOT_UUID" ]; then
+echo "root=UUID=${ROOT_UUID} ro rootflags=subvol=root ${LUKS_ARGS} ${HARDEN_ARGS}" > /etc/kernel/cmdline
+echo "[INFO] wrote /etc/kernel/cmdline (canonical kernel-install source)"
+else
+echo "[WARN] could not determine root UUID; /etc/kernel/cmdline not written"
+fi
+# Path B
 if ! grep -q "rd.luks.uuid" /etc/default/grub 2>/dev/null; then
 sed -i "s|^GRUB_CMDLINE_LINUX=\"|GRUB_CMDLINE_LINUX=\"${LUKS_ARGS} |" /etc/default/grub
 fi
-# Path 2: update existing BLS entries so the kernel that boots NEXT
-# gets the args. grubby walks /boot/loader/entries/*.conf and edits
-# the `options` line in-place.
+# Re-run kernel-install for every kernel — picks up new /etc/kernel/cmdline,
+# rewrites BLS entries with our args. This is the load-bearing step.
+for kver in /lib/modules/*/; do
+kver=$(basename "$kver")
+[ -f "/lib/modules/$kver/vmlinuz" ] || continue
+kernel-install add "$kver" "/lib/modules/$kver/vmlinuz" 2>&1 | tail -3 || \
+echo "[WARN] kernel-install add $kver failed"
+done
+# Path C: belt-and-braces grubby update in case kernel-install missed any
 grubby --update-kernel=ALL --args="${LUKS_ARGS}" 2>&1 | tail -5 || true
-# Verification: every BLS entry MUST carry the LUKS arg now. Empty
-# output = success.
+# Verification: every BLS entry MUST carry the LUKS arg.
 drift=$(grep -L "rd.luks.uuid" /boot/loader/entries/*.conf 2>/dev/null)
 if [ -n "$drift" ]; then
-echo "[WARN] BLS entries missing rd.luks.uuid: $drift"
+echo "[ERR] BLS entries missing rd.luks.uuid after all 3 paths: $drift"
+else
+echo "[OK] all BLS entries carry rd.luks.uuid"
 fi
-echo "[INFO] injected ${LUKS_ARGS} into /etc/default/grub + BLS entries"
 fi
 # Verify anaconda wrote /etc/crypttab for the LUKS device. anaconda's
@@ -786,7 +826,10 @@ grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true
 # points at it is created by `kernel-install` and shows up in GRUB as a
 # second menu item. For a branded distro it's noisy + reveals "Fedora"
 # in the menu. The rescue image itself is harmless to keep on disk.
-rm -f /boot/loader/entries/*-0-rescue-*.conf 2>/dev/null || true
+# Match both `<machine-id>-0-rescue.conf` (current Fedora 43 layout) and
+# `<machine-id>-0-rescue-<kver>.conf` (older layout). The earlier glob
+# `*-0-rescue-*.conf` required a trailing hyphen and missed the new form.
+rm -f /boot/loader/entries/*-0-rescue*.conf 2>/dev/null || true
 # Hostname: default to "veilor" rather than the localhost-live / fedora
 # fallback that anaconda writes. User can override post-install with
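
One point that may be worth double-checking in the LUKS hunk: the
`kernel-install add ... 2>&1 | tail -3 || echo "[WARN] ..."` line and the
grubby call both branch on the exit status of a pipeline, which in bash
without pipefail is the status of tail rather than of the command being
guarded. The sketch below uses a hypothetical fails function in place of
kernel-install to show the difference; whether enabling pipefail suits
this %post is an assumption left to the author.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a command that prints output and fails.
fails() { echo "simulated failure output"; return 1; }

# Pattern from the diff: the || branch sees tail's status (0), so the
# warning is not printed even though fails returned 1.
plain=$( { fails 2>&1 | tail -3 || echo "[WARN] caught"; } )

# Same pipeline with pipefail enabled inside the subshell: the pipeline
# now reports the failing command's status and the warning fires.
strict=$( set -o pipefail; { fails 2>&1 | tail -3 || echo "[WARN] caught"; } )

echo "--- without pipefail ---"
echo "$plain"
echo "--- with pipefail ---"
echo "$strict"
```

Because pipefail is set inside the command substitution here, it does
not change the behaviour of the surrounding script.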