v0.5.29: narrow anaconda patch + LUKS UX + initramfs assertion

Five-fix bundle from a 7-agent research wave on the v0.5.28-final
gen_grub_cfgstub failure.

## 1. Narrow the anaconda transaction_progress patch (CRITICAL)

The v0.5.28 patch was too broad. It rewrote
`process_transaction_progress` so every 'error' token in the
transaction queue became a `log.warning`. That queue carries four
distinct error classes:

- cpio_error: payload extraction (genuinely fatal)
- script_error: RPM 6.0 cmdline-mode scriptlet warning-as-error
  (the ONE we want to ignore)
- unpack_error: payload corruption (genuinely fatal)
- error: generic transaction error (genuinely fatal)

By swallowing all four we silently masked grub2-efi-x64's posttrans
failure mid-install. /boot/efi/EFI/fedora/ ended up incomplete, so
gen_grub_cfgstub then failed at the bootloader install phase with
"gen_grub_cfgstub script failed" because its `set -eu` script
couldn't read the missing files.

v0.5.29 narrows the patch: it overrides only the `script_error` callback
inside transaction_progress.py, which now logs a warning and does NOT
enqueue 'error'. The consumer (`process_transaction_progress`) reverts to
upstream behaviour, where cpio_error / unpack_error / error still
raise PayloadInstallationError. Real install-fatal events keep
aborting; only the F43 RPM 6.0 scriptlet regression is silenced.
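
To make the producer/consumer split concrete, here is a toy model of the narrowed behaviour. It is a sketch only: `ToyProgress`, `drain`, and the local `PayloadInstallationError` are hypothetical stand-ins, not pyanaconda's real classes or signatures.

```python
import logging
import queue

log = logging.getLogger("veilor.sketch")


class PayloadInstallationError(Exception):
    """Stand-in for the anaconda exception of the same name."""


class ToyProgress:
    """Hypothetical producer side: one callback per error class."""

    def __init__(self):
        self.events = queue.Queue()

    def cpio_error(self, pkg):
        self.events.put(("error", pkg))

    def unpack_error(self, pkg):
        self.events.put(("error", pkg))

    def error(self, message):
        self.events.put(("error", message))

    def script_error(self, pkg, return_code):
        # Narrowed behaviour: warn only, never enqueue 'error'.
        log.warning("ignoring scriptlet failure rc=%s for %s", return_code, pkg)


def drain(events):
    """Hypothetical consumer: any 'error' token aborts the install."""
    while not events.empty():
        token, payload = events.get()
        if token == "error":
            raise PayloadInstallationError(payload)


progress = ToyProgress()
progress.script_error("man-db", 5)
drain(progress.events)  # returns normally: nothing was enqueued

progress.cpio_error("grub2-efi-x64")
try:
    drain(progress.events)
    aborted = False
except PayloadInstallationError:
    aborted = True
```

In this model the consumer stays strict, so a scriptlet failure can only be ignored at the one producer that was deliberately changed.
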

The patch is applied via a `python3` heredoc regex rewrite (more robust
than nested sed across multi-line method bodies).
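
A minimal sketch of the rewrite technique, run against a made-up three-method sample rather than the real transaction_progress.py. `SAMPLE`, `MARKER`, and `narrow_script_error` are assumptions for illustration; the real method bodies will differ. One design point worth noting: with `re.DOTALL`, the lazy `.*?` can run on into a later method if `script_error` no longer ends in the `put` line, so the sketch refuses to re-patch text that already carries its marker.

```python
import re

# Hypothetical stand-in for the producer callbacks; NOT the real pyanaconda source.
SAMPLE = '''\
class TransactionProgress:
    def cpio_error(self, item, nevra):
        log.info("cpio error: %s", nevra)
        self._queue.put(('error', item.get_package().to_string()))

    def script_error(self, item, nevra, type, return_code):
        log.info("scriptlet error: %s", nevra)
        self._queue.put(('error', item.get_package().to_string()))

    def unpack_error(self, item, nevra):
        log.info("unpack error: %s", nevra)
        self._queue.put(('error', item.get_package().to_string()))
'''

MARKER = "veilor-sketch: scriptlet failure ignored"

# Capture everything from the script_error signature up to (not including)
# the first following queue.put(('error', ...)) line.
PATTERN = re.compile(
    r"(    def script_error\(self, item, nevra, type, return_code\):.*?)\n"
    r"        self\._queue\.put\(\('error', item\.get_package\(\)\.to_string\(\)\)\)",
    re.DOTALL,
)

# Keep the captured method head, swap the put(...) line for a warning.
REPLACEMENT = r"\1" + "\n" + f'        log.warning("{MARKER} rc=%s", return_code)'


def narrow_script_error(src):
    """Return src with only script_error's 'error' enqueue replaced."""
    if MARKER in src:
        return src  # already patched; do not let the regex reach a later method
    new, hits = PATTERN.subn(REPLACEMENT, src, count=1)
    if hits != 1:
        raise ValueError("script_error not found in expected form")
    return new


patched = narrow_script_error(SAMPLE)
```

After the call, `patched` still contains the `put(('error', ...))` lines of `cpio_error` and `unpack_error`; only the one inside `script_error` is gone.
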

## 2. LUKS UX: `tries=5,timeout=0` (FIX)

The default cryptsetup-generator unit allows ONE passphrase try with a
1m30s wait. One typo on a long passphrase means waiting 1m30s, then the
device-wait timer trips, then the dracut emergency shell appears after
3min total. Brutal. Adding `rd.luks.options=luks-XXX=tries=5,timeout=0`
gives five typo-friendly retries with no auto-timeout.
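
As an illustration of how the argument string is assembled, here is a small sketch. `build_luks_args`, `EXAMPLE_UUID`, and the `name_prefix` parameter are hypothetical; the default mirrors the `luks-` spelling used in this commit. systemd-cryptsetup-generator(8) describes `rd.luks.options=` as taking a superblock UUID before the `=`, so the prefix is kept as a knob in case the bare-UUID spelling needs to be compared on a test boot.

```python
def build_luks_args(uuid, tries=5, timeout=0, name_prefix="luks-"):
    """Build the LUKS kernel arguments for one device (illustrative only)."""
    name = f"{name_prefix}{uuid}"
    options = f"tries={tries},timeout={timeout}"
    return " ".join([
        f"rd.luks.uuid={name}",
        f"rd.luks.options={name}={options}",
    ])


# Placeholder UUID, not a real device.
EXAMPLE_UUID = "00000000-0000-0000-0000-000000000000"
args = build_luks_args(EXAMPLE_UUID)
```

Passing `name_prefix=""` produces the bare-UUID form without touching the rest of the string.
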

## 3. fbcon=nodefer on installed-system cmdline (FIX)

The live ISO cmdline already has `fbcon=nodefer` (added in v0.5.27 to fix
the real-laptop black-screen-after-dracut). The installed-system
bootloader directive in the generated install ks did NOT carry it, even
though the same KMS handoff happens on the installed system on the same
hardware. Now both have the flag.

## 4. /etc/crypttab fallback assertion (BELT-BRACES)

Anaconda's custom-partitioning code path normally writes /etc/crypttab
for `--encrypted` part directives. Edge cases have been observed in F43+
where it doesn't. Without crypttab, systemd-cryptsetup-generator can still
work from the kernel cmdline alone, but cleanup paths and second-stage
unlock both fall over. This adds a fallback `echo` that writes the
canonical line if it's missing post-anaconda.
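
The fallback logic can be sketched as a pure function over the file's text. `ensure_crypttab_entry` is a hypothetical helper; the entry format (`luks-<UUID> UUID=<UUID> none discard`) is copied from the `echo` line in this commit's diff.

```python
def ensure_crypttab_entry(crypttab_text, luks_uuid):
    """Return crypttab_text, appending a fallback entry if the UUID is absent."""
    if luks_uuid in crypttab_text:
        return crypttab_text  # anaconda already wrote an entry; leave it alone
    entry = f"luks-{luks_uuid} UUID={luks_uuid} none discard\n"
    # Make sure an existing last line is newline-terminated before appending.
    if crypttab_text and not crypttab_text.endswith("\n"):
        crypttab_text += "\n"
    return crypttab_text + entry


# Placeholder UUID, not a real device.
uuid = "00000000-0000-0000-0000-000000000000"
first = ensure_crypttab_entry("", uuid)
second = ensure_crypttab_entry(first, uuid)
```

Applying the function twice leaves the text unchanged, which matches the `grep -q` guard in the shell version.
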

## 5. Initramfs LUKS module assertion (DEFENSIVE)

Force-include the `crypt + systemd-cryptsetup + plymouth` modules in the
initramfs via /etc/dracut.conf.d/10-veilor-luks.conf. dracut autodetects
these when it sees an active LUKS mapping, but %post runs before the
LUKS state is fully observable from the chroot. We also wipe stale
initramfs images (`rm -f /boot/initramfs-*.img`) before `--regenerate-all`
so the regen actually rewrites bytes. A final assertion runs
`lsinitrd | grep -q cryptsetup` and surfaces an [ERR] line in build
output if the module didn't make it.
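
A sketch of the final assertion, split so the matching logic can be checked without a real image. `listing_has_cryptsetup`, `check_initramfs`, and `sample_listing` are hypothetical names, and the sample paths are made up; only `lsinitrd <image>` comes from the commit.

```python
import subprocess


def listing_has_cryptsetup(listing):
    """True if any line of an lsinitrd listing mentions cryptsetup."""
    return any("cryptsetup" in line for line in listing.splitlines())


def check_initramfs(image_path):
    """Run lsinitrd on image_path; True only if it succeeds and lists cryptsetup."""
    result = subprocess.run(
        ["lsinitrd", image_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and listing_has_cryptsetup(result.stdout)


# Made-up listing fragment for illustration.
sample_listing = "usr/bin/bash\nusr/lib/systemd/systemd-cryptsetup\n"
ok = listing_has_cryptsetup(sample_listing)
```

`check_initramfs` is not called here because it needs `lsinitrd` and a real image on disk.
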

## What this should fix

After the man-db fix in v0.5.28-final, install proceeded past
"Configuring xxx" cleanly but died at "Installing boot loader" with
gen_grub_cfgstub. The root cause was the over-broad patch from #1 above.

After v0.5.29:

- Install transaction completes (man-db excluded; non-man-db
  scriptlet warnings still suppressed; real errors still raise)
- gen_grub_cfgstub runs against complete /boot/efi/EFI/fedora/
- Bootloader install completes
- Reboot to disk lands at GRUB veilor-os entry
- Kernel + initramfs load (cryptsetup confirmed present)
- Plymouth LUKS prompt appears with text fallback
- User has 5 tries, no timeout
- Unlock → btrfs subvol mount → systemd → SDDM

Files: kickstart/veilor-os.ks (+45 lines), overlay/usr/local/bin/veilor-installer (+50 lines).
Verified: bash -n clean, ksvalidator clean.

References:

- pyanaconda transaction_progress.py:110-136 (4 producers of 'error')
- pyanaconda bootloader/efi.py:194-201 (gen_grub_cfgstub call site)
- /usr/bin/gen_grub_cfgstub (set -eu wrapper for grub2-mkconfig stub)
- Fedora wiki Changes/RPM-6.0
- dnf5 issue #2507 (RPM 6.0 scriptlet propagation regression)

Parent: fae677fb68
Commit: 613d35402e
2 changed files with 132 additions and 31 deletions

@@ -275,42 +275,72 @@ compression-algorithm = zstd
 EOF
 
 # Patch anaconda's transaction_progress.py inside the live rootfs so that
-# when the user clicks "Install" from the live ISO and anaconda runs in
-# --cmdline mode, a non-fatal scriptlet warning (RC=5) does not get
-# escalated to "An error occurred during the transaction" + abort.
+# when the user clicks "Install", a non-fatal RPM 6.0 *scriptlet* warning
+# does not get escalated to "An error occurred during the transaction"
+# and abort.
 #
-# Why this is needed: Fedora 43 ships RPM 6.0, which changed scriptlet
-# failure propagation (Fedora wiki Changes/RPM-6.0; dnf5 issue #2507).
-# Scriptlets that previously emitted "Non-critical error" warnings now
-# bubble up as transaction-level errors. man-db's
-# `transfiletriggerin` is the most common trigger — `systemd-run
-# /usr/bin/systemctl start man-db-cache-update` returns non-zero in
-# the anaconda chroot, RPM-6.0-aware dnf5 reports it as transaction
-# error, anaconda --cmdline aborts.
+# This patch is NARROW — it overrides ONLY the `script_error` callback,
+# not the consumer (`process_transaction_progress`). v0.5.28 had a broad
+# patch that turned EVERY 'error' token into a warning, including
+# `cpio_error` (payload corruption) and `unpack_error` (extraction
+# failures). Side effect: silent grub2-efi-x64 scriptlet failure →
+# /boot/efi/EFI/fedora/ left incomplete → `gen_grub_cfgstub` failed at
+# the bootloader install phase. Narrowing eliminates that class of
+# silent failure.
 #
-# We previously patched the same file on the BUILD HOST (build/build-iso.sh)
-# so livecd-creator could finish its own transaction. That patch lives
-# only on the host running the build — never landed in the live rootfs
-# the user installs from. Reproduced 3 consecutive VM tests
-# (v0.5.26 / v0.5.27 / v0.5.28) failing at exactly "Configuring
-# man-db.x86_64".
+# Why a patch is needed at all: Fedora 43 ships RPM 6.0, which changed
+# scriptlet failure propagation (Fedora wiki Changes/RPM-6.0; dnf5 issue
+# 2507). Scriptlets that previously emitted "Non-critical error"
+# warnings now bubble up as transaction-level errors. man-db's
+# `transfiletriggerin` (`systemd-run /usr/bin/systemctl start
+# man-db-cache-update`) is the most common trigger — non-zero in the
+# anaconda chroot, RPM-6.0-aware dnf5 reports as error, anaconda
+# --cmdline aborts.
 #
-# The patch downgrades the 'error' token in transaction progress
-# callback to a warning log line. Confirmed working at build time
-# (build/build-iso.sh:47-51).
+# After the patch:
+# - script_error → log warning, do NOT enqueue 'error' (transaction
+#   continues; specific package's posttrans whose result we ignore is
+#   already in the install set, scriptlet has run as far as it can).
+# - cpio_error / unpack_error / generic error → unchanged, still
+#   raise PayloadInstallationError as anaconda intends. Real
+#   transaction-fatal events still abort install (good).
 TP=/usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/transaction_progress.py
 if [ -f "$TP" ]; then
     cp -a "$TP" "${TP}.veilor-bak"
-    sed -i 's|raise PayloadInstallationError("An error occurred during the transaction: " + msg)|log.warning("veilor: ignoring non-fatal transaction error: %s", msg)|' "$TP"
-    if grep -q 'veilor: ignoring' "$TP"; then
-        echo "[OK] transaction_progress.py patched in live rootfs"
+
+    # Replace the script_error self._queue.put(('error', ...)) line with a
+    # warning log + return. The script_error method is uniquely identified
+    # by its `return_code` argument; sed targets that line specifically.
+    # `python3 -c` block is more robust than nested sed across multi-line
+    # statements; rewrite the whole script_error method body.
+    python3 - "$TP" <<'PYEOF'
+import sys, re
+path = sys.argv[1]
+src = open(path).read()
+# Find the script_error method and replace the queue.put(...) line at its end
+new = re.sub(
+    r'(    def script_error\(self, item, nevra, type, return_code\):.*?)\n        self\._queue\.put\(\(.error., item\.get_package\(\)\.to_string\(\)\)\)',
+    r'\1\n        log.warning("veilor: ignoring non-fatal scriptlet failure rc=%s for %s",\n                    return_code,\n                    item.get_package().to_string() if item else "unknown")\n        # do NOT enqueue \'error\' — let install continue (RPM 6.0 cmdline regression workaround)',
+    src,
+    flags=re.DOTALL,
+    count=1,
+)
+if new == src:
+    print("[ERR] script_error method not found in expected form — anaconda layout changed")
+    sys.exit(1)
+open(path, "w").write(new)
+print("[OK] transaction_progress.py: narrowed script_error override")
+PYEOF
+
+    if grep -q "veilor: ignoring non-fatal scriptlet" "$TP"; then
+        # Drop the cached .pyc so the patched .py is what runs.
+        rm -f /usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/__pycache__/transaction_progress.*.pyc 2>/dev/null || true
+        echo "[OK] anaconda transaction_progress.py patched in live rootfs (script_error only)"
     else
-        echo "[WARN] transaction_progress.py patch did not apply — file format may have changed in this anaconda version"
+        echo "[WARN] transaction_progress.py patch did not apply — anaconda layout may have changed"
     fi
 else
-    echo "[WARN] transaction_progress.py not found at expected path — anaconda may have moved it"
+    echo "[WARN] transaction_progress.py not found at expected path"
 fi
 
 # Enable services

@@ -397,8 +397,22 @@ user --name=admin --groups=wheel --gecos="veilor admin" --password=__ADMIN_PW__
 __SSHKEY_DIRECTIVE__
 
 # Full hardening cmdline (installed system, not live):
-# --location=none: anaconda auto-places bootloader (UEFI grub2-efi or BIOS).
-bootloader --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none"
+# - `lockdown=integrity` — kernel lockdown, integrity mode (signed module enforce)
+# - `slab_nomerge` — refuse SLAB merging; harder heap-spray attacks
+# - `init_on_alloc=1 init_on_free=1` — zero pages on alloc + free; defends
+#   uninit-read class; ~5% perf hit acceptable on hardened workstation
+# - `randomize_kstack_offset=on` — KASLR for kernel stack, per-syscall
+# - `vsyscall=none` — kill legacy vsyscall page (Position-Independent
+#   ROP-gadget surface)
+# - `fbcon=nodefer` — keep linux framebuffer console alive through KMS
+#   handoff so plymouth LUKS prompt and any boot-time text remain
+#   visible on real GPU drivers (intel/amdgpu/nvidia). Already in live
+#   ISO cmdline; was previously missing from installed-system cmdline,
+#   which produced a black-screen boot on real hardware until KMS
+#   stabilised.
+# Anaconda picks bootloader location (UEFI ESP or BIOS MBR) automatically;
+# `--location=mbr` would be cargo-cult on UEFI and risky on multi-disk.
+bootloader --append="lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
 
 # Disk: zero, LUKS2 (argon2id), btrfs subvolumes (no LVM intermediary).
 # Native btrfs-on-LUKS matches Fedora KDE Spin defaults; LVM+btrfs combo
@@ -608,7 +622,26 @@ sed -i \
 # user lands in emergency shell on first boot.
 LUKS_UUID=$(blkid -t TYPE=crypto_LUKS -o value -s UUID 2>/dev/null | head -1)
 if [ -n "$LUKS_UUID" ]; then
-    LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID}"
+    # Args:
+    #   rd.luks.uuid=luks-XXX         — tells dracut to expect a LUKS device,
+    #                                   triggers cryptsetup-generator.
+    #   rd.luks.options=...=tries=5   — five typo retries before giving up
+    #                                   (default 1; one slip = emergency
+    #                                   shell after 3min, terrible UX).
+    #   rd.luks.options=...=timeout=0 — never time out unlock device wait
+    #                                   (default 1m30s; slow user typing
+    #                                   on a long passphrase still works).
+    #   fbcon=nodefer                 — keep linux framebuffer console alive
+    #                                   through KMS handoff. Without this on
+    #                                   real laptops the plymouth LUKS prompt
+    #                                   draws into a frozen framebuffer and
+    #                                   the user sees a black screen with a
+    #                                   blinking cursor. Already in the live
+    #                                   ISO bootloader cmdline; missing from
+    #                                   the installed-system bootloader line
+    #                                   in the generated install ks above
+    #                                   (also fixed there).
+    LUKS_ARGS="rd.luks.uuid=luks-${LUKS_UUID} rd.luks.options=luks-${LUKS_UUID}=tries=5,timeout=0 fbcon=nodefer"
 
     # Path 1: persist into /etc/default/grub so future kernels inherit.
     if ! grep -q "rd.luks.uuid" /etc/default/grub 2>/dev/null; then
@@ -620,16 +653,54 @@ if [ -n "$LUKS_UUID" ]; then
     # the `options` line in-place.
     grubby --update-kernel=ALL --args="${LUKS_ARGS}" 2>&1 | tail -5 || true
 
+    # Verification: every BLS entry MUST carry the LUKS arg now. Empty
+    # output = success.
+    drift=$(grep -L "rd.luks.uuid" /boot/loader/entries/*.conf 2>/dev/null)
+    if [ -n "$drift" ]; then
+        echo "[WARN] BLS entries missing rd.luks.uuid: $drift"
+    fi
+
     echo "[INFO] injected ${LUKS_ARGS} into /etc/default/grub + BLS entries"
 fi
 
+# Verify anaconda wrote /etc/crypttab for the LUKS device. anaconda's
+# custom-partitioning code path normally does this for `--encrypted`
+# part directives; if it didn't (edge case, F43+ regressions), write
+# a minimal entry so systemd-cryptsetup-generator can find the device
+# at boot from the BLS args alone.
+if [ -n "$LUKS_UUID" ] && ! grep -q "$LUKS_UUID" /etc/crypttab 2>/dev/null; then
+    echo "luks-${LUKS_UUID} UUID=${LUKS_UUID} none discard" >> /etc/crypttab
+    echo "[INFO] wrote /etc/crypttab fallback entry"
+fi
+
 # Switch plymouth to text-only `details` theme (scrolling boot log, no
 # graphics, no logo). Theme is built-in to plymouth package, no asset
 # install needed. v0.6 will ship custom veilor-themed plymouth.
 plymouth-set-default-theme details 2>/dev/null || true
-# Regenerate initramfs with new theme baked in (plymouth modules read
-# theme at initramfs build time).
-dracut --force --regenerate-all 2>&1 | tail -3 || true
+
+# Force-include LUKS + plymouth modules in initramfs. dracut autodetects
+# crypt+plymouth from the running config, but custom-partitioning %post
+# runs before dracut sees stable LUKS state, and stale initramfs files
+# from anaconda's pre-install kernel may persist. Belt-and-braces.
+mkdir -p /etc/dracut.conf.d
+cat > /etc/dracut.conf.d/10-veilor-luks.conf <<'DRACUTEOF'
+# veilor-os: guarantee LUKS + plymouth modules in initramfs
+add_dracutmodules+=" crypt systemd-cryptsetup plymouth "
+DRACUTEOF
+
+# Regenerate initramfs with new theme + dracut.conf.d picks. Remove
+# stale initramfs first so the regen actually rewrites bytes.
+rm -f /boot/initramfs-*.img 2>/dev/null || true
+dracut --force --regenerate-all 2>&1 | tail -5 || true
+
+# Verify cryptsetup landed in initramfs. If not, LUKS unlock is impossible
+# and the user gets emergency shell on first boot. Surfacing this early.
+KVER=$(ls /lib/modules | head -1)
+if [ -n "$KVER" ] && [ -f "/boot/initramfs-${KVER}.img" ]; then
+    if ! lsinitrd "/boot/initramfs-${KVER}.img" 2>/dev/null | grep -q cryptsetup; then
+        echo "[ERR] cryptsetup not found in initramfs — LUKS unlock will fail"
+    fi
+fi
 
 # Regen grub.cfg with new branding (anaconda already wrote one; replace).
 grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true