Five-fix bundle from 7-agent research wave on the v0.5.28-final
gen_grub_cfgstub failure.
## 1. Narrow the anaconda transaction_progress patch (CRITICAL)
The v0.5.28 patch was too broad. It rewrote
`process_transaction_progress` so every 'error' token in the
transaction queue became a `log.warning`. That queue carries four
distinct error classes:
- cpio_error — payload extraction (genuinely fatal)
- script_error — RPM 6.0 cmdline-mode scriptlet warning-as-error
(the ONE we want to ignore)
- unpack_error — payload corruption (genuinely fatal)
- error — generic transaction error (genuinely fatal)
By swallowing all four we silently masked grub2-efi-x64's posttrans
failure mid-install. /boot/efi/EFI/fedora/ ended up incomplete →
gen_grub_cfgstub then failed at the bootloader install phase with
"gen_grub_cfgstub script failed" because its `set -eu` script
couldn't read the missing files.
v0.5.29 narrows the patch: override only the `script_error` callback
inside transaction_progress.py to log a warning and NOT enqueue
'error'. The consumer (`process_transaction_progress`) reverts to
upstream behaviour where cpio_error / unpack_error / error still
raise PayloadInstallationError. Real install-fatal events keep
aborting; only the F43-RPM-6.0 scriptlet regression is silenced.
The patch is applied with a `python3` heredoc that does a regex rewrite
(more robust than nested sed across multi-line method bodies).
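The policy above can be restated as a toy shell function. This is only an illustration of which of the four classes stays fatal, not anaconda code; the name `error_class_policy` is hypothetical.

```shell
# Toy restatement of the v0.5.29 policy; NOT anaconda code.
# Prints "warn" for the one callback that is downgraded, "fatal" for the rest.
error_class_policy() {
    case $1 in
        script_error)                  echo warn ;;
        cpio_error|unpack_error|error) echo fatal ;;
        *) echo "unknown class: $1" >&2; return 2 ;;
    esac
}

# Print the decision for each of the four classes named in the commit message.
for c in cpio_error script_error unpack_error error; do
    printf '%-13s %s\n' "$c" "$(error_class_policy "$c")"
done
```

The v0.5.28 patch was the equivalent of answering `warn` on every branch, which is how the grub2-efi-x64 failure slipped through.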
## 2. LUKS UX — `tries=5,timeout=0` (FIX)
The default cryptsetup-generator unit allows ONE passphrase try with a
1m30s wait. One typo on a long passphrase means waiting 1m30s until the
device-wait timer trips, then a drop to the dracut emergency shell after
3min total. Adding `rd.luks.options=luks-XXX=tries=5,timeout=0` gives
five typo-friendly tries with no auto-timeout.
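As a rough sketch of how that argument might be assembled for one device: the helper name `luks_options_karg` and the sample UUID are hypothetical, and the `luks-` prefix simply mirrors the form quoted above. Check systemd-cryptsetup-generator(8) for the UUID form your systemd version accepts.

```shell
# Hypothetical helper: print the kernel argument described above for one LUKS UUID.
# Usage: luks_options_karg <uuid> [tries] [timeout]
luks_options_karg() {
    uuid=$1
    tries=${2:-5}      # number of passphrase attempts
    timeout=${3:-0}    # 0 = no timeout on the passphrase prompt
    if [ -z "$uuid" ]; then
        echo "luks_options_karg: missing UUID" >&2
        return 1
    fi
    printf 'rd.luks.options=luks-%s=tries=%s,timeout=%s\n' "$uuid" "$tries" "$timeout"
}

# Example with a made-up UUID:
luks_options_karg 0f3c1a2e-1111-2222-3333-444455556666
```

The resulting string would be appended to the installed system's kernel cmdline alongside the other bootloader arguments.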
## 3. fbcon=nodefer on installed-system cmdline (FIX)
The live ISO cmdline already has `fbcon=nodefer` (added in v0.5.27 to fix
the real-laptop black-screen-after-dracut). The installed-system
bootloader directive in the generated install ks did NOT carry it, even
though the same KMS handoff happens on the installed system on the same
hardware. Now both have the flag.
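One way to keep the two cmdlines in step is to add the flag idempotently when the install ks is generated. The sketch below is an assumption about how that could look, not the installer's actual code; `ensure_karg` and the `base` string are hypothetical.

```shell
# Hypothetical helper: append a kernel flag to a space-separated argument
# string only if it is not already present (exact-word match).
ensure_karg() {
    args=$1
    flag=$2
    case " $args " in
        *" $flag "*) printf '%s\n' "$args" ;;                 # already present
        *)           printf '%s\n' "${args:+$args }$flag" ;;  # append once
    esac
}

# Illustrative base arguments (two of the hardening flags from the live ks).
base="lockdown=integrity slab_nomerge"
printf 'bootloader --append="%s"\n' "$(ensure_karg "$base" fbcon=nodefer)"
```

Running the helper twice on its own output leaves the string unchanged, so regenerating the ks does not duplicate the flag.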
## 4. /etc/crypttab fallback assertion (BELT-BRACES)
Anaconda's custom-partitioning code path normally writes /etc/crypttab
for `--encrypted` part directives. We observed edge cases in F43+ where
it doesn't. Without crypttab, systemd-cryptsetup-generator can still
work from kernel cmdline alone, but cleanup paths and second-stage
unlock both fall over. v0.5.29 adds a fallback `echo` that writes the
canonical line if it is still missing after anaconda finishes.
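A minimal sketch of such a fallback, assuming the entry is keyed on the LUKS UUID. The function name `ensure_crypttab_entry` and the exact line format (in particular the `luks` options field) are assumptions for illustration, not copied from veilor-installer.

```shell
# Hypothetical fallback: append a crypttab entry for a LUKS UUID if no entry
# for that UUID exists yet. The line format is an assumption.
# Usage: ensure_crypttab_entry <uuid> [crypttab-path]
ensure_crypttab_entry() {
    uuid=$1
    crypttab=${2:-/etc/crypttab}
    if [ -z "$uuid" ]; then
        echo "ensure_crypttab_entry: missing UUID" >&2
        return 1
    fi
    if [ -f "$crypttab" ] && grep -qF "UUID=$uuid" "$crypttab"; then
        return 0    # an entry is already there (anaconda wrote it)
    fi
    # Fields: mapping name, source device, key file ("none" = prompt), options.
    echo "luks-$uuid UUID=$uuid none luks" >> "$crypttab"
}
```

In a chroot %post this would be called once with the target partition's UUID; the second argument exists only so the function can be exercised against a scratch file.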
## 5. Initramfs LUKS module assertion (DEFENSIVE)
Force-include the `crypt`, `systemd-cryptsetup` and `plymouth` modules
in the initramfs via /etc/dracut.conf.d/10-veilor-luks.conf. dracut
autodetects these when it sees an active LUKS mapping, but %post runs
before the LUKS state is fully observable from the chroot. We also wipe
stale initramfs images (`rm -f /boot/initramfs-*.img`) before
`--regenerate-all` so the regen actually rewrites bytes. A final
assertion runs `lsinitrd | grep -q cryptsetup` and surfaces an [ERR]
line in build output if the module didn't make it.
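The three steps above could be sketched roughly as follows. The function names and the `CONF_DIR` / `BOOT_DIR` overrides are hypothetical, added so the pieces can be dry-run outside a chroot; the real %post may be laid out differently.

```shell
# Hypothetical sketch of the %post steps described above. CONF_DIR and
# BOOT_DIR default to the real paths but can be overridden for a dry run.
CONF_DIR=${CONF_DIR:-/etc/dracut.conf.d}
BOOT_DIR=${BOOT_DIR:-/boot}

write_luks_dracut_conf() {
    mkdir -p "$CONF_DIR" || return 1
    # dracut.conf(5) syntax: add_dracutmodules+=" <modules> " (padded with spaces)
    printf 'add_dracutmodules+=" crypt systemd-cryptsetup plymouth "\n' \
        > "$CONF_DIR/10-veilor-luks.conf"
}

regenerate_initramfs() {
    rm -f "$BOOT_DIR"/initramfs-*.img    # drop stale images first
    dracut --force --regenerate-all
}

# Reads an lsinitrd listing on stdin; succeeds if cryptsetup is mentioned.
assert_cryptsetup_present() {
    if grep -q cryptsetup; then
        echo "[OK] cryptsetup present in initramfs"
    else
        echo "[ERR] cryptsetup missing from initramfs" >&2
        return 1
    fi
}
```

A %post would chain them as `write_luks_dracut_conf && regenerate_initramfs && lsinitrd | assert_cryptsetup_present`.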
## What this should fix
After the man-db fix in v0.5.28-final, install proceeded past
"Configuring xxx" cleanly but died at "Installing boot loader" with
gen_grub_cfgstub. The root cause was the over-broad patch from #1 above.
After v0.5.29:
- Install transaction completes (man-db excluded; non-man-db
scriptlet warnings still suppressed; real errors still raise)
- gen_grub_cfgstub runs against complete /boot/efi/EFI/fedora/
- Bootloader install completes
- Reboot to disk lands at GRUB veilor-os entry
- Kernel + initramfs load (cryptsetup confirmed present)
- Plymouth LUKS prompt appears with text fallback
- User has 5 tries, no timeout
- Unlock → btrfs subvol mount → systemd → SDDM
Files: kickstart/veilor-os.ks (+45 lines), overlay/usr/local/bin/veilor-installer (+50 lines).
Verified: bash -n clean, ksvalidator clean.
References:
pyanaconda transaction_progress.py:110-136 (4 producers of 'error')
pyanaconda bootloader/efi.py:194-201 (gen_grub_cfgstub call site)
/usr/bin/gen_grub_cfgstub (set -eu wrapper for grub2-mkconfig stub)
Fedora wiki Changes/RPM-6.0
dnf5 issue #2507 (RPM 6.0 scriptlet propagation regression)
370 lines
16 KiB
Text
#version=DEVEL
# veilor-os kickstart — Fedora 43 KDE base, hardened, minimal.
# Build with livemedia-creator inside build/Containerfile.

# ── Install source ──
# Hard-code version (not $releasever) because lorax doesn't expand
# inside kickstart `url`/`repo` directives. Updates repo critical:
# base Fedora 43 ships selinux-policy 42.12 with pcre2-10.47-built
# file_contexts.bin, which fails chroot %triggerin against host's
# libselinux (built against pcre2 10.46). 43.7 in updates is rebuilt.
url --mirrorlist="https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-43&arch=x86_64"
# Explicit `repo --name=fedora` lets livecd-creator see base repo (it only
# reads repo.repoList, ignores url= directive). livemedia-creator + Anaconda
# honor both. No behavior change for either tool.
# Use direct baseurl (kernel.org mirror) to avoid mirrorlist 404s during
# Fedora's metadata sync windows.
repo --name=fedora --baseurl="https://download.fedoraproject.org/pub/fedora/linux/releases/43/Everything/x86_64/os/" --install
repo --name=updates --baseurl="https://download.fedoraproject.org/pub/fedora/linux/updates/43/Everything/x86_64/" --install
# Local fix-repo: build-time-only workaround for host pcre2/libselinux skew.
# Stripped from CI ks via sed in build-iso.yml. NOT shipped state.
repo --name=veilor-fix --baseurl=file:///tmp/veilor-fix-repo --install --cost=1

# ── Locale / keyboard / time (template — adjust per build) ──
keyboard --xlayouts='us'
lang en_GB.UTF-8
timezone Europe/London --utc

# ── Install mode ──
# Note: no display mode (text/graphical/cmdline) — livemedia-creator forbids.
firstboot --disable
eula --agreed
# Build-time SELinux disabled to avoid PCRE2 regex version mismatch between
# host libselinux and chroot's selinux-policy file_contexts.bin (pcre2 10.46
# vs 10.47). veilor-firstboot.service triggers `fixfiles -F onboot` and
# `setenforce 1` on first boot to re-enable enforcing mode.
selinux --permissive
# veilor-firstboot + veilor-modules-lock enabled via %post after overlay
# copy (units don't exist yet at services-config phase).
services --enabled=sshd,fail2ban,usbguard,tuned,auditd,firewalld,chronyd,sddm

# ── Network / hostname ──
network --bootproto=dhcp --device=link --activate --hostname=veilor-os
firewall --enabled --service=ssh

# ── Identity (zero-prompt; only LUKS passphrase asked at install) ──
# Note: `auth` command removed in pykickstart 3.x — defaults (sha512 shadow) apply.
rootpw --lock
user --name=admin --groups=wheel --gecos="veilor admin" --password="" --plaintext

# ── Bootloader: kernel hardening flags ──
# Note: init_on_alloc/init_on_free removed from default live cmdline —
# they zero every memory page at boot which 5x'd KVM live boot time.
# Re-enable per-install via veilor-firstboot.service for production.
# `fbcon=nodefer` keeps the linux framebuffer console alive across the
# KMS modeset that intel/amdgpu/nvidia drivers do during userspace init.
# Without it, on real hardware the screen blanks the moment the GPU
# driver loads and the installer's tty1 redraw lands on a frozen
# framebuffer — symptom: black screen with blinking cursor for ~30s
# while the menu IS in fact rendered, just not painted. virtio-vga in
# QEMU doesn't trigger this so it never reproed in VM.
bootloader --location=mbr --append="lockdown=integrity slab_nomerge randomize_kstack_offset=on vsyscall=none plymouth.enable=0 fbcon=nodefer"

# ── Live ISO partitioning (flat — for live rootfs build only) ──
# NOTE: This is the *live* image kickstart. Final installed system uses
# a separate installer kickstart (kickstart/install.ks, planned v0.2.1)
# that does LUKS2 + btrfs subvols on the target disk.
part / --fstype=ext4 --size=8192

# ── Packages ──
%packages --excludedocs
@^kde-desktop-environment
@kde-apps
@core
@hardware-support
@standard

# live install plumbing (required by livemedia-creator --make-iso)
# CRITICAL: livesys-scripts + anaconda-live ship the systemd units lorax expects
# at squashfs creation. Without them, EFI/BOOT not built and ISO wrap fails.
# (Upstream Fedora's fedora-live-kde.ks includes these via fedora-live-base.ks.)
livesys-scripts
anaconda-live
@anaconda-tools
kernel-modules
kernel-modules-extra
glibc-all-langpacks
dracut-live
dracut-config-generic
kernel
grub2-efi-x64
grub2-efi-x64-modules
grub2-pc
grub2-pc-modules
grub2-tools
grub2-tools-extra
shim-x64
efibootmgr
syslinux
isomd5sum
xorriso

# veilor-installer dependencies (TTY1 TUI installer wrapping anaconda)
newt
parted
cryptsetup
lvm2
btrfs-progs


# core hardening tools
fail2ban
fail2ban-firewalld
usbguard
usbguard-tools
audit
policycoreutils-python-utils
tuned
chrony
firewalld
plymouth

# admin essentials
git
vim-enhanced
tmux
htop
podman
skopeo
NetworkManager
NetworkManager-wifi

# fonts
fontconfig
freetype
fira-code-fonts

# remove fluff
# Note: KDE Plasma 6 hard-deps on cups/geoclue2/ModemManager/PackageKit
# transitively (plasma-print-manager, xdg-desktop-portal, NM-wwan etc),
# so package removal breaks depsolve. Daemons disabled at runtime via
# scripts/20-harden-kernel.sh instead.
-abrt*
-snapd
-kde-connect
-open-vm-tools-desktop
-mlocate

%end

# ── Post-install (nochroot): copy overlay tree into installed root ──
%post --nochroot --erroronfail
set -uo pipefail
# DEST: livecd-creator sets INSTALL_ROOT, livemedia-creator uses /mnt/sysimage.
DEST="${INSTALL_ROOT:-/mnt/sysimage}"
[[ -d $DEST ]] || { echo "[ERR] DEST=$DEST does not exist (livecd-creator? livemedia-creator?)" >&2; exit 1; }

# Try multiple source paths:
#   /run/install/repo/veilor — boot ISO (--virt mode)
#   /work — bind mount in CI container
#   $(dirname kickstart)/.. — local --no-virt builds
SRC=""
for candidate in /run/install/repo/veilor /work /mnt/work; do
    if [[ -d $candidate/overlay ]]; then
        SRC=$candidate
        break
    fi
done

# Fallback: derive from kickstart path. Anaconda passes ks via --kickstart=<path>.
if [[ -z $SRC ]]; then
    KS_PATH=$(ps -ef | grep -oP -- '--kickstart[= ]\K[^ ]+' | head -1)
    if [[ -n $KS_PATH && -d $(dirname "$KS_PATH")/../overlay ]]; then
        SRC=$(realpath "$(dirname "$KS_PATH")/..")
    fi
fi

if [[ -z $SRC ]]; then
    echo "[ERR] cannot locate veilor-os repo source — overlay/scripts not copied" >&2
    exit 1
fi

echo "[INFO] using SRC=$SRC DEST=$DEST"
set -x
cp -a "$SRC/overlay/." "$DEST/" || echo "[ERR] overlay cp failed: $?"
mkdir -p "$DEST/usr/share/veilor-os" || echo "[ERR] mkdir failed: $?"
ls -la "$SRC/assets" "$SRC/scripts" 2>&1 || echo "[ERR] assets/scripts missing in $SRC"
cp -a "$SRC/assets" "$DEST/usr/share/veilor-os/" || echo "[ERR] assets cp failed: $?"
cp -a "$SRC/scripts" "$DEST/usr/share/veilor-os/" || echo "[ERR] scripts cp failed: $?"
ls -la "$DEST/usr/share/veilor-os/" 2>&1 || echo "[ERR] dest dir missing post-cp"
# Force root ownership on everything we copied — `cp -a` preserves
# CI runner uid (1001), which makes sudo refuse to read /etc/sudoers.d.
chown -R 0:0 "$DEST/etc" "$DEST/usr/share/veilor-os" "$DEST/usr/local/bin" 2>&1 || echo "[WARN] chown failed"
set +x

# Persist nochroot log into installed system for diagnostics
{
    echo "=== %post --nochroot trace ==="
    date
    echo "SRC=$SRC DEST=$DEST"
    ls -la "$DEST/usr/share/veilor-os/" 2>&1
    ls -la "$DEST/usr/local/bin/" 2>&1
} > "$DEST/var/log/veilor-nochroot.log" 2>&1 || true
%end

# ── Post-install (chroot): apply hardening, theme, branding ──
%post
set -uo pipefail
exec > >(tee -a /var/log/veilor-install.log) 2>&1

echo "════════════════════════════════════════════════════════"
echo " veilor-os install — %post"
echo "════════════════════════════════════════════════════════"

REPO=/usr/share/veilor-os
chmod +x $REPO/scripts/*.sh $REPO/scripts/selinux/*.sh /usr/local/bin/veilor-power /usr/local/bin/veilor-update /usr/local/bin/veilor-doctor /usr/local/bin/veilor-firstboot /usr/local/bin/veilor-installer

# Live image plumbing (matches upstream Fedora live ks). Without these the
# squashfs/EFI build fails — livesys-scripts ships systemd units lorax expects.
systemctl enable livesys.service livesys-late.service 2>/dev/null || true
systemctl enable tmp.mount 2>/dev/null || true

# /etc/machine-id reset on first boot (live image baseline)
> /etc/machine-id

# Apply hardening
bash $REPO/scripts/10-harden-base.sh
bash $REPO/scripts/20-harden-kernel.sh

# Build SELinux module
bash $REPO/scripts/selinux/build-policy.sh || echo "[WARN] SELinux build failed; load on first boot"

# Apply KDE theme + DuckSans + os-release branding
bash $REPO/scripts/kde-theme-apply.sh
bash $REPO/scripts/30-apply-v03-theme.sh || echo "[WARN] v03-theme apply failed"

# Force admin password set on first boot.
# livecd-creator does NOT honor `user` kickstart directive (it's a LIVE
# image, no installer step). Create admin manually in chroot %post.
# Note: SDDM rejects blank passwords by default (PAM nullok off), so we
# set throwaway pw `veilor` + chage -d 0 to force reset on first login.
if ! getent passwd admin >/dev/null; then
    useradd -m -G wheel -s /bin/bash -c "veilor admin" admin
    echo 'admin:veilor' | chpasswd
    chage -d 0 admin
    echo "[INFO] admin user created (default pw=veilor, expired)"
fi

# Symlink display-manager.service → sddm.service. graphical.target Wants=
# display-manager but the alias doesn't get auto-created when sddm package
# is installed via livecd-creator (vs Anaconda installer which handles it).
# Without this, sddm stays inactive even though enabled.
ln -sf /usr/lib/systemd/system/sddm.service /etc/systemd/system/display-manager.service

# Live ISO default target: multi-user (TTY1 = veilor-installer TUI lands first).
# User picks "Try live — desktop" from menu → systemctl isolate graphical.target.
# Real installs land on graphical.target by default (set by anaconda).
systemctl set-default multi-user.target

# Branding: GRUB menu title + plymouth `details` text theme (no graphical
# splash). Pure text-scroll boot exposes the gum installer immediately on
# tty1 instead of plymouth swallowing it.
sed -i \
    -e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \
    -e 's|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=""|' \
    /etc/default/grub 2>/dev/null || true
plymouth-set-default-theme details 2>/dev/null || true
[ -f /boot/grub2/grub.cfg ] && grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true

# zram swap (no disk swap; keys never leak to platter)
dnf install -y zram-generator || true
cat > /etc/systemd/zram-generator.conf << 'EOF'
[zram0]
zram-size = min(ram, 8192)
compression-algorithm = zstd
EOF

# Patch anaconda's transaction_progress.py inside the live rootfs so that
# when the user clicks "Install", a non-fatal RPM 6.0 *scriptlet* warning
# does not get escalated to "An error occurred during the transaction"
# and abort.
#
# This patch is NARROW — it overrides ONLY the `script_error` callback,
# not the consumer (`process_transaction_progress`). v0.5.28 had a broad
# patch that turned EVERY 'error' token into a warning, including
# `cpio_error` (payload corruption) and `unpack_error` (extraction
# failures). Side effect: silent grub2-efi-x64 scriptlet failure →
# /boot/efi/EFI/fedora/ left incomplete → `gen_grub_cfgstub` failed at
# the bootloader install phase. Narrowing eliminates that class of
# silent failure.
#
# Why a patch is needed at all: Fedora 43 ships RPM 6.0, which changed
# scriptlet failure propagation (Fedora wiki Changes/RPM-6.0; dnf5 issue
# 2507). Scriptlets that previously emitted "Non-critical error"
# warnings now bubble up as transaction-level errors. man-db's
# `transfiletriggerin` (`systemd-run /usr/bin/systemctl start
# man-db-cache-update`) is the most common trigger — non-zero in the
# anaconda chroot, RPM-6.0-aware dnf5 reports as error, anaconda
# --cmdline aborts.
#
# After the patch:
#   - script_error → log warning, do NOT enqueue 'error' (transaction
#     continues; specific package's posttrans whose result we ignore is
#     already in the install set, scriptlet has run as far as it can).
#   - cpio_error / unpack_error / generic error → unchanged, still
#     raise PayloadInstallationError as anaconda intends. Real
#     transaction-fatal events still abort install (good).
TP=/usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/transaction_progress.py
if [ -f "$TP" ]; then
    cp -a "$TP" "${TP}.veilor-bak"

    # Replace the script_error self._queue.put(('error', ...)) line with a
    # warning log. The script_error method is uniquely identified by its
    # `return_code` argument, so the regex anchors on that signature.
    # A python3 heredoc is more robust than nested sed across multi-line
    # statements; only the trailing queue.put(...) call is rewritten.
    python3 - "$TP" <<'PYEOF'
import sys, re
path = sys.argv[1]
src = open(path).read()
# Find the script_error method and replace the queue.put(...) line at its end
new = re.sub(
    r'(    def script_error\(self, item, nevra, type, return_code\):.*?)\n        self\._queue\.put\(\(.error., item\.get_package\(\)\.to_string\(\)\)\)',
    r'\1\n        log.warning("veilor: ignoring non-fatal scriptlet failure rc=%s for %s",\n                    return_code,\n                    item.get_package().to_string() if item else "unknown")\n        # do NOT enqueue \'error\' — let install continue (RPM 6.0 cmdline regression workaround)',
    src,
    flags=re.DOTALL,
    count=1,
)
if new == src:
    print("[ERR] script_error method not found in expected form — anaconda layout changed")
    sys.exit(1)
open(path, "w").write(new)
print("[OK] transaction_progress.py: narrowed script_error override")
PYEOF

    if grep -q "veilor: ignoring non-fatal scriptlet" "$TP"; then
        # Drop the cached .pyc so the patched .py is what runs.
        rm -f /usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/__pycache__/transaction_progress.*.pyc 2>/dev/null || true
        echo "[OK] anaconda transaction_progress.py patched in live rootfs (script_error only)"
    else
        echo "[WARN] transaction_progress.py patch did not apply — anaconda layout may have changed"
    fi
else
    echo "[WARN] transaction_progress.py not found at expected path"
fi

# Enable services
# veilor-firstboot.service NOT enabled on live ISO — it prompts admin pw
# which makes no sense on a live boot. Real installs enable it in their
# generated kickstart's chroot %post (see overlay/usr/local/bin/veilor-installer).
systemctl enable veilor-modules-lock.service
systemctl enable sshd fail2ban usbguard tuned auditd firewalld chronyd
# Mask veilor-firstboot on live so even if it landed in /etc/systemd/system
# (overlay drag), it can't activate.
systemctl mask veilor-firstboot.service 2>/dev/null || true

# Default tuned profile = balanced (AC/battery udev rule will override)
tuned-adm profile veilor-balanced 2>/dev/null || true

# Lock root explicitly (kickstart --lock should already do this)
passwd -l root

# Sanity: zero references to onyx / personal IPs in installed system
if grep -rqi 'onyx\|192\.168\.0\.\|fedora\.local' /etc/veilor* /etc/tuned/profiles/veilor-* 2>/dev/null; then
    echo "[ERR] brand leak detected in /etc — investigate"
fi

echo "════════════════════════════════════════════════════════"
echo " veilor-os install complete"
echo "════════════════════════════════════════════════════════"
%end