veilor-os/kickstart/veilor-os.ks
veilor-org b86b4f9ec3 v0.5.32: ship 7 blockers from 9-agent wave
Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.

## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)

`veilor-modules-lock.service` sets `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, resume breaks wifi until
reboot. Same architectural class as the LUKS bug: a security feature
breaks legitimate kernel state transitions.

The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.

Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.
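
The self-skip condition above can be sketched as a plain cmdline check. This is a minimal illustration, assuming the unit's condition behaves like an exact-argument match; `has_cmdline_arg` is a hypothetical helper, not something in the veilor-os tree.

```shell
#!/usr/bin/env bash
# Sketch: would veilor-modules-lock.service self-skip on this kernel cmdline?
# has_cmdline_arg is a hypothetical helper written for this example.

# Return 0 if argument $2 appears as a whole word in the cmdline string $1.
has_cmdline_arg() {
    local cmdline=$1 wanted=$2 arg
    local -a args
    read -r -a args <<< "$cmdline"
    for arg in "${args[@]}"; do
        [[ $arg == "$wanted" ]] && return 0
    done
    return 1
}

# Read the running kernel's command line when available (Linux only).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if has_cmdline_arg "$cmdline" "module.sig_enforce=1"; then
    echo "sig_enforce on cmdline: modules-lock is skipped, resume can reload wifi modules"
else
    echo "sig_enforce absent: modules-lock would set kernel.modules_disabled=1"
fi
```

The match is on the whole argument, so `module.sig_enforce=10` or a bare `module.sig_enforce` would not count.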

## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)

Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).

Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.

## 3. USBGuard hash → id-based baseline (Agent 9, A3)

Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.

Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor re-enumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).
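
To make the interface mask concrete, here is a rough shell model of how a `cc:ss:pp` mask with `*` wildcards selects devices. It assumes `match-all` means "every interface the device exposes must match the listed masks"; both helpers are hypothetical and only mimic the rule, they are not USBGuard code.

```shell
#!/usr/bin/env bash
# Sketch: model of `allow with-interface match-all { 03:*:* }`.
# iface_matches and device_allowed are hypothetical helpers for illustration.

# True if one interface triple (class:subclass:protocol) matches the mask.
iface_matches() {
    local iface=$1 mask=$2
    # Unquoted right-hand side: [[ ]] treats $mask as a glob pattern.
    [[ $iface == $mask ]]
}

# True only if the device has at least one interface and ALL of them match.
device_allowed() {
    local mask=$1; shift
    local iface
    [ "$#" -gt 0 ] || return 1
    for iface in "$@"; do
        iface_matches "$iface" "$mask" || return 1
    done
    return 0
}

# Boot keyboard: a single HID interface.
device_allowed '03:*:*' 03:01:01 && echo "keyboard: allow"
# Composite device exposing HID plus mass storage (class 08): rule does not match.
device_allowed '03:*:*' 03:01:01 08:06:50 || echo "composite HID+storage: no match"
```

Because the rule keys on interface class rather than a device hash or port path, the same keyboard matches regardless of which dock port it is plugged into.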

## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)

User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.

Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.

Default zone stays drop. Only the tailscale0 interface gets ACCEPT.
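
A zone override of this kind might look like the following. The XML is a sketch modelled on firewalld's stock trusted zone with the interface binding added; the shipped overlay file may differ, and `write_trusted_zone` is a hypothetical helper that writes into a scratch directory rather than the real /etc.

```shell
#!/usr/bin/env bash
# Sketch: generate a trusted.xml that pre-binds tailscale0.
# write_trusted_zone is hypothetical; file contents are an assumption.

write_trusted_zone() {
    local zones_dir=$1
    mkdir -p "$zones_dir"
    cat > "$zones_dir/trusted.xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<zone target="ACCEPT">
  <short>Trusted</short>
  <description>All network connections are accepted.</description>
  <interface name="tailscale0"/>
</zone>
EOF
}

# Write into a temporary tree instead of the live system.
out=$(mktemp -d)
write_trusted_zone "$out/etc/firewalld/zones"
cat "$out/etc/firewalld/zones/trusted.xml"
```

On a running system, `firewall-cmd --get-zone-of-interface=tailscale0` is one way to confirm the binding after `tailscale up`.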

## 5. /etc/skel branding (Agent 7)

Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) started out empty, so the moment the user opened System Settings, KDE
wrote fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.

Seeds:
  /etc/skel/.config/kdeglobals    (copy of assets/kde/veilor-default.kdeglobals)
  /etc/skel/.config/breezerc      (copy of assets/kde/breezerc)
  /etc/skel/.config/kwinrc        (Plasma 6 wayland defaults: opengl, animspeed=0,
                                    blur off, click-to-focus)
  /etc/skel/.config/konsolerc     (default profile = Veilor)
  /etc/skel/.local/share/konsole/Veilor.profile + .colorscheme

User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.
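
The seeding step can be pictured roughly as below. This is a sketch under assumptions: `seed_skel` is a hypothetical helper, only two of the listed files are covered, and the asset contents are stand-ins created in a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: copy branded KDE defaults into a skel tree.
# seed_skel is hypothetical; asset contents below are placeholders.

seed_skel() {
    local assets=$1 skel=$2
    mkdir -p "$skel/.config"
    # Copies, not symlinks: each new user gets an independent file that
    # System Settings can rewrite without touching the shipped asset.
    cp "$assets/kde/veilor-default.kdeglobals" "$skel/.config/kdeglobals"
    cp "$assets/kde/breezerc" "$skel/.config/breezerc"
}

# Build a throwaway assets tree so the example runs anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/assets/kde"
printf '[General]\nColorScheme=Veilor\n' > "$demo/assets/kde/veilor-default.kdeglobals"
printf '[Common]\n' > "$demo/assets/kde/breezerc"

seed_skel "$demo/assets" "$demo/etc/skel"
ls -A "$demo/etc/skel/.config"
```

`useradd -m` copies /etc/skel into the new home directory, so files seeded this way only affect accounts created afterwards.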

## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)

Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).

Adds to bootloader cmdline (live + installed):
  i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
  rd.vconsole.keymap=us

`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).
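
One way the planned rewrite could derive the argument from /etc/vconsole.conf is sketched here. The v0.6 picker does not exist yet, so `keymap_arg_from_vconsole` and its fallback to `us` are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: turn KEYMAP= from a vconsole.conf into an rd.vconsole.keymap= arg.
# keymap_arg_from_vconsole is a hypothetical helper, not the v0.6 code.

keymap_arg_from_vconsole() {
    local conf=$1 keymap
    # KEYMAP may be quoted (KEYMAP="de-nodeadkeys") or bare (KEYMAP=us);
    # take the last assignment and strip any quotes.
    keymap=$(sed -n 's/^KEYMAP=//p' "$conf" 2>/dev/null | tail -n 1 | tr -d "\"'")
    # Fall back to the current placeholder when nothing is configured.
    echo "rd.vconsole.keymap=${keymap:-us}"
}

# Demo against a scratch file instead of the real /etc/vconsole.conf.
conf=$(mktemp)
printf 'KEYMAP="de-nodeadkeys"\nFONT=eurlatgr\n' > "$conf"
keymap_arg_from_vconsole "$conf"
```

The resulting string would still have to be written into the bootloader cmdline (for example with `grubby --update-kernel=ALL --args=...`), which this sketch leaves out.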

## 7. virtio-9p log capture (Agent 6)

The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.

test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.

No-op on real hardware (9p tag absent) — VM-only debug.
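
The EXIT-trap pattern described above can be sketched as follows. `dump_logs` is a hypothetical stand-in for `_dump_logs_to_host`: it uses `cp` instead of rsync, and a plain directory stands in for the mounted 9p share, so the mount step (something like `mount -t 9p -o trans=virtio hostlogs /mnt/hostlogs`) is deliberately omitted.

```shell
#!/usr/bin/env bash
# Sketch: copy logs to a host-visible directory from an EXIT trap.
# dump_logs is a hypothetical helper; the directories are scratch stand-ins.

dump_logs() {
    local dest=$1; shift
    local f
    # Share not present (real hardware in the original design): do nothing.
    [ -d "$dest" ] || return 0
    for f in "$@"; do
        [ -r "$f" ] && cp -- "$f" "$dest/"
    done
    return 0
}

workdir=$(mktemp -d)
echo "storage ok" > "$workdir/storage.log"
mkdir "$workdir/hostlogs"          # stands in for the mounted share

(
    # The trap is registered first, so it fires on success and on failure.
    trap 'dump_logs "$workdir/hostlogs" "$workdir/storage.log"' EXIT
    exit 3                          # simulated installer failure
) || echo "installer exited with $?"

ls "$workdir/hostlogs"
```

Keeping the guard inside the helper rather than around the trap means the installer registers the same trap everywhere and the helper decides at exit time whether there is anywhere to copy to.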

## Validate

  bash -n overlay/usr/local/bin/veilor-installer  # OK
  ksvalidator kickstart/veilor-os.ks               # clean

## Out-of-scope for v0.5.32 (deferred to v0.6)

Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.

Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
2026-05-05 15:36:24 +01:00


#version=DEVEL
# veilor-os kickstart — Fedora 43 KDE base, hardened, minimal.
# Build with livemedia-creator inside build/Containerfile.
# ── Install source ──
# Hard-code version (not $releasever) because lorax doesn't expand
# inside kickstart `url`/`repo` directives. Updates repo critical:
# base Fedora 43 ships selinux-policy 42.12 with pcre2-10.47-built
# file_contexts.bin, which fails chroot %triggerin against host's
# libselinux (built against pcre2 10.46). 43.7 in updates is rebuilt.
url --mirrorlist="https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-43&arch=x86_64"
# Explicit `repo --name=fedora` lets livecd-creator see base repo (it only
# reads repo.repoList, ignores url= directive). livemedia-creator + Anaconda
# honor both. No behavior change for either tool.
# Use a direct baseurl (download.fedoraproject.org) to avoid mirrorlist 404s
# during Fedora's metadata sync windows.
repo --name=fedora --baseurl="https://download.fedoraproject.org/pub/fedora/linux/releases/43/Everything/x86_64/os/" --install
repo --name=updates --baseurl="https://download.fedoraproject.org/pub/fedora/linux/updates/43/Everything/x86_64/" --install
# Local fix-repo: build-time-only workaround for host pcre2/libselinux skew.
# Stripped from CI ks via sed in build-iso.yml. NOT shipped state.
repo --name=veilor-fix --baseurl=file:///tmp/veilor-fix-repo --install --cost=1
# ── Locale / keyboard / time (template — adjust per build) ──
keyboard --xlayouts='us'
lang en_GB.UTF-8
timezone Europe/London --utc
# ── Install mode ──
# Note: no display mode (text/graphical/cmdline) — livemedia-creator forbids.
firstboot --disable
eula --agreed
# Build-time SELinux set to permissive to avoid PCRE2 regex version mismatch between
# host libselinux and chroot's selinux-policy file_contexts.bin (pcre2 10.46
# vs 10.47). veilor-firstboot.service triggers `fixfiles -F onboot` and
# `setenforce 1` on first boot to re-enable enforcing mode.
selinux --permissive
# veilor-modules-lock is enabled via %post after the overlay copy (the unit
# doesn't exist yet at the services-config phase). veilor-firstboot is masked
# on the live image and enabled only by the generated install kickstart.
services --enabled=sshd,fail2ban,usbguard,tuned,auditd,firewalld,chronyd,sddm
# ── Network / hostname ──
network --bootproto=dhcp --device=link --activate --hostname=veilor-os
firewall --enabled --service=ssh
# ── Identity (zero-prompt; only LUKS passphrase asked at install) ──
# Note: `auth` command removed in pykickstart 3.x — defaults (sha512 shadow) apply.
rootpw --lock
user --name=admin --groups=wheel --gecos="veilor admin" --password="" --plaintext
# ── Bootloader: kernel hardening flags ──
# Note: init_on_alloc/init_on_free removed from default live cmdline —
# they zero every memory page at boot which 5x'd KVM live boot time.
# Re-enable per-install via veilor-firstboot.service for production.
# `fbcon=nodefer` keeps the linux framebuffer console alive across the
# KMS modeset that intel/amdgpu/nvidia drivers do during userspace init.
# Without it, on real hardware the screen blanks the moment the GPU
# driver loads and the installer's tty1 redraw lands on a frozen
# framebuffer — symptom: black screen with blinking cursor for ~30s
# while the menu IS in fact rendered, just not painted. virtio-vga in
# QEMU doesn't trigger this so it never reproed in VM.
bootloader --location=mbr --append="lockdown=integrity module.sig_enforce=1 slab_nomerge randomize_kstack_offset=on vsyscall=none plymouth.enable=0 fbcon=nodefer i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1 rd.vconsole.keymap=us"
# ── Live ISO partitioning (flat — for live rootfs build only) ──
# NOTE: This is the *live* image kickstart. Final installed system uses
# a separate installer kickstart (kickstart/install.ks, planned v0.2.1)
# that does LUKS2 + btrfs subvols on the target disk.
part / --fstype=ext4 --size=8192
# ── Packages ──
%packages --excludedocs
@^kde-desktop-environment
@kde-apps
@core
@hardware-support
@standard
# live install plumbing (required by livemedia-creator --make-iso)
# CRITICAL: livesys-scripts + anaconda-live ship the systemd units lorax expects
# at squashfs creation. Without them, EFI/BOOT not built and ISO wrap fails.
# (Upstream Fedora's fedora-live-kde.ks includes these via fedora-live-base.ks.)
livesys-scripts
anaconda-live
@anaconda-tools
kernel-modules
kernel-modules-extra
glibc-all-langpacks
dracut-live
dracut-config-generic
kernel
grub2-efi-x64
grub2-efi-x64-modules
grub2-pc
grub2-pc-modules
grub2-tools
grub2-tools-extra
shim-x64
efibootmgr
syslinux
isomd5sum
xorriso
# veilor-installer dependencies (TTY1 TUI installer wrapping anaconda)
newt
parted
cryptsetup
lvm2
btrfs-progs
# core hardening tools
fail2ban
fail2ban-firewalld
usbguard
usbguard-tools
audit
policycoreutils-python-utils
tuned
chrony
firewalld
plymouth
# admin essentials
git
vim-enhanced
tmux
htop
podman
skopeo
NetworkManager
NetworkManager-wifi
# fonts
fontconfig
freetype
fira-code-fonts
# remove fluff
# Note: KDE Plasma 6 hard-deps on cups/geoclue2/ModemManager/PackageKit
# transitively (plasma-print-manager, xdg-desktop-portal, NM-wwan etc),
# so package removal breaks depsolve. Daemons disabled at runtime via
# scripts/20-harden-kernel.sh instead.
-abrt*
-snapd
-kde-connect
-open-vm-tools-desktop
-mlocate
%end
# ── Post-install (nochroot): copy overlay tree into installed root ──
%post --nochroot --erroronfail
set -uo pipefail
# DEST: livecd-creator sets INSTALL_ROOT, livemedia-creator uses /mnt/sysimage.
DEST="${INSTALL_ROOT:-/mnt/sysimage}"
[[ -d $DEST ]] || { echo "[ERR] DEST=$DEST does not exist (livecd-creator? livemedia-creator?)" >&2; exit 1; }
# Try multiple source paths:
# /run/install/repo/veilor — boot ISO (--virt mode)
# /work — bind mount in CI container
# $(dirname kickstart)/.. — local --no-virt builds
SRC=""
for candidate in /run/install/repo/veilor /work /mnt/work; do
    if [[ -d $candidate/overlay ]]; then
        SRC=$candidate
        break
    fi
done
# Fallback: derive from kickstart path. Anaconda passes ks via --kickstart=<path>.
if [[ -z $SRC ]]; then
    KS_PATH=$(ps -ef | grep -oP -- '--kickstart[= ]\K[^ ]+' | head -1)
    if [[ -n $KS_PATH && -d $(dirname "$KS_PATH")/../overlay ]]; then
        SRC=$(realpath "$(dirname "$KS_PATH")/..")
    fi
fi
if [[ -z $SRC ]]; then
    echo "[ERR] cannot locate veilor-os repo source — overlay/scripts not copied" >&2
    exit 1
fi
echo "[INFO] using SRC=$SRC DEST=$DEST"
set -x
cp -a "$SRC/overlay/." "$DEST/" || echo "[ERR] overlay cp failed: $?"
mkdir -p "$DEST/usr/share/veilor-os" || echo "[ERR] mkdir failed: $?"
ls -la "$SRC/assets" "$SRC/scripts" 2>&1 || echo "[ERR] assets/scripts missing in $SRC"
cp -a "$SRC/assets" "$DEST/usr/share/veilor-os/" || echo "[ERR] assets cp failed: $?"
cp -a "$SRC/scripts" "$DEST/usr/share/veilor-os/" || echo "[ERR] scripts cp failed: $?"
ls -la "$DEST/usr/share/veilor-os/" 2>&1 || echo "[ERR] dest dir missing post-cp"
# Force root ownership on everything we copied — `cp -a` preserves
# CI runner uid (1001), which makes sudo refuse to read /etc/sudoers.d.
chown -R 0:0 "$DEST/etc" "$DEST/usr/share/veilor-os" "$DEST/usr/local/bin" 2>&1 || echo "[WARN] chown failed"
set +x
# Persist nochroot log into installed system for diagnostics
{
echo "=== %post --nochroot trace ==="
date
echo "SRC=$SRC DEST=$DEST"
ls -la "$DEST/usr/share/veilor-os/" 2>&1
ls -la "$DEST/usr/local/bin/" 2>&1
} > "$DEST/var/log/veilor-nochroot.log" 2>&1 || true
%end
# ── Post-install (chroot): apply hardening, theme, branding ──
%post
set -uo pipefail
exec > >(tee -a /var/log/veilor-install.log) 2>&1
echo "════════════════════════════════════════════════════════"
echo " veilor-os install — %post"
echo "════════════════════════════════════════════════════════"
REPO=/usr/share/veilor-os
chmod +x $REPO/scripts/*.sh $REPO/scripts/selinux/*.sh /usr/local/bin/veilor-power /usr/local/bin/veilor-update /usr/local/bin/veilor-doctor /usr/local/bin/veilor-firstboot /usr/local/bin/veilor-installer
# Live image plumbing (matches upstream Fedora live ks). Without these the
# squashfs/EFI build fails — livesys-scripts ships systemd units lorax expects.
systemctl enable livesys.service livesys-late.service 2>/dev/null || true
systemctl enable tmp.mount 2>/dev/null || true
# /etc/machine-id reset on first boot (live image baseline)
> /etc/machine-id
# Apply hardening
bash $REPO/scripts/10-harden-base.sh
bash $REPO/scripts/20-harden-kernel.sh
# Build SELinux module
bash $REPO/scripts/selinux/build-policy.sh || echo "[WARN] SELinux build failed; load on first boot"
# Apply KDE theme + DuckSans + os-release branding
bash $REPO/scripts/kde-theme-apply.sh
bash $REPO/scripts/30-apply-v03-theme.sh || echo "[WARN] v03-theme apply failed"
# Force admin password set on first boot.
# livecd-creator does NOT honor `user` kickstart directive (it's a LIVE
# image, no installer step). Create admin manually in chroot %post.
# Note: SDDM rejects blank passwords by default (PAM nullok off), so we
# set throwaway pw `veilor` + chage -d 0 to force reset on first login.
if ! getent passwd admin >/dev/null; then
    useradd -m -G wheel -s /bin/bash -c "veilor admin" admin
    echo 'admin:veilor' | chpasswd
    chage -d 0 admin
    echo "[INFO] admin user created (default pw=veilor, expired)"
fi
# Symlink display-manager.service → sddm.service. graphical.target Wants=
# display-manager but the alias doesn't get auto-created when sddm package
# is installed via livecd-creator (vs Anaconda installer which handles it).
# Without this, sddm stays inactive even though enabled.
ln -sf /usr/lib/systemd/system/sddm.service /etc/systemd/system/display-manager.service
# Live ISO default target: multi-user (TTY1 = veilor-installer TUI lands first).
# User picks "Try live — desktop" from menu → systemctl isolate graphical.target.
# Real installs land on graphical.target by default (set by anaconda).
systemctl set-default multi-user.target
# Branding: GRUB menu title + plymouth `details` text theme (no graphical
# splash). Pure text-scroll boot exposes the gum installer immediately on
# tty1 instead of plymouth swallowing it.
sed -i \
-e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \
-e 's|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=""|' \
/etc/default/grub 2>/dev/null || true
plymouth-set-default-theme details 2>/dev/null || true
[ -f /boot/grub2/grub.cfg ] && grub2-mkconfig -o /boot/grub2/grub.cfg 2>/dev/null || true
# zram swap (no disk swap; keys never leak to platter)
dnf install -y zram-generator || true
cat > /etc/systemd/zram-generator.conf << 'EOF'
[zram0]
zram-size = min(ram, 8192)
compression-algorithm = zstd
EOF
# Patch anaconda's transaction_progress.py inside the live rootfs so that
# when the user clicks "Install", a non-fatal RPM 6.0 *scriptlet* warning
# does not get escalated to "An error occurred during the transaction"
# and abort.
#
# The patch applied below is BROAD (v0.5.30, see history): it replaces the
# whole `elif token == 'error':` branch in the consumer
# (`process_transaction_progress`) with a log warning, so cpio_error,
# script_error, unpack_error and the generic error all stop raising
# PayloadInstallationError. The known side effect (a silently failed
# grub2-efi-x64 scriptlet) is handled by installing the bootloader
# explicitly in the generated install ks.
#
# Why a patch is needed at all: Fedora 43 ships RPM 6.0, which changed
# scriptlet failure propagation (Fedora wiki Changes/RPM-6.0; dnf5 issue
# 2507). Scriptlets that previously emitted "Non-critical error"
# warnings now bubble up as transaction-level errors. man-db's
# `transfiletriggerin` (`systemd-run /usr/bin/systemctl start
# man-db-cache-update`) is the most common trigger: it exits non-zero in
# the anaconda chroot, RPM-6.0-aware dnf5 reports it as an error, and
# anaconda --cmdline aborts.
#
# History of this patch:
#
# v0.5.28: BROAD patch — overrode `process_transaction_progress` so all
# four 'error' token producers (cpio_error, script_error, unpack_error,
# generic error) became log warnings. man-db scriptlet stopped killing
# the install. BUT silent grub2-efi-x64 scriptlet failure left
# /boot/efi/EFI/fedora/ incomplete → gen_grub_cfgstub failed.
#
# v0.5.29: NARROW patch — overrode only `script_error` callback. Caught
# the per-package scriptlet failures cleanly. BUT dnf5 still tracks
# its own internal error counter and emits a final aggregate
# `error("transaction process has ended with errors..")` at end of
# transaction, which still raised PayloadInstallationError. Install
# aborted before bootloader install ran.
#
# v0.5.30: BROAD patch + bootloader --location=none in install ks.
# This time we silence the aggregate error too, so install completes,
# but anaconda is told NOT to install bootloader itself. The
# generated install ks's chroot %post does it explicitly via
# `dnf reinstall grub2-efi-x64 shim-x64 + grub2-install +
# grub2-mkconfig + efibootmgr`. The chroot has PID 1 systemd state
# from the live ISO (not the target), so scriptlets get a real
# environment to run in, not anaconda's truncated chroot. This
# sidesteps gen_grub_cfgstub entirely.
TP=/usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/transaction_progress.py
if [ -f "$TP" ]; then
    cp -a "$TP" "${TP}.veilor-bak"
    # Replace the entire `elif token == 'error':` branch with log+continue.
    # Pattern matches the original two-line block (log.error + raise).
    python3 - "$TP" <<'PYEOF'
import re
import sys

path = sys.argv[1]
with open(path) as fh:
    src = fh.read()

# Match: elif token == 'error': / log.error(msg) / raise PayloadInstallationError(...)
# or an earlier veilor substitution (log.warning) at the same level.
# Group 1 captures the body indentation so the replacement stays valid
# Python whatever indent the installed file uses.
pattern = re.compile(
    r"elif token == 'error':\n([ \t]+)log\.error\(msg\)\n[ \t]+"
    r"(?:raise PayloadInstallationError\(\"An error occurred during the transaction: \" \+ msg\)"
    r"|log\.warning\(\"veilor: ignoring non-fatal transaction error: %s\", msg\))"
)


def suppress(match):
    indent = match.group(1)
    return (
        "elif token == 'error':\n"
        f"{indent}log.warning('veilor: suppressed dnf5 transaction error (RPM 6.0 cmdline regression): %s', msg)\n"
        f"{indent}# Do not raise: anaconda --cmdline + dnf5 + RPM 6.0 emits this for any scriptlet\n"
        f"{indent}# failure; we handle bootloader install manually in install ks %post chroot"
    )


new, count = pattern.subn(suppress, src, count=1)
if count == 0:
    print("[ERR] transaction_progress.py error-branch not found")
    sys.exit(1)
with open(path, "w") as fh:
    fh.write(new)
print("[OK] transaction_progress.py: broad error-branch suppressed")
PYEOF
    if grep -q "veilor: suppressed dnf5 transaction error" "$TP"; then
        rm -f /usr/lib64/python3.14/site-packages/pyanaconda/modules/payloads/payload/dnf/__pycache__/transaction_progress.*.pyc 2>/dev/null || true
        echo "[OK] anaconda transaction_progress.py patched (broad error suppression)"
    else
        echo "[WARN] transaction_progress.py patch did not apply"
    fi
else
    echo "[WARN] transaction_progress.py not found at expected path"
fi
# Enable services
# veilor-firstboot.service NOT enabled on live ISO — it prompts admin pw
# which makes no sense on a live boot. Real installs enable it in their
# generated kickstart's chroot %post (see overlay/usr/local/bin/veilor-installer).
systemctl enable veilor-modules-lock.service
systemctl enable sshd fail2ban usbguard tuned auditd firewalld chronyd
# Mask veilor-firstboot on live so even if it landed in /etc/systemd/system
# (overlay drag), it can't activate.
systemctl mask veilor-firstboot.service 2>/dev/null || true
# Default tuned profile = balanced (AC/battery udev rule will override)
tuned-adm profile veilor-balanced 2>/dev/null || true
# Lock root explicitly (kickstart --lock should already do this)
passwd -l root
# Sanity: zero references to onyx / personal IPs in installed system
if grep -rqi 'onyx\|192\.168\.0\.\|fedora\.local' /etc/veilor* /etc/tuned/profiles/veilor-* 2>/dev/null; then
echo "[ERR] brand leak detected in /etc — investigate"
fi
echo "════════════════════════════════════════════════════════"
echo " veilor-os install complete"
echo "════════════════════════════════════════════════════════"
%end