Commit graph

9 commits

Author SHA1 Message Date
veilor-org
3b87a63a88 docs: METHOD-CHANGELOG — 2026-05-06 entry for Forgejo build path
Append a v0.5.32 entry documenting that ISO builds moved from GitHub
Actions to the Forgejo runner on nullstone, that the hybrid VM test
procedure is unchanged, that the ISO is now pulled from the
ci-latest Forgejo release tag, and that virtio-9p host log capture
is on by default (replaces the broken virtio-serial path flagged by
Agent 6 in the 2026-05-05 wave).

TESTING.md itself is intentionally untouched — only the build host
and download URL changed; the procedure prose still applies as-is.
2026-05-06 10:32:30 +01:00
veilor-org
b86b4f9ec3 v0.5.32: ship 7 blockers from 9-agent wave
Per docs/research/2026-05-05-agent-wave/README.md priority list.
All 7 land together to keep iteration cycles useful — partial fixes
bury the lookahead findings agents already mapped.

## 1. CRITICAL — suspend/resume wifi death (Agent 9, B2)

`veilor-modules-lock.service` runs `kernel.modules_disabled=1` 30s
after graphical.target. iwlwifi/iwlmvm/cfg80211 reload on resume
from S3/S0ix → with modules locked, resume breaks wifi until
reboot. Same architectural class as the LUKS bug — security feature
breaks legitimate kernel state transitions.

The unit already has `ConditionKernelCommandLine=!module.sig_enforce=1`
(self-skip when signed-modules enforcement is on cmdline). Adding
`module.sig_enforce=1` to the kernel cmdline retains the security
property (no unsigned modules) without runtime lock-down → resume
works.

Files: kickstart/veilor-os.ks line 61 + overlay/usr/local/bin/veilor-installer
generated bootloader directive both gain `module.sig_enforce=1`.

## 2. veilor-firstboot.service WantedBy=graphical.target (Agent 2)

Was `WantedBy=multi-user.target` only. Real installs default to
graphical.target so the unit never ran on installed systems — admin
pw stayed at install-time + chage -d 0 expired, SDDM PAM bounced
to chauthtok screen (recoverable but ugly UX).

Now `WantedBy=graphical.target multi-user.target`. Live ISO +
multi-user installs both resolve via this list.

## 3. USBGuard hash → id-based baseline (Agent 9, A3)

Mirrors memory feedback_usbguard_dock.md — onyx had hash+parent-hash
rules that broke on dock replug; we shipped no rules.conf so first
boot blocks the USB keyboard.

Adds overlay/etc/usbguard/rules.conf with HID-class allow rule
(`allow with-interface match-all { 03:*:* }`) — covers every USB
keyboard, mouse, gamepad, fingerprint reader, NFC. Survives dock
replug + kernel-bump vendor renumeration. Mass-storage stays
implicit-block; user explicitly allows post-firstboot via
`ujust veilor-usbguard-enroll` (planned v0.6).

## 4. firewalld trusted zone with tailscale0 pre-bound (Agent 9, D1)

User uses Tailscale daily (memory: project_tailscale_mesh.md).
Default firewalld zone = drop, blocks tailnet traffic on tailscale0.

Adds overlay/etc/firewalld/zones/trusted.xml with
`<interface name="tailscale0"/>`. After `tailscale up` brings the
interface up, NetworkManager dispatcher associates it with the
trusted zone automatically — no user intervention.

Default zone stays drop. Only the tailscale0 interface gets ACCEPT.

## 5. /etc/skel branding (Agent 7)

Was completely empty. Result: per-user KDE config (~/.config/kdeglobals
etc.) pre-empty, so the moment user opened System Settings, KDE wrote
fresh ~/.config/* and silently shadowed our /etc/xdg/kdedefaults/*.
Visual brand evaporated on first click.

Seeds:
  /etc/skel/.config/kdeglobals    (copy of assets/kde/veilor-default.kdeglobals)
  /etc/skel/.config/breezerc      (copy of assets/kde/breezerc)
  /etc/skel/.config/kwinrc        (Plasma 6 wayland defaults: opengl, animspeed=0,
                                    blur off, click-to-focus)
  /etc/skel/.config/konsolerc     (default profile = Veilor)
  /etc/skel/.local/share/konsole/Veilor.profile + .colorscheme

User who opens System Settings now writes against branded baseline,
not against vanilla Breeze.

## 6. KMS modeset args + initramfs keymap (Agents 1 + 9)

Real laptop boot has a 5-15s blank between vt switch and SDDM start
because simpledrm releases before i915/nvidia-drm/amdgpu claim. Plus
non-US users get locked out at LUKS prompt because initramfs ships
en-US keymap by default (RHBZ 1405539, RHBZ 1890085).

Adds to bootloader cmdline (live + installed):
  i915.modeset=1 amdgpu.modeset=1 nvidia-drm.modeset=1
  rd.vconsole.keymap=us

`rd.vconsole.keymap=us` is a placeholder; the v0.6 firstboot keymap
picker will rewrite it from /etc/vconsole.conf. Until then, en-US
users get correct LUKS keyboard; non-US users still need the v0.6
fix (per Agent 1).

## 7. virtio-9p log capture (Agent 6)

The v0.5.30 virtio-serial wiring depends on rsyslog inside the live
ISO (anaconda's setupVirtio writes a rsyslog forward rule), which
the live ks doesn't install — files were 0-byte across three
install runs.

test/run-vm.sh now adds a `-virtfs local,...,mount_tag=hostlogs`
share pointing at `test/test-runs/<timestamp>/`. veilor-installer
runs `_dump_logs_to_host` via EXIT trap that mounts the share at
/mnt/hostlogs and rsyncs /tmp/{anaconda,program,storage,packaging,dnf}.log
+ /var/log/veilor-installer.log + dmesg + journalctl + the generated
ks. Runs on success AND failure AND ^C.

No-op on real hardware (9p tag absent) — VM-only debug.

## Validate

  bash -n overlay/usr/local/bin/veilor-installer  # OK
  ksvalidator kickstart/veilor-os.ks               # clean

## Out-of-scope for v0.5.32 (deferred to v0.6)

Per Agent 1 follow-ups: argon2id retune for slow CPUs, recovery key
generation in firstboot, TPM2/FIDO2 unlock helpers. Per Agent 9
follow-ups: Plasma Wayland fallback X11 install, lid-close handling,
SELinux relabel progress UX. Per Agent 4: AppArmor stack +
nftables preset + audit log shipping CLI.

Per Agent 8 (CI hardening): SHA-pin actions + dependabot + SBOM +
SLSA L3 attestation — separate workflow-only commit.
2026-05-05 15:36:24 +01:00
veilor-org
e83483a077 v0.5.30: broad error suppression + manual bootloader + virtio log capture
Three-layer fix for the persistent anaconda transaction failure that
killed v0.5.28 (gen_grub_cfgstub) and v0.5.29 (aggregate dnf5 error).

## Layer 1: broad error suppression in transaction_progress.py

dnf5 under RPM 6.0 + cmdline anaconda emits a final aggregate
`error("transaction process has ended with errors..")` at end of
transaction whenever its internal failure counter > 0, regardless of
whether we suppressed individual script_error events. Reproduced
twice. The narrow patch in v0.5.29 suppressed per-package errors but
the aggregate still raised PayloadInstallationError and aborted the
install before the bootloader phase ran.

v0.5.30 patch turns the `elif token == 'error':` branch in
process_transaction_progress into a log.warning. All four producers
(cpio_error, script_error, unpack_error, generic error) now flow
through to a warning + continue. Pattern matches both the original
anaconda layout AND the v0.5.29 narrow-patched layout, so re-applying
on top of either is a no-op.

This brings us back to v0.5.28 broad-suppression behaviour. The
side effect that bit us in v0.5.28 (silent grub2-efi-x64 scriptlet
failure → empty /boot/efi/EFI/fedora/ → gen_grub_cfgstub fails)
is addressed by Layer 2 below.

## Layer 2: bootloader install moved out of anaconda

The generated install kickstart now has `bootloader --location=none`,
which tells anaconda NOT to invoke its own bootloader install code
path (and therefore NOT to call gen_grub_cfgstub). All grub work
moves into the chroot %post block:

  1. `dnf reinstall grub2-efi-x64 grub2-pc grub2-tools shim-x64
     efibootmgr` — re-runs scriptlets in the chroot with full
     PID 1 systemd state, so the systemd-run-style triggers that
     anaconda's chroot truncates actually execute.
  2. `grub2-install --target=x86_64-efi --efi-directory=/boot/efi
     --bootloader-id=fedora --no-nvram` — populates /boot/efi/EFI/fedora/
  3. `gen_grub_cfgstub /boot/grub2 /boot/efi/EFI/fedora` (or
     `grub2-mkconfig` fallback) — writes /boot/efi/EFI/fedora/grub.cfg.
  4. `efibootmgr -c -d <disk> -p <part> -L "veilor-os" -l \EFI\fedora\shimx64.efi`
     — registers the NVRAM boot entry pointing at the signed shim.

Each step logs to stdout and continues on failure (`set +e` block);
diagnostics surface in the install log without aborting the whole
%post.

## Layer 3: virtio-serial log capture in run-vm.sh

Anaconda 43.x autodetects `/dev/virtio-ports/org.fedoraproject.anaconda.log.0`
and streams program/packaging/storage/anaconda logs through it in
real time, before any tmpfs / pivot, before networking, surviving
kernel panic. Wiring it into run-vm.sh means the host gets a
tail-able log file at `test/anaconda-vm-YYYYMMDD-HHMMSS.log` for
every VM run.

We've lost logs three times in a row to anaconda failures + tmpfs
reboots. This breaks the loop.

## Diagnostic story

Before this commit: VM aborts → live ISO reboots itself → /tmp/
tmpfs gone → no logs → guess what failed. Three days, two and a
half false fixes.

After this commit: VM aborts → host has /home/admin/ai-lab/_github/veilor-os/test/anaconda-vm-*.log
with the actual scriptlet output, the actual exit codes, the
actual file-trigger failures. Future debug becomes evidence-based.

Files changed:
  kickstart/veilor-os.ks        — broad error suppression patch
  overlay/usr/local/bin/veilor-installer — --location=none + manual grub
  test/run-vm.sh                — virtio-serial chardev wiring

Verified: bash -n clean, ksvalidator clean.
2026-05-05 11:59:35 +01:00
veilor-org
1881c14ea7 v0.5.27: rd.luks.uuid via grubby, GRUB rebrand, fbcon=nodefer, ASCII gum cursor
Critical install bug fix + cosmetic round-up + first formal test
procedure document.

## Critical: LUKS unlock on first boot

Generated installer kickstart's %post was injecting `rd.luks.uuid=…`
into `/etc/default/grub` only. Fedora 43 uses BLS (Boot Loader
Specification) entries in `/boot/loader/entries/*.conf`; those are
NOT regenerated by `grub2-mkconfig`. Result: the kernel boots without
`rd.luks.uuid=`, dracut's cryptsetup-generator never spawns the
unlock unit, plymouth has no password to ask for, and dracut-initqueue
loops on dev-disk-by-uuid for ~3min before dropping to emergency
shell.

The fix layers both write paths:
- `/etc/default/grub` — keeps the args around for future kernels
  (kernel-install reads this when adding new entries).
- `grubby --update-kernel=ALL --args=...` — rewrites the `options`
  line of every existing BLS entry so the kernel that boots NEXT
  actually has the args.

Verified by reading `/proc/cmdline` from the dracut emergency shell
on a v0.5.26 install; old cmdline had only `root=UUID=… ro
rootflags=subvol=root` and was missing the LUKS arg entirely.

## GRUB / branding

- `/etc/default/grub` is sed'd to `GRUB_DISTRIBUTOR="veilor-os"` (was
  already there, kept).
- BLS entries' `title` line is rewritten in-place to "veilor-os
  (<kver>)" for every kernel — `grub2-mkconfig` does not touch BLS
  titles, so this is the only path.
- `/boot/loader/entries/*-0-rescue-*.conf` is removed: the auto-built
  rescue entry was leaking "Fedora Linux" into the GRUB menu and
  showing a second boot option that nobody asked for. The rescue
  kernel image itself is left in /boot.
- Hostname defaults to `veilor` (was inheriting the `localhost-live`
  name anaconda writes when the kickstart's network directive is
  ignored under cmdline mode).
- `/etc/machine-info` adds `PRETTY_HOSTNAME="veilor-os"` so
  `hostnamectl status` and any consumer reading machine-info see the
  brand.

## Boot UX

- `fbcon=nodefer` added to live-ISO bootloader cmdline. On real
  laptops with a hardware GPU, the kernel modeset blanks the
  framebuffer console mid-boot; without `nodefer` the installer
  banner draws into a frozen framebuffer and the user sees a black
  screen with a blinking cursor for ~30s. virtio-vga in QEMU doesn't
  trigger this so it never reproduced in VM. Symptom report on
  v0.5.26 was the trigger to investigate.

## Installer cosmetics

- `GUM_CHOOSE_CURSOR` and `GUM_INPUT_PROMPT` switched from `❯ ` to
  `> `. The unicode arrow falls back to a fixed-width block on the
  linux fbcon font and lipgloss then duplicates that block at col +23,
  producing the "Install Install" double-render and the stray-T
  artifact in password fields. Plain ASCII renders identically across
  fbcon, virtio-vga, and X/Wayland gum runs.
- `VERSION_ID` bumped 0.5.8 → 0.5.27 in the os-release drop-in. The
  installer banner reads this at runtime, so the live ISO + installed
  system both now show "veilor-os 0.5.27".

## Test procedure

- `test/TESTING.md` — first canonical test procedure document. Splits
  VM (cheap iteration, hybrid sendkey + human passwords) from real
  hardware (mandatory for tag). Documents the standard test passwords
  (`veilortest1` for both LUKS and admin), the kill-and-relaunch step
  to skip CD on second boot, and the per-step pass/fail contract.
- `test/METHOD-CHANGELOG.md` — append-only audit trail for changes to
  the procedure. Future releases that alter the test method must add
  an entry here with the why.
- `test/test-runs/_TEMPLATE.md` — per-run report template. Each
  tagged release should land a filled report alongside it.

## test/run-vm.sh

Decoupled QEMU monitor sock setup from auto-inject. Previously
`NO_INJECT=1` (used to suppress autotype noise into prompts) also
killed the monitor sock, leaving the VM undriveable. Monitor sock is
now always exposed; only the inject helper is gated on the pubkey
detection.
2026-05-05 01:43:00 +01:00
veilor-org
9921745c9d test/auto-install.sh: auto-fetch + reassemble chunked ISO from ci-latest
Workflow now publishes ISO as 1900M chunks. Test harness needs to:
1. gh release download --pattern '*.iso.part-*'
2. cat parts back into single ISO
3. verify sha256 of all parts

If invoked with no ISO arg, auto-fetches from ci-latest release.
Falls back to local ISO path if given as arg (existing behavior).

Reassembled ISO lives at ~/veilor-iso/ci-latest/.
2026-05-02 22:50:37 +01:00
s8n
deef914064 v0.5.5: autonomous install test harness (#12)
test/auto-install.sh boots ISO, drives gum installer via QEMU
monitor sendkey with hardcoded test answers, waits for anaconda,
reboots into installed system, SSHs in, runs validation checklist.

Co-authored-by: veilor-org <admin@veilor.org>
2026-05-02 22:49:51 +01:00
s8n
09f7c1f753 build: wire 30-apply-v03-theme.sh into ks %post + SSH key auto-inject in run-vm.sh (#1)
Co-authored-by: veilor-org <admin@veilor.org>
2026-05-02 04:38:23 +01:00
veilor
33a0673126 test: add VM runner — qemu+OVMF wrapper for fast iso iteration loop 2026-04-30 04:06:19 +01:00
veilor
1822005df1 veilor-os v0.1 scaffold — kickstart + hardening + 3-mode power + DuckSans-ready KDE black theme 2026-04-30 03:43:33 +01:00