# Real-hardware failure mode audit (post-v0.5.31)
**Agent 9 of 9-agent wave, 2026-05-05.** Pessimistic enumeration.
## A. Boot path
### A1. Secure Boot + GRUB_DISTRIBUTOR rebrand
- shim chain itself untouched (uses `/EFI/fedora/`), but `grub2-mkconfig`
  regenerates entries naming `veilor-os` while shim only trusts paths
  under `/EFI/fedora/grubx64.efi`. Strict UEFI: menu boots, kernel
  signatures verify via Fedora's MOK chain. Risk: `os-prober` writing
  dual-boot Windows entries breaks MBR/MOK.
- **Symptom:** dual-boot with Windows shows
  `Verification failed: (0x1A) Security Violation`.
- **Prob:** MED. **Fix:** S. **Target:** v0.5.32.
### A2. KMS handoff — `fbcon=nodefer` necessary but not sufficient
- On late-gen Intel Arc/iGPU and NVIDIA proprietary chains, the screen
  stays blank for 5-15s between the VT switch and SDDM start, because
  `simpledrm` releases the framebuffer before `i915`/`nvidia-drm`
  claims it.
- **Symptom:** ~10s blank pre-SDDM; user thinks the machine crashed.
- **Prob:** HIGH. **Fix:** S (add `i915.modeset=1
  nvidia-drm.modeset=1 amdgpu.modeset=1` to the kernel cmdline). SDDM
  `Type=simple` startup. **Target:** v0.5.32.
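
A minimal sketch of how the modeset arguments could be appended without duplicating any that are already set. The helper name `merge_cmdline_args` is hypothetical and not part of the existing scripts; it only builds the string and does not touch the bootloader.

```shell
# merge_cmdline_args "CURRENT CMDLINE" ARG...
# Hypothetical helper: prints CURRENT CMDLINE with each ARG appended,
# skipping any ARG that is already present as a whole word.
merge_cmdline_args() {
    merged=$1
    shift
    for arg in "$@"; do
        case " $merged " in
            *" $arg "*) ;;                         # already present
            *) merged="${merged:+$merged }$arg" ;; # append, no leading space
        esac
    done
    printf '%s\n' "$merged"
}
```

On Fedora the result would typically be applied with something like `grubby --update-kernel=ALL --args="..."`; whether that is the right hook for the veilor build is an assumption.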
### A3. USBGuard hash-based rules
- `scripts/20-harden-kernel.sh:127-131` ships **empty** rules.conf
  with `ImplicitPolicyTarget=block`. First boot, admin runs
  `usbguard generate-policy`. Per `feedback_usbguard_dock.md`, this
  writes hash+parent-hash rules that break on dock replug.
- **Symptom:** keyboard/mouse dies on first dock unplug-replug.
- **Prob:** HIGH. **Fix:** M (patch invocation to use `--no-hashes`
  (`-X`), or ship `veilor-usbguard-enroll` wrapper).
  **Target:** v0.5.32 (same lesson we already learned).
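
For a rules file that was already generated with hashes, one possible cleanup is to strip the `hash` and `parent-hash` attributes so the remaining rules match on id, serial and name only. This is a sketch under that assumption; `strip_usbguard_hashes` is a hypothetical helper, not something the repo ships.

```shell
# strip_usbguard_hashes
# Hypothetical filter: reads USBGuard rules on stdin and writes them to
# stdout with every `hash "..."` and `parent-hash "..."` attribute removed.
strip_usbguard_hashes() {
    sed -E 's/ (parent-)?hash "[^"]*"//g'
}
```

Usage would look like `strip_usbguard_hashes < rules.conf > rules.conf.new`, followed by a manual review before the new file replaces the old one.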
### A4. Wifi/Bluetooth firmware
- `@hardware-support` pulls `linux-firmware` etc.
- Realtek RTL8852 / MediaTek MT7921 firmware ships in
  `linux-firmware-whence` only.
- **Prob:** LOW. **Fix:** S (add explicit `linux-firmware-whence`).
  **Target:** v0.5.32.
### A5. Bluetooth disabled at boot
- `scripts/20-harden-kernel.sh:111` disables `bluetooth` service.
  BT keyboards/mice don't pair until user enables service.
- **Prob:** MED (laptop users). **Fix:** S (leave bluetooth.service
  enabled, mask `obex` only). **Target:** v0.6.
## B. First-boot KDE session
### B1. Plasma 6 Wayland fallback on hybrid graphics
- SDDM config doesn't pin session. On NVIDIA Optimus + intel-iris the
  Wayland session silently falls back to X11 on some HW.
- **Symptom:** screen tearing, no fractional scaling.
- **Prob:** MED. **Fix:** S (add `[Autologin] Session=plasma`
  and `[General] DefaultSession=plasma.desktop`). **Target:** v0.6.
### B2. SUSPEND/RESUME KILLS WIFI — THE BIG ONE 🚨
- `veilor-modules-lock.service` sets `kernel.modules_disabled=1` 30s
  after graphical.target. `iwlwifi`, `iwlmvm`, `cfg80211` reload on
  resume from S3/S0ix. With modules locked: **resume → permanent
  wifi death until reboot**. Same for `nvidia` autoload, `xhci_pci`
  re-init on dock attach.
- **Symptom:** close laptop lid → reopen → no wifi, no dock USB,
  until reboot.
- **Prob:** VERY HIGH (every laptop user, day 1).
- **Fix:** M. Gate lock on `ConditionACPower=true` and defer it around
  suspend (`kernel.modules_disabled=1` is one-way and cannot be reset
  until reboot), OR move from `modules_disabled` to
  `module.sig_enforce=1` kernel cmdline (no runtime lock needed).
- **Target:** v0.5.32, **BLOCKER**.
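
If the runtime lock is kept, one way to reduce the blast radius is to refuse to lock while modules the machine depends on are not yet loaded. The sketch below assumes a hypothetical helper, `missing_modules`, fed a file in `/proc/modules` format; the module list to check is also an assumption, and drivers built into the kernel never appear in that file.

```shell
# missing_modules MODULES_FILE MODULE...
# Hypothetical pre-lock check: prints each MODULE that is not listed in
# MODULES_FILE (same layout as /proc/modules: module name in column 1).
missing_modules() {
    modules_file=$1
    shift
    for mod in "$@"; do
        # The kernel reports module names with underscores, never dashes.
        name=$(printf '%s' "$mod" | tr '-' '_')
        awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file" \
            || printf '%s\n' "$mod"
    done
}
```

A lock service could call `missing_modules /proc/modules iwlwifi iwlmvm cfg80211` and skip writing `kernel.modules_disabled=1` when the output is non-empty.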
### B3. Lid-close handling
- `logind.conf` not modified. Defaults `HandleLidSwitch=suspend`.
  Combined with B2, every lid close = wifi loss.
- **Prob:** HIGH. **Fix:** S. **Target:** v0.5.32 (paired with B2).
## C. Day-2 ops
### C1. `/etc/default/grub` + `/etc/kernel/cmdline` drift
- Kickstart writes `GRUB_CMDLINE_LINUX_DEFAULT=""`. Real installer
  writes `/etc/kernel/cmdline` with LUKS `rd.luks.*` args.
  `kernel-install` reads the latter; `grub2-mkconfig` re-reads
  `/etc/default/grub`.
- **Symptom:** `dnf upgrade kernel` regenerates grub.cfg from
  `/etc/default/grub`, drops LUKS unlock args from new entry →
  unbootable.
- **Prob:** HIGH. **Fix:** M (sync both files in `veilor-update`,
  or migrate fully to BLS without grub-mkconfig).
- **Target:** v0.5.32.
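
The drift described above could be detected with a check along these lines. It is a sketch only: `missing_luks_args` is a hypothetical helper, and it assumes the LUKS arguments all start with `rd.luks.` and that the GRUB side keeps them in `GRUB_CMDLINE_LINUX*` variables.

```shell
# missing_luks_args CMDLINE_FILE GRUB_DEFAULT_FILE
# Hypothetical drift check: prints every rd.luks.* token found in
# CMDLINE_FILE that is absent from the GRUB_CMDLINE_LINUX* values in
# GRUB_DEFAULT_FILE. Empty output means the LUKS args are in sync.
missing_luks_args() {
    cmdline_file=$1
    grub_file=$2
    # GRUB_CMDLINE_LINUX* values, quotes stripped, one token per line.
    grub_tokens=$(sed -n 's/^GRUB_CMDLINE_LINUX[A-Z_]*=//p' "$grub_file" \
        | tr -d "\"'" | tr -s ' \t' '\n')
    for token in $(cat "$cmdline_file"); do
        case $token in
            rd.luks.*)
                printf '%s\n' "$grub_tokens" | grep -Fxq -- "$token" \
                    || printf '%s\n' "$token"
                ;;
        esac
    done
}
```

An update wrapper could run `missing_luks_args /etc/kernel/cmdline /etc/default/grub` and abort loudly when anything is printed.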
### C2. SELinux relabel on first boot
- `firstboot.sh` flips SELinux to `enforcing` and runs
  `touch /.autorelabel`. On large /home (encrypted btrfs), relabel
  takes 2-5min; user sees frozen screen with cursor.
- **Symptom:** first boot appears hung.
- **Prob:** MED. **Fix:** S (add plymouth message). **Target:** v0.6.
### C3. F44 upgrade
- Hardcoded `python3.14` path (kickstart:334) for transaction_progress.py
  patch. Will not survive an upgrade that bumps the Python minor
  version.
- **Prob:** certainty by Nov 2026. **Fix:** M. **Target:** v0.6.
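
One way to avoid pinning the interpreter version is to resolve the file by glob at patch time. The sketch below is illustrative: `find_versioned_py` is a hypothetical helper, and the `site-packages` layout and the relative path passed in are assumptions, since the real location used at kickstart:334 is not shown here.

```shell
# find_versioned_py LIB_ROOT RELATIVE_PATH
# Hypothetical lookup: prints the first existing file matching
# LIB_ROOT/python3.*/site-packages/RELATIVE_PATH (in glob order) and
# returns 0, or returns 1 when no installed Python version has it.
find_versioned_py() {
    lib_root=$1
    rel_path=$2
    for candidate in "$lib_root"/python3.*/site-packages/"$rel_path"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A non-zero return gives the patch step a clear point to fail loudly instead of silently patching nothing.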
### C4. chrony NTS unreachable from corp networks
- Cloudflare NTS key exchange (NTS-KE, TCP 4460) blocked by many corp
  firewalls. chronyd then fails to sync.
- **Symptom:** clock skew → TLS failures → broken everything.
- **Prob:** MED. **Fix:** S (add fallback `pool` line; already
  present, verify ordering). **Target:** v0.5.32.
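
Verifying that a non-NTS fallback is really configured could be scripted roughly as follows. `has_plain_fallback` is a hypothetical helper; it only inspects `server`/`pool` lines in a single file and does not follow `include` or `sourcedir` directives.

```shell
# has_plain_fallback CHRONY_CONF
# Hypothetical check: succeeds when CHRONY_CONF has at least one
# server/pool directive that does not carry the "nts" option.
has_plain_fallback() {
    # Select server/pool directives, then look for one without "nts".
    grep -E '^[[:space:]]*(server|pool)[[:space:]]' "$1" \
        | grep -Evq '[[:space:]]nts([[:space:]]|$)'
}
```

For example, `has_plain_fallback /etc/chrony.conf || echo "no plain NTP fallback"` could run as a build-time sanity check.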
## D. Networking
### D1. firewalld drop zone vs Tailscale 🚨
- `tailscale up` requires UDP 41641 + tailscale0 trusted. Default
  `drop` zone blocks tailscale0.
- **Prob:** HIGH (this user uses Tailscale daily).
- **Fix:** S (ship `/etc/firewalld/zones/trusted.xml` with
  `tailscale0` interface).
- **Target:** v0.5.32.
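
A sketch of what generating that zone file might look like. `print_trusted_zone` is a hypothetical helper; the XML mirrors the stock firewalld `trusted` zone with one `<interface>` element added, and the exact short/description text is an assumption.

```shell
# print_trusted_zone INTERFACE
# Hypothetical generator: writes a firewalld zone definition to stdout
# that accepts all traffic and binds INTERFACE to the zone.
print_trusted_zone() {
    iface=$1
    cat <<EOF
<?xml version="1.0" encoding="utf-8"?>
<zone target="ACCEPT">
  <short>Trusted</short>
  <description>All network connections are accepted.</description>
  <interface name="$iface"/>
</zone>
EOF
}
```

The image build could redirect `print_trusted_zone tailscale0` into `/etc/firewalld/zones/trusted.xml`; on a running system the equivalent is `firewall-cmd --permanent --zone=trusted --add-interface=tailscale0`.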
### D2. systemd-resolved DoT vs corp split-DNS
- No `/etc/systemd/resolved.conf.d` entries shipped (overlay dir empty).
- Corp internal hostnames fail.
- **Prob:** LOW. **Fix:** M. **Target:** v0.7.
## E. Hardware diversity
### E1. NVMe vs SATA LUKS perf
- Argon2id KDF tuned to memory, not IO.
- **Prob:** cosmetic. Skip.
### E2. ARM aarch64
- Out of scope for v0.5/0.6.
### E3. TPM2 unlock
- Already on roadmap. **Target:** v0.7.
## Top 10 ranked (prob × severity)
| # | Issue | Prob | Sev | Target |
|---|-------|------|-----|--------|
| 1 | **B2 Suspend/resume wifi death** (modules_disabled) | VHIGH | CRITICAL | v0.5.32 |
| 2 | **C1 kernel-upgrade grub drift** (LUKS args lost) | HIGH | CRITICAL | v0.5.32 |
| 3 | **A3 USBGuard hash rules** (dock replug) | HIGH | HIGH | v0.5.32 |
| 4 | **D1 firewalld blocks tailscale0** | HIGH | HIGH | v0.5.32 |
| 5 | **A2 KMS blank-screen 10s** | HIGH | MED | v0.5.32 |
| 6 | **B3 Lid-close suspend** (compounds B2) | HIGH | MED | v0.5.32 |
| 7 | **A1 Secure Boot + os-prober dual-boot** | MED | HIGH | v0.6 |
| 8 | **C4 NTS blocked corp** | MED | MED | v0.5.32 |
| 9 | **B1 Plasma Wayland fallback** | MED | MED | v0.6 |
| 10 | **C3 F44 path-pinned patch** | CERTAIN | LOW (Nov) | v0.6 |
## Top 5 to preempt in v0.5.32
1. **B2 modules-lock vs resume**: gate on no-pending-suspend, OR swap
   to `module.sig_enforce=1` kernel cmdline.
2. **C1 cmdline drift**: make `veilor-update` fail-loud if
   `/etc/kernel/cmdline` and `/etc/default/grub` diverge; regen BLS
   on every kernel install.
3. **A3 USBGuard id-based rules**: `veilor-usb-enroll` wrapper that
   calls `usbguard generate-policy --no-hashes`. Same bug that
   already burned us on onyx.
4. **D1 Tailscale zone**: ship `/etc/firewalld/zones/trusted.xml`
   listing `tailscale0`, plus NetworkManager dispatcher to assign it.
5. **A2 KMS handoff**: append `i915.modeset=1 amdgpu.modeset=1
   nvidia-drm.modeset=1` to bootloader cmdline.
**Critical insight:** B2 alone leaves the laptop without wifi or dock
USB until reboot for any user who closes their lid. Without that fix,
v0.5.32 is shippable on desktops only. Same architectural class as
the LUKS bug: a security feature breaks legitimate kernel state
transitions.