# bootc-image-builder spike plan — 1-week timebox
**Agent 3 of 9-agent wave, 2026-05-05.** Schedule: v0.7.
## Containerfile draft
```dockerfile
# veilor-os bootc image — Fedora 43 KDE base
FROM quay.io/fedora/fedora-bootc:43
ARG VEILOR_VERSION=0.6.0
RUN dnf install -y --setopt=install_weak_deps=False \
      @kde-desktop-environment @kde-apps @core @hardware-support @standard \
      kernel-modules kernel-modules-extra glibc-all-langpacks \
      grub2-efi-x64 grub2-efi-x64-modules grub2-pc grub2-pc-modules \
      grub2-tools grub2-tools-extra shim-x64 efibootmgr \
      newt parted cryptsetup lvm2 btrfs-progs \
      fail2ban fail2ban-firewalld usbguard usbguard-tools audit \
      policycoreutils-python-utils tuned chrony firewalld plymouth \
      git vim-enhanced tmux htop podman skopeo \
      NetworkManager NetworkManager-wifi \
      fontconfig freetype fira-code-fonts \
      zram-generator \
 && dnf remove -y --noautoremove \
      'abrt*' snapd kde-connect open-vm-tools-desktop mlocate man-db man-pages \
 && dnf clean all && rm -rf /var/cache/dnf
ARG GUM_VERSION=0.17.0
ARG GUM_SHA256=69ee169bd6387331928864e94d47ed01ef649fbfe875baed1bbf27b5377a6fdb
ADD https://github.com/charmbracelet/gum/releases/download/v${GUM_VERSION}/gum_${GUM_VERSION}_Linux_x86_64.tar.gz /tmp/gum.tgz
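# sha256sum -c expects "HASH  FILE" (two spaces); drop the unpacked files afterwards
RUN echo "${GUM_SHA256}  /tmp/gum.tgz" | sha256sum -c - \
 && tar -xzf /tmp/gum.tgz -C /tmp \
 && install -m0755 /tmp/gum_${GUM_VERSION}_Linux_x86_64/gum /usr/bin/gum \
 && rm -rf /tmp/gum.tgz /tmp/gum_${GUM_VERSION}_Linux_x86_64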
COPY overlay/ /
COPY assets/ /usr/share/veilor-os/assets/
COPY scripts/ /usr/share/veilor-os/scripts/
RUN bash /usr/share/veilor-os/scripts/10-harden-base.sh \
 && bash /usr/share/veilor-os/scripts/20-harden-kernel.sh \
 && bash /usr/share/veilor-os/scripts/selinux/build-policy.sh \
 && bash /usr/share/veilor-os/scripts/kde-theme-apply.sh \
 && bash /usr/share/veilor-os/scripts/30-apply-v03-theme.sh
RUN plymouth-set-default-theme details \
 && sed -i \
      -e 's|^GRUB_DISTRIBUTOR=.*|GRUB_DISTRIBUTOR="veilor-os"|' \
      /etc/default/grub
# bootc kargs go in /usr/lib/bootc/kargs.d/, not /etc/default/grub
RUN mkdir -p /usr/lib/bootc/kargs.d && cat > /usr/lib/bootc/kargs.d/10-veilor-hardening.toml <<'EOF'
kargs = [
  "lockdown=integrity",
  "slab_nomerge",
  "init_on_alloc=1",
  "init_on_free=1",
  "randomize_kstack_offset=on",
  "vsyscall=none",
  "fbcon=nodefer",
]
EOF
RUN systemctl enable sshd fail2ban usbguard tuned auditd firewalld chronyd sddm \
      veilor-firstboot.service veilor-modules-lock.service \
 && passwd -l root \
 && systemctl set-default graphical.target
RUN bootc container lint
LABEL org.veilor.version=${VEILOR_VERSION}
```
## bootc-image-builder config (`build/disk-config.toml`)
```toml
[customizations]
hostname = "veilor-os"
[[customizations.user]]
name = "admin"
password = "veilor"
groups = ["wheel"]
shell = "/bin/bash"
[customizations.kernel]
append = "lockdown=integrity slab_nomerge init_on_alloc=1 init_on_free=1 randomize_kstack_offset=on vsyscall=none fbcon=nodefer"
[customizations.installer.kickstart]
contents = """
zerombr
clearpart --all --initlabel
part /boot/efi --fstype=efi --size=600
part /boot --fstype=ext4 --size=1024
part btrfs.veilor --grow --encrypted --luks-version=luks2 --pbkdf=argon2id
btrfs none --label=veilor btrfs.veilor
btrfs / --subvol --name=root LABEL=veilor
btrfs /home --subvol --name=home LABEL=veilor
"""
```
## GitHub Actions workflow
`build-bootc-iso.yml`:
- runs-on ubuntu-24.04, **timeout 30 min** (vs 90 for livecd-creator)
- permissions: `contents: write`, `packages: write`
- Build OCI image: `podman build` + `podman push ghcr.io/veilor/veilor-os:43`
- Build ISO via `quay.io/centos-bootc/bootc-image-builder:latest`
with `--type anaconda-iso --rootfs btrfs --config /build/disk-config.toml`
- Reuse the split step and `softprops/action-gh-release@v2` from the existing workflow
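The ISO step above is a `podman run` of the builder image. The sketch below assembles that argv in Python, modeled on the invocation style in the bootc-image-builder README (`--privileged`, `--security-opt label=type:unconfined_t`, an `/output` mount, and shared container storage). The host paths `build/` and `output/` are assumptions. The storage mount only exposes images built rootfully (`sudo podman build`), because rootful podman keeps images in `/var/lib/containers/storage`.

```python
import shlex
from pathlib import Path

BIB_IMAGE = "quay.io/centos-bootc/bootc-image-builder:latest"


def bib_argv(image_ref: str, build_dir: str = "build", output_dir: str = "output") -> list[str]:
    """Assemble a rootful `podman run` argv for an anaconda-iso build."""
    build = Path(build_dir).resolve()
    output = Path(output_dir).resolve()
    return [
        "sudo", "podman", "run", "--rm", "--privileged",
        "--security-opt", "label=type:unconfined_t",
        # build/ is mounted at /build so --config /build/disk-config.toml resolves.
        "-v", f"{build}:/build:ro",
        "-v", f"{output}:/output",
        # Share rootful container storage so the locally built image is visible.
        "-v", "/var/lib/containers/storage:/var/lib/containers/storage",
        BIB_IMAGE,
        "--type", "anaconda-iso",
        "--rootfs", "btrfs",
        "--config", "/build/disk-config.toml",
        image_ref,
    ]


def bib_command_line(image_ref: str) -> str:
    """Render the argv as a shell line suitable for a workflow `run:` step."""
    return shlex.join(bib_argv(image_ref))
```

Usage: `print(bib_command_line("ghcr.io/veilor/veilor-os:43"))` prints a line that can be pasted into `build-bootc-iso.yml`.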
## Migration risks (10-row table)
| # | Risk | Severity | Mitigation |
|---|------|----------|------------|
| 1 | %post --nochroot overlay-copy disappears | Low | `COPY overlay/ /` is simpler — win |
| 2 | Update model: `bootc upgrade` (image swap) replaces `dnf upgrade` | High | `veilor-update` becomes thin `bootc upgrade --apply` wrapper |
| 3 | /usr is read-only at runtime | Medium | etc-overlay handles /etc writes; relocate any /usr writers to /etc or build-time |
| 4 | SELinux module compilation in container | Medium | Should work in fedora-bootc:43 per the upstream pattern; confirm on spike day 2 |
| 5 | `transaction_progress.py` patch unnecessary | Low | bootc-image-builder doesn't use dnf at install. Drop the patch. Win |
| 6 | `rd.luks.uuid` is anaconda's job again | Low | Removes ~80 lines of fragile sed/grubby code. Win |
| 7 | LUKS prompt UX: anaconda native, not gum | High | gum installer becomes `live·shell` only. v1.0 install = anaconda's native UI |
| 8 | --privileged still required | None | Same as today |
| 9 | OCI image size: ~3.5 GB compressed vs ~2.8 GB squashfs | Low | zstd:max recovers ~400 MB |
| 10 | `kernel-install` BLS: `/etc/kernel/cmdline` not honored, `/usr/lib/bootc/kargs.d/*.toml` is | Medium | Already addressed in Containerfile draft |
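For risk 2, "thin wrapper" can be taken literally. The sketch below is a hypothetical `veilor-update` core in Python. It uses only `bootc upgrade --check` (report whether a newer image is available) and `bootc upgrade --apply` (stage the new image and reboot into it). Function names and the dry-run option are illustrative. The `runner` parameter exists so the logic can be tested without calling bootc.

```python
import subprocess
from collections.abc import Callable


def upgrade_argv(check_only: bool = False) -> list[str]:
    """Return the bootc command a thin veilor-update wrapper would run."""
    return ["bootc", "upgrade", "--check" if check_only else "--apply"]


def veilor_update(
    check_only: bool = False,
    dry_run: bool = False,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run (or only print) the bootc upgrade and return its exit status."""
    argv = upgrade_argv(check_only)
    if dry_run:
        print(" ".join(argv))
        return 0
    # bootc needs root; the wrapper is assumed to be invoked via sudo or pkexec.
    return runner(argv, check=False).returncode
```

Keeping the wrapper this small means the update semantics (image swap, rollback) stay bootc's responsibility rather than veilor's.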
## What we keep (zero churn)
- `overlay/*` — copied verbatim by `COPY overlay/ /`
- `scripts/*.sh` — invoked verbatim by Containerfile RUN
- `assets/*` — copied verbatim
- `test/*` — adapts: `podman run --rm -it ghcr.io/veilor/veilor-os:43 /bin/bash` smoke; QEMU ISO test unchanged
- `kickstart/install.ks` — kept as fallback. Tag last anaconda build as `v0.5.99-anaconda` before flipping
## Spike success criteria (1 week)
| Day | Milestone |
|-----|-----------|
| 1 | Containerfile builds clean (`podman build` exit 0, `bootc container lint` exit 0) |
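| 2 | `podman run` starts a shell in the image; KDE binaries present; SELinux modules and hardening sysctl drop-ins installed (live sysctl values inside a container come from the host kernel, so recheck them after install) |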
| 3 | bootc-image-builder produces installer ISO from OCI, ksvalidator clean |
| 4 | ISO boots in QEMU to anaconda live menu |
| 5 | Install completes, LUKS single-prompt, btrfs subvols present |
| 6 | First boot reaches SDDM, admin login works, password-change-on-first-login enforced |
| 7 | Buffer for fixes; doc `docs/BUILD-bootc.md`; tag `v0.5.99-anaconda` snapshot |
## Decision gate
- **PASS** (all 7 criteria green): tag `v0.5.99-anaconda` as last-anaconda;
merge `bootc-spike` into `main` as `v0.6.0-bootc`; deprecate
`kickstart/veilor-os.ks` (keep `kickstart/install.ks` for one cycle).
Update ROADMAP: v1.0 ships bootc-only.
- **FAIL** (any of risks 3, 4, 7, 10 unfixable in week 1): keep
anaconda path, defer migration to v1.1+; file each blocker as GH
issue with reproducer.
- **HYBRID FALLBACK**: ship anaconda ISO for v0.6/v0.7, ship bootc OCI
alongside (matches existing `veilor-atomic` stretch goal).
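The gate can be written down as a small function so the end-of-week call is mechanical. This is a sketch under one stated assumption. The plan does not say exactly when HYBRID applies, so here it is taken to be the outcome when the week neither turns all seven days green nor leaves risk 3, 4, 7, or 10 unfixable. Names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    HYBRID = "hybrid"


ALL_DAYS = frozenset(range(1, 8))          # the seven daily milestones
BLOCKING_RISKS = frozenset({3, 4, 7, 10})  # FAIL if any of these is unfixable


def decide(days_green: set[int], unfixable_risks: set[int]) -> Outcome:
    """Map week-1 results onto the decision gate.

    Blocking risks are checked first, so FAIL wins even if every milestone
    was marked green. Otherwise all seven days green means PASS, and
    anything short of that falls back to HYBRID (assumed reading).
    """
    if unfixable_risks & BLOCKING_RISKS:
        return Outcome.FAIL
    if ALL_DAYS <= days_green:
        return Outcome.PASS
    return Outcome.HYBRID
```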