Add to v0.7 scope: bootstrap ISO writes /var/log/anaconda + the
resolved ks + ostreecontainer pull log + dmesg back to the USB stick
into veilor-install-logs/<timestamp>/. Toggleable via kernel cmdline
inst.veilor.savelogs=0 for opt-out. ON by default through v0.7-v0.9;
flips OFF for v1.0 final release.
Why: failed install + bricked machine + no screenshots — operator boots
back to a working OS, plugs the USB, reads logs offline. No more
"please take a photo of dracut".
BlueBuild's files module fails with 'chmod: Operation not permitted' on
its own bind-mounted /tmp/modules/files/files.sh when run under podman.
Disable SELinux relabeling + seccomp filter on the bluebuild CLI
container so its nested buildah can chmod inside layer mounts.
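The relaxed security options as podman flags, collected into a variable so they can be audited; the rest of the invocation is unchanged:

```shell
# label=disable skips SELinux relabeling; seccomp=unconfined drops the
# syscall filter that blocked chmod inside the nested buildah mounts.
secopts="--security-opt label=disable --security-opt seccomp=unconfined"
# podman run --rm $secopts ... ghcr.io/blue-build/cli:latest build ...
echo "$secopts"
```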
Add livemedia-creator --make-iso pipeline that produces a small
Anaconda installer ISO consuming a CI-buildable variant of the
runtime ostreecontainer kickstart. Disk/LUKS/user blocks dropped
from the CI ks (Anaconda interactive handles them); ostreecontainer
URL pinned to ghcr.io/veilor-org/veilor-os:43. Output split into
1900M chunks; published to Forgejo installer-latest rolling tag.
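The chunking step can be sketched with GNU split; the helper name and ISO filename are illustrative:

```shell
chunk_iso() {
  # $1 = ISO path, $2 = chunk size (CI uses 1900M).
  split -b "$2" --numeric-suffixes=1 --suffix-length=2 "$1" "$1.part" &&
  sha256sum "$1" "$1".part* > SHA256SUMS
}
# Operator-side reassembly:
#   cat veilor-os-installer.iso.part* > veilor-os-installer.iso
#   sha256sum -c SHA256SUMS
```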
Generated a cosign keypair for v0.7 OCI signing.
- bluebuild/cosign.pub committed alongside the recipe
- cosign.key stored on operator workstation only (chmod 600)
- COSIGN_PRIVATE_KEY Forgejo Actions secret set to the same key
- Workflow stages the secret to bluebuild/cosign.key at build time
(chmod 600), where the BlueBuild signing module picks it up
- .gitignore guards against any cosign.key accidental commit
- Restored the type:signing module in recipe.yml
The 'stage-keys' COPY step in BlueBuild's generated containerfile
fails without cosign.pub adjacent to recipe.yml even when
type:signing is removed; re-add the module + provide real keys.
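The key hygiene above, sketched. The cosign calls are the manual operator steps; `guard_gitignore` is a hypothetical helper for the .gitignore guard:

```shell
# Manual steps on the operator workstation:
#   cosign generate-key-pair        # writes cosign.key + cosign.pub
#   chmod 600 cosign.key            # private half never leaves the workstation
guard_gitignore() {
  # Idempotently ensure pattern $2 is present in gitignore file $1.
  grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}
# guard_gitignore .gitignore 'cosign.key'
```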
podman login writes to $XDG_RUNTIME_DIR/containers/auth.json by
default; that path varies per session and was absent on the runner.
Probe the known locations and copy the first hit into
/root/.config/containers/auth.json so the bind mount into the
bluebuild container has a stable source.
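Probe sketch; the candidate-path list is an assumption based on podman defaults:

```shell
find_auth_json() {
  # Print the first auth.json that exists, in priority order.
  for p in \
    "${XDG_RUNTIME_DIR:-/nonexistent}/containers/auth.json" \
    /run/user/0/containers/auth.json \
    /root/.config/containers/auth.json \
    "$HOME/.config/containers/auth.json"
  do
    [ -f "$p" ] && { printf '%s\n' "$p"; return 0; }
  done
  return 1
}
# On the runner (as root):
#   src=$(find_auth_json) && install -D -m 600 "$src" /root/.config/containers/auth.json
```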
The 'securecore-kinoite-hardened-userns' image we'd been targeting
does not exist in the secureblue org's package list. Their KDE
Plasma (Kinoite) hardened variant is published as
'kinoite-main-hardened' (or 'kinoite-nvidia-hardened' for NV boxes).
Switch the recipe + all doc references.
GHCR rate-limited anonymous pulls (403 on bearer token). Login with
the GHCR_PULL_TOKEN secret (s8n-ru read-only PAT), then bind-mount
podman's auth.json into the bluebuild CLI container so its inner
buildah sees the same login.
GHCR rejected skopeo's anonymous manifest call from inside the
bluebuild CLI container. Pre-pull the secureblue base on the host
podman (which handles the anonymous token dance), then bind-mount
/var/lib/containers/storage into the bluebuild container so its
buildah sees the cached base layer. Drop deprecated --inspect-driver
flag while we are touching the invocation.
Container's default entrypoint is dumb-init, which interpreted 'build'
as a command to exec rather than as a bluebuild subcommand. Pin
--entrypoint /usr/local/bin/bluebuild and pass 'build ...' as args.
The blue-build/github-action requires docker buildx which podman
doesn't ship. Symlinking podman as docker isn't enough — the action
calls 'docker buildx inspect' / 'docker buildx rm' which podman
doesn't implement. Pull the official BlueBuild CLI container and run
it with --build-driver buildah; works against podman storage with no
docker dependency.
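One-shot sketch combining the fixes from the entries above; image tags, paths, and recipe name are illustrative, and the flags are collected into an array so each one is auditable:

```shell
run_args=(
  run --rm
  --security-opt label=disable --security-opt seccomp=unconfined  # nested chmod fix
  -v /var/lib/containers/storage:/var/lib/containers/storage      # pre-pulled base layers
  -v /root/.config/containers/auth.json:/root/.config/containers/auth.json:ro
  -v "$PWD":/workspace -w /workspace
  --entrypoint /usr/local/bin/bluebuild                           # bypass dumb-init
  ghcr.io/blue-build/cli:latest                                   # illustrative tag
  build --build-driver buildah recipe.yml
)
# podman pull ghcr.io/secureblue/kinoite-main-hardened:latest  # host does the token dance
# podman "${run_args[@]}"
```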
BlueBuild CLI does not ship pre-built binaries on GitHub Releases
(latest tag v0.9.35 has no assets — install path is cargo or their
container image). Drop the curl-tarball install step and use the
official composite action @ pinned SHA — it runs podman + buildah
inside, works on Forgejo runner identically to GH-hosted because
it's bash, not node-bound.
dnf5 in Fedora 43 strict-fails when 'already installed' packages
appear in -y install. Drop git/curl/tar/sudo (shipped in
veilor-build:43 image already) and use --skip-unavailable. cosign
isn't packaged in F43 — pull v2.4.1 static binary from upstream.
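The revised package step as a sketch; the cosign URL follows sigstore's release-asset naming and should be verified against the actual release page:

```shell
cosign_url=https://github.com/sigstore/cosign/releases/download/v2.4.1/cosign-linux-amd64
# dnf5 -y install --skip-unavailable podman buildah skopeo jq
# curl -fsSL -o /usr/local/bin/cosign "$cosign_url" && chmod +x /usr/local/bin/cosign
echo "$cosign_url"
```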
Walk every action in kickstart/veilor-os.ks %post and map to its
v0.7 atomic equivalent:
Build-time script additions:
- chmod +x /usr/share/veilor-os/scripts/* + /usr/local/bin/veilor-*
(BlueBuild type:files sometimes drops perms)
- fc-cache -f after Fira Code stamping
- os-release brand override (NAME=veilor-os, ID=veilor, ID_LIKE)
- brand-leak guard: fail the image build if any onyx/personal data
slipped through into shipped state
Layered packages:
- zram-generator (memory hygiene; replaces dnf install in kickstart)
- jq (used by veilor-doctor for `bootc status --json`)
- vim-enhanced + tmux + htop (admin essentials, parity with v0.5.x)
Systemd unit enables added:
- veilor-postinstall.service (first-login TUI; new in A3)
- veilor-doctor.timer (weekly drift check; new in A3)
Dropped:
- anaconda transaction_progress.py patch (build-time CI work, not
  image content)
- SDDM display-manager symlink (kinoite ships sddm.service already)
- SELinux module build (secureblue has its own)
- systemctl set-default multi-user.target (kinoite is
  graphical.target by design)
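The brand-leak guard from the build-time additions could look like this; the pattern is illustrative (the real deny-list covers all onyx/personal identifiers) and the helper name is hypothetical:

```shell
brand_leak_check() {
  # $1 = tree to scan; return 1 (build-fatal) on any hit.
  # Only 'onyx' shown here; the shipped list is longer.
  if grep -rIl 'onyx' "$1" 2>/dev/null | grep -q .; then
    echo "brand-leak guard: forbidden strings under $1" >&2
    return 1
  fi
}
# In the image build: brand_leak_check /usr/share/veilor-os || exit 1
```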
A3 inline (agent failed on API). Three CLIs ported / written for the
v0.7+ atomic system:
veilor-update — rewritten on bootc upgrade (was dnf upgrade --refresh).
Pre-checks bootc status, pauses auditd while staging, prints summary
and offers reboot. Returns 0/1/2/3 per legacy contract.
veilor-postinstall (NEW) — first-login TUI run via
veilor-postinstall.service oneshot. Asks once for keyboard, locale,
hostname, GPU drivers, package presets (dev/media/homelab),
bluetooth, USBGuard snapshot, then invokes veilor-doctor. Writes
/var/lib/veilor/postinstall-complete and self-disables on success.
veilor-doctor — Updates section rewritten to parse `bootc status
--json` (with jq) when available, falls back to dnf history /
check-update for legacy v0.5.x kickstart-installed systems.
Plus systemd units:
- veilor-postinstall.service (oneshot on graphical.target, gated on
absence of done-marker, runs on tty1)
- veilor-doctor.service + .timer (weekly drift check)
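The doctor's update probe can be sketched as follows; the jq filter is an assumption about bootc's JSON shape and should be checked against `bootc status --json` on a real system:

```shell
booted_image() {
  # $1 = output of `bootc status --json`; print the booted image ref.
  printf '%s' "$1" | jq -r '.status.booted.image.image.image // "unknown"'
}
# if command -v bootc >/dev/null 2>&1; then
#   booted_image "$(bootc status --json)"
# else
#   dnf check-update --quiet   # legacy v0.5.x kickstart path
# fi
```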
A1 inline (agent failed on worktree base mismatch). Adapt
build-bluebuild.yml to run on the Forgejo self-hosted runner using
the same lessons from build-iso.yml debug:
- runs-on: nullstone (resolves to veilor-build:43, fedora43+nodejs)
- BlueBuild CLI installed in-job from upstream release tarball v0.9.10
- podman/buildah/skopeo/cosign installed via dnf
- bluebuild build with podman driver + skopeo inspect + cosign signing
- Push primary to Forgejo registry git.s8n.ru/veilor-org/veilor-os
- GHCR push gated to github.server_url == 'https://github.com' only
- SBOM + attest-build-provenance gated GH-only (Forgejo has no Fulcio)
- All third-party actions remain pinned to node20-shipping versions
Secrets needed in Forgejo repo settings:
- FORGEJO_REGISTRY_TOKEN: PAT with package:write on veilor-org
- FORGEJO_REGISTRY_USER: 's8n-ru' (or org member with write scope)
Strategy pivot 2026-05-06: v0.5.32 produced a green ISO on Forgejo
runner. That's the kickstart-path proof point. Continuing v0.6
kickstart polish is sunk-cost work on tooling retired at v1.0.
Pivot:
- v0.5.0 is the FINAL kickstart-path release. Tag, freeze, ship.
- v0.6 cancelled as a milestone. Original plan kept inline as
HISTORICAL reference.
- v0.7 promoted to primary active milestone. Absorbs the v0.6
ergonomic CLI tools (veilor-postinstall / veilor-doctor /
veilor-update) with bootc upgrade replacing dnf upgrade.
- Active branch: v0.7-bluebuild-spike. All future feature work lands
there, not on main.
Single document that surfaces the depth of work behind veilor-os:
metrics, distros studied, every tool traversed in the build chain,
all 35+ failure classes hit and beaten, key engineering decisions and
why, what's in the repo beyond the kickstart, and the self-hosted
nullstone CI infrastructure built to support it.
Receipts not narrative — every claim links back to a file path,
commit, error, or config. Useful as portfolio anchor and as a single
read-this-first for anyone returning to the project after a gap.
cosign keyless sign uses Sigstore Fulcio which requires a
Fulcio-trusted OIDC issuer. Forgejo runs don't have one, so cosign
falls back to the interactive device flow and times out
(error obtaining token: expired_token). Same applies to
attest-build-provenance and the SBOM action's signed attestation.
Skip all three on Forgejo for now; ISO + sha256 are sufficient for
v0.5.x test releases. Re-add when we self-host a Sigstore stack or
sign with a key-pair instead of keyless.
We layer on their OCI image as v0.7 base; we don't redistribute their
source. Drop the AGPLv3-attribution prose — that becomes relevant only
if/when we ship a verbatim chunk of their config/policy in our repo.
secureblue (AGPLv3) is the upstream hardened atomic Fedora that the
v0.7 BlueBuild spike layers on top of. Comparison table now includes
secureblue alongside Kicksecure + stock Fedora KDE. New "Credit &
relationship to secureblue" section spells out where their work
already solves problems we don't need to reinvent (Trivalent,
SELinux policy, kernel cmdline, signed OCI), how veilor-os differs
(kickstart install path + branding + Forgejo CI), and the AGPLv3
attribution rule for any code we lift verbatim.
Build error: 'Failed to find package apparmor-parser : No match for
argument'. Fedora 43 base and updates do not ship AppArmor packages;
the prior comment was incorrect. Defer AppArmor to v0.7 secureblue OCI
hybrid (which has its own LSM stack), or land via COPR overlay later.
forgejo-runner labels nullstone -> fedora:43 image. Switching
runs-on: ubuntu-24.04 -> nullstone makes the job container itself
the build environment, eliminating the docker-in-docker workspace
bind-mount problem (host path != act-container path).
Build now runs as root in fedora:43, installs livecd-tools directly
via dnf, and writes outputs to $GITHUB_WORKSPACE which is the natural
runner workdir on host. No nested docker, no userns juggling, no
explicit -v workspace bind needed.
Prior pin was the arm64 manifest digest (linux/arm64/v8); on x86_64
host it failed with `exec /usr/bin/sh: exec format error`. Pinned to
the amd64 manifest entry from the same fat-manifest.
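Digest selection can be done mechanically instead of by eye; a sketch that picks the linux/amd64 entry out of the fat manifest (jq filter assumes an OCI/Docker manifest list):

```shell
amd64_digest() {
  # $1 = raw manifest-list JSON, e.g. from `skopeo inspect --raw docker://...`
  printf '%s' "$1" | jq -r \
    '.manifests[]
     | select(.platform.os=="linux" and .platform.architecture=="amd64")
     | .digest'
}
```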
Forgejo runner on nullstone runs against a daemon with
userns-remap=default. addnab/docker-run-action launches the Fedora 43
build container with --privileged, which is incompatible with
userns-remap unless --userns=host is also set.
forgejo-runner v6.4.0 ships node20; floating tags @v0/@v3/@v2 now
resolve to actions whose runs.using=node24, which the runner cannot
exec. Pin to last node20-shipping release of each:
- anchore/sbom-action@v0.17.2
- sigstore/cosign-installer@v3.7.0
- actions/attest-build-provenance@v2.2.3
The build-iso workflow used softprops/action-gh-release@v2 unconditionally,
which only speaks the GitHub Releases REST API. When the workflow runs on
the Forgejo runner registered on nullstone, those steps would fail.
Add a server_url check so the GH-only path runs only on github.com, and
mirror it with a curl-based step that hits the Forgejo /api/v1/releases
endpoints. Behaviour:
- github.com: identical to before (action-gh-release@v2).
- git.s8n.ru: drop+recreate ci-latest release, upload chunked assets
via the Forgejo attachments API.
Tag-driven "Attach to release" path mirrored the same way.
Refs: A1 build-eng task — Forgejo runner adaptation.
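The mirror path can be sketched against Forgejo's /api/v1 endpoints; host, repo, token variable, and the `<id>` placeholder are illustrative:

```shell
api=https://git.s8n.ru/api/v1/repos/veilor-org/veilor-os
release_url() { printf '%s/releases/tags/%s' "$api" "$1"; }
# Drop + recreate the rolling release, then upload each chunk:
#   curl -fsS -H "Authorization: token $FORGEJO_REGISTRY_TOKEN" \
#        -X DELETE "$(release_url ci-latest)"
#   curl -fsS -H "Authorization: token $FORGEJO_REGISTRY_TOKEN" \
#        -H 'Content-Type: application/json' \
#        -d '{"tag_name":"ci-latest","prerelease":true}' -X POST "$api/releases"
#   curl -fsS -H "Authorization: token $FORGEJO_REGISTRY_TOKEN" \
#        -F "attachment=@veilor-os.iso.part01" "$api/releases/<id>/assets"
```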
In v0.5 the "Remove the install media" reminder was a single line
inside the green success box, and operators on both onyx and the
friend's RTX 4080 rig missed it — rebooted into the live ISO and
re-ran the installer thinking the install had silently failed.
Promote the reminder to its own loud yellow thick-bordered gum-style
box stacked directly below the success/countdown box, with three
lines of explanation. Renders for the full 10s of the countdown so
it stays in the operator's face the entire window.
v0.5 used `sleep 5` after a static "System will reboot in 5 seconds."
box, which left the operator guessing how much time was left to grab
the USB stick. The new loop runs 10 → 1, clearing + redrawing the
gum-style success box each tick with the remaining-seconds figure,
giving the operator a visible window to act.
10 seconds (vs 5) because real hardware operators were missing the
window — laptops with the USB on the far side of the dock take
4-5 seconds to physically reach. 10 is comfortable, not annoying.
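The countdown shape, with the gum redraw replaced by a plain printf so the loop is visible and the tick length parameterised for testing:

```shell
countdown() {
  # $1 = seconds, $2 = sleep per tick (1 in the real script).
  i=$1
  while [ "$i" -ge 1 ]; do
    # The shipped script clears and redraws the gum-style success box here.
    printf 'System will reboot in %s seconds. Remove the install media.\n' "$i"
    sleep "$2"
    i=$((i - 1))
  done
}
# countdown 10 1
```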
A typo in the LUKS passphrase is unrecoverable — the disk is
unmountable without it and we don't escrow the key. Re-prompting
until the two reads match catches keyboard-layout surprises (the
US/UK quote-key position is the most common one) before they brick
the install.
Admin password gets the same treatment for consistency. Less
catastrophic (resettable from a recovery shell) but a mismatch
still locks the user out of their fresh install on first boot.
Loop bails on cancel/ESC and re-prompts on validate_pw failure.
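The confirm-twice loop, reduced to a testable core; the shipped version prompts through the TUI and also runs validate_pw, both elided here:

```shell
prompt_twice() {
  # Reads pairs of lines from stdin until two non-empty reads match;
  # EOF is treated like a cancel/ESC.
  while :; do
    read -r first || return 1
    read -r second || return 1
    if [ "$first" = "$second" ] && [ -n "$first" ]; then
      printf '%s\n' "$first"
      return 0
    fi
    # Mismatch (e.g. a keyboard-layout surprise): loop and re-prompt.
  done
}
```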
Read banner.txt line by line with a 40ms sleep between each, then
clear and redraw the bordered gum-style version. 5-line banner ×
40ms = 200ms total reveal — slow enough to land an aesthetic on the
first frame, fast enough that the operator never feels it as lag.
Pure cosmetic; no functional change to the install flow.
`gum input --password` corrupts the linux fbcon since v0.5.27 — the
bubbletea screen-restore writes back the previous menu buffer because
the framebuffer terminfo entry lacks `civis/cnorm` cursor-hide
sequences, leaving a duplicate "Install" plus a stray "T" rendered on
top of the password field. The fix is a single termios echo-off via
`read -srp`: no redraw, no glitch, no dependency on gum's TUI layer
for the one screen where it broke.
Header still rendered through `gum style` so visual parity with the
disk picker / confirm box is preserved. Whiptail fallback path
unchanged (passwordbox there has always rendered cleanly).
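The echo-off read in sketch form; the gum-style header stays in the real script and is shown only as a comment, and the function wrapper is added here for testability:

```shell
read_password() {
  # gum style --border normal "LUKS passphrase"   # header, unchanged
  IFS= read -srp "Passphrase: " pw   # -s: no echo, no TUI redraw
  printf '\n' >&2                    # move past the silent prompt line
  printf '%s\n' "$pw"
}
```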
Note that all `uses:` directives still resolve to mutable major-
version tags. SHA-pinning is the Agent 8 audit recommendation but
requires per-action web lookups that stalled the previous SRE
attempt; tracked separately so this PR can land first.
Pin registry.fedoraproject.org/fedora:43 to its current manifest
digest so a malicious or accidental tag-rewrite upstream cannot
silently change the base layer of every CI build. Digest was
captured via `skopeo inspect --raw` on 2026-05-06. Refresh
procedure documented inline.
Sign each ISO chunk with cosign keyless OIDC, generate an SPDX SBOM
of the build output, and attach an in-toto build-provenance
attestation. Sigs/certs/SBOM are uploaded alongside the ISO parts in
the ci-latest rolling prerelease so the test/auto-install.sh path
can verify before reassembling.
Action versions are major-version tags (@v3, @v0, @v2). SHA-pinning
is tracked separately to keep this PR small and avoid the long web
lookups that stalled the previous attempt.
forgejo-runner v6.4.0 ships a node20 JavaScript engine. v4.2+ of
actions/checkout and v2.0.5+ of softprops/action-gh-release moved to
node24, which the runner refuses to exec. Pin both to last node20
release.
Pairs with a runner-side config change (separately deployed on
nullstone /home/docker/forgejo-runner/conf/config.yaml) that adds
`-v /var/run/docker.sock:/var/run/docker.sock` to per-job container
options + whitelists the socket via valid_volumes — without that
addnab/docker-run-action@v3 inside the catthehacker/ubuntu job
container can't reach the docker engine.
- actions/checkout v4 -> v4.1.7
- softprops/action-gh-release v2 -> v2.0.4
- addnab/docker-run-action v3 unchanged (composite/docker, no node)
- ludeeus/action-shellcheck@master unchanged (docker-based)
First test-runs/ report off the new template. Records the build host
(forgejo-runner on nullstone, ubuntu-24.04 / catthehacker:act-24.04),
notes that v0.5.32 is the first ISO produced after the GH Actions
mirror was disabled, and pre-populates the Findings section with the
7 v0.5.32 blocker fixes from the 2026-05-05 9-agent wave as expected
behaviours the tester must verify.
Result is left as "pending A1 build" — the operator + A5 fill in
per-step pass/fail and hardening output once the actual VM walkthrough
runs against the produced ISO. This is intentional: the report is the
scaffold; the test is a separate step.