Runs 189 + 191 (2026-05-08) both died at the surviving rpm-ostree module:
chmod: changing permissions of '/tmp/modules/rpm-ostree/rpm-ostree.sh':
Operation not permitted
subprocess exited with status 1
Same buildah userns=host bind-mount bug we already worked around for
type:files / type:script / type:systemd — BlueBuild's helper script
tries to chmod itself inside its own bind-mounted layer and the host's
unprivileged user can't change perms on a kernel-mounted overlay path.
Workaround: drop the type:rpm-ostree module, fold its 10-package install
list into the existing type:containerfile snippet as a raw RUN. Per BB
docs each snippets[] entry = its own layer, so all three RUN concerns
(repo curl + rpm-ostree install + brand sed + systemctl enable/disable)
are merged into ONE snippet = ONE layer to keep the A1b commit-cost
collapse intact (~40min/layer wallclock on our fuse-overlayfs runner).
Added repo files inside the same RUN — secureblue base ships neither:
- https://repository.mullvad.net/rpm/stable/mullvad.repo
- https://pkgs.tailscale.com/stable/fedora/tailscale.repo
The previous type:rpm-ostree must have been silently failing on these
two pkgs (build never got past the chmod gate to find out).
Ordering: pkgs first (so systemctl enable yggdrasil / disable tailscaled
see their unit files), then brand+units in a best-effort group, then
rpm-ostree cleanup -m && ostree container commit to finalize the layer
(BB's wrapped rpm-ostree module does this implicitly; raw RUN must do
it manually for parity with secureblue / Universal Blue base).
Module count: 7 → 6.
Expected outcome: build clears past STEP rpm-ostree, no chmod gate left.
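The merged snippet's shell body might look roughly like this (package set, repo URLs, and the cleanup/commit tail are from this log; the Xwayland package name, sed target, and unit names are assumptions):

```shell
# One RUN = one layer. Repo files first, then install, then best-effort
# brand + units, then manual finalization (raw RUN lacks BB's wrapper).
curl -fsSLo /etc/yum.repos.d/mullvad.repo \
  https://repository.mullvad.net/rpm/stable/mullvad.repo
curl -fsSLo /etc/yum.repos.d/tailscale.repo \
  https://pkgs.tailscale.com/stable/fedora/tailscale.repo
rpm-ostree install sudo xorg-x11-server-Xwayland mullvad-browser \
  tailscale yggdrasil zram-generator jq vim-enhanced tmux htop
{ sed -i 's/^NAME=.*/NAME="veilor-os"/' /usr/lib/os-release \
  && systemctl enable yggdrasil.service \
  && systemctl disable tailscaled.service; } || true
rpm-ostree cleanup -m && ostree container commit
```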
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run 183 (2026-05-08) hit runner timeout at 3h10min not on brand-leak
grep (already moved to CI smoke-test in 7027026) but on per-layer
commit cost. Each RUN/COPY layer COMMIT under fuse-overlayfs over
secureblue's ~130-layer hardened base eats ~40min wallclock:
STEP 10 cp keys    23:55:59 -> 00:34:02  38min
STEP 11 cp bins    00:34:02 -> 01:16:17  42min
STEP 12 cp nushell 01:16:17 -> 01:58:17  42min
STEP 13 pre_build  01:58:17 -> 02:41:48  43min
STEP 14 brand sed  02:41:48 -> killed 04:02:59
        (1h21min; runner-side timeout, below the 360min workflow cap)
Ergo: every module saved = ~40min wallclock saved.
Collapses:
- 5x rpm-ostree -> 1x (-4 layers): sudo + Xwayland + mullvad-browser +
  tailscale + yggdrasil + zram-generator + jq + vim-enhanced + tmux +
  htop now in one install: list
- 2x containerfile -> 1x (-1 layer): brand-sed + systemctl
  enable/disable merged into one RUN snippet (BlueBuild docs: each
  snippet entry == its own layer, so a single snippet stays a single
  layer)
- 4x copy -> 4x (no change): BlueBuild copy module is
  one-src/dest-per-entry per
  https://blue-build.org/reference/modules/copy/. Floor unless we
  drop down to a hand-rolled Containerfile.
Net: 12 -> 7 modules. Expected savings: 5 layers x ~40min ~= 3h20min
versus the run-183 trajectory (killed at the 3h10min mark before
finishing). That should land us comfortably under the runner timeout
with budget for the actual layer work.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- STEP 14/30 hung under buildah fuse-overlayfs scanning
/usr/share/veilor-os on ~130-layer secureblue base (Forgejo run 171,
2026-05-07, hit 360-min timeout, no error logged).
- Brand-leak grep -rqi removed from bluebuild/recipe.yml RUN snippet;
one-line comment left in its place pointing at the new location.
- Added equivalent assertion at the end of the Smoke-test OCI image
step in .github/workflows/build-bluebuild.yml. Runs once on the
sealed image (no overlayfs in flight), uses `find -type f` over
bounded paths + name globs (text files only), then a single grep
invocation — much faster than recursive grep over the whole tree.
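The bounded find + single-grep shape can be demonstrated like this (the term 'onyx' is the brand-leak string from this log; the scanned directory here is a stand-in for the sealed image's bounded path list):

```shell
#!/bin/sh
# Build a tiny stand-in tree, then run the same find + one-grep pattern
# the smoke-test step uses: name globs over bounded paths, ONE grep pass.
set -eu
root=$(mktemp -d)
mkdir -p "$root/usr/share/veilor-os"
printf 'NAME=veilor-os\n' > "$root/usr/share/veilor-os/os-release"
printf 'stray onyx reference\n' > "$root/usr/share/veilor-os/notes.txt"
if find "$root/usr/share/veilor-os" -type f \
     \( -name '*.txt' -o -name '*.conf' -o -name 'os-release' \) -print0 \
   | xargs -0 grep -li 'onyx'; then
  echo "brand leak detected"
fi
```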
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- new helper overlay/usr/share/veilor-os/scripts/persist-install-logs.sh
detects boot USB (BOOT=/findfs, /run/install/repo, /sys/block removable),
copies /tmp/anaconda.log + program/storage/packaging/dnf/syslog/X +
journalctl -b + dmesg + lsblk/blkid/mount + /proc/cmdline into
/veilor-install-logs/<UTC-ts>/ on the stick; mirrors backup into
/mnt/sysroot/var/log/veilor-install-logs/ so logs survive even on RO
USB or detect failure
- toggle: kernel cmdline veilor.install_logs=on|off (default ON until
v1.0 final); never fails install on log persistence error
- kickstart/install-ostreecontainer-installer.ks: add %post --nochroot
block calling helper with toggle-aware inline fallback if helper
missing
- .github/workflows/build-installer-iso.yml: switch bib config from
[customizations.user] to [customizations.installer.kickstart] so our
new %post --nochroot actually lands in the produced ISO; admin user
now created by ks user directive (locked + chage 0); ostreecontainer
line stripped (bib auto-appends it); kernel-cmdline-default
limitation documented (osbuild/bootc-image-builder#899)
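The %post hook might be shaped like this (helper path, toggle name, and the sysroot mirror dir are from this log; the inline fallback body is illustrative):

```shell
%post --nochroot
# skip when toggled off on the kernel cmdline (default ON until v1.0)
grep -q 'veilor.install_logs=off' /proc/cmdline && exit 0
helper=/mnt/sysroot/usr/share/veilor-os/scripts/persist-install-logs.sh
if [ -x "$helper" ]; then
    "$helper" || true        # never fail the install on log errors
else
    # illustrative inline fallback: mirror into the target's /var/log
    ts=$(date -u +%Y-%m-%dT%H-%M-%SZ)
    dest=/mnt/sysroot/var/log/veilor-install-logs/$ts
    mkdir -p "$dest"
    cp /tmp/anaconda.log "$dest"/ 2>/dev/null || true
    dmesg > "$dest/dmesg.txt" 2>/dev/null || true
fi
%end
```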
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator rejected our kickstart with:
Only url, nfs and ostreesetup install methods are currently supported
ostreecontainer is too new for livemedia. bootc-image-builder is the
canonical tool for ostreecontainer-based installer ISOs — consumes
the OCI image directly, generates an Anaconda installer ISO that
embeds it. Per memory, anaconda-iso is deprecated in image-builder
v44+ but works on v43 (current).
Workflow now:
1. Login to Forgejo registry (read OCI)
2. Pull the OCI image into local podman storage
3. podman run quay.io/centos-bootc/bootc-image-builder
--type anaconda-iso --rootfs btrfs <oci-ref>
4. Copy resulting ISO into build/out
Drop livemedia-creator + lorax + pykickstart + anaconda-tui + grub2
+ shim install — bootc-image-builder ships its own runtime.
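Step 3 of the workflow above might look roughly like this (image tag, volume mounts, and the SELinux opt are assumptions based on bootc-image-builder's documented invocation; the flags are from this log):

```shell
sudo podman run --rm --privileged \
  --security-opt label=type:unconfined_t \
  -v /var/lib/containers/storage:/var/lib/containers/storage \
  -v "$PWD/build/out:/output" \
  quay.io/centos-bootc/bootc-image-builder:latest \
  --type anaconda-iso --rootfs btrfs \
  git.s8n.ru/veilor-org/veilor-os:latest
# the ISO lands under /output (i.e. build/out)
```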
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator needs the kickstart to specify packages for the
LIVE BOOT environment (the squashfs that runs Anaconda); ostreecontainer
populates the target system. The two are independent. Add minimal
package set: anaconda-tui + dracut-live + ostree/rpm-ostree/bootupd
+ grub2 + shim + xorriso.
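As a kickstart fragment, the minimal live-environment set might read (the exact grub2/shim package names are assumptions; the log only says "grub2 + shim"):

```
%packages
anaconda-tui
dracut-live
ostree
rpm-ostree
bootupd
grub2-efi-x64
shim-x64
xorriso
%end
```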
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run 166 ran 6hr through STEPS 10-14 (cosign keys cp, stage-bins cp,
nushell pull, pre_build.sh) — fuse-overlayfs with 130+ layers makes
each cp/RUN take ~40min on first build. Subsequent builds will be
faster (cached layers).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator pre-creates the parent dir of --logfile before
checking that --resultdir doesn't exist. Putting the log inside
resultdir made the dir 'exist' before the check ran. Move logfile
to /tmp.
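The resulting invocation shape (kickstart path illustrative; resultdir handling per the related commits below):

```shell
RESULTDIR=/tmp/lmc-out-$$            # fresh, outside the reused workspace
livemedia-creator --make-iso \
  --ks build/ci.ks \
  --resultdir "$RESULTDIR" \
  --logfile /tmp/lmc.log             # NOT inside resultdir
cp -a "$RESULTDIR"/. build/out/      # downstream steps expect build/out
```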
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
/var/lmc-out-PID kept being marked 'exists' by livemedia even after
rm -rf. Probably bind-mount or tmpfs from runner persists /var.
Switch to /tmp/lmc-out-PID — act job container's /tmp is fresh per
run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator refuses any existing resultdir. Even after
rm -rf build/out the runner workspace dir reappears. Use a fresh
PID-suffixed /var/lmc-out path outside workspace, then cp into
build/out for downstream steps.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both bluebuild module types ship a helper (script.nu / systemd.nu)
inside their bind-mounted module image at /tmp/modules. The first
thing run_module.sh does is chmod +x the helper, which fails
'Operation not permitted' under podman/buildah privileged in our
runner — same root cause as the type:files chmod we already worked
around with type:copy.
Raw `type: containerfile` (RUN block) bypasses bluebuild's module
helpers entirely. Move our brand+chmod+fc-cache+os-release sed +
brand-leak guard into one RUN line, and the systemctl
enable/disable into another.
This should clear the last bluebuild module-helper blocker.
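A hedged sketch of the recipe.yml change, assuming the containerfile module's snippets: list shape (snippet bodies abbreviated; unit names and the grep term are illustrative):

```yaml
- type: containerfile
  snippets:
    - >-
      RUN chmod +x /usr/share/veilor-os/scripts/*
      && fc-cache -f
      && sed -i 's/^NAME=.*/NAME="veilor-os"/' /usr/lib/os-release
      && ! grep -rqi onyx /usr/share/veilor-os
    - >-
      RUN systemctl enable veilor-postinstall.service
      && systemctl disable tailscaled.service
```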
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bluebuild (159): 'type: files' module fails 'chmod: Operation not
permitted' inside its own bind-mounted /tmp/modules/files/files.sh
under buildah + privileged-podman in our runner. Switch all four
`type: files` modules to `type: copy` (low-level COPY, no chmod, no
helper script needed).
installer-iso (160): livemedia-creator refused build/out which
checkout had already created (Forgejo runner reuses workspace dir
between runs). rm -rf build/out before invocation; mkdir not needed,
livemedia-creator creates the dir itself.
Add to v0.7 scope: bootstrap ISO writes /var/log/anaconda + the
resolved ks + ostreecontainer pull log + dmesg back to the USB stick
into veilor-install-logs/<timestamp>/. Toggleable via kernel cmdline
inst.veilor.savelogs=0 for opt-out. ON by default through v0.7-v0.9;
flips OFF for v1.0 final release.
Why: failed install + bricked machine + no screenshots — operator boots
back to a working OS, plugs the USB, reads logs offline. No more
"please take a photo of dracut".
BlueBuild's files module fails with 'chmod: Operation not permitted' on
its own bind-mounted /tmp/modules/files/files.sh when run under podman.
Disable SELinux relabeling + seccomp filter on the bluebuild CLI
container so its nested buildah can chmod inside layer mounts.
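The relaxed invocation might look like this (image ref and recipe path are illustrative; the two security-opt flags are standard podman options):

```shell
podman run --rm \
  --security-opt label=disable \
  --security-opt seccomp=unconfined \
  ghcr.io/blue-build/cli:latest build bluebuild/recipe.yml
```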
Add livemedia-creator --make-iso pipeline that produces a small
Anaconda installer ISO consuming a CI-buildable variant of the
runtime ostreecontainer kickstart. Disk/LUKS/user blocks dropped
from the CI ks (Anaconda interactive handles them); ostreecontainer
URL pinned to ghcr.io/veilor-org/veilor-os:43. Output split into
1900M chunks; published to Forgejo installer-latest rolling tag.
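The chunk step reduces to a plain split + reassembly check; a scaled-down demo (the real pipeline uses -b 1900M on the ISO; sizes here are tiny so it runs anywhere):

```shell
#!/bin/sh
# split the "ISO" into numbered parts, then prove the parts reassemble
set -eu
dir=$(mktemp -d)
head -c 5000 /dev/zero > "$dir/installer.iso"
split -b 1900 -d "$dir/installer.iso" "$dir/installer.iso.part"
cat "$dir"/installer.iso.part* > "$dir/reassembled.iso"
cmp -s "$dir/installer.iso" "$dir/reassembled.iso" && echo "chunks reassemble OK"
```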
Generated a cosign keypair for v0.7 OCI signing.
- bluebuild/cosign.pub committed alongside the recipe
- cosign.key stored on operator workstation only (chmod 600)
- COSIGN_PRIVATE_KEY Forgejo Actions secret set to the same key
- Workflow stages the secret to bluebuild/cosign.key at build time
(chmod 600), where the BlueBuild signing module picks it up
- .gitignore guards against any cosign.key accidental commit
- Restored the type:signing module in recipe.yml
The 'stage-keys' COPY step in BlueBuild's generated containerfile
fails without cosign.pub adjacent to recipe.yml even when
type:signing is removed; re-add the module + provide real keys.
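The workstation-side steps reduce to (cosign generate-key-pair is the standard command; file placement per this log):

```shell
cosign generate-key-pair           # writes cosign.key + cosign.pub locally
chmod 600 cosign.key               # private key stays on the workstation
cp cosign.pub bluebuild/cosign.pub # committed next to recipe.yml
printf 'bluebuild/cosign.key\n' >> .gitignore
# CI side: workflow stages $COSIGN_PRIVATE_KEY -> bluebuild/cosign.key (0600)
```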
podman login writes to $XDG_RUNTIME_DIR/containers/auth.json by
default; that path varies and was missing. Probe known locations,
copy into /root/.config/containers/auth.json so the bind into the
bluebuild container has a stable source.
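The probe-and-stage logic might be sketched like this (destination normally /root/.config/containers/auth.json; parameterized here so the sketch runs anywhere):

```shell
# stage_auth DEST CANDIDATE... : copy the first existing candidate to DEST
stage_auth() {
    dest=$1; shift
    for p in "$@"; do
        if [ -f "$p" ]; then
            mkdir -p "$(dirname "$dest")"
            cp "$p" "$dest"
            echo "staged $p"
            return 0
        fi
    done
    echo "no auth.json found" >&2
    return 1
}
# typical candidates, per this log:
#   "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/containers/auth.json"
#   "$HOME/.config/containers/auth.json"
```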
The 'securecore-kinoite-hardened-userns' image we'd been targeting
does not exist in the secureblue org's package list. Their KDE
Plasma (Kinoite) hardened variant is published as
'kinoite-main-hardened' (or 'kinoite-nvidia-hardened' for NV boxes).
Switch the recipe + all doc references.
GHCR rate-limited anonymous pulls (403 on bearer token). Login with
the GHCR_PULL_TOKEN secret (s8n-ru read-only PAT), then bind-mount
podman's auth.json into the bluebuild CLI container so its inner
buildah sees the same login.
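The authenticated pull path might be shaped like this (bind target inside the CLI container is an assumption; login command is standard podman):

```shell
printf '%s' "$GHCR_PULL_TOKEN" \
  | podman login ghcr.io -u s8n-ru --password-stdin
auth="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/containers/auth.json"
podman run --rm --privileged \
  -v "$auth:/root/.config/containers/auth.json:ro" \
  ghcr.io/blue-build/cli:latest build bluebuild/recipe.yml
```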
GHCR rejected skopeo's anonymous manifest call from inside the
bluebuild CLI container. Pre-pull the secureblue base on the host
podman (which handles the anonymous token dance), then bind-mount
/var/lib/containers/storage into the bluebuild container so its
buildah sees the cached base layer. Drop deprecated --inspect-driver
flag while we are touching the invocation.
Container's default entrypoint is dumb-init, which interpreted 'build'
as a command to exec rather than as a bluebuild subcommand. Pin
--entrypoint /usr/local/bin/bluebuild and pass 'build ...' as args.
The blue-build/github-action requires docker buildx which podman
doesn't ship. Symlinking podman as docker isn't enough — the action
calls 'docker buildx inspect' / 'docker buildx rm' which podman
doesn't implement. Pull the official BlueBuild CLI container and run
it with --build-driver buildah; works against podman storage with no
docker dependency.
BlueBuild CLI does not ship pre-built binaries on GitHub Releases
(latest tag v0.9.35 has no assets — install path is cargo or their
container image). Drop the curl-tarball install step and use the
official composite action @ pinned SHA — it runs podman + buildah
inside, works on Forgejo runner identically to GH-hosted because
it's bash, not node-bound.
dnf5 in Fedora 43 strict-fails when 'already installed' packages
appear in -y install. Drop git/curl/tar/sudo (shipped in
veilor-build:43 image already) and use --skip-unavailable. cosign
isn't packaged in F43 — pull v2.4.1 static binary from upstream.
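The install step then reduces to (release-asset URL follows cosign's published naming; package list illustrative):

```shell
dnf -y install --skip-unavailable podman buildah skopeo
# cosign has no F43 package; fetch the static binary (version per this log)
curl -fsSLo /usr/local/bin/cosign \
  https://github.com/sigstore/cosign/releases/download/v2.4.1/cosign-linux-amd64
chmod +x /usr/local/bin/cosign
```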
Walk every action in kickstart/veilor-os.ks %post and map to its
v0.7 atomic equivalent:
Build-time script additions:
- chmod +x /usr/share/veilor-os/scripts/* + /usr/local/bin/veilor-*
(BlueBuild type:files sometimes drops perms)
- fc-cache -f after Fira Code stamping
- os-release brand override (NAME=veilor-os, ID=veilor, ID_LIKE)
- brand-leak guard: fail the image build if any onyx/personal data
slipped through into shipped state
Layered packages:
- zram-generator (memory hygiene; replaces dnf install in kickstart)
- jq (used by veilor-doctor for `bootc status --json`)
- vim-enhanced + tmux + htop (admin essentials, parity with v0.5.x)
Systemd unit enables added:
- veilor-postinstall.service (first-login TUI; new in A3)
- veilor-doctor.timer (weekly drift check; new in A3)
Dropped: anaconda transaction_progress.py patch (build-time CI work,
not image content); SDDM display-manager symlink (kinoite ships
sddm.service already); SELinux module build (secureblue has its
own); systemctl set-default multi-user.target (kinoite is
graphical.target by design).
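The os-release override listed above might be sketched like this (NAME/ID values per this log; the ID_LIKE content is an assumption; parameterized so it runs anywhere, the build step targets /usr/lib/os-release):

```shell
# brand_os_release FILE : rewrite NAME/ID, add ID_LIKE if missing
brand_os_release() {
    f=$1
    sed -i -e 's/^NAME=.*/NAME="veilor-os"/' \
           -e 's/^ID=.*/ID=veilor/' "$f"
    grep -q '^ID_LIKE=' "$f" || echo 'ID_LIKE=fedora' >> "$f"
}
```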
A3 inline (agent failed on API). Three CLIs ported / written for the
v0.7+ atomic system:
veilor-update — rewritten on bootc upgrade (was dnf upgrade --refresh).
Pre-checks bootc status, pauses auditd while staging, prints summary
and offers reboot. Returns 0/1/2/3 per legacy contract.
veilor-postinstall (NEW) — first-login TUI run via
veilor-postinstall.service oneshot. Asks once for keyboard, locale,
hostname, GPU drivers, package presets (dev/media/homelab),
bluetooth, USBGuard snapshot, then invokes veilor-doctor. Writes
/var/lib/veilor/postinstall-complete and self-disables on success.
veilor-doctor — Updates section rewritten to parse `bootc status
--json` (with jq) when available, falls back to dnf history /
check-update for legacy v0.5.x kickstart-installed systems.
Plus systemd units:
- veilor-postinstall.service (oneshot on graphical.target, gated on
absence of done-marker, runs on tty1)
- veilor-doctor.service + .timer (weekly drift check)
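The doctor's update probe might be sketched like this (the jq path into the JSON is an assumption about bootc's status output shape; the dnf fallback is per this log):

```shell
# booted_image: pull the booted image ref out of `bootc status --json`
booted_image() {
    jq -r '.status.booted.image.image.image // "unknown"'
}
updates_section() {
    if command -v bootc >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
        bootc status --json | booted_image
    else
        # legacy v0.5.x kickstart-installed systems
        dnf history list 2>/dev/null | head -n 5 || dnf check-update
    fi
}
```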
A1 inline (agent failed on worktree base mismatch). Adapt
build-bluebuild.yml to run on the Forgejo self-hosted runner using
the same lessons from build-iso.yml debug:
- runs-on: nullstone (resolves to veilor-build:43, fedora43+nodejs)
- BlueBuild CLI installed in-job from upstream release tarball v0.9.10
- podman/buildah/skopeo/cosign installed via dnf
- bluebuild build with podman driver + skopeo inspect + cosign signing
- Push primary to Forgejo registry git.s8n.ru/veilor-org/veilor-os
- GHCR push gated to github.server_url == 'https://github.com' only
- SBOM + attest-build-provenance gated GH-only (Forgejo has no Fulcio)
- All third-party actions remain pinned to node20-shipping versions
Secrets needed in Forgejo repo settings:
- FORGEJO_REGISTRY_TOKEN: PAT with package:write on veilor-org
- FORGEJO_REGISTRY_USER: 's8n-ru' (or org member with write scope)
Strategy pivot 2026-05-06: v0.5.32 produced a green ISO on Forgejo
runner. That's the kickstart-path proof point. Continuing v0.6
kickstart polish is sunk-cost work on tooling retired at v1.0.
Pivot:
- v0.5.0 is the FINAL kickstart-path release. Tag, freeze, ship.
- v0.6 cancelled as a milestone. Original plan kept inline as
HISTORICAL reference.
- v0.7 promoted to primary active milestone. Absorbs the v0.6
ergonomic CLI tools (veilor-postinstall / veilor-doctor /
veilor-update) with bootc upgrade replacing dnf upgrade.
- Active branch: v0.7-bluebuild-spike. All future feature work lands
there, not on main.
Single document that surfaces the depth of work behind veilor-os:
metrics, distros studied, every tool traversed in the build chain,
all 35+ failure classes hit and beaten, key engineering decisions and
why, what's in the repo beyond the kickstart, and the self-hosted
nullstone CI infrastructure built to support it.
Receipts not narrative — every claim links back to a file path,
commit, error, or config. Useful as portfolio anchor and as a single
read-this-first for anyone returning to the project after a gap.
cosign keyless signing uses Sigstore Fulcio, which requires a
Fulcio-trusted OIDC issuer. Forgejo runs don't have one, so cosign
falls back to the interactive device flow and times out
(error obtaining token: expired_token). Same applies to
attest-build-provenance and the SBOM action's signed attestation.
Skip all three on Forgejo for now; ISO + sha256 are sufficient for
v0.5.x test releases. Re-add when we self-host a Sigstore stack or
sign with a key-pair instead of keyless.
We layer on their OCI image as v0.7 base; we don't redistribute their
source. Drop the AGPLv3-attribution prose — that becomes relevant only
if/when we ship a verbatim chunk of their config/policy in our repo.
secureblue (AGPLv3) is the upstream hardened atomic Fedora that the
v0.7 BlueBuild spike layers on top of. Comparison table now includes
secureblue alongside Kicksecure + stock Fedora KDE. New "Credit &
relationship to secureblue" section spells out where their work
already solves problems we don't need to reinvent (Trivalent,
SELinux policy, kernel cmdline, signed OCI), how veilor-os differs
(kickstart install path + branding + Forgejo CI), and the AGPLv3
attribution rule for any code we lift verbatim.
Build error: 'Failed to find package apparmor-parser : No match for
argument'. Fedora 43 base and updates do not ship AppArmor packages;
the prior comment was incorrect. Defer AppArmor to v0.7 secureblue OCI
hybrid (which has its own LSM stack), or land via COPR overlay later.
forgejo-runner labels nullstone -> fedora:43 image. Switching
runs-on: ubuntu-24.04 -> nullstone makes the job container itself
the build environment, eliminating the docker-in-docker workspace
bind-mount problem (host path != act-container path).
Build now runs as root in fedora:43, installs livecd-tools directly
via dnf, and writes outputs to $GITHUB_WORKSPACE which is the natural
runner workdir on host. No nested docker, no userns juggling, no
explicit -v workspace bind needed.
Prior pin was the arm64 manifest digest (linux/arm64/v8); on x86_64
host it failed with `exec /usr/bin/sh: exec format error`. Pinned to
the amd64 manifest entry from the same fat-manifest.
Forgejo runner on nullstone runs against a daemon with
userns-remap=default. addnab/docker-run-action launches the Fedora 43
build container with --privileged, which is incompatible with
userns-remap unless --userns=host is also set.
forgejo-runner v6.4.0 ships node20; floating tags @v0/@v3/@v2 now
resolve to actions whose runs.using=node24, which the runner cannot
exec. Pin to last node20-shipping release of each:
- anchore/sbom-action@v0.17.2
- sigstore/cosign-installer@v3.7.0
- actions/attest-build-provenance@v2.2.3