- STEP 14/30 hung while buildah's fuse-overlayfs scanned
/usr/share/veilor-os on the ~130-layer secureblue base (Forgejo run 171,
2026-05-07; hit the 360-min timeout with no error logged).
- Brand-leak grep -rqi removed from bluebuild/recipe.yml RUN snippet;
one-line comment left in its place pointing at the new location.
- Added equivalent assertion at the end of the Smoke-test OCI image
step in .github/workflows/build-bluebuild.yml. Runs once on the
sealed image (no overlayfs in flight), uses `find -type f` over
bounded paths + name globs (text files only), then a single grep
invocation — much faster than recursive grep over the whole tree.
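The assertion might be sketched like this (a minimal sketch: the directory list, name globs, and the 'oldbrand' pattern are placeholders, not the real workflow values):

```shell
# Hypothetical sealed-image brand-leak check: bounded find, one grep pass.
root=$(mktemp -d)                      # stand-in for the mounted sealed image
mkdir -p "$root/usr/share/veilor-os"
echo "all clean here" > "$root/usr/share/veilor-os/notes.txt"
leaks=$(find "$root/usr/share/veilor-os" -type f \
          \( -name '*.txt' -o -name '*.conf' -o -name '*.sh' \) -print0 \
        | xargs -0 -r grep -il 'oldbrand' || true)
if [ -n "$leaks" ]; then
  echo "brand leak found in: $leaks" >&2
  exit 1
fi
echo "brand-leak smoke test passed"
```

Bounding the search to known text-file paths avoids descending the whole ~130-layer tree.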
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- new helper overlay/usr/share/veilor-os/scripts/persist-install-logs.sh
detects boot USB (BOOT=/findfs, /run/install/repo, /sys/block removable),
copies /tmp/anaconda.log + program/storage/packaging/dnf/syslog/X +
journalctl -b + dmesg + lsblk/blkid/mount + /proc/cmdline into
/veilor-install-logs/<UTC-ts>/ on the stick; mirrors backup into
/mnt/sysroot/var/log/veilor-install-logs/ so logs survive even on RO
USB or detect failure
- toggle: kernel cmdline veilor.install_logs=on|off (default ON until
v1.0 final); never fails install on log persistence error
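A hedged sketch of the helper's shape (the real detection and file lists live in persist-install-logs.sh; paths, timestamps, and file names below are assumptions):

```shell
# Sketch only: mirror logs into every writable destination, honor the
# veilor.install_logs toggle, and never propagate an error.
persist_install_logs() {
  if grep -q 'veilor.install_logs=off' /proc/cmdline 2>/dev/null; then
    return 0                            # toggle off: do nothing
  fi
  ts=$(date -u +%Y-%m-%dT%H%M%SZ)
  for base in /run/install/repo/veilor-install-logs \
              /mnt/sysroot/var/log/veilor-install-logs; do
    d="$base/$ts"
    mkdir -p "$d" 2>/dev/null || continue     # RO stick / missing mount: skip
    cp /tmp/anaconda.log /tmp/program.log "$d"/ 2>/dev/null || true
    journalctl -b > "$d/journal.txt" 2>/dev/null || true
    lsblk > "$d/lsblk.txt" 2>/dev/null || true
    cat /proc/cmdline > "$d/cmdline.txt" 2>/dev/null || true
  done
  return 0   # log persistence must never fail the install
}
persist_install_logs && echo "log persistence attempted"
```

Writing to both destinations independently is what lets logs survive a read-only USB or a failed detection.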
- kickstart/install-ostreecontainer-installer.ks: add %post --nochroot
block calling helper with toggle-aware inline fallback if helper
missing
- .github/workflows/build-installer-iso.yml: switch bib config from
[customizations.user] to [customizations.installer.kickstart] so our
new %post --nochroot actually lands in the produced ISO; admin user
now created by ks user directive (locked + chage 0); ostreecontainer
line stripped (bib auto-appends it); kernel-cmdline-default
limitation documented (osbuild/bootc-image-builder#899)
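The bib config change above would look roughly like this (a sketch; the kickstart body is illustrative, and key names should be verified against the bootc-image-builder version in use):

```toml
# config.toml passed to bootc-image-builder.
# Before: [[customizations.user]] created the admin account.
# After: embed our kickstart so %post --nochroot lands in the ISO.
[customizations.installer.kickstart]
contents = """
user --name=admin --lock
%post --nochroot
# hypothetical helper path; real ks carries a toggle-aware fallback
/run/install/repo/veilor-os/scripts/persist-install-logs.sh || true
%end
"""
```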
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator rejected our kickstart with:
Only url, nfs and ostreesetup install methods are currently supported
ostreecontainer is too new for livemedia. bootc-image-builder is the
canonical tool for ostreecontainer-based installer ISOs — consumes
the OCI image directly, generates an Anaconda installer ISO that
embeds it. Per memory, anaconda-iso is deprecated in image-builder
v44+ but works on v43 (current).
Workflow now:
1. Login to Forgejo registry (read OCI)
2. Pull the OCI image into local podman storage
3. podman run quay.io/centos-bootc/bootc-image-builder
--type anaconda-iso --rootfs btrfs <oci-ref>
4. Copy resulting ISO into build/out
Drop livemedia-creator + lorax + pykickstart + anaconda-tui + grub2
+ shim install — bootc-image-builder ships its own runtime.
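Steps 1-4 above might land in the workflow roughly like this (a sketch; the registry ref, image tag, and output paths are assumptions):

```yaml
- name: Build Anaconda installer ISO with bootc-image-builder
  run: |
    mkdir -p build/out
    podman run --rm --privileged \
      --security-opt label=type:unconfined_t \
      -v /var/lib/containers/storage:/var/lib/containers/storage \
      -v "$PWD/build/out:/output" \
      quay.io/centos-bootc/bootc-image-builder:latest \
      --type anaconda-iso --rootfs btrfs \
      git.s8n.ru/veilor-org/veilor-os:latest
```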
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run 166 took ~6 h to get through STEPS 10-14 (cosign keys cp, stage-bins
cp, nushell pull, pre_build.sh): fuse-overlayfs with 130+ layers makes
each cp/RUN step take ~40 min on a first build. Subsequent builds will be
faster (cached layers).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator pre-creates the parent dir of --logfile before
checking that --resultdir doesn't exist. Putting the log inside
resultdir made the dir 'exist' before the check ran. Move logfile
to /tmp.
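The failure mode reproduces with plain shell (a toy repro, not the real invocation):

```shell
# Pre-creating the logfile's parent dir creates resultdir itself, so
# livemedia-creator's "resultdir must not exist" check then fires.
work=$(mktemp -d)
resultdir="$work/lmc-out"
logfile="$resultdir/build.log"        # BAD: log nested inside resultdir
mkdir -p "$(dirname "$logfile")"      # what the tool does before the check
[ -d "$resultdir" ] && echo "resultdir already exists: check would abort"
```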
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
/var/lmc-out-PID kept being marked 'exists' by livemedia-creator even
after rm -rf; probably a runner bind mount or tmpfs persists /var.
Switch to /tmp/lmc-out-PID — act job container's /tmp is fresh per
run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
livemedia-creator refuses any existing resultdir. Even after
rm -rf build/out the runner workspace dir reappears. Use a fresh
PID-suffixed /var/lmc-out path outside workspace, then cp into
build/out for downstream steps.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bluebuild (159): the 'type: files' module fails with 'chmod: Operation
not permitted' inside its own bind-mounted /tmp/modules/files/files.sh
under buildah + privileged-podman in our runner. Switch all four
`type: files` modules to `type: copy` (low-level COPY, no chmod, no
helper script needed).
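The recipe change amounts to swapping module types, roughly (a sketch; the source/destination paths are placeholders, and field names should be checked against BlueBuild's copy-module docs):

```yaml
# Before: helper-script module that chmods its own files.sh
# - type: files
#   files:
#     - usr: overlay/usr
# After: plain layer COPY, no chmod, no helper script
- type: copy
  from: overlay/usr
  to: /usr
```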
installer-iso (160): livemedia-creator refused build/out which
checkout had already created (Forgejo runner reuses workspace dir
between runs). rm -rf build/out before invocation; mkdir not needed,
livemedia-creator creates the dir itself.
BlueBuild's files module fails with 'chmod: Operation not permitted' on
its own bind-mounted /tmp/modules/files/files.sh when run under podman.
Disable SELinux relabeling + seccomp filter on the bluebuild CLI
container so its nested buildah can chmod inside layer mounts.
Add livemedia-creator --make-iso pipeline that produces a small
Anaconda installer ISO consuming a CI-buildable variant of the
runtime ostreecontainer kickstart. Disk/LUKS/user blocks dropped
from the CI ks (Anaconda interactive handles them); ostreecontainer
URL pinned to ghcr.io/veilor-org/veilor-os:43. Output split into
1900M chunks; published to Forgejo installer-latest rolling tag.
Generated a cosign keypair for v0.7 OCI signing.
- bluebuild/cosign.pub committed alongside the recipe
- cosign.key stored on operator workstation only (chmod 600)
- COSIGN_PRIVATE_KEY Forgejo Actions secret set to the same key
- Workflow stages the secret to bluebuild/cosign.key at build time
(chmod 600), where the BlueBuild signing module picks it up
- .gitignore guards against any cosign.key accidental commit
- Restored the type:signing module in recipe.yml
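The staging step might be sketched like this (names per the list above; the dummy default value exists only so the snippet runs outside CI):

```shell
# Materialize the COSIGN_PRIVATE_KEY Actions secret as the file the
# BlueBuild signing module expects, with owner-only permissions.
: "${COSIGN_PRIVATE_KEY:=dummy-key-material-for-demo}"  # set by CI in real runs
mkdir -p bluebuild
printf '%s\n' "$COSIGN_PRIVATE_KEY" > bluebuild/cosign.key
chmod 600 bluebuild/cosign.key
stat -c '%a' bluebuild/cosign.key
```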
The 'stage-keys' COPY step in BlueBuild's generated containerfile
fails without cosign.pub adjacent to recipe.yml even when
type:signing is removed; re-add the module + provide real keys.
podman login writes to $XDG_RUNTIME_DIR/containers/auth.json by
default; that path varies and was missing. Probe known locations,
copy into /root/.config/containers/auth.json so the bind into the
bluebuild container has a stable source.
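The probe-and-copy logic can be sketched as follows (the real candidate list mirrors podman's known auth.json locations; the demo below uses throwaway paths):

```shell
# Copy the first auth.json found among candidate paths to a stable
# destination so the bind-mount source never varies.
stage_auth() {
  dst=$1; shift
  for src in "$@"; do
    [ -f "$src" ] || continue
    mkdir -p "$(dirname "$dst")" && cp "$src" "$dst" && echo "staged $src"
    return 0
  done
  echo "no auth.json found" >&2
  return 1
}
# Demo; a real call would probe "$XDG_RUNTIME_DIR/containers/auth.json" etc.
tmp=$(mktemp -d)
echo '{"auths":{}}' > "$tmp/auth.json"
stage_auth "$tmp/stable/auth.json" "$tmp/missing/auth.json" "$tmp/auth.json"
```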
The 'securecore-kinoite-hardened-userns' image we'd been targeting
does not exist in the secureblue org's package list. Their KDE
Plasma (Kinoite) hardened variant is published as
'kinoite-main-hardened' (or 'kinoite-nvidia-hardened' for NV boxes).
Switch the recipe + all doc references.
GHCR rate-limited anonymous pulls (403 on bearer token). Login with
the GHCR_PULL_TOKEN secret (s8n-ru read-only PAT), then bind-mount
podman's auth.json into the bluebuild CLI container so its inner
buildah sees the same login.
GHCR rejected skopeo's anonymous manifest call from inside the
bluebuild CLI container. Pre-pull the secureblue base on the host
podman (which handles the anonymous token dance), then bind-mount
/var/lib/containers/storage into the bluebuild container so its
buildah sees the cached base layer. Drop deprecated --inspect-driver
flag while we are touching the invocation.
Container's default entrypoint is dumb-init, which interpreted 'build'
as a command to exec rather than as a bluebuild subcommand. Pin
--entrypoint /usr/local/bin/bluebuild and pass 'build ...' as args.
The blue-build/github-action requires docker buildx which podman
doesn't ship. Symlinking podman as docker isn't enough — the action
calls 'docker buildx inspect' / 'docker buildx rm' which podman
doesn't implement. Pull the official BlueBuild CLI container and run
it with --build-driver buildah; works against podman storage with no
docker dependency.
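Putting the last several fixes together, the CLI-container invocation might look like this (a sketch reflecting the notes above, not a verified command line; image refs and mount paths are assumptions):

```yaml
- name: BlueBuild build via CLI container
  run: |
    # pre-pull the base so the inner buildah finds cached layers
    podman pull ghcr.io/secureblue/kinoite-main-hardened:latest
    podman run --rm --privileged \
      --security-opt label=disable \
      --security-opt seccomp=unconfined \
      -v /var/lib/containers/storage:/var/lib/containers/storage \
      -v /root/.config/containers/auth.json:/root/.config/containers/auth.json:ro \
      -v "$PWD:/workspace" -w /workspace \
      --entrypoint /usr/local/bin/bluebuild \
      ghcr.io/blue-build/cli:latest \
      build --build-driver buildah bluebuild/recipe.yml
```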
BlueBuild CLI does not ship pre-built binaries on GitHub Releases
(latest tag v0.9.35 has no assets — install path is cargo or their
container image). Drop the curl-tarball install step and use the
official composite action @ pinned SHA — it runs podman + buildah
inside, works on Forgejo runner identically to GH-hosted because
it's bash, not node-bound.
dnf5 in Fedora 43 fails hard when already-installed packages appear
in a -y install transaction. Drop git/curl/tar/sudo (shipped in
veilor-build:43 image already) and use --skip-unavailable. cosign
isn't packaged in F43 — pull v2.4.1 static binary from upstream.
A1's task done inline (the agent failed on a worktree base mismatch).
Adapt build-bluebuild.yml to run on the Forgejo self-hosted runner,
applying the same lessons from the build-iso.yml debugging:
- runs-on: nullstone (resolves to veilor-build:43, fedora43+nodejs)
- BlueBuild CLI installed in-job from upstream release tarball v0.9.10
- podman/buildah/skopeo/cosign installed via dnf
- bluebuild build with podman driver + skopeo inspect + cosign signing
- Push primary to Forgejo registry git.s8n.ru/veilor-org/veilor-os
- GHCR push gated to github.server_url == 'https://github.com' only
- SBOM + attest-build-provenance gated GH-only (Forgejo has no Fulcio)
- All third-party actions remain pinned to node20-shipping versions
Secrets needed in Forgejo repo settings:
- FORGEJO_REGISTRY_TOKEN: PAT with package:write on veilor-org
- FORGEJO_REGISTRY_USER: 's8n-ru' (or org member with write scope)
cosign keyless sign uses Sigstore Fulcio which requires a
Fulcio-trusted OIDC issuer. Forgejo runs don't have one, so cosign
falls back to the interactive device flow and times out
(error obtaining token: expired_token). Same applies to
attest-build-provenance and the SBOM action's signed attestation.
Skip all three on Forgejo for now; ISO + sha256 are sufficient for
v0.5.x test releases. Re-add when we self-host a Sigstore stack or
sign with a key-pair instead of keyless.
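The gate is a one-line condition on each affected step (a sketch; the step name and run body are illustrative):

```yaml
- name: Cosign keyless sign (GitHub-hosted only)
  if: github.server_url == 'https://github.com'
  run: cosign sign --yes "${IMAGE}@${DIGEST}"
```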
forgejo-runner labels nullstone -> fedora:43 image. Switching
runs-on: ubuntu-24.04 -> nullstone makes the job container itself
the build environment, eliminating the docker-in-docker workspace
bind-mount problem (host path != act-container path).
Build now runs as root in fedora:43, installs livecd-tools directly
via dnf, and writes outputs to $GITHUB_WORKSPACE which is the natural
runner workdir on host. No nested docker, no userns juggling, no
explicit -v workspace bind needed.
Prior pin was the arm64 manifest digest (linux/arm64/v8); on the x86_64
host it failed with `exec /usr/bin/sh: exec format error`. Pinned to
the amd64 manifest entry from the same fat-manifest.
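Selecting the right per-arch entry from a fat manifest looks like this in miniature (the real pipeline would run `skopeo inspect --raw` and filter with jq; a tiny text stand-in keeps this demo dependency-free):

```shell
# Toy fat-manifest, one "arch digest" pair per line.
manifests='arm64 sha256:1111
amd64 sha256:2222'
digest=$(printf '%s\n' "$manifests" | awk '$1 == "amd64" { print $2 }')
echo "pin the base image to @$digest"
```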
Forgejo runner on nullstone runs against a daemon with
userns-remap=default. addnab/docker-run-action launches the Fedora 43
build container with --privileged, which is incompatible with
userns-remap unless --userns=host is also set.
forgejo-runner v6.4.0 ships node20; floating tags @v0/@v3/@v2 now
resolve to actions whose runs.using=node24, which the runner cannot
exec. Pin to last node20-shipping release of each:
- anchore/sbom-action@v0.17.2
- sigstore/cosign-installer@v3.7.0
- actions/attest-build-provenance@v2.2.3
The build-iso workflow used softprops/action-gh-release@v2 unconditionally,
which only speaks the GitHub Releases REST API. When the workflow runs on
the Forgejo runner registered on nullstone, those steps would fail.
Add a server_url check so the GH-only path runs only on github.com, and
mirror it with a curl-based step that hits the Forgejo /api/v1/releases
endpoints. Behaviour:
- github.com: identical to before (action-gh-release@v2).
- git.s8n.ru: drop+recreate ci-latest release, upload chunked assets
via the Forgejo attachments API.
Tag-driven "Attach to release" path mirrored the same way.
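The Forgejo-side mirror step might be sketched as follows (endpoints follow the Forgejo/Gitea API v1; the repo path, token name, and asset globs are placeholders):

```yaml
- name: Recreate ci-latest release on Forgejo
  if: github.server_url != 'https://github.com'
  run: |
    api="${GITHUB_SERVER_URL}/api/v1/repos/veilor-org/veilor-os"
    auth="Authorization: token ${FORGEJO_REGISTRY_TOKEN}"
    # Drop the old rolling release if present, then recreate it.
    rid=$(curl -sf -H "$auth" "$api/releases/tags/ci-latest" | jq -r .id || true)
    [ -n "$rid" ] && curl -sf -X DELETE -H "$auth" "$api/releases/$rid"
    rid=$(curl -sf -X POST -H "$auth" -H 'Content-Type: application/json' \
      -d '{"tag_name":"ci-latest","prerelease":true}' "$api/releases" | jq -r .id)
    for f in build/out/*.part?? build/out/*.sha256; do
      curl -sf -X POST -H "$auth" -F "attachment=@$f" \
        "$api/releases/$rid/assets?name=$(basename "$f")"
    done
```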
Refs: A1 build-eng task — Forgejo runner adaptation.
Note that all `uses:` directives still resolve to mutable major-
version tags. SHA-pinning is the Agent 8 audit recommendation but
requires per-action web lookups that stalled the previous SRE
attempt; tracked separately so this PR can land first.
Pin registry.fedoraproject.org/fedora:43 to its current manifest
digest so a malicious or accidental tag-rewrite upstream cannot
silently change the base layer of every CI build. Digest was
captured via `skopeo inspect --raw` on 2026-05-06. Refresh
procedure documented inline.
Sign each ISO chunk with cosign keyless OIDC, generate an SPDX SBOM
of the build output, and attach an in-toto build-provenance
attestation. Sigs/certs/SBOM are uploaded alongside the ISO parts in
the ci-latest rolling prerelease so the test/auto-install.sh path
can verify before reassembling.
Action versions are major-version tags (@v3, @v0, @v2). SHA-pinning
is tracked separately to keep this PR small and avoid the long web
lookups that stalled the previous attempt.
forgejo-runner v6.4.0 ships a node20 javascript engine. v4.2+ of
actions/checkout and v2.0.5+ of softprops/action-gh-release moved to
node24, which the runner refuses to exec. Pin both to last node20
release.
Pairs with a runner-side config change (separately deployed on
nullstone /home/docker/forgejo-runner/conf/config.yaml) that adds
`-v /var/run/docker.sock:/var/run/docker.sock` to per-job container
options + whitelists the socket via valid_volumes; without that,
addnab/docker-run-action@v3 inside the catthehacker/ubuntu job
container can't reach the docker engine.
- actions/checkout v4 -> v4.1.7
- softprops/action-gh-release v2 -> v2.0.4
- addnab/docker-run-action v3 unchanged (composite/docker, no node)
- ludeeus/action-shellcheck@master unchanged (docker-based)
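The runner-side change described above is roughly this (a sketch of forgejo-runner's config.yaml keys; verify field names against the deployed runner version):

```yaml
# /home/docker/forgejo-runner/conf/config.yaml (excerpt)
container:
  options: "-v /var/run/docker.sock:/var/run/docker.sock"
  valid_volumes:
    - /var/run/docker.sock
```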
forgejo-runner v6.4.0 javascript runtime is node20. Pin every
javascript action used in the spike branch's workflows to the last
release that ships node20.
- actions/checkout v4 -> v4.1.7 (3 files)
- softprops/action-gh-release v2 -> v2.0.4 (build-iso)
- anchore/sbom-action v0 -> v0.17.2
- actions/attest-build-provenance v2 -> v2.2.3
- blue-build/github-action@v1 unchanged (TODO: SHA pin)
This is the spike-branch counterpart of the main-branch fix in
feat/runner-fix-docker-sock-and-node20.
Replace @v1 with @24d146df25adc2cf579e918efe2d9bff6adea408 (the commit
v1 currently resolves to). Tag pins on third-party actions are mutable
— a maintainer or attacker can re-point v1 at a malicious commit and
silently change what runs on every push.
Trailing comment '# v1' preserves human readability for future bumps.
Refs: 9-agent CI hardening wave (agent 8), 2026-05-05.
GH release asset size limit = 2 GiB. Veilor ISO ~2.8 GiB (KDE base +
hardening + grafted /veilor/ tree). zstd -19 only shrinks it to 96.67%
of its original size (the squashfs inside is already xz-compressed).
Splitting is the fix.
Workflow now:
- Splits ISO with `split -b 1900M -d --suffix-length=2`
- Drops original ISO before upload (would fail at >2 GiB)
- Includes per-part sha256 for reassembly verification
- Release notes include cat reassembly command
test/auto-install.sh will need follow-up commit to download + cat
the parts before booting.
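The round-trip can be demonstrated at small scale (a toy demo; CI uses `-b 1900M` on the real ISO, and the checksum flow mirrors what the release notes describe):

```shell
# Split a dummy "ISO", drop the original, reassemble from the parts,
# and verify against the pre-split checksum.
tmp=$(mktemp -d); cd "$tmp"
head -c 5242880 /dev/urandom > veilor.iso      # 5 MiB stand-in for the ISO
sha256sum veilor.iso > veilor.iso.sha256
split -b 2M -d --suffix-length=2 veilor.iso veilor.iso.part
rm veilor.iso                                  # original would break the 2 GiB limit
cat veilor.iso.part?? > veilor.iso             # user-side reassembly
sha256sum -c veilor.iso.sha256                 # verifies the round-trip
```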
Two follow-ups to 75a68a1 (releases switchover):
1. action-gh-release got 403 "Resource not accessible by integration"
because default GITHUB_TOKEN has read-only on contents. Added
workflow-level `permissions: contents: write`.
2. Failure-path artifact upload still hit quota wall. Replaced with
inline `tail` of build/out/build.log + anaconda program.log
directly to job log. No artifact upload = no quota.
Artifact storage quota (50 GB Pro tier) maxed out after ~18 iterations
of 2.7 GB ISOs, and the 6-12 h quota recalculation doesn't fit our
cadence. Builds succeed but the upload step fails, wasting CI minutes
and blocking testing.
Switch to GitHub Releases (no storage quota):
- Every successful build on main updates rolling `ci-latest`
prerelease draft. Replaces files in place.
- Tag-driven releases (v*.*.*) keep their existing publish path.
- Build logs remain as artifacts (small + opt-in failure only,
retention=1d).
User can `gh release download ci-latest --repo veilor-org/veilor-os`
or browse to releases page. No more artifact quota wall.
* v0.5.1 build: vendor gum binary + graft /veilor/ onto ISO
- gum 0.17.0 pinned by sha256, downloaded into overlay/usr/local/bin/
so installer can use Charm.sh TUI primitives.
- After livecd-creator produces ISO, extract+re-pack with /veilor/
containing overlay+scripts+assets so installer-generated ks can
copy them into target system at install time.
* fix: extract original ISO boot stanza programmatically (no hardcoded paths)
Reviewer found `-e images/efiboot.img` was wrong — Fedora livecd-creator
places efiboot.img in isolinux/ not images/. Plus missing
--mbr-force-bootable + -partition_* flags would produce hybrid MBR/GPT
mismatch refused by some BIOS firmwares.
Fix: extract original ISO's exact boot stanza via
`xorriso -report_el_torito as_mkisofs` and replay it via eval.
Guarantees exact match, immune to upstream Fedora layout changes.
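The extract-and-replay fix might be sketched as follows (file and directory names here are placeholders; the demo skips itself when xorriso or the source ISO is absent):

```shell
if command -v xorriso >/dev/null 2>&1 && [ -f original.iso ]; then
  # Capture the exact boot stanza the original ISO was built with.
  stanza=$(xorriso -indev original.iso -report_el_torito as_mkisofs 2>/dev/null)
  # stanza must word-split back into individual flags, hence eval.
  eval xorriso -as mkisofs $stanza \
    -o veilor-grafted.iso -graft-points /veilor=overlay-tree extracted-iso/
  demo_status="repacked"
else
  demo_status="skipped"
  echo "xorriso or original.iso unavailable; stanza replay skipped"
fi
```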
---------
Co-authored-by: veilor-org <admin@veilor.org>