Two regressions slipped through INC1-3:
INC4a -- BLACK BAND behind every detail-page carousel
Pre-existing 2026-05-08 home-page rule painted .emby-scroller {bg:#000
!important} UNSCOPED. Hits every carousel inside .itemDetailPage incl
admin-only More from Season N, More Like This. INC1-3 transparent-scope
list missed .emby-scroller / .verticalSection / .padded-top-focusscale.
Fixed by extending scope.
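The extended rule itself isn't quoted above; a minimal sketch of its shape, with class names taken from this section and the detail-page scoping assumed from doc 14's `body.itemDetailPage` pattern:

```css
/* Sketch only; the real rule lives in the repo's override CSS.
   Keep the 2026-05-08 home-page black band, but make the three
   containers INC1-3 missed transparent on detail pages. */
.itemDetailPage .emby-scroller,
.itemDetailPage .verticalSection,
.itemDetailPage .padded-top-focusscale {
  background: transparent !important;
}
```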
INC4b -- VIDEO 'BLACK SCREEN' on play
Not actually black-screen. CPU-only nullstone cannot sustain real-time
4K HEVC HDR tonemap+x264 transcode -- 0.5x realtime, ffmpeg takes ~6s
per 3s segment. With user resume seeks adding restart overhead, total
wait ~18s before browser readyState rises. User saw black, gave up.
Fix: disable EnableTonemapping (R&M fake HDR per doc 21) + cap
RemoteClientBitrateLimit=20Mbps on every user (1080p target, no 4K
scale). Headless v2 test confirms HEVC + AV1 episodes now hit
readyState=3/4 within wait window; 4K HDR R&M still slow (heaviest).
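The stall budget above can be sketched with the numbers from this section; the 3-segment startup count is an assumption about how much the browser buffers before readyState rises, not a measured value:

```python
# Back-of-envelope for INC4b: CPU-only transcode at 0.5x realtime,
# 3 s HLS segments, ~18 s observed wait before playback could start.
segment_s = 3
encode_speed = 0.5            # fraction of realtime on CPU-only nullstone
startup_segments = 3          # ASSUMED buffer depth before readyState rises

encode_per_segment = segment_s / encode_speed   # seconds of ffmpeg per segment
total_wait = startup_segments * encode_per_segment
print(encode_per_segment, total_wait)  # → 6.0 18.0
```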
INC4 testing methodology audit -- bin/headless-test-v2.py
v1 only logged in as guest and never clicked Play. v2 runs both admin
and guest, walks 3 codec-tagged items per role (HEVC/AV1/H.264),
clicks Play, captures <video> state, sweeps DOM for opaque bgs over
backdrop layer. False positives: off-viewport #reactRoot + collapsed
.mainDrawer (negative coords). Allowlist refinement TODO.
Open: 4K HDR sources still slow even post-fix. Real fix path = pre-
transcode masters to 1080p H.264 SDR via separate batch, OR migrate to
10.11.8 with vaapi/qsv driver fixed.
26 — Incident 2026-05-09: Page Unresponsive + Posters Missing + Playback Black-Screen
Session log. Live document — updated as fix proceeds. Goal: future-me + other operators can read this and skip every dead-end I already walked.
Status as of doc creation: ONGOING — partial fix applied, more under investigation.
Symptoms reported by owner (in order)
- "Browser arrflix is broken videos don't play at all"
- "I can't even see a preview of the TV series / movie"
- After first fix: page loads, posters render, but "Page Unresponsive" Chrome dialog before posters paint (screenshot 1)
- After second fix attempt: posters render, but "Abspielen" (German Play button) instead of "Play"; all backdrop art replaced by black; video plays as black screen (screenshot 2)
Root causes identified so far
A — Browser hangs (resolved by fix #1)
/opt/docker/jellyfin/web-overrides/index.html deployed copy was AHEAD of repo HEAD. md5 deployed b97c1cb4 ≠ repo d77c106b. Someone hot-patched a forceEnglishUI() text-walker MutationObserver onto document.body with subtree:true, characterData:true. Walker rewrote alt/title/aria-label on every DOM mutation. Poster grid lazy-load fired it hundreds of times → main thread frozen → Chrome "Page Unresponsive".
Fix applied: scp'd repo HEAD index.html over deployed, restarted container. Verified md5 matches.
Lesson: never hot-patch the bind-mount. Always commit + redeploy from repo. Drift is invisible until something breaks.
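The drift check that caught bug A can be sketched; the temp files below are stand-ins for the repo checkout and the `/opt/docker/jellyfin/web-overrides/` deployed copy:

```python
# Sketch: detect bind-mount drift by hashing repo HEAD vs deployed file.
import hashlib, pathlib, tempfile

def md5(p: pathlib.Path) -> str:
    return hashlib.md5(p.read_bytes()).hexdigest()

d = pathlib.Path(tempfile.mkdtemp())
repo = d / "index.repo.html"          # stand-in for repo HEAD index.html
deployed = d / "index.deployed.html"  # stand-in for the bind-mounted copy
repo.write_text("<html>repo HEAD</html>")
deployed.write_text("<html>hot-patched forceEnglishUI()</html>")

drift = md5(repo) != md5(deployed)
print("DRIFT" if drift else "in sync")  # → DRIFT
```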
B — DB write failures (auto-resolved before this session)
Agent investigation found jellyfin.db had been owned by uid 101000 (userns-remap leftover, see ~/.claude/projects/-home-admin-ai-lab/memory/project_nullstone_docker_userns.md). Container ran as 1000 → SQLite Error 8: attempt to write a readonly database. By the time we re-checked, file was already user:user. Probably fixed during 23:22 container restart.
Lesson: if jellyfin.db is unwritable, EVERY user-config save silently fails (HTTP 204 success, value not persisted). Check ownership FIRST when config writes don't stick.
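The failure mode reproduces locally; this sketch uses a throwaway DB opened read-only to stand in for the uid-101000 ownership state, not the real jellyfin.db:

```python
# Sketch of SQLite "Error 8": a readonly DB opens fine but fails on the
# first write, which is why config saves returned 204 yet never stuck.
import sqlite3, tempfile, pathlib

db = pathlib.Path(tempfile.mkdtemp()) / "jellyfin.db"  # throwaway stand-in
sqlite3.connect(db).executescript("CREATE TABLE Config(k, v);")

ro = sqlite3.connect(f"file:{db}?mode=ro", uri=True)   # simulate bad ownership
msg = ""
try:
    ro.execute("INSERT INTO Config VALUES ('UICulture', 'en-US')")
except sqlite3.OperationalError as e:
    msg = str(e)
print(msg)  # → attempt to write a readonly database
```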
C — German "Abspielen" leak (NOT YET FIXED — current focus)
User's Configuration.UICulture is <absent> for ALL 12 users. Tried POST /Users/{id}/Configuration with UICulture: en-US payload via bin/force-english-all-users.sh. Server returned HTTP 204 but field did NOT persist on subsequent GET. POST silently drops UICulture.
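The discipline that exposes this class of bug (never trust the 204; re-GET and diff the field you wrote) can be sketched with stubbed responses. No server calls here; `verify_persisted` is a hypothetical helper, and both dicts stand in for real `/Users/{id}/Configuration` bodies:

```python
# Sketch: POST returned 204, but did the field actually persist?
def verify_persisted(posted: dict, fetched: dict, field: str) -> bool:
    """True only if the value we POSTed came back on the follow-up GET."""
    return fetched.get(field) == posted.get(field)

posted  = {"UICulture": "en-US"}   # body we POSTed
fetched = {}                       # what the follow-up GET returned
ok = verify_persisted(posted, fetched, "UICulture")
print(ok)  # → False: the 204 lied
```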
Possible explanation: the UserConfiguration model in 10.10.3 may have removed the per-user UICulture field, OR the Users table schema (verified) has no UICulture column AND no Preferences row stores it. Doc 15 claims Configuration.UICulture is authoritative, but that doc is from when fix worked. Behavior may have shifted.
Traefik DOES rewrite Accept-Language: en-US,en;q=0.9 on every request (force-en-accept-lang@file middleware) AND rewrites locale chunk JS path so de-json.X.chunk.js → en-us-json.667484b4a441712c7e05.chunk.js. Verified via curl: de-json.X.chunk.js returns 107425 bytes of English content.
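dynamic.yml isn't quoted in this doc; assuming standard Traefik v2 headers-middleware syntax, the request-header half of that rewrite would look roughly like this (shape assumed, not a verbatim copy):

```yaml
# Sketch of force-en-accept-lang in /opt/docker/traefik/config/dynamic.yml;
# the locale-chunk path rewrite is a separate replacePathRegex router.
http:
  middlewares:
    force-en-accept-lang:
      headers:
        customRequestHeaders:
          Accept-Language: "en-US,en;q=0.9"
```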
So why German leaking? Service Worker cache. Browser's SW serves stale German chunk from CacheStorage, never hits network, never sees the Traefik rewrite. SW from before the lockdown was deployed.
Tried: Clear-Site-Data: "cache", "cookies", "storage" Traefik response header on /web/index.html. Verified live via curl. But the user's browser STILL has SW cache — SW intercepts the GET to /web/index.html and serves from cache, response from server (with Clear-Site-Data) never reaches browser cache layer. SW prevents its own death.
D — Backdrops missing (NOT YET INVESTIGATED)
User reports backdrop art (the wide background image behind episode cards) is now black for every show. Could be:
- Image not in DB/cache (server returning empty)
- CSS hiding backdrop element
- SW serving stale 404 from a bad earlier session
- Jellyfin metadata refresh interrupted
E — Video black screen on play (NOT YET FIXED)
Server logs show ffmpeg IS transcoding HEVC source → H.264 high@5.1 + libfdk_aac. But browser shows black. Earlier /Sessions proved DirectPlay worked for one client (RemoteEndPoint 82.31.156.86). Recent attempts: HLS segment 186.mp4 returned 499 (client closed connection) + POST /Sessions/Playing/Progress returned 502 Bad Gateway at 23:31:49 (during traefik momentary upstream-missing window).
Possible causes:
- SW intercepting HLS init segment, serving stale/wrong-mime
- 10-bit HEVC source → H.264 transcode timing issue
- CSS hiding the `<video>` element
- HLS init.mp4 vs segment naming bug (`hls_fmp4_init_filename "X-1.mp4"` + `hls_segment_filename "X%d.mp4"` — collision risk)
Actions taken this session
| # | Action | Outcome |
|---|---|---|
| 1 | scp repo index.html → deployed; docker restart jellyfin | DOM-walker shim gone. Page no longer hangs. |
| 2 | Insert temp ApiKeys row in jellyfin.db, run `bin/force-english-all-users.sh` | POST 204 but UICulture NOT persisted. Possibly server-model dropped field. |
| 3 | Add `clear-site-data@file` Traefik middleware to `jellyfin-html-nocache` router | Header lives. But SW intercepts before browser cache layer can apply. |
| 4 | Revoke temp ApiKey | Done. |
What did NOT work (don't repeat)
- `bin/force-english-all-users.sh` against 10.10.3 — POST 204 but field dropped server-side. Either the model changed or the DB write path is broken differently than the uid-101000 issue.
- `Clear-Site-Data` response header alone — SW intercepts and the header never reaches browser cache eviction. Need to kill the SW BEFORE it can intercept.
Forbidden patterns
- Hot-patching `web-overrides/index.html` without committing to repo. Bug A came from this exact pattern. Repo MUST = deployed.
- Trusting HTTP 204 as success. Verify with GET.
- Client-side DOM-walker MutationObservers without debounce + scope. Will tank performance + freeze browser.
Plan (in flight)
- Read every prior doc (`docs/01..25`) — extract what was tried + outcome (agent task)
- Read git log of `web-overrides/`, `bin/force-english-all-users.sh`, `bin/inject-shim.py` (agent task)
- Online: how to kill a Jellyfin Service Worker definitively (agent task)
- Read `/web/serviceworker.js` source — what does it cache? (agent task)
- Diagnose backdrop missing — server vs CSS vs SW (agent task)
- Diagnose HEVC playback black screen — codec + segment + HLS (agent task)
- Compare jellyfin-dev vs jellyfin (agent task — dev MAY be working, look at what's different)
- Apply consolidated fix from agent findings
- Verify in user browser
- Commit doc 26 + any code changes; push to `git.s8n.ru/s8n/ARRFLIX`
Findings from agents
Repo archeology
Reference compiled 2026-05-09 from docs/13-25 + bin/* + git log. Use this to skip dead-ends.
A - Locale lockdown - what's been tried + outcomes
Chronological history (paths absolute):
- `/home/admin/arrflix-repo/docs/15-force-english.md` (commit `14f63e8`, 2026-05-08 04:22) - diagnosis: per-user `Configuration.UICulture` absent on all 5 users -> SPA falls back to `Accept-Language`. Built `bin/force-english-all-users.sh` (read-modify-write `POST /Users/{id}/Configuration` with `UICulture: en-US`, expect 204). Shipped one-line wrapper patch for `bin/add-jellyfin-user.sh` step 3/4 (`c['UICulture']='en-US'`). Status at write-time: plan-only, script never executed.
- `docs/19-english-only-audit.md` (`a3f82df`) - confirmed UICulture still absent on 8/8 users; identified 92 non-English `<lang>-json.<hash>.chunk.js` chunks reachable (`de-json.1afccc006ab8bb6c5953.chunk.js` contains `"Play":"Abspielen"`). Proposed three orthogonal fixes: (a) Path-A Traefik `customrequestheaders.Accept-Language=en-US` middleware, (b) Path-B 1-byte chunk stub bind-mounts (brittle - chunk hashes rotate per JF image), (c) `navigator.language` shim in `inject-shim.py`. Outcome: recommendations only.
- `docs/20-english-only-lockdown.md` (`d5d6856`) - operator doc declaring 4 layers (server, per-user, web SPA shim, Accept-Language). Ships `bin/english-lockdown-runner.sh` (idempotent re-apply for layers 1+2). Layer 3 = `web-overrides/english-lockdown.{js,css}` (sibling commit `d2120c6`). Outcome: claimed working at write-time.
- `docs/25-english-leak-deep-dive-2026-05-08.md` (`117fa33`) - critical retraction: grepped the live web bundle and proved the SPA NEVER reads `Configuration.UICulture`. Only `wizard-start.<hash>.chunk.js` and `25583.<hash>.chunk.js` reference it, both for the admin `/System/Configuration` form, NOT user UI. Actual locale resolver reads `document.documentElement.getAttribute("data-culture")` -> `navigator.language` -> `navigator.userLanguage` -> `navigator.languages[0]` -> `localStorage.getItem("language")` (no user prefix). Per-user UICulture POST = theatre. Only the shim's `Object.defineProperty(Navigator.prototype, 'language', ...)` actually pins SPA UI. Verified with headless Trivalent `--lang=de-DE --accept-lang=de-DE,de,en` -> only `en-us-json.667484b4a441712c7e05.chunk.js` requested.
- Today's deployed shim (`/home/admin/arrflix-repo/bin/inject-shim.py` lines 13-114) - does ALL of the above: `localStorage.setItem` for 6 keys (`appLanguage`, `selectedlanguage`, `selectedlocale`, `language`, `locale`, `culture`), `Object.defineProperty(Navigator.prototype, 'language')`, `Object.defineProperty(Navigator.prototype, 'languages')`, fallback `navigator.X` redefine, fetch+XHR wrappers stripping `Accept-Language` and rewriting the `POST /Users/{id}/Configuration` body to force `UICulture:'en-US'`, `pinLocale()` re-runs every 1 s + on visibility-change. This is the canonical recipe - anything that works lives here. Doc 26 sec C confirms the Traefik `force-en-accept-lang@file` middleware also rewrites `Accept-Language` per request, AND rewrites `de-json.X.chunk.js` -> `en-us-json.667484b4a441712c7e05.chunk.js` (curl-verified: de URL returns 107 425 bytes of English content).
B - Service worker handling - what's been tried + outcomes
- `docs/13` finding 11 + `docs/23` sec 5 + `docs/25` hypothesis 2 - `/web/serviceworker.js` is 768 bytes, `Last-Modified: 2024-11-19` (Jellyfin 10.10.3 ship). Source confirmed: only a `notificationclick` handler + `clients.claim()`, no `fetch` listener, no precache, no `cache.put`. Stock SW cannot poison posters/HLS by design.
- `bin/inject-shim.py` lines 174-188 - shim already calls `navigator.serviceWorker.getRegistrations().then(regs => regs.forEach(r => if scriptURL.includes('serviceworker.js') r.unregister()))` AND `caches.keys().then(keys => keys.forEach(caches.delete))`. Built-in SW kill + cache wipe runs every page load. In production now.
- `docs/25` R1 - proposed `Cache-Control: no-cache` on `/web/index.html` to stop heuristic caching of pre-shim HTML (Path-A label-scoped Traefik middleware). Status: not applied at doc-25 write-time.
- Doc 26 sec C - added `clear-site-data@file` Traefik middleware. Header reaches curl, but SW intercepts before browser cache layer can apply Clear-Site-Data - SW prevents its own death. SW kill must come from inside the SW (self-destruct) or via an Update fetch returning 404. See SW kill recipe section below.
C - Backdrop / artwork issues - any prior doc covers this?
- `docs/14` - only doc that touches detail-page backdrops. Diagnosed Finity-parent's `--detail-page-backdrop-offset: 17%` + `mask.png` from `raw.githubusercontent.com/prism2001/finity/main/assets/mask.png`. Two CSS culprits clamping the band hard-black: (a) `:root --primary-background-color: #000 !important`, (b) `html, body, .preload, .skinBody, ..., #reactRoot, .mainAnimatedPages, .dashboardDocument { bg:#000 !important }`.
- `docs/14` sec 7 proposed CSS fix (`linear-gradient` overlay, `body.itemDetailPage` scope-out for bg-clamp). Doc 21 sec 4 cross-ref says "just landed".
- `docs/23` finding 6 - `/Items/{id}/Images/Primary` returns `Cache-Control: public` with NO max-age (heuristic = 0 s); cold poster transcode 350-470 ms; on-disk image cache `/cache/images/resized-images/` is 39 MB / 412 files / 16 h retention.
- `docs/24` sec 4 - image cache 39 MB total, 412 files, no GC pressure, oldest 16 h old.
- No prior doc covers "all backdrops replaced by black" as a regression. Closest precedents: doc 14 hard-black left band (CSS layer), doc 23 poster timing (cold-cache layer). New investigation territory for doc 26.
D - Video playback / HLS / transcode issues - any prior doc?
- `docs/13` finding 03 - `EnableThrottling=false`, `EnableSegmentDeletion=false`, `MaxMuxingQueueSize=2048`, `SegmentKeepSeconds=720`. Two 499 client-cancels in 1 h (HLS segments at 6.4 s + 2.9 s).
- `docs/21` - full HDR/HEVC diagnosis for Rick & Morty. Source = HDR10 (`smpte2084`, `bt2020nc`, `yuv420p10le`, `color_range=pc`, no MasteringDisplay/CLL - fake AI-upscale HDR). `EnableTonemapping=false` + `HardwareAccelerationType=none` -> HDR pixels delivered as SDR -> washed-out (NOT pure black). PlaybackInfo: `TranscodeReasons=ContainerNotSupported, AudioCodecNotSupported, SubtitleCodecNotSupported`. Fix: `EnableTonemapping=true` (`bt2390` already selected).
- `docs/22` sec 5 - 4 concurrent ffmpegs on ONE viewer of R&M S01E01. Filtergraph: `[0:4]scale,scale=3840:2160:fast_bilinear[sub]; [0:0]...format=yuv420p[main]; [main][sub]overlay`, `libx264 preset=veryfast crf=23 maxrate=13.5Mbps`, fmp4 HLS. 643 % CPU each. Cause: `EnableThrottling=false` + `EnableSegmentDeletion=false`.
- `docs/22` sec 3 - `TranscodingSubProtocol: hls`, `Container: fmp4/hls`, `IsVideoDirect=False, IsAudioDirect=False`. `PlayMethod` reports `DirectPlay` while `TranscodingInfo` is populated - race in Sessions DTO; actual decision is transcode.
- `docs/23` sec 7 - every Traefik request > 50 ms is a `/videos/.../hls1/main/*.mp4` HLS-segment GET. AV1+HEVC at 360-550 Mbit. 15 x 499 + 8 x 500 in 6 h (CPU-side, not edge).
- No prior doc covers "video plays as black screen" with audio working. HLS init/segment naming collision risk (`hls_fmp4_init_filename "X-1.mp4"` + `hls_segment_filename "X%d.mp4"`) is a doc-26-only hypothesis. SW-intercepting-init-segment is also doc-26-only - but stock SW has no `fetch` handler, so this requires a poisoned non-stock SW.
E - Forbidden patterns - things explicitly called out as "do not do"
- No bundle modifications (`docs/16` F5, `docs/19` row 16). Content-hashed filenames rotate per JF image upgrade; breaks source-map; must re-emit per bump.
- No DOM-walker MutationObservers without debounce + scope (doc 26 sec A bug A). The hot-patched `forceEnglishUI()` text-walker on `document.body` with `subtree:true, characterData:true` froze the main thread on poster lazy-load. The `inject-shim.py` walker in doc 16 sec C is the safe pattern (`acceptNode` filter + bounded selector).
- No hot-patching `web-overrides/index.html` without committing to repo (doc 26 sec A lesson). md5 drift between deployed and repo HEAD is invisible until breakage.
- No trusting HTTP 204 as success (doc 26 sec B lesson). `jellyfin.db` owned by uid 101000 (userns leftover) -> SQLite Error 8 readonly - POSTs return 204 but value not persisted. Always GET-verify.
- No `Cache-Control: immutable` on `/web/index.html` (doc 25 R1 caveat). Bricks next deploy until users force-reload. Scope to hashed chunks only.
- No tonemap on SDR sources (doc 21 sec 7e). If Mandalorian looks oversaturated post-fix, tonemap leaks - set `TonemappingMode` from `auto` to stricter.
- No relying on per-user `Configuration.UICulture` for UI strings (doc 25 R3 + sec 4). Server-side metadata theatre. Only the shim pins UI. Keep the field for future-proofing but stop expecting it to fix Abspielen.
- No bundle bind-mount for `<lang>-json.<hash>.chunk.js` (doc 19 Path B caveat, doc 25 R4). Hashes rotate per image upgrade - must regenerate every bump.
- No deleting the Settings drawer node (doc 17 sec 3.1). Drawer-renderer rebuilds on next render; remove only via CSS `display:none` + style override. Old `mypreferencesmenu` selectors match 0 elements - use `a.btnSettings, [data-itemid="settings"]`.
- No theme @import without snapshot (doc 14 sec 9). `/System/Configuration/branding` is whole-object replace - sibling Cineplex POST overwrote ElegantFin/NeutralFin within minutes (race rule, doc 04 sec 3b).
- No `bg:#000 !important` on detail pages (doc 14 sec 2c, doc 21 sec 4) - clamps Finity's intentional 17vw band into a hard-black slab. Scope to `body:not(.itemDetailPage)`.
- No stripping `Accept-Language` at Traefik for shared backends (doc 15 limit 2; relaxed in doc 19 sec 19 since arrflix is the sole consumer of the arrflix.s8n.ru router).
SW kill recipe
Research date 2026-05-09. Treat as authoritative for this incident.
Q1 — Clear-Site-Data through an active SW: Per W3C spec and MDN, Clear-Site-Data is only honored on responses fetched over the network, not those served by a SW. A SW can return arbitrary responses (incl. third-party), so browsers ignore CSD on SW-intercepted responses. Chrome/Firefox/Edge/Opera implement this; Safari support is partial. Conclusion: our existing Traefik header on /web/index.html will only fire for users whose SW lets that exact URL through to network — for stuck SWs that serve cached index.html, the header never reaches the browser. Verified-not-working alone. (MDN Clear-Site-Data, Chrome Workbox guide)
Q2 — Self-destruct shim: Verified working pattern. Google's official Workbox guide recommends this as the primary approach. The browser performs a byte-for-byte update check on the SW script (max 24h, often immediate when Cache-Control: max-age=0 or response differs). When the new script unregisters itself, all clients controlled by it lose their controller on next navigation. Canonical NekR snippet (github.com/NekR/self-destroying-sw):
self.addEventListener('install', e => self.skipWaiting());
self.addEventListener('activate', e => {
self.registration.unregister()
.then(() => self.clients.matchAll())
.then(cs => cs.forEach(c => c.navigate(c.url)));
});
Bind-mount feasibility: Jellyfin official image serves web from /jellyfin/jellyfin-web/ inside the container. Bind-mounting the whole directory is broken (jellyfin/jellyfin#8441), but bind-mounting a single file over the existing serviceworker.js works the same way index.html does for us. Path inside container is /jellyfin/jellyfin-web/serviceworker.js. (Jellyfin container docs, discussion #8441)
Q3 — 404/410 for SW script: Spec status: browser-dependent, may work. W3C ServiceWorker issue #204 was closed wontfix — the spec does NOT mandate auto-unregister on 404/410 during normal navigation. HOWEVER, the Update algorithm (run on navigation, ~24h, or registration.update()) DOES unregister on 404/410 in Chrome and Firefox today (matches AppCache). The catch: update only runs when the browser checks; a stuck SW serving cached pages may never trigger an update fetch. Less reliable than the self-destruct shim. (w3c/ServiceWorker#204)
Q4 — Jellyfin 10.10.x SW poisoning: No 10.10-specific SW-poster issue filed. The actual src/serviceworker.js in jellyfin-web is notification-only — no fetch listener, no cache logic. So if arrflix.s8n.ru/web/serviceworker.js is intercepting media, it is NOT stock Jellyfin code — likely a stale SW from a prior deploy, an injected mod (BobHasNoSoul/jellyfin-mods etc.), or browser-side residue. Stock Jellyfin SW cannot poison posters/HLS by design. Related issues: jellyfin-web#4549 (premature caching), jellyfin-web#5729 (stale /system/info/public).
Q5 — Container path: Confirmed /jellyfin/jellyfin-web/serviceworker.js for the official jellyfin/jellyfin image.
Prod-vs-dev diff
Investigation 2026-05-09 — comparing live jellyfin (prod) vs jellyfin-dev containers on nullstone. Image tags identical: both jellyfin/jellyfin:10.10.3. Network.xml byte-identical. So differences below are 100% the operator's hardening, not Jellyfin upstream.
A — docker-compose.yml diff (key items):
- Prod mounts ~110+ web-override files: `index.html`, `cineplex.css`, AND a `locale-en-only/` directory containing every non-English `*-json.*.chunk.js` (af, ar, as, be, bg, bn, ca, cs, da, de, ... zh-tw, zu) bind-mounted RO over the container's locale chunks. Dev mounts ONLY `index-dev.html` over `index.html`. No CSS, no locale chunks.
- Prod traefik labels: `security-headers@file`, `compress@file`, `force-en-accept-lang@file`. Dev: `security-headers@file`, `no-guest@file`. Prod has NO `no-guest@file` directly on the docker-label router - its no-guest layer is enforced by the higher-priority `jellyfin-html-nocache` file-provider router (which ALSO adds `cache-no-store@file`, `clear-site-data@file` - see below).
- Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`. Dev has none.
B — branding.xml / CustomCss diff:
- Prod: 30,795 bytes. Full Cineplex CSS via `@import url("/web/cineplex.css")` (LOCAL bind-mount), ARRFLIX logo PNG embedded as a base64 data-URI, Cast/Crew hidden, Quick Connect hidden, header buttons hidden, white slider thumbs, pure-black `--primary-background-color`.
- Dev: 26,345 bytes. Cineplex via `@import url("https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css")` (REMOTE jsDelivr - no /web/cineplex.css bind-mount). Same login disclaimer + Cast/Crew hide. Confirmed dev has its OWN branding.xml on disk (not empty).
C — Per-user UICulture / settings: Could not run sqlite3 inside the container (binary not present). Prod and dev have separate config dirs (/home/docker/jellyfin/ vs /home/docker/jellyfin-dev/). Dev's config/data tree is a leaner subset (no keyframes/, no splashscreen.png, no subtitles/, no device.txt; DB -shm/-wal files absent — dev DB sits idle without WAL == fewer active sessions, expected). Dev was set up as a fresh first-run wizard per docs/12-dev-instance.md, so its user table is its own admin only.
D — encoding.xml diff: Real divergence:
- Prod: `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`.
- Dev: `EnableThrottling=false`, `EnableSegmentDeletion=false`, `EnableTonemapping=false`.
- Prod is the stricter/lower-resource HLS profile; dev keeps every segment around. Plausible contributor to the HLS 499 client-disconnect seen in section E (prod): if a client pauses/seeks while throttling + deletion are both on, segment 186 may be reaped before the re-request lands.
E — Surprising / smoking gun: Traefik headers prod-only, NOT applied to dev:
- `curl -sI https://arrflix.s8n.ru/web/index.html` returns: `cache-control: no-cache, no-store, must-revalidate` + `clear-site-data: "cache", "cookies", "storage"`.
- `curl -sI https://dev.arrflix.s8n.ru/web/index.html` returns NEITHER. Just `x-frame-options: SAMEORIGIN`.
- Source: `/opt/docker/traefik/config/dynamic.yml` defines a HIGH-PRIORITY (`priority: 100`) file-provider router `jellyfin-html-nocache` matching `Host(arrflix.s8n.ru) && Path(/, /web/, /web/index.html, /web/sw.js, /web/manifest.json)` with middlewares `security-headers`, `compress`, `cache-no-store`, `force-en-accept-lang`, `clear-site-data`. Dev's `dev.arrflix.s8n.ru` host has no equivalent file-provider router - only the docker-label router applies.
- The `clear-site-data` middleware was ADDED 2026-05-09 (today) as a "one-shot" to wipe SW + cache + storage. Comment in dynamic.yml literally says: "Remove this middleware after owner has visited once and confirmed clean state."
- Implication: every prod page-load tells the browser to wipe cache + cookies + storage. If the SW intercepts before the header reaches the cache layer (per Q1 finding above) the header is harmless; but if any auth state or in-progress playback state is in storage when the header DOES land (e.g. on a forced refetch), it gets nuked. Dev does not have this and dev "works".
- Prod also has `jellyfin-locale-force-en` (`priority: 200`) doing `replacePathRegex` from any locale-json chunk to `en-us-json.667484b4a441712c7e05.chunk.js`. The hash is hard-coded; if the deployed Jellyfin web bundle ever shipped a different en-us-json hash, EVERY locale chunk request returns a 404 wrapped as a successful rewrite to a non-existent path. Worth verifying the hash matches the live bundle.
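That "worth verifying" check can be sketched mechanically; the listing below is a stub, and in production `bundle_listing` would come from listing the web root inside the container (path per the SW kill recipe section):

```python
# Sketch: guard against the hard-coded en-us-json hash going stale
# after an image bump breaks the jellyfin-locale-force-en rewrite.
pinned = "en-us-json.667484b4a441712c7e05.chunk.js"
bundle_listing = [                 # stub for the real container file list
    "main.0f3cd1.chunk.js",
    "en-us-json.667484b4a441712c7e05.chunk.js",
]
hash_ok = pinned in bundle_listing
print("hash OK" if hash_ok else "STALE: fix jellyfin-locale-force-en")  # → hash OK
```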
Suggested transplant (smallest reversible change):
- Remove the `clear-site-data@file` middleware from the `jellyfin-html-nocache` router in `/opt/docker/traefik/config/dynamic.yml` (one line). Keep `cache-no-store` so the SW-update fetch still bypasses heuristic cache. Traefik hot-reloads.
- Verify with `curl -sI https://arrflix.s8n.ru/web/index.html` → no `clear-site-data` header.
- If prod now behaves like dev, the CSD header was a major factor in the unresponsive page (storage wipe in flight while SPA boots = re-auth race + token loss).
- Re-test playback. If still black-screen, suspect the encoding.xml `EnableThrottling` + `SegmentDeletion=true` combo and try toggling each off to match dev.
- Last resort: also drop the `jellyfin-locale-force-en` rewrite and verify the hard-coded en-us-json hash is current with the running 10.10.3 bundle.
Online research 2026-05-09
Research-only pass against current GitHub state. All URLs verified live this date.
Q1 — UICulture per-user broken in 10.10.3? No evidence the field was removed from UserConfiguration in the 10.10.x line. DeepWiki's settings-management page still documents per-user UICulture. The closest live regression is jellyfin/jellyfin#16117 ("Can't change plugins settings - Fixed by disabling Cloudflare Rocket Loader"): same shape — POST returns 2xx, body silently dropped, only over reverse proxy. Verdict: probable that our symptom is reverse-proxy-side body mangling, not a server-side schema removal. Sanity check: bypass Traefik (curl --resolve arrflix.s8n.ru:8096:127.0.0.1 direct to container) and POST UICulture; if it persists there but not via Traefik, middleware is mutating the JSON. Discussion #15857 confirms 204 No Content is the expected return code for these write endpoints — the 204 itself is not the bug. (#16117, discussion #15857, DeepWiki settings)
Q2 — Backdrops missing while posters work. Confirmed root cause = TMDB API change. jellyfin/jellyfin#14922 (opened 2025-10-01, CLOSED) and #14951 (2025-10-06, CLOSED): TMDB swapped "no-language" backdrop tag from empty-string to xx; Jellyfin 10.10.x scrapes those as Thumbs, not Backdrops, so the Backdrops slot is empty. The Jellyfin team explicitly said it will not be backported to 10.10 — fix lands only in 10.11.0+. So our 10.10.3 instance has zero backdrops for any item added after ~Sep 2025 unless a non-xx language backdrop happened to exist. Issue #7264 (Movies showing backdrops instead of posters) is a separate 10.11.1 regression — opposite symptom, not relevant here, marked "Can't Reproduce" in #15259. Verdict: confirmed for our case. Mitigation = upgrade to 10.11.x and run "Replace existing images" on every item after upgrading. (#14922, #14951, #7264)
Q3 — Service Worker survival despite Clear-Site-Data. Confirmed. Chrome's official Workbox guide states Clear-Site-Data "can't be relied on alone" because the SW intercepts the very response that would carry the header. Chromium SW Security FAQ explicitly recommends pairing CSD with a no-op SW. Same conclusion as our SW kill recipe section, validated from a second angle. (Chrome Workbox, Chromium SW FAQ)
Q4 — Self-destruct SW pattern in Jellyfin community. No Jellyfin-specific recipe published. Generic NekR self-destroying-sw is the canonical pattern (already cited above). BobHasNoSoul/jellyfin-mods ships a replacement SW (not a self-destruct one) — useful only as a reference for how others bind-mount over /jellyfin/jellyfin-web/serviceworker.js. Verdict: no evidence of a Jellyfin-curated kill recipe; we are first to ship one. (NekR, BobHasNoSoul/jellyfin-mods)
Q5 — HLS fmp4 init-segment collision on restart. No evidence of collision in practice. Jellyfin always passes -start_number 0 and the init filename is <hash>-1.mp4 (literal -1, not %d-derived); segments are <hash>0.mp4, <hash>1.mp4, ... so -1 cannot collide with any positive %d. Restart spawns a new hash (different session id), so old and new sessions don't share filenames either. The active live bug is jellyfin/jellyfin#16612 — playback breaks after 10–15 s in 10.11.8 with fMP4-HLS — but the cause traced in that thread is FFmpeg/segment-availability, not init-name collision. Tangentially: #12230 (CLOSED) is about the init filename being passed relative not absolute — only matters when Jellyfin's CWD ≠ transcode dir (rffmpeg setups). Verdict: no evidence that init-name collision causes our black-screen. Look at #16612 and at Cache-Control: no-store on /Videos/*/hls1/* instead. (#16612, #12230)
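The Q5 no-collision argument can be checked mechanically; the session hash below is a stub:

```python
# Sketch of Q5: init "<hash>-1.mp4" vs segments "<hash>0.mp4", "<hash>1.mp4", ...
# "-1" is a literal suffix, not %d output, so it can never equal a
# non-negative segment number.
prefix = "deadbeef"  # stub session hash
init = f"{prefix}-1.mp4"
segments = {f"{prefix}{n}.mp4" for n in range(0, 1000)}
print(init in segments)  # → False
```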
Q6 — Cineplex theme repo activity. Repo MRunkehl/cineplex last pushed 2025-09-06 (sha 98c8e71, "Fixed more styles and script"). Description: "Updated jellyflix theme for newest jellyfin v10.10.7 and better netflix styles". Zero open or closed issues (issues tab is empty). No commits since 10.11.0 shipped, so the theme has not been validated against 10.11 image-type changes. Verdict: probable that backdrop CSS selectors target 10.10 DOM and may break or hide backdrops on a 10.11 upgrade. Audit cineplex.css for .itemBackdrop, .backdropContainer, .cardBox-bottompadded selectors before upgrading. (repo)
Q7 — Jellyfin 10.11.8 changelog. Does NOT fix our issues directly. Server 10.11.8 ships only 3 changes: subtitle-language library handling, subtitle saving, and language-filter querying. jellyfin-web 10.11.8: a single PR (#7796) for lazy device-info loading. Released as a regression-revert from 10.11.7 ahead of CVE/GHSA disclosure. None of UICulture persistence, SW poisoning, or fMP4 playback are addressed in .8 itself. However the TMDB-backdrop fix (Q2) lands in the 10.11.0 baseline that .8 inherits. Verdict on .8 specifically: no evidence it helps directly; confirmed the 10.11 line fixes Q2. Upgrade target = 10.11.8 (latest stable: 10.11.0 backdrop fix + .7 security fixes + .8 regression reverts). (10.11.8 server, 10.11.8 web)
Recommended action sequence
Option A — Self-destruct shim (RECOMMENDED, verified working):
# On nullstone, in the arrflix compose dir:
cat > /opt/docker/arrflix/web-overrides/serviceworker.js <<'EOF'
self.addEventListener('install', e => self.skipWaiting());
self.addEventListener('activate', e => {
self.registration.unregister()
.then(() => self.clients.matchAll())
.then(cs => cs.forEach(c => c.navigate(c.url)));
});
EOF
# Add to compose volumes (same pattern as index.html):
# - /opt/docker/arrflix/web-overrides/serviceworker.js:/jellyfin/jellyfin-web/serviceworker.js:ro
docker compose -f /opt/docker/arrflix/compose.yml up -d --force-recreate jellyfin
# Force Traefik to send no-cache on the SW script so browsers refetch immediately:
# middleware: response header Cache-Control: no-cache, no-store, max-age=0 on /web/serviceworker.js
- Side effects: every existing browser session navigates to its current URL once on next page load - looks like a single auto-refresh. No data loss. New visitors get the shim, immediately unregister, never see it again.
- Recovery: revert by removing the bind-mount line + `up -d --force-recreate`. Original SW returns.
- Verify: `curl -skI https://arrflix.s8n.ru/web/serviceworker.js` → 200 + `Cache-Control: no-cache`. Body matches the shim. In an incognito window: open DevTools → Application → Service Workers shows registration then "redundant" within seconds.
Option B — Serve 404 (may work, less reliable):
# Traefik file-provider snippet:
# - /web/serviceworker.js → middleware that returns 404 (errors middleware → static 404 service)
# Or simply: bind-mount an empty file and add a Traefik replacePathRegex to a non-existent path.
- Side effects: Chrome/Firefox unregister on next Update fetch (typically next navigation after >24h, or sooner if user reloads). Slow rollout. Some users may stay stuck for a day.
- Recovery: remove the rule, original SW returns on next image rebuild.
- Verify: `curl -skI https://arrflix.s8n.ru/web/serviceworker.js` → 404. DevTools shows SW going "redundant" after a navigation+reload cycle.
Option C — Do nothing server-side, force user manual:
- User opens DevTools → Application → Service Workers → Unregister, OR `chrome://serviceworker-internals` → Unregister, OR clears site data.
- Side effects: every user must do this individually; non-technical users can't.
- Recovery: trivial, nothing changed.
- Verify: per-user; no server signal.
Decision: Go with Option A. It is the Google-recommended pattern, is the only approach that auto-fixes already-loaded tabs without user action, and is reversible by removing one line from compose.
SW source + image cache
(Agent run 2026-05-09 — verifies the stock SW source live on the running container, and probes server-side image health for a known item. Important: contradicts the working assumption that the SW is intercepting fetches.)
Part 1 — /web/serviceworker.js source + interception map
Both docker exec jellyfin cat /jellyfin/jellyfin-web/serviceworker.js and curl -sk https://arrflix.s8n.ru/web/serviceworker.js return the same file (~1KB single line):
(self.webpackChunk=self.webpackChunk||[]).push([[82798],{16764:function(n,e,t){
t(78557),t(90076),
self.addEventListener("notificationclick", function(n){ /* opens window or calls connectionManager */ }, !1),
self.addEventListener("activate", function(){ return self.clients.claim() })
}}, function(n){ n.O(0,[59928], function(){ return 16764, n(n.s=16764) }), n.O() }]);
Interception map — there is none.
- No `fetch` event listener in this file.
- Only listeners: `notificationclick` and `activate` (calls `clients.claim()`). `t(78557)` and `t(90076)` are webpack require calls for two other modules — those might register fetch handlers, but they are NOT in this bundle (they live in lazy chunks under `/web/*.chunk.js`). The chunk IDs `82798`/`59928` map to the notification module only.
- No CacheStorage usage anywhere in this bundle. No `caches.open`, `caches.match`, `cache.put`. So this SW does NOT cache `/Items/{id}/Images/*`, `/Videos/{id}/*`, `/web/*-json.*.chunk.js`, or `/web/index.html`.
Conclusion: Jellyfin 10.10.3 web's stock SW is push-notification-only. It does not intercept fetches and owns no CacheStorage entries. This confirms agent Q4 finding ("notification-only — no fetch listener, no cache logic") against the running container — not just spec/source, the literal bytes Jellyfin is shipping.
Implication for Section C diagnosis: "SW intercepts the GET to /web/index.html and serves from cache" is false. With no fetch handler the SW cannot intercept. Clear-Site-Data would already reach the network response — the real blocker for stale German chunks is HTTP browser cache (memory + disk), not Service Worker cache.
Replacement plan: The self-unregister shim is still safe and useful as belt-and-braces — installs cleanly, deletes any caches that ever existed, unregisters, force-reloads. Bind-mount path inside container is /jellyfin/jellyfin-web/serviceworker.js. But it is not the missing piece for the German leak. Real fix: existing Cache-Control: no-store + Clear-Site-Data headers on /web/index.html plus a hard reload (Ctrl+Shift+R) or DevTools → Application → Clear storage on user's browser.
Part 2 — Image cache state
/home/docker/jellyfin/config/metadata = 112M (well-populated)
/library/<hh>/<item-id>/poster.jpg present in sampled items
/home/docker/jellyfin/cache = 59M
/images/resized-images/{0..f} = 16 hex subdirs, all populated with .webp tiles
Agent 7's earlier note "only resized-images subdir present" is still true — /cache/images/ contains only resized-images/, no original/ or remote/. That is the expected Jellyfin layout (originals live under /config/metadata/library/, only resizes live under /cache/images/resized-images/). Not a bug.
API probe for item 7aa5add2c2d8575eda5280b9b9072071 (The Mike Nolan Show) via temp token (revoked after), all four image types via https://arrflix.s8n.ru:
| Endpoint | Status | Content-Type | Notes |
|---|---|---|---|
| /Items/{id}/Images/Backdrop | 200 | image/jpeg | served, age: 5400 (90min upstream cache) |
| /Items/{id}/Images/Primary | 200 | image/jpeg | served |
| /Items/{id}/Images/Logo | 200 | image/png | served |
| /Items/{id}/Images/Thumb | 200 | image/jpeg | served |
Verdict: Server-side images are healthy. Backdrop + Primary + Logo + Thumb all 200 with valid content-types for a real item the user is browsing. The "all backdrops black" symptom (Section D) is NOT a server-side image problem and NOT a SW-cache problem. Likely culprits remaining:
- (a) a CSS rule in the deployed `index.html` overrides / theme overrides hiding `.itemBackdrop` or setting `opacity: 0`;
- (b) browser HTTP cache holding stale 404s from earlier broken state — same Ctrl+Shift+R fix as Part 1;
- (c) a custom-css.user.css backdrop `opacity: 0` / `display: none` rule.
Recommend: in user's browser open one show page, DevTools → Network → filter Img → look for /Items/{id}/Images/Backdrop request. If 200 served but invisible → CSS theme leak. If never requested → SPA template not fetching it (theme-side bug).
Backdrop diagnosis
Investigation 2026-05-09. User reported: detail-page backdrops are pure black on prod (arrflix.s8n.ru). Posters render fine. Used a temp ApiKey row (Name='arrflix-backdrop-diag-2026-05-09', deleted after diag) on the live jellyfin container.
Layer A (server) — RULED OUT.
- Item `7aa5add2c2d8575eda5280b9b9072071` (The Dark Knight) JSON returns `BackdropImageTags: ['76cac7069dc988f7cd54e99b481db3fc']`. Tag exists. HEAD `https://arrflix.s8n.ru/Items/.../Images/Backdrop` → `HTTP/2 200`, `content-type: image/jpeg`, `content-length: 560210`, `last-modified: 2026-05-08 22:11:50`.
- Same call against `dev.arrflix.s8n.ru` → also 200 + image/jpeg. Both prod and dev serve backdrop bytes correctly.
Layer C (browser cache / SW) — RULED OUT.
- The stock SW (Section "SW source + image cache" above) does not intercept `/Items/*/Images/*`. The backdrop URL also returns fresh on direct curl (no SW in path).
Layer B (CSS) — CONFIRMED. The CustomCss BLACK-PASS block hides the image layer.
The Jellyfin DOM has two distinct elements (verified by reading main.jellyfin.bundle.js + main.jellyfin.1ed46a7a22b550acaef3.css inside the running container):
- `.backdropContainer` — stock CSS: `position:fixed; bottom:0; left:0; right:0; top:0; z-index:-1`. Holds a child `<div class="backdropImage">` whose `style.backgroundImage = "url(/Items/.../Backdrop)"` is injected by JS (`r.style.backgroundImage="url('".concat(e,"')")` in the bundle). This is the IMAGE LAYER.
- `.backgroundContainer` (note: *background*, not *backdrop*) — separate `position:fixed` overlay; gets the `withBackdrop` class toggled by JS. This is the OVERLAY LAYER. Stock CSS sets `body { background-color: transparent !important; }` precisely so the body never occludes the `z-index:-1` backdrop.
Bug 1 — !important blacks override stock body transparency. CustomCss BLACK-PASS 2026-05-08 block (lines ~110-202 of branding.xml CustomCss) sets background-color: #000000 !important on html, body, #reactRoot, .skinBody, .preload, .mainAnimatedPages, .pageContainer, .libraryPage, .itemDetailPage, .padded-bottom-page, .layout-desktop, .layout-mobile, .layout-tv etc. Since .backdropContainer is at z-index:-1, ANY ancestor with an opaque background paints on top of it, hiding the backdrop image entirely.
Bug 2 — The transparent-scope rule at lines 102-107 is incomplete. It scopes to body.itemDetailPage, body.itemDetailPage #reactRoot, body.itemDetailPage .mainAnimatedPages, body.itemDetailPage .skinBody, but does NOT include .layout-desktop / .itemDetailPage itself / .layout-tv / .pageContainer / .padded-bottom-page — so those wrappers remain #000 on detail pages and continue to occlude the z-index:-1 layer.
Bug 3 (cosmetic — not the cause of black) — line 89-101 sets background-image: linear-gradient(...) on .layout-desktop .backgroundContainer.withBackdrop. That's the OVERLAY layer, fine on its own. But because the actual backdrop image is hidden by Bug 1, the gradient now composites against pure black instead of the backdrop, so the user sees only the gradient (which fades from black to transparent) over a black backdrop = solid black with at most a faint gradient edge.
Cross-check: dev (dev.arrflix.s8n.ru) does NOT mount the BLACK-PASS CustomCss block (Section B above confirms dev branding.xml is 4.5KB smaller and uses remote jsDelivr Cineplex without local overrides). Opening dev should show backdrops normally; if it does, that's a clean A/B confirmation that prod's CustomCss is the regression.
Fix recipe (smallest reversible change).
In /home/docker/jellyfin/config/config/branding.xml <CustomCss> block, extend the body.itemDetailPage transparent-scope rule (currently lines 102-107) to also cancel the black backgrounds on every wrapper that the BLACK-PASS block paints:
/* Replace existing block at lines 102-107 with: */
body.itemDetailPage,
body.itemDetailPage #reactRoot,
body.itemDetailPage .mainAnimatedPages,
body.itemDetailPage .skinBody,
body.itemDetailPage .layout-desktop,
body.itemDetailPage .layout-mobile,
body.itemDetailPage .layout-tv,
body.itemDetailPage .pageContainer,
body.itemDetailPage .padded-bottom-page,
body.itemDetailPage .itemDetailPage,
body.itemDetailPage #mainPanel,
body.itemDetailPage #mainDrawerPanel {
background-color: transparent !important;
background: transparent !important;
}
This keeps #000 everywhere else (library, search, dashboard) but reveals the .backdropContainer > .backdropImage layer on detail pages — which is what the gradient overlay (Bug 3) was originally designed to compose against.
Apply via Dashboard → Branding → Custom CSS (no container restart needed; CSS reloads on next page render). Editing branding.xml directly works too but Jellyfin re-serializes on save, so use the Dashboard.
Verify after edit: open a movie detail page in an incognito window (bypasses SW). Expected: full-bleed backdrop visible at right ~70% of viewport, gradient fade from black on the left. If still black: hard-refresh + DevTools → Elements → search .backdropImage and confirm its parent chain has no background-color other than transparent.
Recovery: revert to the original 6-selector block.
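The verify step (walk the parent chain for the first opaque background) can be scripted into the headless harness. A minimal sketch of the chain-walk logic — the probe interface and example chain are illustrative, not existing code in bin/headless-test.py:

```python
# Sketch: find the first ancestor whose computed background would occlude
# a z-index:-1 backdrop. Input: parent chain as (selector, computed
# background-color) pairs, innermost first -- e.g. dumped from DevTools
# or a playwright getComputedStyle() sweep.
def first_opaque_ancestor(chain):
    transparent = {"transparent", "rgba(0, 0, 0, 0)"}
    for selector, bg in chain:
        if bg.strip().lower() not in transparent:
            return selector  # this element paints over the backdrop
    return None  # whole chain transparent -> backdrop should be visible

# Example chain mirroring the Bug 1 diagnosis: BLACK-PASS painted a
# wrapper #000 even though body itself was scoped transparent.
chain = [
    (".mainAnimatedPages", "rgba(0, 0, 0, 0)"),
    (".pageContainer", "rgb(0, 0, 0)"),      # unscoped BLACK-PASS rule
    ("body.itemDetailPage", "transparent"),
]
print(first_opaque_ancestor(chain))  # -> .pageContainer
```

Any non-None result is a selector to add to the transparent-scope block.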
Playback diagnosis
Investigation date 2026-05-09 ~00:30–00:45 UTC. Live transcode test against prod jellyfin via temp ApiKey arrflix-playback-diag-2026-05-09 (deleted at end of session, verified empty SELECT after DELETE).
A) Source codec verdict — the ItemId is mis-attributed in this incident report. ItemId 7aa5add2c2d8575eda5280b9b9072071 is The Dark Knight (2008), NOT "The Mike Nolan Show". Confirmed via /Users/{u}/Items?searchTerm=...:
- `7aa5add2...` → Movie, `/media/movies/The Dark Knight (2008)/The Dark Knight (2008).mkv` — HEVC Main 10 / yuv420p10le, 1918x800, TrueHD 24-bit + AC3 + 2× PGS.
- The Mike Nolan Show series Id is `37cb910f507c4d1f9e365ef1954f99c2`. Episodes (e.g. S01E04 "Ding Dong Delli") are AV1 Main / yuv420p / Opus, ~412 kbps total.
(So the prior Section D backdrop-probe line that labelled 7aa5add2... as MNS is also wrong — those Backdrop/Primary/Logo/Thumb 200s were TDK images. Doesn't change Section D's conclusion that backdrops serve fine.)
Chrome advertises av1,h264,vp9 (NOT hevc, NOT vp8). So:
- TDK (HEVC 10-bit): must transcode → server picks libx264 High@4.0 yuv420p (8-bit) AAC LC stereo. Fully Chrome-decodable.
- MNS episodes (AV1+Opus): should DirectPlay/DirectStream — Chrome supports both natively.
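The codec decision above can be sketched as a toy predicate. Simplified on purpose: the real PlaybackInfo negotiation also weighs container, profile/level, and bitrate; names here are illustrative:

```python
# Toy direct-play check: both streams must be browser-decodable,
# otherwise the server falls back to a libx264 + AAC transcode.
CHROME_VIDEO = frozenset({"av1", "h264", "vp9"})  # this Chrome: no hevc, no vp8

def playback_path(video_codec, audio_codec,
                  browser_video=CHROME_VIDEO,
                  browser_audio=frozenset({"aac", "opus", "mp3", "flac", "vorbis"})):
    if video_codec in browser_video and audio_codec in browser_audio:
        return "directplay"
    return "transcode"

print(playback_path("hevc", "truehd"))  # TDK -> transcode
print(playback_path("av1", "opus"))     # MNS episode -> directplay
```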
B) HLS pipeline verdict — server-side fully working. PlaybackInfo POST returned TranscodingUrl=/videos/.../master.m3u8?VideoCodec=h264&..., SupportsTranscoding=True, TranscodingSubProtocol=hls. Manual fetches on TDK:
- master.m3u8 → HTTP 200, valid `#EXTM3U`, single variant `BANDWIDTH=13407532, RESOLUTION=1918x800, CODECS="avc1.424029,mp4a.40.2"` (the `424029` decodes to "Baseline 4.1" but the actual stream below is High — known cosmetic Jellyfin mislabel, not a Chrome blocker).
- main.m3u8 sub-playlist → HTTP 200, segments `hls1/main/0.ts…9.ts`, 3-second EXTINF.
- segment 0.ts → HTTP 200, 269 KB. ffprobe verdict: h264 High / yuv420p / level 4.0, 1918x800 + aac LC. Valid 8-bit H.264. Cache dir during playback contains 40+ valid `.ts` segments. No fmp4 init filename collision (mpegts segments in the current run; the earlier fmp4 path's `-1.mp4` init pattern with `start_number=0` is also fine — the init file literally has the `-1` infix in its filename while data segments are `0.mp4, 1.mp4…`; no actual name collision).
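The master.m3u8 sanity check can be automated. A minimal parser sketch for the variant line — deliberately simpler than the full RFC 8216 attribute-list grammar, but it does handle the quoted comma inside CODECS:

```python
import re

def parse_stream_inf(line):
    """Pull attributes out of an #EXT-X-STREAM-INF line.
    Quoted values (e.g. CODECS="a,b") are matched whole, then unquoted."""
    attr_text = line.split(":", 1)[1]
    attrs = dict(re.findall(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)', attr_text))
    return {k: v.strip('"') for k, v in attrs.items()}

# The variant line observed on TDK's master.m3u8:
variant = ('#EXT-X-STREAM-INF:BANDWIDTH=13407532,RESOLUTION=1918x800,'
           'CODECS="avc1.424029,mp4a.40.2"')
info = parse_stream_inf(variant)
print(info["BANDWIDTH"], info["RESOLUTION"], info["CODECS"])
```

Useful for asserting in a smoke-test that the variant advertises a Chrome-decodable codec pair (`avc1.*` + `mp4a.40.2`) rather than a raw hevc/av1 tag.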
C) CSS verdict — video element NOT hidden. Read branding.xml CustomCss + cineplex.css (full). All display:none / visibility:hidden / opacity:0 / transform:scale(0) matches are on UI chrome (#castCollapsible, #guestCastCollapsible, .btnQuick, .headerSyncButton, .headerCastButton, .headerUserButton, MUI drawer items, .countIndicator, #loginPage h1, etc.). The only video::* / :cue rules touch subtitle font only. No hide/scale rule hits .htmlvideoplayer, .videoPlayerContainer, or the <video> element itself. CustomCss is not the cause of the black screen.
D) Service Worker verdict — no fetch interception. /web/serviceworker.js is the stock Jellyfin notification-only handler (notificationclick + activate→clients.claim). No install cache, no fetch listener. Cannot intercept HLS or video URLs. Already characterised in the prior "SW kill recipe" section — stock SW is harmless for media playback.
E) Web research findings. No 10.10.3-specific Chrome black-screen bug surfaced for the HLS path. Closest historical pattern: hls.js + AV1+Opus DirectStream where Jellyfin 10.10 mis-builds the codec attribute on the playlist for AV1, causing hls.js to abort. Common workaround: force transcode via DeviceProfile or restrict AV1 in user policy. No citation strong enough to assert as root cause from outside the live browser.
F) The actual story — and the fix recipe.
Timeline reconstruction from server logs for the user's session (192.168.0.10):
- `00:28:46` — PlaybackInfo for `7aa5add2...` (TDK).
- `00:28:47` — ffmpeg launches on `/media/movies/The Dark Knight (2008)/...mkv` (libx264 High@5.1, fmp4).
- `00:28:53`, `00:29:01` — ffmpeg restarts at `-ss 00:04:18` and `00:09:06` (= user seeking forward during TDK playback).
- `00:29:07` — "Playback stopped … playing The Dark Knight. Stopped at 549885 ms" (= 9:09).
- `00:29:28` — "Playback stopped … playing F.T.C. Stopped at 39053 ms" (MNS S01E02).
- `00:42:42` — "Playback stopped … playing Ding Dong Delli. Stopped at 20905 ms" (MNS S01E04).
What this means: TDK transcoded and played fine for 9 minutes with seeks — TDK is not black-screening. The MNS episodes (AV1+Opus, 20-39 s before stop) match the user-perceived "black screen, give up" pattern. The incident report conflated these — user said "Mike Nolan Show + ItemId 7aa5add2" but the ItemId is TDK and the actual symptom is on the AV1 MNS episodes.
The 00:42:49 ffmpeg launch on TDK that appears AFTER MNS stop is my own diagnostic curl — its PlaySessionId 14f52f35eee04cec8146379c0dc6c960 matches the one I generated. Disregard as evidence of user behaviour.
Recommended fix sequence (ordered by likelihood):
- Re-run with the right item. Ask the user to repro on MNS S01E04 ("Ding Dong Delli"), capture the browser DevTools Network panel: was `/Videos/.../master.m3u8` issued (transcode path) or only `/Videos/.../stream.webm` (DirectStream)? What does `/Items/.../PlaybackInfo` return for `SupportsDirectStream` on the AV1 source? Capture the JS console for hls.js / shaka / MediaSource errors.
- If DirectStream is on for AV1 → force transcode by adding a `CodecProfile` in the user's DeviceProfile that bans AV1 DirectStream (Type=Video, Codec=av1, Container=mkv,webm → forced conditional Direct=false). The server then falls back to a libx264 transcode (CPU-only on nullstone, slow but reliable).
- Cross-browser test — try Firefox. Different hls.js behaviour for AV1. If Firefox plays MNS but Chrome doesn't, that confirms a client-side AV1 DirectStream bug, not a server one.
- TDK is fine — leave alone, unrelated to this incident.
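A sketch of the AV1-ban profile via the simpler route (drop AV1 from DirectPlayProfiles, per the Q3 next-step). Field names follow the Jellyfin DeviceProfile JSON shape as I understand it — verify against the running 10.10.3 server before posting it with a PlaybackInfo body; the profile name is hypothetical:

```python
import json

# Assumption-heavy sketch: a browser DeviceProfile whose DirectPlayProfiles
# omit av1, so PlaybackInfo must answer SupportsDirectPlay=False for AV1
# sources and fall back to the h264 TranscodingProfile.
def av1_banning_profile():
    return {
        "Name": "chrome-no-av1-directplay",  # hypothetical name
        "DirectPlayProfiles": [
            # av1 deliberately absent -> server cannot choose DirectPlay
            {"Type": "Video", "Container": "mp4,webm,mkv",
             "VideoCodec": "h264,vp9", "AudioCodec": "aac,opus"},
        ],
        "TranscodingProfiles": [
            # ts container also sidesteps the fmp4 init-segment path (Q2)
            {"Type": "Video", "Container": "ts", "Protocol": "hls",
             "VideoCodec": "h264", "AudioCodec": "aac"},
        ],
    }

profile = av1_banning_profile()
print(json.dumps(profile, indent=2))
```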
Out-of-scope here: dev.arrflix.s8n.ru /Sessions returned 401 with the api_key (Sessions needs a user-token, not just admin api_key). Recommend redoing the dev comparison through the user's browser cookie session.
API key cleanup verified: SELECT Name FROM ApiKeys returned empty after DELETE.
Final fix applied (verified via playwright headless)
Status: CLOSED for symptoms 1-4. Symptom 5 (video black-screen on AV1+Opus items) is a separate codec issue tracked for the 10.11.8 migration.
Three patches landed
- `branding.xml` CustomCss: append a `content: "Play"` override on `.mainDetailButtons .material-icons.play_arrow::after`. The Cineplex theme hardcoded German "Abspielen" via a CSS `content:` rule — NOT a Jellyfin locale issue. Hours of Traefik `Accept-Language` rewrites and `force-english-all-users.sh` runs were spent chasing the wrong layer entirely.
- `branding.xml` CustomCss: backdrop transparent-scope using `:has()`. The `body.itemDetailPage` selector (from prior docs) does NOT match in 10.10.3 — the body class is `libraryDocument`. The new rule scopes by `.layout-desktop:has(.itemDetailPage)` etc. so the backdrop layer (`z-index:-1`) renders behind detail pages without breaking other surfaces.
- `encoding.xml`: `EnableThrottling=false` + `EnableSegmentDeletion=false`. Kills the HLS 499s (segments reaped before the browser re-requests them).
Headless verification
bin/headless-test.py (new) logs in via Jellyfin SPA login form using
playwright Chromium, navigates to detail page, screenshots, and probes
computed styles. Used to bisect:
- baseline screenshot (broken)
- `:has()` selector verified: backdrop renders
- "Play" verified to replace "Abspielen"
Re-apply
bin/apply-26-incident-fixes.sh (new, idempotent) re-applies all three
patches if branding.xml / encoding.xml drift back. Run via:
ssh user@nullstone "$(cat bin/apply-26-incident-fixes.sh)"
What was rolled back
- The `clear-site-data@file` Traefik middleware I added during this session was making prod worse: it wiped cookies+storage on every visit, breaking auth+playback session continuity. Reverted by restoring the Traefik dynamic.yml backup taken right before the edit.
Do-NOT-repeat checklist (post-mortem)
These are the dead-ends. Future operators (and future me) should skip:
1. Don't add `Clear-Site-Data` to a Jellyfin route to "force the SW out". The stock Jellyfin SW is notification-only (no fetch handler) — there is no SW poisoning to begin with. The middleware just wipes cookies on every visit, breaking auth races.
2. Don't run `bin/force-english-all-users.sh` to fix "Abspielen". Doc 25 already established that per-user `Configuration.UICulture` is theatre and the SPA never reads it. The German text was in Cineplex CSS via `content: "Abspielen"`. Patch the CSS, not the user config.
3. Don't trust HTTP 204 from POST `/Users/{id}/Configuration` as success. Always GET it back and verify. (And see #2 — even if you CAN persist UICulture, it doesn't drive UI strings in 10.10.x.)
4. Don't use `body.itemDetailPage` as a CSS selector in 10.10.3. The body class on detail pages is `libraryDocument`, not `itemDetailPage`. Use `.itemDetailPage` directly or `:has(.itemDetailPage)` on ancestors.
5. Don't paint `#000 !important` on `.layout-desktop` / `.pageContainer` without scoping. They wrap the backdrop layer; an unscoped black override occludes the entire backdrop. Always scope with `:has()` or by a page-specific class.
6. Don't hot-patch `web-overrides/index.html` on the server without committing back to the repo in the same step. Drift from the repo is invisible until it breaks. Bug A (the DOM-walker MutationObserver freezing the browser) came from this exact pattern — see `~/.claude/projects/.../memory/feedback_always_commit_to_my_git.md`.
7. Don't write CSS Mutation/text-walker observers without debounce + scope. Walking every text node on every DOM mutation freezes the main thread on poster grids. If you need DOM rewriting, use targeted selectors + debounce.
8. Don't sed-via-python-regex YAML files without strict anchors. I damaged `dynamic.yml` with a too-greedy DOTALL match earlier in this session (it deleted unrelated routers). Restore-from-backup saved it. Always diff before reload.
9. Don't believe a single-itemId test proves "playback works". Item `7aa5add2c2d8575eda5280b9b9072071` is The Dark Knight (HEVC, transcodes fine to H.264). The Mike Nolan Show episodes are AV1+Opus and break in Chrome. Always test the actual item the user reported.
10. Don't skip the headless smoke-test. Visual confirmation in playwright Chromium catches CSS regressions instantly without waiting for the user to clear browser cache. `bin/headless-test.py` is a 30s round-trip.
Iteration 2 — backdrop visible only on top viewport (2026-05-09 follow-up)
INC4 online research
Web sweep 2026-05-09 against jellyfin/jellyfin + jellyfin/jellyfin-web issues filed since 2025-01. All URLs cited inline. "Verdict" = how strong the link to our two open symptoms (black-screen video, opaque "More from Season" band) is.
Q1 — Web 10.10.3 video black-screen on play (server transcoding HLS, browser shows nothing):
- jellyfin-webos #126 "Black screen by enable Prefer FMP4-HLS as media container" — HEVC Main10 HDR10 10-bit direct-stream goes black, audio fine. Workaround: disable Prefer fMP4-HLS. https://github.com/jellyfin/jellyfin-webos/issues/126
- jellyfin-web #7405 "HLS Media Errors only in Webbrowsers." https://github.com/jellyfin/jellyfin-web/issues/7405
- jellyfin #16612 "Playback errors due to fMP4-HLS" (10.11.8, but root cause is fMP4 container; same workaround). https://github.com/jellyfin/jellyfin/issues/16612
- forum t-solved-black-screen … web UI 10.0.3: a theme `.preload { #000 !important }` rule covered the player. Direct precedent for our symptom. https://forum.jellyfin.org/t-solved-black-screen-w-audio-when-playing-video-web-ui-10-0-3
- Verdict: probable. Two independent vectors: (1) the fMP4-HLS container produces an init segment hls.js stalls on for certain codec profiles; (2) a custom-CSS overlay covering the player. Both consistent with our black-screen-but-server-transcoding behaviour.
- Next step: in DevTools, confirm whether `<video>` has frames (network MSE buffer) or is occluded. If the SourceBuffer never `appendBuffer`-s, it's #126/#16612 → toggle off "Prefer fMP4-HLS Media Container" in playback settings (or strip it from a custom DeviceProfile). If frames are buffered but invisible, search for an opaque ancestor (`.preload`, a BLACK-PASS rule covering `.videoPlayerContainer`).
Q2 — Chrome 148 + -hls_fmp4_init_filename "X-1.mp4" MSE compatibility:
- jellyfin-web #7546 "[Regression] Web browser HLS playback times out when audio transcoding required - worked in 10.10.7, broken in 10.11.6" — hls.js times out waiting for the first segment while ffmpeg probes large files. https://github.com/jellyfin/jellyfin-web/issues/7546
- jellyfin #14487 "Audio delay don't work with fMP4-HLS." https://github.com/jellyfin/jellyfin/issues/14487
- jellyfin #16647 "HLS subtitle X-TIMESTAMP-MAP is misaligned when using fMP4 segments." https://github.com/jellyfin/jellyfin/issues/16647
- Verdict: confirmed broken across 10.10.7 → 10.11.x for some codec/container combos. Not Chrome-148-specific; the init-filename pattern itself isn't the bug — the timing between ffmpeg probing and hls.js segment-load timeout is.
- Next step: disable Prefer fMP4-HLS first (single-toggle fix). If still broken, drop probesize + analyzeduration on the encoder side, or force ts segments via DeviceProfile TranscodingProfile container=ts.
Q3 — AV1 DirectStream codec-tag mislabel:
- jellyfin #15646 "AV1 Video Stream in Wrong Container" — av1 muxed into mpegts as private-data stream, ffmpeg warning "may not be recognized upon reading". Workaround: switch hls_segment_type from mpegts to fmp4 with .m4s extension. Marked closed in UI but in Team Review (no PR linked, no version-tag yet). https://github.com/jellyfin/jellyfin/issues/15646
- Codec Support docs reaffirm AV1 web playback is gated on browser support + correct container. https://jellyfin.org/docs/general/clients/codec-support/
- Verdict: confirmed open. Affects 10.11.3 and back; no PR landed in 10.10.x line. Mike Nolan Show AV1+Opus matches the failure pattern.
- Next step: ban AV1 DirectStream via custom DeviceProfile (drop AV1 from DirectPlayProfiles → forces server-side libx264 transcode).
Q4 — "More from Season" CSS class names:
- jellyfin-web source uses the `verticalSection` + `detailVerticalSection` pair, with `data-type="MusicAlbum|Episode|..."`. https://github.com/tedhinklater/JellyfinThemeGuide
- Layouts reference `.scrollSlider`, `.itemsContainer`, `.padded-left`, `.sectionTitleContainer` (already in our Iteration 2 fix list).
INC4 video playback diagnosis (full e2e)
End-to-end test 2026-05-09 ~01:35 UTC. Temp ApiKey
arrflix-playback-e2e-2026-05-09 (token rotated, deleted at end, verified
SELECT empty). Headless Chromium via playwright drove the SPA login as
guest:123 and clicked .btnPlay on Rick & Morty S1E1 Pilot
(324f75b84f394a5d9b0749c0679f23b9). Logs in /tmp/arrflix-playback-e2e/.
Source codec verdict — Rick & Morty Pilot is NOT H.264. ffprobe inside
container reports the file is HEVC Main 10 / yuv420p10le / 3840x2160 /
TrueHD 5.1 24-bit + AC3 5.1 + AC3 2.0 + PGS subs (4K HDR). Same codec class
as TDK. The task brief assumption ("Rick & Morty likely H.264") is wrong —
this library is 4K HDR remux. Path:
/media/tv/Rick and Morty (2013)/Season 01/Rick and Morty (2013) - S01E01 - Pilot.mkv.
Failure mode at click — playback DOES work, but takes 12-18s to first
frame. All segments + manifest 200 OK, no console errors, no video.error,
no MediaSource exception, no CSS occlusion (.htmlvideoplayer / <video>
display:block opacity:1 visibility:visible z-index:auto, getBoundingClientRect
== full viewport). State timeline (clean run, position reset to 0):
| t (s) | readyState | networkState | currentTime | buffered |
|---|---|---|---|---|
| 2-10 | 0 (HAVE_NOTHING) | 2 (LOADING) | 0 | [] |
| 12 | 3 (HAVE_FUTURE_DATA) | 2 | 0 | 0, 2.97 |
| 16 | 3 | 2 | 0.72 | 0, 5.97 |
| 22 | 3 | 2 | 6.74 | 0, 11.99 |
| 30 | 3 | 2 | 14.75 | 0, 14.97 |
With user's actual stored resume position (243.018 s from prior session),
adds a kill+restart cycle: SPA fetches segment 0, sees currentTime=243,
seeks → server kills 1st ffmpeg, launches 2nd with -ss 00:04:03 -noaccurate_seek -start_number 81. Browser stays at readyState=1 from
~t=8s to ~t=16s while 2nd ffmpeg produces segment 81. Total wait ≈ 18s
to first painted frame. From the user's seat that looks identical to a
broken player.
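The wait loop behind that timeline (and behind the headless probe's readyState checks) can be sketched generically — `probe()` stands in for a playwright `page.evaluate` against the `<video>` element; all names are illustrative:

```python
import time

def wait_for_playable(probe, timeout_s=30.0, interval_s=2.0, min_ready=3):
    """Poll probe() -> video.readyState until HAVE_FUTURE_DATA (3) or
    timeout. Returns (ok, waited_seconds). In the real harness probe()
    would wrap page.evaluate("document.querySelector('video').readyState")."""
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if probe() >= min_ready:
            return True, time.monotonic() - start
        time.sleep(interval_s)
    return False, timeout_s

# Simulated run mirroring the clean-run table: readyState stays 0 for the
# first several polls, then reaches 3. (interval shrunk so the demo is fast.)
states = iter([0, 0, 0, 0, 0, 3, 3])
ok, waited = wait_for_playable(lambda: next(states), timeout_s=30, interval_s=0.01)
print(ok)  # -> True
```

The resume-seek case is exactly why the timeout must be generous: a kill+restart cycle resets the clock to segment 81's encode time, not segment 0's.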
Server-side ffmpeg command (verified live in jellyfin logs):
/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -probesize 1G \
-i "/media/tv/Rick and Morty (2013)/Season 01/...Pilot.mkv" \
-map 0:0 -map 0:1 -codec:v:0 libx264 -preset veryfast -crf 23 \
-maxrate 13546858 -profile:v:0 high -level 51 \
-vf "setparams=color_primaries=bt2020:color_trc=smpte2084:colorspace=bt2020nc,\
scale=trunc(min(max(iw,ih*a),min(3840,2160*a))/2)*2:trunc(min(max(iw/a,ih),min(3840/a,2160))/2)*2,\
tonemapx=tonemap=bt2390:desat=0:peak=100:t=bt709:m=bt709:p=bt709:format=yuv420p" \
-codec:a:0 libfdk_aac -ac 2 -ab 256000 \
-hls_segment_type fmp4 -hls_fmp4_init_filename "...-1.mp4" \
-start_number 0 -hls_segment_filename "/cache/transcodes/...%d.mp4" \
-f hls -hls_time 3 ...
HardwareAccelerationType=none + 4K + tonemapx + libx264 veryfast +
software stereo downmix. Per-segment encode wallclock observed: seg0
~6 s, seg1 ~2.05 s. At nullstone Ryzen 5 5600G CPU-only, that's ~50% of
real-time on a sustained run. Browser stalls because new segments arrive
slower than they're consumed.
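The stall arithmetic: at ~0.5x realtime the buffer drains faster than it fills, so playback can never get ahead. A back-of-envelope sketch using the numbers from this run:

```python
# Can the encoder keep a 3-second HLS buffer fed?
seg_duration = 3.0        # -hls_time 3
encode_wallclock = 6.0    # observed seg0 encode time at 4K HDR tonemap

speed = seg_duration / encode_wallclock  # encode speed vs realtime

# Each played second of video costs 1/speed wallclock seconds to produce,
# so the player falls behind by (1/speed - 1) seconds per played second.
deficit_per_second = 1.0 / speed - 1.0

print(speed, deficit_per_second)  # 0.5 1.0 -> falls 1s further behind per played second

# Post-fix estimate: ~1.5s wallclock per 1080p segment -> 2x realtime,
# i.e. the buffer grows instead of starving.
print(seg_duration / 1.5)  # -> 2.0
```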
PlaybackInfo verdict (browser-emulating DeviceProfile, av1+h264+vp9 both
allowed): SupportsDirectPlay=False, SupportsDirectStream=False,
SupportsTranscoding=True,
TranscodeReasons=ContainerNotSupported,VideoCodecNotSupported,AudioCodecNotSupported,
TranscodingSubProtocol=hls, TranscodingContainer=ts (when client asks
ts) — but in the headless run the SPA's stock DeviceProfile asks
SegmentContainer=mp4 (fmp4 path) and the server picked libx264 H.264
high@5.1 8-bit, NOT av1. The VideoCodec=av1,h264,vp9 in the URL is the
priority list; server reads it and selects the first the source can map
to without HW — that's libx264 here, confirmed by -codec:v:0 libx264 in
ffmpeg cmdline. AV1 is never used as a transcode target on prod.
Web research corroboration:
- jellyfin#13324 "Transcoded playback of 4K HDR content fails": "no modern consumer CPU can transcode 4K HDR to SDR in real time" — software tonemapping is the bottleneck. https://github.com/jellyfin/jellyfin/issues/13324
- jellyfin#5067 "HDR Tone Mapping is very slow in Jellyfin (19fps, 70% cpu)": ~20 fps cap on tonemapx. https://github.com/jellyfin/jellyfin/issues/5067
- jellyfin docs Hardware Acceleration: software CPU decode + tonemap + encode at 4K HDR is officially "not supported for sustained real-time". https://jellyfin.org/docs/general/post-install/transcoding/hardware-acceleration/
Recommended fix (ordered by reversibility + UX impact):
1. Cap user MaxStreamingBitrate to 20 Mbps in jellyfin-web settings. Each user → Profile → Playback → Quality → 20 Mbps (or "Auto" with a default cap). Server-side ffmpeg still runs, but with `-maxrate` matched to 20 Mbps the output bitrate is reasonable and the scale filter clamps to 1080p (1920x800 for this source's aspect), eliminating the 4K scale pass. Reduces per-segment encode wallclock from ~6s → ~1.5s. Single toggle, per-user, no server restart, fully reversible. This is the right move first.
2. Force libx264 + transcoding container=ts via DeviceProfile (or disable "Prefer fMP4-HLS" in jellyfin-web settings). Skips the fmp4 init-segment path implicated in jellyfin#16612 / webos#126 for HEVC Main10 sources. `ts` segments self-contain init data — simpler timing.
3. Disable software tonemapping for libraries with fake-HDR sources. Doc 21 already established that R&M's `MasteringDisplay`/`MaxCLL` are absent (fake AI-upscale HDR). Server-side toggle:
   ssh user@192.168.0.100 'docker exec jellyfin sh -c \
     "sed -i \"s|<EnableTonemapping>true|<EnableTonemapping>false|\" \
     /config/config/encoding.xml" && docker restart jellyfin'
   Removes the tonemapx step from the filtergraph. Output will be SDR derived directly from HDR pixels (washed out per doc 21 — already accepted as the lesser evil for R&M). Saves ~30% encode CPU at 4K.
4. (Last resort, deferred to the 10.11.8 migration) Add a CCR-style "transcode pre-warm" hook: when the SPA opens a detail page, pre-issue `/Items/{id}/PlaybackInfo` + a no-op range request on segment 0 to start ffmpeg before the user clicks Play. Reduces perceived TTFP.
Recommended immediate action: option 1 + option 3. No code change needed — both are settings flips. After flipping, repro: open Pilot in Chrome, click Play, time-to-first-frame should be <5s.
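For applying the cap server-side to every user instead of via per-browser settings, the relevant policy field is `RemoteClientBitrateLimit` (bits per second). A hedged sketch of the payload construction only — the `POST /Users/{id}/Policy` endpoint shape should be re-verified against the running 10.10.3 API before use, and per lesson #3 above, GET the policy back afterwards instead of trusting the 204:

```python
# Sketch: build the policy update that caps streaming at 20 Mbps.
# Field name assumed from Jellyfin's UserPolicy schema -- verify against
# the live server before POSTing.
CAP_BPS = 20_000_000  # 20 Mbps -> encoder clamps to ~1080p for this source

def capped_policy(existing_policy):
    """Return a copy of the user's policy dict with the bitrate cap set.
    Copying matters: Jellyfin replaces the whole policy object on POST,
    so every existing field must be carried over."""
    policy = dict(existing_policy)
    policy["RemoteClientBitrateLimit"] = CAP_BPS
    return policy

before = {"IsAdministrator": False, "RemoteClientBitrateLimit": 0}
after = capped_policy(before)
print(after["RemoteClientBitrateLimit"])  # -> 20000000
```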
Headless artefact warning: the v2-02-after-30s.png screenshot is
pure black despite readyState=3 + currentTime advancing + buffered=[0,
14.97]. That is because Chromium without GPU does not paint decoded H.264
frames (no compositor target). Real Chrome on real GPU paints. So a
black screenshot from bin/headless-test.py after Play is NOT a CSS bug
— it's a headless rendering artefact. Verify CSS occlusion via
getComputedStyle + getBoundingClientRect instead, both already clean
in this run.
Open follow-ups left: AV1+Opus episodes (Mike Nolan Show) still untested in this iteration — different failure mode (DirectStream codec-tag mislabel per Q3 above), separate fix path.
Q4 — "More from Season" CSS class names (sources, continued):
- https://deepwiki.com/jellyfin/jellyfin-web/3.5-home-sections-and-library-navigation
- BobHasNoSoul/jellyfin-mods uses `#itemDetailPage` parent + nth-of-type for section targeting. https://github.com/BobHasNoSoul/jellyfin-mods
- Verdict: confirmed. The wrapper is `.verticalSection.detailVerticalSection` (no `moreFromSeasonSection` class — Jellyfin distinguishes sections by the `data-type` attr, not class). Our INC3 selector list already covers `.detailVerticalSection*`, so the opaque band comes from a DESCENDANT, not the wrapper itself. Likely candidates: a `.cardScalable`, `.cardBox`, or `.cardImageContainer` with an explicit `background: #000` from BLACK-PASS.
- Next step: in DevTools, inspect the opaque band, walk the parent chain, find the first ancestor with a non-transparent computed bg. Either add it to the transparent-scope or wrap the selector in `:not(.cardImageContainer)`.
Q5 — Themes implementing full-page persistent backdrop:
- meow.garden "Dynamic backdrops for Jellyfin" — uses .detailPagePrimaryContainer .detailImageContainer .blurhash-canvas { position: fixed !important; opacity: .5; } to repurpose the blurhash placeholder as a fullscreen backdrop. https://meow.garden/jellyfin-dynamic-backdrops/
- Cineplex theme custom.css: targets .backgroundContainer, .backgroundContainer.withBackdrop, .backdropImage, .blurhash-canvas (commented out). Mobile-only .itemBackdrop mask gradient. https://github.com/MRunkehl/cineplex
- Finity theme: minimal docs, refers to "gradient mask for show backdrops" but the actual selectors live in CSS files (not exposed in the README). https://github.com/prism2001/finity
- Verdict: confirmed. Two viable patterns: (1) pin .backgroundContainer (our current INC2 approach) — works but must transparent-scope every ancestor; (2) repurpose .blurhash-canvas as the fixed layer (meow.garden) — cleaner because blurhash is already per-item and survives section navigation without scroll math.
- Next step: if INC3 transparent-scope keeps regressing, switch to the blurhash-canvas pin. One selector vs ~20 wrappers to keep transparent.
Q6 — 10.10.3 → 10.10.7 worth bumping?
- 10.10.7 forum announcement (2025-04-05): security release, "several bugfixes." Trusted-proxies config required pre-upgrade. https://forum.jellyfin.org/t-new-jellyfin-server-web-release-10-10-7
- Compare-page diff (v10.10.3...v10.10.7) didn't generate (too long). Releasebot lists per-release notes: https://releasebot.io/updates/jellyfin/jellyfin-server
- Most fMP4/HLS fixes in our research target 10.11.x line, not 10.10.x patch series.
- Verdict: probable mild improvement, not a fix for our bugs. Worth bumping for security/CVE coverage but unlikely to resolve black-screen or carousel-band. The known regressions of 10.11.x (#7546, #16612) argue against jumping straight to 10.11.8 without dev validation.
- Next step: snapshot DB, bump dev to 10.10.7 first. If still broken, 10.11.8 is the roadmap path with the ElegantFin theme swap.
Q7 — Force-transcode-everything DeviceProfile:
- Jellyfin docs confirm there's no built-in admin toggle to force transcoding for all clients. https://jellyfin.org/docs/general/post-install/transcoding/
- forum.jellyfin.org/t-force-trasnscoding-or-disable-directplay: the community workaround is to reduce the client max bitrate to 1 Mbps (degrades quality) — no clean DeviceProfile-only override. https://forum.jellyfin.org/t-force-trasnscoding-or-disable-directplay-x265-stuttering-firetv
- jellyfin-web #7651 "Chrome DeviceProfile hardcodes MKV in DirectPlayProfiles": the JS-Injector plugin removes entries client-side before the PlaybackInfo POST. The workaround pattern is generalisable: hook the PlaybackInfo XHR, set DirectPlayProfiles=[], leave only TranscodingProfiles with H264 mp4/HLS. The server then has nothing to match → forces transcode. https://github.com/jellyfin/jellyfin-web/issues/7651
- Verdict: confirmed pattern, no native config knob. Server-side empty DirectPlayProfiles in a custom DeviceProfile is the cleanest bypass; only a ts-format TranscodingProfile remaining → libx264.
- Next step: create custom DeviceProfile in admin → DLNA → Profiles with empty DirectPlay + a single TranscodingProfile (Container=mp4, VideoCodec=h264, AudioCodec=aac, Protocol=Hls). Match to Identification by browser UA. Eliminates codec compat as a variable in one move and is the cleanest test for "is the bug in our codec path or our renderer".
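The profile shape from the next step above, sketched as the dict we'd POST. Field names follow the Jellyfin DeviceProfile schema as we understand it from #7651 and the transcoding docs — verify against the running server's API version before relying on it; the profile Name is our label, not a Jellyfin builtin:

```python
# Force-transcode DeviceProfile sketch: empty DirectPlayProfiles means the
# server can never match direct play, so every request falls through to the
# single H.264/AAC HLS TranscodingProfile (i.e. libx264).

def force_transcode_profile(max_bitrate: int = 20_000_000) -> dict:
    return {
        "Name": "arrflix-force-transcode",   # our label (illustrative)
        "MaxStreamingBitrate": max_bitrate,
        "DirectPlayProfiles": [],            # nothing matches -> must transcode
        "TranscodingProfiles": [{
            "Type": "Video",
            "Container": "mp4",
            "VideoCodec": "h264",
            "AudioCodec": "aac",
            "Protocol": "Hls",
        }],
    }
```

This removes codec compatibility as a variable in one move: every client gets the same libx264 output regardless of source codec.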
After INC1 (:has() transparent-scope) shipped and prod showed backdrop on
detail-page top, owner reported "in the middle of the More from Season 1
is black, it's hiding the artwork". Below-the-fold sections (Next Up, Seasons,
More Like This) showed solid black instead of continuing the backdrop.
Root cause (INC2)
.backdropContainer defaults to non-fixed positioning — it scrolls out of
view. INC1 made wrappers transparent so backdrop showed through, but only
where the backdrop EXISTED in the DOM viewport. Once user scrolls down,
backdrop is above viewport, sections see body's #000 bg.
Fix INC2
Pin .backdropContainer + .backgroundContainer to position: fixed; top:0; height:100vh; z-index:0. Added ::after vertical gradient (transparent at
top → 75% black at bottom) so text remains readable as user scrolls into
backdrop area.
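The INC2 rule sketched as CSS (selectors as named above; the gradient stops are illustrative — the shipped rule lives in branding.xml CustomCss):

```css
/* INC2 sketch: pin backdrop so it survives scroll */
.backdropContainer,
.backgroundContainer {
  position: fixed !important;
  top: 0;
  height: 100vh;
  z-index: 0;
}
.backdropContainer::after {
  content: "";
  position: absolute;
  inset: 0;
  /* transparent at top -> 75% black at bottom: keeps text readable
     as the user scrolls into the backdrop area */
  background: linear-gradient(to bottom, transparent 0%, rgba(0, 0, 0, 0.75) 100%);
}
```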
Root cause (INC3)
INC2 alone didn't fix it visually — section wrappers (.detailVerticalSection,
.scrollSliderContainer, .padded-bottom-page, .itemsContainer etc) still
painted opaque bg from BLACK-PASS + finity. Pinned backdrop sat behind, but
sections occluded it section-by-section.
Fix INC3
Extended transparent-scope to all detail-page sub-sections:
.itemDetailPage > *, .detailPageContent, .detailPagePrimaryContainer,
.detailPageWrapperContainer, .detailVerticalSection*, .detailSection*,
.itemsContainer, .scrollSlider*, .padded-bottom-page,
.sectionTitleContainer, .detailRibbon, .subtitleAudioContainer,
.detailPageRoot.
Verification (INC2 + INC3)
Updated bin/headless-test.py to take TWO viewport screenshots: top-of-page and scrolled to 50% page height. With INC2/INC3 applied, the scrolled screenshot shows the R&M backdrop persisting behind the "Seasons" + "More Like This" sections (previously: solid black).
Lesson learned
When pinning a backdrop with position:fixed, transparency must extend
RECURSIVELY through every wrapper ON TOP of the backdrop layer, not just the
top-level page wrappers. Test with scrolled screenshot — full-page screenshot
in playwright stretches viewport and hides position:fixed issues.
bin/headless-test.py now takes both top + scrolled. Use both to bisect.
INC4 black-band locator (2026-05-09)
Symptom. After INC3, owner reported that for ADMIN users a wide black band (~250px tall, full-width) still painted around the "More from Season 1" carousel on the Rick & Morty detail page (admin-only carousel; guest users don't see it). Cards rendered fine, only the BAND around them was opaque.
Diagnostic method. Inserted temp arrflix-band-diag-2026-05-09 ApiKey,
logged in as admin via playwright, navigated to R&M detail page, scrolled
all sections into view, then walked DOM upward from each .scrollSlider
restricted to the .itemDetailPage subtree, reporting every ancestor with
non-transparent background. Locator script: /tmp/arrflix-band-locator.py.
Result. A single opaque-black wrapper found, identical for ALL carousels (Schedule / Next Up / Seasons / Additional Parts / Lyrics / Cast & Crew / Special Features / Music Videos / Scenes / More Like This / More from Season / More from Artist):
div.padded-top-focusscale.padded-bottom-focusscale.no-padding.emby-scroller
bg = rgb(0, 0, 0) pos = static z = auto
rect = x:80 y:1242 1488×333 (matches the band the user described)
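The walk itself is simple; a reconstruction of the locator's core loop (the real script at /tmp/arrflix-band-locator.py runs this inside the page via playwright — the node-dict shape below models the same logic off-browser):

```python
# Walk the parent chain from a .scrollSlider, restricted to the
# .itemDetailPage subtree, collecting every ancestor that paints a
# non-transparent background.

TRANSPARENT = {"rgba(0, 0, 0, 0)", "transparent"}

def opaque_ancestors(node: dict) -> list[dict]:
    """Ancestors of `node` with an opaque computed bg, stopping once the
    .itemDetailPage subtree root has been inspected."""
    hits = []
    cur = node.get("parent")
    while cur is not None:
        if cur["backgroundColor"] not in TRANSPARENT:
            hits.append(cur)
        if "itemDetailPage" in cur["classList"]:
            break  # stay inside the detail-page subtree
        cur = cur.get("parent")
    return hits
```

On the INC4 DOM this returns exactly one hit — the .emby-scroller parent — which is what pointed straight at the unscoped 2026-05-08 rule.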
Root cause. Pre-existing CSS rule in branding.xml from 2026-05-08
labelled /* kill gray band behind home-page Recently Added rows */ applied
.emby-scroller { background: #000 !important; } UNSCOPED. INC3 overrode
its sibling wrappers (.detailVerticalSection, .itemsContainer,
.scrollSlider, .scrollSliderContainer) but missed the IMMEDIATE PARENT
.emby-scroller. That single wrapper was the band.
Fix INC4. Detail-page-scoped transparent override appended to CustomCss after the INC3 block:
.itemDetailPage .emby-scroller,
.itemDetailPage .emby-scroller-container,
.itemDetailPage .verticalSection,
.itemDetailPage .padded-top-focusscale,
.itemDetailPage .padded-bottom-focusscale,
.itemDetailPage .moreFromSeasonSection,
.itemDetailPage .moreFromArtistSection,
.itemDetailPage .scrollSliderContainer,
.itemDetailPage .scrollButtonContainer {
background-color: transparent !important;
background: transparent !important;
}
No position:relative; z-index:1 needed on .emby-scroller — the parent
.detailPageWrapperContainer already has position:relative; z-index:2,
which is above the pinned .backdropContainer at z:0. Removing the opaque
fill alone is sufficient.
Verification. Re-ran band-locator after docker restart jellyfin —
opaqueBlackBands: 0 inside .itemDetailPage (was 1). Screenshot of R&M
detail page at mid-scroll now shows portal/Easter Island backdrop continuous
behind every carousel including "More Like This". Cleaned up the
arrflix-band-diag-2026-05-09 ApiKey row.
Patch lines added to bin/apply-26-incident-fixes.sh so re-runs are
idempotent and recover from branding.xml drift.
Lesson. When a prior unscoped background: #000 !important rule exists
in a shared CSS bucket (here: branding.xml CustomCss), grep the file for
the property/selector BEFORE writing a new transparent-scope rule. A
DOM-walking locator script that reports every opaque ancestor of the target
finds the painter in seconds — much faster than guessing selectors. Going
forward: when adding a "paint opaque" rule, scope it from day one
(.homePage .emby-scroller, not bare .emby-scroller).
Open follow-ups (for separate sessions)
- AV1+Opus playback (Bug E): Chrome's AV1 DirectStream codec-tag mislabel bug. Fix options: (a) ban AV1 DirectStream via DeviceProfile (force x264 transcode), (b) re-encode the MNS source to H.264, (c) wait for the 10.11.8 upgrade. See agent finding in this doc → "Playback diagnosis".
- 10.11.8 migration: current 10.10.3 has known issues per online research (TMDB scrape regression #14922, custom CSS injection #7220). 10.11.8 is current stable as of 2026-05-09 with CVE fixes. Plan: dev first, snapshot EF Core DB migration, swap Cineplex → ElegantFin (10.11-supported), promote to prod after verified.
- Permanent SW kill option (deferred — stock SW doesn't actually intercept anything): if a future Jellyfin update enables a real fetch-handler SW, we have the recipe in this doc → "SW kill recipe" agent finding.
- Session-state backup off-host (ROADMAP H4): no automated backup yet. Today's incident was rescued by inline cp X X.bak.$(date +%s) for both branding.xml and dynamic.yml — should be systematized.
Iteration 2
INC4 testing methodology audit
This iteration is a meta-audit on the test that signed off Iteration 1. After INC1–INC3 shipped, owner reported two regressions the headless test did NOT catch:
- A black band painted behind the "More from Season N" carousel on detail pages.
- Video plays as a black screen on the user's actual TV episode content (AV1+Opus from Mike Nolan Show), even though the test claimed playback was fixed.
This section documents what the v1 test missed, why those gaps existed,
what bin/headless-test-v2.py changes, and the preflight protocol every
future fix must pass before claiming "verified".
a) What v1 missed
| Gap | Concrete consequence |
|---|---|
| Logged in only as guest (non-admin restricted user). | The "More from Season N" carousel is admin-visible content. guest's permissions hid it from the DOM, so the section wrapper that painted the black band never rendered during the test. v1 reported "no regression" because the offending element wasn't on the page it screenshotted. |
| Never clicked Play. v1 only loaded the detail page, took screenshots, scraped a small fixed selector list. | A <video> element that fails to decode (AV1 in Chrome with a mislabelled codec tag, per Bug E in this doc) won't show up unless you actually start playback. v1 had no way to observe video.error, video.readyState, videoWidth/Height, or currentTime because the player was never instantiated. |
| Only one item tested. v1 auto-picked the first Series and probed its detail page. | Codec coverage was random — usually whatever happened to be first alphabetically. The HEVC movie that worked (Dark Knight) and the AV1 episode that didn't (Mike Nolan Show) had different failure modes; v1 couldn't distinguish them because it tested neither systematically. |
| Hardcoded selector list for DOM probe. | v1 inspected ~22 known selectors. Any new section wrapper (e.g. .moreFromSeasonContainer) painting an opaque background outside that list was invisible. The black band lived in a wrapper v1 didn't even know existed. |
| No structured pass/fail criterion. v1 emitted probe.json with raw computed-style snapshots; humans had to read it and decide. | "I declared playback fixed" — that human decision had no machine-verifiable backing. There was no JSON field saying regressions: [] that owner / next-Claude could trust without re-deriving from raw data. |
| No cross-reference to a known-good baseline. | Even if v1 had caught the band, there was no golden-image comparison to alert "this looks different from last passing run". Detection relied on someone eyeballing the screenshot. |
b) Why those gaps existed
- Speed-bias. v1 was written under time pressure as the third-tier verification of an INC3 CSS fix. The minimum viable test was "page loads and looks right at top + scrolled". That worked for the visual bug it was designed against — and stopped there.
- No threat model for the test itself. The test never asked "what classes of regression CAN I detect, what classes CAN'T I". If it had, the missing-Play and admin-only-content gaps would have been obvious.
- Single-account convenience. guest-mirror was the easiest creds to hand because doc 17 had just minted them. Re-using one role across the whole verification was the path of least resistance.
- Selector tunnel-vision. The selector list was copied from the previous fix's diagnostic queries (INC2/INC3). It tracked what the previous bugs touched, not what the current page actually rendered.
- Server-log success treated as proof of client success. Bug E was declared "fixed" because Dark Knight transcoding logs looked clean. No one closed the loop and confirmed the user's actual content (Mike Nolan Show / AV1) decoded in a real browser.
c) What v2 changes (bin/headless-test-v2.py)
| Improvement | Mechanism |
|---|---|
| Multi-user coverage | Runs the entire probe twice: once as admin (s8n / s8n-dev), once as non-admin (guest / guest-mirror). Per-user screenshots + probe.json. Computes a section_title_diff listing which sections rendered for one role but not the other — that diff is the canonical alert for "you're missing admin-only content". |
| Click Play + observe | After detail page settles, locates .btnPlay / [data-action="play"], clicks (with keyboard p fallback), waits 10 s, then reads <video> element state: currentTime, paused, ended, readyState, networkState, videoWidth, videoHeight, error.code, buffered_ranges. Also captures a *-play.png screenshot and accumulates new console / network errors during the playback window. |
| Multiple-item coverage | Three items per role: HEVC movie (Dark Knight, hardcoded id 7aa5add2c2d8575eda5280b9b9072071), AV1 episode (auto-picked from Mike Nolan Show), H.264 episode (auto-picked from a different series). Codec types are labelled in JSON so failures can be attributed to a codec class, not "the test failed". ITEMS= env var overrides for ad-hoc runs. |
| Section-bg sweep | At scroll-bottom, walks document.querySelectorAll('*') and reports every visible element with non-transparent backgroundColor whose rect overlaps the viewport. Filters via a small BG_ALLOWLIST (video player, dialogs, header) and a darkness heuristic (R+G+B < 90 → likely a black-band regression). Output goes into probe.json under runs[].items[].regressions. |
| Golden-screenshot diff | If OUT/golden/<key>-{top,mid,bot,play}.png exists, the run computes a Pillow ImageChops.difference, writes a diff PNG, and emits {bbox, ratio} per shot. Maintainer can populate goldens after the next clean run; subsequent runs flag drift quantitatively. |
| Structured pass/fail JSON | probe.json now has a stable shape: {url, runs:[{role, user, is_admin, items:[{kind, probe, play, regressions, diffs_vs_golden}]}], section_title_diff, issues, exit_code}. grade() produces issues[] and exits 0/2 deterministically, so CI / orchestration can gate on .issues via jq. |
| Documented invariants up front | The script header explicitly lists "what v1 missed and how v2 closes it" so the next person reading it doesn't repeat the speed-bias trap. |
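The section-bg sweep's darkness heuristic from the table above, as a standalone classifier (the BG_ALLOWLIST entries here are illustrative stand-ins for the script's real list):

```python
import re

# Illustrative allowlist: overlays that legitimately paint opaque.
BG_ALLOWLIST = {".videoPlayerContainer", ".dialog", ".skinHeader"}

def parse_rgb(css_color: str):
    """Return (r, g, b, a) from an rgb()/rgba() string, or None."""
    m = re.match(r"rgba?\(([\d.]+),\s*([\d.]+),\s*([\d.]+)(?:,\s*([\d.]+))?\)",
                 css_color)
    if not m:
        return None
    r, g, b = (float(m.group(i)) for i in (1, 2, 3))
    a = float(m.group(4)) if m.group(4) is not None else 1.0
    return r, g, b, a

def is_band_candidate(selector: str, css_color: str) -> bool:
    """Opaque, dark (R+G+B < 90), and not allowlisted -> likely band."""
    parsed = parse_rgb(css_color)
    if parsed is None or selector in BG_ALLOWLIST:
        return False
    r, g, b, a = parsed
    return a >= 1.0 and (r + g + b) < 90
```

Anything this flags that isn't a genuine overlay goes into runs[].items[].regressions; anything that is a genuine overlay earns an allowlist entry instead.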
d) Preflight protocol — do this before claiming any ARRFLIX fix is "verified"
Treat this list as a hard gate. If any step is skipped, the fix is unverified, not "fixed".
- Run v2 with both roles: bin/headless-test-v2.py https://dev.arrflix.s8n.ru. Confirm exit code 0 AND probe.json .issues is empty. If exit code 2, read .issues[] — those are concrete regressions, not flaky test noise.
- Inspect section_title_diff. A non-empty only_admin array means the admin sees content the guest doesn't — that section MUST be verified visually in the admin screenshots, because guest-only testing would have been blind to it.
- Confirm playback per codec. For each item in runs[].items[], play.video.readyState must be ≥ 2 AND play.video.error must be null. paused is acceptable iff currentTime > 0 (autoplay policy may pause after the first frame, but a frame DID render). videoWidth and videoHeight must be > 0 — that's the canonical "actually decoding" check.
- Sweep flagged dark backgrounds. Any element in runs[].items[].regressions that is not a known overlay (dialog, video player chrome, drawer header) is a candidate band-bg regression. Add it to BG_ALLOWLIST only if the design genuinely intends it to be opaque; otherwise fix the CSS.
- Diff against goldens. If diffs_vs_golden[].ratio for any shot exceeds your threshold (start at 0.02 = 2% pixels changed), open the *-diff.png and confirm the change was intended.
- Run on prod after dev passes. Same script, same expectations: bin/headless-test-v2.py https://arrflix.s8n.ru. The dev mirror exists (doc 12 / doc 17) precisely so you can verify there first.
- Only THEN write "verified" in the doc. Always cite the run's probe.json path and exit code in the verification note. Future-you needs to be able to re-run the exact same gate.
Three single-sentence rules carved out of this protocol, for posters on the wall:
- Always test as both admin and non-admin — admin-only sections are invisible to guests, and a fix that breaks admin-only content will not be detected by guest-only tests.
- Always click Play — page-load is necessary but not sufficient; black-screen playback only manifests after <video> is instantiated and a frame is requested.
- Always sweep ALL backgrounds — fixed-list selector probes only catch regressions in selectors you already knew about, which is the opposite of what a regression test is supposed to do.