ARRFLIX/docs/26-incident-2026-05-09-page-unresponsive-and-playback.md
s8n 9b06bb48c6 doc 26 INC2+INC3: pin backdrop, transparent sub-sections
After INC1 fixed the Abspielen button + first-fold backdrop, owner reported a black
band hiding artwork in "More from Season 1" / below-fold sections. Two more
patches required:

INC2 — pin .backdropContainer + .backgroundContainer position:fixed; height
100vh so backdrop persists during scroll. Added vertical fade ::after.

INC3 — extend transparent-scope to ALL detail-page sub-sections
(.detailVerticalSection, .scrollSlider, .padded-bottom-page,
.itemsContainer etc) so section wrappers don't paint over the pinned
backdrop section by section.

bin/headless-test.py now takes top + scrolled viewport screenshots.
full_page=True hides position:fixed regressions; dual-screenshot exposes
them. Use both to bisect.

bin/apply-26-incident-fixes.sh updated with INC2+INC3.

Open: AV1+Opus playback (Mike Nolan Show) still tracked for 10.11.8
migration. .detailLogo regression possible — test in actual browser.
2026-05-09 01:21:01 +01:00


# 26 — Incident 2026-05-09: Page Unresponsive + Posters Missing + Playback Black-Screen
> Session log. Live document — updated as fix proceeds. Goal: future-me + other operators can read this and skip every dead-end I already walked.
Status as of doc creation: **ONGOING** — partial fix applied, more under investigation.
---
## Symptoms reported by owner (in order)
1. "Browser arrflix is broken videos don't play at all"
2. "I can't even see a preview of the TV series / movie"
3. After first fix: page loads, posters render, but **"Page Unresponsive"** Chrome dialog before posters paint (screenshot 1)
4. After second fix attempt: posters render, but **"Abspielen"** (German Play button) instead of "Play"; **all backdrop art replaced by black**; **video plays as black screen** (screenshot 2)
---
## Root causes identified so far
### A — Browser hangs (resolved by fix #1)
`/opt/docker/jellyfin/web-overrides/index.html` deployed copy was AHEAD of repo HEAD. md5 deployed `b97c1cb4` ≠ repo `d77c106b`. Someone hot-patched a `forceEnglishUI()` text-walker MutationObserver onto `document.body` with `subtree:true, characterData:true`. Walker rewrote `alt`/`title`/`aria-label` on every DOM mutation. Poster grid lazy-load fired it hundreds of times → main thread frozen → Chrome "Page Unresponsive".
**Fix applied:** scp'd repo HEAD `index.html` over deployed, restarted container. Verified md5 matches.
**Lesson:** never hot-patch the bind-mount. Always commit + redeploy from repo. Drift is invisible until something breaks.
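Drift of this kind can be caught before it bites. A minimal pre-flight sketch, assuming GNU `md5sum` and the paths named above (`check_drift` is a hypothetical helper, not an existing repo script):

```shell
# Compare a repo-HEAD file against its deployed bind-mount copy.
# Prints OK when the md5s match, DRIFT otherwise.
check_drift() {
  local repo="$1" deployed="$2" a b
  a=$(md5sum "$repo" | cut -d' ' -f1)
  b=$(md5sum "$deployed" | cut -d' ' -f1)
  if [ "$a" = "$b" ]; then
    echo "OK: $a"
  else
    echo "DRIFT: repo=$a deployed=$b"
  fi
}
# Example (paths from this incident):
# check_drift ~/arrflix-repo/web-overrides/index.html /opt/docker/jellyfin/web-overrides/index.html
```

Running it after every deploy (or from cron) turns invisible drift into a one-line alarm.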
### B — DB write failures (auto-resolved before this session)
Agent investigation found `jellyfin.db` had been owned by uid 101000 (userns-remap leftover, see `~/.claude/projects/-home-admin-ai-lab/memory/project_nullstone_docker_userns.md`). Container ran as 1000 → SQLite Error 8: `attempt to write a readonly database`. By the time we re-checked, file was already `user:user`. Probably fixed during 23:22 container restart.
**Lesson:** if `jellyfin.db` is unwritable, EVERY user-config save silently fails (HTTP 204 success, value not persisted). Check ownership FIRST when config writes don't stick.
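A matching pre-flight for the ownership check, assuming GNU `stat` and that the container runs as uid 1000 per this section (`check_owner` and the db path in the comment are illustrative, not canon):

```shell
# Report whether a file is owned by the uid the container runs as.
check_owner() {
  local f="$1" want="$2" uid
  uid=$(stat -c '%u' "$f") || return 1
  if [ "$uid" = "$want" ]; then
    echo "uid OK ($uid)"
  else
    echo "uid MISMATCH: owner=$uid want=$want (SQLite writes may fail readonly)"
  fi
}
# check_owner /home/docker/jellyfin/data/jellyfin.db 1000   # path is an assumption
```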
### C — German "Abspielen" leak (NOT YET FIXED — current focus)
User's `Configuration.UICulture` is `<absent>` for ALL 12 users. Tried POST `/Users/{id}/Configuration` with `UICulture: en-US` payload via `bin/force-english-all-users.sh`. Server returned HTTP 204 but field did NOT persist on subsequent GET. **POST silently drops UICulture**.
Possible explanation: the `UserConfiguration` model in 10.10.3 may have removed the per-user UICulture field; in any case the `Users` table schema (verified) has no UICulture column, and no Preferences row stores it. Doc 15 claims `Configuration.UICulture` is authoritative, but that doc dates from when the fix worked; behavior may have shifted.
Traefik DOES rewrite `Accept-Language: en-US,en;q=0.9` on every request (`force-en-accept-lang@file` middleware) AND rewrites the locale chunk JS path, so `de-json.X.chunk.js` → `en-us-json.667484b4a441712c7e05.chunk.js`. Verified via curl: `de-json.X.chunk.js` returns 107425 bytes of English content.
**So why German leaking?** Service Worker cache. Browser's SW serves stale German chunk from CacheStorage, never hits network, never sees the Traefik rewrite. SW from before the lockdown was deployed.
Tried: `Clear-Site-Data: "cache", "cookies", "storage"` Traefik response header on `/web/index.html`. Verified live via curl. **But the user's browser STILL has SW cache** — SW intercepts the GET to `/web/index.html` and serves from cache, response from server (with Clear-Site-Data) never reaches browser cache layer. SW prevents its own death.
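The GET-verify discipline from the lessons above can be scripted. A sketch; `verify_field` is a hypothetical helper, and the commented curl (real `X-Emby-Token` header; `$TOKEN`/`$ID` assumed set) shows how it would be fed:

```shell
# Compare the value we POSTed against what a follow-up GET actually returned.
verify_field() {
  local want="$1" got="$2"
  if [ "$want" = "$got" ]; then
    echo "persisted"
  else
    echo "dropped (got: ${got:-<absent>})"
  fi
}
# got=$(curl -fsS -H "X-Emby-Token: $TOKEN" "https://arrflix.s8n.ru/Users/$ID" \
#       | jq -r '.Configuration.UICulture // empty')
# verify_field en-US "$got"
```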
### D — Backdrops missing (NOT YET INVESTIGATED)
User reports backdrop art (the wide background image behind episode cards) is now black for every show. Could be:
- Image not in DB/cache (server returning empty)
- CSS hiding backdrop element
- SW serving stale 404 from a bad earlier session
- Jellyfin metadata refresh interrupted
### E — Video black screen on play (NOT YET FIXED)
Server logs show ffmpeg IS transcoding the HEVC source → H.264 high@5.1 + libfdk_aac, but the browser shows black. Earlier `/Sessions` proved DirectPlay worked for one client (RemoteEndPoint 82.31.156.86). Recent attempts: HLS segment 186.mp4 returned **499 (client closed connection)** and `POST /Sessions/Playing/Progress` returned **502 Bad Gateway** at 23:31:49 (during a momentary Traefik upstream-missing window).
Possible causes:
- SW intercepting HLS init segment, serving stale/wrong-mime
- 10-bit HEVC source → H.264 transcode timing issue
- CSS hiding `<video>` element
- HLS init.mp4 vs segment naming bug (`hls_fmp4_init_filename "X-1.mp4"` + `hls_segment_filename "X%d.mp4"` — collision risk)
---
## Actions taken this session
| # | Action | Outcome |
|---|---|---|
| 1 | scp repo `index.html` → deployed; `docker restart jellyfin` | DOM-walker shim gone. Page no longer hangs. |
| 2 | Insert temp ApiKeys row in jellyfin.db, run `bin/force-english-all-users.sh` | POST 204 but UICulture NOT persisted. Possibly server-model dropped field. |
| 3 | Add `clear-site-data@file` Traefik middleware to `jellyfin-html-nocache` router | Header lives. But SW intercepts before browser cache layer can apply. |
| 4 | Revoke temp ApiKey | Done. |
---
## What did NOT work (don't repeat)
- `bin/force-english-all-users.sh` against 10.10.3 — POST 204 but field dropped server-side. Either model changed or DB write path broken differently than uid-101000 issue.
- `Clear-Site-Data` response header alone — SW intercepts and the header never reaches browser cache eviction. Need to kill SW BEFORE it can intercept.
## Forbidden patterns
- Hot-patching `web-overrides/index.html` without committing to repo. Bug A came from this exact pattern. Repo MUST = deployed.
- Trusting HTTP 204 as success. Verify with GET.
- Client-side DOM-walker MutationObservers without debounce + scope. Will tank performance + freeze browser.
---
## Plan (in flight)
1. Read every prior doc (`docs/01..25`) — extract what was tried + outcome (agent task)
2. Read git log of `web-overrides/`, `bin/force-english-all-users.sh`, `bin/inject-shim.py` (agent task)
3. Online: how to kill a Jellyfin Service Worker definitively (agent task)
4. Read `/web/serviceworker.js` source — what does it cache? (agent task)
5. Diagnose backdrop missing — server vs CSS vs SW (agent task)
6. Diagnose HEVC playback black screen — codec + segment + HLS (agent task)
7. Compare jellyfin-dev vs jellyfin (agent task — dev MAY be working, look at what's different)
8. Apply consolidated fix from agent findings
9. Verify in user browser
10. Commit doc 26 + any code changes; push to `git.s8n.ru/s8n/ARRFLIX`
---
## Findings from agents
### Repo archeology
Reference compiled 2026-05-09 from docs/13-25 + bin/* + git log. Use this to skip dead-ends.
**A - Locale lockdown - what's been tried + outcomes**
Chronological history (paths absolute):
1. `/home/admin/arrflix-repo/docs/15-force-english.md` (commit 14f63e8, 2026-05-08 04:22) - diagnosis: per-user `Configuration.UICulture` absent on all 5 users -> SPA falls back to `Accept-Language`. **Built `bin/force-english-all-users.sh`** (read-modify-write `POST /Users/{id}/Configuration` with `UICulture: en-US`, expect 204). Shipped one-line wrapper patch for `bin/add-jellyfin-user.sh` step 3/4 (`c['UICulture']='en-US'`). **Status at write-time: plan-only, script never executed.**
2. `/home/admin/arrflix-repo/docs/19-english-only-audit.md` (a3f82df) - confirmed UICulture still absent on 8/8 users; identified that **92 non-English `<lang>-json.<hash>.chunk.js` chunks reachable** (`de-json.1afccc006ab8bb6c5953.chunk.js` contains `"Play":"Abspielen"`). Proposed three orthogonal fixes: (a) Path-A Traefik `customrequestheaders.Accept-Language=en-US` middleware, (b) Path-B 1-byte chunk stub bind-mounts (brittle - chunk hashes rotate per JF image), (c) `navigator.language` shim in `inject-shim.py`. **Outcome: recommendations only.**
3. `/home/admin/arrflix-repo/docs/20-english-only-lockdown.md` (d5d6856) - operator doc declaring 4 layers (server, per-user, web SPA shim, Accept-Language). Ships `bin/english-lockdown-runner.sh` (idempotent re-apply for layers 1+2). Layer 3 = `web-overrides/english-lockdown.{js,css}` (sibling commit d2120c6). **Outcome: claimed working at write-time.**
4. `/home/admin/arrflix-repo/docs/25-english-leak-deep-dive-2026-05-08.md` (117fa33) - **critical retraction**: grepped the live web bundle and proved the SPA NEVER reads `Configuration.UICulture`. Only `wizard-start.<hash>.chunk.js` and `25583.<hash>.chunk.js` reference it, both for the admin `/System/Configuration` form, NOT user UI. Actual locale resolver reads `document.documentElement.getAttribute("data-culture")` -> `navigator.language` -> `navigator.userLanguage` -> `navigator.languages[0]` -> `localStorage.getItem("language")` (no user prefix). **Per-user UICulture POST = theatre. Only the shim's `Object.defineProperty(Navigator.prototype, 'language', ...)` actually pins SPA UI.** Verified with headless Trivalent `--lang=de-DE --accept-lang=de-DE,de,en` -> only `en-us-json.667484b4a441712c7e05.chunk.js` requested.
5. **Today's deployed shim** (`/home/admin/arrflix-repo/bin/inject-shim.py` lines 13-114) - does ALL of the above: `localStorage.setItem` for 6 keys (`appLanguage,selectedlanguage,selectedlocale,language,locale,culture`), `Object.defineProperty(Navigator.prototype, 'language')`, `Object.defineProperty(Navigator.prototype, 'languages')`, fallback `navigator.X` redefine, fetch+XHR wrappers stripping `Accept-Language` and rewriting `POST /Users/{id}/Configuration` body to force `UICulture:'en-US'`, `pinLocale()` re-runs every 1 s + on visibility-change. **This is the canonical recipe - anything that works lives here.** Doc 26 sec C confirms Traefik `force-en-accept-lang@file` middleware also rewrites `Accept-Language` per request, AND rewrites `de-json.X.chunk.js` -> `en-us-json.667484b4a441712c7e05.chunk.js` (curl-verified: de URL returns 107 425 bytes of English).
**B - Service worker handling - what's been tried + outcomes**
- `docs/13` finding 11 + `docs/23` sec 5 + `docs/25` hypothesis 2 - `/web/serviceworker.js` is **768 bytes**, `Last-Modified: 2024-11-19` (Jellyfin 10.10.3 ship). Source confirmed: only `notificationclick` handler + `clients.claim()`, **no `fetch` listener, no precache, no `cache.put`**. Stock SW cannot poison posters/HLS by design.
- `bin/inject-shim.py` lines 174-188 - shim already calls `navigator.serviceWorker.getRegistrations().then(regs => regs.forEach(r => { if (r.active && r.active.scriptURL.includes('serviceworker.js')) r.unregister(); }))` AND `caches.keys().then(keys => keys.forEach(k => caches.delete(k)))`. **Built-in SW kill + cache wipe runs every page load.** In production now.
- `docs/25` R1 - proposed `Cache-Control: no-cache` on `/web/index.html` to stop heuristic caching of pre-shim HTML (Path-A label-scoped Traefik middleware). **Status: not applied at doc-25 write-time.**
- Doc 26 sec C - added `clear-site-data@file` Traefik middleware. Header reaches curl, but **SW intercepts before browser cache layer can apply Clear-Site-Data - SW prevents its own death**. SW kill must come from inside the SW (self-destruct) or via Update fetch returning 404. See SW kill recipe section below.
**C - Backdrop / artwork issues - any prior doc covers this?**
- `docs/14` - only doc that touches detail-page backdrops. Diagnosed Finity-parent's `--detail-page-backdrop-offset: 17%` + `mask.png` from `raw.githubusercontent.com/prism2001/finity/main/assets/mask.png`. Two CSS culprits clamping the band hard-black: (a) `:root --primary-background-color: #000 !important`, (b) `html, body, .preload, .skinBody, ..., #reactRoot, .mainAnimatedPages, .dashboardDocument { bg:#000 !important }`.
- `docs/14` sec 7 proposed CSS fix (`linear-gradient` overlay, `body.itemDetailPage` scope-out for bg-clamp). Doc 21 sec 4 cross-ref says "just landed".
- `docs/23` finding 6 - `/Items/{id}/Images/Primary` returns `Cache-Control: public` with NO max-age (heuristic = 0 s); cold poster transcode 350-470 ms; on-disk image cache `/cache/images/resized-images/` is 39 MB / 412 files / 16 h retention.
- `docs/24` sec 4 - image cache 39 MB total, 412 files, no GC pressure, oldest 16 h old.
- **No prior doc covers "all backdrops replaced by black" as a regression.** Closest precedents: doc 14 hard-black left band (CSS layer), doc 23 poster timing (cold-cache layer). New investigation territory for doc 26.
**D - Video playback / HLS / transcode issues - any prior doc?**
- `docs/13` finding 03 - `EnableThrottling=false`, `EnableSegmentDeletion=false`, `MaxMuxingQueueSize=2048`, `SegmentKeepSeconds=720`. Two 499 client-cancels in 1 h (HLS segments at 6.4 s + 2.9 s).
- `docs/21` - full HDR/HEVC diagnosis for Rick & Morty. Source = HDR10 (`smpte2084`, `bt2020nc`, `yuv420p10le`, `color_range=pc`, no MasteringDisplay/CLL - fake AI-upscale HDR). `EnableTonemapping=false` + `HardwareAccelerationType=none` -> HDR pixels delivered as SDR -> washed-out (NOT pure black). PlaybackInfo: `TranscodeReasons=ContainerNotSupported, AudioCodecNotSupported, SubtitleCodecNotSupported`. Fix: `EnableTonemapping=true` (`bt2390` already selected).
- `docs/22` sec 5 - 4 concurrent ffmpegs on ONE viewer of R&M S01E01. Filtergraph: `[0:4]scale,scale=3840:2160:fast_bilinear[sub]; [0:0]...format=yuv420p[main]; [main][sub]overlay`, `libx264 preset=veryfast crf=23 maxrate=13.5Mbps`, fmp4 HLS. 643 % CPU each. Cause: `EnableThrottling=false` + `EnableSegmentDeletion=false`.
- `docs/22` sec 3 - `TranscodingSubProtocol: hls`, `Container: fmp4/hls`, `IsVideoDirect=False, IsAudioDirect=False`. `PlayMethod` reports `DirectPlay` while `TranscodingInfo` is populated - race in Sessions DTO; actual decision is transcode.
- `docs/23` sec 7 - every Traefik request > 50 ms is `/videos/.../hls1/main/*.mp4` HLS-segment GET. AV1+HEVC at 360-550 Mbit. 15 x 499 + 8 x 500 in 6 h (CPU-side, not edge).
- **No prior doc covers "video plays as black screen" with audio working.** HLS init/segment naming collision risk (`hls_fmp4_init_filename "X-1.mp4"` + `hls_segment_filename "X%d.mp4"`) is a doc-26-only hypothesis. SW-intercepting-init-segment is also doc-26-only - but stock SW has no `fetch` handler so this requires a poisoned non-stock SW.
**E - Forbidden patterns - things explicitly called out as "do not do"**
- **No bundle modifications** (`docs/16` F5, `docs/19` row 16). Content-hashed filenames rotate per JF image upgrade; breaks source-map; must re-emit per bump.
- **No DOM-walker MutationObservers without debounce + scope** (doc 26 sec A bug A). The hot-patched `forceEnglishUI()` text-walker on `document.body` with `subtree:true, characterData:true` froze the main thread on poster lazy-load. The `inject-shim.py` walker in doc 16 sec C is the safe pattern (`acceptNode` filter + bounded selector).
- **No hot-patching `web-overrides/index.html` without committing to repo** (doc 26 sec A lesson). md5 drift between deployed and repo HEAD is invisible until breakage.
- **No trusting HTTP 204 as success** (doc 26 sec B lesson). `jellyfin.db` owned by uid 101000 (userns leftover) -> SQLite Error 8 readonly - POSTs return 204 but value not persisted. Always GET-verify.
- **No `Cache-Control: immutable` on `/web/index.html`** (doc 25 R1 caveat). Bricks next deploy until users force-reload. Scope to hashed chunks only.
- **No tonemap on SDR sources** (doc 21 sec 7e). If Mandalorian looks oversaturated post-fix, tonemap leaks - set `TonemappingMode` from `auto` to stricter.
- **No relying on per-user `Configuration.UICulture` for UI strings** (doc 25 R3 + sec 4). Server-side metadata theatre. Only the shim pins UI. Keep field for future-proofing but stop expecting it to fix Abspielen.
- **No bundle bind-mount for `<lang>-json.<hash>.chunk.js`** (doc 19 Path B caveat, doc 25 R4). Hashes rotate per image upgrade - must regenerate every bump.
- **No deleting Settings drawer node** (doc 17 sec 3.1). Drawer-renderer rebuilds on next render; remove only via CSS `display:none` + style override. Old `mypreferencesmenu` selectors match **0** elements - use `a.btnSettings, [data-itemid="settings"]`.
- **No theme @import without snapshot** (doc 14 sec 9). `/System/Configuration/branding` is whole-object replace - sibling Cineplex POST overwrote ElegantFin/NeutralFin within minutes (race rule, doc 04 sec 3b).
- **No `bg:#000 !important` on detail pages** (doc 14 sec 2c, doc 21 sec 4) - clamps Finity's intentional 17vw band into hard-black slab. Scope to `body:not(.itemDetailPage)`.
- **No stripping `Accept-Language` at Traefik for shared backends** (doc 15 limit 2; relaxed in doc 19 sec 19 since arrflix is sole consumer of arrflix.s8n.ru router).
### SW kill recipe
Research date 2026-05-09. Treat as authoritative for this incident.
**Q1 — Clear-Site-Data through an active SW:** Per W3C spec and MDN, `Clear-Site-Data` is **only honored on responses fetched over the network**, not those served by a SW. A SW can return arbitrary responses (incl. third-party), so browsers ignore CSD on SW-intercepted responses. Chrome/Firefox/Edge/Opera implement this; Safari support is partial. Conclusion: our existing Traefik header on `/web/index.html` will only fire for users whose SW lets that exact URL through to network — for stuck SWs that serve cached `index.html`, the header never reaches the browser. **Verified-not-working alone.** ([MDN Clear-Site-Data](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Clear-Site-Data), [Chrome Workbox guide](https://developer.chrome.com/docs/workbox/remove-buggy-service-workers))
**Q2 — Self-destruct shim:** **Verified working pattern.** Google's official Workbox guide recommends this as the *primary* approach. The browser performs a byte-for-byte update check on the SW script (max 24h, often immediate when `Cache-Control: max-age=0` or response differs). When the new script unregisters itself, all clients controlled by it lose their controller on next navigation. Canonical NekR snippet ([github.com/NekR/self-destroying-sw](https://github.com/NekR/self-destroying-sw)):
```js
self.addEventListener('install', e => self.skipWaiting());
self.addEventListener('activate', e => {
  self.registration.unregister()
    .then(() => self.clients.matchAll())
    .then(cs => cs.forEach(c => c.navigate(c.url)));
});
```
Bind-mount feasibility: Jellyfin official image serves web from `/jellyfin/jellyfin-web/` inside the container. Bind-mounting the *whole directory* is broken (jellyfin/jellyfin#8441), but bind-mounting a *single file* over the existing `serviceworker.js` works the same way `index.html` does for us. Path inside container is `/jellyfin/jellyfin-web/serviceworker.js`. ([Jellyfin container docs](https://jellyfin.org/docs/general/installation/container/), [discussion #8441](https://github.com/jellyfin/jellyfin/discussions/8441))
**Q3 — 404/410 for SW script:** Spec status is **may work, browser-dependent**. W3C ServiceWorker issue #204 was closed wontfix — the spec does NOT mandate auto-unregister on 404/410 during normal navigation. HOWEVER, the *Update* algorithm (run on navigation, ~24h, or `registration.update()`) DOES unregister on 404/410 in Chrome and Firefox today (matches AppCache). The catch: update only runs when the browser checks; a stuck SW serving cached pages may never trigger an update fetch. Less reliable than self-destruct shim. ([w3c/ServiceWorker#204](https://github.com/w3c/ServiceWorker/issues/204))
**Q4 — Jellyfin 10.10.x SW poisoning:** No 10.10-specific SW-poster issue filed. The actual `src/serviceworker.js` in jellyfin-web is **notification-only** — no `fetch` listener, no cache logic. So if `arrflix.s8n.ru/web/serviceworker.js` is intercepting media, it is NOT stock Jellyfin code — likely a stale SW from a prior deploy, an injected mod (BobHasNoSoul/jellyfin-mods etc.), or browser-side residue. Stock Jellyfin SW cannot poison posters/HLS by design. Related issues: [jellyfin-web#4549](https://github.com/jellyfin/jellyfin-web/issues/4549) (premature caching), [jellyfin-web#5729](https://github.com/jellyfin/jellyfin-web/issues/5729) (stale `/system/info/public`).
**Q5 — Container path:** Confirmed `/jellyfin/jellyfin-web/serviceworker.js` for the official `jellyfin/jellyfin` image.
### Prod-vs-dev diff
Investigation 2026-05-09 — comparing live `jellyfin` (prod) vs `jellyfin-dev` containers on nullstone. Image tags identical: both `jellyfin/jellyfin:10.10.3`. Network.xml byte-identical. So differences below are 100% the operator's hardening, not Jellyfin upstream.
**A — docker-compose.yml diff (key items):**
- Prod mounts ~110+ web-override files: `index.html`, `cineplex.css`, AND a `locale-en-only/` directory containing every non-English `*-json.*.chunk.js` (af, ar, as, be, bg, bn, ca, cs, da, de, ... zh-tw, zu) bind-mounted RO over the container's locale chunks. Dev mounts ONLY `index-dev.html` over `index.html`. No CSS, no locale chunks.
- Prod traefik labels: `security-headers@file,compress@file,force-en-accept-lang@file`. Dev: `security-headers@file,no-guest@file`. Prod has NO `no-guest@file` directly on the docker-label router — its no-guest layer is enforced by the higher-priority `jellyfin-html-nocache` file-provider router (which ALSO adds `cache-no-store@file`, `clear-site-data@file` — see below).
- Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`. Dev has none.
**B — branding.xml / CustomCss diff:**
- Prod: 30,795 bytes. Full Cineplex CSS via `@import url("/web/cineplex.css")` (LOCAL bind-mount), ARRFLIX logo PNG embedded as base64 data-URI, Cast/Crew hidden, Quick Connect hidden, header buttons hidden, white slider thumbs, pure-black `--primary-background-color`.
- Dev: 26,345 bytes. Cineplex via `@import url("https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css")` (REMOTE jsDelivr — no /web/cineplex.css bind-mount). Same login disclaimer + Cast/Crew hide. **Confirmed dev has its OWN branding.xml on disk (not empty).**
**C — Per-user UICulture / settings:** Could not run `sqlite3` inside container (binary not present). Prod and dev both have separate config dirs (`/home/docker/jellyfin/` vs `/home/docker/jellyfin-dev/`). Dev's config/data tree is a leaner subset: no `keyframes/`, no `splashscreen.png`, no `subtitles/`, no `device.txt`; the DB `-shm`/`-wal` files are absent, i.e. the dev DB sits idle without WAL (fewer active sessions, as expected). Dev was set up as a fresh first-run wizard per `docs/12-dev-instance.md`, so its user table contains only its own admin.
**D — encoding.xml diff:** Real divergence:
- Prod: `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`.
- Dev: `EnableThrottling=false`, `EnableSegmentDeletion=false`, `EnableTonemapping=false`.
- Prod is the stricter/lower-resource HLS profile; dev keeps every segment around. Plausible contributor to the **HLS 499 client-disconnect** seen in section E (prod): if a client pauses/seeks while throttling+deletion are both on, segment 186 may be reaped before re-request lands.
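The three divergent flags can be diffed mechanically. A sketch assuming the flags appear as plain `<Name>value</Name>` elements in `encoding.xml` (`extract_flags` is a hypothetical helper):

```shell
# Pull the three divergent transcode flags out of an encoding.xml as Name=value lines.
extract_flags() {
  grep -oE '<(EnableThrottling|EnableSegmentDeletion|EnableTonemapping)>[^<]+' "$1" \
    | sed 's/^<//; s/>/=/'
}
# extract_flags /home/docker/jellyfin/config/encoding.xml       # prod
# extract_flags /home/docker/jellyfin-dev/config/encoding.xml   # dev
```

Toggling one flag at a time and re-running the diff keeps the A/B test honest.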
**E — Surprising / smoking gun: Traefik headers prod-only, NOT applied to dev:**
- `curl -sI https://arrflix.s8n.ru/web/index.html` returns:
- `cache-control: no-cache, no-store, must-revalidate`
- `clear-site-data: "cache", "cookies", "storage"`
- `curl -sI https://dev.arrflix.s8n.ru/web/index.html` returns NEITHER. Just `x-frame-options: SAMEORIGIN`.
- Source: `/opt/docker/traefik/config/dynamic.yml` defines a HIGH-PRIORITY (priority:100) file-provider router `jellyfin-html-nocache` matching `Host(arrflix.s8n.ru) && Path(/, /web/, /web/index.html, /web/sw.js, /web/manifest.json)` with middlewares `security-headers,compress,cache-no-store,force-en-accept-lang,clear-site-data`. Dev's `dev.arrflix.s8n.ru` host has no equivalent file-provider router — only the docker-label router applies.
- The `clear-site-data` middleware was ADDED 2026-05-09 (today) as a "one-shot" to wipe SW+cache+storage. Comment in dynamic.yml literally says: *"Remove this middleware after owner has visited once and confirmed clean state."*
- **Implication:** Every prod page-load tells the browser to wipe cache + cookies + storage. If the SW intercepts before the header reaches the cache layer (per Q1 finding above) the header is harmless; but if any auth state or in-progress playback state is in storage when the header DOES land (e.g. on a forced refetch), it gets nuked. Dev does not have this and dev "works".
- Prod also has `jellyfin-locale-force-en` (priority:200) doing `replacePathRegex` from any locale-json chunk to `en-us-json.667484b4a441712c7e05.chunk.js`. The hash is hard-coded; if the deployed Jellyfin web bundle ever shipped a different en-us-json hash, EVERY locale chunk request returns a 404 wrapped as a successful rewrite to a non-existent path. Worth verifying the hash matches the live bundle.
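That hash verification can be a one-liner plus a classifier; `check_chunk_status` is a hypothetical helper around the curl status code, and the URL is the hash hard-coded in the rewrite:

```shell
# 200 means the hard-coded en-us-json hash still exists in the live bundle;
# anything else means every rewritten locale request is now a 404.
check_chunk_status() {
  if [ "$1" = "200" ]; then
    echo "hash current"
  else
    echo "STALE hash (HTTP $1)"
  fi
}
# code=$(curl -s -o /dev/null -w '%{http_code}' \
#        "https://arrflix.s8n.ru/web/en-us-json.667484b4a441712c7e05.chunk.js")
# check_chunk_status "$code"
```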
**Suggested transplant (smallest reversible change):**
1. Remove the `clear-site-data@file` middleware from the `jellyfin-html-nocache` router in `/opt/docker/traefik/config/dynamic.yml` (one line). Keep `cache-no-store` so the SW-update fetch still bypasses heuristic cache. Traefik hot-reloads.
2. Verify with `curl -sI https://arrflix.s8n.ru/web/index.html` → no `clear-site-data` header.
3. If prod now behaves like dev, the CSD header was a major factor in the unresponsive page (storage wipe in flight while SPA boots = re-auth race + token loss).
4. Re-test playback. If still black-screen, suspect the encoding.xml `EnableThrottling+SegmentDeletion=true` combo and try toggling each off to match dev.
5. Last resort: also drop the `jellyfin-locale-force-en` rewrite and verify the hard-coded en-us-json hash is current with the running 10.10.3 bundle.
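Step 2 above can be turned into a pass/fail check; `check_csd_removed` is a hypothetical helper that reads `curl -sI` output on stdin:

```shell
# Reports whether the clear-site-data header is still being emitted.
check_csd_removed() {
  if grep -qi '^clear-site-data:'; then
    echo "still present"
  else
    echo "removed"
  fi
}
# curl -sI https://arrflix.s8n.ru/web/index.html | check_csd_removed
```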
### Online research 2026-05-09
Research-only pass against current GitHub state. All URLs verified live this date.
**Q1 — UICulture per-user broken in 10.10.3?** No evidence the field was *removed* from `UserConfiguration` in the 10.10.x line. DeepWiki's settings-management page still documents per-user UICulture. The closest live regression is jellyfin/jellyfin#16117 ("Can't change plugins settings - Fixed by disabling **Cloudflare Rocket Loader**"): same shape — POST returns 2xx, body silently dropped, only over reverse proxy. Verdict: **probable** that our symptom is reverse-proxy-side body mangling, not a server-side schema removal. Sanity check: bypass Traefik (`curl --resolve arrflix.s8n.ru:8096:127.0.0.1` direct to container) and POST UICulture; if it persists there but not via Traefik, middleware is mutating the JSON. Discussion #15857 confirms `204 No Content` is the expected return code for these write endpoints — the 204 itself is not the bug. ([#16117](https://github.com/jellyfin/jellyfin/issues/16117), [discussion #15857](https://github.com/orgs/jellyfin/discussions/15857), [DeepWiki settings](https://deepwiki.com/jellyfin/jellyfin-web/5.2-user-settings))
**Q2 — Backdrops missing while posters work.** **Confirmed root cause = TMDB API change.** jellyfin/jellyfin#14922 (opened 2025-10-01, CLOSED) and #14951 (2025-10-06, CLOSED): TMDB swapped "no-language" backdrop tag from empty-string to `xx`; Jellyfin 10.10.x scrapes those as **Thumbs**, not Backdrops, so the Backdrops slot is empty. The Jellyfin team explicitly said it will not be backported to 10.10 — fix lands only in 10.11.0+. So our 10.10.3 instance has zero backdrops for any item added after ~Sep 2025 unless a non-`xx` language backdrop happened to exist. Issue #7264 (Movies showing backdrops *instead of* posters) is a separate 10.11.1 regression — opposite symptom, not relevant here, marked "Can't Reproduce" in #15259. Verdict: **confirmed** for our case. Mitigation = upgrade to 10.11.x and run "Replace existing images" on every item *after* upgrading. ([#14922](https://github.com/jellyfin/jellyfin/issues/14922), [#14951](https://github.com/jellyfin/jellyfin/issues/14951), [#7264](https://github.com/jellyfin/jellyfin-web/issues/7264))
**Q3 — Service Worker survival despite Clear-Site-Data.** **Confirmed.** Chrome's official Workbox guide states `Clear-Site-Data` "can't be relied on alone" because the SW intercepts the very response that would carry the header. Chromium SW Security FAQ explicitly recommends pairing CSD with a no-op SW. Same conclusion as our SW kill recipe section, validated from a second angle. ([Chrome Workbox](https://developer.chrome.com/docs/workbox/remove-buggy-service-workers), [Chromium SW FAQ](https://chromium.googlesource.com/chromium/src/+/main/docs/security/service-worker-security-faq.md))
**Q4 — Self-destruct SW pattern in Jellyfin community.** No Jellyfin-specific recipe published. Generic NekR self-destroying-sw is the canonical pattern (already cited above). BobHasNoSoul/jellyfin-mods ships a *replacement* SW (not a self-destruct one) — useful only as a reference for how others bind-mount over `/jellyfin/jellyfin-web/serviceworker.js`. Verdict: **no evidence** of a Jellyfin-curated kill recipe; we are first to ship one. ([NekR](https://github.com/NekR/self-destroying-sw), [BobHasNoSoul/jellyfin-mods](https://github.com/BobHasNoSoul/jellyfin-mods))
**Q5 — HLS fmp4 init-segment collision on restart.** **No evidence of collision in practice.** Jellyfin always passes `-start_number 0` and the init filename is `<hash>-1.mp4` (literal `-1`, not `%d`-derived); segments are `<hash>0.mp4`, `<hash>1.mp4`, ... so `-1` cannot collide with any positive `%d`. Restart spawns a *new hash* (different session id), so old and new sessions don't share filenames either. The active live bug is jellyfin/jellyfin#16612 — playback breaks after 1015 s in 10.11.8 with fMP4-HLS — but the cause traced in that thread is FFmpeg/segment-availability, not init-name collision. Tangentially: #12230 (CLOSED) is about the init filename being passed *relative* not absolute — only matters when Jellyfin's CWD ≠ transcode dir (rffmpeg setups). Verdict: **no evidence** that init-name collision causes our black-screen. Look at #16612 and at `Cache-Control: no-store` on `/Videos/*/hls1/*` instead. ([#16612](https://github.com/jellyfin/jellyfin/issues/16612), [#12230](https://github.com/jellyfin/jellyfin/issues/12230))
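The non-collision argument from Q5 can be sanity-checked directly (the session hash here is made up; the `-1` init suffix and `%d` segment pattern follow the ffmpeg flags quoted in section E):

```shell
# Init is "<hash>-1.mp4"; %d segments count up from 0, so "-1" can never recur.
hash=abc123   # hypothetical session hash
init="${hash}-1.mp4"
segs=$(for i in 0 1 2 3 4 5; do printf '%s%d.mp4\n' "$hash" "$i"; done)
if printf '%s\n' "$segs" | grep -qxF "$init"; then
  echo "collision"
else
  echo "no collision"
fi
```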
**Q6 — Cineplex theme repo activity.** Repo `MRunkehl/cineplex` last pushed **2025-09-06** (sha `98c8e71`, "Fixed more styles and script"). Description: "Updated jellyflix theme for newest jellyfin v10.10.7 and better netflix styles". **Zero open or closed issues** (issues tab is empty). No commits since 10.11.0 shipped, so the theme has not been validated against 10.11 image-type changes. Verdict: **probable** that backdrop CSS selectors target 10.10 DOM and may break or hide backdrops on a 10.11 upgrade. Audit `cineplex.css` for `.itemBackdrop`, `.backdropContainer`, `.cardBox-bottompadded` selectors before upgrading. ([repo](https://github.com/MRunkehl/cineplex))
**Q7 — Jellyfin 10.11.8 changelog.** **Does NOT fix our issues directly.** Server 10.11.8 ships only 3 changes: subtitle-language library handling, subtitle saving, and language-filter querying. jellyfin-web 10.11.8: a single PR (#7796) for lazy device-info loading. Released as a regression-revert from 10.11.7 ahead of CVE/GHSA disclosure. None of UICulture persistence, SW poisoning, or fMP4 playback are addressed in .8 itself. However the TMDB-backdrop fix (Q2) lands in the 10.11.0 baseline that .8 inherits. Verdict on .8 specifically: **no evidence** it helps directly; **confirmed** the 10.11 line fixes Q2. Upgrade target = 10.11.8 (latest stable: 10.11.0 backdrop fix + .7 security fixes + .8 regression reverts). ([10.11.8 server](https://github.com/jellyfin/jellyfin/releases/tag/v10.11.8), [10.11.8 web](https://github.com/jellyfin/jellyfin-web/releases/tag/v10.11.8))
### Recommended action sequence
**Option A — Self-destruct shim (RECOMMENDED, verified working):**
```bash
# On nullstone, in the arrflix compose dir:
cat > /opt/docker/arrflix/web-overrides/serviceworker.js <<'EOF'
self.addEventListener('install', e => self.skipWaiting());
self.addEventListener('activate', e => {
self.registration.unregister()
.then(() => self.clients.matchAll())
.then(cs => cs.forEach(c => c.navigate(c.url)));
});
EOF
# Add to compose volumes (same pattern as index.html):
# - /opt/docker/arrflix/web-overrides/serviceworker.js:/jellyfin/jellyfin-web/serviceworker.js:ro
docker compose -f /opt/docker/arrflix/compose.yml up -d --force-recreate jellyfin
# Force Traefik to send no-cache on the SW script so browsers refetch immediately:
# middleware: response header Cache-Control: no-cache, no-store, max-age=0 on /web/serviceworker.js
```
- **Side effects:** every existing browser session navigates to its current URL once on next page load — looks like a single auto-refresh. No data loss. New visitors get the shim, immediately unregister, never see it again.
- **Recovery:** revert by removing the bind-mount line + `up -d --force-recreate`. Original SW returns.
- **Verify:** `curl -skI https://arrflix.s8n.ru/web/serviceworker.js` → 200 + `Cache-Control: no-cache`. Body matches the shim. In an incognito window: open DevTools → Application → Service Workers shows registration *then* "redundant" within seconds.
**Option B — Serve 404 (may work, less reliable):**
```bash
# Traefik file-provider snippet:
# - /web/serviceworker.js → middleware that returns 404 (errors middleware → static 404 service)
# Or simply: bind-mount an empty file and add a Traefik replacePathRegex to a non-existent path.
```
- **Side effects:** Chrome/Firefox unregister on next *Update* fetch (typically next navigation after >24h, or sooner if user reloads). Slow rollout. Some users may stay stuck for a day.
- **Recovery:** remove the rule, original SW returns on next image rebuild.
- **Verify:** `curl -skI https://arrflix.s8n.ru/web/serviceworker.js` → 404. DevTools shows SW going "redundant" after a navigation+reload cycle.
**Option C — Do nothing server-side, force user manual:**
- User opens DevTools → Application → Service Workers → Unregister, OR `chrome://serviceworker-internals` → Unregister, OR clears site data.
- **Side effects:** every user must do this individually; non-technical users can't.
- **Recovery:** trivial, nothing changed.
- **Verify:** per-user; no server signal.
**Decision:** Go with **Option A**. It is the Google-recommended pattern, is the only approach that auto-fixes already-loaded tabs without user action, and is reversible by removing one line from compose.
### SW source + image cache
**(Agent run 2026-05-09 — verifies the stock SW source live on the running container, and probes server-side image health for a known item. Important: contradicts the working assumption that the SW is intercepting fetches.)**
**Part 1 — `/web/serviceworker.js` source + interception map**
Both `docker exec jellyfin cat /jellyfin/jellyfin-web/serviceworker.js` and `curl -sk https://arrflix.s8n.ru/web/serviceworker.js` return the **same** file (~1KB single line):
```js
(self.webpackChunk=self.webpackChunk||[]).push([[82798],{16764:function(n,e,t){
t(78557),t(90076),
self.addEventListener("notificationclick", function(n){ /* opens window or calls connectionManager */ }, !1),
self.addEventListener("activate", function(){ return self.clients.claim() })
}}, function(n){ n.O(0,[59928], function(){ return 16764, n(n.s=16764) }), n.O() }]);
```
**Interception map — there is none.**
- No `fetch` event listener in this file.
- Only listeners: `notificationclick` and `activate` (calls `clients.claim()`).
- `t(78557)` and `t(90076)` are webpack require calls for two other modules — those *might* register fetch handlers, but they are NOT in this bundle (they live in lazy chunks under `/web/*.chunk.js`). The chunk IDs `82798` / `59928` map to the notification module only.
- **No CacheStorage usage anywhere in this bundle.** No `caches.open`, `caches.match`, `cache.put`. So this SW does **NOT** cache `/Items/{id}/Images/*`, `/Videos/{id}/*`, `/web/*-json.*.chunk.js`, or `/web/index.html`.
**Conclusion:** Jellyfin 10.10.3 web's stock SW is push-notification-only. It does not intercept fetches and owns no CacheStorage entries. This **confirms agent Q4 finding** ("notification-only — no `fetch` listener, no cache logic") against the running container: not just the spec or source tree, but the literal bytes Jellyfin is shipping.
**Implication for Section C diagnosis:** "SW intercepts the GET to `/web/index.html` and serves from cache" is **false**. With no `fetch` handler the SW cannot intercept anything, so responses (including any `Clear-Site-Data` header) already come straight from the network. The real blocker for stale German chunks is **HTTP browser cache** (memory + disk), not Service Worker cache.
**Replacement plan:** The self-unregister shim is still safe and useful as belt-and-braces — installs cleanly, deletes any caches that ever existed, unregisters, force-reloads. Bind-mount path inside container is `/jellyfin/jellyfin-web/serviceworker.js`. But it is **not the missing piece** for the German leak. Real fix: existing `Cache-Control: no-store` + `Clear-Site-Data` headers on `/web/index.html` plus a **hard reload** (Ctrl+Shift+R) or DevTools → Application → Clear storage on user's browser.
**Part 2 — Image cache state**
```
/home/docker/jellyfin/config/metadata = 112M (well-populated)
/library/<hh>/<item-id>/poster.jpg present in sampled items
/home/docker/jellyfin/cache = 59M
/images/resized-images/{0..f} = 16 hex subdirs, all populated with .webp tiles
```
Agent 7's earlier note "**only `resized-images` subdir present**" is **still true**: `/cache/images/` contains only `resized-images/`, no `original/` or `remote/`. That is the **expected** Jellyfin layout (originals live under `/config/metadata/library/`, only resizes live under `/cache/images/resized-images/`). Not a bug.
API probe for item `7aa5add2c2d8575eda5280b9b9072071` (The Mike Nolan Show) via temp token (revoked after), all four image types via `https://arrflix.s8n.ru`:
| Endpoint | Status | Content-Type | Notes |
|---|---|---|---|
| `/Items/{id}/Images/Backdrop` | **200** | image/jpeg | served, `age: 5400` (90min upstream cache) |
| `/Items/{id}/Images/Primary` | **200** | image/jpeg | served |
| `/Items/{id}/Images/Logo` | **200** | image/png | served |
| `/Items/{id}/Images/Thumb` | **200** | image/jpeg | served |
**Verdict:** Server-side images are healthy. Backdrop + Primary + Logo + Thumb all 200 with valid content-types for a real item the user is browsing. The "all backdrops black" symptom (Section D) is **NOT** a server-side image problem and **NOT** a SW-cache problem. Likely culprits remaining:
- (a) CSS rule in deployed `index.html` overrides / theme overrides hiding `.itemBackdrop` or setting `opacity: 0`;
- (b) browser HTTP cache holding stale 404s from earlier broken state — same Ctrl+Shift+R fix as Part 1;
- (c) a custom-css.user.css backdrop opacity:0 / display:none rule.
Recommend: in user's browser open one show page, DevTools → Network → filter Img → look for `/Items/{id}/Images/Backdrop` request. If 200 served but invisible → CSS theme leak. If never requested → SPA template not fetching it (theme-side bug).
### Backdrop diagnosis
Investigation 2026-05-09. User reported: detail-page backdrops are pure black on prod (`arrflix.s8n.ru`). Posters render fine. Used a temp ApiKey row (`Name='arrflix-backdrop-diag-2026-05-09'`, deleted after diag) on the live `jellyfin` container.
**Layer A (server) — RULED OUT.**
- Item `7aa5add2c2d8575eda5280b9b9072071` (The Dark Knight) JSON returns `BackdropImageTags: ['76cac7069dc988f7cd54e99b481db3fc']`. Tag exists.
- `HEAD https://arrflix.s8n.ru/Items/.../Images/Backdrop``HTTP/2 200`, `content-type: image/jpeg`, `content-length: 560210`, `last-modified: 2026-05-08 22:11:50`.
- Same call against `dev.arrflix.s8n.ru` → also 200 + image/jpeg. Both prod and dev serve backdrop bytes correctly.
**Layer C (browser cache / SW) — RULED OUT.**
- The stock SW (Section "SW source + image cache" above) does not intercept `/Items/*/Images/*`. Backdrop URL also returns fresh on direct curl (no SW in path).
**Layer B (CSS) — CONFIRMED. The CustomCss `BLACK-PASS` block hides the image layer.**
The Jellyfin DOM has two distinct elements (verified by reading `main.jellyfin.bundle.js` + `main.jellyfin.1ed46a7a22b550acaef3.css` inside the running container):
1. `.backdropContainer` — stock CSS: `position:fixed; bottom:0; left:0; right:0; top:0; z-index:-1`. Holds a child `<div class="backdropImage">` whose `style.backgroundImage="url(/Items/.../Backdrop)"` is injected by JS (`r.style.backgroundImage="url('".concat(e,"')")` in the bundle). This is the IMAGE LAYER.
2. `.backgroundContainer` (back*ground*, not back*drop*) — separate `position:fixed` overlay; gets the `withBackdrop` class toggled by JS. This is the OVERLAY LAYER. Stock CSS sets `body { background-color: transparent !important; }` precisely so the body never occludes the `z-index:-1` backdrop.
Bug 1 — **`!important` blacks override stock body transparency.** CustomCss `BLACK-PASS 2026-05-08` block (lines ~110-202 of branding.xml CustomCss) sets `background-color: #000000 !important` on `html, body, #reactRoot, .skinBody, .preload, .mainAnimatedPages, .pageContainer, .libraryPage, .itemDetailPage, .padded-bottom-page, .layout-desktop, .layout-mobile, .layout-tv` etc. Since `.backdropContainer` is at `z-index:-1`, ANY ancestor with an opaque background paints on top of it, hiding the backdrop image entirely.
Bug 2 — **The transparent-scope rule at lines 102-107 is incomplete.** It scopes to `body.itemDetailPage, body.itemDetailPage #reactRoot, body.itemDetailPage .mainAnimatedPages, body.itemDetailPage .skinBody`, but does NOT include `.layout-desktop` / `.itemDetailPage` itself / `.layout-tv` / `.pageContainer` / `.padded-bottom-page` — so those wrappers remain `#000` on detail pages and continue to occlude the `z-index:-1` layer.
Bug 3 (cosmetic — not the cause of black) — lines 89-101 set `background-image: linear-gradient(...)` on `.layout-desktop .backgroundContainer.withBackdrop`. That's the OVERLAY layer, fine on its own. But because the actual backdrop image is hidden by Bug 1, the gradient now composites against pure black instead of the backdrop, so the user sees only the gradient (which fades from black to transparent) over a black backdrop = solid black with at most a faint gradient edge.
**Cross-check:** dev (`dev.arrflix.s8n.ru`) does NOT mount the `BLACK-PASS` CustomCss block (Section B above confirms dev branding.xml is 4.5KB smaller and uses remote jsDelivr Cineplex without local overrides). Opening dev should show backdrops normally; if it does, that's a clean A/B confirmation that prod's CustomCss is the regression.
**Fix recipe (smallest reversible change).**
In `/home/docker/jellyfin/config/config/branding.xml` `<CustomCss>` block, extend the `body.itemDetailPage` transparent-scope rule (currently lines 102-107) to also cancel the black backgrounds on every wrapper that the BLACK-PASS block paints:
```css
/* Replace existing block at lines 102-107 with: */
body.itemDetailPage,
body.itemDetailPage #reactRoot,
body.itemDetailPage .mainAnimatedPages,
body.itemDetailPage .skinBody,
body.itemDetailPage .layout-desktop,
body.itemDetailPage .layout-mobile,
body.itemDetailPage .layout-tv,
body.itemDetailPage .pageContainer,
body.itemDetailPage .padded-bottom-page,
body.itemDetailPage .itemDetailPage,
body.itemDetailPage #mainPanel,
body.itemDetailPage #mainDrawerPanel {
background-color: transparent !important;
background: transparent !important;
}
```
This keeps `#000` everywhere else (library, search, dashboard) but reveals the `.backdropContainer > .backdropImage` layer on detail pages — which is what the gradient overlay (Bug 3) was originally designed to compose against.
**Apply via Dashboard → Branding → Custom CSS** (no container restart needed; CSS reloads on next page render). Editing branding.xml directly works too but Jellyfin re-serializes on save, so use the Dashboard.
**Verify after edit:** open a movie detail page in an incognito window (bypasses SW). Expected: full-bleed backdrop visible at right ~70% of viewport, gradient fade from black on the left. If still black: hard-refresh + DevTools → Elements → search `.backdropImage` and confirm its parent chain has no `background-color` other than transparent.
**Recovery:** revert to the original 6-selector block.
---
### Playback diagnosis
Investigation date 2026-05-09 ~00:30–00:45 UTC. Live transcode test against prod jellyfin via temp ApiKey `arrflix-playback-diag-2026-05-09` (deleted at end of session, verified empty SELECT after DELETE).
**A) Source codec verdict — the ItemId is mis-attributed in this incident report.** ItemId `7aa5add2c2d8575eda5280b9b9072071` is **The Dark Knight (2008)**, NOT "The Mike Nolan Show". Confirmed via `/Users/{u}/Items?searchTerm=...`:
- `7aa5add2...` → Movie / `/media/movies/The Dark Knight (2008)/The Dark Knight (2008).mkv`**HEVC Main 10 / yuv420p10le, 1918x800, TrueHD 24-bit + AC3 + 2× PGS**.
- The Mike Nolan Show series Id is `37cb910f507c4d1f9e365ef1954f99c2`. Episodes (e.g. S01E04 "Ding Dong Delli") are **AV1 Main / yuv420p / Opus**, ~412 kbps total.
(So the prior Section D backdrop-probe line that labelled `7aa5add2...` as MNS is also wrong — those Backdrop/Primary/Logo/Thumb 200s were TDK images. Doesn't change Section D's conclusion that backdrops serve fine.)
Chrome advertises `av1,h264,vp9` (NOT hevc, NOT vp8). So:
- **TDK (HEVC 10-bit)**: must transcode → server picks libx264 High@4.0 yuv420p (8-bit) AAC LC stereo. Fully Chrome-decodable.
- **MNS episodes (AV1+Opus)**: should DirectPlay/DirectStream — Chrome supports both natively.
**B) HLS pipeline verdict — server-side fully working.** PlaybackInfo POST returned `TranscodingUrl=/videos/.../master.m3u8?VideoCodec=h264&...`, `SupportsTranscoding=True`, `TranscodingSubProtocol=hls`. Manual fetches on TDK:
- master.m3u8 → HTTP 200, valid `#EXTM3U`, single variant `BANDWIDTH=13407532, RESOLUTION=1918x800, CODECS="avc1.424029,mp4a.40.2"` (the `424029` decodes to "Baseline 4.1" but actual stream below is High — known cosmetic Jellyfin mislabel, not a Chrome blocker).
- main.m3u8 sub-playlist → HTTP 200, segments `hls1/main/0.ts``9.ts`, 3-second EXTINF.
- segment 0.ts → HTTP 200, 269 KB. ffprobe verdict: `h264 High / yuv420p / level 4.0, 1918x800` + `aac LC`. Valid 8-bit H.264. Cache dir during playback contains 40+ valid `.ts` segments. No fmp4 init filename collision (mpegts segments in current run; the earlier fmp4 path's `-1.mp4` init pattern with `start_number=0` is also fine — `-1.mp4` literally has the `-1` infix in filename, while data segments are `0.mp4, 1.mp4...`; no actual name collision).
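The `424029` → "Baseline 4.1" reading follows the `avc1.PPCCLL` convention (PP = profile_idc, CC = constraint flags, LL = level_idc, all hex). A minimal decoder for sanity-checking playlist labels against ffprobe output — the profile table covers only the common values, not the full spec:

```python
# Decode avc1.PPCCLL (PP = profile_idc, CC = constraint flags, LL = level_idc,
# all hex). Profile table covers only the common values seen in this incident.
H264_PROFILES = {0x42: "Baseline", 0x4D: "Main", 0x64: "High"}

def decode_avc1(tag: str) -> str:
    hexpart = tag.split(".", 1)[1]                  # e.g. "424029"
    profile = H264_PROFILES.get(int(hexpart[0:2], 16), "unknown")
    # hexpart[2:4] is the constraint_set flags -- irrelevant for a label check
    level = int(hexpart[4:6], 16) / 10              # level_idc 0x29 = 41 -> 4.1
    return f"{profile} {level:.1f}"

print(decode_avc1("avc1.424029"))   # Baseline 4.1 -- the playlist's cosmetic label
print(decode_avc1("avc1.640028"))   # High 4.0     -- what ffprobe saw in 0.ts
```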
**C) CSS verdict — video element NOT hidden.** Read `branding.xml` CustomCss + `cineplex.css` (full). All `display:none` / `visibility:hidden` / `opacity:0` / `transform:scale(0)` matches are on UI chrome (`#castCollapsible`, `#guestCastCollapsible`, `.btnQuick`, `.headerSyncButton`, `.headerCastButton`, `.headerUserButton`, MUI drawer items, `.countIndicator`, `#loginPage h1`, etc.). The only `video::*` / `:cue` rules touch subtitle font only. **No hide/scale rule hits `.htmlvideoplayer`, `.videoPlayerContainer`, or the `<video>` element itself.** CustomCss is not the cause of the black screen.
**D) Service Worker verdict — no fetch interception.** `/web/serviceworker.js` is the stock Jellyfin notification-only handler (`notificationclick` + `activate→clients.claim`). No `install` cache, no `fetch` listener. Cannot intercept HLS or video URLs. Already characterised in the prior "SW kill recipe" section — stock SW is harmless for media playback.
**E) Web research findings.** No 10.10.3-specific Chrome black-screen bug surfaced for the HLS path. Closest historical pattern: hls.js + AV1+Opus DirectStream where Jellyfin 10.10 mis-builds the codec attribute on the playlist for AV1, causing hls.js to abort. Common workaround: force transcode via DeviceProfile or restrict AV1 in user policy. No citation strong enough to assert as root cause from outside the live browser.
**F) The actual story — and the fix recipe.**
Timeline reconstruction from server logs for the user's session (192.168.0.10):
- `00:28:46` — PlaybackInfo for `7aa5add2...` (TDK).
- `00:28:47` → ffmpeg launches on `/media/movies/The Dark Knight (2008)/...mkv` (libx264 High@5.1, fmp4).
- `00:28:53`, `00:29:01` — ffmpeg restarts at `-ss 00:04:18` and `00:09:06` (= **user seeking forward** during TDK playback).
- `00:29:07`*"Playback stopped … playing The Dark Knight. Stopped at 549885 ms"* (= 9:09).
- `00:29:28`*"Playback stopped … playing F.T.C. Stopped at 39053 ms"* (MNS S01E02).
- `00:42:42`*"Playback stopped … playing Ding Dong Delli. Stopped at 20905 ms"* (MNS S01E04).
What this means: TDK transcoded and played fine for 9 minutes with seeks — **TDK is not black-screening**. The MNS episodes (AV1+Opus, 20-39 s before stop) match the user-perceived "black screen, give up" pattern. The incident report conflated these — user said "Mike Nolan Show + ItemId 7aa5add2" but the ItemId is TDK and the actual symptom is on the AV1 MNS episodes.
The 00:42:49 ffmpeg launch on TDK that appears AFTER MNS stop is **my own diagnostic curl** — its PlaySessionId `14f52f35eee04cec8146379c0dc6c960` matches the one I generated. Disregard as evidence of user behaviour.
**Recommended fix sequence (ordered by likelihood):**
1. **Re-run with the right item.** Ask user to repro on MNS S01E04 (`Ding Dong Delli`), capture browser DevTools Network panel: was `/Videos/.../master.m3u8` issued (transcode path) or only `/Videos/.../stream.webm` (DirectStream)? What does `/Items/.../PlaybackInfo` return for `SupportsDirectStream` on the AV1 source? Capture the JS console for hls.js / shaka / MediaSource errors.
2. **If DirectStream is on for AV1** → force transcode by adding a `CodecProfile` in the user's DeviceProfile that bans AV1 DirectStream (Type=Video, Codec=av1, Container=mkv,webm → forced conditional Direct=false). Server then falls back to libx264 transcode (CPU-only on nullstone, slow but reliable).
3. **Cross-browser test** — try Firefox. Different hls.js behaviour for AV1. If Firefox plays MNS but Chrome doesn't, confirms client-side AV1 DirectStream bug not server.
4. **TDK is fine** — leave alone, unrelated to this incident.
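For step 2, the PlaybackInfo payload would carry something like the fragment below. Field names follow Jellyfin's DeviceProfile DTO; the unmeetable `Width <= 0` condition is an illustrative assumption for making the server refuse AV1 DirectStream and fall back to transcode — not a verified recipe, so confirm against the running server's PlaybackInfo handling before relying on it.

```json
{
  "DeviceProfile": {
    "CodecProfiles": [
      {
        "Type": "Video",
        "Codec": "av1",
        "Container": "mkv,webm",
        "Conditions": [
          {
            "Condition": "LessThanEqual",
            "Property": "Width",
            "Value": "0",
            "IsRequired": true
          }
        ]
      }
    ]
  }
}
```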
**Out-of-scope here:** dev.arrflix.s8n.ru `/Sessions` returned 401 with the api_key (Sessions needs a user-token, not just admin api_key). Recommend redoing the dev comparison through the user's browser cookie session.
API key cleanup verified: `SELECT Name FROM ApiKeys` returned empty after DELETE.
---
## Final fix applied (verified via playwright headless)
Status: **CLOSED** for symptoms 1-4. Symptom 5 (video black-screen on AV1+Opus
items) is a separate codec issue tracked for the 10.11.8 migration.
### Three patches landed
1. **`branding.xml` CustomCss**: append `content: "Play"` override on
`.mainDetailButtons .material-icons.play_arrow::after`. Cineplex theme
hardcoded German "Abspielen" via CSS `content:` rule — NOT a Jellyfin
   locale issue. Hours of Traefik `Accept-Language` rewrites and
   `force-english-all-users.sh` runs were chasing the wrong layer entirely.
2. **`branding.xml` CustomCss**: backdrop transparent-scope using `:has()`.
`body.itemDetailPage` selector (from prior docs) does NOT match in
10.10.3 — body class is `libraryDocument`. New rule scopes by
`.layout-desktop:has(.itemDetailPage)` etc so backdrop layer (z-index:-1)
renders behind detail pages without breaking other surfaces.
3. **`encoding.xml`**: `EnableThrottling=false` + `EnableSegmentDeletion=false`.
Kills HLS 499 (segments reaped before browser re-requests).
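As a sketch, the two elements land in `encoding.xml` like this (fragment only; the element names are from this patch, while the surrounding `<EncodingOptions>` root is Jellyfin's stock layout and shown here as an assumption):

```xml
<!-- /home/docker/jellyfin/config/config/encoding.xml (fragment) -->
<EncodingOptions>
  <!-- keep ffmpeg producing segments even when the client buffer is full -->
  <EnableThrottling>false</EnableThrottling>
  <!-- keep old segments on disk so slow re-requests / seek-backs don't 499 -->
  <EnableSegmentDeletion>false</EnableSegmentDeletion>
</EncodingOptions>
```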
### Headless verification
`bin/headless-test.py` (new) logs in via Jellyfin SPA login form using
playwright Chromium, navigates to detail page, screenshots, and probes
computed styles. Used to bisect:
- baseline screenshot (broken)
- `:has()` selector verified backdrop renders
- "Play" verified replaces "Abspielen"
### Re-apply
`bin/apply-26-incident-fixes.sh` (new, idempotent) re-applies all three
patches if `branding.xml` / `encoding.xml` drift back. Run via:
`ssh user@nullstone "$(cat bin/apply-26-incident-fixes.sh)"`
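The idempotence guard can be sketched as a marker-comment pattern (hypothetical helper name and marker text — the real script's internals are not reproduced here): each patch is fenced by a unique marker so a second run is a no-op.

```bash
# Sketch of the idempotency pattern (hypothetical names). Each patch is
# guarded by a unique marker comment; re-running appends nothing.
apply_block() {
  file="$1"; marker="$2"; payload="$3"
  grep -qF "$marker" "$file" && return 0   # marker present -> already applied
  printf '%s\n%s\n' "$marker" "$payload" >> "$file"
}

css=$(mktemp)
apply_block "$css" "/* INC-26 play-label */" '.play_arrow::after { content: "Play"; }'
apply_block "$css" "/* INC-26 play-label */" '.play_arrow::after { content: "Play"; }'  # no-op
grep -c 'INC-26 play-label' "$css"   # prints 1, not 2
```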
### What was rolled back
- The `clear-site-data@file` Traefik middleware I added during this session
was making prod worse: it was wiping cookies+storage on every visit,
breaking auth+playback session continuity. Reverted by restoring the
Traefik dynamic.yml backup taken right before the edit.
---
## Do-NOT-repeat checklist (post-mortem)
These are the dead-ends. Future operators (and future me) should skip:
1. **Don't add `Clear-Site-Data` to a Jellyfin route to "force the SW out".**
Stock Jellyfin SW is notification-only (no fetch handler) — there is no
SW poisoning to begin with. The middleware just wipes cookies on every
visit, breaking auth races.
2. **Don't run `bin/force-english-all-users.sh` to fix "Abspielen".**
Doc 25 already established per-user `Configuration.UICulture` is theatre
and the SPA never reads it. The German text was in **Cineplex CSS** via
`content: "Abspielen"`. Patch the CSS, not the user config.
3. **Don't trust HTTP 204 from POST `/Users/{id}/Configuration` as success.**
Always GET back and verify. (And see #2 — even if you CAN persist
UICulture, it doesn't drive UI strings in 10.10.x.)
4. **Don't use `body.itemDetailPage` as a CSS selector in 10.10.3.**
The body class on detail pages is `libraryDocument`, not `itemDetailPage`.
Use `.itemDetailPage` directly or `:has(.itemDetailPage)` on ancestors.
5. **Don't paint `#000 !important` on `.layout-desktop` / `.pageContainer`
without scoping.** They wrap the backdrop layer; an unscoped black
override occludes the entire backdrop. Always scope with `:has()` or by
page-specific class.
6. **Don't hot-patch `web-overrides/index.html` on the server without
committing back to repo same step.** Drift from repo is invisible until
it breaks. Bug A (the DOM-walker MutationObserver freezing the browser)
came from this exact pattern — see `~/.claude/projects/.../memory/feedback_always_commit_to_my_git.md`.
7. **Don't write CSS Mutation/text-walker observers without debounce + scope.**
Walking every text node on every DOM mutation freezes the main thread on
poster grids. If you need DOM rewriting, use targeted selectors + debounce.
8. **Don't sed-via-python regex on YAML files without strict anchors.**
I damaged `dynamic.yml` with a too-greedy DOTALL match earlier in this
session (deleted unrelated routers). Restore-from-backup saved it.
Always diff before reload.
9. **Don't believe a single-itemId test as "playback works".** Item
`7aa5add2c2d8575eda5280b9b9072071` is The Dark Knight (HEVC, transcodes
fine to H.264). The Mike Nolan Show episodes are AV1+Opus and break in
Chrome. Always test the actual item the user reported.
10. **Don't skip headless smoke-test.** Visual confirmation in playwright
Chromium catches CSS regressions instantly without waiting for the user
to clear browser cache. `bin/headless-test.py` is a 30s round-trip.
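Item 8's anchoring point, as a runnable sketch (hypothetical router names standing in for the real `dynamic.yml` contents): a greedy DOTALL match silently swallows sibling blocks, while anchoring on indentation stops at the next sibling key.

```python
import re

yaml_text = """\
http:
  routers:
    arrflix:
      rule: Host(`arrflix.s8n.ru`)
    other-app:
      rule: Host(`other.s8n.ru`)
"""

# DANGEROUS: greedy DOTALL runs to the LAST "rule:" line and beyond,
# deleting the unrelated other-app router along the way.
greedy = re.sub(r"arrflix:.*rule:.*", "arrflix: {}", yaml_text, flags=re.DOTALL)

# SAFER: anchor on indentation so the match stops at the next sibling key.
anchored = re.sub(
    r"^(    arrflix:\n(?:      .*\n)*)", "    arrflix: {}\n",
    yaml_text, flags=re.MULTILINE)

assert "other-app" not in greedy    # damage: sibling router gone
assert "other-app" in anchored      # sibling router preserved
```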
---
## Iteration 2 — backdrop visible only on top viewport (2026-05-09 follow-up)
After INC1 (`:has()` transparent-scope) shipped and prod showed backdrop on
detail-page top, owner reported "in the middle of the More from Season 1
is black, it's hiding the artwork". Below-the-fold sections (Next Up, Seasons,
More Like This) showed solid black instead of continuing the backdrop.
### Root cause (INC2)
As deployed, `.backdropContainer` computes to non-fixed positioning (despite
the stock `position:fixed` rule noted in the backdrop diagnosis above), so it
scrolls out of view. INC1 made wrappers transparent so the backdrop showed
through, but only while the backdrop element was inside the viewport. Once the
user scrolls down, the backdrop sits above the viewport and the sections show
the body's `#000` background.
### Fix INC2
Pin `.backdropContainer` + `.backgroundContainer` to `position: fixed; top:0;
height:100vh; z-index:0`. Added `::after` vertical gradient (transparent at
top → 75% black at bottom) so text remains readable as user scrolls into
backdrop area.
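A sketch of the INC2 rule (selectors and values from this section; the `:has()` scoping mirrors INC1 and the gradient stops are illustrative, not the exact shipped ones):

```css
/* INC2 sketch -- pin both backdrop layers so they persist during scroll. */
.layout-desktop:has(.itemDetailPage) .backdropContainer,
.layout-desktop:has(.itemDetailPage) .backgroundContainer {
  position: fixed !important;
  top: 0;
  left: 0;
  right: 0;
  height: 100vh;
  z-index: 0;
}
/* Vertical fade so text stays readable over the pinned backdrop. */
.layout-desktop:has(.itemDetailPage) .backdropContainer::after {
  content: "";
  position: absolute;
  inset: 0;
  pointer-events: none;
  background: linear-gradient(to bottom,
      transparent 0%, rgba(0, 0, 0, 0.75) 100%);
}
```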
### Root cause (INC3)
INC2 alone didn't fix it visually — section wrappers (`.detailVerticalSection`,
`.scrollSliderContainer`, `.padded-bottom-page`, `.itemsContainer` etc) still
painted opaque bg from BLACK-PASS + finity. Pinned backdrop sat behind, but
sections occluded it section-by-section.
### Fix INC3
Extended transparent-scope to all detail-page sub-sections:
`.itemDetailPage > *`, `.detailPageContent`, `.detailPagePrimaryContainer`,
`.detailPageWrapperContainer`, `.detailVerticalSection*`, `.detailSection*`,
`.itemsContainer`, `.scrollSlider*`, `.padded-bottom-page`,
`.sectionTitleContainer`, `.detailRibbon`, `.subtitleAudioContainer`,
`.detailPageRoot`.
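As a sketch, with the `*` families from the list above expanded into attribute selectors (illustrative — the shipped rule may enumerate classes differently) and `:has()` scoping assumed per the INC1 finding that `body.itemDetailPage` doesn't match in 10.10.3:

```css
/* INC3 sketch -- make every detail-page sub-section wrapper transparent
   so the pinned backdrop shows through section by section. */
.layout-desktop:has(.itemDetailPage) .itemDetailPage > *,
.layout-desktop:has(.itemDetailPage) .detailPageContent,
.layout-desktop:has(.itemDetailPage) .detailPagePrimaryContainer,
.layout-desktop:has(.itemDetailPage) .detailPageWrapperContainer,
.layout-desktop:has(.itemDetailPage) [class*="detailVerticalSection"],
.layout-desktop:has(.itemDetailPage) [class*="detailSection"],
.layout-desktop:has(.itemDetailPage) .itemsContainer,
.layout-desktop:has(.itemDetailPage) [class*="scrollSlider"],
.layout-desktop:has(.itemDetailPage) .padded-bottom-page,
.layout-desktop:has(.itemDetailPage) .sectionTitleContainer,
.layout-desktop:has(.itemDetailPage) .detailRibbon,
.layout-desktop:has(.itemDetailPage) .subtitleAudioContainer,
.layout-desktop:has(.itemDetailPage) .detailPageRoot {
  background-color: transparent !important;
  background-image: none !important;
}
```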
### Verification (INC2 + INC3)
Updated `bin/headless-test.py` to take TWO viewport screenshots: top-of-page
+ scrolled to 50% page height. With INC2/INC3 applied, scrolled screenshot
shows R&M backdrop persisting behind "Seasons" + "More Like This" sections
(previously: solid black).
### Lesson learned
When pinning a backdrop with `position:fixed`, transparency must extend
RECURSIVELY through every wrapper ON TOP of the backdrop layer, not just the
top-level page wrappers. Test with scrolled screenshot — full-page screenshot
in playwright stretches viewport and hides `position:fixed` issues.
`bin/headless-test.py` now takes both top + scrolled. Use both to bisect.
---
## Open follow-ups (for separate sessions)
- **AV1+Opus playback** (Bug E): suspected AV1 DirectStream codec-tag mislabel
  (Jellyfin 10.10 mis-builds the playlist codec attribute, aborting hls.js in
  Chrome). Fix options: (a) ban AV1 DirectStream via DeviceProfile (force x264
  transcode), (b) re-encode MNS source to H.264, (c) wait for the 10.11.8
  upgrade. See agent finding in this doc → "Playback diagnosis".
- **10.11.8 migration**: current 10.10.3 has known issues per online research
(TMDB scrape regression #14922, custom CSS injection #7220). 10.11.8 is
current stable as of 2026-05-09 with CVE fixes. Plan: dev first, snapshot
EF Core DB migration, swap Cineplex → ElegantFin (10.11-supported), promote
to prod after verified.
- **Permanent SW kill option** (deferred — stock SW doesn't actually
intercept anything): if a future Jellyfin update enables a real fetch-handler
SW, we have the recipe in this doc → "SW kill recipe" agent finding.
- **Session-state backup off-host** (ROADMAP H4): no automated backup yet.
Today's incident was rescued by inline `cp X X.bak.$(date +%s)` for both
branding.xml and dynamic.yml — should be systematized.