ARRFLIX/docs/26-incident-2026-05-09-page-unresponsive-and-playback.md

1305 lines
95 KiB
Markdown
Raw Normal View History

# 26 — Incident 2026-05-09: Page Unresponsive + Posters Missing + Playback Black-Screen
> Session log. Live document — updated as fix proceeds. Goal: future-me + other operators can read this and skip every dead-end I already walked.
Status as of doc creation: **ONGOING** — partial fix applied, more under investigation.
---
## Symptoms reported by owner (in order)
1. "Browser arrflix is broken videos don't play at all"
2. "I can't even see a preview of the TV series / movie"
3. After first fix: page loads, posters render, but **"Page Unresponsive"** Chrome dialog before posters paint (screenshot 1)
4. After second fix attempt: posters render, but **"Abspielen"** (German Play button) instead of "Play"; **all backdrop art replaced by black**; **video plays as black screen** (screenshot 2)
---
## Root causes identified so far
### A — Browser hangs (resolved by fix #1)
`/opt/docker/jellyfin/web-overrides/index.html` deployed copy was AHEAD of repo HEAD. md5 deployed `b97c1cb4` ≠ repo `d77c106b`. Someone hot-patched a `forceEnglishUI()` text-walker MutationObserver onto `document.body` with `subtree:true, characterData:true`. Walker rewrote `alt`/`title`/`aria-label` on every DOM mutation. Poster grid lazy-load fired it hundreds of times → main thread frozen → Chrome "Page Unresponsive".
**Fix applied:** scp'd repo HEAD `index.html` over deployed, restarted container. Verified md5 matches.
**Lesson:** never hot-patch the bind-mount. Always commit + redeploy from repo. Drift is invisible until something breaks.
### B — DB write failures (auto-resolved before this session)
Agent investigation found `jellyfin.db` had been owned by uid 101000 (userns-remap leftover, see `~/.claude/projects/-home-admin-ai-lab/memory/project_nullstone_docker_userns.md`). Container ran as 1000 → SQLite Error 8: `attempt to write a readonly database`. By the time we re-checked, file was already `user:user`. Probably fixed during 23:22 container restart.
**Lesson:** if `jellyfin.db` is unwritable, EVERY user-config save silently fails (HTTP 204 success, value not persisted). Check ownership FIRST when config writes don't stick.
### C — German "Abspielen" leak (NOT YET FIXED — current focus)
User's `Configuration.UICulture` is `<absent>` for ALL 12 users. Tried POST `/Users/{id}/Configuration` with `UICulture: en-US` payload via `bin/force-english-all-users.sh`. Server returned HTTP 204 but field did NOT persist on subsequent GET. **POST silently drops UICulture**.
Possible explanation: the `UserConfiguration` model in 10.10.3 may have removed the per-user UICulture field, OR the `Users` table schema (verified) has no UICulture column AND no Preferences row stores it. Doc 15 claims `Configuration.UICulture` is authoritative, but that doc is from when fix worked. Behavior may have shifted.
Traefik DOES rewrite `Accept-Language: en-US,en;q=0.9` on every request (`force-en-accept-lang@file` middleware) AND rewrites locale chunk JS path so `de-json.X.chunk.js``en-us-json.667484b4a441712c7e05.chunk.js`. Verified via curl: `de-json.X.chunk.js` returns 107425 bytes of English content.
**So why German leaking?** Service Worker cache. Browser's SW serves stale German chunk from CacheStorage, never hits network, never sees the Traefik rewrite. SW from before the lockdown was deployed.
Tried: `Clear-Site-Data: "cache", "cookies", "storage"` Traefik response header on `/web/index.html`. Verified live via curl. **But the user's browser STILL has SW cache** — SW intercepts the GET to `/web/index.html` and serves from cache, response from server (with Clear-Site-Data) never reaches browser cache layer. SW prevents its own death.
### D — Backdrops missing (NOT YET INVESTIGATED)
User reports backdrop art (the wide background image behind episode cards) is now black for every show. Could be:
- Image not in DB/cache (server returning empty)
- CSS hiding backdrop element
- SW serving stale 404 from a bad earlier session
- Jellyfin metadata refresh interrupted
### E — Video black screen on play (NOT YET FIXED)
Server logs show ffmpeg IS transcoding HEVC source → H.264 high@5.1 + libfdk_aac. But browser shows black. Earlier `/Sessions` proved DirectPlay worked for one client (RemoteEndPoint 82.31.156.86). Recent attempts: HLS segment 186.mp4 returned **499 (client closed connection)** + `POST /Sessions/Playing/Progress` returned **502 Bad Gateway** at 23:31:49 (during traefik momentary upstream-missing window).
Possible causes:
- SW intercepting HLS init segment, serving stale/wrong-mime
- 10-bit HEVC source → H.264 transcode timing issue
- CSS hiding `<video>` element
- HLS init.mp4 vs segment naming bug (`hls_fmp4_init_filename "X-1.mp4"` + `hls_segment_filename "X%d.mp4"` — collision risk)
---
## Actions taken this session
| # | Action | Outcome |
|---|---|---|
| 1 | scp repo `index.html` → deployed; `docker restart jellyfin` | DOM-walker shim gone. Page no longer hangs. |
| 2 | Insert temp ApiKeys row in jellyfin.db, run `bin/force-english-all-users.sh` | POST 204 but UICulture NOT persisted. Possibly server-model dropped field. |
| 3 | Add `clear-site-data@file` Traefik middleware to `jellyfin-html-nocache` router | Header lives. But SW intercepts before browser cache layer can apply. |
| 4 | Revoke temp ApiKey | Done. |
---
## What did NOT work (don't repeat)
- `bin/force-english-all-users.sh` against 10.10.3 — POST 204 but field dropped server-side. Either model changed or DB write path broken differently than uid-101000 issue.
- `Clear-Site-Data` response header alone — SW intercepts and the header never reaches browser cache eviction. Need to kill SW BEFORE it can intercept.
## Forbidden patterns
- Hot-patching `web-overrides/index.html` without committing to repo. Bug A came from this exact pattern. Repo MUST = deployed.
- Trusting HTTP 204 as success. Verify with GET.
- Client-side DOM-walker MutationObservers without debounce + scope. Will tank performance + freeze browser.
---
## Plan (in flight)
1. Read every prior doc (`docs/01..25`) — extract what was tried + outcome (agent task)
2. Read git log of `web-overrides/`, `bin/force-english-all-users.sh`, `bin/inject-shim.py` (agent task)
3. Online: how to kill a Jellyfin Service Worker definitively (agent task)
4. Read `/web/serviceworker.js` source — what does it cache? (agent task)
5. Diagnose backdrop missing — server vs CSS vs SW (agent task)
6. Diagnose HEVC playback black screen — codec + segment + HLS (agent task)
7. Compare jellyfin-dev vs jellyfin (agent task — dev MAY be working, look at what's different)
8. Apply consolidated fix from agent findings
9. Verify in user browser
10. Commit doc 26 + any code changes; push to `git.s8n.ru/s8n/ARRFLIX`
---
## Findings from agents
### Repo archeology
Reference compiled 2026-05-09 from docs/13-25 + bin/* + git log. Use this to skip dead-ends.
**A - Locale lockdown - what's been tried + outcomes**
Chronological history (paths absolute):
1. `/home/admin/arrflix-repo/docs/15-force-english.md` (commit 14f63e8, 2026-05-08 04:22) - diagnosis: per-user `Configuration.UICulture` absent on all 5 users -> SPA falls back to `Accept-Language`. **Built `bin/force-english-all-users.sh`** (read-modify-write `POST /Users/{id}/Configuration` with `UICulture: en-US`, expect 204). Shipped one-line wrapper patch for `bin/add-jellyfin-user.sh` step 3/4 (`c['UICulture']='en-US'`). **Status at write-time: plan-only, script never executed.**
2. `/home/admin/arrflix-repo/docs/19-english-only-audit.md` (a3f82df) - confirmed UICulture still absent on 8/8 users; identified that **92 non-English `<lang>-json.<hash>.chunk.js` chunks reachable** (`de-json.1afccc006ab8bb6c5953.chunk.js` contains `"Play":"Abspielen"`). Proposed three orthogonal fixes: (a) Path-A Traefik `customrequestheaders.Accept-Language=en-US` middleware, (b) Path-B 1-byte chunk stub bind-mounts (brittle - chunk hashes rotate per JF image), (c) `navigator.language` shim in `inject-shim.py`. **Outcome: recommendations only.**
3. `/home/admin/arrflix-repo/docs/20-english-only-lockdown.md` (d5d6856) - operator doc declaring 4 layers (server, per-user, web SPA shim, Accept-Language). Ships `bin/english-lockdown-runner.sh` (idempotent re-apply for layers 1+2). Layer 3 = `web-overrides/english-lockdown.{js,css}` (sibling commit d2120c6). **Outcome: claimed working at write-time.**
4. `/home/admin/arrflix-repo/docs/25-english-leak-deep-dive-2026-05-08.md` (117fa33) - **critical retraction**: greppped the live web bundle and proved the SPA NEVER reads `Configuration.UICulture`. Only `wizard-start.<hash>.chunk.js` and `25583.<hash>.chunk.js` reference it, both for the admin `/System/Configuration` form, NOT user UI. Actual locale resolver reads `document.documentElement.getAttribute("data-culture")` -> `navigator.language` -> `navigator.userLanguage` -> `navigator.languages[0]` -> `localStorage.getItem("language")` (no user prefix). **Per-user UICulture POST = theatre. Only the shim's `Object.defineProperty(Navigator.prototype, 'language', ...)` actually pins SPA UI.** Verified with headless Trivalent `--lang=de-DE --accept-lang=de-DE,de,en` -> only `en-us-json.667484b4a441712c7e05.chunk.js` requested.
5. **Today's deployed shim** (`/home/admin/arrflix-repo/bin/inject-shim.py` lines 13-114) - does ALL of the above: `localStorage.setItem` for 6 keys (`appLanguage,selectedlanguage,selectedlocale,language,locale,culture`), `Object.defineProperty(Navigator.prototype, 'language')`, `Object.defineProperty(Navigator.prototype, 'languages')`, fallback `navigator.X` redefine, fetch+XHR wrappers stripping `Accept-Language` and rewriting `POST /Users/{id}/Configuration` body to force `UICulture:'en-US'`, `pinLocale()` re-runs every 1 s + on visibility-change. **This is the canonical recipe - anything that works lives here.** Doc 26 sec C confirms Traefik `force-en-accept-lang@file` middleware also rewrites `Accept-Language` per request, AND rewrites `de-json.X.chunk.js` -> `en-us-json.667484b4a441712c7e05.chunk.js` (curl-verified: de URL returns 107 425 bytes of English).
**B - Service worker handling - what's been tried + outcomes**
- `docs/13` finding 11 + `docs/23` sec 5 + `docs/25` hypothesis 2 - `/web/serviceworker.js` is **768 bytes**, `Last-Modified: 2024-11-19` (Jellyfin 10.10.3 ship). Source confirmed: only `notificationclick` handler + `clients.claim()`, **no `fetch` listener, no precache, no `cache.put`**. Stock SW cannot poison posters/HLS by design.
- `bin/inject-shim.py` lines 174-188 - shim already calls `navigator.serviceWorker.getRegistrations().then(regs => regs.forEach(r => if scriptURL.includes('serviceworker.js') r.unregister()))` AND `caches.keys().then(keys => keys.forEach(caches.delete))`. **Built-in SW kill + cache wipe runs every page load.** In production now.
- `docs/25` R1 - proposed `Cache-Control: no-cache` on `/web/index.html` to stop heuristic caching of pre-shim HTML (Path-A label-scoped Traefik middleware). **Status: not applied at doc-25 write-time.**
- Doc 26 sec C - added `clear-site-data@file` Traefik middleware. Header reaches curl, but **SW intercepts before browser cache layer can apply Clear-Site-Data - SW prevents its own death**. SW kill must come from inside the SW (self-destruct) or via Update fetch returning 404. See SW kill recipe section below.
**C - Backdrop / artwork issues - any prior doc covers this?**
- `docs/14` - only doc that touches detail-page backdrops. Diagnosed Finity-parent's `--detail-page-backdrop-offset: 17%` + `mask.png` from `raw.githubusercontent.com/prism2001/finity/main/assets/mask.png`. Two CSS culprits clamping the band hard-black: (a) `:root --primary-background-color: #000 !important`, (b) `html, body, .preload, .skinBody, ..., #reactRoot, .mainAnimatedPages, .dashboardDocument { bg:#000 !important }`.
- `docs/14` sec 7 proposed CSS fix (`linear-gradient` overlay, `body.itemDetailPage` scope-out for bg-clamp). Doc 21 sec 4 cross-ref says "just landed".
- `docs/23` finding 6 - `/Items/{id}/Images/Primary` returns `Cache-Control: public` with NO max-age (heuristic = 0 s); cold poster transcode 350-470 ms; on-disk image cache `/cache/images/resized-images/` is 39 MB / 412 files / 16 h retention.
- `docs/24` sec 4 - image cache 39 MB total, 412 files, no GC pressure, oldest 16 h old.
- **No prior doc covers "all backdrops replaced by black" as a regression.** Closest precedents: doc 14 hard-black left band (CSS layer), doc 23 poster timing (cold-cache layer). New investigation territory for doc 26.
**D - Video playback / HLS / transcode issues - any prior doc?**
- `docs/13` finding 03 - `EnableThrottling=false`, `EnableSegmentDeletion=false`, `MaxMuxingQueueSize=2048`, `SegmentKeepSeconds=720`. Two 499 client-cancels in 1 h (HLS segments at 6.4 s + 2.9 s).
- `docs/21` - full HDR/HEVC diagnosis for Rick & Morty. Source = HDR10 (`smpte2084`, `bt2020nc`, `yuv420p10le`, `color_range=pc`, no MasteringDisplay/CLL - fake AI-upscale HDR). `EnableTonemapping=false` + `HardwareAccelerationType=none` -> HDR pixels delivered as SDR -> washed-out (NOT pure black). PlaybackInfo: `TranscodeReasons=ContainerNotSupported, AudioCodecNotSupported, SubtitleCodecNotSupported`. Fix: `EnableTonemapping=true` (`bt2390` already selected).
- `docs/22` sec 5 - 4 concurrent ffmpegs on ONE viewer of R&M S01E01. Filtergraph: `[0:4]scale,scale=3840:2160:fast_bilinear[sub]; [0:0]...format=yuv420p[main]; [main][sub]overlay`, `libx264 preset=veryfast crf=23 maxrate=13.5Mbps`, fmp4 HLS. 643 % CPU each. Cause: `EnableThrottling=false` + `EnableSegmentDeletion=false`.
- `docs/22` sec 3 - `TranscodingSubProtocol: hls`, `Container: fmp4/hls`, `IsVideoDirect=False, IsAudioDirect=False`. `PlayMethod` reports `DirectPlay` while `TranscodingInfo` is populated - race in Sessions DTO; actual decision is transcode.
- `docs/23` sec 7 - every Traefik request > 50 ms is `/videos/.../hls1/main/*.mp4` HLS-segment GET. AV1+HEVC at 360-550 Mbit. 15 x 499 + 8 x 500 in 6 h (CPU-side, not edge).
- **No prior doc covers "video plays as black screen" with audio working.** HLS init/segment naming collision risk (`hls_fmp4_init_filename "X-1.mp4"` + `hls_segment_filename "X%d.mp4"`) is a doc-26-only hypothesis. SW-intercepting-init-segment is also doc-26-only - but stock SW has no `fetch` handler so this requires a poisoned non-stock SW.
**E - Forbidden patterns - things explicitly called out as "do not do"**
- **No bundle modifications** (`docs/16` F5, `docs/19` row 16). Content-hashed filenames rotate per JF image upgrade; breaks source-map; must re-emit per bump.
- **No DOM-walker MutationObservers without debounce + scope** (doc 26 sec A bug A). The hot-patched `forceEnglishUI()` text-walker on `document.body` with `subtree:true, characterData:true` froze the main thread on poster lazy-load. The `inject-shim.py` walker in doc 16 sec C is the safe pattern (`acceptNode` filter + bounded selector).
- **No hot-patching `web-overrides/index.html` without committing to repo** (doc 26 sec A lesson). md5 drift between deployed and repo HEAD is invisible until breakage.
- **No trusting HTTP 204 as success** (doc 26 sec B lesson). `jellyfin.db` owned by uid 101000 (userns leftover) -> SQLite Error 8 readonly - POSTs return 204 but value not persisted. Always GET-verify.
- **No `Cache-Control: immutable` on `/web/index.html`** (doc 25 R1 caveat). Bricks next deploy until users force-reload. Scope to hashed chunks only.
- **No tonemap on SDR sources** (doc 21 sec 7e). If Mandalorian looks oversaturated post-fix, tonemap leaks - set `TonemappingMode` from `auto` to stricter.
- **No relying on per-user `Configuration.UICulture` for UI strings** (doc 25 R3 + sec 4). Server-side metadata theatre. Only the shim pins UI. Keep field for future-proofing but stop expecting it to fix Abspielen.
- **No bundle bind-mount for `<lang>-json.<hash>.chunk.js`** (doc 19 Path B caveat, doc 25 R4). Hashes rotate per image upgrade - must regenerate every bump.
- **No deleting Settings drawer node** (doc 17 sec 3.1). Drawer-renderer rebuilds on next render; remove only via CSS `display:none` + style override. Old `mypreferencesmenu` selectors match **0** elements - use `a.btnSettings, [data-itemid="settings"]`.
- **No theme @import without snapshot** (doc 14 sec 9). `/System/Configuration/branding` is whole-object replace - sibling Cineplex POST overwrote ElegantFin/NeutralFin within minutes (race rule, doc 04 sec 3b).
- **No `bg:#000 !important` on detail pages** (doc 14 sec 2c, doc 21 sec 4) - clamps Finity's intentional 17vw band into hard-black slab. Scope to `body:not(.itemDetailPage)`.
- **No stripping `Accept-Language` at Traefik for shared backends** (doc 15 limit 2; relaxed in doc 19 sec 19 since arrflix is sole consumer of arrflix.s8n.ru router).
### SW kill recipe
Research date 2026-05-09. Treat as authoritative for this incident.
**Q1 — Clear-Site-Data through an active SW:** Per W3C spec and MDN, `Clear-Site-Data` is **only honored on responses fetched over the network**, not those served by a SW. A SW can return arbitrary responses (incl. third-party), so browsers ignore CSD on SW-intercepted responses. Chrome/Firefox/Edge/Opera implement this; Safari support is partial. Conclusion: our existing Traefik header on `/web/index.html` will only fire for users whose SW lets that exact URL through to network — for stuck SWs that serve cached `index.html`, the header never reaches the browser. **Verified-not-working alone.** ([MDN Clear-Site-Data](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Clear-Site-Data), [Chrome Workbox guide](https://developer.chrome.com/docs/workbox/remove-buggy-service-workers))
**Q2 — Self-destruct shim:** **Verified working pattern.** Google's official Workbox guide recommends this as the *primary* approach. The browser performs a byte-for-byte update check on the SW script (max 24h, often immediate when `Cache-Control: max-age=0` or response differs). When the new script unregisters itself, all clients controlled by it lose their controller on next navigation. Canonical NekR snippet ([github.com/NekR/self-destroying-sw](https://github.com/NekR/self-destroying-sw)):
```js
self.addEventListener('install', e => self.skipWaiting());
self.addEventListener('activate', e => {
self.registration.unregister()
.then(() => self.clients.matchAll())
.then(cs => cs.forEach(c => c.navigate(c.url)));
});
```
Bind-mount feasibility: Jellyfin official image serves web from `/jellyfin/jellyfin-web/` inside the container. Bind-mounting the *whole directory* is broken (jellyfin/jellyfin#8441), but bind-mounting a *single file* over the existing `serviceworker.js` works the same way `index.html` does for us. Path inside container is `/jellyfin/jellyfin-web/serviceworker.js`. ([Jellyfin container docs](https://jellyfin.org/docs/general/installation/container/), [discussion #8441](https://github.com/jellyfin/jellyfin/discussions/8441))
**Q3 — 404/410 for SW script:** Spec status is **may work, browser-dependent**. W3C ServiceWorker issue #204 was closed wontfix — the spec does NOT mandate auto-unregister on 404/410 during normal navigation. HOWEVER, the *Update* algorithm (run on navigation, ~24h, or `registration.update()`) DOES unregister on 404/410 in Chrome and Firefox today (matches AppCache). The catch: update only runs when the browser checks; a stuck SW serving cached pages may never trigger an update fetch. Less reliable than self-destruct shim. ([w3c/ServiceWorker#204](https://github.com/w3c/ServiceWorker/issues/204))
**Q4 — Jellyfin 10.10.x SW poisoning:** No 10.10-specific SW-poster issue filed. The actual `src/serviceworker.js` in jellyfin-web is **notification-only** — no `fetch` listener, no cache logic. So if `arrflix.s8n.ru/web/serviceworker.js` is intercepting media, it is NOT stock Jellyfin code — likely a stale SW from a prior deploy, an injected mod (BobHasNoSoul/jellyfin-mods etc.), or browser-side residue. Stock Jellyfin SW cannot poison posters/HLS by design. Related issues: [jellyfin-web#4549](https://github.com/jellyfin/jellyfin-web/issues/4549) (premature caching), [jellyfin-web#5729](https://github.com/jellyfin/jellyfin-web/issues/5729) (stale `/system/info/public`).
**Q5 — Container path:** Confirmed `/jellyfin/jellyfin-web/serviceworker.js` for the official `jellyfin/jellyfin` image.
### Prod-vs-dev diff
Investigation 2026-05-09 — comparing live `jellyfin` (prod) vs `jellyfin-dev` containers on nullstone. Image tags identical: both `jellyfin/jellyfin:10.10.3`. Network.xml byte-identical. So differences below are 100% the operator's hardening, not Jellyfin upstream.
**A — docker-compose.yml diff (key items):**
- Prod mounts ~110+ web-override files: `index.html`, `cineplex.css`, AND a `locale-en-only/` directory containing every non-English `*-json.*.chunk.js` (af, ar, as, be, bg, bn, ca, cs, da, de, ... zh-tw, zu) bind-mounted RO over the container's locale chunks. Dev mounts ONLY `index-dev.html` over `index.html`. No CSS, no locale chunks.
- Prod traefik labels: `security-headers@file,compress@file,force-en-accept-lang@file`. Dev: `security-headers@file,no-guest@file`. Prod has NO `no-guest@file` directly on the docker-label router — its no-guest layer is enforced by the higher-priority `jellyfin-html-nocache` file-provider router (which ALSO adds `cache-no-store@file`, `clear-site-data@file` — see below).
- Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`. Dev has none.
**B — branding.xml / CustomCss diff:**
- Prod: 30,795 bytes. Full Cineplex CSS via `@import url("/web/cineplex.css")` (LOCAL bind-mount), ARRFLIX logo PNG embedded as base64 data-URI, Cast/Crew hidden, Quick Connect hidden, header buttons hidden, white slider thumbs, pure-black `--primary-background-color`.
- Dev: 26,345 bytes. Cineplex via `@import url("https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css")` (REMOTE jsDelivr — no /web/cineplex.css bind-mount). Same login disclaimer + Cast/Crew hide. **Confirmed dev has its OWN branding.xml on disk (not empty).**
**C — Per-user UICulture / settings:** Could not run `sqlite3` inside container (binary not present). Prod and dev both have separate config dirs (`/home/docker/jellyfin/` vs `/home/docker/jellyfin-dev/`). Dev config/data tree is a leaner subset (no `keyframes/`, no `splashscreen.png`, no `subtitles/`, no `device.txt`-only DB-shm/wal absence — dev DB sits idle without WAL == fewer active sessions, expected). Dev was set up as a fresh first-run wizard per `docs/12-dev-instance.md`, so its user table is its own admin only.
**D — encoding.xml diff:** Real divergence:
- Prod: `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`.
- Dev: `EnableThrottling=false`, `EnableSegmentDeletion=false`, `EnableTonemapping=false`.
- Prod is the stricter/lower-resource HLS profile; dev keeps every segment around. Plausible contributor to the **HLS 499 client-disconnect** seen in section E (prod): if a client pauses/seeks while throttling+deletion are both on, segment 186 may be reaped before re-request lands.
**E — Surprising / smoking gun: Traefik headers prod-only, NOT applied to dev:**
- `curl -sI https://arrflix.s8n.ru/web/index.html` returns:
- `cache-control: no-cache, no-store, must-revalidate`
- `clear-site-data: "cache", "cookies", "storage"`
- `curl -sI https://dev.arrflix.s8n.ru/web/index.html` returns NEITHER. Just `x-frame-options: SAMEORIGIN`.
- Source: `/opt/docker/traefik/config/dynamic.yml` defines a HIGH-PRIORITY (priority:100) file-provider router `jellyfin-html-nocache` matching `Host(arrflix.s8n.ru) && Path(/, /web/, /web/index.html, /web/sw.js, /web/manifest.json)` with middlewares `security-headers,compress,cache-no-store,force-en-accept-lang,clear-site-data`. Dev's `dev.arrflix.s8n.ru` host has no equivalent file-provider router — only the docker-label router applies.
- The `clear-site-data` middleware was ADDED 2026-05-09 (today) as a "one-shot" to wipe SW+cache+storage. Comment in dynamic.yml literally says: *"Remove this middleware after owner has visited once and confirmed clean state."*
- **Implication:** Every prod page-load tells the browser to wipe cache + cookies + storage. If the SW intercepts before the header reaches the cache layer (per Q1 finding above) the header is harmless; but if any auth state or in-progress playback state is in storage when the header DOES land (e.g. on a forced refetch), it gets nuked. Dev does not have this and dev "works".
- Prod also has `jellyfin-locale-force-en` (priority:200) doing `replacePathRegex` from any locale-json chunk to `en-us-json.667484b4a441712c7e05.chunk.js`. The hash is hard-coded; if the deployed Jellyfin web bundle ever shipped a different en-us-json hash, EVERY locale chunk request returns a 404 wrapped as a successful rewrite to a non-existent path. Worth verifying the hash matches the live bundle.
**Suggested transplant (smallest reversible change):**
1. Remove the `clear-site-data@file` middleware from the `jellyfin-html-nocache` router in `/opt/docker/traefik/config/dynamic.yml` (one line). Keep `cache-no-store` so the SW-update fetch still bypasses heuristic cache. Traefik hot-reloads.
2. Verify with `curl -sI https://arrflix.s8n.ru/web/index.html` → no `clear-site-data` header.
3. If prod now behaves like dev, the CSD header was a major factor in the unresponsive page (storage wipe in flight while SPA boots = re-auth race + token loss).
4. Re-test playback. If still black-screen, suspect the encoding.xml `EnableThrottling+SegmentDeletion=true` combo and try toggling each off to match dev.
5. Last resort: also drop the `jellyfin-locale-force-en` rewrite and verify the hard-coded en-us-json hash is current with the running 10.10.3 bundle.
### Online research 2026-05-09
Research-only pass against current GitHub state. All URLs verified live this date.
**Q1 — UICulture per-user broken in 10.10.3?** No evidence the field was *removed* from `UserConfiguration` in the 10.10.x line. DeepWiki's settings-management page still documents per-user UICulture. The closest live regression is jellyfin/jellyfin#16117 ("Can't change plugins settings - Fixed by disabling **Cloudflare Rocket Loader**"): same shape — POST returns 2xx, body silently dropped, only over reverse proxy. Verdict: **probable** that our symptom is reverse-proxy-side body mangling, not a server-side schema removal. Sanity check: bypass Traefik (`curl --resolve arrflix.s8n.ru:8096:127.0.0.1` direct to container) and POST UICulture; if it persists there but not via Traefik, middleware is mutating the JSON. Discussion #15857 confirms `204 No Content` is the expected return code for these write endpoints — the 204 itself is not the bug. ([#16117](https://github.com/jellyfin/jellyfin/issues/16117), [discussion #15857](https://github.com/orgs/jellyfin/discussions/15857), [DeepWiki settings](https://deepwiki.com/jellyfin/jellyfin-web/5.2-user-settings))
**Q2 — Backdrops missing while posters work.** **Confirmed root cause = TMDB API change.** jellyfin/jellyfin#14922 (opened 2025-10-01, CLOSED) and #14951 (2025-10-06, CLOSED): TMDB swapped "no-language" backdrop tag from empty-string to `xx`; Jellyfin 10.10.x scrapes those as **Thumbs**, not Backdrops, so the Backdrops slot is empty. The Jellyfin team explicitly said it will not be backported to 10.10 — fix lands only in 10.11.0+. So our 10.10.3 instance has zero backdrops for any item added after ~Sep 2025 unless a non-`xx` language backdrop happened to exist. Issue #7264 (Movies showing backdrops *instead of* posters) is a separate 10.11.1 regression — opposite symptom, not relevant here, marked "Can't Reproduce" in #15259. Verdict: **confirmed** for our case. Mitigation = upgrade to 10.11.x and run "Replace existing images" on every item *after* upgrading. ([#14922](https://github.com/jellyfin/jellyfin/issues/14922), [#14951](https://github.com/jellyfin/jellyfin/issues/14951), [#7264](https://github.com/jellyfin/jellyfin-web/issues/7264))
**Q3 — Service Worker survival despite Clear-Site-Data.** **Confirmed.** Chrome's official Workbox guide states `Clear-Site-Data` "can't be relied on alone" because the SW intercepts the very response that would carry the header. Chromium SW Security FAQ explicitly recommends pairing CSD with a no-op SW. Same conclusion as our SW kill recipe section, validated from a second angle. ([Chrome Workbox](https://developer.chrome.com/docs/workbox/remove-buggy-service-workers), [Chromium SW FAQ](https://chromium.googlesource.com/chromium/src/+/main/docs/security/service-worker-security-faq.md))
**Q4 — Self-destruct SW pattern in Jellyfin community.** No Jellyfin-specific recipe published. Generic NekR self-destroying-sw is the canonical pattern (already cited above). BobHasNoSoul/jellyfin-mods ships a *replacement* SW (not a self-destruct one) — useful only as a reference for how others bind-mount over `/jellyfin/jellyfin-web/serviceworker.js`. Verdict: **no evidence** of a Jellyfin-curated kill recipe; we are first to ship one. ([NekR](https://github.com/NekR/self-destroying-sw), [BobHasNoSoul/jellyfin-mods](https://github.com/BobHasNoSoul/jellyfin-mods))
**Q5 — HLS fmp4 init-segment collision on restart.** **No evidence of collision in practice.** Jellyfin always passes `-start_number 0` and the init filename is `<hash>-1.mp4` (literal `-1`, not `%d`-derived); segments are `<hash>0.mp4`, `<hash>1.mp4`, ... so `-1` cannot collide with any positive `%d`. Restart spawns a *new hash* (different session id), so old and new sessions don't share filenames either. The active live bug is jellyfin/jellyfin#16612 — playback breaks after 1015 s in 10.11.8 with fMP4-HLS — but the cause traced in that thread is FFmpeg/segment-availability, not init-name collision. Tangentially: #12230 (CLOSED) is about the init filename being passed *relative* not absolute — only matters when Jellyfin's CWD ≠ transcode dir (rffmpeg setups). Verdict: **no evidence** that init-name collision causes our black-screen. Look at #16612 and at `Cache-Control: no-store` on `/Videos/*/hls1/*` instead. ([#16612](https://github.com/jellyfin/jellyfin/issues/16612), [#12230](https://github.com/jellyfin/jellyfin/issues/12230))
**Q6 — Cineplex theme repo activity.** Repo `MRunkehl/cineplex` last pushed **2025-09-06** (sha `98c8e71`, "Fixed more styles and script"). Description: "Updated jellyflix theme for newest jellyfin v10.10.7 and better netflix styles". **Zero open or closed issues** (issues tab is empty). No commits since 10.11.0 shipped, so the theme has not been validated against 10.11 image-type changes. Verdict: **probable** that backdrop CSS selectors target 10.10 DOM and may break or hide backdrops on a 10.11 upgrade. Audit `cineplex.css` for `.itemBackdrop`, `.backdropContainer`, `.cardBox-bottompadded` selectors before upgrading. ([repo](https://github.com/MRunkehl/cineplex))
**Q7 — Jellyfin 10.11.8 changelog.** **Does NOT fix our issues directly.** Server 10.11.8 ships only 3 changes: subtitle-language library handling, subtitle saving, and language-filter querying. jellyfin-web 10.11.8: a single PR (#7796) for lazy device-info loading. Released as a regression-revert from 10.11.7 ahead of CVE/GHSA disclosure. None of UICulture persistence, SW poisoning, or fMP4 playback are addressed in .8 itself. However the TMDB-backdrop fix (Q2) lands in the 10.11.0 baseline that .8 inherits. Verdict on .8 specifically: **no evidence** it helps directly; **confirmed** the 10.11 line fixes Q2. Upgrade target = 10.11.8 (latest stable: 10.11.0 backdrop fix + .7 security fixes + .8 regression reverts). ([10.11.8 server](https://github.com/jellyfin/jellyfin/releases/tag/v10.11.8), [10.11.8 web](https://github.com/jellyfin/jellyfin-web/releases/tag/v10.11.8))
### Recommended action sequence
**Option A — Self-destruct shim (RECOMMENDED, verified working):**
```bash
# On nullstone, in the arrflix compose dir:
cat > /opt/docker/arrflix/web-overrides/serviceworker.js <<'EOF'
self.addEventListener('install', e => self.skipWaiting());
self.addEventListener('activate', e => {
self.registration.unregister()
.then(() => self.clients.matchAll())
.then(cs => cs.forEach(c => c.navigate(c.url)));
});
EOF
# Add to compose volumes (same pattern as index.html):
# - /opt/docker/arrflix/web-overrides/serviceworker.js:/jellyfin/jellyfin-web/serviceworker.js:ro
docker compose -f /opt/docker/arrflix/compose.yml up -d --force-recreate jellyfin
# Force Traefik to send no-cache on the SW script so browsers refetch immediately:
# middleware: response header Cache-Control: no-cache, no-store, max-age=0 on /web/serviceworker.js
```
- **Side effects:** every existing browser session navigates to its current URL once on next page load — looks like a single auto-refresh. No data loss. New visitors get the shim, immediately unregister, never see it again.
- **Recovery:** revert by removing the bind-mount line + `up -d --force-recreate`. Original SW returns.
- **Verify:** `curl -skI https://arrflix.s8n.ru/web/serviceworker.js` → 200 + `Cache-Control: no-cache`. Body matches the shim. In an incognito window: open DevTools → Application → Service Workers shows registration *then* "redundant" within seconds.
**Option B — Serve 404 (may work, less reliable):**
```bash
# Traefik file-provider snippet:
# - /web/serviceworker.js → middleware that returns 404 (errors middleware → static 404 service)
# Or simply: bind-mount an empty file and add a Traefik replacePathRegex to a non-existent path.
```
- **Side effects:** Chrome/Firefox unregister on next *Update* fetch (typically next navigation after >24h, or sooner if user reloads). Slow rollout. Some users may stay stuck for a day.
- **Recovery:** remove the rule, original SW returns on next image rebuild.
- **Verify:** `curl -skI https://arrflix.s8n.ru/web/serviceworker.js` → 404. DevTools shows SW going "redundant" after a navigation+reload cycle.
**Option C — Do nothing server-side, force user manual:**
- User opens DevTools → Application → Service Workers → Unregister, OR `chrome://serviceworker-internals` → Unregister, OR clears site data.
- **Side effects:** every user must do this individually; non-technical users can't.
- **Recovery:** trivial, nothing changed.
- **Verify:** per-user; no server signal.
**Decision:** Go with **Option A**. It is the Google-recommended pattern, is the only approach that auto-fixes already-loaded tabs without user action, and is reversible by removing one line from compose.
### SW source + image cache
**(Agent run 2026-05-09 — verifies the stock SW source live on the running container, and probes server-side image health for a known item. Important: contradicts the working assumption that the SW is intercepting fetches.)**
**Part 1 — `/web/serviceworker.js` source + interception map**
Both `docker exec jellyfin cat /jellyfin/jellyfin-web/serviceworker.js` and `curl -sk https://arrflix.s8n.ru/web/serviceworker.js` return the **same** file (~1KB single line):
```js
(self.webpackChunk=self.webpackChunk||[]).push([[82798],{16764:function(n,e,t){
t(78557),t(90076),
self.addEventListener("notificationclick", function(n){ /* opens window or calls connectionManager */ }, !1),
self.addEventListener("activate", function(){ return self.clients.claim() })
}}, function(n){ n.O(0,[59928], function(){ return 16764, n(n.s=16764) }), n.O() }]);
```
**Interception map — there is none.**
- No `fetch` event listener in this file.
- Only listeners: `notificationclick` and `activate` (calls `clients.claim()`).
- `t(78557)` and `t(90076)` are webpack require calls for two other modules — those *might* register fetch handlers, but they are NOT in this bundle (they live in lazy chunks under `/web/*.chunk.js`). The chunk IDs `82798` / `59928` map to the notification module only.
- **No CacheStorage usage anywhere in this bundle.** No `caches.open`, `caches.match`, `cache.put`. So this SW does **NOT** cache `/Items/{id}/Images/*`, `/Videos/{id}/*`, `/web/*-json.*.chunk.js`, or `/web/index.html`.
**Conclusion:** Jellyfin 10.10.3 web's stock SW is push-notification-only. It does not intercept fetches and owns no CacheStorage entries. This **confirms agent Q4 finding** ("notification-only — no `fetch` listener, no cache logic") against the running container — not just spec/source, the literal bytes Jellyfin is shipping.
**Implication for Section C diagnosis:** "SW intercepts the GET to `/web/index.html` and serves from cache" is **false**. With no `fetch` handler the SW cannot intercept. `Clear-Site-Data` would already reach the network response — the real blocker for stale German chunks is **HTTP browser cache** (memory + disk), not Service Worker cache.
**Replacement plan:** The self-unregister shim is still safe and useful as belt-and-braces — installs cleanly, deletes any caches that ever existed, unregisters, force-reloads. Bind-mount path inside container is `/jellyfin/jellyfin-web/serviceworker.js`. But it is **not the missing piece** for the German leak. Real fix: existing `Cache-Control: no-store` + `Clear-Site-Data` headers on `/web/index.html` plus a **hard reload** (Ctrl+Shift+R) or DevTools → Application → Clear storage on user's browser.
**Part 2 — Image cache state**
```
/home/docker/jellyfin/config/metadata = 112M (well-populated)
/library/<hh>/<item-id>/poster.jpg present in sampled items
/home/docker/jellyfin/cache = 59M
/images/resized-images/{0..f} = 16 hex subdirs, all populated with .webp tiles
```
Agent 7's earlier note "**only `resized-images` subdir present**" is **still true**`/cache/images/` contains only `resized-images/`, no `original/` or `remote/`. That is the **expected** Jellyfin layout (originals live under `/config/metadata/library/`, only resizes live under `/cache/images/resized-images/`). Not a bug.
API probe for item `7aa5add2c2d8575eda5280b9b9072071` (The Mike Nolan Show) via temp token (revoked after), all four image types via `https://arrflix.s8n.ru`:
| Endpoint | Status | Content-Type | Notes |
|---|---|---|---|
| `/Items/{id}/Images/Backdrop` | **200** | image/jpeg | served, `age: 5400` (90min upstream cache) |
| `/Items/{id}/Images/Primary` | **200** | image/jpeg | served |
| `/Items/{id}/Images/Logo` | **200** | image/png | served |
| `/Items/{id}/Images/Thumb` | **200** | image/jpeg | served |
**Verdict:** Server-side images are healthy. Backdrop + Primary + Logo + Thumb all 200 with valid content-types for a real item the user is browsing. The "all backdrops black" symptom (Section D) is **NOT** a server-side image problem and **NOT** a SW-cache problem. Likely culprits remaining:
- (a) CSS rule in deployed `index.html` overrides / theme overrides hiding `.itemBackdrop` or setting `opacity: 0`;
- (b) browser HTTP cache holding stale 404s from earlier broken state — same Ctrl+Shift+R fix as Part 1;
- (c) a custom-css.user.css backdrop opacity:0 / display:none rule.
Recommend: in user's browser open one show page, DevTools → Network → filter Img → look for `/Items/{id}/Images/Backdrop` request. If 200 served but invisible → CSS theme leak. If never requested → SPA template not fetching it (theme-side bug).
### Backdrop diagnosis
Investigation 2026-05-09. User reported: detail-page backdrops are pure black on prod (`arrflix.s8n.ru`). Posters render fine. Used a temp ApiKey row (`Name='arrflix-backdrop-diag-2026-05-09'`, deleted after diag) on the live `jellyfin` container.
**Layer A (server) — RULED OUT.**
- Item `7aa5add2c2d8575eda5280b9b9072071` (The Dark Knight) JSON returns `BackdropImageTags: ['76cac7069dc988f7cd54e99b481db3fc']`. Tag exists.
- `HEAD https://arrflix.s8n.ru/Items/.../Images/Backdrop``HTTP/2 200`, `content-type: image/jpeg`, `content-length: 560210`, `last-modified: 2026-05-08 22:11:50`.
- Same call against `dev.arrflix.s8n.ru` → also 200 + image/jpeg. Both prod and dev serve backdrop bytes correctly.
**Layer C (browser cache / SW) — RULED OUT.**
- The stock SW (Section "SW source + image cache" above) does not intercept `/Items/*/Images/*`. Backdrop URL also returns fresh on direct curl (no SW in path).
**Layer B (CSS) — CONFIRMED. The CustomCss `BLACK-PASS` block hides the image layer.**
The Jellyfin DOM has two distinct elements (verified by reading `main.jellyfin.bundle.js` + `main.jellyfin.1ed46a7a22b550acaef3.css` inside the running container):
1. `.backdropContainer` — stock CSS: `position:fixed; bottom:0; left:0; right:0; top:0; z-index:-1`. Holds a child `<div class="backdropImage">` whose `style.backgroundImage="url(/Items/.../Backdrop)"` is injected by JS (`r.style.backgroundImage="url('".concat(e,"')")` in the bundle). This is the IMAGE LAYER.
2. `.backgroundContainer` (no `d`) — separate `position:fixed` overlay; gets the `withBackdrop` class toggled by JS. This is the OVERLAY LAYER. Stock CSS sets `body { background-color: transparent !important; }` precisely so the body never occludes the `z-index:-1` backdrop.
Bug 1 — **`!important` blacks override stock body transparency.** CustomCss `BLACK-PASS 2026-05-08` block (lines ~110-202 of branding.xml CustomCss) sets `background-color: #000000 !important` on `html, body, #reactRoot, .skinBody, .preload, .mainAnimatedPages, .pageContainer, .libraryPage, .itemDetailPage, .padded-bottom-page, .layout-desktop, .layout-mobile, .layout-tv` etc. Since `.backdropContainer` is at `z-index:-1`, ANY ancestor with an opaque background paints on top of it, hiding the backdrop image entirely.
Bug 2 — **The transparent-scope rule at lines 102-107 is incomplete.** It scopes to `body.itemDetailPage, body.itemDetailPage #reactRoot, body.itemDetailPage .mainAnimatedPages, body.itemDetailPage .skinBody`, but does NOT include `.layout-desktop` / `.itemDetailPage` itself / `.layout-tv` / `.pageContainer` / `.padded-bottom-page` — so those wrappers remain `#000` on detail pages and continue to occlude the `z-index:-1` layer.
Bug 3 (cosmetic — not the cause of black) — line 89-101 sets `background-image: linear-gradient(...)` on `.layout-desktop .backgroundContainer.withBackdrop`. That's the OVERLAY layer, fine on its own. But because the actual backdrop image is hidden by Bug 1, the gradient now composites against pure black instead of the backdrop, so the user sees only the gradient (which fades from black to transparent) over a black backdrop = solid black with at most a faint gradient edge.
**Cross-check:** dev (`dev.arrflix.s8n.ru`) does NOT mount the `BLACK-PASS` CustomCss block (Section B above confirms dev branding.xml is 4.5KB smaller and uses remote jsDelivr Cineplex without local overrides). Opening dev should show backdrops normally; if it does, that's a clean A/B confirmation that prod's CustomCss is the regression.
**Fix recipe (smallest reversible change).**
In `/home/docker/jellyfin/config/config/branding.xml` `<CustomCss>` block, extend the `body.itemDetailPage` transparent-scope rule (currently lines 102-107) to also cancel the black backgrounds on every wrapper that the BLACK-PASS block paints:
```css
/* Replace existing block at lines 102-107 with: */
body.itemDetailPage,
body.itemDetailPage #reactRoot,
body.itemDetailPage .mainAnimatedPages,
body.itemDetailPage .skinBody,
body.itemDetailPage .layout-desktop,
body.itemDetailPage .layout-mobile,
body.itemDetailPage .layout-tv,
body.itemDetailPage .pageContainer,
body.itemDetailPage .padded-bottom-page,
body.itemDetailPage .itemDetailPage,
body.itemDetailPage #mainPanel,
body.itemDetailPage #mainDrawerPanel {
background-color: transparent !important;
background: transparent !important;
}
```
This keeps `#000` everywhere else (library, search, dashboard) but reveals the `.backdropContainer > .backdropImage` layer on detail pages — which is what the gradient overlay (Bug 3) was originally designed to compose against.
**Apply via Dashboard → Branding → Custom CSS** (no container restart needed; CSS reloads on next page render). Editing branding.xml directly works too but Jellyfin re-serializes on save, so use the Dashboard.
**Verify after edit:** open a movie detail page in an incognito window (bypasses SW). Expected: full-bleed backdrop visible at right ~70% of viewport, gradient fade from black on the left. If still black: hard-refresh + DevTools → Elements → search `.backdropImage` and confirm its parent chain has no `background-color` other than transparent.
**Recovery:** revert to the original 6-selector block.
---
### Playback diagnosis
Investigation date 2026-05-09 ~00:3000:45 UTC. Live transcode test against prod jellyfin via temp ApiKey `arrflix-playback-diag-2026-05-09` (deleted at end of session, verified empty SELECT after DELETE).
**A) Source codec verdict — the ItemId is mis-attributed in this incident report.** ItemId `7aa5add2c2d8575eda5280b9b9072071` is **The Dark Knight (2008)**, NOT "The Mike Nolan Show". Confirmed via `/Users/{u}/Items?searchTerm=...`:
- `7aa5add2...` → Movie / `/media/movies/The Dark Knight (2008)/The Dark Knight (2008).mkv`**HEVC Main 10 / yuv420p10le, 1918x800, TrueHD 24-bit + AC3 + 2× PGS**.
- The Mike Nolan Show series Id is `37cb910f507c4d1f9e365ef1954f99c2`. Episodes (e.g. S01E04 "Ding Dong Delli") are **AV1 Main / yuv420p / Opus**, ~412 kbps total.
(So the prior Section D backdrop-probe line that labelled `7aa5add2...` as MNS is also wrong — those Backdrop/Primary/Logo/Thumb 200s were TDK images. Doesn't change Section D's conclusion that backdrops serve fine.)
Chrome advertises `av1,h264,vp9` (NOT hevc, NOT vp8). So:
- **TDK (HEVC 10-bit)**: must transcode → server picks libx264 High@4.0 yuv420p (8-bit) AAC LC stereo. Fully Chrome-decodable.
- **MNS episodes (AV1+Opus)**: should DirectPlay/DirectStream — Chrome supports both natively.
**B) HLS pipeline verdict — server-side fully working.** PlaybackInfo POST returned `TranscodingUrl=/videos/.../master.m3u8?VideoCodec=h264&...`, `SupportsTranscoding=True`, `TranscodingSubProtocol=hls`. Manual fetches on TDK:
- master.m3u8 → HTTP 200, valid `#EXTM3U`, single variant `BANDWIDTH=13407532, RESOLUTION=1918x800, CODECS="avc1.424029,mp4a.40.2"` (the `424029` decodes to "Baseline 4.1" but actual stream below is High — known cosmetic Jellyfin mislabel, not a Chrome blocker).
- main.m3u8 sub-playlist → HTTP 200, segments `hls1/main/0.ts``9.ts`, 3-second EXTINF.
- segment 0.ts → HTTP 200, 269 KB. ffprobe verdict: `h264 High / yuv420p / level 4.0, 1918x800` + `aac LC`. Valid 8-bit H.264. Cache dir during playback contains 40+ valid `.ts` segments. No fmp4 init filename collision (mpegts segments in current run; the earlier fmp4 path's `-1.mp4` init pattern with `start_number=0` is also fine — `-1.mp4` literally has the `-1` infix in filename, while data segments are `0.mp4, 1.mp4...`; no actual name collision).
**C) CSS verdict — video element NOT hidden.** Read `branding.xml` CustomCss + `cineplex.css` (full). All `display:none` / `visibility:hidden` / `opacity:0` / `transform:scale(0)` matches are on UI chrome (`#castCollapsible`, `#guestCastCollapsible`, `.btnQuick`, `.headerSyncButton`, `.headerCastButton`, `.headerUserButton`, MUI drawer items, `.countIndicator`, `#loginPage h1`, etc.). The only `video::*` / `:cue` rules touch subtitle font only. **No hide/scale rule hits `.htmlvideoplayer`, `.videoPlayerContainer`, or the `<video>` element itself.** CustomCss is not the cause of the black screen.
**D) Service Worker verdict — no fetch interception.** `/web/serviceworker.js` is the stock Jellyfin notification-only handler (`notificationclick` + `activate→clients.claim`). No `install` cache, no `fetch` listener. Cannot intercept HLS or video URLs. Already characterised in the prior "SW kill recipe" section — stock SW is harmless for media playback.
**E) Web research findings.** No 10.10.3-specific Chrome black-screen bug surfaced for the HLS path. Closest historical pattern: hls.js + AV1+Opus DirectStream where Jellyfin 10.10 mis-builds the codec attribute on the playlist for AV1, causing hls.js to abort. Common workaround: force transcode via DeviceProfile or restrict AV1 in user policy. No citation strong enough to assert as root cause from outside the live browser.
**F) The actual story — and the fix recipe.**
Timeline reconstruction from server logs for the user's session (192.168.0.10):
- `00:28:46` — PlaybackInfo for `7aa5add2...` (TDK).
- `00:28:47` → ffmpeg launches on `/media/movies/The Dark Knight (2008)/...mkv` (libx264 High@5.1, fmp4).
- `00:28:53`, `00:29:01` — ffmpeg restarts at `-ss 00:04:18` and `00:09:06` (= **user seeking forward** during TDK playback).
- `00:29:07`*"Playback stopped … playing The Dark Knight. Stopped at 549885 ms"* (= 9:09).
- `00:29:28`*"Playback stopped … playing F.T.C. Stopped at 39053 ms"* (MNS S01E02).
- `00:42:42`*"Playback stopped … playing Ding Dong Delli. Stopped at 20905 ms"* (MNS S01E04).
What this means: TDK transcoded and played fine for 9 minutes with seeks — **TDK is not black-screening**. The MNS episodes (AV1+Opus, 20-39 s before stop) match the user-perceived "black screen, give up" pattern. The incident report conflated these — user said "Mike Nolan Show + ItemId 7aa5add2" but the ItemId is TDK and the actual symptom is on the AV1 MNS episodes.
The 00:42:49 ffmpeg launch on TDK that appears AFTER MNS stop is **my own diagnostic curl** — its PlaySessionId `14f52f35eee04cec8146379c0dc6c960` matches the one I generated. Disregard as evidence of user behaviour.
**Recommended fix sequence (ordered by likelihood):**
1. **Re-run with the right item.** Ask user to repro on MNS S01E04 (`Ding Dong Delli`), capture browser DevTools Network panel: was `/Videos/.../master.m3u8` issued (transcode path) or only `/Videos/.../stream.webm` (DirectStream)? What does `/Items/.../PlaybackInfo` return for `SupportsDirectStream` on the AV1 source? Capture the JS console for hls.js / shaka / MediaSource errors.
2. **If DirectStream is on for AV1** → force transcode by adding a `CodecProfile` in the user's DeviceProfile that bans AV1 DirectStream (Type=Video, Codec=av1, Container=mkv,webm → forced conditional Direct=false). Server then falls back to libx264 transcode (CPU-only on nullstone, slow but reliable).
3. **Cross-browser test** — try Firefox. Different hls.js behaviour for AV1. If Firefox plays MNS but Chrome doesn't, confirms client-side AV1 DirectStream bug not server.
4. **TDK is fine** — leave alone, unrelated to this incident.
**Out-of-scope here:** dev.arrflix.s8n.ru `/Sessions` returned 401 with the api_key (Sessions needs a user-token, not just admin api_key). Recommend redoing the dev comparison through the user's browser cookie session.
API key cleanup verified: `SELECT Name FROM ApiKeys` returned empty after DELETE.
---
## Final fix applied (verified via playwright headless)
Status: **CLOSED** for symptoms 1-4. Symptom 5 (video black-screen on AV1+Opus
items) is a separate codec issue tracked for the 10.11.8 migration.
### Three patches landed
1. **`branding.xml` CustomCss**: append `content: "Play"` override on
`.mainDetailButtons .material-icons.play_arrow::after`. Cineplex theme
hardcoded German "Abspielen" via CSS `content:` rule — NOT a Jellyfin
locale issue. Hours of Traefik `Accept-Language` rewrites and
`force-english-all-users.sh` chases were chasing the wrong layer entirely.
2. **`branding.xml` CustomCss**: backdrop transparent-scope using `:has()`.
`body.itemDetailPage` selector (from prior docs) does NOT match in
10.10.3 — body class is `libraryDocument`. New rule scopes by
`.layout-desktop:has(.itemDetailPage)` etc so backdrop layer (z-index:-1)
renders behind detail pages without breaking other surfaces.
3. **`encoding.xml`**: `EnableThrottling=false` + `EnableSegmentDeletion=false`.
Kills HLS 499 (segments reaped before browser re-requests).
### Headless verification
`bin/headless-test.py` (new) logs in via Jellyfin SPA login form using
playwright Chromium, navigates to detail page, screenshots, and probes
computed styles. Used to bisect:
- baseline screenshot (broken)
- `:has()` selector verified backdrop renders
- "Play" verified replaces "Abspielen"
### Re-apply
`bin/apply-26-incident-fixes.sh` (new, idempotent) re-applies all three
patches if `branding.xml` / `encoding.xml` drift back. Run via:
`ssh user@nullstone "$(cat bin/apply-26-incident-fixes.sh)"`
### What was rolled back
- The `clear-site-data@file` Traefik middleware I added during this session
was making prod worse: it was wiping cookies+storage on every visit,
breaking auth+playback session continuity. Reverted by restoring the
Traefik dynamic.yml backup taken right before the edit.
---
## Do-NOT-repeat checklist (post-mortem)
These are the dead-ends. Future operators (and future me) should skip:
1. **Don't add `Clear-Site-Data` to a Jellyfin route to "force the SW out".**
Stock Jellyfin SW is notification-only (no fetch handler) — there is no
SW poisoning to begin with. The middleware just wipes cookies on every
visit, breaking auth races.
2. **Don't run `bin/force-english-all-users.sh` to fix "Abspielen".**
Doc 25 already established per-user `Configuration.UICulture` is theatre
and the SPA never reads it. The German text was in **Cineplex CSS** via
`content: "Abspielen"`. Patch the CSS, not the user config.
3. **Don't trust HTTP 204 from POST `/Users/{id}/Configuration` as success.**
Always GET back and verify. (And see #2 — even if you CAN persist
UICulture, it doesn't drive UI strings in 10.10.x.)
4. **Don't use `body.itemDetailPage` as a CSS selector in 10.10.3.**
The body class on detail pages is `libraryDocument`, not `itemDetailPage`.
Use `.itemDetailPage` directly or `:has(.itemDetailPage)` on ancestors.
5. **Don't paint `#000 !important` on `.layout-desktop` / `.pageContainer`
without scoping.** They wrap the backdrop layer; an unscoped black
override occludes the entire backdrop. Always scope with `:has()` or by
page-specific class.
6. **Don't hot-patch `web-overrides/index.html` on the server without
committing back to repo same step.** Drift from repo is invisible until
it breaks. Bug A (the DOM-walker MutationObserver freezing the browser)
came from this exact pattern — see `~/.claude/projects/.../memory/feedback_always_commit_to_my_git.md`.
7. **Don't write CSS Mutation/text-walker observers without debounce + scope.**
Walking every text node on every DOM mutation freezes the main thread on
poster grids. If you need DOM rewriting, use targeted selectors + debounce.
8. **Don't sed-via-python regex on YAML files without strict anchors.**
I damaged `dynamic.yml` with a too-greedy DOTALL match earlier in this
session (deleted unrelated routers). Restore-from-backup saved it.
Always diff before reload.
9. **Don't believe a single-itemId test as "playback works".** Item
`7aa5add2c2d8575eda5280b9b9072071` is The Dark Knight (HEVC, transcodes
fine to H.264). The Mike Nolan Show episodes are AV1+Opus and break in
Chrome. Always test the actual item the user reported.
10. **Don't skip headless smoke-test.** Visual confirmation in playwright
Chromium catches CSS regressions instantly without waiting for the user
to clear browser cache. `bin/headless-test.py` is a 30s round-trip.
---
## Iteration 2 — backdrop visible only on top viewport (2026-05-09 follow-up)
doc 26 INC4: black band + 4K HDR slow transcode + v2 test + methodology audit Two regressions slipped through INC1-3: INC4a -- BLACK BAND behind every detail-page carousel Pre-existing 2026-05-08 home-page rule painted .emby-scroller {bg:#000 !important} UNSCOPED. Hits every carousel inside .itemDetailPage incl admin-only More from Season N, More Like This. INC1-3 transparent-scope list missed .emby-scroller / .verticalSection / .padded-top-focusscale. Fixed by extending scope. INC4b -- VIDEO 'BLACK SCREEN' on play Not actually black-screen. CPU-only nullstone cannot sustain real-time 4K HEVC HDR tonemap+x264 transcode -- 0.5x realtime, ffmpeg takes ~6s per 3s segment. With user resume seeks adding restart overhead, total wait ~18s before browser readyState rises. User saw black, gave up. Fix: disable EnableTonemapping (R&M fake HDR per doc 21) + cap RemoteClientBitrateLimit=20Mbps on every user (1080p target, no 4K scale). Headless v2 test confirms HEVC + AV1 episodes now hit readyState=3/4 within wait window; 4K HDR R&M still slow (heaviest). INC4 testing methodology audit -- bin/headless-test-v2.py v1 only logged in as guest and never clicked Play. v2 runs both admin and guest, walks 3 codec-tagged items per role (HEVC/AV1/H.264), clicks Play, captures <video> state, sweeps DOM for opaque bgs over backdrop layer. False positives: off-viewport #reactRoot + collapsed .mainDrawer (negative coords). Allowlist refinement TODO. Open: 4K HDR sources still slow even post-fix. Real fix path = pre- transcode masters to 1080p H.264 SDR via separate batch, OR migrate to 10.11.8 with vaapi/qsv driver fixed.
2026-05-09 01:46:47 +01:00
### INC4 online research
Web sweep 2026-05-09 against jellyfin/jellyfin + jellyfin/jellyfin-web
issues filed since 2025-01. All URLs cited inline. "Verdict" = how strong
the link to our two open symptoms (black-screen video, opaque "More from
Season" band) is.
**Q1 — Web 10.10.3 video black-screen on play (server transcoding HLS,
browser shows nothing):**
- jellyfin-webos #126 "Black screen by enable Prefer FMP4-HLS as media
container" — HEVC Main10 HDR10 10-bit direct-stream goes black, audio
fine. Workaround: disable Prefer fMP4-HLS.
https://github.com/jellyfin/jellyfin-webos/issues/126
- jellyfin-web #7405 "HLS Media Errors only in Webbrowsers."
https://github.com/jellyfin/jellyfin-web/issues/7405
- jellyfin #16612 "Playback errors due to fMP4-HLS" (10.11.8, but root
cause is fMP4 container; same workaround).
https://github.com/jellyfin/jellyfin/issues/16612
- forum t-solved-black-screen … web UI 10.0.3: theme `.preload { #000
!important }` covered the player. Direct precedent for our symptom.
https://forum.jellyfin.org/t-solved-black-screen-w-audio-when-playing-video-web-ui-10-0-3
- **Verdict: probable.** Two independent vectors:
(1) fMP4-HLS container produces an init segment hls.js stalls on for
certain codec profiles;
(2) custom-CSS overlay covering the player. Both consistent with our
black-screen-but-server-transcoding behaviour.
- **Next step:** in DevTools, confirm whether `<video>` has frames
(network MSE buffer) or is occluded. If the SourceBuffer never
appendBuffer-s, it's #126/#16612 → toggle off "Prefer fMP4-HLS Media
Container" in playback settings (or strip from custom DeviceProfile).
If frames are buffered but invisible, search for an opaque ancestor
(`.preload`, BLACK-PASS rule covering `.videoPlayerContainer`).
**Q2 — Chrome 148 + `-hls_fmp4_init_filename "X-1.mp4"` MSE compatibility:**
- jellyfin-web #7546 "[Regression] Web browser HLS playback times out
when audio transcoding required - worked in 10.10.7, broken in 10.11.6"
— hls.js times out waiting for the first segment while ffmpeg probes
large files.
https://github.com/jellyfin/jellyfin-web/issues/7546
- jellyfin #14487 "Audio delay don't work with fMP4-HLS."
https://github.com/jellyfin/jellyfin/issues/14487
- jellyfin #16647 "HLS subtitle X-TIMESTAMP-MAP is misaligned when using
fMP4 segments."
https://github.com/jellyfin/jellyfin/issues/16647
- **Verdict: confirmed broken across 10.10.7 → 10.11.x for some
codec/container combos.** Not Chrome-148-specific; the init-filename
pattern itself isn't the bug — the timing between ffmpeg probing and
hls.js segment-load timeout is.
- **Next step:** disable Prefer fMP4-HLS first (single-toggle fix). If
still broken, drop probesize + analyzeduration on the encoder side, or
force ts segments via DeviceProfile TranscodingProfile container=ts.
**Q3 — AV1 DirectStream codec-tag mislabel:**
- jellyfin #15646 "AV1 Video Stream in Wrong Container" — av1 muxed into
mpegts as private-data stream, ffmpeg warning "may not be recognized
upon reading". Workaround: switch hls_segment_type from mpegts to
fmp4 with .m4s extension. Marked closed in UI but in Team Review (no
PR linked, no version-tag yet).
https://github.com/jellyfin/jellyfin/issues/15646
- Codec Support docs reaffirm AV1 web playback is gated on browser
support + correct container.
https://jellyfin.org/docs/general/clients/codec-support/
- **Verdict: confirmed open.** Affects 10.11.3 and back; no PR landed
in 10.10.x line. Mike Nolan Show AV1+Opus matches the failure pattern.
- **Next step:** ban AV1 DirectStream via custom DeviceProfile
(drop AV1 from DirectPlayProfiles → forces server-side libx264 transcode).
**Q4 — "More from Season" CSS class names:**
- jellyfin-web source uses `verticalSection` + `detailVerticalSection`
pair, with `data-type="MusicAlbum|Episode|...".`
https://github.com/tedhinklater/JellyfinThemeGuide
- Layouts reference `.scrollSlider`, `.itemsContainer`, `.padded-left`,
`.sectionTitleContainer` (already in our Iteration 2 fix list).
### INC4 video playback diagnosis (full e2e)
End-to-end test 2026-05-09 ~01:35 UTC. Temp ApiKey
`arrflix-playback-e2e-2026-05-09` (token rotated, deleted at end, verified
SELECT empty). Headless Chromium via playwright drove the SPA login as
guest:123 and clicked .btnPlay on Rick & Morty S1E1 Pilot
(`324f75b84f394a5d9b0749c0679f23b9`). Logs in `/tmp/arrflix-playback-e2e/`.
**Source codec verdict — Rick & Morty Pilot is NOT H.264.** ffprobe inside
container reports the file is HEVC Main 10 / yuv420p10le / 3840x2160 /
TrueHD 5.1 24-bit + AC3 5.1 + AC3 2.0 + PGS subs (4K HDR). Same codec class
as TDK. The task brief assumption ("Rick & Morty likely H.264") is wrong —
this library is 4K HDR remux. Path:
`/media/tv/Rick and Morty (2013)/Season 01/Rick and Morty (2013) - S01E01 - Pilot.mkv`.
**Failure mode at click — playback DOES work, but takes 12-18s to first
frame.** All segments + manifest 200 OK, no console errors, no video.error,
no MediaSource exception, no CSS occlusion (.htmlvideoplayer / `<video>`
display:block opacity:1 visibility:visible z-index:auto, getBoundingClientRect
== full viewport). State timeline (clean run, position reset to 0):
| t (s) | readyState | networkState | currentTime | buffered |
|---|---|---|---|---|
| 2-10 | 0 (HAVE_NOTHING) | 2 (LOADING) | 0 | [] |
| 12 | 3 (HAVE_FUTURE_DATA) | 2 | 0 | [[0, 2.97]] |
| 16 | 3 | 2 | 0.72 | [[0, 5.97]] |
| 22 | 3 | 2 | 6.74 | [[0, 11.99]] |
| 30 | 3 | 2 | 14.75 | [[0, 14.97]] |
With user's actual stored resume position (243.018 s from prior session),
adds a kill+restart cycle: SPA fetches segment 0, sees currentTime=243,
seeks → server kills 1st ffmpeg, launches 2nd with `-ss 00:04:03
-noaccurate_seek -start_number 81`. Browser stays at readyState=1 from
~t=8s to ~t=16s while 2nd ffmpeg produces segment 81. **Total wait ≈ 18s
to first painted frame.** From the user's seat that looks identical to a
broken player.
**Server-side ffmpeg command (verified live in jellyfin logs):**
```
/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -probesize 1G \
-i "/media/tv/Rick and Morty (2013)/Season 01/...Pilot.mkv" \
-map 0:0 -map 0:1 -codec:v:0 libx264 -preset veryfast -crf 23 \
-maxrate 13546858 -profile:v:0 high -level 51 \
-vf "setparams=color_primaries=bt2020:color_trc=smpte2084:colorspace=bt2020nc,\
scale=trunc(min(max(iw,ih*a),min(3840,2160*a))/2)*2:trunc(min(max(iw/a,ih),min(3840/a,2160))/2)*2,\
tonemapx=tonemap=bt2390:desat=0:peak=100:t=bt709:m=bt709:p=bt709:format=yuv420p" \
-codec:a:0 libfdk_aac -ac 2 -ab 256000 \
-hls_segment_type fmp4 -hls_fmp4_init_filename "...-1.mp4" \
-start_number 0 -hls_segment_filename "/cache/transcodes/...%d.mp4" \
-f hls -hls_time 3 ...
```
`HardwareAccelerationType=none` + 4K + tonemapx + libx264 veryfast +
software stereo downmix. **Per-segment encode wallclock observed:** seg0
~6 s, seg1 ~2.05 s. At nullstone Ryzen 5 5600G CPU-only, that's ~50% of
real-time on a sustained run. Browser stalls because new segments arrive
slower than they're consumed.
**PlaybackInfo verdict (browser-emulating DeviceProfile, av1+h264+vp9 both
allowed):** `SupportsDirectPlay=False`, `SupportsDirectStream=False`,
`SupportsTranscoding=True`,
`TranscodeReasons=ContainerNotSupported,VideoCodecNotSupported,AudioCodecNotSupported`,
`TranscodingSubProtocol=hls`, `TranscodingContainer=ts` (when client asks
ts) — but in the headless run the SPA's stock DeviceProfile asks
`SegmentContainer=mp4` (fmp4 path) and the server picked **libx264 H.264
high@5.1 8-bit**, NOT av1. The `VideoCodec=av1,h264,vp9` in the URL is the
priority list; server reads it and selects the first the source can map
to without HW — that's libx264 here, confirmed by `-codec:v:0 libx264` in
ffmpeg cmdline. AV1 is never used as a transcode target on prod.
**Web research corroboration:**
- jellyfin#13324 "Transcoded playback of 4K HDR content fails": "no modern
consumer CPU can transcode 4K HDR to SDR in real time" — software
tonemapping is the bottleneck.
https://github.com/jellyfin/jellyfin/issues/13324
- jellyfin#5067 "HDR Tone Mapping is very slow in Jellyfin (19fps, 70%
cpu)": ~20 fps cap on tonemapx.
https://github.com/jellyfin/jellyfin/issues/5067
- jellyfin docs Hardware Acceleration: software CPU decode + tonemap +
encode at 4K HDR is officially "not supported for sustained real-time".
https://jellyfin.org/docs/general/post-install/transcoding/hardware-acceleration/
**Recommended fix (ordered by reversibility + UX impact):**
1. **Cap user MaxStreamingBitrate to 20 Mbps in jellyfin-web settings.**
Each user → Profile → Playback → Quality → 20 Mbps (or "Auto" with a
default cap). Server-side ffmpeg still runs but `-maxrate 20000000`
matched output bitrate is reasonable and the scale filter clamps to
1080p (1920x800 for the source aspect), eliminating the 4K scale
pass. Reduces per-segment encode wallclock from ~6s → ~1.5s. **Single
toggle, per-user, no server restart, fully reversible.** This is the
right move first.
2. **Force libx264 + transcoding container=ts via DeviceProfile (or in
jellyfin-web settings disable "Prefer fMP4-HLS").** Skips the fmp4
init-segment path which is implicated in jellyfin#16612 / webos#126
for HEVC Main10 sources. `ts` segments self-contain init data —
ssimpler timing.
3. **Disable software tonemapping for libraries with fake-HDR sources.**
Doc 21 already established R&M's `MasteringDisplay/MaxCLL` are absent
(fake AI-upscale HDR). Server-side toggle:
```
ssh user@192.168.0.100 'docker exec jellyfin sh -c "\
sed -i \"s|<EnableTonemapping>true|<EnableTonemapping>false|\" \
/config/config/encoding.xml" && docker restart jellyfin'
```
Removes the tonemapx step from the filtergraph. Output will be SDR-
directly-from-HDR-pixels (washed out per doc 21 — already accepted as
the lesser evil for R&M). Saves ~30% encode CPU at 4K.
4. **(Last resort, deferred to 10.11.8 migration)** Add a CCR-style
"transcode pre-warm" hook: when SPA opens a detail page, pre-issue
`/Items/{id}/PlaybackInfo` + a no-op range request on segment 0 to
start ffmpeg before the user clicks Play. Reduces perceived TTFP.
**Recommended immediate action: option 1 + option 3.** No code change
needed — both are settings flips. After flipping, repro: open Pilot in
Chrome, click Play, time-to-first-frame should be <5s.
**Headless artefact warning:** the `v2-02-after-30s.png` screenshot is
pure black despite readyState=3 + currentTime advancing + buffered=[0,
14.97]. That is because Chromium without GPU does not paint decoded H.264
frames (no compositor target). Real Chrome on real GPU paints. So a
black screenshot from `bin/headless-test.py` after Play is NOT a CSS bug
— it's a headless rendering artefact. Verify CSS occlusion via
`getComputedStyle` + `getBoundingClientRect` instead, both already clean
in this run.
**Open follow-ups left:** AV1+Opus episodes (Mike Nolan Show) still
untested in this iteration — different failure mode (DirectStream
codec-tag mislabel per Q3 above), separate fix path.
https://deepwiki.com/jellyfin/jellyfin-web/3.5-home-sections-and-library-navigation
- BobHasNoSoul/jellyfin-mods uses `#itemDetailPage` parent + nth-of-type
for section targeting.
https://github.com/BobHasNoSoul/jellyfin-mods
- **Verdict: confirmed.** The wrapper is `.verticalSection.detailVerticalSection`
(no `moreFromSeasonSection` class — Jellyfin distinguishes sections by
`data-type` attr, not class). Our INC3 selector list already covers
`.detailVerticalSection*`, so the opaque band is from a DESCENDANT, not
the wrapper itself. Likely candidates: a `.cardScalable`, `.cardBox`, or
`.cardImageContainer` with explicit `background:#000` from BLACK-PASS.
- **Next step:** in DevTools, inspect the opaque band, walk parent chain,
find the first ancestor with non-transparent computed bg. Either
add to transparent-scope or wrap selector in `:not(.cardImageContainer)`.
**Q5 — Themes implementing full-page persistent backdrop:**
- meow.garden "Dynamic backdrops for Jellyfin" — uses
`.detailPagePrimaryContainer .detailImageContainer .blurhash-canvas {
position: fixed !important; opacity: .5; }` to repurpose the blurhash
placeholder as a fullscreen backdrop.
https://meow.garden/jellyfin-dynamic-backdrops/
- Cineplex theme custom.css: targets `.backgroundContainer`,
`.backgroundContainer.withBackdrop`, `.backdropImage`, `.blurhash-canvas`
(commented out). Mobile-only `.itemBackdrop` mask gradient.
https://github.com/MRunkehl/cineplex
- Finity theme: minimal docs, refers to "gradient mask for show backdrops"
but actual selectors live in CSS files (not exposed in README).
https://github.com/prism2001/finity
- **Verdict: confirmed.** Two viable patterns:
(1) pin `.backgroundContainer` (our current INC2 approach) — works but
must transparent-scope every ancestor;
(2) repurpose `.blurhash-canvas` as the fixed layer (meow.garden) —
cleaner because blurhash is already per-item; survives section navigation
without scroll math.
- **Next step:** if INC3 transparent-scope keeps regressing, switch to
blurhash-canvas pin. One selector vs ~20 wrappers to keep transparent.
**Q6 — 10.10.3 → 10.10.7 worth bumping?**
- 10.10.7 forum announcement (2025-04-05): security release, "several
bugfixes." Trusted-proxies config required pre-upgrade.
https://forum.jellyfin.org/t-new-jellyfin-server-web-release-10-10-7
- Compare-page diff (v10.10.3...v10.10.7) didn't generate (too long).
Releasebot lists per-release notes:
https://releasebot.io/updates/jellyfin/jellyfin-server
- Most fMP4/HLS fixes in our research target 10.11.x line, not 10.10.x
patch series.
- **Verdict: probable mild improvement, not a fix for our bugs.** Worth
bumping for security/CVE coverage but unlikely to resolve black-screen
or carousel-band. The known regressions of 10.11.x (`#7546`, `#16612`)
argue against jumping straight to 10.11.8 without dev validation.
- **Next step:** snapshot DB, bump dev to 10.10.7 first. If still broken,
10.11.8 is roadmap path with ElegantFin theme swap.
**Q7 — Force-transcode-everything DeviceProfile:**
- Jellyfin docs confirm there's no built-in admin toggle to force
transcoding for all clients.
https://jellyfin.org/docs/general/post-install/transcoding/
- forum.jellyfin.org/t-force-trasnscoding-or-disable-directplay: community
workaround is reduce client max bitrate to 1Mbps (degrades quality) —
no clean DeviceProfile-only override.
https://forum.jellyfin.org/t-force-trasnscoding-or-disable-directplay-x265-stuttering-firetv
- jellyfin-web #7651 "Chrome DeviceProfile hardcodes MKV in
DirectPlayProfiles": JS-Injector plugin removes entries client-side
before PlaybackInfo POST. Workaround pattern is generalisable: hook
PlaybackInfo XHR, set `DirectPlayProfiles=[]`, leave only
`TranscodingProfiles` with H264 mp4/HLS. Server then has nothing to
match → forces transcode.
https://github.com/jellyfin/jellyfin-web/issues/7651
- **Verdict: confirmed pattern, no native config knob.** Server-side
empty DirectPlayProfiles in a custom DeviceProfile is the cleanest
bypass; only ts-format TranscodingProfile remaining → libx264.
- **Next step:** create custom DeviceProfile in admin → DLNA → Profiles
with empty DirectPlay + a single TranscodingProfile (Container=mp4,
VideoCodec=h264, AudioCodec=aac, Protocol=Hls). Match to Identification
by browser UA. Eliminates codec compat as a variable in one move and
is the cleanest test for "is the bug in our codec path or our renderer".
---
After INC1 (`:has()` transparent-scope) shipped and prod showed backdrop on
detail-page top, owner reported "in the middle of the More from Season 1
is black, it's hiding the artwork". Below-the-fold sections (Next Up, Seasons,
More Like This) showed solid black instead of continuing the backdrop.
### Root cause (INC2)
`.backdropContainer` defaults to non-fixed positioning — it scrolls out of
view. INC1 made wrappers transparent so backdrop showed through, but only
where the backdrop EXISTED in the DOM viewport. Once user scrolls down,
backdrop is above viewport, sections see body's `#000` bg.
### Fix INC2
Pin `.backdropContainer` + `.backgroundContainer` to `position: fixed; top:0;
height:100vh; z-index:0`. Added `::after` vertical gradient (transparent at
top → 75% black at bottom) so text remains readable as user scrolls into
backdrop area.
### Root cause (INC3)
INC2 alone didn't fix it visually — section wrappers (`.detailVerticalSection`,
`.scrollSliderContainer`, `.padded-bottom-page`, `.itemsContainer` etc) still
painted opaque bg from BLACK-PASS + finity. Pinned backdrop sat behind, but
sections occluded it section-by-section.
### Fix INC3
Extended transparent-scope to all detail-page sub-sections:
`.itemDetailPage > *`, `.detailPageContent`, `.detailPagePrimaryContainer`,
`.detailPageWrapperContainer`, `.detailVerticalSection*`, `.detailSection*`,
`.itemsContainer`, `.scrollSlider*`, `.padded-bottom-page`,
`.sectionTitleContainer`, `.detailRibbon`, `.subtitleAudioContainer`,
`.detailPageRoot`.
### Verification (INC2 + INC3)
Updated `bin/headless-test.py` to take TWO viewport screenshots: top-of-page
+ scrolled to 50% page height. With INC2/INC3 applied, scrolled screenshot
shows R&M backdrop persisting behind "Seasons" + "More Like This" sections
(previously: solid black).
### Lesson learned
When pinning a backdrop with `position:fixed`, transparency must extend
RECURSIVELY through every wrapper ON TOP of the backdrop layer, not just the
top-level page wrappers. Test with scrolled screenshot — full-page screenshot
in playwright stretches viewport and hides `position:fixed` issues.
`bin/headless-test.py` now takes both top + scrolled. Use both to bisect.
---
doc 26 INC4: black band + 4K HDR slow transcode + v2 test + methodology audit Two regressions slipped through INC1-3: INC4a -- BLACK BAND behind every detail-page carousel Pre-existing 2026-05-08 home-page rule painted .emby-scroller {bg:#000 !important} UNSCOPED. Hits every carousel inside .itemDetailPage incl admin-only More from Season N, More Like This. INC1-3 transparent-scope list missed .emby-scroller / .verticalSection / .padded-top-focusscale. Fixed by extending scope. INC4b -- VIDEO 'BLACK SCREEN' on play Not actually black-screen. CPU-only nullstone cannot sustain real-time 4K HEVC HDR tonemap+x264 transcode -- 0.5x realtime, ffmpeg takes ~6s per 3s segment. With user resume seeks adding restart overhead, total wait ~18s before browser readyState rises. User saw black, gave up. Fix: disable EnableTonemapping (R&M fake HDR per doc 21) + cap RemoteClientBitrateLimit=20Mbps on every user (1080p target, no 4K scale). Headless v2 test confirms HEVC + AV1 episodes now hit readyState=3/4 within wait window; 4K HDR R&M still slow (heaviest). INC4 testing methodology audit -- bin/headless-test-v2.py v1 only logged in as guest and never clicked Play. v2 runs both admin and guest, walks 3 codec-tagged items per role (HEVC/AV1/H.264), clicks Play, captures <video> state, sweeps DOM for opaque bgs over backdrop layer. False positives: off-viewport #reactRoot + collapsed .mainDrawer (negative coords). Allowlist refinement TODO. Open: 4K HDR sources still slow even post-fix. Real fix path = pre- transcode masters to 1080p H.264 SDR via separate batch, OR migrate to 10.11.8 with vaapi/qsv driver fixed.
2026-05-09 01:46:47 +01:00
### INC4 black-band locator (2026-05-09)
**Symptom.** After INC3, owner reported that for ADMIN users a wide black
band (~250px tall, full-width) still painted around the "More from Season 1"
carousel on the Rick & Morty detail page (admin-only carousel; guest users
don't see it). Cards rendered fine, only the BAND around them was opaque.
**Diagnostic method.** Inserted temp `arrflix-band-diag-2026-05-09` ApiKey,
logged in as admin via playwright, navigated to R&M detail page, scrolled
all sections into view, then walked DOM upward from each `.scrollSlider`
restricted to the `.itemDetailPage` subtree, reporting every ancestor with
non-transparent background. Locator script: `/tmp/arrflix-band-locator.py`.
**Result.** Single opaque-black wrapper found, identical for ALL 9
carousels (Schedule / Next Up / Seasons / Additional Parts / Lyrics /
Cast & Crew / Special Features / Music Videos / Scenes / **More Like This** /
**More from Season** / **More from Artist**):
```
div.padded-top-focusscale.padded-bottom-focusscale.no-padding.emby-scroller
bg = rgb(0, 0, 0) pos = static z = auto
rect = x:80 y:1242 1488×333 (matches the band the user described)
```
**Root cause.** Pre-existing CSS rule in `branding.xml` from 2026-05-08
labelled `/* kill gray band behind home-page Recently Added rows */` applied
`.emby-scroller { background: #000 !important; }` UNSCOPED. INC3 overrode
its sibling wrappers (`.detailVerticalSection`, `.itemsContainer`,
`.scrollSlider`, `.scrollSliderContainer`) but missed the IMMEDIATE PARENT
`.emby-scroller`. That single wrapper was the band.
**Fix INC4.** Detail-page-scoped transparent override appended to CustomCss
after the INC3 block:
```css
.itemDetailPage .emby-scroller,
.itemDetailPage .emby-scroller-container,
.itemDetailPage .verticalSection,
.itemDetailPage .padded-top-focusscale,
.itemDetailPage .padded-bottom-focusscale,
.itemDetailPage .moreFromSeasonSection,
.itemDetailPage .moreFromArtistSection,
.itemDetailPage .scrollSliderContainer,
.itemDetailPage .scrollButtonContainer {
background-color: transparent !important;
background: transparent !important;
}
```
No `position:relative; z-index:1` needed on `.emby-scroller` — the parent
`.detailPageWrapperContainer` already has `position:relative; z-index:2`,
which is above the pinned `.backdropContainer` at `z:0`. Removing the opaque
fill alone is sufficient.
**Verification.** Re-ran band-locator after `docker restart jellyfin`
`opaqueBlackBands: 0` inside `.itemDetailPage` (was 1). Screenshot of R&M
detail page at mid-scroll now shows portal/Easter Island backdrop continuous
behind every carousel including "More Like This". Cleaned up the
`arrflix-band-diag-2026-05-09` ApiKey row.
**Patch lines added** to `bin/apply-26-incident-fixes.sh` so re-runs are
idempotent and recover from `branding.xml` drift.
**Lesson.** When a prior unscoped `background: #000 !important` rule exists
in a shared CSS bucket (here: `branding.xml CustomCss`), grep the file for
the property/selector BEFORE writing a new transparent-scope rule. A
DOM-walking locator script that reports every opaque ancestor of the target
finds the painter in seconds — much faster than guessing selectors. Going
forward: when adding a "paint opaque" rule, scope it from day one
(`.homePage .emby-scroller`, not bare `.emby-scroller`).
---
## Open follow-ups (for separate sessions)
- **AV1+Opus playback** (Bug E): Chrome's AV1 DirectStream codec-tag mislabel
bug. Fix options: (a) ban AV1 DirectStream via DeviceProfile (force x264
transcode), (b) re-encode MNS source to H.264, (c) wait for 10.11.8
upgrade. See agent finding in this doc → "Playback diagnosis".
- **10.11.8 migration**: current 10.10.3 has known issues per online research
(TMDB scrape regression #14922, custom CSS injection #7220). 10.11.8 is
current stable as of 2026-05-09 with CVE fixes. Plan: dev first, snapshot
EF Core DB migration, swap Cineplex → ElegantFin (10.11-supported), promote
to prod after verified.
- **Permanent SW kill option** (deferred — stock SW doesn't actually
intercept anything): if a future Jellyfin update enables a real fetch-handler
SW, we have the recipe in this doc → "SW kill recipe" agent finding.
- **Session-state backup off-host** (ROADMAP H4): no automated backup yet.
Today's incident was rescued by inline `cp X X.bak.$(date +%s)` for both
branding.xml and dynamic.yml — should be systematized.
doc 26 INC4: black band + 4K HDR slow transcode + v2 test + methodology audit Two regressions slipped through INC1-3: INC4a -- BLACK BAND behind every detail-page carousel Pre-existing 2026-05-08 home-page rule painted .emby-scroller {bg:#000 !important} UNSCOPED. Hits every carousel inside .itemDetailPage incl admin-only More from Season N, More Like This. INC1-3 transparent-scope list missed .emby-scroller / .verticalSection / .padded-top-focusscale. Fixed by extending scope. INC4b -- VIDEO 'BLACK SCREEN' on play Not actually black-screen. CPU-only nullstone cannot sustain real-time 4K HEVC HDR tonemap+x264 transcode -- 0.5x realtime, ffmpeg takes ~6s per 3s segment. With user resume seeks adding restart overhead, total wait ~18s before browser readyState rises. User saw black, gave up. Fix: disable EnableTonemapping (R&M fake HDR per doc 21) + cap RemoteClientBitrateLimit=20Mbps on every user (1080p target, no 4K scale). Headless v2 test confirms HEVC + AV1 episodes now hit readyState=3/4 within wait window; 4K HDR R&M still slow (heaviest). INC4 testing methodology audit -- bin/headless-test-v2.py v1 only logged in as guest and never clicked Play. v2 runs both admin and guest, walks 3 codec-tagged items per role (HEVC/AV1/H.264), clicks Play, captures <video> state, sweeps DOM for opaque bgs over backdrop layer. False positives: off-viewport #reactRoot + collapsed .mainDrawer (negative coords). Allowlist refinement TODO. Open: 4K HDR sources still slow even post-fix. Real fix path = pre- transcode masters to 1080p H.264 SDR via separate batch, OR migrate to 10.11.8 with vaapi/qsv driver fixed.
2026-05-09 01:46:47 +01:00
---
## Iteration 2
### INC4 testing methodology audit
This iteration is a meta-audit on the test that signed off Iteration 1.
After INC1INC3 shipped, owner reported two regressions the headless test
did NOT catch:
1. A black band painted behind the **"More from Season N"** carousel on
detail pages.
2. **Video plays as a black screen** on the user's actual TV episode
content (AV1+Opus from Mike Nolan Show), even though the test claimed
playback was fixed.
This section documents what the v1 test missed, why those gaps existed,
what `bin/headless-test-v2.py` changes, and the preflight protocol every
future fix must pass before claiming "verified".
#### a) What v1 missed
| Gap | Concrete consequence |
|---|---|
| Logged in **only** as `guest` (non-admin restricted user). | The "More from Season N" carousel is admin-visible content. `guest`'s permissions hid it from the DOM, so the section wrapper that painted the black band never rendered during the test. v1 reported "no regression" because the offending element wasn't on the page it screenshotted. |
| **Never clicked Play.** v1 only loaded the detail page, took screenshots, scraped a small fixed selector list. | A `<video>` element that fails to decode (AV1 in Chrome with mislabelled codec tag, per Bug E in this doc) won't show up unless you actually start playback. v1 had no way to observe `video.error`, `video.readyState`, `videoWidth/Height`, or `currentTime` because the player was never instantiated. |
| **Only one item tested.** v1 auto-picked the first Series and probed its detail page. | Codec coverage was random — usually whatever happened to be first alphabetically. The HEVC movie that worked (Dark Knight) and the AV1 episode that didn't (Mike Nolan Show) had different failure modes; v1 couldn't distinguish them because it tested neither systematically. |
| **Hardcoded selector list** for DOM probe. | v1 inspected ~22 known selectors. Any new section wrapper (e.g. `.moreFromSeasonContainer`) painting an opaque background outside that list was invisible. The black band lived in a wrapper v1 didn't even know existed. |
| **No structured pass/fail criterion.** v1 emitted `probe.json` with raw computed-style snapshots; humans had to read it and decide. | "I declared playback fixed" — that human decision had no machine-verifiable backing. There was no JSON field saying `regressions: []` that owner / next-Claude could trust without re-deriving from raw data. |
| **No cross-reference to a known-good baseline.** | Even if v1 had caught the band, there was no golden-image comparison to alert "this looks different from last passing run". Detection relied on someone eyeballing the screenshot. |
#### b) Why those gaps existed
- **Speed-bias.** v1 was written under time pressure as the third-tier
verification of an INC3 CSS fix. The minimum viable test was "page
loads and looks right at top + scrolled". That worked for the visual
bug it was designed against — and stopped there.
- **No threat model for the test itself.** The test never asked "what
classes of regression CAN I detect, what classes CAN'T I". If it had,
the missing-Play and admin-only-content gaps would have been obvious.
- **Single-account convenience.** `guest-mirror` was the easiest creds
to hand because doc 17 had just minted them. Re-using one role across
the whole verification was the path of least resistance.
- **Selector tunnel-vision.** The selector list was copied from the
previous fix's diagnostic queries (INC2/INC3). It tracked what the
previous bugs touched, not what the current page actually rendered.
- **Server-log success treated as proof of client success.** Bug E was
declared "fixed" because Dark Knight transcoding logs looked clean.
No one closed the loop and confirmed the user's actual content
(Mike Nolan Show / AV1) decoded in a real browser.
#### c) What v2 changes (`bin/headless-test-v2.py`)
| Improvement | Mechanism |
|---|---|
| **Multi-user coverage** | Runs the entire probe twice: once as admin (`s8n` / `s8n-dev`), once as non-admin (`guest` / `guest-mirror`). Per-user screenshots + `probe.json`. Computes a `section_title_diff` listing which sections rendered for one role but not the other — that diff is the canonical alert for "you're missing admin-only content". |
| **Click Play + observe** | After detail page settles, locates `.btnPlay` / `[data-action="play"]`, clicks (with keyboard `p` fallback), waits 10 s, then reads `<video>` element state: `currentTime`, `paused`, `ended`, `readyState`, `networkState`, `videoWidth`, `videoHeight`, `error.code`, `buffered_ranges`. Also captures a `*-play.png` screenshot and accumulates new console / network errors during the playback window. |
| **Multiple-item coverage** | Three items per role: HEVC movie (Dark Knight, hardcoded id `7aa5add2c2d8575eda5280b9b9072071`), AV1 episode (auto-picked from Mike Nolan Show), H.264 episode (auto-picked from a different series). Codec types are labelled in JSON so failures can be attributed to a codec class, not "the test failed". `ITEMS=` env var overrides for ad-hoc runs. |
| **Section-bg sweep** | At scroll-bottom, walks `document.querySelectorAll('*')` and reports every visible element with non-transparent `backgroundColor` whose rect overlaps the viewport. Filters via a small `BG_ALLOWLIST` (video player, dialogs, header) and a darkness heuristic (R+G+B < 90 likely a black-band regression). Output goes into `probe.json` under `runs[].items[].regressions`. |
| **Golden-screenshot diff** | If `OUT/golden/<key>-{top,mid,bot,play}.png` exists, the run computes a Pillow `ImageChops.difference`, writes a diff PNG, and emits `{bbox, ratio}` per shot. Maintainer can populate goldens after the next clean run; subsequent runs flag drift quantitatively. |
| **Structured pass/fail JSON** | `probe.json` now has stable shape: `{url, runs:[{role, user, is_admin, items:[{kind, probe, play, regressions, diffs_vs_golden}]}], section_title_diff, issues, exit_code}`. `grade()` produces `issues[]` and exits 0/2 deterministically. CI / orchestration can `jq '.issues | length' probe.json`. |
| **Documented invariants up front** | The script header explicitly lists "what v1 missed and how v2 closes it" so the next person reading it doesn't repeat the speed-bias trap. |
#### d) Preflight protocol — do this before claiming any ARRFLIX fix is "verified"
Treat this list as a hard gate. If any step is skipped, the fix is
**unverified**, not "fixed".
1. **Run v2 with both roles.** `bin/headless-test-v2.py https://dev.arrflix.s8n.ru`.
Confirm exit code 0 AND `probe.json .issues` is empty. If exit code 2,
read `.issues[]` — those are concrete regressions, not flaky test noise.
2. **Inspect `section_title_diff`.** A non-empty `only_admin` array means
the admin sees content the guest doesn't — that section MUST be
verified visually in the admin screenshots, because guest-only testing
would have been blind to it.
3. **Confirm playback per codec.** For each item in `runs[].items[]`,
`play.video.readyState` must be ≥ 2 AND `play.video.error` must be
`null`. `paused` is acceptable iff `currentTime > 0` (autoplay policy
may pause after the first frame, but a frame DID render). `videoWidth`
and `videoHeight` must be > 0 — that's the canonical "actually
decoding" check.
4. **Sweep flagged dark backgrounds.** Any element in
`runs[].items[].regressions` that is not a known overlay (dialog,
video player chrome, drawer header) is a candidate band-bg
regression. Add it to `BG_ALLOWLIST` only if the design genuinely
intends it to be opaque; otherwise fix the CSS.
5. **Diff against goldens.** If `diffs_vs_golden[].ratio` for any shot
exceeds your threshold (start at 0.02 = 2% pixels changed), open the
`*-diff.png` and confirm the change was intended.
6. **Run on prod after dev passes.** Same script, same expectations:
`bin/headless-test-v2.py https://arrflix.s8n.ru`. Dev mirror exists
(doc 12 / doc 17) precisely so you can verify there first.
7. **Only THEN write "verified" in the doc.** Always cite the run's
`probe.json` path and exit code in the verification note. Future-you
needs to be able to re-run the exact same gate.
Three single-sentence rules carved out of this protocol, for posters on
the wall:
- **Always test as both admin and non-admin** — admin-only sections are
invisible to guests, and a fix that breaks admin-only content will not
be detected by guest-only tests.
- **Always click Play** — page-load is necessary but not sufficient;
black-screen playback only manifests after `<video>` is instantiated
and a frame is requested.
- **Always sweep ALL backgrounds** — fixed-list selector probes only
catch regressions in selectors you already knew about, which is the
opposite of what a regression test is supposed to do.
## Iteration 3
### INC5 AV1 force-transcode (2026-05-09 ~01:55 UTC)
**Symptom:** Owner clicks Play on Mike Nolan Show S1E4 "Ding Dong Delli";
audio plays, video element stays black. Diagnosed as
[jellyfin#15646](https://github.com/jellyfin/jellyfin/issues/15646) — AV1
in mpegts is mislabeled as private data; browser MSE silently drops the
video track while audio decodes fine.
**Path chosen:** *Nuclear / re-encode source files.* DLNA `system/`
profiles directory does not exist in this 10.10.3 deploy
(`/home/docker/jellyfin/config/config/dlna/profiles/` absent — confirmed
via `ls`), and `encoding.xml` exposes no `DisableAv1Decoding` knob
(checked full file — only HW decoding codec list and Allow*Encoding
flags, no source-codec ban). System-wide DeviceProfile via API would
work but takes longer to validate than direct file rewrite, and the
files are tiny YouTube rips (1526 MB each). Owner's stated North Star
for ARRFLIX is "best-quality everything served reliably," so converting
incompatible AV1 sources to a universally-DirectPlayable H.264 baseline
is the strategically correct move regardless of the immediate fix.
**Confirmed AV1 source for all 3 S1 episodes via ffprobe:**
```
S01E02 FTC codec_name=av1 / opus
S01E04 Ding Dong Delli codec_name=av1 / opus profile=Main 1920x1080 yuv420p
S01E05 Lantana Bush codec_name=av1 / opus
```
**Re-encode command** (run inside `jellyfin` container so shared bind
mount is writable; ffmpeg from `/usr/lib/jellyfin-ffmpeg/`):
```bash
docker exec -w "/media/tv/The Mike Nolan Show (2016)/Season 01" jellyfin \
/usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -y \
-i "<ep>.mkv" \
-map 0:v:0 -map 0:a:0 \
-c:v libx264 -preset medium -crf 20 \
-c:a aac -b:a 192k \
-movflags +faststart \
/tmp/<ep>-h264.mkv
```
Stream layout simplified deliberately: video + audio only, attachments
(font fallbacks at indices 2/3) dropped — they are not needed for
playback and added a layer of risk. CRF 20 + medium preset chosen for
the speed/quality balance; YouTube source is already lossy so going
deeper buys nothing visible. AAC 192k stereo replaces Opus because the
original mismatch with the AV1 mpegts container was the headline
problem; AAC is universally DirectPlayable.
**Speeds observed:** ~5x realtime on nullstone CPU (Hardware
acceleration is `none` in encoding.xml — see Known Issues). 5m28s of
1080p ran in ~70s wall. Output sizes 8.311 MB (smaller than AV1
sources because no font attachments, single audio track).
**Atomic swap** (each episode):
```bash
docker cp jellyfin:/tmp/<ep>-h264.mkv "<dir>/.<ep>.tmp"
mv "<original.mkv>" /tmp/<ep>-av1-original-$(date +%s).mkv.bak
mv "<dir>/.<ep>.tmp" "<original.mkv>"
```
Originals retained at `/tmp/S01E0{2,4,5}-av1-original-1778288{113,184}.mkv.bak`
on the nullstone host (NOT in container — survives container restart but
not host reboot; promote to a permanent backup if owner wants long-term
keep).
**Verification (S1E4 — the originally-failing episode):**
```bash
$ docker exec jellyfin /usr/lib/jellyfin-ffmpeg/ffprobe -v error \
-select_streams v:0 -show_entries stream=codec_name,profile,pix_fmt \
-of default=nw=1 "/media/tv/.../S01E04 - Ding Dong Delli.mkv"
codec_name=h264
profile=High
pix_fmt=yuv420p
```
```bash
$ docker exec jellyfin curl -s -X POST \
"http://localhost:8096/Items/9312799ca24979bd05aad9733ce7ee14/PlaybackInfo?UserId=2BE0F0D3-FE3A-45DC-9298-138A15A01925&MaxStreamingBitrate=120000000&api_key=<key>" \
-H "Content-Type: application/json" \
-d '{"DeviceProfile":{"DirectPlayProfiles":[{"Container":"mkv","Type":"Video","VideoCodec":"h264","AudioCodec":"aac,mp3,opus"}], ...}}'
# Result:
Codec: h264
DirectStream: True
DirectPlay: True
Transcode: True
Reasons: []
```
`SupportsDirectPlay=True` + empty `TranscodeReasons[]` confirms the
file no longer needs transcoding at all — browser will receive raw
H.264/AAC inside the mkv container, decode natively, and render frames.
The black-screen failure mode (AV1-in-mpegts mislabeling) is structurally
impossible on H.264 sources.
**`/Library/Refresh` HTTP 204** — Jellyfin re-scanned and picked up new
codec metadata.
**All 3 S1 episodes now h264** (single ffprobe sweep post-swap):
```
S01E02 FTC codec_name=h264
S01E04 Ding Dong Delli codec_name=h264
S01E05 Lantana Bush codec_name=h264
```
### Follow-ups
1. **Owner click-test.** Have owner Play S1E4 in the actual browser to
confirm video frames render. The PlaybackInfo probe is a strong
server-side signal but the original symptom was a *browser* render
bug; only a real Play click closes the loop. Flag for INC5-verify.
2. **Sweep entire library for AV1.** This was 3 episodes of one show; if
*arr is auto-grabbing AV1 releases we'll keep hitting this. Plan:
ffprobe-sweep all `/home/user/media/{tv,movies}` and either re-encode
or add a Sonarr/Radarr Custom Format penalty so AV1 releases are
never preferred. Tracked separately.
3. **Permanent backup of `*-av1-original-*.mkv.bak`.** Currently in
nullstone `/tmp` — host reboot will lose them. If owner wants
originals retained, move to `/home/user/media/.archive/av1-originals/`
or similar.
4. **Ban AV1 server-side anyway.** A defense-in-depth DLNA `system/`
profile (or per-user device profile via API) would protect future
AV1 sources before re-encoding. Defer until #2 produces a count of
how often this happens in practice.
5. **Hardware encoding still off.** `encoding.xml` shows
`HardwareAccelerationType=none`. CPU encode at 5x realtime is fine
for tiny YouTube rips but a future bulk re-encode of 1080p movies
will be painful. Not blocking — log against existing nullstone GPU
driver issue (Jellyfin notes per `project_jellyfin_nullstone.md`).
### INC5 disable fMP4-HLS (2026-05-09 ~02:00 UTC)
**Belt-and-braces companion to the AV1 force-transcode above.** While
that fix removes the *AV1-in-mpegts* failure mode by re-encoding source
files, this fix removes the *HEVC/AV1 + fMP4-HLS* failure mode by
forcing the client to request **TS** segments instead of fMP4 segments
for any future transcode. Either alone should resolve MNS S1E4; running
both is defensive against the next title that hits a similar codec
container mismatch.
**Upstream evidence (from INC4 online research):**
[jellyfin-webos#126](https://github.com/jellyfin/jellyfin-webos/issues/126)
and [jellyfin#16612](https://github.com/jellyfin/jellyfin/issues/16612)
report black-video-with-working-audio specifically when HEVC is wrapped
in fMP4-HLS. Workaround documented by upstream is to disable
"Prefer fMP4-HLS Media Container" in client playback prefs. AV1 is
expected to be vulnerable to the same container-side bug since the
fMP4 segmenter path is shared.
**Server confirmation (before fix):**
```bash
$ ssh user@192.168.0.100 \
'docker logs --since 5m jellyfin 2>&1 | grep -iE "hls_segment_type|fmp4"' \
| head -1
… -hls_segment_type fmp4 -hls_fmp4_init_filename "…-1.mp4" \
-hls_segment_filename "…%d.mp4" …
```
Confirms server is currently emitting `*.mp4` (fmp4) segments — the
affected codepath.
**Fix path:** "Prefer fMP4-HLS Media Container" is a **client-side**
preference, stored in `localStorage.enableHlsFmp4`. Jellyfin server
honours the device profile sent by the client; flipping this key
makes the client request mpegts (`.ts`) segments and the server
responds with `-hls_segment_type mpegts`. No server config / DLNA
profile edit needed. Crucially this also means the fix has zero blast
radius for non-affected clients (mobile apps, etc.) — they ignore the
web-only localStorage shim.
**Implementation (`web-overrides/index.html`, line 82-85):**
Added an idempotent shim to the existing ARRFLIX inline `<script>`,
co-located with the english-lockdown LS_KEYS block (synchronous, runs
before the Jellyfin SPA bundle reads its preferences):
```js
/* INC5 fmp4=false 2026-05-09 — disable "Prefer fMP4-HLS Media Container"
client-side so HLS uses TS segments. Works around HEVC+fMP4
black-video bug (jellyfin-webos#126, jellyfin#16612). */
try { localStorage.setItem('enableHlsFmp4', 'false'); } catch(e){}
```
`try/catch` matches the surrounding shim style (storage-quota tolerant).
**Deploy:** `scp` to nullstone
`/opt/docker/jellyfin/web-overrides/index.html` (bind-mounted into the
container — no restart required). Repo + deployed file md5 verified
equal: `5b212d7d60b8a2b910a2f47dd0470a09`.
**Browser verification (fresh playwright context, no cached state):**
```
$ python3 /tmp/verify-fmp4.py
localStorage.enableHlsFmp4 = 'false'
localStorage.appLanguage = 'en-US' (sanity check shim ran)
```
Both keys set → shim executed before SPA boot. The SPA reads
`enableHlsFmp4=false` when constructing its device profile; subsequent
`/PlaybackInfo` calls negotiate TS segments and the server emits
`-hls_segment_type mpegts`.
**Headless smoke (`bin/headless-test-v2.py`):** No new regressions
introduced. Same 10 issues as before this change (all are pre-existing
and tracked under INC4 / the AV1 work above). Probe artefact:
`/tmp/arrflix-fmp4-test/probe.json`.
**Owner action:** Hard-reload browser (Ctrl+Shift+R) and re-test
MNS S1E4. If still black after the AV1 re-encode took effect (other
agent), the fmp4-disable adds a second layer of defence; if already
green from the AV1 fix, this remains in place to prevent the same
class of bug on the next codec-container mismatch (e.g. an HEVC movie
that the device profile doesn't DirectPlay).
**Repo commit:** `web-overrides/index.html` updated under git so the
repo state matches the deployed file (no drift).