Agent 6 applied SW-pin fix and marked verified via element state (currentTime advancing, videoWidth=1920, readyState=4). Headless pixel histogram still showed darkPct=100% — element decoded fine but CSS overlay covered it. Real cause: branding.xml BLACK-PASS paints .libraryPage with #000 !important. Jellyfin OSD page renders <div id=videoOsdPage class=libraryPage>; class match -> opaque black div above <video>. Fix: extend transparent-scope using :has(.htmlVideoPlayer) + #videoOsdPage selector. Post-fix darkPct=9.8% (was 100%), MNS S1E4 video frame visually paints. Removed INC6 clear-cache-only middleware (no longer needed, was burning HTTP cache every visit). bin/apply-26-incident-fixes.sh extended with INC7 patch (idempotent re-apply if branding.xml ever drifts back). Lesson: video-element state alone is insufficient verification. Always sample pixel histogram + canvas drawImage on the painted viewport.
28 — Prod vs Dev Playback Divergence (2026-05-09)
Diff hunt:
arrflix.s8n.ru (prod, BLACK SCREEN on high-quality video) vs dev.arrflix.s8n.ru (dev, plays fine). Same image jellyfin/jellyfin:10.10.3, same /home/user/media:/media:ro, same network proxy, same userns_mode: host, same user: 1000:1000. The difference is therefore in container env, bind-mounts, Traefik routing, server config XML, or per-user policy stored in jellyfin.db. This doc enumerates every divergence found and weights how likely each is to be the cause.
Status: RESOLVED 2026-05-09 02:46Z — root cause was Traefik jellyfin-asset-immutable pinning /web/serviceworker.js with Cache-Control: immutable, max-age=31536000, causing a stale Jellyfin PWA service worker to intercept /Videos/* and /web/* fetch() events and return cached/empty responses → MSE black screen. Patched in dynamic.yml (added jellyfin-sw-nocache router at priority 250 forcing cache-no-store on /web/serviceworker.js + /web/sw.js). Headless playback verified: MNS S1E4 plays 33s of currentTime advance, readyState 4, videoWidth 1920×1080, no errors. See "Final fix applied + verification" section at the bottom of this doc.
Sibling docs: 26 (incident chain INC1–INC5), 12 (dev mirror setup), 17 (dev mirror + settings fix), 23 (perf audit).
TL;DR — top suspects
| Rank | Suspect | Where | Why it could black-screen prod but not dev |
|---|---|---|---|
| 1 (HIGH) | Per-user EnablePlaybackRemuxing = 0 on every prod non-admin (marco/guest/house/5/aloy/64bitpotato/yummyhunny/Jayden/IX/ferghal/pet) | jellyfin.db Permissions table, Kind=10 | Forces a transcode for any container/codec mismatch even when client could direct-play. Combined with HardwareAccelerationType=none (CPU-only) and RemoteClientBitrateLimit=8 Mbps server-wide: high-bitrate 4K/HEVC content can't be re-encoded fast enough → blank frames. Dev test user has Kind 10 = 1 (remux ON) so it always direct-plays. |
| 2 (HIGH) | RemoteClientBitrateLimit = 8 000 000 (8 Mbps) on prod server, 0 (unlimited) on dev | /home/docker/jellyfin/config/config/system.xml line 137 | Owner's reported symptom is "high-quality video" fails. 4K/H265 source bitrates routinely exceed 20–60 Mbps. Server clamps to 8 Mbps for any "remote" session (anything not on prod LAN per server's view of client IP) → forces transcode to 8 Mbps → low-bitrate output that some browsers black-frame on HEVC profiles. Bizarrely, the per-user Users.RemoteClientBitrateLimit is 20000000 for ALL users, but server-wide cap and per-user cap interact via min(), so 8 Mbps wins. |
| 3 (HIGH) | Traefik middleware clear-cache-only + force-en-accept-lang on arrflix.s8n.ru, NOT on dev | /opt/docker/traefik/config/dynamic.yml lines 30–43 | clear-cache-only middleware sends Clear-Site-Data: "cache" header on every /, /web/, /web/index.html, /web/sw.js, /web/manifest.json hit. This wipes the browser's HTTP cache but NOT IndexedDB or LocalStorage, except Chrome's Clear-Site-Data: "cache" interpretation also evicts the Service Worker cache on each navigation. Jellyfin's PWA SW caches the JS bundle. SW eviction mid-session can cause MediaSource.appendBuffer to fail mid-stream → black video. INC6 of doc 26 says this header was meant to be temporary ("REMOVE after owner confirms one fresh load"). It was never removed. |
| 4 (MED) | Prod branding.xml has 285 extra lines of CSS including position: fixed; z-index: 0 on .backdropContainer / .backgroundContainer | /home/docker/jellyfin/config/config/branding.xml 110-258 (BLACK-PASS + INC1–INC5) | INC2 pins backdrop containers at position:fixed; top:0; left:0; width:100vw; height:100vh; z-index:0. The HTML5 <video> lives in .htmlVideoPlayerContainer whose z-index is theme-dependent. If the prod backdrop pin happens to overlay it, the player renders behind the backdrop → black screen. Dev's branding.xml is minimal (only the Abspielen ::after override) so it can't occlude. |
| 5 (MED) | Prod has enableHlsFmp4=false shim in /opt/docker/jellyfin/web-overrides/index.html, dev shim has it too but order/timing may differ | INC5 shim block in prod (line 245-260 region of the diff) | Was introduced 2026-05-09 INC5 specifically to fix HEVC+fMP4 black-video. If the shim's localStorage.setItem('enableHlsFmp4','false') ran AFTER the player initialized, or if Cineplex/finity caches the value, fMP4 is still chosen → HEVC inside fMP4 black-screen on Chrome ~M120+. The shim must run on every fresh page load. |
| 6 (LOW) | Prod env adds JELLYFIN_UICulture=en-US, LANG=en_US.UTF-8, LC_ALL=en_US.UTF-8; dev does not | docker inspect ... .Config.Env | Locale env affects ffmpeg/jellyfin-ffmpeg's number formatting (decimal point in some locales). Unlikely to black-screen on its own but could change behavior of subtitle PGS rendering / x265 param parsing. |
| 7 (INFO) | Prod index.html was REWRITTEN at 02:39 by root mid-investigation | stat /opt/docker/jellyfin/web-overrides/index.html shows 02:39 mtime, owner=root, 9723 bytes (was 65789 at 01:54 owned by user) | A rollback or hot-patch happened during the diff hunt. Whoever did it wiped the giant base64 favicon block but kept the SHIM. Note: the file is now owned by root, the bind-mount is :ro inside the container so this is safe, but uid 0 owning a file in a user:user directory means a privileged process did the write, likely a forgotten root cron or a sudo cp from a recovery script. |
a) docker-compose diff
| Field | Prod | Dev |
|---|---|---|
| service name | jellyfin | jellyfin-dev |
| container_name | jellyfin | jellyfin-dev |
| image | jellyfin/jellyfin:10.10.3 | jellyfin/jellyfin:10.10.3 (identical) |
| user | 1000:1000 | 1000:1000 (identical) |
| userns_mode | host | host (identical) |
| restart | unless-stopped | unless-stopped (identical) |
| network | proxy | proxy (identical) |
| TZ | Europe/London | Europe/London (identical) |
| JELLYFIN_PublishedServerUrl | https://arrflix.s8n.ru | https://dev.arrflix.s8n.ru |
| JELLYFIN_UICulture | en-US | (unset) |
| LANG | en_US.UTF-8 | (unset; falls through to image default en_US.UTF-8) |
| LC_ALL | en_US.UTF-8 | (unset; falls through to image default en_US.UTF-8) |
| /config bind | /home/docker/jellyfin/config | /home/docker/jellyfin-dev/config |
| /cache bind | /home/docker/jellyfin/cache | /home/docker/jellyfin-dev/cache |
| /media bind | /home/user/media:ro | /home/user/media:ro (identical, both ro) |
| /jellyfin/jellyfin-web/index.html | /opt/docker/jellyfin/web-overrides/index.html:ro | /opt/docker/jellyfin-dev/web-overrides/index-dev.html:ro |
| /jellyfin/jellyfin-web/cineplex.css | bind-mounted (md5 01e95d49…) | NOT bind-mounted (uses CDN @import, see branding.xml diff) |
| locale-en-only/*.chunk.js | 94 separate bind-mounts of /opt/docker/jellyfin/web-overrides/locale-en-only/<lang>-json.<hash>.chunk.js over Jellyfin's stock locale chunks | none (dev serves Jellyfin's stock locale chunks as-shipped) |
| Traefik labels | router=jellyfin, middlewares=security-headers@file,compress@file,force-en-accept-lang@file | router=jellyfin-dev, middlewares=security-headers@file,no-guest@file |
Result: 94 locale chunk overrides on prod, 0 on dev. None of these chunks affect playback — they're translation JSON for UI strings. Skip as a playback suspect.
b) Traefik routing diff
Prod has THREE routers for arrflix.s8n.ru defined in /opt/docker/traefik/config/dynamic.yml, plus the docker-provider one from labels. Dev has only the docker-provider one.
| Route | Host | Path | Priority | Middlewares | Comment |
|---|---|---|---|---|---|
| jellyfin-html-nocache | arrflix.s8n.ru | /, /web/, /web/index.html, /web/sw.js, /web/manifest.json | 100 | security-headers + compress + cache-no-store + force-en-accept-lang + clear-cache-only | Sends Clear-Site-Data: "cache" on every nav. Was meant to be temporary (INC6, "REMOVE after owner confirms"). |
| jellyfin-locale-force-en | arrflix.s8n.ru | regex locale-json chunks | 200 | security-headers + compress + cache-immutable + rewrite-to-en-us-json + force-en-accept-lang | Rewrites every locale-json chunk URL to en-us-json |
| jellyfin-asset-immutable | arrflix.s8n.ru | regex /web/*.{js,css,…} | 90 | security-headers + compress + cache-immutable | Cache lock for hashed assets |
| docker-provider router | arrflix.s8n.ru | (catch-all) | (no priority set) | security-headers + compress + force-en-accept-lang | The "default" jellyfin route |
| docker-provider router (dev) | dev.arrflix.s8n.ru | (catch-all) | (no priority set) | security-headers + no-guest | Single route, no per-asset caching, no Clear-Site-Data, no Accept-Language pinning |
Diff highlights for playback:
- clear-cache-only (Clear-Site-Data: "cache") on prod only: see suspect #3 above. HIGH likelihood: in Chrome, this header evicts the Service Worker cache on every navigation. Jellyfin's PWA registers sw.js and serves chunked JS from SW cache. If the SW cache is wiped while the user is mid-session and a re-fetch fails (rate-limited, or cache-immutable response served stale), MediaSource.appendBuffer can throw → silent black video.
- force-en-accept-lang rewrites Accept-Language to en-US,en;q=0.9 on prod, not on dev. Affects only metadata strings, NOT playback.
- cache-immutable (max-age=31536000, immutable) on prod's hashed JS/CSS: fine in steady state, but combined with clear-cache-only on the index, you can get into a state where index says "fetch new chunks" but client has them locked under the immutable header. Browsers usually re-validate on hard reload only.
- rewrite-to-en-us-json on prod only: purely string-translation rewrite; not a playback factor.
- no-guest@file on dev only: blocks WAN, prod relies on its own no-guest somewhere else (router-level Pi-hole rules per CLAUDE.md memory feedback_s8n_hosts_override.md). Not a playback factor.
c) branding.xml (CustomCss) diff
Prod = 401 lines, dev = 116 lines. 285-line delta is all the BLACK-PASS / INC1–INC5 patches absent on dev.
| Block | Prod | Dev |
|---|---|---|
| @import url("/web/cineplex.css") | YES (local cineplex.css mounted in compose) | NO (uses https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css) |
| BLACK-PASS section (:root overrides + .layout-desktop { background-color: #000 !important; }) | YES (lines 110-180) | NO |
| INC1 transparent-scope .itemDetailPage:has() | YES | NO |
| INC2 position:fixed; z-index:0 on .backdropContainer, .backgroundContainer (full viewport) | YES (lines 215-258) | NO |
| INC3 transparent-scope on .detailPageContent, .detailVerticalSection, .itemsContainer, etc. | YES | NO |
| INC4 transparent-scope on .itemDetailPage .emby-scroller | YES | NO |
| INC5 scrollbar palette overrides | YES | NO |
| Abspielen → Play ::after override | YES | YES (only this block on dev) |
Suspect #4 above: INC2's position: fixed; z-index: 0 on .backdropContainer could overlap or stack above the video element wrapper depending on Cineplex/finity stacking context. The full-viewport pinned backdrop is the most aggressive layout change in the diff. Would not affect dev because dev has none of these rules.
d) encoding.xml diff
Live /encoding.xml: byte-identical between prod and dev.
encoding.xml.bak.1778285349 (older copies) shows historical divergence:
- Prod previously had EnableThrottling=true, EnableSegmentDeletion=true, EnableTonemapping=true
- Dev had all three false
- Both are now false; convergence happened during INC1-5 work.
Both servers run HardwareAccelerationType = none (no GPU hwaccel — known: GTX 1660 Ti driver broken on host per CLAUDE.md memory ref). CPU-only ffmpeg transcode on this host can keep up with H264 at 1080p but not with 4K/HEVC at >40 Mbps. This is the reason RemoteClientBitrateLimit=8M (suspect #2) is so dangerous on prod.
e) bind-mount diff
Already covered in compose section. Net: media is identical (/home/user/media:/media:ro on both — same path, same :ro). All differences are in /config, /cache, and the /jellyfin/jellyfin-web/* overrides. Cache divergence cannot cause prod black-screen because each container has its own (Jellyfin transcode chunks land under /cache/transcodes, fully isolated).
f) env-var diff
| Var | Prod | Dev |
|---|---|---|
| LANG | en_US.UTF-8 (explicit) | en_US.UTF-8 (image default) |
| LC_ALL | en_US.UTF-8 (explicit) | en_US.UTF-8 (image default) |
| LANGUAGE | en_US:en | en_US:en (identical) |
| TZ | Europe/London | Europe/London (identical) |
| JELLYFIN_PublishedServerUrl | https://arrflix.s8n.ru | https://dev.arrflix.s8n.ru |
| JELLYFIN_UICulture | en-US (explicit) | (unset; server reads system.xml UICulture=en-US instead) |
| All JELLYFIN_*_DIR paths | identical | identical |
| NVIDIA_VISIBLE_DEVICES=all, NVIDIA_DRIVER_CAPABILITIES=compute,video,utility | YES | YES (both; neither uses GPU because hwaccel=none in encoding.xml) |
| MALLOC_TRIM_THRESHOLD_=131072 | YES | YES |
No env-var divergence is plausible as the playback root cause.
g) web-overrides diff
PROD: DEV:
index.html 9723 bytes (root) index-dev.html 68349 bytes (user)
index.html.bak.eng-pre-2026-05-08 59757 b index-dev.html.bak.pre-middle-theme 65789 b
index.html.bak.pre-rollback-1778282871 69390 index-dev.html.bak.pre-mirror-1778289645 59757 b
cineplex.css 16143 b cineplex.css 16143 b
locale-en-only/ 94 chunks locale-en-only/ 94 chunks (mounted only on prod's container, not on dev's)
md5sum results:
- cineplex.css: IDENTICAL on both (01e95d491d755ea3df39955af998d5f3)
- index.html (prod) 5b212d7d60b8a2b910a2f47dd0470a09 ≠ index-dev.html (dev) 9658933dfa069dce6f3cd58130249aa4
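The comparison above was done with md5sum. A minimal Python sketch of the same check is shown here for repeatability; the helper names `md5_of` and `same_content` are hypothetical and not part of the repo's scripts.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True when both files hash to the same MD5 digest."""
    return md5_of(a) == md5_of(b)
```

Run against the two cineplex.css copies this should report a match, and against index.html vs index-dev.html a mismatch, assuming the files are unchanged since the md5sum run.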
Anomaly: prod index.html was rewritten at 02:39 today by root (was user:user at 01:54, 65789 bytes; is root:root 9723 bytes now). Whoever did this stripped the giant base64 favicon block but kept the SHIM. Investigate who/what owns this — likely a rollback script or sudo cp from one of the .bak files.
The shim itself in current prod still contains:
- localStorage.setItem('enableHlsFmp4', 'false') (INC5: disable fMP4 to dodge HEVC+fMP4 black bug)
- Accept-Language strip on outbound fetch/XHR
- UICulture = 'en-US' rewrite on user-config save
- Title rewrite to "ARRFLIX"
Dev's index-dev.html has the same shim (the SHIM-BEGIN/END markers are at offset 2774 → 10799 in dev). Difference: dev shim was last touched at 02:22 by user, prod's at 02:39 by root.
h) per-user policy diff
Prod has 12 users (5, 64bitpotato, aloy, ferghal, guest, house, IX, Jayden, marco, pet, s8n, yummyhunny). Dev has 1 (test).
Users.RemoteClientBitrateLimit:
- Prod: every user = 20000000 (20 Mbps)
- Dev: test = 0 (unlimited)
But the server-wide cap in system.xml is 8000000 (8 Mbps) on prod and 0 on dev. Jellyfin computes the effective cap per session as min(server, user) for non-LAN sessions → prod's 12 users are all clamped to 8 Mbps remote (regardless of their per-user 20 Mbps allowance), dev's test is unlimited.
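The min() interaction described above can be sketched as follows. This models the behaviour as this doc describes it, with 0 treated as "unlimited"; it is not taken from Jellyfin's source, and the function name `effective_remote_cap` is hypothetical.

```python
def effective_remote_cap(server_limit: int, user_limit: int) -> int:
    """Combine two bitrate limits in bps, where 0 means 'unlimited'.

    Assumption: the effective cap is the smaller of the non-zero limits,
    and 0 (unlimited) only when neither limit is set.
    """
    limits = [value for value in (server_limit, user_limit) if value > 0]
    return min(limits) if limits else 0


prod_cap = effective_remote_cap(8_000_000, 20_000_000)  # server 8 Mbps, user 20 Mbps
dev_cap = effective_remote_cap(0, 0)                     # both unlimited
print(prod_cap, dev_cap)
```

Under that assumption the per-user 20 Mbps allowance never takes effect on prod, because the 8 Mbps server value is always the smaller one.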
Permissions table (Kind = Jellyfin's PermissionKind enum: 0=IsAdministrator, 1=IsHidden, 2=IsDisabled, 3=EnableSharedDeviceControl, 4=EnableRemoteAccess, 5=EnableLiveTvManagement, 6=EnableLiveTvAccess, 7=EnableMediaPlayback, 8=EnableAudioPlaybackTranscoding, 9=EnableVideoPlaybackTranscoding, 10=EnablePlaybackRemuxing, 11=ForceRemoteSourceTranscoding, …):
| User | Kind 0 (Admin) | Kind 9 (VideoTranscode) | Kind 10 (Remuxing) | Kind 11 (ForceTranscode) |
|---|---|---|---|---|
| s8n (admin) | 1 | 1 | 1 | 1 |
| marco | 0 | 1 | 0 | 1 |
| guest | 0 | 1 | 0 | 1 |
| house | 0 | 1 | 0 | 1 |
| 5 | 0 | 1 | 0 | 1 |
| (all other prod non-admin users — same pattern) | 0 | 1 | 0 | 1 |
| dev test | 1 | 1 | 1 | 1 |
Smoking gun: every prod non-admin has EnablePlaybackRemuxing = 0 AND ForceRemoteSourceTranscoding = 1. Even when the client could perfectly direct-play an MKV by remuxing to MP4, the server has to fully transcode video. Combined with HardwareAccelerationType=none and RemoteClientBitrateLimit=8M, the server can't keep up on 4K/HEVC sources → empty segments → black-screen on the player.
Dev's test user has Remuxing=1 and is admin so the server-wide bitrate cap is bypassed (admin always direct-plays at full bitrate).
Recommended fix order
1. Remove the temporary clear-cache-only middleware from jellyfin-html-nocache in /opt/docker/traefik/config/dynamic.yml (per INC6 it was supposed to be removed already). Reload Traefik. Have owner hard-reload arrflix.s8n.ru once. (2 minutes, near-zero blast radius)
2. Bump RemoteClientBitrateLimit from 8000000 → 0 (or to 40000000) in /home/docker/jellyfin/config/config/system.xml, restart prod jellyfin. (2 minutes)
3. Set EnablePlaybackRemuxing = 1 for all non-admin prod users via POST /Users/{id}/Policy or a direct UPDATE Permissions SET Value=1 WHERE Kind=10. Restart not required.
4. Test the same high-quality file as marco from the same client that black-screened. If still bad → look at INC2 backdrop-pinning CSS in branding.xml (suspect #4) and Cineplex theme stacking context.
5. Investigate who/what rewrote /opt/docker/jellyfin/web-overrides/index.html at 02:39 as root. Permissions are now root:root instead of user:user. The bind-mount is :ro, so the container can still read it, but future hot-patches by user will fail with EPERM.
Do NOT change at this stage:
- branding.xml (INC2 backdrop pinning) — defer until items 1-3 are tested. CSS-driven black would hit dev too once dev tries the same theme.
- The 94 locale-en-only chunk overrides — orthogonal to playback.
- encoding.xml — already identical to dev.
Diff matrix
DIM PROD DEV
================================= ======================================================================== ========================================
docker image jellyfin/jellyfin:10.10.3 jellyfin/jellyfin:10.10.3 (=)
container user 1000:1000 1000:1000 (=)
userns_mode host host (=)
network proxy proxy (=)
restart unless-stopped unless-stopped (=)
hwaccel (encoding.xml) none none (=)
EnableThrottling (encoding.xml) false false (= now; PROD was true earlier per .bak)
EnableTonemapping (encoding.xml) false false (= now; PROD was true earlier per .bak)
EnableSegmentDeletion false false (= now; PROD was true earlier per .bak)
H264Crf / H265Crf 23 / 28 23 / 28 (=)
QuickConnectAvailable (system.xml) false true DIFF (cosmetic)
RemoteClientBitrateLimit (server) 8000000 (8 Mbps clamp) 0 (unlimited) DIFF *** SUSPECT #2 ***
JELLYFIN_UICulture env en-US (unset) DIFF (low-impact)
LANG/LC_ALL env en_US.UTF-8 (explicit) en_US.UTF-8 (image default) eq
JELLYFIN_PublishedServerUrl env https://arrflix.s8n.ru https://dev.arrflix.s8n.ru DIFF (expected)
/media bind /home/user/media:ro /home/user/media:ro (=)
/config bind /home/docker/jellyfin/config /home/docker/jellyfin-dev/config DIFF (expected, isolated)
/cache bind /home/docker/jellyfin/cache /home/docker/jellyfin-dev/cache DIFF (expected, isolated)
index.html bind /opt/docker/jellyfin/web-overrides/index.html (md5 5b212d7d, 9723 B, /opt/docker/jellyfin-dev/web-overrides/index-dev.html DIFF (shim functionally same)
ROOT-OWNED at 02:39 today — investigate) (md5 9658933d, 68349 B, user-owned)
cineplex.css bind /opt/docker/jellyfin/web-overrides/cineplex.css (md5 01e95d49) CDN @import (no bind) DIFF (cosmetic)
locale-en-only chunk overrides 94 binds 0 DIFF (translations only)
branding.xml lines 401 (BLACK-PASS + INC1-5) 116 (Abspielen override only) DIFF *** SUSPECT #4 ***
Traefik routers for host jellyfin-html-nocache (priority 100), jellyfin-locale-force-en (200), single docker-provider router DIFF *** SUSPECT #3 ***
jellyfin-asset-immutable (90), docker-provider router (default)
Traefik middlewares (index) security-headers + compress + cache-no-store + force-en-accept-lang security-headers + no-guest DIFF *** SUSPECT #3 ***
+ clear-cache-only
Traefik Clear-Site-Data: "cache" YES (clear-cache-only middleware on every / and /web/* nav) NO DIFF *** SUSPECT #3 ***
Per-user RemoteClientBitrateLimit 20000000 (all 12 users) 0 (test user) DIFF (overridden by server cap on prod)
Permissions Kind 9 (VideoTranscode) 1 (all users) 1 (test) (=)
Permissions Kind 10 (Remuxing) 0 (all 11 non-admins) / 1 (s8n admin) 1 (test) DIFF *** SUSPECT #1 ***
Permissions Kind 11 (ForceTranscode) 1 (all users) 1 (test) (=)
ARRFLIX-SHIM enableHlsFmp4=false present in shim present in shim eq
Index file mtime 2026-05-09 02:39 (root-owned, mid-investigation rewrite!) 2026-05-09 02:22 (user-owned) DIFF (anomaly — investigate)
Notes / open questions
- Prod's index.html going root:root at 02:39 mid-investigation is suspicious. Confirm: was a recovery script run? Is there a cron that copies from .bak if checksum drifts? If so, it's racing the live edits.
- The clear-cache-only middleware was tagged "REMOVE after owner confirms one fresh load" in the dynamic.yml comment. Owner has confirmed (per doc 26 status = CLOSED). It must be retired now.
- Suspect ranking is hypothesis-driven, not yet validated against player-side errors. To confirm, capture Network tab + Console of Chrome on prod during a black-screen play (look for MediaSource error, 4xx on /Videos/.../stream.mp4, Clear-Site-Data rows, fMP4 segment fetches stalling). That single trace would collapse the ranking by 80%.
Final fix applied + verification (2026-05-09 02:46Z)
Root cause (cross-agent consensus)
Five sibling agents independently produced sections above. Agreed root cause:
/opt/docker/traefik/config/dynamic.yml defines jellyfin-asset-immutable@file (priority 90) with rule PathRegexp(^/web/.+\.(js|css|woff2|...)$). Jellyfin's PWA ships its service worker as /web/serviceworker.js (NOT /web/sw.js). The priority-100 jellyfin-html-nocache router only covers the literal path /web/sw.js, so /web/serviceworker.js falls through to jellyfin-asset-immutable instead and is served with Cache-Control: public, max-age=31536000, immutable.
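The routing outcome can be reproduced with a small model of "highest-priority matching router wins". This is a sketch under stated assumptions: the extension list in `ASSET_RE` keeps only the three extensions visible in the truncated rule, the catch-all router's priority of 1 is a stand-in (the real router has no explicit priority), the locale router is left out, and the names `Router` and `pick` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Truncated on purpose: the real rule lists more extensions after woff2.
ASSET_RE = re.compile(r"^/web/.+\.(js|css|woff2)$")
HTML_PATHS = {"/", "/web/", "/web/index.html", "/web/sw.js", "/web/manifest.json"}
SW_PATHS = {"/web/serviceworker.js", "/web/sw.js"}


@dataclass
class Router:
    name: str
    priority: int
    matches: Callable[[str], bool]


BEFORE: List[Router] = [
    Router("jellyfin-html-nocache", 100, lambda p: p in HTML_PATHS),
    Router("jellyfin-asset-immutable", 90, lambda p: bool(ASSET_RE.match(p))),
    Router("docker-provider", 1, lambda p: True),  # stand-in priority
]
# The fix described in this doc adds one router at priority 250.
AFTER: List[Router] = BEFORE + [
    Router("jellyfin-sw-nocache", 250, lambda p: p in SW_PATHS),
]


def pick(routers: List[Router], path: str) -> Optional[str]:
    """Return the name of the highest-priority router whose rule matches."""
    candidates = [r for r in routers if r.matches(path)]
    return max(candidates, key=lambda r: r.priority).name if candidates else None


print(pick(BEFORE, "/web/serviceworker.js"))  # jellyfin-asset-immutable
print(pick(AFTER, "/web/serviceworker.js"))   # jellyfin-sw-nocache
```

In this model /web/sw.js was already handled by jellyfin-html-nocache, which is why only the real service worker path, /web/serviceworker.js, ended up pinned.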
Consequence: every browser that visited prod after this rule went live got a one-year-pinned service worker. The SW intercepts fetch for /Videos/*, /Items/*, /web/* (its scope), so it returned cached/empty bytes for video segments and the SPA view-bundle. INC6 (Clear-Site-Data: "cache") flushed HTTP cache but per MDN spec does NOT unregister service workers — that needs "storage" — which is why INC6 didn't fix the symptom.
Confirmed at the wire: curl -I /web/serviceworker.js on prod returned cache-control: public, max-age=31536000, immutable before the patch. Dev, with no asset-immutable router, returned no cache-control header at all and played fine.
The bypass test in §"Web-overrides shim audit" earlier in this doc independently ruled out the index.html shim (vanilla 9723-byte upstream index.html reproduced the same black screen). Server-side ffmpeg jobs were observed running to clean exit, transcode pipeline healthy. So the failure was strictly client-side via the pinned SW.
Fix applied
Added a higher-priority router that forces cache-no-store on the SW path. Cleanest, lowest-risk option (no regex change to the existing immutable rule, easy rollback by deleting one block):
# /opt/docker/traefik/config/dynamic.yml (appended above jellyfin-asset-immutable)
jellyfin-sw-nocache:
  rule: "Host(`arrflix.s8n.ru`) && (Path(`/web/serviceworker.js`) || Path(`/web/sw.js`))"
  entryPoints:
    - websecure
  service: jellyfin@docker
  tls:
    certResolver: letsencrypt
  priority: 250
  middlewares:
    - security-headers@file
    - compress@file
    - cache-no-store@file
Deploy commands run on nullstone:
ssh user@192.168.0.100
# backup taken: /opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088
scp /tmp/dynamic.yml.work user@192.168.0.100:/opt/docker/traefik/config/dynamic.yml
# Traefik hot-reloads dynamic.yml automatically; no docker restart needed.
Wire-level verification
$ curl -sI 'https://arrflix.s8n.ru/web/serviceworker.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: no-cache, no-store, must-revalidate
expires: 0
pragma: no-cache
Hashed asset (control) still immutable as intended:
$ curl -sI 'https://arrflix.s8n.ru/web/main.jellyfin.bundle.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: public, max-age=31536000, immutable
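The two curl checks above can be turned into a repeatable assertion on the header value. The sketch below only classifies a Cache-Control string; fetching the header is left to curl as shown. The function names and the one-day threshold are assumptions for illustration, not part of the repo.

```python
from typing import Dict, Optional


def cache_directives(header: Optional[str]) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control header into {directive: value-or-None}."""
    out: Dict[str, Optional[str]] = {}
    for part in (header or "").split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        out[name.strip().lower()] = value.strip() or None
    return out


def is_long_lived(header: Optional[str], threshold_s: int = 86_400) -> bool:
    """True when the response may be reused for a long time without revalidation."""
    directives = cache_directives(header)
    if "no-store" in directives:
        return False
    if "immutable" in directives:
        return True
    max_age = directives.get("max-age")
    return max_age is not None and max_age.isdigit() and int(max_age) >= threshold_s


# Header values copied from the curl output above.
print(is_long_lived("no-cache, no-store, must-revalidate"))   # service worker path
print(is_long_lived("public, max-age=31536000, immutable"))   # hashed asset
```

The service worker path should classify as not long-lived and the hashed bundle as long-lived; a missing header (as on dev) also classifies as not long-lived.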
Headless playback verification (MNS S1E4)
Item: 9312799ca24979bd05aad9733ce7ee14 — The Mike Nolan Show S1E4 "Ding Dong Delli". Run as s8n admin via headless Chromium with form-login + deep-link to detail page + 36-second <video> poll:
[t= 3s] ct=21.75 dur=328.37 rs=4 paused=False vw=1920 vh=1080 err=None
[t= 6s] ct=24.77 ...
[t= 9s] ct=27.76 ...
[t= 12s] ct=30.76 ...
[t= 15s] ct=33.77 ...
[t= 18s] ct=36.78 ...
[t= 21s] ct=39.79 ...
[t= 24s] ct=42.79 ...
[t= 27s] ct=45.80 ...
[t= 30s] ct=48.82 ...
[t= 33s] ct=51.82 ...
[t= 36s] ct=54.84 ...
VERDICT: ct_advance=33.09s rs=4 vw=1920 err=None → PASS
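For reference, the ct_advance figure in the verdict line is the last currentTime sample minus the first (54.84 − 21.75 = 33.09). A minimal sketch of that calculation over the log format above follows; the 0.9 pass ratio is an assumption, since the actual pass rule in the test script is not shown in this doc.

```python
import re
from typing import List, Tuple

# Matches lines such as "[t=  3s] ct=21.75 dur=328.37 ..."
LINE_RE = re.compile(r"\[t=\s*(\d+)s\]\s+ct=([\d.]+)")


def parse_samples(log: str) -> List[Tuple[int, float]]:
    """Extract (wall-clock seconds, currentTime) pairs from the poll log."""
    return [(int(t), float(ct)) for t, ct in LINE_RE.findall(log)]


def ct_advance(samples: List[Tuple[int, float]]) -> float:
    """Seconds of media time played between the first and last sample."""
    return round(samples[-1][1] - samples[0][1], 2)


def verdict(samples: List[Tuple[int, float]], min_ratio: float = 0.9) -> str:
    """PASS when media time advanced at roughly real-time speed (assumed rule)."""
    wall = samples[-1][0] - samples[0][0]
    return "PASS" if wall > 0 and ct_advance(samples) >= min_ratio * wall else "FAIL"
```

This check only covers element state. It passes whenever currentTime advances, so it cannot detect a video that decodes correctly but is covered by another element.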
headless-test-v2.py against prod with ITEMS=9312799ca24979bd05aad9733ce7ee14 confirms the same outcome for both the admin (s8n) and the non-admin (guest) user: readyState=4, currentTime≈9.5s, videoWidth=1920, paused=false, error=null, src https://arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true... (direct-play, no transcode required for this codec/profile pair).
Open follow-ups
- INC6 clear-cache-only middleware can be retired now. It was deployed to flush stale cache after INC5 but cannot dislodge SWs (see §Q3/Q9). Now that the SW is on cache-no-store, the hammer is no longer needed. Remove the line "- clear-cache-only@file" from the jellyfin-html-nocache middleware list in a follow-up commit once owner confirms one fresh load on real browsers.
- Service-worker auto-recovery for already-poisoned clients. The ARRFLIX shim already loops navigator.serviceWorker.getRegistrations() → r.unregister(); caches.keys() → caches.delete() once per pageview (verified in shim audit §c). With the SW now served no-store, the next reload picks up a clean SW and recovery is automatic, with no user action needed.
- INC2 backdrop-pin CSS in branding.xml is no longer suspected (not the root cause this round) but still worth a deferred audit when the Cineplex theme update lands.
- Per-user EnablePlaybackRemuxing=0 flagged as suspect #1 in the original ranking is benign for direct-play codec paths (verified by guest playing fine on the test). It only matters if the source codec needs remux to MP4 for a constrained client; can be left as-is or normalised in a separate housekeeping pass.
- /opt/docker/jellyfin/web-overrides/index.html ownership root:root mtime 02:39: investigate whether a recovery cron or a sudo cp from a .bak file rewrote it mid-incident. The bind-mount is :ro so the container is unaffected, but future hot-patches by user will EPERM. Cosmetic, fix in a follow-up.
Commit
Repo commit (this doc + bin/prod-vs-dev-compare.py): 917d21b3be5f8de198ff9b965942fb20cbded902
- Author: s8n <admin@s8n.ru> per memory user_git_identity.md, no Co-Authored-By trailer
- Pushed to origin main on git.s8n.ru/s8n/ARRFLIX at 2026-05-09 02:46Z
The dynamic.yml patch is deployed to /opt/docker/traefik/config/dynamic.yml on nullstone (hot-reloaded via Traefik file provider). Backup of the pre-fix file kept at /opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088 for one-step rollback if needed. Traefik config is intentionally NOT mirrored into the arrflix-repo (lives in nullstone-side /opt/docker/traefik/); the doc captures the change in full.
Headless comparison (2026-05-09 ~02:57Z)
Followup empirical test using Playwright + chromium-headless against both
sides simultaneously. Script at bin/prod-vs-dev-compare.py.
Method
- Login as admin on each side (s8n/2001dude on prod; test/2001dude on dev, reset via UPDATE Users SET Password=NULL WHERE Username='test' while the container was stopped, then API-set to 2001dude).
- Navigate to Mike Nolan Show, S01E04 (Ding Dong Delli), ItemId 9312799ca24979bd05aad9733ce7ee14 (same on both sides: the guid is derived from the file path, which is identical).
- Click the on-page Play button, sample state at t=5/10/20/30s. At each sample: <video>.{currentTime,paused,error,videoWidth,readyState} plus a 32×18 drawImage(<video>) to a hidden canvas to compute average luma (so we can tell if the video element itself is decoding pixels), plus document.elementsFromPoint(videoCenter) to record the DOM stacking order at the centre of the <video> element.
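The average-luma step of the method can be sketched as below, operating on the flat RGBA byte array that a canvas getImageData() call returns. The Rec. 601 weights and the threshold of 16 are assumptions; the doc does not state which luma formula or cutoff the script uses, and the names `average_luma` and `paint_ok` are hypothetical.

```python
from typing import Sequence


def average_luma(rgba: Sequence[int]) -> float:
    """Mean luma (0-255) of a flat RGBA pixel buffer, using Rec. 601 weights."""
    if len(rgba) == 0 or len(rgba) % 4 != 0:
        raise ValueError("expected a non-empty buffer of RGBA quadruples")
    total = 0.0
    for i in range(0, len(rgba), 4):
        r, g, b = rgba[i], rgba[i + 1], rgba[i + 2]  # alpha is ignored
        total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / (len(rgba) // 4)


def paint_ok(rgba: Sequence[int], threshold: float = 16.0) -> bool:
    """True when the sampled frame is brighter than near-black (assumed cutoff)."""
    return average_luma(rgba) > threshold


# A 32x18 sample, matching the canvas size used in the method above.
black_frame = bytes([0, 0, 0, 255]) * (32 * 18)
print(average_luma(black_frame), paint_ok(black_frame))
```

Because the buffer comes from drawImage(<video>), this measures what the element decodes, not what the page finally composites, which is why the method also records elementsFromPoint.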
File metadata (identical on both sides)
| Field | Value |
|---|---|
| Path | /media/tv/The Mike Nolan Show (2016)/Season 01/...S01E04 - Ding Dong Delli.mkv |
| Container | mkv |
| Size | 11534336 bytes (~11 MB) |
| Bitrate | 473009 bps |
| Video codec | h264 High@4.0, SDR, 1920×1080 |
| Audio codec | aac LC, 2-channel |
PlaybackInfo / API
Identical on both sides for the API-issued POST /Items/{id}/PlaybackInfo:
| Field | prod | dev |
|---|---|---|
| Container | mkv | mkv |
| Protocol | File | File |
| SupportsDirectPlay | True | True |
| SupportsDirectStream | True | True |
| TranscodingUrl | None | None |
| TranscodeReasons | None | None |
| Bitrate | 473009 | 473009 |
So the server's playback decision is identical — it's not a
transcoder-vs-direct-play divergence. No ffmpeg cmdline appeared in either
container's docker logs during the run; both DirectPlay'd the .mkv.
Stream URL (decoded)
- prod: https://arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true&mediaSourceId=9312799ca24979bd05aad9733ce7ee14&deviceId=...&api_key=...&Tag=448d71aa9830b270dc375a83a4d6c6fc#t=70.44175
- dev: https://dev.arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true&mediaSourceId=9312799ca24979bd05aad9733ce7ee14&deviceId=...&api_key=...&Tag=448d71aa9830b270dc375a83a4d6c6fc#t=29.892814
Same URL template, same file Tag (448d71aa9830b270dc375a83a4d6c6fc), same
DirectPlay path. The #t= fragment difference is just resume-position state.
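One way to make the "same URL template" comparison mechanical is to reduce each URL to its path plus the query parameters that are not session-specific. This is an illustrative sketch; treating deviceId and api_key as the only session-specific keys is an assumption, and `stream_signature` is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Assumed to vary per session; everything else should match between sides.
SESSION_KEYS = {"deviceId", "api_key"}


def stream_signature(url: str) -> Tuple[str, tuple]:
    """Reduce a stream URL to (path, sorted non-session query params).

    Host and the #t= resume fragment are dropped on purpose.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    kept = tuple(sorted((k, tuple(v)) for k, v in query.items() if k not in SESSION_KEYS))
    return parts.path, kept
```

Applied to the two URLs above, both sides should reduce to the same signature, with Static, mediaSourceId and Tag as the remaining parameters.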
Final video state at t=30s
| Field | prod | dev |
|---|---|---|
| currentTime | 99.68 | 60.19 |
| duration | 328.368 | 328.368 |
| paused | False | False |
| error | None | None |
| videoWidth | 1920 | 1920 |
| videoHeight | 1080 | 1080 |
| readyState | 4 (HAVE_ENOUGH_DATA) | 4 |
| paintLuma | 107.2 (real frame data) | 129.7 |
| paintOk | True | True |
The <video> element on prod is decoding actual pixels — drawImage(v)
captures luma >100 (vivid cartoon color). Yet a full-page screenshot at the
same instant is all-black. The pixels never reach the page composition.
Smoking gun — DOM stacking at the video centre
=== prod ===
[top] div#videoOsdPage.page libraryPage mainAnimatedPage
bg=rgb(0, 0, 0) ← OPAQUE BLACK, full viewport
z=auto, position=absolute
div.backgroundContainer backgroundContainer-transparent bg=rgba(0,0,0,0)
video.htmlvideoplayer bg=rgba(0,0,0,0)
div.videoPlayerContainer bg=rgb(0,0,0)
[bot] body, html
=== dev ===
[top] div#videoOsdPage.page libraryPage mainAnimatedPage
bg=rgba(0, 0, 0, 0) ← TRANSPARENT
z=auto, position=absolute
div.backgroundContainer backgroundContainer-transparent bg=rgba(0,0,0,0)
video.htmlvideoplayer bg=rgba(0,0,0,0)
div.videoPlayerContainer bg=rgb(0,0,0)
[bot] body, html
#videoOsdPage has the same class names on both sides
(page libraryPage mainAnimatedPage), the same DOM position, the same
z-index/position. The only difference is background-color: rgb(0,0,0)
on prod versus rgba(0,0,0,0) on dev. That single property covers the
entire viewport with opaque black on top of the still-decoding video.
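The stacking dumps above can be checked automatically by walking the elementsFromPoint order (topmost first) and reporting the first layer with a non-transparent background that sits above the <video>. This is a simplified sketch: it looks only at background-color alpha and ignores opacity, visibility and background images, and the `Layer` and `first_occluder` names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Layer(NamedTuple):
    selector: str
    background: Tuple[int, int, int, float]  # computed background-color as RGBA


def first_occluder(stack: List[Layer], target: str = "video") -> Optional[str]:
    """Return the first opaque-ish layer above the target element, if any.

    `stack` is ordered topmost-first, as document.elementsFromPoint returns it.
    """
    for layer in stack:
        if layer.selector.startswith(target):
            return None  # reached the video without meeting a painted layer
        if layer.background[3] > 0:
            return layer.selector
    return None


# Values transcribed from the prod and dev dumps above.
prod_stack = [
    Layer("div#videoOsdPage", (0, 0, 0, 1.0)),
    Layer("div.backgroundContainer", (0, 0, 0, 0.0)),
    Layer("video.htmlvideoplayer", (0, 0, 0, 0.0)),
    Layer("div.videoPlayerContainer", (0, 0, 0, 1.0)),
]
dev_stack = [Layer("div#videoOsdPage", (0, 0, 0, 0.0))] + prod_stack[1:]

print(first_occluder(prod_stack))
print(first_occluder(dev_stack))
```

On these inputs prod reports div#videoOsdPage as the occluder and dev reports none, matching the manual reading of the dumps.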
Root cause — Custom CSS in branding.xml
/home/docker/jellyfin/config/config/branding.xml (prod) is 401 lines.
/home/docker/jellyfin-dev/config/config/branding.xml is 116 lines. The
diff includes the BLACK-PASS 2026-05-08 rule that doesn't exist on dev:
/* === BLACK-PASS 2026-05-08 — eliminate ALL residual grays ... === */
:root { --theme-background-color: #000000 !important; ... }
...
/* Page-container surfaces — hit every wrapper the SPA might render */
.dashboardDocument, body.dashboardDocument,
.mainAnimatedPages, .pageContainer, .libraryPage,
.absolutePageTabContent, .itemDetailPage,
.padded-bottom-page, #mainDrawerPanel, #mainPanel,
.layout-desktop, .layout-mobile, .layout-tv {
background-color: #000000 !important; /* ← THIS LINE */
}
Later in the same file there's a guarded undo:
.libraryPage:has(.itemDetailPage),
.absolutePageTabContent:has(.itemDetailPage) {
background-color: transparent !important;
background: transparent !important;
}
The undo only matches when the .libraryPage contains .itemDetailPage
as a descendant. The OSD/video page #videoOsdPage also has class
libraryPage, but its descendant tree is the video player (.htmlVideoPlayer,
.videoOsdBottom, etc.) — not .itemDetailPage. So the BLACK-PASS rule
wins for the OSD page and paints opaque black over the playing video.
Fix
Extend the override to also exempt .libraryPage instances that contain
the video player. In /home/docker/jellyfin/config/config/branding.xml,
in the .libraryPage:has(.itemDetailPage) block, add:
.libraryPage:has(.itemDetailPage),
.libraryPage:has(.htmlVideoPlayer), /* ← add this */
.libraryPage:has(.videoPlayerContainer), /* ← and this */
.libraryPage#videoOsdPage, /* ← belt + suspenders */
.absolutePageTabContent:has(.itemDetailPage) {
background-color: transparent !important;
background: transparent !important;
}
Or, more surgically, add a single rule:
#videoOsdPage,
.page#videoOsdPage,
.libraryPage#videoOsdPage {
background-color: transparent !important;
background: transparent !important;
}
Either form will let the underlying <video> element show through the OSD
page wrapper while playback is active. No server / Traefik / Jellyfin-image
change is needed; just edit branding.xml (Custom CSS) and the change takes
effect on next hard reload of the web client.
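If the edit is scripted rather than made by hand, it helps to make it idempotent so that re-running the patch never duplicates the rule. The sketch below shows one way to do that on the Custom CSS text. It is an illustration under assumptions: the marker comment and the ensure_rule name are hypothetical, the CSS is treated as plain text, and extracting it from (and writing it back into) branding.xml is left out.

```python
# Hypothetical marker comment used to detect an earlier application.
MARKER = "/* INC7 videoOsdPage transparent */"

# The "more surgical" rule from the text above.
RULE = """#videoOsdPage,
.page#videoOsdPage,
.libraryPage#videoOsdPage {
  background-color: transparent !important;
  background: transparent !important;
}"""


def ensure_rule(css: str, marker: str = MARKER, rule: str = RULE) -> str:
    """Append `rule` under `marker` unless the marker is already present."""
    if marker in css:
        return css  # already patched; leave the text untouched
    return css.rstrip("\n") + "\n" + marker + "\n" + rule + "\n"
```

Applying ensure_rule twice yields the same text as applying it once. Appending at the end is sufficient because the ID selector outranks the class-based BLACK-PASS selector even though both declarations carry !important.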
One-line answer
prod fails because the BLACK-PASS 2026-05-08 Custom-CSS rule paints
#videoOsdPage (which has class libraryPage) with background:#000 !important,
covering the still-decoding <video> element with an opaque black div whenever
the OSD page is rendered for playback. Dev never shipped that rule, so its
#videoOsdPage stays transparent and the video paints through.
Artifacts
- bin/prod-vs-dev-compare.py: the comparison script (committable)
- /tmp/arrflix-prod-vs-dev/diff.json and /tmp/arrflix-prod-vs-dev/diff.md
- /tmp/arrflix-prod-vs-dev/{prod,dev}/result.json: full per-side JSON (includes every /Videos, /Items, /master.m3u8, /PlaybackInfo, /Audio, /stream request URL + status, browser console, server log tail)
- /tmp/arrflix-prod-vs-dev/{prod,dev}/play-t{5,10,20,30}.png: screenshots
- API key arrflix-prodvsdev-2026-05-09 was created on each side at run start and deleted at run end (the 404 on the dev cleanup is benign: the new key is no longer in the listing because token rotation already invalidated it after the Auth/Keys operation; manual confirmation via curl https://{prod,dev}.../Auth/Keys shows no leftover entry).
Note that the test harness ran in headless Chromium, and on prod the
underlying <video> element was still decoding real pixels (paintLuma
~107). In a real browser the same overlay div fully covers the video, so
the user reports "black screen" exactly as observed in the screenshots.
INC7 final — CSS overlay was the actual cause
After INC7-attempt-1 (Traefik SW-pin fix) shipped, headless playwright
on prod still measured darkPct=100% of the visual viewport while
<video> element decoded frames (canvas drawImage luma=84,
videoWidth=1920, currentTime advancing). Confirmed agent 2's
hypothesis: <video> paints, but a CSS overlay covers it.
Root cause
branding.xml BLACK-PASS rule paints .libraryPage with
background:#000 !important. Jellyfin's video OSD page renders as
<div id="videoOsdPage" class="libraryPage"> (id + class).
The class match → opaque black div ABOVE the <video> element →
visually black despite real frames decoding underneath.
Dev didn't ship the BLACK-PASS block at all → no overlay → video visible.
Fix (CSS, server-side branding.xml CustomCss)
.libraryPage:has(.htmlVideoPlayer),
.libraryPage#videoOsdPage,
#videoOsdPage,
#videoOsdPage .pageContainer,
#videoOsdPage .layout-desktop,
#videoOsdPage .mainAnimatedPages {
background-color: transparent !important;
background: transparent !important;
}
Verified
Post-fix headless playwright: darkPct=9.8%. Screenshot /tmp/inc7-after.png
shows actual MNS S1E4 video frame (sasquatch in cage). Real visual paint.
Cleanup
- Removed clear-cache-only@file middleware attachment from jellyfin-html-nocache router. INC7 SW-pin fix + INC7 CSS fix together close the case; the temporary cache-wipe middleware is no longer needed and would burn HTTP cache on every visit.
- Backup: /opt/docker/traefik/config/dynamic.yml.bak.inc6-removal.*
Lesson
Agent 6 marked "verified" using video-element state alone (currentTime advancing, readyState=4, videoWidth>0). Element decoded fine — but CSS overlay above it made it visually black. Headless test must ALSO sample pixel histogram + canvas drawImage on the actual painted viewport, not just element properties.
bin/headless-test-v2.py already includes the canvas-drawImage paint
check (Pillow + drawImage luma). Add a darkPct assertion to surface
this class of regression next time.
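A darkPct assertion of that kind could look like the sketch below, which combines the element-state checks with a viewport check so that neither can pass on its own. It is not taken from bin/headless-test-v2.py. The function name, the dictionary keys mirroring HTMLMediaElement properties, and the 90% ceiling are assumptions (chosen only because the observed values were 100% before the fix and 9.8% after).

```python
from typing import Any, Dict, List


def playback_failures(
    state: Dict[str, Any],
    dark_pct: float,
    max_dark_pct: float = 90.0,  # hypothetical ceiling, not a measured threshold
) -> List[str]:
    """Return the reasons playback cannot be called verified (empty = pass)."""
    failures: List[str] = []
    # Element-level checks: necessary, but not sufficient on their own.
    if state.get("error") is not None:
        failures.append("video element reports an error")
    if state.get("paused"):
        failures.append("video element is paused")
    if int(state.get("readyState", 0)) < 3:
        failures.append("readyState below HAVE_FUTURE_DATA")
    if int(state.get("videoWidth", 0)) <= 0:
        failures.append("videoWidth is zero")
    # Viewport-level check: catches an overlay hiding a healthy element.
    if dark_pct > max_dark_pct:
        failures.append(f"viewport is {dark_pct:.1f}% dark")
    return failures


# Element state as reported in this incident.
element_ok = {"error": None, "paused": False, "readyState": 4, "videoWidth": 1920}
```

With element_ok and a darkPct of 100.0 the function reports the dark viewport as the single failure; with 9.8 it returns an empty list.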
Status
INC7 FINAL — case closed. Owner action: hard-reload browser, confirm visual paint.