# 28 — Prod vs Dev Playback Divergence (2026-05-09)
> Diff hunt: `arrflix.s8n.ru` (prod, BLACK SCREEN on high-quality video) vs `dev.arrflix.s8n.ru` (dev, plays fine). Same image `jellyfin/jellyfin:10.10.3`, same `/home/user/media:/media:ro`, same network `proxy`, same `userns_mode: host`, same `user: 1000:1000`. The difference must therefore be in container env, bind-mounts, Traefik routing, server config XML, or per-user policy stored in `jellyfin.db`. This doc enumerates every divergence found and weighs how likely each is to be the cause.
Status: **RESOLVED 2026-05-09 02:46Z** — root cause was Traefik `jellyfin-asset-immutable` pinning `/web/serviceworker.js` with `Cache-Control: immutable, max-age=31536000`, causing a stale Jellyfin PWA service worker to intercept `/Videos/*` and `/web/*` `fetch()` events and return cached/empty responses → MSE black screen. Patched in dynamic.yml (added `jellyfin-sw-nocache` router at priority 250 forcing `cache-no-store` on `/web/serviceworker.js` + `/web/sw.js`). Headless playback verified: MNS S1E4 plays 33s of currentTime advance, readyState 4, videoWidth 1920×1080, no errors. See "Final fix applied + verification" section at the bottom of this doc.
| # (likelihood) | Divergence | Location | Why it could cause the black screen |
|---|---|---|---|
| 1 (HIGH) | **Per-user `EnablePlaybackRemuxing = 0`** on every prod non-admin (marco/guest/house/5/aloy/64bitpotato/yummyhunny/Jayden/IX/ferghal/pet) | `jellyfin.db` Permissions table, Kind=10 | Forces a transcode for any container/codec mismatch even when client could direct-play. Combined with `HardwareAccelerationType=none` (CPU-only) and `RemoteClientBitrateLimit=8 Mbps` server-wide — high-bitrate 4K/HEVC content can't be re-encoded fast enough → blank frames. Dev `test` user has Kind 10 = 1 (remux ON) so it always direct-plays. |
| 2 (HIGH) | **`RemoteClientBitrateLimit = 8 000 000` (8 Mbps)** on prod server, `0` (unlimited) on dev | `/home/docker/jellyfin/config/config/system.xml` line 137 | Owner's reported symptom is *"high-quality video"* fails. 4K/H265 source bitrates routinely exceed 20–60 Mbps. Server clamps to 8 Mbps for any "remote" session (anything not on prod LAN per server's view of client IP) → forces transcode to 8 Mbps → low-bitrate output that some browsers black-frame on HEVC profiles. Bizarrely, the per-user `Users.RemoteClientBitrateLimit` is `20000000` for ALL users — but server-wide cap and per-user cap interact via `min()`, so 8 Mbps wins. |
| 3 (HIGH) | **Traefik middleware `clear-cache-only` + `force-en-accept-lang` on `arrflix.s8n.ru`, NOT on `dev`** | `/opt/docker/traefik/config/dynamic.yml` lines 30–43 | `clear-cache-only` middleware sends `Clear-Site-Data: "cache"` header on every `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` hit. This wipes the browser's HTTP cache but NOT IndexedDB or LocalStorage — except Chrome's `Clear-Site-Data: "cache"` interpretation **also evicts the Service Worker cache** on each navigation. Jellyfin's PWA SW caches the JS bundle. SW eviction mid-session can cause `MediaSource.appendBuffer` to fail mid-stream → black video. INC6 of doc 26 says this header was meant to be **temporary** ("REMOVE after owner confirms one fresh load"). It was never removed. |
| 4 (MED) | **Prod branding.xml has 285 extra lines of CSS** including `position: fixed; z-index: 0` on `.backdropContainer` / `.backgroundContainer` | `/home/docker/jellyfin/config/config/branding.xml` 110-258 (BLACK-PASS + INC1–INC5) | INC2 pins backdrop containers at `position:fixed; top:0; left:0; width:100vw; height:100vh; z-index:0`. The HTML5 `<video>` lives in `.htmlVideoPlayerContainer` whose z-index is theme-dependent — if the prod backdrop pin happens to overlay it, the player renders behind the backdrop → black screen. Dev's branding.xml is minimal (only the `Abspielen` ::after override) so it can't occlude. |
| 5 (MED) | **Prod has `enableHlsFmp4=false` shim** in `/opt/docker/jellyfin/web-overrides/index.html`, dev shim has it too but order/timing may differ | INC5 shim block in prod (line 245-260 region of the diff) | Was introduced 2026-05-09 INC5 specifically to *fix* HEVC+fMP4 black-video. If the shim's `localStorage.setItem('enableHlsFmp4','false')` ran AFTER the player initialized, or if Cineplex/finity caches the value, fMP4 is still chosen → HEVC inside fMP4 black-screen on Chrome ~M120+. The shim must run on every fresh page load. |
| 6 (LOW) | **Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`**; dev does not | `docker inspect ... .Config.Env` | Locale env affects ffmpeg/jellyfin-ffmpeg's number formatting (decimal point in some locales). Unlikely to black-screen on its own but could change behavior of subtitle PGS rendering / x265 param parsing. |
| 7 (INFO) | **Prod index.html was REWRITTEN at 02:39 by root** mid-investigation | `stat /opt/docker/jellyfin/web-overrides/index.html` shows 02:39 mtime, owner=root, 9723 bytes (was 65789 at 01:54 owned by user) | A rollback or hot-patch happened during the diff hunt. Whoever did it wiped the giant base64 favicon block but kept the SHIM. Note: the file is now owned by root, the bind-mount is :ro inside the container so this is safe, but **uid 0 owning a file in a `user:user` directory means a privileged process did the write** — likely a forgotten root cron or a `sudo cp` from a recovery script. |
Result: 94 locale chunk overrides on prod, 0 on dev. None of these chunks affect playback — they're translation JSON for UI strings. Skip as a playback suspect.
## b) Traefik routing diff
Prod has **THREE routers** for `arrflix.s8n.ru` defined in `/opt/docker/traefik/config/dynamic.yml`, plus the docker-provider one from labels. Dev has only the docker-provider one.
| Router | Host | Path rule | Priority | Middlewares | Notes |
|---|---|---|---|---|---|
| docker-provider router (dev) | `dev.arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + **no-guest** | Single route, no per-asset caching, no Clear-Site-Data, no Accept-Language pinning |
Diff highlights for playback:
- **`clear-cache-only` (Clear-Site-Data: "cache") on prod only** — see suspect #3 above. HIGH likelihood: in Chrome, this header evicts the Service Worker cache on every navigation. Jellyfin's PWA registers `sw.js` and serves chunked JS from SW cache. If the SW cache is wiped while the user is mid-session and a re-fetch fails (rate-limited, or cache-immutable response served stale), `MediaSource.appendBuffer` can throw → silent black video.
- **`force-en-accept-lang` rewrites Accept-Language to en-US,en;q=0.9 on prod, not on dev** — affects only metadata strings, NOT playback.
- **`cache-immutable` (`max-age=31536000, immutable`) on prod's hashed JS/CSS** — fine in steady state, but combined with `clear-cache-only` on the index, you can get into a state where index says "fetch new chunks" but client has them locked under the immutable header. Browsers usually re-validate on hard reload only.
- **`rewrite-to-en-us-json` on prod only** — purely string-translation rewrite; not a playback factor.
- **`no-guest@file` on dev only**: blocks WAN, prod relies on its own no-guest somewhere else (router-level Pi-hole rules per CLAUDE.md memory `feedback_s8n_hosts_override.md`). Not a playback factor.
## c) branding.xml (CustomCss) diff
Prod = **401 lines**, dev = **116 lines**. 285-line delta is all the BLACK-PASS / INC1–INC5 patches absent on dev.
| Block | Prod | Dev |
|-------|------|-----|
| `@import url("/web/cineplex.css")` | YES — local cineplex.css mounted in compose | NO — uses `https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css` |
| INC1 transparent-scope `.itemDetailPage:has()` | YES | NO |
| INC2 `position:fixed; z-index:0` on `.backdropContainer`, `.backgroundContainer` (full viewport) | YES (lines 215-258) | NO |
| INC3 transparent-scope on `.detailPageContent`, `.detailVerticalSection`, `.itemsContainer`, etc. | YES | NO |
| INC4 transparent-scope on `.itemDetailPage .emby-scroller` | YES | NO |
| INC5 scrollbar palette overrides | YES | NO |
| `Abspielen` → `Play` ::after override | YES | YES (only this block on dev) |
Suspect #4 above: INC2's `position: fixed; z-index: 0` on `.backdropContainer` could overlap or stack above the video element wrapper depending on Cineplex/finity stacking context. The full-viewport pinned backdrop is the most aggressive layout change in the diff. Would not affect dev because dev has none of these rules.
## d) encoding.xml diff
Live `/encoding.xml`: **byte-identical** between prod and dev.
- Prod previously had `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`
- Dev had all three `false`
- Both are now `false` — convergence happened during INC1-5 work.
Both servers run `HardwareAccelerationType = none` (no GPU hwaccel — known: GTX 1660 Ti driver broken on host per CLAUDE.md memory ref). CPU-only ffmpeg transcode on this host can keep up with H264 at 1080p but not with 4K/HEVC at >40 Mbps. This is the reason `RemoteClientBitrateLimit=8M` (suspect #2) is so dangerous on prod.
## e) bind-mount diff
Already covered in compose section. Net: **media is identical** (`/home/user/media:/media:ro` on both — same path, same `:ro`). All differences are in `/config`, `/cache`, and the `/jellyfin/jellyfin-web/*` overrides. Cache divergence cannot cause prod black-screen because each container has its own (Jellyfin transcode chunks land under `/cache/transcodes`, fully isolated).
## f) env-var diff
| Var | Prod | Dev |
|-----|------|-----|
| LANG | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
**Anomaly**: prod `index.html` was rewritten at **02:39 today by root** (was `user:user` at 01:54, 65789 bytes; is `root:root` 9723 bytes now). Whoever did this stripped the giant base64 favicon block but kept the SHIM. Investigate who/what owns this — likely a rollback script or `sudo cp` from one of the `.bak` files.
The shim itself in current prod still contains:
- `localStorage.setItem('enableHlsFmp4', 'false')` (INC5 — disable fMP4 to dodge HEVC+fMP4 black bug)
- `Accept-Language` strip on outbound fetch/XHR
- `UICulture = 'en-US'` rewrite on user-config save
- Title rewrite to "ARRFLIX"
Dev's index-dev.html has the same shim (the SHIM-BEGIN/END markers are at offset 2774 → 10799 in dev). Difference: dev shim was last touched at 02:22 by user, prod's at 02:39 by root.
## h) per-user policy diff
Prod has 12 users (`5`, `64bitpotato`, `aloy`, `ferghal`, `guest`, `house`, `IX`, `Jayden`, `marco`, `pet`, `s8n`, `yummyhunny`). Dev has 1 (`test`).
`Users.RemoteClientBitrateLimit`:
- Prod: every user = `20000000` (20 Mbps)
- Dev: `test` = `0` (unlimited)
But the **server-wide cap in `system.xml`** is `8000000` (8 Mbps) on prod and `0` on dev. Jellyfin computes the effective cap per session as `min(server, user)` for non-LAN sessions → prod's 12 users are all clamped to **8 Mbps remote** (regardless of their per-user 20 Mbps allowance), dev's `test` is unlimited.
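The min-with-zero-as-unlimited interaction can be sketched as follows (the function name and exact semantics are illustrative; Jellyfin's real implementation may differ in detail, and admin sessions bypass the cap entirely):

```python
def effective_bitrate_limit(server_limit: int, user_limit: int) -> int:
    """Effective remote-session bitrate cap in bps, where 0 means 'unlimited'.

    Models the min(server, user) interaction described above: any non-zero
    limit participates, and the smallest one wins.
    """
    limits = [l for l in (server_limit, user_limit) if l > 0]
    return min(limits) if limits else 0  # 0 = no clamp at all

# Prod: server-wide 8 Mbps wins over every user's 20 Mbps allowance.
assert effective_bitrate_limit(8_000_000, 20_000_000) == 8_000_000
# Dev: server cap 0 (unlimited), user cap 0 (unlimited) -> no clamp.
assert effective_bitrate_limit(0, 0) == 0
```

This is why raising the per-user allowance alone would change nothing on prod: the server-wide `system.xml` value is the binding constraint.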
| (all other prod non-admin users — same pattern) | 0 | 1 | **0** | 1 |
| dev `test` | 1 | 1 | **1** | 1 |
**Smoking gun**: every prod non-admin has `EnablePlaybackRemuxing = 0` AND `ForceRemoteSourceTranscoding = 1`. Even when the client could perfectly direct-play an MKV by remuxing to MP4, the server has to fully transcode video. Combined with `HardwareAccelerationType=none` and `RemoteClientBitrateLimit=8M`, the server can't keep up on 4K/HEVC sources → empty segments → black-screen on the player.
Dev's `test` user has Remuxing=1 and is admin so the server-wide bitrate cap is bypassed (admin always direct-plays at full bitrate).
---
## Recommended fix order
1. **Remove the temporary `clear-cache-only` middleware** from `jellyfin-html-nocache` in `/opt/docker/traefik/config/dynamic.yml` (per INC6 it was supposed to be removed already). Reload Traefik. Have owner hard-reload arrflix.s8n.ru once. **(2 minutes, near-zero blast radius)**
2. **Bump `RemoteClientBitrateLimit` from 8000000 → 0** (or to 40000000) in `/home/docker/jellyfin/config/config/system.xml`, restart prod jellyfin. **(2 minutes)**
3. **Set `EnablePlaybackRemuxing = 1` for all non-admin prod users** via `PATCH /Users/{id}/Policy` or a direct `UPDATE Permissions SET Value = 1 WHERE Kind = 10`. Restart not required.
4. Test the same high-quality file as `marco` from the same client that black-screened. If still bad → look at INC2 backdrop-pinning CSS in branding.xml (suspect #4) and Cineplex theme stacking context.
5. Investigate who/what rewrote `/opt/docker/jellyfin/web-overrides/index.html` at 02:39 as root. Permissions are now `root:root` instead of `user:user`. Even though the bind-mount is `:ro` so the container can still read it, future hot-patches by `user` will fail with EPERM.
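Step 3's direct-SQL route can be rehearsed against a throwaway table first. The in-memory schema below is a hypothetical stand-in for the real `Permissions` table (which has more columns); against the live `jellyfin.db`, prefer the API PATCH while the server is running, and only run a direct UPDATE with Jellyfin stopped:

```python
import sqlite3

# Throwaway stand-in for jellyfin.db's Permissions table.
# Kind=10 = EnablePlaybackRemuxing per the suspect ranking above.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Permissions (UserId TEXT, Kind INTEGER, Value INTEGER)")
db.executemany(
    "INSERT INTO Permissions VALUES (?, ?, ?)",
    [("marco", 10, 0), ("guest", 10, 0), ("s8n", 10, 1)],
)

# The fix from step 3: turn remuxing back on wherever it is off.
db.execute("UPDATE Permissions SET Value = 1 WHERE Kind = 10")

rows = db.execute("SELECT Value FROM Permissions WHERE Kind = 10").fetchall()
assert all(v == 1 for (v,) in rows)  # every user now has remuxing enabled
```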
Do NOT change at this stage:
- branding.xml (INC2 backdrop pinning) — defer until items 1-3 are tested. CSS-driven black would hit dev too once dev tries the same theme.
- The 94 locale-en-only chunk overrides — orthogonal to playback.
Open notes:
- Prod's `index.html` going `root:root` at 02:39 mid-investigation is suspicious. Confirm: was a recovery script run? Is there a cron that copies from `.bak` if the checksum drifts? If so, it's racing the live edits.
- The `clear-cache-only` middleware was tagged "REMOVE after owner confirms one fresh load" in the dynamic.yml comment. Owner has confirmed (per doc 26 status = CLOSED). It must be retired now.
- Suspect ranking is hypothesis-driven, not yet validated against player-side errors. To confirm, capture **Network tab + Console of Chrome on prod during a black-screen play** (look for `MediaSource error`, 4xx on `/Videos/.../stream.mp4`, `Clear-Site-Data` rows, fMP4 segment fetches stalling). That single trace would collapse the ranking by 80%.
---
## Final fix applied + verification (2026-05-09 02:46Z)
### Root cause (cross-agent consensus)
Five sibling agents independently produced sections above. Agreed root cause:
`/opt/docker/traefik/config/dynamic.yml` defines `jellyfin-asset-immutable@file` (priority 90) with rule `PathRegexp(^/web/.+\.(js|css|woff2|...)$)`. Jellyfin's PWA ships its service worker as `/web/serviceworker.js` (NOT `/web/sw.js`). The priority-100 `jellyfin-html-nocache` router only excludes the literal path `/web/sw.js`, so `/web/serviceworker.js` is matched by `jellyfin-asset-immutable` instead, getting `Cache-Control: public, max-age=31536000, immutable`.
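The mismatch can be reproduced with a toy model of Traefik's highest-priority-wins router selection (rules simplified for illustration; the real `PathRegexp` covers more extensions, and the real nocache router matches more HTML paths):

```python
import re

# Two routers from dynamic.yml, modeled as (name, priority, match function).
routers = [
    ("jellyfin-html-nocache", 100,
     lambda p: p in ("/", "/web/index.html", "/web/sw.js")),
    ("jellyfin-asset-immutable", 90,
     lambda p: re.match(r"^/web/.+\.(js|css|woff2)$", p) is not None),
]

def winning_router(path: str):
    """Highest-priority router whose rule matches, like Traefik's selection."""
    matches = [(prio, name) for name, prio, rule in routers if rule(path)]
    return max(matches)[1] if matches else None

# /web/sw.js matches both rules; priority 100 wins -> no-cache. Good.
assert winning_router("/web/sw.js") == "jellyfin-html-nocache"
# /web/serviceworker.js only matches the regex -> pinned immutable. The bug.
assert winning_router("/web/serviceworker.js") == "jellyfin-asset-immutable"
```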
Consequence: every browser that visited prod after this rule went live got a one-year-pinned service worker. The SW intercepts `fetch` for `/Videos/*`, `/Items/*`, `/web/*` (its scope), so it returned cached/empty bytes for video segments and the SPA view-bundle. INC6 (`Clear-Site-Data: "cache"`) flushed HTTP cache but per MDN spec does NOT unregister service workers — that needs `"storage"` — which is why INC6 didn't fix the symptom.
Confirmed at the wire: `curl -I /web/serviceworker.js` on prod returned `cache-control: public, max-age=31536000, immutable` before the patch. Dev, with no asset-immutable router, returned no cache-control header at all and played fine.
The bypass test in §"Web-overrides shim audit" earlier in this doc independently ruled out the index.html shim (vanilla 9723-byte upstream index.html reproduced the same black screen). Server-side ffmpeg jobs were observed running to clean exit, transcode pipeline healthy. So the failure was strictly client-side via the pinned SW.
### Fix applied
Added a higher-priority router that forces `Cache-Control: no-store` on the SW path. Cleanest, lowest-risk option: no regex change to the existing immutable rule, and easy rollback by deleting one block.
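The deployed block itself lives only in nullstone's dynamic.yml; the sketch below is reconstructed from the description above (service name, entrypoint, and the `cache-no-store` middleware body are assumptions, not a copy of the live file):

```yaml
# Sketch only -- reconstructed from the incident notes, not the deployed file.
http:
  routers:
    jellyfin-sw-nocache:
      rule: "Host(`arrflix.s8n.ru`) && (Path(`/web/serviceworker.js`) || Path(`/web/sw.js`))"
      priority: 250            # outranks asset-immutable (90) and html-nocache (100)
      entryPoints: [websecure] # assumed entrypoint name
      service: jellyfin        # assumed service name
      middlewares: [cache-no-store]
  middlewares:
    cache-no-store:
      headers:
        customResponseHeaders:
          Cache-Control: "no-store"
```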
### Verification
Item: `9312799ca24979bd05aad9733ce7ee14` — *The Mike Nolan Show* S1E4 "Ding Dong Delli". Run as `s8n` admin via headless Chromium with form-login + deep-link to the detail page + a 36-second `<video>` poll:
`headless-test-v2.py` against prod with `ITEMS=9312799ca24979bd05aad9733ce7ee14` confirms the same outcome for both the admin (`s8n`) and the non-admin (`guest`) user: `readyState=4`, `currentTime≈9.5s`, `videoWidth=1920`, `paused=false`, `error=null`, src `https://arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true...` (direct-play, no transcode required for this codec/profile pair).
### Open follow-ups
1. **INC6 `clear-cache-only` middleware can be retired now** — it was deployed to flush stale cache after INC5 but cannot dislodge SWs (see §Q3/Q9). Now that the SW is on `cache-no-store`, the hammer is no longer needed. Remove the line `- clear-cache-only@file` from the `jellyfin-html-nocache` middleware list in a follow-up commit once owner confirms one fresh load on real browsers.
2. **Service-worker auto-recovery for already-poisoned clients.** The ARRFLIX shim already loops `navigator.serviceWorker.getRegistrations() → r.unregister(); caches.keys() → caches.delete()` once per pageview (verified in shim audit §c). With the SW now served `no-store`, the next reload picks up a clean SW and recovery is automatic — no user action needed.
3. **INC2 backdrop-pin CSS in branding.xml** is no longer suspected (not the root cause this round) but still worth a deferred audit when the Cineplex theme update lands.
4. **Per-user `EnablePlaybackRemuxing=0`**, flagged as suspect #1 in the original ranking, is benign for direct-play codec paths (verified by `guest` playing fine in the test). It only matters if the source codec needs a remux to MP4 for a constrained client; it can be left as-is or normalised in a separate housekeeping pass.
5. **`/opt/docker/jellyfin/web-overrides/index.html` ownership root:root, mtime 02:39** — investigate whether a recovery cron or a `sudo cp` from a `.bak` file rewrote it mid-incident. The bind-mount is `:ro` so the container is unaffected, but future hot-patches by `user` will fail with EPERM. Cosmetic; fix in a follow-up.
### Commit
- Author: `s8n <admin@s8n.ru>` per memory `user_git_identity.md` — no Co-Authored-By trailer
- Pushed to `origin main` on `git.s8n.ru/s8n/ARRFLIX` at 2026-05-09 02:46Z
The dynamic.yml patch is deployed to `/opt/docker/traefik/config/dynamic.yml` on nullstone (hot-reloaded via Traefik file provider). Backup of the pre-fix file kept at `/opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088` for one-step rollback if needed. Traefik config is intentionally NOT mirrored into the arrflix-repo (lives in nullstone-side `/opt/docker/traefik/`); the doc captures the change in full.