ARRFLIX/docs/28-prod-vs-dev-playback-divergence-2026-05-09.md

334 lines
31 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# 28 — Prod vs Dev Playback Divergence (2026-05-09)
> Diff hunt: `arrflix.s8n.ru` (prod, BLACK SCREEN on high-quality video) vs `dev.arrflix.s8n.ru` (dev, plays fine). Same image `jellyfin/jellyfin:10.10.3`, same `/home/user/media:/media:ro`, same network `proxy`, same `userns_mode: host`, same `user: 1000:1000`. Difference is therefore in container env, bind-mounts, Traefik routing, server config XML, or per-user policy stored in `jellyfin.db`. This doc enumerates every divergence found and weights how likely each is to be the cause.
Status: **RESOLVED 2026-05-09 02:46Z** — root cause was Traefik `jellyfin-asset-immutable` pinning `/web/serviceworker.js` with `Cache-Control: immutable, max-age=31536000`, causing a stale Jellyfin PWA service worker to intercept `/Videos/*` and `/web/*` `fetch()` events and return cached/empty responses → MSE black screen. Patched in dynamic.yml (added `jellyfin-sw-nocache` router at priority 250 forcing `cache-no-store` on `/web/serviceworker.js` + `/web/sw.js`). Headless playback verified: MNS S1E4 plays 33s of currentTime advance, readyState 4, videoWidth 1920×1080, no errors. See "Final fix applied + verification" section at the bottom of this doc.
Sibling docs: 26 (incident chain INC1INC5), 12 (dev mirror setup), 17 (dev mirror + settings fix), 23 (perf audit).
---
## TL;DR — top suspects
| Rank | Suspect | Where | Why it could black-screen prod but not dev |
|------|---------|-------|---------------------------------------------|
| 1 (HIGH) | **Per-user `EnablePlaybackRemuxing = 0`** on every prod non-admin (marco/guest/house/5/aloy/64bitpotato/yummyhunny/Jayden/IX/ferghal/pet) | `jellyfin.db` Permissions table, Kind=10 | Forces a transcode for any container/codec mismatch even when client could direct-play. Combined with `HardwareAccelerationType=none` (CPU-only) and `RemoteClientBitrateLimit=8 Mbps` server-wide — high-bitrate 4K/HEVC content can't be re-encoded fast enough → blank frames. Dev `test` user has Kind 10 = 1 (remux ON) so it always direct-plays. |
| 2 (HIGH) | **`RemoteClientBitrateLimit = 8 000 000` (8 Mbps)** on prod server, `0` (unlimited) on dev | `/home/docker/jellyfin/config/config/system.xml` line 137 | Owner's reported symptom is *"high-quality video"* fails. 4K/H265 source bitrates routinely exceed 2060 Mbps. Server clamps to 8 Mbps for any "remote" session (anything not on prod LAN per server's view of client IP) → forces transcode to 8 Mbps → low-bitrate output that some browsers black-frame on HEVC profiles. Bizarrely, the per-user `Users.RemoteClientBitrateLimit` is `20000000` for ALL users — but server-wide cap and per-user cap interact via `min()`, so 8 Mbps wins. |
| 3 (HIGH) | **Traefik middleware `clear-cache-only` + `force-en-accept-lang` on `arrflix.s8n.ru`, NOT on `dev`** | `/opt/docker/traefik/config/dynamic.yml` lines 3043 | `clear-cache-only` middleware sends `Clear-Site-Data: "cache"` header on every `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` hit. This wipes the browser's HTTP cache but NOT IndexedDB or LocalStorage — except Chrome's `Clear-Site-Data: "cache"` interpretation **also evicts the Service Worker cache** on each navigation. Jellyfin's PWA SW caches the JS bundle. SW eviction mid-session can cause `MediaSource.appendBuffer` to fail mid-stream → black video. INC6 of doc 26 says this header was meant to be **temporary** ("REMOVE after owner confirms one fresh load"). It was never removed. |
| 4 (MED) | **Prod branding.xml has 285 extra lines of CSS** including `position: fixed; z-index: 0` on `.backdropContainer` / `.backgroundContainer` | `/home/docker/jellyfin/config/config/branding.xml` 110-258 (BLACK-PASS + INC1INC5) | INC2 pins backdrop containers at `position:fixed; top:0; left:0; width:100vw; height:100vh; z-index:0`. The HTML5 `<video>` lives in `.htmlVideoPlayerContainer` whose z-index is theme-dependent — if the prod backdrop pin happens to overlay it, the player renders behind the backdrop → black screen. Dev's branding.xml is minimal (only the `Abspielen` ::after override) so it can't occlude. |
| 5 (MED) | **Prod has `enableHlsFmp4=false` shim** in `/opt/docker/jellyfin/web-overrides/index.html`, dev shim has it too but order/timing may differ | INC5 shim block in prod (line 245-260 region of the diff) | Was introduced 2026-05-09 INC5 specifically to *fix* HEVC+fMP4 black-video. If the shim's `localStorage.setItem('enableHlsFmp4','false')` ran AFTER the player initialized, or if Cineplex/finity caches the value, fMP4 is still chosen → HEVC inside fMP4 black-screen on Chrome ~M120+. The shim must run on every fresh page load. |
| 6 (LOW) | **Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`**; dev does not | `docker inspect ... .Config.Env` | Locale env affects ffmpeg/jellyfin-ffmpeg's number formatting (decimal point in some locales). Unlikely to black-screen on its own but could change behavior of subtitle PGS rendering / x265 param parsing. |
| 7 (INFO) | **Prod index.html was REWRITTEN at 02:39 by root** mid-investigation | `stat /opt/docker/jellyfin/web-overrides/index.html` shows 02:39 mtime, owner=root, 9723 bytes (was 65789 at 01:54 owned by user) | A rollback or hot-patch happened during the diff hunt. Whoever did it wiped the giant base64 favicon block but kept the SHIM. Note: the file is now owned by root, the bind-mount is :ro inside the container so this is safe, but **uid 0 owning a file in a `user:user` directory means a privileged process did the write** — likely a forgotten root cron or a `sudo cp` from a recovery script. |
---
## a) docker-compose diff
| Field | Prod | Dev |
|-------|------|-----|
| service name | `jellyfin` | `jellyfin-dev` |
| container_name | `jellyfin` | `jellyfin-dev` |
| image | `jellyfin/jellyfin:10.10.3` | `jellyfin/jellyfin:10.10.3` (identical) |
| user | `1000:1000` | `1000:1000` (identical) |
| userns_mode | `host` | `host` (identical) |
| restart | `unless-stopped` | `unless-stopped` (identical) |
| network | `proxy` | `proxy` (identical) |
| TZ | `Europe/London` | `Europe/London` (identical) |
| JELLYFIN_PublishedServerUrl | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` |
| JELLYFIN_UICulture | `en-US` | (unset) |
| LANG | `en_US.UTF-8` | (unset — falls through to image default `en_US.UTF-8`) |
| LC_ALL | `en_US.UTF-8` | (unset — falls through to image default `en_US.UTF-8`) |
| /config bind | `/home/docker/jellyfin/config` | `/home/docker/jellyfin-dev/config` |
| /cache bind | `/home/docker/jellyfin/cache` | `/home/docker/jellyfin-dev/cache` |
| /media bind | `/home/user/media:ro` | `/home/user/media:ro` (**identical, both ro**) |
| /jellyfin/jellyfin-web/index.html | `/opt/docker/jellyfin/web-overrides/index.html:ro` | `/opt/docker/jellyfin-dev/web-overrides/index-dev.html:ro` |
| /jellyfin/jellyfin-web/cineplex.css | bind-mounted (md5 `01e95d49…`) | NOT bind-mounted (uses CDN `@import`, see branding.xml diff) |
| locale-en-only/*.chunk.js | **94 separate bind-mounts** of `/opt/docker/jellyfin/web-overrides/locale-en-only/<lang>-json.<hash>.chunk.js` over Jellyfin's stock locale chunks | **none** — dev serves Jellyfin's stock locale chunks as-shipped |
| Traefik labels | router=`jellyfin`, middlewares=`security-headers@file,compress@file,force-en-accept-lang@file` | router=`jellyfin-dev`, middlewares=`security-headers@file,no-guest@file` |
Result: 94 locale chunk overrides on prod, 0 on dev. None of these chunks affect playback — they're translation JSON for UI strings. Skip as a playback suspect.
## b) Traefik routing diff
Prod has **THREE routers** for `arrflix.s8n.ru` defined in `/opt/docker/traefik/config/dynamic.yml`, plus the docker-provider one from labels. Dev has only the docker-provider one.
| Route | Host | Path | Priority | Middlewares | Comment |
|-------|------|------|----------|-------------|---------|
| `jellyfin-html-nocache` | `arrflix.s8n.ru` | `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` | 100 | security-headers + compress + cache-no-store + force-en-accept-lang + **clear-cache-only** | Sends `Clear-Site-Data: "cache"` on every nav. Was meant to be **temporary** (INC6, "REMOVE after owner confirms"). |
| `jellyfin-locale-force-en` | `arrflix.s8n.ru` | regex locale-json chunks | 200 | security-headers + compress + cache-immutable + rewrite-to-en-us-json + force-en-accept-lang | Rewrites every locale-json chunk URL to en-us-json |
| `jellyfin-asset-immutable` | `arrflix.s8n.ru` | regex /web/*.{js,css,…} | 90 | security-headers + compress + cache-immutable | Cache lock for hashed assets |
| docker-provider router | `arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + compress + force-en-accept-lang | The "default" jellyfin route |
| docker-provider router (dev) | `dev.arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + **no-guest** | Single route, no per-asset caching, no Clear-Site-Data, no Accept-Language pinning |
Diff highlights for playback:
- **`clear-cache-only` (Clear-Site-Data: "cache") on prod only** — see suspect #3 above. HIGH likelihood: in Chrome, this header evicts the Service Worker cache on every navigation. Jellyfin's PWA registers `sw.js` and serves chunked JS from SW cache. If the SW cache is wiped while the user is mid-session and a re-fetch fails (rate-limited, or cache-immutable response served stale), `MediaSource.appendBuffer` can throw → silent black video.
- **`force-en-accept-lang` rewrites Accept-Language to en-US,en;q=0.9 on prod, not on dev** — affects only metadata strings, NOT playback.
- **`cache-immutable` (`max-age=31536000, immutable`) on prod's hashed JS/CSS** — fine in steady state, but combined with `clear-cache-only` on the index, you can get into a state where index says "fetch new chunks" but client has them locked under the immutable header. Browsers usually re-validate on hard reload only.
- **`rewrite-to-en-us-json` on prod only** — purely string-translation rewrite; not a playback factor.
- **`no-guest@file` on dev only**: blocks WAN, prod relies on its own no-guest somewhere else (router-level Pi-hole rules per CLAUDE.md memory `feedback_s8n_hosts_override.md`). Not a playback factor.
## c) branding.xml (CustomCss) diff
Prod = **401 lines**, dev = **116 lines**. 285-line delta is all the BLACK-PASS / INC1INC5 patches absent on dev.
| Block | Prod | Dev |
|-------|------|-----|
| `@import url("/web/cineplex.css")` | YES — local cineplex.css mounted in compose | NO — uses `https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css` |
| BLACK-PASS section (`:root` overrides + `.layout-desktop { background-color: #000 !important; }`) | YES (lines 110-180) | NO |
| INC1 transparent-scope `.itemDetailPage:has()` | YES | NO |
| INC2 `position:fixed; z-index:0` on `.backdropContainer`, `.backgroundContainer` (full viewport) | YES (lines 215-258) | NO |
| INC3 transparent-scope on `.detailPageContent`, `.detailVerticalSection`, `.itemsContainer`, etc. | YES | NO |
| INC4 transparent-scope on `.itemDetailPage .emby-scroller` | YES | NO |
| INC5 scrollbar palette overrides | YES | NO |
| `Abspielen``Play` ::after override | YES | YES (only this block on dev) |
Suspect #4 above: INC2's `position: fixed; z-index: 0` on `.backdropContainer` could overlap or stack above the video element wrapper depending on Cineplex/finity stacking context. The full-viewport pinned backdrop is the most aggressive layout change in the diff. Would not affect dev because dev has none of these rules.
## d) encoding.xml diff
Live `/encoding.xml`: **byte-identical** between prod and dev.
`encoding.xml.bak.1778285349` (older copies) shows historical divergence:
- Prod previously had `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`
- Dev had all three `false`
- Both are now `false` — convergence happened during INC1-5 work.
Both servers run `HardwareAccelerationType = none` (no GPU hwaccel — known: GTX 1660 Ti driver broken on host per CLAUDE.md memory ref). CPU-only ffmpeg transcode on this host can keep up with H264 at 1080p but not with 4K/HEVC at >40 Mbps. This is the reason `RemoteClientBitrateLimit=8M` (suspect #2) is so dangerous on prod.
## e) bind-mount diff
Already covered in compose section. Net: **media is identical** (`/home/user/media:/media:ro` on both — same path, same `:ro`). All differences are in `/config`, `/cache`, and the `/jellyfin/jellyfin-web/*` overrides. Cache divergence cannot cause prod black-screen because each container has its own (Jellyfin transcode chunks land under `/cache/transcodes`, fully isolated).
## f) env-var diff
| Var | Prod | Dev |
|-----|------|-----|
| LANG | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
| LC_ALL | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
| LANGUAGE | `en_US:en` | `en_US:en` (identical) |
| TZ | `Europe/London` | `Europe/London` (identical) |
| JELLYFIN_PublishedServerUrl | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` |
| JELLYFIN_UICulture | `en-US` (explicit) | (unset — server reads `system.xml UICulture=en-US` instead) |
| All `JELLYFIN_*_DIR` paths | identical | identical |
| `NVIDIA_VISIBLE_DEVICES=all`, `NVIDIA_DRIVER_CAPABILITIES=compute,video,utility` | YES | YES (both — neither uses GPU because hwaccel=none in encoding.xml) |
| `MALLOC_TRIM_THRESHOLD_=131072` | YES | YES |
No env-var divergence is plausible as the playback root cause.
## g) web-overrides diff
```
PROD: DEV:
index.html 9723 bytes (root) index-dev.html 68349 bytes (user)
index.html.bak.eng-pre-2026-05-08 59757 b index-dev.html.bak.pre-middle-theme 65789 b
index.html.bak.pre-rollback-1778282871 69390 index-dev.html.bak.pre-mirror-1778289645 59757 b
cineplex.css 16143 b cineplex.css 16143 b
locale-en-only/ 94 chunks locale-en-only/ 94 chunks (mounted only on prod's container, not on dev's)
```
`md5sum` results:
- `cineplex.css` — IDENTICAL on both (`01e95d491d755ea3df39955af998d5f3`)
- `index.html` (prod) `5b212d7d60b8a2b910a2f47dd0470a09``index-dev.html` (dev) `9658933dfa069dce6f3cd58130249aa4`
**Anomaly**: prod `index.html` was rewritten at **02:39 today by root** (was `user:user` at 01:54, 65789 bytes; is `root:root` 9723 bytes now). Whoever did this stripped the giant base64 favicon block but kept the SHIM. Investigate who/what owns this — likely a rollback script or `sudo cp` from one of the `.bak` files.
The shim itself in current prod still contains:
- `localStorage.setItem('enableHlsFmp4', 'false')` (INC5 — disable fMP4 to dodge HEVC+fMP4 black bug)
- `Accept-Language` strip on outbound fetch/XHR
- `UICulture = 'en-US'` rewrite on user-config save
- Title rewrite to "ARRFLIX"
Dev's index-dev.html has the same shim (the SHIM-BEGIN/END markers are at offset 2774 → 10799 in dev). Difference: dev shim was last touched at 02:22 by user, prod's at 02:39 by root.
## h) per-user policy diff
Prod has 12 users (`5`, `64bitpotato`, `aloy`, `ferghal`, `guest`, `house`, `IX`, `Jayden`, `marco`, `pet`, `s8n`, `yummyhunny`). Dev has 1 (`test`).
`Users.RemoteClientBitrateLimit`:
- Prod: every user = `20000000` (20 Mbps)
- Dev: `test` = `0` (unlimited)
But the **server-wide cap in `system.xml`** is `8000000` (8 Mbps) on prod and `0` on dev. Jellyfin computes the effective cap per session as `min(server, user)` for non-LAN sessions → prod's 12 users are all clamped to **8 Mbps remote** (regardless of their per-user 20 Mbps allowance), dev's `test` is unlimited.
`Permissions` table (Kind = Jellyfin's `PermissionKind` enum: 0=IsAdministrator, 1=IsHidden, 2=IsDisabled, 3=EnableSharedDeviceControl, 4=EnableRemoteAccess, 5=EnableLiveTvManagement, 6=EnableLiveTvAccess, 7=EnableMediaPlayback, 8=EnableAudioPlaybackTranscoding, 9=EnableVideoPlaybackTranscoding, **10=EnablePlaybackRemuxing**, 11=ForceRemoteSourceTranscoding, …):
| User | Kind 0 (Admin) | Kind 9 (VideoTranscode) | Kind 10 (Remuxing) | Kind 11 (ForceTranscode) |
|------|----------------|-------------------------|---------------------|--------------------------|
| s8n (admin) | 1 | 1 | **1** | 1 |
| marco | 0 | 1 | **0** | 1 |
| guest | 0 | 1 | **0** | 1 |
| house | 0 | 1 | **0** | 1 |
| 5 | 0 | 1 | **0** | 1 |
| (all other prod non-admin users — same pattern) | 0 | 1 | **0** | 1 |
| dev `test` | 1 | 1 | **1** | 1 |
**Smoking gun**: every prod non-admin has `EnablePlaybackRemuxing = 0` AND `ForceRemoteSourceTranscoding = 1`. Even when the client could perfectly direct-play an MKV by remuxing to MP4, the server has to fully transcode video. Combined with `HardwareAccelerationType=none` and `RemoteClientBitrateLimit=8M`, the server can't keep up on 4K/HEVC sources → empty segments → black-screen on the player.
Dev's `test` user has Remuxing=1 and is admin so the server-wide bitrate cap is bypassed (admin always direct-plays at full bitrate).
---
## Recommended fix order
1. **Remove the temporary `clear-cache-only` middleware** from `jellyfin-html-nocache` in `/opt/docker/traefik/config/dynamic.yml` (per INC6 it was supposed to be removed already). Reload Traefik. Have owner hard-reload arrflix.s8n.ru once. **(2 minutes, near-zero blast radius)**
2. **Bump `RemoteClientBitrateLimit` from 8000000 → 0** (or to 40000000) in `/home/docker/jellyfin/config/config/system.xml`, restart prod jellyfin. **(2 minutes)**
3. **Set `EnablePlaybackRemuxing = 1` for all non-admin prod users** via PATCH /Users/{id}/Policy or a direct UPDATE on `Permissions` SET Value=1 WHERE Kind=10. Restart not required.
4. Test the same high-quality file as `marco` from the same client that black-screened. If still bad → look at INC2 backdrop-pinning CSS in branding.xml (suspect #4) and Cineplex theme stacking context.
5. Investigate who/what rewrote `/opt/docker/jellyfin/web-overrides/index.html` at 02:39 as root. Permissions are now `root:root` instead of `user:user`. Even though the bind-mount is `:ro` so the container can still read it, future hot-patches by `user` will fail with EPERM.
Do NOT change at this stage:
- branding.xml (INC2 backdrop pinning) — defer until items 1-3 are tested. CSS-driven black would hit dev too once dev tries the same theme.
- The 94 locale-en-only chunk overrides — orthogonal to playback.
- encoding.xml — already identical to dev.
---
## Diff matrix
```
DIM PROD DEV
================================= ======================================================================== ========================================
docker image jellyfin/jellyfin:10.10.3 jellyfin/jellyfin:10.10.3 (=)
container user 1000:1000 1000:1000 (=)
userns_mode host host (=)
network proxy proxy (=)
restart unless-stopped unless-stopped (=)
hwaccel (encoding.xml) none none (=)
EnableThrottling (encoding.xml) false false (= now; PROD was true earlier per .bak)
EnableTonemapping (encoding.xml) false false (= now; PROD was true earlier per .bak)
EnableSegmentDeletion false false (= now; PROD was true earlier per .bak)
H264Crf / H265Crf 23 / 28 23 / 28 (=)
QuickConnectAvailable (system.xml) false true DIFF (cosmetic)
RemoteClientBitrateLimit (server) 8000000 (8 Mbps clamp) 0 (unlimited) DIFF *** SUSPECT #2 ***
JELLYFIN_UICulture env en-US (unset) DIFF (low-impact)
LANG/LC_ALL env en_US.UTF-8 (explicit) en_US.UTF-8 (image default) eq
JELLYFIN_PublishedServerUrl env https://arrflix.s8n.ru https://dev.arrflix.s8n.ru DIFF (expected)
/media bind /home/user/media:ro /home/user/media:ro (=)
/config bind /home/docker/jellyfin/config /home/docker/jellyfin-dev/config DIFF (expected, isolated)
/cache bind /home/docker/jellyfin/cache /home/docker/jellyfin-dev/cache DIFF (expected, isolated)
index.html bind /opt/docker/jellyfin/web-overrides/index.html (md5 5b212d7d, 9723 B, /opt/docker/jellyfin-dev/web-overrides/index-dev.html DIFF (shim functionally same)
ROOT-OWNED at 02:39 today — investigate) (md5 9658933d, 68349 B, user-owned)
cineplex.css bind /opt/docker/jellyfin/web-overrides/cineplex.css (md5 01e95d49) CDN @import (no bind) DIFF (cosmetic)
locale-en-only chunk overrides 94 binds 0 DIFF (translations only)
branding.xml lines 401 (BLACK-PASS + INC1-5) 116 (Abspielen override only) DIFF *** SUSPECT #4 ***
Traefik routers for host jellyfin-html-nocache (priority 100), jellyfin-locale-force-en (200), single docker-provider router DIFF *** SUSPECT #3 ***
jellyfin-asset-immutable (90), docker-provider router (default)
Traefik middlewares (index) security-headers + compress + cache-no-store + force-en-accept-lang security-headers + no-guest DIFF *** SUSPECT #3 ***
+ clear-cache-only
Traefik Clear-Site-Data: "cache" YES (clear-cache-only middleware on every / and /web/* nav) NO DIFF *** SUSPECT #3 ***
Per-user RemoteClientBitrateLimit 20000000 (all 12 users) 0 (test user) DIFF (overridden by server cap on prod)
Permissions Kind 9 (VideoTranscode) 1 (all users) 1 (test) (=)
Permissions Kind 10 (Remuxing) 0 (all 11 non-admins) / 1 (s8n admin) 1 (test) DIFF *** SUSPECT #1 ***
Permissions Kind 11 (ForceTranscode) 1 (all users) 1 (test) (=)
ARRFLIX-SHIM enableHlsFmp4=false present in shim present in shim eq
Index file mtime 2026-05-09 02:39 (root-owned, mid-investigation rewrite!) 2026-05-09 02:22 (user-owned) DIFF (anomaly — investigate)
```
---
## Notes / open questions
- Prod's `index.html` going `root:root` at 02:39 mid-investigation is suspicious. Confirm: was a recovery script run? Is there a cron that copies from `.bak` if checksum drifts? If so, it's racing the live edits.
- The `clear-cache-only` middleware was tagged "REMOVE after owner confirms one fresh load" in the dynamic.yml comment. Owner has confirmed (per doc 26 status = CLOSED). It must be retired now.
- Suspect ranking is hypothesis-driven, not yet validated against player-side errors. To confirm, capture **Network tab + Console of Chrome on prod during a black-screen play** (look for `MediaSource error`, 4xx on `/Videos/.../stream.mp4`, `Clear-Site-Data` rows, fMP4 segment fetches stalling). That single trace would collapse the ranking by 80%.
---
## Final fix applied + verification (2026-05-09 02:46Z)
### Root cause (cross-agent consensus)
Five sibling agents independently produced sections above. Agreed root cause:
`/opt/docker/traefik/config/dynamic.yml` defines `jellyfin-asset-immutable@file` (priority 90) with rule `PathRegexp(^/web/.+\.(js|css|woff2|...)$)`. Jellyfin's PWA ships its service worker as `/web/serviceworker.js` (NOT `/web/sw.js`). The priority-100 `jellyfin-html-nocache` router only excludes the literal path `/web/sw.js`, so `/web/serviceworker.js` is matched by `jellyfin-asset-immutable` instead, getting `Cache-Control: public, max-age=31536000, immutable`.
Consequence: every browser that visited prod after this rule went live got a one-year-pinned service worker. The SW intercepts `fetch` for `/Videos/*`, `/Items/*`, `/web/*` (its scope), so it returned cached/empty bytes for video segments and the SPA view-bundle. INC6 (`Clear-Site-Data: "cache"`) flushed HTTP cache but per MDN spec does NOT unregister service workers — that needs `"storage"` — which is why INC6 didn't fix the symptom.
Confirmed at the wire: `curl -I /web/serviceworker.js` on prod returned `cache-control: public, max-age=31536000, immutable` before the patch. Dev, with no asset-immutable router, returned no cache-control header at all and played fine.
The bypass test in §"Web-overrides shim audit" earlier in this doc independently ruled out the index.html shim (vanilla 9723-byte upstream index.html reproduced the same black screen). Server-side ffmpeg jobs were observed running to clean exit, transcode pipeline healthy. So the failure was strictly client-side via the pinned SW.
### Fix applied
Added a higher-priority router that forces `cache-no-store` on the SW path. Cleanest, lowest-risk option (no regex change to the existing immutable rule, easy rollback by deleting one block):
```yaml
# /opt/docker/traefik/config/dynamic.yml — appended above jellyfin-asset-immutable
jellyfin-sw-nocache:
rule: "Host(`arrflix.s8n.ru`) && (Path(`/web/serviceworker.js`) || Path(`/web/sw.js`))"
entryPoints:
- websecure
service: jellyfin@docker
tls:
certResolver: letsencrypt
priority: 250
middlewares:
- security-headers@file
- compress@file
- cache-no-store@file
```
Deploy commands run on nullstone:
```
ssh user@192.168.0.100
# backup taken: /opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088
scp /tmp/dynamic.yml.work user@192.168.0.100:/opt/docker/traefik/config/dynamic.yml
# Traefik hot-reloads dynamic.yml automatically; no docker restart needed.
```
### Wire-level verification
```
$ curl -sI 'https://arrflix.s8n.ru/web/serviceworker.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: no-cache, no-store, must-revalidate
expires: 0
pragma: no-cache
```
Hashed asset (control) still immutable as intended:
```
$ curl -sI 'https://arrflix.s8n.ru/web/main.jellyfin.bundle.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: public, max-age=31536000, immutable
```
### Headless playback verification (MNS S1E4)
Item: `9312799ca24979bd05aad9733ce7ee14`*The Mike Nolan Show* S1E4 "Ding Dong Delli". Run as `s8n` admin via headless Chromium with form-login + deep-link to detail page + 36-second `<video>` poll:
```
[t= 3s] ct=21.75 dur=328.37 rs=4 paused=False vw=1920 vh=1080 err=None
[t= 6s] ct=24.77 ...
[t= 9s] ct=27.76 ...
[t= 12s] ct=30.76 ...
[t= 15s] ct=33.77 ...
[t= 18s] ct=36.78 ...
[t= 21s] ct=39.79 ...
[t= 24s] ct=42.79 ...
[t= 27s] ct=45.80 ...
[t= 30s] ct=48.82 ...
[t= 33s] ct=51.82 ...
[t= 36s] ct=54.84 ...
VERDICT: ct_advance=33.09s rs=4 vw=1920 err=None → PASS
```
`headless-test-v2.py` against prod with `ITEMS=9312799ca24979bd05aad9733ce7ee14` confirms the same outcome for both the admin (`s8n`) and the non-admin (`guest`) user: `readyState=4`, `currentTime≈9.5s`, `videoWidth=1920`, `paused=false`, `error=null`, src `https://arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true...` (direct-play, no transcode required for this codec/profile pair).
### Open follow-ups
1. **INC6 `clear-cache-only` middleware can be retired now** — it was deployed to flush stale cache after INC5 but cannot dislodge SWs (see §Q3/Q9). Now that the SW is on `cache-no-store`, the hammer is no longer needed. Remove the line `- clear-cache-only@file` from `jellyfin-html-nocache` middleware list in a follow-up commit once owner confirms one fresh load on real browsers.
2. **Service-worker auto-recovery for already-poisoned clients.** The ARRFLIX shim already loops `navigator.serviceWorker.getRegistrations() → r.unregister(); caches.keys() → caches.delete()` once per pageview (verified in shim audit §c). With the SW now served `no-store`, the next reload picks up a clean SW and recovery is automatic — no user action needed.
3. **INC2 backdrop-pin CSS in branding.xml** is no longer suspected (not the root cause this round) but still worth a deferred audit when the Cineplex theme update lands.
4. **Per-user `EnablePlaybackRemuxing=0`** flagged as suspect #1 in the original ranking is benign for direct-play codec paths (verified by guest playing fine on the test). It only matters if the source codec needs remux to MP4 for a constrained client; can be left as-is or normalised in a separate housekeeping pass.
5. **`/opt/docker/jellyfin/web-overrides/index.html` ownership root:root mtime 02:39** — investigate whether a recovery cron or a sudo cp from a `.bak` file rewrote it mid-incident. The bind-mount is `:ro` so the container is unaffected, but future hot-patches by `user` will EPERM. Cosmetic, fix in a follow-up.
### Commit
Repo commit (this doc + bin/prod-vs-dev-compare.py): `917d21b3be5f8de198ff9b965942fb20cbded902`
- Author: `s8n <admin@s8n.ru>` per memory `user_git_identity.md` — no Co-Authored-By trailer
- Pushed to `origin main` on `git.s8n.ru/s8n/ARRFLIX` at 2026-05-09 02:46Z
The dynamic.yml patch is deployed to `/opt/docker/traefik/config/dynamic.yml` on nullstone (hot-reloaded via Traefik file provider). Backup of the pre-fix file kept at `/opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088` for one-step rollback if needed. Traefik config is intentionally NOT mirrored into the arrflix-repo (lives in nullstone-side `/opt/docker/traefik/`); the doc captures the change in full.