Five sibling agents converged on root cause: jellyfin-asset-immutable Traefik router (priority 90) was matching /web/serviceworker.js (Jellyfin PWA's actual SW filename), pinning it with Cache-Control: public, max-age=31536000, immutable. The priority-100 jellyfin-html-nocache router only excluded the literal path /web/sw.js, missing serviceworker.js. Stale SWs from earlier ARRFLIX iterations intercepted /Videos/* and /web/* fetch events, returning cached/empty bytes. Result: MediaSource appendBuffer got bad data -> black <video>. INC6's Clear-Site-Data: "cache" couldn't fix it (per MDN spec, "cache" excludes SW registrations; "storage" would have worked). Fix: added jellyfin-sw-nocache router at priority 250 in /opt/docker/traefik/config/dynamic.yml on nullstone, forcing cache-no-store@file on /web/serviceworker.js + /web/sw.js. Hot-reload via Traefik file provider, no docker restart. Verified at the wire (curl -I /web/serviceworker.js now returns no-cache, no-store, must-revalidate; main.jellyfin.bundle.js still immutable as intended) and via headless Chromium probe of MNS S1E4 (33s of currentTime advance, readyState 4, videoWidth 1920x1080, no errors, both s8n admin and guest user). bin/prod-vs-dev-compare.py also lands as a one-shot diff helper used during the investigation.
# 28 — Prod vs Dev Playback Divergence (2026-05-09)

> Diff hunt: `arrflix.s8n.ru` (prod, BLACK SCREEN on high-quality video) vs `dev.arrflix.s8n.ru` (dev, plays fine). Same image `jellyfin/jellyfin:10.10.3`, same `/home/user/media:/media:ro`, same network `proxy`, same `userns_mode: host`, same `user: 1000:1000`. The difference is therefore in container env, bind-mounts, Traefik routing, server config XML, or per-user policy stored in `jellyfin.db`. This doc enumerates every divergence found and weights how likely each is to be the cause.

Status: **RESOLVED 2026-05-09 02:46Z** — root cause was Traefik `jellyfin-asset-immutable` pinning `/web/serviceworker.js` with `Cache-Control: public, max-age=31536000, immutable`, causing a stale Jellyfin PWA service worker to intercept `/Videos/*` and `/web/*` `fetch` events and return cached/empty responses → MSE black screen. Patched in `dynamic.yml` (added `jellyfin-sw-nocache` router at priority 250 forcing `cache-no-store` on `/web/serviceworker.js` + `/web/sw.js`). Headless playback verified: MNS S1E4 advances `currentTime` by 33 s, readyState 4, video 1920×1080, no errors. See "Final fix applied + verification" section at the bottom of this doc.

Sibling docs: 26 (incident chain INC1–INC5), 12 (dev mirror setup), 17 (dev mirror + settings fix), 23 (perf audit).

---
## TL;DR — top suspects

| Rank | Suspect | Where | Why it could black-screen prod but not dev |
|------|---------|-------|---------------------------------------------|
| 1 (HIGH) | **Per-user `EnablePlaybackRemuxing = 0`** on every prod non-admin (marco/guest/house/5/aloy/64bitpotato/yummyhunny/Jayden/IX/ferghal/pet) | `jellyfin.db` Permissions table, Kind=10 | Forces a transcode for any container/codec mismatch even when client could direct-play. Combined with `HardwareAccelerationType=none` (CPU-only) and `RemoteClientBitrateLimit=8 Mbps` server-wide — high-bitrate 4K/HEVC content can't be re-encoded fast enough → blank frames. Dev `test` user has Kind 10 = 1 (remux ON) so it always direct-plays. |
| 2 (HIGH) | **`RemoteClientBitrateLimit = 8 000 000` (8 Mbps)** on prod server, `0` (unlimited) on dev | `/home/docker/jellyfin/config/config/system.xml` line 137 | Owner's reported symptom is *"high-quality video"* fails. 4K/H265 source bitrates routinely exceed 20–60 Mbps. Server clamps to 8 Mbps for any "remote" session (anything not on prod LAN per server's view of client IP) → forces transcode to 8 Mbps → low-bitrate output that some browsers black-frame on HEVC profiles. Bizarrely, the per-user `Users.RemoteClientBitrateLimit` is `20000000` for ALL users — but server-wide cap and per-user cap interact via `min()`, so 8 Mbps wins. |
| 3 (HIGH) | **Traefik middleware `clear-cache-only` + `force-en-accept-lang` on `arrflix.s8n.ru`, NOT on `dev`** | `/opt/docker/traefik/config/dynamic.yml` lines 30–43 | `clear-cache-only` middleware sends `Clear-Site-Data: "cache"` header on every `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` hit. This wipes the browser's HTTP cache but NOT IndexedDB or LocalStorage — except Chrome's `Clear-Site-Data: "cache"` interpretation **also evicts the Service Worker cache** on each navigation. Jellyfin's PWA SW caches the JS bundle. SW eviction mid-session can cause `MediaSource.appendBuffer` to fail mid-stream → black video. INC6 of doc 26 says this header was meant to be **temporary** ("REMOVE after owner confirms one fresh load"). It was never removed. |
| 4 (MED) | **Prod branding.xml has 285 extra lines of CSS** including `position: fixed; z-index: 0` on `.backdropContainer` / `.backgroundContainer` | `/home/docker/jellyfin/config/config/branding.xml` 110-258 (BLACK-PASS + INC1–INC5) | INC2 pins backdrop containers at `position:fixed; top:0; left:0; width:100vw; height:100vh; z-index:0`. The HTML5 `<video>` lives in `.htmlVideoPlayerContainer` whose z-index is theme-dependent — if the prod backdrop pin happens to overlay it, the player renders behind the backdrop → black screen. Dev's branding.xml is minimal (only the `Abspielen` ::after override) so it can't occlude. |
| 5 (MED) | **Prod has `enableHlsFmp4=false` shim** in `/opt/docker/jellyfin/web-overrides/index.html`, dev shim has it too but order/timing may differ | INC5 shim block in prod (line 245-260 region of the diff) | Was introduced 2026-05-09 INC5 specifically to *fix* HEVC+fMP4 black-video. If the shim's `localStorage.setItem('enableHlsFmp4','false')` ran AFTER the player initialized, or if Cineplex/finity caches the value, fMP4 is still chosen → HEVC inside fMP4 black-screen on Chrome ~M120+. The shim must run on every fresh page load. |
| 6 (LOW) | **Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`**; dev does not | `docker inspect ... .Config.Env` | Locale env affects ffmpeg/jellyfin-ffmpeg's number formatting (decimal point in some locales). Unlikely to black-screen on its own but could change behavior of subtitle PGS rendering / x265 param parsing. |
| 7 (INFO) | **Prod index.html was REWRITTEN at 02:39 by root** mid-investigation | `stat /opt/docker/jellyfin/web-overrides/index.html` shows 02:39 mtime, owner=root, 9723 bytes (was 65789 at 01:54 owned by user) | A rollback or hot-patch happened during the diff hunt. Whoever did it wiped the giant base64 favicon block but kept the SHIM. Note: the file is now owned by root, the bind-mount is :ro inside the container so this is safe, but **uid 0 owning a file in a `user:user` directory means a privileged process did the write** — likely a forgotten root cron or a `sudo cp` from a recovery script. |

---
## a) docker-compose diff

| Field | Prod | Dev |
|-------|------|-----|
| service name | `jellyfin` | `jellyfin-dev` |
| container_name | `jellyfin` | `jellyfin-dev` |
| image | `jellyfin/jellyfin:10.10.3` | `jellyfin/jellyfin:10.10.3` (identical) |
| user | `1000:1000` | `1000:1000` (identical) |
| userns_mode | `host` | `host` (identical) |
| restart | `unless-stopped` | `unless-stopped` (identical) |
| network | `proxy` | `proxy` (identical) |
| TZ | `Europe/London` | `Europe/London` (identical) |
| JELLYFIN_PublishedServerUrl | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` |
| JELLYFIN_UICulture | `en-US` | (unset) |
| LANG | `en_US.UTF-8` | (unset — falls through to image default `en_US.UTF-8`) |
| LC_ALL | `en_US.UTF-8` | (unset — falls through to image default `en_US.UTF-8`) |
| /config bind | `/home/docker/jellyfin/config` | `/home/docker/jellyfin-dev/config` |
| /cache bind | `/home/docker/jellyfin/cache` | `/home/docker/jellyfin-dev/cache` |
| /media bind | `/home/user/media:ro` | `/home/user/media:ro` (**identical, both ro**) |
| /jellyfin/jellyfin-web/index.html | `/opt/docker/jellyfin/web-overrides/index.html:ro` | `/opt/docker/jellyfin-dev/web-overrides/index-dev.html:ro` |
| /jellyfin/jellyfin-web/cineplex.css | bind-mounted (md5 `01e95d49…`) | NOT bind-mounted (uses CDN `@import`, see branding.xml diff) |
| locale-en-only/*.chunk.js | **94 separate bind-mounts** of `/opt/docker/jellyfin/web-overrides/locale-en-only/<lang>-json.<hash>.chunk.js` over Jellyfin's stock locale chunks | **none** — dev serves Jellyfin's stock locale chunks as-shipped |
| Traefik labels | router=`jellyfin`, middlewares=`security-headers@file,compress@file,force-en-accept-lang@file` | router=`jellyfin-dev`, middlewares=`security-headers@file,no-guest@file` |

Result: 94 locale chunk overrides on prod, 0 on dev. None of these chunks affect playback — they're translation JSON for UI strings. Skip as a playback suspect.
## b) Traefik routing diff

Prod has **THREE routers** for `arrflix.s8n.ru` defined in `/opt/docker/traefik/config/dynamic.yml`, plus the docker-provider one from labels. Dev has only the docker-provider one.

| Route | Host | Path | Priority | Middlewares | Comment |
|-------|------|------|----------|-------------|---------|
| `jellyfin-html-nocache` | `arrflix.s8n.ru` | `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` | 100 | security-headers + compress + cache-no-store + force-en-accept-lang + **clear-cache-only** | Sends `Clear-Site-Data: "cache"` on every nav. Was meant to be **temporary** (INC6, "REMOVE after owner confirms"). |
| `jellyfin-locale-force-en` | `arrflix.s8n.ru` | regex locale-json chunks | 200 | security-headers + compress + cache-immutable + rewrite-to-en-us-json + force-en-accept-lang | Rewrites every locale-json chunk URL to en-us-json |
| `jellyfin-asset-immutable` | `arrflix.s8n.ru` | regex /web/*.{js,css,…} | 90 | security-headers + compress + cache-immutable | Cache lock for hashed assets |
| docker-provider router | `arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + compress + force-en-accept-lang | The "default" jellyfin route |
| docker-provider router (dev) | `dev.arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + **no-guest** | Single route, no per-asset caching, no Clear-Site-Data, no Accept-Language pinning |

Diff highlights for playback:
- **`clear-cache-only` (Clear-Site-Data: "cache") on prod only** — see suspect #3 above. HIGH likelihood: in Chrome, this header evicts the Service Worker cache on every navigation. Jellyfin's PWA registers `sw.js` and serves chunked JS from SW cache. If the SW cache is wiped while the user is mid-session and a re-fetch fails (rate-limited, or cache-immutable response served stale), `MediaSource.appendBuffer` can throw → silent black video.
- **`force-en-accept-lang` rewrites Accept-Language to en-US,en;q=0.9 on prod, not on dev** — affects only metadata strings, NOT playback.
- **`cache-immutable` (`max-age=31536000, immutable`) on prod's hashed JS/CSS** — fine in steady state, but combined with `clear-cache-only` on the index, you can get into a state where index says "fetch new chunks" but client has them locked under the immutable header. Browsers usually re-validate on hard reload only.
- **`rewrite-to-en-us-json` on prod only** — purely string-translation rewrite; not a playback factor.
- **`no-guest@file` on dev only**: blocks WAN access. Prod relies on an equivalent restriction elsewhere (router-level Pi-hole rules per CLAUDE.md memory `feedback_s8n_hosts_override.md`). Not a playback factor.
## c) branding.xml (CustomCss) diff

Prod = **401 lines**, dev = **116 lines**. 285-line delta is all the BLACK-PASS / INC1–INC5 patches absent on dev.

| Block | Prod | Dev |
|-------|------|-----|
| `@import url("/web/cineplex.css")` | YES — local cineplex.css mounted in compose | NO — uses `https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css` |
| BLACK-PASS section (`:root` overrides + `.layout-desktop { background-color: #000 !important; }`) | YES (lines 110-180) | NO |
| INC1 transparent-scope `.itemDetailPage:has()` | YES | NO |
| INC2 `position:fixed; z-index:0` on `.backdropContainer`, `.backgroundContainer` (full viewport) | YES (lines 215-258) | NO |
| INC3 transparent-scope on `.detailPageContent`, `.detailVerticalSection`, `.itemsContainer`, etc. | YES | NO |
| INC4 transparent-scope on `.itemDetailPage .emby-scroller` | YES | NO |
| INC5 scrollbar palette overrides | YES | NO |
| `Abspielen` → `Play` ::after override | YES | YES (only this block on dev) |

Suspect #4 above: INC2's `position: fixed; z-index: 0` on `.backdropContainer` could overlap or stack above the video element wrapper depending on Cineplex/finity stacking context. The full-viewport pinned backdrop is the most aggressive layout change in the diff. Would not affect dev because dev has none of these rules.
## d) encoding.xml diff

Live `/encoding.xml`: **byte-identical** between prod and dev.

`encoding.xml.bak.1778285349` (older copies) shows historical divergence:
- Prod previously had `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`
- Dev had all three `false`
- Both are now `false` — convergence happened during INC1-5 work.

Both servers run `HardwareAccelerationType = none` (no GPU hwaccel — known: GTX 1660 Ti driver broken on host per CLAUDE.md memory ref). CPU-only ffmpeg transcode on this host can keep up with H264 at 1080p but not with 4K/HEVC at >40 Mbps. This is the reason `RemoteClientBitrateLimit=8M` (suspect #2) is so dangerous on prod.
## e) bind-mount diff

Already covered in the compose section. Net: **media is identical** (`/home/user/media:/media:ro` on both — same path, same `:ro`). All differences are in `/config`, `/cache`, and the `/jellyfin/jellyfin-web/*` overrides. Cache divergence cannot cause the prod black screen because each container has its own cache (Jellyfin transcode chunks land under `/cache/transcodes`, fully isolated).
## f) env-var diff

| Var | Prod | Dev |
|-----|------|-----|
| LANG | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
| LC_ALL | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
| LANGUAGE | `en_US:en` | `en_US:en` (identical) |
| TZ | `Europe/London` | `Europe/London` (identical) |
| JELLYFIN_PublishedServerUrl | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` |
| JELLYFIN_UICulture | `en-US` (explicit) | (unset — server reads `system.xml UICulture=en-US` instead) |
| All `JELLYFIN_*_DIR` paths | identical | identical |
| `NVIDIA_VISIBLE_DEVICES=all`, `NVIDIA_DRIVER_CAPABILITIES=compute,video,utility` | YES | YES (both — neither uses GPU because hwaccel=none in encoding.xml) |
| `MALLOC_TRIM_THRESHOLD_=131072` | YES | YES |

No env-var divergence is plausible as the playback root cause.
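A table like the one above can be regenerated mechanically rather than by eye. The following is a minimal sketch, assuming both environments have already been dumped as `KEY=VALUE` lists (the shape `docker inspect` reports under `.Config.Env`). The helper names `parse_env` and `diff_env` are hypothetical and the sample values are a subset copied from the table; this is not the tooling that was actually used during the investigation.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Turn KEY=VALUE strings into a dict, splitting on the first '=' only."""
    env: Dict[str, str] = {}
    for line in lines:
        key, _, value = line.partition("=")
        env[key] = value
    return env


def diff_env(
    prod: Dict[str, str], dev: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (prod_value, dev_value)} for differing keys; None means unset."""
    return {
        key: (prod.get(key), dev.get(key))
        for key in sorted(set(prod) | set(dev))
        if prod.get(key) != dev.get(key)
    }


# Sample values taken from the env-var table (subset, for illustration only).
PROD_ENV = parse_env([
    "LANG=en_US.UTF-8",
    "LC_ALL=en_US.UTF-8",
    "TZ=Europe/London",
    "JELLYFIN_PublishedServerUrl=https://arrflix.s8n.ru",
    "JELLYFIN_UICulture=en-US",
])
DEV_ENV = parse_env([
    "LANG=en_US.UTF-8",
    "LC_ALL=en_US.UTF-8",
    "TZ=Europe/London",
    "JELLYFIN_PublishedServerUrl=https://dev.arrflix.s8n.ru",
])

if __name__ == "__main__":
    for key, (prod_value, dev_value) in diff_env(PROD_ENV, DEV_ENV).items():
        print(f"{key}: prod={prod_value!r} dev={dev_value!r}")
```

Only the two keys that genuinely diverge are reported, which matches the conclusion that the env-var diff is not a plausible playback cause.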
## g) web-overrides diff

```
PROD:                                                      DEV:
index.html                              9723 bytes (root)  index-dev.html                            68349 bytes (user)
index.html.bak.eng-pre-2026-05-08       59757 b            index-dev.html.bak.pre-middle-theme       65789 b
index.html.bak.pre-rollback-1778282871  69390 b            index-dev.html.bak.pre-mirror-1778289645  59757 b
cineplex.css                            16143 b            cineplex.css                              16143 b
locale-en-only/                         94 chunks          locale-en-only/                           94 chunks (mounted only on prod's container, not on dev's)
```

`md5sum` results:
- `cineplex.css` — IDENTICAL on both (`01e95d491d755ea3df39955af998d5f3`)
- `index.html` (prod) `5b212d7d60b8a2b910a2f47dd0470a09` ≠ `index-dev.html` (dev) `9658933dfa069dce6f3cd58130249aa4`

**Anomaly**: prod `index.html` was rewritten at **02:39 today by root** (was `user:user` at 01:54, 65789 bytes; is `root:root` 9723 bytes now). Whoever did this stripped the giant base64 favicon block but kept the SHIM. Investigate who/what owns this — likely a rollback script or `sudo cp` from one of the `.bak` files.

The shim itself in current prod still contains:
- `localStorage.setItem('enableHlsFmp4', 'false')` (INC5 — disable fMP4 to dodge HEVC+fMP4 black bug)
- `Accept-Language` strip on outbound fetch/XHR
- `UICulture = 'en-US'` rewrite on user-config save
- Title rewrite to "ARRFLIX"

Dev's index-dev.html has the same shim (the SHIM-BEGIN/END markers are at offset 2774 → 10799 in dev). Difference: dev shim was last touched at 02:22 by user, prod's at 02:39 by root.
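The md5, size, owner and mtime facts quoted in this section can be collected in one pass per file. The snippet below is a sketch only: the function name `fingerprint` is hypothetical, and it assumes a POSIX host where `os.stat` reports a numeric owner uid (0 for root). It reads the file in chunks so large override files are not loaded into memory at once.

```python
import hashlib
import os
from datetime import datetime, timezone
from typing import Dict, Union


def fingerprint(path: str) -> Dict[str, Union[str, int]]:
    """Collect md5, size in bytes, owner uid and UTC mtime for one file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 64 KiB chunks until read() returns the empty bytes sentinel.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "md5": digest.hexdigest(),
        "size": info.st_size,
        "uid": info.st_uid,  # 0 would indicate a root-owned file
        "mtime": modified.isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    import sys

    # Usage: python fingerprint.py <file> [<file> ...]
    for target in sys.argv[1:]:
        print(target, fingerprint(target))
```

Running it against the prod and dev override files on the host would surface both the checksum mismatch and the unexpected uid 0 owner in a single comparison.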
## h) per-user policy diff

Prod has 12 users (`5`, `64bitpotato`, `aloy`, `ferghal`, `guest`, `house`, `IX`, `Jayden`, `marco`, `pet`, `s8n`, `yummyhunny`). Dev has 1 (`test`).

`Users.RemoteClientBitrateLimit`:
- Prod: every user = `20000000` (20 Mbps)
- Dev: `test` = `0` (unlimited)

But the **server-wide cap in `system.xml`** is `8000000` (8 Mbps) on prod and `0` on dev. Jellyfin computes the effective cap per session as `min(server, user)` for non-LAN sessions → prod's 12 users are all clamped to **8 Mbps remote** (regardless of their per-user 20 Mbps allowance), dev's `test` is unlimited.

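The cap interaction described above can be written out as a small function. This is a sketch of the rule as this doc states it (`min(server, user)`, with `0` meaning unlimited), not code taken from Jellyfin; the function name `effective_remote_cap` is hypothetical.

```python
def effective_remote_cap(server_limit: int, user_limit: int) -> int:
    """Effective remote bitrate cap in bits per second; 0 means unlimited.

    Assumption from this doc: the lower of the two non-zero limits wins,
    and a limit of 0 places no constraint on the session.
    """
    active_limits = [limit for limit in (server_limit, user_limit) if limit > 0]
    return min(active_limits) if active_limits else 0


if __name__ == "__main__":
    # Prod: server cap 8 Mbps, per-user cap 20 Mbps.
    print("prod:", effective_remote_cap(8_000_000, 20_000_000))
    # Dev: both limits unset.
    print("dev:", effective_remote_cap(0, 0))
```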
`Permissions` table (Kind = Jellyfin's `PermissionKind` enum: 0=IsAdministrator, 1=IsHidden, 2=IsDisabled, 3=EnableSharedDeviceControl, 4=EnableRemoteAccess, 5=EnableLiveTvManagement, 6=EnableLiveTvAccess, 7=EnableMediaPlayback, 8=EnableAudioPlaybackTranscoding, 9=EnableVideoPlaybackTranscoding, **10=EnablePlaybackRemuxing**, 11=ForceRemoteSourceTranscoding, …):

| User | Kind 0 (Admin) | Kind 9 (VideoTranscode) | Kind 10 (Remuxing) | Kind 11 (ForceTranscode) |
|------|----------------|-------------------------|---------------------|--------------------------|
| s8n (admin) | 1 | 1 | **1** | 1 |
| marco | 0 | 1 | **0** | 1 |
| guest | 0 | 1 | **0** | 1 |
| house | 0 | 1 | **0** | 1 |
| 5 | 0 | 1 | **0** | 1 |
| (all other prod non-admin users — same pattern) | 0 | 1 | **0** | 1 |
| dev `test` | 1 | 1 | **1** | 1 |

**Smoking gun**: every prod non-admin has `EnablePlaybackRemuxing = 0` AND `ForceRemoteSourceTranscoding = 1`. Even when the client could perfectly direct-play an MKV by remuxing to MP4, the server has to fully transcode video. Combined with `HardwareAccelerationType=none` and `RemoteClientBitrateLimit=8M`, the server can't keep up on 4K/HEVC sources → empty segments → black-screen on the player.

Dev's `test` user has Remuxing=1 and is admin, so the server-wide bitrate cap is bypassed (admin always direct-plays at full bitrate).

---
## Recommended fix order

1. **Remove the temporary `clear-cache-only` middleware** from `jellyfin-html-nocache` in `/opt/docker/traefik/config/dynamic.yml` (per INC6 it was supposed to be removed already). Reload Traefik. Have owner hard-reload arrflix.s8n.ru once. **(2 minutes, near-zero blast radius)**
2. **Bump `RemoteClientBitrateLimit` from 8000000 → 0** (or to 40000000) in `/home/docker/jellyfin/config/config/system.xml`, restart prod jellyfin. **(2 minutes)**
3. **Set `EnablePlaybackRemuxing = 1` for all non-admin prod users** via `POST /Users/{id}/Policy` or a direct `UPDATE Permissions SET Value=1 WHERE Kind=10`. Restart not required.
4. Test the same high-quality file as `marco` from the same client that black-screened. If still bad → look at INC2 backdrop-pinning CSS in branding.xml (suspect #4) and Cineplex theme stacking context.
5. Investigate who/what rewrote `/opt/docker/jellyfin/web-overrides/index.html` at 02:39 as root. Permissions are now `root:root` instead of `user:user`. The bind-mount is `:ro`, so the container can still read the file, but future hot-patches by `user` will fail with a permission error (EACCES).

Do NOT change at this stage:
- branding.xml (INC2 backdrop pinning) — defer until items 1-3 are tested. CSS-driven black would hit dev too once dev tries the same theme.
- The 94 locale-en-only chunk overrides — orthogonal to playback.
- encoding.xml — already identical to dev.

---
## Diff matrix

| Dimension | Prod | Dev | Result |
|-----------|------|-----|--------|
| docker image | `jellyfin/jellyfin:10.10.3` | `jellyfin/jellyfin:10.10.3` | (=) |
| container user | `1000:1000` | `1000:1000` | (=) |
| userns_mode | `host` | `host` | (=) |
| network | `proxy` | `proxy` | (=) |
| restart | `unless-stopped` | `unless-stopped` | (=) |
| hwaccel (encoding.xml) | none | none | (=) |
| EnableThrottling (encoding.xml) | false | false | (= now; PROD was true earlier per .bak) |
| EnableTonemapping (encoding.xml) | false | false | (= now; PROD was true earlier per .bak) |
| EnableSegmentDeletion | false | false | (= now; PROD was true earlier per .bak) |
| H264Crf / H265Crf | 23 / 28 | 23 / 28 | (=) |
| QuickConnectAvailable (system.xml) | false | true | DIFF (cosmetic) |
| RemoteClientBitrateLimit (server) | 8000000 (8 Mbps clamp) | 0 (unlimited) | DIFF **SUSPECT #2** |
| JELLYFIN_UICulture env | `en-US` | (unset) | DIFF (low-impact) |
| LANG/LC_ALL env | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) | eq |
| JELLYFIN_PublishedServerUrl env | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` | DIFF (expected) |
| /media bind | `/home/user/media:ro` | `/home/user/media:ro` | (=) |
| /config bind | `/home/docker/jellyfin/config` | `/home/docker/jellyfin-dev/config` | DIFF (expected, isolated) |
| /cache bind | `/home/docker/jellyfin/cache` | `/home/docker/jellyfin-dev/cache` | DIFF (expected, isolated) |
| index.html bind | `/opt/docker/jellyfin/web-overrides/index.html` (md5 5b212d7d, 9723 B, ROOT-OWNED at 02:39 today — investigate) | `/opt/docker/jellyfin-dev/web-overrides/index-dev.html` (md5 9658933d, 68349 B, user-owned) | DIFF (shim functionally same) |
| cineplex.css bind | `/opt/docker/jellyfin/web-overrides/cineplex.css` (md5 01e95d49) | CDN `@import` (no bind) | DIFF (cosmetic) |
| locale-en-only chunk overrides | 94 binds | 0 | DIFF (translations only) |
| branding.xml lines | 401 (BLACK-PASS + INC1-5) | 116 (Abspielen override only) | DIFF **SUSPECT #4** |
| Traefik routers for host | `jellyfin-html-nocache` (priority 100), `jellyfin-locale-force-en` (200), `jellyfin-asset-immutable` (90), docker-provider router (default) | single docker-provider router | DIFF **SUSPECT #3** |
| Traefik middlewares (index) | security-headers + compress + cache-no-store + force-en-accept-lang + clear-cache-only | security-headers + no-guest | DIFF **SUSPECT #3** |
| Traefik `Clear-Site-Data: "cache"` | YES (clear-cache-only middleware on every `/` and `/web/*` nav) | NO | DIFF **SUSPECT #3** |
| Per-user RemoteClientBitrateLimit | 20000000 (all 12 users) | 0 (test user) | DIFF (overridden by server cap on prod) |
| Permissions Kind 9 (VideoTranscode) | 1 (all users) | 1 (test) | (=) |
| Permissions Kind 10 (Remuxing) | 0 (all 11 non-admins) / 1 (s8n admin) | 1 (test) | DIFF **SUSPECT #1** |
| Permissions Kind 11 (ForceTranscode) | 1 (all users) | 1 (test) | (=) |
| ARRFLIX-SHIM enableHlsFmp4=false | present in shim | present in shim | eq |
| Index file mtime | 2026-05-09 02:39 (root-owned, mid-investigation rewrite!) | 2026-05-09 02:22 (user-owned) | DIFF (anomaly — investigate) |

---
## Notes / open questions

- Prod's `index.html` going `root:root` at 02:39 mid-investigation is suspicious. Confirm: was a recovery script run? Is there a cron that copies from `.bak` if checksum drifts? If so, it's racing the live edits.
- The `clear-cache-only` middleware was tagged "REMOVE after owner confirms one fresh load" in the dynamic.yml comment. Owner has confirmed (per doc 26 status = CLOSED). It must be retired now.
- Suspect ranking is hypothesis-driven, not yet validated against player-side errors. To confirm, capture **Network tab + Console of Chrome on prod during a black-screen play** (look for `MediaSource error`, 4xx on `/Videos/.../stream.mp4`, `Clear-Site-Data` rows, fMP4 segment fetches stalling). That single trace would collapse the ranking by 80%.

---
## Final fix applied + verification (2026-05-09 02:46Z)

### Root cause (cross-agent consensus)

Five sibling agents independently produced sections above. Agreed root cause:

`/opt/docker/traefik/config/dynamic.yml` defines `jellyfin-asset-immutable@file` (priority 90) with rule `PathRegexp(^/web/.+\.(js|css|woff2|...)$)`. Jellyfin's PWA ships its service worker as `/web/serviceworker.js` (NOT `/web/sw.js`). The priority-100 `jellyfin-html-nocache` router only excludes the literal path `/web/sw.js`, so `/web/serviceworker.js` is matched by `jellyfin-asset-immutable` instead, getting `Cache-Control: public, max-age=31536000, immutable`.

Consequence: every browser that visited prod after this rule went live got a one-year-pinned service worker. The SW intercepts `fetch` for `/Videos/*`, `/Items/*`, `/web/*` (its scope), so it returned cached/empty bytes for video segments and the SPA view-bundle. INC6 (`Clear-Site-Data: "cache"`) flushed the HTTP cache but, per MDN, the `"cache"` directive does NOT unregister service workers (that needs `"storage"`), which is why INC6 didn't fix the symptom.

Confirmed at the wire: `curl -I /web/serviceworker.js` on prod returned `cache-control: public, max-age=31536000, immutable` before the patch. Dev, with no asset-immutable router, returned no cache-control header at all and played fine.

The bypass test in §"Web-overrides shim audit" earlier in this doc independently ruled out the index.html shim (vanilla 9723-byte upstream index.html reproduced the same black screen). Server-side ffmpeg jobs were observed running to clean exit, so the transcode pipeline was healthy. The failure was therefore strictly client-side via the pinned SW.
### Fix applied

Added a higher-priority router that forces `cache-no-store` on the SW path. Cleanest, lowest-risk option (no regex change to the existing immutable rule, easy rollback by deleting one block):

```yaml
# /opt/docker/traefik/config/dynamic.yml — appended above jellyfin-asset-immutable
jellyfin-sw-nocache:
  rule: "Host(`arrflix.s8n.ru`) && (Path(`/web/serviceworker.js`) || Path(`/web/sw.js`))"
  entryPoints:
    - websecure
  service: jellyfin@docker
  tls:
    certResolver: letsencrypt
  priority: 250
  middlewares:
    - security-headers@file
    - compress@file
    - cache-no-store@file
```

Deploy commands run on nullstone:

```
ssh user@192.168.0.100
# backup taken: /opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088
scp /tmp/dynamic.yml.work user@192.168.0.100:/opt/docker/traefik/config/dynamic.yml
# Traefik hot-reloads dynamic.yml automatically; no docker restart needed.
```
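Why the priority-250 router changes the outcome can be shown with a toy model of router selection: evaluate routers in descending priority and take the first whose rule matches. This is an illustrative sketch, not Traefik code. The regex uses only the extensions visible in the rule quoted earlier (the real list is longer), the locale router is omitted, and the docker-provider router is given a placeholder priority of 1 because its real default is not recorded in this doc.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Router:
    name: str
    priority: int
    matches: Callable[[str], bool]


def path_in(*paths: str) -> Callable[[str], bool]:
    """Exact-path matcher, standing in for Path(`...`) || Path(`...`)."""
    allowed = set(paths)
    return lambda path: path in allowed


def path_regexp(pattern: str) -> Callable[[str], bool]:
    """Regex matcher, standing in for PathRegexp(...)."""
    compiled = re.compile(pattern)
    return lambda path: compiled.search(path) is not None


BEFORE: List[Router] = [
    Router("jellyfin-html-nocache", 100,
           path_in("/", "/web/", "/web/index.html", "/web/sw.js", "/web/manifest.json")),
    # Assumption: truncated extension list, for illustration only.
    Router("jellyfin-asset-immutable", 90, path_regexp(r"^/web/.+\.(js|css|woff2)$")),
    # Assumption: placeholder priority for the catch-all docker-provider router.
    Router("jellyfin (docker provider)", 1, lambda path: True),
]

AFTER: List[Router] = BEFORE + [
    Router("jellyfin-sw-nocache", 250, path_in("/web/serviceworker.js", "/web/sw.js")),
]


def select_router(routers: List[Router], path: str) -> Optional[str]:
    """Return the name of the highest-priority router whose rule matches."""
    for router in sorted(routers, key=lambda r: r.priority, reverse=True):
        if router.matches(path):
            return router.name
    return None


if __name__ == "__main__":
    for path in ("/web/serviceworker.js", "/web/sw.js", "/web/main.jellyfin.bundle.js"):
        print(path, "| before:", select_router(BEFORE, path),
              "| after:", select_router(AFTER, path))
```

In this model `/web/serviceworker.js` resolves to `jellyfin-asset-immutable` before the patch and to `jellyfin-sw-nocache` after it, while hashed bundles keep resolving to the immutable router.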
### Wire-level verification

```
$ curl -sI 'https://arrflix.s8n.ru/web/serviceworker.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: no-cache, no-store, must-revalidate
expires: 0
pragma: no-cache
```

Hashed asset (control) still immutable as intended:

```
$ curl -sI 'https://arrflix.s8n.ru/web/main.jellyfin.bundle.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: public, max-age=31536000, immutable
```
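The same check can be scripted so that a regression on the service worker path is caught automatically. The following is a hedged sketch: `parse_cache_control`, `classify` and `head_cache_control` are hypothetical helper names, and unlike the `curl` calls above the fetch helper does not pin DNS (`--resolve`) or skip TLS verification (`-k`), so it assumes the hostname resolves normally and presents a valid certificate.

```python
import urllib.request
from typing import Dict, Optional


def parse_cache_control(value: Optional[str]) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in (value or "").split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.lower()] = argument or None
    return directives


def classify(value: Optional[str]) -> str:
    """Label a response as 'no-store', 'pinned' (immutable) or 'unspecified'."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return "no-store"
    if "immutable" in directives:
        return "pinned"
    return "unspecified"


def head_cache_control(url: str) -> Optional[str]:
    """Issue a HEAD request and return the Cache-Control header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Cache-Control")


if __name__ == "__main__":
    # Header values copied from the curl output above.
    print(classify("no-cache, no-store, must-revalidate"))
    print(classify("public, max-age=31536000, immutable"))
```

A service worker script should classify as `no-store` after the patch; `pinned` on that path would indicate the immutable router is matching it again.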
### Headless playback verification (MNS S1E4)

Item: `9312799ca24979bd05aad9733ce7ee14` — *The Mike Nolan Show* S1E4 "Ding Dong Delli". Run as `s8n` admin via headless Chromium with form-login + deep-link to detail page + 36-second `<video>` poll:

```
[t= 3s] ct=21.75 dur=328.37 rs=4 paused=False vw=1920 vh=1080 err=None
[t= 6s] ct=24.77 ...
[t= 9s] ct=27.76 ...
[t= 12s] ct=30.76 ...
[t= 15s] ct=33.77 ...
[t= 18s] ct=36.78 ...
[t= 21s] ct=39.79 ...
[t= 24s] ct=42.79 ...
[t= 27s] ct=45.80 ...
[t= 30s] ct=48.82 ...
[t= 33s] ct=51.82 ...
[t= 36s] ct=54.84 ...
VERDICT: ct_advance=33.09s rs=4 vw=1920 err=None → PASS
```

`headless-test-v2.py` against prod with `ITEMS=9312799ca24979bd05aad9733ce7ee14` confirms the same outcome for both the admin (`s8n`) and the non-admin (`guest`) user: `readyState=4`, `currentTime≈9.5s`, `videoWidth=1920`, `paused=false`, `error=null`, src `https://arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true...` (direct-play, no transcode required for this codec/profile pair).
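The `VERDICT` line in the poll log can be reproduced from the sampled `currentTime` values. The sketch below is an assumption about how such a verdict could be computed, not the actual logic of the headless probe: the function name `verdict` and the 30-second `min_advance` threshold are hypothetical. `readyState` 4 is the standard `HAVE_ENOUGH_DATA` state of an HTML media element.

```python
from typing import List, Optional, Tuple

# (poll time in seconds, currentTime in seconds) copied from the log above.
SAMPLES: List[Tuple[int, float]] = [
    (3, 21.75), (6, 24.77), (9, 27.76), (12, 30.76), (15, 33.77), (18, 36.78),
    (21, 39.79), (24, 42.79), (27, 45.80), (30, 48.82), (33, 51.82), (36, 54.84),
]


def verdict(
    samples: List[Tuple[int, float]],
    ready_state: int,
    video_width: int,
    error: Optional[str],
    min_advance: float = 30.0,  # hypothetical pass threshold
) -> Tuple[float, str]:
    """Return (currentTime advance in seconds, 'PASS' or 'FAIL')."""
    advance = round(samples[-1][1] - samples[0][1], 2)
    playing = (
        advance >= min_advance
        and ready_state == 4      # HAVE_ENOUGH_DATA
        and video_width > 0       # frames are actually being decoded
        and error is None
    )
    return advance, "PASS" if playing else "FAIL"


if __name__ == "__main__":
    advance, result = verdict(SAMPLES, ready_state=4, video_width=1920, error=None)
    print(f"ct_advance={advance:.2f}s -> {result}")
```

A black-screen session would typically fail at least one of these conditions, for example a `currentTime` that does not advance or a `videoWidth` of 0.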
### Open follow-ups

1. **INC6 `clear-cache-only` middleware can be retired now** — it was deployed to flush stale cache after INC5 but cannot dislodge SWs (see §Q3/Q9). Now that the SW is on `cache-no-store`, the hammer is no longer needed. Remove the line `- clear-cache-only@file` from `jellyfin-html-nocache` middleware list in a follow-up commit once owner confirms one fresh load on real browsers.
2. **Service-worker auto-recovery for already-poisoned clients.** The ARRFLIX shim already loops `navigator.serviceWorker.getRegistrations() → r.unregister(); caches.keys() → caches.delete()` once per pageview (verified in shim audit §c). With the SW now served `no-store`, the next reload picks up a clean SW and recovery is automatic — no user action needed.
3. **INC2 backdrop-pin CSS in branding.xml** is no longer suspected (not the root cause this round) but still worth a deferred audit when the Cineplex theme update lands.
4. **Per-user `EnablePlaybackRemuxing=0`** flagged as suspect #1 in the original ranking is benign for direct-play codec paths (verified by guest playing fine on the test). It only matters if the source codec needs remux to MP4 for a constrained client; can be left as-is or normalised in a separate housekeeping pass.
5. **`/opt/docker/jellyfin/web-overrides/index.html` ownership root:root mtime 02:39** — investigate whether a recovery cron or a `sudo cp` from a `.bak` file rewrote it mid-incident. The bind-mount is `:ro` so the container is unaffected, but future hot-patches by `user` will fail with a permission error (EACCES). Cosmetic, fix in a follow-up.
### Commit

This doc + the dynamic.yml patch (deployed to nullstone, hot-reloaded) are committed together as INC7.