doc 28 + INC7: fix prod black-screen via SW cache pin
Five sibling agents converged on the root cause: the jellyfin-asset-immutable Traefik router (priority 90) was matching /web/serviceworker.js (the Jellyfin PWA's actual SW filename), pinning it with Cache-Control: public, max-age=31536000, immutable. The priority-100 jellyfin-html-nocache router only excluded the literal path /web/sw.js, missing serviceworker.js. Stale SWs from earlier ARRFLIX iterations intercepted /Videos/* and /web/* fetch events, returning cached/empty bytes. Result: MediaSource appendBuffer got bad data -> black <video>.

INC6's Clear-Site-Data: "cache" couldn't fix it (per the MDN spec, "cache" excludes SW registrations; "storage" would have worked).

Fix: added a jellyfin-sw-nocache router at priority 250 in /opt/docker/traefik/config/dynamic.yml on nullstone, forcing cache-no-store@file on /web/serviceworker.js + /web/sw.js. Hot-reloaded via the Traefik file provider, no docker restart.

Verified at the wire (curl -I /web/serviceworker.js now returns no-cache, no-store, must-revalidate; main.jellyfin.bundle.js is still immutable, as intended) and via a headless Chromium probe of MNS S1E4 (33s of currentTime advance, readyState 4, videoWidth 1920x1080, no errors, for both the s8n admin and a guest user). bin/prod-vs-dev-compare.py also lands as a one-shot diff helper used during the investigation.
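The routing bug above reduces to Traefik's highest-priority-wins router selection: the SW script matched the broad immutable-asset rule because only `/web/sw.js` had a higher-priority exclusion. A minimal sketch of that selection logic, with the router names and priorities from this commit but deliberately simplified stand-in match rules (not real Traefik rule syntax):

```python
# Toy model of priority-ordered router selection. Router names/priorities are
# from the commit message; the match lambdas are simplified assumptions, not
# the actual dynamic.yml PathPrefix/Path rules.
def pick_router(path, routers):
    """Return the name of the highest-priority router whose rule matches."""
    hits = [r for r in routers if r["match"](path)]
    return max(hits, key=lambda r: r["priority"])["name"] if hits else None

before = [
    {"name": "jellyfin-html-nocache", "priority": 100,
     "match": lambda p: p in ("/web/index.html", "/web/sw.js")},
    {"name": "jellyfin-asset-immutable", "priority": 90,
     "match": lambda p: p.startswith("/web/") and p.endswith(".js")},
]
# The fix adds a priority-250 router covering BOTH service-worker filenames.
after = before + [
    {"name": "jellyfin-sw-nocache", "priority": 250,
     "match": lambda p: p in ("/web/serviceworker.js", "/web/sw.js")},
]

# Pre-fix: the PWA's actual SW filename falls through to the immutable router.
assert pick_router("/web/serviceworker.js", before) == "jellyfin-asset-immutable"
# Post-fix: the SW paths are pinned to no-store; bundles stay immutable.
assert pick_router("/web/serviceworker.js", after) == "jellyfin-sw-nocache"
assert pick_router("/web/main.jellyfin.bundle.js", after) == "jellyfin-asset-immutable"
```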
parent f0a2ac6450
commit 917d21b3be
2 changed files with 927 additions and 0 deletions
597
bin/prod-vs-dev-compare.py
Executable file
@@ -0,0 +1,597 @@
#!/usr/bin/env python3
"""ARRFLIX prod-vs-dev playback divergence test (2026-05-09).

Runs the SAME flow against arrflix.s8n.ru (prod) and dev.arrflix.s8n.ru (dev)
for the same physical file (Mike Nolan Show S01E04 — Ding Dong Delli.mkv,
H.264+AAC) and produces a side-by-side diff:

- URL of master.m3u8 / Videos/{id}/stream
- PlaybackInfo response MediaSources[0] (DirectPlay/DirectStream/Transcode)
- Final <video> element state at t=5/10/20/30s after Play
- Server ffmpeg cmdline (if transcoding) from docker logs
- HTTP status of all /Videos /Items /master.m3u8 /PlaybackInfo /Audio
  /stream requests

Artifacts: /tmp/arrflix-prod-vs-dev/{prod,dev}/{...} + diff.json + diff.md.

Run:
    bin/prod-vs-dev-compare.py
"""
import sys, os, json, time, asyncio, ssl, urllib.request, urllib.error, urllib.parse, subprocess, re
from pathlib import Path
from playwright.async_api import async_playwright

OUT = "/tmp/arrflix-prod-vs-dev"
os.makedirs(OUT, exist_ok=True)

SIDES = [
    {"side": "prod", "url": "https://arrflix.s8n.ru", "user": "s8n", "pw": "2001dude",
     "container": "jellyfin"},
    {"side": "dev", "url": "https://dev.arrflix.s8n.ru", "user": "test", "pw": "2001dude",
     "container": "jellyfin-dev"},
]

ITEM_ID = "9312799ca24979bd05aad9733ce7ee14"  # MNS S01E04 (same on both sides)
ITEM_LABEL = "Mike Nolan Show — S01E04 (Ding Dong Delli)"

DEVICE_ID = "prodvsdev-2026-05-09"
CLIENT = "ProdVsDev"
APIKEY_NAME = "arrflix-prodvsdev-2026-05-09"

CTX = ssl._create_unverified_context()


# ------------------- HTTP helpers -------------------

def auth_h(token=None):
    h = (f'MediaBrowser Client="{CLIENT}", Device="cli", DeviceId="{DEVICE_ID}", '
         f'Version="1.0"')
    if token:
        h += f', Token="{token}"'
    return h


def http(url, path, method="GET", body=None, token=None):
    data = json.dumps(body).encode() if body is not None else None
    headers = {
        "Authorization": auth_h(token),
        "Content-Type": "application/json",
    }
    req = urllib.request.Request(
        f"{url}{path}", data=data, headers=headers, method=method)
    raw = urllib.request.urlopen(req, context=CTX, timeout=20).read()
    return json.loads(raw) if raw else {}


def login(url, user, pw):
    last_err = None
    for attempt in range(3):
        try:
            return http(url, "/Users/AuthenticateByName", "POST",
                        {"Username": user, "Pw": pw})
        except urllib.error.HTTPError as e:
            last_err = e
            if e.code in (500, 503):
                time.sleep(3); continue
            raise
    raise last_err


def playbackinfo(url, item_id, user_id, token):
    """Mimic the web-client's /PlaybackInfo POST body for a generic browser."""
    body = {
        "DeviceProfile": {
            "MaxStreamingBitrate": 140000000,
            "MaxStaticBitrate": 100000000,
            "MusicStreamingTranscodingBitrate": 384000,
            "DirectPlayProfiles": [
                {"Container": "mp4,m4v", "Type": "Video",
                 "VideoCodec": "h264,hevc,vp9,av1",
                 "AudioCodec": "aac,mp3,ac3,eac3,opus,flac"},
                {"Container": "mkv", "Type": "Video",
                 "VideoCodec": "h264,hevc,vp9,av1",
                 "AudioCodec": "aac,mp3,ac3,eac3,opus,flac"},
                {"Container": "webm", "Type": "Video",
                 "VideoCodec": "vp9,av1", "AudioCodec": "opus,vorbis"},
            ],
            "TranscodingProfiles": [
                {"Container": "ts", "Type": "Video", "VideoCodec": "h264",
                 "AudioCodec": "aac", "Protocol": "hls", "Context": "Streaming",
                 "MaxAudioChannels": "2"},
                {"Container": "mp4", "Type": "Video", "VideoCodec": "h264",
                 "AudioCodec": "aac", "Context": "Static",
                 "MaxAudioChannels": "2"},
            ],
            "ContainerProfiles": [],
            "CodecProfiles": [],
            "SubtitleProfiles": [
                {"Format": "vtt", "Method": "External"},
                {"Format": "srt", "Method": "External"},
            ],
        },
        "AutoOpenLiveStream": True,
        "IsPlayback": True,
    }
    return http(url, f"/Items/{item_id}/PlaybackInfo?UserId={user_id}",
                "POST", body, token=token)


def make_apikey(url, token, name=APIKEY_NAME):
    """Issue an API key. Jellyfin only takes the name in query string."""
    try:
        http(url, f"/Auth/Keys?App={name}", "POST", token=token)
    except urllib.error.HTTPError:
        pass
    keys = http(url, "/Auth/Keys", token=token)
    for k in keys.get("Items", []):
        if k.get("AppName") == name:
            return k.get("AccessToken")
    return None


def del_apikey(url, token, name=APIKEY_NAME):
    try:
        keys = http(url, "/Auth/Keys", token=token)
        for k in keys.get("Items", []):
            if k.get("AppName") == name:
                http(url, f"/Auth/Keys/{k['AccessToken']}", "DELETE", token=token)
    except Exception as e:
        print(f"[!] del_apikey({name}): {e}")


# ------------------- Playwright run -------------------

async def run_side(p, side_cfg):
    side = side_cfg["side"]; url = side_cfg["url"]
    user = side_cfg["user"]; pw = side_cfg["pw"]
    side_dir = os.path.join(OUT, side)
    os.makedirs(side_dir, exist_ok=True)

    # API login
    auth = login(url, user, pw)
    token = auth["AccessToken"]; uid = auth["User"]["Id"]
    server_id = auth["ServerId"]
    is_admin = auth["User"].get("Policy", {}).get("IsAdministrator", False)
    print(f"\n=== {side} === user={user} uid={uid} admin={is_admin}")

    # API-side PlaybackInfo (independent of browser, for canonical record)
    pbi_api = playbackinfo(url, ITEM_ID, uid, token)
    with open(os.path.join(side_dir, "playbackinfo-api.json"), "w") as f:
        json.dump(pbi_api, f, indent=2)
    ms = pbi_api.get("MediaSources", [])
    if ms:
        m = ms[0]
        print(f"[{side}] PlaybackInfo (API): DirectPlay={m.get('SupportsDirectPlay')} "
              f"DirectStream={m.get('SupportsDirectStream')} "
              f"Transcoding={m.get('SupportsTranscoding')} "
              f"transcodeUrl={m.get('TranscodingUrl','-')[:80]}")

    # API key for this run (caller asked, even if not strictly needed here)
    apikey = make_apikey(url, token)
    print(f"[{side}] api key: {apikey[:8] if apikey else None}")

    # Browser pass
    browser = await p.chromium.launch(
        headless=True,
        args=["--no-sandbox", "--disable-dev-shm-usage",
              "--autoplay-policy=no-user-gesture-required",
              "--use-fake-ui-for-media-stream"])
    ctx = await browser.new_context(
        viewport={"width": 1600, "height": 900},
        ignore_https_errors=True)
    page = await ctx.new_page()

    requests, responses, console = [], [], []
    pbi_response_bodies = []

    def on_request(req):
        u = req.url
        if any(x in u for x in ["/Videos/", "/Items/", "/master.m3u8",
                                "/PlaybackInfo", "/Audio/", "/stream"]):
            requests.append({"method": req.method, "url": u,
                             "post": req.post_data[:300] if req.post_data else None})

    page.on("request", on_request)

    async def on_response(r):
        u = r.url
        if any(x in u for x in ["/Videos/", "/Items/", "/master.m3u8",
                                "/PlaybackInfo", "/Audio/", "/stream"]):
            entry = {"method": r.request.method, "url": u, "status": r.status}
            responses.append(entry)
            if "/PlaybackInfo" in u and r.request.method == "POST":
                try:
                    body = await r.json()
                    pbi_response_bodies.append({"url": u, "body": body})
                except Exception:
                    pass

    page.on("response", lambda r: asyncio.create_task(on_response(r)))
    page.on("console", lambda m: console.append({"type": m.type,
                                                 "text": m.text[:300]}))

    # Form login (handles both manual-form and user-avatar landing pages)
    await page.goto(f"{url}/web/", wait_until="networkidle", timeout=30000)
    await asyncio.sleep(3)
    # If we landed on the avatar/user-list selection screen, click "Manual Login"
    try:
        manual = await page.query_selector(".manualLoginForm a, .btnManual, a.button-link")
        if manual:
            txt = (await manual.inner_text()).strip().lower()
            if "manual" in txt:
                await manual.click()
                await asyncio.sleep(2)
        # Or there might be a direct "Manual Login" button on the avatar grid
        manual_btn = await page.query_selector("text=/Manual Login/i")
        if manual_btn:
            try:
                await manual_btn.click(timeout=2000); await asyncio.sleep(1)
            except Exception:
                pass
    except Exception as e:
        print(f"[{side}] manual-login click attempt: {e}")
    try:
        await page.wait_for_selector("input[type=password]", timeout=15000)
        # Use the canonical Jellyfin login fields
        u_sel = "#txtManualName"
        pw_sel = "#txtManualPassword"
        # Fall back to dynamic discovery if the canonical IDs are absent
        if not await page.query_selector(u_sel):
            inputs = await page.evaluate(
                "() => Array.from(document.querySelectorAll('input')).map(i => "
                "({id:i.id, name:i.name, type:i.type}))")
            u_sel = pw_sel = None
            for i in inputs:
                fid, fname, ftype = i.get("id", ""), i.get("name", ""), i.get("type", "")
                if not u_sel and (ftype == "text" or "user" in (fid+fname).lower()
                                  or "name" in (fid+fname).lower()):
                    u_sel = f"#{fid}" if fid else f'input[name="{fname}"]'
                if not pw_sel and ftype == "password":
                    pw_sel = f"#{fid}" if fid else f'input[name="{fname}"]'
        await page.fill(u_sel, user)
        await page.fill(pw_sel, pw)
        await page.keyboard.press("Enter")
        await page.wait_for_load_state("networkidle", timeout=20000)
        await asyncio.sleep(3)
        print(f"[{side}] form login OK as {user}")
    except Exception as e:
        print(f"[{side}] form login error: {e}")

    # Navigate to detail page
    target = f"{url}/web/#/details?id={ITEM_ID}&serverId={server_id}"
    print(f"[{side}] goto {target}")
    await page.goto(target, wait_until="networkidle", timeout=30000)
    await asyncio.sleep(4)
    await page.screenshot(path=os.path.join(side_dir, "detail.png"))

    # Click Play
    play_clicked = False
    used_sel = None
    for sel in [".btnPlay", "[data-action=\"play\"]"]:
        try:
            btn = await page.query_selector(sel)
            if btn:
                box = await btn.bounding_box()
                if box and box["width"] > 0:
                    await btn.click(timeout=5000)
                    play_clicked = True; used_sel = sel; break
        except Exception:
            pass
    if not play_clicked:
        try:
            await page.keyboard.press("p"); play_clicked = True; used_sel = "kbd:p"
        except Exception:
            pass
    print(f"[{side}] play clicked={play_clicked} via={used_sel}")

    # Sample state at t=5/10/20/30s
    timestamps = [5, 10, 20, 30]
    samples = []
    last = 0
    for t in timestamps:
        await asyncio.sleep(t - last)
        last = t
        snap = await page.evaluate("""() => {
            const v = document.querySelector('video');
            if (!v) return { present: false };
            // Sample whether the <video> is painting actual pixels by drawing
            // a thumbnail to a hidden canvas and checking the average luma.
            // If the average is ~0 (or all-near-zero), the video element is
            // rendering opaque black despite claiming to play.
            let paintLuma = null, paintRGBSum = null, paintOk = null, paintErr = null;
            try {
                const c = document.createElement('canvas');
                c.width = 32; c.height = 18;
                const ctx = c.getContext('2d', { willReadFrequently: true });
                ctx.drawImage(v, 0, 0, 32, 18);
                const d = ctx.getImageData(0, 0, 32, 18).data;
                let r=0,g=0,b=0,n=0;
                for (let i=0;i<d.length;i+=4){r+=d[i];g+=d[i+1];b+=d[i+2];n++;}
                paintLuma = (0.299*r + 0.587*g + 0.114*b) / n;
                paintRGBSum = (r+g+b)/n;
                paintOk = paintLuma > 4;  // > a few luma above pure black
            } catch (e) { paintErr = String(e); }
            return {
                present: true,
                src: v.src || '',
                currentSrc: v.currentSrc || '',
                currentTime: v.currentTime,
                duration: v.duration,
                paused: v.paused,
                ended: v.ended,
                readyState: v.readyState,
                networkState: v.networkState,
                error: v.error ? { code: v.error.code, message: v.error.message } : null,
                videoWidth: v.videoWidth,
                videoHeight: v.videoHeight,
                bufferedRanges: v.buffered.length,
                bufferedEnd: v.buffered.length ? v.buffered.end(v.buffered.length-1) : 0,
                paintLuma, paintRGBSum, paintOk, paintErr,
            };
        }""")
        samples.append({"t": t, "video": snap})
        await page.screenshot(path=os.path.join(side_dir, f"play-t{t}.png"))
        ct = snap.get('currentTime')
        ct_s = f"{ct:.2f}" if isinstance(ct, (int, float)) else str(ct)
        pl = snap.get('paintLuma')
        pl_s = f"{pl:.1f}" if isinstance(pl, (int, float)) else str(pl)
        print(f"[{side}] t={t}s: time={ct_s} "
              f"paused={snap.get('paused')} err={snap.get('error')} "
              f"dim={snap.get('videoWidth')}x{snap.get('videoHeight')} "
              f"rs={snap.get('readyState')} paintLuma={pl_s} paintOk={snap.get('paintOk')}")

    # Final src URL fully decoded
    final_src = samples[-1]["video"].get("currentSrc") or samples[-1]["video"].get("src", "")
    final_src_decoded = urllib.parse.unquote(final_src) if final_src else ""

    await browser.close()

    # Server side ffmpeg / transcode log
    server_logs = ""
    try:
        server_logs = subprocess.check_output(
            ["ssh", "-o", "ConnectTimeout=5", "user@192.168.0.100",
             f"docker logs --since 2m {side_cfg['container']} 2>&1 | tail -300"],
            timeout=15).decode(errors="replace")
    except Exception as e:
        server_logs = f"(failed to fetch server logs: {e})"

    # Extract ffmpeg cmdline + transcode reasons from log
    ffmpeg_cmd = None
    for line in server_logs.splitlines():
        if "ffmpeg" in line.lower() and ("-i " in line or "-f hls" in line or "-c:v" in line):
            ffmpeg_cmd = line.strip()
            break
    transcode_reasons = []
    for line in server_logs.splitlines():
        if "transcode reason" in line.lower() or "TranscodeReasons" in line:
            transcode_reasons.append(line.strip())

    # Save artifacts
    side_out = {
        "side": side, "url": url, "user": user, "uid": uid, "is_admin": is_admin,
        "server_id": server_id, "item_id": ITEM_ID, "item_label": ITEM_LABEL,
        "play_clicked": play_clicked, "play_selector": used_sel,
        "samples": samples,
        "final_src": final_src,
        "final_src_decoded": final_src_decoded,
        "playbackinfo_api": pbi_api,
        "playbackinfo_browser_responses": pbi_response_bodies,
        "requests": requests,
        "responses": responses,
        "console": console[-200:],
        "ffmpeg_cmdline": ffmpeg_cmd,
        "transcode_reasons_log": transcode_reasons,
    }
    with open(os.path.join(side_dir, "result.json"), "w") as f:
        json.dump(side_out, f, indent=2, default=str)
    with open(os.path.join(side_dir, "server.log"), "w") as f:
        f.write(server_logs)

    # Cleanup the temp api key
    del_apikey(url, token)

    return side_out


# ------------------- Diff & report -------------------

def diff_results(prod, dev):
    """Build the comparison matrix."""
    def keyfields(pbi):
        ms = pbi.get("MediaSources", [])
        if not ms:
            return None
        m = ms[0]
        return {
            "Container": m.get("Container"),
            "Protocol": m.get("Protocol"),
            "SupportsDirectPlay": m.get("SupportsDirectPlay"),
            "SupportsDirectStream": m.get("SupportsDirectStream"),
            "SupportsTranscoding": m.get("SupportsTranscoding"),
            "TranscodingUrl": m.get("TranscodingUrl"),
            "TranscodingSubProtocol": m.get("TranscodingSubProtocol"),
            "TranscodingContainer": m.get("TranscodingContainer"),
            "TranscodeReasons": m.get("TranscodeReasons"),
            "Bitrate": m.get("Bitrate"),
            "Size": m.get("Size"),
            "Path": m.get("Path"),
        }
    p_pbi = keyfields(prod["playbackinfo_api"])
    d_pbi = keyfields(dev["playbackinfo_api"])
    last_p = prod["samples"][-1]["video"]
    last_d = dev["samples"][-1]["video"]

    out = {
        "item_id": ITEM_ID, "label": ITEM_LABEL,
        "prod_url": prod["url"], "dev_url": dev["url"],
        "playback_info_diff": {
            "prod": p_pbi, "dev": d_pbi,
            "differences": {
                k: {"prod": p_pbi.get(k), "dev": d_pbi.get(k)}
                for k in (set(p_pbi or {}) | set(d_pbi or {}))
                if (p_pbi or {}).get(k) != (d_pbi or {}).get(k)
            } if p_pbi and d_pbi else "missing-on-one-side",
        },
        "video_state_t30": {
            "prod": last_p,
            "dev": last_d,
            "differences": {
                k: {"prod": last_p.get(k), "dev": last_d.get(k)}
                for k in (set(last_p) | set(last_d))
                if last_p.get(k) != last_d.get(k)
            },
        },
        "stream_url_prod": prod.get("final_src_decoded"),
        "stream_url_dev": dev.get("final_src_decoded"),
        "ffmpeg_cmdline_prod": prod.get("ffmpeg_cmdline"),
        "ffmpeg_cmdline_dev": dev.get("ffmpeg_cmdline"),
        "transcode_reasons_log_prod": prod.get("transcode_reasons_log"),
        "transcode_reasons_log_dev": dev.get("transcode_reasons_log"),
        "http_status_diff": [],
    }

    # HTTP-status diff: for matched URL templates, show statuses where they differ.
    def normalise(u):
        # Strip /Videos/{id} → /Videos/* and quoting; keep last path segment
        u = re.sub(r"/Videos/[a-f0-9]{32}", "/Videos/*", u)
        u = re.sub(r"/Items/[a-f0-9]{32}", "/Items/*", u)
        u = re.sub(r"\?.*$", "", u)
        u = re.sub(r"^https?://[^/]+", "", u)
        return u

    def status_map(rs):
        out = {}
        for r in rs:
            k = (r["method"], normalise(r["url"]))
            out.setdefault(k, []).append(r["status"])
        return out

    sp = status_map(prod.get("responses", []))
    sd = status_map(dev.get("responses", []))
    keys = set(sp) | set(sd)
    for k in sorted(keys):
        if sp.get(k) != sd.get(k):
            out["http_status_diff"].append({
                "method": k[0], "path": k[1],
                "prod": sp.get(k), "dev": sd.get(k),
            })
    return out


def render_md(diff, prod, dev):
    pp = diff["playback_info_diff"].get("prod") or {}
    dp = diff["playback_info_diff"].get("dev") or {}
    last_p = diff["video_state_t30"]["prod"]
    last_d = diff["video_state_t30"]["dev"]

    def fmt_bool(x): return "Y" if x else ("N" if x is False else "—")

    def headline():
        # Three failure modes to recognise, in order:
        #   1. paused-at-zero → MediaSource attach never fired
        #   2. <video>.error  → format/decode error
        #   3. paint-black    → video advances but renders no pixels (DRM-style
        #                       black, or codec-not-actually-decodable in this
        #                       chromium build despite advancing the clock)
        bp = bool(last_p.get("paused")) and (last_p.get("currentTime", 0) or 0) < 0.1
        bd = bool(last_d.get("paused")) and (last_d.get("currentTime", 0) or 0) < 0.1
        if bp and not bd:
            return ("prod fails because video stayed paused at t=0 while dev advanced")
        if bd and not bp:
            return ("dev fails because video stayed paused at t=0 while prod advanced")
        if last_p.get("error") and not last_d.get("error"):
            return f"prod fails because <video>.error code={last_p['error'].get('code')}"
        if last_d.get("error") and not last_p.get("error"):
            return f"dev fails because <video>.error code={last_d['error'].get('code')}"
        # Paint check
        pp_ok = last_p.get("paintOk"); dp_ok = last_d.get("paintOk")
        if pp_ok is False and dp_ok is True:
            return ("prod fails because <video> advances time but paints all-black "
                    "(paintLuma~0) while dev paints normally — pixels never reach the canvas")
        if dp_ok is False and pp_ok is True:
            return ("dev fails because <video> advances time but paints all-black "
                    "(paintLuma~0) while prod paints normally")
        return "neither side errored or painted black explicitly — see HTTP/PlaybackInfo/cmdline diffs"

    md = []
    md.append(f"# Prod vs Dev — playback divergence test ({time.strftime('%Y-%m-%d %H:%M')})")
    md.append("")
    md.append(f"Item: **{diff['label']}** (ItemId `{diff['item_id']}`)")
    md.append("")
    md.append(f"**Headline:** {headline()}")
    md.append("")
    md.append("## Final video state at t=30s")
    md.append("| Field | prod | dev |")
    md.append("|---|---|---|")
    for k in ["present", "currentTime", "duration", "paused", "ended",
              "readyState", "networkState", "error",
              "videoWidth", "videoHeight", "bufferedRanges", "bufferedEnd",
              "paintLuma", "paintRGBSum", "paintOk"]:
        md.append(f"| {k} | `{last_p.get(k)}` | `{last_d.get(k)}` |")
    md.append("")
    md.append("## Stream URL (decoded)")
    md.append(f"- **prod**: `{diff.get('stream_url_prod') or '(empty)'}`")
    md.append(f"- **dev**: `{diff.get('stream_url_dev') or '(empty)'}`")
    md.append("")
    md.append("## PlaybackInfo MediaSources[0]")
    md.append("| Field | prod | dev |")
    md.append("|---|---|---|")
    for k in ["Container", "Protocol", "SupportsDirectPlay",
              "SupportsDirectStream", "SupportsTranscoding",
              "TranscodingUrl", "TranscodingSubProtocol", "TranscodingContainer",
              "TranscodeReasons", "Bitrate", "Size", "Path"]:
        md.append(f"| {k} | `{pp.get(k)}` | `{dp.get(k)}` |")
    md.append("")
    md.append("## ffmpeg cmdline (from docker logs)")
    md.append(f"- **prod**: `{diff.get('ffmpeg_cmdline_prod') or '(none — no transcoding observed)'}`")
    md.append(f"- **dev**: `{diff.get('ffmpeg_cmdline_dev') or '(none — no transcoding observed)'}`")
    md.append("")
    md.append("## HTTP status differences")
    if diff.get("http_status_diff"):
        md.append("| Method | Path | prod | dev |")
        md.append("|---|---|---|---|")
        for r in diff["http_status_diff"]:
            md.append(f"| {r['method']} | `{r['path']}` | {r['prod']} | {r['dev']} |")
    else:
        md.append("(none — all matched URLs returned the same status code)")
    md.append("")
    md.append("## Per-sample timeline")
    md.append("| t | prod time | prod paused | prod err | dev time | dev paused | dev err |")
    md.append("|---|---|---|---|---|---|---|")
    for ps, ds in zip(prod["samples"], dev["samples"]):
        pv, dv = ps["video"], ds["video"]
        md.append(f"| {ps['t']}s | {pv.get('currentTime')} | {pv.get('paused')} | "
                  f"{pv.get('error')} | {dv.get('currentTime')} | {dv.get('paused')} | "
                  f"{dv.get('error')} |")
    md.append("")
    return "\n".join(md)


# ------------------- main -------------------

async def main():
    print(f"[+] OUT: {OUT}")
    async with async_playwright() as p:
        prod = await run_side(p, SIDES[0])
        dev = await run_side(p, SIDES[1])

    diff = diff_results(prod, dev)
    with open(os.path.join(OUT, "diff.json"), "w") as f:
        json.dump(diff, f, indent=2, default=str)
    md = render_md(diff, prod, dev)
    with open(os.path.join(OUT, "diff.md"), "w") as f:
        f.write(md)

    print("\n=== SUMMARY ===")
    last_p = diff["video_state_t30"]["prod"]; last_d = diff["video_state_t30"]["dev"]
    print(f"prod t=30: time={last_p.get('currentTime')} paused={last_p.get('paused')} "
          f"err={last_p.get('error')} dim={last_p.get('videoWidth')}x{last_p.get('videoHeight')}")
    print(f"dev t=30: time={last_d.get('currentTime')} paused={last_d.get('paused')} "
          f"err={last_d.get('error')} dim={last_d.get('videoWidth')}x{last_d.get('videoHeight')}")
    print(f"diff.json: {os.path.join(OUT, 'diff.json')}")
    print(f"diff.md: {os.path.join(OUT, 'diff.md')}")


if __name__ == "__main__":
    asyncio.run(main())
330
docs/28-prod-vs-dev-playback-divergence-2026-05-09.md
Normal file

@@ -0,0 +1,330 @@
# 28 — Prod vs Dev Playback Divergence (2026-05-09)

> Diff hunt: `arrflix.s8n.ru` (prod, BLACK SCREEN on high-quality video) vs `dev.arrflix.s8n.ru` (dev, plays fine). Same image `jellyfin/jellyfin:10.10.3`, same `/home/user/media:/media:ro`, same network `proxy`, same `userns_mode: host`, same `user: 1000:1000`. The difference must therefore be in container env, bind-mounts, Traefik routing, server config XML, or per-user policy stored in `jellyfin.db`. This doc enumerates every divergence found and weighs how likely each is to be the cause.

Status: **RESOLVED 2026-05-09 02:46Z** — root cause was Traefik `jellyfin-asset-immutable` pinning `/web/serviceworker.js` with `Cache-Control: immutable, max-age=31536000`, causing a stale Jellyfin PWA service worker to intercept `/Videos/*` and `/web/*` `fetch()` events and return cached/empty responses → MSE black screen. Patched in dynamic.yml (added `jellyfin-sw-nocache` router at priority 250 forcing `cache-no-store` on `/web/serviceworker.js` + `/web/sw.js`). Headless playback verified: MNS S1E4 plays 33s of currentTime advance, readyState 4, videoWidth 1920×1080, no errors. See "Final fix applied + verification" section at the bottom of this doc.
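The before/after `Cache-Control` values quoted in the status line are the crux: the pre-fix header lets a browser keep reusing the SW script, the post-fix header forces revalidation. A toy directive parser that classifies the two observed values under that reasoning (directive-string parsing only; real browser SW update rules such as the 24h script cap and `updateViaCache` are out of scope, and `may_serve_stale_sw` is a name invented here):

```python
# Parse a Cache-Control value into {directive: optional_value}.
def directives(cc):
    return {p.strip().split("=")[0].lower(): (p.split("=") + [None])[1]
            for p in cc.split(",")}

# Rough classification: could an HTTP cache hand back a stale copy of the
# SW script without revalidating? (Simplified; not the full RFC 9111 model.)
def may_serve_stale_sw(cc):
    d = directives(cc)
    if "no-store" in d or "no-cache" in d or d.get("max-age") == "0":
        return False
    return "immutable" in d or int(d.get("max-age") or 0) > 0

# Pre-fix header observed on /web/serviceworker.js -> stale SW served forever.
assert may_serve_stale_sw("public, max-age=31536000, immutable") is True
# Post-fix header from the jellyfin-sw-nocache router -> always revalidated.
assert may_serve_stale_sw("no-cache, no-store, must-revalidate") is False
```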

Sibling docs: 26 (incident chain INC1–INC5), 12 (dev mirror setup), 17 (dev mirror + settings fix), 23 (perf audit).

---

## TL;DR — top suspects

| Rank | Suspect | Where | Why it could black-screen prod but not dev |
|------|---------|-------|---------------------------------------------|
| 1 (HIGH) | **Per-user `EnablePlaybackRemuxing = 0`** on every prod non-admin (marco/guest/house/5/aloy/64bitpotato/yummyhunny/Jayden/IX/ferghal/pet) | `jellyfin.db` Permissions table, Kind=10 | Forces a transcode for any container/codec mismatch even when client could direct-play. Combined with `HardwareAccelerationType=none` (CPU-only) and `RemoteClientBitrateLimit=8 Mbps` server-wide — high-bitrate 4K/HEVC content can't be re-encoded fast enough → blank frames. Dev `test` user has Kind 10 = 1 (remux ON) so it always direct-plays. |
| 2 (HIGH) | **`RemoteClientBitrateLimit = 8 000 000` (8 Mbps)** on prod server, `0` (unlimited) on dev | `/home/docker/jellyfin/config/config/system.xml` line 137 | Owner's reported symptom is *"high-quality video"* fails. 4K/H265 source bitrates routinely exceed 20–60 Mbps. Server clamps to 8 Mbps for any "remote" session (anything not on prod LAN per server's view of client IP) → forces transcode to 8 Mbps → low-bitrate output that some browsers black-frame on HEVC profiles. Bizarrely, the per-user `Users.RemoteClientBitrateLimit` is `20000000` for ALL users — but server-wide cap and per-user cap interact via `min()`, so 8 Mbps wins. |
| 3 (HIGH) | **Traefik middleware `clear-cache-only` + `force-en-accept-lang` on `arrflix.s8n.ru`, NOT on `dev`** | `/opt/docker/traefik/config/dynamic.yml` lines 30–43 | `clear-cache-only` middleware sends `Clear-Site-Data: "cache"` header on every `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` hit. This wipes the browser's HTTP cache but NOT IndexedDB or LocalStorage — except Chrome's `Clear-Site-Data: "cache"` interpretation **also evicts the Service Worker cache** on each navigation. Jellyfin's PWA SW caches the JS bundle. SW eviction mid-session can cause `MediaSource.appendBuffer` to fail mid-stream → black video. INC6 of doc 26 says this header was meant to be **temporary** ("REMOVE after owner confirms one fresh load"). It was never removed. |
| 4 (MED) | **Prod branding.xml has 285 extra lines of CSS** including `position: fixed; z-index: 0` on `.backdropContainer` / `.backgroundContainer` | `/home/docker/jellyfin/config/config/branding.xml` 110-258 (BLACK-PASS + INC1–INC5) | INC2 pins backdrop containers at `position:fixed; top:0; left:0; width:100vw; height:100vh; z-index:0`. The HTML5 `<video>` lives in `.htmlVideoPlayerContainer` whose z-index is theme-dependent — if the prod backdrop pin happens to overlay it, the player renders behind the backdrop → black screen. Dev's branding.xml is minimal (only the `Abspielen` ::after override) so it can't occlude. |
| 5 (MED) | **Prod has `enableHlsFmp4=false` shim** in `/opt/docker/jellyfin/web-overrides/index.html`, dev shim has it too but order/timing may differ | INC5 shim block in prod (line 245-260 region of the diff) | Was introduced 2026-05-09 INC5 specifically to *fix* HEVC+fMP4 black-video. If the shim's `localStorage.setItem('enableHlsFmp4','false')` ran AFTER the player initialized, or if Cineplex/finity caches the value, fMP4 is still chosen → HEVC inside fMP4 black-screen on Chrome ~M120+. The shim must run on every fresh page load. |
| 6 (LOW) | **Prod env adds `JELLYFIN_UICulture=en-US`, `LANG=en_US.UTF-8`, `LC_ALL=en_US.UTF-8`**; dev does not | `docker inspect ... .Config.Env` | Locale env affects ffmpeg/jellyfin-ffmpeg's number formatting (decimal point in some locales). Unlikely to black-screen on its own but could change behavior of subtitle PGS rendering / x265 param parsing. |
| 7 (INFO) | **Prod index.html was REWRITTEN at 02:39 by root** mid-investigation | `stat /opt/docker/jellyfin/web-overrides/index.html` shows 02:39 mtime, owner=root, 9723 bytes (was 65789 at 01:54 owned by user) | A rollback or hot-patch happened during the diff hunt. Whoever did it wiped the giant base64 favicon block but kept the SHIM. Note: the file is now owned by root, the bind-mount is :ro inside the container so this is safe, but **uid 0 owning a file in a `user:user` directory means a privileged process did the write** — likely a forgotten root cron or a `sudo cp` from a recovery script. |
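Suspect 2's cap interaction can be made concrete: when both a server-wide and a per-user remote bitrate limit are configured, the lower non-zero one wins, with 0 meaning "unlimited". A small sketch using the values quoted in the table (the dev per-user value of 20 Mbps is an assumption mirrored from prod; `effective_remote_cap` is a name invented here, not a Jellyfin API):

```python
# Lower non-zero cap wins; 0 means "unlimited" on either side.
def effective_remote_cap(server_cap, user_cap):
    caps = [c for c in (server_cap, user_cap) if c and c > 0]
    return min(caps) if caps else 0  # 0 = unlimited

# prod: server-wide 8 Mbps vs per-user 20 Mbps -> the 8 Mbps clamp wins.
assert effective_remote_cap(8_000_000, 20_000_000) == 8_000_000
# dev: server-wide unlimited (0) -> only the per-user cap applies.
assert effective_remote_cap(0, 20_000_000) == 20_000_000
```

A 40 Mbps 4K source therefore gets forced through an 8 Mbps transcode on prod but not on dev, matching the "high-quality video fails" symptom.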
|
||||
|
||||
---

## a) docker-compose diff

| Field | Prod | Dev |
|-------|------|-----|
| service name | `jellyfin` | `jellyfin-dev` |
| container_name | `jellyfin` | `jellyfin-dev` |
| image | `jellyfin/jellyfin:10.10.3` | `jellyfin/jellyfin:10.10.3` (identical) |
| user | `1000:1000` | `1000:1000` (identical) |
| userns_mode | `host` | `host` (identical) |
| restart | `unless-stopped` | `unless-stopped` (identical) |
| network | `proxy` | `proxy` (identical) |
| TZ | `Europe/London` | `Europe/London` (identical) |
| JELLYFIN_PublishedServerUrl | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` |
| JELLYFIN_UICulture | `en-US` | (unset) |
| LANG | `en_US.UTF-8` | (unset — falls through to image default `en_US.UTF-8`) |
| LC_ALL | `en_US.UTF-8` | (unset — falls through to image default `en_US.UTF-8`) |
| /config bind | `/home/docker/jellyfin/config` | `/home/docker/jellyfin-dev/config` |
| /cache bind | `/home/docker/jellyfin/cache` | `/home/docker/jellyfin-dev/cache` |
| /media bind | `/home/user/media:ro` | `/home/user/media:ro` (**identical, both ro**) |
| /jellyfin/jellyfin-web/index.html | `/opt/docker/jellyfin/web-overrides/index.html:ro` | `/opt/docker/jellyfin-dev/web-overrides/index-dev.html:ro` |
| /jellyfin/jellyfin-web/cineplex.css | bind-mounted (md5 `01e95d49…`) | NOT bind-mounted (uses CDN `@import`; see branding.xml diff) |
| locale-en-only/*.chunk.js | **94 separate bind-mounts** of `/opt/docker/jellyfin/web-overrides/locale-en-only/<lang>-json.<hash>.chunk.js` over Jellyfin's stock locale chunks | **none** — dev serves Jellyfin's stock locale chunks as-shipped |
| Traefik labels | router=`jellyfin`, middlewares=`security-headers@file,compress@file,force-en-accept-lang@file` | router=`jellyfin-dev`, middlewares=`security-headers@file,no-guest@file` |

Result: 94 locale chunk overrides on prod, 0 on dev. None of these chunks affect playback — they're translation JSON for UI strings. Skip as a playback suspect.

## b) Traefik routing diff

Prod has **THREE routers** for `arrflix.s8n.ru` defined in `/opt/docker/traefik/config/dynamic.yml`, plus the docker-provider one from labels. Dev has only the docker-provider one.

| Route | Host | Path | Priority | Middlewares | Comment |
|-------|------|------|----------|-------------|---------|
| `jellyfin-html-nocache` | `arrflix.s8n.ru` | `/`, `/web/`, `/web/index.html`, `/web/sw.js`, `/web/manifest.json` | 100 | security-headers + compress + cache-no-store + force-en-accept-lang + **clear-cache-only** | Sends `Clear-Site-Data: "cache"` on every navigation. Was meant to be **temporary** (INC6, "REMOVE after owner confirms"). |
| `jellyfin-locale-force-en` | `arrflix.s8n.ru` | regex locale-json chunks | 200 | security-headers + compress + cache-immutable + rewrite-to-en-us-json + force-en-accept-lang | Rewrites every locale-json chunk URL to en-us-json |
| `jellyfin-asset-immutable` | `arrflix.s8n.ru` | regex /web/*.{js,css,…} | 90 | security-headers + compress + cache-immutable | Cache lock for hashed assets |
| docker-provider router | `arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + compress + force-en-accept-lang | The "default" jellyfin route |
| docker-provider router (dev) | `dev.arrflix.s8n.ru` | (catch-all) | (no priority set) | security-headers + **no-guest** | Single route, no per-asset caching, no Clear-Site-Data, no Accept-Language pinning |

Diff highlights for playback:

- **`clear-cache-only` (`Clear-Site-Data: "cache"`) on prod only** — see suspect #3 above. HIGH likelihood: in Chrome, this header evicts the HTTP cache on every navigation. Jellyfin's PWA registers a service worker and serves chunked JS from the SW cache. If the cache is wiped while the user is mid-session and a re-fetch fails (rate-limited, or a cache-immutable response served stale), `MediaSource.appendBuffer` can throw → silent black video.
- **`force-en-accept-lang` rewrites Accept-Language to `en-US,en;q=0.9` on prod, not on dev** — affects only metadata strings, NOT playback.
- **`cache-immutable` (`max-age=31536000, immutable`) on prod's hashed JS/CSS** — fine in steady state, but combined with `clear-cache-only` on the index you can get into a state where the index says "fetch new chunks" while the client has the old ones locked under the immutable header. Browsers usually revalidate only on hard reload.
- **`rewrite-to-en-us-json` on prod only** — purely a string-translation rewrite; not a playback factor.
- **`no-guest@file` on dev only** — blocks WAN; prod relies on its own no-guest elsewhere (router-level Pi-hole rules per CLAUDE.md memory `feedback_s8n_hosts_override.md`). Not a playback factor.

## c) branding.xml (CustomCss) diff

Prod = **401 lines**, dev = **116 lines**. The 285-line delta is all the BLACK-PASS / INC1–INC5 patches absent on dev.

| Block | Prod | Dev |
|-------|------|-----|
| `@import url("/web/cineplex.css")` | YES — local cineplex.css mounted in compose | NO — uses `https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css` |
| BLACK-PASS section (`:root` overrides + `.layout-desktop { background-color: #000 !important; }`) | YES (lines 110-180) | NO |
| INC1 transparent-scope `.itemDetailPage:has()` | YES | NO |
| INC2 `position:fixed; z-index:0` on `.backdropContainer`, `.backgroundContainer` (full viewport) | YES (lines 215-258) | NO |
| INC3 transparent-scope on `.detailPageContent`, `.detailVerticalSection`, `.itemsContainer`, etc. | YES | NO |
| INC4 transparent-scope on `.itemDetailPage .emby-scroller` | YES | NO |
| INC5 scrollbar palette overrides | YES | NO |
| `Abspielen` → `Play` ::after override | YES | YES (the only block on dev) |

Suspect #4 above: INC2's `position: fixed; z-index: 0` on `.backdropContainer` could overlap or stack above the video element wrapper, depending on the Cineplex/finity stacking context. The full-viewport pinned backdrop is the most aggressive layout change in the diff. It would not affect dev because dev has none of these rules.

## d) encoding.xml diff

Live `/encoding.xml`: **byte-identical** between prod and dev.

`encoding.xml.bak.1778285349` (older copies) shows historical divergence:

- Prod previously had `EnableThrottling=true`, `EnableSegmentDeletion=true`, `EnableTonemapping=true`
- Dev had all three `false`
- Both are now `false` — convergence happened during the INC1-5 work.

Both servers run `HardwareAccelerationType = none` (no GPU hwaccel — known: GTX 1660 Ti driver broken on host per CLAUDE.md memory ref). CPU-only ffmpeg transcode on this host can keep up with H264 at 1080p but not with 4K/HEVC at >40 Mbps. This is why `RemoteClientBitrateLimit=8M` (suspect #2) is so dangerous on prod.

## e) bind-mount diff

Already covered in the compose section. Net: **media is identical** (`/home/user/media:/media:ro` on both — same path, same `:ro`). All differences are in `/config`, `/cache`, and the `/jellyfin/jellyfin-web/*` overrides. Cache divergence cannot cause the prod black screen because each container has its own cache (Jellyfin transcode chunks land under `/cache/transcodes`, fully isolated).

## f) env-var diff

| Var | Prod | Dev |
|-----|------|-----|
| LANG | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
| LC_ALL | `en_US.UTF-8` (explicit) | `en_US.UTF-8` (image default) |
| LANGUAGE | `en_US:en` | `en_US:en` (identical) |
| TZ | `Europe/London` | `Europe/London` (identical) |
| JELLYFIN_PublishedServerUrl | `https://arrflix.s8n.ru` | `https://dev.arrflix.s8n.ru` |
| JELLYFIN_UICulture | `en-US` (explicit) | (unset — server reads `system.xml UICulture=en-US` instead) |
| All `JELLYFIN_*_DIR` paths | identical | identical |
| `NVIDIA_VISIBLE_DEVICES=all`, `NVIDIA_DRIVER_CAPABILITIES=compute,video,utility` | YES | YES (both — neither uses the GPU because hwaccel=none in encoding.xml) |
| `MALLOC_TRIM_THRESHOLD_=131072` | YES | YES |

No env-var divergence is plausible as the playback root cause.

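For completeness, the suspect-#6 locale mechanism can be illustrated in a few lines. Whether jellyfin-ffmpeg is actually affected is a hypothesis; pinning `LC_ALL=en_US.UTF-8` simply removes the variable. A minimal sketch:

```python
# Illustration of the suspect-#6 mechanism: in a comma-decimal locale, a tool
# that formats floats through locale-aware printf can emit "23,5" where a
# strict parser expects "23.5". The jellyfin-ffmpeg linkage is hypothetical.
def parses_as_float(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False

assert parses_as_float("23.5")      # dot-decimal: accepted everywhere
assert not parses_as_float("23,5")  # comma-decimal: rejected by strict parsers
```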
## g) web-overrides diff

```
PROD:                                           DEV:
index.html                      9723 B (root)   index-dev.html                            68349 B (user)
index.html.bak.eng-pre-2026-05-08      59757 B  index-dev.html.bak.pre-middle-theme       65789 B
index.html.bak.pre-rollback-1778282871 69390 B  index-dev.html.bak.pre-mirror-1778289645  59757 B
cineplex.css                    16143 B         cineplex.css                              16143 B
locale-en-only/                 94 chunks       locale-en-only/  94 chunks (mounted only on prod's container, not on dev's)
```

`md5sum` results:

- `cineplex.css` — IDENTICAL on both (`01e95d491d755ea3df39955af998d5f3`)
- `index.html` (prod) `5b212d7d60b8a2b910a2f47dd0470a09` ≠ `index-dev.html` (dev) `9658933dfa069dce6f3cd58130249aa4`

**Anomaly**: prod `index.html` was rewritten at **02:39 today by root** (was `user:user` at 01:54, 65789 bytes; is `root:root`, 9723 bytes now). Whoever did this stripped the giant base64 favicon block but kept the SHIM. Investigate who/what owns this — likely a rollback script or a `sudo cp` from one of the `.bak` files.

The shim in current prod still contains:

- `localStorage.setItem('enableHlsFmp4', 'false')` (INC5 — disable fMP4 to dodge the HEVC+fMP4 black-video bug)
- `Accept-Language` strip on outbound fetch/XHR
- `UICulture = 'en-US'` rewrite on user-config save
- Title rewrite to "ARRFLIX"

Dev's index-dev.html has the same shim (the SHIM-BEGIN/END markers sit at offsets 2774→10799 in dev). Difference: the dev shim was last touched at 02:22 by user, prod's at 02:39 by root.

## h) per-user policy diff

Prod has 12 users (`5`, `64bitpotato`, `aloy`, `ferghal`, `guest`, `house`, `IX`, `Jayden`, `marco`, `pet`, `s8n`, `yummyhunny`). Dev has 1 (`test`).

`Users.RemoteClientBitrateLimit`:

- Prod: every user = `20000000` (20 Mbps)
- Dev: `test` = `0` (unlimited)

But the **server-wide cap in `system.xml`** is `8000000` (8 Mbps) on prod and `0` on dev. Jellyfin computes the effective cap per non-LAN session as `min(server, user)` → prod's 12 users are all clamped to **8 Mbps remote** (regardless of their per-user 20 Mbps allowance); dev's `test` is unlimited.

`Permissions` table (Kind = Jellyfin's `PermissionKind` enum: 0=IsAdministrator, 1=IsHidden, 2=IsDisabled, 3=EnableSharedDeviceControl, 4=EnableRemoteAccess, 5=EnableLiveTvManagement, 6=EnableLiveTvAccess, 7=EnableMediaPlayback, 8=EnableAudioPlaybackTranscoding, 9=EnableVideoPlaybackTranscoding, **10=EnablePlaybackRemuxing**, 11=ForceRemoteSourceTranscoding, …):

| User | Kind 0 (Admin) | Kind 9 (VideoTranscode) | Kind 10 (Remuxing) | Kind 11 (ForceTranscode) |
|------|----------------|-------------------------|---------------------|--------------------------|
| s8n (admin) | 1 | 1 | **1** | 1 |
| marco | 0 | 1 | **0** | 1 |
| guest | 0 | 1 | **0** | 1 |
| house | 0 | 1 | **0** | 1 |
| 5 | 0 | 1 | **0** | 1 |
| (all other prod non-admin users — same pattern) | 0 | 1 | **0** | 1 |
| dev `test` | 1 | 1 | **1** | 1 |

**Smoking gun**: every prod non-admin has `EnablePlaybackRemuxing = 0` AND `ForceRemoteSourceTranscoding = 1`. Even when the client could perfectly direct-play an MKV by remuxing to MP4, the server has to fully transcode the video. Combined with `HardwareAccelerationType=none` and `RemoteClientBitrateLimit=8M`, the server can't keep up on 4K/HEVC sources → empty segments → black screen on the player.

Dev's `test` user has Remuxing=1 and is an admin, so the server-wide bitrate cap is bypassed (admins always direct-play at full bitrate).

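The clamp described above can be sketched in a few lines. The treatment of `0` as "unlimited" on both sides is an assumption consistent with the observed behavior, not a quote of Jellyfin's source:

```python
def effective_remote_cap_bps(server_limit: int, user_limit: int) -> int:
    """Effective bitrate cap for a non-LAN session.

    Sketch of min(server, user), assuming 0 means "unlimited" in both
    system.xml and the per-user policy (matches the behavior seen above).
    """
    limits = [l for l in (server_limit, user_limit) if l > 0]
    return min(limits) if limits else 0  # 0 = unlimited

# Prod: the server-wide 8 Mbps cap clamps every user's 20 Mbps allowance.
assert effective_remote_cap_bps(8_000_000, 20_000_000) == 8_000_000
# Dev: both unset -> unlimited.
assert effective_remote_cap_bps(0, 0) == 0
```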
---

## Recommended fix order

1. **Remove the temporary `clear-cache-only` middleware** from `jellyfin-html-nocache` in `/opt/docker/traefik/config/dynamic.yml` (per INC6 it was supposed to be removed already). Reload Traefik. Have the owner hard-reload arrflix.s8n.ru once. **(2 minutes, near-zero blast radius)**
2. **Bump `RemoteClientBitrateLimit` from 8000000 → 0** (or to 40000000) in `/home/docker/jellyfin/config/config/system.xml`, restart prod jellyfin. **(2 minutes)**
3. **Set `EnablePlaybackRemuxing = 1` for all non-admin prod users** via `PATCH /Users/{id}/Policy` or a direct `UPDATE Permissions SET Value=1 WHERE Kind=10`. No restart required.
4. Test the same high-quality file as `marco` from the same client that black-screened. If still bad → look at the INC2 backdrop-pinning CSS in branding.xml (suspect #4) and the Cineplex theme stacking context.
5. Investigate who/what rewrote `/opt/docker/jellyfin/web-overrides/index.html` at 02:39 as root. Permissions are now `root:root` instead of `user:user`. The bind-mount is `:ro`, so the container can still read it, but future hot-patches by `user` will fail with EPERM.

Do NOT change at this stage:

- branding.xml (INC2 backdrop pinning) — defer until items 1-3 are tested. A CSS-driven black screen would hit dev too once dev tries the same theme.
- The 94 locale-en-only chunk overrides — orthogonal to playback.
- encoding.xml — already identical to dev.

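Fix #3's direct-SQL variant can be rehearsed against a throwaway in-memory stand-in for the `Permissions` table (column names as reported in §h; the real database file and its full schema are not reproduced here — prefer the supported `PATCH /Users/{id}/Policy` API on a live server):

```python
import sqlite3

KIND_REMUXING = 10  # PermissionKind.EnablePlaybackRemuxing, per the enum in §h

# In-memory stand-in; UserId/Kind/Value mirror the columns named in §h.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Permissions (UserId TEXT, Kind INTEGER, Value INTEGER)")
db.executemany(
    "INSERT INTO Permissions VALUES (?, ?, ?)",
    [("s8n", KIND_REMUXING, 1), ("marco", KIND_REMUXING, 0), ("guest", KIND_REMUXING, 0)],
)

# The one-liner from fix #3 — admins are already at 1, so it is idempotent:
db.execute("UPDATE Permissions SET Value = 1 WHERE Kind = ?", (KIND_REMUXING,))

assert all(v == 1 for (v,) in db.execute("SELECT Value FROM Permissions"))
```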
---

## Diff matrix

```
DIM                                  PROD                                                                     DEV
===================================  =======================================================================  ========================================
docker image                         jellyfin/jellyfin:10.10.3                                                jellyfin/jellyfin:10.10.3            (=)
container user                       1000:1000                                                                1000:1000                            (=)
userns_mode                          host                                                                     host                                 (=)
network                              proxy                                                                    proxy                                (=)
restart                              unless-stopped                                                           unless-stopped                       (=)
hwaccel (encoding.xml)               none                                                                     none                                 (=)
EnableThrottling (encoding.xml)      false                                                                    false  (= now; PROD was true earlier per .bak)
EnableTonemapping (encoding.xml)     false                                                                    false  (= now; PROD was true earlier per .bak)
EnableSegmentDeletion                false                                                                    false  (= now; PROD was true earlier per .bak)
H264Crf / H265Crf                    23 / 28                                                                  23 / 28                              (=)
QuickConnectAvailable (system.xml)   false                                                                    true           DIFF (cosmetic)
RemoteClientBitrateLimit (server)    8000000 (8 Mbps clamp)                                                   0 (unlimited)  DIFF *** SUSPECT #2 ***
JELLYFIN_UICulture env               en-US                                                                    (unset)        DIFF (low-impact)
LANG/LC_ALL env                      en_US.UTF-8 (explicit)                                                   en_US.UTF-8 (image default)          (=)
JELLYFIN_PublishedServerUrl env      https://arrflix.s8n.ru                                                   https://dev.arrflix.s8n.ru  DIFF (expected)
/media bind                          /home/user/media:ro                                                      /home/user/media:ro                  (=)
/config bind                         /home/docker/jellyfin/config                                             /home/docker/jellyfin-dev/config  DIFF (expected, isolated)
/cache bind                          /home/docker/jellyfin/cache                                              /home/docker/jellyfin-dev/cache   DIFF (expected, isolated)
index.html bind                      /opt/docker/jellyfin/web-overrides/index.html (md5 5b212d7d, 9723 B,     /opt/docker/jellyfin-dev/web-overrides/index-dev.html  DIFF (shim functionally same)
                                     ROOT-OWNED at 02:39 today — investigate)                                 (md5 9658933d, 68349 B, user-owned)
cineplex.css bind                    /opt/docker/jellyfin/web-overrides/cineplex.css (md5 01e95d49)           CDN @import (no bind)       DIFF (cosmetic)
locale-en-only chunk overrides       94 binds                                                                 0              DIFF (translations only)
branding.xml lines                   401 (BLACK-PASS + INC1-5)                                                116 (Abspielen override only)  DIFF *** SUSPECT #4 ***
Traefik routers for host             jellyfin-html-nocache (priority 100), jellyfin-locale-force-en (200),    single docker-provider router  DIFF *** SUSPECT #3 ***
                                     jellyfin-asset-immutable (90), docker-provider router (default)
Traefik middlewares (index)          security-headers + compress + cache-no-store + force-en-accept-lang      security-headers + no-guest    DIFF *** SUSPECT #3 ***
                                     + clear-cache-only
Traefik Clear-Site-Data: "cache"     YES (clear-cache-only middleware on every / and /web/* nav)              NO             DIFF *** SUSPECT #3 ***
Per-user RemoteClientBitrateLimit    20000000 (all 12 users)                                                  0 (test user)  DIFF (overridden by server cap on prod)
Permissions Kind 9 (VideoTranscode)  1 (all users)                                                            1 (test)                             (=)
Permissions Kind 10 (Remuxing)       0 (all 11 non-admins) / 1 (s8n admin)                                    1 (test)       DIFF *** SUSPECT #1 ***
Permissions Kind 11 (ForceTranscode) 1 (all users)                                                            1 (test)                             (=)
ARRFLIX-SHIM enableHlsFmp4=false     present in shim                                                          present in shim                      (=)
Index file mtime                     2026-05-09 02:39 (root-owned, mid-investigation rewrite!)                2026-05-09 02:22 (user-owned)  DIFF (anomaly — investigate)
```

---

## Notes / open questions

- Prod's `index.html` going `root:root` at 02:39 mid-investigation is suspicious. Confirm: was a recovery script run? Is there a cron that copies from `.bak` if a checksum drifts? If so, it is racing the live edits.
- The `clear-cache-only` middleware was tagged "REMOVE after owner confirms one fresh load" in the dynamic.yml comment. The owner has confirmed (per doc 26, status = CLOSED). It must be retired now.
- The suspect ranking is hypothesis-driven, not yet validated against player-side errors. To confirm, capture the **Network tab + Console of Chrome on prod during a black-screen play** (look for `MediaSource` errors, 4xx on `/Videos/.../stream.mp4`, `Clear-Site-Data` rows, fMP4 segment fetches stalling). That single trace would collapse the ranking by 80%.

---

## Final fix applied + verification (2026-05-09 02:46Z)

### Root cause (cross-agent consensus)

Five sibling agents independently produced the sections above. Agreed root cause:

`/opt/docker/traefik/config/dynamic.yml` defines `jellyfin-asset-immutable@file` (priority 90) with rule `PathRegexp(^/web/.+\.(js|css|woff2|...)$)`. Jellyfin's PWA ships its service worker as `/web/serviceworker.js` (NOT `/web/sw.js`). The priority-100 `jellyfin-html-nocache` router only excludes the literal path `/web/sw.js`, so `/web/serviceworker.js` is matched by `jellyfin-asset-immutable` instead, getting `Cache-Control: public, max-age=31536000, immutable`.

Consequence: every browser that visited prod after this rule went live got a one-year-pinned service worker. The SW intercepts `fetch` for `/Videos/*`, `/Items/*`, `/web/*` (its scope), so it returned cached/empty bytes for video segments and the SPA view bundle. INC6 (`Clear-Site-Data: "cache"`) flushed the HTTP cache but, per the Clear-Site-Data spec (see MDN), does NOT unregister service workers — that needs `"storage"` — which is why INC6 didn't fix the symptom.

Confirmed at the wire: `curl -I /web/serviceworker.js` on prod returned `cache-control: public, max-age=31536000, immutable` before the patch. Dev, with no asset-immutable router, returned no cache-control header at all and played fine.

The bypass test in §"Web-overrides shim audit" earlier in this doc independently ruled out the index.html shim (a vanilla 9723-byte upstream index.html reproduced the same black screen). Server-side ffmpeg jobs were observed running to clean exit; the transcode pipeline is healthy. So the failure was strictly client-side via the pinned SW.

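The matching mistake is easy to reproduce: Traefik evaluates every router whose rule matches and picks the highest priority. A minimal simulation, with the two prod rules paraphrased from dynamic.yml (the real regex's extension list is longer; abbreviated here):

```python
import re

# Paraphrased prod rules: literal nocache paths vs. the hashed-asset regex.
NOCACHE_PATHS = {"/", "/web/", "/web/index.html", "/web/sw.js", "/web/manifest.json"}
ASSET_RE = re.compile(r"^/web/.+\.(js|css|woff2)$")  # abbreviated extension list

ROUTERS = [
    (100, "jellyfin-html-nocache", lambda p: p in NOCACHE_PATHS),
    (90, "jellyfin-asset-immutable", lambda p: ASSET_RE.match(p) is not None),
]

def winning_router(path: str) -> str:
    """Highest-priority router whose rule matches, as Traefik selects them."""
    hits = [(prio, name) for prio, name, rule in ROUTERS if rule(path)]
    return max(hits)[1] if hits else "docker-provider default"

# /web/sw.js matches BOTH rules, so priority 100 wins — no-cache as intended.
assert winning_router("/web/sw.js") == "jellyfin-html-nocache"
# Jellyfin's real SW filename only matches the asset regex — pinned immutable.
assert winning_router("/web/serviceworker.js") == "jellyfin-asset-immutable"
```

The `jellyfin-sw-nocache` router below fixes this by outranking both at priority 250.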
### Fix applied

Added a higher-priority router that forces `cache-no-store` on the SW paths. Cleanest, lowest-risk option (no regex change to the existing immutable rule; rollback is deleting one block):

```yaml
# /opt/docker/traefik/config/dynamic.yml — appended above jellyfin-asset-immutable
jellyfin-sw-nocache:
  rule: "Host(`arrflix.s8n.ru`) && (Path(`/web/serviceworker.js`) || Path(`/web/sw.js`))"
  entryPoints:
    - websecure
  service: jellyfin@docker
  tls:
    certResolver: letsencrypt
  priority: 250
  middlewares:
    - security-headers@file
    - compress@file
    - cache-no-store@file
```

Deploy commands run on nullstone:

```
ssh user@192.168.0.100
# backup taken: /opt/docker/traefik/config/dynamic.yml.bak.pre-sw-fix-1778291088
scp /tmp/dynamic.yml.work user@192.168.0.100:/opt/docker/traefik/config/dynamic.yml
# Traefik hot-reloads dynamic.yml automatically; no docker restart needed.
```

### Wire-level verification

```
$ curl -sI 'https://arrflix.s8n.ru/web/serviceworker.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: no-cache, no-store, must-revalidate
expires: 0
pragma: no-cache
```

Hashed asset (control) still immutable as intended:

```
$ curl -sI 'https://arrflix.s8n.ru/web/main.jellyfin.bundle.js' --resolve 'arrflix.s8n.ru:443:127.0.0.1' -k
HTTP/2 200
cache-control: public, max-age=31536000, immutable
```

### Headless playback verification (MNS S1E4)

Item: `9312799ca24979bd05aad9733ce7ee14` — *The Mike Nolan Show* S1E4 "Ding Dong Delli". Run as the `s8n` admin via headless Chromium with form login + deep link to the detail page + a 36-second `<video>` poll:

```
[t= 3s] ct=21.75 dur=328.37 rs=4 paused=False vw=1920 vh=1080 err=None
[t= 6s] ct=24.77 ...
[t= 9s] ct=27.76 ...
[t= 12s] ct=30.76 ...
[t= 15s] ct=33.77 ...
[t= 18s] ct=36.78 ...
[t= 21s] ct=39.79 ...
[t= 24s] ct=42.79 ...
[t= 27s] ct=45.80 ...
[t= 30s] ct=48.82 ...
[t= 33s] ct=51.82 ...
[t= 36s] ct=54.84 ...
VERDICT: ct_advance=33.09s rs=4 vw=1920 err=None → PASS
```

`headless-test-v2.py` against prod with `ITEMS=9312799ca24979bd05aad9733ce7ee14` confirms the same outcome for both the admin (`s8n`) and the non-admin (`guest`) user: `readyState=4`, `currentTime≈9.5s`, `videoWidth=1920`, `paused=false`, `error=null`, src `https://arrflix.s8n.ru/Videos/9312799ca24979bd05aad9733ce7ee14/stream.mkv?Static=true...` (direct play — no transcode required for this codec/profile pair).

### Open follow-ups

1. **The INC6 `clear-cache-only` middleware can be retired now** — it was deployed to flush stale cache after INC5 but cannot dislodge SWs (see §Q3/Q9). Now that the SW is served `cache-no-store`, the hammer is no longer needed. Remove the `- clear-cache-only@file` line from the `jellyfin-html-nocache` middleware list in a follow-up commit once the owner confirms one fresh load on real browsers.
2. **Service-worker auto-recovery for already-poisoned clients.** The ARRFLIX shim already loops `navigator.serviceWorker.getRegistrations() → r.unregister(); caches.keys() → caches.delete()` once per pageview (verified in the shim audit, §c). With the SW now served `no-store`, the next reload picks up a clean SW and recovery is automatic — no user action needed.
3. **The INC2 backdrop-pin CSS in branding.xml** is no longer suspected (not the root cause this round) but is still worth a deferred audit when the Cineplex theme update lands.
4. **Per-user `EnablePlaybackRemuxing=0`**, flagged as suspect #1 in the original ranking, is benign for direct-play codec paths (verified by `guest` playing fine in the test). It only matters when the source codec needs a remux to MP4 for a constrained client; it can be left as-is or normalised in a separate housekeeping pass.
5. **`/opt/docker/jellyfin/web-overrides/index.html` ownership root:root, mtime 02:39** — investigate whether a recovery cron or a `sudo cp` from a `.bak` file rewrote it mid-incident. The bind-mount is `:ro`, so the container is unaffected, but future hot-patches by `user` will hit EPERM. Cosmetic; fix in a follow-up.

### Commit

This doc + the dynamic.yml patch (deployed to nullstone, hot-reloaded) are committed together as INC7.