From 6614911432311cce4076e035db719b20463e689c Mon Sep 17 00:00:00 2001 From: s8n Date: Fri, 8 May 2026 04:24:21 +0100 Subject: [PATCH] doc 13: read-only optimization audit --- docs/13-optimization-audit.md | 372 ++++++++++++++++++++++++++++++++++ 1 file changed, 372 insertions(+) create mode 100644 docs/13-optimization-audit.md diff --git a/docs/13-optimization-audit.md b/docs/13-optimization-audit.md new file mode 100644 index 0000000..4d9aae1 --- /dev/null +++ b/docs/13-optimization-audit.md @@ -0,0 +1,372 @@ +# 13 — Optimization Audit (Read-Only) + +> Status: **read-only audit**, executed 2026-05-08 against +> `https://arrflix.s8n.ru` (Jellyfin 10.10.3 on nullstone). Scope: scan +> for performance, capacity, reliability, and ops-hygiene risks. **No +> fixes applied. No state mutated. No container restarts.** + +Audit ran ~25 minutes wall. Inputs: Jellyfin REST API (auth +`X-Emby-Token: 76858153…f8b1`), `docker exec jellyfin`, `docker logs +{traefik,jellyfin} --since 1h/6h/24h`, host `free`, `df`, `uptime`, +`nvidia-smi`, on-disk Jellyfin XML configs. + +--- + +## Executive summary + +1. **Host is under serious memory pressure right now.** `uptime` shows + load average **11.40 / 9.59 / 6.19** on a 12-core box, **6.8 GiB of + swap is in use** (out of 24 GiB), and `/home` is **90 % full + (40 GiB free of 399 GiB)**. Jellyfin itself is fine + (522 MiB / 31 GiB cap, no restarts), but the host it lives on is + loaded enough that any media ingest at scale will start swap-thrashing. + This is the single biggest risk to playback latency. +2. **GPU transcode is dead and confirmed dead.** `nvidia-smi` fails on + host, `lsmod | grep nvidia` returns empty, `/dev/nvidia*` does not + exist. `EnableHardwareEncoding=true` and `HardwareAccelerationType=none` + in `encoding.xml` is harmless but misleading — the toggle is on, but + the type selector is `none`, so every transcode goes through ffmpeg + software path. Two HLS segment requests this hour returned **499** + (client cancelled mid-transcode at 6.4 s and 2.9 s wall) — that is + the playback-stalls signature. +3. **OpenSubtitles plugin is logging an error per file probed during + library scan** (102 errors in last 6 h) because `Username` and + `Password` are empty in the plugin XML. Every Scan Media Library run + tries Open Subtitles, fails on auth, logs an `ERR`, retries on the + next file. This is pure log noise + wasted RTT, not data loss, but + it bloats `/config/log` and obscures real warnings. +4. **Transcode throttling is OFF and `MaxMuxingQueueSize` is 2048** — + on a CPU-only deploy that means a stalled client with high-bitrate + AV1/HEVC source will keep ffmpeg burning a full core for up to + `SegmentKeepSeconds=720`s after the client gives up. `EnableThrottling` + should be on for a CPU deploy; this would have prevented the 499s + seen above. +5. **No automated backup of `/home/docker/jellyfin/config/`.** The + Cineplex CSS, the 5 user accounts + permissions, the library + metadata, and the Open Subtitles plugin install all live in one + unprotected directory tree. The repo's `snapshots/` only captures the + pre-ElegantFin migration baseline; nothing on disk is being rotated + off-host. + +--- + +## Findings table + +Severity legend: **R** = red (acute, fix this week), **Y** = yellow +(deferred fix, document risk), **G** = green (audited, healthy, no +action). Effort: **S** ≤ 30 min, **M** half-day, **L** > 1 day. + +| # | Category | Severity | Evidence | Recommendation | Effort | +|---|---|:-:|---|---|:-:| +| 01 | Host capacity | **R** | `uptime` load 11.40 / 9.59 / 6.19 on 12 cores; swap 6.8 GiB used / 24 GiB; `/home` 90 % full | Identify swap hog (likely not Jellyfin — only 522 MiB RSS); reclaim space on `/home`; budget media additions against the 40 GiB headroom | M | +| 02 | GPU transcode | **R** | `nvidia-smi` fails, no `/dev/nvidia*`, `lsmod` no nvidia mod; `HardwareAccelerationType=none` | Reinstall nvidia driver on nullstone host; once `nvidia-smi` works, add device reservation block to compose and flip `HardwareAccelerationType` to `nvenc` | L | +| 03 | Transcode throttling | **R** | `EnableThrottling=false`, `ThrottleDelaySeconds=180`, `MaxMuxingQueueSize=2048`, **two 499 client-cancels** logged (6 439 ms / 2 890 ms) | Enable `EnableThrottling=true` and `EnableSegmentDeletion=true` for CPU-only era — caps wasted ffmpeg CPU after client disconnect | S | +| 04 | OpenSubtitles auth | **R** | `Username`/`Password` empty in `Jellyfin.Plugin.OpenSubtitles.xml`; **102** `Error downloading subtitles from Open Subtitles` lines / 6 h | Set creds via UI, OR disable the provider on both libraries (`EnableInternetProviders=false` already; subtitle search still runs). Doc 03-subtitles.md already calls this out as pending | S | +| 05 | Cache trash budget | **Y** | `EnableSegmentDeletion=false`, `SegmentKeepSeconds=720`; `/cache/transcodes` only 20 K right now (no live stream), but a 4K HEVC→h264 session will fill GiBs and not auto-prune | Enable `EnableSegmentDeletion=true` (default 720 s keep is fine) — pairs with finding 03 | S | +| 06 | Backup posture | **R** | `/home/docker/jellyfin/config/` (104 MB) has no off-host rotation; `snapshots/` in repo only holds pre-ElegantFin baseline | Add a weekly `tar.zst` of `/config/` (excluding `log/`, `cache/`) to NAS or git-backed snapshot dir | M | +| 07 | Disk pressure | **Y** | `/home` 90 % full, 40 GiB free of 399 GiB; `/home/user/media` only 189 files | Cap on media growth: at current free space + episode bitrate budget user has ~3–4 more series before disk fills | M | +| 08 | DB WAL ratio | **Y** | `library.db`=3.3 MB, `library.db-wal`=4.4 MB (WAL > main, uncheckpointed). `Optimize database` last ran 2026-05-08T00:58 (OK) but a fresh scan completed 03:16 left WAL fat | Either trigger a manual `Optimize database` post-scan, or shorten its schedule to "after every full scan". WAL > main is normal during/after a scan but should checkpoint on idle | S | +| 09 | Custom CSS bloat | **Y** | `CustomCss` in `branding.xml` is **25 225 bytes**, 17 `!important`, sole `@import` is `MRunkehl/cineplex@v1.0.6` (jsDelivr) | jsDelivr import adds 1 round-trip + ~50 KB on every cold cache load. Inline the import for offline-resilience and one-fewer DNS hop. Also doc 11 already flags this as the wrong theme (Cineplex, not NeutralFin) — resolve theme race first | M | +| 10 | SPA shim cost | **G** | `web-overrides/index.html` 58 KB; runs **2× MutationObserver** + **1× setInterval(1000ms)** with `lockTitle/lockFavicon/nukeSettings`; cost ~1 ms per tick | Acceptable for a single-tab branding shim; would be a problem only on background tabs at scale. No action | — | +| 11 | Service worker | **G** | `/web/serviceworker.js` 768 bytes, last modified 2024-11-19 (Jellyfin 10.10.3 ship date), serves with `cache-control: no-store` (HTTPS, etag set). Notification-only SW (per doc 10) | No action — it is small and not caching `index.html` so cannot pin stale branding | — | +| 12 | Metrics endpoint | **G** | `EnableMetrics=false` | Off is correct for a single-server box. No action | — | +| 13 | Slow-response warning | **Y** | `EnableSlowResponseWarning=true`, threshold **500 ms**. Two transcoding 499s above 2.8 s would normally trigger this warning, but I see 0 `slow` lines in 1 h logs | Either Jellyfin's slow log only fires on synchronous request handlers (not HLS segment GETs), or warning suppressed by another setting. Worth confirming threshold semantics | S | +| 14 | Library scan concurrency | **Y** | `LibraryScanFanoutConcurrency=0`, `LibraryMetadataRefreshConcurrency=0`, `ParallelImageEncodingLimit=0` (all defaults — auto = `ProcessorCount`) | On a 12-core box already at load 11+, `0` (= 12) for all three is aggressive. Cap each at 4–6 to leave headroom for Forgejo/Traefik/etc | S | +| 15 | Realtime monitor | **Y** | Both libraries have `EnableRealtimeMonitor=true`; only 189 files; `LibraryMonitorDelay=60` | Fine for current size, but inotify watches grow with file count. Re-evaluate at 10 k+ files | — | +| 16 | Trickplay / chapter previews | **G** | `EnableTrickplayImageExtraction=false`, `ExtractChapterImagesDuringLibraryScan=false`, `EnableChapterImageExtraction=false`, `ExtractTrickplayImagesDuringLibraryScan=false` (all libs) | Disabled on both libraries — saves significant CPU. No action. (Note: scheduled task `Generate Trickplay Images` still ran 02:00 — check it is a no-op when libs say no) | — | +| 17 | Photos library | **G** | `EnablePhotos=false` on both | Correct for a movies/TV deploy. No action | — | +| 18 | Plugin set | **G** | 6 plugins active (AudioDB, MusicBrainz, OMDb, OpenSubtitles, StudioImages, TMDb). `Username/Password` empty for OMDb (= no key, falls back to anon rate limit) and TMDb (`TmdbApiKey` empty — falls back to bundled key) | Both tolerated. AudioDB + MusicBrainz unused (no music libs) but cost zero idle. Consider removing for minimalism, not perf | — | +| 19 | Admin user policy | **R** | `s8n` admin has `EnableRemoteControlOfOtherUsers=true`, `EnableContentDeletion=true` (correct for admin) but **also `IsHidden=true`** | Hidden admin is non-standard; usually a hidden admin is reserved for automation. If `s8n` is the operator's daily account, `IsHidden=false` is the convention. Low risk, just unusual | S | +| 20 | Non-admin policies | **Y** | All 4 non-admin users (`5`, `guest`, `house`, `marco`) have `EnableContentDownloading=true`, `EnableMediaConversion=true`, `EnableLiveTvManagement=true`, `EnableSharedDeviceControl=true`, `IsHidden=true` | LiveTvManagement on accounts with no Live TV is dead weight, no harm. ContentDownloading + MediaConversion let any user kick off transcodes — a foot-gun on a CPU-only host. Review desired stance | S | +| 21 | Login disclaimer leak | **G** | `LoginDisclaimer` = "Welcome to ARRFLIX - Private invite only service" | Public-facing string is intentional per doc 09. No action | — | +| 22 | Public WAN exposure | **Y** | `EnableRemoteAccess=true`, `no-guest@file` middleware **dropped** in compose (per doc 09 §1.2). 24 h log: 270 LAN reqs, **59 reqs from 157.143.84.87, 1 from 82.31.156.86** | Doc 09 confirms this is intentional. The 157.143.84.87 hits are bot-style asset-prober 404s — harmless but confirms the service is internet-reachable. No action; re-verify rate limit / fail2ban once router port-forward is active | — | +| 23 | Splashscreen size | **Y** | `/config/data/splashscreen.png` is **3.0 MB** | A splash image of 3 MB is large for a PNG; lossless re-encode or downscale to ≤500 KB; saves on first-paint over WAN | S | +| 24 | Log rotation | **G** | `LogFileRetentionDays=3`; `/config/log` 1.3 MB; rotation working | No action | — | +| 25 | Splashscreen flag | **Y** | `SplashscreenEnabled=true` in `branding.xml` | Intentional for branding, no action — pairs with finding 23 (just shrink the file) | — | +| 26 | Cache breakdown | **G** | `/cache/images` 15 MB (entire cache 15 MB); `/config/metadata` 92 MB; `/config/data` 12 MB; `/config/plugins` 128 KB | Healthy small footprint. No action | — | +| 27 | Forgejo log noise | **Y** | Traefik logs show `forgejo@docker` returning **401** for `s8n/ARRFLIX.git/info/refs?service=git-receive-pack` 8× / hour from 192.168.0.10 | Out of scope for this deploy but indicates a stale `git push` retry loop on onyx — surfaces here only because we're scanning traefik logs. Mention to operator separately | — | +| 28 | Path substitutions | **G** | `system.xml` empty `` and `` | Correct (no NFS/SMB indirection, no cross-origin clients). No action | — | +| 29 | LiveTV residue | **G** | `DisableLiveTvChannelUserDataName=true`; no Live TV configured; per-user `EnableLiveTvAccess=true` is dead weight | Cosmetic; no perf cost. No action | — | +| 30 | Container restart count | **G** | `docker inspect` `RestartCount=0`, `Status=running`, `StartedAt=2026-05-08T02:13:01` (~2 h uptime, healthy) | No action. (Boot was at 02:13, suggests the compose was applied for doc-09 WAN flip and ran clean since) | — | +| 31 | Network XML hygiene | **Y** | `KnownProxies` empty, `LocalNetworkSubnets` empty, `LocalNetworkAddresses` empty | Jellyfin can't tell the Traefik 172.20.0.0/16 docker net from random WAN — every external IP is logged as remote, which inflates Jellyfin's geoIP/session bookkeeping. Set `KnownProxies=172.20.0.0/16` and `LocalNetworkSubnets=192.168.0.0/24` | S | +| 32 | TLS cert | **G** | LE cert valid `2026-05-08 → 2026-08-06` (89 days remaining), issued by R13, Gandi DNS-01 resolver, in `acme.json` | Healthy. No action | — | +| 33 | Request-rate posture | **G** | 81 req / hour total via traefik; 62 of those are `jellyfin@docker`. Top src 192.168.0.10 (LAN, the operator), then 157.143.84.87 (asset-prober 404s) | Low rate. No action — re-evaluate if WAN exposure draws more traffic | — | +| 34 | Idle session count | **G** | `/Sessions` returns 2 idle (s8n + guest) on 192.168.0.10; no playback in flight at audit time | No action | — | +| 35 | Item counts | **G** | 2 movies, 6 series, 169 episodes; matches `find /media -type f` (189 files, accounting for non-video extras) | Library scan is healthy; counts converged | — | + +--- + +## Recommended fix order (top 5 by impact-per-effort) + +1. **Finding 03 — enable transcode throttling + segment deletion.** + *Effort: S (two checkboxes in Playback settings).* Closes the + highest-cost behaviour we have evidence of (the 499 ms wall events). + Saves CPU cycles per stalled client. +2. **Finding 04 — set OpenSubtitles credentials, OR disable + provider.** *Effort: S.* Removes 102 ERR/6 h of log spam, fixes + subtitle download, immediately restores log signal. +3. **Finding 31 — populate `KnownProxies` + `LocalNetworkSubnets` in + `network.xml`.** *Effort: S.* Restores accurate session origin + reporting; needed before any rate-limiting or fail2ban work post-WAN. +4. **Finding 14 — cap `LibraryScanFanoutConcurrency`, + `LibraryMetadataRefreshConcurrency`, `ParallelImageEncodingLimit` + to 4–6.** *Effort: S.* Stops a future scan piling on top of the + existing host load (currently 11.4). +5. **Finding 06 — automate `/config/` backup.** *Effort: M.* Single + highest-blast-radius risk: a corrupt `library.db` or a `branding.xml` + regression and you've lost the user accounts AND the theme work in + one go. A weekly `tar.zst` to NAS closes this. + +GPU re-enable (finding 02) would unlock more wins but is **L** effort +and lives outside Jellyfin (host driver work). Throttling (#03) is the +right CPU-era patch until then. + +--- + +## Out of scope (audited and found healthy) + +- **Service worker** (`/web/serviceworker.js`, 768 B, notification-only, + not caching index.html — finding 11). +- **Container restart count** (0 — finding 30). +- **TLS cert chain** (89 days valid — finding 32). +- **Trickplay / chapter / photo extraction** (all disabled — findings + 16, 17). +- **Log rotation** (3-day retention working, 1.3 MB /config/log — + finding 24). +- **Cache directory growth** (15 MB total, healthy — finding 26). +- **Plugin set** (6 plugins, all idle-cheap — finding 18). +- **Idle session footprint** (2 idle web sessions, no playback in + flight — finding 34). +- **Item count convergence** (Items/Counts matches filesystem — + finding 35). +- **Path substitution / CORS hygiene** (empty as expected — finding 28). +- **Login disclaimer string** (per-doc-09 intentional public-facing + text — finding 21). + +--- + +## Appendix — raw evidence + +### Host + +``` +uptime: 04:18:55 up 4 days, 4:36, 3 users, load average: 11.40, 9.59, 6.19 +nproc: 12 +free -h: total 31Gi, used 9.2Gi, free 5.8Gi, swap used 6.8Gi / 24Gi +df -h /home: 399G total, 339G used, 40G avail (90 % full) +``` + +### Container + +``` +docker stats jellyfin (no-stream): +CPU 0.01 %, MEM 521.5 MiB / 31.27 GiB (1.63 %), PIDS 24, NET 83.8 MB / 361 MB +docker inspect: Restarts=0, Started=2026-05-08T02:13:01Z, Status=running +``` + +### GPU + +``` +nvidia-smi: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver +lsmod | grep nvidia: (no matches) +ls /dev/nvidia*: No such file or directory +encoding.xml: HardwareAccelerationType=none, EnableHardwareEncoding=true +``` + +### Disk + +``` +/config 104 M (data 12M, metadata 92M, log 1.3M, plugins 128K) +/cache 15 M (images 15M, transcodes 20K, fontconfig 36K, omdb 84K) +/home/docker/jellyfin: not visible (sudo blocked); inferred from container view +``` + +### Database + +``` +jellyfin.db 208 K (WAL 473 K, SHM 32 K) +library.db 3.3 M (WAL 4.4 M, SHM 32 K) <- WAL > main +keyframes/ 16 K +splashscreen.png 3.0 M +``` + +### Traefik (last 1 h) + +``` +total log lines: 279 +jellyfin@docker requests: 62 +status 499 (client cancel): 2 (HLS segments, 6439 ms + 2890 ms) +status 5xx: 0 +top source IPs (jellyfin): + 82.31.156.86 123 (own WAN egress, hairpin) + 82.131.116.123 122 (external — likely friend / scanner) + 192.168.0.10 13 (operator LAN) + 173.244.58.11 2 (cloud scanner) + 35.203.85.72 1 (Google security scan) +``` + +### Jellyfin (last 6 h) + +``` +"Error downloading subtitles from Open Subtitles": 102 +"slow" / "throttl" matches: 1 (false positive, no real slow-warn) +Container restart events: 0 +``` + +### TLS + +``` +Subject: CN=arrflix.s8n.ru +Issuer: C=US, O=Let's Encrypt, CN=R13 +Valid: 2026-05-08 00:58:11 GMT → 2026-08-06 00:58:10 GMT (89 d) +Resolver: letsencrypt (Gandi DNS-01) +``` + +### Service worker + +``` +URL: https://arrflix.s8n.ru/web/serviceworker.js +HTTP: 200, content-type text/javascript +Size: 768 bytes +Last-Modified: Tue, 19 Nov 2024 03:43:48 GMT (Jellyfin 10.10.3 ship) +Headers: HSTS preload + nosniff + frame=SAMEORIGIN + xss-protection +``` + +### CSS / branding + +``` +/Branding/Configuration: + CustomCss bytes: 25 225 + !important rules: 17 + sole @import: https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css + LoginDisclaimer: "Welcome to ARRFLIX - Private invite only service" + SplashscreenEnabled: True +on disk: + /config/config/branding.xml 25 584 bytes +``` + +### SPA shim + +``` +/opt/docker/jellyfin/web-overrides/index.html 58 725 bytes +MutationObserver count: 2 (one head/title-favicon, one body/nukeSettings) +setInterval count: 1 (1000 ms — relocks title + favicon + nukeSettings) +``` + +### Users + +``` +# users: 5 +admin (s8n): IsHidden=true, EnableRemoteControlOfOtherUsers=true, EnableContentDeletion=true +non-admin (5, guest, house, marco): IsHidden=true, EnableContentDownloading=true, + EnableMediaConversion=true, EnableLiveTvManagement=true +``` + +### Plugins + +``` +AudioDB 10.10.3.0 Active +MusicBrainz 10.10.3.0 Active RateLimit=1, ReplaceArtistName=false +OMDb 10.10.3.0 Active CastAndCrew=false +Open Subtitles 20.0.0.0 Active Username/Password empty, CredentialsInvalid=false +Studio Images 10.10.3.0 Active +TMDb 10.10.3.0 Active TmdbApiKey empty +``` + +### Library options (both libs) + +``` +EnableRealtimeMonitor = True +ExtractChapterImagesDuringLibraryScan = False +EnableTrickplayImageExtraction = False +EnablePhotos = False +SaveLocalMetadata = False +EnableInternetProviders = False +SkipSubtitlesIfAudioTrackMatches = True +SaveSubtitlesWithMedia = True +ExtractTrickplayImagesDuringLibraryScan= False +``` + +### Network XML + +``` +EnableHttps=false (TLS handled by Traefik) | EnableUPnP=false | EnableRemoteAccess=true +KnownProxies=(empty) LocalNetworkSubnets=(empty) LocalNetworkAddresses=(empty) +IgnoreVirtualInterfaces=true VirtualInterfaceNames=[veth] +EnablePublishedServerUriByRequest=false +``` + +### System config — performance knobs + +``` +LogFileRetentionDays = 3 +EnableMetrics = False +EnableSlowResponseWarning = True (threshold 500 ms) +RemoteClientBitrateLimit = 0 (no cap) +LibraryScanFanoutConcurrency = 0 (auto = ProcessorCount = 12) +LibraryMetadataRefreshConcurrency = 0 (auto = ProcessorCount = 12) +ParallelImageEncodingLimit = 0 (auto = ProcessorCount = 12) +EnableNormalizedItemByNameIds = True (correct for 10.10.x) +QuickConnectAvailable = False +EnableCaseSensitiveItemIds = True +EnableFolderView = False +EnableGroupingIntoCollections = False +IsStartupWizardCompleted = True +ChapterImageResolution = (default) +DummyChapterDuration = (default) +ImageExtractionTimeoutMs = (default) +LibraryMonitorDelay = 60 +LibraryUpdateDuration = 30 +ActivityLogRetentionDays = (default) +``` + +### Encoding config — full dump + +``` +EncodingThreadCount = -1 (auto) +EnableAudioVbr = False +MaxMuxingQueueSize = 2048 +EnableThrottling = False ← finding 03 +ThrottleDelaySeconds = 180 +EnableSegmentDeletion = False ← finding 05 +SegmentKeepSeconds = 720 +HardwareAccelerationType = none ← finding 02 +EncoderAppPathDisplay = /usr/lib/jellyfin-ffmpeg/ffmpeg +VaapiDevice = /dev/dri/renderD128 (no Intel iGPU on host) +H264Crf = 23 +H265Crf = 28 +EncoderPreset = (nil) +EnableHardwareEncoding = True (no-op while type=none) +AllowHevcEncoding = False +AllowAv1Encoding = False +EnableSubtitleExtraction = True +HardwareDecodingCodecs = [h264, vc1] +AllowOnDemandMetadataBasedKeyframeExtractionForExtensions = [mkv] +PreferSystemNativeHwDecoder = True +EnableEnhancedNvdecDecoder = True (no-op while no nvidia) +``` + +### Scheduled tasks + +``` +Audio Normalization Idle Completed 2026-05-08T00:58 +Clean Cache Directory Idle Completed 2026-05-08T00:58 +Clean Log Directory Idle Completed 2026-05-08T00:58 +Clean Transcode Directory Idle Completed 2026-05-08T02:13 +Download missing subtitles Idle Completed 2026-05-08T00:58 +Extract Chapter Images Idle Completed 2026-05-08T01:00 +Generate Trickplay Images Idle Completed 2026-05-08T02:00 (no-op?) +Optimize database Idle Completed 2026-05-08T00:58 +Refresh People Idle Completed 2026-05-08T00:58 +Scan Media Library Idle Completed 2026-05-08T03:16 +Update Plugins Idle Completed 2026-05-08T02:13 +``` + +--- + +## Sign-off + +- Audit: 2026-05-08, read-only, ~25 min wall. +- No fixes applied. No state mutated. No container restart. +- Next audit due: **2026-08-08** (quarterly, before LE cert renewal + window opens at 2026-08-06).