ARRFLIX/docs/13-optimization-audit.md

# 13 — Optimization Audit (Read-Only)

Status: read-only audit, executed 2026-05-08 against https://arrflix.s8n.ru (Jellyfin 10.10.3 on nullstone). Scope: scan for performance, capacity, reliability, and ops-hygiene risks. No fixes applied. No state mutated. No container restarts.

The audit ran ~25 minutes wall-clock. Inputs: the Jellyfin REST API (auth X-Emby-Token: 76858153…f8b1), docker exec jellyfin, docker logs {traefik,jellyfin} --since 1h/6h/24h, host free / df / uptime / nvidia-smi, and the on-disk Jellyfin XML configs.


## Executive summary

  1. Host is under serious memory pressure right now. uptime shows load average 11.40 / 9.59 / 6.19 on a 12-core box, 6.8 GiB of swap is in use (out of 24 GiB), and /home is 90 % full (40 GiB free of 399 GiB). Jellyfin itself is fine (522 MiB / 31 GiB cap, no restarts), but the host it lives on is loaded enough that any media ingest at scale will start swap-thrashing. This is the single biggest risk to playback latency.
  2. GPU transcode is dead, and confirmed dead three ways: nvidia-smi fails on the host, lsmod | grep nvidia returns nothing, and /dev/nvidia* does not exist. The combination of EnableHardwareEncoding=true and HardwareAccelerationType=none in encoding.xml is harmless but misleading — the toggle is on, but the type selector is none, so every transcode goes through the ffmpeg software path. Two HLS segment requests this hour returned 499 (client cancelled mid-transcode at 6.4 s and 2.9 s wall) — that is the playback-stalls signature.
  3. OpenSubtitles plugin is logging an error per file probed during library scan (102 errors in last 6 h) because Username and Password are empty in the plugin XML. Every Scan Media Library run tries Open Subtitles, fails on auth, logs an ERR, retries on the next file. This is pure log noise + wasted RTT, not data loss, but it bloats /config/log and obscures real warnings.
  4. Transcode throttling is OFF and MaxMuxingQueueSize is 2048 — on a CPU-only deploy that means a stalled client with a high-bitrate AV1/HEVC source leaves ffmpeg burning a full core after the client gives up, and the segments it writes linger for SegmentKeepSeconds=720s. EnableThrottling should be on for a CPU deploy; it would not have prevented the 499 cancels seen above, but it would have capped the CPU wasted on each one.
  5. No automated backup of /home/docker/jellyfin/config/. The Cineplex CSS, the 5 user accounts + permissions, the library metadata, and the Open Subtitles plugin install all live in one unprotected directory tree. The repo's snapshots/ only captures the pre-ElegantFin migration baseline; nothing on disk is being rotated off-host.
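
The weekly archive from point 5 can be sketched as a small POSIX script (the NAS mount path and the eight-archive retention are assumptions, not audited values; log/ and cache/ are excluded per finding 06):

```shell
#!/bin/sh
# Sketch of the weekly /config/ backup (finding 06). Destination path and
# retention count are assumptions; log/ and cache/ are excluded per the finding.
backup_jellyfin_config() {
  src="$1"    # e.g. /home/docker/jellyfin/config
  dest="$2"   # e.g. /mnt/nas/jellyfin-backups (hypothetical mount)
  stamp="$(date +%Y-%m-%d)"
  mkdir -p "$dest"
  if command -v zstd >/dev/null 2>&1; then
    tar --zstd -cf "$dest/jellyfin-config-$stamp.tar.zst" \
        --exclude='./log' --exclude='./cache' -C "$src" .
  else
    # gzip fallback if zstd is not installed on the host
    tar -czf "$dest/jellyfin-config-$stamp.tar.gz" \
        --exclude='./log' --exclude='./cache' -C "$src" .
  fi
  # keep only the eight most recent archives
  ls -1t "$dest"/jellyfin-config-* 2>/dev/null | tail -n +9 | xargs -r rm --
}
```

Dropped into a weekly cron or systemd timer, this closes finding 06 without touching the running container.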

## Findings table

Severity legend: R = red (acute, fix this week), Y = yellow (deferred fix, document risk), G = green (audited, healthy, no action). Effort: S ≤ 30 min, M half-day, L > 1 day.

| # | Category | Severity | Evidence | Recommendation | Effort |
|---|----------|----------|----------|----------------|--------|
| 01 | Host capacity | R | uptime load 11.40 / 9.59 / 6.19 on 12 cores; swap 6.8 GiB used / 24 GiB; /home 90 % full | Identify the swap hog (likely not Jellyfin — only 522 MiB RSS); reclaim space on /home; budget media additions against the 40 GiB headroom | M |
| 02 | GPU transcode | R | nvidia-smi fails, no /dev/nvidia*, lsmod shows no nvidia module; HardwareAccelerationType=none | Reinstall the nvidia driver on the nullstone host; once nvidia-smi works, add a device reservation block to compose and flip HardwareAccelerationType to nvenc | L |
| 03 | Transcode throttling | R | EnableThrottling=false, ThrottleDelaySeconds=180, MaxMuxingQueueSize=2048; two 499 client-cancels logged (6 439 ms / 2 890 ms) | Set EnableThrottling=true and EnableSegmentDeletion=true for the CPU-only era — caps wasted ffmpeg CPU after client disconnect | S |
| 04 | OpenSubtitles auth | R | Username/Password empty in Jellyfin.Plugin.OpenSubtitles.xml; 102 "Error downloading subtitles from Open Subtitles" lines / 6 h | Set creds via the UI, OR disable the provider on both libraries (EnableInternetProviders=false already; subtitle search still runs). Doc 03-subtitles.md already calls this out as pending | S |
| 05 | Cache trash budget | Y | EnableSegmentDeletion=false, SegmentKeepSeconds=720; /cache/transcodes only 20 K right now (no live stream), but a 4K HEVC→h264 session will fill GiBs and not auto-prune | Set EnableSegmentDeletion=true (the default 720 s keep is fine) — pairs with finding 03 | S |
| 06 | Backup posture | R | /home/docker/jellyfin/config/ (104 MB) has no off-host rotation; snapshots/ in the repo only holds the pre-ElegantFin baseline | Add a weekly tar.zst of /config/ (excluding log/ and cache/) to NAS or a git-backed snapshot dir | M |
| 07 | Disk pressure | Y | /home 90 % full, 40 GiB free of 399 GiB; /home/user/media only 189 files | Cap media growth: at the current free space and episode bitrate budget the user has ~34 more series before the disk fills | M |
| 08 | DB WAL ratio | Y | library.db = 3.3 MB, library.db-wal = 4.4 MB (WAL > main, uncheckpointed); Optimize database last ran 2026-05-08T00:58 (OK) but a scan completing 03:16 left the WAL fat | Either trigger a manual Optimize database post-scan, or shorten its schedule to "after every full scan". WAL > main is normal during/after a scan but should checkpoint on idle | S |
| 09 | Custom CSS bloat | Y | CustomCss in branding.xml is 25 225 bytes, 17 !important, sole @import is MRunkehl/cineplex@v1.0.6 (jsDelivr) | The jsDelivr import adds 1 round-trip + ~50 KB on every cold-cache load. Inline the import for offline resilience and one fewer DNS hop. Doc 11 already flags this as the wrong theme (Cineplex, not NeutralFin) — resolve the theme race first | M |
| 10 | SPA shim cost | G | web-overrides/index.html 58 KB; runs 2× MutationObserver + 1× setInterval(1000 ms) with lockTitle/lockFavicon/nukeSettings; cost ~1 ms per tick | Acceptable for a single-tab branding shim; would be a problem only on background tabs at scale. No action | |
| 11 | Service worker | G | /web/serviceworker.js 768 bytes, last modified 2024-11-19 (Jellyfin 10.10.3 ship date), served with cache-control: no-store (HTTPS, etag set); notification-only SW (per doc 10) | No action — it is small and not caching index.html, so it cannot pin stale branding | |
| 12 | Metrics endpoint | G | EnableMetrics=false | Off is correct for a single-server box. No action | |
| 13 | Slow-response warning | Y | EnableSlowResponseWarning=true, threshold 500 ms; the two transcoding 499s above 2.8 s should have triggered it, yet 0 slow lines in 1 h of logs | Either Jellyfin's slow log only fires on synchronous request handlers (not HLS segment GETs), or the warning is suppressed by another setting. Worth confirming the threshold semantics | S |
| 14 | Library scan concurrency | Y | LibraryScanFanoutConcurrency=0, LibraryMetadataRefreshConcurrency=0, ParallelImageEncodingLimit=0 (all defaults — auto = ProcessorCount) | On a 12-core box already at load 11+, 0 (= 12) for all three is aggressive. Cap each at 4–6 to leave headroom for Forgejo/Traefik/etc | S |
| 15 | Realtime monitor | Y | Both libraries have EnableRealtimeMonitor=true; only 189 files; LibraryMonitorDelay=60 | Fine for the current size, but inotify watches grow with file count. Re-evaluate at 10 k+ files | |
| 16 | Trickplay / chapter previews | G | EnableTrickplayImageExtraction=false, ExtractChapterImagesDuringLibraryScan=false, EnableChapterImageExtraction=false, ExtractTrickplayImagesDuringLibraryScan=false (all libs) | Disabled on both libraries — saves significant CPU. No action. (Note: the scheduled task Generate Trickplay Images still ran at 02:00 — check it is a no-op when the libraries say no) | |
| 17 | Photos library | G | EnablePhotos=false on both | Correct for a movies/TV deploy. No action | |
| 18 | Plugin set | G | 6 plugins active (AudioDB, MusicBrainz, OMDb, OpenSubtitles, StudioImages, TMDb); Username/Password empty for OMDb (no key, anonymous rate limit) and TMDb (TmdbApiKey empty — falls back to the bundled key) | Both tolerated. AudioDB + MusicBrainz are unused (no music libs) but cost zero idle. Consider removing for minimalism, not perf | |
| 19 | Admin user policy | R | s8n admin has EnableRemoteControlOfOtherUsers=true, EnableContentDeletion=true (correct for an admin) but also IsHidden=true | A hidden admin is non-standard; usually a hidden admin is reserved for automation. If s8n is the operator's daily account, IsHidden=false is the convention. Low risk, just unusual | S |
| 20 | Non-admin policies | Y | All 4 non-admin users (5, guest, house, marco) have EnableContentDownloading=true, EnableMediaConversion=true, EnableLiveTvManagement=true, EnableSharedDeviceControl=true, IsHidden=true | LiveTvManagement on accounts with no Live TV is dead weight, no harm. ContentDownloading + MediaConversion let any user kick off transcodes — a foot-gun on a CPU-only host. Review the desired stance | S |
| 21 | Login disclaimer leak | G | LoginDisclaimer = "Welcome to ARRFLIX - Private invite only service" | The public-facing string is intentional per doc 09. No action | |
| 22 | Public WAN exposure | Y | EnableRemoteAccess=true; the no-guest@file middleware was dropped in compose (per doc 09 §1.2); 24 h log: 270 LAN reqs, 59 reqs from 157.143.84.87, 1 from 82.31.156.86 | Doc 09 confirms this is intentional. The 157.143.84.87 hits are bot-style asset-prober 404s — harmless, but they confirm the service is internet-reachable. No action; re-verify rate limiting / fail2ban once the router port-forward is active | |
| 23 | Splashscreen size | Y | /config/data/splashscreen.png is 3.0 MB | 3 MB is large for a splash PNG; losslessly re-encode or downscale to ≤500 KB to save on first paint over WAN | S |
| 24 | Log rotation | G | LogFileRetentionDays=3; /config/log 1.3 MB; rotation working | No action | |
| 25 | Splashscreen flag | Y | SplashscreenEnabled=true in branding.xml | Intentional for branding, no action — pairs with finding 23 (just shrink the file) | |
| 26 | Cache breakdown | G | /cache/images 15 MB (entire cache 15 MB); /config/metadata 92 MB; /config/data 12 MB; /config/plugins 128 KB | Healthy, small footprint. No action | |
| 27 | Forgejo log noise | Y | Traefik logs show forgejo@docker returning 401 for s8n/ARRFLIX.git/info/refs?service=git-receive-pack 8× / hour from 192.168.0.10 | Out of scope for this deploy but indicates a stale git-push retry loop on onyx — it surfaces here only because we scan the Traefik logs. Mention to the operator separately | |
| 28 | Path substitutions | G | system.xml has empty `<PathSubstitutions />` and `<CorsHosts />` | Correct (no NFS/SMB indirection, no cross-origin clients). No action | |
| 29 | LiveTV residue | G | DisableLiveTvChannelUserDataName=true; no Live TV configured; per-user EnableLiveTvAccess=true is dead weight | Cosmetic; no perf cost. No action | |
| 30 | Container restart count | G | docker inspect RestartCount=0, Status=running, StartedAt=2026-05-08T02:13:01 (~2 h uptime, healthy) | No action. (The 02:13 boot suggests the compose was re-applied for the doc-09 WAN flip and has run clean since) | |
| 31 | Network XML hygiene | Y | KnownProxies empty, LocalNetworkSubnets empty, LocalNetworkAddresses empty | Jellyfin cannot tell the Traefik 172.20.0.0/16 docker net from random WAN — every external IP is logged as remote, which inflates Jellyfin's geoIP/session bookkeeping. Set KnownProxies=172.20.0.0/16 and LocalNetworkSubnets=192.168.0.0/24 | S |
| 32 | TLS cert | G | LE cert valid 2026-05-08 → 2026-08-06 (89 days remaining), issued by R13, Gandi DNS-01 resolver, in acme.json | Healthy. No action | |
| 33 | Request-rate posture | G | 81 req / hour total via Traefik; 62 of those are jellyfin@docker; top src 192.168.0.10 (LAN, the operator), then 157.143.84.87 (asset-prober 404s) | Low rate. No action — re-evaluate if WAN exposure draws more traffic | |
| 34 | Idle session count | G | /Sessions returns 2 idle (s8n + guest) on 192.168.0.10; no playback in flight at audit time | No action | |
| 35 | Item counts | G | 2 movies, 6 series, 169 episodes; matches find /media -type f (189 files, accounting for non-video extras) | Library scan is healthy; counts converged | |
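
Finding 08's manual fix is a single pragma. A sketch (sqlite3 must be available; the container path is the one observed in this audit):

```shell
# Force a WAL checkpoint and truncate the -wal file on an idle database
# (finding 08). TRUNCATE copies all frames back into the main db file and
# resets the write-ahead log to zero bytes.
checkpoint_wal() {
  sqlite3 "$1" 'PRAGMA wal_checkpoint(TRUNCATE);'
}
# against the live container, something like:
#   docker exec jellyfin sqlite3 /config/data/library.db \
#     'PRAGMA wal_checkpoint(TRUNCATE);'
```

Run it only while no scan is in flight; a busy writer will hold the checkpoint back.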

## Priority fix queue

  1. Finding 03 — enable transcode throttling + segment deletion. Effort: S (two checkboxes in Playback settings). Closes the highest-cost behaviour we have evidence of (the two 499 client-cancel events). Saves CPU cycles per stalled client.
  2. Finding 04 — set OpenSubtitles credentials, OR disable provider. Effort: S. Removes 102 ERR/6 h of log spam, fixes subtitle download, immediately restores log signal.
  3. Finding 31 — populate KnownProxies + LocalNetworkSubnets in network.xml. Effort: S. Restores accurate session origin reporting; needed before any rate-limiting or fail2ban work post-WAN.
  4. Finding 14 — cap LibraryScanFanoutConcurrency, LibraryMetadataRefreshConcurrency, ParallelImageEncodingLimit at 4–6. Effort: S. Stops a future scan piling on top of the existing host load (currently 11.4).
  5. Finding 06 — automate /config/ backup. Effort: M. Single highest-blast-radius risk: a corrupt library.db or a branding.xml regression and you've lost the user accounts AND the theme work in one go. A weekly tar.zst to NAS closes this.
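
For item 3, the network.xml stanza would look roughly like this (the element names match the empty tags seen in this audit; the inner string-list shape is an assumption about Jellyfin 10.10's XML serializer, so prefer the dashboard's Networking page):

```xml
<!-- sketch for finding 31: subnet values taken from this audit's evidence -->
<KnownProxies>
  <string>172.20.0.0/16</string>
</KnownProxies>
<LocalNetworkSubnets>
  <string>192.168.0.0/24</string>
</LocalNetworkSubnets>
```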

GPU re-enable (finding 02) would unlock more wins but is L effort and lives outside Jellyfin (host driver work). Throttling (#03) is the right CPU-era patch until then.
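
When the driver work lands, the compose-side half of finding 02 is a standard device reservation (a sketch; the service name is assumed from this deploy, and nvidia-container-toolkit must be installed on nullstone first):

```yaml
# sketch for finding 02: apply only after nvidia-smi works on the host
services:
  jellyfin:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Only after both halves are in place does flipping HardwareAccelerationType to nvenc do anything.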


## Out of scope (audited and found healthy)

  • Service worker (/web/serviceworker.js, 768 B, notification-only, not caching index.html — finding 11).
  • Container restart count (0 — finding 30).
  • TLS cert chain (89 days valid — finding 32).
  • Trickplay / chapter / photo extraction (all disabled — findings 16, 17).
  • Log rotation (3-day retention working, 1.3 MB /config/log — finding 24).
  • Cache directory growth (15 MB total, healthy — finding 26).
  • Plugin set (6 plugins, all idle-cheap — finding 18).
  • Idle session footprint (2 idle web sessions, no playback in flight — finding 34).
  • Item count convergence (Items/Counts matches filesystem — finding 35).
  • Path substitution / CORS hygiene (empty as expected — finding 28).
  • Login disclaimer string (per-doc-09 intentional public-facing text — finding 21).

## Appendix — raw evidence

### Host

```
uptime: 04:18:55 up 4 days,  4:36,  3 users,  load average: 11.40, 9.59, 6.19
nproc:  12
free -h: total 31Gi, used 9.2Gi, free 5.8Gi, swap used 6.8Gi / 24Gi
df -h /home: 399G total, 339G used, 40G avail (90 % full)
```
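
Finding 01's first step, identifying the swap hog, can be done read-only from /proc; a sketch (the proc root is parameterised only to make the snippet testable):

```shell
# Rank processes by VmSwap from /proc/<pid>/status (read-only; no smem needed).
swap_hogs() {
  proc="${1:-/proc}"
  for f in "$proc"/[0-9]*/status; do
    [ -r "$f" ] || continue
    awk '/^Name:/ {n=$2} /^VmSwap:/ {print $2, n}' "$f"
  done | sort -rn | head -15   # top 15, swap usage in kB
}
```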

### Container

```
docker stats jellyfin (no-stream):
CPU 0.01 %, MEM 521.5 MiB / 31.27 GiB (1.63 %), PIDS 24, NET 83.8 MB / 361 MB
docker inspect: Restarts=0, Started=2026-05-08T02:13:01Z, Status=running
```

### GPU

```
nvidia-smi: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver
lsmod | grep nvidia: (no matches)
ls /dev/nvidia*: No such file or directory
encoding.xml: HardwareAccelerationType=none, EnableHardwareEncoding=true
```

### Disk

```
/config        104 M   (data 12M, metadata 92M, log 1.3M, plugins 128K)
/cache          15 M   (images 15M, transcodes 20K, fontconfig 36K, omdb 84K)
/home/docker/jellyfin: not visible (sudo blocked); inferred from container view
```

### Database

```
jellyfin.db         208 K  (WAL 473 K, SHM 32 K)
library.db          3.3 M  (WAL 4.4 M, SHM 32 K)  <- WAL > main
keyframes/           16 K
splashscreen.png    3.0 M
```

### Traefik (last 1 h)

```
total log lines:           279
jellyfin@docker requests:   62
status 499 (client cancel):  2 (HLS segments, 6439 ms + 2890 ms)
status 5xx:                  0
top source IPs (jellyfin):
  82.31.156.86   123  (own WAN egress, hairpin)
  82.131.116.123 122  (external — likely friend / scanner)
  192.168.0.10    13  (operator LAN)
  173.244.58.11    2  (cloud scanner)
  35.203.85.72     1  (Google security scan)
```
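
The 499 count above came from a log grep; as a reusable filter it is one line (a sketch: the DownstreamStatus field name assumes Traefik's JSON access-log format):

```shell
# Count client-cancelled (499) responses in a Traefik JSON access-log stream.
count_499() { grep -c '"DownstreamStatus":499'; }
# usage: docker logs traefik --since 1h 2>&1 | count_499
```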

### Jellyfin (last 6 h)

```
"Error downloading subtitles from Open Subtitles": 102
"slow" / "throttl" matches: 1 (false positive, no real slow-warn)
Container restart events: 0
```

### TLS

```
Subject:  CN=arrflix.s8n.ru
Issuer:   C=US, O=Let's Encrypt, CN=R13
Valid:    2026-05-08 00:58:11 GMT  →  2026-08-06 00:58:10 GMT  (89 d)
Resolver: letsencrypt (Gandi DNS-01)
```
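
The validity window can be re-checked at renewal time without opening acme.json; a sketch using openssl (hostname from this audit):

```shell
# Print a PEM certificate's validity window from stdin (read-only).
cert_dates() { openssl x509 -noout -dates; }
# live check against the deploy (network required):
#   echo | openssl s_client -connect arrflix.s8n.ru:443 \
#     -servername arrflix.s8n.ru 2>/dev/null | cert_dates
```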

### Service worker

```
URL:           https://arrflix.s8n.ru/web/serviceworker.js
HTTP:          200, content-type text/javascript
Size:          768 bytes
Last-Modified: Tue, 19 Nov 2024 03:43:48 GMT (Jellyfin 10.10.3 ship)
Headers:       HSTS preload + nosniff + frame=SAMEORIGIN + xss-protection
```

### CSS / branding

```
/Branding/Configuration:
  CustomCss bytes:      25 225
  !important rules:         17
  sole @import:         https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css
  LoginDisclaimer:      "Welcome to ARRFLIX - Private invite only service"
  SplashscreenEnabled:  True
on disk:
  /config/config/branding.xml  25 584 bytes
```

### SPA shim

```
/opt/docker/jellyfin/web-overrides/index.html  58 725 bytes
MutationObserver count:  2  (one head/title-favicon, one body/nukeSettings)
setInterval count:       1  (1000 ms — relocks title + favicon + nukeSettings)
```

### Users

```
# users: 5
admin (s8n):       IsHidden=true, EnableRemoteControlOfOtherUsers=true, EnableContentDeletion=true
non-admin (5, guest, house, marco):  IsHidden=true, EnableContentDownloading=true,
                                     EnableMediaConversion=true, EnableLiveTvManagement=true
```

### Plugins

```
AudioDB         10.10.3.0  Active
MusicBrainz     10.10.3.0  Active   RateLimit=1, ReplaceArtistName=false
OMDb            10.10.3.0  Active   CastAndCrew=false
Open Subtitles  20.0.0.0   Active   Username/Password empty, CredentialsInvalid=false
Studio Images   10.10.3.0  Active
TMDb            10.10.3.0  Active   TmdbApiKey empty
```

### Library options (both libs)

```
EnableRealtimeMonitor                   = True
ExtractChapterImagesDuringLibraryScan   = False
EnableTrickplayImageExtraction          = False
EnablePhotos                            = False
SaveLocalMetadata                       = False
EnableInternetProviders                 = False
SkipSubtitlesIfAudioTrackMatches        = True
SaveSubtitlesWithMedia                  = True
ExtractTrickplayImagesDuringLibraryScan = False
```

### Network XML

```
EnableHttps=false (TLS handled by Traefik) | EnableUPnP=false | EnableRemoteAccess=true
KnownProxies=(empty)  LocalNetworkSubnets=(empty)  LocalNetworkAddresses=(empty)
IgnoreVirtualInterfaces=true  VirtualInterfaceNames=[veth]
EnablePublishedServerUriByRequest=false
```

### System config — performance knobs

```
LogFileRetentionDays              =  3
EnableMetrics                     =  False
EnableSlowResponseWarning         =  True   (threshold 500 ms)
RemoteClientBitrateLimit          =  0      (no cap)
LibraryScanFanoutConcurrency      =  0      (auto = ProcessorCount = 12)
LibraryMetadataRefreshConcurrency =  0      (auto = ProcessorCount = 12)
ParallelImageEncodingLimit        =  0      (auto = ProcessorCount = 12)
EnableNormalizedItemByNameIds     =  True   (correct for 10.10.x)
QuickConnectAvailable             =  False
EnableCaseSensitiveItemIds        =  True
EnableFolderView                  =  False
EnableGroupingIntoCollections     =  False
IsStartupWizardCompleted          =  True
ChapterImageResolution            =  (default)
DummyChapterDuration              =  (default)
ImageExtractionTimeoutMs          =  (default)
LibraryMonitorDelay               =  60
LibraryUpdateDuration             =  30
ActivityLogRetentionDays          =  (default)
```

### Encoding config — full dump

```
EncodingThreadCount             = -1     (auto)
EnableAudioVbr                  = False
MaxMuxingQueueSize              = 2048
EnableThrottling                = False  ← finding 03
ThrottleDelaySeconds            = 180
EnableSegmentDeletion           = False  ← finding 05
SegmentKeepSeconds              = 720
HardwareAccelerationType        = none   ← finding 02
EncoderAppPathDisplay           = /usr/lib/jellyfin-ffmpeg/ffmpeg
VaapiDevice                     = /dev/dri/renderD128 (no Intel iGPU on host)
H264Crf                         = 23
H265Crf                         = 28
EncoderPreset                   = (nil)
EnableHardwareEncoding          = True   (no-op while type=none)
AllowHevcEncoding               = False
AllowAv1Encoding                = False
EnableSubtitleExtraction        = True
HardwareDecodingCodecs          = [h264, vc1]
AllowOnDemandMetadataBasedKeyframeExtractionForExtensions = [mkv]
PreferSystemNativeHwDecoder     = True
EnableEnhancedNvdecDecoder      = True   (no-op while no nvidia)
```
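
The two knobs flagged for findings 03 and 05 are checkbox flips in the dashboard; the equivalent on-disk edit is sketched below (sed on the audited path; only safe with the container stopped, since Jellyfin rewrites its XML on shutdown):

```shell
# Flip EnableThrottling and EnableSegmentDeletion in encoding.xml
# (findings 03 and 05). The dashboard checkboxes are the supported path;
# edit the file directly only while Jellyfin is stopped.
flip_throttling() {
  f="$1"   # e.g. /home/docker/jellyfin/config/config/encoding.xml
  sed -i \
    -e 's|<EnableThrottling>false</EnableThrottling>|<EnableThrottling>true</EnableThrottling>|' \
    -e 's|<EnableSegmentDeletion>false</EnableSegmentDeletion>|<EnableSegmentDeletion>true</EnableSegmentDeletion>|' \
    "$f"
}
```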

### Scheduled tasks

```
Audio Normalization              Idle  Completed  2026-05-08T00:58
Clean Cache Directory            Idle  Completed  2026-05-08T00:58
Clean Log Directory              Idle  Completed  2026-05-08T00:58
Clean Transcode Directory        Idle  Completed  2026-05-08T02:13
Download missing subtitles       Idle  Completed  2026-05-08T00:58
Extract Chapter Images           Idle  Completed  2026-05-08T01:00
Generate Trickplay Images        Idle  Completed  2026-05-08T02:00  (no-op?)
Optimize database                Idle  Completed  2026-05-08T00:58
Refresh People                   Idle  Completed  2026-05-08T00:58
Scan Media Library               Idle  Completed  2026-05-08T03:16
Update Plugins                   Idle  Completed  2026-05-08T02:13
```

## Sign-off

  • Audit: 2026-05-08, read-only, ~25 min wall.
  • No fixes applied. No state mutated. No container restart.
  • Next audit due: 2026-08-08 (quarterly, before LE cert renewal window opens at 2026-08-06).