# 13 — Optimization Audit (Read-Only)
> Status: **read-only audit**, executed 2026-05-08 against
> `https://arrflix.s8n.ru` (Jellyfin 10.10.3 on nullstone). Scope: scan
> for performance, capacity, reliability, and ops-hygiene risks. **No
> fixes applied. No state mutated. No container restarts.**

The audit took about 25 minutes of wall-clock time. Inputs: Jellyfin
REST API (auth `X-Emby-Token: 76858153…f8b1`), `docker exec jellyfin`,
`docker logs {traefik,jellyfin} --since 1h/6h/24h`, host `free`, `df`,
`uptime`, `nvidia-smi`, and the on-disk Jellyfin XML configs.

---
## Executive summary
1. **Host is under serious memory pressure right now.** `uptime` shows
load average **11.40 / 9.59 / 6.19** on a 12-core box, **6.8 GiB of
swap is in use** (out of 24 GiB), and `/home` is **90 % full
(40 GiB free of 399 GiB)**. Jellyfin itself is fine
(522 MiB / 31 GiB cap, no restarts), but the host it lives on is
loaded enough that any media ingest at scale will start swap-thrashing.
This is the single biggest risk to playback latency.
2. **GPU transcode is dead and confirmed dead.** `nvidia-smi` fails on
host, `lsmod | grep nvidia` returns empty, `/dev/nvidia*` does not
exist. `EnableHardwareEncoding=true` and `HardwareAccelerationType=none`
in `encoding.xml` is harmless but misleading — the toggle is on, but
the type selector is `none`, so every transcode goes through ffmpeg
software path. Two HLS segment requests this hour returned **499**
(client cancelled mid-transcode at 6.4 s and 2.9 s wall) — that is
the playback-stalls signature.
3. **OpenSubtitles plugin is logging an error per file probed during
library scan** (102 errors in last 6 h) because `Username` and
`Password` are empty in the plugin XML. Every Scan Media Library run
tries Open Subtitles, fails on auth, logs an `ERR`, retries on the
next file. This is pure log noise + wasted RTT, not data loss, but
it bloats `/config/log` and obscures real warnings.
4. **Transcode throttling is OFF and `MaxMuxingQueueSize` is 2048.**
On a CPU-only deploy, that means a stalled client with a high-bitrate
AV1/HEVC source will keep ffmpeg burning a full core for up to
`SegmentKeepSeconds=720` s after the client gives up. `EnableThrottling`
should be on for a CPU deploy; this would have prevented the 499s
seen above.
5. **No automated backup of `/home/docker/jellyfin/config/`.** The
Cineplex CSS, the 5 user accounts + permissions, the library
metadata, and the Open Subtitles plugin install all live in one
unprotected directory tree. The repo's `snapshots/` only captures the
pre-ElegantFin migration baseline; nothing on disk is being rotated
off-host.
---
## Findings table
Severity legend: **R** = red (acute, fix this week), **Y** = yellow
(deferred fix, document risk), **G** = green (audited, healthy, no
action). Effort: **S** ≤ 30 min, **M** half-day, **L** > 1 day.

| # | Category | Severity | Evidence | Recommendation | Effort |
|---|---|:-:|---|---|:-:|
| 01 | Host capacity | **R** | `uptime` load 11.40 / 9.59 / 6.19 on 12 cores; swap 6.8 GiB used / 24 GiB; `/home` 90 % full | Identify swap hog (likely not Jellyfin — only 522 MiB RSS); reclaim space on `/home`; budget media additions against the 40 GiB headroom | M |
| 02 | GPU transcode | **R** | `nvidia-smi` fails, no `/dev/nvidia*`, `lsmod` no nvidia mod; `HardwareAccelerationType=none` | Reinstall nvidia driver on nullstone host; once `nvidia-smi` works, add device reservation block to compose and flip `HardwareAccelerationType` to `nvenc` | L |
| 03 | Transcode throttling | **R** | `EnableThrottling=false`, `ThrottleDelaySeconds=180`, `MaxMuxingQueueSize=2048`, **two 499 client-cancels** logged (6 439 ms / 2 890 ms) | Enable `EnableThrottling=true` and `EnableSegmentDeletion=true` for CPU-only era — caps wasted ffmpeg CPU after client disconnect | S |
| 04 | OpenSubtitles auth | **R** | `Username`/`Password` empty in `Jellyfin.Plugin.OpenSubtitles.xml`; **102** `Error downloading subtitles from Open Subtitles` lines / 6 h | Set creds via UI, OR disable the provider on both libraries (`EnableInternetProviders=false` already; subtitle search still runs). Doc 03-subtitles.md already calls this out as pending | S |
| 05 | Cache trash budget | **Y** | `EnableSegmentDeletion=false`, `SegmentKeepSeconds=720`; `/cache/transcodes` only 20 K right now (no live stream), but a 4K HEVC→h264 session will fill GiBs and not auto-prune | Enable `EnableSegmentDeletion=true` (default 720 s keep is fine) — pairs with finding 03 | S |
| 06 | Backup posture | **R** | `/home/docker/jellyfin/config/` (104 MB) has no off-host rotation; `snapshots/` in repo only holds pre-ElegantFin baseline | Add a weekly `tar.zst` of `/config/` (excluding `log/`, `cache/`) to NAS or git-backed snapshot dir | M |
| 07 | Disk pressure | **Y** | `/home` 90 % full, 40 GiB free of 399 GiB; `/home/user/media` only 189 files | Cap media growth: at the current free space and episode bitrate budget, the user has room for roughly 3 to 4 more series before the disk fills | M |
| 08 | DB WAL ratio | **Y** | `library.db`=3.3 MB, `library.db-wal`=4.4 MB (WAL > main, uncheckpointed). `Optimize database` last ran 2026-05-08T00:58 (OK) but a fresh scan completed 03:16 left WAL fat | Either trigger a manual `Optimize database` post-scan, or shorten its schedule to "after every full scan". WAL > main is normal during/after a scan but should checkpoint on idle | S |
| 09 | Custom CSS bloat | **Y** | `CustomCss` in `branding.xml` is **25 225 bytes**, 17 `!important`, sole `@import` is `MRunkehl/cineplex@v1.0.6` (jsDelivr) | jsDelivr import adds 1 round-trip + ~50 KB on every cold cache load. Inline the import for offline-resilience and one-fewer DNS hop. Also doc 11 already flags this as the wrong theme (Cineplex, not NeutralFin) — resolve theme race first | M |
| 10 | SPA shim cost | **G** | `web-overrides/index.html` 58 KB; runs **2× MutationObserver** + **1× setInterval(1000ms)** with `lockTitle/lockFavicon/nukeSettings`; cost ~1 ms per tick | Acceptable for a single-tab branding shim; would be a problem only on background tabs at scale. No action | — |
| 11 | Service worker | **G** | `/web/serviceworker.js` 768 bytes, last modified 2024-11-19 (Jellyfin 10.10.3 ship date), serves with `cache-control: no-store` (HTTPS, etag set). Notification-only SW (per doc 10) | No action — it is small and not caching `index.html` so cannot pin stale branding | — |
| 12 | Metrics endpoint | **G** | `EnableMetrics=false` | Off is correct for a single-server box. No action | — |
| 13 | Slow-response warning | **Y** | `EnableSlowResponseWarning=true`, threshold **500 ms**. Two transcoding 499s above 2.8 s would normally trigger this warning, but I see 0 `slow` lines in 1 h logs | Either Jellyfin's slow log only fires on synchronous request handlers (not HLS segment GETs), or warning suppressed by another setting. Worth confirming threshold semantics | S |
| 14 | Library scan concurrency | **Y** | `LibraryScanFanoutConcurrency=0`, `LibraryMetadataRefreshConcurrency=0`, `ParallelImageEncodingLimit=0` (all defaults; auto = `ProcessorCount`) | On a 12-core box already at load 11+, `0` (= 12) for all three is aggressive. Cap each at 4 to 6 to leave headroom for Forgejo/Traefik/etc | S |
| 15 | Realtime monitor | **Y** | Both libraries have `EnableRealtimeMonitor=true`; only 189 files; `LibraryMonitorDelay=60` | Fine for current size, but inotify watches grow with file count. Re-evaluate at 10 k+ files | — |
| 16 | Trickplay / chapter previews | **G** | `EnableTrickplayImageExtraction=false`, `ExtractChapterImagesDuringLibraryScan=false`, `EnableChapterImageExtraction=false`, `ExtractTrickplayImagesDuringLibraryScan=false` (all libs) | Disabled on both libraries — saves significant CPU. No action. (Note: scheduled task `Generate Trickplay Images` still ran 02:00 — check it is a no-op when libs say no) | — |
| 17 | Photos library | **G** | `EnablePhotos=false` on both | Correct for a movies/TV deploy. No action | — |
| 18 | Plugin set | **G** | 6 plugins active (AudioDB, MusicBrainz, OMDb, OpenSubtitles, StudioImages, TMDb). `Username/Password` empty for OMDb (= no key, falls back to anon rate limit) and TMDb (`TmdbApiKey` empty — falls back to bundled key) | Both tolerated. AudioDB + MusicBrainz unused (no music libs) but cost zero idle. Consider removing for minimalism, not perf | — |
| 19 | Admin user policy | **R** | `s8n` admin has `EnableRemoteControlOfOtherUsers=true`, `EnableContentDeletion=true` (correct for admin) but **also `IsHidden=true`** | Hidden admin is non-standard; usually a hidden admin is reserved for automation. If `s8n` is the operator's daily account, `IsHidden=false` is the convention. Low risk, just unusual | S |
| 20 | Non-admin policies | **Y** | All 4 non-admin users (`5`, `guest`, `house`, `marco`) have `EnableContentDownloading=true`, `EnableMediaConversion=true`, `EnableLiveTvManagement=true`, `EnableSharedDeviceControl=true`, `IsHidden=true` | LiveTvManagement on accounts with no Live TV is dead weight, no harm. ContentDownloading + MediaConversion let any user kick off transcodes — a foot-gun on a CPU-only host. Review desired stance | S |
| 21 | Login disclaimer leak | **G** | `LoginDisclaimer` = "Welcome to ARRFLIX - Private invite only service" | Public-facing string is intentional per doc 09. No action | — |
| 22 | Public WAN exposure | **Y** | `EnableRemoteAccess=true`, `no-guest@file` middleware **dropped** in compose (per doc 09 §1.2). 24 h log: 270 LAN reqs, **59 reqs from 157.143.84.87, 1 from 82.31.156.86** | Doc 09 confirms this is intentional. The 157.143.84.87 hits are bot-style asset-prober 404s — harmless but confirms the service is internet-reachable. No action; re-verify rate limit / fail2ban once router port-forward is active | — |
| 23 | Splashscreen size | **Y** | `/config/data/splashscreen.png` is **3.0 MB** | A splash image of 3 MB is large for a PNG; lossless re-encode or downscale to ≤500 KB; saves on first-paint over WAN | S |
| 24 | Log rotation | **G** | `LogFileRetentionDays=3`; `/config/log` 1.3 MB; rotation working | No action | — |
| 25 | Splashscreen flag | **Y** | `SplashscreenEnabled=true` in `branding.xml` | Intentional for branding, no action — pairs with finding 23 (just shrink the file) | — |
| 26 | Cache breakdown | **G** | `/cache/images` 15 MB (entire cache 15 MB); `/config/metadata` 92 MB; `/config/data` 12 MB; `/config/plugins` 128 KB | Healthy small footprint. No action | — |
| 27 | Forgejo log noise | **Y** | Traefik logs show `forgejo@docker` returning **401** for `s8n/ARRFLIX.git/info/refs?service=git-receive-pack` 8× / hour from 192.168.0.10 | Out of scope for this deploy but indicates a stale `git push` retry loop on onyx — surfaces here only because we're scanning traefik logs. Mention to operator separately | — |
| 28 | Path substitutions | **G** | `system.xml` empty `<PathSubstitutions />` and `<CorsHosts />` | Correct (no NFS/SMB indirection, no cross-origin clients). No action | — |
| 29 | LiveTV residue | **G** | `DisableLiveTvChannelUserDataName=true`; no Live TV configured; per-user `EnableLiveTvAccess=true` is dead weight | Cosmetic; no perf cost. No action | — |
| 30 | Container restart count | **G** | `docker inspect` `RestartCount=0`, `Status=running`, `StartedAt=2026-05-08T02:13:01` (~2 h uptime, healthy) | No action. (Boot was at 02:13, suggests the compose was applied for doc-09 WAN flip and ran clean since) | — |
| 31 | Network XML hygiene | **Y** | `KnownProxies` empty, `LocalNetworkSubnets` empty, `LocalNetworkAddresses` empty | Jellyfin can't tell the Traefik 172.20.0.0/16 docker net from random WAN — every external IP is logged as remote, which inflates Jellyfin's geoIP/session bookkeeping. Set `KnownProxies=172.20.0.0/16` and `LocalNetworkSubnets=192.168.0.0/24` | S |
| 32 | TLS cert | **G** | LE cert valid `2026-05-08 → 2026-08-06` (89 days remaining), issued by R13, Gandi DNS-01 resolver, in `acme.json` | Healthy. No action | — |
| 33 | Request-rate posture | **G** | 81 req / hour total via traefik; 62 of those are `jellyfin@docker`. Top src 192.168.0.10 (LAN, the operator), then 157.143.84.87 (asset-prober 404s) | Low rate. No action — re-evaluate if WAN exposure draws more traffic | — |
| 34 | Idle session count | **G** | `/Sessions` returns 2 idle (s8n + guest) on 192.168.0.10; no playback in flight at audit time | No action | — |
| 35 | Item counts | **G** | 2 movies, 6 series, 169 episodes; matches `find /media -type f` (189 files, accounting for non-video extras) | Library scan is healthy; counts converged | — |
---
## Recommended fix order (top 5 by impact-per-effort)
1. **Finding 03 — enable transcode throttling + segment deletion.**
*Effort: S (two checkboxes in Playback settings).* Closes the
highest-cost behaviour we have evidence of (the two HTTP 499 client
cancels at 6.4 s and 2.9 s wall). Saves CPU cycles per stalled client.
2. **Finding 04 — set OpenSubtitles credentials, OR disable
provider.** *Effort: S.* Removes 102 ERR/6 h of log spam, fixes
subtitle download, immediately restores log signal.
3. **Finding 31 — populate `KnownProxies` + `LocalNetworkSubnets` in
`network.xml`.** *Effort: S.* Restores accurate session origin
reporting; needed before any rate-limiting or fail2ban work post-WAN.
4. **Finding 14 — cap `LibraryScanFanoutConcurrency`,
`LibraryMetadataRefreshConcurrency`, `ParallelImageEncodingLimit`
to between 4 and 6.** *Effort: S.* Stops a future scan piling on top of the
existing host load (currently 11.4).
5. **Finding 06 — automate `/config/` backup.** *Effort: M.* Single
highest-blast-radius risk: a corrupt `library.db` or a `branding.xml`
regression and you've lost the user accounts AND the theme work in
one go. A weekly `tar.zst` to NAS closes this.

GPU re-enable (finding 02) would unlock more wins but is **L** effort
and lives outside Jellyfin (host driver work). Throttling (#03) is the
right CPU-era patch until then.
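
The weekly archive suggested in finding 06 could be assembled along these lines. This is a minimal sketch, not a deployed job: the destination `/mnt/nas/jellyfin-backups` and the helper name `build_backup_command` are hypothetical, and it assumes GNU tar built with `--zstd` support. It only builds the argument list; nothing is executed until the list is handed to something like `subprocess.run`.

```python
from datetime import date
from pathlib import PurePosixPath

def build_backup_command(config_dir: PurePosixPath,
                         dest_dir: PurePosixPath,
                         day: date) -> list[str]:
    """Return a GNU tar argv that archives config_dir, skipping log/ and cache/."""
    archive = dest_dir / f"jellyfin-config-{day.isoformat()}.tar.zst"
    return [
        "tar", "--zstd", "-cf", str(archive),
        "-C", str(config_dir),   # store paths relative to the config dir
        "--exclude=./log",       # excluded per finding 06
        "--exclude=./cache",     # excluded per finding 06
        ".",
    ]

cmd = build_backup_command(
    PurePosixPath("/home/docker/jellyfin/config"),
    PurePosixPath("/mnt/nas/jellyfin-backups"),  # hypothetical NAS mount
    date(2026, 5, 8),
)
print(" ".join(cmd))
```

Because `library.db` is a live SQLite database, an archive taken while Jellyfin is writing can capture the main file and its WAL at different moments; scheduling the job for an idle window reduces that risk.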
---
## Out of scope (audited and found healthy)
- **Service worker** (`/web/serviceworker.js`, 768 B, notification-only,
not caching index.html — finding 11).
- **Container restart count** (0 — finding 30).
- **TLS cert chain** (89 days valid — finding 32).
- **Trickplay / chapter / photo extraction** (all disabled — findings
16, 17).
- **Log rotation** (3-day retention working, 1.3 MB /config/log —
finding 24).
- **Cache directory growth** (15 MB total, healthy — finding 26).
- **Plugin set** (6 plugins, all idle-cheap — finding 18).
- **Idle session footprint** (2 idle web sessions, no playback in
flight — finding 34).
- **Item count convergence** (Items/Counts matches filesystem —
finding 35).
- **Path substitution / CORS hygiene** (empty as expected — finding 28).
- **Login disclaimer string** (per-doc-09 intentional public-facing
text — finding 21).
---
## Appendix — raw evidence
### Host
```
uptime: 04:18:55 up 4 days, 4:36, 3 users, load average: 11.40, 9.59, 6.19
nproc: 12
free -h: total 31Gi, used 9.2Gi, free 5.8Gi, swap used 6.8Gi / 24Gi
df -h /home: 399G total, 339G used, 40G avail (90 % full)
```
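
To make the load figure comparable across hosts, it helps to normalise it by core count. The sketch below parses the `uptime` line quoted above and divides by `nproc`; the function names are illustrative only.

```python
import re

def parse_load_averages(uptime_line: str) -> tuple[float, float, float]:
    """Extract the 1-, 5- and 15-minute load averages from an `uptime` line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in uptime output")
    one, five, fifteen = (float(g) for g in match.groups())
    return one, five, fifteen

def load_per_core(load: float, cores: int) -> float:
    """Load average divided by core count; values near 1.0 mean saturation."""
    return load / cores

line = "04:18:55 up 4 days, 4:36, 3 users, load average: 11.40, 9.59, 6.19"
one, five, fifteen = parse_load_averages(line)
print(round(load_per_core(one, 12), 2))  # 0.95
```

A 1-minute value of 0.95 per core with a lower 15-minute value (6.19 / 12 ≈ 0.52) indicates the load was still climbing at audit time.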
### Container
```
docker stats jellyfin (no-stream):
CPU 0.01 %, MEM 521.5 MiB / 31.27 GiB (1.63 %), PIDS 24, NET 83.8 MB / 361 MB
docker inspect: Restarts=0, Started=2026-05-08T02:13:01Z, Status=running
```
### GPU
```
nvidia-smi: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver
lsmod | grep nvidia: (no matches)
ls /dev/nvidia*: No such file or directory
encoding.xml: HardwareAccelerationType=none, EnableHardwareEncoding=true
```
### Disk
```
/config 104 M (data 12M, metadata 92M, log 1.3M, plugins 128K)
/cache 15 M (images 15M, transcodes 20K, fontconfig 36K, omdb 84K)
/home/docker/jellyfin: not visible (sudo blocked); inferred from container view
```
### Database
```
jellyfin.db 208 K (WAL 473 K, SHM 32 K)
library.db 3.3 M (WAL 4.4 M, SHM 32 K) <- WAL > main
keyframes/ 16 K
splashscreen.png 3.0 M
```
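
The "WAL > main" observation in finding 08 can be turned into a repeatable read-only check. This is a sketch under the assumption that the WAL sits next to the database with a `-wal` suffix (the standard SQLite layout); the helper names and the 1.0 threshold are illustrative choices, not Jellyfin settings.

```python
from pathlib import Path
from typing import Optional

def wal_ratio(db_path: Path) -> Optional[float]:
    """Return size(WAL) / size(main DB), or None if a file is missing
    or the main database file is empty."""
    wal_path = db_path.with_name(db_path.name + "-wal")
    if not db_path.is_file() or not wal_path.is_file():
        return None
    main_size = db_path.stat().st_size
    if main_size == 0:
        return None
    return wal_path.stat().st_size / main_size

def wal_is_fat(db_path: Path, threshold: float = 1.0) -> bool:
    """True when the WAL is larger than `threshold` times the main file."""
    ratio = wal_ratio(db_path)
    return ratio is not None and ratio > threshold
```

With the sizes recorded above (4.4 MB WAL against a 3.3 MB main file) the ratio is about 1.33, so `wal_is_fat` would report `True` for `library.db`. The check only calls `stat`, so it stays within the read-only scope of this audit.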
### Traefik (last 1 h)
```
total log lines: 279
jellyfin@docker requests: 62
status 499 (client cancel): 2 (HLS segments, 6439 ms + 2890 ms)
status 5xx: 0
top source IPs (jellyfin):
82.31.156.86 123 (own WAN egress, hairpin)
82.131.116.123 122 (external — likely friend / scanner)
192.168.0.10 13 (operator LAN)
173.244.58.11 2 (cloud scanner)
35.203.85.72 1 (Google security scan)
```
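
The 499 count above could be reproduced with a small filter. This sketch assumes Traefik writes its access log in JSON format, which the audit does not state; `ServiceName`, `DownstreamStatus`, `RequestPath` and `Duration` (nanoseconds) are Traefik's JSON access-log field names, while the sample lines and request path are made up for illustration.

```python
import json
from typing import Iterable

def client_cancels(lines: Iterable[str],
                   service: str = "jellyfin@docker") -> list[tuple[str, float]]:
    """Return (request path, duration in ms) for each 499 from one service."""
    hits = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup messages
        if not isinstance(entry, dict):
            continue
        if entry.get("ServiceName") == service and entry.get("DownstreamStatus") == 499:
            duration_ms = entry.get("Duration", 0) / 1e6  # ns -> ms
            hits.append((entry.get("RequestPath", ""), duration_ms))
    return hits

# Hypothetical log lines, shaped like Traefik JSON access-log entries.
sample = [
    '{"ServiceName": "jellyfin@docker", "DownstreamStatus": 499, '
    '"RequestPath": "/videos/example/hls1/main/0.ts", "Duration": 6439000000}',
    '{"ServiceName": "jellyfin@docker", "DownstreamStatus": 200, '
    '"RequestPath": "/web/index.html", "Duration": 12000000}',
    "time=... level=info msg=not-json",
]
print(client_cancels(sample))
```

If the deployment uses Traefik's default Common Log Format instead, the same idea applies but the parsing step would need a regular expression rather than `json.loads`.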
### Jellyfin (last 6 h)
```
"Error downloading subtitles from Open Subtitles": 102
"slow" / "throttl" matches: 1 (false positive, no real slow-warn)
Container restart events: 0
```
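
The counts above amount to substring matching over `docker logs` output. A minimal sketch of that tally follows; the sample log lines are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

def count_matches(lines: Iterable[str], needles: Iterable[str]) -> Counter:
    """Count log lines containing each needle (case-insensitive substring)."""
    needles = list(needles)
    counts = Counter({needle: 0 for needle in needles})
    for line in lines:
        lowered = line.lower()
        for needle in needles:
            if needle.lower() in lowered:
                counts[needle] += 1
    return counts

# Hypothetical log excerpt; real lines carry timestamps and logger names.
sample_log = [
    "[ERR] Error downloading subtitles from Open Subtitles",
    "[INF] Scan Media Library Completed",
    "[ERR] Error downloading subtitles from Open Subtitles",
]
needles = ["Error downloading subtitles from Open Subtitles", "slow"]
print(count_matches(sample_log, needles))
```

Substring matching is also why the single "slow" / "throttl" hit needed manual review: any line that merely contains the word is counted.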
### TLS
```
Subject: CN=arrflix.s8n.ru
Issuer: C=US, O=Let's Encrypt, CN=R13
Valid: 2026-05-08 00:58:11 GMT → 2026-08-06 00:58:10 GMT (89 d)
Resolver: letsencrypt (Gandi DNS-01)
```
### Service worker
```
URL: https://arrflix.s8n.ru/web/serviceworker.js
HTTP: 200, content-type text/javascript
Size: 768 bytes
Last-Modified: Tue, 19 Nov 2024 03:43:48 GMT (Jellyfin 10.10.3 ship)
Headers: HSTS preload + nosniff + frame=SAMEORIGIN + xss-protection
```
### CSS / branding
```
/Branding/Configuration:
CustomCss bytes: 25 225
!important rules: 17
sole @import: https://cdn.jsdelivr.net/gh/MRunkehl/cineplex@v1.0.6/cineplex.css
LoginDisclaimer: "Welcome to ARRFLIX - Private invite only service"
SplashscreenEnabled: True
on disk:
/config/config/branding.xml 25 584 bytes
```
### SPA shim
```
/opt/docker/jellyfin/web-overrides/index.html 58 725 bytes
MutationObserver count: 2 (one head/title-favicon, one body/nukeSettings)
setInterval count: 1 (1000 ms — relocks title + favicon + nukeSettings)
```
### Users
```
# users: 5
admin (s8n): IsHidden=true, EnableRemoteControlOfOtherUsers=true, EnableContentDeletion=true
non-admin (5, guest, house, marco): IsHidden=true, EnableContentDownloading=true,
EnableMediaConversion=true, EnableLiveTvManagement=true
```
### Plugins
```
AudioDB 10.10.3.0 Active
MusicBrainz 10.10.3.0 Active RateLimit=1, ReplaceArtistName=false
OMDb 10.10.3.0 Active CastAndCrew=false
Open Subtitles 20.0.0.0 Active Username/Password empty, CredentialsInvalid=false
Studio Images 10.10.3.0 Active
TMDb 10.10.3.0 Active TmdbApiKey empty
```
### Library options (both libs)
```
EnableRealtimeMonitor = True
ExtractChapterImagesDuringLibraryScan = False
EnableTrickplayImageExtraction = False
EnablePhotos = False
SaveLocalMetadata = False
EnableInternetProviders = False
SkipSubtitlesIfAudioTrackMatches = True
SaveSubtitlesWithMedia = True
ExtractTrickplayImagesDuringLibraryScan= False
```
### Network XML
```
EnableHttps=false (TLS handled by Traefik) | EnableUPnP=false | EnableRemoteAccess=true
KnownProxies=(empty) LocalNetworkSubnets=(empty) LocalNetworkAddresses=(empty)
IgnoreVirtualInterfaces=true VirtualInterfaceNames=[veth]
EnablePublishedServerUriByRequest=false
```
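
The effect of the values proposed in finding 31 can be illustrated with Python's `ipaddress` module. This is only a model of the intended classification, not Jellyfin's own logic; the two subnets are the ones recommended in the findings table, and `classify` is a hypothetical helper.

```python
import ipaddress

# Subnets proposed in finding 31.
PROXY_NET = ipaddress.ip_network("172.20.0.0/16")   # Traefik docker network
LAN_NET = ipaddress.ip_network("192.168.0.0/24")    # operator LAN

def classify(ip: str) -> str:
    """Label an address as 'proxy', 'lan', or 'remote'."""
    addr = ipaddress.ip_address(ip)
    if addr in PROXY_NET:
        return "proxy"
    if addr in LAN_NET:
        return "lan"
    return "remote"

for ip in ("172.20.0.5", "192.168.0.10", "157.143.84.87"):
    print(ip, classify(ip))
```

With both lists empty, as audited, every address falls through to the last branch, which is the behaviour finding 31 describes.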
### System config — performance knobs
```
LogFileRetentionDays = 3
EnableMetrics = False
EnableSlowResponseWarning = True (threshold 500 ms)
RemoteClientBitrateLimit = 0 (no cap)
LibraryScanFanoutConcurrency = 0 (auto = ProcessorCount = 12)
LibraryMetadataRefreshConcurrency = 0 (auto = ProcessorCount = 12)
ParallelImageEncodingLimit = 0 (auto = ProcessorCount = 12)
EnableNormalizedItemByNameIds = True (correct for 10.10.x)
QuickConnectAvailable = False
EnableCaseSensitiveItemIds = True
EnableFolderView = False
EnableGroupingIntoCollections = False
IsStartupWizardCompleted = True
ChapterImageResolution = (default)
DummyChapterDuration = (default)
ImageExtractionTimeoutMs = (default)
LibraryMonitorDelay = 60
LibraryUpdateDuration = 30
ActivityLogRetentionDays = (default)
```
### Encoding config — full dump
```
EncodingThreadCount = -1 (auto)
EnableAudioVbr = False
MaxMuxingQueueSize = 2048
EnableThrottling = False ← finding 03
ThrottleDelaySeconds = 180
EnableSegmentDeletion = False ← finding 05
SegmentKeepSeconds = 720
HardwareAccelerationType = none ← finding 02
EncoderAppPathDisplay = /usr/lib/jellyfin-ffmpeg/ffmpeg
VaapiDevice = /dev/dri/renderD128 (no Intel iGPU on host)
H264Crf = 23
H265Crf = 28
EncoderPreset = (nil)
EnableHardwareEncoding = True (no-op while type=none)
AllowHevcEncoding = False
AllowAv1Encoding = False
EnableSubtitleExtraction = True
HardwareDecodingCodecs = [h264, vc1]
AllowOnDemandMetadataBasedKeyframeExtractionForExtensions = [mkv]
PreferSystemNativeHwDecoder = True
EnableEnhancedNvdecDecoder = True (no-op while no nvidia)
```
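
The three flagged settings can be checked mechanically. The sketch below assumes `encoding.xml` stores each option as a child element of the document root, named as in the dump above; the sample XML is a trimmed, hypothetical excerpt and `audit_encoding` is an illustrative name.

```python
import xml.etree.ElementTree as ET

def audit_encoding(xml_text: str) -> list[str]:
    """Return notes for findings 02, 03 and 05 based on encoding.xml content."""
    root = ET.fromstring(xml_text)

    def value(tag: str) -> str:
        # Missing elements are treated as empty strings.
        return (root.findtext(tag) or "").strip().lower()

    notes = []
    if value("HardwareAccelerationType") in ("", "none"):
        notes.append("finding 02: software transcode only")
    if value("EnableThrottling") != "true":
        notes.append("finding 03: throttling disabled")
    if value("EnableSegmentDeletion") != "true":
        notes.append("finding 05: segment deletion disabled")
    return notes

# Trimmed, hypothetical excerpt mirroring the audited values.
SAMPLE = """<EncodingOptions>
  <EnableThrottling>false</EnableThrottling>
  <EnableSegmentDeletion>false</EnableSegmentDeletion>
  <HardwareAccelerationType>none</HardwareAccelerationType>
  <EnableHardwareEncoding>true</EnableHardwareEncoding>
</EncodingOptions>"""

for note in audit_encoding(SAMPLE):
    print(note)
```

Run against the audited values, all three notes are emitted; after the fixes for findings 03 and 05, only the finding 02 note would remain until the GPU driver is restored.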
### Scheduled tasks
```
Audio Normalization Idle Completed 2026-05-08T00:58
Clean Cache Directory Idle Completed 2026-05-08T00:58
Clean Log Directory Idle Completed 2026-05-08T00:58
Clean Transcode Directory Idle Completed 2026-05-08T02:13
Download missing subtitles Idle Completed 2026-05-08T00:58
Extract Chapter Images Idle Completed 2026-05-08T01:00
Generate Trickplay Images Idle Completed 2026-05-08T02:00 (no-op?)
Optimize database Idle Completed 2026-05-08T00:58
Refresh People Idle Completed 2026-05-08T00:58
Scan Media Library Idle Completed 2026-05-08T03:16
Update Plugins Idle Completed 2026-05-08T02:13
```
---
## Sign-off
- Audit: 2026-05-08, read-only, ~25 min wall.
- No fixes applied. No state mutated. No container restart.
- Next audit due: **2026-08-08** (quarterly). This falls two days after
the current LE cert expires on 2026-08-06, so confirm automatic renewal
separately before that date.