# 22 — Jellyfin Runtime Performance Audit (server scope)

> Status: **read-only audit**, executed 2026-05-08 ~17:30–17:45 BST against
> `https://arrflix.s8n.ru` (Jellyfin 10.10.3 on nullstone, container `jellyfin`).
> Scope: server runtime — CPU, RAM, container limits, FFmpeg, scheduled
> tasks, plugins. Network/edge, storage, color/HDR are out of scope (sibling
> agents). Supplements doc 13 (2026-05-08, host-capacity scan); does not
> repeat findings already in 13 unless the data has materially changed.
> **No fixes applied. No state mutated. No container restart.**

---

## 1. Executive summary — top 3 perf culprits

| # | Culprit | Severity | Evidence (one line) |
|---|---|:-:|---|
| 1 | **4 concurrent ffmpeg processes for ONE viewer**, each upscaling 1080p → 2160p with PGS subtitle burn-in, no throttling, no segment deletion | **CRITICAL** | `ps`: PIDs 1681949 (643 % CPU), 1685275 (135 %), 1685316 (133 %), 1685478 (132 %) — all transcoding `Rick and Morty S01E01.mkv`, all `-vf scale=3840:2160` + `[0:4]overlay` subtitle burn. Container CPU 690–876 % across 3 samples |
| 2 | **Forgejo BlueBuild CI container running uncapped on the same 12-core host** (noisy neighbor) | **HIGH** | `docker stats`: `FORGEJO-ACTIONS-TASK-202_..._Build-push-OCI` 88–99 % CPU, 4.3 GiB RAM, 5 GB net-in. Both jellyfin and the build container have `Memory=0 NanoCpus=0 CpuQuota=0` (no limits). Aggregate load 15.43 / 14.61 / 8.85 on 12 cores |
| 3 | **GPU acceleration still off** (already in doc 13 finding 02; quantified here) — every CPU transcode spawns one ffmpeg burning 6–8 cores per stream because of the 4K-upscale + sub-overlay filtergraph | **HIGH** | `HardwareAccelerationType=none`. Per-ffmpeg cost on this filtergraph: ~6.4 cores at `preset=veryfast`. 2 viewers transcoding = full host pegged |

**Biggest quick-win:** turn on **transcode throttling + segment deletion**
(doc 13 finding 03 already flags this; new evidence here makes it
non-optional). The 4-stream pile-up in §3 is exactly what those two
flags exist to prevent — without them, every client seek/reload spawns a
fresh ffmpeg and the previous one keeps burning a core for up to 720 s
(`SegmentKeepSeconds=720`). Two checkbox flips in Playback settings.
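
For before/after verification, both flags are readable without touching
the UI — a read-only sketch, assuming the config file sits at
`/config/config/encoding.xml` in this container's layout:

```bash
# Read-only: confirm the current throttling / segment-deletion state.
docker exec jellyfin sh -c \
  'grep -E "EnableThrottling|EnableSegmentDeletion|SegmentKeepSeconds" \
     /config/config/encoding.xml'
```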

---

## 2. Resource snapshot (3 samples, 10 s apart)

| Sample @time | jellyfin CPU% | jellyfin MEM | NET I/O | BLOCK I/O | PIDs |
|---|---:|---:|---:|---:|---:|
| t=0 | **834.3 %** | 2.635 GiB / 31.27 GiB (8.42 %) | 5.36 / 158 MB | 1.14 / 855 MB | 101 |
| t=10s | **690.5 %** | 2.637 GiB | 5.37 / 158 MB | 1.22 / 894 MB | 102 |
| t=20s | **876.7 %** | 2.646 GiB | 5.37 / 158 MB | 1.32 / 942 MB | 101 |
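
For reproducibility at the next audit, the three samples came from
one-shot (non-streaming) `docker stats` calls of roughly this shape — a
sketch, with the format string trimmed to the table's columns:

```bash
# Three samples, 10 s apart, matching the columns above.
for i in 1 2 3; do
  docker stats --no-stream --format \
    'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}\t{{.PIDs}}' \
    jellyfin
  sleep 10
done
```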

**Container limits:** `Memory=0 NanoCpus=0 CpuQuota=0 CpuPeriod=0
PidsLimit=<none> RestartPolicy=unless-stopped`. **No CPU or RAM cap on
the jellyfin container.** Same for the Forgejo build container.

**Host (nullstone, AMD Ryzen 5 2600X — 6c/12t, 32 GiB RAM, 24 GiB swap):**
- `uptime`: load avg **15.43 / 14.61 / 8.85** — 1-min load 28 % above
  logical-core count. The 5-min trend confirms sustained load. Doc 13
  logged 11.40 / 9.59 / 6.19 ~13 h ago, so the host has been getting
  *worse*, not better.
- `free -h`: 31 GiB total, 10 GiB used, 8.2 GiB free, 13 GiB buff/cache;
  swap **7.8 GiB / 24 GiB used** (32 %). `SwapCached=771 MB` (the kernel
  is actively servicing swap-ins from cache — a swap-thrash signature).
- `vmstat 1 5`: `r=3–27`, `cs=30 K–41 K/s` (very high context-switch
  rate), `si≤24 KB/s so≈0` (paging in but not out — recovering, not
  thrashing right this second), `us=70–72 % sy=10–13 % id=16–18 %
  wa=0 %`.
- `iostat -x`: `nvme0n1` w/s ≈ 38–433, `wkB/s` ≈ 364–2 272, util
  `0.4 %–0.9 %`. **Disk is not the bottleneck — CPU is.**

**All-container CPU% (sorted, top 5):**

| Container | CPU% | MEM | Notes |
|---|---:|---:|---|
| jellyfin | **773–876** | 2.6 GiB | this audit's target |
| FORGEJO-ACTIONS-TASK-202_..._Build-push-OCI | **88–99** | 4.3 GiB | uncapped CI build, see §1 culprit 2 |
| traefik | 9 | 48 MiB | routine reverse proxy |
| forgejo | 9 | 207 MiB | git web |
| minecraft-mc | 7 | 4 GiB | racked.ru server |
| (28 other containers) | < 5 % combined | | none material |

The two CPU monsters together (jellyfin + bluebuild) account for **~90 %
of the 12-core host's user time** during this audit window.

---

## 3. Active sessions + active transcodes

**Sessions (within last 600 s):** **1**

| User | Client | Device | RemoteIP | NowPlaying | PlayMethod | Pos |
|---|---|---|---|---|---|---|
| s8n | Jellyfin Web | Chrome | 192.168.0.10 | Rick and Morty S01E01 — Pilot | DirectPlay (claimed) / **Transcoding** (actual) | 8 s |

**TranscodingInfo on the active session:**

```
VideoCodec → h264 (libx264, preset=veryfast, crf=23)
AudioCodec → aac (libfdk_aac, 256 kbps stereo, +6 dB volume gain)
Resolution → 3840 × 2160 (UPSCALE — source is 1080p)
Bitrate → 13.8 Mbps
Container → fmp4 / hls
HW → none
Reasons → VideoCodecNotSupported, AudioCodecNotSupported, SubtitleCodecNotSupported
Direct → IsVideoDirect=False, IsAudioDirect=False
Completion → 0 % (just started)
```

**Active ffmpeg processes on host: 4** (all for the same viewer, same
file — see §5).

The session reports `PlayMethod=DirectPlay` while *also* presenting a
`TranscodingInfo` block — Jellyfin's session DTO returns the last-set
state, so this is the client navigating into the page; the actual
decision was **transcode** (the four ffmpeg processes confirm it). The
HLS player sometimes flips `PlayMethod=Transcode` only after the first
segment downloads; the pre-roll state matches the 4-process pile-up in
§5.
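
The mismatch is visible straight from the API — a sketch of the check,
assuming an API key in `$JF_TOKEN` (the shape of the call, not this
audit's exact trace):

```bash
# Pull active sessions; compare claimed vs actual play method.
curl -s 'https://arrflix.s8n.ru/Sessions?activeWithinSeconds=600' \
  -H "Authorization: MediaBrowser Token=\"$JF_TOKEN\"" |
  jq '.[] | {user: .UserName,
             claimed: .PlayState.PlayMethod,
             actual: (if .TranscodingInfo then "Transcode" else "DirectPlay" end)}'
```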

---

## 4. Scheduled tasks

All tasks **Idle**. None in progress. Last-run durations are tiny — no
scheduled task is the culprit. Library scan runs every 6 h (last
`14:14:04`, 0.3 s wall — only 187 items so it converges instantly).

| Name | State | Last end (UTC+1) | Last duration | Trigger |
|---|---|---|---:|---|
| Audio Normalization | Idle | 2026-05-08T00:58 | 0.0 s | IntervalTrigger |
| Clean Cache Directory | Idle | 2026-05-08T00:58 | 0.1 s | IntervalTrigger |
| Clean Log Directory | Idle | 2026-05-08T00:58 | 0.0 s | IntervalTrigger |
| Clean Transcode Directory | Idle | 2026-05-08T16:22 | 0.0 s | StartupTrigger |
| Clean up collections and playlists | Idle | 2026-05-08T16:22 | 0.0 s | StartupTrigger |
| Download missing lyrics | Idle | 2026-05-08T00:58 | 0.1 s | IntervalTrigger |
| Download missing subtitles | Idle | 2026-05-08T00:58 | 0.0 s | IntervalTrigger |
| Extract Chapter Images | Idle | 2026-05-08T01:00 | 0.0 s | DailyTrigger |
| Generate Trickplay Images | Idle | 2026-05-08T02:00 | 0.1 s | DailyTrigger |
| Media Segment Scan | Idle | 2026-05-08T14:14 | 0.0 s | IntervalTrigger |
| Optimize database | Idle | 2026-05-08T00:58 | 0.2 s | IntervalTrigger |
| Refresh Guide | Idle | 2026-05-08T00:58 | 3.2 s | IntervalTrigger |
| Refresh People | Idle | 2026-05-08T00:58 | 0.3 s | IntervalTrigger |
| Scan Media Library | Idle | 2026-05-08T14:14 | 0.3 s | IntervalTrigger |
| TasksRefreshChannels | Idle | 2026-05-08T00:58 | 0.1 s | IntervalTrigger |
| Update Plugins | Idle | 2026-05-08T16:22 | 1.2 s | StartupTrigger |
| Clean Activity Log / Keyframe Extractor / Migrate Trickplay Image Location | Idle | (never run) | — | — |
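
The table is a cleaned-up view of the `/ScheduledTasks` endpoint — a
sketch of the pull (same `$JF_TOKEN` assumption as §3):

```bash
# Every task with its state and last-run status, tab-separated.
curl -s 'https://arrflix.s8n.ru/ScheduledTasks' \
  -H "Authorization: MediaBrowser Token=\"$JF_TOKEN\"" |
  jq -r '.[] | [.Name, .State, .LastExecutionResult.Status] | @tsv'
```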

**Container restarted at 16:22:06 today** (StartupTrigger task end-times
imply a restart — last audit had `StartedAt=02:13:01`, doc 13 finding 30
expected 0 restarts). The operator likely restarted the container about
an hour before this audit window; not material to perf, but worth
noting.

**Verdict:** culprit (a) "scheduled task hogging CPU" → **ruled out**.

---

## 5. FFmpeg processes on host (snapshot)

**4 simultaneous ffmpeg processes, all transcoding the same source for
the same viewer.** This is the smoking gun. The process tree from inside
the container shows just `1 jellyfin` (parent) + `1579 ffmpeg` + `1725
ffmpeg` (the others were still spawning); host `ps -ef` shows four
ffmpeg processes owned by `user` (uid 1000).

| PID | %CPU | %MEM | RSS | etime | What | Subs filter |
|---:|---:|---:|---:|---:|---|---|
| 1681949 | **643** | 6.9 | 2.27 GB | 53 s | `-ss 33s` HLS seek | **yes** — `[0:4]scale,scale=3840:2160:fast_bilinear[sub] ; [0:0]scale=3840:2160 [main] ; overlay` |
| 1685275 | **135** | 4.4 | 1.45 GB | 6 s | `-ss 15s` HLS seek | yes — same chain |
| 1685316 | **133** | 4.4 | 1.45 GB | 6 s | full transcode (no -ss) | no — plain `setparams + scale + format=yuv420p` |
| 1685478 | **132** | 3.9 | 1.29 GB | 4 s | full transcode `-canvas_size 1920x1080` | yes — same chain |
| 1669243 (earlier sample, since exited) | ~759 | 7.0 | 2.30 GB | 254 s | full transcode | no |

**What every ffmpeg is doing:**
- Decoding the 1080p source (H.264 — the Pilot is an x264 Blu-ray rip).
- **Upscaling video to 3840×2160 with `scale=...:fast_bilinear`.**
- **Burning PGS subtitle stream `0:4`, ALSO upscaled to 3840×2160, onto
  the video.** This is the heaviest overlay path the JF filtergraph
  produces.
- Re-encoding to H.264 `libx264 preset=veryfast crf=23 high@L5.1` with
  `maxrate=13.5 Mbps`.
- `-threads 0` (= use all cores), `-max_muxing_queue_size 2048`.
- HLS fmp4 segments to `/cache/transcodes/<sessionId><n>.mp4`.

**Why 4 of them at once for one user:** every time the client seeks or
reloads, JF starts a new ffmpeg with a new sessionId and a new segment
file prefix. Because `EnableThrottling=false` and
`EnableSegmentDeletion=false` (doc 13 findings 03/05), the old ffmpeg
keeps producing segments to its own cache prefix and **does not exit
until `SegmentKeepSeconds=720` elapses**. Observed cache prefixes right
now: `8e8a8538…`, `ef1caecc…` (already produced segments 0–30 →
~73 MiB), `3ba3fce4…`, `b6f150cb…`, `fcc6137e…` — five session-IDs
across the last ~5 minutes for one viewer.
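
A sketch of the two checks behind these numbers (cache path as mounted
in the container; the prefix-trimming regex is illustrative):

```bash
# Count live ffmpeg transcodes on the host…
pgrep -fc 'ffmpeg.*transcodes'

# …and list the distinct session prefixes in the transcode cache.
docker exec jellyfin sh -c \
  'ls /cache/transcodes | sed -E "s/[0-9]+\.mp4$//" | sort -u'
```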

**Why each ffmpeg is so expensive:**
- 1080p → 4K upscale ≈ 4× pixel volume.
- The PGS subtitle stream is also being scaled to 4K and overlaid (alpha
  blend) every frame.
- `libfdk_aac` 256 kbps is fine; the cost is essentially all video.
- On 12 logical cores at `preset=veryfast`, this filtergraph costs
  **~6.4 cores per ffmpeg** (643 % observed). Two simultaneous
  transcodes = full host. Four = swap thrash + the load avg of 15.

**Why is it upscaling to 4K at all?** Likely the client requested a
profile that picked the "max bitrate / max-resolution" capability of
the device (a desktop Chrome will report 4K-capable). The Jellyfin
ladder is either (a) "always pick highest profile" or (b) the user's
client is set to "Auto" with no max-resolution cap. **No client-side
bitrate cap is set on this user** (doc 13 reported
`RemoteClientBitrateLimit=0`). Combine that with PGS subs the client
can't render → forced burn-in → the 4K-overlay tax kicks in.
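
The server-wide cap is checkable read-only — a sketch, same `$JF_TOKEN`
assumption as §3 (`0` means unlimited):

```bash
# RemoteClientBitrateLimit: 0 = no cap, otherwise bits/sec.
curl -s 'https://arrflix.s8n.ru/System/Configuration' \
  -H "Authorization: MediaBrowser Token=\"$JF_TOKEN\"" |
  jq .RemoteClientBitrateLimit
```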

**ffprobe storms:** at 13:31 the log shows **7 simultaneous ffprobe
calls** (Mandalorian S2 episodes, all at once); at 17:37 **another 7
simultaneous ffprobes** (Mandalorian S3). Each ffprobe runs with
`-analyzeduration 200M -probesize 1G`, so it may read up to 1 GiB of
input while probing. Cause: the operator clicked into the season 2/3
page → JF kicks off subtitle-search for every episode at once because
`LibraryMetadataRefreshConcurrency=0` (0 ⇒ default to core count, here
12). Doc 13 finding 14 already calls for the concurrency-cap fix; this
audit confirms the symptom.
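
The bursts are easy to spot with a per-minute tally — a sketch, assuming
the default log line format where the second whitespace-separated field
is the time of day:

```bash
# Per-minute ffprobe launch counts from today's log.
docker exec jellyfin sh -c \
  "grep -F 'ffprobe' /config/log/log_20260508.log |
     awk '{print substr(\$2, 1, 5)}' | sort | uniq -c | sort -rn | head"
```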

**Verdict:** the **single biggest user-visible "loads kinda slow"** is
the 4K-upscale subtitle-burn pile-up.

---

## 6. Plugin status

All 6 plugins **Active**. None in Faulted/Restart. No exception loops in
the log from plugin assemblies.

| Name | Version | Status |
|---|---|---|
| AudioDB | 10.10.3.0 | Active |
| MusicBrainz | 10.10.3.0 | Active |
| OMDb | 10.10.3.0 | Active |
| Open Subtitles | 20.0.0.0 | Active *(but mis-configured — see §7)* |
| Studio Images | 10.10.3.0 | Active |
| TMDb | 10.10.3.0 | Active |

**Verdict:** culprit (e) "plugin throwing repeated exceptions in log
spam loop" → **partially confirmed for OpenSubtitles** (it throws on
every probe — 234 today already), but the cost is per-probe RTT, not
sustained CPU. The fix is doc 13 finding 04.

---

## 7. Log error / warning summary (last 24 h, today's `log_20260508.log`)

`/config/log/log_20260508.log` is **3 968 lines**. Filtered tally:

| Pattern | Count today | Notes |
|---|---:|---|
| `[ERR]` total | **266** | |
| `[WRN]` total | **124** | |
| `Error downloading subtitles from "Open Subtitles"` | **234** | doc 13 finding 04 — `Username/Password` empty, throws `AuthenticationException` per file probed; **88 % of all errors today are this one cause** |
| `No space left on device : '/config/metadata/library/...'` | **2** | at 13:53:10 — transient ENOSPC during a metadata write; the disk is now 62 % full (146 GiB free), so this was a short-lived pressure burst (probably 73 MiB+ of transcode segments accumulating in `/cache/transcodes` while a metadata write tried to extend a small file). **Worth watching** but not the current bottleneck |
| `Invalid username or password entered` (auth fail) | 5 | three distinct minutes — looks like a user retrying creds, not a brute-forcer |
| `WS ... error receiving data` (websocket abrupt close) | ~25 | normal: clients closing tabs / dropping carrier. Noise, not a defect |
| `Compiling a query which loads related collections...` (EF Core warning, slow query) | 1 | EF Core's `QuerySplittingBehavior` warning — Jellyfin upstream issue, harmless on this dataset |
| `task was canceled` on `/videos/.../hls1/main/-1.mp4` | 1 (17:41) | client gave up mid-segment-init — same 499 family as doc 13's evidence |
| `SQLITE_BUSY` / `database is locked` | **0** | culprit (d) DB lock contention → **ruled out** |
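
For re-running the tally at the next audit — a sketch of the pipeline
(the counts above came from variations of this):

```bash
# Error/warning tallies from today's log.
L=/config/log/log_20260508.log
docker exec jellyfin sh -c "
  grep -c '\[ERR\]' $L
  grep -c '\[WRN\]' $L
  grep -c 'Error downloading subtitles' $L
  grep -c 'No space left on device' $L
  grep -c -i 'SQLITE_BUSY\|database is locked' $L
"
```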

**Verdict:**
- culprit (e) "plugin log spam" → confirmed (234 OS errors / day — a
  scan or paging into a season triggers a loop of failures).
- culprit (d) "DB lock contention" → ruled out (0 SQLITE_BUSY).
- the **2 ENOSPC errors are NEW vs doc 13** and warrant tracking — see
  the §9.3 watch-list.

---

## 8. DB and cache sizes

```
/config/data/jellyfin.db 288 K (was 208 K in doc 13 — fine)
/config/data/library.db 3.4 M (was 3.3 M — fine)
/config/data/library.db-wal 6.2 M (was 4.4 M — STILL LARGER THAN MAIN, see below)
/config/data/library.db-shm 32 K
/config/metadata 99 M (was 92 M — fine)
/config/log 4.2 M (was 1.3 M — 3× growth in 14 h driven by §7 OS spam)
/cache/transcodes 84 M / 43 files (snapshot)
/cache total not measurable from in-container du (mount appears empty due to bind layout)
```

**library.db-wal (6.2 MB) is now ~1.8× the main `.db` (3.4 MB).** Doc 13
finding 08 already raised this — the situation is slightly worse now
(the WAL grew faster than the main file over 14 h). Cause: SQLite
checkpoints when the database goes idle, but with near-continuous
transcode + ffprobe activity and library refreshes there is rarely an
idle moment to checkpoint. **A manual `Optimize database` run will
collapse the WAL into the main file.**
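
If a maintenance window opens before the next audit, the WAL can also be
folded back by hand — a sketch, assuming a `sqlite3` binary is available
(in the container, or on the host against the bind-mounted file), run
only while Jellyfin is stopped or fully idle, since it writes:

```bash
# NOT run during this audit. Forces a full checkpoint + WAL truncation.
sqlite3 /config/data/library.db 'PRAGMA wal_checkpoint(TRUNCATE);'
```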

**`/cache/transcodes` 84 MB / 43 files** is the residue of three-plus
abandoned ffmpeg sessions. Without `EnableSegmentDeletion=true`, this
grows unchecked for `SegmentKeepSeconds=720` per session. Worst case:
4 zombie sessions × 720 s × 13.5 Mbps ÷ 8 ≈ **4.9 GB of transient
cache** from a single one-viewer pile-up. **This is exactly how the
13:53 ENOSPC happened** (cache + metadata fighting for the same
146-GiB free pool).

---

## 9. Concrete remediation list (ranked impact / effort)

### 9.1 Quick-wins (rank 1 → 4 — all minutes of work, all reversible config toggles)

1. **Flip two transcode flags** (Settings → Playback; a scripted
   alternative is sketched after this list):
   - `EnableThrottling = true`
   - `EnableSegmentDeletion = true`
   *Effect:* a zombie ffmpeg from a stale session is killed instead of
   producing 720 s of segments after the client has moved on. **This
   single change directly addresses §5's 4-process pile-up.** Doc 13
   already noted this; the new evidence escalates it from "S effort,
   cleanup" to **"non-optional"**.

2. **Cap the concurrency knobs** (Settings → Server / Library):
   - `LibraryScanFanoutConcurrency = 4`
   - `LibraryMetadataRefreshConcurrency = 4`
   - `ParallelImageEncodingLimit = 4`
   *Effect:* the 7-up ffprobe bursts at 13:31 / 17:37 (§5) are capped at
   4 parallel probes instead of 12. Doc 13 already noted this as S
   effort.

3. **Set `RemoteClientBitrateLimit`** (Dashboard → Playback → Streaming
   → "Internet streaming bitrate limit"):
   - Suggest `8 Mbps` (covers 1080p Blu-ray rips, kills 4K-upscale
     decisions on remote sessions). LAN clients that want full bitrate
     can be flagged via per-user policy.
   *Effect:* the 13.8 Mbps maxrate-on-WAN session becomes an 8 Mbps
   session that **doesn't need the 4K-upscale path** — JF stops asking
   ffmpeg to produce 3840×2160. **This is what makes §5's per-stream
   cost drop by ~70 %.** Independent of GPU.

4. **Disable the Open Subtitles plugin OR populate its creds** (already
   in doc 13 finding 04). Removes 234 ERR/day, restores log signal,
   removes the per-probe RTT.
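
A scripted path for quick-win 1, for whoever applies it — a sketch
against the named-configuration API, assuming the 10.10 endpoint shape
and that the JSON field names match the `EncodingOptions` keys above:

```bash
# Flip throttling + segment deletion via the encoding config endpoint.
JF=https://arrflix.s8n.ru
AUTH="Authorization: MediaBrowser Token=\"$JF_TOKEN\""
curl -s "$JF/System/Configuration/encoding" -H "$AUTH" |
  jq '.EnableThrottling = true | .EnableSegmentDeletion = true' |
  curl -s -X POST "$JF/System/Configuration/encoding" -H "$AUTH" \
       -H 'Content-Type: application/json' --data @-
```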

### 9.2 Investments (rank 5 → 7 — half-day to multi-day, structural)

5. **Add CPU + memory limits to BOTH `jellyfin` and the Forgejo
   `BlueBuild` build container in compose** — currently both are
   uncapped, fighting for the same 12 cores (a live stopgap is sketched
   after this list). Suggest:
   - `jellyfin`: `cpus: 8.0`, `mem_limit: 12G`, `mem_reservation: 4G`
   - `forgejo-runner` build pods: `cpus: 4.0`, `mem_limit: 8G`
   *Effect:* a noisy CI build cannot drag interactive playback
   latency to the floor; the viewer always has 8 cores even when
   BlueBuild is hot. Note that the BlueBuild container is short-lived
   (forgejo-actions spawns it per job), so the limit goes in the
   runner's `container_options` in the runner config, not on a static
   compose service.

6. **Re-enable GPU transcoding on the host** (doc 13 finding 02 — L
   effort; a precondition check is sketched after this list). With
   H.264 NVENC at preset `p4`, the same filtergraph collapses from
   ~6.4 CPU cores to ~0.3 CPU cores + GPU. Without a GPU, the §9.1
   quick-wins are the ceiling; with one, the host can serve 4
   simultaneous viewers comfortably.

7. **Cap the maximum supported resolution in client policy** (Dashboard
   → Users → each user → Playback → "Maximum allowed video bitrate" /
   "Maximum allowed video resolution"). Set non-admin users to
   `1080p` max. This closes the foot-gun where any client says "I can
   do 4K" and Jellyfin obliges with a 4K-upscale CPU bomb.

### 9.3 Watch-list (no immediate action, monitor next audit)

- ENOSPC at 13:53 (only 2 occurrences, host has 146 GiB free now, so
  it was a transient pressure burst). Re-check post-quick-wins (1 + 2
  remove the cache pile-up that caused it).
- `library.db-wal` 1.8× main db — manual `Optimize database` after the
  above tasks finish, or tighten its schedule from 24 h to 6 h.
- Container restart at 16:22 (was 02:13 in doc 13) — was this operator-
  initiated, or did `unless-stopped` re-spin a crash? Check
  `docker logs jellyfin --since 6h` for `panic`/`crash` next time.

---

## 10. Quick-win vs investment summary

| Bucket | Action | Effort | Expected impact |
|---|---|---|---|
| **Quick-win** | Throttling + SegmentDeletion ON | 2 clicks | Kills §5 zombie ffmpegs immediately; expected load avg drop 40–50 % under one active viewer |
| **Quick-win** | Concurrency caps 12 → 4 | 3 fields | Removes the 7-up ffprobe bursts at season-page navigation |
| **Quick-win** | RemoteClientBitrateLimit = 8 Mbps | 1 field | Stops Jellyfin choosing 4K-upscale paths for WAN clients; ~70 % drop in per-stream CPU |
| **Quick-win** | OpenSubs disable / cred | 30 sec | 234 ERR/day → 0; cleaner log; faster library scans |
| **Investment** | Compose CPU/MEM caps for jellyfin + bluebuild | 30 min compose + 1 restart per container | Removes noisy-neighbor head-of-line blocking by the CI runner |
| **Investment** | GPU transcode reactivation | days (driver work, host) | 20× per-stream CPU efficiency on the 1080p-and-up paths |
| **Investment** | Per-user max-resolution policy | 5 min × N users | Prevents admin foot-gun and any future invitee from triggering the 4K-upscale path |

---

## Appendix — raw evidence

### Container limits (the absence is the finding)

```
docker inspect jellyfin --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}
{{.HostConfig.CpuQuota}} {{.HostConfig.CpuPeriod}}
{{.HostConfig.PidsLimit}} {{.HostConfig.RestartPolicy.Name}}'
→ 0 0 0 0 <no value> unless-stopped
```

### Host CPU + load + memory

```
nproc: 12
lscpu Model: AMD Ryzen 5 2600X Six-Core Processor (6c / 12t, NUMA0=0–11)
uptime: 17:42:14 up 4 days 17:59, 2 users, load average: 15.43, 14.61, 8.85
free -h: total 31Gi, used 10Gi, free 8.2Gi, buff/cache 13Gi
swap total 24Gi, used 7.8Gi (32 %), SwapCached 789 976 kB
vmstat 1 5 (us / sy / id / wa, last sample): 71 / 13 / 16 / 0
r=11, b=1, cs ≈ 30 K/s
iostat (nvme0n1): 38–433 w/s, 364–2 272 wkB/s, util 0.4–0.9 % — disk idle
```

### Top processes on host (snapshot)

```
ps -eo pid,user,pcpu,pmem,rss,etimes,args --sort=-pcpu | head:
1681949 user 643 % 6.9 % 2.30 GB 53 s ffmpeg [Rick & Morty S01E01, 4K-upscale + sub burn]
1662267 root 52 % 0.1 % — 426 s fuse-overlayfs (BlueBuild rootfs mount)
1661952 root 36 % 0.1 % — 431 s fuse-overlayfs (BlueBuild rootfs)
1485847 git 8 % 0.8 % 266 MB — gitea web (forgejo)
364785 user 8 % 2.6 % 867 MB — openclaw-gateway
1901802 java 8 % 12.7 % 4.2 GB — minecraft jvm (-Xmx14336M)
1660709 root 7 % 0.3 % 100 MB 442 s buildah build (BlueBuild)
1626511 user 4 % 1.6 % 544 MB — /jellyfin/jellyfin (server proc)
```

### All 4 active ffmpegs (full filter chain shown for the heaviest one)

```
PID 1681949 (643 % CPU):
-ss 33s -noaccurate_seek -canvas_size 1920x1080
-i Rick.and.Morty.S01E01.mkv
-threads 0 -map 0:0 -map 0:1 -map -0:0
-codec:v libx264 -preset veryfast -crf 23 -maxrate 13546858 -bufsize 27093716
-profile:v high -level 51
-filter_complex
[0:4]scale,scale=3840:2160:fast_bilinear[sub] ;
[0:0]setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,
scale=trunc(min(max(iw,ih*a),min(3840,2160*a))/2)*2
:trunc(min(max(iw/a,ih),min(3840/a,2160))/2)*2,
format=yuv420p[main] ;
[main][sub]overlay=eof_action=pass:repeatlast=0
-codec:a libfdk_aac -ac 2 -ab 256000 -af volume=2
-f hls -hls_time 3 -hls_segment_type fmp4 ...
```

### Sessions API (exactly 1 user, mismatched `PlayMethod` vs `TranscodingInfo`)

```
GET /Sessions?activeWithinSeconds=600 → 1 session
user=s8n client=Jellyfin Web/Chrome remote=192.168.0.10
PlayMethod=DirectPlay (claimed)
TranscodingInfo:
VideoCodec=h264 AudioCodec=aac Container=fmp4/hls
3840x2160 @ 13.8 Mbps HW=none IsVideoDirect=False IsAudioDirect=False
Reasons = [VideoCodecNotSupported, AudioCodecNotSupported, SubtitleCodecNotSupported]
Completion = 0.0 %
```

### Scheduled tasks (none in progress)

(Full table in §4 — every task is `Idle`, last-run durations 0–3.2 s.)

### Plugins (all 6 Active, none faulted)

```
AudioDB 10.10.3.0 Active
MusicBrainz 10.10.3.0 Active
OMDb 10.10.3.0 Active
Open Subtitles 20.0.0.0 Active ← 234 ERR/day from auth-empty creds (doc 13 finding 04)
Studio Images 10.10.3.0 Active
TMDb 10.10.3.0 Active
```

### Log tally (today's `log_20260508.log`, 3 968 lines)

```
[ERR] lines: 266
[WRN] lines: 124
"Error downloading subtitles from Open Subtitles": 234 ← 88 % of all ERR
"No space left on device": 2 ← 13:53:10, transient
"Invalid username or password entered" (login): 5
"WS ... error receiving data": ~25 ← noise
"task was canceled" / 499: 1 ← 17:41
"SQLITE_BUSY" / "database is locked": 0
EF Core "QuerySplittingBehavior" warning: 1 ← upstream JF
```

### Disk (host vs container view)

```
host df -h /home: 399G 233G 146G 62 % (was 90 % in doc 13 — improved)
host df -i /home: ~1.49M used / ~26.6M 6 % inodes healthy
container df -h /config /cache /media: same FS, same 146G free
```

### Items / counts

```
GET /Items/Counts → MovieCount=2 SeriesCount=6 EpisodeCount=181
ArtistCount=0 ProgramCount=0 TrailerCount=0
SongCount=0 AlbumCount=0 MusicVideoCount=0
```

### Container restart (StartedAt today)

```
Implied from ScheduledTasks where Trigger=StartupTrigger:
Clean Transcode Directory → end 16:22:06 ← container start ≈ 16:22:05
Clean up collections and playlists → end 16:22:06
Update Plugins → end 16:22:07
(doc 13 had StartedAt = 02:13:01)
```

### Forgejo BlueBuild container (noisy neighbor, no limits)

```
docker stats: CPU 88–99 % MEM 4.3 GiB NET in 5 GB BLOCK in/out 296 MB / 35.3 GB
docker inspect: Memory=0 NanoCpus=0 CpuQuota=0 ← uncapped
```

---

## Sign-off

- Audit: 2026-05-08, read-only, ~15 min wall.
- No fixes applied. No state mutated. No container restart. No plugins
  reloaded. No tasks executed.
- Scope respected: server runtime only. Color/HDR, edge/network, and
  storage findings deferred to sibling agents.
- Next audit due: **2026-08-08** (quarterly, paired with doc 13).