24 — Storage / Disk-I/O / Filesystem Audit (Read-Only)
Status: read-only audit, executed 2026-05-08 against nullstone (192.168.0.100). Scope: the storage stack underneath Jellyfin on arrflix.s8n.ru. Sibling audits cover color/HDR, server runtime, and edge/network — this file owns LVM, disks, ext4, mount options, image cache, transcode cache, and the RO bind-mount overhead. No writes. No mount changes. No fstrim execution. No cache flushes. No SMART self-tests.
Executive summary
Storage is not the bottleneck. CPU is. Disk I/O across every metric came back fast and healthy. The "loads kinda slow" symptom is almost certainly playback-stall caused by a CPU-only host running 5 concurrent ffmpeg transcodes of the same file at load average 42 — not disk. The storage layer is in the bottom third of the suspect list.
Top three storage-side observations (severity, then quick-win order):
- Single PV / single LV / single NVMe — no isolation between media reads, transcode writes, OS, and Docker overlay churn. Severity Y. Every workload hits /dev/nvme0n1 and the ext4 journal at keystone--vg-home. Today the SSD shrugs it off (2.1 GB/s direct, 1.2 GB/s through the container RO mount), but transcode-write contention with library-scan reads is real — and the box is currently doing 5 concurrent ffmpegs. Quick win: nothing today; investment: split media onto a second LV (or second device) so transcode-write churn does not share an ext4 journal with library-scan reads.
- Read-ahead is 128 KB on the LV (dm-4). Severity Y. Default; fine for sequential 1080p streams from MKV, but would benefit from 512 KB–1 MB for higher-bitrate or scanning workloads. Tiny win, costs 30 seconds. Quick win.
- relatime on /home updates atime on the RO library (the bind mount is RO from the container's view, but the underlying ext4 is RW from the host). Severity G→Y. relatime is the kernel default and only writes ~1 atime update per 24 h per file, so the write cost on a 201-file library is rounding noise. Documented for completeness; not worth fixing.
Ruled out as not-a-problem: rotating disk (it's NVMe), low free space
(62 % used, 146 GiB free — was 90 % at the prior audit, materially
better), inode pressure (6 % used), stale transcodes (zero >60 min
old), image-cache GC thrash (oldest cached image is 16 h old, no
churn), bind-mount overhead (40 % vs raw — but absolute throughput
still ~190× what a 4K HEVC stream needs), SSD wear (8 % used, 100 % spare,
zero media errors), and data=ordered journal write barriers
(NVMe-class device, irrelevant).
1. Disk + LVM topology
Hardware
| Layer | Detail |
|---|---|
| Device | /dev/nvme0n1, Intel SSDPEKKF512G8 NVMe, 476.9 GiB, non-rotational, internal |
| Bus | NVMe |
| Loops (irrelevant) | loop0..loop3 256 M each (snap remnants — empty) |
Single physical drive. No HDDs. No external storage. No NAS mounts. The "media on rotating media" hypothesis (a) is ruled out — everything is on this NVMe.
SMART (NVMe Log 0x02):
| Field | Value |
|---|---|
| Critical Warning | 0x00 |
| Temperature | 43 °C |
| Available Spare | 100 % |
| Percentage Used | 8 % |
| Power-On Hours | 18 597 |
| Power Cycles | 3 729 |
| Unsafe Shutdowns | 774 |
| Media + Data Integrity Errors | 0 |
| Error Log Entries | 0 |
| Data Units Read | 25.7 TB |
| Data Units Written | 25.9 TB |
Drive is healthy, mid-life. No remediation.
Partitions and LVM
nvme0n1 (476.9 GiB, NVMe SSD)
├─ nvme0n1p1 976 M vfat /boot/efi
├─ nvme0n1p2 977 M ext4 /boot
└─ nvme0n1p3 475 G LVM2 PV → keystone-vg
├─ keystone--vg-root 30.4 G ext4 /
├─ keystone--vg-var 11.4 G ext4 /var
├─ keystone--vg-swap_1 24.3 G swap [SWAP]
├─ keystone--vg-tmp 2.8 G ext4 /tmp
└─ keystone--vg-home 406.2 G ext4 /home ← media + jellyfin live here
Single-PV VG, VFree = 0. Cannot grow home without adding
another PV. Note swap is on the same PV as home; under memory
pressure (the prior audit caught 6.8 GiB swap in use) swap traffic
contends with media reads on the same NVMe queue.
Mount table (relevant entries only)
| Source | Mountpoint | FS | Options |
|---|---|---|---|
| keystone--vg-root | / | ext4 | rw,relatime,errors=remount-ro |
| keystone--vg-var | /var | ext4 | rw,nosuid,nodev,relatime |
| keystone--vg-tmp | /tmp | ext4 | rw,nosuid,nodev,noexec,relatime |
| keystone--vg-home | /home | ext4 | rw,nosuid,nodev,**relatime** |
| nvme0n1p2 | /boot | ext4 | rw,relatime |
| nvme0n1p1 | /boot/efi | vfat | rw,relatime,fmask=0077,dmask=0077 |
relatime is the kernel default; strict atime was not in use (good —
pure atime is the real horror). noatime would shave ~1 atime
write per 24 h per accessed file; on a 201-file library that's
sub-noise. Not a remediation candidate. No discard flag (good
— online discard hurts performance; the weekly fstrim.timer is the
right pattern, see §8).
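The "sub-noise" claim can be bounded with simple arithmetic. A minimal sketch — the ~4 KiB of dirtied metadata per atime update is an assumption (one inode-table block), not a measured figure:

```python
# Upper bound on relatime write volume for the 201-file library.
# Assumes (hypothetically) ~4 KiB of dirtied metadata per atime update.
files = 201
updates_per_day = 1           # relatime: at most one update per file per 24 h
bytes_per_update = 4096       # assumed: one 4 KiB inode-table block dirtied
daily_bytes = files * updates_per_day * bytes_per_update
print(f"~{daily_bytes / 1024:.0f} KiB/day of atime writes")  # well under 1 MiB/day
```

Under a megabyte a day against an NVMe device rated in gigabytes per second — rounding noise, as stated.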
Container bind mounts (Jellyfin)
| Host path | Container path | RW |
|---|---|---|
| /home/docker/jellyfin/config | /config | RW |
| /home/docker/jellyfin/cache | /cache | RW |
| /home/user/media | /media | RO |
| /opt/docker/jellyfin/web-overrides/index.html | /jellyfin/jellyfin-web/index.html | RO |
All bind mounts hit the same keystone--vg-home LV — config,
transcode cache, image cache, and media library all share one ext4
journal and one queue.
ext4 features (/dev/keystone--vg-home)
Filesystem features: has_journal ext_attr resize_inode dir_index orphan_file
filetype extent 64bit flex_bg metadata_csum_seed
sparse_super large_file huge_file dir_nlink extra_isize
metadata_csum orphan_present
Default mount options: user_xattr acl
Total journal size: 1024 M (1 GiB — chunky but standard for 400 GiB)
Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
Filesystem state: clean
Last mount time: Sun May 3 23:42:28 2026
Mount count: 8
Block size: 4096
Inode count: 26 624 000
Journal mode is the ext4 default data=ordered (no override in
mountopts). On NVMe with metadata_csum and journal_checksum_v3,
this is fine — would only matter on slow rotational. Hypothesis
(b) "ext4 journal in data=ordered starves reads" is ruled out:
the device is NVMe-class and not the bottleneck.
2. Read throughput (1 large file, raw)
Test file: Rick and Morty (2013) - S01E04 - M. Night Shaym-Aliens.mkv
(1.5 GB, host path /home/user/media/tv/...).
| Test | Bytes | Wall | Throughput |
|---|---|---|---|
| dd … bs=1M count=512 iflag=direct (host, bypasses cache) | 537 MB | 0.258 s | 2.1 GB/s |
| dd … bs=1M count=512 (host, page-cache eligible) | 537 MB | 0.536 s | 1.0 GB/s (still warming) |
| dd … bs=1M count=256 iflag=direct (inside jellyfin, RO bind) | 268 MB | 0.233 s | 1.2 GB/s |
Bind-mount overhead = ~40 % (2.1 → 1.2 GB/s). That's higher than the "bind mounts are free" folklore but absolute throughput still crushes any practical media bitrate (4K HDR HEVC tops out around 50 Mbit/s = 6.25 MB/s; 1.2 GB/s is 190× headroom). Not a bottleneck. Not a remediation candidate.
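The headroom figure is straightforward arithmetic on the audit's own numbers; a minimal check:

```python
# Headroom of measured container-path throughput over a worst-case media
# bitrate (4K HDR HEVC at ~50 Mbit/s, per the audit's estimate).
container_read_bytes_s = 1.2e9        # 1.2 GB/s through the RO bind mount
hevc_4k_bytes_s = 50e6 / 8            # 50 Mbit/s -> 6.25 MB/s
headroom = container_read_bytes_s / hevc_4k_bytes_s
print(f"{headroom:.0f}x headroom")    # the report rounds this to ~190x
```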
3. Random-read latency
ioping not installed on host or in container. Skipped.
Indirect signal: NVMe device-queue stats from /proc/diskstats for
dm-4 (home LV):
reads: 15 003 996 read_sectors: 2 600 976 283 read_ms: 3 384 240
writes: 41 153 214 write_sectors: 1 997 023 232 write_ms: 145 844 732
in-flight: 0 io_ms: 5 153 616
Average per-read service ≈ 0.226 ms, average per-write ≈ 3.5 ms (consistent with NVMe + ext4 journal flush). No queue stalls observed.
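Those per-op service times fall directly out of the /proc/diskstats counters (total ms spent on reads/writes divided by ops completed). A sketch using the dm-4 figures captured above:

```python
# Average service time per I/O from /proc/diskstats-style counters.
def avg_service_ms(ops: int, total_ms: int) -> float:
    """Milliseconds spent per completed operation."""
    return total_ms / ops

# dm-4 (home LV) values captured during this audit:
read_ms = avg_service_ms(15_003_996, 3_384_240)     # reads, read_ms
write_ms = avg_service_ms(41_153_214, 145_844_732)  # writes, write_ms
print(f"avg read {read_ms:.3f} ms, avg write {write_ms:.2f} ms")
```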
4. Cache size breakdown
| Path | Bytes | Notes |
|---|---|---|
| /cache (total) | 84 MB | Entire Jellyfin cache fits in one MP3 album |
| /cache/transcodes | 39–61 MB | Live during audit; 5 concurrent ffmpegs (see §6) |
| /cache/images | 39 MB | 412 files in 16 hash-prefixed dirs |
| /cache/images/resized-images | 39 MB | 0, 1, …, f (16 hex buckets, 18–30 files each) |
| /cache/omdb | 84 KB | Plugin response cache |
| /cache/fontconfig | 36 KB | |
| /cache/attachments | 12 KB | Subtitle/font extracts |
| /cache/imagesbyname | 4 KB | Empty |
Total cache = 84 MB on a 400 GB filesystem. There is no cache pressure. The "cache being garbage-collected mid-page-load" hypothesis (c) is ruled out (oldest cached image timestamp = 2026-05-08 01:12 BST, newest = 17:42 BST = 16.5 h retention with no eviction).
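A breakdown like the table above reduces to a `du -sb`-style walk per subdirectory. A hedged sketch (the function names are illustrative, not a Jellyfin or audit-tooling API):

```python
import os

def dir_bytes(root: str) -> int:
    """Total apparent size of regular files under root (like `du -sb`)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total

def cache_breakdown(cache_root: str) -> dict:
    """Bytes per immediate subdirectory of cache_root."""
    return {entry.name: dir_bytes(entry.path)
            for entry in os.scandir(cache_root) if entry.is_dir()}
```

Run against /cache on the host side of the bind mount, this reproduces the per-path totals without touching anything (read-only stat calls).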
5. Image cache miss-vs-hit timing
Public asset latency from onyx → https://arrflix.s8n.ru:
| URL | Attempt 1 (cold) | Attempt 2 (warm) |
|---|---|---|
| /web/assets/img/icon-transparent.png | 0.227 s | 0.047 s |
| /web/serviceworker.js | 0.059 s | 0.059 s |
| /web/main.jellyfin.bundle.js | 0.092 s | 0.052 s |
5-sample steady state on /web/main.jellyfin.bundle.js = 44–68 ms,
median 49 ms. Traefik + Jellyfin static-asset path is fast.
Direct poster URLs (/Items/{id}/Images/Primary) require an auth
token; could not be probed without a fresh X-Emby-Token. Inferred
from on-disk evidence: the resized-images cache contains 412
WebPs, all under 200 KB, no eviction in the last 16 h. Image cache
serves all current items from disk on warm path.
Hypothesis (c) is ruled out.
6. Stale-transcode detection
/cache/transcodes:
total bytes: 39 MB (was 61 MB earlier in audit, churn = active stream)
total files: 26
files >60 min old: 0
bytes >60 min old: 0 MB
Clean Transcode Directory task last ran 2026-05-08T02:13 (per
audit 13 task list). Currently zero stale transcode segments.
Hypothesis (d) is ruled out — no accumulation.
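The stale-segment check reduces to an mtime scan with an age threshold. A minimal sketch (the threshold and directory are parameters of this illustration, not Jellyfin internals); an equivalent scan during this audit found zero files older than 60 minutes:

```python
import os
import time

def stale_files(directory: str, max_age_min: int = 60):
    """Return (path, size_bytes) for regular files older than max_age_min."""
    cutoff = time.time() - max_age_min * 60
    out = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            st = entry.stat()
            if st.st_mtime < cutoff:
                out.append((entry.path, st.st_size))
    return out
```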
However, 5 concurrent ffmpeg processes are transcoding the same file right now:
PID CPU file
1685478 246% Rick and Morty S01E01 - Pilot.mkv
1686665 203% Rick and Morty S01E01 - Pilot.mkv (same file)
1686651 198% Rick and Morty S01E01 - Pilot.mkv (same file)
1689000 125% Rick and Morty S01E01 - Pilot.mkv (same file)
1689109 120% Rick and Morty S01E01 - Pilot.mkv (same file)
This is a CPU-side issue (no ffmpeg de-dup, no segment throttling — see audit 13 finding 03). It causes:
- Load average 42.62 / 22.84 / 12.32 (12-core box).
- Swap usage 7.8 GiB / 24 GiB.
- I/O wait, however, is 0 % in vmstat (wa = 0).
The host CPU is saturated, not the disk. Storage layer is not this user's bottleneck.
7. Inode + free-space stats
| Filesystem | 1K-blocks | Used | Available | Use % | Inodes | IUsed | IUse % |
|---|---|---|---|---|---|---|---|
| keystone--vg-home (/home) | 418 106 320 | 244 025 392 | 152 768 828 | 62 % | 26 624 000 | 1 489 612 | 6 % |
| keystone--vg-root (/) | — | — | — | — | — | — | — |
| keystone--vg-var (/var) | 12 G | 2.0 G | 8.6 G | 19 % | n/a | n/a | n/a |
Free space went from 40 GiB at audit 13 (90 % full) to 146 GiB now (62 %). Material improvement; the prior "low free space" hypothesis (e) is ruled out. Inode pressure ruled out.
(Note: /home houses /home/user/docker-data/100000.100000/...
which contains all userns-remapped Docker overlay2 trees. The 233 G
used number includes container layers, not just media. Library
itself is 201 files.)
8. fstrim status
fstrim.timer Loaded, enabled, active (waiting)
Last triggered: Sun 2026-05-03 23:42:29 BST
Next trigger: Mon 2026-05-11 01:12:58 BST
fstrim --dry-run /home → /home: 0 B (dry run) trimmed
Weekly trim is configured and recently ran (one week before next
trigger). Dry-run reports 0 B candidate → there is no untrimmed
free space on /home. SSD performance degradation from
unTRIMmed-blocks is not a factor. No discard mount option
(correct — async batched trim via timer is preferred over inline).
9. Read-ahead and queue settings
| Block device | read_ahead_kb | scheduler | nr_requests |
|---|---|---|---|
| nvme0n1 (physical) | 128 KB | [none] mq-deadline | 1023 |
| dm-4 (keystone--vg-home, the LV) | 128 KB | n/a | n/a |
(/sys/block/dm-4 lacks scheduler/nr_requests entries; dm devices inherit from the underlying device.)
128 KB read-ahead is the kernel default. For sequential MKV streams
this is OK; for library-scan workloads (stat + open + read first
chunk per file) it's also OK. Bumping to 512 KB or 1024 KB would
help scan throughput during a Jellyfin library refresh — minor
win, ~30 s of work.
NVMe is using none scheduler (correct for NVMe — multiqueue + no
elevator).
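S01's read-ahead bump can be persisted across reboots with a udev rule; a hedged sketch (the rule path follows §12's suggestion; the match keys and the dm-* wildcard are assumptions — verify kernel names with udevadm info before installing):

```
# /etc/udev/rules.d/60-readahead.rules — hypothetical persistence for S01.
# Sets 512 KiB read-ahead on the NVMe device and its dm children.
ACTION=="add|change", KERNEL=="nvme0n1", ATTR{queue/read_ahead_kb}="512"
ACTION=="add|change", KERNEL=="dm-*", ATTR{queue/read_ahead_kb}="512"
```

Note the LV (dm-4) carries its own read_ahead_kb, so bumping only nvme0n1 would leave the path Jellyfin actually reads through at 128 KB.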
10. RO bind-mount overhead — confirmed
(From §2.) Host direct = 2.1 GB/s. Container RO bind = 1.2 GB/s. Overhead ≈ 40 %, which is higher than expected — likely a side effect of:
- userns remap (100000.100000 shifts uids)
- the nosuid,nodev flags on /home propagating into the bind
- the container's read_ahead_kb not being configurable through a bind mount (inherits 128 KB)

Not actionable today. Both numbers are 100×+ any media bitrate. Documented to rule out hypothesis (f).
atime cost on RO bind: bind mount inherits the host's relatime
semantics — at most one atime write per file per 24 h, gated by
relatime. On 201 files that's ≤ 201 atime writes/day = rounding
noise. Hypothesis (f) ruled out.
11. Concrete remediation list — ranked
Severity legend: R = red (acute, fix this week), Y = yellow (deferred, document risk), G = green (audited, healthy, no action). Effort: S ≤ 30 min, M half-day, L > 1 day.
| # | Severity | Effort | Bucket | Action | Why |
|---|---|---|---|---|---|
| S01 | Y | S | Quick-win | Bump read_ahead_kb on /dev/nvme0n1 to 512 KB (sysfs or udev rule) | Helps library-scan and large-MKV streams. Tiny risk; reverts on reboot if set live. |
| S02 | Y | M | Quick-win | Add noatime (replacing relatime) to the /home mount in /etc/fstab | Eliminates the residual relatime writes; cosmetic but cheap. Requires a remount; do during a window with no playback. |
| S03 | Y | M | Investment | Carve a separate media LV (or attach a second NVMe) for /home/user/media and bind-mount it RO into Jellyfin | Isolates library reads from transcode-write churn and Docker overlay churn on the same ext4 journal. Today it is fine; at scale it will not be. |
| S04 | Y | M | Investment | Move keystone--vg-swap_1 off keystone-vg (or onto a separate device) | Swap is currently 7.8 GiB used and shares the NVMe queue with media reads. CPU saturation is the proximate cause, but cleanly isolating swap helps when CPU finally gets fixed (GPU re-enable, see audit 13 #02). |
| S05 | Y | M | Investment | Add a second PV to keystone-vg so the VG has free space | vgs shows VFree=0. Any future lvextend will fail until a PV is added. Latent ops trap. |
| S06 | G | — | — | Keep weekly fstrim.timer as-is | Healthy, current. |
| S07 | G | — | — | Keep image cache untouched | 84 MB total cache, 16 h retention, no GC pressure. |
| S08 | G | — | — | No change to data=ordered ext4 journal | NVMe; mode is fine. |
The single biggest "loads kinda slow" win lives in audit 13 (finding 03 — enable transcode throttling + segment deletion). Storage is not where this is fixed.
12. Quick-win vs investment
Quick-win (≤30 min total, today)
- S01 — echo 1024 > /sys/block/nvme0n1/queue/read_ahead_kb (or 512). Reverts on reboot; persist via a udev rule under /etc/udev/rules.d/60-readahead.rules. Marginal but free.
- S02 — flip relatime → noatime in /etc/fstab for /home. Cosmetic but cheap. Only do this during a planned window — a bad fstab + reboot is an outage.
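For S02, a sketch of the edited fstab line. The device spec and flag set are reconstructed from the mount table in §1, not read from the live /etc/fstab — confirm the real entry (it may use a UUID) before editing:

```
# /etc/fstab — /home with noatime replacing relatime (hypothetical line)
/dev/mapper/keystone--vg-home  /home  ext4  defaults,nosuid,nodev,noatime  0  2
```

Validate with `mount -fav` (dry run) before any reboot.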
Investment (half-day to multi-day, plan)
- S03 — separate media LV. Requires lvcreate, mkfs, an rsync of the library, and swapping the bind mount in compose. ~half-day. Pays back when (a) the library grows past the current 201 files, or (b) GPU transcode is re-enabled (audit 13 #02) and many concurrent reads start happening.
- S04 — relocate swap. Only meaningful after GPU re-enable closes the CPU-saturation root cause.
- S05 — second PV. Trivial mechanically (pvcreate, vgextend), blocked on having a second device. Defer until needed.
No-op (audited and healthy)
- SMART status (8 % wear, no errors)
- ext4 features and journal mode
- Inode usage (6 %)
- Free space (62 %, 146 GiB headroom)
- Cache size (84 MB total)
- Stale transcodes (zero)
- fstrim.timer (working, candidate-bytes = 0)
- Bind-mount throughput (1.2 GB/s, 190× any 4K stream)
13. Sign-off
- Audit: 2026-05-08, read-only, ~15 min wall.
- No fixes applied. No state mutated. No container restart. No SMART self-test. No fstrim execution. No mount changes.
- Top storage culprit: none. Storage stack is healthy. The "loads kinda slow" symptom is CPU-side (5 concurrent ffmpegs at load 42, audit 13 #02 + #03).
- Top quick-win: S01 — bump read_ahead_kb to 512 KB on nvme0n1 for a marginal scan/stream gain. The real fix lives in audit 13.
- Next audit due: 2026-08-08 (quarterly, with audit 13).