# 24 — Storage / Disk-I/O / Filesystem Audit (Read-Only)

> Status: **read-only audit**, executed 2026-05-08 against `nullstone`
> (192.168.0.100). Scope: storage stack underneath Jellyfin on
> `arrflix.s8n.ru`. Sibling audits cover color/HDR, server runtime, and
> edge/network — this file owns LVM, disks, ext4, mount opts, image cache,
> transcode cache, and the RO bind-mount overhead.
>
> **No writes. No mount changes. No fstrim execution. No cache flushes.
> No SMART self-tests.**

---

## Executive summary

**Storage is not the bottleneck. CPU is.** Disk I/O across every metric came
back fast and healthy. The "loads kinda slow" symptom is almost certainly
playback stall caused by a CPU-only host running 5 concurrent ffmpeg
transcodes of the same file at load average 42 — not disk. The storage layer
is in the bottom third of the suspect list.

Top three storage-side observations (severity, then quick-win order):

1. **Single PV / single LV / single NVMe — no isolation between media reads,
   transcode writes, OS, and Docker overlay churn.** Severity **Y**. Every
   workload hits `/dev/nvme0n1` and the ext4 journal at `keystone--vg-home`.
   Today the SSD shrugs it off (2.1 GB/s direct, 1.2 GB/s through the
   container RO mount), but transcode-write contention with library-scan
   reads is real — and the box is currently doing 5 concurrent ffmpegs.
   **Quick win: nothing today; investment: split media onto a second LV (or
   a second device) so transcode-write churn does not share an ext4 journal
   with library-scan reads.**
2. **Read-ahead is 128 KB on the LV (`dm-4`).** Severity **Y**. Fine for
   sequential 1080p MKV streams; higher-bitrate or scanning workloads would
   benefit from **512 KB–1 MB**. Tiny win, costs 30 seconds. **Quick win.**
3. **`relatime` on `/home` updates atime on the RO library (the bind mount
   is RO from the container's view, but the underlying ext4 is RW from the
   host).** Severity **G→Y**.
   `relatime` is the kernel default and only writes ~1 atime update per 24 h
   per file, so the write cost on a 201-file library is rounding noise.
   Documented for completeness; **not worth fixing**.

Ruled out as not-a-problem: rotating disk (it's NVMe), low free space (62 %
used, 146 GiB free — was 90 % at the prior audit, materially better), inode
pressure (6 % used), stale transcodes (zero >60 min old), image-cache GC
thrash (oldest cached image is 16 h old, no churn), bind-mount overhead
(40 % vs raw — but absolute throughput is still ~190× what a 4K HEVC stream
needs), SSD wear (8 % used, 100 % spare, zero media errors), and
`data=ordered` journal write barriers (NVMe-class device, irrelevant).

---

## 1. Disk + LVM topology

### Hardware

| Layer | Detail |
|---|---|
| Device | `/dev/nvme0n1`, **Intel SSDPEKKF512G8 NVMe**, 476.9 GiB, non-rotational, internal |
| Bus | NVMe |
| Loops (irrelevant) | `loop0..loop3`, 256 M each (snap remnants — empty) |

Single physical drive. **No HDDs. No external storage. No NAS mounts.** The
"media on rotating media" hypothesis (a) is **ruled out** — everything is on
this NVMe.

SMART (NVMe Log 0x02):

| Field | Value |
|---|---|
| Critical Warning | `0x00` |
| Temperature | 43 °C |
| Available Spare | 100 % |
| Percentage Used | **8 %** |
| Power-On Hours | 18 597 |
| Power Cycles | 3 729 |
| Unsafe Shutdowns | 774 |
| Media + Data Integrity Errors | **0** |
| Error Log Entries | 0 |
| Data Units Read | 25.7 TB |
| Data Units Written | 25.9 TB |

Drive is healthy, mid-life. No remediation.

### Partitions and LVM

```
nvme0n1 (476.9 GiB, NVMe SSD)
├─ nvme0n1p1   976 M  vfat  /boot/efi
├─ nvme0n1p2   977 M  ext4  /boot
└─ nvme0n1p3   475 G  LVM2 PV → keystone-vg
   ├─ keystone--vg-root     30.4 G  ext4  /
   ├─ keystone--vg-var      11.4 G  ext4  /var
   ├─ keystone--vg-swap_1   24.3 G  swap  [SWAP]
   ├─ keystone--vg-tmp       2.8 G  ext4  /tmp
   └─ keystone--vg-home    406.2 G  ext4  /home  ← media + jellyfin live here
```

Single-PV VG, **VFree = 0**. Cannot grow `home` without adding another PV.
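As a quick arithmetic cross-check (not one of the audit commands): the LV
sizes in the tree above should account for essentially the whole 475 G PV,
which is consistent with `VFree = 0`. A minimal sketch using the figures
quoted above:

```python
# Cross-check: sum of LV sizes vs. PV size (figures from the tree above).
lvs_gib = {
    "root": 30.4,
    "var": 11.4,
    "swap_1": 24.3,
    "tmp": 2.8,
    "home": 406.2,
}
pv_gib = 475.0  # nvme0n1p3

allocated = sum(lvs_gib.values())
print(f"allocated: {allocated:.1f} G of {pv_gib:.0f} G PV")
# LVs consume the PV to within rounding of the reported sizes — hence VFree = 0.
assert abs(allocated - pv_gib) < 1.0
```

The ~0.1 G discrepancy is rounding in the reported sizes, not actual free
extents.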
Note swap is **on the same PV** as `home`; under memory pressure (the prior
audit caught 6.8 GiB of swap in use) swap traffic contends with media reads
on the same NVMe queue.

### Mount table (relevant entries only)

| Source | Mountpoint | FS | Options |
|---|---|---|---|
| `keystone--vg-root` | `/` | ext4 | `rw,relatime,errors=remount-ro` |
| `keystone--vg-var` | `/var` | ext4 | `rw,nosuid,nodev,relatime` |
| `keystone--vg-tmp` | `/tmp` | ext4 | `rw,nosuid,nodev,noexec,relatime` |
| `keystone--vg-home` | `/home` | ext4 | `rw,nosuid,nodev,**relatime**` |
| `nvme0n1p2` | `/boot` | ext4 | `rw,relatime` |
| `nvme0n1p1` | `/boot/efi` | vfat | `rw,relatime,fmask=0077,dmask=0077` |

`relatime` is the kernel default; **strict `atime` is not in use** (good —
full `atime`, which writes on every read, is the actual horror). `noatime`
would shave ~1 atime write per 24 h per file accessed; on a 201-file library
that's sub-noise. **Not a remediation candidate.** No `discard` flag (good —
online discard hurts performance; the weekly `fstrim.timer` is the right
pattern, see §8).

### Container bind mounts (Jellyfin)

| Host path | Container path | RW |
|---|---|---|
| `/home/docker/jellyfin/config` | `/config` | RW |
| `/home/docker/jellyfin/cache` | `/cache` | RW |
| `/home/user/media` | `/media` | **RO** |
| `/opt/docker/jellyfin/web-overrides/index.html` | `/jellyfin/jellyfin-web/index.html` | RO |

All bind mounts hit the same `keystone--vg-home` LV — config, transcode
cache, image cache, and media library all share one ext4 journal and one
queue.
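To put a number on "sub-noise": a back-of-envelope upper bound on the daily
write cost of `relatime` on the library. The one-update-per-file-per-day
figure is the `relatime` worst case, and the assumption that each atime
update dirties one journaled 4 KiB metadata block (the filesystem block size,
see §1) is a deliberately pessimistic simplification — real cost is lower
because inode updates batch.

```python
# Pessimistic bound on relatime write traffic for the media library (sketch).
files = 201          # library size from this audit
block_bytes = 4096   # ext4 block size on keystone--vg-home
updates_per_day = 1  # relatime: at most ~1 atime update per file per 24 h

daily_bytes = files * updates_per_day * block_bytes
print(f"{daily_bytes / 1024:.0f} KiB/day")  # prints "804 KiB/day"
# ~0.8 MiB/day worst case — rounding noise against a drive at 8 % wear
# after 25.9 TB written.
```

Even the pessimistic bound supports the "not worth fixing" call on S02.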
### ext4 features (`/dev/keystone--vg-home`)

```
Filesystem features:   has_journal ext_attr resize_inode dir_index orphan_file
                       filetype extent 64bit flex_bg metadata_csum_seed
                       sparse_super large_file huge_file dir_nlink extra_isize
                       metadata_csum orphan_present
Default mount options: user_xattr acl
Total journal size:    1024 M (1 GiB — chunky but standard for 400 GiB)
Journal features:      journal_incompat_revoke journal_64bit journal_checksum_v3
Filesystem state:      clean
Last mount time:       Sun May  3 23:42:28 2026
Mount count:           8
Block size:            4096
Inode count:           26 624 000
```

Journal mode is the ext4 default `data=ordered` (no override in mountopts).
On NVMe with `metadata_csum` and `journal_checksum_v3`, this is **fine** — it
would only matter on slow rotational storage. Hypothesis (b) "ext4 journal in
`data=ordered` starves reads" is **ruled out**: the device is NVMe-class and
not the bottleneck.

---

## 2. Read throughput (1 large file, raw)

Test file: `Rick and Morty (2013) - S01E04 - M. Night Shaym-Aliens.mkv`
(1.5 GB, host path `/home/user/media/tv/...`).

| Test | Bytes | Wall | Throughput |
|---|---|---|---|
| `dd … bs=1M count=512 iflag=direct` (host, bypasses cache) | 537 MB | 0.258 s | **2.1 GB/s** |
| `dd … bs=1M count=512` (host, page-cache eligible) | 537 MB | 0.536 s | 1.0 GB/s (still warming) |
| `dd … bs=1M count=256 iflag=direct` (inside `jellyfin`, RO bind) | 268 MB | 0.233 s | **1.2 GB/s** |

**Bind-mount overhead ≈ 40 %** (2.1 → 1.2 GB/s). That is higher than the
"bind mounts are free" folklore, but absolute throughput still crushes any
practical media bitrate (4K HDR HEVC tops out around 50 Mbit/s = 6.25 MB/s;
1.2 GB/s is **~190× headroom**). **Not a bottleneck. Not a remediation
candidate.**

---

## 3. Random-read latency

`ioping` not installed on host or in container. Skipped.
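With `ioping` unavailable, the raw `dd` timings from §2 are the next-best
check. A minimal sketch re-deriving the throughput, overhead, and headroom
figures from the unrounded numbers (the text rounds to 2.1/1.2 GB/s, ~40 %,
and ~190×; the exact values come out slightly higher on overhead and lower
on headroom):

```python
# Re-derive §2's throughput/overhead/headroom from the raw dd numbers.
host_direct = 537 / 0.258   # MB/s: host read with iflag=direct
container_ro = 268 / 0.233  # MB/s: same file through the RO bind mount

overhead = 1 - container_ro / host_direct
hevc_4k = 6.25              # MB/s (≈ 50 Mbit/s 4K HDR HEVC ceiling)
headroom = container_ro / hevc_4k

print(f"host direct : {host_direct:.0f} MB/s")
print(f"container RO: {container_ro:.0f} MB/s")
print(f"overhead    : {overhead:.0%}")
print(f"headroom    : {headroom:.0f}x over a 4K HEVC stream")
```

Either way the conclusion holds: the overhead is real but the absolute
number is far beyond any media bitrate.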
Indirect signal: NVMe device-queue stats from `/proc/diskstats` for `dm-4`
(the home LV):

```
reads:   15 003 996   read_sectors:  2 600 976 283   read_ms:    3 384 240
writes:  41 153 214   write_sectors: 1 997 023 232   write_ms: 145 844 732
in-flight: 0          io_ms:         5 153 616
```

Average per-read service ≈ **0.226 ms**, average per-write ≈ **3.5 ms**
(consistent with NVMe + ext4 journal flush). No queue stalls observed.

---

## 4. Cache size breakdown

| Path | Bytes | Notes |
|---|---|---|
| `/cache` (total) | **84 MB** | Entire jellyfin cache fits in one MP3 album |
| `/cache/transcodes` | 39–61 MB | Live during audit; **5 concurrent ffmpegs** (see §6) |
| `/cache/images` | 39 MB | 412 files in 16 hash-prefixed dirs |
| `/cache/images/resized-images` | 39 MB | dirs `0`, `1`, …, `f` (16 buckets, 18–30 files each) |
| `/cache/omdb` | 84 KB | Plugin response cache |
| `/cache/fontconfig` | 36 KB | |
| `/cache/attachments` | 12 KB | Subtitle/font extracts |
| `/cache/imagesbyname` | 4 KB | Empty |

Total cache = 84 MB on a 400 GB filesystem. **There is no cache pressure.**
The "cache being garbage-collected mid-page-load" hypothesis (c) is **ruled
out** (oldest cached image timestamp = 2026-05-08 01:12 BST, newest =
17:42 BST — **16.5 h retention with no eviction**).

---

## 5. Image cache miss-vs-hit timing

Public asset latency from onyx → `https://arrflix.s8n.ru`:

| URL | Attempt 1 (cold) | Attempt 2 (warm) |
|---|---|---|
| `/web/assets/img/icon-transparent.png` | 0.227 s | 0.047 s |
| `/web/serviceworker.js` | 0.059 s | 0.059 s |
| `/web/main.jellyfin.bundle.js` | 0.092 s | 0.052 s |

5-sample steady state on `/web/main.jellyfin.bundle.js` = **44–68 ms, median
49 ms**. The Traefik + Jellyfin static-asset path is fast.

Direct poster URLs (`/Items/{id}/Images/Primary`) require an auth token and
could not be probed without a fresh `X-Emby-Token`. Inferred from on-disk
evidence: the `resized-images` cache contains 412 WebPs, all under 200 KB,
with no eviction in the last 16 h.
**Image cache serves all current items from disk on the warm path.**
Hypothesis (c) is **ruled out**.

---

## 6. Stale-transcode detection

```
/cache/transcodes:
  total bytes:        39 MB  (was 61 MB earlier in audit, churn = active stream)
  total files:        26
  files >60 min old:  0
  bytes >60 min old:  0 MB
```

The `Clean Transcode Directory` task last ran `2026-05-08T02:13` (per the
audit 13 task list). **Currently zero stale transcode segments.** Hypothesis
(d) is **ruled out** — no accumulation.

However, **5 concurrent ffmpeg processes are transcoding the same file**
right now:

```
PID      CPU   file
1685478  246%  Rick and Morty S01E01 - Pilot.mkv
1686665  203%  Rick and Morty S01E01 - Pilot.mkv  (same file)
1686651  198%  Rick and Morty S01E01 - Pilot.mkv  (same file)
1689000  125%  Rick and Morty S01E01 - Pilot.mkv  (same file)
1689109  120%  Rick and Morty S01E01 - Pilot.mkv  (same file)
```

This is a **CPU-side** issue (no ffmpeg de-dup, no segment throttling — see
audit 13, finding 03). It causes:

- Load average **42.62 / 22.84 / 12.32** (12-core box).
- Swap usage 7.8 GiB / 24 GiB.
- I/O wait, however, is **0 %** in `vmstat` (`wa=0`).

The host CPU is saturated, not the disk. **The storage layer is not this
user's bottleneck.**

---

## 7. Inode + free-space stats

| Filesystem | 1K-blocks | Used | Available | Use % | Inodes | IUsed | IUse % |
|---|---|---|---|---|---|---|---|
| `keystone--vg-home` (`/home`) | 418 106 320 | 244 025 392 | 152 768 828 | **62 %** | 26 624 000 | 1 489 612 | **6 %** |
| `keystone--vg-root` (`/`) | — | — | — | — | — | — | — |
| `keystone--vg-var` (`/var`) | 12 G | 2.0 G | 8.6 G | **19 %** | n/a | n/a | n/a |

**Free space went from 40 GiB at audit 13 (90 % full) to 146 GiB now
(62 %).** Material improvement; the prior "low free space" hypothesis (e) is
**ruled out**. Inode pressure: also ruled out.

(Note: `/home` houses `/home/user/docker-data/100000.100000/...`, which
contains all userns-remapped Docker overlay2 trees. The 233 G used figure
includes container layers, not just media.
Library itself is 201 files.)

---

## 8. fstrim status

```
fstrim.timer       Loaded, enabled, active (waiting)
Last triggered:    Sun 2026-05-03 23:42:29 BST
Next trigger:      Mon 2026-05-11 01:12:58 BST
fstrim --dry-run /home  →  /home: 0 B (dry run) trimmed
```

Weekly trim is configured and last ran five days before this audit. **The
dry run reports 0 B of candidate space** → there is no untrimmed free space
on `/home`. SSD performance degradation from untrimmed blocks is **not** a
factor. No `discard` mount option (correct — async batched trim via the
timer is preferred over inline discard).

---

## 9. Read-ahead and queue settings

| Block device | `read_ahead_kb` | scheduler | `nr_requests` |
|---|---|---|---|
| `nvme0n1` (physical) | **128 KB** | `[none] mq-deadline` | 1023 |
| `dm-4` (`keystone--vg-home`, the LV) | **128 KB** | n/a | n/a |

(`/sys/block/dm-4` exposes no scheduler/`nr_requests`; dm devices inherit.)

128 KB read-ahead is the kernel default. For sequential MKV streams this is
OK; for library-scan workloads (`stat` + open + read first chunk per file)
it is also OK. Bumping to 512 KB or 1024 KB would help **scan throughput**
during a Jellyfin library refresh — a minor win, ~30 s of work. NVMe is
using the `none` scheduler (correct for NVMe — multiqueue, no elevator).

---

## 10. RO bind-mount overhead — confirmed

(From §2.) Host direct = 2.1 GB/s. Container RO bind = 1.2 GB/s. Overhead
≈ 40 %, which is higher than expected — likely a side effect of:

- userns remap (`100000.100000` shifts uids)
- the `nosuid,nodev` flags on `/home` propagating into the bind
- the container's `read_ahead_kb` not being configurable through the bind
  (inherits 128 KB)

**Not actionable today.** Both numbers are 100×+ of any media bitrate.
Documented to rule out hypothesis (f).

`atime` cost on the RO bind: the bind mount inherits the host's `relatime`
semantics — at most one atime write per file per 24 h. On 201 files that is
≤ 201 atime writes/day = **rounding noise**. Hypothesis (f) **ruled out**.

---

## 11. Concrete remediation list — ranked

Severity legend: **R** = red (acute, fix this week), **Y** = yellow
(deferred, document risk), **G** = green (audited, healthy, no action).
Effort: **S** ≤ 30 min, **M** half-day, **L** > 1 day.

| # | Severity | Effort | Bucket | Action | Why |
|---|:-:|:-:|---|---|---|
| S01 | Y | S | Quick-win | Bump `read_ahead_kb` on `/dev/nvme0n1` to **512 KB** (sysfs or udev rule) | Helps library scans and large-MKV streams. Tiny risk; reverts on reboot if set live. |
| S02 | Y | M | Quick-win | Add `noatime` (replacing `relatime`) to the `/home` mount in `/etc/fstab` | Eliminates the residual `relatime` writes; cosmetic but cheap. Requires a remount; do during a window with no playback. |
| S03 | Y | M | Investment | Carve a separate **`media` LV** (or attach a second NVMe) for `/home/user/media` and bind-mount it RO into Jellyfin | Isolates library reads from transcode-write churn and Docker overlay churn on the same ext4 journal. Today it is fine; at scale it will not be. |
| S04 | Y | M | Investment | Move `keystone--vg-swap_1` off `keystone-vg` (or onto a separate device) | Swap is currently 7.8 GiB used and shares the NVMe queue with media reads. CPU saturation is the proximate cause, but cleanly isolating swap helps once CPU is fixed (GPU re-enable, see audit 13 #02). |
| S05 | Y | M | Investment | Add a second PV to `keystone-vg` so the VG has free space | `vgs` shows **VFree = 0**. Any future `lvextend` will fail until a PV is added. Latent ops trap. |
| S06 | G | — | — | Keep weekly `fstrim.timer` as-is | Healthy, current. |
| S07 | G | — | — | Keep image cache untouched | 84 MB total cache, 16 h retention, no GC pressure. |
| S08 | G | — | — | No change to `data=ordered` ext4 journal | NVMe; mode is fine. |

**The single biggest "loads kinda slow" win lives in audit 13 (finding 03 —
enable transcode throttling + segment deletion). Storage is not where this
gets fixed.**

---

## 12. Quick-win vs investment

### Quick-win (≤30 min total, today)

- **S01** — `echo 1024 > /sys/block/nvme0n1/queue/read_ahead_kb` (or 512).
  Reverts on reboot; persist via a udev rule under
  `/etc/udev/rules.d/60-readahead.rules`. Marginal but free.
- **S02** — flip `relatime` → `noatime` in `/etc/fstab` for `/home`.
  Cosmetic but cheap. **Skip if even half-loaded** — a bad fstab + reboot is
  an outage; only do this during a planned window.

### Investment (half-day to multi-day, plan)

- **S03** — separate `media` LV. Requires `lvcreate`, `mkfs`, an rsync of
  the library, and swapping the bind mount in compose. ~Half a day. Pays
  back when (a) the library grows past the current 201 files, and (b) GPU
  transcode is re-enabled (audit 13 #02) and many concurrent reads start
  happening.
- **S04** — relocate swap. Only meaningful after the GPU re-enable closes
  the CPU-saturation root cause.
- **S05** — second PV. Trivial mechanically (`pvcreate`, `vgextend`), but
  blocked on having a second device. Defer until needed.

### No-op (audited and healthy)

- SMART status (8 % wear, no errors)
- ext4 features and journal mode
- Inode usage (6 %)
- Free space (62 %, 146 GiB headroom)
- Cache size (84 MB total)
- Stale transcodes (zero)
- `fstrim.timer` (working, candidate bytes = 0)
- Bind-mount throughput (1.2 GB/s, ~190× any 4K stream)

---

## 13. Sign-off

- Audit: 2026-05-08, read-only, ~15 min wall.
- No fixes applied. No state mutated. No container restart. No SMART
  self-test. No fstrim execution. No mount changes.
- **Top storage culprit: none.** The storage stack is healthy. The "loads
  kinda slow" symptom is CPU-side (5 concurrent ffmpegs at load 42, audit 13
  #02 + #03).
- **Top quick-win: S01 — bump `read_ahead_kb` to 512 KB on `nvme0n1`** for a
  marginal scan/stream gain. The real fix lives in audit 13.
- Next audit due: **2026-08-08** (quarterly, with audit 13).