# 07 - Pre-Import Cleanup Ruleset (tv.s8n.ru)

Last updated: 2026-05-08
Server: Jellyfin 10.10.3 on nullstone, container `jellyfin`
Library root inside container: `/media`
Library root on host: `/home/user/media`

This document defines the **normative pre-import cleanup ruleset** for the personal Jellyfin deploy. The owner downloads scene/group releases (e.g. `Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/`) which contain a mixture of media files and non-media junk (codec readmes, release-group brags, Windows installer shortcuts, comparison images, OS thumbnail caches, etc.). This junk must NOT land in `/home/user/media/` because:

1. It clutters the library and confuses scrapers.
2. Promo PNGs may be mis-identified as artwork.
3. Release-group `.nfo` files break the NFO-override flow (doc 02 § 11).
4. **Windows executables and installer shortcuts (`.exe`, `.msi`, `.website`, `.url`, `.lnk`, `.scr`, `.bat`, `.ps1`) are a real security vector.** Even though the Linux server cannot execute them, friends with a Jellyfin account can download them through the web UI and run them on their PC.

Cross-linked to:

- [`01-artwork-and-images.md`](01-artwork-and-images.md) - what counts as a recognised poster / backdrop on disk.
- [`02-metadata-and-titles.md`](02-metadata-and-titles.md) - NFO sidecar override flow; what a "real" Jellyfin NFO looks like.
- [`03-subtitles.md`](03-subtitles.md) - which subtitle files to keep.
- [`05-file-structure-rules.md`](05-file-structure-rules.md) - canonical folder layout. § 8 of doc 05 defines the recognised extras subfolders; this doc enforces them at import time.
- [`08-filename-normalization.md`](08-filename-normalization.md) - the **next** stage of the pipeline (sibling agent), called after this doc's `cleanup-import.sh` has produced a clean staging tree.

Sources of truth:

- Jellyfin documentation, Movies page: extras subfolders and artwork filename patterns.
- Jellyfin documentation, Shows page: same for series.
- NFO metadata documentation: NFO XML schema; used here to distinguish a real metadata NFO from a release-group brag.

---

## 0. Top-level cleanup rules

These are non-negotiable. They wrap the doc 05 top-level rules with one guarantee: **nothing leaves staging until cleanup has run and been confirmed.**

1. **Never clean in-place on the source download.** The download directory (`/home/admin/Downloads/...`) is treated as a read-only artefact until the user explicitly approves deletion. The cleanup script copies into a staging area and operates there.
2. **Quarantine first, delete later.** The first run of the cleanup script on a release moves junk to `~/.jellyfin-quarantine/<YYYY-MM-DD>/<release>/` instead of deleting. The user reviews, then a second pass empties the quarantine after sign-off. Subsequent runs on the same release are idempotent.
3. **Two-list policy.** Every file is matched against an `ALLOW` list (KEEP) or a `DENY` list (DELETE). Anything not on either list is **flagged** and surfaced in the audit report; a human decides. Never auto-delete on "unknown".
4. **Never run cleanup as root.** All operations run as the unprivileged `admin` (onyx) or `user` (nullstone) account. The live `/home/user/media/` tree is touched only by the rename step in doc 08, after cleanup has produced an intermediate staging copy.
5. **Idempotent.** Running cleanup twice on the same source must produce the same staging tree byte-for-byte (same `find -printf '%p %s\n' | sort` output, modulo timestamps).
6. **Dry-run is the default.** The cleanup script with no flags lists what it *would* do and exits without writing. `--apply` is required to actually move/quarantine files.

---

## 1. Categorical taxonomy of non-media files in scene/group releases

Scene and group ("p2p") releases follow loose conventions. The following categories cover everything observed in the wild plus everything in the Futurama download set:

### 1.1 Codec / player promotion

Text files and Windows shortcut files steering the user toward a specific codec pack or media player (often K-Lite + MPC-HC). Frequently the file is a `.url` or `.website` (Internet Shortcut) pointing to a third-party installer. **Always DELETE.**

Real-world examples (`/home/admin/Downloads/futrama/`):

- `How to play HEVC (THIS FILE).txt` - 65 lines of MPC-HC marketing.
- `Ninite K-Lite Codecs Unattended Silent Installer and Updater.website` - `URL=https://ninite.com/klitecodecs/` Internet Shortcut.

Patterns:

- `How to play *.txt`, `Read*Me*.txt`, `INSTALL*.txt`, `PLAY*.txt`
- `*.website`, `*.url`, `*.lnk`
- `K-Lite*`, `MPC-HC*`, `VLC*`, `MX Player*`, `LAV*`

### 1.2 Release-group brag

Plain-text or `.nfo` files where the release group identifies itself, documents encoder settings, or promotes its tracker URL. Distinguishable from a **Jellyfin-compatible metadata NFO** (XML, root `<movie>` / `<tvshow>` / `<episodedetails>`) by content; see § 3.

Real-world examples:

- `Encoded by JoyBell (UTR).txt` - 41-line manifesto from "Unity Team Release group" pointing to `UNITEAM.CO`.
- `RARBG.txt`, `WWW.YIFY-TORRENTS.COM.url`, `<group>.nfo` with ASCII art.

Patterns:

- `Encoded by *.txt`, `Ripped by *.txt`, `<group>.txt`
- `RARBG.txt`, `RARBG_DO_NOT_MIRROR.exe` (yes, those exist; § 1.10)
- `*-readme.txt`, `release notes.txt`
- `*.nfo` containing only ASCII art (no `<movie>` / `<tvshow>` / `<episodedetails>` root element)
- `*.diz`, `file_id.diz` - old "BBS description" file, scene leftover

### 1.3 Promo images that are NOT poster artwork

Images that LOOK like artwork to a naive globber but are actually before/after comparisons, group banners, or screenshot proofs. **Delete unless they live inside a recognised extras folder (§ 4) or match the strict allow-list of poster/backdrop names from doc 01.**

Real-world example:

- `Futurama Compare.png` (1.05 MB) - encoder before/after comparison.

Patterns to delete:

- `*Compare*.{png,jpg,jpeg,webp}`
- `*Sample*.{png,jpg,jpeg}` (when not in a `samples/` extras folder)
- `*Screen*.{png,jpg}`, `*Screens/*`, `*Proof/*`, `*Preview/*`
- `*-banner.png` from a group (NOT the same as Jellyfin's `banner.jpg`; group banners typically have the group name in the filename, so match heuristically on `*JoyBell*`, `*UTR*`, `*JoY*`, etc.)
- Stray `*.gif` files (animated previews); Jellyfin doesn't use GIF.

### 1.4 OS-generated thumbnail caches

Per-OS file managers (Windows Explorer, macOS Finder, GNOME Files) leave cache files behind in every directory they browse. **Always DELETE: never useful, never metadata.**

Patterns:

- `Thumbs.db`, `ehthumbs.db`, `ehthumbs_vista.db`
- `.DS_Store`, `._*` (macOS resource forks)
- `Desktop.ini`, `desktop.ini`
- `.directory` (KDE)
- `.fseventsd/`, `.Spotlight-V100/`, `.Trashes/` (macOS)
- `$RECYCLE.BIN/`, `System Volume Information/` (Windows mount)

### 1.5 Sample files (lower-quality previews)

Scene releases sometimes ship a 30-second sample file at lower bitrate. Jellyfin treats a `samples/` subfolder as extras (doc 05 § 8.2), but a stray `Movie.sample.mkv` next to the main file would scrape as "another version".

**Default: DELETE.** Reasoning: we have the full file; the sample is dead weight. If the user genuinely wants samples, drop them into a `samples/` subfolder before running cleanup and the script will preserve the folder.

Patterns to delete (when at the top level of a release):

- `sample.{mkv,mp4,avi,m4v}`
- `*-sample.{mkv,mp4,avi,m4v}`, `*.sample.{mkv,mp4,avi,m4v}`
- `*_sample.{mkv,mp4,avi,m4v}`
- `Sample/` directory (rename to `samples/` to preserve as extras, OR delete)

### 1.6 Subtitle leftovers

VobSub subtitles (DVD bitmap subs) are shipped as a pair: `en.idx` (index) + `en.sub` (bitmap stream). Jellyfin can render them, but if a `.srt` exists with the same language tag the bitmap pair is redundant and slow.

**Default: KEEP all `.srt` and `.ass`. KEEP `.idx`/`.sub` only if no `.srt` of the same language exists.** This is a per-file decision: surface it to the user in the audit report rather than auto-pruning.

Patterns:

- `*.srt`, `*.ass`, `*.ssa`, `*.vtt` - KEEP (per doc 03).
- `*.sup` (PGS bitmap, Blu-ray) - KEEP (Jellyfin renders).
- `*.idx` + `*.sub` (VobSub) - KEEP if no `.srt` with same lang code; else flag for human review.
- `*.smi`, `*.rt` - DELETE (obsolete formats Jellyfin doesn't support).

### 1.7 Torrent residue

Files left by the torrent client itself. None are useful to Jellyfin.

Patterns to delete:

- `*.torrent`, `*.magnet`
- `*.parts`, `*.!ut`, `*.!qB`, `*.bc!` (in-progress fragments)
- `*.meta`, `*.aria2`
- `*.pad`, `padding/`, `__padding_file_*` (mktorrent padding)
- `*.sfv` (checksum manifest; harmless but useless after download)
- `*.md5`, `*.sha1`, `*.sha256` (release-checksum sidecars)

### 1.8 Test / proof images and folders

Some groups ship a `Proof/` or `Screens/` folder with screenshots to "prove" the rip's quality. Useless inside a Jellyfin library.

Patterns to delete (whole folders):

- `Proof/`, `proof/`, `PROOF/`
- `Screens/`, `screens/`, `Screenshots/`, `Caps/`
- `Preview/`, `Previews/`
- `_screens/`, `screenshots-only/`

### 1.9 Multi-disc DVD/Blu-ray cruft

When a release is a straight ISO rip, the `VIDEO_TS/` or `BDMV/` directory sometimes survives next to the encoded file. Jellyfin can play `VIDEO_TS.IFO` directly, but a partial DVD structure left over from the encode is just clutter.

Patterns:

- `VIDEO_TS/` - KEEP if it contains a complete `VIDEO_TS.VOB` set; otherwise flag.
- `*.IFO`, `*.BUP`, `*.VOB` - KEEP if inside a complete `VIDEO_TS/`; DELETE if loose.
- `BDMV/`, `CERTIFICATE/`, `AACS/` - KEEP if complete BD structure; flag if partial.
- `*.iso` inside a media folder - flag for human review (could be the intentional rip OR a Windows malware vector; see § 8).

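
The "complete `VIDEO_TS/`" test in § 1.9 is not spelled out. As a rough sketch only, one could treat a folder as complete when it holds `VIDEO_TS.IFO`, `VIDEO_TS.BUP` and at least one title-set VOB (`VTS_nn_n.VOB`). Both that definition and the helper name `video_ts_status` are assumptions made for illustration; they are not taken from `cleanup-import.sh`.

```bash
# Hypothetical helper: prints KEEP for a plausibly complete VIDEO_TS/
# directory and FLAG otherwise. "Complete" is an assumed heuristic:
# VIDEO_TS.IFO + VIDEO_TS.BUP + at least one VTS_nn_n.VOB.
video_ts_status() {
  local dir="$1" f name has_ifo=0 has_bup=0 has_vob=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Compare upper-cased names so video_ts.ifo and VIDEO_TS.IFO both count.
    name="$(basename "$f" | tr '[:lower:]' '[:upper:]')"
    case "$name" in
      VIDEO_TS.IFO)             has_ifo=1 ;;
      VIDEO_TS.BUP)             has_bup=1 ;;
      VTS_[0-9][0-9]_[1-9].VOB) has_vob=1 ;;
    esac
  done
  if [ "$has_ifo" -eq 1 ] && [ "$has_bup" -eq 1 ] && [ "$has_vob" -eq 1 ]; then
    echo KEEP
  else
    echo FLAG
  fi
}
```

Anything that fails the check is flagged rather than deleted, in line with the two-list policy: a partial structure is surfaced to the user, never pruned automatically.
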
### 1.10 Outright malicious / suspicious

Some releases historically shipped Windows executables disguised as "DO NOT MIRROR" anti-leech files. Even on a Linux server these must be deleted, because a friend with a Jellyfin account can download them via the web UI ("Download original file" button) and run them locally. **Always DELETE, never quarantine, never preserve, no exceptions.**

Patterns:

- `*.exe`, `*.msi`, `*.bat`, `*.cmd`, `*.com`, `*.scr`, `*.ps1`, `*.vbs`, `*.wsf`, `*.hta`, `*.jar`
- `*.app/` (macOS bundle dropped by macOS-using uploader)
- `*.dll`, `*.sys` (rare, but seen)
- Anything with a double extension like `Movie.mkv.exe`

---

## 2. KEEP vs DELETE: exhaustive table

This table is the **canonical decision matrix** for `cleanup-import.sh`. Patterns are matched case-insensitively (the script lowercases each basename before matching, since `ext4` itself is case-sensitive). `KEEP` means it goes to the staging tree; `DELETE` means it goes to quarantine on first run, then recycle-bin on confirm.

| Pattern | Action | Why |
|---|---|---|
| `*.mkv`, `*.mp4`, `*.avi`, `*.m4v`, `*.ts`, `*.mov`, `*.webm`, `*.wmv`, `*.flv`, `*.mpg`, `*.mpeg` | **KEEP** | Media: the entire point. |
| `*.srt`, `*.ass`, `*.ssa`, `*.vtt`, `*.sup` | **KEEP** | Subtitles (doc 03). |
| `*.idx` + `*.sub` (VobSub pair) | **KEEP** if no `.srt` of same lang exists; else **FLAG** | Bitmap subs; redundant with SRT. |
| `*.smi`, `*.rt` | **DELETE** | Obsolete subtitle formats; Jellyfin can't render. |
| `folder.{jpg,png}`, `poster.{jpg,png}`, `cover.{jpg,png}`, `default.{jpg,png}`, `show.{jpg,png}`, `jacket.{jpg,png}`, `movie.{jpg,png}` | **KEEP** | Jellyfin-recognised primary artwork (doc 01). |
| `backdrop.{jpg,png}`, `fanart.{jpg,png}`, `background.{jpg,png}`, `art.{jpg,png}`, `backdrop[0-9]*.{jpg,png}`, `backdrop-[0-9]*.{jpg,png}` | **KEEP** | Jellyfin-recognised backdrops (doc 01). |
| `logo.{png,jpg}`, `clearlogo.{png,jpg}`, `banner.{jpg,png}`, `landscape.{jpg,png}`, `thumb.{jpg,png}`, `disc.{png,jpg}`, `clearart.{png,jpg}` | **KEEP** | Jellyfin-recognised auxiliary artwork. |
| `season[0-9]*-poster.{jpg,png}`, `season[0-9]*.{jpg,png}`, `season-specials-poster.{jpg,png}` | **KEEP** | Per-season artwork (doc 01 / TV layout). |
| `extrafanart/*.{jpg,png}`, `backdrops/*.{jpg,png,mp4}` | **KEEP** | Multi-backdrop folders (doc 05 § 8). |
| `*.nfo` with XML root `<movie>` / `<tvshow>` / `<episodedetails>` / `<artist>` / `<album>` / `<musicvideo>` / `<season>` | **KEEP** | Jellyfin-compatible metadata sidecar (doc 02 § 11). |
| `*.nfo` without one of the above XML roots | **DELETE** | Release-group ASCII-art brag: pretends to be metadata, isn't. |
| `*Compare*.{png,jpg,jpeg,webp,gif}` | **DELETE** | Encoder before/after; group promo. |
| `*Sample*.{png,jpg,jpeg}` (image, top level) | **DELETE** | Group promo (NOT a Jellyfin sample folder). |
| `*Screen*.{png,jpg}`, `Screens/`, `Screenshots/`, `Caps/` | **DELETE** | Proof shots. |
| `Proof/`, `proof/`, `PROOF/` | **DELETE** (whole folder) | Quality-proof shots. |
| `Preview/`, `Previews/` | **DELETE** (whole folder) | Lower-quality teaser. |
| `*.txt` (any) | **DELETE** | Readme / group brag; Jellyfin doesn't read TXT. |
| `*.diz`, `file_id.diz` | **DELETE** | Scene description file; obsolete. |
| `*.website`, `*.url`, `*.lnk` | **DELETE** | Windows Internet Shortcut; points at codec/installer pages. **Security: § 8.** |
| `*.exe`, `*.msi`, `*.bat`, `*.cmd`, `*.com`, `*.scr`, `*.ps1`, `*.vbs`, `*.wsf`, `*.hta`, `*.jar`, `*.dll`, `*.sys` | **DELETE** | Windows executable. **Security: § 8.** |
| `*.app/` | **DELETE** (whole folder) | macOS bundle. |
| `Thumbs.db`, `ehthumbs.db`, `ehthumbs_vista.db` | **DELETE** | Windows Explorer thumbnail cache. |
| `.DS_Store`, `._*` | **DELETE** | macOS Finder. |
| `Desktop.ini`, `desktop.ini` | **DELETE** | Windows folder customisation. |
| `.directory` | **DELETE** | KDE Dolphin. |
| `.fseventsd/`, `.Spotlight-V100/`, `.Trashes/`, `$RECYCLE.BIN/`, `System Volume Information/` | **DELETE** (whole folder) | OS metadata directories. |
| `sample.{mkv,mp4,avi,m4v}` (top level) | **DELETE** | Lower-quality preview (doc 05 § 8.1: full file already present). |
| `*-sample.{mkv,mp4,avi,m4v}`, `*_sample.{mkv,mp4,avi,m4v}`, `*.sample.{mkv,mp4,avi,m4v}` | **DELETE** | Same. |
| `Sample/` (directory, top level) | **DELETE** | Lower-quality preview folder. |
| `samples/` (directory, recognised name) | **KEEP** | Jellyfin extras folder (doc 05 § 8.2). |
| `featurettes/`, `behind the scenes/`, `deleted scenes/`, `interviews/`, `scenes/`, `shorts/`, `clips/`, `trailers/`, `extras/`, `other/`, `theme-music/`, `backdrops/` | **KEEP** (whole folder) | Jellyfin extras (doc 05 § 8.2). |
| `Featurettes/`, `Behind The Scenes/`, etc. (capitalised) | **KEEP** but **rename to lowercase** | Jellyfin matches case-insensitively but lowercase is the documented form. |
| Any other folder name | **FLAG** | Surface to human; might be a typo of an extras folder. |
| `*.torrent`, `*.magnet` | **DELETE** | Torrent client residue. |
| `*.parts`, `*.!ut`, `*.!qB`, `*.bc!`, `*.aria2` | **DELETE** | In-progress download fragments (shouldn't be here, but defensive). |
| `*.meta` | **DELETE** | aria2/torrent metadata. |
| `*.pad`, `padding/`, `__padding_file_*`, `_____padding_file_*` | **DELETE** | mktorrent padding files. |
| `*.sfv`, `*.md5`, `*.sha1`, `*.sha256` | **DELETE** | Checksum manifests; harmless but useless after download. |
| `*.rar`, `*.r[0-9][0-9]`, `*.zip`, `*.7z`, `*.tar`, `*.tar.gz` | **FLAG** | Compressed archive in a media folder is suspicious; the release should have been extracted before download. |
| `*.iso` inside a media folder | **FLAG** | Could be intentional DVD/BD rip OR Windows-installer disguise. Human review. |
| `VIDEO_TS/` (complete) | **KEEP** | Jellyfin plays DVD structure directly. |
| `*.IFO`, `*.BUP`, `*.VOB` (loose, no `VIDEO_TS/`) | **DELETE** | Orphan DVD remnants. |
| `BDMV/` (complete) | **KEEP** | Jellyfin plays BD structure. |
| `CERTIFICATE/`, `AACS/` (without `BDMV/`) | **DELETE** | Orphan BD remnants. |
| `RARBG*.{txt,exe}`, `WWW.*.url`, `*.YIFY*.url` | **DELETE** | Tracker promo. |
| `RARBG_DO_NOT_MIRROR.exe` and similar | **DELETE** (security: § 8) | Historic anti-leech file; sometimes weaponised. |
| Anything else | **FLAG** | Two-list policy: never auto-delete on "unknown". |

---

## 3. NFO handling: the nuanced case

`.nfo` is overloaded. Two completely different file kinds share the extension:

- **Scene release `.nfo`**: plain text, ASCII art, encoder credits, tracker URL. Useless to Jellyfin (and at worst gets scraped as garbage metadata if NFO Saver is enabled).
- **Jellyfin/Kodi/Emby metadata NFO**: XML, root element is one of `<movie>`, `<tvshow>`, `<episodedetails>`, `<artist>`, `<album>`, `<musicvideo>`, `<season>`. Documented in doc 02 § 11.

### 3.1 The discriminator one-liner

```bash
is_jellyfin_nfo() {
  # Returns 0 (KEEP) if the file looks like a Jellyfin/Kodi NFO,
  # 1 (DELETE) if it looks like scene-group ASCII-art brag.
  head -c 4096 "$1" | tr -d '[:space:]' \
    | grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)\b'
}

# Usage:
if is_jellyfin_nfo "$f"; then echo "KEEP $f"; else echo "DELETE $f"; fi
```

The first 4096 bytes are enough: a real Jellyfin NFO declares its root within the first kilobyte. `tr -d '[:space:]'` is needed because some encoders pretty-print the XML, so the root tag may be preceded by line breaks and indentation.

### 3.2 Edge cases

- An NFO that is XML but has an unrecognised root element: DELETE. Jellyfin won't read it; nothing to preserve.
- An NFO with valid XML but **stale TMDB/IMDB IDs** that conflict with a newer scrape: KEEP, but flag for the user. Doc 02 § 11.5 explains how the NFO Saver overwrites these on next scrape.
- Multiple NFOs in one folder (e.g. `release.nfo` from the group AND `tvshow.nfo` from a previous Jellyfin write): KEEP `tvshow.nfo`, DELETE `release.nfo`. Use the discriminator above on each.

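
As a quick check of the discriminator against the cases listed above, the sketch below builds three throwaway files (a scene-style text NFO, a minimal `<tvshow>` NFO preceded by an XML declaration, and XML with an unrecognised root) and classifies each. The fixture contents and the `nfo_action` wrapper are made up for illustration; the test itself is the same pipeline as `is_jellyfin_nfo`, repeated so the block runs on its own. It assumes GNU `grep`, which supports `\b`.

```bash
# Same test as is_jellyfin_nfo in § 3.1, repeated so the block is self-contained.
is_jellyfin_nfo() {
  head -c 4096 "$1" | tr -d '[:space:]' \
    | grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)\b'
}

# Hypothetical wrapper: prints the action for one NFO file.
nfo_action() {
  if is_jellyfin_nfo "$1"; then echo KEEP; else echo DELETE; fi
}

# Throwaway fixtures (invented content, not real release files).
tmp="$(mktemp -d)"
printf '%s\n' '=== ENCODED BY SOME GROUP ===' 'x265 10bit, visit our tracker' \
  > "$tmp/release.nfo"
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<tvshow>' \
  '  <title>Futurama</title>' '</tvshow>' > "$tmp/tvshow.nfo"
printf '%s\n' '<?xml version="1.0"?>' '<playlist><item/></playlist>' \
  > "$tmp/other.nfo"

for f in "$tmp"/*.nfo; do
  printf '%-7s %s\n' "$(nfo_action "$f")" "$(basename "$f")"
done
rm -rf "$tmp"
```

With these fixtures, `release.nfo` and `other.nfo` should come out as DELETE and `tvshow.nfo` as KEEP.
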
### 3.3 First-100-bytes shortcut

The task brief proposes this:

```bash
if head -c 100 file.nfo | grep -qE '<(movie|tvshow|episodedetails)\b'; then
  echo KEEP
else
  echo DELETE
fi
```

This works for the common case but misses NFOs that start with an XML declaration (`<?xml ...?>` plus possibly a comment) before the root element; that prologue alone can be > 100 bytes. The 4096-byte version above is safer; we use that in `cleanup-import.sh`.

---

## 4. Featurettes / Extras / Bonus folders: the canonical list

Per the Jellyfin docs (movies and shows pages), these subfolder names are recognised and the contained files are tagged with the matching extra type. **Folder name match is case-insensitive but lowercase is the documented canonical form**; `cleanup-import.sh` lowercases on copy to staging.

| Folder name | Extra type | Notes |
|---|---|---|
| `behind the scenes` | Behind The Scenes | spaces, not dashes |
| `deleted scenes` | Deleted Scene | |
| `interviews` | Interview | |
| `scenes` | Scene | |
| `samples` | Sample | distinct from a top-level `Sample/` (§ 1.5) |
| `shorts` | Short | |
| `featurettes` | Featurette | |
| `clips` | Clip | |
| `other` | Other | catch-all |
| `extras` | Extra | generic catch-all |
| `trailers` | Trailer | |
| `theme-music` | Theme music | `.mp3` files; doc 05 § 8.3 |
| `backdrops` | Backdrop video | rotating video backgrounds |

Anything else (e.g. `Bonus Features/`, `BTS/`, `Special Features/`, `Featurette/` singular, `behind-the-scenes/` with dashes) is **NOT** matched by Jellyfin and the contents won't surface as extras. Cleanup either renames to the canonical name (when the mapping is unambiguous) or flags for human review.

### 4.1 Canonical-name mapping (auto-rename)

| Found | Renamed to |
|---|---|
| `Featurettes/`, `Featurette/`, `FEATURETTES/` | `featurettes/` |
| `Behind The Scenes/`, `BTS/`, `behind-the-scenes/` | `behind the scenes/` |
| `Deleted Scenes/`, `Deleted_Scenes/`, `deleted-scenes/` | `deleted scenes/` |
| `Interviews/`, `Interview/` | `interviews/` |
| `Trailers/`, `Trailer/` | `trailers/` |
| `Bonus/`, `Bonus Features/`, `Bonus Material/`, `Special Features/`, `Specials/` | `extras/` (generic catch-all) |
| `Outtakes/`, `Bloopers/`, `Gag Reel/` | `extras/` (no dedicated folder) |

The `Specials/` rename to `extras/` is **important**. For a TV series, `Specials/` looks like a season folder (Season 0 specials), but if the files inside are featurettes rather than aired specials, leaving them in a season-like folder mis-scrapes them as episodes. When in doubt, flag.

### 4.2 Real-world example: Futurama download

The four Futurama season folders all contain a `Featurettes/` subfolder:

```
Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Featurettes/
├── Episode One Animatic.mkv
└── Welcome to the World of Tomorrow.mkv

Futurama Season 2 .../Featurettes/
├── Animatic -Why Must I be a Crustacean in Love.mkv
└── Futurama Game Trailer.mkv

Futurama Season 3 .../Featurettes/
├── An X-Mas Message From David X. Cohen.mkv
└── Deleted Scenes.mkv

Futurama Season 4 .../Featurettes/
├── Futurama Welcome to the World of Tomorrow (x265 Joy).mkv
├── Outtakes - Kif Gets Knocked Up a Notch [1080p x265 10bit Joy].mkv
└── Panel on Voice Actors [1080p x265 10bit Joy].mkv
```

After cleanup these become `featurettes/` (lowercase) inside the season folder. Doc 08 (filename normalization) then renames the season folder itself to `Season 01/` and may relocate the season-level featurettes to a **series-level** `featurettes/` folder if the user prefers extras at the series root (this is a doc 05 § 8 / doc 08 decision, not this doc's).

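
The § 4.1 mapping and the Futurama case can be sketched as a small lookup. `canonical_extras_name` is a hypothetical helper, not the function used in `cleanup-import.sh`; it prints the canonical folder name, or `FLAG` when no unambiguous mapping is listed. As a deliberate choice in this sketch, `Specials` is flagged rather than mapped to `extras`, following the "when in doubt, flag" advice instead of the table's default.

```bash
# Hypothetical helper: map a found folder name to its canonical extras name.
# Prints FLAG when § 4.1 lists no unambiguous target.
canonical_extras_name() {
  local lower
  lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    featurettes|featurette)
      echo "featurettes" ;;
    "behind the scenes"|bts|behind-the-scenes)
      echo "behind the scenes" ;;
    "deleted scenes"|deleted_scenes|deleted-scenes)
      echo "deleted scenes" ;;
    interviews|interview)
      echo "interviews" ;;
    trailers|trailer)
      echo "trailers" ;;
    bonus|"bonus features"|"bonus material"|"special features")
      echo "extras" ;;
    outtakes|bloopers|"gag reel")
      echo "extras" ;;
    scenes|samples|shorts|clips|other|extras|theme-music|backdrops)
      echo "$lower" ;;   # already canonical apart from case
    *)
      echo "FLAG" ;;     # includes "specials": needs a human decision
  esac
}

for name in "Featurettes" "BTS" "Deleted_Scenes" "Gag Reel" "Specials" "Making Of"; do
  printf '%s -> %s\n' "$name" "$(canonical_extras_name "$name")"
done
```

For the Futurama folders, every `Featurettes/` maps to `featurettes`; a name such as `Making Of` has no entry and falls through to `FLAG`.
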
> Note: `Season 3 / Deleted Scenes.mkv` is a single file and should arguably
> be moved into a `deleted scenes/` subfolder rather than left in
> `featurettes/`. That's a manual disambiguation: flagged, not auto-moved.

---

## 5. Audit-then-clean workflow

Three-stage pipeline. Stage 1 is mandatory; stage 2 runs on user approval; stage 3 is reversible until the quarantine retention window expires.

### 5.1 Stage 1: Dry-run audit

Lists every file in the source release classified as KEEP / DELETE / FLAG. Writes nothing.

```bash
# Dry-run audit on a single release dir.
cleanup-import.sh "/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]"
```

Output (one line per file):

```
KEEP    Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
KEEP    folder.jpg
KEEP    Featurettes/Episode One Animatic.mkv  ->  featurettes/Episode One Animatic.mkv
DELETE  Encoded by JoyBell (UTR).txt                          [release-group brag]
DELETE  How to play HEVC (THIS FILE).txt                      [codec promo .txt]
DELETE  Ninite K-Lite Codecs Unattended Silent ....website    [windows .website -- SECURITY]
DELETE  Futurama Compare.png                                  [encoder compare image]
FLAG    SomeUnknownFile.bin                                   [unknown extension]
```

A **summary** at the bottom:

```
KEEP    16 files (5.92 GiB)
DELETE   4 files (1.08 MiB)
FLAG     0 files
Run with --apply to quarantine the DELETE set.
```

Quick one-liner equivalents (for ad-hoc spot checks; the script in § 9 is preferred):

```bash
# What would I delete?
find "$SRC" \( \
     -iname '*.txt' -o -iname '*.nfo' -o -iname '*.url' -o -iname '*.website' \
  -o -iname '*.lnk' -o -iname '*.exe' -o -iname '*.msi' -o -iname '*.bat' \
  -o -iname '*.scr' -o -iname '*.ps1' -o -iname '*.cmd' -o -iname '*.com' \
  -o -iname 'Thumbs.db' -o -iname '.DS_Store' -o -iname 'Desktop.ini' \
  -o -iname '*Compare*.png' -o -iname '*Compare*.jpg' \
  -o -iname 'sample.mkv' -o -iname '*.sample.mkv' -o -iname '*-sample.mkv' \
  -o -iname '*.torrent' -o -iname '*.sfv' -o -iname '*.md5' \
\) -print

# What looks like a real Jellyfin NFO vs a release-group brag?
find "$SRC" -iname '*.nfo' -print0 | while IFS= read -r -d '' f; do
  if head -c 4096 "$f" | tr -d '[:space:]' \
     | grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)\b'; then
    printf 'KEEP    %s\n' "$f"
  else
    printf 'DELETE  %s\n' "$f"
  fi
done
```

### 5.2 Stage 2: Quarantine apply

```bash
cleanup-import.sh --apply "/home/admin/Downloads/futrama/Futurama Season 1 [...]"
```

What it does:

1. **Copies** the source directory tree to `/home/admin/.jellyfin-staging/<release>/`. The source is never modified.
2. Inside the staging copy, **moves** every DELETE-classified file to `/home/admin/.jellyfin-quarantine/<YYYY-MM-DD>/<release>/`, preserving relative paths so a user can `diff -r` to confirm.
3. **Renames** non-canonical extras subfolders to canonical lowercase (§ 4.1).
4. Writes a manifest at `/home/admin/.jellyfin-staging/<release>/.cleanup-manifest.json` listing every file action with sha256, source path, action, target path. This is what stage 3 reads.
5. Returns the staging path on stdout; that's the input to doc 08's filename normalizer.

### 5.3 Stage 3: Confirm and recycle

After the user reviews the quarantine directory and approves:

```bash
cleanup-import.sh --confirm-quarantine 2026-05-08
```

Moves `/home/admin/.jellyfin-quarantine/2026-05-08/` to the system trash (via `gio trash`). It is still recoverable, but no longer clutters the quarantine root. After 30 days a cron sweep empties trash older than that.

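
Step 2 of § 5.2 (moving each DELETE file into quarantine while preserving its relative path) is what makes the `diff -r` review possible. A minimal sketch, assuming a hypothetical `quarantine_file` helper that takes the staging root, the quarantine root and a path relative to staging; the real script may structure this differently.

```bash
# Hypothetical helper: move STAGE/REL to QUAR/REL, creating parent dirs.
# mv -n refuses to overwrite an existing quarantine entry, and a missing
# source is treated as "already quarantined", so reruns are harmless.
quarantine_file() {
  local stage="$1" quar="$2" rel="$3"
  [ -e "$stage/$rel" ] || return 0
  mkdir -p "$quar/$(dirname "$rel")"
  mv -n -- "$stage/$rel" "$quar/$rel"
}

# Demo on throwaway directories (paths are illustrative only).
demo="$(mktemp -d)"
mkdir -p "$demo/stage/Featurettes" "$demo/quar"
touch "$demo/stage/Encoded by JoyBell (UTR).txt" "$demo/stage/Featurettes/Thumbs.db"

quarantine_file "$demo/stage" "$demo/quar" "Encoded by JoyBell (UTR).txt"
quarantine_file "$demo/stage" "$demo/quar" "Featurettes/Thumbs.db"
quarantine_file "$demo/stage" "$demo/quar" "Featurettes/Thumbs.db"   # second call is a no-op

# The quarantine tree mirrors the relative layout of the staging tree.
(cd "$demo/quar" && find . -type f | sort)
rm -rf "$demo"
```

Because the quarantine mirrors the staging layout, a reviewer can compare it against the source tree directly, and restoring a file is a single `mv` back to the same relative path.
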
### 5.4 Never delete from source

The source download (`/home/admin/Downloads/futrama/...`) is **never** modified by `cleanup-import.sh`. Reasons:

- The user may want to re-seed the torrent.
- The user may want to re-run cleanup with different rules later.
- Bugs in the cleanup script must never destroy original artefacts.

Source deletion is a separate manual step the user does AFTER the import is verified in Jellyfin and the library is happy. There is no script for it on purpose.

---

## 6. Idempotency, edge cases, and "unknown" handling

- **Idempotent.** `cleanup-import.sh --apply` on an already-cleaned staging directory is a no-op (nothing matches DELETE). The script detects this and exits 0 with `nothing to do`.
- **Re-runnable on source.** Re-running the script on the same source produces a fresh staging copy, overwriting (after backup) the previous staging directory. Quarantine is dated, so two runs on the same day for the same release append rather than overwrite (`.2/`, `.3/`, etc.).
- **Unknown extension** (e.g. `.dat`, `.bin`, `.iso`, `.bin.txt`): never auto-deleted. FLAGGED in the audit output, surfaced to the user. The user adds it to the local override file `~/.config/jellyfin-cleanup/local-rules.conf` if they want it classified next time.
- **Hidden dotfiles** (anything starting with `.` other than known OS caches like `.DS_Store`): FLAGGED. Don't auto-delete; could be a legitimate `.subliminal.cache` (subtitles plugin) or similar.
- **Symlinks**: never followed. A symlink in a release directory is always FLAGGED; the script refuses to copy or quarantine it.
- **Permission denied**: script bails with non-zero exit. Never partially applies.

---

## 7. The `Futurama Compare.png` problem (artwork false-positive)

`Futurama Compare.png` is a 1.05 MB PNG sitting next to the season's MKV files. To a naive image-globber it looks like artwork: same kind of image file as `folder.jpg`, larger than the typical poster, sitting in the right location. It's actually an encoder comparison shot.

The rule from doc 01 (artwork), enforced here:

> **An image file in the release root is KEPT only if its name is on the
> exact recognised-artwork allow-list.** Anything else is DELETED.

Recognised artwork allow-list (top-level of an item folder):

- `folder.{jpg,jpeg,png,webp}`
- `poster.{jpg,jpeg,png,webp}`
- `cover.{jpg,jpeg,png,webp}`
- `default.{jpg,jpeg,png,webp}`
- `show.{jpg,jpeg,png,webp}` (series only)
- `jacket.{jpg,jpeg,png,webp}` (series only)
- `movie.{jpg,jpeg,png,webp}` (movies only)
- `backdrop.{jpg,jpeg,png,webp}` and `backdrop[0-9]*.{jpg,jpeg,png,webp}`
- `fanart.{jpg,jpeg,png,webp}`, `background.{jpg,jpeg,png,webp}`, `art.{jpg,jpeg,png,webp}`
- `logo.{png,jpg}`, `clearlogo.{png,jpg}`
- `banner.{jpg,png}`, `landscape.{jpg,png}`, `thumb.{jpg,png}`, `disc.{png,jpg}`, `clearart.{png,jpg}`
- `season[0-9]*-poster.{jpg,png}`, `season[0-9]*.{jpg,png}`, `season-specials-poster.{jpg,png}`
- `extrafanart/` and `backdrops/` directories (any contents OK)

Exception: images **inside** a recognised extras folder (`extras/`, `featurettes/`, etc.) are KEPT regardless of name; they're presumed to be intentional content of that extra.

`Futurama Compare.png` matches none of these allow-list patterns and is not inside an extras folder, so it's DELETED.

---

## 8. Security rules

The single most important rule in this document:

> **Windows-executable extensions and Internet Shortcut formats are
> auto-deleted, never quarantined for "review", because the threat model
> isn't the Linux server, it's the Jellyfin user who downloads them.**

Jellyfin has a "Download original file" button for every item. If a release contains `Codec Installer.exe`, Jellyfin will happily serve it to any user with library access, including the friend on Windows who might not understand that downloading and running an `.exe` from a media library is a terrible idea. We don't trust the upload chain (the release group), so we strip these on the server side.

Exhaustive auto-delete list (security override; these bypass the "FLAG unknown" rule):

| Pattern | Risk |
|---|---|
| `*.exe` | Windows executable. Direct code execution on download+run. |
| `*.msi` | Windows Installer package. Silent install possible. |
| `*.bat`, `*.cmd` | Windows batch script. Runs in `cmd.exe`. |
| `*.com` | Old DOS-style executable. Still runs on modern Windows. |
| `*.scr` | Windows screensaver = .exe in disguise. Classic malware vector. |
| `*.ps1` | PowerShell script. Common modern malware delivery. |
| `*.vbs`, `*.wsf`, `*.hta`, `*.js` (Windows Script Host) | Active scripting. |
| `*.jar` | Java archive; runs as `java -jar` on systems with JRE. |
| `*.dll`, `*.sys` | Windows libraries / drivers. Side-load attacks. |
| `*.url`, `*.website`, `*.lnk` | Internet Shortcut / Windows Shortcut. Points at attacker-controlled URL. |
| `*.iso`, `*.img` (in a media folder, not at the library root) | Mountable disk image. Can carry Windows installers. **FLAG, not auto-delete**: could legitimately be a DVD rip. |
| `*.app/` | macOS application bundle. Auto-deleted. |
| `Autorun.inf` | Windows autorun config. **AUTO-DELETE.** |

Total auto-delete categories that are **purely** security-driven (not Jellyfin-irrelevance-driven): **15**, namely `.exe`, `.msi`, `.bat`, `.cmd`, `.com`, `.scr`, `.ps1`, `.vbs`, `.wsf`, `.hta`, `.jar`, `.dll`, `.sys`, `.url`/`.website`/`.lnk`, `Autorun.inf`. Plus 1 flagged for human review: `.iso`/`.img`.

### 8.1 Why `.url` is in the security list

`.url` is a plain-text Internet Shortcut. On Windows, double-clicking it opens the target in the default browser. The "target" is whatever the release group put in the `URL=` line. Historically this was used to push codec-pack download pages with bundled adware. There is no benign reason for a `.url` to ship in a media release.

The Futurama release contains exactly this pattern: ``` [InternetShortcut] URL=https://ninite.com/klitecodecs/ ``` Ninite itself is reputable — but the principle is "do not ship clickable URLs to third-party installers in a media library, ever". ### 8.2 The `RARBG_DO_NOT_MIRROR.exe` historic case Some releases historically contained a file named `RARBG_DO_NOT_MIRROR.exe`, ostensibly to discourage mirror sites from re-uploading. In several documented cases this file was actually adware or a cryptominer. Auto-delete, no questions asked. --- ## 9. Prepared cleanup script — `cleanup-import.sh` Idempotent. Dry-run by default. Quarantine-first. Source-immutable. Returns the staging path on stdout for piping to doc 08's normalizer. Save to `bin/cleanup-import.sh` in the `jellyfin-stack` repo. ```bash #!/usr/bin/env bash # cleanup-import.sh — Pre-import cleanup for tv.s8n.ru # Version 1.0 (2026-05-08) — see docs/07-pre-import-cleanup.md # # Usage: # cleanup-import.sh SRC # dry-run # cleanup-import.sh --apply SRC # quarantine # cleanup-import.sh --confirm-quarantine YYYY-MM-DD # recycle # # Exit codes: # 0 success / nothing to do # 1 user error (bad args, source not found) # 2 internal error (permission, partial state) # 3 flagged files present — user must review before --apply set -euo pipefail STAGING_ROOT="${JELLYFIN_STAGING_ROOT:-$HOME/.jellyfin-staging}" QUARANTINE_ROOT="${JELLYFIN_QUARANTINE_ROOT:-$HOME/.jellyfin-quarantine}" TODAY="$(date +%Y-%m-%d)" # ----- classification ----- # Returns one of: KEEP DELETE FLAG classify() { local path="$1" local base base="$(basename "$path")" local lower lower="$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')" # Security overrides — bypass everything else case "$lower" in *.exe|*.msi|*.bat|*.cmd|*.com|*.scr|*.ps1|*.vbs|*.wsf|*.hta|*.jar|*.dll|*.sys) echo DELETE; return ;; *.url|*.website|*.lnk) echo DELETE; return ;; autorun.inf) echo DELETE; return ;; esac # OS junk case "$lower" in 
thumbs.db|ehthumbs.db|ehthumbs_vista.db|.ds_store|desktop.ini|.directory) echo DELETE; return ;; ._*) echo DELETE; return ;; esac # Media — KEEP case "$lower" in *.mkv|*.mp4|*.avi|*.m4v|*.ts|*.mov|*.webm|*.wmv|*.flv|*.mpg|*.mpeg) echo KEEP; return ;; *.srt|*.ass|*.ssa|*.vtt|*.sup|*.idx|*.sub) echo KEEP; return ;; *.mp3|*.flac|*.ogg|*.opus|*.m4a|*.wav) echo KEEP; return ;; esac # Recognised artwork at item root case "$lower" in folder.jpg|folder.jpeg|folder.png|folder.webp) echo KEEP; return ;; poster.jpg|poster.jpeg|poster.png|poster.webp) echo KEEP; return ;; cover.jpg|cover.jpeg|cover.png|cover.webp) echo KEEP; return ;; default.jpg|default.png|show.jpg|show.png|jacket.jpg|jacket.png|movie.jpg|movie.png) echo KEEP; return ;; backdrop.jpg|backdrop.png|backdrop[0-9]*.jpg|backdrop[0-9]*.png) echo KEEP; return ;; fanart.jpg|fanart.png|background.jpg|background.png|art.jpg|art.png) echo KEEP; return ;; logo.png|logo.jpg|clearlogo.png|clearlogo.jpg|banner.jpg|banner.png) echo KEEP; return ;; landscape.jpg|landscape.png|thumb.jpg|thumb.png|disc.png|disc.jpg|clearart.png|clearart.jpg) echo KEEP; return ;; season[0-9]*-poster.jpg|season[0-9]*-poster.png|season[0-9]*.jpg|season[0-9]*.png) echo KEEP; return ;; season-specials-poster.jpg|season-specials-poster.png) echo KEEP; return ;; esac # Promo images masquerading as art case "$lower" in *compare*.png|*compare*.jpg|*compare*.jpeg|*compare*.webp|*compare*.gif) echo DELETE; return ;; *sample*.png|*sample*.jpg|*sample*.jpeg) echo DELETE; return ;; *screen*.png|*screen*.jpg|*preview*.png|*preview*.jpg) echo DELETE; return ;; esac # Text-flavoured junk case "$lower" in *.txt|*.diz|file_id.diz) echo DELETE; return ;; esac # Sample files case "$lower" in sample.mkv|sample.mp4|sample.avi|sample.m4v) echo DELETE; return ;; *-sample.mkv|*-sample.mp4|*.sample.mkv|*.sample.mp4|*_sample.mkv|*_sample.mp4) echo DELETE; return ;; esac # Torrent residue case "$lower" in *.torrent|*.magnet|*.parts|*.aria2|*.meta) echo DELETE; return ;; 
    *.pad|__padding_file_*|_____padding_file_*) echo DELETE; return ;;
    *.sfv|*.md5|*.sha1|*.sha256) echo DELETE; return ;;
  esac

  # NFO discriminator: KEEP if Jellyfin-compatible XML, else DELETE.
  # Whitespace is squeezed to single spaces (not deleted) so that a root
  # tag carrying attributes still ends at a space, "/" or ">".
  case "$lower" in
    *.nfo)
      if head -c 4096 "$path" | tr -s '[:space:]' ' ' \
          | grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)[ />]'; then
        echo KEEP
      else
        echo DELETE
      fi
      return ;;
  esac

  # Suspicious archives in a media folder
  case "$lower" in
    *.rar|*.r[0-9][0-9]|*.zip|*.7z|*.tar|*.tar.gz|*.iso|*.img) echo FLAG; return ;;
  esac

  echo FLAG
}

# ----- folder classification -----
# Returns one of: KEEP_AS-IS RENAME:<target> DELETE FLAG
classify_dir() {
  local d="$1"
  local lower
  lower="$(basename "$d" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    behind\ the\ scenes|deleted\ scenes|interviews|scenes|samples|shorts|featurettes|clips|other|extras|trailers|theme-music|backdrops)
      echo "RENAME:$lower"; return ;;
    bts|behind-the-scenes) echo "RENAME:behind the scenes"; return ;;
    deleted-scenes|deleted_scenes) echo "RENAME:deleted scenes"; return ;;
    bonus|bonus\ features|bonus\ material|special\ features|outtakes|bloopers|gag\ reel) echo "RENAME:extras"; return ;;
    proof|screens|screenshots|caps|preview|previews) echo DELETE; return ;;
    sample) echo DELETE; return ;;
    .fseventsd|.spotlight-v100|.trashes|\$recycle.bin|system\ volume\ information) echo DELETE; return ;;
    extrafanart) echo "RENAME:extrafanart"; return ;; # case stays, recognised
    *) echo FLAG; return ;;
  esac
}

# ----- main -----
APPLY=0
CONFIRM_DATE=""
SRC=""

while [[ $# -gt 0 ]]; do
  case "$1" in
    --apply) APPLY=1; shift ;;
    --confirm-quarantine)
      [[ $# -ge 2 ]] || { echo "--confirm-quarantine needs a YYYY-MM-DD date" >&2; exit 1; }
      CONFIRM_DATE="$2"; shift 2 ;;
    -h|--help) sed -n '2,14p' "$0"; exit 0 ;;   # header comment, including all exit codes
    -*) echo "unknown flag: $1" >&2; exit 1 ;;
    *) SRC="$1"; shift ;;
  esac
done

if [[ -n "$CONFIRM_DATE" ]]; then
  # Accept only a plain date, so a value such as ".." can never point
  # gio at a directory outside the quarantine root.
  if [[ ! "$CONFIRM_DATE" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "bad date: $CONFIRM_DATE (expected YYYY-MM-DD)" >&2; exit 1
  fi
  if [[ -d "$QUARANTINE_ROOT/$CONFIRM_DATE" ]]; then
    gio trash "$QUARANTINE_ROOT/$CONFIRM_DATE"
    echo "Recycled $QUARANTINE_ROOT/$CONFIRM_DATE"
  else
    echo "No quarantine for $CONFIRM_DATE" >&2; exit 1
  fi
  exit 0
fi

[[ -n "$SRC" && -d "$SRC" ]] || { echo "usage: $0 [--apply] SRC" >&2; exit 1; }
SRC="${SRC%/}"   # a trailing slash would defeat the prefix stripping below

RELEASE="$(basename "$SRC")"
STAGE="$STAGING_ROOT/$RELEASE"
QUAR="$QUARANTINE_ROOT/$TODAY/$RELEASE"

declare -i KEEP_N=0 DEL_N=0 FLAG_N=0

# Walk source, classify each entry
while IFS= read -r -d '' f; do
  # $SRC is quoted inside the expansion: release names contain [ ],
  # which an unquoted pattern would treat as a glob character class.
  rel="${f#"$SRC"/}"
  if [[ -d "$f" ]]; then
    case "$(classify_dir "$f")" in
      KEEP_AS-IS|RENAME:*) ;;
      DELETE) printf 'DELETE %s/ [junk dir]\n' "$rel"; DEL_N+=1 ;;
      FLAG) printf 'FLAG %s/ [unknown dir name]\n' "$rel"; FLAG_N+=1 ;;
    esac
    continue
  fi
  case "$(classify "$f")" in
    KEEP) printf 'KEEP %s\n' "$rel"; KEEP_N+=1 ;;
    DELETE) printf 'DELETE %s\n' "$rel"; DEL_N+=1 ;;
    FLAG) printf 'FLAG %s\n' "$rel"; FLAG_N+=1 ;;
  esac
done < <(find "$SRC" -mindepth 1 -print0)

echo "---"
echo "KEEP $KEEP_N"
echo "DELETE $DEL_N"
echo "FLAG $FLAG_N"

if (( FLAG_N > 0 )); then
  echo "FLAG count > 0; review before re-running with --apply." >&2
  (( APPLY == 0 )) || exit 3
fi

if (( APPLY == 0 )); then
  echo "Dry run only. Re-run with --apply to quarantine."
  exit 0
fi

# --- APPLY path: copy to staging, move DELETE to quarantine ---
mkdir -p "$STAGE" "$QUAR"
# rsync -a preserves perms and is idempotent
rsync -a --delete "$SRC/" "$STAGE/"

renames=()   # directories to rename once the walk has finished
while IFS= read -r -d '' f; do
  # Entries under an already-quarantined directory no longer exist here.
  [[ -e "$f" ]] || continue
  rel="${f#"$STAGE"/}"   # quoted: [ ] in release names must not act as a glob
  if [[ -d "$f" ]]; then
    res="$(classify_dir "$f")"
    case "$res" in
      RENAME:*) renames+=("$f") ;;
      DELETE)
        mkdir -p "$QUAR/$(dirname "$rel")"
        mv "$f" "$QUAR/$rel" ;;
    esac
    continue
  fi
  case "$(classify "$f")" in
    DELETE)
      mkdir -p "$QUAR/$(dirname "$rel")"
      mv "$f" "$QUAR/$rel" ;;
  esac
done < <(find "$STAGE" -mindepth 1 -print0)

# Rename last and deepest-first (find lists parents before children),
# so no path collected during the walk goes stale.
for (( i = ${#renames[@]} - 1; i >= 0; i-- )); do
  d="${renames[i]}"
  target="$(classify_dir "$d")"
  target="${target#RENAME:}"
  [[ "$(basename "$d")" == "$target" ]] || mv "$d" "$(dirname "$d")/$target"
done

# Manifest
{
  echo "{"
  echo "  \"release\": \"$RELEASE\","
  echo "  \"date\": \"$TODAY\","
  echo "  \"source\": \"$SRC\","
  echo "  \"staging\": \"$STAGE\","
  echo "  \"quarantine\": \"$QUAR\""
  echo "}"
} > "$STAGE/.cleanup-manifest.json"

# Stdout: the staging path, for piping to doc 08's normalizer
echo "$STAGE"
```

### 9.1 Pipeline integration

```bash
# Full pre-import flow:
SRC="/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]"
# The audit report also goes to stdout; the staging path is its last line.
STAGING="$(cleanup-import.sh --apply "$SRC" | tail -n 1)"
# STAGING is now ~/.jellyfin-staging/Futurama Season 1.../ with junk gone.

# Hand off to doc 08:
normalize-filenames.sh "$STAGING"

# Then move to live media tree (manual; doc 05 confirms layout):
mv "$STAGING" "/home/user/media/tv/Futurama (1999)/Season 01"
```

The `mv` to the live tree is **deliberately manual**. Cleanup and rename are reproducible from source; the move into `/home/user/media/` is the point of no return and the user runs it consciously.

---

## 10. What this doc explicitly does NOT do

- **Filename normalization**: that's doc 08. This doc only deletes; doc 08 renames `Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv` into the canonical `Futurama (1999) - S01E01 - Space Pilot 3000.mkv`.
- **Subtitle reconciliation**: doc 03 covers per-language naming; this doc only deletes obsolete formats (`.smi`, `.rt`).
- **Library refresh**: after files land in `/home/user/media/`, run `POST /Library/Refresh` on the Jellyfin API (doc 02 § 2). Cleanup never touches the running container.
- **NFO writing**: doc 02 § 11 covers writing override NFOs. This doc only filters incoming NFOs.
- **Source deletion**: never. The source download is read-only to this pipeline; the user removes it manually post-import.

---

## 11. TL;DR

| Step | What | Where |
|---|---|---|
| 1 | Audit (dry-run) | `cleanup-import.sh "$SRC"` |
| 2 | Apply (quarantine) | `cleanup-import.sh --apply "$SRC"` → prints staging path |
| 3 | Review quarantine | `ls ~/.jellyfin-quarantine/$(date +%F)/` |
| 4 | Normalize filenames | doc 08, takes staging path as input |
| 5 | Move to live tree | manual `mv "$STAGING" /home/user/media/...` |
| 6 | Refresh library | `POST /Library/Refresh` (doc 02) |
| 7 | Confirm quarantine | `cleanup-import.sh --confirm-quarantine YYYY-MM-DD` |
| 8 | Delete source | manual, only after Jellyfin shows the item correctly |

The hard rule, repeated: **the source download is never modified, the live media tree is never written by cleanup, and Windows executables never reach a Jellyfin user's browser.**
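
The last clause of the hard rule can be checked mechanically before step 5. The sketch below is an assumption, not part of `cleanup-import.sh`: a hypothetical helper, `assert_no_executables`, that scans a directory for any file whose name matches the security-override list from § 9 and returns non-zero if one is found. It relies on `find -iname`, which GNU and BSD `find` provide but POSIX does not require.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cleanup-import.sh): fail if any file
# under DIR still matches the security-override patterns from § 9.
assert_no_executables() {
  local dir="$1" hits
  [[ -d "$dir" ]] || { echo "not a directory: $dir" >&2; return 2; }

  # Case-insensitive match, mirroring the lowercase comparison in classify().
  hits="$(find "$dir" -type f \( \
      -iname '*.exe' -o -iname '*.msi' -o -iname '*.bat' -o -iname '*.cmd' \
      -o -iname '*.com' -o -iname '*.scr' -o -iname '*.ps1' -o -iname '*.vbs' \
      -o -iname '*.wsf' -o -iname '*.hta' -o -iname '*.jar' -o -iname '*.dll' \
      -o -iname '*.sys' -o -iname '*.url' -o -iname '*.website' \
      -o -iname '*.lnk' -o -iname 'autorun.inf' \) -print)"

  if [[ -n "$hits" ]]; then
    # One BLOCKED line per offending path, on stderr.
    printf '%s\n' "$hits" | sed 's/^/BLOCKED /' >&2
    return 1
  fi
  echo "OK: no Windows executables or shortcuts under $dir"
}
```

One way to use it is to chain it in front of the manual move, for example `assert_no_executables "$STAGING" && mv "$STAGING" /home/user/media/...`, so the move never runs while a blocked file is still in staging.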