# 07 — Pre-Import Cleanup Ruleset (arrflix.s8n.ru)
Add pre-import cleanup + filename normalization rulesets - 07-pre-import-cleanup: 1002-line ruleset for stripping non-media junk before files land in /home/user/media/. Catalogs 10 categories (codec promo, group brag, promo images, OS thumb caches, samples, sub leftovers, torrent residue, proof folders, multi-disc cruft, Win executables). NFO discriminator uses 4096-byte head + XML-root regex (covers prologue case the brief 100-byte version misses). 15 auto-delete security categories (.exe/.msi/.bat/.scr/...); threat model = friend clicking 'Download original' then running on Win. Verified extras folders against Jellyfin docs (lowercase 'featurettes', 'behind the scenes', etc.). Includes idempotent dry-run-default cleanup-import.sh that quarantines first, returns staging path on stdout. - 08-filename-normalization: 1853-line normative renaming ruleset. Canonical: 'Show (Year) - SXXEXX - Title.ext' for TV; '<Title> (<Year>).ext' for movies; 'Show - NNNN - Title [Sub|Dub].ext' for absolute-numbered anime. Strips group tags ([YIFY]/[RARBG]/[FS99 Joy]/[GalaxyRG]), resolution (1080p/2160p/4K), codec (x264/x265/HEVC/10bit), source (WEB-DL/BluRay/HDTV), audio (DTS-HD.MA/Atmos/5.1/AAC), release-process (PROPER/REPACK/INTERNAL), trailing -NOGRP/-RARBG/-EVO, URL refs, basename language tokens. Includes stdlib-only normalize.py: dry-run default, --apply commits, --force overwrites, audit log to /var/log/jellyfin-imports/<date>.log, idempotent. Worked Futurama before/after; flags drift on live tree (current 'Futurama/' lacks '(1999)').
2026-05-08 02:07:11 +01:00
- Last updated: 2026-05-08
- Server: Jellyfin 10.10.3 on nullstone, container `jellyfin`
- Library root inside container: `/media`
- Library root on host: `/home/user/media`

This document defines the **normative pre-import cleanup ruleset** for the
personal Jellyfin deploy. The owner downloads scene/group releases (e.g.
`Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/`) which contain a mixture
of media files and non-media junk (codec readmes, release-group brags, Windows
installer shortcuts, comparison images, OS thumbnail caches, etc.). This junk
must NOT land in `/home/user/media/` because:
1. It clutters the library and confuses scrapers.
2. Promo PNGs may be mis-identified as artwork.
3. Release-group `.nfo` files break the NFO-override flow (doc 02 § 11).
4. **Windows executables and installer shortcuts (`.exe`, `.msi`, `.website`,
`.url`, `.lnk`, `.scr`, `.bat`, `.ps1`) are a real security vector.** Even
though the Linux server cannot execute them, friends with a Jellyfin
account can download them through the web UI and run them on their PC.

Cross-linked to:
- [`01-artwork-and-images.md`](01-artwork-and-images.md) — what counts as a
recognised poster / backdrop on disk.
- [`02-metadata-and-titles.md`](02-metadata-and-titles.md) — NFO sidecar
override flow; what a "real" Jellyfin NFO looks like.
- [`03-subtitles.md`](03-subtitles.md) — which subtitle files to keep.
- [`05-file-structure-rules.md`](05-file-structure-rules.md) — canonical
folder layout. § 8 of doc 05 defines the recognised extras subfolders;
this doc enforces them at import time.
- [`08-filename-normalization.md`](08-filename-normalization.md) — the
**next** stage of the pipeline, called after this doc's
`cleanup-import.sh` has produced a clean staging tree.

Sources of truth:
- <https://jellyfin.org/docs/general/server/media/movies/> — extras subfolders
and artwork filename patterns.
- <https://jellyfin.org/docs/general/server/media/shows/> — same for series.
- <https://jellyfin.org/docs/general/server/metadata/nfo/> — NFO XML schema;
used here to distinguish a real metadata NFO from a release-group brag.
---
## 0. Top-level cleanup rules
These are non-negotiable. They wrap the doc 05 top-level rules with one
guarantee: **nothing leaves staging until cleanup has run and been
confirmed.**
1. **Never clean in-place on the source download.** The download directory
(`/home/admin/Downloads/...`) is treated as a read-only artefact until
the user explicitly approves deletion. The cleanup script copies into a
staging area and operates there.
2. **Quarantine first, delete later.** First run of the cleanup script on a
release moves junk to `~/.jellyfin-quarantine/<YYYY-MM-DD>/<release-name>/`
instead of deleting. The user reviews, then a second pass empties the
quarantine after sign-off. Subsequent runs on the same release are
idempotent.
3. **Two-list policy.** Every file is matched against an `ALLOW` list (KEEP)
or a `DENY` list (DELETE). Anything not on either list is **flagged** and
surfaced in the audit report — a human decides. Never auto-delete on
"unknown".
4. **Never run cleanup as root.** All operations run as the unprivileged
`admin` (onyx) or `user` (nullstone) account. The live `/home/user/media/`
tree is touched only by the rename step in doc 08, after cleanup has
produced an intermediate staging copy.
5. **Idempotent.** Running cleanup twice on the same source must produce the
same staging tree: identical file contents, and identical
`find . -printf '%p %s\n' | sort` output when run from the staging root
(timestamps excluded).
6. **Dry-run is the default.** The cleanup script with no flags lists what it
*would* do and exits without writing. `--apply` is required to actually
move/quarantine files.
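
One way to check rule 5 after two runs is to fingerprint each staging tree by relative path and content hash, ignoring timestamps. A minimal sketch, assuming GNU coreutils/findutils; `tree_fingerprint` is a hypothetical helper name, not part of `cleanup-import.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cleanup-import.sh): print "<sha256>  ./<relpath>"
# for every regular file under a directory, in a stable byte-wise order.
# Timestamps and ownership are deliberately ignored; only paths and contents count.
tree_fingerprint() {
  local dir=$1
  (
    cd -- "$dir" || exit 1
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum
  )
}
# Usage: diff <(tree_fingerprint "$run1") <(tree_fingerprint "$run2") && echo identical
```

Hashing contents rather than comparing sizes is what makes the check genuinely byte-for-byte; the cost is reading every file, which is acceptable for an occasional verification run.
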
---
## 1. Categorical taxonomy of non-media files in scene/group releases
Scene and group ("p2p") releases follow loose conventions. The following
categories cover everything observed in the wild plus everything in the
Futurama download set:
### 1.1 Codec / player promotion
Text files and Windows shortcut files steering the user toward a specific
codec pack or media player (often K-Lite + MPC-HC). Frequently the file is
an `.url` or `.website` (Internet Shortcut) pointing to a third-party
installer. **Always DELETE.**
Real-world examples (`/home/admin/Downloads/futrama/`):
- `How to play HEVC (THIS FILE).txt` — 65 lines of MPC-HC marketing.
- `Ninite K-Lite Codecs Unattended Silent Installer and Updater.website`:
an Internet Shortcut containing `URL=https://ninite.com/klitecodecs/`.

Patterns:
- `How to play *.txt`, `Read*Me*.txt`, `INSTALL*.txt`, `PLAY*.txt`
- `*.website`, `*.url`, `*.lnk`
- `K-Lite*`, `MPC-HC*`, `VLC*`, `MX Player*`, `LAV*`
### 1.2 Release-group brag
Plain-text or `.nfo` files where the release group identifies itself,
documents encoder settings, or advertises its tracker URL. Distinguishable from a
**Jellyfin-compatible metadata NFO** (XML, root `<movie>` / `<tvshow>` /
`<episodedetails>`) by content — see § 3.
Real-world examples:
- `Encoded by JoyBell (UTR).txt` — 41-line manifesto from "Unity Team
Release group" pointing to `UNITEAM.CO`.
- `RARBG.txt`, `WWW.YIFY-TORRENTS.COM.url`, `<group>.nfo` with ASCII art.

Patterns:
- `Encoded by *.txt`, `Ripped by *.txt`, `<GROUP>.txt`
- `RARBG.txt`, `RARBG_DO_NOT_MIRROR.exe` (yes, those exist; § 1.10)
- `*-readme.txt`, `release notes.txt`
- `*.nfo` containing only ASCII art (no `<movie>` / `<tvshow>` /
`<episodedetails>` root element)
- `*.diz`, `file_id.diz` — old "BBS description" file, scene leftover
### 1.3 Promo images that are NOT poster artwork
Images that LOOK like artwork to a naive globber but are actually before/after
comparisons, group banners, or screenshot proofs. **Delete unless they live
inside a recognised extras folder (§ 4) or match the strict allow-list of
poster/backdrop names from doc 01.**
Real-world example:
- `Futurama Compare.png` (1.05 MB) — encoder before/after comparison.

Patterns to delete:
- `*Compare*.{png,jpg,jpeg,webp}`
- `*Sample*.{png,jpg,jpeg}` (when not in a `samples/` extras folder)
- `*Screen*.{png,jpg}`, `*Screens/*`, `*Proof/*`, `*Preview/*`
- `*-banner.png` from a group (NOT the same as Jellyfin's `banner.jpg`;
group banners typically have the group name in the filename — heuristic
match `*JoyBell*`, `*UTR*`, `*JoY*`, etc.)
- Stray `*.gif` files (animated previews); Jellyfin doesn't use GIF.
### 1.4 OS-generated thumbnail caches
Per-OS file managers (Windows Explorer, macOS Finder, GNOME Files) leave
turds in every directory they browse. **Always DELETE — never useful, never
metadata.**
Patterns:
- `Thumbs.db`, `ehthumbs.db`, `ehthumbs_vista.db`
- `.DS_Store`, `._*` (macOS resource forks)
- `Desktop.ini`, `desktop.ini`
- `.directory` (KDE)
- `.fseventsd/`, `.Spotlight-V100/`, `.Trashes/` (macOS)
- `$RECYCLE.BIN/`, `System Volume Information/` (Windows mount)
### 1.5 Sample files (lower-quality previews)
Scene releases sometimes ship a 30-second sample file at lower bitrate.
Jellyfin treats a `samples/` subfolder as extras (doc 05 § 8.2), but a stray
`Movie.sample.mkv` next to the main file would scrape as "another version".
**Default: DELETE.** Reasoning: we have the full file; the sample is dead
weight. If the user genuinely wants samples, drop them into a `samples/`
subfolder before running cleanup and the script will preserve the folder.
Patterns to delete (when at the top level of a release):
- `sample.{mkv,mp4,avi,m4v}`
- `*-sample.{mkv,mp4,avi,m4v}`, `*.sample.{mkv,mp4,avi,m4v}`
- `*_sample.{mkv,mp4,avi,m4v}`
- `Sample/` directory (rename to `samples/` to preserve as extras, OR delete)
### 1.6 Subtitle leftovers
VobSub (DVD/Blu-ray bitmap subs) are shipped as a pair: `en.idx` (index) +
`en.sub` (bitmap stream). Jellyfin can render them, but if a `.srt` exists
with the same language tag the bitmap pair is redundant and slow.
**Default: KEEP all `.srt` and `.ass`. KEEP `.idx`/`.sub` only if no `.srt`
of the same language exists.** This is a per-file decision — surface to the
user in the audit report rather than auto-pruning.
Patterns:
- `*.srt`, `*.ass`, `*.ssa`, `*.vtt` — KEEP (per doc 03).
- `*.sup` (PGS bitmap, Blu-ray) — KEEP (Jellyfin renders).
- `*.idx` + `*.sub` (VobSub) — KEEP if no `.srt` with same lang code; else
flag for human review.
- `*.smi`, `*.rt` — DELETE (obsolete formats Jellyfin doesn't support).
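
The VobSub rule reduces to one test per `.idx`: is there an `.srt` with the same stem, and hence the same language tag? A sketch, assuming the language tag is part of the filename stem (`Movie.en.idx` / `Movie.en.srt`) and that extensions are lowercase; `vobsub_action` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide KEEP/FLAG for one VobSub .idx file.
# Assumes the language tag lives in the stem, e.g. "Movie.en.idx" pairs with
# "Movie.en.sub" and is redundant with "Movie.en.srt". Lowercase extensions only.
vobsub_action() {
  local idx=$1
  local stem=${idx%.*}            # "Movie.en.idx" -> "Movie.en"
  if [[ -f $stem.srt ]]; then
    echo "FLAG"                   # same-language SRT exists: a human decides
  else
    echo "KEEP"                   # bitmap subs are the only copy of this language
  fi
}
```

It never returns DELETE: even the redundant case is only flagged, matching the "surface to the user rather than auto-pruning" policy above.
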
### 1.7 Torrent residue
Files left by the torrent client itself. None are useful to Jellyfin.
Patterns to delete:
- `*.torrent`, `*.magnet`
- `*.parts`, `*.!ut`, `*.!qB`, `*.bc!` (in-progress fragments)
- `*.meta`, `*.aria2`
- `*.pad`, `padding/`, `__padding_file_*` (mktorrent padding)
- `*.sfv` (checksum manifest; harmless but useless after download)
- `*.md5`, `*.sha1`, `*.sha256` (release-checksum sidecars)
### 1.8 Test / proof images and folders
Some groups ship a `Proof/` or `Screens/` folder with screenshots to "prove"
the rip's quality. Useless inside a Jellyfin library.
Patterns to delete (whole folders):
- `Proof/`, `proof/`, `PROOF/`
- `Screens/`, `screens/`, `Screenshots/`, `Caps/`
- `Preview/`, `Previews/`
- `_screens/`, `screenshots-only/`
### 1.9 Multi-disc DVD/Blu-ray cruft
When a release is a straight ISO rip, the `VIDEO_TS/` or `BDMV/` directory
sometimes survives next to the encoded file. Jellyfin can play
`VIDEO_TS.IFO` directly, but a partial DVD structure left over from the
encode is just clutter.
Patterns:
- `VIDEO_TS/`: KEEP if it contains a complete DVD structure (`VIDEO_TS.IFO`
plus the `VTS_*.IFO` / `VTS_*.VOB` title-set files); otherwise flag.
- `*.IFO`, `*.BUP`, `*.VOB` — KEEP if inside a complete `VIDEO_TS/`;
DELETE if loose.
- `BDMV/`, `CERTIFICATE/`, `AACS/` — KEEP if complete BD structure;
flag if partial.
- `*.iso` inside a media folder — flag for human review (could be the
intentional rip OR a Windows malware vector — see § 8).
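
This section does not pin down what a "complete" `VIDEO_TS/` is. One plausible heuristic, offered as an assumption rather than a rule: the menu file `VIDEO_TS.IFO` exists and at least one title set is present (a `VTS_NN_0.IFO` plus at least one `VTS_NN_[1-9].VOB` content file). The sketch relies on the DVD-Video convention of uppercase names; `videots_is_complete` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: exit 0 if DIR looks like a playable DVD VIDEO_TS folder.
# The body runs in a subshell so `shopt -s nullglob` does not leak to the caller.
videots_is_complete() (
  shopt -s nullglob                                   # unmatched globs expand to nothing
  local d=$1
  local vts_ifo=( "$d"/VTS_[0-9][0-9]_0.IFO )         # title-set info files
  local vts_vob=( "$d"/VTS_[0-9][0-9]_[1-9].VOB )     # title-set content files
  [[ -f $d/VIDEO_TS.IFO ]] && (( ${#vts_ifo[@]} > 0 && ${#vts_vob[@]} > 0 ))
)
```

A folder failing this test would be FLAGGED, not deleted, so a false negative costs a manual look rather than data.
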
### 1.10 Outright malicious / suspicious
Some releases historically shipped Windows executables disguised as
"DO NOT MIRROR" anti-leech files. Even on a Linux server these must be
deleted because the friend with a Jellyfin account can download them via
the web UI ("Download original file" button) and run them locally.
**Always DELETE, never quarantine, never preserve, no exceptions.**
Patterns:
- `*.exe`, `*.msi`, `*.bat`, `*.cmd`, `*.com`, `*.scr`, `*.ps1`, `*.vbs`,
`*.wsf`, `*.hta`, `*.jar`
- `*.app/` (macOS bundle dropped by macOS-using uploader)
- `*.dll`, `*.sys` (rare, but seen)
- Anything with a double extension like `Movie.mkv.exe`
---
## 2. KEEP vs DELETE — exhaustive table
This table is the **canonical decision matrix** for `cleanup-import.sh`.
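Patterns are matched case-insensitively (ext4 itself is case-sensitive, so
the script must use `find -iname` or equivalent). `KEEP` means it goes to the
staging tree; `DELETE` means it goes to quarantine on first run, then to the
recycle bin on confirm. The § 8 security set is the exception: it is deleted
outright, never quarantined.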
| Pattern | Action | Why |
|---|---|---|
| `*.mkv`, `*.mp4`, `*.avi`, `*.m4v`, `*.ts`, `*.mov`, `*.webm`, `*.wmv`, `*.flv`, `*.mpg`, `*.mpeg` | **KEEP** | Media — the entire point. |
| `*.srt`, `*.ass`, `*.ssa`, `*.vtt`, `*.sup` | **KEEP** | Subtitles (doc 03). |
| `*.idx` + `*.sub` (VobSub pair) | **KEEP** if no `.srt` of same lang exists; else **FLAG** | Bitmap subs; redundant with SRT. |
| `*.smi`, `*.rt` | **DELETE** | Obsolete subtitle formats; Jellyfin can't render. |
| `folder.{jpg,png}`, `poster.{jpg,png}`, `cover.{jpg,png}`, `default.{jpg,png}`, `show.{jpg,png}`, `jacket.{jpg,png}`, `movie.{jpg,png}` | **KEEP** | Jellyfin-recognised primary artwork (doc 01). |
| `backdrop.{jpg,png}`, `fanart.{jpg,png}`, `background.{jpg,png}`, `art.{jpg,png}`, `backdrop[0-9]*.{jpg,png}`, `backdrop-[0-9]*.{jpg,png}` | **KEEP** | Jellyfin-recognised backdrops (doc 01). |
| `logo.{png,jpg}`, `clearlogo.{png,jpg}`, `banner.{jpg,png}`, `landscape.{jpg,png}`, `thumb.{jpg,png}`, `disc.{png,jpg}`, `clearart.{png,jpg}` | **KEEP** | Jellyfin-recognised auxiliary artwork. |
| `season[0-9]*-poster.{jpg,png}`, `season[0-9]*.{jpg,png}`, `season-specials-poster.{jpg,png}` | **KEEP** | Per-season artwork (doc 01 / TV layout). |
| `extrafanart/*.{jpg,png}`, `backdrops/*.{jpg,png,mp4}` | **KEEP** | Multi-backdrop folders (doc 05 § 8). |
| `*.nfo` with XML root `<movie>` / `<tvshow>` / `<season>` / `<episodedetails>` / `<artist>` / `<album>` / `<musicvideo>` | **KEEP** | Jellyfin-compatible metadata sidecar (doc 02 § 11). |
| `*.nfo` without one of the above XML roots | **DELETE** | Release-group ASCII-art brag — pretends to be metadata, isn't. |
| `*Compare*.{png,jpg,jpeg,webp,gif}` | **DELETE** | Encoder before/after — group promo. |
| `*Sample*.{png,jpg,jpeg}` (image, top level) | **DELETE** | Group promo (NOT a Jellyfin sample folder). |
| `*Screen*.{png,jpg}`, `Screens/`, `Screenshots/`, `Caps/` | **DELETE** | Proof shots. |
| `Proof/`, `proof/`, `PROOF/` | **DELETE** (whole folder) | Quality-proof shots. |
| `Preview/`, `Previews/` | **DELETE** (whole folder) | Lower-quality teaser. |
| `*.txt` (any) | **DELETE** | Readme / group brag — Jellyfin doesn't read TXT. |
| `*.diz`, `file_id.diz` | **DELETE** | Scene description file — obsolete. |
| `*.website`, `*.url`, `*.lnk` | **DELETE** | Windows Internet Shortcut — points at codec/installer pages. **Security: § 8.** |
| `*.exe`, `*.msi`, `*.bat`, `*.cmd`, `*.com`, `*.scr`, `*.ps1`, `*.vbs`, `*.wsf`, `*.hta`, `*.js`, `*.jar`, `*.dll`, `*.sys` | **DELETE** | Windows executable or script. **Security: § 8.** |
| `Autorun.inf` | **DELETE** | Windows autorun config. **Security: § 8.** |
| `*.app/` | **DELETE** (whole folder) | macOS bundle. |
| `Thumbs.db`, `ehthumbs.db`, `ehthumbs_vista.db` | **DELETE** | Windows Explorer thumbnail cache. |
| `.DS_Store`, `._*` | **DELETE** | macOS Finder. |
| `Desktop.ini`, `desktop.ini` | **DELETE** | Windows folder customisation. |
| `.directory` | **DELETE** | KDE Dolphin. |
| `.fseventsd/`, `.Spotlight-V100/`, `.Trashes/`, `$RECYCLE.BIN/`, `System Volume Information/` | **DELETE** (whole folder) | OS metadata directories. |
| `sample.{mkv,mp4,avi,m4v}` (top level) | **DELETE** | Lower-quality preview (doc 05 § 8.1: full file already present). |
| `*-sample.{mkv,mp4,avi,m4v}`, `*_sample.{mkv,mp4,avi,m4v}`, `*.sample.{mkv,mp4,avi,m4v}` | **DELETE** | Same. |
| `Sample/` (directory, top level) | **DELETE** | Lower-quality preview folder. |
| `samples/` (directory, recognised name) | **KEEP** | Jellyfin extras folder (doc 05 § 8.2). |
| `featurettes/`, `behind the scenes/`, `deleted scenes/`, `interviews/`, `scenes/`, `shorts/`, `clips/`, `trailers/`, `extras/`, `other/`, `theme-music/`, `backdrops/` | **KEEP** (whole folder) | Jellyfin extras (doc 05 § 8.2). |
| `Featurettes/`, `Behind The Scenes/`, etc. (capitalised) | **KEEP** but **rename to lowercase** | Jellyfin matches case-insensitively, but lowercase is the documented form. |
| Any other folder name | **FLAG** | Surface to human; might be a typo of an extras folder. |
| `*.torrent`, `*.magnet` | **DELETE** | Torrent client residue. |
| `*.parts`, `*.!ut`, `*.!qB`, `*.bc!`, `*.aria2` | **DELETE** | In-progress download fragments (shouldn't be here, but defensive). |
| `*.meta` | **DELETE** | aria2/torrent metadata. |
| `*.pad`, `padding/`, `__padding_file_*`, `_____padding_file_*` | **DELETE** | mktorrent padding files. |
| `*.sfv`, `*.md5`, `*.sha1`, `*.sha256` | **DELETE** | Checksum manifests; harmless but useless after download. |
| `*.rar`, `*.r[0-9][0-9]`, `*.zip`, `*.7z`, `*.tar`, `*.tar.gz` | **FLAG** | Compressed archive in a media folder is suspicious: the release should have been extracted before import. |
| `*.iso` inside a media folder | **FLAG** | Could be intentional DVD/BD rip OR Windows-installer disguise. Human review. |
| `VIDEO_TS/` (complete) | **KEEP** | Jellyfin plays DVD structure directly. |
| `*.IFO`, `*.BUP`, `*.VOB` (loose, no `VIDEO_TS/`) | **DELETE** | Orphan DVD remnants. |
| `BDMV/` (complete) | **KEEP** | Jellyfin plays BD structure. |
| `CERTIFICATE/`, `AACS/` (without `BDMV/`) | **DELETE** | Orphan BD remnants. |
| `RARBG*.{txt,exe}`, `WWW.*.url`, `*.YIFY*.url` | **DELETE** | Tracker promo. |
| `RARBG_DO_NOT_MIRROR.exe` and similar | **DELETE** (security: § 8) | Historic anti-leech file; sometimes weaponised. |
| Anything else | **FLAG** | Two-list policy: never auto-delete on "unknown". |
---
## 3. NFO handling — the nuanced case
`.nfo` is overloaded. Two completely different file kinds share the
extension:
- **Scene release `.nfo`** — plain text, ASCII art, encoder credits, tracker
URL. Useless to Jellyfin (and at worst gets scraped as garbage metadata
if NFO Saver is enabled).
- **Jellyfin/Kodi/Emby metadata NFO** — XML, root element is one of
`<movie>`, `<tvshow>`, `<season>`, `<episodedetails>`, `<artist>`, `<album>`,
`<musicvideo>`. Documented in doc 02 § 11.
### 3.1 The discriminator one-liner
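```bash
is_jellyfin_nfo() {
  # Returns 0 (KEEP) if the file looks like a Jellyfin/Kodi NFO,
  # 1 (DELETE) if it looks like scene-group ASCII-art brag.
  # -a forces text mode (scene NFOs are often CP437 art); LC_ALL=C makes
  # grep treat every byte as one character regardless of the locale.
  head -c 4096 -- "$1" \
    | LC_ALL=C grep -aqE '<(movie|tvshow|season|episodedetails|artist|album|musicvideo)([[:space:]/>]|$)'
}
# Usage:
if is_jellyfin_nfo "$f"; then echo "KEEP $f"; else echo "DELETE $f"; fi
```
The first 4096 bytes are enough: a real Jellyfin NFO declares its root
within the first kilobyte. No whitespace stripping is needed, because XML
forbids whitespace between `<` and the element name. The character class
after the name accepts `<movie>`, `<movie lang="en">`, `<tvshow/>` and a
start tag whose attributes continue on the next line, while still rejecting
longer names such as `<movies>`. Deleting whitespace first (e.g. with
`tr -d '[:space:]'`) would actually break the check: `<movie lang="en">`
becomes `<movielang="en">`, which no longer ends the element name at a
word boundary.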
### 3.2 Edge cases
- An NFO with both ASCII art **and** an XML root: KEEP. Jellyfin's parser
ignores leading non-XML noise as long as the XML element parses.
- An NFO with a different XML root (e.g. `<root>`, `<info>`): DELETE.
Jellyfin won't read it; nothing to preserve.
- An NFO with valid XML but **stale TMDB/IMDB IDs** that conflict with a
newer scrape: KEEP, but flag for the user — doc 02 § 11.5 explains how
the NFO Saver overwrites these on next scrape.
- Multiple NFOs in one folder (e.g. `release.nfo` from the group AND
`tvshow.nfo` from a previous Jellyfin write): KEEP `tvshow.nfo`,
DELETE `release.nfo`. Use the discriminator above on each.
### 3.3 First-100-bytes shortcut
A shorter variant that inspects only the first 100 bytes is sometimes suggested:
```bash
if head -c 100 file.nfo | grep -qE '<(movie|tvshow|episodedetails)\b'; then echo KEEP; else echo DELETE; fi
```
This works for the common case but misses NFOs that start with an XML
declaration (`<?xml version="1.0"?>` plus possibly a comment) before the
root element — that prologue alone can be > 100 bytes. The 4096-byte
version above is safer; we use that in `cleanup-import.sh`.
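
The prologue failure mode is easy to reproduce. This self-contained demo writes a made-up NFO whose XML declaration and comment push `<movie>` past byte 100, then runs the same root-element test against a 100-byte and a 4096-byte head; `check` is an illustrative function only:

```bash
#!/usr/bin/env bash
# Demo: an XML prologue (declaration + comment) longer than 100 bytes.
tmp=$(mktemp)
{
  printf '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
  printf '<!-- written by a metadata manager; this comment pads the prologue -->\n'
  printf '<movie>\n  <title>Example</title>\n</movie>\n'
} > "$tmp"

# check N: do the first N bytes of $tmp contain a Jellyfin NFO root element?
check() {
  head -c "$1" "$tmp" \
    | LC_ALL=C grep -aqE '<(movie|tvshow|episodedetails)([[:space:]/>]|$)'
}

if check 100;  then echo "100-byte head: KEEP";  else echo "100-byte head: DELETE";  fi
if check 4096; then echo "4096-byte head: KEEP"; else echo "4096-byte head: DELETE"; fi
rm -f -- "$tmp"
```

The two prologue lines are 127 bytes long, so the script prints `100-byte head: DELETE` followed by `4096-byte head: KEEP`: the short check would have thrown away a valid metadata file.
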

---
## 4. Featurettes / Extras / Bonus folders — the canonical list
Per the Jellyfin docs (movies and shows pages), these subfolder names are
recognised and the contained files are tagged with the matching extra
type. **Folder name match is case-insensitive but lowercase is the
documented canonical form** — `cleanup-import.sh` lowercases on copy to
staging.
| Folder name | Extra type | Notes |
|---|---|---|
| `behind the scenes` | Behind The Scenes | spaces, not dashes |
| `deleted scenes` | Deleted Scene | |
| `interviews` | Interview | |
| `scenes` | Scene | |
| `samples` | Sample | distinct from a top-level `Sample/` (§ 1.5) |
| `shorts` | Short | |
| `featurettes` | Featurette | |
| `clips` | Clip | |
| `other` | Other | catch-all |
| `extras` | Extra | generic catch-all |
| `trailers` | Trailer | |
| `theme-music` | Theme music | `.mp3` files; doc 05 § 8.3 |
| `backdrops` | Backdrop video | rotating video backgrounds |

Anything else (e.g. `Bonus Features/`, `BTS/`, `Special Features/`,
`Featurette/` singular, `behind-the-scenes/` with dashes) is **NOT** matched
by Jellyfin and the contents won't surface as extras. Cleanup either
renames to the canonical name (when the mapping is unambiguous) or flags
for human review.
### 4.1 Canonical-name mapping (auto-rename)
| Found | Renamed to |
|---|---|
| `Featurettes/`, `Featurette/`, `FEATURETTES/` | `featurettes/` |
| `Behind The Scenes/`, `BTS/`, `behind-the-scenes/` | `behind the scenes/` |
| `Deleted Scenes/`, `Deleted_Scenes/`, `deleted-scenes/` | `deleted scenes/` |
| `Interviews/`, `Interview/` | `interviews/` |
| `Trailers/`, `Trailer/` | `trailers/` |
| `Bonus/`, `Bonus Features/`, `Bonus Material/`, `Special Features/`, `Specials/` | `extras/` (generic catch-all) |
| `Outtakes/`, `Bloopers/`, `Gag Reel/` | `extras/` (no dedicated folder) |

The `Specials/` row needs care. For a TV series, `Specials/` looks like a
season folder (Season 0 specials): if the files inside are featurettes
rather than aired specials, leaving them there mis-scrapes them as
episodes, so they belong in `extras/`; if they are genuine aired specials,
renaming the folder hides them from Season 0. Only apply this rename when
the contents are clearly featurettes. When in doubt, flag.
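
The mapping table translates directly into a `case` statement. A sketch, assuming bash 4+ for `${name,,}`; `canonical_extras_name` is a hypothetical helper. It takes the conservative reading of the `Specials/` caveat and returns "no mapping" (FLAG) for `Specials/` instead of auto-renaming it, and it passes through names that are already canonical:

```bash
#!/usr/bin/env bash
# Hypothetical: print the canonical Jellyfin extras folder name for a found folder,
# or return 1 (=> FLAG for a human) when there is no unambiguous mapping.
canonical_extras_name() {
  local name=${1%/}          # accept "Featurettes/" or "Featurettes"
  name=${name,,}             # compare case-insensitively (bash >= 4)
  case $name in
    featurettes|featurette)                          echo "featurettes" ;;
    "behind the scenes"|bts|behind-the-scenes)       echo "behind the scenes" ;;
    "deleted scenes"|deleted_scenes|deleted-scenes)  echo "deleted scenes" ;;
    interviews|interview)                            echo "interviews" ;;
    trailers|trailer)                                echo "trailers" ;;
    bonus|"bonus features"|"bonus material"|"special features") echo "extras" ;;
    outtakes|bloopers|"gag reel")                    echo "extras" ;;
    scenes|samples|shorts|clips|other|extras|theme-music|backdrops) echo "$name" ;;
    *)                                               return 1 ;;  # incl. "specials"
  esac
}
```

For the Futurama download, `canonical_extras_name 'Featurettes/'` yields `featurettes`, which is the rename § 4.2 expects.
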
### 4.2 Real-world example: Futurama download
The four Futurama season folders all contain a `Featurettes/` subfolder:
```
Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Featurettes/
├── Episode One Animatic.mkv
└── Welcome to the World of Tomorrow.mkv
Futurama Season 2 .../Featurettes/
├── Animatic -Why Must I be a Crustacean in Love.mkv
└── Futurama Game Trailer.mkv
Futurama Season 3 .../Featurettes/
├── An X-Mas Message From David X. Cohen.mkv
└── Deleted Scenes.mkv
Futurama Season 4 .../Featurettes/
├── Futurama Welcome to the World of Tomorrow (x265 Joy).mkv
├── Outtakes - Kif Gets Knocked Up a Notch [1080p x265 10bit Joy].mkv
└── Panel on Voice Actors [1080p x265 10bit Joy].mkv
```
After cleanup these become `featurettes/` (lowercase) inside the season
folder. Doc 08 (filename normalization) then renames the season folder
itself to `Season 01/` and may relocate the season-level featurettes to a
**series-level** `featurettes/` folder if the user prefers extras at the
series root (this is a doc 05 § 8 / doc 08 decision, not this doc's).
> Note: `Season 3 / Deleted Scenes.mkv` is a single file and should arguably
> be moved into a `deleted scenes/` subfolder rather than left in
> `featurettes/`. That's a manual disambiguation — flagged, not auto-moved.
---
## 5. Audit-then-clean workflow
Three-stage pipeline. Stage 1 is mandatory; stage 2 runs on user approval;
stage 3 is reversible until the quarantine retention window expires.
### 5.1 Stage 1 — Dry-run audit
Lists every file in the source release classified as KEEP / DELETE / FLAG.
Writes nothing.
```bash
# Dry-run audit on a single release dir.
cleanup-import.sh "/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]"
```
Output (one line per file):
```
KEEP Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
KEEP folder.jpg
KEEP Featurettes/Episode One Animatic.mkv -> featurettes/Episode One Animatic.mkv
DELETE Encoded by JoyBell (UTR).txt [release-group brag]
DELETE How to play HEVC (THIS FILE).txt [codec promo .txt]
DELETE Ninite K-Lite Codecs Unattended Silent ....website [windows .website -- SECURITY]
DELETE Futurama Compare.png [encoder compare image]
FLAG SomeUnknownFile.bin [unknown extension]
```
A **summary** at the bottom:
```
KEEP 16 files (5.92 GiB)
DELETE 4 files (1.08 MiB)
FLAG 0 files
Run with --apply to quarantine the DELETE set.
```
Quick one-liner equivalents (for ad-hoc spot checks; the script in § 9 is
preferred):
```bash
# What would I delete? (*.nfo hits still need the XML-root check below.)
find "$SRC" \( \
     -iname '*.txt' -o -iname '*.nfo' -o -iname '*.url' -o -iname '*.website' \
  -o -iname '*.lnk' -o -iname '*.exe' -o -iname '*.msi' -o -iname '*.bat' \
  -o -iname '*.scr' -o -iname '*.ps1' -o -iname '*.cmd' -o -iname '*.com' \
  -o -iname 'Thumbs.db' -o -iname '.DS_Store' -o -iname 'Desktop.ini' \
  -o -iname '*Compare*.png' -o -iname '*Compare*.jpg' \
  -o -iname 'sample.mkv' -o -iname '*.sample.mkv' -o -iname '*-sample.mkv' \
  -o -iname '*.torrent' -o -iname '*.sfv' -o -iname '*.md5' \
  \) -print
# What looks like a real Jellyfin NFO vs a release-group brag?
find "$SRC" -iname '*.nfo' -print0 | while IFS= read -r -d '' f; do
  if head -c 4096 "$f" \
      | LC_ALL=C grep -aqE '<(movie|tvshow|season|episodedetails|artist|album|musicvideo)([[:space:]/>]|$)'; then
    printf 'KEEP %s\n' "$f"
  else
    printf 'DELETE %s\n' "$f"
  fi
done
```
### 5.2 Stage 2 — Quarantine apply
```bash
cleanup-import.sh --apply "/home/admin/Downloads/futrama/Futurama Season 1 [...]"
```
What it does:
1. **Copies** the source directory tree to
`/home/admin/.jellyfin-staging/<release-name>/`. The source is never
modified.
2. Inside the staging copy, **moves** every DELETE-classified file to
`/home/admin/.jellyfin-quarantine/<YYYY-MM-DD>/<release-name>/`,
preserving relative paths so a user can `diff -r` to confirm.
3. **Renames** non-canonical extras subfolders to canonical lowercase
(§ 4.1).
4. Writes a manifest at
`/home/admin/.jellyfin-staging/<release-name>/.cleanup-manifest.json`
listing every file action with sha256, source path, action, target
path. This is what stage 3 reads.
5. Returns the staging path on stdout — that's the input to doc 08's
filename normalizer.
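
Step 2's "move, preserving relative paths" can be sketched as below. This is an illustration under assumptions, not the actual `cleanup-import.sh` code: `quarantine_move` is a hypothetical helper, it takes a path relative to the staging root, and it refuses to overwrite anything already in quarantine:

```bash
#!/usr/bin/env bash
# Hypothetical: move one DELETE-classified file from staging into quarantine,
# recreating its relative directory path under the quarantine root.
# Example: quarantine_move "$HOME/.jellyfin-staging/$rel" "Futurama Compare.png" \
#            "$HOME/.jellyfin-quarantine/$(date +%F)/$rel"
quarantine_move() {
  local staging=$1 rel=$2 qroot=$3
  local src="$staging/$rel" dest="$qroot/$rel"
  mkdir -p -- "$(dirname -- "$dest")" || return 1
  mv -n -- "$src" "$dest" || return 1
  # mv -n may skip silently when $dest already exists; treat a leftover source as failure.
  [[ ! -e $src && -e $dest ]]
}
```

Keeping the relative path identical on both sides is what lets `diff -r` between staging and quarantine show exactly which files were pulled out.
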
### 5.3 Stage 3 — Confirm and recycle
After the user reviews the quarantine directory and approves:
```bash
cleanup-import.sh --confirm-quarantine 2026-05-08
```
Moves `/home/admin/.jellyfin-quarantine/2026-05-08/` to the system trash
(via `gio trash`) — still recoverable, but no longer cluttering the
quarantine root. After 30 days a cron sweep empties trash older than that.
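
A sketch of the stage 3 entry point, assuming the `~/.jellyfin-quarantine/<YYYY-MM-DD>/` layout from § 0 and that `gio` (GLib) is installed. The date is validated first so that a typo or a value like `../..` can never hand an arbitrary directory to `gio trash`; `confirm_quarantine` is a hypothetical name, and the 30-day trash sweep is not shown:

```bash
#!/usr/bin/env bash
# Hypothetical stage-3 helper: send one dated quarantine directory to the trash.
confirm_quarantine() {
  local day=$1
  # Accept only YYYY-MM-DD so the argument can never point outside the quarantine root.
  if [[ ! $day =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "confirm_quarantine: expected YYYY-MM-DD, got '$day'" >&2
    return 2
  fi
  local qdir="$HOME/.jellyfin-quarantine/$day"
  if [[ ! -d $qdir ]]; then
    echo "confirm_quarantine: no quarantine at $qdir" >&2
    return 1
  fi
  gio trash "$qdir"        # recoverable from the desktop trash until it is emptied
}
```
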
### 5.4 Never delete from source
The source download (`/home/admin/Downloads/futrama/...`) is **never**
modified by `cleanup-import.sh`. Reasons:
- The user may want to re-seed the torrent.
- The user may want to re-run cleanup with different rules later.
- Bugs in the cleanup script must never destroy original artefacts.

Source deletion is a separate manual step the user does AFTER the
import is verified in Jellyfin and the library is happy. There is no
script for it on purpose.

---
## 6. Idempotency, edge cases, and "unknown" handling
- **Idempotent.** `cleanup-import.sh --apply` on an already-cleaned staging
directory is a no-op (nothing matches DELETE). The script detects this
and exits 0 with `nothing to do`.
- **Re-runnable on source.** Re-running the script on the same source
produces a fresh staging copy, overwriting (after backup) the previous
staging directory. Quarantine is dated, so a second run on the same day for
the same release goes to a numbered sibling (`<release-name>.2/`, `.3/`,
etc.) instead of overwriting the first.
- **Unknown extension** (e.g. `.dat`, `.bin`, `.iso`): never
auto-deleted. FLAGGED in the audit output, surfaced to the user. The
user adds it to the local override file
`~/.config/jellyfin-cleanup/local-rules.conf` if they want it
classified next time.
- **Hidden dotfiles** (anything starting with `.` other than known OS
caches like `.DS_Store`) — FLAGGED. Don't auto-delete; could be a
legitimate `.subliminal.cache` (subtitles plugin) or similar.
- **Symlinks** — never followed. A symlink in a release directory is
always FLAGGED; the script refuses to copy or quarantine it.
- **Permission denied** — script bails with non-zero exit. Never
partially applies.
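
Two of the rules above, sketched as shell helpers. Both are hypothetical and assume GNU `find`: `next_quarantine_dir` picks the first free `<release-name>`, `<release-name>.2`, `.3`, ... path for same-day reruns, and `flag_special_entries` emits FLAG lines for symlinks (reported, never followed, since `find` does not follow links by default) and for hidden entries that are not on the known OS-cache list from § 1.4:

```bash
#!/usr/bin/env bash
# Hypothetical: print the first unused quarantine path for a release on a given day.
next_quarantine_dir() {
  local base=$1 n=2
  if [[ ! -e $base ]]; then
    printf '%s\n' "$base"
    return 0
  fi
  while [[ -e $base.$n ]]; do
    n=$((n + 1))
  done
  printf '%s\n' "$base.$n"
}

# Hypothetical: FLAG symlinks and unknown hidden entries under a release directory.
flag_special_entries() {
  local src=$1
  # Symlinks are reported, never followed (find's default -P behaviour).
  find "$src" -type l -printf 'FLAG\t%P\t[symlink]\n'
  # Dotfiles/dot-dirs not on the OS-cache list; -prune stops descent into them.
  find "$src" -mindepth 1 -name '.*' ! -type l \
    ! -name '.DS_Store' ! -name '._*' ! -name '.directory' \
    ! -name '.fseventsd' ! -name '.Spotlight-V100' ! -name '.Trashes' \
    -printf 'FLAG\t%P\t[hidden dotfile]\n' -prune
}
```
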
---
## 7. The `Futurama Compare.png` problem (artwork false-positive)
`Futurama Compare.png` is a 1.05 MB PNG sitting next to the season's MKV
files. To a naive image-globber it looks like artwork — same extension as
`folder.jpg`, larger than the typical poster, sitting in the right
location. It's actually an encoder comparison shot.
The rule from doc 01 (artwork) and enforced here:
> **An image file in the release root is KEPT only if its name is on the
> exact recognised-artwork allow-list.** Anything else is DELETED.

Recognised artwork allow-list (top level of an item folder):
- `folder.{jpg,jpeg,png,webp}`
- `poster.{jpg,jpeg,png,webp}`
- `cover.{jpg,jpeg,png,webp}`
- `default.{jpg,jpeg,png,webp}`
- `show.{jpg,jpeg,png,webp}` (series only)
- `jacket.{jpg,jpeg,png,webp}` (series only)
- `movie.{jpg,jpeg,png,webp}` (movies only)
- `backdrop.{jpg,jpeg,png,webp}` and `backdrop[0-9]*.{jpg,jpeg,png,webp}`
- `fanart.{jpg,jpeg,png,webp}`, `background.{jpg,jpeg,png,webp}`,
`art.{jpg,jpeg,png,webp}`
- `logo.{png,jpg}`, `clearlogo.{png,jpg}`
- `banner.{jpg,png}`, `landscape.{jpg,png}`, `thumb.{jpg,png}`,
`disc.{png,jpg}`, `clearart.{png,jpg}`
- `season[0-9]*-poster.{jpg,png}`, `season[0-9]*.{jpg,png}`,
`season-specials-poster.{jpg,png}`
- `extrafanart/` and `backdrops/` directories (any contents OK)

Exception: images **inside** a recognised extras folder (`extras/`,
`featurettes/`, etc.) are KEPT regardless of name — they're presumed to be
intentional content of that extra.

`Futurama Compare.png` matches none of these allow-list patterns and is
not inside an extras folder, so it's DELETED.
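
The allow-list can be expressed as a single case-insensitive regular expression over the basename. A sketch, assuming bash 4+ for `${name,,}`; `is_allowed_artwork` is a hypothetical name. It checks filenames only: it does not enforce the series-only/movies-only split, nor the `extrafanart/`, `backdrops/` or extras-folder exceptions, which need the file's parent directory. It reads `backdrop[0-9]*` strictly as `backdrop` plus digits, and also accepts the `backdrop-N` form from § 2:

```bash
#!/usr/bin/env bash
# Hypothetical: exit 0 if a basename is on the recognised-artwork allow-list.
is_allowed_artwork() {
  local name=${1##*/}      # strip any directory part
  name=${name,,}           # case-insensitive comparison (bash >= 4)
  local img='(jpe?g|png|webp)'
  local re="^((folder|poster|cover|default|show|jacket|movie|backdrop|fanart|background|art)\.${img}"
  re+="|backdrop-?[0-9]+\.${img}"
  re+="|(logo|clearlogo|banner|landscape|thumb|disc|clearart)\.(jpg|png)"
  re+="|season[0-9]+(-poster)?\.(jpg|png)"
  re+="|season-specials-poster\.(jpg|png))$"
  [[ $name =~ $re ]]
}
```

With this, `folder.jpg` and `season01-poster.png` pass while `Futurama Compare.png` fails, which is the DELETE outcome described above.
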

---
## 8. Security rules
The single most important rule in this document:
> **Windows-executable extensions and Internet Shortcut formats are
> auto-deleted, never quarantined for "review", because the threat model
> isn't the Linux server, it's the Jellyfin user who downloads them.**

Jellyfin has a "Download original file" button for every item. If a
release contains `Codec Installer.exe`, Jellyfin will happily serve it to
any user with library access — including the friend on Windows who might
not understand that downloading and running an `.exe` from a media library
is a terrible idea. We don't trust the upload chain (the release group),
so we strip these on the server side.
Exhaustive auto-delete list (security override — these bypass the
"FLAG unknown" rule):
| Pattern | Risk |
|---|---|
| `*.exe` | Windows executable. Direct code execution on download+run. |
| `*.msi` | Windows Installer package. Silent install possible. |
| `*.bat`, `*.cmd` | Windows batch script. Runs in `cmd.exe`. |
| `*.com` | Old DOS-style executable. Still runs on modern Windows. |
| `*.scr` | Windows screensaver = .exe in disguise. Classic malware vector. |
| `*.ps1` | PowerShell script. Common modern malware delivery. |
| `*.vbs`, `*.wsf`, `*.hta`, `*.js` (Windows Script Host) | Active scripting. |
| `*.jar` | Java archive — runs as `java -jar` on systems with JRE. |
| `*.dll`, `*.sys` | Windows libraries / drivers. Side-load attacks. |
| `*.url`, `*.website`, `*.lnk` | Internet Shortcut / Windows Shortcut. Points at attacker-controlled URL. |
| `*.iso`, `*.img` (in a media folder, not at the library root) | Mountable disk image. Can carry Windows installers. **FLAG, not auto-delete** — could legitimately be a DVD rip. |
| `*.app/` | macOS application bundle. Auto-deleted. |
| `Autorun.inf` | Windows autorun config. **AUTO-DELETE.** |

Total auto-delete categories that are **purely** security-driven (not
Jellyfin-irrelevance-driven): **15**, namely `.exe`, `.msi`, `.bat`, `.cmd`,
`.com`, `.scr`, `.ps1`, `.vbs`, `.wsf`, `.hta`, `.jar`, `.dll`, `.sys`,
`.url`/`.website`/`.lnk` (counted as one), and `Autorun.inf`. The `.js` and
`*.app/` rows in the table are auto-deleted as well but sit outside this
count. Plus 1 flagged for human review: `.iso`/`.img`.
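
A sketch of the security auto-delete test as a `case` statement over the lowercased basename, assuming bash 4+. `is_security_delete` is a hypothetical name; it includes the `.js` row and `*.app` bundles from the table and deliberately leaves `.iso`/`.img` out, because those are FLAG, not delete. Because it matches on the final extension, double extensions such as `Movie.mkv.exe` are caught too:

```bash
#!/usr/bin/env bash
# Hypothetical: exit 0 if a name falls in the security auto-delete set.
is_security_delete() {
  local name=${1%/}        # tolerate a trailing slash on directories (*.app/)
  name=${name##*/}         # basename only
  case ${name,,} in
    *.exe|*.msi|*.bat|*.cmd|*.com|*.scr|*.ps1) return 0 ;;
    *.vbs|*.wsf|*.hta|*.js|*.jar|*.dll|*.sys)  return 0 ;;
    *.url|*.website|*.lnk|autorun.inf|*.app)   return 0 ;;
  esac
  return 1                 # not a security hit; normal KEEP/DELETE/FLAG rules apply
}
```

Running this check before any other classification is what lets these files bypass the "FLAG unknown" rule.
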
### 8.1 Why `.url` is in the security list
`.url` is a plain-text Internet Shortcut. On Windows, double-clicking it
opens the target in the default browser. The "target" is whatever the
release group put in the `URL=` line. Historically this was used to push
codec-pack download pages with bundled adware. There is no benign reason
for a `.url` to ship in a media release.
The Futurama release contains exactly this pattern:
```
[InternetShortcut]
URL=https://ninite.com/klitecodecs/
```
Ninite itself is reputable — but the principle is "do not ship clickable
URLs to third-party installers in a media library, ever".
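Before a shortcut is quarantined, it can be useful to record where it pointed. The sketch below assumes the usual INI layout shown above (one `URL=` line) and tolerates the CRLF line endings that shortcuts written on Windows normally carry. `url_targets` is a hypothetical helper, not part of the script in § 9.

```bash
#!/usr/bin/env bash
# url_targets DIR: print "<relative path>: <URL= value>" for each
# .url/.website file under DIR. Sketch; reads the first URL= line only.
set -euo pipefail

url_targets() {
  local f url line
  while IFS= read -r -d '' f; do
    url=""
    # Pure-bash scan avoids pipefail surprises from an early-exiting pipe
    while IFS= read -r line || [[ -n "$line" ]]; do
      line="${line%$'\r'}"               # strip Windows CR
      if [[ "$line" == URL=* ]]; then
        url="${line#URL=}"
        break
      fi
    done < "$f"
    printf '%s: %s\n' "${f#"$1"/}" "${url:-<no URL= line>}"
  done < <(find "$1" -type f \( -iname '*.url' -o -iname '*.website' \) -print0)
}
```

Usage: `url_targets "$SRC" >> quarantine-notes.txt` (the notes filename is illustrative).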
### 8.2 The `RARBG_DO_NOT_MIRROR.exe` historic case
Some releases historically contained a file named
`RARBG_DO_NOT_MIRROR.exe`, ostensibly to discourage mirror sites from
re-uploading. In several documented cases this file was actually adware
or a cryptominer. Auto-delete, no questions asked.
---
## 9. Prepared cleanup script — `cleanup-import.sh`
Idempotent. Dry-run by default. Quarantine-first. Source-immutable.
With `--apply`, prints the staging path as the last line of stdout, for piping to doc 08's normalizer.
Save to `bin/cleanup-import.sh` in the `ARRFLIX` repo.
```bash
#!/usr/bin/env bash
# cleanup-import.sh — Pre-import cleanup for arrflix.s8n.ru
# Version 1.0 (2026-05-08) — see docs/07-pre-import-cleanup.md
#
# Usage:
# cleanup-import.sh SRC # dry-run
# cleanup-import.sh --apply SRC # quarantine
# cleanup-import.sh --confirm-quarantine YYYY-MM-DD # recycle
#
# Exit codes:
# 0 success / nothing to do
# 1 user error (bad args, source not found)
# 2 internal error (permission, partial state)
# 3 flagged files present — user must review before --apply
set -euo pipefail
STAGING_ROOT="${JELLYFIN_STAGING_ROOT:-$HOME/.jellyfin-staging}"
QUARANTINE_ROOT="${JELLYFIN_QUARANTINE_ROOT:-$HOME/.jellyfin-quarantine}"
TODAY="$(date +%Y-%m-%d)"
# ----- classification -----
# Returns one of: KEEP DELETE FLAG
classify() {
local path="$1"
local base
base="$(basename "$path")"
local lower
lower="$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')"
# Security overrides — bypass everything else
case "$lower" in
*.exe|*.msi|*.bat|*.cmd|*.com|*.scr|*.ps1|*.vbs|*.wsf|*.hta|*.js|*.jar|*.dll|*.sys) echo DELETE; return ;;
*.url|*.website|*.lnk) echo DELETE; return ;;
autorun.inf) echo DELETE; return ;;
esac
# OS junk
case "$lower" in
thumbs.db|ehthumbs.db|ehthumbs_vista.db|.ds_store|desktop.ini|.directory) echo DELETE; return ;;
._*) echo DELETE; return ;;
esac
# Sample files. Checked before the media KEEP block: otherwise
# sample.mkv would match *.mkv below and be kept.
case "$lower" in
sample.mkv|sample.mp4|sample.avi|sample.m4v) echo DELETE; return ;;
*-sample.mkv|*-sample.mp4|*.sample.mkv|*.sample.mp4|*_sample.mkv|*_sample.mp4) echo DELETE; return ;;
esac
# Media — KEEP
case "$lower" in
*.mkv|*.mp4|*.avi|*.m4v|*.ts|*.mov|*.webm|*.wmv|*.flv|*.mpg|*.mpeg) echo KEEP; return ;;
*.srt|*.ass|*.ssa|*.vtt|*.sup|*.idx|*.sub) echo KEEP; return ;;
*.mp3|*.flac|*.ogg|*.opus|*.m4a|*.wav) echo KEEP; return ;;
esac
# Obsolete subtitle formats (see § 10)
case "$lower" in
*.smi|*.rt) echo DELETE; return ;;
esac
# Recognised artwork at item root
case "$lower" in
folder.jpg|folder.jpeg|folder.png|folder.webp) echo KEEP; return ;;
poster.jpg|poster.jpeg|poster.png|poster.webp) echo KEEP; return ;;
cover.jpg|cover.jpeg|cover.png|cover.webp) echo KEEP; return ;;
default.jpg|default.png|show.jpg|show.png|jacket.jpg|jacket.png|movie.jpg|movie.png) echo KEEP; return ;;
backdrop.jpg|backdrop.png|backdrop[0-9]*.jpg|backdrop[0-9]*.png) echo KEEP; return ;;
fanart.jpg|fanart.png|background.jpg|background.png|art.jpg|art.png) echo KEEP; return ;;
logo.png|logo.jpg|clearlogo.png|clearlogo.jpg|banner.jpg|banner.png) echo KEEP; return ;;
landscape.jpg|landscape.png|thumb.jpg|thumb.png|disc.png|disc.jpg|clearart.png|clearart.jpg) echo KEEP; return ;;
season[0-9]*-poster.jpg|season[0-9]*-poster.png|season[0-9]*.jpg|season[0-9]*.png) echo KEEP; return ;;
season-specials-poster.jpg|season-specials-poster.png) echo KEEP; return ;;
esac
# Promo images masquerading as art
case "$lower" in
*compare*.png|*compare*.jpg|*compare*.jpeg|*compare*.webp|*compare*.gif) echo DELETE; return ;;
*sample*.png|*sample*.jpg|*sample*.jpeg) echo DELETE; return ;;
*screen*.png|*screen*.jpg|*preview*.png|*preview*.jpg) echo DELETE; return ;;
esac
# Text-flavoured junk
case "$lower" in
*.txt|*.diz|file_id.diz) echo DELETE; return ;;
esac
# Torrent residue
case "$lower" in
*.torrent|*.magnet|*.parts|*.aria2|*.meta) echo DELETE; return ;;
*.pad|__padding_file_*|_____padding_file_*) echo DELETE; return ;;
*.sfv|*.md5|*.sha1|*.sha256) echo DELETE; return ;;
esac
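# NFO discriminator — KEEP if Jellyfin-compatible XML, else DELETE.
# grep reads a process substitution, not a pipe: under pipefail an early
# `grep -q` exit can SIGPIPE the writer and turn a match into a failure.
# Whitespace is not stripped, so a root element with attributes
# (e.g. <movie xmlns:xsi=...>) still matches.
case "$lower" in
*.nfo)
if grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)([[:space:]/>]|$)' \
< <(head -c 4096 -- "$path"); then
echo KEEP
else
echo DELETE
fi
return
;;
esac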
# Suspicious archives in a media folder
case "$lower" in
*.rar|*.r[0-9][0-9]|*.zip|*.7z|*.tar|*.tar.gz|*.iso|*.img) echo FLAG; return ;;
esac
echo FLAG
}
# ----- folder classification -----
# Returns one of: KEEP_AS-IS RENAME:<target> DELETE FLAG
classify_dir() {
local d="$1"
local lower
lower="$(basename "$d" | tr '[:upper:]' '[:lower:]')"
case "$lower" in
season\ [0-9]*|specials) echo KEEP_AS-IS; return ;; # Jellyfin season folders
behind\ the\ scenes|deleted\ scenes|interviews|scenes|samples|shorts|featurettes|clips|other|extras|trailers|theme-music|backdrops)
echo "RENAME:$lower"; return ;;
bts|behind-the-scenes) echo "RENAME:behind the scenes"; return ;;
deleted-scenes|deleted_scenes) echo "RENAME:deleted scenes"; return ;;
bonus|bonus\ features|bonus\ material|special\ features|outtakes|bloopers|gag\ reel) echo "RENAME:extras"; return ;;
proof|screens|screenshots|caps|preview|previews) echo DELETE; return ;;
sample) echo DELETE; return ;;
.fseventsd|.spotlight-v100|.trashes|\$recycle.bin|system\ volume\ information) echo DELETE; return ;;
*.app) echo DELETE; return ;; # macOS application bundle (security list, § 8)
extrafanart) echo "RENAME:extrafanart"; return ;; # recognised; only the case is normalised
*) echo FLAG; return ;;
esac
}
# ----- main -----
APPLY=0
CONFIRM_DATE=""
SRC=""
while [[ $# -gt 0 ]]; do
case "$1" in
--apply) APPLY=1; shift ;;
--confirm-quarantine) CONFIRM_DATE="${2:?--confirm-quarantine needs YYYY-MM-DD}"; shift 2 ;;
-h|--help) sed -n '/^# Usage:/,/^set -euo/{/^#/p;}' "$0"; exit 0 ;;
-*) echo "unknown flag: $1" >&2; exit 1 ;;
*) SRC="$1"; shift ;;
esac
done
if [[ -n "$CONFIRM_DATE" ]]; then
if [[ -d "$QUARANTINE_ROOT/$CONFIRM_DATE" ]]; then
gio trash "$QUARANTINE_ROOT/$CONFIRM_DATE"
echo "Recycled $QUARANTINE_ROOT/$CONFIRM_DATE"
else
echo "No quarantine for $CONFIRM_DATE" >&2; exit 1
fi
exit 0
fi
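SRC="${SRC%/}" # drop a trailing slash so the prefix stripping below matches
[[ -n "$SRC" && -d "$SRC" ]] || { echo "usage: $0 [--apply] SRC" >&2; exit 1; }
RELEASE="$(basename "$SRC")"
STAGE="$STAGING_ROOT/$RELEASE"
QUAR="$QUARANTINE_ROOT/$TODAY/$RELEASE"
# The report goes to fd 3: stdout for a dry run, stderr with --apply, so
# that in --apply mode stdout carries nothing but the staging path (§ 9.1).
if (( APPLY )); then exec 3>&2; else exec 3>&1; fi
declare -i KEEP_N=0 DEL_N=0 FLAG_N=0
PRUNED=() # junk dirs already reported; their contents are not listed
# Walk source, classify each entry. find lists a directory before its contents.
while IFS= read -r -d '' f; do
# "$SRC" is quoted inside the expansion so names like "[1080p ... Joy]"
# are stripped literally instead of being read as a glob bracket expression
rel="${f#"$SRC"/}"
for p in ${PRUNED[@]+"${PRUNED[@]}"}; do
if [[ "$f" == "$p"/* ]]; then continue 2; fi
done
if [[ -d "$f" ]]; then
res="$(classify_dir "$f")"
case "$res" in
KEEP_AS-IS) ;;
RENAME:*) [[ "$(basename "$f")" == "${res#RENAME:}" ]] \
|| printf 'RENAME %s/ -> %s/\n' "$rel" "${res#RENAME:}" >&3 ;;
DELETE) printf 'DELETE %s/ [junk dir]\n' "$rel" >&3; DEL_N+=1; PRUNED+=("$f") ;;
FLAG) printf 'FLAG %s/ [unknown dir name]\n' "$rel" >&3; FLAG_N+=1 ;;
esac
continue
fi
case "$(classify "$f")" in
KEEP) printf 'KEEP %s\n' "$rel" >&3; KEEP_N+=1 ;;
DELETE) printf 'DELETE %s\n' "$rel" >&3; DEL_N+=1 ;;
FLAG) printf 'FLAG %s\n' "$rel" >&3; FLAG_N+=1 ;;
esac
done < <(find "$SRC" -mindepth 1 -print0)
{
echo "---"
echo "KEEP $KEEP_N"
echo "DELETE $DEL_N"
echo "FLAG $FLAG_N"
} >&3
if (( FLAG_N > 0 )); then
echo "FLAG count > 0; review before re-running with --apply." >&2
(( APPLY == 0 )) || exit 3
fi
if (( APPLY == 0 )); then
echo "Dry run only. Re-run with --apply to quarantine."
exit 0
fi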
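# --- APPLY path: copy to staging, then quarantine junk out of the copy ---
mkdir -p "$STAGE" "$QUAR"
# rsync -a preserves perms; --delete resets staging to a fresh copy of SRC,
# which is what makes re-runs idempotent
rsync -a --delete "$SRC/" "$STAGE/"
# quarantine PATH: move one staged entry to the same relative path under $QUAR.
# If an earlier run already quarantined it, drop the staged duplicate instead
# (staging is a disposable copy of SRC), so re-runs never nest directories.
quarantine() {
local rel="${1#"$STAGE"/}"
if [[ -e "$QUAR/$rel" ]]; then
rm -rf -- "$1"
else
mkdir -p "$QUAR/$(dirname "$rel")"
mv -- "$1" "$QUAR/$rel"
fi
}
# Pass 1: junk directories, shallowest first (find lists parents before
# children); anything under a directory that already moved is skipped.
# The list is read up front (mapfile -d needs bash >= 4.4) so moving a
# directory never pulls the tree out from under a running find.
mapfile -d '' DIRS < <(find "$STAGE" -mindepth 1 -type d -print0)
MOVED=()
for d in ${DIRS[@]+"${DIRS[@]}"}; do
for p in ${MOVED[@]+"${MOVED[@]}"}; do
if [[ "$d" == "$p"/* ]]; then continue 2; fi
done
if [[ "$(classify_dir "$d")" == DELETE ]]; then
quarantine "$d"
MOVED+=("$d")
fi
done
# Pass 2: rename extras folders deepest first (-depth), so a directory is
# renamed only after find has finished with everything beneath it
while IFS= read -r -d '' d; do
res="$(classify_dir "$d")"
[[ "$res" == RENAME:* ]] || continue
target="${res#RENAME:}"
[[ "$(basename "$d")" != "$target" ]] || continue
if [[ -e "$(dirname "$d")/$target" ]]; then
echo "skip rename, target exists: ${d#"$STAGE"/} -> $target" >&2
else
mv -- "$d" "$(dirname "$d")/$target"
fi
done < <(find "$STAGE" -mindepth 1 -depth -type d -print0)
# Pass 3: junk files in whatever is left
while IFS= read -r -d '' f; do
if [[ "$(classify "$f")" == DELETE ]]; then quarantine "$f"; fi
done < <(find "$STAGE" -mindepth 1 ! -type d -print0)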
# Manifest
{
echo "{"
echo " \"release\": \"$RELEASE\","
echo " \"date\": \"$TODAY\","
echo " \"source\": \"$SRC\","
echo " \"staging\": \"$STAGE\","
echo " \"quarantine\": \"$QUAR\""
echo "}"
} > "$STAGE/.cleanup-manifest.json"
# Stdout: the staging path, for piping to doc 08's normalizer
echo "$STAGE"
```
### 9.1 Pipeline integration
```bash
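# Full pre-import flow:
set -o pipefail
SRC="/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]"
# The staging path is the last line cleanup-import.sh prints on stdout;
# stop if it refused to apply (exit 3 means flagged files need review first).
STAGING="$(cleanup-import.sh --apply "$SRC" | tail -n 1)" || exit
# STAGING is now ~/.jellyfin-staging/Futurama Season 1.../ with junk gone.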
# Hand off to doc 08:
normalize-filenames.sh "$STAGING"
# Then move to live media tree (manual; doc 05 confirms layout):
mv "$STAGING" "/home/user/media/tv/Futurama (1999)/Season 01"
```
The `mv` to the live tree is **deliberately manual**. Cleanup and rename
are reproducible from source; the move into `/home/user/media/` is the
point of no return and the user runs it consciously.
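One trap in that manual step: if the target directory already exists, a plain `mv` silently moves the staging folder *inside* it instead of replacing it. A guarded version might look like the sketch below. `promote_staging` is a hypothetical helper, not part of the documented pipeline; it still leaves the decision to the user and simply refuses rather than merging.

```bash
#!/usr/bin/env bash
# promote_staging STAGING TARGET: the one-way move into the live tree,
# refusing (instead of nesting) when TARGET already exists. Sketch only.
set -euo pipefail

promote_staging() {
  local staging="${1%/}" target="${2%/}"
  [[ -d "$staging" ]] || { echo "no such staging dir: $staging" >&2; return 1; }
  if [[ -e "$target" ]]; then
    # plain `mv` would put STAGING *inside* an existing TARGET directory
    echo "refusing: $target already exists; merge by hand" >&2
    return 1
  fi
  mkdir -p -- "$(dirname -- "$target")"   # e.g. create "Futurama (1999)/"
  mv -- "$staging" "$target"
}
```

Usage: `promote_staging "$STAGING" "/home/user/media/tv/Futurama (1999)/Season 01"`.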
---
## 10. What this doc explicitly does NOT do
- **Filename normalization** — that's doc 08. This doc deletes junk and
renames extras folders to Jellyfin's names; doc 08 renames the media files
themselves, e.g. `Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv`
into the canonical `Futurama (1999) - S01E01 - Space Pilot 3000.mkv`.
- **Subtitle reconciliation** — doc 03 covers per-language naming; this
doc only deletes obsolete formats (`.smi`, `.rt`).
- **Library refresh** — after files land in `/home/user/media/`, run
`POST /Library/Refresh` on the Jellyfin API (doc 02 § 2). Cleanup never
touches the running container.
- **NFO writing** — doc 02 § 11 covers writing override NFOs. This doc
only filters incoming NFOs.
- **Source deletion** — never. The source download is read-only to this
pipeline; the user removes it manually post-import.
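For reference, the refresh call itself is small. The sketch below assumes two variables that are not defined anywhere in this doc: `JELLYFIN_URL` (defaulting here to `http://localhost:8096`, Jellyfin's default HTTP port) and `JELLYFIN_API_KEY` (an API key created in the Jellyfin dashboard). It sends the key in the `Authorization: MediaBrowser Token="..."` header; doc 02 § 2 remains the authoritative description of the call.

```bash
#!/usr/bin/env bash
# refresh_library: ask Jellyfin to rescan all libraries after the manual mv.
# Assumes JELLYFIN_API_KEY is set; JELLYFIN_URL is optional.
set -euo pipefail

refresh_library() {
  local url="${JELLYFIN_URL:-http://localhost:8096}"
  : "${JELLYFIN_API_KEY:?set JELLYFIN_API_KEY first}"
  # --fail turns an HTTP error (e.g. 401 for a bad key) into a non-zero exit
  curl --silent --show-error --fail -X POST \
    -H "Authorization: MediaBrowser Token=\"${JELLYFIN_API_KEY}\"" \
    "${url%/}/Library/Refresh"
}
```

Usage: `JELLYFIN_API_KEY=... refresh_library`, run only after the move into `/home/user/media/`.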
---
## 11. TL;DR
| Step | What | Where |
|---|---|---|
| 1 | Audit (dry-run) | `cleanup-import.sh "$SRC"` |
| 2 | Apply (quarantine) | `cleanup-import.sh --apply "$SRC"` → prints staging path |
| 3 | Review quarantine | `ls ~/.jellyfin-quarantine/$(date +%F)/` |
| 4 | Normalize filenames | doc 08, takes staging path as input |
| 5 | Move to live tree | manual `mv "$STAGING" /home/user/media/...` |
| 6 | Refresh library | `POST /Library/Refresh` (doc 02) |
| 7 | Confirm quarantine | `cleanup-import.sh --confirm-quarantine YYYY-MM-DD` |
| 8 | Delete source | manual, only after Jellyfin shows the item correctly |
The hard rule, repeated: **the source download is never modified, the live
media tree is never written by cleanup, and Windows executables never
reach a Jellyfin user's browser.**