ARRFLIX/docs/07-pre-import-cleanup.md

07 — Pre-Import Cleanup Ruleset (arrflix.s8n.ru)

Last updated: 2026-05-08
Server: Jellyfin 10.10.3 on nullstone, container jellyfin
Library root inside container: /media
Library root on host: /home/user/media

This document defines the normative pre-import cleanup ruleset for the personal Jellyfin deploy. The owner downloads scene/group releases (e.g. Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/) which contain a mixture of media files and non-media junk (codec readmes, release-group brags, Windows installer shortcuts, comparison images, OS thumbnail caches, etc.). This junk must NOT land in /home/user/media/ because:

  1. It clutters the library and confuses scrapers.
  2. Promo PNGs may be mis-identified as artwork.
  3. Release-group .nfo files break the NFO-override flow (doc 02 § 11).
  4. Windows executables and installer shortcuts (.exe, .msi, .website, .url, .lnk, .scr, .bat, .ps1) are a real security vector. Even though the Linux server cannot execute them, friends with a Jellyfin account can download them through the web UI and run them on their PC.

Cross-linked to:

Sources of truth:


0. Top-level cleanup rules

These are non-negotiable. They wrap the doc 05 top-level rules with one guarantee: nothing leaves staging until cleanup has run and been confirmed.

  1. Never clean in-place on the source download. The download directory (/home/admin/Downloads/...) is treated as a read-only artefact until the user explicitly approves deletion. The cleanup script copies into a staging area and operates there.
  2. Quarantine first, delete later. First run of the cleanup script on a release moves junk to ~/.jellyfin-quarantine/<YYYY-MM-DD>/<release-name>/ instead of deleting. The user reviews, then a second pass empties the quarantine after sign-off. Subsequent runs on the same release are idempotent.
  3. Two-list policy. Every file is matched against an ALLOW list (KEEP) or a DENY list (DELETE). Anything not on either list is flagged and surfaced in the audit report — a human decides. Never auto-delete on "unknown".
  4. Never run cleanup as root. All operations are as the unprivileged admin (onyx) or user (nullstone) account. The live /home/user/media/ tree is touched only by the rename step in doc 08, after cleanup has produced an intermediate staging copy.
  5. Idempotent. Running cleanup twice on the same source must produce the same staging tree byte-for-byte (same find -printf '%p %s\n' | sort output, modulo timestamps).
  6. Dry-run is the default. The cleanup script with no flags lists what it would do and exits without writing. --apply is required to actually move/quarantine files.
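
Rule 5 can be spot-checked by fingerprinting the staging tree after each run and comparing the results. The following is a minimal sketch, not part of cleanup-import.sh: tree_fingerprint and same_tree are hypothetical helper names, GNU find is assumed, and paths are printed relative to the tree root (%P rather than %p) so that two copies at different locations can be compared.

```shell
# Hypothetical helper: print "relative-path size" for every regular file, sorted.
# Assumes GNU find (for -printf). Timestamps are deliberately left out.
tree_fingerprint() {
  find "$1" -type f -printf '%P %s\n' | LC_ALL=C sort
}

# Returns 0 when both trees hold the same relative paths with the same sizes.
same_tree() {
  [ "$(tree_fingerprint "$1")" = "$(tree_fingerprint "$2")" ]
}
```

Sizes are only a cheap proxy for byte-for-byte equality; swapping the size column for a sha256sum per file would make the comparison strict at the cost of reading every file.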

1. Categorical taxonomy of non-media files in scene/group releases

Scene and group ("p2p") releases follow loose conventions. The following categories cover everything observed in the wild plus everything in the Futurama download set:

1.1 Codec / player promotion

Text files and Windows shortcut files steering the user toward a specific codec pack or media player (often K-Lite + MPC-HC). Frequently the file is an .url or .website (Internet Shortcut) pointing to a third-party installer. Always DELETE.

Real-world examples (/home/admin/Downloads/futrama/):

  • How to play HEVC (THIS FILE).txt — 65 lines of MPC-HC marketing.
  • Ninite K-Lite Codecs Unattended Silent Installer and Updater.website (Internet Shortcut, URL=https://ninite.com/klitecodecs/).

Patterns:

  • How to play *.txt, Read*Me*.txt, INSTALL*.txt, PLAY*.txt
  • *.website, *.url, *.lnk
  • K-Lite*, MPC-HC*, VLC*, MX Player*, LAV*

1.2 Release-group brag

Plain-text or .nfo files where the release group identifies itself, documents encoder settings, or pumps its tracker URL. Distinguishable from a Jellyfin-compatible metadata NFO (XML, root <movie> / <tvshow> / <episodedetails>) by content — see § 3.

Real-world examples:

  • Encoded by JoyBell (UTR).txt — 41-line manifesto from "Unity Team Release group" pointing to UNITEAM.CO.
  • RARBG.txt, WWW.YIFY-TORRENTS.COM.url, <group>.nfo with ASCII art.

Patterns:

  • Encoded by *.txt, Ripped by *.txt, <GROUP>.txt
  • RARBG.txt, RARBG_DO_NOT_MIRROR.exe (yes, those exist; § 1.10)
  • *-readme.txt, release notes.txt
  • *.nfo containing only ASCII art (no <movie> / <tvshow> / <episodedetails> root element)
  • *.diz, file_id.diz — old "BBS description" file, scene leftover

1.3 Promo images that are NOT poster artwork

Images that LOOK like artwork to a naive globber but are actually before/after comparisons, group banners, or screenshot proofs. Delete unless they live inside a recognised extras folder (§ 4) or match the strict allow-list of poster/backdrop names from doc 01.

Real-world example:

  • Futurama Compare.png (1.05 MB) — encoder before/after comparison.

Patterns to delete:

  • *Compare*.{png,jpg,jpeg,webp}
  • *Sample*.{png,jpg,jpeg} (when not in a samples/ extras folder)
  • *Screen*.{png,jpg}, *Screens/*, *Proof/*, *Preview/*
  • *-banner.png from a group (NOT the same as Jellyfin's banner.jpg; group banners typically have the group name in the filename — heuristic match *JoyBell*, *UTR*, *JoY*, etc.)
  • Stray *.gif files (animated previews); Jellyfin doesn't use GIF.

1.4 OS-generated thumbnail caches

Per-OS file managers (Windows Explorer, macOS Finder, GNOME Files) leave turds in every directory they browse. Always DELETE — never useful, never metadata.

Patterns:

  • Thumbs.db, ehthumbs.db, ehthumbs_vista.db
  • .DS_Store, ._* (macOS resource forks)
  • Desktop.ini, desktop.ini
  • .directory (KDE)
  • .fseventsd/, .Spotlight-V100/, .Trashes/ (macOS)
  • $RECYCLE.BIN/, System Volume Information/ (Windows mount)

1.5 Sample files (lower-quality previews)

Scene releases sometimes ship a 30-second sample file at lower bitrate. Jellyfin treats a samples/ subfolder as extras (doc 05 § 8.2), but a stray Movie.sample.mkv next to the main file would scrape as "another version".

Default: DELETE. Reasoning: we have the full file; the sample is dead weight. If the user genuinely wants samples, drop them into a samples/ subfolder before running cleanup and the script will preserve the folder.

Patterns to delete (when at the top level of a release):

  • sample.{mkv,mp4,avi,m4v}
  • *-sample.{mkv,mp4,avi,m4v}, *.sample.{mkv,mp4,avi,m4v}
  • *_sample.{mkv,mp4,avi,m4v}
  • Sample/ directory (rename to samples/ to preserve as extras, OR delete)

1.6 Subtitle leftovers

VobSub subtitles (DVD bitmap subs; Blu-ray PGS is the separate .sup format listed below) ship as a pair: en.idx (index) + en.sub (bitmap stream). Jellyfin can render them, but if a .srt exists with the same language tag the bitmap pair is redundant and slow.

Default: KEEP all .srt and .ass. KEEP .idx/.sub only if no .srt of the same language exists. This is a per-file decision — surface to the user in the audit report rather than auto-pruning.

Patterns:

  • *.srt, *.ass, *.ssa, *.vtt — KEEP (per doc 03).
  • *.sup (PGS bitmap, Blu-ray) — KEEP (Jellyfin renders).
  • *.idx + *.sub (VobSub) — KEEP if no .srt with same lang code; else flag for human review.
  • *.smi, *.rt — DELETE (obsolete formats Jellyfin doesn't support).
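
The VobSub rule above could be sketched as follows. This is an illustration under stated assumptions, not the script's actual logic: vobsub_action is a hypothetical helper, a text subtitle is taken to "cover" the pair when it has the same name with .srt in place of .idx (en.idx vs en.srt), and an .idx without its .sub partner is flagged, which this document does not specify.

```shell
# Hypothetical helper: decide what to do with one VobSub index file.
# Assumption: the matching SRT shares the .idx file's name up to the extension.
vobsub_action() {
  local idx="$1"
  local stem="${idx%.*}"        # path without the trailing .idx
  if [ ! -f "$stem.sub" ]; then
    echo FLAG                   # assumption: an incomplete pair goes to review
  elif [ -f "$stem.srt" ]; then
    echo FLAG                   # redundant with the SRT; a human decides
  else
    echo KEEP                   # only subtitle for this language
  fi
}
```

The function never prints DELETE, which matches the per-file, human-reviewed decision described above.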

1.7 Torrent residue

Files left by the torrent client itself. None are useful to Jellyfin.

Patterns to delete:

  • *.torrent, *.magnet
  • *.parts, *.!ut, *.!qB, *.bc! (in-progress fragments)
  • *.meta, *.aria2
  • *.pad, padding/, __padding_file_* (mktorrent padding)
  • *.sfv (checksum manifest; harmless but useless after download)
  • *.md5, *.sha1, *.sha256 (release-checksum sidecars)

1.8 Test / proof images and folders

Some groups ship a Proof/ or Screens/ folder with screenshots to "prove" the rip's quality. Useless inside a Jellyfin library.

Patterns to delete (whole folders):

  • Proof/, proof/, PROOF/
  • Screens/, screens/, Screenshots/, Caps/
  • Preview/, Previews/
  • _screens/, screenshots-only/

1.9 Multi-disc DVD/Blu-ray cruft

When a release is a straight ISO rip the VIDEO_TS/ or BDMV/ directory sometimes survives next to the encoded file. Jellyfin can play VIDEO_TS.IFO directly, but a partial DVD structure left over from the encode is just clutter.

Patterns:

  • VIDEO_TS/ — KEEP if it contains a complete VIDEO_TS.VOB set; otherwise flag.
  • *.IFO, *.BUP, *.VOB — KEEP if inside a complete VIDEO_TS/; DELETE if loose.
  • BDMV/, CERTIFICATE/, AACS/ — KEEP if complete BD structure; flag if partial.
  • *.iso inside a media folder — flag for human review (could be the intentional rip OR a Windows malware vector — see § 8).

1.10 Outright malicious / suspicious

Some releases historically shipped Windows executables disguised as "DO NOT MIRROR" anti-leech files. Even on a Linux server these must be deleted because the friend with a Jellyfin account can download them via the web UI ("Download original file" button) and run them locally.

Always DELETE, never quarantine, never preserve, no exceptions.

Patterns:

  • *.exe, *.msi, *.bat, *.cmd, *.com, *.scr, *.ps1, *.vbs, *.wsf, *.hta, *.jar
  • *.app/ (macOS bundle dropped by macOS-using uploader)
  • *.dll, *.sys (rare, but seen)
  • Anything with a double extension like Movie.mkv.exe

2. KEEP vs DELETE — exhaustive table

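This table is the canonical decision matrix for cleanup-import.sh. Patterns are matched case-insensitively: the script lowercases each basename before matching (ext4 itself is case-sensitive). KEEP means the file goes to the staging tree; DELETE means it goes to quarantine on first run, then to the system trash on confirm (§ 5.3). The security patterns in § 8 are the exception: they are deleted outright and never quarantined.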

| Pattern | Action | Why |
|---|---|---|
| *.mkv, *.mp4, *.avi, *.m4v, *.ts, *.mov, *.webm, *.wmv, *.flv, *.mpg, *.mpeg | KEEP | Media: the entire point. |
| *.srt, *.ass, *.ssa, *.vtt, *.sup | KEEP | Subtitles (doc 03). |
| *.idx + *.sub (VobSub pair) | KEEP if no .srt of same lang exists; else FLAG | Bitmap subs; redundant with SRT. |
| *.smi, *.rt | DELETE | Obsolete subtitle formats; Jellyfin can't render. |
| folder.{jpg,png}, poster.{jpg,png}, cover.{jpg,png}, default.{jpg,png}, show.{jpg,png}, jacket.{jpg,png}, movie.{jpg,png} | KEEP | Jellyfin-recognised primary artwork (doc 01). |
| backdrop.{jpg,png}, fanart.{jpg,png}, background.{jpg,png}, art.{jpg,png}, backdrop[0-9]*.{jpg,png}, backdrop-[0-9]*.{jpg,png} | KEEP | Jellyfin-recognised backdrops (doc 01). |
| logo.{png,jpg}, clearlogo.{png,jpg}, banner.{jpg,png}, landscape.{jpg,png}, thumb.{jpg,png}, disc.{png,jpg}, clearart.{png,jpg} | KEEP | Jellyfin-recognised auxiliary artwork. |
| season[0-9]*-poster.{jpg,png}, season[0-9]*.{jpg,png}, season-specials-poster.{jpg,png} | KEEP | Per-season artwork (doc 01 / TV layout). |
| extrafanart/*.{jpg,png}, backdrops/*.{jpg,png,mp4} | KEEP | Multi-backdrop folders (doc 05 § 8). |
| *.nfo with XML root <movie> / <tvshow> / <episodedetails> / <artist> / <album> / <musicvideo> | KEEP | Jellyfin-compatible metadata sidecar (doc 02 § 11). |
| *.nfo without one of the above XML roots | DELETE | Release-group ASCII-art brag: pretends to be metadata, isn't. |
| *Compare*.{png,jpg,jpeg,webp,gif} | DELETE | Encoder before/after: group promo. |
| *Sample*.{png,jpg,jpeg} (image, top level) | DELETE | Group promo (NOT a Jellyfin sample folder). |
| *Screen*.{png,jpg}, Screens/, Screenshots/, Caps/ | DELETE | Proof shots. |
| Proof/, proof/, PROOF/ | DELETE (whole folder) | Quality-proof shots. |
| Preview/, Previews/ | DELETE (whole folder) | Lower-quality teaser. |
| *.txt (any) | DELETE | Readme / group brag: Jellyfin doesn't read TXT. |
| *.diz, file_id.diz | DELETE | Scene description file: obsolete. |
| *.website, *.url, *.lnk | DELETE | Windows Internet Shortcut: points at codec/installer pages. Security: § 8. |
| *.exe, *.msi, *.bat, *.cmd, *.com, *.scr, *.ps1, *.vbs, *.wsf, *.hta, *.jar, *.dll, *.sys | DELETE | Windows executable. Security: § 8. |
| *.app/ | DELETE (whole folder) | macOS bundle. |
| Thumbs.db, ehthumbs.db, ehthumbs_vista.db | DELETE | Windows Explorer thumbnail cache. |
| .DS_Store, ._* | DELETE | macOS Finder. |
| Desktop.ini, desktop.ini | DELETE | Windows folder customisation. |
| .directory | DELETE | KDE Dolphin. |
| .fseventsd/, .Spotlight-V100/, .Trashes/, $RECYCLE.BIN/, System Volume Information/ | DELETE (whole folder) | OS metadata directories. |
| sample.{mkv,mp4,avi,m4v} (top level) | DELETE | Lower-quality preview (doc 05 § 8.1: full file already present). |
| *-sample.{mkv,mp4,avi,m4v}, *_sample.{mkv,mp4,avi,m4v}, *.sample.{mkv,mp4,avi,m4v} | DELETE | Same. |
| Sample/ (directory, top level) | DELETE | Lower-quality preview folder. |
| samples/ (directory, recognised name) | KEEP | Jellyfin extras folder (doc 05 § 8.2). |
| featurettes/, behind the scenes/, deleted scenes/, interviews/, scenes/, shorts/, clips/, trailers/, extras/, other/, theme-music/, backdrops/ | KEEP (whole folder) | Jellyfin extras (doc 05 § 8.2). |
| Featurettes/, Behind The Scenes/, etc. (capitalised) | KEEP but rename to lowercase | Jellyfin matches case-insensitive but lowercase is the documented form. |
| Any other folder name | FLAG | Surface to human; might be a typo of an extras folder. |
| *.torrent, *.magnet | DELETE | Torrent client residue. |
| *.parts, *.!ut, *.!qB, *.bc!, *.aria2 | DELETE | In-progress download fragments (shouldn't be here, but defensive). |
| *.meta | DELETE | aria2/torrent metadata. |
| *.pad, padding/, __padding_file_*, _____padding_file_* | DELETE | mktorrent padding files. |
| *.sfv, *.md5, *.sha1, *.sha256 | DELETE | Checksum manifests; harmless but useless after download. |
| *.rar, *.r[0-9][0-9], *.zip, *.7z, *.tar, *.tar.gz | FLAG | Compressed archive in a media folder is suspicious: release should have been extracted before download. |
| *.iso inside a media folder | FLAG | Could be intentional DVD/BD rip OR Windows-installer disguise. Human review. |
| VIDEO_TS/ (complete) | KEEP | Jellyfin plays DVD structure directly. |
| *.IFO, *.BUP, *.VOB (loose, no VIDEO_TS/) | DELETE | Orphan DVD remnants. |
| BDMV/ (complete) | KEEP | Jellyfin plays BD structure. |
| CERTIFICATE/, AACS/ (without BDMV/) | DELETE | Orphan BD remnants. |
| RARBG*.{txt,exe}, WWW.*.url, *.YIFY*.url | DELETE | Tracker promo. |
| RARBG_DO_NOT_MIRROR.exe and similar | DELETE (security: § 8) | Historic anti-leech file; sometimes weaponised. |
| Anything else | FLAG | Two-list policy: never auto-delete on "unknown". |

3. NFO handling — the nuanced case

.nfo is overloaded. Two completely different file kinds share the extension:

  • Scene release .nfo — plain text, ASCII art, encoder credits, tracker URL. Useless to Jellyfin (and at worst gets scraped as garbage metadata if NFO Saver is enabled).
  • Jellyfin/Kodi/Emby metadata NFO — XML, root element is one of <movie>, <tvshow>, <episodedetails>, <artist>, <album>, <musicvideo>. Documented in doc 02 § 11.

3.1 The discriminator one-liner

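is_jellyfin_nfo() {
  # Returns 0 (KEEP) if the file looks like a Jellyfin/Kodi NFO,
  # 1 (DELETE) if it looks like scene-group ASCII-art brag.
  # Whitespace is squeezed to single spaces, not deleted, so a root tag that
  # carries attributes (<movie xmlns=...>) still ends at a word boundary.
  head -c 4096 "$1" | tr -s '[:space:]' ' ' \
    | grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)\b'
}

# Usage:
if is_jellyfin_nfo "$f"; then echo "KEEP $f"; else echo "DELETE $f"; fi

The first 4096 bytes are enough: a real Jellyfin NFO declares its root within the first kilobyte. tr -s '[:space:]' ' ' folds line breaks and runs of whitespace into single spaces, so pretty-printed XML is scanned as one normalised line. Deleting whitespace outright (tr -d) would fuse a root tag that carries attributes, e.g. <movie xmlns:xsi="...">, into <moviexmlns:xsi=...>, which the \b word-boundary test then rejects, and a valid NFO would be classified DELETE. Note that \b is a GNU grep extension.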

3.2 Edge cases

  • An NFO with both ASCII art and an XML root: KEEP. Jellyfin's parser ignores leading non-XML noise as long as the XML element parses.
  • An NFO with a different XML root (e.g. <root>, <info>): DELETE. Jellyfin won't read it; nothing to preserve.
  • An NFO with valid XML but stale TMDB/IMDB IDs that conflict with a newer scrape: KEEP, but flag for the user — doc 02 § 11.5 explains how the NFO Saver overwrites these on next scrape.
  • Multiple NFOs in one folder (e.g. release.nfo from the group AND tvshow.nfo from a previous Jellyfin write): KEEP tvshow.nfo, DELETE release.nfo. Use the discriminator above on each.

3.3 First-100-bytes shortcut

The task brief proposes this:

if head -c 100 file.nfo | grep -qE '<(movie|tvshow|episodedetails)\b'; then echo KEEP; else echo DELETE; fi

This works for the common case but misses NFOs that start with an XML declaration (<?xml version="1.0"?> plus possibly a comment) before the root element — that prologue alone can be > 100 bytes. The 4096-byte version above is safer; we use that in cleanup-import.sh.
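
The difference between the two window sizes can be reproduced with a small experiment. This is a sketch for illustration only: nfo_root_within is a hypothetical helper that isolates the byte window (it omits the whitespace normalisation step), and the sample NFO is made up so that its XML declaration plus comment exceed 100 bytes.

```shell
# Hypothetical helper: does a metadata root element appear in the first N bytes?
nfo_root_within() {
  local bytes="$1" file="$2"
  head -c "$bytes" "$file" | grep -qE '<(movie|tvshow|episodedetails)\b'
}

# Made-up sample: the declaration and comment together are longer than
# 100 bytes, so the <movie> root starts beyond the short window.
demo_nfo="$(mktemp)"
{
  printf '%s\n' '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
  printf '%s\n' '<!-- created on 2026-05-08 by a metadata manager; comment padding to push the root down -->'
  printf '%s\n' '<movie><title>Example</title></movie>'
} > "$demo_nfo"

if nfo_root_within 100 "$demo_nfo"; then echo "100 bytes: KEEP"; else echo "100 bytes: DELETE"; fi
if nfo_root_within 4096 "$demo_nfo"; then echo "4096 bytes: KEEP"; else echo "4096 bytes: DELETE"; fi
# prints:
#   100 bytes: DELETE
#   4096 bytes: KEEP
```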


4. Featurettes / Extras / Bonus folders — the canonical list

Per the Jellyfin docs (movies and shows pages), these subfolder names are recognised and the contained files are tagged with the matching extra type. Folder name match is case-insensitive but lowercase is the documented canonical form; cleanup-import.sh lowercases on copy to staging.

| Folder name | Extra type | Notes |
|---|---|---|
| behind the scenes | Behind The Scenes | spaces, not dashes |
| deleted scenes | Deleted Scene | |
| interviews | Interview | |
| scenes | Scene | |
| samples | Sample | distinct from a top-level Sample/ (§ 1.5) |
| shorts | Short | |
| featurettes | Featurette | |
| clips | Clip | |
| other | Other | catch-all |
| extras | Extra | generic catch-all |
| trailers | Trailer | |
| theme-music | Theme music | .mp3 files; doc 05 § 8.3 |
| backdrops | Backdrop video | rotating video backgrounds |

Anything else (e.g. Bonus Features/, BTS/, Special Features/, Featurette/ singular, behind-the-scenes/ with dashes) is NOT matched by Jellyfin and the contents won't surface as extras. Cleanup either renames to the canonical name (when the mapping is unambiguous) or flags for human review.

4.1 Canonical-name mapping (auto-rename)

| Found | Renamed to |
|---|---|
| Featurettes/, Featurette/, FEATURETTES/ | featurettes/ |
| Behind The Scenes/, BTS/, behind-the-scenes/ | behind the scenes/ |
| Deleted Scenes/, Deleted_Scenes/, deleted-scenes/ | deleted scenes/ |
| Interviews/, Interview/ | interviews/ |
| Trailers/, Trailer/ | trailers/ |
| Bonus/, Bonus Features/, Bonus Material/, Special Features/, Specials/ | extras/ (generic catch-all) |
| Outtakes/, Bloopers/, Gag Reel/ | extras/ (no dedicated folder) |

The Specials/ rename to extras/ is important — for a TV series, Specials/ looks like a season folder (Season 0 specials), but if the files inside are featurettes rather than aired specials, putting them in the wrong folder mis-scrapes them as episodes. When in doubt, flag.

4.2 Real-world example: Futurama download

The four Futurama season folders all contain a Featurettes/ subfolder:

Futurama Season 1  [1080p AI x265 10bit FS99 Joy]/Featurettes/
├── Episode One Animatic.mkv
└── Welcome to the World of Tomorrow.mkv

Futurama Season 2 .../Featurettes/
├── Animatic -Why Must I be a Crustacean in Love.mkv
└── Futurama Game Trailer.mkv

Futurama Season 3 .../Featurettes/
├── An X-Mas Message From David X. Cohen.mkv
└── Deleted Scenes.mkv

Futurama Season 4 .../Featurettes/
├── Futurama Welcome to the World of Tomorrow (x265 Joy).mkv
├── Outtakes - Kif Gets Knocked Up a Notch  [1080p x265 10bit Joy].mkv
└── Panel on Voice Actors   [1080p x265 10bit Joy].mkv

After cleanup these become featurettes/ (lowercase) inside the season folder. Doc 08 (filename normalization) then renames the season folder itself to Season 01/ and may relocate the season-level featurettes to a series-level featurettes/ folder if the user prefers extras at the series root (this is a doc 05 § 8 / doc 08 decision, not this doc's).

Note: Season 3 / Deleted Scenes.mkv is a single file and should arguably be moved into a deleted scenes/ subfolder rather than left in featurettes/. That's a manual disambiguation — flagged, not auto-moved.


5. Audit-then-clean workflow

Three-stage pipeline. Stage 1 is mandatory; stage 2 runs on user approval; stage 3 is reversible until the quarantine retention window expires.

5.1 Stage 1 — Dry-run audit

Lists every file in the source release classified as KEEP / DELETE / FLAG. Writes nothing.

# Dry-run audit on a single release dir.
cleanup-import.sh "/home/admin/Downloads/futrama/Futurama Season 1  [1080p AI x265 10bit FS99 Joy]"

Output (one line per file):

KEEP    Futurama S01E01 Space Pilot 3000  [1080p x265 10bit Joy].mkv
KEEP    folder.jpg
KEEP    Featurettes/Episode One Animatic.mkv  -> featurettes/Episode One Animatic.mkv
DELETE  Encoded by JoyBell (UTR).txt                    [release-group brag]
DELETE  How to play HEVC (THIS FILE).txt                [codec promo .txt]
DELETE  Ninite K-Lite Codecs Unattended Silent ....website   [windows .website -- SECURITY]
DELETE  Futurama Compare.png                            [encoder compare image]
FLAG    SomeUnknownFile.bin                             [unknown extension]

A summary at the bottom:

KEEP   16 files (5.92 GiB)
DELETE 4 files  (1.08 MiB)
FLAG   0 files
Run with --apply to quarantine the DELETE set.

Quick one-liner equivalents (for ad-hoc spot checks; the script in § 9 is preferred):

# What would I delete?
find "$SRC" \( \
     -iname '*.txt' -o -iname '*.nfo' -o -iname '*.url' -o -iname '*.website' \
  -o -iname '*.lnk' -o -iname '*.exe' -o -iname '*.msi' -o -iname '*.bat' \
  -o -iname '*.scr' -o -iname '*.ps1' -o -iname '*.cmd' -o -iname '*.com' \
  -o -iname 'Thumbs.db' -o -iname '.DS_Store' -o -iname 'Desktop.ini' \
  -o -iname '*Compare*.png' -o -iname '*Compare*.jpg' \
  -o -iname 'sample.mkv' -o -iname '*.sample.mkv' -o -iname '*-sample.mkv' \
  -o -iname '*.torrent' -o -iname '*.sfv' -o -iname '*.md5' \
\) -print

# What looks like a real Jellyfin NFO vs a release-group brag?
find "$SRC" -iname '*.nfo' -print0 | while IFS= read -r -d '' f; do
  # Squeeze (not delete) whitespace so <movie xmlns=...> keeps its word boundary.
  if head -c 4096 "$f" | tr -s '[:space:]' ' ' \
       | grep -qE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)\b'; then
    printf 'KEEP   %s\n' "$f"
  else
    printf 'DELETE %s\n' "$f"
  fi
done

5.2 Stage 2 — Quarantine apply

cleanup-import.sh --apply "/home/admin/Downloads/futrama/Futurama Season 1  [...]"

What it does:

  1. Copies the source directory tree to /home/admin/.jellyfin-staging/<release-name>/. The source is never modified.
  2. Inside the staging copy, moves every DELETE-classified file to /home/admin/.jellyfin-quarantine/<YYYY-MM-DD>/<release-name>/, preserving relative paths so a user can diff -r to confirm.
  3. Renames non-canonical extras subfolders to canonical lowercase (§ 4.1).
  4. Writes a manifest at /home/admin/.jellyfin-staging/<release-name>/.cleanup-manifest.json listing every file action with sha256, source path, action, target path. This is what stage 3 reads.
  5. Returns the staging path on stdout — that's the input to doc 08's filename normalizer.
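
Step 4 could be implemented along the following lines. This is a sketch under assumptions, not the script's actual manifest code: the document only states that the manifest records sha256, source path, action and target path, so the JSON field names, the one-object-per-line layout, and the helper names json_escape and manifest_entry are all hypothetical.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Limitation: control characters (e.g. newlines in filenames) are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print one manifest record for a single file action.
# Field names are assumptions; sha256sum is from GNU coreutils.
manifest_entry() {
  local src="$1" action="$2" target="$3" sum
  # Hash via stdin so unusual filenames cannot alter sha256sum's output format.
  sum="$(sha256sum < "$src" | cut -d' ' -f1)"
  printf '{"sha256":"%s","source":"%s","action":"%s","target":"%s"}\n' \
    "$sum" "$(json_escape "$src")" "$action" "$(json_escape "$target")"
}
```

A call such as manifest_entry "$staged_file" DELETE "$quarantine_path" would append one record; if jq is available on the host, building the objects with jq instead would cover the escaping cases this sketch skips.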

5.3 Stage 3 — Confirm and recycle

After the user reviews the quarantine directory and approves:

cleanup-import.sh --confirm-quarantine 2026-05-08

Moves /home/admin/.jellyfin-quarantine/2026-05-08/ to the system trash (via gio trash) — still recoverable, but no longer cluttering the quarantine root. After 30 days a cron sweep empties trash older than that.
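
Because the date argument ends up inside a path that is moved to the trash, it is worth validating before use. The check below is a sketch only: valid_quarantine_date is a hypothetical helper, GNU date is assumed for -d, and nothing in this document confirms that the real script validates its argument this way.

```shell
# Hypothetical helper: accept only a real calendar date in YYYY-MM-DD form
# that has a matching directory under the quarantine root.
valid_quarantine_date() {
  local root="$1" day="$2"
  # Shape check first; this also rejects path tricks such as "../x".
  case "$day" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  # Round-trip through date so impossible dates (2026-02-30) are refused.
  [ "$(date -d "$day" +%Y-%m-%d 2>/dev/null)" = "$day" ] || return 1
  [ -d "$root/$day" ]
}
```

Only after this returns 0 would the directory be handed to gio trash.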

5.4 Never delete from source

The source download (/home/admin/Downloads/futrama/...) is never modified by cleanup-import.sh. Reasons:

  • The user may want to re-seed the torrent.
  • The user may want to re-run cleanup with different rules later.
  • Bugs in the cleanup script must never destroy original artefacts.

Source deletion is a separate manual step the user does AFTER the import is verified in Jellyfin and the library is happy. There is no script for it on purpose.


6. Idempotency, edge cases, and "unknown" handling

  • Idempotent. cleanup-import.sh --apply on an already-cleaned staging directory is a no-op (nothing matches DELETE). The script detects this and exits 0 with nothing to do.
  • Re-runnable on source. Re-running the script on the same source produces a fresh staging copy, overwriting (after backup) the previous staging directory. Quarantine is dated, so two runs on the same day for the same release append rather than overwrite (<release-name>.2/, .3/, etc.).
  • Unknown extension (e.g. .dat, .bin, .iso, .bin.txt) — never auto-deleted. FLAGGED in the audit output, surfaced to the user. The user adds it to the local override file ~/.config/jellyfin-cleanup/local-rules.conf if they want it classified next time.
  • Hidden dotfiles (anything starting with . other than known OS caches like .DS_Store) — FLAGGED. Don't auto-delete; could be a legitimate .subliminal.cache (subtitles plugin) or similar.
  • Symlinks — never followed. A symlink in a release directory is always FLAGGED; the script refuses to copy or quarantine it.
  • Permission denied — script bails with non-zero exit. Never partially applies.
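
The dated-quarantine naming described above (a plain directory for the first run of the day, then .2, .3, and so on) might look like this. It is a minimal sketch; next_quarantine_dir is a hypothetical helper name and the real script may choose the suffix differently.

```shell
# Hypothetical helper: pick the quarantine directory for a release on a given day.
# First run of the day gets <root>/<date>/<release>; later runs get .2, .3, ...
next_quarantine_dir() {
  local root="$1" day="$2" release="$3"
  local candidate="$root/$day/$release"
  local n=2
  while [ -e "$candidate" ]; do
    candidate="$root/$day/$release.$n"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

The function only computes a name; the caller still has to create the directory, and two concurrent runs could race for the same suffix.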

7. The Futurama Compare.png problem (artwork false-positive)

Futurama Compare.png is a 1.05 MB PNG sitting next to the season's MKV files. To a naive image-globber it looks like artwork — same extension as folder.jpg, larger than the typical poster, sitting in the right location. It's actually an encoder comparison shot.

The rule from doc 01 (artwork) and enforced here:

An image file in the release root is KEPT only if its name is on the exact recognised-artwork allow-list. Anything else is DELETED.

Recognised artwork allow-list (top-level of an item folder):

  • folder.{jpg,jpeg,png,webp}
  • poster.{jpg,jpeg,png,webp}
  • cover.{jpg,jpeg,png,webp}
  • default.{jpg,jpeg,png,webp}
  • show.{jpg,jpeg,png,webp} (series only)
  • jacket.{jpg,jpeg,png,webp} (series only)
  • movie.{jpg,jpeg,png,webp} (movies only)
  • backdrop.{jpg,jpeg,png,webp} and backdrop[0-9]*.{jpg,jpeg,png,webp}
  • fanart.{jpg,jpeg,png,webp}, background.{jpg,jpeg,png,webp}, art.{jpg,jpeg,png,webp}
  • logo.{png,jpg}, clearlogo.{png,jpg}
  • banner.{jpg,png}, landscape.{jpg,png}, thumb.{jpg,png}, disc.{png,jpg}, clearart.{png,jpg}
  • season[0-9]*-poster.{jpg,png}, season[0-9]*.{jpg,png}, season-specials-poster.{jpg,png}
  • extrafanart/ and backdrops/ directories (any contents OK)

Exception: images inside a recognised extras folder (extras/, featurettes/, etc.) are KEPT regardless of name — they're presumed to be intentional content of that extra.

Futurama Compare.png matches none of these allow-list patterns and is not inside an extras folder, so it's DELETED.


8. Security rules

The single most important rule in this document:

Windows-executable extensions and Internet Shortcut formats are auto-deleted, never quarantined for "review", because the threat model isn't the Linux server, it's the Jellyfin user who downloads them.

Jellyfin has a "Download original file" button for every item. If a release contains Codec Installer.exe, Jellyfin will happily serve it to any user with library access — including the friend on Windows who might not understand that downloading and running an .exe from a media library is a terrible idea. We don't trust the upload chain (the release group), so we strip these on the server side.

Exhaustive auto-delete list (security override — these bypass the "FLAG unknown" rule):

| Pattern | Risk |
|---|---|
| *.exe | Windows executable. Direct code execution on download+run. |
| *.msi | Windows Installer package. Silent install possible. |
| *.bat, *.cmd | Windows batch script. Runs in cmd.exe. |
| *.com | Old DOS-style executable. Still runs on modern Windows. |
| *.scr | Windows screensaver = .exe in disguise. Classic malware vector. |
| *.ps1 | PowerShell script. Common modern malware delivery. |
| *.vbs, *.wsf, *.hta, *.js (Windows Script Host) | Active scripting. |
| *.jar | Java archive: runs as java -jar on systems with JRE. |
| *.dll, *.sys | Windows libraries / drivers. Side-load attacks. |
| *.url, *.website, *.lnk | Internet Shortcut / Windows Shortcut. Points at attacker-controlled URL. |
| *.iso, *.img (in a media folder, not at the library root) | Mountable disk image. Can carry Windows installers. FLAG, not auto-delete: could legitimately be a DVD rip. |
| *.app/ | macOS application bundle. Auto-deleted. |
| Autorun.inf | Windows autorun config. AUTO-DELETE. |

Total auto-delete categories that are purely security-driven (not Jellyfin-irrelevance-driven): 15 (.exe, .msi, .bat, .cmd, .com, .scr, .ps1, .vbs, .wsf, .hta, .jar, .dll, .sys, .url/.website/.lnk, Autorun.inf). Plus 1 flagged for human review: .iso/.img.

8.1 Why .url is in the security list

.url is a plain-text Internet Shortcut. On Windows, double-clicking it opens the target in the default browser. The "target" is whatever the release group put in the URL= line. Historically this was used to push codec-pack download pages with bundled adware. There is no benign reason for a .url to ship in a media release.

The Futurama release contains exactly this pattern:

[InternetShortcut]
URL=https://ninite.com/klitecodecs/

Ninite itself is reputable — but the principle is "do not ship clickable URLs to third-party installers in a media library, ever".
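
For the audit report it can be useful to show where a shortcut points before it is deleted. The helper below is a sketch, not part of cleanup-import.sh: shortcut_target is a hypothetical name, and it assumes the usual [InternetShortcut] layout with an upper-case URL= key as in the example above.

```shell
# Hypothetical helper: print the target of a Windows Internet Shortcut
# (.url / .website) so the audit line can include it.
shortcut_target() {
  # These files normally use CRLF line endings, so drop the CR first,
  # then print the first URL= line without its key.
  tr -d '\r' < "$1" | sed -n 's/^URL=//p' | head -n 1
}
```

The file is still classified DELETE regardless of what the target turns out to be; the URL is logged for the record only.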

8.2 The RARBG_DO_NOT_MIRROR.exe historic case

Some releases historically contained a file named RARBG_DO_NOT_MIRROR.exe, ostensibly to discourage mirror sites from re-uploading. In several documented cases this file was actually adware or a cryptominer. Auto-delete, no questions asked.


9. Prepared cleanup script — cleanup-import.sh

Idempotent. Dry-run by default. Quarantine-first. Source-immutable. Returns the staging path on stdout for piping to doc 08's normalizer.

Save to bin/cleanup-import.sh in the ARRFLIX repo.

#!/usr/bin/env bash
# cleanup-import.sh — Pre-import cleanup for arrflix.s8n.ru
# Version 1.0 (2026-05-08) — see docs/07-pre-import-cleanup.md
#
# Usage:
#   cleanup-import.sh                              SRC          # dry-run
#   cleanup-import.sh --apply                      SRC          # quarantine
#   cleanup-import.sh --confirm-quarantine YYYY-MM-DD           # recycle
#
# Exit codes:
#   0 success / nothing to do
#   1 user error (bad args, source not found)
#   2 internal error (permission, partial state)
#   3 flagged files present — user must review before --apply
set -euo pipefail

STAGING_ROOT="${JELLYFIN_STAGING_ROOT:-$HOME/.jellyfin-staging}"
QUARANTINE_ROOT="${JELLYFIN_QUARANTINE_ROOT:-$HOME/.jellyfin-quarantine}"
TODAY="$(date +%Y-%m-%d)"

# ----- classification -----
# Returns one of: KEEP DELETE FLAG
classify() {
  local path="$1"
  local base
  base="$(basename "$path")"
  local lower
  lower="$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')"

  # Security overrides — bypass everything else
  case "$lower" in
    *.exe|*.msi|*.bat|*.cmd|*.com|*.scr|*.ps1|*.vbs|*.wsf|*.hta|*.jar|*.dll|*.sys) echo DELETE; return ;;
    *.url|*.website|*.lnk) echo DELETE; return ;;
    autorun.inf) echo DELETE; return ;;
  esac

  # OS junk
  case "$lower" in
    thumbs.db|ehthumbs.db|ehthumbs_vista.db|.ds_store|desktop.ini|.directory) echo DELETE; return ;;
    ._*) echo DELETE; return ;;
  esac

  # Sample files. Tested before the media KEEP rule below, otherwise
  # sample.mkv would match *.mkv first and be kept. Files inside a
  # samples/ extras folder are exempt (§ 1.5).
  local parent
  parent="$(basename "$(dirname "$path")" | tr '[:upper:]' '[:lower:]')"
  if [ "$parent" != "samples" ]; then
    case "$lower" in
      sample.mkv|sample.mp4|sample.avi|sample.m4v) echo DELETE; return ;;
      *-sample.mkv|*-sample.mp4|*-sample.avi|*-sample.m4v) echo DELETE; return ;;
      *.sample.mkv|*.sample.mp4|*.sample.avi|*.sample.m4v) echo DELETE; return ;;
      *_sample.mkv|*_sample.mp4|*_sample.avi|*_sample.m4v) echo DELETE; return ;;
    esac
  fi

  # Media: KEEP
  case "$lower" in
    *.mkv|*.mp4|*.avi|*.m4v|*.ts|*.mov|*.webm|*.wmv|*.flv|*.mpg|*.mpeg) echo KEEP; return ;;
    *.srt|*.ass|*.ssa|*.vtt|*.sup|*.idx|*.sub) echo KEEP; return ;;
    *.mp3|*.flac|*.ogg|*.opus|*.m4a|*.wav) echo KEEP; return ;;
  esac

  # Recognised artwork at item root
  case "$lower" in
    folder.jpg|folder.jpeg|folder.png|folder.webp) echo KEEP; return ;;
    poster.jpg|poster.jpeg|poster.png|poster.webp) echo KEEP; return ;;
    cover.jpg|cover.jpeg|cover.png|cover.webp) echo KEEP; return ;;
    default.jpg|default.png|show.jpg|show.png|jacket.jpg|jacket.png|movie.jpg|movie.png) echo KEEP; return ;;
    backdrop.jpg|backdrop.png|backdrop[0-9]*.jpg|backdrop[0-9]*.png) echo KEEP; return ;;
    fanart.jpg|fanart.png|background.jpg|background.png|art.jpg|art.png) echo KEEP; return ;;
    logo.png|logo.jpg|clearlogo.png|clearlogo.jpg|banner.jpg|banner.png) echo KEEP; return ;;
    landscape.jpg|landscape.png|thumb.jpg|thumb.png|disc.png|disc.jpg|clearart.png|clearart.jpg) echo KEEP; return ;;
    season[0-9]*-poster.jpg|season[0-9]*-poster.png|season[0-9]*.jpg|season[0-9]*.png) echo KEEP; return ;;
    season-specials-poster.jpg|season-specials-poster.png) echo KEEP; return ;;
  esac

  # Promo images masquerading as art
  case "$lower" in
    *compare*.png|*compare*.jpg|*compare*.jpeg|*compare*.webp|*compare*.gif) echo DELETE; return ;;
    *sample*.png|*sample*.jpg|*sample*.jpeg) echo DELETE; return ;;
    *screen*.png|*screen*.jpg|*preview*.png|*preview*.jpg) echo DELETE; return ;;
  esac

  # Text-flavoured junk
  case "$lower" in
    *.txt|*.diz|file_id.diz) echo DELETE; return ;;
  esac

  # Torrent residue
  case "$lower" in
    *.torrent|*.magnet|*.parts|*.aria2|*.meta) echo DELETE; return ;;
    *.pad|__padding_file_*|_____padding_file_*) echo DELETE; return ;;
    *.sfv|*.md5|*.sha1|*.sha256) echo DELETE; return ;;
  esac

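  # NFO discriminator: KEEP if Jellyfin-compatible XML, else DELETE
  case "$lower" in
    *.nfo)
      # No `grep -q` here: an early exit can SIGPIPE the upstream command, which
      # `set -o pipefail` would turn into a false DELETE. Whitespace is left
      # intact so `<movie xmlns=...>` still matches; -a copes with CP437 art.
      if head -c 4096 "$path" \
           | grep -aE '<(movie|tvshow|episodedetails|artist|album|musicvideo|season)([[:space:]>/]|$)' >/dev/null; then
        echo KEEP
      else
        echo DELETE
      fi
      return
      ;;
  esac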

  # Suspicious archives in a media folder
  case "$lower" in
    *.rar|*.r[0-9][0-9]|*.zip|*.7z|*.tar|*.tar.gz|*.iso|*.img) echo FLAG; return ;;
  esac

  echo FLAG
}

# ----- folder classification -----
# Returns one of: RENAME:<target> DELETE FLAG
# (KEEP_AS-IS is reserved; callers accept it, but no rule below emits it)
classify_dir() {
  local d="$1"
  local lower
  lower="$(basename "$d" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    behind\ the\ scenes|deleted\ scenes|interviews|scenes|samples|shorts|featurettes|clips|other|extras|trailers|theme-music|backdrops)
      echo "RENAME:$lower"; return ;;
    bts|behind-the-scenes) echo "RENAME:behind the scenes"; return ;;
    deleted-scenes|deleted_scenes) echo "RENAME:deleted scenes"; return ;;
    bonus|bonus\ features|bonus\ material|special\ features|outtakes|bloopers|gag\ reel) echo "RENAME:extras"; return ;;
    proof|screens|screenshots|caps|preview|previews) echo DELETE; return ;;
    sample) echo DELETE; return ;;
    .fseventsd|.spotlight-v100|.trashes|\$recycle.bin|system\ volume\ information) echo DELETE; return ;;
    extrafanart) echo "RENAME:extrafanart"; return ;;  # case stays, recognised
    *) echo FLAG; return ;;
  esac
}

# ----- main -----
APPLY=0
CONFIRM_DATE=""
SRC=""

while [[ $# -gt 0 ]]; do
  case "$1" in
    --apply) APPLY=1; shift ;;
    --confirm-quarantine) CONFIRM_DATE="${2:?--confirm-quarantine needs a YYYY-MM-DD date}"; shift 2 ;;
    -h|--help) sed -n '2,12p' "$0"; exit 0 ;;
    -*) echo "unknown flag: $1" >&2; exit 1 ;;
    *) SRC="$1"; shift ;;
  esac
done

if [[ -n "$CONFIRM_DATE" ]]; then
  if [[ -d "$QUARANTINE_ROOT/$CONFIRM_DATE" ]]; then
    gio trash "$QUARANTINE_ROOT/$CONFIRM_DATE"
    echo "Recycled $QUARANTINE_ROOT/$CONFIRM_DATE"
  else
    echo "No quarantine for $CONFIRM_DATE" >&2; exit 1
  fi
  exit 0
fi

[[ -n "$SRC" && -d "$SRC" ]] || { echo "usage: $0 [--apply] SRC" >&2; exit 1; }

RELEASE="$(basename "$SRC")"
STAGE="$STAGING_ROOT/$RELEASE"
QUAR="$QUARANTINE_ROOT/$TODAY/$RELEASE"

declare -i KEEP_N=0 DEL_N=0 FLAG_N=0

# Walk source, classify each entry
while IFS= read -r -d '' f; do
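  # Quote the prefix: release names contain [brackets], which an unquoted
  # expansion would treat as a glob character class and fail to strip.
  rel="${f#"$SRC"/}"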
  if [[ -d "$f" ]]; then
    case "$(classify_dir "$f")" in
      KEEP_AS-IS|RENAME:*) ;;
      DELETE) printf 'DELETE  %s/  [junk dir]\n' "$rel"; DEL_N+=1 ;;
      FLAG)   printf 'FLAG    %s/  [unknown dir name]\n' "$rel"; FLAG_N+=1 ;;
    esac
    continue
  fi
  case "$(classify "$f")" in
    KEEP)   printf 'KEEP    %s\n' "$rel"; KEEP_N+=1 ;;
    DELETE) printf 'DELETE  %s\n' "$rel"; DEL_N+=1 ;;
    FLAG)   printf 'FLAG    %s\n' "$rel"; FLAG_N+=1 ;;
  esac
done < <(find "$SRC" -mindepth 1 -print0)

echo "---"
echo "KEEP   $KEEP_N"
echo "DELETE $DEL_N"
echo "FLAG   $FLAG_N"

if (( FLAG_N > 0 )); then
  echo "FLAG count > 0; review before re-running with --apply." >&2
  (( APPLY == 0 )) || exit 3
fi

if (( APPLY == 0 )); then
  echo "Dry run only. Re-run with --apply to quarantine."
  exit 0
fi

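# --- APPLY path: copy to staging, move DELETE to quarantine ---
mkdir -p "$STAGE" "$QUAR"
# rsync -a preserves perms and is idempotent
rsync -a --delete "$SRC/" "$STAGE/"

# Pass 1: directories, deepest first (-depth), so renaming or quarantining a
# directory never invalidates a path that find has not emitted yet.
while IFS= read -r -d '' d; do
  rel="${d#"$STAGE"/}"   # quoted prefix: [brackets] in names are literal
  res="$(classify_dir "$d")"
  case "$res" in
    RENAME:*)
      target="${res#RENAME:}"
      parent="$(dirname "$d")"
      [[ "$(basename "$d")" == "$target" ]] || mv "$d" "$parent/$target"
      ;;
    DELETE)
      # Merge into quarantine: a nested junk dir may already have created it.
      mkdir -p "$QUAR/$rel"
      cp -a "$d/." "$QUAR/$rel/"
      rm -rf -- "$d"
      ;;
  esac
done < <(find "$STAGE" -mindepth 1 -depth -type d -print0)

# Pass 2: everything that is not a directory, using the post-rename paths.
while IFS= read -r -d '' f; do
  rel="${f#"$STAGE"/}"
  if [[ "$(classify "$f")" == DELETE ]]; then
    mkdir -p "$QUAR/$(dirname "$rel")"
    mv "$f" "$QUAR/$rel"
  fi
done < <(find "$STAGE" -mindepth 1 ! -type d -print0)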

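# Manifest. Backslashes and double quotes are escaped so that unusual release
# names still produce valid JSON (control characters are not handled).
json_escape() { printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'; }
{
  echo "{"
  echo "  \"release\": \"$(json_escape "$RELEASE")\","
  echo "  \"date\":    \"$TODAY\","
  echo "  \"source\":  \"$(json_escape "$SRC")\","
  echo "  \"staging\": \"$(json_escape "$STAGE")\","
  echo "  \"quarantine\": \"$(json_escape "$QUAR")\""
  echo "}"
} > "$STAGE/.cleanup-manifest.json"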

# Last line of stdout: the staging path, for doc 08's normalizer
# (the KEEP/DELETE report above is also written to stdout)
echo "$STAGE"
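
One way to exercise the script before pointing it at a real download is a throwaway fixture release. The sketch below is an assumption rather than part of the script: `make_fixture_release` is a hypothetical helper, and the file names are only illustrative examples of the categories the classifier handles.

```shell
# Hypothetical helper: build a small fake release for dry-run testing.
# Prints the path of the fixture release directory on stdout.
make_fixture_release() {
  fixture_root="$(mktemp -d)" || return 1
  fixture_rel="$fixture_root/Fixture Show Season 1 [1080p x265 Test]"
  mkdir -p "$fixture_rel/Sample"
  # Empty files are enough: everything except .nfo is classified by name only.
  : > "$fixture_rel/Fixture Show S01E01.mkv"
  : > "$fixture_rel/Fixture Show S01E01.en.srt"
  : > "$fixture_rel/poster.jpg"
  : > "$fixture_rel/Visit Our Site.url"
  : > "$fixture_rel/Thumbs.db"
  : > "$fixture_rel/readme.txt"
  : > "$fixture_rel/Sample/sample.mkv"
  # A release-group style NFO with no Jellyfin XML root element.
  printf 'ripped by EXAMPLE\n' > "$fixture_rel/release.nfo"
  printf '%s\n' "$fixture_rel"
}
```

Running `cleanup-import.sh "$(make_fixture_release)"` without `--apply` should then report the `.url` shortcut, `Thumbs.db`, `readme.txt`, `release.nfo` and the `Sample/` directory as DELETE, and the episode, subtitle and poster as KEEP. The bracketed directory name is deliberate, since real release names contain brackets.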

9.1 Pipeline integration

# Full pre-import flow:
SRC="/home/admin/Downloads/futrama/Futurama Season 1  [1080p AI x265 10bit FS99 Joy]"
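# The script prints its KEEP/DELETE report on stdout before the path, so keep
# only the last line and stop here if it is not a directory.
STAGING="$(cleanup-import.sh --apply "$SRC" | tail -n 1)"
[[ -d "$STAGING" ]] || echo "cleanup did not produce a staging dir; stop here" >&2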
# STAGING is now ~/.jellyfin-staging/Futurama Season 1.../ with junk gone.
# Hand off to doc 08:
normalize-filenames.sh "$STAGING"
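# Then move to live media tree (manual; doc 05 confirms layout).
# The series folder must exist first, or mv fails for a new show:
mkdir -p "/home/user/media/tv/Futurama (1999)"
mv "$STAGING" "/home/user/media/tv/Futurama (1999)/Season 01"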

The mv to the live tree is deliberately manual. Cleanup and rename are reproducible from source; the move into /home/user/media/ is the point of no return and the user runs it consciously.
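
Because the move is the point of no return, a last check on the staging directory can be worth the extra second. The following is a minimal sketch, not part of the pipeline: `assert_no_windows_executables` is a hypothetical helper, and its extension list is a subset of the script's security overrides.

```shell
# Hypothetical pre-move guard: refuse to continue if anything that looks like
# a Windows executable or shortcut survived cleanup.
assert_no_windows_executables() {
  guard_dir="$1"
  guard_hits="$(find "$guard_dir" -type f \( \
      -iname '*.exe' -o -iname '*.msi' -o -iname '*.bat' -o -iname '*.cmd' \
      -o -iname '*.scr' -o -iname '*.ps1' -o -iname '*.vbs' -o -iname '*.lnk' \
      -o -iname '*.url' -o -iname '*.website' \) -print)"
  if [ -n "$guard_hits" ]; then
    printf 'refusing to move; found:\n%s\n' "$guard_hits" >&2
    return 1
  fi
  return 0
}
```

A possible use is `assert_no_windows_executables "$STAGING" && mv "$STAGING" ...`, so the move only runs when the guard passes.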


10. What this doc explicitly does NOT do

  • Filename normalization — that's doc 08. This doc only deletes; doc 08 renames Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv into the canonical Futurama (1999) - S01E01 - Space Pilot 3000.mkv.
  • Subtitle reconciliation — doc 03 covers per-language naming; this doc only deletes obsolete formats (.smi, .rt).
  • Library refresh — after files land in /home/user/media/, run POST /Library/Refresh on the Jellyfin API (doc 02 § 2). Cleanup never touches the running container.
  • NFO writing — doc 02 § 11 covers writing override NFOs. This doc only filters incoming NFOs.
  • Source deletion — never. The source download is read-only to this pipeline; the user removes it manually post-import.
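
The library refresh mentioned in the list above could be triggered as sketched below. This is an illustration under assumptions: the variable names `JELLYFIN_URL` and `JELLYFIN_API_KEY` and the function `refresh_library` are hypothetical and not defined anywhere in these docs; the key is assumed to be one created in the Jellyfin dashboard, and doc 02 § 2 remains the reference for the actual procedure.

```shell
# Assumed environment (hypothetical names):
#   JELLYFIN_URL      base URL of the server, e.g. https://arrflix.s8n.ru
#   JELLYFIN_API_KEY  an API key created in the Jellyfin dashboard
refresh_library() {
  : "${JELLYFIN_URL:?set JELLYFIN_URL}" "${JELLYFIN_API_KEY:?set JELLYFIN_API_KEY}"
  # POST /Library/Refresh starts a scan of all libraries.
  curl --fail --silent --show-error -X POST \
    -H "Authorization: MediaBrowser Token=\"${JELLYFIN_API_KEY}\"" \
    "${JELLYFIN_URL%/}/Library/Refresh"
}
```

This runs only after the manual move, so it keeps to the rule that cleanup itself never touches the running container.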

11. TL;DR

| Step | What | Where |
|------|------|-------|
| 1 | Audit (dry-run) | cleanup-import.sh "$SRC" |
| 2 | Apply (quarantine) | cleanup-import.sh --apply "$SRC" → prints staging path |
| 3 | Review quarantine | ls ~/.jellyfin-quarantine/$(date +%F)/ |
| 4 | Normalize filenames | doc 08, takes staging path as input |
| 5 | Move to live tree | manual mv "$STAGING" /home/user/media/... |
| 6 | Refresh library | POST /Library/Refresh (doc 02) |
| 7 | Confirm quarantine | cleanup-import.sh --confirm-quarantine YYYY-MM-DD |
| 8 | Delete source | manual, only after Jellyfin shows the item correctly |

The hard rule, repeated: the source download is never modified, the live media tree is never written by cleanup, and Windows executables never reach a Jellyfin user's browser.