ARRFLIX/docs/08-filename-normalization.md
s8n cb95dce8bc Rename: tv.s8n.ru → nasflix.s8n.ru, jellyfin-stack → NASFLIX
- Domain: tv.s8n.ru retired (404). nasflix.s8n.ru live (302 → /web).
  Pi-hole local DNS updated. Traefik file-provider router rule + docker-label
  router rule both flipped. Jellyfin PublishedServerUrl env updated. Cert
  re-issued via Gandi DNS-01. Onyx /etc/hosts pin moved.
- Repo: forgejo PATCH /api/v1/repos rename. Local clone remote URL updated.
  All in-tree refs to tv.s8n.ru and jellyfin-stack swept (sed).
- Scope: TV Shows + Movies only. anime/, musicvideos/, home/, music/,
  docs-*/ libraries removed from canonical layout. Sections kept as
  reference for re-introduction.
- Branding LoginDisclaimer text updated to nasflix.s8n.ru.
2026-05-08 02:53:46 +01:00


# 08 — Filename & Folder Normalization Ruleset (nasflix.s8n.ru)
Last updated: 2026-05-08
Server: Jellyfin 10.10.3 on nullstone, container `jellyfin`
Library root inside container: `/media`
Library root on host: `/home/user/media`
This document is the **normative ruleset** for renaming downloaded media into a
canonical, predictable, group-tag-free shape before it lands in the live
library tree. It is the layer between "torrent dump" and "file ready for the
scanner".
Cross-links:
- [`05-file-structure-rules.md`](05-file-structure-rules.md) — what Jellyfin's
parser accepts; this doc picks one of the accepted forms and locks it in.
- [`07-cleanup-and-imports.md`](07-cleanup-and-imports.md) — the operational
pipeline (move, dedupe, garbage collect) that consumes this ruleset. Doc 08
defines *what* canonical looks like; doc 07 defines *how* to apply it.
- [`02-metadata-and-titles.md`](02-metadata-and-titles.md) — what Jellyfin
does after the rename (parse, scrape, lock).
- [`03-subtitles.md`](03-subtitles.md) — sidecar `.srt` / `.ass` naming
(referenced from § 5.6 below).
> **Status of this doc:** specification + reference implementation. The
> `normalize.py` script in § 11 is canonical. Anything not codified by the
> script is documentation only — when the doc and the script disagree, the
> script wins, and the doc gets fixed.
---
## 0. Why a normalization ruleset (and why now)
Doc 05 establishes that Jellyfin's parser is permissive: dots, dashes,
underscores, and spaces are interchangeable; `S01E01`, `s01e01`, `1x01`, and
`Season 1 Episode 1` all parse to the same thing. That permissiveness is great
for *getting Jellyfin to scrape a torrent dump*, but it is a disaster for
**operating a library at scale**:
1. **Search becomes noisy.** SMB / Syncthing / Dolphin search across mixed
patterns surfaces irrelevant matches (`S01E01` vs `1x01` vs `s01.e01`).
2. **Diff / audit / dedupe scripts** get harder. Every regex needs to handle
N forms. The cleanup pass (doc 07) is dramatically cheaper if every file
in the tree obeys one shape.
3. **Visual scan in `ls`** becomes unreadable when half the filenames have
`[1080p AI x265 10bit FS99 Joy]` glued on and the other half don't.
4. **Future migrations** (Plex, Kodi, mobile sync to a Win/Mac client) all
have stricter parsers than Jellyfin. The strictest sane shape that
Jellyfin accepts is also the most portable. Pay the cost once.
5. **Cross-platform safety.** This deploy is Linux-only today, but the
workspace's Syncthing setup (see ai-lab `SYSTEM.md`) implies future
sync to Win/Mac clients. Choose Windows-safe filenames now and never
touch this again.
The cost of the ruleset is one Python script and discipline at import time.
Both are bounded. The cost of *not* having one compounds with every new
release.
---
## 1. Canonical formats — what the tree must look like
This is the lock-in. **One shape per category. No alternatives. No "but my
release group did it differently".**
### 1.1 Movies
```
Movies/<Title> (<Year>)/<Title> (<Year>).<ext>
Movies/<Title> (<Year>)/<Title> (<Year>) - <Edition>.<ext> (when edition matters)
Movies/<Title> (<Year>) [<provider-id>]/<Title> (<Year>) [<provider-id>].<ext> (when ambiguous)
```
- `<Title>` — smart title case (§ 5.1), forbidden chars stripped (§ 5.5).
- `<Year>` — first theatrical-release year, in parens, single space before `(`.
Mandatory in this deploy (doc 05 § 0 rule 5), even when the title is unique.
- `<Edition>` — when present, exactly one of:
`Director's Cut`, `Extended`, `Theatrical`, `IMAX`, `Unrated`, `Final Cut`,
`Remastered`. Anything else (e.g. `Snyder Cut`, `Workprint`, `4K
Remaster`) is admissible only with a written justification in the import
log; otherwise normalize to the closest of the seven canonical labels
above.
- `<provider-id>` — `imdbid-tt0123456` / `tmdbid-12345` / `tvdbid-12345`
in square brackets. Optional unless year-based disambiguation isn't
enough (§ 6.3).
- `<ext>` — lowercase: `mkv`, `mp4`, `webm`, `avi`. (`mkv` is the rip
default; `mp4` is the streaming-original default.) Never uppercase
`.MKV`, `.MP4`.
**Forbidden in the filename**: resolution tags (`1080p`, `2160p`, `720p`,
`4K`), codec tags (`x264`, `x265`, `h264`, `h265`, `HEVC`, `AVC`), source
tags (`WEB`, `WEB-DL`, `BluRay`, `BRRip`, `HDTV`, `DVDRip`, `WEBRip`),
audio tags (`AAC`, `AC3`, `DTS`, `DTS-HD.MA`, `5.1`, `7.1`, `Atmos`,
`Opus`), bitness/HDR tags (`10bit`, `8bit`, `HDR`, `DV`, `SDR`), release
tags (`PROPER`, `REPACK`, `INTERNAL`, `LIMITED`, `RERIP`), language tags
(`MULTi`, `DUBBED`, `SUBBED`, `iNTERNAL`), group tags
(`[YIFY]`, `[RARBG]`, `[FS99 Joy]`, `-NOGRP`, `-EVO`, `-SPARKS`),
and website refs (`WWW.YIFY-TORRENTS.COM`, `RARBG.txt`-derived names).
**Justification — why no resolution/codec tag:**
Jellyfin reads stream attributes (resolution, codec, bit-depth, HDR, audio
codec) directly from the file via `ffprobe` on every scan. The web UI
displays them. The mobile clients display them. The transcoder picks
based on them. The filename contributes **zero new information**.
Including those tags pollutes search results, breaks the byte-exact
folder-vs-file match required for multi-version movies (doc 05 § 1.2),
and makes humans skim past the title to find the title. The only
exception is `Movie (Year) - 1080p.mkv` AS the multi-version label
when two distinct rips of *the same movie* are kept in the same folder
(e.g. `Blade Runner 2049 (2017) - 2160p.mkv` next to
`Blade Runner 2049 (2017) - 1080p.mkv`). In that exact case, the
resolution IS the disambiguation token. Otherwise, no.
#### Examples
```
Movies/Blade Runner (1982)/Blade Runner (1982).mkv
Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv
Movies/Blade Runner (1982)/Blade Runner (1982) - Director's Cut.mkv
Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 2160p.mkv
Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 1080p.mkv
Movies/Dune (1984) [imdbid-tt0087182]/Dune (1984) [imdbid-tt0087182].mkv
```
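The shapes above are regular enough to audit mechanically. The following is a minimal sketch of such a check, not part of `normalize.py`: the helper name `is_canonical_movie_path` is hypothetical, and it accepts any non-empty label after ` - ` rather than validating it against the edition list.

```python
import re

# Hypothetical audit helper: folder is "<Title> (<Year>)" with an optional
# provider-id bracket, and the file stem must start with the exact folder name.
RE_MOVIE_PATH = re.compile(
    r"^Movies/"
    r"(?P<folder>[^/]+ \(\d{4}\)(?: \[(?:imdbid-tt\d+|tmdbid-\d+|tvdbid-\d+)\])?)/"
    r"(?P<stem>[^/]+)\.(?P<ext>mkv|mp4|webm|avi)$"
)


def is_canonical_movie_path(path: str) -> bool:
    m = RE_MOVIE_PATH.match(path)
    if not m:
        return False
    folder, stem = m.group("folder"), m.group("stem")
    if stem == folder:
        return True
    # Multi-version / edition files: "<folder> - <label>" with a non-empty label.
    prefix = folder + " - "
    return stem.startswith(prefix) and len(stem) > len(prefix)
```

Because the extension group is lowercase-only, an uppercase `.MKV` fails the check, as does any stem that still carries release tags.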
### 1.2 TV shows
```
TV/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM> - <Episode Title>.<ext>
TV/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM>-E<MM2> - <Episode Title>.<ext>
TV/<Show> (<Year>)/Season 00/<Show> (<Year>) - S00E<MM> - <Special Title>.<ext>
```
- `<Show>` — smart title case, no provider-id in show folder unless the
scraper picks the wrong show twice in a row (then add `[tvdbid-NNNN]`).
- `<Year>` — series **first-air year**, mandatory even when title is unique
(doc 05 § 0 rule 5; this deploy convention is stricter than upstream
permissive parsing).
- `<NN>` — zero-padded two digits. `Season 01`, not `Season 1`. `S01`, not `S1`.
- `<MM>` — zero-padded two digits. Three digits permissible only for shows
that exceed 99 episodes per *season* (rare; e.g. some daily anime). See
doc 05 § 3.1.
- `<Episode Title>` — title from the metadata provider (TVDB/TMDB) with
smart title case. Required for human readability; Jellyfin overwrites it
during scrape but the file basename is what humans see in `ls`.
- Multi-episode files: `S<NN>E<MM>-E<MM2>` — single hyphen, no spaces.
Verified parsing per doc 05 § 2.2 table.
#### Examples
```
TV/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
TV/Futurama (1999)/Season 01/Futurama (1999) - S01E03-E04 - I, Roommate and Love's Labours Lost in Space.mkv
TV/Futurama (1999)/Season 00/Futurama (1999) - S00E01 - Bender's Big Score.mkv
TV/The Office (2005)/Season 02/The Office (2005) - S02E01 - The Dundies.mkv
```
#### Why this shape (not the slimmer `Show S01E01.mkv`)
Doc 05 § 2.2 shows three accepted patterns:
```
Futurama (1999) S01E01.mkv
Futurama (1999) S01E01 - Space Pilot 3000.mkv
Futurama (1999) - S01E01 - Space Pilot 3000.mkv ← canonical for this deploy
```
The third form (with the leading ` - ` before `S01E01` and the title) is
chosen because:
1. The leading dash visually separates the series-name block from the
episode-id block. Important when the show's title contains spaces and
numbers (`Star Trek The Next Generation S01E01`) — without the dash, the
eye trips over `Generation S01E01`.
2. Symmetric with the Movies multi-version pattern (`Title (Year) - <Label>`).
One mental model for the whole library.
3. Identical to the Sonarr default rename pattern (`{Series Title} -
S{season:00}E{episode:00} - {Episode Title}`), which means the naming
pattern is well-trodden and tooling friendly.
### 1.3 Anime — seasonal numbering (TVDB-style)
Same shape as TV (§ 1.2). Mandatory year. Mandatory `Season NN`. No
absolute numbers.
```
Anime/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM> - <Episode Title>.<ext>
```
#### Examples
```
Anime/Cowboy Bebop (1998)/Season 01/Cowboy Bebop (1998) - S01E01 - Asteroid Blues.mkv
Anime/Mushishi (2005)/Season 02/Mushishi (2005) - S02E01 - The Sleeping Mountain.mkv
Anime/Steins;Gate (2011) [tvdbid-244061]/Season 01/Steins;Gate (2011) [tvdbid-244061] - S01E01 - Turning Point.mkv
```
(`;` is legal on `ext4` but flagged in § 5.5 as risky for portability —
prefer `Steins-Gate` if portability matters.)
### 1.4 Anime — absolute numbering
Used **only** for shows >99 episodes that don't fit the seasonal model
(One Piece, Naruto, Detective Conan, Bleach). For those shows, the
canonical shape is:
```
Anime/<Show>/<Show> - <NNNN> - <Episode Title> [<Sub|Dub>].<ext>
```
- No `(<Year>)` on the show folder — absolute-numbering shows are usually
unique by name; if not, fall back to a provider ID
(`Doraemon (1979) [tvdbid-71603]`, then revert to seasonal Pattern 1.3).
- `<NNNN>` — **zero-padded four digits** (deterministic; all known
long-runners stay below 9999). Three-digit padding (`099`) is wrong;
four-digit (`0099`) is right and covers the episode count of the
longest-running show.
- `[<Sub|Dub>]` — exactly one of `[Sub]` or `[Dub]`. Required for any
release where both audio tracks are not embedded in one mkv. If the
release contains both audio tracks in one container, omit the
bracket.
- No `Season NN` folder. Absolute numbering puts every episode in the
show root.
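
The bullet rules above translate into a small formatter. This is a sketch only; `absolute_episode_name` and its parameters are hypothetical names chosen for illustration, not taken from `normalize.py`.

```python
from typing import Optional


def absolute_episode_name(show: str, number: int, title: str, ext: str,
                          subdub: Optional[str] = None) -> str:
    """Build '<Show> - <NNNN> - <Episode Title> [<Sub|Dub>].<ext>'."""
    if not 1 <= number <= 9999:
        raise ValueError(f"absolute number out of range: {number}")
    if subdub not in (None, "Sub", "Dub"):
        raise ValueError(f"subdub must be 'Sub', 'Dub' or None, got {subdub!r}")
    # The bracket is omitted when both audio tracks live in one container.
    suffix = f" [{subdub}]" if subdub else ""
    return f"{show} - {number:04d} - {title}{suffix}.{ext}"
```

The `:04d` format spec is what enforces the four-digit padding; passing `subdub=None` covers the dual-audio case.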
#### Deterministic absolute-numbering rule
Absolute number = the episode's position in the **broadcast order** as
listed by AniDB's "main" episode list for that show. NOT the dub broadcast
order, NOT a re-cut/remaster renumbering. For shows with discrepancies
between AniDB and TVDB absolute numbering (rare), AniDB wins — that's the
provider that absolute-numbering plugins (and Shoko) use.
#### Examples
```
Anime/One Piece/One Piece - 0001 - I'm Luffy! The Man Who's Gonna Be King of the Pirates! [Sub].mkv
Anime/One Piece/One Piece - 0001 - I'm Luffy! The Man Who's Gonna Be King of the Pirates! [Dub].mkv
Anime/Naruto/Naruto - 0001 - Enter Naruto Uzumaki [Sub].mkv
Anime/Detective Conan/Detective Conan - 1099 - The Detective's Vacation [Sub].mkv
```
#### Caveat
Naive Jellyfin without Shoko will mis-handle episodes >99 (doc 05 § 3.3).
This is a known issue; pick **one** of:
- Run Shoko (doc 05 § 3.2). Filenames don't matter for Shoko — but obey
this ruleset anyway, for human readability and for the day Shoko goes
away.
- Re-bucket by TVDB seasons. Most long-runners have a TVDB season split
(One Piece S01-S22). Use § 1.3 with the seasons.
This deploy currently does NOT run Shoko; it currently does NOT host any
absolute-numbered anime. The shape in § 1.4 is reserved for the day
Shoko gets installed. Leave it documented.
### 1.5 Music videos
```
MusicVideos/<Artist>/<Year> - <Track Title>.<ext>
MusicVideos/<Artist>/<Year> - <Track Title> [<Variant>].<ext> (when multiple cuts exist)
```
- `<Artist>` — smart title case, comma-separated for collabs
(`Daft Punk, Pharrell Williams`).
- `<Year>` — release year of the *video*, not the song. Songs older than
their videos are common (a 2024 acoustic cover gets the 2024 year).
- `<Track Title>` — smart title case.
- `<Variant>` — optional, `[Live]`, `[Acoustic]`, `[Remix]`, `[Alternate]`,
`[Lyric Video]`. Forbidden: `[1080p]`, `[Official]`, `[HD]`.
Music videos do not use `(<Year>)` parens because the library is
`musicvideos` `CollectionType`, which has no scraper (doc 05 § 5.3) and the
year is purely cosmetic.
#### Examples
```
MusicVideos/Daft Punk/2013 - Get Lucky.mp4
MusicVideos/Daft Punk/2013 - Get Lucky [Lyric Video].mp4
MusicVideos/Pink Floyd/1995 - Comfortably Numb [Live].mkv
MusicVideos/Daft Punk, Pharrell Williams/2013 - Get Lucky.mp4
```
For full **live concerts** (>20 min, multi-song), file under Movies
instead, per doc 05 § 5.4.
### 1.6 Stand-up specials (Movies-typed)
Stand-up lives in the Movies library (doc 05 § 4). Folder + filename are
prefixed with the performer name; treat the whole `<Performer> - <Title>`
as the canonical "movie title" for parser purposes.
```
Movies/<Performer> - <Title> (<Year>)/<Performer> - <Title> (<Year>).<ext>
```
#### Examples
```
Movies/Bo Burnham - Inside (2021)/Bo Burnham - Inside (2021).mkv
Movies/Hannah Gadsby - Nanette (2018) [imdbid-tt8465676]/Hannah Gadsby - Nanette (2018) [imdbid-tt8465676].mkv
Movies/Norm Macdonald - Nothing Special (2022)/Norm Macdonald - Nothing Special (2022).mkv
```
The `<Performer> - ` prefix is **mandatory** for stand-up. Without it, the
title alone (`Inside (2021)`) ambiguously matches the 2007 horror film
*Inside*, the 2023 thriller *Inside*, or the 2017 documentary *Inside*.
The prefix gives TMDB enough disambiguation to land on the correct
record without a provider-id override.
---
## 2. What to STRIP from a source filename — exhaustive list
This is the substring inventory. The script in § 11 implements all of
these. The list grew from sampling ~200 distinct release-group filenames
across `[YIFY]`, `[RARBG]`, `[ettv]`, `[GalaxyRG]`, `[FS99 Joy]`,
`[NOGRP]`, `[FitGirl]`, and the Futurama corpus on disk.
### 2.1 Group tags (square / round brackets)
Match anything inside `[...]` or `(...)` *that does not look like a year*.
Year detection: 4 digits, 1900 ≤ N ≤ current year + 2.
Exemplar substrings (case-insensitive):
```
[1080p AI x265 10bit FS99 Joy]
[YIFY]
[YTS]
[YTS.MX]
[YTS.AG]
[YTS.AM]
[RARBG]
[ettv]
[eztv]
[GalaxyRG]
[GalaxyRG265]
[FitGirl]
[FitGirl Repack]
[NOGRP]
[QxR]
[FreetheFish]
[psa]
[PSA]
[CMRG]
[d3g]
[STRiFE]
[Pahe.in]
[FoV]
[NTb]
[YOLO]
[KOGi]
[playWEB]
[REQ]
[XBET]
[FLUX]
[NOSiVID]
[BGT]
[SVA]
[CRiMSON]
[ION10]
[ION265]
[BluPanda]
[H4S5S]
[5.1]
(YIFY)
(RARBG)
(NOGRP)
```
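The bracket rule and its year exemption can be sketched as follows. This is an illustration under assumptions, not the script's code: the function names are hypothetical, and a real implementation would presumably also need to exempt provider-id brackets such as `[imdbid-tt0087182]`, which this sketch strips.

```python
import re
from datetime import date

RE_BRACKETED = re.compile(r"\[([^\[\]]*)\]|\(([^()]*)\)")


def looks_like_year(text: str) -> bool:
    """4 digits, 1900 <= N <= current year + 2 (the rule from this section)."""
    if not re.fullmatch(r"\d{4}", text):
        return False
    return 1900 <= int(text) <= date.today().year + 2


def strip_bracket_tags(name: str) -> str:
    def repl(m: re.Match) -> str:
        inner = m.group(1) if m.group(1) is not None else m.group(2)
        # Keep year groups such as "(1999)"; drop every other bracket group.
        return m.group(0) if looks_like_year(inner.strip()) else " "

    return re.sub(r"\s+", " ", RE_BRACKETED.sub(repl, name)).strip()
```

Replacing each dropped group with a space (rather than nothing) avoids gluing neighbouring words together; the final `\s+` collapse cleans up the leftovers.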
### 2.2 Trailing release-group dashes
Pattern: `-<UPPERCASE_TOKEN>` at the very end of the basename
(before extension). Matches:
```
-NOGRP
-EVO
-RARBG
-SPARKS
-CMRG
-NTb
-FLUX
-AMZN
-NF
-DSNP
-ATVP
-MA
-WEB
-AAC2
-FoV
-KOGi
-PLAYWEB
-FRDS
-ZQ
-PHOENiX
-EZTV
-NTG
-iON
-ION10
-ION265
-CtrlHD
-d3g
-PSA
-QxR
-RZeroX
-PMP
-BTN
-DEFLATE
-BAE
-MZABI
-TURG
```
The pattern `-[A-Z][A-Z0-9]{1,15}$` (after stripping bracket tags and
quality tags) captures most of these. The script in § 11 uses an
allow-list approach instead of a pattern, because release groups
sometimes exceed 15 chars and sometimes use mixed case.
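
A list-based lookup of the trailing token might look like the sketch below. The set is only an excerpt of the list above, the helper name is hypothetical, and the case-insensitive comparison is an assumption about how the real script matches.

```python
# Excerpt of the § 2.2 list; the real script would carry the full set.
KNOWN_GROUPS = {"NOGRP", "EVO", "RARBG", "SPARKS", "NTb", "PHOENiX", "RZeroX"}
_KNOWN_LOWER = {g.lower() for g in KNOWN_GROUPS}


def strip_trailing_group(stem: str) -> str:
    """Drop a trailing '-GROUP' only when GROUP is a known release group."""
    head, sep, tail = stem.rpartition("-")
    if sep and head and tail.lower() in _KNOWN_LOWER:
        return head.rstrip(" .-_")
    return stem
```

Looking the tail up in a known set, instead of matching any uppercase token, is what keeps legitimate hyphenated titles such as `Spider-Man` intact.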
### 2.3 Quality / codec / source / audio tags
Strip all of these as standalone tokens (whitespace-, dot-, dash-, or
underscore-bounded), case-insensitive:
**Resolution / aspect:**
```
2160p 1080p 720p 480p 360p 4K 4k UHD HD SD FHD QHD
```
**Source:**
```
WEB-DL WEBDL WEB.DL WEB WEBRip WEB-Rip BluRay BLURAY Bluray BDRip
BRRip BR-Rip BDR HDTV HDTVRip PDTV DSR DVDRip DVD DVDR DVD9 DVD5
HDDVD HDDVDRip HDRip CAMRip CAM TS HDTS TC TELESYNC TELECINE R5
SCREENER SCR WORKPRINT WP PPV PPVRip
```
**Codec / container hints (in name):**
```
x264 x265 H.264 H264 H.265 H265 HEVC AVC VP9 AV1 XviD DivX
10bit 10-bit 8bit 8-bit HDR HDR10 HDR10+ DV DolbyVision Dolby.Vision
SDR HFR HQ
```
**Audio:**
```
DD5.1 DDP5.1 DD7.1 DDP7.1 DD2.0 DD+5.1 DD+7.1 DTS DTS-HD DTS-HD.MA
DTS-X DTSX TrueHD Atmos AAC AAC2.0 AAC5.1 AC3 AC-3 EAC3 E-AC3
MP3 MP2 Opus FLAC PCM LPCM 5.1 7.1 2.0 Mono Stereo Multi
```
**Release-process tags:**
```
PROPER REPACK iNTERNAL INTERNAL LIMITED EXTENDED.CUT UNCUT THEATRiCAL
RERIP REAL READNFO RETAiL RETAIL STV DC COMPLETE REMUX REMASTERED
SUBBED DUBBED MULTi MULTI SUB DUB ENG ENGLISH POL POLISH iNT
```
> Note: `EXTENDED.CUT`, `THEATRiCAL`, `UNRATED`, `IMAX`, `DIRECTORS.CUT`,
> `FINAL.CUT`, `REMASTERED`, `UNCUT`, `DC` (= Director's Cut shorthand),
> `EE` (= Extended Edition shorthand) are kept *as edition tokens* — see
> § 3.6. Strip them from the noise pool, then re-emit them as
> ` - <Edition>` if present.
### 2.4 Source-specific cruft
Common compound suffixes that are not single tokens:
```
WEB.h264-NiXON[rartv]
WEB-DL.DDP5.1.x264-NTb
BDRip.x265.10bit-RZeroX
HDTV.x264-PHOENiX
1080p.WEB.h264-NiXON
2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.5.1
```
These are ad-hoc concatenations; once the standalone tokens above are
stripped, what remains is the title plus stray separators. The pipeline
in § 3 collapses separators last, so order matters.
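
To make the "strip tokens first, collapse separators later" ordering concrete, here is a minimal sketch of a standalone-token stripper. The token list is a short excerpt of § 2.3 and the names `NOISE_TOKENS` and `strip_noise_tokens` are hypothetical. Note that a plain token match like this would also remove a genuine title word such as `Web`, so the real script needs more context than this sketch uses.

```python
import re

# Excerpt of the § 2.3 token lists, for illustration only.
NOISE_TOKENS = [
    "2160p", "1080p", "720p", "WEB-DL", "WEB", "BluRay", "HDTV",
    "x264", "x265", "H.264", "HEVC", "10bit",
    "DTS-HD.MA", "DTS", "DDP5.1", "AAC", "5.1",
]

# Longest tokens first so "WEB-DL" is tried before "WEB".
_alternation = "|".join(
    re.escape(t) for t in sorted(NOISE_TOKENS, key=len, reverse=True)
)
# A token counts as standalone when it is not glued to a letter or digit.
RE_NOISE = re.compile(
    rf"(?<![A-Za-z0-9])(?:{_alternation})(?![A-Za-z0-9])", re.IGNORECASE
)


def strip_noise_tokens(name: str) -> str:
    return RE_NOISE.sub("", name)
```

The output still contains runs of dots and dashes where the tokens used to be; those are left for the separator cleanup in § 2.5.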
### 2.5 Whitespace / punctuation cleanup
After substring removal, run these passes:
| Pass | From | To |
|---|---|---|
| Collapse runs of spaces | `Show   Title  S01E01` | `Show Title S01E01` |
| Trim leading/trailing whitespace | ` Show.mkv ` | `Show.mkv` |
| Collapse double-underscore | `Show__Title` | `Show Title` |
| Replace dot-separators with space (basename only) | `Show.Title.S01E01` | `Show Title S01E01` |
| Drop stray punctuation runs | `Show --- Title` | `Show - Title` |
| Strip trailing dashes/dots before ext | `Show -.mkv` | `Show.mkv` |
The dot-to-space substitution is **only applied when the dot sits directly
between two alphanumeric characters**. `5.1` would qualify, but audio
channel counts have already been removed in § 2.3 by this point. A source
written `Mr.Robot` becomes `Mr Robot` (the canonical form has no dot),
whereas a dot followed by a space, as in `Mr. & Mrs. Smith`, is left alone.
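
The passes in the table can be sketched as one function operating on the basename without its extension. This is an illustrative approximation, not the script's code; the name `cleanup_separators` is hypothetical.

```python
import re


def cleanup_separators(stem: str) -> str:
    # Dot directly between two letters/digits becomes a space.
    s = re.sub(r"(?<=[^\W_])\.(?=[^\W_])", " ", stem)
    # Underscores act as word separators.
    s = s.replace("_", " ")
    # Runs of two or more dashes (with optional spaces) become " - ".
    s = re.sub(r"\s*-(?:\s*-)+\s*", " - ", s)
    # Collapse whitespace runs and trim both ends.
    s = re.sub(r"\s+", " ", s).strip()
    # Drop stray trailing separators left in front of the extension.
    return s.rstrip(" .-_")
```

The character class `[^\W_]` matches letters and digits including accented ones, so `Amélie.2001` is handled the same way as an ASCII title.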
### 2.6 URL / website refs
Match and remove:
```
WWW.YIFY-TORRENTS.COM
WWW.YTS.MX
WWW.RARBG.TO
RARBG.txt
www.yify-torrents.com
```
These appear as bracket prefixes (`[WWW.YIFY-TORRENTS.COM] Movie...`),
suffixes (`Movie - WWW.YIFY-TORRENTS.COM.mkv`), or as `RARBG.txt`-style
sidecar files (which doc 07 garbage-collects, not us).
Pattern (case-insensitive): `(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)` → strip whole match.
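
A short sketch of applying that pattern is shown below. Two choices here are assumptions rather than documented behaviour: the match is replaced with a single space so neighbouring words are not glued together, and the function is meant to run on the basename without its extension, since the trailing delimiter in the pattern would otherwise consume the dot before the extension.

```python
import re

RE_SITE = re.compile(
    r"(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)",
    re.IGNORECASE,
)


def strip_site_refs(stem: str) -> str:
    """Remove 'WWW.<site>.<tld>' references, including their delimiters."""
    return RE_SITE.sub(" ", stem).strip()
```

A dangling ` -` can remain after a suffix-style reference is removed; the § 2.5 cleanup passes are what trim it.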
### 2.7 Language indicators in the BASE name
`.pl`, `.eng`, `.en`, `.pol`, `.de`, `.fr`, `.es`, `.it`, `.ja`, `.jp`,
`.ru`, `.ko`, `.zh` appearing in the **video** filename (basename, not
extension). These belong on **subtitle sidecars only**, per doc 03.
```
Futurama.s01e01.pl.mkv ← BAD (`.pl` in video basename)
Futurama (1999) - S01E01.mkv ← GOOD (audio language is a stream attribute)
Futurama (1999) - S01E01.pl.srt ← GOOD (subtitle sidecar with lang)
Futurama (1999) - S01E01.eng.srt ← GOOD
```
Detection: 2- or 3-letter ISO-639 code as a token between dots / dashes /
underscores in the basename. If found, drop it from the basename. If a
sidecar `.srt` exists with the same lang token, **leave the sidecar
alone** — it's already correctly named.
If the source file is a `.srt` / `.ass` / `.vtt` / `.sub`, the lang
token is part of the canonical sidecar form and must NOT be stripped.
The script's `--type subtitle` mode handles this branch.
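
The detection rule can be sketched as below. This is a hedged illustration: the function name is hypothetical, only the lowercase tokens listed above are matched, and a token must be preceded by a dot, dash, or underscore, which means a dot-separated title containing a real word like `it` would still be at risk.

```python
import re

LANG_TOKENS = ("pl", "eng", "en", "pol", "de", "fr", "es", "it",
               "ja", "jp", "ru", "ko", "zh")
SUBTITLE_EXTS = {"srt", "ass", "vtt", "sub"}

_alts = "|".join(sorted(LANG_TOKENS, key=len, reverse=True))
# Delimiter + token, followed by another delimiter or the end of the stem.
RE_LANG = re.compile(rf"[._-](?:{_alts})(?=[._-]|$)")


def strip_lang_tokens(stem: str, ext: str) -> str:
    if ext in SUBTITLE_EXTS:
        # Sidecars keep their language token.
        return stem
    return RE_LANG.sub("", stem)
```

Passing the extension in lets the same helper serve both branches: video basenames are cleaned, subtitle basenames are returned untouched.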
---
## 3. The normalization pipeline (regex / sed / python)
Conceptual order — each step's output feeds the next.
### 3.1 Step 0 — Determine target schema
Caller-supplied: `--type {movie|tv|anime-seasonal|anime-absolute|musicvideo|standup|extra}`. The
script does not guess. Doc 07's import wrapper picks the type based on
which library tree the file is being moved into.
### 3.2 Step 1 — Split off extension
```python
basename, ext = os.path.splitext(source_filename)
ext = ext.lower().lstrip(".") # canonical lowercase, no leading dot
```
Validate: `ext in {"mkv", "mp4", "avi", "webm", "m4v", "srt", "ass", "ssa", "vtt", "sub", "idx"}`.
Anything else → reject with an error; doc 07 quarantines it.
### 3.3 Step 2 — Extract S<NN>E<MM> (TV / anime-seasonal only)
```python
import re

RE_SEASON_EPISODE = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,3})(?:-[Ee]?(\d{1,3}))?")
RE_NXNN = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{1,3})(?:-(\d{1,3}))?")
RE_VERBOSE = re.compile(r"Season\s*(\d{1,2})\s*Episode\s*(\d{1,3})", re.I)

# try the canonical form first, then the alternative forms before giving up
m = (RE_SEASON_EPISODE.search(basename)
     or RE_NXNN.search(basename)
     or RE_VERBOSE.search(basename))
if m is None:
    raise ValueError(f"no season/episode marker in {basename!r}")

season = f"{int(m.group(1)):02d}"
episode = f"{int(m.group(2)):02d}"
# the "Season N Episode M" form has no third (range-end) group
ep_end = m.group(3) if m.re.groups >= 3 else None
episode_end = f"{int(ep_end):02d}" if ep_end else None
```
If no S/E found and `--type tv|anime-seasonal`, error out — the file can
only be normalized if season/episode are recoverable.
### 3.4 Step 3 — Extract episode title
After step 2, the end of the matched span is the left boundary. The episode
title is the text **between** the end of the `S<NN>E<MM>` match and the
**first** of: `[`, `(`, a trailing group-tag delimiter, or end-of-string.
```python
after_se = basename[m.end():]
# cut at the first bracket, or at a trailing " - GROUP" tag
title_part = re.split(r"[\[\(]|\s-\s[A-Z][A-Z0-9]+$", after_se, maxsplit=1)[0]
# strip leading and trailing separators
title_part = title_part.strip(" -._")
```
If the title part is empty after stripping, leave it empty: the script
emits no trailing title, and `Show (Year) - S01E01.mkv` is still canonical
when no title is known.
### 3.5 Step 4 — Extract series / movie title (from parent folder)
The **parent folder name** is the source of truth for series/movie title,
not the filename, because torrents commonly have inconsistent
filename-prefixes within the same folder (`Show.S01E01.x264.mkv` vs
`Show Title - S01E02.mkv`).
```python
parent = os.path.basename(os.path.dirname(source_path))
# strip group tags and quality from the parent folder too
clean_parent = strip_noise(parent)
# extract year if present
year_match = re.search(r"\((\d{4})\)", clean_parent)
year = year_match.group(1) if year_match else None
title = re.sub(r"\s*\(\d{4}\).*$", "", clean_parent).strip()
```
Edge case: parent folder is `Season 01` (TV) — recurse one more level up
to the show folder. The script handles N levels of `Season \d+` parents.
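
The climb past season folders could be sketched like this. The helper name `find_title_folder` and the exact season-folder regex are assumptions for illustration; the real script may recognise more variants.

```python
import os
import re

# Matches "Season 01", "Season1", "Season.01", "SEASON 1 [tags]" and similar.
RE_SEASON_DIR = re.compile(r"^season[ ._]*\d+\b", re.IGNORECASE)


def find_title_folder(source_path: str) -> str:
    """Return the name of the nearest ancestor folder that is not a season folder."""
    folder = os.path.dirname(source_path)
    while folder and RE_SEASON_DIR.match(os.path.basename(folder)):
        parent = os.path.dirname(folder)
        if parent == folder:  # reached the filesystem root
            break
        folder = parent
    return os.path.basename(folder)
```

The loop handles any number of nested season folders, and the root check prevents an endless climb on malformed paths.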
### 3.6 Step 5 — Detect edition tokens (Movies only)
After § 2.3 strips edition tags from the noise pool, scan the **original**
basename for canonical edition keywords:
```python
# insertion order doubles as match priority (dicts keep it in Python 3.7+)
EDITIONS = {
    r"director'?s?[\.\s_-]*cut": "Director's Cut",
    r"final[\.\s_-]*cut": "Final Cut",
    r"extended[\.\s_-]*(?:cut|edition)?": "Extended",
    r"theatrical(?:[\.\s_-]*cut)?": "Theatrical",
    r"\bimax\b": "IMAX",  # word-bounded so "Climax" does not match
    r"unrated": "Unrated",
    r"remaster(?:ed)?": "Remastered",
    r"\bDC\b": "Director's Cut",  # DC shorthand
    r"\bEE\b": "Extended",  # EE shorthand
}
```
Match the first one found, in priority order: Director's Cut, then Final
Cut, Extended, Theatrical, IMAX, Unrated, Remastered. Emit as
` - <Edition>` between title-year block and extension.
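
A first-match-wins lookup over such patterns can be sketched as follows. The list here is a shortened, hypothetical excerpt ordered by the priority described above, and `detect_edition` is an illustrative name rather than the script's actual function.

```python
import re
from typing import Optional

# (pattern, label) pairs in priority order; excerpt for illustration.
EDITION_PATTERNS = [
    (r"director'?s?[\.\s_-]*cut", "Director's Cut"),
    (r"final[\.\s_-]*cut", "Final Cut"),
    (r"extended[\.\s_-]*(?:cut|edition)?", "Extended"),
    (r"theatrical(?:[\.\s_-]*cut)?", "Theatrical"),
    (r"unrated", "Unrated"),
]


def detect_edition(basename: str) -> Optional[str]:
    """Return the highest-priority edition label found in the basename."""
    for pattern, label in EDITION_PATTERNS:
        if re.search(pattern, basename, re.IGNORECASE):
            return label
    return None
```

Because the loop walks the list in order, a name carrying both `Extended` and `Directors.Cut` resolves to `Director's Cut`, regardless of where each token sits in the filename.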
### 3.7 Step 6 — Collapse, trim, re-emit canonical
```python
def emit_canonical(schema, parts):
    if schema == "movie":
        if parts.edition:
            return f"{parts.title} ({parts.year}) - {parts.edition}.{parts.ext}"
        return f"{parts.title} ({parts.year}).{parts.ext}"
    if schema == "tv" or schema == "anime-seasonal":
        ep_range = f"S{parts.season}E{parts.episode}"
        if parts.episode_end:
            ep_range += f"-E{parts.episode_end}"
        if parts.episode_title:
            return f"{parts.title} ({parts.year}) - {ep_range} - {parts.episode_title}.{parts.ext}"
        return f"{parts.title} ({parts.year}) - {ep_range}.{parts.ext}"
    if schema == "anime-absolute":
        suffix = f" [{parts.subdub}]" if parts.subdub else ""
        return f"{parts.title} - {parts.absolute_number} - {parts.episode_title}{suffix}.{parts.ext}"
    if schema == "musicvideo":
        variant = f" [{parts.variant}]" if parts.variant else ""
        return f"{parts.year} - {parts.track_title}{variant}.{parts.ext}"
    if schema == "standup":
        return f"{parts.performer} - {parts.title} ({parts.year}).{parts.ext}"
    # any schema not handled above is a caller error
    raise ValueError(f"unsupported schema: {schema!r}")
After emission, run § 5.5 forbidden-character substitution, then § 5.6
double-space collapse, one final time.
---
## 4. Folder normalization
The same rules as filenames, applied to directory names, with a few
schema-specific adjustments.
### 4.1 Show folder — `<Show> (<Year>)`
```
Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/ → Futurama (1999)/
The Office US S01-S09 1080p WEB-DL/ → The Office (2005)/
[YIFY] Inception 2010 1080p BRRip x264/ → Inception (2010)/ ← but this is movies
Cowboy.Bebop.1998.Complete.BluRay.x265.10bit/ → Cowboy Bebop (1998)/
```
Year: derived from the metadata provider (TVDB/TMDB) on first scrape, or
from the user-supplied `--year` flag. If neither is available,
`normalize.py --type tv` errors out and asks for `--year`. Year guessing
from parent-folder-numbers is unsafe (`Star Trek 2009` is the movie, not
the series).
### 4.2 Season folder — `Season <NN>`
```
Season 1/ → Season 01/
Season1/ → Season 01/
Season.01/ → Season 01/
S01/ → Season 01/
SEASON 1 [1080p WEB Joy]/ → Season 01/
Season 01 - Pilot Season/ → Season 01/ ← drop subtitle suffixes
Season 01 [BluRay]/ → Season 01/
Specials/ → Season 00/
Season 0/ → Season 00/
Extras/ → Season 00/ ← only if treated-as-specials
```
Doc 05 § 2.3 is explicit: `Specials/`, `Season 0/`, `Season Specials/` do
not match the parser. `Season 00` is the only correct form.
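
The season-folder mappings above can be sketched as a single function. This is illustrative only: `normalize_season_folder` is a hypothetical name, and `Extras/` is deliberately rejected here because the table treats it as a specials folder only by explicit choice.

```python
import re


def normalize_season_folder(name: str) -> str:
    """Map a source season-folder name to the canonical 'Season NN' form."""
    if re.match(r"^\s*specials\b", name, re.IGNORECASE):
        return "Season 00"
    # "Season 1", "Season1", "Season.01", "S01", optionally followed by tags.
    m = re.match(r"^\s*(?:season[ ._-]*|s)(\d{1,2})\b", name, re.IGNORECASE)
    if not m:
        raise ValueError(f"not a recognisable season folder: {name!r}")
    return f"Season {int(m.group(1)):02d}"
```

The trailing `\b` keeps the pattern from misreading an episode filename such as `S01E01...` as a season folder.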
### 4.3 Movie folder — `<Title> (<Year>)`
Same rules as the filename without the extension. The folder name MUST
byte-for-byte match the filename prefix when multi-version files are
present (doc 05 § 1.2 — Jellyfin requires this).
```
[YIFY] Blade Runner 1982 1080p BRRip x264 AAC-RARBG/ → Blade Runner (1982)/
Blade.Runner.2049.2017.2160p.UHD.BluRay.x265.10bit.HDR.DV.DTS-HD.MA.7.1-FreetheFish/
→ Blade Runner 2049 (2017)/
```
### 4.4 Music-video artist folder — `<Artist>`
```
Daft.Punk/ → Daft Punk/
[Daft Punk]/ → Daft Punk/
DAFT PUNK Discography/ → Daft Punk/ ← note: "Discography" is dropped; this is video lib not music
```
### 4.5 Special-features subfolders
Inside an item folder, only these subfolder names are recognised by
Jellyfin (doc 05 § 8.2). The normalizer must rename source folders to
the canonical lowercase form:
```
BTS/ → behind the scenes/
Behind-the-Scenes/ → behind the scenes/
behind_the_scenes/ → behind the scenes/
Featurettes/ → featurettes/
DELETED SCENES [Joy]/ → deleted scenes/
Trailers/ → trailers/
Interviews/ → interviews/
Bonus Content/ → extras/ ← catch-all
Bonus_Features/ → extras/
```
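One way to implement the mapping above is a normalise-then-look-up step. The sketch below covers only the target names shown in the table; the alias dictionary and the function name are hypothetical.

```python
import re

# Canonical targets from the table above.
CANONICAL = {"behind the scenes", "featurettes", "deleted scenes",
             "trailers", "interviews", "extras"}
# Source spellings that do not reduce to a canonical name on their own.
ALIASES = {"bts": "behind the scenes",
           "bonus content": "extras",
           "bonus features": "extras"}


def normalize_extras_folder(name: str) -> str:
    key = re.sub(r"\[[^\]]*\]", " ", name)           # drop bracketed group tags
    key = re.sub(r"[-_.\s]+", " ", key).strip().lower()
    key = ALIASES.get(key, key)
    if key not in CANONICAL:
        raise ValueError(f"unknown special-features folder: {name!r}")
    return key
```

Raising on unknown names, rather than silently mapping them to `extras`, keeps unexpected folders visible to whoever runs the import.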
**Files inside featurettes/ etc.** keep human-readable titles but get
their group tags stripped:
```
Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv
→ featurettes/Welcome to the World of Tomorrow.mkv
```
Casing inside the special-features file *itself* uses smart title case
(§ 5.1).
---
## 5. Case + character handling
### 5.1 Smart title case
Capitalize every word EXCEPT these "small words" (when not the first or
last word of the title):
```
a, an, and, as, at, but, by, for, from, in, into, nor, of, on, or, the,
to, up, vs, vs., via, with, yet
```
Words that look like acronyms (`I.B.M.`, `C.I.A.`, `T.M.N.T.`) are
preserved as-is. Roman numerals (`II`, `III`, `IV`, `IX`) are uppercased.
#### Examples
```
the lord of the rings the two towers → The Lord of the Rings the Two Towers ← BAD
the lord of the rings: the two towers → The Lord of the Rings - The Two Towers ← GOOD (`:` → ` - `, the second `the` is at start of subtitle, capitalize)
return of the king → Return of the King
star trek ii the wrath of khan → Star Trek II - The Wrath of Khan
```
The subtitle-after-colon special case is important: when a `: ` is
substituted with ` - `, the word after the dash is a new "first word" for
title-casing purposes. The script handles this by re-running the
title-caser on each ` - ` separated chunk.
Jellyfin's parser is case-insensitive — this is purely for human readers.
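
A minimal sketch of the per-chunk title-casing described above follows. Several details are assumptions: the Roman-numeral set is a small explicit list (single letters such as `i` and `v` are excluded as ambiguous), only the first letter of each word is changed so mixed-case words survive, and the input is expected to already use ` - ` where a colon used to be.

```python
import re

SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "from", "in",
               "into", "nor", "of", "on", "or", "the", "to", "up", "vs", "vs.",
               "via", "with", "yet"}
ROMAN = {"ii", "iii", "iv", "vi", "vii", "viii", "ix"}  # assumed subset
RE_ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")  # e.g. C.I.A.


def _title_case_chunk(chunk: str) -> str:
    words = chunk.split()
    out = []
    for i, word in enumerate(words):
        lower = word.lower()
        if RE_ACRONYM.match(word):
            out.append(word)                      # preserved as-is
        elif lower in ROMAN:
            out.append(word.upper())
        elif 0 < i < len(words) - 1 and lower in SMALL_WORDS:
            out.append(lower)                     # small word mid-title
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


def smart_title_case(title: str) -> str:
    # Each " - " separated chunk gets its own first/last-word treatment.
    return " - ".join(_title_case_chunk(c) for c in title.split(" - "))
```

Splitting on ` - ` before casing is what makes the word after a subtitle separator count as a new first word.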
### 5.2 Hyphen / dash normalization
| Char | Code | Used for |
|---|---|---|
| `-` | U+002D HYPHEN-MINUS | ASCII hyphen, the only canonical form for filenames |
| `–` | U+2013 EN DASH | Forbidden in filenames; replace with `-` |
| `—` | U+2014 EM DASH | Forbidden; replace with `-` |
| `−` | U+2212 MINUS SIGN | Forbidden; replace with `-` |
Unicode dashes appear from copy-paste of articles (Wikipedia loves the en
dash). They're invisible-ish in `ls`, but they break grep, shell
completion, and SMB transfers.
```
Spider–Man (2002).mkv → Spider-Man (2002).mkv
Spider — Man (2002).mkv → Spider - Man (2002).mkv
```
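The dash table maps directly onto a character translation table. A minimal sketch, with a hypothetical function name:

```python
# Unicode dashes from the table above, all mapped to ASCII hyphen-minus.
DASH_MAP = str.maketrans({
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
})


def normalize_dashes(name: str) -> str:
    return name.translate(DASH_MAP)
```

Writing the characters as `\u` escapes keeps the source file itself ASCII-only, which avoids reintroducing the very characters being normalised.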
### 5.3 Apostrophes / quotes
| Char | Code | Status |
|---|---|---|
| `'` | U+0027 APOSTROPHE | Canonical; ASCII straight quote |
| `’` | U+2019 RIGHT SINGLE QUOTATION MARK | Forbidden in filenames; replace with `'` |
| `‘` | U+2018 LEFT SINGLE QUOTATION MARK | Forbidden; replace with `'` |
| `"` | U+0022 QUOTATION MARK | Forbidden in filenames (Windows-illegal); strip entirely |
| `“` | U+201C LEFT DOUBLE QUOTATION MARK | Forbidden; strip |
| `”` | U+201D RIGHT DOUBLE QUOTATION MARK | Forbidden; strip |
Curly quotes break SMB shares (Windows clients see `?` and refuse to open
the file) and break shell escaping in scripts.
```
Don't Stop Believin'.mkv ← GOOD
Don’t Stop Believin’.mkv ← BAD (curly), normalize to straight
"It's a Wonderful Life" (1946).mkv ← BAD (double quotes), strip them entirely:
It's a Wonderful Life (1946).mkv ← GOOD
```
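The quote table can be expressed the same way, with `None` as the translation target for characters that are stripped. A minimal sketch, with a hypothetical function name:

```python
QUOTE_MAP = str.maketrans({
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK -> straight apostrophe
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK  -> straight apostrophe
    '"': None,       # straight double quote: strip
    "\u201c": None,  # LEFT DOUBLE QUOTATION MARK: strip
    "\u201d": None,  # RIGHT DOUBLE QUOTATION MARK: strip
})


def normalize_quotes(name: str) -> str:
    return name.translate(QUOTE_MAP)
```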
### 5.4 Diacritics / non-ASCII
`ext4` is UTF-8 native; Jellyfin's parser is UTF-8 native; the HTTP API
serves UTF-8 happily. **Keep diacritics** when the title's accepted
spelling uses them.
```
Amélie (2001)/Amélie (2001).mkv ← GOOD
Pokémon (1997)/Season 01/Pokémon (1997) - S01E01 - Pokémon - I Choose You!.mkv ← GOOD
Léon - The Professional (1994)/Léon - The Professional (1994).mkv ← GOOD
```
Doc 05 § 0 rule 4 advises caution: prefer the ASCII title when "well
known" (e.g. `Amelie (2001)` over `Amélie (2001)`). For this deploy with
LAN-only HTTP and `ext4`, full Unicode is safe — but the rule of thumb
remains: if Wikipedia's English page uses the accent, keep it; if not,
drop it.
**Tested:** Jellyfin's filename matching, `Items?searchTerm=`, and NFO
`<title>` round-trip correctly with `é`, `ñ`, `ü`, `ß`, `ø`, `ł`, `ż`,
`日`, `한` on this deploy. Verified against the Futurama Polish-dubbed
corpus.
### 5.5 Forbidden-char substitution table
Windows-illegal: `< > : " / \ | ? *`. Linux forbids only `/` and
NUL. Substitute as follows:
| Char | Substitute | Rationale |
|---|---|---|
| `:` | ` - ` (space-hyphen-space) | Most common in titles (`Star Trek II: The Wrath of Khan`); ` - ` is a clean replacement that title-casing handles |
| `/` | ` and ` | Appears mostly in episode-title lists for two-part eps. Titles like `Mr. & Mrs. Smith` use `&`, not `/`, and are unaffected. Avoid if both halves stand on their own. |
| `\` | omit | No legitimate use in titles |
| `<` | `(` | Rare; `<` in titles is parenthetical |
| `>` | `)` | Same |
| `\|` | omit (or `-`) | Rare; sometimes in `Tom \| Jerry` style logo-text |
| `?` | omit | Common in `Who Killed the Robber?` — drop the question mark, keep meaning |
| `*` | omit | Rare; usually censored profanity |
| `"` | omit | Per § 5.3 |
| `\0` (NUL) | error | Filesystem hard-block; surface to user |
#### Examples
```
Star Trek II: The Wrath of Khan (1982) → Star Trek II - The Wrath of Khan (1982)
Mr. & Mrs. Smith (2005) → Mr. & Mrs. Smith (2005) (no change; & is fine)
Who Killed the Robber? (1987) → Who Killed the Robber (1987)
Tom & Jerry: The Movie (1992) → Tom & Jerry - The Movie (1992)
```
### 5.6 Whitespace canonicalization
After all substitutions:
1. Collapse runs of `\s+` to a single space.
2. `strip()` leading/trailing whitespace.
3. Collapse double-`-` (which can result from `Title -- Subtitle`) to
single `-`.
4. Trim trailing punctuation before extension: `Title -.mkv` → `Title.mkv`.
---
## 6. Year disambiguation — concrete examples
Jellyfin's TMDB/TVDB scrape uses the year in `(YYYY)` to filter
candidates. With multiple titles of the same name, the year is the *only*
disambiguator before falling back to provider IDs.
### 6.1 Without year — what goes wrong
Filename: `Cinderella.mkv` (no year, no folder year).
Jellyfin sends "Cinderella" to TMDB. TMDB returns 12+ matches:
- Cinderella (1950) — Disney animated
- Cinderella (2015) — Disney live action
- Cinderella (2021) — Camila Cabello musical
- Cinderella (1965) — TV special
- Cinderella (1899) — Méliès short
Jellyfin picks the one with the highest popularity score, which is the
2015 live-action remake. If you wanted 1950, you have to manually edit.
### 6.2 With year — clean match
Filename: `Cinderella (1950).mkv` in folder `Cinderella (1950)/`.
Jellyfin sends `(title=Cinderella, year=1950)` to TMDB. TMDB returns the
1950 animated film as the top match with high confidence. Scrape
succeeds first try.
```
Movies/Cinderella (1950)/Cinderella (1950).mkv ← TMDB ID 11224 (animated)
Movies/Cinderella (2015)/Cinderella (2015).mkv ← TMDB ID 150689 (live action)
Movies/Cinderella (2021)/Cinderella (2021).mkv ← TMDB ID 587996 (musical)
```
### 6.3 Same year — provider ID required
Filename: `Bad Movie (1980).mkv`. Two films named "Bad Movie" released in
1980 (hypothetical). Year doesn't disambiguate. Add provider ID:
```
Movies/Bad Movie (1980) [imdbid-tt0080000]/Bad Movie (1980) [imdbid-tt0080000].mkv
Movies/Bad Movie (1980) [imdbid-tt0080001]/Bad Movie (1980) [imdbid-tt0080001].mkv
```
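To make the folder convention concrete, here is a small sketch that splits a canonical folder name into title, year, and optional provider ID. The pattern and the names `FOLDER_RE`, `FolderInfo`, and `parse_folder` are assumptions made for illustration; this is not how Jellyfin itself parses names.

```python
import re
from typing import NamedTuple, Optional

# Illustrative pattern: "Title (YYYY)" with an optional " [imdbid-...]" tag.
FOLDER_RE = re.compile(
    r"^(?P<title>.+?) \((?P<year>\d{4})\)"
    r"(?: \[(?P<provider>imdbid|tmdbid|tvdbid)-(?P<id>[^\]]+)\])?$"
)


class FolderInfo(NamedTuple):
    title: str
    year: int
    provider: Optional[str]
    provider_id: Optional[str]


def parse_folder(name: str) -> Optional[FolderInfo]:
    """Split a canonical folder name; return None if it is not canonical."""
    m = FOLDER_RE.match(name)
    if m is None:
        return None
    return FolderInfo(m["title"], int(m["year"]), m["provider"], m["id"])
```

A folder without a year, such as `Cinderella`, yields `None`, which is a cheap way to flag names that would leave the scraper guessing.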
### 6.4 Year on TV shows
The same logic applies to series:
```
TV/The Office (2001)/... ← UK original, BBC
TV/The Office (2005)/... ← US remake, NBC
```
Without year, Jellyfin picks one (usually the US one, higher TMDB
popularity). With year, both work side-by-side.
---
## 7. Multi-version handling
When a single movie has multiple legitimate cuts (Director's Cut, Theatrical,
Extended), or multiple resolutions (2160p HDR + 1080p SDR), Jellyfin groups
them under one item with a "Version" picker in the UI.
### 7.1 Edition variants
```
Movies/Blade Runner (1982)/
├── Blade Runner (1982).mkv ← default (whichever is "the" version)
├── Blade Runner (1982) - Director's Cut.mkv
├── Blade Runner (1982) - Final Cut.mkv
└── Blade Runner (1982) - Theatrical.mkv
```
Jellyfin reads all four files, hashes them, and creates one library item
"Blade Runner (1982)" with four selectable versions. The unlabelled one
shows as "Default".
### 7.2 Resolution variants
```
Movies/Blade Runner 2049 (2017)/
├── Blade Runner 2049 (2017) - 2160p.mkv
├── Blade Runner 2049 (2017) - 1080p.mkv
└── Blade Runner 2049 (2017) - 720p.mkv
```
Resolution labels ending in `p` or `i` sort descending by quality, so the
2160p version is offered first. This is the *only* exception to "no
resolution tags in filenames" (§ 1.1).
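The ordering described above can be approximated as follows. This is only a sketch of the idea, not Jellyfin's code: the names `resolution_key` and `sort_versions` are hypothetical, and two choices are assumptions of this example (progressive ranks above interlaced at the same height, and labels that are not resolutions sort last).

```python
import re

RES_LABEL = re.compile(r"^(\d{3,4})([pi])$")


def resolution_key(label: str) -> tuple[int, int]:
    """Sort key: (height, 1 for progressive / 0 for interlaced)."""
    m = RES_LABEL.match(label)
    if m is None:
        return (0, 0)  # assumption: non-resolution labels sort last
    return (int(m.group(1)), 1 if m.group(2) == "p" else 0)


def sort_versions(labels: list[str]) -> list[str]:
    """Return labels ordered from highest to lowest resolution."""
    return sorted(labels, key=resolution_key, reverse=True)
```

For the folder above, `sort_versions(["720p", "2160p", "1080p"])` puts `2160p` first.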
### 7.3 Mixed (edition × resolution)
```
Movies/Blade Runner 2049 (2017)/
├── Blade Runner 2049 (2017) - Theatrical 2160p.mkv
├── Blade Runner 2049 (2017) - Theatrical 1080p.mkv
├── Blade Runner 2049 (2017) - Director's Cut 2160p.mkv
└── Blade Runner 2049 (2017) - Director's Cut 1080p.mkv
```
This works in Jellyfin 10.10 — all four are grouped, the picker is a
flat list with all four labels visible. Slight UX ugliness but parses
cleanly. Avoid unless you genuinely have both axes of variation.
### 7.4 What does NOT work
- Sub-folders for variants:
```
Movies/Blade Runner 2049 (2017)/Theatrical/Blade Runner 2049 (2017).mkv ← BREAKS
```
Jellyfin treats `Theatrical/` as an unrecognised extras subfolder and
ignores the mkv inside it.
- Different folder per cut:
```
Movies/Blade Runner 2049 (2017) Theatrical/Blade Runner 2049 (2017).mkv
Movies/Blade Runner 2049 (2017) Director's Cut/Blade Runner 2049 (2017).mkv
```
This makes them two separate library items, not grouped versions.
- Suffix without space-hyphen-space:
```
Blade Runner 2049 (2017).Theatrical.mkv ← BREAKS (no ` - ` separator)
Blade Runner 2049 (2017)-Theatrical.mkv ← BREAKS (no spaces around `-`)
```
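The rules in § 7.1 to § 7.4 reduce to one check: a file is grouped only if its stem equals the folder name, or the folder name followed by ` - Label`. A minimal sketch of that check, assuming a hypothetical helper `version_label()`; it mirrors the rules in this section and is not Jellyfin's parser:

```python
from pathlib import PurePosixPath
from typing import Optional


def version_label(path: str) -> Optional[str]:
    """Return "" for the default version, the label for a ` - Label`
    variant, or None when the file would not be grouped."""
    p = PurePosixPath(path)
    folder = p.parent.name
    stem = p.stem
    if stem == folder:
        return ""
    prefix = folder + " - "
    if stem.startswith(prefix) and len(stem) > len(prefix):
        return stem[len(prefix):]
    return None
```

Running this over an item folder before import catches the dot-separated and sub-folder layouts listed above, since both return `None`.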
---
## 8. Special-features filename rules
Files inside the recognised subfolders (`featurettes/`, `behind the
scenes/`, `deleted scenes/`, `interviews/`, `trailers/`, etc.) follow
these rules:
1. **Strip group tags** as in § 2.1.
2. **Strip quality / codec / source / audio tags** as in § 2.3.
3. **Smart title case** as in § 5.1.
4. **Forbidden chars substituted** as in § 5.5.
5. **Filename = the human-readable feature title.** No `(year)`, no
`S01E01`. The parent folder type (e.g. `featurettes/`) is the type
marker.
6. Optional: append `-featurette` (or `-trailer`, `-behindthescenes`,
etc.) suffix to be defensive about scraper edge cases. Doc 05 § 8.1
shows this works AND § 8.2 shows the folder method works — using both
is belt-and-braces.
#### Example
```
Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv
  → featurettes/Welcome to the World of Tomorrow.mkv
```
Or, if you want belt-and-braces:
```
featurettes/Welcome to the World of Tomorrow-featurette.mkv
```
Both parse. Pick **one** style per library and keep it consistent.
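As a rough illustration of rules 1, 2, 5 and 6, the sketch below cleans one extras path. It is deliberately simplified and every name in it is hypothetical: it removes every `[...]` tag rather than applying the § 2 patterns, and it skips the title-case step from § 5.1.

```python
import re
from pathlib import PurePosixPath

BRACKET_TAG = re.compile(r"\s*\[[^\[\]]*\]")


def normalize_extra(path: str, suffix: str = "") -> str:
    """Lowercase the extras folder, drop bracket tags, optionally add a
    type suffix such as "-featurette"."""
    p = PurePosixPath(path)
    title = BRACKET_TAG.sub("", p.stem)
    title = re.sub(r"\s+", " ", title).strip()
    folder = p.parent.name.lower()
    return str(PurePosixPath(folder) / f"{title}{suffix}{p.suffix}")
```

Passing `suffix="-featurette"` produces the belt-and-braces form; leaving it empty produces the folder-only form.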
---
## 9. Worked example — the live Futurama import
This is the example the owner asked for. Verified against the live media
tree on nullstone (`/home/user/media/tv/Futurama/Season 01,02,03/`).
### 9.1 BEFORE (representative source dump)
```
/home/admin/Downloads/futrama/
└── Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/
    ├── Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
    ├── Futurama S01E02 The Series Has Landed [1080p x265 10bit Joy].mkv
    ├── Futurama S01E03 I, Roommate [1080p x265 10bit Joy].mkv
    ├── Futurama S01E04 Love's Labours Lost in Space [1080p x265 10bit Joy].mkv
    ├── Futurama S01E05 Fear of a Bot Planet [1080p x265 10bit Joy].mkv
    ├── Futurama S01E06 A Fishful of Dollars [1080p x265 10bit Joy].mkv
    ├── Futurama S01E07 My Three Suns [1080p x265 10bit Joy].mkv
    ├── Futurama S01E08 A Big Piece of Garbage [1080p x265 10bit Joy].mkv
    ├── Futurama S01E09 Hell Is Other Robots [1080p x265 10bit Joy].mkv
    └── Featurettes/
        └── Welcome to the World of Tomorrow [1080p Joy].mkv
```
Note: doubled-space is real (`Futurama S01E01 Space Pilot 3000 [1080p`).
The rip is from a release group called "Joy" using "FS99" (FastSub
99); "AI" likely means AI-upscaled. None of that is library-relevant.
### 9.2 AFTER (canonical layout)
```
/home/user/media/tv/
└── Futurama (1999)/
    ├── Season 01/
    │   ├── Futurama (1999) - S01E01 - Space Pilot 3000.mkv
    │   ├── Futurama (1999) - S01E02 - The Series Has Landed.mkv
    │   ├── Futurama (1999) - S01E03 - I, Roommate.mkv
    │   ├── Futurama (1999) - S01E04 - Love's Labours Lost in Space.mkv
    │   ├── Futurama (1999) - S01E05 - Fear of a Bot Planet.mkv
    │   ├── Futurama (1999) - S01E06 - A Fishful of Dollars.mkv
    │   ├── Futurama (1999) - S01E07 - My Three Suns.mkv
    │   ├── Futurama (1999) - S01E08 - A Big Piece of Garbage.mkv
    │   └── Futurama (1999) - S01E09 - Hell Is Other Robots.mkv
    └── featurettes/
        └── Welcome to the World of Tomorrow.mkv
```
### 9.3 Per-file rename mapping
| Before | After |
|---|---|
| `Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/` | `Futurama (1999)/Season 01/` |
| `Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E01 - Space Pilot 3000.mkv` |
| `Futurama S01E02 The Series Has Landed [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E02 - The Series Has Landed.mkv` |
| `Futurama S01E04 Love's Labours Lost in Space [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E04 - Love's Labours Lost in Space.mkv` |
| `Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv` | `featurettes/Welcome to the World of Tomorrow.mkv` |
Notes on specific titles:
- `I, Roommate` keeps the comma. Comma is legal on `ext4`, on Windows,
and on every modern SMB client. No need to substitute.
- `Love's Labours Lost in Space` keeps the straight ASCII apostrophe.
If the source had a curly `’`, § 5.3 normalizes it.
- `Hell Is Other Robots`: `Is` is capitalized (it's not in the small-words
list; the small-words list excludes `is`/`be`/`am`/`are`).
### 9.4 What the live tree currently has
Verified via `ssh user@192.168.0.100 'ls /home/user/media/tv/Futurama/'`:
```
Season 01
Season 02
Season 03
```
The current live deploy uses folder name `Futurama/` (no year) — that's
non-canonical per this doc. The canonical is `Futurama (1999)/`. This is
covered in doc 07's migration plan (rename the folder, then `POST
/Library/Refresh`). Mentioned here as a known drift; not fixed in this
doc.
---
## 10. Idempotency and safety
The `normalize.py` script in § 11 enforces these:
1. **No-op on already-canonical input.** When the script's emitted
filename equals the source filename byte-for-byte, it does nothing
and returns exit code 0. Re-running the script on an already-imported
library is safe and free.
2. **No overwrite without `--force`.** When the target path exists and
is not the source path, the script refuses to move and returns exit
code 2. With `--force`, it moves and the target is overwritten.
The § 11 script does not prompt or pick an alternative name; resolving
the collision, for example with a numeric suffix
(`Title (Year) (1).mkv`), is left to the operator.
3. **Default to dry-run.** The script prints what it would do to stdout
and does NOT touch the filesystem unless `--apply` is passed. This is
the inverse of the GNU convention (most tools default to apply,
require `--dry-run` to preview) — chosen because the destructive
case (a wrong rename of 100 files) is much worse than the boring
case (one extra flag).
4. **Audit log** at `/var/log/jellyfin-imports/<YYYY-MM-DD>.log`. Every
`--apply` run appends:
```
2026-05-08T14:23:11Z RENAME /home/admin/.../Futurama S01E01 ...joy].mkv -> /home/user/media/tv/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
```
Path is created (`mkdir -p /var/log/jellyfin-imports`) on first run if
missing; user must have write permission.
5. **No deletes.** The script *moves* via `shutil.move` (an `os.rename` on
the same FS, copy-then-unlink across FS). It never removes anything else.
Garbage collection of source folders (after all files moved) is doc 07's job.
6. **Atomic per-file.** Each file's rename is one syscall on the same FS;
on a different FS, `shutil.move` does copy-then-unlink which has a
brief window where both source and target exist. The audit log records
the operation regardless.
7. **Unicode-safe.** All paths handled as `pathlib.Path` (UTF-8 native on
`ext4`). Curly-quote → straight-quote substitution happens BEFORE the
target path is computed, so the target path is always ASCII-safe-ish
(still UTF-8 for legitimate accents).
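Item 2 mentions a numeric suffix (`Title (Year) (1).mkv`) as the way out of a name collision. A minimal sketch of how such a name could be chosen, assuming a hypothetical helper `next_free_name()` that is not part of the § 11 script:

```python
from pathlib import Path


def next_free_name(target: Path, limit: int = 99) -> Path:
    """Return target if unused, else the first free 'stem (N).ext' sibling."""
    if not target.exists():
        return target
    for n in range(1, limit + 1):
        candidate = target.with_name(f"{target.stem} ({n}){target.suffix}")
        if not candidate.exists():
            return candidate
    raise FileExistsError(f"no free name for {target} within {limit} tries")
```

The helper only proposes a path; it touches nothing on disk, which keeps it compatible with the dry-run default in item 3. Whether a suffixed file should then enter the library or go to manual review is a policy decision this sketch does not make.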
---
## 11. Reference implementation — `normalize.py`
Drop this at `/opt/docker/jellyfin/scripts/normalize.py` on nullstone.
Run with Python 3.10+. Stdlib only — no external deps.
```python
#!/usr/bin/env python3
"""
normalize.py: canonical filename normalizer for nasflix.s8n.ru

Per /tmp/NASFLIX/docs/08-filename-normalization.md.
Safe by default: dry-run, no overwrite, no delete.
"""
from __future__ import annotations

import argparse
import datetime as dt
import os
import re
import shutil
import sys
import unicodedata
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

LOG_DIR = Path("/var/log/jellyfin-imports")

# --- Stripping rules (doc § 2) -------------------------------------------------
GROUP_TAG_PATTERNS = [
    re.compile(r"\[[^\[\]]*\b(YIFY|YTS(\.\w+)?|RARBG|ettv|eztv|GalaxyRG\d*|"
               r"FitGirl|FitGirl\s*Repack|NOGRP|QxR|FreetheFish|psa|PSA|CMRG|"
               r"d3g|STRiFE|Pahe\.in|FoV|NTb|YOLO|KOGi|playWEB|REQ|XBET|FLUX|"
               r"NOSiVID|BGT|SVA|CRiMSON|ION10|ION265|BluPanda|H4S5S|Joy|"
               r"FS99\s*Joy|FS99|AI\s*x265|x265\s*\d+bit|\d+bit\s*x265)"
               r"[^\[\]]*\]", re.I),
    re.compile(r"\((YIFY|RARBG|NOGRP)\)", re.I),
]
QUALITY_TOKENS = re.compile(
    r"(?<![A-Za-z0-9])("
    r"2160p|1080p|720p|480p|360p|4[Kk]|UHD|HD|SD|FHD|QHD|"
    r"WEB-DL|WEBDL|WEB\.DL|WEB|WEBRip|WEB-Rip|BluRay|BLURAY|Bluray|BDRip|"
    r"BRRip|BR-Rip|BDR|HDTV|HDTVRip|PDTV|DSR|DVDRip|DVD|DVDR|DVD9|DVD5|"
    r"HDDVD|HDDVDRip|HDRip|CAMRip|CAM|TS|HDTS|TC|TELESYNC|TELECINE|R5|"
    r"SCREENER|SCR|WORKPRINT|WP|PPV|PPVRip|"
    r"x264|x265|H\.?264|H\.?265|HEVC|AVC|VP9|AV1|XviD|DivX|"
    r"10bit|10-bit|8bit|8-bit|HDR10\+?|HDR|DV|Dolby\.?Vision|SDR|HFR|HQ|"
    r"DDP?5\.1|DDP?7\.1|DDP?2\.0|DD\+5\.1|DD\+7\.1|DTS-HD\.MA|DTS-HD|DTS-X|"
    r"DTSX|DTS|TrueHD|Atmos|AAC2\.0|AAC5\.1|AAC|AC3|AC-3|EAC3|E-AC3|"
    r"MP3|MP2|Opus|FLAC|PCM|LPCM|5\.1|7\.1|2\.0|Mono|Stereo|Multi|"
    r"PROPER|REPACK|iNTERNAL|INTERNAL|LIMITED|UNCUT|RERIP|REAL|READNFO|"
    r"RETAi?L|STV|REMUX|MULTi|MULTI|SUBBED|DUBBED|iNT"
    r")(?![A-Za-z0-9])", re.I)
URL_REF = re.compile(
    r"(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)",
    re.I)
TRAILING_GROUP = re.compile(r"-(?:NOGRP|EVO|RARBG|SPARKS|CMRG|NTb|FLUX|AMZN|"
                            r"NF|DSNP|ATVP|MA|WEB|AAC2|FoV|KOGi|PLAYWEB|FRDS|"
                            r"ZQ|PHOENiX|EZTV|NTG|iON|ION10|ION265|CtrlHD|"
                            r"d3g|PSA|QxR|RZeroX|PMP|BTN|DEFLATE|BAE|MZABI|"
                            r"TURG|Joy)\b", re.I)
LANG_TOKEN = re.compile(r"(?<![A-Za-z])\.?(en|eng|pl|pol|de|deu|fr|fra|es|spa|"
                        r"it|ita|ja|jpn|jp|ru|rus|ko|kor|zh|chi)(?![A-Za-z])",
                        re.I)

# Forbidden chars (§ 5.5)
FORBIDDEN_CHARS = {
    ":": " - ",
    "/": " and ",
    "\\": "",
    "<": "(",
    ">": ")",
    "|": "",
    "?": "",
    "*": "",
    '"': "",
    "\u201c": "",  # left double quotation mark
    "\u201d": "",  # right double quotation mark
}

# Apostrophe normalization (§ 5.3)
# Keys are written as escapes: an empty-string key would make str.replace()
# insert an apostrophe between every character.
APOSTROPHES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}

# Dashes (§ 5.2)
DASHES = {
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus sign
}

# Editions (§ 3.6)
EDITION_PATTERNS = [
    (re.compile(r"director'?s?[\.\s_-]*cut", re.I), "Director's Cut"),
    (re.compile(r"final[\.\s_-]*cut", re.I), "Final Cut"),
    (re.compile(r"extended[\.\s_-]*(?:cut|edition)?", re.I), "Extended"),
    (re.compile(r"theatrical(?:[\.\s_-]*cut)?", re.I), "Theatrical"),
    (re.compile(r"\bIMAX\b", re.I), "IMAX"),
    (re.compile(r"\bunrated\b", re.I), "Unrated"),
    (re.compile(r"remastere?d?", re.I), "Remastered"),
    (re.compile(r"(?<![A-Za-z])DC(?![A-Za-z])"), "Director's Cut"),
    (re.compile(r"(?<![A-Za-z])EE(?![A-Za-z])"), "Extended"),
]

# Smart title case (§ 5.1)
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "from",
               "in", "into", "nor", "of", "on", "or", "the", "to", "up",
               "vs", "vs.", "via", "with", "yet"}
# Only well-formed numerals match, so words such as "did" or "mild" are not
# upper-cased. A few real words ("mix", "mi") are still valid numerals.
ROMAN_NUMERAL = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})"
    r"(?:IX|IV|V?I{0,3})$", re.I)


def smart_title(s: str) -> str:
    """Title-case respecting small-words and roman numerals."""
    if not s:
        return s
    chunks = re.split(r"(\s-\s)", s)  # split on space-dash-space (subtitle)
    out_chunks = []
    for chunk in chunks:
        if chunk == " - ":
            out_chunks.append(chunk)
            continue
        words = chunk.split(" ")
        result = []
        for i, w in enumerate(words):
            if not w:
                result.append(w)
                continue
            if ROMAN_NUMERAL.match(w):
                result.append(w.upper())
                continue
            lower = w.lower()
            if 0 < i < len(words) - 1 and lower in SMALL_WORDS:
                result.append(lower)
            else:
                # capitalize but preserve internal apostrophes/dots
                result.append(w[0].upper() + w[1:].lower() if w else w)
        out_chunks.append(" ".join(result))
    return "".join(out_chunks)


def strip_noise(s: str) -> str:
    """Remove group tags, quality, urls, trailing groups."""
    for pat in GROUP_TAG_PATTERNS:
        s = pat.sub("", s)
    s = URL_REF.sub(" ", s)
    s = QUALITY_TOKENS.sub("", s)
    s = TRAILING_GROUP.sub("", s)
    return s


def normalize_chars(s: str) -> str:
    """Apply Unicode/forbidden-char substitutions."""
    for k, v in APOSTROPHES.items():
        s = s.replace(k, v)
    for k, v in DASHES.items():
        s = s.replace(k, v)
    for k, v in FORBIDDEN_CHARS.items():
        s = s.replace(k, v)
    # NFC normalization for diacritics (consistent encoding)
    s = unicodedata.normalize("NFC", s)
    return s


def collapse_whitespace(s: str) -> str:
    s = re.sub(r"\s+", " ", s)
    s = re.sub(r" - - ", " - ", s)
    s = re.sub(r"--+", "-", s)
    s = s.strip(" -._")
    return s


# --- Schema-specific extraction ------------------------------------------------
@dataclass
class Parts:
    title: str = ""
    year: Optional[str] = None
    season: Optional[str] = None
    episode: Optional[str] = None
    episode_end: Optional[str] = None
    episode_title: str = ""
    edition: Optional[str] = None
    provider_id: Optional[str] = None
    ext: str = "mkv"
    absolute_number: Optional[str] = None
    subdub: Optional[str] = None
    track_title: str = ""
    variant: Optional[str] = None
    performer: str = ""


RE_SE = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,3})(?:-[Ee]?(\d{1,3}))?")
RE_NXM = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{1,3})(?:-(\d{1,3}))?")
RE_SEASON_EP = re.compile(r"Season\s*(\d{1,2})\s*Episode\s*(\d{1,3})", re.I)
RE_YEAR_PARENS = re.compile(r"\((\d{4})\)")
RE_PROVIDER_ID = re.compile(r"\[(?:imdbid|tmdbid|tvdbid)-[^\]]+\]")


def extract_year(s: str) -> Optional[str]:
    m = RE_YEAR_PARENS.search(s)
    if m:
        y = int(m.group(1))
        if 1888 <= y <= dt.date.today().year + 2:
            return m.group(1)
    return None


def extract_provider_id(s: str) -> Optional[str]:
    m = RE_PROVIDER_ID.search(s)
    return m.group(0) if m else None


def extract_se(s: str):
    m = RE_SE.search(s)
    if m:
        end = m.group(3) or None
        return (m, m.group(1), m.group(2), end)
    m = RE_NXM.search(s)
    if m:
        return (m, m.group(1), m.group(2), m.group(3))
    m = RE_SEASON_EP.search(s)
    if m:
        return (m, m.group(1), m.group(2), None)
    return (None, None, None, None)


def extract_edition(raw_basename: str) -> Optional[str]:
    for pat, name in EDITION_PATTERNS:
        if pat.search(raw_basename):
            return name
    return None


def parent_show_folder(p: Path) -> Path:
    """Walk up past Season XX folders until we find the show folder."""
    cur = p.parent
    while re.match(r"(?i)season\s*\d+|specials|extras", cur.name):
        cur = cur.parent
    return cur


# --- Per-schema emit -----------------------------------------------------------
def normalize_movie(src: Path, year_hint: Optional[str] = None,
                    title_hint: Optional[str] = None) -> Path:
    raw = src.stem
    ext = src.suffix.lower().lstrip(".") or "mkv"
    edition = extract_edition(raw)
    provider_id = extract_provider_id(raw) or extract_provider_id(src.parent.name)
    cleaned = strip_noise(raw)
    cleaned = normalize_chars(cleaned)
    cleaned = collapse_whitespace(cleaned)
    year = year_hint or extract_year(cleaned) or extract_year(src.parent.name)
    if year:
        y = re.escape(year)
        cleaned = re.sub(r"\s*\(" + y + r"\)", "", cleaned)
        # also drop a bare year token ("Blade Runner 1982 Final Cut"), but
        # never a leading one, so a title such as "1917" survives
        cleaned = re.sub(r"(?<=\S)[\s._]+" + y + r"(?!\d)", "", cleaned).strip()
    # drop edition tokens from the title body (we re-emit them)
    for pat, _ in EDITION_PATTERNS:
        cleaned = pat.sub("", cleaned)
    cleaned = collapse_whitespace(cleaned)
    title = title_hint or smart_title(cleaned)
    if not year:
        raise ValueError(f"cannot determine year for movie: {src}")
    folder_name = f"{title} ({year})"
    if provider_id:
        folder_name += f" {provider_id}"
    file_basename = folder_name
    if edition:
        file_basename += f" - {edition}"
    return src.parent.parent / folder_name / f"{file_basename}.{ext}"


def normalize_tv(src: Path, year_hint: Optional[str] = None,
                 title_hint: Optional[str] = None,
                 schema: str = "tv") -> Path:
    raw = src.stem
    ext = src.suffix.lower().lstrip(".") or "mkv"
    m, season, ep, ep_end = extract_se(raw)
    if not season:
        raise ValueError(f"no S/E token in TV file: {src}")
    season = f"{int(season):02d}"
    episode = f"{int(ep):02d}"
    episode_end = f"{int(ep_end):02d}" if ep_end else None
    # episode title = text after match, before next bracket
    after = raw[m.end():] if hasattr(m, "end") else ""
    title_part = re.split(r"[\[\(]", after, maxsplit=1)[0]
    title_part = strip_noise(title_part)
    title_part = normalize_chars(title_part)
    title_part = collapse_whitespace(title_part)
    title_part = re.sub(r"^[\s\-_\.]+", "", title_part)
    episode_title = smart_title(title_part) if title_part else ""
    # show title from parent folder
    show_folder = parent_show_folder(src)
    show_clean = strip_noise(show_folder.name)
    show_clean = normalize_chars(show_clean)
    show_clean = collapse_whitespace(show_clean)
    year = year_hint or extract_year(show_clean) or extract_year(src.parent.name)
    if year:
        show_clean = re.sub(r"\s*\(" + year + r"\).*$", "", show_clean).strip()
    show_clean = re.sub(r"(?i)\s*Season\s*\d+.*$", "", show_clean).strip()
    show = title_hint or smart_title(show_clean)
    if not year:
        raise ValueError(f"cannot determine year for TV show: {show_folder}")
    se_str = f"S{season}E{episode}"
    if episode_end:
        se_str += f"-E{episode_end}"
    file_base = f"{show} ({year}) - {se_str}"
    if episode_title:
        file_base += f" - {episode_title}"
    target_root = show_folder.parent  # e.g. /media/tv
    return target_root / f"{show} ({year})" / f"Season {season}" / f"{file_base}.{ext}"


def normalize_anime_absolute(src: Path, title_hint: Optional[str],
                             abs_num: Optional[int],
                             ep_title: str = "",
                             subdub: Optional[str] = None) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mkv"
    show_folder = parent_show_folder(src)
    show_clean = strip_noise(show_folder.name)
    show_clean = normalize_chars(show_clean)
    show = title_hint or smart_title(collapse_whitespace(show_clean))
    if abs_num is None:
        raise ValueError(f"absolute number required for {src}")
    suffix = f" [{subdub}]" if subdub else ""
    title_str = smart_title(ep_title) if ep_title else ""
    file_base = f"{show} - {abs_num:04d}"
    if title_str:
        file_base += f" - {title_str}"
    file_base += suffix
    return show_folder.parent / show / f"{file_base}.{ext}"


def normalize_musicvideo(src: Path, artist_hint: str, year_hint: str,
                         track_hint: Optional[str] = None,
                         variant: Optional[str] = None) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mp4"
    raw = src.stem
    cleaned = normalize_chars(strip_noise(raw))
    cleaned = collapse_whitespace(cleaned)
    track = track_hint or smart_title(cleaned)
    artist = smart_title(artist_hint)
    suffix = f" [{variant}]" if variant else ""
    return src.parent.parent / artist / f"{year_hint} - {track}{suffix}.{ext}"


def normalize_standup(src: Path, performer: str, title: str, year: str) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mkv"
    folder = f"{performer} - {title} ({year})"
    return src.parent.parent / folder / f"{folder}.{ext}"


# --- Driver --------------------------------------------------------------------
def is_already_canonical(src: Path, target: Path) -> bool:
    return src.resolve() == target.resolve()


def log_op(action: str, src: Path, target: Path):
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    log_file = LOG_DIR / f"{dt.date.today().isoformat()}.log"
    # second-resolution UTC timestamp, matching the log format in § 10
    ts = dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{ts} {action} {src} -> {target}\n"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(line)


def main():
    ap = argparse.ArgumentParser(description="canonical filename normalizer")
    ap.add_argument("source", type=Path, help="source file path")
    ap.add_argument("--type", required=True,
                    choices=["movie", "tv", "anime-seasonal",
                             "anime-absolute", "musicvideo", "standup",
                             "extra"])
    ap.add_argument("--year")
    ap.add_argument("--title")
    ap.add_argument("--performer", help="for standup")
    ap.add_argument("--artist", help="for musicvideo")
    ap.add_argument("--track", help="for musicvideo")
    ap.add_argument("--variant", help="for musicvideo")
    ap.add_argument("--abs-num", type=int, help="for anime-absolute")
    ap.add_argument("--ep-title", help="for anime-absolute")
    ap.add_argument("--subdub", choices=["Sub", "Dub"], help="for anime-absolute")
    ap.add_argument("--apply", action="store_true",
                    help="actually move the file (default is dry-run)")
    ap.add_argument("--force", action="store_true",
                    help="overwrite existing target")
    args = ap.parse_args()
    src = args.source.resolve()
    if not src.exists():
        print(f"ERROR: {src} does not exist", file=sys.stderr)
        sys.exit(1)
    try:
        if args.type == "movie":
            target = normalize_movie(src, args.year, args.title)
        elif args.type == "tv":
            target = normalize_tv(src, args.year, args.title, schema="tv")
        elif args.type == "anime-seasonal":
            target = normalize_tv(src, args.year, args.title, schema="anime")
        elif args.type == "anime-absolute":
            target = normalize_anime_absolute(src, args.title, args.abs_num,
                                              args.ep_title or "",
                                              args.subdub)
        elif args.type == "musicvideo":
            target = normalize_musicvideo(src, args.artist or "", args.year or "",
                                          args.track, args.variant)
        elif args.type == "standup":
            target = normalize_standup(src, args.performer or "",
                                       args.title or "", args.year or "")
        else:
            print(f"ERROR: schema '{args.type}' not implemented", file=sys.stderr)
            sys.exit(2)
    except ValueError as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(2)
    if is_already_canonical(src, target):
        print(f"NOOP {src}")
        sys.exit(0)
    if target.exists() and not args.force:
        print(f"REFUSE {src} -> {target} (target exists; use --force)")
        sys.exit(2)
    if args.apply:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target))
        log_op("RENAME", src, target)
        print(f"MOVED {src} -> {target}")
    else:
        print(f"DRY-RUN {src} -> {target}")


if __name__ == "__main__":
    main()
```
### 11.1 Usage examples
```bash
# Dry-run a single Futurama episode
# (--year is required here: neither the file nor its folder carries "(YYYY)")
./normalize.py --type tv --year 1999 \
"/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv"
# Output:
# DRY-RUN /home/admin/Downloads/.../Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
# -> /home/admin/Downloads/futrama/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
# Same with --apply, plus an explicit title hint
./normalize.py --type tv --year 1999 --title "Futurama" --apply \
"/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv"
# Movie with edition
./normalize.py --type movie --year 1982 --apply \
"/home/admin/Downloads/Blade Runner 1982 Final Cut [1080p BluRay x265 RARBG].mkv"
# Stand-up
./normalize.py --type standup --performer "Bo Burnham" --title "Inside" --year 2021 --apply \
"/home/admin/Downloads/Bo.Burnham.Inside.2021.1080p.NF.WEB-DL.DDP5.1.x264-NTb.mkv"
# Music video
./normalize.py --type musicvideo --artist "Daft Punk" --year 2013 \
--track "Get Lucky" --apply \
"/home/admin/Downloads/daft.punk.get.lucky.official.video.1080p.mkv"
```
### 11.2 Idempotency proof
Running the script twice on the same input produces the same target. The
second run's source = first run's target, so `is_already_canonical()`
returns true, and the script no-ops. Verified in unit tests (see
`/opt/docker/jellyfin/scripts/test_normalize.py` — to be added in doc 07's
implementation phase).
---
## 12. Edge cases catalogue
### 12.1 Episodes with very long titles
```
The Office (2005) - S07E25-E26 - Search Committee.mkv ← multi-ep, short title, fine
Sherlock (2010) - S04E03 - The Final Problem.mkv ← long-ish, fine
Steins;Gate (2011) - S01E22 - Being Meltdown - The Concerto Whose Conductor Has Lost His Baton.mkv
```
The third example is 94 characters before the extension. `ext4` allows
255 bytes per filename component, so it fits. Smart title case is applied,
and no `:` substitution is involved (the title has no colon; the long
string is the actual title from MyAnimeList). If a title does have a colon,
it becomes ` - ` per § 5.5, which lengthens the name slightly; nothing is
truncated.
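Because the `ext4` limit is counted in bytes rather than characters, titles with diacritics or CJK text reach it sooner than their visible length suggests (in UTF-8, `é` takes 2 bytes and `日` takes 3). A minimal sketch of a length check; the helper names `component_bytes` and `fits_ext4` are illustrative only:

```python
MAX_COMPONENT_BYTES = 255  # ext4 limit for a single path component


def component_bytes(name: str) -> int:
    """Size of one file or folder name once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits_ext4(name: str) -> bool:
    """True when the name, extension included, fits in one component."""
    return component_bytes(name) <= MAX_COMPONENT_BYTES
```

The check applies per path component, so the show folder, the season folder, and the file name are each measured separately.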
### 12.2 Episodes with `.` in the title
```
Mr. Robot (2015) - S01E01 - eps1.0_hellofriend.mov.mkv ← title contains `.mov`
```
`.mov` inside the title is technically a substring that *looks* like a
container type. The parser doesn't care (the extension is `.mkv`, parsed
last). Keep as-is: the lowercase is the title's actual stylization. Note
that `smart_title()` in § 11 capitalizes the first letter of each word, so
a file passed through the script needs the lowercase restored by hand.
### 12.3 Shows with numeric titles
```
1923 (2022) - S01E01 - 1923.mkv ← year-as-title, year-as-disambiguation
24 (2001) - S01E01 - Day 1 - 12-00 AM-1-00 AM.mkv ← `:` from title became ` - `
```
The `24` / `1923` cases would fail year extraction if the show year is
omitted. Year hint via `--year` is mandatory for these.
### 12.4 Two-part single episodes (multi-part files)
Doc 05 § 2 mentions `Series A S02E03 Part 1.mkv` / `Part 2.mkv`. Canonical:
```
TV/Show (Year)/Season 02/Show (Year) - S02E03 - Title - part 1.mkv
TV/Show (Year)/Season 02/Show (Year) - S02E03 - Title - part 2.mkv
```
Use lowercase `part` (Jellyfin parser is case-insensitive but lowercase
is more common in docs).
### 12.5 Source has no episode title
```
Source: Show.S01E01.1080p.WEB-DL.x264-NTb.mkv
Target: Show (Year) - S01E01.mkv
```
Empty episode title → omit. The script does this already (§ 11
`normalize_tv()` only appends the title `if episode_title`). Jellyfin will
backfill the title from TVDB on first scrape.
### 12.6 Source has WRONG episode title
If the rip's episode title is different from TVDB's canonical (e.g. a
Polish translation of an English-language show, or a non-canonical
sub-group title), prefer the **TVDB title** (English, official). This
requires manual intervention. In § 11, `--ep-title "Canonical Title"` is
only wired up for `--type anime-absolute`, so for regular TV edit the
name after the rename. Not automated.
### 12.7 Dual-audio (sub+dub in one file)
If the mkv has both audio tracks, omit the `[Sub]`/`[Dub]` suffix:
```
Anime/One Piece/One Piece - 0001 - I'm Luffy.mkv ← dual audio in container
```
The user can pick the audio track from the player. The filename only
needs to disambiguate when *separate files* exist.
### 12.8 Mid-season hiatus / split seasons
Some shows split a season into "Part 1" and "Part 2" (Better Call Saul,
Stranger Things). Treat as **one season**:
```
TV/Stranger Things (2016)/Season 04/
├── Stranger Things (2016) - S04E01 - The Hellfire Club.mkv ← Vol 1
├── ...
├── Stranger Things (2016) - S04E07 - The Massacre at Hawkins Lab.mkv ← Vol 1 finale
├── Stranger Things (2016) - S04E08 - Papa.mkv ← Vol 2 start
└── Stranger Things (2016) - S04E09 - The Piggyback.mkv ← Vol 2 finale
```
TVDB lists S04 as one season, episodes 1-9. The hiatus is invisible to
the parser. Don't create `Season 04 Part 1/`.
---
## 13. Verification checklist (doc 07 will use this)
Before declaring a normalized file "imported":
1. Filename matches the canonical regex for its category (§ 1).
2. No forbidden chars (§ 5.5) in any part of the path.
3. No group tags / quality / codec / source / audio tags in the basename
(§ 2).
4. Folder structure matches § 1.x for the category.
5. Year is in `(YYYY)` and matches the actual release year (movies/TV).
6. `Season NN/` is zero-padded (TV / anime-seasonal).
7. Episode S/E numbers zero-padded to two digits (three for >99).
8. Smart title case applied to all title-bearing components.
9. Apostrophes are ASCII (`'`), dashes are ASCII (`-`).
10. Diacritics in NFC form (UTF-8 encoded canonically).
11. The script's `is_already_canonical()` returns true on the result —
re-running the normalizer leaves the file untouched.
12. Audit log line written to `/var/log/jellyfin-imports/<date>.log`.
If any check fails, the file is quarantined per doc 07 to a `_pending/`
subtree for manual review.
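Several of these checks are mechanical and can be scripted. The sketch below covers only items 2, 6, 9 and 10 for a library-relative path; the function name `check_path` and the message wording are assumptions of this example, and the remaining items still need the category regexes from § 1 or a human.

```python
import re
import unicodedata
from pathlib import PurePosixPath

FORBIDDEN = set('<>:"\\|?*')  # item 2 ("/" cannot occur inside a component)
# curly quotes, en dash, em dash, minus sign (item 9)
NON_ASCII_PUNCT = set("\u2018\u2019\u201c\u201d\u2013\u2014\u2212")
SEASON_LIKE = re.compile(r"^season\s*\d+$", re.I)
SEASON_DIR = re.compile(r"^Season \d{2,}$")  # item 6


def check_path(rel_path: str) -> list[str]:
    """Return the failed checks for a path; an empty list means a pass."""
    problems = []
    for part in PurePosixPath(rel_path).parts:
        if FORBIDDEN & set(part):
            problems.append(f"forbidden char in {part!r}")
        if NON_ASCII_PUNCT & set(part):
            problems.append(f"curly quote or long dash in {part!r}")
        if unicodedata.normalize("NFC", part) != part:  # item 10
            problems.append(f"not NFC: {part!r}")
        if SEASON_LIKE.match(part) and not SEASON_DIR.match(part):
            problems.append(f"season folder not zero-padded: {part!r}")
    return problems
```

A non-empty result is a reason to quarantine the file; an empty result only means these four checks passed.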
---
## 14. Quick reference card (for the operator)
| Category | Canonical shape | Example |
|---|---|---|
| Movie | `Movies/T (Y)/T (Y).mkv` | `Movies/Inception (2010)/Inception (2010).mkv` |
| Movie+edition | `Movies/T (Y)/T (Y) - E.mkv` | `Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv` |
| Movie+resolution | `Movies/T (Y)/T (Y) - NNNNp.mkv` | `Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 2160p.mkv` |
| TV episode | `TV/S (Y)/Season NN/S (Y) - SXXEYY - Title.mkv` | `TV/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv` |
| TV multi-ep | `... - SXXEYY-EZZ - Title.mkv` | `Futurama (1999) - S01E03-E04 - I, Roommate and Love's Labours.mkv` |
| TV special | `... /Season 00/... - S00EYY - Title.mkv` | `Futurama (1999) - S00E01 - Bender's Big Score.mkv` |
| Anime seasonal | same as TV | `Cowboy Bebop (1998) - S01E01 - Asteroid Blues.mkv` |
| Anime absolute | `Anime/S/S - NNNN - Title [Sub].mkv` | `One Piece - 0001 - I'm Luffy [Sub].mkv` |
| Music video | `MV/A/Y - T.mp4` | `Daft Punk/2013 - Get Lucky.mp4` |
| Stand-up | `Movies/P - T (Y)/P - T (Y).mkv` | `Bo Burnham - Inside (2021)/Bo Burnham - Inside (2021).mkv` |
| Extra (folder) | `<item folder>/<lowercase folder>/Title.mkv` | `featurettes/Welcome to the World of Tomorrow.mkv` |
| Extra (suffix) | `... - Title-featurette.mkv` | `Inception (2010) - Dreams Within Dreams-featurette.mkv` |
| Subtitle | `<basename>.<lang>[.flag].srt` | `Futurama (1999) - S01E01.eng.srt` |
---
## 15. Cross-references
- Doc 05 § 0 — top-level filename rules (forbidden chars, year-in-parens,
one folder per item).
- Doc 05 § 1.2 — Jellyfin's accepted movie regex.
- Doc 05 § 2.2 — Jellyfin's accepted TV regex (table of patterns).
- Doc 05 § 3.1-3.3 — anime numbering strategies (which we map to § 1.3
and § 1.4 here).
- Doc 05 § 8 — extras folder names (which we lowercase per § 4.5).
- Doc 03 — sidecar subtitle naming (referenced in § 2.7 and § 14).
- Doc 02 — what the scraper does after the rename, including the
`RemoteSearch/Apply` recipe to fix mis-matches.
- Doc 07 (sibling) — the operational pipeline (move, dedupe, GC) that
consumes this ruleset. When doc 07 lands, link from § 13's
verification checklist into doc 07's quarantine / re-run flow.
---
## 16. Open items / known drift
- Live `/home/user/media/tv/Futurama/` lacks the year — should be
`Futurama (1999)/`. Migration covered in doc 07.
- The script's TV-title-extraction does not yet handle parent folders
named `Specials` (mapping to `Season 00`). Workaround: rename the
folder first, then run normalize. Codify in v2.
- Edition detection priority list has been chosen by frequency-of-rip,
not by canon. If a future Blade Runner gets a "Workprint Edition"
release, the list grows.
- No automated tests for `normalize.py` yet — covered by doc 07 once
that doc lands.
---
End of doc 08. The script in § 11 is the canonical source of truth; this
doc explains it. When in doubt, run `normalize.py --help` and read the
top docstring.