# 08 — Filename & Folder Normalization Ruleset (tv.s8n.ru)

Last updated: 2026-05-08
Server: Jellyfin 10.10.3 on nullstone, container `jellyfin`
Library root inside container: `/media`
Library root on host: `/home/user/media`

This document is the **normative ruleset** for renaming downloaded media into a
canonical, predictable, group-tag-free shape before it lands in the live
library tree. It is the layer between "torrent dump" and "file ready for the
scanner".

Cross-links:

- [`05-file-structure-rules.md`](05-file-structure-rules.md) — what Jellyfin's
  parser accepts; this doc picks one of the accepted forms and locks it in.
- [`07-cleanup-and-imports.md`](07-cleanup-and-imports.md) — the operational
  pipeline (move, dedupe, garbage collect) that consumes this ruleset. Doc 08
  defines *what* canonical looks like; doc 07 defines *how* to apply it.
- [`02-metadata-and-titles.md`](02-metadata-and-titles.md) — what Jellyfin
  does after the rename (parse, scrape, lock).
- [`03-subtitles.md`](03-subtitles.md) — sidecar `.srt` / `.ass` naming
  (referenced from § 5.6 below).

> **Status of this doc:** specification + reference implementation. The
> `normalize.py` script in § 11 is canonical. Anything not codified by the
> script is documentation only — when the doc and the script disagree, the
> script wins, and the doc gets fixed.

---

## 0. Why a normalization ruleset (and why now)

Doc 05 establishes that Jellyfin's parser is permissive: dots, dashes,
underscores, and spaces are interchangeable; `S01E01`, `s01e01`, `1x01`, and
`Season 1 Episode 1` all parse to the same thing. That permissiveness is great
for *getting Jellyfin to scrape a torrent dump*, but it is a disaster for
**operating a library at scale**:

1. **Search becomes noisy.** SMB / Syncthing / Dolphin search across mixed
   patterns surfaces irrelevant matches (`S01E01` vs `1x01` vs `s01.e01`).
2. **Diff / audit / dedupe scripts** get harder. Every regex needs to handle
   N forms. The cleanup pass (doc 07) is dramatically cheaper if every file
   in the tree obeys one shape.
3. **Visual scan in `ls`** becomes unreadable when half the filenames have
   `[1080p AI x265 10bit FS99 Joy]` glued on and the other half don't.
4. **Future migrations** (Plex, Kodi, mobile sync to a Win/Mac client) all
   have stricter parsers than Jellyfin. The strictest sane shape that
   Jellyfin accepts is also the most portable. Pay the cost once.
5. **Cross-platform safety.** This deploy is Linux-only today, but the
   workspace's Syncthing setup (see ai-lab `SYSTEM.md`) implies future
   sync to Win/Mac clients. Choose Windows-safe filenames now and never
   touch this again.

The cost of the ruleset is one Python script and discipline at import time.
Both are bounded. The cost of *not* having one compounds with every new
release.

---

## 1. Canonical formats — what the tree must look like

This is the lock-in. **One shape per category. No alternatives. No "but my
release group did it differently".**

### 1.1 Movies

```
Movies/<Title> (<Year>)/<Title> (<Year>).<ext>
Movies/<Title> (<Year>)/<Title> (<Year>) - <Edition>.<ext>    (when edition matters)
Movies/<Title> (<Year>) [<provider-id>]/<Title> (<Year>) [<provider-id>].<ext>    (when ambiguous)
```

- `<Title>` — smart title case (§ 5.1), forbidden chars stripped (§ 5.5).
- `<Year>` — first theatrical-release year, in parens, single space before `(`.
  Mandatory in this deploy (doc 05 § 0 rule 5), even when the title is unique.
- `<Edition>` — when present, exactly one of:
  `Director's Cut`, `Extended`, `Theatrical`, `IMAX`, `Unrated`, `Final Cut`,
  `Remastered`. Anything else (e.g. `Snyder Cut`, `Workprint`, `4K
  Remaster`) is admissible only with a written justification in the import
  log; otherwise normalize to the closest of the seven canonical labels
  above.
- `<provider-id>` — `imdbid-tt0123456` / `tmdbid-12345` / `tvdbid-12345`
  in square brackets. Optional unless year-based disambiguation isn't
  enough (§ 6.2).
- `<ext>` — lowercase: `mkv`, `mp4`, `webm`, `avi`. (`mkv` is the rip
  default; `mp4` is the streaming-original default.) Never uppercase
  `.MKV`, `.MP4`.

**Forbidden in the filename**: resolution tags (`1080p`, `2160p`, `720p`,
`4K`), codec tags (`x264`, `x265`, `h264`, `h265`, `HEVC`, `AVC`), source
tags (`WEB`, `WEB-DL`, `BluRay`, `BRRip`, `HDTV`, `DVDRip`, `WEBRip`),
audio tags (`AAC`, `AC3`, `DTS`, `DTS-HD.MA`, `5.1`, `7.1`, `Atmos`,
`Opus`), bitness/HDR tags (`10bit`, `8bit`, `HDR`, `DV`, `SDR`), release
tags (`PROPER`, `REPACK`, `INTERNAL`, `LIMITED`, `RERIP`), language tags
(`MULTi`, `DUBBED`, `SUBBED`, `iNTERNAL`), group tags
(`[YIFY]`, `[RARBG]`, `[FS99 Joy]`, `-NOGRP`, `-EVO`, `-SPARKS`),
and website refs (`WWW.YIFY-TORRENTS.COM`, `RARBG.txt`-derived names).

**Justification — why no resolution/codec tag:**

Jellyfin reads stream attributes (resolution, codec, bit-depth, HDR, audio
codec) directly from the file via `ffprobe` on every scan. The web UI
displays them. The mobile clients display them. The transcoder picks
based on them. The filename contributes **zero new information**.
Including those tags pollutes search results, breaks the byte-exact
folder-vs-file match required for multi-version movies (doc 05 § 1.2),
and makes humans skim past the title to find the title. The only
exception is `Movie (Year) - 1080p.mkv` AS the multi-version label
when two distinct rips of *the same movie* are kept in the same folder
(e.g. `Blade Runner 2049 (2017) - 2160p.mkv` next to
`Blade Runner 2049 (2017) - 1080p.mkv`). In that exact case, the
resolution IS the disambiguation token. Otherwise, no.

#### Examples

```
Movies/Blade Runner (1982)/Blade Runner (1982).mkv
Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv
Movies/Blade Runner (1982)/Blade Runner (1982) - Director's Cut.mkv
Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 2160p.mkv
Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 1080p.mkv
Movies/Dune (1984) [imdbid-tt0087182]/Dune (1984) [imdbid-tt0087182].mkv
```

### 1.2 TV shows

```
TV/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM> - <Episode Title>.<ext>
TV/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM>-E<MM2> - <Episode Title>.<ext>
TV/<Show> (<Year>)/Season 00/<Show> (<Year>) - S00E<MM> - <Special Title>.<ext>
```

- `<Show>` — smart title case, no provider-id in show folder unless the
  scraper picks the wrong show twice in a row (then add `[tvdbid-NNNN]`).
- `<Year>` — series **first-air year**, mandatory even when title is unique
  (doc 05 § 0 rule 5; this deploy convention is stricter than upstream
  permissive parsing).
- `<NN>` — zero-padded two digits. `Season 01`, not `Season 1`. `S01`, not `S1`.
- `<MM>` — zero-padded two digits. Three digits permissible only for shows
  that exceed 99 episodes per *season* (rare; e.g. some daily anime). See
  doc 05 § 3.1.
- `<Episode Title>` — title from the metadata provider (TVDB/TMDB) with
  smart title case. Required for human readability; Jellyfin overwrites it
  during scrape but the file basename is what humans see in `ls`.
- Multi-episode files: `S<NN>E<MM>-E<MM2>` — single hyphen, no spaces.
  Verified parsing per doc 05 § 2.2 table.

#### Examples

```
TV/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
TV/Futurama (1999)/Season 01/Futurama (1999) - S01E03-E04 - I, Roommate / Love's Labours Lost in Space.mkv
TV/Futurama (1999)/Season 00/Futurama (1999) - S00E01 - Bender's Big Score.mkv
TV/The Office (2005)/Season 02/The Office (2005) - S02E01 - The Dundies.mkv
```

#### Why this shape (not the slimmer `Show S01E01.mkv`)

Doc 05 § 2.2 shows three accepted patterns:

```
Futurama (1999) S01E01.mkv
Futurama (1999) S01E01 - Space Pilot 3000.mkv
Futurama (1999) - S01E01 - Space Pilot 3000.mkv    ← canonical for this deploy
```

The third form (with the leading ` - ` before `S01E01` and the title) is
chosen because:

1. The leading dash visually separates the series-name block from the
   episode-id block. Important when the show's title contains spaces and
   numbers (`Star Trek The Next Generation S01E01`) — without the dash, the
   eye trips over `Generation S01E01`.
2. Symmetric with the Movies multi-version pattern (`Title (Year) - <Label>`).
   One mental model for the whole library.
3. Identical to the Sonarr default rename pattern (`{Series Title} -
   S{season:00}E{episode:00} - {Episode Title}`), which means the naming
   pattern is well-trodden and tooling friendly.

### 1.3 Anime — seasonal numbering (TVDB-style)

Same shape as TV (§ 1.2). Mandatory year. Mandatory `Season NN`. No
absolute numbers.

```
Anime/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM> - <Episode Title>.<ext>
```

#### Examples

```
Anime/Cowboy Bebop (1998)/Season 01/Cowboy Bebop (1998) - S01E01 - Asteroid Blues.mkv
Anime/Mushishi (2005)/Season 02/Mushishi (2005) - S02E01 - The Sleeping Mountain.mkv
Anime/Steins;Gate (2011) [tvdbid-244061]/Season 01/Steins;Gate (2011) [tvdbid-244061] - S01E01 - Turning Point.mkv
```

(`;` is legal on `ext4` but flagged in § 5.5 as risky for portability —
prefer `Steins-Gate` if portability matters.)

### 1.4 Anime — absolute numbering

Used **only** for shows >99 episodes that don't fit the seasonal model
(One Piece, Naruto, Detective Conan, Bleach). For those shows, the
canonical shape is:

```
Anime/<Show>/<Show> - <NNNN> - <Episode Title> [<Sub|Dub>].<ext>
```

- No `(<Year>)` on the show folder — absolute-numbering shows are usually
  unique by name; if not, fall back to a provider ID
  (`Doraemon (1979) [tvdbid-71603]`, then revert to the seasonal shape in § 1.3).
- `<NNNN>` — **zero-padded four digits** (deterministic; all known
  long-runners stay below 9999). Three-digit padding (`099`) is wrong;
  four-digit (`0099`) is right and matches the upper bound of the longest
  running show.
- `[<Sub|Dub>]` — exactly one of `[Sub]` or `[Dub]`. Required for any
  release where both audio tracks are not embedded in one mkv. If the
  release contains both audio tracks in one container, omit the
  bracket.
- No `Season NN` folder. Absolute numbering puts every episode in the
  show root.

#### Deterministic absolute-numbering rule

Absolute number = the episode's position in the **broadcast order** as
listed by AniDB's "main" episode list for that show. NOT the dub broadcast
order, NOT a re-cut/remaster renumbering. For shows with discrepancies
between AniDB and TVDB absolute numbering (rare), AniDB wins — that's the
provider that absolute-numbering plugins (and Shoko) use.

#### Examples

```
Anime/One Piece/One Piece - 0001 - I'm Luffy! The Man Who's Gonna Be King of the Pirates! [Sub].mkv
Anime/One Piece/One Piece - 0001 - I'm Luffy! The Man Who's Gonna Be King of the Pirates! [Dub].mkv
Anime/Naruto/Naruto - 0001 - Enter Naruto Uzumaki [Sub].mkv
Anime/Detective Conan/Detective Conan - 1099 - The Detective's Vacation [Sub].mkv
```

#### Caveat

Stock Jellyfin without Shoko will mis-handle episodes >99 (doc 05 § 3.3).
This is a known issue; pick **one** of:

- Run Shoko (doc 05 § 3.2). Filenames don't matter for Shoko — but obey
  this ruleset anyway, for human readability and for the day Shoko goes
  away.
- Re-bucket by TVDB seasons. Most long-runners have a TVDB season split
  (One Piece S01-S22). Use § 1.3 with the seasons.

This deploy currently does NOT run Shoko; it currently does NOT host any
absolute-numbered anime. The shape in § 1.4 is reserved for the day
Shoko gets installed. Leave it documented.

### 1.5 Music videos

```
MusicVideos/<Artist>/<Year> - <Track Title>.<ext>
MusicVideos/<Artist>/<Year> - <Track Title> [<Variant>].<ext>    (when multiple cuts exist)
```

- `<Artist>` — smart title case, comma-separated for collabs
  (`Daft Punk, Pharrell Williams`).
- `<Year>` — release year of the *video*, not the song. Songs older than
  their videos are common (a 2024 acoustic cover gets the 2024 year).
- `<Track Title>` — smart title case.
- `<Variant>` — optional, `[Live]`, `[Acoustic]`, `[Remix]`, `[Alternate]`,
  `[Lyric Video]`. Forbidden: `[1080p]`, `[Official]`, `[HD]`.

Music videos do not use `(<Year>)` parens because the library is
`musicvideos` `CollectionType`, which has no scraper (doc 05 § 5.3) and the
year is purely cosmetic.

#### Examples

```
MusicVideos/Daft Punk/2013 - Get Lucky.mp4
MusicVideos/Daft Punk/2013 - Get Lucky [Lyric Video].mp4
MusicVideos/Pink Floyd/1995 - Comfortably Numb [Live].mkv
MusicVideos/Daft Punk, Pharrell Williams/2013 - Get Lucky.mp4
```

For full **live concerts** (>20 min, multi-song), file under Movies
instead, per doc 05 § 5.4.

### 1.6 Stand-up specials (Movies-typed)

Stand-up lives in the Movies library (doc 05 § 4). Folder + filename are
prefixed with the performer name; treat the whole `<Performer> - <Title>`
as the canonical "movie title" for parser purposes.

```
Movies/<Performer> - <Title> (<Year>)/<Performer> - <Title> (<Year>).<ext>
```

#### Examples

```
Movies/Bo Burnham - Inside (2021)/Bo Burnham - Inside (2021).mkv
Movies/Hannah Gadsby - Nanette (2018) [imdbid-tt8465676]/Hannah Gadsby - Nanette (2018) [imdbid-tt8465676].mkv
Movies/Norm Macdonald - Nothing Special (2022)/Norm Macdonald - Nothing Special (2022).mkv
```

The `<Performer> - ` prefix is **mandatory** for stand-up. Without it, the
title alone (`Inside (2021)`) ambiguously matches the 2007 horror film
*Inside*, the 2023 thriller *Inside*, or the 2017 documentary *Inside*.
The prefix gives TMDB enough disambiguation to land on the correct
record without a provider-id override.

---

## 2. What to STRIP from a source filename — exhaustive list

This is the substring inventory. The script in § 11 implements all of
these. The list grew from sampling ~200 distinct release-group filenames
across `[YIFY]`, `[RARBG]`, `[ettv]`, `[GalaxyRG]`, `[FS99 Joy]`,
`[NOGRP]`, `[FitGirl]`, and the Futurama corpus on disk.

### 2.1 Group tags (square / round brackets)

Match anything inside `[...]` or `(...)` *that does not look like a year*.
Year detection: 4 digits, 1900 ≤ N ≤ current year + 2.

Exemplar substrings (case-insensitive):

```
[1080p AI x265 10bit FS99 Joy]
[YIFY]
[YTS]
[YTS.MX]
[YTS.AG]
[YTS.AM]
[RARBG]
[ettv]
[eztv]
[GalaxyRG]
[GalaxyRG265]
[FitGirl]
[FitGirl Repack]
[NOGRP]
[QxR]
[FreetheFish]
[psa]
[PSA]
[CMRG]
[d3g]
[STRiFE]
[Pahe.in]
[FoV]
[NTb]
[YOLO]
[KOGi]
[playWEB]
[REQ]
[XBET]
[FLUX]
[NOSiVID]
[BGT]
[SVA]
[CRiMSON]
[ION10]
[ION265]
[BluPanda]
[H4S5S]
[5.1]
(YIFY)
(RARBG)
(NOGRP)
```
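The bracket rule above can be sketched as follows. This is a minimal illustration, not the § 11 script: `RE_BRACKETED`, `looks_like_year`, and `strip_bracket_tags` are hypothetical names, and the sketch deliberately ignores provider-ID brackets such as `[imdbid-tt0087182]`, which a real pass would have to protect before stripping.

```python
import re
from datetime import date

# One [...] or (...) group with no nested brackets inside (hypothetical helper).
RE_BRACKETED = re.compile(r"\[([^\[\]]*)\]|\(([^()]*)\)")


def looks_like_year(text: str) -> bool:
    """True for a bare 4-digit number with 1900 <= N <= current year + 2."""
    text = text.strip()
    if not re.fullmatch(r"[0-9]{4}", text):
        return False
    return 1900 <= int(text) <= date.today().year + 2


def strip_bracket_tags(name: str) -> str:
    """Replace every bracketed group with a space unless it holds a year."""
    def replace(match: re.Match) -> str:
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        return match.group(0) if looks_like_year(inner) else " "
    return RE_BRACKETED.sub(replace, name)
```

The replacement is a single space rather than an empty string so that neighbouring words never fuse; the whitespace passes in § 2.5 are assumed to collapse the leftovers.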

### 2.2 Trailing release-group dashes

Pattern: `-<UPPERCASE_TOKEN>` at the very end of the basename
(before extension). Matches:

```
-NOGRP
-EVO
-RARBG
-SPARKS
-CMRG
-NTb
-FLUX
-AMZN
-NF
-DSNP
-ATVP
-MA
-WEB
-AAC2
-FoV
-KOGi
-PLAYWEB
-FRDS
-ZQ
-PHOENiX
-EZTV
-NTG
-iON
-ION10
-ION265
-CtrlHD
-d3g
-PSA
-QxR
-RZeroX
-PMP
-BTN
-DEFLATE
-BAE
-MZABI
-TURG
```

The pattern `-[A-Z][A-Z0-9]{1,15}$` (after stripping bracket tags and
quality tags) captures most of these. The script in § 11 uses an
allow-list approach instead of a pattern, because release groups
sometimes exceed 15 chars and sometimes use mixed case.
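A minimal sketch of the allow-list approach, assuming a lowercase set seeded with a few of the groups listed above. `KNOWN_GROUPS` and `strip_trailing_group` are hypothetical names; the actual list and logic live in the § 11 script.

```python
# Hypothetical allow-list: a handful of the § 2.2 groups, stored lowercase.
KNOWN_GROUPS = {
    "nogrp", "evo", "rarbg", "sparks", "cmrg", "ntb",
    "flux", "phoenix", "rzerox", "ctrlhd", "d3g", "qxr",
}


def strip_trailing_group(basename: str) -> str:
    """Drop a trailing '-GROUP' suffix only when GROUP is on the allow-list."""
    head, sep, tail = basename.rpartition("-")
    if sep and tail.strip().lower() in KNOWN_GROUPS:
        # Also drop separators left dangling in front of the removed tag.
        return head.rstrip(" .-_")
    return basename
```

Comparing case-insensitively against an explicit set keeps hyphenated titles such as `Spider-Man` intact, which a bare `-[A-Z]...$` pattern with case folding would not.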

### 2.3 Quality / codec / source / audio tags

Strip all of these as standalone tokens (whitespace-, dot-, dash-, or
underscore-bounded), case-insensitive:

**Resolution / aspect:**
```
2160p 1080p 720p 480p 360p 4K 4k UHD HD SD FHD QHD
```

**Source:**
```
WEB-DL WEBDL WEB.DL WEB WEBRip WEB-Rip BluRay BLURAY Bluray BDRip
BRRip BR-Rip BDR HDTV HDTVRip PDTV DSR DVDRip DVD DVDR DVD9 DVD5
HDDVD HDDVDRip HDRip CAMRip CAM TS HDTS TC TELESYNC TELECINE R5
SCREENER SCR WORKPRINT WP PPV PPVRip
```

**Codec / container hints (in name):**
```
x264 x265 H.264 H264 H.265 H265 HEVC AVC VP9 AV1 XviD DivX
10bit 10-bit 8bit 8-bit HDR HDR10 HDR10+ DV DolbyVision Dolby.Vision
SDR HFR HQ
```

**Audio:**
```
DD5.1 DDP5.1 DD7.1 DDP7.1 DD2.0 DD+5.1 DD+7.1 DTS DTS-HD DTS-HD.MA
DTS-X DTSX TrueHD Atmos AAC AAC2.0 AAC5.1 AC3 AC-3 EAC3 E-AC3
MP3 MP2 Opus FLAC PCM LPCM 5.1 7.1 2.0 Mono Stereo Multi
```

**Release-process tags:**
```
PROPER REPACK iNTERNAL INTERNAL LIMITED EXTENDED.CUT UNCUT THEATRiCAL
RERIP REAL READNFO RETAiL RETAIL STV DC COMPLETE REMUX REMASTERED
SUBBED DUBBED MULTi MULTI SUB DUB ENG ENGLISH POL POLISH iNT
```

> Note: `EXTENDED.CUT`, `THEATRiCAL`, `UNRATED`, `IMAX`, `DIRECTORS.CUT`,
> `FINAL.CUT`, `REMASTERED`, `UNCUT`, `DC` (= Director's Cut shorthand),
> `EE` (= Extended Edition shorthand) are kept *as edition tokens* — see
> § 3.6. Strip them from the noise pool, then re-emit them as
> ` - <Edition>` if present.

### 2.4 Source-specific cruft

Common compound suffixes that are not single tokens:

```
WEB.h264-NiXON[rartv]
WEB-DL.DDP5.1.x264-NTb
BDRip.x265.10bit-RZeroX
HDTV.x264-PHOENiX
1080p.WEB.h264-NiXON
2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.5.1
```

These are ad-hoc concatenations; once the standalone tokens above are
stripped, what remains is the title plus stray separators. The pipeline
in § 3 collapses separators last (§ 3.7), so order matters.

### 2.5 Whitespace / punctuation cleanup

After substring removal, run these passes:

| Pass | From | To |
|---|---|---|
| Collapse runs of spaces | `Show   Title  S01E01` | `Show Title S01E01` |
| Trim leading/trailing whitespace | `  Show.mkv  ` | `Show.mkv` |
| Collapse double-underscore | `Show__Title` | `Show Title` |
| Replace dot-separators with space (basename only) | `Show.Title.S01E01` | `Show Title S01E01` |
| Drop stray punctuation runs | `Show --- Title` | `Show - Title` |
| Strip trailing dashes/dots before ext | `Show -.mkv` | `Show.mkv` |

The dot-to-space substitution is **only applied if the dot is between
alphanumeric tokens**. `5.1` (audio channel count) has already been removed
in § 2.3 by this point, so it is never split, and a source `Mr.Robot`
becomes `Mr Robot`: the canonical form has no dot.
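The cleanup passes in this section can be sketched as one function. This is an illustration under assumptions, not the § 11 script: `cleanup_separators` is a hypothetical name, it expects a basename with the extension already split off, and it treats a single underscore the same as a double one.

```python
import re


def cleanup_separators(basename: str) -> str:
    """Apply the § 2.5 passes to an extension-less basename."""
    s = basename
    # A dot sitting between two alphanumeric characters becomes a space.
    s = re.sub(r"(?<=[A-Za-z0-9])\.(?=[A-Za-z0-9])", " ", s)
    # Underscore runs become one space (single underscores too: an assumption).
    s = re.sub(r"_+", " ", s)
    # Runs of two or more dashes, plus surrounding spaces, collapse to " - ".
    s = re.sub(r"\s*-{2,}\s*", " - ", s)
    # Any whitespace run collapses to a single space.
    s = re.sub(r"\s+", " ", s)
    # Trim whitespace, then dangling dashes/dots/spaces at the end.
    return s.strip().rstrip(" -.")
```

The dash-run pass runs before the whitespace pass so that `Show --- Title` ends up with exactly one space on each side of the surviving dash.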

### 2.6 URL / website refs

Match and remove:

```
WWW.YIFY-TORRENTS.COM
WWW.YTS.MX
WWW.RARBG.TO
RARBG.txt
www.yify-torrents.com
```

These appear as bracket prefixes (`[WWW.YIFY-TORRENTS.COM] Movie...`),
suffixes (`Movie - WWW.YIFY-TORRENTS.COM.mkv`), or as `RARBG.txt`-style
sidecar files (which doc 07 garbage-collects, not us).

Pattern (case-insensitive): `(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)` → strip whole match.
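As a quick check of the pattern above, here is a sketch that compiles it unchanged and applies it to an extension-less basename. `RE_WEBSITE` and `strip_website_refs` are hypothetical names; substituting a space (rather than nothing) is an assumption made so that adjacent words do not fuse.

```python
import re

# The § 2.6 pattern, compiled case-insensitively.
RE_WEBSITE = re.compile(
    r"(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)",
    re.IGNORECASE,
)


def strip_website_refs(basename: str) -> str:
    """Replace each website reference, including its delimiters, with a space."""
    return RE_WEBSITE.sub(" ", basename)
```

Because the whole match is removed, the enclosing `[` and `]` of a bracket prefix disappear together with the URL; the leftover spaces are left for the § 2.5 passes.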

### 2.7 Language indicators in the BASE name

`.pl`, `.eng`, `.en`, `.pol`, `.de`, `.fr`, `.es`, `.it`, `.ja`, `.jp`,
`.ru`, `.ko`, `.zh` appearing in the **video** filename (basename, not
extension). These belong on **subtitle sidecars only**, per doc 03.

```
Futurama.s01e01.pl.mkv              ← BAD (`.pl` in video basename)
Futurama (1999) - S01E01.mkv        ← GOOD (audio language is a stream attribute)
Futurama (1999) - S01E01.pl.srt     ← GOOD (subtitle sidecar with lang)
Futurama (1999) - S01E01.eng.srt    ← GOOD
```

Detection: 2- or 3-letter ISO-639 code as a token between dots / dashes /
underscores in the basename. If found, drop it from the basename. If a
sidecar `.srt` exists with the same lang token, **leave the sidecar
alone** — it's already correctly named.
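A minimal sketch of that detection, under two assumptions that are not from the ruleset: the token list is just the codes enumerated at the top of this section, and only a token sitting directly before the extension is dropped, so that a title such as `It` is never mistaken for a language code. `LANG_TOKENS`, `SUBTITLE_EXTS`, and `strip_language_token` are hypothetical names.

```python
import re

# Codes listed in § 2.7, lowercase.
LANG_TOKENS = {"pl", "eng", "en", "pol", "de", "fr", "es", "it",
               "ja", "jp", "ru", "ko", "zh"}
# Sidecar extensions whose language token must be preserved.
SUBTITLE_EXTS = {"srt", "ass", "vtt", "sub"}


def strip_language_token(filename: str) -> str:
    """Drop a trailing language token from a video filename; skip sidecars."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() in SUBTITLE_EXTS:
        return filename
    # Token must follow a dot, dash, or underscore and end the stem.
    m = re.search(r"[._-]([A-Za-z]{2,3})$", stem)
    if m and m.group(1).lower() in LANG_TOKENS:
        stem = stem[:m.start()]
    return f"{stem}.{ext}"
```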

If the source file is a `.srt` / `.ass` / `.vtt` / `.sub`, the lang
token is part of the canonical sidecar form and must NOT be stripped.
The script's `--type subtitle` mode handles this branch.

---

## 3. The normalization pipeline (regex / sed / python)

Conceptual order — each step's output feeds the next.

### 3.1 Step 0 — Determine target schema

Caller-supplied: `--type {movie|tv|anime-seasonal|anime-absolute|musicvideo|standup|extra}`. The
script does not guess. Doc 07's import wrapper picks the type based on
which library tree the file is being moved into.

### 3.2 Step 1 — Split off extension

```python
import os

basename, ext = os.path.splitext(source_filename)
ext = ext.lower().lstrip(".")  # canonical lowercase, no leading dot
```

Validate: `ext in {"mkv", "mp4", "avi", "webm", "m4v", "srt", "ass", "ssa", "vtt", "sub", "idx"}`.
Anything else → reject with an error; doc 07 quarantines it.
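Split and validation can be combined into one helper. This is a sketch only: `ALLOWED_EXTS` and `split_extension` are hypothetical names, and raising `ValueError` stands in for whatever error path the § 11 script actually uses.

```python
import os

# The extension allow-list from § 3.2.
ALLOWED_EXTS = {"mkv", "mp4", "avi", "webm", "m4v",
                "srt", "ass", "ssa", "vtt", "sub", "idx"}


def split_extension(source_filename: str) -> tuple[str, str]:
    """Return (basename, lowercase ext without dot); reject unknown types."""
    basename, ext = os.path.splitext(source_filename)
    ext = ext.lower().lstrip(".")
    if ext not in ALLOWED_EXTS:
        raise ValueError(f"unsupported extension {ext!r} in {source_filename!r}")
    return basename, ext
```

A caller such as the doc 07 import wrapper could catch the exception and move the file to quarantine.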

### 3.3 Step 2 — Extract S<NN>E<MM> (TV / anime-seasonal only)

```python
import re

RE_SEASON_EPISODE = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,3})(?:-[Ee]?(\d{1,3}))?")
RE_NXNN = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{1,3})(?:-(\d{1,3}))?")
# The long form never carries an end episode; the empty third group keeps
# m.group(3) valid (it is always "") so all three patterns share one code path.
RE_LONG_FORM = re.compile(r"Season\s*(\d{1,2})\s*Episode\s*(\d{1,3})()", re.I)

# try the canonical form first, then the alternative forms, before giving up
m = (RE_SEASON_EPISODE.search(basename)
     or RE_NXNN.search(basename)
     or RE_LONG_FORM.search(basename))
if m is None:
    raise ValueError(f"no season/episode marker in {basename!r}")

season = f"{int(m.group(1)):02d}"
episode = f"{int(m.group(2)):02d}"
episode_end = f"{int(m.group(3)):02d}" if m.group(3) else None
```

If no S/E found and `--type tv|anime-seasonal`, error out — the file can
only be normalized if season/episode are recoverable.

### 3.4 Step 3 — Extract episode title

After step 2, the matched span is the boundary. Episode title is the text
**between** the SxxExx end and the **first** of: `[`, `(`, a trailing
group-tag delimiter, or end-of-string.

```python
after_se = basename[m.end():]
# cut at the first bracket or at a trailing " - GROUP" tag
title_part = re.split(r"[\[\(]|\s-\s[A-Z][A-Z0-9]+$", after_se, maxsplit=1)[0]
# strip any leading/trailing separators
title_part = title_part.strip(" -._")
```

If the title-part is empty after strip, leave it empty (script emits no
trailing title — `Show (Year) - S01E01.mkv` is still canonical when no
title is known).

### 3.5 Step 4 — Extract series / movie title (from parent folder)

The **parent folder name** is the source of truth for series/movie title,
not the filename, because torrents commonly have inconsistent
filename-prefixes within the same folder (`Show.S01E01.x264.mkv` vs
`Show Title - S01E02.mkv`).

```python
parent = os.path.basename(os.path.dirname(source_path))
# strip group tags and quality from the parent folder too
clean_parent = strip_noise(parent)
# extract year if present
year_match = re.search(r"\((\d{4})\)", clean_parent)
year = year_match.group(1) if year_match else None
title = re.sub(r"\s*\(\d{4}\).*$", "", clean_parent).strip()
```

Edge case: parent folder is `Season 01` (TV) — recurse one more level up
to the show folder. The script handles N levels of `Season \d+` parents.

### 3.6 Step 5 — Detect edition tokens (Movies only)

After § 2.3 strips edition tags from the noise pool, scan the **original**
basename for canonical edition keywords:

```python
# Listed in priority order; dicts preserve insertion order, so iterating
# EDITIONS.items() checks the highest-priority edition first.
EDITIONS = {
    r"director'?s?[\.\s_-]*cut": "Director's Cut",
    r"final[\.\s_-]*cut": "Final Cut",
    r"extended[\.\s_-]*(?:cut|edition)?": "Extended",
    r"theatrical(?:[\.\s_-]*cut)?": "Theatrical",
    r"imax": "IMAX",
    r"unrated": "Unrated",
    r"remaster(?:ed)?": "Remastered",
    r"\bDC\b": "Director's Cut",  # DC shorthand
    r"\bEE\b": "Extended",        # EE shorthand
}
```

Match the first one found, in priority order (Director's Cut > Final Cut
> Extended > Theatrical > IMAX > Unrated > Remastered). Emit as
` - <Edition>` between title-year block and extension.
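The priority rule can be sketched with an ordered list of (pattern, label) pairs. This is an illustration, not the § 11 implementation: `EDITION_PRIORITY` and `detect_edition` are hypothetical names, the patterns are simplified, and the `DC` / `EE` shorthands are left out because the priority order above does not place them.

```python
import re
from typing import Optional

# (pattern, label) pairs in the § 3.6 priority order; first match wins.
EDITION_PRIORITY = [
    (r"director'?s?[\.\s_-]*cut", "Director's Cut"),
    (r"final[\.\s_-]*cut", "Final Cut"),
    (r"extended", "Extended"),
    (r"theatrical", "Theatrical"),
    (r"imax", "IMAX"),
    (r"unrated", "Unrated"),
    (r"remaster(?:ed)?", "Remastered"),
]


def detect_edition(original_basename: str) -> Optional[str]:
    """Return the highest-priority edition label found, or None."""
    for pattern, label in EDITION_PRIORITY:
        if re.search(pattern, original_basename, re.IGNORECASE):
            return label
    return None
```

Because the loop walks the list in priority order, a name carrying both `Final.Cut` and `Remastered` resolves to `Final Cut` regardless of where each token appears in the filename.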

### 3.7 Step 6 — Collapse, trim, re-emit canonical

```python
def emit_canonical(schema, parts):
    if schema == "movie":
        if parts.edition:
            return f"{parts.title} ({parts.year}) - {parts.edition}.{parts.ext}"
        return f"{parts.title} ({parts.year}).{parts.ext}"
    if schema in ("tv", "anime-seasonal"):
        ep_range = f"S{parts.season}E{parts.episode}"
        if parts.episode_end:
            ep_range += f"-E{parts.episode_end}"
        if parts.episode_title:
            return f"{parts.title} ({parts.year}) - {ep_range} - {parts.episode_title}.{parts.ext}"
        return f"{parts.title} ({parts.year}) - {ep_range}.{parts.ext}"
    if schema == "anime-absolute":
        suffix = f" [{parts.subdub}]" if parts.subdub else ""
        return f"{parts.title} - {parts.absolute_number} - {parts.episode_title}{suffix}.{parts.ext}"
    if schema == "musicvideo":
        variant = f" [{parts.variant}]" if parts.variant else ""
        return f"{parts.year} - {parts.track_title}{variant}.{parts.ext}"
    if schema == "standup":
        return f"{parts.performer} - {parts.title} ({parts.year}).{parts.ext}"
    # fail loudly instead of silently returning None for an unhandled schema
    raise ValueError(f"unsupported schema: {schema!r}")
```

After emission, run § 5.5 forbidden-character substitution, then § 5.6
double-space collapse, one final time.

---

## 4. Folder normalization

The same rules as filenames, applied to directory names, with a few
schema-specific adjustments.

### 4.1 Show folder — `<Show> (<Year>)`

```
Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/    → Futurama (1999)/
The Office US S01-S09 1080p WEB-DL/                  → The Office (2005)/
[YIFY] Inception 2010 1080p BRRip x264/              → Inception (2010)/    ← but this is movies
Cowboy.Bebop.1998.Complete.BluRay.x265.10bit/        → Cowboy Bebop (1998)/
```

Year: derived from the metadata provider (TVDB/TMDB) on first scrape, or
from the user-supplied `--year` flag. If neither is available,
`normalize.py --type tv` errors out and asks for `--year`. Year guessing
from parent-folder-numbers is unsafe (`Star Trek 2009` is the movie, not
the series).
|
|||
|
|
|
|||
|
|
### 4.2 Season folder — `Season <NN>`
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Season 1/ → Season 01/
|
|||
|
|
Season1/ → Season 01/
|
|||
|
|
Season.01/ → Season 01/
|
|||
|
|
S01/ → Season 01/
|
|||
|
|
SEASON 1 [1080p WEB Joy]/ → Season 01/
|
|||
|
|
Season 01 - Pilot Season/ → Season 01/ ← drop subtitle suffixes
|
|||
|
|
Season 01 [BluRay]/ → Season 01/
|
|||
|
|
Specials/ → Season 00/
|
|||
|
|
Season 0/ → Season 00/
|
|||
|
|
Extras/ → Season 00/ ← only if treated-as-specials
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Doc 05 § 2.3 is explicit: `Specials/`, `Season 0/`, `Season Specials/` do
|
|||
|
|
not match the parser. `Season 00` is the only correct form.
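
The season-folder mapping above can be sketched as a small function. This is an illustrative sketch only: `canonical_season_folder` and its `extras_as_specials` flag are hypothetical names, not part of `normalize.py`, and the regex is an assumption about which source spellings are worth accepting.

```python
import re
from typing import Optional

# Hypothetical helper (not in normalize.py): accepts "Season 1", "Season1",
# "Season.01", "S01" and ignores anything after the number.
_SEASON_RE = re.compile(r"^(?:season|s)[\s._-]*(\d{1,3})(?!\d)", re.IGNORECASE)
_SPECIALS = {"specials", "season specials"}


def canonical_season_folder(name: str, extras_as_specials: bool = False) -> Optional[str]:
    """Return "Season NN" for a season-like folder name, else None."""
    stripped = name.strip().rstrip("/")
    lowered = stripped.lower()
    if lowered in _SPECIALS or (extras_as_specials and lowered == "extras"):
        return "Season 00"
    m = _SEASON_RE.match(stripped)
    if m is None:
        return None  # not a season folder; leave it to the other rules
    return f"Season {int(m.group(1)):02d}"
```

Returning `None` rather than guessing keeps show folders such as `Futurama (1999)/` out of this rule; `Extras/` is only mapped when the caller opts in, matching the "only if treated-as-specials" note.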
|
|||
|
|
|
|||
|
|
### 4.3 Movie folder — `<Title> (<Year>)`
|
|||
|
|
|
|||
|
|
Same rules as the filename without the extension. The folder name MUST
|
|||
|
|
byte-for-byte match the filename prefix when multi-version files are
|
|||
|
|
present (doc 05 § 1.2 — Jellyfin requires this).
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
[YIFY] Blade Runner 1982 1080p BRRip x264 AAC-RARBG/ → Blade Runner (1982)/
|
|||
|
|
Blade.Runner.2049.2017.2160p.UHD.BluRay.x265.10bit.HDR.DV.DTS-HD.MA.7.1-FreetheFish/
|
|||
|
|
→ Blade Runner 2049 (2017)/
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 4.4 Music-video artist folder — `<Artist>`
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Daft.Punk/ → Daft Punk/
|
|||
|
|
[Daft Punk]/ → Daft Punk/
|
|||
|
|
DAFT PUNK Discography/ → Daft Punk/ ← note: "Discography" is dropped; this is video lib not music
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 4.5 Special-features subfolders
|
|||
|
|
|
|||
|
|
Inside an item folder, only these subfolder names are recognised by
|
|||
|
|
Jellyfin (doc 05 § 8.2). The normalizer must rename source folders to
|
|||
|
|
the canonical lowercase form:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
BTS/ → behind the scenes/
|
|||
|
|
Behind-the-Scenes/ → behind the scenes/
|
|||
|
|
behind_the_scenes/ → behind the scenes/
|
|||
|
|
Featurettes/ → featurettes/
|
|||
|
|
DELETED SCENES [Joy]/ → deleted scenes/
|
|||
|
|
Trailers/ → trailers/
|
|||
|
|
Interviews/ → interviews/
|
|||
|
|
Bonus Content/ → extras/ ← catch-all
|
|||
|
|
Bonus_Features/ → extras/
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Files inside featurettes/ etc.** keep human-readable titles but get
|
|||
|
|
their group tags stripped:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv
|
|||
|
|
→ featurettes/Welcome to the World of Tomorrow.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The special-features filename itself uses smart title case (§ 5.1); only
the folder name is forced to lowercase.
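
A minimal sketch of the folder mapping above, assuming it is only called on folders already known to hold extras. The function name and the alias sets are illustrative assumptions; only the canonical names come from this section.

```python
import re

# Canonical names from the mapping above; the alias sets are illustrative.
_CANONICAL = {
    "behind the scenes": {"bts", "behind the scenes"},
    "featurettes": {"featurettes", "featurette"},
    "deleted scenes": {"deleted scenes"},
    "trailers": {"trailers", "trailer"},
    "interviews": {"interviews", "interview"},
}


def canonical_extras_folder(name: str) -> str:
    """Map a source extras folder name to its canonical lowercase form."""
    # Drop bracketed tags, then treat dots, dashes and underscores as spaces.
    key = re.sub(r"\[[^\]]*\]", " ", name)
    key = re.sub(r"[\s._-]+", " ", key).strip().lower()
    for canonical, aliases in _CANONICAL.items():
        if key in aliases:
            return canonical
    return "extras"  # catch-all, as for "Bonus Content/"
```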
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 5. Case + character handling
|
|||
|
|
|
|||
|
|
### 5.1 Smart title case
|
|||
|
|
|
|||
|
|
Capitalize every word EXCEPT these "small words" (when not the first or
|
|||
|
|
last word of the title):
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
a, an, and, as, at, but, by, for, from, in, into, nor, of, on, or, the,
|
|||
|
|
to, up, vs, vs., via, with, yet
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Words that look like acronyms (`I.B.M.`, `C.I.A.`, `T.M.N.T.`) are
|
|||
|
|
preserved as-is. Roman numerals (`II`, `III`, `IV`, `IX`) are uppercased.
|
|||
|
|
|
|||
|
|
#### Examples
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
the lord of the rings the two towers → The Lord of the Rings the Two Towers ← BAD
|
|||
|
|
the lord of the rings: the two towers → The Lord of the Rings - The Two Towers ← GOOD (`:` → ` - `, the second `the` is at start of subtitle, capitalize)
|
|||
|
|
return of the king → Return of the King
|
|||
|
|
star trek ii: the wrath of khan → Star Trek II - The Wrath of Khan
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The subtitle-after-colon special case is important: when a `: ` is
|
|||
|
|
substituted with ` - `, the word after the dash is a new "first word" for
|
|||
|
|
title-casing purposes. The script handles this by re-running the
|
|||
|
|
title-caser on each ` - ` separated chunk.
|
|||
|
|
|
|||
|
|
Jellyfin's parser is case-insensitive — this is purely for human readers.
|
|||
|
|
|
|||
|
|
### 5.2 Hyphen / dash normalization
|
|||
|
|
|
|||
|
|
| Char | Code | Used for |
|---|---|---|
| `-` | U+002D HYPHEN-MINUS | ASCII hyphen, the only canonical form for filenames |
| `–` | U+2013 EN DASH | Forbidden in filenames; replace with `-` |
| `—` | U+2014 EM DASH | Forbidden; replace with `-` |
| `−` | U+2212 MINUS SIGN | Forbidden; replace with `-` |
|
|||
|
|
|
|||
|
|
Unicode dashes appear from copy-paste of articles (Wikipedia loves the en
|
|||
|
|
dash). They're invisible-ish in `ls`, but they break grep, shell
|
|||
|
|
completion, and SMB transfers.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Spider–Man (2002).mkv → Spider-Man (2002).mkv
|
|||
|
|
Spider — Man (2002).mkv → Spider - Man (2002).mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 5.3 Apostrophes / quotes
|
|||
|
|
|
|||
|
|
| Char | Code | Status |
|---|---|---|
| `'` | U+0027 APOSTROPHE | Canonical; ASCII straight quote |
| `’` | U+2019 RIGHT SINGLE QUOTATION MARK | Forbidden in filenames; replace with `'` |
| `‘` | U+2018 LEFT SINGLE QUOTATION MARK | Forbidden; replace with `'` |
| `"` | U+0022 QUOTATION MARK | Forbidden in filenames (Windows-illegal); strip entirely |
| `“` | U+201C LEFT DOUBLE QUOTATION MARK | Forbidden; strip |
| `”` | U+201D RIGHT DOUBLE QUOTATION MARK | Forbidden; strip |
|
|||
|
|
|
|||
|
|
Curly quotes break SMB shares (Windows clients see `?` and refuse to open
|
|||
|
|
the file) and break shell escaping in scripts.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Don't Stop Believin'.mkv ← GOOD
|
|||
|
|
Don’t Stop Believin’.mkv ← BAD (curly), normalize to straight
|
|||
|
|
"It's a Wonderful Life" (1946).mkv ← BAD (double quotes), strip them entirely:
|
|||
|
|
It's a Wonderful Life (1946).mkv ← GOOD
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 5.4 Diacritics / non-ASCII
|
|||
|
|
|
|||
|
|
`ext4` is UTF-8 native; Jellyfin's parser is UTF-8 native; the HTTP API
|
|||
|
|
serves UTF-8 happily. **Keep diacritics** when the title's accepted
|
|||
|
|
spelling uses them.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Amélie (2001)/Amélie (2001).mkv ← GOOD
|
|||
|
|
Pokémon (1997)/Season 01/Pokémon (1997) - S01E01 - Pokémon - I Choose You!.mkv ← GOOD
|
|||
|
|
Léon - The Professional (1994)/Léon - The Professional (1994).mkv ← GOOD
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Doc 05 § 0 rule 4 advises caution: prefer the ASCII title when "well
|
|||
|
|
known" (e.g. `Amelie (2001)` over `Amélie (2001)`). For this deploy with
|
|||
|
|
LAN-only HTTP and `ext4`, full Unicode is safe — but the rule of thumb
|
|||
|
|
remains: if Wikipedia's English page uses the accent, keep it; if not,
|
|||
|
|
drop it.
|
|||
|
|
|
|||
|
|
**Tested:** Jellyfin's filename matching, `Items?searchTerm=`, and NFO
|
|||
|
|
`<title>` round-trip correctly with `é`, `ñ`, `ü`, `ß`, `ø`, `ł`, `ż`,
|
|||
|
|
`日`, `한` on this deploy. Verified against the Futurama Polish-dubbed
|
|||
|
|
corpus.
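
One practical wrinkle with keeping diacritics: the same visible title can be encoded in two ways, and a byte-oriented filesystem treats them as different names. The sketch below illustrates why the reference script applies NFC normalization; `same_name` is a hypothetical helper used only for this illustration.

```python
import unicodedata

composed = "Am\u00e9lie (2001)"      # é as a single code point (NFC form)
decomposed = "Ame\u0301lie (2001)"   # e followed by a combining acute accent (NFD form)

# The two strings render identically but are different code point sequences.
assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed


def same_name(a: str, b: str) -> bool:
    """Compare two filenames after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Normalizing to NFC before computing the target path means a source name in decomposed form (common when files pass through macOS) converges on the same canonical name as its composed twin.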
|
|||
|
|
|
|||
|
|
### 5.5 Forbidden-char substitution table
|
|||
|
|
|
|||
|
|
Windows-illegal: `< > : " / \ | ? *`. Linux itself forbids only `/` and
NUL. Substitute as follows:
|
|||
|
|
|
|||
|
|
| Char | Substitute | Rationale |
|---|---|---|
| `:` | ` - ` (space-hyphen-space) | Most common in titles (`Star Trek II: The Wrath of Khan`); ` - ` is a clean replacement that title-casing handles |
| `/` | ` and ` | Appears mainly in episode-title lists for two-part episodes (`Mr. & Mrs. Smith` uses `&`, which needs no substitution). Avoid if both halves stand on their own. |
| `\` | omit | No legitimate use in titles |
| `<` | `(` | Rare; `<` in titles is parenthetical |
| `>` | `)` | Same |
| `\|` | omit (or `-`) | Rare; sometimes in `Tom \| Jerry` style logo-text |
| `?` | omit | Common in `Who Killed the Robber?`; drop the question mark, keep meaning |
| `*` | omit | Rare; usually censored profanity |
| `"` | omit | Per § 5.3 |
| `\0` (NUL) | error | Filesystem hard-block; surface to user |
|
|||
|
|
|
|||
|
|
#### Examples
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Star Trek II: The Wrath of Khan (1982) → Star Trek II - The Wrath of Khan (1982)
|
|||
|
|
Mr. & Mrs. Smith (2005) → Mr. & Mrs. Smith (2005) (no change; & is fine)
|
|||
|
|
Who Killed the Robber? (1987) → Who Killed the Robber (1987)
|
|||
|
|
Tom & Jerry: The Movie (1992) → Tom & Jerry - The Movie (1992)
|
|||
|
|
```
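
The substitution table can be applied in one pass with `str.translate`. The sketch below is an assumption-laden illustration rather than the script's actual code path: `substitute_forbidden` is a hypothetical name, `|` uses the "omit" option, and the NUL check implements the "error" row.

```python
# Substitutions copied from the table above; "|" uses the "omit" option.
_SUBSTITUTIONS = str.maketrans({
    ":": " - ", "/": " and ", "\\": "", "<": "(", ">": ")",
    "|": "", "?": "", "*": "", '"': "",
})


def substitute_forbidden(title: str) -> str:
    """Replace or drop Windows-illegal characters; reject NUL outright."""
    if "\0" in title:
        # NUL cannot appear in a filename; surface it instead of guessing.
        raise ValueError(f"NUL byte in title: {title!r}")
    out = title.translate(_SUBSTITUTIONS)
    # ": " turns into " -  " (two spaces), so collapse whitespace runs.
    return " ".join(out.split())
```

Raising on NUL instead of silently dropping it follows the table's intent that the user sees the problem.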
|
|||
|
|
|
|||
|
|
### 5.6 Whitespace canonicalization
|
|||
|
|
|
|||
|
|
After all substitutions:
|
|||
|
|
|
|||
|
|
1. Collapse runs of `\s+` to a single space.
|
|||
|
|
2. `strip()` leading/trailing whitespace.
|
|||
|
|
3. Collapse double-`-` (which can result from `Title -- Subtitle`) to
|
|||
|
|
single `-`.
|
|||
|
|
4. Trim trailing punctuation before extension: `Title -.mkv` → `Title.mkv`.
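
The four steps can be sketched as follows. This is a hedged illustration, assuming the input is a filename that has an extension; `canonicalize_whitespace` is a hypothetical name, and the script's own `collapse_whitespace` operates on the stem only.

```python
import re


def canonicalize_whitespace(filename: str) -> str:
    """Apply steps 1-4 to the stem and re-attach the extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:                                   # no extension present
        stem, ext = filename, ""
    stem = re.sub(r"\s+", " ", stem).strip()      # steps 1 and 2
    stem = re.sub(r"-{2,}", "-", stem)            # step 3
    stem = stem.rstrip(" -._")                    # step 4
    return f"{stem}.{ext}" if ext else stem
```

Splitting off the extension first is what lets step 4 trim the dangling ` -` in `Title -.mkv` without touching the `.mkv` itself.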
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 6. Year disambiguation — concrete examples
|
|||
|
|
|
|||
|
|
Jellyfin's TMDB/TVDB scrape uses the year in `(YYYY)` to filter
|
|||
|
|
candidates. With multiple titles of the same name, the year is the *only*
|
|||
|
|
disambiguator before falling back to provider IDs.
|
|||
|
|
|
|||
|
|
### 6.1 Without year — what goes wrong
|
|||
|
|
|
|||
|
|
Filename: `Cinderella.mkv` (no year, no folder year).
|
|||
|
|
|
|||
|
|
Jellyfin sends "Cinderella" to TMDB. TMDB returns 12+ matches:
|
|||
|
|
- Cinderella (1950) — Disney animated
|
|||
|
|
- Cinderella (2015) — Disney live action
|
|||
|
|
- Cinderella (2021) — Camila Cabello musical
|
|||
|
|
- Cinderella (1965) — TV special
|
|||
|
|
- Cinderella (1899) — Méliès short
|
|||
|
|
|
|||
|
|
Jellyfin picks the one with the highest popularity score, which is the
|
|||
|
|
2015 live-action remake. If you wanted 1950, you have to manually edit.
|
|||
|
|
|
|||
|
|
### 6.2 With year — clean match
|
|||
|
|
|
|||
|
|
Filename: `Cinderella (1950).mkv` in folder `Cinderella (1950)/`.
|
|||
|
|
|
|||
|
|
Jellyfin sends `(title=Cinderella, year=1950)` to TMDB. TMDB returns the
|
|||
|
|
1950 animated film as the top match with high confidence. Scrape
|
|||
|
|
succeeds first try.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Movies/Cinderella (1950)/Cinderella (1950).mkv ← TMDB ID 11224 (animated)
|
|||
|
|
Movies/Cinderella (2015)/Cinderella (2015).mkv ← TMDB ID 150689 (live action)
|
|||
|
|
Movies/Cinderella (2021)/Cinderella (2021).mkv ← TMDB ID 587996 (musical)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 6.3 Same year — provider ID required
|
|||
|
|
|
|||
|
|
Filename: `Bad Movie (1980).mkv`. Two films named "Bad Movie" released in
|
|||
|
|
1980 (hypothetical). Year doesn't disambiguate. Add provider ID:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Movies/Bad Movie (1980) [imdbid-tt0080000]/Bad Movie (1980) [imdbid-tt0080000].mkv
|
|||
|
|
Movies/Bad Movie (1980) [imdbid-tt0080001]/Bad Movie (1980) [imdbid-tt0080001].mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 6.4 Year on TV shows
|
|||
|
|
|
|||
|
|
The same logic applies to series:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
TV/The Office (2001)/... ← UK original, BBC
|
|||
|
|
TV/The Office (2005)/... ← US remake, NBC
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Without year, Jellyfin picks one (usually the US one, higher TMDB
|
|||
|
|
popularity). With year, both work side-by-side.
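
To make the `(YYYY)` and provider-ID conventions from this section concrete, here is a small parsing sketch. The names `ItemName` and `parse_item_folder` are hypothetical, and the regex only reflects the shapes shown in this document, not Jellyfin's own parser.

```python
import re
from typing import NamedTuple, Optional

# Matches "Title (Year)" with an optional " [imdbid-...]" style suffix.
_ITEM_RE = re.compile(
    r"^(?P<title>.+?) \((?P<year>\d{4})\)"
    r"(?: \[(?P<provider>imdbid|tmdbid|tvdbid)-(?P<id>[^\]]+)\])?$"
)


class ItemName(NamedTuple):
    title: str
    year: int
    provider: Optional[str]
    provider_id: Optional[str]


def parse_item_folder(name: str) -> Optional[ItemName]:
    """Split a canonical folder name into title, year and provider ID."""
    m = _ITEM_RE.match(name)
    if m is None:
        return None  # no "(Year)": the scraper would have to guess
    return ItemName(m["title"], int(m["year"]), m["provider"], m["id"])
```

A `None` result is a useful pre-import check: it flags exactly the folders that would fall into the § 6.1 failure mode.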
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 7. Multi-version handling
|
|||
|
|
|
|||
|
|
When a single movie has multiple legitimate cuts (Director's Cut, Theatrical,
|
|||
|
|
Extended), or multiple resolutions (2160p HDR + 1080p SDR), Jellyfin groups
|
|||
|
|
them under one item with a "Version" picker in the UI.
|
|||
|
|
|
|||
|
|
### 7.1 Edition variants
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Movies/Blade Runner (1982)/
|
|||
|
|
├── Blade Runner (1982).mkv ← default (whichever is "the" version)
|
|||
|
|
├── Blade Runner (1982) - Director's Cut.mkv
|
|||
|
|
├── Blade Runner (1982) - Final Cut.mkv
|
|||
|
|
└── Blade Runner (1982) - Theatrical.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Jellyfin reads all four files and groups them into one library item
"Blade Runner (1982)" with four selectable versions. The unlabelled one
shows as "Default".
|
|||
|
|
|
|||
|
|
### 7.2 Resolution variants
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Movies/Blade Runner 2049 (2017)/
|
|||
|
|
├── Blade Runner 2049 (2017) - 2160p.mkv
|
|||
|
|
├── Blade Runner 2049 (2017) - 1080p.mkv
|
|||
|
|
└── Blade Runner 2049 (2017) - 720p.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Resolution labels ending in `p` or `i` sort descending by quality, so the
|
|||
|
|
2160p version is offered first. This is the *only* exception to "no
|
|||
|
|
resolution tags in filenames" (§ 1.1).
|
|||
|
|
|
|||
|
|
### 7.3 Mixed (edition × resolution)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Movies/Blade Runner 2049 (2017)/
|
|||
|
|
├── Blade Runner 2049 (2017) - Theatrical 2160p.mkv
|
|||
|
|
├── Blade Runner 2049 (2017) - Theatrical 1080p.mkv
|
|||
|
|
├── Blade Runner 2049 (2017) - Director's Cut 2160p.mkv
|
|||
|
|
└── Blade Runner 2049 (2017) - Director's Cut 1080p.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
This works in Jellyfin 10.10 — all four are grouped, the picker is a
|
|||
|
|
flat list with all four labels visible. Slight UX ugliness but parses
|
|||
|
|
cleanly. Avoid unless you genuinely have both axes of variation.
|
|||
|
|
|
|||
|
|
### 7.4 What does NOT work
|
|||
|
|
|
|||
|
|
- Sub-folders for variants:
|
|||
|
|
```
|
|||
|
|
Movies/Blade Runner 2049 (2017)/Theatrical/Blade Runner 2049 (2017).mkv ← BREAKS
|
|||
|
|
```
|
|||
|
|
  Jellyfin treats `Theatrical/` as an unknown extras subfolder and
  ignores the mkv inside it.
|
|||
|
|
- Different folder per cut:
|
|||
|
|
```
|
|||
|
|
Movies/Blade Runner 2049 (2017) Theatrical/Blade Runner 2049 (2017).mkv
|
|||
|
|
Movies/Blade Runner 2049 (2017) Director's Cut/Blade Runner 2049 (2017).mkv
|
|||
|
|
```
|
|||
|
|
This makes them two separate library items, not grouped versions.
|
|||
|
|
- Suffix without space-hyphen-space:
|
|||
|
|
```
|
|||
|
|
Blade Runner 2049 (2017).Theatrical.mkv ← BREAKS (no ` - ` separator)
|
|||
|
|
Blade Runner 2049 (2017)-Theatrical.mkv ← BREAKS (no spaces around `-`)
|
|||
|
|
```
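
The grouping rules in § 7 reduce to a simple check on each file against its parent folder. The sketch below encodes this document's rules (exact folder-name prefix, ` - ` separator); `version_label` is a hypothetical helper and is not derived from Jellyfin's source.

```python
from pathlib import Path
from typing import Optional


def version_label(path: Path) -> Optional[str]:
    """Return "" for the default version, the label of a " - " suffixed
    version, or None when the file would not be grouped."""
    folder = path.parent.name
    stem = path.stem
    if stem == folder:
        return ""                       # unlabelled default version
    prefix = folder + " - "
    if stem.startswith(prefix) and len(stem) > len(prefix):
        return stem[len(prefix):]       # e.g. "Director's Cut", "2160p"
    return None                         # sub-folder, wrong separator, etc.
```

Running this over a movie folder before import catches all three failure modes listed in § 7.4, since each of them breaks either the prefix match or the separator.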
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 8. Special-features filename rules
|
|||
|
|
|
|||
|
|
Files inside the recognised subfolders (`featurettes/`, `behind the
|
|||
|
|
scenes/`, `deleted scenes/`, `interviews/`, `trailers/`, etc.) follow
|
|||
|
|
these rules:
|
|||
|
|
|
|||
|
|
1. **Strip group tags** as in § 2.1.
|
|||
|
|
2. **Strip quality / codec / source / audio tags** as in § 2.3.
|
|||
|
|
3. **Smart title case** as in § 5.1.
|
|||
|
|
4. **Forbidden chars substituted** as in § 5.5.
|
|||
|
|
5. **Filename = the human-readable feature title.** No `(year)`, no
|
|||
|
|
`S01E01`. The parent folder type (e.g. `featurettes/`) is the type
|
|||
|
|
marker.
|
|||
|
|
6. Optional: append `-featurette` (or `-trailer`, `-behindthescenes`,
|
|||
|
|
etc.) suffix to be defensive about scraper edge cases. Doc 05 § 8.1
|
|||
|
|
shows this works AND § 8.2 shows the folder method works — using both
|
|||
|
|
is belt-and-braces.
|
|||
|
|
|
|||
|
|
#### Example
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv
|
|||
|
|
→
|
|||
|
|
featurettes/Welcome to the World of Tomorrow.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Or, if you want belt-and-braces:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
featurettes/Welcome to the World of Tomorrow-featurette.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Both parse. Pick **one** style per library and keep it consistent.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 9. Worked example — the live Futurama import
|
|||
|
|
|
|||
|
|
This is the example the owner asked for. Verified against the live media
|
|||
|
|
tree on nullstone (`/home/user/media/tv/Futurama/Season 01,02,03/`).
|
|||
|
|
|
|||
|
|
### 9.1 BEFORE (representative source dump)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
/home/admin/Downloads/futrama/
└── Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/
    ├── Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
    ├── Futurama S01E02 The Series Has Landed [1080p x265 10bit Joy].mkv
    ├── Futurama S01E03 I, Roommate [1080p x265 10bit Joy].mkv
    ├── Futurama S01E04 Love's Labours Lost in Space [1080p x265 10bit Joy].mkv
    ├── Futurama S01E05 Fear of a Bot Planet [1080p x265 10bit Joy].mkv
    ├── Futurama S01E06 A Fishful of Dollars [1080p x265 10bit Joy].mkv
    ├── Futurama S01E07 My Three Suns [1080p x265 10bit Joy].mkv
    ├── Futurama S01E08 A Big Piece of Garbage [1080p x265 10bit Joy].mkv
    ├── Futurama S01E09 Hell Is Other Robots [1080p x265 10bit Joy].mkv
    └── Featurettes/
        └── Welcome to the World of Tomorrow [1080p Joy].mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Note: the doubled space is real (`Futurama S01E01 Space Pilot 3000 [1080p`).
The rip is from a release group called "Joy" using "FS99" (FastSub
99); "AI" likely means AI-upscaled. None of that is library-relevant.
|
|||
|
|
|
|||
|
|
### 9.2 AFTER (canonical layout)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
/home/user/media/tv/
└── Futurama (1999)/
    ├── Season 01/
    │   ├── Futurama (1999) - S01E01 - Space Pilot 3000.mkv
    │   ├── Futurama (1999) - S01E02 - The Series Has Landed.mkv
    │   ├── Futurama (1999) - S01E03 - I, Roommate.mkv
    │   ├── Futurama (1999) - S01E04 - Love's Labours Lost in Space.mkv
    │   ├── Futurama (1999) - S01E05 - Fear of a Bot Planet.mkv
    │   ├── Futurama (1999) - S01E06 - A Fishful of Dollars.mkv
    │   ├── Futurama (1999) - S01E07 - My Three Suns.mkv
    │   ├── Futurama (1999) - S01E08 - A Big Piece of Garbage.mkv
    │   └── Futurama (1999) - S01E09 - Hell Is Other Robots.mkv
    └── featurettes/
        └── Welcome to the World of Tomorrow.mkv
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 9.3 Per-file rename mapping
|
|||
|
|
|
|||
|
|
| Before | After |
|---|---|
| `Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/` | `Futurama (1999)/Season 01/` |
| `Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E01 - Space Pilot 3000.mkv` |
| `Futurama S01E02 The Series Has Landed [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E02 - The Series Has Landed.mkv` |
| `Futurama S01E04 Love's Labours Lost in Space [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E04 - Love's Labours Lost in Space.mkv` |
| `Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv` | `featurettes/Welcome to the World of Tomorrow.mkv` |
|
|||
|
|
|
|||
|
|
Notes on specific titles:
|
|||
|
|
|
|||
|
|
- `I, Roommate` keeps the comma. Comma is legal on `ext4`, on Windows,
|
|||
|
|
and on every modern SMB client. No need to substitute.
|
|||
|
|
- `Love's Labours Lost in Space` keeps the straight ASCII apostrophe.
|
|||
|
|
  If the source had a curly `’`, § 5.3 normalizes it.
|
|||
|
|
- `Hell Is Other Robots` — `Is` is capitalized (it's not in the small-words
|
|||
|
|
list — the small-words list excludes `is`/`be`/`am`/`are`).
|
|||
|
|
|
|||
|
|
### 9.4 What the live tree currently has
|
|||
|
|
|
|||
|
|
Verified via `ssh user@192.168.0.100 'ls /home/user/media/tv/Futurama/'`:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Season 01
|
|||
|
|
Season 02
|
|||
|
|
Season 03
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The current live deploy uses folder name `Futurama/` (no year) — that's
|
|||
|
|
non-canonical per this doc. The canonical is `Futurama (1999)/`. This is
|
|||
|
|
covered in doc 07's migration plan (rename the folder, then `POST
|
|||
|
|
/Library/Refresh`). Mentioned here as a known drift; not fixed in this
|
|||
|
|
doc.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 10. Idempotency and safety
|
|||
|
|
|
|||
|
|
The `normalize.py` script in § 11 enforces these:
|
|||
|
|
|
|||
|
|
1. **No-op on already-canonical input.** When the script's emitted
|
|||
|
|
filename equals the source filename byte-for-byte, it does nothing
|
|||
|
|
and returns exit code 0. Re-running the script on an already-imported
|
|||
|
|
library is safe and free.
|
|||
|
|
|
|||
|
|
2. **No overwrite without `--force`.** When the target path exists and
|
|||
|
|
is not the source path, the script refuses to move and returns exit
|
|||
|
|
code 2. With `--force`, it moves and the target is overwritten.
|
|||
|
|
Without `--force`, the script suggests a numeric suffix
|
|||
|
|
(`Title (Year) (1).mkv`) and asks for confirmation.
|
|||
|
|
|
|||
|
|
3. **Default to dry-run.** The script prints what it would do to stdout
|
|||
|
|
and does NOT touch the filesystem unless `--apply` is passed. This is
|
|||
|
|
the inverse of the GNU convention (most tools default to apply,
|
|||
|
|
require `--dry-run` to preview) — chosen because the destructive
|
|||
|
|
case (a wrong rename of 100 files) is much worse than the boring
|
|||
|
|
case (one extra flag).
|
|||
|
|
|
|||
|
|
4. **Audit log** at `/var/log/jellyfin-imports/<YYYY-MM-DD>.log`. Every
|
|||
|
|
`--apply` run appends:
|
|||
|
|
```
|
|||
|
|
2026-05-08T14:23:11Z RENAME /home/admin/.../Futurama S01E01 ...joy].mkv -> /home/user/media/tv/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
|
|||
|
|
```
|
|||
|
|
Path is created (`mkdir -p /var/log/jellyfin-imports`) on first run if
|
|||
|
|
missing; user must have write permission.
|
|||
|
|
|
|||
|
|
5. **No deletes.** The script *moves* (`os.rename` on same FS, `shutil.move`
|
|||
|
|
across FS). It never `os.unlink`s. Garbage collection of source folders
|
|||
|
|
(after all files moved) is doc 07's job.
|
|||
|
|
|
|||
|
|
6. **Atomic per-file.** Each file's rename is one syscall on the same FS;
|
|||
|
|
on a different FS, `shutil.move` does copy-then-unlink which has a
|
|||
|
|
brief window where both source and target exist. The audit log records
|
|||
|
|
the operation regardless.
|
|||
|
|
|
|||
|
|
7. **Unicode-safe.** All paths handled as `pathlib.Path` (UTF-8 native on
|
|||
|
|
`ext4`). Curly-quote → straight-quote substitution happens BEFORE the
|
|||
|
|
target path is computed, so the target path is always ASCII-safe-ish
|
|||
|
|
(still UTF-8 for legitimate accents).
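
Rules 1, 2, 3 and 5 can be sketched as one guard around the move. This is a simplified illustration, not the script's actual control flow: `safe_move` and the status strings are hypothetical, and the audit log and interactive confirmation are left out.

```python
import shutil
from pathlib import Path

EXIT_OK, EXIT_CONFLICT = 0, 2


def safe_move(src: Path, dst: Path, apply: bool = False, force: bool = False) -> int:
    """Move src to dst under the dry-run and no-overwrite rules."""
    if src == dst:
        return EXIT_OK                       # rule 1: already canonical, no-op
    if dst.exists() and not force:
        print(f"SKIP   {src} -> {dst} (target exists)")
        return EXIT_CONFLICT                 # rule 2: refuse to overwrite
    if not apply:
        print(f"DRYRUN {src} -> {dst}")      # rule 3: preview only
        return EXIT_OK
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))          # rule 5: move, no explicit delete
    print(f"RENAME {src} -> {dst}")
    return EXIT_OK
```

Checking for a conflict before the dry-run branch means a preview run reports the same collisions an `--apply` run would hit.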
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 11. Reference implementation — `normalize.py`
|
|||
|
|
|
|||
|
|
Drop this at `/opt/docker/jellyfin/scripts/normalize.py` on nullstone.
|
|||
|
|
Run with Python 3.10+. Stdlib only — no external deps.
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
#!/usr/bin/env python3
|
|||
|
|
"""
|
|||
|
|
normalize.py — canonical filename normalizer for tv.s8n.ru
|
|||
|
|
|
|||
|
|
Per /tmp/jellyfin-stack/docs/08-filename-normalization.md.
|
|||
|
|
Safe by default: dry-run, no overwrite, no delete.
|
|||
|
|
"""
|
|||
|
|
|
|||
|
|
from __future__ import annotations
|
|||
|
|
|
|||
|
|
import argparse
|
|||
|
|
import datetime as dt
|
|||
|
|
import os
|
|||
|
|
import re
|
|||
|
|
import shutil
|
|||
|
|
import sys
|
|||
|
|
import unicodedata
|
|||
|
|
from dataclasses import dataclass, field
|
|||
|
|
from pathlib import Path
|
|||
|
|
from typing import Optional
|
|||
|
|
|
|||
|
|
LOG_DIR = Path("/var/log/jellyfin-imports")
|
|||
|
|
|
|||
|
|
# --- Stripping rules (doc § 2) -------------------------------------------------
|
|||
|
|
|
|||
|
|
GROUP_TAG_PATTERNS = [
|
|||
|
|
re.compile(r"\[[^\[\]]*\b(YIFY|YTS(\.\w+)?|RARBG|ettv|eztv|GalaxyRG\d*|"
|
|||
|
|
r"FitGirl|FitGirl\s*Repack|NOGRP|QxR|FreetheFish|psa|PSA|CMRG|"
|
|||
|
|
r"d3g|STRiFE|Pahe\.in|FoV|NTb|YOLO|KOGi|playWEB|REQ|XBET|FLUX|"
|
|||
|
|
r"NOSiVID|BGT|SVA|CRiMSON|ION10|ION265|BluPanda|H4S5S|Joy|"
|
|||
|
|
r"FS99\s*Joy|FS99|AI\s*x265|x265\s*\d+bit|\d+bit\s*x265)"
|
|||
|
|
r"[^\[\]]*\]", re.I),
|
|||
|
|
re.compile(r"\((YIFY|RARBG|NOGRP)\)", re.I),
|
|||
|
|
]
|
|||
|
|
|
|||
|
|
QUALITY_TOKENS = re.compile(
|
|||
|
|
r"(?<![A-Za-z0-9])("
|
|||
|
|
r"2160p|1080p|720p|480p|360p|4[Kk]|UHD|HD|SD|FHD|QHD|"
|
|||
|
|
r"WEB-DL|WEBDL|WEB\.DL|WEB|WEBRip|WEB-Rip|BluRay|BLURAY|Bluray|BDRip|"
|
|||
|
|
r"BRRip|BR-Rip|BDR|HDTV|HDTVRip|PDTV|DSR|DVDRip|DVD|DVDR|DVD9|DVD5|"
|
|||
|
|
r"HDDVD|HDDVDRip|HDRip|CAMRip|CAM|TS|HDTS|TC|TELESYNC|TELECINE|R5|"
|
|||
|
|
r"SCREENER|SCR|WORKPRINT|WP|PPV|PPVRip|"
|
|||
|
|
r"x264|x265|H\.?264|H\.?265|HEVC|AVC|VP9|AV1|XviD|DivX|"
|
|||
|
|
r"10bit|10-bit|8bit|8-bit|HDR10\+?|HDR|DV|Dolby\.?Vision|SDR|HFR|HQ|"
|
|||
|
|
r"DDP?5\.1|DDP?7\.1|DDP?2\.0|DD\+5\.1|DD\+7\.1|DTS-HD\.MA|DTS-HD|DTS-X|"
|
|||
|
|
r"DTSX|DTS|TrueHD|Atmos|AAC2\.0|AAC5\.1|AAC|AC3|AC-3|EAC3|E-AC3|"
|
|||
|
|
r"MP3|MP2|Opus|FLAC|PCM|LPCM|5\.1|7\.1|2\.0|Mono|Stereo|Multi|"
|
|||
|
|
r"PROPER|REPACK|iNTERNAL|INTERNAL|LIMITED|UNCUT|RERIP|REAL|READNFO|"
|
|||
|
|
r"RETAi?L|STV|REMUX|MULTi|MULTI|SUBBED|DUBBED|iNT"
|
|||
|
|
r")(?![A-Za-z0-9])", re.I)
|
|||
|
|
|
|||
|
|
URL_REF = re.compile(
|
|||
|
|
r"(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)",
|
|||
|
|
re.I)
|
|||
|
|
|
|||
|
|
TRAILING_GROUP = re.compile(r"-(?:NOGRP|EVO|RARBG|SPARKS|CMRG|NTb|FLUX|AMZN|"
|
|||
|
|
r"NF|DSNP|ATVP|MA|WEB|AAC2|FoV|KOGi|PLAYWEB|FRDS|"
|
|||
|
|
r"ZQ|PHOENiX|EZTV|NTG|iON|ION10|ION265|CtrlHD|"
|
|||
|
|
r"d3g|PSA|QxR|RZeroX|PMP|BTN|DEFLATE|BAE|MZABI|"
|
|||
|
|
r"TURG|Joy)\b", re.I)
|
|||
|
|
|
|||
|
|
LANG_TOKEN = re.compile(r"(?<![A-Za-z])\.?(en|eng|pl|pol|de|deu|fr|fra|es|spa|"
|
|||
|
|
r"it|ita|ja|jpn|jp|ru|rus|ko|kor|zh|chi)(?![A-Za-z])",
|
|||
|
|
re.I)
|
|||
|
|
|
|||
|
|
# Forbidden chars (§ 5.5)
|
|||
|
|
FORBIDDEN_CHARS = {
|
|||
|
|
":": " - ",
|
|||
|
|
"/": " and ",
|
|||
|
|
"\\": "",
|
|||
|
|
"<": "(",
|
|||
|
|
">": ")",
|
|||
|
|
"|": "",
|
|||
|
|
"?": "",
|
|||
|
|
"*": "",
|
|||
|
|
'"': "",
|
|||
|
|
"“": "", # left double quotation mark
|
|||
|
|
"”": "", # right double quotation mark
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
# Apostrophe normalization (§ 5.3)
|
|||
|
|
APOSTROPHES = {
|
|||
|
|
"‘": "'",
|
|||
|
|
"’": "'",
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
# Dashes (§ 5.2)
|
|||
|
|
DASHES = {
|
|||
|
|
"–": "-", # en dash
|
|||
|
|
"—": "-", # em dash
|
|||
|
|
"−": "-", # minus
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
# Editions (§ 3.6)
|
|||
|
|
EDITION_PATTERNS = [
|
|||
|
|
(re.compile(r"director'?s?[\.\s_-]*cut", re.I), "Director's Cut"),
|
|||
|
|
(re.compile(r"final[\.\s_-]*cut", re.I), "Final Cut"),
|
|||
|
|
(re.compile(r"extended[\.\s_-]*(?:cut|edition)?", re.I), "Extended"),
|
|||
|
|
(re.compile(r"theatrical(?:[\.\s_-]*cut)?", re.I), "Theatrical"),
|
|||
|
|
(re.compile(r"\bIMAX\b", re.I), "IMAX"),
|
|||
|
|
(re.compile(r"\bunrated\b", re.I), "Unrated"),
|
|||
|
|
(re.compile(r"remastere?d?", re.I), "Remastered"),
|
|||
|
|
(re.compile(r"(?<![A-Za-z])DC(?![A-Za-z])"), "Director's Cut"),
|
|||
|
|
(re.compile(r"(?<![A-Za-z])EE(?![A-Za-z])"), "Extended"),
|
|||
|
|
]
|
|||
|
|
|
|||
|
|
# Smart title case (§ 5.1)
|
|||
|
|
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "from",
|
|||
|
|
"in", "into", "nor", "of", "on", "or", "the", "to", "up",
|
|||
|
|
"vs", "vs.", "via", "with", "yet"}
|
|||
|
|
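# Restricted to I..XXXIX so ordinary words made of roman letters
# ("mix", "did", "dim", "civil") are not uppercased by mistake.
ROMAN_NUMERAL = re.compile(r"^(?=[ivx])x{0,3}(?:ix|iv|v?i{0,3})$", re.I)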
|
|||
|
|
|
|||
|
|
|
|||
|
|
def smart_title(s: str) -> str:
    """Title-case respecting small-words, acronyms and roman numerals."""
    if not s:
        return s
    chunks = re.split(r"(\s-\s)", s)  # split on space-dash-space (subtitle)
    out_chunks = []
    for chunk in chunks:
        if chunk == " - ":
            out_chunks.append(chunk)
            continue
        words = chunk.split(" ")
        result = []
        for i, w in enumerate(words):
            if not w:
                result.append(w)
                continue
            if ROMAN_NUMERAL.match(w):
                result.append(w.upper())
                continue
            if re.fullmatch(r"(?:[A-Za-z]\.){2,}", w):
                # dotted acronym (I.B.M.): preserved as-is per § 5.1
                result.append(w)
                continue
            lower = w.lower()
            if 0 < i < len(words) - 1 and lower in SMALL_WORDS:
                result.append(lower)
            else:
                # capitalize the first character, lowercase the rest
                result.append(w[0].upper() + w[1:].lower())
        out_chunks.append(" ".join(result))
    return "".join(out_chunks)
|
|||
|
|
|
|||
|
|
|
|||
|
|
def strip_noise(s: str) -> str:
    """Remove group tags, quality, urls, trailing groups."""
    for pat in GROUP_TAG_PATTERNS:
        s = pat.sub("", s)
    s = URL_REF.sub(" ", s)
    s = QUALITY_TOKENS.sub("", s)
    s = TRAILING_GROUP.sub("", s)
    return s


def normalize_chars(s: str) -> str:
    """Apply Unicode/forbidden-char substitutions."""
    for k, v in APOSTROPHES.items():
        s = s.replace(k, v)
    for k, v in DASHES.items():
        s = s.replace(k, v)
    for k, v in FORBIDDEN_CHARS.items():
        s = s.replace(k, v)
    # NFC normalization for diacritics (consistent encoding)
    s = unicodedata.normalize("NFC", s)
    return s


def collapse_whitespace(s: str) -> str:
    s = re.sub(r"\s+", " ", s)
    s = re.sub(r" - - ", " - ", s)
    s = re.sub(r"--+", "-", s)
    s = s.strip(" -._")
    return s
|
|||
|
|
|
|||
|
|
|
|||
|
|
# --- Schema-specific extraction ------------------------------------------------
|
|||
|
|
|
|||
|
|
@dataclass
class Parts:
    title: str = ""
    year: Optional[str] = None
    season: Optional[str] = None
    episode: Optional[str] = None
    episode_end: Optional[str] = None
    episode_title: str = ""
    edition: Optional[str] = None
    provider_id: Optional[str] = None
    ext: str = "mkv"
    absolute_number: Optional[str] = None
    subdub: Optional[str] = None
    track_title: str = ""
    variant: Optional[str] = None
    performer: str = ""


RE_SE = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,3})(?:-[Ee]?(\d{1,3}))?")
RE_NXM = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{1,3})(?:-(\d{1,3}))?")
RE_SEASON_EP = re.compile(r"Season\s*(\d{1,2})\s*Episode\s*(\d{1,3})", re.I)
RE_YEAR_PARENS = re.compile(r"\((\d{4})\)")
RE_PROVIDER_ID = re.compile(r"\[(?:imdbid|tmdbid|tvdbid)-[^\]]+\]")


def extract_year(s: str) -> Optional[str]:
    m = RE_YEAR_PARENS.search(s)
    if m:
        y = int(m.group(1))
        if 1888 <= y <= dt.date.today().year + 2:
            return m.group(1)
    return None


def extract_provider_id(s: str) -> Optional[str]:
    m = RE_PROVIDER_ID.search(s)
    return m.group(0) if m else None


def extract_se(s: str):
    m = RE_SE.search(s)
    if m:
        end = m.group(3) or None
        return (m, m.group(1), m.group(2), end)
    m = RE_NXM.search(s)
    if m:
        return (m, m.group(1), m.group(2), m.group(3))
    m = RE_SEASON_EP.search(s)
    if m:
        return (m, m.group(1), m.group(2), None)
    return (None, None, None, None)


def extract_edition(raw_basename: str) -> Optional[str]:
    for pat, name in EDITION_PATTERNS:
        if pat.search(raw_basename):
            return name
    return None


def parent_show_folder(p: Path) -> Path:
    """Walk up past Season XX folders until we find the show folder."""
    cur = p.parent
    while re.match(r"(?i)season\s*\d+|specials|extras", cur.name):
        cur = cur.parent
    return cur
|
|||
|
|
|
|||
|
|
|
|||
|
|
# --- Per-schema emit -----------------------------------------------------------
|
|||
|
|
|
|||
|
|
def normalize_movie(src: Path, year_hint: Optional[str] = None,
                    title_hint: Optional[str] = None) -> Path:
    raw = src.stem
    ext = src.suffix.lower().lstrip(".") or "mkv"
    edition = extract_edition(raw)
    provider_id = extract_provider_id(raw) or extract_provider_id(src.parent.name)
    cleaned = strip_noise(raw)
    # the provider ID is re-emitted below; keep it out of the title so that
    # re-running on an already-canonical name stays a no-op (§ 10 rule 1)
    cleaned = RE_PROVIDER_ID.sub("", cleaned)
    cleaned = normalize_chars(cleaned)
    cleaned = collapse_whitespace(cleaned)
    year = year_hint or extract_year(cleaned) or extract_year(src.parent.name)
    if year:
        cleaned = re.sub(r"\s*\(" + year + r"\)", "", cleaned).strip()
    # drop edition tokens from the title body (we re-emit them)
    for pat, _ in EDITION_PATTERNS:
        cleaned = pat.sub("", cleaned)
    cleaned = collapse_whitespace(cleaned)
    title = title_hint or smart_title(cleaned)
    if not year:
        raise ValueError(f"cannot determine year for movie: {src}")
    folder_name = f"{title} ({year})"
    if provider_id:
        folder_name += f" {provider_id}"
    file_basename = folder_name
    if edition:
        file_basename += f" - {edition}"
    return src.parent.parent / folder_name / f"{file_basename}.{ext}"


def normalize_tv(src: Path, year_hint: Optional[str] = None,
                 title_hint: Optional[str] = None,
                 schema: str = "tv") -> Path:
    # `schema` is accepted for the anime-seasonal call site; the output
    # shape is currently the same for both values.
    raw = src.stem
    ext = src.suffix.lower().lstrip(".") or "mkv"
    m, season, ep, ep_end = extract_se(raw)
    if m is None or not season:
        raise ValueError(f"no S/E token in TV file: {src}")
    season = f"{int(season):02d}"
    episode = f"{int(ep):02d}"
    episode_end = f"{int(ep_end):02d}" if ep_end else None
    # episode title = text after match, before next bracket
    after = raw[m.end():]
    title_part = re.split(r"[\[\(]", after, maxsplit=1)[0]
    title_part = strip_noise(title_part)
    title_part = normalize_chars(title_part)
    title_part = collapse_whitespace(title_part)
    title_part = re.sub(r"^[\s\-_\.]+", "", title_part)
    episode_title = smart_title(title_part) if title_part else ""
    # show title from parent folder
    show_folder = parent_show_folder(src)
    show_clean = strip_noise(show_folder.name)
    show_clean = normalize_chars(show_clean)
    show_clean = collapse_whitespace(show_clean)
    year = year_hint or extract_year(show_clean) or extract_year(src.parent.name)
    if year:
        # re.escape: --year is free-form user input
        show_clean = re.sub(r"\s*\(" + re.escape(year) + r"\).*$", "", show_clean).strip()
    show_clean = re.sub(r"(?i)\s*Season\s*\d+.*$", "", show_clean).strip()
    show = title_hint or smart_title(show_clean)
    if not year:
        raise ValueError(f"cannot determine year for TV show: {show_folder}")
    se_str = f"S{season}E{episode}"
    if episode_end:
        se_str += f"-E{episode_end}"
    file_base = f"{show} ({year}) - {se_str}"
    if episode_title:
        file_base += f" - {episode_title}"
    target_root = show_folder.parent  # e.g. /media/tv
    return target_root / f"{show} ({year})" / f"Season {season}" / f"{file_base}.{ext}"


def normalize_anime_absolute(src: Path, title_hint: Optional[str],
                             abs_num: Optional[int],
                             ep_title: str = "",
                             subdub: Optional[str] = None) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mkv"
    show_folder = parent_show_folder(src)
    show_clean = strip_noise(show_folder.name)
    show_clean = normalize_chars(show_clean)
    show = title_hint or smart_title(collapse_whitespace(show_clean))
    if abs_num is None:
        raise ValueError(f"absolute number required for {src}")
    suffix = f" [{subdub}]" if subdub else ""
    title_str = smart_title(ep_title) if ep_title else ""
    file_base = f"{show} - {abs_num:04d}"
    if title_str:
        file_base += f" - {title_str}"
    file_base += suffix
    return show_folder.parent / show / f"{file_base}.{ext}"


def normalize_musicvideo(src: Path, artist_hint: str, year_hint: str,
                         track_hint: Optional[str] = None,
                         variant: Optional[str] = None) -> Path:
    # main() passes "" for a missing flag; an empty artist or year would
    # otherwise yield a malformed target path.
    if not artist_hint or not year_hint:
        raise ValueError(f"--artist and --year are required for music video: {src}")
    ext = src.suffix.lower().lstrip(".") or "mp4"
    raw = src.stem
    cleaned = normalize_chars(strip_noise(raw))
    cleaned = collapse_whitespace(cleaned)
    track = track_hint or smart_title(cleaned)
    artist = smart_title(artist_hint)
    suffix = f" [{variant}]" if variant else ""
    return src.parent.parent / artist / f"{year_hint} - {track}{suffix}.{ext}"


def normalize_standup(src: Path, performer: str, title: str, year: str) -> Path:
    # Same guard as above: all three hints must be non-empty.
    if not (performer and title and year):
        raise ValueError(f"--performer, --title and --year are required for standup: {src}")
    ext = src.suffix.lower().lstrip(".") or "mkv"
    folder = f"{performer} - {title} ({year})"
    return src.parent.parent / folder / f"{folder}.{ext}"


# --- Driver --------------------------------------------------------------------

def is_already_canonical(src: Path, target: Path) -> bool:
    return src.resolve() == target.resolve()


def log_op(action: str, src: Path, target: Path):
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    log_file = LOG_DIR / f"{dt.date.today().isoformat()}.log"
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time.
    ts = dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    line = f"{ts} {action} {src} -> {target}\n"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(line)


def main():
    ap = argparse.ArgumentParser(description="canonical filename normalizer")
    ap.add_argument("source", type=Path, help="source file path")
    ap.add_argument("--type", required=True,
                    choices=["movie", "tv", "anime-seasonal",
                             "anime-absolute", "musicvideo", "standup",
                             "extra"])
    ap.add_argument("--year")
    ap.add_argument("--title")
    ap.add_argument("--performer", help="for standup")
    ap.add_argument("--artist", help="for musicvideo")
    ap.add_argument("--track", help="for musicvideo")
    ap.add_argument("--variant", help="for musicvideo")
    ap.add_argument("--abs-num", type=int, help="for anime-absolute")
    ap.add_argument("--ep-title", help="for anime-absolute")
    ap.add_argument("--subdub", choices=["Sub", "Dub"], help="for anime-absolute")
    ap.add_argument("--apply", action="store_true",
                    help="actually move the file (default is dry-run)")
    ap.add_argument("--force", action="store_true",
                    help="overwrite existing target")
    args = ap.parse_args()

    src = args.source.resolve()
    if not src.exists():
        print(f"ERROR: {src} does not exist", file=sys.stderr)
        sys.exit(1)

    try:
        if args.type == "movie":
            target = normalize_movie(src, args.year, args.title)
        elif args.type == "tv":
            target = normalize_tv(src, args.year, args.title, schema="tv")
        elif args.type == "anime-seasonal":
            target = normalize_tv(src, args.year, args.title, schema="anime")
        elif args.type == "anime-absolute":
            target = normalize_anime_absolute(src, args.title, args.abs_num,
                                              args.ep_title or "",
                                              args.subdub)
        elif args.type == "musicvideo":
            target = normalize_musicvideo(src, args.artist or "", args.year or "",
                                          args.track, args.variant)
        elif args.type == "standup":
            target = normalize_standup(src, args.performer or "",
                                       args.title or "", args.year or "")
        else:
            print(f"ERROR: schema '{args.type}' not implemented", file=sys.stderr)
            sys.exit(2)
    except ValueError as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(2)

    if is_already_canonical(src, target):
        print(f"NOOP    {src}")
        sys.exit(0)

    if target.exists() and not args.force:
        print(f"REFUSE  {src} -> {target} (target exists; use --force)")
        sys.exit(2)

    if args.apply:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target))
        log_op("RENAME", src, target)
        print(f"MOVED   {src} -> {target}")
    else:
        print(f"DRY-RUN {src} -> {target}")


if __name__ == "__main__":
    main()
```

### 11.1 Usage examples

```bash
# Dry-run a single Futurama episode. The source path carries no year,
# so it is supplied with --year.
./normalize.py --type tv --year 1999 \
  "/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv"

# Output:
# DRY-RUN /home/admin/Downloads/.../Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
#   -> /home/admin/Downloads/futrama/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv

# Same with --apply, with explicit year and title hints
./normalize.py --type tv --year 1999 --title "Futurama" --apply \
  "/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv"

# Movie with edition
./normalize.py --type movie --year 1982 --apply \
  "/home/admin/Downloads/Blade Runner 1982 Final Cut [1080p BluRay x265 RARBG].mkv"

# Stand-up
./normalize.py --type standup --performer "Bo Burnham" --title "Inside" --year 2021 --apply \
  "/home/admin/Downloads/Bo.Burnham.Inside.2021.1080p.NF.WEB-DL.DDP5.1.x264-NTb.mkv"

# Music video
./normalize.py --type musicvideo --artist "Daft Punk" --year 2013 \
  --track "Get Lucky" --apply \
  "/home/admin/Downloads/daft.punk.get.lucky.official.video.1080p.mkv"
```

### 11.2 Idempotency proof

Running the script twice on the same input produces the same target. The
second run's source is the first run's target, so `is_already_canonical()`
returns true and the script no-ops. This is not yet covered by unit tests;
`/opt/docker/jellyfin/scripts/test_normalize.py` is to be added in doc 07's
implementation phase.
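
Until that test file exists, the property can be illustrated with a reduced model. The sketch below is not `normalize.py`: `canonical_episode_stem()` and `RE_SE_DEMO` are hypothetical stand-ins that cover only the TV basename, chosen so the check "normalizing a canonical name yields the same name" can run without touching the filesystem.

```python
import re

# Hypothetical stand-in for the real normalizer, reduced to the TV file
# basename so the property can be checked without any file operations.
RE_SE_DEMO = re.compile(r"(?i)\bS(\d{1,2})E(\d{1,3})\b")


def canonical_episode_stem(stem: str, show: str, year: str) -> str:
    """Build 'Show (Year) - SxxEyy[ - Title]' from a loosely named stem."""
    m = RE_SE_DEMO.search(stem)
    if m is None:
        raise ValueError(f"no S/E token in {stem!r}")
    season, episode = int(m.group(1)), int(m.group(2))
    # Everything after the S/E token is treated as the episode title.
    title = stem[m.end():].strip(" -_.")
    base = f"{show} ({year}) - S{season:02d}E{episode:02d}"
    return f"{base} - {title}" if title else base


def is_idempotent(stem: str, show: str, year: str) -> bool:
    """True when normalizing the canonical form yields the same form again."""
    once = canonical_episode_stem(stem, show, year)
    twice = canonical_episode_stem(once, show, year)
    return once == twice
```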

---

## 12. Edge cases catalogue

### 12.1 Episodes with very long titles

```
The Office (2005) - S07E25-E26 - Search Committee.mkv  ← multi-ep, short title, fine
Sherlock (2010) - S04E03 - The Final Problem.mkv  ← long-ish, fine
Steins;Gate (2011) - S01E22 - Being Meltdown - The Concerto Whose Conductor Has Lost His Baton.mkv
```

The third example is 94 characters before the extension. `ext4` allows
255 bytes per filename component, so this fits. Smart title case is
applied. There is no `:` to replace: the long string is the actual title
as listed on MyAnimeList. If a title does have a colon, it becomes ` - `
per § 5.5, which lengthens the name slightly; no length cap is applied.
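
Because the limit is counted in bytes, not characters, titles with diacritics or CJK text reach it sooner than their visible length suggests. A minimal sketch of the check, assuming UTF-8 filenames; `component_bytes()` and `fits_filesystem()` are illustrative helpers and not part of `normalize.py`:

```python
MAX_COMPONENT_BYTES = 255  # per-component limit on ext4, as stated above


def component_bytes(name: str) -> int:
    """Size of one path component as the filesystem stores it (UTF-8)."""
    return len(name.encode("utf-8"))


def fits_filesystem(name: str, limit: int = MAX_COMPONENT_BYTES) -> bool:
    """True when a single file or folder name is within the byte limit."""
    return component_bytes(name) <= limit
```

The check applies to each component separately (the show folder, the season folder, the file name), not to the full path.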

### 12.2 Episodes with `.` in the title

```
Mr. Robot (2015) - S01E01 - eps1.0_hellofriend.mov.mkv  ← title contains `.mov`
```

`.mov` inside the title is a substring that merely *looks* like a
container extension. The parser doesn't care: only the final extension
(`.mkv`) counts. Keep as-is. Smart title case leaves the lowercase form
alone, because it is the title's actual stylization.

### 12.3 Shows with numeric titles

```
1923 (2022) - S01E01 - 1923.mkv  ← year-as-title, year-as-disambiguation
24 (2001) - S01E01 - Day 1 - 12-00 AM-1-00 AM.mkv  ← `:` from title became ` - `
```

Year extraction fails for `24` and `1923` when the show year is omitted
from the folder name (for `1923` it can also mistake the title for the
year). The `--year` hint is mandatory for these.
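
The failure mode is easy to reproduce with a naive pattern. This is a sketch under an assumption: `RE_YEAR_DEMO` is a guess at a typical "bare four-digit year" regex, and the real `extract_year()` in § 11 may be stricter.

```python
import re
from typing import Optional

# Assumed pattern: a bare four-digit year between 1900 and 2099.
# The script's own extract_year() may differ.
RE_YEAR_DEMO = re.compile(r"\b(19\d{2}|20\d{2})\b")


def guess_year(folder_name: str) -> Optional[str]:
    """Return the first year-looking token in the name, or None."""
    m = RE_YEAR_DEMO.search(folder_name)
    return m.group(1) if m else None
```

With this pattern a folder named `24` yields no year at all, and a folder named `1923` or `1923 (2022)` yields `1923`, the title, instead of the release year. An explicit `--year` sidesteps both.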

### 12.4 Two-part single episodes (multi-part files)

Doc 05 § 2 mentions `Series A S02E03 Part 1.mkv` / `Part 2.mkv`. Canonical:

```
TV/Show (Year)/Season 02/Show (Year) - S02E03 - Title - part 1.mkv
TV/Show (Year)/Season 02/Show (Year) - S02E03 - Title - part 2.mkv
```

Use lowercase `part` (Jellyfin parser is case-insensitive but lowercase
is more common in docs).
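
The script in § 11 as shown does not emit this suffix. One possible way to derive it is sketched below; `RE_PART_DEMO` and `lowercase_part_suffix()` are hypothetical names, and the accepted separators between `part` and the number are an assumption.

```python
import re

# Hypothetical helper, not part of normalize.py: finds "Part N" in any
# letter case, with an optional space, dot, underscore or dash before N.
RE_PART_DEMO = re.compile(r"(?i)\bpart[ ._-]?(\d+)\b")


def lowercase_part_suffix(stem: str) -> str:
    """Return ' - part N' for a multi-part stem, or '' when there is none."""
    m = RE_PART_DEMO.search(stem)
    return f" - part {int(m.group(1))}" if m else ""
```

Appending the returned string to the canonical basename gives the shape shown above. Stems that join words with underscores (`Show_Part_1`) are not matched by this pattern, because `_` counts as a word character for `\b`.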

### 12.5 Source has no episode title

```
Source: Show.S01E01.1080p.WEB-DL.x264-NTb.mkv

Target: Show (Year) - S01E01.mkv
```

Empty episode title → omit. The script does this already: `normalize_tv()`
in § 11 appends the title only when `episode_title` is non-empty. Jellyfin
will backfill the title from TVDB on first scrape.

### 12.6 Source has WRONG episode title

If the rip's episode title is different from TVDB's canonical (e.g. a
Polish translation of an English-language show, or a non-canonical
sub-group title), prefer the **TVDB title** (English, official). This
requires manual intervention. The script's `--ep-title` flag is only
wired up for `--type anime-absolute`; for other types, edit the filename
after the rename. Not automated.

### 12.7 Dual-audio (sub+dub in one file)

If the mkv has both audio tracks, omit the `[Sub]`/`[Dub]` suffix:

```
Anime/One Piece/One Piece - 0001 - I'm Luffy.mkv  ← dual audio in container
```

The user can pick the audio track from the player. The filename only
needs to disambiguate when *separate files* exist.

### 12.8 Mid-season hiatus / split seasons

Some shows split a single season into "Part 1" and "Part 2" (Better Call
Saul, Stranger Things). Treat as **one season**:

```
TV/Stranger Things (2016)/Season 04/
├── Stranger Things (2016) - S04E01 - The Hellfire Club.mkv  ← Vol 1
├── ...
├── Stranger Things (2016) - S04E07 - The Massacre at Hawkins Lab.mkv  ← Vol 1 finale
├── Stranger Things (2016) - S04E08 - Papa.mkv  ← Vol 2 start
└── Stranger Things (2016) - S04E09 - The Piggyback.mkv  ← Vol 2 finale
```

TVDB lists S04 as one season, episodes 1-9. The hiatus is invisible to
the parser. Don't create `Season 04 Part 1/`.

---

## 13. Verification checklist (doc 07 will use this)

Before declaring a normalized file "imported":

1. Filename matches the canonical regex for its category (§ 1).
2. No forbidden chars (§ 5.5) in any part of the path.
3. No group tags / quality / codec / source / audio tags in the basename
   (§ 2).
4. Folder structure matches § 1.x for the category.
5. Year is in `(YYYY)` and matches the actual release year (movies/TV).
6. `Season NN/` is zero-padded (TV / anime-seasonal).
7. Episode S/E numbers zero-padded to two digits (three for >99).
8. Smart title case applied to all title-bearing components.
9. Apostrophes are ASCII (`'`), dashes are ASCII (`-`).
10. Diacritics in NFC form (UTF-8 encoded canonically).
11. The script's `is_already_canonical()` returns true on the result:
    re-running the normalizer leaves the file untouched.
12. Audit log line written to `/var/log/jellyfin-imports/<date>.log`.

If any check fails, the file is quarantined per doc 07 to a `_pending/`
subtree for manual review.
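
A few of these checks are mechanical enough to sketch in code. The example below covers items 2, 6, 7, 9 and 10 for a TV path only. It is an illustration, not the doc 07 implementation: `check_tv_path()` is a hypothetical helper, and the `FORBIDDEN` set is an assumption, since the authoritative list lives in § 5.5.

```python
import re
import unicodedata
from pathlib import PurePosixPath
from typing import List

FORBIDDEN = set('<>:"\\|?*')                   # assumed; see § 5.5 for the real list
TYPOGRAPHIC = set("\u2018\u2019\u2013\u2014")  # curly quotes, en dash, em dash
RE_SEASON_DIR = re.compile(r"Season \d{2,}")
RE_SE_TOKEN = re.compile(r" - S\d{2,}E\d{2,}(?:-E\d{2,})?(?: - |$)")


def check_tv_path(path: str) -> List[str]:
    """Return a list of problems found; an empty list means these checks pass."""
    problems: List[str] = []
    p = PurePosixPath(path)
    for part in p.parts:
        if FORBIDDEN & set(part):
            problems.append(f"forbidden character in {part!r}")
        if TYPOGRAPHIC & set(part):
            problems.append(f"non-ASCII apostrophe or dash in {part!r}")
        if not unicodedata.is_normalized("NFC", part):
            problems.append(f"not NFC-normalized: {part!r}")
    # The file's immediate parent must be a zero-padded season folder.
    if not RE_SEASON_DIR.fullmatch(p.parent.name):
        problems.append(f"season folder not zero-padded: {p.parent.name!r}")
    # The basename must carry a zero-padded S/E token.
    if not RE_SE_TOKEN.search(p.stem):
        problems.append(f"no zero-padded S/E token in {p.stem!r}")
    return problems
```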

---

## 14. Quick reference card (for the operator)

| Category | Canonical shape | Example |
|---|---|---|
| Movie | `Movies/T (Y)/T (Y).mkv` | `Movies/Inception (2010)/Inception (2010).mkv` |
| Movie+edition | `Movies/T (Y)/T (Y) - E.mkv` | `Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv` |
| Movie+resolution | `Movies/T (Y)/T (Y) - NNNNp.mkv` | `Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 2160p.mkv` |
| TV episode | `TV/S (Y)/Season NN/S (Y) - SXXEYY - Title.mkv` | `TV/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv` |
| TV multi-ep | `... - SXXEYY-EZZ - Title.mkv` | `The Office (2005) - S07E25-E26 - Search Committee.mkv` |
| TV special | `... /Season 00/... - S00EYY - Title.mkv` | `Futurama (1999) - S00E01 - Bender's Big Score.mkv` |
| Anime seasonal | same as TV | `Cowboy Bebop (1998) - S01E01 - Asteroid Blues.mkv` |
| Anime absolute | `Anime/S/S - NNNN - Title [Sub].mkv` | `One Piece - 0001 - I'm Luffy [Sub].mkv` |
| Music video | `MV/A/Y - T.mp4` | `Daft Punk/2013 - Get Lucky.mp4` |
| Stand-up | `Movies/P - T (Y)/P - T (Y).mkv` | `Bo Burnham - Inside (2021)/Bo Burnham - Inside (2021).mkv` |
| Extra (folder) | `<item folder>/<lowercase folder>/Title.mkv` | `featurettes/Welcome to the World of Tomorrow.mkv` |
| Extra (suffix) | `... - Title-featurette.mkv` | `Inception (2010) - Dreams Within Dreams-featurette.mkv` |
| Subtitle | `<basename>.<lang>[.flag].srt` | `Futurama (1999) - S01E01.eng.srt` |

---

## 15. Cross-references

- Doc 05 § 0: top-level filename rules (forbidden chars, year-in-parens,
  one folder per item).
- Doc 05 § 1.2: Jellyfin's accepted movie regex.
- Doc 05 § 2.2: Jellyfin's accepted TV regex (table of patterns).
- Doc 05 § 3.1–3.3: anime numbering strategies (which we map to § 1.3
  and § 1.4 here).
- Doc 05 § 8: extras folder names (which we lowercase per § 4.5).
- Doc 03: sidecar subtitle naming (referenced in § 2.7 and § 14).
- Doc 02: what the scraper does after the rename, including the
  `RemoteSearch/Apply` recipe to fix mis-matches.
- Doc 07 (sibling): the operational pipeline (move, dedupe, GC) that
  consumes this ruleset. When doc 07 lands, link from § 13's
  verification checklist into doc 07's quarantine / re-run flow.

---

## 16. Open items / known drift

- Live `/home/user/media/tv/Futurama/` lacks the year; it should be
  `Futurama (1999)/`. Migration covered in doc 07.
- The script's TV-title-extraction does not yet handle parent folders
  named `Specials` (mapping to `Season 00`). Workaround: rename the
  folder first, then run normalize. Codify in v2.
- Edition detection priority list has been chosen by frequency-of-rip,
  not by canon. If a future Blade Runner gets a "Workprint Edition"
  release, the list grows.
- No automated tests for `normalize.py` yet; to be covered by doc 07 once
  that doc lands.

---

End of doc 08. The script in § 11 is the canonical source of truth; this
doc explains it. When in doubt, run `normalize.py --help` and read the
top docstring.