- Domain: tv.s8n.ru retired (404). nasflix.s8n.ru live (302 → /web). Pi-hole local DNS updated. Traefik file-provider router rule + docker-label router rule both flipped. Jellyfin PublishedServerUrl env updated. Cert re-issued via Gandi DNS-01. Onyx /etc/hosts pin moved. - Repo: forgejo PATCH /api/v1/repos rename. Local clone remote URL updated. All in-tree refs to tv.s8n.ru and jellyfin-stack swept (sed). - Scope: TV Shows + Movies only. anime/, musicvideos/, home/, music/, docs-*/ libraries removed from canonical layout. Sections kept as reference for re-introduction. - Branding LoginDisclaimer text updated to nasflix.s8n.ru.
# 08 — Filename & Folder Normalization Ruleset (nasflix.s8n.ru)

Last updated: 2026-05-08
Server: Jellyfin 10.10.3 on nullstone, container `jellyfin`
Library root inside container: `/media`
Library root on host: `/home/user/media`

This document is the **normative ruleset** for renaming downloaded media into a
canonical, predictable, group-tag-free shape before it lands in the live
library tree. It is the layer between "torrent dump" and "file ready for the
scanner".

Cross-links:

- [`05-file-structure-rules.md`](05-file-structure-rules.md) — what Jellyfin's
  parser accepts; this doc picks one of the accepted forms and locks it in.
- [`07-cleanup-and-imports.md`](07-cleanup-and-imports.md) — the operational
  pipeline (move, dedupe, garbage collect) that consumes this ruleset. Doc 08
  defines *what* canonical looks like; doc 07 defines *how* to apply it.
- [`02-metadata-and-titles.md`](02-metadata-and-titles.md) — what Jellyfin
  does after the rename (parse, scrape, lock).
- [`03-subtitles.md`](03-subtitles.md) — sidecar `.srt` / `.ass` naming
  (referenced from § 5.6 below).

> **Status of this doc:** specification + reference implementation. The
> `normalize.py` script in § 11 is canonical. Anything not codified by the
> script is documentation only — when the doc and the script disagree, the
> script wins, and the doc gets fixed.

---

## 0. Why a normalization ruleset (and why now)

Doc 05 establishes that Jellyfin's parser is permissive: dots, dashes,
underscores, and spaces are interchangeable; `S01E01`, `s01e01`, `1x01`, and
`Season 1 Episode 1` all parse to the same thing. That permissiveness is great
for *getting Jellyfin to scrape a torrent dump*, but it is a disaster for
**operating a library at scale**:

1. **Search becomes noisy.** SMB / Syncthing / Dolphin search across mixed
   patterns surfaces irrelevant matches (`S01E01` vs `1x01` vs `s01.e01`).
2. **Diff / audit / dedupe scripts** get harder. Every regex needs to handle
   N forms. The cleanup pass (doc 07) is dramatically cheaper if every file
   in the tree obeys one shape.
3. **Visual scan in `ls`** becomes unreadable when half the filenames have
   `[1080p AI x265 10bit FS99 Joy]` glued on and the other half don't.
4. **Future migrations** (Plex, Kodi, mobile sync to a Win/Mac client) all
   have stricter parsers than Jellyfin. The strictest sane shape that
   Jellyfin accepts is also the most portable. Pay the cost once.
5. **Cross-platform safety.** This deploy is Linux-only today, but the
   workspace's Syncthing setup (see ai-lab `SYSTEM.md`) implies future
   sync to Win/Mac clients. Choose Windows-safe filenames now and never
   touch this again.

The cost of the ruleset is one Python script and discipline at import time.
Both are bounded. The cost of *not* having one compounds with every new
release.

---

## 1. Canonical formats — what the tree must look like

This is the lock-in. **One shape per category. No alternatives. No "but my
release group did it differently".**

### 1.1 Movies

```
Movies/<Title> (<Year>)/<Title> (<Year>).<ext>
Movies/<Title> (<Year>)/<Title> (<Year>) - <Edition>.<ext>                        (when edition matters)
Movies/<Title> (<Year>) [<provider-id>]/<Title> (<Year>) [<provider-id>].<ext>    (when ambiguous)
```

- `<Title>` — smart title case (§ 5.1), forbidden chars stripped (§ 5.5).
- `<Year>` — first theatrical-release year, in parens, single space before `(`.
  Mandatory in this deploy (doc 05 § 0 rule 5), even when the title is unique.
- `<Edition>` — when present, exactly one of:
  `Director's Cut`, `Extended`, `Theatrical`, `IMAX`, `Unrated`, `Final Cut`,
  `Remastered`. Anything else (e.g. `Snyder Cut`, `Workprint`, `4K
  Remaster`) is admissible only with a written justification in the import
  log; otherwise normalize to the closest of the seven canonical labels
  above.
- `<provider-id>` — `imdbid-tt0123456` / `tmdbid-12345` / `tvdbid-12345`
  in square brackets. Optional unless year-based disambiguation isn't
  enough (§ 6.2).
- `<ext>` — lowercase: `mkv`, `mp4`, `webm`, `avi`. (`mkv` is the rip
  default; `mp4` is the streaming-original default.) Never uppercase
  `.MKV`, `.MP4`.

**Forbidden in the filename**: resolution tags (`1080p`, `2160p`, `720p`,
`4K`), codec tags (`x264`, `x265`, `h264`, `h265`, `HEVC`, `AVC`), source
tags (`WEB`, `WEB-DL`, `BluRay`, `BRRip`, `HDTV`, `DVDRip`, `WEBRip`),
audio tags (`AAC`, `AC3`, `DTS`, `DTS-HD.MA`, `5.1`, `7.1`, `Atmos`,
`Opus`), bitness/HDR tags (`10bit`, `8bit`, `HDR`, `DV`, `SDR`), release
tags (`PROPER`, `REPACK`, `INTERNAL`, `iNTERNAL`, `LIMITED`, `RERIP`),
language tags (`MULTi`, `DUBBED`, `SUBBED`), group tags
(`[YIFY]`, `[RARBG]`, `[FS99 Joy]`, `-NOGRP`, `-EVO`, `-SPARKS`),
and website refs (`WWW.YIFY-TORRENTS.COM`, `RARBG.txt`-derived names).

**Justification — why no resolution/codec tag:**

Jellyfin reads stream attributes (resolution, codec, bit-depth, HDR, audio
codec) directly from the file via `ffprobe` on every scan. The web UI
displays them. The mobile clients display them. The transcoder picks
based on them. The filename contributes **zero new information**.
Including those tags pollutes search results, breaks the byte-exact
folder-vs-file match required for multi-version movies (doc 05 § 1.2),
and makes humans skim past the title to find the title. The only
exception is `Movie (Year) - 1080p.mkv` AS the multi-version label
when two distinct rips of *the same movie* are kept in the same folder
(e.g. `Blade Runner 2049 (2017) - 2160p.mkv` next to
`Blade Runner 2049 (2017) - 1080p.mkv`). In that exact case, the
resolution IS the disambiguation token. Otherwise, no.

#### Examples

```
Movies/Blade Runner (1982)/Blade Runner (1982).mkv
Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv
Movies/Blade Runner (1982)/Blade Runner (1982) - Director's Cut.mkv
Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 2160p.mkv
Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 1080p.mkv
Movies/Dune (1984) [imdbid-tt0087182]/Dune (1984) [imdbid-tt0087182].mkv
```

### 1.2 TV shows

```
TV/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM> - <Episode Title>.<ext>
TV/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM>-E<MM2> - <Episode Title>.<ext>
TV/<Show> (<Year>)/Season 00/<Show> (<Year>) - S00E<MM> - <Special Title>.<ext>
```

- `<Show>` — smart title case, no provider-id in show folder unless the
  scraper picks the wrong show twice in a row (then add `[tvdbid-NNNN]`).
- `<Year>` — series **first-air year**, mandatory even when title is unique
  (doc 05 § 0 rule 5; this deploy convention is stricter than upstream
  permissive parsing).
- `<NN>` — zero-padded two digits. `Season 01`, not `Season 1`. `S01`, not `S1`.
- `<MM>` — zero-padded two digits. Three digits permissible only for shows
  that exceed 99 episodes per *season* (rare; e.g. some daily anime). See
  doc 05 § 3.1.
- `<Episode Title>` — title from the metadata provider (TVDB/TMDB) with
  smart title case. Required for human readability; Jellyfin overwrites it
  during scrape but the file basename is what humans see in `ls`.
- Multi-episode files: `S<NN>E<MM>-E<MM2>` — single hyphen, no spaces.
  Verified parsing per doc 05 § 2.2 table.

#### Examples

```
TV/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
TV/Futurama (1999)/Season 01/Futurama (1999) - S01E03-E04 - I, Roommate + Love's Labours Lost in Space.mkv
TV/Futurama (1999)/Season 00/Futurama (1999) - S00E01 - Bender's Big Score.mkv
TV/The Office (2005)/Season 02/The Office (2005) - S02E01 - The Dundies.mkv
```

#### Why this shape (not the slimmer `Show S01E01.mkv`)

Doc 05 § 2.2 shows three accepted patterns:

```
Futurama (1999) S01E01.mkv
Futurama (1999) S01E01 - Space Pilot 3000.mkv
Futurama (1999) - S01E01 - Space Pilot 3000.mkv      ← canonical for this deploy
```

The third form (with the leading ` - ` before `S01E01` and the title) is
chosen because:

1. The leading dash visually separates the series-name block from the
   episode-id block. Important when the show's title contains spaces and
   numbers (`Star Trek The Next Generation S01E01`) — without the dash, the
   eye trips over `Generation S01E01`.
2. Symmetric with the Movies multi-version pattern (`Title (Year) - <Label>`).
   One mental model for the whole library.
3. Closely follows Sonarr's standard episode naming (`{Series Title} -
   S{season:00}E{episode:00} - {Episode Title}`), which means the naming
   pattern is well-trodden and tooling friendly.

### 1.3 Anime — seasonal numbering (TVDB-style)

Same shape as TV (§ 1.2). Mandatory year. Mandatory `Season NN`. No
absolute numbers.

```
Anime/<Show> (<Year>)/Season <NN>/<Show> (<Year>) - S<NN>E<MM> - <Episode Title>.<ext>
```

#### Examples

```
Anime/Cowboy Bebop (1998)/Season 01/Cowboy Bebop (1998) - S01E01 - Asteroid Blues.mkv
Anime/Mushishi (2005)/Season 02/Mushishi (2005) - S02E01 - The Sleeping Mountain.mkv
Anime/Steins;Gate (2011) [tvdbid-244061]/Season 01/Steins;Gate (2011) [tvdbid-244061] - S01E01 - Turning Point.mkv
```

(`;` is legal on `ext4` but flagged in § 5.5 as risky for portability —
prefer `Steins-Gate` if portability matters.)

### 1.4 Anime — absolute numbering

Used **only** for shows >99 episodes that don't fit the seasonal model
(One Piece, Naruto, Detective Conan, Bleach). For those shows, the
canonical shape is:

```
Anime/<Show>/<Show> - <NNNN> - <Episode Title> [<Sub|Dub>].<ext>
```

- No `(<Year>)` on the show folder — absolute-numbering shows are usually
  unique by name; if not, fall back to a provider ID
  (`Doraemon (1979) [tvdbid-71603]`) and revert to the seasonal pattern
  in § 1.3.
- `<NNNN>` — **zero-padded four digits** (deterministic; all known
  long-runners stay below 9999). Three-digit padding (`099`) is wrong;
  four-digit (`0099`) is right and matches the upper bound of the longest
  running show.
- `[<Sub|Dub>]` — exactly one of `[Sub]` or `[Dub]`. Required for any
  release where both audio tracks are not embedded in one mkv. If the
  release contains both audio tracks in one container, omit the
  bracket.
- No `Season NN` folder. Absolute numbering puts every episode in the
  show root.

#### Deterministic absolute-numbering rule

Absolute number = the episode's position in the **broadcast order** as
listed by AniDB's "main" episode list for that show. NOT the dub broadcast
order, NOT a re-cut/remaster renumbering. For shows with discrepancies
between AniDB and TVDB absolute numbering (rare), AniDB wins — that's the
provider that absolute-numbering plugins (and Shoko) use.

#### Examples

```
Anime/One Piece/One Piece - 0001 - I'm Luffy! The Man Who's Gonna Be King of the Pirates! [Sub].mkv
Anime/One Piece/One Piece - 0001 - I'm Luffy! The Man Who's Gonna Be King of the Pirates! [Dub].mkv
Anime/Naruto/Naruto - 0001 - Enter Naruto Uzumaki [Sub].mkv
Anime/Detective Conan/Detective Conan - 1099 - The Detective's Vacation [Sub].mkv
```

#### Caveat

Naive Jellyfin without Shoko will mis-handle episodes >99 (doc 05 § 3.3).
This is a known issue; pick **one** of:

- Run Shoko (doc 05 § 3.2). Filenames don't matter for Shoko — but obey
  this ruleset anyway, for human readability and for the day Shoko goes
  away.
- Re-bucket by TVDB seasons. Most long-runners have a TVDB season split
  (One Piece S01-S22). Use § 1.3 with the seasons.

This deploy currently does NOT run Shoko; it currently does NOT host any
absolute-numbered anime. The shape in § 1.4 is reserved for the day
Shoko gets installed. Leave it documented.

### 1.5 Music videos

```
MusicVideos/<Artist>/<Year> - <Track Title>.<ext>
MusicVideos/<Artist>/<Year> - <Track Title> [<Variant>].<ext>     (when multiple cuts exist)
```

- `<Artist>` — smart title case, comma-separated for collabs
  (`Daft Punk, Pharrell Williams`).
- `<Year>` — release year of the *video*, not the song. Songs older than
  their videos are common (a 2024 acoustic cover gets the 2024 year).
- `<Track Title>` — smart title case.
- `<Variant>` — optional, `[Live]`, `[Acoustic]`, `[Remix]`, `[Alternate]`,
  `[Lyric Video]`. Forbidden: `[1080p]`, `[Official]`, `[HD]`.

Music videos do not use `(<Year>)` parens because the library is
`musicvideos` `CollectionType`, which has no scraper (doc 05 § 5.3) and the
year is purely cosmetic.

#### Examples

```
MusicVideos/Daft Punk/2013 - Get Lucky.mp4
MusicVideos/Daft Punk/2013 - Get Lucky [Lyric Video].mp4
MusicVideos/Pink Floyd/1995 - Comfortably Numb [Live].mkv
MusicVideos/Daft Punk, Pharrell Williams/2013 - Get Lucky.mp4
```

For full **live concerts** (>20 min, multi-song), file under Movies
instead, per doc 05 § 5.4.

### 1.6 Stand-up specials (Movies-typed)

Stand-up lives in the Movies library (doc 05 § 4). Folder + filename are
prefixed with the performer name; treat the whole `<Performer> - <Title>`
as the canonical "movie title" for parser purposes.

```
Movies/<Performer> - <Title> (<Year>)/<Performer> - <Title> (<Year>).<ext>
```

#### Examples

```
Movies/Bo Burnham - Inside (2021)/Bo Burnham - Inside (2021).mkv
Movies/Hannah Gadsby - Nanette (2018) [imdbid-tt8465676]/Hannah Gadsby - Nanette (2018) [imdbid-tt8465676].mkv
Movies/Norm Macdonald - Nothing Special (2022)/Norm Macdonald - Nothing Special (2022).mkv
```

The `<Performer> - ` prefix is **mandatory** for stand-up. Without it, the
title alone (`Inside (2021)`) ambiguously matches the 2007 horror film
*Inside*, the 2023 thriller *Inside*, or the 2017 documentary *Inside*.
The prefix gives TMDB enough disambiguation to land on the correct
record without a provider-id override.

---

## 2. What to STRIP from a source filename — exhaustive list

This is the substring inventory. The script in § 11 implements all of
these. The list grew from sampling ~200 distinct release-group filenames
across `[YIFY]`, `[RARBG]`, `[ettv]`, `[GalaxyRG]`, `[FS99 Joy]`,
`[NOGRP]`, `[FitGirl]`, and the Futurama corpus on disk.

### 2.1 Group tags (square / round brackets)

Match anything inside `[...]` or `(...)` *that does not look like a year*.
Year detection: 4 digits, 1900 ≤ N ≤ current year + 2.

Exemplar substrings (case-insensitive):

```
[1080p AI x265 10bit FS99 Joy]
[YIFY]
[YTS]
[YTS.MX]
[YTS.AG]
[YTS.AM]
[RARBG]
[ettv]
[eztv]
[GalaxyRG]
[GalaxyRG265]
[FitGirl]
[FitGirl Repack]
[NOGRP]
[QxR]
[FreetheFish]
[psa]
[PSA]
[CMRG]
[d3g]
[STRiFE]
[Pahe.in]
[FoV]
[NTb]
[YOLO]
[KOGi]
[playWEB]
[REQ]
[XBET]
[FLUX]
[NOSiVID]
[BGT]
[SVA]
[CRiMSON]
[ION10]
[ION265]
[BluPanda]
[H4S5S]
[5.1]
(YIFY)
(RARBG)
(NOGRP)
```
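The bracket rule in § 2.1 can be sketched as follows. This is a minimal illustration, not the § 11 script: the function names are hypothetical, and the exemption for provider-id brackets (`[imdbid-…]`, `[tmdbid-…]`, `[tvdbid-…]`) is an assumption added so that the canonical forms from § 1.1 survive the pass.

```python
import datetime
import re

# One non-nested [...] or (...) group.
RE_BRACKET = re.compile(r"\[([^\[\]]*)\]|\(([^()]*)\)")
# Assumption: provider-id brackets from § 1.1 are kept, not treated as group tags.
RE_PROVIDER_ID = re.compile(r"(?:imdbid|tmdbid|tvdbid)-\w+", re.IGNORECASE)


def looks_like_year(text, current_year=None):
    """True if text is exactly 4 digits with 1900 <= N <= current year + 2."""
    current_year = current_year or datetime.date.today().year
    return bool(re.fullmatch(r"\d{4}", text)) and 1900 <= int(text) <= current_year + 2


def strip_bracket_tags(name):
    """Drop every bracketed group except years (and provider ids)."""
    def keep_or_drop(match):
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        inner = inner.strip()
        if looks_like_year(inner) or RE_PROVIDER_ID.fullmatch(inner):
            return match.group(0)
        return " "  # replace with a space so neighbours are not glued together

    cleaned = RE_BRACKET.sub(keep_or_drop, name)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Because the year test looks only at the bracket contents, `(1999)` is kept while `(YIFY)` and `[5.1]` are dropped, without needing a list of known groups.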

### 2.2 Trailing release-group dashes

Pattern: `-<UPPERCASE_TOKEN>` at the very end of the basename
(before extension). Matches:

```
-NOGRP
-EVO
-RARBG
-SPARKS
-CMRG
-NTb
-FLUX
-AMZN
-NF
-DSNP
-ATVP
-MA
-WEB
-AAC2
-FoV
-KOGi
-PLAYWEB
-FRDS
-ZQ
-PHOENiX
-EZTV
-NTG
-iON
-ION10
-ION265
-CtrlHD
-d3g
-PSA
-QxR
-RZeroX
-PMP
-BTN
-DEFLATE
-BAE
-MZABI
-TURG
```

The pattern `-[A-Z][A-Z0-9]{1,15}$` (after stripping bracket tags and
quality tags) captures most of these. The script in § 11 uses an
allow-list approach instead of a pattern, because release groups
sometimes exceed 15 chars and sometimes use mixed case.
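A sketch of what the allow-list approach might look like. The set below is only a small subset of the § 2.2 list, the function name is hypothetical, and case-insensitive comparison is an assumption; the authoritative list and behaviour are whatever the § 11 script implements.

```python
# Illustrative subset of the § 2.2 list (assumption: compared case-insensitively).
KNOWN_GROUPS = {"NOGRP", "EVO", "RARBG", "SPARKS", "NTb", "PHOENiX", "RZeroX", "QxR"}
_KNOWN_LOWER = {group.lower() for group in KNOWN_GROUPS}


def strip_trailing_group(basename):
    """Remove a trailing '-GROUP' only when GROUP is on the allow-list."""
    head, sep, tail = basename.rpartition("-")
    if sep and tail.strip().lower() in _KNOWN_LOWER:
        return head.rstrip(" .-_")
    return basename
```

The point of the allow-list is that a title ending in a hyphenated word (`Spider-Man`) is left alone, which a bare `-[A-Z][A-Z0-9]{1,15}$` pattern cannot guarantee for all-caps titles.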

### 2.3 Quality / codec / source / audio tags

Strip all of these as standalone tokens (whitespace-, dot-, dash-, or
underscore-bounded), case-insensitive:

**Resolution / aspect:**
```
2160p 1080p 720p 480p 360p 4K 4k UHD HD SD FHD QHD
```

**Source:**
```
WEB-DL WEBDL WEB.DL WEB WEBRip WEB-Rip BluRay BLURAY Bluray BDRip
BRRip BR-Rip BDR HDTV HDTVRip PDTV DSR DVDRip DVD DVDR DVD9 DVD5
HDDVD HDDVDRip HDRip CAMRip CAM TS HDTS TC TELESYNC TELECINE R5
SCREENER SCR WORKPRINT WP PPV PPVRip
```

**Codec / container hints (in name):**
```
x264 x265 H.264 H264 H.265 H265 HEVC AVC VP9 AV1 XviD DivX
10bit 10-bit 8bit 8-bit HDR HDR10 HDR10+ DV DolbyVision Dolby.Vision
SDR HFR HQ
```

**Audio:**
```
DD5.1 DDP5.1 DD7.1 DDP7.1 DD2.0 DD+5.1 DD+7.1 DTS DTS-HD DTS-HD.MA
DTS-X DTSX TrueHD Atmos AAC AAC2.0 AAC5.1 AC3 AC-3 EAC3 E-AC3
MP3 MP2 Opus FLAC PCM LPCM 5.1 7.1 2.0 Mono Stereo Multi
```

**Release-process tags:**
```
PROPER REPACK iNTERNAL INTERNAL LIMITED EXTENDED.CUT UNCUT THEATRiCAL
RERIP REAL READNFO RETAiL RETAIL STV DC COMPLETE REMUX REMASTERED
SUBBED DUBBED MULTi MULTI SUB DUB ENG ENGLISH POL POLISH iNT iNTERNAL
```

> Note: `EXTENDED.CUT`, `THEATRiCAL`, `UNRATED`, `IMAX`, `DIRECTORS.CUT`,
> `FINAL.CUT`, `REMASTERED`, `UNCUT`, `DC` (= Director's Cut shorthand),
> `EE` (= Extended Edition shorthand) are kept *as edition tokens* — see
> § 3.6. Strip them from the noise pool, then re-emit them as
> ` - <Edition>` if present.

### 2.4 Source-specific cruft

Common compound suffixes that are not single tokens:

```
WEB.h264-NiXON[rartv]
WEB-DL.DDP5.1.x264-NTb
BDRip.x265.10bit-RZeroX
HDTV.x264-PHOENiX
1080p.WEB.h264-NiXON
2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.5.1
```

These are ad-hoc concatenations; once the standalone tokens above are
stripped, what remains is the title plus stray separators. The pipeline
in § 3 collapses separators last, so order matters.

### 2.5 Whitespace / punctuation cleanup

After substring removal, run these passes:

| Pass | From | To |
|---|---|---|
| Collapse runs of spaces | `Show   Title  S01E01` | `Show Title S01E01` |
| Trim leading/trailing whitespace | ` Show.mkv ` | `Show.mkv` |
| Collapse double-underscore | `Show__Title` | `Show Title` |
| Replace dot-separators with space (basename only) | `Show.Title.S01E01` | `Show Title S01E01` |
| Drop stray punctuation runs | `Show --- Title` | `Show - Title` |
| Strip trailing dashes/dots before ext | `Show -.mkv` | `Show.mkv` |

The dot-to-space substitution is **only applied if the dot is between
alphanumeric tokens**. `5.1` (audio channel count) has already been removed
in § 2.3 by the time this pass runs, so it is never split into `5 1`. A
source `Mr.Robot` becomes `Mr Robot`: the dot turns into a space, and the
canonical form has no dot.
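The passes in the § 2.5 table can be chained in a single helper. The sketch below is an illustration under stated assumptions: the function name is hypothetical, it operates on the basename without extension, and it turns any run of underscores (not only double ones) into a space.

```python
import re


def cleanup_separators(basename):
    """Apply the § 2.5 whitespace/punctuation passes to an extension-less basename."""
    s = basename
    # Dot between two alphanumeric characters -> space (Show.Title -> Show Title).
    s = re.sub(r"(?<=[A-Za-z0-9])\.(?=[A-Za-z0-9])", " ", s)
    # Underscore runs -> single space (assumption: single underscores too).
    s = re.sub(r"_+", " ", s)
    # Runs of dashes -> one dash (Show --- Title -> Show - Title).
    s = re.sub(r"-{2,}", "-", s)
    # Collapse runs of whitespace.
    s = re.sub(r"\s+", " ", s)
    # Trim whitespace, then drop trailing dashes/dots left before the extension.
    return s.strip().rstrip(" -.")
```

Running the whitespace collapse after the dot and underscore passes matters: those passes can create new adjacent spaces that a collapse-first ordering would miss.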

### 2.6 URL / website refs

Match and remove:

```
WWW.YIFY-TORRENTS.COM
WWW.YTS.MX
WWW.RARBG.TO
RARBG.txt
www.yify-torrents.com
```

These appear as bracket prefixes (`[WWW.YIFY-TORRENTS.COM] Movie...`),
suffixes (`Movie - WWW.YIFY-TORRENTS.COM.mkv`), or as `RARBG.txt`-style
sidecar files (which doc 07 garbage-collects, not us).

Pattern (case-insensitive): `(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)` → strip whole match.
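A short sketch of applying the § 2.6 pattern. The function name is hypothetical, and two choices here are assumptions rather than documented behaviour: the match is replaced with a space (so the tokens on either side are not glued together), and the input is the basename with the extension already split off, since the pattern's trailing delimiter would otherwise consume the dot before `mkv`.

```python
import re

RE_SITE_REF = re.compile(
    r"(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)",
    re.IGNORECASE,
)


def strip_site_refs(basename):
    """Remove WWW.<site>.<tld> references together with their delimiters."""
    cleaned = RE_SITE_REF.sub(" ", basename)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

A suffix reference leaves its separator behind (`Movie - www.yts.mx` becomes `Movie -`); the trailing dash is then removed by the § 2.5 passes.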

### 2.7 Language indicators in the BASE name

`.pl`, `.eng`, `.en`, `.pol`, `.de`, `.fr`, `.es`, `.it`, `.ja`, `.jp`,
`.ru`, `.ko`, `.zh` appearing in the **video** filename (basename, not
extension). These belong on **subtitle sidecars only**, per doc 03.

```
Futurama.s01e01.pl.mkv                   ← BAD  (`.pl` in video basename)
Futurama (1999) - S01E01.mkv             ← GOOD (audio language is a stream attribute)
Futurama (1999) - S01E01.pl.srt          ← GOOD (subtitle sidecar with lang)
Futurama (1999) - S01E01.eng.srt         ← GOOD
```

Detection: 2- or 3-letter ISO-639 code as a token between dots / dashes /
underscores in the basename. If found, drop it from the basename. If a
sidecar `.srt` exists with the same lang token, **leave the sidecar
alone** — it's already correctly named.

If the source file is a `.srt` / `.ass` / `.vtt` / `.sub`, the lang
token is part of the canonical sidecar form and must NOT be stripped.
The script's `--type subtitle` mode handles this branch.
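The § 2.7 detection rule, sketched under assumptions: the code list is only the one given at the top of this section (not a full ISO-639 table), the function name is hypothetical, and the subtitle branch is modelled as a simple extension check rather than the script's `--type subtitle` mode. Note that a token-based match like this can hit real title words (a dotted `...Kings.It.2017` would lose `It`), so it is best run after the title has been taken from the parent folder.

```python
import re

# Codes listed in § 2.7; not a complete ISO-639 table.
LANG_CODES = ("pl", "eng", "en", "pol", "de", "fr", "es", "it", "ja", "jp", "ru", "ko", "zh")
SUBTITLE_EXTS = {"srt", "ass", "vtt", "sub"}

# A code bounded by dot/dash/underscore on the left and by a separator or the end on the right.
RE_LANG_TOKEN = re.compile(
    r"[._-](?:" + "|".join(LANG_CODES) + r")(?=[._-]|$)",
    re.IGNORECASE,
)


def strip_lang_token(basename, ext):
    """Drop language tokens from a video basename; leave subtitle sidecars untouched."""
    if ext.lower() in SUBTITLE_EXTS:
        return basename
    return RE_LANG_TOKEN.sub("", basename)
```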

---

## 3. The normalization pipeline (regex / sed / python)

Conceptual order — each step's output feeds the next.

### 3.1 Step 0 — Determine target schema

Caller-supplied: `--type {movie|tv|anime-seasonal|anime-absolute|musicvideo|standup|extra}`. The
script does not guess. Doc 07's import wrapper picks the type based on
which library tree the file is being moved into.

### 3.2 Step 1 — Split off extension

```python
import os

basename, ext = os.path.splitext(source_filename)
ext = ext.lower().lstrip(".")   # canonical lowercase, no leading dot
```

Validate: `ext in {"mkv", "mp4", "avi", "webm", "m4v", "srt", "ass", "ssa", "vtt", "sub", "idx"}`.
Anything else → reject with an error; doc 07 quarantines it.

### 3.3 Step 2 — Extract S<NN>E<MM> (TV / anime-seasonal only)

```python
import re

RE_SEASON_EPISODE = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,3})(?:-[Ee]?(\d{1,3}))?")
RE_NXNN = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{1,3})(?:-(\d{1,3}))?")
RE_VERBOSE = re.compile(r"Season\s*(\d{1,2})\s*Episode\s*(\d{1,3})", re.I)

# try the canonical form first, then the alternative forms before giving up
m = (RE_SEASON_EPISODE.search(basename)
     or RE_NXNN.search(basename)
     or RE_VERBOSE.search(basename))
if m is None:
    raise ValueError(f"no season/episode marker in {basename!r}")

season = f"{int(m.group(1)):02d}"
episode = f"{int(m.group(2)):02d}"
# the "Season N Episode M" form has no third group, so check before reading it
ep_end = m.group(3) if m.re.groups >= 3 else None
episode_end = f"{int(ep_end):02d}" if ep_end else None
```

If no S/E found and `--type tv|anime-seasonal`, error out — the file can
only be normalized if season/episode are recoverable.

### 3.4 Step 3 — Extract episode title

After step 2, the matched span is the boundary. Episode title is the text
**between** the SxxExx end and the **first** of: `[`, `(`, a group-tag
delimiter, or end-of-string.

```python
after_se = basename[m.end():]
# cut at the first bracket/paren or at a trailing " - GROUP" tag
title_part = re.split(r"[\[\(]|\s-\s[A-Z][A-Z0-9]+$", after_se, maxsplit=1)[0]
# strip any leading/trailing separators
title_part = title_part.strip(" -._")
```

If the title-part is empty after strip, leave it empty (script emits no
trailing title — `Show (Year) - S01E01.mkv` is still canonical when no
title is known).

### 3.5 Step 4 — Extract series / movie title (from parent folder)

The **parent folder name** is the source of truth for series/movie title,
not the filename, because torrents commonly have inconsistent
filename-prefixes within the same folder (`Show.S01E01.x264.mkv` vs
`Show Title - S01E02.mkv`).

```python
import os
import re

parent = os.path.basename(os.path.dirname(source_path))
# strip group tags and quality from the parent folder too
# (strip_noise is the § 2 noise-stripping helper, not defined in this snippet)
clean_parent = strip_noise(parent)
# extract year if present
year_match = re.search(r"\((\d{4})\)", clean_parent)
year = year_match.group(1) if year_match else None
title = re.sub(r"\s*\(\d{4}\).*$", "", clean_parent).strip()
```

Edge case: parent folder is `Season 01` (TV) — recurse one more level up
to the show folder. The script handles N levels of `Season \d+` parents.
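The `Season \d+` edge case can be sketched as a loop that climbs until it reaches a folder that is not a season folder. The function name is hypothetical and the regex is an assumption about what counts as a season folder here; the § 11 script may recognise more spellings.

```python
import os
import re

# Assumption: a season folder starts with "Season" followed by digits, in any case.
RE_SEASON_DIR = re.compile(r"^season\s*\d+", re.IGNORECASE)


def find_title_folder(source_path):
    """Return the name of the nearest ancestor folder that is not a season folder."""
    folder = os.path.dirname(source_path)
    name = os.path.basename(folder)
    while name and RE_SEASON_DIR.match(name):
        folder = os.path.dirname(folder)
        name = os.path.basename(folder)
    return name
```

For a movie the loop body never runs and the immediate parent is returned, so the same helper can serve both schemas.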

### 3.6 Step 5 — Detect edition tokens (Movies only)

After § 2.3 strips edition tags from the noise pool, scan the **original**
basename for canonical edition keywords:

```python
# insertion order follows the priority order described below
EDITIONS = {
    r"director'?s?[\.\s_-]*cut": "Director's Cut",
    r"\bDC\b": "Director's Cut",          # DC shorthand
    r"final[\.\s_-]*cut": "Final Cut",
    r"extended[\.\s_-]*(?:cut|edition)?": "Extended",
    r"\bEE\b": "Extended",                # EE shorthand
    r"theatrical(?:[\.\s_-]*cut)?": "Theatrical",
    r"imax": "IMAX",
    r"unrated": "Unrated",
    r"remaster(?:ed)?": "Remastered",
}
```

Match the first one found, in priority order (Director's Cut > Final Cut
> Extended > Theatrical > IMAX > Unrated > Remastered). Emit as
` - <Edition>` between title-year block and extension.
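One way the priority order could be enforced is with an ordered list that is walked top to bottom. This is a sketch, not the § 11 implementation: the names are hypothetical, and matching the `DC` / `EE` shorthands case-sensitively while the long forms are case-insensitive is an assumption made to limit false positives.

```python
import re

# (pattern, label, flags), listed in priority order.
EDITION_PRIORITY = [
    (r"director'?s?[.\s_-]*cut", "Director's Cut", re.IGNORECASE),
    (r"\bDC\b", "Director's Cut", 0),
    (r"final[.\s_-]*cut", "Final Cut", re.IGNORECASE),
    (r"extended(?:[.\s_-]*(?:cut|edition))?", "Extended", re.IGNORECASE),
    (r"\bEE\b", "Extended", 0),
    (r"theatrical(?:[.\s_-]*cut)?", "Theatrical", re.IGNORECASE),
    (r"imax", "IMAX", re.IGNORECASE),
    (r"unrated", "Unrated", re.IGNORECASE),
    (r"remaster(?:ed)?", "Remastered", re.IGNORECASE),
]


def detect_edition(original_basename):
    """Return the highest-priority edition label found, or None."""
    for pattern, label, flags in EDITION_PRIORITY:
        if re.search(pattern, original_basename, flags):
            return label
    return None
```

With this ordering a basename that carries both `EXTENDED` and `DC` resolves to `Director's Cut`, because the list is scanned by priority rather than by position in the filename.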

### 3.7 Step 6 — Collapse, trim, re-emit canonical

```python
def emit_canonical(schema, parts):
    if schema == "movie":
        if parts.edition:
            return f"{parts.title} ({parts.year}) - {parts.edition}.{parts.ext}"
        return f"{parts.title} ({parts.year}).{parts.ext}"
    if schema == "tv" or schema == "anime-seasonal":
        ep_range = f"S{parts.season}E{parts.episode}"
        if parts.episode_end:
            ep_range += f"-E{parts.episode_end}"
        if parts.episode_title:
            return f"{parts.title} ({parts.year}) - {ep_range} - {parts.episode_title}.{parts.ext}"
        return f"{parts.title} ({parts.year}) - {ep_range}.{parts.ext}"
    if schema == "anime-absolute":
        suffix = f" [{parts.subdub}]" if parts.subdub else ""
        return f"{parts.title} - {parts.absolute_number} - {parts.episode_title}{suffix}.{parts.ext}"
    if schema == "musicvideo":
        variant = f" [{parts.variant}]" if parts.variant else ""
        return f"{parts.year} - {parts.track_title}{variant}.{parts.ext}"
    if schema == "standup":
        return f"{parts.performer} - {parts.title} ({parts.year}).{parts.ext}"
    # no emitter for this schema: fail loudly instead of returning None
    raise ValueError(f"unsupported schema: {schema!r}")
```

After emission, run § 5.5 forbidden-character substitution, then § 5.6
double-space collapse, one final time.

---

## 4. Folder normalization

The same rules as filenames, applied to directory names, with a few
schema-specific adjustments.

### 4.1 Show folder — `<Show> (<Year>)`

```
Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/     → Futurama (1999)/
The Office US S01-S09 1080p WEB-DL/                   → The Office (2005)/
[YIFY] Inception 2010 1080p BRRip x264/               → Inception (2010)/         ← but this is movies
Cowboy.Bebop.1998.Complete.BluRay.x265.10bit/         → Cowboy Bebop (1998)/
```

Year: derived from the metadata provider (TVDB/TMDB) on first scrape, or
from the user-supplied `--year` flag. If neither is available,
`normalize.py --type tv` errors out and asks for `--year`. Year guessing
from parent-folder-numbers is unsafe (`Star Trek 2009` is the movie, not
the series).

### 4.2 Season folder — `Season <NN>`

```
Season 1/                      → Season 01/
Season1/                       → Season 01/
Season.01/                     → Season 01/
S01/                           → Season 01/
SEASON 1 [1080p WEB Joy]/      → Season 01/
Season 01 - Pilot Season/      → Season 01/      ← drop subtitle suffixes
Season 01 [BluRay]/            → Season 01/
Specials/                      → Season 00/
Season 0/                      → Season 00/
Extras/                        → Season 00/      ← only if treated-as-specials
```

Doc 05 § 2.3 is explicit: `Specials/`, `Season 0/`, `Season Specials/` do
not match the parser. `Season 00` is the only correct form.
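The season-folder mappings in § 4.2 reduce to "find the season number, re-emit it zero-padded". A minimal sketch, with a hypothetical function name and an `extras_are_specials` flag standing in for the "only if treated-as-specials" decision, which the caller has to make:

```python
import re

# "Season 1", "Season1", "Season.01", "S01", with any trailing suffix ignored.
RE_SEASON = re.compile(r"^\s*(?:season[ ._-]*|s)(\d{1,2})(?!\d)", re.IGNORECASE)
SPECIALS_NAMES = {"specials", "season specials"}


def normalize_season_folder(name, extras_are_specials=False):
    """Map a source folder name to 'Season NN', or None if it is not a season folder."""
    lowered = name.strip().lower()
    if lowered in SPECIALS_NAMES or (extras_are_specials and lowered == "extras"):
        return "Season 00"
    match = RE_SEASON.match(name)
    if match is None:
        return None
    return f"Season {int(match.group(1)):02d}"
```

Returning `None` for unrecognised names lets the caller leave folders such as `featurettes/` alone instead of guessing.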

### 4.3 Movie folder — `<Title> (<Year>)`

Same rules as the filename without the extension. The folder name MUST
byte-for-byte match the filename prefix when multi-version files are
present (doc 05 § 1.2 — Jellyfin requires this).

```
[YIFY] Blade Runner 1982 1080p BRRip x264 AAC-RARBG/  → Blade Runner (1982)/
Blade.Runner.2049.2017.2160p.UHD.BluRay.x265.10bit.HDR.DV.DTS-HD.MA.7.1-FreetheFish/
                                                      → Blade Runner 2049 (2017)/
```

### 4.4 Music-video artist folder — `<Artist>`

```
Daft.Punk/                     → Daft Punk/
[Daft Punk]/                   → Daft Punk/
DAFT PUNK Discography/         → Daft Punk/      ← note: "Discography" is dropped; this is video lib not music
```

### 4.5 Special-features subfolders

Inside an item folder, only these subfolder names are recognised by
Jellyfin (doc 05 § 8.2). The normalizer must rename source folders to
the canonical lowercase form:

```
BTS/                           → behind the scenes/
Behind-the-Scenes/             → behind the scenes/
behind_the_scenes/             → behind the scenes/
Featurettes/                   → featurettes/
DELETED SCENES [Joy]/          → deleted scenes/
Trailers/                      → trailers/
Interviews/                    → interviews/
Bonus Content/                 → extras/         ← catch-all
Bonus_Features/                → extras/
```

**Files inside featurettes/ etc.** keep human-readable titles but get
their group tags stripped:

```
Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv
  → featurettes/Welcome to the World of Tomorrow.mkv
```

Casing inside the special-features file *itself* uses smart title case
(§ 5.1).

---

## 5. Case + character handling

### 5.1 Smart title case

Capitalize every word EXCEPT these "small words" (when not the first or
last word of the title):

```
a, an, and, as, at, but, by, for, from, in, into, nor, of, on, or, the,
to, up, vs, vs., via, with, yet
```

Words that look like acronyms (`I.B.M.`, `C.I.A.`, `T.M.N.T.`) are
preserved as-is. Roman numerals (`II`, `III`, `IV`, `IX`) are uppercased.

#### Examples

```
the lord of the rings the two towers     → The Lord of the Rings the Two Towers   ← BAD
the lord of the rings: the two towers    → The Lord of the Rings - The Two Towers ← GOOD (`:` → ` - `, the second `the` is at start of subtitle, capitalize)
return of the king                       → Return of the King
star trek ii the wrath of khan           → Star Trek II - The Wrath of Khan
```

The subtitle-after-colon special case is important: when a `: ` is
substituted with ` - `, the word after the dash is a new "first word" for
title-casing purposes. The script handles this by re-running the
title-caser on each ` - ` separated chunk.

Jellyfin's parser is case-insensitive — this is purely for human readers.

### 5.2 Hyphen / dash normalization

| Char | Code | Used for |
|---|---|---|
| `-` | U+002D HYPHEN-MINUS | ASCII hyphen, the only canonical form for filenames |
| `–` | U+2013 EN DASH | Forbidden in filenames; replace with `-` |
| `—` | U+2014 EM DASH | Forbidden; replace with `-` |
| `−` | U+2212 MINUS SIGN | Forbidden; replace with `-` |

Unicode dashes appear from copy-paste of articles (Wikipedia loves the en
dash). They look almost identical to `-` in `ls` output, but they break
grep, shell completion, and SMB transfers.

```
Spider–Man (2002).mkv     → Spider-Man (2002).mkv
Spider — Man (2002).mkv   → Spider - Man (2002).mkv
```

### 5.3 Apostrophes / quotes

| Char | Code | Status |
|---|---|---|
| `'` | U+0027 APOSTROPHE | Canonical; ASCII straight quote |
| `’` | U+2019 RIGHT SINGLE QUOTATION MARK | Forbidden in filenames; replace with `'` |
| `‘` | U+2018 LEFT SINGLE QUOTATION MARK | Forbidden; replace with `'` |
| `"` | U+0022 QUOTATION MARK | Forbidden in filenames (Windows-illegal); strip entirely |
| `“` | U+201C LEFT DOUBLE QUOTATION MARK | Forbidden; strip |
| `”` | U+201D RIGHT DOUBLE QUOTATION MARK | Forbidden; strip |

Curly quotes break SMB shares (Windows clients see `?` and refuse to open
the file) and break shell escaping in scripts.

```
Don't Stop Believin'.mkv                 ← GOOD
Don’t Stop Believin’.mkv                 ← BAD (curly), normalize to straight
"It's a Wonderful Life" (1946).mkv       ← BAD (double quotes), strip them entirely:
It's a Wonderful Life (1946).mkv         ← GOOD
```
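
As a rough illustration of the two tables above, the dash and quote rules fit in a single translation table. This is a sketch only: `PUNCT_TABLE` and `normalize_punctuation` are made-up names, and the reference script in § 11 uses chained `str.replace` calls instead. Code points are written as escapes so the distinction survives copy-paste.

```python
# One translation table covering the § 5.2 and § 5.3 rules.
PUNCT_TABLE = str.maketrans({
    "\u2013": "-",    # en dash    -> ASCII hyphen
    "\u2014": "-",    # em dash    -> ASCII hyphen
    "\u2212": "-",    # minus sign -> ASCII hyphen
    "\u2018": "'",    # left single quotation mark  -> straight apostrophe
    "\u2019": "'",    # right single quotation mark -> straight apostrophe
    "\u201c": None,   # left double quotation mark  -> stripped
    "\u201d": None,   # right double quotation mark -> stripped
    '"': None,        # ASCII double quote          -> stripped
})


def normalize_punctuation(name: str) -> str:
    """Apply the dash and quote substitutions in one pass."""
    return name.translate(PUNCT_TABLE)


print(normalize_punctuation("Spider\u2013Man (2002).mkv"))
print(normalize_punctuation("\u201cIt\u2019s a Wonderful Life\u201d (1946).mkv"))
```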

### 5.4 Diacritics / non-ASCII

`ext4` is UTF-8 native; Jellyfin's parser is UTF-8 native; the HTTP API
serves UTF-8 happily. **Keep diacritics** when the title's accepted
spelling uses them.

```
Amélie (2001)/Amélie (2001).mkv                                                  ← GOOD
Pokémon (1997)/Season 01/Pokémon (1997) - S01E01 - Pokémon - I Choose You!.mkv   ← GOOD
Léon - The Professional (1994)/Léon - The Professional (1994).mkv                ← GOOD
```

Doc 05 § 0 rule 4 advises caution: prefer the ASCII title when "well
known" (e.g. `Amelie (2001)` over `Amélie (2001)`). For this deploy with
LAN-only HTTP and `ext4`, full Unicode is safe — but the rule of thumb
remains: if Wikipedia's English page uses the accent, keep it; if not,
drop it.

**Tested:** Jellyfin's filename matching, `Items?searchTerm=`, and NFO
`<title>` round-trip correctly with `é`, `ñ`, `ü`, `ß`, `ø`, `ł`, `ż`,
`日`, `한` on this deploy. Verified against the Futurama Polish-dubbed
corpus.
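
One detail worth illustrating: an accented letter can be stored either as a single code point or as a base letter plus a combining mark, and the two forms are different byte strings on disk. The reference script in § 11 applies NFC normalization so that both collapse to one form. The minimal sketch below shows the effect; the helper name `nfc` is an assumption used only here.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return the NFC (composed) form of a filename."""
    return unicodedata.normalize("NFC", name)


composed = "Am\u00e9lie (2001).mkv"      # é stored as one code point
decomposed = "Ame\u0301lie (2001).mkv"   # e followed by a combining acute accent

# The two spellings render identically but compare unequal until normalized.
print(composed == decomposed)
print(nfc(decomposed) == composed)
```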

### 5.5 Forbidden-char substitution table

Windows-illegal: `< > : " / \ | ? *`. Linux forbids only `/` and NUL.
Substitute as follows:

| Char | Substitute | Rationale |
|---|---|---|
| `:` | ` - ` (space-hyphen-space) | Most common in titles (`Star Trek II: The Wrath of Khan`); ` - ` is a clean replacement that title-casing handles |
| `/` | ` and ` | Appears mainly in episode-title lists for two-part eps. (`Mr. & Mrs. Smith` contains no `/`; `&` is left alone.) Avoid if both halves stand on their own. |
| `\` | omit | No legitimate use in titles |
| `<` | `(` | Rare; `<` in titles is parenthetical |
| `>` | `)` | Same |
| `\|` | omit (or `-`) | Rare; sometimes in `Tom \| Jerry` style logo-text |
| `?` | omit | Common in `Who Killed the Robber?`; drop the question mark, keep meaning |
| `*` | omit | Rare; usually censored profanity |
| `"` | omit | Per § 5.3 |
| `\0` (NUL) | error | Filesystem hard-block; surface to user |

#### Examples

```
Star Trek II: The Wrath of Khan (1982)   → Star Trek II - The Wrath of Khan (1982)
Mr. & Mrs. Smith (2005)                  → Mr. & Mrs. Smith (2005)   (no change; & is fine)
Who Killed the Robber? (1987)            → Who Killed the Robber (1987)
Tom & Jerry: The Movie (1992)            → Tom & Jerry - The Movie (1992)
```

### 5.6 Whitespace canonicalization

After all substitutions:

1. Collapse runs of `\s+` to a single space.
2. `strip()` leading/trailing whitespace.
3. Collapse double-`-` (which can result from `Title -- Subtitle`) to
   single `-`.
4. Trim trailing punctuation before extension: `Title -.mkv` → `Title.mkv`.
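
The four steps above can be sketched as follows. This is an illustrative helper, not the script's own `collapse_whitespace`: the name `canonicalize_whitespace` is an assumption, it works on the stem without the extension, and treating space, `-`, `.` and `_` as "trailing punctuation" is also an assumption (a title that legitimately ends in a period would lose it).

```python
import re


def canonicalize_whitespace(stem: str) -> str:
    """Apply the four § 5.6 steps to a filename stem (no extension)."""
    stem = re.sub(r"\s+", " ", stem)      # 1. collapse whitespace runs
    stem = stem.strip()                   # 2. trim both ends
    stem = re.sub(r"-{2,}", "-", stem)    # 3. "--" -> "-"
    stem = stem.rstrip(" -._")            # 4. drop trailing punctuation
    return stem


print(canonicalize_whitespace("Title  --  Subtitle "))
print(canonicalize_whitespace("Title -"))
```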

---

## 6. Year disambiguation — concrete examples

Jellyfin's TMDB/TVDB scrape uses the year in `(YYYY)` to filter
candidates. With multiple titles of the same name, the year is the *only*
disambiguator before falling back to provider IDs.

### 6.1 Without year — what goes wrong

Filename: `Cinderella.mkv` (no year, no folder year).

Jellyfin sends "Cinderella" to TMDB. TMDB returns 12+ matches, including:

- Cinderella (1950) — Disney animated
- Cinderella (2015) — Disney live action
- Cinderella (2021) — Camila Cabello musical
- Cinderella (1965) — TV special
- Cinderella (1899) — Méliès short

Jellyfin picks the one with the highest popularity score, which is the
2015 live-action remake. If you wanted 1950, you have to fix the match
manually.

### 6.2 With year — clean match

Filename: `Cinderella (1950).mkv` in folder `Cinderella (1950)/`.

Jellyfin sends `(title=Cinderella, year=1950)` to TMDB. TMDB returns the
1950 animated film as the top match with high confidence. Scrape
succeeds first try.

```
Movies/Cinderella (1950)/Cinderella (1950).mkv     ← TMDB ID 11224 (animated)
Movies/Cinderella (2015)/Cinderella (2015).mkv     ← TMDB ID 150689 (live action)
Movies/Cinderella (2021)/Cinderella (2021).mkv     ← TMDB ID 587996 (musical)
```

### 6.3 Same year — provider ID required

Filename: `Bad Movie (1980).mkv`. Two films named "Bad Movie" were
released in 1980 (hypothetical), so the year doesn't disambiguate. Add a
provider ID:

```
Movies/Bad Movie (1980) [imdbid-tt0080000]/Bad Movie (1980) [imdbid-tt0080000].mkv
Movies/Bad Movie (1980) [imdbid-tt0080001]/Bad Movie (1980) [imdbid-tt0080001].mkv
```
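
To make the three disambiguation levels concrete (title only, title plus year, title plus year plus provider ID), here is a small parser for canonical folder names. It is a sketch under stated assumptions: `FolderInfo`, `FOLDER_RE` and `parse_folder` are illustrative names, and the pattern accepts only the `Title (YYYY) [providerid-…]` shape used in this section, not everything Jellyfin's own parser accepts.

```python
import re
from typing import NamedTuple, Optional


class FolderInfo(NamedTuple):
    title: str
    year: Optional[str]
    provider_id: Optional[str]


# Title, then an optional "(YYYY)", then an optional "[imdbid-…]" style tag.
FOLDER_RE = re.compile(
    r"^(?P<title>.+?)"
    r"(?:\s+\((?P<year>\d{4})\))?"
    r"(?:\s+\[(?P<pid>(?:imdbid|tmdbid|tvdbid)-[^\]]+)\])?$"
)


def parse_folder(name: str) -> FolderInfo:
    """Split a folder name into title, year and provider ID (if present)."""
    m = FOLDER_RE.match(name)
    if m is None:
        return FolderInfo(name, None, None)
    return FolderInfo(m["title"], m["year"], m["pid"])


print(parse_folder("Cinderella"))
print(parse_folder("Cinderella (1950)"))
print(parse_folder("Bad Movie (1980) [imdbid-tt0080000]"))
```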

### 6.4 Year on TV shows

The same logic applies to series:

```
TV/The Office (2001)/...     ← UK original, BBC
TV/The Office (2005)/...     ← US remake, NBC
```

Without year, Jellyfin picks one (usually the US one, higher TMDB
popularity). With year, both work side-by-side.

---

## 7. Multi-version handling

When a single movie has multiple legitimate cuts (Director's Cut, Theatrical,
Extended), or multiple resolutions (2160p HDR + 1080p SDR), Jellyfin groups
them under one item with a "Version" picker in the UI.

### 7.1 Edition variants

```
Movies/Blade Runner (1982)/
├── Blade Runner (1982).mkv                       ← default (whichever is "the" version)
├── Blade Runner (1982) - Director's Cut.mkv
├── Blade Runner (1982) - Final Cut.mkv
└── Blade Runner (1982) - Theatrical.mkv
```

Jellyfin groups the four files by their shared `Blade Runner (1982)`
prefix (which must match the folder name) and creates one library item
"Blade Runner (1982)" with four selectable versions. The unlabelled one
shows as "Default".

### 7.2 Resolution variants

```
Movies/Blade Runner 2049 (2017)/
├── Blade Runner 2049 (2017) - 2160p.mkv
├── Blade Runner 2049 (2017) - 1080p.mkv
└── Blade Runner 2049 (2017) - 720p.mkv
```

Resolution labels ending in `p` or `i` sort descending by quality, so the
2160p version is offered first. This is the *only* exception to "no
resolution tags in filenames" (§ 1.1).

### 7.3 Mixed (edition × resolution)

```
Movies/Blade Runner 2049 (2017)/
├── Blade Runner 2049 (2017) - Theatrical 2160p.mkv
├── Blade Runner 2049 (2017) - Theatrical 1080p.mkv
├── Blade Runner 2049 (2017) - Director's Cut 2160p.mkv
└── Blade Runner 2049 (2017) - Director's Cut 1080p.mkv
```

This works in Jellyfin 10.10: all four are grouped, and the picker is a
flat list with all four labels visible. Slight UX ugliness but parses
cleanly. Avoid unless you genuinely have both axes of variation.

### 7.4 What does NOT work

- Sub-folders for variants:

  ```
  Movies/Blade Runner 2049 (2017)/Theatrical/Blade Runner 2049 (2017).mkv   ← BREAKS
  ```

  Jellyfin treats `Theatrical/` as an unknown extras subfolder and ignores
  the inner mkv.

- Different folder per cut:

  ```
  Movies/Blade Runner 2049 (2017) Theatrical/Blade Runner 2049 (2017).mkv
  Movies/Blade Runner 2049 (2017) Director's Cut/Blade Runner 2049 (2017).mkv
  ```

  This makes them two separate library items, not grouped versions.

- Suffix without space-hyphen-space:

  ```
  Blade Runner 2049 (2017).Theatrical.mkv    ← BREAKS (no ` - ` separator)
  Blade Runner 2049 (2017)-Theatrical.mkv    ← BREAKS (no spaces around `-`)
  ```
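
The grouping rule in this section (file stem equals the folder name, optionally followed by ` - ` and a label) can be checked mechanically before import. The helper below is a sketch of that rule as stated here, not of Jellyfin's actual resolver; `version_label` is a hypothetical name. It returns the label, an empty string for the unlabelled default, or `None` when the file would not be grouped.

```python
from pathlib import PurePosixPath
from typing import Optional


def version_label(path: str) -> Optional[str]:
    """Return the version label, "" for the default file, or None if the
    file does not follow the "<folder name> - <label>" convention."""
    p = PurePosixPath(path)
    folder, stem = p.parent.name, p.stem
    if stem == folder:
        return ""                          # unlabelled default version
    prefix = folder + " - "
    if stem.startswith(prefix) and len(stem) > len(prefix):
        return stem[len(prefix):]          # e.g. "Final Cut", "2160p"
    return None                            # would not be grouped


for candidate in (
    "Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv",
    "Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017)-Theatrical.mkv",
):
    print(candidate, "->", version_label(candidate))
```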

---

## 8. Special-features filename rules

Files inside the recognised subfolders (`featurettes/`, `behind the
scenes/`, `deleted scenes/`, `interviews/`, `trailers/`, etc.) follow
these rules:

1. **Strip group tags** as in § 2.1.
2. **Strip quality / codec / source / audio tags** as in § 2.3.
3. **Smart title case** as in § 5.1.
4. **Forbidden chars substituted** as in § 5.5.
5. **Filename = the human-readable feature title.** No `(year)`, no
   `S01E01`. The parent folder type (e.g. `featurettes/`) is the type
   marker.
6. Optional: append `-featurette` (or `-trailer`, `-behindthescenes`,
   etc.) suffix to be defensive about scraper edge cases. Doc 05 § 8.1
   shows this works AND § 8.2 shows the folder method works, so using
   both is belt-and-braces.

#### Example

```
Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv
  →
featurettes/Welcome to the World of Tomorrow.mkv
```

Or, if you want belt-and-braces:

```
featurettes/Welcome to the World of Tomorrow-featurette.mkv
```

Both parse. Pick **one** style per library and keep it consistent.

---

## 9. Worked example — the live Futurama import

This is the example the owner asked for. Verified against the live media
tree on nullstone (`/home/user/media/tv/Futurama/Season 01,02,03/`).

### 9.1 BEFORE (representative source dump)

```
/home/admin/Downloads/futrama/
└── Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/
    ├── Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
    ├── Futurama S01E02 The Series Has Landed [1080p x265 10bit Joy].mkv
    ├── Futurama S01E03 I, Roommate [1080p x265 10bit Joy].mkv
    ├── Futurama S01E04 Love's Labours Lost in Space [1080p x265 10bit Joy].mkv
    ├── Futurama S01E05 Fear of a Bot Planet [1080p x265 10bit Joy].mkv
    ├── Futurama S01E06 A Fishful of Dollars [1080p x265 10bit Joy].mkv
    ├── Futurama S01E07 My Three Suns [1080p x265 10bit Joy].mkv
    ├── Futurama S01E08 A Big Piece of Garbage [1080p x265 10bit Joy].mkv
    ├── Futurama S01E09 Hell Is Other Robots [1080p x265 10bit Joy].mkv
    └── Featurettes/
        └── Welcome to the World of Tomorrow [1080p Joy].mkv
```

Note: doubled-space is real (`Futurama S01E01 Space Pilot 3000 [1080p`).
The rip is from a release group called "Joy" using "FS99" (FastSub
99); "AI" likely means AI-upscaled. None of that is library-relevant.

### 9.2 AFTER (canonical layout)

```
/home/user/media/tv/
└── Futurama (1999)/
    ├── Season 01/
    │   ├── Futurama (1999) - S01E01 - Space Pilot 3000.mkv
    │   ├── Futurama (1999) - S01E02 - The Series Has Landed.mkv
    │   ├── Futurama (1999) - S01E03 - I, Roommate.mkv
    │   ├── Futurama (1999) - S01E04 - Love's Labours Lost in Space.mkv
    │   ├── Futurama (1999) - S01E05 - Fear of a Bot Planet.mkv
    │   ├── Futurama (1999) - S01E06 - A Fishful of Dollars.mkv
    │   ├── Futurama (1999) - S01E07 - My Three Suns.mkv
    │   ├── Futurama (1999) - S01E08 - A Big Piece of Garbage.mkv
    │   └── Futurama (1999) - S01E09 - Hell Is Other Robots.mkv
    └── featurettes/
        └── Welcome to the World of Tomorrow.mkv
```

### 9.3 Per-file rename mapping

| Before | After |
|---|---|
| `Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/` | `Futurama (1999)/Season 01/` |
| `Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E01 - Space Pilot 3000.mkv` |
| `Futurama S01E02 The Series Has Landed [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E02 - The Series Has Landed.mkv` |
| `Futurama S01E04 Love's Labours Lost in Space [1080p x265 10bit Joy].mkv` | `Futurama (1999) - S01E04 - Love's Labours Lost in Space.mkv` |
| `Featurettes/Welcome to the World of Tomorrow [1080p Joy].mkv` | `featurettes/Welcome to the World of Tomorrow.mkv` |

Notes on specific titles:

- `I, Roommate` keeps the comma. Comma is legal on `ext4`, on Windows,
  and on every modern SMB client. No need to substitute.
- `Love's Labours Lost in Space` keeps the straight ASCII apostrophe.
  If the source had a curly `’`, § 5.3 normalizes it.
- `Hell Is Other Robots`: `Is` is capitalized because verbs such as
  `is`/`be`/`am`/`are` are not in the small-words list.

### 9.4 What the live tree currently has

Verified via `ssh user@192.168.0.100 'ls /home/user/media/tv/Futurama/'`:

```
Season 01
Season 02
Season 03
```

The current live deploy uses folder name `Futurama/` (no year), which is
non-canonical per this doc. The canonical is `Futurama (1999)/`. This is
covered in doc 07's migration plan (rename the folder, then `POST
/Library/Refresh`). Mentioned here as a known drift; not fixed in this
doc.
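
Drift of this kind (a show folder without its `(YYYY)` suffix) is easy to detect ahead of a migration. The sketch below lists first-level folders that lack the suffix; `folders_missing_year` and `YEAR_SUFFIX` are illustrative names, and the demo runs against a throwaway temporary directory rather than the live tree.

```python
import re
import tempfile
from pathlib import Path

# A trailing "(YYYY)", optionally followed by a "[providerid-…]" tag.
YEAR_SUFFIX = re.compile(r" \(\d{4}\)( \[[^\]]+\])?$")


def folders_missing_year(library_root: Path) -> list[str]:
    """Names of first-level folders that lack a trailing "(YYYY)"."""
    return sorted(
        d.name for d in library_root.iterdir()
        if d.is_dir() and not YEAR_SUFFIX.search(d.name)
    )


# Demo on a temporary directory standing in for the TV library root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Futurama").mkdir()
    (root / "The Office (2005)").mkdir()
    drift = folders_missing_year(root)

print(drift)
```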

---

## 10. Idempotency and safety

The `normalize.py` script in § 11 enforces these:

1. **No-op on already-canonical input.** When the script's emitted
   filename equals the source filename byte-for-byte, it does nothing
   and returns exit code 0. Re-running the script on an already-imported
   library is safe and free.

2. **No overwrite without `--force`.** When the target path exists and
   is not the source path, the script refuses to move, prints a `REFUSE`
   line, and returns exit code 2. With `--force`, it moves and the
   target is overwritten. The script does not pick an alternative name
   itself; if both files must be kept, choose a numeric suffix
   (`Title (Year) (1).mkv`) by hand.

3. **Default to dry-run.** The script prints what it would do to stdout
   and does NOT touch the filesystem unless `--apply` is passed. This
   inverts the usual CLI convention (most tools default to apply and
   require `--dry-run` to preview). The inversion is deliberate: the
   destructive case (a wrong rename of 100 files) is much worse than the
   boring case (one extra flag).

4. **Audit log** at `/var/log/jellyfin-imports/<YYYY-MM-DD>.log`. Every
   `--apply` run appends:

   ```
   2026-05-08T14:23:11Z RENAME /home/admin/.../Futurama S01E01 ...joy].mkv -> /home/user/media/tv/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv
   ```

   Path is created (`mkdir -p /var/log/jellyfin-imports`) on first run if
   missing; user must have write permission.

5. **No deletes.** The script *moves* files with `shutil.move`, which is
   an `os.rename` on the same FS and a copy-then-remove across FS. The
   script itself never calls `os.unlink`. Garbage collection of source
   folders (after all files moved) is doc 07's job.

6. **Atomic per-file.** Each file's rename is one syscall on the same FS;
   on a different FS, `shutil.move` does copy-then-unlink which has a
   brief window where both source and target exist. The audit log records
   the operation regardless.

7. **Unicode-safe.** All paths handled as `pathlib.Path` (UTF-8 native on
   `ext4`). Curly-quote → straight-quote substitution happens BEFORE the
   target path is computed, so the target path never contains curly
   quotes or Unicode dashes (legitimate accents remain, as UTF-8).

---

## 11. Reference implementation — `normalize.py`

Drop this at `/opt/docker/jellyfin/scripts/normalize.py` on nullstone.
Run with Python 3.10+. Stdlib only — no external deps.

```python
#!/usr/bin/env python3
"""
normalize.py — canonical filename normalizer for nasflix.s8n.ru

Per /tmp/NASFLIX/docs/08-filename-normalization.md.
Safe by default: dry-run, no overwrite, no delete.
"""

from __future__ import annotations

import argparse
import datetime as dt
import re
import shutil
import sys
import unicodedata
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

LOG_DIR = Path("/var/log/jellyfin-imports")

# --- Stripping rules (doc § 2) -------------------------------------------------

GROUP_TAG_PATTERNS = [
    re.compile(r"\[[^\[\]]*\b(YIFY|YTS(\.\w+)?|RARBG|ettv|eztv|GalaxyRG\d*|"
               r"FitGirl|FitGirl\s*Repack|NOGRP|QxR|FreetheFish|psa|PSA|CMRG|"
               r"d3g|STRiFE|Pahe\.in|FoV|NTb|YOLO|KOGi|playWEB|REQ|XBET|FLUX|"
               r"NOSiVID|BGT|SVA|CRiMSON|ION10|ION265|BluPanda|H4S5S|Joy|"
               r"FS99\s*Joy|FS99|AI\s*x265|x265\s*\d+bit|\d+bit\s*x265)"
               r"[^\[\]]*\]", re.I),
    re.compile(r"\((YIFY|RARBG|NOGRP)\)", re.I),
]

# NOTE: matched case-insensitively, so short tokens (TS, HD, WEB, REAL, ...)
# also hit ordinary title words. Pass --title when a title contains one.
QUALITY_TOKENS = re.compile(
    r"(?<![A-Za-z0-9])("
    r"2160p|1080p|720p|480p|360p|4[Kk]|UHD|HD|SD|FHD|QHD|"
    r"WEB-DL|WEBDL|WEB\.DL|WEB|WEBRip|WEB-Rip|BluRay|BLURAY|Bluray|BDRip|"
    r"BRRip|BR-Rip|BDR|HDTV|HDTVRip|PDTV|DSR|DVDRip|DVD|DVDR|DVD9|DVD5|"
    r"HDDVD|HDDVDRip|HDRip|CAMRip|CAM|TS|HDTS|TC|TELESYNC|TELECINE|R5|"
    r"SCREENER|SCR|WORKPRINT|WP|PPV|PPVRip|"
    r"x264|x265|H\.?264|H\.?265|HEVC|AVC|VP9|AV1|XviD|DivX|"
    r"10bit|10-bit|8bit|8-bit|HDR10\+?|HDR|DV|Dolby\.?Vision|SDR|HFR|HQ|"
    r"DDP?5\.1|DDP?7\.1|DDP?2\.0|DD\+5\.1|DD\+7\.1|DTS-HD\.MA|DTS-HD|DTS-X|"
    r"DTSX|DTS|TrueHD|Atmos|AAC2\.0|AAC5\.1|AAC|AC3|AC-3|EAC3|E-AC3|"
    r"MP3|MP2|Opus|FLAC|PCM|LPCM|5\.1|7\.1|2\.0|Mono|Stereo|Multi|"
    r"PROPER|REPACK|iNTERNAL|INTERNAL|LIMITED|UNCUT|RERIP|REAL|READNFO|"
    r"RETAi?L|STV|REMUX|MULTi|MULTI|SUBBED|DUBBED|iNT"
    r")(?![A-Za-z0-9])", re.I)

URL_REF = re.compile(
    r"(?:^|[\s\[\(\.\-_])(WWW\.[A-Z0-9\-]+\.[A-Z]{2,4})(?:[\s\]\)\.\-_]|$)",
    re.I)

TRAILING_GROUP = re.compile(r"-(?:NOGRP|EVO|RARBG|SPARKS|CMRG|NTb|FLUX|AMZN|"
                            r"NF|DSNP|ATVP|MA|WEB|AAC2|FoV|KOGi|PLAYWEB|FRDS|"
                            r"ZQ|PHOENiX|EZTV|NTG|iON|ION10|ION265|CtrlHD|"
                            r"d3g|PSA|QxR|RZeroX|PMP|BTN|DEFLATE|BAE|MZABI|"
                            r"TURG|Joy)\b", re.I)

# Defined for reference; strip_noise() does not apply it, because tokens
# such as "it", "de" or "en" also occur as ordinary words in titles.
LANG_TOKEN = re.compile(r"(?<![A-Za-z])\.?(en|eng|pl|pol|de|deu|fr|fra|es|spa|"
                        r"it|ita|ja|jpn|jp|ru|rus|ko|kor|zh|chi)(?![A-Za-z])",
                        re.I)

# Forbidden chars (§ 5.5)
FORBIDDEN_CHARS = {
    ":": " - ",
    "/": " and ",
    "\\": "",
    "<": "(",
    ">": ")",
    "|": "",
    "?": "",
    "*": "",
    '"': "",
    "\u201c": "",  # left double quotation mark
    "\u201d": "",  # right double quotation mark
}

# Apostrophe normalization (§ 5.3)
APOSTROPHES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}

# Dashes (§ 5.2)
DASHES = {
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus
}

# Editions (§ 3.6)
EDITION_PATTERNS = [
    (re.compile(r"director'?s?[\.\s_-]*cut", re.I), "Director's Cut"),
    (re.compile(r"final[\.\s_-]*cut", re.I), "Final Cut"),
    (re.compile(r"extended[\.\s_-]*(?:cut|edition)?", re.I), "Extended"),
    (re.compile(r"theatrical(?:[\.\s_-]*cut)?", re.I), "Theatrical"),
    (re.compile(r"\bIMAX\b", re.I), "IMAX"),
    (re.compile(r"\bunrated\b", re.I), "Unrated"),
    (re.compile(r"remastere?d?", re.I), "Remastered"),
    (re.compile(r"(?<![A-Za-z])DC(?![A-Za-z])"), "Director's Cut"),
    (re.compile(r"(?<![A-Za-z])EE(?![A-Za-z])"), "Extended"),
]

# Smart title case (§ 5.1)
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "from",
               "in", "into", "nor", "of", "on", "or", "the", "to", "up",
               "vs", "vs.", "via", "with", "yet"}
# Well-formed numerals built from I/V/X only (1-39). A looser character-class
# test would also uppercase ordinary words such as "did", "mild" or "civil".
ROMAN_NUMERAL = re.compile(r"^(?=[IVX])X{0,3}(?:IX|IV|V?I{0,3})$", re.I)
# Dotted acronyms such as I.B.M. or T.M.N.T. are preserved as-is.
ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def smart_title(s: str) -> str:
    """Title-case respecting small-words, acronyms and roman numerals."""
    if not s:
        return s
    chunks = re.split(r"(\s-\s)", s)  # split on space-dash-space (subtitle)
    out_chunks = []
    for chunk in chunks:
        if chunk == " - ":
            out_chunks.append(chunk)
            continue
        words = chunk.split(" ")
        result = []
        for i, w in enumerate(words):
            if not w or ACRONYM.match(w):
                result.append(w)
                continue
            if ROMAN_NUMERAL.match(w):
                result.append(w.upper())
                continue
            lower = w.lower()
            if 0 < i < len(words) - 1 and lower in SMALL_WORDS:
                result.append(lower)
            else:
                # capitalize the first letter, lowercase the rest
                result.append(w[0].upper() + w[1:].lower())
        out_chunks.append(" ".join(result))
    return "".join(out_chunks)


def strip_noise(s: str) -> str:
    """Remove group tags, quality, urls, trailing groups."""
    for pat in GROUP_TAG_PATTERNS:
        s = pat.sub("", s)
    s = URL_REF.sub(" ", s)
    s = QUALITY_TOKENS.sub("", s)
    s = TRAILING_GROUP.sub("", s)
    return s


def normalize_chars(s: str) -> str:
    """Apply Unicode/forbidden-char substitutions."""
    for k, v in APOSTROPHES.items():
        s = s.replace(k, v)
    for k, v in DASHES.items():
        s = s.replace(k, v)
    for k, v in FORBIDDEN_CHARS.items():
        s = s.replace(k, v)
    # NFC normalization for diacritics (consistent encoding)
    s = unicodedata.normalize("NFC", s)
    return s


def collapse_whitespace(s: str) -> str:
    s = re.sub(r"\s+", " ", s)
    s = re.sub(r" - - ", " - ", s)
    s = re.sub(r"--+", "-", s)
    s = s.strip(" -._")
    return s


# --- Schema-specific extraction ------------------------------------------------

@dataclass
class Parts:
    title: str = ""
    year: Optional[str] = None
    season: Optional[str] = None
    episode: Optional[str] = None
    episode_end: Optional[str] = None
    episode_title: str = ""
    edition: Optional[str] = None
    provider_id: Optional[str] = None
    ext: str = "mkv"
    absolute_number: Optional[str] = None
    subdub: Optional[str] = None
    track_title: str = ""
    variant: Optional[str] = None
    performer: str = ""


RE_SE = re.compile(r"[Ss](\d{1,2})[Ee](\d{1,3})(?:-[Ee]?(\d{1,3}))?")
RE_NXM = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{1,3})(?:-(\d{1,3}))?")
RE_SEASON_EP = re.compile(r"Season\s*(\d{1,2})\s*Episode\s*(\d{1,3})", re.I)
RE_YEAR_PARENS = re.compile(r"\((\d{4})\)")
RE_PROVIDER_ID = re.compile(r"\[(?:imdbid|tmdbid|tvdbid)-[^\]]+\]")


def extract_year(s: str) -> Optional[str]:
    m = RE_YEAR_PARENS.search(s)
    if m:
        y = int(m.group(1))
        if 1888 <= y <= dt.date.today().year + 2:
            return m.group(1)
    return None


def extract_provider_id(s: str) -> Optional[str]:
    m = RE_PROVIDER_ID.search(s)
    return m.group(0) if m else None


def extract_se(s: str):
    m = RE_SE.search(s)
    if m:
        end = m.group(3) or None
        return (m, m.group(1), m.group(2), end)
    m = RE_NXM.search(s)
    if m:
        return (m, m.group(1), m.group(2), m.group(3))
    m = RE_SEASON_EP.search(s)
    if m:
        return (m, m.group(1), m.group(2), None)
    return (None, None, None, None)


def extract_edition(raw_basename: str) -> Optional[str]:
    for pat, name in EDITION_PATTERNS:
        if pat.search(raw_basename):
            return name
    return None


def parent_show_folder(p: Path) -> Path:
    """Walk up past Season XX folders until we find the show folder."""
    cur = p.parent
    while re.match(r"(?i)season\s*\d+|specials|extras", cur.name):
        cur = cur.parent
    return cur


# --- Per-schema emit -----------------------------------------------------------

def normalize_movie(src: Path, year_hint: Optional[str] = None,
                    title_hint: Optional[str] = None) -> Path:
    raw = src.stem
    ext = src.suffix.lower().lstrip(".") or "mkv"
    edition = extract_edition(raw)
    provider_id = extract_provider_id(raw) or extract_provider_id(src.parent.name)
    cleaned = strip_noise(raw)
    # drop the provider-id tag from the title body (we re-emit it below);
    # without this, re-running on canonical input would duplicate the tag
    cleaned = RE_PROVIDER_ID.sub("", cleaned)
    cleaned = normalize_chars(cleaned)
    cleaned = collapse_whitespace(cleaned)
    year = year_hint or extract_year(cleaned) or extract_year(src.parent.name)
    if year:
        # strip "(1982)" first; fall back to one bare "1982" token so that
        # "Blade Runner 1982 Final Cut" does not become "Blade Runner 1982 (1982)"
        cleaned, n = re.subn(r"\s*\(" + re.escape(year) + r"\)", "", cleaned)
        if n == 0:
            cleaned = re.sub(r"(?<!\d)" + re.escape(year) + r"(?!\d)", "",
                             cleaned, count=1)
        cleaned = cleaned.strip()
    # drop edition tokens from the title body (we re-emit them)
    for pat, _ in EDITION_PATTERNS:
        cleaned = pat.sub("", cleaned)
    cleaned = collapse_whitespace(cleaned)
    title = title_hint or smart_title(cleaned)
    if not year:
        raise ValueError(f"cannot determine year for movie: {src}")
    folder_name = f"{title} ({year})"
    if provider_id:
        folder_name += f" {provider_id}"
    file_basename = folder_name
    if edition:
        file_basename += f" - {edition}"
    return src.parent.parent / folder_name / f"{file_basename}.{ext}"


def normalize_tv(src: Path, year_hint: Optional[str] = None,
                 title_hint: Optional[str] = None,
                 schema: str = "tv") -> Path:
    raw = src.stem
    ext = src.suffix.lower().lstrip(".") or "mkv"
    m, season, ep, ep_end = extract_se(raw)
    if m is None or not season:
        raise ValueError(f"no S/E token in TV file: {src}")
    season = f"{int(season):02d}"
    episode = f"{int(ep):02d}"
    episode_end = f"{int(ep_end):02d}" if ep_end else None
    # episode title = text after match, before next bracket
    after = raw[m.end():]
    title_part = re.split(r"[\[\(]", after, maxsplit=1)[0]
    title_part = strip_noise(title_part)
    title_part = normalize_chars(title_part)
    title_part = collapse_whitespace(title_part)
    title_part = re.sub(r"^[\s\-_\.]+", "", title_part)
    episode_title = smart_title(title_part) if title_part else ""
    # show title from parent folder
    show_folder = parent_show_folder(src)
    show_clean = strip_noise(show_folder.name)
    show_clean = normalize_chars(show_clean)
    show_clean = collapse_whitespace(show_clean)
    year = year_hint or extract_year(show_clean) or extract_year(src.parent.name)
    if year:
        show_clean = re.sub(r"\s*\(" + re.escape(year) + r"\).*$", "",
                            show_clean).strip()
    show_clean = re.sub(r"(?i)\s*Season\s*\d+.*$", "", show_clean).strip()
    show = title_hint or smart_title(show_clean)
    if not year:
        raise ValueError(f"cannot determine year for TV show: {show_folder}")
    se_str = f"S{season}E{episode}"
    if episode_end:
        se_str += f"-E{episode_end}"
    file_base = f"{show} ({year}) - {se_str}"
    if episode_title:
        file_base += f" - {episode_title}"
    target_root = show_folder.parent  # e.g. /media/tv
    return target_root / f"{show} ({year})" / f"Season {season}" / f"{file_base}.{ext}"


def normalize_anime_absolute(src: Path, title_hint: Optional[str],
                             abs_num: Optional[int],
                             ep_title: str = "",
                             subdub: Optional[str] = None) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mkv"
    show_folder = parent_show_folder(src)
    show_clean = strip_noise(show_folder.name)
    show_clean = normalize_chars(show_clean)
    show = title_hint or smart_title(collapse_whitespace(show_clean))
    if abs_num is None:
        raise ValueError(f"absolute number required for {src}")
    suffix = f" [{subdub}]" if subdub else ""
    title_str = smart_title(ep_title) if ep_title else ""
    file_base = f"{show} - {abs_num:04d}"
    if title_str:
        file_base += f" - {title_str}"
    file_base += suffix
    return show_folder.parent / show / f"{file_base}.{ext}"


def normalize_musicvideo(src: Path, artist_hint: str, year_hint: str,
                         track_hint: Optional[str] = None,
                         variant: Optional[str] = None) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mp4"
    raw = src.stem
    cleaned = normalize_chars(strip_noise(raw))
    cleaned = collapse_whitespace(cleaned)
    track = track_hint or smart_title(cleaned)
    artist = smart_title(artist_hint)
    suffix = f" [{variant}]" if variant else ""
    return src.parent.parent / artist / f"{year_hint} - {track}{suffix}.{ext}"


def normalize_standup(src: Path, performer: str, title: str, year: str) -> Path:
    ext = src.suffix.lower().lstrip(".") or "mkv"
    folder = f"{performer} - {title} ({year})"
    return src.parent.parent / folder / f"{folder}.{ext}"


# --- Driver --------------------------------------------------------------------

def is_already_canonical(src: Path, target: Path) -> bool:
    return src.resolve() == target.resolve()


def log_op(action: str, src: Path, target: Path):
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    log_file = LOG_DIR / f"{dt.date.today().isoformat()}.log"
    # timezone-aware UTC timestamp, e.g. 2026-05-08T14:23:11Z
    ts = dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{ts} {action} {src} -> {target}\n"
    with log_file.open("a") as f:
        f.write(line)


def main():
    ap = argparse.ArgumentParser(description="canonical filename normalizer")
    ap.add_argument("source", type=Path, help="source file path")
    ap.add_argument("--type", required=True,
                    choices=["movie", "tv", "anime-seasonal",
                             "anime-absolute", "musicvideo", "standup",
                             "extra"])
    ap.add_argument("--year")
    ap.add_argument("--title")
    ap.add_argument("--performer", help="for standup")
    ap.add_argument("--artist", help="for musicvideo")
    ap.add_argument("--track", help="for musicvideo")
    ap.add_argument("--variant", help="for musicvideo")
    ap.add_argument("--abs-num", type=int, help="for anime-absolute")
    ap.add_argument("--ep-title", help="for anime-absolute")
    ap.add_argument("--subdub", choices=["Sub", "Dub"], help="for anime-absolute")
    ap.add_argument("--apply", action="store_true",
                    help="actually move the file (default is dry-run)")
    ap.add_argument("--force", action="store_true",
                    help="overwrite existing target")
    args = ap.parse_args()

    src = args.source.resolve()
    if not src.exists():
        print(f"ERROR: {src} does not exist", file=sys.stderr)
        sys.exit(1)

    try:
        if args.type == "movie":
            target = normalize_movie(src, args.year, args.title)
        elif args.type == "tv":
            target = normalize_tv(src, args.year, args.title, schema="tv")
        elif args.type == "anime-seasonal":
            target = normalize_tv(src, args.year, args.title, schema="anime")
        elif args.type == "anime-absolute":
            target = normalize_anime_absolute(src, args.title, args.abs_num,
                                              args.ep_title or "",
                                              args.subdub)
        elif args.type == "musicvideo":
            target = normalize_musicvideo(src, args.artist or "", args.year or "",
                                          args.track, args.variant)
        elif args.type == "standup":
            target = normalize_standup(src, args.performer or "",
                                       args.title or "", args.year or "")
        else:
            print(f"ERROR: schema '{args.type}' not implemented", file=sys.stderr)
            sys.exit(2)
    except ValueError as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(2)

    if is_already_canonical(src, target):
        print(f"NOOP    {src}")
        sys.exit(0)

    if target.exists() and not args.force:
        print(f"REFUSE  {src} -> {target} (target exists; use --force)")
        sys.exit(2)

    if args.apply:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target))
        log_op("RENAME", src, target)
        print(f"MOVED   {src} -> {target}")
    else:
        print(f"DRY-RUN {src} -> {target}")


if __name__ == "__main__":
    main()
```
|
||
|
||
### 11.1 Usage examples

```bash
# Dry-run a single Futurama episode
./normalize.py --type tv \
  "/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv"

# Output:
# DRY-RUN /home/admin/Downloads/.../Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv
#   -> /home/admin/Downloads/futrama/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv

# Same with --apply, with explicit year and title hints
./normalize.py --type tv --year 1999 --title "Futurama" --apply \
  "/home/admin/Downloads/futrama/Futurama Season 1 [1080p AI x265 10bit FS99 Joy]/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv"

# Movie with edition
./normalize.py --type movie --year 1982 --apply \
  "/home/admin/Downloads/Blade Runner 1982 Final Cut [1080p BluRay x265 RARBG].mkv"

# Stand-up
./normalize.py --type standup --performer "Bo Burnham" --title "Inside" --year 2021 --apply \
  "/home/admin/Downloads/Bo.Burnham.Inside.2021.1080p.NF.WEB-DL.DDP5.1.x264-NTb.mkv"

# Music video
./normalize.py --type musicvideo --artist "Daft Punk" --year 2013 \
  --track "Get Lucky" --apply \
  "/home/admin/Downloads/daft.punk.get.lucky.official.video.1080p.mkv"
```

### 11.2 Idempotency proof

Running the script twice on the same input produces the same target. On the
second run the source is the first run's target, so `is_already_canonical()`
returns true and the script no-ops. Unit tests for this are planned
(`/opt/docker/jellyfin/scripts/test_normalize.py`, to be added in doc 07's
implementation phase); none exist yet.

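
The property above can be illustrated with a toy normalizer. This is only a sketch: `toy_target()` and the `/library/TV` root are hypothetical stand-ins, not the `normalize.py` functions, and the function handles nothing beyond a bracket-tagged `SxxEyy` filename. It shows the shape a future idempotency test could take: feed the first run's output back in and assert that nothing changes.

```python
import re
from pathlib import Path

EPISODE = re.compile(r"S(\d{2})E(\d{2})")   # season/episode marker
TAGS = re.compile(r"\[[^\]]*\]")            # bracketed release tags


def toy_target(src: Path, root: Path, show: str, year: int) -> Path:
    """Hypothetical stand-in for normalize_tv(): build a canonical TV path."""
    stem = TAGS.sub("", src.stem)
    m = EPISODE.search(stem)
    if m is None:
        raise ValueError(f"no SxxEyy marker in {src.name}")
    season, episode = m.group(1), m.group(2)
    title = stem[m.end():].strip(" -")      # text after the marker, if any
    base = f"{show} ({year}) - S{season}E{episode}"
    if title:
        base += f" - {title}"
    return root / f"{show} ({year})" / f"Season {season}" / (base + src.suffix)


root = Path("/library/TV")                  # assumed library root
raw = Path("/downloads/Futurama S01E01 Space Pilot 3000 [1080p x265 10bit Joy].mkv")

first = toy_target(raw, root, "Futurama", 1999)
second = toy_target(first, root, "Futurama", 1999)   # re-run on the output

assert first == second                      # idempotent: second run is a no-op
print(first)
```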
---

## 12. Edge cases catalogue

### 12.1 Episodes with very long titles

```
The Office (2005) - S07E25-E26 - Search Committee.mkv   ← multi-ep, short title, fine
Sherlock (2010) - S04E03 - The Final Problem.mkv        ← long-ish, fine
Steins;Gate (2011) - S01E22 - Being Meltdown - The Concerto Whose Conductor Has Lost His Baton.mkv
```

The third example is 94 characters before the extension. `ext4` allows 255
bytes per filename component, so it fits. Smart title case is applied; there
is no `:` to replace (the long string is the actual title from MyAnimeList
and contains no colon). If a title does contain a colon, it becomes ` - `
per § 5.5, which lengthens the name slightly; no length cap is applied.

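
The limit that matters is bytes, not characters, so titles with diacritics or CJK text use up the budget faster than their visible length suggests. A minimal sketch of such a check follows; `component_bytes()` and `fits_ext4()` are hypothetical helper names, not part of `normalize.py`.

```python
EXT4_NAME_MAX = 255  # ext4 limit per path component, in bytes


def component_bytes(name: str) -> int:
    """Size of one filename component once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits_ext4(name: str) -> bool:
    """True if the component is within the ext4 per-name limit."""
    return component_bytes(name) <= EXT4_NAME_MAX


name = ("Steins;Gate (2011) - S01E22 - Being Meltdown - "
        "The Concerto Whose Conductor Has Lost His Baton.mkv")
print(component_bytes(name), fits_ext4(name))
```

The check applies to each path component separately (show folder, season folder, basename), not to the full path.
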
### 12.2 Episodes with `.` in the title

```
Mr. Robot (2015) - S01E01 - eps1.0_hellofriend.mov.mkv   ← title contains `.mov`
```

`.mov` inside the title is technically a substring that *looks* like a
container type. The parser doesn't care (the extension is `.mkv`, parsed
last). Keep as-is. Smart title case leaves the intentional lowercase
untouched (it is the title's actual stylization).

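
Python's `pathlib` behaves the same way, which is convenient if the script derives the extension from `Path.suffix` (an assumption here; the relevant part of `normalize.py` is not shown in this section). Only the final dot-separated component counts as the suffix:

```python
from pathlib import Path

p = Path("Mr. Robot (2015) - S01E01 - eps1.0_hellofriend.mov.mkv")

print(p.suffix)  # .mkv
print(p.stem)    # Mr. Robot (2015) - S01E01 - eps1.0_hellofriend.mov
```
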
### 12.3 Shows with numeric titles

```
1923 (2022) - S01E01 - 1923.mkv                      ← year-as-title, year-as-disambiguation
24 (2001) - S01E01 - Day 1 - 12-00 AM-1-00 AM.mkv    ← `:` from title became ` - `
```

For `24` and `1923`, year extraction would fail if the show year is missing
from the source name. Passing the year via `--year` is mandatory for these.

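
To see why a hint is needed, consider a naive year heuristic. The regex below is purely illustrative and is not the extraction logic in `normalize.py`; the source filenames are made up. It mistakes the title `1923` for the release year and finds nothing at all for `24`.

```python
import re

# Hypothetical heuristic: first standalone 19xx/20xx token in the name.
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")


def guess_year(name: str):
    """Return the first year-like token, or None if there is none."""
    m = YEAR.search(name)
    return m.group(1) if m else None


print(guess_year("1923.S01E01.1080p.WEB-DL.mkv"))  # 1923 (the title, not the 2022 release year)
print(guess_year("24.S01E01.DVDRip.mkv"))          # None
```
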
### 12.4 Two-part single episodes (multi-part files)

Doc 05 § 2 mentions `Series A S02E03 Part 1.mkv` / `Part 2.mkv`. Canonical:

```
TV/Show (Year)/Season 02/Show (Year) - S02E03 - Title - part 1.mkv
TV/Show (Year)/Season 02/Show (Year) - S02E03 - Title - part 2.mkv
```

Use lowercase `part` (the Jellyfin parser is case-insensitive, but lowercase
is more common in docs).

### 12.5 Source has no episode title

```
Source: Show.S01E01.1080p.WEB-DL.x264-NTb.mkv

Target: Show (Year) - S01E01.mkv
```

Empty episode title → omit. The script does this already (§ 11
`emit_canonical()` checks `if parts.episode_title`). Jellyfin will
backfill the title from TVDB on first scrape.

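
The omit-when-empty rule amounts to one conditional. The sketch below is a simplified illustration, not the real `emit_canonical()`: the `Parts` dataclass and `emit_basename_sketch()` are hypothetical, and the year `2020` is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Parts:
    """Hypothetical container for parsed filename fields."""
    show: str
    year: int
    season: int
    episode: int
    episode_title: str = ""
    ext: str = ".mkv"


def emit_basename_sketch(parts: Parts) -> str:
    """Build the TV basename, appending ' - Title' only when a title exists."""
    base = f"{parts.show} ({parts.year}) - S{parts.season:02d}E{parts.episode:02d}"
    if parts.episode_title:
        base += f" - {parts.episode_title}"
    return base + parts.ext


print(emit_basename_sketch(Parts("Show", 2020, 1, 1)))
# Show (2020) - S01E01.mkv
print(emit_basename_sketch(Parts("Futurama", 1999, 1, 1, "Space Pilot 3000")))
# Futurama (1999) - S01E01 - Space Pilot 3000.mkv
```
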
### 12.6 Source has WRONG episode title

If the rip's episode title differs from TVDB's canonical title (e.g. a
Polish translation of an English-language show, or a non-canonical
sub-group title), prefer the **TVDB title** (English, official). This
requires manual intervention: pass `--ep-title "Canonical Title"` or
edit after the rename. Note that `main()` as listed in § 11 forwards
`--ep-title` only for `--type anime-absolute`; for the other types, edit
the filename after the rename. Not automated.

### 12.7 Dual-audio (sub+dub in one file)

If the mkv has both audio tracks, omit the `[Sub]`/`[Dub]` suffix:

```
Anime/One Piece/One Piece - 0001 - I'm Luffy.mkv   ← dual audio in container
```

The user can pick the audio track from the player. The filename only
needs to disambiguate when *separate files* exist.

### 12.8 Mid-season hiatus / split seasons

Some shows split a season into "Part 1" and "Part 2" (Better Call Saul,
Stranger Things). Treat as **one season**:

```
TV/Stranger Things (2016)/Season 04/
├── Stranger Things (2016) - S04E01 - The Hellfire Club.mkv             ← Vol 1
├── ...
├── Stranger Things (2016) - S04E07 - The Massacre at Hawkins Lab.mkv   ← Vol 1 finale
├── Stranger Things (2016) - S04E08 - Papa.mkv                          ← Vol 2 start
└── Stranger Things (2016) - S04E09 - The Piggyback.mkv                 ← Vol 2 finale
```

TVDB lists S04 as one season, episodes 1-9. The hiatus is invisible to
the parser. Don't create `Season 04 Part 1/`.

---

## 13. Verification checklist (doc 07 will use this)

Before declaring a normalized file "imported":

1. Filename matches the canonical regex for its category (§ 1).
2. No forbidden chars (§ 5.5) in any part of the path.
3. No group tags / quality / codec / source / audio tags in the basename
   (§ 2).
4. Folder structure matches § 1.x for the category.
5. Year is in `(YYYY)` and matches the actual release year (movies/TV).
6. `Season NN/` is zero-padded (TV / anime-seasonal).
7. Episode S/E numbers zero-padded to two digits (three for >99).
8. Smart title case applied to all title-bearing components.
9. Apostrophes are ASCII (`'`), dashes are ASCII (`-`).
10. Diacritics in NFC form (UTF-8 encoded canonically).
11. The script's `is_already_canonical()` returns true on the result —
    re-running the normalizer leaves the file untouched.
12. Audit log line written to `/var/log/jellyfin-imports/<date>.log`.

If any check fails, the file is quarantined per doc 07 to a `_pending/`
subtree for manual review.

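
Checks 9 and 10 are mechanical and can be sketched in a few lines. The helper below is a hypothetical illustration, not part of `normalize.py`; it covers only the punctuation and Unicode-normalization checks and assumes Python 3.8+ for `unicodedata.is_normalized()`.

```python
import unicodedata

# Typographic characters that should have been folded to ASCII ' and -
SMART_PUNCT = "\u2018\u2019\u201c\u201d\u2013\u2014"


def check_basename(name: str) -> list:
    """Return a list of checklist violations (empty list means pass)."""
    problems = []
    if not unicodedata.is_normalized("NFC", name):
        problems.append("not NFC")                        # check 10
    if any(ch in SMART_PUNCT for ch in name):
        problems.append("non-ASCII apostrophe or dash")   # check 9
    return problems


print(check_basename("Futurama (1999) - S01E01 - Space Pilot 3000.mkv"))  # []
print(check_basename("Bender\u2019s Big Score.mkv"))   # ['non-ASCII apostrophe or dash']
print(check_basename("Ame\u0301lie (2001).mkv"))       # ['not NFC']
```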
---

## 14. Quick reference card (for the operator)

| Category | Canonical shape | Example |
|---|---|---|
| Movie | `Movies/T (Y)/T (Y).mkv` | `Movies/Inception (2010)/Inception (2010).mkv` |
| Movie+edition | `Movies/T (Y)/T (Y) - E.mkv` | `Movies/Blade Runner (1982)/Blade Runner (1982) - Final Cut.mkv` |
| Movie+resolution | `Movies/T (Y)/T (Y) - NNNNp.mkv` | `Movies/Blade Runner 2049 (2017)/Blade Runner 2049 (2017) - 2160p.mkv` |
| TV episode | `TV/S (Y)/Season NN/S (Y) - SXXEYY - Title.mkv` | `TV/Futurama (1999)/Season 01/Futurama (1999) - S01E01 - Space Pilot 3000.mkv` |
| TV multi-ep | `... - SXXEYY-EZZ - Title.mkv` | `Futurama (1999) - S01E03-E04 - I, Roommate / Love's Labours.mkv` |
| TV special | `... /Season 00/... - S00EYY - Title.mkv` | `Futurama (1999) - S00E01 - Bender's Big Score.mkv` |
| Anime seasonal | same as TV | `Cowboy Bebop (1998) - S01E01 - Asteroid Blues.mkv` |
| Anime absolute | `Anime/S/S - NNNN - Title [Sub].mkv` | `One Piece - 0001 - I'm Luffy [Sub].mkv` |
| Music video | `MV/A/Y - T.mp4` | `Daft Punk/2013 - Get Lucky.mp4` |
| Stand-up | `Movies/P - T (Y)/P - T (Y).mkv` | `Bo Burnham - Inside (2021)/Bo Burnham - Inside (2021).mkv` |
| Extra (folder) | `<item folder>/<lowercase folder>/Title.mkv` | `featurettes/Welcome to the World of Tomorrow.mkv` |
| Extra (suffix) | `... - Title-featurette.mkv` | `Inception (2010) - Dreams Within Dreams-featurette.mkv` |
| Subtitle | `<basename>.<lang>[.flag].srt` | `Futurama (1999) - S01E01.eng.srt` |

---

## 15. Cross-references

- Doc 05 § 0 — top-level filename rules (forbidden chars, year-in-parens,
  one folder per item).
- Doc 05 § 1.2 — Jellyfin's accepted movie regex.
- Doc 05 § 2.2 — Jellyfin's accepted TV regex (table of patterns).
- Doc 05 § 3.1–3.3 — anime numbering strategies (which we map to § 1.3
  and § 1.4 here).
- Doc 05 § 8 — extras folder names (which we lowercase per § 4.5).
- Doc 03 — sidecar subtitle naming (referenced in § 2.7 and § 14).
- Doc 02 — what the scraper does after the rename, including the
  `RemoteSearch/Apply` recipe to fix mis-matches.
- Doc 07 (sibling) — the operational pipeline (move, dedupe, GC) that
  consumes this ruleset. When doc 07 lands, link from § 13's
  verification checklist into doc 07's quarantine / re-run flow.

---

## 16. Open items / known drift

- Live `/home/user/media/tv/Futurama/` lacks the year — should be
  `Futurama (1999)/`. Migration covered in doc 07.
- The script's TV-title-extraction does not yet handle parent folders
  named `Specials` (mapping to `Season 00`). Workaround: rename the
  folder first, then run normalize. Codify in v2.
- Edition detection priority list has been chosen by frequency-of-rip,
  not by canon. If a future Blade Runner gets a "Workprint Edition"
  release, the list grows.
- No automated tests for `normalize.py` yet — covered by doc 07 once
  that doc lands.

---

End of doc 08. The script in § 11 is the canonical source of truth; this
doc explains it. When in doubt, run `normalize.py --help` and read the
top docstring.