diff --git a/docs/RESEARCH-2B2T-QUEUE.md b/docs/RESEARCH-2B2T-QUEUE.md
new file mode 100644
index 0000000..e508050
--- /dev/null
+++ b/docs/RESEARCH-2B2T-QUEUE.md
@@ -0,0 +1,75 @@
+# Research: 2b2t Queue / Login Gatekeeper
+
+Read-only reference for AuthLimbo v2 design. Last updated 2026-05-07.
+
+## TL;DR
+- **Architecture**: BungeeCord-style proxy plus a separate "queue server" (a stripped-down Minecraft instance acting as a holding world); the main Paper server is gated behind it.
+- **Drain model**: Slow FIFO with a small reserved pool for paid priority — pacing is what protects main from join-flood crashes more than any explicit packet shaper.
+- **Drama**: Almost every controversy (paid priority, veteran-queue removal, prio-strip ban waves) is policy-layer, not technical. Avoid the policies; copy the architecture.
+
+## 1. Architecture
+- Two-tier: **Velocity/Bungee proxy** -> **queue server** (limbo holding JVM) -> **main Paper server**. Queue is its own process, not a plugin on main.
+- Public clones use the same shape: `PistonQueue` (Bungee+Velocity, v4.0.0 Apr 2026, most production-grade), `AnarchyQueue` (Velocity, pairs with `QueueServerPlugin` on the limbo instance), `LeeesBungeeQueue` (archived 2025-04-28, 1.12.2 cap).
+- Queue state is **in-memory** on the proxy; clones don't persist it across restarts. Disconnect = back of line.
+
+## 2. Queue Mechanics
+- Pure FIFO inside each tier. Tiers historically: priority -> veteran -> regular. Today: priority -> regular.
+- Slot allocation: ~200 reserved slots for priority on ~1000-cap main; regular advances only when a non-reserved slot frees.
+- Drain rate is wall-clock, not packet-throttled — 1000-deep regular queue = 6-12h.
+- ETA = naive `position * avg_drain`. This is wrong because priority players take freed slots ahead of regular ones, so a displayed ETA can go *up*.
+
+## 3. AFK + Reconnect
+- 2016 queue: reconnect every ~30s, drove hacked-client adoption. Replaced within a year by limbo-queue with auto-updating position.
+- Main: 15-min idle disconnect. Queue: long-lived TCP; drop = position lost. `2bored2wait` (archived) proxies queue locally for headless waiting.
+
+## 4. Priority Queue
+- Separate FIFO + reserved slot pool. Tier check = permission/uuid lookup on join.
+- Pricing: $19.99/mo originally, now $29/mo via 2b2t.shop.
+- TheCampingRusher held add/remove power on priority + veteran lists; Torogadude incident.
+- Reserved-slot design means a queue can exist even when main isn't full — structurally pay-to-skip.
+
+## 5. Chunk-Load / Crash Mitigation
+- Queue server runs near-empty world; no chunk gen, minimal ticks, absorbs thousands of idle TCP sessions cheaply.
+- Pacing the drain protects main's chunk pipeline; no explicit login-packet shaper beyond letting `PlayerJoinEvent` finish before pulling next.
+- **Nocom (Jul 2018 - Jul 2021)**: an unthrottled `CPacketPlayerDigging` flood on the queue server starved keepalives, forced mass disconnects, and let players skip the queue. Hausemaster's responses: 500 pkt/s late-2019; factor-14 May 2020; factor-8 next day; factor-2 Jul 2021; full patch 2021-07-15. Leijurv's Monte Carlo particle-filter tracker (2020-2021) kept working at 2 checks/s.
+
+## 6. Veteran Tier
+- Whitelist: `joined_before = 2016-06-01`, offline lookup against historical login data.
+- Removed **2017-12-04** explicitly to "increase incentive to buy priority". Trust burned.
+
+## 7. Bot Ecosystem
+- Mineflayer / headless clients sit in queue 24/7 — indistinguishable from a human leaving their client running.
+- Detection: behavior only (instant logout on join, scripted movement). "Good" bot = afk-for-owner; "exploit" bot = multi-account prio-skip or queue-bypass client.
+- For AuthLimbo v2: AFK bots in pre-auth limbo cost ~nothing. Gate at promote-to-main, not join-limbo.
+
+## 8. Failure Modes
+- Nocom-era queue crashes dropped 1000+ waiting players.
+- "Ghost queue" — players queued but TCP dead — caused by keepalive starvation, fixed by rate limits.
+- Recovery: full restart loses all positions. No persisted state.
+
+## 9. Public Clones — Survey
+- **PistonQueue** — Bungee+Velocity, reserved slots, shadow-ban, pre-queue auth, active.
+- **AnarchyQueue** — Velocity, minimal, needs `QueueServerPlugin` companion.
+- **LeeesBungeeQueue** — archived 2025.
+- **Shirodo-Queue**, **eslym/bungee-queue** — toy reimplementations.
+- Common mistakes: in-memory only, no priority-abuse audit log, no rate-limit on queue's own packet handlers (re-creates Nocom-class risk).
+
+## 10. Drama Timeline
+- **2016-06** Rusher influx; queue introduced.
+- **2016-2017** Rusher holds add/remove power on priority + veteran lists.
+- **2017-12-04** Veteran queue removed. Mass quits.
+- **2018-07 / 2021-07** Nocom queue-bypass exploit + tracking.
+- **2022-04** ~40 prio-stripped + banned over a doxxing chain.
+- **2022-12-07** 500+ accounts prio-banned cumulatively; `2builders12rules` discord forms to track strips.
+
+## Drama-Avoidance Principles for AuthLimbo v2
+1. **No paid priority. Ever.** FIFO only; no money-tied reserved slots.
+2. **No hidden-criteria veteran tier.** If seniority exists, rule is public, automated, irrevocable.
+3. **No staff add/remove of queue position.** Admin commands log to append-only audit; no silent privilege.
+4. **Persist queue state.** Position survives proxy restart (SQLite/Redis).
+5. **Rate-limit every packet handler in limbo.** Nocom is the canonical lesson.
+6. **Honest ETA or no ETA.** Position only, or confidence interval — no fake countdowns.
+7. **Privacy-first limbo (AuthLimbo thesis):** new joiners isolated from main-world coords/inventory until AuthMe login completes.
+8. **Bots welcome in limbo, gated at promote.** Don't fight Mineflayer pre-auth.
+9. **Open source the gatekeeper.** Hausemaster's plugin is closed; opacity amplifies drama.
+10. **Document idle/disconnect rules in-game.** No silent kicks.
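Principle 6 and the ETA failure described in §2 both point toward showing a band instead of a countdown. Below is a minimal sketch of that idea. It assumes the gatekeeper records a wall-clock timestamp each time a queued player is promoted. `EtaBand` and its methods are hypothetical names, not taken from any plugin surveyed here. The result is a crude empirical band, not a statistical confidence interval, and it assumes promotion gaps are roughly independent and similar in size:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical helper: turns recently observed promotion gaps into an
 * ETA band (low, high) instead of a single countdown. Not thread-safe.
 */
final class EtaBand {
    private final int window;                      // how many recent gaps to keep
    private final Deque<Long> gapsMillis = new ArrayDeque<>();
    private long lastPromotionMillis = -1;         // -1 = no promotion seen yet

    EtaBand(int window) {
        if (window < 2) throw new IllegalArgumentException("window must be >= 2");
        this.window = window;
    }

    /** Call once each time a queued player is promoted to the main server. */
    void recordPromotion(long nowMillis) {
        if (lastPromotionMillis >= 0) {
            gapsMillis.addLast(nowMillis - lastPromotionMillis);
            if (gapsMillis.size() > window) gapsMillis.removeFirst(); // drop the oldest gap
        }
        lastPromotionMillis = nowMillis;
    }

    /**
     * Returns {low, high} in milliseconds for a player at 1-based queue
     * position, or null when fewer than two gaps have been observed
     * (the UI should then show the position only).
     */
    long[] estimate(int position) {
        if (position < 1) throw new IllegalArgumentException("position is 1-based");
        if (gapsMillis.size() < 2) return null;
        List<Long> sorted = new ArrayList<>(gapsMillis);
        Collections.sort(sorted);
        long p10 = sorted.get(nearestRank(sorted.size(), 0.10));
        long p90 = sorted.get(nearestRank(sorted.size(), 0.90));
        // Crude: scales single-gap percentiles by the number of players ahead.
        return new long[] {position * p10, position * p90};
    }

    /** Nearest-rank percentile index: ceil(q * n) - 1, clamped to [0, n - 1]. */
    private static int nearestRank(int n, double q) {
        int idx = (int) Math.ceil(q * n) - 1;
        return Math.max(0, Math.min(n - 1, idx));
    }
}
```

The gatekeeper would call `recordPromotion` wherever a player leaves the queue. The queue UI would render the band (for example "~30-45s"), falling back to the position alone while `estimate` returns `null`.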
diff --git a/docs/RESEARCH-LIMBO-PLUGIN-SURVEY.md b/docs/RESEARCH-LIMBO-PLUGIN-SURVEY.md
new file mode 100644
index 0000000..5f6b41b
--- /dev/null
+++ b/docs/RESEARCH-LIMBO-PLUGIN-SURVEY.md
@@ -0,0 +1,165 @@
+# RESEARCH — Limbo / Queue / Auth Plugin Survey
+
+Read-only research feeding **AuthLimbo v2**. 2026-05-07.
+
+---
+
+## 1. TL;DR
+
+**Top-3 STEAL** (vendor / shade / depend):
+
+1. **Elytrium LimboAPI** (AGPL-3.0, Velocity) — virtual fake-server
+   primitives at the Velocity packet layer. License-compatible, exactly
+   the abstraction we need for "hold pre-login on the proxy, never let
+   the player touch the Paper world".
+2. **Elytrium LimboAuth** (AGPL-3.0, Velocity) — production auth flow
+   built on LimboAPI. AuthMe-import path, BCrypt+TOTP, weak-password
+   list. We can fork or depend; AGPL == AGPL.
+3. **PistonQueue** (Apache-2.0, Bungee+Velocity+Bukkit) — closest
+   open-source 2b2t-style queue, actively maintained, permissive
+   license (we can shade safely into AGPL).
+
+**Top-3 PATTERN** (read & re-implement):
+
+1. **AnarchyQueue (zeroBzeroT)** — clean Velocity/Paper split, separate
+   queue-server, position-update cadence; small enough to read
+   end-to-end.
+2. **LeeesVelocityQueue** — minimal MIT priority/bypass model; good
+   reference for *non-paid* trust-tier permissions.
+3. **LimboFilter** — anti-bot CAPTCHA + packet-prep tricks; pattern
+   only since AGPL fork would entangle us further.
+
+**Stack decision:** **Velocity + Paper, both required.** Pre-auth
+holding belongs at the proxy (LimboAPI virtual server) — Paper-only
+can't truly hide the world. Paper plugin keeps the post-auth
+chunk-preload + void-guard from current AuthLimbo. See §3.
+
+---
+
+## 2. Per-plugin detail
+
+| Plugin | License | Stack | Last release | Status | Rating |
+|---|---|---|---|---|---|
+| Elytrium LimboAPI | AGPL-3.0 | Velocity | 1.1.26 (2024-09) | Active, slowing | STEAL |
+| Elytrium LimboAuth | AGPL-3.0 | Velocity (LimboAPI) | 1.1.14 (2024-06) | Active | STEAL |
+| Elytrium LimboFilter | AGPL-3.0 | Velocity (LimboAPI) | 1.1.18 (2024-06) | Active | PATTERN |
+| PistonQueue (AlexProgrammerDE) | Apache-2.0 | Velocity+Bungee+Bukkit | 4.0.0 (2026-04) | Very active | STEAL |
+| AnarchyQueue (zeroBzeroT) | custom permissive (no-warranty) | Velocity | 3.0.13 (2025-10) | Active | PATTERN |
+| LeeesVelocityQueue | MIT | Velocity | 1.0.1 (2025-07) | Light, alive | PATTERN |
+| ajQueue | GPL-3.0-only | Velocity+Bungee+Paper | active 2.x | Active | PATTERN (GPL-3.0 → AGPL-3.0 is one-way compatible) |
+| McMackety/velocity-queue | GPL-3.0 | Velocity (Kotlin) | 1.1.2 (2021-06) | **Archived** | SKIP |
+| Shirodo-Queue | MIT | Bungee | none | Hobby | SKIP |
+| ProjectPersistence/queue | n/a | mixed | n/a | **404** | SKIP |
+| NanoLimbo (Nan1t) | GPL-3.0 | standalone+proxy fwd | 1.12.0 (2026-04) | Active | PATTERN (no auth/queue, but reference impl) |
+| NanoLimboPlugin (bivashy) | GPL-3.0 | Velocity+Bungee | 1.8.1 (2024-06) | Maintenance | PATTERN |
+| AuthMe-Reloaded | GPL-3.0 | Spigot/Paper/Folia/Bungee/Velocity | 5.7.0 (2026-04) | Active | KEEP (current dep, not a v2 base) |
+| kennytv/Maintenance | GPL-3.0 | Paper/Bungee/Velocity/Sponge | active | Active | PATTERN (motd + whitelist gate UX) |
+| EaglerProxy | n/a | JS shim | active | Off-target | SKIP — not our threat model |
+| TitanProxy | closed-source | n/a | n/a | n/a | SKIP |
+
+Notes:
+- **NanoLimbo ≠ NanoLimboPlugin.** Former is a standalone Netty
+  server; latter wraps it as a proxy plugin. Neither does auth.
+- **ProjectPersistence/queue** URL 404'd; treat as dead.
+- **McMackety/velocity-queue** archived 2021-08; Kotlin code is
+  readable but do not depend.
+
+---
+
+## 3. Recommended architecture for AuthLimbo v2
+
+```
+client ──► Velocity proxy ──► [LimboAPI virtual server: auth + queue]
+                                   │
+                                   ▼  (only after auth+queue cleared)
+                             Paper backend ──► [auth-limbo Paper plugin:
+                                                chunk-preload, void-guard,
+                                                inventory snapshot]
+```
+
+### Velocity side (new module `auth-limbo-velocity`)
+
+- **Depend:** `com.velocitypowered:velocity-api:3.4.x`
+- **Depend (compileOnly):** `net.elytrium:limboapi-api:1.1.26`
+  (AGPL — fine, we are AGPL). The LimboAPI plugin itself is installed
+  on the proxy rather than shaded into ours.
+- **Vendor / fork:** parts of `LimboAuth` for the auth state-machine
+  (BCrypt verify against AuthMe schema, TOTP, weak-password list). Do
+  not pull the H2/MySQL stack — read AuthMe's existing SQLite directly
+  to keep one source of truth.
+- **Queue logic:** port PistonQueue's `QueueListener` + position
+  ticker (Apache-2.0 code can be incorporated into an AGPL work as long
+  as its license and NOTICE are kept). Strip its paid tiers; replace
+  with permission-based trust tiers
+  (`authlimbo.priority.trusted`, `.regular`, no `.donor`).
+- **Anti-bot:** PATTERN from LimboFilter — client-brand check + join
+  rate-limit; skip the CAPTCHA for now (UX cost too high for a
+  small server).
+
+### Paper side (existing `auth-limbo` plugin, becomes `auth-limbo-paper`)
+
+- Keep current chunk-preload + void-world generator.
+- Land ROADMAP F1 (void-damage guard), F2 (TP retry), F3 (3×3
+  preload), F5 (inventory snapshot) — these are *post-auth* defences
+  and remain Paper-side.
+- Drop responsibility for "hide world pre-auth" — Velocity holds it
+  now.
+
+### Shared
+
+- Plugin-message channel `authlimbo:handshake` carries `{uuid,
+  trust-tier, reconnect-token}` Velocity → Paper so the Paper side
+  knows the player already passed auth+queue and skips its own login
+  gate.
+
+### Maven coords
+
+`net.elytrium:limboapi-api:1.1.26` (AGPL, compileOnly),
+`com.velocitypowered:velocity-api:3.4.0` (MIT),
+`AlexProgrammerDE/PistonQueue` release `4.0.0` (GitHub source, Apache-2.0, study only),
+`io.papermc.paper:paper-api:1.21.11-R0.1-SNAPSHOT` (GPL-3.0, compileOnly).
+
+---
+
+## 4. License compatibility matrix
+
+Outbound: AuthLimbo v2 = **AGPL-3.0**. Inbound combinations:
+
+| Source license | Compatible direction | Action |
+|---|---|---|
+| AGPL-3.0 (LimboAPI/Auth/Filter) | bidirectional | depend or shade freely |
+| GPL-3.0 (NanoLimbo, ajQueue, AuthMe, Maintenance) | one-way (GPL → AGPL ok) | depend; cannot upstream patches without coordination |
+| Apache-2.0 (PistonQueue) | one-way (permissive → AGPL) | shade or copy with NOTICE |
+| MIT (LeeesVelocityQueue, Shirodo) | one-way | shade or copy with attribution |
+| Custom no-warranty (AnarchyQueue) | unclear | **read code, do not vendor**; re-implement |
+| Closed (TitanProxy, EaglerProxy logic) | n/a | skip |
+
+AGPL §13 invariant: if we ship a network service modified from
+LimboAuth, source must be offered. Forgejo `git.s8n.ru` already
+satisfies this for our fleet.
+
+---
+
+## 5. Risks
+
+1. **Elytrium upstream slowdown** — last release mid-2024. Pin to
+   tag, plan soft-fork at git.s8n.ru for 1.21.11+ protocol fixes.
+2. **AGPL §13** — modified network deploys need a source link. A
+   footer link plus `/authlimbo source` covers it.
+3. **PistonQueue size** — selective copy beats shading whole jar.
+4. **AnarchyQueue licence ambiguity** — no-warranty header not OSI;
+   read-only.
+5. **Velocity↔Paper handshake** is a new failure mode; need
+   integration test before deploy.
+6. **No CAPTCHA** = bot-flood exposure. Acceptable for small private
+   server; revisit if we open up.
+7. **Reconnect token storage** (SQLite vs in-memory) still pending.
+
+---
+
+## 6. Sources
+
+Elytrium/{LimboAPI,LimboAuth,LimboFilter}, Nan1t/NanoLimbo,
+bivashy/NanoLimboPlugin, AlexProgrammerDE/PistonQueue,
+zeroBzeroT/AnarchyQueue, XeraPlugins/LeeesVelocityQueue,
+McMackety/velocity-queue (archived), ShirodoBurak/Shirodo-Queue,
+AuthMe/AuthMeReloaded, kennytv/Maintenance, modrinth/ajqueue.
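§3 describes the `authlimbo:handshake` payload only as `{uuid, trust-tier, reconnect-token}`. Below is a minimal encoding sketch in plain Java. The version byte and the HMAC-SHA256 tag are assumptions, not part of the survey; the tag is keyed with a secret shared by proxy and backend.

The tag is there because Velocity, by default, forwards client-sent plugin messages to the backend unless a proxy plugin marks them as handled. The Paper side therefore should not trust an unsigned message on this channel. `Handshake` and `HandshakeCodec` are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.UUID;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical payload for the authlimbo:handshake channel (Velocity -> Paper). */
final class Handshake {
    final UUID uuid;
    final String trustTier;       // e.g. "trusted" or "regular"
    final String reconnectToken;

    Handshake(UUID uuid, String trustTier, String reconnectToken) {
        this.uuid = uuid;
        this.trustTier = trustTier;
        this.reconnectToken = reconnectToken;
    }
}

/** Hypothetical codec: versioned binary body followed by a 32-byte HMAC-SHA256 tag. */
final class HandshakeCodec {
    private static final int VERSION = 1;   // assumed format version
    private static final int TAG_LEN = 32;  // HMAC-SHA256 output length in bytes
    private final SecretKeySpec key;

    HandshakeCodec(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, "HmacSHA256");
    }

    byte[] encode(Handshake h) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(VERSION);
            out.writeLong(h.uuid.getMostSignificantBits());
            out.writeLong(h.uuid.getLeastSignificantBits());
            out.writeUTF(h.trustTier);
            out.writeUTF(h.reconnectToken);
            out.flush();
            byte[] body = buf.toByteArray();
            byte[] framed = Arrays.copyOf(body, body.length + TAG_LEN); // room for the tag
            System.arraycopy(hmac(body), 0, framed, body.length, TAG_LEN);
            return framed;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. a string longer than writeUTF allows
        }
    }

    /** Returns null for short, tampered, wrongly keyed, or unknown-version messages. */
    Handshake decode(byte[] framed) {
        if (framed.length <= TAG_LEN) return null;
        byte[] body = Arrays.copyOfRange(framed, 0, framed.length - TAG_LEN);
        byte[] tag = Arrays.copyOfRange(framed, framed.length - TAG_LEN, framed.length);
        if (!MessageDigest.isEqual(hmac(body), tag)) return null; // constant-time compare
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            if (in.readUnsignedByte() != VERSION) return null;
            UUID uuid = new UUID(in.readLong(), in.readLong());
            return new Handshake(uuid, in.readUTF(), in.readUTF());
        } catch (IOException e) {
            return null; // truncated body
        }
    }

    private byte[] hmac(byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(body);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

On the Paper side, a `null` from `decode` would mean "not vouched for by the proxy", and the player stays gated. The tag does not prevent replay, so the reconnect token still needs to be single-use or short-lived. That ties into risk 7 above.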
diff --git a/docs/V2-ARCHITECTURE.md b/docs/V2-ARCHITECTURE.md
new file mode 100644
index 0000000..b868555
--- /dev/null
+++ b/docs/V2-ARCHITECTURE.md
@@ -0,0 +1,484 @@
+# AuthLimbo v2 — Architecture
+
+Status: **Design draft** (no code). Drafted 2026-05-07 by the auth-limbo
+v2 design pass after the YOU500 / second-player void-death incidents.
+Audience: operator (P) and future contributors.
+
+Companion docs:
+- [`AUDIT-2026-05-07.md`](../AUDIT-2026-05-07.md) — root-cause forensics.
+- [`ROADMAP.md`](../ROADMAP.md) — v1.x tracking (F1-F7).
+- [`V2-ROADMAP.md`](V2-ROADMAP.md) — milestones M0-M5 for v2.
+
+---
+
+## 1. Why v2
+
+v1 is a single-jar Paper plugin glued onto AuthMe. It works *most* of
+the time, but its core failure modes are now well-understood and can't
+be patched away inside the v1 design:
+
+| v1 limitation | v2 must address |
+|---------------|------------------|
+| Player object exists on the main server *before* auth — coords/inventory technically readable from memory by buggy plugins, world chunk activity is observable. | Strong isolation: limbo is the only state the player can touch pre-auth. |
+| Restore relies on AuthMe firing `LoginEvent`. AuthMe's own broken teleport runs in the same window — F4 pre-empts it but the design still races. | Authoritative state machine that doesn't trust AuthMe's teleport at all. |
+| Inventory loss on transit-death depends on F1 + F5 holding. There is no inventory-of-record outside live game state. | Snapshot-on-pre-login + snapshot-restore is a first-class subsystem, not a defensive add-on. |
+| No metrics, no audit log, no admin alerting. Bugs only surface when a player loses gear. | Built-in observability: Prometheus + JSON-Lines audit + Discord webhook. |
+| No queue / login-throttle. If 50 bots connect at once, AuthMe stalls. | Bounded concurrency with transparent FIFO and trust tiers (NOT pay tiers). |
+
+v2 is a clean break (`v2.0.0`), not a v1 patch. v1 keeps receiving F3,
+F5, F6, F7 backports for as long as racked.ru still runs the old jar.
+
+---
+
+## 2. Stack decision — **Paper-only**, with a Velocity-ready seam
+
+**Recommendation: Paper-only single-server plugin for v2.0.0.**
+Velocity mode is deferred to v2.x, behind a feature flag.
+
+### Reasoning
+
+racked.ru today is one Purpur 1.21.11 server in the `minecraft-mc` itzg
+container on nullstone. There is no Velocity / BungeeCord, no second
+backend, no Forced Hosts, no proxy network. Adding Velocity to ship a
+gatekeeper plugin would mean:
+
+- standing up a new container, opening a new public port (or keeping
+  25565 on the proxy and 25566 internal),
+- migrating the 12+ existing Paper plugins through the velocity-paper
+  bridge contract for chat / commands / placeholders,
+- new TLS / RCON / proxy-protocol surface to harden,
+- breaking changes to AuthMe's data flow (proxy-side login flow vs
+  paper-side `AuthMeAsyncPreLoginEvent`),
+- one more thing for the operator to babysit.
+
+The privacy property the operator cares about — *no other player sees
+pre-auth coords / inventory* — is achievable on Paper-only via a
+strictly isolated limbo world + audience scoping (see §4). Velocity adds
+*stronger* isolation (player never reaches the backend at all) but the
+incremental privacy gain is small for a 0-10 player community, and the
+operational cost is large.
+
+### When Velocity becomes worth it
+
+Codify trip-wires up front so the decision isn't dragged out:
+
+1. racked.ru splits into ≥2 backends (e.g. `survival` + `creative`) —
+   you need a proxy anyway.
+2. cobblestone server comes online and shares an account/auth pool.
+3. Botting attempts cross 100 connections / minute and `connection-throttle`
+   plus `firewalld rate-limit` are no longer enough. Velocity + a queue
+   plugin (ajQueue / LeeesVelocityQueue) become operationally cheaper than
+   chasing botnets at the application layer.
+
+Until one of those trips, Paper-only is the right answer.
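Trip-wire 3 needs a measured number before it can trip. Below is a minimal sketch of a sliding-window counter that could supply it. It assumes each connection attempt is recorded from a pre-login hook such as Paper's `AsyncPlayerPreLoginEvent`; that event runs off the main thread, hence `synchronized`. It also assumes timestamps come from a monotonic clock. The class name is hypothetical, and the 60 s / 100 values simply mirror the trip-wire:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical monitor for trip-wire 3: counts connection attempts inside a
 * sliding time window and reports when the count exceeds a threshold.
 */
final class ConnectionRateTripwire {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> attempts = new ArrayDeque<>(); // timestamps, oldest first

    ConnectionRateTripwire(long windowMillis, int threshold) {
        if (windowMillis <= 0 || threshold < 1) throw new IllegalArgumentException();
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Records one attempt; returns true if the window now holds more than threshold attempts. */
    synchronized boolean recordAttempt(long nowMillis) {
        attempts.addLast(nowMillis);
        evictExpired(nowMillis);
        return attempts.size() > threshold;
    }

    /** Number of attempts still inside the window (now - windowMillis, now]. */
    synchronized int count(long nowMillis) {
        evictExpired(nowMillis);
        return attempts.size();
    }

    // Timestamps are assumed non-decreasing, so expired entries sit at the head.
    private void evictExpired(long nowMillis) {
        while (!attempts.isEmpty() && attempts.peekFirst() <= nowMillis - windowMillis) {
            attempts.pollFirst();
        }
    }
}
```

The deque holds one entry per attempt inside the window, so its memory grows with the flood it is measuring. A per-second bucketed counter would cap that if the numbers ever get large. Whether a trip pages the operator or reopens the Velocity question stays a human decision.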
+### The Velocity-ready seam
+
+v2 internal API is split into two layers so the proxy migration is
+mechanical:
+
+```
++-------------------------------+    +-------------------------------+
+| Gatekeeper (proxy or paper)   |    | Restore (paper only)          |
+| - accept connection           |    | - read snapshot               |
+| - check ban / rate limit      |    | - chunk preload               |
+| - hold in limbo / queue       |    | - authoritative TP            |
+| - hand off on auth-success    |    | - publish metrics             |
++--------------+----------------+    +-------------------------------+
+               | hand-off event (UUID, target Location, source IP)
+               v
+```
+
+In v2.0 both layers live in the Paper plugin and the hand-off is just a
+local method call. In a future "v2-velo" both layers split: gatekeeper
+runs as a Velocity plugin, restore stays on Paper, hand-off becomes a
+plugin-message channel. No code outside those two layers needs to
+change.
+
+---
+
+## 3. Queue model — login-throttle + transparent trust tiers, NO 2b2t-style sale
+
+**For 0-10 player normal load: queue depth is always 0 and players
+never see "queued" UI. The queue exists for crisis scenarios (bot
+flood, restart drain, AuthMe DB stall) and to define explicit policy
+even if it's rarely hit.**
+
+### Policy
+
+| Tier | Definition | Effect |
+|------|-----------|--------|
+| `staff` | Player has `authlimbo.queue.priority.staff` permission (LP-managed). | Always passes. Bypasses queue entirely. |
+| `returning` | Player is in AuthMe DB AND has logged in within last 30 days. | Default tier for everyone who isn't new. Normal FIFO ordering by connect-time. |
+| `new` | Player is NOT in AuthMe DB OR last seen >30 days ago. | Same FIFO as `returning` BUT with a per-IP 1/minute throttle. Slows single-IP bot floods. |
+| `flagged` | Player IP matches a Pi-hole/CrowdSec/abuse-DB block. | Rejected at gatekeeper, never enters the queue. |
+
+Hard rules — written into `V2-ARCHITECTURE.md` so they outlive any one
+operator's mood:
+
+1. **No paid priority. Ever.** No "priority queue pass", no
+   "supporter rank skip", no Patreon tier. 2b2t's paid-priority grift
+   burned its community's trust; we don't repeat it.
+2. **No hidden veteran tier.** Every tier is documented in this file
+   and in `/authlimbo queue policy` in-game. If a player can't see why
+   they're in tier X, the tier is illegitimate.
+3. **No in-game bidding / griefing for queue spots.** Queue position
+   is purely connect-time + tier; no player action affects it.
+4. **Ops-staff bypass is logged.** Every staff bypass writes a JSON-L
+   audit row.
+
+### Capacity
+
+- `gatekeeper.max-concurrent-auth: 5` — at most 5 players in the
+  pre-auth limbo at once. Defaults sized for racked.ru. AuthMe DB
+  reads + chunk pins per concurrent player are roughly free, but bound
+  it anyway.
+- `gatekeeper.max-queue-depth: 50` — beyond 50 waiting, new
+  connections get a "server is starting up, try again in 30s" kick.
+  Better UX than a 5-minute black-screen wait.
+- `gatekeeper.queue-timeout-seconds: 120` — anyone in the queue >2
+  minutes gets the same kick + a Discord webhook fires.
+
+### What queue UX looks like
+
+In limbo, a `BossBar` (Adventure API) shows tier + position:
+
+```
+[returning]  Queue position: 3 / 7  ETA: ~15s
+```
+
+When position == 0 and AuthMe accepts, the bar disappears. There's no
+hidden state. `/queue` in-chat re-displays the same info.
+
+---
+
+## 4. Privacy isolation
+
+This is the original feature; v2 must not regress it.
+
+### Limbo world
+
+- Separate Bukkit world `auth_limbo`, `Environment.THE_END`,
+  `VoidGenerator`. Same as v1.
+- `keepSpawnInMemory=true`. Game-rules: no daylight, no weather, no
+  mobs, no fire-tick, no PvP, `doImmediateRespawn=true`,
+  `keepInventory=true` (defence-in-depth — limbo never *should* see a
+  death event but if it does, no item drops happen).
+- Per-player view-distance forced to 2 in limbo via Paper's
+  `Player#setViewDistance`. They see 5x5 chunks, all empty.
+- Limbo platform: 5x5 of `BARRIER` blocks at y=127, single block of
+  `BARRIER` ceiling at y=129 to prevent flying out. y=0..126 and
+  y=130+ are pure void.
+
+### Adventure-API audience scoping
+
+Paper `AsyncChatEvent` listener at `EventPriority.HIGHEST`, filtering
+the event's `viewers()` set:
+
+- If sender is in main worlds, the viewer set is filtered: anyone
+  whose `World#getName().equals("auth_limbo")` is dropped. Pre-auth
+  players never see overworld chat.
+- If sender is in limbo (would normally not chat — AuthMe blocks it
+  — but defence in depth), the viewer set is reduced to *only* the
+  sender. They cannot leak messages to the main world.
+- `PlayerJoinEvent` join messages are suppressed for
+  `auth_limbo`-spawn joins. Main world only sees a join announcement
+  *after* the authoritative restore TP succeeds (V2-ROADMAP.md M2,
+  "join-message shifting").
+
+### Tablist scoping
+
+Hook `PaperPlayerListEntryEvent` (or fall back to
+`PlayerJoinEvent` + `Player#hidePlayer`):
+
+- Limbo players are hidden from main-world tablist.
+- Main-world players are hidden from limbo tablist.
+- Limbo players cannot see each other (each limbo player sees only
+  themselves).
+
+### What main world observers can detect
+
+After scoping:
+
+- They cannot see the player's name in tablist pre-auth.
+- They cannot see chat from the player.
+- They cannot see the player's world or coordinates (AuthMe blocks
+  movement output anyway, but we don't rely on it).
+- They CAN see the connection event in server logs (operator-only).
+- They can see "PLAYER joined the game" only AFTER restore succeeds
+  — join message is shifted to fire on restore-success, not on
+  initial connect.
+
+This matches the v1 privacy posture and tightens the join-message
+leak.
+
+---
+
+## 5. Login flow — explicit state machine
+
+```
+[CONNECT] ---throttle ok---> [GATE]
+    |                          |
+    | failed throttle / ban    |
+    v                          v
+[REJECTED]                 [SNAPSHOT]  <-- read AuthMe DB,
+                               |           dump current inventory + xp + loc
+                               v           to plugins/AuthLimbo/snapshots/.nbt
+                            [LIMBO]
+                               |
+                       AuthMe /login ok
+                               |
+                               v
+                           [PRELOAD]   <-- 3x3 chunk pin around target
+                               |
+                               v
+                           [RESTORE]   <-- teleportAsync, retry up to 3
+                               |
+                         +-----+-----+
+                         |           |
+                      success     fail x3
+                         |           |
+                         v           v
+                      [LIVE]   [SPECTATOR-AT-LIMBO + admin alert]
+```
+
+Each transition has:
+
+1. **Trigger event** (e.g. `LoginEvent` MONITOR).
+2. **Pre-conditions** (e.g. UUID in `pendingTransit`).
+3. **Side-effects** (e.g. metric counter, audit-log row).
+4. **Failure handler** (next state on error).
+
+States persist in `plugins/AuthLimbo/state/.json` so a plugin
+crash mid-flow can resume on rejoin. State file is deleted on
+[LIVE] entry.
+
+### Snapshot subsystem
+
+**This is the layer that must survive every other bug.**
+
+- On `AuthMeAsyncPreLoginEvent` (player just connected, NOT yet
+  auth'd): if a player file `world/playerdata/.dat` exists,
+  read it and shadow-copy to `plugins/AuthLimbo/snapshots/.nbt`
+  with timestamp. SHA-256 of file content is logged.
+- `/authlimbo restore ` can roll back any restore by
+  feeding the snapshot through nbtlib (same as the void-death recovery
+  protocol from `feedback_mc_tp_safety.md`).
+- Snapshots retained 7 days, then GC'd. Configurable.
+- On `PlayerDeathEvent` while UUID in `pendingTransit`:
+  `event.setKeepInventory(true)`, `event.getDrops().clear()`, log SEVERE,
+  trigger Discord webhook, schedule restore-from-snapshot on respawn.
+
+### Restore step (replaces v1's `doTeleport` + 10-tick delay)
+
+1. Read saved location from AuthMe DB (cached from pre-login —
+   single in-memory hashmap keyed by UUID, evicted on transit clear).
+2. Compute 3x3 chunk grid centred on saved location.
+3. `addPluginChunkTicket` on all 9 chunks.
`CompletableFuture.allOf(getChunkAtAsyncUrgently x9)` — wait for + all 9 to actually be loaded, not just the centre one (closes the + "loaded but neighbour unloaded" race). +5. `teleportAsync(saved, PLUGIN)`. If `false`: F2 retry loop (already + in v1.1.0, carries over). +6. On success: 5-tick delay, then verify + `player.getLocation().distance(saved) < 2.0`. If not, treat as a + silent failure → retry. +7. Release tickets 5s post-success. +8. Mark transition [LIVE], publish `auth_login_success_total` + metric, write audit-log row, send delayed join-message to main + world, clear snapshot. + +### F8 — drop the SPECTATOR pre-TP trick + +v1 considered "set GameMode.SPECTATOR before TP, revert after". v2 +does NOT do this — spectator mode has its own client-side render races +on chunk-load and silently swallows damage events that the F1 guard +*needs to see*. Instead: invariant-driven recovery (snapshot + retry + +admin alert) is the safety net. SPECTATOR is the final fallback after +3 failed retries (F6 in v1, kept for v2). + +--- + +## 6. Anti-drama checklist (2b2t lessons) + +Codified up-front so future "monetisation" pressure is rejected by +reference, not by argument. + +- [x] No pay-to-skip. Tier list above is the entire policy. +- [x] No hidden tier or undocumented bypass (staff bypass is logged). +- [x] No queue spot trading / selling. +- [x] No "queue position visible to others" — your position is only + visible to you. No social pressure surface. +- [x] Queue is purely FIFO + tier; no algorithm tweaks, no "lottery". +- [x] AGPL-3.0 means anyone can fork and self-host an alt + gatekeeper if they distrust ours. Operator-friendly. +- [x] Audit log is local-file JSON-L, not phoned home, not + centralised. Operator-readable, no hidden telemetry. + +--- + +## 7. 
Operational surface + +### Metrics (Prometheus) + +Exposed via embedded HTTP server bound to `127.0.0.1:9091` (loopback +only — Prometheus on nullstone scrapes via localhost): + +| Metric | Type | Labels | +|--------|------|--------| +| `authlimbo_connections_total` | counter | `tier`, `outcome={accepted, queued, rejected}` | +| `authlimbo_queue_depth` | gauge | — | +| `authlimbo_login_success_total` | counter | `tier` | +| `authlimbo_login_fail_total` | counter | `reason={timeout, authme_db, tp_failed_3x, ...}` | +| `authlimbo_void_damage_blocked_total` | counter | — | +| `authlimbo_snapshot_restored_total` | counter | — | +| `authlimbo_restore_duration_seconds` | histogram | `tier` | + +Trip-wire alerts (configured server-side, in +`prometheus/alerts.yml`, not in the plugin): + +- `authlimbo_login_fail_total{reason="tp_failed_3x"}` rate > 0 for 5m. +- `authlimbo_void_damage_blocked_total` rate > 0 for 1m. +- `authlimbo_queue_depth` > 10 for 5m. + +### Discord webhooks + +Plugin-side webhook fires on: + +- Snapshot restored (gear was about to be lost). +- 3x retry give-up (manual `/authlimbo tp` needed). +- Queue depth > config threshold. +- AuthMe DB unreachable. +- Plugin reload / crash. + +Webhook URL is in config, redacted from `/authlimbo dump`. + +### Audit log + +`plugins/AuthLimbo/audit.log` — JSON Lines, one row per state +transition. Fields: `ts`, `uuid`, `name`, `ip`, `tier`, `state`, +`prev_state`, `extra` (free-form JSON). Logrotate-compatible; rotates +at 100MB, keeps 7 files. + +### Reload-without-restart + +`/authlimbo reload`: + +- Re-reads `config.yml`. +- Drains in-flight transits to completion (no new joins accepted + during drain, max 30s wait). +- Re-binds metrics HTTP server if port changed. +- Re-creates limbo world if name/spawn changed. +- Discord webhook fires "reload completed in Xs". + +--- + +## 8. 
Failure modes & recovery
+
+| Failure | Detection | Recovery |
+|---------|-----------|----------|
+| Plugin crashes mid-restore | On startup, scan `state/*.json` files older than 30s. | For each: if the player is offline, leave the snapshot; if online, treat it as a new transit and force a fresh restore from the saved AuthMe location. |
+| Snapshot file corrupt / unreadable | NBT parse exception. | Fall back to the AuthMe DB saved location; log SEVERE; fire webhook. Player may lose the newest items but not the entire inventory. |
+| World save corrupts | Paper `World#getChunkAtAsync` throws. | After 3 retries: kick the player with "server experiencing storage issue, try again in 5min"; fire webhook. |
+| AuthMe DB unreachable | JDBC `getConnection` throws, or a read times out after 5s. | **Fail closed.** Reject the connection at the gatekeeper with the kick message "auth service degraded". Log + webhook. Do NOT let the player onto the main world without auth. |
+| Server `/stop` mid-login window | Paper shutdown hook. | `clearTransit` for all UUIDs, force-save snapshots, kick all limbo players with "server restarting, your gear is safe". |
+| Race: AuthMe `LoginEvent` fires twice (HaHaWTH bug) | UUID already in `pendingTransit` and not in `RESTORE` state. | Idempotent: the restore handler is a no-op if the UUID is past [PRELOAD]. Log INFO. |
+| Player disconnects in [LIMBO] | `PlayerQuitEvent`. | Clear the `pendingTransit` entry and the retry counter. Snapshot retained 7d. State file kept until snapshot GC. |
+
+`fail-open` is never the right choice for an auth gatekeeper. Every
+failure mode resolves to one of two outcomes: keep the player in limbo,
+or kick them. Never advance an unauthenticated player to the main world.
+
+---
+
+## 9. Migration from v1
+
+In-place upgrade path (`v1.1.x` → `v2.0.0`):
+
+1. Stop the server.
+2. Drop the new jar in `plugins/`. The v2 jar is not v1-compatible: the old
+   `AuthLimbo-1.x.jar` must be removed.
+3. v2 detects `plugins/AuthLimbo/config.yml` from v1 and rewrites it
+   to the v2 schema, leaving a `config.v1.bak` backup.
+4. v2 detects the `auth_limbo` world dir on disk and reuses it (no
+   recreation, no data loss).
+5. AuthMe DB schema is unchanged: v2 still treats `authme.db` as
+   read-only and authoritative.
+6. New: `plugins/AuthLimbo/snapshots/` and
+   `plugins/AuthLimbo/state/` directories are created, owned by the same
+   uid as the itzg container's runtime user.
+7. Start the server. v2 startup logs walk through the migration steps.
+
+There is no DB migration and no mandatory player action. Permission
+node names change (`authlimbo.admin` is now
+`authlimbo.command.admin`, etc.), so the operator must update LuckPerms groups
+(noted in CHANGELOG).
+
+---
+
+## 10. Test plan
+
+### Unit (JUnit 5 + Mockito)
+
+- `LimboWorldManager`: barrier-platform construction is idempotent.
+- `AuthMeDatabase.getQuitLocation`: returns a `Location` for a present row,
+  null for an absent row, null for a malformed row.
+- `Snapshot.serialize` / `deserialize` round-trip.
+- State machine: every transition is rejected from an invalid previous state.
+
+### Integration (Paper test-server harness)
+
+- Stand up Paper 1.21.x in CI (Forgejo Actions runner on nullstone).
+- Mock AuthMe via a stub plugin that fires `AuthMeAsyncPreLoginEvent`
+  and `LoginEvent` programmatically.
+- Test scenarios: §5.1-5.6 from `AUDIT-2026-05-07.md` plus the
+  v2-specific ones: queue overflow, snapshot restore on death,
+  reload-without-restart, fail-closed on AuthMe DB down.
+
+### Stress (Bot flood)
+
+- 1000 fake connections in 60s using mineflayer or
+  [`MCBotsPro`](https://github.com/Sammy1Am/MCBotsPro). Verify:
+  - queue depth stays bounded (the gatekeeper kicks beyond `max-queue-depth`);
+  - no `pendingTransit` leak (size returns to 0 afterwards);
+  - metrics counters are consistent with the audit log.
+
+### Chaos
+
+- Kill the plugin (`/plugman unload AuthLimbo`) mid-restore and verify
+  state recovery on rejoin.
+- `iptables -A OUTPUT -d -j DROP` and verify
+  fail-closed behaviour.
+- `kill -9` the itzg container during transit and verify that the next startup
+  walks `state/*.json` and recovers.
+
+---
+
+## 11. Versioning + release
+
+- v2.0.0 = breaking redesign (this doc), AGPL-3.0 retained.
+- v2.1.0 = polish (BossBar UX, /queue command, more metrics).
+- v2.2.0 = Velocity mode behind a feature flag.
+- v1.x receives F3, F5, F6, F7 backports until racked.ru cuts over
+  to v2; it is then archived.
+
+Codename coordination: when the codename migration completes
+(onyx→obsidian, nullstone→bedrock per
+`gravel-laptop-build/ROADMAP.md`), the racked.ru server moves to
+bedrock. v2.0.0 must run under both naming schemes without config drift.
+
+---
+
+## 12. Open questions
+
+- BossBar UI: does the operator want it visible to limbo players, or
+  silent? Default proposed: visible.
+- Snapshot retention: 7 days is the proposed default. Storage cost
+  is ~1 KB/snapshot for vanilla inventories, up to ~50 KB for
+  shulker-stuffed players. 100 active players → ~5 MB max if each keeps
+  one snapshot (every login within the retention window adds another).
+- Webhook destination: the same Discord channel as `s8n-ru` server-status
+  alerts, or a new channel? Default proposed: same channel, prefixed
+  `[AuthLimbo]`.
+- v2.2 Velocity migration: needs a separate design pass once
+  cobblestone or a second backend exists.
+
+Sign-off pending operator review.
diff --git a/docs/V2-ROADMAP.md b/docs/V2-ROADMAP.md
new file mode 100644
index 0000000..a38e325
--- /dev/null
+++ b/docs/V2-ROADMAP.md
@@ -0,0 +1,309 @@
+# AuthLimbo v2: Roadmap (M0-M5)
+
+Companion to [`V2-ARCHITECTURE.md`](V2-ARCHITECTURE.md). Tracks the
+v2.0.0 implementation as ordered milestones with explicit acceptance
+criteria, dependencies, and parking lots for non-blocking work.
+
+Status legend: `OPEN`, `WIP`, `BLOCKED`, `DONE`.
+Owner: Claude Code agents under operator review.
+Branching: every milestone lands on a feature branch
+`v2/M{N}-` and merges into `v2-main` after acceptance. `v2-main`
+becomes `main` at the v2.0.0 release.
+
+Prerequisite: v1.1.0 (F1 + F2 + F4) is on `main` and tagged.
+v2 work begins on a fresh `v2-main` branch.
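M0 introduces a `State` enum and a `StateMachine`, and the architecture test plan requires every transition to be rejected from an invalid previous state. The following is a minimal Java sketch of that check, not AuthLimbo's code. The transition table is an assumption, because the docs name the states but not every edge: here `REJECTED` is reachable only from `GATE`, `SPECTATOR_FAIL` only from `RESTORE`, and `RESTORE → PRELOAD` is treated as the retry edge. Persistence to `state/*.json` is omitted, and the switch expression assumes Java 21, which Paper 1.21 requires.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the legal edges below are assumptions, not the documented design.
enum State {
    CONNECT, GATE, SNAPSHOT, LIMBO, PRELOAD, RESTORE, LIVE, REJECTED, SPECTATOR_FAIL;

    /** Legal successors of this state. */
    Set<State> next() {
        return switch (this) {
            case CONNECT -> EnumSet.of(GATE);
            case GATE -> EnumSet.of(SNAPSHOT, REJECTED);
            case SNAPSHOT -> EnumSet.of(LIMBO);
            case LIMBO -> EnumSet.of(PRELOAD);
            case PRELOAD -> EnumSet.of(RESTORE);
            case RESTORE -> EnumSet.of(LIVE, PRELOAD, SPECTATOR_FAIL); // PRELOAD = retry edge
            case LIVE, REJECTED, SPECTATOR_FAIL -> EnumSet.noneOf(State.class); // terminal
        };
    }
}

final class StateMachine {
    private final Map<UUID, State> states = new ConcurrentHashMap<>();

    /** Starts tracking a connecting player; does nothing if the UUID is already tracked. */
    void begin(UUID id) {
        states.putIfAbsent(id, State.CONNECT);
    }

    /**
     * Moves the player to {@code to} only if that edge is legal from the current state.
     * Returns false and leaves the state untouched for unknown UUIDs, illegal edges,
     * or when a concurrent update changed the state first (replace() is a compare-and-set).
     */
    boolean advance(UUID id, State to) {
        State from = states.get(id);
        if (from == null || !from.next().contains(to)) {
            return false;
        }
        return states.replace(id, from, to);
    }

    State get(UUID id) {
        return states.get(id);
    }
}
```

With this shape, a duplicate AuthMe `LoginEvent` (the HaHaWTH race in the failure-mode table) becomes a harmless `false` from `advance(id, State.RESTORE)`, because `RESTORE` is not a legal successor of itself.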
+
+---
+
+## M0 · Foundations · OPEN
+
+**Goal:** Land the v2 skeleton so all later milestones plug into a
+shared backbone. No behaviour changes for end users.
+
+### Deliverables
+
+- New Maven module `core` for the gatekeeper/restore split (a Velocity-ready
+  seam). The existing `ru.authlimbo` package becomes `ru.authlimbo.paper`.
+- `State` enum + `StateMachine` class (`CONNECT → GATE → SNAPSHOT
+  → LIMBO → PRELOAD → RESTORE → LIVE | REJECTED | SPECTATOR_FAIL`)
+  with persistence to `plugins/AuthLimbo/state/.json`.
+- `AuditLog` writer (append-only JSON Lines, logrotate-compatible).
+- `MetricsRegistry` skeleton (counters, gauges, histograms; no HTTP
+  server yet, just in-memory accounting).
+- Config-v2 schema + automatic v1→v2 migration with backup.
+- Build: Maven multi-module, sqlite-jdbc still shaded, Adventure API
+  brought in via the Paper API (no extra shading).
+
+### Acceptance
+
+1. Plugin loads on Paper 1.21.11 with a v1 config; the v1→v2 migration runs
+   exactly once and writes `config.v1.bak`.
+2. `/authlimbo state ` shows the current state for any in-flight
+   player.
+3. `audit.log` is created and rotates at 100 MB (verified by manually
+   injecting 100 MB of noise).
+4. All v1.1.0 behaviour is preserved (F1, F2, F4 still work
+   end-to-end on a stub-AuthMe test server).
+5. Unit tests for state-machine transition validity pass in CI.
+
+### Dependencies
+
+None. M0 is the foundation.
+
+---
+
+## M1 · Snapshot subsystem · OPEN
+
+**Goal:** Make inventory loss impossible regardless of any chunk,
+teleport, or damage bug downstream.
+
+### Deliverables
+
+- On `AuthMeAsyncPreLoginEvent`: copy `world/playerdata/.dat`
+  to `plugins/AuthLimbo/snapshots/-.nbt` and log its
+  SHA-256.
+- On `PlayerDeathEvent` while the UUID is in `pendingTransit`:
+  `keepInventory=true` and `keepLevel=true` (acceptance also checks XP),
+  drops cleared, SEVERE logged, Discord webhook fired, restore-from-snapshot
+  scheduled on respawn.
+- New command `/authlimbo restore [--snapshot=]` that
+  rolls back to a snapshot (using a bundled nbtlib equivalent or an
+  embedded NBT reader).
+- Snapshot retention GC: 7-day default, configurable, runs hourly.
+- Metric: `authlimbo_snapshot_restored_total`.
+
+### Acceptance
+
+1. Forced void death during transit (test-harness `/limbo void `):
+   the player respawns with full inventory and XP.
+2. Snapshot files appear in `snapshots/`, with SHA-256 logged on creation
+   and on read-back.
+3. GC removes snapshots older than 7 days; verified by setting retention=10s
+   in the test config.
+4. `/authlimbo restore ` after a successful login restores
+   the pre-login inventory and writes an audit-log entry.
+
+### Dependencies
+
+M0 (audit log + state machine).
+
+---
+
+## M2 · Privacy-isolation hardening · OPEN
+
+**Goal:** Tighten the limbo-world isolation surface so that no chat,
+tablist entries, or join messages leak between limbo and the main world.
+Make the privacy invariant testable.
+
+### Deliverables
+
+- Chat listener at HIGHEST priority (Paper's `AsyncChatEvent`; Bukkit's
+  `PlayerChatEvent` is deprecated): drop limbo-world recipients
+  from main-world chat and main-world recipients from limbo chat.
+- Tablist scoping via `Player#hidePlayer(plugin, target)`:
+  - limbo players hidden from the main-world tablist;
+  - main-world players hidden from the limbo tablist;
+  - limbo players hidden from each other.
+- Join-message shifting: suppress the vanilla join message on initial
+  connect; fire a delayed join message at the state-machine [LIVE]
+  transition.
+- Per-player view distance forced to 2 in limbo
+  (`Player#setViewDistance(2)` on limbo entry, restored on exit).
+- Limbo BARRIER ceiling at y=129 added to `LimboWorldManager`.
+
+### Acceptance
+
+1. With two test accounts (`alice` in the main world, `bob` connecting
+   to limbo): `alice` does not see `bob` in the tablist before `bob`
+   completes login. After login, `alice` sees `bob`'s join message
+   exactly once.
+2. `bob` in limbo cannot see chat from `alice`, verified via an
+   integration test.
+3. `bob` cannot fly out of limbo via creative/elytra (the server starts
+   `bob` in survival; the barrier ceiling prevents y>129).
+4. The privacy invariant test (`PrivacyInvariantTest`) covers all six
+   scope boundaries (chat in/out, tablist in/out, join-msg before/after).
+
+### Dependencies
+
+M0.
+
+---
+
+## M3 · Restore reliability (3x3 preload + chunk-ready verification) · OPEN
+
+**Goal:** Make the restore teleport bulletproof against the
+"loaded-but-neighbour-unloaded" race that v1's F3 was designed for,
+plus the silent-failure case where the `teleportAsync` future completes
+with `true` but the player is still at the old position.
+
+### Deliverables
+
+- 3x3 chunk preload around the target (`addPluginChunkTicket` x9 +
+  `CompletableFuture.allOf(getChunkAtAsyncUrgently x9)`).
+- Post-TP verification: 5 ticks after the `teleportAsync` future completes
+  with `true`, check that the player is in the target world and that
+  `player.getLocation().distance(saved) < 2.0` (`Location#distance`
+  throws `IllegalArgumentException` across worlds). If not, treat it
+  as a silent failure and retry.
+- The F2-style retry loop from v1.1 carries over, gaining v2 metrics
+  and audit-log integration.
+- Drop the SPECTATOR pre-TP trick (v1's F8 redesign): rely on the
+  snapshot + damage-guard layers instead.
+- Metric: `authlimbo_restore_duration_seconds` histogram.
+
+### Acceptance
+
+1. The AUDIT-2026-05-07 §5.1 scenario (unloaded-chunk void) produces no
+   void death and no inventory loss. The player lands at the saved coords.
+2. The AUDIT-2026-05-07 §5.2 scenario (invalid Y) escalates to
+   `SPECTATOR_FAIL` after 3 retries, with an audit-log entry + webhook.
+3. New scenario: target at a chunk corner (e.g. (16, 70, 16), where
+   chunks (0, 0), (1, 0), (0, 1) and (1, 1) meet): the 3x3 preload
+   makes this work on the first try.
+4. Histogram p99 restore duration < 2.5s under normal load (no bot
+   flood).
+
+### Dependencies
+
+M0, M1 (the snapshot is the safety net while M3's retry loop runs).
+
+---
+
+## M4 · Gatekeeper + queue + observability · OPEN
+
+**Goal:** Bring the queue, trust tiers, metrics endpoint, and
+Discord webhook online. After M4 the operator has full visibility
+without needing to grep logs.
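The M4 queue can be pictured as one FIFO lane per trust tier sitting in front of a fixed number of auth slots. The sketch below is an illustrative, single-threaded model (on Paper it would be driven from the main thread), not the M4 implementation. `Decision` and the tier names mirror the deliverables listed next; `TieredGate` and the ordering rule (tier first, then connect time) are assumptions. Staff bypass the queue entirely, flagged IPs are rejected before they touch it, and the depth check makes `max-queue-depth` a hard ceiling.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: TieredGate is an illustrative name, not AuthLimbo code.
enum Tier { STAFF, RETURNING, NEW, FLAGGED }

enum Decision { ACCEPT, QUEUE, REJECT }

final class TieredGate {
    private final int maxConcurrentAuth;
    private final int maxQueueDepth;
    // One FIFO lane per queueable tier. EnumMap iterates in declaration order,
    // so the RETURNING lane is always drained before the NEW lane.
    private final Map<Tier, ArrayDeque<UUID>> lanes = new EnumMap<>(Tier.class);
    private int queued;
    private int activeAuths;

    TieredGate(int maxConcurrentAuth, int maxQueueDepth) {
        this.maxConcurrentAuth = maxConcurrentAuth;
        this.maxQueueDepth = maxQueueDepth;
        lanes.put(Tier.RETURNING, new ArrayDeque<>());
        lanes.put(Tier.NEW, new ArrayDeque<>());
    }

    Decision admit(UUID player, Tier tier) {
        if (tier == Tier.FLAGGED) {
            return Decision.REJECT;                 // dropped at the gate, never enters limbo
        }
        if (tier == Tier.STAFF) {
            activeAuths++;                          // bypasses even a full queue
            return Decision.ACCEPT;
        }
        if (queued == 0 && activeAuths < maxConcurrentAuth) {
            activeAuths++;                          // free slot and nobody waiting
            return Decision.ACCEPT;
        }
        if (queued >= maxQueueDepth) {
            return Decision.REJECT;                 // depth can never exceed the cap
        }
        lanes.get(tier).addLast(player);
        queued++;
        return Decision.QUEUE;
    }

    /** Frees one auth slot; returns the next player to promote, or null if none can go yet. */
    UUID release() {
        if (activeAuths > 0) {
            activeAuths--;
        }
        if (activeAuths >= maxConcurrentAuth) {
            return null;                            // staff bypasses may have overfilled the slots
        }
        for (ArrayDeque<UUID> lane : lanes.values()) {
            UUID next = lane.pollFirst();
            if (next != null) {
                queued--;
                activeAuths++;
                return next;
            }
        }
        return null;
    }

    /** 1-based queue position (what a BossBar would show), or -1 if not queued. */
    int position(UUID player) {
        int seen = 0;
        for (ArrayDeque<UUID> lane : lanes.values()) {
            for (UUID id : lane) {
                seen++;
                if (id.equals(player)) {
                    return seen;
                }
            }
        }
        return -1;
    }

    int queueDepth() {
        return queued;
    }
}
```

Because `RETURNING` arrivals go ahead of every queued `NEW` player, a new player's `position` can go up while they wait. Any ETA derived from it inherits the same non-monotonic behaviour the 2b2t research notes for priority queues.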
+
+### Deliverables
+
+- Gatekeeper interface (`Gatekeeper.accept(connection) → Decision`)
+  with a Paper-side implementation. Decision: `accept`, `queue`,
+  `reject`.
+- Trust-tier resolver: reads LuckPerms permissions for `staff`,
+  AuthMe-DB last-seen for `returning` vs `new`, and an IP blocklist for
+  `flagged`. Results are cacheable.
+- Bounded queue ordered by tier priority, then FIFO by connect time.
+  Configurable `max-concurrent-auth`, `max-queue-depth`,
+  `queue-timeout-seconds`.
+- BossBar UI in limbo showing tier, position, and ETA, updated every
+  second.
+- In-chat `/queue` command that re-displays the queue state.
+- Prometheus HTTP server bound to `127.0.0.1:9091` (loopback only).
+- Discord webhook config + plumbing for the alert categories from
+  ARCHITECTURE §7.
+- `/authlimbo queue policy` command: prints the tier policy
+  in-game so players can verify for themselves that they are not in a
+  hidden tier.
+
+### Acceptance
+
+1. Stress test: 1000 simulated connections in 60s.
+   `authlimbo_queue_depth` peaks at `max-queue-depth`, never higher.
+   No `pendingTransit` leak (it returns to 0 within 30s of the flood ending).
+2. Staff bypass: a player with `authlimbo.queue.priority.staff`
+   skips even a full queue. The audit log records the bypass.
+3. A Pi-hole-style IP blocklist drops a connection at the gatekeeper;
+   it never enters limbo. `authlimbo_connections_total{outcome="rejected"}`
+   increments.
+4. A Prometheus scrape of `127.0.0.1:9091/metrics` returns OpenMetrics
+   format with all metrics from ARCHITECTURE §7.
+5. `/authlimbo queue policy` output matches the ARCHITECTURE §3 tier table
+   verbatim (rendered from a single source-of-truth string).
+
+### Dependencies
+
+M0 (state machine + audit log), M3 (so legitimate logins still
+flow correctly through the new gatekeeper layer).
+
+---
+
+## M5 · Hardening, drama-avoidance lock-in, release · OPEN
+
+**Goal:** Lock in the anti-drama policy so it can't drift. Ship v2.0.0.
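Among the M5 hardening items, fail-closed handling of an unreachable AuthMe DB is the easiest to get subtly wrong: a probe that hangs must count as a failure, not as "no answer yet". The minimal sketch below assumes the probe is any `Callable<Boolean>`, for example one wrapping JDBC's `Connection#isValid(5)`. `FailClosedProbe` and `AuthBackend` are hypothetical names. Every outcome other than an explicit `true` within the timeout yields `DEGRADED`, which the gatekeeper would turn into the "auth service degraded" kick.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a fail-closed health check; not AuthLimbo code.
enum AuthBackend { HEALTHY, DEGRADED }

final class FailClosedProbe {
    private final Duration timeout;
    private final ExecutorService pool;

    FailClosedProbe(Duration timeout) {
        this.timeout = timeout;
        // Daemon threads: a hung probe must never keep the JVM alive on shutdown.
        this.pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "authlimbo-db-probe");
            t.setDaemon(true);
            return t;
        });
    }

    /** HEALTHY only if the probe returns true within the timeout; every other outcome is DEGRADED. */
    AuthBackend check(Callable<Boolean> probe) {
        Future<Boolean> result = null;
        try {
            result = pool.submit(probe);
            Boolean ok = result.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return Boolean.TRUE.equals(ok) ? AuthBackend.HEALTHY : AuthBackend.DEGRADED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return AuthBackend.DEGRADED;
        } catch (Exception e) {
            // TimeoutException, ExecutionException (e.g. a wrapped SQLException),
            // or RejectedExecutionException: all of them fail closed.
            if (result != null) {
                result.cancel(true); // stop waiting and interrupt the hung worker
            }
            return AuthBackend.DEGRADED;
        }
    }
}
```

Running the probe on its own daemon pool keeps a hung JDBC call off the server thread. Note that `cancel(true)` only interrupts the worker: a driver that ignores interrupts may leave that thread parked until its own socket timeout fires.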
+
+### Deliverables
+
+- Anti-drama policy constants live in code (not config): paid-tier and
+  hidden-tier escape hatches do not exist as configurable knobs.
+  Adding one would require a code change plus an AGPL fork.
+- Reload-without-restart (`/authlimbo reload`) with in-flight transit
+  drain (max 30s wait).
+- Fail-closed implementation for the AuthMe-DB-unreachable case (kick
+  with an operator-friendly message + webhook).
+- Server-shutdown drain hook: clear transit, save snapshots, kick
+  limbo players with a "server restarting" message.
+- Chaos-test suite: kill-plugin-mid-login, kill-container, AuthMe-DB
+  network drop. All recoverable.
+- Documentation: `V2-ARCHITECTURE.md` (this roadmap's companion),
+  a `V2-RELEASE.md` migration guide for operators, and updated
+  `compatibility.md` and `installation.md`.
+- Tag v2.0.0, push to git.s8n.ru/s8n/auth-limbo (GitHub is a
+  push mirror), and attach the jar to the release.
+
+### Acceptance
+
+1. A plugin reload during a live transit completes the in-flight
+   restore correctly, with no inventory loss.
+2. Killing the plugin (`/plugman unload`) during the [LIMBO] state and
+   restarting the server: the rejoining player is restored from state +
+   snapshot.
+3. AuthMe DB hard down: the connection is rejected at the gatekeeper and
+   never reaches the main world. The operator gets a webhook within 30s.
+4. CHANGELOG documents every breaking change, every renamed
+   permission node, and every config schema change.
+5. The v2.0.0 jar runs end-to-end on the racked.ru staging container
+   (parallel to v1 prod) for 7 days with zero void deaths and zero
+   inventory losses.
+
+### Dependencies
+
+M0-M4. M5 is the gate to release.
+
+---
+
+## Parked / non-blocking
+
+These items are **not** on the v2.0.0 critical path. They are tracked here
+so they aren't lost.
+
+- `P-VELO` · Velocity mode behind a feature flag (target: v2.2.0).
+  Requires a real second backend or proxy mesh first.
+- `P-COBBLE` · cobblestone-server interop. Wait for the cobblestone
+  intake to land in `_github/infra/`.
+- `P-PLUGIN-MSG` · Plugin-message channel between Paper-side and
+  proxy-side gatekeepers (prep for `P-VELO`).
+- `P-WEB-UI` · Read-only web dashboard for queue + metrics. Defer
+  until the operator asks.
+- `P-CROWDSEC` · Pluggable IP-blocklist source (CrowdSec API). v2.0.0
+  uses static config + a Pi-hole hosts file.
+- `P-MOJANG-BAN-CHECK` · Honor Mojang's name-changed-but-banned
+  blocklist. Niche, defer.
+
+---
+
+## Cross-cutting acceptance: privacy invariant
+
+Every milestone must preserve the v1 privacy invariant: *no
+main-world player can observe any pre-auth player's coordinates,
+inventory, or chat*.
+
+A dedicated `PrivacyInvariantTest` (introduced in M2) runs on every
+PR and must pass for merge. The test enumerates the six scope
+boundaries from ARCHITECTURE §4 and asserts no leak in either
+direction.
+
+If a milestone would relax any boundary, it MUST be flagged in the PR
+description and reviewed against `feedback_audit_then_plan.md`
+(audit-then-fix workflow).
+
+---
+
+## Release plan
+
+| Tag | Contents | Target |
+|-----|----------|--------|
+| v2.0.0-rc1 | M0 + M1 + M2 + M3 | end of week 1 |
+| v2.0.0-rc2 | + M4 | end of week 2 |
+| v2.0.0 | + M5, 7-day staging soak | end of week 3 |
+| v2.1.0 | parked items as the operator pulls them in | opportunistic |
+
+All releases are tagged on `git.s8n.ru/s8n/auth-limbo` first; GitHub is a
+push mirror per `feedback_my_git_is_forgejo.md`.
+
+Operator handles end-of-session push.
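The six boundaries that `PrivacyInvariantTest` enumerates (chat in/out, tablist in/out, join message before/after, per M2) lend themselves to a table-driven check. The sketch below is hypothetical. `Visibility` stands in for whatever the real chat, tablist and join listeners decide. Here "out" means information leaving limbo and "in" means information entering it. The sketch only checks whether a join message appears, not the "exactly once" count from M2's acceptance criteria. Running the check against a deliberately leaky implementation confirms that it actually fails when it should.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical harness shape; Visibility abstracts the real Paper listeners.
interface Visibility {
    boolean chatDelivered(boolean senderInLimbo, boolean recipientInLimbo);
    boolean inTablist(boolean viewerInLimbo, boolean targetInLimbo);
    boolean joinMessageShown(boolean subjectLive); // as seen by a main-world viewer
}

enum Boundary { CHAT_IN, CHAT_OUT, TABLIST_IN, TABLIST_OUT, JOIN_BEFORE, JOIN_AFTER }

final class PrivacyInvariant {
    private PrivacyInvariant() {}

    /** Returns every boundary the implementation violates; an empty set means the invariant holds. */
    static Set<Boundary> violations(Visibility v) {
        Set<Boundary> bad = EnumSet.noneOf(Boundary.class);
        if (v.chatDelivered(true, false)) bad.add(Boundary.CHAT_OUT);    // limbo chat reaching main
        if (v.chatDelivered(false, true)) bad.add(Boundary.CHAT_IN);     // main chat reaching limbo
        if (v.inTablist(false, true)) bad.add(Boundary.TABLIST_OUT);     // main viewer sees limbo player
        if (v.inTablist(true, false)) bad.add(Boundary.TABLIST_IN);      // limbo viewer sees main player
        if (v.joinMessageShown(false)) bad.add(Boundary.JOIN_BEFORE);    // join announced before [LIVE]
        if (!v.joinMessageShown(true)) bad.add(Boundary.JOIN_AFTER);     // join never announced after [LIVE]
        return bad;
    }
}
```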