AuthLimbo v2 — Architecture
Status: Design draft (no code). Drafted 2026-05-07 by the auth-limbo v2 design pass after the YOU500 / second-player void-death incidents. Audience: operator (P) and future contributors.
Companion docs:
- `AUDIT-2026-05-07.md` — root-cause forensic.
- `ROADMAP.md` — v1.x tracking (F1-F7).
- `V2-ROADMAP.md` — milestones M0-M5 for v2.
1. Why v2
v1 is a single-jar Paper plugin glued onto AuthMe. It works most of the time, but its core failure modes are now well-understood and can't be patched away inside the v1 design:
| v1 limitation | v2 must address |
|---|---|
| Player object exists on the main server before auth — coords/inventory technically restorable from RAM by buggy plugins, world chunk activity is observable. | Strong isolation: limbo is the only state the player can touch pre-auth. |
| Restore relies on AuthMe firing `LoginEvent`. AuthMe's own broken teleport runs in the same window — F4 pre-empts it but the design still races. | Authoritative state machine that doesn't trust AuthMe's teleport at all. |
| Inventory loss on transit-death depends on F1 + F5 holding. There is no inventory-of-record outside live game state. | Snapshot-on-pre-login + snapshot-restore is a first-class subsystem, not a defensive add-on. |
| No metrics, no audit log, no admin alerting. Bugs only surface when a player loses gear. | Built-in observability: Prometheus + JSON-Lines audit + Discord webhook. |
| No queue / login-throttle. If 50 bots connect at once, AuthMe stalls. | Bounded concurrency with transparent FIFO and trust tiers (NOT pay tiers). |
v2 is a clean break (v2.0.0), not a v1 patch. v1 keeps receiving F3, F5, F6, and F7 backports for as long as racked.ru still runs the old jar.
2. Stack decision — Paper-only, with a Velocity-ready seam
Recommendation: Paper-only single-server plugin for v2.0.0. Velocity-mode is a v2.x deferrable behind a feature flag.
Reasoning
racked.ru today is one Purpur 1.21.11 server in the itzg `minecraft-mc` container on nullstone. There is no Velocity / BungeeCord, no second backend, no Forced Hosts, no proxy network. Adding Velocity to ship a gatekeeper plugin would mean:
- standing up a new container, opening a new public port (or keeping 25565 on the proxy and 25566 internal),
- migrating the 12+ existing Paper plugins through the velocity-paper bridge contract for chat / commands / placeholders,
- new TLS / RCON / proxy-protocol surface to harden,
- breaking changes to AuthMe's data flow (proxy-side login flow vs paper-side `AuthMeAsyncPreLoginEvent`),
- one more thing for the operator to babysit.
The privacy property the operator cares about — no other player sees pre-auth coords / inventory — is achievable on Paper-only via a strictly isolated limbo world + audience scoping (see §4). Velocity adds stronger isolation (player never reaches the backend at all) but the incremental privacy gain is small for a 0-10 player community, and the operational cost is large.
When Velocity becomes worth it
Codify trip-wires up front so the decision isn't dragged out:
- racked.ru splits into ≥2 backends (e.g. survival + creative) — you need a proxy anyway.
- The cobblestone server comes online and shares an account/auth pool.
- Botting attempts cross 100 connections/minute and `connection-throttle` + firewalld rate-limit are no longer enough. Velocity + a queue plugin (Ajax / VelocityQueue) become operationally cheaper than chasing botnets at the application layer.
Until any of those, Paper-only is the right answer.
The Velocity-ready seam
v2 internal API is split into two layers so the proxy migration is mechanical:
+-------------------------------+ +-------------------------------+
| Gatekeeper (proxy or paper) | | Restore (paper only) |
| - accept connection | | - read snapshot |
| - check ban / rate limit | | - chunk preload |
| - hold in limbo / queue | | - authoritative TP |
| - hand off on auth-success | | - publish metrics |
+--------------+----------------+ +-------------------------------+
| hand-off event (UUID, target Location, source IP)
v
In v2.0 both layers live in the Paper plugin and the hand-off is just a local method call. In a future "v2-velo" both layers split: gatekeeper runs as a Velocity plugin, restore stays on Paper, hand-off becomes a plugin-message channel. No code outside those two layers needs to change.
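The seam can be made concrete as a tiny hand-off contract. A minimal sketch (the `HandOff` record and `handOff` helper are hypothetical names, not the real API): the record carries exactly the fields named above as plain data, so a future v2-velo build only has to swap the local `Consumer` call for a plugin-message send.

```java
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical sketch of the Gatekeeper -> Restore hand-off contract.
// World + coords travel as plain data so the same event can later be
// serialized onto a plugin-message channel without touching either layer.
class HandOffSeam {

    /** Everything Restore needs; nothing more crosses the seam. */
    public record HandOff(UUID uuid, String world, double x, double y, double z, String sourceIp) {}

    /** In v2.0 this is a local method call; in v2-velo it becomes a channel send. */
    public static void handOff(HandOff event, Consumer<HandOff> restoreLayer) {
        restoreLayer.accept(event);
    }

    public static void main(String[] args) {
        HandOff h = new HandOff(UUID.randomUUID(), "world", 100.5, 64.0, -20.5, "203.0.113.7");
        handOff(h, e -> System.out.println("restoring " + e.uuid() + " into " + e.world()));
    }
}
```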
3. Queue model — login-throttle + transparent trust tiers, NO 2b2t-style sale
For 0-10 player normal load: queue depth is always 0 and players never see "queued" UI. The queue exists for crisis scenarios (bot flood, restart drain, AuthMe DB stall) and to define explicit policy even if it's rarely hit.
Policy
| Tier | Definition | Effect |
|---|---|---|
| `staff` | Player has `authlimbo.queue.priority.staff` permission (LP-managed). | Always passes. Bypasses queue entirely. |
| `returning` | Player is in AuthMe DB AND has logged in within last 30 days. | Default tier for everyone who isn't new. Normal FIFO ordering by connect-time. |
| `new` | Player is NOT in AuthMe DB OR last seen >30 days ago. | Same FIFO as `returning` BUT with a per-IP 1/minute throttle. Stops bot-floods. |
| `flagged` | Player IP matches a Pi-hole/CrowdSec/abuse-DB block. | Rejected at gatekeeper, never enters the queue. |
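The tier table above can be read as one pure function. A sketch under those rules (class and method names are hypothetical):

```java
// Sketch of the tier policy as a pure function (hypothetical names).
// Order matters: flagged beats everything, staff beats returning/new.
class QueueTierPolicy {

    public enum Tier { STAFF, RETURNING, NEW, FLAGGED }

    public static Tier resolve(boolean ipFlagged, boolean hasStaffPerm,
                               boolean inAuthMeDb, long daysSinceLastLogin) {
        if (ipFlagged) return Tier.FLAGGED;               // rejected at gatekeeper
        if (hasStaffPerm) return Tier.STAFF;              // bypasses queue entirely
        if (inAuthMeDb && daysSinceLastLogin <= 30) return Tier.RETURNING;
        return Tier.NEW;                                  // per-IP 1/minute throttle applies
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, false, true, 3));   // RETURNING
        System.out.println(resolve(false, false, true, 90));  // NEW (last seen >30d)
    }
}
```

Because the function is total and side-effect free, the per-tier unit tests in §10 reduce to table lookups.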
Hard rules — written into V2-ARCHITECTURE.md so they outlive any one
operator's mood:
- No paid priority. Ever. No "priority queue pass", no "supporter rank skip", no Patreon tier. The 2b2t community collapsed under that grift; we don't repeat it.
- No hidden veteran tier. Every tier is documented in this file and in `/authlimbo queue policy` in-game. If a player can't see why they're in tier X, the tier is illegitimate.
- No in-game bidding / griefing for queue spots. Queue position is purely connect-time + tier; no player action affects it.
- Ops-staff bypass is logged. Every staff bypass writes a JSON-L audit row.
Capacity
- `gatekeeper.max-concurrent-auth: 5` — at most 5 players in the pre-auth limbo at once. Defaults sized for racked.ru. AuthMe DB reads + chunk pins per concurrent player are roughly free, but bound it anyway.
- `gatekeeper.max-queue-depth: 50` — beyond 50 waiting, new connections get a "server is starting up, try again in 30s" kick. Better UX than a 5-minute black-screen wait.
- `gatekeeper.queue-timeout-seconds: 120` — anyone in the queue >2 minutes gets the same kick + a Discord webhook fires.
What queue UX looks like
In limbo, a BossBar (Adventure API) shows tier + position:
[returning] Queue position: 3 / 7 ETA: ~15s
When position == 0 and AuthMe accepts, the bar disappears. There's no
hidden state. /queue in-chat re-displays the same info.
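A sketch of that bar line as a pure formatter (the real bar would be an Adventure `BossBar`; this hypothetical helper only pins down the visible string, so the UX text is unit-testable without a client):

```java
// Hypothetical formatter for the limbo BossBar / /queue text.
class QueueBar {

    public static String render(String tier, int position, int depth, int etaSeconds) {
        return String.format("[%s] Queue position: %d / %d ETA: ~%ds",
                tier, position, depth, etaSeconds);
    }

    public static void main(String[] args) {
        System.out.println(render("returning", 3, 7, 15));
        // [returning] Queue position: 3 / 7 ETA: ~15s
    }
}
```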
4. Privacy isolation
This is the original feature; v2 must not regress it.
Limbo world
- Separate Bukkit world `auth_limbo`, `Environment.THE_END`, `VoidGenerator`. Same as v1. `keepSpawnInMemory=true`. Game-rules: no daylight, no weather, no mobs, no fire-tick, no PvP, `doImmediateRespawn=true`, `keepInventory=true` (defence-in-depth — limbo should never see a death event, but if it does, no item drops happen).
- Per-player view-distance forced to 2 in limbo via Paper's `Player#setViewDistance`. They see 5x5 chunks, all empty.
- Limbo platform: 5x5 of `BARRIER` blocks at y=127, a single `BARRIER` ceiling block at y=129 to prevent flying out. y=0..126 and y=130+ are pure void.
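The platform geometry is simple enough to test without a server. A sketch of the layout as pure coordinates (hypothetical helper, matching the y=127 floor / y=129 ceiling above; the real `LimboWorldManager` would set these to `BARRIER` idempotently):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the barrier-platform layout: 5x5 floor at y=127 plus one
// ceiling block at y=129 above the spawn column. Pure geometry so the
// construction can be unit-tested (and re-run idempotently) off-server.
class LimboPlatform {

    public record Block(int x, int y, int z) {}

    public static List<Block> layout(int cx, int cz) {
        List<Block> blocks = new ArrayList<>();
        for (int dx = -2; dx <= 2; dx++)
            for (int dz = -2; dz <= 2; dz++)
                blocks.add(new Block(cx + dx, 127, cz + dz)); // 5x5 BARRIER floor
        blocks.add(new Block(cx, 129, cz));                   // single BARRIER ceiling block
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(layout(0, 0).size()); // 26 blocks total
    }
}
```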
Adventure-API audience scoping
`PlayerChatEvent` listener at `EventPriority.HIGHEST`:
- If sender is in main worlds, the recipient list is filtered: anyone whose `World#getName().equals("auth_limbo")` is dropped. Pre-auth players never see overworld chat.
- If sender is in limbo (they would normally not chat — AuthMe blocks it — but defence in depth), the recipient list is set to only the sender. They cannot leak messages to the main world.
- `PlayerJoinEvent` join messages are suppressed for `auth_limbo`-spawn joins. The main world only sees a join announcement after the authoritative restore TP succeeds (M2 §"join-message shifting" below).
Tablist scoping
Hook Paper's `PlayerListEntryEvent` (or fall back to `PlayerJoinEvent` + `Player#hidePlayer`):
- Limbo players are hidden from main-world tablist.
- Main-world players are hidden from limbo tablist.
- Limbo players cannot see each other (each limbo player sees only themselves).
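The three rules above collapse into one visibility predicate. A server-free sketch (hypothetical helper; the real implementation would drive `Player#hidePlayer` from this, and the same predicate doubles as the chat recipient filter):

```java
// Sketch of the tablist/chat visibility rule: limbo players see only
// themselves; limbo and main-world players are mutually invisible.
class AudienceScope {

    /** sameEntry == true means the viewer is looking at their own entry. */
    public static boolean canSee(String viewerWorld, String targetWorld, boolean sameEntry) {
        boolean viewerInLimbo = "auth_limbo".equals(viewerWorld);
        boolean targetInLimbo = "auth_limbo".equals(targetWorld);
        if (viewerInLimbo || targetInLimbo) return sameEntry; // only yourself
        return true;                                          // normal main-world visibility
    }
}
```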
What main world observers can detect
After scoping:
- They cannot see the player's name in tablist pre-auth.
- They cannot see chat from the player.
- They cannot see the player's world or coordinates (AuthMe blocks movement output anyway, but we don't rely on it).
- They CAN see the connection event in server logs (operator-only).
- They can see "PLAYER joined the game" only AFTER restore succeeds — join message is shifted to fire on restore-success, not on initial connect.
This matches the v1 privacy posture and tightens the join-message leak.
5. Login flow — explicit state machine
[CONNECT] ---throttle ok---> [GATE]
    |                          |
    | failed throttle / ban    |
    v                          v
[REJECTED]                 [SNAPSHOT]  <-- read AuthMe DB,
                               |           dump current inventory + xp + loc
                               v           to plugins/AuthLimbo/snapshots/<uuid>.nbt
                           [LIMBO]
                               |
                        AuthMe /login ok
                               |
                               v
                           [PRELOAD]   <-- 3x3 chunk pin around target
                               |
                               v
                           [RESTORE]   <-- teleportAsync, retry up to 3
                               |
                         +-----+-----+
                         |           |
                      success     fail x3
                         |           |
                         v           v
                      [LIVE]   [SPECTATOR-AT-LIMBO + admin alert]
Each transition has:
- Trigger event (e.g. `LoginEvent` at `MONITOR`).
- Pre-conditions (e.g. UUID in `pendingTransit`).
- Side-effects (e.g. metric counter, audit-log row).
- Failure handler (next state on error).
States persist in plugins/AuthLimbo/state/<uuid>.json so a plugin
crash mid-flow can resume on rejoin. State file is deleted on
[LIVE] entry.
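The diagram above can be captured as an explicit allow-list so every transition not named is rejected (which is also what makes the double-`LoginEvent` race in §8 a logged no-op). A sketch with hypothetical names:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Sketch of the login state machine as an explicit transition table.
// Anything not listed is an invalid transition and must be rejected.
class LoginStateMachine {

    public enum State { CONNECT, GATE, REJECTED, SNAPSHOT, LIMBO, PRELOAD, RESTORE, LIVE, SPECTATOR_AT_LIMBO }

    private static final Map<State, EnumSet<State>> ALLOWED = new EnumMap<>(Map.of(
        State.CONNECT,  EnumSet.of(State.GATE, State.REJECTED),
        State.GATE,     EnumSet.of(State.SNAPSHOT, State.REJECTED),
        State.SNAPSHOT, EnumSet.of(State.LIMBO),
        State.LIMBO,    EnumSet.of(State.PRELOAD),
        State.PRELOAD,  EnumSet.of(State.RESTORE),
        State.RESTORE,  EnumSet.of(State.LIVE, State.SPECTATOR_AT_LIMBO)
    ));

    public static boolean canTransition(State from, State to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(State.class)).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.LIMBO, State.PRELOAD)); // true
        System.out.println(canTransition(State.LIMBO, State.LIVE));    // false: no skipping restore
    }
}
```

This directly backs the §10 unit test "every transition rejects from invalid prev-state".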
Snapshot subsystem
This is the operator-bug-survives-everything layer.
- On `AuthMeAsyncPreLoginEvent` (player just connected, NOT yet auth'd): if a player file `world/playerdata/<uuid>.dat` exists, read it and shadow-copy to `plugins/AuthLimbo/snapshots/<uuid>.nbt` with timestamp. SHA-256 of file content is logged.
- `/authlimbo restore <player>` can roll back any restore by feeding the snapshot through nbtlib (same as the void-death recovery protocol from `feedback_mc_tp_safety.md`).
- Snapshots retained 7 days, then GC'd. Configurable.
- On `PlayerDeathEvent` while UUID in `pendingTransit`: `keepInventory=true`, `event.getDrops().clear()`, log SEVERE, trigger Discord webhook, schedule restore-from-snapshot on respawn.
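A stdlib-only sketch of the shadow-copy step (paths are passed in as hypothetical arguments; timestamped target names are one way to keep repeated pre-logins from overwriting each other):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch: read the playerdata file, copy it into the snapshot dir,
// and return the SHA-256 of the content for the audit log.
class SnapshotCopy {

    public static String shadowCopy(Path playerData, Path snapshotDir) throws Exception {
        byte[] content = Files.readAllBytes(playerData);
        Files.createDirectories(snapshotDir);
        // timestamped name so repeated pre-logins never overwrite each other
        Path target = snapshotDir.resolve(
                playerData.getFileName() + "." + System.currentTimeMillis() + ".nbt");
        Files.write(target, content);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
        return HexFormat.of().formatHex(digest); // logged alongside the copy
    }
}
```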
Restore step (replaces v1's doTeleport + 10-tick delay)
- Read saved location from AuthMe DB (cached from pre-login — single in-memory hashmap keyed by UUID, evicted on transit clear).
- Compute 3x3 chunk grid centred on saved location. `addPluginChunkTicket` on all 9 chunks.
- `CompletableFuture.allOf(getChunkAtAsyncUrgently x9)` — wait for all 9 to actually be loaded, not just the centre one (closes the "loaded but neighbour unloaded" race).
- `teleportAsync(saved, PLUGIN)`. If `false`: F2 retry loop (already in v1.1.0, carries over).
- On success: 5-tick delay, then verify `player.getLocation().distance(saved) < 2.0`. If not, treat as a silent failure → retry.
- Release tickets 5s post-success.
- Mark transition [LIVE], publish `authlimbo_login_success_total` metric, write audit-log row, send delayed join-message to main world, clear snapshot.
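The two pieces of pure geometry in that list — the 3x3 chunk grid and the <2.0-block landing check — can be sketched server-free (hypothetical helper names; the real code would feed these into `addPluginChunkTicket` / `getChunkAtAsyncUrgently`):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the restore-step geometry: 3x3 chunk grid around the saved
// block position (chunk coord = block coord >> 4), plus the post-TP
// sanity check that the player landed within 2 blocks of the target.
class RestoreGeometry {

    public record Chunk(int x, int z) {}

    public static List<Chunk> gridAround(int blockX, int blockZ) {
        int cx = blockX >> 4, cz = blockZ >> 4;
        List<Chunk> grid = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++)
            for (int dz = -1; dz <= 1; dz++)
                grid.add(new Chunk(cx + dx, cz + dz)); // pin all 9, not just the centre
        return grid;
    }

    /** Mirrors the 5-tick post-teleport verification: distance(saved) < 2.0. */
    public static boolean landedAtTarget(double x1, double y1, double z1,
                                         double x2, double y2, double z2) {
        double dx = x1 - x2, dy = y1 - y2, dz = z1 - z2;
        return Math.sqrt(dx * dx + dy * dy + dz * dz) < 2.0;
    }
}
```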
F8 — drop the SPECTATOR pre-TP trick
v1 considered "set GameMode.SPECTATOR before TP, revert after". v2 does NOT do this — spectator mode has its own client-side render races on chunk-load and silently swallows damage events that the F1 guard needs to see. Instead: invariant-driven recovery (snapshot + retry + admin alert) is the safety net. SPECTATOR is the final fallback after 3 failed retries (F6 in v1, kept for v2).
6. Anti-drama checklist (2b2t lessons)
Codified up-front so future "monetisation" pressure is rejected by reference, not by argument.
- No pay-to-skip. Tier list above is the entire policy.
- No hidden tier or undocumented bypass (staff bypass is logged).
- No queue spot trading / selling.
- No "queue position visible to others" — your position is only visible to you. No social pressure surface.
- Queue is purely FIFO + tier; no algorithm tweaks, no "lottery".
- AGPL-3.0 means anyone can fork and self-host an alt gatekeeper if they distrust ours. Operator-friendly.
- Audit log is local-file JSON-L, not phoned home, not centralised. Operator-readable, no hidden telemetry.
7. Operational surface
Metrics (Prometheus)
Exposed via embedded HTTP server bound to 127.0.0.1:9091 (loopback
only — Prometheus on nullstone scrapes via localhost):
| Metric | Type | Labels |
|---|---|---|
| `authlimbo_connections_total` | counter | tier, outcome={accepted, queued, rejected} |
| `authlimbo_queue_depth` | gauge | — |
| `authlimbo_login_success_total` | counter | tier |
| `authlimbo_login_fail_total` | counter | reason={timeout, authme_db, tp_failed_3x, ...} |
| `authlimbo_void_damage_blocked_total` | counter | — |
| `authlimbo_snapshot_restored_total` | counter | — |
| `authlimbo_restore_duration_seconds` | histogram | tier |
Trip-wire alerts (configured server-side, in `prometheus/alerts.yml`, not in the plugin):
- `authlimbo_login_fail_total{reason="tp_failed_3x"}` rate > 0 for 5m.
- `authlimbo_void_damage_blocked_total` rate > 0 for 1m.
- `authlimbo_queue_depth` > 10 for 5m.
Discord webhooks
Plugin-side webhook fires on:
- Snapshot restored (gear was about to be lost).
- 3x retry give-up (manual `/authlimbo tp` needed).
- Queue depth > config threshold.
- AuthMe DB unreachable.
- Plugin reload / crash.
Webhook URL is in config, redacted from /authlimbo dump.
Audit log
`plugins/AuthLimbo/audit.log` — JSON Lines, one row per state transition. Fields: `ts`, `uuid`, `name`, `ip`, `tier`, `state`, `prev_state`, `extra` (free-form JSON). Logrotate-compatible; rotates at 100 MB, keeps 7 files.
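A sketch of one row with exactly those fields (hypothetical builder; a real implementation would use a JSON library, the field set is the point):

```java
// Hypothetical audit-row builder: one JSON Lines row per state transition.
class AuditRow {

    public static String row(long ts, String uuid, String name, String ip,
                             String tier, String state, String prevState, String extra) {
        return String.format(
            "{\"ts\":%d,\"uuid\":\"%s\",\"name\":\"%s\",\"ip\":\"%s\",\"tier\":\"%s\"," +
            "\"state\":\"%s\",\"prev_state\":\"%s\",\"extra\":%s}",
            ts, uuid, name, ip, tier, state, prevState, extra);
    }

    public static void main(String[] args) {
        System.out.println(row(1714000000L, "u-1", "steve", "203.0.113.7",
                "returning", "LIMBO", "SNAPSHOT", "{}"));
    }
}
```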
Reload-without-restart
/authlimbo reload:
- Re-reads `config.yml`.
- Drains in-flight transits to completion (no new joins accepted during drain, max 30s wait).
- Re-binds metrics HTTP server if port changed.
- Re-creates limbo world if name/spawn changed.
- Discord webhook fires "reload completed in Xs".
8. Failure modes & recovery
| Failure | Detection | Recovery |
|---|---|---|
| Plugin crashes mid-restore | On startup, scan `state/*.json` files older than 30s. | For each: if player offline, leave snapshot; if online, treat as new transit, force re-restore from saved AuthMe loc. |
| Snapshot file corrupt / unreadable | NBT parse exception. | Fall back to AuthMe DB saved-loc; log SEVERE; webhook. Player may lose newest items but not entire inventory. |
| World save corrupts | Paper `World#getChunkAtAsync` throws. | After 3 retries: kick player with "server experiencing storage issue, try again in 5min"; webhook. |
| AuthMe DB unreachable | JDBC `getConnection` throws / read times out > 5s. | Fail closed. Reject connection at gatekeeper with kick: "auth service degraded". Log + webhook. Do NOT let player onto main world without auth. |
| Server `/stop` mid-login window | Paper shutdown hook. | `clearTransit` for all UUIDs, force-save snapshots, kick all limbo players with "server restarting, your gear is safe". |
| Race: AuthMe `LoginEvent` fires twice (HaHaWTH bug) | UUID already in `pendingTransit` and not in RESTORE state. | Idempotent — restore handler is a no-op if UUID is past [PRELOAD]. Log INFO. |
| Player disconnects in [LIMBO] | `PlayerQuitEvent`. | Clear `pendingTransit` + retry counter. Snapshot retained 7d. State file kept until snapshot GC. |
Fail-open is never the right choice for an auth gatekeeper. Every failure mode resolves to either: keep the player in limbo, or kick them. Never advance them to the main world unauth'd.
9. Migration from v1
In-place upgrade path (v1.1.x → v2.0.0):
- Stop server.
- Drop new jar in `plugins/`. v2 jar is not v1-compatible — old `AuthLimbo-1.x.jar` must be removed.
- v2 detects `plugins/AuthLimbo/config.yml` from v1 and rewrites it to v2 schema, leaving a `config.v1.bak` backup.
- v2 detects the `auth_limbo` world dir on disk and re-uses it (no recreation, no data loss).
- AuthMe DB schema unchanged — v2 still treats `authme.db` as read-only authoritative.
- New: `plugins/AuthLimbo/snapshots/` and `plugins/AuthLimbo/state/` directories created, owned by the same uid as the itzg container's runtime user.
- Start server. v2 startup logs walk through migration steps.
There is no DB migration. No mandatory player action. Permissions node names change (`authlimbo.admin` is now `authlimbo.command.admin`, etc.) — operator must update LP groups (noted in CHANGELOG).
10. Test plan
Unit (JUnit 5 + Mockito)
- `LimboWorldManager` — barrier-platform construction is idempotent.
- `AuthMeDatabase.getQuitLocation` — returns `Location` for present row, null for absent, null for malformed row.
- `Snapshot.serialize`/`deserialize` round-trip.
- State-machine: every transition rejects from invalid prev-state.
Integration (Paper test-server harness)
- Stand up Paper 1.21.x in CI (Forgejo Actions runner on nullstone).
- Mock AuthMe via a stub plugin that fires `AuthMeAsyncPreLoginEvent` and `LoginEvent` programmatically.
- Test scenarios: §5.1-5.6 from `AUDIT-2026-05-07.md` plus v2-specific: queue overflow, snapshot-restore on death, reload-without-restart, fail-closed on AuthMe DB down.
Stress (Bot flood)
- 1000 fake connections in 60s using mineflayer or MCBotsPro. Verify:
  - queue-depth bounded (gatekeeper kicks beyond max-queue-depth);
  - no `pendingTransit` leak (size returns to 0 after);
  - metrics counters consistent with audit log.
Chaos
- Kill plugin (`/plugman unload AuthLimbo`) mid-restore, verify state recovery on rejoin.
- `iptables -A OUTPUT -d <authme-db-host> -j DROP` and verify fail-closed.
- `kill -9` the itzg container during transit, verify next-startup walks `state/*.json` and recovers.
11. Versioning + release
- v2.0.0 = breaking redesign (this doc), AGPL-3.0 retained.
- v2.1.0 = polish (BossBar UX, /queue command, more metrics).
- v2.2.0 = Velocity-mode behind feature flag.
- v1.x = receives F3, F5, F6, F7 backports until racked.ru cuts over to v2; then archived.
Coordinate naming: when the codename migration completes
(onyx→obsidian, nullstone→bedrock per
gravel-laptop-build/ROADMAP.md), the racked.ru server moves to
bedrock. v2.0.0 must run on both naming worlds without config drift.
12. Open questions
- BossBar UI — does the operator want it visible to limbo players, or silent? Default proposed: visible.
- Snapshot retention — 7 days is the proposed default. Storage cost is ~1 KB/snapshot for vanilla inventories, up to ~50 KB for shulker-stuffed players. 100 active players → ~5 MB max.
- Webhook destination — same Discord channel as `s8n-ru` server-status alerts, or a new channel? Default proposed: same channel, prefixed `[AuthLimbo]`.
- v2.2 Velocity migration — needs a separate design pass once cobblestone or a second backend is real.
Sign-off pending operator review.