auth-limbo/docs/V2-ROADMAP.md
s8n ab1f607df6 docs: AuthLimbo v2 research + architecture + roadmap
Output from 4 parallel research agents (2026-05-07):
- RESEARCH-2B2T-QUEUE.md — 2b2t queue tech deep-dive: architecture, drama
  timeline, 5 patterns to copy + 5 to avoid
- RESEARCH-LIMBO-PLUGIN-SURVEY.md — open-source plugin survey: STEAL list
  (Elytrium LimboAPI/LimboAuth + PistonQueue), PATTERN list, SKIP list
- V2-ARCHITECTURE.md — Paper-only stack with Velocity-ready seam, 7-state
  login flow, snapshot-on-pre-login, transparent FIFO trust tiers
- V2-ROADMAP.md — M0-M5 milestones with acceptance criteria + dep graph

Stack decision: Paper-only for now (no proxy required), but architecture
split into Gatekeeper + Restore layers so future Velocity migration is
mechanical. Trip-wires codified for when to reconsider.

Anti-drama policy locked in code (not config): no paid priority, no
hidden veteran tier, transparent ban appeals.

Bootstrap repo at git.s8n.ru/s8n/auth-limbo-v2 ready for M0 work.
2026-05-07 19:31:40 +01:00


AuthLimbo v2 — Roadmap (M0-M5)

Companion to V2-ARCHITECTURE.md. Tracks the v2.0.0 implementation as ordered milestones with explicit acceptance criteria, dependencies, and parking lots for non-blocking work.

Status legend: OPEN, WIP, BLOCKED, DONE. Owner: Claude Code agents under operator review. Branching: every milestone lands on a feature branch v2/M{N}-<slug> and merges into v2-main after acceptance. v2-main becomes main at v2.0.0 release.

Prerequisite: v1.1.0 (F1 + F2 + F4) is on main and tagged. v2 work begins on a fresh v2-main branch.


M0 · Foundations · OPEN

Goal: Land the v2 skeleton so all later milestones plug into a shared backbone. No behaviour changes for end-users.

Deliverables

  • New Maven module core for the gatekeeper/restore split (Velocity-ready seam). The existing ru.authlimbo package becomes ru.authlimbo.paper.
  • State enum + StateMachine class (CONNECT → GATE → SNAPSHOT → LIMBO → PRELOAD → RESTORE → LIVE | REJECTED | SPECTATOR_FAIL) with persistence to plugins/AuthLimbo/state/<uuid>.json.
  • AuditLog writer (JSON-Lines append-only, logrotate-compatible).
  • MetricsRegistry skeleton (counters, gauges, histograms — no HTTP server yet, just in-memory accounting).
  • Config-v2 schema + automatic v1→v2 migration with backup.
  • Build: Maven multi-module; sqlite-jdbc still shaded; Adventure API brought in via the Paper API (no extra shade).
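A minimal pure-JDK sketch of the State enum plus transition validity, the part M0's unit tests would exercise. Persistence to state/&lt;uuid&gt;.json and Paper wiring are omitted; which states REJECTED is reachable from is an assumption here, not settled by this roadmap.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Sketch of the M0 state machine: states and legal transitions only.
public class StateMachineSketch {
    enum State { CONNECT, GATE, SNAPSHOT, LIMBO, PRELOAD, RESTORE, LIVE, REJECTED, SPECTATOR_FAIL }

    static final Map<State, EnumSet<State>> LEGAL = new EnumMap<>(State.class);
    static {
        LEGAL.put(State.CONNECT,  EnumSet.of(State.GATE));
        LEGAL.put(State.GATE,     EnumSet.of(State.SNAPSHOT, State.REJECTED)); // assumption
        LEGAL.put(State.SNAPSHOT, EnumSet.of(State.LIMBO));
        LEGAL.put(State.LIMBO,    EnumSet.of(State.PRELOAD, State.REJECTED)); // assumption
        LEGAL.put(State.PRELOAD,  EnumSet.of(State.RESTORE));
        LEGAL.put(State.RESTORE,  EnumSet.of(State.LIVE, State.SPECTATOR_FAIL));
        // LIVE, REJECTED, SPECTATOR_FAIL are terminal for one login attempt.
        LEGAL.put(State.LIVE,           EnumSet.noneOf(State.class));
        LEGAL.put(State.REJECTED,       EnumSet.noneOf(State.class));
        LEGAL.put(State.SPECTATOR_FAIL, EnumSet.noneOf(State.class));
    }

    private State current = State.CONNECT;

    State current() { return current; }

    // Throws on an illegal transition so bugs surface immediately in tests.
    void transition(State next) {
        if (!LEGAL.get(current).contains(next))
            throw new IllegalStateException(current + " -> " + next + " is not a legal transition");
        current = next;
    }

    public static void main(String[] args) {
        StateMachineSketch sm = new StateMachineSketch();
        sm.transition(State.GATE);
        sm.transition(State.SNAPSHOT);
        sm.transition(State.LIMBO);
        sm.transition(State.PRELOAD);
        sm.transition(State.RESTORE);
        sm.transition(State.LIVE);
        System.out.println(sm.current());   // LIVE
        boolean caught = false;
        try { new StateMachineSketch().transition(State.LIVE); } // CONNECT -> LIVE is illegal
        catch (IllegalStateException e) { caught = true; }
        System.out.println(caught);         // true
    }
}
```

Keeping the legal-transition table in one static map is what makes acceptance item 5 (transition-validity unit tests) cheap to write.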

Acceptance

  1. Plugin loads on Paper 1.21.11 with v1 config; v1→v2 migration runs exactly once and writes config.v1.bak.
  2. /authlimbo state <player> shows current state for any in-flight player.
  3. audit.log is created and rotates at 100MB (verified by manual 100MB-noise injection).
  4. All v1.1.0 behaviour is preserved (F1, F2, F4 still work end-to-end on a stub-AuthMe test server).
  5. Unit tests for state-machine transition validity pass in CI.

Dependencies

None. M0 is the foundation.


M1 · Snapshot subsystem · OPEN

Goal: Make inventory loss impossible regardless of any chunk / teleport / damage bug downstream.

Deliverables

  • On AuthMeAsyncPreLoginEvent: copy world/playerdata/<uuid>.dat to plugins/AuthLimbo/snapshots/<uuid>-<timestamp>.nbt, log SHA-256.
  • On PlayerDeathEvent while UUID is in pendingTransit: keepInventory=true, drops cleared, SEVERE logged, Discord webhook fired, schedule restore-from-snapshot on respawn.
  • New command /authlimbo restore <player> [--snapshot=<file>] that rolls back to a snapshot (via a bundled NBT-library equivalent or an embedded reader).
  • Snapshot retention GC: 7-day default, configurable, runs hourly.
  • Metric: authlimbo_snapshot_restored_total.
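A sketch of the M1 snapshot step under stated assumptions: copy the playerdata file into the snapshot directory using the roadmap's &lt;uuid&gt;-&lt;timestamp&gt;.nbt naming, and hash the copy for the audit log. The event hook and audit-log write are omitted; method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch: snapshot a playerdata file and compute the SHA-256 logged on creation.
public class SnapshotSketch {
    static Path snapshot(Path playerData, Path snapshotDir, String uuid, long epochSeconds)
            throws IOException {
        Files.createDirectories(snapshotDir);
        Path target = snapshotDir.resolve(uuid + "-" + epochSeconds + ".nbt");
        Files.copy(playerData, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        return HexFormat.of().formatHex(digest);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("authlimbo-test");
        Path source = dir.resolve("player.dat");
        Files.write(source, new byte[] {1, 2, 3});
        Path snap = snapshot(source, dir.resolve("snapshots"), "uuid-1234", 1746640000L);
        // Copy must be byte-identical: hashes of source and snapshot agree,
        // which is exactly the "SHA-256 logged on creation and on read-back" check.
        System.out.println(sha256(source).equals(sha256(snap)));  // true
        System.out.println(snap.getFileName());                   // uuid-1234-1746640000.nbt
    }
}
```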

Acceptance

  1. Forced-void-death during transit (test-harness /limbo void <player>): player respawns with full inventory + xp.
  2. Snapshot files appear in snapshots/, SHA-256 logged on creation and on read-back.
  3. GC removes >7-day snapshots; verified by setting retention=10s in test config.
  4. /authlimbo restore <player> after a successful login restores the pre-login inventory and sends an audit-log entry.

Dependencies

M0 (audit log + state machine).


M2 · Privacy-isolation hardening · OPEN

Goal: Tighten the limbo-world isolation surface — no leaks of chat, tablist, or join messages between limbo and main world. Make the privacy invariant testable.

Deliverables

  • AsyncChatEvent listener (HIGHEST priority): drop limbo-world recipients from main-world chat, and main-world recipients from limbo chat.
  • Tablist scoping via Player#hidePlayer(plugin, target):
    • limbo players hidden from main-world tablist;
    • main-world players hidden from limbo tablist;
    • limbo players hidden from each other.
  • Join-message shifting: suppress vanilla join message on initial connect; fire delayed join message at state-machine [LIVE] transition.
  • Per-player view-distance forced to 2 in limbo (Player#setViewDistance(2) on limbo entry, restore on exit).
  • Limbo BARRIER ceiling at y=129 added to LimboWorldManager.
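The chat-scoping rule above reduces to a symmetric filter on the recipient set. A pure-JDK sketch with world membership collapsed to a limbo/main flag; on Paper this logic would run inside the chat event listener by mutating its recipient collection, and the names here are illustrative.

```java
import java.util.List;

// Sketch of the M2 chat-scoping rule: a message only reaches recipients
// on the same side of the limbo boundary as the sender.
public class ChatScopeSketch {
    record Player(String name, boolean inLimbo) {}

    // Remove every recipient on the other side of the boundary.
    static List<Player> scopeRecipients(Player sender, List<Player> recipients) {
        return recipients.stream()
                .filter(r -> r.inLimbo() == sender.inLimbo())
                .toList();
    }

    public static void main(String[] args) {
        Player alice = new Player("alice", false);  // main world
        Player bob   = new Player("bob", true);     // limbo
        Player carol = new Player("carol", false);  // main world
        List<Player> all = List.of(alice, bob, carol);

        // alice's chat never reaches bob; bob's chat never leaves limbo.
        System.out.println(scopeRecipients(alice, all).stream().map(Player::name).toList());
        System.out.println(scopeRecipients(bob, all).stream().map(Player::name).toList());
    }
}
```

Because the same predicate runs in both directions, one filter covers both chat boundaries that PrivacyInvariantTest asserts in M2 acceptance item 4.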

Acceptance

  1. With two test accounts (alice in main world, bob connecting to limbo): alice does not see bob in tablist before bob completes login. After login, alice sees bob's join message exactly once.
  2. bob in limbo cannot see chat from alice. Verified via integration test.
  3. bob cannot fly out of limbo via creative/elytra (server starts bob in survival; barrier ceiling prevents y>129).
  4. Privacy invariant test (PrivacyInvariantTest) covers all six scope boundaries (chat in/out, tablist in/out, join-msg before/after).

Dependencies

M0.


M3 · Restore reliability (3x3 preload + chunk-ready verification) · OPEN

Goal: Make the restore teleport bulletproof against the "loaded-but-neighbour-unloaded" race that v1's F3 was designed for, and against the silent-failure case where teleportAsync returns true but the player is still at the old position.

Deliverables

  • 3x3 chunk preload around target (addPluginChunkTicket x9 + CompletableFuture.allOf(getChunkAtAsyncUrgently x9)).
  • Post-TP verification: 5 ticks after teleportAsync returns true, check player.getLocation().distance(saved) < 2.0. If not, treat as silent fail and retry.
  • F2-style retry loop carried over from v1.1, now wired into v2 metrics and the audit log.
  • Drop the SPECTATOR pre-TP trick (v1's F8 redesign): rely on the snapshot + damage-guard layers instead.
  • Metric: authlimbo_restore_duration_seconds histogram.
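A sketch of the two M3 mechanisms with the Paper calls stubbed out: enumerate the 3x3 chunk square around the target (the nine ticket/load targets), gate on all nine completing, then distance-check the player against the saved position instead of trusting teleportAsync's return value. The 2.0-block threshold is the one named above; everything else is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of M3's 3x3 preload + post-TP verification, Paper APIs stubbed.
public class RestoreSketch {
    record ChunkPos(int x, int z) {}

    // The nine chunks covering the target chunk and its eight neighbours.
    static List<ChunkPos> preloadSquare(double blockX, double blockZ) {
        int cx = Math.floorDiv((int) Math.floor(blockX), 16);
        int cz = Math.floorDiv((int) Math.floor(blockZ), 16);
        List<ChunkPos> out = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++)
            for (int dz = -1; dz <= 1; dz++)
                out.add(new ChunkPos(cx + dx, cz + dz));
        return out;
    }

    // Post-TP check: a true return from teleportAsync is not trusted on its own.
    static boolean teleportVerified(double[] saved, double[] actual) {
        double dx = saved[0] - actual[0], dy = saved[1] - actual[1], dz = saved[2] - actual[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz) < 2.0;
    }

    public static void main(String[] args) {
        // Chunk-boundary target (the M3 acceptance scenario at (16, 70, 16))
        // still yields exactly nine distinct chunks, centred on (1, 1).
        List<ChunkPos> square = preloadSquare(16, 16);
        System.out.println(square.size());                        // 9
        System.out.println(square.contains(new ChunkPos(0, 0)));  // true
        System.out.println(square.contains(new ChunkPos(2, 2)));  // true

        // Simulated loads: allOf stands in for the nine async chunk loads.
        CompletableFuture<?>[] loads = square.stream()
                .map(CompletableFuture::completedFuture)
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(loads).join();

        System.out.println(teleportVerified(
                new double[] {16, 70, 16}, new double[] {16, 70.5, 16}));  // landed: true
        System.out.println(teleportVerified(
                new double[] {16, 70, 16}, new double[] {16, 200, 16}));   // silent fail: false
    }
}
```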

Acceptance

  1. AUDIT-2026-05-07 §5.1 (unloaded-chunk void) reproduces no void-death and no inventory loss. Player lands at saved coords.
  2. AUDIT-2026-05-07 §5.2 (invalid Y) escalates to SPECTATOR_FAIL after 3 retries with audit-log + webhook.
  3. New scenario: target at chunk-section boundary (e.g. (16, 70, 16)) — 3x3 preload makes this work first try.
  4. Histogram p99 restore duration < 2.5s under normal load (no bot flood).

Dependencies

M0, M1 (snapshot is the safety net while M3 retry-loops).


M4 · Gatekeeper + queue + observability · OPEN

Goal: Bring the queue, trust tiers, metrics endpoint, and Discord webhook online. After M4 the operator has full visibility without needing to grep logs.

Deliverables

  • Gatekeeper interface (Gatekeeper.accept(connection) → Decision) with Paper-side implementation. Decision: accept, queue, reject.
  • Trust-tier resolver: reads LP permissions for staff, AuthMe-DB last-seen for returning vs new, IP-block list for flagged. Cacheable.
  • Bounded queue with FIFO ordering by connect-time + tier priority. Configurable max-concurrent-auth, max-queue-depth, queue-timeout-seconds.
  • BossBar UI in limbo: shows tier + position + ETA. Updates every second.
  • In-chat /queue command that re-displays the player's queue state on demand.
  • Prometheus HTTP server bound to 127.0.0.1:9091 (loopback only).
  • Discord webhook config + plumbing for the alert categories from ARCHITECTURE §7.
  • /authlimbo queue policy command — prints the tier policy in-game so players can self-verify they're not in a hidden tier.
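A pure-JDK sketch of the queue ordering: FIFO within a tier, higher-priority tier admitted first, depth bounded at max-queue-depth. The numeric tier codes (0 = staff, 1 = returning, 2 = new) are illustrative assumptions, not the tier table from ARCHITECTURE §3.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.PriorityQueue;

// Sketch of the M4 bounded queue: tier first, connect-time FIFO within a tier.
public class QueueSketch {
    record Entry(String name, int tier, long connectMillis) {}

    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            Comparator.comparingInt(Entry::tier).thenComparingLong(Entry::connectMillis));
    private final int maxDepth;

    QueueSketch(int maxDepth) { this.maxDepth = maxDepth; }

    // false = queue full; the gatekeeper would reject with a retry message.
    boolean offer(Entry e) {
        if (queue.size() >= maxDepth) return false;
        return queue.add(e);
    }

    Optional<Entry> admitNext() { return Optional.ofNullable(queue.poll()); }

    public static void main(String[] args) {
        QueueSketch q = new QueueSketch(3);
        q.offer(new Entry("newbie", 2, 1000));   // connected first, lowest tier
        q.offer(new Entry("veteran", 1, 2000));
        q.offer(new Entry("staff", 0, 3000));    // connected last, highest tier
        System.out.println(q.offer(new Entry("overflow", 2, 4000)));  // false: depth bound holds
        System.out.println(q.admitNext().get().name());  // staff
        System.out.println(q.admitNext().get().name());  // veteran
        System.out.println(q.admitNext().get().name());  // newbie
    }
}
```

Note the bound is enforced at offer time, which is what makes the M4 stress-test assertion (authlimbo_queue_depth never exceeds max-queue-depth) a structural property rather than a tuning outcome.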

Acceptance

  1. Stress test: 1000 simulated connections in 60s. authlimbo_queue_depth peaks at max-queue-depth, never higher. No pendingTransit leak (returns to 0 within 30s of flood end).
  2. Staff bypass: a player with authlimbo.queue.priority.staff skips even a full queue. Audit log records the bypass.
  3. Pi-hole-style IP blocklist drops a connection at gatekeeper — never enters limbo. authlimbo_connections_total{outcome="rejected"} increments.
  4. Prometheus scrape of localhost:9091/metrics returns OpenMetrics format with all metrics from ARCHITECTURE §7.
  5. /authlimbo queue policy output matches ARCHITECTURE §3 tier table verbatim (rendered from a single source-of-truth string).

Dependencies

M0 (state machine + audit log), M3 (so legitimate logins still flow correctly through the new gatekeeper layer).


M5 · Hardening, drama-avoidance lock-in, release · OPEN

Goal: Lock in the anti-drama policy so it can't drift. Ship v2.0.0.

Deliverables

  • Anti-drama policy constants in code (not config) — paid-tier and hidden-tier escape hatches do not exist as configurable knobs. Adding one would require a code change + AGPL fork.
  • Reload-without-restart (/authlimbo reload) with in-flight transit drain (max 30s wait).
  • Fail-closed implementation for AuthMe DB unreachable case (kick with operator-friendly message + webhook).
  • Server-shutdown drain hook: clear transit, save snapshots, kick limbo players with "server restarting" message.
  • Chaos-test suite: kill-plugin-mid-login, kill-container, AuthMe-DB network-drop. All recoverable.
  • Documentation: V2-ARCHITECTURE.md (this milestone's companion), V2-RELEASE.md migration guide for operators, updated compatibility.md and installation.md.
  • Tag v2.0.0, push to git.s8n.ru/s8n/auth-limbo, GitHub push-mirror, attach jar to release.
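The "policy in code, not config" deliverable can be as blunt as a final class of compile-time constants with no configuration path. A sketch; the class and constant names are illustrative, not the real API.

```java
// Sketch of the M5 anti-drama lock-in: the guarantees are constants,
// not config keys. There is deliberately no setter, no config read,
// and no reload hook; changing them means editing source, which under
// AGPL produces a visible fork.
public final class AntiDramaPolicy {
    private AntiDramaPolicy() {}  // no instances, no subclassing

    public static final boolean PAID_PRIORITY_EXISTS = false;
    public static final boolean HIDDEN_TIERS_EXIST = false;
    public static final boolean BAN_APPEALS_TRANSPARENT = true;

    public static void main(String[] args) {
        System.out.println(PAID_PRIORITY_EXISTS);      // false
        System.out.println(HIDDEN_TIERS_EXIST);        // false
        System.out.println(BAN_APPEALS_TRANSPARENT);   // true
    }
}
```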

Acceptance

  1. Plugin reload during a live transit completes the in-flight restore correctly, no inventory loss.
  2. Killing the plugin (/plugman unload) during [LIMBO] state and restarting the server: rejoining player is restored from state + snapshot.
  3. AuthMe DB hard-down: connection rejected at gatekeeper, never reaches main world. Operator gets webhook within 30s.
  4. CHANGELOG documents every breaking change, every renamed permission node, every config schema change.
  5. v2.0.0 jar runs end-to-end on the racked.ru staging container (parallel to v1 prod) for 7 days with zero void-deaths and zero inventory losses.

Dependencies

M0-M4. M5 is the gate to release.


Parked / non-blocking

These items are not in the v2.0.0 critical path. Tracked here so they aren't lost.

  • P-VELO · Velocity-mode behind feature flag (target: v2.2.0). Requires a real second backend or proxy mesh first.
  • P-COBBLE · cobblestone-server interop. Wait for cobblestone intake to land in _github/infra/.
  • P-PLUGIN-MSG · Plugin-message channel between paper-side and proxy-side gatekeepers (prep for P-VELO).
  • P-WEB-UI · Read-only web dashboard for queue + metrics. Defer until operator asks.
  • P-CROWDSEC · Pluggable IP-blocklist source (CrowdSec API). v2.0.0 uses static config + Pi-hole hosts file.
  • P-MOJANG-BAN-CHECK · Honor Mojang's name-changed-but-banned blocklist. Niche, defer.

Cross-cutting acceptance: privacy invariant

Every milestone must preserve the v1 privacy invariant: no main-world player can observe any pre-auth player's coordinates, inventory, or chat.

A dedicated PrivacyInvariantTest (introduced in M2) runs on every PR and must pass for merge. The test enumerates the six scope boundaries from ARCHITECTURE §4 and asserts no leak in either direction.

If a milestone would relax any boundary, it MUST be flagged in the PR description and reviewed against feedback_audit_then_plan.md (audit-then-fix workflow).


Release plan

Tag · Contents · Target
v2.0.0-rc1 · M0 + M1 + M2 + M3 · end of week 1
v2.0.0-rc2 · + M4 · end of week 2
v2.0.0 · + M5, 7-day staging soak · end of week 3
v2.1.0 · parked items as operator pulls them in · opportunistic

All releases tagged on git.s8n.ru/s8n/auth-limbo first; GitHub is push-mirror per feedback_my_git_is_forgejo.md.

Operator handles end-of-session push.