# AuthLimbo v2 — Roadmap (M0-M5)

Companion to [`V2-ARCHITECTURE.md`](V2-ARCHITECTURE.md). Tracks the
v2.0.0 implementation as ordered milestones with explicit acceptance
criteria, dependencies, and parking lots for non-blocking work.

Status legend: `OPEN`, `WIP`, `BLOCKED`, `DONE`.
Owner: Claude Code agents under operator review.
Branching: every milestone lands on a feature branch
`v2/M{N}-<slug>` and merges into `v2-main` after acceptance. `v2-main`
becomes `main` at v2.0.0 release.

Pre-requisite: v1.1.0 (F1 + F2 + F4) is on `main` and tagged.
v2 work begins on a fresh `v2-main` branch.

---

## M0 · Foundations · OPEN

**Goal:** Land the v2 skeleton so all later milestones plug into a
shared backbone. No behaviour changes for end-users.

### Deliverables

- New Maven module `core` for the gatekeeper/restore split (Velocity-ready
  seam). Existing `ru.authlimbo` package becomes `ru.authlimbo.paper`.
- `State` enum + `StateMachine` class (`CONNECT → GATE → SNAPSHOT
  → LIMBO → PRELOAD → RESTORE → LIVE | REJECTED | SPECTATOR_FAIL`)
  with persistence to `plugins/AuthLimbo/state/<uuid>.json`.
- `AuditLog` writer (JSON-Lines append-only, logrotate-compatible).
- `MetricsRegistry` skeleton (counters, gauges, histograms; no HTTP
  server yet, just in-memory accounting).
- Config-v2 schema + automatic v1→v2 migration with backup.
- Build: Maven multi-module, sqlite-jdbc still shaded, Adventure API
  brought in via Paper API (no extra shade).
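
The transition-validity rule behind the `State` + `StateMachine` deliverable can be sketched as a lookup table. This is a minimal illustration, not the planned implementation: the `ALLOWED` table and the `canTransition` / `advance` helpers are hypothetical names, and the edge set is an assumption (in particular, that `REJECTED` is reachable only from `GATE` and `SPECTATOR_FAIL` only from `RESTORE`). Persistence to JSON is left out.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** States named in the roadmap. */
enum State { CONNECT, GATE, SNAPSHOT, LIMBO, PRELOAD, RESTORE, LIVE, REJECTED, SPECTATOR_FAIL }

/** Hypothetical sketch of transition validation; the edge set below is an assumption. */
final class StateMachine {
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);

    static {
        ALLOWED.put(State.CONNECT, EnumSet.of(State.GATE));
        ALLOWED.put(State.GATE, EnumSet.of(State.SNAPSHOT, State.REJECTED));
        ALLOWED.put(State.SNAPSHOT, EnumSet.of(State.LIMBO));
        ALLOWED.put(State.LIMBO, EnumSet.of(State.PRELOAD));
        ALLOWED.put(State.PRELOAD, EnumSet.of(State.RESTORE));
        ALLOWED.put(State.RESTORE, EnumSet.of(State.LIVE, State.SPECTATOR_FAIL));
        // LIVE, REJECTED and SPECTATOR_FAIL are terminal: no entry means no outgoing edges.
    }

    private State current = State.CONNECT;

    /** True when the table lists an edge from 'from' to 'to'. */
    static boolean canTransition(State from, State to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }

    State current() {
        return current;
    }

    /** Moves to 'next', or throws if the table has no such edge. */
    void advance(State next) {
        if (!canTransition(current, next)) {
            throw new IllegalStateException("Illegal transition " + current + " -> " + next);
        }
        current = next;
    }
}
```

Keeping the edges in one table means the unit tests for transition validity can enumerate every `(from, to)` pair against it.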

### Acceptance

1. Plugin loads on Paper 1.21.11 with v1 config; v1→v2 migration runs
   exactly once and writes `config.v1.bak`.
2. `/authlimbo state <player>` shows current state for any in-flight
   player.
3. `audit.log` is created and rotates at 100 MB (verified by manual
   100 MB noise injection).
4. All v1.1.0 behaviour is preserved (F1, F2, F4 still work
   end-to-end on a stub-AuthMe test server).
5. Unit tests for state-machine transition validity pass in CI.

### Dependencies

None. M0 is the foundation.

---

## M1 · Snapshot subsystem · OPEN

**Goal:** Make inventory loss impossible regardless of any chunk /
teleport / damage bug downstream.

### Deliverables

- On `AuthMeAsyncPreLoginEvent`: copy `world/playerdata/<uuid>.dat`
  to `plugins/AuthLimbo/snapshots/<uuid>-<timestamp>.nbt` and log its
  SHA-256.
- On `PlayerDeathEvent` while the UUID is in `pendingTransit`: set
  `keepInventory=true`, clear drops, log at SEVERE, fire the Discord
  webhook, and schedule restore-from-snapshot on respawn.
- New command `/authlimbo restore <player> [--snapshot=<file>]` that
  rolls back to a snapshot (uses a bundled nbtlib equivalent or an
  embedded NBT reader).
- Snapshot retention GC: 7-day default, configurable, runs hourly.
- Metric: `authlimbo_snapshot_restored_total`.
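
The copy-and-hash step from the first deliverable might look roughly like this, using only the JDK. `SnapshotWriter`, `snapshot`, and `sha256Hex` are hypothetical names, and using epoch milliseconds for `<timestamp>` is an assumption; the roadmap does not fix the timestamp format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical sketch: byte-for-byte snapshot of a player data file plus its SHA-256. */
final class SnapshotWriter {

    /** Copies <uuid>.dat into snapshotDir as <uuid>-<timestamp>.nbt and returns the new path. */
    static Path snapshot(Path playerDataDir, Path snapshotDir, UUID uuid, long epochMillis)
            throws IOException {
        Path source = playerDataDir.resolve(uuid + ".dat");
        Files.createDirectories(snapshotDir);
        Path target = snapshotDir.resolve(uuid + "-" + epochMillis + ".nbt");
        // No REPLACE_EXISTING: an existing snapshot with the same name is never overwritten.
        return Files.copy(source, target);
    }

    /** Lower-case hex SHA-256 of the file contents, suitable for the audit log. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is expected on every JVM", e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(Files.readAllBytes(file))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Hashing the copy rather than the source lets the same function serve the read-back check: the value logged at creation can be compared with a fresh hash before any restore.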

### Acceptance

1. Forced-void-death during transit (test-harness `/limbo void <player>`):
   player respawns with full inventory + xp.
2. Snapshot files appear in `snapshots/`, SHA-256 logged on creation
   and on read-back.
3. GC removes snapshots older than the retention window (7 days by
   default); verified by setting retention=10s in test config.
4. `/authlimbo restore <player>` after a successful login restores
   the pre-login inventory and writes an audit-log entry.
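
The retention check in item 3 is easiest to test when the sweep takes the retention window and the current time as parameters. The sketch below assumes snapshot age is judged by file modification time; `SnapshotGc` and `sweep` are hypothetical names, and the hourly scheduling is not shown.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the snapshot retention sweep. */
final class SnapshotGc {

    /**
     * Deletes every *.nbt file in dir whose last-modified time is older than
     * (now - retention) and returns how many files were removed.
     */
    static int sweep(Path dir, Duration retention, Instant now) throws IOException {
        Instant cutoff = now.minus(retention);
        int deleted = 0;
        try (DirectoryStream<Path> snapshots = Files.newDirectoryStream(dir, "*.nbt")) {
            for (Path file : snapshots) {
                if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

With `retention` set to `Duration.ofSeconds(10)`, the same method covers the test-config scenario without waiting a week.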

### Dependencies

M0 (audit log + state machine).

---

## M2 · Privacy-isolation hardening · OPEN

**Goal:** Tighten the limbo-world isolation surface: no leaks of
chat, tablist, or join messages between limbo and main world. Make
the privacy invariant testable.

### Deliverables

- `PlayerChatEvent` listener (HIGHEST): drop limbo-world recipients
  from main-world chat; main-world recipients from limbo chat.
- Tablist scoping via `Player#hidePlayer(plugin, target)`:
  - limbo players hidden from main-world tablist;
  - main-world players hidden from limbo tablist;
  - limbo players hidden from each other.
- Join-message shifting: suppress vanilla join message on initial
  connect; fire delayed join message at state-machine [LIVE]
  transition.
- Per-player view-distance forced to 2 in limbo
  (`Player#setViewDistance(2)` on limbo entry, restore on exit).
- Limbo BARRIER ceiling at y=129 added to `LimboWorldManager`.
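
The chat and tablist rules above reduce to two small predicates, which is what makes the invariant testable without a running server. The sketch below is deliberately free of the Paper API; `Zone`, `PrivacyScope`, `canSeeInTablist`, and `chatRecipients` are hypothetical names. In the plugin, the results would presumably drive `Player#hidePlayer` and the chat event's recipient set.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Where a player currently is from the privacy layer's point of view. */
enum Zone { LIMBO, MAIN }

/** Hypothetical sketch of the scoping rules listed in the deliverables. */
final class PrivacyScope {

    /**
     * Tablist rule: limbo players are hidden from everyone (including each other),
     * and limbo viewers see nobody, so visibility requires both sides in MAIN.
     */
    static boolean canSeeInTablist(Zone viewer, Zone target) {
        return viewer == Zone.MAIN && target == Zone.MAIN;
    }

    /** Chat rule: keep only recipients that are in the same zone as the sender. */
    static <P> Set<P> chatRecipients(Zone senderZone, Map<P, Zone> candidates) {
        Set<P> kept = new LinkedHashSet<>();
        for (Map.Entry<P, Zone> entry : candidates.entrySet()) {
            if (entry.getValue() == senderZone) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

For example, with `alice` in `MAIN` and `bob` in `LIMBO`, a main-world message keeps only `alice` as a recipient, and neither player sees `bob` in the tablist.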

### Acceptance

1. With two test accounts (`alice` in main world, `bob` connecting
   to limbo): `alice` does not see `bob` in tablist before `bob`
   completes login. After login, `alice` sees `bob`'s join message
   exactly once.
2. `bob` in limbo cannot see chat from `alice`. Verified via
   integration test.
3. `bob` cannot fly out of limbo via creative/elytra (server starts
   bob in survival; barrier ceiling prevents y>129).
4. Privacy invariant test (`PrivacyInvariantTest`) covers all six
   scope boundaries (chat in/out, tablist in/out, join-msg before/after).

### Dependencies

M0.

---

## M3 · Restore reliability (3x3 preload + chunk-ready verification) · OPEN

**Goal:** Make the restore-teleport bullet-proof against the
"loaded-but-neighbour-unloaded" race that v1's F3 was designed for,
plus the silent-failure case where the `teleportAsync` future completes
with `true` but the player is still at the old position.

### Deliverables

- 3x3 chunk preload around target (`addPluginChunkTicket` x9 +
  `CompletableFuture.allOf(getChunkAtAsyncUrgently x9)`).
- Post-TP verification: 5 ticks after the `teleportAsync` future
  completes with `true`, check
  `player.getLocation().distance(saved) < 2.0`. If not, treat
  as silent fail and retry.
- The F2-style retry loop from v1.1 is carried over, with v2 metrics
  + audit-log integration.
- Drop the SPECTATOR pre-TP trick (v1's F8 redesign): rely on the
  snapshot + damage-guard layers instead.
- Metric: `authlimbo_restore_duration_seconds` histogram.
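
The arithmetic behind the 3x3 preload and the post-TP check can be sketched without the Paper API. `RestoreMath`, `chunkOf`, `preloadSquare`, and `landed` are hypothetical names. In the plugin, each coordinate pair would presumably be passed to `addPluginChunkTicket` and `getChunkAtAsyncUrgently`, and the distance check would also need a same-world guard, since `Location#distance` rejects locations in different worlds.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the coordinate math used by the restore path. */
final class RestoreMath {

    /** Chunk coordinate containing a block coordinate; the shift floors correctly for negatives. */
    static int chunkOf(int block) {
        return block >> 4;
    }

    /** The nine {chunkX, chunkZ} pairs of the 3x3 square centred on the target block. */
    static List<int[]> preloadSquare(int blockX, int blockZ) {
        int centreX = chunkOf(blockX);
        int centreZ = chunkOf(blockZ);
        List<int[]> chunks = new ArrayList<>(9);
        for (int dx = -1; dx <= 1; dx++) {
            for (int dz = -1; dz <= 1; dz++) {
                chunks.add(new int[] {centreX + dx, centreZ + dz});
            }
        }
        return chunks;
    }

    /** Post-teleport check: the player is within 2.0 blocks of the saved position. */
    static boolean landed(double x, double y, double z, double savedX, double savedY, double savedZ) {
        double dx = x - savedX;
        double dy = y - savedY;
        double dz = z - savedZ;
        return Math.sqrt(dx * dx + dy * dy + dz * dz) < 2.0;
    }
}
```

A target at block (16, 70, 16) sits in chunk (1, 1), so the square spans chunks (0, 0) through (2, 2) and the neighbours on every side are loaded before the teleport.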

### Acceptance

1. Replaying AUDIT-2026-05-07 §5.1 (unloaded-chunk void) produces no
   void-death and no inventory loss. Player lands at saved coords.
2. AUDIT-2026-05-07 §5.2 (invalid Y) escalates to
   `SPECTATOR_FAIL` after 3 retries with audit-log + webhook.
3. New scenario: target at a chunk boundary
   (e.g. (16, 70, 16)); 3x3 preload makes this work first try.
4. Histogram p99 restore duration < 2.5s under normal load (no bot
   flood).

### Dependencies

M0, M1 (snapshot is the safety net while M3 retry-loops).

---

## M4 · Gatekeeper + queue + observability · OPEN

**Goal:** Bring the queue, trust tiers, metrics endpoint, and
Discord webhook online. After M4 the operator has full visibility
without needing to grep logs.

### Deliverables

- Gatekeeper interface (`Gatekeeper.accept(connection) → Decision`)
  with Paper-side implementation. Decision: `accept`, `queue`,
  `reject`.
- Trust-tier resolver: reads LuckPerms (LP) permissions for `staff`,
  AuthMe-DB last-seen for `returning` vs `new`, IP-block list for
  `flagged`. Cacheable.
- Bounded queue ordered by tier priority, then FIFO by connect time.
  Configurable `max-concurrent-auth`, `max-queue-depth`,
  `queue-timeout-seconds`.
- BossBar UI in limbo: shows tier + position + ETA. Updates every
  second.
- `/queue` command in-chat re-displays state.
- Prometheus HTTP server bound to `127.0.0.1:9091` (loopback only).
- Discord webhook config + plumbing for the alert categories from
  ARCHITECTURE §7.
- `/authlimbo queue policy` command: prints the tier policy
  in-game so players can self-verify they're not in a hidden tier.
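
One way to get a bounded queue with tier priority and FIFO ordering inside each tier is a `PriorityQueue` with a two-level comparator. The sketch below is an illustration under assumptions: `Tier`, `Ticket`, and `AuthQueue` are hypothetical names, the tier order (staff first, flagged last) is a guess at the policy in ARCHITECTURE §3, and the class is not thread-safe. Staff bypass of a full queue and queue timeouts are not modelled.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.PriorityQueue;

/** Trust tiers from the resolver; declaration order is the ASSUMED priority order. */
enum Tier { STAFF, RETURNING, NEW, FLAGGED }

/** One waiting connection. */
final class Ticket {
    final String player;
    final Tier tier;
    final long connectNanos;

    Ticket(String player, Tier tier, long connectNanos) {
        this.player = player;
        this.tier = tier;
        this.connectNanos = connectNanos;
    }
}

/** Hypothetical bounded queue: higher tier first, earlier connect time first within a tier. */
final class AuthQueue {
    private final int maxDepth;
    private final PriorityQueue<Ticket> queue = new PriorityQueue<>(
            Comparator.comparing((Ticket t) -> t.tier).thenComparingLong(t -> t.connectNanos));

    AuthQueue(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Returns false when the queue is already at max depth; the caller would then reject. */
    boolean offer(Ticket ticket) {
        if (queue.size() >= maxDepth) {
            return false;
        }
        return queue.add(ticket);
    }

    /** Next ticket to admit, or empty when nobody is waiting. */
    Optional<Ticket> poll() {
        return Optional.ofNullable(queue.poll());
    }

    int depth() {
        return queue.size();
    }
}
```

Because `offer` refuses entries at the cap, `depth()` can never exceed `maxDepth`, which is the property the stress test checks through `authlimbo_queue_depth`.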

### Acceptance

1. Stress test: 1000 simulated connections in 60s.
   `authlimbo_queue_depth` peaks at `max-queue-depth`, never higher.
   No `pendingTransit` leak (returns to 0 within 30s of flood end).
2. Staff bypass: a player with `authlimbo.queue.priority.staff`
   skips even a full queue. Audit log records the bypass.
3. Pi-hole-style IP blocklist drops a connection at gatekeeper;
   it never enters limbo. `authlimbo_connections_total{outcome="rejected"}`
   increments.
4. Prometheus scrape of `localhost:9091/metrics` returns OpenMetrics
   format with all metrics from ARCHITECTURE §7.
5. `/authlimbo queue policy` output matches ARCHITECTURE §3 tier table
   verbatim (rendered from a single source-of-truth string).

### Dependencies

M0 (state machine + audit log), M3 (so legitimate logins still
flow correctly through the new gatekeeper layer).

---

## M5 · Hardening, drama-avoidance lock-in, release · OPEN

**Goal:** Lock in the anti-drama policy so it can't drift. Ship v2.0.0.

### Deliverables

- Anti-drama policy constants in code (not config): paid-tier and
  hidden-tier escape hatches do not exist as configurable knobs.
  Adding one would require a code change + AGPL fork.
- Reload-without-restart (`/authlimbo reload`) with in-flight transit
  drain (max 30s wait).
- Fail-closed implementation for AuthMe DB unreachable case (kick
  with operator-friendly message + webhook).
- Server-shutdown drain hook: clear transit, save snapshots, kick
  limbo players with "server restarting" message.
- Chaos-test suite: kill-plugin-mid-login, kill-container, AuthMe-DB
  network-drop. All recoverable.
- Documentation: `V2-ARCHITECTURE.md` (this milestone's companion),
  `V2-RELEASE.md` migration guide for operators, updated
  `compatibility.md` and `installation.md`.
- Tag v2.0.0, push to git.s8n.ru/s8n/auth-limbo, GitHub
  push-mirror, attach jar to release.
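
The bounded drain used by the reload and shutdown deliverables above could be built on a counter guarded by a lock and condition. This is a sketch under assumptions: `TransitDrain`, `begin`, `end`, and `awaitDrained` are hypothetical names, and because the wait blocks the calling thread, it would need to run off the main server thread or be replaced by a scheduler-driven poll.

```java
import java.time.Duration;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: tracks in-flight transits and waits for them with a deadline. */
final class TransitDrain {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private int inFlight;

    /** Call when a player enters transit. */
    void begin() {
        lock.lock();
        try {
            inFlight++;
        } finally {
            lock.unlock();
        }
    }

    /** Call when a transit finishes (success or failure); wakes waiters at zero. */
    void end() {
        lock.lock();
        try {
            if (inFlight > 0 && --inFlight == 0) {
                drained.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until nothing is in flight or maxWait elapses; true means fully drained. */
    boolean awaitDrained(Duration maxWait) throws InterruptedException {
        long remainingNanos = maxWait.toNanos();
        lock.lock();
        try {
            while (inFlight > 0) {
                if (remainingNanos <= 0) {
                    return false;
                }
                remainingNanos = drained.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

A reload handler would call `awaitDrained(Duration.ofSeconds(30))` and treat a `false` result as the signal to fall back on state files and snapshots for any player still in transit.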

### Acceptance

1. Plugin reload during a live transit completes the in-flight
   restore correctly, no inventory loss.
2. Killing the plugin (`/plugman unload`) during [LIMBO] state and
   restarting the server: rejoining player is restored from state +
   snapshot.
3. AuthMe DB hard-down: connection rejected at gatekeeper, never
   reaches main world. Operator gets webhook within 30s.
4. CHANGELOG documents every breaking change, every renamed
   permission node, every config schema change.
5. v2.0.0 jar runs end-to-end on the racked.ru staging container
   (parallel to v1 prod) for 7 days with zero void-deaths and zero
   inventory losses.

### Dependencies

M0-M4. M5 is the gate to release.

---

## Parked / non-blocking

These items are **not** in the v2.0.0 critical path. Tracked here so
they aren't lost.

- `P-VELO` · Velocity-mode behind feature flag (target: v2.2.0).
  Requires a real second backend or proxy mesh first.
- `P-COBBLE` · cobblestone-server interop. Wait for cobblestone
  intake to land in `_github/infra/`.
- `P-PLUGIN-MSG` · Plugin-message channel between paper-side and
  proxy-side gatekeepers (prep for `P-VELO`).
- `P-WEB-UI` · Read-only web dashboard for queue + metrics. Defer
  until operator asks.
- `P-CROWDSEC` · Pluggable IP-blocklist source (CrowdSec API). v2.0.0
  uses static config + Pi-hole hosts file.
- `P-MOJANG-BAN-CHECK` · Honor Mojang's name-changed-but-banned
  blocklist. Niche, defer.

---

## Cross-cutting acceptance: privacy invariant

Every milestone must preserve the v1 privacy invariant: *no
main-world player can observe any pre-auth player's coordinates,
inventory, or chat*.

A dedicated `PrivacyInvariantTest` (introduced in M2) runs on every
PR and must pass for merge. The test enumerates the six scope
boundaries from ARCHITECTURE §4 and asserts no leak in either
direction.

If a milestone would relax any boundary, it MUST be flagged in the PR
description and reviewed against `feedback_audit_then_plan.md`
(audit-then-fix workflow).

---

## Release plan

| Tag | Contents | Target |
|-----|----------|--------|
| v2.0.0-rc1 | M0 + M1 + M2 + M3 | end of week 1 |
| v2.0.0-rc2 | + M4 | end of week 2 |
| v2.0.0 | + M5, 7-day staging soak | end of week 3 |
| v2.1.0 | parked items as operator pulls them in | opportunistic |

All releases tagged on `git.s8n.ru/s8n/auth-limbo` first; GitHub is
push-mirror per `feedback_my_git_is_forgejo.md`.

Operator handles end-of-session push.