# AuthLimbo v2 — Roadmap (M0-M5)
Companion to [`V2-ARCHITECTURE.md`](V2-ARCHITECTURE.md). Tracks the
v2.0.0 implementation as ordered milestones with explicit acceptance
criteria, dependencies, and parking lots for non-blocking work.

Status legend: `OPEN`, `WIP`, `BLOCKED`, `DONE`.

Owner: Claude Code agents under operator review.

Branching: every milestone lands on a feature branch
`v2/M{N}-<slug>` and merges into `v2-main` after acceptance. `v2-main`
becomes `main` at v2.0.0 release.

Prerequisite: v1.1.0 (F1 + F2 + F4) is on `main` and tagged.
v2 work begins on a fresh `v2-main` branch.

---
## M0 · Foundations · OPEN
**Goal:** Land the v2 skeleton so all later milestones plug into a
shared backbone. No behaviour changes for end-users.
### Deliverables
- New Maven module `core` for the gatekeeper/restore split (Velocity-ready
seam). Existing `ru.authlimbo` package becomes `ru.authlimbo.paper`.
- `State` enum + `StateMachine` class (`CONNECT → GATE → SNAPSHOT
→ LIMBO → PRELOAD → RESTORE → LIVE | REJECTED | SPECTATOR_FAIL`)
with persistence to `plugins/AuthLimbo/state/<uuid>.json`.
- `AuditLog` writer (JSON-Lines append-only, logrotate-compatible).
- `MetricsRegistry` skeleton (counters, gauges, histograms — no HTTP
server yet, just in-memory accounting).
- Config-v2 schema + automatic v1→v2 migration with backup.
- Build: Maven multi-module, sqlite-jdbc still shaded, Adventure API
brought in via Paper API (no extra shade).
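
The transition-validity check that the unit tests target could be sketched as follows. The source only lists the states, so the exact edges are assumptions: the pipeline is treated as strictly linear, `REJECTED` is assumed reachable only from `GATE`, and `SPECTATOR_FAIL` only from `RESTORE`. The class and method names are hypothetical.

```java
// Sketch only: the edge set below is an assumption, not the shipped design.
enum State {
    CONNECT, GATE, SNAPSHOT, LIMBO, PRELOAD, RESTORE, LIVE, REJECTED, SPECTATOR_FAIL
}

final class StateMachineSketch {
    private StateMachineSketch() {}

    /** Returns true when moving from {@code from} to {@code to} is a legal step. */
    static boolean canTransition(State from, State to) {
        return switch (from) {
            case CONNECT -> to == State.GATE;
            case GATE -> to == State.SNAPSHOT || to == State.REJECTED; // assumed reject point
            case SNAPSHOT -> to == State.LIMBO;
            case LIMBO -> to == State.PRELOAD;
            case PRELOAD -> to == State.RESTORE;
            case RESTORE -> to == State.LIVE || to == State.SPECTATOR_FAIL; // assumed failure exit
            case LIVE, REJECTED, SPECTATOR_FAIL -> false; // terminal states
        };
    }
}
```

An exhaustive `switch` makes the compiler flag any state added later without a transition rule, which suits a table that tests are meant to pin down.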
### Acceptance
1. Plugin loads on Paper 1.21.11 with v1 config; v1→v2 migration runs
exactly once and writes `config.v1.bak`.
2. `/authlimbo state <player>` shows current state for any in-flight
player.
3. `audit.log` is created and rotates at 100MB (verified by manual
100MB-noise injection).
4. All v1.1.0 behaviour is preserved (F1, F2, F4 still work
end-to-end on a stub-AuthMe test server).
5. Unit tests for state-machine transition validity pass in CI.
### Dependencies
None. M0 is the foundation.

---
## M1 · Snapshot subsystem · OPEN
**Goal:** Make inventory loss impossible regardless of any chunk /
teleport / damage bug downstream.
### Deliverables
- On `AuthMeAsyncPreLoginEvent`: copy `world/playerdata/<uuid>.dat`
to `plugins/AuthLimbo/snapshots/<uuid>-<timestamp>.nbt`, log
SHA-256.
- On `PlayerDeathEvent` while UUID is in `pendingTransit`:
`keepInventory=true`, drops cleared, SEVERE logged, Discord webhook
fired, schedule restore-from-snapshot on respawn.
- New command `/authlimbo restore <player> [--snapshot=<file>]` that
rolls back to a snapshot (uses bundled nbtlib equivalent or an
embedded reader).
- Snapshot retention GC: 7-day default, configurable, runs hourly.
- Metric: `authlimbo_snapshot_restored_total`.
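
The copy-and-hash step can be illustrated with plain `java.nio` and `MessageDigest`. This is a sketch under assumptions: the class name, method names, and the millisecond timestamp format are hypothetical, and the file is treated as opaque bytes (no NBT parsing).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.UUID;

final class SnapshotSketch {
    private SnapshotSketch() {}

    /** Copies <uuid>.dat into the snapshot directory as <uuid>-<timestamp>.nbt. */
    static Path snapshot(Path playerDataDir, Path snapshotDir, UUID uuid, long timestampMillis)
            throws IOException {
        Path source = playerDataDir.resolve(uuid + ".dat");
        Files.createDirectories(snapshotDir);
        Path target = snapshotDir.resolve(uuid + "-" + timestampMillis + ".nbt");
        // Fails if the target already exists, so a snapshot is never silently overwritten.
        return Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
    }

    /** Returns the lowercase hex SHA-256 of the file, for logging on create and read-back. */
    static String sha256Hex(Path file) throws IOException {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(Files.readAllBytes(file)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }
}
```

Comparing the hash logged at creation with the hash computed at read-back is what lets a restore refuse a truncated or tampered snapshot.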
### Acceptance
1. Forced-void-death during transit (test-harness `/limbo void <player>`):
player respawns with full inventory + xp.
2. Snapshot files appear in `snapshots/`, SHA-256 logged on creation
and on read-back.
3. GC removes >7-day snapshots; verified by setting retention=10s in
test config.
4. `/authlimbo restore <player>` after a successful login restores
the pre-login inventory and writes an audit-log entry.
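
Retention GC, as exercised by the `retention=10s` test, might look like the sketch below. It assumes that snapshot age is judged by file modification time and that only `*.nbt` files are eligible; both are assumptions, as are the names. The current time is passed in so a test can control it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class SnapshotGcSketch {
    private SnapshotGcSketch() {}

    /** Deletes snapshots last modified before (now - retention); returns what was removed. */
    static List<Path> collect(Path snapshotDir, Duration retention, Instant now) throws IOException {
        Instant cutoff = now.minus(retention);
        List<Path> candidates;
        // List first, delete afterwards, so the directory is not mutated while being walked.
        try (Stream<Path> files = Files.list(snapshotDir)) {
            candidates = files
                    .filter(p -> p.getFileName().toString().endsWith(".nbt"))
                    .toList();
        }
        List<Path> deleted = new ArrayList<>();
        for (Path file : candidates) {
            if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                Files.delete(file);
                deleted.add(file);
            }
        }
        return deleted;
    }
}
```

An hourly scheduler would call `collect(dir, Duration.ofDays(7), Instant.now())` off the main server thread, since it performs blocking file I/O.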
### Dependencies
M0 (audit log + state machine).

---
## M2 · Privacy-isolation hardening · OPEN
**Goal:** Tighten the limbo-world isolation surface — no leaks of
chat, tablist, or join messages between limbo and main world. Make
the privacy invariant testable.
### Deliverables
- Chat listener (HIGHEST) on Paper's `AsyncChatEvent` (Bukkit's
synchronous `PlayerChatEvent` is deprecated): drop limbo-world recipients
from main-world chat, and main-world recipients from limbo chat.
- Tablist scoping via `Player#hidePlayer(plugin, target)`:
  - limbo players hidden from main-world tablist;
  - main-world players hidden from limbo tablist;
  - limbo players hidden from each other.
- Join-message shifting: suppress vanilla join message on initial
connect; fire delayed join message at state-machine [LIVE]
transition.
- Per-player view-distance forced to 2 in limbo
(`Player#setViewDistance(2)` on limbo entry, restore on exit).
- Limbo BARRIER ceiling at y=129 added to `LimboWorldManager`.
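
The chat and tablist rules above reduce to two small predicates, sketched here in plain Java so they can be unit-tested without a server. The `Zone` enum and both method names are hypothetical; a real listener would map each `Player` to a zone and then call `hidePlayer` or filter recipients accordingly.

```java
// Sketch only: models the isolation rules, not the Bukkit wiring.
enum Zone { LIMBO, MAIN }

final class IsolationSketch {
    private IsolationSketch() {}

    /**
     * Tablist rule: limbo players are hidden from everyone (including each other),
     * and limbo viewers see no main-world players. Only MAIN viewing MAIN is visible.
     */
    static boolean canSeeInTablist(Zone viewer, Zone target) {
        return viewer == Zone.MAIN && target == Zone.MAIN;
    }

    /** Chat rule: a message is delivered only when sender and recipient share a zone. */
    static boolean deliversChat(Zone sender, Zone recipient) {
        return sender == recipient;
    }
}
```

Keeping the rules in pure functions gives a privacy test one place to assert against, independent of event priorities or plugin load order.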
### Acceptance
1. With two test accounts (`alice` in main world, `bob` connecting
to limbo): `alice` does not see `bob` in tablist before `bob`
completes login. After login, `alice` sees `bob`'s join message
exactly once.
2. `bob` in limbo cannot see chat from `alice`. Verified via
integration test.
3. `bob` cannot fly out of limbo via creative/elytra (server starts
bob in survival; barrier ceiling prevents y>129).
4. Privacy invariant test (`PrivacyInvariantTest`) covers all six
scope boundaries (chat in/out, tablist in/out, join-msg before/after).
### Dependencies
M0.

---
## M3 · Restore reliability (3x3 preload + chunk-ready verification) · OPEN
**Goal:** Make the restore-teleport bullet-proof against the
"loaded-but-neighbour-unloaded" race that v1's F3 was designed for,
plus the silent-failure case where the `teleportAsync` future completes
with `true` but the player is still at the old position.
### Deliverables
- 3x3 chunk preload around target (`addPluginChunkTicket` x9 +
`CompletableFuture.allOf(getChunkAtAsyncUrgently x9)`).
- Post-TP verification: 5 ticks after the `teleportAsync` future
completes with `true`, check
`player.getLocation().distance(saved) < 2.0`. If not, treat as silent
fail and retry.
- The F2-style retry loop from v1.1 is carried over, with v2 metrics
and audit-log integration.
- Drop the SPECTATOR pre-TP trick (v1's F8 redesign): rely on the
snapshot + damage-guard layers instead.
- Metric: `authlimbo_restore_duration_seconds` histogram.
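
The geometry behind the 3x3 preload and the post-teleport check is small enough to show in plain Java. This is a sketch with hypothetical names (`ChunkPos`, `Vec3`, `preloadArea`, `landed`); the real code would feed these chunk coordinates to `addPluginChunkTicket` and `getChunkAtAsyncUrgently`.

```java
import java.util.ArrayList;
import java.util.List;

final class RestoreSketch {
    private RestoreSketch() {}

    record ChunkPos(int x, int z) {}

    record Vec3(double x, double y, double z) {
        double distance(Vec3 other) {
            double dx = x - other.x, dy = y - other.y, dz = z - other.z;
            return Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
    }

    /** The 9 chunks centred on the chunk that contains the given block coordinates. */
    static List<ChunkPos> preloadArea(int blockX, int blockZ) {
        // Arithmetic shift floors toward negative infinity, so negative coordinates map correctly.
        int centreX = blockX >> 4;
        int centreZ = blockZ >> 4;
        List<ChunkPos> area = new ArrayList<>(9);
        for (int dx = -1; dx <= 1; dx++) {
            for (int dz = -1; dz <= 1; dz++) {
                area.add(new ChunkPos(centreX + dx, centreZ + dz));
            }
        }
        return area;
    }

    /** Post-TP verification: the player counts as landed when within 2 blocks of the target. */
    static boolean landed(Vec3 actual, Vec3 saved) {
        return actual.distance(saved) < 2.0;
    }
}
```

For a target at block (16, 70, 16) the centre chunk is (1, 1), so the preload covers chunks (0, 0) through (2, 2), which includes every neighbour the player could straddle on arrival.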
### Acceptance
1. AUDIT-2026-05-07 §5.1 (unloaded-chunk void) no longer reproduces:
no void-death and no inventory loss. Player lands at saved coords.
2. AUDIT-2026-05-07 §5.2 (invalid Y) escalates to
`SPECTATOR_FAIL` after 3 retries with audit-log + webhook.
3. New scenario: target on a chunk boundary
(e.g. block (16, 70, 16), the corner of chunk (1, 1)). The 3x3
preload makes this work first try.
4. Histogram p99 restore duration < 2.5s under normal load (no bot
flood).
### Dependencies
M0, M1 (snapshot is the safety net while M3 retry-loops).

---
## M4 · Gatekeeper + queue + observability · OPEN
**Goal:** Bring the queue, trust tiers, metrics endpoint, and
Discord webhook online. After M4 the operator has full visibility
without needing to grep logs.
### Deliverables
- Gatekeeper interface (`Gatekeeper.accept(connection) → Decision`)
with Paper-side implementation. Decision: `accept`, `queue`,
`reject`.
- Trust-tier resolver: reads LP permissions for `staff`,
AuthMe-DB last-seen for `returning` vs `new`, IP-block list for
`flagged`. Cacheable.
- Bounded queue ordered by tier priority, then FIFO by connect-time
within a tier. Configurable `max-concurrent-auth`, `max-queue-depth`,
`queue-timeout-seconds`.
- BossBar UI in limbo: shows tier + position + ETA. Updates every
second.
- `/queue` command in-chat re-displays state.
- Prometheus HTTP server bound to `127.0.0.1:9091` (loopback only).
- Discord webhook config + plumbing for the alert categories from
ARCHITECTURE §7.
- `/authlimbo queue policy` command — prints the tier policy
in-game so players can self-verify they're not in a hidden tier.
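
One way to read the bounded-queue deliverable is a priority queue keyed on tier and then on arrival order, with a hard cap on depth. The sketch below is illustrative only: the names are hypothetical, `RETURNING` is assumed to outrank `NEW`, and staff bypass and `flagged` rejection are assumed to be decided by the gatekeeper before anything reaches the queue.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.PriorityQueue;

final class AuthQueueSketch {
    /** Declaration order is the assumed priority order: earlier constants are served first. */
    enum Tier { RETURNING, NEW }

    private record Entry(String player, Tier tier, long connectSeq) {}

    private final int maxDepth;
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            Comparator.comparing(Entry::tier).thenComparingLong(Entry::connectSeq));
    private long nextSeq; // monotonically increasing arrival counter gives FIFO within a tier

    AuthQueueSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Returns false when the queue is already at max depth (the caller should reject). */
    synchronized boolean offer(String player, Tier tier) {
        if (queue.size() >= maxDepth) {
            return false;
        }
        queue.add(new Entry(player, tier, nextSeq++));
        return true;
    }

    /** Removes and returns the next player to authenticate, if any. */
    synchronized Optional<String> poll() {
        Entry next = queue.poll();
        return next == null ? Optional.empty() : Optional.of(next.player());
    }

    synchronized int depth() {
        return queue.size();
    }
}
```

Because `offer` refuses entries at the cap rather than evicting, the depth gauge can never exceed `max-queue-depth`, which is the property the stress test checks.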
### Acceptance
1. Stress test: 1000 simulated connections in 60s.
`authlimbo_queue_depth` peaks at `max-queue-depth`, never higher.
No `pendingTransit` leak (returns to 0 within 30s of flood end).
2. Staff bypass: a player with `authlimbo.queue.priority.staff`
skips even a full queue. Audit log records the bypass.
3. A connection from an IP on the Pi-hole-style blocklist is dropped
at the gatekeeper and never enters limbo.
`authlimbo_connections_total{outcome="rejected"}` increments.
4. Prometheus scrape of `127.0.0.1:9091/metrics` returns OpenMetrics
format with all metrics from ARCHITECTURE §7.
5. `/authlimbo queue policy` output matches ARCHITECTURE §3 tier table
verbatim (rendered from a single source-of-truth string).
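
A loopback-only metrics endpoint can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer`. This is an assumption-laden illustration, not the planned implementation: the roadmap does not say which HTTP server or metrics library is used, the class name is hypothetical, and the body is supplied by the caller as already-rendered text.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

final class MetricsEndpointSketch {
    private MetricsEndpointSketch() {}

    /** Starts an HTTP server on 127.0.0.1 only; port 0 asks the OS for a free port. */
    static HttpServer start(int port, Supplier<String> body) throws IOException {
        // Binding to the loopback address (not 0.0.0.0) keeps the endpoint off external interfaces.
        InetSocketAddress address = new InetSocketAddress(InetAddress.getByName("127.0.0.1"), port);
        HttpServer server = HttpServer.create(address, 0);
        server.createContext("/metrics", exchange -> {
            byte[] payload = body.get().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set(
                    "Content-Type", "application/openmetrics-text; version=1.0.0; charset=utf-8");
            exchange.sendResponseHeaders(200, payload.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(payload);
            }
        });
        server.start();
        return server;
    }
}
```

In production the call would be `start(9091, registry::render)` with a hypothetical `render` method; OpenMetrics bodies end with a `# EOF` line, which the supplier is responsible for emitting.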
### Dependencies
M0 (state machine + audit log), M3 (so legitimate logins still
flow correctly through the new gatekeeper layer).

---
## M5 · Hardening, drama-avoidance lock-in, release · OPEN
**Goal:** Lock in the anti-drama policy so it can't drift. Ship v2.0.0.
### Deliverables
- Anti-drama policy constants in code (not config) — paid-tier and
hidden-tier escape hatches do not exist as configurable knobs.
Adding one would require a code change + AGPL fork.
- Reload-without-restart (`/authlimbo reload`) with in-flight transit
drain (max 30s wait).
- Fail-closed implementation for AuthMe DB unreachable case (kick
with operator-friendly message + webhook).
- Server-shutdown drain hook: clear transit, save snapshots, kick
limbo players with "server restarting" message.
- Chaos-test suite: kill-plugin-mid-login, kill-container, AuthMe-DB
network-drop. All recoverable.
- Documentation: `V2-ARCHITECTURE.md` (this milestone's companion),
`V2-RELEASE.md` migration guide for operators, updated
`compatibility.md` and `installation.md`.
- Tag v2.0.0, push to git.s8n.ru/s8n/auth-limbo, GitHub
push-mirror, attach jar to release.
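
The "drain with a maximum wait" behaviour shared by reload and shutdown can be sketched as a bounded polling loop. The names are hypothetical, and the in-flight count is abstracted as an `IntSupplier` (for example, the size of `pendingTransit`). The loop blocks, so it is assumed to run off the main server thread.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

final class DrainSketch {
    private DrainSketch() {}

    /**
     * Waits until the in-flight count reaches zero or maxWait elapses.
     * Returns true when fully drained, false when the deadline was hit first.
     */
    static boolean awaitDrain(IntSupplier inFlight, Duration maxWait, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (inFlight.getAsInt() > 0) {
            // Subtraction-based comparison stays correct even if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }
}
```

A reload handler might call `awaitDrain(pending::size, Duration.ofSeconds(30), Duration.ofMillis(250))` and, on `false`, fall back to the snapshot-backed recovery path rather than proceeding silently.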
### Acceptance
1. Plugin reload during a live transit completes the in-flight
restore correctly, no inventory loss.
2. Killing the plugin (`/plugman unload`) during [LIMBO] state and
restarting the server: rejoining player is restored from state +
snapshot.
3. AuthMe DB hard-down: connection rejected at gatekeeper, never
reaches main world. Operator gets webhook within 30s.
4. CHANGELOG documents every breaking change, every renamed
permission node, every config schema change.
5. v2.0.0 jar runs end-to-end on the racked.ru staging container
(parallel to v1 prod) for 7 days with zero void-deaths and zero
inventory losses.
### Dependencies
M0-M4. M5 is the gate to release.

---
## Parked / non-blocking
These items are **not** in the v2.0.0 critical path. Tracked here so
they aren't lost.
- `P-VELO` · Velocity-mode behind feature flag (target: v2.2.0).
Requires a real second backend or proxy mesh first.
- `P-COBBLE` · cobblestone-server interop. Wait for cobblestone
intake to land in `_github/infra/`.
- `P-PLUGIN-MSG` · Plugin-message channel between paper-side and
proxy-side gatekeepers (prep for `P-VELO`).
- `P-WEB-UI` · Read-only web dashboard for queue + metrics. Defer
until operator asks.
- `P-CROWDSEC` · Pluggable IP-blocklist source (CrowdSec API). v2.0.0
uses static config + Pi-hole hosts file.
- `P-MOJANG-BAN-CHECK` · Honor Mojang's name-changed-but-banned
blocklist. Niche, defer.
---
## Cross-cutting acceptance: privacy invariant
Every milestone must preserve the v1 privacy invariant: *no
main-world player can observe any pre-auth player's coordinates,
inventory, or chat*.

A dedicated `PrivacyInvariantTest` (introduced in M2) runs on every
PR and must pass for merge. The test enumerates the six scope
boundaries from ARCHITECTURE §4 and asserts no leak in either
direction.

If a milestone would relax any boundary, it MUST be flagged in the PR
description and reviewed against `feedback_audit_then_plan.md`
(audit-then-fix workflow).

---
## Release plan
| Tag | Contents | Target |
|-----|----------|--------|
| v2.0.0-rc1 | M0 + M1 + M2 + M3 | end of week 1 |
| v2.0.0-rc2 | + M4 | end of week 2 |
| v2.0.0 | + M5, 7-day staging soak | end of week 3 |
| v2.1.0 | parked items as operator pulls them in | opportunistic |

All releases are tagged on `git.s8n.ru/s8n/auth-limbo` first; GitHub is
a push-mirror per `feedback_my_git_is_forgejo.md`.
The operator handles the end-of-session push.