diff --git a/AUDIT-2026-05-07.md b/AUDIT-2026-05-07.md
new file mode 100644
index 0000000..37dae9b
--- /dev/null
+++ b/AUDIT-2026-05-07.md
@@ -0,0 +1,260 @@
+# AUDIT — 2026-05-07 — YOU500 void-death on AuthMe restore
+
+Reviewer: Claude (auth-limbo audit pass).
+Scope: Read-only review of `src/main/java/ru/authlimbo/**` against a real
+production incident on `racked.ru` at 2026-05-07 17:13:39 UTC.
+Status: **Audit-only — no code changes applied.** Fixes tracked in
+[ROADMAP.md](ROADMAP.md).
+
+---
+
+## 1. Incident
+
+`YOU500` joined the server, was held in `auth_limbo` (correct), and
+authenticated to AuthMe. The player should then have been teleported back
+to the overworld. Instead they void-died with full inventory loss, and
+Paper rejected our restore teleport.
+
+### Raw log (paper-server.log, trimmed)
+
+```
+17:13:35 YOU500[/45.157.234.219] logged in with entity id 26548
+  at ([auth_limbo]0.5, 128.0, 0.5)
+17:13:38 [AuthMe] YOU500 logged in
+17:13:39 [INFO:DEBUG] Restoring fly speed for LimboPlayer YOU500 to 0.1 (RESTORE_NO_ZERO mode)
+17:13:39 [INFO:DEBUG] Teleporting `YOU500` after login, based on the player auth
+17:13:39 YOU500 left the confines of this world   <-- VOID DEATH
+17:13:39 [AuthLimbo] Restoring YOU500 to world(2380.4, 69.9, -11358.4)
+17:13:39 [AuthLimbo] teleportAsync returned false for YOU500
+  — Paper may have rejected the location.
+```
+
+Note on the death line: in current vanilla language files,
+`left the confines of this world` is the death message for world-border
+damage (`outsideBorder`). A void death reads `fell out of the world`.
+Before relying on F1's `VOID` filter, confirm the actual damage cause and
+check the world border of `world` against z=-11358.4.
+
+Loss: full inventory, full xp. Privacy posture (limbo-on-join) **was not
+breached** — the player was authenticated before the failure. The bug is
+purely in the restore step.
+
+### What happened, in order
+
+Reconstructed from the log above. The causal links in steps 2–5 are the
+H1 hypothesis from §3.
+
+1. AuthMe pre-login fires. `LoginListener.onAsyncPreLogin` reads
+   `authme.db` and schedules `addPluginChunkTicket` on `world` chunk
+   `(2380>>4=148, -11359>>4=-710)`. So far so good.
+2. AuthMe authenticates and runs **its own** broken teleport
+   (`Teleporting YOU500 after login`). This is the AuthMe-fork log line, not
+   ours — AuthMe does a `teleportAsync` of its own with no chunk preload.
+3. AuthMe's teleport partially moves the entity into `world` at the saved
+   coords **before the chunk is actually loaded**. The entity is now at
+   y=69.9 in an unloaded section. Paper's "outside loaded chunk" path
+   triggers and the player falls out of the world. The log line
+   `left the confines of this world` fires.
+4. Our 10-tick delayed callback runs (`LoginListener.doTeleport`,
+   line 133). The player is still online but already dead (on the respawn
+   screen). We log `Restoring …` and call `teleportAsync`.
+5. `teleportAsync` resolves with `false` because Paper rejects the move
+   for a dead/transitioning entity, or because the player is no longer in
+   a state where a `PLUGIN`-cause teleport is accepted.
+6. We log `teleportAsync returned false` and return. The player is left
+   dead from the void death.
+
+The inventory loss is not caused by our code. It is vanilla death behaviour
+with the `keepInventory` gamerule off. We do not snapshot inventories.
+
+---
+
+## 2. Code path trace
+
+| Step | File | Lines | Note |
+|------|------|-------|------|
+| Pre-login chunk pin | `LoginListener.java` | 78–109 | OK — runs ~1s before login completes. |
+| Login event handler | `LoginListener.java` | 113–129 | MONITOR priority, schedules `doTeleport` 10 ticks later. |
+| Saved-location read | `AuthMeDatabase.java` | 68–107 | Read-only, fresh JDBC connection per call. |
+| `doTeleport` | `LoginListener.java` | 133–192 | The hot path. |
+| `getChunkAtAsyncUrgently` | `LoginListener.java` | 165 | Fires; on success calls `teleportAsync`. |
+| `teleportAsync` | `LoginListener.java` | 166 | The call that returned `false`. |
+| Failure branch | `LoginListener.java` | 172–175 | **Logs only.** No retry. No safety relocate. **No void-death guard.** |
+| `exceptionally` branches | `LoginListener.java` | 180–185, 186–191 | Logs only. |
+
+---
+
+## 3. Root-cause hypothesis (ranked)
+
+### H1 — AuthMe's own broken teleport voids the player BEFORE our handler fires *(most likely)*
+
+The AuthMe-fork log line `Teleporting YOU500 after login, based on the
+player auth` at `17:13:39` is from AuthMe-ReReloaded fork b49 itself
+(`PlayerAuth.teleportOnLogin` flow). AuthMe does a teleport with **no chunk
+preload** to the saved coords. In Paper 1.21.11, calling `teleportAsync` to
+a location where the chunk is still not fully *loaded into the player's
+view* (vs. just having a chunk-ticket) can move the entity into a section
+where its block-below check returns null and the entity is treated as
+out-of-world. The `left the confines of this world` line fires immediately
+after, BEFORE our 10-tick delay elapses.
+
+By the time `doTeleport` runs at 17:13:39, the player is already dead /
+respawning. Paper most likely rejects our `teleportAsync` because:
+- `Player.isOnline()` returns true (still connected), so it passes our guard
+- but the entity is mid-respawn / dead, and Paper rejects PLUGIN-cause TPs
+  against entities in that state ⇒ `false`.
+
+This is consistent with all five 17:13:39 log lines and with Paper #4085's
+description of the race.
+
+> Implication: our pre-login chunk-ticket and our delayed teleport are
+> defending the wrong moment. AuthMe-fork's *own* teleport, which runs
+> ~1 tick after `LoginEvent`, is what voids the player. We then arrive
+> too late.
+
+### H2 — The chunk ticket is added on the wrong chunk *(possible secondary)*
+
+`onAsyncPreLogin` adds a ticket on the chunk computed from the saved
+quit-location. But the player's first-time-join behaviour might use a
+different teleport target (AuthMe spawn-on-first-login). For an existing
+player like YOU500 this is unlikely — they have a saved row.
+
+### H3 — `teleport-delay-ticks: 10` is too long *(secondary)*
+
+10 ticks (~500 ms) leaves a window for AuthMe's own broken teleport to
+void-kill. A delay of `0` (run immediately on LoginEvent) **and**
+cancelling AuthMe's teleport would close the gap, but cancelling AuthMe's
+teleport is non-trivial.
+
+### H4 — Target chunk not generated, so Y=69.9 has no ground *(unlikely)*
+
+The world is the main overworld and has been visited (the player logged
+out there). The chunk exists on disk. Y=69.9 is normal terrain height. Not
+the issue.
+
+### H5 — Paper rejected because saved Y=69.9 is below world min height *(no)*
+
+1.21 overworld min-Y is -64. 69.9 is fine.
+
+**Conclusion:** H1 dominates. The fix must (a) defend the player against
+void during the AuthMe-own-teleport window, and (b) recover gracefully
+when our authoritative teleport's future returns `false`.
+
+---
+
+## 4. Proposed fixes
+
+Ordered must-fix → defensive → nice-to-have. Implementation deferred per
+project workflow (audit first, code after sign-off).
+
+### F1 — MUST: void-damage guard while player is in "transit" *(primary fix)*
+
+While a player is in the post-LoginEvent restore window, track their UUID
+in a `Set<UUID> pendingTransit`. On `EntityDamageEvent`, filter by:
+- entity is a `Player` whose UUID is in `pendingTransit`
+- damage cause is `VOID`
+
+→ `event.setCancelled(true)` and immediately teleport the player back to
+limbo spawn (`limboManager.spawn()`) at y=128. Then re-attempt the
+authoritative teleport via `doTeleport` with a backoff.
+
+This single guard would likely have saved YOU500's life and inventory,
+provided the death really was `VOID` damage (see the note in §1).
+
+### F2 — MUST: when `teleportAsync` future returns `false`, recover
+
+Right now the code at `LoginListener.java:172–175` only logs. Replace
+it with:
+
+1. Player still in `pendingTransit`? **Yes** ⇒ teleport to
+   `limboManager.spawn()` synchronously (`player.teleport(...)`, not
+   async, since we need to land *now*).
+2. Schedule a retry of `doTeleport` after 20 ticks with the same saved
+   location.
+3. After N=3 retries, give up: leave the player at limbo spawn, tell them
+   that an admin has to restore them with `/authlimbo tp`, and send an
+   admin alert.
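The F1/F2 bookkeeping does not need the Bukkit API, so it can be kept separate and unit-tested. Below is a minimal sketch under stated assumptions. `TransitTracker` and `Action` are hypothetical names, not existing plugin classes. The damage cause is passed as its `DamageCause` name string so the class compiles without Paper. The 3-retry cap and the 20-tick delay are the values proposed above.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, Bukkit-free model of the F1/F2 bookkeeping. A real listener
// would translate Paper events (EntityDamageEvent, teleportAsync results)
// into calls on this class; none of these names exist in the plugin today.
final class TransitTracker {

    /** What the caller should do after a teleportAsync result. */
    enum Action { DONE, RETRY, GIVE_UP }

    static final int MAX_RETRIES = 3;                          // F2: give up after 3 retries
    static final long RETRY_DELAY_TICKS = 20L;                 // F2: retry doTeleport after 20 ticks
    static final Set<String> GUARDED_CAUSES = Set.of("VOID");  // F1 guards VOID only

    // UUIDs between LoginEvent and a confirmed restore (or final give-up).
    private final Set<UUID> pendingTransit = ConcurrentHashMap.newKeySet();
    // Retries already consumed per UUID.
    private final Map<UUID, Integer> retries = new ConcurrentHashMap<>();

    /** LoginEvent handler: the player enters the transit window. */
    void begin(UUID id) {
        retries.remove(id);
        pendingTransit.add(id);
    }

    boolean inTransit(UUID id) {
        return pendingTransit.contains(id);
    }

    /** F1: should this damage be cancelled? causeName is e.g. "VOID". */
    boolean shouldCancelDamage(UUID id, String causeName) {
        return inTransit(id) && GUARDED_CAUSES.contains(causeName);
    }

    /** F2: decide what to do after teleportAsync completes. */
    Action onTeleportResult(UUID id, boolean success) {
        if (!inTransit(id)) {
            return Action.DONE; // already restored or already given up
        }
        if (success) {
            finish(id);
            return Action.DONE;
        }
        int used = retries.merge(id, 1, Integer::sum);
        if (used > MAX_RETRIES) {
            finish(id);
            return Action.GIVE_UP; // hand over to the F6 fallback
        }
        // Caller: sync teleport to limbo spawn, then reschedule doTeleport
        // RETRY_DELAY_TICKS later with the same saved location.
        return Action.RETRY;
    }

    private void finish(UUID id) {
        pendingTransit.remove(id);
        retries.remove(id);
    }
}
```

Keeping the decision logic out of the event handlers means the retry cap and the "give up → F6" transition can be checked without a running server. The handlers then only translate `EntityDamageEvent` and the `teleportAsync` future result into these calls.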
+
+### F3 — MUST: preload the 3x3 chunk neighbourhood before calling teleportAsync
+
+Today we call `getChunkAtAsyncUrgently` then chain teleport. The chain
+*should* mean the chunk is loaded — but `getChunkAtAsyncUrgently` returns
+the `Chunk` object as soon as it's loaded server-side, not necessarily
+"ready for entity placement" with all neighbouring chunks paged in.
+Force the surrounding 3x3 chunks loaded via additional
+`addPluginChunkTicket` calls on the neighbours before teleporting.
+Chain the teleport on the `World#getChunkAtAsync` futures for all nine
+chunks instead of calling `.get()` on them. Paper completes those futures
+on the main thread, so a blocking `.get()` there stalls the server.
+
+### F4 — SHOULD: cancel or pre-empt AuthMe's own teleport
+
+AuthMe-ReReloaded fork b49 fires the broken teleport itself. Two options:
+- **(a)** listen to `LoginEvent` at `LOWEST` priority too, immediately
+  teleport the player to limbo spawn (overriding any in-flight position),
+  then on `MONITOR` do our authoritative TP. Net: AuthMe's teleport is
+  effectively a no-op because we beat it back to limbo and run last.
+  Caveat: event priority only orders listeners of `LoginEvent` itself. If
+  AuthMe issues its teleport after the event has been dispatched (H1
+  estimates ~1 tick later), the `LOWEST` handler runs too early to undo
+  it, and F1 remains the actual safety net.
+- **(b)** `teleport-delay-ticks: 0` + a `PlayerTeleportEvent` listener
+  that cancels any `TeleportCause.PLUGIN` teleport issued by AuthMe while
+  the UUID is in `pendingTransit`. `PlayerTeleportEvent` does not expose
+  which plugin requested the move, so "issued by AuthMe" has to be
+  inferred. Options are matching the destination against the saved
+  AuthMe location, or cancelling every `PLUGIN` teleport that is not our
+  own.
+
+(a) is simpler and contained inside our plugin.
+
+### F5 — SHOULD: inventory snapshot on AuthMeAsyncPreLoginEvent
+
+Before AuthMe authenticates, snapshot the player's inventory + xp +
+location into an in-memory `Map` and persist to
+`plugins/AuthLimbo/snapshots/.nbt` (or a SQLite table). On
+`PlayerDeathEvent` while the UUID is in `pendingTransit`, restore the
+inventory from the snapshot via a `keepInventory`-style override (cancel
+drops, restore on respawn). Discard the snapshot 30 s after a successful TP.
+
+This is belt-and-braces defence: even if all chunk logic fails, no
+inventory is lost on an auth-flow death.
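Here is a small, Bukkit-free sketch of the F3 neighbourhood math. `ChunkNeighbourhood` and `ChunkPos` are hypothetical names. The sketch floors block coordinates before the `>> 4` shift, which is what `Location#getBlockX()` does. That matters on the negative side: x=-0.5 lies in chunk -1, while truncating to `int` first would give chunk 0. For the incident's location it yields the same centre chunk as §1, `(148, -710)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for F3: the 3x3 chunk neighbourhood around a location.
final class ChunkNeighbourhood {

    /** Chunk coordinate pair (plain class instead of a record for older JDKs). */
    static final class ChunkPos {
        final int x;
        final int z;

        ChunkPos(int x, int z) { this.x = x; this.z = z; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof ChunkPos)) return false;
            ChunkPos p = (ChunkPos) o;
            return p.x == x && p.z == z;
        }

        @Override public int hashCode() { return 31 * x + z; }

        @Override public String toString() { return "(" + x + ", " + z + ")"; }
    }

    /** Block coordinate to chunk coordinate: floor first, then divide by 16. */
    static int toChunk(double blockCoord) {
        return ((int) Math.floor(blockCoord)) >> 4;
    }

    /** Centre chunk first, followed by its eight neighbours. */
    static List<ChunkPos> around(double x, double z) {
        int cx = toChunk(x);
        int cz = toChunk(z);
        List<ChunkPos> out = new ArrayList<>(9);
        out.add(new ChunkPos(cx, cz));
        for (int dx = -1; dx <= 1; dx++) {
            for (int dz = -1; dz <= 1; dz++) {
                if (dx != 0 || dz != 0) {
                    out.add(new ChunkPos(cx + dx, cz + dz));
                }
            }
        }
        return out;
    }
}
```

In the plugin, each returned position would get a `World#addPluginChunkTicket(x, z, plugin)` call. The teleport would then be chained on `CompletableFuture.allOf(...)` over the nine `World#getChunkAtAsync(x, z)` futures, so nothing blocks the main thread.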
+
+### F6 — NICE: spectator-mode fallback
+
+If F1–F4 all fail and the player is still in void state after N retries,
+set `GameMode.SPECTATOR`, teleport to overworld spawn (server world's
+default spawn), and send admins a Discord/console alert: "AuthLimbo could
+not restore YOU500 — manual `/authlimbo tp YOU500` required". Spectator
+mode prevents further damage and lets the player observe the world while
+an admin acts.
+
+### F7 — NICE: telemetry
+
+Bump a counter on each restore outcome (success/fail/retry) and expose the
+counters via `/authlimbo stats` for ops visibility.
+
+---
+
+## 5. Test plan
+
+Reproducible in a dev Paper 1.21.11 server with AuthMe-ReReloaded:
+
+1. **Unloaded-chunk void.** Set saved coord to (10000, 70, 10000) in
+   `authme.db` for a test account. Restart server (chunks unload). Login.
+   Expect: void-damage guard cancels VOID damage, player lands at saved
+   coords or at limbo spawn for retry.
+2. **Invalid Y (above build limit).** Set saved Y to 5000. Login. Expect:
+   `teleportAsync` returns false, recovery branch teleports to limbo
+   spawn, retry escalation works. Verify the premise first: vanilla does
+   not treat Y above the build limit as an invalid position, so the
+   teleport may succeed and the player simply fall. If so, cancel the
+   `PlayerTeleportEvent` from a test listener to force `false`.
+3. **World no longer loaded.** Set saved world to a string that no longer
+   exists (e.g. `world_old`). Login. Expect: graceful fallback to
+   overworld spawn, admin notified.
+4. **Death during transit.** Apply `DamageCause.VOID` damage via a debug
+   command while the player is mid-restore. Expect: damage cancelled,
+   player relocated to limbo spawn, restore retried.
+5. **Snapshot/restore on death.** With F5 implemented, kill the player
+   during transit. Expect: respawn with full inventory + xp.
+6. **AuthMe pre-empt.** With F4(a), watch logs — AuthMe's teleport line
+   fires but the position is immediately overwritten by our limbo TP at
+   LOWEST, then by our authoritative TP at MONITOR.
+
+All tests run on a clean dev server, not racked.ru production.
+
+---
+
+## 6. Privacy posture — unchanged
+
+None of the proposed fixes weaken the limbo-on-join privacy property:
+- F1 keeps the player in limbo *longer* on damage and never exposes them
+  to the overworld pre-auth.
+- F4(a) actually *strengthens* it by guaranteeing the limbo position is
+  reasserted in the `LOWEST`-priority `LoginEvent` handler.
+- F5 stores inventory snapshots locally (server-side, plugin folder) —
+  no new network exposure.
+
+---
+
+## 7. Sign-off
+
+Audit author: Claude (auth-limbo plugin audit pass)
+Date: 2026-05-07
+Recommended next action: review ROADMAP.md, approve F1+F2 for first
+implementation pass, F3+F4 for second pass.
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..c2fd4c0
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,127 @@
+# ROADMAP — AuthLimbo
+
+Tracked work items for the plugin. Format: priority, ID, title, status,
+acceptance criteria. Source-of-truth for what needs to ship next.
+
+Status legend:
+- `OPEN` — not started
+- `WIP` — in progress on a branch
+- `BLOCKED` — waiting on upstream / external
+- `DONE` — landed on `main`, in a tagged release
+
+---
+
+## P0 — must-fix (data-loss bugs)
+
+### F1 · OPEN · Void-damage guard during post-login restore
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F1. Triggered by
+YOU500 incident on 2026-05-07 — full inventory loss to void on login.
+
+Acceptance:
+- New `Set<UUID> pendingTransit` in `LoginListener`.
+- UUID added on `LoginEvent`, removed on TP success or final retry give-up.
+- `EntityDamageEvent` listener at `EventPriority.HIGHEST`: if
+  `entity instanceof Player`, UUID in `pendingTransit`, cause is `VOID` →
+  `setCancelled(true)` and `player.teleport(limboManager.spawn())`
+  *synchronously* (we need to land before the next tick voids them again).
+- Covered by test plan §5.1 and §5.4 in AUDIT-2026-05-07.md.
+
+### F2 · OPEN · Recovery when teleportAsync returns false
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F2.
+
+Today: `LoginListener.java:172–175` only logs. After fix:
+- On `success == false`: synchronously TP to limbo spawn, schedule one
+  retry of `doTeleport` after 20 ticks.
+- Track retry count per UUID (max 3). Once all 3 retries have failed,
+  drop into the F6 path.
+- Also wire the `exceptionally` branches (lines 180–185, 186–191) into
+  the same recovery.
+
+Acceptance: test plan §5.2 passes — invalid coords trigger recovery, no
+void death, admin sees logged retries.
+
+### F4 · OPEN · Pre-empt AuthMe's own broken teleport
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F4. Implements
+option (a): add a second `LoginEvent` handler at `EventPriority.LOWEST`
+that immediately teleports the player back to limbo spawn. AuthMe's
+internal teleport then runs against an irrelevant location, and our
+MONITOR handler wins last.
+
+Depends on: F1 (so void damage during the LOWEST→MONITOR window is
+guarded).
+
+Acceptance: in test plan §5.6, log shows AuthMe's TP line followed by no
+`left the confines` event before our authoritative TP fires.
+
+---
+
+## P1 — defensive (failure modes we know about)
+
+### F3 · OPEN · Pre-flight 3x3 chunk preload before teleportAsync
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F3.
+
+In `doTeleport` (LoginListener.java:133), before calling
+`getChunkAtAsyncUrgently` on the centre chunk, also `addPluginChunkTicket`
+on the eight neighbours. Release all nine tickets via the existing
+`scheduleTicketRelease` path (extend it to take a list).
+
+Acceptance: test plan §5.1 passes even when the saved location sits at a
+chunk edge, where the "loaded but not ready" state is suspected.
+
+### F5 · OPEN · Inventory snapshot + auto-restore on transit death
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F5. Defence in
+depth — even if F1–F4 all fail, no inventory is lost.
+
+- On `AuthMeAsyncPreLoginEvent` (in addition to chunk pin), snapshot
+  player inventory + xp + location into `Map`.
+- Optionally persist to `plugins/AuthLimbo/snapshots/.nbt` for
+  crash-survivability.
+- On `PlayerDeathEvent` while UUID in `pendingTransit`:
+  `event.getDrops().clear()`, `event.setKeepInventory(true)`,
+  `event.setKeepLevel(true)`. On `PlayerRespawnEvent`, restore from
+  snapshot if needed and TP to limbo spawn for re-restore.
+- Snapshot discarded 30 s after successful TP.
+
+Acceptance: test plan §5.5 — induced death during transit yields full
+inventory on respawn.
+
+---
+
+## P2 — nice-to-have
+
+### F6 · OPEN · Spectator-mode admin-alert fallback
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F6. After 3 failed
+retries (F2): set spectator, TP to overworld default spawn, write a
+console log entry, plus an optional Discord webhook config key. Player
+gets a message to ping staff for manual `/authlimbo tp`.
+
+### F7 · OPEN · Telemetry counters + `/authlimbo stats`
+
+Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F7.
+
+Track per-session counters: restore_success, restore_retry,
+restore_fail, void_damage_blocked, snapshot_restored. Expose via a new
+`/authlimbo stats` subcommand. Reset on plugin reload.
+
+---
+
+## Done
+
+(Nothing landed since v1.0.0 release on 2026-04-30. First post-1.0
+release will be triggered by F1+F2+F4 landing as v1.1.0.)
+
+---
+
+## Release plan
+
+- **v1.1.0** — F1, F2, F4 (data-loss fix). Target: ASAP.
+- **v1.2.0** — F3, F5 (defence in depth). Target: within 2 weeks.
+- **v1.3.0** — F6, F7 (ops UX). Target: opportunistic.
+
+Privacy invariant (limbo-on-join, no overworld exposure pre-auth) must
+hold across every release. See `AUDIT-2026-05-07.md` §6.
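For F7 (listed under P2 above), the per-session counters could be held as follows. This is a sketch assuming a hypothetical `RestoreStats` class. It uses one `LongAdder` per counter because increments may arrive from async callbacks. The `EnumMap` is filled once in the constructor and never structurally modified afterwards, so concurrent reads of it are safe. Counter names render in the lowercase form used above.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical F7 counters backing a `/authlimbo stats` reply.
final class RestoreStats {

    enum Counter {
        RESTORE_SUCCESS, RESTORE_RETRY, RESTORE_FAIL, VOID_DAMAGE_BLOCKED, SNAPSHOT_RESTORED
    }

    // Populated once; only the LongAdder values change afterwards.
    private final Map<Counter, LongAdder> counters = new EnumMap<>(Counter.class);

    RestoreStats() {
        for (Counter c : Counter.values()) {
            counters.put(c, new LongAdder());
        }
    }

    void bump(Counter c) {
        counters.get(c).increment();
    }

    long get(Counter c) {
        return counters.get(c).sum();
    }

    /** Per-session by design: call on plugin reload. */
    void reset() {
        counters.values().forEach(LongAdder::reset);
    }

    /** One-line summary in declaration order, e.g. for the stats subcommand. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Counter c : Counter.values()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(c.name().toLowerCase(Locale.ROOT)).append('=').append(get(c));
        }
        return sb.toString();
    }
}
```

After two successful restores and one retry, `render()` yields `restore_success=2, restore_retry=1, restore_fail=0, void_damage_blocked=0, snapshot_restored=0`.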