docs: incident audit + roadmap for 2026-05-07 void-death
Some checks failed
Build / build (push) Has been cancelled
Player YOU500 lost full inventory at 17:13:39 due to AuthLimbo teleportAsync
rejection during AuthMe-driven post-login teleport. Items void-dropped, no
backup recoverable.

AUDIT-2026-05-07.md traces code path (LoginListener.java:128, 172-175) and
ranks fix candidates F1-F7. ROADMAP.md slots them across v1.1.0/v1.2.0/v1.3.0
with priority and acceptance criteria.

P0 fixes pending source change:
- F1: VOID-damage guard (EntityDamageEvent listener at HIGHEST)
- F2: recovery on teleportAsync false (sync-TP back to limbo + retry)
- F4: pre-empt AuthMe internal teleport (LoginEvent at LOWEST)

Privacy posture preserved across all proposed changes.
This commit is contained in:
parent
b6863806cd
commit
1f9d4bb198
2 changed files with 387 additions and 0 deletions
260
AUDIT-2026-05-07.md
Normal file
@@ -0,0 +1,260 @@
# AUDIT — 2026-05-07 — YOU500 void-death on AuthMe restore

Reviewer: Claude (auth-limbo audit pass).
Scope: Read-only review of `src/main/java/ru/authlimbo/**` against a real
production incident on `racked.ru` at 2026-05-07 17:13:39 UTC.
Status: **Audit-only — no code changes applied.** Fixes tracked in
[ROADMAP.md](ROADMAP.md).

---

## 1. Incident

`YOU500` joined the server, was held in `auth_limbo` (correct), authenticated
to AuthMe, and was teleported back to overworld — but Paper rejected the
teleport and the player void-died with full inventory loss.

### Raw log (paper-server.log, trimmed)

```
17:13:35 YOU500[/45.157.234.219] logged in with entity id 26548
         at ([auth_limbo]0.5, 128.0, 0.5)
17:13:38 [AuthMe] YOU500 logged in
17:13:39 [INFO:DEBUG] Restoring fly speed for LimboPlayer YOU500 to 0.1 (RESTORE_NO_ZERO mode)
17:13:39 [INFO:DEBUG] Teleporting `YOU500` after login, based on the player auth
17:13:39 YOU500 left the confines of this world <-- VOID DEATH
17:13:39 [AuthLimbo] Restoring YOU500 to world(2380.4, 69.9, -11358.4)
17:13:39 [AuthLimbo] teleportAsync returned false for YOU500
         — Paper may have rejected the location.
```

Loss: full inventory, full xp. Privacy posture (limbo-on-join) **was not
breached** — player was authenticated before the failure. The bug is purely
the restore step.

### What happened, in order

1. AuthMe pre-login fires. `LoginListener.onAsyncPreLogin` reads
   `authme.db` and schedules `addPluginChunkTicket` on `world` chunk
   `(2380>>4=148, -11359>>4=-710)`. So far so good.
2. AuthMe authenticates and runs **its own** broken teleport
   (`Teleporting YOU500 after login`). This is the AuthMe-fork log line, not
   ours — AuthMe does a `teleportAsync` of its own with no chunk preload.
3. AuthMe's teleport partially moves the entity into `world` at the saved
   coords **before the chunk is actually loaded**. The entity is now at
   y=69.9 in an unloaded section. Paper's "outside loaded chunk" path
   triggers and the player drops/voids — log line `left the confines of
   this world` fires.
4. Our 10-tick delayed callback runs (`LoginListener.doTeleport`,
   line 133). Player is "online" but already dead/spectator-on-respawn.
   We log `Restoring …` and call `teleportAsync`.
5. `teleportAsync` resolves with `false` because Paper rejects the move
   for a dead/transitioning entity, or because the player is no longer in
   a state where a `PLUGIN`-cause teleport is accepted.
6. We log `teleportAsync returned false` and return. Player remains in
   void-death state.

The inventory loss is not from us — it's vanilla `keepInventory=false`
behaviour on void death. We do not snapshot inventories.
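
The chunk indices quoted in step 1 can be checked with a few lines of arithmetic. The sketch below is illustrative and not taken from the plugin source: `ChunkCoords` and its method names are hypothetical. It floors the saved coordinate to a block coordinate, then applies an arithmetic right shift by 4 (chunks are 16 blocks wide). The shift rounds toward negative infinity, so it stays correct for negative Z.

```java
/** Hypothetical helper (not from the AuthLimbo source): block-to-chunk coordinate math. */
final class ChunkCoords {
    private ChunkCoords() {}

    /** Floors a saved double coordinate to its block coordinate. */
    static int toBlock(double coord) {
        return (int) Math.floor(coord);
    }

    /** Chunk index of a block coordinate; the shift floors, so negatives round down. */
    static int blockToChunk(int block) {
        return block >> 4;
    }

    /** Pitfall shown for comparison: integer division truncates toward zero. */
    static int blockToChunkTruncating(int block) {
        return block / 16;
    }
}
```

For the incident coordinates, `toBlock(2380.4)` is 2380 and maps to chunk 148, while `toBlock(-11358.4)` is -11359 and maps to chunk -710. The truncating variant would return -709 for the same Z, which is the neighbouring chunk.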

---

## 2. Code path trace

| Step | File | Lines | Note |
|------|------|-------|------|
| Pre-login chunk pin | `LoginListener.java` | 78–109 | OK — runs ~1s before login completes. |
| Login event handler | `LoginListener.java` | 113–129 | MONITOR priority, schedules `doTeleport` 10 ticks later. |
| Saved-location read | `AuthMeDatabase.java` | 68–107 | Read-only, fresh JDBC conn per call. |
| `doTeleport` | `LoginListener.java` | 133–192 | The hot path. |
| `getChunkAtAsyncUrgently` | `LoginListener.java` | 165 | Fires; on success calls `teleportAsync`. |
| `teleportAsync` | `LoginListener.java` | 166 | The call that returned `false`. |
| Failure branch | `LoginListener.java` | 172–175 | **Logs only.** No retry. No safety relocate. **No void-death guard.** |
| `exceptionally` branches | `LoginListener.java` | 180–185, 186–191 | Log only. |

---

## 3. Root-cause hypothesis (ranked)

### H1 — AuthMe's own broken teleport voids the player BEFORE our handler fires *(most likely)*

The AuthMe-fork log line `Teleporting YOU500 after login, based on the
player auth` at `17:13:39` is from AuthMe-ReReloaded fork b49 itself
(`PlayerAuth.teleportOnLogin` flow). AuthMe does a teleport with **no chunk
preload** to the saved coords. In Paper 1.21.11, calling `teleportAsync` to
a location where the chunk is still not fully *loaded into the player's
view* (vs. just having a chunk-ticket) can move the entity into a section
where its block-below check returns null and the entity is treated as
out-of-world. The `left the confines of this world` line fires immediately
after, BEFORE our 10-tick delay elapses.

By the time `doTeleport` runs at 17:13:39, the player is already dead /
respawning. Paper rejects our `teleportAsync` because:
- `Player.isOnline()` returns true (still connected) — passes our guard
- but the entity is mid-respawn / dead — Paper rejects PLUGIN-cause TPs
  against entities in that state ⇒ `false`.

This is consistent with all five log lines stamped 17:13:39 and with
Paper #4085's description of the race.

> Implication: our pre-login chunk-ticket and our delayed teleport are
> defending the wrong moment. AuthMe-fork's *own* teleport, which runs
> ~1 tick after `LoginEvent`, is what voids the player. We then arrive
> too late.

### H2 — The chunk ticket is added on the wrong chunk *(possible secondary)*

`onAsyncPreLogin` adds a ticket on the chunk computed from the saved
quit-location. But the player's first-time-join behaviour might use a
different teleport target (AuthMe spawn-on-first-login). For an existing
player like YOU500 this is unlikely — they have a saved row.

### H3 — `teleport-delay-ticks: 10` is too long *(secondary)*

10 ticks (~500 ms) leaves a window for AuthMe's own broken teleport to
void-kill. A delay of `0` (run immediately on LoginEvent) **and**
cancelling AuthMe's teleport would close the gap, but cancelling AuthMe's
teleport is non-trivial.

### H4 — Y=69.9 is too low for a chunk that hasn't generated/loaded *(unlikely)*

The world is the main overworld and has been visited (player logged out
there). Chunk exists on disk. Y=69.9 is normal terrain height. Not the
issue.

### H5 — Paper rejected because saved Y=69.9 is below world min height *(no)*

1.21 overworld min-Y is -64. 69.9 is fine.

**Conclusion:** H1 dominates. The fix must (a) defend the player against
void during the AuthMe-own-teleport window, and (b) recover gracefully
when our authoritative teleport's future returns `false`.

---

## 4. Proposed fixes

Ordered must-fix → defensive → nice-to-have. Implementation deferred per
project workflow (audit first, code after sign-off).

### F1 — MUST: void-damage guard while player is in "transit" *(primary fix)*

While a player is in the post-LoginEvent restore window, track their UUID
in a `Set<UUID> pendingTransit`. On `EntityDamageEvent`, filter by:
- entity is a `Player` whose UUID is in `pendingTransit`
- damage cause is `VOID`

On a match, call `event.setCancelled(true)` and immediately teleport the
player back to limbo spawn (`limboManager.spawn()`) at y=128. Then
re-attempt the authoritative teleport via `doTeleport` with a backoff.

This single guard would have saved YOU500's life and inventory.
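
The F1 filter can be modelled without the server API to make the decision logic easy to unit-test. This is a sketch under assumptions, not the plugin's code: `VoidGuard`, `beginTransit`, `endTransit`, and `shouldCancel` are hypothetical names, and the `Cause` enum is a stand-in for the real damage-cause type.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Plain-Java model of the F1 decision; server types are replaced by stand-ins. */
final class VoidGuard {
    /** Stand-in for the damage causes this sketch distinguishes. */
    enum Cause { VOID, FALL, OTHER }

    // Concurrent set: UUIDs may be added and removed from different callbacks.
    private final Set<UUID> pendingTransit = ConcurrentHashMap.newKeySet();

    /** Called when the post-login restore window opens for a player. */
    void beginTransit(UUID id) {
        pendingTransit.add(id);
    }

    /** Called on teleport success or after the final retry gives up. */
    void endTransit(UUID id) {
        pendingTransit.remove(id);
    }

    /** True when the damage should be cancelled and the player sent back to limbo. */
    boolean shouldCancel(boolean isPlayer, UUID id, Cause cause) {
        return isPlayer && cause == Cause.VOID && pendingTransit.contains(id);
    }
}
```

In a real listener this predicate would sit inside the `EntityDamageEvent` handler; on `true` the handler would call `event.setCancelled(true)` and then perform the relocation to limbo spawn described above.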

### F2 — MUST: when `teleportAsync` future returns `false`, recover

Right now the code at `LoginListener.java:172–175` only logs. Replace
with:

1. Player still in pendingTransit? **Yes** ⇒ teleport to
   `limboManager.spawn()` synchronously (`player.teleport(...)`, not
   async, since we need to land *now*).
2. Schedule one retry of `doTeleport` after 20 ticks with the same saved
   location.
3. After N=3 retries, give up, leave the player at limbo spawn, and send
   them a message asking them to contact staff for a manual
   `/authlimbo tp`. Also send an admin alert.
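
The retry bookkeeping in the steps above can be sketched as a small state holder. This is a plain-Java model with hypothetical names (`RestoreRetryPolicy`, `Decision`, `onTeleportResult`), and it assumes one reading of "N=3": the initial attempt plus up to three retries, with the fourth consecutive failure giving up.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Plain-Java model of the F2 retry bookkeeping; all names are hypothetical. */
final class RestoreRetryPolicy {
    enum Decision { DONE, RETRY, GIVE_UP }

    static final int MAX_RETRIES = 3;          // N=3 from the list above
    static final long RETRY_DELAY_TICKS = 20L; // delay the caller should wait before a retry

    private final Map<UUID, Integer> failures = new ConcurrentHashMap<>();

    /** Feed in the boolean that the teleport future completed with. */
    Decision onTeleportResult(UUID id, boolean success) {
        if (success) {
            failures.remove(id);
            return Decision.DONE;
        }
        int failed = failures.merge(id, 1, Integer::sum);
        if (failed > MAX_RETRIES) {
            // All retries spent: caller leaves the player in limbo and alerts admins.
            failures.remove(id);
            return Decision.GIVE_UP;
        }
        // Caller moves the player to limbo spawn and reschedules the restore.
        return Decision.RETRY;
    }
}
```

Keeping the counter keyed by UUID and clearing it on both `DONE` and `GIVE_UP` means a later login by the same player starts with a fresh budget.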

### F3 — MUST: pre-flight `World#getChunkAtAsync(cx, cz, true).get()` before calling teleportAsync

Today we call `getChunkAtAsyncUrgently` then chain teleport. The chain
*should* mean the chunk is loaded — but `getChunkAtAsyncUrgently` returns
the `Chunk` object as soon as it's loaded server-side, not necessarily
"ready for entity placement" with all neighbouring sections paged in.
Force the surrounding 3x3 chunks loaded via additional
`addPluginChunkTicket` on neighbours before teleporting.
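
Enumerating the 3x3 neighbourhood is the only non-obvious part of this fix, so here is a minimal sketch of it. `ChunkNeighbourhood` and `around` are hypothetical names; the helper only produces chunk coordinates and leaves ticket handling to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerates the 3x3 block of chunks around a centre chunk. */
final class ChunkNeighbourhood {
    private ChunkNeighbourhood() {}

    /** Returns nine {chunkX, chunkZ} pairs, centre included, row by row from low Z to high Z. */
    static List<int[]> around(int centreX, int centreZ) {
        List<int[]> chunks = new ArrayList<>(9);
        for (int dz = -1; dz <= 1; dz++) {
            for (int dx = -1; dx <= 1; dx++) {
                chunks.add(new int[] {centreX + dx, centreZ + dz});
            }
        }
        return chunks;
    }
}
```

For the incident's centre chunk (148, -710) this yields chunks from (147, -711) through (149, -709). Each pair would receive a plugin chunk ticket before the teleport and have it released afterwards.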

### F4 — SHOULD: cancel or pre-empt AuthMe's own teleport

AuthMe-ReReloaded fork b49 fires the broken teleport itself. Two options:
- **(a)** listen to `LoginEvent` at `LOWEST` priority too, immediately
  teleport the player to limbo spawn (overriding any in-flight position),
  then on `MONITOR` do our authoritative TP. Net: AuthMe's teleport is
  effectively a no-op because we beat it back to limbo and run last.
- **(b)** `teleport-delay-ticks: 0` + use a `PlayerTeleportEvent` listener
  to cancel any teleport with `TeleportCause.PLUGIN` whose source is the
  AuthMe plugin instance, while pendingTransit is set for that UUID.

(a) is simpler and contained inside our plugin.

### F5 — SHOULD: inventory snapshot on AuthMeAsyncPreLoginEvent

Before AuthMe authenticates, snapshot the player's inventory + xp +
location into an in-memory `Map<UUID, Snapshot>` and persist to
`plugins/AuthLimbo/snapshots/<uuid>.nbt` (or a SQLite table). On
`PlayerDeathEvent` while UUID in pendingTransit, restore inventory from
the snapshot via `keepInventory`-style override (cancel drops, restore on
respawn). Discard snapshot 30 s after successful TP.

This is a defensive belt-and-braces — even if all chunk logic fails, no
inventory is ever lost on an auth-flow death.
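
The snapshot lifecycle (take, look up on death, discard 30 s after a successful teleport) can be modelled in plain Java. Everything here is an assumption made for illustration: `SnapshotStore` and its methods are hypothetical, item stacks are reduced to strings, location and persistence are left out, and time is passed in explicitly so the expiry rule is testable.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Plain-Java model of the F5 snapshot lifecycle; item data is simplified to strings. */
final class SnapshotStore {
    static final long DISCARD_AFTER_MILLIS = 30_000L; // 30 s after a successful TP

    static final class Snapshot {
        final List<String> items;
        final int xp;
        volatile long discardAt = Long.MAX_VALUE; // not scheduled for discard yet

        Snapshot(List<String> items, int xp) {
            this.items = List.copyOf(items); // defensive, immutable copy
            this.xp = xp;
        }
    }

    private final Map<UUID, Snapshot> snapshots = new ConcurrentHashMap<>();

    /** Records (or replaces) the snapshot for a player. */
    void take(UUID id, List<String> items, int xp) {
        snapshots.put(id, new Snapshot(items, xp));
    }

    /** Snapshot to restore from, if one is still held. */
    Optional<Snapshot> lookup(UUID id) {
        return Optional.ofNullable(snapshots.get(id));
    }

    /** Starts the 30 s discard countdown once the teleport has succeeded. */
    void markTeleportSucceeded(UUID id, long nowMillis) {
        Snapshot s = snapshots.get(id);
        if (s != null) {
            s.discardAt = nowMillis + DISCARD_AFTER_MILLIS;
        }
    }

    /** Drops every snapshot whose countdown has elapsed. */
    void purgeExpired(long nowMillis) {
        snapshots.values().removeIf(s -> nowMillis >= s.discardAt);
    }
}
```

A periodic task would call `purgeExpired` with the current time; a snapshot that was never marked as succeeded is kept until it is replaced or the store is cleared.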

### F6 — NICE: spectator-mode fallback

If F1–F4 all fail and the player is still in void state after N retries,
set `GameMode.SPECTATOR`, teleport to overworld spawn (server world's
default spawn), and send admin a Discord/console alert: "AuthLimbo could
not restore YOU500 — manual `/authlimbo tp YOU500` required". The
spectator mode prevents further damage and lets the player observe the
world while admin acts.

### F7 — NICE: telemetry

Bump a counter on each restore outcome (success/fail/retry) and expose
the totals via `/authlimbo stats` for ops visibility.

---

## 5. Test plan

Reproducible in a dev Paper 1.21.11 server with AuthMe-ReReloaded:

1. **Unloaded-chunk void.** Set saved coord to (10000, 70, 10000) in
   `authme.db` for a test account. Restart server (chunks unload). Login.
   Expect: void-damage guard cancels VOID damage, player lands at saved
   coords or at limbo spawn for retry.
2. **Invalid Y (above build limit).** Set saved Y to 5000. Login. Expect:
   `teleportAsync` returns false, recovery branch teleports to limbo
   spawn, retry escalation works.
3. **World no longer loaded.** Set the saved world to a name that no
   longer exists (e.g. `world_old`). Login. Expect: graceful fallback to
   overworld spawn, admin notified.
4. **Death during transit.** Force damage with cause
   `EntityDamageEvent.DamageCause.VOID` via a debug command while the
   player is mid-restore. Expect: damage cancelled, player relocated to
   limbo spawn, restore retried.
5. **Snapshot/restore on death.** With F5 implemented, kill the player
   during transit. Expect: respawn with full inventory + xp.
6. **AuthMe pre-empt.** With F4(a), watch logs — AuthMe's teleport line
   fires but the position is immediately overwritten by our limbo TP at
   LOWEST, then by our authoritative TP at MONITOR.

All tests run on a clean dev server, not racked.ru production.

---

## 6. Privacy posture — unchanged

None of the proposed fixes weaken the limbo-on-join privacy property:
- F1 keeps the player in limbo *longer* on damage, never exposes them
  to the overworld pre-auth.
- F4(a) actually *strengthens* it by guaranteeing the limbo position is
  reasserted in the `LoginEvent` handler at `LOWEST` priority.
- F5 stores inventory snapshots locally (server-side, plugin folder) —
  no new network exposure.

---

## 7. Sign-off

Audit author: Claude (auth-limbo plugin audit pass)
Date: 2026-05-07
Recommended next action: review ROADMAP.md, approve F1+F2 for first
implementation pass, F3+F4 for second pass.
127
ROADMAP.md
Normal file
@@ -0,0 +1,127 @@
# ROADMAP — AuthLimbo

Tracked work items for the plugin. Format: priority, ID, title, status,
acceptance criteria. Source-of-truth for what needs to ship next.

Status legend:
- `OPEN` — not started
- `WIP` — in progress on a branch
- `BLOCKED` — waiting on upstream / external
- `DONE` — landed on `main`, in a tagged release

---

## P0 — must-fix (data-loss bugs)

### F1 · OPEN · Void-damage guard during post-login restore

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F1. Triggered by
YOU500 incident on 2026-05-07 — full inventory loss to void on login.

Acceptance:
- New `Set<UUID> pendingTransit` in `LoginListener`.
- UUID added on `LoginEvent`, removed on TP success or final retry give-up.
- `EntityDamageEvent` listener at `EventPriority.HIGHEST`: if
  `entity instanceof Player`, UUID in `pendingTransit`, cause is `VOID` →
  `setCancelled(true)` and `player.teleport(limboManager.spawn())`
  *synchronously* (we need to land before the next tick voids them again).
- Covered by test plan §5.1 and §5.4 in AUDIT-2026-05-07.md.

### F2 · OPEN · Recovery when teleportAsync returns false

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F2.

Today: `LoginListener.java:172–175` only logs. After fix:
- On `success == false`: synchronously TP to limbo spawn, schedule one
  retry of `doTeleport` after 20 ticks.
- Track retry count per UUID (max 3). After 3 failures: drop into F6 path.
- Also wire the `exceptionally` branches (lines 180–185, 186–191) into
  the same recovery.

Acceptance: test plan §5.2 passes — invalid coords trigger recovery, no
void death, admin sees logged retries.

### F4 · OPEN · Pre-empt AuthMe's own broken teleport

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F4. Implements
option (a): add a second `LoginEvent` handler at `EventPriority.LOWEST`
that immediately teleports the player back to limbo spawn. AuthMe's
internal teleport then runs against an irrelevant location, our MONITOR
handler wins last.

Depends on: F1 (so void damage during the LOWEST→MONITOR window is
guarded).

Acceptance: in test plan §5.6, log shows AuthMe's TP line followed by no
`left the confines` event before our authoritative TP fires.

---

## P1 — defensive (failure modes we know about)

### F3 · OPEN · Pre-flight 3x3 chunk preload before teleportAsync

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F3.

In `doTeleport` (LoginListener.java:133), before calling
`getChunkAtAsyncUrgently` on the centre chunk, also `addPluginChunkTicket`
on the eight neighbours. Release all nine tickets via the existing
`scheduleTicketRelease` path (extend it to take a list).

Acceptance: test plan §5.1 passes even when the centre chunk is on a
section boundary that previously triggered "loaded but not ready".

### F5 · OPEN · Inventory snapshot + auto-restore on transit death

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F5. Defence in
depth — even if F1–F4 all fail, no inventory is lost.

- On `AuthMeAsyncPreLoginEvent` (in addition to chunk pin), snapshot
  player inventory + xp + location into `Map<UUID, Snapshot>`.
- Optionally persist to `plugins/AuthLimbo/snapshots/<uuid>.nbt` for
  crash-survivability.
- On `PlayerDeathEvent` while UUID in `pendingTransit`:
  `event.getDrops().clear()`, `event.setKeepInventory(true)`,
  `event.setKeepLevel(true)`. On `PlayerRespawnEvent`, restore from
  snapshot if needed and TP to limbo spawn for re-restore.
- Snapshot discarded 30 s after successful TP.

Acceptance: test plan §5.5 — induced death during transit yields full
inventory on respawn.

---

## P2 — nice-to-have

### F6 · OPEN · Spectator-mode admin-alert fallback

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F6. After 3 failed
retries (F2): set spectator, TP to overworld default spawn, console log +
optional Discord webhook config key. Player gets a message to ping
staff for manual `/authlimbo tp`.

### F7 · OPEN · Telemetry counters + `/authlimbo stats`

Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F7.

Track per-session counters: restore_success, restore_retry,
restore_fail, void_damage_blocked, snapshot_restored. Expose via a new
`/authlimbo stats` subcommand. Reset on plugin reload.
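
One possible shape for these counters is sketched below. The counter names come from the item above; the class `RestoreStats`, its methods, and the `name=value` output format are assumptions, since the roadmap does not specify how `/authlimbo stats` should render.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Plain-Java model of the F7 counters; class and method names are hypothetical. */
final class RestoreStats {
    enum Counter {
        RESTORE_SUCCESS, RESTORE_RETRY, RESTORE_FAIL, VOID_DAMAGE_BLOCKED, SNAPSHOT_RESTORED
    }

    // Filled once in the constructor and never restructured, so concurrent reads are safe.
    private final Map<Counter, LongAdder> counters = new EnumMap<>(Counter.class);

    RestoreStats() {
        for (Counter c : Counter.values()) {
            counters.put(c, new LongAdder());
        }
    }

    void increment(Counter c) {
        counters.get(c).increment();
    }

    long get(Counter c) {
        return counters.get(c).sum();
    }

    /** Resets every counter, for example on plugin reload. */
    void reset() {
        counters.values().forEach(LongAdder::reset);
    }

    /** One "name=value" line per counter, in declaration order. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Counter c : Counter.values()) {
            sb.append(c.name().toLowerCase(Locale.ROOT)).append('=').append(get(c)).append('\n');
        }
        return sb.toString();
    }
}
```

`LongAdder` is chosen here because increments may arrive from both the main thread and async teleport callbacks; a plain `long` field would need external synchronisation.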

---

## Done

(Nothing landed since v1.0.0 release on 2026-04-30. First post-1.0
release will be triggered by F1+F2+F4 landing as v1.1.0.)

---

## Release plan

- **v1.1.0** — F1, F2, F4 (data-loss fix). Target: ASAP.
- **v1.2.0** — F3, F5 (defence in depth). Target: within 2 weeks.
- **v1.3.0** — F6, F7 (ops UX). Target: opportunistic.

Privacy invariant (limbo-on-join, no overworld exposure pre-auth) must
hold across every release. See `AUDIT-2026-05-07.md` §6.