docs: incident audit + roadmap for 2026-05-07 void-death

Player YOU500 lost full inventory at 17:13:39 due to AuthLimbo
teleportAsync rejection during AuthMe-driven post-login teleport.
Items void-dropped, no backup recoverable.

AUDIT-2026-05-07.md traces code path (LoginListener.java:128, 172-175)
and ranks fix candidates F1-F7. ROADMAP.md slots them across
v1.1.0/v1.2.0/v1.3.0 with priority and acceptance criteria.

P0 fixes pending source change:
- F1: VOID-damage guard (EntityDamageEvent listener at HIGHEST)
- F2: recovery on teleportAsync false (sync-TP back to limbo + retry)
- F4: pre-empt AuthMe internal teleport (LoginEvent at LOWEST)

Privacy posture preserved across all proposed changes.
s8n 2026-05-07 17:33:07 +01:00
parent b6863806cd
commit 1f9d4bb198
2 changed files with 387 additions and 0 deletions

AUDIT-2026-05-07.md Normal file

@ -0,0 +1,260 @@
# AUDIT — 2026-05-07 — YOU500 void-death on AuthMe restore
Reviewer: Claude (auth-limbo audit pass).
Scope: Read-only review of `src/main/java/ru/authlimbo/**` against a real
production incident on `racked.ru` at 2026-05-07 17:13:39 UTC.
Status: **Audit-only — no code changes applied.** Fixes tracked in
[ROADMAP.md](ROADMAP.md).
---
## 1. Incident
`YOU500` joined the server, was held in `auth_limbo` (correct), authenticated
to AuthMe, and was teleported back to overworld — but Paper rejected the
teleport and the player void-died with full inventory loss.
### Raw log (paper-server.log, trimmed)
```
17:13:35 YOU500[/45.157.234.219] logged in with entity id 26548
at ([auth_limbo]0.5, 128.0, 0.5)
17:13:38 [AuthMe] YOU500 logged in
17:13:39 [INFO:DEBUG] Restoring fly speed for LimboPlayer YOU500 to 0.1 (RESTORE_NO_ZERO mode)
17:13:39 [INFO:DEBUG] Teleporting `YOU500` after login, based on the player auth
17:13:39 YOU500 left the confines of this world <-- VOID DEATH
17:13:39 [AuthLimbo] Restoring YOU500 to world(2380.4, 69.9, -11358.4)
17:13:39 [AuthLimbo] teleportAsync returned false for YOU500
— Paper may have rejected the location.
```
Loss: full inventory, full xp. Privacy posture (limbo-on-join) **was not
breached** — player was authenticated before the failure. The bug is purely
the restore step.
### What happened, in order
1. AuthMe pre-login fires. `LoginListener.onAsyncPreLogin` reads
`authme.db` and schedules `addPluginChunkTicket` on `world` chunk
`(2380>>4=148, -11359>>4=-710)`. So far so good.
2. AuthMe authenticates and runs **its own** broken teleport
(`Teleporting YOU500 after login`). This is the AuthMe-fork log line, not
ours — AuthMe does a `teleportAsync` of its own with no chunk preload.
3. AuthMe's teleport partially moves the entity into `world` at the saved
coords **before the chunk is actually loaded**. The entity is now at
y=69.9 in an unloaded section. Paper's "outside loaded chunk" path
triggers and the player drops/voids — log line `left the confines of
this world` fires.
4. Our 10-tick delayed callback runs (`LoginListener.doTeleport`,
line 133). Player is "online" but already dead/spectator-on-respawn.
We log `Restoring …` and call `teleportAsync`.
5. `teleportAsync` resolves with `false` because Paper rejects the move
for a dead/transitioning entity, or because the player is no longer in
a state where a `PLUGIN`-cause teleport is accepted.
6. We log `teleportAsync returned false` and return. Player remains in
void-death state.
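The chunk arithmetic in step 1 can be checked with a small standalone sketch. `ChunkCoords` and `toChunk` are hypothetical names for illustration, not part of the AuthLimbo source; the only assumption is the usual 16-block chunk width.

```java
/** Hypothetical helper for illustration; not part of the AuthLimbo source. */
final class ChunkCoords {
    private ChunkCoords() {}

    /**
     * Converts a world coordinate to a chunk coordinate.
     * The value is floored to a block coordinate first, then shifted right by 4
     * (16 blocks per chunk). The arithmetic shift rounds toward negative
     * infinity, which is what negative coordinates need.
     */
    static int toChunk(double worldCoord) {
        int block = (int) Math.floor(worldCoord);
        return block >> 4;
    }
}
```

For the saved location in the log, `toChunk(2380.4)` gives 148 and `toChunk(-11358.4)` gives -710. Truncating casts plus integer division (`(int) z / 16`) would give -709 for the Z axis, so a ticket computed that way would land on the wrong chunk.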
The inventory loss is not from us — it's vanilla `keepInventory=false`
behaviour on void death. We do not snapshot inventories.
---
## 2. Code path trace
| Step | File | Lines | Note |
|------|------|-------|------|
| Pre-login chunk pin | `LoginListener.java` | 78-109 | OK: runs ~1s before login completes. |
| Login event handler | `LoginListener.java` | 113-129 | MONITOR priority, schedules `doTeleport` 10 ticks later. |
| Saved-location read | `AuthMeDatabase.java` | 68-107 | Read-only, fresh JDBC conn per call. |
| `doTeleport` | `LoginListener.java` | 133-192 | The hot path. |
| `getChunkAtAsyncUrgently` | `LoginListener.java` | 165 | Fires; on success calls `teleportAsync`. |
| `teleportAsync` | `LoginListener.java` | 166 | The call that returned `false`. |
| Failure branch | `LoginListener.java` | 172-175 | **Logs only.** No retry. No safety relocate. **No void-death guard.** |
| `exceptionally` branch | `LoginListener.java` | 180-185, 186-191 | Logs only. |
---
## 3. Root-cause hypothesis (ranked)
### H1 — AuthMe's own broken teleport voids the player BEFORE our handler fires *(most likely)*
The AuthMe-fork log line `Teleporting YOU500 after login, based on the
player auth` at `17:13:39` is from AuthMe-ReReloaded fork b49 itself
(`PlayerAuth.teleportOnLogin` flow). AuthMe does a teleport with **no chunk
preload** to the saved coords. In Paper 1.21.11, calling `teleportAsync` to
a location where the chunk is still not fully *loaded into the player's
view* (vs. just having a chunk-ticket) can move the entity into a section
where its block-below check returns null and the entity is treated as
out-of-world. The `left the confines of this world` line fires immediately
after, BEFORE our 10-tick delay elapses.
By the time `doTeleport` runs at 17:13:39, the player is already dead /
respawning. Paper rejects our `teleportAsync` because:
- `Player.isOnline()` returns true (still connected) — passes our guard
- but the entity is mid-respawn / dead — Paper rejects PLUGIN-cause TPs
against entities in that state ⇒ `false`.
This is consistent with the log sequence above and with Paper #4085's
description of the race.
> Implication: our pre-login chunk-ticket and our delayed teleport are
> defending the wrong moment. AuthMe-fork's *own* teleport, which runs
> ~1 tick after `LoginEvent`, is what voids the player. We then arrive
> too late.
### H2 — The chunk ticket is added on the wrong chunk *(possible secondary)*
`onAsyncPreLogin` adds a ticket on the chunk computed from the saved
quit-location. But the player's first-time-join behaviour might use a
different teleport target (AuthMe spawn-on-first-login). For an existing
player like YOU500 this is unlikely — they have a saved row.
### H3 — `teleport-delay-ticks: 10` is too long *(secondary)*
10 ticks (~500 ms) leaves a window for AuthMe's own broken teleport to
void-kill. A delay of `0` (run immediately on LoginEvent) **and**
cancelling AuthMe's teleport would close the gap, but cancelling AuthMe's
teleport is non-trivial.
### H4 — Y=69.9 is too low for a chunk that hasn't generated/loaded *(unlikely)*
The world is the main overworld and has been visited (player logged out
there). Chunk exists on disk. Y=69.9 is normal terrain height. Not the
issue.
### H5 — Paper rejected because saved Y=69.9 is below world min height *(no)*
1.21 overworld min-Y is -64. 69.9 is fine.
**Conclusion:** H1 dominates. The fix must (a) defend the player against
void during the AuthMe-own-teleport window, and (b) recover gracefully
when our authoritative teleport's future returns `false`.
---
## 4. Proposed fixes
Ordered must-fix → defensive → nice-to-have. Implementation deferred per
project workflow (audit first, code after sign-off).
### F1 — MUST: void-damage guard while player is in "transit" *(primary fix)*
While a player is in the post-LoginEvent restore window, track their UUID
in a `Set<UUID> pendingTransit`. On `EntityDamageEvent`, filter by:
- entity is a Player whose UUID is in `pendingTransit`
- damage cause is `VOID`
When both match, call `event.setCancelled(true)` and immediately teleport
the player back to limbo spawn (`limboManager.spawn()`) at y=128. Then
re-attempt the authoritative teleport via `doTeleport` with a backoff.
This single guard would have saved YOU500's life and inventory.
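A minimal sketch of the guard's decision logic, written as plain Java with no Bukkit dependency so it can be run on its own. `TransitGuard`, its method names, and the `Cause` enum are hypothetical stand-ins (the latter for Bukkit's `EntityDamageEvent.DamageCause`); the event wiring and the teleport back to limbo are left out.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the F1 filter; names are illustrative only. */
final class TransitGuard {
    /** Stand-in for Bukkit's damage-cause enum, reduced to what the filter needs. */
    enum Cause { VOID, FALL, OTHER }

    // Concurrent set: membership may be checked from more than one thread.
    private final Set<UUID> pendingTransit = ConcurrentHashMap.newKeySet();

    /** Call when the post-login restore window opens for a player. */
    void begin(UUID playerId) {
        pendingTransit.add(playerId);
    }

    /** Call on teleport success or after the final retry is abandoned. */
    void end(UUID playerId) {
        pendingTransit.remove(playerId);
    }

    /** True only for VOID damage to a player who is currently in transit. */
    boolean shouldCancel(UUID entityId, boolean isPlayer, Cause cause) {
        return isPlayer && cause == Cause.VOID && pendingTransit.contains(entityId);
    }
}
```

In a real listener, a `true` result would correspond to cancelling the event and relocating the player to limbo spawn; any other damage, or void damage outside the restore window, passes through untouched.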
### F2 — MUST: when `teleportAsync` future returns `false`, recover
Right now the code at `LoginListener.java:172-175` only logs. Replace
with:
1. Player still in pendingTransit? **Yes** ⇒ teleport to
`limboManager.spawn()` synchronously (`player.teleport(...)`, not
async, since we need to land *now*).
2. Schedule one retry of `doTeleport` after 20 ticks with the same saved
location.
3. After N=3 retries, give up: leave the player at limbo spawn, tell them
that an admin must run `/authlimbo tp` for them, and send an admin alert.
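The retry bookkeeping in the steps above could look roughly like this. It is a sketch under assumptions: `RestoreRetryPolicy` and its members are hypothetical names, the class counts retries rather than failures (one reading of "after N=3 retries"), and it is meant to be called from the server main thread only, so it uses a plain `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of the F2 retry budget; not the plugin's actual code. */
final class RestoreRetryPolicy {
    enum Action { RETRY, GIVE_UP }

    static final int MAX_RETRIES = 3;          // N from the fix description
    static final long RETRY_DELAY_TICKS = 20L; // delay before each retry

    // Retries already consumed per player. Main-thread use is assumed.
    private final Map<UUID, Integer> retriesUsed = new HashMap<>();

    /** Decide what to do after a teleport attempt reported failure. */
    Action onFailure(UUID playerId) {
        int used = retriesUsed.getOrDefault(playerId, 0);
        if (used >= MAX_RETRIES) {
            retriesUsed.remove(playerId); // budget exhausted; clear state
            return Action.GIVE_UP;
        }
        retriesUsed.put(playerId, used + 1);
        return Action.RETRY;
    }

    /** Clear the budget once the player has landed. */
    void onSuccess(UUID playerId) {
        retriesUsed.remove(playerId);
    }
}
```

With this reading, the initial attempt plus three retries may fail before `GIVE_UP` is returned. On `RETRY` the caller would first move the player to limbo spawn and then schedule `doTeleport` after `RETRY_DELAY_TICKS`.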
### F3 — MUST: pre-flight `World#getChunkAtAsync(cx, cz, true).get()` before calling teleportAsync
Today we call `getChunkAtAsyncUrgently` then chain teleport. The chain
*should* mean the chunk is loaded — but `getChunkAtAsyncUrgently` returns
the `Chunk` object as soon as it's loaded server-side, not necessarily
"ready for entity placement" with all neighbouring sections paged in.
Force the surrounding 3x3 chunks loaded via additional
`addPluginChunkTicket` on neighbours before teleporting.
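Enumerating the 3x3 block of chunks is simple enough to sketch without the server API. `ChunkNeighbourhood`, `ChunkPos`, and `threeByThree` are hypothetical names; in the plugin, each returned position would be the target of one `addPluginChunkTicket` call and one later release.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the AuthLimbo source. */
final class ChunkNeighbourhood {
    /** Chunk coordinates (not block coordinates). */
    record ChunkPos(int x, int z) {}

    private ChunkNeighbourhood() {}

    /** Returns the centre chunk and its eight neighbours, nine positions in total. */
    static List<ChunkPos> threeByThree(int centreX, int centreZ) {
        List<ChunkPos> positions = new ArrayList<>(9);
        for (int dx = -1; dx <= 1; dx++) {
            for (int dz = -1; dz <= 1; dz++) {
                positions.add(new ChunkPos(centreX + dx, centreZ + dz));
            }
        }
        return positions;
    }
}
```

For the incident location, `threeByThree(148, -710)` covers chunk X 147 to 149 and chunk Z -711 to -709. Returning a list also fits a release path that accepts a collection of tickets rather than a single chunk.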
### F4 — SHOULD: cancel or pre-empt AuthMe's own teleport
AuthMe-ReReloaded fork b49 fires the broken teleport itself. Two options:
- **(a)** listen to `LoginEvent` at `LOWEST` priority too, immediately
teleport the player to limbo spawn (overriding any in-flight position),
then on `MONITOR` do our authoritative TP. Net: AuthMe's teleport is
effectively a no-op because we beat it back to limbo and run last.
- **(b)** `teleport-delay-ticks: 0` + use a `PlayerTeleportEvent` listener
to cancel any teleport with `TeleportCause.PLUGIN` whose source is the
AuthMe plugin instance, while pendingTransit is set for that UUID.
(a) is simpler and contained inside our plugin.
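Option (a) relies on Bukkit dispatching listeners from `LOWEST` up to `MONITOR`. The sketch below models only that ordering in plain Java; `PriorityOrderDemo`, `Handler`, and the handler names are hypothetical, and where AuthMe's internal teleport falls relative to the dispatch is deliberately not modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of listener ordering; it does not use the Bukkit API. */
final class PriorityOrderDemo {
    /** Same names and order as Bukkit's EventPriority: LOWEST runs first, MONITOR last. */
    enum Priority { LOWEST, LOW, NORMAL, HIGH, HIGHEST, MONITOR }

    record Handler(String name, Priority priority) {}

    private PriorityOrderDemo() {}

    /** Returns handler names in the order they would be invoked. */
    static List<String> dispatchOrder(List<Handler> handlers) {
        List<Handler> sorted = new ArrayList<>(handlers);
        // Enums compare by declaration order; List.sort is stable.
        sorted.sort(Comparator.comparing(Handler::priority));
        List<String> names = new ArrayList<>();
        for (Handler handler : sorted) {
            names.add(handler.name());
        }
        return names;
    }
}
```

Registering a limbo-reassert handler at `LOWEST` and the authoritative teleport at `MONITOR` puts every other listener for the same event between the two, regardless of registration order.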
### F5 — SHOULD: inventory snapshot on AuthMeAsyncPreLoginEvent
Before AuthMe authenticates, snapshot the player's inventory + xp +
location into an in-memory `Map<UUID, Snapshot>` and persist to
`plugins/AuthLimbo/snapshots/<uuid>.nbt` (or a SQLite table). On
`PlayerDeathEvent` while UUID in pendingTransit, restore inventory from
the snapshot via `keepInventory`-style override (cancel drops, restore on
respawn). Discard snapshot 30 s after successful TP.
This is a defensive belt-and-braces — even if all chunk logic fails, no
inventory is ever lost on an auth-flow death.
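The in-memory half of the snapshot idea might be sketched as follows. Everything here is an assumption for illustration: `SnapshotStore` and its methods are hypothetical, items are placeholder strings rather than real item data, persistence to disk is omitted, and time is passed in explicitly so the 30 s discard rule can be exercised without waiting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical sketch of the F5 snapshot map; not the plugin's actual code. */
final class SnapshotStore {
    /** Items are placeholder strings; a real plugin would hold item data. */
    record Snapshot(List<String> items, int xpLevel, long discardAtMillis) {}

    static final long RETENTION_MILLIS = 30_000L;        // 30 s after successful TP
    private static final long NOT_SCHEDULED = Long.MAX_VALUE;

    private final Map<UUID, Snapshot> snapshots = new HashMap<>();

    /** Take a snapshot before the restore begins; it never expires until marked. */
    void capture(UUID playerId, List<String> items, int xpLevel) {
        snapshots.put(playerId, new Snapshot(List.copyOf(items), xpLevel, NOT_SCHEDULED));
    }

    /** Start the discard timer once the authoritative teleport has succeeded. */
    void markRestored(UUID playerId, long nowMillis) {
        snapshots.computeIfPresent(playerId, (id, s) ->
                new Snapshot(s.items(), s.xpLevel(), nowMillis + RETENTION_MILLIS));
    }

    /** Returns the snapshot if one exists and has not passed its discard time. */
    Optional<Snapshot> lookup(UUID playerId, long nowMillis) {
        Snapshot s = snapshots.get(playerId);
        if (s != null && nowMillis >= s.discardAtMillis()) {
            snapshots.remove(playerId); // expired: drop it lazily
            return Optional.empty();
        }
        return Optional.ofNullable(s);
    }
}
```

A death handler would call `lookup` and restore from the result when present. Lazy expiry keeps the sketch free of scheduler code; a real plugin may prefer a scheduled cleanup task.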
### F6 — NICE: spectator-mode fallback
If F1-F4 all fail and the player is still in void state after N retries,
set `GameMode.SPECTATOR`, teleport to overworld spawn (server world's
default spawn), and send admin a Discord/console alert: "AuthLimbo could
not restore YOU500 — manual `/authlimbo tp YOU500` required". The
spectator mode prevents further damage and lets the player observe the
world while admin acts.
### F7 — NICE: telemetry
Bump a counter for each restore outcome (success/fail/retry) and expose
the totals via `/authlimbo stats` for ops visibility.
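A possible shape for the counters, using the counter names listed for F7 in ROADMAP.md. `RestoreStats` and its methods are hypothetical, and the command wiring is omitted; `LongAdder` is used on the assumption that increments may come from async callbacks as well as the main thread.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of the F7 counters; not the plugin's actual code. */
final class RestoreStats {
    enum Counter { RESTORE_SUCCESS, RESTORE_RETRY, RESTORE_FAIL, VOID_DAMAGE_BLOCKED, SNAPSHOT_RESTORED }

    // The map is fully populated in the constructor and never modified afterwards,
    // so concurrent reads are safe; only the LongAdder values change.
    private final Map<Counter, LongAdder> counters = new EnumMap<>(Counter.class);

    RestoreStats() {
        for (Counter c : Counter.values()) {
            counters.put(c, new LongAdder());
        }
    }

    void increment(Counter c) {
        counters.get(c).increment();
    }

    long get(Counter c) {
        return counters.get(c).sum();
    }

    /** Zero every counter, e.g. on plugin reload. */
    void reset() {
        counters.values().forEach(LongAdder::reset);
    }

    /** One-line summary such as a stats subcommand might print. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Counter c : Counter.values()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(c.name().toLowerCase(Locale.ROOT)).append('=').append(get(c));
        }
        return sb.toString();
    }
}
```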
---
## 5. Test plan
Reproducible in a dev Paper 1.21.11 server with AuthMe-ReReloaded:
1. **Unloaded-chunk void.** Set saved coord to (10000, 70, 10000) in
`authme.db` for a test account. Restart server (chunks unload). Login.
Expect: void-damage guard cancels VOID damage, player lands at saved
coords or at limbo spawn for retry.
2. **Invalid Y (above build limit).** Set saved Y to 5000. Login. Expect:
`teleportAsync` returns false, recovery branch teleports to limbo
spawn, retry escalation works.
3. **World no longer loaded.** Set saved world to a string that no longer
exists (e.g. `world_old`). Login. Expect: graceful fallback to
overworld spawn, admin notified.
4. **Death during transit.** Force `EntityDamageEvent.DamageCause.VOID` via a debug
command while the player is mid-restore. Expect: damage cancelled,
player relocated to limbo spawn, restore retried.
5. **Snapshot/restore on death.** With F5 implemented, kill the player
during transit. Expect: respawn with full inventory + xp.
6. **AuthMe pre-empt.** With F4(a), watch logs — AuthMe's teleport line
fires but the position is immediately overwritten by our limbo TP at
LOWEST, then by our authoritative TP at MONITOR.
All tests run on a clean dev server, not racked.ru production.
---
## 6. Privacy posture — unchanged
None of the proposed fixes weaken the limbo-on-join privacy property:
- F1 keeps the player in limbo *longer* on damage, never exposes them
to the overworld pre-auth.
- F4(a) actually *strengthens* it by guaranteeing the limbo position is
reasserted at LOGIN-LOWEST.
- F5 stores inventory snapshots locally (server-side, plugin folder) —
no new network exposure.
---
## 7. Sign-off
Audit author: Claude (auth-limbo plugin audit pass)
Date: 2026-05-07
Recommended next action: review ROADMAP.md, approve F1+F2 for first
implementation pass, F3+F4 for second pass.

ROADMAP.md Normal file

@ -0,0 +1,127 @@
# ROADMAP — AuthLimbo
Tracked work items for the plugin. Format: priority, ID, title, status,
acceptance criteria. Source-of-truth for what needs to ship next.
Status legend:
- `OPEN` — not started
- `WIP` — in progress on a branch
- `BLOCKED` — waiting on upstream / external
- `DONE` — landed on `main`, in a tagged release
---
## P0 — must-fix (data-loss bugs)
### F1 · OPEN · Void-damage guard during post-login restore
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F1. Triggered by
YOU500 incident on 2026-05-07 — full inventory loss to void on login.
Acceptance:
- New `Set<UUID> pendingTransit` in `LoginListener`.
- UUID added on `LoginEvent`, removed on TP success or final retry give-up.
- `EntityDamageEvent` listener at `EventPriority.HIGHEST`: if
`entity instanceof Player`, UUID in `pendingTransit`, and cause is `VOID`,
call `setCancelled(true)` and `player.teleport(limboManager.spawn())`
*synchronously* (we need to land before the next tick voids them again).
- Covered by test plan §5.1 and §5.4 in AUDIT-2026-05-07.md.
### F2 · OPEN · Recovery when teleportAsync returns false
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F2.
Today: `LoginListener.java:172-175` only logs. After fix:
- On `success == false`: synchronously TP to limbo spawn, schedule one
retry of `doTeleport` after 20 ticks.
- Track retry count per UUID (max 3). After 3 failures: drop into F6 path.
- Also wire the `exceptionally` branches (lines 180-185, 186-191) into
the same recovery.
Acceptance: test plan §5.2 passes — invalid coords trigger recovery, no
void death, admin sees logged retries.
### F4 · OPEN · Pre-empt AuthMe's own broken teleport
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F4. Implements
option (a): add a second `LoginEvent` handler at `EventPriority.LOWEST`
that immediately teleports the player back to limbo spawn. AuthMe's
internal teleport then runs against an irrelevant location, our MONITOR
handler wins last.
Depends on: F1 (so void damage during the LOWEST→MONITOR window is
guarded).
Acceptance: in test plan §5.6, log shows AuthMe's TP line followed by no
`left the confines` event before our authoritative TP fires.
---
## P1 — defensive (failure modes we know about)
### F3 · OPEN · Pre-flight 3x3 chunk preload before teleportAsync
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F3.
In `doTeleport` (LoginListener.java:133), before calling
`getChunkAtAsyncUrgently` on the centre chunk, also `addPluginChunkTicket`
on the eight neighbours. Release all nine tickets via the existing
`scheduleTicketRelease` path (extend it to take a list).
Acceptance: test plan §5.1 passes even when the centre chunk is on a
section boundary that previously triggered "loaded but not ready".
### F5 · OPEN · Inventory snapshot + auto-restore on transit death
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F5. Defence in
depth: even if F1-F4 all fail, no inventory is lost.
- On `AuthMeAsyncPreLoginEvent` (in addition to chunk pin), snapshot
player inventory + xp + location into `Map<UUID, Snapshot>`.
- Optionally persist to `plugins/AuthLimbo/snapshots/<uuid>.nbt` for
crash-survivability.
- On `PlayerDeathEvent` while UUID in `pendingTransit`:
`event.getDrops().clear()`, `event.setKeepInventory(true)`,
`event.setKeepLevel(true)`. On `PlayerRespawnEvent`, restore from
snapshot if needed and TP to limbo spawn for re-restore.
- Snapshot discarded 30 s after successful TP.
Acceptance: test plan §5.5 — induced death during transit yields full
inventory on respawn.
---
## P2 — nice-to-have
### F6 · OPEN · Spectator-mode admin-alert fallback
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F6. After 3 failed
retries (F2): set spectator, TP to overworld default spawn, console log
+ optional Discord webhook config key. Player gets a message to ping
staff for manual `/authlimbo tp`.
### F7 · OPEN · Telemetry counters + `/authlimbo stats`
Source: [AUDIT-2026-05-07.md](AUDIT-2026-05-07.md) §4 F7.
Track per-session counters: restore_success, restore_retry,
restore_fail, void_damage_blocked, snapshot_restored. Expose via a new
`/authlimbo stats` subcommand. Reset on plugin reload.
---
## Done
(Nothing landed since v1.0.0 release on 2026-04-30. First post-1.0
release will be triggered by F1+F2+F4 landing as v1.1.0.)
---
## Release plan
- **v1.1.0** — F1, F2, F4 (data-loss fix). Target: ASAP.
- **v1.2.0** — F3, F5 (defence in depth). Target: within 2 weeks.
- **v1.3.0** — F6, F7 (ops UX). Target: opportunistic.
Privacy invariant (limbo-on-join, no overworld exposure pre-auth) must
hold across every release. See `AUDIT-2026-05-07.md` §6.