auth-limbo/AUDIT-2026-05-07.md
s8n 1f9d4bb198
docs: incident audit + roadmap for 2026-05-07 void-death
Player YOU500 lost full inventory at 17:13:39 due to AuthLimbo
teleportAsync rejection during AuthMe-driven post-login teleport.
Items void-dropped, no backup recoverable.

AUDIT-2026-05-07.md traces code path (LoginListener.java:128, 172-175)
and ranks fix candidates F1-F7. ROADMAP.md slots them across
v1.1.0/v1.2.0/v1.3.0 with priority and acceptance criteria.

P0 fixes pending source change:
- F1: VOID-damage guard (EntityDamageEvent listener at HIGHEST)
- F2: recovery on teleportAsync false (sync-TP back to limbo + retry)
- F4: pre-empt AuthMe internal teleport (LoginEvent at LOWEST)

Privacy posture preserved across all proposed changes.
2026-05-07 17:33:07 +01:00


# AUDIT — 2026-05-07 — YOU500 void-death on AuthMe restore
Reviewer: Claude (auth-limbo audit pass).

Scope: Read-only review of `src/main/java/ru/authlimbo/**` against a real
production incident on `racked.ru` at 2026-05-07 17:13:39 UTC.

Status: **Audit-only, no code changes applied.** Fixes tracked in
[ROADMAP.md](ROADMAP.md).

---
## 1. Incident
`YOU500` joined the server, was held in `auth_limbo` (correct), authenticated
to AuthMe, and was teleported back to overworld — but Paper rejected the
teleport and the player void-died with full inventory loss.
### Raw log (paper-server.log, trimmed)
```
17:13:35 YOU500[/45.157.234.219] logged in with entity id 26548
at ([auth_limbo]0.5, 128.0, 0.5)
17:13:38 [AuthMe] YOU500 logged in
17:13:39 [INFO:DEBUG] Restoring fly speed for LimboPlayer YOU500 to 0.1 (RESTORE_NO_ZERO mode)
17:13:39 [INFO:DEBUG] Teleporting `YOU500` after login, based on the player auth
17:13:39 YOU500 left the confines of this world <-- VOID DEATH
17:13:39 [AuthLimbo] Restoring YOU500 to world(2380.4, 69.9, -11358.4)
17:13:39 [AuthLimbo] teleportAsync returned false for YOU500
— Paper may have rejected the location.
```
Loss: full inventory, full xp. Privacy posture (limbo-on-join) **was not
breached** — player was authenticated before the failure. The bug is purely
the restore step.
### What happened, in order
1. AuthMe pre-login fires. `LoginListener.onAsyncPreLogin` reads
`authme.db` and schedules `addPluginChunkTicket` on `world` chunk
`(2380>>4=148, -11359>>4=-710)`. So far so good.
2. AuthMe authenticates and runs **its own** broken teleport
(`Teleporting YOU500 after login`). This is the AuthMe-fork log line, not
ours — AuthMe does a `teleportAsync` of its own with no chunk preload.
3. AuthMe's teleport partially moves the entity into `world` at the saved
coords **before the chunk is actually loaded**. The entity is now at
y=69.9 in an unloaded section. Paper's "outside loaded chunk" path
triggers and the player drops/voids — log line `left the confines of
this world` fires.
4. Our 10-tick delayed callback runs (`LoginListener.doTeleport`,
line 133). Player is "online" but already dead/spectator-on-respawn.
We log `Restoring …` and call `teleportAsync`.
5. `teleportAsync` resolves with `false` because Paper rejects the move
for a dead/transitioning entity, or because the player is no longer in
a state where a `PLUGIN`-cause teleport is accepted.
6. We log `teleportAsync returned false` and return. Player remains in
void-death state.
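
The chunk arithmetic in step 1 can be checked with a small sketch. It assumes the saved coordinate is floored to a block coordinate before shifting, as the `-11359` in step 1 suggests. `ChunkCoords` and both method names are illustrative and do not come from the plugin source; the truncating variant is shown only to make clear why flooring matters for negative coordinates.

```java
/** Illustrative helper (hypothetical name, not taken from the AuthLimbo source). */
final class ChunkCoords {
    private ChunkCoords() {}

    /** Floors a world coordinate to its block, then maps the block to its chunk index. */
    static int blockToChunk(double worldCoord) {
        int block = (int) Math.floor(worldCoord); // -11358.4 floors to -11359
        return block >> 4;                        // arithmetic shift: floor division by 16
    }

    /** Contrast case: casting and dividing rounds toward zero, which is wrong below 0. */
    static int truncatingBlockToChunk(double worldCoord) {
        return ((int) worldCoord) / 16;
    }
}
```
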
The inventory loss is not from us — it's vanilla `keepInventory=false`
behaviour on void death. We do not snapshot inventories.

---
## 2. Code path trace
| Step | File | Lines | Note |
|------|------|-------|------|
| Pre-login chunk pin | `LoginListener.java` | 78-109 | OK. Runs ~1s before login completes. |
| Login event handler | `LoginListener.java` | 113-129 | MONITOR priority, schedules `doTeleport` 10 ticks later. |
| Saved-location read | `AuthMeDatabase.java` | 68-107 | Read-only, fresh JDBC conn per call. |
| `doTeleport` | `LoginListener.java` | 133-192 | The hot path. |
| `getChunkAtAsyncUrgently` | `LoginListener.java` | 165 | Fires; on success calls teleportAsync. |
| `teleportAsync` | `LoginListener.java` | 166 | The call that returned `false`. |
| Failure branch | `LoginListener.java` | 172-175 | **Logs only.** No retry. No safety relocate. **No void-death guard.** |
| `exceptionally` branch | `LoginListener.java` | 180-185, 186-191 | Logs only. |

---
## 3. Root-cause hypothesis (ranked)
### H1 — AuthMe's own broken teleport voids the player BEFORE our handler fires *(most likely)*
The AuthMe-fork log line `Teleporting YOU500 after login, based on the
player auth` at `17:13:39` is from AuthMe-ReReloaded fork b49 itself
(`PlayerAuth.teleportOnLogin` flow). AuthMe does a teleport with **no chunk
preload** to the saved coords. In Paper 1.21.11, calling `teleportAsync` to
a location where the chunk is still not fully *loaded into the player's
view* (vs. just having a chunk-ticket) can move the entity into a section
where its block-below check returns null and the entity is treated as
out-of-world. The `left the confines of this world` line fires immediately
after, BEFORE our 10-tick delay elapses.
By the time `doTeleport` runs at 17:13:39, the player is already dead /
respawning. Paper rejects our `teleportAsync` because:
- `Player.isOnline()` returns true (still connected) — passes our guard
- but the entity is mid-respawn / dead — Paper rejects PLUGIN-cause TPs
against entities in that state ⇒ `false`.
This is consistent with all five log lines stamped 17:13:39 and with
Paper #4085's description of the race.
> Implication: our pre-login chunk-ticket and our delayed teleport are
> defending the wrong moment. AuthMe-fork's *own* teleport, which runs
> ~1 tick after `LoginEvent`, is what voids the player. We then arrive
> too late.
### H2 — The chunk ticket is added on the wrong chunk *(possible secondary)*
`onAsyncPreLogin` adds a ticket on the chunk computed from the saved
quit-location. But the player's first-time-join behaviour might use a
different teleport target (AuthMe spawn-on-first-login). For an existing
player like YOU500 this is unlikely — they have a saved row.
### H3 — `teleport-delay-ticks: 10` is too long *(secondary)*
10 ticks (~500 ms) leaves a window for AuthMe's own broken teleport to
void-kill. A delay of `0` (run immediately on LoginEvent) **and**
cancelling AuthMe's teleport would close the gap, but cancelling AuthMe's
teleport is non-trivial.
### H4 — Y=69.9 is too low for a chunk that hasn't generated/loaded *(unlikely)*
The world is the main overworld and has been visited (player logged out
there). Chunk exists on disk. Y=69.9 is normal terrain height. Not the
issue.
### H5 — Paper rejected because saved Y=69.9 is below world min height *(no)*
1.21 overworld min-Y is -64. 69.9 is fine.
**Conclusion:** H1 dominates. The fix must (a) defend the player against
void during the AuthMe-own-teleport window, and (b) recover gracefully
when our authoritative teleport's future returns `false`.

---
## 4. Proposed fixes
Ordered must-fix → defensive → nice-to-have. Implementation deferred per
project workflow (audit first, code after sign-off).
### F1 — MUST: void-damage guard while player is in "transit" *(primary fix)*
While a player is in the post-LoginEvent restore window, keep their UUID in
a `Set<UUID> pendingTransit`. On `EntityDamageEvent`, filter by:
- entity is Player, UUID in `pendingTransit`
- damage cause is `VOID`
When both match, call `event.setCancelled(true)` and immediately teleport the
player back to limbo spawn (`limboManager.spawn()`) at y=128. Then re-attempt
the authoritative teleport via `doTeleport` with a backoff.
This single guard would have saved YOU500's life and inventory.
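
A minimal sketch of the F1 filter, kept free of Bukkit types so the decision can be unit-tested on its own. Everything here is an assumption for illustration: `TransitGuard`, its method names, and the local `Cause` enum (a stand-in for `EntityDamageEvent.DamageCause`) are not from the plugin source. The real listener would call `shouldCancel` from its `EntityDamageEvent` handler and, on `true`, cancel the event and relocate the player.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the F1 void-damage filter; not the plugin's actual class. */
final class TransitGuard {
    /** Stand-in for Bukkit's damage causes; only VOID matters to the guard. */
    enum Cause { VOID, FALL, OTHER }

    // Concurrent set: the restore flow may add and remove from scheduler callbacks.
    private final Set<UUID> pendingTransit = ConcurrentHashMap.newKeySet();

    /** Call when the post-login restore window opens for a player. */
    void begin(UUID playerId) { pendingTransit.add(playerId); }

    /** Call once the authoritative teleport has succeeded or been abandoned. */
    void end(UUID playerId) { pendingTransit.remove(playerId); }

    /** True only for a player, in transit, taking VOID damage. */
    boolean shouldCancel(UUID entityId, boolean isPlayer, Cause cause) {
        return isPlayer && cause == Cause.VOID && pendingTransit.contains(entityId);
    }
}
```

Keeping the set membership tied to explicit `begin`/`end` calls means the guard cannot protect a player outside the restore window, so ordinary void deaths are unaffected.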
### F2 — MUST: when `teleportAsync` future returns `false`, recover
Right now the code at `LoginListener.java:172-175` only logs. Replace
with:
1. Player still in pendingTransit? **Yes** ⇒ teleport to
`limboManager.spawn()` synchronously (`player.teleport(...)`, not
async, since we need to land *now*).
2. Schedule one retry of `doTeleport` after 20 ticks with the same saved
location.
3. After N=3 retries, give up, leave the player at limbo spawn, and tell
them that an admin must run `/authlimbo tp` for them. Also send an admin alert.
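
The retry escalation above can be sketched as a small per-player counter. This is an assumption-laden outline, not the planned implementation: `RestoreRetryTracker`, its `Action` values, and the method names are hypothetical, and the actual teleports and scheduling are left to the caller.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical bookkeeping for the F2 recovery branch; names are illustrative. */
final class RestoreRetryTracker {
    enum Action { RETRY_AFTER_DELAY, GIVE_UP_AT_LIMBO }

    /** Delay before each retry, taken from step 2 of F2. */
    static final long RETRY_DELAY_TICKS = 20L;

    private final int maxRetries;
    private final Map<UUID, Integer> failures = new ConcurrentHashMap<>();

    RestoreRetryTracker(int maxRetries) { this.maxRetries = maxRetries; }

    /** Call each time the teleport future completes with false. */
    Action onTeleportReturnedFalse(UUID playerId) {
        int failureCount = failures.merge(playerId, 1, Integer::sum);
        if (failureCount <= maxRetries) {
            return Action.RETRY_AFTER_DELAY;   // caller schedules doTeleport again
        }
        failures.remove(playerId);             // reset so a later login starts clean
        return Action.GIVE_UP_AT_LIMBO;        // caller messages player and alerts admin
    }

    /** Call when the teleport future completes with true. */
    void onTeleportSucceeded(UUID playerId) { failures.remove(playerId); }
}
```

With `new RestoreRetryTracker(3)`, the first three failures each request a retry and the fourth gives up, which matches "after N=3 retries" in step 3.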
### F3 — MUST: pre-flight `World#getChunkAtAsync(cx, cz, true).get()` before calling teleportAsync
Today we call `getChunkAtAsyncUrgently` then chain teleport. The chain
*should* mean the chunk is loaded — but `getChunkAtAsyncUrgently` returns
the `Chunk` object as soon as it's loaded server-side, not necessarily
"ready for entity placement" with all neighbouring sections paged in.
Force the surrounding 3x3 chunks loaded via additional
`addPluginChunkTicket` on neighbours before teleporting.
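
As a rough illustration of the 3x3 idea, the helper below only enumerates the chunk positions around a centre chunk; the caller would then add a plugin chunk ticket for each one. `ChunkNeighbourhood`, `ChunkPos`, and `around` are hypothetical names, and the sketch deliberately avoids any Bukkit calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing the chunks that F3 would pin before teleporting. */
final class ChunkNeighbourhood {
    /** Plain chunk coordinate pair (chunk units, not block units). */
    record ChunkPos(int x, int z) {}

    private ChunkNeighbourhood() {}

    /** Returns every chunk within {@code radius} of the centre; radius 1 gives a 3x3. */
    static List<ChunkPos> around(int centerX, int centerZ, int radius) {
        List<ChunkPos> result = new ArrayList<>();
        for (int dx = -radius; dx <= radius; dx++) {
            for (int dz = -radius; dz <= radius; dz++) {
                result.add(new ChunkPos(centerX + dx, centerZ + dz));
            }
        }
        return result;
    }
}
```

For the incident's target chunk, `ChunkNeighbourhood.around(148, -710, 1)` yields nine positions from `(147, -711)` to `(149, -709)`. Each ticket added this way also needs a matching removal once the restore finishes, otherwise the chunks stay pinned.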
### F4 — SHOULD: cancel or pre-empt AuthMe's own teleport
AuthMe-ReReloaded fork b49 fires the broken teleport itself. Two options:
- **(a)** listen to `LoginEvent` at `LOWEST` priority too, immediately
teleport the player to limbo spawn (overriding any in-flight position),
then on `MONITOR` do our authoritative TP. Net: AuthMe's teleport is
effectively a no-op because we beat it back to limbo and run last.
- **(b)** `teleport-delay-ticks: 0` + use a `PlayerTeleportEvent` listener
to cancel any teleport with `TeleportCause.PLUGIN` whose source is the
AuthMe plugin instance, while pendingTransit is set for that UUID.
(a) is simpler and contained inside our plugin.
### F5 — SHOULD: inventory snapshot on AuthMeAsyncPreLoginEvent
Before AuthMe authenticates, snapshot the player's inventory + xp +
location into an in-memory `Map<UUID, Snapshot>` and persist to
`plugins/AuthLimbo/snapshots/<uuid>.nbt` (or a SQLite table). On
`PlayerDeathEvent` while UUID in pendingTransit, restore inventory from
the snapshot via `keepInventory`-style override (cancel drops, restore on
respawn). Discard snapshot 30 s after successful TP.
This is a belt-and-braces defence: even if all chunk logic fails, no
inventory is lost on an auth-flow death.
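
The in-memory half of F5 could look roughly like the store below. It is a sketch under several assumptions: `SnapshotStore` and its methods are hypothetical, the snapshot type is left generic because the audit does not define a `Snapshot` class, time is passed in explicitly so the 30 s discard rule is testable, and persistence to disk is omitted entirely.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory snapshot store for F5; the disk copy is not modelled. */
final class SnapshotStore<T> {
    /** Snapshots are discarded this long after a successful teleport (30 s). */
    static final long DISCARD_AFTER_SUCCESS_MILLIS = 30_000L;

    private record Entry<S>(S snapshot, long expiresAtMillis) {}

    private final Map<UUID, Entry<T>> entries = new ConcurrentHashMap<>();

    /** Stores a snapshot with no expiry until the restore is known to have succeeded. */
    void capture(UUID playerId, T snapshot) {
        Objects.requireNonNull(snapshot, "snapshot");
        entries.put(playerId, new Entry<>(snapshot, Long.MAX_VALUE));
    }

    /** Starts the discard countdown after a successful authoritative teleport. */
    void markRestored(UUID playerId, long nowMillis) {
        entries.computeIfPresent(playerId, (id, entry) ->
                new Entry<>(entry.snapshot(), nowMillis + DISCARD_AFTER_SUCCESS_MILLIS));
    }

    /** Returns the snapshot if one exists and has not expired; expired entries are dropped. */
    Optional<T> lookup(UUID playerId, long nowMillis) {
        Entry<T> entry = entries.get(playerId);
        if (entry == null) {
            return Optional.empty();
        }
        if (nowMillis >= entry.expiresAtMillis()) {
            entries.remove(playerId);
            return Optional.empty();
        }
        return Optional.of(entry.snapshot());
    }
}
```

A death handler would call `lookup` with the current time and, if a snapshot is present, suppress drops and restore from it on respawn.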
### F6 — NICE: spectator-mode fallback
If F1-F4 all fail and the player is still in void state after N retries,
set `GameMode.SPECTATOR`, teleport to overworld spawn (server world's
default spawn), and send admin a Discord/console alert: "AuthLimbo could
not restore YOU500 — manual `/authlimbo tp YOU500` required". The
spectator mode prevents further damage and lets the player observe the
world while admin acts.
### F7 — NICE: telemetry
Bump a counter for each restore outcome (success/fail/retry) and expose
the totals via `/authlimbo stats` for ops visibility.
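
One possible shape for these counters is sketched here. `RestoreStats`, `Outcome`, and the summary format are assumptions made for illustration; the audit does not specify what `/authlimbo stats` prints. `LongAdder` is used so increments from different threads do not contend.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counters behind an "/authlimbo stats" style command. */
final class RestoreStats {
    enum Outcome { SUCCESS, FAIL, RETRY }

    // Populated once in the constructor and never structurally modified afterwards.
    private final Map<Outcome, LongAdder> counters = new EnumMap<>(Outcome.class);

    RestoreStats() {
        for (Outcome outcome : Outcome.values()) {
            counters.put(outcome, new LongAdder());
        }
    }

    /** Adds one to the counter for the given outcome. */
    void increment(Outcome outcome) { counters.get(outcome).increment(); }

    /** Current total for the given outcome. */
    long count(Outcome outcome) { return counters.get(outcome).sum(); }

    /** One-line summary suitable for a command reply or a log line. */
    String summary() {
        return "success=" + count(Outcome.SUCCESS)
                + " fail=" + count(Outcome.FAIL)
                + " retry=" + count(Outcome.RETRY);
    }
}
```
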
---
## 5. Test plan
Reproducible in a dev Paper 1.21.11 server with AuthMe-ReReloaded:
1. **Unloaded-chunk void.** Set saved coord to (10000, 70, 10000) in
`authme.db` for a test account. Restart server (chunks unload). Login.
Expect: void-damage guard cancels VOID damage, player lands at saved
coords or at limbo spawn for retry.
2. **Invalid Y (above build limit).** Set saved Y to 5000. Login. Expect:
`teleportAsync` returns false, recovery branch teleports to limbo
spawn, retry escalation works.
3. **World no longer loaded.** Set saved world to a string that no longer
exists (e.g. `world_old`). Login. Expect: graceful fallback to
overworld spawn, admin notified.
4. **Death during transit.** Force damage with cause
`EntityDamageEvent.DamageCause.VOID` via a debug command while the player
is mid-restore. Expect: damage cancelled, player relocated to limbo spawn,
restore retried.
5. **Snapshot/restore on death.** With F5 implemented, kill the player
during transit. Expect: respawn with full inventory + xp.
6. **AuthMe pre-empt.** With F4(a), watch logs — AuthMe's teleport line
fires but the position is immediately overwritten by our limbo TP at
LOWEST, then by our authoritative TP at MONITOR.
All tests run on a clean dev server, not racked.ru production.

---
## 6. Privacy posture — unchanged
None of the proposed fixes weaken the limbo-on-join privacy property:
- F1 keeps the player in limbo *longer* on damage, never exposes them
to the overworld pre-auth.
- F4(a) actually *strengthens* it by guaranteeing the limbo position is
reasserted at LOGIN-LOWEST.
- F5 stores inventory snapshots locally (server-side, plugin folder) —
no new network exposure.
---
## 7. Sign-off
Audit author: Claude (auth-limbo plugin audit pass)
Date: 2026-05-07
Recommended next action: review ROADMAP.md, approve F1+F2 for first
implementation pass, F3+F4 for second pass.