# Interim Mitigations — 2026-05-07

Server-level temporary workarounds applied while permanent fixes are pending.
Each item lists its **revert trigger** so we don't carry these forever.

---

## H1 — `gamerule keepInventory true` (server-wide)

**Status:** **NOT YET APPLIED LIVE.** The `gamerule` command is unreachable
via the current RCON path. Every variant attempted (`gamerule keepInventory
true`, `minecraft:gamerule …`, `execute in minecraft:overworld run gamerule
…`, lowercase, no value) returned `Incorrect argument for command` from
Paper's command parser, and none of them appears as a "Rcon issued
server command" line in `/data/logs/latest.log`. This matches AUDIT-2026-05-07
finding **F-16**: the interaction between rcon-cli quoting and Paper 1.21.11's
Brigadier parser appears to swallow the gamerule command client-side.


**Why:** Until AuthLimbo F1 (void-damage guard) and F2 (`teleportAsync` retry)
ship in production, ANY login race that void-kills a transiting player
results in full inventory and XP loss (see the YOU500 incident, 2026-05-07
17:13:39 BST). Server-wide `keepInventory=true` is a blunt but sound safety
net during the gap. Trade-off: it removes the survival death penalty
everywhere, not just on auth-flow deaths.


**To apply (operator action required, in-game):**

1. Op-login as `s8n` (or any level-4 op).
2. In chat, run: `/gamerule keepInventory true`
3. Verify: `/gamerule keepInventory` should report the value as `true`.
4. Note the date in this file under "Applied".


**Applied:** _pending. Deferred to the operator while the RCON gamerule path
is broken (see F-16). Ask s8n to run it the next time they are logged in.
They were online 2026-05-07 17:47 BST restoring YOU500's gear, which was the
ideal moment and was missed; do it on the next op session._


**Revert trigger (drop this safety net):**

When **AuthLimbo 1.1.0** is deployed with **all of**:
- F1 (void-damage guard for `pendingTransit` UUIDs)
- F2 (post-`teleportAsync==false` recovery: snap to limbo spawn + retry)
- F4 (pre-empt AuthMe's own broken teleport at LOGIN-LOWEST)

AND those fixes have been observed handling at least one production
void-death race correctly (look for `[AuthLimbo] void-damage cancelled for
<uuid>` or `teleportAsync recovered after retry` lines in `latest.log`).

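The log-evidence half of this trigger could be checked with a small grep wrapper. This is only a sketch: the function name `authlimbo_recovery_hits` is hypothetical, the two marker strings are copied from the trigger text above, and the exact wording AuthLimbo 1.1.0 will log is an assumption until the plugin ships.

```bash
# Sketch: count AuthLimbo recovery markers in a server log file.
# authlimbo_recovery_hits is a hypothetical helper name; the marker strings
# are taken from the revert trigger text and may differ in the real plugin.
authlimbo_recovery_hits() {
  local log_file="$1"
  # -c prints the number of matching lines; -E enables the "|" alternation.
  # grep exits non-zero when the count is 0, but still prints "0".
  grep -cE 'void-damage cancelled for|teleportAsync recovered after retry' "$log_file"
}
```

Called as `authlimbo_recovery_hits /data/logs/latest.log` (inside the container), it prints the number of matching lines. A count of 0 means no handled race has been logged in that file yet.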

**Revert command (in-game):** `/gamerule keepInventory false`

**Cross-reference:**
- Audit: `/home/admin/ai-lab/_github/minecraft-server/AUDIT-2026-05-07.md` F-02, F-16
- Plugin audit: `/home/admin/ai-lab/_github/auth-limbo/AUDIT-2026-05-07.md` F1, F2, F4
- Plugin roadmap: `/home/admin/ai-lab/_github/auth-limbo/ROADMAP.md`

---

## H2 — Container capability hardening (compose)

**Status:** Applied to compose file 2026-05-07. **NOT yet applied to the
running container:** the change goes live on the next
`docker compose up -d --force-recreate`.

**Reason for deferral:** 2 players were online (s8n + YOU500) at the time of
the edit, and the operator was actively restoring inventory via `/give`. The
restart was deferred to avoid a second incident on the same player on the
same day.


**Restart command (when a window opens: no players online, or after an announcement):**
```bash
ssh user@192.168.0.100 'docker compose -f /opt/docker/minecraft/docker-compose.yml down && docker compose -f /opt/docker/minecraft/docker-compose.yml up -d'
```


**Post-restart verification:**
```bash
# 1. Container came up healthy:
docker ps --filter name=minecraft-mc --format '{{.Status}}'
# Expected: "Up X seconds (healthy)". Wait 4-5 min for the healthcheck.

# 2. itzg entrypoint did its chowns successfully (ideally no matching lines):
docker logs minecraft-mc 2>&1 | grep -iE "(error|denied|cannot)" | head

# 3. RCON still reachable:
echo "list" | docker exec -i minecraft-mc rcon-cli
```

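Because the healthcheck can take several minutes, step 1 could be wrapped in a polling loop instead of being re-run by hand. A rough sketch, assuming the container defines a Docker healthcheck; the helper name `wait_healthy` and its defaults (60 checks, 5 s apart) are illustrative and not part of the existing tooling.

```bash
# Sketch: poll a container's healthcheck until it reports "healthy".
# wait_healthy and its defaults are assumptions made for illustration.
wait_healthy() {
  local name="$1"
  local tries="${2:-60}"   # number of checks before giving up
  local delay="${3:-5}"    # seconds between checks
  local status=""
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # .State.Health.Status is starting, healthy, or unhealthy.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      echo "healthy"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $tries checks (last status: ${status:-unknown})" >&2
  return 1
}
```

For example, `wait_healthy minecraft-mc && echo "list" | docker exec -i minecraft-mc rcon-cli` would only attempt the RCON check once the container reports healthy.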

**If the container fails to start** (most likely cause: a missing capability):
1. Check the logs for `chown: ... Operation not permitted` -> confirm `CHOWN` is in `cap_add`, then add `DAC_OVERRIDE`.
2. Check for `setuid` / `setgid` errors -> these are already in `cap_add`, but verify the spelling.
3. Roll back: `cp /opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3 /opt/docker/minecraft/docker-compose.yml && docker compose -f /opt/docker/minecraft/docker-compose.yml up -d`.


**No revert trigger:** this is a permanent hardening, not a workaround.

---

## H3 — JVM Xmx lowered 16384M → 14336M (compose)

**Status:** Applied to compose file 2026-05-07. **NOT yet applied to the
running container:** the change goes live on the same restart that
activates H2.


**Reason:** AUDIT-2026-05-07 F-05. The original `-Xmx16384M` inside an 18 GB
container leaves <2 GB headroom for off-heap memory (Netty buffers, native
mmaps, plugin metadata). With 25 plugins on Aikar's G1 flags, native memory
regularly sits 2-3 GB above heap. A player surge that pushes G1 to its full
16 GB ceiling results in a silent kernel OOM kill of the container.


**Decision:** Lower Xmx (to 14 GB); do NOT raise the container limit. The
host has 31 GB RAM total with ~13 GB free at edit time, but nullstone runs
other Docker workloads (matrix, rocketchat, traefik, forgejo, etc.) and the
18 GB budget for MC was already aggressive. New layout: 14 GB heap + ~3.5 GB
native + 0.5 GB direct buffers, about 18 GB in total, which fits within the
container limit.

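The headroom arithmetic behind this decision can be reproduced in a few lines. This is a sketch under one assumption: the 18 GB container limit is treated as 18432 MiB. `headroom_mib` is a hypothetical helper. The raw subtraction is an upper bound, since the JVM's own non-heap overhead also comes out of this figure.

```bash
# Sketch: off-heap headroom left inside the container for a given -Xmx.
# headroom_mib is a hypothetical helper; 18432 MiB (18 GiB) is an assumed
# reading of the "18 GB" container limit.
headroom_mib() {
  local container_mib="$1"
  local xmx_mib="$2"
  echo $((container_mib - xmx_mib))
}

headroom_mib 18432 16384   # old -Xmx16384M: prints 2048
headroom_mib 18432 14336   # new -Xmx14336M: prints 4096
```

The new setting leaves 4096 MiB for off-heap use, which covers the 2-3 GB of native memory observed above heap.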

**No revert trigger:** permanent. If TPS regresses under load due to
heap pressure, raise Xmx in 1 GB steps and re-evaluate; don't blanket-revert.

---

## H4 — Compose backups (defence-in-depth)

**Status:** Applied 2026-05-07.

**Files saved:**
- Deployed: `/opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3`
- Repo: `/home/admin/ai-lab/_github/minecraft-server/docker-compose.yml.bak-2026-05-07-before-H2H3`


**Restore commands (if H2/H3 prove broken after restart):**
```bash
# Deployed (revert + restart):
ssh user@192.168.0.100 'cp /opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3 /opt/docker/minecraft/docker-compose.yml && docker compose -f /opt/docker/minecraft/docker-compose.yml up -d --force-recreate'

# Repo:
cp /home/admin/ai-lab/_github/minecraft-server/docker-compose.yml.bak-2026-05-07-before-H2H3 /home/admin/ai-lab/_github/minecraft-server/docker-compose.yml
```


**Backup retention:** keep both `.bak-2026-05-07-before-H2H3` files until
the post-restart verification has been signed off (i.e. one full day of
healthy uptime under load).

---

## Index of applied measures

| ID | Status                  | Applied (live)                                   | Reverts when                                               |
|----|-------------------------|--------------------------------------------------|------------------------------------------------------------|
| H1 | Pending operator action | NO (deferred to operator: F-16 RCON path broken) | AuthLimbo 1.1.0 (F1+F2+F4) ships and proves itself in prod |
| H2 | Compose edits saved     | NO (next restart)                                | never (permanent hardening)                                |
| H3 | Compose edits saved     | NO (next restart)                                | never (permanent)                                          |
| H4 | Backups created         | YES                                              | after H2/H3 prove healthy                                  |