minecraft-server/INTERIM-MITIGATIONS.md
s8n 2d9c8db2dc audit P0 quick-wins: H2 container hardening, H3 Xmx tuning, H1 staged
H2 (F-06): cap_drop ALL + minimum cap_add (CHOWN, SETUID, SETGID, FOWNER),
no-new-privileges, deploy.resources.limits.pids=4096. compose config valid.
DAC_OVERRIDE deliberately omitted; re-add only if entrypoint chown fails.

H3 (F-05): Xmx 16384M -> 14336M, MEMORY_SIZE 16G -> 14G. Leaves ~3.5G
headroom for off-heap inside the unchanged 18G container limit. Host has
no spare RAM to raise the cap (other workloads).

H1 (F-02): server-wide gamerule keepInventory true planned but RCON path
for gamerule is broken (F-16) so it's deferred to operator in-game on next
op session. Documented in INTERIM-MITIGATIONS.md with a clear revert
trigger (when AuthLimbo F1+F2+F4 ship).

H4: pre-edit compose backed up to docker-compose.yml.bak-2026-05-07-before-H2H3
(deployed and repo). Restore commands in INTERIM-MITIGATIONS.md.

Live restart deferred: 2 players online (s8n actively restoring YOU500's
gear via /give). H2/H3 go live on next compose recreate.
2026-05-07 17:51:58 +01:00


Interim Mitigations — 2026-05-07

Server-level temporary workarounds applied while permanent fixes are pending. Each item lists its revert trigger so we don't carry these forever.


H1 — gamerule keepInventory true (server-wide)

Status: NOT YET APPLIED LIVE. The gamerule command is unreachable via the current RCON path — every variant attempted (gamerule keepInventory true, minecraft:gamerule …, execute in minecraft:overworld run gamerule …, lowercase, no value) returned Incorrect argument for command from Paper's command parser, and the command never appears as a "Rcon issued server command" line in /data/logs/latest.log. This matches AUDIT-2026-05-07 finding F-16: rcon-cli quoting / Paper 1.21.11 brigadier interaction appears to swallow the gamerule command client-side.
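The log check described above can be sketched as a small shell helper. The log path is the one named in this document; the function name `rcon_gamerule_hits` is hypothetical, and the exact text that follows "Rcon issued server command" is an assumption, so the match is kept deliberately loose.

```shell
# Count log lines showing that RCON actually dispatched a gamerule command.
# Assumption: such lines contain both "Rcon issued server command" and "gamerule".
rcon_gamerule_hits() {
    log_file=$1
    grep 'Rcon issued server command' "$log_file" | grep -ci 'gamerule'
}

# Example (run inside the container, path taken from this document):
# rcon_gamerule_hits /data/logs/latest.log
```

A count of 0 after an attempt over RCON is consistent with the F-16 symptom; a non-zero count would suggest the command did reach the server.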

Why: Until AuthLimbo F1 (void-damage guard) and F2 (teleportAsync retry) ship in production, ANY login race that void-kills a transiting player results in full inventory + xp loss (see YOU500 incident, 2026-05-07 17:13:39 BST). keepInventory=true server-wide is a blunt but sound safety net during the gap. Trade-off: removes survival death penalty everywhere, not just on auth-flow deaths.

To apply (operator action required, in-game):

  1. Op-login as s8n (or any op with permission level 4).
  2. In chat, run: /gamerule keepInventory true
  3. Verify: /gamerule keepInventory should reply keepInventory is set to true
  4. Note the date in this file under "Applied".

Applied: pending — deferred to operator while RCON gamerule path is broken (see F-16). Ask s8n to run it next time they're logged in. They were online 2026-05-07 17:47 BST restoring YOU500's gear — ideal moment missed; do it on next op session.

Revert trigger (drop this safety net):

When AuthLimbo 1.1.0 is deployed with all of:

  • F1 (void-damage guard for pendingTransit UUIDs)
  • F2 (post-teleportAsync==false recovery: snap to limbo spawn + retry)
  • F4 (pre-empt AuthMe's own broken teleport at LOGIN-LOWEST)

…AND those have been observed handling at least one production void-death race correctly (look for [AuthLimbo] void-damage cancelled for <uuid> or teleportAsync recovered after retry lines in latest.log).
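One way to check the "observed in production" condition is to search the server log for the two marker strings quoted above. This is a minimal sketch: the function name `authlimbo_guard_observed` is hypothetical, and it assumes AuthLimbo 1.1.0 emits those strings verbatim.

```shell
# Succeeds (exit 0) if the log shows AuthLimbo handling a void-death race.
# The two marker strings are the ones quoted in this document.
authlimbo_guard_observed() {
    log_file=$1
    grep -Eq 'void-damage cancelled for|teleportAsync recovered after retry' "$log_file"
}

# Example (inside the container):
# authlimbo_guard_observed /data/logs/latest.log && echo "revert condition met"
```

Because latest.log rotates, the same check may need to be run against archived logs as well before concluding that no race has been handled yet.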

Revert command (in-game): /gamerule keepInventory false

Cross-reference:

  • Audit: /home/admin/ai-lab/_github/minecraft-server/AUDIT-2026-05-07.md F-02, F-16
  • Plugin audit: /home/admin/ai-lab/_github/auth-limbo/AUDIT-2026-05-07.md F1, F2, F4
  • Plugin roadmap: /home/admin/ai-lab/_github/auth-limbo/ROADMAP.md

H2 — Container capability hardening (compose)

Status: Applied to compose file 2026-05-07. NOT yet applied to running container — change goes live on next docker compose up -d --force-recreate.

Reason for deferral: 2 players online (s8n + YOU500) at the time of edit; operator was actively restoring inventory via /give. Restart deferred to avoid a second incident on the same player on the same day.

Restart command (when window opens, no players online or with announcement):

ssh user@192.168.0.100 'docker compose -f /opt/docker/minecraft/docker-compose.yml down && docker compose -f /opt/docker/minecraft/docker-compose.yml up -d'
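Before running the restart, the "no players online" condition can be checked from the reply to the RCON list command. The helper below is a sketch: `online_count` is a hypothetical name, and it assumes the reply uses the vanilla wording "There are N of a max of M players online"; a plugin that overrides /list would need a different pattern.

```shell
# Extract the online-player count from the reply to the RCON "list" command.
# Assumption: vanilla wording "There are N of a max of M players online: ...".
online_count() {
    printf '%s\n' "$1" | sed -n 's/^There are \([0-9][0-9]*\) of a max.*/\1/p'
}

# Example: only restart when nobody is online.
# reply=$(echo "list" | docker exec -i minecraft-mc rcon-cli)
# [ "$(online_count "$reply")" = "0" ] && echo "safe to restart"
```

If the reply does not match the assumed wording, the function prints nothing, so the comparison with "0" fails and the restart is not treated as safe.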

Post-restart verification:

# 1. Container came up healthy:
docker ps --filter name=minecraft-mc --format '{{.Status}}'
# Expected: "Up X seconds (healthy)" — wait 4-5 min for healthcheck.

# 2. itzg entrypoint did its chowns successfully:
docker logs minecraft-mc 2>&1 | grep -iE "(error|denied|cannot)" | head

# 3. RCON still reachable:
echo "list" | docker exec -i minecraft-mc rcon-cli
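The three checks above confirm the container is up, but not that the H2 settings are in effect. A fourth check could compare the container's effective capability list with the intended one. This is a sketch under assumptions: `normalise_caps` is a hypothetical helper, and it assumes `docker inspect` reports the list in the form `[CHOWN SETUID ...]`, possibly with a `CAP_` prefix.

```shell
# Normalise a capability list such as "[CHOWN CAP_SETUID setgid]" to sorted,
# space-separated, upper-case names without the CAP_ prefix.
normalise_caps() {
    printf '%s\n' "$1" \
        | tr -d '[]' \
        | tr ' ' '\n' \
        | tr '[:lower:]' '[:upper:]' \
        | sed -e '/^$/d' -e 's/^CAP_//' \
        | sort \
        | paste -sd' ' -
}

# 4. Hardening is in effect (expected cap_add from H2, sorted):
# actual=$(docker inspect --format '{{.HostConfig.CapAdd}}' minecraft-mc)
# [ "$(normalise_caps "$actual")" = "CHOWN FOWNER SETGID SETUID" ] && echo "cap_add OK"
# docker inspect --format '{{.HostConfig.CapDrop}} {{.HostConfig.SecurityOpt}} {{.HostConfig.PidsLimit}}' minecraft-mc
```

Sorting both sides makes the comparison independent of the order in which the capabilities appear in the compose file.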

If the container fails to start (most likely cause: missing capability):

  1. Check logs for chown: ... Operation not permitted -> add DAC_OVERRIDE to cap_add.
  2. Check for setuid / setgid errors -> SETUID and SETGID are already in cap_add, so verify their spelling.
  3. Roll back: cp /opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3 /opt/docker/minecraft/docker-compose.yml && docker compose -f /opt/docker/minecraft/docker-compose.yml up -d.

No revert trigger — this is a permanent hardening, not a workaround.


H3 — JVM Xmx lowered 16384M → 14336M (compose)

Status: Applied to compose file 2026-05-07. NOT yet applied to running container — change goes live on the same restart that activates H2.
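After that restart, whether the lower heap is actually live can be read from the Java command line of the running process. The helper below is a sketch: `extract_xmx` is a hypothetical name, and it assumes the JVM was started with an explicit -Xmx flag that is visible in the process list.

```shell
# Print the first -Xmx flag found in a process listing or command line.
extract_xmx() {
    printf '%s\n' "$1" | grep -oE -- '-Xmx[0-9]+[kKmMgG]?' | head -n 1
}

# Example: expect -Xmx14336M once H3 is live.
# extract_xmx "$(docker top minecraft-mc)"
```

An empty result means no -Xmx flag was found in the listing, which should be treated as "unknown" rather than as proof that the old value is still active.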

Reason: AUDIT-2026-05-07 F-05: the original -Xmx16384M inside an 18 GB container leaves at most 2 GB of headroom for off-heap memory (Netty buffers, native mmaps, plugin metadata). With 25 plugins on Aikar G1 flags, native memory regularly sits 2-3 GB above heap, so a player surge that pushes G1 to its full 16 GB ceiling can exceed the container limit and end in a silent kernel OOM kill of the container.

Decision: Lower Xmx (14 GB), do NOT raise the container limit. Host has 31 GB RAM total with ~13 GB free at edit time, but nullstone runs other docker workloads (matrix, rocketchat, traefik, forgejo, etc.) and the 18 GB budget for MC was already aggressive. New layout: the 14 GB heap leaves 4 GB of the 18 GB limit for off-heap use, budgeted as ~3.5 GB native + 0.5 GB direct buffers. That covers the 2-3 GB of native usage typically observed, but the budget adds up to exactly 18 GB, so there is no slack beyond it.
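The headroom arithmetic behind this decision can be written out as a short calculation. It is a sketch that assumes the 18 GB container limit means 18 GiB (18432 MiB); `offheap_headroom_mb` is a hypothetical helper name.

```shell
# Off-heap headroom in MiB = container memory limit minus JVM max heap.
offheap_headroom_mb() {
    limit_mb=$1
    xmx_mb=$2
    echo $(( limit_mb - xmx_mb ))
}

offheap_headroom_mb 18432 16384   # before H3: prints 2048
offheap_headroom_mb 18432 14336   # after H3:  prints 4096
```

The same function can be reused if Xmx is later raised in 1 GB steps, to see how much off-heap room each step gives up.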

No revert trigger — permanent. If TPS regresses under load due to heap pressure, raise Xmx in 1 GB steps and re-evaluate; don't blanket-revert.


H4 — Compose backups (defence-in-depth)

Status: Applied 2026-05-07.

Files saved:

  • Deployed: /opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3
  • Repo: /home/admin/ai-lab/_github/minecraft-server/docker-compose.yml.bak-2026-05-07-before-H2H3

Restore commands (if H2/H3 prove broken after restart):

# Deployed (revert + restart):
ssh user@192.168.0.100 'cp /opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3 /opt/docker/minecraft/docker-compose.yml && docker compose -f /opt/docker/minecraft/docker-compose.yml up -d --force-recreate'

# Repo:
cp /home/admin/ai-lab/_github/minecraft-server/docker-compose.yml.bak-2026-05-07-before-H2H3 /home/admin/ai-lab/_github/minecraft-server/docker-compose.yml
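A slightly more defensive variant of the restore commands above refuses to overwrite the live compose file when the backup is missing or empty. This is a sketch, not part of the deployed tooling; `restore_compose` is a hypothetical helper name.

```shell
# Copy a backup over the target only if the backup exists and is non-empty.
restore_compose() {
    backup=$1
    target=$2
    if [ ! -s "$backup" ]; then
        echo "backup missing or empty: $backup" >&2
        return 1
    fi
    cp -p -- "$backup" "$target"
}

# Example (deployed copy, paths from this document):
# restore_compose /opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3 \
#                 /opt/docker/minecraft/docker-compose.yml
```

The container still has to be recreated afterwards for the restored file to take effect.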

Backup retention: keep both .bak-2026-05-07-before-H2H3 files until the post-restart verification has been signed off (i.e. one full day of healthy uptime under load).


Index of applied measures

| ID | Status | Applied (live) | Reverts when |
|----|--------|----------------|--------------|
| H1 | Documented only (gamerule, no compose change) | NO (deferred to operator: F-16 RCON path broken) | AuthLimbo 1.1.0 (F1+F2+F4) ships and proves itself in prod |
| H2 | Compose edits saved | NO (next restart) | never (permanent hardening) |
| H3 | Compose edits saved | NO (next restart) | never (permanent) |
| H4 | Backups created | YES | backups removed after H2/H3 prove healthy |