Remediation status (2026-05-07, post-audit):
- H2 (F-06): `cap_drop: [ALL]` + minimum `cap_add` (CHOWN, SETUID, SETGID, FOWNER), `no-new-privileges`, `deploy.resources.limits.pids: 4096`. `compose config` validates clean. DAC_OVERRIDE deliberately omitted; re-add only if the entrypoint chown fails.
- H3 (F-05): `-Xmx` 16384M -> 14336M, `MEMORY_SIZE` 16G -> 14G. Leaves ~3.5G headroom for off-heap inside the unchanged 18G container limit. The host has no spare RAM to raise the cap (other workloads need it).
- H1 (F-02): server-wide `gamerule keepInventory true` planned, but the RCON path for `gamerule` is broken (F-16), so it is deferred to an operator in-game at the next op session. Documented in INTERIM-MITIGATIONS.md with a clear revert trigger (when AuthLimbo F1+F2+F4 ship).
- H4: pre-edit compose backed up to `docker-compose.yml.bak-2026-05-07-before-H2H3` (deployed and repo). Restore commands are in INTERIM-MITIGATIONS.md. Live restart deferred: 2 players online (s8n actively restoring YOU500's gear via `/give`). H2/H3 go live on the next compose recreate.
# Minecraft Server Audit — racked.ru
Container: minecraft-mc on nullstone (192.168.0.100)
Date: 2026-05-07
Audit type: Operational / data-integrity (NOT a network-security audit)
Auditor: Claude (Opus 4.7) via SSH read-only inspection
Catalyst: Player YOU500 void-died at login (~17:13:39 BST), inventory lost. No usable backup existed.
## Executive Summary
Status: Critical issues found. Risk score model: Likelihood (1-5) x Impact (1-5) = 1-25. >=15 = High, >=20 = Critical.
A live AuthLimbo `teleportAsync returned false` warning fired during YOU500's first login of the day, immediately after "YOU500 left the confines of this world" (a void death in the auth_limbo world). The player retried twice; on the third login they were teleported to (-264.6, 86, -49.8) and 23 seconds later were blown up by a creeper. Console operator (s8n) attempted recovery via RCON, but neither the void death nor the creeper death had item-restore data because:
- No working backups. `/opt/docker/backup.sh` deployed on nullstone is a stale 88-line copy missing the entire Minecraft block. The repo version (`scripts/backup.sh`) has the block but was never deployed. The daily 02:00 cron has been running for at least 7 days, producing 8-12K archives that contain no world / playerdata / plugins. `BACKUP.md` claims the script handles MC; it does not.
- CoreProtect tracks inventory transactions but not death drops. `co inspect` will not surface "dropped on death" entries the way it does pickup/drop, and even if it did, the 1.5 GB SQLite blob is approaching the point where `/co rollback` over an inventory radius is operationally slow.
- No `keepInventory` rule, no death-drop rescue plugin. With `difficulty=hard`, `gamemode=survival`, and no Essentials `keepinv` permission flow visible, every death is a total loss.
- AuthLimbo has no death listener and no failure remediation. When `teleportAsync` returns false, the player is dropped at limbo spawn and the warning is logged at WARN level only: no alert, no rollback, no temporary stash of the inventory.
- JVM heap is sized larger than the container limit can absorb: `JVM_OPTS=-Xmx16384M` inside an `18G` container limit with `MEMORY_SIZE=16G`. If the Aikar G1 heap actually grows to Xmx and off-heap (Netty, mmaps, zip cache) exceeds 2 GB, the kernel OOM-kills the container. Restart-on-OOM has no warning hook to Discord/Matrix. A quick host-side check is sketched after this list.
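A quick host-side sanity check for the mismatch (a sketch; it assumes the container is named minecraft-mc and that the itzg-style image exposes `JVM_OPTS` in its environment):

```bash
# Compare the container memory limit against the configured JVM heap (F-05).
LIMIT=$(docker inspect -f '{{.HostConfig.Memory}}' minecraft-mc)   # bytes; 0 = unlimited
XMX=$(docker exec minecraft-mc sh -c 'echo "$JVM_OPTS"' | grep -o 'Xmx[0-9]*[MG]')
echo "container limit: $((LIMIT / 1024 / 1024)) MiB, heap: ${XMX#Xmx}"
# Rule of thumb (our assumption, not a Paper guarantee): keep limit minus Xmx
# at 2-4 GiB for Netty buffers, metaspace, mmapped region files and the zip cache.
```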
### Three biggest exposures
- Backups silently broken for 7+ days. (Critical — 5x4=20)
- No item-loss safety net for any cause of death. (Critical — 4x5=20)
- AuthLimbo failure path has no recovery. (High — 4x4=16)
## Findings Table
Severity = Likelihood x Impact. P0 = act this week, P1 = this month, P2 = this quarter.
| ID | Severity | Finding | Recommendation | Effort |
|---|---|---|---|---|
| F-01 | P0 / 20 | `/opt/docker/backup.sh` on nullstone is missing the entire MC backup block. The repo `scripts/backup.sh` has it but was never deployed. Daily backups since 2026-04-30 are 8-12K (effectively empty). | Sync the deployed script with the repo, run a manual backup, verify the world tarball is >= 5 GB. Add a sentinel check to backup.sh that fails the run if `mc-world-backup-*.tar.gz` is < 1 GB. | 30 min |
| F-02 | P0 / 20 | No keepInventory rule and no essentials.keepinv permission. Every death is a total loss. | Decide policy: (a) `gamerule keepInventory true` server-wide, (b) keep-inv only when the death cause is "void" / "plugin teleport", or (c) auto-restore on AuthLimbo failure. The narrow option (b) preserves survival pain while plugging the AuthLimbo data-loss vector. Plugin candidates: KeepInventoryOnVoid, DeathChestPro, a custom listener in AuthLimbo. | 1-2h research, 1d implement |
| F-03 | P0 / 18 | AuthLimbo logs WARN on teleport failure but has no alerting or recovery. The player is left at limbo spawn (y128 platform), re-disconnects, and on retry is teleported normally, but the warning never reaches an operator. | (a) Bump `teleportAsync returned false` to ERROR. (b) Add a Discord/Matrix webhook alert via the existing webhook stack. (c) On failure: snapshot the player inventory, kick with a friendly message, write a recovery file `auth_limbo/incident-<uuid>-<ts>.dat` for ops replay. | 1d |
| F-04 | P0 / 18 | YOU500's first failed teleport target was (2380.4, 69.9, -11358.4): 11k blocks out, and the chunk was likely not loaded yet. AuthLimbo's `preload-chunks: true` setting fires on AuthMeAsyncPreLoginEvent, which may not run before LoginEvent in HaHaWTH's AuthMe fork. The exact timing race is unverified. | Add a chunk-loaded assertion in AuthLimbo before calling `teleportAsync`; if not loaded, force-load synchronously OR delay the teleport another 10-20 ticks. Add debug logging of the chunk-load state to the WARN line. | 0.5d |
| F-05 | P0 / 16 | JVM `-Xmx16384M` inside container `mem_limit=18G` leaves no headroom for off-heap (Netty buffers, native mmaps, mod metadata). Aikar flags + 25 plugins easily push native memory to 2-3 GB. A kernel OOM kill is silent. | Either (a) lower `-Xmx` to 12-14 GB and consider a MaxRAMPercentage-style flag, OR (b) raise `mem_limit` to 24 GB. Also add `oom_score_adj` and a `docker events --filter event=oom` watcher that pings Discord. | 1h config + 2h alerting |
| F-06 | P0 / 16 | No `pids_limit`, no `cap_drop: ALL`, no `read_only: true`. The container runs with the default Docker capability set (CAP_NET_RAW, CAP_SYS_CHROOT, etc.) that it does not need. | Add `cap_drop: [ALL]`, `cap_add: [NET_BIND_SERVICE]` (only if binding below 1024; 25565 is high, so likely none), `pids_limit: 4096`, `security_opt: [no-new-privileges:true]`. Test boot, watch for startup failures. | 1h test |
| F-07 | P1 / 15 | CoreProtect SQLite at 1.5 GB. Performance and reliability degrade past 2-3 GB. `database.db` is the only copy; no WAL checkpoint or vacuum schedule. | (a) Migrate to a MySQL/MariaDB sidecar container. (b) Add a monthly cron `co purge t:30d` (purge entries older than 30 days; per CoreProtect docs). (c) Schedule a VACUUM after the purge. | 1d for MySQL migration, 1h for purge cron |
| F-08 | P1 / 12 | AuthMe is still on `passwordHash: SHA256` (legacy). The SHA256 -> BCRYPT migration plan is on the TODO list and still pending. | Set `legacyHashes: [SHA256]` and `passwordHash: BCRYPT`. AuthMe re-hashes on the next successful login. Communicate "your password works as before, no action needed". | 30 min config + monitoring |
| F-09 | P1 / 12 | `online-mode=false`. The server depends entirely on AuthMe + EpicGuard for identity. The EpicGuard config was not audited in this pass. | Verify that `enableProtection: false` in AuthMe (currently false) is intentional, since geofencing is US, GB, LOCALHOST only: any user from another country is locked out if protection is re-enabled. Document the choice in RULES.md. | 1h doc only |
| F-10 | P1 / 12 | `auto-save-interval: 2400` (= 2 minutes at 20 TPS) is fine, BUT paper-global.yml has `player-auto-save: rate: -1` (= use auto-save-interval, so also 2 min). A player who joins, dies, and disconnects within 2 min may have NO post-death snapshot persisted before player.dat is overwritten by their next login. The player save does fire on quit, but if the death happens and the player keeps moving / interacting before logout, items in chunks not yet saved are at risk for tar-while-running backups. | Set `player-auto-save: rate: 1200` (= 1 min). Switch the backup strategy to `save-off` + `save-all flush` + tar + `save-on` to guarantee consistency, OR snapshot the host bind-mount with a filesystem-level snapshot (LVM / btrfs / ZFS). | 30 min config, 0.5d for snapshot path |
| F-11 | P2 / 10 | EZShop-1.0-SNAPSHOT.jar is bundled alongside AuctionHouse-1.4.6.jar. A PLUGIN_ALTERNATIVES.md TODO calls for dropping EZShop. | Remove EZShop, migrate any active shops to AuctionHouse, document the migration in docs/migrations/. | 0.5d player communication, 1h technical |
| F-12 | P2 / 10 | Spigot `entity-tracking-range`: monsters 96, misc 96. The roadmap suggests tightening to monsters=32, misc=16 for TPS / network savings. | Tune in the next maintenance window, re-baseline TPS with a spark profile. | 1h config, 1d to verify under load |
| F-13 | P2 / 9 | 21 plugin folders without a matching jar (orphans): bStats, CarbonChat, ComfyWhitelist, EpicGuard, Essentials, faststats, GrimAC, Homestead, Lands, LPC, MarriageMaster, MiniMOTD, Multiverse-Core, PhantomSMP, TAB, UltimateTimber, UnexpectedSpawn, Vault, WorldEdit, plus .bak-* directories. Most have a renamed jar (carbonchat-paper-...jar, EssentialsX-...jar), so this is mostly cosmetic. Lands, LPC, MarriageMaster, PhantomSMP, UltimateTimber, UnexpectedSpawn are truly orphaned: jars not present. | Audit each: delete the data dirs of plugins truly removed; the bStats/Essentials/Vault names are normal. Document the plugin-name <-> jar-name pattern in PLUGINS.md. | 1h |
| F-14 | P2 / 9 | No TPS Discord webhook alert (mentioned on the TODO). spark is installed but auto-profile + alerting are not wired up. | spark already supports `spark profile --thresholds`; route alerts to Discord via the existing webhook stack. | 0.5d |
| F-15 | P2 / 8 | RCON output for async commands (CoreProtect, LuckPerms) does not return to the issuing rcon-cli session; found while trying `co inspect` from RCON. Async command results land in the console only. | Document this in docs/OPERATIONS.md (does not exist yet; create it). For automation, attach to `docker logs -f minecraft-mc` in parallel (see the sketch after this table). | 30 min doc |
| F-16 | P2 / 8 | `gamerule keepInventory` could not be queried via rcon-cli due to an `execute in <world> run` argument-parsing bug in itzg's rcon-cli wrapper (or RCON quoting). State unknown without an in-game console. | Verify in-game as an op user, document the rcon-cli limitation. | 5 min in-game |
| F-17 | P2 / 6 | RCON_PASSWORD is committed to docker-compose.yml in plaintext (*redacted*). The RCON port (25575) is bound to 127.0.0.1, so the blast radius is local only, but the secret is still in git history. | Rotate the password, move it to .env (gitignored), confirm the 127.0.0.1-only binding stays. | 30 min |
| F-18 | P2 / 6 | `restart: unless-stopped` retries forever; nothing re-evaluates rapid OOM-restart loops, so if the container OOMs every 60s, Docker keeps restarting it indefinitely. | Add `restart_policy: { condition: on-failure, max_attempts: 5, window: 300s }` (compose v3+ deploy block) and a watchdog alert. | 30 min |
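Because async plugin output never reaches the RCON session (F-15), automation has to read the console stream instead. A minimal sketch, assuming the itzg image's bundled rcon-cli and using a CoreProtect lookup as the example command:

```bash
# Issue an async command over RCON and capture its reply from the console (F-15).
docker logs -f --since 1s minecraft-mc > /tmp/mc-console.log 2>&1 &
TAIL_PID=$!
docker exec minecraft-mc rcon-cli "co lookup u:YOU500 t:1h"   # reply lands in the console
sleep 5                                                       # give the async result time to print
kill "$TAIL_PID"
grep -i coreprotect /tmp/mc-console.log
```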
## Detailed Methodology
### Inputs inspected (read-only, no writes)
| Source | Path | Method |
|---|---|---|
| Container env | `docker inspect minecraft-mc` | host shell |
| docker-compose | `/opt/docker/minecraft/docker-compose.yml` | host cat |
| AuthLimbo config | `/data/plugins/AuthLimbo/config.yml` | docker exec cat |
| AuthLimbo logs | `/data/plugins/AuthLimbo/` (no log files exist; only config.yml) | docker exec ls |
| AuthMe config | `/data/plugins/AuthMe/config.yml` | docker exec cat |
| AuthMe DB record for YOU500 | `/data/plugins/AuthMe/authme.db` | docker exec python3 sqlite3 |
| CoreProtect config | `/data/plugins/CoreProtect/config.yml` | docker exec cat |
| CoreProtect DB size | `/data/plugins/CoreProtect/database.db` | docker exec du -sh |
| Server log | `/data/logs/latest.log` | docker exec grep |
| Paper / Spigot / Purpur configs | `/data/config/paper-*.yml`, `/data/spigot.yml`, `/data/purpur.yml` | docker exec cat |
| World sizes | `/data/world*/` | docker exec du -sh |
| Backup script (deployed) | `/opt/docker/backup.sh` | host cat |
| Backup script (repo) | `/home/admin/ai-lab/_github/minecraft-server/scripts/backup.sh` | local cat |
| Backup output | `/opt/backups/` | host stat |
| Backup log | `/opt/backups/backup.log` | host tail |
| Live state | RCON `tps`, `list` | docker exec rcon-cli |
### YOU500 incident timeline (reconstructed from latest.log)
| Time (BST 2026-05-07) | Event |
|---|---|
| 17:13:34 | Login from 45.157.234.219, UUID c7c2df8e-...-686b |
| 17:13:35 | Spawned in auth_limbo (0.5, 128, 0.5) per AuthLimbo platform default |
| 17:13:38 | AuthMe: "YOU500 logged in" |
| 17:13:39 | AuthLimbo: "Restoring YOU500 to world(2380.4, 69.9, -11358.4)" |
| 17:13:39 | YOU500 left the confines of this world — void death |
| 17:13:39 | [AuthLimbo] teleportAsync returned false for YOU500 — Paper may have rejected the location. |
| 17:15:33 | Disconnect |
| 17:15:39 | Re-login from 82.22.5.229. Stored auth-loc has now been UPDATED to (-264.6, 86, -49.8) — different from the first attempt. Either user /sethome'd previously or AuthMe overwrote on the void death. |
| 17:15:44 | AuthLimbo: "Restoring YOU500 to world(-264.6, 86.0, -49.8)" — no WARN this time |
| 17:15:53 | Disconnect |
| 17:16:00 | Re-login from 82.22.5.230 |
| 17:16:05 | AuthLimbo: "Restoring YOU500 to world(-264.6, 86.0, -49.8)" |
| 17:16:28 | YOU500 was blown up by Creeper |
| 17:16:57 | Operator (s8n) RCON: tpa YOU500 -264 86 -50 + tell YOU500 grab items fast 5min despawn |
| 17:17:02 | RCON teleport executed |
| 17:18:22 | s8n in-game: /tp2p YOU500 s8n |
The void death at 17:13:39 is the data-loss event. AuthMe had `SaveQuitLocation: true`, so (2380, 70, -11358) was a real prior position, but the chunk was almost certainly not loaded yet (11k blocks out, no recent player there). `teleportAsync` returned false either because:
- the chunk failed to load within Paper's async generation budget, or
- the entity was already dead (the void death raced ahead of the teleport).
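The timeline above can be re-derived at any time with a read-only grep over the live log (same paths as inspected in this audit):

```bash
# Pull the YOU500 incident lines back out of the server log (read-only).
docker exec minecraft-mc grep -n -E \
  'YOU500|teleportAsync returned false' \
  /data/logs/latest.log
```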
### What CoreProtect WOULD have caught (and didn't)
CoreProtect inventory tracking is enabled (`item-transactions: true`, `item-drops: true`, `item-pickups: true`, `rollback-items: true`). However:
- A void death drops items into the world for ~5 min before they despawn. Drops are item entities, not container transactions; CoreProtect logs them as drops only if a player was the immediate cause of the drop.
- A death-drop in the auth_limbo world (where the void death happened) falls into y<0 air, which is itself a non-event for CP.
- Thus there was no item-rollback path even if `co inspect` had been run within minutes.
Implication: CoreProtect is the wrong tool for death-drop recovery. A real death-drop plugin or keepInventory is the only fix.
### Backup script forensics
- Deployed: 88 lines; the last block is "Prune old backups". No Minecraft block. No `umask 077`.
- Repo: 131 lines (with malformed lines 119-122 left over from a bad merge; ALSO a bug to fix on the next push). Has the Minecraft block. Has `umask 077`.
- `/opt/backups/backup.log` shows the last 5 days of "Backup complete" entries averaging 8-12K. None contain MC data. None mention MC. The log line `Configs: partial (some files missing)` is the configs section misfiring on Matrix paths, never the MC block.
- Last verified-good MC archive on host: `/opt/backups/mc-plugins-prerebrand-2026-04-30.tar.gz` (a one-shot pre-rebrand snapshot; contents not verified in this audit).
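For orientation, a sketch of what the missing Minecraft block plus the F-01 sentinel could look like once the scripts are synced. Assumptions: the itzg image's bundled rcon-cli, data bind-mounted at /opt/docker/minecraft/data, and a world/auth_limbo world list that must be checked against the real /data/world*/ dirs; none of this is the repo's actual code.

```bash
# Minecraft backup block (sketch, not the repo's exact code).
umask 077
DATA=/opt/docker/minecraft/data                        # assumed bind-mount path
OUT="/opt/backups/mc-world-backup-$(date +%F).tar.gz"

docker exec minecraft-mc rcon-cli save-off             # pause async chunk saves (F-10)
trap 'docker exec minecraft-mc rcon-cli save-on' EXIT  # never leave saving disabled
docker exec minecraft-mc rcon-cli save-all flush       # force world state to disk

tar -czf "$OUT" -C "$DATA" world auth_limbo plugins    # world list is an assumption

# F-01 sentinel: an "empty" archive means the MC block silently failed again.
[ "$(stat -c%s "$OUT")" -gt 1073741824 ] || {
  echo "WORLD BACKUP TOO SMALL — ABORT" >&2
  exit 1
}
tar -tzf "$OUT" > /dev/null                            # cheap archive sanity check
```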
## Action Items (Prioritised)
### P0 — this week (by 2026-05-14)
- F-01 / Backups. Sync the deployed backup.sh with the repo; fix the lines 119-122 corruption in the repo first. Add a post-run sentinel: `[ "$(stat -c%s mc-world-backup-*.tar.gz)" -gt 1073741824 ] || log "WORLD BACKUP TOO SMALL — ABORT"`. Run a manual backup, verify >= 5 GB on disk. Test a restore into a scratch dir.
- F-02 / Item-loss safety net. Decide policy. Recommended: enable `keepInventory true` in the auth_limbo world only (cheap, narrow), and write a 50-line AuthLimbo extension: an `OnPlayerDeath` listener that detects "death in auth_limbo" and restores the inventory snapshot taken at AuthMeAsyncPreLogin. Survival pain is preserved everywhere else. [H1, 2026-05-07] Interim: server-wide `gamerule keepInventory true` planned but deferred, since the RCON command path cannot reach `gamerule` (see F-16); an operator must run `/gamerule keepInventory true` in-game at the next op session. Revert plan documented in INTERIM-MITIGATIONS.md (revert when AuthLimbo F1+F2+F4 ship).
- F-03 / AuthLimbo recovery. Bump WARN to ERROR. Wire it to the existing Discord webhook (per workspace memory: webhook stack on nullstone). On failure, write a player snapshot to `auth_limbo/incidents/<uuid>-<ts>.dat`.
- F-04 / Chunk preload race. Add a chunk-loaded check + synchronous force-load before `teleportAsync`. If it still returns false, kick with a friendly message instead of letting the player drop into limbo.
- F-05 / OOM headroom. Lower `-Xmx` to 14 GB and add a `docker events` watcher (sketched after this list). [H3, 2026-05-07] `-Xms8192M -Xmx14336M` + `MEMORY_SIZE: "14G"` written to docker-compose.yml (deployed and repo). Container limit unchanged at 18G: the host is 31G total / ~13G free and other workloads need the rest. Goes live on the next compose recreate (deferred, 2 players online). The `docker events` watcher remains TODO.
- F-06 / Container hardening. Add `cap_drop`, `pids_limit`, `no-new-privileges`; boot test in a window. [H2, 2026-05-07] `cap_drop: [ALL]` + `cap_add: [CHOWN, SETUID, SETGID, FOWNER]` + `security_opt: [no-new-privileges:true]` + `deploy.resources.limits.pids: 4096` written to docker-compose.yml; `compose config --quiet` validates clean. DAC_OVERRIDE deliberately omitted; add it only if the entrypoint chown fails. Goes live on the next recreate. Pre-edit compose backed up at `/opt/docker/minecraft/docker-compose.yml.bak-2026-05-07-before-H2H3`.
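The still-TODO F-05 watcher can be a few lines; a sketch, with a placeholder webhook URL and the standard Discord `content` JSON field:

```bash
#!/usr/bin/env bash
# Ping Discord whenever Docker reports an OOM event for minecraft-mc (F-05).
WEBHOOK_URL="https://discord.com/api/webhooks/CHANGE-ME"   # placeholder, not the real hook
docker events --filter container=minecraft-mc --filter event=oom \
  --format '{{.Time}} {{.Actor.Attributes.name}} oom' |
while read -r line; do
  curl -fsS -H 'Content-Type: application/json' \
    -d "{\"content\": \"minecraft-mc OOM: ${line}\"}" "$WEBHOOK_URL"
done
```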
### P1 — this month
- F-07 CoreProtect purge cron (a sketch follows this list), plan the MySQL migration.
- F-08 SHA256 -> BCRYPT migration with the legacyHashes fallback.
- F-09 Document the online-mode=false rationale in RULES.md.
- F-10 Consider an LVM/ZFS snapshot for backup atomicity.
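A sketch of the F-07 purge job referenced above. Assumptions: CoreProtect's `co purge t:30d` console syntax, sqlite3 available on the host, and the DB reachable at the bind-mount path used elsewhere in this setup; run it in a quiet window, since VACUUM locks the file:

```bash
# co-purge.sh: monthly CoreProtect maintenance (F-07), a sketch.
# Suggested crontab entry: 30 4 1 * * /opt/docker/minecraft/co-purge.sh
docker exec minecraft-mc rcon-cli "co purge t:30d"   # drop entries older than 30 days
sleep 600                                            # purge is slow on a 1.5 GB db
# Reclaim the freed pages; path assumes the /data bind mount seen in this audit.
sqlite3 /opt/docker/minecraft/data/plugins/CoreProtect/database.db 'VACUUM;'
```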
### P2 — this quarter
- F-11 Drop EZShop after player communication window.
- F-12 Tighten entity tracking range, re-profile with spark.
- F-13 Clean orphan plugin folders.
- F-14 Wire spark TPS alerts to Discord.
- F-15 Document RCON async-command behaviour.
- F-17 Rotate RCON password, move to .env.
- F-18 Add restart-policy max_attempts.
## Open Questions for the Operator
- Inventory restoration policy. Is silent keepInventory only in auth_limbo acceptable, or do you want a manual ops-restore-from-snapshot approval gate?
- YOU500 specifically. Is there an out-of-band record of what they were carrying (Discord screenshot, witness)? If yes, manual NBT injection into player.dat is feasible; CoreProtect cannot help.
- Chunk preload trade-off. Force-loading distant chunks at login adds 200-2000 ms to login time. Acceptable versus the void-death risk?
- MySQL for CoreProtect. It adds an operational dependency (another container, another backup target). Worth the complexity, or is a monthly purge that keeps SQLite under 1 GB sufficient?
- RCON password rotation. The committed value should be rotated on principle. Schedule a maintenance window?
- online-mode=false. Confirm the long-term stance. Mojang ToS implications for racked.ru?
- Backups offsite. Currently /opt/backups/ is on the same host. Plan for an offsite copy (B2, restic to a friend's PC, anything)?
## What was NOT in scope for this audit
- Network firewall, fail2ban, host-side security (nullstone-server has its own audit folder).
- Plugin source-supply-chain audit (covered by the docs/ROADMAP.md "plugin acquisition overhaul").
- Performance profiling under load (deferred per F-12).
- LuckPerms permission graph correctness.
- Rules / chat-format / prefix audit (workspace memory: do NOT touch LP prefixes).
- Per-region (Lands / Homestead) data integrity.
## Sign-off
| Field | Value |
|---|---|
| Audit date | 2026-05-07 |
| Method | Read-only SSH inspection; no fixes applied during the audit itself (interim hardening H1-H4 landed afterwards, see the remediation status at the top) |
| Workspace rule applied | "Audit findings -> docs first, then fix" |
| Next action | Operator review + go/no-go on each P0 item |
| Next audit due | 2026-08-07 (quarterly), or sooner after backups remediated |