From a1cc3940cfe0bc8ac72c6ca4d3ea7f3291226fe0 Mon Sep 17 00:00:00 2001 From: s8n Date: Thu, 7 May 2026 17:33:24 +0100 Subject: [PATCH] docs: 2026-05-07 incident audit + backup strategy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Player YOU500 lost full inventory to AuthLimbo void-death at 17:13:39. Investigation revealed deployed /opt/docker/backup.sh is an 88-line stub missing the Minecraft block; last successful world backup 2026-05-02 (already pruned). No recoverable .dat exists. Files: - AUDIT-2026-05-07.md — server-side findings F-01..F-06 (P0 backups, no-keepInventory, AuthLimbo silent failure, chunk preload race, Xmx > container headroom, container hardening gaps) - BACKUP-HUNT-2026-05-07.md — exhaustive backup scan; only 6-week-old archive at _archive/minecraft-old-2026-04-27.tar.gz - BACKUP-STRATEGY.md — restic-based plan; 5min/hourly/daily classes, off-host to onyx via Tailscale, monthly drill - CROSS-REFERENCE-2026-05-07.md — repo+doc landing map; flags pre-existing infra/STATE.md backup-broken note + HA-CLUSTER restic draft to extend rather than duplicate - docs/RUNBOOK-BACKUP-RESTORE.md — operator runbook for .dat restore, full-world restore, host-loss restore, drill log --- AUDIT-2026-05-07.md | 184 +++++++++++++++ BACKUP-HUNT-2026-05-07.md | 118 ++++++++++ BACKUP-STRATEGY.md | 393 +++++++++++++++++++++++++++++++++ CROSS-REFERENCE-2026-05-07.md | 364 ++++++++++++++++++++++++++++++ docs/RUNBOOK-BACKUP-RESTORE.md | 156 +++++++++++++ 5 files changed, 1215 insertions(+) create mode 100644 AUDIT-2026-05-07.md create mode 100644 BACKUP-HUNT-2026-05-07.md create mode 100644 BACKUP-STRATEGY.md create mode 100644 CROSS-REFERENCE-2026-05-07.md create mode 100644 docs/RUNBOOK-BACKUP-RESTORE.md diff --git a/AUDIT-2026-05-07.md b/AUDIT-2026-05-07.md new file mode 100644 index 0000000..0b638cc --- /dev/null +++ b/AUDIT-2026-05-07.md @@ -0,0 +1,184 @@ +# Minecraft Server Audit — racked.ru +**Container:** `minecraft-mc` on nullstone (192.168.0.100) +**Date:** 2026-05-07 +**Audit type:** Operational / data-integrity (NOT a network-security audit) +**Auditor:** Claude (Opus 4.7) via SSH read-only inspection +**Catalyst:** Player **YOU500** void-died at login (~17:13:39 BST), inventory lost. No usable backup existed. + +--- + +## Executive Summary + +**Status:** Critical issues found. +**Risk score model:** Likelihood (1-5) x Impact (1-5) = 1-25. >=15 = High, >=20 = Critical. + +A live AuthLimbo `teleportAsync returned false` warning fired during YOU500's first login of the day, immediately after `YOU500 left the confines of this world` (void death in `auth_limbo` world). The player retried twice. On retry #3 they were teleported to (-264.6, 86, -49.8) and 23 seconds later `was blown up by Creeper`. Console operator (s8n) attempted recovery via RCON but neither the void death nor the creeper death had item-restore data because: + +1. **No working backups.** `/opt/docker/backup.sh` deployed on nullstone is a stale 88-line copy missing the entire Minecraft block. The repo version (`scripts/backup.sh`) has the block but **was never deployed**. Daily 02:00 cron has been running for at least 7 days producing 8-12K archives that contain no world / playerdata / plugins. `BACKUP.md` claims the script handles MC; it does not. +2. 
**CoreProtect tracks inventory transactions but not death drops.** `co inspect` will not surface "dropped on death" entries the way it does pickup/drop, and even if it did, the 1.5 GB SQLite blob is approaching the point where `/co rollback` over an inventory radius is operationally slow. +3. **No `keepInventory` rule, no death-drop rescue plugin.** With `difficulty=hard`, `gamemode=survival`, and no Essentials `keepinv` permission flow visible, every death is a total loss. +4. **AuthLimbo has no death-listener and no failure remediation.** When `teleportAsync` returns false, the player is dropped at limbo spawn and the warning is logged at WARN level only — no alert, no rollback, no temp-stash of inventory. +5. **JVM heap sized larger than container limit.** `JVM_OPTS=-Xmx16384M` inside an `18G` container limit with `MEMORY_SIZE=16G`; if Aikar G1 heap actually grows to Xmx, plus off-heap (Netty, mmaps, zip cache) >2 GB, **kernel OOM kills the container**. Restart-on-OOM has no warning hook to discord/Matrix. + +**Three biggest exposures** +1. Backups silently broken for 7+ days. (Critical — 5x4=20) +2. No item-loss safety net for any cause of death. (Critical — 4x5=20) +3. AuthLimbo failure path has no recovery. (High — 4x4=16) + +--- + +## Findings Table + +Severity = Likelihood x Impact. P0 = act this week, P1 = this month, P2 = this quarter. + +| ID | Severity | Finding | Recommendation | Effort | +|----|----------|---------|----------------|--------| +| F-01 | **P0 / 20** | `/opt/docker/backup.sh` on nullstone is missing the entire MC backup block. Repo `scripts/backup.sh` has it but was never deployed. Daily backups since 2026-04-30 are 8-12K (effectively empty). | Sync the deployed script with repo, run a manual backup, verify world tarball >= 5 GB. Add a sentinel check to backup.sh that fails the run if `mc-world-backup-*.tar.gz` < 1 GB. | 30 min | +| F-02 | **P0 / 20** | No `keepInventory` rule and no `essentials.keepinv` permission. Every death is total loss. | Decide policy: (a) `gamerule keepInventory true` server-wide, (b) keep-inv only when death cause is "void"/"plugin teleport", or (c) auto-restore-on-AuthLimbo-failure. The narrow option (b) preserves survival pain while plugging the AuthLimbo data-loss vector. Plugin candidates: `KeepInventoryOnVoid`, `DeathChestPro`, custom listener in AuthLimbo. | 1-2h research, 1d implement | +| F-03 | **P0 / 18** | AuthLimbo logs WARN on teleport failure but has no alerting or recovery. The player is left at limbo spawn (y128 platform) where they re-disconnect and on retry get teleported normally — but the warning never reaches an operator. | (a) Bump `teleportAsync returned false` to ERROR. (b) Add a Discord/Matrix webhook alert via existing webhook stack. (c) On failure: snapshot player inventory, kick with friendly message, write recovery file `auth_limbo/incident--.dat` for ops replay. | 1d | +| F-04 | **P0 / 18** | YOU500's first failed teleport target was (2380.4, 69.9, -11358.4) — that's 11k blocks out and the chunk likely was not loaded yet. AuthLimbo's `preload-chunks: true` setting fires on `AuthMeAsyncPreLoginEvent` which may not run before `LoginEvent` in HaHaWTH's AuthMe fork. Exact timing race is unverified. | Add chunk-loaded assertion in AuthLimbo before calling `teleportAsync`; if not loaded, force-load synchronously OR delay teleport another 10-20 ticks. Add debug logging of chunk-load state in the WARN line. 
| 0.5d | +| F-05 | **P0 / 16** | JVM `-Xmx16384M` inside container `mem_limit=18G` with no headroom for off-heap (Netty buffers, native mmaps, mod metadata). Aikar flags + 25 plugins easily push native to 2-3 GB. Kernel OOM kill is silent. | Either (a) lower `-Xmx` to 12-14 GB and `MaxRAMPercentage`-style flag, OR (b) raise `mem_limit` to 24 GB. Also add `oom_score_adj` and a `docker events --filter event=oom` watcher that pings Discord. | 1h config + 2h alerting | +| F-06 | **P0 / 16** | No `pids_limit`, no `cap_drop: ALL`, no `read_only: true`. Container runs with the default Docker capability set (CAP_NET_RAW, CAP_SYS_CHROOT, etc.) it does not need. | Add `cap_drop: [ALL]`, `cap_add: [NET_BIND_SERVICE]` (only if binding <1024; 25565 is high so likely none), `pids_limit: 4096`, `security_opt: [no-new-privileges:true]`. Test boot, watch for startup failures. | 1h test | +| F-07 | **P1 / 15** | CoreProtect SQLite at 1.5 GB. Performance and reliability degrade past 2-3 GB. `database.db` is the only copy; no WAL checkpoint or vacuum schedule. | (a) Migrate to MySQL/MariaDB sidecar container. (b) Add monthly cron `co purge t:30d` (purge entries older than 30 days; CoreProtect docs). (c) Schedule `VACUUM` after purge. | 1d for MySQL migration, 1h for purge cron | +| F-08 | **P1 / 12** | AuthMe still on `passwordHash: SHA256` (legacy). Migration plan for SHA256 -> BCRYPT is on TODO list and still pending. | Set `legacyHashes: [SHA256]` and `passwordHash: BCRYPT`. AuthMe re-hashes on next successful login. Communicate "your password works as before, no action needed". | 30 min config + monitoring | +| F-09 | **P1 / 12** | `online-mode=false`. Server depends entirely on AuthMe + EpicGuard for identity. EpicGuard config not audited in this pass. | Verify `enableProtection: false` in AuthMe (currently false) is intentional, since geofencing is `US, GB, LOCALHOST` only — any user from another country is locked out if protection re-enabled. Document the choice in `RULES.md`. | 1h doc only | +| F-10 | **P1 / 12** | `auto-save-interval: 2400` (= 2 minutes at 20 TPS) is fine, BUT `paper-global.yml` has `player-auto-save: rate: -1` (= use auto-save-interval, so also 2 min). A player who joins, dies, and disconnects within 2 min may have NO post-death snapshot persisted before the player.dat is overwritten by their next login. Player save *does* fire on quit, but if the death happens and the player keeps moving / interacting before logout, items in chunks not yet saved are at risk for tar-while-running backups. | Set `player-auto-save: rate: 1200` (= 1 min). Switch backup strategy to `save-off` + `save-all flush` + tar + `save-on` to guarantee consistency, OR snapshot the host bind-mount with a filesystem-level snapshot (LVM / btrfs / ZFS). | 30 min config, 0.5d for snapshot path | +| F-11 | **P2 / 10** | `EZShop-1.0-SNAPSHOT.jar` is bundled alongside `AuctionHouse-1.4.6.jar`. PLUGIN_ALTERNATIVES.md TODO calls for dropping EZShop. | Remove EZShop, migrate any active shops to AuctionHouse, document the migration in `docs/migrations/`. | 0.5d player communication, 1h technical | +| F-12 | **P2 / 10** | Spigot `entity-tracking-range`: monsters 96, misc 96. Roadmap suggests tightening to monster=32, misc=16 for TPS / network savings. | Tune on next maintenance window, re-baseline TPS with `spark` profile. 
| 1h config, 1d to verify under load | +| F-13 | **P2 / 9** | 21 plugin folders without matching jar (orphans): `bStats`, `CarbonChat`, `ComfyWhitelist`, `EpicGuard`, `Essentials`, `faststats`, `GrimAC`, `Homestead`, `Lands`, `LPC`, `MarriageMaster`, `MiniMOTD`, `Multiverse-Core`, `PhantomSMP`, `TAB`, `UltimateTimber`, `UnexpectedSpawn`, `Vault`, `WorldEdit`, plus `.bak-*` directories. Most have a renamed jar (`carbonchat-paper-...jar`, `EssentialsX-...jar`) so this is mostly cosmetic. `Lands`, `LPC`, `MarriageMaster`, `PhantomSMP`, `UltimateTimber`, `UnexpectedSpawn` truly orphaned: jars not present. | Audit each: delete data dirs of plugins truly removed; the bStats/Essentials/Vault names are normal. Document plugin-name <-> jar-name pattern in `PLUGINS.md`. | 1h | +| F-14 | **P2 / 9** | No TPS Discord webhook alert (mentioned on TODO). spark is installed but auto-profile + alerting are not wired up. | spark already supports `spark profile --thresholds`; route to Discord via existing webhook stack. | 0.5d | +| F-15 | **P2 / 8** | RCON output for async commands (CoreProtect, LuckPerms) does not return to the issuing rcon-cli session. Found while trying `co inspect` from RCON. Async command results land in console only. | Document this in `docs/OPERATIONS.md` (does not exist yet — create it). For automation, attach to `docker logs -f minecraft-mc` in parallel. | 30 min doc | +| F-16 | **P2 / 8** | `gamerule keepInventory` could not be queried via `rcon-cli` due to `execute in run` argument parsing bug in itzg's rcon-cli wrapper (or RCON quoting). State unknown without in-game console. | Verify in-game by op user, document the rcon-cli limitation. | 5 min in-game | +| F-17 | **P2 / 6** | `RCON_PASSWORD` is committed to `docker-compose.yml` in plaintext (`*redacted*`). RCON port (25575) is bound to `127.0.0.1` so the blast radius is local only — but the secret is still in git history. | Rotate password, move to `.env` (gitignored), confirm `127.0.0.1`-only binding stays. | 30 min | +| F-18 | **P2 / 6** | `restart: unless-stopped` with no `start_period` re-evaluation on rapid OOM-restart loops. If the container OOMs every 60s, Docker keeps restarting indefinitely. | Add `restart_policy: { condition: on-failure, max_attempts: 5, window: 300s }` (compose v3+ deploy block) and a watchdog alert. 
| 30 min | + +--- + +## Detailed Methodology + +### Inputs inspected (read-only, no writes) + +| Source | Path | Method | +|--------|------|--------| +| Container env | `docker inspect minecraft-mc` | host shell | +| docker-compose | `/opt/docker/minecraft/docker-compose.yml` | host cat | +| AuthLimbo config | `/data/plugins/AuthLimbo/config.yml` | `docker exec cat` | +| AuthLimbo logs | `/data/plugins/AuthLimbo/` (no log files exist; only `config.yml`) | `docker exec ls` | +| AuthMe config | `/data/plugins/AuthMe/config.yml` | `docker exec cat` | +| AuthMe DB record for YOU500 | `/data/plugins/AuthMe/authme.db` | `docker exec python3 sqlite3` | +| CoreProtect config | `/data/plugins/CoreProtect/config.yml` | `docker exec cat` | +| CoreProtect DB size | `/data/plugins/CoreProtect/database.db` | `docker exec du -sh` | +| Server log | `/data/logs/latest.log` | `docker exec grep` | +| Paper / Spigot / Purpur configs | `/data/config/paper-*.yml`, `/data/spigot.yml`, `/data/purpur.yml` | `docker exec cat` | +| World sizes | `/data/world*/` | `docker exec du -sh` | +| Backup script (deployed) | `/opt/docker/backup.sh` | host cat | +| Backup script (repo) | `/home/admin/ai-lab/_github/minecraft-server/scripts/backup.sh` | local cat | +| Backup output | `/opt/backups/` | host stat | +| Backup log | `/opt/backups/backup.log` | host tail | +| Live state | RCON `tps`, `list` | `docker exec rcon-cli` | + +### YOU500 incident timeline (reconstructed from `latest.log`) + +| Time (BST 2026-05-07) | Event | +|-----------------------|-------| +| 17:13:34 | Login from 45.157.234.219, UUID c7c2df8e-...-686b | +| 17:13:35 | Spawned in `auth_limbo` (0.5, 128, 0.5) per AuthLimbo platform default | +| 17:13:38 | AuthMe: "YOU500 logged in" | +| 17:13:39 | AuthLimbo: "Restoring YOU500 to world(2380.4, 69.9, -11358.4)" | +| 17:13:39 | **`YOU500 left the confines of this world`** — void death | +| 17:13:39 | **`[AuthLimbo] teleportAsync returned false for YOU500 — Paper may have rejected the location.`** | +| 17:15:33 | Disconnect | +| 17:15:39 | Re-login from 82.22.5.229. Stored auth-loc has now been UPDATED to (-264.6, 86, -49.8) — different from the first attempt. Either user `/sethome`'d previously or AuthMe overwrote on the void death. | +| 17:15:44 | AuthLimbo: "Restoring YOU500 to world(-264.6, 86.0, -49.8)" — no WARN this time | +| 17:15:53 | Disconnect | +| 17:16:00 | Re-login from 82.22.5.230 | +| 17:16:05 | AuthLimbo: "Restoring YOU500 to world(-264.6, 86.0, -49.8)" | +| 17:16:28 | **`YOU500 was blown up by Creeper`** | +| 17:16:57 | Operator (s8n) RCON: `tpa YOU500 -264 86 -50` + `tell YOU500 grab items fast 5min despawn` | +| 17:17:02 | RCON teleport executed | +| 17:18:22 | s8n in-game: `/tp2p YOU500 s8n` | + +The void death at 17:13:39 is the data-loss event. AuthMe had `SaveQuitLocation: true` so the (2380, 70, -11358) was a real prior position but the chunk was almost certainly not loaded yet (11k blocks out, no recent player there). `teleportAsync` returned false either because: +- the chunk failed to load within Paper's async generation budget, or +- the entity was already dead (void death raced ahead of teleport). + +### What CoreProtect WOULD have caught (and didn't) + +CoreProtect inventory tracking is enabled (`item-transactions: true`, `item-drops: true`, `item-pickups: true`, `rollback-items: true`). However: +- A void-death drops items into the world for ~5 min then despawns. 
Drops are item entities, not container transactions; CoreProtect logs them as drops only if a player was the immediate cause of the drop. +- A death-drop in the `auth_limbo` world (where the void death happened) drops into y<0 air which is itself a non-event for CP. +- Thus there was no item-rollback path even if `co inspect` had been run within minutes. + +**Implication:** CoreProtect is the wrong tool for death-drop recovery. A real death-drop plugin or `keepInventory` is the only fix. + +### Backup script forensics + +- Deployed: 88 lines, last block is "Prune old backups". No Minecraft block. No `umask 077`. +- Repo: 131 lines (with malformed lines 119-122 leftover from a bad merge — ALSO a bug to fix on the next push). Has the Minecraft block. Has `umask 077`. +- `/opt/backups/backup.log` shows last 5 days of "Backup complete" entries averaging 8-12K. None contain MC data. None mention MC. The log line `Configs: partial (some files missing)` is the configs section misfiring on Matrix paths and was never the MC block. +- Last verified-good MC archive on host: `/opt/backups/mc-plugins-prerebrand-2026-04-30.tar.gz` (one-shot pre-rebrand snapshot; contents not verified in this audit). + +--- + +## Action Items (Prioritised) + +### P0 — this week (by 2026-05-14) + +1. **F-01 / Backups.** Sync deployed backup.sh with repo. Fix the lines 119-122 corruption in repo first. Add post-run sentinel: `[ "$(stat -c%s mc-world-backup-*.tar.gz)" -gt 1073741824 ] || log "WORLD BACKUP TOO SMALL — ABORT"`. Run manual backup, verify >= 5 GB on disk. Test a restore into a scratch dir. +2. **F-02 / Item-loss safety net.** Decide policy. Recommend: enable `keepInventory true` in `auth_limbo` world only (cheap, narrow), and write a 50-line AuthLimbo extension `OnPlayerDeath` listener that detects "death in auth_limbo" -> restore inventory snapshot taken at AuthMeAsyncPreLogin. Survival pain preserved everywhere else. +3. **F-03 / AuthLimbo recovery.** Bump WARN to ERROR. Wire to existing Discord webhook (per workspace memory: webhook stack on nullstone). On failure, write player snapshot to `auth_limbo/incidents/-.dat`. +4. **F-04 / Chunk preload race.** Add chunk-loaded check + sync force-load before `teleportAsync`. If still false, kick with friendly message instead of letting the player drop into limbo. +5. **F-05 / OOM headroom.** Lower `-Xmx` to 14 GB and add `docker events` watcher. +6. **F-06 / Container hardening.** Add `cap_drop`, `pids_limit`, `no-new-privileges`. Boot test in a window. + +### P1 — this month + +7. **F-07** CoreProtect prune cron, plan MySQL migration. +8. **F-08** SHA256 -> BCRYPT migration with legacyHashes fallback. +9. **F-09** Document `online-mode=false` rationale in RULES.md. +10. **F-10** Consider LVM/ZFS snapshot for backup atomicity. + +### P2 — this quarter + +11. **F-11** Drop EZShop after player communication window. +12. **F-12** Tighten entity tracking range, re-profile with spark. +13. **F-13** Clean orphan plugin folders. +14. **F-14** Wire spark TPS alerts to Discord. +15. **F-15** Document RCON async-command behaviour. +16. **F-17** Rotate RCON password, move to .env. +17. **F-18** Add restart-policy max_attempts. + +--- + +## Open Questions for the Operator + +1. **Inventory restoration policy.** Is silent `keepInventory` only in `auth_limbo` acceptable, or do you want a manual ops-restore-from-snapshot approval gate? +2. **YOU500 specifically.** Is there an out-of-band record of what they were carrying (Discord screenshot, witness)? 
If yes, manual NBT injection into player.dat is feasible. CoreProtect cannot help. +3. **Chunk preload trade-off.** Force-loading distant chunks at login adds 200-2000ms to login time. Acceptable vs the void-death risk? +4. **MySQL for CoreProtect.** Adds an operational dependency (another container, another backup target). Worth the complexity, or is monthly purge to keep SQLite under 1 GB sufficient? +5. **RCON password rotation.** The committed value should be rotated on principle. Schedule a maintenance window? +6. **online-mode=false.** Confirm long-term stance. Mojang ToS implications for racked.ru? +7. **Backups offsite.** Currently `/opt/backups/` is on the same host. Plan for offsite copy (B2, restic to friend-PC, anything)? + +--- + +## What was NOT in scope this audit + +- Network firewall, fail2ban, host-side security (nullstone-server has its own audit folder). +- Plugin source-supply-chain audit (covered by `docs/ROADMAP.md` "plugin acquisition overhaul"). +- Performance profiling under load (deferred per F-12). +- LuckPerms permission graph correctness. +- Rules / chat-format / prefix audit (workspace memory: do NOT touch LP prefixes). +- Per-region (Lands / Homestead) data integrity. + +--- + +## Sign-off + +| Field | Value | +|-------|-------| +| Audit date | 2026-05-07 | +| Method | Read-only SSH inspection, no fixes applied | +| Workspace rule applied | "Audit findings -> docs first, then fix" | +| Next action | Operator review + go/no-go on each P0 item | +| Next audit due | 2026-08-07 (quarterly), or sooner after backups remediated | diff --git a/BACKUP-HUNT-2026-05-07.md b/BACKUP-HUNT-2026-05-07.md new file mode 100644 index 0000000..08091fe --- /dev/null +++ b/BACKUP-HUNT-2026-05-07.md @@ -0,0 +1,118 @@ +# YOU500 Inventory Recovery — Backup Hunt Report + +**Date:** 2026-05-07 +**Player:** YOU500 (UUID `c7c2df8e-8783-30b5-891c-86ec9343686b`) +**Incident:** Full inventory loss at 17:13:39 BST. AuthLimbo `teleportAsync returned false`, player teleport into world from auth_limbo failed → `YOU500 left the confines of this world` (void death). Vanilla `/data/world/playerdata` overwritten on respawn with empty inventory; vanilla void = no drops in world. +**Host:** nullstone (192.168.0.100), live MC data at `/home/docker/minecraft/` (== `/opt/docker/minecraft/`, same FS, inode 18877649 confirmed). +**SSH user:** `user` (no sudo). All `/opt/backups/2026*` dated subdirs are root-owned 0700 → unreadable. `/var/lib/docker/volumes/` unreadable. + +--- + +## Summary + +**Recoverable backup exists: YES — partial.** The pre-rebrand world archive `/home/user/ai-lab/_archive/minecraft-old-2026-04-27.tar.gz` contains YOU500's playerdata `.dat` from **2026-03-25 18:53** (size 9617 B vs current 9192 B — bigger = inventory likely populated). It is the **only known full-inventory snapshot for this UUID** anywhere on the host. + +**Caveat:** This is a 6-week-old snapshot. Items gained between 2026-03-25 and 2026-05-07 17:13 are NOT recoverable from any file backup. **CoreProtect** is installed and has been logging since 2026-05-01 → use `/co inventory YOU500` and `/co rollback` to retrieve anything stored in containers post-2026-05-01. + +**No scheduled world backups exist.** `/opt/docker/backup.sh` stopped backing up the MC world after 2026-05-02 (the world-backup branch was removed when the script was last edited; only configs/Matrix/RC are now dumped). 
Last world tarball that landed on disk: `/opt/backups/20260430_020001/minecraft-configs-20260430_020001.tar.gz` (12 KB → configs only, no playerdata). + +--- + +## Inventory of Backup Artifacts (oldest → newest) + +| When | Path | Size | Owner | Contains YOU500 .dat? | Notes | +|------|------|------|-------|----------------------|-------| +| 2026-03-25 18:53 (file mtime inside) | `/home/user/ai-lab/_archive/minecraft-old-2026-04-27.tar.gz` | ~? large | user | **YES** — `minecraft/world/playerdata/c7c2df8e-…dat` 9617 B + `.dat_old` 9616 B (2026-03-25 18:49) | **Best candidate.** 133 player .dat files, full world tree, Essentials/LitePlaytimeRewards/LandClaim DBs, advancements, stats. | +| 2026-04-30 02:01 | `/opt/backups/20260430_020001/minecraft-configs-20260430_020001.tar.gz` | 12 KB | root (UNREADABLE) | NO — configs only | Cannot read without sudo; size implies no world data anyway. | +| 2026-04-30 02:01 | `/opt/backups/20260430_020001/configs-20260430_020001.tar.gz` | 2.4 KB | root | NO | Traefik/Matrix/RC configs. | +| 2026-04-30 19:21 | `/opt/backups/mc-plugins-prerebrand-2026-04-30.tar.gz` | 224 MB | user | NO playerdata `.dat` files. Has `plugins/AuthMe/playerdata/` (empty), `plugins/AuthMe.bak-20260430-144204/playerdata/` (empty), `plugins/SkinsRestorer/cache/YOU500.mojangcache`. Vanilla world NOT included. | Plugin trees only — useful for password DB (`plugins/AuthMe.bak-…/authme.db`), not inventory. | +| 2026-05-03 02:00 | `/opt/backups/20260503_020001/configs-20260503_020001.tar.gz` | 2.4 KB | root | NO | Configs. | +| 2026-05-04 02:00 → 2026-05-07 02:00 | `/opt/backups/20260504_020001` … `20260507_020001` | 0700 dirs | root (UNREADABLE) | Inferred NO from log: backup.log shows only "configs OK" / "Matrix Postgres skipping" / "Volumes skipping" — world not touched after 2026-05-02. | All four dirs report 12 KB. | +| 2026-05-07 17:15 | `/home/docker/minecraft/world/playerdata/c7c2df8e-…dat_old` | 9181 B | uid 101000 | YES — but POST-DEATH (empty inventory). | Identical to live state right after first respawn. | +| 2026-05-07 17:21 | `/home/docker/minecraft/world/playerdata/c7c2df8e-…dat` | 9192 B | uid 101000 | YES — current live, empty inventory. | | +| 2026-05-07 17:15 | `/tmp/you500.dat` | 9181 B | user | YES — but byte-identical-size to `.dat_old`; gunzip strings show only base attribute schema (no item/Slot tags) → already empty. | Someone (you) already extracted the empty post-death dat. Useless for recovery. | + +### Misc archives checked, NOT relevant + +- `/opt/source-endpoint/source.tar.gz` — Misskey AGPL source dump. +- `/opt/backups/misskey/*` — Misskey DB/files. +- `/home/user/ai-lab/.stversions/_projects/_minecraft/launcher/java/java21.tar~*.gz` — JDK. +- `/home/user/ai-lab/_projects/_minecraft/resources/racked.ru.-.minecraft.7z` — launcher resources. +- `/home/user/ai-lab/.stversions/**` — Syncthing versions hold only **server config files** (`server.properties`, `bukkit.yml`, `purpur.yml` etc.) under `_github/online/minecraft-server/config/`. **No `.dat` or `playerdata/`** anywhere in `.stversions`. `.stignore` does not list `world/`, but the synced repo never contained the world dir to begin with (it's `_github/minecraft-server/` = configs + docker-compose only). 
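For reproducibility, here is a minimal read-only sketch of the scan behind the tables above. The UUID is YOU500's from the header; the archive globs are the paths inspected in this hunt. This is a sketch, not the exact commands run — and the root-owned 0700 dirs under `/opt/backups/` will not even glob without sudo:

```bash
#!/usr/bin/env bash
# Read-only: which archives contain playerdata entries for the incident UUID?
# Sketch only — root-owned 0700 dirs won't appear in the glob without sudo.
shopt -s globstar nullglob
UUID="c7c2df8e-8783-30b5-891c-86ec9343686b"
for a in /home/user/ai-lab/_archive/*.tar.gz /opt/backups/**/*.tar.gz; do
  if ! [ -r "$a" ]; then
    printf '%s\tUNREADABLE (needs sudo)\n' "$a"
    continue
  fi
  n=$(tar -tzf "$a" 2>/dev/null | grep -c "playerdata/${UUID}" || true)
  printf '%s\t%s playerdata entries for UUID\n' "$a" "${n:-0}"
done
```

Note this only covers `.tar.gz` artifacts; the `.7z` launcher-resources archive was checked separately and is irrelevant per the list above.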
+ +--- + +## CoreProtect — Live Rollback Source + +| Path | Size | Born | Last modified | +|------|------|------|---------------| +| `/data/plugins/CoreProtect/database.db` (in container) | 1.59 GB | 2026-05-01 10:11:53 | 2026-05-07 17:27 | + +CoreProtect logs container interactions, item drops, deaths, inventory changes since **2026-05-01**. For YOU500's items stored in chests/shulkers/ender chests within the world, an in-game rollback can recover them: + +- Inspect deaths: `/co lookup user:YOU500 action:#kill time:1d` +- Inspect inventory transactions: `/co inventory YOU500` (CoreProtect-CE feature) +- Rollback drops/voids near death: `/co rollback time:1h user:YOU500 radius:#global action:-drop,#kill` + +(Items YOU500 carried in person and lost to void at 17:13:39 are unlikely to appear in CoreProtect — vanilla void death deletes drops without a kill event in some versions; CoreProtect's `#kill` may or may not have logged it. Worth a `/co lookup user:YOU500 time:30m` to confirm.) + +--- + +## Best Recovery Candidate + +**File:** `/home/user/ai-lab/_archive/minecraft-old-2026-04-27.tar.gz` +**Internal path:** `minecraft/world/playerdata/c7c2df8e-8783-30b5-891c-86ec9343686b.dat` +**Snapshot date:** 2026-03-25 18:53 (~6 weeks before incident). + +### Extraction command (DO NOT RUN — for review only) + +```bash +# Extract just the YOU500 dat to a staging area, do NOT touch live data +mkdir -p /tmp/you500-recovery +tar -xzvf /home/user/ai-lab/_archive/minecraft-old-2026-04-27.tar.gz \ + -C /tmp/you500-recovery \ + minecraft/world/playerdata/c7c2df8e-8783-30b5-891c-86ec9343686b.dat \ + minecraft/world/playerdata/c7c2df8e-8783-30b5-891c-86ec9343686b.dat_old + +# Confirm and inspect (NBT viewer or zcat | strings) before any restore +ls -la /tmp/you500-recovery/minecraft/world/playerdata/ +zcat /tmp/you500-recovery/minecraft/world/playerdata/c7c2df8e-8783-30b5-891c-86ec9343686b.dat \ + | strings | grep -E 'Slot|count|minecraft:diamond|minecraft:netherite' | head -40 +``` + +### Restore plan (operator decision — NOT executed) + +1. Stop the server (or kick YOU500) so file is not held open. +2. With sudo (uid 101000 owns the file): copy the extracted `.dat` over `/home/docker/minecraft/world/playerdata/c7c2df8e-8783-30b5-891c-86ec9343686b.dat`, preserve mode/owner. +3. Also overwrite `.dat_old`. +4. Optional: replace `Essentials/userdata/c7c2df8e-…yml` from same archive if the YML matters. +5. Restart server. Player rejoins with March 25 inventory + position. + +**Tradeoff:** YOU500 will lose all progress 2026-03-25 → 2026-05-07. Communicate before applying. Combine with CoreProtect rollback to minimise loss. + +--- + +## Gaps + +- **No scheduled world backups since 2026-05-02.** `/opt/docker/backup.sh` no longer dumps `world/`. The 2026-04-30 daily contains a 12 KB "minecraft-configs" tarball (configs, not world). Action: re-add a world tarball to the daily script. +- **No off-host backup.** No restic / borg / duplicity / rsnapshot installed. No rclone. No second host pulling MC data. Syncthing does not sync the world dir. +- **No filesystem snapshots.** Root is ext4 on LVM (no LVM thinpool snapshots in use), `/home` is ext4 (no btrfs/ZFS). +- **`/var/lib/docker/volumes/` unreadable** without sudo. Confirmed via `docker volume ls | grep -iE mine|back|world` returning empty (named volumes not used for MC — bind mount only). +- **`/opt/backups/2026*_020001` subdirs unreadable** (mode 0700 root). Cannot diff their contents byte-for-byte; relied on `backup.log` text + indirect listing. 
They almost certainly contain only configs (12 KB dirs, log entries match). +- **`docker exec minecraft-mc env | grep -i backup` returned nothing** — no env-driven autosave/backup plugin enabled (e.g. `itzg/mc-backup` sidecar absent, no AutomatedBackup / EasyBackup jar in `/data/plugins`). +- **AuthMe `playerdata/` dirs are empty** in both live and `.bak-20260430-144204` — AuthMe is configured without inventory protection (no logged-out inv snapshots). +- **No InvSee / InventoryRollback plugin.** Only CoreProtect (logs, not snapshots). + +--- + +## Permission-Limited Reads (no sudo via SSH) + +| Path | What we couldn't see | Likely contents | +|------|----------------------|-----------------| +| `/opt/backups/20260504_020001/` … `20260507_020001/` | Directory listings (0700 root) | Daily configs tarballs, ~12 KB each — confirmed via `du` in backup.log | +| `/opt/backups/20260430_020001/minecraft-configs-20260430_020001.tar.gz` | tar listing (root-owned, 0600) | MC config bind-mount tarball, 12914 B | +| `/var/lib/docker/volumes/` | Directory listing | Named volumes — not used by MC (bind mount only) | +| `/var/backups/` (host) | Listing | Standard Debian dpkg/apt backups, not MC | +| `/root/` | Anything | — | + +Re-run with `sudo` if any of these need confirmation, but content is improbable to change the conclusion. diff --git a/BACKUP-STRATEGY.md b/BACKUP-STRATEGY.md new file mode 100644 index 0000000..2a69bb8 --- /dev/null +++ b/BACKUP-STRATEGY.md @@ -0,0 +1,393 @@ +# Minecraft Backup Strategy — racked.ru on nullstone + +**Status:** PROPOSAL (2026-05-07) — not yet implemented. +**Author trigger:** Player lost full inventory to void death today; rollback impossible because the existing 02:00 daily backup had **silently failed for 5 of the last 7 days** and there is **zero off-host copy**. +**Owner:** `s8n` (operator). +**Target host:** `nullstone` (192.168.0.100, Debian 13 trixie). + +--- + +## 0. Current state (audited 2026-05-07) + +Existing system in `/opt/docker/backup.sh` + `cron.d/docker-backup` (02:00 daily, 7-day retention in `/opt/backups/`). + +Findings from `/opt/backups/backup.log`: + +| Date | MC world result | Backup dir total | +|------|-----------------|------------------| +| 2026-04-26 | FAILED | — | +| 2026-04-27 | FAILED | — | +| 2026-04-28 | FAILED | — | +| 2026-04-29 | OK (3.6 G) | — | +| 2026-04-30 | FAILED | — | +| 2026-05-01 | FAILED | — | +| 2026-05-02 | OK (3.6 G) | — | +| 2026-05-03 | (no MC log line) | 8 K | +| 2026-05-04 | (no MC log line) | 8 K | +| 2026-05-05 | (no MC log line) | 8 K | +| 2026-05-06 | (no MC log line) | 12 K | +| 2026-05-07 | (no MC log line) | 12 K | + +After 2026-05-02 the entire MC block stopped emitting log lines. The script appears to be exiting before reaching it (the duplicated stray `chmod 600 ... synapse-signing-key` lines at L119–122 are orphaned from a botched edit and may now break `set -e`). Effective state: **two MC backups in the last 12 days**, both already pruned by 7-day retention. **No usable backup exists right now.** + +Cross-references: +- `_github/infra/STATE.md` Top-5 weakness #2 ("backup.sh broken silently") and #5 ("No off-host backup"). +- `_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md` §5 already names this `F-backup-1` and proposes "Restic + autorestic to B2/Wasabi or to nullstone-as-spare". This strategy refines that to use on-hand resources rather than paid storage. 
+ +### Available resources (no purchasing required) + +| Asset | Location | Free | Reachability | Role | +|---|---|---|---|---| +| nullstone `/home` | local NVMe (ext4 LVM) | 142 G of 399 G | local | Primary repo + restic cache | +| onyx `/home` | LUKS NVMe | 1.6 T of 1.9 T | Tailscale 100.64.0.1 (LAN ~5 ms) | **Off-host primary** | +| friend RTX 4080 PC | DESKTOP-LR0RILA | unknown (Windows, large) | Tailscale 100.64.0.3 (WAN, IP-stable via tailnet) | **Off-host secondary** (defer) | +| nullstone `/opt/backups` | same disk as `/opt/docker` | 142 G | local | *Not* a real backup target — same-disk SPOF | + +**No purchased B2 / Wasabi / S3 in this proposal.** Tailscale + onyx covers off-host today. B2 stays in the future-options annex. + +--- + +## 1. Threat model + +| # | Threat | Concrete example | Frequency | Mitigation in this plan | +|---|---|---|---|---| +| T1 | Player accidental loss (void death, lava, fall) | YOU500, 2026-05-07 | weekly | 5-min playerdata snapshots (RPO ≤ 5 min) | +| T2 | Griefing / theft / chest emptied by ban-evader | possible | monthly | 5-min playerdata + 1-h world snapshots | +| T3 | World corruption (chunk error, region-file truncate) | rare | — | 6-h pre-flight validated full world snapshot | +| T4 | Plugin / config bad change (LuckPerms wipe, server.properties) | edits during ops | weekly | daily configs + DB dump + git history (`live-server/` repo) | +| T5 | Host disk failure (single NVMe) | low/year | — | nightly off-host copy to onyx (Tailscale) | +| T6 | Ransomware / host compromise | low | — | append-only Restic repo on onyx; nullstone holds **no** delete key | +| T7 | Operator `rm -rf` or wrong `docker compose down -v` | low | — | retention floor (4 weekly + 12 monthly) survives a recent rm | +| T8 | Backup script silently failing (current state) | OBSERVED | — | heartbeat alert + monthly restore drill (§7) | + +T8 is the one that just bit us. The single most important addition is **alerting on missed runs**, not the storage tech. + +--- + +## 2. RPO / RTO + +| Class | Data | RPO | RTO | Backup mechanism | +|---|---|---|---|---| +| A | playerdata (`world/playerdata/*.dat`, `stats/`, `advancements/`) | **5 min** | < 2 min per player | rcon `save-all flush` → rsync to local snapshot, then restic-add | +| B | full world (region files, end + nether) | **1 h** during play, **6 h** otherwise | 15 min | restic of `world*/` | +| C | plugin configs + LuckPerms YAML | 24 h | 30 min | tar of `plugins/*/config*.yml` + LP file dump | +| D | LuckPerms / Homestead SQLite DBs (`*.db`, `homestead_data.db`) | 1 h | 5 min | sqlite `.backup` then restic-add | +| E | host-level configs (`docker-compose.yml`, `server.properties`, `purpur.yml`, `bukkit.yml`, `paper-*.yml`, `whitelist.json`, `ops.json`, `banned-*.json`, `config/`) | 24 h | 5 min | already in git repo `_github/minecraft-server/`; backup just covers drift | + +**Justification for RPO=5 min on Class A:** the void-death case rebuilds in seconds — recovering one `.dat` is a ~30 s operation if a 5-min-old snapshot exists. Snapshotting just the 1.3 MB `playerdata/` dir is cheap (single-digit MB/day after dedup). + +--- + +## 3. 
Tool choice — Restic + +Compared: + +| Tool | Dedup | Encryption | Snapshots | Network destinations | Verdict | +|---|---|---|---|---|---| +| **restic** | content-addressed, very effective on MC region files | AES-256, repo-key | yes | sftp (Tailscale), local, B2, S3, Azure, rclone | **WINNER** | +| borgbackup | similar | yes | yes | ssh only, lock-on-write | Equally good; restic chosen because operator already plans `restic + autorestic` per `infra/STATE.md` line 112; sftp dest is simpler than borg's required serverside binary | +| rsnapshot | hardlinks, no dedup | none | rotated dirs | local + rsync | No encryption ⇒ off-host copy on Tailscale (already encrypted) is fine, but no dedup means 18 G × N snapshots is painful. Reject. | +| zfs send | block-level | (zfs native) | snapshots | yes | nullstone is **ext4/LVM**, no ZFS, no btrfs. Reject. | +| LVM snapshot | COW | none | yes | local only | Same-disk only, doesn't survive disk failure. Useful as a *staging* primitive only. | +| custom rsync + cp -al | hardlinks | none | yes | yes | Reinventing rsnapshot. Reject. | +| itzg `BACKUP_*` env | tar to volume | none | rotation | local | Already tried in spirit by current `backup.sh`; same-disk; not granular. Reject as primary. | + +**Decision:** `restic` for Classes A, B, C, D. Continue using a thin tar wrapper for Class E (configs are already in the git repo, this is just safety). + +Restic strengths for our case: +- Region files dedup *very* well (chunks unchanged across snapshots). +- A 5-min Class-A snapshot adds ~MB to the repo, not the full 1.3 MB × N. +- One repo on local disk + one mirror to onyx via `rclone serve restic` or direct `sftp:` — no agent needed on onyx beyond ssh. +- `restic check --read-data-subset=5%` is the canonical scrub. + +Apt: `apt install restic` on trixie ships 0.16.x — sufficient. + +--- + +## 4. Schedule + +All times Europe/London (matches `TZ` in compose file). + +| Job | Cadence | Source | Destination | Mechanism | +|---|---|---|---|---| +| **A — playerdata** | every **5 min** | `world/playerdata/`, `world/stats/`, `world/advancements/`, `world*/level.dat`, `*.db` (LP+homestead) | restic repo `/home/user/restic/mc-frequent/` | systemd timer `mc-backup-frequent.timer` | +| **B — full world** | every **1 h** during play (07:00–01:00), **6 h** otherwise | `world/`, `world_nether/`, `world_the_end/` | restic repo `/home/user/restic/mc-world/` | systemd timer `mc-backup-world.timer` | +| **C — configs + plugins** | **daily 02:00** | `/opt/docker/minecraft/*.yml`, `*.json`, `plugins/*/config*.yml`, `plugins/LuckPerms/`, `docker-compose.yml` | restic repo `mc-world` (path-tagged) | reuse same timer with second backup target | +| **D — DB dumps** | every **1 h** | `homestead_data.db`, `plugins/CoreProtect/database.db`, `plugins/LuckPerms/luckperms-h2-*` | restic repo `mc-world` | timer hooks `sqlite3 .backup` first | +| **E — off-host mirror** | **nightly 03:30** | nullstone `/home/user/restic/` | onyx `100.64.0.1:/home/admin/backups/nullstone-mc-restic/` | `restic copy` over sftp (Tailscale) — append-only key on onyx side | +| **F — verify** | **weekly Sun 04:00** | both repos | — | `restic check --read-data-subset=5%` then alert on rc | +| **G — drill** | **monthly 1st Sat 11:00** | random snapshot | scratch dir | §7 procedure | + +### Why this works for the void-death case + +T1 hits at 18:42. By 18:45 a Class-A snapshot exists containing the player's `.dat` from 18:40. Restore: `restic -r ... 
restore --target /tmp/r --include 'world/playerdata/.dat' latest`, stop the server (or `/save-off` plus a brief manual file swap), copy the file into place, `/save-on`. Total RTO < 2 min.

---

## 5. Retention

Restic policy (passed to `restic forget --keep-*`):

```
--keep-last 24 # 24 most recent (covers 2h of 5-min snapshots)
--keep-hourly 24 # 24h of hourly
--keep-daily 7 # 7 days
--keep-weekly 4 # 4 weeks
--keep-monthly 12 # 12 months
```

Applied per tag — Class A snapshots tagged `playerdata`, B/C/D tagged `world`. Forget is run **only on the local repo**; the onyx mirror effectively inherits the policy because `restic copy` runs after the local forget+prune and therefore only transfers surviving snapshots.

### Storage budget

- Class A: 1.3 MB raw × dedup (~20× on `.dat`, mostly empty NBT slots) → ~70 KB / snapshot **net**.
  - 12/h × 24h × 7 = 2 016 snapshots/week → ≈ 140 MB/week.
- Class B/C/D: 18 G raw → ~6.5 G compressed (per the current 3.6 G figure, adjusted for nether/end now active). Restic dedup on hourly snapshots: ~50–200 MB delta/snapshot during active play.
  - 24 hourly + 7 daily + 4 weekly + 12 monthly ≈ 47 retained → estimate **15–25 GB total** at steady state.
- E (off-host): same as above on onyx (1.6 TB free — ~60× headroom).

**Conclusion:** comfortably fits in nullstone's 142 G free. Onyx is essentially unconstrained.

---

## 6. Off-host destination — onyx via Tailscale

**Choice:** `onyx` (100.64.0.1, 1.6 TB free on `/home`). Reasons:
- Already in the tailnet (`tag:admin`), already trusted, already SSH-reachable.
- 1.6 TB is ~60–100× the dataset.
- Operator's daily-driver: a missed-backup alert on onyx is *seen*.
- Deferred (phase 2): replicate to friend's RTX 4080 PC (100.64.0.3) for true geographic separation. The tailnet IP is stable across the friend's ISP IP changes per memory `project_friend_gpu`.

**Mechanics:**
1. On onyx: create restricted user `mc-backup` with `~/backups/nullstone-mc-restic/` and an `~/.ssh/authorized_keys` entry that **only allows `internal-sftp` chrooted to that dir** — no shell, no port-forward (`Match User mc-backup` + `ChrootDirectory %h` + `ForceCommand internal-sftp -d /backups/nullstone-mc-restic` in `sshd_config`).
2. On nullstone: install nullstone's ssh public key on onyx for that user. Give the nullstone job its own restic repo key (separate password) so the primary credential stays on onyx — but note restic keys are equal-privilege: a compromised nullstone could still run `forget`/`prune` over sftp. True append-only enforcement is a `rest-server --append-only` feature, not something plain sftp provides. Practical compromise: rely on `restic copy` being add-only in normal operation, keep the chroot's parent dir root-owned, and audit any `forget` runs on the mirror from the onyx side.
3. Nightly job on nullstone: `restic -r sftp:mc-backup@100.64.0.1:/backups/nullstone-mc-restic copy --from-repo /home/user/restic/mc-world latest && ... mc-frequent ...`.
4. Onyx-side cron weekly: `restic check` on the mirror (independent verification).

**Why not friend's GPU PC?** Windows host, no built-in SSH, asymmetric trust. Defer to phase 2 once an SMB or `rclone serve` target is set up there.

---

## 7. Restore drill (monthly, 1st Saturday 11:00)

Runbook: `docs/RUNBOOK-BACKUP-RESTORE.md` (created alongside this proposal).

Drill scenario: "YOU500 lost his inventory to a void death 6 minutes ago." Steps:

1. Pick a known UUID from `world/playerdata/` (operator's own UUID).
2. `restic -r /home/user/restic/mc-frequent snapshots --tag playerdata | tail -5` — confirm the freshest snapshot is ≤ 6 min old.
3. `restic -r ... 
restore latest --target /tmp/drill-$(date +%s) --include 'world/playerdata/.dat'`. +4. `nbted` or `python -m nbtlib` parse the `.dat` — confirm it's a valid GZIP NBT structure (not zero bytes, not partial). +5. `diff` against the live `.dat` — log the differences (expected: at least the inventory NBT path differs because player kept playing). +6. Repeat from the **onyx mirror** repo to prove off-host works end-to-end. +7. Log result to `docs/RUNBOOK-BACKUP-RESTORE.md` § Drill log. + +Drill is **non-destructive** — never overwrite live `.dat` during a drill. Real restores follow §3 of the runbook. + +Pass criteria: both restores complete in < 2 min wall-clock and the parsed NBT root tag is well-formed. + +--- + +## 8. Implementation — concrete drafts + +Two layers: a **fix** to the existing daily script (Class C/E) and a **new sidecar timer** for Classes A/B/D. + +### 8.1 Fix `/opt/docker/backup.sh` (F-backup-1) + +Already documented in `infra/runbooks/MIGRATION-nullstone-to-cobblestone.md` §5. Minimum work: +- Drop dead `matrix-postgres` block (Synapse retired). +- Drop / fix `mongodb` block (RC stopped 2026-05-06). +- Remove orphaned `chmod 600 ...synapse-signing-key...` block at L119–122 (causing `set -e` exit before MC block on most days). +- Wrap each module in `( ... ) || log "module FAILED"` so one module's failure doesn't skip the rest. + +Out-of-scope for this strategy doc — track in infra audit. + +### 8.2 New: `mc-backup-frequent` (Class A) and `mc-backup-world` (Classes B/C/D) + +Drop-in files (operator review before deploy): + +**`/etc/systemd/system/mc-backup-frequent.service`** +```ini +[Unit] +Description=Minecraft frequent backup (playerdata, every 5 min) +After=docker.service +Wants=docker.service + +[Service] +Type=oneshot +User=user +Group=docker +EnvironmentFile=/etc/mc-backup.env +ExecStart=/usr/local/bin/mc-backup-frequent.sh +Nice=10 +IOSchedulingClass=best-effort +IOSchedulingPriority=7 +``` + +**`/etc/systemd/system/mc-backup-frequent.timer`** +```ini +[Unit] +Description=Run mc-backup-frequent every 5 minutes + +[Timer] +OnBootSec=2min +OnUnitActiveSec=5min +AccuracySec=30s +Persistent=true + +[Install] +WantedBy=timers.target +``` + +**`/etc/mc-backup.env`** (mode 0600, owner `user:docker`) +``` +RESTIC_REPOSITORY_FREQUENT=/home/user/restic/mc-frequent +RESTIC_REPOSITORY_WORLD=/home/user/restic/mc-world +RESTIC_PASSWORD_FILE=/etc/mc-backup.pw +MC_DATA=/opt/docker/minecraft +RCON_HOST=127.0.0.1 +RCON_PORT=25575 +RCON_PASS=*redacted* +HEARTBEAT_URL=https://ntfy.s8n.ru/mc-backup-frequent +ALERT_URL=https://ntfy.s8n.ru/mc-backup-alerts +TS_OFFHOST_USER=mc-backup +TS_OFFHOST_HOST=100.64.0.1 +TS_OFFHOST_PATH=/backups/nullstone-mc-restic +``` + +**`/usr/local/bin/mc-backup-frequent.sh`** +```bash +#!/usr/bin/env bash +set -euo pipefail +. /etc/mc-backup.env + +trap 'curl -fsS -m 10 -d "fail rc=$?" "$ALERT_URL" >/dev/null || true' ERR + +# 1. Ask MC to flush via rcon (best-effort; don't fail backup if rcon down) +if command -v mcrcon >/dev/null 2>&1; then + mcrcon -H "$RCON_HOST" -P "$RCON_PORT" -p "$RCON_PASS" -w 1 \ + "save-all flush" >/dev/null 2>&1 || true +fi + +# 2. 
Snapshot just the small fast-changing things. No `|| true` here:
# a failed backup must trip the ERR trap and skip the heartbeat (§9).
restic backup \
  -r "$RESTIC_REPOSITORY_FREQUENT" \
  --tag playerdata \
  --tag auto-5min \
  --host nullstone \
  --exclude='*.lock' \
  "$MC_DATA/world/playerdata" \
  "$MC_DATA/world/stats" \
  "$MC_DATA/world/advancements" \
  "$MC_DATA/world/level.dat" \
  "$MC_DATA/world_nether/level.dat" \
  "$MC_DATA/world_the_end/level.dat" \
  "$MC_DATA/homestead_data.db" \
  "$MC_DATA/plugins/LuckPerms" \
  "$MC_DATA/plugins/CoreProtect/database.db"

# 3. Cheap retention (only on local repo)
restic forget \
  -r "$RESTIC_REPOSITORY_FREQUENT" \
  --tag auto-5min \
  --keep-last 24 --keep-hourly 24 --keep-daily 7 \
  --prune --quiet

# 4. Heartbeat — only reached on success; the ntfy-side dead-man's
# switch alerts if this is not received within 15 min
curl -fsS -m 5 "$HEARTBEAT_URL" >/dev/null || true
```

(The env file defines `RESTIC_REPOSITORY_FREQUENT`/`_WORLD`, not `RESTIC_REPOSITORY`, so every restic call passes `-r` explicitly; and the backup step is deliberately unguarded so a failure alerts instead of being swallowed.)

**`mc-backup-world.{service,timer,sh}`** — same shape: runs hourly during play / every 6 h otherwise (use `OnCalendar=*-*-* 07,08,...,01:00:00` or two timers), backs up full `world*/`, configs, and DB dumps. After the local backup, it runs:

```bash
restic copy \
  --from-repo "$RESTIC_REPOSITORY_WORLD" \
  -r "sftp:$TS_OFFHOST_USER@$TS_OFFHOST_HOST:$TS_OFFHOST_PATH" \
  latest
```

And once nightly (separate timer) the same `copy` for `mc-frequent`.

### 8.3 docker-compose.override.yml — alternative path (rejected)

Considered: the itzg image supports `BACKUP_INTERVAL`, `BACKUP_METHOD=restic`. Pros: in-container, knows when the world is loaded. Cons:
- Bind-mounting the host restic repo crosses the userns-remap boundary (uid 100000 vs host uid 1000) — already a known nullstone footgun (memory `project_nullstone_docker_userns`).
- A container restart wipes the restic cache, making the first run after every reboot slow.
- Mixing in-image and host-cron backup logic doubles the failure surface.

**Decision:** keep backups in systemd on the host; the container is unaware. An override file is **not** part of this proposal.

---

## 9. Monitoring & alerting

Three signals, all routed to ntfy on the existing self-hosted `ntfy.s8n.ru` (assumed to exist; if not, add it as part of phase 1 — a single-container deploy). DiscordSRV was dropped on 2026-04-30 per README.md L170, so Discord is not an option.

| Signal | Trigger | Channel |
|---|---|---|
| `mc-backup-frequent` heartbeat | timer fires successfully | ntfy topic `mc-backup-frequent` (silent on success) |
| Heartbeat **missing > 15 min** | dead-man's switch on the ntfy server, or external (`healthchecks.io` is free + self-hostable) | ntfy topic `mc-backup-alerts` (high priority) |
| `restic check` weekly | non-zero rc | ntfy topic `mc-backup-alerts` (high priority) |
| Off-host mirror failure | `restic copy` non-zero rc | ntfy topic `mc-backup-alerts` (high priority) |

Operator subscribes onyx + phone to `mc-backup-alerts` only. The `-frequent` topic is a heartbeat sink (not a notification stream).

**Alternative if no ntfy yet:** write to `/var/log/mc-backup.log` AND a tiny status file `/var/lib/mc-backup/last-success` (mtime checked by an external monitor — Gatus on roadmap, Beszel on roadmap). Until either of those lands, a simple cron on **onyx** doing `ssh user@nullstone 'find /var/lib/mc-backup/last-success -mmin -15 | grep .'` and triggering a desktop `notify-send` is enough.

This addresses T8 (the silent-failure threat) directly.

---

## 10. Cost & capacity

**Hardware cost:** £0. Uses the existing nullstone NVMe + onyx NVMe + the existing Tailscale mesh.
+ +**Disk consumption (steady state, both repos):** + +| Where | Estimate | Headroom | +|---|---|---| +| nullstone `/home/user/restic/mc-frequent` | < 1 GB | 142 G free → ~140× | +| nullstone `/home/user/restic/mc-world` | 15–25 GB | ~6× | +| onyx `~/backups/nullstone-mc-restic/` | 16–26 GB | 1.6 T free → ~60× | + +**Days of retention given current free space:** even if the world doubles to 36 GB raw, dedup keeps growth linear at ~5 % per snapshot — well over a year of monthly retention fits. + +**Network:** Tailscale LAN-direct (5 ms onyx ↔ nullstone). Nightly delta typically < 500 MB after dedup. Negligible. + +**Operator time:** ~2 h initial deploy, ~10 min/month for the drill, ~zero on autopilot. + +--- + +## 11. Phase plan + +| Phase | What | When | Blocker | +|---|---|---|---| +| 0 | This doc + runbook stub written, reviewed | TODAY | — | +| 1 | Stop the bleeding: fix `backup.sh` orphan lines so daily MC tar at least runs again | TODAY (15 min) | — | +| 2 | Stand up `mc-backup-frequent` timer + local restic repo (Class A) | this week | needs `apt install restic mcrcon` | +| 3 | Add `mc-backup-world` timer + Class B/C/D | this week | — | +| 4 | Onyx off-host SFTP target + `restic copy` job | this week | onyx user provisioning + ssh key | +| 5 | First monthly drill | next 1st Saturday | — | +| 6 | Wire ntfy alerts | when ntfy/Gatus deployed (infra roadmap) | external | +| 7 | Friend RTX 4080 PC as second off-host (geographic) | phase 2 | Windows-side tooling | + +Phases 1–4 are doable today with what's on hand. Nothing in phases 1–5 requires purchasing. + +--- + +## 12. Open questions for operator + +1. **ntfy.s8n.ru — does it exist yet?** Memory hints at Tuwunel + Matrix on `txt.s8n.ru`. If ntfy isn't deployed, decide: deploy ntfy *now*, or use Matrix room via Tuwunel webhook bridge as alert sink. +2. **Onyx user `mc-backup`** — create today or reuse existing `admin` with restricted authorized_keys? Restricted user is cleaner; reusing `admin` is faster. +3. **Append-only enforcement** on the onyx side — accept "sftp chroot + no shell" as good-enough, or invest in a per-repo restic key with `--no-delete`-style isolation (more work, partial mitigation only)? +4. **Pre-flight world validation** — run `region-fixer` against the latest snapshot weekly to catch silent corruption (T3)? Adds ~5 min compute weekly. Recommend yes. +5. **Class-E (host configs) — already in `live-server/` git repo via Syncthing/manual?** If yes, drop Class E from this scheme; if no, add it. + +--- + +## 13. References + +- `docs/BACKUP.md` — current (broken) state docs. +- `docs/RUNBOOK-BACKUP-RESTORE.md` — operational runbook (this commit). +- `scripts/backup.sh` — to-be-fixed daily script (F-backup-1 in `infra/STATE.md`). +- `_github/infra/STATE.md` — Top-5 weakness #2 + #5 tracking this work. +- `_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md` §5 — F-backup-1 detail; nullstone-as-spare hint. +- Memory: `project_friend_gpu` (Tailscale stable IP for friend), `project_tailscale_mesh` (mesh layout), `project_nullstone_docker_userns` (why container-side backup is rejected). +- `CLAUDE.md` Device Registry — onyx 192.168.0.28 / 100.64.0.1. diff --git a/CROSS-REFERENCE-2026-05-07.md b/CROSS-REFERENCE-2026-05-07.md new file mode 100644 index 0000000..946b4a9 --- /dev/null +++ b/CROSS-REFERENCE-2026-05-07.md @@ -0,0 +1,364 @@ + + +# Cross-Reference Survey — 2026-05-07 + +**Trigger:** racked.ru player **YOU500** void-died via AuthLimbo +`teleportAsync` failure, lost full inventory, no backups exist. 
+Four parallel agents are writing audit + plan docs. This doc maps +them onto existing infra so nothing collides or gets orphaned. + +--- + +## 1. Per-repo state snapshot + +### `auth-limbo` (Paper plugin source) + +| Field | Value | +|---|---| +| Origin | `ssh://git@192.168.0.100:222/s8n-ru/auth-limbo.git` ⚠️ stale (`s8n-ru` rename) | +| Latest tag in CHANGELOG | **1.0.0** (2026-04-30) — single release | +| Last commit | `b686380 readme: restyle to match minecraft-launcher format` | +| Recent commits | README rewrites, AGPL switch, rename chain `RackedLimbo → LoginLimbo → AuthLimbo` | +| CI | `.github/workflows/build.yml` + `release.yml` (GitHub Actions, **not** `.forgejo/`) | +| Tests | **None.** `src/test/` does not exist. | +| Source | 5 Java files: `AuthLimbo`, `AuthMeDatabase`, `LimboWorldManager`, `LoginListener`, `VoidGenerator` | +| Docs | `docs/{compatibility,configuration,how-it-works,installation}.md` | +| CHANGELOG style | **Keep a Changelog + SemVer**, date-suffixed `## [1.0.0] - 2026-04-30` | +| License | AGPL-3.0-or-later, SPDX header in every Java file | + +**Key existing detail relevant to the bug** — `LoginListener.java` +already implements the documented Paper #4085 fix (chunk-ticket pin +in `AuthMeAsyncPreLoginEvent` + `getChunkAtAsyncUrgently` chained +with `teleportAsync` at MONITOR priority on `LoginEvent`, with +configurable `authme.teleport-delay-ticks`). If YOU500 still +void-died, the bug is in **how** that chain handled a return-value +of `false` / a thrown exception — the current code only logs a +`warning` and lets the player stay wherever they were (which on +login is the limbo void). See `LoginListener.java:166-191`. + +The AuthLimbo audit agent's findings should land as: +- **`docs/INCIDENT-2026-05-07-you500.md`** (new) — forensic root-cause + doc, follow `docs/REBRAND_2026-04-30.md` style (date-prefixed, + scope/apply/result/rollback sections — convention shown below). +- **`CHANGELOG.md`** — bump to `## [1.0.1] - 2026-05-07` with + `### Fixed` block, follow Keep-a-Changelog format. +- **`src/main/java/ru/authlimbo/LoginListener.java`** — code patch. + Likely changes: handle `success == false` and `exceptionally` + with a kick or retry rather than silent log; consider raising + default `teleport-delay-ticks` from 10 → 20. +- **`src/test/`** (new directory) — unit tests for the listener. + No precedent here, but pom.xml needs JUnit added. + +--- + +### `minecraft-server` (server repo — this repo) + +| Field | Value | +|---|---| +| Origin | `ssh://git@192.168.0.100:222/s8n-ru/minecraft-server.git` ⚠️ stale | +| Last commit | `ede6029 proantitab: allow lp/luckperms in global; deny essentials.motd default` | +| Top-level docs | `MISSION.md`, `README.md`, `RULES.md`, `THANKS.md`, `VIBE.md`, `TELEMETRY_AUDIT.md` | +| `docs/` | `BACKUP.md`, `DEPLOY.md`, `PERMISSIONS.md`, `PLUGINS.md`, `PLUGIN_ALTERNATIVES.md`, `RACKED_BRAND.md`, `REBRAND_2026-04-30.md`, `ROADMAP.md`, `migrations/lands-to-landclaim.md`, `plugins/.md` (20 files) | +| Existing TODO | The README "Roadmap / TODO" section (lines 91-180) is the canonical living checklist. Tagged `[P0]` blocker / `[P1]` vision / `[P2]` improvement / `[P3]` nice-to-have. `docs/ROADMAP.md` is **scoped narrowly** to plugin-acquisition overhaul (Phases 1-3). | +| `live-server/` | live config snapshot (purpur.yml, server.properties, ops.json, plugins/) — **mirrors prod state**, not a build input. 
| +| Backup script | `scripts/backup.sh` — note **bug at line 119** (orphaned `"${BACKUP_PATH}/synapse-signing-key-${TIMESTAMP}.key"` block sits outside any `if`, will fail at runtime if signing-key path absent) | +| CI | `.github/workflows/` is empty. `.github/ISSUE_TEMPLATE/` empty. No `.forgejo/`. | + +**No existing files named** `AUDIT*`, `INCIDENT*`, `RUNBOOK*`, +`TODO*`, `CHANGELOG*` at root or in `docs/`. The closest precedents: +- `docs/REBRAND_2026-04-30.md` — date-prefixed event log w/ + Apply/Side incident/Rollback sections. **Use this as the format + template for any new INCIDENT-* doc.** +- `docs/migrations/lands-to-landclaim.md` — multi-section migration + plan (Current State / Target / Plan / Rollback). Format template + for future strategy docs. +- `MISSION.md` / `VIBE.md` / `RULES.md` — top-level "values" docs. + Don't add new top-level capitalised md files unless the doc is + similarly load-bearing for the project's identity. Detail goes in + `docs/`. + +--- + +### `infra` (nullstone+cobblestone runbooks) + +| Field | Value | +|---|---| +| Origin | `ssh://git@192.168.0.100:222/veilor-org/infra.git` ✅ org-scoped, no rename impact | +| Last commit | `381f923 runbook: distribute load + sync data (operator's HA vision)` | +| Layout | `forgejo/`, `runbooks/`, `repos/`, root `STATE.md` + `AUDIT-2026-05-05.md` | +| Runbooks | `COBBLESTONE-INTAKE.md`, `DE-DECISION-cobblestone.md`, **`HA-CLUSTER-distribute-and-sync.md`** (already covers MC backup placement!), `MIGRATION-nullstone-to-cobblestone.md` | + +**Critical pre-existing context:** +- `STATE.md` already lists *"`/opt/docker/backup.sh` fixes — + matrix-postgres + rocketchat-mongodb + literal CHANGE_ME pw"* as + open issue (line 97), AND lists Restic+autorestic as the **#1** + recommended addition (lines 113, 283-285 of `AUDIT-2026-05-05.md`). +- `runbooks/HA-CLUSTER-distribute-and-sync.md` line 51 already plans + *"Backups (offsite) — Restic to B2/Wasabi nightly"* and line 72 + pins MC to nullstone with *"World data ZFS-replicated for DR + only"*. The backup-strategy agent's plan must reconcile with this + — don't propose a parallel scheme; either extend the HA runbook or + cross-link it as the parent design. +- `AUDIT-2026-05-05.md` lines 200-203 already flag the backup script + as silently broken (RC + ex-Matrix not dumping). Confirms the + symptom that caused YOU500's loss. + +**Format conventions in `infra/`:** +- Audit reports: `# 5-Agent Audit Report — YYYY-MM-DD` header, + TL;DR section, severity-ordered Action items section, file index. +- Runbooks: `# Runbook — ` header, Goal blockquote, North-star + diagram if applicable, phase plan, failure scenarios + RTO table, + open decisions, related links. +- Dating: filenames always `-YYYY-MM-DD.md`. + +--- + +### `minecraft-launcher` + +| Field | Value | +|---|---| +| Origin | `ssh://git@192.168.0.100:222/s8n-ru/minecraft-launcher.git` ⚠️ stale | +| Last commit | `31d25f8 readme: shrink license section to single sub line` | +| Relevance to incident | None direct. Would only matter if the incident agent recommends a launcher-side patch (e.g. forced relog on void death detection) — unlikely. | + +### `minecraft-client` + +**Not a git repo** (`fatal: not a git repository`). No remote to +worry about. Excluded from any rewrite list. + +### `veilor-os` + +| Field | Value | +|---|---| +| Origin | `ssh://git@192.168.0.100:222/veilor-org/veilor-os.git` ✅ no rename impact | +| Relevance | None — separate brand (security distro), not Minecraft. Skipped per instructions. 
| + +--- + +## 2. Stale `s8n-ru` origin URLs (per 2026-05-07 rename) + +Per workspace memory `user_git_identity.md` the Forgejo user `s8n-ru` +was renamed to `s8n` on 2026-05-07. Forgejo serves a 307 redirect for +now but the canonical path is `s8n/`. The following local +clones still have the old origin: + +| Repo (local clone) | Current origin | Should become | +|---|---|---| +| `_github/auth-limbo` | `ssh://git@192.168.0.100:222/s8n-ru/auth-limbo.git` | `ssh://git@192.168.0.100:222/s8n/auth-limbo.git` | +| `_github/minecraft-server` | `ssh://git@192.168.0.100:222/s8n-ru/minecraft-server.git` | `ssh://git@192.168.0.100:222/s8n/minecraft-server.git` | +| `_github/minecraft-launcher` | `ssh://git@192.168.0.100:222/s8n-ru/minecraft-launcher.git` | `ssh://git@192.168.0.100:222/s8n/minecraft-launcher.git` | + +**No rename required for:** `_github/infra` (`veilor-org/`), +`_github/veilor-os` (`veilor-org/`), `_github/minecraft-client` (not +a repo). + +Recommended one-shot fix (deferred — not part of these four agents): + +```bash +for r in auth-limbo minecraft-server minecraft-launcher; do + cd /home/admin/ai-lab/_github/$r + git remote set-url origin ssh://git@192.168.0.100:222/s8n/$r.git +done +``` + +Also update the in-doc URL references: +- `auth-limbo/src/main/resources/plugin.yml` line 7: `website: https://github.com/s8n-ru/auth-limbo` +- `auth-limbo/src/main/java/ru/authlimbo/*.java` SPDX header: `Copyright (C) 2026 s8n-ru` +- `minecraft-server/VIBE.md` line 38: `github.com/s8n-ru/auth-limbo` + +--- + +## 3. Overlap with session-noted TODO items + +The session noted these TODOs that the four agents may want to fold +into recommendations. State as of HEAD: + +| Item | Existing mention? | Where | Status | +|---|---|---|---| +| **SHA256 → BCRYPT** (AuthMe hashing) | ✅ flagged 2026-05-02 | `security/nullstone-server/2026-05-02-mc-audit.md` summary: *"AuthMe also uses unsalted SHA-256, no tempban, no captcha, and 5-char minimum passwords"* | **Not yet addressed in repo.** No TODO entry in README. New. | +| **EZShop drop** | ⚠️ Plugin loaded via `PLUGINS:` in `docker-compose.yml:51` | docker-compose.yml | No TODO entry yet. New. | +| **CapDrop** (Linux capabilities) | ❌ No mention | — | Net-new infra-side item (deploy.security level). Belongs in server-audit agent's report. | +| **tracking-range** | ❌ No mention | — | Net-new (purpur.yml tuning). New. | +| **CO DB → MySQL** (CoreProtect) | ❌ No mention | — | Net-new. Touches plugin policy (CoreProtect-CE is the one acknowledged license exception per MISSION.md — CO config change OK, plugin swap not). | +| **TPS webhook** | ⚠️ "Prometheus exporter + Grafana" entry exists in README:105 (P2). Webhook would be lighter-weight alternative. | README.md:105 | Adjacent to existing TODO; consider replacing or augmenting it. | +| **spark baseline** | ✅ spark already loaded in `PLUGINS:` (compose:54) and listed in VIBE.md:78 | docker-compose.yml, VIBE.md | "Baseline" = capture a profiling run for ref. Net-new. | +| **plugin folder cleanup** | ⚠️ `live-server/plugins/` is checked-in live config snapshot. Past cleanup happened in REBRAND_2026-04-30 (Side incident — disk full). | docs/REBRAND_2026-04-30.md:65-74 | Operational, not docs. Net-new. | + +**None of the eight overlap with the existing `docs/ROADMAP.md`** +(which is scoped narrowly to *plugin-acquisition* — manifest + +lockfile + CI). They all belong in the **README.md "Roadmap / TODO" +checklist** by current convention. 
The server-audit agent should append them there, not create a new ROADMAP-* doc.
+
+---
+
+## 4. Existing backup-related mentions
+
+| File | Line | Content |
+|---|---|---|
+| `docs/BACKUP.md` | all | Documents the daily 02:00 cron + retention. **Critical drift:** describes worlds being backed up, but VIBE.md:54-58 says *"no world backups"*. Direct contradiction. |
+| `scripts/backup.sh` | 80-117 | Minecraft block: docker-exec tar of world/world_nether/world_the_end + configs. **Real, working code.** |
+| `scripts/backup.sh` | 119-122 | **Orphaned dead-code block** outside any `if` (dangling from `synapse-signing-key`). Will fail the script if the signing-key path is missing. |
+| `README.md` | 23, 45, 164, 179 | Mentions the backup feature. README:179 records "freed 11G+ (old backups, ...)". |
+| `VIBE.md` | 54-58 | *"Daily configs, no world backups (it'd eat too much disk). If you lose a base to grief, that's the game."* — **conflicts with reality.** |
+| `docs/REBRAND_2026-04-30.md` | 53, 65-74 | Records the 2026-04-30 backup tarball and the 2026-05-01 disk-full incident from accumulated backups. Confirms backups *were* running. |
+| `SYSTEM.md` | 737-749 | Workspace-level system reference says backups run daily, ~5-7GB compressed. Out-of-date plugin count (says 25, actual ~16) and Purpur version (says 1.21.10, actual 1.21.11). |
+
+**Major contradiction the backup-strategy agent must resolve:** either VIBE.md must drop the *"no world backups"* line (recommended — reality is that worlds **are** being backed up), or the operator must accept that the YOU500 loss happened because the worlds were **logically excluded from the policy** even though they were mechanically being archived. The latter is unlikely — a daily 02:00 tarball would have caught a 2026-05-07 daytime void death.
+
+**Backup-hunt agent finding to verify:** does `/opt/backups/` on nullstone actually contain any usable `mc-world-backup-*.tar.gz` files? `STATE.md` line 97 and `AUDIT-2026-05-05.md` lines 200-203 suggest the script *runs* but that its other arms fail silently; the MC arm at lines 80-117 of backup.sh has no obvious bug, so backups should exist. If they don't, that's the deepest finding.
+
+---
+
+## 5. Forgejo runner / CI integration
+
+Per memory `project_forgejo_nullstone.md` and `STATE.md` lines 26-27, nullstone runs a Forgejo runner with labels `ubuntu-24.04 + nullstone`. **No repo currently has a `.forgejo/` directory** — neither auth-limbo nor minecraft-server nor infra. CI in `auth-limbo` is GitHub Actions (`.github/workflows/`).
+
+`STATE.md` lines 121-129 note that the v0.5.32 veilor-os ship is blocked on flipping `runs-on:` to `nullstone` to use the Forgejo runner.
+
+**Implication for the audit agents:** if the AuthLimbo agent wants the fix to land via CI, there are two options:
+1. Keep `.github/workflows/build.yml`. The GH mirror is manual-only post-2026-05-06 (`STATE.md`:14-18), so the workflow won't trigger automatically anymore; it would need a manual mirror push.
+2. Migrate to `.forgejo/workflows/build.yml` with `runs-on: ubuntu-24.04` (compatible with the runner). Cleaner, matches the new direction. **Recommended.**
+
+If path 2 is taken, the pre-existing dependency on the `AUTHME_JAR_URL` repo secret (see `.github/workflows/build.yml:21-26`) must be re-created on Forgejo.
+
+---
+
+## 6. Workspace-level `SYSTEM.md` updates needed after backup-strategy lands
+
+`/home/admin/ai-lab/SYSTEM.md` lines 665-779 hold the canonical workspace-level Minecraft section.
After the backup-strategy doc lands, the following blocks need editing (one PR, one paragraph each):
+
+| SYSTEM.md location | Existing content | Drift |
+|---|---|---|
+| Line 677 | "Minecraft Version: 1.21.10 (Purpur build 2532)" | Actual: 1.21.11 (compose line 10) |
+| Lines 686-690 | "25 plugins loaded ... bulk-updated 2026-04-17" | Plugin set has shifted heavily since (LandClaimPlugin → Homestead, WorldEdit → FAWE, Vault → VaultUnlocked, LoginSecurity → AuthMe, AuthLimbo added, EZShop+AuctionHouse added). Real count ≈ 16. |
+| Lines 692-706 | RAM 7GB idle, Purpur 1.21.10-2535, startup 47s | Out of date; would benefit from a re-measure as part of the "spark baseline" TODO. |
+| Lines 765-771 | "Known Issues" block | Add a YOU500 incident closure note (post-fix); F10 RCON wildcard is already promised in Wave 2. |
+| Line 776 | "Backup frequency: Add 6-hourly world snapshots for active play sessions" | This is the existing wishlist item the backup-strategy agent will likely satisfy. Strike, or replace with "Done — see infra/runbooks/MC-BACKUP-2026-05-07.md" (or wherever the strategy lands). |
+
+**Per `CLAUDE.md` workspace rules**, technical detail belongs in SYSTEM.md, not README.md. The README device-table line for nullstone won't change.
+
+---
+
+## 7. Integration recommendations — where each parallel agent's doc lands
+
+| Agent | Output should land at | Rationale |
+|---|---|---|
+| **Backup hunt** (find existing backups) | `_github/minecraft-server/docs/INCIDENT-2026-05-07-you500-backup-hunt.md` | Date-prefixed, follows REBRAND_2026-04-30.md format. Forensic in nature → minecraft-server `docs/`. |
+| **AuthLimbo audit** (root-cause + code patch) | (1) `_github/auth-limbo/docs/INCIDENT-2026-05-07-teleportasync-failure.md` for the forensic write-up; (2) source patch + `CHANGELOG.md` bump in the same repo; (3) optional cross-link from `minecraft-server/docs/INCIDENT-2026-05-07-you500-backup-hunt.md` | Plugin source repo owns plugin bugs. INCIDENT- naming convention matches REBRAND_*.md. |
+| **Backup strategy** (forward-looking design) | `_github/infra/runbooks/MC-BACKUP-strategy-2026-05-07.md` (or extend `HA-CLUSTER-distribute-and-sync.md` with a Phase 1.5 sub-section) | infra owns the nullstone-side cron + restic. Cross-link from `minecraft-server/docs/BACKUP.md` (replace its current contents with a thin pointer). |
+| **Server audit** (broader hardening — CapDrop, plugin folder, MySQL, etc) | `_github/minecraft-server/docs/AUDIT-2026-05-07.md` (synthesis), then **append individual TODOs to README.md "Roadmap / TODO"** | Matches the `infra/AUDIT-2026-05-05.md` precedent. README is the canonical TODO surface for this repo per existing convention. |
+
+**Files needing edits AFTER all four agents finish:**
+
+| File | Change |
+|---|---|
+| `_github/minecraft-server/README.md` | Append new TODO entries from the server-audit agent: SHA256→BCRYPT, EZShop drop, CapDrop, tracking-range, CO MySQL, TPS webhook, spark baseline, plugin folder cleanup. Add `[x]` for the YOU500 incident under "Done" once the fix ships. |
+| `_github/minecraft-server/docs/BACKUP.md` | Rewrite to point to the infra runbook; current Schedule/Strategy/Manual sections move to infra. Or replace contents with a thin "see infra/runbooks/MC-BACKUP-strategy-2026-05-07.md". |
+| `_github/minecraft-server/VIBE.md` | Drop or revise lines 54-58 — *"no world backups"* contradicts reality and is the philosophical claim that may have justified treating backups as low-priority. Important narrative fix. |
+| `_github/minecraft-server/scripts/backup.sh` | Fix the orphaned dead-code block at lines 119-122. Independent of the strategy agent's output. |
+| `_github/minecraft-server/docker-compose.yml` | If the EZShop drop is accepted: remove line 51. (Server-audit agent decision.) |
+| `_github/auth-limbo/CHANGELOG.md` | New `## [1.0.1] - 2026-05-07` entry. |
+| `_github/auth-limbo/pom.xml` | Version bump 1.0.0 → 1.0.1 if the patch ships. |
+| `_github/auth-limbo/src/main/java/ru/authlimbo/LoginListener.java` | Code fix per the AuthLimbo agent. |
+| `_github/infra/STATE.md` | Add a 2026-05-07 changelog entry referencing the incident; check off the "/opt/docker/backup.sh fixes" pending decision (line 97) once the backup script is repaired. |
+| `_github/infra/AUDIT-2026-05-05.md` | Append an addendum or leave dated; the new audit replaces/augments the F-numbered findings related to MC backups. |
+| `/home/admin/ai-lab/SYSTEM.md` | Update the Minecraft section per §6 above. Add a note in Known Issues (line 765). Update Last Updated. |
+| `/home/admin/ai-lab/README.md` | "Last Updated" stamp; a one-line status mention if the user wants it surfaced at workspace level. |
+
+---
+
+## 8. Open conflicts and duplications
+
+1. **VIBE.md vs reality** (most important narrative conflict). VIBE says no world backups; backup.sh + BACKUP.md + REBRAND_2026-04-30 prove worlds **are** archived nightly. The YOU500 inventory loss means either (a) backups didn't run that day, (b) a backup ran but the rollback isn't operationally feasible (it would lose other players' progress between 02:00 and the death), or (c) the operator chose not to roll back. **The backup-strategy agent must address this explicitly** rather than just propose a new scheme.
+
+2. **`docs/ROADMAP.md` scope vs README "Roadmap / TODO"** — the docs file is narrowly about plugin-acquisition Phases 1-3, while the README has the all-up living checklist. Future agents should not put generic TODO items into `docs/ROADMAP.md`. Keep its scope tight or rename it `docs/PLUGIN-ACQUISITION-ROADMAP.md`.
+
+3. **infra `HA-CLUSTER-distribute-and-sync.md` vs new MC-backup strategy** — there's a real risk the backup-strategy agent designs Restic-to-B2 in isolation while HA-CLUSTER already plans that exact service for both nullstone+cobblestone. The strategy doc must reference and extend the HA-CLUSTER plan (specifically the "Backups (offsite)" row in its layer table, line 51).
+
+4. **CoreProtect MySQL migration** — proposed in the session TODOs. `MISSION.md:24` codifies CoreProtect-CE as "the one acknowledged license exception". Switching its DB backend to MySQL is fine under that policy (config, not plugin swap), but the server-audit agent should explicitly note "this is a config change, not a plugin swap, so MISSION.md:24 still holds" so the policy isn't accidentally diluted.
+
+5. **AuthLimbo CI host** — `.github/workflows/` lives in the repo but the GH push-mirror is off as of 2026-05-06. Builds will only run if someone manually pushes to GH. Worth flagging to the AuthLimbo agent that any CI step they propose may need a `.forgejo/` variant, otherwise the patched 1.0.1 release won't auto-build.
+
+6. **`_github/minecraft-client` is not a git repo** — nothing to worry about for this incident, but anyone iterating on the incident later may try to commit something there expecting it to work. Worth recording.
+
+---
+
+## 9. Summary table — convention by repo
+
+| Repo | Audit doc convention | Incident doc convention | TODO surface | CHANGELOG style |
+|---|---|---|---|---|
+| `auth-limbo` | (none yet) | (none yet — recommend `docs/INCIDENT-YYYY-MM-DD-<slug>.md`) | (none — small repo) | Keep a Changelog + SemVer, `## [X.Y.Z] - YYYY-MM-DD` |
+| `minecraft-server` | (none yet — recommend `docs/AUDIT-YYYY-MM-DD.md` matching infra style) | follow `docs/REBRAND_2026-04-30.md` template | README "Roadmap / TODO" with `[P0..P3]` tags | (none — uses git log) |
+| `infra` | `AUDIT-YYYY-MM-DD.md` at root | (use runbooks for forward-looking; no incident files yet) | `STATE.md` "Pending decisions" table | (none — uses git log + STATE.md) |
+| `minecraft-launcher` | n/a | n/a | (none) | (none) |
+| `veilor-os` | (separate brand — out of scope) | — | — | — |
+
+---
+
+*End of survey. Read-only. No files modified. No commits pushed.*
diff --git a/docs/RUNBOOK-BACKUP-RESTORE.md b/docs/RUNBOOK-BACKUP-RESTORE.md
new file mode 100644
index 0000000..c8b467d
--- /dev/null
+++ b/docs/RUNBOOK-BACKUP-RESTORE.md
@@ -0,0 +1,156 @@
+# Runbook — Backup & Restore (Minecraft, racked.ru on nullstone)
+
+Strategy doc: [`../BACKUP-STRATEGY.md`](../BACKUP-STRATEGY.md). This runbook is the **operator-facing** procedure for the three scenarios that come up in practice. Keep it short, copy-paste-able, and reachable from the player support workflow.
+
+> **Status (2026-05-07):** This runbook is written **ahead** of the implementation it describes. The `mc-backup-frequent` timer and onyx mirror are NOT yet deployed. The "What if no snapshot exists yet?" section at the bottom covers today's reality.
+
+---
+
+## TL;DR — restore one player's `.dat` from N minutes ago
+
+```bash
+# On nullstone, as `user`:
+PUUID=      # e.g. from /opt/docker/minecraft/usercache.json
+PUUID_NICK= # the player's current nick; `kick` takes a name, not a UUID
+WHEN=latest # or a snapshot id from `restic snapshots` (restic does not
+            # parse relative times like "5 min ago")
+RESTIC_PASSWORD_FILE=/etc/mc-backup.pw \
+restic -r /home/user/restic/mc-frequent \
+  restore "$WHEN" \
+  --target /tmp/restore-$$ \
+  --include "world/playerdata/${PUUID}.dat"
+
+# Verify the file is well-formed NBT before applying:
+file /tmp/restore-$$/opt/docker/minecraft/world/playerdata/${PUUID}.dat
+# Expected: "gzip compressed data"
+
+# Apply (server must be running so playerdata is writable; the player
+# MUST be offline or we're racing the writer):
+mcrcon -H 127.0.0.1 -P 25575 -p *redacted* "kick ${PUUID_NICK} Restore in progress"
+mcrcon -H 127.0.0.1 -P 25575 -p *redacted* "save-off"
+mcrcon -H 127.0.0.1 -P 25575 -p *redacted* "save-all flush"
+
+cp /opt/docker/minecraft/world/playerdata/${PUUID}.dat \
+   /opt/docker/minecraft/world/playerdata/${PUUID}.dat.preFix-$(date +%s)
+cp /tmp/restore-$$/opt/docker/minecraft/world/playerdata/${PUUID}.dat \
+   /opt/docker/minecraft/world/playerdata/${PUUID}.dat
+chown 100000:100000 /opt/docker/minecraft/world/playerdata/${PUUID}.dat # userns-remap
+
+mcrcon -H 127.0.0.1 -P 25575 -p *redacted* "save-on"
+# Tell the player to log back in.
+```
+
+**Why kick + `save-off`:** if the player is online, the server holds their NBT in memory and rewrites the `.dat` on the next save tick — clobbering the restore. `save-off` halts auto-save; kicking guarantees the in-memory state for that player won't be flushed.
+
+**Userns-remap reminder:** the host sees container-uid `100000` for files written by the MC process. Restored files written by `user` (uid 1000) will appear empty/permission-denied to the container. Always `chown 100000:100000` (or `chmod 666`) after restore. Memory: `project_nullstone_docker_userns`.
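+
+The apply steps above are the part most easily fumbled mid-incident. A minimal helper sketch, assuming the same paths and userns uid as the TL;DR block (`verify_and_apply_dat` is a hypothetical name, not a deployed script):
+
+```bash
+# Sketch only — same layout/uid assumptions as the TL;DR block above.
+verify_and_apply_dat() {
+  local src=$1 dst=$2
+  # .dat files are gzip-compressed NBT; refuse anything that fails a gzip check.
+  gzip -t "$src" || { echo "refusing: $src failed gzip test" >&2; return 1; }
+  cp "$dst" "${dst}.preFix-$(date +%s)"   # keep the clobbered state for forensics
+  cp "$src" "$dst"
+  chown 100000:100000 "$dst"              # userns-remap: container-side uid
+}
+```
+
+Called between `save-all flush` and `save-on` with the restored file as `$src` and the live file as `$dst`, it replaces the two `cp` lines and the `chown` above.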
+
+---
+
+## Scenario 1 — Player lost inventory (T1, the void-death case)
+
+This is what the strategy was written for. RTO target: **< 2 minutes**.
+
+1. Find the UUID:
+   ```bash
+   grep -i 'NICK' /opt/docker/minecraft/usercache.json
+   ```
+2. Pick a snapshot just **before** the loss. `restic snapshots --tag playerdata` shows timestamps.
+3. Run the TL;DR block above with that snapshot id (or `latest` if the loss happened in the last 5 min).
+4. Inform the player: "Your inventory from HH:MM has been restored. Anything you picked up after that point is gone."
+5. Log the incident: append to `docs/INCIDENTS.md` (create if absent) — date, player, snapshot id, cause.
+
+---
+
+## Scenario 2 — Whole world rolled back (T2/T3, griefing or corruption)
+
+RTO target: **15 minutes**. Server downtime expected.
+
+1. Announce, kick, stop:
+   ```bash
+   mcrcon ... "say Server going down for restore — back in ~15 min"
+   mcrcon ... "kick @a Restore in progress"
+   cd /opt/docker/minecraft && docker compose down
+   ```
+2. Move live data aside (do not delete):
+   ```bash
+   mv /opt/docker/minecraft /opt/docker/minecraft.broken-$(date +%F)
+   mkdir -p /opt/docker/minecraft
+   ```
+3. Restore from the world repo:
+   ```bash
+   RESTIC_PASSWORD_FILE=/etc/mc-backup.pw \
+   restic -r /home/user/restic/mc-world \
+     restore latest --target /tmp/world-restore
+   rsync -aHAX /tmp/world-restore/opt/docker/minecraft/ /opt/docker/minecraft/
+   ```
+4. **Re-apply userns-remap perms** (critical — see memory):
+   ```bash
+   chmod -R 777 /opt/docker/minecraft   # quickfix; or chown -R 100000:100000
+   ```
+5. Boot:
+   ```bash
+   cd /opt/docker/minecraft && docker compose up -d
+   docker logs -f minecraft-mc   # watch for the "Done" line
+   ```
+6. Verify by parsing a known-good UUID's `.dat`, then announce the server is up.
+7. Keep `minecraft.broken-YYYY-MM-DD/` for at least 7 days for forensic comparison.
+
+---
+
+## Scenario 3 — Host disk dead (T5)
+
+RTO target: **a few hours, depending on the hardware swap**.
+
+1. New host: install Debian 13 + Docker per `_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md`.
+2. `apt install restic`. Pull the password from the operator's password manager into `/etc/mc-backup.pw`.
+3. Initialise the destination dir, then restore from the **onyx mirror** (not local — local is gone):
+   ```bash
+   restic -r sftp:mc-backup@100.64.0.1:/backups/nullstone-mc-restic \
+     restore latest --target /tmp/world-restore
+   ```
+4. Rsync into place as in Scenario 2 step 3, then continue Scenario 2 from step 4.
+5. Stand up the timers on the new host. **Do not** point them at the same off-host repo until the new host has been re-keyed (rotate restic passwords as part of disaster recovery).
+
+---
+
+## Drill log (monthly)
+
+| Date | Operator | Snapshot age | Class A restore time | Off-host restore time | Result |
+|------|----------|--------------|----------------------|------------------------|--------|
+| (first drill — 2026-06-06) | s8n | TBD | TBD | TBD | TBD |
+
+Procedure: see `BACKUP-STRATEGY.md` §7.
+
+---
+
+## What if no snapshot exists yet? (CURRENT REALITY 2026-05-07)
+
+Until phases 1–4 of `BACKUP-STRATEGY.md` are deployed, the only recovery resources are:
+
+| Source | What's there | Recoverable? |
+|---|---|---|
+| `/opt/backups/202604xx_020001/mc-world-backup-*.tar.gz` | World tars from Apr 29 + May 2 (others FAILED) | **GONE** — pruned by 7-day retention |
+| `/opt/backups/mc-plugins-prerebrand-2026-04-30.tar.gz` | Plugin jars only, no world | Not useful for player data |
+| Live `/opt/docker/minecraft/world/playerdata/<uuid>.dat_old` | MC's own `.dat_old` shadow file from the previous save | **YES** — last save tick before the current one. **First-line defence right now.** |
+| CoreProtect DB (`plugins/CoreProtect/database.db`) | Block + container actions, NOT inventory state | Partial — can roll back grief, can't restore lost items |
+
+**Today's playbook for inventory-loss reports:**
+
+1. Server console → `co lookup u:NICK` to confirm the loss event in CoreProtect.
+2. **Stop the server immediately** if the report comes in within the same play session — every save tick overwrites `.dat_old`. `docker compose down` buys time.
+3. Inspect `world/playerdata/<uuid>.dat_old` — if it predates the loss, copy it over `.dat`, fix perms (uid 100000), restart. (A scripted sketch of this rescue is in the appendix below.)
+4. If `.dat_old` is too new (already overwritten): **the loss is unrecoverable until BACKUP-STRATEGY phases 1–4 are deployed.** Apologise to the player. Spawn-in compensation per operator discretion (ops creative-mode replacement is the customary remedy).
+5. Log the incident — it adds urgency to deploying the new strategy.
+
+---
+
+## TODO — open items (links into BACKUP-STRATEGY.md §11)
+
+- [ ] Phase 1: fix `/opt/docker/backup.sh` orphan-line bug (F-backup-1).
+- [ ] Phase 2: deploy `mc-backup-frequent.timer` (Class A, 5-min playerdata).
+- [ ] Phase 3: deploy `mc-backup-world.timer` (Class B/C/D, hourly).
+- [ ] Phase 4: provision the `mc-backup` user on onyx + the `restic copy` job.
+- [ ] Phase 5: schedule a monthly drill calendar entry, run the first drill.
+- [ ] Phase 6: ntfy / Matrix alert wiring (depends on ntfy deployment).
+- [ ] Phase 7: friend's RTX 4080 PC as a secondary off-host.
+- [ ] Verify `usercache.json` on this host: confirm the UUID lookup workflow above resolves to the right `.dat`.
+- [ ] Decide: `mcrcon` package vs a lightweight Python `mcrcon` lib.
+- [ ] Document the compensation policy for unrecoverable losses (operator discretion right now).
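+
+---
+
+## Appendix — `.dat_old` rescue, scripted (sketch)
+
+A sketch of playbook steps 2-3 in script form, under the same assumptions as the rest of this runbook (compose dir `/opt/docker/minecraft`, userns uid `100000`). Untested; treat it as a starting point, not a deployed tool:
+
+```bash
+#!/usr/bin/env bash
+# Promote MC's own <uuid>.dat_old shadow copy after an inventory-loss report.
+set -euo pipefail
+UUID=${1:?usage: dat-old-rescue <uuid>}   # UUID from usercache.json
+PD=/opt/docker/minecraft/world/playerdata
+
+cd /opt/docker/minecraft
+docker compose down                        # stop save ticks before .dat_old is overwritten
+
+ls -l --time-style=full-iso "$PD/$UUID".dat*   # operator eyeballs: is .dat_old old enough?
+gzip -t "$PD/$UUID.dat_old"                    # sanity: .dat files are gzip-compressed NBT
+
+cp "$PD/$UUID.dat" "$PD/$UUID.dat.preFix-$(date +%s)"   # keep the broken state
+cp "$PD/$UUID.dat_old" "$PD/$UUID.dat"
+chown 100000:100000 "$PD/$UUID.dat"            # userns-remap uid
+
+docker compose up -d
+```
+
+The mtime check stays manual on purpose: a timestamp only proves `.dat_old` is older than the report, not that it predates the loss itself.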