# Runbook — Backup & Restore (Minecraft, racked.ru on nullstone)

Strategy doc: [`../BACKUP-STRATEGY.md`](../BACKUP-STRATEGY.md). This runbook is the **operator-facing** procedure for the three scenarios that come up in practice. Keep it short, copy-paste-able, and reachable from the player support workflow.

> **Status (2026-05-07):** Phase 1 (the daily `/opt/docker/backup.sh` MC world tarball) is **deployed and verified** — see "Phase 1 deployment" section near the bottom. Phase 2 (`mc-backup-playerdata.timer`, 5-min cadence) and the onyx off-host mirror are NOT yet deployed; deployment steps in "Phase 2 deployment" below. Until Phase 2 lands, the daily 02:00 tarball is the only safety net (RPO up to 24h).

---

## TL;DR — restore one player's `.dat` from N minutes ago

```bash
# On nullstone, as `user` (RCON_PASSWORD must already be set in this shell):
PUUID=        # player's UUID, e.g. from /opt/docker/minecraft/usercache.json
PUUID_NICK=   # player's in-game name (`kick` takes a name, not a UUID)
WHEN=latest   # or a snapshot ID from `restic snapshots`; restic does not accept "5 min ago"

RESTIC_PASSWORD_FILE=/etc/mc-backup.pw \
  restic -r /home/user/restic/mc-frequent \
    restore "$WHEN" \
    --target /tmp/restore-$$ \
    --include "world/playerdata/${PUUID}.dat"

# Sanity-check the restored file before applying (player .dat files are gzipped NBT):
file "/tmp/restore-$$/opt/docker/minecraft/world/playerdata/${PUUID}.dat"
# Expected: "gzip compressed data"

# Apply (server must be running so playerdata is writable; the player
# MUST be offline or we're racing the writer):
mcrcon -H 127.0.0.1 -P 25575 -p "${RCON_PASSWORD}" "kick ${PUUID_NICK} Restore in progress"
mcrcon -H 127.0.0.1 -P 25575 -p "${RCON_PASSWORD}" "save-off"
mcrcon -H 127.0.0.1 -P 25575 -p "${RCON_PASSWORD}" "save-all flush"
cp "/opt/docker/minecraft/world/playerdata/${PUUID}.dat" \
   "/opt/docker/minecraft/world/playerdata/${PUUID}.dat.preFix-$(date +%s)"
cp "/tmp/restore-$$/opt/docker/minecraft/world/playerdata/${PUUID}.dat" \
   "/opt/docker/minecraft/world/playerdata/${PUUID}.dat"
# userns-remap: chown to another uid needs root; without root, use chmod 666 instead
chown 100000:100000 "/opt/docker/minecraft/world/playerdata/${PUUID}.dat"
mcrcon -H 127.0.0.1 -P 25575 -p "${RCON_PASSWORD}" "save-on"
# Tell the player to log back in.
```

**Why kick + `save-off`:** if the player is online, the server holds their NBT in memory and rewrites the `.dat` on next save tick — clobbering the restore. `save-off` halts auto-save; kicking makes the server write out and release that player's in-memory state, so nothing overwrites the restored file afterwards.

**Userns-remap reminder:** the host sees container-uid `100000` for files written by the MC process. Restored files written by `user` (uid 1000) will appear empty/permission-denied to the container. Always `chown 100000:100000` (or `chmod 666`) after restore. Memory: `project_nullstone_docker_userns`.

---

## Scenario 1 — Player lost inventory (T1, the void-death case)

This is what the strategy was written for. RTO target: **< 2 minutes**.

1. Find the UUID:
   ```bash
   grep -i 'NICK' /opt/docker/minecraft/usercache.json
   ```
2. Pick a snapshot just **before** the loss. `restic snapshots --tag playerdata` shows timestamps.
3. Run the TL;DR block above with that snapshot id (or `latest` if loss happened in the last 5 min).
4. Inform the player: "Your inventory from HH:MM has been restored. Anything you picked up after that point is gone."
5. Log the incident: append to `docs/INCIDENTS.md` (create if absent) — date, player, snapshot id, cause.

---

## Scenario 2 — Whole world rolled back (T2/T3, griefing or corruption)

RTO target: **15 minutes**. Server downtime expected.

1. Announce, kick, stop:
   ```bash
   mcrcon ... "say Server going down for restore — back in ~15 min"
   mcrcon ... "kick @a Restore in progress"
   cd /opt/docker/minecraft && docker compose down
   ```
2. Move live data aside (do not delete):
   ```bash
   mv /opt/docker/minecraft /opt/docker/minecraft.broken-$(date +%F)
   mkdir -p /opt/docker/minecraft
   ```
3. Restore from the world repo:
   ```bash
   WHEN=   # snapshot ID taken before the damage (list with `restic snapshots`)
   RESTIC_PASSWORD_FILE=/etc/mc-backup.pw \
     restic -r /home/user/restic/mc-world \
       restore "$WHEN" --target /tmp/world-restore
   rsync -aHAX /tmp/world-restore/opt/docker/minecraft/ /opt/docker/minecraft/
   ```
4. **Re-apply userns-remap perms** (critical — see memory):
   ```bash
   chmod -R 777 /opt/docker/minecraft   # quickfix; or chown -R 100000:100000
   ```
5. Boot:
   ```bash
   cd /opt/docker/minecraft && docker compose up -d
   docker logs -f minecraft-mc   # watch for "Done" line
   ```
6. Verify by parsing the `.dat` of a known-good UUID, then announce that the server is up.
7. Keep `minecraft.broken-YYYY-MM-DD/` for at least 7 days for forensic comparison.

---

## Scenario 3 — Host disk dead (T5)

RTO target: **few hours, depends on hardware swap**.

1. New host: install Debian 13 + Docker per `_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md`.
2. `apt install restic`. Pull the password from operator's password manager into `/etc/mc-backup.pw`.
3. Initialise destination dir, then restore from **onyx mirror** (not local — local is gone):
   ```bash
   RESTIC_PASSWORD_FILE=/etc/mc-backup.pw \
     restic -r sftp:mc-backup@100.64.0.1:/backups/nullstone-mc-restic \
       restore latest --target /tmp/world-restore
   ```
4. Continue Scenario 2 from the `rsync` in step 3 (the restore above only fills `/tmp/world-restore`), then carry on with step 4 onwards.
5. Stand up the timers on the new host. **Do not** point them at the same off-host repo until the new host has been re-keyed (rotate restic passwords as part of disaster recovery).

---

## Drill log (monthly)

| Date | Operator | Snapshot age | Class A restore time | Off-host restore time | Result |
|------|----------|--------------|----------------------|------------------------|--------|
| (first drill — 2026-06-06) | s8n | TBD | TBD | TBD | TBD |

Procedure: see `BACKUP-STRATEGY.md` §7.

---

## What if no snapshot exists yet? (CURRENT REALITY 2026-05-07)

Until phases 1–4 of `BACKUP-STRATEGY.md` are deployed, the only recovery resources are:

| Source | What's there | Recoverable? |
|---|---|---|
| `/opt/backups/202604xx_020001/mc-world-backup-*.tar.gz` | World tar from Apr 29 + May 2 (others FAILED) | **GONE** — pruned by 7-day retention |
| `/opt/backups/mc-plugins-prerebrand-2026-04-30.tar.gz` | Plugin jars only, no world | Not useful for player data |
| Live `/opt/docker/minecraft/world/playerdata/.dat_old` | MC's own .dat_old shadow file from previous save | **YES** — last save tick before current. **First-line defence right now.** |
| CoreProtect DB (`plugins/CoreProtect/database.db`) | Block + container actions, NOT inventory state | Partial — can roll back grief, can't restore lost items |

**Today's playbook for inventory-loss reports:**

1. Server console → `co lookup u:NICK` to confirm the loss event in CoreProtect.
2. **Stop the server immediately** if the report comes in within the same play session — every save tick overwrites `.dat_old`. `docker compose down` buys time.
3. Inspect `world/playerdata/.dat_old` — if it predates the loss, copy it over `.dat`, fix perms (uid 100000), restart.
4. If `.dat_old` is too new (already overwritten): **the loss is unrecoverable until BACKUP-STRATEGY phases 1–4 are deployed.** Apologise to the player. Spawn-in compensation per operator discretion (ops creative-mode replacement is the customary remedy).
5. Log the incident — adds urgency to deploying the new strategy.

---

## Phase 1 deployment — DONE 2026-05-07

The daily fallback (`/opt/docker/backup.sh`) was repaired and redeployed. It now backs up MC world (~12 G compressed), plugins (~490 M), plugin DBs (~280 M), and configs nightly at 02:00, prunes after 7 days, and writes a sentinel `/opt/backups/.last-success` on success.

External monitor (cron on onyx) — the simplest dead-man's switch until ntfy lands:

```bash
# Add to onyx crontab, e.g. every 30 min. A crontab entry must be a single
# line (cron has no backslash continuation). The alert fires when the
# sentinel is stale OR the ssh check itself fails; mail is sent only then.
*/30 * * * * ssh user@192.168.0.100 'find /opt/backups/.last-success -mmin -1500 | grep -q .' || echo "ALERT: nullstone MC backup sentinel stale (>25h) or host unreachable" | mail -s "MC backup stale" you@example.com
```

(swap `mail` for `notify-send`, `ntfy publish`, etc. once those are wired)

A copy of the pre-fix script is preserved at `/opt/docker/backup.sh.bak-20260507-pre-phase1` for forensic reference.

---

## Phase 2 deployment — restic playerdata snapshots every 5 min

Implementation is in this repo:

- `scripts/restic-backup-playerdata.sh` — the per-run script
- `scripts/restic-init.sh` — one-time bootstrap (must run as root)
- `scripts/systemd/mc-backup-playerdata.{service,timer}` — 5-min cadence
- Strategy + retention + threat model in `BACKUP-STRATEGY.md`

**Deployment status (2026-05-07): NOT YET DEPLOYED — operator action required.**

`restic` is not on nullstone; installing it needs sudo, and `user`'s sudo is password-locked. Operator runs:

```bash
# On nullstone, as root (sudo -i or via console)
apt-get update && apt-get install -y restic mcrcon
cd /opt/docker
git -C /home/user/repos/minecraft-server pull \
  || git clone ssh://git@192.168.0.100:222/s8n/minecraft-server.git /home/user/repos/minecraft-server
cd /home/user/repos/minecraft-server

# 1) Bootstrap repos + env file
sudo bash scripts/restic-init.sh

# 2) Install systemd units + run script
sudo install -m 644 scripts/systemd/mc-backup-playerdata.service /etc/systemd/system/
sudo install -m 644 scripts/systemd/mc-backup-playerdata.timer /etc/systemd/system/
sudo install -m 755 scripts/restic-backup-playerdata.sh /usr/local/bin/

# 3) Enable + start
sudo systemctl daemon-reload
sudo systemctl enable --now mc-backup-playerdata.timer

# 4) Verify
systemctl list-timers mc-backup-playerdata.timer
journalctl -u mc-backup-playerdata.service -n 50 --no-pager
ls -la /home/user/restic/mc-frequent/
restic -r /home/user/restic/mc-frequent --password-file /etc/mc-backup.pw snapshots
```

The first run should appear within ~7 min (`OnBootSec=2min` + 5-min cadence).

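Once the timer is running, the verify step can be turned into a repeatable freshness check. The following is a minimal sketch, not part of the repo's scripts: the function name `snapshot_is_fresh` and the 600-second default threshold are hypothetical, and it assumes GNU `date`. The commented-out wiring additionally assumes `jq` is installed and that the repo holds a single host/path group.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a snapshot timestamp is recent enough.
# Assumptions (not from the runbook): GNU date is available; the
# 600 s default threshold is an illustrative value only.

# snapshot_is_fresh ISO_TIME [MAX_AGE_SECONDS]
# Returns 0 if ISO_TIME is at most MAX_AGE_SECONDS old, 1 if older,
# 2 if the timestamp cannot be parsed.
snapshot_is_fresh() {
  local snap_time="$1" max_age="${2:-600}"
  local snap_epoch now_epoch
  snap_epoch=$(date -d "$snap_time" +%s) || return 2
  now_epoch=$(date +%s)
  [ $(( now_epoch - snap_epoch )) -le "$max_age" ]
}

# Example wiring (commented out: needs restic, jq and the deployed repo):
# latest=$(restic -r /home/user/restic/mc-frequent \
#            --password-file /etc/mc-backup.pw \
#            snapshots --latest 1 --json | jq -r '.[0].time')
# snapshot_is_fresh "$latest" || echo "WARN: newest playerdata snapshot is stale"
```

With a 5-minute cadence, a threshold of roughly two intervals tolerates one slow run without hiding a stopped timer; adjust it to whatever `BACKUP-STRATEGY.md` specifies.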
### Off-host mirror to onyx (Phase 4 — separate)

After Phase 2 is running cleanly for ~24h, provision `mc-backup` user on onyx with chrooted SFTP, then add a nightly `restic copy` job from nullstone. See `BACKUP-STRATEGY.md` §6 for the SFTP chroot config and §11 phase plan.

Until then, the local nullstone repo is single-host — survives operator error and bad config edits, **not** disk failure. The Phase 1 daily tarball in `/opt/backups/` is the only redundancy until §6 lands.

---

## TODO — open items (links into BACKUP-STRATEGY.md §11)

- [x] Phase 1: fix `/opt/docker/backup.sh` orphan-line bug (F-backup-1). **Done 2026-05-07.**
- [ ] Phase 2: deploy `mc-backup-playerdata.timer` (Class A, 5-min). Scripts in repo; **blocked on operator running `apt install restic` + `restic-init.sh` with sudo**.
- [ ] Phase 3: deploy `mc-backup-world.timer` (Class B/C/D, hourly). Script not yet drafted; will mirror playerdata script.
- [ ] Phase 4: provision `mc-backup` user on onyx + `restic copy` job.
- [ ] Phase 5: schedule monthly drill calendar entry, run first drill.
- [ ] Phase 6: ntfy / Matrix alert wiring (depends on ntfy deployment).
- [ ] Phase 7: friend RTX 4080 PC as secondary off-host.
- [ ] Verify `usercache.json` on this host: confirm UUID lookup workflow above resolves to the right `.dat`.
- [ ] Decide: `mcrcon` package vs lightweight Python `mcrcon` lib.
- [ ] Document compensation policy for unrecoverable losses (operator discretion right now).
- [ ] Drop dead `matrix-postgres` + `mongodb` + `synapse-*` blocks from `/opt/docker/backup.sh` once retirement is complete (currently they no-op-skip — minor noise in log only).
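
For the `usercache.json` item in the list above, the name-to-UUID lookup could be sketched as a small shell helper. This is an illustration under stated assumptions rather than a tested part of the workflow: the function name `uuid_for_nick` is hypothetical, and it assumes `usercache.json` is a compact JSON array of flat `{"name": ..., "uuid": ..., "expiresOn": ...}` objects. A pretty-printed file would need a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: resolve a player name to its UUID from usercache.json.
# Assumptions (not verified on nullstone): the file is compact JSON with
# flat objects, and player names contain only letters, digits and "_".

# uuid_for_nick NICK [USERCACHE_FILE]
# Prints the first matching UUID, or nothing if the name is not cached.
uuid_for_nick() {
  local nick="$1" cache="${2:-/opt/docker/minecraft/usercache.json}"
  grep -o '{[^}]*}' "$cache" \
    | grep -i "\"name\" *: *\"$nick\"" \
    | sed -n 's/.*"uuid" *: *"\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Example (commented out: needs the live server directory):
# PUUID=$(uuid_for_nick NICK)
# ls -l "/opt/docker/minecraft/world/playerdata/${PUUID}.dat"
```

Matching on the quoted name avoids the false positives a bare `grep -i 'NICK'` can produce when one name is a prefix of another.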