Replaced literal values with env-var placeholders (${RCON_PASSWORD},
${MGMT_SECRET}, ${MC_RCON_PASSWORD}) across server.properties,
.rcon-cli.env, docker-compose.yml(s), backup scripts, and AUDIT-2026-05-07.md.
Affected secrets:
- Paper management-server-secret (HIGH; mitigated by management-server-enabled=false)
- RCON password '*redacted*' (MEDIUM; bound to 127.0.0.1)
- MC_RCON_PASSWORD backup-pipeline default fallback (MEDIUM; same blast radius)
WARNING: HEAD redaction only — values remain in git history. Treat as
compromised and rotate (closes F-17 audit-finding's deferred TODO).
Originals backed up to private s8n/secrets/minecraft-server/.
Minecraft Backup Strategy — racked.ru on nullstone
Status: PROPOSAL (2026-05-07) — not yet implemented.
Author trigger: Player lost full inventory to void death today; rollback impossible because the existing 02:00 daily backup had silently failed for 5 of the last 7 days and there is zero off-host copy.
Owner: s8n (operator).
Target host: nullstone (192.168.0.100, Debian 13 trixie).
0. Current state (audited 2026-05-07)
Existing system in /opt/docker/backup.sh + cron.d/docker-backup (02:00 daily, 7-day retention in /opt/backups/).
Findings from /opt/backups/backup.log:
| Date | MC world result | Backup dir total |
|---|---|---|
| 2026-04-26 | FAILED | — |
| 2026-04-27 | FAILED | — |
| 2026-04-28 | FAILED | — |
| 2026-04-29 | OK (3.6 G) | — |
| 2026-04-30 | FAILED | — |
| 2026-05-01 | FAILED | — |
| 2026-05-02 | OK (3.6 G) | — |
| 2026-05-03 | (no MC log line) | 8 K |
| 2026-05-04 | (no MC log line) | 8 K |
| 2026-05-05 | (no MC log line) | 8 K |
| 2026-05-06 | (no MC log line) | 12 K |
| 2026-05-07 | (no MC log line) | 12 K |
After 2026-05-02 the entire MC block stopped emitting log lines. The script appears to be exiting before reaching it (the duplicated stray chmod 600 ... synapse-signing-key lines at L119–122 are orphaned from a botched edit and may now break set -e). Effective state: two MC backups in the last 12 days, both already pruned by 7-day retention. No usable backup exists right now.
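A minimal sketch of the suspected failure mode (illustrative only — the paths and the `log` helper below are assumptions, not lines from the real `backup.sh`):

```bash
#!/usr/bin/env bash
set -e                      # any failing command aborts the whole script

log() { echo "$(date -Is) $*" >> /opt/backups/backup.log; }

# ... earlier modules run fine ...

# Orphaned lines from the botched edit: the Synapse key no longer exists,
# chmod exits non-zero, and set -e kills the script right here --
chmod 600 /opt/docker/synapse/keys/synapse-signing-key

# -- so the MC block below is never reached and never writes a log line,
# matching the empty entries from 2026-05-03 onward.
log "minecraft: world backup starting"
```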
Cross-references:
- `_github/infra/STATE.md` Top-5 weakness #2 ("backup.sh broken silently") and #5 ("No off-host backup").
- `_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md` §5 already names this F-backup-1 and proposes "Restic + autorestic to B2/Wasabi or to nullstone-as-spare". This strategy refines that to use on-hand resources rather than paid storage.
Available resources (no purchasing required)
| Asset | Location | Free | Reachability | Role |
|---|---|---|---|---|
| nullstone `/home` | local NVMe (ext4 LVM) | 142 G of 399 G | local | Primary repo + restic cache |
| onyx `/home` | LUKS NVMe | 1.6 T of 1.9 T | Tailscale 100.64.0.1 (LAN ~5 ms) | Off-host primary |
| friend RTX 4080 PC | DESKTOP-LR0RILA | unknown (Windows, large) | Tailscale 100.64.0.3 (WAN, IP-stable via tailnet) | Off-host secondary (defer) |
| nullstone `/opt/backups` | same disk as `/opt/docker` | 142 G | local | Not a real backup target — same-disk SPOF |
No purchased B2 / Wasabi / S3 in this proposal. Tailscale + onyx covers off-host today. B2 stays in the future-options annex.
1. Threat model
| # | Threat | Concrete example | Frequency | Mitigation in this plan |
|---|---|---|---|---|
| T1 | Player accidental loss (void death, lava, fall) | YOU500, 2026-05-07 | weekly | 5-min playerdata snapshots (RPO ≤ 5 min) |
| T2 | Griefing / theft / chest emptied by ban-evader | possible | monthly | 5-min playerdata + 1-h world snapshots |
| T3 | World corruption (chunk error, region-file truncate) | — | rare | 6-h pre-flight validated full world snapshot |
| T4 | Plugin / config bad change (LuckPerms wipe, server.properties) | edits during ops | weekly | daily configs + DB dump + git history (live-server/ repo) |
| T5 | Host disk failure (single NVMe) | — | low/year | nightly off-host copy to onyx (Tailscale) |
| T6 | Ransomware / host compromise | — | low | append-only Restic repo on onyx; nullstone holds no delete key |
| T7 | Operator `rm -rf` or wrong `docker compose down -v` | — | low | retention floor (4 weekly + 12 monthly) survives a recent rm |
| T8 | Backup script silently failing (current state) | OBSERVED | — | heartbeat alert + monthly restore drill (§7) |
T8 is the one that just bit us. The single most important addition is alerting on missed runs, not the storage tech.
2. RPO / RTO
| Class | Data | RPO | RTO | Backup mechanism |
|---|---|---|---|---|
| A | playerdata (`world/playerdata/*.dat`, `stats/`, `advancements/`) | 5 min | < 2 min per player | rcon `save-all flush` → rsync to local snapshot, then restic-add |
| B | full world (region files, end + nether) | 1 h during play, 6 h otherwise | 15 min | restic of `world*/` |
| C | plugin configs + LuckPerms YAML | 24 h | 30 min | tar of `plugins/*/config*.yml` + LP file dump |
| D | LuckPerms / Homestead SQLite DBs (`*.db`, `homestead_data.db`) | 1 h | 5 min | sqlite `.backup` then restic-add |
| E | host-level configs (docker-compose.yml, server.properties, purpur.yml, bukkit.yml, paper-*.yml, whitelist.json, ops.json, banned-*.json, config/) | 24 h | 5 min | already in git repo `_github/minecraft-server/`; backup just covers drift |
Justification for RPO=5 min on Class A: the void-death case rebuilds in seconds — recovering one <uuid>.dat is a ~30 s operation if a 5-min-old snapshot exists. Snapshotting just the 1.3 MB playerdata/ dir is cheap (single-digit MB/day after dedup).
3. Tool choice — Restic
Compared:
| Tool | Dedup | Encryption | Snapshots | Network destinations | Verdict |
|---|---|---|---|---|---|
| restic | content-addressed, very effective on MC region files | AES-256, repo-key | yes | sftp (Tailscale), local, B2, S3, Azure, rclone | WINNER |
| borgbackup | similar | yes | yes | ssh only, lock-on-write | Equally good; restic chosen because operator already plans restic + autorestic per infra/STATE.md line 112; sftp dest is simpler than borg's required serverside binary |
| rsnapshot | hardlinks, no dedup | none | rotated dirs | local + rsync | No encryption ⇒ off-host copy on Tailscale (already encrypted) is fine, but no dedup means 18 G × N snapshots is painful. Reject. |
| zfs send | block-level | (zfs native) | snapshots | yes | nullstone is ext4/LVM, no ZFS, no btrfs. Reject. |
| LVM snapshot | COW | none | yes | local only | Same-disk only, doesn't survive disk failure. Useful as a staging primitive only. |
| custom rsync + cp -al | hardlinks | none | yes | yes | Reinventing rsnapshot. Reject. |
| itzg `BACKUP_*` env | tar to volume | none | rotation | local | Already tried in spirit by current backup.sh; same-disk; not granular. Reject as primary. |
Decision: restic for Classes A, B, C, D. Continue using a thin tar wrapper for Class E (configs are already in the git repo, this is just safety).
Restic strengths for our case:
- Region files dedup very well (chunks unchanged across snapshots).
- A 5-min Class-A snapshot adds well under a megabyte to the repo (≈70 KB after dedup, per §5), not the full 1.3 MB × N.
- One repo on local disk + one mirror to onyx via `rclone serve restic` or direct `sftp:` — no agent needed on onyx beyond ssh. `restic check --read-data-subset=5%` is the canonical scrub.
Apt: apt install restic on trixie ships 0.16.x — sufficient.
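A one-time bootstrap sketch for the two local repos (repo paths match §4; the password-file location is carried over from the §8.2 drafts and is an assumption until deployed):

```bash
# one-time, on nullstone — review before running
sudo install -m 0600 -o user -g docker /dev/null /etc/mc-backup.pw
openssl rand -base64 32 | sudo tee /etc/mc-backup.pw >/dev/null   # repo password

restic init -r /home/user/restic/mc-frequent --password-file /etc/mc-backup.pw
restic init -r /home/user/restic/mc-world    --password-file /etc/mc-backup.pw
restic version    # expect the 0.16.x that trixie ships
```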
4. Schedule
All times Europe/London (matches TZ in compose file).
| Job | Cadence | Source | Destination | Mechanism |
|---|---|---|---|---|
| A — playerdata | every 5 min | `world/playerdata/`, `world/stats/`, `world/advancements/`, `world*/level.dat`, `*.db` (LP + homestead) | restic repo `/home/user/restic/mc-frequent/` | systemd timer `mc-backup-frequent.timer` |
| B — full world | every 1 h during play (07:00–01:00), 6 h otherwise | `world/`, `world_nether/`, `world_the_end/` | restic repo `/home/user/restic/mc-world/` | systemd timer `mc-backup-world.timer` |
| C — configs + plugins | daily 02:00 | `/opt/docker/minecraft/*.yml`, `*.json`, `plugins/*/config*.yml`, `plugins/LuckPerms/`, docker-compose.yml | restic repo `mc-world` (path-tagged) | reuse same timer with second backup target |
| D — DB dumps | every 1 h | `homestead_data.db`, `plugins/CoreProtect/database.db`, `plugins/LuckPerms/luckperms-h2-*` | restic repo `mc-world` | timer hooks `sqlite3 .backup` first |
| E — off-host mirror | nightly 03:30 | nullstone `/home/user/restic/` | onyx `100.64.0.1:/home/admin/backups/nullstone-mc-restic/` | `restic copy` over sftp (Tailscale) — append-only key on onyx side |
| F — verify | weekly Sun 04:00 | both repos | — | restic check --read-data-subset=5% then alert on rc |
| G — drill | monthly 1st Sat 11:00 | random snapshot | scratch dir | §7 procedure |
Why this works for the void-death case
T1 hits at 18:42. By 18:45 a Class-A snapshot exists containing the player's `<uuid>.dat` from 18:40. Restore: `restic -r ... restore --target /tmp/r --include 'world/playerdata/<uuid>.dat' latest`, stop the server (or `/save-off` plus a minimal manual file swap), copy the file into place, `/save-on`. Total RTO < 2 min.
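Spelled out as a sketch (the player UUID, rcon password variable, and paths are placeholders; the authoritative steps live in the runbook):

```bash
# point-restore one player's data — sketch, not the runbook
UUID="<player-uuid>"                         # hypothetical placeholder
REPO=/home/user/restic/mc-frequent
MC=/opt/docker/minecraft

# restic restores under --target preserving the absolute snapshot path
restic -r "$REPO" --password-file /etc/mc-backup.pw restore latest \
  --target /tmp/restore --include "world/playerdata/$UUID.dat"

# pause saves while swapping the file (player must be offline)
mcrcon -H 127.0.0.1 -P 25575 -p "$RCON_PASS" "save-off" "save-all flush"
cp -a "/tmp/restore$MC/world/playerdata/$UUID.dat" "$MC/world/playerdata/"
mcrcon -H 127.0.0.1 -P 25575 -p "$RCON_PASS" "save-on"
```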
5. Retention
Restic policy (passed to restic forget --keep-*):
--keep-last 24 # 24 most recent (covers 2h of 5-min snapshots)
--keep-hourly 24 # 24h of hourly
--keep-daily 7 # 7 days
--keep-weekly 4 # 4 weeks
--keep-monthly 12 # 12 months
Applied per-tag — Class A snapshots tagged playerdata, B/C/D tagged world. Forget is run only on the local repo; the onyx mirror inherits via restic copy with same policy after the local forget+prune.
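As a sketch, the per-tag invocation looks like this (flags from the policy above; repo and password-file paths as elsewhere in this doc):

```bash
# local repos only — the onyx mirror never sees a forget issued from nullstone
restic -r /home/user/restic/mc-frequent --password-file /etc/mc-backup.pw \
  forget --tag playerdata \
  --keep-last 24 --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
  --prune

restic -r /home/user/restic/mc-world --password-file /etc/mc-backup.pw \
  forget --tag world \
  --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
  --prune
```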
Storage budget
- Class A: 1.3 MB raw × dedup (~20× on `.dat`, mostly empty NBT slots) → ~70 KB / snapshot net.
- 12/h × 24 h × 7 d ≈ 2 016 snapshots/week → < 300 MB/week.
- Class B/C/D: 18 G raw → ~6.5 G compressed (per current 3.6 G figure × adjustment for nether/end now active). Restic dedup on hourly snapshots: ~50–200 MB delta/snapshot during active play.
- 24h hourly + 7 daily + 4 weekly + 12 monthly ≈ 47 retained → estimate 15–25 GB total at steady state.
- E (off-host): same as above on onyx (1.6 TB free — 30× headroom).
Conclusion: comfortably fits in nullstone's 142 G free. Onyx is essentially unconstrained.
6. Off-host destination — onyx via Tailscale
Choice: onyx (100.64.0.1, 1.6 TB free on /home). Reasons:
- Already in the tailnet (`tag:admin`), already trusted, already SSH-reachable.
- 1.6 TB is 100× the dataset.
- Operator's daily-driver: a missed-backup alert on onyx is seen.
- Deferred (phase 2): replicate to friend's RTX 4080 PC (100.64.0.3) for true geographic separation. Tailnet IP is stable across the friend's ISP IP changes per memory `project_friend_gpu`.
Mechanics:
- On onyx: create restricted user `mc-backup` with `~/backups/nullstone-mc-restic/` and a `~/.ssh/authorized_keys` entry that only allows `internal-sftp` chrooted to that dir — no shell, no port-forward (`Match User mc-backup ... ChrootDirectory %h, ForceCommand internal-sftp -d /backups/nullstone-mc-restic`). A provisioning sketch follows this list.
- On nullstone: install nullstone's ssh public key on onyx for that user. Use a second append-only restic key (separate password) so a compromised nullstone cannot run `forget`/`prune` on the onyx repo. Restic supports this via per-key `--no-cache`-friendly flags, but the harder lock comes from sftp chroot perms (set parent dir owner to root, give `mc-backup` write inside but no unlink on rotated lockfiles? — practical compromise: rely on `restic copy` adding-only and audit `forget` runs).
- Nightly job on nullstone: `restic -r sftp:mc-backup@100.64.0.1:/backups/nullstone-mc-restic copy --from-repo /home/user/restic/mc-world latest && ... mc-frequent ...`.
- Onyx-side cron weekly: `restic check` on the mirror (independent verification).
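A provisioning sketch for the onyx side (assumes a Debian-style `sshd_config.d` include and the chroot layout described above; names match the §8.2 env file):

```bash
# on onyx — review before running
sudo useradd --create-home --shell /usr/sbin/nologin mc-backup
sudo mkdir -p /home/mc-backup/backups/nullstone-mc-restic /home/mc-backup/.ssh
sudo chown root:root /home/mc-backup && sudo chmod 755 /home/mc-backup   # chroot dir must be root-owned
sudo chown -R mc-backup:mc-backup /home/mc-backup/backups /home/mc-backup/.ssh
# then append nullstone's public key to /home/mc-backup/.ssh/authorized_keys

sudo tee /etc/ssh/sshd_config.d/mc-backup.conf >/dev/null <<'EOF'
Match User mc-backup
    ChrootDirectory %h
    ForceCommand internal-sftp -d /backups/nullstone-mc-restic
    AllowTcpForwarding no
    X11Forwarding no
EOF
sudo systemctl reload ssh
```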
Why not friend's GPU PC? Windows host, no built-in SSH, asymmetric trust. Defer to phase 2 once an SMB or rclone serve target is set up there.
7. Restore drill (monthly, 1st Saturday 11:00)
Runbook: docs/RUNBOOK-BACKUP-RESTORE.md (created alongside this proposal).
Drill scenario: "YOU500 lost his inventory to a void death 6 minutes ago." Steps:
- Pick a known UUID from `world/playerdata/` (operator's own UUID).
- `restic -r /home/user/restic/mc-frequent snapshots --tag playerdata | tail -5` — confirm the freshest snapshot is ≤ 6 min old.
- `restic -r ... restore latest --target /tmp/drill-$(date +%s) --include 'world/playerdata/<uuid>.dat'`.
- `nbted` or `python -m nbtlib` parse the `.dat` — confirm it's a valid GZIP NBT structure (not zero bytes, not partial).
- `diff` against the live `.dat` — log the differences (expected: at least the inventory NBT path differs because the player kept playing).
- Repeat from the onyx mirror repo to prove off-host works end-to-end.
- Log result to `docs/RUNBOOK-BACKUP-RESTORE.md` § Drill log.
Drill is non-destructive — never overwrite live .dat during a drill. Real restores follow §3 of the runbook.
Pass criteria: both restores complete in < 2 min wall-clock and the parsed NBT root tag is well-formed.
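A sketch of the snapshot-list, restore, and parse steps above (assumes python3 with the `nbtlib` package for the NBT check; any NBT parser works):

```bash
# non-destructive drill — never touches the live world
UUID="<operator-uuid>"                        # hypothetical placeholder
REPO=/home/user/restic/mc-frequent
DRILL=/tmp/drill-$(date +%s)

restic -r "$REPO" --password-file /etc/mc-backup.pw \
  snapshots --tag playerdata | tail -5        # freshest must be <= 6 min old

restic -r "$REPO" --password-file /etc/mc-backup.pw \
  restore latest --target "$DRILL" --include "world/playerdata/$UUID.dat"

FILE=$(find "$DRILL" -name "$UUID.dat")
gunzip -t "$FILE"                             # valid gzip container?
python3 -c 'import sys, nbtlib; nbtlib.load(sys.argv[1]); print("NBT OK")' "$FILE"
```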
8. Implementation — concrete drafts
Two layers: a fix to the existing daily script (Class C/E) and a new sidecar timer for Classes A/B/D.
8.1 Fix /opt/docker/backup.sh (F-backup-1)
Already documented in infra/runbooks/MIGRATION-nullstone-to-cobblestone.md §5. Minimum work:
- Drop the dead `matrix-postgres` block (Synapse retired).
- Drop / fix the `mongodb` block (RC stopped 2026-05-06).
- Remove the orphaned `chmod 600 ... synapse-signing-key ...` block at L119–122 (causing a `set -e` exit before the MC block on most days).
- Wrap each module in `( ... ) || log "module FAILED"` so one module's failure doesn't skip the rest (sketched below).
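A sketch of that wrapping pattern (module bodies and `$BACKUP_DIR` are placeholders; `log` is the script's existing helper):

```bash
# each module runs in a subshell; a failure is logged, not fatal under set -e
(
  log "minecraft: tar of world dirs starting"
  tar czf "$BACKUP_DIR/mc-world-$(date +%F).tar.gz" \
      -C /opt/docker/minecraft world world_nether world_the_end
) || log "minecraft module FAILED (rc=$?)"

(
  log "minecraft: configs"
  cp -a /opt/docker/minecraft/server.properties "$BACKUP_DIR/"
) || log "configs module FAILED (rc=$?)"
```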
Out-of-scope for this strategy doc — track in infra audit.
8.2 New: mc-backup-frequent (Class A) and mc-backup-world (Classes B/C/D)
Drop-in files (operator review before deploy):
/etc/systemd/system/mc-backup-frequent.service
[Unit]
Description=Minecraft frequent backup (playerdata, every 5 min)
After=docker.service
Wants=docker.service
[Service]
Type=oneshot
User=user
Group=docker
EnvironmentFile=/etc/mc-backup.env
ExecStart=/usr/local/bin/mc-backup-frequent.sh
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
/etc/systemd/system/mc-backup-frequent.timer
[Unit]
Description=Run mc-backup-frequent every 5 minutes
[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
AccuracySec=30s
Persistent=true
[Install]
WantedBy=timers.target
/etc/mc-backup.env (mode 0600, owner user:docker)
RESTIC_REPOSITORY_FREQUENT=/home/user/restic/mc-frequent
RESTIC_REPOSITORY_WORLD=/home/user/restic/mc-world
RESTIC_PASSWORD_FILE=/etc/mc-backup.pw
MC_DATA=/opt/docker/minecraft
RCON_HOST=127.0.0.1
RCON_PORT=25575
RCON_PASS=${RCON_PASSWORD}
HEARTBEAT_URL=https://ntfy.s8n.ru/mc-backup-frequent
ALERT_URL=https://ntfy.s8n.ru/mc-backup-alerts
TS_OFFHOST_USER=mc-backup
TS_OFFHOST_HOST=100.64.0.1
TS_OFFHOST_PATH=/backups/nullstone-mc-restic
/usr/local/bin/mc-backup-frequent.sh
#!/usr/bin/env bash
set -euo pipefail
. /etc/mc-backup.env
trap 'curl -fsS -m 10 -d "fail rc=$?" "$ALERT_URL" >/dev/null || true' ERR
# 1. Ask MC to flush via rcon (best-effort; don't fail backup if rcon down)
if command -v mcrcon >/dev/null 2>&1; then
mcrcon -H "$RCON_HOST" -P "$RCON_PORT" -p "$RCON_PASS" -w 1 \
"save-all flush" >/dev/null 2>&1 || true
fi
# 2. Snapshot just the small fast-changing things.
#    Do NOT swallow failures here — a failed backup must hit the ERR trap
#    so the alert topic fires (a hidden failure here is exactly T8).
#    If an optional path is missing, restic will error — fix the list rather than hide it.
restic -r "$RESTIC_REPOSITORY_FREQUENT" backup \
--tag playerdata \
--tag auto-5min \
--host nullstone \
--exclude='*.lock' \
"$MC_DATA/world/playerdata" \
"$MC_DATA/world/stats" \
"$MC_DATA/world/advancements" \
"$MC_DATA/world/level.dat" \
"$MC_DATA/world_nether/level.dat" \
"$MC_DATA/world_the_end/level.dat" \
"$MC_DATA/homestead_data.db" \
"$MC_DATA/plugins/LuckPerms" \
"$MC_DATA/plugins/CoreProtect/database.db"
# 3. Cheap retention (only on the local repo)
restic -r "$RESTIC_REPOSITORY_FREQUENT" forget --tag auto-5min \
--keep-last 24 --keep-hourly 24 --keep-daily 7 \
--prune --quiet
# 4. Heartbeat — alert if NOT received in 15 min via ntfy server
curl -fsS -m 5 "$HEARTBEAT_URL" >/dev/null || true
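Deployment is the usual systemd dance (sketch; file names as drafted above):

```bash
sudo install -m 0755 mc-backup-frequent.sh /usr/local/bin/
sudo install -m 0644 mc-backup-frequent.service mc-backup-frequent.timer /etc/systemd/system/
sudo chmod 0600 /etc/mc-backup.env /etc/mc-backup.pw
sudo systemctl daemon-reload
sudo systemctl enable --now mc-backup-frequent.timer
systemctl list-timers 'mc-backup-*'      # confirm the next trigger time
```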
`mc-backup-world.{service,timer,sh}` — same shape; runs hourly during play / 6-hourly otherwise (use `OnCalendar=*-*-* 07,08,...,01:00:00` or two timers — a timer sketch follows the copy command below), backs up the full `world*/`, configs, and DB dumps. After the local backup it runs:
restic copy \
--from-repo "$RESTIC_REPOSITORY_WORLD" \
-r "sftp:$TS_OFFHOST_USER@$TS_OFFHOST_HOST:$TS_OFFHOST_PATH" \
latest
And once nightly (separate timer) the same copy for mc-frequent.
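A sketch of the play-window timer, assuming the 07:00–01:00 window from §4 (the 01:00→07:00 gap is the "6 h otherwise" case; verify the calendar expressions with `systemd-analyze calendar` before enabling):

```bash
sudo tee /etc/systemd/system/mc-backup-world.timer >/dev/null <<'EOF'
[Unit]
Description=Minecraft world backup (hourly 07:00-01:00, idle overnight)
[Timer]
OnCalendar=*-*-* 07..23:00:00
OnCalendar=*-*-* 00..01:00:00
Persistent=true
AccuracySec=2min
[Install]
WantedBy=timers.target
EOF
systemd-analyze calendar "*-*-* 07..23:00:00"   # sanity-check the range syntax
```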
8.3 docker-compose.override.yml — alternative path (rejected)
Considered: itzg image supports BACKUP_INTERVAL, BACKUP_METHOD=restic. Pros: in-container, knows when world is loaded. Cons:
- Bind-mount to the host restic repo crosses the userns-remap boundary (uid 100000 vs host uid 1000) — already a known nullstone footgun (memory `project_nullstone_docker_userns`).
- Container restart wipes the restic cache; slow first run after every reboot.
- Mixing in-image and host-cron backup logic doubles failure surfaces.
Decision: keep backups in systemd on the host; container is unaware. Override file is not part of this proposal.
9. Monitoring & alerting
Three signals, all routed to ntfy on the existing self-hosted ntfy.s8n.ru (assumed to exist; if not, add as part of phase 1 — single-container deploy). DiscordSRV was dropped on 2026-04-30 per README.md L170, so Discord is not an option.
| Signal | Trigger | Channel |
|---|---|---|
| `mc-backup-frequent` heartbeat | timer fires successfully | ntfy topic `mc-backup-frequent` (silent on success) |
| Heartbeat missing > 15 min | dead-man's switch on ntfy server, or external (healthchecks.io is free + self-hostable) | ntfy topic `mc-backup-alerts` (high priority) |
| `restic check` weekly | non-zero rc | ntfy topic `mc-backup-alerts` (high priority) |
| Off-host mirror failure | `restic copy` non-zero rc | ntfy topic `mc-backup-alerts` (high priority) |
Operator subscribes onyx + phone to mc-backup-alerts only. The -frequent topic is a heartbeat sink (not a notification stream).
Alternative if no ntfy yet: write to /var/log/mc-backup.log AND a tiny status file /var/lib/mc-backup/last-success (mtime checked by an external monitor — Gatus on roadmap, Beszel on roadmap). Until either of those lands, a simple cron on onyx doing ssh user@nullstone 'find /var/lib/mc-backup/last-success -mmin -15 | grep .' and triggering a desktop notify-send is enough.
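A sketch of that onyx-side check (the `nullstone` host alias matches the one-liner above; `notify-send` assumes the cron environment can reach the desktop session's DBus):

```bash
#!/usr/bin/env bash
# cron on onyx, e.g.:  */15 * * * *  /home/admin/bin/check-mc-backup.sh
# Alerts if nullstone's last-success marker is older than 15 minutes (or missing).
if ! ssh -o ConnectTimeout=10 user@nullstone \
     'find /var/lib/mc-backup/last-success -mmin -15 | grep -q .'; then
  notify-send -u critical "mc-backup" "no successful Minecraft backup on nullstone in >15 min"
fi
```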
This addresses T8 (the silent-failure threat) directly.
10. Cost & capacity
Hardware cost: £0. Uses existing nullstone NVMe + onyx NVMe + existing Tailscale mesh.
Disk consumption (steady state, both repos):
| Where | Estimate | Headroom |
|---|---|---|
| nullstone `/home/user/restic/mc-frequent` | < 1 GB | 142 G free → ~140× |
| nullstone `/home/user/restic/mc-world` | 15–25 GB | ~6× |
| onyx `~/backups/nullstone-mc-restic/` | 16–26 GB | 1.6 T free → ~60× |
Days of retention given current free space: even if the world doubles to 36 GB raw, dedup keeps growth linear at ~5 % per snapshot — well over a year of monthly retention fits.
Network: Tailscale LAN-direct (5 ms onyx ↔ nullstone). Nightly delta typically < 500 MB after dedup. Negligible.
Operator time: ~2 h initial deploy, ~10 min/month for the drill, ~zero on autopilot.
11. Phase plan
| Phase | What | When | Blocker |
|---|---|---|---|
| 0 | This doc + runbook stub written, reviewed | TODAY | — |
| 1 | Stop the bleeding: fix `backup.sh` orphan lines so the daily MC tar at least runs again | TODAY (15 min) | — |
| 2 | Stand up `mc-backup-frequent` timer + local restic repo (Class A) | this week | needs `apt install restic mcrcon` |
| 3 | Add `mc-backup-world` timer + Classes B/C/D | this week | — |
| 4 | Onyx off-host SFTP target + `restic copy` job | this week | onyx user provisioning + ssh key |
| 5 | First monthly drill | next 1st Saturday | — |
| 6 | Wire ntfy alerts | when ntfy/Gatus deployed (infra roadmap) | external |
| 7 | Friend RTX 4080 PC as second off-host (geographic) | phase 2 | Windows-side tooling |
Phases 1–4 are doable today with what's on hand. Nothing in phases 1–5 requires purchasing.
12. Open questions for operator
- ntfy.s8n.ru — does it exist yet? Memory hints at Tuwunel + Matrix on txt.s8n.ru. If ntfy isn't deployed, decide: deploy ntfy now, or use a Matrix room via a Tuwunel webhook bridge as the alert sink.
- Onyx user `mc-backup` — create today, or reuse the existing `admin` with restricted authorized_keys? A restricted user is cleaner; reusing `admin` is faster.
- Append-only enforcement on the onyx side — accept "sftp chroot + no shell" as good enough, or invest in a per-repo restic key with `--no-delete`-style isolation (more work, partial mitigation only)?
- Pre-flight world validation — run `region-fixer` against the latest snapshot weekly to catch silent corruption (T3)? Adds ~5 min of compute weekly. Recommend yes.
- Class E (host configs) — already in the `live-server/` git repo via Syncthing/manual? If yes, drop Class E from this scheme; if no, add it.
13. References
- `docs/BACKUP.md` — current (broken) state docs.
- `docs/RUNBOOK-BACKUP-RESTORE.md` — operational runbook (this commit).
- `scripts/backup.sh` — to-be-fixed daily script (F-backup-1 in `infra/STATE.md`).
- `_github/infra/STATE.md` — Top-5 weakness #2 + #5 tracking this work.
- `_github/infra/runbooks/MIGRATION-nullstone-to-cobblestone.md` §5 — F-backup-1 detail; nullstone-as-spare hint.
- Memory: `project_friend_gpu` (Tailscale stable IP for friend), `project_tailscale_mesh` (mesh layout), `project_nullstone_docker_userns` (why container-side backup is rejected).
- `CLAUDE.md` Device Registry — onyx 192.168.0.28 / 100.64.0.1.