From 4c16cebb2bf67b8d3439cef3d80c79d6e816840b Mon Sep 17 00:00:00 2001 From: s8n Date: Thu, 7 May 2026 18:29:30 +0100 Subject: [PATCH] backup: phase 1 + phase 2 scripts; daily script repaired and deployed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Repairs the orphaned synapse-signing-key block at scripts/backup.sh lines 119-122 that was exiting the script under set -e before the Minecraft block could run, leaving 5 of the last 7 days without a world backup and zero usable snapshots after 7-day retention. Phase 1 (deployed today to /opt/docker/backup.sh on nullstone): - Repaired script — orphan block removed, MC arm wrapped so failures in one tar don't kill the run - tar exit code 1 ("file changed as we read it") now treated as success on the live MC world; spark profiler tmp file noise silenced via --ignore-failed-read --warning=no-file-changed - Plugin DBs (homestead, AuthMe, CoreProtect, LuckPerms) and configs now backed up alongside the world - Sentinel /opt/backups/.last-success stamped only when the world arm succeeds — gives outside monitors a single mtime to alert on - Manually verified end-to-end: 12G world tarball, 492M plugins, 279M dbs, 14 config files, sentinel updated. Pre-fix script saved at /opt/docker/backup.sh.bak-20260507-pre-phase1. 
Phase 2 (scripts in repo, deployment pending operator sudo): - scripts/restic-backup-playerdata.sh — Class A 5-min restic snapshots of playerdata/, stats/, advancements/, plugin DBs, LuckPerms; rcon save-all flush before snapshot; tag-scoped retention - scripts/restic-init.sh — one-time bootstrap (root-only) for /etc/mc-backup.{env,pw} + repo init at /home/user/restic/ - scripts/systemd/mc-backup-playerdata.{service,timer} — 5-min timer with hardening (ProtectSystem=strict, ReadOnlyPaths, etc) - docs/RUNBOOK-BACKUP-RESTORE.md updated with both phases' deployment steps and the operator-action checklist Off-host mirror to onyx (Phase 4) and class B/C/D world snapshots (Phase 3) are still TODO — see BACKUP-STRATEGY.md §11 phase plan. --- docs/RUNBOOK-BACKUP-RESTORE.md | 78 +++++- scripts/backup.sh | 250 ++++++++++++++----- scripts/restic-backup-playerdata.sh | 135 ++++++++++ scripts/restic-init.sh | 156 ++++++++++++ scripts/systemd/mc-backup-playerdata.service | 29 +++ scripts/systemd/mc-backup-playerdata.timer | 15 ++ 6 files changed, 603 insertions(+), 60 deletions(-) create mode 100644 scripts/restic-backup-playerdata.sh create mode 100644 scripts/restic-init.sh create mode 100644 scripts/systemd/mc-backup-playerdata.service create mode 100644 scripts/systemd/mc-backup-playerdata.timer diff --git a/docs/RUNBOOK-BACKUP-RESTORE.md b/docs/RUNBOOK-BACKUP-RESTORE.md index c8b467d..6920925 100644 --- a/docs/RUNBOOK-BACKUP-RESTORE.md +++ b/docs/RUNBOOK-BACKUP-RESTORE.md @@ -2,7 +2,7 @@ Strategy doc: [`../BACKUP-STRATEGY.md`](../BACKUP-STRATEGY.md). This runbook is the **operator-facing** procedure for the three scenarios that come up in practice. Keep it short, copy-paste-able, and reachable from the player support workflow. -> **Status (2026-05-07):** This runbook is written **ahead** of the implementation it describes. The `mc-backup-frequent` timer and onyx mirror are NOT yet deployed. The "What if no snapshot exists yet?" section at the bottom covers today's reality. 
+> **Status (2026-05-07):** Phase 1 (the daily `/opt/docker/backup.sh` MC world tarball) is **deployed and verified** — see "Phase 1 deployment" section near the bottom. Phase 2 (`mc-backup-playerdata.timer`, 5-min cadence) and the onyx off-host mirror are NOT yet deployed; deployment steps in "Phase 2 deployment" below. Until Phase 2 lands, the daily 02:00 tarball is the only safety net (RPO up to 24h). --- @@ -142,11 +142,80 @@ Until phases 1–4 of `BACKUP-STRATEGY.md` are deployed, the only recovery resou --- +## Phase 1 deployment — DONE 2026-05-07 + +The daily fallback (`/opt/docker/backup.sh`) was repaired and redeployed. It now backs up MC world (~12 G compressed), plugins (~490 M), plugin DBs (~280 M), and configs nightly at 02:00, prunes after 7 days, and writes a sentinel `/opt/backups/.last-success` on success. + +External monitor (cron on onyx) — the simplest dead-man's switch until ntfy lands: + +```bash +# Add to onyx crontab as a single line (cron has no backslash continuation; wrapped here for readability), e.g. every 30 min +*/30 * * * * ssh user@192.168.0.100 \ + 'find /opt/backups/.last-success -mmin -1500 | grep -q .' \ + || echo "ALERT: nullstone MC backup sentinel stale (>25h) or host unreachable" \ + | mail -s "MC backup stale" you@example.com +``` + +(swap `mail` for `notify-send`, `ntfy publish`, etc. once those are wired) + +A copy of the pre-fix script is preserved at `/opt/docker/backup.sh.bak-20260507-pre-phase1` for forensic reference. + +--- + +## Phase 2 deployment — restic playerdata snapshots every 5 min + +Implementation is in this repo: + +- `scripts/restic-backup-playerdata.sh` — the per-run script +- `scripts/restic-init.sh` — one-time bootstrap (must run as root) +- `scripts/systemd/mc-backup-playerdata.{service,timer}` — 5-min cadence +- Strategy + retention + threat model in `BACKUP-STRATEGY.md` + +**Deployment status (2026-05-07): NOT YET DEPLOYED — operator action required.** `restic` is not on nullstone; installing it needs sudo, and `user`'s sudo is password-locked. 
Operator runs: + +```bash +# On nullstone, as root (sudo -i or via console) +apt-get update && apt-get install -y restic mcrcon + +cd /opt/docker +git -C /home/user/repos/minecraft-server pull \ + || git clone ssh://git@192.168.0.100:222/s8n/minecraft-server.git /home/user/repos/minecraft-server +cd /home/user/repos/minecraft-server + +# 1) Bootstrap repos + env file +sudo bash scripts/restic-init.sh + +# 2) Install systemd units + run script +sudo install -m 644 scripts/systemd/mc-backup-playerdata.service /etc/systemd/system/ +sudo install -m 644 scripts/systemd/mc-backup-playerdata.timer /etc/systemd/system/ +sudo install -m 755 scripts/restic-backup-playerdata.sh /usr/local/bin/ + +# 3) Enable + start +sudo systemctl daemon-reload +sudo systemctl enable --now mc-backup-playerdata.timer + +# 4) Verify +systemctl list-timers mc-backup-playerdata.timer +journalctl -u mc-backup-playerdata.service -n 50 --no-pager +ls -la /home/user/restic/mc-frequent/ +restic -r /home/user/restic/mc-frequent --password-file /etc/mc-backup.pw snapshots +``` + +The first run should fire right after `enable --now` on a host that has been up for more than 2 min (an already-elapsed `OnBootSec=` triggers immediately); later runs follow the 5-min `OnUnitActiveSec=` cadence. + +### Off-host mirror to onyx (Phase 4 — separate) + +After Phase 2 is running cleanly for ~24h, provision `mc-backup` user on onyx with chrooted SFTP, then add a nightly `restic copy` job from nullstone. See `BACKUP-STRATEGY.md` §6 for the SFTP chroot config and §11 phase plan. + +Until then, the local nullstone repo is single-host — survives operator error and bad config edits, **not** disk failure. The Phase 1 daily tarball in `/opt/backups/` is the only redundancy until §6 lands. + +--- + ## TODO — open items (links into BACKUP-STRATEGY.md §11) -- [ ] Phase 1: fix `/opt/docker/backup.sh` orphan-line bug (F-backup-1). -- [ ] Phase 2: deploy `mc-backup-frequent.timer` (Class A, 5-min playerdata). -- [ ] Phase 3: deploy `mc-backup-world.timer` (Class B/C/D, hourly). +- [x] Phase 1: fix `/opt/docker/backup.sh` orphan-line bug (F-backup-1). 
**Done 2026-05-07.** +- [ ] Phase 2: deploy `mc-backup-playerdata.timer` (Class A, 5-min). Scripts in repo; **blocked on operator running `apt install restic` + `restic-init.sh` with sudo**. +- [ ] Phase 3: deploy `mc-backup-world.timer` (Class B/C/D, hourly). Script not yet drafted; will mirror playerdata script. - [ ] Phase 4: provision `mc-backup` user on onyx + `restic copy` job. - [ ] Phase 5: schedule monthly drill calendar entry, run first drill. - [ ] Phase 6: ntfy / Matrix alert wiring (depends on ntfy deployment). @@ -154,3 +223,4 @@ Until phases 1–4 of `BACKUP-STRATEGY.md` are deployed, the only recovery resou - [ ] Verify `usercache.json` on this host: confirm UUID lookup workflow above resolves to the right `.dat`. - [ ] Decide: `mcrcon` package vs lightweight Python `mcrcon` lib. - [ ] Document compensation policy for unrecoverable losses (operator discretion right now). +- [ ] Drop dead `matrix-postgres` + `mongodb` + `synapse-*` blocks from `/opt/docker/backup.sh` once retirement is complete (currently they no-op-skip — minor noise in log only). diff --git a/scripts/backup.sh b/scripts/backup.sh index ef5e26b..e8cea61 100755 --- a/scripts/backup.sh +++ b/scripts/backup.sh @@ -1,16 +1,38 @@ #!/usr/bin/env bash # /opt/docker/backup.sh -# Backs up all Docker service databases and named volumes to /opt/backups/ -# Run as root via cron. Keeps 7 daily backups. +# +# Daily backup of all Docker service databases, named volumes, and the +# Minecraft world to /opt/backups/. Runs as root via cron at 02:00 with +# 7-day retention. +# +# Phase 1 of BACKUP-STRATEGY.md ("stop the bleeding") — repairs the +# orphaned synapse-signing-key block that was killing the script under +# `set -e` before the Minecraft section ran. Also adds structured +# logging and a sentinel `.last-success` file so silent failures are +# detectable from outside the script. 
+# +# A separate Phase 2 (restic playerdata snapshots every 5 min) is +# delivered by scripts/restic-backup-playerdata.sh + the systemd unit +# pair under scripts/systemd/. This file remains the safety net. set -euo pipefail +umask 077 BACKUP_DIR="/opt/backups" TIMESTAMP=$(date +%Y%m%d_%H%M%S) BACKUP_PATH="${BACKUP_DIR}/${TIMESTAMP}" LOG="${BACKUP_DIR}/backup.log" +SENTINEL="${BACKUP_DIR}/.last-success" KEEP_DAYS=7 -log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"; } +# Track whether each backup arm succeeded so we can honour the +# sentinel contract: only stamp .last-success if the *world* (the +# critical T1 case) was captured. Other arms can fail without +# blocking the sentinel — they have their own logged FAILED lines. +MC_WORLD_OK=0 + +log() { + printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" | tee -a "$LOG" +} mkdir -p "$BACKUP_PATH" log "=== Backup started: ${TIMESTAMP} ===" @@ -18,10 +40,12 @@ log "=== Backup started: ${TIMESTAMP} ===" # ── Matrix PostgreSQL ────────────────────────────────────────────── log "Dumping Matrix PostgreSQL..." if docker ps --format '{{.Names}}' | grep -q '^matrix-postgres$'; then - docker exec matrix-postgres pg_dump -U synapse synapse \ - | gzip > "${BACKUP_PATH}/matrix-postgres-${TIMESTAMP}.sql.gz" \ - && log " Matrix Postgres: OK ($(du -sh "${BACKUP_PATH}/matrix-postgres-${TIMESTAMP}.sql.gz" | cut -f1))" \ - || log " Matrix Postgres: FAILED" + if docker exec matrix-postgres pg_dump -U synapse synapse \ + | gzip > "${BACKUP_PATH}/matrix-postgres-${TIMESTAMP}.sql.gz"; then + log " Matrix Postgres: OK ($(du -sh "${BACKUP_PATH}/matrix-postgres-${TIMESTAMP}.sql.gz" | cut -f1))" + else + log " Matrix Postgres: FAILED" + fi else log " matrix-postgres not running — skipping" fi @@ -29,14 +53,16 @@ fi # ── Rocket.Chat MongoDB ──────────────────────────────────────────── log "Dumping Rocket.Chat MongoDB..." 
if docker ps --format '{{.Names}}' | grep -q '^mongodb$'; then - docker exec mongodb mongodump \ + if docker exec mongodb mongodump \ -u admin -p CHANGE_ME_MONGO_ADMIN_PASSWORD \ --authenticationDatabase admin \ --db rocketchat \ --archive \ - | gzip > "${BACKUP_PATH}/rocketchat-mongo-${TIMESTAMP}.archive.gz" \ - && log " MongoDB: OK ($(du -sh "${BACKUP_PATH}/rocketchat-mongo-${TIMESTAMP}.archive.gz" | cut -f1))" \ - || log " MongoDB: FAILED" + | gzip > "${BACKUP_PATH}/rocketchat-mongo-${TIMESTAMP}.archive.gz"; then + log " MongoDB: OK ($(du -sh "${BACKUP_PATH}/rocketchat-mongo-${TIMESTAMP}.archive.gz" | cut -f1))" + else + log " MongoDB: FAILED" + fi else log " mongodb not running — skipping" fi @@ -46,13 +72,15 @@ log "Backing up Docker volumes..." for VOLUME in synapse-media rocketchat-uploads; do if docker volume ls --format '{{.Name}}' | grep -q "^matrix_${VOLUME}\|^rocketchat_${VOLUME}\|^${VOLUME}$"; then ACTUAL_VOL=$(docker volume ls --format '{{.Name}}' | grep "${VOLUME}" | head -1) - docker run --rm \ + if docker run --rm \ -v "${ACTUAL_VOL}:/volume:ro" \ -v "${BACKUP_PATH}:/backup" \ alpine \ - tar czf "/backup/${VOLUME}-${TIMESTAMP}.tar.gz" -C /volume . \ - && log " Volume ${VOLUME}: OK" \ - || log " Volume ${VOLUME}: FAILED" + tar czf "/backup/${VOLUME}-${TIMESTAMP}.tar.gz" -C /volume . ; then + log " Volume ${VOLUME}: OK" + else + log " Volume ${VOLUME}: FAILED" + fi else log " Volume ${VOLUME}: not found — skipping" fi @@ -60,7 +88,7 @@ done # ── Config files (bind mounts) ───────────────────────────────────── log "Backing up config directories..." 
-tar czf "${BACKUP_PATH}/configs-${TIMESTAMP}.tar.gz" \ +if tar czf "${BACKUP_PATH}/configs-${TIMESTAMP}.tar.gz" \ /opt/docker/traefik/traefik.yml \ /opt/docker/traefik/config/ \ /opt/docker/matrix/docker-compose.yml \ @@ -68,57 +96,151 @@ tar czf "${BACKUP_PATH}/configs-${TIMESTAMP}.tar.gz" \ /opt/docker/matrix/synapse-config/homeserver.yaml \ /opt/docker/matrix/synapse-config/matrix.example.com.log.config \ /opt/docker/rocketchat/docker-compose.yml \ - 2>/dev/null && log " Configs: OK" || log " Configs: partial (some files missing)" + 2>/dev/null; then + log " Configs: OK" +else + log " Configs: partial (some files missing)" +fi -# IMPORTANT: signing key is sensitive — back up separately with tight perms +# Synapse signing key — sensitive, copy out separately with tight perms. if [ -f /opt/docker/matrix/synapse-config/matrix.example.com.signing.key ]; then cp /opt/docker/matrix/synapse-config/matrix.example.com.signing.key \ "${BACKUP_PATH}/synapse-signing-key-${TIMESTAMP}.key" chmod 600 "${BACKUP_PATH}/synapse-signing-key-${TIMESTAMP}.key" log " Synapse signing key: backed up (600)" fi + # ── Minecraft server ─────────────────────────────────────────────── +# This is the block that was missing from the deployed copy and +# corrupted by an orphaned synapse-signing-key fragment in the repo +# copy. Wrapped in a function called from an `if` test, so a failure +# here does NOT exit the whole script under `set -e`; the prune step +# and sentinel logic still run. log "Backing up Minecraft server..." 
-if docker ps --format '{{.Names}}' | grep -q '^minecraft-mc$'; then - # Server is running - create consistent world snapshot - docker exec minecraft-mc bash -c \ - "cd /data && tar czf /tmp/mc-world-backup-${TIMESTAMP}.tar.gz world/ world_nether/ world_the_end/ 2>/dev/null" && \ - docker cp minecraft-mc:/tmp/mc-world-backup-${TIMESTAMP}.tar.gz "${BACKUP_PATH}/" && \ - docker exec minecraft-mc rm -f /tmp/mc-world-backup-${TIMESTAMP}.tar.gz && \ - log " Minecraft world: OK ($(du -sh "${BACKUP_PATH}/mc-world-backup-${TIMESTAMP}.tar.gz" | cut -f1))" \ - || log " Minecraft world: FAILED" - # Backup configs and plugins - tar czf "${BACKUP_PATH}/minecraft-configs-${TIMESTAMP}.tar.gz" \ - /opt/docker/minecraft/server.properties \ - /opt/docker/minecraft/purpur.yml \ - /opt/docker/minecraft/spigot.yml \ - /opt/docker/minecraft/paper-*.yml \ - /opt/docker/minecraft/bukkit.yml \ - /opt/docker/minecraft/ops.json \ - /opt/docker/minecraft/banned-*.json \ - /opt/docker/minecraft/eula.txt \ - 2>/dev/null && \ - log " Minecraft configs: OK" \ - || log " Minecraft configs: partial (expected)" -else - # Server is stopped - backup everything directly - tar czf "${BACKUP_PATH}/minecraft-full-backup-${TIMESTAMP}.tar.gz" \ - /opt/docker/minecraft/world/ \ - /opt/docker/minecraft/world_nether/ \ - /opt/docker/minecraft/world_the_end/ \ - /opt/docker/minecraft/plugins/ \ - /opt/docker/minecraft/server.properties \ - /opt/docker/minecraft/purpur.yml \ - /opt/docker/minecraft/spigot.yml \ - 2>/dev/null && \ - log " Minecraft (full, offline): OK ($(du -sh "${BACKUP_PATH}/minecraft-full-backup-${TIMESTAMP}.tar.gz" | cut -f1))" \ - || log " Minecraft (offline): partial" -fi +# tar exit codes: 0 = clean, 1 = "some files differed/changed during read" +# (NORMAL on a live MC server — chunks save while we read), 2 = fatal. +# Treat 0 and 1 as success, 2+ as failure. 
+tar_ok() { local rc=$1; [ "$rc" -le 1 ]; } - "${BACKUP_PATH}/synapse-signing-key-${TIMESTAMP}.key" - chmod 600 "${BACKUP_PATH}/synapse-signing-key-${TIMESTAMP}.key" - log " Synapse signing key: backed up (600)" +mc_backup() { + if docker ps --format '{{.Names}}' | grep -q '^minecraft-mc$'; then + # Server running — flush via rcon if mcrcon installed, then + # tar inside the container so we get a consistent point-in-time. + if command -v mcrcon >/dev/null 2>&1; then + mcrcon -H 127.0.0.1 -P 25575 \ + -p "${MC_RCON_PASSWORD:-*redacted*}" \ + -w 1 "save-all flush" >/dev/null 2>&1 || true + fi + + # World tar — runs inside the container. We ignore tar exit 1 + # ("file changed as we read it") because that's expected on a + # live server and the resulting archive is still usable. + local tar_rc=0 + docker exec minecraft-mc bash -c \ + "cd /data && tar czf /tmp/mc-world-backup-${TIMESTAMP}.tar.gz world/ world_nether/ world_the_end/" \ + >/dev/null 2>&1 || tar_rc=$? + if tar_ok "$tar_rc" \ + && docker cp "minecraft-mc:/tmp/mc-world-backup-${TIMESTAMP}.tar.gz" "${BACKUP_PATH}/" >/dev/null 2>&1 \ + && docker exec minecraft-mc rm -f "/tmp/mc-world-backup-${TIMESTAMP}.tar.gz" >/dev/null 2>&1; then + local sz + sz=$(du -sh "${BACKUP_PATH}/mc-world-backup-${TIMESTAMP}.tar.gz" | cut -f1) + if [ "$tar_rc" -eq 1 ]; then + log " Minecraft world: OK (${sz}) [tar exit 1 — files changed during read, expected on live server]" + else + log " Minecraft world: OK (${sz})" + fi + MC_WORLD_OK=1 + else + log " Minecraft world: FAILED (tar_rc=${tar_rc})" + # Best-effort cleanup of any half-written file inside the container. + docker exec minecraft-mc rm -f "/tmp/mc-world-backup-${TIMESTAMP}.tar.gz" >/dev/null 2>&1 || true + fi + + # Plugins (jars + on-disk config) — small, do this regardless + # of world result so we always have plugin state on hand. 
+ # `--ignore-failed-read` suppresses spark profiler tmp files + # (running JFR files briefly mode 600); `--warning=no-file-changed` + # silences CoreProtect db noise in the log. + local prc=0 + tar --ignore-failed-read --warning=no-file-changed \ + -czf "${BACKUP_PATH}/minecraft-plugins-${TIMESTAMP}.tar.gz" \ + -C /opt/docker/minecraft plugins/ >/dev/null 2>&1 || prc=$? + if tar_ok "$prc"; then + log " Minecraft plugins: OK ($(du -sh "${BACKUP_PATH}/minecraft-plugins-${TIMESTAMP}.tar.gz" | cut -f1))" + else + log " Minecraft plugins: FAILED (rc=${prc})" + fi + + # Plugin DBs — copied (not dumped, all SQLite/file-based) into + # a tagged tarball so restore is straightforward. + local drc=0 + tar --ignore-failed-read --warning=no-file-changed \ + -czf "${BACKUP_PATH}/minecraft-dbs-${TIMESTAMP}.tar.gz" \ + -C /opt/docker/minecraft \ + homestead_data.db \ + plugins/AuthMe/authme.db \ + plugins/CoreProtect/database.db \ + plugins/LuckPerms/ \ + >/dev/null 2>&1 || drc=$? + if tar_ok "$drc"; then + log " Minecraft DBs: OK ($(du -sh "${BACKUP_PATH}/minecraft-dbs-${TIMESTAMP}.tar.gz" | cut -f1))" + else + log " Minecraft DBs: partial (rc=${drc} — some files may be missing)" + fi + + # Server-side configs and access lists. Some of these files are + # optional (eg whitelist.json absent when whitelisting is off). + # tar reports rc=2 for missing files, so we prefilter the list. + local cfg_files=() + for f in server.properties purpur.yml spigot.yml bukkit.yml \ + commands.yml help.yml permissions.yml \ + ops.json whitelist.json banned-players.json banned-ips.json \ + usercache.json eula.txt docker-compose.yml; do + [ -e "/opt/docker/minecraft/$f" ] && cfg_files+=("$f") + done + local crc=0 + tar czf "${BACKUP_PATH}/minecraft-configs-${TIMESTAMP}.tar.gz" \ + -C /opt/docker/minecraft "${cfg_files[@]}" \ + >/dev/null 2>&1 || crc=$? 
+ if tar_ok "$crc"; then + log " Minecraft configs: OK (${#cfg_files[@]} files)" + else + log " Minecraft configs: FAILED (rc=${crc})" + fi + else + # Server stopped — back up everything from disk directly. + local frc=0 + tar czf "${BACKUP_PATH}/minecraft-full-backup-${TIMESTAMP}.tar.gz" \ + -C /opt/docker/minecraft \ + world/ \ + world_nether/ \ + world_the_end/ \ + plugins/ \ + homestead_data.db \ + server.properties \ + purpur.yml \ + spigot.yml \ + bukkit.yml \ + ops.json \ + whitelist.json \ + banned-players.json \ + banned-ips.json \ + usercache.json \ + docker-compose.yml \ + >/dev/null 2>&1 || frc=$? + if tar_ok "$frc"; then + log " Minecraft (full, offline): OK ($(du -sh "${BACKUP_PATH}/minecraft-full-backup-${TIMESTAMP}.tar.gz" | cut -f1))" + MC_WORLD_OK=1 + else + log " Minecraft (offline): partial (rc=${frc})" + fi + fi +} + +# Run MC arm — never let it kill the rest of the script. +if ! mc_backup; then + log " Minecraft arm exited non-zero — see lines above" fi # ── Prune old backups ────────────────────────────────────────────── @@ -128,3 +250,19 @@ find "$BACKUP_DIR" -maxdepth 1 -name "*.log" -mtime +30 -delete 2>/dev/null || t BACKUP_SIZE=$(du -sh "$BACKUP_PATH" | cut -f1) log "=== Backup complete: ${BACKUP_PATH} (${BACKUP_SIZE}) ===" + +# ── Sentinel ─────────────────────────────────────────────────────── +# Touch the sentinel only if the world (T1 case) was captured. An +# external monitor (cron on onyx, or ntfy/healthchecks once wired) +# can alert on `find /opt/backups/.last-success -mmin +1500` to catch +# silent failures within 25h of a missed daily run. 
+if [ "$MC_WORLD_OK" -eq 1 ]; then + { + printf 'last_success=%s\n' "$(date -Iseconds)" + printf 'backup_path=%s\n' "$BACKUP_PATH" + printf 'backup_size=%s\n' "$BACKUP_SIZE" + } > "$SENTINEL" + log "Sentinel updated: ${SENTINEL}" +else + log "WARNING: world backup did NOT succeed — sentinel NOT updated" +fi diff --git a/scripts/restic-backup-playerdata.sh b/scripts/restic-backup-playerdata.sh new file mode 100644 index 0000000..d79a128 --- /dev/null +++ b/scripts/restic-backup-playerdata.sh @@ -0,0 +1,135 @@ +#!/usr/bin/env bash +# /usr/local/bin/restic-backup-playerdata.sh +# +# Class A backup per BACKUP-STRATEGY.md — every 5 minutes, snapshot +# playerdata + stats + advancements + plugin DBs + LuckPerms config. +# Skips the heavy region/ files (those are Class B, hourly). +# +# Driven by mc-backup-playerdata.timer (5 min cadence). +# +# Pre-req: restic installed; one-time bootstrap performed by +# scripts/restic-init.sh which creates the local repo and writes +# /etc/mc-backup.env + /etc/mc-backup.pw. +# +# Status (2026-05-07): scripts shipped to repo; deployment to nullstone +# is BLOCKED on operator running `apt install restic` + scripts/restic-init.sh +# under sudo. See docs/RUNBOOK-BACKUP-RESTORE.md "Phase 2 deployment". +set -euo pipefail +umask 077 + +ENV_FILE="${MC_BACKUP_ENV_FILE:-/etc/mc-backup.env}" +if [ ! -r "$ENV_FILE" ]; then + echo "FATAL: env file $ENV_FILE not readable — run scripts/restic-init.sh first" >&2 + exit 2 +fi +# shellcheck disable=SC1090 +. 
"$ENV_FILE" + +: "${RESTIC_REPOSITORY_FREQUENT:?RESTIC_REPOSITORY_FREQUENT not set in $ENV_FILE}" +: "${RESTIC_PASSWORD_FILE:?RESTIC_PASSWORD_FILE not set in $ENV_FILE}" +: "${MC_DATA:?MC_DATA not set in $ENV_FILE}" + +export RESTIC_REPOSITORY="$RESTIC_REPOSITORY_FREQUENT" +export RESTIC_PASSWORD_FILE + +LOG="${MC_BACKUP_LOG:-/var/log/mc-backup.log}" +SENTINEL="${MC_BACKUP_FREQUENT_SENTINEL:-/var/lib/mc-backup/last-success-frequent}" +RCON_HOST="${MC_RCON_HOST:-127.0.0.1}" +RCON_PORT="${MC_RCON_PORT:-25575}" +RCON_PASS="${MC_RCON_PASSWORD:-}" + +mkdir -p "$(dirname "$SENTINEL")" + +log() { + printf '[%s] [frequent] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" \ + | tee -a "$LOG" +} + +on_err() { + local rc=$? + log "ERROR rc=${rc} at line ${BASH_LINENO[0]}" + if [ -n "${ALERT_URL:-}" ]; then + curl -fsS -m 5 -d "mc-backup-frequent FAILED rc=${rc}" "$ALERT_URL" \ + >/dev/null 2>&1 || true + fi + exit "$rc" +} +trap on_err ERR + +log "=== run start (host=$(hostname)) ===" + +# 1. Best-effort: ask the server to flush before snapshotting. +# Don't fail the backup if rcon is down or unreachable — we'd rather +# have a slightly-stale snapshot than no snapshot. +if [ -n "$RCON_PASS" ] && command -v mcrcon >/dev/null 2>&1; then + if mcrcon -H "$RCON_HOST" -P "$RCON_PORT" -p "$RCON_PASS" -w 1 \ + "save-all flush" >/dev/null 2>&1; then + log "rcon save-all flush: ok" + else + log "rcon save-all flush: failed (continuing)" + fi +else + log "rcon: skipped (no mcrcon or no password)" +fi + +# 2. Build the include list. Optional paths can be listed freely: +# anything missing on disk is filtered out below before restic runs. 
+INCLUDES=( + "${MC_DATA}/world/playerdata" + "${MC_DATA}/world/stats" + "${MC_DATA}/world/advancements" + "${MC_DATA}/world/level.dat" + "${MC_DATA}/world_nether/level.dat" + "${MC_DATA}/world_the_end/level.dat" + "${MC_DATA}/homestead_data.db" + "${MC_DATA}/plugins/AuthMe" + "${MC_DATA}/plugins/CoreProtect/database.db" + "${MC_DATA}/plugins/LuckPerms" +) + +EXISTING=() +for p in "${INCLUDES[@]}"; do + if [ -e "$p" ]; then + EXISTING+=("$p") + fi +done + +if [ ${#EXISTING[@]} -eq 0 ]; then + log "no source paths exist — aborting" + exit 3 +fi + +# 3. Snapshot. Tagged so retention policy can target this class only. +log "snapshotting ${#EXISTING[@]} path(s)" +restic backup \ + --tag playerdata \ + --tag auto-5min \ + --host "$(hostname)" \ + --exclude='*.lock' \ + --exclude='*.tmp' \ + "${EXISTING[@]}" \ + >> "$LOG" 2>&1 + +# 4. Light retention — only on this repo, only on this tag. +restic forget \ + --tag auto-5min \ + --keep-last 24 \ + --keep-hourly 24 \ + --keep-daily 7 \ + --prune \ + --quiet \ + >> "$LOG" 2>&1 || log "forget+prune returned non-zero (continuing)" + +# 5. Sentinel for external monitor. +{ + printf 'last_success=%s\n' "$(date -Iseconds)" + printf 'class=A\n' + printf 'repo=%s\n' "$RESTIC_REPOSITORY" +} > "$SENTINEL" + +# 6. Heartbeat (no-op if HEARTBEAT_URL unset). +if [ -n "${HEARTBEAT_URL:-}" ]; then + curl -fsS -m 5 "$HEARTBEAT_URL" >/dev/null 2>&1 || true +fi + +log "=== run ok ===" diff --git a/scripts/restic-init.sh b/scripts/restic-init.sh new file mode 100644 index 0000000..b3a04f4 --- /dev/null +++ b/scripts/restic-init.sh @@ -0,0 +1,156 @@ +#!/usr/bin/env bash +# scripts/restic-init.sh +# +# One-time bootstrap for the Phase 2 restic backup chain. Run this on +# nullstone as root (sudo) AFTER `apt install restic mcrcon`. +# +# What it does: +# 1. Generates /etc/mc-backup.pw (40-byte random restic password) if absent. +# 2. Writes /etc/mc-backup.env (consumed by restic-backup-playerdata.sh). +# 3. 
Initialises the local restic repo at /home/user/restic/mc-frequent. +# 4. Takes a baseline snapshot so the timer's first run is fast. +# 5. Optionally adds an SFTP-mirror block once onyx is provisioned. +# +# Idempotent — re-running is safe; existing files are preserved. +# +# Cross-ref: docs/BACKUP-STRATEGY.md §8.2, docs/RUNBOOK-BACKUP-RESTORE.md. +set -euo pipefail +umask 077 + +if [ "$(id -u)" -ne 0 ]; then + echo "FATAL: must run as root (sudo)." >&2 + exit 2 +fi + +if ! command -v restic >/dev/null 2>&1; then + echo "FATAL: restic not installed. Run: apt install restic mcrcon" >&2 + exit 3 +fi + +# Resolve target user — restic repo lives under their home so /opt +# disk pressure doesn't matter. nullstone: 142G free on /home. +TARGET_USER="${TARGET_USER:-user}" +if ! id "$TARGET_USER" >/dev/null 2>&1; then + echo "FATAL: user '$TARGET_USER' not found" >&2 + exit 4 +fi +TARGET_HOME=$(getent passwd "$TARGET_USER" | cut -d: -f6) + +PW_FILE="/etc/mc-backup.pw" +ENV_FILE="/etc/mc-backup.env" +REPO_FREQUENT="${TARGET_HOME}/restic/mc-frequent" +REPO_WORLD="${TARGET_HOME}/restic/mc-world" +LOG_DIR="/var/log" +SENTINEL_DIR="/var/lib/mc-backup" + +# 1. Password file +if [ ! -e "$PW_FILE" ]; then + head -c 40 /dev/urandom | base64 > "$PW_FILE" + chown root:root "$PW_FILE" + chmod 600 "$PW_FILE" + echo "Generated $PW_FILE (40 bytes random)." +else + echo "$PW_FILE already exists — keeping." +fi + +# 2. Env file (only created if missing; user can edit afterwards). +if [ ! -e "$ENV_FILE" ]; then + cat > "$ENV_FILE" </dev/null \ + || chown "$TARGET_USER":"$(id -gn "$TARGET_USER")" "$LOG_DIR/mc-backup.log" +chmod 640 "$LOG_DIR/mc-backup.log" + +# 4. Repo init (idempotent — restic init exits non-zero if repo exists). 
+init_repo() { + local repo=$1 + install -d -o "$TARGET_USER" -g "$(id -gn "$TARGET_USER")" -m 700 \ + "$(dirname "$repo")" "$repo" + if RESTIC_PASSWORD_FILE="$PW_FILE" RESTIC_REPOSITORY="$repo" \ + runuser -u "$TARGET_USER" -- restic snapshots >/dev/null 2>&1; then + echo "Repo $repo: already initialised." + else + RESTIC_PASSWORD_FILE="$PW_FILE" RESTIC_REPOSITORY="$repo" \ + runuser -u "$TARGET_USER" -- restic init + echo "Repo $repo: initialised." + fi +} +init_repo "$REPO_FREQUENT" +init_repo "$REPO_WORLD" + +# 5. Baseline snapshot of the frequent repo so the first timer run is fast. +echo "Taking baseline snapshot into $REPO_FREQUENT ..." +runuser -u "$TARGET_USER" -- env \ + RESTIC_PASSWORD_FILE="$PW_FILE" \ + RESTIC_REPOSITORY="$REPO_FREQUENT" \ + restic backup \ + --tag playerdata --tag baseline --host "$(hostname)" \ + --exclude='*.lock' --exclude='*.tmp' \ + /opt/docker/minecraft/world/playerdata \ + /opt/docker/minecraft/world/stats \ + /opt/docker/minecraft/world/advancements \ + /opt/docker/minecraft/homestead_data.db \ + /opt/docker/minecraft/plugins/AuthMe \ + /opt/docker/minecraft/plugins/CoreProtect/database.db \ + /opt/docker/minecraft/plugins/LuckPerms \ + || echo "Baseline snapshot returned non-zero — review output above." + +cat <<'NEXT' + + --------------------------------------------------------------- + restic-init.sh complete. + + Next steps: + 1. Install systemd units: + install -m644 scripts/systemd/mc-backup-playerdata.service \ + /etc/systemd/system/ + install -m644 scripts/systemd/mc-backup-playerdata.timer \ + /etc/systemd/system/ + install -m755 scripts/restic-backup-playerdata.sh \ + /usr/local/bin/ + + 2. systemctl daemon-reload + 3. systemctl enable --now mc-backup-playerdata.timer + 4. Tail: journalctl -u mc-backup-playerdata.service -f + + Onyx (off-host mirror) provisioning is a separate step — see + docs/RUNBOOK-BACKUP-RESTORE.md "Phase 2 deployment". 
+ --------------------------------------------------------------- +NEXT diff --git a/scripts/systemd/mc-backup-playerdata.service b/scripts/systemd/mc-backup-playerdata.service new file mode 100644 index 0000000..b484895 --- /dev/null +++ b/scripts/systemd/mc-backup-playerdata.service @@ -0,0 +1,29 @@ +[Unit] +Description=Minecraft frequent backup (Class A — playerdata + DBs, every 5 min) +Documentation=https://git.s8n.ru/s8n/minecraft-server/src/branch/main/BACKUP-STRATEGY.md +After=docker.service +Wants=docker.service + +[Service] +Type=oneshot +User=user +Group=user +EnvironmentFile=/etc/mc-backup.env +ExecStart=/usr/local/bin/restic-backup-playerdata.sh +Nice=10 +IOSchedulingClass=best-effort +IOSchedulingPriority=7 + +# Hardening — restic only needs read on /opt/docker/minecraft and +# write under TARGET_HOME/restic + /var/lib/mc-backup + /var/log. +ProtectSystem=strict +ProtectHome=read-only +ReadOnlyPaths=/opt/docker/minecraft +ReadWritePaths=/home/user/restic /var/lib/mc-backup /var/log +PrivateTmp=true +NoNewPrivileges=true +ProtectKernelTunables=true +ProtectKernelModules=true +ProtectControlGroups=true +RestrictSUIDSGID=true +LockPersonality=true diff --git a/scripts/systemd/mc-backup-playerdata.timer b/scripts/systemd/mc-backup-playerdata.timer new file mode 100644 index 0000000..b16360b --- /dev/null +++ b/scripts/systemd/mc-backup-playerdata.timer @@ -0,0 +1,15 @@ +[Unit] +Description=Run mc-backup-playerdata every 5 minutes +Documentation=https://git.s8n.ru/s8n/minecraft-server/src/branch/main/BACKUP-STRATEGY.md + +[Timer] +# Stagger after boot so MC and Docker have a chance to settle. +OnBootSec=2min +# 5-minute cadence per BACKUP-STRATEGY.md §2 RPO target for Class A. +OnUnitActiveSec=5min +AccuracySec=30s +# No-op here: Persistent= only affects OnCalendar= timers, not the monotonic triggers above. +Persistent=true + +[Install] +WantedBy=timers.target