Snapshot retention: GFS and tiers
'Keep the last N snapshots' quietly fails: a burst of hourly snapshots can age out months of history in a day, and a long-dwell ransomware infection can push every clean snapshot out of the window. Tiered (GFS) retention keeps restore points across timescales so both 'I deleted a file an hour ago' and 'the corruption started months ago' are recoverable.
Who this is for
Home operators running NAS snapshots or a deduplicating backup tool who want a retention policy that survives both a recent mistake and a long-dwell corruption.
Outcome
A grandfather-father-son retention policy (daily/weekly/monthly, often plus yearly) expressed in your tools: restic forget --keep-* flags, Synology Snapshot Replication Smart Retention, or TrueNAS snapshot lifetimes plus independent replication retention. The policy is sized to your change rate and free space, with dense recent history on the source and a long GFS ladder on the offsite copy.
Required inputs
- Your snapshot/backup tool and where it runs (restic, Synology Snapshot Replication, TrueNAS periodic snapshot + replication tasks, etc.).
- An estimate of your data's change rate (how much diverges per day) and the free space available on the pool/repository.
- A target recovery profile: how recent you need fine-grained restore points and how far back you need any restore point at all.
- A separate offsite/replicated target if you want the long tail of the GFS ladder to live off the box.
Step-by-step procedure
Define the GFS tiers you need
Do: Decide daily (sons), weekly (fathers), monthly (grandfathers), and optionally yearly retention counts based on your recovery profile — e.g. 7 daily, 5 weekly, 12 monthly, a few yearly.
Expected result: You have explicit counts per tier rather than a single 'keep N' number.
If not: If you only have a fixed count, a snapshot burst or a long dwell can wipe your real history — convert to tiers.
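To make the failure mode concrete, here is a minimal arithmetic sketch of how far back each approach reaches. The function names are illustrative, and the GFS figure is a rough approximation (30-day months, 365-day years), not an exact calendar calculation.

```python
from datetime import timedelta

def keep_last_coverage(n: int, interval: timedelta) -> timedelta:
    """Age of the oldest snapshot kept by 'keep last n' at a steady interval."""
    return interval * (n - 1)

def gfs_coverage_days(daily: int, weekly: int, monthly: int, yearly: int = 0) -> int:
    """Approximate age in days of the oldest restore point a GFS ladder can hold."""
    return max(daily, weekly * 7, monthly * 30, yearly * 365)

# 30 snapshots looks like a month of history at one snapshot per day...
print(keep_last_coverage(30, timedelta(days=1)))   # 29 days, 0:00:00
# ...but at one snapshot per hour the same count reaches back barely a day.
print(keep_last_coverage(30, timedelta(hours=1)))  # 1 day, 5:00:00
# The example ladder from this step (7 daily, 5 weekly, 12 monthly, 3 yearly).
print(gfs_coverage_days(7, 5, 12, 3))              # 1095
```

The point of the comparison is that a fixed count ties history depth to snapshot frequency, while tier counts fix the depth regardless of how often snapshots are taken.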
Express the policy in your tool
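Do: Set restic forget --keep-daily, --keep-weekly, --keep-monthly, and --keep-yearly; or the Synology Snapshot Replication Smart Retention tiers; or, on TrueNAS, one periodic snapshot task per tier, each with its own schedule and snapshot lifetime. The aim in every case is one retained snapshot per slot; restic keeps the most recent snapshot in each day, week, month, or year.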
Expected result: Running the retention/forget operation results in snapshots spread across the tiers, not just the most recent ones.
If not: If everything older than a few days is gone after pruning, the tiered flags aren't applied — re-check the policy.
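The "most recent snapshot in each slot" rule can be sketched in a few lines. This is a simplified model for building intuition, not restic's or any NAS vendor's actual implementation: real tools also group snapshots by host and path, honor tags, and support other keep rules. The function name `gfs_keep` and the sample dates are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def gfs_keep(snapshots, daily=0, weekly=0, monthly=0, yearly=0):
    """Return the set of snapshot timestamps a simplified GFS policy keeps."""
    # Each tier maps a timestamp to the calendar bucket (slot) it falls in.
    tiers = [
        (daily, lambda t: (t.year, t.month, t.day)),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, ISO week)
        (monthly, lambda t: (t.year, t.month)),
        (yearly, lambda t: t.year),
    ]
    newest_first = sorted(snapshots, reverse=True)
    keep = set()
    for count, bucket_of in tiers:
        seen = []
        for ts in newest_first:
            bucket = bucket_of(ts)
            if bucket in seen:
                continue  # a newer snapshot already represents this slot
            if len(seen) == count:
                break  # this tier has all the slots it is allowed
            seen.append(bucket)
            keep.add(ts)
    return keep

# One snapshot per day at noon from 2024-01-01 to 2024-03-31,
# plus a burst of hourly snapshots on the last afternoon.
snaps = [datetime(2024, 1, 1, 12) + timedelta(days=i) for i in range(91)]
snaps += [datetime(2024, 3, 31, h) for h in range(13, 24)]

keep = gfs_keep(snaps, daily=3, monthly=3)
for ts in sorted(keep):
    print(ts)
print(len(keep))  # 5
```

The burst adds eleven snapshots but no extra slots: the policy still keeps three days and three months (the newest snapshot serves as both the latest day and the latest month), which is the property a plain "keep last N" lacks.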
Split source vs offsite retention
Do: Keep short, dense history on the source (fast pool) and configure the replication/offsite target's retention independently for the long GFS ladder.
Expected result: The source holds recent restore points; the offsite/replicated copy holds the deep history.
If not: If the source carries the entire long ladder, it bloats the fast pool — move the long tail to the target.
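One way to keep the two policies separate and reviewable is to generate both command lines from per-tier counts. The sketch below only builds argument lists and does not run anything; the repository locations and counts are hypothetical placeholders, and it assumes the source and offsite copies are separate restic repositories.

```python
def forget_args(repo, *, hourly=0, daily=0, weekly=0, monthly=0, yearly=0, dry_run=True):
    """Build a `restic forget` argument list from per-tier counts."""
    tiers = {"hourly": hourly, "daily": daily, "weekly": weekly,
             "monthly": monthly, "yearly": yearly}
    if not any(count > 0 for count in tiers.values()):
        raise ValueError("refusing to build a policy with no keep rules")
    args = ["restic", "-r", repo, "forget"]
    for name, count in tiers.items():
        if count > 0:
            args += [f"--keep-{name}", str(count)]
    if dry_run:
        args.append("--dry-run")  # default to a plan, not a deletion
    return args

# Hypothetical repositories: dense and short on the source, long ladder offsite.
SOURCE = forget_args("/mnt/fast/restic-repo", hourly=24, daily=7)
OFFSITE = forget_args("sftp:backup@offsite.example:/srv/restic",
                      daily=7, weekly=5, monthly=12, yearly=3)

print(" ".join(SOURCE))
print(" ".join(OFFSITE))
```

Defaulting to --dry-run means a copy-pasted call prints the plan first; pass dry_run=False only after reviewing it.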
Size it against change rate and free space
Do: Estimate snapshot growth from your change rate (not dataset size), and keep copy-on-write pools out of the near-full zone where fragmentation hurts performance (for ZFS, a commonly cited guideline is to stay below roughly 80% used); trim the longest tiers if space is tight.
Expected result: Projected snapshot usage fits within free space with a planning buffer.
If not: If the pool trends toward full, reduce the deepest tiers or move them offsite before performance degrades.
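A back-of-the-envelope check can be sketched as follows. It assumes the worst case, where every changed block is unique and stays pinned until the oldest snapshot expires; real usage is usually lower because the same blocks get rewritten between sparse snapshots. The 80% fill ceiling and all sample figures are assumptions to replace with your own.

```python
def projected_snapshot_gib(daily_change_gib: float, oldest_snapshot_days: int) -> float:
    """Worst-case space pinned by snapshots: daily change times retention span."""
    return daily_change_gib * oldest_snapshot_days

def fits(pool_size_gib, live_data_gib, snapshot_gib, max_fill=0.80):
    """True if live data plus snapshots stay under the chosen fill ceiling."""
    return live_data_gib + snapshot_gib <= pool_size_gib * max_fill

# Hypothetical pool: 4000 GiB total, 2500 GiB live data, 2 GiB changed per day.
year_ladder = projected_snapshot_gib(daily_change_gib=2.0, oldest_snapshot_days=365)
print(year_ladder)                     # 730.0
print(fits(4000, 2500, year_ladder))   # False: 3230 GiB exceeds the 3200 GiB ceiling

# Trimming the source to 90 days (and keeping the long tail offsite) fits.
print(fits(4000, 2500, projected_snapshot_gib(2.0, 90)))  # True
```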
Pair the deep history with immutability
Do: Put the long GFS tail on an immutable/offsite tier so a long-dwell attack can't delete the very restore points you'd need.
Expected result: The deep restore points live somewhere ransomware can't reach.
If not: If the long history is on the same writable box, treat that as the gap to close next.
Verify old restore points still restore
Do: Periodically restore from an older tier (not just the latest), since older versions are exactly what you need after a long-dwell problem.
Expected result: An older monthly/weekly restore point restores and opens correctly.
If not: If only the latest restores, your retention is effectively shallow — investigate the older tiers.
Commands and settings paths
Apply and inspect a restic GFS policy
restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3 --dry-run
Where: On the backup host (dry-run first)
Expected: The plan keeps the most recent snapshot in each daily/weekly/monthly/yearly slot.
Failure means: If it would drop all but the last few, the keep-* flags aren't taking effect.
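Safe next step: Fix the flags; run without --dry-run only once the plan looks right. Note that forget only removes the snapshot records; add --prune (or run restic prune afterwards) to actually reclaim space.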
Check snapshot space vs free space
zfs list -t snapshot for per-snapshot usage, or zfs list -o space for the per-dataset USEDSNAP total (TrueNAS) / Storage Manager → Volume usage (Synology)
Where: On the NAS
Expected: Snapshot usage is a known, stable share of the pool and leaves your planned free-space buffer intact.
Failure means: Runaway snapshot growth means the change rate is higher than the retention assumes.
Safe next step: Trim the deepest tiers or move them offsite; keep the pool out of the near-full zone.
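For tracking over time, the ZFS listing can be summarized per dataset. This sketch parses the tab-separated output of `zfs list -Hp -t snapshot -o name,used` (no headers, exact byte counts); the dataset and snapshot names in the sample are made up. One caveat: a snapshot's `used` value counts only space unique to that snapshot, so the sum is a lower bound, and the dataset's `usedbysnapshots` property remains the authoritative total.

```python
from collections import defaultdict

def snapshot_used_by_dataset(zfs_output: str) -> dict:
    """Sum the per-snapshot `used` bytes for each dataset."""
    totals = defaultdict(int)
    for line in zfs_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, used = line.split("\t")
        dataset, _, _snapshot = name.partition("@")
        totals[dataset] += int(used)
    return dict(totals)

# Hypothetical output captured from the NAS.
SAMPLE = (
    "tank/photos@auto-2024-03-29\t1048576\n"
    "tank/photos@auto-2024-03-30\t2097152\n"
    "tank/docs@auto-2024-03-30\t524288\n"
)

print(snapshot_used_by_dataset(SAMPLE))
# {'tank/photos': 3145728, 'tank/docs': 524288}
```

Recording this figure alongside pool free space at each review makes a rising change rate visible before the pool nears full.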
Restore from an older tier
Restore a file from a monthly/weekly restore point (not the latest) to a scratch folder.
Where: On a clean machine / scratch location
Expected: The older restore point restores and the file opens.
Failure means: If only the latest works, your effective retention is shallow.
Safe next step: Investigate why older tiers aren't usable before trusting the policy.
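After restoring an older version to the scratch folder, a small check can confirm the files arrived intact. This is a minimal sketch: the function names are illustrative, and the expected checksums are something you would have recorded yourself (pass None to check only that a file exists and is non-empty). A matching checksum proves the bytes, so opening the file is still worth doing.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not load into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_restore(scratch_dir, expected: dict) -> list:
    """Return a list of problems; an empty list means every file checked out."""
    problems = []
    for rel, digest in expected.items():
        p = Path(scratch_dir) / rel
        if not p.is_file():
            problems.append(f"missing: {rel}")
        elif p.stat().st_size == 0:
            problems.append(f"empty: {rel}")
        elif digest and sha256_of(p) != digest:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

A call might look like `check_restore("/tmp/restore-test", {"docs/taxes-2023.pdf": None})`, where both the path and the file name are placeholders.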
Evidence to record
- The per-tier retention counts (daily/weekly/monthly/yearly) and where they're configured.
- The split between source retention and offsite/replication retention.
- Projected and actual snapshot space usage vs free space on the pool/repo.
- The date and tier of the last successful older-restore-point test.
Common mistakes
- Using 'keep the last N snapshots' so a burst of frequent snapshots evicts months of history.
- Assuming snapshots are free — they grow with change rate and can fill a copy-on-write pool.
- Keeping the entire long ladder on the source's fast pool instead of the offsite copy.
- Never testing an older restore point, so a shallow-in-practice policy goes unnoticed.
Stop points
- Stop adding deeper tiers once projected snapshot usage would push a copy-on-write pool toward full.
- Stop trusting the policy until an older (not latest) restore point has actually been restored.
Last reviewed
2026-06-03
Source-backed checks
HomeTechOps turns official docs and conservative safety rules into a shorter runbook. These links are the source trail for the page direction.