Adopt pgBackRest GFS backup strategy
Context
The application uses PostgreSQL as its primary data store, with pgBackRest handling backup and point-in-time recovery (PITR) to Cloudflare R2. The initial configuration took daily full backups with 30-day retention. That gives adequate short-term recovery, but it stores a complete copy of the database every day and keeps nothing older than 30 days, so it falls short of common practice for long-term retention.
A proper backup strategy must balance:
- Recovery Point Objective (RPO): Maximum acceptable data loss
- Recovery Time Objective (RTO): Maximum acceptable recovery time
- Storage efficiency: Cost-effective use of backup storage
- Recovery chain reliability: Minimizing dependencies between backups
The Grandfather-Father-Son (GFS) pattern is a widely used way to balance these concerns: it layers backups of different frequencies, each with its own retention period.
Decision
Adopt a GFS-style backup strategy using pgBackRest's full, differential, and incremental backup types with tiered retention.
Backup Types
Full Backup (Weekly - Sunday 03:00 UTC)
- Complete copy of entire database
- Self-contained, restores independently
- Anchor point for all other backups that week
Differential Backup (Daily - Mon-Sat 03:00 UTC)
- All changes since last full backup
- Restores with: full + diff
- Grows larger through the week but limits restore chain to 2 backups
Incremental Backup (Every 6 hours - 09:00, 15:00, 21:00 UTC)
- Only changes since last backup (full, diff, or incr)
- Smallest size, provides frequent restore points
- Restores with: full + diff + incr chain
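The three backup types map directly onto the `--type` option of `pgbackrest backup`. A minimal sketch of a wrapper, assuming the stanza is named `main` as in the verification commands; the function name `run_backup` and the `PGBACKREST` override variable are hypothetical, introduced here only so the command can be dry-run:

```shell
# Hypothetical wrapper: run one pgBackRest backup of the requested type.
# PGBACKREST can be overridden (e.g. PGBACKREST=echo) to dry-run the command.
run_backup() {
  local type="$1"
  case "$type" in
    full|diff|incr) ;;
    *)
      echo "unknown backup type: $type (expected full, diff, or incr)" >&2
      return 2
      ;;
  esac
  "${PGBACKREST:-pgbackrest}" --stanza=main --type="$type" backup
}
```

Running `PGBACKREST=echo run_backup diff` prints `--stanza=main --type=diff backup`, the arguments that would be passed to pgBackRest.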
Retention Policy
repo1-retention-full=52 # 1 year of weekly fulls
repo1-retention-diff=7 # 7 daily diffs (current week)
repo1-retention-archive=1 # number of backups (of the type below) to keep continuous WAL for
repo1-retention-archive-type=diff
Weekly Schedule
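| Time (UTC) | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|---|---|---|---|---|---|---|---|
| 03:00 | Full | Diff | Diff | Diff | Diff | Diff | Diff |
| 09:00 | Incr | Incr | Incr | Incr | Incr | Incr | Incr |
| 15:00 | Incr | Incr | Incr | Incr | Incr | Incr | Incr |
| 21:00 | Incr | Incr | Incr | Incr | Incr | Incr | Incr |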
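The schedule reduces to a small lookup from weekday and hour to backup type. A sketch of that lookup, assuming a scheduler (cron, a systemd timer, or similar) fires at the four UTC slots; the function name `backup_type_for` is hypothetical:

```shell
# Hypothetical helper: map a UTC schedule slot to the backup type it should run.
#   $1 = ISO weekday (1=Mon ... 7=Sun, as printed by `date -u +%u`)
#   $2 = hour of day, 0-23 (as printed by `date -u +%H`)
# Prints full, diff, or incr; returns 1 if no backup is scheduled for that hour.
backup_type_for() {
  local dow="$1"
  local hour="${2#0}"      # strip one leading zero so "03" and "09" compare as 3 and 9
  hour="${hour:-0}"        # "0" stripped to empty means midnight
  case "$hour" in
    3)
      if [ "$dow" -eq 7 ]; then echo full; else echo diff; fi
      ;;
    9|15|21)
      echo incr
      ;;
    *)
      return 1
      ;;
  esac
}

# Possible use from a scheduled job (not executed here):
#   type=$(backup_type_for "$(date -u +%u)" "$(date -u +%H)") &&
#     pgbackrest --stanza=main --type="$type" backup
```

Keeping the mapping in one function means the scheduler only needs a single job definition that runs every six hours.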
Example Restore Scenarios
Restore Thursday 17:00:
- Sunday Full
- Thursday Diff
- Thursday 09:00 Incr
- Thursday 15:00 Incr (builds on the 09:00 Incr)
- WAL replay to 17:00
Restore Tuesday 10:00:
- Sunday Full
- Tuesday Diff
- Tuesday 09:00 Incr
- WAL replay to 10:00
Restore 3 months ago (any Sunday):
- That Sunday's Full backup
- No PITR available (weekly granularity only)
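In practice the operator names only the recovery target; pgBackRest resolves the chain of full, differential, and incremental backups from its manifests. A hedged sketch of the two kinds of restore described in these scenarios, assuming the `main` stanza, a stopped PostgreSQL instance, and an empty data directory (or the `--delta` option). The function names, the example timestamp, and the `PGBACKREST` override are hypothetical; depending on the pgBackRest version, `--set` may also be needed on a time-based restore to pick a backup taken before the target.

```shell
# Hypothetical helpers; PGBACKREST can be overridden (e.g. PGBACKREST=echo) to dry-run.

# Point-in-time restore: replay WAL up to the given timestamp, then promote.
#   $1 = recovery target, e.g. "2024-05-16 17:00:00+00" (example value only)
restore_to_time() {
  "${PGBACKREST:-pgbackrest}" --stanza=main --type=time --target="$1" \
    --target-action=promote restore
}

# Restore one specific backup (such as an old Sunday full) with no PITR:
# recover only until the backup is consistent, then promote.
#   $1 = backup label as listed by `pgbackrest info`
restore_backup_set() {
  "${PGBACKREST:-pgbackrest}" --stanza=main --set="$1" --type=immediate \
    --target-action=promote restore
}
```

The first function corresponds to the Thursday and Tuesday scenarios; the second corresponds to restoring a Sunday full from months ago, where no WAL replay beyond consistency is possible.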
Recovery Capability
| Time Range | Granularity | Method |
|---|---|---|
| Last 7 days | Any point in time | WAL PITR |
| Last 7 days | 6-hour checkpoints | Incremental backups |
| 8 days - 52 weeks | Weekly (Sundays only) | Full backups |
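Note: PITR reaches back only as far as continuous WAL is retained. repo1-retention-archive counts backups of the type set by repo1-retention-archive-type, so with a value of 1 and type=diff pgBackRest keeps continuous WAL from the most recent differential (or full) backup onward. WAL needed to make older backups consistent is still kept, but those backups cannot be replayed further forward. Covering the full 7-day PITR window shown in the table requires retaining WAL for all 7 differentials (repo1-retention-archive=7). Beyond 7 days, restore is limited to weekly full backup points.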
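The recovery windows in the table can be expressed as a simple classification by the age of the restore target. A sketch under the assumption that WAL is retained for the full 7 days and that 52 weekly fulls cover 364 days; the function name and the returned labels are hypothetical:

```shell
# Hypothetical helper: classify what kind of restore is available for a target
# that lies $1 whole days in the past, following the windows in the table.
recovery_granularity() {
  local age_days="$1"
  if [ "$age_days" -lt 0 ]; then
    echo "invalid"
    return 1
  elif [ "$age_days" -le 7 ]; then
    echo "pitr"          # any point in time via WAL replay
  elif [ "$age_days" -le 364 ]; then
    echo "weekly-full"   # nearest retained Sunday full backup only
  else
    echo "expired"       # older than the 52 retained weekly fulls
  fi
}
```

For example, `recovery_granularity 3` prints `pitr` and `recovery_granularity 90` prints `weekly-full`.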
Rationale
Why differential over pure incremental? With pure incrementals, a corrupted Monday backup would invalidate all subsequent backups that week. Differentials limit the blast radius - each day's backup only depends on Sunday's full.
Why weekly fulls instead of daily? Daily fulls with 52-week retention would require about 365 full backups. Weekly fulls reduce this to 52 while covering the same 52-week window (at weekly rather than daily granularity beyond the most recent week), saving roughly 85% of full-backup storage.
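The saving quoted above follows from the backup counts alone. A small sketch of the arithmetic; the function name is hypothetical, and it counts full backups only, ignoring the comparatively small differentials and incrementals:

```shell
# Percentage of full-backup copies avoided by keeping $2 fulls instead of $1.
# Integer arithmetic, so the result is truncated toward zero.
full_backup_savings_pct() {
  local baseline="$1" actual="$2"
  echo $(( (baseline - actual) * 100 / baseline ))
}
```

`full_backup_savings_pct 365 52` prints `85`, since (365 - 52) / 365 is about 0.857 and the integer division truncates.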
Why 6-hour incrementals? A six-hour interval balances restore chain length against backup frequency. Backing up more often than daily improves RPO, while backing up less often than hourly keeps the incremental chain short (at most three per day) and restore operations manageable.
Why R2 storage? Cloudflare R2 provides S3-compatible storage with free egress, which matters in disaster recovery scenarios that require large data transfers. The 10GB free tier covers projected backup storage until the database reaches roughly 1GB (see Storage Projections).
Consequences
Positive
- Industry-standard approach: Follows proven GFS methodology
- Storage efficient: ~430MB for 1 year of backups at current DB size (50MB)
- Reliable recovery: Limited backup chain dependencies
- Cost effective: Stays within R2 free tier until DB exceeds ~1GB
- Flexible recovery: Choose between speed (recent backup) or granularity (PITR)
Negative
- Increased complexity: Three backup types vs. single type
- Coarser recovery for old data: Only weekly restore points beyond 7 days
- WAL dependency: PITR requires continuous WAL archiving
Storage Projections
| DB Size | Compressed Size | Storage for 1 Year of Backups | Monthly Cost |
|---|---|---|---|
| 50MB | ~7MB | ~430MB | Free |
| 1GB | ~140MB | ~8GB | Free |
| 10GB | ~1.4GB | ~80GB | ~$1/mo |
| 100GB | ~14GB | ~800GB | ~$12/mo |
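The monthly cost column can be approximated from the stored volume. A sketch under stated assumptions that are not from this document: R2 standard storage billed at $0.015 per GB-month beyond the 10GB free tier, with per-operation charges ignored. The function name is hypothetical.

```shell
# Estimate the monthly R2 storage cost in USD for a given stored volume.
#   $1 = total backup storage in GB
# Assumptions (not from the original): $0.015 per GB-month after a 10 GB
# free tier; request (operation) charges are not counted.
r2_monthly_cost_usd() {
  awk -v stored_gb="$1" -v free_gb=10 -v usd_per_gb=0.015 'BEGIN {
    billable = stored_gb - free_gb
    if (billable < 0) billable = 0
    printf "%.2f\n", billable * usd_per_gb
  }'
}
```

With these assumptions, `r2_monthly_cost_usd 80` prints `1.05` and `r2_monthly_cost_usd 800` prints `11.85`, in line with the ~$1/mo and ~$12/mo rows of the table.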
Verification
The backup strategy can be verified with:
# Check backup status and retention
just backup-status
# List all backups
pgbackrest info --stanza=main
# Verify backup integrity (runs weekly)
pgbackrest verify --stanza=main
Related Decisions
- Infrastructure: pgBackRest configuration in infra/modules/platform/nixos/pgbackrest.nix
- Storage: Cloudflare R2