Deduplicated backup and restore

Back up directories to Backblaze B2 or S3 with content-addressed deduplication, so each backup uploads only what changed, and every restore is re-hashed and verified as it lands.

The problem

Routine backups are mostly redundant: day over day, only a small fraction of a directory changes, yet naive backup schemes re-upload everything or rely on fragile incremental chains that are painful to restore from. Worse, a backup you cannot verify is not really a backup — bit rot in cold object storage, a truncated upload, or silent corruption can leave you with an archive that fails exactly when you need it. Restores then become an act of faith.

Why snapdir

Backups are pushed to an object store, deduplicated by content, and verified on the way back:

  • Incremental by content, not by chain. Objects live at content-addressed keys, so a backup uploads only the objects whose bytes are new to the store. Identical files across days, or across different machines pushing to the same bucket, are stored once.

  • Every backup is self-contained. A snapshot is addressed by one ID and carries its full manifest; there is no incremental chain to walk or to break. Restore any backup directly.

  • Verified restore. On pull, every object is re-hashed against the manifest, so a successful restore shows that the restored bytes match what was backed up rather than assuming it.
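
As a rough illustration of why content-addressed keys deduplicate, the sketch below stores each file under the hash of its bytes and skips any key that already exists. It is not snapdir's implementation: the helper name store_objects, the local directory standing in for a bucket, and the use of SHA-256 (via sha256sum from GNU coreutils) are all assumptions made for the example.

```shell
# Hypothetical helper, not part of snapdir: copy every file under "$1" into
# the local directory "$2", using the SHA-256 of the file's bytes as its key.
# Prints one key per newly stored object; objects already present are skipped.
# Assumes sha256sum is installed and file names contain no newlines.
store_objects() {
  src=$1
  store=$2
  mkdir -p "$store"
  find "$src" -type f | while IFS= read -r file; do
    key=$(sha256sum "$file" | cut -d' ' -f1)
    if [ ! -e "$store/$key" ]; then
      cp "$file" "$store/$key"
      echo "$key"
    fi
  done
}

# Demo: two files with identical bytes plus one distinct file.
demo=$(mktemp -d)
mkdir -p "$demo/data"
printf 'alpha\n' > "$demo/data/a.txt"
printf 'alpha\n' > "$demo/data/copy-of-a.txt"
printf 'beta\n'  > "$demo/data/b.txt"

store_objects "$demo/data" "$demo/store" | wc -l   # first run stores 2 objects
store_objects "$demo/data" "$demo/store" | wc -l   # second run stores 0
```

The second run stores nothing because every key already exists, which is the same reason an unchanged file costs no upload on the next backup.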

Walkthrough

Back up a directory to a B2 bucket. The push command uploads only the objects not already present and prints the snapshot ID; record it as your restore point:

backup_id=$(snapdir push --store b2://acme-backups/home ./srv/data)
echo "$backup_id"
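
One possible way to record restore points is a small append-only log kept outside the backed-up directory. The helpers below are hypothetical (they are not snapdir commands), and the tab-separated file format is an assumption chosen for the example.

```shell
# Hypothetical helper: append "UTC timestamp <TAB> snapshot ID" to a log file.
record_restore_point() {
  id=$1
  log=$2
  printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$id" >> "$log"
}

# Hypothetical helper: print the most recently recorded snapshot ID.
latest_restore_point() {
  tail -n 1 "$1" | cut -f2
}
```

After a push, this might be called as record_restore_point "$backup_id" "$HOME/restore-points.tsv", and latest_restore_point "$HOME/restore-points.tsv" would then return the ID to pass to a later pull.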

Run it again tomorrow. Because the store is content-addressed, only the objects that changed since the last backup are uploaded; everything unchanged is skipped:

snapdir push --store b2://acme-backups/home ./srv/data

To restore, pull a chosen restore point into a fresh directory. Each object is fetched, re-hashed, and only then written into place:

snapdir pull --store b2://acme-backups/home --id "$backup_id" ./restore
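
The fetch, re-hash, then write order can be sketched as follows. This is a simplified model under stated assumptions, not snapdir's code: the helper name restore_verified is hypothetical, the store is a local directory of objects keyed by SHA-256, and the manifest is assumed to hold one "<hash> <relative path>" pair per line.

```shell
# Hypothetical helper, not part of snapdir: restore the files listed in a
# manifest ("$1") from a local object directory ("$2") into "$3".
# Each object is copied to a temporary file, re-hashed, and moved into place
# only if its hash matches the manifest. Returns non-zero on the first failure.
# Assumes sha256sum is installed and paths have no leading spaces or newlines.
restore_verified() {
  manifest=$1
  store=$2
  dest=$3
  mkdir -p "$dest"
  while read -r want path; do
    tmp=$(mktemp "$dest/.incoming.XXXXXX")
    if ! cp "$store/$want" "$tmp"; then
      rm -f "$tmp"
      echo "missing object for $path" >&2
      return 1
    fi
    got=$(sha256sum "$tmp" | cut -d' ' -f1)
    if [ "$got" != "$want" ]; then
      rm -f "$tmp"
      echo "hash mismatch for $path" >&2
      return 1
    fi
    mkdir -p "$(dirname "$dest/$path")"
    mv "$tmp" "$dest/$path"
  done < "$manifest"
}
```

Because the move happens only after the comparison, a corrupted or truncated object never appears at its final path; the restore stops with an error instead.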

Before relying on a cold backup, verify it end-to-end against the store without restoring it first:

snapdir verify --store b2://acme-backups/home --id "$backup_id"

The same commands work against s3:// — swap the store URI and nothing else changes. See Stores for backend authentication.

Outcome

Backups cost only the bytes that actually changed, deduplicated across days and across machines, while each one remains an independent, directly restorable snapshot rather than a fragile incremental chain. Restores are verified by construction, and a cold archive can be checked at any time with snapdir verify — so you find corruption on a routine check, not during a real recovery. Past restore points stay queryable via snapdir revisions.