Cross-cloud replication

Replicate a snapshot directly between S3, Google Cloud Storage, and Backblaze B2 with snapdir sync — streaming store-to-store, skipping objects the destination already has, never staging the data on your own disk.

The problem

Keeping a copy of critical data in a second cloud — for disaster recovery, to avoid lock-in, or to satisfy a data-residency requirement — usually means downloading everything from one provider and re-uploading it to another. That burns egress on the round trip, needs scratch space large enough to hold the whole dataset, and re-transfers data that is already present at the destination from a previous run. And once the copy exists, you still have to prove it is faithful to the source.

Why snapdir

snapdir sync copies a snapshot — its manifest and every object it references — directly from one store to another, streaming through memory with no local staging:

No round trip to disk. sync pipes the snapshot straight from the source store to the destination, rather than running a fetch followed by a push, which would pull every object into your local cache first.
Skips what's already there. The transfer is content-addressed, so objects already present at the destination are skipped — re-running a sync after a small change copies only the new objects, which makes it cheap as a recurring replication step.
Split manifest and object pools. When manifests live in one location but content objects live in a shared pool, --from-objects and --to-objects let each side name that pool explicitly while --from and --to stay focused on the manifest locations.
Faithful by construction. Both endpoints address objects by content, so the replicated snapshot carries the same ID and the same integrity guarantee as the source.
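
The skip behaviour can be pictured as a set difference over content hashes. The following is a conceptual sketch only, not snapdir's actual implementation: missing_objects is a hypothetical helper, and its two inputs are assumed to be plain-text listings with one object hash per line.

```shell
# Conceptual sketch: print the object hashes that appear in the source
# listing but not in the destination listing. Only these would need copying.
# Both listing files are hypothetical (one content hash per line).
missing_objects() {
  src_list=$1
  dst_list=$2
  src_sorted=$(mktemp)
  dst_sorted=$(mktemp)
  # comm requires sorted input; -u also drops duplicate hashes.
  sort -u "$src_list" > "$src_sorted"
  sort -u "$dst_list" > "$dst_sorted"
  # -23 suppresses lines unique to the destination and lines common to both,
  # leaving only the hashes the destination is missing.
  comm -23 "$src_sorted" "$dst_sorted"
  rm -f "$src_sorted" "$dst_sorted"
}
```

Because an object's name is derived from its content, an identical hash at the destination means identical bytes, so a re-run after a small change has only the new hashes left to copy.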

Walkthrough

Replicate a snapshot from a primary S3 bucket to a GCS disaster-recovery bucket. sync needs the snapshot --id and both endpoints (--from and --to), and the two endpoints must differ:

snapdir sync \
  --id "$id" \
  --from s3://primary-bucket/snapshots \
  --to   gs://dr-bucket/snapshots

Preview a cross-cloud copy before committing to it with --dryrun, then run it rate-limited so it does not saturate the link:

snapdir sync --id "$id" --from s3://primary/snap --to b2://dr/snap --dryrun
snapdir sync --id "$id" --from s3://primary/snap --to b2://dr/snap --limit-rate 50M
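
The two steps can be chained so that the real transfer only starts if the preview succeeds. This is a minimal sketch: preview_then_sync is a hypothetical helper name, it uses only the flags shown above, and it assumes snapdir exits with a non-zero status when the dry run fails.

```shell
# Sketch: preview a sync with --dryrun, then run it for real, rate-limited.
# Assumption: snapdir returns a non-zero exit status on failure.
preview_then_sync() {
  id=$1
  from=$2
  to=$3
  rate=$4
  # Stop here if the preview fails, so a bad endpoint never starts a transfer.
  snapdir sync --id "$id" --from "$from" --to "$to" --dryrun || return 1
  # Same snapshot and endpoints, now with a bandwidth cap.
  snapdir sync --id "$id" --from "$from" --to "$to" --limit-rate "$rate"
}
```

It would be called as: preview_then_sync "$id" s3://primary/snap b2://dr/snap 50M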

If either side uses a split store, keep manifests and object pools separate on the command line. The manifest is copied between --from and --to; objects are read from and written to the per-side pools:

snapdir sync \
  --id "$id" \
  --from s3://primary/manifests \
  --from-objects s3://primary/object-pool \
  --to gs://dr/manifests \
  --to-objects gs://dr/object-pool
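
When only one side is split, a small wrapper can add the object-pool flags conditionally. This is a sketch under assumptions: sync_split is a hypothetical helper, and it assumes a side without a separate pool simply omits its --from-objects or --to-objects flag.

```shell
# Sketch: build the sync command line, adding object-pool flags only for
# the sides that actually have a separate pool.
# Arguments: id, from, to, then optional from-objects and to-objects.
sync_split() {
  id=$1
  from=$2
  to=$3
  from_objects=${4:-}
  to_objects=${5:-}
  # Reuse the positional parameters as a POSIX-safe argument list.
  set -- sync --id "$id" --from "$from"
  if [ -n "$from_objects" ]; then
    set -- "$@" --from-objects "$from_objects"
  fi
  set -- "$@" --to "$to"
  if [ -n "$to_objects" ]; then
    set -- "$@" --to-objects "$to_objects"
  fi
  snapdir "$@"
}
```

Passing an empty string as the fourth argument leaves the source side unsplit while still naming a destination pool in the fifth.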

Confirm the replica is complete by verifying the snapshot against the destination store — every object is re-hashed against the manifest:

snapdir verify --store gs://dr-bucket/snapshots --id "$id"
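
For a recurring job, the copy and the check can be combined so that a failed verification is reported separately from a failed transfer. This is a minimal sketch: replicate_and_verify is a hypothetical helper, it uses only the sync and verify invocations shown above, and it assumes both commands signal failure with a non-zero exit status.

```shell
# Sketch: replicate one snapshot, then verify it at the destination.
# Returns 1 if the sync fails, 2 if the verify fails, 0 otherwise.
# Assumption: snapdir returns a non-zero exit status on failure.
replicate_and_verify() {
  id=$1
  from=$2
  to=$3
  if ! snapdir sync --id "$id" --from "$from" --to "$to"; then
    echo "sync failed for snapshot $id" >&2
    return 1
  fi
  # Verify against the destination store, not the source.
  if ! snapdir verify --store "$to" --id "$id"; then
    echo "verify failed for snapshot $id at $to" >&2
    return 2
  fi
  echo "snapshot $id replicated and verified at $to"
}
```

It would be called as: replicate_and_verify "$id" s3://primary-bucket/snapshots gs://dr-bucket/snapshots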

Both endpoints must be in-process stores (file://, s3://, gs://, b2://); to replicate through a custom backend, fall back to a fetch followed by a push.

Outcome

A second-cloud copy is maintained with one command, no scratch disk, and minimal egress: each run copies only the objects the destination is missing, so steady-state replication transfers little beyond what changed since the previous run. The replica is provably faithful (same content, same ID, re-verifiable on demand), giving you a real cross-cloud DR posture without vendor lock-in. To find out which snapshots and locations are available to replicate, use snapdir locations.