Syncing

snapdir sync copies a snapshot — its manifest and every object it references — directly between two stores, streaming through memory with no local staging. This is how you replicate a snapshot from one bucket or region to another (or between clouds) without ever materializing it on your own disk.

It is the right tool when the data does not need to land locally: a normal fetch + push would pull every object into your cache and push it back out, whereas sync pipes the snapshot straight from the source store to the destination store.

Replicate a snapshot between stores

sync requires a snapshot --id and both endpoints, --from and --to:

snapdir sync \
  --id "$id" \
  --from s3://primary-bucket/snapshots \
  --to   gs://dr-bucket/snapshots

When omitted, --from defaults to $SNAPDIR_STORE, so you can set the source store once via the SNAPDIR_STORE environment variable and only pass --to on each sync. --to is always explicit — a sync needs two distinct stores, so the destination is never inferred.
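The default described above can be sketched as a small wrapper: export the source once, then name only the destination on each run. The function name replicate_to is hypothetical and not part of snapdir. The sketch assumes snapdir is on PATH and uses only the flags documented on this page.

```shell
# Hypothetical helper (not part of snapdir): replicate one snapshot from the
# store named by $SNAPDIR_STORE to an explicit destination.
#   $1 = snapshot ID, $2 = destination store URI
replicate_to() {
  # Refuse to run when the default source is unset or empty.
  if [ -z "${SNAPDIR_STORE:-}" ]; then
    echo "SNAPDIR_STORE is not set; export the source store first" >&2
    return 1
  fi
  # --from is omitted on purpose, so it defaults to $SNAPDIR_STORE.
  snapdir sync --id "$1" --to "$2"
}

# Usage:
#   export SNAPDIR_STORE=s3://primary-bucket/snapshots
#   replicate_to "$id" gs://dr-bucket/snapshots
```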

The source and destination must differ. The transfer is content-addressed, so objects already present at the destination are skipped — re-running a sync after a small change only copies the new objects, which makes sync cheap to run repeatedly as a replication step.
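Because a repeated sync only copies objects the destination lacks, a periodic replication step can simply re-run sync for every snapshot ID it knows about. A minimal sketch follows, assuming the IDs arrive one per line on stdin. The function name replicate_ids and the way the IDs are gathered are hypothetical.

```shell
# Hypothetical helper (not part of snapdir): sync every snapshot ID read from
# stdin, one per line, and report failure if any single sync failed.
#   $1 = source store URI, $2 = destination store URI
replicate_ids() {
  src="$1"
  dst="$2"
  failed=0
  # The two stores must differ, as stated above.
  if [ "$src" = "$dst" ]; then
    echo "source and destination must differ" >&2
    return 1
  fi
  while IFS= read -r snapshot_id; do
    # Skip blank lines in the ID list.
    [ -n "$snapshot_id" ] || continue
    # </dev/null keeps the child process from consuming the remaining IDs.
    if ! snapdir sync --id "$snapshot_id" --from "$src" --to "$dst" </dev/null; then
      echo "sync failed for $snapshot_id" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   replicate_ids s3://primary-bucket/snapshots gs://dr-bucket/snapshots < ids.txt
```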

Sync split object pools

sync has its own per-side split-store flags:

  • --from-objects <URI> reads source objects from that object pool while source manifests come from --from.

  • --to-objects <URI> writes destination objects to that object pool while destination manifests go to --to.

These flags are distinct from the global --objects-store / SNAPDIR_OBJECTS_STORE used by push, fetch, and pull. For sync, name each object pool explicitly on the side that is split; a side that omits its --*-objects flag is treated as a normal colocated store. Passing the global --objects-store to sync does not stand in for --from-objects or --to-objects.

snapdir sync \
  --id "$id" \
  --from s3://inventory/manifests/host-a \
  --from-objects s3://inventory/source-object-pool \
  --to b2://dr/manifests/host-a \
  --to-objects b2://dr/object-pool

The source and destination can use different object pools. Objects are read from the source pool, written to the destination pool, and skipped when the destination pool already has that content-addressed object. The manifest is still published to the destination --to location last.
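Since each side is split independently, a wrapper can add --from-objects or --to-objects only for the side that actually has a separate pool. The sketch below builds the argument list that way. The function name sync_split and its positional-argument convention are hypothetical.

```shell
# Hypothetical helper (not part of snapdir): pass a per-side object-pool flag
# only when that side is split; an empty pool argument means "colocated".
#   $1 = snapshot ID
#   $2 = source manifests URI       $3 = source object pool URI or ""
#   $4 = destination manifests URI  $5 = destination object pool URI or ""
sync_split() {
  id="$1"
  from="$2"
  from_objects="$3"
  to="$4"
  to_objects="$5"
  # Start with the flags every sync needs, then append the optional ones.
  set -- sync --id "$id" --from "$from" --to "$to"
  if [ -n "$from_objects" ]; then
    set -- "$@" --from-objects "$from_objects"
  fi
  if [ -n "$to_objects" ]; then
    set -- "$@" --to-objects "$to_objects"
  fi
  snapdir "$@"
}

# Usage: colocated source, split destination.
#   sync_split "$id" s3://inventory/host-a "" b2://dr/manifests/host-a b2://dr/object-pool
```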

Both endpoints must be in-process stores (file://, s3://, gs://, b2://), and the same restriction applies to --from-objects and --to-objects. External snapdir-*-store URLs, including the first-party ssh:// and sftp:// stores, have no in-process streaming surface and are rejected by sync. To replicate through such a backend, fall back to a fetch followed by a push. See Stores for the backend matrix and authentication.
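A script that assembles sync commands can check the scheme restriction up front rather than waiting for sync to reject a URI. The sketch below mirrors the four schemes listed above. Both function names are hypothetical, and snapdir itself remains the authority on what it accepts.

```shell
# Hypothetical pre-flight check (not part of snapdir): succeed only for the
# in-process schemes named in this section.
is_in_process_store() {
  case "$1" in
    file://*|s3://*|gs://*|b2://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check every URI that will be handed to sync, including any object pools.
check_sync_endpoints() {
  for uri in "$@"; do
    if ! is_in_process_store "$uri"; then
      echo "not an in-process store, sync would reject it: $uri" >&2
      return 1
    fi
  done
}

# Usage:
#   check_sync_endpoints "$from" "$to" && snapdir sync --id "$id" --from "$from" --to "$to"
```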

Mirror a local destination store

sync --delete keeps the destination store's manifest set aligned with the source. It copies the requested snapshot first, then removes destination manifests that are not present in the source. It does not delete objects, so unreferenced object bytes are left for a separate garbage-collection policy.

snapdir sync \
  --id "$id" \
  --from gs://feature-snapshots \
  --to file:///srv/snapdir/features \
  --delete

Mirror deletion is intentionally local-only: --to must be a file:// store. Object stores such as s3://, gs://, and b2://, and external stores such as ssh://, are refused for sync --delete. Add --dryrun to preview the manifest prune count before changing the destination.
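One cautious way to use the mirror is to preview by default and prune only on request. The sketch below does that and also checks the file:// requirement before calling snapdir. The function name mirror_to_local and its optional fourth argument "apply" are hypothetical.

```shell
# Hypothetical helper (not part of snapdir): preview a mirror with --dryrun
# unless the caller passes "apply" as the fourth argument.
#   $1 = snapshot ID, $2 = source store URI, $3 = destination (file:// only)
mirror_to_local() {
  # sync --delete is refused for anything but a file:// destination.
  case "$3" in
    file://*) ;;
    *)
      echo "sync --delete needs a file:// destination, got: $3" >&2
      return 1
      ;;
  esac
  if [ "${4:-}" = "apply" ]; then
    # Real run: copy the snapshot, then prune destination manifests.
    snapdir sync --id "$1" --from "$2" --to "$3" --delete
  else
    # Preview only: nothing at the destination is changed.
    snapdir sync --id "$1" --from "$2" --to "$3" --delete --dryrun
  fi
}

# Usage:
#   mirror_to_local "$id" gs://feature-snapshots file:///srv/snapdir/features
#   mirror_to_local "$id" gs://feature-snapshots file:///srv/snapdir/features apply
```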

Tuning and dry runs

sync honors the shared transfer-tuning flags, applied to the single store-to-store pipe:

  • -j, --jobs <N>: number of concurrent object transfers. 0 or auto uses the CPU count, capped at 16.

  • --limit-rate <RATE>: cap aggregate bandwidth, e.g. 50M, 512K, 1G.

  • --adaptive[=<FRACTION>]: adaptively tune concurrency and bandwidth toward a fraction (default 0.8) of measured capacity, backing off under contention.

  • --dryrun: report what would be copied without writing anything to the destination. Useful for confirming a replication plan before it runs.

# Preview a cross-cloud replication, then run it rate-limited.
snapdir sync --id "$id" --from s3://primary/snap --to b2://dr/snap --dryrun
snapdir sync --id "$id" --from s3://primary/snap --to b2://dr/snap --limit-rate 50M

On a real sync the snapshot ID is printed to stdout and a human-readable summary goes to stderr, consistent with push and stage. The summary counts unique objects actually copied, not file references in the manifest. Content that a manifest references more than once is copied and counted once, and objects the destination pool already holds are not counted at all, so a sync that only publishes a missing manifest can correctly report 0 copied.
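Because the ID and the summary travel on different streams, a script can capture the ID for later steps while keeping the summary in a log. A minimal sketch follows. The function name sync_and_capture and the log-path argument are hypothetical, and nothing here assumes a particular summary format.

```shell
# Hypothetical helper (not part of snapdir): run a sync, keep the snapshot ID
# from stdout, and send the stderr summary to a log file.
#   $1 = snapshot ID, $2 = source store URI, $3 = destination store URI
#   $4 = path of the file that receives the summary
sync_and_capture() {
  # Command substitution captures stdout only; 2> diverts the summary.
  synced_id=$(snapdir sync --id "$1" --from "$2" --to "$3" 2>"$4") || return 1
  printf '%s\n' "$synced_id"
}

# Usage:
#   replicated=$(sync_and_capture "$id" s3://primary/snap b2://dr/snap sync-summary.log)
```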

Where to go next