Stores and cache

A store is any backend that holds snapdir's content-addressed objects and manifests. The local cache is a special store on your own disk that sits in front of the remote ones. Both use the exact same on-disk layout, which is why a cache populated by one snapdir and a bucket written by another remain mutually readable. In the usual colocated shape, one store holds both manifests and objects. In a split shape, --store names the manifest-side store and --objects-store names a shared object pool.

Store backends

Snapdir routes a location to a backend purely by its URL scheme:

Scheme    Backend
file://   A local (or mounted) directory. Also the shape of the cache.
s3://     Amazon S3 and S3-compatible object storage.
gs://     Google Cloud Storage.
b2://     Backblaze B2.
ssh://    Any host with SSH shell access, via the system OpenSSH client.
sftp://   Any SFTP server, incl. restricted/chroot accounts.

s3:// and b2:// are served by native, built-in adapters. gs:// maps to the GCS adapter (the oracle's hardcoded gs → gcs special case). Any other scheme xyz:// is dispatched to an external shim named snapdir-xyz-store found on your PATH, so you can add a backend without modifying snapdir itself. A webdav:// URL, for example, would call snapdir-webdav-store. The ssh:// and sftp:// schemes in the table above ship as two such first-party shims, turning any SSH-reachable host into a store by driving the system OpenSSH client; see SSH and SFTP stores.
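The routing rule can be sketched in a few lines of Python. This is an illustration only, not snapdir's own code: resolve_backend and the BUILTIN table are hypothetical names, and treating file:// as a built-in adapter is an assumption. Only the gs to gcs mapping and the snapdir-<scheme>-store naming come from the text above.

```python
from urllib.parse import urlsplit

# Schemes assumed to be handled inside snapdir itself (illustrative table).
# "gs" maps to the adapter named "gcs", the one hardcoded special case.
BUILTIN = {"file": "file", "s3": "s3", "b2": "b2", "gs": "gcs"}


def resolve_backend(location: str) -> str:
    """Return a built-in adapter label or the name of the external shim."""
    scheme = urlsplit(location).scheme.lower()
    if not scheme:
        raise ValueError(f"store location has no URL scheme: {location!r}")
    if scheme in BUILTIN:
        return f"builtin:{BUILTIN[scheme]}"
    # Any other scheme is dispatched to a shim that must be found on PATH.
    return f"snapdir-{scheme}-store"


if __name__ == "__main__":
    for url in ("s3://my-bucket/snapshots", "gs://bucket/prefix",
                "webdav://host/dav", "ssh://host/srv/snapdir"):
        print(url, "->", resolve_backend(url))
```

Under this sketch ssh:// and sftp:// take the same shim path as any third-party scheme, which matches their description as first-party shims.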

A store is selected per command with --store, e.g.:

snapdir push --store s3://my-bucket/snapshots ./my-dir
snapdir pull --store file:///srv/snapdir <snapshot-id> ./restore-here

See the stores guide and snapdir locations for configuring named stores.

Each cloud backend also has per-provider request and bandwidth limits that snapdir paces against; see Storage provider limits for the defaults and how to override them.

The on-disk (and in-bucket) layout

Every store — local cache included — keys both objects and manifests on their lowercase hex digest and shards them across three directory levels so no single directory accumulates millions of entries:

.objects/<h[0:3]>/<h[3:6]>/<h[6:9]>/<h[9:]>
.manifests/<id[0:3]>/<id[3:6]>/<id[6:9]>/<id[9:]>
  • For an object, h is the file's content checksum.
  • For a manifest, id is its snapshot ID.

For checksum/id 49dc870df1de7fd60794cebce449f5ccdae575affaa67a24b62acb03e039db92:

.objects/49d/c87/0df/1de7fd60794cebce449f5ccdae575affaa67a24b62acb03e039db92
.manifests/49d/c87/0df/1de7fd60794cebce449f5ccdae575affaa67a24b62acb03e039db92

Because the address determines the path, every store is content-addressed and naturally deduplicated: an object can exist at exactly one location, and a second write of the same content is a no-op.
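As a minimal sketch of the sharding rule, the following Python function derives the store-relative path from a digest. The function name sharded_path is hypothetical, and lowercasing the input before slicing is an assumption made for the example.

```python
from pathlib import PurePosixPath


def sharded_path(tree: str, digest: str) -> PurePosixPath:
    """Build the store-relative path for a digest under .objects or .manifests."""
    if tree not in (".objects", ".manifests"):
        raise ValueError(f"unknown tree: {tree}")
    h = digest.lower()  # assumption: normalise to lowercase hex first
    if len(h) <= 9 or any(c not in "0123456789abcdef" for c in h):
        raise ValueError("digest must be a hex string longer than 9 characters")
    # Three 3-character shard levels, then the remainder as the file name.
    return PurePosixPath(tree, h[0:3], h[3:6], h[6:9], h[9:])


if __name__ == "__main__":
    d = "49dc870df1de7fd60794cebce449f5ccdae575affaa67a24b62acb03e039db92"
    print(sharded_path(".objects", d))
    print(sharded_path(".manifests", d))
```

Because the path is a pure function of the digest, two writers that hold the same content always target the same location, which is what makes the second write redundant.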

Split stores and shared pools

--objects-store <URI> (or SNAPDIR_OBJECTS_STORE) splits a transfer into two store roots:

  • manifests are read from or written to --store under .manifests/;
  • content objects are read from or written to the object pool under .objects/.

That lets many manifest locations share one object pool. For example, a daily inventory can write each run to a date-specific manifest prefix while all runs reuse the same content-addressed object pool. Unchanged objects are skipped, and the snapshot ID remains the hash of the manifest content, not of the store layout that happens to hold it.

When a snapshot was pushed this way, readers need both halves: fetch and pull use the manifest-side --store to find the manifest and --objects-store to find the objects it names. Without --objects-store, the store is colocated and both trees live below the same URI.
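The colocated and split shapes differ only in which root each tree hangs from. The sketch below makes that explicit; resolve_roots and StoreRoots are hypothetical names, and the bucket URIs in the demo are invented for illustration.

```python
from typing import NamedTuple, Optional


class StoreRoots(NamedTuple):
    manifests: str  # location of the .manifests/ tree
    objects: str    # location of the .objects/ tree


def resolve_roots(store: str, objects_store: Optional[str] = None) -> StoreRoots:
    """Colocated when objects_store is omitted, split otherwise."""
    store = store.rstrip("/")
    pool = objects_store.rstrip("/") if objects_store else store
    return StoreRoots(manifests=f"{store}/.manifests", objects=f"{pool}/.objects")


if __name__ == "__main__":
    # Two daily runs with date-specific manifest prefixes and one shared pool.
    for day in ("2024-01-01", "2024-01-02"):
        print(resolve_roots(f"s3://inventory/{day}", "s3://inventory-pool"))
    # Colocated: both trees live below the same URI.
    print(resolve_roots("file:///srv/snapdir"))
```

A reader that omits the pool in the split case would look for objects under the manifest-side root and find none, which is why both halves are needed.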

For store-to-store replication, snapdir sync uses explicit per-side pool flags, --from-objects and --to-objects. Those flags are not aliases for the global --objects-store: each sync side is either split by its own --*-objects flag or colocated when that flag is omitted.

The local cache

The local cache is a file://-shaped store rooted at ${XDG_CACHE_HOME:-$HOME/.cache}/snapdir/, using the same .objects/.manifests sharded layout as every remote. It serves two purposes:

  • A staging area: snapdir writes objects and manifests to the cache first, then transfers them to a remote store.

  • A read-through accelerator: content already present in the cache does not need to be re-fetched from a remote.

You can inspect cache integrity with snapdir verify-cache and reclaim space with snapdir flush-cache.

Copy-on-write local copies

For local file stores and the local cache, snapdir opportunistically uses copy-on-write clones when the source and destination are on a compatible filesystem. macOS uses APFS clonefile(2), and Linux uses FICLONE reflinks on filesystems that support them. These clones can make stage, push, fetch, and checkout much faster and avoid allocating another full copy of large objects.

The optimization is transparent: unsupported platforms, non-CoW filesystems, and cross-device copies fall back to normal fs::copy. Set SNAPDIR_CLONEFILE=0 to disable the fast path, or SNAPDIR_VERIFY_COPIES=1 to force the strict write-time re-hash even after a clone succeeds. In all modes, object bytes, manifest bytes, and snapshot IDs are byte-identical.
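The clone-then-fall-back pattern can be approximated in Python as follows. This is a sketch under stated assumptions, not snapdir's implementation: copy_object is a hypothetical name, only the Linux FICLONE path is attempted (macOS clonefile(2) is not covered), and the SNAPDIR_VERIFY_COPIES re-hash is left out.

```python
import os
import shutil
import sys
import tempfile

try:
    import fcntl  # POSIX only
except ImportError:
    fcntl = None

FICLONE = 0x40049409  # Linux ioctl request for a whole-file reflink


def copy_object(src: str, dst: str) -> str:
    """Copy src to dst and return "clone" or "copy" for the path taken."""
    clone_allowed = os.environ.get("SNAPDIR_CLONEFILE", "1") != "0"
    if clone_allowed and fcntl is not None and sys.platform.startswith("linux"):
        try:
            with open(src, "rb") as s, open(dst, "wb") as d:
                fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "clone"
        except OSError:
            pass  # non-CoW filesystem, cross-device copy, etc.: fall through
    shutil.copyfile(src, dst)  # ordinary byte copy
    return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        with open(src, "wb") as f:
            f.write(b"snapdir" * 1000)
        print(copy_object(src, os.path.join(tmp, "dst.bin")))
```

Either branch leaves identical bytes at the destination; the only difference is whether the filesystem shares the underlying blocks.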

For read-only local views, checkout and pull can also use --linked. In linked mode the destination files are symlinks into local content-addressed objects, and those objects are hardened to mode 0444. This does not copy file bytes into the destination, and writing through the symlink fails instead of mutating the shared object. Linked mode requires local objects; remote object stores are refused.
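A rough picture of what a linked view amounts to on disk, assuming a POSIX filesystem: the object is made read-only and the destination is a symlink to it. The helper name link_into is hypothetical, and the real command may order or guard these steps differently.

```python
import os
import stat
import tempfile


def link_into(object_path: str, dest_path: str) -> None:
    """Expose a local object at dest_path without copying its bytes."""
    # Harden the shared object so writes through the link are refused
    # for ordinary (non-root) users.
    os.chmod(object_path, 0o444)
    os.makedirs(os.path.dirname(os.path.abspath(dest_path)), exist_ok=True)
    os.symlink(os.path.abspath(object_path), dest_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        obj = os.path.join(tmp, "object")
        with open(obj, "wb") as f:
            f.write(b"shared bytes")
        view = os.path.join(tmp, "checkout", "file.txt")
        link_into(obj, view)
        print(os.path.islink(view), oct(stat.S_IMODE(os.stat(obj).st_mode)))
```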

Push and fetch discipline

Transfers between the cache and a remote store follow a strict ordering so a store is never left referencing data it does not hold:

  • Push — confirm the manifest is not already present, push objects before the manifest, and only push objects that are absent (skip-if-present). The manifest, written last, always points at objects that already exist.

  • Fetch — download to a temporary path, verify the BLAKE3 checksum, retry on mismatch, then atomically rename into place. A half-written or corrupted object never becomes visible at its address.
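The push ordering can be modelled with a toy in-memory store. MemoryStore and push are hypothetical names, and the store is just two dictionaries; the point of the sketch is the order of operations, not the transport.

```python
from typing import Dict, Iterable, Tuple


class MemoryStore:
    """Toy stand-in for a remote store: two dicts keyed by hex digest."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.manifests: Dict[str, bytes] = {}


def push(store: MemoryStore, snapshot_id: str, manifest: bytes,
         objects: Iterable[Tuple[str, bytes]]) -> int:
    """Push one snapshot and return how many objects were uploaded."""
    # 1. Nothing to do if the manifest is already present.
    if snapshot_id in store.manifests:
        return 0
    uploaded = 0
    # 2. Objects first, skipping any the store already holds.
    for digest, data in objects:
        if digest not in store.objects:
            store.objects[digest] = data
            uploaded += 1
    # 3. Manifest last, so it never names an object that is missing.
    store.manifests[snapshot_id] = manifest
    return uploaded


if __name__ == "__main__":
    s = MemoryStore()
    print(push(s, "snap1", b"m1", [("aa", b"1"), ("bb", b"2")]))
    print(push(s, "snap1", b"m1", [("aa", b"1"), ("bb", b"2")]))
```

If the loop is interrupted, the store holds some extra objects but no manifest, so a retry simply skips what already arrived and finishes the rest.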

This ordering, combined with content addressing, is what lets pushes and fetches be safely retried and resumed. See Integrity for the verification details.
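The fetch side can be sketched the same way: download to a temporary path, verify, retry on mismatch, and rename into place only once the bytes check out. Note the loud assumption here: Python's standard library has no BLAKE3, so SHA-256 stands in for the checksum. fetch_object, file_digest, and the download callback are hypothetical names.

```python
import hashlib
import os
import tempfile
from typing import Callable


def file_digest(path: str) -> str:
    """Hex digest of a file (SHA-256 as a stand-in for BLAKE3)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_object(download: Callable[[str], None], expected_hex: str,
                 final_path: str, attempts: int = 3) -> None:
    """Download to a temp path, verify, then atomically publish the object."""
    directory = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(directory, exist_ok=True)
    for _ in range(attempts):
        # Temp file in the target directory keeps the rename on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=directory, prefix=".partial-")
        os.close(fd)
        try:
            download(tmp)
            if file_digest(tmp) == expected_hex:
                os.replace(tmp, final_path)  # atomic rename into place
                return
        finally:
            # Remove the temp file after a mismatch or a failed download.
            if os.path.exists(tmp):
                os.unlink(tmp)
    raise OSError(f"checksum mismatch after {attempts} attempts: {final_path}")
```

Since final_path only ever receives a verified file through a rename, a reader sees either the complete object or nothing at that address.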

Where to go next