Use cases

snapdir turns any directory into a verifiable, content-addressed snapshot: a deterministic ID derived from a BLAKE3 manifest, objects deduplicated in a local cache, and a snapshot you can push to, or sync between, file://, s3://, gs://, and b2:// stores. Because identity comes from content, identical data is stored only once, and every pull is re-hashed and verified as it lands.
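The content-addressing idea in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not snapdir's implementation: the manifest layout (one "<hash> <path>" line per file) and every function name here are hypothetical, and hashlib.blake2b stands in for BLAKE3, which the Python standard library does not provide. It shows three properties: a directory ID that depends only on paths and bytes, objects stored once under their own hash, and re-hashing an object to verify it.

```python
import hashlib
import os
import shutil
import tempfile


def file_hash(path):
    """Hash a file's bytes. blake2b is a stand-in for BLAKE3 (not in the stdlib)."""
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def list_files(root):
    """Yield (relative_path, absolute_path) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            yield rel, full


def build_manifest(root):
    """Hypothetical manifest: '<hash> <path>' per file, sorted by path."""
    entries = sorted(list_files(root))
    return "".join(f"{file_hash(full)} {rel}\n" for rel, full in entries)


def snapshot_id(root):
    """The ID depends only on relative paths and contents, not mtimes or location."""
    manifest = build_manifest(root).encode("utf-8")
    return hashlib.blake2b(manifest, digest_size=32).hexdigest()


def store_objects(root, cache):
    """Copy each file to cache/<hash> unless that object already exists.

    Returns the number of new objects written (deduplication in action).
    """
    os.makedirs(cache, exist_ok=True)
    written = 0
    for _rel, full in list_files(root):
        dest = os.path.join(cache, file_hash(full))
        if not os.path.exists(dest):
            shutil.copyfile(full, dest)
            written += 1
    return written


def verify_object(cache, digest):
    """Re-hash a cached object and check that it still matches its name."""
    return file_hash(os.path.join(cache, digest)) == digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b, \
            tempfile.TemporaryDirectory() as cache:
        for d in (a, b):
            os.makedirs(os.path.join(d, "sub"))
            for parts in (("x.txt",), ("sub", "y.txt")):
                with open(os.path.join(d, *parts), "w") as f:
                    f.write("hello\n")
        print(snapshot_id(a) == snapshot_id(b))  # True: same paths, same bytes
        print(store_objects(a, cache))  # 1: both files share one object
        print(store_objects(b, cache))  # 0: nothing new to store
        print(verify_object(cache, file_hash(os.path.join(a, "x.txt"))))  # True
```

Because the ID is computed from the manifest text alone, copying a directory to another location (which changes mtimes) leaves its ID unchanged, while changing a single byte in any file produces a different ID.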

These scenarios show how that single guarantee plays out in day-to-day work, from versioning datasets and proving reproducible builds to deduplicated backups, cross-cloud replication, and auditable chains of custody. Each article frames a concrete problem and walks through the snapdir workflow that solves it.

  • ML dataset versioning — pin training datasets and model artifacts to a content-addressed ID so every run trains on byte-identical data, uploaded only once.

  • Reproducible builds — prove a build is byte-for-byte reproducible by comparing output snapshot IDs, and archive each release under a verifiable ID.

  • Deduplicated backup and restore — back up directories to B2 or S3, uploading only what changed, with every restore re-hashed and verified.

  • Cross-cloud replication — replicate a snapshot directly between S3, GCS, and B2 with snapdir sync, never staging the data on your own disk.

  • CI artifact caching — key a CI cache on snapdir id so a cache hit means the inputs are provably identical, not just similarly named.

  • Infrastructure config snapshots — snapshot /etc, Terraform state, and config to a store, using revisions and ancestors as a verifiable audit trail.

  • Content distribution integrity — publish assets to multiple mirrors under one ID so consumers verify-on-pull that they got exactly what you shipped.

  • Container image distribution — distribute OCI layers as content-addressed snapshots: deduplicate base layers, replicate with snapdir sync, and verify every pull.

  • Vulnerability scanning in restricted environments — pull a verified intake from an immutable evidence bucket into an air-gapped environment and push findings back as an auditable chain of custody.

See also the Guide for the underlying workflows and the Command reference for every subcommand.