Use cases
snapdir turns any directory into a verifiable, content-addressed snapshot: a
deterministic ID derived from a BLAKE3 manifest, objects deduplicated in a local
cache, and the whole snapshot pushable to — or synced between — file://, s3://,
gs://, and b2:// stores. Because identity comes from content, the same data is
only ever stored once, and every pull is re-hashed and verified as it lands.
These scenarios show how that single guarantee plays out across day-to-day work — from versioning datasets and proving reproducible builds to deduplicated backups, cross-cloud replication, and auditable chains of custody. Each article frames a concrete problem and walks through the snapdir workflow that solves it.
- ML dataset versioning: pin training datasets and model artifacts to a content-addressed ID so every run trains on byte-identical data, uploaded only once.
- Reproducible builds: prove a build is byte-for-byte reproducible by comparing output snapshot IDs, and archive each release under a verifiable ID.
- Deduplicated backup and restore: back directories up to B2 or S3, uploading only what changed, with every restore re-hashed and verified.
- Cross-cloud replication: replicate a snapshot directly between S3, GCS, and B2 with `snapdir sync`, never staging the data on your own disk.
- CI artifact caching: key a CI cache on `snapdir id` so a cache hit means the inputs are provably identical, not just similarly named.
- Infrastructure config snapshots: snapshot `/etc`, Terraform state, and config to a store, using `revisions` and `ancestors` as a verifiable audit trail.
- Content distribution integrity: publish assets to multiple mirrors under one ID so consumers verify-on-pull that they got exactly what you shipped.
- Container image distribution: distribute OCI layers as content-addressed snapshots: deduplicate base layers, replicate with `snapdir sync`, and verify every pull.
- Vulnerability scanning in restricted environments: pull a verified intake from an immutable evidence bucket into an air-gapped environment and push findings back as an auditable chain of custody.
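The property every scenario above relies on, an ID derived purely from content, can be illustrated with a small sketch. This is not snapdir's implementation or its manifest format: the function names (`file_digest`, `build_manifest`, `snapshot_id`, `unique_objects`) and the manifest line layout are hypothetical, and BLAKE2b from Python's standard library stands in for BLAKE3, which the standard library does not provide. The sketch also ignores permissions and empty directories, which a real manifest may record.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash one file's bytes in chunks (BLAKE2b as a stdlib stand-in for BLAKE3)."""
    h = hashlib.blake2b(digest_size=32)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> list[tuple[str, str]]:
    """Return sorted (relative path, content digest) pairs for every file under root."""
    entries = []
    for p in root.rglob("*"):
        if p.is_file():
            entries.append((p.relative_to(root).as_posix(), file_digest(p)))
    # Sorting makes the manifest independent of filesystem traversal order.
    return sorted(entries)


def snapshot_id(root: Path) -> str:
    """Derive a deterministic ID by hashing the manifest text (hypothetical layout)."""
    h = hashlib.blake2b(digest_size=32)
    for rel, digest in build_manifest(root):
        h.update(f"{digest}  {rel}\n".encode("utf-8"))
    return h.hexdigest()


def unique_objects(root: Path) -> set[str]:
    """Digests a content-addressed store would hold; identical files collapse to one."""
    return {digest for _, digest in build_manifest(root)}
```

Under these assumptions, two directories with the same files produce the same `snapshot_id` regardless of where or when they were created, which is the comparison the reproducible-builds and CI-caching scenarios depend on. `unique_objects` shows why identical files only need to be stored once.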
See also the Guide for the underlying workflows and the Command reference for every subcommand.