Use cases
snapdir turns any directory into a verifiable, content-addressed snapshot: a
deterministic ID derived from a BLAKE3 manifest, objects deduplicated in a local
cache, and the whole snapshot pushable to file://, s3://, gs://, b2://,
ssh://, and sftp:// stores — or synced directly between the in-process ones.
Because identity comes from content, the same data is only ever stored once, and
every pull is re-hashed and verified as it lands.
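The three properties above (an ID derived from a manifest, objects stored once, and verification on pull) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not snapdir's implementation. Python's standard library does not ship BLAKE3, so `hashlib.blake2b` stands in for it here. The manifest line format and the helpers `build_manifest`, `snapshot_id`, `push`, and `pull_verified` are hypothetical names, and a plain `dict` plays the role of a store.

```python
import hashlib
import os


def digest(data: bytes) -> str:
    # Assumption: BLAKE2b from the standard library stands in for BLAKE3.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def build_manifest(root: str) -> list:
    """Return (relative_path, content_digest, size) entries for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                data = fh.read()
            entries.append((rel, digest(data), len(data)))
    # Sorting by path makes the manifest independent of directory walk order.
    return sorted(entries)


def snapshot_id(manifest: list) -> str:
    # Hypothetical canonical form: one "<digest> <size> <path>" line per file.
    text = "".join(f"{d} {size} {path}\n" for path, d, size in manifest)
    # Same relative paths and contents give the same ID, wherever the directory lives.
    return digest(text.encode("utf-8"))


def push(root: str, manifest: list, store: dict) -> int:
    """Copy file contents into a digest-keyed store; return how many objects were new."""
    uploaded = 0
    for path, d, _size in manifest:
        if d in store:
            continue  # This content is already stored, under whatever path: skip it.
        with open(os.path.join(root, path), "rb") as fh:
            store[d] = fh.read()
        uploaded += 1
    return uploaded


def pull_verified(manifest: list, store: dict) -> dict:
    """Return {path: bytes}, re-hashing every object before accepting it."""
    files = {}
    for path, d, _size in manifest:
        data = store[d]
        if digest(data) != d:
            raise ValueError(f"integrity check failed for {path}")
        files[path] = data
    return files


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        for name, body in [("a.csv", b"1,2\n"), ("copy.csv", b"1,2\n"), ("b.csv", b"3,4\n")]:
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(body)
        manifest = build_manifest(root)
        store = {}
        print(snapshot_id(manifest))
        print(push(root, manifest, store))  # 2: a.csv and copy.csv share one object
        print(push(root, manifest, store))  # 0: nothing new on a second push
        print(sorted(pull_verified(manifest, store)))  # ['a.csv', 'b.csv', 'copy.csv']
```

Because the sketch's ID depends only on relative paths, sizes, and contents, copying the directory elsewhere leaves the ID unchanged, while editing a single byte changes both that file's digest and the snapshot ID. snapdir's real manifest may record more than this, so read the sketch as an illustration of the principle rather than of its format.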
These scenarios show how content-addressed identity plays out in day-to-day work, from versioning datasets and targeting agentic security review to deduplicated backups, cross-cloud replication, and auditable chains of custody. Each article frames a concrete problem and walks through the snapdir workflow that solves it.
- ML dataset versioning: pin training datasets and model artifacts to a content-addressed ID so every run trains on byte-identical data, uploaded only once.
- Parquet feature mirrors for inference: keep long-running feature partitions in a local read-only symlink mirror, updated by snapshot ID as new Parquet data lands.
- Agentic vulnerability auditing: snapshot the dependency tree a build actually used and diff manifests so expensive agentic security review only reads what changed since the last audited release.
- Deduplicated backup and restore: back up directories to B2 or S3, uploading only what changed, with every restore re-hashed and verified.
- Cross-cloud replication: replicate a snapshot directly between S3, GCS, and B2 with `snapdir sync`, never staging the data on your own disk.
- CI artifact caching: key a CI cache on `snapdir id` so a cache hit means the inputs are provably identical, not just similarly named.
- Infrastructure config snapshots: snapshot `/etc`, Terraform state, and config to a store, using `revisions` and `ancestors` as a verifiable audit trail.
- Content distribution integrity: publish assets to multiple mirrors under one ID so consumers verify on pull that they got exactly what you shipped.
- Container image distribution: distribute OCI layers as content-addressed snapshots, deduplicating base layers, replicating with `snapdir sync`, and verifying every pull.
- Vulnerability scanning in restricted environments: pull a verified intake from an immutable evidence bucket into an air-gapped environment and push findings back as an auditable chain of custody.
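Several of the scenarios above (incremental backups, and auditing only what changed since the last release) come down to comparing two manifests. The following is a minimal Python sketch that assumes a manifest can be read as a `{path: content_digest}` mapping. `diff_manifests`, `paths_to_review`, and `objects_to_upload` are hypothetical helpers for illustration, not snapdir commands or APIs, and the short digest strings stand in for real hashes.

```python
from typing import Dict, List, NamedTuple, Set


class ManifestDiff(NamedTuple):
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> ManifestDiff:
    """Compare two {path: content_digest} manifests path by path."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # A path counts as changed only if its content digest differs.
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return ManifestDiff(added, removed, changed)


def paths_to_review(diff: ManifestDiff) -> List[str]:
    # Only new or modified paths carry content that has not been reviewed at that path.
    return sorted(diff.added + diff.changed)


def objects_to_upload(new: Dict[str, str], remote_digests: Set[str]) -> Set[str]:
    # Content addressing: a digest the store already holds is skipped, whatever its path.
    return set(new.values()) - remote_digests


if __name__ == "__main__":
    audited = {"lib/a.py": "h1", "lib/b.py": "h2", "lib/c.py": "h3"}
    current = {"lib/a.py": "h1", "lib/b.py": "h9", "lib/d.py": "h2"}
    diff = diff_manifests(audited, current)
    print(diff)
    print(paths_to_review(diff))  # ['lib/b.py', 'lib/d.py']
    print(objects_to_upload(current, set(audited.values())))  # {'h9'}
```

In this example the path-level diff still flags `lib/d.py` for review, even though its digest matches the previously audited `lib/b.py`. Keying review on digests instead of paths would skip such moved or copied files, at the cost of ignoring where in the tree that code now lives. The upload check, by contrast, is digest-based, so only `h9` needs to be transferred.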
See also the Guide for the underlying workflows and the Command reference for every subcommand.