Use cases

snapdir turns any directory into a verifiable, content-addressed snapshot: a deterministic ID derived from a BLAKE3 manifest, objects deduplicated in a local cache, and a whole snapshot that can be pushed to file://, s3://, gs://, b2://, ssh://, and sftp:// stores or synced directly between the in-process ones. Because identity comes from content, identical data is stored only once, and every pull is re-hashed and verified as it lands.
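
To make the idea of a content-derived ID concrete, here is a minimal Python sketch of content addressing. It is not snapdir's implementation: it assumes a simplified manifest format (one "path<TAB>digest" line per file, sorted by path), it substitutes hashlib.blake2b for BLAKE3 because Python's standard library does not ship BLAKE3, and the function names file_digest, build_manifest, and snapshot_id are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash one file's bytes. BLAKE2b is a stand-in for BLAKE3 (assumption)."""
    h = hashlib.blake2b(digest_size=32)
    with path.open("rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> str:
    """Build a simplified manifest: one 'relative/path<TAB>digest' line per file.

    Lines are sorted by relative path so the same content always yields
    the same manifest text, regardless of directory listing order.
    """
    entries = []
    for p in root.rglob("*"):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            entries.append((rel, file_digest(p)))
    entries.sort()
    return "".join(f"{rel}\t{digest}\n" for rel, digest in entries)


def snapshot_id(manifest: str) -> str:
    """Derive the snapshot ID by hashing the manifest text itself."""
    return hashlib.blake2b(manifest.encode("utf-8"), digest_size=32).hexdigest()
```

Two directories with byte-identical contents produce the same manifest and therefore the same ID, and two files with the same bytes share one digest, which is what lets a cache keep a single copy of each object.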

These scenarios show how that single guarantee plays out across day-to-day work, from dataset versioning and targeted agentic security review to deduplicated backups, cross-cloud replication, and auditable chains of custody. Each article frames a concrete problem and walks through the snapdir workflow that solves it.

  • ML dataset versioning — pin training datasets and model artifacts to a content-addressed ID so every run trains on byte-identical data, uploaded only once.

  • Parquet feature mirrors for inference — keep long-running feature partitions in a local read-only symlink mirror, updated by snapshot ID as new Parquet data lands.

  • Agentic vulnerability auditing — snapshot the dependency tree a build actually used and diff manifests so expensive agentic security review only reads what changed since the last audited release.

  • Deduplicated backup and restore — back directories up to B2 or S3 uploading only what changed, with every restore re-hashed and verified.

  • Cross-cloud replication — replicate a snapshot directly between S3, GCS, and B2 with snapdir sync, never staging the data on your own disk.

  • CI artifact caching — key a CI cache on snapdir id so a cache hit means the inputs are provably identical, not just similarly named.

  • Infrastructure config snapshots — snapshot /etc, Terraform state, and config to a store, using revisions and ancestors as a verifiable audit trail.

  • Content distribution integrity — publish assets to multiple mirrors under one ID so consumers verify-on-pull that they got exactly what you shipped.

  • Container image distribution — distribute OCI layers as content-addressed snapshots: deduplicate base layers, replicate with snapdir sync, and verify every pull.

  • Vulnerability scanning in restricted environments — pull a verified intake from an immutable evidence bucket into an air-gapped environment and push findings back as an auditable chain of custody.
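
Several of the scenarios above (agentic auditing, deduplicated backup) rest on comparing two manifests to find what changed. A rough Python sketch of that comparison follows. It assumes a manifest can be represented as a mapping from relative path to content digest; the real snapdir manifest format and commands are not shown here, and diff_manifests is a hypothetical helper.

```python
def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two path -> digest mappings (an assumed manifest representation).

    Returns the paths that were added, removed, or whose digest changed.
    """
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative digests only; real digests would be full-length hashes.
old = {"requirements.txt": "aa11", "vendor/lib.py": "bb22", "vendor/old.py": "cc33"}
new = {"requirements.txt": "aa11", "vendor/lib.py": "dd44", "vendor/new.py": "ee55"}
print(diff_manifests(old, new))
# {'added': ['vendor/new.py'], 'removed': ['vendor/old.py'], 'changed': ['vendor/lib.py']}
```

Only the added and changed paths would need to be uploaded or re-reviewed; unchanged entries such as requirements.txt are skipped because their digests match.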

See also the Guide for the underlying workflows and the Command reference for every subcommand.