# Snapdir

# What is snapdir?

Snapdir is a tool for creating and restoring snapshots of directories.

The main features are:

  • Generating manifests and unique identifiers of the contents of directories and files.
  • Saving data to and restoring it from pluggable storage backends such as Amazon S3 and Backblaze B2.
  • Verifying the integrity of the data using cryptographic hashes.
  • UNIX-style composability.
  • Content-addressable local object cache.
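To make the idea of manifests and directory identifiers concrete, here is a minimal conceptual sketch. It is not snapdir's actual manifest format or hashing scheme: the function names `sketch_manifest` and `sketch_dir_id` are hypothetical, `sha256sum` stands in for `b3sum` so the example runs on a stock Linux system, and GNU `find`, `sort` and `xargs` are assumed.

```shell
# Conceptual sketch only: NOT snapdir's real manifest format.
# sha256sum is a stand-in for b3sum; GNU findutils/coreutils are assumed.

# Print one "<hash>  <relative path>" line per regular file, sorted by path
# so that the same directory contents always produce the same listing.
sketch_manifest() {
    (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# Derive a single identifier for the directory by hashing its listing.
sketch_dir_id() {
    sketch_manifest "$1" | sha256sum | cut -d ' ' -f 1
}

# Demo on a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'hello\n' > "$demo/a.txt"
printf 'world\n' > "$demo/sub/b.txt"
sketch_manifest "$demo"
echo "id: $(sketch_dir_id "$demo")"
rm -rf "$demo"
```

Because the listing uses relative paths and a fixed sort order, two directories with identical contents get the same identifier regardless of where they live on disk, and any change to a file changes the identifier.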

Snapdir is a building block for applications that need one or more of the following characteristics:

  • Storing data on untrusted environments.
  • Conflict-free replicated data types (CRDTs).
  • File-system based data replication.
  • Data integrity verification.
  • File deduplication.
  • Multicloud file sharing.
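Deduplication and integrity verification both follow naturally from content addressing. The sketch below illustrates the general technique, not snapdir's internal cache layout: `cas_put` and `cas_verify` are hypothetical helpers, the flat store directory is an assumption, and `sha256sum` again stands in for `b3sum`.

```shell
# Conceptual sketch of a content-addressed store (not snapdir's cache layout).

# Copy a file into the store under a name equal to its content hash and
# print that hash. Identical content maps to the same name, so it is
# stored only once.
cas_put() {
    store="$1"
    file="$2"
    hash=$(sha256sum < "$file" | cut -d ' ' -f 1)
    mkdir -p "$store"
    [ -e "$store/$hash" ] || cp "$file" "$store/$hash"
    printf '%s\n' "$hash"
}

# Re-hash a stored object and compare the result with the name it is
# stored under. Succeeds only if the object is intact.
cas_verify() {
    actual=$(sha256sum < "$1/$2" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

Storing two files with the same bytes yields one object, and an object that has been tampered with on an untrusted backend no longer matches its own name, which is what makes the check cheap to run after every fetch.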

# Motivation

Snapdir was created as a prototype to explore an optimal workflow for consuming and generating files in ephemeral environments. At BermiLabs, we used it to replicate parquet files in our analytics pipelines and our distributed ETL workflows.

We decided to open source it so that others could use it to implement CRDT strategies in eventually consistent, read-heavy applications.

# Usage

# Prerequisites

Snapdir requires b3sum (BLAKE3) for hashing and HMAC signing and, optionally, sqlite3 to query local snapshots.

To verify that your dependencies are on your $PATH, run:

command -v b3sum
command -v sqlite3
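If you prefer a check that reports what is missing instead of printing paths, the two commands above can be wrapped in a small helper. This is only a convenience sketch; `check_deps` is a hypothetical function and is not shipped with snapdir.

```shell
# Report which of the given commands are available on $PATH.
# Returns non-zero if at least one of them is missing.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# b3sum is required; sqlite3 is only needed to query local snapshots.
check_deps b3sum || echo "install b3sum before using snapdir" >&2
check_deps sqlite3 || echo "sqlite3 not found; querying local snapshots will not work" >&2
```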

To install the dependencies on Debian-flavored distributions, you can run:

apt-get install -y wget sqlite3
wget -q "https://github.com/BLAKE3-team/BLAKE3/releases/download/1.3.1/b3sum_linux_x64_bin" -O /usr/local/bin/b3sum
chmod +x /usr/local/bin/b3sum

# Installation

Snapdir is implemented as a set of independent, tested Bash scripts.

To install them all in /usr/local/bin/, you can run:

wget -O - https://raw.githubusercontent.com/bermi/snapdir/main/utils/install.sh | bash

At a minimum, snapdir requires the snapdir and snapdir-manifest scripts to be on your $PATH.

| Program | Description | Docs | Status badge |
|---|---|---|---|
| snapdir | Snapshotting, verification and sharing of directories with pluggable storage backends. | install, manual | unit tests |
| snapdir-manifest | Standalone tool for creating directory snapshot manifests that can be version controlled and audited by humans. | manual | unit tests |
| snapdir-file-store | Storage backend using the filesystem. | manual | unit tests |
| snapdir-s3-store | Storage backend using Amazon S3. | manual | s3 |
| snapdir-b2-store | Storage backend using Backblaze B2. | README, manual | b2 |
| snapdir-sqlite3-catalog | Basic catalog of local and remote manifests. | manual | catalog |
| bermi/snapdir docker image | 8MB Docker image containing snapdir and all its dependencies. | docker pull bermi/snapdir | docker |

# Via Docker

You can try snapdir using the Docker image bermi/snapdir:

target_dir=./ # specify a target directory
# using -v to mount the target directory on the docker container
docker run -it --rm \
    -v "$(realpath "$target_dir"):/target" \
    -v "${HOME}/.cache/snapdir:/root/.cache/snapdir" \
    bermi/snapdir manifest /target

# Contributing

Snapdir is licensed under the MIT License and contributions are welcome! Please check the contributing guidelines and visit the GitHub repo for more information.

To check out the code and run the tests:

git clone https://github.com/bermi/snapdir.git
cd snapdir
./snapdir-test

The project includes a VSCode devcontainer configuration that you can use to develop snapdir in a containerized environment.

# Alternatives

There are many other tools that might be better suited for your particular use case. For example: ostree, mtree, Git LFS, DVC, Syncthing, BitTorrent, DAT, git, HDF5, tar, Btrfs, ZFS, IPFS, Perkeep, SeaweedFS, Upspin, Keybase Filesystem, and Sigstore.

We use Snapdir in conjunction with some of the tools mentioned above. None of them met the simplicity, ergonomics and auditability goals we had in mind when defining Snapdir.