# Content addressing

Snapdir is a **content-addressed** system: every piece of data is named by a cryptographic hash of its own bytes, not by a path, a filename, or a timestamp. The address *is* the content's fingerprint. Change a single byte and the address changes; leave the bytes untouched and the address is stable forever, on any machine, in any store. This one idea is what makes snapdir snapshots deterministic, deduplicating, and self-verifying.

## The hash is the name

By default snapdir hashes content with [BLAKE3](https://github.com/BLAKE3-team/BLAKE3), producing a 256-bit digest rendered as a 64-character lowercase hex string. Two kinds of things get a content address:

- **Objects**: the raw bytes of every regular file. An object's address is the BLAKE3 hash of its contents.
- **Manifests**: the text document describing a whole directory tree. A manifest's address is its [snapshot ID](manifests.md), the BLAKE3 hash of the manifest text.

Because the name is derived purely from the bytes, the mapping is *deterministic*. Hashing `foo\n` always yields `49dc870df1de7fd60794cebce449f5ccdae575affaa67a24b62acb03e039db92`, and the empty file always yields `af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262`, whether the snapshot was produced today, last year, or by any tool that follows the same [manifest spec](manifests.md). Identical content always lands at the same address; different content (with overwhelming probability) lands at a different one.

## Deduplication falls out for free

Content addressing makes deduplication automatic rather than a feature bolted on top. If a file's bytes already exist at their address, whether in your local [cache](stores-and-cache.md) or in a remote store, there is nothing new to write. Snapdir's [push](../reference/snapdir-push.md) path is *skip-if-present*: it only uploads objects whose content address is absent, so re-snapshotting a tree where only one file changed transfers just that one file's object.
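
The skip-if-present idea can be illustrated with a minimal sketch. This is not snapdir's implementation: `address`, `ObjectStore`, and `push` are hypothetical names invented for the example, the store is an in-memory dictionary, and SHA-256 from Python's standard library stands in for BLAKE3 (which would need a third-party package), so the digests differ from the BLAKE3 values quoted above.

```python
import hashlib


def address(data: bytes) -> str:
    """Return the content address of `data` as 64 lowercase hex characters.

    Assumption: SHA-256 stands in for BLAKE3, which is not in the
    Python standard library.
    """
    return hashlib.sha256(data).hexdigest()


class ObjectStore:
    """Hypothetical in-memory object store keyed by content address."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def has(self, addr: str) -> bool:
        return addr in self._objects

    def put(self, data: bytes) -> tuple[str, bool]:
        """Store `data` under its address; return (address, was_written)."""
        addr = address(data)
        if self.has(addr):
            return addr, False  # already present: nothing new to write
        self._objects[addr] = data
        return addr, True


def push(files: dict[str, bytes], store: ObjectStore) -> list[str]:
    """Upload only objects whose address is absent; return transferred paths."""
    transferred = []
    for path, data in sorted(files.items()):
        _, written = store.put(data)
        if written:
            transferred.append(path)
    return transferred


store = ObjectStore()
# First snapshot: a.txt and b.txt have identical bytes, so they share one object.
first = push({"a.txt": b"foo\n", "b.txt": b"foo\n", "c.txt": b"bar\n"}, store)
# Second snapshot: only c.txt changed, so only its object is transferred.
second = push({"a.txt": b"foo\n", "b.txt": b"foo\n", "c.txt": b"baz\n"}, store)
print(first)   # ['a.txt', 'c.txt']
print(second)  # ['c.txt']
```

Because the key is derived from the bytes alone, the store never needs to compare file names or timestamps: a single membership check on the address decides whether anything has to be written.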

Deduplication works at three levels at once:

- **Within a snapshot**: two files with identical bytes share one object.
- **Across snapshots**: unchanged files across revisions reuse the same objects; successive snapshots cost only the deltas.
- **Across trees and stores**: any two directories that happen to share content share its storage, regardless of where they came from.

The same property extends to directories. A directory's checksum is derived from the *deduplicated* set of its children's checksums (see the [merkle rule](manifests.md#directory-checksums-the-merkle-rule)), so subtrees with identical content collapse to one address too.

## Why a cryptographic hash

A content address is only trustworthy if collisions are infeasible to forge. BLAKE3 is a modern cryptographic hash, so an address doubles as an [integrity](integrity.md) guarantee: if the bytes you fetch hash back to the address you asked for, they are, to cryptographic certainty, exactly the bytes that were stored. Snapdir leans on this end to end, re-verifying every object's hash on fetch.

For interoperability with legacy tooling, the checksum function can be switched to `md5` or `sha256`, but BLAKE3 remains the default and the basis of snapdir's security claims; see [Integrity](integrity.md).

## Where to go next

- [Manifests](manifests.md): how content addresses compose into a snapshot.
- [Stores and cache](stores-and-cache.md): how addressed objects are laid out and shared.
- [Integrity](integrity.md): how addresses are verified on every transfer.