Agentic vulnerability auditing
Snapshot the dependency tree a build actually used, diff plain-text manifests to isolate exactly what changed since the last audited release, and spend expensive agentic security review on that delta — while cheap advisory-database scanners keep covering the whole tree.
The problem
Deep dependency review does not scale by brute force. Agentic security reviewers such as Anthropic's Mythos read dependency source the way a human auditor would, and their cost grows with every file they are given — yet between two releases most of a dependency tree is byte-for-byte unchanged, so reviewing everything mostly re-reads code that was already audited. Targeting the review is hard, though, because the usual inputs are weak: a lockfile claims what should be installed, not what the build actually used, and diffing two installed trees by hand is exactly the tedious work nobody does. And the security team often operates in a restricted environment with no registry or package-manager access, while still needing the exact bytes the build consumed.
Why snapdir
- Proof of what the build used. Pushing the materialized tree right after dependency install records every file, direct and transitive dependencies alike, as bytes on disk under one content-addressed ID. This complements an SBOM rather than replacing it: an SBOM names the packages; the snapshot proves the bytes.
- A plain-text audit surface. The manifest is one line per file or directory, with exactly five single-space-separated fields: TYPE PERMS CHECKSUM SIZE PATH. The CHECKSUM field is a content hash (BLAKE3 by default), lines are sorted by path, and comments start with #. Standard Unix text tools can work on it directly.
- Audit-once-per-content. Review verdicts keyed on the CHECKSUM column transfer across paths, packages, and releases: content whose checksum appears in any previously audited manifest never needs re-reading by the deep reviewer, even after a rename. (Checking unchanged content against newly disclosed vulnerabilities remains the advisory-database scanners' job; see the two-tier pairing in the walkthrough.)
- Verified hand-off to restricted environments. snapdir pull re-hashes every object on the way in, so the analysis host gets a byte-identical copy of the exact tree without touching a registry.
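Because the manifest is plain text, routine audit questions need nothing beyond awk, grep, and cut. The sketch below is illustrative only: the manifest contents are made up, and it assumes that F marks files (as the walkthrough's "F lines" suggest) and that D marks directories. Splitting on single spaces and keeping field 5 onward is what keeps paths containing spaces intact.

```shell
#!/bin/sh
# Illustrative manifest in the five-field format TYPE PERMS CHECKSUM SIZE PATH.
# Assumptions: "D" marks a directory; checksums, sizes, and paths are made up.
cat > example.manifest <<'EOF'
# lines starting with '#' are comments
D 755 9f8e 4821 ./lib
F 644 a1b2 1200 ./lib/index.js
F 644 c3d4 3621 ./lib/my file.js
EOF

# Count file entries and sum their SIZE column (field 4); comments and D lines are skipped.
file_summary() {
  awk '$1 == "F" { n++; bytes += $4 } END { printf "%d files, %d bytes\n", n, bytes }' "$1"
}

# Print file paths: cut keeps field 5 onward, so spaces inside a path survive.
file_paths() {
  grep '^F ' "$1" | cut -d' ' -f5-
}

file_summary example.manifest   # 2 files, 4821 bytes
file_paths example.manifest
```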
Walkthrough
1. Capture the materialized tree in CI. Right after dependency install,
snapshot the directory the package manager produced — node_modules, a cargo
vendor directory, a pip install --target directory — and save the plain-text
manifest as a build artifact. Snapshot a quiescent tree (no postinstall step
still writing to it), so the saved manifest and the pushed ID describe the same
bytes; the assertion makes that correspondence checkable:
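# Push the installed tree and capture the snapshot ID it prints.
deps_id=$(snapdir push --store s3://builds/deps ./vendor)
# Save the plain-text manifest as a build artifact.
snapdir manifest ./vendor > deps.manifest
# Fail the job if ./vendor no longer matches the pushed snapshot.
[ "$(snapdir id ./vendor)" = "$deps_id" ] || { echo "vendor changed mid-capture" >&2; exit 1; }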
2. Compute the delta against the last audited release. Here
current.manifest is this build's deps.manifest, and audited.manifest is
the same artifact saved by the last release that passed review. The set of
manifest lines new since that audit is one comm -13 away. The explicit sort
matters: manifests are ordered by the path field, but comm compares whole
lines, so re-sort both sides first:
export LC_ALL=C   # byte-order collation, shared by sort and comm
comm -13 <(grep -v '^#' audited.manifest | sort) \
<(grep -v '^#' current.manifest | sort) > delta.lines
cut -d' ' -f5- delta.lines   # paths only
Comparing full lines is safe for paths that contain spaces — only the first
four columns are space-delimited, so everything after the fourth space is the
path. Parent directories appear in the delta as well, because a directory's
checksum is derived from its children; the F lines carry the content to
review.
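The same pipeline can be tried without a real build. This minimal sketch runs it on two tiny hypothetical manifests (placeholder checksums; D assumed as the directory type), using sorted temp files instead of process substitution so it runs under plain sh, and pins LC_ALL=C so sort and comm use the same byte order. Only the F lines of the delta are kept, since those carry the content to review.

```shell
#!/bin/sh
set -eu
# Byte-order collation so sort and comm agree on line order.
LC_ALL=C; export LC_ALL

# Two hypothetical manifests (made-up checksums; "D" assumed for directories).
cat > audited.manifest <<'EOF'
# audited release
D 755 aaaa 300 ./pkg
F 644 1111 100 ./pkg/a.js
F 644 2222 200 ./pkg/b.js
EOF
cat > current.manifest <<'EOF'
# current build
D 755 bbbb 350 ./pkg
F 644 1111 100 ./pkg/a.js
F 644 3333 250 ./pkg/b.js
EOF

# Strip comments and re-sort whole lines on both sides.
grep -v '^#' audited.manifest | sort > audited.sorted
grep -v '^#' current.manifest | sort > current.sorted

# comm -13 prints lines found only in the second input: new or changed entries.
comm -13 audited.sorted current.sorted > delta.lines

# The directory line changed because a child did; the F line carries the content.
grep '^F ' delta.lines | cut -d' ' -f5-   # ./pkg/b.js
```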
3. Skip content that is already audited. Key past verdicts on the
CHECKSUM column: any delta line whose checksum appears in a previously
audited manifest describes bytes the team has already read, whatever path,
package, or release they now live in, so a vendored file that merely moved is
filtered out, not re-reviewed. For an LLM-based reviewer this is the cost
model: the input an agent must read scales with the delta, not with the size
of the tree. The verdict transfers with the bytes, not with their surrounding
context, so the delta review should still consider how new code calls into
unchanged files, since that interaction is only visible from the new side.
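The checksum filter itself is a two-file awk pass. A minimal sketch, assuming past verdicts are recorded simply as the manifests of audited releases, kept under a hypothetical audited/ directory; unaudited is a hypothetical helper name, and all manifest contents are made up. It collects every CHECKSUM from the audited manifests, then prints only the delta F lines whose checksum appears in none of them.

```shell
#!/bin/sh
set -eu
mkdir -p audited
# Hypothetical manifest from an earlier audited release (made-up checksums).
cat > audited/v1.manifest <<'EOF'
F 644 1111 100 ./old/util.js
F 644 2222 200 ./pkg/b.js
EOF
# Hypothetical delta: util.js moved (same checksum), b.js changed.
cat > delta.lines <<'EOF'
D 755 cccc 350 ./pkg
F 644 1111 100 ./pkg/util.js
F 644 3333 250 ./pkg/b.js
EOF

# unaudited DELTA AUDITED_MANIFEST...
# Prints delta F lines whose checksum appears in none of the audited manifests.
unaudited() {
  [ "$#" -ge 2 ] || { echo "usage: unaudited DELTA AUDITED_MANIFEST..." >&2; return 2; }
  delta=$1; shift
  sums=$(mktemp)
  # Collect every CHECKSUM (field 3), skipping comment lines.
  awk '$1 !~ /^#/ { print $3 }' "$@" > "$sums"
  # First file fills the seen-set; then keep F lines with an unseen checksum.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       $1 == "F" && !($3 in seen)' "$sums" "$delta"
  rm -f "$sums"
}

unaudited delta.lines audited/*.manifest   # F 644 3333 250 ./pkg/b.js
```

Here util.js is dropped because its checksum already appears under its old path, while b.js survives because its content changed.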
4. Pair the two review tiers. Delta-only review has a blind spot it cannot paper over: a newly disclosed vulnerability in an unchanged dependency never shows up in the delta. So run cheap advisory-database scanners over the entire tree on every build, and reserve the expensive agentic deep review for the delta. The tiers answer different questions — "is anything here on a known-bad list?" versus "does this new code do something hostile?"
5. Pull the exact tree into the restricted analysis environment.
snapdir pull --store s3://builds/deps --id "$deps_id" ./audit-tree
Every object is re-hashed against the manifest on fetch, so the team audits a byte-identical copy of what CI built — with no registry or package-manager access on the analysis host. For the evidence and chain-of-custody side of that workflow, see vulnerability scanning in restricted environments.
6. Carry the verdict to the shipped artifact. An audit of the inputs is only worth as much as its link to the output. The manifest records each file's type, permissions, checksum, size, and path, and embeds no timestamps, so two builds print the same snapshot ID exactly when their manifests are identical: same file contents, same paths, same permissions. A clean rebuild from the audited inputs that prints the same ID as the shipped build shows that those inputs produce the shipped bits, and CI can gate on it:
# ID of the shipped build output
ref=$(snapdir id ./dist)
# ID of a clean rebuild from the audited inputs
got=$(snapdir id ./dist-rebuild)
[ "$ref" = "$got" ] || { echo "build not reproducible: $ref != $got" >&2; exit 1; }
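When the gate fails, the manifests say why. A minimal sketch, assuming ref.manifest and rebuild.manifest were saved from ./dist and ./dist-rebuild with snapdir manifest (file names and contents here are hypothetical, and diverged_paths is a hypothetical helper): comm -3 keeps lines present in only one manifest, and cutting field 5 onward reduces them to the paths that diverged. The tab comm puts before second-column lines stays attached to field 1, so it does not disturb the path.

```shell
#!/bin/sh
set -eu
LC_ALL=C; export LC_ALL
# Hypothetical manifests of ./dist and ./dist-rebuild (made-up checksums).
cat > ref.manifest <<'EOF'
F 644 aaaa 10 ./app.js
F 755 bbbb 20 ./bin/run
EOF
cat > rebuild.manifest <<'EOF'
F 644 aaaa 10 ./app.js
F 644 bbbb 20 ./bin/run
EOF

# diverged_paths REF REBUILD: print each path whose manifest line differs, once.
diverged_paths() {
  grep -v '^#' "$1" | sort > ref.sorted
  grep -v '^#' "$2" | sort > got.sorted
  # comm -3 drops lines common to both; field 5 onward is the path.
  comm -3 ref.sorted got.sorted | cut -d' ' -f5- | sort -u
}

diverged_paths ref.manifest rebuild.manifest   # ./bin/run
```

In this example the file contents match but the permissions differ, which alone is enough to change the snapshot ID.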
Outcome
Audit effort tracks what actually changed: each release, the delta falls out of
two text files CI already produces, content reviewed once is never read again
by the deep reviewer, and the agentic reviewers spend tokens only on genuinely
new bytes — while advisory-database scanners keep watching the unchanged
remainder for new disclosures. The same content-addressed IDs let the work be
distributed. A fleet of agents on different machines (code review, triage,
pentest tooling) pulls one ID and gets identical bytes. Pushes transfer only
objects absent from the store, content already in a host's local cache is not
fetched again, and --linked checkouts give near-instant working copies. For
fleets inside a private network, ssh:// stores serve the same snapshots
without a cloud bucket.