Agentic vulnerability auditing

Snapshot the dependency tree a build actually used, diff plain-text manifests to isolate exactly what changed since the last audited release, and spend expensive agentic security review on that delta — while cheap advisory-database scanners keep covering the whole tree.

The problem

Deep dependency review does not scale by brute force. Agentic security reviewers such as Anthropic's Mythos read dependency source the way a human auditor would, and their cost grows with every file they are given. Yet between two releases most of a dependency tree is byte-for-byte unchanged, so reviewing everything mostly re-reads code that was already audited. Targeting the review is hard because the usual inputs are weak: a lockfile states what should be installed, not what the build actually used, and diffing two installed trees by hand is tedious work that rarely gets done. The security team also often operates in a restricted environment with no registry or package-manager access, yet still needs the exact bytes the build consumed.

Why snapdir

  • Proof of what the build used. Pushing the materialized tree right after dependency install records every file — direct and transitive dependencies alike — as bytes on disk under one content-addressed ID. This complements an SBOM rather than replacing it: an SBOM names the packages; the snapshot proves the bytes.

  • A plain-text audit surface. The manifest is one line per file or directory with five fields, TYPE PERMS CHECKSUM SIZE PATH, separated by single spaces; PATH comes last, so it may itself contain spaces. Content checksums use BLAKE3 by default, lines are sorted by path, and comment lines start with #. Standard Unix text tools can work on it directly.

  • Audit-once-per-content. Review verdicts keyed on the CHECKSUM column transfer across paths, packages, and releases: content whose checksum appears in any previously audited manifest never needs re-reading by the deep reviewer, even after a rename. (Checking unchanged content against newly disclosed vulnerabilities remains the advisory-database scanners' job — see the two-tier pairing in the walkthrough.)

  • Verified hand-off to restricted environments. snapdir pull re-hashes every object on the way in, so the analysis host gets a byte-identical copy of the exact tree without touching a registry.
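
As a small illustration of the plain-text audit surface, the sketch below summarizes a saved manifest with grep and awk. It assumes the five-column layout described above, that file entries have TYPE F, and that SIZE is a byte count; manifest_summary is a hypothetical helper name, not part of snapdir.

```shell
# Sketch: count file entries and total their SIZE column in a manifest.
# Assumptions: columns are TYPE PERMS CHECKSUM SIZE PATH, file entries
# have TYPE "F", and SIZE is in bytes. manifest_summary is a hypothetical name.
manifest_summary() {
  # $1: path to a saved manifest, e.g. deps.manifest
  grep -v '^#' "$1" | awk '$1 == "F" { files++; bytes += $4 }
    END { printf "%d files, %d bytes\n", files, bytes }'
}
```

Calling manifest_summary deps.manifest prints one line with the file count and the summed sizes, which is a quick sanity check before diffing.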

Walkthrough

1. Capture the materialized tree in CI. Right after dependency install, snapshot the directory the package manager produced — node_modules, a cargo vendor directory, a pip install --target directory — and save the plain-text manifest as a build artifact. Snapshot a quiescent tree (no postinstall step still writing to it), so the saved manifest and the pushed ID describe the same bytes; the assertion makes that correspondence checkable:

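# Push the installed tree and capture the snapshot ID that push prints.
deps_id=$(snapdir push --store s3://builds/deps ./vendor)
snapdir manifest ./vendor > deps.manifest
# Recompute the ID after saving the manifest; a mismatch means the tree changed in between.
[ "$(snapdir id ./vendor)" = "$deps_id" ] || { echo "vendor changed mid-capture" >&2; exit 1; }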

2. Compute the delta against the last audited release. Here current.manifest is this build's deps.manifest, and audited.manifest is the same artifact saved by the last release that passed review. The set of manifest lines new since that audit is one comm -13 away. The explicit sort matters: manifests are ordered by the path field, but comm compares whole lines, so re-sort both sides first:

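# comm -13 hides lines unique to the audited manifest and lines common to both,
# leaving only lines that are new in the current manifest.
comm -13 <(grep -v '^#' audited.manifest | sort) \
         <(grep -v '^#' current.manifest | sort) > delta.lines
cut -d' ' -f5- delta.lines   # paths only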

Comparing full lines is safe for paths that contain spaces — only the first four columns are space-delimited, so everything after the fourth space is the path. Parent directories appear in the delta as well, because a directory's checksum is derived from its children; the F lines carry the content to review.
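
To hand a reviewer only reviewable content, the delta can be narrowed to its file entries. A minimal sketch, assuming file entries carry TYPE F as noted above; delta_file_paths is a hypothetical helper name.

```shell
# Sketch: keep only file (F) entries from the delta and print their paths.
# Assumes the layout TYPE PERMS CHECKSUM SIZE PATH; delta_file_paths is a
# hypothetical name, not a snapdir command.
delta_file_paths() {
  # $1: delta lines as produced by the comm step, e.g. delta.lines
  # Field 5 onward is the path, so paths containing spaces stay intact.
  awk '$1 == "F"' "$1" | cut -d' ' -f5-
}
```

Directory lines are dropped here because their checksums change whenever any child changes, so they add no content of their own to read.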

3. Skip content that is already audited. Key past verdicts on the CHECKSUM column: any delta line whose checksum appears in a previously audited manifest carries bytes the team has already read, whatever path, package, or release it now lives in. A vendored file that merely moved is filtered out, not re-reviewed. For an LLM-based reviewer this is the cost model: the input an agent must read scales with the delta, not with the size of the tree. What transfers is the verdict on the bytes, not on the surrounding context, so the delta review should still consider how new code calls into unchanged files, since that interaction is only visible from the new side.
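
The checksum filter can be sketched with standard tools. This is one possible shape, not a snapdir feature: audited_checksums and unreviewed_lines are hypothetical helper names, and the sketch assumes the team keeps every previously audited manifest on disk.

```shell
# Sketch: drop delta lines whose content checksum was already audited.
# Assumes the layout TYPE PERMS CHECKSUM SIZE PATH. Both function names
# are hypothetical helpers.

audited_checksums() {
  # Args: one or more previously audited manifest files.
  # Prints the distinct values of the CHECKSUM column (field 3).
  grep -hv '^#' "$@" | cut -d' ' -f3 | sort -u
}

unreviewed_lines() {
  # $1: file of known-audited checksums, one per line
  # $2: delta lines from the comm step
  # Prints file (F) lines whose checksum is not in the known set.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       $1 == "F" && !($3 in seen)' "$1" "$2"
}
```

A typical use would be audited_checksums audited.manifest > audited.checksums followed by unreviewed_lines audited.checksums delta.lines > review.lines, so a file that only moved to a new path never reaches the deep reviewer.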

4. Pair the two review tiers. Delta-only review has a blind spot it cannot paper over: a newly disclosed vulnerability in an unchanged dependency never shows up in the delta. So run cheap advisory-database scanners over the entire tree on every build, and reserve the expensive agentic deep review for the delta. The tiers answer different questions — "is anything here on a known-bad list?" versus "does this new code do something hostile?"
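
One way to wire the two tiers into a single CI step is sketched below. ADVISORY_SCAN and DEEP_REVIEW are hypothetical variables that name whatever scanner and agentic reviewer commands a team uses; neither is provided by snapdir, and run_review_tiers is likewise an illustrative name.

```shell
# Sketch: run both review tiers in one CI step.
# ADVISORY_SCAN and DEEP_REVIEW are hypothetical variables holding the names
# of the team's own scanner and deep-review commands.
run_review_tiers() {
  # $1: full dependency tree, $2: file listing the delta paths still to review
  # Tier 1: cheap advisory-database scan over the whole tree, on every build.
  "$ADVISORY_SCAN" "$1" || return 1
  # Tier 2: expensive deep review, only when the delta is non-empty.
  if [ -s "$2" ]; then
    "$DEEP_REVIEW" "$1" "$2" || return 2
  fi
  return 0
}
```

The distinct return codes let CI report which tier failed, and an empty delta skips the deep review without skipping the advisory scan.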

5. Pull the exact tree into the restricted analysis environment.

snapdir pull --store s3://builds/deps --id "$deps_id" ./audit-tree

Every object is re-hashed against the manifest on fetch, so the team audits a byte-identical copy of what CI built — with no registry or package-manager access on the analysis host. For the evidence and chain-of-custody side of that workflow, see vulnerability scanning in restricted environments.
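
For the audit record, the analysis host can also recompute the ID of the pulled tree and compare it with the ID CI reported. This is an optional extra check on top of the verification that pull already performs; verify_tree_id is a hypothetical helper name, and the sketch assumes snapdir id prints the snapshot ID of a directory, as it is used in step 1.

```shell
# Sketch: confirm a pulled tree hashes to the expected snapshot ID.
# verify_tree_id is a hypothetical helper; it relies only on `snapdir id`.
verify_tree_id() {
  # $1: expected snapshot ID, $2: directory that was pulled
  actual=$(snapdir id "$2") || return 2
  if [ "$actual" != "$1" ]; then
    echo "ID mismatch: expected $1, got $actual" >&2
    return 1
  fi
  return 0
}
```

For example, verify_tree_id "$deps_id" ./audit-tree returns 0 when the recomputed ID matches, and the result can be logged alongside the review verdict.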

6. Carry the verdict to the shipped artifact. An audit of the inputs is only worth as much as its link to the output. The manifest records each file's path, permissions, and content checksum and embeds no timestamps, so two builds print the same snapshot ID exactly when their manifests are identical: same file contents, same paths, same permissions. A clean rebuild from the audited inputs that prints the same ID as the shipped artifact shows that those inputs reproduce the shipped bits, and CI can gate on it:

ref=$(snapdir id ./dist)           # ID of the shipped artifact
got=$(snapdir id ./dist-rebuild)   # ID of the clean rebuild
[ "$ref" = "$got" ] || { echo "build not reproducible: $ref != $got" >&2; exit 1; }

Outcome

Audit effort tracks what actually changed. Each release, the delta falls out of two text files CI already produces, content reviewed once is never read again by the deep reviewer, and the agentic reviewers spend tokens only on genuinely new bytes, while advisory-database scanners keep watching the unchanged remainder for new disclosures. The same content-addressed IDs let the work distribute. A fleet of agents on different machines (code review, triage, pentest tooling) pulls one ID and gets identical bytes. Pushes transfer only objects absent from the store, content already in a host's local cache is not fetched again, and --linked checkouts give near-instant working copies. For fleets inside a private network, ssh:// stores serve the same snapshots without a cloud bucket.