Vulnerability scanning in restricted environments

Run security investigations against an immutable GCS evidence bucket from a restricted, air-gapped analysis environment: pull a verified, byte-identical intake; scan and triage locally; and push evidence back step by step as an auditable, content-addressed chain of custody.

The problem

Security and forensic work has constraints most data pipelines do not. The material under investigation must be retrieved into a restricted or air-gapped analysis environment without trusting the transport, the intake must be provably byte-identical to what was collected, and everything the analyst does afterward has to be recorded in a way that holds up later as a defensible chain of custody. On top of that, the resulting evidence must interoperate with existing SIEM, antivirus, and forensic tooling that keys artifacts on legacy digests, even though those digests are not trustworthy as security primitives. Ad-hoc copies and loose checksum files cannot satisfy all of these requirements at once.

Why snapdir

snapdir pairs a content-addressed, tamper-evident snapshot model with an immutable Google Cloud Storage backend, which is exactly the shape a chain-of-custody workflow needs:

Immutable evidence bucket. Back the workflow with a GCS bucket configured for object versioning and a locked retention policy (Bucket Lock), so objects cannot be overwritten or deleted before their retention period expires. Every snapshot written there is preserved for at least that period.
Verified, byte-identical intake. snapdir pull re-hashes every object on the way in, so a pull succeeds only when each object matches the digest recorded in the manifest. A successful pull from the evidence bucket is therefore cryptographic evidence that the intake matches what was collected, and any tampering is detectable by construction.
Auditable chain of custody. Each snapdir push yields an immutable, content-addressed snapshot ID. The ordered chain of IDs (snapdir ancestors / snapdir revisions) is a defensible, append-only record of the investigation.
Legacy-checksum interop. --checksum md5|sha256 makes snapshot manifests interoperate with tooling that keys on those digests, while BLAKE3 stays the default security primitive.
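
As a rough sketch of the bucket setup described above, the helper below only prints the gsutil commands for setting and locking a retention policy, so they can be reviewed before anyone runs them. The function name plan_evidence_bucket, the bucket name, and the 90d period are illustrative assumptions, not part of snapdir. Object versioning is deliberately left out of the sketch; confirm against the current GCS documentation how it combines with a retention policy on the same bucket.

```shell
# Hypothetical helper (not part of snapdir): print, but do not run, the
# gsutil commands that set and lock a retention policy on a bucket.
# Usage: plan_evidence_bucket BUCKET_URL RETENTION_PERIOD
plan_evidence_bucket() (
  bucket="$1"
  period="$2"
  # Set the retention period, e.g. 90d or 1y.
  echo "gsutil retention set ${period} ${bucket}"
  # Show the policy so it can be reviewed before locking.
  echo "gsutil retention get ${bucket}"
  # Locking is permanent: a locked policy cannot be removed or shortened.
  echo "gsutil retention lock ${bucket}"
)
```

Calling plan_evidence_bucket gs://evidence-writeonce 90d prints three commands. Because locking cannot be undone, it is kept as the last, separate step.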

Walkthrough

1. Verified intake. From the restricted analysis host, pull the snapshot to be analyzed from the immutable evidence bucket. Every object is re-hashed against the manifest, so the checkout is byte-identical to what was collected:

snapdir pull --store gs://evidence-immutable/intake --id "$intake_id" ./case-1234
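
A minimal sketch of gating the rest of the investigation on that pull is shown below. It assumes that snapdir pull exits non-zero when verification fails, which is not stated here and should be confirmed. The wrapper name verified_intake is hypothetical, and marking the checkout read-only is an extra precaution rather than a snapdir feature.

```shell
# Hypothetical wrapper: refuse to start analysis unless the pull succeeded.
# Assumption: snapdir pull returns a non-zero exit status on a failed check.
# Usage: verified_intake STORE_URL SNAPSHOT_ID DEST_DIR
verified_intake() (
  store="$1"
  id="$2"
  dest="$3"
  if ! snapdir pull --store "$store" --id "$id" "$dest"; then
    echo "intake verification failed for $id; not proceeding" >&2
    exit 1
  fi
  # Drop write permission so scanners cannot modify the checked-out evidence.
  chmod -R a-w "$dest"
  echo "intake verified: $dest"
)
```

For the case above this would be called as verified_intake gs://evidence-immutable/intake "$intake_id" ./case-1234.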

2. Scan and triage locally. Run your scanners and analysis tooling against the checked-out tree inside the air-gapped environment. Nothing leaves the host.
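
One way to keep scanner output separate from the evidence checkout is sketched below. The helper run_scan is hypothetical: it runs whatever scanner command it is given against the case directory and stores the combined output, plus the exit status, under an output directory that can later be pushed as a stage.

```shell
# Hypothetical helper: run a scanner over the case tree and write its report
# outside the evidence checkout.
# Usage: run_scan CASE_DIR OUT_DIR SCANNER_CMD [SCANNER_ARGS...]
run_scan() (
  case_dir="$1"
  out_dir="$2"
  shift 2
  mkdir -p "$out_dir"
  # Name the report after the scanner command.
  report="$out_dir/$(basename "$1").txt"
  # Record a non-zero exit instead of aborting: many scanners use it to
  # signal a detection rather than an error.
  status=0
  "$@" "$case_dir" > "$report" 2>&1 || status=$?
  echo "exit=$status" >> "$report"
  echo "$report"
)
```

Assuming ClamAV is installed on the analysis host, run_scan ./case-1234 ./findings/scan-output clamscan -r -i would write the report to ./findings/scan-output/clamscan.txt. Any other scanner that takes the target directory as its last argument fits the same shape.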

3. Push evidence back step by step. As the investigation produces artifacts (extracted samples, scanner output, an analyst's session notes), push each stage to a write-once evidence bucket. Each push is one immutable, content-addressed step in the record:

# Step 1: capture the triaged sample set and keep its snapshot ID.
samples_id=$(snapdir push --store gs://evidence-writeonce/case-1234 ./findings/samples)

# Step 2: capture the scanner reports produced from it.
scan_id=$(snapdir push --store gs://evidence-writeonce/case-1234 ./findings/scan-output)

# Step 3: capture the analyst's investigation session.
session_id=$(snapdir push --store gs://evidence-writeonce/case-1234 ./findings/session)
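
To keep the snapshot IDs of every stage at hand, a small local log can help. The sketch below assumes, as the session_id capture above suggests, that snapdir push prints the snapshot ID on standard output. The helper push_step and the custody.log file name are hypothetical, and the log is a convenience for the analyst, not a replacement for the record held in the store.

```shell
# Hypothetical helper: push one stage and append its snapshot ID to a local,
# tab-separated log (UTC timestamp, label, snapshot ID).
# Assumption: snapdir push prints only the snapshot ID on stdout.
# Usage: push_step STORE_URL LABEL DIR [LOG_FILE]
push_step() (
  store="$1"
  label="$2"
  dir="$3"
  log="${4:-custody.log}"
  id=$(snapdir push --store "$store" "$dir") || exit 1
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$label" "$id" >> "$log"
  echo "$id"
)
```

For example, push_step gs://evidence-writeonce/case-1234 samples ./findings/samples appends one line to custody.log and prints the ID for use in later commands.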

4. Trace the chain of custody. The successive snapshot IDs form an ordered, auditable lineage. Trace it with snapdir ancestors, and verify any step against the immutable store at any time:

snapdir ancestors --id "$session_id"
snapdir verify    --store gs://evidence-writeonce/case-1234 --id "$session_id"
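
When several steps need re-checking at once, the verify call can be looped over their IDs. This is a sketch only: verify_all is a hypothetical helper, and it assumes that snapdir verify reports failure through a non-zero exit status.

```shell
# Hypothetical helper: verify each snapshot ID in turn and report all results.
# Assumption: snapdir verify exits non-zero when verification fails.
# Usage: verify_all STORE_URL ID [ID...]
verify_all() (
  store="$1"
  shift
  failed=0
  for id in "$@"; do
    if snapdir verify --store "$store" --id "$id"; then
      echo "ok   $id"
    else
      echo "FAIL $id"
      failed=1
    fi
  done
  # Exit non-zero if any step failed, after checking every ID.
  exit "$failed"
)
```

The loop continues past a failure on purpose, so one run shows the state of every step rather than stopping at the first problem.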

5. Interoperate with existing tooling. When a SIEM, AV, or forensic platform keys artifacts on a legacy digest, produce a snapshot whose manifest uses that digest so the IDs line up with the other system's records:

# md5 for a tool that indexes on md5; sha256 where that is what the platform expects.
snapdir push --store gs://evidence-writeonce/case-1234 --checksum md5    ./findings/samples
snapdir push --store gs://evidence-writeonce/case-1234 --checksum sha256 ./findings/samples

md5 is provided for interop only and is not a security primitive: md5 collisions are cheap to construct, so it must never be relied on to detect tampering. sha256 is likewise primarily an interop choice. For every trust boundary in the investigation, BLAKE3 remains the default and the basis of the integrity guarantees. See Integrity → checksum modes for the full rationale.
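
When cross-referencing individual samples against a platform's records, it can help to list the legacy digests per file. The sketch below is independent of snapdir and assumes the GNU coreutils md5sum and sha256sum tools are available on the analysis host. The helper name legacy_digests is hypothetical, and its output is for lookup only, never for integrity decisions.

```shell
# Hypothetical helper: print "md5  sha256  path" for every regular file
# under a directory, sorted by path. For lookup in external tools only.
# Assumption: GNU coreutils md5sum and sha256sum are installed.
# Usage: legacy_digests DIR
legacy_digests() (
  dir="$1"
  find "$dir" -type f | sort | while IFS= read -r f; do
    md5=$(md5sum "$f" | cut -d' ' -f1)
    sha=$(sha256sum "$f" | cut -d' ' -f1)
    printf '%s  %s  %s\n' "$md5" "$sha" "$f"
  done
)
```

Running legacy_digests ./findings/samples gives one line per sample that can be searched for in a SIEM or AV console. File names containing newlines are not handled by this sketch.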

Outcome

A vulnerability investigation runs end to end without weakening any constraint: intake is verified byte-identical on pull into the restricted environment, evidence is committed step by step to an immutable, retention-locked GCS bucket, and the chain of content-addressed IDs gives investigators a defensible, tamper-evident chain of custody they can re-verify on demand. Legacy md5/sha256 manifests bridge to existing SIEM/AV/forensic tooling, while BLAKE3 underpins every actual trust decision. Navigate the recorded lineage with snapdir ancestors.