Forensic Acquisition and Imaging
Forensic acquisition is the process of extracting digital evidence from its source in a form that preserves its integrity and supports subsequent analysis. Forensic imaging is the specific technical practice of producing a copy — a bit-for-bit image — that captures every byte of the source so the analysis can happen on the copy without putting the original at risk. The two terms are sometimes used interchangeably; the relationship is that imaging is one common form of acquisition, and the most defensible form when the situation supports it.
Almost every other phase of a forensic engagement depends on what happened during acquisition. An incomplete acquisition forfeits artifacts that can never be recovered. A non-reproducible acquisition produces findings that cannot be cross-examined. An acquisition that altered the source breaks the chain of custody at the moment it most needed to hold. The acquisition step is one of two points in a forensic engagement (the other being the report) where work done badly cannot be repaired by doing more work later.
This page covers the methodology: the order of volatility that governs sequencing, the choice between live and dead acquisition, the image formats and what they capture, the acquisition tools and their tradeoffs, the verification math that proves the acquisition was sound, and the special cases (encrypted volumes, RAID, cloud, mobile, memory) where the standard methodology has to be adjusted. Evidence Handling and Chain of Custody covers the procedural framework that the acquisition sits inside, including the write-blocker methodology and the chain-of-custody documentation requirements that are assumed throughout this page.
The order of volatility
The order-of-volatility principle, articulated in RFC 3227 (2002, “Guidelines for Evidence Collection and Archiving”) and carried forward in later forensic guidance, states that evidence should be collected from the most volatile sources first. The principle exists because some classes of data (memory contents, network connections, running process state) exist only briefly, and capturing them later is impossible.
The canonical ordering, from most to least volatile:
- CPU registers and cache. Effectively impossible to capture in practice; the cache contents are gone by the time any forensic tool runs.
- Memory (RAM). The contents of physical memory at a specific moment. Includes running processes, open network connections, loaded modules, kernel structures, command-line arguments, environment variables, and any unencrypted data the system is currently working with.
- Network state and routing tables. Active connections, ARP cache, routing tables, listening sockets. Some of this is captured as part of a memory image; some has to be captured separately.
- Running processes. The list of executing processes, their command lines, parent-child relationships, open handles. Largely overlapping with memory but sometimes captured via separate tooling for analyst convenience.
- Disk contents, including unallocated and slack space. The full byte stream of attached storage media. Less volatile than memory but still subject to change as the system runs.
- Logging on remote systems. Logs forwarded to a SIEM, syslog server, or cloud log service. Less volatile than local disk because they exist independently of the source system, but subject to retention windows.
- Physical configuration and topology. Network topology, USB devices currently connected, attached peripherals. Captured by photographing and documenting before the system is moved or powered down.
- Backup media. Offline copies, tape archives. The least volatile but also the most likely to be incomplete or stale.
The ordering matters because the act of capturing later-stage data sometimes destroys earlier-stage data. Powering down a system to image its disk eliminates the memory contents. Running acquisition tools on a live system modifies memory and possibly disk. Each acquisition decision constrains the subsequent options, and the constraint goes in one direction: you cannot go back and capture the memory contents from a system you have already powered down.
The order is a default, not a mandate. Specific cases adjust it. If the disk is the only forensic surface that matters (a routine e-discovery acquisition from a sealed laptop), the memory may not be worth capturing. If the system is actively under attack and the immediate priority is containment, the IR team may pull the network cable before any forensic acquisition begins. The principle is that the deviation from the canonical order should be a documented decision, not an accident.
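The sequencing rule, including the requirement that deviations be documented rather than accidental, can be sketched as a small planning helper. This is an illustration only: the `VOLATILITY_RANK` table simply mirrors the list above, and the `plan_collection` function and the source names are hypothetical, not part of any standard or tool.

```python
# Hypothetical sketch: order collection tasks by volatility and record deviations.
# Ranks mirror the canonical list above (lower rank = more volatile).
VOLATILITY_RANK = {
    "cpu_registers_cache": 0,
    "memory": 1,
    "network_state": 2,
    "running_processes": 3,
    "disk": 4,
    "remote_logs": 5,
    "physical_configuration": 6,
    "backup_media": 7,
}


def plan_collection(available, skipped=None):
    """Return (ordered_sources, deviation_notes).

    available: iterable of source names present in this case.
    skipped:   dict mapping a source name to the documented reason for
               not collecting it (a deliberate deviation from the default).
    """
    skipped = skipped or {}
    unknown = [s for s in list(available) + list(skipped) if s not in VOLATILITY_RANK]
    if unknown:
        raise ValueError(f"unknown evidence source(s): {unknown}")
    ordered = sorted(
        (s for s in available if s not in skipped),
        key=VOLATILITY_RANK.__getitem__,
    )
    notes = [f"SKIPPED {s}: {reason}" for s, reason in skipped.items()]
    return ordered, notes


order, notes = plan_collection(
    ["disk", "memory", "remote_logs", "network_state"],
    skipped={"memory": "sealed laptop received powered off; no live state exists"},
)
print(order)   # most volatile remaining source first
print(notes)   # the documented deviation
```

The point of returning the notes alongside the order is that a skipped source never disappears silently; it always leaves a reason behind.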
Live versus dead acquisition
The choice between live acquisition (capturing data from a running system) and dead acquisition (powering the system down and imaging the storage) is the most consequential methodological decision in the acquisition phase. The choice is not always available. Some systems cannot be powered down without unacceptable operational consequences. Some cannot be left running without unacceptable evidentiary consequences. Where the choice exists, the tradeoffs are real.
Dead acquisition is the historical default and remains the most legally clean methodology. The system is powered off (typically by pulling the plug rather than performing a clean shutdown, to avoid the shutdown sequence overwriting forensically interesting state), the storage media is removed (or accessed via a hardware write blocker without removal), and a bit-for-bit image is produced from the static storage. The chain of custody is straightforward; the verification math is clean; the methodology has decades of court acceptance behind it.
Dead acquisition’s costs are equally real. Everything in memory is lost: the running processes, the network connections, the encryption keys held in memory, the in-memory-only malware that never touched disk, the unencrypted contents of files that are encrypted at rest. For systems with full-disk encryption (BitLocker, FileVault, LUKS), dead acquisition produces an image whose contents cannot be examined without recovering the key, which may not be possible from the static image alone.
Live acquisition captures memory and live system state before (or sometimes instead of) the static storage. The methodology requires running acquisition tools on the live system, which inherently modifies the system being examined. The act of running an acquisition tool allocates memory, opens files, creates processes, writes to disk if the tool persists output locally, and may trigger antivirus, EDR, or other monitoring tools that themselves modify state.
The live acquisition tradeoff is between completeness (capturing data that dead acquisition cannot) and rigor (introducing variables that dead acquisition does not). The methodology mitigations are well-established: use known, validated acquisition tools; record the exact tool, version, and invocation; write output to external storage rather than the source; document every action taken on the live system; capture in a sequence that minimizes the perturbation cost.
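One way to make "record the exact tool, version, and invocation" concrete is a structured log entry written for every action on the live system. The sketch below is an assumption about what such a record might contain; the `LiveAction` fields, the `to_log_line` helper, and the tool name and command line in the example are all hypothetical placeholders rather than a real tool's syntax.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LiveAction:
    """One action taken on the live system (field names are illustrative)."""
    tool: str
    version: str
    invocation: str
    output_location: str   # should point at external storage, not the source
    examiner: str
    timestamp_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(action: LiveAction) -> str:
    """Serialize one action as a single JSON line for an append-only log."""
    return json.dumps(asdict(action), sort_keys=True)


entry = LiveAction(
    tool="example-memory-tool",                       # placeholder name
    version="0.0.0",                                  # placeholder version
    invocation="example-memory-tool E:\\case01\\mem.raw",  # placeholder syntax
    output_location="E:\\case01\\mem.raw",
    examiner="J. Doe",
)
print(to_log_line(entry))
```

Writing one line per action, in the order the actions were taken, also preserves the capture sequence, which matters when the perturbation caused by each step has to be explained later.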
The modern operational default in DFIR contexts is live triage first, dead-box imaging second. The responder captures memory and a curated set of live artifacts using rapid triage tools (KAPE, Velociraptor, or equivalent), then arranges for a full disk image while the system is held for offline analysis. The triage data informs decisions about containment and scope; the full disk image supports the longer analytical process.
In pure-evidence contexts (law enforcement, internal investigations not involving an ongoing incident), dead acquisition remains the default. The completeness loss from skipping memory is acceptable when the case does not require the live state.
In cloud and ephemeral compute contexts, the live-versus-dead distinction breaks down. There is no physical storage to image. The acquisition methodology becomes API-driven: snapshot the EBS volume, export the disk image from the cloud provider, capture the audit logs from the control plane, request memory dumps from the hypervisor if the provider supports them. The verification math still applies; the procedural workflow is different. The Cloud Forensics subpage covers this case in depth.
The bit-for-bit standard
The acquisition methodology produces a bit-for-bit image of the source: a copy that captures every byte of the source, in order, without interpretation. The image includes:
- All allocated content (files, file system metadata, system areas).
- All unallocated content (regions the file system marks as free, which often contain remnants of previously-deleted data).
- Slack space (the unused portion of the last cluster of a file, which contains residual data from prior file system state).
- File system structures (the MFT on NTFS, the inode tables on ext4, the file system B-tree on APFS).
- Boot sectors, partition tables, and other system areas outside the file system.
The bit-for-bit standard exists because anything short of it is challengeable on completeness grounds. A logical copy (a file-level backup, an rsync, a tar archive) acquires only the allocated, visible content, which forfeits any data in unallocated space, slack, or file system metadata. Forensic examiners use logical acquisitions only when bit-for-bit acquisition is impossible (encrypted cloud storage where only the decrypted user-facing view is available, for example), and the methodology then has to acknowledge what the logical acquisition could not capture.
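The slack-space item in the list above is simple arithmetic, and working it through shows how much residual data a logical copy can leave behind per file. The sketch assumes a file stored in whole clusters with a known cluster size; it ignores special cases such as very small files stored inside file system metadata, and the function name is hypothetical.

```python
def file_slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes between the end of a file's data and the end of its last cluster."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


# A 10,000-byte file on 4,096-byte clusters occupies three clusters (12,288 bytes).
print(file_slack_bytes(10_000))   # 2288 bytes of slack in the third cluster
print(file_slack_bytes(8_192))    # 0: the file ends exactly on a cluster boundary
```

A file-level copy transfers the 10,000 bytes and nothing else; the remaining 2,288 bytes exist only in a bit-for-bit image.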
The bit-for-bit image is the working copy. The source is sealed and stored. All subsequent analysis happens on the image. If the analytical work modifies the image (file carving, slack-space extraction, in-place decryption), the modification is performed on a copy of the image, not on the image itself. The image is treated as if it were the source.
Image formats
The forensic image format determines what is captured alongside the bit stream itself. The major formats:
Raw / dd format. The simplest format: a literal byte-for-byte copy of the source media, written to a file (or split into segments). No metadata, no compression, no integrity protection within the format itself. Hash values are computed externally and recorded in a sidecar file or in the chain of custody. The format is universally readable (any tool that can read a file can read a raw image) and is the default for open-source workflows. The drawback is that the integrity protection lives outside the image; if the hash sidecar is lost, the image’s integrity cannot be independently verified.
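Because a raw image carries no integrity data of its own, the sidecar has to be produced deliberately. The following is a minimal sketch of one possible sidecar, using only the Python standard library; the JSON layout, the `hash_stream` and `build_sidecar` names, and the example file name are assumptions for illustration, not a standard sidecar format.

```python
import hashlib
import io
import json


def hash_stream(stream, algorithms=("md5", "sha1", "sha256"), block_size=1024 * 1024):
    """Read a binary stream once; return its size and one hex digest per algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    size = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        size += len(block)
        for h in hashers.values():
            h.update(block)
    return {"size_bytes": size, **{name: h.hexdigest() for name, h in hashers.items()}}


def build_sidecar(image_name, stream):
    """Return the sidecar contents as JSON text (layout is illustrative only)."""
    record = {"image": image_name}
    record.update(hash_stream(stream))
    return json.dumps(record, indent=2, sort_keys=True)


# In practice the stream would be open("evidence001.dd", "rb"); an in-memory
# stand-in keeps the example self-contained.
print(build_sidecar("evidence001.dd", io.BytesIO(b"abc")))
```

Recording the size next to the digests is a deliberate choice: a size mismatch is the cheapest possible check and catches truncation before any hashing is repeated.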
EWF (Expert Witness Format), commonly called E01. The format used by EnCase and the dominant format in U.S. commercial forensic practice. The format wraps the bit stream in a container that includes the acquisition metadata (case number, examiner name, evidence number, acquisition date, notes), the hash values (MD5 and SHA-1, with SHA-256 added in the Ex01 variant), and per-chunk CRCs that detect corruption at the chunk level. Compression is optional and is widely used. The format supports segmentation, allowing a large image to be split across multiple files of manageable size.
The EWF format has been the U.S. commercial standard for over two decades and is supported by every major forensic tool. The successor format Ex01 (sometimes called EnCase Evidence File version 2) extends the integrity protection and adds support for AES encryption of the image contents. Ex01 has not displaced E01 in operational use; most engagements still produce E01.
AFF (Advanced Forensic Format) and AFF4. Originally developed by Simson Garfinkel and now maintained as an open standard, AFF and its successor AFF4 are designed for forensic acquisition with stronger integrity protection and more flexible storage. AFF4 in particular supports content-addressable storage, allowing redundant blocks (common in operating system images) to be stored once with multiple references. The format is supported by the open-source forensic ecosystem (Sleuth Kit, Autopsy, plaso) and a subset of commercial tools.
AFF4 has not achieved the deployment of EWF, but it remains the format of choice for high-assurance forensic work and for engagements where storage efficiency on large image populations matters.
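The storage-efficiency argument rests on content-addressable storage: a block is stored under its own hash, so identical blocks are stored once. The sketch below illustrates that general idea only. It is not AFF4's actual container layout, and the fixed 4,096-byte block size and the function names are assumptions.

```python
import hashlib


def store_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks, storing each distinct block once.

    Returns (store, index): store maps digest -> block bytes; index is the
    ordered list of digests needed to rebuild the original byte stream.
    """
    store, index = {}, []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy of a block
        index.append(digest)
    return store, index


def rebuild(store, index) -> bytes:
    """Reassemble the original stream from the index and the block store."""
    return b"".join(store[d] for d in index)


# Five blocks, four of them all zeros (typical of unused disk regions).
source = b"\x00" * 4096 * 3 + b"A" * 4096 + b"\x00" * 4096
store, index = store_blocks(source)
print(len(index), "blocks referenced,", len(store), "blocks stored")
assert rebuild(store, index) == source   # deduplication loses nothing
```

The index preserves order and count, so the reconstructed stream is still bit-for-bit identical to the source even though most of it was stored once.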
SMART / s01. A less common format with similar capabilities to EWF. Supported primarily by ASR Data’s SMART tool and a few open-source readers. Rarely used in new engagements.
Virtual disk formats (VMDK, VHD, QCOW2). When the source is a virtual machine, the existing virtual disk file is often used as the forensic image. The format is the same as the running VM’s disk and is readable by the virtualization platform that produced it and by most forensic tools. The methodology has to handle the case where the VM disk is sparse (uses thin provisioning) or differencing (overlays a base image), since the forensic analysis may need to reconstruct the full effective contents.
The choice of format is partly a tooling question (use the format your downstream tools expect) and partly a methodology question (use a format with built-in integrity protection where the storage horizon is long). The standards do not mandate a specific format. U.S. commercial practice defaults to EWF/E01. Open-source practice defaults to raw dd plus a sidecar, or to AFF4 where the tools support it.
Segmentation matters operationally. A 4 TB source image stored as a single file is awkward to copy, transport, and archive. Standard segment sizes are 1 GB, 2 GB, or 4 GB depending on the file system constraints of the destination storage. EWF and AFF4 segment natively; raw images are segmented with the imaging tool’s split-size option (the option name varies by tool). The segments are hashed individually and as a complete reassembled image; verification at both levels detects corruption either in individual segments or in the reassembly.
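The reason for hashing at both levels can be shown in a few lines. In this sketch the segments are in-memory byte strings rather than files, and `hash_segments` is a hypothetical helper; the example deliberately reassembles the segments in the wrong order to show a fault that per-segment hashes alone would miss.

```python
import hashlib


def hash_segments(segments):
    """segments: byte strings in reassembly order.

    Returns (per_segment_digests, whole_image_digest).
    """
    whole = hashlib.sha256()
    per_segment = []
    for seg in segments:
        per_segment.append(hashlib.sha256(seg).hexdigest())
        whole.update(seg)   # the whole-image hash depends on segment order
    return per_segment, whole.hexdigest()


image = bytes(range(256)) * 40   # 10,240 bytes of stand-in image data
segments = [image[i:i + 1000] for i in range(0, len(image), 1000)]

per_segment, whole = hash_segments(segments)
assert whole == hashlib.sha256(image).hexdigest()

# Reassemble with the first two segments swapped: every segment is intact,
# so the per-segment digests still match, but the whole-image digest does not.
swapped_per, swapped_whole = hash_segments([segments[1], segments[0]] + segments[2:])
print(sorted(swapped_per) == sorted(per_segment), swapped_whole == whole)
```

Per-segment hashes localize damage to a file; the whole-image hash proves the files were put back together correctly. Neither substitutes for the other.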
Acquisition tools
The acquisition tooling landscape is mature and convergent. The major tools and their operational characteristics are as follows.
dd and its forensic variants. The original Unix tool for byte-level copying. The vanilla dd lacks forensic features (no built-in hashing, no acquisition log, and only rudimentary error handling) and is rarely used directly for forensic acquisition in modern practice. The forensic variants are dc3dd (developed by the U.S. Department of Defense Cyber Crime Center) and dcfldd (its predecessor, developed by the DoD Computer Forensics Lab). Both add hashing during acquisition, error handling that records bad sectors rather than aborting, progress reporting, and post-acquisition verification. dc3dd is the current open-source standard for raw imaging.
ewfacquire. Part of the libewf library, this is the open-source tool for producing EWF (E01) images. Supports compression, segmentation, metadata recording, and verification. The standard choice when an open-source workflow needs to produce EWF output for compatibility with commercial tools.
Guymager. A GUI imaging tool for Linux, popular on forensic Linux distributions (CAINE, SIFT). Supports raw, EWF, and AFF formats with multi-threaded acquisition. Strong choice for examiners who prefer GUI workflows but want open-source tooling.
FTK Imager. AccessData’s (now Exterro’s) free imaging tool. Supports raw, EWF (E01 and Ex01), and SMART formats. The GUI application runs on Windows; command-line builds have also been released for Linux and macOS. The most widely used Windows-based imaging tool, partly because it is free, partly because it is integrated with the FTK examination workflow. Has imaging quirks (occasional verification mismatches on very large images, version-specific behavior on encrypted volumes) that the methodology has to account for.
Tableau Imager. OpenText’s GUI imaging tool, designed to pair with Tableau hardware write blockers. Supports the standard formats and integrates the write-blocker status into the acquisition log. The standard choice in EnCase-centric workflows.
EnCase Imager. A subset of EnCase Forensic focused on imaging. Produces EWF images natively. Used in shops where EnCase is the analytical platform.
X-Ways Imager. X-Ways Forensics’ integrated imaging capability. Fast, technically dense, and the tool of choice in examiner workflows where X-Ways is the analytical platform.
KAPE (Kroll Artifact Parser and Extractor). A rapid triage tool rather than a traditional imaging tool. Captures a curated set of artifacts (registry hives, event logs, prefetch, browser history, ShellBags, hundreds more) from a live or dead system in minutes. Output is not a bit-for-bit image but a structured collection of high-value artifacts. KAPE has become the standard tool for live triage in DFIR workflows. The full-image acquisition still happens; KAPE feeds the responder’s immediate analytical needs while the full image is being prepared.
Velociraptor. An open-source endpoint visibility and collection platform. Supports remote artifact collection at scale across an enterprise, including memory acquisition and selective disk acquisition. The fit-for-purpose tool when the acquisition target is “a hundred endpoints in a single investigation” rather than a single system.
Cellebrite, GrayKey, Magnet AXIOM, Oxygen Forensic Detective. Mobile-specific acquisition tools, covered in the Mobile Forensics subpage.
Magnet RAM Capture, Belkasoft Live RAM Capturer, WinPmem, AVML, LiME. Memory acquisition tools, covered below.
The tool choice depends on the operating environment (Windows examiner station versus Linux forensic distribution), the format requirements (does the downstream analysis tool need EWF, AFF4, or raw), and the acquisition scenario (single system imaging versus enterprise-wide collection). Most examiners use multiple tools: dc3dd or Guymager for raw images of physical media, FTK Imager for routine Windows acquisitions, KAPE for triage, Velociraptor for scale.
Verification math
The verification math is the technical mechanism that demonstrates the acquired image is identical to the source. The standard methodology produces hashes at multiple points:
- Hash of the source media, computed during acquisition by the imaging tool reading the source. This is the reference hash that subsequent verifications compare against.
- Hash of the acquired image, computed during acquisition by the same tool writing the image. The two hashes should match. If they do not, the acquisition is invalid and has to be repeated.
- Verification hash, computed after acquisition by re-reading the acquired image. Confirms that the image has not been corrupted by the storage medium it was written to.
- Per-chunk hashes (in EWF and AFF4), allowing corruption to be localized to specific regions of the image rather than detected only as an image-level mismatch.
The dual computation during acquisition (source and image hashed simultaneously, as the data flows through the imaging tool) is the part that proves the image faithfully captured the source. If the imaging tool computes only the image hash, the verification only proves the image is internally consistent. It does not prove the image matches the source. Most modern forensic imaging tools compute both hashes during acquisition; legacy tools or naive dd workflows sometimes do not, and the methodology has to account for the gap.
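The difference between hashing what was read and hashing what was written can be sketched as follows. This is a simplified illustration, not how any particular imaging tool is implemented: the `acquire` function is hypothetical, the image-side hash is obtained here by reading back what was written, and in-memory streams stand in for the source device and the image file.

```python
import hashlib
import io


def acquire(source, image, block_size=1024 * 1024):
    """Copy source to image, hashing the bytes as they are read from the source.

    Afterwards, rewind the image and hash what was actually written.
    Returns (source_digest, image_digest); the two must be equal.
    """
    source_hash = hashlib.sha256()
    while True:
        block = source.read(block_size)
        if not block:
            break
        source_hash.update(block)   # hash of what came off the source
        image.write(block)
    image.flush()

    image.seek(0)
    image_hash = hashlib.sha256()
    while True:
        block = image.read(block_size)
        if not block:
            break
        image_hash.update(block)    # hash of what landed in the image
    return source_hash.hexdigest(), image_hash.hexdigest()


src = io.BytesIO(b"evidence" * 1000)
img = io.BytesIO()                  # a real image file would be opened "w+b"
source_digest, image_digest = acquire(src, img, block_size=512)
print(source_digest == image_digest)
```

If only the second loop existed, a write path that silently dropped or altered a block would still yield a stable, repeatable hash. It would simply be the hash of the wrong data.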
The algorithm choice is covered in the Evidence Handling and Chain of Custody subpage. The short version: MD5 and SHA-1 remain in operational use despite cryptographic weaknesses because the threat model is integrity-against-error, not integrity-against-adversary. Modern practice records SHA-256 alongside the legacy hashes.
When an acquisition produces a hash mismatch (source and image hashes differ, or verification hash differs from acquisition hash), the methodology has to acknowledge and address the mismatch. The standard responses, in order of preference: re-acquire if the source is still accessible; locate the corrupted region using per-chunk hashes and document the affected data; declare the acquisition invalid if the corruption is widespread. Recalculating and silently recording a new “correct” hash is not a permitted response; it is a chain-of-custody failure that destroys the integrity proof.
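Localizing a corrupted region from per-chunk integrity data might look like the sketch below. It assumes per-chunk SHA-256 digests recorded at acquisition time and an arbitrary 32 KiB chunk size; real container formats store their own per-chunk integrity values in their own layout, and the function names here are hypothetical.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int = 32768):
    """Digest of every fixed-size chunk, in order."""
    return [
        hashlib.sha256(data[off:off + chunk_size]).hexdigest()
        for off in range(0, len(data), chunk_size)
    ]


def corrupted_ranges(recorded, data: bytes, chunk_size: int = 32768):
    """Return (start, end_exclusive) byte ranges whose chunk digest no longer
    matches the digest recorded at acquisition."""
    current = chunk_digests(data, chunk_size)
    if len(current) != len(recorded):
        raise ValueError("chunk count changed; image size differs from acquisition")
    return [
        (i * chunk_size, min((i + 1) * chunk_size, len(data)))
        for i, (old, new) in enumerate(zip(recorded, current))
        if old != new
    ]


original = bytes(100_000)                 # stand-in for the image as acquired
recorded = chunk_digests(original)

damaged = bytearray(original)
damaged[40_000] ^= 0xFF                   # flip one byte in storage
print(corrupted_ranges(recorded, bytes(damaged)))   # [(32768, 65536)]
```

The output is the range to document as affected. The recorded digests are never replaced with the new values; the mismatch itself is part of the record.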
Special acquisition cases
The standard bit-for-bit methodology assumes the source is accessible physical storage with a recognizable file system. Several common scenarios deviate from that assumption.
RAID arrays. Hardware and software RAID present multiple physical disks as a single logical volume. Forensic acquisition can target the individual physical disks (preserving the original storage state but requiring the RAID to be reconstructed during analysis) or the assembled logical volume (giving immediate file-system access but requiring the RAID controller or software to be trusted during acquisition). The methodology depends on whether the RAID metadata is preserved and whether the analysis tools can reassemble the array. For software RAID (mdadm on Linux, Windows Storage Spaces), the metadata is on the disks themselves and acquisition of the physical disks preserves all the information needed to rebuild the array offline. For hardware RAID, the controller’s metadata may not be accessible from the raw disks, and acquisition of the assembled volume may be the only option.
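To show what "requiring the RAID to be reconstructed during analysis" involves in the simplest case, the sketch below rebuilds a striped (RAID 0) volume from individually acquired member images. It is deliberately idealized: it assumes the member order and stripe size are already known, that there is no parity, and that no on-disk metadata or data offset precedes the striped region. Real arrays usually violate at least one of those assumptions, and `destripe_raid0` is a hypothetical name.

```python
def destripe_raid0(members, stripe_size):
    """Interleave equal-length member images into the logical RAID 0 volume.

    members: list of bytes objects in array order (disk 0 first).
    stripe_size: stripe unit in bytes.
    """
    if not members:
        raise ValueError("no member images supplied")
    length = len(members[0])
    if any(len(m) != length for m in members) or length % stripe_size:
        raise ValueError("members must be equal length and a multiple of stripe_size")
    out = bytearray()
    for offset in range(0, length, stripe_size):
        for member in members:        # one stripe unit from each disk in turn
            out += member[offset:offset + stripe_size]
    return bytes(out)


disk0 = b"AAAACCCC"
disk1 = b"BBBBDDDD"
print(destripe_raid0([disk0, disk1], stripe_size=4))   # b'AAAABBBBCCCCDDDD'
```

Supplying the members in the wrong order, or with the wrong stripe size, produces a volume of the right length with scrambled contents. That is why the array parameters belong in the acquisition notes alongside the member hashes.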
LVM, ZFS, Btrfs, and other storage virtualization layers. Logical volume management and modern file systems with built-in volume management require the storage layer to be reassembled before the file system is visible. The methodology is to acquire all physical members of the volume group, then reconstruct the logical volume during analysis. The verification math applies to the physical disks; the file system contents are derived from the reconstructed volume.
Full-disk encryption. BitLocker, FileVault, and LUKS encrypt the entire volume below the file system layer. Dead acquisition produces an image of the encrypted ciphertext, which cannot be examined without the key. The methodology options are: recover the key from a live system before powering down (the key is in memory on a running system); obtain the recovery key from the user, the IT department, or the key escrow service; obtain the key through legal process if available; or acknowledge that the image cannot be examined and document the limitation. The Memory Forensics subpage covers in-memory key recovery in detail.
Cloud storage and snapshots. Cloud-resident data is acquired through provider APIs rather than physical access. The methodology produces a cloud snapshot (EBS snapshot in AWS, managed disk snapshot in Azure, persistent disk snapshot in GCP), exports the snapshot as a disk image, and proceeds with standard analysis. The acquisition log records the snapshot ID, the export operation, and the hash of the exported image. The provider’s audit logs (AWS CloudTrail, Azure Activity Log, GCP Cloud Audit Logs) record the snapshot and export operations and serve as the chain-of-custody equivalent for the cloud-side actions.
Container and Kubernetes acquisition. Running containers are ephemeral and acquisition has to happen before the container exits. The standard approach is to acquire the container’s writable layer (the file system changes the container has made since being instantiated from its image), the container’s memory, and the relevant runtime logs. For Kubernetes specifically, the etcd state, the kubelet logs, and the container runtime logs constitute the cluster-level forensic surface; the container-level acquisition is one layer below that.
Virtual machines. When the source is a VM, the existing virtual disk file (VMDK, VHD, QCOW2) can serve as the forensic image, provided the file system permissions allow read-only access and the file is not actively being written to. The methodology either pauses or snapshots the VM to obtain a consistent image, then copies the virtual disk file with verification. The memory of the VM is captured separately, typically through a hypervisor-level memory dump (VMware’s vmss2core, the QEMU memory snapshot, etc.).
Network-attached storage and SAN volumes. Storage accessed via iSCSI, NFS, or SMB requires acquisition methodology that accounts for the network as part of the path. Hardware write blockers do not exist for network-attached storage; the protection mechanism is the storage system’s own access controls (read-only export, snapshot-based access). The acquisition is performed against a snapshot or a read-only mount, and the chain of custody records the access methodology.
Memory acquisition
Memory acquisition is a distinct discipline from disk imaging and deserves separate treatment. The challenges are different: memory is volatile and changes continuously, the acquisition tool itself runs in memory and perturbs the very state it is capturing, and the hardware mechanisms for accessing physical memory vary across platforms.
The Windows memory acquisition tools include Magnet RAM Capture, Belkasoft Live RAM Capturer, FTK Imager (which has a memory acquisition feature), and WinPmem (the open-source standard). The mechanism is a kernel driver loaded specifically for the acquisition, which reads physical memory and writes the contents to external storage. The driver is the trust anchor; a malicious driver could alter the captured memory as it was being read. Modern tools mitigate this by using signed drivers and recording the driver’s hash in the acquisition log.
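The Linux memory acquisition tools include LiME (Linux Memory Extractor) and AVML (Acquire Volatile Memory for Linux), the latter developed by Microsoft. LiME is a loadable kernel module that reads physical memory and writes the contents to a file or network destination. AVML is a userland tool that needs no kernel module built for the target; it reads memory through existing kernel interfaces such as /proc/kcore, /dev/crash, or /dev/mem.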
The macOS memory acquisition story is harder. Apple’s System Integrity Protection and the kernel extension deprecation have made kernel-level memory access progressively more difficult. Modern macOS forensic acquisition often relies on hypervisor-level access (acquiring the memory of a macOS VM) or on user-mode partial captures.
The memory image format is typically raw: a flat dump of physical memory contents, sometimes wrapped in a container (AFF4-Memory, or proprietary formats from specific vendors) that includes the system information needed for subsequent analysis. The standard analytical tool is Volatility, covered in the Memory Forensics subpage.
The memory acquisition workflow is sensitive to ordering. The act of running the acquisition tool allocates memory, which may overwrite the memory regions of interest. The mitigation is to use small-footprint acquisition tools (Magnet RAM Capture’s footprint is on the order of a few MB), to write the output to external storage rather than the source system’s disk, and to run as early in the acquisition sequence as possible: memory first, before any other tool that would allocate or modify memory.
Acquisition failure modes
The following failure modes recur often enough to be worth naming.
The half-acquired image. The acquisition tool encountered an error mid-acquisition, produced a partial image, and the partial image was not recognized as such. The hash matches what the tool computed, but the tool stopped before reading the entire source. The mitigation is to verify the image size against the known source size at the end of the acquisition and to treat any mismatch as a failure.
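The size check is plain arithmetic: the sector count the device reports, multiplied by the sector size, must equal the image size. A minimal sketch, with hypothetical function names and a 512-byte sector size assumed as the default:

```python
def expected_image_size(sector_count: int, sector_size: int = 512) -> int:
    """Size in bytes that a complete raw image of the device must have."""
    return sector_count * sector_size


def assert_complete(image_size: int, sector_count: int, sector_size: int = 512) -> None:
    """Raise if the image size differs from what the source device reports."""
    expected = expected_image_size(sector_count, sector_size)
    if image_size != expected:
        raise RuntimeError(
            f"image is {image_size} bytes, source reports {expected} bytes "
            f"({expected - image_size} bytes unaccounted for); "
            "treat the acquisition as failed"
        )


# A drive reporting 1,953,525,168 sectors of 512 bytes:
print(expected_image_size(1_953_525_168))   # 1000204886016
assert_complete(1_000_204_886_016, 1_953_525_168)   # passes silently
```

The check applies to the uncompressed image size. For a compressed container, the comparison is against the size the container reports for the media, not the size of the container files on disk.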
The unverified source hash. The acquisition tool computed the image hash but not the source hash. The image’s internal consistency is verified, but there is no proof that the image faithfully captures the source. The mitigation is to use tools that compute both hashes during acquisition, and to read the source a second time independently to verify the source hash if necessary.
The skipped bad sector. The source media has unreadable sectors. The imaging tool either aborts (producing an incomplete image), retries indefinitely (never finishing), or skips with a zero-fill (producing an image whose contents differ from the source). The methodology has to choose explicitly: dc3dd and dcfldd log bad sectors and continue with zero-fill, recording the affected ranges. The chain of custody documents the skipped regions. The alternative is a hardware imager (Atola, Tableau) with specialized bad-sector recovery, which produces a more complete image when the bad sectors are recoverable.
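The log-and-zero-fill behavior can be sketched as below. This is an illustration of the policy, not a reimplementation of any tool: `image_with_zero_fill` is a hypothetical function, and `FlakySource` is a test stand-in that simulates unreadable sectors by raising `OSError`, which is how a failed read from a real device would typically surface in Python.

```python
import io


def image_with_zero_fill(source, image, total_size, sector_size=512):
    """Copy sector by sector; on a read error write zeros and log the range.

    Returns a list of (start, end_exclusive) byte ranges that were zero-filled.
    """
    bad_ranges = []
    offset = 0
    while offset < total_size:
        want = min(sector_size, total_size - offset)
        try:
            source.seek(offset)
            data = source.read(want)
            if len(data) != want:
                raise OSError("short read")
        except OSError:
            data = b"\x00" * want
            if bad_ranges and bad_ranges[-1][1] == offset:
                # extend the previous range when bad sectors are adjacent
                bad_ranges[-1] = (bad_ranges[-1][0], offset + want)
            else:
                bad_ranges.append((offset, offset + want))
        image.write(data)
        offset += want
    return bad_ranges


class FlakySource(io.BytesIO):
    """Stand-in for failing media: reads that start in a bad sector raise OSError."""

    def __init__(self, data, bad_sectors, sector_size=512):
        super().__init__(data)
        self._bad = set(bad_sectors)
        self._sector_size = sector_size

    def read(self, size=-1):
        if self.tell() // self._sector_size in self._bad:
            raise OSError("simulated unreadable sector")
        return super().read(size)


src = FlakySource(b"\x11" * 2048, bad_sectors={1, 2})
img = io.BytesIO()
skipped = image_with_zero_fill(src, img, total_size=2048)
print(skipped)   # [(512, 1536)]
```

The image keeps its full length, so every later offset still lines up with the source, and the returned ranges are exactly what the chain of custody has to record as not faithfully captured.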
The encryption surprise. The acquisition completes successfully, the hashes verify, and then the examination discovers that the image is encrypted and the keys were not captured. Dead acquisition of an encrypted system is a complete loss if the keys cannot be recovered separately. The mitigation is to assess encryption status before powering down and to choose live acquisition (or memory-first acquisition) when the system is encrypted.
The triggered system response. The acquisition tool, or the act of attaching the source media to the examiner’s system, triggers a response: antivirus scan, EDR alert, automated remediation, anti-forensics tripwire. The mitigation is acquisition on an isolated system (a dedicated forensic workstation with no network connectivity, antivirus disabled, indexing disabled, automount disabled) and using hardware write blockers to prevent any inadvertent writes that might trigger detection.
The unverified image transport. The image was acquired correctly, the verification hash matched, and then the image was copied to long-term storage without re-hashing. A transport error corrupted the image between the original acquisition workstation and the storage location. The mitigation is to re-verify the hash after every transfer, not just at acquisition.
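Re-verification after a transfer amounts to re-reading the copy at its destination and comparing against the hash recorded at acquisition. The sketch below returns a small record rather than a bare boolean so the result can be appended to the custody documentation; the function name and record fields are assumptions, and an in-memory stream stands in for the transferred file.

```python
import hashlib
import io


def verify_after_transfer(stream, recorded_sha256, block_size=1024 * 1024):
    """Re-hash a transferred image and compare it with the acquisition hash."""
    h = hashlib.sha256()
    size = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        h.update(block)
        size += len(block)
    recomputed = h.hexdigest()
    return {
        "size_bytes": size,
        "recorded_sha256": recorded_sha256.lower(),
        "recomputed_sha256": recomputed,
        "match": recomputed == recorded_sha256.lower(),
    }


acquired = b"image bytes as acquired"
recorded = hashlib.sha256(acquired).hexdigest()

good = verify_after_transfer(io.BytesIO(acquired), recorded)
bad = verify_after_transfer(io.BytesIO(acquired[:-1] + b"X"), recorded)
print(good["match"], bad["match"])   # True False
```

The recorded value passed in must be the one from the acquisition log, not a hash taken from the source side of the copy just before the transfer. Otherwise corruption introduced earlier is silently carried forward.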
The wrong image format for the downstream workflow. The image was acquired as raw dd, the analysis tool expected EWF and chokes on the format mismatch. The acquisition is technically intact but the workflow stalls. The mitigation is to coordinate the format choice with the downstream analytical workflow before acquisition.
Validation and tool testing
The NIST Computer Forensics Tool Testing (CFTT) program is the standards-body validation effort for forensic acquisition tools. The program tests imaging tools against documented requirements (does the tool produce a complete image, does it handle bad sectors as specified, does it report errors correctly), publishes the results, and provides the methodological support that holds up in court when the tool’s behavior is questioned.
CFTT reports exist for the major imaging tools (FTK Imager, EnCase Imager, Guymager, dc3dd) and for the major hardware write blockers (Tableau, WiebeTech). The reports are public and can be cited in expert testimony to establish that a particular tool, in a particular version, has been independently tested for the specific acquisition behaviors at issue in the case.
The CFTT program is one of the strongest answers to a Daubert challenge to forensic tooling. The methodology question “did your imaging tool produce a complete and accurate image” is answered, in part, by reference to the CFTT validation of the tool used. The validation is not absolute proof (tools have version-specific bugs that may not be caught by the CFTT testing), but it is the strongest available structural support for tool reliability.
Examiner-level validation supplements the CFTT testing. The standard practice is to verify each acquisition tool’s behavior in the examiner’s specific environment before using it for evidence work: image a test source whose contents are known, compare the result against expectation, document the validation. The validation record is part of the methodology that supports the examination.
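A test source "whose contents are known" is most useful when every sector identifies itself, so that a dropped, duplicated, or misplaced sector is visible immediately. The sketch below builds such a pattern and checks an image against it. The tag layout and function names are assumptions for illustration, not the layout used by any published test suite.

```python
def make_test_source(sector_count: int, sector_size: int = 512) -> bytes:
    """Build a test source in which each sector starts with its own sector number."""
    out = bytearray()
    for lba in range(sector_count):
        tag = f"LBA{lba:012d}".encode("ascii")        # 15-byte self-identifying tag
        out += tag + b"\xAA" * (sector_size - len(tag))
    return bytes(out)


def misplaced_sectors(image: bytes, sector_size: int = 512):
    """Return the sector indexes whose content does not match the test pattern."""
    count = len(image) // sector_size
    expected = make_test_source(count, sector_size)
    return [
        i for i in range(count)
        if image[i * sector_size:(i + 1) * sector_size]
        != expected[i * sector_size:(i + 1) * sector_size]
    ]


source = make_test_source(8)
print(misplaced_sectors(source))            # [] for a faithful image

# Simulate a tool fault that swaps sectors 2 and 5.
s = 512
faulty = bytearray(source)
faulty[2 * s:3 * s], faulty[5 * s:6 * s] = source[5 * s:6 * s], source[2 * s:3 * s]
print(misplaced_sectors(bytes(faulty)))     # [2, 5]
```

A hash comparison would also flag the faulty image, but only as "different". The self-describing pattern says which sectors went wrong, which is the detail a validation record needs.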
The acquisition step is the part of a forensic engagement where rigor pays off most disproportionately. An acquisition done well sets up everything that follows. An acquisition done poorly limits everything that follows in ways that cannot be repaired. The methodology choices that drive the rest of the case sit here: order of volatility, live versus dead, image format, acquisition tool, verification math.
The connected pages cover the procedural and analytical sides of the work the acquisition supports: Evidence Handling and Chain of Custody covers the procedural framework that the acquisition sits inside; Disk and File System Forensics covers the analysis of the resulting disk images; Memory Forensics covers the analysis of memory captures; and Court Admissibility and Expert Testimony covers the legal framework that the acquisition methodology is ultimately serving. The Digital Forensics hub covers the discipline as a whole.