§ Trackr.Live

Malware Analysis

Malware analysis is the analytical discipline of examining malicious code to understand what it does, how it does it, and what evidence it leaves behind. The discipline shares its conceptual foundations with reverse engineering. Both involve recovering the behavior of a binary from its compiled form. But the operational orientation is different. Malware analysis is defender-aligned, focused on producing the indicators of compromise, the behavioral summary, and the family attribution that downstream incident response and threat intelligence consume. The work happens at the intersection of forensics (which surfaces the binaries from disk and memory captures), reverse engineering (which extracts the behavior), and threat intelligence (which contextualizes the family and the campaign).

This page covers the discipline as it actually exists: the static analysis workflow (examining the binary without execution), the dynamic analysis workflow (controlled execution in sandboxes and debuggers), the disassembler / decompiler ecosystem (Ghidra, IDA Pro, Binary Ninja, radare2), the sandboxing landscape (Cuckoo, CAPE, Any.run, Joe Sandbox), the anti-analysis arms race that has shaped both sides for two decades, configuration extraction from C2-bearing binaries, the YARA classification system that underlies most modern malware identification, the ransomware-specific patterns including the cryptographic mistakes that occasionally enable decryption, and the report-level output that the analysis produces.

Memory forensics surfaces what the malware did on the victim system; network forensics surfaces what it did on the wire; disk forensics surfaces what it persisted. Malware analysis goes one layer deeper: given the binary itself, what does it do in general, not just what did it do in this specific case. The output is durable knowledge that applies across every system the same binary touches (IOCs, behavioral signatures, family attribution, detection rules) rather than the per-incident reconstruction that the other forensic disciplines produce.

The malware taxonomy

A working taxonomy of what malware analysts encounter, because the analytical workflow varies by category.

Trojans are programs that masquerade as legitimate software while performing malicious actions. The category is the broadest in modern malware; most observed malware fits the trojan label at some level. Subcategories include downloaders (which fetch and execute additional payloads), droppers (which carry embedded payloads they extract and run), and loaders (which decode and run encrypted payloads).

Remote Access Trojans (RATs) provide remote control of the infected system. The category includes both commodity tools (Quasar, NjRAT, AsyncRAT) and bespoke nation-state implants. RATs typically have a command-and-control protocol, a set of remote-administration capabilities (file transfer, command execution, screenshot, keylog, webcam, microphone), and persistence mechanisms.

Beacons are a subset of RATs designed for periodic check-in with C2 rather than continuous control. Cobalt Strike beacons are the dominant example; Sliver, Mythic, and Brute Ratel have grown alongside it. Beacons are typically reflectively loaded into legitimate processes and operate from memory; the disk artifact is often just the initial loader.

Ransomware encrypts the victim’s data and demands payment for decryption. Modern ransomware operations also exfiltrate data before encryption (double extortion) and threaten public release. The analytical interest in ransomware is partly the encryption scheme (is decryption possible without paying?) and partly the family attribution that supports the broader response.

Wipers destroy data without encrypting it. The category overlaps with ransomware operationally (wipers are sometimes ransomware with no functioning recovery path); the analysis focuses on the destructive routine and the trigger conditions.

Infostealers harvest credentials, browser-stored data, cryptocurrency wallets, and other valuable data, exfiltrating to attacker-controlled infrastructure. RedLine, Vidar, Raccoon, and Lumma are the dominant commodity infostealers; the bespoke nation-state equivalents are tracked separately.

Rootkits modify the operating system to hide the malware’s presence. The category divides into user-mode rootkits (DLL injection, API hooking, hidden processes) and kernel-mode rootkits (driver-based hiding, kernel object manipulation). Kernel-mode rootkits are increasingly rare on Windows because of driver signing requirements; user-mode rootkits remain common.

Bootkits and firmware implants persist below the operating system layer. The category includes UEFI implants (BlackLotus, MoonBounce), MBR-resident malware (historical), and the various firmware-resident techniques. Analysis requires the equivalent of bare-metal forensics; the methodology is different from standard binary analysis.

Banking trojans target online banking sessions, typically through web injection or browser hooking. The category has been shrinking as banking moves to mobile, but specific families (TrickBot, Emotet, IcedID, QakBot) have repeatedly resurfaced or been replaced by successors.

Cryptojackers mine cryptocurrency on the victim’s hardware. The analytical work is typically straightforward; the operational interest is which mining pool the cryptojacker connects to and what coverage exists for blocking it.

Mobile spyware targets iOS and Android devices. The category includes commodity stalkerware (FlexiSpy, mSpy and equivalents) and high-end commercial spyware (Pegasus from NSO, Predator from Intellexa, Reign from QuaDream). Mobile platforms are where most analysis of commercial-grade spyware takes place.

Worms propagate without user interaction, typically by exploiting network vulnerabilities. The category has shrunk substantially since the WannaCry / NotPetya era; modern incidents involving worm-like propagation are usually scripted by the attacker rather than autonomous.

The analytical workflow adjusts by category: ransomware analysis focuses heavily on the encryption routine; beacon analysis focuses on the C2 protocol and configuration; infostealer analysis focuses on what’s stolen and where it goes; rootkit analysis focuses on the hooking and hiding mechanisms.

Static analysis

Static analysis is the examination of the binary without executing it. The technique is the lowest-cost starting point. The binary can be examined without risk of infection or alteration to the analyst’s system, and it produces substantial output before dynamic analysis is needed.

File identification. The first question for any sample is “what is this.” File magic bytes identify the format (PE for Windows, which begins with the legacy MZ DOS header; ELF for Linux; Mach-O for macOS; ZIP-based containers for Android APKs, Office documents, and many archives). The file utility produces a basic identification; more specialized tools (Detect It Easy / DIE, PEiD, TrID) identify packers, compilers, and language runtimes.
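
The magic-byte check can be sketched in a few lines. This is a minimal illustration, not a replacement for file or DIE: the signature table is a small hand-picked subset, and the function name identify is an assumption made for the example.

```python
# Minimal magic-byte triage. The table is illustrative, not exhaustive.
MAGIC_SIGNATURES = [
    (b"MZ", "PE/DOS executable (MZ header)"),
    (b"\x7fELF", "ELF"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"\xca\xfe\xba\xbe", "Mach-O universal binary or Java class"),
    (b"PK\x03\x04", "ZIP container (APK, JAR, OOXML, archive)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Office)"),
    (b"%PDF", "PDF"),
]


def identify(data: bytes) -> str:
    """Return a coarse format label based on the leading bytes."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A ZIP match only says the sample is a container; telling an APK from an Office document requires looking at the entries inside it.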

Hash lookups. SHA-256, SHA-1, and MD5 of the sample are computed and looked up against known-malware databases. VirusTotal is the public-access standard; commercial threat intelligence platforms (Mandiant Advantage, Recorded Future, CrowdStrike Falcon Intelligence) provide additional context. The answer to “is this known” determines whether the analysis is rediscovery or genuinely new.
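
Computing the three digests is a one-liner per algorithm with the standard library. A minimal sketch, assuming the sample is already loaded into memory; the function name hash_sample is hypothetical, and the lookup against VirusTotal or a commercial platform is deliberately left out.

```python
import hashlib


def hash_sample(data: bytes) -> dict:
    """Compute the digests typically used for known-malware lookups."""
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }
```

SHA-256 is the value to pivot on; MD5 and SHA-1 are kept mainly because older reports and feeds still index by them.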

Fuzzy hashing. ssdeep, TLSH, and similar fuzzy hashes match samples with substantially-similar bytes: the same family with minor variations, the same packed payload with different stubs. The technique is useful for clustering samples even when exact hashes differ. PE-specific equivalents (imphash, vhash, rich header hashes) identify samples by their import tables, version metadata, or compiler signatures.
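
To make the imphash idea concrete, the sketch below computes an import hash from an already-parsed import list. It follows the commonly described scheme (lowercase names, library extension stripped, comma-joined, MD5) and assumes every import is by name; ordinal imports, which real implementations such as pefile resolve through lookup tables, are not handled here.

```python
import hashlib


def imphash(imports: list[tuple[str, str]]) -> str:
    """MD5 over the ordered 'library.function' pairs of the import table."""
    parts = []
    for dll, func in imports:
        dll = dll.lower()
        # Strip the library extension so KERNEL32.dll and kernel32 agree.
        for ext in (".dll", ".ocx", ".sys"):
            if dll.endswith(ext):
                dll = dll[: -len(ext)]
                break
        parts.append(f"{dll}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```

Because the import order is part of the input, two builds of the same source linked in the same way tend to share an imphash even when their file hashes differ.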

Strings extraction. Running strings against the sample produces a list of printable byte sequences. This cheap first pass surfaces URLs, file paths, registry keys, error messages, command lines, and any string that the malware did not bother to obfuscate. Modern malware often obfuscates strings to defeat this; the FLOSS tool (Mandiant) goes further than strings by also recovering stack strings and strings that the sample decodes at runtime, which it finds by emulating the decoding routines.
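
What the strings pass does can be sketched as two regular expressions, one for ASCII and one for UTF-16LE (the “wide” strings common in Windows binaries). This is a simplified stand-in for the real utility, and the minimum length of 4 is the conventional default rather than anything the sample dictates.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII runs, then UTF-16LE runs, of at least min_len chars."""
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return found
```

Obfuscated strings never appear in this output, which is exactly the gap that emulation-based tools are meant to close.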

PE / ELF / Mach-O header analysis. The binary format header reveals substantial structural information. For Windows PE files, the header includes the entry point address, the section layout (.text, .data, .rdata, .rsrc, .reloc), the import table (which APIs the binary calls), the export table (which functions the binary exposes), the resources (icons, dialogs, embedded data), and the timestamps. Tools like PE-bear, pestudio, CFF Explorer, and the open-source pefile library parse these headers; analysts use them to characterize the binary before opening a disassembler.
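
For readers who want to see where these fields live, the sketch below pulls the entry point, timestamp, and section table out of a PE image using only fixed offsets from the PE/COFF layout. It is a teaching aid under simplifying assumptions (well-formed headers, no bounds hardening); real work should use pefile or one of the GUI tools, and the function name parse_pe_header is hypothetical.

```python
import struct
from datetime import datetime, timezone


def parse_pe_header(data: bytes) -> dict:
    """Extract a few header fields from a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the 'PE\0\0' signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    coff = pe_offset + 4  # 20-byte COFF file header
    machine, n_sections, timestamp, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, coff
    )
    optional = coff + 20
    # AddressOfEntryPoint sits 16 bytes into the optional header (PE32 and PE32+).
    (entry_point,) = struct.unpack_from("<I", data, optional + 16)

    sections = []
    table = optional + opt_size  # section table follows the optional header
    for i in range(n_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i
        )
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_address": vaddr,
            "virtual_size": vsize,
            "raw_size": raw_size,
            "raw_pointer": raw_ptr,
        })

    return {
        "machine": machine,
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "entry_point": entry_point,
        "sections": sections,
    }
```

The compile timestamp is attacker-controlled and frequently forged or zeroed, so it is a lead to corroborate rather than a fact.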

The import table is particularly forensically valuable. A binary that imports CreateRemoteThread, WriteProcessMemory, and VirtualAllocEx is positioned to inject code into other processes. A binary that imports CryptEncrypt, CryptGenKey, and CryptAcquireContext is positioned to do cryptographic operations. The import table is not proof of behavior (the binary may import APIs it never calls), but it is strong evidence of what the binary could do.
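
The reasoning above amounts to matching sets of API names against known combinations. A minimal sketch, using only the two combinations named in the text; the table, the labels, and the function names are assumptions for illustration, and tools such as capa implement far richer versions of the same idea.

```python
# API combinations taken from the paragraph above; labels are illustrative.
CAPABILITY_HINTS = {
    "process injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
    "cryptography (CryptoAPI)": {"CryptAcquireContext", "CryptGenKey", "CryptEncrypt"},
}

_KNOWN_APIS = set().union(*CAPABILITY_HINTS.values())


def _normalize(name: str) -> str:
    """Map ANSI/wide variants (CryptAcquireContextW) to the base API name."""
    if name not in _KNOWN_APIS and name[-1:] in ("A", "W") and name[:-1] in _KNOWN_APIS:
        return name[:-1]
    return name


def capability_hints(imports: set[str]) -> list[str]:
    """Return the labels whose full API set appears in the import table."""
    names = {_normalize(n) for n in imports}
    return sorted(label for label, apis in CAPABILITY_HINTS.items() if apis <= names)
```

An empty result is not reassuring on its own: binaries that resolve APIs at runtime through GetProcAddress or hash-based lookups show none of these names in the import table.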

Disassembly. Converting the binary’s machine code into assembly language is the foundational reverse-engineering step. The major disassemblers:

  • Ghidra (NSA, released open-source in 2019) is the dominant free disassembler with decompilation. The tool covers x86, x86-64, ARM, MIPS, PowerPC, and many other architectures, with a substantial extension ecosystem. Ghidra has displaced IDA Pro as the default choice for new analysts because of its price (free) and its decompiler quality.
  • IDA Pro (Hex-Rays) is the long-running commercial standard, with the most extensive analysis features and the most mature decompiler (the Hex-Rays decompiler is widely considered the best in the industry). IDA Pro is expensive enough that many analysts use Ghidra by default and reach for IDA Pro for specific situations.
  • Binary Ninja (Vector 35) is a newer commercial disassembler with a strong API and scriptable workflow. The tool has gained adoption among analysts who value the API ergonomics.
  • radare2 and Cutter are open-source command-line and GUI alternatives. Less polished than Ghidra but valuable for scripted analysis workflows.
  • Hopper is a macOS-focused disassembler with decompilation, useful for Mach-O analysis.
  • objdump and readelf are the standard Unix tools for basic ELF disassembly and structure dumping.

Decompilation. Recovering C-like source code from disassembly is the analytical productivity multiplier. The major decompilers (Hex-Rays in IDA Pro, Ghidra’s decompiler, Binary Ninja’s, RetDec) produce output of varying quality depending on the binary’s compiler, optimization level, and obfuscation. Decompilation does not recover the original source code (variable names are synthetic, control flow may be reorganized, optimizations are not always reversed), but the output is substantially faster to read than raw assembly.

Control flow analysis. The disassembler produces a control-flow graph (CFG) that shows the branches, loops, and function calls. Analysts use the CFG to identify the binary’s main control loop, the suspicious functions (those with cryptographic imports, network imports, or file-system imports), and the relationships between functions.

Packing and obfuscation

A substantial fraction of modern malware is packed: the original binary is compressed or encrypted, with a small stub that unpacks the original code at runtime. Static analysis of the packed form reveals the stub but not the unpacked payload; analyzing the unpacked code requires either dynamic analysis to capture the runtime state or manual unpacking.

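Common packers. UPX is the best-known and most common packer; an unmodified UPX binary is trivially detected and unpacked with upx -d, although malware authors sometimes tamper with the UPX headers to break the stock unpacker. Beyond UPX, the landscape includes: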

  • Themida and VMProtect are commercial protectors that combine packing with code virtualization. Both are widely used by malware that wants to defeat static analysis; both produce binaries that are substantially harder to analyze than UPX-packed code.
  • ASPack, MPRESS, Petite, Enigma Protector are additional packers in active use.
  • Custom packers are common in nation-state and bespoke malware: the packer is unique to the malware family, with no public unpacker available.

Identifying packed code. Several signatures suggest packing: a single small section that contains all the imports and entry-point code (with empty or near-empty other sections), high entropy in the body of the binary (encrypted content has near-random entropy), an unusually small import table (most packers resolve imports at runtime rather than statically), and the entry point in an unusual section. Tools like DIE and PE-bear report these characteristics.
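
The entropy signal is simple to compute. The sketch below measures Shannon entropy in bits per byte; the 7.0 cutoff in is_high_entropy is a common rule of thumb adopted here as an assumption, not a value taken from any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 for uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def is_high_entropy(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic flag for compressed or encrypted content (threshold is an assumption)."""
    return shannon_entropy(data) > threshold
```

Running this per section rather than over the whole file is usually more informative, since a packed payload in one section can be diluted by low-entropy headers and padding elsewhere.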

Manual unpacking. When no public unpacker exists, the standard technique is dynamic unpacking: run the packed binary under a debugger, allow the unpacker stub to execute, and dump the unpacked code from memory once it has been written to its execution address. The tools (Scylla for import-table reconstruction, x64dbg with the ScyllaHide plugin for anti-debug evasion, MegaDumper) support the workflow. The result is a binary that can be statically analyzed against the unpacked content.

Code virtualization is a particular form of obfuscation that converts native code into bytecode for a custom virtual machine, with the VM interpreter bundled into the binary. The result is that disassembly shows the VM interpreter, but the malware’s actual logic is in the bytecode that the VM executes. Analyzing virtualized code requires reverse-engineering the VM itself first, substantial work that the analyst sometimes skips by switching to dynamic analysis.

Dynamic analysis

Dynamic analysis runs the malware in a controlled environment to observe its behavior. The technique complements static analysis by revealing what the code actually does when executed, which catches behaviors that the static analysis would miss (runtime API resolution, packed code, network behavior, conditional logic that doesn’t fire on the analyst’s path).

Sandboxing. Automated sandboxes execute the sample in an instrumented virtual machine and produce a structured behavioral report. The major sandboxes:

  • Cuckoo Sandbox is the original open-source sandbox, with a substantial deployment base. The project’s active maintenance has been intermittent; successors (the CAPE Sandbox fork and the Cuckoo3 rewrite) carry the work forward.
  • CAPE Sandbox (Configuration and Payload Extraction) is a Cuckoo fork focused on configuration extraction from malware. CAPE has become the de facto open-source standard, with built-in extractors for many common families.
  • Any.run is a commercial cloud-based sandbox with an interactive interface. The analyst can interact with the sample as it runs, which is useful for malware that requires user actions to fully execute.
  • Joe Sandbox is a commercial sandbox with strong reporting and a large coverage matrix.
  • Hatching Triage (now Recorded Future Triage) is a commercial cloud sandbox positioned for high-throughput automated analysis.
  • VMRay is a commercial sandbox using a different architecture (hypervisor-based observation rather than agent-based) that is harder for malware to detect.

Sandbox output typically includes: the executed process tree, the file system changes, the registry changes, the network connections attempted (with the DNS queries and the HTTP requests), the API calls made (sequence and parameters), the loaded modules, the dropped files, and any detected behaviors flagged by built-in rules.

Sandbox evasion is the arms-race counterpart. Modern malware checks for sandbox indicators before executing the malicious payload: specific MAC addresses, BIOS strings, CPU vendor IDs, process names of monitoring tools, registry keys created by virtualization software, file artifacts (VirtualBox guest additions, VMware tools), timing checks (operations that take much longer in a hypervisor), interactive elements (mouse movement, keyboard input), and the presence of standard sandbox infrastructure. Sandbox operators counter with hardening, replacing the indicators that malware checks for, and the arms race continues.

The practical reality is that sandbox results have to be evaluated for evasion. A sample that produces minimal behavior in a sandbox may be evading detection rather than being benign. The mitigation is to run samples in multiple sandboxes with different configurations and to follow up with manual analysis when sandbox results look suspiciously thin.

Behavioral monitoring in analysis VMs and on bare metal. When sandboxing is not sufficient (heavy evasion, hardware-specific behaviors, samples that require domain-joined environments), analysts execute samples in dedicated analysis VMs, or on physical machines when the sample refuses to run under virtualization, with monitoring tools attached. The standard toolset:

  • Process Monitor (Procmon) from Sysinternals captures file system, registry, network, and process activity at the kernel level. Procmon is the workhorse tool for behavioral observation on Windows.
  • Process Hacker / System Informer is the open-source alternative to Process Explorer, providing per-process examination, memory inspection, and thread enumeration.
  • API Monitor captures Windows API calls per process, including parameters and return values.
  • Sysmon (System Monitor) is the Sysinternals logging service, backed by a kernel driver, that records process creation, network connections, file creates, registry modifications, and several other event types. Sysmon’s output goes to the Windows Event Log and is the standard input for many detection rules.
  • Wireshark captures network traffic from the sandboxed machine.
  • INetSim simulates internet services (DNS, HTTP, FTP) for offline analysis. INetSim answers as if it were the internet, allowing the malware to proceed through its network-dependent behaviors without actually reaching real C2 infrastructure.

Debugging. Live debugging allows the analyst to step through the malware’s execution, set breakpoints, examine memory at specific points, and intervene in the control flow. The standard debuggers:

  • x64dbg (and x32dbg for 32-bit) is the open-source Windows debugger that has displaced OllyDbg as the default. The ScyllaHide plugin provides anti-anti-debug capabilities.
  • WinDbg is Microsoft’s official debugger, with strong kernel debugging support. WinDbg Preview is the modern UI; the classic WinDbg remains in use.
  • OllyDbg is the legacy 32-bit Windows debugger, still in occasional use for legacy malware.
  • gdb with pwndbg or gef extensions is the Linux standard. The IDA Pro debugger and the Ghidra debugger integrations are also available.

Anti-debug techniques that malware uses include checking IsDebuggerPresent, examining NtGlobalFlag in the PEB, inspecting the debug registers for hardware breakpoints, using RDTSC timing checks, and registering structured exception handlers that detect single-step exceptions. The anti-anti-debug techniques (ScyllaHide, x64dbg’s anti-anti-debug plugins, manual patching of the checks) counter each technique with a specific bypass.

The reverse engineering workflow

A typical workflow for analyzing an unknown binary, in rough order.

  1. Triage. Hash the sample, look up the hash, run it through a sandbox if one is available, scan with YARA rules. The output is either “known family, here’s the family report” or “unknown, proceed to manual analysis.”
  2. Static identification. Run file, DIE, PE-bear, FLOSS strings. Identify the file type, the language and compiler, any obvious packing, the import table, the resources. The output is a characterization of what the binary looks like before execution.
  3. Static analysis in a disassembler. Open in Ghidra or IDA Pro. Identify the entry point, the main function, the suspicious functions (typically those calling imports of interest), the strings of interest. Begin building a mental model of the binary’s structure.
  4. Dynamic analysis with monitoring. Execute the sample in a sandbox or instrumented VM. Capture the behavioral output. Correlate the observed behaviors with the static analysis: which function did the API call come from, which path through the code produced the file drop, what triggered the network connection.
  5. Targeted reverse engineering. Focus on the specific behaviors of interest: the configuration extraction, the encryption routine, the C2 protocol, the persistence mechanism. The static analysis is targeted at specific functions identified in the dynamic analysis.
  6. Configuration extraction. For malware with embedded configuration (C2 URLs, encryption keys, behavior flags), extract the configuration from the binary. The result is concrete IOCs that the report can include.
  7. YARA rule writing. Produce YARA rules that match the family or the specific sample, for use in subsequent hunting.
  8. Report writing. Produce the structured output for downstream consumption: IOCs, behavioral summary, ATT&CK mapping, family attribution, decryption possibility (for ransomware).

The workflow is iterative. The dynamic analysis informs the static analysis, the static analysis informs the next dynamic run, the configuration extraction sometimes requires additional reverse engineering, and so on. Skilled analysts move through the steps in different orders depending on what the binary reveals at each stage.

Configuration extraction

A substantial subset of modern malware embeds its configuration in the binary: the C2 URLs, the campaign identifiers, the encryption keys, the behavioral flags. Extracting the configuration is one of the highest-value analytical outputs because it produces immediate, actionable IOCs.

The pattern. Most malware families have a configuration block stored in the binary in encrypted or obfuscated form. At runtime, the malware decodes the configuration and uses it to control behavior. The configuration block typically includes:

  • C2 URLs or IP addresses, often as a primary plus several fallbacks.
  • The communication protocol or protocol variant.
  • The campaign identifier (used by the operator to track infections back to specific campaigns).
  • Encryption keys for the C2 communication.
  • Behavior flags (whether to persist, whether to spread, whether to wipe).
  • The sleep interval between check-ins.
  • The user-agent or other protocol-level identifiers.

Family-specific extractors. The open-source community has built configuration extractors for most major malware families. CAPE Sandbox includes hundreds of extractors. malduck (CERT.PL) is a framework for writing configuration extractors. mwcfg is a modular extraction utility built on top of malduck. Family-specific tools exist for high-prevalence malware: 1768.py (Didier Stevens) for Cobalt Strike beacons, CobaltStrikeScan for the same, trickbot_config for TrickBot, and many others.

The workflow. Identify the family (through YARA matching or behavioral characteristics), find or write the configuration extractor, run it against the sample, parse the output. The result is a set of IOCs in a structured format (domain names, IP addresses, campaign codes, key material) that the report can consume.

When no public extractor exists, the analyst reverse-engineers the configuration decoding routine in the binary, identifies the encoding (XOR with a key, RC4, AES, custom encoding), implements the decoder externally, and applies it to the encoded configuration block. The resulting decoded configuration produces the same IOCs as a public extractor would.
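
As an illustration of the manual path, the sketch below handles the simplest case named above: a configuration block XORed with a repeating key. The key, the blob layout, and the function names are all hypothetical; in practice both the key and the block offset come out of the reverse-engineered decoding routine.

```python
import re
from itertools import cycle

# Printable, non-space ASCII after the scheme; a NUL or space ends the URL.
URL_RE = re.compile(rb"https?://[\x21-\x7e]+")


def xor_decode(blob: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same call encodes and decodes."""
    return bytes(b ^ k for b, k in zip(blob, cycle(key)))


def extract_c2_urls(blob: bytes, key: bytes) -> list[str]:
    """Decode the block and pull out anything that looks like a URL."""
    plain = xor_decode(blob, key)
    return [m.group().decode("ascii") for m in URL_RE.finditer(plain)]
```

For RC4 or AES the structure is the same (recover key, decode block, parse fields); only the decode step changes.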

Ransomware analysis specifics

Ransomware analysis has specific patterns worth covering because the discipline asks specific questions that other malware categories do not.

Encryption algorithm identification. The first analytical question is “what encryption is being used.” Modern ransomware typically combines symmetric encryption for file contents (AES, ChaCha20) with asymmetric encryption for the symmetric keys (RSA, Curve25519). Identification proceeds through the disassembly of the encryption routine, the cryptographic API calls (Windows CryptoAPI, CNG, or custom implementations), and the constants visible in the code (S-box tables for AES, magic numbers for specific algorithms).
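
The constant-based identification can be sketched as a byte search. The needles below are the first row of the AES S-box, the first row of the inverse S-box, and the “expand 32-byte k” constant shared by ChaCha20 and Salsa20; the table is a small illustrative subset and the function name is an assumption.

```python
CRYPTO_CONSTANTS = {
    "AES S-box": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "AES inverse S-box": bytes.fromhex("52096ad53036a538bf40a39e81f3d7fb"),
    "ChaCha20/Salsa20 sigma constant": b"expand 32-byte k",
}


def find_crypto_constants(data: bytes) -> dict:
    """Map each recognized constant to the first offset where it appears."""
    hits = {}
    for label, needle in CRYPTO_CONSTANTS.items():
        offset = data.find(needle)
        if offset != -1:
            hits[label] = offset
    return hits
```

A miss proves little: implementations that use AES-NI instructions, precomputed T-tables, or the Windows crypto APIs carry no S-box in the binary, so the disassembly and the API calls still have to be checked.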

Key generation analysis. The decryption question depends on how keys are generated. Ransomware that uses:

  • A unique random key per file, encrypted with the operator’s public key, with no recoverable seed, is decryption-resistant in the standard cryptographic sense. The decryption requires the operator’s private key.
  • A deterministic key derived from a static seed (a file path, a victim ID, a hardcoded value) may be decryptable if the seed can be recovered.
  • A flawed key generation routine (using time-based seeds with insufficient entropy, using broken PRNGs, deriving keys from victim-visible material) may produce keys that defenders can recover. Several ransomware families have had decryption tools released because of cryptographic mistakes.
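
The time-based-seed weakness can be illustrated with a toy model. Everything in the sketch is hypothetical: derive_key stands in for a flawed routine that seeds a non-cryptographic PRNG with a Unix timestamp, and toy_encrypt is a plain XOR used in place of the real cipher so the example stays self-contained. The point is the search, not the cipher.

```python
import random


def derive_key(seed: int, length: int = 16) -> bytes:
    """Hypothetical flawed derivation: PRNG seeded with a Unix timestamp."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR stand-in for the real cipher (symmetric: also decrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_seed(ciphertext: bytes, known_prefix: bytes, start: int, end: int):
    """Try every timestamp in [start, end]; accept the one that restores the known header."""
    head = ciphertext[: len(known_prefix)]
    for seed in range(start, end + 1):
        if toy_encrypt(head, derive_key(seed)) == known_prefix:
            return seed
    return None
```

The search window comes from forensics: file-system timestamps on the first encrypted file usually bound the seed to within minutes, and a known file header (here a PDF magic) confirms the candidate key.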

Decryption possibility. Several outcomes are possible:

  • Cryptographically sound implementation with no recoverable keys. No decryption is possible without paying.
  • Operator-side decryption tool capture. The operator’s decryption infrastructure is sometimes seized in law enforcement operations, with the keys released publicly.
  • Cryptographic mistakes that enable decryption. Rare, but they happen; the NoMoreRansom project tracks them.
  • Ransomware that exits without functioning encryption. Rare, but it happens, particularly with bespoke or hastily written families.

The NoMoreRansom project maintains decryption tools for ransomware families where decryption is possible. The analytical workflow includes checking whether the family is on NoMoreRansom before reporting that decryption requires payment.

Family identification. Ransomware family attribution matters because it informs the broader response: whether negotiation is feasible, whether the operator has historically delivered decryption tools after payment, whether law enforcement has specific intelligence on the operation. The major active families (LockBit, Cl0p, ALPHV/BlackCat, Akira, Royal, the various RaaS operations) each have known characteristics that the analysis identifies.

Exfiltration analysis. Modern ransomware operations exfiltrate data before encryption. The exfiltration patterns are part of the analysis: what data was targeted, what destination was used, what protocol was used. The exfiltration is often the more important part of the incident from a legal-exposure standpoint than the encryption itself.

Mobile malware analysis

Mobile malware analysis covers the binaries that mobile forensic engagements surface.

Android APK analysis. An APK is a ZIP container with a specific internal layout: classes.dex (the Dalvik bytecode), AndroidManifest.xml (the permissions, components, and entry points), resources.arsc (the compiled resources), assets (raw files), and a META-INF directory with signatures. The analytical workflow:

  • Decompile the DEX to Java with jadx or dex2jar plus JD-GUI. The output is reconstructed Java source code, of varying quality depending on obfuscation.
  • Decode the manifest with apktool. Inside the APK the manifest is stored as binary XML; the output is the human-readable AndroidManifest.xml that declares the application’s permissions, exported components, and intent filters.
  • Examine the native libraries in lib/<architecture>/ with standard ELF tools (Ghidra, IDA Pro). The native code is where the more advanced malware logic typically lives.
  • MobSF (Mobile Security Framework) is the de facto open-source analysis platform for APKs, providing automated static and dynamic analysis.
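
Because an APK is a ZIP container, a first look at its layout needs nothing beyond the standard library. The sketch below only inventories the entries described above; it does not decode the manifest or the DEX, and the function name summarize_apk is hypothetical.

```python
import io
import zipfile


def summarize_apk(apk_bytes: bytes) -> dict:
    """Inventory the analysis-relevant entries of an APK held in memory."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as zf:
        names = zf.namelist()
    return {
        "has_manifest": "AndroidManifest.xml" in names,
        # Multidex apps ship classes2.dex, classes3.dex, ... alongside classes.dex.
        "dex_files": sorted(n for n in names if n.startswith("classes") and n.endswith(".dex")),
        "native_libs": sorted(n for n in names if n.startswith("lib/") and n.endswith(".so")),
        "signature_files": sorted(n for n in names if n.startswith("META-INF/")),
    }
```

A non-empty native_libs list is the cue to open those files in Ghidra or IDA Pro rather than stopping at the decompiled Java.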

The Android malware landscape includes commodity banking trojans (Anubis, Cerberus, Hydra), spyware (the various surveillanceware families), and accessibility-service-abusing malware that takes advantage of the powerful but dangerous accessibility API.

iOS binary analysis. iOS binaries are Mach-O executables. The analytical tooling is Ghidra, IDA Pro, Hopper, and lldb. The challenges are that iOS binaries are typically code-signed, encrypted on disk (FairPlay encryption), and obtained only through specific channels: jailbroken devices that allow decrypted-binary extraction, the IPA files for non-encrypted apps, or law-enforcement-grade extraction.

The high-end commercial spyware (Pegasus, Predator, Reign) is the analytical front line on iOS. The Citizen Lab and Amnesty Tech analyses of Pegasus established much of the public knowledge about the family; the analytical workflow involved extracting the implant from infected devices, reverse-engineering the exploitation chain, and identifying the infrastructure indicators that allowed detection.

Mobile dynamic analysis. Frida (a dynamic instrumentation toolkit) is the dominant tool for mobile dynamic analysis. It hooks function calls, intercepts API access, and allows runtime modification. Frida works on both iOS (typically jailbroken) and Android (typically rooted), with substantial library coverage for the typical analytical use cases.

YARA, the classification system

YARA is the de facto standard for malware classification. A YARA rule defines patterns (byte sequences, strings, conditions) that match samples; a YARA scanner applies a ruleset to a sample (or a memory image, or a file system) and reports matches.

Rule structure. A YARA rule consists of metadata (description, author, date), a strings section (named patterns the rule looks for), and a condition (a logical expression over the strings). A simple rule:

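rule Generic_Cobalt_Strike_Beacon
{
    meta:
        description = "Detects Cobalt Strike beacon"
    strings:
        // Hex string with a one-byte wildcard (??); illustrative pattern
        $config = { 2e 2f 2e 2f 2e 2c 40 ?? 2e 30 2e 31 }
        // Export name left behind by reflective DLL loaders
        $reflective = "ReflectiveLoader" ascii
    condition:
        // Either string alone matches, so the rule is deliberately broad
        any of them
}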

YARA rules can match byte patterns with wildcards, strings (ASCII or wide), regular expressions, and combinations. The condition language supports counts, offsets, file size, and module-specific conditions (the PE module exposes PE-header fields, the ELF module exposes ELF-header fields, the math module exposes entropy and other statistical measures).
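
To show what a hex string with a wildcard means operationally, the sketch below translates the simplest form of the syntax (full bytes and ?? only) into a Python regular expression. It is an illustration of the matching semantics, not how the YARA engine is implemented, and it does not handle nibble wildcards, jumps, or alternatives.

```python
import re


def hex_pattern_to_regex(pattern: str):
    """Translate a '{ 2e ?? 31 }' style hex string into a compiled bytes regex."""
    parts = []
    for token in pattern.strip("{} ").split():
        if token == "??":
            parts.append(b".")  # wildcard: any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so the wildcard also matches a newline byte (0x0a).
    return re.compile(b"".join(parts), re.DOTALL)
```

Applied to the $config pattern from the example rule, the compiled expression matches any 12-byte run that agrees with the pattern everywhere except the eighth byte, which may take any value.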

The rule ecosystem. YARA rules circulate through several channels:

  • Public open-source repositories like the YARA-Rules project, the Citizen Lab repository, and the various threat intelligence firm public rules.
  • Commercial threat intelligence subscriptions that include rule feeds (Mandiant, CrowdStrike, Recorded Future, Group-IB).
  • Internal organizational rules that defenders write for their specific environment.

The rule quality varies substantially. Public rules may be false-positive-prone or out-of-date; commercial rules are typically higher-quality but expensive; internal rules are tuned to the environment but require analyst capacity to write and maintain.

YARA-X is the Rust rewrite of YARA, released in 2024. The implementation is faster, has more modern syntax, and is positioned as the long-term successor to the original C YARA. The compatibility is high but not perfect; some advanced rules require minor adjustment.

Use cases. YARA is used for:

  • Classification. Identifying which family a sample belongs to.
  • Hunting. Scanning collections of files (or memory images, or network captures) for samples matching specific patterns.
  • Detection. Integrating rules into endpoint security tools (most EDRs support YARA), SIEMs (via integration), and forensic tools (Volatility’s yarascan, the various commercial integrations).

The malware analysis toolkit

A consolidated tour of the standard tools beyond what has been mentioned in context.

File analysis. file (libmagic), exiftool (metadata extraction from many file formats), binwalk (firmware and embedded-file analysis), 7z (archive examination), oletools (Office document analysis), pdf-parser and peepdf (PDF analysis).

PE-specific. pestudio (Windows-only GUI, broad analysis features), PE-bear (cross-platform PE editor and analyzer), CFF Explorer (legacy but still useful), Detect It Easy / DIE (packer and compiler identification), the pefile Python library, capa (Mandiant’s open-source tool that identifies capabilities in executables by matching rules against disassembled code).
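To make concrete what these PE tools read first, the sketch below pulls three COFF header fields out of a raw image using only the standard library. It is a minimal illustration, not a substitute for pefile: parse_pe_header and MACHINE_NAMES are hypothetical names, only three machine types are mapped, and no optional-header or section parsing is attempted.

```python
import struct
from datetime import datetime, timezone

# A small subset of the machine constants from the PE format specification.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}


def parse_pe_header(data: bytes) -> dict:
    """Read machine type, section count, and link timestamp from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew, at offset 0x3C of the DOS header, points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header follows the signature: Machine, NumberOfSections, TimeDateStamp.
    machine, section_count, timestamp = struct.unpack_from("<HHI", data, e_lfanew + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": section_count,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
    }
```

The timestamp is attacker-controlled and frequently forged or zeroed, so it is a lead for correlation rather than evidence of when the sample was built. Truncated input raises struct.error rather than ValueError in this sketch.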

Disassemblers and decompilers. Covered above (Ghidra, IDA Pro, Binary Ninja, radare2, Cutter, Hopper, objdump).

Debuggers. Covered above (x64dbg, WinDbg, OllyDbg, gdb).

Sandboxes. Covered above (Cuckoo, CAPE, Any.run, Joe Sandbox, Triage, VMRay).

Behavioral monitoring. Covered above (Procmon, Process Hacker / System Informer, API Monitor, Sysmon).

Network simulation. INetSim (offline internet service emulation), fakedns (DNS response control), Wireshark (traffic capture), Burp Suite (HTTP proxy and modification).

Memory analysis. Volatility (covered in Memory Forensics).

Configuration extraction. Covered above (CAPE, malduck, mwcfg, family-specific extractors).

YARA. YARA, YARA-X, yara-python and the yara-x Python package for Python integration, dnYara for .NET, the Volatility yarascan plugin.

Threat intelligence platforms. MISP (open-source threat intelligence platform), OpenCTI, and the commercial equivalents (Mandiant Advantage, Recorded Future, CrowdStrike Falcon Intelligence, Anomali). The analytical workflow consumes IOCs from these platforms and feeds newly produced IOCs back into them.

Mobile-specific. jadx, apktool, MobSF, Frida, objection (Frida-based runtime exploration), dex2jar, AndroBugs, QARK.

The malware analysis report

The analytical output that downstream consumers actually use.

Classification. Family identification, variant identification where possible, similarity to known samples. The classification supports correlation across incidents and threat intelligence.

Indicators of compromise (IOCs). Hashes of the sample and any dropped files, domain names and IP addresses (C2, DGA seed, fallback infrastructure), URLs (full paths used), registry keys (persistence locations, configuration storage, runtime indicators), file paths (drop locations, scratch directories), mutex names (which malware uses for single-instance enforcement and which are often distinctive enough to detect on), and any other concrete artifacts that defenders can detect on.
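A small sketch of how the IOC section might be assembled in practice: hash the sample, defang network indicators so they are not clickable in the written report, and serialize the result. The IocBundle layout, the field names, and the defanging convention are assumptions chosen for illustration; they do not follow MISP, STIX, or any other specific exchange format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def hash_sample(data: bytes) -> dict:
    """Compute the MD5 / SHA-1 / SHA-256 triplet commonly listed in reports."""
    return {alg: hashlib.new(alg, data).hexdigest() for alg in ("md5", "sha1", "sha256")}


def defang(indicator: str) -> str:
    """Rewrite a URL or domain so it cannot be followed by accident."""
    if indicator.startswith("http"):
        indicator = "hxxp" + indicator[4:]
    return indicator.replace(".", "[.]")


@dataclass
class IocBundle:
    """Hypothetical container for the IOC categories listed above."""
    hashes: dict
    network: list = field(default_factory=list)
    registry_keys: list = field(default_factory=list)
    file_paths: list = field(default_factory=list)
    mutexes: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

For example, IocBundle(hashes=hash_sample(sample_bytes), network=[defang("http://c2.example.com/gate.php")]).to_json() yields a JSON document with the network entry rendered as hxxp://c2[.]example[.]com/gate[.]php. Machine-consumed feeds normally carry the indicators un-defanged; defanging is for human-readable output.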

Behavioral summary. What the malware does: initial execution, persistence, privilege escalation if any, lateral movement capabilities, communication with C2, payload behaviors (data theft, encryption, destruction), exit conditions.

MITRE ATT&CK technique mapping. Modern reports map observed behaviors to ATT&CK techniques (T-codes). The mapping supports the detection engineering work that follows the analysis. Defenders write detection rules tagged with the same techniques.
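The mapping itself is usually just a table of observed behaviors keyed by technique ID. The sketch below shows one way to hold and validate it. The observations describe an imaginary sample, and group_by_technique and parent_technique are hypothetical helpers; only the technique IDs themselves come from ATT&CK.

```python
import re
from collections import defaultdict

# Technique IDs look like T1486 or, for sub-techniques, T1547.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")

# Illustrative observations for an imaginary sample, not taken from a real report.
OBSERVATIONS = [
    ("Writes a value under HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run", "T1547.001"),
    ("Injects code into a remote process", "T1055"),
    ("Beacons to C2 over HTTPS", "T1071.001"),
    ("Encrypts user documents", "T1486"),
]


def group_by_technique(observations):
    """Group behavior descriptions under their technique ID, rejecting malformed IDs."""
    grouped = defaultdict(list)
    for behavior, technique in observations:
        if not TECHNIQUE_ID.fullmatch(technique):
            raise ValueError(f"not an ATT&CK technique ID: {technique!r}")
        grouped[technique].append(behavior)
    return dict(grouped)


def parent_technique(technique: str) -> str:
    """Strip the sub-technique suffix, e.g. T1547.001 -> T1547."""
    return technique.split(".", 1)[0]
```

Keeping the IDs machine-checkable makes it straightforward to tag the detection rules that follow with the same identifiers the report uses.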

Detection content. YARA rules, Sigma rules, Suricata signatures: concrete detection artifacts that operational defenders can deploy. The detection content is often the most directly actionable output of the analysis.

Family attribution. When the family is known, the report attributes the sample to the family with confidence level. When the family is unknown, the report describes the closest similar families and the differences.

Recommendations. For the specific incident: containment actions, eradication steps, recovery considerations. For the broader posture: detection gaps the analysis surfaced, hardening recommendations that would have blocked the observed behaviors.

The report’s audience varies. SOC analysts need IOCs and detection content; incident responders need the behavioral summary and recommendations; threat intelligence teams need the family attribution and the broader context; executives need a summary of impact and exposure. Well-written reports serve multiple audiences with different sections at different depth levels.

Where the analysis can’t go

The structural problems malware analysis is currently working through:

Heavily packed and virtualized malware. Themida, VMProtect, and similar protectors at their highest settings produce binaries that can take months of analyst time to fully reverse. Most analytical engagements do not have that budget; the practical mitigation is to characterize the malware at a higher level (through dynamic behavior, through configuration extraction, through family-level attribution) rather than fully reversing it.

Bespoke / targeted malware. Nation-state implants and custom-built malware for specific targets often have no community knowledge to draw on. The analysis proceeds from first principles and produces less complete output than the analysis of commodity malware, which has family knowledge to draw on.

Time pressure. Incident response timelines often allow days, not weeks, for malware analysis. The analytical output reflects what could be produced in the available time, which is sometimes less than what comprehensive analysis would have surfaced. The methodology has to be honest about what the report establishes versus what is provisional.

Sample unavailability. Analysts sometimes work without the original sample: the binary was wiped during incident response, only forensic artifacts survive (registry entries, file system traces, network indicators), or the sample was captured incomplete. The analysis proceeds with what’s available and acknowledges the gaps.

Anti-analysis sophistication. The anti-analysis arms race structurally favors malware authors, who move first. Each new sandbox evasion technique forces sandbox operators to respond, and the response necessarily lags the technique. The mitigation is layered analysis (multiple sandboxes, manual analysis when sandbox results look thin, ongoing investment in sandbox hardening) rather than reliance on any single technique.

Mobile analysis limitations. Mobile binary analysis is less mature than desktop analysis, particularly on iOS. The encrypted iOS binaries, the rapid OS update cadence, and the smaller community of analysts working on mobile produce analytical output that is less complete than the desktop equivalent for comparable effort.

Family overlap and the attribution problem. Many malware families share code (commodity tools used across families), share infrastructure (the same hosting providers, the same domain registrars), and share operational tradecraft. Confident family attribution is harder than analyst write-ups sometimes suggest; the methodology has to track confidence levels.

The “is this actually malicious” boundary. Some samples sit in the gray area between unwanted software, aggressive adware, and malware. The analytical output has to be honest about the boundary and not over-classify gray-area samples as malicious.

Malware analysis is the discipline that turns observed binaries into durable defensive knowledge. The output (IOCs, behavioral signatures, family attribution, detection rules) feeds the broader defensive ecosystem in ways that single-incident forensic analysis does not. The discipline has matured into a substantial toolkit over the last two decades, with the static / dynamic split as the organizing axis, the disassembler / decompiler ecosystem as the analytical workhorse, and the YARA classification system as the de facto language for matching samples to families. The arms race with malware authors continues; the methodology adapts to each new round.

The connected pages cover the work that surfaces the binaries this analysis examines and the analytical workflows the output feeds: Memory Forensics covers the in-memory artifact surface that often produces the samples; Disk and File System Forensics covers the disk-resident artifacts that complement the binary analysis; Network Forensics covers the C2 protocol observations that the configuration extraction supplements; Mobile Forensics covers the mobile-device artifact surface that produces the iOS and Android samples; Timeline Analysis covers the cross-artifact reconstruction that malware analysis output feeds into; and Incident Response and DFIR Workflow covers the operational context that the analysis serves. The Digital Forensics hub covers the discipline as a whole.