Hash Functions and MACs
A cryptographic hash function takes an input of arbitrary length and produces a fixed-length output called a digest or hash. The mapping is deterministic (the same input always produces the same output), and the function is designed to be one-way. Given an output, finding any input that produces it should be computationally infeasible. Given an input, finding a different input that produces the same output should be computationally infeasible. These two properties, together with collision resistance, are the core security claims, defined precisely below.
Hash functions are the workhorses of applied cryptography. They underlie message authentication codes, digital signatures, password storage, blockchain consensus, file integrity verification, key derivation, and dozens of other constructions. They are not encryption — there is no key, no decryption operation, and the output is intentionally one-way.
This page is the deep-dive companion to the Cryptography umbrella overview. Scope here is the hash-function primitives themselves, the MAC constructions built on top of them, and the recurring failure patterns in deployment. Key management — which is the harder discipline that turns these primitives into usable systems — has its own page.
The three security properties
Every cryptographic hash function claims, formally, three security properties:
Preimage resistance. Given a hash output h, it should be computationally infeasible to find an input m such that hash(m) = h. The one-way property.
Second-preimage resistance. Given an input m1, it should be computationally infeasible to find a different input m2 such that hash(m1) = hash(m2). The “you can’t substitute a different message that hashes to the same value” property.
Collision resistance. It should be computationally infeasible to find any two distinct inputs m1, m2 such that hash(m1) = hash(m2). The strongest of the three properties — collision resistance implies second-preimage resistance, but not the reverse.
The three properties have different cost profiles. By the birthday paradox, collision resistance is bounded at roughly n/2 bits of security for an n-bit hash output, while preimage and second-preimage resistance are bounded at n bits. SHA-256 therefore provides roughly 128 bits of collision resistance and 256 bits of preimage resistance. This asymmetry is why hash output sizes are typically chosen to provide collision resistance at the desired security level — if you want 128 bits of collision resistance, you need at least a 256-bit hash.
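The n/2 bound is easy to observe directly. The sketch below (Python, standard library only) deliberately weakens SHA-256 by keeping only its first 3 bytes (24 bits), then searches for a collision by hashing counter strings until two truncated digests match. The truncation length and the counter-based inputs are arbitrary choices for illustration. A 24-bit tag should fall after roughly 2^12 (a few thousand) attempts, far fewer than the 2^24 a preimage search would need.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, nbytes: int) -> bytes:
    """First `nbytes` bytes of SHA-256(data): an artificially weakened hash."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct inputs until two truncated digests match.

    Returns the two colliding inputs and how many inputs were hashed.
    """
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_sha256(msg, nbytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg


m1, m2, tries = find_collision(3)  # 3 bytes = 24-bit output
print(m1, m2, tries)
```

Each extra byte of output multiplies the expected collision-search work by 16 (the square root of 256), which is why 128-bit collision resistance needs a 256-bit digest.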
Confusion between these properties is a source of design error. A protocol that relies on the difficulty of finding some collision is using collision resistance; a protocol that relies on the difficulty of finding a collision with a specific predetermined value is using second-preimage resistance, which is harder for the attacker. MD5 has been collision-broken for over twenty years but remains second-preimage-resistant in practice — which is why MD5 still appears in some legacy contexts where the use case only depends on the weaker property. This is still bad practice, but the distinction explains why MD5 has not produced more dramatic failures in those specific narrow contexts.
The hash families that have mattered
MD5 — the cautionary tale
MD5, designed by Ron Rivest in 1991, produces a 128-bit output. Theoretical weaknesses were found in 1996; the first practical collision attack was demonstrated in 2004 by Xiaoyun Wang and her group. In 2008, the Sotirov et al. rogue CA attack used MD5 collisions to create a malicious certificate authority. The Flame malware, discovered in 2012 but operational for years before that, exploited an MD5 collision to forge a Microsoft code-signing certificate. MD5 has been entirely unsuitable for any security purpose since at least 2010, and it should never appear in new systems. It still shows up in legacy integrity-checksum contexts (file-transfer checksums, the RPM database) where the use case genuinely does not depend on adversarial collision resistance, but those are exceptions to a strong general rule against use.
SHA-1 — the slow death
SHA-1, designed by the NSA and published as FIPS 180-1 in 1995, produces a 160-bit output. Theoretical attacks appeared in 2005; the first practical collision was demonstrated by Google and CWI Amsterdam with the SHAttered attack in 2017, which produced two distinct PDF files with the same SHA-1 hash at a computational cost of roughly $110,000 in cloud compute at the time. The 2020 “SHA-1 is a Shambles” attack reduced the cost further and demonstrated a practical chosen-prefix collision, the more dangerous variant for attacks on certificates and signed documents. SHA-1 was deprecated by NIST for digital signatures in 2011, prohibited in TLS certificates by the major browsers between 2017 and 2020, and is now considered structurally inadequate for any new use.
SHA-2 — the workhorse
The SHA-2 family, first published as FIPS 180-2 in 2002 (SHA-224 and the SHA-512/t variants were added later) and now specified in FIPS 180-4, includes six variants with different output sizes and internal structures:
- SHA-224, SHA-256: 32-bit words, 64 rounds, 256-bit internal state, 512-bit message blocks.
- SHA-384, SHA-512: 64-bit words, 80 rounds, 512-bit internal state, 1024-bit message blocks.
- SHA-512/224, SHA-512/256: the SHA-512 computation with distinct initial values, truncated to 224 or 256 bits of output. On 64-bit hardware without dedicated SHA-256 instructions they are typically faster than SHA-224/SHA-256, because they share the SHA-512 internal structure.
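Python's hashlib exposes the digest and block sizes of the four SHA-2 variants it guarantees on every build, which makes the 32-bit/64-bit split visible. The sketch also checks that SHA-224 is not simply a truncated SHA-256 (nor SHA-384 a truncated SHA-512), because each variant starts from its own initial values. SHA-512/224 and SHA-512/256 are left out only because hashlib does not guarantee them on every build.

```python
import hashlib


def sha2_parameters() -> dict[str, tuple[int, int]]:
    """Map each guaranteed SHA-2 variant to (digest bits, block bits)."""
    params = {}
    for name in ("sha224", "sha256", "sha384", "sha512"):
        h = hashlib.new(name)
        params[name] = (h.digest_size * 8, h.block_size * 8)
    return params


for name, (digest_bits, block_bits) in sha2_parameters().items():
    print(f"{name}: {digest_bits}-bit digest, {block_bits}-bit block")

# Distinct initial values: the shorter variants are not prefixes of the longer ones.
msg = b"abc"
assert hashlib.sha224(msg).digest() != hashlib.sha256(msg).digest()[:28]
assert hashlib.sha384(msg).digest() != hashlib.sha512(msg).digest()[:48]
```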
SHA-256 is the dominant SHA-2 variant by deployment, used in TLS certificates, code signing, Bitcoin and many other cryptocurrency systems, Git's SHA-256 object format (Git still defaults to SHA-1 while the migration continues), and the bulk of modern digital signature schemes. SHA-2 remains structurally sound: no significant cryptanalytic attack has materially reduced its security in two decades, and it is the default cryptographic hash for the majority of deployed systems in 2026.
SHA-2 uses the Merkle-Damgård construction, which is the same iterated-compression-function pattern used by MD5 and SHA-1. The construction is mathematically sound for collision resistance, but it has a subtle weakness that has caused real-world bugs: length-extension attacks. Given hash(secret || message) and the length of secret||message, an attacker can compute hash(secret || message || padding || extension) for an arbitrary extension, without knowing secret. This is fine in most use cases (file integrity, certificate signing) but catastrophic if a protocol uses the construction hash(secret || message) as a MAC. Naive MAC constructions on SHA-256 are broken; HMAC was designed specifically to avoid this attack, and HMAC-SHA-256 is the correct MAC construction.
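The attack is short enough to demonstrate end to end. The sketch below is a from-scratch SHA-256 compression function, used to resume hashing from a published digest (in Merkle-Damgård, the digest is the full chaining state). The round constants are computed, as FIPS 180-4 defines them, from the cube roots of the first 64 primes rather than typed in. The secret, message, and extension values are hypothetical, and the attacker is assumed to know the secret's length (in practice they can try each plausible length). This is a teaching sketch, not a SHA-256 implementation to reuse.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def _first_primes(count: int) -> list[int]:
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n: int) -> int:
    """Largest integer x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _first_primes(64)]


def _rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state: list[int], block: bytes) -> list[int]:
    """One SHA-256 compression-function call on a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def md_padding(total_len: int) -> bytes:
    """SHA-256 padding for a message of total_len bytes: 0x80, zeros, 64-bit bit length."""
    return b"\x80" + b"\x00" * ((55 - total_len) % 64) + struct.pack(">Q", total_len * 8)


def length_extend(tag: bytes, prefix_len: int, extension: bytes) -> tuple[bytes, bytes]:
    """Given SHA-256(X) and len(X), return (glue, SHA-256(X || glue || extension))."""
    glue = md_padding(prefix_len)
    state = list(struct.unpack(">8I", tag))  # the digest is the chaining state
    tail = extension + md_padding(prefix_len + len(glue) + len(extension))
    for i in range(0, len(tail), 64):
        state = _compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state)


# Hypothetical naive MAC: tag = SHA-256(secret || message).
secret = b"server-side-key"  # unknown to the attacker
message = b"user=alice&role=viewer"
tag = hashlib.sha256(secret + message).digest()

# The attacker knows message, tag, and len(secret), but not secret itself.
glue, forged = length_extend(tag, len(secret) + len(message), b"&role=admin")
forged_message = message + glue + b"&role=admin"
assert hashlib.sha256(secret + forged_message).digest() == forged
```

Swapping the naive tag for hmac.new(secret, message, hashlib.sha256) defeats this forgery: the published HMAC tag is the outer hash, and extending it does not produce the HMAC of any extended message.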
SHA-3 — the sponge
In 2007, NIST opened a public competition for a SHA-2 alternative, partly as a hedge against future cryptanalytic breakthroughs against SHA-2 and partly to obtain a hash function with a fundamentally different internal structure. The competition ran for five years; the winner, announced in October 2012 and standardized as FIPS 202 in 2015, was Keccak, designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche.
Keccak — published as SHA-3 — is architecturally distinct from SHA-2 in a way that matters for both security and capability. Instead of the Merkle-Damgård iterated compression construction, SHA-3 uses the sponge construction, which absorbs input into a large internal state through XOR operations interleaved with a fixed permutation, then squeezes output from the state through the same permutation. The internal state is 1600 bits — much larger than SHA-2’s 256 or 512 bits — and the rate (the portion of state that interacts with input/output per call) is configurable per variant.
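The rate/capacity split can be read off Python's hashlib, whose SHA-3 and SHAKE objects report the sponge rate as their block_size. (That block_size equals the rate is how CPython's implementations behave; treat it as an observation about this library rather than part of FIPS 202.) The capacity is whatever remains of the 1600-bit state, and it is set to twice the target security level.

```python
import hashlib

STATE_BITS = 1600  # width of the Keccak-f[1600] permutation


def sponge_split(name: str) -> tuple[int, int]:
    """Return (rate bits, capacity bits) for a SHA-3-family hashlib algorithm."""
    rate_bits = hashlib.new(name).block_size * 8
    return rate_bits, STATE_BITS - rate_bits


for name in ("sha3_256", "sha3_512", "shake_128", "shake_256"):
    rate, capacity = sponge_split(name)
    print(f"{name}: rate {rate} bits, capacity {capacity} bits")
```

The capacity bits are never emitted as output; they are the hidden state that the length-extension argument below depends on.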
Three consequences of the sponge architecture matter for practitioners:
First, length-extension attacks do not apply. The sponge never outputs the capacity portion of its state (512 of the 1600 bits for SHA3-256), so part of the state always remains hidden. An attacker who sees SHA3-256(secret || message) cannot extend the message because they cannot reconstruct that hidden state. This means SHA3-256(secret || message) is a sound MAC construction (though KMAC, described below, is the formally recommended approach).
Second, the output length becomes flexible. The sponge can be squeezed for as many output bits as desired, which leads directly to the XOFs.
Third, the internal permutation is reusable across multiple constructions. The same Keccak-f[1600] permutation underlies SHA-3, SHAKE, KMAC, and several authenticated-encryption modes. This is an engineering virtue — one well-analyzed primitive supports an entire family of constructions — and is part of why SHA-3-family hardware blocks are showing up in modern cryptographic accelerators.
The four fixed-output SHA-3 variants are SHA3-224, SHA3-256, SHA3-384, and SHA3-512, paralleling the SHA-2 family in output size. Adoption has been slower than its design quality would suggest, primarily because SHA-2 remains sound and migration is expensive, but SHA-3 is the conservative long-term choice for new deployments that want to hedge against unexpected SHA-2 cryptanalysis.
SHAKE128 and SHAKE256 — the extendable-output functions
The extendable-output functions (XOFs) are the part of the SHA-3 family that has no SHA-2 analog. SHAKE128 and SHAKE256 take an input of arbitrary length and produce an output of arbitrary length — the caller specifies how many bytes of output they want. This is structurally distinct from a hash function: SHA-256 always produces 256 bits; SHAKE128 produces however many bits you ask for, drawn from the same sponge state.
The numerical suffix on SHAKE128 and SHAKE256 refers to the security strength, not the output length. SHAKE128 provides at most 128-bit security: a d-bit output gives min(d/2, 128) bits of collision resistance and min(d, 128) bits of preimage resistance, so requesting more output never raises the strength above 128 bits. SHAKE256 is capped at 256 bits in the same way. This naming convention is one of the few things that is genuinely confusing about the SHA-3 family: the suffix means security, not size.
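A short hashlib sketch of XOF behaviour: the caller picks the output length when calling digest(), and a shorter output is a prefix of a longer one computed from the same input. That prefix property is also why two keys should not be derived from the same SHAKE input merely by requesting different lengths; distinct inputs (or cSHAKE customization strings) keep them independent. The seed value is arbitrary.

```python
import hashlib

seed = b"example seed"

out16 = hashlib.shake_128(seed).digest(16)  # 128 bits of output
out64 = hashlib.shake_128(seed).digest(64)  # 512 bits of output, still at most 128-bit security

assert len(out16) == 16 and len(out64) == 64
assert out64[:16] == out16                  # the longer output extends the shorter one
assert hashlib.shake_256(seed).digest(16) != out16  # a different variant gives unrelated output
print(out64.hex())
```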
XOFs unlock several constructions that are awkward with fixed-output hashes:
- Variable-length key derivation. Derive a 64-byte key, a 128-byte key, or a 1-megabyte stream from a single seed in one operation. HKDF approximates this with HMAC iteration, but a native XOF is cleaner and faster.
- Mask generation in signature padding. RSA-PSS and RSA-OAEP both need mask-generation functions; SHAKE provides one directly without the MGF1 construction that wraps a fixed-output hash.
- Deterministic random byte streams from a seed. Useful in zero-knowledge proofs, deterministic protocols, and stateless hashing schemes.
- Hash-based signature schemes. SLH-DSA (FIPS 205, the SPHINCS+-based post-quantum signature standard) defines parameter sets built on SHAKE256, alongside parameter sets built on SHA-2.
A pair of XOF variants worth knowing by name: cSHAKE128 and cSHAKE256 (customizable SHAKE) accept a function-name string and a customization string in addition to the input, allowing the same XOF to be used safely in multiple protocols without cross-protocol collisions. NIST SP 800-185 specifies cSHAKE alongside KMAC, TupleHash, and ParallelHash, all of which are SHAKE-derived constructions for specific use cases.
The SHAKE family is genuinely one of the more interesting design wins in modern symmetric cryptography. The sponge architecture, the single underlying permutation, the flexibility of output length, and the family of derived constructions all flow from a coherent design rather than the bolted-on accretion that has characterized previous hash standards. The lack of widespread adoption in 2026 reflects migration cost, not design quality.
BLAKE2 and BLAKE3 — the speed-optimized alternatives
BLAKE2, designed by Jean-Philippe Aumasson and others in 2012, is derived from BLAKE, a SHA-3 competition finalist that did not win; BLAKE2 was published independently as a fast, well-designed hash family. BLAKE2b (64-bit, up to 512-bit output) and BLAKE2s (32-bit, up to 256-bit output) are widely used in non-NIST contexts: Argon2 uses BLAKE2b internally, libsodium provides BLAKE2b as its default generic hash, and several cryptocurrency systems use it as a faster alternative to SHA-256.
BLAKE3, published in 2020, is a further evolution focused on extreme performance. BLAKE3 is a tree-hash construction that parallelizes naturally across SIMD lanes and multiple CPU cores, and on modern hardware it is typically much faster than SHA-256 and SHA-3 for bulk-data hashing. BLAKE3 is also an XOF in its native form, supports a keyed mode usable as a MAC, and supports a KDF mode. For new performance-critical applications that do not require FIPS compliance, BLAKE3 is increasingly the default choice; for FIPS-validated environments, SHA-2 or SHA-3 remains mandatory.
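Python's standard library ships BLAKE2 but not BLAKE3 (BLAKE3 needs a third-party package), so the sketch below shows the analogous built-in keyed mode of BLAKE2b, which works as a MAC without an HMAC wrapper. The key, message, and function names are hypothetical; BLAKE2b accepts keys of up to 64 bytes.

```python
import hashlib
import hmac


def blake2b_tag(key: bytes, message: bytes) -> bytes:
    """256-bit MAC tag using BLAKE2b's native keyed mode (key length <= 64 bytes)."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()


def blake2b_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison, so verification time does not reveal how many bytes matched.
    return hmac.compare_digest(blake2b_tag(key, message), tag)


key = bytes(range(32))  # placeholder key; a real key would come from secrets.token_bytes(32)
tag = blake2b_tag(key, b"transfer 100 to bob")
print(tag.hex())
```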
Message Authentication Codes (MACs)
A Message Authentication Code (MAC) is a fixed-length tag computed over a message using a shared secret key, such that anyone with the key can verify that the message has not been modified and was produced by someone who knew the key. MACs provide integrity and authenticity but not non-repudiation — both parties hold the same key, so either could have produced any given tag.
Several MAC constructions matter in practice.
HMAC — the standard
HMAC (Hash-based Message Authentication Code), specified in RFC 2104 and FIPS 198-1, is the canonical hash-based MAC construction. HMAC-SHA-256 is the dominant deployed MAC in 2026, used in TLS, IPsec, SSH, JWT signing, AWS request signing, and most authentication protocols that don’t need authenticated encryption.
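The construction wraps two hash invocations around the message and key. Here K is the key after preprocessing (hashed first if it is longer than the hash's block size, then zero-padded to the block size, which is 64 bytes for SHA-256), and ipad and opad are the bytes 0x36 and 0x5c repeated across one block: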
HMAC(K, m) = H((K ⊕ opad) || H((K ⊕ ipad) || m))
The double-hash structure was designed specifically to prevent length-extension attacks against Merkle-Damgård hashes like SHA-256: even though SHA-256 itself is length-extendable, HMAC-SHA-256 is not. The construction is provably secure under standard assumptions about the underlying hash function, and has held up to nearly three decades of analysis.
HMAC variants are usually named after their underlying hash: HMAC-SHA-256, HMAC-SHA-512, HMAC-SHA3-256, HMAC-BLAKE2b. All are sound; the choice between them follows the same logic as the choice of hash function.
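The HMAC formula translates almost line for line into code. This sketch builds HMAC-SHA-256 from hashlib, writing K' for the preprocessed (hashed-if-long, zero-padded) key, and checks the result against Python's hmac module. It is for illustration; production code should call hmac directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC(K, m) = H((K' xor opad) || H((K' xor ipad) || m)) with H = SHA-256."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()     # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")        # then zero-padded to one block (K')
    inner_key = bytes(b ^ 0x36 for b in key)    # K' xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)    # K' xor opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


key, msg = b"shared-secret-key", b"GET /v1/orders"
assert hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
```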
KMAC — the SHA-3-native MAC
KMAC (KECCAK Message Authentication Code), specified in NIST SP 800-185, is a MAC construction built directly on the SHA-3 permutation rather than wrapping a SHA-3 hash with HMAC. Because SHA-3 itself is not length-extendable, KMAC can use a simpler construction than HMAC — it’s a more elegant design that achieves the same security property with less overhead. KMAC128 and KMAC256 correspond to the SHAKE128 and SHAKE256 security strengths.
KMAC is preferred over HMAC-SHA3 in new SHA-3-based designs because it makes better use of the sponge construction. In FIPS-validated environments where SHA-3 is being deployed, KMAC is the right MAC choice.
Poly1305 and GHASH — the one-time MACs
Poly1305, designed by Daniel J. Bernstein and used in ChaCha20-Poly1305, is a one-time MAC: it provides strong authentication for a single message under a given key, but is catastrophically broken if the same key is used for two messages. The one-time restriction is mitigated in practice by deriving a fresh Poly1305 key from a long-term key via a stream cipher for each message — which is exactly what ChaCha20-Poly1305 does.
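GHASH, the authenticator inside AES-GCM, has a similar property. Its hash key is fixed for a given AES key, but each tag is masked with a value derived from the nonce, and repeating a nonce lets an attacker recover the hash key and forge tags. Both are universal hash functions that achieve their efficiency by depending on fresh per-message key material rather than on the hardness of inverting a hash.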
The practical takeaway: Poly1305 and GHASH are not general-purpose MACs. They are components of AEAD constructions that derive fresh authenticator keys per message. Using them directly as MACs requires careful key management that is easy to get wrong.
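To make the one-time structure concrete, here is a sketch of Poly1305 following the algorithm described in RFC 8439, section 2.5: the message is split into 16-byte chunks, each chunk becomes a polynomial coefficient, and the polynomial is evaluated at a secret point r modulo the prime 2^130 - 5 before a secret s is added. It is unvetted and not constant-time, so it is for reading only. If two messages are tagged under the same (r, s), subtracting the tags cancels s (up to the final reduction modulo 2^128) and leaves an equation in r alone, which is the root of the key-reuse break.

```python
P1305 = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def poly1305(key: bytes, message: bytes) -> bytes:
    """One-time authenticator: key is 32 bytes (r || s) and must never be reused."""
    if len(key) != 32:
        raise ValueError("Poly1305 key must be 32 bytes")
    r = int.from_bytes(key[:16], "little") & CLAMP  # evaluation point, some bits cleared
    s = int.from_bytes(key[16:], "little")          # one-time mask added at the end
    acc = 0
    for i in range(0, len(message), 16):
        # Each chunk plus a trailing 0x01 byte becomes one polynomial coefficient.
        chunk = int.from_bytes(message[i:i + 16] + b"\x01", "little")
        acc = (acc + chunk) * r % P1305
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


key = bytes.fromhex("85d6be7857556d337f4452fe42d506a8" "0103808afb0db2fd4abff6af4149f51b")
tag = poly1305(key, b"Cryptographic Forum Research Group")
print(tag.hex())
```

The key and message are the worked example from RFC 8439, section 2.5.2, so the printed tag can be checked against the tag given there.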
CMAC — the block-cipher-based MAC
CMAC (Cipher-based Message Authentication Code), specified in NIST SP 800-38B, builds a MAC from a block cipher (typically AES) rather than from a hash function. CMAC has the advantage of needing only a block cipher implementation — useful for constrained devices that already implement AES but do not have a hash function available. CMAC is used in some smart card and IoT contexts, but is generally less common than HMAC in software systems.
Key derivation functions
A key derivation function (KDF) turns one secret (a master key, a Diffie-Hellman shared output, or a password) into one or more keys suitable for cryptographic use. KDFs are a specialized application of hash functions but have distinct security requirements depending on the input.
HKDF — for high-entropy inputs
HKDF (HMAC-based Key Derivation Function), specified in RFC 5869, derives multiple keys from a single high-entropy input — typically the output of a Diffie-Hellman key agreement. HKDF uses an extract-then-expand pattern: first concentrate the entropy in the input into a uniform pseudorandom key, then expand that key into the desired output keys.
HKDF is the standard for deriving session keys from key-agreement outputs in TLS 1.3, Signal, and most modern key-exchange protocols. The extract-then-expand pattern is sound, the construction is fast, and it cleanly supports deriving multiple distinct keys from the same input by using different context strings in the expand phase.
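A minimal HKDF-SHA-256 following the extract-then-expand steps of RFC 5869, built only on hmac and hashlib. The salt, input keying material, and info labels in the usage lines are hypothetical; the point is that one pseudorandom key (PRK) expanded under different context labels yields independent keys. Real deployments would normally use a vetted library implementation.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Concentrate the input keying material into a pseudorandom key (PRK)."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is supplied
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Stretch the PRK into `length` bytes bound to the context string `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-Expand can produce at most 255 * HashLen bytes")
    n_blocks = (length + HASH_LEN - 1) // HASH_LEN
    okm, block = b"", b""
    for counter in range(1, n_blocks + 1):
        # T(i) = HMAC(PRK, T(i-1) || info || i), with T(0) empty
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
    return okm[:length]


# Hypothetical inputs: a key-agreement output and two context labels.
prk = hkdf_extract(b"example salt", b"shared secret from key agreement")
client_key = hkdf_expand(prk, b"client write key", 32)
server_key = hkdf_expand(prk, b"server write key", 32)
assert client_key != server_key
```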
PBKDF2, scrypt, Argon2 — for passwords
Password-based KDFs solve a fundamentally different problem: turning a low-entropy human-memorable password into a key, while making brute-force attacks expensive. The construction must be deliberately slow and (in modern designs) deliberately memory-expensive to defeat attackers running specialized hardware.
PBKDF2, specified in PKCS #5 (version 2.0 in RFC 2898, version 2.1 in RFC 8018), is the oldest of the family. It iterates HMAC many times (typically 100,000 or more) over the password and a salt. PBKDF2 is computationally hard but not memory-hard, which means GPUs and ASICs can attack PBKDF2 hashes far more efficiently than CPUs. PBKDF2 remains FIPS-approved and acceptable for password hashing where memory-hard alternatives are not available, but it should not be the default choice for new systems.
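Python's hashlib.pbkdf2_hmac implements PBKDF2 directly. The sketch uses HMAC-SHA-256 with the 600,000-iteration figure this page recommends in its summary, plus a random 16-byte salt; the salt length, the example password, and the helper name are illustrative assumptions.

```python
import hashlib
import secrets

ITERATIONS = 600_000


def pbkdf2_sha256(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key; each guess costs an attacker `iterations` HMAC-SHA-256 calls."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


salt = secrets.token_bytes(16)
key = pbkdf2_sha256("correct horse battery staple", salt)
print(len(key), key.hex())
```

All of that cost is CPU time with almost no memory, which is exactly the property that lets GPUs and ASICs run many guesses in parallel.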
scrypt, designed by Colin Percival in 2009 and specified in RFC 7914, was the first widely-deployed memory-hard password hash. scrypt requires a configurable amount of memory in addition to CPU work, which forces attackers using GPUs and ASICs to provision the same memory per parallel attack stream. scrypt remains sound and widely used, including in cryptocurrency, where Litecoin and several other Bitcoin alternatives adopted it as their proof-of-work function in place of SHA-256.
Argon2, the winner of the 2015 Password Hashing Competition, is the current state of the art. It comes in three variants: Argon2d (data-dependent memory access, fastest but vulnerable to side channels), Argon2i (data-independent memory access, slower but side-channel-resistant), and Argon2id (a hybrid of the two, recommended by the competition organizers and by RFC 9106). Argon2id at appropriate parameters (memory cost ≥ 64 MB, time cost ≥ 3 iterations, parallelism ≥ 4) is the correct choice for password hashing in new systems.
The general rule for password hashing: never use a fast cryptographic hash (SHA-256, SHA-3, BLAKE2) directly on a password. The speed that makes those hashes good for general use makes them catastrophic for password storage — an attacker with leaked hashes and a GPU can try billions of password guesses per second.
How hash functions and MACs fail in practice
Six recurring failure patterns:
Using broken hashes for security purposes. MD5 and SHA-1 still appear in security-critical contexts in long-tail systems. The mitigation is migration; the obstacle is usually that hash functions appear deep in protocols and binary formats where changing them is structurally hard.
Wrong composition order instead of encrypt-then-MAC or AEAD. Composing a cipher and a MAC in the wrong order (MAC-then-encrypt, as in TLS's original CBC cipher suites) produces protocols that are subtly broken. The Lucky 13 attack against TLS exploited timing differences in MAC-then-encrypt CBC. The mitigation is AEAD constructions that compose encryption and authentication into a single primitive.
Length-extension misuse. Using H(secret || message) as a MAC with a length-extendable hash (SHA-256) is broken; the attacker can extend the message arbitrarily without knowing the secret. The mitigation is HMAC, KMAC, or sponge-based hashes. The pattern still appears in custom protocols whose designers did not read the FIPS notes carefully.
Fast hashes for password storage. SHA-256(password) is not password hashing. SHA-256(salt || password) is also not password hashing. PBKDF2, scrypt, or Argon2 is password hashing. The volume of historical breaches involving plain-hash password storage suggests this lesson keeps having to be relearned.
Truncation without consideration. Truncating a hash output (using only the first 128 bits of a SHA-256 hash, for example) is sometimes done for space reasons, and it reduces the security margin predictably: a t-bit truncation leaves about t/2 bits of collision resistance, so the 128-bit example offers only about 64. Truncating below the security level needed for the application is a quiet failure mode.
Salt handling errors. Reusing salts across users, using empty salts, using salts derived from predictable inputs — all of these reduce the cost of precomputed rainbow-table attacks. The correct pattern is a long random per-user salt stored alongside the hash.
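A sketch of the per-user salt pattern, using hashlib.scrypt because it is the memory-hard function available in Python's standard library (Argon2id, recommended above, needs a third-party package). The cost parameters N = 2^14, r = 8, p = 1 are illustrative, not a tuning recommendation; scrypt's working memory is about 128 · r · N bytes, here 16 MiB per guess. The function names and example passwords are hypothetical.

```python
import hashlib
import hmac
import secrets

N, R, P = 2**14, 8, 1       # illustrative cost; memory ~ 128 * R * N bytes = 16 MiB
MAXMEM = 64 * 1024 * 1024   # allow OpenSSL to allocate that working memory


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P,
                          maxmem=MAXMEM, dklen=32)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash); both are stored in the user's record."""
    salt = secrets.token_bytes(16)  # fresh, random, per user
    return salt, _scrypt(password, salt)


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(_scrypt(password, salt), stored)


# Two users with the same password end up with unrelated stored hashes.
alice = hash_password("hunter2")
bob = hash_password("hunter2")
print(alice[1] != bob[1])
```

Because every record has its own salt, a precomputed table is useless and each user's hash has to be attacked separately.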
Quantum impact
Hash functions, like symmetric ciphers, are far less affected by quantum computing than asymmetric cryptography is. Grover’s algorithm provides a square-root speedup against the preimage problem, reducing the effective preimage resistance of an n-bit hash from n bits to n/2 bits.
The collision-resistance picture is more complex. The Brassard-Høyer-Tapp algorithm finds collisions in roughly 2^(n/3) evaluations, reducing collision resistance from n/2 bits to n/3 bits for an n-bit hash. In practice, the quantum speedup for collision finding is dominated by the memory cost of the algorithm, and the practical security margin is closer to the classical n/2 bound than the theoretical n/3 bound.
The standard recommendation for post-quantum-safe hashing is to use a 384-bit or 512-bit output where 256-bit was previously chosen. SHA-384, SHA-512, SHA3-384, SHA3-512, SHAKE256, and BLAKE2b-512 are all reasonable choices for new deployments concerned about quantum adversaries.
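The arithmetic behind that recommendation, using the bounds stated above and ignoring the memory cost that makes BHT impractical (so the quantum collision column is a pessimistic, theoretical figure):

```latex
\begin{aligned}
&\text{classical:}            && \text{preimage} \approx 2^{n},   && \text{collision} \approx 2^{n/2} \\
&\text{quantum (Grover, BHT):} && \text{preimage} \approx 2^{n/2}, && \text{collision} \approx 2^{n/3} \\
&n = 256:                      && 2^{128},                         && 2^{85.3} \\
&n = 384:                      && 2^{192},                         && 2^{128} \\
&n = 512:                      && 2^{256},                         && 2^{170.7}
\end{aligned}
```

At n = 384, even the theoretical BHT bound stays at 128 bits, which is the usual rationale for moving to the larger outputs.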
Standards and references
- FIPS 180-4 — Secure Hash Standard (SHA-1, SHA-2 family).
- FIPS 202 — SHA-3 Standard (SHA3 hashes and SHAKE XOFs).
- NIST SP 800-185 — SHA-3 derived functions: cSHAKE, KMAC, TupleHash, ParallelHash.
- FIPS 198-1 — The Keyed-Hash Message Authentication Code (HMAC).
- NIST SP 800-38B — CMAC for block-cipher-based MACs.
- NIST SP 800-56C — Recommendation for Key Derivation Through Extraction-then-Expansion.
- RFC 2104 — HMAC: Keyed-Hashing for Message Authentication.
- RFC 5869 — HKDF.
- RFC 7914 — scrypt.
- RFC 9106 — Argon2.
- RFC 8439 — ChaCha20 and Poly1305.
What to actually use in 2026
For new systems, the short answers:
- General-purpose hashing: SHA-256 for FIPS-validated environments and broad compatibility. SHA3-256 (also FIPS-approved, under FIPS 202) for new deployments that prefer the conservative long-term choice. BLAKE3 for performance-critical applications without FIPS requirements.
- High-security hashing (post-quantum safe): SHA-512, SHA3-512, or SHAKE256 with 512-bit output.
- MACs: HMAC-SHA-256 in FIPS environments and for compatibility. KMAC256 when deploying SHA-3 natively. Poly1305 and GHASH only as components of AEAD constructions, never directly.
- Key derivation from high-entropy inputs: HKDF-SHA-256 (HKDF-Extract followed by HKDF-Expand).
- Password hashing: Argon2id with memory cost ≥ 64 MB, time cost ≥ 3 iterations, parallelism ≥ 4. scrypt as an acceptable alternative for systems that already have it. PBKDF2-HMAC-SHA-256 with at least 600,000 iterations as the FIPS-approved fallback.
Avoid: MD5 entirely, SHA-1 entirely, raw-hash MAC constructions on Merkle-Damgård hashes (use HMAC instead), fast hashes for password storage, truncation below the application’s required security level, and any custom hash-based protocol that has not been reviewed by someone familiar with the standards listed above.
The SHA-3 family — and especially the SHAKE XOFs and the KMAC construction built on the same permutation — is the most architecturally satisfying recent development in this corner of cryptography. The sponge construction does several things at once that previously required separate primitives, and it does them with a clean security argument and a single well-analyzed core function. Whether your deployment migrates to SHA-3 in the near term depends mostly on whether FIPS compliance and ecosystem inertia push back. The design is good enough that the migration is worth taking seriously when the opportunity opens.