1 of 43

Cryptographic Hashes and MACs

CS 161 Spring 2025 - Lecture 9

Computer Science 161

2 of 43

Last Time: Block Ciphers and Modes of Operation

  • Block cipher: input a k-bit key and n-bit block, receive n-bit block
    • Not IND-CPA secure (deterministic)
  • Mode of operation (e.g., CBC, CTR): encrypt arbitrary-length plaintext, in an IND-CPA secure way
  • Neither provides any guarantee of integrity or authenticity

2

Computer Science 161

3 of 43

Today: Cryptography Hashes and MACs

  • Hashing
    • Definition
    • Security: one-way, second preimage resistant, collision resistant
    • Examples
    • Length extension attacks
    • Application: Lowest-hash scheme
    • Do hashes provide integrity?
  • Message Authentication Codes (MACs)
    • Definition
    • Security: unforgeability
    • Example: HMAC
    • Do MACs provide integrity?
  • Authenticated Encryption
    • Definition
    • Key Reuse
    • MAC-then-Encrypt or Encrypt-then-MAC?
    • AEAD Encryption Modes

3

Computer Science 161

4 of 43

Cryptographic Hashes

4

Textbook Chapter 7.1–7.3

Computer Science 161

5 of 43

Cryptography Roadmap

  • Hash functions
  • Pseudorandom number generators
  • Public key exchange (e.g. Diffie-Hellman)

5

Symmetric-key

Asymmetric-key

Confidentiality

  • One-time pads
  • Block ciphers with chaining modes (e.g. AES-CBC)
  • RSA encryption
  • ElGamal encryption

Integrity,�Authentication

  • MACs (e.g. HMAC)
  • Digital signatures (e.g. RSA signatures)
  • Key management (certificates)
  • Password management

Computer Science 161

6 of 43

Cryptographic Hash Function: Definition

  • Hash function: H(M)
    • Input: Arbitrary length message M
    • Output: Fixed length, n-bit hash
    • Sometimes written as {0, 1}* → {0, 1}n
  • Properties
    • Correctness: Deterministic
      • Hashing the same input always produces the same output
    • Efficiency: Efficient to compute
    • Security: One-way-ness (“preimage resistance”)
    • Security: Collision-resistance
    • Security: Random/unpredictability, no predictable patterns for how changing the input affects the output
      • Changing one bit in the input causes the output to be completely different
      • Also called “random oracle” assumption

6

Computer Science 161

7 of 43

Hash Function: Intuition

  • A hash function provides a fixed-length “fingerprint” for a sequence of bits
  • Example: Document comparison
    • If Alice and Bob both have a 1 GB document, they can both compute a hash over the document and (securely) communicate the hashes to each other
    • If the hashes are the same, the files must be the same, since they have the same “fingerprint”
    • If the hashes are different, the files must be different

7

Computer Science 161

8 of 43

Hash Function: One-way-ness or Preimage Resistance

  • Informal: Given an output y, it is infeasible to find any input x such that H(x) = y
  • More formally: For every polynomial time adversary,

Pr[x chosen randomly from plaintext space; y = H(x):

Adv(y) outputs x' s.t. H(x') = y] is negligible

  • Intuition: Here’s an output. Can you find an input that hashes to this output?
    • Note: The adversary just needs to find any input, not necessarily the input that was actually used to generate the hash
  • Example: Is H(x) = 42 one-way?
    • No, because given output 42, an attacker can return any number x

8

Computer Science 161

9 of 43

Hash Function: Collision Resistance

  • Collision: Two different inputs with the same output
    • xx' and H(x) = H(x')
    • Can we design a hash function with no collisions?
      • No, because there are more inputs than outputs (pigeonhole principle)
    • However, we want to make finding collisions infeasible for an attacker
  • Collision resistance: It is infeasible to (i.e. no polynomial time attacker can) find any pair of inputs x'x such that H(x) = H(x')
  • Intuition: Can you find any two inputs that collide with the same hash output for any output?

9

Computer Science 161

10 of 43

Hash Function: Collision Resistance

  • Birthday attack: Finding a collision on an n-bit output requires only 2n/2 tries on average
    • This is why a group of 23 people are >50% likely to have at least one birthday in common

10

Computer Science 161

11 of 43

Hash Function: Examples

  • MD5
    • Output: 128 bits
    • Security: Completely broken
  • SHA-1
    • Output: 160 bits
    • Security: Completely broken in 2017
    • Was known to be weak before 2017, but still used sometimes
  • SHA-2
    • Output: 256, 384, or 512 bits (sometimes labeled SHA-256, SHA-384, SHA-512)
    • Not currently broken, but some variants are vulnerable to a length extension attack
    • Current standard
  • SHA-3 (Keccak)
    • Output: 256, 384, or 512 bits
    • Current standard (not meant to replace SHA-2, just a different construction)

11

A GIF that displays its own MD5 hash

Computer Science 161

12 of 43

Length Extension Attacks

  • Length extension attack: Given H(x) and the length of x, but not x, an attacker can create H(x || m) for any m of the attacker’s choosing
    • See homework for a demo
    • Note: This doesn’t violate any property of hash functions but is undesirable in some circumstances
  • SHA-256 (256-bit version of SHA-2) is vulnerable
  • SHA-3 is not vulnerable

12

Computer Science 161

13 of 43

Do hashes provide integrity?

  • It depends how you use them
  • Scenario
    • Mozilla publishes a new version of Firefox on some download servers
    • Alice downloads the program binary
    • How can she be sure that nobody tampered with the program?
  • Idea: use cryptographic hashes
    • Mozilla hashes the program binary and publishes the hash on its website
    • Alice hashes the binary she downloaded and checks that it matches the hash on the website
    • If Alice downloaded a malicious program, the hash would not match (tampering detected!)
    • An attacker can’t create a malicious program with the same hash (one-way property)
  • Threat model: We assume the attacker cannot modify the hash on the website
    • We have integrity, as long as we can communicate the hash securely

13

Computer Science 161

14 of 43

Do hashes provide integrity?

  • It depends how you use them
  • Scenario
    • Alice and Bob want to communicate over an insecure channel
    • Mallory might tamper with messages
  • Idea: Use cryptographic hashes
    • Alice sends her message with a cryptographic hash over the channel
    • Bob receives the message and computes a hash on the message
    • Bob checks that the hash he computed matches the hash sent by Alice
  • Threat model: Mallory can modify the message and the hash
    • No integrity!

14

Alice

Bob

M, H(M)

X

MI

X

H(MI)

Computer Science 161

15 of 43

Do hashes provide integrity?

  • It depends how you use them
  • If the attacker can modify the hash, hashes don’t provide integrity
  • Main issue: Hashes are unkeyed functions
    • There is no secret key being used as input, so any attacker can compute a hash on any value
  • Next: Use hashes to design schemes that provide integrity

15

Computer Science 161

16 of 43

Message Authentication Codes (MACs)

16

Textbook Chapter 8.1–8.3 & 8.5–8.6

Computer Science 161

17 of 43

Cryptography Roadmap

  • Hash functions
  • Pseudorandom number generators
  • Public key exchange (e.g. Diffie-Hellman)

17

Symmetric-key

Asymmetric-key

Confidentiality

  • One-time pads
  • Block ciphers with chaining modes (e.g. AES-CBC)
  • RSA encryption
  • ElGamal encryption

Integrity,�Authentication

  • MACs (e.g. HMAC)
  • Digital signatures (e.g. RSA signatures)
  • Key management (certificates)
  • Password management

Computer Science 161

18 of 43

How to Provide Integrity

  • Reminder: We’re still in the symmetric-key setting
    • Assume that Alice and Bob share a secret key, and attackers don’t know the key
  • We want to attach some piece of information to prove that someone with the key sent this message
    • This piece of information can only be generated by someone with the key

18

Computer Science 161

19 of 43

MACs: Usage

  • Alice wants to send M to Bob, but doesn’t want Mallory to tamper with it
  • Alice sends M and T = MAC(K, M) to Bob
  • Bob receives M and T
  • Bob computes MAC(K, M) and checks that it matches T
  • If the MACs match, Bob is confident the message has not been tampered with (integrity)

19

Message

Key

MAC

Message

Key

Verify

Message

Alice

Bob

Insecure Channel

T

Computer Science 161

20 of 43

MACs: Definition

  • Two parts:
    • KeyGen() → K: Generate a key K
    • MAC(K, M) → T: Generate a tag T for the message M using key K
      • Inputs: A secret key and an arbitrary-length message
      • Output: A fixed-length tag on the message
  • Properties
    • Correctness: Determinism
      • Note: Some more complicated MAC schemes have an additional Verify(K, M, T) function that don’t require determinism, but this is out of scope
    • Efficiency: Computing a MAC should be efficient
    • Security: EU-CPA (existentially unforgeable under chosen plaintext attack)

20

Computer Science 161

21 of 43

Defining Integrity: EU-CPA

  • A secure MAC is existentially unforgeable: without the key, an attacker cannot create a valid tag on a message
    • Mallory cannot generate MAC(K, M') without K
    • Mallory cannot find any M'M such that MAC(K, M') = MAC(K, M)
  • Formally defined by a security game: existential unforgeability under chosen-plaintext attack, or EU-CPA
  • MACs should be unforgeable under chosen plaintext attack
    • Intuition: Like IND-CPA, but for integrity and authenticity
    • Even if Mallory can trick Alice into creating MACs for messages that Mallory chooses, Mallory cannot create a valid MAC on a message that Alice hasn't sent before

21

Computer Science 161

22 of 43

Defining Integrity: EU-CPA

  1. Mallory may send messages to Alice and receive their tags
  2. Eventually, Mallory creates a message-tag pair (M', T')
    • M' cannot be a message that Mallory requested earlier
    • If T' is a valid tag for M', then Mallory wins. Otherwise, she loses.
    • A scheme is EU-CPA secure if for all polynomial time adversaries, the probability of winning is 0 or negligible

22

M

MAC(K, M)

(repeat)

Alice (challenger)

Mallory (adversary)

Output (M', T')

Computer Science 161

23 of 43

MACs: Usage

  • Alice wants to send M to Bob, but doesn’t want Mallory to tamper with it
  • Alice sends M and T = MAC(K, M) to Bob
  • Bob receives M and T
  • Bob computes MAC(K, M) and checks that it matches T
  • If the MACs match, Bob is confident the message has not been tampered with (integrity)

23

Message

Key

MAC

Message

Key

Verify

Message

Alice

Bob

Insecure Channel

T

Computer Science 161

24 of 43

Example: NMAC

  • Can we use secure cryptographic hashes to build a secure MAC?
    • Intuition: Hash output is unpredictable and looks random, so let’s hash the key and the message together
  • KeyGen():
    • Output two random, n-bit keys K1 and K2, where n is the length of the hash output
  • NMAC(K1, K2, M):
    • Output H(K1 || H(K2 || M))
  • NMAC is EU-CPA secure
    • Provably secure if the underlying hash function is secure
  • Intuition: Using two hashes prevents a length extension attack
    • Otherwise, an attacker who sees a tag for M could generate a tag for M || M'

24

Computer Science 161

25 of 43

Example: HMAC

  • Issues with NMAC:
    • Recall: NMAC(K1, K2, M) = H(K1 || H (K2 || M))
    • We need two keys
    • NMAC requires the keys to be the same length as the hash output (n bits)
    • Can we use NMAC to design a scheme that uses one key?
  • HMAC(K, M):
    • Compute K' as a version of K that is the length of the hash output
      • If K is too short, pad K with 0’s to make it n bits (be careful with keys that are too short and lack randomness)
      • If K is too long, hash it so it’s n bits
    • Output H((K' ⊕ opad) || H((K' ⊕ ipad) || M))

25

Computer Science 161

26 of 43

Example: HMAC

  • HMAC(K, M):
    • Compute K' as a version of K that is the length of the hash output
      • If K is too short, pad K with 0’s to make it n bits (be careful with keys that are too short and lack randomness)
      • If K is too long, hash it so it’s n bits
    • Output H((K' ⊕ opad) || H((K' ⊕ ipad) || M))
  • Use K' to derive two different keys
    • opad (outer pad) is the hard-coded byte 0x5c repeated until it’s the same length as K'
    • ipad (inner pad) is the hard-coded byte 0x36 repeated until it’s the same length as K'
    • As long as opad and ipad are different, you’ll get two different keys
    • For paranoia, the designers chose two very different bit patterns, even though they theoretically need only differ in one bit

26

Computer Science 161

27 of 43

HMAC Properties

  • HMAC(K, M) = H((K' ⊕ opad) || H((K' ⊕ ipad) || M))
  • HMAC is a hash function, so it has the properties of the underlying hash too
    • It is collision resistant
    • Given HMAC(K, M) and K, an attacker can’t learn M
    • If the underlying hash is secure, HMAC doesn’t reveal M, but it is still deterministic
  • You can’t verify a tag T if you don’t have K
    • This means that an attacker can’t brute-force the message M without knowing K

27

Computer Science 161

28 of 43

Do MACs provide integrity?

  • Do MACs provide integrity?
    • Yes. An attacker cannot tamper with the message without being detected
  • Do MACs provide authenticity?
    • If a message has a valid MAC, you can be sure it came from someone with the secret key, but you can’t narrow it down to one person
    • If only two people have the secret key, MACs provide authenticity: it has a valid MAC, and it’s not from me, so it must be from the other person
  • Do MACs provide confidentiality?
    • MACs are deterministic ⇒ No IND-CPA security
    • MACs in general have no confidentiality guarantees; they can leak information about the message
      • HMAC doesn’t leak information about the message, but it’s still deterministic, so it’s not IND-CPA secure

28

Computer Science 161

29 of 43

Authenticated Encryption

29

Textbook Chapter 8.7 & 8.8

Computer Science 161

30 of 43

Cryptography Roadmap

  • Hash functions
  • Pseudorandom number generators
  • Public key exchange (e.g. Diffie-Hellman)

30

Symmetric-key

Asymmetric-key

Confidentiality

  • One-time pads
  • Block ciphers with chaining modes (e.g. AES-CBC)
  • RSA encryption
  • ElGamal encryption

Integrity,�Authentication

  • MACs (e.g. HMAC)
  • Digital signatures (e.g. RSA signatures)
  • Key management (certificates)
  • Password management

Computer Science 161

31 of 43

Authenticated Encryption: Definition

  • Authenticated encryption (AE): A scheme that simultaneously guarantees confidentiality and integrity (and authenticity, depending on how it is used) on a message
  • Two ways of achieving authenticated encryption:
    • Combine schemes that provide confidentiality with schemes that provide integrity
    • Use a single scheme that is designed to provide both confidentiality and integrity

31

Computer Science 161

32 of 43

Combining Schemes: Let’s design it together

  • You can use:
    • An IND-CPA encryption scheme (e.g. AES-CBC): Enc(K, M) and Dec(K, M)
    • An unforgeable MAC scheme (e.g. HMAC): MAC(K, M)
  • First attempt: Alice sends Enc(K1, M) and MAC(K2, M)
    • Integrity? Yes, attacker can’t generate a valid MAC on a modified message
    • Confidentiality? No, the MAC is not IND-CPA secure
  • Idea 1: Let’s compute the MAC on the ciphertext instead of the plaintext:�Enc(K1, M) and MAC(K2, Enc(K1, M))
    • Integrity? Yes, attacker can’t generate a valid MAC on a modified message
    • Confidentiality? Yes, the MAC might leak info about the ciphertext, but that’s okay
  • Idea 2: Let’s encrypt the MAC: Enc(K1, M || MAC(K2, M))
    • Integrity? Yes, attacker can’t generate a valid MAC on a modified message
    • Confidentiality? Yes, everything is encrypted

32

Computer Science 161

33 of 43

MAC-then-Encrypt or Encrypt-then-MAC?

  • Encrypt-then-MAC
    • First compute Enc(K1, M)
    • Then MAC the ciphertext: MAC(K2, Enc(K1, M))
  • MAC-then-encrypt
    • First compute MAC(K2, M)
    • Then encrypt the message and the MAC together: Enc(K1, M || MAC(K2, M))
  • Which is better?
    • In theory, both are IND-CPA and EU-CPA secure if applied properly
    • MAC-then-encrypt has a flaw: You don’t know if tampering has occurred until after decrypting
      • Attacker can supply arbitrary tampered input, and you always have to decrypt it
      • Passing attacker-chosen input through the decryption function can cause side-channel leaks
  • Always use encrypt-then-MAC because it’s more robust to mistakes

33

Computer Science 161

34 of 43

Key Reuse

  • Key reuse: Using the same key in two different use cases
    • Note: Using the same key multiple times for the same use (e.g. computing HMACs on different messages in the same context with the same key) is not key reuse
  • Reusing keys can cause the underlying algorithms to interfere with each other and affect security guarantees
    • Example: If you use a block-cipher-based MAC algorithm and a block cipher chaining mode, the underlying block ciphers may no longer be secure
    • Thinking about these attacks is hard

34

Lecture Version Definition

  • Encrypt-then-MAC
    • First compute Enc(K1, M)
    • Then MAC the ciphertext: MAC(K2, Enc(K1, M))

Enc(K, M) MAC(K, Enc(K, M))

Computer Science 161

35 of 43

Key Reuse

  • Simplest solution: Do not reuse keys! One key per use.
    • Encrypt a piece of data and MAC a piece of data?
      • Different use (purpose); different key
    • MAC one of Alice’s messages to Bob and MAC one of Bob’s messages to Alice?
      • Different use (purpose); different key
    • Encrypt one of Alice’s files and encrypt another one of Alice’s files?
      • Same use (purpose); this is fine
    • Encrypt user metadata, encrypt file metadata, and encrypt file data?
      • You’ll have to think about this in Project 2!

35

Computer Science 161

36 of 43

TLS 1.0 “Lucky 13” Attack

  • TLS: A protocol for sending encrypted and authenticated messages over the Internet (we’ll study it more in the networking unit)
  • TLS 1.0 uses MAC-then-encrypt: Enc(K1, M || MAC(K2, M))
    • The encryption algorithm is AES-CBC
  • The Lucky 13 attack abuses MAC-then-encrypt to read encrypted messages
    • Guess a byte of plaintext and change the ciphertext accordingly
    • The MAC will error, but the time it takes to error is different depending on if the guess is correct
    • Attacker measures how long it takes to error in order to learn information about plaintext
    • TLS will send the message again if the MAC errors, so the attacker can guess repeatedly
  • Takeaways
    • Side channel attack: The algorithm is proved secure, but poor implementation made it vulnerable
    • Always encrypt-then-MAC
    • You’ll try a similar attack in Homework 2!

36

Computer Science 161

37 of 43

AEAD Encryption

  • Second method for authenticated encryption: Use a scheme that is designed to provide confidentiality, integrity, and authenticity
  • Authenticated encryption with additional data (AEAD): An algorithm that provides both confidentiality and integrity over the plaintext and integrity over additional data
    • Additional data is usually context (e.g. memory address), so you can’t change the context without breaking the MAC
  • Great: No more worrying about MAC-then-encrypt

37

Computer Science 161

38 of 43

AEAD Example: Galois Counter Mode (GCM)

  • Galois Counter Mode (GCM): An AEAD block cipher mode of operation
  • EK is standard block cipher encryption
  • multH is 128-bit multiplication over a special field (Galois multiplication)
    • Don’t worry about the math

38

Computer Science 161

39 of 43

AEAD Example: Galois Counter Mode (GCM)

  • Very fast mode of operation
    • Fully parallel encryption
    • Galois multiplication isn’t parallelizable, but it’s very fast
  • Drawbacks
    • IV reuse leads to loss of confidentiality, integrity, and authentication
    • This wouldn’t happen if you used AES-CTR and HMAC-SHA256
    • Implementing Galois implementation is difficult and easy to screw up
  • Takeaway: GCM provides integrity and confidentiality, but if you misuse it, it’s even worse than CTR mode

39

Computer Science 161

40 of 43

Hashes: Summary

  • Map arbitrary-length input to fixed-length output
  • Output is deterministic and unpredictable
  • Security properties
    • One way: Given an output y, it is infeasible to find any input x such that H(x) = y.
    • Second preimage resistant: Given an input x, it is infeasible to find another input x'x such that H(x) = H(x').
    • Collision resistant: It is infeasible to find another any pair of inputs x'x such that H(x) = H(x').
  • Some hashes are vulnerable to length extension attacks (H(M) → H(M||M’))
  • Hashes don’t provide integrity (unless you can publish the hash securely)

40

Computer Science 161

41 of 43

MACs: Summary

  • Inputs: a secret key and a message
  • Output: a tag on the message
  • A secure MAC is unforgeable: Even if Mallory can trick Alice into creating MACs for messages that Mallory chooses, Mallory cannot create a valid MAC on a message that she hasn't seen before
    • Example: HMAC(K, M) = H((K' ⊕ opad) || H((K' ⊕ ipad) || M))
  • MACs do not provide confidentiality

41

Computer Science 161

42 of 43

Authenticated Encryption: Summary

  • Authenticated encryption: A scheme that simultaneously guarantees confidentiality and integrity (and authenticity) on a message
  • First approach: Combine schemes that provide confidentiality with schemes that provide integrity and authenticity
    • MAC-then-encrypt: Enc(K1, M || MAC(K2, M))
    • Encrypt-then-MAC: Enc(K1, M) || MAC(K2, Enc(K1, M))
    • Always use Encrypt-then-MAC because it's more robust to mistakes
  • Second approach: Use AEAD encryption modes designed to provide confidentiality, integrity, and authenticity

42

Computer Science 161

43 of 43

Next Time

  • Symmetric-key encryption schemes need randomness. How do we securely generate random numbers?
  • When discussing symmetric-key schemes, we assumed Alice and Bob managed to share a secret key. How can Alice and Bob share a symmetric key over an insecure channel?

43

Computer Science 161