SE205 - Unit 4: Message Authentication and Hash Functions

Glossary of Terms

The process of verifying the identity of a user, process, or device, often as a prerequisite to granting access to resources. Also refers to verifying the source and integrity of data.

A type of cryptanalytic attack based on the mathematics of the birthday problem. It makes finding a collision for a hash function easier than brute-force, requiring roughly 2n/2 attempts for an n-bit hash.

A unique identifier for a block in a blockchain, generated by hashing the block's header content. Acts as a digital fingerprint and links blocks together.

A distributed, immutable digital ledger that stores records (transactions) in blocks, chronologically linked using cryptography (hashes).

A mode of operation for block ciphers where each block of plaintext is XORed with the previous ciphertext block before being encrypted. Can be used as a basis for MAC algorithms (e.g., CMAC).

A type of Message Authentication Code (MAC) algorithm based on a symmetric block cipher (like AES), rather than a hash function.

Occurs when two different input messages produce the exact same hash output value. Cryptographic hash functions aim to be collision-resistant.

A property of cryptographic hash functions. Strong collision resistance means it's computationally infeasible to find *any* two different inputs M1, M2 such that H(M1) = H(M2). Weak collision resistance (or second preimage resistance) means given an input M1, it's infeasible to find a different M2 such that H(M1) = H(M2).

The assurance that digital information has not been altered in an unauthorized way since it was created, transmitted, or stored. Hash functions and MACs are primary tools for ensuring this.

A cryptographic mechanism used to verify the authenticity (origin), integrity (unaltered content), and non-repudiation (sender cannot deny sending) of digital data. Typically created by hashing the data and encrypting the hash with the sender's private key.

A mathematical algorithm that takes an arbitrary block of data (input message) and returns a fixed-size bit string (the hash value or message digest), such that any change to the data will (with high probability) change the hash value. Key properties include being one-way and collision-resistant.

A specific type of Message Authentication Code (MAC) involving a cryptographic hash function (like SHA-256) in combination with a secret cryptographic key.

The cryptographic algorithm based on the sponge construction that was selected as the winner of the NIST hash function competition and became the SHA-3 standard.

Another term for a Message Authentication Code (MAC), emphasizing that the hash calculation incorporates a secret key shared between the parties.

A short piece of information generated using a secret key and a cryptographic algorithm (either hash-based like HMAC or cipher-based like CMAC) applied to a message. It verifies both the data integrity and authenticity of the message.

MAC = f(Key, Message)

A hash contained in a block header representing the hash of all transactions within that block, constructed using a Merkle Tree. Allows for efficient verification of transaction inclusion.

A common structure for building cryptographic hash functions (like MD5, SHA-1, SHA-2). It processes the input message in fixed-size blocks iteratively using a compression function.

A service used to verify the integrity of a message and, often, its origin. Ensures the message received is exactly as sent and came from the purported source.

The fixed-size output value of a hash function. Also called a hash value.

"Number used once." In Proof-of-Work blockchains (like Bitcoin), it's a value in the block header that miners repeatedly change and re-hash until they find a hash value meeting the network's difficulty target.

A property of cryptographic hash functions: given a hash output h, it is computationally infeasible to find any input x such that H(x) = h.

For a hash function H and an output h, an input x such that H(x) = h is called a preimage of h.

A blockchain consensus mechanism where block creators (validators) are chosen based on the number of coins they hold/stake. An alternative to Proof-of-Work.

A blockchain consensus mechanism requiring participants (miners) to solve a computationally intensive puzzle (finding a valid nonce) to validate transactions and create new blocks. Used by Bitcoin.

A family of cryptographic hash functions published by NIST. Includes SHA-1, SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512), and SHA-3.

Produces a 160-bit hash value. Widely used previously but now considered insecure due to practical collision attacks ("broken" in 2017).

A family of hash functions including SHA-256 (256-bit hash) and SHA-512 (512-bit hash). Based on Merkle–Damgård construction. Still considered secure.

The latest SHA standard, based on the Keccak algorithm and the sponge construction. Designed as an alternative to SHA-2, with a different internal structure.

A mode of operation for cryptographic functions, used in Keccak (SHA-3). It takes an input ("absorbing" phase) and produces an output ("squeezing" phase) of variable lengths, acting like a sponge.

Key Concepts & Examples from Unit 4

Authentication Overview

Why Authenticate?

  • Ensures message integrity (data hasn't been altered).
  • Verifies sender identity (data is from the claimed source).
  • Prevents replay attacks (prevents capturing and resending old messages).

Threats Countered: Disclosure, Masquerade, Content Modification, Sequence Modification.

Types of Authentication Functions:

  1. Message Encryption (Symmetric/Asymmetric - provides some auth but primarily confidentiality).
  2. Message Authentication Code (MAC).
  3. Hash Functions.

Hash Functions

A core tool for ensuring data integrity and enabling authentication mechanisms.

  • Takes an arbitrary-length input M.
  • Produces a fixed-length output h = H(M) (hash value or message digest).
  • Purpose: Data Integrity. Also used for password storage, secure authentication, file organization.
  • Based on computationally infeasible properties:
    • One-Way Property (Preimage Resistance): Given h, hard to find M such that H(M)=h.
    • Second Preimage Resistance (Weak Collision Resistance): Given M1, hard to find M2 ≠ M1 such that H(M1)=H(M2).
    • Collision Resistance (Strong Collision Resistance): Hard to find *any* pair M1 ≠ M2 such that H(M1)=H(M2).
Hashing Process Diagram
General Hashing Process

Hash Function Requirements

RequirementDescription
Variable input sizeCan be applied to data block of any size.
Fixed output sizeProduces a fixed-length output (hash).
EfficiencyH(x) is relatively easy/fast to compute.
Preimage resistant (one-way)Given h, infeasible to find x such that H(x)=h.
Second preimage resistant (weak collision)Given x, infeasible to find y≠x with H(y)=H(x).
Collision resistant (strong collision)Infeasible to find any pair (x, y) with x≠y such that H(x)=H(y).
PseudorandomnessOutput meets standard tests for pseudorandomness.

Attacks on Hash Functions

  • Brute-Force Attacks: Depend on hash output length (n).
    • To find a preimage or second preimage: Requires ~2n operations.
    • To find a collision (Birthday Attack): Requires only ~2n/2 operations. This is often the determining factor for security strength.
  • Cryptanalysis: Exploits specific weaknesses in the hash algorithm itself (e.g., collisions found in MD5 and SHA-1).

Basic Hash Structure (Merkle–Damgård)

Iterative process using a compression function 'f':

  1. Append Padding & Length: Message padded to be multiple of block size (b). Length of original message often appended.
  2. Initialize: Start with an initial hash value IV (CV0).
  3. Process Blocks: For each message block Yi: CVi = f(CVi-1, Yi). The output of one stage feeds into the next.
  4. Final Hash: The last output CVL is the hash value H(M).
Merkle-Damgard Construction Diagram
Merkle–Damgård Structure

Applications of Hash Functions

Message Authentication (using Hash only)

Basic method: Sender computes h=H(M), sends {M, h}. Receiver recomputes H(M') and compares with received h.

Vulnerable to Man-in-the-Middle attack if channel isn't secure (Attacker intercepts {M,h}, replaces with {M', h'=H(M')}).

Hash for Integrity Check
Using Hash Function to Check Data Integrity

Improved Methods (Combining with Encryption/Secrets):

  • Encrypt Message + Hash: E(K, [M || H(M)]) - Provides confidentiality and integrity. (Slide 12a)
  • Encrypt Hash Only: M || E(K, H(M)) - Efficient for large messages, provides integrity but not confidentiality. (Slide 12b)
  • Hashed Secret Value (like MAC): M || H(M || S) - Uses a shared secret S. Provides integrity/authentication, no encryption needed. (Slide 12c)
  • Encrypted Hashed Secret Value: E(K, [M || H(M || S)]) - Provides confidentiality and integrity/authentication. (Slide 12d)

Message Authentication Codes (MACs)

A keyed hash function, provides integrity and authenticity.

MAC = CK(M) = f(K, M)

Requires sender and receiver to share a secret key K.

  • Sender computes MAC = CK(M), sends {M, MAC}.
  • Receiver computes MAC' = CK(M'), compares MAC' with received MAC.
MAC Process Diagram
MAC Generation and Verification

Types:

  • HMAC: Based on a cryptographic hash function (e.g., HMAC-SHA256).
  • CMAC: Based on a block cipher (e.g., AES-CMAC).

Must also protect against replay attacks (e.g., include timestamp or sequence number in M).

Digital Signatures

Provides integrity, authenticity, and non-repudiation.

Uses asymmetric cryptography:

  • Sender computes h = H(M).
  • Sender encrypts hash with their Private Key: Sig = E(KRsender, h).
  • Sender transmits {M, Sig}.
  • Receiver computes h' = H(M').
  • Receiver decrypts signature using sender's Public Key: h'' = D(KUsender, Sig).
  • Receiver compares h' and h''. If match, message is authentic and intact.
Digital Signature Creation and Verification
Digital Signature Creation and Verification

Other Uses

  • Password Storage: Store H(password) instead of password. Compare H(entered_password) with stored hash.
  • Intrusion/Virus Detection: Store hashes of critical files. Periodically recompute hashes to detect modifications.
  • Pseudorandom Function (PRF) / Number Generator (PRNG): Hashing can be used to construct PRFs or PRNGs.

Secure Hash Algorithm (SHA) Family

Developed by NIST.

  • SHA-1: 160-bit hash. Widely used but now considered insecure (collisions found).
  • SHA-2: Family including SHA-224, SHA-256, SHA-384, SHA-512. Uses Merkle–Damgård structure. Still considered secure. SHA-512 processes 1024-bit message blocks through 80 rounds.
  • SHA-3: Based on Keccak algorithm and Sponge Construction. Different internal structure than SHA-1/SHA-2. Offers similar hash sizes (224, 256, 384, 512) plus extendable output functions (SHAKE).
SHA-512 Message Digest Generation
SHA-512 Message Digest Generation Overview

Hashing in Blockchain

  • Blockchain is a distributed ledger of blocks linked chronologically.
  • Each block contains data (transactions) and a Block Header.
  • Block Header includes: Timestamp, Merkle Root (hash of all transactions), Nonce, and crucially, the Hash of the Previous Block's Header.
  • This previous block hash creates the "chain", ensuring immutability (changing a block invalidates subsequent hashes).
  • Proof-of-Work (PoW): Miners compete to find a Nonce such that H(Block Header) meets a difficulty target. Requires computational effort, secures the network (e.g., Bitcoin uses double SHA-256).
  • Proof-of-Stake (PoS): Validators are chosen to create blocks based on staked currency, avoids high energy consumption of PoW.
  • 51% Attack: If an attacker controls >50% of mining power (PoW) or stake (PoS), they can potentially rewrite recent history or double-spend. Harder on large networks.
Blockchain Hash Chaining
Blocks Linked by Hashes

Fill in the Blank Questions

True/False Questions

Multiple Choice Questions