SE205 - Unit 4: Message Authentication and Hash Functions

Glossary of Terms

The process of verifying the identity of a user, process, or device, often as a prerequisite to granting access to resources. Also refers to verifying the source and integrity of data.

A type of cryptanalytic attack based on the mathematics of the birthday problem. It makes finding a collision for a hash function easier than brute-force, requiring roughly 2^n/2 attempts for an n-bit hash.

A unique identifier for a block in a blockchain, generated by hashing the block's header content. Acts as a digital fingerprint and links blocks together.

A distributed, immutable digital ledger that stores records (transactions) in blocks, chronologically linked using cryptography (hashes).

A mode of operation for block ciphers where each block of plaintext is XORed with the previous ciphertext block before being encrypted. Can be used as a basis for MAC algorithms (e.g., CMAC).

A type of Message Authentication Code (MAC) algorithm based on a symmetric block cipher (like AES), rather than a hash function.

Occurs when two different input messages produce the exact same hash output value. Cryptographic hash functions aim to be collision-resistant.

A property of cryptographic hash functions. Strong collision resistance means it's computationally infeasible to find *any* two different inputs M1, M2 such that H(M1) = H(M2). Weak collision resistance (or second preimage resistance) means given an input M1, it's infeasible to find a different M2 such that H(M1) = H(M2).

The assurance that digital information has not been altered in an unauthorized way since it was created, transmitted, or stored. Hash functions and MACs are primary tools for ensuring this.

A cryptographic mechanism used to verify the authenticity (origin), integrity (unaltered content), and non-repudiation (sender cannot deny sending) of digital data. Typically created by hashing the data and encrypting the hash with the sender's private key.

A mathematical algorithm that takes an arbitrary block of data (input message) and returns a fixed-size bit string (the hash value or message digest), such that any change to the data will (with high probability) change the hash value. Key properties include being one-way and collision-resistant.

A specific type of Message Authentication Code (MAC) involving a cryptographic hash function (like SHA-256) in combination with a secret cryptographic key.

The cryptographic algorithm based on the sponge construction that was selected as the winner of the NIST hash function competition and became the SHA-3 standard.

Another term for a Message Authentication Code (MAC), emphasizing that the hash calculation incorporates a secret key shared between the parties.

A short piece of information generated using a secret key and a cryptographic algorithm (either hash-based like HMAC or cipher-based like CMAC) applied to a message. It verifies both the data integrity and authenticity of the message.

MAC = f(Key, Message)

A hash contained in a block header representing the hash of all transactions within that block, constructed using a Merkle Tree. Allows for efficient verification of transaction inclusion.

A common structure for building cryptographic hash functions (like MD5, SHA-1, SHA-2). It processes the input message in fixed-size blocks iteratively using a compression function.

A service used to verify the integrity of a message and, often, its origin. Ensures the message received is exactly as sent and came from the purported source.

The fixed-size output value of a hash function. Also called a hash value.

"Number used once." In Proof-of-Work blockchains (like Bitcoin), it's a value in the block header that miners repeatedly change and re-hash until they find a hash value meeting the network's difficulty target.

A property of cryptographic hash functions: given a hash output h, it is computationally infeasible to find any input x such that H(x) = h.

For a hash function H and an output h, an input x such that H(x) = h is called a preimage of h.

A blockchain consensus mechanism where block creators (validators) are chosen based on the number of coins they hold/stake. An alternative to Proof-of-Work.

A blockchain consensus mechanism requiring participants (miners) to solve a computationally intensive puzzle (finding a valid nonce) to validate transactions and create new blocks. Used by Bitcoin.

A family of cryptographic hash functions published by NIST. Includes SHA-1, SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512), and SHA-3.

Produces a 160-bit hash value. Widely used previously but now considered insecure due to practical collision attacks ("broken" in 2017).

A family of hash functions including SHA-256 (256-bit hash) and SHA-512 (512-bit hash). Based on Merkle–Damgård construction. Still considered secure.

The latest SHA standard, based on the Keccak algorithm and the sponge construction. Designed as an alternative to SHA-2, with a different internal structure.

A mode of operation for cryptographic functions, used in Keccak (SHA-3). It takes an input ("absorbing" phase) and produces an output ("squeezing" phase) of variable lengths, acting like a sponge.

Key Concepts & Examples from Unit 4

Authentication Overview

Why Authenticate?

Ensures message integrity (data hasn't been altered).
Verifies sender identity (data is from the claimed source).
Prevents replay attacks (prevents capturing and resending old messages).

Threats Countered: Disclosure, Masquerade, Content Modification, Sequence Modification.

Types of Authentication Functions:

Message Encryption (Symmetric/Asymmetric - provides some auth but primarily confidentiality).
Message Authentication Code (MAC).
Hash Functions.

Hash Functions

A core tool for ensuring data integrity and enabling authentication mechanisms.

Takes an arbitrary-length input M.
Produces a fixed-length output h = H(M) (hash value or message digest).
Purpose: Data Integrity. Also used for password storage, secure authentication, file organization.
Based on computationally infeasible properties:
- One-Way Property (Preimage Resistance): Given h, hard to find M such that H(M)=h.
- Second Preimage Resistance (Weak Collision Resistance): Given M1, hard to find M2 ≠ M1 such that H(M1)=H(M2).
- Collision Resistance (Strong Collision Resistance): Hard to find *any* pair M1 ≠ M2 such that H(M1)=H(M2).

General Hashing Process

Hash Function Requirements

Requirement	Description
Variable input size	Can be applied to data block of any size.
Fixed output size	Produces a fixed-length output (hash).
Efficiency	H(x) is relatively easy/fast to compute.
Preimage resistant (one-way)	Given h, infeasible to find x such that H(x)=h.
Second preimage resistant (weak collision)	Given x, infeasible to find y≠x with H(y)=H(x).
Collision resistant (strong collision)	Infeasible to find any pair (x, y) with x≠y such that H(x)=H(y).
Pseudorandomness	Output meets standard tests for pseudorandomness.

Attacks on Hash Functions

Brute-Force Attacks: Depend on hash output length (n).
- To find a preimage or second preimage: Requires ~2ⁿ operations.
- To find a collision (Birthday Attack): Requires only ~2^n/2 operations. This is often the determining factor for security strength.
Cryptanalysis: Exploits specific weaknesses in the hash algorithm itself (e.g., collisions found in MD5 and SHA-1).

Basic Hash Structure (Merkle–Damgård)

Iterative process using a compression function 'f':

Append Padding & Length: Message padded to be multiple of block size (b). Length of original message often appended.
Initialize: Start with an initial hash value IV (CV₀).
Process Blocks: For each message block Y_i: CV_i = f(CV_i-1, Y_i). The output of one stage feeds into the next.
Final Hash: The last output CV_L is the hash value H(M).

Merkle–Damgård Structure

Applications of Hash Functions

Message Authentication (using Hash only)

Basic method: Sender computes h=H(M), sends {M, h}. Receiver recomputes H(M') and compares with received h.

Vulnerable to Man-in-the-Middle attack if channel isn't secure (Attacker intercepts {M,h}, replaces with {M', h'=H(M')}).

Using Hash Function to Check Data Integrity

Improved Methods (Combining with Encryption/Secrets):

Encrypt Message + Hash: E(K, [M || H(M)]) - Provides confidentiality and integrity. (Slide 12a)
Encrypt Hash Only: M || E(K, H(M)) - Efficient for large messages, provides integrity but not confidentiality. (Slide 12b)
Hashed Secret Value (like MAC): M || H(M || S) - Uses a shared secret S. Provides integrity/authentication, no encryption needed. (Slide 12c)
Encrypted Hashed Secret Value: E(K, [M || H(M || S)]) - Provides confidentiality and integrity/authentication. (Slide 12d)

Message Authentication Codes (MACs)

A keyed hash function, provides integrity and authenticity.

MAC = C_K(M) = f(K, M)

Requires sender and receiver to share a secret key K.

Sender computes MAC = C_K(M), sends {M, MAC}.
Receiver computes MAC' = C_K(M'), compares MAC' with received MAC.

MAC Generation and Verification

Types:

HMAC: Based on a cryptographic hash function (e.g., HMAC-SHA256).
CMAC: Based on a block cipher (e.g., AES-CMAC).

Must also protect against replay attacks (e.g., include timestamp or sequence number in M).

Digital Signatures

Provides integrity, authenticity, and non-repudiation.

Uses asymmetric cryptography:

Sender computes h = H(M).
Sender encrypts hash with their Private Key: Sig = E(KR_sender, h).
Sender transmits {M, Sig}.
Receiver computes h' = H(M').
Receiver decrypts signature using sender's Public Key: h'' = D(KU_sender, Sig).
Receiver compares h' and h''. If match, message is authentic and intact.

Digital Signature Creation and Verification

Other Uses

Password Storage: Store H(password) instead of password. Compare H(entered_password) with stored hash.
Intrusion/Virus Detection: Store hashes of critical files. Periodically recompute hashes to detect modifications.
Pseudorandom Function (PRF) / Number Generator (PRNG): Hashing can be used to construct PRFs or PRNGs.

Secure Hash Algorithm (SHA) Family

Developed by NIST.

SHA-1: 160-bit hash. Widely used but now considered insecure (collisions found).
SHA-2: Family including SHA-224, SHA-256, SHA-384, SHA-512. Uses Merkle–Damgård structure. Still considered secure. SHA-512 processes 1024-bit message blocks through 80 rounds.
SHA-3: Based on Keccak algorithm and Sponge Construction. Different internal structure than SHA-1/SHA-2. Offers similar hash sizes (224, 256, 384, 512) plus extendable output functions (SHAKE).

SHA-512 Message Digest Generation Overview

Hashing in Blockchain

Blockchain is a distributed ledger of blocks linked chronologically.
Each block contains data (transactions) and a Block Header.
Block Header includes: Timestamp, Merkle Root (hash of all transactions), Nonce, and crucially, the Hash of the Previous Block's Header.
This previous block hash creates the "chain", ensuring immutability (changing a block invalidates subsequent hashes).
Proof-of-Work (PoW): Miners compete to find a Nonce such that H(Block Header) meets a difficulty target. Requires computational effort, secures the network (e.g., Bitcoin uses double SHA-256).
Proof-of-Stake (PoS): Validators are chosen to create blocks based on staked currency, avoids high energy consumption of PoW.
51% Attack: If an attacker controls >50% of mining power (PoW) or stake (PoS), they can potentially rewrite recent history or double-spend. Harder on large networks.

Blocks Linked by Hashes