We said integrity means detecting if data was tampered with. So how do you check whether a message arrived exactly as it was sent?
Imagine you download a file from the internet. How do you know the file wasn't corrupted during transfer? Or worse, how do you know someone didn't swap it with a malicious version?
You could compare the file byte by byte with the original. But you don't have the original. That's the whole point: you're downloading it because you don't have it yet.
You need a way to create a short "fingerprint" of the data that you can compare. If the fingerprint matches, the data is intact. If it doesn't, something changed.
A hash function takes any input, no matter how large, and produces a fixed-size output. Think of it as a fingerprint machine. You feed in a document, and it spits out a fingerprint that is, for all practical purposes, unique to that document.
The input can be anything: a single character, a paragraph, an entire movie file. The output is always the same size. For SHA-256, the output is always 256 bits (64 hexadecimal characters).
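To make the fixed-size property concrete, here is a minimal sketch using Python's standard `hashlib` module. The three sample inputs are arbitrary choices for illustration, not anything prescribed by SHA-256.

```python
import hashlib

# Inputs of very different sizes: one character, a sentence, and 1 MB of zero bytes.
inputs = [
    b"a",
    b"The quick brown fox jumps over the lazy dog",
    bytes(1_000_000),
]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # The digest length does not depend on the input length.
    print(f"{len(data):>8} bytes in -> {len(digest)} hex characters out")
```

Each line of output reports 64 hex characters, which is 256 bits, whether the input was one byte or a million.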
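Here's what makes a good hash function: the same input always produces the same output, a tiny change to the input changes the output completely, the output can't feasibly be reversed to recover the input, and it's infeasible to find two different inputs that produce the same output.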
SHA-256 is the most widely used hash function today. Here's what it looks like:
Input: "hello"
Output: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Input: "hellp"
Output: fdd7585e08c4e2afd71dcabdb4636c89d557a3f42db9e2040c8bbd1708aa4ce7
Completely different outputs from inputs that differ by one character. There's no practical way to look at the hash and figure out what the input was. And there's no known feasible way to craft a different input that produces the same hash.
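The comparison above can be reproduced with a few lines of Python. This is a sketch using the standard `hashlib` module; the helper names `sha256_hex` and `differing_bits` are made up for this example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

h1 = sha256_hex("hello")
h2 = sha256_hex("hellp")

print(h1)
print(h2)
print(differing_bits(h1, h2), "of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip when the input changes by a single character, which is why the two digests look unrelated.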
Here's the basic idea. The sender computes a hash of the message and sends both the message and the hash. The receiver computes the hash of the received message and compares it to the hash that was sent. If they match, the message wasn't modified.
sequenceDiagram
participant S as Sender
participant R as Receiver
S->>S: Compute hash of message
S->>R: Message + Hash
R->>R: Compute hash of received message
R->>R: Compare computed hash with received hash
Note over R: Match = data intact
Note over R: Mismatch = data was modified
This is how software downloads work. The website publishes the SHA-256 hash of the file. You download the file, compute the hash yourself, and compare. If they match, you got the right file.
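As a rough sketch of that download check, the following Python reads a file in chunks, hashes it, and compares the result with a published value. The function names and the chunk size are assumptions made for this example; the published hash would be whatever the website lists.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the computed digest with the hex string copied from the website."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published_hash("installer.bin", "<hash from the website>")` would return `True` only if every byte of the file matches what the publisher hashed; the file name here is hypothetical.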
There's a catch. If an attacker can modify the message, they can also modify the hash. They change the message, compute a new hash for the modified message, and send both. The receiver computes the hash, it matches, and they have no idea the message was tampered with.
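The weakness is easy to demonstrate. In this sketch the message contents and the function names `package` and `receiver_accepts` are invented for illustration; the point is only that anyone can recompute a plain hash.

```python
import hashlib

def package(message: bytes):
    """What a sender (or an attacker) transmits: the message plus its SHA-256 hash."""
    return message, hashlib.sha256(message).hexdigest()

def receiver_accepts(message: bytes, received_hash: str) -> bool:
    """The receiver's check: recompute the hash and compare."""
    return hashlib.sha256(message).hexdigest() == received_hash

original, original_hash = package(b"Pay Alice $10")

# Attacker changes only the message: the check catches it.
print(receiver_accepts(b"Pay Mallory $10000", original_hash))  # False

# Attacker changes the message AND recomputes the hash: the check passes.
forged, forged_hash = package(b"Pay Mallory $10000")
print(receiver_accepts(forged, forged_hash))  # True
```

Nothing in the second case distinguishes the attacker from the legitimate sender, because computing the hash requires no secret.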
Plain hashing only works if the hash is delivered through a separate, trusted channel. For software downloads, the hash is on the website (hopefully over HTTPS). But for data flowing over a network connection, we need something better.
HMAC (Hash-based Message Authentication Code) solves this. It's a hash that requires a secret key. Only someone who knows the key can compute the correct HMAC.
The sender and receiver share a secret key. The sender computes HMAC(key, message) and sends the message plus the HMAC. The receiver computes HMAC(key, received message) with the same key and compares. If they match, two things are true: the message wasn't modified in transit, and it was produced by someone who knows the secret key.
An attacker who modifies the message can't compute the correct HMAC because they don't have the key. They can't just recompute the hash.
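Here is a minimal sketch of that exchange using Python's standard `hmac` module with SHA-256. The key is generated locally for the demo, and the names `sign` and `verify` are assumptions for this example; in a real protocol both sides would already share the key.

```python
import hashlib
import hmac
import secrets

def sign(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA256(key, message)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

shared_key = secrets.token_bytes(32)  # known to sender and receiver only
message = b"Pay Alice $10"
tag = sign(shared_key, message)

print(verify(shared_key, message, tag))                 # True: untouched message
print(verify(shared_key, b"Pay Mallory $10000", tag))   # False: message was changed

# The attacker can recompute a tag, but only with a key of their own.
attacker_tag = sign(secrets.token_bytes(32), b"Pay Mallory $10000")
print(verify(shared_key, b"Pay Mallory $10000", attacker_tag))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading bytes of a guessed tag were correct.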
In TLS, after the handshake establishes a shared secret key, every message includes a MAC (or uses AEAD encryption, which bundles encryption and integrity together). This ensures that encrypted data can't be tampered with in transit.
We'll see exactly how this works when we get to cipher suites and the handshake. For now, the key takeaway is: hashing gives us integrity, and HMAC gives us integrity that an attacker can't forge.
| Algorithm | Output Size | Status |
|---|---|---|
| MD5 | 128 bits | Broken. Do not use. |
| SHA-1 | 160 bits | Broken. Being phased out. |
| SHA-256 | 256 bits | Current standard. Used everywhere in TLS. |
| SHA-384 | 384 bits | Used in some TLS cipher suites. |
| SHA-512 | 512 bits | Available but less common in TLS. |
MD5 and SHA-1 are "broken" because researchers found practical ways to create collisions: two different inputs that produce the same hash. An attacker who can build such a pair, one harmless and one malicious, can get the harmless version approved and then substitute the malicious one, and the hash will still match. SHA-256 and above have no known practical attacks.
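The output sizes in the table can be checked directly with `hashlib`. This sketch assumes Python 3.9 or later, where the `usedforsecurity=False` flag is available; it is passed so that MD5 and SHA-1 can still be instantiated on builds that restrict them for security use.

```python
import hashlib

algorithms = ["md5", "sha1", "sha256", "sha384", "sha512"]

# digest_size is reported in bytes; multiply by 8 to get bits.
sizes = {
    name: hashlib.new(name, usedforsecurity=False).digest_size * 8
    for name in algorithms
}

for name, bits in sizes.items():
    print(f"{name:>6}: {bits} bits")
```

Instantiating MD5 or SHA-1 here is only to read their digest size; it is not a suggestion to use them for integrity checks.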