Authors: Zademn Reviewed by:
Another desired propriety of our cryptographic protocols is data / message integrity. This propriety assures that during a data transfer the data has not been modified.
Suppose Alice has a new favourite game and wants to send it to Bob. How can Bob be sure that the file he receives is the same as the one Alice intended to send? One would say to run the game and see. But what if the game is a malware? What if there are changes that are undetectable to the human eye?
Hashes are efficient algorithms to check if two files are the same based on the data they contain. The slightest change (a single bit) would change the hash completely.
Definition - Hash
A hash is an efficient deterministic function that takes an arbitrary length input and produces a fixed length output (digest, hash). Let be a function where
= message space
= digest space
Fast to compute
Small changes change the hash completely
Preimage, second preimage and collision resistance (Explained below)
How to use a hash:
Suppose you want to check if Alice and Bob have the same version of some file (File integrity)
They check if
from hashlib import sha256m1 = b"Some file"m2 = b"Some file"sha256(m1).digest() == sha256(m2).digest() # -> True
Preimage Image Resistance
Second Preimage resistance
Resistant to collisions
The hash function must be a one way function. Given find s.t
Given it should be hard to find with
An adversary is given a message and outputs a message .
wins the game if he finds
His advantage is where is a probability
In practice a hash function with bits output should need queries before one can find a second preimage
This propriety prevents an attacker to substitute a message with another and get the same hash
Why do we care about hash collisions?
Since hashes are used to fastly verify a message integrity if two messages have the same hash then we can replace one with another => We can play with data
Now, we want to hash big files and big messages so => It would appear that hash collisions are possible
Natural collisions are normal to happen and we consider them improbable if is big enough ( )
Yet, we don't want hash collisions to be computable
We don't want an attacker to be able to craft collisions or find collisions given a message
An adversary outputs two messages
wins the game if he finds
His advantage is
A hash function is collision resistant if for all efficient and explicit adversaries the advantage is negligible
There is a fundamental difference in how hard it is to break collision resistance and second-preimage resistance.
Breaking collision-resistance is like inviting more people into the room until the room contains 2 people with the same birthday.
Breaking second-preimage resistance is like inviting more people into the room until the room contains another person with your birthday.
One of these fundamentally takes longer than the other
Assuming a function is preimage resistant for every element of the range of is a weaker assumption than assuming it is either collision resistant or second preimage resistant.
Assuming a function is second preimage resistant is a weaker assumption than assuming it is collision resistant.
https://en.wikipedia.org/wiki/Cryptographic_hash_function - Wikipedia entry
https://www.youtube.com/watch?v=b4b8ktEV4Bg - Computerphile
https://www.cs.ucdavis.edu/~rogaway/papers/relates.pdf - Good read for more details
Figure 1 - Wikipedia