CryptoBook
  • CryptoBook
  • Book Plan
  • Style Guide
    • Sample Page
  • Contributors
  • Fundamentals
    • Mathematical Notation
    • Division and Greatest common divisor
      • Euclidean Algorithm
    • Modular Arithmetic
      • Theorems of Wilson, Euler, and Fermat
        • Fermat's Little Theorem in Detail
        • Euler's Theorem in Detail
      • Quadratic Residues
    • Continued Fractions
  • Number Theory
  • Ideals
  • Polynomials With Shared Roots
  • Integer Factorization
    • Pollard rho
    • Sieves
  • Abstract algebra
    • Groups
      • Another take on groups
      • Discrete Log Problem
    • Rings
    • Fields
    • Polynomials
  • Elliptic Curves
    • Untitled
  • Lattices
    • Introduction
    • LLL reduction
      • Gram-Schmidt Orthogonalization
      • Lagrange's algorithm
      • LLL reduction
    • Lattice reduction
      • Minkowski reduced
      • HKZ reduced
      • LLL reduced
    • Applications
      • Coppersmith algorithm
      • Extensions of Coppersmith algorithm
    • Hard lattice problems
    • Lattices of interest
    • Cryptographic lattice problems
      • Short integer solutions (SIS)
      • Learning with errors (LWE)
      • Ring-LWE
      • NTRU
    • Interactive fun
    • Resources and notations
  • Asymmetric Cryptography
  • RSA
    • Proof of correctness
    • RSA application
    • Low Private Component Attacks
      • Wiener's Attack
      • Boneh-Durfee Attack
    • Common Modulus Attack
    • Recovering the Modulus
  • Diffie-Hellman
    • MITM
  • Elliptic Curve Cryptography
  • Symmetric Cryptography
    • Encryption
    • The One Time Pad
    • AES
      • Rijndael Finite Field
      • Round Transformations
  • Hashes
    • Introduction / overview
    • The Birthday paradox / attack
  • Isogeny Based Cryptography
    • Introduction to Isogeny Cryptography
    • Isogenies
    • Isogeny and Ramanujan Graphs
  • Appendices
    • Sets and Functions
    • Probability Theory
Powered by GitBook
On this page
  • Introduction
  • Definitions and Formalism
  • Proprieties
  • 1. Preimage Resistance
  • 2. Second Preimage Resistance
  • 3. Hash Collisions
  • Resources

Was this helpful?

Export as PDF
  1. Hashes

Introduction / overview

PreviousRound TransformationsNextThe Birthday paradox / attack

Last updated 4 years ago

Was this helpful?

Authors: Zademn Reviewed by:

Introduction

Another desired propriety of our cryptographic protocols is data / message integrity. This propriety assures that during a data transfer the data has not been modified.

Suppose Alice has a new favourite game and wants to send it to Bob. How can Bob be sure that the file he receives is the same as the one Alice intended to send? One would say to run the game and see. But what if the game is a malware? What if there are changes that are undetectable to the human eye?

Hashes are efficient algorithms to check if two files are the same based on the data they contain. The slightest change (a single bit) would change the hash completely.

On the internet, when you download, files you often see a number near the download button called the hash of that file. If you download that file, recalculate the hash locally and obtain the same hash you can be sure that the data you downloaded is the intended one.

Another use for hashes is storing passwords. We don't want to store plaintext passwords because in case of a breach the attacker will know our password. If we hash them he will have to reverse the hash (or find a collision) to use our password. Luckily the hashes are very hard to reverse and collision resistant by definition and construction.

Note that hashes need a secure channel for communication. Alice must have a secure way to send her hash to Bob. If Eve intercepts Alice's message and hash she can impersonate Alice by changing the file, computing the hash and sending them to Bob. Hashes do not provide authenticity.

Definitions and Formalism

Definition - Hash

A hash is an efficient deterministic function that takes an arbitrary length input and produces a fixed length output (digest, hash). Let H:M⟶TH:\mathcal{M} \longrightarrow \mathcal{T}H:M⟶T be a function where

  • M\mathcal{M}M = message space

  • T\mathcal{T}T = digest space

Desired proprieties

  • Deterministic

  • Fast to compute

  • Small changes change the hash completely

  • Preimage, second preimage and collision resistance (Explained below)

How to use a hash:

  • Suppose you want to check if Alice and Bob have the same version of some file (File integrity)

from hashlib import sha256
m1 = b"Some file"
m2 = b"Some file"
sha256(m1).digest() == sha256(m2).digest() # -> True

Proprieties

  • Preimage Image Resistance

  • Second Preimage resistance

  • Resistant to collisions

1. Preimage Resistance

Intuition

This propriety prevents an attacker to find the original message from a hash

2. Second Preimage Resistance

Attack game

  • This propriety prevents an attacker to substitute a message with another and get the same hash

3. Hash Collisions

Intuition

A hash collision happens when we have two different messages that have the same hash

Why do we care about hash collisions?

  • Since hashes are used to fastly verify a message integrity if two messages have the same hash then we can replace one with another => We can play with data

  • Yet, we don't want hash collisions to be computable

    • We don't want an attacker to be able to craft collisions or find collisions given a message

Let's throw some definitions

Attack game

Security

Intuition

We know hash collisions exist (therefore an efficient adversary must exist) and that is easy to prove therefore we request an explicit algorithm that finds these collisions

This propriety makes it difficult for an attacker to find 2 input values with the same hash

Difference from 2nd preimage

  • There is a fundamental difference in how hard it is to break collision resistance and second-preimage resistance.

    • Breaking collision-resistance is like inviting more people into the room until the room contains 2 people with the same birthday.

    • Breaking second-preimage resistance is like inviting more people into the room until the room contains another person with your birthday.

  • One of these fundamentally takes longer than the other

Implications

Lemma 1

Note

  • Provisional implication

Lemma 2

Assuming a function is second preimage resistant is a weaker assumption than assuming it is collision resistant.

Resources

Bibliography

  • Figure 1 - Wikipedia

They compute H(a),H(b)H(a), H(b)H(a),H(b)

They check if H(a)=H(b)H(a) = H(b)H(a)=H(b)

The hash function must be a one way function. Given t∈Tt \in \mathcal{T}t∈T find m∈Mm \in \mathcal{M}m∈M s.t H(m)=tH(m) = tH(m)=t

It should be unfeasible to reverse a hash function (O(2l)\mathcal{O}(2^l)O(2l) time where lll is the number of output bits)

Given mmm it should be hard to find m′≠mm' \neq mm′=m with H(m′)=H(m)H(m') = H(m)H(m′)=H(m)

An adversary A\mathcal{A}A is given a message mmm and outputs a message m′≠mm' \neq mm′=m.

A\mathcal{A}A wins the game if he finds H(m)=H(m′)H(m) = H(m')H(m)=H(m′)

His advantage is Pr[A finds a second preimage]Pr[\mathcal{A} \text{ finds a second preimage}]Pr[A finds a second preimage] where Pr(⋅)Pr(\cdot)Pr(⋅)is a probability

In practice a hash function with lll bits output should need 2l2^l2l queries before one can find a second preimage

Now, we want to hash big files and big messages so ∣M∣>>∣T∣|\mathcal{M}| >> |\mathcal{T}|∣M∣>>∣T∣ => It would appear that hash collisions are possible

Natural collisions are normal to happen and we consider them improbable if T\mathcal{T}T is big enough (SHA256⇒T=\text{SHA256} \Rightarrow T =SHA256⇒T= {0,1}256\{0,1\}^{256}{0,1}256)

An adversary A\mathcal{A}A outputs two messages m0≠m1m_0 \neq m_1m0​=m1​

A\mathcal{A}A wins the game if he finds H(m0)=H(m1)H(m_0) = H(m_1)H(m0​)=H(m1​)

His advantage is Pr[Adversary finds a collision]Pr[\text{Adversary finds a collision}]Pr[Adversary finds a collision]

A hash function HHH is collision resistant if for all efficient and explicit adversaries the advantage is negligible

Assuming a function HHH is preimage resistant for every element of the range of HHH is a weaker assumption than assuming it is either collision resistant or second preimage resistant.

- Wikipedia entry

- Computerphile

- Good read for more details

https://crypto.stackexchange.com/questions/10602/why-does-second-pre-image-resistance-imply-pre-image-resistance?rq=1
https://crypto.stackexchange.com/questions/9684/pre-image-resistant-but-not-2nd-pre-image-resistant
https://en.wikipedia.org/wiki/Cryptographic_hash_function
https://www.youtube.com/watch?v=b4b8ktEV4Bg
https://www.tutorialspoint.com/cryptography/cryptography_hash_functions.htm
https://www.cs.ucdavis.edu/~rogaway/papers/relates.pdf
Figure 1.
Figure 2. Security game - second preimage resistance
Figure 3. Security game - Collision resistance