Last modified on 17 October 2014, at 05:36

Cryptography/Hashes

A digest, sometimes called a hash, is the result of applying a hash function (a specific mathematical function or algorithm) to some arbitrary input value.

Information security often includes situations where a user wants to transform one block of information into another block of information in such a way that the original block cannot be recreated. It is also required that the same input block always produces the same output block; in other words, the process is deterministic.

Such processes behave similarly to a hash function and so are typically called cryptographic hashes. These hashes serve the authentication and integrity goals of cryptography. A cryptographic hash can be described as f(message) = hash and has the property that the hash function is one-way: a given hash value cannot feasibly be reversed to get a message that produces that hash value. That is, there is no useful inverse hash function f'(hash) = message.

This property can be formally expanded to provide the following properties of a secure hash:

  • Preimage-resistant: given a hash value H, it should be hard to find a message M such that H = hash(M).
  • Second-preimage-resistant: given an input m1, it should be hard to find another input m2 (not equal to m1) such that hash(m1) = hash(m2).
  • Collision-resistant: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2). Because of the birthday paradox, a collision in an n-bit hash can be expected after about 2^(n/2) attempts, so the hash function must have a larger image than is required for preimage resistance.
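The gap between collision resistance and preimage resistance can be seen on a deliberately weakened hash. The sketch below is only an illustration: `tiny_hash` is a hypothetical helper that truncates SHA-256 to 16 bits so that both searches finish quickly, and the message formats are arbitrary choices.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, nbytes: int = 2) -> bytes:
    """Truncated SHA-256: a deliberately weak hash with only 8 * nbytes output bits."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Hash distinct messages until any two share a hash value."""
    seen = {}  # maps hash value -> first message that produced it
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_preimage(target: bytes, nbytes: int = 2):
    """Hash distinct guesses until one matches a single fixed target value."""
    for i in count():
        msg = b"guess-%d" % i
        if tiny_hash(msg, nbytes) == target:
            return msg, i + 1


m1, m2, collision_tries = find_collision()
target = tiny_hash(b"wiki")
preimage, preimage_tries = find_preimage(target)

print("collision found after", collision_tries, "messages")
print("preimage found after", preimage_tries, "guesses")
```

With a 16-bit output, the collision search typically needs a few hundred messages, while the preimage search typically needs tens of thousands of guesses. Real hash functions use outputs long enough that neither search is feasible.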


A hash function is the implementation of an algorithm that, given some data as input, will generate a short result called a digest.

For example, if our hash function is X and our input is 'wiki', then X('wiki') = a5g78, i.e. some hash value.
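As a concrete illustration, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module in place of the abstract function X. The choice of SHA-256 and the second, longer input are assumptions made for the example.

```python
import hashlib

# Hash the short input 'wiki' and a much longer input with the same function.
short_digest = hashlib.sha256(b"wiki").hexdigest()
long_digest = hashlib.sha256(b"wiki" * 1000).hexdigest()

# Hashing the same input again gives the same digest (deterministic).
repeat_digest = hashlib.sha256(b"wiki").hexdigest()

print(short_digest)
print(long_digest)
print(short_digest == repeat_digest)
```

Both digests are 64 hexadecimal characters (256 bits) long regardless of the input length.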

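Qualities of a good hash function are:

1. Produces a fixed-length key for variable-length input
2. Has a key space so large that searching it exhaustively is infeasible, which implies the next point
3. No collisions that can feasibly be found (i.e. it is impractical to find two different pieces of input that give the same key value). Because the key has a fixed length while the input does not, collisions must exist; the requirement is that nobody can find them.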

Applications of hash functions

Non-cryptographic hash functions have many applications,[1] but in this book we focus on applications that require cryptographic hash functions:

A typical use of a cryptographic hash would be as follows: Alice poses to Bob a tough math problem and claims she has solved it. Bob would like to try it himself, but would also like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, appends a random nonce, computes the hash of the result and tells Bob the hash value (whilst keeping the solution and the nonce secret). When Bob comes up with the solution himself a few days later, Alice reveals her solution and nonce. Bob hashes them and compares the result with the hash value she gave him, which proves that she had the solution earlier.
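The exchange between Alice and Bob can be sketched in a few lines. This is a minimal illustration, not a vetted protocol: the function names `commit` and `verify`, the 16-byte nonce length, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import secrets

NONCE_BYTES = 16  # assumed nonce length for this sketch


def commit(solution: bytes):
    """Alice's step: append a random nonce to the solution and hash the result."""
    nonce = secrets.token_bytes(NONCE_BYTES)
    commitment = hashlib.sha256(solution + nonce).hexdigest()
    return commitment, nonce  # publish the commitment, keep the nonce secret


def verify(commitment: str, solution: bytes, nonce: bytes) -> bool:
    """Bob's step: recompute the hash from the revealed solution and nonce."""
    return hashlib.sha256(solution + nonce).hexdigest() == commitment


# Alice commits to her solution and tells Bob only the hash value.
commitment, nonce = commit(b"x = 42")

# Later, Alice reveals the solution and the nonce, and Bob checks them.
honest = verify(commitment, b"x = 42", nonce)
bluff = verify(commitment, b"x = 43", nonce)
print(honest, bluff)
```

The random nonce stops Bob from testing candidate solutions against the hash value before Alice reveals hers.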

In actual practice, Alice and Bob will often be computer programs, and the secret would be something less easily spoofed than a claimed puzzle solution. The above application is called a commitment scheme.

Another important application of secure hashes is verification of message integrity. For example, whether any changes have been made to a message (or a file) can be determined by comparing message digests calculated before and after transmission (or any other event). See Tripwire, a system using this property as a defense against malware and malfeasance. A message digest can also serve as a means of reliably identifying a file.
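The before-and-after comparison can be sketched as follows. This is a minimal example that assumes SHA-256 and a temporary file standing in for the transmitted message; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "message.txt")

    # Record the digest before the "event" (transmission, storage, ...).
    with open(path, "wb") as f:
        f.write(b"Pay Bob 10 dollars")
    before = sha256_of_file(path)

    # An untouched file produces the same digest.
    unchanged = sha256_of_file(path) == before

    # Changing a single character produces a different digest.
    with open(path, "wb") as f:
        f.write(b"Pay Bob 90 dollars")
    after = sha256_of_file(path)
    tampered = after != before

print(unchanged, tampered)
```

A plain digest only detects changes if the recorded digest itself is stored or transmitted where an attacker cannot replace it.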

A related application is password verification. Passwords are usually not stored in cleartext, for obvious reasons, but instead in digest form. We discuss password handling -- in particular, why hashing the password once is inadequate -- in more detail in a later chapter, Password handling.

A hash function is a key part of hash-based message authentication codes (HMAC).
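As a brief sketch of how a hash function is used inside HMAC, the example below relies on Python's standard `hmac` module with SHA-256. The key, the messages, and the helper names `make_tag` and `check_tag` are made up for the illustration.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 authentication tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared secret key"  # known only to sender and receiver
message = b"transfer 10 coins to Bob"
tag = make_tag(key, message)

genuine = check_tag(key, message, tag)
forged = check_tag(key, b"transfer 99 coins to Bob", tag)
print(genuine, forged)
```

Unlike a plain digest, the tag cannot be recomputed by someone who changes the message but does not know the key.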

Most distributed version control systems (DVCSs) use cryptographic hashes.[2]

For both security and performance reasons, most digital signature algorithms specify that only the digest of the message be "signed", not the entire message. Hash functions can also be used in the generation of pseudo-random bits.

SHA-1, MD5, and RIPEMD-160 are among the most commonly used message digest algorithms as of 2004. In August 2004, researchers found weaknesses in a number of hash functions, including MD5, SHA-0 and RIPEMD. This has called into question the long-term security of later algorithms derived from these hash functions, in particular SHA-1 (a strengthened version of SHA-0), RIPEMD-128 and RIPEMD-160 (strengthened versions of RIPEMD). Neither SHA-0 nor RIPEMD is widely used, since they were replaced by their strengthened versions.

Other common cryptographic hashes include SHA-2 and Tiger.


Later we will discuss the "birthday attack" and other techniques people use for Breaking Hash Algorithms.

Hash speed

There are two contradictory requirements for cryptographic hash speed:

  • When using hashes for password verification, people prefer hash functions that take a long time to run. If a password verification database (the /etc/passwd file, the /etc/shadow file, etc.) is accidentally leaked, they want to force a brute-force attacker to take a long time to test each guess.[3] Some popular hash functions in this category are:
    • scrypt
    • bcrypt
    • PBKDF2
  • When using hashes for file verification, people prefer hash functions that run very fast. They want a corrupted file to be detected as soon as possible (and queued for retransmission, quarantined, etc.). Some popular hash functions in this category are:
    • SHA256
    • SHA-3
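The difference in speed can be observed directly. The following rough sketch times a single SHA-256 call against PBKDF2 from Python's standard `hashlib` module. The password, the salt length, and the iteration count of 200,000 are illustrative assumptions, not recommendations, and the measured times depend on the machine.

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)  # random per-password salt
ITERATIONS = 200_000   # assumed work factor for this sketch

# Fast hash: a single SHA-256 computation.
start = time.perf_counter()
fast = hashlib.sha256(salt + password).digest()
fast_seconds = time.perf_counter() - start

# Deliberately slow hash: PBKDF2 repeats HMAC-SHA256 many times.
start = time.perf_counter()
slow = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
slow_seconds = time.perf_counter() - start

print(f"SHA-256: {fast_seconds:.6f} s per guess")
print(f"PBKDF2:  {slow_seconds:.6f} s per guess")
```

An attacker testing guesses against a leaked database pays the PBKDF2 cost for every guess, whereas a legitimate login pays it only once.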


Further reading

  1. Applications of non-cryptographic hash functions are described in Data Structures/Hash Tables and Algorithm Implementation/Hashing.
  2. Eric Sink. "Version Control by Example". Chapter 12: "Git: Cryptographic Hashes".
  3. "Speed Hashing"