A digest, sometimes simply called a hash, is the result of a hash function, a specific mathematical function or algorithm, that can be described as H = hash(M), where M is the input data (the message). Hashing is required to be deterministic: every time the same input is hashed with the same hash function, the resulting digest is the same, so the digest maintains a verifiable relation with the input data. This is what makes this type of algorithm useful for information security.

Cryptographic hashes function similarly to ordinary hashing but require added security, in the form of a guarantee that the input data cannot feasibly be recovered from the generated hash value; that is, there is no usable inverse of the hash function.

This requirement can be stated more formally as the following properties of a secure hash:

  • Preimage-resistant: given H, it should be hard to find any M such that H = hash(M).
  • Second-preimage-resistant: given an input m1, it should be hard to find another input m2 (not equal to m1) such that hash(m1) = hash(m2).
  • Collision-resistant: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2). Because of the birthday paradox, this means the hash function must have a larger output space (image) than preimage resistance alone would require.
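The gap between these properties can be made concrete. For an n-bit digest, a brute-force preimage search needs about 2^n guesses, but the birthday paradox means a collision is expected after only about 2^(n/2) guesses. The Python sketch below deliberately truncates SHA-256 to 24 bits (an illustrative choice, not a real hash function) so that a collision turns up after a few thousand attempts instead of the roughly 16 million a preimage search would need. The message format and function names are hypothetical.

```python
import hashlib

def truncated_hash(message: bytes, nbytes: int = 3) -> bytes:
    """First `nbytes` bytes of SHA-256(message): a deliberately weak 24-bit hash."""
    return hashlib.sha256(message).digest()[:nbytes]

def find_collision(nbytes: int = 3) -> tuple[bytes, bytes]:
    """Birthday search: hash distinct messages until two share a truncated digest."""
    seen: dict[bytes, bytes] = {}   # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = b"message-%d" % counter   # every message is distinct
        h = truncated_hash(message, nbytes)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1

m1, m2 = find_collision()
print(m1, m2, truncated_hash(m1).hex())
```

Doubling the digest length squares the expected work for a collision, which is why collision resistance drives the choice of output size.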

A hash function is the implementation of an algorithm that, given some data as input, will generate a short result called a digest. A useful hash function generates a hash value of fixed length, regardless of the length of the input.

For example, if our hash function is X and the input is 'wiki', then X('wiki') = a5g78, where a5g78 stands for some hash value.
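With a real hash function in place of X, the result looks different but behaves the same way. The Python sketch below uses SHA-256 from the standard `hashlib` module purely as an example of X; its digest is longer than a5g78, but it is deterministic and always the same length.

```python
import hashlib

# SHA-256 stands in for the hypothetical function X from the example above.
digest = hashlib.sha256(b"wiki").hexdigest()
print(digest)   # 64 hex characters, i.e. 256 bits

# Hashing the same input again yields the same digest (determinism) ...
assert hashlib.sha256(b"wiki").hexdigest() == digest
# ... and a much longer input still yields a digest of the same length.
assert len(hashlib.sha256(b"wiki" * 10_000).hexdigest()) == len(digest)
```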

Applications of hash functions

Non-cryptographic hash functions have many applications,[1] but in this section we focus on applications that specifically require cryptographic hash functions:

A typical use of a cryptographic hash would be as follows. Alice poses a tough math problem to Bob and claims she has solved it. Bob would like to try it himself, but would also like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, appends a random nonce, computes the hash of the result and tells Bob the hash value, while keeping the solution and nonce secret. When Bob comes up with the solution himself a few days later, Alice can prove that she had the solution earlier by revealing it together with the nonce: Bob hashes them and checks that the result matches the value she gave him.
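A minimal Python sketch of this exchange follows, assuming SHA-256 as the hash and a 32-byte random nonce placed in front of the solution; the function names `commit` and `reveal_ok` are hypothetical. The nonce matters because, without it, Bob could test guesses for a short solution by hashing candidate answers until one matched.

```python
import hashlib
import hmac
import secrets

def commit(solution: bytes) -> tuple[bytes, bytes]:
    """Alice's step: hash a fresh random nonce together with her secret solution."""
    nonce = secrets.token_bytes(32)   # 32 random bytes; the length is an assumption
    digest = hashlib.sha256(nonce + solution).digest()
    return digest, nonce              # publish digest now; keep nonce and solution private

def reveal_ok(digest: bytes, nonce: bytes, solution: bytes) -> bool:
    """Bob's step: recompute the hash from the revealed values and compare."""
    return hmac.compare_digest(hashlib.sha256(nonce + solution).digest(), digest)

commitment, nonce = commit(b"x = 42")
# ... days later, Alice reveals the nonce and her solution ...
print(reveal_ok(commitment, nonce, b"x = 42"))   # True
print(reveal_ok(commitment, nonce, b"x = 41"))   # False
```

Because the nonce has a fixed length and comes first, there is only one way to split the hashed input back into nonce and solution.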

In practice, Alice and Bob will often be computer programs, and the secret would be something less easily spoofed than a claimed puzzle solution. The application above is called a commitment scheme.

Another important application of secure hashes is verifying message integrity. Whether any changes have been made to a message (or a file), for example, can be determined by comparing message digests calculated before and after transmission (or any other event); see Tripwire, a system that uses this property as a defense against malware and malfeasance. A message digest can also serve as a means of reliably identifying a file.
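The before-and-after comparison can be sketched in a few lines of Python. SHA-256 is an illustrative choice, and the message contents are made up. This only detects tampering if the reference digest is kept somewhere an attacker cannot modify; otherwise an attacker can simply replace both the message and its digest.

```python
import hashlib

def digest_of(data: bytes) -> str:
    """Hex SHA-256 digest used as a fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()

original = b"Transfer 100 to account 12345"
before = digest_of(original)                 # recorded before transmission

received = b"Transfer 900 to account 12345"  # one character altered in transit
after = digest_of(received)

print(before == after)                       # False: the change is detected
```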

A related application is password verification. Passwords should not be stored in clear text, because anyone who obtains the stored file would then learn every password; they should be stored in digest form instead. Password handling is discussed in more detail in a later chapter, in particular why hashing a password only once is inadequate.

A hash function is a key component of HMAC (hash-based message authentication code).
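Python's standard library provides HMAC directly through the `hmac` module. The sketch below assumes SHA-256 as the underlying hash and uses a made-up shared key. Unlike a bare digest, the tag cannot be recomputed by someone who does not hold the key, so an attacker who alters the message cannot produce a matching tag.

```python
import hashlib
import hmac

key = b"shared-secret-key"   # illustrative key known only to sender and receiver
message = b"meet at noon"

# Sender computes the tag and sends it along with the message.
tag = hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the tag and compares it with the one received."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking, through timing, how many bytes matched
    return hmac.compare_digest(expected, tag)

print(verify(key, message, tag))          # True
print(verify(key, b"meet at one", tag))   # False
```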

Most distributed version control systems (DVCSs) use cryptographic hashes.[2]
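Git, for instance, names each stored file by a hash of its contents. The sketch below reproduces how Git computes a blob's object ID in its traditional SHA-1 object format: the hash covers a short header (the word `blob`, a space, the size in bytes and a NUL byte) followed by the raw content. Repositories using Git's newer SHA-256 object format hash with SHA-256 instead, so treat this as an illustration of the SHA-1 case only; the function name is hypothetical.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file's contents in its SHA-1 object format."""
    header = b"blob %d\x00" % len(content)   # "blob <size in bytes>" followed by a NUL byte
    return hashlib.sha1(header + content).hexdigest()

# Should agree with: echo 'test content' | git hash-object --stdin
print(git_blob_id(b"test content\n"))
```

Because the ID depends only on the content, two identical files are stored once, and any corruption of a stored object changes its ID.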

For both security and performance reasons, most digital signature algorithms specify that only the digest of the message be "signed", not the entire message. Hash functions can also be used to generate pseudo-random bits.
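One simple way to get pseudo-random bits from a hash is to hash a secret seed together with an incrementing counter and concatenate the outputs. The Python sketch below does this with SHA-256 and an 8-byte big-endian counter; both are assumptions. It illustrates the idea only: it is not a standardized generator such as NIST's Hash_DRBG, its output is only as unpredictable as the seed, and real programs should draw randomness from the operating system (for example through Python's `secrets` module).

```python
import hashlib

def hash_stream(seed: bytes, nbytes: int) -> bytes:
    """Expand `seed` into `nbytes` pseudo-random bytes as SHA-256(seed || counter) blocks.

    Illustrative only: not a vetted DRBG and not a substitute for the `secrets` module.
    """
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        out.extend(block)   # each block contributes 32 bytes
        counter += 1
    return bytes(out[:nbytes])

print(hash_stream(b"example seed", 40).hex())
```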

SHA-1, MD5, and RIPEMD-160 were among the most commonly used message digest algorithms as of 2004. In August 2004, researchers found weaknesses in a number of hash functions, including MD5, SHA-0 and RIPEMD. This called into question the long-term security of later algorithms derived from these hash functions, in particular SHA-1 (a strengthened version of SHA-0) and RIPEMD-128 and RIPEMD-160 (strengthened versions of RIPEMD). Neither SHA-0 nor RIPEMD is widely used, since both were replaced by their strengthened versions.

Other common cryptographic hashes include SHA-2 and Tiger.

Later we will discuss the "birthday attack" and other techniques people use for Breaking Hash Algorithms.

Hash speed

When using hashes for file verification, people prefer hash functions that run very fast, so that a corrupted file can be detected as soon as possible (and queued for retransmission, quarantined, etc.). Some popular hash functions in this category are:

  • BLAKE2b
  • SHA-3

In addition, both SHA-256 (SHA-2) and SHA-1 have seen hardware support in some CPU instruction sets.

When using hashes for password verification, people prefer hash functions that take a long time to run. If a password verification database (the /etc/passwd file, the /etc/shadow file, etc.) is accidentally leaked, they want to force an attacker to spend a long time testing each guess in a brute-force search.[3] Some popular hash functions in this category are:

  • Argon2
  • scrypt
  • bcrypt
  • PBKDF2
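Of these, PBKDF2 is available directly in Python's standard library as `hashlib.pbkdf2_hmac`, so the sketch below uses it. The iteration count of 600,000 with HMAC-SHA-256 is an assumption (roughly in line with current OWASP guidance) and should be tuned so one check takes noticeable time on the target hardware; the 16-byte salt length is likewise an assumption. Each stored entry is a random salt plus the derived key, so identical passwords do not produce identical entries.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000   # assumed work factor; tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the password itself is never stored."""
    salt = secrets.token_bytes(16)   # unique random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the slow hash for a login attempt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)
```

A real system would also store the iteration count alongside each salt, so the work factor can be raised later without invalidating existing entries.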

We talk more about password hashing in the Cryptography/Secure Passwords section.

Further reading

  1. Applications of non-cryptographic hash functions are described in Data Structures/Hash Tables and Algorithm Implementation/Hashing.
  2. Eric Sink. "Version Control by Example". Chapter 12: "Git: Cryptographic Hashes".
  3. "Speed Hashing".