Algorithm Implementation/Hashing

Hashing algorithms are generically split into three sub-sets:

Indexing Algorithms
generally used to quickly find items, using lists called "hash tables".
Checksums
often used for simple data checking, to detect any accidental bit errors during communication—we discuss them in an earlier chapter, Checksums.
Message Digests
a cryptographically secure one-way function, and many are closely examined for their security in the computer security field.

Indexing Algorithms

edit

Jenkins one-at-a-time hash

edit

The "Jenkins One-at-a-time hash", from an article by Bob Jenkins in Dr. Dobb's September 1997.

C:

uint32 joaat_hash(uchar *key, size_t key_len) {
    uint32 hash = 0;
    size_t i;

    for (i = 0; i < key_len; i++) {
        hash += key[i];
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}

Java:

int joaat_hash(byte[] key) {
    int hash = 0;

    for (byte b : key) {
        hash += (b & 0xFF);
        hash += (hash << 10);
        hash ^= (hash >>> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >>> 11);
    hash += (hash << 15);
    return hash;
}

See Data Structures/Hash Tables#Choosing a good hash function for more details on the "Jenkins One-at-a-time hash".

other hash functions for hash tables

edit

Other popular hash functions for hash tables include: other Jenkins hash functions, CityHash, and MurmurHash. It may be desirable to use a keyed hash function to resist "hash flooding": modern implementations use SipHash for this purpose; others use a "universal" hash function.

Checksums and Cyclic Redundancy Checks

edit

See Algorithm Implementation/Checksums.

Message Digests

edit

The state-of-the-art for message digests and what is considered secure change frequently. The US NSA holds algorithm contests and select the winners as SHAs, the "Secure Hashing Algorithms".

  • MD5 (RFC 1321) and its predecessors MD2 and MD4 are all broken. They are now both obsolete and insecure.
  • SHA0/SHA1 (FIPS-180-1) are partially broken. They are obsolete and considered insecure against well-funded opponents since 2005.
  • SHA-2 is not yet considered broken, but is vulnerable to length extension attacks.
  • SHA-3 (Keccak) is not vulnerable to length extension attacks.
    • KangarooTwelve is an extremely parallelized (hence very fast) version of Keccak. It is not vetted by the NSA, but the authors believe it is as secure as SHA-3.
    • BLAKE is a hash function that made it to the final round of the SHA-3 contest. The BLAKE2b variant is widely used and considered secure (as of 2020). It also has an extremely parallelized version called BLAKE3.

Although these algorithms can be implemented directly from the specification, as with other forms of cryptography it is usually safer and faster to use a thoroughly-reviewed open-source library.

Further reading

edit


References

edit