Data Coding Theory/Hamming Codes
IntroductionEdit
Transfer of information from one place to another faces many difficulties. Principal among them is noise. For example suppose 01010101 is sent from one end. Due to noise, it may be received as 11010101, with the first digit changed by noise. Clearly if what is sent is not what is received, communication can be problematic. Error correcting codes have been developed to solve this type of problem.
In practical applications it makes sense to send messages in blocks, i.e. a series of digits one after another, e.g. 11111100. It is well known that electronic data are represented using 0's and 1's. Each digit (0 or 1) is called a bit. A byte is made up of 8 bits. A byte allows us to represent symbols. Let's suppose for the moment that data is sent a byte at time. As before, due to noise that is sent may not always be what is received.
Computer scientists came up with a simple error detection method called parity check. With this method, we represent data using only the first 7 bits. The last bit is always chosen so that together with the other seven there are an even number of 1's in the byte. When the data arrives at the other end, the receiver counts the number of 1's in the byte. If it's odd, then the byte must have been contaminated by noise so the receiver may ask for retransmission.
This method only detects errors but it can not correct them other than ask for retransmission. If retransmission is expensive (e.g. satellite), parity check is not ideal. Also its error detection ability is very low. If two bits were changed by noise, then the receiver will assume the message is correct. More sophisticated error correction codes address these problems.
Hamming CodeEdit
Hamming code is an improvement on the parity check method. It can correct 1 error but at a price. In the parity check scheme, the first 7 bits in a byte are actual information so different symbols may be represented using a byte. But for Hamming code each block of data contains 7 bits (not 8) and only 4 bits in a block are used to represent data, so only symbols may be represented in a block. Therefore, to send the same amount of info in Hamming code, we will need to send a lot more bits. Anyhow let's see how Hamming Code works.
In this section we will see that Hamming code has some amazing properties, although we will not discuss why it works just yet. In fact, if only one error is introduced in transmission, i.e. only one bit got changed, then the decoding method employed by the receiver will definitely be able to correct it. It is easy to appreciate then, that if 2 or more errors are made, correction and even detection may not be possible.
For now, we will describe how Hamming code works, but only later do we develop the mathematics behind it. Therefore, this section can be skipped if one wishes.
We let each of a, b, c and d take the value 0 or 1, and these are called information bits. A Hamming code is a block of 7 bits in the form of
 (a + b + d, a + c + d, a, b + c + d, b, c, d) (mod 2)
..give matrix repn as can be seen, the digits a, b, c and d appear on their own in component 3, 5, 6, and 7. All other components are a combination of a, b, c and d. So we call the 3rd, 5th, 6th and 7th component the information component, all other components are check components, and these components carry extra information that allows us to detect and correct single errors. We will explain the funny notation above in turn. The (mod 2) notation means that we take each of the values in the bracket, separated by commas, and look at its value modular 2 (we will see an example later).
We have represented the block of 7 bits in vector form, where each component corresponds to a bit. E.g. let a = b= c = d = 1, then we have
 (1 + 1 + 1, 1 + 1 + 1, 1 , 1 + 1 + 1, 1, 1, 1) (mod 2)
which is
 (1, 1, 1, 1, 1, 1, 1)
therefore 1111111 is the block of bits we are sending.
To detect single errors, it's also very simple. Suppose the codeword received is , we compute 3 values
then we declare the error is at the th position.
Suppose 1111111 is sent but 1111011 is received. The receiver computes
so the error is at the th position as predicted!
If no error is made in transmission, then .
Summary
If one wishes to send a block of information consisting of 4 bits. Let these 4 bits be abcd, say.
 To send: abcd
 we compute and send (a + b + d, a + c + d, a, b + c + d, b, c, d) (mod 2)
 To decode:

 if e = 0 then assume no error, otherwise declare a single error occurred at the eth position.
ExerciseEdit
...compute some codewords and decode them
BasicsEdit
The mathematical theory of error correcting codes applies the notion of a finite field, called also Galois field (after the famous French mathematician, Evariste Galois, 1811–1832). In particular, the 2element set {0,1} supports the structure of a finite field of size 2. In general, a finite field may have q elements, where q is a prime power (no other number of elements is possible in a finite field; any two finite fields with the same number of elements are essentially alike, i.e. isomorphic). E.g. a 7element field may consist of elements 0,1,2,3,4,5,6, and its arithmetic operations +  * are performed modulo 7.
We denote the finite field of size q as or GF(q). GF stands for Galois Field.
Some DefinitionsEdit
Code & Codeword
 Let A be a finite set, called alphabet; it should have at least two different elements. Let n be a natural number. A code of length n over alphabet A is any set C of nlong sequences of elements from A; the sequences from C are called codewords of C.
If alphabet A = {0,1} then codes over A are called binary.
For example, let C be a code over alphabet A := GF(5) := {0,1,2,3,4}. Let C := {000,111,222,333,444}. It has codewords 000, 111, 222, 333 and 444.
Now we should discuss some properties of a code. Firstly, we can have the notion of distance between two codewords.
Hamming Distance
 Let C be a code, and x and y (bold to signify that each codeword is like a vector) are codewords of C. The Hamming distance of x and y denoted
 d(x,y)
 is the number places in which x and y differ.
E.g. d(000,111) = 3.
Hamming distance enjoys the following three fundamental metric properties:
 d(x,y) = 0 <==> 'x' = 'y'
 d(x,y) = d(y,x)
 d(x,y) ≤ d(x,z) + d(z,y); triangle inequality
Minimum distance
 The minimum distance of a code C denoted d(C) is the minimum distance possible between two different codewords of C
E.g. Let C = {000,111,110,001}, then d(C) = d(000,001) = 1, as the distance between any other codewords are greater than or equal to 1.
The significance of minimum distanceEdit
The minimum distance of a code C is closely related to its ability to correct errors. Let's illustrate why, using a hypothetical code C. Let say this code has minimum distance 5, i.e. d(C) = 5. If a codeword, x is sent and only up to 2 errors were introduced in transmission, then it can be corrected.
Suppose x is sent but x + e is received, where e corresponds to some vector with up to 2 nonzero components. We see that x + e is closer to x than any other codeword! This is due to the fact that d(C) = 5.
E.g. let C = {00000,11111,22222} and 00000 is sent but 00012 is received. It is easy to see that 00000 is the closet codeword to 00012. So we decode 00012 as 00000, we have in effect corrected 2 errors. But if 3 or more errors are made and we decode using the closest codeword, then we might be in trouble. E.g. if 11111 is sent but 11222 is received. We decode 11222 as 22222, but this is wrong!
No error correcting code is perfect (although we call some perfect codes). No code can correct every possible error vector. But it is also reasonable to assume that only a small number of errors are made each transmission and so we only need codes that can correct a small number of errors.
Nearest Neighbour decodingEdit
If m > n, then it is reasonable to assume that it is more likely that n errors were made than m errors. In any communication channel, it is reasonable to assume that the more errors, the less likely. Therefore it is very reasonable to decode a received block using nearest neighbour decoding, i.e. if y is received, we look for a codeword x (of C) so that d(x,y) is minimum.
Using the above scheme, it is easy to see that if a code C has minimum distance d(C) = 2t + 1, then up to t errors can be corrected.
ExercisesEdit
If a code C has minimum distance d(C) = 2t + 2. How many errors can it correct using nearest neighbour decoding?
Linear CodesEdit
Basic linear algebra assumed from here on.
Notation
 Let and both denote the ndimensional vector space with components coming from {0,1,2,..,q  1} with arithmetic performed modulo q
A linear code C is a qary [n,k,d] code, if
 C is a set of vectors of length n,
 each component (of a codeword) takes a value from GF(q),
 all the codewords of C are spanned by k linearly independent vectors, and
 d(C) = d
Note that if x and y are codewords then so is x + y. In other words, we can think of C as a vectorsubspace of , of dimension k. So C can be completely determined by providing a basis of k vectors that span the codewords. Let {  i = 1, 2, ..., k} be a basis for C, we call the matrix
the generator matrix of C.
E.g. let C be a 5ary [5,3,3] code spanned by {10034,01013,00111} then the generator matrix is
Information rate
A qary [n,k,d] code can have different codewords as each codeword c is of the form
where may take values 0, 1, .., q  1 (as arithmetic is performed modulo q). So this code can represent symobols.
We see that the row span of G are all the codewords, so suppose one wants to send , we can compute the corresponding codeword c by
E.g. let C and G be as above and we wish to send 012 to the receiver, we compute the codeword
Notice how the first 3 digits of the codeword is actually the message we want to send, so the last 2 digits are not needed if we don't want any errorcorrection ability in the code.
A linear code is in standard form if it's generator matrix is of the form (IN) (has an identiy matrix at the left end of the generator matrix). The matrix G above is in standard form. It turns out that if G is in standard then it's easy for the receiver to read the intended message. It has another advantage which will be discussed in the next section.
DecodingEdit
One advantage of using linear code is that detection of errors is easy. Actually, we can find a matrix H such that if and only if x is a codeword. So if y is received and then we can confidently say that y has been contaminated by noise.
To find such a H, let's suppose C is a qary [n,k,d] code and C is spanned by i = 1, 2, .., k.
Definition—Inner Product & Orthogonality
Define the inner product of any two vectors to be
we say two vectors v and w are orthogonal if <v,w> = 0.
E.g. let C be a 7ary [7,4,3] code, then
 <(0,1,2,4,5,6,3), (1,4,5,0,3,2,0)> = 4 + 10 + 15 + 12 = 41 ≡ 6 (mod 7)
Firstly note that H must be a n by j matrix for some j. Think of H as a linear transformation in . By definition kerH = C, and by the ranknullity theorom
so H has rank n  k. In fact, the row span of H is the span of n  k linearly independent vectors, i = 1,2,..,n  k where are orthogonal to each codeword in C.
Notice that C = imH and kerH are vectors subspaces (exercise). In fact we denote
where means the vector subspace where any vector is orthogonal to every vector in C.
So we need to find a basis for and let H be the matrix whose rows the basis vectors! If the generator matrix G of C is in standard form, then H is very easy to compute. Indeed if
then
For example, let G be as above i.e.
then we can conclude
look at the values modulo 5 (recall that G generates a 5ary code), we get
We call H the parity check matrix as H can tell us whether a code has been contamniated with.
To see that each codeword , all we need to do if multiply H by each row of G (transposed) since the rows of G span C (exercise).
ExercisesEdit
1. Let H be the parity check matrix of a code C, i.e. Hx^{T} = 0 for all codewords x of C. Think of H as a linear transformation. Prove that C = imH and kerH are vectors subspaces.
2. If G (the generator matrix) is in standard form, prove that H as constructed above is spanned by all the vectors orthogonal to the row span of G.
Why Hamming code worksEdit
The binary Hamming code is a [7,4,3] code. Since the minimum distance is 3, so it can correct 1 error. The Hamming code is special as it tells the receiver where the error is. To construct a Hamming code, it is easier to construct the parity check matrix first
We will not discuss how to find G, it is left as an exercise. Notice the columns of H are just the binary representation of the number 1,2,.., and 7 in increasing order. This is how Hamming code can tell us where the error is.
Let x be a codeword of the Hamming Code, and suppose x + e_{j} is received, where e_{j} is the vector where only the jth position is 1. In other words, one error is made in the jth position.
Now notice that
but is just the j column of H which is the binary representation of j.