Serial Programming/Error Correction Methods

IntroductionEdit

There are 3 main types of handling errors:

  • acknowledge or retry (ACK-NAK).
  • "Forward Error Correction" (FEC)
  • Pretend It Never Happened

ACK-NAKEdit

Each packet is checked by the receiver to make sure it is "good".

If it *is* good, the receiver (eventually) tells the sender that it came through OK -- it acknowledges (ACK) the packet.


All versions of ACK-NAK absolutely require Two Way Communication .

How does the receiver know it's good ?Edit

The sender calculates a checksum or CRC for the entire packet (except for the footer), then appends it to the end of the packet (in the footer/trailer).

The typical CRC is 32 bits, often a Fletcher-32 checksum.

Aside: Note that the checksum or CRC are forms of hashing, ie, irreversibly shrinking data. Checksums and CRCs are weaker algorithms than "cryptographically strong" message authentication code algorithms such as MD5 or SHA variants. Cryptographically strong algorithms can detect errors better than checksums or CRCs, but they take more time to calculate.

Whenever the receiver receives a packet, the receiver calculates exactly the same checksum or CRC, then compares it to the one in the footer/trailer. If they match, the entire packet is (almost certainly) good, so the receiver sends an ACK.

When there's even the slightest question that the packet has any sort of error (which could be *either* in the actual data *or* in the header *or* in the checksum bits -- there's no way for the receiver to tell), the receiver discards it completely and (in most cases) pretends it never saw it.

If it's not good, the sender sends it again.

How does the sender know it wasn't good ?Edit

It never got the ACK. (So either the packet was corrupted, *or* the ACK was corrupted -- there's no way for the sender to know).

"Stop-and-wait ARQ"Edit

The simplest version of ACK-NAK is "Stop-and-wait ARQ".

The sender sends a packet, then waits a little for an ACK. As soon as it gets the ACK, it immediately sends the next packet. If the sender doesn't hear the ACK in time, it starts over from the beginning, sending the same packet again, until it does get an ACK.

The receiver waits for a packet. If the packet passes all the error-detection tests perfectly, the receiver transmits an ACK (acknowledgment) to the sender.

Subtleties: If the receiver receives the packet perfectly, but the ACK message is delayed too long, then the transmitter sends another copy of the message (a "communication echo"). Imagine the packet contained the message "deduct $11,000 from Fred's account.". When the receiver gets this second copy of the packet, what should it do? Certainly it should send an ACK (otherwise the transmitter will keep trying to send this packet over and over). Either or both of the following problems could occur:

  • The delayed first ACK could hit the transmitter after it transmits the second copy of the message, so it transmits the next packet. Then the second ACK hits the transmitter, tricking the transmitter into thinking that "next packet" has been successfully received, when it hasn't.
  • When the receiver gets 2 identical consecutive packets saying "deduct $11,000 from Fred's account", are these 2 legitimate independent transactions, and so it should deduct $22,000 from Fred's account? Or is it really just 1 transaction, with a bit of echo, and so should deduct a total of only $11,000 from Fred's account?

Both of these problems can be solved by adding a "sequence number". The transmitter keeps a count of how many independent packets it has transmitted to that receiver, and puts that sequence number in the header of each packet. But when it re-transmits a packet, it re-transmits that same identical packet with that same identical sequence number. Also, the receiver, rather than sending a generic "ACK" message, specifies which particular packet it is responding to by putting its sequence number in the ACK message. When there is a communication echo, the receiver sees the same sequence number, so ACKs that sequence number (again) but then discards and ignores the extra, redundant copy of a packet it already received. When the transmitter is sending a new packet that merely happens to contain the same data, the receiver sees a different sequence number, so it ACKs that new sequence number, and takes another $11,000 out of Fred's account. Poor Fred.

A 1-bit sequence number (alternating 1 - 0 - 1 - 0 for each new packet, and ACK1 ACK0 ACK1 ACK0 in response) is adequate for a stop-and-wait system. But as we will see, other ARQ protocols require a larger sequence number.

Subtleties: Some early protocols had the receiver send a NAK (negative acknowledgment) to the sender whenever a bad packet was received, and the sender would wait indefinitely until it received *either* an ACK *or* a NAK. This is a bad idea. Imagine what happens when (a) a little bit of noise made a bad packet, so the receiver sends the NAK back to the sender, but then (b) a little bit of noise made that NAK unrecognizable. Alternatively, imagine a shared-medium network with 1 sender and 2 receivers. What happens when a little noise messes up the "destination" field of the packet ?

With "Stop-and-wait ARQ", the sender and the receiver only needs to keep 1 packet in memory at a time.

streaming ARQEdit

The sender sends a packet, then the next packet, then the next, without waiting.

As it sends each packet, it puts a copy of that packet in a "window".

Each packet is consecutively numbered. (The sequence number must be at least large enough to uniquely identify every packet in the window).

... turn-around time ... bouncing off geostationary satellites ...

The receiver occasionally transmits an acknowledgment ("I got all packets up to 8980", "I got all packets up to 8990").

If the receiver is expecting packet number 9007, but it receives a packet with an *earlier* number (that it had already received successfully), it transmits (or possibly re-transmits) a "I got all packets up to 9006" message.

When the sender receives an acknowledgment of any packet in the "window", it deletes that copy.

When the sender's window gets full, it waits a little, then tries re-sending the packets in the window starting with the oldest.

So when the sender suspects an error in some packet, it resend *all* packets starting with the erroneous packet. This guarantees that the receiver will (eventually) receive all packets in order.

Optionally, If the receiver is expecting packet number 9007, but it receives packet number 9008, it may transmit a negative acknowledge (NAK) for 9007, and ignores any higher packet numbers until it gets packet 9007.

When the sender receives a NAK for any packet in the window, it re-starts transmission with that packet (and keeps it in the window).

With "streaming ARQ", the sender needs to keep the entire window of packets in memory at a time. But the receiver still only needs to handle 1 packet at a time, and handles them in consecutive order.

(Some people think of "streaming" as one big packet the size of the window using "stop-and-wait" protocol, divided into smaller "sub-packets").

Selective Repeat ARQEdit

A selective repeat ARQ system is a kind of streaming ARQ.

But instead of the receiver only handling 1 packet at a time, and discarding all packets higher or lower than the one it is looking for, the receiver tries to keep a copy of all packets it receives in a window of its own, and negotiates with the sender to try to resend *only* the erroneous packets.

FECEdit

If you have only one-way communication, you are forced to use Forward Error Correction, sometimes called EDAC (Error Detection And Correction).

You transmit the data, then (instead of a CRC) you transmit "check bits" that are calculated from the data.

... NASA space probes ... compact disks ...

The simplest kind is "repeat the message".

If I send the same packet twice, and noise only corrupts one of them, *and* the receiver can tell which one was corrupted, then no data was lost. If I send the same packet 3 times, and noise corrupts any one of them, then the receiver can do "best 2 out of 3". The "check bits" are 2 copies of the data bits. In fact, noise could corrupt a little bit of *all three* of them, and you could still extract all the data -- align the 3 packets next to each other, and do "best 2 out of 3" for every bit. As long as there were only a few bits of noise in each packet, and the noise was in a different place in each packet, all the data can be recovered.

... (put picture here) ...

There are some very clever kinds of FEC (Hamming codes, Reed-Solomon codes) that can correct all kinds of common errors better than "best 2 out of 3", and only require the same number of "check bits" as there are data bits.

Pretend It Never HappenedEdit

A sender often streams audio and video live, in real-time.

What should a receiver do when a packet gets mangled ?

If it sends a message back to the sender, asking it to resend that packet, by the time the reply gets back, it's probably several video frames later. It's too late to use that information.

Rather than pausing the entire movie until the request makes a round-trip, it's far less jarring to the audience if the receiver silently discards the mangled packet, fills in as best it can (for example, with nearby pixels' colors), try not to draw attention to the error, and continue on as if nothing had happened.

Note:
Signal degradation should be documented and easily findable as to let users know that there is no guarantee of exact reproduction.

combinationEdit

Even when they have 2-way communication, sometimes people use FEC anyway. That way small amounts of noise can be corrected at the receiver. If a packet is corrupted so badly that FEC cannot fix it, the protocol falls back on ACK-NAK retransmission (or on Pretend It Never Happened).

further readingEdit

a detailed description of one ACK-NAK protocol: "XModem / YModem Protocol Reference" by Chuck Forsberg 1988-10-14 http://www.commonsoftinc.com/Babylon_Cpp/Documentation/Res/yModem.htm

a detailed description of one streaming protocol: "The ZMODEM Inter Application File Transfer Protocol" by Chuck Forsberg 1988-10-14 http://www.commonsoftinc.com/Babylon_Cpp/Documentation/Res/zModem.htm

"Data Link Error Detection / Correction Methods" http://techref.massmind.org/techref/method/errors.htm brief descriptions of several error correction methods: Hamming codes, Fire codes, Reed-Solomon codes, Viterbi decoding, etc.


further readingEdit

Last modified on 10 December 2011, at 21:47