Serial Programming/Forming Data Packets

Just about every idea for communicating between computers involves "data packets", especially when more than 2 computers are involved.

The idea is very similar to putting a check in an envelope to mail to the electricity company. We take the data (the "check") we want to send to a particular computer, and we place it inside an "envelope" that includes the address of that particular computer.

A packet of data starts with a preamble, followed by a header, followed by the raw data, and finishes up with a few more bytes of transmission-related error-detection information -- often a Fletcher-32 checksum. We will talk more about what we do with this error-detection information in the next chapter, Serial Programming/Error Correction Methods.

The accountant at the electricity company throws away the envelope when she gets the check. She already knows the address of her own company. Does this mean the "overhead" of the envelope is useless? No. Without the envelope, the check would never have reached her.

In a similar way, once a computer receives a packet, it immediately throws away the preamble. If the computer sees that the packet is addressed to itself, and has no errors, then it discards the wrapper and keeps the data.

The header contains the destination address information used by all the routers and switches to send the complete packet to the correct destination, just as a paper envelope bears the destination address used by the postal workers who carry the mail. Most protocols use a header that, like most paper mail envelopes, also includes the source address and a few other bits of transmission-related information.
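The packet layout and the receiver's behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-byte preamble, the one-byte addresses, and the header layout (destination, source, length) are hypothetical and do not come from any particular protocol. The trailing field is a Fletcher-32 checksum computed over the header and the data.

```python
import struct

PREAMBLE = b"\x55\x55\x55\x55"   # hypothetical: alternating bits for receiver lock-on
HEADER = struct.Struct(">BBH")   # hypothetical layout: destination, source, payload length


def fletcher32(data: bytes) -> int:
    """Fletcher-32 over 16-bit little-endian words; odd-length input is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def build_packet(dest: int, src: int, payload: bytes) -> bytes:
    """Wrap the payload: preamble, header, raw data, error-detection field."""
    header = HEADER.pack(dest, src, len(payload))
    check = fletcher32(header + payload)
    return PREAMBLE + header + payload + struct.pack(">I", check)


def receive_packet(my_address: int, packet: bytes):
    """Return the payload if the packet is intact and addressed to us, else None."""
    body = packet[len(PREAMBLE):]            # the preamble is thrown away first
    if len(body) < HEADER.size + 4:
        return None                          # too short to hold header and checksum
    dest, _src, length = HEADER.unpack_from(body)
    if len(body) != HEADER.size + length + 4:
        return None                          # length field disagrees with what arrived
    (check,) = struct.unpack(">I", body[-4:])
    if fletcher32(body[:-4]) != check:
        return None                          # damaged in transit
    if dest != my_address:
        return None                          # addressed to some other computer
    return body[HEADER.size:-4]              # discard the wrapper, keep the data
```

A receiver at address 0x12 would call `receive_packet(0x12, build_packet(0x12, 0x34, b"hello"))` and get the original data back; any other address, or a damaged packet, yields `None`.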

Unfortunately, there are dozens of slightly different, incompatible protocols for data packets, because people pick slightly different ways to represent the address information and the error-detection information.

... gateways between incompatible protocols ...

Packet size tradeoffs

Protocol designers pick a maximum and minimum packet size based on many tradeoffs.

  • packets should be "small" to prevent one transmitter transmitting a long packet from hogging the network.
  • packets should be "small" so that a single error can be corrected by retransmitting one small packet rather than one large packet.
  • packets should be "large" so more time is spent transmitting good data and less time is spent on overhead (preamble, header, footer, postamble, and between-packet gap).
  • the packet header and trailing footer should be short, to reduce overhead.
  • the footer should hold a large error-detection codeword field, because a shorter codeword is more likely to incorrectly accept an error-riddled packet (we discuss error-detection in more detail in the next chapter, Serial Programming/Error Correction Methods).
  • making the packet header a little longer, so that meaningful fields fall on byte or word boundaries, rather than highly encoded bit fields, makes it easier for a CPU to interpret them, allowing lower-cost network hardware.
  • making the packet header a little longer -- instead of a single error-detection field that covers the whole packet, we have one error-detection field for the header, and another error-detection field for the data -- allows a node to immediately reject a packet with a bit error in the destination address or the length field, avoiding needless processing. The two fields need not use the same code or the same width. Such an "extra" CRC that covers only the header is used in several file formats[1] -- it's an optional feature of the MPEG audio file format,[2][3][4] an optional feature of the gzip format, etc.
  • fixed-size packets -- where all packets fall into a few length categories -- do not require a "length" field, and simplify buffer allocation, but waste "internal" data space on padding the last packet when you want to send data that is not an exact multiple of the fixed data size.
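The pull between "small" and "large" packets in the list above can be put into rough numbers. The sketch below assumes a deliberately simple model that is not part of the text: bit errors are independent with a fixed bit error rate, and any packet with an error is retransmitted in full. The function names are hypothetical.

```python
def efficiency(payload_bytes: int, overhead_bytes: int, bit_error_rate: float) -> float:
    """Fraction of transmitted bits that end up as good payload data.

    Assumed model: independent bit errors; a packet with any error is
    wasted and must be sent again in full.
    """
    total_bytes = payload_bytes + overhead_bytes
    p_intact = (1.0 - bit_error_rate) ** (8 * total_bytes)
    return payload_bytes / total_bytes * p_intact


def best_payload_size(overhead_bytes: int, bit_error_rate: float, candidates):
    """Pick the candidate payload size with the highest efficiency."""
    return max(candidates, key=lambda n: efficiency(n, overhead_bytes, bit_error_rate))


if __name__ == "__main__":
    sizes = [16, 64, 256, 1024, 4096]
    for ber in (0.0, 1e-5, 1e-4, 1e-3):
        n = best_payload_size(16, ber, sizes)
        print(f"bit error rate {ber:g}: best payload {n} bytes, "
              f"efficiency {efficiency(n, 16, ber):.3f}")
```

On an error-free link the largest packet always wins under this model, because only the fixed overhead matters. As the error rate rises, the best size shrinks, because a long packet is more likely to be hit and costs more to resend.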

Start-of-packet and transparency tradeoffs

Unfortunately, it is impossible for any communication protocol to have all these nice-to-have features:

  • transparency: data communication is transparent and "8 bit clean" -- (a) any possible data file can be transmitted, (b) byte sequences in the file are always handled as data, and never mis-interpreted as something else, and (c) the destination receives the entire data file without error, without any additions or deletions.
  • simple copy: forming packets is easiest if we simply blindly copy data from the source to the data field of the packet without change.
  • unique start: the start-of-packet symbol is easy to recognize, because it is a known constant byte that never occurs anywhere else in the headers, header CRC, data payload, or data CRC.
  • 8-bit: only uses 8-bit bytes.

Some communication protocols break transparency, requiring extra complexity elsewhere -- requiring higher network layers to implement work-arounds such as binary-to-text encoding or else suffer mysterious errors, as with the Time Independent Escape Sequence.

Some communication protocols break "8-bit" -- i.e., in addition to the 256 possible bytes, they have "extra symbols". Some communication protocols have just a few extra non-data symbols -- such as the "long pause" used as part of the Hayes escape sequence; the "long break" used as part of the SDI-12 protocol; and the "command characters" or "control symbols" in 4B5B coding and 8b/10b encoding. Other systems, such as 9-bit protocols,[5][6][7][8][9][10][11] transmit 9-bit symbols. Typically the first 9-bit symbol of a packet has its high bit set to 1, waking up all nodes; then each node checks the destination address of the packet, and all nodes other than the addressed node go back to sleep. The rest of the data in the packet (and the ACK response) is transmitted as 9-bit symbols with the high bit cleared to 0, effectively 8-bit values, which the sleeping nodes ignore. (This is similar to the way that all data bytes in a MIDI message are effectively 7-bit values; the high bit is set only on the first byte in a MIDI message.) Alas, some UARTs make it awkward,[12][13] difficult, or impossible to send and receive such 9-bit characters.
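The wake-up behavior of a 9-bit multidrop bus can be modeled in software. In this sketch, plain Python integers stand in for 9-bit symbols, and it is assumed (as one common arrangement, not the only one) that the destination address travels in the low 8 bits of the flagged first symbol. The class and function names are hypothetical, and no real UART is involved.

```python
ADDRESS_FLAG = 0x100  # ninth bit: set only on the first symbol of a packet


def frame_9bit(dest: int, payload: bytes) -> list:
    """Return a list of 9-bit symbols: one flagged address, then plain data."""
    return [ADDRESS_FLAG | (dest & 0xFF)] + list(payload)


class Node:
    """A bus node that sleeps through packets addressed to other nodes."""

    def __init__(self, address: int):
        self.address = address
        self.listening = False
        self.received = bytearray()

    def on_symbol(self, symbol: int) -> None:
        if symbol & ADDRESS_FLAG:
            # Every node wakes up to inspect a flagged symbol, then stays
            # awake only if the address matches its own.
            self.listening = (symbol & 0xFF) == self.address
        elif self.listening:
            self.received.append(symbol)
        # Unflagged symbols are ignored while the node is asleep.


if __name__ == "__main__":
    nodes = [Node(1), Node(2)]
    bus_traffic = frame_9bit(2, b"hi") + frame_9bit(1, b"yo")
    for symbol in bus_traffic:
        for node in nodes:
            node.on_symbol(symbol)
    for node in nodes:
        print(node.address, bytes(node.received))
```

Because the flag lives outside the 8 data bits, the payload may contain any byte value without being mistaken for an address.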

Some communication protocols break "unique start" -- i.e., they allow the no-longer-unique start-of-packet symbol to occur elsewhere -- most often because we are sending a file that includes that byte, and "simple copy" puts that byte in the data payload. When a receiver is first turned on, or when cables are unplugged and later reconnected, or when noise corrupts what was intended to be the real start-of-packet symbol, the receiver will incorrectly interpret that data byte as the start-of-packet. Even though the receiver usually recognizes that something is wrong (checksum failure), a single such noise glitch may lead to a cascade of many lost packets, as the receiver goes back and forth between (incorrectly) interpreting that data byte in the payload as a start-of-packet, and then (incorrectly) interpreting a real start-of-packet symbol as payload data. Even worse, such common problems may cause the receiver to lose track of where characters begin and end. Early protocol designers believed that once synchronization has been lost, there must be a unique start-of-packet character sequence required to regain synchronization.[14] Later protocol designers have designed a few protocols, such as CRC-based framing,[15] that not only break "unique start" -- they allow the data payload to contain the same byte sequence as the start-of-packet, supporting simple-copy transparency -- but don't even need a fixed unchanging start-of-packet character sequence.

In order to keep the "unique start" feature, many communication protocols break "simple copy". This requires a little extra software and a little more time per packet than simply copying the data -- which is usually insignificant with modern processors. The awkwardness comes from (a) making sure that the entire process -- the transmitter encoding/escaping a chunk of raw data into a packet payload that must not include the start-of-packet byte, and the receiver decoding/unescaping the packet payload into a chunk of raw data -- is completely transparent to any possible sequence of raw data bytes, even if those bytes include one or more start-of-packet bytes, and (b) since the encoded/escaped payload data inevitably requires more bytes than the raw data, we must make sure we don't overflow any buffers even with the worst possible expansion, and (c) unlike "simple copy" where a constant bitrate of payload data bits results in the same constant goodput of raw data bits, we must make sure that the system is designed to handle the variations in payload data bitrate or raw data bit goodput or both. Some of this awkwardness can be reduced by using consistent-overhead byte stuffing (COBS)[16] rather than variable-overhead byte stuffing techniques such as the one used by SLIP.
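To make the variable-overhead case concrete, here is a minimal sketch of SLIP-style byte stuffing, using the special byte values from RFC 1055. Note that SLIP's reserved byte marks the end of a packet rather than the start, but the escaping problem is the same. The function names are hypothetical and error handling is kept to the bare minimum.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD  # values from RFC 1055


def slip_encode(data: bytes) -> bytes:
    """Escape END and ESC inside the data, then terminate the frame with END."""
    out = bytearray()
    for b in data:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def slip_decode(frame: bytes) -> bytes:
    """Decode one complete frame, including its terminating END byte."""
    if not frame or frame[-1] != END:
        raise ValueError("frame does not end with END")
    out = bytearray()
    it = iter(frame[:-1])
    for b in it:
        if b == ESC:
            nxt = next(it, None)
            if nxt == ESC_END:
                out.append(END)
            elif nxt == ESC_ESC:
                out.append(ESC)
            else:
                raise ValueError("invalid escape sequence")
        elif b == END:
            raise ValueError("unescaped END inside frame")
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    for raw in (b"\x00" * 10, bytes([END]) * 10):
        encoded = slip_encode(raw)
        print(len(raw), "raw bytes ->", len(encoded), "bytes on the wire")
```

The two inputs in the usage example have the same raw length but very different encoded lengths: data made entirely of reserved bytes doubles in size, so a receive buffer sized for this scheme has to allow for twice the raw length plus the terminator.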

Calculate the CRC and append it to the packet *before* encoding both the raw data and the CRC with COBS.[17]
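A minimal sketch of that ordering follows. It assumes a CRC-32 (from Python's `zlib` module) as the error-detection field and a zero byte as the frame delimiter; both are illustrative choices, and the function names are hypothetical. This simple encoder always ends with a fresh code byte, so it adds at most one byte plus one more per 254 bytes of input.

```python
import zlib


def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the result contains no zero bytes."""
    out = bytearray()
    for run in data.split(b"\x00"):          # runs of non-zero bytes
        while len(run) >= 254:
            out.append(0xFF)                 # full block: 254 bytes, no implied zero
            out += run[:254]
            run = run[254:]
        out.append(len(run) + 1)             # code byte: distance to the next code
        out += run
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Invert cobs_encode; raise ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside COBS data")
        block = encoded[i + 1:i + code]
        if len(block) != code - 1 or 0 in block:
            raise ValueError("truncated or corrupt COBS block")
        out += block
        i += code
        if code != 0xFF and i < len(encoded):
            out.append(0)                    # the zero this code byte stood for
    return bytes(out)


def make_frame(payload: bytes) -> bytes:
    """Append the CRC first, then COBS-encode data and CRC together."""
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return cobs_encode(payload + crc) + b"\x00"   # zero marks the frame boundary


def unframe(frame: bytes) -> bytes:
    """Undo make_frame: strip the delimiter, decode, then verify the CRC."""
    if not frame.endswith(b"\x00"):
        raise ValueError("missing frame delimiter")
    raw = cobs_decode(frame[:-1])
    if len(raw) < 4:
        raise ValueError("frame too short to hold a CRC")
    payload, crc = raw[:-4], int.from_bytes(raw[-4:], "big")
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch")
    return payload
```

Doing it in this order matters because the CRC bytes can themselves contain the delimiter value. Encoding them along with the data keeps the delimiter unique on the wire.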

Preamble

Two popular approaches to preambles are:

  • The first transmitter sends just enough preamble bits for this hop's receiver to lock on. Then it sends the packet. That receiver, once it knows it needs to forward the packet another hop, transmits a fresh new full-size preamble over the next hop, followed by the data packet. The benefit is that a relatively short, fixed-length preamble is adequate no matter how many hops the data packet jumps over.
  • The first transmitter sends a much longer preamble, much longer than is really necessary for this hop's receiver to lock on. Then it sends the packet. That receiver, as soon as it detects the preamble, immediately transmits each incoming bit out the other port(s) until it gets to the end of the packet. The benefit is that relay nodes can be extremely simple, since they don't have to store any data. The drawback is preamble consumption: the preamble gets shorter and shorter after each hop, and once too many bits are lost there are not enough preamble bits left to lock on, so the entire packet is lost.

Automatic baud rate detection

To do:
Say a few words about automatic baud rate detection (autobaud). (Is there a better section of the Serial Programming book to discuss autobaud implementation?)




For further reading

  1. Andy McFadden. "Designing File Formats"
  2. Wikipedia: elementary stream
  3. Gabriel Bouvigne. "MPEG Audio Layer I/II/III frame header"
  4. Predrag Supurovic. "MPEG Audio Frame Header"
  5. uLan: 9-bit message oriented communication protocol, which is transferred over RS-485 link.
  6. Pavel Pisa. "uLan RS-485 Communication Driver" "9-bit message oriented communication protocol, which is transferred over RS-485 link."
  7. Peter Gasparik. "9-bit data transfer format"
  8. Stephen Byron Cooper. "9-Bit Serial Protocol".
  9. "Use The PC's UART With 9-Bit Protocols". 1998.
  10. Wikipedia: multidrop bus (MDB) is a 9-bit protocol used in many vending machines.
  11. ParitySwitch_9BitProtocols: manipulate parity to emulate a 9 bit protocol
  12. "Use The PC's UART With 9-Bit Protocols". Electronic Design. 1998-December.
  13. Thomas Lochmatter. "Linux and MARK/SPACE Parity". 2010.
  14. J. Robinson. RFC 935. "Reliable link layer protocols". 1985. quote: "once a header error has been detected, the count field must be assumed to be invalid, and so there must be a unique character sequence that introduces the next header in order that the receiver can regain synchronization with the sender."
  15. Wikipedia: CRC-based framing
  16. "Consistent Overhead Byte Stuffing" by Stuart Cheshire and Mary Baker, 1999.
  17. Jason Sachs. "Help, My Serial Data Has Been Framed: How To Handle Packets When All You Have Are Streams". 2011.
"CMX-MicroNet is the first system that allows TCP/IP
and other protocols to be run natively on small processors
... [including] AVR, PIC 18, M16C."