Embedded Systems/Threading and Synchronization

Embedded Systems

One of the most useful developments in the history of computing is multitasking and multithreading. This technique isn't always available to an embedded system engineer, but some embedded systems and RTOS have multithreading (MT) capability. The chapters in this section will talk about some of the uses of MT, and will discuss some of the common pitfalls associated with MT programming. This page is only going to serve as a brief reference to multi-threaded programming.

Preemptive Multithreading

When the first multi-tasking systems were established, they did not have a central controller. Multi-tasking was established by having programs voluntarily give up control to the system, and the system would then give control to another process. This system worked reasonably well, except that any program that was misbehaving would slow down the entire system. For instance, if a program got stuck in an infinite loop, it would never give up control, and the system would freeze.

The solution to this problem is preemptive multithreading. In a preemptive environment, control could be moved from one process to another process at any given time. The process that was "preempted" would not even know that anything had happened, except maybe there would be a larger than average delay between 2 instructions. Preemptive multithreading allows for programs that do not voluntarily give up control, and it also allows a computer to continue functioning when a single process hangs.

There are a number of problems associated with preemptive multithreading that all stem from the fact that control is taken away from one process when it is not necessarily prepared to give up control. For instance, if one process were writing to a memory location, and was preempted, the next process would see half-written data, or even corrupted data in that memory location. Or, if a task was reading in data from an input port, and it was preempted, the timing would be wrong, and data would be missed from the line. Clearly, this is unacceptable.

The solution to this new problem then is the idea of synchronization. Synchronization is a series of tools provided by the preemptive multithreaded operating system to ensure that these problems are avoided. Synchronization tools can include timers, "critical sections," and locks. Timers can ensure that a given process may be preempted, but only for a certain time. Critical sections are commands in the code that prevent the system from switching control for a certain time. Locks are commands that prevent interference in atomic operations. These topics will all be discussed in the following chapters.

Mutexes

The term Mutex is short for "Mutual Exclusion", and is a type of mechanism used in a preemptive environment that can prevent unauthorized access to resources that are currently in use. Mutexes follow several rules:

Mutexes are system wide objects, that are maintained by the kernel.
Mutexes can only be owned by one process at a time
Mutexes can be acquired by asking the kernel to allocate that mutex to the current task
If a Mutex is already allocated, the request function will block until the mutex is available.

In general, it is considered good programming practice to release mutexes as quickly as possible. Some problems with mutexes will be discussed in the chapter on deadlocks.

Spin Locks

Spin locks is a quick form of synchronization methods. It is named after its behavior - spin in the loop while the condition is false. To implement spin lock system should support test-and-set idiom or give exclusive access to a locking thread by any means (masking interrupts, locking bus).

An advantage of spin locks is that they are very simple. A disadvantage is that they waste CPU cycles in loop waiting. Most common use of spin locks is to synchronize quick access to objects. It is not advisable to do a long computations while spin locked a section.

Sentinels

Critical Sections

A critical section is a sequence of computer instructions that may malfunction if interrupted. An atomic operation is a sequence of computer instructions that cannot be interrupted and function correctly. In practice, these two subtly different definitions are often combined. Operating systems provide synchronization objects to meet these requirements, and some actually call these objects as "critical sections," "atomic operations" or "monitors."

An example of a critical section is code that removes data from a queue that is filled by an interrupt. If the critical section is not protected, the interrupt can occur while the dequeuing function is changing pointers, and corrupt the queue pointers. An example of an atomic operation is an I/O read where the process must read all the data at a particular rate, and cannot be preempted while reading.

A generally good programming practice is to have programs exit their critical sections as quickly as possible, because holding a critical section for too long will cause other processes on the system not to get any time, and will cause a major performance decrease. Critical sections should be used with care.

Priority Scheduling

Many RTOS have a mechanism to distinguish the relative priorities of different tasks. High-priority tasks are executed more frequently than the low priority tasks. Each implementation of priority scheduling will be a little different, however.

Deadlock

A deadlock occurs when a series of synchronization objects are held in a preemptive MT system in such a way that no process can move forward. Let us take a look at an example:

Let's say that we have 2 threads: T1 and T2. We also have 2 mutexes, M1 and M2.

T1 asks for and acquires mutex M1.
T2 acquires M2
T1 asks for M2, and the system transfers control to T2 until T2 releases M2.
T2 asks for M1, and the system is in deadlock (neither thread can continue until the other releases it's mutex).

This is a very difficult problem to diagnose, and an even more difficult problem to fix. This chapter will provide some general guide-lines for preventing deadlocks.

Watchdog Timer

See Embedded Systems/Watchdog Timer.

reading counters without locking

read twice and compare

Perhaps the most common concurrency algorithm used in embedded systems is the "read the counter twice and compare" optimistic concurrency control pattern.

Many hardware timers and hardware counters are connected to the CPU over an 8-bit bus. If the low byte of the timer happens to be 0xFF when we start to read the timer, if the timer increments between two byte reads, a simple read of each byte separately will get a corrupted value.^[1] We get a slightly different corrupted value if we read the low byte first vs. the low byte last, but it's corrupted either way.

One simple solution is to read the counter twice and compare:^[2]^[3]^[4]^[5]^[6]^[7]^[8]^[9]^[10]^[11]^[12]

long atomic_read_counter(volatile long *counter){
    long counter_old, counter_new;
    do{
        counter_old = *counter; // alas, not an atomic operation when the timer is connected to the CPU over an 8-bit bus.
        counter_new = *counter;
    }while( counter_old != counter_new );
    return counter_new;
}

or

// "optimized" routine hard-wired to read a 16-bit Counter1:
// the entire routine takes 3 machine instructions on the 8051 -- see Craig Steiner, Abhishek Yadav, etc.
inline
int atomic_read_counter1(){
    byte upper, lower;
    do{
        upper = Counter1H;
        lower = Counter1L;
    }while( upper != Counter1H );
    return( (upper << 8) | lower );
}

Because no locks are used, the double-read algorithm avoids deadlocks, avoids priority inversion, and avoids other problems with locks.

There are many other algorithms based on this read-twice-and-compare algorithm.

For example, the seqlocks used in Linux use this read-twice-and-compare algorithm.^[13]^[14]

For example, single-reader single-writer ring buffers are also common in embedded systems, and both the reader and the writer can be implemented with a similar lock-free algorithm.

incrementing counter

When the read-twice-and-compare algorithm is used to synchronize several processes (rather than, as above, a hardware counter and a process that reads the counter), the process that changes the counter needs to make that change in a way that appears atomic to the other process.

Many architectures have a single instruction that can atomically increment a variable. Alas, the exact details vary from one architecture to another, and from one C compiler to another. Some relatively portable ways to tell the C compiler to atomically increment a variable involve the std::atomic template class from the standard Atomic operations library, part of the C11 standard for the C programming language;^[15] or the tbb::atomic template class; or the boost::atomic template class; etc.

// rough draft -- untested code

__interrupt_handler h(){
        // ...
        // inside interrupt handler, interrupts are already turned off, so it's safe to increment the counter
        raw_counter++;
        // ...
}

// using CAS to update the counter is safe even if interrupts are turned on.
// inspired by example_atomic_inc() in
// https://www.kernel.org/doc/Documentation/atomic_ops.txt
void increment_counter(volatile long *counter){
    _Bool ret;
	do{
		long old = *counter;
		long new = old + 1;
		// requires #include <stdatomic.h>
		ret = atomic_compare_exchange_strong( counter, &old, new );
	}while(!ret);
}

// only use if interrupts are turned on, and it's not a multiprocessor machine:
void increment_counter(volatile long *counter){
    disable_interrupts();
        (*counter)++;
    enable_interrupts();
}

The main problem with the double-read algorithm is the ABA problem.^[16] To avoid the ABA problem, we use counters that only count up,^[17] and we give the counter enough bits that, for any reasonable amount of time a thread could possibly be delayed between reading the first byte of the first read and the last byte of the second read (say, maximum 1 second), the counter is long enough that at the worst-case (highest-frequency) count rate it takes much longer than that—say, the minimum time to overflow is once every 10 seconds.

For example, up-down counters can be synthesized from 2 counters, one that only counts up, and the other that only counts down.

↑ A few timers, such as the Atmel AVR timers, fix this with hardware. Those times have special additional hardware so that, when the program reads the low byte, all the remaining bytes of the timer are simultaneously latched into a hardware temporary register. Later the CPU can read each byte from the temporary latch (while the timer itself continues to count) without corruption. Atmel. "AVR072: Accessing 16-bit I/O Registers". 2002. Unfortunately, sometimes the main application needs to read one 16-bit register that uses a hardware temporary register, and an interrupt routine needs to read a (the same or some other) 16-bit register that overwrites the same temporary register. Then the main application will occasionally get a corrupted value. "AVR Libc FAQ: Why do some 16-bit timer registers sometimes get trashed?"
↑ Epson Toyocom. "Real Time Clock Module: Application manual". p. 21 says "since the read data is not held (the data may be changing), to obtain accurate data the countdown status should be read twice and then compared."
↑ Michael Silva. "Introduction to embedded programming: Timers/Counters". section "Potential problem using overflows". (mirror).
↑ Nigel Jones. "An unfortunate consequence of a 32-bit world".
↑ Opensolaris. "clock.h: Locking strategy for high-resolution timing services".
↑ Milan Verle. "Architecture and programming of 8051 MCU's". Chapter 2 : "8051 Microcontroller Architecture". section: 2.6 "Counters and Timers". subsection: "How to 'read' a timer?". says: "... The lower byte is correctly read (255), but at the moment the program counter was about to read the higher byte TH0, an overflow occurred and the contents of both registers have been changed (TH0: 14→15, TL0: 255→0). This problem has a simple solution. The higher byte should be read first, then the lower byte and once again the higher byte. If the number stored in the higher byte is different then this sequence should be repeated. It's about a short loop consisting of only 3 instructions in the program. ... another solution ... is ... turn the timer off while reading ... and turn it on again after reading is finished."
↑ "PICmicro Mid-range MCU family". Section 12.5.2: "Reading and Writing Timer1 in Asynchronous Counter Mode" in "Example 12-2: Reading a 16-bit Free-Running Timer" says: "Read high byte ... Read low byte ... Read high byte [again]"
↑ Nippey. "Reading out a 16bit timer on an 8bit system without a latch for the high/low byte". Stackoverflow. says "Sample as long as it takes to not hit an overflow between sampling the lower and the upper byte".
↑ James Rogers. "The 8051 Timers". section "Reading the Timers on the Fly". "read the high byte, then read the low byte, then read the high byte again. If the two readings of the [high byte] are not the same, repeat the procedure."
↑ Craig Steiner. "The 8051/8052 Microcontroller: Architecture, Assembly Language, and Hardware Interfacing". p. 41. Section 8.2.6.1 "Reading the value of a timer". says "The program reads the high byte of the timer, the reads the low byte, then reads the high byte again. If the high byte read the second time is not the same as the high byte read the first time, the cycle is repeated. In code, this would be written as [3 assembly-language instructions]".
↑ Abhishek Yadav. "Microprocessor 8085, 8086". p. 463. says "You read the high byte of the timer, then read the low byte, then read the high byte again. If the high byte read the second time is not the same as the high byte read the first time, you repeat the cycle. In code, this would appear as [3 assembly-language instructions]".
↑ Alka Kalra, Sanjeev Kumar Kalra. "Architecture and Programming of 8051 Microcontroller". 2010. Section 5.5.8: "Reading the value of a timer". p. 163. "You read the high byte of the timer, then read the low byte, then read the high byte again. If the high byte read the second time is not the same as the high byte read the first time, you repeat the cycle. In code, this would appear as [3 assembly-language instructions]".
↑ Wikipedia: seqlock
↑ Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini. "Linux Device Drivers". Chapter "Alternatives to Locking" O'Reilly 2005.
↑ http://en.cppreference.com/w/c/atomic
↑ Wikipedia: ABA problem
↑ Counters that only count down are also fine.

[1] A few timers, such as the Atmel AVR timers, fix this with hardware. Those times have special additional hardware so that, when the program reads the low byte, all the remaining bytes of the timer are simultaneously latched into a hardware temporary register. Later the CPU can read each byte from the temporary latch (while the timer itself continues to count) without corruption. Atmel. "AVR072: Accessing 16-bit I/O Registers". 2002. Unfortunately, sometimes the main application needs to read one 16-bit register that uses a hardware temporary register, and an interrupt routine needs to read a (the same or some other) 16-bit register that overwrites the same temporary register. Then the main application will occasionally get a corrupted value. "AVR Libc FAQ: Why do some 16-bit timer registers sometimes get trashed?"

[2] Epson Toyocom. "Real Time Clock Module: Application manual". p. 21 says "since the read data is not held (the data may be changing), to obtain accurate data the countdown status should be read twice and then compared."

[3] Michael Silva. "Introduction to embedded programming: Timers/Counters". section "Potential problem using overflows". (mirror).

[4] Nigel Jones. "An unfortunate consequence of a 32-bit world".

[5] Opensolaris. "clock.h: Locking strategy for high-resolution timing services".

[6] Milan Verle. "Architecture and programming of 8051 MCU's". Chapter 2 : "8051 Microcontroller Architecture". section: 2.6 "Counters and Timers". subsection: "How to 'read' a timer?". says: "... The lower byte is correctly read (255), but at the moment the program counter was about to read the higher byte TH0, an overflow occurred and the contents of both registers have been changed (TH0: 14→15, TL0: 255→0). This problem has a simple solution. The higher byte should be read first, then the lower byte and once again the higher byte. If the number stored in the higher byte is different then this sequence should be repeated. It's about a short loop consisting of only 3 instructions in the program. ... another solution ... is ... turn the timer off while reading ... and turn it on again after reading is finished."

[7] "PICmicro Mid-range MCU family". Section 12.5.2: "Reading and Writing Timer1 in Asynchronous Counter Mode" in "Example 12-2: Reading a 16-bit Free-Running Timer" says: "Read high byte ... Read low byte ... Read high byte [again]"

[8] Nippey. "Reading out a 16bit timer on an 8bit system without a latch for the high/low byte". Stackoverflow. says "Sample as long as it takes to not hit an overflow between sampling the lower and the upper byte".

[9] James Rogers. "The 8051 Timers". section "Reading the Timers on the Fly". "read the high byte, then read the low byte, then read the high byte again. If the two readings of the [high byte] are not the same, repeat the procedure."

[10] Craig Steiner. "The 8051/8052 Microcontroller: Architecture, Assembly Language, and Hardware Interfacing". p. 41. Section 8.2.6.1 "Reading the value of a timer". says "The program reads the high byte of the timer, the reads the low byte, then reads the high byte again. If the high byte read the second time is not the same as the high byte read the first time, the cycle is repeated. In code, this would be written as [3 assembly-language instructions]".

[11] Abhishek Yadav. "Microprocessor 8085, 8086". p. 463. says "You read the high byte of the timer, then read the low byte, then read the high byte again. If the high byte read the second time is not the same as the high byte read the first time, you repeat the cycle. In code, this would appear as [3 assembly-language instructions]".

[12] Alka Kalra, Sanjeev Kumar Kalra. "Architecture and Programming of 8051 Microcontroller". 2010. Section 5.5.8: "Reading the value of a timer". p. 163. "You read the high byte of the timer, then read the low byte, then read the high byte again. If the high byte read the second time is not the same as the high byte read the first time, you repeat the cycle. In code, this would appear as [3 assembly-language instructions]".

[13] Wikipedia: seqlock

[14] Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini. "Linux Device Drivers". Chapter "Alternatives to Locking" O'Reilly 2005.

[15] ttp://en.cppreference.com/w/c/atomic

[16] Wikipedia: ABA problem

[17] Counters that only count down are also fine.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]