Operating System Design/Scheduling Processes/Preemption

Preemption as used with respect to operating systems means the ability of the operating system to preempt (that is, stop or pause) a currently scheduled task in favour of a higher priority task. The resource being scheduled may be the processor or I/O, among others.

Non-preemptability arises, for instance, when handling an interrupt. In this case, scheduling is avoided until the interrupt is handled.

The schedulers used in most modern operating systems, such as various flavours of Unix, can preempt user processes. This is called preemptive multitasking, and is in contrast to cooperative multitasking wherein a process "gives away" its time by utilizing kernel resources or by specifically calling a kernel routine to allow other processes time to run.



Some operating systems' schedulers (including Linux as of the 2.6 series) have the ability to preempt a process while it is processing a system call as well (a preemptible kernel).

Sinclair QDOS was the first preemptive multitasking system available for home users (1984). Other preemptive operating systems include AmigaOS, the Windows NT family (also XP or Vista), Linux, *BSD, and Mac OS X. Examples of cooperative operating systems include Windows for Workgroups (also known as Windows 3.1 or 95), NetWare, and Mac OS versions 9.x.

Linux kernels prior to Linux 2.6 were also nonpreemptive, but later releases implemented the preemptive model. Several commercial versions of UNIX are preemptive, including Solaris and IRIX.



The person currently at the cashier may be interrupted and have to wait for another customer before finishing.

Pre-emptive Scheduling




In the Linux kernel, the scheduler is called after each timer interrupt (that is, quite a few times per second). It determines what process to run next based on a variety of factors, including priority, time already run, etc. The implementation of preemption in other kernels is likely to be similar.

Advantages and Disadvantages


Making a scheduler preemptible has the advantage of better system responsiveness and scalability, but comes with the disadvantage of race conditions (where the executing process accesses the same resource before another preempted process finished using it).

Round-robin pre-emptive scheduling


The simplest pre-emptive scheduling algorithm is round-robin. The round-robin scheduler keeps all the runnable processes in a circular queue. Every time the hardware timer interrupts the currently-running process, (or when that process voluntarily gives up control), the scheduler puts that process at the back of the queue. Then the scheduler fetches the process at the head of the queue and runs it.

Priority Pre-emptive Scheduling


(FIXME: consider splitting this section out to a separate "Priority Pre-emptive Scheduling" page)

Nearly all modern operating systems use priority pre-emptive scheduling.

Most real-time computer systems use fixed priority pre-emptive scheduling -- usually rate-monotonic scheduling or deadline monotonic scheduling.[1]

fixed priority pre-emptive scheduling


rate monotonic scheduling: a job with a lower frequency of activation is assigned a lower priority than (and so is pre-empted by) all jobs with a higher frequency of activation. RMS will always meet deadlines if the CPU utilization is less than  .[2]

deadline monotonic scheduling: a job's priority is inversely proportional to its relative deadline. (Deadline monotonic scheduling becomes equivalent to rate monotonic scheduling in the special case where each job's relative deadline is equal to its period).

dynamic priority pre-emptive scheduling


earliest-deadline first scheduling: a job's priority is inversely proportional to its absolute deadline. The difference between deadline monotonic scheduling and earliest-deadline first scheduling is that DM is a static priority algorithm, EDF is a dynamic priority algorithm.[3] EDF can guarantee that all deadlines are met provided that the total CPU utilization is less than  .

fixed vs dynamic vs round-robin


Dynamic priority algorithms have the advantage that they can successfully schedule some job sets (i.e., not miss any deadlines) that cause static priority algorithms to fail (i.e., miss deadlines).

Fixed priority algorithms have the advantage over dynamic priority algorithms that if the system experiences an overload at a certain priority level -- so many jobs are scheduled at that priority that it's impossible for any scheduler to meet all their deadlines -- fixed priority schedulers can still guarantee that all higher-priority jobs will still be scheduled and still meet their deadlines.[3] Fixed priority algorithms also have the advantage that they are easier to implement.

People who implement priority-based schedulers need to worry about two potential problems:

  • process starvation: If one process requests all of the CPU time it can get -- because of an accidental overload or malicious denial-of-service attack -- all lower-priority processes will be locked out.[4]
  • priority inversion: (FIXME:)

Round-robin schedulers have the advantage that those priority-related problems cannot occur.

mixed scheduling


Some schedulers implement some mixture of the above scheduling algorithms.

Adaptive partition schedulers uses priority-based scheduling when the system isn't under full CPU load, but also guarantee that even low-priority services get some minimum amount of CPU time even when the system is heavily loaded[4][5] using a round-robin-like algorithm.

Many real-time systems have a lowest-priority "background task" that does not have any hard real-time deadline, and the system only runs it in the otherwise idle spare time left over after all the realtime tasks are done.[6][7]

Some of those systems use "dual kernels", with an entire general-purpose OS such as Linux with its own scheduler running as a single lowest-priority task on top of a hard real-time kernel.[8] (When the realtime kernel uses rate monotonic scheduling, that non-real-time task is guaranteed to get at least   of the CPU time).

interrupt latency


In addition to the timer tick interrupt, most systems have other hardware interrupts. (FIXME: something about critical sections) (FIXME: something about deterministic response times vs. jitter) (FIXME: something about streaming audio data?) The worst-case interrupt latency is at least the length of the longest critical section in the kernel. [6]

Some high-availability operating systems support virtual device drivers -- the amount of code that runs with the highest priority and directly manipulates hardware, such as interrupt service routines, is minimized; and most of the device driver work is handled in user space by a second-level interrupt handler.[7] (FIXME: possibly something about hardware interrupts triggering first-level interrupt handlers, and the timer tick interrupt running the scheduler which may in turn run a second-level interrupt handler? Or is this already covered by Embedded Systems/Interrupts ?)

We talk more about various hardware interrupts in a later chapter, Operating System Design/Processes/Interrupt.

hard real-time scheduler


Many computer systems are designed to accept new tasks at any time. If there is only one task running -- or if all the other tasks fit into the times when that task couldn't do anything useful because it is waiting for the disk to spin, the network card to finish a packet, etc -- then that task finishes in some minimum amount of time. But as other tasks are added to the system which preempt that task, wall-clock time required for the system to finish the task becomes unbounded. (FIXME: say something about "starvation-free" here)

Hard real-time computer systems are designed to deliberately refuse to accept new tasks. Programmers carefully design real-time tasks in such a way that is relatively easy to calculate the worst-case runtime (i.e., avoiding the halting problem). Given that runtime, the desired deadline for the task, and the task itself, one can use rate-monotonic analysis to decide if, after adding that task to all the other tasks already added to a system, it is possible to not only meet the desired deadline for this task but also continue to meet all the desired deadlines for all those previous tasks -- and if it's not possible to guarantee that all those deadlines will continue to be met, refuse to accept that new task. Some people build systems that allow new tasks to be presented to the system, and design the system to automatically do RMA analysis, and then the system decides whether or not to accept this new task. Other people manually do RMA by hand to decide which tasks to allow in a system, and then hard-wire the system to only run those tasks -- these systems are much simpler.

People who build hard real-time systems think that occasionally refusing to accept some new task is well worth the guarantee that all previous tasks will still meet their deadlines.

  1. Phil Koopman. "Real Time Scheduling Analysis for Critical Systems".
  2. Eric Verhulst, Raymond T. Boute, José Miguel Sampaio Faria, Bernhard H.C. Sputh, Vitaliy Mezhuyev. "Formal Development of a Network-Centric RTOS: Software Engineering for Reliable Embedded Systems". Section 2.3.4: Rate Monotonic Analysis. p. 24. 2011.
  3. a b Barry Watson. "The Design of the Everyman Hard Real-time Kernel". 2014. p. 15.
  4. a b Kerry Johnson, Jason Clarke, Paul Leroux, and Robert Craig. "Choosing the best real-time scheduling method for your embedded device". 2006.
  5. "Partition Scheduling".
  6. a b David Kleidermacher and Mark Griglock. "Safety-Critical Operating Systems". 2001.
  7. a b David Kleidermacher. "Optimizing RTOSes for HA architectures". 2002.
  8. Paul N. Leroux and Jeff Schaffer. "Exactly When Do You Need Real Time?". 2006.