Ruby Hacking Guide/Threads

Outline

Ruby Interface

Come to think of it, I haven’t yet shown an example of using Ruby threads in practice. It isn’t much, but here is a simple one to start with:

Thread.new {
  while true          # the new thread prints forever
    puts 'from thread'
  end
}
while true            # meanwhile, the main thread prints forever too
  puts 'from main'
end

If you execute this program, you should see “from thread” and “from main” mixed together in the output.

Of course, beyond simply creating threads, Ruby also provides a number of ways to control them. There is no synchronized keyword as in Java, but common primitives such as Mutex, Queue, and Monitor are provided, and the API below can be used to operate on the threads themselves.
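As a small sketch of two of these primitives, the following uses Mutex to protect a shared counter and Queue to hand values from one thread to another. It assumes a reasonably recent Ruby, where Mutex and Queue are available without any require; the variable names are only illustrative.

```ruby
# Mutex: only one thread at a time may run inside synchronize.
counter = 0
lock = Mutex.new
workers = 4.times.map do
  Thread.new do
    1000.times do
      lock.synchronize { counter += 1 }
    end
  end
end
workers.each(&:join)   # wait for every worker to finish
puts counter           # => 4000

# Queue: a producer thread passes values to a consumer thread.
queue = Queue.new
producer = Thread.new do
  3.times { |i| queue << i }
  queue << :done        # sentinel telling the consumer to stop
end
results = []
consumer = Thread.new do
  # pop blocks until an item is available
  while (item = queue.pop) != :done
    results << item * 10
  end
end
[producer, consumer].each(&:join)
p results              # => [0, 10, 20]
```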

Thread API

Thread.pass — Pass execution to another thread
Thread.kill(th) — End the thread th
Thread.exit — End this thread
Thread.stop — Pause this thread temporarily
Thread#join — Wait for the receiving thread to end
Thread#wakeup — Resume a thread that was previously paused
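The sketch below exercises several of the methods listed above. The `Thread.pass until ... == "sleep"` loops are a simple busy wait, added only so that wakeup and kill are called after the target thread has actually paused, which avoids a race between the two threads and keeps the output deterministic.

```ruby
log = []

# Thread.stop / Thread#wakeup / Thread#join
sleeper = Thread.new do
  log << :before_stop
  Thread.stop                 # pause this thread until someone wakes it
  log << :after_wakeup
end
Thread.pass until sleeper.status == "sleep"   # let it reach Thread.stop
sleeper.wakeup                # make it runnable again
sleeper.join                  # wait for it to finish

# Thread.exit: end the calling thread early
quitter = Thread.new do
  log << :quitter_start
  Thread.exit
  log << :never_reached       # skipped: the thread has already ended
end
quitter.join

# Thread.kill: end another thread from the outside
idler = Thread.new { sleep }  # sleeps until killed
Thread.pass until idler.status == "sleep"
Thread.kill(idler)
idler.join

p log            # => [:before_stop, :after_wakeup, :quitter_start]
p idler.alive?   # => false
```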

Ruby Threads

At a glance, threads may seem to run simultaneously, but they are actually executed in turns, each for a short slice of time. Strictly speaking, a machine with multiple CPUs can run several threads at the same time, but even then, if there are more threads than CPUs, the threads must take turns.

Ruby still has a GIL (Global Interpreter Lock). Because of this lock, the Ruby interpreter can, strictly speaking, run only one thread at a time. However, when a thread is blocked (e.g. waiting for data to arrive over the network), the interpreter can switch to another thread while the blocked thread waits. At the moment, if you want to truly run multiple threads in parallel with Ruby, you have to run multiple interpreter processes, a technique often used by web servers such as Unicorn. Much work has been done to lessen the impact of the GIL, and in the future it might disappear altogether. For most purposes, though, the current situation is sufficient.
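A rough way to observe this: threads that spend their time blocked (here in `sleep`, standing in for waiting on the network) overlap, so three 0.2-second waits finish in roughly 0.2 seconds of wall-clock time rather than 0.6. Timings are approximate and machine-dependent, and a CPU-bound loop in place of `sleep` would not show this overlap under the GIL.

```ruby
# Each thread blocks in sleep, which lets the interpreter run the others,
# so the waits overlap instead of adding up.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 3.times.map { Thread.new { sleep 0.2 } }
threads.each(&:join)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
printf("elapsed: %.2f s\n", elapsed)   # roughly 0.2, not 0.6
```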

Preemptive?

Now let’s look at the characteristics of Ruby threads in a little more detail. One basic property of any threading system is whether or not it is preemptive.

In a preemptive threading system, threads are switched automatically even if the user of the threads never explicitly switches them. Conversely, the user cannot control when thread switches happen.

In a non-preemptive threading system, on the other hand, threads are never switched unless the user explicitly says “you can pass control to the next thread now.” Conversely, this means the user of the threads controls exactly where thread switches can occur.
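Ruby threads are preemptive at the Ruby level, so they are not the best way to see this. Ruby’s Fiber class, however, gives a convenient illustration of non-preemptive switching: a fiber runs until it explicitly calls Fiber.yield, so control moves only where the code says it may. This sketch illustrates the concept with fibers; it is not how Ruby threads are implemented, and the tiny round-robin loop is a hypothetical scheduler written just for this example.

```ruby
trace = []

# Each fiber gives up control only at Fiber.yield, and finishes with :done.
a = Fiber.new do
  trace << "a1"
  Fiber.yield          # explicit switch point
  trace << "a2"
  :done
end
b = Fiber.new do
  trace << "b1"
  Fiber.yield
  trace << "b2"
  :done
end

# Resume each fiber in turn; drop a fiber once its block has returned :done.
fibers = [a, b]
until fibers.empty?
  fibers = fibers.reject { |f| f.resume == :done }
end
p trace   # => ["a1", "b1", "a2", "b2"]
```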

This distinction can also be made for processes, and there preemption is seen as the “superior” approach. If, for example, a program had a bug that made it fall into an infinite loop, a non-preemptive system would never get a chance to switch to another process. In other words, one user program could lock up the entire system, which is no good. Windows 3.1 is built on top of MS-DOS, so its process switching is non-preemptive, whereas Windows 95’s is preemptive. Therefore, Windows 95 is more robust, and it can be said that Windows 95 is “superior” to Windows 3.1.

So which is it for Ruby threads? At the Ruby level, threads are preemptive and at the C level, threads are non-preemptive. In other words, when writing C code, you can almost exactly specify the timing of thread switches.
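Preemption at the Ruby level can be seen directly. Below, a thread spins in a loop that never calls Thread.pass, sleeps, or does I/O, yet the main thread still regains control after its sleep and can stop the spinner. This is demo code only; in current CRuby the switch back to the main thread is forced by the interpreter’s own periodic timer, so the exact delay and iteration count vary from run to run.

```ruby
count = 0
# This thread never yields voluntarily: no Thread.pass, no I/O, no sleep.
spinner = Thread.new do
  loop { count += 1 }
end

sleep 0.1            # the main thread blocks, so the spinner gets to run
# Reaching this line at all means the spinner was preempted.
spinner.kill
spinner.join
puts "spinner ran #{count} iterations before being stopped"
```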

Why is Ruby threading designed like this? Threads are certainly convenient, but using them imposes a requirement on code: it must be written with threads in mind, that is, it must be thread-safe. So if thread switching were preemptive at the C level, every C library that Ruby uses would have to be thread-safe.

However, there are actually many C libraries that are not yet thread-safe. If we reduced the number of usable libraries by making thread safety a requirement, all of the effort taken to make extension libraries easy to write would be wasted. Thus, for Ruby, making threading non-preemptive at the C level is the rational choice.

Management Structure

We have seen that at the C level, Ruby threads are non-preemptive: after a thread runs for a while, it voluntarily gives up control to another thread. So consider a running thread that is just about to stop. Which thread should it pass control to? Before we can answer that, we need to know how Ruby threads are represented internally, so let’s look at the variables and data structure used to manage threads.

▼ Thread management structure

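typedef struct thread * rb_thread_t;
static rb_thread_t curr_thread = 0;   /* the thread that is running now */
static rb_thread_t main_thread;       /* the first thread of the process */

struct thread {
    struct thread *next, *prev;       /* links forming a circular list */
    /* ... many other members omitted ... */
};

(eval.c)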

For various reasons, struct thread has grown very large, so we focus only on the important parts here. Looking at just the two members next and prev, which are both pointers to struct thread (the same type as rb_thread_t), you might think that the threads form a doubly-linked list. In fact, it is not just a doubly-linked list: its ends meet. In other words, it is a circular doubly-linked list. This is an important point. Adding the static variables main_thread and curr_thread, the whole data structure looks like Figure 1.
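To make the shape concrete, here is a small Ruby model of the ring. `ThreadNode`, `insert_after`, `unlink`, and `ring_names` are hypothetical names for this sketch, not functions from eval.c, and inserting right after a given node is an arbitrary choice for the model. The point is only that every node’s next and prev are always set, so walking next from any node eventually comes back to it.

```ruby
# Hypothetical model of struct thread: just the two links plus a name.
ThreadNode = Struct.new(:name, :next, :prev)

# A ring with a single node points to itself in both directions.
def new_ring(name)
  node = ThreadNode.new(name)
  node.next = node
  node.prev = node
  node
end

# Splice new_node into the ring directly after pos.
def insert_after(pos, new_node)
  new_node.prev = pos
  new_node.next = pos.next
  pos.next.prev = new_node
  pos.next = new_node
end

# Remove node from the ring by joining its two neighbours.
def unlink(node)
  node.prev.next = node.next
  node.next.prev = node.prev
end

# Follow next links from start until we arrive back at start.
def ring_names(start)
  names = []
  node = start
  loop do
    names << node.name
    node = node.next
    break if node.equal?(start)
  end
  names
end

main_thread = new_ring("main")
t1 = ThreadNode.new("t1")
t2 = ThreadNode.new("t2")
insert_after(main_thread, t1)
insert_after(t1, t2)
p ring_names(main_thread)   # => ["main", "t1", "t2"]
p main_thread.prev.name     # => "t2"  (the ends meet)
unlink(t1)
p ring_names(main_thread)   # => ["main", "t2"]
```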

Figure 1: Data structure for managing threads

main_thread is the thread that exists when the program starts up; in other words, it is the “first” thread. curr_thread is, of course, the current thread, that is, the thread that is running right now. The value of main_thread never changes while the process runs, but the value of curr_thread changes frequently.

With the threads forming a cycle like this, choosing the next thread is simple: just follow the next link. That alone is enough to run all threads more or less evenly.
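The selection rule can be sketched in a few lines: the next thread to run is simply curr_thread.next. The model below uses a hypothetical `Node` struct rather than eval.c’s real structure, keeps only the forward link, and deliberately ignores complications such as threads that are stopped or waiting. It advances curr_thread nine times and counts how often each thread was chosen.

```ruby
Node = Struct.new(:name, :next)   # only the forward link is needed here

# Build a ring: main -> a -> b -> main
main_thread = Node.new("main")
a = Node.new("a")
b = Node.new("b")
main_thread.next = a
a.next = b
b.next = main_thread

curr_thread = main_thread
runs = Hash.new(0)
9.times do
  curr_thread = curr_thread.next   # choose the next thread by following the link
  runs[curr_thread.name] += 1
end
p runs   # each of "main", "a", and "b" is chosen 3 times
```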