Foundations of Computer Science/Parallel Processing

Computing is fundamentally about information processes. On a digital computer such processes are carried out via symbol manipulations in binary logic. With advances in semiconductor technology we have been able to keep making computers run faster, manipulating bits at higher speeds, by cramming more transistors into computer chips. This trend is known as Moore's law and dates from around the 1970s. The rate of increase has slowed and will eventually flatten because of physical limits, as some physicists have predicted; they also anticipate new technologies that may replace semiconductors (silicon) in computer hardware manufacturing.

In the meantime, hardware companies have tweaked their technologies to maintain the growth in hardware capacity. Multicore technology replaces one fast CPU (Central Processing Unit, the brain of a computer) with many slower ones (called cores) to avoid overheating the chip. Even though each core is slower, we get more of them, and we can get more done if we arrange the work properly. For instance, suppose a strong worker can lift 100 bricks a minute while a normal worker can lift only 34. Three normal workers can outperform the one strong worker (3 × 34 = 102 bricks a minute versus 100) even though each of them is much slower individually. This is the idea of parallel processing.
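To make the arithmetic concrete, here is a tiny Python sketch of the brick example; the rates are simply the numbers used above, not measurements of anything real.

    # Throughput comparison: one fast worker vs. several slow workers.
    # Rates are bricks per minute, taken from the example above.
    fast_rate = 100          # one strong worker
    slow_rate = 34           # one normal worker
    num_slow_workers = 3

    slow_total = slow_rate * num_slow_workers
    print(f"One strong worker:  {fast_rate} bricks/minute")
    print(f"{num_slow_workers} normal workers:    {slow_total} bricks/minute")
    print("Normal workers win!" if slow_total > fast_rate else "Strong worker wins!")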

Traditionally, computer programs have been written to describe sequential processes, meaning the steps can only be carried out one at a time, one after another in a sequence. This type of program works fine on a computer with a single processor, because the computer can perform only one symbol manipulation at a time anyway. In fact, we have been reaping the benefit of Moore's law: roughly every two years computer hardware doubled its speed, making our programs run twice as fast without us doing anything. This trend has stopped. Each individual processor (core) is not getting faster, but we have more of them in a computer. As a result, our existing sequential programs will not run any faster, and may even run slower on the slower cores, even though the hardware's total capacity has become larger. Until the next generation of computers is invented, we can use parallel computing/processing to solve computational problems faster.

The idea of parallel processing is not new. For example, a car assembly line allows multiple cars to be built at the same time. Even though only part of each car is being assembled at any given time, the assembly line keeps all the workers busy, increasing the throughput (number of cars built per unit time) of the whole system. We can make the workers work faster to further increase the throughput, or we could add another assembly line and hire more workers. This is one form of parallel processing: pipelining. Another form of parallelism divides the whole computing task into parts that can be computed simultaneously and runs them physically on different CPUs (or computers). This is similar to putting a jigsaw puzzle together with friends. As you can imagine, having some extra help will definitely solve the puzzle faster, but does that mean the more the better? The answer is no. As the number of helpers increases, the amount of coordination and communication grows even faster. When you have too many people they may start stepping on each other's toes and competing with each other for resources (space and puzzle pieces). This is known as the overhead of parallel processing, which causes a diminishing return on investment. We can see this pattern clearly when we measure the improvement in performance as a function of the number of workers involved.
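A few lines of Python can make this pattern concrete. The model below is only a sketch: the total amount of work and the per-worker coordination cost (WORK and OVERHEAD_PER_WORKER) are assumed numbers chosen for illustration, not measurements of a real system.

    # Toy model of parallel speedup with coordination overhead (illustrative only).
    # Assumption: the work divides evenly among workers, and each additional
    # worker beyond the first adds a fixed coordination/communication cost.

    WORK = 100.0                # total work, in arbitrary time units (assumed)
    OVERHEAD_PER_WORKER = 1.0   # coordination cost per extra worker (assumed)

    def execution_time(workers: int) -> float:
        """Time to finish when the work is split among `workers` helpers."""
        return WORK / workers + OVERHEAD_PER_WORKER * (workers - 1)

    serial_time = execution_time(1)
    for workers in (1, 2, 4, 8, 16, 32):
        t = execution_time(workers)
        print(f"{workers:2d} workers: time = {t:6.2f}, speedup = {serial_time / t:4.2f}")

In this toy model the speedup grows at first, peaks, and then falls as the coordination cost overtakes the benefit of splitting the work, just like too many helpers crowding around one puzzle.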

In the context of parallel processing/computing we use a metric called speedup to measure the improvement in performance. The achieved speedup equals the execution time of a program without parallel processing divided by the execution time of the same task with parallel processing:

\[ S = \frac{T_{\text{old}}}{T_{\text{new}}} \]

where:

  • \(S\) is the speedup.
  • \(T_{\text{old}}\) is the old execution time, without parallel processing.
  • \(T_{\text{new}}\) is the new execution time, with parallel processing.
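For example, if a task takes \(T_{\text{old}} = 60\) seconds to run without parallel processing and \(T_{\text{new}} = 20\) seconds with it (numbers made up purely for illustration), then \(S = 60/20 = 3\): a three-fold speedup.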

If parallel processing makes a program run twice as fast, the speedup is two (a.k.a. a two-fold speedup). Theoretically, as we double the number of workers or resources, we could expect a two-fold speedup. Practically, it is hard to achieve this optimal speedup because some tasks are not fully parallelizable. For example, you cannot usually lay the carpet before the floor of a house is constructed, and you cannot always add more painters to get the painting job done faster. Computational tasks often have similar dependency and resource constraints that keep us from fully utilizing the parallel processing systems (e.g., multi-core computers) we have.
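A standard way to model this limit is Amdahl's law, which assumes some fixed fraction of a task is inherently sequential. The Python sketch below illustrates that model; the 10% serial fraction is an assumed value for illustration, not a property of any particular program.

    # Amdahl's-law style model of the best possible speedup when part of a
    # task cannot be parallelized. The serial fraction is an assumption made
    # purely for illustration.

    SERIAL_FRACTION = 0.10   # assumed: 10% of the work must be done sequentially

    def max_speedup(workers: int, serial_fraction: float = SERIAL_FRACTION) -> float:
        """Best-case speedup with `workers` processors under Amdahl's law."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

    for workers in (1, 2, 4, 8, 16, 1000):
        print(f"{workers:4d} workers: speedup <= {max_speedup(workers):5.2f}")

In this model the speedup can never exceed 1/0.10 = 10 no matter how many workers are added, which mirrors the carpet-and-floor dependency described above.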

Exercise:

With a washing machine and a dryer you can work on one load of laundry at a time. You wash it first and then put it into the dryer. Assume the whole task takes an hour. This works perfectly when you have only one load to do, and there is nothing you can do to make it go faster. What if you have many loads of laundry to do? You can at least get one load done every hour. Can you "speed it up"? If the number of loads can be arbitrarily large, what is the shortest average per-load laundry time you can achieve?
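One way to explore this exercise is to simulate the washer/dryer pipeline. The sketch below assumes the one-hour task splits evenly into a 30-minute wash and a 30-minute dry; that even split is an assumption for illustration, and you can change the stage times to see how the slower stage determines the answer.

    # Simulation of a two-stage laundry pipeline (washer then dryer).
    # The 30/30-minute split of the one-hour task is an assumed example;
    # change the stage times to see how the bottleneck stage sets the average.

    WASH_TIME = 30   # minutes (assumed)
    DRY_TIME = 30    # minutes (assumed)

    def total_time(loads: int) -> int:
        """Minutes to finish `loads` loads when the washer and dryer overlap."""
        if loads == 0:
            return 0
        # The first load must pass through both stages; after that, another
        # load finishes every time the slower stage completes.
        return WASH_TIME + DRY_TIME + (loads - 1) * max(WASH_TIME, DRY_TIME)

    for loads in (1, 2, 4, 8, 100):
        avg = total_time(loads) / loads
        print(f"{loads:3d} loads: total = {total_time(loads):5d} min, "
              f"average = {avg:5.1f} min/load")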