Optimizing Code for Speed/Introduction

Optimization is the term applied to the process of tuning code to perform better in some respect: speed, memory consumption, input/output (disk or network reads and writes), and so on. In mathematics, optimization means finding the values that yield the best performance. In computing, where programs are very complex, optimizing for speed in the strict mathematical sense is usually impossible. Instead, the term has come to mean simply moving in the direction of better performance in one or more respects.

This document will focus on optimizing code to run faster. However, as you will see later, doing so may require optimizing the code in some other respect first. Conversely, when programmers optimize another aspect of a program, they are often doing so in order to increase speed.

Intended Audience

The intended audience of this article is software developers, primarily programmers who know at least one programming language well enough to write moderately complex programs. If you're interested in becoming involved in the world of software development, refer to the Teaching Open Source site, and especially the essay "How to start contributing to or using Open Source Software" there. Note that not all software developers are programmers: some do not actively program as part of their job, and some are not qualified to write source code at all. The essay and the book maintained on that site point to resources for learning how to program, in case you're interested in doing that.

Then, and only then, should you return to this wikibook.

Why optimize?

So why should we optimize? If you've been using a computer system for a while, and have experimented with not-so-polished software applications, it is more than likely that you've encountered places where an application annoyed you: it took too long to start; it didn't behave as you expected; there were bugs; it took too long to perform an operation that seemed trivial; it was impossible to cancel an operation; the error messages were cryptic and did not explain how to overcome the issue; the program wouldn't exit; it caused you to lose data; and the list goes on. All these issues likely annoyed and irritated you and made you feel unhappy. With all due respect to the computation power and storage capacities of modern computers and their multi-purpose functionality, computers are still tools in the hands of their masters, the sentient human beings. As a result, they should perform the functions their masters instruct them to as quickly, correctly, and thoroughly as possible, so that we can keep using them smoothly for our own work or pleasure, or take a break from computing and do something else, instead of fighting with the computer.

A software component or application that performs its function well is of high quality and makes the user feel happy and in control. On the other hand, a software component that is too slow, too buggy, or too unstable lacks essential quality and makes the user feel frustrated, miserable, angry, or not in control. No one likes to feel that way.

How to create and maintain software that is high quality in all respects is outside the scope of this article, and the interested reader is directed to a previous essay by the same author, titled "What makes software high quality?". This article will focus on making our software speedy and responsive, which is an important aspect of quality that is sometimes overlooked.

If our program is too slow or unresponsive (the latter is also known as being "sluggish"), then our users will feel frustrated with it, or will get bored while it is running and do something else. In that case, we should usually optimize, to make sure our program runs quickly enough, or is responsive ("snappy") enough, that users do not have to wait for it to finish or respond. We cover more aspects of this below.

In which programs is optimization required?

There are several types of programs in which optimization is required. One type is real-time programs. Despite a common misconception, these are not necessarily programs that need to be very fast. Rather, real-time programs are programs that must respond to certain events within a certain time limit dedicated to the event. For example, if the user presses a keyboard key while inside a word processor window, the character should be displayed relatively promptly. A 5-second delay would be unacceptable, and as such a word processor is a kind of application with some real-time constraints.

Some parts of real-time systems may need optimization. However, once the response time falls below the required limit, no further optimization is necessary. (At least not until testing proves that the limit is being exceeded again.)

Other programs that require optimization are programs that can never be fast enough. Examples of such programs are artificial intelligence programs, games, multimedia programs, virtual machines and interpreters for programming languages, and any publicly available library or module whose use "in the wild" may require it to be very fast.

Naturally, another type is programs that are too slow, or are perceived to be too slow. There is usually no inherent reason for them to be slow, and often nothing prevents such a program from being fast. Programs that are considered too slow should be optimized to make them run faster.

However, other programs are inherently slow. For example, key-strengthening algorithms are specifically designed to be slow, with no known shortcuts. Another example is that copying 8 gigabytes through a 12 Mbit/s "USB full-speed" port inherently requires more than an hour: 8 gigabytes is roughly 64,000 megabits, and 64,000 Mbit ÷ 12 Mbit/s ≈ 5,300 seconds, or about an hour and a half, due to the constraints of the hardware.
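To make the key-strengthening example concrete, here is a minimal toy sketch in C of the idea: a cheap mixing step is iterated a large, tunable number of times, so that every password guess costs real CPU time. The mixing arithmetic and the function name here are invented for illustration only; real schemes such as PBKDF2 or bcrypt iterate cryptographic primitives instead.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy key stretching: iterate a cheap mixing step many times so that
     * testing each password guess is deliberately expensive.  This mixing
     * is NOT cryptographically secure; real schemes such as PBKDF2 or
     * bcrypt iterate a cryptographic hash instead. */
    static uint64_t toy_stretch(const char *password, uint64_t iterations)
    {
        uint64_t state = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
        size_t len = strlen(password);

        for (size_t i = 0; i < len; i++) {
            state ^= (unsigned char)password[i];
            state *= 0x100000001b3ULL;            /* FNV-1a prime */
        }
        /* The iteration count is the "strengthening" knob: doubling it
         * doubles the cost of every guess an attacker has to make. */
        for (uint64_t i = 0; i < iterations; i++) {
            state ^= state >> 33;
            state *= 0xff51afd7ed558ccdULL;
        }
        return state;
    }

    int main(void)
    {
        printf("%016llx\n",
               (unsigned long long)toy_stretch("hunter2", 10000000ULL));
        return 0;
    }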

When to optimize?

The rule of thumb is that one should optimize if the program is not fast enough compared to the desired speed. A common misconception is that if a program is not fast enough now, it will run faster as hardware gets faster. However, that is not always the case.

If your program is slow, it is usually because the code is not optimized (for example, because it uses sub-optimal algorithms). While better hardware will usually improve performance, it does not guarantee a noticeable speed increase. Furthermore, while progress in hardware speed has allowed programmers to write more wasteful code (see Paul Graham's essay "The Hundred-Year Language"), some programs are still excessively slow.
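As a sketch of what a "sub-optimal algorithm" looks like in practice, compare two ways of checking an array for duplicate values in C (the function names are ours, for illustration): the pairwise scan performs roughly n²/2 comparisons, while sorting a copy first brings the work down to O(n log n), a difference no hardware upgrade can match for large inputs.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* O(n^2): compare every pair -- fine for tiny arrays, hopeless for
     * large ones. */
    static bool has_duplicate_slow(const int *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = i + 1; j < n; j++)
                if (a[i] == a[j])
                    return true;
        return false;
    }

    static int cmp_int(const void *x, const void *y)
    {
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);
    }

    /* O(n log n): sort a copy, so duplicates end up adjacent. */
    static bool has_duplicate_fast(const int *a, size_t n)
    {
        bool found = false;
        int *copy = malloc(n * sizeof *copy);
        if (copy == NULL)
            return false;        /* error handling elided in this sketch */
        memcpy(copy, a, n * sizeof *copy);
        qsort(copy, n, sizeof *copy, cmp_int);
        for (size_t i = 1; i < n && !found; i++)
            found = (copy[i - 1] == copy[i]);
        free(copy);
        return found;
    }

    int main(void)
    {
        int v[] = { 3, 1, 4, 1, 5 };
        size_t n = sizeof v / sizeof v[0];
        printf("slow: %d, fast: %d\n",
               (int)has_duplicate_slow(v, n), (int)has_duplicate_fast(v, n));
        return 0;
    }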

When writing a program on a modern setup, one should make it fast enough from the start. Some people have claimed that we have already reached the end of Moore's First Law with regard to uni-core processors, and that we cannot expect single-process, single-threaded programs to keep running faster. In any case, even if the program runs well on a high-end platform, it may still need to run on older hardware.

We've all seen that while computers have become faster, software has often become slower to run unless the hardware is upgraded. The so-called "Gates' Law" claims that commercial programs decrease in speed by half every 18 months, due to various reasons. It is well known that the various versions of the DOS operating system ran adequately on PC XTs and 286s, and that an Intel 386 was a "lean and mean DOS machine", as a certain journalist noted back then. On the other hand, Microsoft Windows 3.0 and Microsoft Windows 3.1 already required a fast 486 computer to run comfortably, while Windows 95 was barely usable on such machines and needed a Pentium computer. Windows XP in turn ran slowly on a Pentium machine and required a high-end Pentium III or Pentium 4 computer. Windows Vista requires even more hardware resources than Windows XP, to the point that many computers in use today cannot run it comfortably.

Now, while software simulations that run directly against the CPU and memory (and possibly the hard disk) still run much faster than before, the responsiveness of the system itself does not seem to improve by much.

When not to optimize?

One should not optimize when the program in question runs fast enough. However, if it also has to run adequately on slower hardware, handle a greater load, or work well with fewer resources at its disposal, then the program might need to be optimized further based on such requirements.

Surprisingly, profiling often reveals that applying a particular optimization technique to a particular program actually makes it slower. In those cases, one should revert to the previous, "un-optimized" version of the program.
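This is one reason to measure rather than guess. A minimal before/after timing sketch in portable C, using the standard clock() function, might look like the following (work() is a placeholder for whatever code path is being tuned):

    #include <stdio.h>
    #include <time.h>

    /* Placeholder for the code path being tuned. */
    static void work(void)
    {
        volatile unsigned long sink = 0;
        for (unsigned long i = 0; i < 100000000UL; i++)
            sink += i;
    }

    int main(void)
    {
        /* clock() measures CPU time, which is usually what you want when
         * comparing an "optimized" version against the original.  A real
         * profiler gives a per-function breakdown, but even this crude
         * before/after timing catches "optimizations" that are net
         * losses. */
        clock_t start = clock();
        work();
        double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("work() took %.3f CPU seconds\n", seconds);
        return 0;
    }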

Optimization vs. Modularity and Readability

Optimization may have a bad influence on such code-quality factors as modularity and readability. Consider, for example, what the developers of GMP (the GNU Multiple Precision arithmetic library) wrote in their manual:

The speed of GMP is achieved by using fullwords as the basic arithmetic type, by using sophisticated algorithms, by including carefully optimized assembly code for the most common inner loops for many different CPUs, and by a general emphasis on speed (as opposed to simplicity or elegance).

Also consider that short functions and methods are considered preferable to longer ones, as they allow for greater code re-use, easier testing, and self-documentation (see the "Extract Method" refactoring pattern). However, a function or method call is a relatively costly operation, and so this extra code quality will probably make the program somewhat slower.

Therefore, it is not hard to imagine a developer who wishes to save some extra cycles merging a function that is used only once or a few times into its caller code (or incorporating its code using a macro, or a preprocessing stage if possible), thereby making the code less modular.
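As a sketch of this trade-off, here are three ways of expressing the same trivial computation in C: an ordinary function, a preprocessor macro, and the C99 static inline middle ground. The names and numbers are invented for illustration:

    #include <stdio.h>

    /* Variant 1: an ordinary function -- the most modular and testable
     * form, but each use may cost a call/return unless the compiler
     * chooses to inline it. */
    static int scale_add(int x, int a, int b)
    {
        return a * x + b;
    }

    /* Variant 2: a preprocessor macro -- no call overhead, but no type
     * checking, and the arguments are pasted in textually. */
    #define SCALE_ADD(x, a, b) ((a) * (x) + (b))

    /* Variant 3 (C99): static inline -- keeps function semantics while
     * inviting the compiler to eliminate the call. */
    static inline int scale_add_inline(int x, int a, int b)
    {
        return a * x + b;
    }

    int main(void)
    {
        int v[] = { 1, 2, 3, 4 };
        long total = 0;

        for (size_t i = 0; i < sizeof v / sizeof v[0]; i++) {
            /* All three forms compute the same value; a real program
             * would pick one.  The macro and (usually) the inline
             * function avoid the call overhead in this inner loop. */
            total += scale_add(v[i], 3, 7);
            total += SCALE_ADD(v[i], 3, 7);
            total += scale_add_inline(v[i], 3, 7);
        }
        printf("total = %ld\n", total);
        return 0;
    }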

Similarly, highly optimized code may become much less readable. For example, the C preprocessor's macros are notorious for making code that uses them unreadable. Some complaints mention that OpenSSL's code, which relies heavily on preprocessor macros, identically named static functions, and a lot of code generation, is very hard to understand and follow for these reasons.
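A classic small illustration of why such macros hurt readability: a macro invocation looks like a function call but is textual substitution, so arguments with side effects may be evaluated more than once.

    #include <stdio.h>

    /* Looks like a function, but is textual substitution. */
    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    int main(void)
    {
        int i = 20, j = 10;

        /* Expands to ((i++) > (j) ? (i++) : (j)): because i > j here,
         * i++ is evaluated twice -- m becomes 21 and i becomes 22,
         * which is very hard to see from the call site. */
        int m = MAX(i++, j);

        printf("m = %d, i = %d\n", m, i);
        return 0;
    }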

Similar complaints are constantly voiced against perl5's XS, its interface for creating subroutines in ANSI C and other lower-level languages. The API was designed to be as efficient and fast as possible and, as a result, is hard to learn, understand and follow.