Microprocessor Design/Performance

Clock Cycles

The clock signal is a 1-bit signal that oscillates between a "1" and a "0" with a certain frequency. When the clock transitions from a "0" to a "1" it is called the positive edge, and when the clock transitions from a "1" to a "0" it is called the negative edge.

The time it takes to go from one positive edge to the next positive edge is known as the clock period, and represents one clock cycle.

The number of clock cycles that can fit in 1 second is called the clock frequency. To get the clock frequency, we can use the following formula:

{\mbox{Clock Frequency}}={\frac {1}{\mbox{Clock Period}}}

Clock frequency is measured in cycles per second, also called hertz (Hz).
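
As a quick illustration of the formula above, the following Python sketch converts a clock period into a clock frequency. The function name and the 2 ns example period are hypothetical, chosen only for illustration.

```python
def clock_frequency(clock_period_s: float) -> float:
    """Return the clock frequency in hertz for a clock period given in seconds."""
    if clock_period_s <= 0:
        raise ValueError("clock period must be positive")
    return 1.0 / clock_period_s


# Hypothetical example: a clock with a 2 ns period.
period_s = 2e-9
print(f"{clock_frequency(period_s) / 1e6:.0f} MHz")  # prints "500 MHz"
```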

Cycles per Instruction

In many microprocessor designs, it is common for multiple clock cycles to transpire while performing a single instruction. For this reason, it is frequently useful to keep a count of how many cycles are required to perform a single instruction. This number is known as the cycles per instruction, or CPI of the processor.

Because different processors may have different CPI values, it is not possible to accurately compare processors simply by comparing their clock frequencies. It is more useful to compare the number of instructions per second, which can be calculated as follows:

{\mbox{Instructions per Second}}={\frac {\mbox{Clock Frequency}}{CPI}}

One of the most common units of measure in modern processors is the "MIPS", which stands for millions of instructions per second. A processor with 5 MIPS can perform 5 million instructions every second. Another common metric is "FLOPS", which stands for floating point operations per second. MFLOPS is a million FLOPS, GFLOPS is a billion FLOPS, and TFLOPS is a trillion FLOPS.
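
The sketch below applies the instructions-per-second formula to two made-up processors to show why clock frequency alone can mislead. The function names and all of the numbers are assumptions for illustration, not measurements of real parts.

```python
def instructions_per_second(clock_frequency_hz: float, cpi: float) -> float:
    """Instructions per second = clock frequency / CPI."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return clock_frequency_hz / cpi


def mips(clock_frequency_hz: float, cpi: float) -> float:
    """Millions of instructions per second."""
    return instructions_per_second(clock_frequency_hz, cpi) / 1e6


# Hypothetical processor A: 200 MHz clock, 4 cycles per instruction.
mips_a = mips(200e6, 4.0)
# Hypothetical processor B: 150 MHz clock, 2 cycles per instruction.
mips_b = mips(150e6, 2.0)

print(f"A: {mips_a:.0f} MIPS, B: {mips_b:.0f} MIPS")  # prints "A: 50 MIPS, B: 75 MIPS"
```

In this example processor B has the slower clock but still executes more instructions per second, because its CPI is lower.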

Instruction count

The "instruction count" in microprocessor performance measurement is the number of instructions executed during the run of a program. Typical benchmark programs have instruction counts in the millions or billions -- even though the program itself may be very short, those benchmarks have inner loops that are repeated millions of times.

Some microprocessor designers have the freedom to add instructions to or remove instructions from the instruction set. Typically the only way to reduce the instruction count is to add instructions such that those inner loops can be re-written in a way that does the necessary work using fewer instructions -- those instructions do "more work" per instruction.

Sometimes, counter-intuitively, we can improve overall CPU performance (i.e., reduce CPU time) in a way that increases the instruction count, by using instructions in that inner loop that may do "less work" per instruction, but those instructions finish in less time.

CPU Time

CPU Time is the amount of time it takes the CPU to complete a particular program. CPU time is a function of the amount of time it takes to complete instructions, and the number of instructions in the program:

{\mbox{CPU time}}={\mbox{Instruction Count}}\times CPI\times {\mbox{Clock Cycle Time}}

Sometimes we can improve one of the 3 components alone, reducing CPU time. But quite often we find a tradeoff -- say, a technique that increases instruction count, but reduces the clock cycle time -- and we have to measure the total CPU time to see if that technique makes the overall performance better or worse.
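
A minimal sketch of such a tradeoff, using the CPU time formula above. The two design points and their numbers are hypothetical; they only show that the product of all three components has to be compared, not any one of them.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (in seconds)."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical baseline: 1 billion instructions, CPI of 2, 1 ns cycle time.
baseline = cpu_time(1_000_000_000, 2.0, 1.0e-9)

# Hypothetical variant: simpler instructions raise the instruction count
# by 20%, but allow a shorter 0.8 ns cycle time at the same CPI.
variant = cpu_time(1_200_000_000, 2.0, 0.8e-9)

print(f"baseline: {baseline:.2f} s, variant: {variant:.2f} s")
print("variant is faster" if variant < baseline else "variant is slower")
```

With these assumed numbers the variant executes more instructions yet finishes sooner (about 1.92 s against 2.00 s), because the shorter cycle time outweighs the larger instruction count.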

Performance

Amdahl's Law

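Amdahl's Law describes how much overall performance improves when only part of a system is sped up. It states that the overall speedup gained by improving a single processor component is limited by the fraction of the work that actually uses that component, so even a large improvement to one component may have a comparatively small effect on the performance of the overall processor unit.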

In the most general sense, Amdahl's Law can be stated mathematically as follows:

\Delta ={\frac {1}{\sum _{k=0}^{n}{{\big (}{\frac {P_{k}}{S_{k}}}{\big )}}}}

where:

Δ is the factor by which the program is sped up or slowed down,
P_k is the fraction (between 0 and 1) of the instructions that can be improved (or slowed) by the factor S_k; the P_k values together sum to 1,
S_k is the speed-up multiplier (where 1 is no speed-up and no slowing),
k represents a label for each different fraction and speed-up, and
n is the largest label, so the system change results in n + 1 different speed-ups/slow-downs, labelled 0 through n.

For instance, if we make a speed improvement in the memory module, only the instructions that deal directly with the memory module will experience a speedup. In this case, the fraction of load and store instructions in our program will be P₀, and the factor by which those instructions are sped up will be S₀. All other instructions, which are not affected by the memory unit, will make up P₁, and their speed-up will be S₁, where:

P_{1}=1-P_{0}

S_{1}=1

We set S₁ to 1 because those instructions are not sped up or slowed down by the change to the memory unit.
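
The memory-module example can be worked through numerically with a short Python sketch of the general formula. The function name is hypothetical, and the values chosen here (30% of instructions being loads and stores, sped up by a factor of 2) are assumptions for illustration only; the P values are expressed as fractions of 1.

```python
def amdahl_speedup(parts):
    """Overall speed-up for a list of (P_k, S_k) pairs.

    Each P_k is a fraction of the instructions and each S_k is the
    speed-up multiplier applied to that fraction. The P_k must sum to 1.
    """
    if abs(sum(p for p, _ in parts) - 1.0) > 1e-9:
        raise ValueError("the P_k fractions must sum to 1")
    if any(s <= 0 for _, s in parts):
        raise ValueError("every S_k must be positive")
    return 1.0 / sum(p / s for p, s in parts)


# Assumed values: 30% of instructions are loads/stores, made 2x faster.
p0, s0 = 0.3, 2.0
# Everything else is unaffected by the memory change.
p1, s1 = 1.0 - p0, 1.0

delta = amdahl_speedup([(p0, s0), (p1, s1)])
print(f"overall speed-up: {delta:.3f}")  # prints "overall speed-up: 1.176"

# Even an enormous memory speed-up cannot push the result past 1 / P1.
limit = amdahl_speedup([(p0, 1e12), (p1, s1)])
print(f"upper bound: {limit:.3f}")  # prints "upper bound: 1.429"
```

Doubling the speed of 30% of the instructions therefore speeds up the whole program by only about 18% under these assumptions, and no memory improvement, however large, can exceed a factor of roughly 1.43.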

Benchmarking

SPECint
SPECfp
"Maxim/Dallas APPLICATION NOTE 3593" benchmarking
"Mod51 Benchmarks"
EEMBC, the Embedded Microprocessor Benchmark Consortium