Microprocessor Design/Design Steps

When designing a new microprocessor or microcontroller unit, there are a few general steps that can be followed to make the process flow more logically. These few steps can be further sub-divided into smaller tasks that can be tackled more easily. The general steps to designing a new microprocessor are:

Determine the capabilities the new processor should have.
Lay out the datapath to handle the necessary capabilities.
Define the machine code instruction format (ISA).
Construct the necessary logic to control the datapath.

We will discuss each of these steps below:

Determine Machine Capabilities

Before you start to design a new processor element, it is important to first ask why you are designing it at all. What new thing will your processor do that existing processors cannot? Keep in mind that it is always less expensive to utilize an existing chip than to design and manufacture a new one.

Some questions to start:

Is this chip an embedded chip, a general-purpose chip, or a different type entirely?
What, if any, are the limitations in terms of resources, price, power, or speed?

With that in mind, we need to ask what our chip will do:

Does it have integer, floating-point, or fixed-point arithmetic, or a combination of all three?
Does it have scalar or vector operation abilities?
Is it self-contained, or must it interface with a number of external peripherals?
Will it support interrupts? If so, how much interrupt latency is tolerable? How much interrupt-response jitter is tolerable?

We also need to ask ourselves whether the machine will support a wide array of instructions, or only a limited set. More instructions make the design more difficult, but make programming and using the chip easier. A smaller instruction set, on the other hand, is easier to design, but can make the chip harder and more costly to program.

Lay out the basic arithmetic operations you want your chip to have:

Addition/Subtraction
Multiplication
Division
Shifting and Rotating
Logical Operations: AND, OR, XOR, NOR, NOT, etc.

List other capabilities that your machine has:

Unconditional jumps
Conditional Jumps (and what conditions?)
Stack operations (Push, pop)

Once we know what our chip is supposed to do, it is easier to lay out the framework for our datapath.

Design the Datapath

Right off the bat, we need to determine which ALU architecture our processor will use:

Accumulator
Stack
Register
A combination of the above three

This decision, more than any other, is going to have the largest effect on your final design. Do not proceed in the design process until you have made this decision. Once you have your ALU architecture, you create your memory element (stack or register file), and you can lay out your ALU.
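To make the difference between the three styles concrete, the following is a minimal Python sketch that computes d = (a + b) * c on a toy accumulator machine, a toy stack machine, and a toy register machine. The instruction names (LOAD, STORE, PUSH, POP, ADD, MUL), the tuple-based program format, and the symbolic memory addresses are all hypothetical and chosen only for illustration; they do not correspond to any real processor.

```python
import operator

# Hypothetical ALU operations shared by all three toy machines.
ALU_OPS = {"ADD": operator.add, "MUL": operator.mul}

def run_accumulator(program, mem):
    """Accumulator style: one implicit register, one memory operand per instruction."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = mem[addr]
        elif op == "STORE":
            mem[addr] = acc
        elif op in ALU_OPS:
            acc = ALU_OPS[op](acc, mem[addr])
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem

def run_stack(program, mem):
    """Stack style: ALU instructions pop two operands and push the result."""
    stack = []
    for op, *operand in program:
        if op == "PUSH":
            stack.append(mem[operand[0]])
        elif op == "POP":
            mem[operand[0]] = stack.pop()
        elif op in ALU_OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(ALU_OPS[op](lhs, rhs))
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem

def run_register(program, mem, num_regs=4):
    """Register style: ALU instructions name a destination and two source registers."""
    regs = [0] * num_regs
    for instr in program:
        op = instr[0]
        if op == "LOAD":        # LOAD rd, addr
            _, rd, addr = instr
            regs[rd] = mem[addr]
        elif op == "STORE":     # STORE rs, addr
            _, rs, addr = instr
            mem[addr] = regs[rs]
        elif op in ALU_OPS:     # ADD/MUL rd, rs1, rs2
            _, rd, rs1, rs2 = instr
            regs[rd] = ALU_OPS[op](regs[rs1], regs[rs2])
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem

# The same computation, d = (a + b) * c, written for each machine.
ACC_PROG = [("LOAD", "a"), ("ADD", "b"), ("MUL", "c"), ("STORE", "d")]
STACK_PROG = [("PUSH", "a"), ("PUSH", "b"), ("ADD",),
              ("PUSH", "c"), ("MUL",), ("POP", "d")]
REG_PROG = [("LOAD", 0, "a"), ("LOAD", 1, "b"), ("LOAD", 2, "c"),
            ("ADD", 3, 0, 1), ("MUL", 3, 3, 2), ("STORE", 3, "d")]

MEM = {"a": 2, "b": 3, "c": 4, "d": 0}
for run, prog in ((run_accumulator, ACC_PROG),
                  (run_stack, STACK_PROG),
                  (run_register, REG_PROG)):
    print(run.__name__, run(prog, dict(MEM))["d"])
```

All three programs store 20 in d. In this sketch the accumulator program is the shortest, but each of its instructions needs a memory operand, while the register program needs more instructions and keeps its ALU operands in registers. Trade-offs of this kind are what the choice of ALU architecture settles.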

Create ISA

Once we have our basic datapath, we can start to design our ISA. There are a few things that we need to consider:

Is this processor RISC, CISC, or VLIW?
How long is a machine word?
How do you deal with immediate values? What kinds of instructions can accept immediate values?

Once we have our machine code basics, we frequently need to determine whether our processor will be compatible with higher-level languages. Specifically, are there any instructions that can be used for function call and return?

Determining the length of the instruction word in a RISC design is an important decision, and one that is worth a considerable amount of thought. For additional flexibility you can use a variable-length instruction set instead, as most CISC machines do, at the expense of additional and more complicated instruction decode logic. If the instruction word is too long, fewer instructions will fit into a given amount of memory. If the instruction word is too short, there will not be enough room for all the necessary information. On a desktop PC with several megabytes or even gigabytes of RAM, large instruction words are not a big problem. On an embedded system with limited program ROM, however, the length of the instruction word has a direct effect on the size of potential programs, and on the usefulness of the chip.
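The effect on program size is simple arithmetic. The short Python sketch below counts how many fixed-length instructions fit into a program ROM. The 2 KiB ROM size and the candidate widths are hypothetical values, and the calculation assumes instructions are packed with no padding.

```python
def max_instructions(rom_bytes, instr_bits):
    """Number of fixed-length instructions that fit in a ROM of rom_bytes bytes,
    assuming instructions are packed back to back with no padding."""
    return (rom_bytes * 8) // instr_bits

ROM_BYTES = 2048  # hypothetical 2 KiB program ROM

for width in (16, 24, 32):
    print(f"{width}-bit instructions: {max_instructions(ROM_BYTES, width)}")
# 16-bit instructions: 1024
# 24-bit instructions: 682
# 32-bit instructions: 512
```

Under these assumptions, doubling the instruction width from 16 to 32 bits halves the number of instructions the ROM can hold.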

Each instruction should have an associated opcode, and typically the length of the opcode field should be constant for all instructions, to reduce the complexity of the decoder. The length of the opcode field directly limits the number of distinct instructions that can be implemented: an n-bit opcode field can distinguish at most 2^n opcodes. If the opcode field is too small, you won't have enough room to designate all your instructions. If the opcode field is too large, you will be wasting precious bits in your instruction word.
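As a rough sizing aid, the following Python sketch computes the smallest fixed opcode width for a given instruction count, and how many encodings a chosen width leaves unused. The function names and the example count of 40 instructions are hypothetical.

```python
def opcode_bits_needed(num_instructions):
    """Smallest fixed opcode width, in bits, that gives every instruction a unique opcode."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    # (n - 1).bit_length() is the number of bits needed to count from 0 to n - 1.
    return max(1, (num_instructions - 1).bit_length())

def unused_encodings(num_instructions, opcode_bits):
    """Opcode values left over once every instruction has been assigned one."""
    capacity = 1 << opcode_bits
    if num_instructions > capacity:
        raise ValueError("opcode field is too small for this instruction set")
    return capacity - num_instructions

print(opcode_bits_needed(40))    # 6
print(unused_encodings(40, 6))   # 24
print(unused_encodings(40, 8))   # 216
```

With 40 instructions, a 6-bit field is the minimum and leaves 24 spare encodings for future extensions. An 8-bit field would spend two more bits of every instruction word on 216 encodings that may never be used.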

Some instructions will need to be larger than others. For instance, instructions that deal with an immediate value, a memory location, or a jump address are typically larger than instructions that only deal with registers. Instructions that deal only with registers, therefore, will have additional space left over that can be used as an extension to the opcode field.

Example: MIPS R-Type

In the MIPS architecture, instructions that only deal with registers are called R-type instructions. With 32 registers, a register address is only 5 bits wide. The MIPS opcode is 6 bits wide. With the opcode and the three register addresses (two source registers and one destination register), an R-type instruction only uses 21 out of the 32 bits available.

The remaining 11 bits are divided into two additional fields: Shamt, a 5-bit immediate value that gives the number of places shifted by a shift or rotate instruction, and Func, a 6-bit field that contains additional information about R-type instructions. Because of the availability of the Func field, all R-type instructions share an opcode of 0.
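The field layout can be checked with a small Python sketch that packs and unpacks an R-type instruction word. The field order used here (opcode, rs, rt, rd, Shamt, Func, from most to least significant bit) follows the standard MIPS32 encoding. The helper names encode_r_type and decode_r_type are hypothetical.

```python
# R-type fields from most significant to least significant, with widths in bits.
# The widths sum to 32: 6 + 5 + 5 + 5 + 5 + 6.
R_TYPE_FIELDS = (("opcode", 6), ("rs", 5), ("rt", 5),
                 ("rd", 5), ("shamt", 5), ("func", 6))

def encode_r_type(rs, rt, rd, shamt, func, opcode=0):
    """Pack the six R-type fields into one 32-bit instruction word."""
    values = {"opcode": opcode, "rs": rs, "rt": rt,
              "rd": rd, "shamt": shamt, "func": func}
    word = 0
    for name, width in R_TYPE_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode_r_type(word):
    """Split a 32-bit instruction word back into its R-type fields."""
    fields = {}
    shift = 32
    for name, width in R_TYPE_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

# add $t0, $s1, $s2: rs = $s1 (17), rt = $s2 (18), rd = $t0 (8), Func = 0x20
word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, func=0x20)
print(f"{word:#010x}")   # 0x02324020
print(decode_r_type(word))
```

Because the opcode field is 0 for every R-type instruction, the decoder in this sketch would have to look at the Func field to tell an add from any other register-to-register operation.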

Instruction set design

Picking a particular set of instructions is often more an art than a science.

Historically there have been different perspectives on what makes a "good" instruction set.

The early CISC years focused on making instruction sets that expert assembly language programmers enjoyed programming -- "code density" was a common metric.
The early RISC years focused on making instruction sets that ran a few benchmark programs in C, when compiled with relatively primitive compilers, really, really fast -- "cycles per instruction", and later "instructions per cycle", were recognized as an important part of achieving low "time to run the benchmark".

The rise of multitasking operating systems (and shared-memory parallel processors) led to the discovery of non-blocking synchronization and the instructions necessary to support it.
CPUs dedicated to a single application (ASICs or FPGAs) led to the idea of customizing the CPU for one particular application.^[1]

The rise of viruses and other malware led to the recognition of the Popek and Goldberg virtualization requirements.

Build Control Logic

Once we have our datapath and our ISA, we can start to construct the logic of our primary control unit. The control unit is typically implemented as a finite state machine, and we can try to map the ISA onto it in a logical way.
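As an illustration of the finite state machine idea, here is a minimal Python sketch of a multi-cycle control unit. The state names, the four instruction classes ("alu", "load", "store", "branch"), and the control signal names are all hypothetical. A real control unit would derive them from the datapath and ISA chosen in the earlier steps.

```python
from enum import Enum

class State(Enum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2
    MEMORY = 3
    WRITEBACK = 4

def next_state(state, instr_class):
    """State transition function: where the control unit goes on the next clock."""
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return State.EXECUTE
    if state is State.EXECUTE:
        if instr_class in ("load", "store"):
            return State.MEMORY
        if instr_class == "alu":
            return State.WRITEBACK
        return State.FETCH            # a branch is finished after EXECUTE
    if state is State.MEMORY:
        return State.WRITEBACK if instr_class == "load" else State.FETCH
    return State.FETCH                # WRITEBACK always starts the next instruction

def control_signals(state, instr_class):
    """Output function: the (hypothetical) datapath signals asserted in each state."""
    if state is State.FETCH:
        return {"mem_read", "ir_write", "pc_increment"}
    if state is State.DECODE:
        return {"reg_read"}
    if state is State.EXECUTE:
        return {"alu_enable"}
    if state is State.MEMORY:
        return {"mem_read"} if instr_class == "load" else {"mem_write"}
    return {"reg_write"}

def trace(instr_class):
    """List the states visited while executing one instruction of the given class."""
    states = [State.FETCH]
    while True:
        following = next_state(states[-1], instr_class)
        if following is State.FETCH:
            return states
        states.append(following)

for cls in ("alu", "load", "store", "branch"):
    print(cls, [s.name for s in trace(cls)])
```

In this sketch a load takes five states, an ALU instruction and a store take four, and a branch takes three. In hardware, next_state and control_signals would each become a block of combinational logic, with the current state held in a register.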

We go into much more detail on control unit design in the following sections, Microprocessor Design/Control and Datapath and Microprocessor Design/Instruction Decoder.

Design the Address Path

If a simple virtual==physical address path is adequate for your CPU, you can skip this section.

Most processors have a very simple address path -- address bits come from the PC or some other programmer-visible register, or directly from some instruction, and they are directly applied to the address bus.

Many general-purpose processors have a more complex address path: user-level programs run as if they have a simple address path, but the physical address applied to the address bus is significantly different from the programmer-visible address. This enables virtual memory, memory protection, and other desirable features.
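The following Python sketch shows one simple way such a translation could work, using a single-level page table. The 4 KiB page size, the dictionary-based page table, the (frame, writable) entry format, and the PageFault exception are assumptions made for illustration, not a description of any particular MMU.

```python
PAGE_BITS = 12                 # assumed page size: 2**12 = 4096 bytes
PAGE_SIZE = 1 << PAGE_BITS

class PageFault(Exception):
    """Raised when a virtual address has no valid or permitted mapping."""

def translate(vaddr, page_table, write=False):
    """Map a programmer-visible (virtual) address to a physical address.

    page_table maps a virtual page number to a (physical frame number, writable)
    pair; this single-level layout is a simplification for illustration.
    """
    vpn = vaddr >> PAGE_BITS              # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)      # lower bits pass through unchanged
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(f"no mapping for virtual page {vpn:#x}")
    frame, writable = entry
    if write and not writable:
        raise PageFault(f"write to read-only virtual page {vpn:#x}")
    return (frame << PAGE_BITS) | offset

# Hypothetical mappings: one read-only code page and one writable data page.
page_table = {
    0x00400: (0x12345, False),
    0x10000: (0x00042, True),
}

print(f"{translate(0x00400ABC, page_table):#x}")              # 0x12345abc
print(f"{translate(0x10000010, page_table, write=True):#x}")  # 0x42010
```

The program only ever sees addresses such as 0x00400ABC, while the address bus carries 0x12345ABC. The write check on the read-only page is a small example of the memory protection that this kind of address path makes possible.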

We talk more about the benefits and drawbacks of an MMU, and how to implement it, in

Verify the design

People who design a CPU often spend more time on functional verification than all other steps combined.

References

[1] Ing-Jer Huang and Alvin M. Despain, "Generating instruction sets and microarchitectures from applications".