Microprocessor Design/Instruction Decoder

Microprocessor Design

The Instruction Decoder reads the next instruction in from memory, and sends the component pieces of that instruction to the necessary destinations.

For each machine-language instruction, the control unit produces the sequence of pulses on each control signal line required to implement that instruction (and to fetch the next instruction).

If you are lucky, when you design a processor you will find that many of those control signals can be "directly decoded" from the instruction register. For example, sometimes a few output bits from the instruction register IR can be directly wired to the "which function" inputs of the ALU. Even if those bits mean something completely unrelated in non-ALU instructions, it's OK if the ALU performs, say, a bogus SUBTRACT, while the rest of the processor is executing a STORE instruction.

The remaining control signals that cannot be decoded from the instruction register -- if you are unlucky, all the control signals -- are generated by the control unit, which is implemented as a [Moore machine][2] or a [Mealy machine][3]. There are many different ways to implement the control unit.

If you design a processor with a Princeton architecture -- your processor normally pulls instructions from the same single-ported memory used to read and write data -- then you are forced to have at least LOAD and STORE take more than one clock cycle to execute. (One cycle for the data, and another cycle to read the next instruction). (Many processors are designed with "single-cycle execution", either very simple Harvard architecture processors, or complicated high-performance processors with a separate instruction cache).

RISC Instruction Decoder

The philosophy behind a RISC machine is that instruction execution is kept extremely fast and simple. Thus the instruction decoder, such as there is one must be as streamlined as possible. The tradeoff is that more instructions must be executed to perform the more complex machine functions. All low-level abstraction must be provided in the machine code compiler.

There are generally two variations to a RISC decoder.

In the first variant the bits of the instruction code are fed directly to the hardware components so there really is no decoder at all. All timing requirements must be met by careful programming of the machine code and there is no delay in execution from an instruction decoder. The necessary control line 'waveforms' are created directly by the sequence of instructions. As soon as the instruction is presented the hardware begins to respond. The bits in the instruction set directly represent the actual physical controls of the hardware components.

The second variant includes a simple (fast) decoder between the instruction code and the hardware components. This decoding is usually implemented by discrete logic gates. The primary purpose of this decoder is to allow a 'cleaner' instruction set representation. As with the direct variant, each instruction still represents a single machine function but the use of a decoder will often significantly reduce the number of bits needed in the machine code word. This in turn may allow several hardware functions to be initiated in a single machine instruction.

CISC Instruction Decoder

Decoding a CISC instruction word is much more difficult than the RISC case, and the increased complexity of the decoder is a common reason people cite when they choose to use RISC over CISC in their designs.

A CISC decoder commonly uses a state machine set up as a random length sequencer. The machine reads the opcode field to determine what type of instruction it is, and where the other data values are as needed. The instruction word may even be read in piece by piece, so decisions may be made at each stage as to how the remainder of the instruction word and any included data will be read.

This decoder sequences through a number of states of the machine hardware as opposed to the case of the RISC decoder. The instruction code selects which sequence is to be executed and the sequencer controls the timing of this performance. One 'step' in this performace may include fetching more input from the instruction code, whether this input is further instructions or needed data values.

Perhaps the easiest way to understand this decoder type is to conceive it as a very wide control store ROM. The program in this ROM is the microcode for the decoder/sequencer. A part of the instruction register is used to form part of the address to this ROM. Several bits of the address are feedback from the ROM itself. Most of the ROM's outputs are used to control the system hardware.

The feedback address portion forms a loop that will step through a random sequence as programmed into the ROM. This loop operates as fast as the access time of the ROM allows and is typically a single system clock. The feedback address is usually programmed to increment by 1 for each successive state of the sequencer. The number of feedback bits determines the maximum length of the microcoded sequence.

Wikipedia has related information at control store

Wikipedia has related information at microprogram

A pipeline register latches all the output bits of the control store ROM every clock cycle.

Each clock cycle the pipeline register latches a new set of bits.

The output of the sequencer has 2 components:

Control bits that go out to the hardware components of the processor. The "microPC" that feeds back to some of the address inputs of the control store ROM. Some people hardwire the carry flag to one of the address inputs of the control store ROM.

Every time a new opcode is fetched from main memory, typically the high bits of the microPC are loaded with the opcode, and the low bits of the microPC reset to zero. (To make things easier to debug, some designers load the opcode into both a separate instruction register IR as well as the microPC register, at least in the initial prototypes. Once the design is debugged, it might turn out that some or all the bits from the IR or the microPC register or both are never used, and so can be left out of the final design). During execution of the instruction, each clock cycle the pipeline register loads a new microPC address from the control store and a new set of control bits from the control store. Typically the person who writes the microprogram -- burned into the control store ROM -- designs the next-address output bits of that ROM to sequentially increment for the first few cycles of the implementation of that opcode. Then the microprogram for every "normal" opcode eventually jumps to one common section of the control store ROM that handles fetch-next-instruction.