Note: All assembly language sample code depends on the assembler and operating system of the development environment.

Digital Equipment Corporation PDP-11

Type

CISC

Execution Entry Point

On Unix, execution of a program began at the first word. The Unix "a.out" (assembler output) format had a header whose first word, 407 octal, was a branch around the rest of the header. By Fifth Edition Unix, the header was no longer loaded as part of the execution image, so the 407 never had to be executed; the "magic number" 0407 (octal) nevertheless stuck as part of the a.out format, even as the format moved to other computer architectures.
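The branch reading of the magic number can be checked by decoding it as a PDP-11 branch word. The sketch below assumes the usual branch format (opcode in the high byte, a signed word offset in the low byte, taken relative to the already advanced PC) and an eight-word header; the function name is made up for illustration.

```python
def decode_branch(instr, addr):
    """Return the byte address that a branch word located at `addr` transfers to."""
    offset = instr & 0o377                 # low 8 bits: signed offset in words
    if offset & 0o200:                     # sign-extend negative offsets
        offset -= 0o400
    return (addr + 2 + 2 * offset) & 0xFFFF    # relative to the updated PC

MAGIC = 0o407                              # 0o400 is "br"; the offset field is 7
HEADER_BYTES = 16                          # assumption: eight 16-bit header words
target = decode_branch(MAGIC, 0)           # header assumed to start at address 0
```

Under these assumptions the branch lands on the first word past the header.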

Registers

There were eight 16-bit registers, each selected by a three-bit field in an instruction's operand specifier. Register 7 was the program counter and register 6 was the stack pointer. Registers 0 through 5 were general-purpose. The stack grew downward.

Addressing modes included postincrement and predecrement. A popular myth assumes these to be the source of the increment and decrement operators in C, but in fact those were inherited from B, which was implemented before the PDP-11 existed.

A consequence of the fact that the PC was addressed as an ordinary register, and of the inclusion of the postincrement addressing mode, was that you could load a literal value into a register (or send it to memory, for that matter) by having the literal follow the instruction in memory.

       mov (pc)+, r5
       177265

would put 177265 octal in register 5. In the assembler (using Unix assembler syntax here), you could abbreviate this as

       mov $177265, r5
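To see why a literal after the instruction works, it may help to model the postincrement mode directly. This is a minimal sketch, not an emulator: memory is a dictionary keyed by byte address, the load address 1000 octal is arbitrary, and the helper name is hypothetical. The same helper serves for the instruction fetch and for the operand, since both read through the PC and advance it.

```python
PC = 7                                       # register 7 is the program counter

def read_postincrement(mem, regs, r):
    """Mode 2, (Rn)+: read the word Rn points at, then advance Rn by 2."""
    value = mem[regs[r]]
    regs[r] = (regs[r] + 2) & 0xFFFF
    return value

mem = {0o1000: 0o012705,                     # mov (pc)+, r5
       0o1002: 0o177265}                     # the literal that follows it
regs = [0] * 8
regs[PC] = 0o1000                            # hypothetical load address

instr = read_postincrement(mem, regs, PC)    # instruction fetch leaves PC at the literal
regs[5] = read_postincrement(mem, regs, PC)  # source operand (pc)+ reads the literal
```

After both reads, r5 holds the literal and the PC points past it, at the next instruction.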

General Syntax

In the Unix assembler, a colon followed each label.

foo:   mov -(pc), -(pc)   / copy this instruction to the previous
                          / location and branch there
       br foo             / branch to foo

Of course, in the above example the branch to foo would never be executed, since the instruction before it is a (nasty) loop unto itself.
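The self-copying behaviour can be traced with a small model of the predecrement mode. The sketch below handles only this one instruction, uses a dictionary for memory, and starts at an arbitrary address of 1000 octal; all names are hypothetical.

```python
PC = 7
MOV_SELF = 0o014747                          # mov -(pc), -(pc)

def predecrement(regs, r):
    """Mode 4, -(Rn): step Rn back by 2 and return the new address."""
    regs[r] = (regs[r] - 2) & 0xFFFF
    return regs[r]

def step(mem, regs):
    """Execute one instruction; only mov -(pc), -(pc) is modelled."""
    instr = mem[regs[PC]]
    regs[PC] = (regs[PC] + 2) & 0xFFFF       # the fetch advances the PC
    if instr != MOV_SELF:
        raise ValueError("only mov -(pc), -(pc) is modelled")
    value = mem[predecrement(regs, PC)]      # source: the instruction itself
    mem[predecrement(regs, PC)] = value      # destination: the word before it

mem = {0o1000: MOV_SELF, 0o1002: 0o000776}   # second word is "br foo", never reached
regs = [0] * 8
regs[PC] = 0o1000
for _ in range(3):
    step(mem, regs)
```

Each step leaves the PC on the freshly written copy, so the instruction crawls downward through memory one word at a time.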

Instruction fields occurred on three-bit boundaries starting from the low-order side, so it was easy to remember some instructions and the addressing modes for programming in binary from the front-panel switches. For instance, machine language for

       mov -(pc), -(pc)

was 014747 octal: the 01 is the opcode for move (which means copy), each 4 denotes the addressing mode, predecrement, and each 7 denotes the PC. These field boundaries did not hold for the branch instructions, however.
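The three-bit field layout can be written out as a short encoder for double-operand instructions. This is a sketch under the field positions described above (a four-bit opcode, then mode and register for the source, then mode and register for the destination); predecrement is mode 4, and the function and constant names are invented for the example.

```python
MODE_REGISTER, MODE_POSTINC, MODE_PREDEC = 0, 2, 4
PC = 7
MOV = 0o01

def encode_double_operand(opcode, src_mode, src_reg, dst_mode, dst_reg):
    """Pack the fields of a double-operand instruction into one 16-bit word."""
    return ((opcode << 12) | (src_mode << 9) | (src_reg << 6)
            | (dst_mode << 3) | dst_reg)

word = encode_double_operand(MOV, MODE_PREDEC, PC, MODE_PREDEC, PC)
print(format(word, "06o"))                   # each octal digit is one field
```

Printing in octal makes each field readable as a digit, which is what made front-panel entry practical.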

Comments

The DEC assemblers (such as MACRO-11) used a semicolon to begin a comment that ran to the end of the line. In the Unix assembler the character "/" was used instead. [1]

Interrupts

The processor status word (PSW) was addressed at a specific memory-mapped location (177776 octal, which is -2 as a 16-bit address). The interrupt vectors were in low memory; each vector held a new PC and a new PSW. An interrupt pushed the old PSW and then the return address onto the stack, so its stack frame differed from that of a subroutine call, which is why there was a return-from-interrupt instruction (rti) distinct from the normal return-from-subroutine instruction (rts).
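The interrupt sequence can be sketched as a pair of stack operations. On entry the processor pushes the old PSW and then the return address, and takes a new PC and PSW from the two-word vector; return-from-interrupt pops them in the reverse order. The model below assumes a single stack and no memory management, and the vector address, handler address, and PSW values are hypothetical.

```python
SP, PC = 6, 7

def push(mem, regs, value):
    """The stack grows downward: decrement SP, then store."""
    regs[SP] = (regs[SP] - 2) & 0xFFFF
    mem[regs[SP]] = value

def pop(mem, regs):
    value = mem[regs[SP]]
    regs[SP] = (regs[SP] + 2) & 0xFFFF
    return value

def interrupt(mem, regs, psw, vector):
    """Save PSW then PC, load the new PC from the vector, return the new PSW."""
    push(mem, regs, psw)
    push(mem, regs, regs[PC])
    regs[PC] = mem[vector]
    return mem[vector + 2]

def rti(mem, regs):
    """Restore the PC, then return the saved PSW."""
    regs[PC] = pop(mem, regs)
    return pop(mem, regs)

mem = {0o60: 0o2000, 0o62: 0o340}            # hypothetical vector: handler, new PSW
regs = [0] * 8
regs[SP], regs[PC] = 0o1000, 0o500
new_psw = interrupt(mem, regs, 0o000004, 0o60)
```

A subroutine call saves no PSW, which is why the two kinds of return cannot be interchanged.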

Conditional Statements

There were branches, which went to short relative addresses, and jump instructions, which could go to any address using any addressing mode. Only branches could be conditional, in which case they depended on condition codes set by a previous compare or arithmetic instruction. There was no "branch never"; the opcode slot where one would fall was used for other instructions. There was a "wait", which would wait until the next interrupt, and a "halt", which would give control to the console (if executed in kernel mode or on a machine without protection).

      cmp r0, (r1)        / compare the contents of register R0 to
                          / what R1 points to in memory (two-byte word)
      bne foo             / branch not equal, to foo
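The "short relative address" of a branch is an eight-bit signed word offset from the updated PC, which limits the reach to 128 words back and 127 words forward. The sketch below encodes a branch under that assumption; the base opcodes for br and bne are the standard ones, while the addresses and the function name are made up.

```python
BR = 0o000400
BNE = 0o001000

def encode_branch(base, addr, target):
    """Encode a branch located at byte address `addr` that goes to `target`."""
    offset = (target - (addr + 2)) // 2      # in words, from the updated PC
    if (target - addr) % 2 or not -128 <= offset <= 127:
        raise ValueError("target is odd or out of branch range")
    return base | (offset & 0o377)

bne_foo = encode_branch(BNE, 0o1000, 0o1010)   # foo is three words past the PC
br_self = encode_branch(BR, 0o1000, 0o1000)    # a branch to itself
```

A target beyond this range is what forces the two-word jump instruction.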

Input/Output

Devices were memory mapped in high addresses. You could use interrupts or polling to know when they were ready. There were several levels of interrupt (or bus request).

There was a graphical computer called the GT40, before raster graphics became de rigueur. It had a PDP-11 as its computing processor and a coprocessor to do the vector graphics. The graphics processor had its own jump instruction and would be put in a loop to keep the screen refreshed. There were great lunar lander and space war programs.

User programs on Unix, of course, did I/O with system calls, which were trap instructions.

stdout = 1
.data
msg:   <Hello, world.\n>
.text
       mov $stdout, r0               / file descriptor goes in r0
       sys write; msg; 14.           / buffer address; 14. is the length
       / don't bother checking for error or a write of less than
       / the full buffer.
       clr r0                        / exit with an OK status, passed in r0
       sys exit

In the Sixth Edition Unix convention, the file descriptor for write and the status for exit were passed in r0; only the remaining arguments followed the trap instruction in memory.

The arguments to a system call usually followed the trap instruction in memory. Because those arguments often needed to vary, and because code that modifies values embedded in its own instruction stream cannot be reentrant, Unix provided the "sys indir" call. It took a pointer to a system call laid out in data space, which Unix interpreted as though it had occurred inline.
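How a kernel might pick up inline arguments, and how indir changes that, can be sketched as follows. Several things here are assumptions: the call number is taken from the low six bits of the trap word, write is treated as having two inline words (buffer and count) as in Sixth Edition, and the function names, addresses, and argument-count table are illustrative only.

```python
PC = 7
SYS = 0o104400                               # trap opcode; low bits pick the call
INDIR, EXIT, WRITE = 0, 1, 4
NARGS = {EXIT: 0, WRITE: 2}                  # inline words per call (illustrative)

def fetch_args(mem, at, number):
    return [mem[at + 2 * i] for i in range(NARGS[number])]

def handle_trap(mem, regs):
    """Decode the sys trap just executed; return (call number, argument words)."""
    number = mem[(regs[PC] - 2) & 0xFFFF] & 0o77
    if number == INDIR:
        block = mem[regs[PC]]                # one inline word: pointer into data
        regs[PC] = (regs[PC] + 2) & 0xFFFF
        number = mem[block] & 0o77           # the real sys word lives in data space
        return number, fetch_args(mem, block + 2, number)
    args = fetch_args(mem, regs[PC], number)
    regs[PC] = (regs[PC] + 2 * len(args)) & 0xFFFF   # resume after the arguments
    return number, args

def run(mem, start=0o1000):
    """Pretend the trap word at `start` has just been executed."""
    regs = [0] * 8
    regs[PC] = start + 2
    return handle_trap(mem, regs), regs[PC]

inline = {0o1000: SYS | WRITE, 0o1002: 0o2000, 0o1004: 14}
indirect = {0o1000: SYS | INDIR, 0o1002: 0o3000,
            0o3000: SYS | WRITE, 0o3002: 0o2000, 0o3004: 14}
```

With indir, only the pointer sits in the instruction stream, so the argument words can be changed without touching the code.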

Indirection

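Each operand allowed one layer of indirection, selected by the addressing mode bits in the instruction (the "deferred" modes). In indexed mode, a literal stored in the word following the instruction was added to a register to form the address.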
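A sketch of how an operand address is formed for the register-deferred, indexed, and indexed-deferred modes (modes 1, 6 and 7) may make the layering concrete. The other modes are left out, memory is a dictionary, and the addresses in the demo are hypothetical.

```python
PC = 7

def operand_address(mode, reg, regs, mem):
    """Return the memory address of an operand for modes 1, 6 and 7."""
    if mode == 1:                            # (Rn): the register holds the address
        return regs[reg]
    if mode in (6, 7):                       # X(Rn) and @X(Rn)
        index = mem[regs[PC]]                # index word follows the instruction
        regs[PC] = (regs[PC] + 2) & 0xFFFF
        address = (regs[reg] + index) & 0xFFFF
        return mem[address] if mode == 7 else address   # mode 7 adds one indirection
    raise ValueError("mode not modelled in this sketch")

def demo(mode):
    """Resolve an operand through r1 on a small made-up memory image."""
    regs = [0] * 8
    regs[1], regs[PC] = 0o2000, 0o1002
    mem = {0o1002: 0o4, 0o2004: 0o3000}      # index word 4; pointer stored at 2004
    return operand_address(mode, 1, regs, mem)
```

In this image, mode 1 yields the address in r1, mode 6 yields r1 plus 4, and mode 7 follows the pointer stored at that location.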

Physical Structure

On Unix, assembly language source usually used the .s extension (for "source"). The result of assembly was called a.out unless you told the assembler otherwise, in which case the convention was to use a .o suffix (for "object"). Typically the linker "ld" (for "loader") would be run on a set of .o files to produce a new a.out, which you could then execute just by typing its name.

Useful Commands

There were, in both the Unix and DEC assemblers, nonce labels. Rather than make up a name for every label you needed, you could just use a number and reach it locally.

1:    cmp r3, r4           / hit size limit yet?
      bge 2f
      mov (r0)+, (r1)+
      inc r3
      br 1b
2:

In the Unix assembler, 2f meant 2 forward (the next "2:") and 1b meant 1 back, and so on.
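The lookup rule for numeric labels can be sketched as a search over the places where each number was defined. The addresses below are hypothetical byte offsets for the loop above (one word per instruction), plus a made-up second "1:" further on to show a number being reused; the function name is invented.

```python
def resolve_local(ref, here, definitions):
    """Resolve a reference such as '1b' or '2f' seen at address `here`.

    `definitions` is a list of (address, number) pairs, one per 'N:' label.
    """
    number, direction = int(ref[:-1]), ref[-1]
    addresses = sorted(addr for addr, n in definitions if n == number)
    if direction == "b":                     # nearest definition at or before here
        found = [a for a in addresses if a <= here]
        return found[-1] if found else None
    if direction == "f":                     # nearest definition after here
        found = [a for a in addresses if a > here]
        return found[0] if found else None
    raise ValueError("reference must end in b or f")

# "1:" at 0, "2:" at 012, and a later, unrelated "1:" at 020
definitions = [(0o0, 1), (0o12, 2), (0o20, 1)]
loop_exit = resolve_local("2f", 0o2, definitions)    # the bge at address 2
loop_top = resolve_local("1b", 0o10, definitions)    # the br at address 010
```

Because the search is relative to the point of use, the same number can be defined many times in one file without clashing.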

Jumps could reach farther than branches, but took up two words, so a branch was preferable where possible. The assemblers provided a way to code a branch if it could reach and a jump otherwise. The same could be written conditionally, in which case the assembler would output a branch around a jump if necessary (inverting the condition, of course).

     jle foo                 / jump to foo if less or equal
     jbr bar                 / jump or branch to bar
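The choice the assembler makes for these pseudo-instructions can be sketched as a range check followed by a rewrite. This is a simplification: a real assembler has to settle sizes over more than one pass, because choosing the longer form moves later labels. Only a few conditions are listed, and the function names and addresses are hypothetical.

```python
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "le": "gt", "gt": "le"}

def in_branch_range(addr, target):
    offset = (target - (addr + 2)) // 2      # word offset from the updated PC
    return -128 <= offset <= 127

def expand_jump(cond, addr, target, label):
    """Expand jbr (cond=None) or a conditional jump such as jle (cond='le')."""
    if in_branch_range(addr, target):
        return ["b" + cond + " " + label] if cond else ["br " + label]
    if cond is None:
        return ["jmp " + label]
    # Out of range: branch around a two-word jmp on the opposite condition.
    return ["b" + INVERSE[cond] + " 1f", "jmp " + label, "1:"]

near = expand_jump("le", 0o1000, 0o1010, "foo")
far = expand_jump("le", 0o1000, 0o3000, "foo")
```

The near case costs one word and the far conditional case three, which is why the short form is tried first.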

The PDP-11 is one of the original minicomputers. It was used in a huge variety of ways, from timesharing to embedded control, as well as some desktop use. PDP-11 is not a language per se, but PDP-11 assembly is a flavor of assembly language. The architecture influenced later microprocessors and, because it shaped the native code that compilers had to target, it also influenced higher-level languages.

The postincrement addressing also made it into the Motorola 68000.

The only reason to learn PDP-11 assembly language would be that you had acquired a working PDP-11 and wanted to write or modify programs for it. The PDP-11, like every machine that ran its instruction set natively, is obsolete.

Sources

Books and Articles

PDP-11 Programmer's Handbook, Digital Equipment Corporation

[1] Lions' Commentary on UNIX 6th Edition, Chapter 2 - Unix Assembler