x86 Assembly/MMX

MMX is a supplemental instruction set that Intel announced in 1996 and first shipped in 1997 with the Pentium MMX processor. Most of the new instructions are "single instruction, multiple data" (SIMD): one instruction operates on several pieces of data in parallel.

MMX has a few drawbacks, though: MMX instructions can run slightly slower than the regular arithmetic instructions, the Floating Point Unit (FPU) can't be used while the MMX registers are in use, and the saturating forms of the arithmetic instructions behave differently from ordinary register arithmetic.

Saturation Arithmetic

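In an 8-bit grayscale picture, 255 is the value for pure white, and 0 is the value for pure black. In a regular 8-bit register (AL, BL, CL ...) if we add one to white, we get black! This is because ordinary arithmetic "rolls over" (wraps around) to the next value. MMX gets around this with instructions that use a technique called "saturation arithmetic". In saturation arithmetic, a result never rolls over: a value that would exceed the maximum stays at the maximum, and a value that would drop below the minimum stays at the minimum. MMX also provides wraparound forms of the same operations, so saturation is a property of the instruction chosen, not of the register. For unsigned bytes, the saturating instructions give the following results:

255 + 100 = 255
200 + 100 = 255
0 - 100 = 0
99 - 100 = 0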

This may seem counter-intuitive at first to people who are used to their registers rolling over, but it makes sense in some situations: if we try to make white brighter, it shouldn't become black.
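
The four equations above can be modelled in portable C. This is only a sketch of the arithmetic on a single unsigned byte, not MMX code; the function names (sat_add_u8, sat_sub_u8, wrap_add_u8, check_saturation_examples) are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 8-bit add that clamps at 255 instead of wrapping. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;          /* computed in a wider type */
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Unsigned 8-bit subtract that clamps at 0 instead of wrapping. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Ordinary wraparound add, for contrast: white + 1 becomes black. */
uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Reproduces the four equations from the text. */
void check_saturation_examples(void)
{
    assert(sat_add_u8(255, 100) == 255);
    assert(sat_add_u8(200, 100) == 255);
    assert(sat_sub_u8(0, 100) == 0);
    assert(sat_sub_u8(99, 100) == 0);
    assert(wrap_add_u8(255, 1) == 0);        /* the roll-over case */
}
```

Calling check_saturation_examples() from a test program should complete without tripping an assertion.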

Single Instruction Multiple Data (SIMD) Instructions

The MMX registers are 64 bits wide. Besides holding a single 64-bit value, each register can be treated as packed data in one of the following layouts:

2 32-bit values
4 16-bit values
8 8-bit values
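
One way to picture these layouts is a C union that gives several views of the same 64 bits. This is a hedged sketch, not how a compiler or the CPU represents an MMX register; the type name mmx_model and the function check_lane_views are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: four views of the same 64 bits. */
typedef union {
    uint64_t q;        /* 1 quadword    */
    uint32_t d[2];     /* 2 doublewords */
    uint16_t w[4];     /* 4 words       */
    uint8_t  b[8];     /* 8 bytes       */
} mmx_model;

void check_lane_views(void)
{
    mmx_model r;
    for (int i = 0; i < 8; i++)
        r.b[i] = 0x11;              /* same value in every byte lane */

    /* Identical bytes keep these checks independent of byte order. */
    assert(sizeof r == 8);
    assert(r.w[3] == 0x1111);
    assert(r.d[1] == 0x11111111u);
    assert(r.q == 0x1111111111111111ull);
}
```

Which view applies is decided by the instruction, not by the register: the same 64 bits are read as bytes by one instruction and as words by another.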

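The MMX registers cannot easily be used for 64-bit arithmetic, because the original MMX instructions add and subtract only 8-, 16- and 32-bit elements. Let's say that four of the byte positions in an MMX register hold 10, 25, 128 and 255 (the register holds 8 bytes; the other four are left out to keep the picture small). We have them arranged as such: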

MM0: | 10 | 25 | 128 | 255 |

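And we do the following pseudo code operation, which adds 10 to every byte with unsigned saturation (a real instruction such as PADDUSB takes a second operand that holds 10 in each byte position):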

MM0 + 10

We would get the following result:

MM0: | 10+10 | 25+10 | 128+10 | 255+10 | = | 20 | 35 | 138 | 255 |

Remember that our arithmetic "saturates" in the last box, so the value doesn't go over 255.

Using MMX, we are essentially performing these 4 additions (8 when all byte positions are used) with the single instruction it would take to perform 1 addition using the regular registers, so far fewer instructions are needed.
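
The packed addition described above can be sketched in portable C as a loop over the eight byte lanes of a 64-bit value. This models the result of an unsigned saturating byte add such as PADDUSB; it is not MMX code, and the hardware does not loop. The names packed_add_sat_u8 and check_packed_add are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a packed unsigned saturating byte add:
   each of the 8 byte lanes of a and b is added independently. */
uint64_t packed_add_sat_u8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)((a >> (8 * lane)) & 0xFF);
        unsigned y = (unsigned)((b >> (8 * lane)) & 0xFF);
        unsigned sum = x + y;
        if (sum > 255)
            sum = 255;              /* clamp; nothing carries into the next lane */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}

void check_packed_add(void)
{
    /* Low four lanes hold 10, 25, 128, 255 (written high to low); upper four are 0. */
    uint64_t mm0  = 0x000000000A1980FFull;
    uint64_t tens = 0x0A0A0A0A0A0A0A0Aull;   /* 10 in every lane */

    /* Low four lanes become 20, 35, 138, 255; the unused upper lanes become 10. */
    assert(packed_add_sat_u8(mm0, tens) == 0x0A0A0A0A14238AFFull);
}
```

Because each lane is clamped separately, the 255 lane stays at 255 without disturbing its neighbour, which is the behaviour shown in the last box of the diagram.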

MMX Registers

There are eight 64-bit MMX registers. To avoid having to add new registers, they were made to overlap with the FPU stack registers: each MMX register occupies the low 64 bits of one 80-bit FPU register. This means that MMX instructions and FPU instructions cannot be mixed freely. MMX registers are addressed directly by name, and do not need to be accessed by pushing and popping in the same way as the FPU registers.

MM7 MM6 MM5 MM4 MM3 MM2 MM1 MM0

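Each of these registers aliases the physical FPU register with the same number (R0 to R7), not the stack-relative name ST(n). The two numberings coincide only when the FPU top-of-stack pointer is 0.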

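Executing MMX instructions marks all of the FPU registers as in use, so floating point instructions that run afterwards can fail with a stack overflow or produce invalid results. The CPU does not undo this automatically: to make the FPU usable again, you must end all MMX code with emms, which marks the FPU registers as empty.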

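The following is a program for GNU AS and GCC which copies 8 bytes from one variable to another and prints the result. The assembler routine reads its arguments from the stack, so it must be built as 32-bit x86 code (for example with GCC's -m32 option).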

Assembler portion

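.text
.globl copy_memory8
.type  copy_memory8, @function
# void copy_memory8(void *a, void *b): copy 8 bytes from a to b
copy_memory8:
    pushl %ebp
    mov  %esp, %ebp
    mov 8(%ebp), %eax       # eax = first argument (source)
    movq (%eax), %mm0       # load 8 bytes from the source into MM0
    mov 12(%ebp), %eax      # eax = second argument (destination)
    movq %mm0, (%eax)       # store the 8 bytes to the destination
    emms                    # mark the FPU registers empty again
    popl %ebp
    ret
.size copy_memory8,.-copy_memory8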

C portion

#include <stdio.h>

/* Defined in the assembler portion: copies 8 bytes from a to b. */
void copy_memory8(void * a, void * b);

int main () {
	long long b = 0x0fffffff00000000;
	long long c = 0x00000000ffffffff;
	printf("%lld == %lld\n", b, c);
	copy_memory8(&b, &c);
	printf("%lld == %lld\n", b, c);
	return 0;
}

MMX Instruction Set

Several suffixes are used to indicate what data size the instruction operates on:

B: Byte (8 bits)
W: Word (16 bits)
D: Double word (32 bits)
Q: Quad word (64 bits)

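For the add and subtract instructions, the suffix also indicates how overflow is handled: US means unsigned saturation, S means signed saturation, and no extra letter means ordinary wraparound. The saturating forms exist only for bytes and words.

For example, PSUBUSB subtracts unsigned bytes with saturation, PSUBSW subtracts signed words with saturation, and PSUBD subtracts double words with wraparound.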
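
The difference between the three byte subtract instructions PSUBB, PSUBUSB and PSUBSB can be sketched in portable C, one lane at a time. This is a model of the arithmetic only, under the assumption that a single byte lane is enough to show the behaviour; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PSUBB-style lane: no saturation suffix, result wraps modulo 256. */
uint8_t sub_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - b);
}

/* PSUBUSB-style lane: unsigned saturation, result clamped to 0..255. */
uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* PSUBSB-style lane: signed saturation, result clamped to -128..127. */
int8_t sub_sat_s8(int8_t a, int8_t b)
{
    int diff = (int)a - (int)b;     /* exact result in a wider type */
    if (diff > INT8_MAX) return INT8_MAX;
    if (diff < INT8_MIN) return INT8_MIN;
    return (int8_t)diff;
}

void check_suffixes(void)
{
    assert(sub_wrap_u8(99, 100) == 255);     /* wraps around */
    assert(sub_sat_u8(99, 100) == 0);        /* clamps at 0 */
    assert(sub_sat_s8(-100, 100) == -128);   /* clamps at the signed minimum */
    assert(sub_sat_s8(100, -100) == 127);    /* clamps at the signed maximum */
    assert(sub_sat_s8(5, 7) == -2);          /* in range: ordinary result */
}
```

The same bit pattern can therefore give three different results, depending on which suffix the instruction carries.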

MMX defined 47 new instruction mnemonics, listed below.

EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB, PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW, PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR