X86 Assembly/MMX

MMX is a supplemental instruction set announced by Intel in 1996 and first shipped in the Pentium MMX processor in 1997. Most of its instructions are "single instruction, multiple data" (SIMD): a single instruction operates on several pieces of data in parallel.

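MMX has a few limitations, though: it handles only integer data, the MMX registers share storage with the x87 FPU registers so MMX and FPU code cannot be freely mixed, and an EMMS instruction is needed when switching from MMX code back to FPU code. MMX arithmetic instructions come in two flavours, ordinary wraparound arithmetic and saturation arithmetic, and the programmer chooses between them by picking the instruction.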

Saturation Arithmetic

In an 8-bit grayscale picture, 255 is the value for pure white, and 0 is the value for pure black. In a regular 8-bit register (AL, BL, CL ...), if we add one to white, we get black! This is because ordinary arithmetic "rolls over" (wraps around) to 0. MMX provides instructions that avoid this with a technique called "saturation arithmetic": a result that would exceed the largest representable value is clamped to that maximum, and a result that would fall below the smallest value is clamped to that minimum. For unsigned bytes handled by the saturating MMX instructions, we have the following equations:

255 + 100 = 255
200 + 100 = 255
0 - 100 = 0
99 - 100 = 0

This may seem counter-intuitive at first to people who are used to their registers rolling over, but it makes sense in some situations: if we try to make white brighter, it shouldn't become black.
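The equations above can be reproduced in plain C. The following is a minimal sketch, not MMX code: the function names sat_add_u8, sat_sub_u8, and wrap_add_u8 are made up for illustration, and each function models a single unsigned byte.

```c
#include <stdint.h>

/* Unsigned 8-bit addition that clamps at 255 instead of wrapping to 0. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;   /* widen so the overflow is visible */
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Unsigned 8-bit subtraction that clamps at 0 instead of wrapping to 255. */
uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Ordinary wraparound addition, for comparison. */
uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);                    /* result is taken modulo 256 */
}
```

With these definitions, sat_add_u8(200, 100) yields 255 and sat_sub_u8(99, 100) yields 0, while wrap_add_u8(255, 1) rolls over to 0.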

Single Instruction Multiple Data (SIMD) Instructions

Each MMX register is 64 bits wide and can be treated as a single 64-bit value or as a set of packed values:

2 32-bit values (double words)
4 16-bit values (words)
8 8-bit values (bytes)
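One way to picture this breakdown is a C union that views the same 64 bits at each granularity. This is only an illustrative sketch: the type packed64 and the helper packed64_from_bytes are hypothetical names, not part of any MMX header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one 64-bit value viewed as packed lanes. */
typedef union {
    uint64_t q;        /* 1 x 64-bit quad word    */
    uint32_t d[2];     /* 2 x 32-bit double words */
    uint16_t w[4];     /* 4 x 16-bit words        */
    uint8_t  b[8];     /* 8 x 8-bit bytes         */
} packed64;

/* Fill the byte lanes from an array of eight bytes. */
packed64 packed64_from_bytes(const uint8_t bytes[8])
{
    packed64 p;
    memcpy(p.b, bytes, sizeof p.b);
    return p;
}
```

On a little-endian machine such as x86, w[0] is formed from b[0] (low byte) and b[1] (high byte), and so on up the register.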

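The MMX registers cannot easily be used for 64-bit arithmetic: the original MMX instruction set has no 64-bit add or subtract, and quad word operands are supported only by the move, shift, and logical instructions. Let's say that we have the bytes 10, 25, 128, and 255 loaded in an MMX register (a register holds eight bytes; only four are shown to keep the example short). We have them arranged as such: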

MM0: | 10 | 25 | 128 | 255 |

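And we add 10 to every byte with an unsigned saturating add. There is no immediate form of the packed add instructions, so in real code the 10s would be held in a second MMX register or in memory. In pseudocode: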

MM0 + 10

We would get the following result:

MM0: | 10+10 | 25+10 | 128+10 | 255+10 | = | 20 | 35 | 138 | 255 |

Remember that our arithmetic "saturates" in the last box, so the value doesn't go over 255.

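Using MMX, one instruction performs the four additions of this example at once. With all eight byte lanes of a 64-bit register in use, a single packed byte add performs eight additions, so far fewer instructions are needed than when each byte is added in a regular register.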
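The lane-by-lane behaviour of the example can be modelled in portable C. The sketch below is a scalar emulation of what a packed unsigned saturating byte add (PADDUSB) computes, not real MMX code; the name packed_add_sat_u8 and the LANES macro are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* eight 8-bit lanes in one 64-bit MMX register */

/* Scalar model of a packed unsigned saturating byte add: each lane is
   added independently of its neighbours and clamped at 255. */
void packed_add_sat_u8(uint8_t dst[LANES],
                       const uint8_t a[LANES],
                       const uint8_t b[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Calling it with a = {10, 25, 128, 255, ...} and b filled with 10s gives 20, 35, 138, and 255 in the first four lanes, matching the table above. The hardware does all eight lanes in one instruction, whereas this model needs a loop.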

MMX Registers

There are eight 64-bit MMX registers. To avoid having to add new registers, they were made to overlap the x87 FPU registers: each MMX register aliases the low 64 bits of one of the eight 80-bit FPU registers. This means that MMX instructions and FPU instructions cannot be freely interleaved. Unlike the FPU registers, which are accessed as a stack by pushing and popping, MMX registers are addressed directly by name.

MM7 MM6 MM5 MM4 MM3 MM2 MM1 MM0

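These registers correspond to the same-numbered physical FPU registers (R0 through R7), not to the stack-relative names ST(0) through ST(7), which depend on the current top-of-stack pointer.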

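The CPU does not block floating point instructions while MMX code runs. Instead, every MMX instruction except emms marks all of the FPU registers as in use (it sets every entry of the x87 tag word to valid), so floating point instructions executed afterwards can fail with a spurious stack overflow or produce invalid results. To make the FPU usable again you must end all MMX code with emms, which marks the registers as empty. Here is an example of a C routine calling assembly language with MMX code: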

//---------------------------------------------------
// A simple example using MMX to copy 8 bytes of data
// from source s2 to destination s1
//---------------------------------------------------
// Declared without __fastcall: register-passed arguments
// and __asm blocks do not mix reliably.
void CopyMemory8(char *s1, const char *s2)
{
    __asm
    {
        mov ecx, s2        // load source pointer
        mov edx, s1        // load destination pointer
        movq mm0, [ecx]    // move quad-word (64 bits) from s2 into mm0
        movq [edx], mm0    // store the quad-word to s1
        emms               // mark the FPU registers empty so x87 code works again
    }
}

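NOTE: the __asm { } block syntax used in this example is specific to Borland C++ compatible compilers and 32-bit Microsoft Visual C++. GCC and Clang use a different inline assembly syntax, so the example will not compile there unchanged.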
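For compilers that do not accept that inline assembly syntax, the same pattern (do the MMX work, then issue EMMS) can be sketched with the intrinsics from <mmintrin.h>. The following assumes a GCC or Clang style compiler that defines the __MMX__ macro when MMX is available; the helper name add_sat_bytes8 is hypothetical, and a scalar fallback is included so the sketch still builds where MMX is absent.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>   /* __m64, _mm_adds_pu8, _mm_empty */
#endif

/* Hypothetical helper: dst[i] = saturate(a[i] + b[i]) for 8 unsigned bytes. */
void add_sat_bytes8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof va);      /* load 8 bytes, no alignment assumptions */
    memcpy(&vb, b, sizeof vb);
    vr = _mm_adds_pu8(va, vb);      /* PADDUSB: eight saturating byte adds */
    memcpy(dst, &vr, sizeof vr);
    _mm_empty();                    /* EMMS: make the x87 FPU usable again */
#else
    /* Portable fallback with the same result when MMX is unavailable. */
    for (int i = 0; i < 8; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
#endif
}
```

The call to _mm_empty() plays the role of the emms instruction in the assembly version: it should come after the last MMX operation and before any floating point code runs.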

MMX Instruction Set

A suffix letter indicates the size of the data elements the instruction operates on:

  • B: Byte (8 bits)
  • W: Word (16 bits)
  • D: Double word (32 bits)
  • Q: Quad word (64 bits)

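Arithmetic instructions may also carry a saturation suffix, placed before the size letter: S for signed saturation and US for unsigned saturation. Instructions with neither suffix use ordinary wraparound arithmetic.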

For example, PSUBUSB subtracts unsigned bytes with saturation, PSUBSW subtracts signed words with saturation, and PSUBD subtracts double words with wraparound.
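The difference between an unsuffixed instruction, an S instruction, and a US instruction can be shown on a single byte lane in plain C. This is a sketch of the semantics only, using made-up function names; PADDB, PADDSB, and PADDUSB are used as the reference instructions.

```c
#include <stdint.h>

/* No saturation suffix (e.g. PADDB): the result wraps around modulo 256. */
uint8_t lane_add_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* "S" suffix (e.g. PADDSB): signed result clamped to [-128, 127]. */
int8_t lane_add_signed_sat(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* "US" suffix (e.g. PADDUSB): unsigned result clamped to [0, 255]. */
uint8_t lane_add_unsigned_sat(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}
```

For the inputs 200 and 100, the wraparound version returns 44 and the unsigned saturating version returns 255. For the signed inputs 100 and 100, the signed saturating version returns 127.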

MMX defined the 47 instruction mnemonics listed below.

EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB, PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW, PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR

Last modified on 4 December 2011, at 06:15