x86 Disassembly/Code Optimization

x86 Disassembly

Code Optimization

An optimizing compiler is perhaps one of the most complicated, most powerful, and most interesting programs in existence. This chapter will talk about optimizations, although this chapter will not include a table of common optimizations.

Stages of Optimizations

There are two times when a compiler can perform optimizations: first, in the intermediate representation, and second, during the code generation.

Intermediate Representation Optimizations

While in the intermediate representation, a compiler can perform various optimizations, often based on dataflow analysis techniques. For example, consider the following code fragment:

 x = 5;
 if(x != 5)
 {
   //loop body
 }

An optimizing compiler might notice that at the point of "if (x != 5)", the value of x is always the constant "5". This allows substituting "5" for x resulting in "5 != 5". Then the compiler notices that the resulting expression operates entirely on constants, so the value can be calculated now instead of at run time, resulting in optimizing the conditional to "if (false)". Finally the compiler sees that this means the body of the if conditional will never be executed, so it can omit the entire body of the if conditional altogether.

Consider the reverse case:

 x = 5;
 if(x == 5)
 { 
    //loop body
 }

In this case, the optimizing compiler would notice that the IF conditional will always be true, and it won't even bother writing code to test x.

Control Flow Optimizations

Another set of optimization which can be performed either at the intermediate or at the code generation level are control flow optimizations. Most of these optimizations deal with the elimination of useless branches. Consider the following code:

 if(A)
 {
    if(B)
    {
       C;
    }
    else
    {
       D;
    }
    end_B:
 }
 else
 {
    E;
 }
 end_A:

In this code, a simplistic compiler would generate a jump from the C block to end_B, and then another jump from end_B to end_A (to get around the E statements). Clearly jumping to a jump is inefficient, so optimizing compilers will generate a direct jump from block C to end_A.

This unfortunately will make the code more confused and will prevent a nice recovery of the original code. For complex functions, it's possible that one will have to consider the code made of only if()-goto; sequences, without being able to identify higher level statements like if-else or loops.

The process of identifying high level statement hierarchies is called "code structuring".

Code Generation Optimizations

Once the compiler has sifted through all the logical inefficiencies in your code, the code generator takes over. Often the code generator will replace certain slow machine instructions with faster machine instructions.

For instance, the instruction:

 beginning:
 ...
 loopnz beginning

operates much slower than the equivalent instruction set:

 beginning:
 ...
 dec ecx
 jne beginning

So then why would a compiler ever use a loopxx instruction? The answer is that most optimizing compilers never use a loopxx instruction, and therefore as a reverser, you will probably never see one used in real code.

What about the instruction:

  
 mov eax, 0

The mov instruction is relatively quick, but a faster part of the processor is the arithmetic unit. Therefore, it makes more sense to use the following instruction:

 xor eax, eax

because xor operates in very few processor cycles (and saves three bytes at the same time), and is therefore faster than a "mov eax, 0". The only drawback of a xor instruction is that it changes the processor flags, so it cannot be used between a comparison instruction and the corresponding conditional jump.

Loop Unwinding

When a loop needs to run for a small, but definite number of iterations, it is often better to unwind the loop in order to reduce the number of jump instructions performed, and in many cases prevent the processor's branch predictor from failing. Consider the following C loop, which calls the function MyFunction() 5 times:

for(x = 0; x < 5; x++) 
{
  MyFunction();
}

Converting to assembly, we see that this becomes, roughly:

mov eax, 0
loop_top:
cmp eax, 5
jge loop_end
call _MyFunction
inc eax
jmp loop_top
loop_end:

Each loop iteration requires the following operations to be performed:

Compare the value in eax (the variable "x") to 5, and jump to the end if greater than or equal
Increment eax
Jump back to the top of the loop.

Notice that we remove all these instructions if we manually repeat our call to MyFunction():

call _MyFunction
call _MyFunction
call _MyFunction
call _MyFunction
call _MyFunction

This new version not only takes up less disk space because it uses fewer instructions, but also runs faster because fewer instructions are executed. This process is called Loop Unwinding.

Inline Functions

The C and C++ languages allow the definition of an inline type of function. Inline functions are functions which are treated similarly to macros. During compilation, calls to an inline function are replaced with the body of that function, instead of performing a call instruction. In addition to using the inline keyword to declare an inline function, optimizing compilers may decide to make other functions inline as well.

Function inlining works similarly to loop unwinding for increasing code performance. A non-inline function requires a call instruction, several instructions to create a stack frame, and then several more instructions to destroy the stack frame and return from the function. By copying the body of the function instead of making a call, the size of the machine code increases, but the execution time decreases.

It is not necessarily possible to determine whether identical portions of code were created originally as macros, inline functions, or were simply copy and pasted. However, when disassembling it can make your work easier to separate these blocks out into separate inline functions, to help keep the code straight.