x86 Disassembly/Variables

Variables

We've already seen some mechanisms to create local storage on the stack. This chapter will talk about some other variables, including global variables, static variables, variables labelled "const," "register," and "volatile." It will also consider some general techniques concerning variables, including accessor and setter methods (to borrow from object-oriented terminology). This section may also talk about setting memory breakpoints in a debugger to track memory I/O on a variable.

How to Spot a Variable

Variables come in two distinct flavors: those that are created on the stack (local variables), and those that are accessed via a hard-coded memory address (global variables). Any memory that is accessed via a hard-coded address is usually a global variable. Variables that are accessed as an offset from esp or ebp are frequently local variables.

Hardcoded address
Anything hardcoded is a value that is stored as-is in the binary, and is not changed at runtime. For instance, the value 0x2054 is hardcoded, whereas the current value of variable X is not hard-coded and may change at runtime.

Example of a hardcoded address:

 mov eax, [0x77651010]

OR:

 mov ecx, 0x77651010
 mov eax, [ecx]

Example of a non-hardcoded (softcoded?) address:

 mov ecx, [esp + 4]
 add ecx, ebx
 mov eax, [ecx]

In the last example, the value of ecx is calculated at run time, whereas in the first two examples, the address is the same every time. Addresses embedded in the binary this way are still considered hard-coded, even if the loader needs to "fix them up" (apply relocations) so that they point to the correct locations when the module is not loaded at its preferred base address.
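The same distinction can be seen from the C side. The sketch below is illustrative only; the names g_value, global_slot and stack_locals_distinct are hypothetical. A global has one fixed address for the whole run, which is why a compiler can encode it as a constant, while every active call gets its own local on the stack, so the local's address has to be computed from esp or ebp at run time.

```c
int g_value;  /* hypothetical global: lives at one fixed address */

/* Returns the address of the global. Every call returns the same
   pointer, which is why it can appear as a hard-coded address. */
int *global_slot(void)
{
    return &g_value;
}

/* Recurses until level == depth, recording the address of each level's
   local in seen[0..depth]. At the deepest level all of those frames are
   still alive, so we can check that every local has its own address. */
int stack_locals_distinct(int level, int depth, const int *seen[])
{
    int local = level;
    seen[level] = &local;
    if (level < depth)
        return stack_locals_distinct(level + 1, depth, seen);

    for (int i = 0; i <= depth; i++)
        for (int j = i + 1; j <= depth; j++)
            if (seen[i] == seen[j])
                return 0;  /* two live locals shared a slot (never expected) */
    return 1;
}
```

Checking the addresses inside the deepest call matters: once a call returns, its local no longer exists, and comparing pointers to it would not be meaningful.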

.BSS and .DATA sections

Both the .bss and .data sections contain values which can change at run time (e.g. variables). Typically, variables that are initialized to a non-zero value in the source are allocated in the .data section (e.g. "int a = 10;"). Variables that are not initialized, or are initialized to zero, can be allocated to the .bss section (e.g. "int arr[100];"). Because every .bss variable is guaranteed to be zero when the program starts, the linker does not need to store its contents in the binary file; it only records the section's size, and the memory is zero-filled when the program is loaded. Therefore, .bss sections take up essentially no space in the binary file, regardless of their size.
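Here is a small sketch of the declarations mentioned above. The placement noted in the comments is the typical behavior, not a guarantee, since the final layout is up to the compiler and linker; zeroed and arr_is_zero are hypothetical names added for illustration.

```c
int a = 10;     /* non-zero initializer: typically placed in .data */
int arr[100];   /* no initializer: typically placed in .bss        */
int zeroed = 0; /* zero initializer: may also be placed in .bss    */

/* Returns 1 if every element of arr still holds its start-up value of 0.
   C guarantees that objects with static storage duration and no
   initializer start out as zero, which is what lets .bss occupy no
   file space: the memory only has to be zero-filled at load time. */
int arr_is_zero(void)
{
    for (int i = 0; i < 100; i++)
        if (arr[i] != 0)
            return 0;
    return 1;
}
```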

"Static" Local Variables

Local variables labeled static maintain their value across function calls, and therefore cannot be created on the stack like other local variables are. How are static variables created? Let's take a simple example C function:

 void MyFunction(int a)
 {
 	static int x = 0;
 	printf("my number: ");
 	printf("%d, %d\n", a, x);
 }

Compiling to a listing file with cl.exe gives us the following code:

 _BSS	SEGMENT
 ?x@?1??MyFunction@@9@9 DD 01H DUP (?)   	; `MyFunction'::`2'::x
 _BSS	ENDS
 _DATA	SEGMENT
 $SG796	DB	'my number: ', 00H
 $SG797	DB	'%d, %d', 0aH, 00H
 _DATA	ENDS
 PUBLIC	_MyFunction
 EXTRN	_printf:NEAR
 ; Function compile flags: /Odt
 _TEXT	SEGMENT
 _a$ = 8					; size = 4
 _MyFunction PROC NEAR
 ; Line 4
 	push	ebp
 	mov	ebp, esp
 ; Line 6
 	push	OFFSET FLAT:$SG796
 	call	_printf
 	add	esp, 4
 ; Line 7
 	mov	eax, DWORD PTR ?x@?1??MyFunction@@9@9
 	push	eax
 	mov	ecx, DWORD PTR _a$[ebp]
 	push	ecx
 	push	OFFSET FLAT:$SG797
 	call	_printf
 	add	esp, 12					; 0000000cH
 ; Line 8
 	pop	ebp
 	ret	0
 _MyFunction ENDP
 _TEXT	ENDS

Normally when assembly listings are posted in this wikibook, most of the code gibberish is discarded to aid readability, but in this instance, the "gibberish" contains the answer we are looking for. As can be clearly seen, this function creates a standard stack frame, and it doesn't create any local variables on the stack. In the interests of being complete, we will take baby-steps here, and work to the conclusion logically.

In the code for Line 7, there is a call to _printf with 3 arguments. printf is a standard C library function, so it can be assumed to use the cdecl calling convention, in which arguments are pushed from right to left. Three arguments are pushed onto the stack before _printf is called:

  • DWORD PTR ?x@?1??MyFunction@@9@9
  • DWORD PTR _a$[ebp]
  • OFFSET FLAT:$SG797

The second one, _a$[ebp], is partially defined by this line of the listing:

 _a$ = 8

Therefore _a$[ebp] is the variable located at offset +8 from ebp, which is the first argument to the function. OFFSET FLAT:$SG797 is likewise declared in the assembly listing:

 $SG797	DB	'%d, %d', 0aH, 00H

If you have your ASCII table handy, you will notice that 0aH = 0x0A = '\n'. OFFSET FLAT:$SG797 then is the format string to our printf statement. Our last option then is the mysterious-looking "?x@?1??MyFunction@@9@9", which is defined in the following assembly code section:

 _BSS	SEGMENT
 ?x@?1??MyFunction@@9@9 DD 01H DUP (?) 
 _BSS	ENDS

This shows that the Microsoft C compiler placed this static variable, which is initialized to zero, in the .bss section. Other compilers may make different choices, and a static variable initialized to a non-zero value would normally go in .data instead, but the lesson is the same: local static variables are created and used in a very similar, if not the exact same, manner as global variables. In fact, as far as the reverser is concerned, the two are usually interchangeable. Remember, the only real difference between static variables and global variables is the idea of "scope", which only the compiler enforces.
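A minimal sketch makes the reason concrete (NextTicket is a hypothetical function, not part of the listing above): the static local must keep its value after the function returns, so it cannot live in the function's stack frame and is given a fixed address instead.

```c
/* Returns 1, 2, 3, ... on successive calls. `ticket` is initialized once,
   before the program starts, and keeps its value between calls, so a
   compiler will typically give it a fixed address (in .bss here, since
   it starts at zero) rather than a stack slot. */
int NextTicket(void)
{
    static int ticket = 0;
    ticket++;
    return ticket;
}
```

In a disassembly, the increment would show up as a read and write of the same hard-coded address on every call, just like the access to x in the listing above.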

Signed and Unsigned Variables

Integer variables, such as int, char, short and long, may be declared signed or unsigned in the C source code. There are two main differences in how these variables are treated:

  1. Signed and unsigned variables share the same add and sub instructions, but several other operations come in two forms: signed variables use imul, idiv, movsx (sign extension) and sar (arithmetic right shift), while unsigned variables use mul, div, movzx (zero extension) and shr (logical right shift).
  2. Signed variables use signed branch instructions such as jge and jl. Unsigned variables use unsigned branch instructions such as jae and jb.

For add and sub, the integer result is exactly the same for signed and unsigned data. The instruction sets both the carry flag (CF), which is meaningful for unsigned values, and the overflow and sign flags (OF and SF), which are meaningful for signed values; the conditional branch that follows decides which interpretation is used. For example, jb and jae test CF, while jl and jge compare SF with OF.
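The following sketch shows both points from C. The function names are hypothetical, and the instruction choices mentioned in the comments are what an x86 compiler will typically emit, not something the C code itself guarantees.

```c
#include <stdint.h>

/* Addition: the same add instruction serves both types. The bit pattern
   0xFFFFFFFF is 4294967295 when unsigned and -1 when signed (two's
   complement), and adding 2 gives the bit pattern 1 either way. */
uint32_t add_as_unsigned(uint32_t x, uint32_t y) { return x + y; }
int32_t  add_as_signed(int32_t x, int32_t y)     { return x + y; }

/* Comparison: here the types matter. A compiler will typically use a
   signed condition (jl/jge, or setl) for the first function and an
   unsigned condition (jb/jae, or setb) for the second. */
int less_signed(int32_t x, int32_t y)     { return x < y; }
int less_unsigned(uint32_t x, uint32_t y) { return x < y; }
```

So less_signed(-1, 1) is true while less_unsigned(0xFFFFFFFF, 1) is false, even though both compare the same bit patterns with the same cmp instruction.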

Floating-Point Values

Floating-point values tend to be 32-bit data values (for float) or 64-bit data values (for double). They can be distinguished from ordinary integer variables because they are used with floating-point instructions. Floating-point instructions typically start with the letter f; for instance, fadd, fcom, and similar instructions operate on floating-point values. Of particular note are the fild instruction and its variants, which load an integer value from memory and convert it into a floating-point value.
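As a rough illustration (average is a hypothetical function), converting an integer to a double is exactly the kind of operation where fild appears. With x87 code generation a compiler can load sum with fild, which converts it while loading; code generated for SSE2 would use cvtsi2sd instead, so the exact instructions depend on compiler settings.

```c
#include <limits.h>

/* On x86, float and double are typically 32 and 64 bits
   (IEEE 754 single and double precision). */
enum {
    FLOAT_BITS  = sizeof(float) * CHAR_BIT,
    DOUBLE_BITS = sizeof(double) * CHAR_BIT
};

/* Integer-to-floating-point conversion: both sum and count are
   converted to double before the division. */
double average(int sum, int count)
{
    return (double)sum / count;
}
```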

We will discuss floating point variables in more detail in a later chapter.

Global Variables

Global variables do not have a limited scope like the local variables inside a function body do. Local (automatic) variables normally live on the system stack and disappear when their function returns, but global variables are not tied to any single function, so they are typically not found on the stack. Global variables tend to exist in the program at a hard-coded memory address, a location which never changes throughout program execution. They usually live in the .data or .bss section of the executable, or anywhere else that a hard-coded memory address can be used to store data.

In C, global variables are defined outside the body of any function; there is no "global" keyword. Any variable which is not defined inside a function is global. However, such a variable is, on its own, only visible in the particular source code file in which it is defined. For example, suppose we have two files, Foo.c and Bar.c, and a global variable MyGlobalVar:

Foo.c:

 int MyGlobalVar;

 int GetVarFoo(void)
 {
   //right!
   return MyGlobalVar;
 }

Bar.c:

 int GetVarBar(void)
 {
   //wrong!
   return MyGlobalVar;
 }

In the example above, the variable MyGlobalVar is visible inside the file Foo.c, but is not visible inside the file Bar.c. To make MyGlobalVar visible inside all project files, we need to use the extern keyword, which we will discuss below.

"static" Variables

The C programming language provides a special keyword, "static", to define variables which are lexical to the function (they cannot be referenced from outside the function) but which maintain their values across function calls. Unlike ordinary lexical variables, which are created on the stack when the function is entered and removed from the stack when the function returns, static variables are created once and are never destroyed.

 int MyFunction(void)
 {
   static int x;   /* created once, keeps its value between calls */
   /* ... */
 }

Static variables in C are global variables, except that the compiler takes precautions to prevent the variable from being accessed outside of the parent function's scope. Like global variables, static variables are referenced using a hard-coded memory address, not a location on the stack like ordinary local variables. However, unlike globals, static variables are only used inside a single function. In a disassembly, there is no difference between a global variable which is only used in a single function and a static variable inside that same function. Because it is good programming practice to limit the number of global variables, when disassembling you should prefer interpreting these variables as static rather than global.
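The sketch below shows two counters that a compiler will typically lay out the same way: each gets a fixed address (in .bss, since both start at zero) and each is touched by exactly one function. The names g_calls, CountGlobal and CountStatic are hypothetical; only the C scope rules tell the two apart.

```c
/* A global that happens to be used by only one function. */
int g_calls;

int CountGlobal(void)
{
    return ++g_calls;
}

/* A static local: visible by name only inside CountStatic. */
int CountStatic(void)
{
    static int calls;
    return ++calls;
}
```

In the compiled code, both functions would typically increment a value at a hard-coded address, so from the disassembly alone there is nothing to distinguish them.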

"extern" Variables

The extern keyword declares a variable whose definition lives in another source file, which lets a global variable be used across the entire project rather than in a single source code file. Apart from this distinction, and the larger lexical scope of extern variables, they should be treated like ordinary global variables.

In static libraries, variables marked as being extern might be available for use with programs which are linked to the library.
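The Foo.c/Bar.c example from the global variables discussion can be made to work with an extern declaration. This sketch puts both files in one listing so it compiles on its own; in a real project the definition would stay in Foo.c and the extern declaration would go in Bar.c (or a shared header). SetVarFoo is a hypothetical helper added for illustration.

```c
/* --- Bar.c --- */
extern int MyGlobalVar;   /* declaration only: the storage is defined elsewhere */

int GetVarBar(void)
{
    return MyGlobalVar;   /* now valid, because the name has been declared */
}

/* --- Foo.c --- */
int MyGlobalVar;          /* the single definition; references from Bar.c resolve here */

void SetVarFoo(int value)
{
    MyGlobalVar = value;
}
```

After linking, both functions refer to the same hard-coded address, which is why an extern variable looks like any other global in the disassembly.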

Global Variables Summary

The following summary compares the three kinds of global variables:

  • static variables
    How it's referenced: hard-coded memory address, used in only one function.
    Lexical scope: one function only.
    Notes: in disassembly, indistinguishable from global variables except that it's only used in one function. A hard-coded address can only be a static variable if it's never used in another function.
  • Global variables
    How it's referenced: hard-coded memory address, used in only one file.
    Lexical scope: one source code file only.
    Notes: global variables are only used in a single file. This can help you, when disassembling, to get a rough estimate of how the original source code was arranged.
  • extern variables
    How it's referenced: hard-coded memory address, used throughout the entire project.
    Lexical scope: the entire project.
    Notes: extern variables are available for use in all functions of a project, and in programs linked to the project (external libraries, for example).

When disassembling, a hard-coded memory address should be considered to be an ordinary global variable unless you can determine from the scope of the variable that it is static or extern.

Constants

Variables qualified with the const keyword (in C) are frequently stored in a read-only data section of the executable (for example .rdata with Microsoft compilers or .rodata with GCC), or sometimes in .data. Constant values can be distinguished because they are initialized at the beginning of the program and are never modified by the program itself. For this reason, some compilers may choose to store constant variables (especially strings) in the .text section of the executable, which allows these variables to be shared across multiple instances of the same process. This creates a big problem for the reverser, who now has to decide whether the bytes being examined are part of a constant variable or part of a subroutine.
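A typical candidate is a lookup table like the one sketched below (kDaysInMonth and DaysInMonth are hypothetical names). Because the table is const and never written, the compiler is free to emit it into a read-only section, or into .text with some compilers, rather than into .data. In a disassembly it shows up as a block of initialized bytes that is only ever read, never written.

```c
/* Read-only table: initialized once, never modified by the program. */
static const unsigned char kDaysInMonth[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* month is 1..12; returns 0 for out-of-range input.
   Leap years are ignored to keep the example short. */
unsigned DaysInMonth(unsigned month)
{
    if (month < 1 || month > 12)
        return 0;
    return kDaysInMonth[month - 1];
}
```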

"Volatile" memory

In C and C++, variables can be declared "volatile", which tells the compiler that the memory location can be accessed or changed by something outside the normal flow of the code (another thread, another process, or hardware), and that the compiler must therefore not optimize away, merge, or cache accesses to it. For instance, if multiple threads were all accessing and modifying a single global value, it would be bad for the compiler to keep that variable in a register for a while and write it back to memory only occasionally. In general, every read of a volatile variable must load it from memory and every write must store it to memory, so that the most current version of the data is in memory when other code comes to look for it.

It is not always possible to determine from a disassembly listing whether a given variable is a volatile variable. However, if the variable is accessed frequently from memory, and its value is constantly updated in memory (especially if there are free registers available), that's a good hint that the variable might be volatile.
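The pattern to look for is sketched below. The names are hypothetical, and a signal handler is used as the outside writer because volatile sig_atomic_t is the case the C standard specifically defines for such flags. Because g_stop_requested is volatile, the compiler must reload it from memory on every loop iteration instead of reading it once into a register, which is exactly the repeated memory access described above.

```c
#include <signal.h>

/* Flag that code outside this function (for example a signal handler)
   may set at any time. */
volatile sig_atomic_t g_stop_requested;

/* Counts loop iterations until the flag is set or `limit` is reached.
   Each test of the flag is a fresh load from memory. */
int spin_until_stop(int limit)
{
    int iterations = 0;
    while (!g_stop_requested && iterations < limit)
        iterations++;
    return iterations;
}

/* What a signal handler might do: a single store to the flag. */
void request_stop(void)
{
    g_stop_requested = 1;
}
```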

Simple Accessor Methods

An accessor method is a tool derived from object-oriented theory and practice. In its simplest form, an accessor method is a function that receives no parameters (or perhaps simply an offset) and returns the value of a variable. Accessor and setter methods are ways to restrict access to certain variables: the only standard way to get the value of the variable is to use the accessor.

Accessors can prevent some simple problems, such as out-of-bounds array indexing and the use of uninitialized data. In practice, however, accessors frequently contain little or no error checking.

Here is an example:

 push ebp
 mov ebp, esp
 mov eax, [ecx + 8] ;THISCALL function, passes "this" pointer in ecx
 mov esp, ebp
 pop ebp
 ret

Because they are so simple, accessor methods are frequently heavily optimized (they generally don't need a stack frame), and are even occasionally inlined by the compiler.
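For comparison, here is a C sketch of what the source of such an accessor might look like. The struct Widget and Widget_GetValue are hypothetical; the struct is laid out so that value sits at offset 8, matching the [ecx + 8] load in the listing, assuming a 4-byte int as on x86. The self parameter plays the role of the this pointer that a C++ thiscall method receives in ecx.

```c
#include <stddef.h>

struct Widget {
    int id;     /* offset 0 */
    int flags;  /* offset 4 */
    int value;  /* offset 8: the field the accessor returns */
};

/* Compile-time check that the layout matches the [ecx + 8] access. */
_Static_assert(offsetof(struct Widget, value) == 8,
               "value expected at offset 8");

/* Accessor: no error checking, just a single load from self + 8. */
int Widget_GetValue(const struct Widget *self)
{
    return self->value;
}
```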

Simple Setter (Manipulator) Methods

Setter methods are the counterpart of accessor methods, and provide a unified way of altering the value of a given variable. Setter methods will often take as a parameter the value to be assigned to the variable, although some methods (initializers) simply set the variable to a predefined value. Setter methods often do bounds checking and error checking on the value before it is set, and frequently either a) return no value, or b) return a simple boolean value to indicate success.

Here is an example:

 push ebp
 mov ebp, esp
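 cmp dword [ebp + 8], 0 ;reject a zero argument
 je error
 mov eax, [ebp + 8]
 mov [ecx + 0], eax ;THISCALL: store into the member at offset 0 of "this"
 mov eax, 1 ;return 1 (success)
 jmp end
 error:
 mov eax, 0 ;return 0 (failure)
 end:
 mov esp, ebp
 pop ebp
 ret 4 ;callee removes its single 4-byte stack argument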
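A C sketch of a setter with the same behavior might look like the following. The struct Gauge and Gauge_SetValue are hypothetical names: the struct's only member sits at offset 0, matching the [ecx + 0] store, and the function returns 1 or 0 in the same way the listing loads 1 or 0 into eax.

```c
struct Gauge {
    int value;  /* offset 0 */
};

/* Setter: rejects 0 (leaving the member unchanged), otherwise stores
   the new value. Returns 1 on success and 0 on failure. */
int Gauge_SetValue(struct Gauge *self, int new_value)
{
    if (new_value == 0)
        return 0;
    self->value = new_value;
    return 1;
}
```

Unlike the accessor, the setter carries a check and a success flag, which matches the description above of setters doing error checking and returning a simple boolean.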