x86 Disassembly/Objects and Classes
Object-Oriented Programming
Object-Oriented (OO) programming gives us a new unit of program structure to contend with: the object. This chapter looks at disassembled classes from C++. It does not deal directly with COM, but it lays much of the groundwork for later discussions of reversing COM components (Windows only).
Classes
A basic class that has not inherited anything can be broken into two parts: the variables and the methods. The non-static variables are placed into a simple data structure, while the methods are compiled and called like any other function.
When you start adding in inheritance and polymorphism, things get a little more complicated. For the purposes of simplicity, the structure of an object will be described in terms of having no inheritance. At the end, however, inheritance and polymorphism will be covered.
Variables
All static variables defined in a class reside in the static region of memory for the entire duration of the application. Every other variable defined in the class is placed into a data structure known as an object. Typically, when the constructor is called, the variables are placed into the object in sequential order; see Figure 1.
A:
class ABC123 {
public:
int a, b, c;
ABC123():a(1), b(2), c(3) {};
};
B:
0x00200000 dd 1 ;int a
0x00200004 dd 2 ;int b
0x00200008 dd 3 ;int c
Figure 1: An example of what an object looks like in memory
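The layout in Figure 1 can be checked from C++ itself. The following is a minimal sketch, assuming a typical x86 compiler where int is 4 bytes wide; the class ABC123WithStatic and the function check_figure1_layout are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

class ABC123 {
public:
    int a, b, c;
    ABC123() : a(1), b(2), c(3) {}
};

// Hypothetical variant: the same members plus one static variable.
class ABC123WithStatic {
public:
    static int instances;   // kept in static memory, not inside each object
    int a, b, c;
    ABC123WithStatic() : a(1), b(2), c(3) {}
};

// Checks the sequential layout; assumes int is 4 bytes wide.
void check_figure1_layout() {
    assert(offsetof(ABC123, a) == 0);
    assert(offsetof(ABC123, b) == 4);
    assert(offsetof(ABC123, c) == 8);
    // The static member adds nothing to the size of an object.
    assert(sizeof(ABC123WithStatic) == sizeof(ABC123));
}
```

Calling check_figure1_layout() from main should pass silently on such a compiler; a failed assertion would indicate a different int size or layout rule.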
However, the compiler typically aligns each variable on a boundary that suits its type, and the figures here measure the resulting offsets in multiples of a word (2 bytes). Not all variables fit this requirement, notably char arrays, so some unused bytes might be inserted to pad the variables until the next one starts on a suitable boundary. This is illustrated in Figure 2.
A:
class ABC123{
public:
int a;
char b[3];
double c;
ABC123():a(1),c(3) { strcpy(b,"02"); };
};
B:
0x00200000 dd 1 ;int a ; offset = abc123 + 0*word_size
0x00200004 db '0' ;b[0] = '0' ; offset = abc123 + 2*word_size
0x00200005 db '2' ;b[1] = '2'
0x00200006 db 0 ;b[2] = null
0x00200007 db 0 ;<= UNUSED BYTE
0x00200008 dd 0x00000000 ;double c, lower 32 bits ; offset = abc123 + 4*word_size
0x0020000C dd 0x40080000 ;double c, upper 32 bits
Figure 2: An example of an object having a padded variable
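The padding byte in Figure 2 can be measured rather than guessed. This is a sketch only, assuming a 4-byte int and a compiler that places the double on the next 4-byte or 8-byte boundary, as common x86 compilers do; padding_after_b and check_figure2_layout are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::strcpy

class ABC123 {
public:
    int a;
    char b[3];
    double c;
    ABC123() : a(1), c(3) { std::strcpy(b, "02"); }
};

// Number of unused bytes between the end of b and the start of c.
std::size_t padding_after_b() {
    return offsetof(ABC123, c) - (offsetof(ABC123, b) + sizeof(ABC123::b));
}

// Checks the offsets shown in Figure 2 (assumes a 4-byte int).
void check_figure2_layout() {
    assert(offsetof(ABC123, a) == 0);
    assert(offsetof(ABC123, b) == 4);
    assert(offsetof(ABC123, c) == 8);
    assert(padding_after_b() == 1);   // the single unused byte
}
```

When reversing, the same reasoning runs backwards: a gap between two accessed offsets that no instruction ever touches is often padding rather than a hidden member.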
In order for the application to access one of these object variables, an object pointer needs to be offset to find the desired variable. The offset of every variable is known by the compiler and written into the object code wherever it's needed. Figure 3 shows how to offset a pointer to retrieve variables.
;abc123 = register holding the pointer to the object
mov eax, [abc123] ;eax = a ;offset = abc123+0*word_size = abc123
mov ebx, [abc123+4] ;ebx = b ;offset = abc123+2*word_size = abc123+4
mov ecx, [abc123+8] ;ecx = c ;offset = abc123+4*word_size = abc123+8
Figure 3: This shows how to offset a pointer to retrieve variables, assuming the three-int class from Figure 1. Each line loads the value stored at the given offset: 'a' into eax, 'b' into ebx, and 'c' into ecx. Using lea in place of mov would yield the address of each variable instead of its value.
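In C++ terms, the offsetting in Figure 3 amounts to adding a constant number of bytes to the object pointer and then accessing the memory found there. The sketch below assumes the three-int class from Figure 1; load_int_at and check_offset_loads are hypothetical helpers, not something a compiler emits under those names.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

class ABC123 {
public:
    int a, b, c;
    ABC123() : a(1), b(2), c(3) {}
};

// Reads the int stored `offset` bytes past the start of the object.
int load_int_at(const ABC123* obj, std::size_t offset) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(obj);
    int value;
    std::memcpy(&value, base + offset, sizeof value);  // safe unaligned-agnostic read
    return value;
}

// Each member is reachable from the object pointer plus a fixed offset.
void check_offset_loads() {
    ABC123 obj;
    assert(load_int_at(&obj, offsetof(ABC123, a)) == obj.a);
    assert(load_int_at(&obj, offsetof(ABC123, b)) == obj.b);
    assert(load_int_at(&obj, offsetof(ABC123, c)) == obj.c);
}
```

The compiler does the same arithmetic at compile time, which is why only the final constant offsets appear in the disassembly.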
Methods
At a low level, there is almost no difference between a function and a method. When decompiling, it can sometimes be hard to tell the difference between the two. Both reside in the text (code) segment, and both are called the same way. An example of how a method is called can be seen in Figure 4.
A:
//method call
abc123->foo(1, 2, 3);
B:
push 3 ; int c
push 2 ; int b
push 1 ; int a
push [ebp-4] ; the address of the object
call 0x00434125 ; call to method
Figure 4: A method call.
A notable characteristic of a method call is that the address of the object is passed in as an argument. This, however, is not always a good indicator. Figure 5 shows a function whose first argument is an object passed in by reference. The result is a function call that looks identical to a method call. Calling conventions also vary: Figure 4 passes the object pointer on the stack, whereas Microsoft's thiscall convention passes it in ecx.
A:
//function call
foo(abc123, 1, 2, 3);
B:
push 3 ; int c
push 2 ; int b
push 1 ; int a
push [ebp-4] ; the address of the object
call 0x00498372 ; call to function
Figure 5: A function call.
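The similarity between Figures 4 and 5 follows from how a method receives its object. The sketch below is illustrative only: the bodies of both foo variants and the member total are hypothetical, since the original figures show only the call sites.

```cpp
#include <cassert>

class ABC123 {
public:
    int total;
    ABC123() : total(0) {}
    // Method: the object pointer (this) is an implicit extra argument.
    int foo(int a, int b, int c) {
        total = a + b + c;
        return total;
    }
};

// Free function: the object pointer is an explicit first argument.
int foo(ABC123* self, int a, int b, int c) {
    self->total = a + b + c;
    return self->total;
}

// Both forms receive the same four pieces of data and do the same work.
void check_method_vs_function() {
    ABC123 x, y;
    assert(x.foo(1, 2, 3) == foo(&y, 1, 2, 3));
    assert(x.total == y.total);
}
```

Because the two forms carry the same information, the call site alone rarely settles whether the original source used a method or a function.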
Inheritance & Polymorphism
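Inheritance and polymorphism completely change the structure of a class: the object no longer contains just variables, it also contains pointers to the inherited methods. This is because polymorphism requires the address of a method or inner object to be determined at runtime. In practice, compilers such as GCC and MSVC usually store a single hidden pointer to a per-class table of method addresses (the vtable) rather than the method addresses themselves; the figures in this section fold that table into the object for simplicity.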
Take Figure 6 into consideration. How does the application know to call D::one or C::one? The answer is that the compiler figures out a convention in which to order variables and method pointers inside the object such that when they're referenced, the offsets are the same for any object that has inherited its methods and variables.
A *obj[2];
obj[0] = new C();
obj[1] = new D();
for(int i=0; i<2; i++)
obj[i]->one();
Figure 6: A small C++ polymorphic loop that calls a method, one. The classes C and D both inherit from an abstract class, A. For this code to work, class A must have a virtual method named "one".
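For experimenting with a compiler and a disassembler, the loop in Figure 6 can be wrapped into a complete program fragment. This is a minimal sketch: the method bodies, the global call_log, the virtual destructor, and the function names run_loop and check_dispatch are assumptions added so the effect of the dispatch can be observed.

```cpp
#include <cassert>
#include <string>

std::string call_log;   // hypothetical: records which override actually ran

class A {
public:
    virtual ~A() {}
    virtual void one() = 0;
};

class C : public A {
public:
    void one() { call_log += "C::one "; }
};

class D : public A {
public:
    void one() { call_log += "D::one "; }
};

// The loop from Figure 6, followed by cleanup of the two objects.
void run_loop() {
    A* obj[2];
    obj[0] = new C();
    obj[1] = new D();
    for (int i = 0; i < 2; i++)
        obj[i]->one();          // target is selected at run time
    for (int i = 0; i < 2; i++)
        delete obj[i];
}

void check_dispatch() {
    call_log.clear();
    run_loop();
    assert(call_log == "C::one D::one ");
}
```

The call inside the loop compiles to a single indirect call, even though two different functions end up running.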
The abstract class A acts as a blueprint for the compiler, defining an expected structure for any class that inherits from it. Every variable defined in class A and every virtual method defined in A will have the exact same offset in any of its children. Figure 7 declares a possible inheritance scheme as well as its structure in memory. Notice how the offset to C::one is the same as the offset to D::one, and the offset to C's copy of A::a is the same as the offset to D's copy. In this way, our polymorphic loop can simply iterate through the array of pointers and know exactly where to find each method.
A:
class A{
public:
int a;
virtual void one() = 0;
};
class B{
public:
int b;
int c;
virtual void two() = 0;
};
class C: public A{
public:
int d;
void one();
};
class D: public A, public B{
public:
int e;
void one();
void two();
};
B:
;Object C
0x00200000 dd 0x00423848 ; address of C::one ;offset = 0*word_size
0x00200004 dd 1 ; C's copy of A::a ;offset = 2*word_size
0x00200008 dd 4 ; C::d ;offset = 4*word_size
;Object D
0x00200100 dd 0x00412348 ; address of D::one ;offset = 0*word_size
0x00200104 dd 1 ; D's copy of A::a ;offset = 2*word_size
0x00200108 dd 0x00431255 ; address of D::two ;offset = 4*word_size
0x0020010C dd 2 ; D's copy of B::b ;offset = 6*word_size
0x00200110 dd 3 ; D's copy of B::c ;offset = 8*word_size
0x00200114 dd 5 ; D::e ;offset = 10*word_size
Figure 7: A polymorphic inheritance scheme.
Figure 7.A defines the inheritance scheme. It shows that class C inherits from class A, and class D inherits from class A and class B.
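The fixed-offset idea behind Figure 7 can be imitated by hand with a function pointer stored at the start of a structure. This is a sketch under the same simplification as the figure (the method address sits directly in the object); Obj, OneFn, C_one, D_one, and call_one are hypothetical names, and the return values are invented purely so the two implementations can be told apart.

```cpp
#include <cassert>

struct Obj;                        // hand-rolled stand-in for a polymorphic object
typedef int (*OneFn)(Obj*);        // type of the slot that holds "one"

struct Obj {
    OneFn one;   // first member: address of this class's version of one()
    int a;       // the copy of A::a, at the same offset in every subclass
};

int C_one(Obj* self) { return self->a + 100; }   // plays the role of C::one
int D_one(Obj* self) { return self->a + 200; }   // plays the role of D::one

// One code path for every object: load the slot, then call through it.
int call_one(Obj* obj) {
    return obj->one(obj);
}

void check_manual_dispatch() {
    Obj c = { C_one, 1 };
    Obj d = { D_one, 1 };
    assert(call_one(&c) == 101);
    assert(call_one(&d) == 201);
}
```

In a disassembly this pattern shows up as a load from a fixed offset followed by an indirect call, with the object pointer passed along as the first argument.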