x86 Disassembly/Introduction
What Is This Book About?
This book is about the disassembly of x86 machine code into human-readable assembly, and the decompilation of x86 assembly code into human-readable C or C++ source code. Some topics covered will be common to all computer architectures, not just x86-compatible machines.
What Will This Book Cover?
This book looks in depth at the disassembly and decompilation of x86 machine code and assembly code. We will look at the way programs are built using assemblers and compilers, and examine how assembly code is generated from C or C++ source code. Using this knowledge, we will try to reverse the process. By examining common structures, such as data and control structures, we can find patterns that enable us to disassemble and decompile programs quickly.
Who Is This Book For?
This book is for readers at the undergraduate level with experience programming in x86 Assembly and C or C++. This book is not designed to teach assembly language programming, C or C++ programming, or compiler/assembler theory.
What Are The Prerequisites?
The reader should have a thorough understanding of x86 Assembly, C Programming, and possibly C++ Programming. This book is intended to increase the reader's understanding of the relationship between x86 machine code, x86 Assembly Language, and the C Programming Language. If you are not familiar with these topics, you may want to review the books on them before continuing.
What is Disassembly?
Computer programs are originally written in a human-readable form, such as assembly language or a high-level language. These programs are then assembled or compiled into a binary format called machine code. This binary format is not directly readable or understandable by humans. For many programs -- such as malware, proprietary commercial programs, or very old legacy programs -- the source code may not be available to you.
Programs frequently perform tasks that need to be duplicated, or need to be made to interact with other programs. Without the source code and without adequate documentation, these tasks can be difficult to accomplish. This book outlines tools and techniques for attempting to convert the raw machine code of an executable file into equivalent code in assembly language and the high-level languages C and C++. With the high-level code to perform a particular task, several things become possible:
- Programs can be ported to new computer platforms, by compiling the source code in a different environment.
- The algorithm used by a program can be determined. This allows other programs to use the same algorithm, and allows updated versions of a program to be written without needing to track down old copies of the source code.
- Security holes and vulnerabilities can be identified and patched by users without needing access to the original source code.
- New interfaces can be implemented for old programs. New components can be built on top of old components to speed development time and reduce the need to rewrite large volumes of code.
- The behavior of a piece of malware can be determined, which may in turn suggest how to block its harmful effects. Unfortunately, some malware writers use self-modifying code techniques (polymorphic camouflage, XOR encryption, scrambling)[1], apparently to make the malware difficult even to detect, much less disassemble.
Disassembling code has a large number of practical uses. One positive side effect is that the reader will gain a better understanding of the relationship between machine code, assembly language, and high-level languages. A good knowledge of these topics helps programmers produce code that is more efficient and more secure.