x86 Disassembly/Windows Executable Files

MS-DOS COM Files edit

COM files are loaded into RAM exactly as they appear; no change is made at all from the harddisk image to RAM. This is possible due to the segmented memory model of the early x86 line. Two 16-bit registers determine the actual address used for a memory access, a “segment” register specifying a 64K byte window into the 1M+64K byte space (in 16-byte increments) and an “offset” specifying an offset into that window. The segment register would be set by DOS and the COM file would be expected to respect this setting and not ever change the segment registers. The offset registers, however, were fair game and served (for COM files) the same purpose as a modern 32-bit register. The downside was that the offset registers were only 16-bit and, therefore, since COM files could not change the segment registers, COM files were limited to using 64K of RAM. The good thing about this approach, however, was that no extra work was needed by DOS to load and run a COM file: just load the file, set the segment register, and jump to it. (The programs could perform 'near' jumps by just giving an offset to jump to.)

COM files are loaded into RAM at offset $100. The space before that would be used for passing data to and from DOS (for example, the contents of the command line used to invoke the program).

Note that COM files, by definition, cannot be 32-bit. Windows provides support for COM files via a special CPU mode.

MS-DOS EXE Files edit

One way MS-DOS compilers got around the 64K memory limitation was with the introduction of memory models. The basic concept is to cleverly set different segment registers in the x86 CPU (CS, DS, ES, SS) to point to the same or different segments, thus allowing varying degrees of access to memory. Typical memory models were:

tiny
All memory accesses are 16-bit (segment registers unchanged). Produces a .COM file instead of an .EXE file.
small
All memory accesses are 16-bit (segment registers unchanged).
compact
Data addresses include both segment and offset, reloading the DS or ES registers on access and allowing up to 1M of data. Code accesses don't change the CS register, allowing 64K of code.
medium
Code addresses include the segment address, reloading CS on access and allowing up to 1M of code. Data accesses don't change the DS and ES registers, allowing 64K of data.
large
Both code and data addresses are (segment, offset) pairs, always reloading the segment addresses. The whole 1M byte memory space is available for both code and data.
huge
Same as the large model, with additional arithmetic being generated by the compiler to allow access to arrays larger than 64K.

When looking at an EXE file, one has to decide which memory model was used to build that file.

PE Files edit

A Portable Executable (PE) file is the standard binary file format for an Executable or DLL under Windows NT, Windows 95, and Win32. The Win32 SDK contains a file, winnt.h, which declares various structs and variables used in the PE files. Some functions for manipulating PE files are also included in imagehlp.dll. PE files are broken down into various sections which can be examined.

Relative Virtual Addressing (RVA) edit

In a Windows environment, executable modules can be loaded at any point in memory, and are expected to run without problem. To allow multiple programs to be loaded at seemingly random locations in memory, PE files have adopted a tool called RVA: Relative Virtual Addresses. RVAs assume that the "base address" of where a module is loaded into memory is not known at compile time. So, PE files describe the location of data in memory as an offset from the base address, wherever that may be in memory.

Some processor instructions require the code itself to directly identify where in memory some data is. This is not possible when the location of the module in memory is not known at compile time. The solution to this problem is described in the section on "Relocations".

It is important to remember that the addresses obtained from a disassembly of a module will not always match up to the addresses seen in a debugger as the program is running.

File Format edit

The PE portable executable file format includes a number of informational headers, and is arranged in the following format:

 

The basic format of a Microsoft PE file

MS-DOS header edit

Open any Win32 binary executable in a hex editor, and note what you see: The first 2 letters are always the letters "MZ", the initials of Mark Zbikowski, who created the first linker for DOS. To some people, the first few bytes in a file that determine the type of file are called the "magic number," although there is no rule that states that the "magic number" needs to be a single number. Instead, we will use the term "File ID Tag", or simply, File ID. Sometimes this is also known as File Signature.

After the File ID, the hex editor will show several bytes of either random-looking symbols, or whitespace, before the human-readable string "This program cannot be run in DOS mode".

What is this?

 

Hex Listing of an MS-DOS file header

What you are looking at is the MS-DOS header of the Win32 PE file. To ensure either a) backwards compatibility, or b) graceful decline of new file types, Microsoft has written a series of machine instructions(an example program is listed below the DOS header structure) into the head of each PE file. When a 32-bit Windows file is run in a 16-bit DOS environment, the program will display the error message: "This program cannot be run in DOS mode.", then terminate.

The DOS header is also known by some as the EXE header. Here is the DOS header presented as a C data structure:

 struct DOS_Header 
 {
// short is 2 bytes, long is 4 bytes
     char signature[2] = { 'M', 'Z' };
     short lastsize;
     short nblocks;
     short nreloc;
     short hdrsize;
     short minalloc;
     short maxalloc;
     void *ss; // 2 byte value
     void *sp; // 2 byte value
     short checksum;
     void *ip; // 2 byte value
     void *cs; // 2 byte value
     short relocpos;
     short noverlay;
     short reserved1[4];
     short oem_id;
     short oem_info;
     short reserved2[10];
     long  e_lfanew; // Offset to the 'PE\0\0' signature relative to the beginning of the file
 }

After the DOS header there is a stub program mentioned in the paragraph above the DOS header structure. Listed below is a commented example of that program, it was taken from a program compiled with GCC.

;# Using NASM with Intel syntax

push cs ;# Push CS onto the stack
pop ds  ;# Set DS to CS

mov dx,message ; point to our message "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'
mov ah, 09  
int 0x21 ;# when AH = 9, DOS interrupt to write a string

;# terminate the program
mov ax,0x4c01 
int 0x21

message db "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'

PE Header edit

At offset 60 (0x3C) from the beginning of the DOS header is a pointer to the Portable Executable (PE) File header (e_lfanew in MZ structure). DOS will print the error message and terminate, but Windows will follow this pointer to the next batch of information.

 

Hex Listing of a PE signature, and the pointer to it

The PE header consists only of a File ID signature, with the value "PE\0\0" where each '\0' character is an ASCII NUL character. This signature indicates a) that this file is a legitimate PE file, and b) the byte order of the file. Byte order will not be considered in this chapter, and all PE files are assumed to be in "little endian" format.

The first big chunk of information lies in the COFF header, directly after the PE signature.

COFF Header edit

The COFF header is present in both COFF object files (before they are linked) and in PE files where it is known as the "File header". The COFF header has some information that is useful to an executable, and some information that is more useful to an object file.

Here is the COFF header, presented as a C data structure:

 struct COFFHeader
 {
    short Machine;
    short NumberOfSections;
    long TimeDateStamp;
    long PointerToSymbolTable;
    long NumberOfSymbols;
    short SizeOfOptionalHeader;
    short Characteristics;
 }
Machine
This field determines what machine the file was compiled for. A hex value of 0x14C (332 in decimal) is the code for an Intel 80386.

Here's a list of possible values it can have.

Value Description
0x14c Intel 386
0x8664 x64
0x162 MIPS R3000
0x168 MIPS R10000
0x169 MIPS little endian WCI v2
0x183 old Alpha AXP
0x184 Alpha AXP
0x1a2 Hitachi SH3
0x1a3 Hitachi SH3 DSP
0x1a6 Hitachi SH4
0x1a8 Hitachi SH5
0x1c0 ARM little endian
0x1c2 Thumb
0x1c4 ARMv7 (Thumb-2)
0x1d3 Matsushita AM33
0x1f0 PowerPC little endian
0x1f1 PowerPC with floating point support
0x1f2 PowerPC 64-bit little endian
0x200 Intel IA64
0x266 MIPS16
0x268 Motorola 68000 series
0x284 Alpha AXP 64-bit
0x366 MIPS with FPU
0x466 MIPS16 with FPU
0xebc EFI Byte Code
0x8664 AMD AMD64
0x9041 Mitsubishi M32R little endian
0xaa64 ARM64 little endian
0xc0ee clr pure MSIL
NumberOfSections
The number of sections that are described at the end of the PE headers.
TimeDateStamp
32 bit time at which this header was generated: is used in the process of "Binding", see below.
SizeOfOptionalHeader
this field shows how long the "PE Optional Header" is that follows the COFF header.
Characteristics
This is a field of bit flags, that show some characteristics of the file.
Constant Name Bit Position / Mask Description
IMAGE_FILE_RELOCS_STRIPPED 1 / 0x0001 Relocation information was stripped from file
IMAGE_FILE_EXECUTABLE_IMAGE 2 / 0x0002 The file is executable
IMAGE_FILE_LINE_NUMS_STRIPPED 3 / 0x0004 COFF line numbers were stripped from file
IMAGE_FILE_LOCAL_SYMS_STRIPPED 4 / 0x0008 COFF symbol table entries were stripped from file
IMAGE_FILE_AGGRESIVE_WS_TRIM 5 / 0x0010 Aggressively trim the working set(obsolete)
IMAGE_FILE_LARGE_ADDRESS_AWARE 6 / 0x0020 The application can handle addresses greater than 2 GB
IMAGE_FILE_BYTES_REVERSED_LO 8 / 0x0080 The bytes of the word are reversed(obsolete)
IMAGE_FILE_32BIT_MACHINE 9 / 0x0100 The computer supports 32-bit words
IMAGE_FILE_DEBUG_STRIPPED 10 / 0x0200 Debugging information was removed and stored separately in another file
IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP 11 / 0x0400 If the image is on removable media, copy it to and run it from the swap file
IMAGE_FILE_NET_RUN_FROM_SWAP 12 / 0x0800 If the image is on the network, copy it to and run it from the swap file
IMAGE_FILE_SYSTEM 13 / 0x1000 The image is a system file
IMAGE_FILE_DLL 14 / 0x2000 The image is a DLL file
IMAGE_FILE_UP_SYSTEM_ONLY 15 / 0x4000 The image should only be ran on a single processor computer
IMAGE_FILE_BYTES_REVERSED_HI 16 / 0x8000 The bytes of the word are reversed(obsolete)

PE Optional Header edit

The "PE Optional Header" is not "optional" per se, because it is required in Executable files, but not in COFF object files. There are two different versions of the optional header depending on whether or not the file is 64 bit or 32 bit. The Optional header includes lots and lots of information that can be used to pick apart the file structure, and obtain some useful information about it.

The PE Optional Header occurs directly after the COFF header, and some sources even show the two headers as being part of the same structure. This wikibook separates them out for convenience.

Here is the 64 bit PE Optional Header presented as a C data structure:

 struct PEOptHeader
 {
/* 64 bit version of the PE Optional Header also known as IMAGE_OPTIONAL_HEADER64
char is 1 byte
short is 2 bytes
long is 4 bytes
long long is 8 bytes
*/
    short signature; //decimal number 267 for 32 bit, 523 for 64 bit, and 263 for a ROM image. 
    char MajorLinkerVersion; 
    char MinorLinkerVersion;
    long SizeOfCode;
    long SizeOfInitializedData;
    long SizeOfUninitializedData;
    long AddressOfEntryPoint;  //The RVA of the code entry point
    long BaseOfCode;
    /*The next 21 fields are an extension to the COFF optional header format*/
    long long ImageBase;
    long SectionAlignment;
    long FileAlignment;
    short MajorOSVersion;
    short MinorOSVersion;
    short MajorImageVersion;
    short MinorImageVersion;
    short MajorSubsystemVersion;
    short MinorSubsystemVersion;
    long Win32VersionValue;
    long SizeOfImage;
    long SizeOfHeaders;
    long Checksum;
    short Subsystem;
    short DLLCharacteristics;
    long long SizeOfStackReserve;
    long long SizeOfStackCommit;
    long long SizeOfHeapReserve;
    long long SizeOfHeapCommit;
    long LoaderFlags;
    long NumberOfRvaAndSizes;
    data_directory DataDirectory[NumberOfRvaAndSizes];     //Can have any number of elements, matching the number in NumberOfRvaAndSizes.
 }                                        //However, it is always 16 in PE files.

This is the 32 bit version of the PE Optional Header presented as a C data structure:

 struct PEOptHeader
 {
/* 32 bit version of the PE Optional Header also known as IMAGE_OPTIONAL_HEADER
char is 1 byte
short is 2 bytes
long is 4 bytes
*/
    short signature; //decimal number 267 for 32 bit, 523 for 64 bit, and 263 for a ROM image. 
    char MajorLinkerVersion; 
    char MinorLinkerVersion;
    long SizeOfCode;
    long SizeOfInitializedData;
    long SizeOfUninitializedData;
    long AddressOfEntryPoint;  //The RVA of the code entry point
    long BaseOfCode;
    long BaseOfData;
    /*The next 21 fields are an extension to the COFF optional header format*/
    long ImageBase;
    long SectionAlignment;
    long FileAlignment;
    short MajorOSVersion;
    short MinorOSVersion;
    short MajorImageVersion;
    short MinorImageVersion;
    short MajorSubsystemVersion;
    short MinorSubsystemVersion;
    long Win32VersionValue;
    long SizeOfImage;
    long SizeOfHeaders;
    long Checksum;
    short Subsystem;
    short DLLCharacteristics;
    long SizeOfStackReserve;
    long SizeOfStackCommit;
    long SizeOfHeapReserve;
    long SizeOfHeapCommit;
    long LoaderFlags;
    long NumberOfRvaAndSizes;
    data_directory DataDirectory[NumberOfRvaAndSizes];     //Can have any number of elements, matching the number in NumberOfRvaAndSizes.
 }                                        //However, it is always 16 in PE files.

This is the data_directory(also known as IMAGE_DATA_DIRECTORY) structure as found in the two structures above:

/*
long is 4 bytes
*/
 struct data_directory
 { 
    long VirtualAddress;
    long Size;
 }
Signature
Contains a signature that identifies the image.
Constant Name Value Description
IMAGE_NT_OPTIONAL_HDR32_MAGIC 0x10b 32 bit executable image.
IMAGE_NT_OPTIONAL_HDR64_MAGIC 0x20b 64 bit executable image
IMAGE_ROM_OPTIONAL_HDR_MAGIC 0x107 ROM image
MajorLinkerVersion
The major version number of the linker.
MinorLinkerVersion
The minor version number of the linker.
SizeOfCode
The size of the code section, in bytes, or the sum of all such sections if there are multiple code sections.
SizeOfInitializedData
The size of the initialized data section, in bytes, or the sum of all such sections if there are multiple initialized data sections.
SizeOfUninitializedData
The size of the uninitialized data section, in bytes, or the sum of all such sections if there are multiple uninitialized data sections.
AddressOfEntryPoint
A pointer to the entry point function, relative to the image base address. For executable files, this is the starting address. For device drivers, this is the address of the initialization function. The entry point function is optional for DLLs. When no entry point is present, this member is zero.
BaseOfCode
A pointer to the beginning of the code section, relative to the image base.
BaseOfData
A pointer to the beginning of the data section, relative to the image base.
ImageBase
The preferred address of the first byte of the image when it is loaded in memory. This value is a multiple of 64K bytes. The default value for DLLs is 0x10000000. The default value for applications is 0x00400000, except on Windows CE where it is 0x00010000.
SectionAlignment
The alignment of sections loaded in memory, in bytes. This value must be greater than or equal to the FileAlignment member. The default value is the page size for the system.
FileAlignment
The alignment of the raw data of sections in the image file, in bytes. The value should be a power of 2 between 512 and 64K (inclusive). The default is 512. If the SectionAlignment member is less than the system page size, this member must be the same as SectionAlignment.
MajorOSVersion
The major version number of the required operating system.
MinorOSVersion
The minor version number of the required operating system.
MajorImageVersion
The major version number of the image.
MinorImageVersion
The minor version number of the image.
MajorSubsystemVersion
The major version number of the subsystem.
MinorSubsystemVersion
The minor version number of the subsystem.
Win32VersionValue
This member is reserved and must be 0.
SizeOfImage
The size of the image, in bytes, including all headers. Must be a multiple of SectionAlignment.
SizeOfHeaders
The combined size of the following items, rounded to a multiple of the value specified in the FileAlignment member.
  • e_lfanew member of DOS_Header
  • 4 byte signature
  • size of COFFHeader
  • size of optional header
  • size of all section headers
CheckSum
The image file checksum. The following files are validated at load time: all drivers, any DLL loaded at boot time, and any DLL loaded into a critical system process.
Subsystem
The Subsystem that will be invoked to run the executable
Constant Name Value Description
IMAGE_SUBSYSTEM_UNKNOWN 0 Unknown subsystem
IMAGE_SUBSYSTEM_NATIVE 1 No subsystem required (device drivers and native system processes)
IMAGE_SUBSYSTEM_WINDOWS_GUI 2 Windows graphical user interface (GUI) subsystem
IMAGE_SUBSYSTEM_WINDOWS_CUI 3 Windows character-mode user interface (CUI) subsystem
IMAGE_SUBSYSTEM_OS2_CUI 5 OS/2 CUI subsystem
IMAGE_SUBSYSTEM_POSIX_CUI 7 POSIX CUI subsystem
IMAGE_SUBSYSTEM_WINDOWS_CE_GUI 9 Windows CE system
IMAGE_SUBSYSTEM_EFI_APPLICATION 10 Extensible Firmware Interface (EFI) application
IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER 11 EFI driver with boot services
IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER 12 EFI driver with run-time services
IMAGE_SUBSYSTEM_EFI_ROM 13 EFI ROM image
IMAGE_SUBSYSTEM_XBOX 14 Xbox system
IMAGE_SUBSYSTEM_WINDOWS_BOOT_APPLICATION 16 Boot application
DLLCharacteristics
The DLL characteristics of the image
Constant Name Value Description
No constant name 0x0001 Reserved
No constant name 0x0002 Reserved
No constant name 0x0004 Reserved
No constant name 0x0008 Reserved
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE 0x0040 The DLL can be relocated at load time
IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY 0x0080 Code integrity checks are forced
IMAGE_DLLCHARACTERISTICS_NX_COMPAT 0x0100 The image is compatible with data execution prevention (DEP)
IMAGE_DLLCHARACTERISTICS_NO_ISOLATION 0x0200 The image is isolation aware, but should not be isolated
IMAGE_DLLCHARACTERISTICS_NO_SEH 0x0400 The image does not use structured exception handling (SEH). No handlers can be called in this image
IMAGE_DLLCHARACTERISTICS_NO_BIND 0x0800 Do not bind the image
IMAGE_DLLCHARACTERISTICS_APPCONTAINER 0x1000 The image must be executed within an App container
IMAGE_DLLCHARACTERISTICS_WDM_DRIVER 0x2000 A WDM driver
No constant name 0x4000 Reserved
IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE 0x8000 The image is terminal server aware
SizeOfStackReserve
The number of bytes to reserve for the stack. Only the memory specified by the SizeOfStackCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfStackCommit
The number of bytes to commit for the stack.
SizeOfHeapReserve
The number of bytes to reserve for the local heap. Only the memory specified by the SizeOfHeapCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfHeapCommit
The number of bytes to commit for the local heap.
LoaderFlags
This member is obsolete.
NumberOfRvaAndSizes
The number of directory entries in the remainder of the optional header. Each entry describes a location and size.
DataDirectory
Possibly the most interesting member of this structure. Provides RVAs and sizes which locate various data structures, which are used for setting up the execution environment of a module. The data structures that the array of DataDirectory points to can be found in the various sections of the file as pointed to by the Section Table. The details of what these structures do exist in other sections of this page. The most interesting entries in DataDirectory are as follows: Export Directory, Import Directory, Resource Directory, and the Bound Import directory. The .NET descriptor table (CLI Header) contains the metadata of the .NET assembly, the table is an IMAGE_COR20_HEADER structure, which is defined in winnt.h. Note that the offsets in bytes are relative to the beginning of the optional header.
Constant Name Value Description Offset PE(32 bit) Offset PE32+(64 bit)
IMAGE_DIRECTORY_ENTRY_EXPORT 0 Export Directory 96 112
IMAGE_DIRECTORY_ENTRY_IMPORT 1 Import Directory 104 120
IMAGE_DIRECTORY_ENTRY_RESOURCE 2 Resource Directory 112 128
IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 Exception Directory 120 136
IMAGE_DIRECTORY_ENTRY_SECURITY 4 Security Directory 128 144
IMAGE_DIRECTORY_ENTRY_BASERELOC 5 Base Relocation Table 136 152
IMAGE_DIRECTORY_ENTRY_DEBUG 6 Debug Directory 144 160
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 Architecture specific data 152 168
IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 Global pointer register relative virtual address 160 176
IMAGE_DIRECTORY_ENTRY_TLS 9 Thread Local Storage directory 168 184
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 Load Configuration directory 176 192
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 Bound Import directory 184 200
IMAGE_DIRECTORY_ENTRY_IAT 12 Import Address Table 192 208
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 Delay Import table 200 216
IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 COM or .net descriptor table (CLI Header) 208 224
No constant name 15 Reserved 216 232

Section Table edit

Immediately after the PE Optional Header we find a section table. The section table consists of an array of IMAGE_SECTION_HEADER structures. The number of structures that we find in the file are determined by the member NumberOfSections in the COFF Header. Each structure is 40 bytes in length. Pictured below is a hex dump from a program I am writing depicting the section table:

 

The outlined areas correlate to the Name member of three IMAGE_SECTION_HEADER structures

The IMAGE_SECTION_HEADER defined as a C structure is as follows:

 struct IMAGE_SECTION_HEADER 
 {
// short is 2 bytes
// long is 4 bytes
  char  Name[IMAGE_SIZEOF_SHORT_NAME]; // IMAGE_SIZEOF_SHORT_NAME is 8 bytes
  union {
    long PhysicalAddress;
    long VirtualSize;
  } Misc;
  long  VirtualAddress;
  long  SizeOfRawData;
  long  PointerToRawData;
  long  PointerToRelocations;
  long  PointerToLinenumbers;
  short NumberOfRelocations;
  short NumberOfLinenumbers;
  long  Characteristics;
 }
Name
8-byte null-padded UTF-8 string(the string may not be null terminated if all 8 characters are used). For longer names, this member will contain '/' followed by an ASCII representation of a decimal number that is an offset into the string table. Executable images do not use a string table and do not support section names greater than 8 characters.
Misc
PhysicalAddress - The file address.
VirtualSize - The total size of the section when loaded into memory, in bytes. If this value is greater than the SizeOfRawData member, the section is filled with zeroes. This field is valid only for executable images and should be set to 0 for object files.
The Misc member should be considered unreliable unless the linker and the behavior of the linker that was used is known.
VirtualAddress
The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied.
SizeOfRawData
The size of the initialized data on disk, in bytes. This value must be a multiple of the FileAlignment member of the PE Optional Header structure. If this value is less than the VirtualSize member, the remainder of the section is filled with zeroes. If the section contains only uninitialized data, the value is 0.
PointerToRawData
A file pointer relative to the beginning of the file to the first page within the COFF file. This value must be a multiple of the FileAlignment member of the PE Optional Header structure. If a section contains only uninitialized data this value should be 0.
PointerToRelocations
A file pointer to the beginning of the relocation entries for the section. If there are no relocations, this value is 0.
PointerToLinenumbers
A file pointer to the beginning of the line-number entries for the section. If there are no COFF line numbers, this value is 0.
NumberOfRelocations
The number of relocation entries for the section. This value is 0 for executable images.
NumberOfLinenumbers
The number of line-number entries for the section.
Characteristics
The characteristics of the image.

The table below defines the possible 32-bit mask values for this member:

Constant Name Value Description
No Constant Name 0x00000000 Reserved
No Constant Name 0x00000001 Reserved
No Constant Name 0x00000002 Reserved
No Constant Name 0x00000004 Reserved
IMAGE_SCN_TYPE_NO_PAD 0x00000008 The section should not be padded to the next boundary. This flag is obsolete and is replaced by IMAGE_SCN_ALIGN_1BYTES
No Constant Name 0x00000010 Reserved
IMAGE_SCN_CNT_CODE 0x00000020 The section contains executable code (.text section)
IMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040 The section contains initialized data
IMAGE_SCN_CNT_UNINITIALIZED_DATA 0x00000080 The section contains uninitialized data
IMAGE_SCN_LNK_OTHER 0x00000100 Reserved
IMAGE_SCN_LNK_INFO 0x00000200 The section contains comments or other information. This is valid only for object files (.drectve section)
No Constant Name 0x00000400 Reserved
IMAGE_SCN_LNK_REMOVE 0x00000800 The section will not become part of the image. This is valid only for object files
IMAGE_SCN_LNK_COMDAT 0x00001000 The section contains COMDAT data. This is valid only for object files
No Constant Name 0x00002000 Reserved
IMAGE_SCN_NO_DEFER_SPEC_EXC 0x00004000 Reset speculative exceptions handling bits in the TLB entries for this section
IMAGE_SCN_GPREL 0x00008000 The section contains data referenced through the global pointer
No Constant Name 0x00010000 Reserved
IMAGE_SCN_MEM_PURGEABLE 0x00020000 Reserved
IMAGE_SCN_MEM_LOCKED 0x00040000 Reserved
IMAGE_SCN_MEM_PRELOAD 0x00080000 Reserved
IMAGE_SCN_ALIGN_1BYTES 0x00100000 Align data on a 1-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_2BYTES 0x00200000 Align data on a 2-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_4BYTES 0x00300000 Align data on a 4-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_8BYTES 0x00400000 Align data on a 8-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_16BYTES 0x00500000 Align data on a 16-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_32BYTES 0x00600000 Align data on a 32-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_64BYTES 0x00700000 Align data on a 64-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_128BYTES 0x00800000 Align data on a 128-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_256BYTES 0x00900000 Align data on a 256-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_512BYTES 0x00A00000 Align data on a 512-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_1024BYTES 0x00B00000 Align data on a 1024-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_2048BYTES 0x00C00000 Align data on a 2048-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_4096BYTES 0x00D00000 Align data on a 4096-byte boundary. This is valid only for object files
IMAGE_SCN_ALIGN_8192BYTES 0x00E00000 Align data on a 8192-byte boundary. This is valid only for object files
IMAGE_SCN_LNK_NRELOC_OVFL 0x01000000 The section contains extended relocations. The count of relocations for the section exceeds the 16 bits that is reserved for it in the section header. If the NumberOfRelocations field in the section header is 0xffff, the actual relocation count is stored in the VirtualAddress field of the first relocation. It is an error if IMAGE_SCN_LNK_NRELOC_OVFL is set and there are fewer than 0xffff relocations in the section
IMAGE_SCN_MEM_DISCARDABLE 0x02000000 The section can be discarded as needed
IMAGE_SCN_MEM_NOT_CACHED 0x04000000 The section cannot be cached
IMAGE_SCN_MEM_NOT_PAGED 0x08000000 The section cannot be paged
IMAGE_SCN_MEM_SHARED 0x10000000 The section can be shared in memory
IMAGE_SCN_MEM_EXECUTE 0x20000000 The section can be executed as code (.text, etc. section)
IMAGE_SCN_MEM_READ 0x40000000 The section can be read
IMAGE_SCN_MEM_WRITE 0x80000000 The section can be written to

A PE loader will place the sections of the executable image at the locations specified by these section descriptors (relative to the base address) and usually the alignment is 0x1000, which matches the size of pages on the x86.

Common sections are:

  1. .text/.code/CODE/TEXT - Contains executable code (machine instructions)
  2. .textbss/TEXTBSS - Present if Incremental Linking is enabled
  3. .data/.idata/DATA/IDATA - Contains initialised data
  4. .bss/BSS - Contains uninitialised data
  5. .rsrc - Contains resource data

Imports and Exports - Linking to other modules edit

What is linking? edit

Whenever a developer writes a program, there are a number of subroutines and functions which are expected to be implemented already, saving the writer the hassle of having to write out more code or work with complex data structures. Instead, the coder need only declare one call to the subroutine, and the linker will decide what happens next.

There are two types of linking that can be used: static and dynamic. Static uses a library of precompiled functions. This precompiled code can be inserted into the final executable to implement a function, saving the programmer a lot of time. In contrast, dynamic linking allows subroutine code to reside in a different file (or module), which is loaded at runtime by the operating system. This is also known as a "Dynamically linked library", or DLL. A library is a module containing a series of functions or values that can be exported. This is different from the term executable, which imports things from libraries to do what it wants. From here on, "module" means any file of PE format, and a "Library" is any module which exports and imports functions and values.

Dynamically linking has the following benefits:

  • It saves disk space, if more than one executable links to the library module
  • Allows instant updating of routines, without providing new executables for all applications
  • Can save space in memory by mapping the code of a library into more than one process
  • Increases abstraction of implementation. The method by which an action is achieved can be modified without the need for reprogramming of applications. This is extremely useful for backward compatibility with operating systems.

This section discusses how this is achieved using the PE file format. An important point to note at this point is that anything can be imported or exported between modules, including variables as well as subroutines.

Loading edit

The downside of dynamically linking modules together is that, at runtime, the software which is initialising an executable must link these modules together. For various reasons, you cannot declare that "The function in this dynamic library will always exist in memory here". If that memory address is unavailable or the library is updated, the function will no longer exist there, and the application trying to use it will break. Instead, each module (library or executable) must declare what functions or values it exports to other modules, and also what it wishes to import from other modules.

As said above, a module cannot declare where in memory it expects a function or value to be. Instead, it declares where in its own memory it expects to find a pointer to the value it wishes to import. This permits the module to address any imported value, wherever it turns up in memory.

Exports edit

Exports are functions and values in one module that have been declared to be shared with other modules. This is done through the use of the "Export Directory", which is used to translate between the name of an export (or "Ordinal", see below), and a location in memory where the code or data can be found. The start of the export directory is identified by the IMAGE_DIRECTORY_ENTRY_EXPORT entry of the resource directory. All export data must exist in the same section. The directory is headed by the following structure:

struct IMAGE_EXPORT_DIRECTORY {
	long Characteristics;
	long TimeDateStamp;
	short MajorVersion;
	short MinorVersion;
	long Name;
	long Base;
	long NumberOfFunctions;
	long NumberOfNames;
	long *AddressOfFunctions;
	long *AddressOfNames;
	long *AddressOfNameOrdinals;
}

The "Characteristics" value is generally unused, TimeDateStamp describes the time the export directory was generated, MajorVersion and MinorVersion should describe the version details of the directory, but their nature is undefined. These values have little or no impact on the actual exports themselves. The "Name" value is an RVA to a zero terminated ASCII string, the name of this library name, or module.

Names and Ordinals edit

Each exported value has both a name and an "ordinal" (a kind of index). The actual exports themselves are described through AddressOfFunctions, which is an RVA to an array of RVAs, each pointing to a different function or value to be exported. The size of this array is in the value NumberOfFunctions. Each of these functions has an ordinal. The "Base" value is used as the ordinal of the first export, and the next RVA in the array is Base+1, and so forth.

Each entry in the AddressOfFunctions array is identified by a name, found through the RVA AddressOfNames. The data where AddressOfNames points to is an array of RVAs, of the size NumberOfNames. Each RVA points to a zero terminated ASCII string, each being the name of an export. There is also a second array, pointed to by the RVA in AddressOfNameOrdinals. This is also of size NumberOfNames, but each value is a 16 bit word, each value being an ordinal. These two arrays are parallel and are used to get an export value from AddressOfFunctions. To find an export by name, search the AddressOfNames array for the correct string and then take the corresponding value from the AddressOfNameOrdinals array. This value is then used as index to AddressOfFunctions (yes, it's 0-based index actually, NOT base-biased ordinal, as the official documentation suggests!).

Forwarding edit

As well as being able to export functions and values in a module, the export directory can forward an export to another library. This allows more flexibility when re-organising libraries: perhaps some functionality has branched into another module. If so, an export can be forwarded to that library, instead of messy reorganising inside the original module.

Forwarding is achieved by making an RVA in the AddressOfFunctions array point into the section which contains the export directory, something that normal exports should not do. At that location, there should be a zero terminated ASCII string of format "LibraryName.ExportName" for the appropriate place to forward this export to.

Imports edit

The other half of dynamic linking is importing functions and values into an executable or other module. Before runtime, compilers and linkers do not know where in memory a value that needs to be imported could exist. The import table solves this by creating an array of pointers at runtime, each one pointing to the memory location of an imported value. This array of pointers exists inside of the module at a defined RVA location. In this way, the linker can use addresses inside of the module to access values outside of it.

The Import directory edit

The start of the import directory is pointed to by both the IMAGE_DIRECTORY_ENTRY_IAT and IMAGE_DIRECTORY_ENTRY_IMPORT entries of the resource directory (the reason for this is uncertain). At that location, there is an array of IMAGE_IMPORT_DESCRIPTOR structures. Each of these identify a library or module that has a value we need to import. The array continues until an entry where all the values are zero. The structure is as follows:

struct IMAGE_IMPORT_DESCRIPTOR {
	long *OriginalFirstThunk;
	long TimeDateStamp;
	long ForwarderChain;
	long Name;
	long *FirstThunk;
}

The TimeDateStamp is relevant to the act of "Binding", see below. The Name value is an RVA to an ASCII string, naming the library to import. ForwarderChain will be explained later. The only thing of interest at this point, are the RVAs OriginalFirstThunk and FirstThunk. Both these values point to arrays of RVAs, each of which point to a IMAGE_IMPORT_BY_NAMES struct. The arrays are terminated with an entry that is equal to zero. These two arrays are parallel and point to the same structure, in the same order. The reason for this will become apparent shortly.

Each of these IMAGE_IMPORT_BY_NAMES structs has the following form:

struct IMAGE_IMPORT_BY_NAME {
	short Hint;
	char Name[1];
}

"Name" is an ASCII string of any size that names the value to be imported. This is used when looking up a value in the export directory (see above) through the AddressOfNames array. The "Hint" value is an index into the AddressOfNames array; to save searching for a string, the loader first checks the AddressOfNames entry corresponding to "Hint".


To summarise: The import table consists of a large array of IMAGE_IMPORT_DESCRIPTORs, terminated by an all-zero entry. These descriptors identify a library to import things from. There are then two parallel RVA arrays, each pointing at IMAGE_IMPORT_BY_NAME structures, which identify a specific value to be imported.

Imports at runtime edit

Using the above import directory at runtime, the loader finds the appropriate modules, loads them into memory, and seeks the correct export. However, to be able to use the export, a pointer to it must be stored somewhere in the importing module's memory. This is why there are two parallel arrays, OriginalFirstThunk and FirstThunk, identifying IMAGE_IMPORT_BY_NAME structures. Once an imported value has been resolved, the pointer to it is stored in the FirstThunk array. It can then be used at runtime to address imported values.

Bound imports edit

The PE file format also supports a peculiar feature known as "binding". The process of loading and resolving import addresses can be time consuming, and in some situations this is to be avoided. If a developer is fairly certain that a library is not going to be updated or changed, then the addresses in memory of imported values will not change each time the application is loaded. So, the import address can be precomputed and stored in the FirstThunk array before runtime, allowing the loader to skip resolving the imports - the imports are "bound" to a particular memory location. However, if the versions numbers between modules do not match, or the imported library needs to be relocated, the loader will assume the bound addresses are invalid, and resolve the imports anyway.

The "TimeDateStamp" member of the import directory entry for a module controls binding; if it is set to zero, then the import directory is not bound. If it is non-zero, then it is bound to another module. However, the TimeDateStamp in the import table must match the TimeDateStamp in the bound module's FileHeader, otherwise the bound values will be discarded by the loader.

Forwarding and binding edit

Binding can of course be a problem if the bound library / module forwards its exports to another module. In these cases, the non-forwarded imports can be bound, but the values which get forwarded must be identified so the loader can resolve them. This is done through the ForwarderChain member of the import descriptor. The value of "ForwarderChain" is an index into the FirstThunk and OriginalFirstThunk arrays. The OriginalFirstThunk for that index identifies the IMAGE_IMPORT_BY_NAME structure for a import that needs to be resolved, and the FirstThunk for that index is the index of another entry that needs to be resolved. This continues until the FirstThunk value is -1, indicating no more forwarded values to import.

Resources edit

Resources are data items in modules which are difficult to be stored or described using the chosen programming language. This requires a separate compiler or resource builder, allowing insertion of dialog boxes, icons, menus, images, and other types of resources, including arbitrary binary data. Although a number of Resource API calls can be used to retrieve resources from a PE file, we are going to look at the resources without the use of those APIs.

Locating the Resource Section edit

The first thing we need to do when we want to manually manipulate a file's resources is to find the resource section. To do this we need some information found in the DataDirectory array and the Section Table. We need the RVA stored in the VirtualAddress member of the DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE] structure found in the PE Optional Header. Once the RVA is known we can then lookup that RVA in the Section Table by comparing the VirtualAddress member of the DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE] structure with the VirtualAddress member of an IMAGE_SECTION_HEADER. Except in rare cases, the VirtualAddress member of the DataDirectory structure will equal the value of the VirtualAddress member of an IMAGE_SECTION_HEADER. The name member of that particular IMAGE_SECTION_HEADER will except in rare cases be named, '.rsrc'. Once the correct IMAGE_SECTION_HEADER structure is found the PointerToRawData member can be used to locate the resource section. The PointerToRawData member contains an offset from the beginning of the file that will lead you to the first byte of the resource section. The picture below depicts an example of a DataDirectory array on the left with an IMAGE_SECTION_HEADER on the right populated with the information for the resource section. We can see the highlighted line underneath of Directory Information '2 RVA: 20480 Size: 3512' has a VirtualAddress(RVA) of 20480, this corresponds to the VirtualAddress of 20480 for the .rsrc(resource) section. You can also see that the value of PointerToRawData is equal to 7168. In this particular PE file we will find the resource section starting at offset 7168 from the beginning of the file.

 

Once the resource section is found we can start looking at the structures and data contained in that section.

Resource structures edit

The IMAGE_RESOURCE_DIRECTORY is the very first structure we come across and it starts at the 1st byte of the resource section.

IMAGE_RESOURCE_DIRECTORY structure:

struct IMAGE_RESOURCE_DIRECTORY
{
	long Characteristics;
	long TimeDateStamp;
	short MajorVersion;
	short MinorVersion;
	short NumberOfNamedEntries;
	short NumberOfIdEntries;
}

Characteristics is unused, and TimeDateStamp is normally the time of creation, although it doesn't matter if it's set or not. MajorVersion and MinorVersion relate to the versioning info of the resources: the fields have no defined values. Immediately following the IMAGE_RESOURCE_DIRECTORY structure is a series of IMAGE_RESOURCE_DIRECTORY_ENTRYs, the number of which are defined by the total of NumberOfNamedEntries and NumberOfIdEntries. The first portion of these entries are for named resources, the latter for ID resources, depending on the values in the IMAGE_RESOURCE_DIRECTORY struct. The actual shape of the resource entry structure is as follows:

struct IMAGE_RESOURCE_DIRECTORY_ENTRY
{
	long NameId;
	long *Data;
}

The NameId value has dual purpose: if the most significant bit (or sign bit) is clear, then the lower 16 bits are an ID number of the resource. Alternately, if the top bit is set, then the lower 31 bits make up an offset from the start of the resource data to the name string of this particular resource. The Data value also has a dual purpose: if the most significant bit is set, the remaining 31 bits form an offset from the start of the resource data to another IMAGE_RESOURCE_DIRECTORY (i.e. this entry is an interior node of the resource tree). Otherwise, this is a leaf node, and Data contains the offset from the start of the resource data to a structure which describes the specifics of the resource data itself (which can be considered to be an ordered stream of bytes):

struct IMAGE_RESOURCE_DATA_ENTRY
{
        long *Data;
        long Size;
        long CodePage;
        long Reserved;
}

The Data value contains an RVA to the actual resource data, Size is self-explanatory, and CodePage contains the Unicode codepage to be used for decoding Unicode-encoded strings in the resource (if any). Reserved should be set to 0.

Layout edit

The above system of resource directory and entries allows simple storage of resources, by name or ID number. However, this can get very complicated very quickly. Different types of resources, the resources themselves, and instances of resources in other languages can become muddled in just one directory of resources. For this reason, the resource directory has been given a structure to work by, allowing separation of the different resources.

For this purpose, the "Data" value of resource entries points at another IMAGE_RESOURCE_DIRECTORY structure, forming a tree-diagram like organisation of resources. The first level of resource entries identifies the type of the resource: cursors, bitmaps, icons and similar. They use the ID method of identifying the resource entries, of which there are twelve defined values in total. More user defined resource types can be added. Each of these resource entries points at a resource directory, naming the actual resources themselves. These can be of any name or value. These point at yet another resource directory, which uses ID numbers to distinguish languages, allowing different specific resources for systems using a different language. Finally, the entries in the language directory actually provide the offset to the resource data itself, the format of which is not defined by the PE specification and can be treated as an arbitrary stream of bytes.

Relocations edit

Alternate Bound Import Structure edit

Windows DLL Files edit

Windows DLL files are a brand of PE file with a few key differences:

  • A .DLL file extension
  • A DllMain() entry point, instead of a WinMain() or main().
  • The DLL flag set in the PE header.

DLLs may be loaded in one of two ways, a) at load-time, or b) by calling the LoadModule() Win32 API function.

Function Exports edit

Functions are exported from a DLL file by using the following syntax:

 __declspec(dllexport) void MyFunction() ...

The "__declspec" keyword here is not a C language standard, but is implemented by many compilers to set extendable, compiler-specific options for functions and variables. Microsoft C Compiler and GCC versions that run on windows allow for the __declspec keyword, and the dllexport property.

Functions may also be exported from regular .exe files, and .exe files with exported functions may be called dynamically in a similar manner to .dll files. This is a rare occurrence, however.

Identifying DLL Exports edit

There are several ways to determine which functions are exported by a DLL. A common approach is to use dumpbin in the following manner:

dumpbin /EXPORTS <dll file>

This will post a list of the function exports, along with their ordinal and RVA to the console.

Function Imports edit

In a similar manner to function exports, a program may import a function from an external DLL file. The dll file will load into the process memory when the program is started, and the function will be used like a local function. DLL imports need to be prototyped in the following manner, for the compiler and linker to recognize that the function is coming from an external library:

 __declspec(dllimport) void MyFunction();

Identifying DLL Imports edit

If is often useful to determine which functions are imported from external libraries when examining a program. To list import files to the console, use dumpbin in the following manner:

dumpbin /IMPORTS <dll file>

You can also use depends.exe to list imported and exported functions. Depends is a a GUI tool and comes with Microsoft Platform SDK.