x86 Assembly/FASM Syntax

FASM, also known as Flat Assembler, is an optimizing assembler for the x86 architecture. FASM is written in assembly, so it can assemble/bootstrap itself. It runs on various operating systems including DOS, Linux, Unix, and Windows. It supports the x86 and x86-64 instruction sets including SIMD extensions MMX, SSE - SSE4, and AVX.

Hexadecimal Numbers

edit

FASM supports all popular syntaxes used to define hexadecimal numbers:

0xbadf00d ; C-Like Syntax
$badf00d  ; Pascal-Like Syntax
0badf00dh  ; h Syntax, requires leading zero to be valid at assembly time

Labels

edit

FASM supports several unique labeling features.

Anonymous Labels

edit

FASM supports labels that use no identifier or label name.

  • @@: represents an anonymous label. Any number of anonymous labels can be defined.
  • @b refers to the closest @@ that can be found when looking backwards in source. @r and @b are equivalent.
  • @f refers to the closest @@ that can be found when looking forward in source.
@@:
    inc eax
    push eax
    jmp @b     ; This will result in a stack fault sooner or later
    jmp @f     ; This instruction will never be hit
@@:            ; if jmp @f was ever hit, the instruction pointer would be set to this anonymous label
    invoke ExitProcess, 0 ; Winasm only

Local Labels

edit

Local labels, which begin with a . (period). You can reference a local label in the context of its global label parent.

entry globallabel

globallabel:
    .locallabelone:
        jmp globallabel2.locallabelone
    .locallabeltwo:
 
globallabel2:
    .locallabelone:
    .locallabeltwo:
        jmp globallabel.locallabelone ; infinite loop

Operators

edit

FASM supports several unique operators to simplify assembly code.

The $ Operator

edit

$ describes the current location in an addressing space. It is used to determine the size of a block of code or data. The MASM equivalent of the $ is the SIZEOF operator.

mystring db "This is my string", 0
mystring.length = $ - mystring

The # Operator

edit

# is the symbol concatenation operator, used for combining multiple symbols into one. It can only be used inside of the body of a macro like rept or a custom/user-defined macro, because it will replace the name of the macro argument supplied with its value.

macro contrived value {
    some#value db 22
}
; ...
contrived 2

; assembles to...
some2 db 22

The ` Operator

edit

` is used to obtain the name of a symbol passed to a macro, converting it to a string.

macro print_contrived value {
    formatter db "%s\n"
    invoke printf, formatter, `value
}
; ...
print_contrived SOMEVALUE

; assembles to...
formatter db "%s\n"
invoke printf, formatter, "SOMEVALUE"

Built In Macros

edit

FASM has several useful built in macros to simplify writing assembly code.

Repetition

edit

The rept directive is used to compact repetitive assembly instructions into a block. The directive begins with the word rept, then a number or variable specifying the number of times the assembly instructions inside of the curly braces proceeding the instruction should be repeated. The counter variable can be aliased to be used as a symbol, or as part of an instruction within the rept block.

rept 2 {
    db "Hello World!", 0Ah, 0
}

; assembles to...
db "Hello World!", 0Ah, 0
db "Hello World!", 0Ah, 0

; and...
rept 2 helloNumber {
    hello#helloNumber db "Hello World!", 0Ah, 0 ; use the symbol concatenation operator '#' to create unique labels hello1 and hello2
}

; assembles to...
hello1 db "Hello World!", 0Ah, 0
hello2 db "Hello World!", 0Ah, 0

Structures

edit

The struc directive allows assembly of data into a format similar to that of a C structure with members. The definition of a struc makes use of local labels to define member values.

struc 3dpoint x, y, z
{
    .x db x,
    .y db y,
    .z db z
}

some 3dpoint 1, 2, 3

; assembles to...
some:
    .x db 1
    .y db 2
    .z db 3

; access a member through some.x, some.y, or some.z for x, y, and z respectively

Custom Macros

edit

FASM supports defining custom macros as a way of assembling multiple instructions or conditional assembly as one larger instruction. They require a name and can have an optional list of arguments, separated by commas.

macro name arg1, arg2, ... {
   ; <macro body>
}

Variable Arguments

edit

Macros can support a variable number of arguments through the square bracket syntax.

macro name arg1, arg2, [varargs] {
   ; <macro body>
}

Required Operands

edit

The FASM macro syntax can require operands in a macro definition using the * operator after each operand.

; all operands required, will not assemble without
macro mov op1*, op2*, op3*
{
    mov op1, op2
    mov op2, op3
}

Operator Overloading

edit

The FASM macro syntax allows for the overloading of the syntax of an instruction, or creating a new instruction. Below, the mov instruction has been overloaded to support a third operand. In the case that none is supplied, the regular move instruction is assembled. Otherwise, the data in op2 is moved to op1 and op2 is replaced by op3.

; not all operands required, though if op1 or op2 are not supplied
; assembly should fail
; could also be defined as 'macro mov op1*, op2*, op3' to force requirement of the first two arguments
macro mov op1, op2, op3
{
    if op3 eq
        mov op1, op2
    else
        mov op1, op2
        mov op2, op3
    end if
}

Hello World

edit

This is a complete example of a Win32 assembly program that prints 'Hello World!' to the console and then waits for the user to press any key before exiting the application.

format PE console                            ; Win32 portable executable console format
entry _start                                 ; _start is the program's entry point

include 'win32a.inc'                         

section '.data' data readable writable       ; data definitions

hello db "Hello World!", 0
stringformat db "%s", 0ah, 0

section '.code' code readable executable     ; code

_start:
        invoke printf, stringformat, hello   ; call printf, defined in msvcrt.dll
        invoke getchar                       ; wait for any key
        invoke ExitProcess, 0                ; exit the process

section '.imports' import data readable      ; data imports

library kernel, 'kernel32.dll',\             ; link to kernel32.dll, msvcrt.dll
        msvcrt, 'msvcrt.dll'

import kernel, \                             ; import ExitProcess from kernel32.dll
       ExitProcess, 'ExitProcess'

import msvcrt, \                             ; import printf and getchar from msvcrt.dll
       printf, 'printf',\
       getchar, '_fgetchar'

This is an example for x86_64 GNU+Linux:

format ELF64 executable 3                 ;; ELF64 Format for GNU+Linux
segment readable executable               ;; Executable code section

;; Some definitions for readabilty purposes

define SYS_exit     60
define SYS_write    1

define stdout       1
define exit_success 0

_start:                                   ;; Entry point for our program
    mov eax, SYS_write                    ;; SYS_write(               // Call the write(2) syscall
    mov edi, stdout                       ;;     STDOUT_FILENO,       // Write to stdout
    mov esi, hello_world                  ;;     hello_world,         // Buffer to write to STDOUT_FILENO: hello_world
    mov edx, hello_world_length           ;;     hello_world_length,  // Buffer length
    syscall                               ;; );

    mov eax, SYS_exit                     ;; SYS_exit(                // Call the exit exit(2) syscall
    mov edi, exit_success                 ;;     EXIT_SUCCESS,        // Exit with success exit code, required if we don't want a segfault
    syscall                               ;; );

segment readable                          ;; Read-only constant data section
    hello_world: db "Hello world", 10     ;; const char *hello_world = "Hello world\n";
    hello_world_length = $ - hello_world  ;; const size_t hello_world_length = strlen(hello_world);
edit