GNU C Compiler Internals/GNU C Compiler Architecture 3 4

An Overview of GCC Architecture. Compilation of an expression. edit

This section is based on a Red Hat magazine article [1].

The GNU Compiler Collection (GCC) comprises a number of compilers for different programming languages. The main GCC executable gcc processes source files written in C, C++, Objective-C, Objective-C++, Java, Fortran, or Ada and produces an assembly file for each source file. It is a driver program that invokes the appropriate compilation programs depending on the language of the source file. For a C source file they are the preprocessor and compiler cc1, the assembler as, and the linker collect2. The first and the third programs come with a GCC distribution; the assembler is a part of the GNU binutils package. This book describes the internals of the preprocessor and compiler cc1.

Each compiler includes three components: a front end, a middle end, and a back end. GCC compiles one file at a time. A source file goes through all three components one after another. Its representation is modified when it goes from a component to the next. Figure 1 illustrates the components and the source file representations associated with each of them. The abstract syntax tree (AST), register transfer language (RTL), and object are the main representations.

GCC front end, middle end, and back end with source file representations.

Main Representations edit

The purpose of the front end is to read the source file, parse it, and convert it into the standard abstract syntax tree (AST) representation. The AST is a dual-type representation: it is a tree where a node can have children and a list of statements where nodes are chained one after another. There is one front end for each programming language.

The AST is then used to generate a register-transfer language (RTL) tree. RTL is a hardware-based representation that corresponds to an abstract target architecture with an infinite number of registers. An RTL optimization pass optimizes the tree in the RTL form. Finally, a GCC back end generates the assembly code for the target architecture using the RTL representation. Examples of back ends are the x86 back end and the MIPS back end.

In the next sections we describe the internals of the C front end and the x86 back end. The compiler starts with its initialization and command line options processing. After that the C front end preprocesses the source file, parses it and performs a number of optimizations. The back end then generates the assembly code for the target platform and saves it to a file.

Auxiliary Data Structures edit

GCC has a number of additional data structures that facilitate code development, for example vector and heap.

The macros defined in vec.h implement a set of templated vector types and associated interfaces. These templates are implemented with macros, as we're not in C++ land. The interface functions are typesafe and use static inline functions, sometimes backed by out-of-line generic functions. The vectors are designed to interoperate with the GTY machinery.

Because of the different behavior of structure objects, scalar objects and of pointers, there are three flavors, one for each of these variants. Both the pointer and structure object variants pass pointers to objects around -- in the former case the pointers are stored into the vector and in the latter case the pointers are dereferenced and the objects copied into the vector. The scalar object variant is suitable for int-like objects, and the vector elements are returned by value.

There are both 'index' and 'iterate' accessors. The iterator returns a boolean iteration condition and updates the iteration variable passed by reference. Because the iterator will be inlined, the address-of can be optimized away.

The vectors are implemented using the trailing array idiom, thus they are not resizeable without changing the address of the vector object itself. This means you cannot have variables or fields of vector type -- always use a pointer to a vector. The one exception is the final field of a structure, which could be a vector type. You will have to use the embedded_size & embedded_init calls to create such objects, and they will probably not be resizeable (so don't use the 'safe' allocation variants). The trailing array idiom is used (rather than a pointer to an array of data), because, if we allow NULL to also represent an empty vector, empty vectors occupy minimal space in the structure containing them.

Each operation that increases the number of active elements is available in 'quick' and 'safe' variants. The former presumes that there is sufficient allocated space for the operation to succeed (it dies if there is not). The latter will reallocate the vector, if needed. Reallocation causes an exponential increase in vector size. If you know you will be adding N elements, it would be more efficient to use the reserve operation before adding the elements with the 'quick' operation. This will ensure there are at least as many elements as you ask for, it will exponentially increase if there are too few spare slots. If you want to reserve a specific number of slots, but do not want the exponential increase (for instance, you know this is the last allocation), use a negative number for reservation. You can also create a vector of a specific size from the get go.

You should prefer the push and pop operations, as they append and remove from the end of the vector. If you need to remove several items in one go, use the truncate operation. The insert and remove operations allow you to change elements in the middle of the vector. There are two remove operations, one which preserves the element ordering 'ordered_remove', and one which does not 'unordered_remove'. The latter function copies the end element into the removed slot, rather than invoke a memmove operation. The 'lower_bound' function will determine where to place an item in the array using insert that will maintain sorted order.

When a vector type is defined, first a non-memory managed version is created. You can then define either or both garbage collected and heap allocated versions. The allocation mechanism is specified when the type is defined, and is therefore part of the type. If you need both gc'd and heap allocated versions, you still must have exactly one definition of the common non-memory managed base vector.

If you need to directly manipulate a vector, then the 'address' accessor will return the address of the start of the vector. Also the 'space' predicate will tell you whether there is spare capacity in the vector. You will not normally need to use these two functions.

Vector types are defined using a DEF_VEC_{O,P,I}(TYPEDEF) macro, to get the non-memory allocation version, and then a DEF_VEC_ALLOC_{O,P,I}(TYPEDEF,ALLOC) macro to get memory managed vectors. Variables of vector type are declared using a VEC(TYPEDEF,ALLOC) macro. The ALLOC argument specifies the allocation strategy, and can be either 'gc' or 'heap' for garbage collected and heap allocated respectively. It can be 'none' to get a vector that must be explicitly allocated (for instance as a trailing array of another structure). The characters O, P and I indicate whether TYPEDEF is a pointer (P), object (O) or integral (I) type. Be careful to pick the correct one, as you'll get an awkward and inefficient API if you use the wrong one. There is a check, which results in a compile-time warning, for the P and I versions, but there is no check for the O versions, as that is not possible in plain C. Due to the way GTY works, you must annotate any structures you wish to insert or reference from a vector with a GTY(()) tag. You need to do this even if you never declare the GC allocated variants.

An example of their use would be,

 DEF_VEC_P(tree);   // non-managed tree vector.
 DEF_VEC_ALLOC_P(tree,gc);    // gc'd vector of tree pointers.  This must
                              // appear at file scope.
 struct my_struct {
   VEC(tree,gc) *v;      // A (pointer to) a vector of tree pointers.
 struct my_struct *s;
 if (VEC_length(tree,s->v)) { we have some contents }
 VEC_safe_push(tree,gc,s->v,decl); // append some decl onto the end
 for (ix = 0; VEC_iterate(tree,s->v,ix,elt); ix++)
   { do something with elt }

Additional Representations edit

Does GCC 3.4 have additional representations?

Take home: GCC is a compiler collection that consists of a front end for each programming language, a middle end, and a back end for each architecture. The main representations that each source file goes through are AST in the front end, RTL in the middle end, and the assembly representation in the back end. GCC compiles one file at a time.

GCC Initialization edit

GCC initialization.

The C front end includes the C/C++ preprocessor and the C compiler. Program cc1 includes both the preprocessor and C compiler. It compiles a C source file and generates an assembly (.S) file.

The compiler frontend and backend interact with each other using callback functions called language hooks. All hooks are included into a global variable struct lang_hooks lang_hooks that is defined in file langhooks.h. There are the following types of hooks: hooks for tree inlining, hooks for call graph, hooks for functions, hooks for tree dump, hooks for types, hooks for declarations, and language-specific hooks. The default values for the hooks are defined in file langhooks-def.h.

GCC initialization consists of command line option parsing, initializing the back end, creating the global scope, and initializing the built-in data types and functions.

Each declaration is associated with a scope. For example, a local variable is associated with its function's scope. A global declaration is associated with the global scope.

File toplev.c contains the main cc1 function toplev_main() and global variables that define the state of the compiler. Variable current_function_decl is the declaration of the function being compiled or NULL if between functions.

Function toplev_main() is the function that processes the command-line options, initializes the compiler, compiles a file, and frees up the allocated resources. Function decode_options() processes the command-line options and sets the corresponding variables within the compiler.

Following the command line option parsing function do_compile() is called. It performs the back-end initialization by calling function backend_init().

Backend initialization includes a number of steps. Function init_emit_once() generates RTL expressions for a number of registers: pc_rtx for the program counter, cc0 for the condition, stack_pointer_rtx, frame_pointer_rtx, etc. They are saved in array global_rtl.

After that, function lang_dependent_init() performs language-dependent initialization which includes the initialization of a front-end and a back-end. The C initialization function c_objc_common_init() creates built-in data types, initializes the global scope and performs other initialization tasks. Function c_common_nodes_and_builtins() creates pre-defined types described in file builtin-types.def.

The standard C types are created at the initialization time. The following table presents several types:

GCC builtin types
Variable Name C type
char_type_node char
integer_type_node int
unsigned_type_node unsigned int
void_type_node void
ptr_type_node void*

GCC built-in functions are the functions that are evaluated at compile time. For example, if the size argument of a strcpy() function is a constant then GCC replaces the function call with the required number of assignments. The compiler replaces standard library calls with built-in functions and then evaluates them once the function's AST is constructed. In case of strcpy(), the compiler checks the size argument and uses the optimized built-in version of strcpy() if the argument is constant. builtin_constant_p() allows to find out if the value of its argument is known at compile time. GCC builtins are used beyond GCC. For example, the string processing library of Linux kernel uses builtin_constant_p() to invoke the optimized version of a string processing function if the string size is known at compile time.

GCC evaluates each builtin function using a corresponding expand_builtin() function. For example, builtin_strcmp() is evaluated using expand_builtin_strcmp(). The following table gives examples of GCC builtins:

GCC builtin functions
Builtin Name Explanation
builtin_constant_p returns true if the argument is a constant
builtin_memcpy equivalent to memcpy()
builtin_strlen equivalent to strlen()

Take home: GCC initialization consists of command line option parsing, initializing the back end, creating the global scope, and initializing the built-in data types and functions.

Parser and Preprocessor edit

Following the initialization, function do_compile() calls function compile_file(). This function invokes parse_file() front-end language hook which is set to function c_common_parse_file() for C language. The latter function invokes function finish_options() which initializes the preprocessor and handles -D, -U, and -A command line options (which are equivalent to #define, #undef, and #assert respectively). The C preprocessor handles the preprocessor directives such as #define, #include in the source code.

Parser edit

Following the preprocessor initialization, {{{4}}} function is invoked. This function uses the standard lex/bison tools to parse the file.

Preprocessor edit

The preprocessor is implemented as a library. The C language lexer function c_lex() calls the libcpp function cpp_get_token() which handles the preprocessor keywords. The state of the preprocessor is defined by variable cpp_reader *parse_in. Type struct cpp_reader contains most importantly the list of text buffers being processed. Each buffer corresponds to a source file (.c or .h). Function cpp_get_token() calls appropriate functions for legitimate preprocessor keywords. For example, when #include is encountered, function do_include_common() is invoked. It allocates a new buffer and places it on top of the buffer stack making it the current buffer. When a new file is compiled the buffer is popped off the stack and the compilation of the old file continues.

Function do_define() is called whenever a new macro is defined using #define keyword.

Take home: The preprocessor takes care of the preprocessor directives, for example #include and #ifdef.

From Source Code to AST edit

After running the preprocessor GCC constructs an abstract syntax tree (AST) for each function of the source file. An AST is a number of connected nodes of type struct tree. Each node has a tree code that defines the type of the tree. Macro TREE_CODE() is used to refer to the code. Tree codes are defined in files tree.def and c-common.def. Trees with different tree codes are grouped into tree classes. The following tree classes are defined among others in GCC:

GCC tree classes
Tree Class Explanation
'd' declarations
'<' comparison
'2' binary arithmetic

There are two types of trees: statements and expressions. Statements correspond to the C constructs, for example an expression followed by a ';', a conditional statement, a loop statement, etc. Expressions are what the statements are built up from. Examples of expressions are an assignment expression, an arithmetic expression, a function call, etc. Examples of tree codes are given in the following table:

GCC tree codes
Tree Code Tree Class Explanation Operands
MODIFY_EXPR 'e' an assignment expression TREE_OPERAND(t,0) - left hand side; TREE_OPERAND(t,1) - right hand side;
CALL_EXPR 'e' function call TREE_OPERAND(t,0) - function definition; TREE_OPERAND(t,1) - function arguments;
FUNCTION_DECL 'd' variable declaration DECL_SOURCE_FILE(t) - source file; DECL_NAME(t) - variable name;
ARRAY_TYPE 't' array type TREE_TYPE(t) - type of an array element; TYPE_DOMAIN(t) - type of index;
DECL_STMT 'e' variable declaration TREE_OPERAND(t,0) - variable; DECL_INITIAL(TREE_OPERAND(t,1)) - initial value;

In addition to the tree code that defines the type of the tree, a number of operands different for each tree type are available. For example, an assignment expression has two operands which correspond to the left and right sides of the expression. Macro TREE_OPERAND is used to refer to the operands. Macro IDENTIFIER_POINTER is used to find the name of an identifier that an IDENTIFIER_NODE tree represents. The following table presents several tree nodes, their purpose, and their operands.

Each tree has a type that corresponds to the type of C expression that it represents For example, the type of the MODIFY_EXPR node is the type of the left-hand side operand. NOP_EXPR and CONVERT_EXPR trees are used for type casting.

Tree NULL_TREE is equivalent to NULL. Function debug_tree() prints out the tree to stderr.

When a new identifier is lexed, it is added to the pool of strings that GCC maintains. The tree code of an identifier is IDENTIFIER_NODE. When the same identifier is lexed again, the same tree node is returned. Function get_identifier() returns the tree node for an identifier.

Parsing variable declaration.

A new variable declaration is processed in a number of function calls. First, function start_decl() is called with the name of the declaration, its type as returned by the lexer, and its attributes. The function calls grokdeclarator() that checks the type and argument nodes and returns a tree with the code appropriate for the declaration: VAR_DECL for variables, TYPE_DECL for types, etc. The declarartion is then added to the scope. A scope contains all declarations created within a function, but does not contain global declarations. There is also the global scope that contains global declarations. When a function is parsed, its declarations are attached to its body as a BLOCK node. When a declaration is created, the identifier node is associated with the declaration node using IDENTIFIER_SYMBOL_VALUE. Function lookup_name() returns the declaration for a given identifier. When a declaration leaves the scope the tree attributes C_DECL_INVISIBLE is asserted.

A symbol table is not maintained in GCC. Instead, the compiler uses the pool of identifiers and C_DECL_INVISIBLE attribute. Language hook lang_hooks.decls.getdecls() returns the variables in the scope chained together.

Functions start_init() and finish_init() are called for an initialized declaration. Function finish_decl() finalizes the declaration. For an array declaration it computes the size of an initialized array. Then function layout_decl() is called. It computes the size and alignment of the declaration.

Parsing a function depends on the presence of its body. A function declaration is parsed using the same functions as those used for variable declarations. For a function definition, function start_function() is called. Then the compiler parses the body of the function. When the function ends function finish_function() is called.

Functions start_decl() and start_function() take declaration's attributes parameter as one of their arguments. The attributes are described in the GCC Manual. Attributes are extensions of GNU implementation of C. The following table presents a few of them and explains their purpose:

Function attributes
Attribute Explanation
constructor call function automatically before main()
destructor call function automatically after main()
alias alias to another function

For each type of C statement there is a function that constructs a tree node of the corresponding type. For example, function build_function_call() builds a CALL_EXPR node for a function call. It takes the identifier of the function name and the arguments as its parameters. The function finds the function declaration using lookup_name() and type casts the arguments using default_conversion().

After a function has been parsed, use macro DECL_SAVED_TREE to access its body. It is represented with a BIND_EXPR tree which binds local variables to statements. BIND_EXPR_VARS gives a chain of declared variables. BIND_EXPR_BODY returns a tree of STATEMENT_LIST type.

The following API allows one to traverse a statement list and to manipulate it:

Tree construction API
Function Purpose
tsi_start(stmt_list) get an iterator pointing at list head
tsi_last(stmt_list) get an iterator pointing at list tail
tsi_end_p(iter) is end of list?
tsi_stmt(iter) get current element
tsi_split_statement_list_before(&iter) split elements at iter
tsi_link_after(&iter, stmt, mode) link chain after iter
tsi_next(&iter) next element of the list
append_to_statement_list(tree, &stmt_list) append tree to stmt_list

It is possible to instrument function prolog/epilog at this level as demonstrated in gimplify_function_tree(). To add a statement to function epilog, use TRY_FINALLY_EXPR tree. Its first operand are the old statements, the second argument is the epilog statements. This type of treet instructs the following passes to execute the statements when a common exit basic block of a function is created.

To instrument function prolog, prepend the tree with the desired statements. Therefore, BIND_EXPR_BODY will have the prolog and the TRY_FINALLY_EXPR tree.

The ASTs are then converted into the SSA and eventually to the RTL representations. Whether the conversion occurs after parsing each function or when the whole file is parsed is controlled by the compiler option -funit-at-a-time. It is false by default.

Take home: GCC parser builds the AST representation of the source file. ASTs are built up of tree nodes. Each node has a code. Tree nodes correspond to the statements and expressions of C. Function debug_tree() prints out the tree.

From AST to GIMPLE edit

An AST is gimplified when gimplify_function_tree() is eventually called from finish_function().

GIMPLE representation is based on SIMPLE described in [2]

According to this paper, the goal is to represent the tree as basic statements:

GIMPLE trees
x=a binop b x=a x=cast b f(args)
*p=a binop b *p=a *p=cast b -
x=unop a x=*q x=&y x=f(args)
*p=unop a *p=*q *p=&y *p=f(args)

Temp variables are created as necessary in functions create_tmp_var() and declare_tmp_vars().

At this stage, GCC optimizes of complex conditional expressions, that is

 if (a || b) stmt;

is translated into

 if (a) goto L1;
 if (b) goto L1; else goto L2;

Also, each branch of a conditional expression is wrapped into STATEMENT_LIST tree.

From GIMPLE to RTL edit

Register Transfer Language represents an abstract machine with an infinite number of registers. Type rtx describes an instruction. Each RTL expression has a code and machine mode.

Similarly to ASTs, codes are grouped in a number of classes. They are defined in mode-classes.def.

Classes of RTL expressions
Classes Explanation
RTX_CONST_OBJ represent a constant object (e.g, CONST_INT)
RTX_OBJ represent an object (e.g, REG, MEM)
RTX_COMPARE comparison (e.g, LT, GT)
RTX_COMM_COMPARE commutative comparison (e.g, EQ, NE, ORDERED)
RTX_UNARY unary arithmetic expression (e.g, NEG, NOT)
RTX_COMM_ARITH commutative binary operation (e.g,, PLUS, MULT)
RTX_TERNARY non-bitfield three input operation (IF_THEN_ELSE)
RTX_BIN_ARITH non-commutative binary operation (e.g., MINUS, DIV)
RTX_MATCH something that matches in insns (e.g, MATCH_DUP)
RTX_AUTOINC autoincrement addressing modes (e.g. POST_DEC)
RTX_EXTRA everything else

Machine modes listed in file machmode.def specify a size and format of data at the machine level. At the syntax tree level, each ..._TYPE and each ..._DECL node has a machine mode which describes data of that type or the data of the variable declared.

A list of instructions is built when function is compiled. Function emit_insn() adds an instruction to the list. A variable declaration AST has its RTL already generated. Use DECL_RTL to access it. Function emit_cmp_and_jump_insns() outputs a conditional statement. emit_label() prints out a label. These functions chain instructions one after another. Macros PREV_INSN and NEXT_INSN are used to traverse the list.

It is possible to access the first and last instructions using first_insn and last_insn. get_insns() gives the first insn of the current sequence or current function.

Use debug_rtx() to print out an RTL instruction on the screen and function print_rtl() to print a list of RTL expressions.

A number of functions create the nodes. For example, gen_label_rtx() build a label. The most generic functions are located in the target-specific directory. For example, x86 architecture rtl generation files genrtl.c and genrtl.h are in ./host-i686-pc-linux-gnu.

From RTL to Object edit

Each target architecture has its description represented as struct gcc_target targetm. Default initializers are available in file targhooks.c

The backend generates the assembly code for the specified target platform. Function output_asm_insn() is called for each instruction written to the assembly file. Function final_start_function() generates the prolog of the function before it is saved to the assembly file.

Lowering Passes edit

The processing of a function includes its lowering when a number of optimization passes are applied to it in function tree_lowering_passes(). As a result of lowering a function its control-flow graph is generated. A subsequent call to function cgraph_create_edges() uses basic block information to augment edges of the callgraph with invocations that the current function performs. References to functions yet undefined are saved in function record_references().

All lowering passes
Name Meaning
remove_useless_stmts N/A
mudflap_1 narrow-pointer bounds-checking by tree rewriting
lower_cf Lowers GIMPLE into unstructured form
pass_lower_eh Exception handling semantics and decomposition for trees
pass_build_cfg Create basic blocks
pass_lower_complex_O0 complex operation lowering without optimization
pass_lower_vector Lower vector operations to scalar operations
pass_warn_function_return Emit return warnings
pass_early_tree_profile set up hooks for tree-based profiling

Switch Statement edit

Let us consider how a switch statement is translated from the source code to GIMPLE to RTL.

Function c_parser_switch_statement() is called when the statement is encountered in the source code. A typical switch statement has a number of case statements that might have breaks. Therefore, the parser has c_break_label tree that marks the place where switch ends. The function parses the body of the statement and generates LABEL_EXPR tree for the break label if at least one break was found. Function c_finish_case() attaches the body to the SWITCH_EXPR tree as one of its operands. In addition, this tree has two other operands: switch condition and switch labels. The operands are accessed using macro SWITCH_COND(), SWITCH_BODY(), and SWITCH_LABELS(). The labels are not filled in at the parse time.

A switch statement is gimplified in function gimplify_switch_expr(). The idea is to separate the body from the decision part and generate switch labels to make it possible to redirect execution to the appropriate case after verifying the condition. We will consider the case when default label exists. This function has two pointers to the statement list: pre_p which is the list of side effects and expr_p which is the statement itself.

The body of the switch is gimplified in gimplify_to_stmt_list(). Case labels are saved in field case_labels of variable struct gimplify_ctx gimplify_ctxp. Then the function creates a TREE_VEC of labels and initializes them with the corresponding case label. The TREE_VEC is assigned to SWITCH_LABELS operand of switch statement which is then appended to the pre_p list. The original statement is then overwritten with the SWITCH_BODY using expr_p pointer. Finally, the SWITCH_BODY operand of the switch statement in the side effects list is wiped out so that it contains labels only.

From this point on it is clear that compiler tries to represent the original statement using a jump table which maps each possible index value to the address of the corresponding case. Function expand_case() implements this idea. It generates a table_label at which jump instructions for each possible index value are generated. Then function try_tablejump() is called which expands index tree into index rtl and invokes do_tablejump(). This function generates absolute index rtl which combines the base address table_label and index offset. It emits jump instruction to the proper entry of the jump table afterwards. The execution continues in function expand_case(). The jump table is generated using SWITCH_LABELS:

labelvec[i] = gen_rtx_LABEL_REF (Pmode, label_rtx (n->code_label));

A number of jump instructions are emitted finally:

if (CASE_VECTOR_PC_RELATIVE || flag_pic)
  emit_jump_insn (gen_rtx_ADDR_DIFF_VEC (CASE_VECTOR_MODE,
                                         gen_rtx_LABEL_REF (Pmode, table_label),
                                         gen_rtvec_v (ncases, labelvec),
                                         const0_rtx, const0_rtx));

Take home: The backend generates the assembly code for the specified target platform.
  1. ^
  2. ^ L. Hendren, C. Donawa, M. Emami, G. Gao, Justiani, and B. Sridharan. Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations. In Proc. of the 5th International Workshop on Languages and compilers for Parallel Computing, 1992.