GNU C Compiler Internals/GEM Framework 3 4

Hooks edit

GEM framework is designed to facilitate development of compiler extensions. The idea of GEM is similar to the idea of Linux Security Modules (LSM), a project that defines hooks throughout Linux kernel that allow one to enforce a security policy.

GEM defines a number of hooks throughout GCC's source code. It is implemented as a patch to GCC. With GEM, a compiler extension is developed as a stand-alone program. It is compiled into a dynamically-linked module which is specified as the command line argument when GCC is invoked. GCC loads the module and calls its initialization function. The module then registers its hooks that are call-back functions in GCC.

In addition to the compiler hooks, GEM provides macros and functions that simplify extension development. In this chapter we will first introduce the hooks that GEM framework adds to GCC. Then we describe the typical issues in extension programming.

The project home page is at http://research.alexeysmirnov.name/gem

GEM adds several hooks throughout GCC source code. New hooks are added to GEM as necessary.

  • Hook gem_handle_option to function handle_option() which processes each command line option. The hook takes the current option as its argument. If the hook returns value GEM_RETURN then GCC ignores the option.
  • Hook gem_c_common_nodes_and_builtins which is called after all standard types are created. The GCC extension can create additional types.
  • Hook gem_macro_name allows one to save the name of the macro being defined. Another GEM hook gem_macro_def is called when the macro definition is parsed. Using the macro name of the new macro definition it is possible to re-define the macro. This hook is added to function create_iso_definition().
  • Hooks gem_start_decl and gem_start_function are called when a function or variable declaration/definition starts.
  • Hook gem_build_function_call allows one to modify the name and the arguments of a function call.
  • Hook gem_finish_function is inserted to finish_function() which is called from from grammar file. The compiler extension receives the function body of the function before it is translated into RTL.
  • Hooks gem_output_asm_insn and gem_final_start_function are added to function output_asm_insn() which is called for each instruction of the assembly code and function final_start_function() called when the assembly code is written to the file, respectively. The former hook receives the text that is written to the file which allows it to modify the output. The latter hook can modify function's prolog.
Take home: GEM hooks are defined mostly at the AST level. A few hooks are defined at the assembly level. The new hooks are added as necessary.

Traversing an AST edit

When the function's AST is constructed one can instrument it. GEM's gem_finish_function hook receives the AST of a function. The idea is to traverse the AST and instrument the AST nodes as necessary. Function walk_tree() takes the AST, the callback function, the optional data, NULL by default, and the walk_subtrees parameter, NULL by default. The callback function is called for each node of the AST before the operands are traversed. If the callback function modifies the walk_subtree() variable then the operands are not processed.

The following code demonstrates the idea:

  static tree walk_tree_callback(tree *tp, int *walk_subtrees, void *data) {
    tree t=*tp;
    enum tree_code code = TREE_CODE(t);
    switch (code) {
    case CALL_EXPR:
      instrument_call_expr(t);
      break;
    case MODIFY_EXPR:
      instrument_modify_expr(t);
      break;
    }
  }
  walk_tree(&t_body, walk_tree_callback, NULL, NULL);
Take home: Function walk_tree() traverses an AST applying user-defined callback function to each tree node.

Instrumenting an AST edit

In this section we describe functions that create new tree nodes and how to add the new nodes to an AST.

The walk_tree callback function can instrument the AST. Functions build1() and build() construct new tree nodes. The former function takes one operand, the latter one takes more then one operand. The following code computes the address of the operand, same as '&' C operator:

  t = build1(ADDR_EXPR, TREE_TYPE(t), t);

The following example refers to an array element arr[0]:

  t = build(ARRAY_REF, integer_type_node, arr, integer_zero_node);

The following example builds an integer constant:

  t = build_int_2(value, 0);

Building string constant is more difficult. The following example demonstrates the idea:

  tree gem_build_string_literal(int len, const char *str) {
     tree t, elem, index, type;
     t = build_string (len, str);
     elem = build_type_variant (char_type_node, 1, 0);
     index = build_index_type (build_int_2(len-1, 0));
     type = build_array_type (elem, index);
     T_T(t) = type;
     TREE_READONLY(t)=1;
     TREE_STATIC(t)=1;
     TREE_CONSTANT(t)=1;
     type=build_pointer_type (type);
     t = build1 (ADDR_EXPR, type, t);
     t = build1 (NOP_EXPR, build_pointer_type(char_type_node), t);
     return t;
  }

To build a function call one needs to build the list of arguments. Then the CALL_EXPR is constructed:

  t_arg1 = build_tree_list(NULL_TREE, arg1);
  t_arg2 = build_tree_list(NULL_TREE, arg2);
  ...
  TREE_CHAIN(t_arg1)=t_arg2;
  ...
  TREE_CHAIN(t_argn)=NULL_TREE;
  t_call = build_function_call(t_func_decl, t_arg1);

The constructed tree node is added to the AST as a statement or as an expression. A statement is added to the linked list of statements using TREE_CHAIN:

  t_stmt=build_stmt (EXPR_STMT, t_call);
  TREE_CHAIN(t_cur)=t_stmt;

Adding a tree node as an expression is the same as using '()' C operator. The new expression is added as the first argument of the operator. The result of the operator is the result of its second argument:

  t_res = build(COMPOUND_EXPR, TREE_TYPE(t), t_call, t);

If you want to add the call after t then you need to build a compound statement. This is equivalent to using curly brackets in C:

  static tree gen_start_scope() {
    t_hdr = build_stmt (COMPOUND_STMT, NULL_TREE);
    ss = build_stmt (SCOPE_STMT, NULL_TREE);
    SCOPE_BEGIN_P(ss)=1;
    SCOPE_PARTIAL_P(ss)=0;
    TREE_CHAIN(t_hdr)=NULL_TREE;
    TREE_OPERAND(t_hdr, 0)=ss;
    return t_hdr;
  }
  static tree gen_end_scope() {
    ss = build_stmt (SCOPE_STMT, NULL_TREE);
    SCOPE_BEGIN_P(ss)=0;
    SCOPE_PARTIAL_P(ss)=0;
    return ss;
  }
  static tree scope_stmt(tree t) {
    t_res=gen_start_scope();
    TREE_CHAIN(TREE_OPERAND(t_res, 0)) = t;
    while (TREE_CHAIN(t)) t = TREE_CHAIN(t);
    TREE_CHAIN(t)=gen_end_scope();
    return t_res;
  }
Take home: Use GCC's functions to build new nodes and add them to the AST, either as a statement or as an expression.

When to Instrument edit

In this section we will describe when each of GEM hooks is used.

  • Add new function and type declarations in hook gem_c_common_nodes_and_builtins.
  • Instrument an AST after it is parsed in hook gem_finish_function.

Function Prolog/Epilog edit

The assembly instructions are written to the assembly file:

  #define OUTPUT_ASM_INST(inst) \
    p=inst;                     \
    putc('\t', asm_out_file);   \
    while (*p++) putc(p, asm_out_file);  \
    putc('\n', asm_out_file);   
  OUTPUT_ASM_INST("pushl %%eax");
  OUTPUT_ASM_INST("popl %%eax");
Take home: Assembly instructions are added to function prolog and epilog using hooks gem_output_asm_insn and gem_final_start_function.