GNU C Compiler Internals/GEM Framework 4 1

Hooks

The GEM framework is designed to facilitate the development of compiler extensions. Its idea is similar to that of Linux Security Modules (LSM), a project that defines hooks throughout the Linux kernel that allow one to enforce a security policy.

GEM defines a number of hooks throughout GCC's source code. It is implemented as a patch to GCC. With GEM, a compiler extension is developed as a stand-alone program. It is compiled into a dynamically linked module, which is specified as a command line argument when GCC is invoked. GCC loads the module and calls its initialization function. The module then registers its hooks, which GCC invokes as callback functions.
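The registration step can be pictured with a small stand-alone model. The sketch below is not GEM's actual API: struct hook_table_model, model_module_init() and model_compile_function() are hypothetical names chosen for illustration, and the "module" is linked statically instead of being loaded as a shared object.

```c
#include <stddef.h>

/* Hypothetical hook table: one function pointer per hook, NULL when unset. */
struct hook_table_model {
  void (*finish_function)(const char *fn_name);
};

static struct hook_table_model hooks_model = { NULL };
static int functions_seen = 0;

/* Extension side: the callback the module wants the compiler to run. */
static void count_function(const char *fn_name) {
  (void)fn_name;
  functions_seen++;
}

/* Extension side: initialization function, called once after loading. */
static void model_module_init(struct hook_table_model *table) {
  table->finish_function = count_function;
}

/* Compiler side: call the hook only if a module registered one. */
static void model_compile_function(const char *fn_name) {
  if (hooks_model.finish_function)
    hooks_model.finish_function(fn_name);
}
```

Before model_module_init() runs, the compiler side finds a NULL pointer and behaves as if no extension were present. After it runs, every compiled function is reported to the extension.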

In addition to the compiler hooks, GEM provides macros and functions that simplify extension development. In this chapter we first introduce the hooks that the GEM framework adds to GCC. Then we describe typical issues in extension programming.

The project home page is at http://research.alexeysmirnov.name/gem

GEM adds several hooks throughout the GCC source code. New hooks are added to GEM as necessary.

  • Hook gem_handle_option is added to function handle_option(), which processes each command line option. The hook takes the current option as its argument. If the hook returns the value GEM_RETURN, then GCC ignores the option.
  • Hook gem_c_common_nodes_and_builtins is called after all standard types are created. The GCC extension can create additional types.
  • Hook gem_macro_name allows one to save the name of the macro being defined. Another GEM hook, gem_macro_def, is called when the macro definition is parsed. Knowing the name of the macro being defined, the extension can redefine the macro. This hook is added to function create_iso_definition().
  • Hooks gem_start_decl and gem_start_function are called when a function or variable declaration/definition starts.
  • Hook gem_build_function_call allows one to modify the name and the arguments of a function call.
  • Hook gem_finish_function is inserted into finish_function(), which is called from the grammar file. The compiler extension receives the body of the function before it is translated into RTL.
  • Hook gem_output_asm_insn is added to function output_asm_insn(), which is called for each instruction of the assembly code. Hook gem_final_start_function is added to function final_start_function(), which is called when the assembly code is written to the file. The former hook receives the text that is written to the file, which allows it to modify the output. The latter hook can modify the function's prolog.
Take home: GEM hooks are defined mostly at the AST level. A few hooks are defined at the assembly level. The new hooks are added as necessary.
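The way an option hook can make the compiler skip an option may be sketched as follows. This is a toy model under stated assumptions: the enum values stand in for GEM's return codes, the option prefix "-fmyext-" is made up, and model_handle_option() and model_process_options() are hypothetical names, not GEM or GCC functions.

```c
#include <string.h>

/* Hypothetical stand-ins for the hook's return codes. */
enum model_hook_result { MODEL_CONTINUE = 0, MODEL_RETURN = 1 };

/* Extension side: claim options with the made-up prefix "-fmyext-". */
static enum model_hook_result model_handle_option(const char *opt) {
  if (strncmp(opt, "-fmyext-", 8) == 0)
    return MODEL_RETURN;   /* the compiler should ignore this option */
  return MODEL_CONTINUE;   /* let the compiler process it as usual */
}

/* Compiler side: returns how many options the compiler itself handled. */
static int model_process_options(int argc, const char *const *argv) {
  int handled = 0;
  for (int i = 0; i < argc; i++) {
    if (model_handle_option(argv[i]) == MODEL_RETURN)
      continue;            /* option was consumed by the extension */
    handled++;
  }
  return handled;
}
```

With this arrangement an extension can define its own command line switches without the compiler rejecting them as unknown.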

Traversing an AST

Once a function's AST has been constructed, one can instrument it. GEM's gem_finish_function hook receives the AST of a function. The idea is to traverse the AST and instrument the AST nodes as necessary. Function walk_tree() takes a pointer to the AST, the callback function, optional data that is passed on to the callback (NULL here), and an optional set of already visited nodes (NULL here). The callback function is called for each node of the AST before the node's operands are traversed. If the callback function sets *walk_subtrees to zero, then the operands of that node are not processed.

The following code demonstrates the idea:

  static tree walk_tree_callback(tree *tp, int *walk_subtrees, void *data) {
    tree t=*tp;
    enum tree_code code = TREE_CODE(t);
    switch (code) {
    case CALL_EXPR:
      instrument_call_expr(t);
      break;
    case MODIFY_EXPR:
      instrument_modify_expr(t);
      break;
    default:
      break;
    }
    return NULL_TREE; /* a non-NULL return value would stop the traversal */
  }
  walk_tree(&t_body, walk_tree_callback, NULL, NULL);
Take home: Function walk_tree() traverses an AST, applying a user-defined callback function to each tree node.
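The traversal order and the role of the walk_subtrees flag can be tried outside of GCC with a toy tree. The sketch below is an illustration only: struct model_node, model_walk() and count_calls() are hypothetical names, and the node layout is far simpler than GCC's tree type.

```c
#include <stddef.h>

enum model_code { M_CALL, M_MODIFY, M_CONST };

/* Toy AST node with at most two operands. */
struct model_node {
  enum model_code code;
  struct model_node *op[2];
};

/* Callback type: return non-NULL to stop the walk;
   set *walk_subtrees to 0 to skip the operands of the current node. */
typedef struct model_node *(*model_walk_fn)(struct model_node **np,
                                            int *walk_subtrees, void *data);

static struct model_node *model_walk(struct model_node **np,
                                     model_walk_fn fn, void *data) {
  int walk_subtrees = 1;
  struct model_node *result;
  if (*np == NULL) return NULL;
  result = fn(np, &walk_subtrees, data);   /* visit node before its operands */
  if (result) return result;
  if (!walk_subtrees) return NULL;
  for (int i = 0; i < 2; i++) {
    result = model_walk(&(*np)->op[i], fn, data);
    if (result) return result;
  }
  return NULL;
}

/* Example callback: count calls, do not descend into call operands. */
static struct model_node *count_calls(struct model_node **np,
                                      int *walk_subtrees, void *data) {
  if ((*np)->code == M_CALL) {
    ++*(int *)data;
    *walk_subtrees = 0;
  }
  return NULL;
}
```

Because count_calls() clears the flag at every call node, a call nested inside the arguments of another call is not counted. Leaving the flag untouched would visit it as well.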

Instrumenting an AST

In this section we describe functions that create new tree nodes and how to add the new nodes to an AST.

Lookup of a Declaration in the Symbol Table

 void gem_find_symtab(tree *t_var, char *name) {
   tree t_ident = get_identifier(name);
   if (t_ident) *t_var = lookup_name(t_ident); else *t_var=NULL_TREE;
 }
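The two-step lookup above first obtains the unique identifier node for a name and then asks which declaration, if any, is bound to it. A minimal stand-alone model of that split is sketched below. The table layout and the names model_get_identifier() and model_lookup_name() are assumptions made for illustration and do not reflect how GCC stores identifiers.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_IDS 16

/* Toy identifier: an interned name plus the declaration bound to it, if any. */
struct model_ident {
  const char *name;      /* not copied: the caller keeps the string alive */
  const void *binding;   /* NULL when the name is not declared */
};

static struct model_ident model_ids[MODEL_MAX_IDS];
static int model_id_count = 0;

/* Return the unique identifier for name, creating it on first use.
   Returns NULL only when the toy table is full. */
static struct model_ident *model_get_identifier(const char *name) {
  for (int i = 0; i < model_id_count; i++)
    if (strcmp(model_ids[i].name, name) == 0)
      return &model_ids[i];
  if (model_id_count == MODEL_MAX_IDS) return NULL;
  model_ids[model_id_count].name = name;
  model_ids[model_id_count].binding = NULL;
  return &model_ids[model_id_count++];
}

/* Return the declaration bound to the identifier, or NULL. */
static const void *model_lookup_name(const struct model_ident *id) {
  return id ? id->binding : NULL;
}
```

In this model, as in the function above, an identifier can exist without a declaration, so the caller has to check the lookup result before using it.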

Building Tree Nodes

The walk_tree callback function can instrument the AST. Functions build1() and build() construct new tree nodes. The former function takes one operand; the latter takes more than one operand. The following code computes the address of the operand, like the C operator '&':

  t = build1(ADDR_EXPR, build_pointer_type(TREE_TYPE(t)), t);

The following example refers to an array element arr[0]:

  t = build(ARRAY_REF, integer_type_node, arr, integer_zero_node);

The following example builds an integer constant:

  t = build_int_cst(NULL_TREE, 123);

Building a string constant is more difficult. The following example demonstrates the idea:

  tree gem_build_string_literal(int len, const char *str) {
     tree t, elem, index, type;
     t = build_string (len, str);
     /* element type: const char */
     elem = build_type_variant (char_type_node, 1, 0);
     /* array type const char[len] with indices 0 .. len-1 */
     index = build_index_type (build_int_cst(NULL_TREE, len-1));
     type = build_array_type (elem, index);
     TREE_TYPE(t) = type;
     TREE_READONLY(t)=1;
     TREE_STATIC(t)=1;
     TREE_CONSTANT(t)=1;
     /* take the address of the array, then convert it to char * */
     type=build_pointer_type (type);
     t = build1 (ADDR_EXPR, type, t);
     t = build1 (NOP_EXPR, build_pointer_type(char_type_node), t);
     return t;
  }

To build a function call one needs to find the function's declaration and build the list of arguments. Then the CALL_EXPR is constructed:

  gem_find_symtab(&t_func_decl, "func");
  t_arg1 = build_tree_list(NULL_TREE, arg1);
  t_arg2 = build_tree_list(NULL_TREE, arg2);
  ...
  TREE_CHAIN(t_arg1)=t_arg2;
  ...
  TREE_CHAIN(t_argn)=NULL_TREE;
  t_call = build_function_call(t_func_decl, t_arg1);
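The argument list is an ordinary singly linked chain whose last element points to nothing. The linking pattern can be reproduced with a toy list as sketched below. struct model_list, model_chain_args() and model_list_length() are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Toy list cell: a value and a link to the next cell. */
struct model_list {
  int value;
  struct model_list *chain;
};

/* Link the cells of an array into one list and return its head
   (NULL if n == 0). */
static struct model_list *model_chain_args(struct model_list *cells, size_t n) {
  if (n == 0) return NULL;
  for (size_t i = 0; i + 1 < n; i++)
    cells[i].chain = &cells[i + 1];
  cells[n - 1].chain = NULL;   /* the last argument terminates the list */
  return &cells[0];
}

/* Count the cells reachable from l. */
static size_t model_list_length(const struct model_list *l) {
  size_t len = 0;
  for (; l; l = l->chain) len++;
  return len;
}
```

Only the head of the chain needs to be passed on, just as only t_arg1 is handed to build_function_call() above.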

If you want to build a list of statements { stmt1; stmt2; ... }, then you need to use function append_to_statement_list():

  tree list=NULL_TREE;
  for (i=0; i<num_stmt; i++) {
    BUILD_FUNC_CALL1(t_call, t_send, t_arr[i], NULL_TREE);
    append_to_statement_list(t_call, &list);
  }

Adding Nodes to a Tree

GCC 4.1 has an interface, implemented in file tree-iterator.c, that allows one to add a chain of nodes into another chain of nodes. Functions tsi_start() and tsi_last() create a tree statement iterator that points to the first or the last statement in the list, respectively. Functions tsi_link_before() and tsi_link_after() link a statement before or after the statement the iterator points to. There is also function append_to_statement_list(), which adds a node to the end of a list. If the list argument is NULL_TREE, then a new statement list is allocated.
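The effect of linking before or after a position can be seen in a small doubly linked list. This is a sketch under simplifying assumptions: the position is a plain node pointer instead of an iterator object, and model_link_before() and model_link_after() are hypothetical names, not the tree-iterator.c functions.

```c
#include <stddef.h>

struct model_stmt {
  int id;
  struct model_stmt *prev, *next;
};

struct model_stmt_list {
  struct model_stmt *head, *tail;
};

/* Insert s immediately before pos; pos == NULL appends at the end. */
static void model_link_before(struct model_stmt_list *l,
                              struct model_stmt *pos, struct model_stmt *s) {
  s->next = pos;
  s->prev = pos ? pos->prev : l->tail;
  if (s->prev) s->prev->next = s; else l->head = s;
  if (pos) pos->prev = s; else l->tail = s;
}

/* Insert s immediately after pos; pos == NULL prepends at the front. */
static void model_link_after(struct model_stmt_list *l,
                             struct model_stmt *pos, struct model_stmt *s) {
  s->prev = pos;
  s->next = pos ? pos->next : l->head;
  if (s->next) s->next->prev = s; else l->tail = s;
  if (pos) pos->next = s; else l->head = s;
}
```

Linking before the head of the list is the pattern one would use to run instrumentation code at function entry, and linking after the tail places it at the end of the body.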

Building Function and Variable Declarations

A global declaration is added in hook gem_c_common_nodes_and_builtins(). In the following example we build a structure type and create a global variable of this type. The structure has a field of type unsigned int and a function pointer field.

  t_log = make_node(RECORD_TYPE);
  decl_chain = NULL_TREE;
  field_decl = build_decl(FIELD_DECL, get_identifier("addr"), unsigned_type_node);
  TREE_CHAIN(field_decl)=decl_chain;
  decl_chain=field_decl;
  DECL_FIELD_CONTEXT(decl_chain) = t_log;
  ...
  t_func_type = build_function_type_list(void_type_node, unsigned_type_node, NULL_TREE);
  field_decl = build_decl(FIELD_DECL, get_identifier("add_addr"), build_pointer_type(t_func_type));
  TREE_CHAIN(field_decl)=decl_chain;
  decl_chain=field_decl;
  DECL_FIELD_CONTEXT(decl_chain) = t_log;
  ...
  TYPE_FIELDS(t_log) = nreverse(decl_chain);
  layout_type(t_log);
  pushdecl(build_decl(TYPE_DECL, get_identifier("log_t"), t_log));
  decl = build_decl(VAR_DECL, get_identifier("log"), build_pointer_type(t_log));
  DECL_EXTERNAL(decl)=1;
  pushdecl(decl);
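The example prepends each new field to decl_chain, so the chain ends up in reverse declaration order, and the final nreverse() call restores the source order. The same pattern is sketched below on a toy list. struct model_field, model_push_field() and model_nreverse() are hypothetical names for this illustration.

```c
#include <stddef.h>

struct model_field {
  const char *name;
  struct model_field *chain;
};

/* Prepend a field to the chain and return the new head. */
static struct model_field *model_push_field(struct model_field *chain,
                                            struct model_field *f) {
  f->chain = chain;
  return f;
}

/* Reverse the chain in place and return the new head. */
static struct model_field *model_nreverse(struct model_field *chain) {
  struct model_field *prev = NULL;
  while (chain) {
    struct model_field *next = chain->chain;
    chain->chain = prev;
    prev = chain;
    chain = next;
  }
  return prev;
}
```

Prepending is a constant-time operation, which is why building the chain backwards and reversing it once at the end is a convenient idiom.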

When to Instrument

In this section we describe when each of the GEM hooks is used.

  • Add new function and type declarations in hook gem_c_common_nodes_and_builtins.
  • Instrument an AST after it is parsed in hook gem_finish_function.
  • Modify attributes of a declaration in hooks gem_start_decl and gem_finish_decl. For example, suppose we would like to replace a local array declaration char arr[10] with a heap array char *arr=(char*)malloc(10):
 void l2h_start_decl(void *p_decl, void *p_declspecs, int initialized, void *p_attr) {
   struct c_declarator *decl = *((struct c_declarator**)p_decl);
   if (current_function_decl == NULL_TREE) return;
   if (decl->kind == cdk_array) {
     decl->kind = cdk_pointer;
     decl->u.pointer_quals = 0;
   }
 }
 void l2h_finish_decl(tree decl, tree *init, tree spec) {
   ...
   gem_find_symtab(&t_malloc, "malloc");
   BUILD_FUNC_CALL1(t_call, t_malloc, build_int_cst(NULL_TREE, size), NULL_TREE);
   *init = build1(NOP_EXPR, build_pointer_type(char_type_node), t_call);
   DECL(decl) = build_int_cst(NULL_TREE, 0); // if this field is NULL the init is ignored
 }
  • Replace a function call with a call to a proxy function in hook gem_build_function_call.
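At the source level, the local-to-heap rewrite described above corresponds to the following pair of functions. This is a hand-written sketch of the intended effect and not output of the extension. The function names are hypothetical, and the NULL check and the free() call are additions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Before: the buffer lives in the function's stack frame. */
static size_t fill_local(void) {
  char arr[10];
  strcpy(arr, "gem");
  return strlen(arr);
}

/* After: the declaration has become a pointer initialized by malloc(). */
static size_t fill_heap(void) {
  char *arr = (char *)malloc(10);
  size_t len;
  if (arr == NULL) return 0;   /* allocation failure is handled only in this sketch */
  strcpy(arr, "gem");
  len = strlen(arr);
  free(arr);                   /* heap memory must be released explicitly */
  return len;
}
```

Code that only indexes the buffer behaves the same in both versions. Code that applies sizeof to arr does not: it yields 10 for the array and the size of a pointer for the heap version, so such uses need separate treatment.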

Function Prolog/Epilog

The assembly instructions are written to the assembly file:

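  #define OUTPUT_ASM_INST(inst)                \
    do {                                       \
      const char *p = (inst);                  \
      putc('\t', asm_out_file);                \
      while (*p) putc(*p++, asm_out_file);     \
      putc('\n', asm_out_file);                \
    } while (0)
  /* the text is written verbatim, so registers take a single '%' */
  OUTPUT_ASM_INST("pushl %eax");
  OUTPUT_ASM_INST("popl %eax");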
Take home: Assembly instructions are added to function prolog and epilog using hooks gem_output_asm_insn and gem_final_start_function.
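The output step can be exercised outside of GCC by writing to an ordinary FILE stream. The sketch below is an illustration under stated assumptions: model_emit_insn() and model_emit_wrapped() are hypothetical helpers, the stream stands in for asm_out_file, and the instructions are x86 AT&T syntax as in the example above.

```c
#include <stdio.h>

/* Write one instruction as a tab-indented line; returns 0 on success. */
static int model_emit_insn(FILE *out, const char *insn) {
  if (putc('\t', out) == EOF) return -1;
  if (fputs(insn, out) == EOF) return -1;
  if (putc('\n', out) == EOF) return -1;
  return 0;
}

/* Emit a register save, one body instruction, and the matching restore. */
static int model_emit_wrapped(FILE *out, const char *body_insn) {
  if (model_emit_insn(out, "pushl %eax") != 0) return -1;
  if (model_emit_insn(out, body_insn) != 0) return -1;
  return model_emit_insn(out, "popl %eax");
}
```

Checking the return values of putc() and fputs() is a choice made for this sketch. It lets the caller notice a failed write instead of silently producing a truncated assembly file.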