ROSE Compiler Framework/outliner

Overview

Basic concept: outlining is the process of replacing a block of consecutive statements with a call to a new function containing those statements. Conceptually, outlining is the inverse of inlining.
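
To make the idea concrete, here is a hand-written C++ sketch. It is not actual outliner output, and all names in it are hypothetical. sumSquaresInline contains a loop; sumSquaresOutlined replaces that loop with a call to a new function, outlinedLoop, which receives the variables the loop used.

```cpp
#include <cassert>
#include <cstddef>

// Original form: the loop lives inside the function.
int sumSquaresInline(const int* data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i] * data[i];
    }
    return sum;
}

// Hand-written stand-in for an outlined function (hypothetical name).
// It holds the loop and receives the variables the loop touches.
// 'sum' is written by the loop, so it is passed by pointer.
static void outlinedLoop(int* sum, const int* data, std::size_t n) {
    assert(sum != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        *sum += data[i] * data[i];
    }
}

// Outlined form: the loop has been replaced by a function call.
int sumSquaresOutlined(const int* data, std::size_t n) {
    int sum = 0;
    outlinedLoop(&sum, data, n);
    return sum;
}
```

Both versions compute the same result; only the location of the loop body changes.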

Use: Outlining is widely used to generate kernel functions to be executed on CPUs and/or GPUs. It can:

  • help implement programming models such as OpenMP
  • support empirical tuning of a code portion by first generating a function from that code portion

ROSE provides a built-in translator called the AST outliner, which can outline a specified portion of code and generate a function from it.

Official documentation for the AST outliner is located in Chapter 37, "Using the AST Outliner", of the ROSE Tutorial (PDF).

There are two basic ways to use the outliner.

  • Command line method: use the outline command with options to specify outlining targets. There are two ways to specify which portion of code to outline:
    • use a special pragma to mark outline targets in the input program, and then call a high-level driver routine to process these pragmas
    • use abstract handle strings (detailed in Chapter 46 of the ROSE tutorial) as command line options
  • Function call method: call “low-level” outlining routines that operate directly on the AST nodes you want to outline

Installation

Please follow instructions at https://github.com/rose-compiler/rose/wiki/How-to-Set-Up-ROSE

If you are installing from source code, please check https://github.com/rose-compiler/rose/wiki/Install-Rose-From-Source

To install the tool only, type

  • make install -C tests/nonsmoke/functional/roseTests/astOutliningTests

The outliner tool will be installed as

  • ROSE_INST/bin/outline

Command line

The tool rose/bin/outline relies on 1) pragmas in the input code or 2) abstract handles specified as command line options to find the target code portion to be outlined.

  • Pragma: put #pragma rose_outline immediately before the code portion you want to outline in your input code
  • Abstract handle: -rose:outline:abstract_handle your_handle_string

Options

./outline --help | more


Outliner-specific options
Usage: outline [OPTION]... FILENAME...
Main operation mode:
        -rose:outline:preproc-only                     preprocessing only, no actual outlining
        -rose:outline:abstract_handle handle_string    using an abstract handle to specify an outlining target
        -rose:outline:parameter_wrapper                use an array of pointers to pack the variables to be passed
        -rose:outline:structure_wrapper                use a data structure to pack the variables to be passed
        -rose:outline:enable_classic                   use parameters directly in the outlined function body without transferring statement, C only
        -rose:outline:temp_variable                    use temp variables to reduce pointer dereferencing for the variables to be passed
        -rose:outline:enable_liveness                  use liveness analysis to reduce restoring statements if temp_variable is turned on
        -rose:outline:new_file                         use a new source file for the generated outlined function
        -rose:outline:output_path                      the path to store newly generated files for outlined functions, if requested by new_file. The original source file's path is used by default.
        -rose:outline:exclude_headers                  do not include any headers in the new file for outlined functions
        -rose:outline:use_dlopen                       use dlopen() to find the outlined functions saved in new files.It will turn on new_file and parameter_wrapper flags internally
        -rose:outline:copy_orig_file                   used with dlopen(): single lib source file copied from the entire original input file. All generated outlined functions are appended to the lib source file
        -rose:outline:enable_debug                     run outliner in a debugging mode
        -rose:outline:select_omp_loop                  select OpenMP for loops for outlining, used for testing purpose

Example use

  • outline test.cpp // outline code portions in test.cpp. These code portions are marked by the special rose_outline pragma
  • outline -rose:skipfinalCompileStep -rose:outline:new_file test.cpp // skip compiling the generated rose_? file, put the generated function into a new file

With abstract handles given on the command line, there is no need to insert pragmas into your input code:

  • outline -rose:outline:abstract_handle "ForStatement<position,12>" test3.cpp // outline the for loop located at line 12 of test3.cpp
  • outline -rose:outline:abstract_handle "FunctionDeclaration<name,initialize>::ForStatement<numbering,2>" test2.cpp // outline the 2nd for loop within a function named "initialize" in test2.cpp
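
The two handle strings above share a visible pattern: each segment has the form Kind<specifier,value>, and nested segments are joined with "::", outermost construct first. The following C++ sketch composes such strings. HandlePart and makeHandle are hypothetical helpers written for illustration, not part of ROSE, and the sketch only covers the two forms shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One "Kind<specifier,value>" segment of a handle string (hypothetical type).
struct HandlePart {
    std::string kind;       // e.g. "ForStatement"
    std::string specifier;  // e.g. "position", "numbering", "name"
    std::string value;      // e.g. "12", "2", "initialize"
};

// Join the segments with "::", outermost construct first (hypothetical helper).
std::string makeHandle(const std::vector<HandlePart>& parts) {
    assert(!parts.empty());
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += "::";
        }
        out += parts[i].kind + "<" + parts[i].specifier + "," + parts[i].value + ">";
    }
    return out;
}
```

The resulting string is what would be passed, quoted, after -rose:outline:abstract_handle.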

/home/liao6/workspace/masterDevClean/buildtree/tests/roseTests/astOutliningTests/outline -rose:outline:new_file -rose:outline:temp_variable -rose:outline:exclude_headers -rose:outline:abstract_handle 'ForStatement<numbering,1>' -c /home/liao6/workspace/masterDevClean/sourcetree/tests/roseTests/astOutliningTests/complexStruct.c

Programming API

You can build your own translators leveraging the outlining support in ROSE. The programming API is defined in

  • Header files: in the directory src/midend/programTransformation/astOutlining/
  • Namespace: Outliner

A few functions and options are provided:

  • Functions: Outliner::outline(), Outliner::isOutlineable()
  • Options

Internal control variables

Outliner.cc

namespace Outliner {
  //! A set of flags to control the internal behavior of the outliner
  bool enable_classic=false;
  // use a wrapper for all variables or one parameter for a variable or a wrapper for all variables
  bool useParameterWrapper=false;  // use an array of pointers wrapper for parameters of the outlined function
  bool useStructureWrapper=false;  // use a structure wrapper for parameters of the outlined function
  bool preproc_only_=false;  // preprocessing only
  bool useNewFile=false; // generate the outlined function into a new source file
  bool copy_origFile=false; // when generating the new file to store outlined function, copy entire original file to it.
  bool temp_variable=false; // use temporary variables to reduce pointer dereferencing
  bool enable_liveness =false;
  bool enable_debug=false; // 
  bool exclude_headers=false;
  bool use_dlopen=false; // Outlining the target to a separated file and calling it using a dlopen() scheme. It turns on useNewFile.
  std::string output_path=""; // default output path is the original file's directory
  std::vector<std::string> handles; //  abstract handles of outlining targets, given by command line option -rose:outline:abstract_handle for each

// DQ (3/19/2019): Suppress the output of the #include "autotuning_lib.h" since some tools will want to define their own supporting libraries and header files.
  bool suppress_autotuning_header = false; // when generating the new file to store outlined function, suppress output of #include "autotuning_lib.h".
};

Algorithm

Top-level driver

The outliner uses three methods to find which code portions to outline:

  • collectPragmas() for C/C++
  • collectFortranTarget() for Fortran
  • collectAbstractHandles() for targets given as abstract handles

The outline program's top-level driver is in PragmaInterface.cc:

  • Outliner::outlineAll (SgProject* project)
    • collectPragmas() for C/C++, collectFortranTarget() for Fortran, or collectAbstractHandles() for abstract handles
  • outline(SgPragmaDeclaration)
      • outline(SgStatement, func_name)
        • preprocess(s)
        • outlineBlock (s_post, func_name) // Transform.cc The key function here!!
    • deleteAST(SgPragmaDeclaration)

Eligibility Check

Check whether an SgNode is eligible for outlining.

  • Outliner::isOutlineable() src/Check.cc:251
    • checkType() // only specified SgNode types can be outlined; a list is maintained here
    • excludes SgVariableDeclaration
    • must be enclosed inside a function declaration
      • excludes template instantiation (member) function declarations
    • does not refer to hidden types ...

Preprocessing

There are two phases: preprocessing and actual transformation.

  • SgBasicBlock* s_post = preprocess (s);
    • SgStatement * processPragma (SgPragmaDeclaration* decl) // check if it's an outline pragma (#pragma rose_outline), return the next stmt if so.
    • Outliner::preprocess(SgStatement);
      • SgBasicBlock * Outliner::Preprocess::preprocessOutlineTarget (SgStatement* s)
        • normalizeVarDecl()
        • createBlock()
        • Outliner::Preprocess::transformPreprocIfs
        • Outliner::Preprocess::transformThisExprs
        • Outliner::Preprocess::transformNonLocalControlFlow
        • Outliner::Preprocess::gatherNonLocalDecls(); // duplicate function declarations here, e.g. test2005_179.C

Actual Transformation

Outliner::outline(stmt) first calls generateFuncName(s) to obtain a unique function name, then calls Outliner::outline (stmt, func_name):

  • Outliner::Transform::outlineBlock (s_post, func_name); // Transform.cc
    • Outliner::Transform::collectVars (s, syms); // collect variables to be passed
    • Outliner::generateFunction() //generate an outlined function, src/midend/programTransformation/astOutlining/GenerateFunc.cc
      • createFuncSkeleton()
      • moveStatementsBetweenBlocks (s, func_body); // move statements in the source BB into the function body
      • variableHandling (syms, func, vsym_remap); // append unwrapping statements
        • createParam() // create parameters
        • createUnpackDecl() // create unpacking statement: int local = parameter, from src/midend/programTransformation/astOutlining/GenerateFunc.cc
        • createPackStmt() // transfer local back to the parameter after all local calculation
      • remapVarSyms (vsym_remap, func_body); // variable substitution
    • insert() from Insert.cc // insert outlined function and its prototype
      • insertFriendDecls()
      • insertGlobalPrototype()
        • GlobalProtoInserter::insertManually ()
          • generatePrototype()
    • generateCall() // generate a call to the outlined function
    • ASTtools::replaceStatement () // replace the original portion with the call


Call stack

#0  Outliner::generateFunction (s=0x7fffe849a990, func_name_str="OUT__1__11770__", syms=..) at ../../../sourcetree/src/midend/programTransformation/astOutlining/GenerateFunc.cc:1283
#1  0x00007ffff65bd93e in Outliner::outlineBlock (s=0x7fffe849a990, func_name_str="OUT__1__11770__") at ../../../sourcetree/src/midend/programTransformation/astOutlining/Transform.cc:310
#2  0x00007ffff6589b09 in Outliner::outline (s=0x7fffe849a990, func_name="OUT__1__11770__") at ../../../sourcetree/src/midend/programTransformation/astOutlining/Outliner.cc:166
#3  0x00007ffff65907f9 in Outliner::outline (decl=0x7fffe87a2310) at ../../../sourcetree/src/midend/programTransformation/astOutlining/PragmaInterface.cc:141
#4  0x00007ffff65911b8 in Outliner::outlineAll (project=0x7fffebc38010) at ../../../sourcetree/src/midend/programTransformation/astOutlining/PragmaInterface.cc:355
#5  0x000000000040c84f in main (argc=12, argv=0x7fffffffae38) at ../../../../../../sourcetree/tests/nonsmoke/functional/roseTests/astOutliningTests/outline.cc:51


For C++ code blocks to be outlined, we have to check access to private members and add the necessary friend function declarations.

Call chain for creating the friend declarations, all inside Insert.cc:

  • Outliner::insert (SgFunctionDeclaration* func, SgGlobal* scope, SgBasicBlock* target_outlined_code )
    • insertFriendDecls (SgFunctionDeclaration* func, SgGlobal* scope, FuncDeclList_t& friends) // what is func here??
      • insertFriendDecl (const SgFunctionDeclaration* func, SgGlobal* scope, SgClassDefinition* cls_def)
        • generateFriendPrototype (const SgFunctionDeclaration* full_decl, SgScopeStatement* scope, SgScopeStatement* class_scope) Insert.cc


The algorithm of insertFriendDecls (SgFunctionDeclaration* func, SgGlobal* scope, FuncDeclList_t& friends), applied to the outlined function:

  • look for references to class private variables, using isProtPrivMember (func)
  • look for references to class private member functions, using isProtPrivMember (f_ref)
  • save the relevant class definitions into a list


If the outlined function is created in a new source file, the outliner also copies dependent declarations into that new source file. The relevant function is SageInterface::appendStatementWithDependentDeclaration(func,glob_scope,func_orig,exclude_headers);

The code that calls this function is at line 636 of the source file: https://github.com/rose-compiler/rose/blob/weekly/src/midend/programTransformation/astOutlining/Transform.cc

Variable handling

The variable handling process finds the variables used within a code block and decides how to pass each variable into and out of the outlined function. It relies on several program analyses to achieve the best results.

  • scope analysis (in CollectVars.cc): decides which variables should be passed as function parameters, based on the visibility of a variable's declaration with respect to the location of the outlined function. If the original declaration is visible to the outlined function, there is no need to pass the variable as a function parameter.
  • collectPointerDereferencingVar (in VarSym.cc): finds variables that should be accessed through pointer dereferencing in the outlined function: ASTtools::collectPointerDereferencingVarSyms(s,pdSyms);
  • side-effect analysis: SageInterface::collectReadOnlyVariables(s,readOnlyVars);
  • liveness analysis: SageInterface::getLiveVariables(liv, isSgForStatement(firstStmt), liveIns, liveOuts);

Scope analysis: notation for the variable sets and set operations used to determine which variables should be passed as function parameters (the scope analysis itself is in CollectVars.cc, as noted above)

  • U: the set of variables used within the code block (s) to be outlined
  • L: the local variables declared within s
  • U-L: the candidate variables to be passed as function parameters into/out of the outlined function
  • Q: variables defined within the function enclosing s that are visible at s, but not globally declared beyond the enclosing function. Global variables should not be passed as parameters if the outlined function is put within the same file.
  • (U-L) Intersect Q: the variables actually passed into the outlined function
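
The set operations above can be sketched with std::set. This is a minimal illustration that models variables as plain strings; the real analysis works on ROSE symbols, and the names VarSet, setMinus, setIntersect, and variablesToPass are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using VarSet = std::set<std::string>;

// a - b: elements of a that are not in b.
VarSet setMinus(const VarSet& a, const VarSet& b) {
    VarSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// Elements present in both a and b.
VarSet setIntersect(const VarSet& a, const VarSet& b) {
    VarSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// (U - L) Intersect Q: the variables to pass into the outlined function.
VarSet variablesToPass(const VarSet& U, const VarSet& L, const VarSet& Q) {
    VarSet passed = setIntersect(setMinus(U, L), Q);
    // Sanity check: a variable declared inside the block is never passed.
    for (const std::string& v : passed) {
        assert(L.count(v) == 0);
    }
    return passed;
}
```

For the for loop outlined in the standalone-tool example later on this page, U = {n, number, temp, j} and L = {j}, and all of n, number, and temp are declared in the enclosing main function, so the result is {n, number, temp}. This matches the three parameters of the outlined function shown there.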

ASTtools::collectPointerDereferencingVarSyms (): collects the variables to be replaced by pointer dereferencing (pdSyms) in the outlined function

  • pdSyms = useByAddressVars + Non-assignableVars + Struct/ClassVars
  • use-by-address analysis: collectVarRefsUsingAddress(s, varSetB); e.g. &a
  • un-assignable variable analysis: collectVarRefsOfTypeWithoutAssignmentSupport(s,varSetB); variables with types which are not assignable
  • class/struct variables: passing by reference is more efficient for them

calculateVariableRestorationSet(): decides whether some variables need to be restored from their clones at the end of the outlined function; only used when the variable cloning feature is turned on

  • check each function parameter
  • the parameter should be restored if isWritten && isLiveOut: it is changed within the outlined function and is used after the outlined function returns.
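
The restoration rule can be sketched as a filter over the parameter set. This is a simplified model, not the ROSE implementation: variables are plain strings, and "written" is approximated as "not in the read-only set", an assumption suggested by the (syms, readOnlyVars, liveOuts, restoreVars) argument list of calculateVariableRestorationSet. The name restorationSet is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

using VarSet = std::set<std::string>;

// Return the parameters whose clones must be copied back at the end of
// the outlined function: those that are written and live-out.
VarSet restorationSet(const VarSet& params,
                      const VarSet& readOnlyVars,
                      const VarSet& liveOuts) {
    VarSet restore;
    for (const std::string& v : params) {
        const bool isWritten = readOnlyVars.count(v) == 0;  // assumption: not read-only means written
        const bool isLiveOut = liveOuts.count(v) > 0;
        if (isWritten && isLiveOut) {
            restore.insert(v);
        }
    }
    // The result is always a subset of the parameters.
    assert(restore.size() <= params.size());
    return restore;
}
```

With parameters {a, b, c}, read-only set {a}, and live-out set {a, b}, only b is restored: a is never written, and c is written but not used afterwards.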

Transform.cc


/**
 * Major work of outlining is done here
 *  Preparations: variable collection
 *  Generate outlined function
 *  Replace outlining target with a function call
 *  Append dependent declarations,headers to new file if needed
 */
Outliner::Result
Outliner::outlineBlock (SgBasicBlock* s, const string& func_name_str)
{
...

  SgClassDeclaration* struct_decl = NULL;
  if (Outliner::useStructureWrapper)
  {
    struct_decl = generateParameterStructureDeclaration (s, func_name_str, syms, pdSyms, glob_scope);
    ROSE_ASSERT (struct_decl != NULL);
  }

  std::set<SgInitializedName*> restoreVars;
  calculateVariableRestorationSet (syms, readOnlyVars,liveOuts,restoreVars);

Advanced features

Some details of outlining can be specified using command line options or internal flags of the programming API.

List

  • wrap all variables into a data structure: Outliner::useStructureWrapper
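
As a rough picture of what a structure wrapper means, the hand-written C++ sketch below packs the variables used by a block into one struct and passes a pointer to that struct. It is not actual outliner output: the struct layout, field names, and function names are all hypothetical.

```cpp
#include <cassert>

// Hypothetical wrapper: one field per variable used by the outlined block.
struct OutlinedArgs {
    int* value_p;        // written by the block, so its address is stored
    const int* table_p;  // array read by the block
    const int* index_p;  // scalar read by the block
};

// Hand-written stand-in for an outlined function taking the wrapper.
static void outlinedBlock(OutlinedArgs* args) {
    assert(args != nullptr && args->value_p != nullptr);
    *args->value_p = args->table_p[*args->index_p] * 2;
}

// Caller side: fill the wrapper, then call the outlined function.
int caller(const int* table, int index) {
    int value = 0;
    OutlinedArgs args = {&value, table, &index};
    outlinedBlock(&args);
    return value;
}
```

Compared with the array-of-pointers wrapper (void **__out_argv) shown elsewhere on this page, a struct keeps the type of each field, so no casts are needed when unpacking.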

Variable cloning

Option to turn this feature on

  • -rose:outline:temp_variable use temp variables to reduce pointer dereferencing for the variables to be passed

The purpose of this feature is to reduce pointer dereferencing in a code block so that the code block can be optimized more easily later. The transformation uses a local variable to obtain the value, then uses that local variable in the computation. After that, the local variable's value is transferred back through the pointer.

Example

// input code
#include <stdio.h>
#include <stdlib.h>

const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};

int main (void)
{
// split variable declarations with their initializations, as a better demo for the outliner
  const char *soupName;
  int value; 
#pragma rose_outline
  {
    value = rand();
    soupName = abc_soups[value  % 10];
  }

  printf ("Here are your %d,  %s soup\n", value, soupName);
  return 0;
}



// without variable cloning

#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__12274__(void **__out_argv);

int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
  const char *soupName;
  int value;
  void *__out_argv1__12274__[2];
  __out_argv1__12274__[0] = ((void *)(&value));
  __out_argv1__12274__[1] = ((void *)(&soupName));
  OUT__1__12274__(__out_argv1__12274__);
  printf("Here are your %d,  %s soup\n",value,soupName);
  return 0;
}

static void OUT__1__12274__(void **__out_argv)
{
  const char **soupName = (const char **)__out_argv[1];
  int *value = (int *)__out_argv[0];
   *value = rand();                            // pointer dereferencing is used in the computation
   *soupName = abc_soups[ *value % 10];
}

// With variable cloning

#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__12274__(void **__out_argv);

int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
  const char *soupName;
  int value;
  void *__out_argv1__12274__[2];
  __out_argv1__12274__[0] = ((void *)(&value));
  __out_argv1__12274__[1] = ((void *)(&soupName));
  OUT__1__12274__(__out_argv1__12274__);
  printf("Here are your %d,  %s soup\n",value,soupName);
  return 0;
}


static void OUT__1__12274__(void **__out_argv)
{
  const char *soupName =  *((const char **)__out_argv[1]);
  int value =  *((int *)__out_argv[0]);   // local variable, original type, (not pointer type)
  value = rand();                         // local variable in computation.
  soupName = abc_soups[value % 10];
   *((const char **)__out_argv[1]) = soupName;
   *((int *)__out_argv[0]) = value;
}


local variable's type:

    SgType* local_type = NULL;
    if( SageInterface::is_Fortran_language( ) )
        local_type= orig_var_type;
    else if( Outliner::temp_variable || Outliner::useStructureWrapper )
    // unique processing for C/C++ if temp variables are used
    {
        if( isPointerDeref || ( !isPointerDeref && is_array_parameter ) )
        {
          // Liao 3/11/2015. For a parameter of a reference type, we have to specially tweak the unpacking statement
          // It is not allowed to create a pointer to a reference type. So we use a pointer to its raw type (stripped reference type) instead.
            // use pointer dereferencing for some
            if (SgReferenceType* rtype = isSgReferenceType(orig_var_type))
               local_type = buildPointerType(rtype->get_base_type());
            else
               local_type = buildPointerType(orig_var_type);
        }
        else                    // use variable clone instead for others
            local_type = orig_var_type;
    }
    else // all other cases: non-fortran, not using variable clones
    {
        if( is_C_language( ) )
        {
            // we use pointer types for all variables to be passed
            // the classic outlining will not use unpacking statement, but use the parameters directly.
            // So we can safely always use pointer dereferences here
            local_type = buildPointerType( orig_var_type );
        }
        else // C++ language
            // Rich's idea was to leverage C++'s reference type: two cases:
            //  a) for variables of reference type: no additional work
            //  b) for others: make a reference type to them
            //   all variable accesses in the outlined function will have
            //   access the address of the by default, not variable substitution is needed
        {
             local_type = isSgReferenceType( orig_var_type ) ? orig_var_type
                                                            : SgReferenceType::createType( orig_var_type );
        }
    }
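
The branching in the excerpt above can be summarized as a small decision function. This is a simplified standalone model for reading purposes only: the VarInfo struct and the LocalTypeKind enum are hypothetical stand-ins for the ROSE types and flags, and the function returns a label instead of building an SgType.

```cpp
#include <cassert>

// Which type the local variable in the outlined function gets (hypothetical labels).
enum class LocalTypeKind {
    Original,            // same type as the original variable
    PointerToBase,       // pointer to the base type of a reference type
    PointerToOriginal,   // pointer to the original type
    ReferenceToOriginal  // C++ reference to the original type
};

// Inputs of the decision (hypothetical stand-ins for the ROSE queries and flags).
struct VarInfo {
    bool isFortran;
    bool isCLanguage;
    bool tempVariableOrStructWrapper;  // temp_variable || useStructureWrapper
    bool isPointerDeref;
    bool isArrayParameter;
    bool isReferenceType;
};

LocalTypeKind chooseLocalType(const VarInfo& v) {
    assert(!(v.isFortran && v.isCLanguage));
    if (v.isFortran) {
        return LocalTypeKind::Original;
    }
    if (v.tempVariableOrStructWrapper) {
        if (v.isPointerDeref || v.isArrayParameter) {
            // A pointer to a reference type is not allowed, so strip the reference.
            return v.isReferenceType ? LocalTypeKind::PointerToBase
                                     : LocalTypeKind::PointerToOriginal;
        }
        return LocalTypeKind::Original;  // variable clone
    }
    if (v.isCLanguage) {
        return LocalTypeKind::PointerToOriginal;
    }
    // C++: keep reference types, wrap everything else in a reference.
    return v.isReferenceType ? LocalTypeKind::Original
                             : LocalTypeKind::ReferenceToOriginal;
}
```

Note that the condition isPointerDeref || (!isPointerDeref && is_array_parameter) in the excerpt is logically the same as isPointerDeref || is_array_parameter, which is the form used in the sketch.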


Transform.cc : collect variables

 std::set<SgInitializedName*> restoreVars;
 calculateVariableRestorationSet (syms, readOnlyVars,liveOuts,restoreVars);

dlopen

The use_dlopen option tells the outliner to use dlopen() to find and call the outlined function, which is stored in a dynamically loadable library.

This option turns on several other options (in Outliner::validateSettings(), inside Outliner.cc):

  • -rose:outline:exclude_headers
  • useNewFile= true;
  • useParameterWrapper = true;
  • temp_variable = true;
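
The implication between the flags can be sketched as follows. This is a standalone model, not the code of Outliner::validateSettings(): the OutlinerFlags struct is a hypothetical stand-in for the namespace-level variables listed under "Internal control variables", and the function only encodes the four settings listed above.

```cpp
#include <cassert>

// Hypothetical stand-in for a few of the Outliner namespace flags.
struct OutlinerFlags {
    bool use_dlopen = false;
    bool useNewFile = false;
    bool useParameterWrapper = false;
    bool temp_variable = false;
    bool exclude_headers = false;
};

// Turn on the flags that use_dlopen depends on (sketch of the rule above).
void applyDlopenImplications(OutlinerFlags& f) {
    if (f.use_dlopen) {
        f.exclude_headers = true;
        f.useNewFile = true;
        f.useParameterWrapper = true;
        f.temp_variable = true;
    }
    // After this point, dlopen mode never runs without its prerequisites.
    assert(!f.use_dlopen || (f.useNewFile && f.useParameterWrapper));
}
```

A tool built on the programming API would typically set only use_dlopen and let the validation step derive the rest, rather than setting each flag by hand.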


Compilation and linking instructions, assuming the input file is ft.c:

  • outline -rose:outline:use_dlopen -I/home/liao6/workspace/outliner/build/../sourcetree/projects/autoTuning -c /path/to/ft.c
    • this step will generate two files:
    • rose_ft.c : the original ft.c file is transformed into this file
    • rose_ft_lib.c (outlined function in a shared lib file)
  • build the .so file from rose_ft_lib.c
    • gcc -I. -g -fPIC -c rose_ft_lib.c
    • gcc -g -shared rose_ft_lib.o -o rose_ft_lib.so
    • cp rose_ft_lib.so /tmp/.
  • link everything together
    • the object files should be linked with libautoTuning.a, built from projects/autoTuning/autotuning_lib.c, which in turn defines findFunctionUsingDlopen().
    • gcc -o a.out rose_ft.o /roseInstallPath/lib/libautoTuning.a -Wl,--export-dynamic -g -ldl -lm

A full example using dlopen can be found at

Testing

The ROSE AST outliner has a dedicated testing directory: rose/tests/nonsmoke/functional/roseTests/astOutliningTests

  • Some C, C++ and Fortran test input files are prepared there.
  • Sample command line options are available in the Makefile.am file within this test directory.


Full command line example:

  • /home/liao6/workspace/rose/buildtree/tests/nonsmoke/functional/roseTests/astOutliningTests/outline -rose:outline:use_dlopen -rose:outline:temp_variable -I/home/liao6/workspace/rose/buildtree/../sourcetree/projects/autoTuning -rose:outline:exclude_headers -rose:outline:output_path . -c /home/liao6/workspace/rose/sourcetree/tests/nonsmoke/functional/roseTests/astOutliningTests/array1.c

To trigger a single test, assuming the input file is named inputFile.c:

  • make classic_inputFile.c.passed // classic behavior
  • make dlopen_inputFile.c.passed // dlopen feature

As you can see, the prefix indicates which outliner options are used.

Example input and output

As a standalone tool

Input file, with a pragma indicating which code portion is to be outlined:

int main()
{
    double n, start=1, total;
    double unlucky=0, lucky;
    double *number;
	                 
    scanf("%lf",&n);                    
    total = 9;                      
    for(int j =1; j < n; j++)
    {
      total = total * 10;
      start = start *10;
    }

    number = (double*)malloc(n * sizeof(double));                           
    for(double i = start; i < start*10; i++)
    {
      double temp = i;
#pragma rose_outline
      for(int j = 1; j<= n; j++)
      {
	number[j]=(int)temp%10;
	temp = temp/10;
      }
      for(int k = n; k>=1; k--)
      {
	if(number[k] == 1 && number[k-1] == 3){
	  unlucky++;
	  break;
	}
      }
    }                                   
    lucky = total - unlucky;
    printf("there are %f lucky integers in %f digits integers", lucky, n);
    return 0;
}


//------------output file is

static void OUT__1__2222__(double *np__,double **numberp__,double *tempp__);

int main()
{
  double n;
  double start = 1;
  double total;
  double unlucky = 0;
  double lucky;
  double *number;
  scanf("%lf",&n);
  total = 9;
  for (int j = 1; j < n; j++) {
    total = total * 10;
    start = start * 10;
  }
  number = ((double *)(malloc(n * (sizeof(double )))));
  for (double i = start; i < start * 10; i++) {
    double temp = i;
    OUT__1__2222__(&n,&number,&temp);
    for (int k = n; k >= 1; k--) {
      if (number[k] == 1 && number[k - 1] == 3) {
        unlucky++;
        break; 
      }
    }
  }
  lucky = total - unlucky;
  printf("there are %f lucky integers in %f digits integers",lucky,n);
  return 0;
}

static void OUT__1__2222__(double *np__,double **numberp__,double *tempp__)
{
  double *n = (double *)np__;
  double **number = (double **)numberp__;
  double *temp = (double *)tempp__;
  for (int j = 1; j <=  *n; j++) {
    ( *number)[j] = (((int )( *temp)) % 10);
     *temp =  *temp / 10;
  }
}

char* type

Input:

#include <stdio.h>
#include <stdlib.h>

const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};

int main (void)
{
// split variable declarations with their initializations, as a better demo for the outliner
  int abc_numBowls;
  const char *abc_soupName;
  int numBowls;
  const char *soupName;
#pragma rose_outline
  {
    abc_numBowls = rand () % 10;
    abc_soupName = abc_soups[rand () % 10];
    numBowls = abc_numBowls;
    soupName = abc_soupName;
  }

  printf ("Here are your %d bowls of %s soup\n", numBowls, soupName);

  printf ("-----------------------------------------------------\n");
  return 0;
}

outline --edg:no_warnings -rose:verbose 0 -rose:outline:parameter_wrapper -rose:detect_dangling_pointers 1 -c input.cpp

Output file:

#include <stdio.h>
#include <stdlib.h>
const char *abc_soups[10] = {("minstrone"), ("french onion"), ("Texas chili"), ("clam chowder"), ("potato leek"), ("lentil"), ("white bean"), ("chicken noodle"), ("pho"), ("fish ball")};
static void OUT__1__11770__(void **__out_argv);

int main()
{
// split variable declarations with their initializations, as a better demo for the outliner
  int abc_numBowls;
  const char *abc_soupName;
  int numBowls;
  const char *soupName;
  void *__out_argv1__11770__[4];
  __out_argv1__11770__[0] = ((void *)(&soupName));
  __out_argv1__11770__[1] = ((void *)(&numBowls));
  __out_argv1__11770__[2] = ((void *)(&abc_soupName));
  __out_argv1__11770__[3] = ((void *)(&abc_numBowls));
  OUT__1__11770__(__out_argv1__11770__);
  printf("Here are your %d bowls of %s soup\n",numBowls,soupName);
  printf("-----------------------------------------------------\n");
  return 0;
}

static void OUT__1__11770__(void **__out_argv)
{
  int &abc_numBowls =  *((int *)__out_argv[3]);
  const char *&abc_soupName =  *((const char **)__out_argv[2]);
  int &numBowls =  *((int *)__out_argv[1]);
  const char *&soupName =  *((const char **)__out_argv[0]);
  abc_numBowls = rand() % 10;
  abc_soupName = abc_soups[rand() % 10];
  numBowls = abc_numBowls;
  soupName = abc_soupName;
}

work with C++ member functions

Input code:

int a;

class B 
{
  private: 

  int b;
 inline void foo(int c)
 {
#pragma rose_outline
   b = a+c;
 }
};

Output code

  • add friend declaration for the outlined function so it can access private class members
  • pass this pointer to a class object as a function argument

int a;
static void OUT__1__2386__(int *cp__,void *this__ptr__p__);

class B 
{
  public: friend void ::OUT__1__2386__(int *cp__,void *this__ptr__p__);
  private: int b;
  

  inline void foo(int c)
{
// //A declaration for this pointer
    class B *this__ptr__ = this;
    OUT__1__2386__(&c,&this__ptr__);
  }
}
;

static void OUT__1__2386__(int *cp__,void *this__ptr__p__)
{
  int &c =  *((int *)cp__);
  class B *&this__ptr__ =  *((class B **)this__ptr__p__);
  this__ptr__ -> b = a + c;
}

With -rose:outline:parameter_wrapper, the result is slightly different:

  • all parameters will be wrapped into an array of pointers in the caller function
  • the array will be unpacked to retrieve the parameters in the outlined function

int a;
static void OUT__1__2391__(void **__out_argv);

class B 
{
  public: friend void ::OUT__1__2391__(void **__out_argv);
  private: int b;
  

  inline void foo(int c)
{
// //A declaration for this pointer
    class B *this__ptr__ = this;
    void *__out_argv1__1527__[2];
    __out_argv1__1527__[0] = ((void *)(&this__ptr__));
    __out_argv1__1527__[1] = ((void *)(&c));
    OUT__1__2391__(__out_argv1__1527__);
  }
}
;

static void OUT__1__2391__(void **__out_argv)
{
  int &c =  *((int *)__out_argv[1]);
  class B *&this__ptr__ =  *((class B **)__out_argv[0]);
  this__ptr__ -> b = a + c;
}

Used for OpenMP Implementation

See more at ROSE_Compiler_Framework/OpenMP_Support.

Below is an example translation:

/*a test C program. You can replace this content with yours, within 20,000 character limit (about 500 lines) . */
#include<stdio.h>
#include<stdlib.h>

int main(int argc, char* argv[])
{
    int nthreads, tid;
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();
	printf("Hello World from thread = %d ", tid);
	if(tid == 0)
	{
	    nthreads = omp_get_num_threads();
	    printf("Number of threads = %d", nthreads);
	}
    }
    return 0;
}


//------------- output code --------------
/*a test C program. You can replace this content with yours, within 20,000 character limit (about 500 lines) . */
#include<stdio.h>
#include<stdlib.h>
#include "libxomp.h" 
static void OUT__1__2231__(void *__out_argv);

int main(int argc,char *argv[])
{
  int status = 0;
  XOMP_init(argc,argv);
  int nthreads;
  int tid;
  XOMP_parallel_start(OUT__1__2231__,0,1,0,"/tmp/test-20191219_224253-113680.c",8);
  XOMP_parallel_end("/tmp/test-20191219_224253-113680.c",17);
  XOMP_terminate(status);
  return 0;
}

static void OUT__1__2231__(void *__out_argv)
{
  int _p_nthreads;
  int _p_tid;
  _p_tid = omp_get_thread_num();
  printf("Hello World from thread = %d ",_p_tid);
  if (_p_tid == 0) {
    _p_nthreads = omp_get_num_threads();
    printf("Number of threads = %d",_p_nthreads);
  }
}

Used to Generate CUDA kernels for OpenMP 4.x

Example input and output code for the classic Jacobi OpenMP 4.0 version:


//--------------input--------------

void jacobi( )
{
  REAL omega;
  int i,j,k;
  REAL error,resid,ax,ay,b;
  //      double  error_local;

  //      float ta,tb,tc,td,te,ta1,ta2,tb1,tb2,tc1,tc2,td1,td2;
  //      float te1,te2;
  //      float second;

  omega=relax;
  /*
   * Initialize coefficients */

  ax = 1.0/(dx*dx); /* X-direction coef */
  ay = 1.0/(dy*dy); /* Y-direction coef */
  b  = -2.0/(dx*dx)-2.0/(dy*dy) - alpha; /* Central coeff */

  error = 10.0 * tol;
  k = 1;

  // An optimization on top of naive coding: promoting data handling outside the while loop
  // data properties may change since the scope is bigger:
#pragma omp target data map(to:n, m, omega, ax, ay, b, f[0:n][0:m]) map(tofrom:u[0:n][0:m]) map(alloc:uold[0:n][0:m])
  while ((k<=mits)&&(error>tol))
  {
    error = 0.0;

    /* Copy new solution into old */
#pragma omp target map(to:n, m, u[0:n][0:m]) map(from:uold[0:n][0:m])
#pragma omp parallel for private(j,i) collapse(2)
    for(i=0;i<n;i++)
      for(j=0;j<m;j++)
        uold[i][j] = u[i][j];

#pragma omp target map(to:n, m, omega, ax, ay, b, f[0:n][0:m], uold[0:n][0:m]) map(from:u[0:n][0:m])
#pragma omp parallel for private(resid,j,i) reduction(+:error) collapse(2) // nowait
    for (i=1;i<(n-1);i++)
      for (j=1;j<(m-1);j++)
      { 
        resid = (ax*(uold[i-1][j] + uold[i+1][j])\
            + ay*(uold[i][j-1] + uold[i][j+1])+ b * uold[i][j] - f[i][j])/b;

        u[i][j] = uold[i][j] - omega * resid;
        error = error + resid*resid ;
      }
...

    /* Error check */

    if (k%500==0)
      printf("Finished %d iteration with error =%f\n",k, error);
    error = sqrt(error)/(n*m);

    k = k + 1;
  }          /*  End iteration loop */
  printf("Total Number of Iterations:%d\n",k);
  printf("Residual:%E\n", error);
  printf("Residual_ref :%E\n", resid_ref);
  printf ("Diff ref=%E\n", fabs(error-resid_ref));
  assert (fabs(error-resid_ref) < 1E-13);
}



//----------------output-----------------

#include "libxomp.h" 
#include "xomp_cuda_lib_inlined.cu" 
...



__global__ void OUT__1__8714__(float omega,float ax,float ay,float b,int __final_total_iters__2__,int __i_interval__3__,float *_dev_per_block_error,float *_dev_u,float *_dev_f,float *_dev_uold)
{
  int _p_i;
  int _p_j;
  float _p_error;
  _p_error = 0;
  float _p_resid;
  int _p___collapsed_index__5__;
  int _dev_lower;
  int _dev_upper;
  int _dev_loop_chunk_size;
  int _dev_loop_sched_index;
  int _dev_loop_stride;
  int _dev_thread_num = getCUDABlockThreadCount(1);
  int _dev_thread_id = getLoopIndexFromCUDAVariables(1);
  XOMP_static_sched_init(0,__final_total_iters__2__ - 1,1,1,_dev_thread_num,_dev_thread_id,&_dev_loop_chunk_size,&_dev_loop_sched_index,&_dev_loop_stride);
  while(XOMP_static_sched_next(&_dev_loop_sched_index,__final_total_iters__2__ - 1,1,_dev_loop_stride,_dev_loop_chunk_size,_dev_thread_num,_dev_thread_id,&_dev_lower,&_dev_upper))
    for (_p___collapsed_index__5__ = _dev_lower; _p___collapsed_index__5__ <= _dev_upper; _p___collapsed_index__5__ += 1) {
      _p_i = _p___collapsed_index__5__ / __i_interval__3__ * 1 + 1;
      _p_j = _p___collapsed_index__5__ % __i_interval__3__ * 1 + 1;
      _p_resid = (ax * (_dev_uold[(_p_i - 1) * 512 + _p_j] + _dev_uold[(_p_i + 1) * 512 + _p_j]) + ay * (_dev_uold[_p_i * 512 + (_p_j - 1)] + _dev_uold[_p_i * 512 + (_p_j + 1)]) + b * _dev_uold[_p_i * 512 + _p_j] - _dev_f[_p_i * 512 + _p_j]) / b;
      _dev_u[_p_i * 512 + _p_j] = _dev_uold[_p_i * 512 + _p_j] - omega * _p_resid;
      _p_error = _p_error + _p_resid * _p_resid;
    }
  xomp_inner_block_reduction_float(_p_error,_dev_per_block_error,6);
}

...


void jacobi()
{
  float omega;
  int i;
  int j;
  int k;
  float error;
  float resid;
  float ax;
  float ay;
  float b;
//      double  error_local;
//      float ta,tb,tc,td,te,ta1,ta2,tb1,tb2,tc1,tc2,td1,td2;
//      float te1,te2;
//      float second;
  omega = relax;
/*
     * Initialize coefficients */
/* X-direction coef */
  ax = (1.0 / (dx * dx));
/* Y-direction coef */
  ay = (1.0 / (dy * dy));
/* Central coeff */
  b = (- 2.0 / (dx * dx) - 2.0 / (dy * dy) - alpha);
  error = (10.0 * tol);
  k = 1;
/* Translated from #pragma omp target data ... */
{
    xomp_deviceDataEnvironmentEnter();
    float *_dev_u;
    int _dev_u_size = sizeof(float ) * n * m;
    _dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,1,1)));
    float *_dev_f;
    int _dev_f_size = sizeof(float ) * n * m;
    _dev_f = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)f),_dev_f_size,1,0)));
    float *_dev_uold;
    int _dev_uold_size = sizeof(float ) * n * m;
    _dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,0,0)));
    while(k <= mits && error > tol){
      int __i_total_iters__0__ = (n - 1 - 1 - 1 + 1) % 1 == 0?(n - 1 - 1 - 1 + 1) / 1 : (n - 1 - 1 - 1 + 1) / 1 + 1;
      int __j_total_iters__1__ = (m - 1 - 1 - 1 + 1) % 1 == 0?(m - 1 - 1 - 1 + 1) / 1 : (m - 1 - 1 - 1 + 1) / 1 + 1;
      int __final_total_iters__2__ = 1 * __i_total_iters__0__ * __j_total_iters__1__;
      int __i_interval__3__ = __j_total_iters__1__ * 1;
      int __j_interval__4__ = 1;
      int __collapsed_index__5__;
      int __i_total_iters__6__ = (n - 1 - 0 + 1) % 1 == 0?(n - 1 - 0 + 1) / 1 : (n - 1 - 0 + 1) / 1 + 1;
      int __j_total_iters__7__ = (m - 1 - 0 + 1) % 1 == 0?(m - 1 - 0 + 1) / 1 : (m - 1 - 0 + 1) / 1 + 1;
      int __final_total_iters__8__ = 1 * __i_total_iters__6__ * __j_total_iters__7__;
      int __i_interval__9__ = __j_total_iters__7__ * 1;
      int __j_interval__10__ = 1;
      int __collapsed_index__11__;
      error = 0.0;
/* Copy new solution into old */
{
        xomp_deviceDataEnvironmentEnter();
        float *_dev_u;
        int _dev_u_size = sizeof(float ) * n * m;
        _dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,1,0)));
        float *_dev_uold;
        int _dev_uold_size = sizeof(float ) * n * m;
        _dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,0,1)));
/* Launch CUDA kernel ... */
        int _threads_per_block_ = xomp_get_maxThreadsPerBlock();
        int _num_blocks_ = xomp_get_max1DBlock(__final_total_iters__8__ - 1 - 0 + 1);
        OUT__2__8714__<<<_num_blocks_,_threads_per_block_>>>(__final_total_iters__8__,__i_interval__9__,_dev_u,_dev_uold);
        xomp_deviceDataEnvironmentExit();
      }
{
        xomp_deviceDataEnvironmentEnter();
        float *_dev_u;
        int _dev_u_size = sizeof(float ) * n * m;
        _dev_u = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)u),_dev_u_size,0,1)));
        float *_dev_f;
        int _dev_f_size = sizeof(float ) * n * m;
        _dev_f = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)f),_dev_f_size,1,0)));
        float *_dev_uold;
        int _dev_uold_size = sizeof(float ) * n * m;
        _dev_uold = ((float *)(xomp_deviceDataEnvironmentPrepareVariable(((void *)uold),_dev_uold_size,1,0)));
/* Launch CUDA kernel ... */
        int _threads_per_block_ = xomp_get_maxThreadsPerBlock();
        int _num_blocks_ = xomp_get_max1DBlock(__final_total_iters__2__ - 1 - 0 + 1);
        float *_dev_per_block_error = (float *)(xomp_deviceMalloc(_num_blocks_ * sizeof(float )));
        OUT__1__8714__<<<_num_blocks_,_threads_per_block_,(_threads_per_block_ * sizeof(float ))>>>(omega,ax,ay,b,__final_total_iters__2__,__i_interval__3__,_dev_per_block_error,_dev_u,_dev_f,_dev_uold);
        error = xomp_beyond_block_reduction_float(_dev_per_block_error,_num_blocks_,6);
        xomp_freeDevice(_dev_per_block_error);
        xomp_deviceDataEnvironmentExit();
      }
//    }
/*  omp end parallel */
/* Error check */
      if (k % 500 == 0) {
        printf("Finished %d iteration with error =%f\n",k,error);
      }
      error = (sqrt(error) / (n * m));
      k = k + 1;
/*  End iteration loop */
    }
    xomp_deviceDataEnvironmentExit();
  }
  printf("Total Number of Iterations:%d\n",k);
  printf("Residual:%E\n",error);
  printf("Residual_ref :%E\n",resid_ref);
  printf("Diff ref=%E\n",(fabs((error - resid_ref))));
  fabs((error - resid_ref)) < 1E-14?((void )0) : __assert_fail("fabs(error-resid_ref) < 1E-14","jacobi-ompacc-opt2.c",236,__PRETTY_FUNCTION__);
}
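
The collapse(2) clause is what produces the `__i_total_iters__`, `__i_interval__` and `_p___collapsed_index__` variables in the output above: the two nested loops become one loop over a single index, and `i` and `j` are recovered by division and remainder. The host-only C sketch below reproduces that index arithmetic for the interior loop (lower bound 1, stride 1) and checks it against the nested form. The function names and the checksum are hypothetical helpers for illustration, not part of the generated code.

```c
#include <assert.h>

/* Trip count of: for (v = lower; v <= upper; v += stride), stride > 0.
   Same rounding-up pattern as the generated __i_total_iters__ expressions. */
int trip_count(int lower, int upper, int stride)
{
  assert(stride > 0);
  int span = upper - lower + 1;
  if (span <= 0)
    return 0;  /* empty loop; guard added for this sketch */
  return span % stride == 0 ? span / stride : span / stride + 1;
}

/* Order-sensitive checksum of the (i, j) pairs visited by the nested loops. */
unsigned nested_checksum(int n, int m)
{
  unsigned h = 0;
  for (int i = 1; i < n - 1; i++)
    for (int j = 1; j < m - 1; j++)
      h = h * 31u + (unsigned)(i * 1000 + j);
  return h;
}

/* Same checksum computed from one collapsed loop. */
unsigned collapsed_checksum(int n, int m)
{
  int i_total = trip_count(1, n - 2, 1);
  int j_total = trip_count(1, m - 2, 1);
  int final_total = i_total * j_total;
  int i_interval = j_total;  /* collapsed iterations per step of i */
  unsigned h = 0;
  for (int idx = 0; idx <= final_total - 1; idx++) {
    int i = idx / i_interval * 1 + 1;  /* quotient * stride + lower bound */
    int j = idx % i_interval * 1 + 1;  /* remainder * stride + lower bound */
    h = h * 31u + (unsigned)(i * 1000 + j);
  }
  return h;
}
```

Because the checksum depends on visiting order, equal results from `nested_checksum` and `collapsed_checksum` indicate that the collapsed loop visits the same (i, j) pairs in the same order. In the generated kernel the collapsed range is additionally split across CUDA threads by the XOMP static scheduler.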


See details at ROSE_Compiler_Framework/OpenMP_Acclerator_Model_Implementation
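
The reduction(+:error) clause is handled in two stages in the output above: each thread accumulates into `_p_error`, the kernel calls xomp_inner_block_reduction_float to fill `_dev_per_block_error`, and the host then calls xomp_beyond_block_reduction_float over the per-block values. The host-only C sketch below mimics that two-stage shape for a sum. It assumes each stage is a plain summation and that elements are distributed round-robin over threads; both are assumptions made for illustration, and all function names are hypothetical rather than part of libxomp.

```c
#include <assert.h>

#define TOY_MAX_BLOCKS 64

/* Private partial sum of one simulated thread: it handles elements
   tid, tid + total_threads, tid + 2 * total_threads, ... */
float thread_partial(const float *values, int count, int tid, int total_threads)
{
  float p_error = 0.0f;
  for (int k = tid; k < count; k += total_threads)
    p_error += values[k];
  return p_error;
}

/* Stage 1: combine the partial sums of all threads in one block. */
float inner_block_sum(const float *values, int count, int block,
                      int threads_per_block, int total_threads)
{
  float block_sum = 0.0f;
  for (int t = 0; t < threads_per_block; t++)
    block_sum += thread_partial(values, count,
                                block * threads_per_block + t, total_threads);
  return block_sum;
}

/* Stage 2: combine the per-block results into the final value. */
float beyond_block_sum(const float *per_block, int num_blocks)
{
  float total = 0.0f;
  for (int b = 0; b < num_blocks; b++)
    total += per_block[b];
  return total;
}

/* Run both stages for a simulated grid of num_blocks x threads_per_block. */
float two_level_sum(const float *values, int count,
                    int num_blocks, int threads_per_block)
{
  assert(num_blocks > 0 && num_blocks <= TOY_MAX_BLOCKS);
  assert(threads_per_block > 0);
  float per_block[TOY_MAX_BLOCKS];
  int total_threads = num_blocks * threads_per_block;
  for (int b = 0; b < num_blocks; b++)
    per_block[b] = inner_block_sum(values, count, b,
                                   threads_per_block, total_threads);
  return beyond_block_sum(per_block, num_blocks);
}
```

Splitting the reduction this way means only one value per block has to be combined after the kernel finishes, which matches the size of the `_dev_per_block_error` buffer allocated in the output code (`_num_blocks_` floats).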

Known issues

List

  • The message "error in side effect analysis!" may appear when Outliner::useStructureWrapper is set to true. This also happens in the outlineIfs example from the tutorial directory.
    • You can ignore this warning message if your translator still works. When Outliner::useStructureWrapper is turned on, the outliner runs several analyses internally. Some of these analyses cannot handle every situation; in that case they give up and notify the outliner, which then makes conservative decisions and generates less optimal translated code.

Publications

The following paper describes the internals of the AST outliner. It is the default paper to cite if you use the AST outliner in your research work:

  • Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas. 2009. Effective source-to-source outlining to support whole program empirical optimization. In Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing (LCPC'09)

Papers that use the outliner to generate multi-threaded kernels for CPUs and GPUs:

  • Chunhua Liao, Daniel J. Quinlan, Thomas Panas, Bronis R. de Supinski, A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries, Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, June 14-16, 2010, Tsukuba, Japan
  • C. Liao, Y. Yan, B. R. de Supinski, D. J. Quinlan, and B. Chapman, “Early experiences with the OpenMP accelerator model,” in OpenMP in the Era of Low Power Devices and Accelerators, Springer, 2013, pp. 84-98.

Papers that use the outliner to support empirical tuning or autotuning:

  • Shirley Moore, Refactoring and automated performance tuning of computational chemistry application codes, Proceedings of the Winter Simulation Conference, December 09-12, 2012, Berlin, Germany
  • Nicholas Chaimov, Scott Biersdorff, Allen D. Malony, Tools for machine-learning-based empirical autotuning and specialization, International Journal of High Performance Computing Applications, v.27 n.4, p.403-411, November 2013