ROSE Compiler Framework/LoopProcessor

Where is the tool

Source file

https://github.com/rose-compiler/rose-develop/blob/master/tutorial/LoopProcessor.C

The binary is not built or installed by default. You have to build it yourself:

cd rose_buildtree/tutorial
make loopProcessor

Documentation

See Chapter 38 of the ROSE Tutorial: http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf

Command line options

[..buildtree/tutorial]./loopProcessor --help

loopProcessor <options> <program name>

-gobj: generate object file
-orig: copy non-modified statements from original file

# split loop
#----------------------------------
-splitloop: applying loop splitting to remove conditionals inside loops

-annot <filename> 
-pre:  apply partial redundancy elimination
-fd:  apply finite differencing to array index expressions


# Debugging options
#----------------------------------
-debugloop: print debugging information for loop transformations; 
-debugdep: print debugging information for dependence analysis; 

-tmloop: print timing information for loop transformations; 


# Use special function to denote array access (the special function can be replaced
# with macros after transformation). This option is for circumventing complex
# subscript expressions for linearized multi-dimensional arrays.
-arracc <funcname>: use function <funcname> to denote multi-dimensional array access;


opt <level=0>: the level of loop optimizations to apply; by default, only the outermost level is optimized;

# unroll loop: 
#----------------------------------
-unroll [-locond] [-nvar] [poet] <-unrollsize> : unrolling innermost loops at <unrollsize>

# break up statements in loops
#----------------------------------
-bs <stmtsize> : break up statements in loops at <stmtsize>


-bk_poet <blocksize> : parameterize the blocking transformation

-par_poet <blocksize> : parallelization transformation using POET

# loop blocking
#----------------------------------
-bk1 <blocksize> :block outer loops
-bk2 <blocksize> :block inner loops
-bk3 <blocksize> :block all loops


# copy array
#----------------------------------
-cp <copydim> :copy array regions with dimensions <= <copydim>
-cp_poet <copydim> :parameterize array copy array regions; to be applied together with blocking.


# loop interchange
#----------------------------------
-ic1 :loop interchange for more reuses  // *** 


# loop fission
#----------------------------------
-fs0 : maximum distribution at all loops
-fs01 : maximum distribution at inner-most loops

# loop fusing 
#----------------------------------
-fs1 :single-level loop fusion for more reuses
-fs2 :multi-level loop fusion for more reuses

# Max number of nodes to split for transitive dependence analysis (to limit the overhead of transitive dep. analysis)
-ta <int> :split limit for transitive dep. analysis

#  set cache line size in evaluating spatial locality (affect decisions in applying loop optimizations)
-clsize <int> :set cache line size

# set maximum distance of reuse that can exploit cache (used to evaluate temporal locality of loops)
-reuse_dist <int> :set reuse distance

-dt :perform dynamic tuning
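
The blocking options above (-bk1, -bk2, -bk3) all take a block size. As a reminder of what loop blocking (tiling) means in general, here is a hand-written sketch in C. It is not output from loopProcessor, and the names N, BS, transpose_plain, transpose_blocked and blocking_matches are hypothetical, chosen only for this illustration. The code loopProcessor actually generates for a given input may look quite different.

```c
#include <string.h>

#define N  10  /* hypothetical problem size, deliberately not a multiple of BS */
#define BS 4   /* hypothetical block size, playing the role of <blocksize> */

/* Reference version: a plain two-level loop nest. */
void transpose_plain(double dst[N][N], double src[N][N])
{
  int i, j;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      dst[j][i] = src[i][j];
}

/* Hand-blocked version: the ii/jj loops walk over BS x BS tiles,
   the i/j loops walk inside one tile (clipped at N). */
void transpose_blocked(double dst[N][N], double src[N][N])
{
  int ii, jj, i, j;
  for (ii = 0; ii < N; ii += BS)
    for (jj = 0; jj < N; jj += BS)
      for (i = ii; i < ii + BS && i < N; i++)
        for (j = jj; j < jj + BS && j < N; j++)
          dst[j][i] = src[i][j];
}

/* Returns 1 when both versions produce the same result. */
int blocking_matches(void)
{
  static double src[N][N], d1[N][N], d2[N][N];
  int i, j;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      src[i][j] = i * N + j;
  transpose_plain(d1, src);
  transpose_blocked(d2, src);
  return memcmp(d1, d2, sizeof d1) == 0;
}
```

Blocking only reorders the iterations, so both functions compute the same result; the blocked version touches one small tile at a time, which is what can improve cache reuse.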

Example use

Loop fusion


// -----------test loop fusion input.c ---------------
#define N 1024

void foo(double a[N], double b[N], double c[N])
{
  int i, j;
  for (i = 0; i < N; i++)
    a[i - 1] = b[i];

  for (j = 0; j < N; j++)
    c[j] = a[j];
}


// command line 

[..buildtree/tutorial]./loopProcessor -fs2 input.c

//------------------------ output---------------
// test loop fusion
#define N 1024

void foo(double a[1024],double b[1024],double c[1024])
{
  int i;
  int j;
  for (i = 0; i <= 1024; i += 1) {
    if (i <= 1023) {
      a[i - 1] = b[i];
    }
     else {
    }
    if (i >= 1) {
      c[-1 + i] = a[-1 + i];
    }
     else {
    }
  }
}
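
To see that the fused loop computes the same values as the two original loops, the two versions can be run side by side. The following is a minimal sketch, not part of ROSE: foo_original is the input above, foo_fused is the output above with the empty else branches dropped, and fusion_matches is a hypothetical harness. Because the test input writes a[-1] on the first iteration, the harness passes a pointer to the second element of a slightly larger buffer so that this write stays in bounds.

```c
#include <string.h>

#define N 1024

/* Original form: two separate loops, as in input.c. */
static void foo_original(double a[N], double b[N], double c[N])
{
  int i, j;
  for (i = 0; i < N; i++)
    a[i - 1] = b[i];

  for (j = 0; j < N; j++)
    c[j] = a[j];
}

/* Fused form, as in the loopProcessor output (empty else branches dropped). */
static void foo_fused(double a[N], double b[N], double c[N])
{
  int i;
  for (i = 0; i <= 1024; i += 1) {
    if (i <= 1023) {
      a[i - 1] = b[i];
    }
    if (i >= 1) {
      c[-1 + i] = a[-1 + i];
    }
  }
}

/* Hypothetical harness: returns 1 when both versions leave a and c identical. */
int fusion_matches(void)
{
  /* One extra leading element so that a[-1] is a valid location. */
  static double a1[N + 1], a2[N + 1], b[N], c1[N], c2[N];
  int i;
  for (i = 0; i < N; i++)
    b[i] = 0.5 * i + 1.0;
  for (i = 0; i <= N; i++)
    a1[i] = a2[i] = -1.0;

  foo_original(a1 + 1, b, c1);
  foo_fused(a2 + 1, b, c2);

  return memcmp(a1, a2, sizeof a1) == 0 &&
         memcmp(c1, c2, sizeof c1) == 0;
}
```

The fused loop runs one extra iteration (i from 0 to 1024) because the second statement is shifted by one: c[i - 1] is read only after a[i - 1] has been written in the same iteration, which preserves the dependence between the two original loops.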