ROSE Compiler Framework/LoopProcessor

Where is the tool


Source file

The binary is not built or installed by default. You have to build it:

  • cd rose_buildtree/tutorial
  • make loopProcessor



See more at

Command line options

[..buildtree/tutorial]./loopProcessor --help

loopProcessor <options> <program name>

-gobj: generate object file
-orig: copy non-modified statements from original file

# split loop
-splitloop: applying loop splitting to remove conditionals inside loops

-annot <filename> 
-pre:  apply partial redundancy elimination
-fd:  apply finite differencing to array index expressions

# Debugging options
-debugloop: print debugging information for loop transformations; 
-debugdep: print debugging information for dependence analysis; 

-tmloop: print timing information for loop transformations; 

# Use special function to denote array access (the special function can be replaced
# with macros after transformation). This option is for circumventing complex
# subscript expressions for linearized multi-dimensional arrays.
-arracc <funcname>: use function <funcname> to denote multi-dimensional array access;

-opt <level=0>: the level of loop optimizations to apply; by default, only the outermost level is optimized;

# unroll loop: 
-unroll [-locond] [-nvar] [poet] <-unrollsize> : unrolling innermost loops at <unrollsize>

# break up statements in loops
-bs <stmtsize> : break up statements in loops at <stmtsize>

-bk_poet <blocksize> : parameterize the blocking transformation

-par_poet <blocksize> : parallelization transformation using POET

# loop blocking
-bk1 <blocksize> :block outer loops
-bk2 <blocksize> :block inner loops
-bk3 <blocksize> :block all loops

# copy array
-cp <copydim> :copy array regions with dimensions <= <copydim>
-cp_poet <copydim> :parameterize array copy array regions; to be applied together with blocking.

# loop interchange
-ic1 :loop interchange for more reuses  // *** 

# loop fission
-fs0 : maximum distribution at all loops
-fs01 : maximum distribution at inner-most loops

# loop fusion
-fs1 :single-level loop fusion for more reuses
-fs2 :multi-level loop fusion for more reuses

# Max number of nodes to split for transitive dependence analysis (to limit the overhead of transitive dep. analysis)
-ta <int> :split limit for transitive dep. analysis

# set cache line size used in evaluating spatial locality (affects decisions in applying loop optimizations)
-clsize <int> :set cache line size

# set maximum distance of reuse that can exploit cache (used to evaluate temporal locality of loops)
-reuse_dist <int> :set reuse distance

-dt :perform dynamic tuning
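Finite differencing (-fd) targets array index expressions such as the linearized subscripts mentioned for -arracc. The sketch below is a generic, hand-written illustration of the technique (strength reduction of an index expression), not the code -fd actually emits; the function names are hypothetical. It sums a row-major matrix twice: once recomputing i * ncols + j in every iteration, and once keeping the subscript in a variable that is advanced by its per-iteration difference.

```c
#include <assert.h>

/* Sums an nrows x ncols row-major matrix, recomputing the linearized
 * subscript i * ncols + j on every iteration. */
static long sum_direct(const long *m, int nrows, int ncols)
{
  long s = 0;
  for (int i = 0; i < nrows; i++)
    for (int j = 0; j < ncols; j++)
      s += m[i * ncols + j];
  return s;
}

/* Same loop after finite differencing: the subscript is kept in a variable
 * and updated by its difference between iterations (+1 per j, +ncols per i)
 * instead of being recomputed with a multiplication. */
static long sum_differenced(const long *m, int nrows, int ncols)
{
  long s = 0;
  int row = 0;                 /* always equals i * ncols */
  for (int i = 0; i < nrows; i++) {
    int idx = row;             /* always equals i * ncols + j */
    for (int j = 0; j < ncols; j++) {
      s += m[idx];
      idx += 1;
    }
    row += ncols;
  }
  return s;
}

/* Hypothetical check: fills a matrix of at most 64 x 64 elements with
 * distinct values and returns 1 if both versions compute the same sum. */
int differencing_matches(int nrows, int ncols)
{
  long m[64 * 64];
  assert(nrows * ncols <= 64 * 64);
  for (int k = 0; k < nrows * ncols; k++)
    m[k] = 3L * k + 1;
  return sum_direct(m, nrows, ncols) == sum_differenced(m, nrows, ncols);
}
```

For instance, `sum_differenced` on the 2 x 3 matrix {1, 4, 7, 10, 13, 16} returns 51, the same as `sum_direct`.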
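The blocking options (-bk1, -bk2, -bk3 <blocksize>) tile loops so that the data touched by the inner loops stays in cache. The sketch below is a hand-written illustration of what blocking does to a loop nest, not loopProcessor's output: the matrix-multiply kernel, the size N, and the helper names matmul_blocked and blocked_matches_naive are all hypothetical. It strip-mines the j and k loops into tiles of size bs, hoists the tile loops outward, and checks that the blocked nest produces exactly the same result as the original.

```c
#include <assert.h>
#include <string.h>

#define N 64

static int min_int(int x, int y) { return x < y ? x : y; }

/* Original loop nest: C += A * B, loop order i, j, k. */
static void matmul(double A[N][N], double B[N][N], double C[N][N])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i][j] += A[i][k] * B[k][j];
}

/* Blocked version: the j and k loops are split into tiles of size bs and
 * the tile loops (jj, kk) are moved outward. For every (i, j) the k values
 * are still visited in increasing order, so each sum is computed identically. */
static void matmul_blocked(double A[N][N], double B[N][N], double C[N][N], int bs)
{
  assert(bs > 0); /* a non-positive block size would never advance the tile loops */
  for (int jj = 0; jj < N; jj += bs)
    for (int kk = 0; kk < N; kk += bs)
      for (int i = 0; i < N; i++)
        for (int j = jj; j < min_int(jj + bs, N); j++)
          for (int k = kk; k < min_int(kk + bs, N); k++)
            C[i][j] += A[i][k] * B[k][j];
}

/* Runs both versions on the same small-integer inputs and returns 1 if the
 * results agree exactly, 0 otherwise. */
int blocked_matches_naive(int bs)
{
  static double A[N][N], B[N][N], C1[N][N], C2[N][N];
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
      A[i][j] = i + j;
      B[i][j] = i - j;
    }
  memset(C1, 0, sizeof C1);
  memset(C2, 0, sizeof C2);
  matmul(A, B, C1);
  matmul_blocked(A, B, C2, bs);
  return memcmp(C1, C2, sizeof C1) == 0;
}
```

A block size that does not divide N, such as `blocked_matches_naive(7)`, still works because the inner loop bounds are clamped with min_int. Which loops loopProcessor actually blocks, and in what order, is determined by the tool and the option chosen, and may differ from this sketch.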

Example use


Loop fusion

// -----------test loop fusion input.c ---------------
#define N 1024

void foo(double a[N], double b[N], double c[N])
{
  int i,j;
  for (i = 0; i < N; i++)
    a[i - 1] = b[i];   /* when i == 0 this writes a[-1], outside the array */

  for (j = 0; j < N; j++)
    c[j] = a[j];
}

// command line 

[..buildtree/tutorial]./loopProcessor -fs2 input.c

//------------------------ output---------------
// test loop fusion
#define N 1024

void foo(double a[1024],double b[1024],double c[1024])
{
  int i;
  int j;
  for (i = 0; i <= 1024; i += 1) {
    if (i <= 1023) {
      a[i - 1] = b[i];
    } else {
    }
    if (i >= 1) {
      c[-1 + i] = a[-1 + i];
    } else {
    }
  }
}
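The fused loop runs one extra iteration and guards each statement because the second loop reads a[j], which the first loop writes in iteration j + 1, so the fusion shifts the second statement one iteration later. The hand-written sketch below checks this by running the original loops, the fused loop as shown in the output, and a variant in which the first and last iterations are peeled off so the conditionals disappear, which is the kind of rewrite -splitloop is described as performing. The function names, the buffer layout, and the peeled variant are illustrative assumptions, not tool output. Because the original code writes a[-1], the sketch passes a pointer to the second element of a larger buffer so that every access stays in bounds.

```c
#include <assert.h>
#include <string.h>

#define N 1024

/* The two loops of input.c. The first loop writes a[-1] when i == 0, so the
 * caller must pass a pointer to the second element of its buffer. */
static void foo_original(double *a, const double *b, double *c)
{
  int i, j;
  for (i = 0; i < N; i++)
    a[i - 1] = b[i];
  for (j = 0; j < N; j++)
    c[j] = a[j];
}

/* The fused loop from the -fs2 output, with the empty else branches removed.
 * Statement 2 runs one iteration behind statement 1. */
static void foo_fused(double *a, const double *b, double *c)
{
  int i;
  for (i = 0; i <= 1024; i += 1) {
    if (i <= 1023)
      a[i - 1] = b[i];
    if (i >= 1)
      c[-1 + i] = a[-1 + i];
  }
}

/* Hypothetical hand-split version: peeling iteration 0 (only statement 1
 * runs) and iteration 1024 (only statement 2 runs) leaves a loop body with
 * no conditionals. */
static void foo_split(double *a, const double *b, double *c)
{
  int i;
  a[-1] = b[0];
  for (i = 1; i <= 1023; i += 1) {
    a[i - 1] = b[i];
    c[-1 + i] = a[-1 + i];
  }
  c[1023] = a[1023];
}

/* Runs all three versions on identical inputs and returns 1 if they leave
 * the same values in a (including a[-1]) and in c, 0 otherwise. */
int fusion_versions_agree(void)
{
  static double abuf[3][N + 1], c[3][N], b[N];
  void (*versions[3])(double *, const double *, double *) = {
    foo_original, foo_fused, foo_split
  };
  int v, k;

  assert(N == 1024); /* foo_fused and foo_split hard-code the printed bounds */
  for (k = 0; k < N; k++)
    b[k] = k + 0.5;
  for (v = 0; v < 3; v++) {
    for (k = 0; k <= N; k++)
      abuf[v][k] = -k; /* a[1023] is never written, so c[1023] copies this value */
    memset(c[v], 0, sizeof c[v]);
    versions[v](abuf[v] + 1, b, c[v]);
  }
  for (v = 1; v < 3; v++)
    if (memcmp(abuf[0], abuf[v], sizeof abuf[0]) != 0 ||
        memcmp(c[0], c[v], sizeof c[0]) != 0)
      return 0;
  return 1;
}
```

Calling `fusion_versions_agree()` returns 1. The peeled variant only illustrates the idea behind -splitloop; the code the tool generates for that option may be structured differently.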