ROSE Compiler Framework/Print version

About the Book

FYI: http://wiki.rosecompiler.org redirects here.

Introduction

The goal of this book is to provide extensive, up-to-date community documentation on how to use the open-source ROSE compiler framework, developed at Lawrence Livermore National Laboratory.

While the ROSE project website (http://www.rosecompiler.org) already hosts a variety of official documentation, a wikibook for ROSE allows anybody to contribute instructional information about this software.

Again, please note that this wikibook is not the official documentation of ROSE. It is a community effort with contributions from anyone, just like you.

How To Contribute

  • Welcome Contributions:
    • Fix typos and grammar in existing pages to improve quality, clarity, and readability.
    • Add new pages with ROSE-specific tutorials, how-tos, FAQs, and workflows.
    • Start discussions on the Discussion tab of an existing page to suggest how things can be done better.
  • Unwelcome Contributions:
    • Rewriting existing documentation; link to it instead if necessary.

References

Conventions

  • Technical names, identifiers, etc. should be enclosed in teletype, <samp></samp>

    The FooBar class can be found in the foobar.cpp file.

  • Source code should use a highlighted code block:
<syntaxhighlight lang="<language>">
<Code goes here...>
</syntaxhighlight>

(Enclosing code in a <pre></pre> block is also okay, but the highlighted code block is preferred.)

  • Headings:
    • Capitalize all words e.g. "Graph Processing"
  • Subheadings:
    • First word capitalized only

Tracking Wiki Changes

Learn how to "Track Changes": http://en.wikibooks.org/wiki/Help:Tracking_changes

Enable Email Notifications for Changes to This Book

If you want to be notified of changes to this book, Wikibooks can email you about changes to the wiki pages that you explicitly choose to watch.

To use this feature:

1. Create an account with WikiBooks: http://en.wikibooks.org/w/index.php?title=Special:UserLogin&returnto=Main+Page&type=signup

2. Log in to Wikibooks and set your preferences (top right corner of the web page) for both email notifications and your watchlist:

  • Email notification settings
    • Preferences -> User profile -> E-mail notifications -> E-mail me when a page on my watchlist is changed (turn this on)
  • Define your watchlist
    • Preferences -> Watchlist -> Advanced options -> select the options you want, such as "Add pages I edit to my watchlist" and "Add pages I create to my watchlist"
    • You can also watch or unwatch any individual wiki page by clicking the star in the page's tab list (after View history)

Caveat: we don't know whether Wikibooks lets users watch an entire book. So far, you have to watch pages one by one, for example by editing each of them at some point.

Wikibook Writing Tips

1. What exactly is "BookCat" for? It is a category tag automatically added by wiki robot scripts.

2. Should "BookCat" be at the end of the document? Any position in the page should be fine. Placing it at the top may be better, so that it won't be accidentally deleted when we add new content at the bottom.

ROSE's Documentation

ROSE uses a range of materials to document the project.

The website uses WordPress as its front end. Most documentation can also be accessed directly through the old static web page: http://rosecompiler.org/index.html

Wiki

  • This wikibook: unofficial community documentation. Editable by anyone, it is intended to supplement the official documents and to collect tutorials, FAQs, and quick pointers to important topics.

Obtaining ROSE

Git Repositories

ROSE's source files are managed by git, a distributed revision control and source code management system.

The latest development version of ROSE. It may not pass all regression tests. (Recommended)

A release version of ROSE that passes all regression tests.

Here are some rough instructions to get you started with building and installing ROSE on your Linux machine:

1. Install the Boost C++ libraries. You can use my quick-and-dirty script: install-boost.sh.

2. Configure your environment

3. Run the $ROSE/build script to perform the Autotools bootstrap

4. Run the $ROSE/configure script with "--with-boost=/path/to/boost" and "--prefix=/path/for/installation". Optionally, add --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/ if you want to experiment with ROSE's OpenMP support and link with GCC 4.4's OpenMP runtime library.

5. Run "make -j24 install-core" to build and install ROSE.

Please note that we currently support only GCC 4.2.4 and GCC 4.4.5 on the ROSE-EDG4 repository. Full documentation is available on our website, http://rosecompiler.org, though instructions specific to ROSE-EDG4 are generally lacking.

Autoconf Configuration

The EDG 4.7 frontend is enabled by default, so you can configure ROSE with just the following options:

   $ $ROSE/configure --with-boost=/path/to/boost/installation --prefix=/path/for/rose/installation

Example Environment

   # Example setup.sh file to source relevant environment settings for building ROSE
   #
   # Update the paths to fit your environment…
   export BOOST_ROOT=/home/too1/local/opt/boost/1_45_0
   export BOOST_HOME="${BOOST_ROOT}"
   export JAVA_HOME=/home/too1/local/workspace/rose/edg3/keep_going/opt/java/jdk/1.7.0_15
   export LIBTOOL_HOME=/usr/apps/libtool/2.2.4
   export AUTOMAKE_HOME=/usr/apps/automake/1.9.6
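   # Avoid empty entries (a leading or trailing ':'), which the dynamic linker treats as the current directory
   export LD_LIBRARY_PATH="${JAVA_HOME}/lib:${JAVA_HOME}/jre/lib/amd64/server:${BOOST_ROOT}/lib"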
   export PATH="${JAVA_HOME}/bin:${AUTOMAKE_HOME}/bin:${LIBTOOL_HOME}/bin:$PATH"
   export PATH="/usr/apps/git/latest/bin:$PATH"
   # GCC 4.4.1
   source /nfs/apps/mpfr/2.4.1/setup.sh
   source /nfs/apps/gmp/4.3.1/setup.sh
   source /nfs/apps/gcc/4.2.4/setup.sh

Virtual Machine Image

It can take quite some time to install ROSE for the first time. A virtual machine image is provided with an Ubuntu 10.04 OS and ROSE already installed.

You can download it and run it using VMware Player.

More information is at ROSE Virtual Machine Image.

git 1.7.10 or later for github.com

GitHub requires git 1.7.10 or later to avoid HTTPS cloning errors, as mentioned at https://help.github.com/articles/https-cloning-errors

Ubuntu 10.04's package repository provides git 1.7.0.4, so you need to build a newer version of git. However, you still need the older packaged git to clone the latest git source code:

 sudo apt-get install git-core

Now you can clone the latest git:

 git clone https://github.com/git/git.git

Install all prerequisite packages needed to build git from source (assuming you have already installed the GNU toolchain, including GCC, make, etc.):

 sudo apt-get install gettext zlib1g-dev asciidoc libcurl4-openssl-dev
 $ cd git  # enter the cloned git directory
 $ make configure ;# as yourself
 $ ./configure --prefix=/usr ;# as yourself
 $ make all doc ;# as yourself
 # make install install-doc install-html;# as root
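After installing, you can confirm that the git on your PATH meets GitHub's 1.7.10 minimum. The sketch below is illustrative only: version_ge is a hypothetical helper (not part of git or ROSE), and it assumes GNU coreutils, whose sort -V option orders version strings, and that git --version prints "git version X.Y.Z".

```shell
# version_ge: hypothetical helper, not part of git or ROSE.
# Succeeds if version $1 >= version $2. With GNU "sort -V" the smaller
# version sorts first, so $1 >= $2 exactly when $2 comes out on top.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=1.7.10
# "git version 1.7.0.4" -> "1.7.0.4"; empty if git is not installed.
current=$(git --version 2>/dev/null | awk '{print $3}')

if [ -n "$current" ] && version_ge "$current" "$required"; then
  echo "git $current is new enough (>= $required)"
else
  echo "git ${current:-(not found)} does not meet >= $required; build a newer git" >&2
fi
```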

EDG source code

If you have an EDG license, we can provide you with ROSE's EDG source code. The original, official EDG source code does NOT work with ROSE since we have modified EDG to better serve our purposes.

Note: We provide you with a snapshot of our Git revision controlled ROSE-EDG source code repository. This way, you can more easily contribute your EDG modifications back into ROSE.

1. Send your EDG (research) license to two ROSE staff members, in case one of them is on vacation or traveling.

2. Provide ROSE staff with a drop-off location for the EDG source code (ssh or ftp server, etc.)

3. Once you receive the EDG source code, you have two options:

As a submodule

a. Use ROSE-EDG as a submodule (assuming you have ROSE's Git source tree):

This is the recommended way to use the EDG Git repository we provide. It assumes that you are working in a local Git clone of ROSE ($ROSE).

Edit submodule path in $ROSE/.gitmodules to point to your ROSE-EDG repository:

[submodule "src/frontend/CxxFrontend/EDG"]
       path = src/frontend/CxxFrontend/EDG
-       url = ../ROSE-EDG.git
+       url = <path/to/your/ROSE-EDG.git>
-[submodule "projects/vulnerabilitySeeding"]
-       path = projects/vulnerabilitySeeding
-       url = ../vulnerabilitySeeding.git

Run git-submodule commands:

$ cd $ROSE
$ git submodule init
$ git submodule update

The commands above check out a version of the EDG submodule into $ROSE/src/frontend/CxxFrontend/EDG.

As a Drop-in

b. As a Drop-in

Extract the ROSE-EDG tarball and move the resulting EDG directory into its location within the ROSE source tree, $ROSE/src/frontend/CxxFrontend/EDG (the same path the submodule uses):

  $ tar xzvf ROSE-EDG-b12158aa2.tgz
  $ ls
  EDG  ROSE-EDG-b12158aa2.tgz
  $ mv EDG $ROSE/src/frontend/CxxFrontend/EDG

Warning: This method may not work because EDG is a submodule of ROSE and therefore requires the two versions to be synchronized. For example, the latest version of ROSE may not use the latest version of ROSE's EDG.
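Because the drop-in method depends on matching versions, you can check which EDG commit your ROSE checkout records for the submodule before dropping in a tarball. This is a sketch that assumes a Git checkout of ROSE and the submodule path from $ROSE/.gitmodules shown above; expected_edg_commit is a hypothetical helper, and treating the suffix of a tarball name such as ROSE-EDG-b12158aa2.tgz as an abbreviated commit hash is an assumption.

```shell
# expected_edg_commit: hypothetical helper, not part of ROSE.
# Prints the commit that the ROSE checkout in $1 records for its EDG
# submodule. The path matches the one in $ROSE/.gitmodules.
expected_edg_commit() {
  repo=${1:?usage: expected_edg_commit /path/to/rose}
  # For a submodule, ls-tree prints "160000 commit <sha>\t<path>";
  # the third field is the EDG commit this ROSE version expects.
  git -C "$repo" ls-tree HEAD -- src/frontend/CxxFrontend/EDG | awk '{print $3}'
}

# Compare the result with the hash in the tarball name, e.g.
# ROSE-EDG-b12158aa2.tgz (assumed to be an abbreviated commit hash).
if [ -n "${ROSE:-}" ] && git -C "$ROSE" rev-parse --git-dir >/dev/null 2>&1; then
  echo "ROSE expects EDG commit: $(expected_edg_commit "$ROSE")"
fi
```

If the printed commit does not start with the hash in your tarball's name, the submodule method is the safer choice.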

The remaining steps

4. In ROSE, run the $ROSE/build script from the top-level of the ROSE source tree, i.e. $ROSE. This script bootstraps Autotools, including the Makefile.ams in the EDG source tree.

5. Configure and build ROSE: Normally, during this process ROSE would attempt to download an EDG binary tarball for you, but since you have the source code, this step will be skipped.


EDG tarball

Process

If you don't have access to the EDG source code, you will be able to automatically download a packaged EDG binary tarball during the ROSE build process. The download is triggered during make in $ROSE_BUILD/src/frontend/CxxFrontend.

The EDG binary version is a computed binary compatibility signature relative to your version of ROSE. You can check this version by running $ROSE/scripts/bincompat-sig, for example:

$ ./scripts/bincompat-sig 
7b1930fafc929de85182ee1a14c86758

You may encounter this error:

$ ./scripts/bincompat-sig 
Unable to find a remote tracking a canonical repository.  Please add a
canonical repository as a remote and ensure it is up to date.  Currently
configured remotes are:

   origin => https://github.com/rose-compiler/rose

Potential canonical repositories include:

   anything ending with "rose.git" (case insensitive)

If you do, simply add ".git" to the end of your origin's URL path. In our example, this translates to:

https://github.com/rose-compiler/rose.git
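The check behind this error can be reproduced with a short sketch. is_canonical_rose_url is a hypothetical helper that applies the rule from the error message (a URL ending with "rose.git", case insensitive); the fix itself uses the standard git remote set-url command.

```shell
# is_canonical_rose_url: hypothetical helper, not part of ROSE.
# Applies the rule from the bincompat-sig error message: a canonical
# repository URL ends with "rose.git", compared case-insensitively.
is_canonical_rose_url() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *rose.git) return 0 ;;
    *) return 1 ;;
  esac
}

# Report each fetch remote of the current checkout.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git remote -v | awk '$3 == "(fetch)" {print $1, $2}' |
  while read -r name url; do
    if is_canonical_rose_url "$url"; then
      echo "canonical:     $name => $url"
    else
      echo "not canonical: $name => $url"
    fi
  done
fi

# To fix the example above, point origin at the URL ending in ".git":
#   git remote set-url origin https://github.com/rose-compiler/rose.git
```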

List of binaries

View the list of available EDG binaries here: http://www.rosecompiler.org/edg_binaries/edg_binaries.txt.

EDG binaries are generated for these platforms (Last updated on 12/22/2012):

Platform EDG 3.3 EDG 4.0
amd64-linux GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1 GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1
i686-linux GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1 GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1
32bit-macos-10.5 GCC 4.0.4
64bit-macos-10.6 GCC 4.2.4
64bit-x86_64-macos-10.6 GCC 4.2.4
34bit-debian GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1 GCC 3.4.6, 4.0.4, 4.1.2, 4.2.4, 4.3.2, 4.4.1

Installation

ROSE is released as an open source software package. Users are expected to compile and install the software.

Don't like installing ROSE?

Installing ROSE from scratch takes quite a few steps. We provide a virtual machine image that contains an installed copy of ROSE. You can download it and try it out before making a serious investment of your time.

More information about this is at ROSE_Compiler_Framework/Virtual_Machine_Image.

Platform Requirement

ROSE is portable to Linux and Mac OS X on IA-32 and x86-64 platforms. In particular, ROSE developers often use the following development environments:

  • Red Hat Enterprise Linux 7 or its open source equivalent CentOS 7
  • Ubuntu 18.04

Minimum disk space

  • 30 GB

Software Requirement

Here is a list of prerequisite software packages for installing ROSE:

  • GCC: the range of supported GCC versions is checked by support-rose.m4 during configuration.
    • gcc
    • g++
  • boost library: Again the range of supported Boost versions is checked by support-rose.m4 during configuration
  • GNU autoconf and automake
  • libtool
  • bison (byacc)
  • flex
  • glibc-devel
  • git
  • ZGRViewer, a GraphViz/DOT Viewer: essential to view dot graphs of ROSE AST
    • install Graphviz first - Graph Visualization Software

Optional packages for additional features or advanced users:

  • gfortran (optional for Fortran support)
  • Sun Java JDK or OpenJDK: needed only if you are interested in Fortran and Java support in ROSE.
  • libxml2-devel
  • sqlite
  • texlive-full: needed for building the LaTeX docs

Instructions for Ubuntu 18.04

The full process, from beginning to end, takes about one hour on an Amazon t2.2xlarge instance (8 vCPUs, 32 GB memory) using -j8.

sudo apt-get update
sudo apt-get upgrade
  
sudo apt-get install git wget build-essential libtool flex bison python3-dev unzip perl-doc doxygen texlive libboost-all-dev gdb gcc-7 g++-7 gfortran-7 autoconf automake libxml2-dev libdwarf-dev graphviz openjdk-8-jdk lsb-core ghostscript

git clone https://github.com/rose-compiler/rose
cd rose/

./build
 
cd ..
mkdir build-rose
cd build-rose/

../rose/configure --prefix=/home/demo/opt/rose-inst --enable-edg_version=5.0 --with-boost-libdir=/usr/lib/x86_64-linux-gnu --with-boost=/usr

make core -j4
make install-core -j4

./build

In general, it is better to regenerate the configure script in the top-level source directory of ROSE. Just type:

 rose_sourcetree>./build

configure

The next step is to run configure in a separate build tree. ROSE will complain if you try to build it within its source directory.

There are many configuration options. You can see the full list by typing ../sourcetree/configure --help. Only --prefix and --with-boost are required.

 mkdir buildrose
 cd buildrose
 /home/liao6/rose/freshmaster/sourcetree/configure --prefix=/home/liao6/rose/freshmaster/install --with-boost=/nfs/casc/overture/ROSE/opt/rhel6/x86_64/boost/1_45_0/gcc/4.4.5 --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 
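Because ROSE refuses to build inside its source directory, a small guard can catch this before configure runs. check_separate_build_tree is a hypothetical helper; it only rejects the case where both paths resolve to the same directory, since the exact rule ROSE applies is not described here.

```shell
# check_separate_build_tree: hypothetical helper, not part of ROSE.
# Fails if the build directory ($2) resolves to the source directory ($1).
# "cd ... && pwd -P" runs in a subshell and resolves symlinks and
# relative paths, so equivalent spellings compare equal.
check_separate_build_tree() {
  src_dir=$(cd "$1" && pwd -P) || return 1
  build_dir=$(cd "$2" && pwd -P) || return 1
  if [ "$src_dir" = "$build_dir" ]; then
    echo "error: run configure from a build tree separate from $src_dir" >&2
    return 1
  fi
}

# Example, run from the new build directory:
#   check_separate_build_tree ../sourcetree . &&
#     ../sourcetree/configure --prefix=/path/for/rose/installation --with-boost=/path/to/boost/installation
```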

By default, all supported languages are enabled where possible, which may slow down your compilation. You can specify the desired set of languages with:

 --enable-languages=LIST Build specific languages:
                          all,none,binaries,c,c++,cuda,fortran,java,x10,opencl,php,matlab,python
                          (default=all)

For example, use "--enable-languages=c++,fortran" if you are only interested in C++ and Fortran support.

Additional useful configure options

  • Specify where GCC's OpenMP runtime library libgomp.a is located. Only the libgomp from GCC 4.4 or later should be used, since it provides OpenMP 3.0 support:
    • --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib/


By default, ROSE is configured with GCC's -O2 and -g options, so the translators shipped with ROSE already have some debugging information available. However, some variables may be optimized away. To preserve as much debugging information as possible, reconfigure and recompile ROSE with GCC optimizations turned off, using any of the following:

  • --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 // in the configuration option list
  • --without-CXX_OPTIMIZE --without-C_OPTIMIZE // alternative method
  • --with-C_OPTIMIZE=no --with-CXX_OPTIMIZE=no // third way

To enable more comprehensive testing when running make check:

  • --with-ROSE_LONG_MAKE_CHECK_RULE=yes

If you are interested in ROSE's OpenMP lowering translation and want it to link automatically with GCC's libgomp.a, add one more option:

  • --with-gomp_omp_runtime_library=/usr/lib/gcc/x86_64-redhat-linux/4.4.5/

Other useful options

  • --enable-boost-version-check=false // disable boost version check

EDG version

Note: C++11 input files are NOT supported when ROSE is configured with EDG 4.9 and GNU compilers 4.9 or later. Configure ROSE with EDG 4.12 instead:


 ../sourcetree/configure --enable-edg_version=4.12

cmake

See ROSE Compiler Framework/cmake for more details.

EDG 4.x-based ROSE also supports cmake build system.

Requirements:

  • boost
  • jdk: export JAVA_HOME=/home/demo/opt/jdk1.8.0_25/
  • libxml2

Here is the CMake command to configure ROSE:

$ CC=gcc CXX=g++ cmake ../rose/ -DBOOST_ROOT="$BOOST_HOME" -Denable-cuda:BOOL=off -DCMAKE_BUILD_TYPE:STRING=Debug -DCMAKE_INSTALL_PREFIX:PATH="$(pwd)/../install"

Afterwards, simply run "make" and then "ctest".

make

In ROSE's build tree, type

 cd buildrose
 make core -j4  # This may take a long time depending on your machine configuration

This builds the core of ROSE, including librose.so, tutorials, projects, tests, and so on. -j4 runs four build processes in parallel; you can use a larger number if your machine supports more concurrent processes. Even so, the entire process can take hours to finish.

For most users, building librose.so is enough for their work. In this case, just type:

 make -C src/ -j4  # this is much faster. 
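Rather than hard-coding -j4, you can derive the job count from the machine. This is a sketch assuming nproc (GNU coreutils) or getconf is available; njobs is just an illustrative variable name. Keep in mind that a large C++ build may also be limited by memory, so a smaller number can be safer on machines with little RAM.

```shell
# Pick the number of parallel make jobs from the online processor count.
# nproc comes from GNU coreutils; getconf _NPROCESSORS_ONLN also works on
# macOS. Fall back to 1 if neither is available.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "using -j$njobs"

# Then, from the build tree:
#   make -C src/ -j"$njobs"
```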

Turn off silent build

By default, the Automake build system uses silent rules to reduce screen output, hiding the full compilation and linking command lines and options. To see the full command lines, pass V=1 to make, as in "make V=1".

More background information about this subject is available at https://autotools.io/automake/silent.html.

make check

Optionally, you can type make check to make sure the compiled ROSE passes all of its shipped tests. This also takes hours, since it runs all make check rules in the projects, tutorial, and tests directories.

To save time, you can run only the tests under a selected directory, such as buildrose/tests:

 make -C tests/ check -j4

make install

After "make", it is recommended to run "make install" so that ROSE's library (librose.so), headers (rose.h), and some prebuilt ROSE-based tools are installed under the installation path specified with --prefix.

To install everything, type the following command line under your ROSE build tree:

 make install -j8  

A simplified installation target is install-core, which installs only essential binaries and prebuilt tools:

 make install-core -j8

Set environment variables

After the installation, you should set up some standard environment variables so that you can use ROSE. For bash, the following is an example:

ROSE_INS=/home/userx/opt/rose_installation_tree
PATH=$PATH:$ROSE_INS/bin
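# ${VAR:+...} adds the ':' separator only if LD_LIBRARY_PATH is already set and non-empty
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$ROSE_INS/lib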
# Don't forget to export variables !!!
export PATH LD_LIBRARY_PATH
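To confirm that the variables took effect, a small check like the following can help. check_rose_env is a hypothetical helper; it assumes the installed layout described above, with identityTranslator in $ROSE_INS/bin and librose.so in $ROSE_INS/lib.

```shell
# check_rose_env: hypothetical helper, not part of ROSE.
# Checks that the installation tree in $1 looks usable from this shell.
check_rose_env() {
  ins=${1:?usage: check_rose_env /path/to/rose_installation_tree}
  rc=0
  # A prebuilt translator should be installed and executable.
  if [ ! -x "$ins/bin/identityTranslator" ]; then
    echo "missing: $ins/bin/identityTranslator" >&2; rc=1
  fi
  # librose.so (possibly with a version suffix) should be in lib/.
  if ! ls "$ins"/lib/librose.so* >/dev/null 2>&1; then
    echo "missing: $ins/lib/librose.so" >&2; rc=1
  fi
  # Surround with ':' so the match also works for the first/last entry.
  case ":$PATH:" in
    *":$ins/bin:"*) ;;
    *) echo "PATH does not contain $ins/bin" >&2; rc=1 ;;
  esac
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$ins/lib:"*) ;;
    *) echo "LD_LIBRARY_PATH does not contain $ins/lib" >&2; rc=1 ;;
  esac
  return "$rc"
}

# Example (after exporting the variables above):
#   check_rose_env "$ROSE_INS" && echo "ROSE environment looks good"
```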

Try out a ROSE translator

Quite a few prebuilt ROSE translators are installed under $ROSE_INS/bin.

You can try identityTranslator, which just parses input code, generates AST, and unparses it back to original code:

  identityTranslator -c helloWorld.c

It should generate an output file named rose_helloWorld.c, which should just look like your input code.
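If you don't have a test file handy, the following sketch creates helloWorld.c, runs identityTranslator on it when the translator is on your PATH (set up as described above), and then shows any differences between the input and the unparsed output.

```shell
# Create a small C test input.
cat > helloWorld.c <<'EOF'
#include <stdio.h>

int main(void) {
  printf("Hello, world!\n");
  return 0;
}
EOF

# Run the translator only if it is on PATH.
if command -v identityTranslator >/dev/null 2>&1; then
  identityTranslator -c helloWorld.c
  # Show any differences between the input and the unparsed output.
  diff -u helloWorld.c rose_helloWorld.c || true
else
  echo "identityTranslator is not on PATH; check PATH and LD_LIBRARY_PATH" >&2
fi
```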



Some example options

identityTranslator --help

  -rose:skip_unparse         read and process input file but skip generation of
                             final C++ output file
  -rose:skipfinalCompileStep
                             read and process input file,
                             but skip invoking the backend compiler

Troubleshooting

We list common issues associated with ROSE's installation.

EDG binary

If you do not have the EDG frontend source code, ROSE's build system will automatically attempt to download an appropriate EDG binary using wget during the build process (i.e. make -C src/frontend/CxxFrontend). This is an example download URL that is generated by ROSE's build system:

http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-4-7-x86_64-pc-linux-gnu-GNU-4.4-968750cb07c75694948532c55bfb097684144cc4.tar.gz

The generalized format for the tarball file is as follows:

roseBinaryEDG-<EDG_VERSION>-<ARCHITECTURE>-<GCC_VERSION>-<BINARY_COMPATIBILITY_SIGNATURE>.tar.gz

The binary compatibility signature can be generated manually by executing the ROSE/scripts/edg-generate-sig script. For example:

$ cd ROSE/
$ ./scripts/edg-generate-sig
968750cb07c75694948532c55bfb097684144cc4
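Putting the pieces together, the sketch below assembles a download URL from the format above. edg_tarball_url is a hypothetical helper. Two details are inferred from the single example URL and are assumptions: the EDG version's dots become dashes (4.7 to 4-7), and the third field is a compiler tag such as GNU-4.4. Check the result against the list of available binaries before relying on it.

```shell
# edg_tarball_url: hypothetical helper, not part of ROSE's build system.
# Arguments: EDG version, architecture, compiler tag, signature
# (e.g. the output of ./scripts/edg-generate-sig).
edg_tarball_url() {
  # Assumption from the example URL: "4.7" is written as "4-7".
  edg_version=$(printf '%s' "$1" | tr . -)
  printf 'http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-%s-%s-%s-%s.tar.gz\n' \
    "$edg_version" "$2" "$3" "$4"
}

# Reproduces the example URL shown above.
edg_tarball_url 4.7 x86_64-pc-linux-gnu GNU-4.4 968750cb07c75694948532c55bfb097684144cc4
```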

The EDG binaries are platform-specific and have historically been a source of issues, e.g. Autoconf detecting the wrong host/build/platform types. One possible remedy is to use Autoconf's build and host options:

1. Check what build system Autoconf thinks you have:

$ ./config/config.guess 
x86_64-unknown-linux-gnu

2. Use the appropriate Autoconf options during configuration of ROSE:

$ $ROSE/configure [--build|--host|--target|...]

See Using the Target Type.

A real user's solution:

Hi Justin,

Checking the config.guess file in the source tree, I searched for the apple-darwin case to find the details needed by the --build option,
and found that UNAME_PROCESSOR and UNAME_RELEASE are needed in --build.

First, I typed uname -m (to find UNAME_PROCESSOR in config.guess)
 result: x86_64
Second, I typed uname -r (to find UNAME_RELEASE)
 result: 10.8.0 (Darwin kernel version)

Third, I ran configure again, adding the --build option, so that Autoconf could directly find the detailed platform type:

/Users/ma23/ROSE/configure --with-CXX_DEBUG=-ggdb3 --with-CXX_WARNINGS=-Wall --with-boost=/Users/ma23/Desktop/ROSE/boost/BOOST_INSTALL \
--with-gfortran=/Users/ma23/Desktop/macports/bin/gfortran-mp-4.4 --with-alternate_backend_fortran_compiler=gfortran-mp-4.4 \
GFORTRAN_PATH=/Users/ma23/Desktop/macports/bin/gfortran-mp-4.4 --build=x86_64-apple-darwin10

At last, make :)
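The user's steps above can be scripted. darwin_build_triplet is a hypothetical helper that mirrors the user's choice of keeping only the major Darwin kernel version (10.8.0 becomes 10); config.guess itself may produce a more detailed triplet, so treat this as a sketch.

```shell
# darwin_build_triplet: hypothetical helper, not part of ROSE or Autoconf.
# Builds "<processor>-apple-darwin<major kernel version>", as the user did:
# processor from "uname -m", release from "uname -r" (10.8.0 -> 10).
darwin_build_triplet() {
  printf '%s-apple-darwin%s\n' "$1" "${2%%.*}"
}

if [ "$(uname -s)" = Darwin ]; then
  build=$(darwin_build_triplet "$(uname -m)" "$(uname -r)")
  echo "pass --build=$build to \$ROSE/configure"
fi
```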

Virtual machine image

Overview

The goal of this page is to document how users can download the virtual machine image (or virtual appliance) and use ROSE out of the box.

Note: we no longer release VM images; we provide a Docker image instead.

We have released three virtual machine images so far:

  • V4: we no longer provide a VM for Ubuntu 18.04 since the installation process is simple enough. Please follow the few command lines below.
  • V3: the newest VM using Ubuntu 16.04 (Xenial Xerus) with ROSE installed with EDG 4.12 frontend
  • V2: the VM using Ubuntu 14.04 (Trusty Tahr) and ROSE based on EDG 4.x frontend (no longer maintained)
  • V1: the very old VM using Ubuntu 10.04 (Lucid Lynx) and ROSE based on EDG 3.x frontend (no longer maintained)

Docker image for ROSE compiler infrastructure

Getting Started

Before building, you should clone the rose-docker-image repository and have Docker installed on your computer.

Docker installation instructions:

sudo apt update
sudo apt upgrade
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
     
sudo apt install docker-ce
# check if docker service is started
sudo systemctl status docker

Creating your Docker image

You can create your own Docker image of ROSE and fine-tune its build parameters. The following GitHub project describes how to do this:

https://github.com/chunhualiao/rose-docker-image

How to download an image

The easiest way to use ROSE with Docker is to download a ready-to-use image (built by Gleison), using the following command:

sudo docker pull gleisonsdm/rose:latest

Using the image

After downloading the image, you can run ROSE as a microservice on Docker. The easiest way to use the image is to run bash commands directly. The ROSE binaries are in "/usr/rose/bin/" in our Docker image, ready to run. You can check this with the following command line:

sudo docker run --rm -it -v $(pwd):/root gleisonsdm/rose:latest ls /usr/rose/bin

Then, the expected output is:

ArrayProcessor			  interproceduralCFG
DataFaultToleranceTransformation  libtool
KeepGoingTranslator		  livenessAnalysis
astCopyReplTest			  loopProcessor
astRewriteExample1		  mangledNameDumper
autoPar				  measureTool
autoTuning			  moveDeclarationToInnermostScope
buildCallGraph			  outline
codeInstrumentor		  pdfGenerator
compassEmptyMain		  preprocessingInfoDumper
compassMain			  qualifiedNameDumper
compassVerifier			  rajaChecker
defaultTranslator		  rose-config
defuseAnalysis			  roseupcc
dotGenerator			  sampleCompassSubset
dotGeneratorWholeASTGraph	  summarizeSignatures
extractMPISkeleton		  typeforge
generateSignatures		  virtualCFG
identityTranslator

The next step is to check whether your Docker image works correctly by running identityTranslator on a test program. We suggest a hello world program, such as the code below:

#include <iostream>

int main() {
  std::cout << "Hello World!\n";
  return 0;
}

After saving this code in a file named "main.cpp", you can run ROSE tools directly, but do not forget to mount (https://docs.docker.com/storage/volumes/) local paths into the container. For instance, the command below runs a container with ROSE, mounts the current working directory to the container's /root, and executes the identityTranslator tool on main.cpp. The resulting files are stored in the current working directory, and the container is deleted afterwards because of the --rm option.

sudo docker run --rm -it -v $(pwd):/root gleisonsdm/rose identityTranslator -c main.cpp

Afterwards, your current directory contains the following new files:

  • main.o
  • main.ti
  • rose_main.cpp
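If you run ROSE tools from the image often, a small shell function saves retyping the docker command. rose_docker is a hypothetical name; the function simply wraps the docker run command above, quoting the mounted path so directories with spaces work. If your user belongs to the docker group, you can drop sudo.

```shell
# rose_docker: hypothetical convenience wrapper, not part of the image.
# Runs a command from the ROSE image with the current directory mounted
# at /root; results land in the current directory, and --rm removes the
# container when the command finishes.
rose_docker() {
  sudo docker run --rm -it -v "$(pwd)":/root gleisonsdm/rose:latest "$@"
}

# Usage:
#   rose_docker ls /usr/rose/bin
#   rose_docker identityTranslator -c main.cpp
```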

We no longer provide a VM for Ubuntu 18.04 since the installation process is simple enough. Please follow the few command lines at: ROSE_Compiler_Framework/Installation#Instructions_for_Ubuntu_18.04

Download

Download the virtual machine image:

  • http://www.rosecompiler.org/Ubuntu-ROSE-Demo-V3.tar.gz
  • Warning: it is a huge file of 4.4 GB (20.2 GB fully uncompressed). It may take about an hour to download, depending on your Internet connection speed.
  • Demonstration user account (sudo user in Ubuntu):
    • account: demo
    • password: password

Warning: LLNL users may not be able to download it due to limitations to max downloaded file size within LLNL. It may also be against LLNL's security policy to run a virtual machine without authorization. So this image should not be used inside LLNL.


On Windows, you can install 7-Zip (http://www.7-zip.org/) to extract the tarball (.tar.gz file) into a folder.

  • It may take ~ 20 minutes on a desktop PC to fully uncompress it in two steps (.tar.gz to .tar, then .tar to the folder)
  • The final folder size is around 20.2 GB

Content

demo@ubuntu:~$ cat readme

This is a Ubuntu 16.04 virtual machine with the ROSE Compiler installed.

cloned rose-develop on 2/22/2017
version 0.9.7.188

Directory List
~/rose-develop  : git clone https://github.com/rose-compiler/rose-develop
~/build-rose    : build tree of rose
~/opt/rose_inst : installation path of rose
~/tests         : a simple c file, processed by identityTranslator and dotGeneratorWholeASTGraph. 

type zgrviewer -f filename.dot to view a dot file of the AST graph

gcc-4.9.3 is the default gcc

---------- using ROSE ----------

To use the rose translator, you need to first setup the environment.
source ~/set.rose

---------- bashrc ----------

bash env in .bashrc has the following variables by default

# add jdk to PATH and LD_LIBRARY_PATH
PATH=/home/demo/opt/jvm/jdk1.7.0_51/bin:$PATH
LD_LIBRARY_PATH=/home/demo/opt/jvm/jdk1.7.0_51/jre/lib/amd64/server:/home/demo/opt/jvm/jdk1.7.0_51/lib:$LD_LIBRARY_PATH
# add boost to LD_LIBRARY_PATH
LD_LIBRARY_PATH=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default/lib:$LD_LIBRARY_PATH
# create alias for zgrviewer
alias zgrviewer='/home/demo/opt/zgrviewer-0.10.0/run.sh'
export PATH LD_LIBRARY_PATH

---------- configuration of ROSE ----------

CC=/usr/bin/gcc-4.9 CXX=g++-4.9 FC=/usr/bin/gfortran-4.9 \
CXXFLAGS='-g -rdynamic -Wall -Wno-unused-local-typedefs -Wno-attributes' \
/home/demo/rose-develop/configure \
--enable-assertion-behavior=abort \
--prefix=/home/demo/opt/rose_inst \
--with-CFLAGS=-fPIC --with-CXXFLAGS=-fPIC \
--with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 \
--with-C_DEBUG='-g -rdynamic' --with-CXX_DEBUG='-g -rdynamic' \
--with-C_WARNINGS='-Wall -Wno-unused-local-typedefs -Wno-attributes' \
--with-CXX_WARNINGS='-Wall -Wno-unused-local-typedefs -Wno-attributes' \
--with-ROSE_LONG_MAKE_CHECK_RULE=yes \
--with-boost=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default \
--with-gfortran='/usr/bin/gfortran-4.9' \
--with-python='/usr/bin/python3' \
--with-java=/home/demo/opt/jvm/jdk1.7.0_51/bin/javac \
--enable-languages=all \
--enable-projects-directory \
--with-doxygen \
--without-sqlite3 \
--without-libreadline \
--without-magic \
--with-yaml='/home/demo/opt/yaml/0.5.3/boost-1.61.0/gcc-4.9.3-default' \
--with-dlib='/home/demo/opt/dlib/18.18' \
--without-wt \
--without-yices \
--without-pch \
--enable-rosehpct \
--with-gomp_omp_runtime_library=/usr/lib/gcc/x86_64-linux-gnu/4.9/ \
--without-haskell \
--enable-edg_version=4.12

make core
make install-core

Installation Notes

The installation procedures followed directions from http://rosecompiler.org/ROSE_HTML_Reference/installation.html except for changing the gcc version to 4.9.3.

A script is provided if you want to repeat the process: https://github.com/rose-compiler/rose-develop/blob/master/scripts/2017-03-ROSE-Unbuntu-16.04-VM-setup.sh

demo@ubuntu:~$ cat installation_notes

ubuntu 64-bit 16.04.1 amd64

name: Ubuntu-ROSE-Demo-V3
username: demo
password: password
Virtual machine name: Ubuntu-ROSE-Demo-V3

hard disk: 30 GB, split
memory 4096 MB
processors 2

cloned rose-develop on 2/22/2017
version 0.9.7.188
installed with: 
gcc 4.9.3
Boost 1.61.0
EDG 4.12

-------------------- INSTALLATION PROCEDURES --------------------

>$ sudo apt-get update
>$ sudo apt-get upgrade
>$ sudo apt-get install git wget build-essential libtool automake flex bison python3-dev unzip perl-doc

---------- change gcc to 4.9.3 ----------
# gcc 4.9 or later must be used to support C++11 features

>$ sudo apt-get install gcc-4.9 g++-4.9 gfortran-4.9
>$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 100
>$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 100
>$ sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-4.9 100

check that gcc -v says version 4.9.3

---------- install jdk ----------

# download jdk-7u51-linux-x64.tar.gz (from http://ftp.upf.br/pub/linux/java/jdk-7u51-linux-x64.tar.gz or official oracle website )

>$ mkdir -p ~/opt/jvm
>$ cd ~/opt/jvm
>$ tar xvf ~/Downloads/jdk-7u51-linux-x64.tar.gz 

# add to .bashrc
PATH=/home/demo/opt/jvm/jdk1.7.0_51/bin:$PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/demo/opt/jvm/jdk1.7.0_51/jre/lib/amd64/server:/home/demo/opt/jvm/jdk1.7.0_51/lib
export PATH LD_LIBRARY_PATH

source ~/.bashrc
check that javac -version says javac 1.7.0_51

---------- installing Boost ----------
# We highly recommend installing Boost into its own path instead of a system-wide path.
# The system default installation has a different directory layout than our build system expects ($BOOST_ROOT/include/boost and $BOOST_ROOT/lib/). 
# If you still want to use the system wide installation of boost, you have to separately specify where to find headers and libraries, 
# for example: --with-boost=/usr --with-boost-libdir=/usr/lib/x86_64-linux-gnu 

# download boost from
# https://sourceforge.net/projects/boost/files/boost/1.61.0/

>$ cd ~/Downloads
>$ wget -O boost-1.61.0.tar.bz2 http://sourceforge.net/projects/boost/files/boost/1.61.0/boost_1_61_0.tar.bz2/download
>$ tar xf boost-1.61.0.tar.bz2
>$ cd boost_1_61_0
>$ ./bootstrap.sh --prefix=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default --with-libraries=chrono,date_time,filesystem,iostreams,program_options,random,regex,serialization,signals,system,thread,wave
>$ ./b2 -sNO_BZIP2=1 install

# add to .bashrc
export LD_LIBRARY_PATH=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default/lib:$LD_LIBRARY_PATH

---------- install zgrviewer ----------

Download and unzip the ZGRViewer distribution.
>$ cd ~/Downloads
>$ wget -O zgrviewer-0.10.0.zip https://sourceforge.net/projects/zvtm/files/zgrviewer/0.10.0/zgrviewer-0.10.0.zip/download

>$ cd ~/opt/
>$ unzip ~/Downloads/zgrviewer-0.10.0.zip 

Edit its "run.sh" script so that the ZGRV_HOME variable has the correct value. The scripts/zgrviewerExampleScript file has some additional useful java switches.
ZGRV_HOME=/home/demo/opt/zgrviewer-0.10.0

Edit ~/.bashrc and add an alias that allows you to run ZGRViewer by typing "zgrviewer":

alias zgrviewer='/home/demo/opt/zgrviewer-0.10.0/run.sh'

>$ sudo apt-get install graphviz

Run zgrviewer and edit the preferences to point to the Graphviz binaries

---------------------------

# doxygen
>$ sudo apt-get install doxygen 

---------------------------

# latex
>$ sudo apt-get install texlive

---------------------------

# needed to build yaml-cpp
>$ sudo apt-get install cmake

# yaml-cpp, for reading YAML or JSON configuration files and storing results. 
# yaml-cpp must be compiled against the same version of Boost used to build ROSE
# download yaml-cpp-yaml-cpp-0.5.3.tar.gz from https://github.com/jbeder/yaml-cpp/releases

# Download the source code to ~/Downloads/yaml-cpp-yaml-cpp-0.5.3.tar.gz
>$ BOOST_ROOT=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default
>$ YAMLCPP_ROOT=/home/demo/opt/yaml/0.5.3/boost-1.61.0/gcc-4.9.3-default
>$ cd ~/Downloads
>$ tar xzvf yaml-cpp-yaml-cpp-0.5.3.tar.gz
>$ mkdir yaml-cpp-yaml-cpp-0.5.3/_build
>$ cd yaml-cpp-yaml-cpp-0.5.3/_build
>$ cmake .. -DBOOST_ROOT=$BOOST_ROOT -DCMAKE_INSTALL_PREFIX=$YAMLCPP_ROOT
>$ make install

---------------------------

# Dlib.
# --with-dlib='/home/demo/opt/dlib/18.18'
# Download tarball from http://dlib.net/  
# or https://sourceforge.net/projects/dclib/files/dlib/v18.18/
# unpack into desired installation directory

>$ mkdir -p ~/opt/dlib
>$ cd ~/opt/dlib
>$ tar -xf ~/Downloads/dlib-18.18.tar.bz2
>$ mv dlib-18.18 18.18

---------------------------

# For various analysis algorithms that use cryptographic functions
>$ sudo apt-get install libssl-dev libgcrypt11-dev 

---------------------------

# For parsing XML files in certain tools such as roseHPCT and BinaryContextLookup.
>$ sudo apt-get install libxml2-dev 

---------------------------

>$ sudo apt-get install libdwarf-dev

---------- bashrc ----------

.bashrc should contain:

# add jdk to PATH and LD_LIBRARY_PATH
PATH=/home/demo/opt/jvm/jdk1.7.0_51/bin:$PATH
LD_LIBRARY_PATH=/home/demo/opt/jvm/jdk1.7.0_51/jre/lib/amd64/server:/home/demo/opt/jvm/jdk1.7.0_51/lib:$LD_LIBRARY_PATH
# add boost to LD_LIBRARY_PATH
LD_LIBRARY_PATH=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default/lib:$LD_LIBRARY_PATH
# create alias for zgrviewer
alias zgrviewer='/home/demo/opt/zgrviewer-0.10.0/run.sh'
export PATH LD_LIBRARY_PATH

---------- install rose ----------

>$ cd ~/
>$ git clone https://github.com/rose-compiler/rose-develop

>$ cd rose-develop
>$ ./build
>$ cd ..
>$ mkdir build-rose
>$ cd build-rose
>$ CC=/usr/bin/gcc-4.9 CXX=g++-4.9 FC=/usr/bin/gfortran-4.9 CXXFLAGS='-g -rdynamic -Wall -Wno-unused-local-typedefs -Wno-attributes' /home/demo/rose-develop/configure --enable-assertion-behavior=abort --prefix=/home/demo/opt/rose_inst --with-CFLAGS=-fPIC --with-CXXFLAGS=-fPIC --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 --with-C_DEBUG='-g -rdynamic' --with-CXX_DEBUG='-g -rdynamic' --with-C_WARNINGS='-Wall -Wno-unused-local-typedefs -Wno-attributes' --with-CXX_WARNINGS='-Wall -Wno-unused-local-typedefs -Wno-attributes' --with-ROSE_LONG_MAKE_CHECK_RULE=yes --with-boost=/home/demo/opt/boost/1.61.0/gcc-4.9.3-default --with-gfortran='/usr/bin/gfortran-4.9' --with-python='/usr/bin/python3' --with-java=/home/demo/opt/jvm/jdk1.7.0_51/bin/javac --enable-languages=all --enable-projects-directory --with-doxygen --without-sqlite3 --without-libreadline --without-magic --with-yaml='/home/demo/opt/yaml/0.5.3/boost-1.61.0/gcc-4.9.3-default' --with-dlib='/home/demo/opt/dlib/18.18' --without-wt --without-yices --without-pch --enable-rosehpct --with-gomp_omp_runtime_library=/usr/lib/gcc/x86_64-linux-gnu/4.9/ --without-haskell --enable-edg_version=4.12
>$ make core
>$ make install-core

---------- set.rose ----------

# create a file to set the rose environment

>$ cd ~/
>$ cat > set.rose
ROSE_INS=/home/demo/opt/rose_inst
PATH=$PATH:$ROSE_INS/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROSE_INS/lib
# Don't forget to export variables !!!
export PATH LD_LIBRARY_PATH

------- using ROSE ----------

>$ source ~/set.rose
>$ mkdir tests/
>$ cd tests/
>$ cat > sample.c
int foo()
{
	int a = 0;
	a += 1;
	return a;
}

>$ identityTranslator -c sample.c
>$ dotGenerator -c sample.c 

Download

Download the virtual machine image created by using VMware Player:

  • http://www.rosecompiler.org/Ubuntu-ROSE-Demo-V2.tar.gz
  • Warning: it is a huge file of 6.7 GB (18.2 GB if fully uncompressed). It may take ~1 hour to download, depending on the speed of your Internet connection.
  • Demonstration user account (sudo user in Ubuntu):
    • account: demo
    • password: password

Warning: LLNL users may not be able to download it due to limitations to max downloaded file size within LLNL. It may also be against LLNL's security policy to run a virtual machine without authorization. So this image should not be used inside LLNL.


On Windows, you can install 7-Zip (http://www.7-zip.org/) to extract the tarball (.tar.gz file) into a folder.

  • It may take ~20 minutes on a desktop PC to fully uncompress it in two steps (.tar.gz to .tar, then .tar to the folder)
  • The final folder size is around 18.2 GB

Content

demo@ubuntu:~$ cat readme

This is an Ubuntu 14.04 virtual machine with ROSE-edg4.x installed. 

Directory List
* ~/rose-edg4x.git  : github.com rose edg4.x-based, checked out on Jan 24, 2015

* ~/buildtree  : build tree of rose, using the following configure command: 

../rose-edg4x.git/configure --prefix=/home/demo/opt/rose_inst --with-boost=/home/demo/opt/boost_1.45.0_inst --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 --with-gomp_omp_runtime_library=/usr/lib/gcc/x86_64-linux-gnu/4.8/

* ~/opt/rose_inst : installation path of rose (--prefix value)

* ~/rose-project-template.git:  project templates using the installed rose as a library.

* ~/tests: a simple C file, processed by identityTranslator and dotGeneratorWholeASTGraph. 
  type run.sh filex.dot to view the dot file of the AST graph

bash env in .bashrc has the following variables by default
----------------------
export PATH=$PATH:/home/demo/opt/jdk1.8.0_25/bin:/home/demo/opt/zgrviewer-0.8.2
export LD_LIBRARY_PATH=/home/demo/opt/boost_1.45.0_inst/lib:/home/demo/opt/jdk1.8.0_25/jre/lib/amd64/server:$LD_LIBRARY_PATH
export JAVA_HOME=/home/demo/opt/jdk1.8.0_25/


To use the ROSE translators, you need to type:

source ~/set.rose

Installation Notes

At the time of writing, ROSE does not officially support Ubuntu 14.04 and its default gcc 4.8, mostly due to boost and other portability issues. Fortunately, the release process does generate gcc 4.8 EDG binaries.

There are a few tweaks needed to compile ROSE successfully on Ubuntu 14.04.

Install the prerequisites for ROSE:

  • sudo apt-get install gcc g++ gfortran
  • sudo apt-get install libtool flex bison automake
  • sudo apt-get install zlibc zlib1g zlib1g-dev libbz2-dev # mostly for the Boost iostreams library

Hack 1 to the system header path:

  • sudo ln -s /usr/include/x86_64-linux-gnu/sys /usr/include/sys
    • This is a hack because ROSE uses an absolute path to find some system headers, and Ubuntu 14.04 keeps them in a different path. A better fix should be available.

Boost 1.45 has to be patched to work with gcc 4.8, so that the required thread library can be installed.

Boost 1.45 Patch 1:

Error: /home/demo/development/install/gcc-4.4.7/boost-1.45.0/include/boost/thread/xtime.hpp:23: error: expected identifier before numeric constant

Workaround (it works for both the thread and wave libraries): in boost/thread/xtime.hpp, only undefine the C11 macro. The following 3 lines are new; they are followed by the existing troublesome enum in the file:

#ifdef TIME_UTC
#undef TIME_UTC
#endif

enum xtime_clock_types
{
    TIME_UTC=1
};
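The mechanism behind this patch can be reproduced in a standalone file. Newer C libraries (glibc 2.16 and later, for example) define TIME_UTC as a macro in <time.h> for C11's timespec_get(), so the enumerator of the same name is turned into `1=1` by the preprocessor, which produces the "expected identifier before numeric constant" error. The minimal C++ sketch below applies the same three-line fix; the namespace `boost_like` and the function `utc_clock_id()` are hypothetical stand-ins, not Boost code.

```cpp
#include <cassert>
#include <ctime>  // may define TIME_UTC as a macro (C11 timespec_get support)

// Same workaround as the Boost patch: drop the C library macro so the
// enumerator below is not replaced by a numeric constant.
#ifdef TIME_UTC
#undef TIME_UTC
#endif

namespace boost_like {  // hypothetical stand-in for boost/thread/xtime.hpp

enum xtime_clock_types
{
    TIME_UTC = 1
};

// Returns the enumerator value; this compiles only because the macro is gone.
inline int utc_clock_id() { return TIME_UTC; }

}  // namespace boost_like
```

A side effect of the #undef is that later code in the same translation unit can no longer pass the C library's TIME_UTC to timespec_get().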

Boost 1.45 Patch 2: force Boost to recognize that threading is actually supported by gcc 4.7 and later.

Patch for boost/config/stdlib/libstdcpp3.hpp (unified diff):

@@ -33,7 +33,8 @@
 
 #ifdef __GLIBCXX__ // gcc 3.4 and greater: 
 #  if defined(_GLIBCXX_HAVE_GTHR_DEFAULT) \ 
-        || defined(_GLIBCXX__PTHREADS) 
+        || defined(_GLIBCXX__PTHREADS) \ 
+        || defined(_GLIBCXX_HAS_GTHREADS) 
       // 
       // If the std lib has thread support turned on, then turn it on in Boost 
       // as well.  We do this because some gcc-3.4 std lib headers define _REENTANT 


Patch 3 to ROSE: bump up the gcc minor-version threshold in the Boost filesystem checks, so that gcc 4.8 also uses BOOST_FILESYSTEM_VERSION 2

demo@ubuntu:~/rose-edg4x.git$ git diff
diff --git a/src/util/support/FileHelper.h b/src/util/support/FileHelper.h
index d2ca5b6..142e509 100644
--- a/src/util/support/FileHelper.h
+++ b/src/util/support/FileHelper.h
@@ -5,7 +5,8 @@
 // Non-windows support should used boost filesystem 2 if using GNU version less than 4.7.
 #ifndef _MSC_VER
 // #if ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7))
-#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7)))
+// Liao, 1/24/2015. Not sure why GCC version is checked when we are taling about boost filesystem version. bumped up to 9 so gcc 4.8 can be supported  (<8 do
+#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 9)))
   #define BOOST_FILESYSTEM_VERSION 2
 #endif
 #else
@@ -56,8 +57,8 @@ public:
 
 // DQ (3/8/2014): Adding use of BACKEND_CXX_IS_INTEL_COMPILER to support Intel compiler for backend use.
 #ifndef _MSC_VER
-// #if (defined(_MSC_VER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7)))
-#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7)))
+// #if (defined(_MSC_VER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 9)))
+#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 9)))
  // DQ (2/10/2014): I think this is the older BOOST_FILESYSTEM_VERSION 2 specific code.
     static string getFileName(const string& aPath) {
         path boostPath(aPath);
@@ -132,7 +133,7 @@ public:
 #ifndef _MSC_VER
 // DQ (3/8/2014): Adding use of BACKEND_CXX_IS_INTEL_COMPILER to support Intel compiler for backend use.
 // #if ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7))
-#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7)))
+#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 9)))
          // DQ (2/10/2014): I think this is the older BOOST_FILESYSTEM_VERSION 2 specific code.
             relativePath += *toPathIterator; //The first path element comes without the leading path delimiter
 #else
@@ -147,7 +148,7 @@ public:
             while (toPathIterator != boostToPath.end()) {
 #ifndef _MSC_VER
 // #if (defined(_MSC_VER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7)))
-#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 7)))
+#if (defined(BACKEND_CXX_IS_INTEL_COMPILER) || ((BACKEND_CXX_COMPILER_MAJOR_VERSION_NUMBER == 4) && (BACKEND_CXX_COMPILER_MINOR_VERSION_NUMBER < 9)))
              // DQ (2/10/2014): I think this is the older BOOST_FILESYSTEM_VERSION 2 specific code.
                 relativePath += pathDelimiter + *toPathIterator;
 #else
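To make the effect of Patch 3 concrete, the condition it edits can be restated as an ordinary function. This is a hypothetical helper for illustration only (FileHelper.h evaluates the condition in the preprocessor); `minorThreshold` is 7 before the patch and 9 after it.

```cpp
#include <cassert>

// Restates the patched preprocessor test from src/util/support/FileHelper.h:
//   defined(BACKEND_CXX_IS_INTEL_COMPILER) ||
//   (MAJOR == 4 && MINOR < threshold)
// When it is true, FileHelper.h defines BOOST_FILESYSTEM_VERSION 2.
bool forcesBoostFilesystemV2(bool isIntelBackend, int gccMajor, int gccMinor,
                             int minorThreshold)
{
    return isIntelBackend || (gccMajor == 4 && gccMinor < minorThreshold);
}
```

With the old threshold, gcc 4.8 fails the test, so BOOST_FILESYSTEM_VERSION 2 is not forced; with the new threshold it is. A compiler whose major version is not 4 never satisfies the gcc part of the test.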

For the old VM using ROSE based on EDG 3.x, see ROSE_Compiler_Framework/Virtual_Machine_Image_V1

ROSE tools

Overview

ROSE is a compiler framework for building customized compiler-based tools. A set of example tools is provided as part of the ROSE release to demonstrate the use of ROSE. Some of them are also useful in the daily work of ROSE developers.

We list and briefly explain some tools built using ROSE. They are installed under ROSE_INSTALLATION_TREE/bin.

  • Some tools are not installed by default. You may have to cd into the tool's subdirectory and type "make install".

Any ROSE tool (translator) works like a compiler: you have to provide all the compilation flags it needs to work properly, such as include paths and macro definitions.

  • One way to use ROSE tools is to replace your default compiler (gcc, for example) with the ROSE tool command (identityTranslator, outline, etc.) in your Makefile, so the tool is called with all the correct flags when you type "make".
  • Alternatively, if you are only interested in processing a single file, you can watch the full compilation command line (e.g. gcc .... -c ) used to compile that file during a normal "make", and then replace the compiler (gcc) in that command line with the ROSE tool you are interested in (e.g. outline).

Prerequisites

You have to install ROSE first, by running configure, make, make install, etc.

You also have to set the environment variables properly before you can call ROSE tools from command line.

For example: if the installation path (or --prefix path in configure) is /home/opt/rose/install, you can have the following script to set the environment variables using bash:

ROSE_INS=/home/opt/rose/install
export ROSE_INS 

PATH=$ROSE_INS/bin:$PATH
export PATH

LD_LIBRARY_PATH=$ROSE_INS/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

All ROSE tools are installed under the path specified by the --prefix option.

identityTranslator

Source: http://www.rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf (chapter 2)

This is the simplest tool built using ROSE. It takes input source files, builds an AST, and then unparses the AST back into compilable source code. It tries its best to preserve everything from the input file.

Uses

Typical use cases

  • without any options, test if ROSE can compile your code: replace the compiler used by your Makefile with identityTranslator
  • turn on some built-in analysis, translation or optimization phases, such as -rose:openmp:lowering to support OpenMP
    • type "identityTranslator --help" to see all options
  • debug a ROSE-based translator: the first step is often to run identityTranslator on the same input, to rule out that the problem is simply ROSE failing to compile the code
  • use the source of the identityTranslator as a starting point to add custom analysis and transformation. The code in the identityTranslator is the minimum code required for almost all kinds of ROSE-based tools.

Source code

identityTranslator.C

#include "rose.h"
int main(int argc, char *argv[]){
	// Build the AST used by ROSE
	SgProject *project = frontend(argc, argv);

	// Run internal consistency tests on AST
	AstTests::runAllTests(project);

	// Insert your own manipulation of the AST here...

	// Generate source code from AST and call the vendor's compiler
	return backend(project);
}

Plugin

Starting from Version 0.9.9.83, ROSE has a new feature to support external plugins. It borrows the design and implementation of Clang Plugins. The interface is very similar to what Clang has, with some simplification and improvements.

With this feature, you can develop your ROSE-based tools as dynamically loadable plugins. Then you can use command line options of ROSE's default translator, identityTranslator (or another ROSE translator), to

  • load shared libraries containing the plugins,
  • specify actions to be executed,
  • as well as pass command line options to each action.

See more at:

Limitations

Due to limitations of the frontends used by ROSE and some internal processing, identityTranslator cannot generate 100% identical output compared to the input file.

Some notable changes it may introduce include:

  • "int a, b, c;" is transformed into three separate SgVariableDeclaration statements,
  • macros are expanded,
  • extra parentheses are added around constants of typedef types (e.g. c=Typedef_Example(12); is translated in the output to c = Typedef_Example((12));),
  • NULL is converted to 0.
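As a rough illustration of these changes, the pair of functions below contrasts a small input with a hand-written approximation of the style of code identityTranslator may emit for it. It is not actual tool output; the point is that the unparsed code differs textually but should compute the same result. SCALE and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>  // NULL

#define SCALE 2
typedef int Typedef_Example;

// Hypothetical input, written the way a user might.
int beforeTranslation()
{
    int a, b, c;
    const char* name = NULL;
    a = SCALE;
    b = 1;
    c = Typedef_Example(12);
    return a + b + c + (name == NULL);
}

// Hand-written approximation of the output style described above:
// one declaration per variable, the macro expanded, extra parentheses
// around the typedef-cast constant, and NULL written as 0.
int afterTranslation()
{
    int a;
    int b;
    int c;
    const char* name = 0;
    a = 2;
    b = 1;
    c = Typedef_Example((12));
    return a + b + c + (name == 0);
}
```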

AST dot graph generators

Tools to generate AST graphs in dot format. There are two versions:

  • dotGenerator: simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: whole AST graph showing more details. It provides filter options to show/hide certain AST information.

command line:

 dotGeneratorWholeASTGraph yourcode.c  # avoid including any headers in your sample code, so the tree stays small enough to visualize!
./dotGeneratorWholeASTGraph -rose:help | more
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter              [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                    [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter     [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter        [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter        [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                    [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter               [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                       [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter                 [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                   [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter      [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                     [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter           [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:emptyFunctionParameterListFilter [0|1]  Disable or enable emptyFunctionParameterList filter
   -rose:dotgraph:emptyBasicBlockFilter            [0|1]  Disable or enable emptyBasicBlock filter
   -rose:dotgraph:typeFilter                       [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter        [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter         [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                         [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 

More information about how to use the tools can be found at How_to_visualize_AST
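If the dot format is unfamiliar, the toy generator below shows the basic shape of such a file: one node statement per AST node and one edge per parent-child link. It is a hypothetical sketch using a made-up ToyNode type rather than ROSE's Sg* classes, and the real generators emit far more attributes (colors, addresses, file info) than this.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy AST node: a label plus child indices into the node table.
// This is NOT ROSE's Sg* class hierarchy; it only shows the dot shape.
struct ToyNode {
    std::string label;
    std::vector<int> children;
};

// Emit one dot node per AST node and one edge per parent->child link.
std::string toDot(const std::vector<ToyNode>& nodes)
{
    std::ostringstream out;
    out << "digraph AST {\n";
    for (std::size_t i = 0; i < nodes.size(); ++i)
        out << "  n" << i << " [label=\"" << nodes[i].label << "\"];\n";
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (int child : nodes[i].children)
            out << "  n" << i << " -> n" << child << ";\n";
    out << "}\n";
    return out.str();
}
```

Save the returned string as a .dot file and open it with ZGRViewer or Graphviz's dot. Labels containing double quotes would need escaping, which this sketch does not do.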

AST Outliner

Basic concept: outlining is the process of replacing a block of consecutive statements with a call to a new function containing those statements. Conceptually, outlining is the inverse of inlining.

ROSE provides a built-in translator called the AST outliner, which can outline a specified portion of code and generate a function from it.

  • Official documentation for the AST outliner is in Chapter 37, "Using the AST Outliner", of the ROSE Tutorial PDF.
  • Supplemental information can be found here at ROSE Compiler Framework/outliner
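The concept can be illustrated with a hand-written before/after pair. In this sketch the outlined function receives the variable the block writes (sum) through a pointer, so the caller sees the update. The function name OUT_1_sumSquares is hypothetical, and the exact naming and parameter-passing scheme the ROSE outliner uses is documented in the tutorial chapter mentioned above and may differ.

```cpp
#include <cassert>

// Before outlining: the loop is the block of statements to extract.
int sumSquaresOriginal(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; ++i)
        sum += i * i;
    return sum;
}

// After outlining (hand-written): the block now lives in its own function.
// 'n' is only read, so it is passed by value; 'sum' is written, so it is
// passed by pointer.
static void OUT_1_sumSquares(int n, int* sum)
{
    for (int i = 1; i <= n; ++i)
        *sum += i * i;
}

int sumSquaresOutlined(int n)
{
    int sum = 0;
    OUT_1_sumSquares(n, &sum);  // the original block replaced by a call
    return sum;
}
```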

KeepGoingTranslator

A ROSE-based tool may encounter issues when processing large-scale applications. Users may want the tool to keep running until processing is finished, and then check the final results to see how many source files were processed successfully. This is similar to GNU make's -k option, with which make tries to compile every file it can and reports as many compilation errors as possible.

ROSE has built-in -rose:keep_going support. If this feature is turned on and an error occurs, ROSE simply runs your backend compiler on the original source file, as is, without modification.

To further simplify the use of -rose:keep_going, we provide the Rose::KeepGoing namespace, which internally

  • uses -rose:keep_going and keeps track of successfully processed or failed files, and
  • saves such information into log files.

An example translator named KeepGoingTranslator is created to demonstrate the use of this namespace.

In summary, there are three levels of keep-going support:

  • GNU make's -k option as part of the build system
  • ROSE's -rose:keep_going option, independent from the build system.
  • Rose::KeepGoing namespace with additional logging support

To use the built-in support for logging messages for each file:


// Specify where to store success and error logs
  Rose::KeepGoing::report_filename__fail = boost::filesystem::path(getenv("HOME")).native()+"/mytool-failed_files.txt";
  Rose::KeepGoing::report_filename__pass = boost::filesystem::path(getenv("HOME")).native()+"/mytool-passed_files.txt";


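// In your tool's traversal, add log messages.
// (forloop is the AST node being visited and oss a std::ostringstream holding
// the message; both are assumed to be defined elsewhere in your tool.)
    SgSourceFile* file = getEnclosingSourceFile(forloop);
    string s(":");
    string entry= forloop->get_file_info()->get_filename()+s+oss.str(); // add the full filename to each log entry
    Rose::KeepGoing::File2StringMap[file]+= entry;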


// In your main(), handle errors

int
main ( int argc, char* argv[])
{
 ...
  vector<string> argvList(argv, argv+argc);
  argvList = commandline_processing (argvList);


   if (CommandlineProcessing::isOption (argvList,"-E","",false))
   {
     preprocessingOnly = true;
     // we should not put debugging info here. Otherwise polluting the generated preprocessed file!!
   }

  SgProject* project = frontend(argvList);
  ROSE_ASSERT (project != NULL);


  // register midend signal handling function
  if (KEEP_GOING_CAUGHT_MIDEND_SIGNAL)
  {
    std::cout
      << "[WARN] "
      << "Configured to keep going after catching a "
      << "signal in myTool"
      << std::endl;
    Rose::KeepGoing::setMidendErrorCode (project, 100);
    goto label_end;
  }
  else
  {
     // Your own traversal for analysis or transformation
    RoseVisitor visitor;

    SgFilePtrList file_ptr_list = project->get_fileList();
    for (size_t i = 0; i<file_ptr_list.size(); i++)
    {
      SgFile* cur_file = file_ptr_list[i];
      SgSourceFile* s_file = isSgSourceFile(cur_file);
      if (s_file != NULL)
      {
        visitor.traverseWithinFile(s_file, preorder);
      }
    }

  }

label_end:


// Write log files
 
  int status = backend(project);
 // important: MUST call backend() first, then generate reports.
 // otherwise, backend errors will not be caught by keep-going feature!!

// We want the reports to be generated with or without the keep_going option
// if (MyTool::keep_going)
  {
    std::vector<std::string> orig_rose_cmdline(argv, argv+argc);
    Rose::KeepGoing::generate_reports (project, orig_rose_cmdline);
  }

  return status;
}
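The bookkeeping idea behind keep-going support (process everything, record which files passed or failed, report at the end) can be shown independently of ROSE. The sketch below is a simplified analogy using C++ exceptions; it is not how Rose::KeepGoing is implemented (judging by KEEP_GOING_CAUGHT_MIDEND_SIGNAL above, ROSE recovers from signals), and KeepGoingReport, processAll, and writeLog are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Pass/fail lists, analogous to the two report files configured above.
struct KeepGoingReport {
    std::vector<std::string> passed;
    std::vector<std::string> failed;  // "file: reason"
};

// Run 'tool' on every file; a failure is recorded instead of stopping the run.
KeepGoingReport processAll(const std::vector<std::string>& files,
                           const std::function<void(const std::string&)>& tool)
{
    KeepGoingReport report;
    for (const std::string& f : files) {
        try {
            tool(f);
            report.passed.push_back(f);
        } catch (const std::exception& e) {
            report.failed.push_back(f + ": " + e.what());
        }
    }
    return report;
}

// Write one entry per line.
void writeLog(const std::string& path, const std::vector<std::string>& entries)
{
    std::ofstream out(path);
    for (const std::string& e : entries)
        out << e << '\n';
}
```

As in the ROSE code above, the report is produced only after every file has been attempted, so one bad file does not hide the status of the others.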

Inliner

See more at ROSE_Compiler_Framework/Inliner

The ROSE Inliner inlines functions at function callsites.

Official documentation about the Inliner is

Source code

Test directory with an example translator and test input files

According to Makefile.am, the example translator's source code builds an executable named "inlineEverything" in your build tree.

You can use this tool to try inlining your sample code.

The same Makefile.am's make check rules contain sample command lines to use the tool.
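For readers new to inlining, here is a hand-written illustration of what inlining a callee at its call sites means. The names are hypothetical and the code the ROSE Inliner generates will look different. Binding each argument to a fresh local preserves the rule that an argument is evaluated exactly once, even if the callee uses its parameter more than once.

```cpp
#include <cassert>

// Callee that an inliner could expand at its call sites.
static int square(int x) { return x * x; }

// Before inlining: two call sites.
int distance2Original(int dx, int dy) { return square(dx) + square(dy); }

// After inlining (hand-written): each call replaced by the callee's body,
// with the argument bound to a fresh local variable.
int distance2Inlined(int dx, int dy)
{
    int x1 = dx;
    int r1 = x1 * x1;
    int x2 = dy;
    int r2 = x2 * x2;
    return r1 + r2;
}
```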

Call graph generator

Source code of this tool

Command line:

 buildCallGraph -c yourprogram.cpp

Control flow graph generator

Command line:

 virtualCFG -c yourprogram.c

autoPar

autoPar is a tool that automatically inserts OpenMP pragmas into serial C/C++ input code.

This tool is an implementation of automatic parallelization using OpenMP. It is used to explore semantics-aware automatic parallelization, as described in our papers listed under Publications.

  • The source files are currently located in rose/projects/autoParallelization.
  • A standalone executable program (named autoPar) is generated and installed into the ROSE installation tree (under ROSE_INS/bin).
  • You can test the tool in rose_build_tree/projects/autoParallelization by typing "make check".
  • Section 12.7, "Automatic Parallelization", of the ROSE manual PDF describes the tool.
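As a minimal sketch of the kind of transformation autoPar automates (assuming a simple array loop as input), the first function below is the serial input, and the second shows an OpenMP annotation added after the loop is found to have no loop-carried dependences. This is hand-written; actual autoPar output may add clauses such as private(...) and will differ in formatting.

```cpp
#include <cassert>
#include <vector>

// Serial input: each iteration writes its own element and reads nothing
// written by other iterations, so the iterations are independent.
void scaleSerial(std::vector<double>& v, double f)
{
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        v[i] *= f;
}

// The kind of annotation an auto-parallelizer can add once it has proven
// the loop free of loop-carried dependences.
void scaleParallel(std::vector<double>& v, double f)
{
    const long n = static_cast<long>(v.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i)
        v[i] *= f;
}
```

A loop like v[i] = v[i - 1] + 1 carries a dependence from one iteration to the next, so a semantics-preserving parallelizer must leave it serial. Compile with -fopenmp to actually run the annotated loop in parallel; without it the pragma is ignored.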

Publications

  • Chunhua Liao, Daniel J. Quinlan, Jeremiah J. Willcock and Thomas Panas, Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions, International Journal of Parallel Programming, Volume 38, Numbers 5-6, pages 361-378, August 23, 2010. LLNL-JRNL-421803
  • Chunhua Liao, Daniel J. Quinlan, Jeremiah J. Willcock and Thomas Panas, Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore, In Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism (Dresden, Germany, June 3-5, 2009).

More info is at autoPar

Abstract handles

Abstract handles are used in autotuning and in other tools that pass references to source code as part of an interface.

  • Essentially, they define ways to create a string that uniquely identifies a language construct in your source code.
  • Any tool can then locate the corresponding node in the AST and perform the targeted analysis or transformation for you.
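A small sketch of how such a string can be assembled, modeled on the handle 'Statement<position,5>' used in the loopUnrolling example below: each level names a construct type followed by a <specifier type, specifier value> pair. The HandleItem type, formatHandle(), and the "::" separator between nesting levels are assumptions of this sketch; consult the abstract handle documentation for the real grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One level of a handle: a construct type plus a specifier (type, value) pair.
struct HandleItem {
    std::string construct;  // e.g. "Statement", "ForStatement"
    std::string specifier;  // e.g. "position", "name"
    std::string value;      // e.g. "5", "foo"
};

// Join levels from outermost to innermost. The "::" separator is an
// assumption of this sketch, not a confirmed part of the ROSE format.
std::string formatHandle(const std::vector<HandleItem>& path)
{
    std::string out;
    for (std::size_t i = 0; i < path.size(); ++i) {
        if (i > 0)
            out += "::";
        out += path[i].construct + "<" + path[i].specifier + "," + path[i].value + ">";
    }
    return out;
}
```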

Key info


See more at

Loop translators

A list of individual loop translators that perform transformations only, without checking whether they are legal. Callers must make sure the transformations are semantically correct for their input code.

Example command lines to run these loop translators can be found in Makefile.am:

  • e.g.: loopUnrolling -rose:loopunroll:abstract_handle 'Statement<position,5>' -rose:loopunroll:factor 3
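To show what -rose:loopunroll:factor 3 asks for, here is a hand-written unrolling of a simple summation loop by a factor of 3, including the remainder loop that handles trip counts not divisible by 3. This is an illustration under that assumption, not the code loopUnrolling generates.

```cpp
#include <cassert>

// Original loop.
long sumOriginal(const int* a, int n)
{
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Unrolled by a factor of 3: the main loop does three iterations per trip,
// and the remainder loop handles the last n % 3 iterations.
long sumUnrolled3(const int* a, int n)
{
    long s = 0;
    int i = 0;
    for (; i + 2 < n; i += 3) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
    }
    for (; i < n; ++i)  // remainder (cleanup) loop
        s += a[i];
    return s;
}
```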


There is also an integrated loopProcessor, which relies on sophisticated analyses to drive a range of loop optimizations.

Declaration move tool

This tool moves variable declarations into the innermost scopes in which they are used.

For each declaration, it finds the innermost scope the declaration can be moved into without breaking the code's original semantics.

  • For a single use place, the declaration is moved into the innermost scope.
  • For multiple uses, the declaration may need to be duplicated and moved into two scopes if there is no variable reuse in between; otherwise, it is moved into the innermost common scope of the uses.
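The following hand-written before/after pair sketches the intended effect for a single use place: `excess` is declared at function scope but used only inside the if-block, so it can move there, and with -rose:merge_decl_assign the immediately following assignment becomes its initializer. The function names are hypothetical, and the tool's actual output may differ.

```cpp
#include <cassert>

// Before: 'excess' is declared at function scope...
int clampBefore(int x, int limit)
{
    int excess;
    int result = x;
    if (x > limit) {
        excess = x - limit;  // ...but is only used inside this block
        result = x - excess;
    }
    return result;
}

// After: the declaration sits in the innermost scope containing its uses,
// merged with the assignment that immediately followed it.
int clampAfter(int x, int limit)
{
    int result = x;
    if (x > limit) {
        int excess = x - limit;
        result = x - excess;
    }
    return result;
}
```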


User instructions: The translator accepts the following options:

  • -rose:merge_decl_assign merges the moved declaration with an immediately following assignment.
  • -rose:aggressive: turn on the aggressive mode, which moves declarations with initializers, and across loop boundaries. A warning message is printed if a move crosses a loop boundary. Without this option, the tool only moves declarations without initializers, to be safe.
  • -rose:debug, which is turned on by default in the tests. Some dot graph files are generated for the scope trees of variables, for debugging purposes.
  • -rose:keep_going ignores assertions as much as possible (currently it only skips the assertion on a complex for-loop initialization statement list). Without this option, the tool stops on assertion failures.
  • -rose:identity turns off all transformations so the tool acts like an identity translator. Useful for debugging purposes.
  • -rose:trans-tracking turns on the transformation tracking mode, showing the source statements of a moved/merged declaration.

source code: https://github.com/rose-compiler/rose-develop/blob/master/tests/roseTests/astInterfaceTests/moveDeclarationToInnermostScope.C

  • Note: recently moved to tests/nonsmoke/functional/roseTests/astInterfaceTests

tests: make move_diff_check defined in https://github.com/rose-compiler/rose-develop/blob/master/tests/roseTests/astInterfaceTests/Makefile.am


See more at Declaration move tool

Arithmetic intensity measuring tool

Arithmetic intensity measuring tool: measures the arithmetic intensity (floating-point operations per byte of memory traffic, i.e. FLOPS/Memory) of loops.
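Arithmetic intensity is commonly computed as floating-point operations divided by bytes of memory traffic. The sketch below works the ratio out by hand for one iteration of a daxpy-style loop on doubles, under a simple model that ignores caches and assumes every operand is loaded from or stored to memory once. How the ROSE tool counts operations and bytes is not shown here and may differ; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Arithmetic intensity = floating-point operations / bytes moved.
double arithmeticIntensity(double flops, double bytes)
{
    return flops / bytes;
}

// Hand count for one iteration of y[i] = a * x[i] + y[i]:
// 1 multiply + 1 add = 2 flops; load x[i], load y[i], store y[i] = 3 doubles.
double daxpyIntensityPerIteration()
{
    const double flops = 2.0;
    const double bytes = 3.0 * sizeof(double);
    return arithmeticIntensity(flops, bytes);
}

// The loop being modeled.
void daxpy(std::size_t n, double a, const double* x, double* y)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Under this model the loop performs 2 flops per 24 bytes (assuming 8-byte doubles), an intensity of about 0.083 flops/byte, which makes it memory-bound on typical hardware.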

MPI code generator

A set of code generation functions used to support translating high level domain specific languages into MPI code, with some runtime support.

Compilation Database

A utility program to convert a compilation database JSON file into a single makefile.

The motivation is to avoid hacking into users' build systems in order to invoke a ROSE-based tool.

With the single makefile, we can freely replace the compiler name and options without understanding the complex build systems.

Details at: https://github.com/rose-compiler/rose/tree/develop/utilities/compilationDatabase2Makefile


Steps to use it

  • generate compilation_database.json file
  • convert it to a makefile
  • remove all object files from your build tree first!
    • make -f your-makefile clean
  • make -k -f your-makefile all # -k means keep going if something goes wrong
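The core of the conversion step can be sketched as follows, assuming the JSON entries have already been parsed into structs: each entry becomes a make target whose recipe changes into the original directory and runs the recorded command with the compiler replaced by a variable. CompileEntry, toMakefile(), and the ROSE_TOOL variable are hypothetical; the actual compilationDatabase2Makefile utility may produce a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One entry of the compilation database, assumed already parsed
// (JSON parsing is omitted to keep the sketch short).
struct CompileEntry {
    std::string directory;  // working directory of the compile command
    std::string command;    // full command line, compiler first
    std::string file;       // the source file being compiled
};

// Replace the first word of the command (the compiler) with $(ROSE_TOOL).
// Naive split on the first space: quoted or space-containing paths break it.
std::string withToolVariable(const std::string& command)
{
    const std::string::size_type space = command.find(' ');
    if (space == std::string::npos)
        return "$(ROSE_TOOL)";
    return "$(ROSE_TOOL)" + command.substr(space);
}

// Emit one phony target per entry plus an "all" target depending on them.
std::string toMakefile(const std::vector<CompileEntry>& entries)
{
    std::string targets;
    for (std::size_t i = 0; i < entries.size(); ++i)
        targets += " compile_" + std::to_string(i);

    std::ostringstream mk;
    mk << "ROSE_TOOL ?= gcc\n";  // can be overridden on the make command line
    mk << ".PHONY: all" << targets << "\n";
    mk << "all:" << targets << "\n";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        mk << "# " << entries[i].file << "\n"
           << "compile_" << i << ":\n"
           << "\tcd " << entries[i].directory << " && "
           << withToolVariable(entries[i].command) << "\n";
    }
    return mk.str();
}
```

The generated file could then be run as make -k -f generated.mk ROSE_TOOL=identityTranslator all. Entries that use the JSON "arguments" array instead of a "command" string would need extra handling.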

TODO

Refactor the tool translators

Refactor the tools into a dedicated rose/tools directory, so they are always built and available by default, with minimal dependencies on other things, such as which languages are turned on or off (when applicable, of course).

Our current idea is to separate translators used as examples or tutorials from translators used to create end-user tools.

  • Tutorial translators should NOT be installed as tools by default. Their purpose is to be included in the Manual or Tutorial PDF files to illustrate something to developers by example. Examples should be concise and to the point.
  • On the other hand, translators used to build end-user tools should meet a much higher standard and accept command-line options for different, even advanced, features. These translators can be very sophisticated, since they don't have the page limitation that tutorial examples do.

Supported Programming Languages

Overview

ROSE supports a wide range of mainstream programming languages, with different degrees of maturity. The list of supported languages includes:

  • C and C++: based on the EDG C++ frontend
    • An ongoing effort is to upgrade the EDG frontend to its recent 4.4 version.
    • Another ongoing effort is to use clang as an alternative, open-source C/C++ frontend
  • Fortran 77/95/2003: based on the Open Fortran Parser
    • limitations: variables with embedded spaces are not supported.
  • OpenMP 3.0: based on ROSE's own parsing and translation support for both C/C++ and Fortran OpenMP programs.
  • UPC 1.1: this is also based on the EDG 3.3 frontend

Fortran

Regression tests: tests/nonsmoke/functional/CompileTests/Fortran_tests

  • To trigger an individual test:
    • make test2020_comment_1.f90.passed

Check the detailed command line:

cat test2020_comment_1.f90.passed
WARNING: Command line option -rose:Fortran90 is deprecated! Use -std=f90 instead.
======== CUT ========
+ ../../testTranslator -rose:verbose 0 -rose:detect_dangling_pointers 2 -I../../../../../../sourcetree/tests/nonsmoke/functional/CompileTests/Fortran_tests -rose:f90 -c ../../../../../../sourcetree/tests/nonsmoke/functional/CompileTests/Fortran_tests/test2020_comment_1.f90
ELAPSED_TIME 2
======== CUT ========

OpenMP

See more at OpenMP Support

ROSE supports OpenMP 3.0 for C/C++ (and limited Fortran support).

Configuration: always try to use --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/ (adjusted to the location of your GCC installation) when configuring ROSE, so that the generated ROSE translators can automatically link with libgomp.a to generate executables for you. This also allows the execution tests of the OpenMP lowering to run and catch errors. Without this option, only the compile-level tests will run.

Experimental OpenMP Accelerator Model Implementation


Testing

  • There are about 70 built-in execution tests (many of them self-verifying) in ROSE.

Some benchmarks are used to test OpenMP support in ROSE on Jenkins (our regression test server):

  • a22b-NPB-2.3-C-parallel: all 8 benchmarks pass
  • a21-SPEC-OMP-64bit-parallel: 3 benchmarks pass.
  • LULESH OpenMP version: download

For the built-in tests:

You have to configure the path to GOMP if you want them to be executed automatically when "make check" is typed, e.g. ../sourcetree ... --with-gomp_omp_runtime_library=/usr/apps/gcc/4.4.1/lib64/

UPC 1.1.1: this is based on the EDG 3.3 frontend

  • The supported version is limited by the EDG 3.3 frontend, which only supports UPC 1.1.1 (the __UPC_VERSION__ string is defined as 200310L). ROSE currently uses EDG 3.3, which originally supported only UPC 1.0. We merged the UPC 1.1.1 support from EDG 3.10 into our EDG 3.3 frontend. We have also done the work required to support UPC 1.2.

Documentation:

Tests: make check rule under

  • rose/tests/CompileTests/UPC_tests

An example UPC-to-C translator: roseupcc

  • Not full featured. It is only intended to serve as a starting point for anyone who is interested in (or funded for) implementing UPC in ROSE.
  • roseupcc is located in ROSE/projects/UpcTranslation
  • Documented in Section 13.5, "An Example UPC-to-C Translator Using ROSE", of the ROSE manual

MPI is mostly a library-based programming paradigm. In many cases, you can simply compile MPI applications normally using a ROSE-based translator, as long as you pass the include path of mpi.h on the command line.


However, ROSE provides some additional support for MPI:

1) If ROSE is configured with --with-mpi=/mpi/install/location, configure defines the following variables for use in Makefile.am (example values shown):

ROSE_WITH_MPI_CFLAGS='-I/usr/sci/scratch/sriram/local/include'
ROSE_WITH_MPI_CLDFLAGS='-L/usr/sci/scratch/sriram/local/lib  -lmpich -lopa  -lmpl  -lrt  -lpthread'
ROSE_WITH_MPI_CXXFLAGS='-I/usr/sci/scratch/sriram/local/include'
ROSE_WITH_MPI_CXXLDFLAGS='-L/usr/sci/scratch/sriram/local/lib  -lmpichcxx -lmpich  -lopa  -lmpl  -lrt  -lpthread'
ROSE_WITH_MPI_C_FALSE='#'
ROSE_WITH_MPI_C_TRUE=''

These flags are useful for compiling ROSE as an MPI program. You can also use CFLAGS to pass the location of mpi.h. These flags are set for any version of MPI, as long as a valid install location is passed to --with-mpi.

2) There are a few MPI-specific analysis projects (MPI_Tools, extractMPISkeleton). Some of the dataflow analyses we are working on are i) slicing for MPI communication and ii) MPI communication invariant analysis (to prove that communication is constant in loops). We currently plan to use these analyses together with other standard dataflow analyses for:

  • Parallel control-flow graph analysis: build the communication graph of an MPI program that has no input-dependent control flow or communication
  • Transformation to cDAG: cDAG is a runtime communication optimizer tool for MPI by Torsten Hoefler. We intend to replace MPI calls with cDAG calls to optimize communication.

CUDA

ROSE has an experimental connection to EDG 4.0, which helps us support CUDA.

To enable parsing CUDA codes, please use the following configuration options:

 --enable-edg-version=4.0 --enable-cuda --enable-edg-cuda

Chapter 16 of the ROSE User Manual has more details about this.


More details from Tristan on Sept. 24, 2012:

  • The "--enable-cuda" option enables the CUDA IR in ROSE (IR, preinclude, ...)
  • The "--enable-edg-cuda" option applies only to EDG: it activates the CUDA support in EDG 4.x (actually I need to patch EDG 4.4)
  • When "--enable-edg-cuda" is present, we also need "--enable-edg-version=4.x" (x = {0, 3})
  • "--enable-cuda" is relevant for --enable-only-cuda, as the "-edg-" options target only EDG (which is usually distributed as a binary).

OpenCL

Chapter 16, "CUDA and OpenCL", of the ROSE manual (PDF) has a section discussing this support.

Parser Building Blocks

Quick information:

FailSafe Assertion Language

FailSafe Assertion Language: an experimental source code annotation language to support resilient computing.

Abstract Syntax Tree (Intermediate Representation)

The main intermediate representation of ROSE is its abstract syntax tree (AST). To use a programming language, you have to become familiar with its syntax, semantics, etc. Likewise, to use ROSE, you have to become familiar with its internal representation of input code.

The best way to learn the AST is to visualize it for the simplest possible code samples.

Visualization of AST

Overview

Three things are needed to visualize ROSE AST:

  • Sample input code: you provide it
  • a dot graph generator to generate a dot file from AST: ROSE provides dot graph generators
  • a visualization tool to open the dot graph: ZGRViewer and Graphviz are used by ROSE developers

If you don't want to install ROSE, ZGRViewer, and Graphviz from scratch, you can directly use the ROSE virtual machine image, which has everything installed and configured so that you can just visualize your sample code.

Sample input code

Prepare the simplest possible input code, without including any headers, so that you get an AST small enough to digest.

Dot Graph Generator

We provide two tools in ROSE_INSTALLATION_TREE/bin to generate a dot graph of the AST of input code:

  • dotGenerator: a simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: a whole AST graph showing more details. It provides filter options to show/hide certain AST information.

Command line (it is best to avoid including any header in your input code, so that the tree is small enough to visualize):

 dotGeneratorWholeASTGraph  yourcode.c

To skip built-in functions:

  • dotGeneratorWholeASTGraph -DSKIP_ROSE_BUILTIN_DECLARATIONS yourcode.c

To list the available filter options and their current values:

dotGeneratorWholeASTGraph -rose:help
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 
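To make the idea of a dot graph generator concrete, here is a minimal, standalone C++ sketch of the kind of text such a tool emits. It is not ROSE's implementation: ToyNode is a hypothetical stand-in for SgNode, and the sketch only prints one labeled vertex per node and one edge per non-null child, using node addresses as vertex IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy AST node: a class name (like "SgAddOp") plus child pointers.
// This is an illustrative stand-in, not ROSE's SgNode.
struct ToyNode {
    std::string className;
    std::vector<ToyNode*> children;
};

// Emit a vertex for 'n', then one edge and a recursive visit per non-null child.
// Vertices are identified by their memory address, similar to the "@0x..."
// labels in ROSE's text dumps.
static void emitNode(const ToyNode* n, std::ostringstream& out) {
    out << "  \"" << n << "\" [label=\"" << n->className << "\"];\n";
    for (const ToyNode* c : n->children) {
        if (c == nullptr) continue;  // null children are not drawn
        out << "  \"" << n << "\" -> \"" << c << "\";\n";
        emitNode(c, out);
    }
}

// Produce a complete DOT graph for the subtree rooted at 'root'.
std::string toDot(const ToyNode* root) {
    std::ostringstream out;
    out << "digraph AST {\n";
    emitNode(root, out);
    out << "}\n";
    return out.str();
}
```

The resulting text can be saved to a file (e.g. toy.dot) and rendered with Graphviz (dot -Tpng toy.dot -o toy.png) or opened in ZGRViewer.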

Dot Graph Visualization

To visualize the generated dot graph, you have to install ZGRViewer and Graphviz.

Please note that you have to configure ZGRViewer with the correct paths to some commands it uses. You can do this from its configuration/settings menu item, or by directly modifying its text configuration file (.zgrviewer).

An example configuration is shown below (cat .zgrviewer):

<?xml version="1.0" encoding="UTF-8"?>
<zgrv:config xmlns:zgrv="http://zvtm.sourceforge.net/zgrviewer">
    <zgrv:directories>
        <zgrv:tmpDir value="true">/tmp</zgrv:tmpDir>
        <zgrv:graphDir>/home/liao6/svnrepos</zgrv:graphDir>
        <zgrv:dot>/home/liao6/opt/graphviz-2.18/bin/dot</zgrv:dot>
        <zgrv:neato>/home/liao6/opt/graphviz-2.18/bin/neato</zgrv:neato>
        <zgrv:circo>/home/liao6/opt/graphviz-2.18/bin/circo</zgrv:circo>
        <zgrv:twopi>/home/liao6/opt/graphviz-2.18/bin/twopi</zgrv:twopi>
        <zgrv:graphvizFontDir>/home/liao6/opt/graphviz-2.18/bin</zgrv:graphvizFontDir>
    </zgrv:directories>
    <zgrv:webBrowser autoDetect="true" options="" path=""/>
    <zgrv:proxy enable="false" host="" port="80"/>
    <zgrv:preferences antialiasing="false" cmdL_options=""
        highlightColor="-65536" magFactor="2.0" saveWindowLayout="false"
        sdZoom="false" sdZoomFactor="2" silent="true"/>
    <zgrv:plugins/>
    <zgrv:commandLines/>
</zgrv:config>

You also have to set the correct path in the run.sh script:

cat run.sh

#!/bin/sh

# If you want to be able to run ZGRViewer from any directory,
# set ZGRV_HOME to the absolute path of ZGRViewer's main directory
# e.g. ZGRV_HOME=/usr/local/zgrviewer

ZGRV_HOME=/home/liao6/opt/zgrviewer-0.8.1

java -jar $ZGRV_HOME/target/zgrviewer-0.8.1.jar "$@"

Example session

A complete example

# make sure the environment variables(PATH, LD_LIBRARY_PATH) for the installed rose are correctly set
which dotGeneratorWholeASTGraph
~/workspace/masterClean/build64/install/bin/dotGeneratorWholeASTGraph

# run the dot graph generator
dotGeneratorWholeASTGraph -c ttt.c

#see it
which run.sh
~/64home/opt/zgrviewer-0.8.2/run.sh

run.sh ttt.c_WholeAST.dot

example output

We put some example source files and their AST dump files into: https://github.com/chunhualiao/rose-ast

SageInterface functions


// You can call the following functions with gdb

   //! Pretty print AST horizontally, output to std output
   void SageInterface::printAST (SgNode* node); 


   //! Pretty print AST horizontally, output to a specified text file
   void SageInterface::printAST (SgNode* node, const char* filename); 

   //! Pretty print AST horizontally, output to a specified text file.
   void SageInterface::printAST2TextFile (SgNode* node, const char* filename, bool printTypes=true);

A translator (textASTGenerator) is also available, with its source code under exampleTranslators/defaultTranslator.

  • make install-tools will install this tool
  • textASTGenerator input.c will generate a text output of the entire AST

Example use inside of gdb

  • to print a portion of AST to the screen
  • to print a portion of AST into a text file
(gdb) up
#7  0x00007ffff418ab5d in Unparse_ExprStmt::unparseExprStmt (this=0x1a1bf950, stmt=0x7fffda63ce30, info=...) at ../../../sourcetree/src/backend/unparser/CxxCodeGeneration/unparseCxx_statements.C:9889

(gdb) p SageInterface::printAST(stmt)
└──@0x7fffda63ce30 SgExprStatement transformation 0:0
    └──@0x7fffd8488790 SgFunctionCallExp transformation 0:0
        ├──@0x7fffe6211910 SgMemberFunctionRefExp transformation 0:0
        └──@0x7fffd7f2c370 SgExprListExp transformation 0:0
            └──@0x7fffd8488720 SgFunctionCallExp transformation 0:0
                ├──@0x7fffe6211988 SgMemberFunctionRefExp transformation 0:0
                └──@0x7fffd7f2c3d8 SgExprListExp transformation 0:0
$2 = void


(gdb) up 10
#48 0x00007ffff40dce69 in Unparser::unparseFile (this=0x7fffffff8c60, file=0x7fffeb786010, info=..., unparseScope=0x0) at ../../../sourcetree/src/backend/unparser/unparser.C:945
(gdb) p SageInterface::printAST2TextFile(file,"test.txt")

textASTGenerator

Example command line use:

textASTGenerator -c test_qualifiedName.cpp

cat test_qualifiedName.cpp.AST.txt

└──@0x7fe9f1916010 SgProject
    └──@0xb45730 SgFileList
        └──@0x7fe9f17be010 SgSourceFile
            ├──@0x7fe9fdf19120 SgGlobal test_qualifiedName.cpp 0:0
            │   ├──@0x7fe9f159a010 SgTypedefDeclaration rose_edg_required_macros_and_functions.h 0:0
            │   │   └── NULL
            │   ├──@0x7fe9f159a390 SgTypedefDeclaration rose_edg_required_macros_and_functions.h 0:0
            │   │   └── NULL
            │   ├──@0x7fe9f0f59010 SgFunctionDeclaration rose_edg_required_macros_and_functions.h 0:0 "::feclearexcept"
            │   │   ├──@0x7fe9f1391010 SgFunctionParameterList rose_edg_required_macros_and_functions.h 0:0
            │   │   │   └──@0x7fe9f1258010 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__excepts"
            │   │   │       └── NULL
            │   │   ├── NULL
            │   │   └── NULL
            │   ├──@0x7fe9f0f59540 SgFunctionDeclaration rose_edg_required_macros_and_functions.h 0:0 "::fegetexceptflag"
            │   │   ├──@0x7fe9f1391630 SgFunctionParameterList rose_edg_required_macros_and_functions.h 0:0
            │   │   │   ├──@0x7fe9f1258420 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__flagp"
            │   │   │   │   └── NULL
            │   │   │   └──@0x7fe9f1258628 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__excepts"
            │   │   │       └── NULL
            │   │   ├── NULL
            │   │   └── NULL

              ...

            │   └──@0x7fe9eff218c0 SgFunctionDeclaration test_qualifiedName.cpp 14:1 "::foo"
            │       ├──@0x7fe9ef5e0320 SgFunctionParameterList test_qualifiedName.cpp 14:1
            │       │   ├──@0x7fe9ef495278 SgInitializedName test_qualifiedName.cpp 14:13 "x"
            │       │   │   └── NULL
            │       │   └──@0x7fe9ef495480 SgInitializedName test_qualifiedName.cpp 14:20 "y"
            │       │       └── NULL
            │       ├── NULL
            │       └──@0x7fe9ee8f3010 SgFunctionDefinition test_qualifiedName.cpp 15:1
            │           └──@0x7fe9ee988010 SgBasicBlock test_qualifiedName.cpp 15:1
            │               ├──@0x7fe9eee1ba90 SgVariableDeclaration test_qualifiedName.cpp 16:3
            │               │   ├── NULL
            │               │   └──@0x7fe9ef495688 SgInitializedName test_qualifiedName.cpp 16:3 "z"
            │               │       └── NULL
            │               ├──@0x7fe9ee7ad010 SgExprStatement test_qualifiedName.cpp 17:3
            │               │   └──@0x7fe9ee7dc010 SgAssignOp test_qualifiedName.cpp 17:5
            │               │       ├──@0x7fe9ee8c0010 SgVarRefExp test_qualifiedName.cpp 17:3
            │               │       └──@0x7fe9ee813010 SgAddOp test_qualifiedName.cpp 17:9
            │               │           ├──@0x7fe9ee8c0078 SgVarRefExp test_qualifiedName.cpp 17:7
            │               │           └──@0x7fe9ee84a010 SgMultiplyOp test_qualifiedName.cpp 17:12
            │               │               ├──@0x7fe9ee8c00e0 SgVarRefExp test_qualifiedName.cpp 17:11
            │               │               └──@0x7fe9ee881010 SgIntVal test_qualifiedName.cpp 17:13
            │               └──@0x7fe9ee77e010 SgReturnStmt test_qualifiedName.cpp 18:3
            │                   └──@0x7fe9ee8c0148 SgVarRefExp test_qualifiedName.cpp 18:10
            ├── NULL
            ├── NULL
            └── NULL
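The horizontal layout produced by printAST and textASTGenerator can be reproduced with a short recursive function. The sketch below is a standalone illustration, not ROSE's code: ToyNode is a hypothetical node type, and null children are printed as NULL, as in the dumps above. Each level passes down a prefix of vertical bars, and the last child of a node gets └── instead of ├──.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy node; not ROSE's SgNode.
struct ToyNode {
    std::string label;
    std::vector<ToyNode*> children;  // nullptr entries are printed as "NULL"
};

// Print one node, then its children. 'prefix' carries the vertical bars
// inherited from ancestors; 'isLast' selects the branch glyph.
static void printTree(const ToyNode* n, const std::string& prefix, bool isLast,
                      std::ostringstream& out) {
    out << prefix << (isLast ? "└── " : "├── ")
        << (n ? n->label : "NULL") << '\n';
    if (!n) return;
    // Below a last child there is no sibling left, so no continuing bar.
    std::string childPrefix = prefix + (isLast ? "    " : "│   ");
    for (std::size_t i = 0; i < n->children.size(); ++i)
        printTree(n->children[i], childPrefix, i + 1 == n->children.size(), out);
}

// Render the whole subtree rooted at 'root' as a string.
std::string toText(const ToyNode* root) {
    std::ostringstream out;
    printTree(root, "", true, out);
    return out.str();
}
```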

Render the AST in HTML

The repo errington1/ast-to-html contains a tool for rendering the ROSE abstract syntax "graph" as collapsible HTML, with shared nodes and cycles represented by HTML links. For now, it is available only from the command line. The plan is to add command-line options to omit parts of the tree and to make the tool available as a library. For now, it somewhat arbitrarily omits portions of the tree that originate from the file rose_edg_required_macros_and_functions.h.

The command:

astToHTML file.C

will produce file.C.html which can be viewed with a browser:

firefox file.C.html
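The idea of rendering shared nodes and cycles as links can be sketched in a few lines. This is a hypothetical standalone illustration, not the code of errington1/ast-to-html: the first visit of a node emits a collapsible HTML details element with an id, and every later visit (a shared node or a cycle) emits only a link to that id, which also guarantees that the recursion terminates.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy node. Children may point to already-visited nodes
// (sharing or cycles). Labels are assumed to contain no HTML special characters.
struct ToyNode {
    std::string label;
    std::vector<ToyNode*> children;
};

// First visit: a collapsible <details> element with a unique id.
// Later visits: only a link back to that id.
static void render(const ToyNode* n, std::map<const ToyNode*, int>& ids,
                   std::ostringstream& out) {
    auto it = ids.find(n);
    if (it != ids.end()) {
        out << "<a href=\"#n" << it->second << "\">" << n->label << "</a>\n";
        return;
    }
    int id = static_cast<int>(ids.size());
    ids[n] = id;  // record before descending so cycles terminate
    out << "<details id=\"n" << id << "\"><summary>" << n->label << "</summary>\n";
    for (const ToyNode* c : n->children) render(c, ids, out);
    out << "</details>\n";
}

std::string toHtml(const ToyNode* root) {
    std::map<const ToyNode*, int> ids;
    std::ostringstream out;
    render(root, ids, out);
    return out.str();
}
```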

Sanity Check

We provide a set of sanity checks for the AST. We use them to make sure the AST is consistent. It is also highly recommended that ROSE developers add a sanity check after their AST transformation is done. This is a higher standard than just unparsing to compilable code: it is common for an AST to unparse correctly but then fail the sanity check.

The recommended sanity check is

  • AstTests::runAllTests(project); from src/midend/astDiagnostics. Internally, it calls the following checks:
    • TestAstForProperlyMangledNames
    • TestAstCompilerGeneratedNodes
    • AstTextAttributesHandling
    • AstCycleTest
    • TestAstTemplateProperties
    • TestAstForProperlySetDefiningAndNondefiningDeclarations
    • TestAstSymbolTables
    • TestAstAccessToDeclarations
    • TestExpressionTypes
    • TestMangledNames::test()
    • TestParentPointersInMemoryPool::test()
    • TestChildPointersInMemoryPool::test()
    • TestMappingOfDeclarationsInMemoryPoolToSymbols::test()
    • TestLValueExpressions
    • TestMultiFileConsistancy::test() //2009
    • TestAstAccessToDeclarations::test(*i); // named type test


There are some other sanity check functions floating around, but they should be merged into AstTests::runAllTests(project):

  • FixSgProject(*project); //in Qing's AST interface
  • Utility::sanityCheck(SgProject* )
  • Utility::consistencyCheck(SgProject*) // SgFile*
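To illustrate the kind of property such checks verify, the following standalone sketch checks two invariants on a hypothetical ToyNode tree: every child's parent pointer points back to its owner, and no node is reached twice along child edges (no cycles or sharing). It is only an analogy for checks such as AstCycleTest and TestParentPointersInMemoryPool; in a real translator, call AstTests::runAllTests(project) instead.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy node with a parent pointer (loosely mimicking SgNode::get_parent()).
struct ToyNode {
    std::string name;
    ToyNode* parent = nullptr;
    std::vector<ToyNode*> children;
};

// Returns a list of problems found; an empty list means the subtree passed.
//  1. every non-null child's parent pointer must point back to the node owning it
//  2. no node may be reached twice along child edges
std::vector<std::string> checkTree(ToyNode* root) {
    std::vector<std::string> problems;
    std::set<ToyNode*> seen;
    std::vector<ToyNode*> stack{root};
    while (!stack.empty()) {
        ToyNode* n = stack.back();
        stack.pop_back();
        if (!seen.insert(n).second) {
            problems.push_back("node reached twice: " + n->name);
            continue;  // do not expand again, or a cycle would loop forever
        }
        for (ToyNode* c : n->children) {
            if (c == nullptr) continue;
            if (c->parent != n)
                problems.push_back("bad parent pointer: " + c->name);
            stack.push_back(c);
        }
    }
    return problems;
}
```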

Text Output of an AST

Just call SgNode::unparseToString(). You can call it on any SgLocatedNode within the AST to dump the text (source code) form of that partial AST.


AST Iterator

1) The iterator class: the iterator follows the STL iterator pattern, is implemented as a pre-order traversal, and maintains its own stack. It performs exactly the same traversal as the traversal classes in ROSE (it uses the same underlying information):

#include "rose.h"
#include "RoseAst.h"
#include <iostream>

SgNode* node = .... ; // any subtree

RoseAst ast(node);

for (RoseAst::iterator i = ast.begin(); i != ast.end(); ++i) {
   std::cout << "We are here: " << (*i)->class_name() << std::endl;
}

Some more features:

  • By default it does not traverse null pointers (you won't see them). However, if you also want to see and traverse all the null pointers, you can use the begin function with: ast.begin().withNullValues()
  • It also has a feature to exclude subtrees during the traversal: you can simply call on the *iterator*:
    • i.skipChildrenOnForward(); ++i; // skips the children of current node and goes to the next node that follows in the traversal after all those children
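The following standalone sketch shows how a pre-order iterator can maintain its own stack and support skipping the children of the current node. It is a simplified model, not RoseAst::iterator: ToyNode and PreOrderIterator are hypothetical, termination is checked with atEnd() instead of comparing against end(), and null children are always skipped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node; not ROSE's SgNode.
struct ToyNode {
    std::string name;
    std::vector<ToyNode*> children;
};

// Pre-order iterator with an explicit stack; the top of the stack is the current node.
class PreOrderIterator {
public:
    explicit PreOrderIterator(ToyNode* root) {
        if (root) stack_.push_back(root);
    }
    ToyNode* operator*() const { return stack_.back(); }
    bool atEnd() const { return stack_.empty(); }

    // Make the next ++ move past the current node without descending into it.
    void skipChildrenOnForward() { skip_ = true; }

    // Must not be called when atEnd() is true.
    PreOrderIterator& operator++() {
        ToyNode* cur = stack_.back();
        stack_.pop_back();
        if (!skip_) {
            // Push children right-to-left so the leftmost child is visited next.
            for (auto it = cur->children.rbegin(); it != cur->children.rend(); ++it)
                if (*it) stack_.push_back(*it);  // null children are not visited
        }
        skip_ = false;
        return *this;
    }

private:
    std::vector<ToyNode*> stack_;
    bool skip_ = false;
};
```

In the usage loop below, calling skipChildrenOnForward() on c1 makes the loop's own ++i jump straight to c2.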

Relevant source files

Content of AST

SgType

Some useful member functions

  • get_base_type(): a member function on some IR nodes derived from SgType; returns the non-recursively stripped (immediate) type under typedefs, references, pointers, arrays, modifiers, etc.
  • findBaseType(): recursively strips away all
        typedefs, SgTypedefType
        references, SgReferenceType
        pointers, SgPointerType
        arrays, SgArrayType
        modifiers, SgModifierType
  • SgType * stripType (unsigned char bit_array=STRIP_MODIFIER_TYPE|STRIP_REFERENCE_TYPE|STRIP_POINTER_TYPE|STRIP_ARRAY_TYPE|STRIP_TYPEDEF_TYPE) const: returns the hidden type beneath layers of typedefs, pointers, references, modifiers, array representation, etc.

  • SgType * stripTypedefsAndModifiers () const
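The layered stripping performed by stripType can be modeled with a chain of wrapper types and a bit mask selecting which layers to peel. The sketch below is a standalone illustration with hypothetical names (ToyType, Kind, kStrip*); it is not ROSE's SgType hierarchy, and its flags only mirror the spirit of the STRIP_* bits.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy type representation: each non-base layer wraps a base type.
enum class Kind { Base, Typedef, Reference, Pointer, Array, Modifier };

struct ToyType {
    Kind kind;
    std::string name;      // meaningful for Base and Typedef layers
    const ToyType* base;   // nullptr for a Base type
};

// Illustrative bit flags selecting which layers to strip.
enum : unsigned char {
    kStripModifier  = 1 << 0,
    kStripReference = 1 << 1,
    kStripPointer   = 1 << 2,
    kStripArray     = 1 << 3,
    kStripTypedef   = 1 << 4,
    kStripAll = kStripModifier | kStripReference | kStripPointer | kStripArray | kStripTypedef
};

// Map a layer kind to its flag (0 for a base type).
static unsigned char flagFor(Kind k) {
    switch (k) {
        case Kind::Modifier:  return kStripModifier;
        case Kind::Reference: return kStripReference;
        case Kind::Pointer:   return kStripPointer;
        case Kind::Array:     return kStripArray;
        case Kind::Typedef:   return kStripTypedef;
        default:              return 0;
    }
}

// Peel layers while the outermost layer is selected by 'bits'.
const ToyType* stripType(const ToyType* t, unsigned char bits = kStripAll) {
    while (t->base != nullptr && (flagFor(t->kind) & bits))
        t = t->base;
    return t;
}
```

Stripping stops at the first layer not selected by the mask, so with only the typedef bit set, a typedef of a pointer yields the pointer type, analogous to an immediate (non-recursive) step.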

File location information

All AST nodes with file location information derive from SgLocatedNode, which has start and end Sg_File_Info to indicate begin and end location information.

You can obtain and print out the pair of location information by calling:

locatedNode->get_startOfConstruct()->display() ; 

locatedNode->get_endOfConstruct()->display() ;

// get beginning info only
locatedNode->get_file_info()->display() ;

The output for display() may look like

Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = false 
     isCompilerGenerated                   = true (no position information) 
     isOutputInCodeGeneration              = false 
     isShared                              = false 
     isFrontendSpecific                    = true (part of ROSE support for gnu compatability) 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     file_id  = 2 
     filename = /home/liao6/daily-test-rose/upcwork/install/include/gcc_HEADERS/rose_edg_required_macros_and_functions.h 
     line     = 167  column   = 1 


.... // transformation generated, will be outputted by the unparser
upcr_pshared_ptr_t gsj;
Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = true (part of a transformation) 
     isCompilerGenerated                   = false 
     isOutputInCodeGeneration              = true (output in code generator) 
     isShared                              = false 
     isFrontendSpecific                    = false 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     file_id  = -3 
     filename = transformation 
     line     = 0  column   = 0 

As you can see, AST nodes can be generated either by ROSE's frontends or by a translator. A located node generated by a transformation may not have line or column numbers.

You can get the file name and the line and column numbers:

SgLocatedNode* node =  .... ;

// start position of the construct
Sg_File_Info* info_start = node->get_startOfConstruct();
size_t a_start = (size_t) info_start->get_line();
int col_start = info_start->get_col();

std::string filename = node->get_file_info()->get_filename();

// the end position may be NULL; fall back to the start line in that case
Sg_File_Info* info_end = node->get_endOfConstruct();
size_t a_end = (info_end == NULL) ? a_start : info_end->get_line();

Preprocessing Information

See more at ROSE Compiler Framework/PreprocessingInfo

In addition to nodes and edges, a ROSE AST may have attributes attached for preprocessing information such as #include or #if ... #else. They are attached before, after, or inside a nearby AST node (only one that has source location information).

An example translator traverses the input code's AST and dumps information, which may include preprocessing information.

For example

exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
-----------------------------------------------
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-------------PreprocessingInfo #0 ----------- :
classification = CpreprocessorIncludeDeclaration:
  String format = #include "all_headers.h"

relative position is = before

Source: http://www.rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf (Chapter 29 - Handling Comments, Preprocessor Directives, And Adding Arbitrary Text to Generated Code)

ROSE Compiler Framework/AST Matching

AST Construction

The SageBuilder and SageInterface namespaces provide functions to create and manipulate ASTs (see the Doxygen docs).

Program Translation

With its high-level intermediate representation, ROSE is suitable for building source-to-source translators. This is achieved by restructuring the AST of the input source code and then unparsing the transformed AST to output source code.

Documentation

Official tutorial: Chapter 32 AST Construction of ROSE Tutorial

Many beginners' questions should be readily answered after reading this chapter.

List of translation

List

Expected behavior of a ROSE Translator

A translator built using ROSE is designed to act like a compiler (gcc, g++, gfortran, etc., depending on the input file types).

So users of the translator only need to change the build system of the input files to use the translator instead of the original compiler.

Processing pragmas

Main article at ROSE Compiler Framework/Processing Pragmas

It is often useful to use pragmas to guide a translator.

A set of parser building functions is provided to help create recursive descent parsers:

Once you include the header AstFromString.h (located in src/frontend/SageIII/astFromString), you can access the variables and functions defined in the AstFromString namespace.

There is an example project doing pragma parsing and saving the results into AST attributes. https://github.com/rose-compiler/rose-develop/tree/master/projects/pragmaParsing
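The recursive descent style can be illustrated with a small standalone parser for a hypothetical pragma body such as "mytool unroll(4) tile(32, 16)". It does not use the AstFromString API; the grammar, the Clause type, and the PragmaParser class are assumptions made for this sketch. Each grammar rule becomes a match function that advances a cursor on success, and matchClause backtracks on failure.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One parsed clause of the hypothetical grammar, e.g. tile(32, 16).
struct Clause {
    std::string name;
    std::vector<long> args;
};

// pragma := identifier clause*
// clause := identifier '(' integer (',' integer)* ')'
class PragmaParser {
public:
    explicit PragmaParser(const std::string& text) : s_(text) {}

    // Returns true only if the whole input matches the grammar.
    bool parse(std::string& tool, std::vector<Clause>& clauses) {
        if (!matchIdentifier(tool)) return false;
        Clause c;
        while (matchClause(c)) { clauses.push_back(c); c = Clause(); }
        skipSpaces();
        return pos_ == s_.size();  // everything must be consumed
    }

private:
    const std::string s_;
    std::size_t pos_ = 0;

    void skipSpaces() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool matchChar(char ch) {
        skipSpaces();
        if (pos_ < s_.size() && s_[pos_] == ch) { ++pos_; return true; }
        return false;
    }
    bool matchIdentifier(std::string& out) {
        skipSpaces();
        std::size_t start = pos_;
        while (pos_ < s_.size() &&
               (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_')) ++pos_;
        if (pos_ == start || std::isdigit(static_cast<unsigned char>(s_[start]))) {
            pos_ = start;  // not an identifier: restore the cursor
            return false;
        }
        out = s_.substr(start, pos_ - start);
        return true;
    }
    bool matchInteger(long& out) {
        skipSpaces();
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (pos_ == start) return false;
        out = std::stol(s_.substr(start, pos_ - start));
        return true;
    }
    bool matchClause(Clause& c) {
        std::size_t save = pos_;
        long v = 0;
        if (matchIdentifier(c.name) && matchChar('(') && matchInteger(v)) {
            c.args.push_back(v);
            while (matchChar(',')) {
                if (!matchInteger(v)) { pos_ = save; return false; }
                c.args.push_back(v);
            }
            if (matchChar(')')) return true;
        }
        pos_ = save;  // backtrack so the caller sees the unconsumed input
        return false;
    }
};
```

In a ROSE translator, the pragma text would typically come from an SgPragmaDeclaration, and the parsed result could be saved as an AST attribute, as the pragmaParsing project does.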

SageBuilder and SageInterface

The official guide for restructuring/constructing ASTs highly recommends using helper functions from the SageBuilder and SageInterface namespaces to create AST pieces and move them around. These helper functions try to stay stable across low-level changes and are smart enough to transparently set many edges and maintain symbol tables.

Users who want lower-level control may directly invoke the member functions of AST nodes and symbol tables to explicitly manipulate edges and symbols in the AST. But this process is very tedious and error-prone.

It is possible that some builder functions are not yet provided, especially for C++ constructs such as template declarations. We are actively working on this. In the meantime, you can directly use the new operator and other member functions as a workaround.

Steps for writing translators

Prepare the output of your translator

  • Prepare the simplest possible source file (b.c) as an example output of your translator
    • Avoid including any system headers
    • Use ROSE_INSTALLATION_TREE/bin/dotGeneratorWholeASTGraph to generate a whole AST for b.c; more details on visualizing ASTs are available at How to visualize AST.
  • Study the dot graph for AST node types and their parent-child relations.
  • Use SageInterface or SageBuilder functions to restructure the source AST graph into the AST graph you want to generate
    • If there is no SageBuilder function to create what you want, you may have to use the new operator to create the nodes and take care of edges and symbols yourself.

For more details, see How to create a translator

Order to traverse AST


Naive pre-order traversal is not suitable for building a translator, since the translator may change nodes that the traversal is expected to visit later on. Conceptually, this is similar to C++ iterator invalidation.

To transform the AST safely, it is recommended to use a reverse iterator over the statement list generated by a pre-order traversal. This is different from a list generated by a post-order traversal.

For example, assume we have a subtree parent <child 1, child 2>:

  • Pre-order traversal generates the list: parent, child 1, child 2.
  • Post-order traversal generates the list: child 1, child 2, parent.
  • A reverse iterator over the pre-order list gives: child 2, child 1, parent. Based on our experience, transforming in this order is the safest.

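The recommended order can be sketched on a toy tree: collect the complete pre-order list first, then walk it back to front. When a node is visited this way, everything after it in pre-order (its own subtree and any later subtrees) has already been processed, so replacing that node cannot invalidate nodes still waiting to be visited. Node, preorder, and reversePreorderNames are hypothetical names used only for illustration; they are not ROSE types or traversal classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy tree node standing in for an AST node (hypothetical, not a ROSE type).
struct Node {
  std::string name;
  std::vector<Node*> children;
};

// Collect nodes in pre-order: each parent before its children.
void preorder(Node* n, std::vector<Node*>& out) {
  out.push_back(n);
  for (Node* child : n->children) preorder(child, out);
}

// Visit the pre-order list back to front. The list is built completely
// before any transformation happens, so later changes cannot affect it.
std::vector<std::string> reversePreorderNames(Node* root) {
  std::vector<Node*> order;
  preorder(root, order);
  std::vector<std::string> names;
  for (auto it = order.rbegin(); it != order.rend(); ++it)
    names.push_back((*it)->name);  // a translator would transform *it here
  return names;
}
```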
Example translators


There are many test translators under https://github.com/rose-compiler/rose/tree/master/tests/roseTests/astInterfaceTests

Other examples:

  • Split one complex statement into multiple simpler statements: ROSE/projects/backstroke/ExtractFunctionArguments.C

Transformation Tracking


See Transformation tracking

Abstract Handles


Abstract handles are strings used to pinpoint source code constructs. They are useful for passing loops, functions, etc. to a translator for processing; more at

Troubleshooting


Assertion failed: (expr->get_startOfConstruct() != NULL)


Assertion failed: (expr->get_startOfConstruct() != NULL), function unparseExpression, file ../../../ROSE/src/backend/unparser/languageIndependenceSupport/unparseLanguageIndependentConstructs.C, line 812.

void visitorTraversal::visit(SgNode* sgn) {
    SgAssignOp* sao = isSgAssignOp(sgn);
    if (!sao)
        return;

    // push the scope only after the early return, so that every push is matched by a pop
    SageBuilder::pushScopeStack(body);

    SgVarRefExp* svr = SageBuilder::buildVarRefExp("mami");
    SgIntVal* siv = SageBuilder::buildIntVal(33);

    SgAssignOp* newsao = new SgAssignOp(svr, siv, NULL); // raw new: no file info is set
    SageInterface::replaceWithPattern(sao, newsao);
    SageBuilder::popScopeStack();
}

The cause is: SgAssignOp* newsao = new SgAssignOp(svr, siv, NULL);

The failed assertion expr->get_startOfConstruct() != NULL means that the expression has no start file position. The existing SageBuilder function for building an assignment, SageBuilder::buildAssignOp(), takes care of many details, including the file info objects. If you use the raw new operator instead, you have to maintain these details yourself.

Program Analysis


Overview


ROSE has implemented the following compiler analyses:

  • call graph analysis
  • control flow graph
  • data flow analysis: including liveness analysis, def-use analysis, etc.
  • dependence analysis
  • side effect analysis

Control flow graph


ROSE provides several variants of control flow graphs.

Virtual Control Flow Graph


The virtual control flow graph (vcfg) is generated dynamically, on the fly, whenever it is needed, so there is no mismatch between the ROSE AST and its corresponding control flow graph. The downside is that the same vcfg is regenerated each time it is needed, which can become a performance bottleneck.

Facts

  • Documentation: the virtual CFG is documented in Chapter 19, Virtual CFG, of the ROSE tutorial PDF
  • Source Files:
    • src/frontend/SageIII/virtualCFG/virtualCFG.h
    • src/frontend/SageIII/virtualCFG/virtualCFG.C // gives the definitions for virtualCFG.h and also extends AST node support in VirtualCFG
    • src/ROSETTA/Grammar/Statement.code // prototypes of member functions for SgStatement nodes, etc.
    • src/ROSETTA/Grammar/Expression.code // prototypes of member functions for SgExpression nodes, etc.
    • src/ROSETTA/Grammar/Support.code // prototypes of member functions for SgInitialized(LocatedNode) nodes, etc.
    • src/ROSETTA/Grammar/Common.code // prototypes of member functions for other nodes, etc.
    • src/frontend/SageIII/virtualCFG/memberFunctions.C // implementation of virtual CFG related member functions for each AST node
      • This file will help the generation of buildTree/src/frontend/SageIII/Cxx_Grammar.h
  • Test directory: tests/CompileTests/virtualCFG_tests
  • A dot graph generator: generate a dot graph for either the raw or interesting virtual CFG.
    • Source: tests/CompileTests/virtualCFG_tests/generateVirtualCFG.C
    • Installed under rose_ins/bin

How to extend VirtualCFG to support OpenMP

  • How to add CFG nodes for SgOmpClause:
  • 1. Identify the class name in ROSETTA in the frontend.

For example, if SgOmpPrivateClause or SgOmpSharedClause is not supported in VirtualCFG, check whether buildTree/src/frontend/SageIII/Cxx_Grammar.h has function prototypes for adding CFG edges for SgOmpClause, such as SgOmpClause::cfgInEdges() and SgOmpClause::cfgOutEdges(). If there are no prototypes, your CFG node does not belong to SgStatement or SgExpression, and the prototypes for SgOmpClause can be added in src/ROSETTA/Grammar/Support.code.

  • 2. Add the function definitions in src/frontend/SageIII/virtualCFG/memberFunctions.C to define the CFG nodes and CFG edges.

Step 1: define cfgIndexForEnd(). This index is based on the AST graph of your source code: it is determined by the children of the AST node. Real example:

unsigned int SgOmpClauseBodyStatement::cfgIndexForEnd() const {
  int size = this->get_clauses().size(); // the number of clauses in #pragma omp parallel
  return (size + 1); // clauses + body
}

Step 2: define cfgInEdges() for this CFG node. Refer to the AST, since it shows all node information. Real example:

std::vector<CFGEdge> SgOmpClauseBodyStatement::cfgInEdges(unsigned int idx) {
  std::vector<CFGEdge> result;
  addIncomingFortranGotos(this, idx, result);
  if (idx == 0) {
    // entry: connect from the node just before this statement
    makeEdge(getNodeJustBeforeInContainer(this), CFGNode(this, idx), result);
  } else if (idx == this->get_clauses().size() + 1) {
    // end: connect from the end of the parallel body (variable clauses first, then parallel body)
    makeEdge(this->get_body()->cfgForEnd(), CFGNode(this, idx), result);
  } else if (idx < this->get_clauses().size() + 1) {
    // after clause idx - 1: connect from the end of that clause
    makeEdge(this->get_clauses()[idx - 1]->cfgForEnd(), CFGNode(this, idx), result);
  } else {
    ROSE_ASSERT(!"Bad index for SgOmpClauseBodyStatement");
  }
  return result;
}

Step 3: define cfgOutEdges() for this CFG node. For example:

std::vector<CFGEdge> SgOmpClauseBodyStatement::cfgOutEdges(unsigned int idx) {
  // edges between SgOmpClauseBodyStatement and SgOmpClause (edited by Hongyi)
  // note: incoming Fortran gotos belong in cfgInEdges(), not here
  std::vector<CFGEdge> result;
  if (idx == this->get_clauses().size() + 1) {
    // end: leave to the node just after this statement
    makeEdge(CFGNode(this, idx), getNodeJustAfterInContainer(this), result);
  } else if (idx == this->get_clauses().size()) {
    // all variable clauses visited: enter the parallel body last
    makeEdge(CFGNode(this, idx), this->get_body()->cfgForBeginning(), result);
  } else if (idx < this->get_clauses().size()) {
    // visit the next variable clause
    makeEdge(CFGNode(this, idx), this->get_clauses()[idx]->cfgForBeginning(), result);
  } else {
    ROSE_ASSERT(!"Bad index for SgOmpClauseBodyStatement");
  }
  return result;
}

  • 3. How to check the result

First, check the AST graph. In this example, SgOmpParallelStatement has three subtrees: one is get_body(), and the other two are SgOmpPrivateClause and SgOmpSharedClause, respectively. So the end index is 3. The CFG nodes are visited in this order: clauses first, then the parallel body.

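To double-check the index arithmetic for this example (two clauses, end index 3) without building ROSE, the branches of cfgIndexForEnd() and cfgOutEdges() can be mirrored in a few lines of plain C++. This is a hypothetical model, not ROSE code: strings stand in for CFG nodes, and "stmt@k" denotes the statement's own CFG node with index k.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors cfgIndexForEnd(): one index per clause plus one for the body.
std::size_t cfgIndexForEnd(std::size_t numClauses) { return numClauses + 1; }

// Mirrors the branches of cfgOutEdges(idx): where does "stmt@idx" lead?
std::string outEdgeTarget(std::size_t idx, const std::vector<std::string>& clauses) {
  const std::size_t n = clauses.size();
  if (idx == n + 1) return "next statement";  // getNodeJustAfterInContainer(this)
  if (idx == n) return "body";                // get_body()->cfgForBeginning()
  if (idx < n) return clauses[idx];           // get_clauses()[idx]->cfgForBeginning()
  return "bad index";                         // ROSE_ASSERT in the real code
}

// List each statement CFG node in index order, followed by its successor.
std::vector<std::string> walk(const std::vector<std::string>& clauses) {
  std::vector<std::string> path;
  for (std::size_t idx = 0; idx <= cfgIndexForEnd(clauses.size()); ++idx) {
    path.push_back("stmt@" + std::to_string(idx));
    path.push_back(outEdgeTarget(idx, clauses));
  }
  return path;
}
```

For the private and shared clauses, the walk is stmt@0, private, stmt@1, shared, stmt@2, body, stmt@3, next statement. Each clause is entered from the statement node with the same index and returns to the next index, which matches the in-edge rule clauses[idx - 1]->cfgForEnd() to stmt@idx.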
 

Static Control Flow Graph


Due to performance concerns with the virtual control flow graph, we developed a static version that persists in memory like a regular graph.

Facts:

  • Documentation: 19.7 Static CFG of ROSE tutorial pdf
  • Test Directory: rose/tests/CompileTests/staticCFG_tests

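The trade-off between the virtual and static styles can be sketched with a toy model: the virtual style derives a node's out-edges from the program every time they are requested, while the static style derives them once and keeps them in memory. ToyProgram and StaticCFG are hypothetical types, not ROSE classes; the counter only shows how often edges are recomputed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "program": statement i falls through to statement i + 1 (hypothetical model).
struct ToyProgram {
  std::size_t numStmts;
  mutable std::size_t edgeComputations = 0;  // how often edges were derived

  // Virtual-CFG style: derive out-edges from the program on every query.
  std::vector<std::size_t> outEdges(std::size_t stmt) const {
    ++edgeComputations;
    if (stmt + 1 < numStmts) return {stmt + 1};
    return {};
  }
};

// Static-CFG style: derive all edges once and keep them as a regular graph.
struct StaticCFG {
  std::vector<std::vector<std::size_t>> succ;
  explicit StaticCFG(const ToyProgram& p) {
    for (std::size_t s = 0; s < p.numStmts; ++s) succ.push_back(p.outEdges(s));
  }
  const std::vector<std::size_t>& outEdges(std::size_t stmt) const { return succ[stmt]; }
};
```

The static graph answers repeated queries without recomputation, but if the program is transformed after the graph is built, the graph must be rebuilt; the virtual style always reflects the current program.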
Static and Interprocedural CFGs


Facts:

  • Documentation: 19.8 Static, Interprocedural CFGs of ROSE tutorial pdf
  • Test Directory: rose/tests/CompileTests/staticCFG_tests

Virtual Function Analysis


Facts

  • Original contributor: Faizur from UTSA, done in Summer 2011
  • Code: src/midend/programAnalysis/VirtualFunctionAnalysis
  • Implemented with the techniques used in the following paper: "Interprocedural Pointer Alias Analysis - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.2382". The paper reduces virtual function resolution to the pointer aliasing problem. It employs flow-sensitive interprocedural dataflow analysis to solve the aliasing problem, using compact representation graphs to represent the alias relations.
  • Some test files are in the roseTests folder of the ROSE repository. According to the original contributor, the implementation supports function pointers as well as code written across different files (header files, etc.).
  • Documentation: Chapter 24 Dataflow Analysis based Virtual Function Analysis, of ROSE tutorial pdf

Def-use analysis


If you want a def-use analysis, try the VariableRenaming class: http://www.rosecompiler.org/ROSE_HTML_Reference/classVariableRenaming.html

VariableRenaming v(project);
v.run();
v.getReachingDefsAtNode(...);

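Conceptually, a def-use analysis links each use of a variable to the definitions that can reach it. For straight-line code without branches, this reduces to remembering the most recent definition of each variable, as in the toy sketch below. It is not the VariableRenaming API; Stmt and defUseChains are hypothetical names. Real programs with branches and loops need an iterative dataflow analysis instead.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One statement of straight-line code: at most one defined variable, any number of uses.
struct Stmt {
  std::string def;                // empty if the statement defines nothing
  std::vector<std::string> uses;  // variables read by the statement
};

// Map (statement index, used variable) to the index of the statement whose
// definition reaches that use; -1 means no definition in this code reaches it.
std::map<std::pair<int, std::string>, int> defUseChains(const std::vector<Stmt>& code) {
  std::map<std::string, int> lastDef;  // most recent definition of each variable
  std::map<std::pair<int, std::string>, int> reaching;
  for (int i = 0; i < static_cast<int>(code.size()); ++i) {
    // Uses read the values that exist before this statement's own definition.
    for (const std::string& v : code[i].uses) {
      auto it = lastDef.find(v);
      reaching[{i, v}] = (it == lastDef.end()) ? -1 : it->second;
    }
    // A new definition kills the earlier definition of the same variable.
    if (!code[i].def.empty()) lastDef[code[i].def] = i;
  }
  return reaching;
}
```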

Testing

  • cd buildtree/tests/nonsmoke/functional/roseTests/programAnalysisTests/defUseAnalysisTests
  • type make check

Liveness analysis


See Liveness analysis.

Pointer Analysis


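The following thread from the rose-public mailing list explains how to use ROSE's implementation of Steensgaard's pointer analysis: https://mailman.nersc.gov/pipermail/rose-public/2010-September/000390.html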

On 9/1/10 11:49 AM, Fredrik Kjolstad wrote:
> Hi all,
>
> I am trying to use Rose as the analysis backend for a refactoring 
> engine and for one of the refactorings I am implementing I need 
> whole-program pointer analysis.  Rose has an implementation of 
> steensgard's algorithm and I have some questions regarding how to use 
> this.
>
> I looked at the file steensgaardTest2.C to figure out how to invoke 
> this analysis and I am a bit perplexed:
>
> 1. The file SteensgaardPtrAnal.h that is included by the test is not 
> present in the include directory of my installed version of Rose. 
>  Does this mean that the Steensgaard implementation is not a part of 
> the shipped compiler, or does it mean that I have to retrieve an 
> instance of it through some factory method whose static return type is 
> PtrAnal?
I believe it is in the shipped compiler. And you're using the correct 
file to figure out how to use it. It should be in the installed include 
directory --- if it is not, it's probably something that needs to be 
fixed. But you can copy the include file from 
ROSE/src/midend/programAnalysis/pointerAnal/ as a temporary fix
> 2. How do I initialize the alias analysis for a given SgProject?  Is 
> this done through the overloaded ()?
The steensgaardTest2.C file shows how to set up everything to invoke the 
analysis. Right now you need to go over each function definition and 
invoke the analysis explicitly, as illustrated by the main function in 
the file.
> 3. Say I want to query whether two pointer variables alias and I have 
> SGNodes to their declarations.  How do I get the AstNodePtr needed to 
> invoke the may_alias(AstInterface&, const AstNodePtr&, const 
> AstNodePtr&) function?  Or maybe I should rather invoke the version of 
> may_alias that takes two strings (varnames)?
To convert a SgNode* x to AstNodePtr, wrap it inside  an AstNodePtrImpl 
object, i.e., do AstNodePtrImpl(x), as illustrated inside the () 
operator of TestPtrAnal in steensgaardTest2.C.
> 4. How do I query whether two parameters alias?
The PtrAnal class has  the following interface method
     may_alias(AstInterface& fa, const AstNodePtr& r1, const AstNodePtr& 
r2);
It is implemented in SteensgaardPtrAnal class, which inherit PtrAnal 
class. To build AstInterface and AstNodePtr,
you simply need to wrap SgNode* with some wrapper classes, illustrated 
by steensgaardTest2.C

-Qing Yi

For example, given the following input function:

void func(void) {
int* pointer;
int* aliasPointer;

pointer = malloc(sizeof(int));
aliasPointer = pointer;
*aliasPointer = 42;

printf("%d\n", *pointer);
}

The SteensgaardPtrAnal::output function returns:
c:(sizeof(int )) LOC1=>LOC2
c:42 LOC3=>LOC4
v:func LOC5=>LOC6 (inparams: ) ->(outparams:  LOC7)
v:func-0 LOC8=>LOC7
v:func-2-1 LOC9=>LOC10
v:func-2-3 LOC11=>LOC12 (pending  LOC10 LOC13=>LOC14 =>LOC4 )
v:func-2-4 LOC15=>LOC16 =>LOC17
v:func-2-5 LOC18=>LOC14 =>LOC4
v:func-2-aliasPointer LOC19=>LOC14 =>LOC4
v:func-2-pointer LOC20=>LOC13 =>LOC14 =>LOC4
v:malloc LOC21=>LOC22 (inparams:  LOC2) ->(outparams:  LOC12)
v:printf LOC23=>LOC24 (inparams:  LOC16=>LOC17  LOC14=>LOC4 ) ->(outparams:  LOC25)

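Steensgaard's algorithm is a flow-insensitive, unification-based points-to analysis: each pointer assignment merges the abstract locations that the two sides may point to, using union-find, which keeps the analysis close to linear time. The following is a simplified textbook sketch of that idea applied to the func() example above. It is not the interface of SteensgaardPtrAnal; ToySteensgaard and its methods are hypothetical and handle only p = malloc(...) and p = q.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified Steensgaard-style analysis: abstract locations are merged with
// union-find, and each equivalence class points to at most one other class.
class ToySteensgaard {
 public:
  // p = malloc(...): p points to a fresh heap location.
  void alloc(const std::string& p) { join(pointee(loc(p)), newLoc()); }
  // p = q: p may point wherever q points, so unify the two targets.
  void copy(const std::string& p, const std::string& q) { join(pointee(loc(p)), pointee(loc(q))); }
  // p and q may alias if their targets fall into the same class.
  bool mayAlias(const std::string& p, const std::string& q) {
    return pointee(loc(p)) == pointee(loc(q));
  }

 private:
  std::vector<int> parent_;  // union-find parent links
  std::vector<int> target_;  // target_[rep] = class pointed to, or -1
  std::map<std::string, int> vars_;

  int newLoc() {
    parent_.push_back(static_cast<int>(parent_.size()));
    target_.push_back(-1);
    return static_cast<int>(parent_.size()) - 1;
  }
  int loc(const std::string& v) {
    auto it = vars_.find(v);
    if (it != vars_.end()) return it->second;
    int l = newLoc();
    vars_[v] = l;
    return l;
  }
  int find(int x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving
      x = parent_[x];
    }
    return x;
  }
  // Class that l points to, creating a placeholder target if there is none yet.
  int pointee(int l) {
    l = find(l);
    if (target_[l] < 0) {
      int t = newLoc();
      target_[l] = t;
    }
    return find(target_[l]);
  }
  // Merge two classes; their targets must then be merged as well (unification).
  void join(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    int ta = target_[a], tb = target_[b];
    parent_[b] = a;
    if (ta < 0) target_[a] = tb;
    else if (tb >= 0) join(ta, tb);
  }
};
```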
ROSE has implemented an SSA form. Some discussions on the mailing list: link.

The Rice branch has an implementation of array SSA. We are waiting for their commits to be pushed into Jenkins. --Liao (discuss • contribs) 18:17, 19 June 2012 (UTC)

Side Effect Analysis


There are at least two implementations in ROSE.

The first one is recommended:

  • It is provided through SageInterface interface functions: http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html
    • For example, bool SageInterface::collectReadWriteVariables (SgStatement *stmt, std::set< SgInitializedName * > &readVars, std::set< SgInitializedName * > &writeVars). You can pass a for loop or its body (a basic block statement) to this function and get the read/write variables. It returns false if the analysis is not successful, so be sure to check the return value before using the results.
    • This function is a wrapper for:
      • bool AnalyzeStmtRefs(LoopTransformInterface &la, const AstNodePtr& n, CollectObject<AstNodePtr> &wRefs, CollectObject<AstNodePtr> &rRefs) // from DepInfoAnal.C
      • StmtSideEffectCollect(LoopTransformInterface::getSideEffectInterface())(fa,n,&colw,&colr); // src/midend/astUtil/astSupport/StmtInfoCollect.h

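To illustrate what such a read/write collection computes (not how ROSE implements it), the sketch below collects read and write sets for a loop body made only of simple assignments. Assign and collectReadWrite are hypothetical names; the real function works on ROSE AST nodes and reports failure through its return value.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy statement: "target = operands..." or "target op= operands...".
struct Assign {
  std::string target;
  bool compound;                      // true for x += ..., which also reads x
  std::vector<std::string> operands;  // variables read on the right-hand side
};

// Collect the variables read and written by a sequence of statements (e.g. a loop body).
void collectReadWrite(const std::vector<Assign>& body,
                      std::set<std::string>& readVars, std::set<std::string>& writeVars) {
  for (const Assign& s : body) {
    readVars.insert(s.operands.begin(), s.operands.end());
    if (s.compound) readVars.insert(s.target);  // x += y reads x before writing it
    writeVars.insert(s.target);
  }
}
```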
The second one is not robust enough for use. It also depends on the SQLite library for installation.

  • Source Code: src/midend/programAnalysis/sideEffectAnalysis
  • Tests: tests/roseTests/programAnalysisTests/sideEffectAnalysisTests
  • The algorithm is based on the paper: K. D. Cooper and K. Kennedy. 1988. Interprocedural side-effect analysis in linear time. In Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation (PLDI '88), R. L. Wexelblat (Ed.). ACM, New York, NY, USA, 57-66.

Generic Dataflow Framework


Over the course of the ROSE project, we have collected quite a few versions of dataflow analysis. They are painful to maintain and use because they

  • duplicate the iterative fixed-point algorithm,
  • are scattered across different directories, and
  • use different representations for results.

An ongoing effort is to consolidate all dataflow analysis work within a single framework.

Quick facts

  • Original author: Greg Bronevetsky
  • Code reviewer: Chunhua Liao
  • Documentation:
  • Source codes: files under ./src/midend/programAnalysis/genericDataflow
  • Tests: tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests
  • Currently implemented analysis
    • Dominator analysis: dominatorAnalysis.h dominatorAnalysis.C
    • Livedead variable analysis, or Liveness analysis: liveDeadVarAnalysis.h liveDeadVarAnalysis.C
    • Constant propagation: constantPropagation.h constantPropagation.C. TODO: the files need to be moved from tests/ into src/

See more at Generic Dataflow Framework

Dependence analysis


TODO: It turns out that the interface work has not been merged into our master branch, so the following instructions do not apply!

The interface for the dependence graph can be found in DependencyGraph.h. The underlying representation is in DepGraph.h. The Boost Graph Library (BGL) is required to access the graph.

Six examples were attached to the original email (they are not included here). In deptest.C, there are also some macros to enable more accurate analysis.

If USE_IVS is defined, induction variable substitution is performed. If USE_FUNCTION is defined, the dependence analysis can take a user-specified function side-effect interface. If neither is defined, a normal dependence analysis is performed and the graph is built.

Generic Dataflow Framework


Introduction


Over the course of the ROSE project, we have collected quite a few versions of dataflow analysis. They are painful to maintain and use because they

  • duplicate the iterative fixed-point algorithm,
  • are scattered across different directories,
  • use different representations for results, and
  • have different levels of maturity and robustness.

An ongoing effort is to consolidate all dataflow analysis work within a single framework.

Quick facts

  • original author: Greg Bronevetsky
  • code gatekeeper: Chunhua Liao
  • Documentation:
    • Chapter 18 Generic Dataflow Analysis Framework, of the ROSE tutorial pdf, git commit
    • This wikibook page
  • source codes: files under ./src/midend/programAnalysis/genericDataflow
  • tests: tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests

Implemented analyses


List

  • Constant Propagation
  • dominator analysis: dominatorAnalysis.h dominatorAnalysis.C
  • livedead variable analysis, or liveness analysis: liveDeadVarAnalysis.h liveDeadVarAnalysis.C
  • Pointer Analysis

Function, NodeState and FunctionState


Function and NodeState are the two required parameters for running a dataflow analysis.

They are stored together inside FunctionState.

Relevant headers:

  • functionState.h (FunctionState)
  • genericDataflow/cfgUtils/CallGraphTraverse.h (Function)

Function


Function is an abstraction of functions, internally connected to an SgFunctionDeclaration* decl.

It is declared in ./src/midend/programAnalysis/genericDataflow/cfgUtils/CallGraphTraverse.h

Constructors:

  • Function::Function(string name): based on the function name
  • Function::Function(SgFunctionDeclaration* sample) // core constructor
  • Function::Function(SgFunctionDefinition* sample)

CGFunction* cgFunc; // call graph function

Function func(cgFunc);

NodeFact


NodeFact stores any information related to a CFG node:

  • It has no dataflow IN/OUT concept.
  • It is not meant to evolve during the dataflow analysis.

class NodeFact: public printable
{
        public:

                // returns a copy of this node fact
        virtual NodeFact* copy() const=0;
        
};

NodeState


NodeState stores information about multiple analyses and their corresponding lattices for a given dataflow node (a DataflowNode, which wraps a CFGNode).

./src/midend/programAnalysis/genericDataflow/state/nodeState.h

It also provides static functions to:

  • initialize NodeState for all DataflowNodes
  • retrieve the NodeState for a given DataflowNode

class NodeState
{
     // internal types: map between analysis and set of lattices
     typedef std::map<Analysis*, std::vector<Lattice*> > LatticeMap;
     typedef std::map<Analysis*, std::vector<NodeFact*> > NodeFactMap;
     typedef std::map<Analysis*, bool > BoolMap;

     // the dataflow information Above the node, for each analysis that
     // may be interested in the current node
     LatticeMap dfInfoAbove;  // IN set in a forward dataflow analysis

     // the Analysis information Below the node, for each analysis that
     // may be interested in the current node
     LatticeMap dfInfoBelow;  // OUT set in a forward dataflow analysis

     // the facts that are true at this node, for each analysis that
     // may be interested in the current node
     NodeFactMap facts;

     // Contains all the Analyses that have initialized their state at this node. It is a map because
     // TBB doesn't provide a concurrent set.
     BoolMap initializedAnalyses;

  public:
     // static interfaces

     // returns the NodeState object associated with the given dataflow node.
     // index is used when multiple NodeState objects are associated with a given node
     // (ex: SgFunctionCallExp has 3 NodeStates: entry, function body, exit)
     static NodeState* getNodeState(const DataflowNode& n, int index=0);

     // most useful interface: retrieve the lattices (could be only one) associated with a given analysis

     // returns the vector containing all the lattices from above the node that are owned by the given analysis
     // (read-only access)
     const std::vector<Lattice*>& getLatticeAbove(const Analysis* analysis) const;

     // returns the vector containing all the lattices from below the node that are owned by the given analysis
     // (read-only access)
     const std::vector<Lattice*>& getLatticeBelow(const Analysis* analysis) const;
};

FunctionState


./src/midend/programAnalysis/genericDataflow/state/functionState.h

FunctionState is a pair of a Function and a NodeState.

  • It provides static functions to initialize all FunctionState objects and to retrieve the FunctionState of a given function.

class FunctionState
{
        friend class CollectFunctions;
    public:
        Function func;
        NodeState state;
        // The lattices that describe the value of the function's return variables
        NodeState retState;

    private:
        static std::set<FunctionState*> allDefinedFuncs;
        static std::set<FunctionState*> allFuncs;
        static bool allFuncsComputed;

    public:
        FunctionState(Function &func):
                func(func),
                state(/*func.get_declaration()->cfgForBeginning()*/)
        {}

        // We should use this interface --------------

        // 1. returns a set of all the functions whose bodies are in the project
        static std::set<FunctionState*>& getAllDefinedFuncs();

        // 2. returns the FunctionState associated with the given function
        // func may be any declared function
        static FunctionState* getFuncState(const Function& func);

        // ...
};


FunctionState* fs = new FunctionState(func); // the function's NodeState, fs->state, starts out empty


/*************************************
 *** UnstructuredPassInterAnalysis ***
 *************************************/
void UnstructuredPassInterAnalysis::runAnalysis()
{
        set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs(); // call a static function to get all function states

        // Go through functions one by one, call an intra-procedural analysis on each of them
        // iterate over all functions with bodies
        for(set<FunctionState*>::iterator it=allFuncs.begin(); it!=allFuncs.end(); it++)
        {
                FunctionState* fState = *it;
                intraAnalysis->runAnalysis(fState->func, &(fState->state));
        }
}

// runs the intra-procedural analysis on the given function, returns true if 
// the function's NodeState gets modified as a result and false otherwise
// state - the function's NodeState
bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state)
{
        DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition(),filter);
        DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition(), filter);
        
        if(analysisDebugLevel>=2)
                Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis() function "<<func.get_name().getString()<<"()\n";
        
        // iterate over all the nodes in this function
        for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end(); it++)
        {
                DataflowNode n = *it;
                // The number of NodeStates associated with the given dataflow node
                //int numStates=NodeState::numNodeStates(n);
                // The actual NodeStates associated with the given dataflow node
                const vector<NodeState*> nodeStates = NodeState::getNodeStates(n);
                
                // Visit each CFG node
                for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); itS++)
                        visit(func, n, *(*itS));
        }
        return false;
}

Example: retrieve the liveness analysis's lattice above a node, i.e., the variables live at the node's entry:

void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent)
{
  LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin()));
  // ...
}

Lattices


Caveat: lattice vs. lattice value

  • By definition, a lattice is a set of values. However, in the generic dataflow framework, an instance of a lattice type is also used to represent an individual value within a lattice. Sorry for the confusion; we welcome suggestions to fix this.

Basics


See more at ROSE Compiler Framework/Lattice

Lattices store the dataflow analysis information attached to CFG nodes.

Fundamental operations:

  • what to store: the lattice value set, bottom, top, and anything in between
  • initialization: LiveDeadVarsAnalysis::genInitState()
  • creation: transfer function
  • meet operation: a member function of the lattice

Example

  • liveness analysis: the set of variables live at the entry point of a CFG node;
  • constant propagation: lattice values range from no information (bottom) -> unknown -> constant -> too much information (conflicting constant values, top).

Some member functions of LiveVarsLattice:

// blindly add all of that_arg's values into current lattice's value set
void LiveVarsLattice::incorporateVars(Lattice* that_arg)  

// retrieve a subset lattice information for a given expr. This lattice may contain more information than those about a given expr.
Lattice* LiveVarsLattice::project(SgExpression* expr) 

// add lattice (exprState)information about expr into current lattice's value set: default implementation just calls meetUpdate(exprState)
bool LiveVarsLattice::unProject(SgExpression* expr, Lattice* exprState)  

Below/above vs. IN/OUT


Above and below are defined with respect to the original CFG flow direction:

  • above: the incoming edge side
  • below: the outgoing edge side

IN and OUT depend on the direction of the problem, forward vs. backward:

  • forward direction: IN = above lattice, OUT = below lattice
  • backward direction: IN = below lattice, OUT = above lattice

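The mapping can be made concrete with a small sketch, assuming a node stores its above and below lattices as plain sets of variable names. NodeLattices, in, out, and transfer are hypothetical names, not framework classes. Liveness is a backward problem, so applying OUT = GEN + (IN - KILL) reads the below set and writes the above set.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Direction { Forward, Backward };

// Hypothetical per-node storage, named after the original CFG flow:
// "above" is on the incoming-edge side, "below" on the outgoing-edge side.
struct NodeLattices {
  std::set<std::string> above, below;
};

// IN and OUT are views on above/below that depend on the problem direction.
std::set<std::string>& in(NodeLattices& n, Direction d) {
  return d == Direction::Forward ? n.above : n.below;
}
std::set<std::string>& out(NodeLattices& n, Direction d) {
  return d == Direction::Forward ? n.below : n.above;
}

// Apply OUT = GEN + (IN - KILL) to one node, whatever the direction.
void transfer(NodeLattices& n, Direction d,
              const std::set<std::string>& gen, const std::set<std::string>& kill) {
  std::set<std::string> result = gen;
  for (const std::string& v : in(n, d))
    if (kill.count(v) == 0) result.insert(v);
  out(n, d) = result;
}
```

For the statement x = y with x and z live below it, liveness uses GEN = {y} and KILL = {x}, so the above set becomes {y, z}.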
Common Utility Lattices


The framework provides some predefined lattices ready for use.

lattice.h/latticeFull.h

  • BoolAndLattice

LiveVarsLattice

class LiveVarsLattice : public FiniteLattice
{
        public:
        std::set<varID> liveVars;  // bottom is all live variables,  top is the empty set, meet brings down the lattice -> union of variables. 
    ...
 };


// Meet operation: simplest set union of two lattices: 

// computes the meet of this and that and saves the result in this
// returns true if this causes this to change and false otherwise
bool LiveVarsLattice::meetUpdate(Lattice* that_arg)
{
        bool modified = false;
        LiveVarsLattice* that = dynamic_cast<LiveVarsLattice*>(that_arg);
        
        // Add all variables from that to this
        for(set<varID>::iterator var=that->liveVars.begin(); var!=that->liveVars.end(); var++) {
                // If this lattice doesn't yet record *var as being live 
                if(liveVars.find(*var) == liveVars.end()) { // this if () statement gives a chance to set the modified flag. 
                                                           // otherwise, liveVars.insert() can be directly called. 
                        modified = true;
                        liveVars.insert(*var);
                }
        }
        
        return modified;        
}

Transfer Function


Basics: Data_flow_analysis#flow/transfer_function

  • IN = meet (e.g., union) of the OUT sets of all predecessors
  • OUT = GEN + (IN - KILL)

A transfer function describes the impact of a program construct on the current lattices, i.e., how it changes them.

  • lattices: store the IN and OUT information
  • additional data members are needed inside the transfer function to store the GEN and KILL sets.

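The equations above are solved by iterating until no set changes (a fixed point). The sketch below is a minimal worklist solver for a forward problem with set union as the meet, applied to reaching definitions in a small loop: node 0 is d0: i = 0, node 1 is the loop test, node 2 is d1: i = i + 1, and node 3 is the exit. It is not the framework's implementation; ForwardProblem, Solution, and solve are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical forward "may" problem: facts are ints (here, definition ids), meet is set union.
struct ForwardProblem {
  std::vector<std::vector<std::size_t>> preds, succs;  // CFG edges per node
  std::vector<std::set<int>> gen, kill;                 // GEN and KILL per node
};

struct Solution {
  std::vector<std::set<int>> in, out;
};

// Worklist iteration until no OUT set changes.
Solution solve(const ForwardProblem& p) {
  const std::size_t n = p.preds.size();
  Solution s{std::vector<std::set<int>>(n), std::vector<std::set<int>>(n)};
  std::deque<std::size_t> work;
  for (std::size_t i = 0; i < n; ++i) work.push_back(i);
  while (!work.empty()) {
    const std::size_t node = work.front();
    work.pop_front();
    // IN = union of OUT over all predecessors
    s.in[node].clear();
    for (std::size_t pr : p.preds[node])
      s.in[node].insert(s.out[pr].begin(), s.out[pr].end());
    // OUT = GEN + (IN - KILL)
    std::set<int> newOut = p.gen[node];
    for (int f : s.in[node])
      if (p.kill[node].count(f) == 0) newOut.insert(f);
    if (newOut != s.out[node]) {
      s.out[node] = newOut;
      for (std::size_t succ : p.succs[node]) work.push_back(succ);  // revisit successors
    }
  }
  return s;
}
```

Both definitions of i reach the loop test (IN of node 1 is {0, 1}), while only d1 leaves node 2, because d1 kills d0.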

Class hierarchy:

class IntraDFTransferVisitor : public ROSE_VisitorPatternDefaultBase
{ 
protected:
  // Common arguments to the underlying transfer function
  const Function &func;  // which function are we talking about
  const DataflowNode &dfNode;  // wrapper of CFGNode
  NodeState &nodeState;   // lattice element state, context information?
  const std::vector<Lattice*> &dfInfo;  // data flow information

public:

  IntraDFTransferVisitor(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d)
    : func(f), dfNode(n), nodeState(s), dfInfo(d)
  { }

  virtual bool finish() = 0;
  virtual ~IntraDFTransferVisitor() { }

 };



class LiveDeadVarsTransfer : public IntraDFTransferVisitor
{

};


class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice>
{ /* ... */ };

Constant Propagation

template <class LatticeType>
class VariableStateTransfer : public IntraDFTransferVisitor
{
  ...
};

class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice> {};

void
ConstantPropagationAnalysisTransfer::visit(SgIntVal *sgn)
   {
     ROSE_ASSERT(sgn != NULL);
     ConstantPropagationLattice* resLat = getLattice(sgn);
     ROSE_ASSERT(resLat != NULL);
     resLat->setValue(sgn->get_value());
     resLat->setLevel(ConstantPropagationLattice::constantValue);
   }

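The transfer function above sets an integer literal's lattice to a constant value; what happens where control flow merges is decided by the lattice's meet. A minimal sketch of such a lattice, assuming only three levels (the framework's ConstantPropagationLattice has more states, such as the unknown level listed under Lattices). ConstLattice is a hypothetical name; its meetUpdate follows the convention described for LiveVarsLattice::meetUpdate and returns true only if the value changed.

```cpp
#include <cassert>

// Hypothetical constant propagation lattice value (not ConstantPropagationLattice):
// Bottom = no information yet, Constant = exactly one known value, Top = conflicting values.
struct ConstLattice {
  enum Level { Bottom, Constant, Top } level = Bottom;
  int value = 0;  // meaningful only when level == Constant

  // Combine with that and return true if this value changed.
  // Values only move up: Bottom -> Constant -> Top.
  bool meetUpdate(const ConstLattice& that) {
    if (that.level == Bottom || level == Top) return false;
    if (level == Bottom) { *this = that; return true; }
    if (that.level == Top || that.value != value) { level = Top; return true; }
    return false;  // both hold the same constant
  }
};
```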
LiveDead Variable


The transfer functions convert a program point into its GEN and KILL sets. For liveness analysis:

  • KILL(s) = {variables defined in s}
  • GEN(s) = {variables used in s}

OUT = IN - KILL + GEN

  • OUT is initialized to the IN set,
  • the transfer function then applies -KILL + GEN.

class LiveDeadVarsTransfer : public IntraDFTransferVisitor
{
  // LDVAExpressionTransfer (below) updates the GEN/KILL sets directly
  friend class LDVAExpressionTransfer;

  std::string indent;
  LiveVarsLattice* liveLat;  // the result of this analysis

  bool modified;
  // Expressions that are assigned by the current operation
  std::set<SgExpression*> assignedExprs;  // KILL () set
  // Variables that are assigned by the current operation
  std::set<varID> assignedVars;
  // Variables that are used/read by the current operation
  std::set<varID> usedVars;   // GEN () set

  funcSideEffectUses* fseu;

  // marks an expression as used/read (called from LDVAExpressionTransfer)
  void used(SgExpression *);

public:
  LiveDeadVarsTransfer(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d, funcSideEffectUses *fseu_)
    : IntraDFTransferVisitor(f, n, s, d), indent("    "), liveLat(dynamic_cast<LiveVarsLattice*>(*(dfInfo.begin()))), modified(false), fseu(fseu_)
  {
        if(liveDeadAnalysisDebugLevel>=1) Dbg::dbg << indent << "liveLat="<<liveLat->str(indent + "    ")<<std::endl;
        // Make sure that the lattice is initialized
        liveLat->initialize();
  }

  bool finish();
  // operating on different AST nodes
  void visit(SgExpression *);
  void visit(SgInitializedName *);
  void visit(SgReturnStmt *);
  void visit(SgExprStatement *);
  void visit(SgCaseOptionStmt *);
  void visit(SgIfStmt *);
  void visit(SgForStatement *);
  void visit(SgWhileStmt *);
  void visit(SgDoWhileStmt *);
};


// Helper transfer function, focusing on handling expressions.
// live dead variable analysis: LDVA, 
//  expression transfer: transfer functions for expressions
/// Visits live expressions - helper to LiveDeadVarsTransfer
class LDVAExpressionTransfer : public ROSE_VisitorPatternDefaultBase
{
  LiveDeadVarsTransfer &ldva;

public:

  // Plain assignment: lhs = rhs,  set GEN (read/used) and KILL (written/assigned) sets
  void visit(SgAssignOp *sgn) {
    ldva.assignedExprs.insert(sgn->get_lhs_operand());
                                
    // If the lhs of the assignment is a complex expression (i.e. it refers to a variable that may be live) OR
    // if is a known expression that is known to may-be-live
    // THIS CODE ONLY APPLIES TO RHSs THAT ARE SIDE-EFFECT-FREE AND WE DON'T HAVE AN ANALYSIS FOR THAT YET
    /*if(!isVarExpr(sgn->get_lhs_operand()) || 
      (isVarExpr(sgn->get_lhs_operand()) && 
      liveLat->isLiveVar(SgExpr2Var(sgn->get_lhs_operand()))))
      { */
    ldva.used(sgn->get_rhs_operand());
  }
  // ...
};

Call Stack

(gdb) bt
#0  LDVAExpressionTransfer::visit (this=0x7fffffffcea0, sgn=0xa20320)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:228
#1  0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069
#2  0x00002aaaadc61c04 in LiveDeadVarsTransfer::visit (this=0xaf9e00, sgn=0xa20320)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:384
#3  0x00002aaaadbbaef0 in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgBinaryOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:316006
#4  0x00002aaaadbba04a in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgAssignOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:315931
#5  0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069
#6  0x00002aaaadbcca0a in IntraUniDirectionalDataflow::runAnalysis (this=0x7fffffffd9f0, func=..., fState=0xafbd18, analyzeDueToCallers=true, calleesUpdated=...)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.C:282
#7  0x00002aaaadbbf444 in IntraProceduralDataflow::runAnalysis (this=0x7fffffffda00, func=..., state=0xafbd18)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.h:74
#8  0x00002aaaadbb0966 in UnstructuredPassInterDataflow::runAnalysis (this=0x7fffffffda50)
    at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/analysis.C:467
#9  0x000000000040381a in main (argc=2, argv=0x7fffffffdba8)
    at ../../../../../sourcetree/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/liveDeadVarAnalysisTest.C:101

Control flow graph and call graph


The generic dataflow framework works on the virtual control flow graph (virtual CFG) in ROSE.

Filtered Virtual CFG


The raw virtual CFG may not be desirable for all kinds of analyses since it can have too many administrative nodes which are not relevant to a problem.

So the framework provides a filter parameter to the Analysis class. A default filter will be used unless you specify your own filter.

// Example filter function deciding whether a CFGNode should show up or not
bool gfilter (CFGNode cfgn)
{
  SgNode *node = cfgn.getNode();

  switch (node->variantT())
  {
    // Keep the last index for initialized names. This way the def of the variable doesn't propagate to its assign initializer.
    case V_SgInitializedName:
      return (cfgn == node->cfgForEnd());

    // For function calls, we only keep the last node. The function is actually called after all its parameters are evaluated.
    case V_SgFunctionCallExp:
      return (cfgn == node->cfgForEnd());

    // For basic blocks and other "container" nodes, keep the node that appears before the contents are executed
    case V_SgBasicBlock:
    case V_SgExprStatement:
    case V_SgCommaOpExp:
      return (cfgn == node->cfgForBeginning());

    // Must have a default case: return interesting CFGNode by default in this example
    default:
      return cfgn.isInteresting();
  }
}

// Code using the filter function
int
main( int argc, char * argv[] )
{
  SgProject* project = frontend(argc,argv);
  initAnalysis(project);
  LiveDeadVarsAnalysis ldva(project);
  ldva.filter = gfilter; // set the filter to be your own one

  UnstructuredPassInterDataflow ciipd_ldva(&ldva);
  ciipd_ldva.runAnalysis();
  ....
}
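Conceptually, filtering hides uninteresting nodes and connects each kept node directly to the next kept nodes reachable through hidden ones. The sketch below shows that idea on a plain adjacency-list graph. It is a generic illustration, assuming integer node ids and a `keep` predicate playing the role of the filter function. `filterGraph()` is a hypothetical helper, not ROSE's filtered-CFG implementation.

```cpp
#include <functional>
#include <set>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // node id -> successor node ids

// Collect the kept successors reachable from `n`, skipping through filtered-out nodes.
static void keptSuccessors(const Graph& g, int n,
                           const std::function<bool(int)>& keep,
                           std::set<int>& out, std::set<int>& seen) {
  for (int s : g[n]) {
    if (!seen.insert(s).second) continue;       // already explored (guards against cycles)
    if (keep(s)) out.insert(s);                 // kept node: becomes a direct successor
    else keptSuccessors(g, s, keep, out, seen); // hidden node: look through it
  }
}

// Build a graph over the same node ids in which only kept nodes have edges.
Graph filterGraph(const Graph& g, const std::function<bool(int)>& keep) {
  Graph filtered(g.size());
  for (int n = 0; n < static_cast<int>(g.size()); ++n) {
    if (!keep(n)) continue;
    std::set<int> out, seen;
    keptSuccessors(g, n, keep, out, seen);
    filtered[n].assign(out.begin(), out.end());
  }
  return filtered;
}
```

For a chain `0 -> 1 -> 2` where node 1 is an administrative node rejected by the filter, the filtered graph has the single edge `0 -> 2`.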

Analysis Driver


Key function:

bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated)  // analysis/dataflow.C

Basic tasks of running the analysis:

  • initialize the dataflow state: lattices and other information
  • walk the CFG: find the descendants of the current node
  • call the transfer function
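The three tasks can be seen together in a self-contained sketch of a backward (liveness-style) driver. It does not use ROSE. The `Node` struct, the integer node ids and `liveIn()` are hypothetical stand-ins for `DataflowNode`, `NodeState` and the lattices. The worklist re-queues predecessors only when a node's state changes, which is also what makes the iteration terminate.

```cpp
#include <deque>
#include <set>
#include <string>
#include <vector>

using VarSet = std::set<std::string>;

// Hypothetical CFG node: variables read (GEN), variables written (KILL), successor ids.
struct Node {
  VarSet uses, defs;
  std::vector<int> succs;
};

// Minimal backward liveness driver: initialize, walk a worklist, apply the transfer
// function, and re-queue predecessors whose input may have changed.
std::vector<VarSet> liveIn(const std::vector<Node>& cfg) {
  const int n = static_cast<int>(cfg.size());
  std::vector<std::vector<int>> preds(n);
  for (int i = 0; i < n; ++i)
    for (int s : cfg[i].succs) preds[s].push_back(i);

  std::vector<VarSet> in(n);                     // 1. initialize the lattices (empty sets)
  std::deque<int> work;
  for (int i = n - 1; i >= 0; --i) work.push_back(i);

  while (!work.empty()) {                        // 2. walk the CFG
    int b = work.front();
    work.pop_front();
    VarSet out;                                  // meet: union over successors
    for (int s : cfg[b].succs) out.insert(in[s].begin(), in[s].end());
    VarSet newIn = cfg[b].uses;                  // 3. transfer: GEN U (OUT - KILL)
    for (const auto& v : out)
      if (!cfg[b].defs.count(v)) newIn.insert(v);
    if (newIn != in[b]) {                        // changed: predecessors must be revisited
      in[b] = newIn;
      for (int p : preds[b]) work.push_back(p);
    }
  }
  return in;
}
```

Encoding the `bar()` function from the Testing section as four nodes (the `if` condition, `c = a`, `c = b`, `return c`) yields `{a, b, flag}` as the live set at the `if`, matching the pragma in that test input.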

Class Hierarchy

  • Analysis -> IntraProceduralAnalysis -> IntraProceduralDataflow -> IntraUnitDataflow -> IntraUniDirectionalDataflow (the interesting level) -> IntraBWDataflow -> LiveDeadVarsAnalysis
class Analysis {}; // an empty abstract class for any analysis

class IntraProceduralAnalysis : virtual public Analysis  // analysis/analysis.h: any intra-procedural analysis, dataflow or not
{
  protected:
    InterProceduralAnalysis* interAnalysis;
  public:
    void setInterAnalysis(InterProceduralAnalysis* interAnalysis); // connection to the inter-procedural analysis
    virtual bool runAnalysis(const Function& func, NodeState* state)=0;  // run once per function; NodeState stores the lattices for each CFG node, etc.
    virtual ~IntraProceduralAnalysis();
};


// Intra-procedural dataflow analysis. No re-entry: the analysis is (presumably) executed once.
// Lattices now come into play.
class IntraProceduralDataflow : virtual public IntraProceduralAnalysis  // analysis/dataflow.h
{
  // initialize the lattices etc. for a given dataflow node within a function
  virtual void genInitState (const Function& func, const DataflowNode& n, const NodeState& state,
       std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts);

  // the analysis of a function can be triggered by state changes in the function's callers or callees
  virtual bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated)=0;

  std::set<Function> visited; // make sure a function is initialized only once, even when visited multiple times
};


class IntraUnitDataflow : virtual public IntraProceduralDataflow
{
  // transfer function: operate on lattices associated with a dataflow node, considering its current state
 virtual bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo)=0;

};


// Uni directional dataflow: either forward or backward, but not both directions!
class IntraUniDirectionalDataflow : public IntraUnitDataflow {
public:
 bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);

protected:   
  bool propagateStateToNextNode (
             const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex,
             const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode);

  std::vector<DataflowNode> gatherDescendants(std::vector<DataflowEdge> edges,
                                                    DataflowNode (DataflowEdge::*edgeFn)() const);

        virtual NodeState* initializeFunctionNodeState(const Function &func, NodeState *fState) = 0;
        virtual VirtualCFG::dataflow*
          getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0;

        virtual vector<Lattice*> getLatticeAnte(NodeState *state) = 0;
        virtual vector<Lattice*> getLatticePost(NodeState *state) = 0;

        // If we're currently at a function call, use the associated inter-procedural
        // analysis to determine the effect of this function call on the dataflow state.
        virtual void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state) = 0;

        virtual vector<DataflowNode> getDescendants(const DataflowNode &n) = 0;
        // the "ultimate" CFG node: the last node in the direction of the analysis
        // (for a backward analysis, the function's start CFG node; see IntraBWDataflow below)
        virtual DataflowNode getUltimate(const Function &func) = 0;
};

class IntraBWDataflow : public IntraUniDirectionalDataflow { // BW: Backward
   public:
        IntraBWDataflow()
        {}

        NodeState* initializeFunctionNodeState(const Function &func, NodeState *fState);

        VirtualCFG::dataflow*
          getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState);

        virtual vector<Lattice*> getLatticeAnte(NodeState *state);
        virtual vector<Lattice*> getLatticePost(NodeState *state);

        void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state);

        // next CFG nodes, depending on the direction: for a backward analysis, the sources of the incoming edges
        vector<DataflowNode> getDescendants(const DataflowNode &n)
        { return gatherDescendants(n.inEdges(), &DataflowEdge::source); }

        // the last CFG node should be the start CFG node of the function for a backward dataflow problem
        DataflowNode getUltimate(const Function &func)
        { return cfgUtils::getFuncStartCFG(func.get_definition()); }
};

Forward intra-procedural dataflow analysis: e.g., reaching definitions

  • class IntraFWDataflow  : public IntraUniDirectionalDataflow
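The backward class above gets its descendants by following incoming edges to their sources; a forward analysis presumably follows outgoing edges to their targets instead. A ROSE-independent sketch of that design choice, assuming integer node ids and a hypothetical `Edge` struct that mirrors `DataflowEdge::source()`/`target()`:

```cpp
#include <vector>

// Hypothetical edge type: source -> target.
struct Edge { int source, target; };

// Direction-specific part of a unidirectional analysis.
struct Direction {
  virtual ~Direction() = default;
  // Next nodes to visit from n, depending on the analysis direction.
  virtual std::vector<int> descendants(const std::vector<Edge>& edges, int n) const = 0;
};

struct Forward : Direction {
  std::vector<int> descendants(const std::vector<Edge>& edges, int n) const override {
    std::vector<int> next;
    for (const Edge& e : edges)
      if (e.source == n) next.push_back(e.target);  // follow out-edges to their targets
    return next;
  }
};

struct Backward : Direction {
  std::vector<int> descendants(const std::vector<Edge>& edges, int n) const override {
    std::vector<int> next;
    for (const Edge& e : edges)
      if (e.target == n) next.push_back(e.source);  // follow in-edges to their sources
    return next;
  }
};
```

Keeping the driver generic and putting only the direction-specific pieces behind virtual functions is the same split that IntraUniDirectionalDataflow makes with its pure virtual members.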

Initialization: InitDataflowState


Used to initialize the lattices/facts for CFG nodes. It is an analysis by itself: an unstructured pass that visits each CFG node.

// super class: provides the driver of initialization: visit each CFG node

class UnstructuredPassIntraAnalysis : virtual public IntraProceduralAnalysis
{
public:
        // call the initialization function on each CFG node
        bool runAnalysis(const Function& func, NodeState* state);
        // to be implemented by InitDataflowState
        virtual void visit(const Function& func, const DataflowNode& n, NodeState& state)=0;
};

bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state)
{
        DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition());
        DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition());
        
        if(analysisDebugLevel>=2)
                Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis() function "<<func.get_name().getString()<<"()\n";
        
        // iterate over all the nodes in this function
        for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end(); it++)
        {
                DataflowNode n = *it;
                // The number of NodeStates associated with the given dataflow node
                //int numStates=NodeState::numNodeStates(n);
                // The actual NodeStates associated with the given dataflow node
                const vector<NodeState*> nodeStates = NodeState::getNodeStates(n);
                
                // Visit each CFG node
                for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); itS++)
                        visit(func, n, *(*itS));
        }
        return false;
}
//-------------------- derived class: provides the link to a concrete analysis and the visit() implementation
class InitDataflowState : public UnstructuredPassIntraAnalysis
{
        IntraProceduralDataflow* dfAnalysis;  // link to the dataflow analysis to be initialized
       
        public:
        InitDataflowState(IntraProceduralDataflow* dfAnalysis/*, std::vector<Lattice*> &initState*/)
        {
                this->dfAnalysis = dfAnalysis;
        }
     
        void visit(const Function& func, const DataflowNode& n, NodeState& state);
};


void InitDataflowState::visit (const Function& func, const DataflowNode& n, NodeState& state)
{
   ...
   dfAnalysis->genInitState(func, n, state, initLats, initFacts);
   state.setLattices((Analysis*)dfAnalysis, initLats);
   state.setFacts((Analysis*)dfAnalysis, initFacts);
   ....
}
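The same two-level structure (a generic pass that visits every node once, and a derived pass that asks the analysis for each node's initial state) can be sketched without ROSE. All names below are hypothetical. The lattice is assumed to be a set of variable names, and a `std::map` stands in for `NodeState`.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Lattice = std::set<std::string>;  // hypothetical lattice: a set of variable names

// Hypothetical analysis interface: supplies the initial lattice for a node.
struct DataflowAnalysis {
  virtual ~DataflowAnalysis() = default;
  virtual Lattice genInitState(int node) const = 0;
};

// Example analysis: every node starts with no live variables.
struct EmptyLiveness : DataflowAnalysis {
  Lattice genInitState(int) const override { return {}; }
};

// Unstructured pass: visits every node exactly once, with no transfer
// functions and no fixed-point iteration.
struct UnstructuredPass {
  virtual ~UnstructuredPass() = default;
  virtual void visit(int node) = 0;
  void run(const std::vector<int>& nodes) {
    for (int n : nodes) visit(n);
  }
};

// Initialization pass: stores the analysis' initial lattice for each node.
struct InitState : UnstructuredPass {
  const DataflowAnalysis& analysis;
  std::map<int, Lattice> state;  // stands in for NodeState
  explicit InitState(const DataflowAnalysis& a) : analysis(a) {}
  void visit(int node) override { state[node] = analysis.genInitState(node); }
};
```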

Worklist


A worklist is a list of CFG nodes, accessed through an iterator interface:

auto_ptr<VirtualCFG::dataflow> workList(getInitialWorklist(func, firstVisit, analyzeDueToCallers, calleesUpdated, fState));


class iterator // Declared in cfgUtils/VirtualCFGIterator.h
{
public:
    std::list<DataflowNode> remainingNodes;
    std::set<DataflowNode> visited;
    bool initialized;

protected:
    // returns true if the given DataflowNode is in the remainingNodes list and false otherwise
    bool isRemaining(DataflowNode n);

    // advances this iterator in the given direction. Forwards if fwDir=true and backwards if fwDir=false.
    // if pushAllChildren=true, all of the current node's unvisited children (predecessors or successors,
    //    depending on fwDir) are pushed onto remainingNodes
    void advance(bool fwDir, bool pushAllChildren);

public:
    virtual void operator ++ (int);

    bool eq(const iterator& other_it) const;

    bool operator==(const iterator& other_it) const;

    bool operator!=(const iterator& it) const;
...

};


void iterator::advance(bool fwDir, bool pushAllChildren)
{
        ROSE_ASSERT(initialized);
        if(remainingNodes.size()>0)
        {
                // pop the next CFG node from the front of the list
                DataflowNode cur = remainingNodes.front();
                remainingNodes.pop_front();

                if(pushAllChildren)
                {
                        // find its followers (either successors or predecessors, depending on value of fwDir), push back
                        // those that have not yet been visited
                        vector<DataflowEdge> nextE;
                        if(fwDir)
                                nextE = cur.outEdges();
                        else
                                nextE = cur.inEdges();
                        for(vector<DataflowEdge>::iterator it=nextE.begin(); it!=nextE.end(); it++)
                        {
                                // DataflowNode has no default constructor, so nextN must be initialized with something first
                                DataflowNode nextN((*it).target());
                                if(fwDir) nextN = (*it).target();
                                else nextN = (*it).source();

                                // if we haven't yet visited this node and don't yet have it on the remainingNodes list
                                if(visited.find(nextN) == visited.end() &&
                                        !isRemaining(nextN))
                                {
                                        remainingNodes.push_back(nextN);
                                }
                        }
                }

                // if we still have any nodes left remaining
                if(remainingNodes.size()>0)
                {
                        // take the next node from the front of the list and mark it as visited
                        visited.insert(remainingNodes.front());
                }
        }
}
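The bookkeeping in advance(true, true) can be replayed on a plain graph to see the order in which nodes are reached. `visitOrder()` is a hypothetical helper written for illustration. It assumes the start node is marked visited when the iterator is created, and uses integer node ids instead of DataflowNode.

```cpp
#include <algorithm>
#include <list>
#include <set>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // node id -> successor ids

// Order in which a worklist iterator in the style of iterator::advance() reaches nodes:
// pop the front, append children that are neither visited nor already waiting,
// then mark the new front as visited.
std::vector<int> visitOrder(const Graph& g, int start) {
  std::list<int> remaining{start};
  std::set<int> visited{start};
  std::vector<int> order;
  while (!remaining.empty()) {
    int cur = remaining.front();
    order.push_back(cur);
    remaining.pop_front();
    for (int next : g[cur]) {
      bool waiting = std::find(remaining.begin(), remaining.end(), next) != remaining.end();
      if (!visited.count(next) && !waiting) remaining.push_back(next);
    }
    if (!remaining.empty()) visited.insert(remaining.front());
  }
  return order;
}
```

Because a node is skipped when it is either visited or still waiting in the list, each reachable node is reached once, even in the presence of joins and back edges.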



class dataflow :  public virtual iterator {};

class back_dataflow:  public virtual dataflow {};

void back_dataflow::operator ++ (int)
{
        advance(false, true);  // backward,  add all children
}


class IntraUniDirectionalDataflow : public IntraUnitDataflow
{  ...
   virtual VirtualCFG::dataflow*
          getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0;
};

Implemented in derived classes:

  • VirtualCFG::dataflow* IntraFWDataflow::getInitialWorklist ()
  • VirtualCFG::dataflow* IntraBWDataflow::getInitialWorklist()

Apply transfer function


Let b be a basic block in the CFG:

  • The information flowing into b is the union (join) of the information flowing out of all predecessor nodes of b.
  • The information flowing out of b is the information flowing into b, minus the information killed by b, plus the information generated by b. This is the transfer function operating on b.
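Written as equations for a forward problem, with pred(b) the set of predecessors of b, this reads as follows (a backward problem such as liveness swaps predecessors for successors and IN for OUT). It matches the "(IN - KILL) + GEN" comment in the code below.

```latex
\mathrm{IN}[b] = \bigcup_{p \in \mathrm{pred}(b)} \mathrm{OUT}[p]
\qquad
\mathrm{OUT}[b] = \mathrm{GEN}[b] \cup \left( \mathrm{IN}[b] \setminus \mathrm{KILL}[b] \right)
```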
bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated)
{

      // Iterate over the nodes in this function that are downstream from the nodes added above
        for(; it != itEnd; it++)
        {
                DataflowNode n = *it;
                SgNode* sgn = n.getNode();

  ...
               for(vector<NodeState*>::const_iterator itS = nodeStates.begin(); itS!=nodeStates.end(); )
                {
                        state = *itS;

                        const vector<Lattice*> dfInfoAnte = getLatticeAnte(state);  // IN set
                        const vector<Lattice*> dfInfoPost = getLatticePost(state);   // OUT set

                        // OUT = IN first: transfer within the node goes from IN to OUT.
                        // Overwrite the Lattices below this node with the lattices above this node.
                        // The transfer function will then operate on these Lattices to produce the
                        // correct state below this node.
                        
                        vector<Lattice*>::const_iterator itA, itP;
                        int j=0;
                        for(itA  = dfInfoAnte.begin(), itP  = dfInfoPost.begin();
                            itA != dfInfoAnte.end() && itP != dfInfoPost.end(); 
                            itA++, itP++, j++)
                        {
                                if(analysisDebugLevel>=1){  // 
                                        Dbg::dbg << "    Meet Before: Lattice "<<j<<": \n        "<<(*itA)->str("            ")<<endl;
                                        Dbg::dbg << "    Meet After: Lattice "<<j<<": \n        "<<(*itP)->str("            ")<<endl;
                                }
                                (*itP)->copy(*itA);
                                /*if(analysisDebugLevel>=1){
                                        Dbg::dbg << "    Copied Meet Below: Lattice "<<j<<": \n        "<<(*itB)->str("            ")<<endl;
                                }*/
                        }
                        
                        // =================== TRANSFER FUNCTION ===================
                        //  (IN - KILL ) + GEN
                        if (isSgFunctionCallExp(sgn))
                          transferFunctionCall(func, n, state);

                        boost::shared_ptr<IntraDFTransferVisitor> transferVisitor = getTransferVisitor(func, n, *state, dfInfoPost);
                        sgn->accept(*transferVisitor);
                        modified = transferVisitor->finish() || modified;

                        // =================== TRANSFER FUNCTION ===================
       ...//
    }

} 

Propagate state to next node (meetUpdate)


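This step has proven essential for propagating information along the path; it cannot be commented out.

It differs from the earlier step (the "Meet Before"/"Meet After" debug output in runAnalysis()): that step copies a node's own incoming lattices (IN) to its outgoing lattices (OUT) before the transfer function runs. This step instead merges the current node's resulting lattices into the next node's lattices, across a CFG edge.

meetUpdate() is called here as well; for infinite lattices, the result of the meet is additionally widened with widenUpdate().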

// Propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's 
//     NodeState (nextNodeState).
// Returns true if the next node's meet state is modified and false otherwise.
bool IntraUniDirectionalDataflow::propagateStateToNextNode(
                      const vector<Lattice*>& curNodeState, DataflowNode curNode, int curNodeIndex,
                      const vector<Lattice*>& nextNodeState, DataflowNode nextNode)
{
        bool modified = false;
        vector<Lattice*>::const_iterator itC, itN;
        if(analysisDebugLevel>=1){
                Dbg::dbg << "\n        Propagating to Next Node: "<<nextNode.getNode()<<"["<<nextNode.getNode()->class_name()<<" | "<<Dbg::escape(nextNode.getNode()->unparseToString())<<"]"<<endl;
                int j;
                for(j=0, itC = curNodeState.begin(); itC != curNodeState.end(); itC++, j++)
                        Dbg::dbg << "        Cur node: Lattice "<<j<<": \n            "<<(*itC)->str("            ")<<endl;
                for(j=0, itN = nextNodeState.begin(); itN != nextNodeState.end(); itN++, j++)
                        Dbg::dbg << "        Next node: Lattice "<<j<<": \n            "<<(*itN)->str("            ")<<endl;
        }

        // Update forward info above nextNode from the forward info below curNode.
        
        // Compute the meet of the dataflow information along the curNode->nextNode edge with the 
        // next node's current state one Lattice at a time and save the result above the next node.
        for(itC = curNodeState.begin(), itN = nextNodeState.begin();
            itC != curNodeState.end() && itN != nextNodeState.end(); 
            itC++, itN++)
        {
                // Finite Lattices can use the regular meet operator, while infinite Lattices
                // must also perform widening to ensure convergence.
                if((*itN)->finiteLattice())
                        modified = (*itN)->meetUpdate(*itC) || modified;
                else
                {
                        //InfiniteLattice* meetResult = (InfiniteLattice*)itN->second->meet(itC->second);
                        InfiniteLattice* meetResult = dynamic_cast<InfiniteLattice*>((*itN)->copy());
                        Dbg::dbg << "        *itN: " << dynamic_cast<InfiniteLattice*>(*itN)->str("            ") << endl;
                        Dbg::dbg << "        *itC: " << dynamic_cast<InfiniteLattice*>(*itC)->str("            ") << endl;
                        meetResult->meetUpdate(*itC);
                        Dbg::dbg << "        meetResult: " << meetResult->str("            ") << endl;
                
                        // Widen the resulting meet
                        modified = dynamic_cast<InfiniteLattice*>(*itN)->widenUpdate(meetResult) || modified;
                        delete meetResult;
                }
        }
        
        if(analysisDebugLevel>=1) {
                if(modified)
                {
                        Dbg::dbg << "        Next node's in-data modified. Adding..."<<endl;
                        int j=0;
                        for(itN = nextNodeState.begin(); itN != nextNodeState.end(); itN++, j++)
                        {
                                Dbg::dbg << "        Propagated: Lattice "<<j<<": \n            "<<(*itN)->str("            ")<<endl;
                        }
                }
                else
                        Dbg::dbg << "        No modification on this node"<<endl;
        }

        return modified;
}
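The infinite-lattice branch can be illustrated with an interval lattice, a standard example of a lattice with infinitely long chains. `Interval`, its `meetUpdate()`/`widenUpdate()` members and `propagate()` are hypothetical stand-ins written for this sketch. They only mimic the names and the copy, meet, widen sequence used above. The "meet" here merges information by taking the hull of two intervals, and the widening simply sends any growing bound to infinity.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical interval lattice [lo, hi]; `empty` marks bottom (no information yet).
struct Interval {
  long lo, hi;
  bool empty;

  static Interval bottom() { return {0, 0, true}; }

  // Merge `o` into *this (hull of the two intervals); returns true if *this changed.
  bool meetUpdate(const Interval& o) {
    if (o.empty) return false;
    if (empty) { *this = o; return true; }
    Interval old = *this;
    lo = std::min(lo, o.lo);
    hi = std::max(hi, o.hi);
    return lo != old.lo || hi != old.hi;
  }

  // Widening: any bound that grew jumps straight to infinity, so repeated
  // updates stabilize after a few steps even though the lattice is infinite.
  bool widenUpdate(const Interval& next) {
    if (empty) { *this = next; return !next.empty; }
    if (next.empty) return false;
    bool changed = false;
    if (next.lo < lo) { lo = std::numeric_limits<long>::min(); changed = true; }
    if (next.hi > hi) { hi = std::numeric_limits<long>::max(); changed = true; }
    return changed;
  }
};

// Mirrors the infinite-lattice branch above: meet a copy of the next node's
// state with the current node's state, then widen the next node's state with it.
bool propagate(const Interval& cur, Interval& next) {
  Interval meetResult = next;
  meetResult.meetUpdate(cur);
  return next.widenUpdate(meetResult);
}
```

A second propagation of the same state reports no modification, which is what lets the worklist drain.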

Stop condition

class IntraUniDirectionalDataflow : public IntraUnitDataflow
{
public:
protected:
        // propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's NodeState (nextNodeState)
        // returns true if any state is modified
        bool propagateStateToNextNode(
             const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex,
             const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode);
};

Live dead variable analysis


Backward intra-procedural dataflow analysis: e.g., liveness analysis (information flows backward, from uses to definitions)

  • class IntraBWDataflow  : public IntraUniDirectionalDataflow
class LiveDeadVarsAnalysis : public IntraBWDataflow {

  protected:
        funcSideEffectUses* fseu;
       
  public:
        LiveDeadVarsAnalysis(SgProject *project, funcSideEffectUses* fseu=NULL);
        
 // Generates the initial lattice state for the given dataflow node, in the given function, with the given NodeState
 void genInitState(const Function& func, const DataflowNode& n, const NodeState& state,
                          std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts);

        
  boost::shared_ptr<IntraDFTransferVisitor> getTransferVisitor(const Function& func, const DataflowNode& n,
                                                               NodeState& state, const std::vector<Lattice*>& dfInfo)
  { return boost::shared_ptr<IntraDFTransferVisitor>(new LiveDeadVarsTransfer(func, n, state, dfInfo, fseu)); }

  bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo) { assert(0); return false; }


}; 

Inter-procedural analysis


Key: a transfer function applied to call sites to perform the appropriate state transfers across function boundaries.

Transfer function


void IntraFWDataflow::transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state)
{
  vector<Lattice*> dfInfoBelow = state->getLatticeBelow(this);

  vector<Lattice*>* retState = NULL;
  dynamic_cast<InterProceduralDataflow*>(interAnalysis)->
    transfer(func, n, *state, dfInfoBelow, &retState, true);

  if(retState && !(retState->size()==0 || (retState->size() == dfInfoBelow.size()))) {
    Dbg::dbg << "#retState="<<retState->size()<<endl;
    for(vector<Lattice*>::iterator ml=retState->begin(); ml!=retState->end(); ml++)
      Dbg::dbg << "        "<<(*ml)->str("            ")<<endl;
    Dbg::dbg << "#dfInfoBelow="<<dfInfoBelow.size()<<endl;
    for(vector<Lattice*>::const_iterator l=dfInfoBelow.begin(); l!=dfInfoBelow.end(); l++)
      Dbg::dbg << "        "<<(*l)->str("            ")<<endl;
  }

  // Incorporate information about the function's return value into the caller's dataflow state
  // as the information of the SgFunctionCallExp
  ROSE_ASSERT(retState==NULL || retState->size()==0 || (retState->size() == dfInfoBelow.size()));
  if(retState) {
    vector<Lattice*>::iterator lRet;
    vector<Lattice*>::const_iterator lDF;
    for(lRet=retState->begin(), lDF=dfInfoBelow.begin(); 
        lRet!=retState->end(); lRet++, lDF++) {
      Dbg::dbg << "    lDF Before="<<(*lDF)->str("        ")<<endl;
      Dbg::dbg << "    lRet Before="<<(*lRet)->str("        ")<<endl;
      (*lDF)->unProject(isSgFunctionCallExp(n.getNode()), *lRet);
      Dbg::dbg << "    lDF After="<<(*lDF)->str("        ")<<endl;
    }
  }
}

InterProceduralDataflow

InterProceduralDataflow::InterProceduralDataflow(IntraProceduralDataflow* intraDataflowAnalysis) :
        InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis)


 // !!! NOTE: cfgForEnd() AND cfgForBeginning() PRODUCE THE SAME SgFunctionDefinition SgNode BUT THE DIFFERENT INDEXES
                        // !!!       (0 FOR BEGINNING AND 3 FOR END). AS SUCH, IT DOESN'T MATTER WHICH ONE WE CHOOSE. HOWEVER, IT DOES MATTER
                        // !!!       WHETHER WE CALL genInitState TO GENERATE THE STATE BELOW THE NODE (START OF THE FUNCTION) OR ABOVE IT 
                        // !!!       (END OF THE FUNCTION). THE CAPABILITY TO DIFFERENTIATE THE TWO CASES NEEDS TO BE ADDED TO genInitState
                        // !!!       AND WHEN IT IS, WE'LL NEED TO CALL IT INDEPENDENTLY FOR cfgForEnd() AND cfgForBeginning() AND ALSO TO MAKE
                        // !!!       TO SET THE LATTICES ABOVE THE ANALYSIS 


TODO: the issue with the beginning and end CFG nodes of the function definition is described in the comment above.

Simplest form: unstructured


Simplest form: no transfer action at call sites at all.

class UnstructuredPassInterDataflow : virtual public InterProceduralDataflow
{
        public:
        
        UnstructuredPassInterDataflow(IntraProceduralDataflow* intraDataflowAnalysis) 
                             : InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis), InterProceduralDataflow(intraDataflowAnalysis)
        {}
                
        // the transfer function that is applied to SgFunctionCallExp nodes to perform the appropriate state transfers
        // fw - =true if this is a forward analysis and =false if this is a backward analysis
        // n - the dataflow node that is being processed
        // state - the NodeState object that describes the dataflow state immediately before (if fw=true) or immediately after 
        //         (if fw=false) the SgFunctionCallExp node, as established by earlier analysis passes
        // dfInfo - the Lattices that this transfer function operates on. The function propagates them 
        //          to the calling function and overwrites them with the dataflow result of calling this function.
        // retState - Pointer reference to a Lattice* vector that will be assigned to point to the lattices of
        //          the function call's return value. The callee may not modify these lattices.
        // Returns true if any of the input lattices changed as a result of the transfer function and
        //    false otherwise.  
        bool transfer(const Function& func, const DataflowNode& n, NodeState& state, 
                      const std::vector<Lattice*>& dfInfo, std::vector<Lattice*>** retState, bool fw)
        { 
                return false;
        }
        
        void runAnalysis();
};

// simply call intra-procedural analysis on each function one by one.
void UnstructuredPassInterDataflow::runAnalysis()
{
        set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs();
        
        // iterate over all functions with bodies
        for(set<FunctionState*>::iterator it=allFuncs.begin(); it!=allFuncs.end(); it++)
        {
                const Function& func = (*it)->func;
                FunctionState* fState = FunctionState::getDefinedFuncState(func);
                
                // Call the current intra-procedural dataflow as if it were a generic analysis
                intraAnalysis->runAnalysis(func, &(fState->state));
        }
}

ContextInsensitiveInterProceduralDataflow


TODO

How to use one analysis


Call directly


Direct call: runs the intra-procedural analysis on the given function and returns true if the function's NodeState is modified as a result, and false otherwise. The state parameter is the function's NodeState.

  • bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);
  • A direct call with the simpler parameter list is not feasible: every intra-procedural analysis must have an inter-procedural analysis set internally!
bool IntraProceduralDataflow::runAnalysis(const Function& func, NodeState* state)
{
  // Each function is analyzed as if it were called directly by the language's runtime, ignoring
  // the application's actual call graph
  bool analyzeDueToCallers = true;

  // We ignore the application's call graph, so it doesn't matter whether this function calls other functions
  std::set<Function> calleesUpdated;

  return runAnalysis(func, state, analyzeDueToCallers, calleesUpdated);
}

Through inter-procedural analysis


Invoke a simple intra-procedural analysis through the unstructured pass inter-procedural data flow class

int main(int argc, char* argv[])
{
  SgProject* project = frontend(argc,argv);
  initAnalysis(project);

  // prepare debugging support
  Dbg::init("Live dead variable analysis Test", ".", "index.html");
  liveDeadAnalysisDebugLevel = 1;
  analysisDebugLevel = 1;

  // base analysis
  LiveDeadVarsAnalysis ldva(project);
  // wrap it inside the unstructured inter-procedural data flow
  UnstructuredPassInterDataflow ciipd_ldva(&ldva);
  ciipd_ldva.runAnalysis();

  .....

}

Retrieve lattices


Sample code:

// Fill vars with all the variables and expressions that are live at the DataflowNode whose NodeState is given
void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent)
{
  // Liveness keeps one LiveVarsLattice above and one below each node
  LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin()));
  LiveVarsLattice* liveLBelow = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeBelow(ldva).begin()));
  ROSE_ASSERT(liveLAbove != NULL && liveLBelow != NULL);

  // The set of live vars AT this node is the union of vars that are live above it and below it
  vars.insert(liveLAbove->liveVars.begin(), liveLAbove->liveVars.end());
  vars.insert(liveLBelow->liveVars.begin(), liveLBelow->liveVars.end());
}

Testing

It is essential to have a way to test that the analysis results are correct.

We currently use a primitive way to test the correctness of an analysis: comparing the expected lattice string written in a pragma with the lattice string output by the analysis.

Two example translators that test analysis correctness this way (comparing pragma and lattice string output):


An example test input file for checking the correctness of liveness analysis:

int bar(int flag)
{

   int a =1,b,c;
#pragma rose [LiveVarsLattice: liveVars=[flag, a, b]]
   if (flag == 0) // flag is only read here, not written!
     c = a;
   else
     c = b;
   return c;
}
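A test translator has to pull the expected lattice out of the pragma and compare it with the analysis output. The following standalone sketch assumes both sides are available as strings in the format shown above; parseLiveVars and latticeMatchesPragma are hypothetical helpers, not ROSE functions. It compares the variable lists as sets, so the order in which variables are printed does not matter.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: extract the names listed in "liveVars=[...]" from a
// string such as "[LiveVarsLattice: liveVars=[flag, a, b]]".
std::set<std::string> parseLiveVars(const std::string& text)
{
  std::set<std::string> vars;
  const std::string key = "liveVars=[";
  std::string::size_type begin = text.find(key);
  if (begin == std::string::npos)
    return vars;
  begin += key.size();
  std::string::size_type end = text.find(']', begin);
  if (end == std::string::npos)
    return vars;

  std::istringstream list(text.substr(begin, end - begin));
  std::string item;
  while (std::getline(list, item, ',')) {
    // Trim the spaces that follow each comma.
    std::string::size_type first = item.find_first_not_of(' ');
    if (first == std::string::npos)
      continue;
    std::string::size_type last = item.find_last_not_of(' ');
    vars.insert(item.substr(first, last - first + 1));
  }
  return vars;
}

// Compare the expected lattice (pragma text) with the analysis output,
// ignoring the order in which the variables are printed.
bool latticeMatchesPragma(const std::string& pragmaText, const std::string& latticeText)
{
  return parseLiveVars(pragmaText) == parseLiveVars(latticeText);
}
```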

How to debug

Trace the analysis

Turn tracing on:

     liveDeadAnalysisDebugLevel = 1;
     analysisDebugLevel = 1;


To find the code that produces trace output, search the analysis source for:

 if(analysisDebugLevel>=1) ...

Check the web page dump using a browser:

 firefox index.html

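How to read the trace file: start from the beginning. Information is ordered by the CFG nodes visited, which may be in forward or backward order depending on the analysis. First check that the visiting order is correct; then, for each node visited, check the three steps shown below: copying the incoming lattice to the outgoing lattice, transferring the outgoing lattice, and propagating/merging it to all descendant nodes.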

 ==================================  
  Copying incoming Lattice 0: 
        [LiveVarsLattice: liveVars=[b]]
  To outgoing Lattice 0: 
        [LiveVarsLattice: liveVars=[]]
 ==================================  
  Transferring the outgoing  Lattice ... 
    liveLat=[LiveVarsLattice: liveVars=[b]]
    Dead Expression
        usedVars=<>
        assignedVars=<>
        assignedExprs=<>
        #usedVars=0 #assignedExprs=0
    Transferred: outgoing Lattice 0: 
        [LiveVarsLattice: liveVars=[b]]
    transferred, modified=0
 ==================================  
 Propagating/Merging the outgoing  Lattice to all descendant nodes ... 
    Descendants (1):
    ~~~~~~~~~~~~
    Descendant: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;]

        Propagating to Next Node: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;]
        Cur node: Lattice 0: 
            [LiveVarsLattice: liveVars=[b]]
        Next node: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]]
        Next node's in-data modified. Adding...
        Propagated: Lattice 0: 
            [LiveVarsLattice: liveVars=[a, b]]
    propagated/merged, modified=1
    ^^^^^^^^^^^^^^^^^^ 
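For liveness, the transfer step corresponds to the textbook backward equation: a variable is live above a node if it is used there, or if it is live below the node and not assigned there. The standalone sketch below applies that equation to plain string sets; transferLiveness is a hypothetical function, not ROSE's LiveDeadVarsAnalysis code, and it only mirrors the usedVars/assignedVars fields and the "transferred, modified=..." line of the trace.

```cpp
#include <cassert>
#include <set>
#include <string>

using VarSet = std::set<std::string>;

// Textbook backward liveness transfer (a sketch, not ROSE's code):
//   live-above = usedVars U (live-below - assignedVars)
// `live` holds the live-below set on entry and the live-above set on return.
// Returns true if the set changed, like "transferred, modified=..." in the trace.
bool transferLiveness(VarSet& live, const VarSet& usedVars, const VarSet& assignedVars)
{
  VarSet result;
  for (const std::string& v : live)
    if (assignedVars.count(v) == 0)  // an assignment kills liveness
      result.insert(v);
  result.insert(usedVars.begin(), usedVars.end());  // a use makes a variable live
  const bool modified = (result != live);
  live = result;
  return modified;
}
```

With empty usedVars and assignedVars (the "Dead Expression" case in the trace), the lattice [b] passes through unchanged and modified is false; for c = a; with c live below, the result is [a].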

A real example: in if (flag == 0) c = a; else c = b; under liveness analysis, a is live in the then-branch and b in the else-branch, and both are propagated backward to the if-stmt:

   ------------------
    Descendants (1):  // from c =a back to if-stmt (next node)
    ~~~~~~~~~~~~
    Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]

        Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]
        Cur node: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]]   // current node's lattice
        Next node: Lattice 0: 
            [LiveVarsLattice: liveVars=[]]   // next node's lattice before propagation
        Next node's in-data modified. Adding...
        Propagated: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]]  // propagate a into if-stmt's lattice
    propagated, modified=1
    ^^^^^^^^^^^^^^^^^^ 

    ------------------
    Descendants (1):  // from c = b --> if-stmt 
    ~~~~~~~~~~~~
    Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]

        Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]
        Cur node: Lattice 0: 
            [LiveVarsLattice: liveVars=[b]]
        Next node: Lattice 0: 
            [LiveVarsLattice: liveVars=[a]] 
        Next node's in-data modified. Adding...
        Propagated: Lattice 0: 
            [LiveVarsLattice: liveVars=[a, b]]  // now both a and b are propagated/ merged
    propagated, modified=1
    ^^^^^^^^^^^^^^^^^^ 
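The propagation step in this trace is a set union into the next node's lattice, with the "modified" flag recording whether anything new was added. A minimal standalone sketch of that merge (propagateLiveVars is a hypothetical name; ROSE's lattices implement merging through their own classes):

```cpp
#include <cassert>
#include <set>
#include <string>

using VarSet = std::set<std::string>;

// Merge (union) the current node's live variables into the next node's
// lattice, as in the "Propagating to Next Node" steps of the trace.
// Returns true if the next node's lattice changed ("modified=1").
bool propagateLiveVars(const VarSet& cur, VarSet& next)
{
  bool modified = false;
  for (const std::string& v : cur)
    if (next.insert(v).second)  // insert() reports whether v was new
      modified = true;
  return modified;
}
```

Replaying the trace: propagating [a] from c = a into the empty if-stmt lattice yields [a] with modified=1; propagating [b] from c = b then yields [a, b] with modified=1. A further propagation that adds nothing returns false, which is what lets the iterative analysis reach a fixed point.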

Dump cfg dot graph with lattices

A class, analysisStatesToDOT, is provided to generate a CFG dot graph annotated with lattice information.

//AnalysisDebuggingUtils.C

  class analysisStatesToDOT : public UnstructuredPassIntraAnalysis
  {
    private:
      //    LiveDeadVarsAnalysis* lda; // reference to the source analysis
      Analysis* lda; // reference to the source analysis
      void printEdge(const DataflowEdge& e); // print data flow edge
      void printNode(const DataflowNode& n, std::string state_string); // print dataflow node
      void visit(const Function& func, const DataflowNode& n, NodeState& state); // visitor function
    public:
      std::ostream* ostr; 
      analysisStatesToDOT (Analysis* l):  lda(l){ };
  };

namespace Dbg
{ 
//....
 void dotGraphGenerator (::Analysis *a) 
  {
    ::analysisStatesToDOT eas(a);
    IntraAnalysisResultsToDotFiles upia_eas(eas);
    upia_eas.runAnalysis();
  }

} // namespace Dbg
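To show what kind of file such a dump contains, here is a standalone sketch that writes a DOT graph whose node labels carry both the statement and its lattice string. CfgNode, escapeDot and toDot are hypothetical names; the real analysisStatesToDOT works on ROSE's DataflowNode and NodeState objects, and its exact output format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified CFG node: statement text plus its lattice string.
struct CfgNode {
  std::string code;
  std::string lattice;
};

// Escape characters that are special inside a double-quoted DOT label.
std::string escapeDot(const std::string& s)
{
  std::string out;
  for (char c : s) {
    if (c == '"' || c == '\\')
      out += '\\';
    out += c;
  }
  return out;
}

// Write a DOT graph in which each node shows its code and, on a second
// label line ("\n" in DOT), its lattice.
std::string toDot(const std::vector<CfgNode>& nodes,
                  const std::vector<std::pair<int, int>>& edges)
{
  std::ostringstream os;
  os << "digraph cfg {\n";
  for (std::size_t i = 0; i < nodes.size(); ++i)
    os << "  n" << i << " [label=\"" << escapeDot(nodes[i].code)
       << "\\n" << escapeDot(nodes[i].lattice) << "\"];\n";
  for (const auto& e : edges)
    os << "  n" << e.first << " -> n" << e.second << ";\n";
  os << "}\n";
  return os.str();
}
```

The resulting text can be rendered with Graphviz, for example with dot -Tpng.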

Example use

// Liao, 12/6/2011
#include "rose.h"

#include <list>
#include <sstream>
#include <iostream>
#include <fstream>
#include <string>
#include <map>

using namespace std;

// TODO group them into one header
#include "genericDataflowCommon.h"
#include "VirtualCFGIterator.h"
#include "cfgUtils.h"
#include "CallGraphTraverse.h"
#include "analysisCommon.h"
#include "analysis.h"
#include "dataflow.h"
#include "latticeFull.h"
#include "printAnalysisStates.h"
#include "liveDeadVarAnalysis.h"

int numFails = 0, numPass = 0;

//-----------------------------------------------------------
int
main( int argc, char * argv[] )
   {

     SgProject* project = frontend(argc,argv);

     initAnalysis(project);

   // generating  index.html for tracing the analysis
     Dbg::init("Live dead variable analysis Test", ".", "index.html");
     liveDeadAnalysisDebugLevel = 1;
     analysisDebugLevel = 1;

     LiveDeadVarsAnalysis ldva(project);
     UnstructuredPassInterDataflow ciipd_ldva(&ldva);
     ciipd_ldva.runAnalysis();
     // Output the dot graph  *********************
     Dbg::dotGraphGenerator(&ldva);
     return 0;
   }

TODO

  • The generated lattices are hard to use because they contain many temporary expression objects, which users often do not care about (e.g., in constant propagation and pointer analysis).
    • To see the problem, go to build64/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests
    • Run make check
    • View the dot graph dump of an analysis: run.sh test_ptr4.C_main_0x2b41e651c038_cfg.dot

Program Optimizations

ROSE provides the following program optimizations and transformations:

  • loop transformations, including loop fusion, fission, unrolling, blocking, loop interchange, etc.
  • inlining
  • outlining
  • constant folding
  • partial redundancy elimination
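As an illustration of one of these transformations, the following hand-written C++ shows a loop before and after unrolling by a factor of 4. It is only a sketch of the idea: the code that ROSE's loop unrolling actually emits may look different, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop.
double sumOriginal(const std::vector<double>& a)
{
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    s += a[i];
  return s;
}

// The same loop unrolled by a factor of 4, followed by a remainder loop for
// the iterations left over when the trip count is not a multiple of 4.
double sumUnrolled(const std::vector<double>& a)
{
  double s = 0.0;
  const std::size_t n = a.size();
  std::size_t i = 0;
  for (; i + 3 < n; i += 4) {  // i + 3 < n avoids unsigned underflow of n - 3
    s += a[i];
    s += a[i + 1];
    s += a[i + 2];
    s += a[i + 3];
  }
  for (; i < n; ++i)  // remainder iterations
    s += a[i];
  return s;
}
```

Because the additions happen in the same order as in the original loop, the unrolled version returns bit-identical floating-point results.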

ROSE Projects

This page serves as a quick guide to the major directories under rose/projects:

Project Overview

Parsing

  • pragmaParsing: An example translator using the parsing building blocks provided by ROSE to parse pragmas

Translations:

  • autoTuning: a project to use ROSE's parameterized translators to facilitate empirical tuning (or autotuning)
  • DataFaultTolerance: a project to use source-to-source translation to make applications resilient to memory faults
  • extractMPISkeleton: extract MPI communication skeletons
  • Fortran_to_C: a Fortran-to-C language translator

Static Analysis

  • compass: a static analysis tool to find errors in applications

Dynamic Analysis

  • RTED: runtime error detection using compiler instrumentation of library calls.

Binary Analysis:

  • BinaryCloneDetection: detect similarities between binary executables.
  • CloneDetection:


Optimizations of high-level abstractions

  • arrayOptimization: optimizations based on array abstractions
  • autoParallelization: A translator which can automatically insert OpenMP directives into serial code, based on dependence analysis and optionally semantics of abstractions.

Parallel Programming Models:

  • mint: a directive based programming model for GPUs
  • OpenMP_Translator: the first version of OpenMP implementation using ROSE. Not recommended for production use, kept just as an example.
  • UpcTranslation: a preliminary example project demonstrating how ROSE can be used to create a UPC compiler

OpenK

An ongoing project to explore knowledge-driven HPC analysis and optimization. We use the standards and toolchain of OWL (the Web Ontology Language) to formally model the concepts and relations in HPC domains, including programs, hardware, analysis, and optimization.

See more at the main article OpenK

Shift Calculus DSL

Developing a scalable domain-specific language for stencil computations

minitermite

Problem: a student added some new IR nodes to ROSE and had trouble getting minitermite to pass make.

Solution: follow the instructions in projects/minitermite/HOWTO_ADD_NEW_SGNODE_VARIANTS.

Developer's Guide

We briefly describe the workflow of ROSE developers.

Basic skills for ROSE developers

Have experience with these, or be ready to learn them:

  • Shell programming: Bash (default shell)
  • Unix commands: grep, find, ssh, etc.
  • C++ programming
  • GDB/Valgrind
  • Git
  • Build systems
    • Make
    • CMake: the Windows port uses this; we are looking into using it elsewhere
    • autoconf/automake
    • libtool
  • LaTeX: Documentation Standard
  • Compiler Techniques: Intuition is invaluable

Helpful Resources

Valued Contributions

The ROSE project values the following contributions:

Development:

  • Code: implementing new compiler features, improving existing work, passing code review and Jenkins. Only commits which were merged into the central master branch count as contributions.
    • Expanding Language Support
    • Analyses (AST analyses are a good place to start)
    • Optimizations
    • Build System
  • Bug Fixes: passing code review and Jenkins (in the future, Klocwork, Coverity, etc. analysis tools)
    • User bugs found in the SciDAC Outreach Center's bug tracker
    • Internal bugs: Usually developer bugs, they can be found on github.com or redmine
  • Documentation:
    • How ROSE Works
    • Tutorial, Manual, FAQ, etc. (Updating these is most helpful)
    • Project Documentation (projects can be found in the projects directory)
    • Design/Architecture/API Documentation
    • Workflow Documentation
  • System administration: Maintain and improve workflow components (mostly internal developer work, but suggestions are useful)
    • Website: rosecompiler.org
    • Git repository
    • Project management: Redmine
    • Code review: Github enterprise
    • Jenkins: Continuous integration, improving testing

Research:

  • Publications: technical reports, papers, presentations, posters
  • Slides from presentations: upload slides to the relevant Redmine project's Files tab (.pptx format is required)

Proposal:

  • Collaborative proposals

Feedback: Remember that any problem you find is probably not unique to you

  • General struggles (administratively or implementation-wise)
  • General improvement/enhancement ideas for both the software and the people

Milestones for ROSE developers

Based on our experience working with interns, we have roughly identified the following milestones for a ROSE developer:

  • Development environment: pick a platform of your choice (Linux or Mac OS), and get familiar with that specific platform (shell, editors, environment variable setting, etc.)
    • Physical location: locations MATTER! Sit close to the people you interact with often. Make your desk/office accessible to others. A physically isolated office or desk may have a very negative impact on your productivity.
  • Installing ROSE: being able to smoothly configure, compile, and install ROSE
  • Build system: being able to add a project (first skeleton) into ROSE by modifying Makefile.am, etc.
  • Contribution following ROSE Coding Standard and passing code review
    • Documentation: sufficient documentation about what your work is about
    • Software Engineering:
      • Style guidelines: Doxygen comments, naming conventions, where to put things, etc.
      • Interface: Does the code have a clean and simple interface to be used by users?
      • Algorithm design: documented by source comments how things are expected to work
      • Coding implementation: correctly implement the designed algorithm
    • Tests: Each contribution must have the accompanying tests to make sure it works as expected
  • Continuous integration: push commits to be code reviewed and tested by Jenkins every two or three weeks for your incremental development results.
    • Add a new test job if none of the existing ones tests your project
  • Confirm your commits are merged into the ROSE project's central master branch: github.com provides graphs for individual impact

Termination checklist

We often have interns, collaborators, and subcontractors finishing up their official duties with us. Here is a brief checklist to complete before their termination:

  • Complete the student program checklist (we have no idea what you need to do :-)
  • Complete the performance evaluation form provided by us: mostly provide objective facts to demonstrate contributions since subjective impressions can be very off.
  • Complete a short feedback form provided by us, where you can discuss anything related to developing ROSE or working with the ROSE team. Your candid feedback is essential to the future of our collaborative program.
  • Schedule a one-on-one meeting with at least one staff member two weeks before the official end date to do a status check and plan the exit
  • Turn in all documentation (LaTeX, Word, PowerPoint, etc.) that is not in the git repository by uploading it to the Redmine project's Files tab
  • Stop developing any new features at least one week before the end date so we can focus on making sure all source code contributions can pass Jenkins
  • If you plan to continue collaborating with us, ask about getting internal access (e.g. VPN), or set up some other method for collaboration.

Code Review

See the Code Review section for details.

Working From An Internal LLNL Computer

Toolchain

There are many tools pre-installed on the /nfs/apps mount point:

$ ls /nfs/apps
apr        bin       etc   grace     java     mpc      neon     pygobject  sqlite      toolworks.old
asciidoc   binutils  flex  graphviz  libtool  mpfr     openssh  python     src         totalview
asymptote  blender   gcc   hdf5      m4       mpich    perl     qt         subversion  upc
autoconf   doc++     git   insure++  maple    mpich2   pgi      rdesktop   swig        visit
automake   doxygen   gmp   intel     matlab   mplayer  psi      ruby       texinfo     xemacs


The root of most of these tools contains a setup.sh file which you can source. This will correctly set up your library path ($LD_LIBRARY_PATH) and program path ($PATH):


GCC

$ source /nfs/apps/gcc/4.5.0/setup.sh

This GCC setup.sh file should also source MPFR and GMP, but if not, please do it manually:

$ source /nfs/apps/mpfr/3.0.0/setup.sh
$ source /nfs/apps/gmp/4.3.2/setup.sh

If you fail to properly source these dependencies, you may encounter this error:

/nfs/apps/gcc/4.3.2/libexec/gcc/x86_64-unknown-linux-gnu/4.3.2/f951: error while loading shared libraries: libmpfr.so.1: cannot open shared object file: No such file or directory

Workflow

Motivation and Goals

The goal of the ROSE workflow is to have a streamlined, simplified, and automated process to allow users and developers to:

  • Improve the quality of ROSE source code and documentation
  • Improve our productivity, allowing us to produce high-quality work using less time and fewer resources than would otherwise be required

Development Guide

Developing a big, sophisticated project entails many challenges. To mitigate some of these challenges, we have adopted several best practices: incremental development, code review, and continuous integration.

  • Iterative and Incremental software development for early results, controllable risks, and better engagement of stakeholders
  • Code review for consistency, maintainability, usability, and quality
  • Continuous Integration for automated testing, easy release, and scalable collaboration

Incremental Development

Incremental development means developing new functionality in small steps, where the resulting code at each step is a useful improvement over the previous state. This contrasts with developing an entire feature fully elaborated, with no points along the way at which it is externally usable.

Each ROSE developer is expected to push his/her work at least once every three weeks.

Major benefits of doing things incrementally:

  • You can have intermediate results along the path. So your sponsors will sleep better.
  • You will get feedback early and frequently about whether you are heading in the right direction.
  • Your work will be tested and merged often into the master branch, avoiding the risks of merge conflicts.

See more tips about How to incrementally work on a project

Code Review

See Code Review in ROSE.

Continuous Integration

Continuous integration means incorporating changes from work in progress into a shared mainline as frequently as possible, in order to identify incompatible changes and introduced bugs as early as possible. The integrated changes need not be particular increments of functionality as far as the rest of the system is concerned.

In other words, incremental development is about making one's work valuable as early as possible, and potentially about getting a better sense of what direction it should take, while continuous integration is about reducing the risks that result from codebase divergence as multiple people do development in parallel.

The question of whether to conditionalize new code is an interesting one. By doing so, one narrows the scope of continuous integration to just checking for surface incompatibilities in merging the changed code. Without actually running the new code against the existing tests, the early detection of introduced bugs is lost. In exchange, multiple people working in the same part of the codebase become less likely to step on each other's toes, because the relevant code changes are distributed more rapidly.

See more at Continuous Integration

High Level Workflow

Requirement Analysis

External:

Internal: requires an LC account to access

Design

  • Wikibook: community-based design documents that provoke discussion
  • Powerpoint slides: more formal communication about what is the design
  • Confluence: https://lc.llnl.gov/confluence/

Implementation

  • Redmine (http://hudson-rose-30:3000/): create projects based on milestones and user input, create and track tasks
    • Project-Specific Tasks
    • Private Issue Tracking
    • Private Documentation
      • Using redmine's wiki
  • Github:
    • Internal (http://github.llnl.gov/): for code review only
    • External (https://github.com/rose-compiler/rose): public code hosting and public issue tracking for general ROSE bugs and features.
    • "Rosebot" to automate Github workflow: preliminary testing, policies (git-hooks), automatically add reviewers, etc.

Testing

Documentation

Publicity

Proposing Workflow Changes

Major workflow improvements and changes should be thoroughly tested and reviewed by staff members before deployment, since they may have a profound impact on the project.

How to propose a workflow change:

  • Submit a ticket on github.com's rose-public/rose issue tracker. In the ticket, provide the following information:
    • What is it: Explain what change is proposed
    • Why the changes: the long-term benefits for our productivity and quality of work
    • The cost of the changes: learning curve, maintainability, purchase cost

Reviewing Workflow Change Proposals

Review criteria

  • Optimize
    • Optimize our workflow so that we can produce higher-quality work using less time and other resources.
    • Address what is slowing us down or distracting us.
    • Simplify daily life: identify what the proposed workflow improvements allow us to eliminate or automate.
      • It is counterproductive to improve workflow by adding more hoops/steps/clicks into daily work.
  • Improve:
    • Allow the quality of work to be improved incrementally.
    • Accepting incremental improvements is more realistic than asking for perfection in the first try.
    • Workflow should allow quick new contributions and fast revision of existing contributions
  • Automate:
    • Additions to the workflow should be automated as much as possible.
  • Preserve:
    • It must preserve existing work:
      • No creation of anything from scratch
    • Does it interact well with the existing workflow?
    • Is there a way to convert existing code/documents into the new form?
  • Simplicity:
    • The more software tools we depend on, the harder our workflow is to use and maintain. Similarly, the more formats/standards we enforce, the harder it is for developers to do their daily work.
    • Adopting new required software components and new required technical formats/standards in our workflow should be very carefully reviewed for the associated long-term benefits and costs. Long-term means the range of 5 to 10 years and is not tied to a temporary thing we use now.
  • Preference of major contributors: those who contribute the most should have a little more say.
  • Documentation: We require major changes to be documented and reviewed before deployment. Writing things down can help us clarify details and solicit wider comments (instead of being limited to face-to-face meetings).

Coding Standard

What to Expect and What to Avoid

This page documents the current recommended practice of how we should write code within the ROSE project. It also serves as a guideline for our code review process.

New code should follow the conventions described in this document from the very beginning.

Updates to existing code that follows a different coding style should only be performed if you are the maintainer of the code.

The order of sections in this coding standard follows a top-down approach: big things first, then fine-grained details.

Six Principles

We use the coding standard to reflect the principles we value in all contributions to ROSE:

  • Documentation: What are the commits about? Is this reflected in commit messages, README, source comments, or LaTeX files within the same commits?
  • Style: Is the coding style consistent with the required and recommended formats? Is the code clean and pleasant and easy to read?
  • Interface: Does the code have a clean and simple interface to be used by users?
  • Algorithm: Does the code have sufficient comments about what algorithm is used? Is the algorithm correct and efficient (space and time complexity)?
  • Implementation: Does the implementation correctly implement the documented algorithms?
  • Testing: Does the code have the accompanying test translator and input to ensure the contributions do what they are supposed to do?
    • Is Jenkins configured to trigger these tests? Local tests on a developer's workstation do not count.

Avoid Coding Standard War

We directly quote text from http://www.parashift.com/c++-faq/coding-std-wars.html, as follows:

"Nearly every software engineer has, at some point, been exploited by someone who used coding standards as a power play. Dogmatism over minutia is the purvue of the intellectually weak. Don't be like them. These are those who can't contribute in any meaningful way, who can't actually improve the value of the software product, so instead of exposing their incompetence through silence, they blather with zeal about nits. They can't add value in the substance of the software, so they argue over form. Just because "they" do that doesn't mean coding standards are bad, however.

Another emotional reaction against coding standards is caused by coding standards set by individuals with obsolete skills. For example, someone might set today's standards based on what programming was like N decades ago when the standards setter was writing code. Such impositions generate an attitude of mistrust for coding standards. As above, if you have been forced to endure an unfortunate experience like this, don't let it sour you to the whole point and value of coding standards. It doesn't take a very large organization to find there is value in having consistency, since different programmers can edit the same code without constantly reorganizing each others' code in a tug-of-war over the "best" coding standard."

Must, Should and Can

The terms must, should and can have special meaning.

  • A must requirement must be followed,
  • A should is a strong recommendation,
  • A can is a general guideline.

Got New Ideas, Suggestions

This is not a place to write down new ideas, concepts, or suggestions for the future. If you have suggestions, post them on this page's Discussion tab.

We do welcome suggestions for improvements and changes so we can do things faster and better.

Git Convention

Name and Email

Before you commit your local changes, you MUST ensure that you have correctly configured your author and email information (on all of your machines). Having a recognizable and consistent name and email will make it easier for us to evaluate the contributions that you've made to our project.

Guidelines:

  • Name: You MUST use your official name you commonly use for work/business, not nickname or alias which cannot be easily recognized by co-workers, managers, or sponsors.
  • Email: You MUST use your email commonly used for work. It can be either your company email or your personal email (gmail) if you DO commonly use that personal email for business purposes.

To check if your author and email are configured correctly:

  $ git config user.name
  <your name>

  $ git config user.email
  <your email>

Alternatively, you can just type the following to list all your current git configuration variables and values, including name and email information.

  $ git config -l


To set your name and email:

  $ git config --global user.name "<Your Name>"
  $ git config --global user.email "<your@email.com>"

Commit messages

It is important to have concise and accurate commit messages to help code reviewers do their work.

Latest requirements

Example commit message (excerpt):

(Binary Analysis) SMT solver statistics; documentation

* Replaced the SMT class-wide number-of-calls statistic with a
  more flexible and extensible design that also tracks the amount
  of I/O between ROSE and the SMT solver.  The new method tracks
  statistics on a per-solver basis as well as a class-wide basis, and
  allows the statistics to be reset at arbitrary points by the user.

* More documentation for the new memory cell, memory state, and X86
  register state classes.

  • (Required) Summary: the first line of the commit message is a one-line summary (<50 words) of the commit. Start the summary with a topic, enclosed in parentheses, to indicate the project, feature, bugfix, etc. that this commit represents.
  • (Optional) Use a bulleted list, one asterisk (*) per item, to elaborate on the commit.
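The required summary rule is mechanical enough to check automatically, for instance from a commit-msg hook. The following is a minimal sketch of such a check; isValidCommitMessage is a hypothetical helper, not an existing ROSE or Git tool, and it only enforces the required rule as written above (a parenthesized topic at the start of the first line, and fewer than 50 words), leaving the optional bullet list unchecked.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical checker for the required summary line of a commit message.
bool isValidCommitMessage(const std::string& message)
{
  std::istringstream in(message);
  std::string summary;
  if (!std::getline(in, summary) || summary.empty())
    return false;

  // The summary must begin with a non-empty topic in parentheses, e.g. "(Binary Analysis)".
  std::string::size_type close = summary.find(')');
  if (summary[0] != '(' || close == std::string::npos || close == 1)
    return false;

  // The summary must contain fewer than 50 words.
  std::istringstream words(summary);
  std::string word;
  int count = 0;
  while (words >> word)
    ++count;
  return count < 50;
}
```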

Also see http://spheredev.org/wiki/Git_for_the_lazy#Writing_good_commit_messages.

Design Document

Overview

"The software design document is a written contract between you, your team, your project manager and your client. When you document your assumptions, decisions and risks, it gives the team members and stakeholders an opportunity to agree or to ask for clarifications and modifications. Once the software design document is approved by the appropriate parties, it becomes a baseline for limiting changes in the scope of the project." - How to Write a Software Design Document | eHow.com

We are still in the process of defining the requirements for design documents, but preliminarily, here are the initial rules for writing a design document for a ROSE module (an analysis, transformation, optimization, etc.).

(We thank Professor Vivek Sarkar at Rice University for his insightful comments for some of the initial design document requirements.)

Guideline

  • All new ROSE analyses, transformations, and optimizations must have an accompanying design document, to be peer-reviewed, before the actual implementation begins.
  • Be specific enough that someone with ROSE skills who is not the original designer could (in principle) implement the design just by looking at the document.
  • It is to be expected that different developers will make different low-level choices about data structures, etc.

Requirement vs. Design Document

If the requirements document is the "why" of the software, then the technical design document is the "how to". For simplicity, we put both requirements and design into a single document for now. We allow a separated requirement analysis document if necessary.

The purpose of writing the technical design document is to guide developers in implementing (and fulfilling) the requirements of the software--it's the software's blueprint.

Format

Documents must be:

  • Written in LaTeX for re-usability in publications and proposals.
  • Stored under version control to support collaborative writing.

Your document should, at a minimum, include these formal sections:

  • Title page
  • Author information: who participates in the major writing
  • Reviewer information: who reviews and approves the document
  • Table of contents
  • Page numbering format
  • Section numbers
  • Revision history

Content

Major Sections

  • Overview
    • Explain the motivation and goal of the module: what does this module do, the goal, the problem to address, etc.
  • Requirement analysis: what is required for this module
    • Define the interface: namespace, function names, parameters, return values. How others can call this module and obtain the returned results
    • Performance requirement: time and space complexity
    • Scope of input/test codes: what types of languages to be supported, the constructs of a language to be supported, the benchmarks to be used
  • Design considerations
    • Assumptions
    • Constraints
    • Tradeoffs and limitations: why this algorithm, what are the priorities, etc.
    • Non-standard elements: Definitions of any non-standard symbols, shapes, acronyms, and unique terms in the document
    • Game plan: How each requirement will be achieved
  • Internal software workflow
    • Diagrams: logical structure and logical processing steps: MUST have a UML diagram or PowerPoint diagram
    • Pseudo code: MUST have pseudo code to describe key data structures and high-level algorithm steps
    • Example: Must illustrate the designed algorithm by using at least one sample input code to go through the important intermediate results of the algorithm.
    • Error, alarm and warning messages, optional
  • Performance: MUST have complexity analysis. Estimate the time and space complexity of this module so users can know what to expect
  • Reliability (Optional)
  • Related work: cite relevant work in textbooks and papers

Development guidelines

  • Coding guidelines: standards and conventions.
  • Standard languages and tools
  • Definitions of variables and a description of where they are used

References

TODO

  • a sample design document

Testing

Rules

  • All contributions MUST come with an accompanying test translator and input files demonstrating that the contributions work as expected.
  • All tests MUST be triggered by the "make check" rule.
  • All tests should be self-verifying, to make sure the correct results are generated.
  • All tests MUST be activated by at least one of the Jenkins integration tests (the test jobs used to check whether something can be merged into our central repository's master branch).
    • This ensures that future commits cannot break your contributions without being noticed.
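As a minimal sketch of self-verification (not an existing ROSE utility), a test can compare what the translator produced with the expected result and turn the comparison into an exit status. This assumes the test is run by make check: under Automake's default test driver, exit status 0 means PASS, 77 means SKIP, 99 means a hard error, and any other value means FAIL; a custom check rule likewise stops on a non-zero exit status. The function name verifyOutput is hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical helper for a self-verifying test: returns 0 (PASS) if the
// produced output matches the expected output, and 1 (FAIL) otherwise,
// printing both values so the failure can be diagnosed from the test log.
int verifyOutput(const std::string& actual, const std::string& expected)
{
  if (actual != expected)
  {
    std::cerr << "verifyOutput(): mismatch" << std::endl
              << "  expected: " << expected << std::endl
              << "  actual:   " << actual << std::endl;
    return 1;
  }
  return 0;
}
```

A test program's main() would then end with something like return verifyOutput(result, expected);, so a wrong result makes make check fail instead of passing silently.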

Programming Languages


Core Languages


Only C++ is allowed. Any other programming language is allowed only as an exception, decided on a case-by-case basis.

Question: But programming language XYZ is much better than C++ and I am really good at XYZ!!!

Answer: We can allow XYZ only if

  • You can teach at least one of the old dogs (staff members) on our team the new tricks needed to use XYZ efficiently
  • You will stay on our team for the next 5 to 10 years to maintain all the code written in XYZ if none of the old dogs has the time or interest to switch to XYZ
  • You can prove that XYZ interacts well with the existing C++ code in ROSE

Scripting Languages


Only two scripting languages are allowed:

  • bash shell scripting
  • Perl

Again, this reflects the preferences of the staff members and what we currently use. Allowing an uncontrolled number of scripting languages in a single project would make the project impossible to maintain and hard to learn.

Naming Conventions


The order of sub-sections reflects a top-down approach for how things are added during the development cycle: from directory --> file --> namespace --> etc.

General

  • Language: all names should be written in English, since English is the preferred language for international development
  • fileName; // NOT: filNavn

Abbreviations and Acronyms


Avoid ambiguous abbreviations: strike a good balance between clarity for the reader and productivity for the writer.

Abbreviations and acronyms should NOT be all uppercase when used in a name:

  • exportHtmlSource(); // NOT: exportHTMLSource();
  • openDvdPlayer(); // NOT: openDVDPlayer();

Likewise, abbreviations and acronyms that are commonly written in lowercase should NOT start with a lowercase letter when used in a CamelCase name:

  • SgAsmX86Instruction // NOT: SgAsmx86Instruction
  • myIpod // NOT: myiPod

File/Directory


Case:

  • camelCase like fileName.hpp: This is consistent with existing names used in ROSE

File Extension:

  • Header files: .h or .hpp
  • Source files: .cpp or .cxx
    • Avoid .C: on file systems that do not distinguish between lowercase and uppercase, it cannot be told apart from the C extension .c.

Namespaces

  • A namespace should represent a logical unit, usually encapsulated in a single header file within a specific directory.
  • CamelCase for namespaces, such as SageInterface, SageBuilder, etc.
    • avoid lowercase names; bad example: sage_interface
  • use singular nouns in namespace names; avoid plurals
  • use full words; avoid abbreviations
  • use at least two words to reduce name collisions

Reason: this naming convention for namespaces is meant to be compatible with existing code and consistent with the names of functions within namespaces.

  • CamelCase namespace names read nicely with camelCase function names, e.g. NameSpace::doSomething()
  • lowercase namespace names may look inconsistent, such as name_space_1::doSomething()
  • many existing namespaces in ROSE already follow CamelCase, as shown at link

[Note] Leo: I believe this should be discussed further in connection with ROSE Compiler Framework/ROSE API.

Types


Type names MUST be in mixed case starting with an uppercase letter, as in SavingsAccount.

Variables

  • Length: variables with a large scope should have long names, variables with a small scope can have short names
  • Temporary variables used for temporary storage (e.g. loop indices) are best kept short. A programmer reading such a variable should be able to assume that its value is not used outside a few lines of code. Common scratch variables for integers are i, j, k, m, n. Optionally, you can use ii, jj, kk, mm, and nn, which are easier to highlight when looking for indexing bugs.
  • Case: camelCase, i.e. mixed case starting with a lowercase letter, as in functionDecl
    • Variable names deliberately start with a lowercase letter, whereas type names start with an uppercase letter, so the first letter tells you whether a name is a variable or a type.

Booleans


Negated boolean variable names must be avoided. The problem arises when such a name is used in conjunction with the logical negation operator as this results in a double negative. It is not immediately apparent what !isNotFound means.

bool isError; // NOT: isNoError
bool isFound; // NOT: isNotFound

Collections


Plural form should be used on names representing a collection of objects. This enhances readability since the name gives the user an immediate clue as to the type of the variable and the operations that can be performed on its elements.

For example,

std::vector<Point> points;
int values[10];

Constants


Named constants (including enumeration values) MUST be all uppercase, using underscores to separate words.

For example:

const int MAX_ITERATIONS = 25;
enum Color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };
const double PI = 3.14159265358979;

In general, the use of such constants should be minimized. In many cases implementing the value as a method is a better choice:

int getMaxIterations() // NOT: MAX_ITERATIONS = 25
{
    return 25;
}

Generic


Generic variables should have the same name as their type (starting with a lowercase letter). This reduces complexity by reducing the number of terms and names used. It also makes it easy to deduce the type from the variable name alone. If for some reason this convention doesn't seem to fit, it is a strong indication that the type name is badly chosen.

void setTopic(Topic* topic) // NOT: void setTopic(Topic* value)
                            // NOT: void setTopic(Topic* aTopic)
                            // NOT: void setTopic(Topic* t) 

void connect(Database* database) // NOT: void connect(Database* db)
                                 // NOT: void connect (Database* oracleDB)

Non-generic variables have a role. These variables can often be named by combining role and type:

Point  startingPoint, centerPoint;
Name   loginName;

Globals


Global variables must always be referred to using the scope-resolution operator ::.

For example, ::mainWindow.open() and ::applicationContext.getName()

In general, the use of global variables should be avoided. Instead,

  • place the variable into a namespace, or
  • use a singleton object
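The two alternatives can be sketched as follows; all names here (ApplicationSettings, maxErrorCount, ApplicationContext) are hypothetical. The singleton relies on a function-local static, which C++11 and later initialize exactly once, even when several threads call getInstance() concurrently. If the namespace variable is shared through a header, it would be declared extern there and defined in a single .cpp file.

```cpp
#include <cassert>
#include <string>

// Alternative 1: put the variable into a namespace instead of the global
// scope (hypothetical names).
namespace ApplicationSettings
{
  int maxErrorCount = 10;
}

// Alternative 2: a singleton object, created on first use.
class ApplicationContext
{
  public:
    static ApplicationContext& getInstance()
    {
      // A function-local static is initialized exactly once
      // (thread-safe since C++11).
      static ApplicationContext instance;
      return instance;
    }

    const std::string& getName() const { return name_; }
    void setName(const std::string& name) { name_ = name; }

  private:
    ApplicationContext() : name_("unnamed") {}
    ApplicationContext(const ApplicationContext&) = delete;
    ApplicationContext& operator=(const ApplicationContext&) = delete;

    std::string name_;
};
```

Callers then write ApplicationSettings::maxErrorCount or ApplicationContext::getInstance().getName() instead of referring to a bare global.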

Private class variables


Private class variables should have an underscore suffix. Apart from its name and its type, the scope of a variable is its most important feature. Indicating class scope with an underscore makes it easy to distinguish class variables from local scratch variables.

For example,

class SomeClass {
  private:
    int length_;
};

An issue is whether the underscore should be added as a prefix or as a suffix. Both practices are commonly used, but the latter is recommended because it seems to best preserve the readability of the name (in addition, names beginning with an underscore followed by an uppercase letter are reserved in C++). A side effect of the underscore naming convention is that it nicely resolves the problem of finding reasonable variable names for setter methods and constructors:

  void setDepth(int depth)
  {
    depth_ = depth;
  }

Methods and Functions


Names representing methods or functions MUST be verbs and written in mixed case starting with a lowercase letter. Functions that return something should be named after what they return, and procedures (void methods) after what they do.

  • e.g. getName(), computeTotalWidth(), isEmpty()

A method name should not repeat the name of its object.

  • e.g. line.getLength(); // NOT: line.getLineLength();

The latter seems natural in the class declaration, but proves superfluous in use, as shown in the example.

The terms get and set must be used where an attribute is accessed directly.

  • e.g.: employee.getName(); employee.setName(name); matrix.getElement(2, 4); matrix.setElement(2, 4, value);

The term compute can be used in methods where something is computed.

  • e.g.: valueSet->computeAverage(); matrix->computeInverse();

This gives the reader an immediate clue that this is a potentially time-consuming operation and that, if it is used repeatedly, caching the result may be worthwhile. Consistent use of the term enhances readability.

The term find can be used in methods where something is looked up.

  • e.g.: vertex.findNearestVertex(); matrix.findMinElement();

This gives the reader an immediate clue that this is a simple lookup method with a minimum of computation involved. Consistent use of the term enhances readability.

The term initialize can be used where an object or a concept is established.

  • e.g.: printer.initializeFontSet();

The American spelling initialize should be preferred over the British initialise. The abbreviation init should be avoided.

The prefix is should be used for boolean variables and methods.

  • e.g.: isSet, isVisible, isFinished, isFound, isOpen

There are a few alternatives to the is prefix that fit better in some situations. These are the has, can and should prefixes:

  • bool hasLicense();
  • bool canEvaluate();
  • bool shouldSort();

Parameters should be separated by a single space character, with no leading or trailing spaces in the parameters list:

  • YES: void foo(int x, int y)
  • NO: void foo ( int x,int y )
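The prefixes above can be combined in one small class. This is a hypothetical sketch, not a ROSE class: it shows get/set for direct attribute access, is/has for boolean queries, compute for work whose result callers may want to cache, and find for a lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical class illustrating the naming prefixes for methods.
class ValueSet
{
  public:
    // get/set: direct access to an attribute
    const std::string& getName() const { return name_; }
    void setName(const std::string& name) { name_ = name; }

    void addValue(int value) { values_.push_back(value); }

    // is/has: boolean queries
    bool isEmpty() const { return values_.empty(); }
    bool hasValue(int value) const
    {
      return std::find(values_.begin(), values_.end(), value) != values_.end();
    }

    // compute: potentially expensive; callers may want to cache the result
    double computeAverage() const
    {
      assert(!isEmpty() && "computeAverage(): the value set is empty");
      double sum = std::accumulate(values_.begin(), values_.end(), 0.0);
      return sum / static_cast<double>(values_.size());
    }

    // find: a simple lookup
    int findMinElement() const
    {
      assert(!isEmpty() && "findMinElement(): the value set is empty");
      return *std::min_element(values_.begin(), values_.end());
    }

  private:
    std::string name_;
    std::vector<int> values_;
};
```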

Directories


Naming Convention


List of common names

  • src: to put source files, headers
  • include: to put headers if you have many headers and don't want to put them all into ./src
  • tests: put test inputs
  • docs: detailed documentation not covered by README

Please use camelCase for your directory names.

  • avoid a leading capital letter

Examples of preferred names

  • roseExtensions
  • roseSupport
  • roseAPI

What to avoid

  • rose_api
  • rose_support

Layout


TODO: big picture about where to put things within the ROSE git repository.


For each project directory under ./projects, our convention is to have a README file and separate subdirectories for different kinds of files:

  • README: mandatory
  • ./src: for all your source files
  • ./include: for all your headers if you don't want to put them all into ./src
  • ./tests: for your test input files
  • ./doc: for your more extensive documentation if README is not enough

Files


A single file should contain one logical unit, or feature. Keep it modular!

Naming Conventions


A file name should be specific and descriptive about what it contains.

You should use camelCase (starting with a lowercase letter).

  • good example: fileName.h

What should be avoided

  • starting with a capital letter
  • underscores, e.g. file_name.h

Bad file names

  • functions.h
  • file_name.h


Line Length

  • File content should be kept within 80 columns.

80 columns is a common width for editors, terminal emulators, printers, and debuggers, and files shared between several people should stay within these constraints. Avoiding unintentional line breaks when a file is passed between programmers improves readability. Likewise, code in a tutorial that is wider than 80 columns is likely not to fit on the page, which makes the tutorial hard to use without going into the code base itself.

Indentation


Avoid tabs for code indentation, except where tabs (\t) are required, e.g. in Makefiles.

2 or 4 spaces are recommended for code indentation.

for (i = 0; i < nElements; i++) 
  a[i] = 0;

Indentation of 1 is too small to emphasize the logical layout of the code. Indentation larger than 4 makes deeply nested code difficult to read and increases the chance that the lines must be split.

Characters

  • Special characters like TAB and page break must be avoided.

These characters are bound to cause problems for editors, printers, terminal emulators, or debuggers when used in a multi-programmer, multi-platform environment.

We already have a built-in perl script to enforce this policy.

Header Files


File name:

  • must be camelCase: such as fileName.h or fileName.hpp
  • avoid file_name.h

Suffix

  • For C header files: Use .h
  • For C++ header files: Use .h or .hpp

Must have

  • include guards (preprocessor directives) that prevent the header from being included more than once; guard names beginning with an underscore followed by an uppercase letter are reserved in C++, so avoid them. Example:
#ifndef HEADER_FILE_X_H
#define HEADER_FILE_X_H

// declarations go here

#endif // HEADER_FILE_X_H
  • put your variables, functions, and classes within a descriptive namespace whenever possible.
  • Include statements must be located only at the top of a file.
    • This avoids unwanted compilation side effects caused by "hidden" include statements deep inside a source file.

What to avoid in a header

  • global variables, functions, or classes, which pollute the global scope
  • using namespace std;
    • this pollutes the global scope of every .cpp file that includes the header. using namespace should only be used in .cpp files. More explanations are at link and link2
  • function definitions
    • headers are meant to expose types and function interfaces, and they are included by multiple .cpp files. A non-inline function defined in a header causes a multiple-definition error at link time when two or more .cpp files that include the header are linked together. (Inline functions and templates are exceptions.)
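The header rules can be sketched with a hypothetical header/source pair (the file, namespace, and function names are made up for illustration). The header only declares the interface inside a namespace and is protected by an include guard; the definition lives in the .cpp file. Both files are shown in one listing so that it compiles as a single unit; in a real project the .cpp file would begin with #include "loopCounter.h".

```cpp
// ---- loopCounter.h (hypothetical) ----
#ifndef LOOP_COUNTER_H
#define LOOP_COUNTER_H

#include <string>
#include <vector>

namespace LoopCounter
{
  //! Returns how many entries of statementKinds are equal to "for".
  int countForLoops(const std::vector<std::string>& statementKinds);
}

#endif // LOOP_COUNTER_H

// ---- loopCounter.cpp (hypothetical) ----
// In a real project this file would begin with: #include "loopCounter.h"
#include <algorithm>
#include <cassert>

namespace LoopCounter
{
  int countForLoops(const std::vector<std::string>& statementKinds)
  {
    // std::count returns a signed difference type; convert explicitly.
    return static_cast<int>(
        std::count(statementKinds.begin(), statementKinds.end(), "for"));
  }
}
```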



Source Files


Again, file names should follow the naming convention:

  • camelCase file name: e.g. sageInterface.cpp
  • Avoid capitalization, spaces, special characters

Preferred suffix

  • Use .c for C source files
  • Use .cpp or .cxx for C++ source files

What to avoid

  • capitalized .C for source files. This causes problems when porting ROSE to case-insensitive file systems.


README


All major directories within the ROSE git repository should have a README file.

  • projects/projectXYZ MUST have a README file.

The file name should be README.

What to avoid

  • README.txt
  • readme

Required Content


For all major directories in ROSE, there should be a README explaining:

  • What is in this directory
  • What this directory accomplishes
  • Who added it and when

Each project directory must have a README explaining:

  • What this project is about
    • Name of the project
    • Motivation: why we have this project
    • Goal: what we want to achieve
  • Design/Implementation: so the next person can quickly catch up and contribute to this project
    • How we design/implement it
    • What the major algorithm is
  • Brief instructions about how to use the project
    • Installation
    • Testing
    • Or point out where to find the complete documentation
  • Status
    • What works
    • What doesn't work
  • Known limitations
  • References and citations: for the underlying algorithms
  • Authors and Dates

Format


Format of README

  • text format with clear sections and bullets
  • optionally, you can use Markdown styles

Examples


An example README can be found at

Source Code Documentation


The source code of ROSE is documented using the Doxygen documentation system.

General Guidelines

  • English only
  • Use valid Doxygen syntax (see "Examples" below)
  • Make the code readable for a person who reads your code for the first time:
    • Document key concepts, algorithms, and functionalities
    • Cover your project, file, class/namespace, functions, and variables.
    • State your input and output clearly, specifically the meaning of the input or output
      • Users are more likely to use your code if they don't have to think about what the output means or what the input should be
    • Clever is often synonymous with obfuscated; avoid this form of cleverness in coding.

TODO, not ready yet

  • Test your documentation by generating it on your machine and then manually inspecting it to confirm its correctness

TODO: Generating Local Documentation

This sometimes does not work, because a configuration file specifies which directories are scanned to generate the web reference HTML files.

  $ make doxygen_docs -C ${ROSE_BUILD}/docs/Rose/

Use //TODO


This is a recommended way to improve your code's comments.

During incremental development, you often decide to defer something to a later iteration, or you know that your current implementation has limitations that need to be fixed in the future.

A good practice is to immediately put a TODO source comment (// TODO blah blah ...) into the relevant code when you make such a decision, so you won't forget that something is left to do.

TODOs also serve as handy flags in the code for other people who want to improve your work after you are gone.
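For example, a hypothetical helper with a known limitation might record it as shown below, so the next developer can find it simply by searching the code for TODO:

```cpp
#include <cassert>
#include <string>

//! Returns the extension of fileName (the text after the last '.'), or an
//! empty string if fileName contains no '.' (hypothetical helper).
std::string getFileExtension(const std::string& fileName)
{
  // TODO: hidden files such as ".bashrc" should have no extension, but they
  //       currently return "bashrc".
  std::string::size_type dotPosition = fileName.rfind('.');
  if (dotPosition == std::string::npos)
    return "";
  return fileName.substr(dotPosition + 1);
}
```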

Examples


Single Line


Often a brief single-line comment is enough:

//! Brief description.

Multiple lines


Doxygen supports comments with multiple lines.

/**
 
   ... text..
 
 */

/**
 *
 *  ... text..
 *
 */


/*******************************//**
 *         text
*********************************/

/////////////////////////////////////
///  ... text <= 80 columns in length
//////////////////////////////////////

Combined single line and multiple lines


Doxygen can generate a brief comment for a function and optionally show detailed comments if users click on the function.

Here are the options to support combined single-line and multiple-line source comments.

Option 1:

/**
 * \brief Brief description.
 *        Brief description continued.
 *
 * [Optional detailed description starts here.]
 */

Option 2:

/**
 \brief Brief description.
        Brief description continued.
 
 [Optional detailed description starts here.]
 */

---

Single-line comment followed by a multiple-line comment:

You may extend an existing single-line comment with a multiple-line comment (Option 1 or 2). For example:

//! Brief description.
/**
 * Detailed description starts here.
 */


TODO: provide a full, combined example.
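One possible full example, combining a single-line brief description with a multiple-line detailed description and Doxygen's \param, \return, and \p commands, is sketched below; the function itself is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

//! Sums the elements of a vector (hypothetical example function).
/**
 * The elements of \p values are visited once, so the time complexity is
 * O(n) and no extra memory proportional to n is allocated.
 *
 * \param values  the numbers to add up; may be empty
 * \return the sum of all elements of \p values, or 0 if it is empty
 */
int sumValues(const std::vector<int>& values)
{
  int sum = 0;
  for (std::size_t i = 0; i < values.size(); i++)
    sum += values[i];
  return sum;
}
```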

Functions


Rules

  • Except for simple functions like getXX() and setXX(), all functions should have at least a one-line comment explaining what they do
  • Avoid global functions and global variables. Try to put them into a namespace.
  • A function should not have more than 100 lines of code. Please refactor big functions into smaller, separate functions.
  • Limit unconditional printf() calls, so your translator does not print hundreds of lines of unnecessary output when processing multiple input files
    • Use an if condition to control debugging printf() calls, such as if (SgProject::get_verbose() > 0)
  • The beginning of a function should perform sanity checks on the function parameters.
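A minimal sketch of a parameter sanity check at the beginning of a function, using a hypothetical helper: the check prints where and why it failed before the assertion fires, as this coding standard requires for assertions. In ROSE code, ROSE_ASSERT() could be used in place of assert().

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

//! Returns the element of values at position index (hypothetical helper).
int getElementAt(const std::vector<int>& values, std::size_t index)
{
  // Sanity check of the parameters first: report where and what went wrong,
  // then stop. Note that assert() is disabled when NDEBUG is defined.
  if (index >= values.size())
  {
    std::cerr << __FILE__ << ":" << __LINE__ << ": getElementAt(): index "
              << index << " is out of range for size " << values.size()
              << std::endl;
  }
  assert(index < values.size());

  return values[index];
}
```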

Comments


Rules

  • Please follow Doxygen style comments
  • Please explain in sufficient detail how your function works and the steps in the algorithm.
    • Reviewers will read your commented information to understand your algorithm and then read your code to see if the code implements the algorithm correctly and efficiently.

Coding


Correctly implement the designed/documented algorithms. Future users won't have time to read your code directly to discern what it does.

Code should be efficient in terms of both time and space (memory) complexity.

Please be aware that your translator may handle thousands of statements with even more AST nodes.

Be aware that people other than you may use your code or develop it further. Please make this as easy as possible.

Classes


Use namespaces when possible; avoid global variables and classes.

Name Equals Functionality


Name the class after what it is. If you can't think of what it is, that is a clue you have not thought the design through well enough.

  • A class name should be a noun.

Compound names of over three words are a clue your design may be confusing various entities in your system. Revisit your design. Try a CRC card session to see if your objects have more responsibilities than they should.

Explicit Access


All sections (public, protected, private) should be identified explicitly. Sections that are not applicable should be left out.

Public Members First


The parts of a class should be sorted in the order public, protected, private.

The ordering is "most public first" so people who only wish to use the class can stop reading when they reach the protected/private sections.

Class Variables


Class variables should NOT be declared public.

The concept of C++ information hiding and encapsulation is violated by public variables. Use private variables and access functions instead. One exception to this rule is when the class is essentially a data structure, with no behavior (equivalent to a C struct). In this case it is appropriate to make the class' instance variables public.
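The rules on explicit access, public-first ordering, and private class variables can be illustrated together with a hypothetical SavingsAccount class (the type name used as an example in the Types section). It is followed by the data-structure exception, written as a class with public members because this standard discourages structs.

```cpp
#include <cassert>

// Hypothetical class: sections ordered public, protected, private; the data
// member is private and reached through access functions.
class SavingsAccount
{
  public:
    explicit SavingsAccount(double balance) : balance_(balance) {}

    double getBalance() const { return balance_; }

    //! Adds amount to the balance; non-positive amounts are ignored.
    void deposit(double amount)
    {
      if (isValidAmount(amount))
        balance_ += amount;
    }

  protected:
    bool isValidAmount(double amount) const { return amount > 0.0; }

  private:
    double balance_;
};

// Exception: a pure data structure without behavior may expose its members.
class Point
{
  public:
    double x;
    double y;
};
```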

Avoid Structs


Structs are kept in C++ for compatibility with C only, and avoiding them increases the readability of the code by reducing the number of constructs used. Use a class instead.

Statements


Loops


Only loop control statements may be included in the for() construct; nothing else is allowed.

//Correct
sum = 0;
for (i = 0; i < 100; i++)
  sum += value[i];

//Incorrect
for (i = 0, sum = 0; i < 100; i++)
  sum += value[i];

This increases maintainability and readability. It also allows future developers to make a clear distinction of what controls and what is contained in the loop.

Loop variables should be initialized immediately before the loop.

Type Conversions


Type conversions must always be done explicitly. Never rely on implicit type conversion.

  //Correct
  floatValue = static_cast<float>(intValue); 
  //Incorrect 
  floatValue = intValue;

This way, the programmer shows awareness of the different types involved and that the mix is intentional.

Conditionals


The body of a conditional must be put on a separate line.

if (isDone)
  doCleanup();
// NOT: if (isDone) doCleanup();

This is for debugging purposes. When writing on a single line, it is not apparent whether the test is really true or not.

There must be a space separating the keyword if from the condition statement (isDone).

if (isDone)
  ^ space

Complex conditional expressions must be avoided. Introduce temporary boolean variables instead:

//recommended way
bool isFinished = (elementNo < 0) || (elementNo > maxElement);
bool isRepeatedEntry = elementNo == lastElement;
if (isFinished || isRepeatedEntry) { : }

// NOT: if ((elementNo < 0) || (elementNo > maxElement) || elementNo == lastElement) { : }

By assigning boolean variables to expressions, the program gets automatic documentation. The construction will be easier to read, debug and maintain. When the variables are well named, it also helps future developers understand what each part of the construction is accomplishing.
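Wrapped in a complete function, the recommended form might look like the sketch below; the function name shouldStop and its parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: returns true when elementNo is out of range or
// repeats the last element, using named boolean temporaries.
bool shouldStop(int elementNo, int maxElement, int lastElement)
{
  bool isFinished = (elementNo < 0) || (elementNo > maxElement);
  bool isRepeatedEntry = (elementNo == lastElement);

  if (isFinished || isRepeatedEntry)
    return true;
  return false;
}
```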

printf and cout


All screen output MUST be put into an if statement so that it is executed conditionally, controlled either by the verbose level or by another debugging option.

Such output MUST NOT be printed by default.

TODO: this can be enforced by a simple Compass checker in the future.
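A minimal sketch of such guarded output, assuming the current verbosity is passed in as an integer (in a ROSE translator it might come from SgProject::get_verbose(), as mentioned under Functions). printIfVerbose is a hypothetical helper, not part of ROSE.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical helper: prints message only when currentVerbose is at least
// requiredLevel, so with a verbosity of 0 and requiredLevel >= 1 nothing is
// printed. Returns true if the message was printed.
bool printIfVerbose(int currentVerbose, int requiredLevel,
                    const std::string& message)
{
  if (currentVerbose >= requiredLevel)
  {
    std::cout << message << std::endl;
    return true;
  }
  return false;
}
```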

switch


Carefully differentiate between

  • things which are known to be safe to ignore, and
  • things which are not yet handled by the current implementation.
  switch (type->variantT())
  {
    case V_SgTypeDouble:
      {
        ...
      }
      break;
    case V_SgTypeInt:
      {
        ...
      }
      break;
    case V_SgTypeFloat: // things which are known to be safe to ignore
      break;
    default:
      {
        // things which are not yet explicitly handled
        cerr << "warning, unhandled node type: " << type->class_name() << endl;
      }
      break;
  }
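Because the snippet above depends on ROSE's node variants, here is a self-contained sketch of the same pattern using a hypothetical TypeKind enumeration in place of variantT(): known types are handled, a type known to be safe to ignore is skipped silently, and anything else reaches default and triggers a warning.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for ROSE node variants such as V_SgTypeDouble.
enum TypeKind { TYPE_DOUBLE, TYPE_INT, TYPE_FLOAT, TYPE_CHAR };

// Returns 1 if kind was handled, 0 if it is known to be safe to ignore,
// and -1 (after printing a warning) if it is not handled yet.
int processType(TypeKind kind)
{
  int result = -1;
  switch (kind)
  {
    case TYPE_DOUBLE:
    case TYPE_INT:
      result = 1; // real processing would go here
      break;
    case TYPE_FLOAT: // known to be safe to ignore
      result = 0;
      break;
    default:
      // not yet handled: warn instead of silently ignoring it
      std::cerr << "warning, unhandled type kind: "
                << static_cast<int>(kind) << std::endl;
      result = -1;
      break;
  }
  return result;
}
```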

assert


Use assertions often to explicitly express and check the assumptions made in the code.

Please use ROSE_ASSERT() or assert().

For each assertion, you MUST add a printf or cerr message indicating where in the code the failure occurred and what went wrong, so users can immediately see the cause of the assertion failure without having to use a debugger.

Statements To Be Avoided


The following statements should usually be avoided:

  • Goto statements should not be used. Goto statements violate the idea of structured code. There are very few cases (for instance breaking out of deeply nested structures) where goto should be considered, and only if the equivalent structured counterpart is less readable.
  • Executable statements in conditionals should be avoided. Conditionals with executable statements are very difficult to read.
  FILE* fileHandle = fopen(fileName, "w");
  if (!fileHandle) { : }
  // NOT: if (!(fileHandle = fopen(fileName, "w"))) { : }

Expressions


Guidelines for readability, simplicity and debuggability.

  • Ternary operators (?:) should be replaced with if/else.
  • Long expressions should be broken up into several simpler statements. Add an assertion for each pointer value obtained along the way to assist later debugging.
  • Clever use of operator precedence, shortcut evaluation, assignment expressions, etc. should be rewritten in easy-to-understand forms.
  • Always remember that future programmers will appreciate clear and simple code rather than obfuscated cleverness.
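As an illustration of breaking up a long expression, the hypothetical function below replaces a chained pointer access such as node->parent->parent->name with one step per statement, asserting each pointer (with a message), so a failure points at the exact link that was missing. TreeNode is a made-up stand-in for an AST node with a parent pointer; in ROSE code, ROSE_ASSERT() could be used instead of assert().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an AST node: a plain data holder with a name
// and a pointer to its parent (nullptr for the root).
class TreeNode
{
  public:
    std::string name;
    TreeNode* parent;
};

// Returns the name of the grandparent of node.
// NOT: return node->parent->parent->name;
std::string getGrandparentName(const TreeNode* node)
{
  assert(node != nullptr && "getGrandparentName(): node is null");

  const TreeNode* parentNode = node->parent;
  assert(parentNode != nullptr && "getGrandparentName(): node has no parent");

  const TreeNode* grandparentNode = parentNode->parent;
  assert(grandparentNode != nullptr &&
         "getGrandparentName(): node has no grandparent");

  return grandparentNode->name;
}
```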

AST Translators


All ROSE-based translators should call AstTests::runAllTests(project) after all transformations are done, to make sure the translated AST is correct.

This is a higher standard than merely being unparsed into compilable code: it is common for an AST to be unparsed correctly yet fail the sanity check.

More information is at Sanity_check

References


We list some external resources which are influential for us to define ROSE's coding standard

Code Review Process

 
[Figure: Code review using rose-github.llnl.gov]

[Figure: Connection between github and Jenkins]

Please note: the URL of the internal github has changed! It is now https://rose-github.llnl.gov/, instead of https://github.llnl.gov/ .

Motivation


Without code review, developers have:

  • added unreadable contributions which do not conform to any consistent coding style.
  • added undocumented contributions which cannot be understood by anybody else (essentially useless contributions).
  • added untested contributions (code without accompanying tests), so the contributions do not work as expected or can easily be broken by other, conflicting contributions (another kind of much less useful contribution)
  • disabled tests to subvert our stringent Jenkins CI regression tests
  • added files into wrong directories, with improper names
  • committed hundreds of reformatted files
  • re-invented the wheel by implementing features that already exist
  • added 160MB MPI trace files into the git repository

See Phabricator's "Advantages of Review" document (a Facebook project).

Goals


Our primary goals for code reviewing ROSE are to:

  • share knowledge about the code: coder + reviewer will know the code, instead of just the coder
  • group-study: learn through studying other people's code
  • enforce policies for consistent usability and maintainability of ROSE code: documented and tested
  • avoid reinventing the wheel and eliminate unnecessary redundancy
  • safeguard the code: disallow subversive attempts to disable or remove regression tests

Software


We are currently testing Github Enterprise and looking into the possibility of leveraging Redmine for internal code review.

In the past, we have looked at Google's Gerrit code review system.

Github


Releases: https://enterprise.github.com/releases

Support: https://support.enterprise.github.com

rosebot


(Under development)

An automated pull request analyzer to perform various tasks:

  • Automatically add reviewers to Pull Requests based on hierarchical configuration
  • "Pre-receive hook" analyses: file sizes, quantity of files, proprietary source, etc.
  • more...

Developer Checklist


Read these tips and guidelines before sending a request for code review.

Coding Standards


Please see Coding Standard for the complete guidelines. Here we only summarize some key points.

Your code should be written in a way that makes it easily maintainable and reviewable:

  • write easy-to-understand code; avoid exotic techniques that nobody can easily understand.
  • add sufficient documentation (source-code comments, README, etc.) to make your code understandable. Your documentation should cover
    • why you do this (motivation)
    • how you do it (design and/or algorithm)
    • where the associated tests are (showing that it works as expected)
  • before submitting your code for review, make sure
    • you have merged with the latest central repository's master branch without conflicts
    • your working copy passes local tests via: make, make check, and make distcheck
    • you have fixed all compiler warnings in your code whenever possible
  • submit a logical unit of work (one or more commits): something coherent like a bug fix, an improvement of documentation, or an intermediate stage toward a big new feature.
  • balance code submissions with a good ratio between [lines of code] and [complexity of code], to make the reviewer's life easier.
    • the time needed to review your code should not exceed 1 hour. Please avoid pushing thousands of lines at a time.
    • Please also avoid submitting trivial changes (fixing a typo, commenting out a single line, etc.) for review.

One time setup


Steps for initializing code review:

1. Login to http://rose-github.llnl.gov using your OUN and PAC.

2. Fork your own clone of the ROSE repository from http://rose-github.llnl.gov/rose-compiler/rose.

3. Add Collaborators:

  • Go to http://rose-github.llnl.gov/<your_account>/rose
    • Click Admin
    • Click Collaborators
      • Add candidate code reviewers: liao6, too1. These developers will review and merge your work.
      • Add admins: hudson-rose. This user will automatically synchronize your master branch with /nfs/casc/overture/ROSE/git/ROSE.git:master.

4. Create your public-private SSH key pair using ssh-keygen, and add the public key to your rose-github.llnl.gov account. Refer to Generating SSH Keys, or use a public key that you already have. (rose-github.llnl.gov only supports the SSH protocol for now; HTTPS is not yet supported.)

5. Configure Auto-syncs: Contact the Jenkins administrator (too1 and liao6) to have your repository added to a white-list of repositories to be synced whenever new commits are integrated into ROSE's official master branch.

6. Setup polling job: Contact the Jenkins administrator (too1 and liao6) to have your Github repository polled for new changes on the master branch. When new changes are detected, your master branch will be pushed to the central repository (and added to the Jenkins testing queue) as <oun>-reviewd-rc.

Daily work process

  • have a local git repo to do your work and submit local commits, you have two choices:
    • clone it from /nfs/casc/overture/rose/rose.git, as we usually did before
    • clone your fork on rose-github.llnl.gov to a local repo (only HTTPS is supported via LC)

Note: You may encounter SSL certificate problems. If you do, disable SSL verification for Git's cURL-based HTTPS transport, either with export GIT_SSL_NO_VERIFY=true or by configuring git:

$ git config --global http.sslVerify false
    • don't use branches; use a separate git repository for each of your tasks, so the status and progress of one task won't interfere with other tasks.
  • When ready to push your commits, synchronize with the latest rose-compiler/master to resolve merge conflicts, etc.
    • type: git pull origin master # this should always work since master branches on rose-github.llnl.gov are automatically kept up-to-date
    • make sure your local changes can pass 1)make -j8, 2)make check -j8, and 3)make distcheck -j8
  • push your commits to your fork's non-master branch, (like bugfix-rc , featurex-rc, work-status, etc.) You have total freedom in creating any branches in your forked repo, with any names you like
  # If your local repository was cloned from /nfs/casc/overture/ROSE/rose.git. 
  # There is no need to discard it. You can just add the rose-github.llnl's repo as an additional remote repository and push things there:
  git remote add github-llnl-youraccount-rose http://rose-github.llnl.gov/youraccount/rose.git
  git push github-llnl-youraccount-rose HEAD:refs/heads/bugfix-rc
    • You are encouraged to push your work to a remote branch with a -status suffix, which will trigger a pre-screening Jenkins job: http://hudson-rose-30:8080/view/Status/job/S0-pre-screening-before-code-review/. This is useful for making sure your pushes pass a minimum set of make check rules, including your own, before reviewers spend time reading your code. Reviewers can then see both your code and its test results.
  • add a pull (merge) request to merge bugfix-rc into your own fork's master:
    • note that by default the pull request uses rose-compiler/rose's master as the base branch (the destination of the merge). Change it to your own fork's master branch instead.
    • Also make sure the source (head) branch of the pull (merge) request is the one you want (bugfix-rc in this example)
    • Double-check that the diff tab of your pull request shows only the differences you made, without other changes brought in from the central repository. If it shows other changes, your own repository's master is out of sync with the central repository's master. Notify the system administrator (too1) of the problem, or fix it manually using the Troubleshooting section of this page.
  • notify a reviewer that you have a pull request (requesting to merge your bugfix-rc into your master branch)
    • You can assign the pull request to the reviewer so an email notification will be automatically sent to the reviewer
    • Or you can add a comment within the pull request mentioning @revieweraccount. NOTE: please click "Comment on this issue" only once, then manually refresh the web page. Github Enterprise has a bug (bug79) that prevents newly added comments from being shown automatically.
    • Or you can just email the reviewer
  • wait for the reviewer's feedback.
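Before asking for a review, you can check from the command line which commits your pull request would bring in, which is roughly what its diff tab should reflect. The helpers below are a minimal sketch, not part of ROSE or its scripts; the names pr_commits and pr_commit_count are hypothetical, and the refs you pass (for example origin/master and bugfix-rc) depend on how your remotes are named.

```shell
# Hypothetical helpers (not part of ROSE): inspect what a pull request from
# HEAD_REF into BASE_REF would contain.

# List the commits reachable from HEAD_REF but not from BASE_REF.
pr_commits() {
  base_ref="$1"
  head_ref="$2"
  git log --oneline "${base_ref}..${head_ref}"
}

# Count the same commits; an unexpectedly large number may mean that your
# fork's master is out of sync with the central master.
pr_commit_count() {
  git rev-list --count "$1..$2"
}
```

For example, after git fetch origin, running pr_commits origin/master bugfix-rc should list only your own commits.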

Review results

  • Completion and Submission To Jenkins
    • If your code passes code review, the reviewer should have merged your bugfix-rc into your master. Jenkins will automatically poll your master and do the testing/merging
  • How To Make Changes
    • To implement changes, make local edits and local commits, push them to your remote branch, and send a merge request again
  • Taking Code Review Seriously
    • Remember that code review is not an attack on you as a person. The purpose of code review is to allow a colleague to evaluate your code. This can take a considerable amount of time, so respect their efforts and take a fresh, serious look at your code.
    • Look through the reviewer's comments and either address them or explain the purpose of the code as it stands, then wait for a response
    • Some comments are mandatory changes; these must be addressed before you will pass code review
    • Some comments are suggestions. Think about them carefully: either form a rationale for keeping your approach, or consider the implications of changing your code. ROSE is a team effort, and we must take our colleagues seriously.
    • DO NOT CLOSE the pull request. Instead, push your new commits to the same branch and comment on the pull request to let the reviewer know there are new updates to review. This avoids unnecessary repetition.

Benefits of Code Review

  • Avoiding coding a feature that is already present.
    • Remember you are coding for a user, and we must try our best to write clear code
    • Code coherency is extremely important in a large project. Coherent code allows users to spend their time on their own projects rather than searching the Doxygen pages for an answer, only to find seven or eight ways to do the same thing without knowing the consequences of the different approaches
  • Coding As A Team
    • If every developer hid away and coded alone without regard to the features ROSE already has, it would confuse not only users but developers as well.
    • At a glance, the ROSE source directory weighs in at almost 1 GB. The compilation directory after make, make check, make install, make installcheck, and make distcheck comes in at 19 GB. Lesson: ROSE is large, and the chance of anyone knowing everything about ROSE and its functionality is rather slim
    • As a team, the project is large but manageable, so long as each person works on their particular section and asks about possible feature duplication and code readability

Reviewer Checklist

What to look out for as a code reviewer?

  • Be familiar with the current Coding Standard as a general guideline to perform the code review.
  • Allocate up to 1 hour at a time to review approximately 500-1000 lines of code: reviewing for longer may not pay off because of the limited attention span of the human brain
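To judge whether a pull request fits into one such session, you can total the added and deleted lines reported by git diff --numstat. This is a minimal sketch; review_size is a hypothetical helper name, and it counts changed lines rather than the lines you actually need to read, so treat the number as a rough guide.

```shell
# Hypothetical helper: total lines added plus deleted, read from
# `git diff --numstat` output on stdin (columns: added, deleted, path).
# Binary files are reported as "-<TAB>-<TAB>path" and are skipped.
review_size() {
  awk -F '\t' '$1 != "-" { total += $1 + $2 } END { print total + 0 }'
}
```

For example, git diff --numstat origin/master...bugfix-rc | review_size. The three-dot form diffs against the merge base, so only the branch's own changes are counted.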

What to check

Six major things to check:

  • Documentation: What are the commits about? Is this reflected in README files, source comments, or LaTeX files?
  • Style: Does the coding style follow our standard? Is the code clean, robust, and maintainable?
  • Interface: Does the code have a clean and simple interface for users?
  • Algorithm: Does the code have sufficient comments about what algorithm is used? Is the algorithm correct and efficient (space and time complexity)?
  • Implementation: Does the code correctly implement the documented algorithm(s)?
  • Testing: Does the code have the accompanying test translator and input test codes to ensure the contributions do what they are supposed to do?
    • Is Jenkins configured to trigger these tests (your work may require new prerequisite software or configure options)? Local tests on the developer's workstation do not count.

More details (a quick summary from the Coding Standard):

  • Naming conventions: File and directory names follow our standards; clear and intuitive
    • Directory structure: source code, test code, and documentation files are added into the correct locations
  • Maintainability: clarity of code; can somebody who did not write the code easily understand what the code does?
    • No looong functions: a function with hundreds of lines of code is a no-no
    • Architecture/design: the reasons and motivations for writing the code, and its design.
  • No duplication: similar code may already exist or can be extended
  • Re-use: can part of the code be refactored to be reusable by others?
  • Unit tests: make check rules are associated with each new feature to ensure the new feature will be tested and verified for expected behaviors
  • Sanity: no turning off, or relaxing, other tests to make the developer's commits pass Jenkins. In other words, no cheating.

Commenting

Reviewer comments should be clearly delimited into these three well-defined sections:

1. Mandatory: the details of the comment must be implemented in a new commit and added to the Pull Request before the code review can be completed.

2. Recommended: the details of the comment could represent a best-practice or, simply, it could be intended to provide some insight to the developer that they may have not thought about.

Both Mandatory and Recommended can be accompanied by the keyword Nitpick:

3. Nitpick: the details of the comment represent a fix that usually involves a spelling/grammatical or coding style correction. The main purpose of the nitpick indication is to let the developer know that you're not trying to be on their case and make their life difficult, but an error is an error, or there's a better way to do something.

Decisions

Make a clear and definitive decision for the code review:

  • Pass: The code does what it is supposed to do, with clear documentation and test cases. Merge and close the pull request.
  • Pass with future tasks: The commits are accepted, but some additional tasks are needed in the future to improve the code. They can be put into a separate set of commits and pushed later on.
  • Fail: Additional work is needed, such as better naming, better places to put files, more source comments, added regression tests, etc. Notify the developers of the issues and ask them to push a new set of commits addressing the corrections or improvements.

Giving negative feedback

We directly quote from http://www.mediawiki.org/wiki/Code_review_guide#Giving_negative_feedback

" Here are a few guidelines in the event you need to reject someone's submission or ask them to clean up their work:

  1. Focus your comments on the code and any objectively-observed behavior, not motivations; for example, don't state or imply assumptions about motivating factors like whether the developer was just too lazy or stupid to do things right.
  2. Be empathetic and kind. Recognize that the developer has probably put a lot of work in their idea, and thank them for their contribution if you feel comfortable and sincere in doing so (and try to muster the comfort and sincerity). Most importantly, put yourself in their shoes, and say something that indicates you've done so.
  3. Help them schedule their work. If their idea is a "not yet" kind of idea, try to recommend the best way you know of to get their idea on a backlog (i.e. the backlog most likely to eventually get revisited).
  4. Let them know where they can appeal your decision. For example, if the contributor doesn't have a history of being disruptive or dense, invite them to discuss the issue on wikitech-l.
  5. Be clear. Don't sugarcoat things so much that the central message is obscured.
  6. Most importantly, give the feedback quickly. While tactful is better (and you should learn from past mistakes), you can always apologize for a poorly-delivered comment with a quick followup. Don't just leave negative feedback to someone else or hope they aren't persistent enough to make their contribution stick."

Who should review what

Ideally, every ROSE contributor should participate in code review as a reviewer at some point so the benefits of peer-review can fully be fulfilled.

However, due to limited access to our internal Github Enterprise server, we currently have a centralized review process in which ROSE staff members (liao6, too1) serve as the default code reviewers. They are responsible for either reviewing the code themselves or delegating the review to other developers who either have better knowledge of the contributions or should be aware of them.

We are actively looking at better options and will gradually expand the pool of reviewers so that the reviewing step won't become a bottleneck.

TODO: use rosebot to automatically assign reviewers according to a hierarchical configuration of the source-tree.

What to avoid

  • Judging code by whether it's what the reviewer would have written
    • Given a problem, there are usually a dozen different ways to solve it. And given a solution, there's a million ways to render it as code.
  • Degenerating into nitpicks:
    • perfectionism may hurt progress. We should allow some non-critical improvements to be done in the next version/commits.
  • Feeling obligated to say something critical: it is perfectly fine to say "looks good, pass"
  • Delaying the review: reviews should not be rushed, but keep in mind that somebody is waiting for the review to be done in order to move forward

Criticism

Code reviews often degenerate into nitpicks. Brainstorming sessions and design reviews can be more productive.

  • This makes sense: the earlier we catch problems, the better, and design happens earlier, so designs should be reviewed. The same idea applies to requirements analysis.
  • To mitigate this risk, we now have rules for design document in our coding standard.

Troubleshooting

master is out-of-sync

The master branch of each developer's git repository (http://rose-github.llnl.gov) should be automatically synchronized with the central git repository's master branch (/nfs/casc/overture/ROSE/git/ROSE.git). In rare cases, it can fall out of sync. Here is an example of performing a manual synchronization:

1. Clone your Github repository:

$ cd ~/Development/projects/rose
$ git clone git@rose-github.llnl.gov:<user_oun>/rose.git
Cloning into ROSE...
remote: Counting objects: 216579, done.
remote: Compressing objects: 100% (55675/55675), done.
remote: Total 216579 (delta 159850), reused 211131 (delta 155786)
Receiving objects: 100% (216579/216579), 296.41 MiB | 35.65 MiB/s, done.
Resolving deltas: 100% (159850/159850), done.

2. Add the central repository as a remote repository:

$ git remote add central /nfs/casc/overture/ROSE/git/ROSE.git
$ git fetch central
From /nfs/casc/overture/ROSE/git/ROSE.git
 * [new branch]      master     -> central/master
 ...

3. Push the central master branch to your Github's master branch:

$ git push origin central/master:refs/heads/master
Total 0 (delta 0), reused 0 (delta 0)
To git@rose-github.llnl.gov:<user_oun>/rose.git
   16101fd..563b510  central/master -> master

master cannot be synchronized

In rare cases, your repository's master branch cannot be automatically synchronized. This is most likely due to merge conflicts. You will receive an error message through an automated email, resembling the following (last updated on 7/24/2012):

To git@rose-github.llnl.gov:lin32/rose.git
! [rejected]        origin/master -> master (non-fast forward)
error: failed to push some refs to 'git@rose-github.llnl.gov:lin32/rose.git'

---

Your master branch at [rose-github.llnl.gov:lin32/rose.git] cannot be automatically updated with [/nfs/casc/overture/ROSE/git/ROSE.git:master]

Please manually force the update:

Add the central repository as a remote, call it "nfs":

  $ git remote add nfs /nfs/casc/overture/ROSE/git/ROSE.git

1. First, try to manually perform a merge in your local repository:

  # 1. Checkout and update your Github's master branch
  $ git checkout master
  $ git pull origin master

  # 2. Merge the central master into your local master
  $ git pull nfs master
  <no merge conflicts>

  # 3. Synchronize your local master to your Github's master
  $ git push origin HEAD:refs/heads/master

2. Otherwise, try to resolve the conflict.

3. Finally, if all else fails, force the synchronization:

  $ git push --force origin nfs/master:refs/heads/master

  WARNING: your master branch on Github will be overwritten, so make sure
  you have sufficient backups, and take precautions.

Please simply follow the email's instructions to force the update of your Github's master branch.
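Before forcing anything, it can help to check how your fork's master relates to the central master. The function below is a hypothetical sketch (not part of the ROSE scripts) built on git merge-base --is-ancestor. It assumes you have fetched both remotes, e.g. origin for your fork and nfs for the central repository, as in the email above.

```shell
# Hypothetical helper: classify how LOCAL_REF (e.g. origin/master, your fork's
# master) relates to UPSTREAM_REF (e.g. nfs/master, the central master).
sync_state() {
  local_ref="$1"
  upstream_ref="$2"
  local_sha="$(git rev-parse "$local_ref")" || return 2
  upstream_sha="$(git rev-parse "$upstream_ref")" || return 2
  if [ "$local_sha" = "$upstream_sha" ]; then
    echo "in-sync"
  elif git merge-base --is-ancestor "$local_ref" "$upstream_ref"; then
    echo "behind"    # pushing the upstream master is a fast-forward
  elif git merge-base --is-ancestor "$upstream_ref" "$local_ref"; then
    echo "ahead"     # fork has commits the upstream lacks; a push is rejected
  else
    echo "diverged"  # both sides have unique commits; merge or force as above
  fi
}
```

For example, git fetch origin && git fetch nfs && sync_state origin/master nfs/master. Only the behind case can be fixed by a plain push; ahead and diverged correspond to the non-fast-forward rejection shown above.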

Past Software Experience

In the past, we have experimented with other code review tools:

Gerrit (Google)

In short:

  • Gerrit's user interface is not user-friendly: it is complex and therefore more confusing, especially when compared to Github's Pull Request mechanism for code review.
  • Gerrit's remote API was not mature enough to handle our workflow, and we had to hack several things just to partially suit our needs. On the other hand, Github has a great remote API which is easily accessible through Ruby scripting, a very popular language for the domain of web interfaces and development.
  • Gerrit is not as popular as Github, which is important for our project to gain traction. Also, more people are already familiar with Github, which makes it easier for them to use.

TODO

  • TOP-PRIORITY: add pre-screening Jenkins job before manual code review kicks in
  • Research, install, and test Facebook's Phabricator: http://phabricator.org/

Connection to Jenkins

See Continuous_Integration#Connection_to_Code_Review

Continuous Integration

ROSE Continuous integration using Git and Jenkins (Code Review Omitted for simpler explanation)

Motivation

Without automated continuous integration, we had frequent incidents like:

  • Developer A commits something to our central git repository's master branch. The commits contain bugs that break our build and take a long time to fix. The central master branch is then left in a broken state for weeks, so nobody can check anything in or out.
  • Developer A does a lot of wonderful work offline for months. But her work is later found to be incompatible with another developer's work, and has merge conflicts that cannot be resolved.

Overview

The ROSE project uses a workflow that automates the central principles of continuous integration in order to make integrating the work from different developers a non-event. Because the integration process only integrates changes that pass all tests into ROSE, we encourage all developers to stay in sync with the latest version.

A high-level overview of the development model used by ROSE developers:

  • Step 1: Taking advantage of the distributed source code repositories based on git, each developer should first clone his/her own repository from our central git repository (or its mirrors/clones/forks).
  • Step 2: Then a feature or a bugfix can be developed in isolation within the private repository. The developer can create any number of private branches. Each branch should relate to a feature that this developer is working on and be relatively short-lived. The developer can commit changes to the private repository without maintaining an active connection to the shared repository.
  • Step 3: When work is finished and locally tested (make, make check, and make distcheck -j<n>), she can pull the latest commits from the central repository's master branch.
  • Step 4: She then can push all accumulated commits within the private repository to her branch within the shared repository. We create a dedicated branch within the central repository for each developer and establish access control of the branch so only an authorized developer can push commits to a particular branch of the shared repository.
  • Steps 5-6 (automated): Any commits from a developer’s private repository will not be immediately merged to the master branch of the shared repository.

In fact, we have access control to prevent any developer from pushing commits to the master branch within the shared repository. A continuous integration server called Jenkins is actively monitoring each developer’s branch within the central repository and will initiate comprehensive commit tests upon the branch once new commits are detected. Finally, Jenkins will merge the new commits to the master branch of the central repository if all tests pass. If a single test fails, Jenkins will report the error and the responsible developer should address the error in his private repository and push improved commits again.

As a result, the master branch of the central git repository is mostly stable and can be a good candidate for our external release. On top of the master branch of the central git repository, we further have more comprehensive release tests in Jenkins. If all the release tests pass, an external release based on the master branch will be made publicly available.
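The gatekeeping rule described above (merge into master only when every test passes) can be illustrated with a small shell function. This is a hypothetical sketch for explanation only, not the actual Jenkins job: integrate_if_green is an invented name, and each test is passed as a shell command string (in Jenkins these would be make, make check, make distcheck, and the other integration tests).

```shell
# Hypothetical sketch of the CI gate (not the real Jenkins job): merge the
# candidate branch into the target branch only if every test command
# succeeds; otherwise leave the target branch untouched.
integrate_if_green() {
  candidate="$1"   # e.g. a developer's -rc branch
  target="$2"      # e.g. master
  shift 2          # remaining arguments: test commands, one string each

  git checkout -q "$candidate" || return 1
  for test_cmd in "$@"; do
    if ! sh -c "$test_cmd"; then
      echo "FAILED: $test_cmd ($candidate not merged into $target)" >&2
      return 1
    fi
  done
  git checkout -q "$target" && git merge -q --no-edit "$candidate"
}
```

For example, integrate_if_green alice-rc master "make -j8" "make check -j8" leaves master untouched if either command fails.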

Tests on Jenkins

We use Jenkins ( http://hudson-rose-30:8080/ ) to test commits added to developer's release candidate branches at the central git repository.

The tests are organized into three categories:

  • Integration: tests used to check if the new commits can pass various "make check" rules, compatibility tests, portability tests, configuration tests, and so on. If all tests pass, the commits will be merged (or integrated) into the master branch of the central repository.
  • Release: additional tests, using external benchmarks, run on the updated master branch of the central repository. If all tests pass, the head of the master branch will be released as a stable snapshot for public file package releases (generated by "make dist").
  • Others: for informational purposes only; not currently used in our production workflow.

So each push (one or more commits to a -rc branch) goes through two stages: the Integration test stage and the Release test stage.

It is each developer's responsibility to make sure their commits pass BOTH stages by fixing any bugs discovered by the tests.

Installed Software Packages

Here we list software packages installed and used by Jenkins:

  • Yices: /export/tmp.hudson-rose/opt/yices/1.0.34

Development Jenkins

Several configurations:

GCC_VERSION=4.4.7
BOOST_VERSION=1_47_0
source /nfs/casc/overture/ROSE/opt/rhel6/x86_64/rose_environment.sh
__rose__JAVA_VERSION_HOME=/nfs/casc/overture/ROSE/opt/rhel6/x86_64/java/jdk/1.7.0_51

GCC_VERSION=4.8.1
BOOST_VERSION=1_50_0
source /nfs/casc/overture/ROSE/opt/rhel6/x86_64/rose_environment.sh
__rose__JAVA_VERSION_HOME=/nfs/casc/overture/ROSE/opt/rhel6/x86_64/java/jdk/1.7.0_51

Check Testing Results

It is possible to manually track how your commits are doing in the test pipeline within Jenkins (http://hudson-rose-30:8080/), but this can be tedious and overwhelming.

So we provide a dashboard (http://sealavender:4000/) that summarizes the commits to your release candidate branch (-rc) and the pass/fail status of each integration test.

Note: It's possible that all of your testing jobs (finally) pass, but the actual integration is not performed. This typically occurs when one of your jobs has a system failure, for instance, and has to be restarted manually. If you see that all of your jobs have passed but your work has not been integrated, please let the Jenkins administrator know.

Frequently Failed Jobs

See details at ROSE Compiler Framework/Jenkins Failures

Connection to Code Review

Connection between Github Enterprise and Jenkins

In reality, most LLNL developers are now asked to push their work to Github Enterprise for code review first, instead of pushing directly to our central git repository. The synchronization between the Github Enterprise code review repositories and our central git repository is automated.

Auto Pull

Auto pull: we have another Jenkins instance (https://hudson-rose-30:8443/jenkins/) that serves as the bridge between Github Enterprise and our main production Jenkins.

  • For each private repository on Github Enterprise, we have a Jenkins job that monitors the master branch for approved pull (merge) requests. If there are any new approved commits, the job transfers them to that developer's -reviewed-rc branch in the central repository.

Configuration of the auto pull job:

  • Source code management
    • git: git@github.llnl.gov:account_name/rose.git
    • Branches to build: github/master
  • Build trigger: Poll SCM, schedule "* * * * *" (every minute)
  • Execute shell
##
## Add /nfs as remote
##
## `|| true`: don't error if remote exists
##
git remote add nfs /nfs/casc/overture/ROSE/git/ROSE.git || true
git fetch nfs

##
## Push to /nfs *-rc
##
if [ -n "$(git log --oneline nfs/master..github/master)" ]; then
  git push --force nfs "$GIT_BRANCH":refs/heads/oun-reviewed-rc
fi

Auto Push

Auto push: A Jenkins job is responsible for propagating latest central master contents to all private repositories on github.llnl.gov

The job configuration:

  • Source code management:
    • Git: /nfs/casc/overture/ROSE/git/ROSE.git
    • Branches to build: */master
  • Build trigger: Build after other projects are built: Commit
  • Execute shell
# One account name per line; a trailing backslash here would join adjacent names
USERS="
user1
user2
"

for user in $USERS; do
  tmpfile="$(mktemp)"
  # Don't abort the job if one push fails; inspect its error output instead
  git push git@github.llnl.gov:"$user"/rose.git origin/master:refs/heads/master 2>"$tmpfile" || true
  cat "$tmpfile"
  if grep -q "non-fast.*forward" "$tmpfile"; then
    echo "Sending error email to [${user}@llnl.gov] because their github/master is non-fast-forwardable"
    # email details are omitted here.
  fi
  rm -f "$tmpfile"
done
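The grep pattern non-fast.*forward in the job above is deliberately loose: it matches both the non-fast forward wording shown in the email earlier on this page and the hyphenated non-fast-forward wording printed by newer git versions. The hypothetical helper below isolates that check so it can be tried on a saved error message; the function name and the other-error category are illustrative assumptions, not part of the real job.

```shell
# Hypothetical helper: classify the saved stderr of a `git push`.
# $1: file containing the push's stderr
classify_push_error() {
  if grep -q "non-fast.*forward" "$1"; then
    echo "non-fast-forward"   # fork's master diverged; needs a manual sync
  elif grep -qE "rejected|error:" "$1"; then
    echo "other-error"
  else
    echo "ok"
  fi
}
```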

Reproduce Jenkins Job failures

Reproducing a failure requires several key elements:

  • the Jenkins script repository
  • the right version of ROSE
  • the hardware machine
  • the environment

Assume one failed job is https://hudson-rose-44.llnl.gov:9443/jenkins/job/development-compile-with-autotools-default-EDG-RHEL6/892/gcc=4.4.7,label=RHEL6,rhel=6/parameters/

Steps: the job's configuration page shows the shell script the job runs:

#!/bin/bash -ex

export GCC_VERSION="$gcc"

rm -rf ./jenkins-build-scripts/ || exit 1
git clone rose-dev@rosecompiler1.llnl.gov:jenkins/dev/jenkins-build-scripts.git || exit 1
source ./jenkins-build-scripts/config/env-Linux.sh $gcc $rhel || exit 1
./jenkins-build-scripts/development/development-compile-with-autotools-default-EDG-RHEL6.sh $gcc || exit 1
  • The same configuration page has a configuration matrix listing all user-defined parameters and values, including gcc versions, OS versions, and others.

Now manually check out the commit and run the script, passing the right gcc and rhel versions. As in the job script, the environment script must be sourced so that its settings apply to the current shell:

git clone rose-dev@rosecompiler1.llnl.gov:jenkins/dev/jenkins-build-scripts.git
git clone rose-dev@rosecompiler1.llnl.gov:rose/scratch/rose sourcetree
cd sourcetree/
source ../jenkins-build-scripts/config/env-Linux.sh "4.4.7" 6
git checkout -b autopar 83abd459eee1b575b4e7fab04a9f1dfc4955f02a
git submodule init
git submodule update
../jenkins-build-scripts/development/development-compile-with-autotools-default-EDG-RHEL6.sh 4.4.7
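The gcc and rhel values to pass come from the matrix part of the failed job's URL (gcc=4.4.7,label=RHEL6,rhel=6 in the example above). A small helper can extract them; matrix_param is a hypothetical name, and it assumes the axis values appear in the URL as comma-separated name=value pairs, as in that example.

```shell
# Hypothetical helper: print the value of one matrix axis (e.g. gcc or rhel)
# from a Jenkins matrix-job URL; prints nothing if the axis is absent.
matrix_param() {
  url="$1"
  name="$2"
  # Split the URL on '/' and ',' so each name=value pair is on its own line
  printf '%s\n' "$url" | tr '/,' '\n\n' | sed -n "s/^${name}=//p" | head -n 1
}
```

For instance, with url set to the failed job's URL: source ../jenkins-build-scripts/config/env-Linux.sh "$(matrix_param "$url" gcc)" "$(matrix_param "$url" rhel)".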

TODO

High priority:

  • Add a pre-screening job before manual code review kicks in. The pre-screening job can make sure the code to be reviewed compiles with minimal warning messages and has the required make check rules to run tests.
  • Enable email notification for the final results of each test.
  • Incrementally add more compilation tests using external benchmarks as integration tests.
    • Initial two jobs: SPEC CPU benchmark + NPB Fortran benchmarks
  • Better integration with Github Enterprise

Third-party software installed for testing in Jenkins:

  • Yices (http://yices.csl.sri.com/)
    • Download Yices 1; the latest version is preferred.
    • Untar the Yices tarball; the resulting directory (named like yices-1.0.34) is YICES_INSTALL.
    • Pass --with-yices=YICES_INSTALL to ROSE's configure.
    • Add YICES_INSTALL/lib to LD_LIBRARY_PATH on Linux (DYLD_LIBRARY_PATH on Mac), just as you would add Boost's lib directory to LD_LIBRARY_PATH.
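The last step differs between Linux and Mac only in the name of the variable. Below is a minimal sketch of a helper that picks the variable based on uname; add_lib_path is a hypothetical name, and the install path in the usage note is only an example.

```shell
# Hypothetical helper: prepend a library directory (e.g. YICES_INSTALL/lib or
# Boost's lib) to the runtime library search path of the current shell.
add_lib_path() {
  dir="$1"
  if [ "$(uname -s)" = "Darwin" ]; then
    # macOS uses DYLD_LIBRARY_PATH
    DYLD_LIBRARY_PATH="$dir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
    export DYLD_LIBRARY_PATH
  else
    # Linux uses LD_LIBRARY_PATH
    LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
  fi
}
```

For example, with YICES_INSTALL=$HOME/opt/yices-1.0.34 (an example location), run add_lib_path "$YICES_INSTALL/lib" before configuring ROSE with --with-yices="$YICES_INSTALL" and running the resulting tools.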

References

  • Files used to generate the figure: feel free to add new versions as new slides: link

Frequently Asked Questions (FAQ)

We collect a list of frequently asked questions about ROSE, mostly from the rose-public mailing list link

General

How to search the rose-public mailing list for previously asked questions?

Use the following query in Google Search:
site:https://mailman.nersc.gov/pipermail/rose-public $(ADD_YOUR_SEARCH_TERM_HERE)

How to check the version of ROSE?

Look for ROSE_PACKAGE_VERSION in ROSE_Install_path/include/rose/rosePublicConfig.h:

/* Define to the version of this package. */
#define ROSE_PACKAGE_VERSION "0.9.8.54"


To check the version in your code:

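#include <cstdlib>
#include <string>
#include <vector>
#include "rose.h"  // assumed to declare rose::StringUtility and ROSE_PACKAGE_VERSION

// Returns true if the installed ROSE version (ROSE_PACKAGE_VERSION) is at least `need`.
bool checkRoseVersionNumber(const std::string &need) {
    std::vector<std::string> needParts = rose::StringUtility::split('.', need);
    std::vector<std::string> haveParts = rose::StringUtility::split('.', ROSE_PACKAGE_VERSION);

    for (size_t i=0; i < needParts.size() && i < haveParts.size(); ++i) {
        // Compare each field as a number: as strings, "10" < "9"
        long needNum = std::strtol(needParts[i].c_str(), NULL, 10);
        long haveNum = std::strtol(haveParts[i].c_str(), NULL, 10);
        if (needNum != haveNum)
            return needNum < haveNum;
    }
    // E.g., need = "1.2" and have = "1.2.x", or vice versa
    return true;
}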
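The same check can be done from a shell script, for example in a build script that needs a minimum ROSE version. This sketch reads the ROSE_PACKAGE_VERSION line from rosePublicConfig.h and compares versions field by field as numbers. The function names are hypothetical, and unlike the C++ function above, a missing field counts as 0, so an installed 0.9.8 does not satisfy a requirement of 0.9.8.1.

```shell
# Hypothetical helper: print ROSE_PACKAGE_VERSION from a rosePublicConfig.h.
# $1: path to rosePublicConfig.h
rose_version() {
  sed -n 's/^#define ROSE_PACKAGE_VERSION "\(.*\)"$/\1/p' "$1"
}

# Hypothetical helper: succeed (exit 0) if version $1 >= version $2,
# comparing dot-separated fields numerically; missing fields count as 0.
version_at_least() {
  awk -v have="$1" -v need="$2" 'BEGIN {
    nh = split(have, h, "."); nn = split(need, n, ".")
    for (i = 1; i <= nn; i++) {
      hv = (i <= nh) ? h[i] + 0 : 0
      if (hv > n[i] + 0) exit 0
      if (hv < n[i] + 0) exit 1
    }
    exit 0
  }'
}
```

For example, version_at_least "$(rose_version "$ROSE_INSTALL/include/rose/rosePublicConfig.h")" 0.9.8 || echo "ROSE is too old", where ROSE_INSTALL is assumed to hold your installation prefix.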

Why can't ROSE staff members answer all my questions?

It can feel very frustrating when you get no responses to your questions submitted to the rose-public@nersc.gov mailing list. You may also wonder why the ROSE staff sometimes cannot help either.

Here are some possible excuses:

  • They are just as busy as everybody else in the research and development fields. They may be working around the clock to meet deadlines for proposals, papers, project reviews, deliverables, etc.
  • They don't know every corner of their own compiler, given the breadth and depth of contributions made to ROSE by collaborators, former staff members, post-docs, and interns. Moreover, most contributions lack good documentation--something that should be remedied in the future.
  • Some questions are simply difficult and open research and development questions. They may have no clue, either.
  • They just feel lazy sometimes or are taking a thing called vacation.

Possible alternatives to have your questions answered and your problems solved in a timely fashion:

  • Please do your own homework first (e.g. Google).
  • The ROSE team is actively addressing the documentation problem, through an internal code review process to enforce well-documented contributions going forward.
  • Help others to help yourself. Answer questions on the rose-public@nersc.gov mailing list and contribute to this community-editable Wikibook.
  • Find ways to formally collaborate with, or fund, the ROSE team. Things go faster when money is flowing :-) Sad, but true, reality in this busy world.

How many lines of source code does ROSE have?

Excluding the EDG submodule and all source code comments, the core of ROSE (rose/src) has about 674,000 lines of C/C++ source code as of July 11, 2012.

Including tests, projects, and tutorial directories, ROSE has about 2 million lines of code.

Some details are shown below:

[rose/src]./cloc-1.56.pl .
    3076 text files.
    2871 unique files.                                          
     716 files ignored.

http://cloc.sourceforge.net v 1.56  T=26.0 s (91.7 files/s, 39573.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                            908          75280          93960         354636
C                              123          12010           3717         199087
C/C++ Header                   915          28302          38412         121373
Bourne Shell                    17           3346           4347          25326
Perl                             4            743           1078           7888
Java                            18           1999           4517           7096
m4                               1            747             20           6489
Python                          34           1984           1174           5363
make                           148           1682           1071           3666
C#                              11            899            274           2546
SQL                              1              0              0           1817
Pascal                           5            650             31           1779
CMake                          168           1748           4880           1702
yacc                             3            352            186           1544
Visual Basic                     6            228            421           1180
Ruby                            11            281            181            809
Teamcenter def                   3              3              0            606
lex                              2            103             47            331
CSS                              1             95             32            314
Fortran 90                       1             34              6            244
Tcl/Tk                           2             29              6            212
HTML                             1              8              0             15
-------------------------------------------------------------------------------
SUM:                          2383         130523         154360         744023
-------------------------------------------------------------------------------
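To reproduce a C/C++ total like the one quoted above from a cloc report, sum the code column of the C, C++, and C/C++ Header rows. Below is a minimal awk-based sketch; the function name is hypothetical, and it assumes cloc's plain-text report layout shown above, where the code count is the last column.

```shell
# Hypothetical helper: sum the "code" column (last field) of the C, C++ and
# C/C++ Header rows of a cloc text report read from stdin.
cloc_c_cpp_code() {
  awk '/^(C\+\+|C|C\/C\+\+ Header)[ \t]/ { total += $NF } END { print total + 0 }'
}
```

For example: ./cloc-1.56.pl . | cloc_c_cpp_code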

How large is ROSE?

To show top level information only (in MB): du -msl * | sort -nr

170	tests
109	projects
90	src
19	docs
16	winspecific
16	ROSE_ResearchPapers
15	binaries
7	scripts
5	LicenseInformation
4	tutorial
4	autom4te.cache
2	libltdl
2	exampleTranslators
2	configure
2	config
2	ChangeLog

Sort directories by their sizes in megabytes:

 du -m | sort -nr >~/size.txt
709	.
250	./.git
245	./.git/objects
243	./.git/objects/pack
170	./tests
109	./projects
90	./src
76	./tests/CompileTests
50	./tests/RunTests
40	./tests/RunTests/FortranTests
34	./tests/RunTests/FortranTests/LANL_POP
29	./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1
27	./src/3rdPartyLibraries
23	./tests/roseTests
23	./src/frontend
22	./tests/CompileTests/Fortran_tests
21	./tests/CompilerOptionsTests
19	./docs
18	./tests/CompileTests/RoseExample_tests
18	./src/midend
18	./docs/Rose
16	./winspecific
16	./ROSE_ResearchPapers
15	./tests/CompileTests/Fortran_tests/gfortranTestSuite
15	./binaries/samples
15	./binaries
14	./tests/CompileTests/Fortran_tests/gfortranTestSuite/gfortran.dg
14	./src/roseExtensions
11	./projects/traceAnalysis
10	./tests/CompileTests/A++Code
10	./tests/CompilerOptionsTests/testCpreprocessorOption
10	./tests/CompilerOptionsTests/A++Code
10	./src/roseExtensions/qtWidgets
10	./src/frontend/Disassemblers
10	./projects/symbolicAnalysisFramework
10	./projects/SATIrE
10	./projects/compass
9	./winspecific/MSVS_ROSE
9	./tests/RunTests/A++Tests
9	./tests/roseTests/binaryTests
9	./src/frontend/SageIII
9	./projects/symbolicAnalysisFramework/src
9	./docs/Rose/powerpoints
8	./winspecific/MSVS_project_ROSETTA_empty
8	./projects/simulator
7	./tests/RunTests/FortranTests/LANL_POP_OLD
7	./tests/CompileTests/Cxx_tests
7	./src/midend/programTransformation
7	./src/midend/programAnalysis
7	./src/3rdPartyLibraries/libharu-2.1.0
7	./scripts
7	./projects/symbolicAnalysisFramework/src/mpiAnal
7	./projects/RTC
6	./winspecific/MSVS_ROSE/Debug
6	./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1/ncdap_test
6	./tests/roseTests/programAnalysisTests
6	./src/3rdPartyLibraries/ckpt
6	./src/3rdPartyLibraries/antlr-jars
6	./projects/SATIrE/src
5	./tests/RunTests/FortranTests/LANL_POP/pop-distro
5	./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1/libcf
5	./tests/CompileTests/ElsaTestCases
5	./src/ROSETTA
5	./src/3rdPartyLibraries/qrose
5	./projects/DatalogAnalysis
5	./projects/backstroke
5	./LicenseInformation
5	./docs/Rose/AstProcessing

To list files sorted by size, largest first (GNU ls -s reports sizes in 1 KB blocks by default):

 find . -type f -print0 | xargs -0 ls -s | sort -k1,1rn
241568 ./.git/objects/pack/pack-f366503d291fc33cb201781e641d688390e7f309.pack
13484 ./tests/CompileTests/RoseExample_tests/Cxx_Grammar.h
10240 ./projects/traceAnalysis/vmp-hw-part.trace
6324 ./tests/RunTests/FortranTests/LANL_POP_OLD/poptest.tgz
5828 ./winspecific/MSVS_ROSE/Debug/MSVS_ROSETTA.pdb
4732 ./.git/objects/pack/pack-f366503d291fc33cb201781e641d688390e7f309.idx
4488 ./binaries/samples/bgl-helloworld-mpicc
4488 ./binaries/samples/bgl-helloworld-mpixlc
4080 ./LicenseInformation/edison_group.pdf
3968 ./projects/RTC/tags
3952 ./src/frontend/Disassemblers/x86-InstructionSetReference-NZ.pdf
3908 ./tests/CompileTests/RoseExample_tests/trial_Cxx_Grammar.C
3572 ./winspecific/MSVS_project_ROSETTA_empty/MSVS_project_ROSETTA_empty.ncb
3424 ./src/frontend/Disassemblers/x86-InstructionSetReference-AM.pdf
2868 ./.git/index
2864 ./projects/compassDistribution/COMPASS_SUBMIT.tar.gz
2864 ./projects/COMPASS_SUBMIT.tar.gz
2740 ./ROSE_ResearchPapers/2007-CommunicatingSoftwareArchitectureUsingAUnifiedSingle-ViewVisualization-ICECCS.pdf
2592 ./docs/Rose/powerpoints/rose_compiler_users.pptx
2428 ./src/3rdPartyLibraries/ckpt/wrapckpt.c
2408 ./projects/DatalogAnalysis/jars/weka.jar
2220 ./scripts/graph.tar
1900 ./src/3rdPartyLibraries/antlr-jars/antlr-3.3-complete.jar
1884 ./src/3rdPartyLibraries/antlr-jars/antlr-3.2.jar
1848 ./src/midend/programTransformation/ompLowering/run_me_defs.inc
1772 ./src/3rdPartyLibraries/qrose/docs/QROSE.pdf
1732 ./tests/CompileTests/Cxx_tests/longFile.C
1724 ./src/midend/programTransformation/ompLowering/run_me_task_defs.inc
1656 ./ChangeLog
1548 ./tests/roseTests/binaryTests/yicesSemanticsExe.ans
1548 ./tests/roseTests/binaryTests/yicesSemanticsLib.ans
1480 ./ROSE_ResearchPapers/1997-ExpressionTemplatePerformanceIssues-IPPS.pdf
1408 ./docs/Rose/powerpoints/ExaCT_AllHands_March2012_ROSE.pptx

...

Compilation

Cannot download the EDG binary tar ball

There are three possible reasons:

  • The website hosting the EDG binaries is down. There is a manual way to get the binary.
  • ROSE does not support the platform you use, so no EDG binary is available for you.
  • You cloned ROSE from an unofficial repository, so the build process cannot figure out the right version of the EDG binary for you. A solution is described below.

The rosecompiler.org website may be down for maintenance.

In that case, you may encounter the following error message:

make[3]: Entering directory `/home/leo/workspace/github-rose/buildtree/src/frontend/CxxFrontend'
test -d /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries && cp /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries/roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz . || wget http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz
--2012-08-05 12:58:29-- http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz
Resolving www.rosecompiler.org... 128.55.6.204
Connecting to www.rosecompiler.org|128.55.6.204|:80... failed: No route to host.
make[3]: *** [roseBinaryEDG-3-3-i686-pc-linux-gnu-GNU-4.4-32fe4e698c2e4a90dba3ee5533951d4c.tar.gz] Error 4

In this case, ask for the missing tar ball or find it at our backup location.

You do not have to clone the entire EDG binary repository, because it is large. You can download only the tar ball you need by clicking the raw file link on github.com.

Once you have the tar ball, copy it into the CxxFrontend subdirectory of your build tree:

  • buildtree/src/frontend/CxxFrontend

You should then be able to build ROSE normally by typing make.

TODO: automate the search using the alternative path to obtain the EDG binary.

Another possible reason is that you cloned your local ROSE repository from an unofficial repository.

  • To match the ROSE source with the correct EDG binary, the build requires a canonical repository to be available. Without one, the build fails with output like the following:

make[3]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend/Clang'
Unable to find a remote tracking a canonical repository.  Please add a
canonical repository as a remote and ensure it is up to date.  Currently
configured remotes are:

   origin => git@xxx.com/myrose.git

Potential canonical repositories include:

   anything ending with "rose.git" (case insensitive)
Unable to find a remote tracking a canonical repository.  Please add a
canonical repository as a remote and ensure it is up to date.  Currently
configured remotes are:

   origin => git@xxx.com/myrose.git

Potential canonical repositories include:

   anything ending with "rose.git" (case insensitive)
make[3]: Entering directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend'
test -d /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries && cp /nfs/casc/overture/ROSE/git/ROSE_EDG_Binaries/roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz  . || wget http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz
--2013-02-15 17:26:42--  http://www.rosecompiler.org/edg_binaries/roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz
Resolving www.rosecompiler.org... 128.55.6.204
Connecting to www.rosecompiler.org|128.55.6.204|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2013-02-15 17:26:42 ERROR 404: Not Found.

make[3]: *** [roseBinaryEDG-3-3-x86_64-pc-linux-gnu-GNU-4.3-.tar.gz] Error 1
make[3]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend/CxxFrontend'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src/frontend'
make: *** [all-recursive] Error 1
make: Leaving directory `/global/project/projectdirs/rosecompiler/rose-project-workspace/xomp-instr/buildtree/src'

Solution: add an official ROSE repository as an additional remote of your local repository.

  • Add a canonical repository, such as the one on GitHub: git remote add official-rose https://github.com/rose-compiler/rose.git
  • Run git fetch official-rose to retrieve commit hashes and related data from the canonical repository.
  • Build ROSE again. The build should find the canonical repository you just added and use it to find a matching EDG binary.

How to access EDG or EDG-SAGE connection code?

From page 5 of http://rosecompiler.org/ROSE_UserManual/ROSE-UserManual.pdf

The connection code that was used to translate EDG’s AST to SAGE III was derived loosely from the EDG C++ source generator and has formed the basis of the SAGE III translator from EDG to SAGE III’s IR.

Under the license we have, the EDG source code and the translation from the EDG AST are excluded from source releases and are made available only in binary form. No part of the EDG work is visible to the user of ROSE. The EDG sources are available only to those who have an EDG research or commercial license.

Chapter 2.6 "Getting a Free EDG License for Research Use" of the manual has instructions about how to obtain the EDG license.

Once you obtain the license, please contact the staff members of ROSE to verify your license. After that, they will give you more instructions about how to proceed.

How to speed up compiling ROSE?

Question: It takes hours to compile ROSE. How can I speed up this process?

Answer:

  • If you have a multi-core processor, use make -j4 to run four parallel jobs, or more if you have more cores.
  • You can also build only librose.so under src/ by typing make -C src/ -j4
  • Alternatively, enable only the languages you are interested in when you run configure, such as:
    • ../sourcetree/configure --enable-only-c # if you are only interested in C/C++ support
    • ../sourcetree/configure --enable-only-fortran # if you are only interested in Fortran support
    • ../sourcetree/configure --help # show all other options for enabling only a few languages

Can ROSE accept incomplete code?

https://mailman.nersc.gov/pipermail/rose-public/2011-July/001015.html

ROSE does not handle incomplete code, though this might be possible in the future. It would be language dependent and would likely depend heavily on some of the language-specific tools that we use internally. This is, however, not really a priority for our work. If you want to demonstrate, for example, how some of our internal tools (or alternative tools that we could use) handle incomplete code, that might be interesting and we could discuss it.

For example, we are not presently using Clang, but if it handled incomplete code, that might be interesting for the future. I recall that some of the latest EDG work might handle some incomplete code. If that is true, it might be interesting as well. I have not attempted to handle incomplete code with OFP, so I am not sure how well that could be expected to work. Similarly, I don't know what incomplete code handling capabilities the ECJ Java support has. If you know the answers to any of these questions, we could discuss this further.

I have some doubts about how much meaningful information can come from incomplete code analysis, and that worries me a bit. I expect it is very language dependent, and there would likely be some constraints on the incomplete code. So understanding the subject better would be an additional requirement for me.

Can ROSE analyze Linux Kernel sources?

https://mailman.nersc.gov/pipermail/rose-public/2011-April/000856.html

Question: I'm trying to analyze the Linux kernel. I was not sure of the size of the code-base that can be handled by ROSE, and could not find references as to whether it has been tried on the Linux kernel source. As of now I'm trying to run the identity translator on the source, and would like to know if it can be done using ROSE, and if it has been successfully tested before.

Short answer: Not for now

Long answer: We are using EDG 3.3 internally by default. This version of EDG does not handle the GNU-specific register modifiers used in the asm() statements of the Linux kernel code. There might be other problems, but that was at least the one we noticed in previous work on this some time ago. We are working on upgrading the EDG frontend to a more recent version, 4.4.

Can ROSE compile C++ Boost library?

https://mailman.nersc.gov/pipermail/rose-public/2010-November/000544.html

Not yet.

I know of a few cases where ROSE can't handle parts of Boost. In each case it is an EDG problem, because we are using an older version of EDG. We are trying to upgrade to a newer version of EDG (4.x), but ROSE's use of that version does not yet include enough C++ support, so it is not ready. The C support is internally tested, but we need more time to work on this.

How to find XYZ in AST?

The usual steps to retrieve information from the AST are:

  • Prepare the simplest possible compilable sample code, preferably only 5-10 lines long, that contains the code feature you want to find. For example, use array[i][j] if you are curious about how uses of multi-dimensional arrays appear in the AST.
    • Please note: don't include any headers in the sample code. A header (#include <stdio.h>, for example) can bring thousands of nodes into the AST.
  • Use dotGeneratorWholeASTGraph to generate a detailed AST dot graph of the input code.
  • Use zgrviewer-0.8.2's run.sh to visualize the dot graph.
  • Visually locate the information you want in the dot graph, and work out what to look for and where to look.

Some sample AST graphs are available at https://github.com/chunhualiao/rose-ast

How to get children of an AST node?

Once you know how to find a child in the AST manually, you can write code that walks the AST to retrieve the information you want. You can use AST member functions, traversals, SageInterface functions, and so on.

  • By default, ROSE provides a member access function get_X() for a child named X. For example, get_lhs_operand() is provided for an SgBinaryOp with a child named lhs_operand in the AST graph.
  • The names are shown in AST graph as labels of edges from parents to children.

To get a child by index, use the following function (not recommended, though):

virtual SgNode * 	get_traversalSuccessorByIndex (size_t idx)

and/or related, similarly named functions.

How to filter out header files from AST traversals?

https://mailman.nersc.gov/pipermail/rose-public/2010-April/000144.html

Question: I want to exclude functions in #include files from my analysis/transformations during my processing.

By default, an AST traversal may visit all AST nodes, including the ones that come from headers.

So the AST processing classes provide three traversal functions:

  • T traverse(SgNode* node, ..): traverses the full AST, including nodes that represent code from include files
  • T traverseInputFiles(SgProject* projectNode, ..): traverses the subtree of the AST that represents the files specified on the command line
  • T traverseWithinFile(SgNode* node, ..): traverses only the nodes that represent code of the same file as the start node

Should SgIfStmt::get_true_body() return SgBasicBlock?

https://mailman.nersc.gov/pipermail/rose-public/2011-April/000930.html

Both true/false bodies were SgBasicBlock before.

Later, we decided to represent both blocked bodies (with {...}) and single-statement bodies (without {...}) more faithfully. So they are now SgStatement (SgBasicBlock is a subclass of SgStatement).

However, the documentation does not seem to have been updated to reflect this change.

You have to check in your code whether the body is a block or a single statement. Alternatively, you can use the following function to ensure that a body is wrapped in an SgBasicBlock:

//A wrapper of all ensureBasicBlockAs*() above to ensure the parent of s is a scope statement with list of statements as children, otherwise generate a SgBasicBlock in between.

SgLocatedNode * SageInterface::ensureBasicBlockAsParent (SgStatement *s)

How to handle #include "header.h", #if, #define etc. ?

Within ROSE's AST, these are called preprocessing info. They are attached before, after, or inside a nearby AST node. Only nodes with source location information can carry them.

An example translator is provided that traverses the input code's AST and dumps information about the preprocessing info it finds. The source code of this translator is at https://github.com/rose-compiler/rose/blob/master/exampleTranslators/defaultTranslator/preprocessingInfoDumper.C .

To use the translator:

buildtree/exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
-----------------------------------------------
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-------------PreprocessingInfo #0 ----------- :
classification = CpreprocessorIncludeDeclaration:
  String format = #include "all_headers.h"

relative position is = before

SgClassDeclaration::get_definition() returns NULL?

If you look at the whole AST graph carefully, you can find both defining and non-defining declarations for the same class.

A symbol is usually associated with a non-defining declaration. A class definition is associated with a defining declaration.

So get the defining declaration from the non-defining declaration before you try to grab the definition. The following function shows this pattern for function declarations. The same approach applies to SgClassDeclaration:

SgFunctionDefinition* getFunctionDefinitionFromDeclaration(const SgFunctionDeclaration* funcDecl) {
  // Get the defining declaration (we don't know whether funcDecl is the defining or the non-defining declaration)
  SgFunctionDeclaration* funcDefDecl = isSgFunctionDeclaration(funcDecl->get_definingDeclaration()); 
  ROSE_ASSERT(funcDefDecl != NULL);

  //Get the definition from the defining declaration
  SgFunctionDefinition* funcDef = isSgFunctionDefinition(funcDefDecl->get_definition());  
  ROSE_ASSERT(funcDef != NULL);
  return funcDef;
}

How to handle arrays?

The first step is to get familiar with the AST representing Array types (SgArrayType) and array references (SgPntrArrRefExp). Then you can retrieve the necessary information from the AST.

To understand array types and array references, here is an example:

// cat ~/temp/array.c 
int a[5][10][15];  // array declaration, a type is declared
int foo()
{
  return a[0][1][2]; // a reference to array element
}

An Array Type is represented by SgArrayType.

int a[5][10][15] corresponds to three SgArrayType nodes linked together.

a->get_type() returns the first one:

  • SgArrayType_1: (index=5, base_type = SgArrayType_2)
  • SgArrayType_2: (index=10, base_type = SgArrayType_3)
  • SgArrayType_3: (index=15, base_type = SgTypeInt )

So traversing from the first type down to the element type yields all the dimension sizes: 5, 10, 15.

The subtree looks like

     SgArrayType_1
      /       \
     5      SgArrayType_2
            /       \
           10      SgArrayType_3  
                     /    \
                    15     SgTypeInt

An array reference is represented by SgPntrArrRefExp

A reference like a[0][1][2] is represented by three nested SgPntrArrRefExp nodes:

  • SgPntrArrRefExp_1 <lhs = SgPntrArrRefExp_2, rhs = 2>
  • SgPntrArrRefExp_2 <lhs = SgPntrArrRefExp_3, rhs = 1>
  • SgPntrArrRefExp_3 <lhs = SgVarRefExp (a_symbol), rhs = 0>

The subtree should look like the following:

    a[0][1][2]   // SgPntrArrRefExp
      /    \
  a[0][1]   2    // SgIntVal
    /  \
  a[0]  1
  /  \
 a    0          // a: SgVarRefExp

There are quite a few functions related to array handling in http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html

You can search that page for "array" to find them:

// Check if an expression is an array access (SgPntrArrRefExp). If so, return its name expression and subscripts if requested. Users can use convertRefToInitializedName() to get the possible name. It does not check if the expression is a top level SgPntrArrRefExp.
bool SageInterface::isArrayReference (SgExpression *ref, SgExpression **arrayNameExp=NULL, std::vector< SgExpression * > **subscripts=NULL)

// Returns the array dimensions in an array as defined for arrtype.
std::vector< SgExpression * > SageInterface::get_C_array_dimensions (const SgArrayType &arrtype)

// Get the number of dimensions of an array type.
int SageInterface::getDimensionCount (SgType *t)

// Get the element type of an array.
SgType * SageInterface::getArrayElementType (SgType *t)

Some example code using these functions can be found in https://github.com/rose-compiler/rose-develop/blob/master/src/midend/programTransformation/ompLowering/omp_lowering.cpp

For example, void linearizeArrayAccess(SgPntrArrRefExp* top_array_ref) rewrites an array reference with multi-dimensional subscripts into a reference with a one-dimensional subscript:

  • a[i][j] is changed to a[i*col_size + j]
  • a[i][j][k] is changed to a[(i*col_size + j)*K_size + k]

Sample code to handle 1-D array references

For a 1-D array element access a[0], the AST has three nodes:

    a[0]        // node 1: SgPntrArrRefExp
   /    \
  a      0      // node 2 (a): SgVarRefExp; node 3 (0): SgIntVal

So the code searching for SgVarRefExp will find a. The next step is to check its type.

SgVarRefExp* vref = ...;
ROSE_ASSERT (vref != NULL);

SgType* t = vref->get_type();

if (SgArrayType* atype = isSgArrayType(t)) // now you have an array type
{
  // obtain the dimension vector
  std::vector<SgExpression*> dimensions = SageInterface::get_C_array_dimensions(*atype);
  // dimensions.size() should be 1 if you only handle 1-D array types
  if (dimensions.size() == 1)
  {
    // the parent of a is a[0]; isSgPntrArrRefExp() returns NULL if it is not an array reference
    SgPntrArrRefExp* arr_ref_exp = isSgPntrArrRefExp(vref->get_parent());
    if (arr_ref_exp != NULL)
    {
      // do what you want with a (vref) and a[0] (arr_ref_exp)
    }
  }
}
else if (SageInterface::isScalarType(t)) // handle scalar types differently
{
  ...
}

How to add new AST nodes?

There is a section named "1.7 Adding New SAGE III IR Nodes (Developers Only)" in ROSE Developer’s Guide (http://www.rosecompiler.org/ROSE_DeveloperInstructions.pdf)

But before you decide to add new nodes, consider whether AstAttribute (user-defined objects attached to the AST) would be sufficient for your problem.

For example, the 1st version of the OpenMP implementation in ROSE (rose/projects/OpenMP_Translator) used AstAttribute to represent information parsed from pragmas. Only in the 2nd version did we introduce dedicated AST nodes.

There are two separate steps when new kinds of IR nodes are added into ROSE:

  • First step (declaration): Adding class declaration/implementation into ROSE for the new IR nodes. This step is mostly related to ROSETTA.
  • Second step (creation): Creating instances of the new IR nodes at some point, such as somewhere within the frontend, the midend, or even the backend if desired. This step is decided case by case.

If the new types of IR come from their counterparts in EDG, then modifications to the EDG/SAGE connection code are needed. If not, the EDG/SAGE connection code may be irrelevant.

If you are trying to add new nodes to represent pragma information, you can create your new nodes without involving EDG or its connection to ROSE. Parse the pragma strings in the original AST and create your own nodes to get a new version of the AST.

How does the AST merge work?

Tests that demonstrate the AST merge are in the directory:

    tests/nonsmoke/functional/CompileTests/mergeAST_tests

(Run "make check" there to see hundreds of tests go by.)

Parent vs. scope

An AST node can have a parent node that is different from its scope.

For example, in the following code, the struct declaration's parent is the typedef declaration. However, the struct's scope is the scope in which the typedef declaration appears.

typedef struct frame {int x;} s_frame;

Parsing text into AST

There is some experimental support for parsing simple code text into AST pieces. It is not intended to parse entire source files, but it could be extended to handle more types of input.

Some documentation about this work:

Example project using the parser building blocks

  • projects/pragmaParsing should work.

Translation

How to skip system headers in translation?

Often we are only interested in user code, but the AST represents code from both user files and system headers. So we need to skip the parts that come from system headers.


// Most complete version: skip all header files, since we cannot unparse a changed AST from header files (at least by default)

    if (Inliner::skipHeaders)
    {
      string filename = funcall->get_file_info()->get_filename();
      string suffix = StringUtility::fileNameSuffix(filename);
      // vector.tcc: this is an internal header file, included by other library headers
      if (suffix=="h" || suffix=="hpp" || suffix=="hh" || suffix=="H" || suffix=="hxx" || suffix=="h++" || suffix=="tcc")
        return false;

      // also check if it is compiler generated, mostly template instantiations. They are not from user code.
      if (funcall->get_file_info()->isCompilerGenerated())
        return false;

      // check if the file is within include-staging/ header directories
      if (insideSystemHeader(funcall))
        return false;
    }

//------------partial solutions

bool processStatements(SgNode* n)
{
  ROSE_ASSERT (n!=NULL);
  // Skip compiler generated code, system headers, etc.
  if (isSgLocatedNode(n))
  {
    if (isSgLocatedNode(n)->get_file_info()->isCompilerGenerated())
      return false;
  }
 ...
}

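This check is based on Sg_File_Info. For comparison, here is the output of Sg_File_Info::display() for a compiler-generated node from a ROSE header and for a node from user code: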

Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = false 
     isCompilerGenerated                   = true (no position information) 
     isOutputInCodeGeneration              = false 
     isShared                              = false 
     isFrontendSpecific                    = true (part of ROSE support for gnu compatability) 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     file_id  = 2 
     filename = /home/liao6/daily-test-rose/upcwork/install/include/gcc_HEADERS/rose_edg_required_macros_and_functions.h 
     line     = 167  column   = 1 

....
shared[1] int gsj;
Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = false 
     isCompilerGenerated                   = false 
     isOutputInCodeGeneration              = false 
     isShared                              = false 
     isFrontendSpecific                    = false 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     filename = /home/liao6/svnrepos/mycode/rose/upc/unshared.upc 
     line     = 6  column = 1 
     file_id  = 1 
     filename = /home/liao6/svnrepos/mycode/rose/upc/unshared.upc 
     line     = 6  column   = 1 


Another way: ROSE makes a copy of all system headers and stores them in dedicated paths, so you can check whether a node's file lies under one of those paths:

  bool insideSystemHeader (SgLocatedNode* node)
  {
    bool rtval = false;
    ROSE_ASSERT (node != NULL);
    Sg_File_Info* finfo = node->get_file_info();
    if (finfo!=NULL)
    {
      string fname = finfo->get_filenameString();
      string buildtree_str1 = string("include-staging/gcc_HEADERS");
      string buildtree_str2 = string("include-staging/g++_HEADERS");
      string installtree_str1 = string("include/edg/gcc_HEADERS");
      string installtree_str2 = string("include/edg/g++_HEADERS");
      // if the file name has a system header path of either the build tree or the install tree
      if ((fname.find (buildtree_str1, 0) != string::npos) ||
          (fname.find (buildtree_str2, 0) != string::npos) ||
          (fname.find (installtree_str1, 0) != string::npos) ||
          (fname.find (installtree_str2, 0) != string::npos)
          )
        rtval = true;
    }
    return rtval;                                                                                                              
  } 



Can ROSE identityTranslator generate 100% identical output file?

https://mailman.nersc.gov/pipermail/rose-public/2011-January/000604.html

Question: The ROSE identityTranslator performs some modifications "automatically".

These modifications are:

  • Expanding the assert macro.
  • Adding extra parentheses around constants of typedef types (e.g. c=Typedef_Example(12); is translated in the output to c = Typedef_Example((12));)
  • Converting NULL to 0.

Can I avoid these modifications?

Answer: No.

There is no easy way to avoid these changes currently. Some of them are introduced by the cpp preprocessor. Others are introduced by the EDG front end ROSE uses. 100% faithful source-to-source translation may require significant changes to preprocessing directive handling and the EDG internals.

We have had some internal discussion about saving raw token strings in the AST and using them to get faithful unparsed code. As far as I know, however, this effort is still at an early stage.

How to build a tool inserting function calls?

https://mailman.nersc.gov/pipermail/rose-public/2010-July/000319.html

Question: I am trying to build a tool that inserts one or more function calls wherever the source code contains a function belonging to a certain group (e.g. all functions whose names begin with foo_). During the AST traversal, how can I find the right place? Is there a function in ROSE that searches for a string pattern or something similar?

Answers:

  • Chapter 28 "AST Construction" of the ROSE tutorial has examples that instrument function calls into the AST using traversals or a queryTree. I would approach this by checking each node for the specific SgFunctionDefinition (or whatever you need) and then checking its name to find its location.
  • You can:
    • use the AST query mechanism to find all functions and store them in a container, e.g. Rose_STL_Container<SgNode*> nodeList = NodeQuery::querySubTree(root_node,V_Sg????);
    • then iterate over the container and check each function to see if its name matches what you want;
    • use the SageBuilder namespace's buildFunctionCallStmt() to create a function call statement;
    • use the SageInterface namespace's insertStatement() to do the insertion.

How to insert a header into an input file?

There is a SageInterface function for doing this:

// Insert include "filename" or include <filename> (system header) into the global scope containing the current scope, right after other include XXX.
PreprocessingInfo *     SageInterface::insertHeader (const std::string &filename, PreprocessingInfo::RelativePositionType position=PreprocessingInfo::after, bool isSystemHeader=false, SgScopeStatement *scope=NULL) 

How to copy/clone a function?

https://mailman.nersc.gov/pipermail/rose-public/2011-April/000919.html

We need to be more specific about the function you want to copy. Is it just a prototype function declaration (a non-defining declaration, in ROSE's terms) or a function with a definition (a defining declaration, in ROSE's terms)?

  • Copying a non-defining function declaration can be achieved by using the following function:
// Build a prototype for an existing function declaration (defining or nondefining is fine).
SgFunctionDeclaration* SageBuilder::buildNondefiningFunctionDeclaration (const SgFunctionDeclaration *funcdecl, SgScopeStatement *scope=NULL)
  • Copying a defining function declaration is semantically a problem, since it introduces a redefinition of the same function.

Copying it anyway is at least a hack: you first introduce something wrong and later correct it. Here is an example translator that does the hack (it copies a defining function, renames it, and fixes its symbol):


#include <rose.h>
#include <stdio.h>
using namespace SageInterface;

int main(int argc, char** argv)
{
  SgProject* project = frontend(argc, argv);
  AstTests::runAllTests(project);

  // Find a defining function named "bar" under project
  SgFunctionDeclaration* func =
      findDeclarationStatement<SgFunctionDeclaration>(project, "bar", NULL, true);
  ROSE_ASSERT(func != NULL);

  // Make a copy and set it to a new name
  SgFunctionDeclaration* func_copy = isSgFunctionDeclaration(copyStatement(func));
  func_copy->set_name("bar_copy");

  // Insert it into a scope
  SgGlobal* glb = getFirstGlobalScope(project);
  appendStatement(func_copy, glb);

#if 0  // Fix up the missing symbol. This should be optional now since SageInterface::appendStatement() should handle it transparently.
  SgFunctionSymbol* func_symbol = glb->lookup_function_symbol("bar_copy", func_copy->get_type());
  if (func_symbol == NULL)
  {
    func_symbol = new SgFunctionSymbol(func_copy);
    glb->insert_symbol("bar_copy", func_symbol);
  }
#endif
  AstTests::runAllTests(project);
  backend(project);
  return 0;
}

ROSE's unparser checks the Sg_File_Info objects of AST pieces before it decides to print them out in text form. By default, only AST pieces coming from the input file itself, or generated by a transformation, are unparsed. For example, some AST subtrees come from an included header, and it is usually not desirable to unparse the content of an included header.

If the copied AST still has its original file info, the solution is to mark it as transformation-generated:

// Recursively set source position info(Sg_File_Info) as transformation generated.
SageInterface::setSourcePositionForTransformation (SgNode *root) 
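To make the filtering rule above concrete, here is a minimal sketch with a hypothetical Node type (not ROSE's): the file field stands in for the file recorded in Sg_File_Info and transformation for its transformation-generated flag. markAsTransformation and collectUnparsed are illustrative names; the real unparser checks more flags (such as compiler_generated and output_in_code_generation) than this.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy node: 'file' plays the role of the file name stored in
// Sg_File_Info, 'transformation' the role of its transformation-generated flag.
struct Node {
    std::string label;
    std::string file;
    bool transformation = false;
    std::vector<std::unique_ptr<Node>> children;
};

// Analogue of setSourcePositionForTransformation(): recursively mark a
// (for example, freshly copied) subtree as transformation-generated.
void markAsTransformation(Node* root) {
    root->transformation = true;
    for (auto& child : root->children) markAsTransformation(child.get());
}

// Simplified unparser filter: print a node only if it comes from the input
// file or was generated by a transformation; otherwise skip its whole subtree
// (for example, declarations that came from an included header).
void collectUnparsed(const Node* n, const std::string& inputFile,
                     std::vector<std::string>& out) {
    if (!n->transformation && n->file != inputFile) return;
    out.push_back(n->label);
    for (const auto& child : n->children) collectUnparsed(child.get(), inputFile, out);
}
```

Before marking, a subtree copied from a header is skipped like the rest of the header; after marking, it is printed.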

Can I transform code within a header file?

https://mailman.nersc.gov/pipermail/rose-public/2011-May/000971.html

No. ROSE does not unparse AST from headers right now. A summer project tried to do this, but it was not finished and is not well tested.

The options from that work are -rose:unparseHeaderFiles -rose:unparseHeaderFilesRootFolder UNPARSED_HEADERS_DIR, as used in tests/CompilerTests/UnparseHeadersTests.

https://mailman.nersc.gov/pipermail/rose-public/2010-August/000344.html

I guess ROSE does not support writing out changed headers for safety and practical reasons. A changed header has to be saved to another file, since writing to the original header is very dangerous (imagine debugging a header translator that corrupts input headers). Then all other files and headers using the changed header have to be updated to use the new header file.

Also, all files involved have to be writable by the user's translator.

As a result, the current unparser skips subtrees of AST from headers by checking file flags (compiler_generated and/or output_in_code_generation etc.) stored in Sg_File_Info objects.

How to work with formal and actual arguments of functions?

https://mailman.nersc.gov/pipermail/rose-public/2011-June/001008.html

     // callSite (the call expression) and calleeDef (the callee's SgFunctionDefinition,
     // possibly NULL) are defined by the surrounding code, which is not shown here.

     // Get the actual arguments
     SgExprListExp* actualArguments = NULL;
     if (isSgFunctionCallExp(callSite))
         actualArguments = isSgFunctionCallExp(callSite)->get_args();
     else if (isSgConstructorInitializer(callSite))
         actualArguments = isSgConstructorInitializer(callSite)->get_args();
     ROSE_ASSERT(actualArguments != NULL);

     const SgExpressionPtrList& actualArgList = actualArguments->get_expressions();

     // Get the formal arguments
     SgInitializedNamePtrList formalArgList;
     if (calleeDef != NULL)
         formalArgList = calleeDef->get_declaration()->get_args();

     // The number of actual arguments can be less than the number of formal arguments
     // (with implicit arguments) or greater than the number of formal arguments (with varargs)
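The last comment is the key point when matching the two lists. Here is a small standalone sketch of positional matching, assuming each argument is reduced to a plain string (in ROSE they would come from actualArgList and formalArgList); bindArguments and ArgBinding are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one entry per bound parameter.
struct ArgBinding {
    std::string formal;  // formal parameter name, or "..." for variadic extras
    std::string actual;  // actual argument text, or "<default>" if omitted
};

// Pair actual arguments with formal parameters by position. Fewer actuals than
// formals: the trailing formals take default arguments. More actuals than
// formals: the callee is variadic and the extra actuals bind to "...".
std::vector<ArgBinding> bindArguments(const std::vector<std::string>& formals,
                                      const std::vector<std::string>& actuals) {
    std::vector<ArgBinding> bindings;
    std::size_t i = 0;
    for (; i < formals.size(); ++i)
        bindings.push_back({formals[i], i < actuals.size() ? actuals[i] : std::string("<default>")});
    for (; i < actuals.size(); ++i)
        bindings.push_back({"...", actuals[i]});
    return bindings;
}
```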

How to translate multiple files scattered in different directories of a project?

Expected behavior of a ROSE Translator:

A translator built using ROSE is designed to act like a compiler (gcc, g++, gfortran, etc., depending on the input file types), so users of the translator only need to change the build system for the input files to use the translator instead of the original compiler.

If your original compiler implicitly includes or links anything, you may have to make the include or linking paths explicit after the change. For example, if mpiCC transparently links to /path/to/mpilib.a, you have to add this linking flag to your modified Makefile.

On 07/25/2012 11:20 AM, Fernando Rannou wrote:
> > Hello
> >
> > We are trying to use ROSE to refactor  a big project consisting of
> > several  *.cc and *.hh files, located at various directories. Each
> > class is defined in a *.hh file and implemented in a *.cc file.
> > Classes include (#include) other class definitions. But we have only
> > found single file examples.
> >
> > Is this possible? If so, how?
> >
> >
> > Thanks

Unparsing

Generate code into different files

https://mailman.nersc.gov/pipermail/rose-public/2012-August/001742.html

Question: Is it possible for ROSE to generate two files (.c and .cl) when it translates C to OpenCL?

Answer: The ROSE outliner has an option to output the generated function into a new file.

https://github.com/rose-compiler/rose/blob/master/src/midend/programTransformation/astOutlining/Outliner.hh

...
// Generate the outlined function into a separated new source file
// -rose:outline:new_file
extern bool useNewFile;
...

You may want to check how this option is used in the outliner source files to get what you want.

Binary Analysis

How is the binary analysis capability in ROSE?

Question: How capable is the binary analysis in ROSE? Is it just disassembly? Is it possible to associate the binary code with the source if combined with ROSE source code analysis?

Answer:

ROSE has various binary disassemblers (x86, ARM, MIPS, PowerPC) that, like source code analysis, create an internal representation of the binary in the form of an AST. Although the types of AST nodes for source and binaries are largely disjoint, one can analyze the binary AST using concepts similar to source analysis. ROSE has a few binary analyses. Here are some off the top of my head:

  • Control flow graphs, both virtual and using Boost Graph Library.
  • Function call graphs.
  • Operations on control flow graphs: dominator, post-dominator
  • Pointer detection analysis that tries to figure out which memory locations are used as if they were pointers in a higher level language.
  • Instruction partitioning: figuring out how to group instructions into basic blocks, and how to group basic blocks into functions when all you have is a list of instructions. Its accuracy on automatically partitioning stripped, obfuscated code has been shown to be better than the best disassemblers that use debugging info and symbol tables.
  • Instruction semantics for x86. This is an area of active development but supports only 32-bit integer instructions. We plan to add floating point, SIMD, 64-bit, other architectures, and a simpler API. But even as it stands, it is complete enough to simulate entire ELF executables (even "vi"). See next bullet
  • An x86 simulator for ELF executables. This project is able to simulate how the Linux kernel loads an executable, and the various system calls made by the executable. It is complete enough to simulate many Linux programs, but also provides callback points for the user to insert various kinds of analyses. For instance, you could use it to disassemble an entire process after it has been dynamically linked. There are many examples in the projects/simulator directory. In contrast to simulators like Qemu, Bochs, valgrind, VirtualBox, VMware, etc., where speed is a primary design driver, the ROSE simulator is designed to provide user-level access to as many aspects of execution as possible.
  • Plugins for instruction semantics. Instruction semantics is written in such a way that different "semantic domains" can be plugged in. ROSE has a symbolic domain, an interval domain, and a partial-symbolic domain. The symbolic domain can be used in conjunction with an SMT solver (currently supporting Yices). The interval domain is actually sets of intervals, and is binary-arithmetic-aware (i.e., correctly handles overflows, etc on a fixed word size). The partial-symbolic domain uses single-node expressions in order to optimize for speed and size at the expense of accuracy. Users can and have written other domains, and a new API (in the works) will make this even easier.
  • Examples of data-flow analysis (e.g., the pointer analysis already mentioned), but not a well defined framework yet (someone is working on one). Currently, data-flow type analyses are implemented using the instruction semantics support: as each instruction is "executed" the domain in which it executes causes the data to flow in the machine state. Each analysis provides its own flow equation to handle the points where control flow joins from two or more directions; and provides its own "next-instruction" function to iterate over the control flow graph.
  • Clone detection of various formats: various forms of syntactic, including one using locality-sensitive hashing; and semantic clone detection via fuzz testing in a simulator.

By Robb


You ask about combining source and binary analysis... It's certainly possible, since ROSE can hold both the binary and source ASTs in memory at the same time. But I'm not aware of any analysis that "sews" them together. We do support parsing DWARF info from ELF executables, so you might be able to use that to sew the two ASTs together.

--Robb

Daily work

git clone returns error: SSL certificate problem?

Symptom:

git clone https://github.com/rose-compiler/rose.git
Cloning into rose...
error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed while accessing https://github.com/rose-compiler/rose.git/info/refs

fatal: HTTP request failed

The reason may be that you are behind a firewall that replaces the server's original SSL certificate.

Solutions: Tell Git (via cURL) not to check SSL certificates:

# Solution 1: environment variable (temporary)
      $ env GIT_SSL_NO_VERIFY=true git clone https://github.com/rose-compiler/rose.git

# Solution 2: git-config (permanent), local configuration (run inside an existing clone)
      $ git config --local http.sslVerify false

# Solution 3: git-config (permanent), global configuration
      $ git config --global http.sslVerify false

What is the best IDE for ROSE developers?

https://mailman.nersc.gov/pipermail/rose-public/2010-April/000115.html

There may not be a widely recognized best integrated development environment, but developers have reported that they are using:

  • vim
  • emacs
  • KDevelop
  • Source Navigator
  • Eclipse
  • Netbeans

The thing is that ROSE is huge and has some extremely large generated source files (for example, CxxGrammar.h and CxxGrammar.C are generated in the build tree), so many code browsers may have trouble handling ROSE.

Portability

What is the status for supporting Windows?

We maintain some preliminary Windows support for building ROSE/src to generate librose by leveraging CMake. However, the work is not finished.

To build librose under Windows, type the following commands in the top-level source tree, where ${ROSE_TEST_BOOST_PATH} is your Boost installation path (for example, /opt/boost_1_40_0-inst):

 mkdir ROSE-build-cmake
 cd ROSE-build-cmake
 cmake .. -DBOOST_ROOT=${ROSE_TEST_BOOST_PATH}

https://mailman.nersc.gov/pipermail/rose-public/2011-December/001349.html

We have not finished the Windows work yet. It is on our list of things to do. It was started, and ROSE internally compiles using MS Visual Studio (using project files generated from the CMake build that we maintain and test within our release process for ROSE), but it does not pass our tests, so it is not ready. Distributing the EDG binaries for Windows is another step that would come after that. We don't know at present when this will be done; it is not a high priority for our DOE-specific work, but it is important for other work. The effort required is something that we could discuss. If you want to call me, that would be the best way to proceed. Send me email off of the main list and we can set that up.

https://mailman.nersc.gov/pipermail/rose-public/2011-March/000798.html

Under Windows, ROSE uses CMake. This is a project that is currently under development. As of November 2010, we are able to compile and link the src directory. We are also able to run example programs that link against librose and execute the frontend and backend. However, this is an internal capability and not available externally yet, since we don't distribute the Windows-generated EDG binaries that would be required. Also, the current support for Windows is still incomplete; ROSE does not yet pass its internal tests under Windows.

How-tos

Quick, short, and focused tutorials about how to do common tasks as a ROSE developer.

Please create a new wikibook page for each how-to topic. Each how-to wiki page should NOT contain any level one (=) or level two (==) heading so it can be included at the correct levels in the print version of this wikibook.

Create a new page

==[[ROSE Compiler Framework/How to write a How-to|How to write a How-to]]==
{{:ROSE Compiler Framework/How to write a How-to}}
  • rename three places of the pasted text with the desired page name, for example
==[[ROSE Compiler Framework/How to do XYZ|How to do XYZ]]==
{{:ROSE Compiler Framework/How to do XYZ}}
  • click save page
  • You will see red text linking to the not-yet-existing How to do XYZ page
  • click any of the red text; it will bring you to an editing window where you can add the content of your new how-to page
  • you can now add new content and save it.
    • Again, each how-to wiki page should NOT contain any level one (=) or level two (==) heading so it can be included at the correct levels in the print version of this wikibook.

Insert image to wiki page

Rules of the content

  • Only level three headings (===) and higher are allowed in a how-to page. This is necessary for the how-to page to be correctly included into the final one-page print version of this wikibook. Sorry about this restriction.
    • Again, please don't use level one (=) or level two (==) headings in a how-to page!
  • Keep each how-to short and focused. Readers are expected to spend 30 minutes or much less to quickly learn how to do something using ROSE.
  • After you create a new how-to page and save your contributions, please go to the print version to make sure it shows up correctly.
  • Please specify whether the how-to topic describes the current practice or a proposed new way of doing things, so we have clear code review guidelines for what is mandatory and what is optional.

Developing a big, sophisticated project entails many challenges. To mitigate some of these challenges, we have adopted several best practices: incremental development, code review, and continuous integration.

Here are some tips on how to divide up a big project into smaller, bite-sized pieces so each piece can be incrementally developed, code reviewed, and integrated.

  • Input: define different sets of test inputs based on complexity and difficulty. Tackle simpler sets first.
  • Output: define intermediate results leading to the final output. Often, results A and B are needed to generate C. So the project can have multiple stages, based on the intermediate results.
  • Algorithm: complex compiler algorithms are often just enhanced versions of more fundamental algorithms. Implement the fundamental algorithms first to gain insight and experience. Then, afterward, you can implement the full-blown versions.
  • Language: for projects dealing with multiple languages, focus on one language at a time.
  • Platform: limit the scope of supported platforms: Linux, Ubuntu, OS X (TODO: add reference to ROSE supported platforms)
  • Performance: start with a basic, working implementation first. Then optimize its performance and efficiency.
  • Scope: your translator could first focus on working at a function scope, then grow to handle an entire source file, or even multiple files, at the same time.
  • Skeleton then meat: a project should be created with the major components defined first. Each component can be enriched separately later on.
  • Annotations (manual vs. automated): Performing one compiler task often requires results from many other tasks being developed. Defining source code annotations as the interface between two tasks can decouple these dependencies in a clean manner. The annotations can be first manually inserted. Later the annotations can be automatically generated by the finished analysis.
  • Optional vs. Default: introduce a flag to turn your feature on or off. Make it the default option when it matures.
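As an illustration of the last tip, a translator can strip its own option from the command line before handing the rest to the frontend. The option names and helpers below (-my:enable-feature, -my:disable-feature, extractFlag, featureEnabled, kFeatureOnByDefault) are hypothetical; this is a sketch, not a ROSE command-line API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Remove every occurrence of a translator-specific option from the argument
// list and report whether it was present, so the remaining arguments can be
// handed to the frontend unchanged.
bool extractFlag(std::vector<std::string>& args, const std::string& flag) {
    auto newEnd = std::remove(args.begin(), args.end(), flag);
    bool found = newEnd != args.end();
    args.erase(newEnd, args.end());
    return found;
}

// Hypothetical policy: the feature stays off by default while it is
// experimental; once it matures, flip kFeatureOnByDefault to true and keep
// "-my:disable-feature" so users can still opt out.
const bool kFeatureOnByDefault = false;

bool featureEnabled(std::vector<std::string>& args) {
    bool enable = extractFlag(args, "-my:enable-feature");
    bool disable = extractFlag(args, "-my:disable-feature");
    if (disable) return false;
    return enable || kFeatureOnByDefault;
}
```

In a translator's main(), you might build args from argv, call featureEnabled(args), and then pass the remaining arguments on to the frontend.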

Overview

Three things are needed to visualize ROSE AST:

  • Sample input code: you provide it
  • a dot graph generator to generate a dot file from AST: ROSE provides dot graph generators
  • a visualization tool to open the dot graph: ZGRViewer and Graphviz are used by ROSE developers

If you don't want to install ROSE, ZGRViewer, and Graphviz from scratch, you can directly use the ROSE virtual machine image, which has everything you need installed and configured so you can just visualize your sample code.

Sample input code

Please prepare the simplest possible input code, without including any headers, so you get an AST small enough to digest.
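For example, a header-free input like the following (a hypothetical ttt.c; the snippet is valid as both C and C++) yields a tree whose only user-defined function is foo, with a declaration, an assignment, and a return to examine:

```cpp
// Hypothetical minimal input: no #include, so apart from ROSE's own builtin
// declarations, every node in the AST comes from this function.
int foo(int x, int y)
{
  int z;
  z = x + y * 2;
  return z;
}
```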

Dot Graph Generator

We provide two tools under ROSE_INSTALLATION_TREE/bin to generate a dot graph of the AST of input code:

  • dotGenerator: simple AST graph generator showing essential nodes and edges
  • dotGeneratorWholeASTGraph: whole AST graph showing more details. It provides filter options to show/hide certain AST information.
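To see what such a generator produces, here is a tiny standalone sketch that writes a dot graph for a toy tree. Node, emit, and toDot are hypothetical and far simpler than ROSE's generators, which walk real Sg* nodes and support the filter options shown by -rose:help.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy AST node; ROSE's dot generators walk real Sg* nodes instead.
struct Node {
    std::string kind;               // e.g. "SgAddOp"
    std::vector<const Node*> kids;  // child nodes (not owned here)
};

// Emit one dot vertex per node and one edge per parent-child link.
// Node addresses serve as unique vertex ids.
void emit(const Node* n, std::ostringstream& os) {
    os << "  \"" << n << "\" [label=\"" << n->kind << "\"];\n";
    for (const Node* k : n->kids) {
        os << "  \"" << n << "\" -> \"" << k << "\";\n";
        emit(k, os);
    }
}

std::string toDot(const Node* root) {
    std::ostringstream os;
    os << "digraph AST {\n";
    emit(root, os);
    os << "}\n";
    return os.str();
}
```

Writing the returned string to a file such as ast.dot lets you open it in ZGRViewer or render it with Graphviz, e.g. dot -Tpng ast.dot -o ast.png.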

Command line (it is best to avoid including any header in your input code, so the tree is small enough to visualize):

 dotGeneratorWholeASTGraph yourcode.c

To skip builtin functions

  • dotGeneratorWholeASTGraph -DSKIP_ROSE_BUILTIN_DECLARATIONS yourcode.c

To list the filter options and their current values:

dotGeneratorWholeASTGraph -rose:help
   -rose:help                     show this help message
   -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
   -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
   -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
   -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
   -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
   -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
   -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
   -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
   -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
   -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
   -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
   -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
   -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
   -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
   -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
   -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
   -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
Current filter flags' values are: 
         m_asmFileFormat = 0 
         m_asmType = 0 
         m_binaryExecutableFormat = 0 
         m_commentAndDirective = 1 
         m_ctorInitializer = 0 
         m_default = 1 
         m_defaultColor = 1 
         m_edge = 1 
         m_emptySymbolTable = 0 
         m_expression = 0 
         m_fileInfo = 1 
         m_frontendCompatibility = 0 
         m_symbol = 0 
         m_type = 0 
         m_variableDeclaration = 0 
         m_variableDefinition = 0 
         m_noFilter = 0 

Dot Graph Visualization

To visualize the generated dot graph, you have to install Graphviz and ZGRViewer.

Please note that you have to configure ZGRViewer to have correct paths to some commands it uses. You can do it from its configuration/setting menu item. Or directly modify the text configuration file (.zgrviewer).

One example configuration is shown below (cat .zgrviewer)

<?xml version="1.0" encoding="UTF-8"?>
<zgrv:config xmlns:zgrv="http://zvtm.sourceforge.net/zgrviewer">
    <zgrv:directories>
        <zgrv:tmpDir value="true">/tmp</zgrv:tmpDir>
        <zgrv:graphDir>/home/liao6/svnrepos</zgrv:graphDir>
        <zgrv:dot>/home/liao6/opt/graphviz-2.18/bin/dot</zgrv:dot>
        <zgrv:neato>/home/liao6/opt/graphviz-2.18/bin/neato</zgrv:neato>
        <zgrv:circo>/home/liao6/opt/graphviz-2.18/bin/circo</zgrv:circo>
        <zgrv:twopi>/home/liao6/opt/graphviz-2.18/bin/twopi</zgrv:twopi>
        <zgrv:graphvizFontDir>/home/liao6/opt/graphviz-2.18/bin</zgrv:graphvizFontDir>
    </zgrv:directories>
    <zgrv:webBrowser autoDetect="true" options="" path=""/>
    <zgrv:proxy enable="false" host="" port="80"/>
    <zgrv:preferences antialiasing="false" cmdL_options=""
        highlightColor="-65536" magFactor="2.0" saveWindowLayout="false"
        sdZoom="false" sdZoomFactor="2" silent="true"/>
    <zgrv:plugins/>
    <zgrv:commandLines/>
</zgrv:config>

You also have to configure the run.sh script with the correct path:

cat run.sh

#!/bin/sh

# If you want to be able to run ZGRViewer from any directory,
# set ZGRV_HOME to the absolute path of ZGRViewer's main directory
# e.g. ZGRV_HOME=/usr/local/zgrviewer

ZGRV_HOME=/home/liao6/opt/zgrviewer-0.8.1

java -jar $ZGRV_HOME/target/zgrviewer-0.8.1.jar "$@"

Example session

A complete example

# make sure the environment variables(PATH, LD_LIBRARY_PATH) for the installed rose are correctly set
which dotGeneratorWholeASTGraph
~/workspace/masterClean/build64/install/bin/dotGeneratorWholeASTGraph

# run the dot graph generator
dotGeneratorWholeASTGraph -c ttt.c

# view it
which run.sh
~/64home/opt/zgrviewer-0.8.2/run.sh

run.sh ttt.c_WholeAST.dot

example output

We put some example source files and their AST dump files into: https://github.com/chunhualiao/rose-ast

SageInterface functions


// You can call the following functions with gdb

   //! Pretty print AST horizontally, output to std output
   void SageInterface::printAST (SgNode* node); 


   //! Pretty print AST horizontally, output to a specified text file
   void SageInterface::printAST (SgNode* node, const char* filename); 

   //! Pretty print AST horizontally, output to a specified text file.
   void SageInterface::printAST2TextFile (SgNode* node, const char* filename, bool printTypes=true);
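The horizontal layout these functions print (see the gdb output below) can be reproduced for a toy tree with a few lines of recursion. This is a hypothetical sketch, not ROSE's implementation; Node and printTree are illustrative names, and the real output also includes node addresses and source positions.

```cpp
#include <string>
#include <vector>

// Hypothetical toy node with a display label.
struct Node {
    std::string label;
    std::vector<const Node*> kids;
};

// Append a subtree in the horizontal style: "├──" for a child with later
// siblings, "└──" for the last child, "│   " to continue an ancestor's
// vertical line, and four spaces under a last child.
void printTree(const Node* n, const std::string& prefix, bool last, std::string& out) {
    out += prefix + (last ? "└──" : "├──") + n->label + "\n";
    std::string childPrefix = prefix + (last ? "    " : "│   ");
    for (std::size_t i = 0; i < n->kids.size(); ++i)
        printTree(n->kids[i], childPrefix, i + 1 == n->kids.size(), out);
}
```

Calling printTree(root, "", true, out) starts the root at the left margin, as in the dumps shown in this section.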

A translator (textASTGenerator) is also available, with its source code under exampleTranslators/defaultTranslator.

  • make install-tools will install this tool
  • textASTGenerator input.c will generate a text output of the entire AST

Example use inside of gdb

  • to print a portion of AST to the screen
  • to print a portion of AST into a text file
(gdb) up
#7  0x00007ffff418ab5d in Unparse_ExprStmt::unparseExprStmt (this=0x1a1bf950, stmt=0x7fffda63ce30, info=...) at ../../../sourcetree/src/backend/unparser/CxxCodeGeneration/unparseCxx_statements.C:9889

(gdb) p SageInterface::printAST(stmt)
└──@0x7fffda63ce30 SgExprStatement transformation 0:0
    └──@0x7fffd8488790 SgFunctionCallExp transformation 0:0
        ├──@0x7fffe6211910 SgMemberFunctionRefExp transformation 0:0
        └──@0x7fffd7f2c370 SgExprListExp transformation 0:0
            └──@0x7fffd8488720 SgFunctionCallExp transformation 0:0
                ├──@0x7fffe6211988 SgMemberFunctionRefExp transformation 0:0
                └──@0x7fffd7f2c3d8 SgExprListExp transformation 0:0
$2 = void


(gdb) up 10
#48 0x00007ffff40dce69 in Unparser::unparseFile (this=0x7fffffff8c60, file=0x7fffeb786010, info=..., unparseScope=0x0) at ../../../sourcetree/src/backend/unparser/unparser.C:945
(gdb) p SageInterface::printAST2TextFile(file,"test.txt")

textASTGenerator

Example command line use:

textASTGenerator -c test_qualifiedName.cpp

cat test_qualifiedName.cpp.AST.txt

└──@0x7fe9f1916010 SgProject
    └──@0xb45730 SgFileList
        └──@0x7fe9f17be010 SgSourceFile
            ├──@0x7fe9fdf19120 SgGlobal test_qualifiedName.cpp 0:0
            │   ├──@0x7fe9f159a010 SgTypedefDeclaration rose_edg_required_macros_and_functions.h 0:0
            │   │   └── NULL
            │   ├──@0x7fe9f159a390 SgTypedefDeclaration rose_edg_required_macros_and_functions.h 0:0
            │   │   └── NULL
            │   ├──@0x7fe9f0f59010 SgFunctionDeclaration rose_edg_required_macros_and_functions.h 0:0 "::feclearexcept"
            │   │   ├──@0x7fe9f1391010 SgFunctionParameterList rose_edg_required_macros_and_functions.h 0:0
            │   │   │   └──@0x7fe9f1258010 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__excepts"
            │   │   │       └── NULL
            │   │   ├── NULL
            │   │   └── NULL
            │   ├──@0x7fe9f0f59540 SgFunctionDeclaration rose_edg_required_macros_and_functions.h 0:0 "::fegetexceptflag"
            │   │   ├──@0x7fe9f1391630 SgFunctionParameterList rose_edg_required_macros_and_functions.h 0:0
            │   │   │   ├──@0x7fe9f1258420 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__flagp"
            │   │   │   │   └── NULL
            │   │   │   └──@0x7fe9f1258628 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__excepts"
            │   │   │       └── NULL
            │   │   ├── NULL
            │   │   └── NULL

              ...

            │   └──@0x7fe9eff218c0 SgFunctionDeclaration test_qualifiedName.cpp 14:1 "::foo"
            │       ├──@0x7fe9ef5e0320 SgFunctionParameterList test_qualifiedName.cpp 14:1
            │       │   ├──@0x7fe9ef495278 SgInitializedName test_qualifiedName.cpp 14:13 "x"
            │       │   │   └── NULL
            │       │   └──@0x7fe9ef495480 SgInitializedName test_qualifiedName.cpp 14:20 "y"
            │       │       └── NULL
            │       ├── NULL
            │       └──@0x7fe9ee8f3010 SgFunctionDefinition test_qualifiedName.cpp 15:1
            │           └──@0x7fe9ee988010 SgBasicBlock test_qualifiedName.cpp 15:1
            │               ├──@0x7fe9eee1ba90 SgVariableDeclaration test_qualifiedName.cpp 16:3
            │               │   ├── NULL
            │               │   └──@0x7fe9ef495688 SgInitializedName test_qualifiedName.cpp 16:3 "z"
            │               │       └── NULL
            │               ├──@0x7fe9ee7ad010 SgExprStatement test_qualifiedName.cpp 17:3
            │               │   └──@0x7fe9ee7dc010 SgAssignOp test_qualifiedName.cpp 17:5
            │               │       ├──@0x7fe9ee8c0010 SgVarRefExp test_qualifiedName.cpp 17:3
            │               │       └──@0x7fe9ee813010 SgAddOp test_qualifiedName.cpp 17:9
            │               │           ├──@0x7fe9ee8c0078 SgVarRefExp test_qualifiedName.cpp 17:7
            │               │           └──@0x7fe9ee84a010 SgMultiplyOp test_qualifiedName.cpp 17:12
            │               │               ├──@0x7fe9ee8c00e0 SgVarRefExp test_qualifiedName.cpp 17:11
            │               │               └──@0x7fe9ee881010 SgIntVal test_qualifiedName.cpp 17:13
            │               └──@0x7fe9ee77e010 SgReturnStmt test_qualifiedName.cpp 18:3
            │                   └──@0x7fe9ee8c0148 SgVarRefExp test_qualifiedName.cpp 18:10
            ├── NULL
            ├── NULL
            └── NULL

Render the AST in HTML

The repo errington1/ast-to-html contains a tool for rendering the ROSE abstract syntax "graph" as collapsible HTML, with shared nodes and cycles represented by HTML links. For now, it is available only from the command line. The plan is to add command-line options to omit parts of the tree and to make the tool available as a library. For now, it somewhat arbitrarily omits portions of the tree that originate from the file rose_edg_required_macros_and_functions.h.

The command:

astToHTML file.C

will produce file.C.html which can be viewed with a browser:

firefox file.C.html
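The core idea, expanding each node once and turning repeat visits into links, can be sketched as follows. This is not code from ast-to-html; Node, escapeHtml, and render are hypothetical, and the sketch uses the standard HTML details/summary elements for collapsing.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy node; 'kids' may point to shared nodes or back to ancestors.
struct Node {
    std::string label;
    std::vector<const Node*> kids;
};

std::string escapeHtml(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else if (c == '&') out += "&amp;";
        else out += c;
    }
    return out;
}

// Expand each node once as a collapsible <details> element with an id;
// any later edge to an already-rendered node (a shared node or a cycle)
// becomes a link to that id instead, so the walk always terminates.
void render(const Node* n, std::map<const Node*, int>& ids, std::string& html) {
    auto seen = ids.find(n);
    if (seen != ids.end()) {
        html += "<a href=\"#n" + std::to_string(seen->second) + "\">" +
                escapeHtml(n->label) + "</a>\n";
        return;
    }
    int id = static_cast<int>(ids.size());
    ids[n] = id;
    html += "<details id=\"n" + std::to_string(id) + "\"><summary>" +
            escapeHtml(n->label) + "</summary>\n";
    for (const Node* kid : n->kids) render(kid, ids, html);
    html += "</details>\n";
}
```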

A translator basically converts one AST into another version of the AST. The translation process may add, delete, or modify the information stored in the AST.

Overview

A ROSE-based translator usually has the following steps:

  1. Search for the AST nodes you want to translate.
  2. Perform the translation action on the found AST nodes. This action can take one of these major forms:
    • Updating the existing AST nodes.
    • Creating new AST nodes to replace the original ones. This is usually a cleaner approach than patching up the existing AST and is better supported by SageBuilder and SageInterface functions.
    • Deep copying existing AST subtrees to duplicate code. Many expression subtrees should not be shared, so deep copying them is required to get a correct AST.
  3. Optionally, update other related information for the translation.
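To see why deep copies matter, here is a toy sketch assuming hypothetical Expr nodes that each record a single parent, as ROSE nodes do. Attaching one subtree in two places would leave one of the parent pointers wrong, so the copy duplicates every node instead. Expr, addChild, and deepCopy are illustrative names, not SageInterface functions.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy expression node with a parent pointer. ROSE nodes are raw
// pointers, so sharing is possible there; std::unique_ptr rules it out here.
struct Expr {
    std::string op;
    Expr* parent = nullptr;
    std::vector<std::unique_ptr<Expr>> kids;
};

// Append a child and set its parent pointer (each node has exactly one parent).
Expr* addChild(Expr* parent, std::unique_ptr<Expr> child) {
    child->parent = parent;
    parent->kids.push_back(std::move(child));
    return parent->kids.back().get();
}

// Deep copy: duplicate the whole subtree so the copy can be attached elsewhere
// without sharing nodes (and parent pointers) with the original.
std::unique_ptr<Expr> deepCopy(const Expr* e) {
    auto copy = std::make_unique<Expr>();
    copy->op = e->op;
    for (const auto& k : e->kids) addChild(copy.get(), deepCopy(k.get()));
    return copy;
}
```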

First Step

Get familiar with the ASTs before and after your translation, so you know for sure what your code will deal with and what AST your code will generate.

The best way is to prepare the simplest possible sample code and carefully examine its whole-AST dot graphs.

More details on visualizing the AST are available at How to visualize AST.

Design considerations

It is usually a good idea to

  • separate the searching step from the translation step so one search (traversal) can be reused by all sorts of translations.
  • be careful, when designing the order of searching and translation, about whether the translation will negatively impact the searching:
    • Please avoid pre-order traversal, since you may end up modifying AST nodes that will be visited later on, similar to the effect of iterator invalidation.
    • Please use post-order traversal, or the reverse of pre-order, for a traversal hooked up with translation.
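The difference between the two orders can be demonstrated with a toy translation that inserts a tracing call at the start of every block. The types and names (Stmt, Instrumenter) are hypothetical. In pre-order, the walk descends into the block after the insertion and also visits the newly generated call; in post-order, the children have already been visited when the block is modified.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy statement: a "block" with a body, or a "call" leaf.
struct Stmt {
    std::string kind;                         // "block" or "call"
    std::vector<std::unique_ptr<Stmt>> body;  // children of a block
};

// Toy translation: prepend a tracing call to every block, and count the
// calls the traversal visits.
struct Instrumenter {
    int callsVisited = 0;

    void visit(Stmt* s) {
        if (s->kind == "call") {
            ++callsVisited;
        } else if (s->kind == "block") {
            auto trace = std::make_unique<Stmt>();
            trace->kind = "call";
            s->body.insert(s->body.begin(), std::move(trace));
        }
    }
    // Pre-order: the block is modified before its children are walked,
    // so the freshly inserted call is visited as well.
    void preorder(Stmt* s) {
        visit(s);
        for (auto& child : s->body) preorder(child.get());
    }
    // Post-order: children are walked first, then the block is modified.
    void postorder(Stmt* s) {
        for (auto& child : s->body) postorder(child.get());
        visit(s);
    }
};
```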

Searching for the AST node

There are multiple ways to find things you want to translate in AST.

AST Query

  • Via AST Query: a node query returns a list of AST nodes of the same type. This is often enough for simple translations.
Rose_STL_Container<SgNode*> ProgramHeaderStatementList = NodeQuery::querySubTree (project,V_SgProgramHeaderStatement);
for (Rose_STL_Container<SgNode*>::iterator i = ProgramHeaderStatementList.begin(); i != ProgramHeaderStatementList.end(); i++)
{
    SgProgramHeaderStatement* ProgramHeaderStatement = isSgProgramHeaderStatement(*i);
    ...
}


More information about AST Query can be found at "6 Query Library" of the ROSE User Manual pdf.
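The query-then-cast idiom above can be imitated without ROSE. This hypothetical sketch uses dynamic_cast on a toy class hierarchy where ROSE would use variants and isSg...() functions; matching subclasses through dynamic_cast may differ from how NodeQuery matches a variant.

```cpp
#include <vector>

// Hypothetical toy hierarchy standing in for SgNode and two of its subclasses.
struct Node {
    std::vector<Node*> kids;
    virtual ~Node() = default;
};
struct ForLoop : Node {};
struct ReturnStmt : Node {};

// Collect all nodes of type T under 'root' (including root) in pre-order.
template <typename T>
std::vector<T*> querySubTree(Node* root) {
    std::vector<T*> result;
    if (T* match = dynamic_cast<T*>(root)) result.push_back(match);
    for (Node* kid : root->kids) {
        std::vector<T*> sub = querySubTree<T>(kid);
        result.insert(result.end(), sub.begin(), sub.end());
    }
    return result;
}
```

Because the result already has the requested type, the loop body needs no further cast, unlike the Rose_STL_Container<SgNode*> version.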

AST Traversal

  • Through AST traversal: walks through the whole AST in different orders (pre-order or post-order). Post-order traversal is recommended to avoid modifying things the traversal will hit later on (a problem similar to iterator invalidation in C++).
    • The AST traversal provides visit() functions to hook up your translation functions. A switch statement can be used to handle different types of AST nodes.
class f2cTraversal : public AstSimpleProcessing
{
  public:
    virtual void visit(SgNode* n);
};

void f2cTraversal::visit(SgNode* n)
{
  switch(n->variantT())
  {
    case V_SgSourceFile:
      {
        SgFile* fileNode = isSgFile(n);
        translateFileName(fileNode);
      }
      break;
    case V_SgProgramHeaderStatement:
      {
        ...
      }
      break;
    default:
      break;
  }
}

More information about AST Traversal can be found in section "7 AST Traversal" of the ROSE User Manual PDF, available online.
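The difference between the two traversal orders is easiest to see by recording the visit sequence. This is a sketch of a hypothetical toy analogue of AstSimpleProcessing (ToySimpleProcessing and TNode are not ROSE classes). A subclass overrides visit(), and traverse() calls it on each node either before its children (pre-order) or after them (post-order).

```cpp
#include <cassert>
#include <string>
#include <vector>

enum ToyOrder { toyPreorder, toyPostorder };

// Hypothetical toy node; ROSE's AST nodes are SgNode subclasses.
struct TNode {
  std::string name;
  std::vector<TNode> children;
};

// Toy analogue of AstSimpleProcessing: subclasses override visit(), and
// traverse() calls it on every node in the requested order.
class ToySimpleProcessing {
 public:
  virtual ~ToySimpleProcessing() = default;
  void traverse(TNode& n, ToyOrder order) {
    if (order == toyPreorder) visit(n);   // parent before children
    for (TNode& c : n.children) traverse(c, order);
    if (order == toyPostorder) visit(n);  // parent after children
  }
 protected:
  virtual void visit(TNode& n) = 0;
};

// Records the visiting order so the two orders can be compared.
class OrderRecorder : public ToySimpleProcessing {
 public:
  std::vector<std::string> seen;
 protected:
  void visit(TNode& n) override { seen.push_back(n.name); }
};
```

In post-order, a node is visited only after its whole subtree has been visited. A translation that rewrites a node's children therefore never touches nodes the traversal has yet to reach.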

Performing Translation


Before you write your translator, please read Chapter 32, "AST Construction", of the ROSE Tutorial PDF (http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf). It contains essential information for anyone writing a translator.

The translations you want to do often depend on the types of the AST nodes you visit. For example, you can define a set of translation functions in your namespace:

  • void translateForLoop(SgForStatement* n)
  • void translateFileName(SgFile* n)
  • void translateReturnStatement(SgReturnStmt* n), and so on

Other tips

Updating Tree
  • You may need to handle details such as removing symbols, updating parent pointers, and updating symbol tables.
  • Be careful when using deepDelete() and deepCopy(). Some information might not be updated properly; for example, deepDelete() might not update your symbol table.

Verify the correctness


You can use the whole-AST graph to visually verify your translation.

All ROSE-based translators should call AstTests::runAllTests(project) after all transformations are done, to make sure the translated AST is correct.

This is a higher standard than merely unparsing to compilable code: it is common for an AST to unparse correctly but fail the sanity check.

More information is at Sanity_check
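One reason an AST can unparse correctly yet still be broken is that some of its bookkeeping, such as parent pointers, is not needed to print code. As a hedged illustration of the kind of property a sanity check verifies, here is a toy parent-pointer check. PNode and countBadParents() are hypothetical and are not ROSE's implementation of AstTests. In this toy model, printing only walks downward through children, so a stale parent pointer would never show up in the output.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy node with an explicit parent pointer, mirroring the fact
// that AST nodes store a parent (get_parent()/set_parent() in ROSE).
struct PNode {
  PNode* parent = nullptr;
  std::vector<std::unique_ptr<PNode>> children;

  PNode* addChild() {
    children.push_back(std::make_unique<PNode>());
    children.back()->parent = this;
    return children.back().get();
  }
};

// One kind of consistency check: every child must point back at the node
// that actually owns it. Returns the number of inconsistent children.
int countBadParents(const PNode& n) {
  int bad = 0;
  for (const auto& c : n.children) {
    assert(c != nullptr);
    if (c->parent != &n) ++bad;
    bad += countBadParents(*c);
  }
  return bad;
}
```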

Sample translators


Here we list a few sample translators that you can grow into the more sophisticated ones you need.

Find pragmas

/*
toy code
by Liao, 12/14/2007
*/
#include "rose.h"
#include <iostream>
using namespace std;

class visitorTraversal : public AstSimpleProcessing
{
  protected:
    virtual void visit(SgNode* n);
};

void visitorTraversal::visit(SgNode* node)
{
  if (node->variantT() == V_SgPragmaDeclaration) {
      cout << "pragma!" << endl;
  }
}

int main(int argc, char * argv[])
{
  SgProject *project = frontend (argc, argv);
  visitorTraversal myvisitor;
  myvisitor.traverseInputFiles(project,preorder);

  return backend(project);
}


Here is an example project doing pragma parsing and saving the results into AST attributes.

https://github.com/rose-compiler/rose-develop/tree/master/projects/pragmaParsing

Loop transformation


The SageInterface namespace (http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html) provides many translation functions, such as those for loops.

For example, there is a loop tiling function defined in https://github.com/rose-compiler/rose/blob/master/src/frontend/SageIII/sageInterface/sageInterface.C :

// Tile the n-level (starting from 1) loop of a perfectly nested loop nest using tiling size s.
bool loopTiling(SgForStatement* loopNest, size_t targetLevel, size_t tileSize);
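To see what tiling does to a single loop, the transformation can be written by hand on plain C++ loops rather than ROSE ASTs. Here tileSize plays the role of s.

In this sketch, the outer loop steps from tile to tile and the inner loop covers one tile, clamped to the upper bound for a partial last tile. The exact code that loopTiling() generates may differ; this only illustrates the idea and checks that the tiled loop visits the same iterations in the same order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Original loop: for (i = lb; i <= ub; i++) body(i)
std::vector<long> originalLoop(long lb, long ub) {
  std::vector<long> trace;
  for (long i = lb; i <= ub; i++) trace.push_back(i);
  return trace;
}

// The same loop tiled with tile size s: ii walks over tile starts, and i walks
// inside one tile, stopping at ub for the last (possibly partial) tile.
std::vector<long> tiledLoop(long lb, long ub, long s) {
  assert(s > 0);
  std::vector<long> trace;
  for (long ii = lb; ii <= ub; ii += s)
    for (long i = ii; i <= std::min(ub, ii + s - 1); i++) trace.push_back(i);
  return trace;
}
```

This covers only the single-loop case. In a real loop nest, tiling usually also moves the tile loops outward. That changes the iteration order of the nest, and the reordering is where the locality benefit comes from.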


An example test translator is provided to test this function:

It also has a test input file:

How to build your translator


See How to set up the makefile for a translator

This how-to presents the steps for generating a cross-language translator, using a Fortran-to-C translator as the example.

Change the sourcefile information

  • Change the output file name. The file suffix has to be changed with the following function:
void SgFile::set_unparse_output_filename(std::string unparse_output_filename)
  • Change the output language type:
void SgFile::set_outputLanguage(SgFile::outputLanguageOption_enum outputLanguage)
  • Set the output to be target-language only. We use set_C_only() for the Fortran-to-C translation; this step might be optional.
void SgFile::set_C_only(bool C_only)
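For a Fortran-to-C translator, the new output name is usually the input name with its suffix replaced. This sketch is a small helper that computes a value you might pass to set_unparse_output_filename(). The helper toCOutputName() is hypothetical and is not part of ROSE. It leaves dots in directory names alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace the suffix of a source file name with ".c".
// A dot only counts as a suffix separator if it comes after the last '/'.
std::string toCOutputName(const std::string& inputName) {
  assert(!inputName.empty());
  std::string::size_type dot = inputName.find_last_of('.');
  std::string::size_type slash = inputName.find_last_of('/');
  bool hasSuffix = dot != std::string::npos &&
                   (slash == std::string::npos || dot > slash);
  return (hasSuffix ? inputName.substr(0, dot) : inputName) + ".c";
}
```

In a translator, you might call it as fileNode->set_unparse_output_filename(toCOutputName(fileNode->getFileName())). This call pattern is an assumption based on the signatures listed above.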

Identify language-dependent AST node

  • Example: the ROSE AST uses different nodes to represent a loop in C and in Fortran. The following two figures show the same loop in each language.
C uses SgForStatement for for loops.
[Figure: C SgForStatement]
Fortran uses SgFortranDo for do loops.
[Figure: Fortran SgFortranDo]

Implement the translation functions

  • Use the whole-AST graph as a reference when implementing the translation functions.
  • Generate the new AST node by copying the required information from the original AST node.
  • Remove the original node, and make sure the parent/child relationships in the AST are set up properly.
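The three steps above can be sketched on a toy tree. RNode and replaceNode() are hypothetical and are not ROSE code; in ROSE, SageInterface offers helpers for this kind of edit, for example replaceStatement().

The sketch builds the new node, moves over the information it needs (here, the children), and fixes the children's parent pointers. It then installs the new node in the old node's slot in its parent, which destroys the old node.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node with a parent pointer and owned children.
struct RNode {
  std::string kind;
  RNode* parent = nullptr;
  std::vector<std::unique_ptr<RNode>> children;
  explicit RNode(std::string k) : kind(std::move(k)) {}
};

// Replace oldNode (which must have a parent) with a new node of kind newKind.
RNode* replaceNode(RNode* oldNode, const std::string& newKind) {
  RNode* parent = oldNode->parent;
  assert(parent != nullptr);
  // 1. Build the new node and move over the information it needs.
  auto fresh = std::make_unique<RNode>(newKind);
  fresh->children = std::move(oldNode->children);
  // 2. Fix parent pointers in both directions.
  for (auto& c : fresh->children) c->parent = fresh.get();
  fresh->parent = parent;
  // 3. Put the new node into the old node's slot; the old node is deleted here.
  auto slot = std::find_if(parent->children.begin(), parent->children.end(),
                           [&](const std::unique_ptr<RNode>& p) { return p.get() == oldNode; });
  assert(slot != parent->children.end());
  RNode* result = fresh.get();
  *slot = std::move(fresh);
  return result;
}
```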

Testing output code

  • If a compiler is available for the target language, run the backend so the backend compiler generates object code from the output.
  • If a compiler is not available for the target language, make sure output code can be generated for all test cases. It is suggested to run compilation tests on all of the generated test outputs.

In this How-to, you will create a makefile to compile and test your own custom ROSE translator.

You may want to first look at "How-to install ROSE": ROSE Compiler Framework/Installation.

Environment variables


You must set the proper environment variable so your translator can find librose.so at run time:

export LD_LIBRARY_PATH=${ROSE_INSTALL}/lib:${BOOST_INSTALL}/lib:$LD_LIBRARY_PATH

Translator Code


Here is the simplest ROSE translator.

// ROSE translator example: identity translator.
//
// No AST manipulations, just a simple translation:
//
//    input_code > ROSE AST > output_code

#include <rose.h>

int main (int argc, char** argv)
{
    // Build the AST used by ROSE
    SgProject* project = frontend(argc, argv);

    // Run internal consistency tests on AST
    AstTests::runAllTests(project);

    // Insert your own manipulations of the AST here...

    // Generate source code from AST and invoke your
    // desired backend compiler
    return backend(project);
}

Example 1


If you have a project that is separate from ROSE (i.e., you compile it against an *installed* version of ROSE), it is up to you how to build it.

If the project depends only on ROSE and ROSE's dependencies, you can use the Makefile described at the end of the ROSE installation instructions: http://rosecompiler.org/ROSE_HTML_Reference/installation.html

# Sample makefile for programs that use the ROSE library.
#
# ROSE has a number of configuration details that must be present when
# compiling and linking a user program with ROSE, and some of these 
# details are difficult to get right.  The most foolproof way to get
# these details into your own makefile is to use the "rose-config"
# tool. 
#
#
# This makefile assumes:
#   1. The ROSE library has been properly installed (refer to the
#      documentation for configuring, building, and installing ROSE).
#
#   2. The top of the installation directory is $(ROSE_INSTALLED). This
#      is the same directory you specified for the "--prefix" argument
#      of the "configure" script, or the "CMAKE_INSTALL_PREFIX" if using 
#      cmake. E.g., "/usr/local".
#
# The "rose-config" tool currently only works for ROSE configured with
# GNU auto tools (e.g., you ran "configure" when you built and
# installed ROSE). The "cmake" configuration is not currently
# supported by "rose-config" [September 2015].
##############################################################################

# Standard C++ compiler stuff (see rose-config --help)
CXX      = $(shell $(ROSE_INSTALLED)/bin/rose-config cxx)
CPPFLAGS = $(shell $(ROSE_INSTALLED)/bin/rose-config cppflags)
CXXFLAGS = $(shell $(ROSE_INSTALLED)/bin/rose-config cxxflags)
LDFLAGS  = $(shell $(ROSE_INSTALLED)/bin/rose-config ldflags)
LIBDIRS  = $(shell $(ROSE_INSTALLED)/bin/rose-config libdirs)

MOSTLYCLEANFILES =

##############################################################################
# Assuming your source code is "demo.C" to build an executable named "demo".

all: demo

demo.o: demo.C
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -o $@ -c $^

demo: demo.o
	$(CXX) $(CXXFLAGS) -o $@ $^ $(LDFLAGS)
	@echo "Remember to set:"
	@echo "  LD_LIBRARY_PATH=$(LIBDIRS):$$LD_LIBRARY_PATH"

MOSTLYCLEANFILES += demo demo.o

##############################################################################
# Standard boilerplate

.PHONY: clean
clean:
	rm -f $(MOSTLYCLEANFILES)
 

Complete examples


There are project examples demonstrating different ways of building your projects using ROSE's headers/libraries.

They are available at: https://github.com/chunhualiao/rose-project-templates

The repository contains a few templates for independent projects using ROSE. By independent, we mean projects located outside of ROSE's source tree.

  • template-project-v1 : using Makefile to build the project
  • template-project-v2 : using Makefile to build and run a ROSE plugin

It is rare that your translator just works after you finish coding. Using gdb to debug your code is indispensable for making sure it works as expected. This page shows examples of how to debug your translator.

Preparations


First and foremost, make sure your ROSE installation and your translator were built with -g and without GCC optimizations. This ensures debug information is preserved as fully as possible.

To configure a ROSE installation with debugging options, add the following options to your normal configure command:

 ../rose/configure --with-CXX_DEBUG=-g --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 ...

If you have already built ROSE but forgot which options you used, go to your ROSE build tree and check config.log to see whether the debugging options were used:

cd buildDebug/
-bash-4.2$ head config.log

  $ ../sourcetree/configure --with-java=/path/to/java/jdk/1.8.0_131 --with-boost=/path/to/boost/1_60_0/gcc/4.9.3 --with-CXX_DEBUG=-g --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0 --enable-languages=c++,fortran

Before you debug your own translator, you may want to double-check whether ROSE's built-in translator (rose-compiler) can handle your input code properly. If not, you should report the bug to the ROSE team.

If rose-compiler can handle the input but your customized translator cannot, the problem is likely caused by the customizations you introduced in your translator.

Another useful step is to reduce your input code to the smallest version that still triggers the error you are interested in. This simplifies bug hunting dramatically; it is very difficult to debug a translator that is processing thousands of lines of code.
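Reducing the input by hand can be partly automated. The following is a hedged sketch of greedy line-based reduction, a simplified form of what is often called delta debugging. It repeatedly deletes one line and keeps the deletion whenever the smaller input still triggers the bug.

The triggersBug callback is hypothetical. In practice, it would write the lines to a file, run your translator on it, and check for the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Greedy reduction: returns a version of `lines` that still satisfies
// triggersBug but from which no single line can be removed without losing it.
std::vector<std::string> reduceInput(
    std::vector<std::string> lines,
    const std::function<bool(const std::vector<std::string>&)>& triggersBug) {
  if (!triggersBug(lines)) return lines;  // the start must reproduce the bug
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < lines.size(); ++i) {
      std::vector<std::string> candidate = lines;
      candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
      if (triggersBug(candidate)) {  // still fails: keep the smaller input
        lines = std::move(candidate);
        changed = true;
        break;
      }
    }
  }
  return lines;
}
```

Each successful deletion restarts the scan, which is simple but slow for large inputs. Since it only removes whole lines, the result may still need some manual trimming.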

Basics of GDB


gdb is a debugger. It provides a controlled execution environment in which you can check whether your program runs the way you expect.

Essentially, it allows you to:

  • run your program within a controlled debugging environment: using gdb --args <program> <args...>
    • or libtool --mode=execute gdb --args <program> <args...> for libtool-built executables.
  • stop at desired execution points
    • normal breakpoints (called breakpoints): using break <where>, where <where> can be a function name, line_number, or file:line_number.
    • when the value of a given variable or expression changes (called a watchpoint): using watch <expression>
    • segmentation faults: gdb stops automatically, so you can inspect how a seg fault happens.
    • assertion failures: gdb stops automatically, so you can debug assertion failures.
  • inspect and even change things like variables, types, etc. once your program stops at desired execution points
    • inspect the call stack at the breakpoint: using backtrace, or bt for short; use frame <frame#> to go to the stack frame you are interested in.
    • look at relevant source code near the breakpoint: using list [+|-|filename:linenumber|filename:function]
    • inspect the values of variables and expressions: using print <what>, where <what> can be any variable, expression, or even function call.
    • inspect the type of a variable: using whatis variable_name
    • change the content of a variable to a given value: using set var <var_name>=<value>
    • call functions: using print function_name(args); this is helpful for calling dump functions of some class objects.
  • control further execution
    • step through the execution of your program one statement at a time: you can step over calls in the current frame (next), step down into a called function (step), or step out of the current stack frame (finish).
    • continue the execution until the next breakpoint or watchpoint: using continue, or c for short
    • return from a function immediately, passing a given value: using return <expression>
  • and other things.

For a quick overview, you can look through a cheat sheet online:

From Rob: there is a curses-based wrapper called "cgdb" [1].

  • You get a split window: the bottom is the GDB console and the top is syntax-highlighted source code that automatically tracks your current location and supports PageUp/PageDn, which is a lot easier to use than GDB's "list" command.
  • It requires ncurses-devel and readline-devel to install.

A translator not built by ROSE's build system


Some people also call this an out-of-source-tree build.

If the translator is built using a makefile without libtool, debugging it follows the classic gdb steps.

  • Make sure your translator is compiled with the GNU debugging option -g, so your object code contains debugging information.

These are the steps of a typical debugging session:

1. Set a breakpoint

2. Examine the execution path to make sure the program goes through the path that you expected

3. Examine the local data to validate their values

# how to print out information about an AST node
#-------------------------------------
(gdb) print n
$1 = (SgNode *) 0xb7f12008

# Check the type of a node
#-------------------------------------
(gdb) print n->sage_class_name()
$2 = 0x578b3af "SgFile"

(gdb) print n->get_parent()
$7 = (SgNode *) 0x95e75b8

# Convert a node to its real node type then call its member functions
#---------------------------
(gdb) print isSgFile(n)->getFileName()

#-------------------------------------
# When displaying a pointer to an object, identify the actual (derived) type of the object 
# rather than the declared type, using the virtual function table. 
#-------------------------------------
(gdb) set print object on
(gdb) print astNode
$6 = (SgPragmaDeclaration *) 0xb7c68008

# unparse the AST from a node
# Only works for AST pieces with full scope information
# It will report error if scope information is not available at any ancestor level.
#-------------------------------------
(gdb) print n->unparseToString()

# print out Sg_File_Info 
#-------------------------------------
(gdb) print n->get_file_info()->display()
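The set print object on setting relies on the object's virtual function table to find its actual type. The same idea can be observed in plain C++ with typeid: for a polymorphic class, typeid applied to a dereferenced base-class pointer reports the object's dynamic type. SgNode is polymorphic, so the same applies to ROSE nodes.

This is a minimal sketch with hypothetical toy classes standing in for SgNode and SgPragmaDeclaration.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical toy hierarchy; the virtual destructor makes ToyBase polymorphic.
struct ToyBase {
  virtual ~ToyBase() = default;
};
struct ToyPragmaDeclaration : ToyBase {};

// The declared type of p is ToyBase*, but typeid(*p) looks up the object's
// dynamic (derived) type at run time.
bool pointsToPragma(const ToyBase* p) {
  assert(p != nullptr);
  return typeid(*p) == typeid(ToyPragmaDeclaration);
}
```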

Example 1: debugging an AST traversal


We first prepare an example ROSE-based analyzer that traverses the AST to find loops, and rename it demo.C:

We can look at the example analyzer's source code with cat demo.C. It essentially contains the following:

  4 #include "rose.h"
  5 
  6 class visitorTraversal : public AstSimpleProcessing
  7    {
  8      public:
  9           visitorTraversal();
 10           virtual void visit(SgNode* n);
 11           virtual void atTraversalEnd();
 12    };
 13 
 14 visitorTraversal::visitorTraversal()
 15    {
 16    }
 17 
 18 void visitorTraversal::visit(SgNode* n)
 19    {
 20      if (isSgForStatement(n) != NULL)
 21         {
 22           printf ("Found a for loop ... \n");
 23         }
 24    }
 25 
 26 void visitorTraversal::atTraversalEnd()
 27    {
 28      printf ("Traversal ends here. \n");
 29    }
 30 
 31 int
 32 main ( int argc, char* argv[] )
 33    {
 34   // Initialize and check compatibility. See Rose::initialize
 35      ROSE_INITIALIZE;
 36 
 37      if (SgProject::get_verbose() > 0)
 38           printf ("In visitorTraversal.C: main() \n");
 39 
 40      SgProject* project = frontend(argc,argv);
 41      ROSE_ASSERT (project != NULL);
 42 
 43   // Build the traversal object
 44      visitorTraversal exampleTraversal;
 45 
 46   // Call the traversal function (member function of AstSimpleProcessing)
 47   // starting at the project node of the AST, using a preorder traversal.
 48      exampleTraversal.traverseInputFiles(project,preorder);
 49 
 50      return 0;
 51    }

A ROSE-based tool initializes ROSE first (at line 35). Then the frontend() function is called to parse an input code and generate an AST rooted at project, of type SgProject (at line 40).

After that, a traversal object is declared at line 44. The object is used to traverse the input files of the project, using a preorder traversal.

The traversal object is based on a derived visitorTraversal class at line 6. This derived class has member functions to define what should happen during construction (line 14), visiting a node (line 18), and the end of the traversal (line 26).

Now get a sample makefile to build the source file into an executable file:

The makefile should be self-explanatory. It uses rose-config in the installation path to set various environment variables for compilers, compilation and linking flags, library path, etc.

Get an example input code for the analyzer:

The input code has two for loops, at lines 20 and 41, as shown at the link.

Prepare the environment variable used to specify where ROSE is installed.

  • export ROSE_HOME=/home/freecc/install/rose_install

Build the analyzer:

  • make -f SampleMakefile

There should now be an executable file named demo in the current directory.

Finally, run the demo analyzer to process the example input code:

  • ./demo -c inputCode_ExampleTraversals.C

The analyzer should find two for loops and report the end of the traversal.

Found a for loop ...
Found a for loop ...
Traversal ends here.

Debug The Translator


Now let's debug this simple translator.

First, use gdb -args to run the translator with its options:

gdb -args ./demo -c inputCode_ExampleTraversals.C

// r means run. It is usually good practice to run the program first without breakpoints, to see whether it runs normally
//     or to reproduce an assertion error or seg fault
(gdb) r
Starting program: /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/./demo -c inputCode_ExampleTraversals.C
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Found a for loop ...
Found a for loop ...
Traversal ends here.
[Inferior 1 (process 44697) exited normally]
...
(gdb)

// This program runs without errors, so we set a breakpoint at line 22 of demo.C

(gdb) b demo.C:22
Breakpoint 1 at 0x40b0e2: file demo.C, line 22.

// We expect this breakpoint will be hit twice since the input code has only two loops. We try to verify this:
(gdb) r
Starting program: /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/./demo -c inputCode_ExampleTraversals.C
warning: File "/nfs/casc/overture/ROSE/opt/rhel7/x86_64/gcc/4.9.3/mpc/1.0/mpfr/3.1.2/gmp/5.1.2/lib64/libstdc++.so.6.0.20-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, visitorTraversal::visit (this=0x7fffffffb430, n=0x7fffe87db010) at demo.C:22
22                printf ("Found a for loop ... \n");

// Hit breakpoint 1 once, try to continue to see what will happen

(gdb) c
Continuing.
Found a for loop ...

Breakpoint 1, visitorTraversal::visit (this=0x7fffffffb430, n=0x7fffe87db138) at demo.C:22
22                printf ("Found a for loop ... \n");

// Hit breakpoint 1 for the second time, try to continue

(gdb) c
Continuing.
Found a for loop ...
Traversal ends here.
[Inferior 1 (process 46262) exited normally]

// The program now terminates; there are no more stops at Breakpoint 1.

// ---------- Now we inspect the variable n at Breakpoint 1
// Rerun the program and hit Breakpoint 1
(gdb) r

Breakpoint 1, visitorTraversal::visit (this=0x7fffffffb430, n=0x7fffe87db010) at demo.C:22
22                printf ("Found a for loop ... \n");

// Print the cast n: it is indeed an SgForStatement

(gdb) p isSgForStatement(n)
$1 = (SgForStatement *) 0x7fffe87db010

// Inspect the file info of this SgForStatement to see where it comes from in the source code.
 
(gdb) p isSgForStatement(n)->get_file_info()->display()
Inside of Sg_File_Info::display() of this pointer = 0x7fffe94d58b0
     isTransformation                      = false
     isCompilerGenerated                   = false
     isOutputInCodeGeneration              = false
     isShared                              = false
     isFrontendSpecific                    = false
     isSourcePositionUnavailableInFrontend = false
     isCommentOrDirective                  = false
     isToken                               = false
     isDefaultArgument                     = false
     isImplicitCast                        = false
     filename = /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/inputCode_ExampleTraversals.C
     line     = 20  column = 6
     physical_file_id       = 0 = /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/inputCode_ExampleTraversals.C
     physical_line          = 20
     source_sequence_number = 8726
$2 = void

Inspect post_construction_initialization()


A breakpoint at post_construction_initialization() is useful for inspecting when a node is created and whether it has its required fields set after construction. For example, walking the call stack leading to this function call (using the up and down commands in gdb) lets you check whether the node has its parent or scope pointers set. If not, you can add such operations to fix bugs related to NULL pointers.
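The reason a single breakpoint catches every creation is that the constructor funnels through this hook. The backtrace later in this section shows SgForStatement's constructor calling post_construction_initialization().

This is a hypothetical toy class (not ROSE code) that follows the same pattern and the ROSE naming convention of a p_parent field behind get_parent()/set_parent(). Right after construction, the parent is still null; whoever creates the node has to set it afterwards.

```cpp
#include <cassert>

// Hypothetical toy node following the ROSE naming convention.
class ToyStatement {
 public:
  ToyStatement() { post_construction_initialization(); }
  ToyStatement* get_parent() const { return p_parent; }
  void set_parent(ToyStatement* parent) {
    assert(parent != this);  // a node cannot be its own parent
    p_parent = parent;
  }
  static int constructed;  // how many times the hook has run
 private:
  // Every constructor calls this hook, so a breakpoint here stops once per
  // object created. p_parent is already initialized (to nullptr) by now.
  void post_construction_initialization() { ++constructed; }
  ToyStatement* p_parent = nullptr;
};
int ToyStatement::constructed = 0;
```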

// ----------- We want to inspect when the SgForStatement nodes are created in the execution
// set a breakpoint at the post_construction_initialization() method of SgForStatement

(gdb) b SgForStatement::post_construction_initialization()
Breakpoint 2 at 0x7ffff3d6495f: file Cxx_Grammar.C, line 139566.

// Disable Breakpoint 1 for now
(gdb) disable 1

(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep n   0x000000000040b0e2 in visitorTraversal::visit(SgNode*) at demo.C:22
        breakpoint already hit 1 time
2       breakpoint     keep y   0x00007ffff3d6495f in SgForStatement::post_construction_initialization() at Cxx_Grammar.C:139566

// run until the Breakpoint 2 is hit
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Breakpoint 2, SgForStatement::post_construction_initialization (this=0x7fffe87db010) at Cxx_Grammar.C:139566
139566       if (p_for_init_stmt == NULL) {

//  use backtrace to check the function call stacks leading to this stop of Breakpoint 2. 
//  You can clearly see the callchain from main() all the way to the breakpoint.

(gdb) bt
#0  SgForStatement::post_construction_initialization (this=0x7fffe87db010) at Cxx_Grammar.C:139566
#1  0x00007ffff54e55d8 in SgForStatement::SgForStatement (this=0x7fffe87db010, test=0x0, increment=0x0, loop_body=0x0)
    at Cxx_GrammarNewConstructors.C:5258
#2  0x00007ffff5bb04ce in EDG_ROSE_Translation::parse_statement (sse=..., existingBasicBlock=0x0)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:49637
#3  0x00007ffff5bbb5ea in EDG_ROSE_Translation::parse_statement_list (sse=..., orig_kind=iek_statement, orig_ptr=0x115f810)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:53079
#4  0x00007ffff5bb0221 in EDG_ROSE_Translation::parse_statement (sse=..., existingBasicBlock=0x7fffe8934010)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:49492
#5  0x00007ffff5c09217 in EDG_ROSE_Translation::parse_function_body<SgFunctionDeclaration> (sse_base=..., p=0x1151ad0, decl=0x7fffe9e21698)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:36262
#6  0x00007ffff5b844fa in EDG_ROSE_Translation::convert_routine (p=0x1151ad0, forceTemplateDeclaration=false, edg_template=0x0,
    optional_nondefiningTemplateDeclaration=0x0) at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:34343
#7  0x00007ffff5b703cf in EDG_ROSE_Translation::parse_routine (sse=..., forceTemplateDeclaration=false, edg_template=0x0, forceSecondaryDeclaration=false)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:29866
#8  0x00007ffff5be6f78 in EDG_ROSE_Translation::parse_global_or_namespace_scope_entity (sse=...)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:64638
#9  0x00007ffff5bea2df in EDG_ROSE_Translation::parse_global_scope (inputGlobalScope=0x7ffff7ec3120, sse=..., skip_ast_translation=false)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:65427
#10 0x00007ffff5bedbee in sage_back_end (sageFile=...) at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:66777
#11 0x00007ffff5beea8a in cfe_main (argc=44, argv=0x702f80, sageFile=...)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:66992
#12 0x00007ffff5beebe7 in edg_main (argc=44, argv=0x702f80, sageFile=...)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:67093
#13 0x00007ffff3c14629 in SgSourceFile::build_C_and_Cxx_AST (this=0x7fffeb45e010, argv=..., inputCommandLine=...)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:5430
#14 0x00007ffff3c1587a in SgSourceFile::buildAST (this=0x7fffeb45e010, argv=..., inputCommandLine=...)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:5983
#15 0x00007ffff3c0e5b7 in SgFile::callFrontEnd (this=0x7fffeb45e010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:3119
#16 0x00007ffff3c0b576 in SgSourceFile::callFrontEnd (this=0x7fffeb45e010)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2137
#17 0x00007ffff3c0a005 in SgFile::runFrontend (this=0x7fffeb45e010, nextErrorCode=@0x7fffffffaadc: 0)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:1606
#18 0x00007ffff3c12924 in Rose::Frontend::RunSerial (project=0x7fffeb555010)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:4613
#19 0x00007ffff3c12593 in Rose::Frontend::Run (project=0x7fffeb555010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:4506
#20 0x00007ffff3c0b84d in SgProject::RunFrontend (this=0x7fffeb555010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2209
#21 0x00007ffff3c0bcb2 in SgProject::parse (this=0x7fffeb555010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2334
#22 0x00007ffff3c0b0d4 in SgProject::parse (this=0x7fffeb555010, argv=...)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2028
#23 0x00007ffff3cbd2e9 in SgProject::SgProject (this=0x7fffeb555010, argv=..., frontendConstantFolding=false) at Cxx_Grammar.C:29114
#24 0x00007ffff645fd54 in frontend (argv=..., frontendConstantFolding=false) at ../../../sourcetree/src/roseSupport/utility_functions.C:628
#25 0x00007ffff645fc10 in frontend (argc=3, argv=0x7fffffffb578, frontendConstantFolding=false)
    at ../../../sourcetree/src/roseSupport/utility_functions.C:590
#26 0x000000000040b152 in main (argc=3, argv=0x7fffffffb578) at demo.C:40
(gdb)

// Again, Breakpoint 2 will be hit twice since we only have two for loops in the input code

(gdb) c
Continuing.

Breakpoint 2, SgForStatement::post_construction_initialization (this=0x7fffe87db138) at Cxx_Grammar.C:139566
139566       if (p_for_init_stmt == NULL) {
(gdb) c
Continuing.
Found a for loop ...
Found a for loop ...
Traversal ends here.
[Inferior 1 (process 47292) exited normally]

Set a condition to Breakpoints


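In real code, there are hundreds of objects of the same class type (e.g., SgForStatement). Many of them come from header files and will be present in the AST. We only want to stop when the object matches the one we want to inspect. Often, we can use the memory address of the object as the condition. This works across reruns because gdb disables address-space randomization by default, so the same input usually produces the same addresses.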

// Add a condition to Breakpoint 2: stop only when the this pointer is equal to a given memory address
(gdb) cond 2 (unsigned long)this==(unsigned long)0x7fffe87db138

// run the program: now it will stop only when the condition for Breakpoint 2 is met, skipping all other hits to Breakpoint 2. 
(gdb) r
Starting program: /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/./demo -c inputCode_ExampleTraversals.C
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 2, SgForStatement::post_construction_initialization (this=0x7fffe87db138) at Cxx_Grammar.C:139566
139566       if (p_for_init_stmt == NULL) {

// continue the execution, after doing inspections you want. It should go to the normal termination, skipping other hits to Breakpoint 2. 
(gdb) c
Continuing.
Found a for loop ...
Found a for loop ...
Traversal ends here.
[Inferior 1 (process 47785) exited normally]

Use Watchpoints


You can use a watchpoint to stop execution whenever the value of an expression changes, without having to predict a particular place where this may happen. (This is sometimes called a data breakpoint.)

Watchpoints can be treated as a special type of breakpoint: they stop execution when the value at the watched memory location changes. This is especially useful when you want to know when some variable (or field of an object) is set to some value or has its value cleared. For example, a bug is often related to a NULL value in some field of a node. The field may be set during construction of the node, but later it mysteriously becomes NULL. It is extremely hard to find when this happens without using a watchpoint.
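gdb watchpoints need no source changes. If you own the class, though, a similar record of changes can be kept in code. This is a different technique from a watchpoint, sketched here with a hypothetical Watched<T> wrapper that is not part of ROSE or gdb.

Every assignment through set() that actually changes the value is logged as an old/new pair. This loosely mirrors the "Old value = ... New value = ..." report of a triggered watchpoint.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical software analogue of a data breakpoint.
template <typename T>
class Watched {
 public:
  explicit Watched(T initial) : value_(initial) {}
  const T& get() const { return value_; }
  void set(T next) {
    if (next != value_) changes_.emplace_back(value_, next);  // log old/new
    value_ = next;
  }
  const std::vector<std::pair<T, T>>& changes() const { return changes_; }
 private:
  T value_;
  std::vector<std::pair<T, T>> changes_;
};
```

Unlike a watchpoint, this only sees writes that go through set(), so a stray write through a raw pointer would still go unnoticed. For those writes, gdb's hardware watchpoints are the right tool.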

For example, we want to watch the value changes to the parent field of the SgForStatement matching the memory address of the 2nd loop.

  • We first stop at a breakpoint where we have access to the node's internal fields. This is usually done by stopping at SgForStatement::post_construction_initialization().
  • Once the internal variables are visible in gdb at the proper breakpoint, we can grab the memory address of the internal variable. This requires knowing how internal variables are named. You can either look at the class declaration of the object or guess by convention: in ROSE AST node types, a field with an access function like get_something() usually has a corresponding internal variable named p_something.
  • Finally, we have to watch the dereferenced value at that memory address (watch *address). Watching the address itself (watch address) means watching a constant value, which won't work.
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep n   0x000000000040b0e2 in visitorTraversal::visit(SgNode*) at demo.C:22
2       breakpoint     keep y   0x00007ffff3d6495f in SgForStatement::post_construction_initialization() at Cxx_Grammar.C:139566
        stop only if (unsigned long)this==(unsigned long)0x7fffe87db138
        breakpoint already hit 1 time

(gdb) r
Starting program: /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/./demo -c inputCode_ExampleTraversals.C

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 2, SgForStatement::post_construction_initialization (this=0x7fffe87db138) at Cxx_Grammar.C:139566
139566       if (p_for_init_stmt == NULL) {

// the data member storing the parent pointer of an AST node is p_parent
// it currently has a NULL value
(gdb) p p_parent
$3 = (SgNode *) 0x0

// we obtain the memory address of p_parent
(gdb) p &p_parent
$4 = (SgNode **) 0x7fffe87db140

// watch value changes of this address
// Must dereference the address with *, or gdb refuses with "Cannot watch constant value"

(gdb) watch *0x7fffe87db140

// We can now watch the value changes to this memory address
// Let's restart the program from the beginning:

(gdb) r
Starting program: /home/liao6/workspace/rose/2019-10-31_14-16-05_-0700/myTranslator/./demo -c inputCode_ExampleTraversals.C
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Hardware watchpoint 2: *0x7fffe87db140

Old value = <unreadable>
New value = 0

SgNode::SgNode (this=0x7fffe87db138) at Cxx_Grammar.C:2128
2128         p_isModified = false;

// the first change happens in the constructor of the base class SgNode; the backtrace shows how we got there

(gdb) bt
#0  SgNode::SgNode (this=0x7fffe87db138) at Cxx_Grammar.C:2128
#1  0x00007ffff3d19f01 in SgLocatedNode::SgLocatedNode (this=0x7fffe87db138, startOfConstruct=0x0) at Cxx_Grammar.C:85278
#2  0x00007ffff3d59798 in SgStatement::SgStatement (this=0x7fffe87db138, startOfConstruct=0x0) at Cxx_Grammar.C:134029
#3  0x00007ffff3d59fcc in SgScopeStatement::SgScopeStatement (this=0x7fffe87db138, file_info=0x0) at Cxx_Grammar.C:134289
#4  0x00007ffff54e54e0 in SgForStatement::SgForStatement (this=0x7fffe87db138, test=0x0, increment=0x0, loop_body=0x0)
    at Cxx_GrammarNewConstructors.C:5230
#5  0x00007ffff5bb04ce in EDG_ROSE_Translation::parse_statement (sse=..., existingBasicBlock=0x0)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:49637
#6  0x00007ffff5bbb5ea in EDG_ROSE_Translation::parse_statement_list (sse=..., orig_kind=iek_statement, orig_ptr=0x1162200)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:53079
#7  0x00007ffff5bb0221 in EDG_ROSE_Translation::parse_statement (sse=..., existingBasicBlock=0x7fffe8934470)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:49492
#8  0x00007ffff5c09217 in EDG_ROSE_Translation::parse_function_body<SgFunctionDeclaration> (sse_base=..., p=0x1151fc0, decl=0x7fffe9e21e68)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:36262
#9  0x00007ffff5b844fa in EDG_ROSE_Translation::convert_routine (p=0x1151fc0, forceTemplateDeclaration=false, edg_template=0x0,
    optional_nondefiningTemplateDeclaration=0x0) at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:34343
#10 0x00007ffff5b703cf in EDG_ROSE_Translation::parse_routine (sse=..., forceTemplateDeclaration=false, edg_template=0x0, forceSecondaryDeclaration=false)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:29866
#11 0x00007ffff5be6f78 in EDG_ROSE_Translation::parse_global_or_namespace_scope_entity (sse=...)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:64638
#12 0x00007ffff5bea2df in EDG_ROSE_Translation::parse_global_scope (inputGlobalScope=0x7ffff7ec3120, sse=..., skip_ast_translation=false)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:65427
#13 0x00007ffff5bedbee in sage_back_end (sageFile=...) at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:66777
#14 0x00007ffff5beea8a in cfe_main (argc=44, argv=0x702f80, sageFile=...)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:66992
#15 0x00007ffff5beebe7 in edg_main (argc=44, argv=0x702f80, sageFile=...)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:67093
#16 0x00007ffff3c14629 in SgSourceFile::build_C_and_Cxx_AST (this=0x7fffeb45e010, argv=..., inputCommandLine=...)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:5430
#17 0x00007ffff3c1587a in SgSourceFile::buildAST (this=0x7fffeb45e010, argv=..., inputCommandLine=...)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:5983
#18 0x00007ffff3c0e5b7 in SgFile::callFrontEnd (this=0x7fffeb45e010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:3119
#19 0x00007ffff3c0b576 in SgSourceFile::callFrontEnd (this=0x7fffeb45e010)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2137
#20 0x00007ffff3c0a005 in SgFile::runFrontend (this=0x7fffeb45e010, nextErrorCode=@0x7fffffffaadc: 0)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:1606
#21 0x00007ffff3c12924 in Rose::Frontend::RunSerial (project=0x7fffeb555010)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:4613
#22 0x00007ffff3c12593 in Rose::Frontend::Run (project=0x7fffeb555010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:4506
#23 0x00007ffff3c0b84d in SgProject::RunFrontend (this=0x7fffeb555010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2209
#24 0x00007ffff3c0bcb2 in SgProject::parse (this=0x7fffeb555010) at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2334
#25 0x00007ffff3c0b0d4 in SgProject::parse (this=0x7fffeb555010, argv=...)
    at ../../../../sourcetree/src/frontend/SageIII/sage_support/sage_support.cpp:2028
#26 0x00007ffff3cbd2e9 in SgProject::SgProject (this=0x7fffeb555010, argv=..., frontendConstantFolding=false) at Cxx_Grammar.C:29114
#27 0x00007ffff645fd54 in frontend (argv=..., frontendConstantFolding=false) at ../../../sourcetree/src/roseSupport/utility_functions.C:628
#28 0x00007ffff645fc10 in frontend (argc=3, argv=0x7fffffffb578, frontendConstantFolding=false)
    at ../../../sourcetree/src/roseSupport/utility_functions.C:590
#29 0x000000000040b152 in main (argc=3, argv=0x7fffffffb578) at demo.C:40

// We continue the execution

(gdb) c
Continuing.
Hardware watchpoint 2: *0x7fffe87db140

Old value = 0
New value = -393001872
SgNode::set_parent (this=0x7fffe87db138, parent=0x7fffe8934470) at Cxx_Grammar.C:1684
1684         if ( ( variantT() == V_SgClassDeclaration ) && ( parent != NULL && parent->variantT() == V_SgFunctionParameterList ) )

// Now we know that this p_parent field is set by calling set_parent(). We can inspect the call stack and anything else of interest
(gdb) bt
#0  SgNode::set_parent (this=0x7fffe87db138, parent=0x7fffe8934470) at Cxx_Grammar.C:1684
#1  0x00007ffff5bb04ef in EDG_ROSE_Translation::parse_statement (sse=..., existingBasicBlock=0x0)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:49643
#2  0x00007ffff5bbb5ea in EDG_ROSE_Translation::parse_statement_list (sse=..., orig_kind=iek_statement, orig_ptr=0x1162200)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:53079
#3  0x00007ffff5bb0221 in EDG_ROSE_Translation::parse_statement (sse=..., existingBasicBlock=0x7fffe8934470)
    at ../../../../../../sourcetree/src/frontend/CxxFrontend/EDG/edgRose/edgRose.C:49492
.... // omitted

(gdb) c
Continuing.
Found a for loop ...
Found a for loop ...
Traversal ends here.
[Inferior 1 (process 54495) exited normally]

// No more value changes to the same memory address, as expected. 
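The New value = -393001872 reported at the second watchpoint hit is not a corrupted address. When an integer address is dereferenced without a cast, gdb treats the result as an int, so only the low 4 bytes of p_parent are watched and printed. Those bytes belong to the parent pointer 0x7fffe8934470 shown in the set_parent frame. A small stand-alone helper (lowBitsAsInt is a hypothetical name used only for this illustration) reproduces the arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// gdb evaluates "*0xADDR" (a dereferenced integer, no cast) as an int, so the
// watchpoint above covers only 4 of the 8 bytes of the SgNode* member.
// This helper reproduces what gdb prints: keep the low 32 bits of the
// pointer value and read them as a signed two's-complement 32-bit int.
std::int64_t lowBitsAsInt(std::uint64_t pointerValue)
{
    const std::int64_t low = static_cast<std::int64_t>(pointerValue & 0xFFFFFFFFULL);
    // If bit 31 is set, the 32-bit int is negative.
    return (low >= 0x80000000LL) ? low - 0x100000000LL : low;
}
```

To watch the whole 8-byte pointer instead, cast the address to the member's type, for example watch *(SgNode **) 0x7fffe87db140; gdb then prints the old and new values as pointers.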

A translator shipped with ROSE

This case is also called an in-tree (or in-source-tree) build. Libtool is used to build these translators.

ROSE turns on -O2 and -g by default, so the translators shipped with ROSE should already have some debugging information available. However, some variables may be optimized away. To preserve the maximum amount of debugging information, you may have to reconfigure and recompile ROSE with optimizations turned off:

../sourcetree/configure --with-CXX_DEBUG=-g --with-C_OPTIMIZE=-O0 --with-CXX_OPTIMIZE=-O0  ...

ROSE uses libtool, so the executables in the build tree are not the real ones: they are simply wrapper scripts around the actual executable files. You have two choices:

  • Find the real executable in the .libs directory and debug it there
  • Use the libtool command line as follows:
$ libtool --mode=execute gdb --args ./built_in_translator file1.c

To save typing, add the following alias to your .bashrc:

alias debug='libtool --mode=execute gdb --args'

Then every debugging session can be as simple as:

$ debug ./built_in_translator file1.c

The remaining steps are the same as a regular gdb session with the typical operations, such as breakpoints, printing data, etc.

Example 2: Fixing a real bug in ROSE

1. Reproduce the reported bug:

$ make check
...
./testVirtualCFG \
    --edg:no_warnings -w -rose:verbose 0 --edg:restrict \
    -I$ROSE/tests/CompileTests/virtualCFG_tests/../Cxx_tests \
    -I$ROSE/sourcetree/tests/CompileTests/A++Code \
    -c $ROSE/sourcetree/tests/CompileTests/virtualCFG_tests/../Cxx_tests/test2001_01.C

...
lt-testVirtualCFG: $ROSE/src/frontend/SageIII/virtualCFG/virtualCFG.h:111:
    VirtualCFG::CFGEdge::CFGEdge(VirtualCFG::CFGNode, VirtualCFG::CFGNode):
    Assertion `src.getNode() != __null && tgt.getNode() != __null' failed.

Ah, so we've failed an assertion within the virtualCFG.h header file on line 111:

Assertion `src.getNode() != __null && tgt.getNode() != __null' failed

The error was produced by running lt-testVirtualCFG, the real executable behind the libtool wrapper; the translator's actual name is testVirtualCFG (without the lt- prefix).

2. Run the same translator command line with Libtool to start a GDB debugging session:

$ libtool --mode=execute gdb --args ./testVirtualCFG \
    --edg:no_warnings -w -rose:verbose 0 --edg:restrict \
    -I$ROSE/tests/CompileTests/virtualCFG_tests/../Cxx_tests \
    -I$ROSE/sourcetree/tests/CompileTests/A++Code \
    -c $ROSE/sourcetree/tests/CompileTests/virtualCFG_tests/../Cxx_tests/test2001_01.C

GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5_8.1)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from ${ROSE_BUILD_TREE}tests/CompileTests/virtualCFG_tests/.libs/lt-testVirtualCFG...done.
(gdb)

The GDB session has started, and we're provided with a command line prompt to begin our debugging.

3. Let's run the program, which will hit the failed assertion:

(gdb) r
Starting program: \
    ${ROSE_BUILD_TREE}/tests/CompileTests/virtualCFG_tests/.libs/lt-testVirtualCFG \
    --edg:no_warnings -w -rose:verbose 0 --edg:restrict \
    -I${ROSE}/tests/CompileTests/virtualCFG_tests/../Cxx_tests \
    -I../../../../sourcetree/tests/CompileTests/A++Code \
    -c   ${ROSE}/tests/CompileTests/virtualCFG_tests/../Cxx_tests/test2001_01.C
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
[Thread debugging using libthread_db enabled]
lt-testVirtualCFG: ${ROSE}/src/frontend/SageIII/virtualCFG/virtualCFG.h:111:

VirtualCFG::CFGEdge::CFGEdge(VirtualCFG::CFGNode, VirtualCFG::CFGNode): Assertion `src.getNode() != __null && tgt.getNode() != __null' failed.

Program received signal SIGABRT, Aborted.
0x0000003752230285 in raise () from /lib64/libc.so.6

Okay, we've reproduced the problem in our GDB session.

4. Let's check the backtrace to see how we wound up at this failed assertion:

(gdb) bt
#0  0x0000003752230285 in raise () from /lib64/libc.so.6
#1  0x0000003752231d30 in abort () from /lib64/libc.so.6
#2  0x0000003752229706 in __assert_fail () from /lib64/libc.so.6

#3  0x00002aaaad6437b2 in VirtualCFG::CFGEdge::CFGEdge (this=0x7fffffffb300, src=..., tgt=...)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/virtualCFG.h:111
#4  0x00002aaaad643b60 in makeEdge<VirtualCFG::CFGNode, VirtualCFG::CFGEdge> (from=..., to=..., result=...)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/memberFunctions.C:82
#5  0x00002aaaad62ef7d in SgReturnStmt::cfgOutEdges (this=0xbfaf10, idx=1)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/memberFunctions.C:1471
#6  0x00002aaaad647e69 in VirtualCFG::CFGNode::outEdges (this=0x7fffffffb530)
     at ${ROSE}/../src/frontend/SageIII/virtualCFG/virtualCFG.C:636
#7  0x000000000040bf7f in getReachableNodes (n=..., s=...) at ${ROSE}/tests/CompileTests/virtualCFG_tests/testVirtualCFG.C:13
...

5. Next, we move up the call stack (with the gdb up command) until we reach the frame containing the assertion:

(gdb) up
#1  0x0000003752231d30 in abort () from /lib64/libc.so.6

(gdb) up
#2  0x0000003752229706 in __assert_fail () from /lib64/libc.so.6

(gdb) up
#3  0x00002aaaad6437b2 in VirtualCFG::CFGEdge::CFGEdge (this=0x7fffffffb300, src=..., tgt=...)
     at ${ROSE}/src/frontend/SageIII/virtualCFG/virtualCFG.h:111
111         CFGEdge(CFGNode src, CFGNode tgt): src(src), tgt(tgt) \
                   { assert(src.getNode() != NULL && tgt.getNode() != NULL); }

Okay, so the assertion is inside the constructor for CFGEdge:

CFGEdge(CFGNode src, CFGNode tgt): src(src), tgt(tgt)
{
    assert(src.getNode() != NULL && tgt.getNode() != NULL);  // This is the failed assertion
}

Unfortunately, we can't tell at a glance which of the two conditions in the assertion is failing.
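One way to avoid this guesswork is to give each condition its own assertion, so that the abort message and line number identify the null endpoint directly. A minimal sketch, using hypothetical and simplified stand-ins for CFGNode and CFGEdge (not the actual virtualCFG.h code):

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins: the real VirtualCFG classes carry more
// state than a single node pointer.
struct Node {};

struct CFGNode {
    Node* node;
    Node* getNode() const { return node; }
};

struct CFGEdge {
    CFGNode src, tgt;
    CFGEdge(CFGNode s, CFGNode t) : src(s), tgt(t)
    {
        // One condition per assertion: a failure now names the null endpoint.
        assert(src.getNode() != nullptr && "CFGEdge: source node is null");
        assert(tgt.getNode() != nullptr && "CFGEdge: target node is null");
    }
};
```

With this form, the failure in this example would have reported the target-node assertion directly, without the gdb inspection in the next step.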

6. Figure out why the assertion is failing:

Let's examine the two conditions in the assertion:

(gdb) p src.getNode()
$1 = (SgNode *) 0xbfaf10

So src.getNode() is returning a non-null pointer to an SgNode. How about tgt.getNode()?

(gdb) p tgt.getNode()
$2 = (SgNode *) 0x0

Ah, there's the culprit. So for some reason, tgt.getNode() is returning a null SgNode pointer (0x0).

From here, we used the GDB up command to walk back through the call stack and find where the node returned by tgt.getNode() was assigned a NULL value.

We eventually found that SgReturnStmt::cfgOutEdges (frame #5 in the backtrace above) relies on a local variable called enclosingFunc. The source code had no assertion checking the value of enclosingFunc, which is why the problem only surfaced later, as a failed assertion in the CFGEdge constructor. As a side note, it is good practice to add assertions as early as possible in your source code, so that in situations like this you don't spend time unnecessarily back-tracing.

After adding the assertion for enclosingFunc, we run the program again to reach this new assertion point:

lt-testVirtualCFG: ${ROSE}sourcetree/src/frontend/SageIII/virtualCFG/memberFunctions.C:1473: \
    virtual std::vector<VirtualCFG::CFGEdge, std::allocator<VirtualCFG::CFGEdge> > \
    SgReturnStmt::cfgOutEdges(unsigned int): \

    Assertion `enclosingFunc != __null' failed.

Okay, the new assertion fails, so we know that the value assigned to enclosingFunc is NULL.

# enclosingFunc is definitely NULL (0x0)
(gdb) p enclosingFunc
$1 = (SgFunctionDefinition *) 0x0

# What is the current context?
(gdb) p this
$2 = (SgReturnStmt * const) 0xbfaf10

Okay, we're inside an SgReturnStmt object. Let's set a breakpoint where enclosingFunc is assigned:

Breakpoint 1, SgReturnStmt::cfgOutEdges (this=0xbfaf10, idx=1) at ${ROSE}/src/frontend/SageIII/virtualCFG/memberFunctions.C:1472
1472              SgFunctionDefinition* enclosingFunc = SageInterface::getEnclosingProcedure(this);

So this is the line we're examining:

SgFunctionDefinition* enclosingFunc = SageInterface::getEnclosingProcedure(this);

So the NULL value must be coming from SageInterface::getEnclosingProcedure(this).

After reviewing the code of getEnclosingProcedure, we discovered a flaw in its algorithm.

The function is supposed to return the enclosing node of the requested type, here SgFunctionDefinition. However, upon checking the function's state at the point of return, we saw that it had incorrectly picked an SgBasicBlock as the enclosing procedure of the SgReturnStmt.

(gdb) p parent->class_name()
$12 = {static npos = 18446744073709551615,
   _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7cd0e8 "SgBasicBlock"}}

Specifically, the last piece: 0x7cd0e8 "SgBasicBlock".

But this is wrong because we're looking for SgFunctionDefinition, not SgBasicBlock.

Upon further examination, we figured out that the function simply returned the first enclosing node it found, and not the first enclosing node that matched the user's criteria.

We added the necessary logic to make the function complete, tested it to verify its correctness, and then resolved the bug.
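The difference between the buggy search and the fixed one can be sketched on a toy tree. This is an illustration under stated assumptions (hypothetical Node classes, with dynamic_cast as the type test), not the actual SageInterface::getEnclosingProcedure implementation:

```cpp
#include <cassert>

// Hypothetical miniature AST: every node knows its parent, much like
// SgNode::get_parent(). These classes are illustrative, not ROSE's.
struct Node {
    Node* parent = nullptr;
    virtual ~Node() = default;  // polymorphic, so dynamic_cast works
};
struct FunctionDefinition : Node {};
struct BasicBlock : Node {};
struct ReturnStmt : Node {};

// Buggy shape: returns the first ancestor, whatever its type.
Node* firstAncestor(Node* n)
{
    return n ? n->parent : nullptr;
}

// Fixed shape: keep climbing until an ancestor of the requested type is found.
template <typename T>
T* enclosingNodeOfType(Node* n)
{
    for (Node* p = n ? n->parent : nullptr; p != nullptr; p = p->parent) {
        if (T* match = dynamic_cast<T*>(p))
            return match;
    }
    return nullptr;  // no ancestor of type T exists
}
```

For a return statement nested in a block inside a function definition, firstAncestor yields the block (the bug seen in gdb), while enclosingNodeOfType<FunctionDefinition> keeps climbing to the definition. Asserting on the result right at the call site, as recommended above, would expose a NULL result immediately.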

Most code development that is layered above the ROSE library starts out its life as a project in the projects directory. Some projects are eventually refactored into the ROSE library once they mature. This chapter describes how one adds a new project to ROSE.

Method 1: New simple ways to add

Robb Matzke added a feature to ROSE that makes it easier to add a new project to ROSE/projects:

  • Create a $ROSE/projects/whatever directory.
  • In that directory, create a "rose.config" file
  • In that file, add the line AC_CONFIG_FILES(projects/whatever/Makefile)

rose/config/support-projects.m4 will be updated by running ./build.

You still need to provide your own Makefile.am. One of the simplest examples is

Method 2: Required Files

A ROSE project encapsulates a complete program or set of related programs that use the ROSE library. Each project exists as a subdirectory of the ROSE "projects" directory and should include files "README", "config/support-rose.m4", "Makefile.am", and any necessary source files, scripts, tests, etc.

  • The "README" should provide an explanation about the project purpose, algorithm, design, implementation, etc.
  • The "support-rose.m4" file integrates the project into the ROSE build system in a manner that allows the project to be an optional component (it can be disabled, renamed, deleted, or withheld from distribution without changing any ROSE configuration files). Most older projects lack this file and are thus more tightly coupled with the build system.
  • The "Makefile.am" serves as the input to the GNU automake system that ROSE employs to generate Makefiles.
  • Each project should also include all necessary source files, documentation, and test cases.

Setting up support-rose.m4

The "config/support-rose.m4" file integrates the project into the ROSE configure and build system. At a minimum, it should contain a call to the autoconf AC_CONFIG_FILES macro with a list of the project's Makefiles (without the ".am" extension) and its doxygen configuration file (without the ".in" extension). It may also contain any other necessary autoconf checks that are not already performed by ROSE's main configure scripts, including code to enable/disable the project based on the availability of the project's prerequisites.

Here's an example:

dnl List of all makefiles and autoconf-generated                          -*- autoconf -*-
dnl files for this project
AC_CONFIG_FILES([projects/DemoProject/Makefile
                 projects/DemoProject/gui/Makefile
                 projects/DemoProject/doxygen/doxygen.conf
                ])

dnl Even if this project is present in ROSE's "projects" directory, we might not have the
dnl prerequisites to compile this project.  Enable the project's makefiles by using the
dnl ROSE_ENABLE_projectname automake conditional.  Many prerequisites have probably already
dnl been tested by ROSE's main configure script, so we don't need to list them here again
dnl (although it usually doesn't hurt).
AC_MSG_CHECKING([whether DemoProject prerequisites are satisfied])
if test "$ac_cv_header_gcrypt_h" = "yes"; then
        AC_MSG_RESULT([yes])
        rose_enable_demo_project=yes
else
        AC_MSG_RESULT([no])
        rose_enable_demo_project=
fi
AM_CONDITIONAL([ROSE_ENABLE_DEMO_PROJECT], [test "$rose_enable_demo_project" = yes])

Since all configuration for the project is encapsulated in the "support-rose.m4" file, renaming, disabling, or removing the project is trivial: a project can be renamed simply by renaming its directory, it can be disabled by renaming/removing "support-rose.m4", or it can be removed by removing its directory. The "build" and "configure" scripts should be rerun after any of these changes.

Since projects are self-encapsulated and optional parts of ROSE, they need not be distributed with ROSE. This enables end users to drop in their own private projects to an existing ROSE source tree without modifying any ROSE files, and it allows ROSE developers to work on projects that are not distributed publicly. Any project directory that is not part of ROSE's main Git repository will not be distributed (this includes not distributing Git submodules, although the submodule's placeholder empty directory will be distributed).

Setting up Makefile.am

Each project should have at least one Makefile.am, each of which is processed by GNU automake and autoconf to generate a Makefile. See the automake documentation for details about what these files should contain. Some important variables and targets are:

  • include $(top_srcdir)/config/Makefile.for.ROSE.includes.and.libs: This brings in the definitions from the higher level Makefiles and is required by all projects. It should be near the top of the Makefile.am.
  • SUBDIRS: This variable should contain the names of all the project's subdirectories that have Makefiles. It may be omitted if the project's only Makefile is in that project's top-level directory.
  • INCLUDES: This holds the flags that need to be added during compilation (flags like -I$(top_srcdir)/projects/RTC/include). Your flags should be placed before $(ROSE_INCLUDES) to ensure the correct files are found; $(ROSE_INCLUDES) brings in all the necessary headers from the src directory to your project.
  • lib_*: These variables/targets are necessary if you are creating a library from your project, which can be linked in with other projects or the src directory later. This is the recommended way of handling projects.
  • EXTRA_DIST: These are the files that are not listed as being needed to build the final object (like source and header files), but must still be in the ROSE tarball distribution. This could include README or configuration files, for example.
  • check-local: This is the target that will be called from the higher level Makefiles when make check is called.
  • clean-local: Provides you with a step to perform manual cleanup of your project, for instance, if you manually created some files (so Automake won't automatically clean them up).

A basic example

Many projects start as a translator, analyzer or optimizer, which takes input code and generates output.

A basic sample commit which adds a new project directory into ROSE: https://github.com/rose-compiler/rose/commit/edf68927596960d96bb773efa25af5e090168f4a

Please look through the diffs so you know which files need to be added or changed for a new project.

Essentially, a basic project should contain

  • a README file explaining what the project is about: algorithm, design, implementation, etc.
  • a translator that acts as the driver of your project
  • additional source files and headers as needed to contain the meat of your project
  • test input files
  • a Makefile.am to
    • compile and build your translator
    • contain a make check rule so your translator is invoked on your input files and generates the expected results

To connect your project into ROSE's build system, you also need to

  • Add one more subdir entry into projects/Makefile.am for your project directory
  • Add one line into config/support-rose.m4 for EACH new Makefile (generated from each Makefile.am) used by your project.

Installing project targets

Install your project's content to a separate directory within the user's specified --prefix location. The reason behind this is that we don't want to pollute the core ROSE installation space. By doing so, we can reduce the complexity and confusion of the ROSE installation tree, while eliminating cross-project file collisions. It also keeps the installation tree modular.

Example

This example uses a prefix for installation. It also maintains Semantic Versioning.

From projects/RosePoly:

  ## 1. Version your project properly (http://semver.org/)
  rosepoly_API_VERSION=0.1.0

  ## 2. Install to separate directory
  ##
  ##    Installation tree should resemble:
  ##
  ##    <--prefix>
  ##    |--bin      # ROSE/bin
  ##    |--include  # ROSE/include
  ##    |--lib      # ROSE/lib
  ##    |
  ##    |--<project>-<version>
  ##       |--bin      # <project>/bin
  ##       |--include  # <project>/include
  ##       |--lib      # <project>/lib
  ##
  exec_prefix=${prefix}/rosepoly-$(rosepoly_API_VERSION)

  ## Installation/include tree should resemble:
  ##   |--<project>-<version>
  ##      |--bin      # <project>/bin
  ##      |--include  # <project>/include
  ##         |--<project>
  ##      |--lib      # <project>/lib
  librosepoly_la_includedir = ${exec_prefix}/include/rosepoly

Generate Doxygen Documentation

0. Install Doxygen tool

Using MacPorts for Apple's Mac OS:

  $ port install doxygen

  # set path to MacPort's bin/
  # ...

Using one of the LLNL machines:

  $ export PATH=/nfs/apps/doxygen/latest/bin:$PATH


1. Create a Doxygen configuration file

  $ doxygen -g

Configuration file `Doxyfile' created.

Now edit the configuration file and enter

  doxygen Doxyfile

to generate the documentation for your project


2. Customize the configuration file (Doxyfile):

...

# If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in
# documentation are documented, even if no documentation was available.
# Private class members and static file members will be hidden unless
# the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES

EXTRACT_ALL            = YES

...

# If the value of the INPUT tag contains directories, you can use the
# FILE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp
# and *.h) to filter out the source-files in the directories. If left
# blank the following patterns are tested:
# *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh
# *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py
# *.f90 *.f *.for *.vhd *.vhdl

FILE_PATTERNS          = *.cpp *.hpp

# The RECURSIVE tag can be used to turn specify whether or not subdirectories
# should be searched for input files as well. Possible values are YES and NO.
# If left blank NO is used.

RECURSIVE              = YES

...


3. Generate the Doxygen documentation

  # Invoke from your top-level directory
  $ doxygen Doxyfile


4. View and verify the HTML documentation

  $ firefox html/index.html &

5. Add target to your Makefile.am to generate the documentation

.PHONY: docs
docs:
    doxygen Doxyfile # TODO: should be $(DOXYGEN)
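Doxygen picks up documentation from specially marked comments, such as /** ... */ blocks with @brief, @param and @return commands. The hypothetical header below shows that markup; saved with a .hpp extension, it would match the FILE_PATTERNS = *.cpp *.hpp setting from step 2. The function is only a placeholder used to carry the comments:

```cpp
#include <cassert>

/**
 * @file loop_utils.hpp
 * @brief Hypothetical header used to check the Doxygen settings.
 */

/**
 * @brief Counts the iterations of a simple counted loop.
 * @param lower first value of the loop variable (inclusive)
 * @param upper loop bound (exclusive)
 * @param step  positive increment
 * @return number of iterations, or 0 if lower >= upper
 */
inline long tripCount(long lower, long upper, long step)
{
    assert(step > 0);
    if (lower >= upper)
        return 0;
    return (upper - lower + step - 1) / step;  // ceiling division
}
```

After rerunning doxygen Doxyfile, the @brief, @param and @return text should appear on the function's page under html/.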

If you are trying to fix a bug (your own, or one assigned to you), here are the high-level steps:

Reproduce the bug

You can only fix a bug when you can reproduce it. This step may be more difficult than it sounds. In order to reproduce a bug, you have to

  • find a proper input file
  • find a proper translator: a translator shipped with ROSE is easy to find, but be patient and polite when you ask users for a translator they wrote.
  • find a similar/identical software and hardware environment: a bug may only appear on a specific platform when a specific software configuration is used

Possible results for this step:

  • You can reproduce the bug reliably. Bingo! Go to the next step.
  • You cannot reproduce the bug. Either the bug report is invalid or you have to keep trying.
  • You can reproduce the bug only once in a while (random errors). This is a difficult situation.

Find causes of the bug

Once you can reproduce the bug, you have to identify its root cause using a debugger like gdb.

Common steps involved

  • simplify the input code as much as possible: it can be very hard to debug a problem with a huge input. Always try to prepare the simplest possible code that still triggers the bug.
    • Often, you have to use a binary search approach to narrow down the input code: try with only half of the input at a time. Recursively cut the input file into two parts until no further cut is possible while the bug is still triggered.
  • forward tracking: a translator usually takes input and generates intermediate results before the final output is produced. Use a debugger to set breakpoints at each critical stage of the code and check whether the intermediate results are what you expect.
  • backward tracking: similar to the previous technique, but in reverse: start from where the problem shows up (for example, a failed assertion) and trace backwards to where it originates.
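The halving approach from the first bullet can be sketched as a small driver loop. It assumes the input is split into lines and that the caller provides stillFails, a hypothetical callback that reruns the translator on a candidate input. This is a simplified scheme that stops as soon as neither half alone reproduces the bug:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

// Repeatedly keep whichever half of the input still triggers the bug.
// In practice 'stillFails' would write the lines to a file, run the
// translator on it, and report whether the bug (crash, failed assertion,
// wrong output) shows up.
Lines shrinkByHalves(Lines input, const std::function<bool(const Lines&)>& stillFails)
{
    assert(stillFails(input));  // must start from an input that triggers the bug
    while (input.size() > 1) {
        const auto mid = static_cast<Lines::difference_type>(input.size() / 2);
        Lines firstHalf(input.begin(), input.begin() + mid);
        Lines secondHalf(input.begin() + mid, input.end());
        if (stillFails(firstHalf))
            input = firstHalf;
        else if (stillFails(secondHalf))
            input = secondHalf;
        else
            break;  // neither half alone reproduces the bug: stop cutting
    }
    return input;
}
```

When the loop stops early, the bug depends on code from both halves; at that point, removing smaller chunks by hand is usually the next step.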

Fix the bug

Any bug fix commit should contain

  • a regression test, so that the make check rules confirm the bug is actually fixed and that later code changes do not make the bug reappear.

Often a feature added into ROSE comes with a set of command line options. These options can enable and customize the feature.

For example, the OpenMP support in ROSE is disabled by default, and a special option is needed to enable it. Also, the support can be as little as parsing the OpenMP directives or as complex as translating them into multithreaded code.

This HOWTO quickly goes through the key steps to add options.

internal flags

Options need to be stored somewhere. There are several choices for the storage:

  • as a data member of SgProject, if the option is applicable to all files associated with a SgProject
  • as a data member of SgFile, if the option is applicable to a single source file, or
  • as a member variable in a namespace you define, if the option is for some transformation or analysis.

If the option can be as specific as per file, it is recommended to add a new data member to SgFile to save the option value.

For example, here is a command line option to turn on the UPC language support:

ROSE/src/ROSETTA/src/support.C // add a data member for SgFile

    // Liao (6/6/2008): Support for UPC model of C , 6/19/2008: add support for static threads compilation
    File.setDataPrototype         ( "bool", "UPC_only", "= false",
                                    NO_CONSTRUCTOR_PARAMETER, BUILD_ACCESS_FUNCTIONS, NO_TRAVERSAL, NO_DELETE);

ROSETTA processes this information to automatically generate a data member (p_UPC_only) and the corresponding access functions (get_UPC_only() and set_UPC_only()).
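Conceptually, the generated member and access functions resemble the hand-written class below. SgFileSketch is a hypothetical name used for illustration; the code ROSETTA actually generates for SgFile differs in detail:

```cpp
#include <cassert>

// Illustrative only: roughly what
//   File.setDataPrototype("bool", "UPC_only", "= false", ..., BUILD_ACCESS_FUNCTIONS, ...)
// asks ROSETTA to provide. Not the real generated SgFile code.
class SgFileSketch {
public:
    bool get_UPC_only() const { return p_UPC_only; }
    void set_UPC_only(bool UPC_only) { p_UPC_only = UPC_only; }

private:
    bool p_UPC_only = false;  // the "= false" initializer becomes the default value
};
```

The same p_/get_/set_ pattern is what makes it possible to guess internal member names (such as p_parent) when setting watchpoints in gdb.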

process the option

Command line options should be handled within src/frontend/SageIII/sage_support/cmdline.cpp.

File level options are handled by void SgFile::processRoseCommandLineOptions ( vector<string> & argv ).


Example code for processing the -rose:openmp option:

     set_openmp(false);
     ROSE_ASSERT (get_openmp() == false);
     // ...
     if ( CommandlineProcessing::isOption(argv,"-rose:","(OpenMP|openmp)",true) == true )
        {
          if ( SgProject::get_verbose() >= 1 )
               printf ("OpenMP option specified \n");
          set_openmp(true);
          // side effect of enabling OpenMP: define the _OPENMP macro as required
          argv.push_back("-D_OPENMP");
        }


ROSE command line options should be removed after being processed, to avoid confusing the backend compiler.

SgFile::stripRoseCommandLineOptions ( vector<string>& argv ) should have the code to strip off the option.
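The process-then-strip sequence can be illustrated without ROSE. The names below (FileOptions, takeOption, processOpenMPOption) are hypothetical, and the sketch only matches the two exact spellings of the option; it does not reproduce the behavior of CommandlineProcessing::isOption:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stands in for the SgFile data member behind get_openmp()/set_openmp().
struct FileOptions {
    bool openmp = false;
};

// Removes every exact occurrence of 'option' from argv; returns true if any was found.
bool takeOption(std::vector<std::string>& argv, const std::string& option)
{
    auto newEnd = std::remove(argv.begin(), argv.end(), option);
    const bool found = (newEnd != argv.end());
    argv.erase(newEnd, argv.end());
    assert(std::find(argv.begin(), argv.end(), option) == argv.end());
    return found;
}

// Process, then strip: the ROSE-only flag must not reach the backend compiler.
void processOpenMPOption(std::vector<std::string>& argv, FileOptions& opts)
{
    // Both calls always run (no short-circuit), so both spellings get stripped.
    const bool lower = takeOption(argv, "-rose:openmp");
    const bool upper = takeOption(argv, "-rose:OpenMP");
    if (lower || upper) {
        opts.openmp = true;
        // Same side effect as the ROSE code: define _OPENMP for the backend.
        argv.push_back("-D_OPENMP");
    }
}
```

Keeping detection and removal together makes it hard to forget the stripping step; ROSE splits them across two functions, so both must be updated when an option is added.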

use the option

In your code, you can use the automatically generated access functions to set/retrieve the stored option values.

For example:

  if (sourceFile->get_openmp())
     {
       // ... do something here ...
     }

document the option

Every option should be explained in the online help output.

Please add brief help text for your option in void SgFile::usage ( int status ) of ./src/frontend/SageIII/sage_support/cmdline.cpp.

Lessons Learned

Here we collect practices we adopted because of past lessons.

Do Not Format/Indent other people's code

Lesson:

  • A developer tried to understand a staff member's source code, but found that the code's indentation did not suit him. So he reformatted the source files and committed the changes. Later, the staff member found that his code had changed so much that he could no longer read it.
  • Even worse, people have difficulty merging commits that mix indentation changes with real changes.

Solution:

  • Please don't reformat code you do not own or will not maintain.

Physical locations matter


Lesson

  • We had a student who was assigned a desk in a deep corner of a big room, far away from the other interns. As a result, that student had fewer interactions with others and had to solve problems with less help.

Solution:

  • Locations MATTER! Sit close to the people you need to interact with often. Make your desk/office accessible to others. A physically isolated office/desk can have a very negative impact on your productivity.

Choose your development platform carefully


Lesson

  • Somehow new interns were assigned Mac OS X machines by default. But some of them were not familiar with Apple machines or even disliked Mac OS X's user interface, including the keyboard, window system, etc. (a love-hate thing for Apple products). So they felt stuck with an uncomfortable development platform. We had interns who could not type smoothly on a Mac keyboard even after one month. This is unnecessary.

Solution

  • Provide a choice up front: Linux or Mac OS X. Remind people that they are free to choose the platform they personally enjoy.

Use different git repositories for different tasks


Lesson:

  • A developer used different branches of the same git repository for different tasks: fixing bugs, adding a new feature, and documenting something. Later on, he found that he could not commit and push the work for one task because the changes for the other tasks were not ready.

Solution:

  • Use separate git repositories for different tasks, so that the status of one task won't interfere with the progress of other tasks.

Introduce software dependencies very carefully


Lesson

  • ROSE did not depend on the Boost C++ libraries in the beginning. But later on, some developers saw the benefits of Boost and advocated for it. Eventually, Boost became required software for using ROSE.
  • But Boost has its disadvantages: it is hard to install (just see how many Boost issues appear on our public mailing list), it lacks backward compatibility (code using an older version of Boost breaks with newer versions), and its huge header files with complex C++ templates slow down compilation or even break some compilers.
  • We still have internal debates about what to do with Boost. It is often a painful and emotional process.

Solution:

  • Introduce big software dependencies very carefully, or you can easily get stuck.
  • At the very least, ask people who advocate for a new software dependency to be responsible for maintaining it for 5 years and, at the same time, to provide an option to turn it off.

Create Exacting Tests Early and Often


Lesson:

  • A developer created tests that were too broad, mostly because they were written late in development. This led to tests passing when they should have failed: all tests passed even though the code was broken.

Solution:

  • Make sure that tests check results carefully. This is made much easier by making sure your functions have precisely ONE intention. E.g. if you need to transform data and operate on the transformed data, split the transformation and the operation into two functions (at least).
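A minimal sketch of the one-intention rule, using two hypothetical functions (squareAll and sum are illustrative names, not ROSE APIs): the transformation and the operation are separate, so each can be tested against an exact expected value.

```cpp
#include <cassert>
#include <vector>

// Transformation only: square each element.
std::vector<int> squareAll(const std::vector<int>& in) {
  std::vector<int> out;
  out.reserve(in.size());
  for (int v : in) out.push_back(v * v);
  return out;
}

// Operation only: sum the (already transformed) data.
int sum(const std::vector<int>& in) {
  int total = 0;
  for (int v : in) total += v;
  return total;
}
```

With a single combined sumOfSquares function, a test could only check the final number, so a bug in one step could be masked by the other; here squareAll({1, 2, 3}) must equal exactly {1, 4, 9} and sum({1, 4, 9}) must equal exactly 14.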

Keep Code Readable While Coding


Lesson:

  • A developer initially wrote code without comments, then came back to it later and had to go through the arduous task of understanding his own unreadable code.

Solution:

  • Keep variable and function names meaningful. Do full documentation as you go, do not leave it for later.

Think Before You Code


Lesson:

  • A developer wrote code without minding its structure. This led to bloated and unreadable code that had to be refactored several times.

Solution:

  • A programmer must code AND design, not just code. Well-structured code is much easier to read than badly structured code.

Remember The User


Lesson: A developer wrote the code without knowing what the users actually needed. This led to serious refactoring that could have been avoided, or at least made simpler, if he had concentrated on the user at all times.

Solution: Whenever possible ask users for their input. It will save you a lot of trouble in the long run.

The User is Paramount


Lesson: A developer wrote a rather obtuse component without understanding exactly what the user might want it for.

Solution: At the very least, check that the input and output are what the user wanted; this will save much time and aggravation.

References


http://www.projectsmart.co.uk/lessons-learned.html

Testing


ROSE uses Jenkins to implement a continuous integration software development process. It leverages a range of software packages to test its correctness, robustness, and performance.

make check rules


We leverage make check rules to do internal testing.

Check the exit status of pipeline commands


In bash scripting, commands can be connected with a pipeline (|) as follows:

  • command1 | command2 : the output of each command in the pipeline is connected to the input of the next command

Each command in a pipeline is executed in its own subshell, and the exit status of the whole pipeline is the exit status of the last command.

To catch the return code of any command in the pipeline, use the bash array ${PIPESTATUS[i]}, where i is the position of the command in the pipeline (starting at 0).

For example, the following pipeline alone would only return the exit status of the last command, fold, so we add a test to catch the first command's exit status:

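 ../autoPar -c $(srcdir)/$(@:.o=.c) | fold >$(@:.o=.out); test $${PIPESTATUS[0]} = 0

Note that inside a Makefile rule the shell's $ must be written as $$ (otherwise make expands ${PIPESTATUS[0]} itself, to an empty string), and the rule must run under bash (e.g., SHELL = /bin/bash), because PIPESTATUS is a bash feature.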

Benchmarks


The benchmarks used by ROSE's Jenkins jobs include:

  • SPEC CPU 2006 benchmark: a subset is supported for now
  • SPEC OMP benchmark: a subset is supported for now
  • NAS parallel benchmark: developed by NASA Ames Research Center. Both C (customized version) and OpenMP versions are used
  • Plum Hall C and C++ Validation Test Suites: a subset is supported for now
  • Jt++ - Java conformance testing: http://modena.us/

Modena Jt++ Test Suite


1. Clone the Modena test suite repository:

  $ git clone ssh://rose-dev@rose-git/modena

2. Autotools setup

  $ cd modena
  $ ./build.sh
  + libtoolize --force --copy --ltdl --automake
  + aclocal -I ./acmacros -I ./acmacros/ac-archive -I /usr/share/aclocal
  + autoconf
  + automake -a -c
  configure.ac:4: installing `./install-sh'
  configure.ac:4: installing `./missing'

3. Environment bootstrap

  $ source /nfs/apps/python/latest/setup.sh

4. Build and test!

  $ mkdir buildTree
  $ cd buildTree
  $ ../configure \
          --with-sqlalchemy=${HOME}/opt/python/sqlalchemy/0.7.5/lib64/python2.4/site-packages \
          --with-target-java-interpreter=java \
          --with-target-java-compiler=testTranslator \
          --with-target-java-compiler-flags="-ecj:1.6" \
          --with-host-java-compiler-flags="-source 1.6"

Jenkins


Using External Benchmarks


The way we set it up is as follows:

  • In the benchmark, we change the benchmark's build system to call the installed ROSE tool (identityTranslator or your RTED tool).
  • In the Jenkins test job,
    • Build and install the tested ROSE, and prepare the environment variables.
    • Go to the benchmark with the modified build system. Build and run the benchmark.

Basically, the test job should simulate how end users would use a ROSE tool, rather than tweaking ROSE for each different benchmark.

NAS Parallel Benchmarks


We have NPB as part of our regression tests.

To get the benchmark:

git clone rose-dev@rosecompiler1.llnl.gov:testsuite/npb-c-parallel.git

Within the benchmark, there is a make.def file that configures which compiler and options to use. The configuration should be correct, but some paths may need to be changed to point to your version of ROSE.

After that, type "make suite" to build the entire benchmark suite, or type "make mg class=A" to build just the benchmark in question.

Introduction


The ROSE project has been through multiple stages of source code management, starting from CVS, then Subversion, and now Git.

Git became the official source code version control software because of its unique features, including:

  • Distributed source code management. Developers can have a self-contained local repository to do their work anywhere they want, without the need for an active connection to a central repository.
  • Easy merge. Merging using Git is as simple as it can get.
  • Backup. Since every clone of our central repository can serve as a standalone repository, we no longer worry too much about losing the central repository.
  • Integrity. The hashing algorithm used by Git ensures that you get out exactly what you put into the repository.

Many other prominent software projects have also gone through a similar switch from Subversion to Git.

A more comprehensive list of Git users is given by https://git.wiki.kernel.org/index.php/GitProjects

In summary, Git IS the state-of-the-art for source code management.

Git 1.7.10 or later for GitHub


GitHub requires git 1.7.10 or later to avoid HTTPS cloning errors, as mentioned at https://help.github.com/articles/https-cloning-errors

Ubuntu 10.04's package repository provides git 1.7.0.4, so a later version of git must be built from source. You still need this older version of git to clone the latest git source code:

 sudo apt-get install git-core

Now you can clone the latest git:

 git clone https://github.com/git/git.git

Install the prerequisite packages needed to build git from source (assuming you have already installed the GNU tool chain with the GCC compiler, make, etc.), then build and install it:

 $ sudo apt-get install gettext zlib1g-dev asciidoc libcurl4-openssl-dev
 $ cd git  # enter the cloned git directory
 $ make configure ;# as yourself
 $ ./configure --prefix=/usr ;# as yourself
 $ make all doc ;# as yourself
 # make install install-doc install-html;# as root

Converting from a Subversion user


If you're coming from a centralized system, you may have to unlearn a few of the things you've become accustomed to.

  • For example, you generally don't check out a branch from a central repo, but rather clone a copy of the entire repository for your own local use.
  • Also, rather than using small, sequential integers to identify revisions, Git uses a cryptographic hash (SHA-1), although in general you only ever need to write the first few characters of the hash, just enough to uniquely identify a revision.
  • Finally, the biggest thing to get used to: ALL(!) work is done on local branches; there's no such thing in the DSCM world as working directly on a central branch, or checking your work directly into a central branch.

Having said that, distributed revision control is a superset of centralized revision control, and some projects, including ROSE, set up a centralized repository as a policy choice for sharing code between developers. When a developer works on ROSE, they generally clone from this central location, and when they've made changes, they generally push those changes back to the same central location.

Git Convention


Name and Email


Before you commit your local changes, you MUST ensure that you have correctly configured your author and email information (on all of your machines). Having a recognizable and consistent name and email will make it easier for us to evaluate the contributions that you've made to our project.

Guidelines:

  • Name: You MUST use the official name you commonly use for work/business, not a nickname or alias that co-workers, managers, or sponsors cannot easily recognize.
  • Email: You MUST use the email address you commonly use for work. It can be either your company email or a personal email (e.g., Gmail) if you DO commonly use that personal email for business purposes.

To check if your author and email are configured correctly:

  $ git config user.name
  <your name>

  $ git config user.email
  <your email>

Alternatively, you can just type the following to list all your current git configuration variables and values, including name and email information.

  $ git config -l


To set your name and email:

  $ git config --global user.name "<Your Name>"
  $ git config --global user.email "<your@email.com>"

Branch Naming Convention


All developer central repository branches should be named using the following pattern:

  • LOGIN-PURPOSE-OPTION
    • LOGIN is typically a login name or surname.
    • PURPOSE is a single-word description of the type of work performed on that branch, such as "bugfixes".
    • OPTION is information for ROSE robots with regards to your branch.
      • -test Changes to the branch are automatically tested
      • -rc Changes are tested and if they pass then they're merged into the "master" branch (like "trunk" in Subversion).
  • EXAMPLE:
    • The "matzke-bugfixes-rc" branch is "owned" by Robb Matzke (i.e., he's the one that generally makes changes to that branch), it probably contains only bug fixes or minor edits, and it's being automatically tested and merged into the master branch for eventual release to the public.

Commit messages


It is important to have concise and accurate commit messages to help code reviewers do their work.

Example commit message (excerpt):

(Binary Analysis) SMT solver statistics; documentation

* Replaced the SMT class-wide number-of-calls statistic with a
  more flexible and extensible design that also tracks the amount
  of I/O between ROSE and the SMT solver.  The new method tracks
  statistics on a per-solver basis as well as a class-wide basis, and
  allows the statistics to be reset at artibrary points by the user.

* More documentation for the new memory cell, memory state, and X86
  register state classes.

  • (Required) Summary: the first line of the commit message is a one-line summary (<50 words) of the commit. Start the summary with a topic, enclosed in parentheses, to indicate the project, feature, bugfix, etc. that this commit represents.
  • (Optional) Use a bulleted list (one asterisk, *, per item) to elaborate on the commit.

Also see http://spheredev.org/wiki/Git_for_the_lazy#Writing_good_commit_messages.

Push


Creating and deleting branches on the remote repository is accomplished with git-push.

This is its general form:

$ git push <remote> <source-ref>:<destination-ref>

  • When you clone a repository, the default <remote> is called "origin"
  • The <source-ref> is the branch in your local repository (cloned from <remote>) that you want to create or synchronize with the <remote>
  • The <destination-ref> is the branch that you want to create on the <remote>

Create remote branch


Example:

$ git remote -v
origin	https://github.com/rose-compiler/rose.git (fetch)
origin	https://github.com/rose-compiler/rose.git (push)

$ git branch
* master

# Method 1
$ git push origin master:refs/heads/master

# Method 2 - The currently checked out branch (see git-branch) is also called the HEAD
$ git push origin HEAD:refs/heads/master

# Method 3 - Git is pretty smart -- if you only specify one name, it will use it as both
# the source and destination.
$ git push origin master

Delete remote branch


Deleting a remote branch is simply a matter of specifying nothing as the <source-ref>. To delete the branch my-branch, issue this git-push command:

$ git push origin :refs/heads/my-branch

Rebase


We recommend rebasing your branch before pushing your work, so that your local commits are moved on top of the latest master branch instead of being interleaved with commits from master:

git fetch origin
git rebase origin/master

From http://gitready.com/intermediate/2009/01/31/intro-to-rebase.html

Rebase helps to cut up commits and slice them into any way that you want them served up, and placed exactly where you want them. You can actually rewrite history with this command, be it reordering commits, squashing them into bigger ones, or completely ignoring them if you so desire.

Why is this helpful?

  • One of the most common use cases is that you’ve been working on your own features/fixes/etc in separate branches. Instead of creating ugly merge commits for every change that is brought back into the master branch, you could create one big commit and let rebase handle attaching it.
  • Another frequent use of rebase is to pull in changes from a project and keep your own modifications in line. Usually by doing merges, you’ll end up with a history in which commits are interleaved between upstream and your own. Doing a rebase prevents this and keeps the order in a more sane state.

Modifying a submodule


ROSE uses a git submodule to link to the EDG files.

By default, the checked-out version of the submodule is on a "ghost" branch (a detached HEAD):

  [youraccount@yourmachine:~/rose/src/frontend/CxxFrontend/EDG]git branch
  * (no branch)
    master

You have to create a local branch before you can change the submodule. Create the new branch based on the ghost branch, which may or may not correspond to the remote master. In our setting, we push EDG changes to non-master branches, so the ghost branch most likely corresponds to a non-master branch.

$ git checkout -b fix-up

Then make your changes.

Always commit and push the submodule changes first:

  $ git commit -a -m "Updated the submodule from within the superproject."  # commit locally
  $ git push origin HEAD:refs/heads/your-account/edg4x-rc  # push to your own remote branch

Finally, update the superproject's link to the changed submodule:

  $ cd ..        # leave the submodule, back into the parent repository
  $ git add EDG  # NOTE: NEVER use "git add EDG/" !!! It adds all files under EDG/ to the superproject!
  $ git commit -m "Updated submodule EDG."
  $ git push


Lattices


Introduction


Lattices are mathematical structures. They can be used as a general way to express an order among objects, and this ordering can be exploited in data flow analysis.

Lattices can describe the transformations that basic blocks apply to data flow values, also known as flow functions.

Lattices can describe data flow frameworks when instantiated as algebraic structures consisting of a set of data flow values, a set of flow functions, and a merge operator.

Poset


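Partial ordering: $\le$

A partial ordering is a binary relation $\le$ over a set $P$ which is reflexive, antisymmetric, and transitive, i.e., for all $x, y, z$ in $P$:

  • Reflexive: $x \le x$
  • Antisymmetric: if $x \le y$ and $y \le x$, then $x = y$
  • Transitive: if $x \le y$ and $y \le z$, then $x \le z$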

Partial orders should not be confused with total orders. Every total order is a partial order, but not vice versa. In a total order, any two elements in the set P can be compared; this is not required in a partial order. Two elements that can be compared are said to be comparable.

A partially ordered set, also known as a poset, is a set with a partial order.
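A small brute-force check makes the difference between partial and total orders concrete. As an illustrative assumption, the set P is taken to be the 8 subsets of {a, b, c}, encoded as 3-bit masks, ordered by subset inclusion; the function names are hypothetical.

```cpp
#include <cassert>

// Subset inclusion on 3-bit masks: x <= y iff every bit set in x is also set in y.
bool leq(unsigned x, unsigned y) { return (x & y) == x; }

// Brute-force check of the three partial-order axioms over all 8 masks.
bool isPartialOrder() {
  for (unsigned x = 0; x < 8; ++x) {
    if (!leq(x, x)) return false;                                  // reflexive
    for (unsigned y = 0; y < 8; ++y) {
      if (leq(x, y) && leq(y, x) && x != y) return false;          // antisymmetric
      for (unsigned z = 0; z < 8; ++z)
        if (leq(x, y) && leq(y, z) && !leq(x, z)) return false;    // transitive
    }
  }
  return true;
}

// A total order would additionally require every pair to be comparable.
bool isTotalOrder() {
  for (unsigned x = 0; x < 8; ++x)
    for (unsigned y = 0; y < 8; ++y)
      if (!leq(x, y) && !leq(y, x)) return false;
  return true;
}
```

Subset inclusion passes the partial-order check but is not total: the masks 001 and 010 ({a} and {b}) are not comparable.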

A subset of a poset may have an infimum or a supremum; however, not all posets contain these.

Given a poset $P$ with set $X$ and order $\le$:

An infimum of a subset $S$ of $X$ is an element $a$ of $X$ such that

  • $a \le x$ for all $x$ in $S$, and
  • for all $y$ in $X$, if $y \le x$ for all $x$ in $S$, then $y \le a$.

The dual of this notion is the supremum, whose definition is obtained from that of the infimum by switching $\le$ with $\ge$.

If we simply pick an element of X that satisfies the first condition we have a lower bound. The second condition ensures that we have (if it exists) the unique greatest lower bound. Similarly for suprema.
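The two conditions can be checked literally by brute force. In this sketch the poset is an assumption chosen for illustration: X = {2, 3, 4, 6, 12} ordered by divisibility (a <= b iff a divides b). There, {4, 6} has infimum 2, while {2, 3} has no lower bound at all (1 is not in X), so its infimum does not exist.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Order used for illustration: divisibility, a <= b iff a divides b.
bool divides(int a, int b) { return b % a == 0; }

// True if y is a lower bound of S, i.e. y <= x for every x in S.
bool isLowerBound(int y, const std::vector<int>& S) {
  for (int x : S)
    if (!divides(y, x)) return false;
  return true;
}

// Infimum of S within X, checking both conditions of the definition literally.
std::optional<int> infimum(const std::vector<int>& X, const std::vector<int>& S) {
  for (int a : X) {
    if (!isLowerBound(a, S)) continue;        // condition 1: a is a lower bound
    bool greatest = true;                     // condition 2: every lower bound y <= a
    for (int y : X)
      if (isLowerBound(y, S) && !divides(y, a)) greatest = false;
    if (greatest) return a;
  }
  return std::nullopt;                        // no infimum exists in X
}
```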

A lattice is a particular kind of poset. In particular, a lattice $L$ is a poset $P(X, \le)$ in which, for any two elements $a$ and $b$ of the lattice, the set $\{a, b\}$ has a join (least upper bound) and a meet (greatest lower bound).

The join and meet operations MUST satisfy the following conditions:

  • 1) The join and meet must commute
  • 2) The join and meet are associative
  • 3) The join and meet are idempotent, that is, x join itself or x meet itself are both x.

A poset in which every pair of elements has a meet is a meet-semilattice, and a poset in which every pair of elements has a join is a join-semilattice; a lattice is both.

(Definitions obtained from wikipedia with minimal modification)

Lattice Definition


Definition of a Lattice $(L, \wedge, \vee)$:

  • L is a poset under $\le$ such that
    • Every pair of elements has a unique greatest lower bound (meet) and least upper bound (join)
    • Not every poset is a lattice: greatest lower bounds and least upper bounds need not exist in a poset.

Infinite vs. Finite lattices

  • Infinite: An infinite lattice need not contain a 0 (bottom) or 1 (top) element, even though every pair of elements has a greatest lower bound and a least upper bound. For example, in an unbounded set X, given any x in X we can find an x' that is greater than x (under the lattice ordering), so no top element exists; similarly for bottom and greatest lower bounds. (An infinite lattice can still be bounded: the constant propagation lattice is infinite but has both a top and a bottom.)
  • Finite/bounded lattice: the underlying set itself has a greatest lower bound and a least upper bound (every finite lattice is bounded). We call the greatest lower bound 0 and the least upper bound 1.
    • if $a \le x$ for all x in L, then a is the 0 element of L; recall that this element is unique
    • if $a \ge x$ for all x in L, then a is the 1 element of L


Meet $\wedge$ is a binary operation such that $a \wedge b$ is the greatest lower bound of the set $\{a, b\}$ (its existence is guaranteed by the definition of a lattice).

Similarly, join $\vee$ returns the least upper bound of the set, which is also guaranteed to exist by the definition of a lattice.

To recap, a lattice L is a triple $(X, \wedge, \vee)$ composed of a set, a meet function, and a join function.

Properties of meet and join:

  • We refer to $\wedge$ as M and to $\vee$ as J
  • Closure: if x and y belong to L, then there exist a unique z and a unique w in L such that $x \wedge y = z$ and $x \vee y = w$
  • Commutativity: for all x, y in L, $x \wedge y = y \wedge x$ and $x \vee y = y \vee x$
  • Associativity: $(x \wedge y) \wedge z = x \wedge (y \wedge z)$, and similarly for the $\vee$ operation
  • In a bounded lattice, there are two unique elements of L called bottom ($\bot$) and top ($\top$) such that, for all x, $x \wedge \bot = \bot$ and $x \vee \top = \top$
  • Many lattices, with some exceptions, notably the lattice corresponding to constant propagation, are also distributive: $(x \wedge y) \vee z = (x \vee z) \wedge (y \vee z)$

Lattices and partial order:

  $x \le y$ if and only if $x \wedge y = x$

A strictly ascending chain is a sequence of elements $x_1, x_2, \ldots, x_n$ of a set $X$ such that $x_1 < x_2 < \cdots < x_n$. The greatest (longest) strictly ascending chain is the one whose final index n is the largest among all strictly ascending chains.

The height of a lattice is defined as the length of the longest strictly ascending chain it contains.

If a data-flow analysis lattice has a finite height and a monotonic flow function, then we know that the associated data flow analysis algorithm will terminate.

  • Example: If the greatest strictly ascending chain of a lattice L is finite and it takes finitely many steps to reach the top, we can infer that the associated data flow algorithm terminates.

(wikipedia used for definitions)
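The termination argument can be sketched in a few lines. Assume (for illustration only) the 3-bit vector lattice and a hypothetical monotonic flow function f that sets bit 0 and copies every set bit one position to the left. Starting from the bottom element and applying f repeatedly produces a strictly ascending chain until a fixed point is reached, so the number of distinct values visited can never exceed the lattice height.

```cpp
#include <bitset>
#include <cassert>

using BV3 = std::bitset<3>;

// Hypothetical monotonic flow function on 3-bit vectors: set bit 0 and
// propagate every set bit one position to the left (bits shifted past
// index 2 are dropped).
BV3 f(const BV3& x) { return x | (x << 1) | BV3("001"); }

// Iterate from the bottom element (000) until f(x) == x; return the number
// of distinct values on the ascending chain and store the fixed point.
int iterateToFixedPoint(BV3& result) {
  BV3 x;  // bottom
  int chainLength = 1;
  while (true) {
    BV3 next = f(x);
    if (next == x) break;  // fixed point reached
    x = next;
    ++chainLength;
  }
  result = x;
  return chainLength;
}
```

Here the chain is 000, 001, 011, 111: four values, which matches the height of this lattice as defined above.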

Example: Bit vector Lattices

  • The elements of the set are bit vectors
  • The bottom is the 0 vector
  • The top is a 1 vector
  • Meet is a bitwise And
  • Join is a bitwise Or

BV^n denotes the lattice of bit vectors of length n.
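The bit vector lattice maps directly onto std::bitset. A minimal sketch, with hypothetical helper names: meet is bitwise AND, join is bitwise OR, bottom is all zeros, and top is all ones.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// BV^n: meet is bitwise AND, join is bitwise OR.
template <std::size_t N>
std::bitset<N> meet(const std::bitset<N>& a, const std::bitset<N>& b) { return a & b; }

template <std::size_t N>
std::bitset<N> join(const std::bitset<N>& a, const std::bitset<N>& b) { return a | b; }

template <std::size_t N>
std::bitset<N> bottom() { return std::bitset<N>(); }        // the 0 vector

template <std::size_t N>
std::bitset<N> top() { return std::bitset<N>().set(); }      // the all-ones vector
```

For example, in BV^3, meet(110, 011) is 010 and join(110, 011) is 111.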

Constructing complex lattices from multiple less complex lattices

  • Example: The product operation which combines (concatenates) lattices elementwise
    • The product of two lattices L1 and L2 with meet operators M1, M2, respectively: L1 x L2
    • The elements in the lattice: {<x1, x2> | x1 from L1, x2 from L2}
  • The meet operator: <x1, x2> M <y1, y2> = <x1 M y1, x2 M y2>
  • The join operation: <x1, x2> J <y1, y2> = <x1 J y1, x2 J y2>
  • Example:
    • BV^n is the product of n copies of the trivial bit vector lattice BV^1 with bottom 0 and top 1
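The product construction applies meet and join component-wise. This sketch assumes both components are BV^1 lattices (modeled as std::bitset<1>; the type aliases are illustrative), so BV^1 x BV^1 behaves exactly like BV^2.

```cpp
#include <bitset>
#include <cassert>
#include <utility>

// Elements of L1 x L2 are pairs <x1, x2>; here both L1 and L2 are BV^1.
using BV1 = std::bitset<1>;
using Pair = std::pair<BV1, BV1>;

// <x1, x2> M <y1, y2> = <x1 M y1, x2 M y2>
Pair meet(const Pair& x, const Pair& y) {
  return {x.first & y.first, x.second & y.second};
}

// <x1, x2> J <y1, y2> = <x1 J y1, x2 J y2>
Pair join(const Pair& x, const Pair& y) {
  return {x.first | y.first, x.second | y.second};
}
```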

Graphical representation of BV^3:

           111
         /  |  \
      110  101  011
       | \/   \/ |
       | /\   /\ |
      100  010  001
         \  |  /
           000


Here the meet and join operators induce a partial order on the lattice elements:

x is less than or equal to (<=) y if and only if x M y = x

For BV^3: 000 <= 010 <= 110 <= 111


The partial order on the lattice is:

  • Transitive: if x <= y and y <= z, then x <= z
  • Antisymmetric: if x <= y and y <= x, then x = y
  • Reflexive: for all x, x <= x

The height of the lattice is the length of its longest strictly ascending chain:

  • The maximal n such that there exists a strictly ascending chain x1, x2, ..., xn such that
  • Bottom = x1 < x2 < ... < xn = Top

For BV^3 lattice, height = 4
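The height can be confirmed by brute force. This sketch encodes BV^3 elements as 3-bit masks and counts elements along the longest strictly ascending chain, matching the definition above (n elements, Bottom = x1 < ... < xn = Top); the function names are hypothetical.

```cpp
#include <cassert>

// Strict order on BV^3 (3-bit masks): x < y iff x M y = x and x != y.
bool strictlyBelow(unsigned x, unsigned y) { return (x & y) == x && x != y; }

// Number of elements in the longest strictly ascending chain starting at x.
int longestChainFrom(unsigned x) {
  int best = 1;  // the chain consisting of x alone
  for (unsigned y = 0; y < 8; ++y)
    if (strictlyBelow(x, y)) {
      int len = 1 + longestChainFrom(y);
      if (len > best) best = len;
    }
  return best;
}

// Height: the maximal n over all strictly ascending chains in BV^3.
int heightBV3() {
  int best = 0;
  for (unsigned x = 0; x < 8; ++x)
    if (longestChainFrom(x) > best) best = longestChainFrom(x);
  return best;
}
```

A longest chain is, for example, 000 < 001 < 011 < 111, which has 4 elements.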

Monotonic Functions


A monotonic function is a function that preserves an ordering.

Examples


A function f from L to itself, f: L -> L, is monotonic if for all x, y from L, x<=y ==> f(x)<=f(y)

For example, f: BV^3 -> BV^3 defined by f(<x1 x2 x3>) = <x1 1 x3> is monotonic.
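Monotonicity over a finite lattice can be verified exhaustively. This sketch encodes BV^3 as 3-bit masks with x2 as the middle bit, checks f, and, as a hypothetical counterexample not taken from the text, shows that a function g that flips the middle bit is not monotonic.

```cpp
#include <cassert>

// Order on BV^3 (3-bit masks): x <= y iff x AND y == x.
bool leq(unsigned x, unsigned y) { return (x & y) == x; }

// f(<x1 x2 x3>) = <x1 1 x3>: force the middle bit to 1, keep the others.
unsigned f(unsigned x) { return x | 0b010u; }

// Hypothetical counterexample: flip the middle bit.
unsigned g(unsigned x) { return x ^ 0b010u; }

// Brute-force check of x <= y ==> h(x) <= h(y) over all pairs in BV^3.
bool isMonotonic(unsigned (*h)(unsigned)) {
  for (unsigned x = 0; x < 8; ++x)
    for (unsigned y = 0; y < 8; ++y)
      if (leq(x, y) && !leq(h(x), h(y))) return false;
  return true;
}
```

For g, 000 <= 010 but g(000) = 010 is not <= g(010) = 000, so the ordering is not preserved.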

Lattice Tuples


Simple analyses may require complex lattices:

  • Problem:
    • Reaching constants: the set of values V has 2^(v*c) elements, where v is the number of variables and c is the number of constants
  • Solution:
    • Construct a tuple of lattices where each lattice corresponds to a variable

Each per-variable lattice has the values V = constants U {Top, Bottom}


Integer value: ICP


This lattice is used in constant propagation. Elements: Top, Bottom, integers, booleans

  • n M Bottom = Bottom
  • n J Top = Top
  • n J n = n M n = n
  • Integers and booleans m,n, if m != n, then m M n = Bottom, m J n = Top
    • The lattice has three levels: the top element, all other elements, the bottom element
    • Join operation: results move toward the higher (Top) level
    • Meet operation: results move toward the lower (Bottom) level

Relevance to data flow analysis


A lattice provides a set of flow values to a particular data flow analysis.

Lattices are used to argue the existence of a solution obtainable through fixed-point iteration:

  • At each program point p, a lattice element represents an IN[p] or OUT[p] set (flow value)
  • Meet: merges flow values (e.g., by set union) where control flow branches merge
  • Top usually represents the best information (the initial flow value). Note that some people use top to represent the worst-case information instead!
  • The bottom value represents the worst-case information
  • If BOTTOM <= x <= y <= TOP, then x is a conservative approximation of y (e.g., x is a superset of y)

Example: liveness analysis


Flow values are bit vectors over all variables x_1, x_2, ..., x_n.

First step: design the lattice values

  • top value: empty set {}, initial value, knowing nothing
  • bottom value: all set {x_1, x_2, ..., x_n}: max possible value, knowing every variable is live

For n = 3 (three variables), a flow value is the set of live variables at a point:

S = {v1, v2, v3}

Value set: 2^S, with 2^3 = 8 elements: { {}, {v1}, {v2}, {v3}, {v1, v2}, {v1, v3}, {v2, v3}, {v1, v2, v3} }

Design the lattice:

  • top value, best case: none live, {} (Top)
  • bottom value, worst case: all live, {v1, v2, v3}

Design meet: set union (Or operation), which brings values down toward the bottom; context insensitive

  • Design the partial order: $x \le y$ if and only if $x \supseteq y$


In between, the partial order places inferior/conservative solutions lower on the lattice:

                Top = {}
             /     |     \
        {v1}      {v2}      {v3}
          | \    /    \    / |
          |   \/        \/   |
          |   /\        /\   |
          | /    \    /    \ |
     {v1,v2}    {v1,v3}    {v2,v3}
             \     |     /
          {v1,v2,v3} = Bottom


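Flow function F: maps the flow value at the exit of a basic block to the flow value at its entry (liveness is a backward analysis).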
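A sketch of the liveness lattice in action, using the standard textbook equations OUT[B] = union of IN[S] over successors S and IN[B] = use[B] U (OUT[B] - def[B]). The bit assignment (bit 0 = v1, bit 1 = v2, bit 2 = v3) and the example block v1 = v2 + 1 are assumptions for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

using LiveSet = std::bitset<3>;  // bit i set means variable v(i+1) is live

// Meet for liveness, as designed above: set union.
LiveSet meet(const LiveSet& a, const LiveSet& b) { return a | b; }

// Standard backward transfer function for a basic block: IN = use U (OUT - def).
LiveSet transfer(const LiveSet& out, const LiveSet& use, const LiveSet& def) {
  return use | (out & ~def);
}

// OUT of a block: meet over the IN sets of its successors; Top ({}) if none.
LiveSet outOf(const std::vector<LiveSet>& successorIns) {
  LiveSet result;  // Top: no variable live
  for (const LiveSet& in : successorIns) result = meet(result, in);
  return result;
}
```

For a block v1 = v2 + 1 (use = {v2}, def = {v1}) whose successors have IN sets {v1} and {v3}, OUT is {v1, v3} and IN is {v2} U ({v1, v3} - {v1}) = {v2, v3}.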

Reaching definitions


Values: 2^n, where n is the number of all definitions

top: empty set: knowing nothing, initial value

bottom: full set: all definitions are reaching definitions

Meet operation: set union: brings down the levels of values, from unknowing to knowing

C++ Programming


ROSE is written in C++. Some users have suggested describing the major C++ programming techniques used in ROSE, so that C++ beginners can have a more focused learning experience.


Design Patterns: ROSE uses some common design patterns

Good API Design


Google: "How to Design a Good API and Why it Matters" by Joshua Bloch

TODO: convert from Markdown

Characteristics of a Good API

  • Easy to learn
  • Easy to use, even without documentation
  • Hard to misuse
  • Easy to read and maintain code that uses it
  • Sufficiently powerful to satisfy requirements
  • Easy to extend
  • Appropriate to audience

The Process of API Design


[T]he repetition of a very short development cycle: first the developer writes a failing automated test case that defines a desired improvement or new function, then produces code to pass that test and finally refactors the new code to acceptable standards.

    • Doubles as examples/tutorials and unit tests
  • Maintain realistic expectations
    • You won't be able to please everyone... aim to displease everyone equally
    • Expect to evolve API; mistakes happen; real-world usage is necessary

General Principles

  > [A] measurement of actual performance [power / weight]
  • Don't give users a gun to shoot themselves with
    • Information hiding: minimize the accessibility of everything

Documentation Matters

  • Class: what an instance represents
  • Method: contract between method and calling client (preconditions, postconditions, and side-effects)
  • Parameter: indicate units, form, ownership

Pre- and Post- Conditions

  • The precondition statement indicates what must be true before the function is called.
  • The postcondition statement indicates what will be true when the function finishes its work.
  /// \post <return_value>.empty() == false
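A minimal sketch of documenting and checking a contract in C++, using Doxygen's \pre and \post commands in the style of the line above. The function splitWords is hypothetical; note that assert checks are compiled out when NDEBUG is defined, so they document and test the contract rather than enforce it in release builds.

```cpp
#include <cassert>
#include <string>
#include <vector>

/// Splits a non-empty string at every space (consecutive spaces yield empty words).
/// \pre  !text.empty()
/// \post <return_value>.empty() == false
std::vector<std::string> splitWords(const std::string& text) {
  assert(!text.empty() && "precondition violated: text must not be empty");
  std::vector<std::string> words;
  std::string current;
  for (char c : text) {
    if (c == ' ') {
      words.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  words.push_back(current);  // the last word always exists
  assert(!words.empty() && "postcondition violated");
  return words;
}
```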
  

API vs. Implementation


Implementation details should not impact the API. Don't let implementation details "leak" into the API.

Performance

  • Design for usability, refactor for performance
  • Do not warp the API to gain performance
  • Effects of API design decisions on performance are real and permanent:
    • Component.getSize() returns Dimension
    • Dimension is mutable
    • Each getSize call must allocate Dimension
    • Causes millions of needless object allocations

"Harmonize"

  • API must coexist peacefully with platform
    • Do what is customary (standard)
    • Avoid obsolete parameter and return types
    • Mimic patterns in core APIs and language
    • Take advantage of API-friendly features: generics, varargs, enums, default arguments
  • Don't make the client do anything the module could do
    • Reduce need for boilerplate code
  • Don't violate the [Principle of Least Astonishment](http://en.wikipedia.org/wiki/Principle_of_least_astonishment)
  > The design should match the user's experience, expectations, and mental models...aims to exploit users' pre-existing knowledge as a way to minimize the learning curve
  • Provide programmatic access to all data available in string form => no client string parsing necessary
  • Overload with care: avoid ambiguous overloadings

Names Matter

  • Largely self-explanatory (avoid cryptic abbreviations)
  • Be consistent (e.g. same word means same thing)
  • Strive for symmetry
  • Should read like [prose](http://en.wikipedia.org/wiki/Prose)
 > [T]he most typical form of language, applying ordinary grammatical structure and natural flow of speech rather than rhythmic structure (as in traditional poetry).
      if (car.speed() > 2 * SPEED_LIMIT)
        generateAlert("Watch out for cops!");
    

Input Parameters

  • interface types over classes: flexibility, performance
  • most specific possible type: moves error from runtime to compile time
  • use double (64 bits) rather than float (32 bits): precision loss is real, performance loss negligible
  • consistent ordering:
      #include <string.h>   // strcpy
      #include <strings.h>  // bcopy
      char *strcpy (char *dest, const char *src);
      void bcopy   (const void *src, void *dst, size_t n); // bad!
    
  • short parameter lists: 3 or fewer; more and users will have to refer to docs; identically typed params harmful
    • Two techniques for shortening: 1) break up method, 2) create helper class to hold parameters

Return Values

  • Avoid values that demand exceptional processing
 > For example, return a `zero-length array` or `empty collection`, not `null`
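In C++ the same advice means returning an empty container by value rather than a possibly-null pointer to one. The function namesWithPrefix below is a hypothetical illustration: when nothing matches, callers simply get an empty vector and can loop over it without a special-case check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the names that start with the given prefix. When nothing matches,
// the result is an empty vector, never a null pointer.
std::vector<std::string> namesWithPrefix(const std::vector<std::string>& names,
                                         const std::string& prefix) {
  std::vector<std::string> result;
  for (const std::string& n : names)
    if (n.compare(0, prefix.size(), prefix) == 0)  // n starts with prefix
      result.push_back(n);
  return result;
}
```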

Exceptions

  • don't force client to use exceptions for control flow
  • don't fail silently
  • favor unchecked exceptions
  • include failure-capture diagnostic information
  • Fail fast: report errors ASAP
    • Compile time: static typing, generics
    • Run time: error on first bad method invocation (should be failure-atomic)
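The following C++ sketch applies several of these points to a hypothetical `BoundedBuffer` class. C++ has no checked exceptions, so the "favor unchecked exceptions" advice, which comes from Java, is satisfied automatically. The standard exception types used here are declared in `<stdexcept>`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed-capacity buffer illustrating the guidelines above.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        // Fail fast: reject a bad argument when the object is built,
        // not later when it is first used.
        if (capacity == 0) {
            throw std::invalid_argument("BoundedBuffer: capacity must be positive");
        }
    }

    // Query method, so clients can test for "full" without relying on
    // exceptions for ordinary control flow.
    bool full() const { return items_.size() == capacity_; }

    std::size_t size() const { return items_.size(); }

    void push(int value) {
        if (full()) {
            // Don't fail silently; put diagnostic details in the message.
            throw std::length_error("BoundedBuffer::push: buffer is full (capacity " +
                                    std::to_string(capacity_) + ")");
        }
        // The check happens before any state changes, so a failed push
        // leaves the buffer exactly as it was (failure atomicity).
        items_.push_back(value);
    }

private:
    std::size_t capacity_;
    std::vector<int> items_;
};
```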

Who is using ROSE


We are aware of the following ROSE users (people who write their own ROSE-based tools). They are the reason ROSE exists. Feel free to add your name if you are using ROSE.

Universities

  • University of California, San Diego, CUDA code generator
  • University of Utah, compiler-based parameterized code transformation for autotuning
  • University of Oregon, performance tool TAU
  • University of Wyoming, OpenMP error checking
  • Tokyo Institute of Technology
  • RENCI (RENaissance Computing Institute)
  • Indian Institute of Technology Kanpur
  • University of Alabama at Birmingham, iProgress
  • Tohoku University, XML-based AST transformation Xevolver
  • University of Central Florida, Autonomous Transformation of Lock-Free Data Structures

DOE national laboratories

  • Argonne National Laboratory, performance modeling

Companies

  • Samsung: its research center at San Jose uses ROSE for multicore research and development.

TODO List


What is missing (so you can help if you want)

How to back up/mirror this wikibook?


In case this website goes down, how can a backup of this wikibook be downloaded?

How to set up a mirror wiki website containing the wikibook of ROSE?

Maintain the print version


New chapters may be added without being reflected in the one-page print version. The print version therefore needs periodic synchronization: include the new chapters and rearrange their order as needed.

Observations:

  • A print version is similar to a source file with included contents: each included chapter gets a first-level heading.
  • Because the print version page uses the first-level heading (=) to include all chapters, the included pages/chapters should NOT contain any first-level heading.

With this basic understanding of how it works, you can now edit the print version's wiki page, ROSE Compiler Framework/Print version.

More at: http://en.wikibooks.org/wiki/Help:Print_versions
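The observations above can be turned into a small maintenance aid. The C++ sketch below is hypothetical. It assumes the print version transcludes each chapter under a first-level heading, using the usual MediaWiki syntax `{{:Book/Chapter}}`. It flags chapters that contain a first-level heading of their own with a simple line-based check: a line starting with exactly one `=`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Heuristic check: true if the chapter's wikitext contains a line that starts
// with exactly one '=' (a first-level heading such as "= Title ="). Such a
// heading would clash with the "=" headings the print version adds.
bool hasFirstLevelHeading(const std::string& wikitext) {
    std::istringstream in(wikitext);
    std::string line;
    while (std::getline(in, line)) {
        // "==" starts a level-2 (or deeper) heading, which is allowed.
        if (line.size() >= 2 && line[0] == '=' && line[1] != '=') {
            return true;
        }
    }
    return false;
}

// Builds print-version wikitext: one first-level heading per chapter,
// followed by a transclusion of that chapter's page (assumed convention).
std::string buildPrintVersion(const std::string& book,
                              const std::vector<std::string>& chapters) {
    std::string out;
    for (const std::string& chapter : chapters) {
        out += "= " + chapter + " =\n";
        out += "{{:" + book + "/" + chapter + "}}\n\n";
    }
    return out;
}
```

For example, `buildPrintVersion("ROSE_Compiler_Framework", {"Installation"})` returns a `= Installation =` heading followed by `{{:ROSE_Compiler_Framework/Installation}}`. The chapter name is only an example.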

Maintain the better pdf file


The PDF version automatically generated from the print version page is rudimentary: it has no table of contents, pagination, etc.

So we used a manual process to generate a better PDF file. The process needs to be repeated occasionally to keep the PDF file up to date.

Here are the manual steps:

  • Use your web browser to open the print version and save it to your own computer as "web page complete"
  • Use the HTML-compatible word processor of your choice to open the HTML file, convert it to the word processor's format, and paginate the book.
    • In Microsoft Word, this can be done by
      • opening the saved HTML file
      • saving it as a Word file
      • adding a table of contents by selecting Insert > Field > Index and Tables > TOC, or Preferences > Table of contents for Word 2012 or later
      • adding page numbers to the footer
      • saving it as a PDF file with a name like ROSE_Compiler_Framework.pdf
      • uploading it to Wikibooks

To add a link to the PDF file on your wikibook page, insert

{{PDF version|pdf file name without .pdf|size kb, number pages|file description}}

For example

{{PDF version|ROSE_Compiler_Framework|840 kb, 48 pages|ROSE_Compiler_Framework}}

More background about PDF versions: http://en.wikibooks.org/wiki/Help:Print_versions
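If you regenerate the PDF often, the template line can be produced mechanically. This hypothetical C++ helper builds the `{{PDF version|...}}` call in the format shown above. It strips a trailing `.pdf` from the file name, because the template expects the name without the extension.

```cpp
#include <string>

// Hypothetical helper: formats the {{PDF version|name|size kb, N pages|description}}
// template call. A trailing ".pdf" on fileName is removed, since the template
// takes the file name without the extension.
std::string pdfVersionTemplate(std::string fileName, int sizeKb, int pages,
                               const std::string& description) {
    const std::string ext = ".pdf";
    if (fileName.size() >= ext.size() &&
        fileName.compare(fileName.size() - ext.size(), ext.size(), ext) == 0) {
        fileName.erase(fileName.size() - ext.size());
    }
    return "{{PDF version|" + fileName + "|" + std::to_string(sizeKb) + " kb, " +
           std::to_string(pages) + " pages|" + description + "}}";
}
```

Called as `pdfVersionTemplate("ROSE_Compiler_Framework.pdf", 840, 48, "ROSE_Compiler_Framework")`, it returns exactly the example line given earlier in this section.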

Documentation Alternatives


1. Google Docs: comments, different output formats, easy collaboration
2. AsciiDoc

Sandbox



Some common tricks for writing on Wikibooks/Wikipedia (both use the MediaWiki software).

How to create a new page


Usually you have to start a new page from an existing wiki page.

Go to the existing wiki page that should link to the new page you want to create, then:

  • click the edit tab of the existing page
  • at the place where you want the link to the new page, insert
     [[ROSE_Compiler_Framework/name of the page]]
  • If a page with the desired name already exists, this becomes a link to that page.
  • If not, the link is shown in red; click the red link to open the new page in editing mode and add content to it.

Please link the new page to the print version of this wikibook so it is visible in the printout.

How to do XYZ in wiki?


The best way is to go to en.wikipedia.org and find a page with the output you want. Then pretend to edit the page (by clicking edit) to see the source used to generate the output.

For example, suppose you want to know how C++ syntax highlighting is produced in a wikibook. Go to en.wikipedia.org and find the page for C++, which contains sample code snippets.

Then pretend to edit it to see the source: http://en.wikipedia.org/w/index.php?title=C%2B%2B&action=edit&section=6

You will see the source code generating the syntax highlighting:

<syntaxhighlight lang="cpp">
#include <iostream>

int main()
{
   std::cout << "Hello, world!\n";
}
</syntaxhighlight>

How to add comments that are only visible to editors, not to readers of a page?


Use HTML comments. For example, the following comment does not show up in the rendered page, but it is visible to editors, reminding them why things are done a certain way.

<!-- Please keep the pixel size to 400 so they are clean in the pdf version, Thanks!  -->
[[File:Rose-compiler-code-review-1.png|thumb|400px|Code review using github.llnl.gov]]

Syntax highlighting


Copied from http://en.wikipedia.org/w/index.php?title=C%2B%2B&action=edit&section=6

<syntaxhighlight lang="cpp">
#include <iostream>

int main()
{
   std::cout << "Hello, world!\n";
}
</syntaxhighlight>

This generates the following highlighted code:

#include <iostream>

int main()
{
   std::cout << "Hello, world!\n";
}

Math formula


You can pretend to edit this section to see how math formulas are written.

More resources are at