ROSE Compiler Framework/Continuous Integration

Motivation

Without automated continuous integration, we had frequent incidents like:

Developer A commits changes to the master branch of our central git repository. The commits contain bugs that break the build and take a long time to fix. The central master branch is then left in a broken state for weeks, so nobody can check anything out or in.
Developer A does a lot of wonderful work offline for months. Her work later turns out to be incompatible with another developer's work and has unresolvable merge conflicts.

Overview

The ROSE project uses a workflow that automates the central principles of continuous integration so that integrating work from different developers becomes a non-event. Because the integration process merges into ROSE only the changes that pass all tests, we encourage all developers to stay in sync with the latest version.

A high level overview of the development model used by ROSE developers.

Step 1: Taking advantage of git's distributed repositories, each developer first clones a private repository from our central git repository (or one of its mirrors, clones, or forks).
Step 2: A feature or bugfix is then developed in isolation within the private repository. The developer can create any number of private branches. Each branch should correspond to one feature the developer is working on and be relatively short-lived. The developer can commit changes to the private repository without maintaining an active connection to the shared repository.
Step 3: When the work is finished and locally tested (make, make check, and make distcheck -j#n), the developer pulls the latest commits from the central repository's master branch.
Step 4: The developer then pushes all accumulated commits from the private repository to his or her branch within the shared repository. We create a dedicated branch within the central repository for each developer and use access control so that only the authorized developer can push commits to that branch.
Step 5-6 (automated): Commits from a developer's private repository are not merged immediately into the master branch of the shared repository.

In fact, access control prevents any developer from pushing commits directly to the master branch of the shared repository. A continuous integration server called Jenkins actively monitors each developer's branch within the central repository and initiates comprehensive commit tests on the branch once new commits are detected. Finally, Jenkins merges the new commits into the master branch of the central repository if all tests pass. If any test fails, Jenkins reports the error, and the responsible developer should fix it in the private repository and push improved commits again.
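Steps 1 through 4 can be sketched as a small shell function. This is a minimal sketch, not the project's official script: the names `run` and `rose_dev_cycle`, the `DRY_RUN` switch, the local directory name `rose`, and the `<user>-rc` branch naming are assumptions for illustration (the `-rc` suffix follows the release candidate branches mentioned on this page).

```shell
#!/bin/bash
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# rose_dev_cycle CENTRAL_URL USER FEATURE (all three names are illustrative)
rose_dev_cycle() {
  local central="$1" user="$2" feature="$3"
  run git clone "$central" rose  || return 1   # Step 1: private clone
  run cd rose                    || return 1
  run git checkout -b "$feature" || return 1   # Step 2: short-lived feature branch
  # ... edit and "git commit" locally, then test in your configured build tree:
  #     make, make check, make distcheck -j8 (8 stands in for #n)
  run git pull origin master     || return 1   # Step 3: sync with central master
  run git push origin "HEAD:refs/heads/${user}-rc"   # Step 4: push to your own branch
}
```

Calling `DRY_RUN=1 rose_dev_cycle /nfs/casc/overture/ROSE/git/ROSE.git alice my-feature` prints the five commands without touching any repository, which is a safe way to review the sequence before running it for real.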

As a result, the master branch of the central git repository is mostly stable and is a good candidate for external release. On top of the master branch, Jenkins runs a more comprehensive set of release tests. If all the release tests pass, an external release based on the master branch is made available.

Tests on Jenkins

We use Jenkins ( http://hudson-rose-30:8080/ ) to test commits pushed to developers' release candidate branches in the central git repository.

The tests are organized into three categories:

Integration: tests that check whether the new commits pass various "make check" rules, compatibility tests, portability tests, configuration tests, and so on. If all tests pass, the commits are merged (integrated) into the master branch of the central repository.
Release: an additional set of tests, using external benchmarks, run against the updated master branch of the central repository. If all tests pass, the head of master is released as a stable snapshot in the public file package releases (generated by "make dist").
Others: currently for informational purposes only; not used in our production workflow.

So each push (one or more commits to a -rc branch) goes through two stages: the integration test stage and the release test stage.

It is each developer's responsibility to make sure their commits pass BOTH stages by fixing any bugs the tests uncover.
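The two-stage gating can be pictured as a loop that runs each stage in order and stops at the first failure. This is a minimal sketch with hypothetical names (`run_pipeline`, `integration_tests`, `release_tests`); the real stages are Jenkins jobs configured on the server and are not shown on this page.

```shell
#!/bin/bash
# run_pipeline STAGE...: run each stage command in order; stop at the first failure.
# Each argument is a command or function name standing in for a whole test stage.
run_pipeline() {
  local stage
  for stage in "$@"; do
    echo "running stage: $stage"
    if ! "$stage"; then
      echo "stage failed: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Hypothetical stand-ins for the two stages described above.
integration_tests() { true; }   # e.g. "make check" rules, portability tests
release_tests()     { true; }   # e.g. external benchmarks
```

For example, `run_pipeline integration_tests release_tests` succeeds only if both stand-ins succeed; replacing either with a failing command stops the run before any later stage starts.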

Installed Software Packages

The following software packages are installed for and used by Jenkins:

Yices: /export/tmp.hudson-rose/opt/yices/1.0.34

Development Jenkins

Several configurations

GCC_VERSION=4.4.7
BOOST_VERSION=1_47_0
source /nfs/casc/overture/ROSE/opt/rhel6/x86_64/rose_environment.sh
__rose__JAVA_VERSION_HOME=/nfs/casc/overture/ROSE/opt/rhel6/x86_64/java/jdk/1.7.0_51

GCC_VERSION=4.8.1
BOOST_VERSION=1_50_0
source /nfs/casc/overture/ROSE/opt/rhel6/x86_64/rose_environment.sh
__rose__JAVA_VERSION_HOME=/nfs/casc/overture/ROSE/opt/rhel6/x86_64/java/jdk/1.7.0_51

Check Testing Results

It is possible to manually track how your commits are doing in the Jenkins test pipeline (http://hudson-rose-30:8080/), but this can be tedious and overwhelming.

So we provide a dashboard ( http://sealavender:4000/ ) that summarizes the commits to your release candidate (-rc) branch and the pass/fail status of each integration test.

Note: It is possible that all of your testing jobs (eventually) pass but the actual integration is not performed. This typically occurs when one of your jobs has a system failure, for instance, and has to be restarted manually. If you see that all of your jobs have passed but your work has not been integrated, please let the Jenkins administrator know.
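One way to confirm from the command line whether a commit has actually been integrated is to ask git whether it is an ancestor of the central master branch. This is a minimal sketch, assuming the central repository is configured as the remote `origin` in your clone; the function name `is_integrated` is hypothetical.

```shell
#!/bin/bash
# is_integrated COMMIT [REMOTE]: succeed if COMMIT is reachable from REMOTE/master.
# Returns 2 if the remote cannot be fetched.
is_integrated() {
  local commit="$1" remote="${2:-origin}"
  git fetch --quiet "$remote" || return 2   # refresh the remote-tracking branches
  git merge-base --is-ancestor "$commit" "$remote/master"
}
```

For example, `is_integrated HEAD && echo "integrated" || echo "not on master yet"` can be run inside your private repository after the dashboard shows all jobs as passed.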

Frequently Failed Jobs

See details at ROSE Compiler Framework/Jenkins Failures

Connection to Code Review

Connection between Github Enterprise and Jenkins

In practice, most LLNL developers are now asked to push to Github Enterprise for code review first, instead of pushing directly to our central git repository. The synchronization between the Github Enterprise code review repositories and our central git repository is automated.

Auto Pull

Auto pull: we have another Jenkins instance (https://hudson-rose-30:8443/jenkins/) that serves as the bridge between Github Enterprise and our main production Jenkins.

For each private repository on Github Enterprise, a Jenkins job monitors the master branch for approved pull (merge) requests. If there are new approved commits, the job transfers them to that developer's -reviewed-rc branch in the central repository.

Configuration of the auto pull job:

Source code management
- git: git@github.llnl.gov:account_name/rose.git
- Branches to build: github/master
Build Trigger: Poll SCM, schedule "* * * * *"
Execute shell

##
## Add /nfs as remote
##
## `|| true`: don't error if remote exists
##
git remote add nfs /nfs/casc/overture/ROSE/git/ROSE.git || true
git fetch nfs

##
## Push to /nfs *-rc
##
if [ -n "$(git log --oneline nfs/master..github/master)" ]; then
  git push --force nfs "$GIT_BRANCH":refs/heads/oun-reviewed-rc
fi
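The guard in the script above pushes only when github/master contains commits that nfs/master lacks. The same check can be written with a commit count instead of testing for non-empty log text. This is a minimal sketch; the function name `has_new_commits` is hypothetical, and `git rev-list --count` is swapped in for the `git log --oneline` test used by the job.

```shell
#!/bin/bash
# has_new_commits BASE TIP: succeed if TIP has commits not reachable from BASE.
# Returns 2 if git itself fails (for example, an unknown ref).
has_new_commits() {
  local base="$1" tip="$2" count
  count="$(git rev-list --count "$base..$tip")" || return 2
  [ "$count" -gt 0 ]
}
```

With this helper the guard would read `if has_new_commits nfs/master github/master; then ... fi`, with the push command unchanged.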

Auto Push

Auto push: a Jenkins job propagates the latest contents of the central master branch to all private repositories on github.llnl.gov.

http://hudson-rose-30:8080/job/Commit-sync-github

The job configuration:

Source code management:
- Git: /nfs/casc/overture/ROSE/git/ROSE.git
- Branches to build: */master
Build Trigger: Build after other projects are built: Commit
Execute Shell

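# Space-separated list of Github Enterprise account names.
# (A backslash-newline inside double quotes would join the names into one word.)
USERS="user1 user2"

for user in $USERS; do
  tmpfile="$(mktemp)"
  # Push central master to the user's repository; capture stderr and never abort the loop.
  git push git@github.llnl.gov:"$user"/rose.git origin/master:refs/heads/master 2>"$tmpfile" || true
  cat "$tmpfile"
  if grep -q "non-fast.*forward" "$tmpfile"; then
    echo "Sending error email to [${user}@llnl.gov] because their github/master is non-fast-forwardable"
    # email details are omitted here.
  fi
  rm -f "$tmpfile"
done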
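The non-fast-forward check in the loop can be isolated into a small function, which makes it easy to try against sample output before wiring it into a job. This is a minimal sketch; the function name `is_non_fast_forward` is hypothetical, and the sample line mirrors the rejection message that git push writes to stderr.

```shell
#!/bin/bash
# is_non_fast_forward FILE: succeed if FILE (captured stderr of "git push")
# reports a non-fast-forward rejection.
is_non_fast_forward() {
  grep -q "non-fast.*forward" "$1"
}

# Try it against a sample rejection message.
sample="$(mktemp)"
printf '%s\n' ' ! [rejected]        master -> master (non-fast-forward)' > "$sample"
if is_non_fast_forward "$sample"; then
  echo "would notify the repository owner"
fi
rm -f "$sample"
```

A rejection of this kind usually means the private repository's master branch has commits that the central master branch does not, so the owner has to resolve it by hand.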

Reproduce Jenkins Job failures

Several key elements are needed:

the Jenkins script repository
the right version of ROSE
the hardware machine
the environment

Assume one failed job is https://hudson-rose-44.llnl.gov:9443/jenkins/job/development-compile-with-autotools-default-EDG-RHEL6/892/gcc=4.4.7,label=RHEL6,rhel=6/parameters/

Steps:

First, find the node used for the failed Jenkins job from its log.
Identify the build parameters:
- clone_url: rose-dev@rosecompiler1.llnl.gov:rose/scratch/rose
- commit: 83abd459eee1b575b4e7fab04a9f1dfc4955f02a
From the Build section of the job configuration, find the script used: https://hudson-rose-44.llnl.gov:9443/jenkins/job/development-compile-with-autotools-default-EDG-RHEL6/configure

#!/bin/bash -ex

export GCC_VERSION="$gcc"

rm -rf ./jenkins-build-scripts/ || exit 1
git clone rose-dev@rosecompiler1.llnl.gov:jenkins/dev/jenkins-build-scripts.git || exit 1
source ./jenkins-build-scripts/config/env-Linux.sh $gcc $rhel || exit 1
./jenkins-build-scripts/development/development-compile-with-autotools-default-EDG-RHEL6.sh $gcc || exit 1

The same configuration page has a configuration matrix with all user-defined parameters and values, including gcc versions, OS versions, and others.

Now manually check out the commit and run the script, passing the right gcc and rhel versions:

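# Fetch the build scripts and set up the environment (sourced, as in the Jenkins job)
git clone rose-dev@rosecompiler1.llnl.gov:jenkins/dev/jenkins-build-scripts.git
export GCC_VERSION="4.4.7"
source ./jenkins-build-scripts/config/env-Linux.sh "4.4.7" 6
# Check out the exact commit that failed
git clone rose-dev@rosecompiler1.llnl.gov:rose/scratch/rose sourcetree
cd sourcetree/
git checkout -b autopar 83abd459eee1b575b4e7fab04a9f1dfc4955f02a
git submodule init
git submodule update
# Run the job's build script with the same gcc version
../jenkins-build-scripts/development/development-compile-with-autotools-default-EDG-RHEL6.sh 4.4.7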
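The matrix part of a job URL (for example gcc=4.4.7,label=RHEL6,rhel=6 in the failed job above) already names the values to pass to the scripts. This is a minimal sketch of extracting one value from such a string; the function name `matrix_value` is hypothetical.

```shell
#!/bin/bash
# matrix_value MATRIX KEY: print the value of KEY in a comma-separated key=value list.
# Returns 1 if KEY is not present.
matrix_value() {
  local matrix="$1" key="$2" pair
  local IFS=','
  for pair in $matrix; do
    case "$pair" in
      "$key"=*) printf '%s\n' "${pair#*=}"; return 0 ;;
    esac
  done
  return 1
}

matrix="gcc=4.4.7,label=RHEL6,rhel=6"
echo "gcc: $(matrix_value "$matrix" gcc)"     # prints "gcc: 4.4.7"
echo "rhel: $(matrix_value "$matrix" rhel)"   # prints "rhel: 6"
```

The two values printed here are the ones passed to env-Linux.sh and to the build script in the commands above.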

TODO

High priority

Add a pre-screening job before manual code review kicks in. The pre-screening job can make sure the code to be reviewed compiles with minimal warning messages and comes with the required make check rules to run tests.
Enable email notification for the final results of each test.
Incrementally add more compilation tests using external benchmarks as integration tests.
- Initial two jobs: SPEC CPU benchmark + NPB Fortran benchmarks
Better integration with Github Enterprise
- Avoid the Auto Push failure caused by pending commits on a private repository's master branch.
- Look into how others are doing Github + Jenkins integration
  - http://www.foraker.com/hudson-github-hooks/
  - https://wiki.jenkins-ci.org/display/JENKINS/Github+Plugin

Third Party software installed for testing in Jenkins.

Yices (http://yices.csl.sri.com/)
- Download Yices 1; the latest version is preferred.
- Untar the Yices tarball. The resulting directory, with a name like yices-1.0.34, is YICES_INSTALL.
- Pass --with-yices=YICES_INSTALL as an option to ROSE's configure.
- Add YICES_INSTALL/lib to LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac, in the same way Boost's lib directory is added to LD_LIBRARY_PATH.
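The last two steps can be sketched as follows. This is a minimal sketch, assuming Yices was unpacked under the path listed earlier on this page; the function name `add_yices_lib_path` is hypothetical, and the configure line is shown only as a comment because its location and other options depend on your build.

```shell
#!/bin/bash
# add_yices_lib_path YICES_INSTALL: prepend YICES_INSTALL/lib to the dynamic
# library search path (DYLD_LIBRARY_PATH on macOS, LD_LIBRARY_PATH elsewhere).
add_yices_lib_path() {
  local libdir="$1/lib"
  if [ "$(uname -s)" = "Darwin" ]; then
    export DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
  else
    export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  fi
}

YICES_INSTALL=/export/tmp.hudson-rose/opt/yices/1.0.34   # example location from this page
add_yices_lib_path "$YICES_INSTALL"
# Then, in the ROSE build directory (path and other options are placeholders):
#   /path/to/ROSE/configure --with-yices="$YICES_INSTALL"
```

The `${VAR:+:$VAR}` expansion appends the previous value only when one exists, which avoids leaving a stray trailing colon in the search path.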

References

Files used to generate the figure: feel free to add new versions as new slides: link