# Introduction to Parallel ProgrammingEdit

## Overview of OpenMP and MPIEdit

To solve large computational problems quickly, it is necessary to take advantage of multiple cores on a CPU (central processing units) and multiple CPUs. Most programs written up until now are sequential and compilers will not typically automatically generate parallel executables, so programmers need to modify the original serial computer code to take advantage of extra processing power. Two standards which specify what libraries that allow for parallel programming should do are OpenMP and MPI (the message passing interface). In this section, we cover the minimal amount of information required to understand, run and modify the programs in this tutorial. More detailed tutorials can be found at https://computing.llnl.gov/tutorials/ and at http://www.citutor.org.

OpenMP is used for parallel programming on shared memory architectures – each compute process has a global view of memory. It allows one to incrementally parallelize an existing Fortran, C or C++ code by adding directives to the original code. It is therefore easy to use. However some care is required in getting good performance when using OpenMP. It is easy to add directives to a serial code, but thought is required in creating a program which will show improved performance and give correct results when made to run in parallel. For the numerical solution of multidimensional partial differential equations on regular grids, it is easy to perform efficient and effective loop based parallelism, so a complete understanding of all the features of OpenMP is not required. OpenMP typically allows one to use 10’s of computational cores, in particular allowing one to take advantage of multicore laptops, desktops and workstations.

MPI is used for parallel programming on distributed-memory architectures – when separate compute processes have access to their own local memory and processes must explicitly receive data held in memory belonging to other processes which have sent the data. MPI is a library which allows one to parallelize Fortran, C and C++ programs by adding function calls which explicitly move data from one process to another. Careful thought is required in converting a serial program to a parallel MPI program because the data needs to be decomposed onto different processes, so it is usually difficult to incrementally parallelize a program that uses MPI. The best way to parallelize a program which will use MPI is problem dependent. When solving large problems, one typically does not have enough memory on each process to simply replicate all the data. Thus one wants to split up the data (known as domain decomposition) in such a way as to minimize the amount of message passing that is required to perform a computation correctly. Programming this can be rather complicated and time consuming. Fortunately, by using the 2DECOMP&FFT library[1] which is written on top of MPI, we can avoid having to program many of the data passing operations when writing Fourier spectral codes and still benefit from being able to solve partial differential equations on up to O($10^5$) processor cores.

## OpenMPEdit

### OpenMP ExercisesEdit

1. What is OpenMP?

2. Download a copy of the latest OpenMP specifications from www.openmp.org. What version number is the latest specification?

3. Explain what each of the following OpenMP directives does:

1. !$OMP PARALLEL 2. !$OMP END PARALLEL

3. !$OMP PARALLEL DO 4. !$OMP END PARALLEL DO

5. !$OMP BARRIER 6. !$OMP MASTER

7. !$OMP END MASTER 4. Try to understand and then run the Hello World program in listing A on 1, 2, 6 and 12 threads. Put the output of each run in your solutions, the output will be in a file of the form helloworld.o********** where the last entries above are digits corresponding to the number of the run. An example makefile to compile this on Flux is in listing B . An example submission script is in listing C . To change the number of OpenMP processes that the program will run on from say 2 to 6, change ppn=2 to ppn=6 and also change the value of the OMP_NUM_THREADS variable from {OMP_NUM_THREADS=2} to {OMP_NUM_THREADS=6} On Flux, there is a maximum of 12 cores per node, so the largest useful number of threads for most applications is 12. ( A) A Fortran program taken from http://en.wikipedia.org/wiki/OpenMP, which demonstrates parallelism using OpenMP Code download !-------------------------------------------------------------------- ! ! ! PURPOSE ! ! This program uses OpenMP to print hello world from all available ! threads ! ! .. Parameters .. ! ! .. Scalars .. ! id = thread id ! nthreads = total number of threads ! ! .. Arrays .. ! ! .. Vectors .. ! ! REFERENCES ! http:// en.wikipedia.org/wiki/OpenMP ! ! ACKNOWLEDGEMENTS ! The program below was modified from one available at the internet ! address in the references. This internet address was last checked ! on 30 December 2011 ! ! ACCURACY ! ! ERROR INDICATORS AND WARNINGS ! ! FURTHER COMMENTS ! !-------------------------------------------------------------------- ! External routines required ! ! External libraries required ! OpenMP library PROGRAM hello90 USE omp_lib IMPLICIT NONE INTEGER:: id, nthreads !$OMP PARALLEL PRIVATE(id)
PRINT *, 'Hello World from thread', id
!$OMP BARRIER IF ( id == 0 ) THEN PRINT*, 'There are', nthreads, 'threads' END IF !$OMP END PARALLEL
END PROGRAM


( B)

An example makefile for compiling the helloworld program in listing A Code download
#define the complier
COMPILER = ifort
# compilation settings, optimization, precision, parallelization
FLAGS = -O0 -openmp

# libraries
LIBS =
# source list for main program
SOURCES = helloworld.f90

test: $(SOURCES)${COMPILER} -o helloworld $(FLAGS)$(SOURCES)

clean:
rm *.o

clobber:
rm helloworld


( C)

#!/bin/bash
#PBS -N helloworld
#PBS -l nodes=1:ppn=2,walltime=00:02:00
#PBS -q flux
#PBS -l qos=math471f11_flux
#PBS -A math471f11_flux
#PBS -M your_uniqname@umich.edu
#PBS -m abe
#PBS -V
#
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID} cp${HOME}/ParallelMethods/helloworldOMP/helloworld /tmp/${PBS_JOBID}/helloworld cd /tmp/${PBS_JOBID}

./helloworld

cd ${HOME}/ParallelMethods/helloworldOMP /bin/rm -rf /tmp/${PBS_JOBID}

5. Add OpenMP directives to the loops in the 2-D heat equation solver. Run the resulting program on 1,3,6 and 12 threads and record the time it takes to the program to finish. Make a plot of the final iterate.

## MPIEdit

A copy of the current MPI standard can be found at http://www.mpi-forum.org/. It allows for parallelization of Fortran, C and C++ programs. There are newer parallel programming languages such as Co-Array Fortran (CAF) and Unified Parallel C (UPC) which allow the programmer to view memory as a single addressable space even on a distributed-memory machine. However, computer hardware limitations imply that most of the programming concepts used when writing MPI programs will be required to write programs in CAF and UPC. Compiler technology for these languages is also not as well developed as compiler technology for older languages such as Fortran and C, so at the present time, Fortran and C dominate high performance computing. An introduction to the essential concepts required for writing and using MPI programs can be found at http://www.shodor.org/refdesk/Resources/Tutorials/. More information on MPI can be found in Gropp, Lusk and Skjellum[2], Gropp, Lusk and Thakur[3] and at https://computing.llnl.gov/tutorials/mpi/. There are many resources available online, however once the basic concepts have been mastered, what is most useful is an index of MPI commands, usually a search engine will give you sources of listings, however we have found the following sites useful:

### MPI ExercisesEdit

1. What does MPI stand for?

2. Please read the tutorials at http://www.shodor.org/refdesk/Resources/Tutorials/BasicMPI/ and at https://computing.llnl.gov/tutorials/mpi/, then explain what the following commands do:

• USE mpi or INCLUDE 'mpif.h'

• MPI_INIT

• MPI_COMM_SIZE

• MPI_COMM_RANK

• MPI_FINALIZE

3. What is the version number of the current MPI standard?

4. Try to understand the Hello World program in listing D . Explain how it differs from A . Run the program in listing D on 1, 2, 6, 12 and 24 MPI processes[4]. Put the output of each run in your solutions, the output will be in a file of the form
helloworld.o**********
where the last entries above are digits corresponding to the number of the run. An example makefile to compile this on Flux is in listing E . An example submission script is in listing F . To change the number of MPI processes that the program will run on from say 2 to 6, change
ppn=2
to
ppn=6
and also change the submission script from
mpirun -np 2 ./helloworld
to
mpirun -np 6 ./helloworld.

On Flux, there is a maximum of 12 cores per node, so if more than 12 MPI processes are required, one needs to change the number of nodes as well. The total number of cores required is equal to the number of nodes multiplied by the number of processes per node. Thus to use 24 processes change
nodes=1:ppn=2
to
nodes=2:ppn=12
and also change the submission script from
mpirun -np 2 ./helloworld
to
mpirun -np 24 ./helloworld.

( D)

!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program uses MPI to print hello world from all available
! processes
!
! .. Parameters ..
!
! .. Scalars ..
! myid = process id
! numprocs = total number of MPI processes
! ierr = error code
!
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http:// en.wikipedia.org/wiki/OpenMP
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! on 30 December 2011
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! MPI library
PROGRAM hello90
USE MPI
IMPLICIT NONE
INTEGER(kind=4) :: myid, numprocs, ierr

CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)

PRINT*, 'Hello World from process', myid
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
IF ( myid == 0 ) THEN
PRINT*, 'There are ', numprocs, ' MPI processes'
END IF
CALL MPI_FINALIZE(ierr)
END PROGRAM


( E)

An example makefile for compiling the helloworld program in listing {lst:ForMpiHw} Code download
#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = -O0

# libraries
LIBS =
# source list for main program
SOURCES = helloworld.f90

test: $(SOURCES)${COMPILER} -o helloworld $(FLAGS)$(SOURCES)

clean:
rm *.o

clobber:
rm helloworld


( F)

#!/bin/bash
#PBS -N helloworld
#PBS -l nodes=1:ppn=2,walltime=00:02:00
#PBS -q flux
#PBS -l qos=math471f11_flux
#PBS -A math471f11_flux
#PBS -M your_uniqname@umich.edu
#PBS -m abe
#PBS -V
#
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID} cp${HOME}/ParallelMethods/helloworldMPI/helloworld /tmp/${PBS_JOBID}/helloworld cd /tmp/${PBS_JOBID}

mpirun -np 2 ./helloworld

cd ${HOME}/ParallelMethods/helloworldMPI /bin/rm -rf /tmp/${PBS_JOBID}


## A first parallel program: Monte Carlo IntegrationEdit

To introduce the basics of parallel programming in a context that is a little more complicated than { Hello World}, we will consider Monte Carlo integration. We review important concepts from probability and Riemann integration, and then give example algorithms and explain why parallelization may be helpful.

### ProbabilityEdit

$f:U\subset \mathbb{R}^2 \rightarrow \mathbb{R}_+$ is a probability density function if $\int\int_U f \mathrm{d}A =1$

If $f$ is a probability density function which takes the set $U\subset\mathbb{R}^2$, then the probability of events in the set $W\subset U$ occurring is $P(W)=\int\int_W f \mathrm{d}A.$

The joint density for it to snow $x$ inches tomorrow and for Kelly to win $y$ dollar in the lottery tomorrow is given by $f=\frac{c}{(1+x)(100+y)}$ for $x,y\in [0,100]\times[0,100]$ and $f=0$ otherwise. Find $c$.

Suppose $X$ is a random variable with probability density function $f_1(x)$ and $Y$ is a random variable with a probability density function $f_2(y)$. Then $X$ and $Y$ are independent random variables if their joint density function is $f(x,y)=f_1(x)f_2(y).$

The probability it will snow tomorrow and the probability Kelly will win the lottery tomorrow are independent random variables.

If $f(x,y)$ is a probability density function for the random variables $X$ and $Y$, the X mean is $\mu_1=\bar{X}=\int\int xf\mathrm{d}A$ and the Y mean is $\mu_2=\bar{Y}=\int\int yf\mathrm{d}A.$

The X mean and the Y mean are the expected values of X and Y.

If $f(x,y)$ is a probability density function for the random variables $X$ and $Y$, the X variance is $\sigma_1^2=\overline{(X-\bar{X})^2}=\int\int (x-\bar{X})^2f\mathrm{d}A$ and the Y variance is $\sigma_2^2=\overline{(Y-\bar{Y})^2}=\int\int (y-\bar{Y})^2f\mathrm{d}A.$

The standard deviation is defined to be the square root of the variance.

Find an expression for the probability that it will snow more than 1.1 times the expected snowfall and also that Kelly will win more than 1.2 times the expected amount in the lottery.

### ExerciseEdit

• A class is graded on a curve. It is assumed that the class is a representative sample of the population, the probability density function for the numerical score $x$ is given by $f(x)=C\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$ For simplicity we assume that $x$ can take on the values $-\infty$ and $\infty$, though in actual fact the exam is scored from $0$ to $100$.

1. Determine $C$ using results from your previous homework.

2. Suppose there are 240 students in the class and the mean and standard deviation for the class is not reported. As an enterprising student, you poll 60 of your fellow students (we shall suppose they are selected randomly). You find that the mean for these 60 students is 55% and the standard deviation is 10%. Use the Student’s t distribution http://en.wikipedia.org/wiki/Student\%27s_t-distribution to estimate the 90% confidence interval for the actual sample mean. Make a sketch of the t-distribution probability density function and shade the region which corresponds to the 90% confidence interval for the sample mean.[5]

Remark Fortunately, all the students are hard working, so the possibility of a negative score, although possible, is extremely low, and so we neglect it to make the above computation easier.

### Riemann IntegrationEdit

Recall that we can approximate integrals by Riemann sums. There are many integrals one cannot evaluate analytically, but for which a numerical answer is required. In this section, we shall explore a simple way of doing this on a computer. Suppose we want to find $I2d=\int_0^1\int_0^4 x^2+2y^2\mathrm{d}y\mathrm{d}x.$ If we do this analytically we find $I2d=44.$ Let us suppose we have forgotten how to integrate, and so we do this numerically. We can do so using the following Matlab code:

( G)

A Matlab program which demonstrates how to approximate an integral by a sum Code download
% A program to approximate an integral

clear all; format compact; format short;

nx=1000; % number of points in x
xend=1; % last discretization point
xstart=0; % first discretization point
dx=(xend-xstart)/(nx-1); % size of each x sub-interval

ny=4000; % number of points in y
yend=4; % last discretization point
ystart=0; % first discretization point
dy=(yend-ystart)/(ny-1); % size of each y sub-interval

% create vectors with points for x and y
for i=1:nx
x(i)=xstart+(i-1)*dx;
end
for j=1:ny
y(j)=ystart+(j-1)*dy;
end

% Approximate the integral by a sum
I2d=0;
for i=1:nx
for j=1:ny
I2d=I2d+(x(i)^2+2*y(j)^2)*dy*dx;
end
end
I2d


We can do something similar in three dimensions. Suppose we want to calculate $I3d=\int_0^1\int_0^1\int_0^4 x^2+2y^2+3z^2\mathrm{d}z\mathrm{d}y\mathrm{d}x.$ Analytically we find that $I3d=68$

### ExercisesEdit

1. Modify the Matlab code to perform the three dimensional integral.
2. Try and determine how the accuracy of either the two or three dimensional method varies as the number of subintervals is changed.

### Monte Carlo IntegrationEdit

It is possible to extend the above integration schemes to higher and higher dimensional integrals. This can become computationally intensive and an alternate method of integration based on probability is often used. The method we will discuss is called the Monte Carlo method, and the description of this method which follows is based on Chapter 3 of Vector Calculus by Michael Corral[6] which is available at http://www.mecmath.net/ and where Java and Sage programs for doing Monte Carlo integration can be found. The idea behind Monte Carlo integration is based on the concept of the average value of a function, which is encountered in single-variable calculus. Recall that for a continuous function $f(x)$, the average value $\bar{f}$ of $f$ over an interval $\lbrack a,b \rbrack$ is defined as

$\bar{f} ~=~ \frac{1}{b-a}\int_a^b f(x)\,dx ~.$

( 1)

The quantity $b-a$ is the length of the interval $\lbrack a,b \rbrack$, which can be thought of as the “volume” of the interval. Applying the same reasoning to functions of two or three variables, we define the average value of $f(x,y)$ over a region $R$ to be

$\bar{f} ~=~ \frac{1}{A(R)}\iint\limits_{R} f(x,y)\,dA ~,$

( 2)

where $A(R)$ is the area of the region $R$, and we define the average value of $f(x,y,z)$ over a solid $S$ to be

$\bar{f} ~=~ \frac{1}{V(S)}\iiint\limits_{S} f(x,y,z)\,dV ~,$

( 3)

where $V(S)$ is the volume of the solid $S$. Thus, for example, we have

$\iint\limits_{R} f(x,y)\,dA ~=~ A(R)\bar{f} ~.$

( 4)

The average value of $f(x,y)$ over $R$ can be thought of as representing the sum of all the values of $f$ divided by the number of points in $R$. Unfortunately there are an infinite number (in fact, uncountably many) points in any region, i.e. they can not be listed in a discrete sequence. But what if we took a very large number $N$ of random points in the region $R$ (which can be generated by a computer) and then took the average of the values of $f$ for those points, and used that average as the value of $\bar{f}$? This is exactly what the Monte Carlo method does. So in formula 4 the approximation we get is

$\iint\limits_{R} f(x,y)\,dA ~\approx~ A(R)\bar{f} \pm A(R)\sqrt{\frac{\overline{f^2} - (\bar{f})^2}{N}} ~,$

( 5)

where

$\bar{f} ~=~ \frac{\sum_{i=1}^N f(x_i,y_i)}{N} \quad \text{and} \quad \overline{f^2} ~=~ \frac{\sum_{i=1}^N (f(x_i,y_i))^2}{N} ~,$

( 6)

with the sums taken over the $N$ random points $(x_1,y_1)$, $\ldots$, $(x_N,y_N)$. The $\pm$ “error term” in formula 5 does not really provide hard bounds on the approximation. It represents a single standard deviation from the expected value of the integral. That is, it provides a likely bound on the error. Due to its use of random points, the Monte Carlo method is an example of a probabilistic method (as opposed to deterministic methods such as the Riemann sum approximation method, which use a specific formula for generating points).

For example, we can use the formula in eq. 5 to approximate the volume $V$ under the surface $z=x^2+2y^2$ over the rectangle $R =(0,1)\times(0,4)$. Recall that the actual volume is $44$. Below is a Matlab code that calculates the volume using Monte Carlo integration

( H)

A Matlab program which demonstrates how to use the Monte Carlo method to calculate the volume below $z=x^2+2y^2$, with $(x,y)\in(0,1)\times(0,4)$. Code download
% A program to approximate an integral using the Monte Carlos method

% This program can be made much faster by using Matlab's matrix and vector
% operations, however to allow easy translation to other languages we have
% made it as simple as possible.

Numpoints=65536; % number of random points

I2d=0; % Initialize value
I2dsquare=0; % initial variance
for n=1:Numpoints
% generate random number drawn from a uniform distribution on (0,1)
x=rand(1);
y=rand(1)*4;
I2d=I2d+x^2+2*y^2;
I2dsquare=I2dsquare+(x^2+2*y^2)^2;
end
% we scale the integral by the total area and divide by the number of
% points used
I2d=I2d*4/Numpoints
% we also output an estimated error
I2dsquare=I2dsquare*4/Numpoints;
EstimError=4*sqrt( (I2d^2-I2dsquare)/Numpoints)


The results of running this program with various numbers of random points are shown below:

N = 16: 41.3026 +/- 30.9791
N = 256: 47.1855 +/- 9.0386
N = 4096: 43.4527 +/- 2.0280
N = 65536: 44.0026 +/- 0.5151


As you can see, the approximation is fairly good. As $N \to \infty$, it can be shown that the Monte Carlo approximation converges to the actual volume (on the order of $O(\sqrt{N})$, in computational complexity terminology).

In the above example the region $R$ was a rectangle. To use the Monte Carlo method for a nonrectangular (bounded) region $R$, only a slight modification is needed. Pick a rectangle $\tilde{R}$ that encloses $R$, and generate random points in that rectangle as before. Then use those points in the calculation of $\bar{f}$ only if they are inside $R$. There is no need to calculate the area of $R$ for formula ({eqn:monte}) in this case, since the exclusion of points not inside $R$ allows you to use the area of the rectangle $\tilde{R}$ instead, similar to before.

For instance, one can show that the volume under the surface $z=1$ over the nonrectangular region $R = \left\{ (x,y): 0 \le x^2+y^2 \le 1\right\}$ is $\pi$. Since the rectangle $\tilde{R} = [-1,1] \times [ -1,1]$ contains $R$, we can use a similar program to the one we used, the largest change being a check to see if $y^2+x^3 \leq 1$ for a random point $(x,y)$ in $[-1,1] \times [ -1,1]$. A Matlab code listing which demonstrates this is below:

( I)

A Matlab program which demonstrates how to use the Monte Carlo method to calculate the area of an irregular region and also to calculate $\pi$. Code download
% A program to approximate an integral using the Monte Carlos method

% This program can be made much faster by using Matlab's matrix and vector
% operations, however to allow easy translation to other languages we have
% made it as simple as possible.

Numpoints=256; % number of random points

I2d=0; % Initialize value
I2dsquare=0; % initial variance
for n=1:Numpoints
% generate random number drawn from a uniform distribution on (0,1) and
% scale this to (-1,1)
x=2*rand(1)-1;
y=2*rand(1) -1;
if ((x^2+y^2) <1)
I2d=I2d+1;
I2dsquare=I2dsquare+1;
end
end
% We scale the integral by the total area and divide by the number of
% points used
I2d=I2d*4/Numpoints
% we also output an estimated error
I2dsquare=I2dsquare*4/Numpoints;
EstimError=4*sqrt( (I2d^2-I2dsquare)/Numpoints)


The results of running the program with various numbers of random points are shown below:

N = 16: 3.5000 +/- 2.9580
N = 256: 3.2031 +/- 0.6641
N = 4096: 3.1689 +/- 0.1639
N = 65536: 3.1493 +/- 0.0407


To use the Monte Carlo method to evaluate triple integrals, you will need to generate random triples $(x,y,z)$ in a parallelepiped, instead of random pairs $(x,y)$ in a rectangle, and use the volume of the parallelepiped instead of the area of a rectangle in formula 5 . For a more detailed discussion of numerical integration methods, please take a further course in mathematics.

### ExercisesEdit

1. Write a program that uses the Monte Carlo method to approximate the double integral $\iint\limits_{R} e^{xy}\,dA$, where $R = \lbrack 0,1 \rbrack \times \lbrack 0,1 \rbrack$. Show the program output for $N=10$, $100$, $1000$, $10000$, $100000$ and $1000000$ random points.
2. Write a program that uses the Monte Carlo method to approximate the triple integral $\iiint\limits_{S} e^{xyz}\,dV$, where $S= \lbrack 0,1 \rbrack \times \lbrack 0,1 \rbrack \times \lbrack 0,1 \rbrack$. Show the program output for $N=10$, $100$, $1000$, $10000$, $100000$ and $1000000$ random points.
3. Use the Monte Carlo method to approximate the volume of a sphere of radius $1$.

### Parallel Monte Carlo IntegrationEdit

As you may have noticed, the algorithms are simple, but can require very many grid points to become accurate. It is therefore useful to run these algorithms on a parallel computer. We will demonstrate a parallel Monte Carlo calculation of $\pi$. Before we can do this, we need to learn how to use a parallel computer[7].

We now examine a Fortran program for calculating $\pi$. These programs are taken from http://chpc.wustl.edu/mpi-fortran.html, where further explanation can be found. The original source of these programs appears to be Gropp, Lusk and Skjellum[8]

#### SerialEdit

( I)

A serial Fortran program which demonstrates how to calculate $\pi$ using a Monte Carlo method Code download
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program use a monte carlo method to calculate pi
!
! .. Parameters ..
! npts = total number of Monte Carlo points
! xmin = lower bound for integration region
! xmax = upper bound for integration region
! .. Scalars ..
! i = loop counter
! f = average value from summation
! sum = total sum
! randnum = random number generated from (0,1) uniform
! distribution
! x = current Monte Carlo location
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http://chpc.wustl.edu/mpi-fortran.html
! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! on 30 March 2012
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! None
PROGRAM monte_carlo
IMPLICIT NONE

INTEGER(kind=8), PARAMETER :: npts = 1e10
REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
INTEGER(kind=8) :: i
REAL(kind=8) :: f,sum, randnum,x

DO i=1,npts
CALL random_number(randnum)
x = (xmax-xmin)*randnum + xmin
sum = sum + 4.0d0/(1.0d0 + x**2)
END DO
f = sum/npts
PRINT*,'PI calculated with ',npts,' points = ',f

STOP
END


( J)

An example makefile for compiling the program in listing I Code download
#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = -O0

# libraries
LIBS =
# source list for main program
SOURCES = montecarloserial.f90

test: $(SOURCES)${COMPILER} -o montecarloserial $(FLAGS)$(SOURCES)

clean:
rm *.o

clobber:
rm montecarloserial


( K)

An example submission script for use on Trestles located at the San Diego Supercomputing Center Code download
#!/bin/bash
# the queue to be used.
#PBS -q shared
#PBS -A mia122
# number of nodes and number of processors per node requested
#PBS -l nodes=1:ppn=1
# requested Wall-clock time.
#PBS -l walltime=00:05:00
# name of the standard out file to be "output-file".
#PBS -o job_output
# name of the job
#PBS -N MCserial
#PBS -M youremail@umich.edu
# send a notification for job abort, begin and end
#PBS -m abe
#PBS -V
cd $PBS_O_WORKDIR #change to the working directory mpirun_rsh -np 1 -hostfile$PBS_NODEFILE montecarloserial


#### ParallelEdit

( L)

A parallel fortran MPI program to calculate $\pi$. Code download
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program uses MPI to do a parallel monte carlo calculation of pi
!
! .. Parameters ..
! npts = total number of Monte Carlo points
! xmin = lower bound for integration region
! xmax = upper bound for integration region
! .. Scalars ..
! mynpts = this processes number of Monte Carlo points
! myid = process id
! nprocs = total number of MPI processes
! ierr = error code
! i = loop counter
! f = average value from summation
! sum = total sum
! mysum = sum on this process
! randnum = random number generated from (0,1) uniform
! distribution
! x = current Monte Carlo location
! start = simulation start time
! finish = simulation end time
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http://chpc.wustl.edu/mpi-fortran.html
! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! on 30 March 2012
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
!
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! MPI library
PROGRAM monte_carlo_mpi
USE MPI
IMPLICIT NONE

INTEGER(kind=8), PARAMETER :: npts = 1e10
REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
INTEGER(kind=8) :: mynpts
INTEGER(kind=4) :: ierr, myid, nprocs
INTEGER(kind=8) :: i
REAL(kind=8) :: f,sum,mysum,randnum
REAL(kind=8) :: x, start, finish

! Initialize MPI
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
start=MPI_WTIME()

! Calculate the number of points each MPI process needs to generate
IF (myid .eq. 0) THEN
mynpts = npts - (nprocs-1)*(npts/nprocs)
ELSE
mynpts = npts/nprocs
ENDIF

! set initial sum to zero
mysum = 0.0d0
! use loop on local process to generate portion of Monte Carlo integral
DO i=1,mynpts
CALL random_number(randnum)
x = (xmax-xmin)*randnum + xmin
mysum = mysum + 4.0d0/(1.0d0 + x**2)
ENDDO

! Do a reduction and sum the results from all processes
CALL MPI_REDUCE(mysum,sum,1,MPI_DOUBLE_PRECISION,MPI_SUM,&
0,MPI_COMM_WORLD,ierr)
finish=MPI_WTIME()

! Get one process to output the result and running time
IF (myid .eq. 0) THEN
f = sum/npts
PRINT*,'PI calculated with ',npts,' points = ',f
PRINT*,'Program took ', finish-start, ' for Time stepping'
ENDIF

CALL MPI_FINALIZE(ierr)

STOP
END PROGRAM


( M)

An example makefile for compiling the program in listing L Code download
#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = -O0

# libraries
LIBS =
# source list for main program
SOURCES = montecarloparallel.f90

test: $(SOURCES)${COMPILER} -o montecarloparallel $(FLAGS)$(SOURCES)

clean:
rm *.o

clobber:
rm montecarloparallel


( N)

An example submission script for use on Trestles located at the San Diego Supercomputing Center Code download
#!/bin/bash
# the queue to be used.
#PBS -q normal
#PBS -A mia122
# number of nodes and number of processors per node requested
#PBS -l nodes=1:ppn=32
# requested Wall-clock time.
#PBS -l walltime=00:05:00
# name of the standard out file to be "output-file".
#PBS -o job_output
# name of the job, you may want to change this so it is unique to you
#PBS -N MPI_MCPARALLEL
#PBS -M youremail@umich.edu
# send a notification for job abort, begin and end
#PBS -m abe
#PBS -V

# change to the job submission directory
cd $PBS_O_WORKDIR # Run the job mpirun_rsh -np 32 -hostfile$PBS_NODEFILE montecarloparallel


### ExercisesEdit

1. Explain why using Monte Carlo to evaluate $\int_0^1\frac{1}{1+x^2}\mathrm{d}x$ allows you to find $\pi$ and, in your own words, explain what the serial and parallel programs do.
2. Find the time it takes to run the Parallel Monte Carlo program on 32, 64, 128, 256 and 512 cores.
3. Use a parallel Monte Carlo integration program to evaluate $\iint x^2+y^6+\exp(xy)\cos(y\exp(x))\mathrm{d}A$ over the unit circle.
4. Use a parallel Monte Carlo integration program to approximate the volume of the ellipsoid $\frac{x^2}{9}+\frac{y^2}{4}+ \frac{z^2}{1}=1$. Use either OpenMP or MPI.
5. Write parallel programs to find the volume of the 4 dimensional sphere $1\geq\sum_{i=1}^4 x_i^2.$ Try both Monte Carlo and Riemann sum techniques. Use either OpenMP or MPI.

## NotesEdit

1. 2decomp&fft
2. Gropp, Lusk and Skjellum (1999)
3. Gropp, Lusk and Thakur (1999)
4. One can run this program on many more than 24 processes, however, the output becomes quite excessive

5. The Student’s t distribution is implemented in many numerical packages such as Maple, Mathematica, Matlab, R, Sage etc., so if you need to use to obtain numerical results, it is helpful to use on of these packages.

6. Corral (2011)
7. Many computers and mobile telephones produced today have 2 or more cores and so can be considered parallel, but here we mean computers with over hundreds of cores.
8. Gropp, Lusk and Skjellum (1999)

## ReferencesEdit

Corral, M. (2011). Vector Calculus.

Gropp, W.; Lusk, E.; Skjellum, A. (1999). Using MPI. MIT Press.

Gropp, W.; Lusk, E.; Thakur, R. (1999). Using MPI-2. MIT Press.

Li, N. "2decomp&fft".