Parallel Spectral Numerical Methods/Introduction to Parallel Programming
Introduction to Parallel Programming
editOverview of OpenMP and MPI
editTo solve large computational problems quickly, it is necessary to take advantage of multiple cores on a CPU (central processing units) and multiple CPUs. Most programs written up until now are sequential and compilers will not typically automatically generate parallel executables, so programmers need to modify the original serial computer code to take advantage of extra processing power. Two standards which specify what libraries that allow for parallel programming should do are OpenMP and MPI (the message passing interface). In this section, we cover the minimal amount of information required to understand, run and modify the programs in this tutorial. More detailed tutorials can be found at https://computing.llnl.gov/tutorials/ and at http://www.citutor.org.
OpenMP is used for parallel programming on shared memory architectures – each compute process has a global view of memory. It allows one to incrementally parallelize an existing Fortran, C or C++ code by adding directives to the original code. It is therefore easy to use. However some care is required in getting good performance when using OpenMP. It is easy to add directives to a serial code, but thought is required in creating a program which will show improved performance and give correct results when made to run in parallel. For the numerical solution of multidimensional partial differential equations on regular grids, it is easy to perform efficient and effective loop based parallelism, so a complete understanding of all the features of OpenMP is not required. OpenMP typically allows one to use 10’s of computational cores, in particular allowing one to take advantage of multicore laptops, desktops and workstations.
MPI is used for parallel programming on distributedmemory architectures – when separate compute processes have access to their own local memory and processes must explicitly receive data held in memory belonging to other processes which have sent the data. MPI is a library which allows one to parallelize Fortran, C and C++ programs by adding function calls which explicitly move data from one process to another. Careful thought is required in converting a serial program to a parallel MPI program because the data needs to be decomposed onto different processes, so it is usually difficult to incrementally parallelize a program that uses MPI. The best way to parallelize a program which will use MPI is problem dependent. When solving large problems, one typically does not have enough memory on each process to simply replicate all the data. Thus one wants to split up the data (known as domain decomposition) in such a way as to minimize the amount of message passing that is required to perform a computation correctly. Programming this can be rather complicated and time consuming. Fortunately, by using the 2DECOMP&FFT library^{[1]} which is written on top of MPI, we can avoid having to program many of the data passing operations when writing Fourier spectral codes and still benefit from being able to solve partial differential equations on up to O( ) processor cores.
OpenMP
editPlease read the tutorial at https://computing.llnl.gov/tutorials/openMP/, then answer the following questions:
OpenMP Exercises
editWhat is OpenMP?
Download a copy of the latest OpenMP specifications from www.openmp.org. What version number is the latest specification?
Explain what each of the following OpenMP directives does:
!$OMP PARALLEL
!$OMP END PARALLEL
!$OMP PARALLEL DO
!$OMP END PARALLEL DO
!$OMP BARRIER
!$OMP MASTER
!$OMP END MASTER
Try to understand and then run the Hello World program in listing A on 1, 2, 6 and 12 threads. Put the output of each run in your solutions, the output will be in a file of the form
helloworld.o**********
where the last entries above are digits corresponding to the number of the run. An example makefile to compile this on Flux is in listing B . An example submission script is in listing C . To change the number of OpenMP processes that the program will run on from say 2 to 6, change
ppn=2
to
ppn=6
and also change the value of the OMP_NUM_THREADS variable from
{OMP_NUM_THREADS=2}
to
{OMP_NUM_THREADS=6}
On Flux, there is a maximum of 12 cores per node, so the largest useful number of threads for most applications is 12.()! ! ! ! PURPOSE ! ! This program uses OpenMP to print hello world from all available ! threads ! ! .. Parameters .. ! ! .. Scalars .. ! id = thread id ! nthreads = total number of threads ! ! .. Arrays .. ! ! .. Vectors .. ! ! REFERENCES ! http:// en.wikipedia.org/wiki/OpenMP ! ! ACKNOWLEDGEMENTS ! The program below was modified from one available at the internet ! address in the references. This internet address was last checked ! on 30 December 2011 ! ! ACCURACY ! ! ERROR INDICATORS AND WARNINGS ! ! FURTHER COMMENTS ! ! ! External routines required ! ! External libraries required ! OpenMP library PROGRAM hello90 USE omp_lib IMPLICIT NONE INTEGER:: id, nthreads !$OMP PARALLEL PRIVATE(id) id = omp_get_thread_num() nthreads = omp_get_num_threads() PRINT *, 'Hello World from thread', id !$OMP BARRIER IF ( id == 0 ) THEN PRINT*, 'There are', nthreads, 'threads' END IF !$OMP END PARALLEL END PROGRAM
()#define the complier COMPILER = ifort # compilation settings, optimization, precision, parallelization FLAGS = O0 openmp # libraries LIBS = # source list for main program SOURCES = helloworld.f90 test: $(SOURCES) ${COMPILER} o helloworld $(FLAGS) $(SOURCES) clean: rm *.o clobber: rm helloworld
()#!/bin/bash #PBS N helloworld #PBS l nodes=1:ppn=2,walltime=00:02:00 #PBS q flux #PBS l qos=math471f11_flux #PBS A math471f11_flux #PBS M your_uniqname@umich.edu #PBS m abe #PBS V # # Create a local directory to run and copy your files to local. # Let PBS handle your output mkdir /tmp/${PBS_JOBID} cp ${HOME}/ParallelMethods/helloworldOMP/helloworld /tmp/${PBS_JOBID}/helloworld cd /tmp/${PBS_JOBID} export OMP_NUM_THREADS=2 ./helloworld #Clean up your files cd ${HOME}/ParallelMethods/helloworldOMP /bin/rm rf /tmp/${PBS_JOBID}
Add OpenMP directives to the loops in the 2D heat equation solver. Run the resulting program on 1,3,6 and 12 threads and record the time it takes to the program to finish. Make a plot of the final iterate.
MPI
editA copy of the current MPI standard can be found at http://www.mpiforum.org/. It allows for parallelization of Fortran, C and C++ programs. There are newer parallel programming languages such as CoArray Fortran (CAF) and Unified Parallel C (UPC) which allow the programmer to view memory as a single addressable space even on a distributedmemory machine. However, computer hardware limitations imply that most of the programming concepts used when writing MPI programs will be required to write programs in CAF and UPC. Compiler technology for these languages is also not as well developed as compiler technology for older languages such as Fortran and C, so at the present time, Fortran and C dominate high performance computing. An introduction to the essential concepts required for writing and using MPI programs can be found at http://www.shodor.org/refdesk/Resources/Tutorials/. More information on MPI can be found in Gropp, Lusk and Skjellum^{[2]}, Gropp, Lusk and Thakur^{[3]} and at https://computing.llnl.gov/tutorials/mpi/. There are many resources available online, however once the basic concepts have been mastered, what is most useful is an index of MPI commands, usually a search engine will give you sources of listings, however we have found the following sites useful:
 http://www.mpi.forum.org/docs/mpi11html/node182.html
 http://publib.boulder.ibm.com/infocenter/zos/v1r13/index.jsp?topic=\%2Fcom.ibm.zos.r13.fomp200\%2Fipezps00172.htm
 http://www.openmpi.org/doc/v1.4/
MPI Exercises
editWhat does MPI stand for?
Please read the tutorials at http://www.shodor.org/refdesk/Resources/Tutorials/BasicMPI/ and at https://computing.llnl.gov/tutorials/mpi/, then explain what the following commands do:
USE mpi or INCLUDE 'mpif.h'
MPI_INIT
MPI_COMM_SIZE
MPI_COMM_RANK
MPI_FINALIZE
What is the version number of the current MPI standard?
Try to understand the Hello World program in listing D . Explain how it differs from A . Run the program in listing D on 1, 2, 6, 12 and 24 MPI processes^{[4]}. Put the output of each run in your solutions, the output will be in a file of the form
helloworld.o**********
where the last entries above are digits corresponding to the number of the run. An example makefile to compile this on Flux is in listing E . An example submission script is in listing F . To change the number of MPI processes that the program will run on from say 2 to 6, change
ppn=2
to
ppn=6
and also change the submission script from
mpirun np 2 ./helloworld
to
mpirun np 6 ./helloworld.On Flux, there is a maximum of 12 cores per node, so if more than 12 MPI processes are required, one needs to change the number of nodes as well. The total number of cores required is equal to the number of nodes multiplied by the number of processes per node. Thus to use 24 processes change
nodes=1:ppn=2
to
nodes=2:ppn=12
and also change the submission script from
mpirun np 2 ./helloworld
to
mpirun np 24 ./helloworld.

(
) 
!
!
!
! PURPOSE
!
! This program uses MPI to print hello world from all available
! processes
!
! .. Parameters ..
!
! .. Scalars ..
! myid = process id
! numprocs = total number of MPI processes
! ierr = error code
!
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http:// en.wikipedia.org/wiki/OpenMP
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 December 2011
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!
! External routines required
!
! External libraries required
! MPI library
PROGRAM hello90
USE MPI
IMPLICIT NONE
INTEGER(kind=4) :: myid, numprocs, ierr
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
PRINT*, 'Hello World from process', myid
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
IF ( myid == 0 ) THEN
PRINT*, 'There are ', numprocs, ' MPI processes'
END IF
CALL MPI_FINALIZE(ierr)
END PROGRAM

(
) 
#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = O0
# libraries
LIBS =
# source list for main program
SOURCES = helloworld.f90
test: $(SOURCES)
${COMPILER} o helloworld $(FLAGS) $(SOURCES)
clean:
rm *.o
clobber:
rm helloworld

(
) 
#!/bin/bash
#PBS N helloworld
#PBS l nodes=1:ppn=2,walltime=00:02:00
#PBS q flux
#PBS l qos=math471f11_flux
#PBS A math471f11_flux
#PBS M your_uniqname@umich.edu
#PBS m abe
#PBS V
#
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID}
cp ${HOME}/ParallelMethods/helloworldMPI/helloworld /tmp/${PBS_JOBID}/helloworld
cd /tmp/${PBS_JOBID}
mpirun np 2 ./helloworld
#Clean up your files
cd ${HOME}/ParallelMethods/helloworldMPI
/bin/rm rf /tmp/${PBS_JOBID}
A first parallel program: Monte Carlo Integration
editTo introduce the basics of parallel programming in a context that is a little more complicated than { Hello World}, we will consider Monte Carlo integration. We review important concepts from probability and Riemann integration, and then give example algorithms and explain why parallelization may be helpful.
Probability
editis a probability density function if
If is a probability density function which takes the set , then the probability of events in the set occurring is
The joint density for it to snow inches tomorrow and for Kelly to win dollar in the lottery tomorrow is given by for and otherwise. Find .
Suppose is a random variable with probability density function and is a random variable with a probability density function . Then and are independent random variables if their joint density function is
The probability it will snow tomorrow and the probability Kelly will win the lottery tomorrow are independent random variables.
If is a probability density function for the random variables and , the X mean is and the Y mean is
The X mean and the Y mean are the expected values of X and Y.
If is a probability density function for the random variables and , the X variance is and the Y variance is
The standard deviation is defined to be the square root of the variance.
Find an expression for the probability that it will snow more than 1.1 times the expected snowfall and also that Kelly will win more than 1.2 times the expected amount in the lottery.
Exercise
editA class is graded on a curve. It is assumed that the class is a representative sample of the population, the probability density function for the numerical score is given by For simplicity we assume that can take on the values and , though in actual fact the exam is scored from to .
Determine using results from your previous homework.
Suppose there are 240 students in the class and the mean and standard deviation for the class is not reported. As an enterprising student, you poll 60 of your fellow students (we shall suppose they are selected randomly). You find that the mean for these 60 students is 55% and the standard deviation is 10%. Use the Student’s t distribution http://en.wikipedia.org/wiki/Student\%27s_tdistribution to estimate the 90% confidence interval for the actual sample mean. Make a sketch of the tdistribution probability density function and shade the region which corresponds to the 90% confidence interval for the sample mean.^{[5]}
Remark Fortunately, all the students are hard working, so the possibility of a negative score, although possible, is extremely low, and so we neglect it to make the above computation easier.
Riemann Integration
editRecall that we can approximate integrals by Riemann sums. There are many integrals one cannot evaluate analytically, but for which a numerical answer is required. In this section, we shall explore a simple way of doing this on a computer. Suppose we want to find If we do this analytically we find Let us suppose we have forgotten how to integrate, and so we do this numerically. We can do so using the following Matlab code:

(
) 
% A program to approximate an integral
clear all; format compact; format short;
nx=1000; % number of points in x
xend=1; % last discretization point
xstart=0; % first discretization point
dx=(xendxstart)/(nx1); % size of each x subinterval
ny=4000; % number of points in y
yend=4; % last discretization point
ystart=0; % first discretization point
dy=(yendystart)/(ny1); % size of each y subinterval
% create vectors with points for x and y
for i=1:nx
x(i)=xstart+(i1)*dx;
end
for j=1:ny
y(j)=ystart+(j1)*dy;
end
% Approximate the integral by a sum
I2d=0;
for i=1:nx
for j=1:ny
I2d=I2d+(x(i)^2+2*y(j)^2)*dy*dx;
end
end
% print out final answer
I2d
We can do something similar in three dimensions. Suppose we want to calculate Analytically we find that
Exercises
edit Modify the Matlab code to perform the three dimensional integral.
 Try and determine how the accuracy of either the two or three dimensional method varies as the number of subintervals is changed.
Monte Carlo Integration
editIt is possible to extend the above integration schemes to higher and higher dimensional integrals. This can become computationally intensive and an alternate method of integration based on probability is often used. The method we will discuss is called the Monte Carlo method, and the description of this method which follows is based on Chapter 3 of Vector Calculus by Michael Corral^{[6]} which is available at http://www.mecmath.net/ and where Java and Sage programs for doing Monte Carlo integration can be found. The idea behind Monte Carlo integration is based on the concept of the average value of a function, which is encountered in singlevariable calculus. Recall that for a continuous function , the average value of over an interval is defined as

()
The quantity is the length of the interval , which can be thought of as the “volume” of the interval. Applying the same reasoning to functions of two or three variables, we define the average value of over a region to be

()
where is the area of the region , and we define the average value of over a solid to be

()
where is the volume of the solid . Thus, for example, we have

()
The average value of over can be thought of as representing the sum of all the values of divided by the number of points in . Unfortunately there are an infinite number (in fact, uncountably many) points in any region, i.e. they can not be listed in a discrete sequence. But what if we took a very large number of random points in the region (which can be generated by a computer) and then took the average of the values of for those points, and used that average as the value of ? This is exactly what the Monte Carlo method does. So in formula 4 the approximation we get is

()
where

()
with the sums taken over the random points , , . The “error term” in formula 5 does not really provide hard bounds on the approximation. It represents a single standard deviation from the expected value of the integral. That is, it provides a likely bound on the error. Due to its use of random points, the Monte Carlo method is an example of a probabilistic method (as opposed to deterministic methods such as the Riemann sum approximation method, which use a specific formula for generating points).
For example, we can use the formula in eq. 5 to approximate the volume under the surface over the rectangle . Recall that the actual volume is . Below is a Matlab code that calculates the volume using Monte Carlo integration

(
) 
% A program to approximate an integral using the Monte Carlos method
% This program can be made much faster by using Matlab's matrix and vector
% operations, however to allow easy translation to other languages we have
% made it as simple as possible.
Numpoints=65536; % number of random points
I2d=0; % Initialize value
I2dsquare=0; % initial variance
for n=1:Numpoints
% generate random number drawn from a uniform distribution on (0,1)
x=rand(1);
y=rand(1)*4;
I2d=I2d+x^2+2*y^2;
I2dsquare=I2dsquare+(x^2+2*y^2)^2;
end
% we scale the integral by the total area and divide by the number of
% points used
I2d=I2d*4/Numpoints
% we also output an estimated error
I2dsquare=I2dsquare*4/Numpoints;
EstimError=4*sqrt( (I2d^2I2dsquare)/Numpoints)
The results of running this program with various numbers of random points are shown below:
N = 16: 41.3026 +/ 30.9791 N = 256: 47.1855 +/ 9.0386 N = 4096: 43.4527 +/ 2.0280 N = 65536: 44.0026 +/ 0.5151
As you can see, the approximation is fairly good. As , it can be shown that the Monte Carlo approximation converges to the actual volume (on the order of , in computational complexity terminology).
In the above example the region was a rectangle. To use the Monte Carlo method for a nonrectangular (bounded) region , only a slight modification is needed. Pick a rectangle that encloses , and generate random points in that rectangle as before. Then use those points in the calculation of only if they are inside . There is no need to calculate the area of for formula ({eqn:monte}) in this case, since the exclusion of points not inside allows you to use the area of the rectangle instead, similar to before.
For instance, one can show that the volume under the surface over the nonrectangular region is . Since the rectangle contains , we can use a similar program to the one we used, the largest change being a check to see if for a random point in . A Matlab code listing which demonstrates this is below:

(
) 
% A program to approximate an integral using the Monte Carlos method
% This program can be made much faster by using Matlab's matrix and vector
% operations, however to allow easy translation to other languages we have
% made it as simple as possible.
Numpoints=256; % number of random points
I2d=0; % Initialize value
I2dsquare=0; % initial variance
for n=1:Numpoints
% generate random number drawn from a uniform distribution on (0,1) and
% scale this to (1,1)
x=2*rand(1)1;
y=2*rand(1) 1;
if ((x^2+y^2) <1)
I2d=I2d+1;
I2dsquare=I2dsquare+1;
end
end
% We scale the integral by the total area and divide by the number of
% points used
I2d=I2d*4/Numpoints
% we also output an estimated error
I2dsquare=I2dsquare*4/Numpoints;
EstimError=4*sqrt( (I2d^2I2dsquare)/Numpoints)
The results of running the program with various numbers of random points are shown below:
N = 16: 3.5000 +/ 2.9580 N = 256: 3.2031 +/ 0.6641 N = 4096: 3.1689 +/ 0.1639 N = 65536: 3.1493 +/ 0.0407
To use the Monte Carlo method to evaluate triple integrals, you will need to generate random triples in a parallelepiped, instead of random pairs in a rectangle, and use the volume of the parallelepiped instead of the area of a rectangle in formula 5 . For a more detailed discussion of numerical integration methods, please take a further course in mathematics.
Exercises
edit Write a program that uses the Monte Carlo method to approximate the double integral , where . Show the program output for , , , , and random points.
 Write a program that uses the Monte Carlo method to approximate the triple integral , where . Show the program output for , , , , and random points.
 Use the Monte Carlo method to approximate the volume of a sphere of radius .
Parallel Monte Carlo Integration
editAs you may have noticed, the algorithms are simple, but can require very many grid points to become accurate. It is therefore useful to run these algorithms on a parallel computer. We will demonstrate a parallel Monte Carlo calculation of . Before we can do this, we need to learn how to use a parallel computer^{[7]}.
We now examine a Fortran program for calculating . These programs are taken from http://chpc.wustl.edu/mpifortran.html, where further explanation can be found. The original source of these programs appears to be Gropp, Lusk and Skjellum^{[8]}
Serial
edit

(
) 
!
!
!
! PURPOSE
!
! This program use a monte carlo method to calculate pi
!
! .. Parameters ..
! npts = total number of Monte Carlo points
! xmin = lower bound for integration region
! xmax = upper bound for integration region
! .. Scalars ..
! i = loop counter
! f = average value from summation
! sum = total sum
! randnum = random number generated from (0,1) uniform
! distribution
! x = current Monte Carlo location
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http://chpc.wustl.edu/mpifortran.html
! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 March 2012
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!
! External routines required
!
! External libraries required
! None
PROGRAM monte_carlo
IMPLICIT NONE
INTEGER(kind=8), PARAMETER :: npts = 1e10
REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
INTEGER(kind=8) :: i
REAL(kind=8) :: f,sum, randnum,x
DO i=1,npts
CALL random_number(randnum)
x = (xmaxxmin)*randnum + xmin
sum = sum + 4.0d0/(1.0d0 + x**2)
END DO
f = sum/npts
PRINT*,'PI calculated with ',npts,' points = ',f
STOP
END

(
) 
#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = O0
# libraries
LIBS =
# source list for main program
SOURCES = montecarloserial.f90
test: $(SOURCES)
${COMPILER} o montecarloserial $(FLAGS) $(SOURCES)
clean:
rm *.o
clobber:
rm montecarloserial

(
) 
#!/bin/bash
# the queue to be used.
#PBS q shared
# specify your project allocation
#PBS A mia122
# number of nodes and number of processors per node requested
#PBS l nodes=1:ppn=1
# requested Wallclock time.
#PBS l walltime=00:05:00
# name of the standard out file to be "outputfile".
#PBS o job_output
# name of the job
#PBS N MCserial
# Email address to send a notification to, change "youremail" appropriately
#PBS M youremail@umich.edu
# send a notification for job abort, begin and end
#PBS m abe
#PBS V
cd $PBS_O_WORKDIR #change to the working directory
mpirun_rsh np 1 hostfile $PBS_NODEFILE montecarloserial
Parallel
edit

(
) 
!
!
!
! PURPOSE
!
! This program uses MPI to do a parallel monte carlo calculation of pi
!
! .. Parameters ..
! npts = total number of Monte Carlo points
! xmin = lower bound for integration region
! xmax = upper bound for integration region
! .. Scalars ..
! mynpts = this processes number of Monte Carlo points
! myid = process id
! nprocs = total number of MPI processes
! ierr = error code
! i = loop counter
! f = average value from summation
! sum = total sum
! mysum = sum on this process
! randnum = random number generated from (0,1) uniform
! distribution
! x = current Monte Carlo location
! start = simulation start time
! finish = simulation end time
! .. Arrays ..
!
! .. Vectors ..
!
! REFERENCES
! http://chpc.wustl.edu/mpifortran.html
! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
!
! ACKNOWLEDGEMENTS
! The program below was modified from one available at the internet
! address in the references. This internet address was last checked
! on 30 March 2012
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
!
! External routines required
!
! External libraries required
! MPI library
PROGRAM monte_carlo_mpi
USE MPI
IMPLICIT NONE
INTEGER(kind=8), PARAMETER :: npts = 1e10
REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
INTEGER(kind=8) :: mynpts
INTEGER(kind=4) :: ierr, myid, nprocs
INTEGER(kind=8) :: i
REAL(kind=8) :: f,sum,mysum,randnum
REAL(kind=8) :: x, start, finish
! Initialize MPI
CALL MPI_INIT(ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
start=MPI_WTIME()
! Calculate the number of points each MPI process needs to generate
IF (myid .eq. 0) THEN
mynpts = npts  (nprocs1)*(npts/nprocs)
ELSE
mynpts = npts/nprocs
ENDIF
! set initial sum to zero
mysum = 0.0d0
! use loop on local process to generate portion of Monte Carlo integral
DO i=1,mynpts
CALL random_number(randnum)
x = (xmaxxmin)*randnum + xmin
mysum = mysum + 4.0d0/(1.0d0 + x**2)
ENDDO
! Do a reduction and sum the results from all processes
CALL MPI_REDUCE(mysum,sum,1,MPI_DOUBLE_PRECISION,MPI_SUM,&
0,MPI_COMM_WORLD,ierr)
finish=MPI_WTIME()
! Get one process to output the result and running time
IF (myid .eq. 0) THEN
f = sum/npts
PRINT*,'PI calculated with ',npts,' points = ',f
PRINT*,'Program took ', finishstart, ' for Time stepping'
ENDIF
CALL MPI_FINALIZE(ierr)
STOP
END PROGRAM

(
) 
#define the complier
COMPILER = mpif90
# compilation settings, optimization, precision, parallelization
FLAGS = O0
# libraries
LIBS =
# source list for main program
SOURCES = montecarloparallel.f90
test: $(SOURCES)
${COMPILER} o montecarloparallel $(FLAGS) $(SOURCES)
clean:
rm *.o
clobber:
rm montecarloparallel

(
) 
#!/bin/bash
# the queue to be used.
#PBS q normal
# specify your project allocation
#PBS A mia122
# number of nodes and number of processors per node requested
#PBS l nodes=1:ppn=32
# requested Wallclock time.
#PBS l walltime=00:05:00
# name of the standard out file to be "outputfile".
#PBS o job_output
# name of the job, you may want to change this so it is unique to you
#PBS N MPI_MCPARALLEL
# Email address to send a notification to, change "youremail" appropriately
#PBS M youremail@umich.edu
# send a notification for job abort, begin and end
#PBS m abe
#PBS V
# change to the job submission directory
cd $PBS_O_WORKDIR
# Run the job
mpirun_rsh np 32 hostfile $PBS_NODEFILE montecarloparallel
Exercises
edit Explain why using Monte Carlo to evaluate allows you to find and, in your own words, explain what the serial and parallel programs do.
 Find the time it takes to run the Parallel Monte Carlo program on 32, 64, 128, 256 and 512 cores.
 Use a parallel Monte Carlo integration program to evaluate over the unit circle.
 Use a parallel Monte Carlo integration program to approximate the volume of the ellipsoid . Use either OpenMP or MPI.
 Write parallel programs to find the volume of the 4 dimensional sphere Try both Monte Carlo and Riemann sum techniques. Use either OpenMP or MPI.
Notes
edit ↑ 2decomp&fft
 ↑ Gropp, Lusk and Skjellum (1999)
 ↑ Gropp, Lusk and Thakur (1999)
 ↑
One can run this program on many more than 24 processes, however, the output becomes quite excessive
 ↑
The Student’s t distribution is implemented in many numerical packages such as Maple, Mathematica, Matlab, R, Sage etc., so if you need to use to obtain numerical results, it is helpful to use on of these packages.
 ↑ Corral (2011)
 ↑ Many computers and mobile telephones produced today have 2 or more cores and so can be considered parallel, but here we mean computers with over hundreds of cores.
 ↑ Gropp, Lusk and Skjellum (1999)
References
editCorral, M. (2011). Vector Calculus. {{cite book}}
: Cite has empty unknown parameter: coauthors=
(help)
Gropp, W.; Lusk, E.; Skjellum, A. (1999). Using MPI. MIT Press.
Gropp, W.; Lusk, E.; Thakur, R. (1999). Using MPI2. MIT Press.
Li, N. "2decomp&fft". {{cite web}}
: Cite has empty unknown parameter: coauthors=
(help)