Last modified on 6 May 2014, at 08:56

Parallel Spectral Numerical Methods/Introduction to Parallel Programming

Introduction to Parallel ProgrammingEdit

Overview of OpenMP and MPIEdit

To solve large computational problems quickly, it is necessary to take advantage of multiple cores on a CPU (central processing units) and multiple CPUs. Most programs written up until now are sequential and compilers will not typically automatically generate parallel executables, so programmers need to modify the original serial computer code to take advantage of extra processing power. Two standards which specify what libraries that allow for parallel programming should do are OpenMP and MPI (the message passing interface). In this section, we cover the minimal amount of information required to understand, run and modify the programs in this tutorial. More detailed tutorials can be found at https://computing.llnl.gov/tutorials/ and at http://www.citutor.org.

OpenMP is used for parallel programming on shared memory architectures – each compute process has a global view of memory. It allows one to incrementally parallelize an existing Fortran, C or C++ code by adding directives to the original code. It is therefore easy to use. However some care is required in getting good performance when using OpenMP. It is easy to add directives to a serial code, but thought is required in creating a program which will show improved performance and give correct results when made to run in parallel. For the numerical solution of multidimensional partial differential equations on regular grids, it is easy to perform efficient and effective loop based parallelism, so a complete understanding of all the features of OpenMP is not required. OpenMP typically allows one to use 10’s of computational cores, in particular allowing one to take advantage of multicore laptops, desktops and workstations.

MPI is used for parallel programming on distributed-memory architectures – when separate compute processes have access to their own local memory and processes must explicitly receive data held in memory belonging to other processes which have sent the data. MPI is a library which allows one to parallelize Fortran, C and C++ programs by adding function calls which explicitly move data from one process to another. Careful thought is required in converting a serial program to a parallel MPI program because the data needs to be decomposed onto different processes, so it is usually difficult to incrementally parallelize a program that uses MPI. The best way to parallelize a program which will use MPI is problem dependent. When solving large problems, one typically does not have enough memory on each process to simply replicate all the data. Thus one wants to split up the data (known as domain decomposition) in such a way as to minimize the amount of message passing that is required to perform a computation correctly. Programming this can be rather complicated and time consuming. Fortunately, by using the 2DECOMP&FFT library[1] which is written on top of MPI, we can avoid having to program many of the data passing operations when writing Fourier spectral codes and still benefit from being able to solve partial differential equations on up to O(10^5) processor cores.

OpenMPEdit

Please read the tutorial at https://computing.llnl.gov/tutorials/openMP/, then answer the following questions:

OpenMP ExercisesEdit

  1. What is OpenMP?

  2. Download a copy of the latest OpenMP specifications from www.openmp.org. What version number is the latest specification?

  3. Explain what each of the following OpenMP directives does:

    1. !$OMP PARALLEL

    2. !$OMP END PARALLEL

    3. !$OMP PARALLEL DO

    4. !$OMP END PARALLEL DO

    5. !$OMP BARRIER

    6. !$OMP MASTER

    7. !$OMP END MASTER

  4. Try to understand and then run the Hello World program in listing A on 1, 2, 6 and 12 threads. Put the output of each run in your solutions, the output will be in a file of the form
    helloworld.o**********
    where the last entries above are digits corresponding to the number of the run. An example makefile to compile this on Flux is in listing B . An example submission script is in listing C . To change the number of OpenMP processes that the program will run on from say 2 to 6, change
    ppn=2
    to
    ppn=6
    and also change the value of the OMP_NUM_THREADS variable from
    {OMP_NUM_THREADS=2}
    to
    {OMP_NUM_THREADS=6}
    On Flux, there is a maximum of 12 cores per node, so the largest useful number of threads for most applications is 12.

     

     

     

     

    ( A)

    A Fortran program taken from http://en.wikipedia.org/wiki/OpenMP, which demonstrates parallelism using OpenMP Code download
    !--------------------------------------------------------------------
    !
    !
    ! PURPOSE
    !
    ! This program uses OpenMP to print hello world from all available
    ! threads
    !
    ! .. Parameters ..
    !
    ! .. Scalars ..
    ! id = thread id
    ! nthreads = total number of threads
    !
    ! .. Arrays ..
    !
    ! .. Vectors ..
    !
    ! REFERENCES
    ! http:// en.wikipedia.org/wiki/OpenMP
    !
    ! ACKNOWLEDGEMENTS
    ! The program below was modified from one available at the internet
    ! address in the references. This internet address was last checked
    ! on 30 December 2011
    !
    ! ACCURACY
    !
    ! ERROR INDICATORS AND WARNINGS
    !
    ! FURTHER COMMENTS
    !
    !--------------------------------------------------------------------
    ! External routines required
    !
    ! External libraries required
    ! OpenMP library
    PROGRAM hello90
    USE omp_lib
    IMPLICIT NONE
    INTEGER:: id, nthreads
    !$OMP PARALLEL PRIVATE(id)
    id = omp_get_thread_num()
    nthreads = omp_get_num_threads()
    PRINT *, 'Hello World from thread', id
    !$OMP BARRIER
    IF ( id == 0 ) THEN
    PRINT*, 'There are', nthreads, 'threads'
    END IF
    !$OMP END PARALLEL
    END PROGRAM
    


     

     

     

     

    ( B)

    An example makefile for compiling the helloworld program in listing A Code download
    #define the complier
    COMPILER = ifort
    # compilation settings, optimization, precision, parallelization
    FLAGS = -O0 -openmp
     
    # libraries
    LIBS =
    # source list for main program
    SOURCES = helloworld.f90
     
    test: $(SOURCES)
    ${COMPILER} -o helloworld $(FLAGS) $(SOURCES)
     
    clean:
    rm *.o
     
    clobber:
    rm helloworld
    


     

     

     

     

    ( C)

    An example submission script for use on Flux Code download
    #!/bin/bash
    #PBS -N helloworld
    #PBS -l nodes=1:ppn=2,walltime=00:02:00
    #PBS -q flux
    #PBS -l qos=math471f11_flux
    #PBS -A math471f11_flux
    #PBS -M your_uniqname@umich.edu
    #PBS -m abe
    #PBS -V
    #
    # Create a local directory to run and copy your files to local.
    # Let PBS handle your output
    mkdir /tmp/${PBS_JOBID}
    cp ${HOME}/ParallelMethods/helloworldOMP/helloworld /tmp/${PBS_JOBID}/helloworld
    cd /tmp/${PBS_JOBID}
     
    export OMP_NUM_THREADS=2
    ./helloworld
     
    #Clean up your files
    cd ${HOME}/ParallelMethods/helloworldOMP
    /bin/rm -rf /tmp/${PBS_JOBID}
    
  5. Add OpenMP directives to the loops in the 2-D heat equation solver. Run the resulting program on 1,3,6 and 12 threads and record the time it takes to the program to finish. Make a plot of the final iterate.

MPIEdit

A copy of the current MPI standard can be found at http://www.mpi-forum.org/. It allows for parallelization of Fortran, C and C++ programs. There are newer parallel programming languages such as Co-Array Fortran (CAF) and Unified Parallel C (UPC) which allow the programmer to view memory as a single addressable space even on a distributed-memory machine. However, computer hardware limitations imply that most of the programming concepts used when writing MPI programs will be required to write programs in CAF and UPC. Compiler technology for these languages is also not as well developed as compiler technology for older languages such as Fortran and C, so at the present time, Fortran and C dominate high performance computing. An introduction to the essential concepts required for writing and using MPI programs can be found at http://www.shodor.org/refdesk/Resources/Tutorials/. More information on MPI can be found in Gropp, Lusk and Skjellum[2], Gropp, Lusk and Thakur[3] and at https://computing.llnl.gov/tutorials/mpi/. There are many resources available online, however once the basic concepts have been mastered, what is most useful is an index of MPI commands, usually a search engine will give you sources of listings, however we have found the following sites useful:

MPI ExercisesEdit

  1. What does MPI stand for?

  2. Please read the tutorials at http://www.shodor.org/refdesk/Resources/Tutorials/BasicMPI/ and at https://computing.llnl.gov/tutorials/mpi/, then explain what the following commands do:

    • USE mpi or INCLUDE 'mpif.h'

    • MPI_INIT

    • MPI_COMM_SIZE

    • MPI_COMM_RANK

    • MPI_FINALIZE

  3. What is the version number of the current MPI standard?

  4. Try to understand the Hello World program in listing D . Explain how it differs from A . Run the program in listing D on 1, 2, 6, 12 and 24 MPI processes[4]. Put the output of each run in your solutions, the output will be in a file of the form
    helloworld.o**********
    where the last entries above are digits corresponding to the number of the run. An example makefile to compile this on Flux is in listing E . An example submission script is in listing F . To change the number of MPI processes that the program will run on from say 2 to 6, change
    ppn=2
    to
    ppn=6
    and also change the submission script from
    mpirun -np 2 ./helloworld
    to
    mpirun -np 6 ./helloworld.

    On Flux, there is a maximum of 12 cores per node, so if more than 12 MPI processes are required, one needs to change the number of nodes as well. The total number of cores required is equal to the number of nodes multiplied by the number of processes per node. Thus to use 24 processes change
    nodes=1:ppn=2
    to
    nodes=2:ppn=12
    and also change the submission script from
    mpirun -np 2 ./helloworld
    to
    mpirun -np 24 ./helloworld.


     

     

     

     

    ( D)

    A Fortran program which demonstrates parallelizm using MPI Code download
    !--------------------------------------------------------------------
    !
    !
    ! PURPOSE
    !
    ! This program uses MPI to print hello world from all available
    ! processes
    !
    ! .. Parameters ..
    !
    ! .. Scalars ..
    ! myid = process id
    ! numprocs = total number of MPI processes
    ! ierr = error code
    !
    ! .. Arrays ..
    !
    ! .. Vectors ..
    !
    ! REFERENCES
    ! http:// en.wikipedia.org/wiki/OpenMP
    !
    ! ACKNOWLEDGEMENTS
    ! The program below was modified from one available at the internet
    ! address in the references. This internet address was last checked
    ! on 30 December 2011
    !
    ! ACCURACY
    !
    ! ERROR INDICATORS AND WARNINGS
    !
    ! FURTHER COMMENTS
    !
    !--------------------------------------------------------------------
    ! External routines required
    !
    ! External libraries required
    ! MPI library
    PROGRAM hello90
    USE MPI
    IMPLICIT NONE
    INTEGER(kind=4) :: myid, numprocs, ierr
     
    CALL MPI_INIT(ierr)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
     
    PRINT*, 'Hello World from process', myid
    CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
    IF ( myid == 0 ) THEN
    PRINT*, 'There are ', numprocs, ' MPI processes'
    END IF
    CALL MPI_FINALIZE(ierr)	
    END PROGRAM
    


     

     

     

     

    ( E)

    An example makefile for compiling the helloworld program in listing {lst:ForMpiHw} Code download
    #define the complier
    COMPILER = mpif90
    # compilation settings, optimization, precision, parallelization
    FLAGS = -O0
     
    # libraries
    LIBS =
    # source list for main program
    SOURCES = helloworld.f90
     
    test: $(SOURCES)
    ${COMPILER} -o helloworld $(FLAGS) $(SOURCES)
     
    clean:
    rm *.o
     
    clobber:
    rm helloworld
    


     

     

     

     

    ( F)

    An example submission script for use on Flux Code download
    #!/bin/bash
    #PBS -N helloworld
    #PBS -l nodes=1:ppn=2,walltime=00:02:00
    #PBS -q flux
    #PBS -l qos=math471f11_flux
    #PBS -A math471f11_flux
    #PBS -M your_uniqname@umich.edu
    #PBS -m abe
    #PBS -V
    #
    # Create a local directory to run and copy your files to local.
    # Let PBS handle your output
    mkdir /tmp/${PBS_JOBID}
    cp ${HOME}/ParallelMethods/helloworldMPI/helloworld /tmp/${PBS_JOBID}/helloworld
    cd /tmp/${PBS_JOBID}
     
    mpirun -np 2 ./helloworld
     
    #Clean up your files
    cd ${HOME}/ParallelMethods/helloworldMPI
    /bin/rm -rf /tmp/${PBS_JOBID}
    

    A first parallel program: Monte Carlo IntegrationEdit

    To introduce the basics of parallel programming in a context that is a little more complicated than { Hello World}, we will consider Monte Carlo integration. We review important concepts from probability and Riemann integration, and then give example algorithms and explain why parallelization may be helpful.

    ProbabilityEdit

    f:U\subset \mathbb{R}^2 \rightarrow \mathbb{R}_+ is a probability density function if  \int\int_U f \mathrm{d}A =1

    If f is a probability density function which takes the set U\subset\mathbb{R}^2, then the probability of events in the set W\subset U occurring is P(W)=\int\int_W f \mathrm{d}A.

    The joint density for it to snow x inches tomorrow and for Kelly to win y dollar in the lottery tomorrow is given by f=\frac{c}{(1+x)(100+y)} for x,y\in [0,100]\times[0,100] and f=0 otherwise. Find c.

    Suppose X is a random variable with probability density function f_1(x) and Y is a random variable with a probability density function f_2(y). Then X and Y are independent random variables if their joint density function is f(x,y)=f_1(x)f_2(y).

    The probability it will snow tomorrow and the probability Kelly will win the lottery tomorrow are independent random variables.

    If f(x,y) is a probability density function for the random variables X and Y, the X mean is \mu_1=\bar{X}=\int\int xf\mathrm{d}A and the Y mean is \mu_2=\bar{Y}=\int\int yf\mathrm{d}A.

    The X mean and the Y mean are the expected values of X and Y.

    If f(x,y) is a probability density function for the random variables X and Y, the X variance is \sigma_1^2=\overline{(X-\bar{X})^2}=\int\int (x-\bar{X})^2f\mathrm{d}A and the Y variance is \sigma_2^2=\overline{(Y-\bar{Y})^2}=\int\int (y-\bar{Y})^2f\mathrm{d}A.

    The standard deviation is defined to be the square root of the variance.

    Find an expression for the probability that it will snow more than 1.1 times the expected snowfall and also that Kelly will win more than 1.2 times the expected amount in the lottery.

    ExerciseEdit

    • A class is graded on a curve. It is assumed that the class is a representative sample of the population, the probability density function for the numerical score x is given by  f(x)=C\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). For simplicity we assume that x can take on the values -\infty and \infty, though in actual fact the exam is scored from 0 to 100.

      1. Determine C using results from your previous homework.

      2. Suppose there are 240 students in the class and the mean and standard deviation for the class is not reported. As an enterprising student, you poll 60 of your fellow students (we shall suppose they are selected randomly). You find that the mean for these 60 students is 55% and the standard deviation is 10%. Use the Student’s t distribution http://en.wikipedia.org/wiki/Student\%27s_t-distribution to estimate the 90% confidence interval for the actual sample mean. Make a sketch of the t-distribution probability density function and shade the region which corresponds to the 90% confidence interval for the sample mean.[5]

      Remark Fortunately, all the students are hard working, so the possibility of a negative score, although possible, is extremely low, and so we neglect it to make the above computation easier.

    Riemann IntegrationEdit

    Recall that we can approximate integrals by Riemann sums. There are many integrals one cannot evaluate analytically, but for which a numerical answer is required. In this section, we shall explore a simple way of doing this on a computer. Suppose we want to find I2d=\int_0^1\int_0^4 x^2+2y^2\mathrm{d}y\mathrm{d}x. If we do this analytically we find I2d=44. Let us suppose we have forgotten how to integrate, and so we do this numerically. We can do so using the following Matlab code:


     

     

     

     

    ( G)

    A Matlab program which demonstrates how to approximate an integral by a sum Code download
    % A program to approximate an integral
     
    clear all; format compact; format short;
     
    nx=1000; % number of points in x
    xend=1; % last discretization point
    xstart=0; % first discretization point
    dx=(xend-xstart)/(nx-1); % size of each x sub-interval
     
    ny=4000; % number of points in y
    yend=4; % last discretization point
    ystart=0; % first discretization point
    dy=(yend-ystart)/(ny-1); % size of each y sub-interval
     
    % create vectors with points for x and y
    for i=1:nx
        x(i)=xstart+(i-1)*dx;
    end
    for j=1:ny
        y(j)=ystart+(j-1)*dy;
    end
     
    % Approximate the integral by a sum
    I2d=0;
    for i=1:nx
        for j=1:ny
            I2d=I2d+(x(i)^2+2*y(j)^2)*dy*dx;
        end
    end
    % print out final answer
    I2d
    

    We can do something similar in three dimensions. Suppose we want to calculate I3d=\int_0^1\int_0^1\int_0^4 x^2+2y^2+3z^2\mathrm{d}z\mathrm{d}y\mathrm{d}x. Analytically we find that I3d=68

    ExercisesEdit

    1. Modify the Matlab code to perform the three dimensional integral.
    2. Try and determine how the accuracy of either the two or three dimensional method varies as the number of subintervals is changed.

    Monte Carlo IntegrationEdit

    It is possible to extend the above integration schemes to higher and higher dimensional integrals. This can become computationally intensive and an alternate method of integration based on probability is often used. The method we will discuss is called the Monte Carlo method, and the description of this method which follows is based on Chapter 3 of Vector Calculus by Michael Corral[6] which is available at http://www.mecmath.net/ and where Java and Sage programs for doing Monte Carlo integration can be found. The idea behind Monte Carlo integration is based on the concept of the average value of a function, which is encountered in single-variable calculus. Recall that for a continuous function f(x), the average value \bar{f} of f over an interval \lbrack a,b \rbrack is defined as

    
 \bar{f} ~=~ \frac{1}{b-a}\int_a^b f(x)\,dx ~.

     

     

     

     

    ( 1)

    The quantity b-a is the length of the interval \lbrack a,b \rbrack, which can be thought of as the “volume” of the interval. Applying the same reasoning to functions of two or three variables, we define the average value of f(x,y) over a region R to be

    
 \bar{f} ~=~ \frac{1}{A(R)}\iint\limits_{R} f(x,y)\,dA ~,

     

     

     

     

    ( 2)

    where A(R) is the area of the region R, and we define the average value of f(x,y,z) over a solid S to be

    
 \bar{f} ~=~ \frac{1}{V(S)}\iiint\limits_{S} f(x,y,z)\,dV ~,

     

     

     

     

    ( 3)

    where V(S) is the volume of the solid S. Thus, for example, we have

    
 \iint\limits_{R} f(x,y)\,dA ~=~ A(R)\bar{f} ~.

     

     

     

     

    ( 4)

    The average value of f(x,y) over R can be thought of as representing the sum of all the values of f divided by the number of points in R. Unfortunately there are an infinite number (in fact, uncountably many) points in any region, i.e. they can not be listed in a discrete sequence. But what if we took a very large number N of random points in the region R (which can be generated by a computer) and then took the average of the values of f for those points, and used that average as the value of \bar{f}? This is exactly what the Monte Carlo method does. So in formula 4 the approximation we get is

    
 \iint\limits_{R} f(x,y)\,dA ~\approx~ A(R)\bar{f} \pm A(R)\sqrt{\frac{\overline{f^2} - (\bar{f})^2}{N}} ~,

     

     

     

     

    ( 5)

    where

    \bar{f} ~=~ \frac{\sum_{i=1}^N f(x_i,y_i)}{N} \quad \text{and} \quad \overline{f^2} ~=~
 \frac{\sum_{i=1}^N (f(x_i,y_i))^2}{N} ~,

     

     

     

     

    ( 6)

    with the sums taken over the N random points (x_1,y_1), \ldots, (x_N,y_N). The \pm “error term” in formula 5 does not really provide hard bounds on the approximation. It represents a single standard deviation from the expected value of the integral. That is, it provides a likely bound on the error. Due to its use of random points, the Monte Carlo method is an example of a probabilistic method (as opposed to deterministic methods such as the Riemann sum approximation method, which use a specific formula for generating points).

    For example, we can use the formula in eq. 5 to approximate the volume V under the surface z=x^2+2y^2 over the rectangle R =(0,1)\times(0,4). Recall that the actual volume is 44. Below is a Matlab code that calculates the volume using Monte Carlo integration


     

     

     

     

    ( H)

    A Matlab program which demonstrates how to use the Monte Carlo method to calculate the volume below z=x^2+2y^2, with (x,y)\in(0,1)\times(0,4). Code download
    % A program to approximate an integral using the Monte Carlos method
     
    % This program can be made much faster by using Matlab's matrix and vector
    % operations, however to allow easy translation to other languages we have
    % made it as simple as possible.
     
    Numpoints=65536; % number of random points
     
    I2d=0; % Initialize value
    I2dsquare=0; % initial variance
    for n=1:Numpoints
        % generate random number drawn from a uniform distribution on (0,1)
        x=rand(1);
        y=rand(1)*4;
        I2d=I2d+x^2+2*y^2;
        I2dsquare=I2dsquare+(x^2+2*y^2)^2;
    end
    % we scale the integral by the total area and divide by the number of
    % points used
    I2d=I2d*4/Numpoints
    % we also output an estimated error
    I2dsquare=I2dsquare*4/Numpoints;
    EstimError=4*sqrt( (I2d^2-I2dsquare)/Numpoints)
    

    The results of running this program with various numbers of random points are shown below:

    N = 16: 41.3026 +/- 30.9791
    N = 256: 47.1855 +/- 9.0386
    N = 4096: 43.4527 +/- 2.0280
    N = 65536: 44.0026 +/- 0.5151
    

    As you can see, the approximation is fairly good. As N \to \infty, it can be shown that the Monte Carlo approximation converges to the actual volume (on the order of O(\sqrt{N}), in computational complexity terminology).

    In the above example the region R was a rectangle. To use the Monte Carlo method for a nonrectangular (bounded) region R, only a slight modification is needed. Pick a rectangle \tilde{R} that encloses R, and generate random points in that rectangle as before. Then use those points in the calculation of \bar{f} only if they are inside R. There is no need to calculate the area of R for formula ({eqn:monte}) in this case, since the exclusion of points not inside R allows you to use the area of the rectangle \tilde{R} instead, similar to before.

    For instance, one can show that the volume under the surface z=1 over the nonrectangular region R = \left\{ (x,y): 0 \le x^2+y^2 \le 1\right\} is \pi. Since the rectangle \tilde{R} = [-1,1] \times [ -1,1] contains R, we can use a similar program to the one we used, the largest change being a check to see if y^2+x^2 \leq 1 for a random point (x,y) in [-1,1] \times [ -1,1]. A Matlab code listing which demonstrates this is below:

     

     

     

     

    ( I)

    A Matlab program which demonstrates how to use the Monte Carlo method to calculate the area of an irregular region and also to calculate \pi. Code download
    % A program to approximate an integral using the Monte Carlos method
     
    % This program can be made much faster by using Matlab's matrix and vector
    % operations, however to allow easy translation to other languages we have
    % made it as simple as possible.
     
    Numpoints=256; % number of random points
     
    I2d=0; % Initialize value
    I2dsquare=0; % initial variance
    for n=1:Numpoints
        % generate random number drawn from a uniform distribution on (0,1) and
        % scale this to (-1,1)
        x=2*rand(1)-1;
        y=2*rand(1) -1;
        if ((x^2+y^2) <1)
            I2d=I2d+1;
            I2dsquare=I2dsquare+1;
        end
    end
    % We scale the integral by the total area and divide by the number of
    % points used
    I2d=I2d*4/Numpoints
    % we also output an estimated error
    I2dsquare=I2dsquare*4/Numpoints;
    EstimError=4*sqrt( (I2d^2-I2dsquare)/Numpoints)
    

    The results of running the program with various numbers of random points are shown below:

    N = 16: 3.5000 +/- 2.9580
    N = 256: 3.2031 +/- 0.6641
    N = 4096: 3.1689 +/- 0.1639
    N = 65536: 3.1493 +/- 0.0407
    

    To use the Monte Carlo method to evaluate triple integrals, you will need to generate random triples (x,y,z) in a parallelepiped, instead of random pairs (x,y) in a rectangle, and use the volume of the parallelepiped instead of the area of a rectangle in formula 5 . For a more detailed discussion of numerical integration methods, please take a further course in mathematics.

    ExercisesEdit

    1. Write a program that uses the Monte Carlo method to approximate the double integral \iint\limits_{R} e^{xy}\,dA, where R = \lbrack 0,1 \rbrack \times \lbrack 0,1 \rbrack. Show the program output for N=10, 100, 1000, 10000, 100000 and 1000000 random points.
    2. Write a program that uses the Monte Carlo method to approximate the triple integral \iiint\limits_{S} e^{xyz}\,dV, where S= \lbrack 0,1 \rbrack \times \lbrack 0,1 \rbrack \times \lbrack 0,1 \rbrack. Show the program output for N=10, 100, 1000, 10000, 100000 and 1000000 random points.
    3. Use the Monte Carlo method to approximate the volume of a sphere of radius 1.

    Parallel Monte Carlo IntegrationEdit

    As you may have noticed, the algorithms are simple, but can require very many grid points to become accurate. It is therefore useful to run these algorithms on a parallel computer. We will demonstrate a parallel Monte Carlo calculation of \pi. Before we can do this, we need to learn how to use a parallel computer[7].

    We now examine a Fortran program for calculating \pi. These programs are taken from http://chpc.wustl.edu/mpi-fortran.html, where further explanation can be found. The original source of these programs appears to be Gropp, Lusk and Skjellum[8]

    SerialEdit

     

     

     

     

    ( I)

    A serial Fortran program which demonstrates how to calculate \pi using a Monte Carlo method Code download
    !--------------------------------------------------------------------
    !
    !
    ! PURPOSE
    !
    ! This program use a monte carlo method to calculate pi
    !
    ! .. Parameters ..
    ! npts = total number of Monte Carlo points
    ! xmin = lower bound for integration region
    ! xmax = upper bound for integration region
    ! .. Scalars ..
    ! i = loop counter
    ! f = average value from summation
    ! sum = total sum
    ! randnum = random number generated from (0,1) uniform
    ! distribution
    ! x = current Monte Carlo location
    ! .. Arrays ..
    !
    ! .. Vectors ..
    !
    ! REFERENCES
    ! http://chpc.wustl.edu/mpi-fortran.html
    ! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
    !
    ! ACKNOWLEDGEMENTS
    ! The program below was modified from one available at the internet
    ! address in the references. This internet address was last checked
    ! on 30 March 2012
    !
    ! ACCURACY
    !
    ! ERROR INDICATORS AND WARNINGS
    !
    ! FURTHER COMMENTS
    !
    !--------------------------------------------------------------------
    ! External routines required
    !
    ! External libraries required
    ! None
      PROGRAM monte_carlo
        IMPLICIT NONE
     
    INTEGER(kind=8), PARAMETER :: npts = 1e10
        REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
        INTEGER(kind=8) :: i
        REAL(kind=8) :: f,sum, randnum,x
     
        DO i=1,npts
          CALL random_number(randnum)
          x = (xmax-xmin)*randnum + xmin
          sum = sum + 4.0d0/(1.0d0 + x**2)
        END DO
    f = sum/npts
        PRINT*,'PI calculated with ',npts,' points = ',f
     
        STOP
    END
    


     

     

     

     

    ( J)

    An example makefile for compiling the program in listing I Code download
    #define the complier
    COMPILER = mpif90
    # compilation settings, optimization, precision, parallelization
    FLAGS = -O0
     
    # libraries
    LIBS =
    # source list for main program
    SOURCES = montecarloserial.f90
     
    test: $(SOURCES)
    ${COMPILER} -o montecarloserial $(FLAGS) $(SOURCES)
     
    clean:
    rm *.o
     
    clobber:
    rm montecarloserial
    


     

     

     

     

    ( K)

    An example submission script for use on Trestles located at the San Diego Supercomputing Center Code download
    #!/bin/bash
    # the queue to be used.
    #PBS -q shared
    # specify your project allocation
    #PBS -A mia122
    # number of nodes and number of processors per node requested
    #PBS -l nodes=1:ppn=1
    # requested Wall-clock time.
    #PBS -l walltime=00:05:00
    # name of the standard out file to be "output-file".
    #PBS -o job_output
    # name of the job
    #PBS -N MCserial
    # Email address to send a notification to, change "youremail" appropriately
    #PBS -M youremail@umich.edu
    # send a notification for job abort, begin and end
    #PBS -m abe
    #PBS -V
    cd $PBS_O_WORKDIR #change to the working directory
    mpirun_rsh -np 1 -hostfile $PBS_NODEFILE montecarloserial
    

    ParallelEdit

     

     

     

     

    ( L)

    A parallel fortran MPI program to calculate \pi. Code download
    !--------------------------------------------------------------------
    !
    !
    ! PURPOSE
    !
    ! This program uses MPI to do a parallel monte carlo calculation of pi
    !
    ! .. Parameters ..
    ! npts = total number of Monte Carlo points
    ! xmin = lower bound for integration region
    ! xmax = upper bound for integration region
    ! .. Scalars ..
    ! mynpts = this processes number of Monte Carlo points
    ! myid = process id
    ! nprocs = total number of MPI processes
    ! ierr = error code
    ! i = loop counter
    ! f = average value from summation
    ! sum = total sum
    ! mysum = sum on this process
    ! randnum = random number generated from (0,1) uniform
    ! distribution
    ! x = current Monte Carlo location
    ! start = simulation start time
    ! finish = simulation end time
    ! .. Arrays ..
    !
    ! .. Vectors ..
    !
    ! REFERENCES
    ! http://chpc.wustl.edu/mpi-fortran.html
    ! Gropp, Lusk and Skjellum, "Using MPI" MIT press (1999)
    !
    ! ACKNOWLEDGEMENTS
    ! The program below was modified from one available at the internet
    ! address in the references. This internet address was last checked
    ! on 30 March 2012
    !
    ! ACCURACY
    !
    ! ERROR INDICATORS AND WARNINGS
    !
    ! FURTHER COMMENTS
    !
    !--------------------------------------------------------------------
    ! External routines required
    !
    ! External libraries required
    ! MPI library
        PROGRAM monte_carlo_mpi
        USE MPI
        IMPLICIT NONE
     
        INTEGER(kind=8), PARAMETER :: npts = 1e10
        REAL(kind=8), PARAMETER :: xmin=0.0d0,xmax=1.0d0
        INTEGER(kind=8) :: mynpts
        INTEGER(kind=4) :: ierr, myid, nprocs
        INTEGER(kind=8) :: i
        REAL(kind=8) :: f,sum,mysum,randnum
        REAL(kind=8) :: x, start, finish
     
        ! Initialize MPI
        CALL MPI_INIT(ierr)
        CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
        CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
        start=MPI_WTIME()
     
    ! Calculate the number of points each MPI process needs to generate
        IF (myid .eq. 0) THEN
    mynpts = npts - (nprocs-1)*(npts/nprocs)
        ELSE
    mynpts = npts/nprocs
        ENDIF
     
        ! set initial sum to zero
        mysum = 0.0d0
    ! use loop on local process to generate portion of Monte Carlo integral
        DO i=1,mynpts
          CALL random_number(randnum)
          x = (xmax-xmin)*randnum + xmin
          mysum = mysum + 4.0d0/(1.0d0 + x**2)
        ENDDO
     
    ! Do a reduction and sum the results from all processes
        CALL MPI_REDUCE(mysum,sum,1,MPI_DOUBLE_PRECISION,MPI_SUM,&
              0,MPI_COMM_WORLD,ierr)
        finish=MPI_WTIME()
     
        ! Get one process to output the result and running time
        IF (myid .eq. 0) THEN
    f = sum/npts
             PRINT*,'PI calculated with ',npts,' points = ',f
             PRINT*,'Program took ', finish-start, ' for Time stepping'
        ENDIF
     
    CALL MPI_FINALIZE(ierr)
     
        STOP
    END PROGRAM
    


     

     

     

     

    ( M)

    An example makefile for compiling the program in listing L Code download
    #define the complier
    COMPILER = mpif90
    # compilation settings, optimization, precision, parallelization
    FLAGS = -O0
     
    # libraries
    LIBS =
    # source list for main program
    SOURCES = montecarloparallel.f90
     
    test: $(SOURCES)
    ${COMPILER} -o montecarloparallel $(FLAGS) $(SOURCES)
     
    clean:
    rm *.o
     
    clobber:
    rm montecarloparallel
    


     

     

     

     

    ( N)

    An example submission script for use on Trestles located at the San Diego Supercomputing Center Code download
    #!/bin/bash
    # the queue to be used.
    #PBS -q normal
    # specify your project allocation
    #PBS -A mia122
    # number of nodes and number of processors per node requested
    #PBS -l nodes=1:ppn=32
    # requested Wall-clock time.
    #PBS -l walltime=00:05:00
    # name of the standard out file to be "output-file".
    #PBS -o job_output
    # name of the job, you may want to change this so it is unique to you
    #PBS -N MPI_MCPARALLEL
    # Email address to send a notification to, change "youremail" appropriately
    #PBS -M youremail@umich.edu
    # send a notification for job abort, begin and end
    #PBS -m abe
    #PBS -V
     
    # change to the job submission directory
    cd $PBS_O_WORKDIR
    # Run the job
    mpirun_rsh -np 32 -hostfile $PBS_NODEFILE montecarloparallel
    

    ExercisesEdit

    1. Explain why using Monte Carlo to evaluate \int_0^1\frac{1}{1+x^2}\mathrm{d}x allows you to find \pi and, in your own words, explain what the serial and parallel programs do.
    2. Find the time it takes to run the Parallel Monte Carlo program on 32, 64, 128, 256 and 512 cores.
    3. Use a parallel Monte Carlo integration program to evaluate \iint x^2+y^6+\exp(xy)\cos(y\exp(x))\mathrm{d}A over the unit circle.
    4. Use a parallel Monte Carlo integration program to approximate the volume of the ellipsoid \frac{x^2}{9}+\frac{y^2}{4}+
  \frac{z^2}{1}=1. Use either OpenMP or MPI.
    5. Write parallel programs to find the volume of the 4 dimensional sphere 1\geq\sum_{i=1}^4 x_i^2. Try both Monte Carlo and Riemann sum techniques. Use either OpenMP or MPI.

    NotesEdit

    1. 2decomp&fft
    2. Gropp, Lusk and Skjellum (1999)
    3. Gropp, Lusk and Thakur (1999)
    4. One can run this program on many more than 24 processes, however, the output becomes quite excessive

    5. The Student’s t distribution is implemented in many numerical packages such as Maple, Mathematica, Matlab, R, Sage etc., so if you need to use to obtain numerical results, it is helpful to use on of these packages.

    6. Corral (2011)
    7. Many computers and mobile telephones produced today have 2 or more cores and so can be considered parallel, but here we mean computers with over hundreds of cores.
    8. Gropp, Lusk and Skjellum (1999)

    ReferencesEdit

    Corral, M. (2011). Vector Calculus. http://www.mecmath.net/. 


    Gropp, W.; Lusk, E.; Skjellum, A. (1999). Using MPI. MIT Press. 

    Gropp, W.; Lusk, E.; Thakur, R. (1999). Using MPI-2. MIT Press. 

    Li, N. "2decomp&fft". http://www.2decomp.org/.