# The Cubic Nonlinear Schrödinger Equation

## Background

The cubic nonlinear Schrödinger equation occurs in a variety of areas, including, quantum mechanics, nonlinear optics and surface water waves. A general introduction can be found at http://en.wikipedia.org/wiki/Schrodinger_equation and http://en.wikipedia.org/wiki/Nonlinear_Schrodinger_equation. A mathematical introduction to Schrödinger equations can be found in Sulem and Sulem and Yang. In this section we will introduce the idea of operator splitting and then go on to explain how this can be applied to the nonlinear Schrödinger equation in one, two and three dimensions. In one dimension, one can show that the cubic nonlinear Schrödinger equation is subcritical, and hence one has solutions which exist for all time. In two dimensions, it is $H^{1}$  critical, and so solutions may exhibit blow-up of the $H^{1}$  norm, that is the integral of the square of the gradient of the solution can become infinite in finite time. Finally, in three dimensions, the nonlinear Schrödinger equation is $L^{2}$  supercritical, and so the integral of the square of the solution can also become infinite in finite time. For an introduction to norms and Hilbert spaces, see a textbook on partial differential equations or analysis, such as Evans, Linares and Ponce, Lieb and Loss or Renardy and Rogers. A question of interest is how this blow-up occurs and numerical simulations are often used to understand this; see Sulem and Sulem for examples of this. The cubic nonlinear Schrödinger equation is given by

$i\psi _{t}+\Delta \psi \pm |\psi |^{2}\psi =0,$

where $\psi$  is the wave function and $\Delta$  is the Laplacian operator, so in one dimension it is $\partial _{xx}$ , in two dimensions, $\partial _{xx}+\partial _{yy}$  and in three dimensions it is $\partial _{xx}+\partial _{yy}+\partial _{zz}$ . The $+$  corresponds to the focusing cubic nonlinear Schrödinger equation and the $-$  corresponds to the defocusing cubic nonlinear Schrödinger equation. This equation has many conserved quantities, including the “mass”,

$\int _{\Omega }|\psi |^{2}{\text{d}}^{n}\mathbf {x}$

and the “energy”,

$\int _{\Omega }{\frac {1}{2}}|\nabla \psi |^{2}\mp {\frac {1}{4}}|\psi |^{4}{\text{d}}^{n}\mathbf {x}$

where $n$  is the dimension and $\Omega$  is the domain of the solution. As explained by Klein, these two quantities can provide useful checks on the accuracy of numerically generated solutions.

## Splitting

We will consider a numerical method to solve this equation known as splitting. This method occurs in several applications, and is a useful numerical method when the equation can be split into two separate equations, each of which can either be solved exactly, or each part is best solved by a different numerical method. Introductions to splitting can be found in Holden et al., McLachlan and Quispel, Thalhammer, Shen, Tang and Wang, Weideman and Herbst and Yang, and also at http://en.wikipedia.org/wiki/Split-step_method. For those interested in a comparison of time stepping methods for the nonlinear Schrödinger equation, see Klein. To describe the basic idea of the method, we consider an example given in Holden et al., which is the ordinary differential equation,

$u_{t}=u(u-1)$  with $u(t=0)=0.8$ .

( 1)

We can solve this equation relatively simply by separation of variables to find that

$u(t)={\frac {4}{4+\exp(t)}}.$

Now, an interesting observation is that we can also solve the equations $u_{t}=u^{2}$  and $u_{t}=-u$  individually. For the first we get that $u(t)={\frac {u(0)}{1-tu(0)}}$  and for the second we get that $u(t)=u(0)\exp(-t)$ . The principle behind splitting is to solve these two separate equations alternately for short periods of time. We will describe Strang splitting, although there are other forms of splitting, such as Godunov splitting and also additive splittings. We will not describe these here, but refer you to the previously mentioned references, in particular Holden et al.. To understand how we can solve the differential equation using splitting, consider the linear ordinary differential equation

$u_{t}=u+2u$  with $u(0)=1$ .

We can first solve $p_{t}=p$  for a time $\delta t/2$  and then using $q(0)=p(\delta t/2)$ , we solve $q_{t}=2q$  also for a time $\delta t$  to get $q(\delta t)$  and finally solve $r_{t}=r$  for a time $\delta t/2$  with initial data $r(0)=q(\delta t)$ . Thus in this case $p(\delta t)=\exp(\delta t/2)$ , $q(\delta t)=p(\delta t/2)\exp(2\delta t)=\exp(5\delta t/2)$  and $u(\delta t)\approx r(\delta t/2)=q(\delta t)\exp(\delta t/2)=\exp(3\delta t)$ , which in this case is the exact solution. One can perform a similar splitting for matrix differential equations. Consider solving $\mathbf {u} _{t}=(\mathbf {A} +\mathbf {B} )\mathbf {u}$ , where $\mathbf {A}$  and $\mathbf {B}$  are $n\times n$  matrices, the exact solution is $\mathbf {u} =\exp \left((\mathbf {A} +\mathbf {B} )t\right)\mathbf {u} (t=0)$ , and an approximate solution produced after one time step of splitting is $u(\delta t)\approx u(0)\exp(\mathbf {A} \delta t)\exp(\mathbf {B} \delta t)$ , which is not in general equal to $u(t=0)\exp \left((\mathbf {A} +\mathbf {B} )\delta t\right)$  unless the matrices $\mathbf {A}$  and $\mathbf {B}$  commute, and so the error in doing splitting in this case is of the form $(\mathbf {A} \mathbf {B} -\mathbf {B} \mathbf {A} )\delta t$ . Listing A uses Matlab to demonstrate how to do splitting for eq. 1 .

( A)

A Matlab program which uses Strang splitting to solve an ODE Code download
% A program to solve the u_t=u(u-1) using a
% Strang Splitting method

clear all; format compact; format short;
set(0,'defaultaxesfontsize',30,'defaultaxeslinewidth',.7,...
'defaultlinelinewidth',6,'defaultpatchlinewidth',3.7,...
'defaultaxesfontweight','bold')
Nt = 1000; % number of time slices
tmax = 1; % maximum time
dt=tmax/Nt; % increment between times
time=(linspace(1,Nt,Nt)-1)*dt; % time
uexact=4./(4+exp(time)); % exact solution
u(1)=0.8

for i=1:Nt-1
c=-1/u(i);
utemp=-1/(c+0.5*dt);
utemp2=utemp*exp(-dt);
c=-1/utemp2;
u(i+1)=-1/(c+0.5*dt);
end
figure(1)
plot(time,u,'r+',time,uexact,'b-');


( Ap)

A Python program which uses Strang splitting to solve an ODE. Code download
"""
A program to solve u_t'=u(u-1) using a Strang
splitting method
"""

import time
import numpy
import matplotlib.pyplot as plt

Nt = 1000	# Number of timesteps
tmax = 1.0	# Maximum time
dt=tmax/Nt # increment between times
u0 = 0.8 # Initial value
t0 = 0	# Starting time
u = [u0]	# Variables holding the values of iterations
t = [t0]	# Times of discrete solutions

T0 = time.clock()
for i in xrange(Nt):
c = -1.0/u[i]
utemp=-1.0/(c+0.5*dt)
utemp2=utemp*numpy.exp(-dt)
c = -1.0/utemp2
unew=-1.0/(c+0.5*dt)
u.append(unew)
t.append(t[i]+dt)

T = time.clock() - T0
uexact = [4.0/(4.0+numpy.exp(tt)) for tt in t]

print
print "Elapsed time is %f" % (T)

plt.figure()
plt.plot(t,u,'r+',t,uexact,'b-.')
plt.xlabel('Time')
plt.ylabel('Solution')
plt.legend(('Numerical Solution', 'Exact solution'))
plt.title('Numerical solution of du/dt=u(u-1)')
plt.show()


## Exercises

1. Modify the Matlab code to calculate the error at time 1 for several different choices of timestep. Numerically verify that Strang splitting is second order accurate.
2. Modify the Matlab code to use Godunov splitting where one solves $u1_{t}=u1$  for a time $\delta t$  and then using $u1(\delta t)$  as initial data solves $u2_{t}=2u2$  also for a time $\delta t$  to get the approximation to $u(\delta t)$ . Calculate the error at time 1 for several different choices of timestep. Numerically verify that Godunov splitting is first order accurate.

## Serial

For the nonlinear Schrödinger equation

$i\psi _{t}\pm |\psi |^{2}\psi +\Delta \psi =0$ ,

( 2)

we first solve

$i\psi _{t}+\Delta \psi =0$

( 3)

exactly using the Fourier transform to get

$\psi (\delta {t}/2,\cdot )$ .

We then solve

$i\psi _{t}\pm |\psi |^{2}\psi =0$

( 4)

with

$\psi (\delta {t}/2,\cdot )$

as initial data for a time step of $\delta t$ . As explained by Klein and Thalhammer, this can be solved exactly in real space because in eq. 4 , $|\psi |^{2}$  is a conserved quantity at every point in space and time. To show this, let $\psi ^{*}$  denote the complex conjugate of $\psi$ , so that

${\frac {\mathrm {d} |\psi |^{2}}{\mathrm {d} t}}=\psi ^{*}{\frac {\mathrm {d} \psi }{\mathrm {d} t}}+{\frac {\mathrm {d} \psi ^{*}}{\mathrm {d} t}}\psi =\psi ^{*}\left(\pm i|\psi |^{2}\psi \right)+\left(\pm i|\psi |^{2}\psi \right)^{*}\psi =0$ .

Another half step using eq. 3 is then computed using the solution produced by solving eq. 4 to obtain the approximate solution at time $\delta t$ . Example Matlab codes demonstrating splitting follow.

### Example Matlab and Python Programs for the Nonlinear Schrödinger Equation

The program in listing B computes an approximation to an explicitly known exact solution to the focusing nonlinear Schrödinger equation.

( B)

A Matlab program which uses Strang splitting to solve the one dimensional nonlinear Schrödinger equation Code download
% A program to solve the nonlinear Schr\"{o}dinger equation using a
% splitting method

clear all; format compact; format short;
set(0,'defaultaxesfontsize',30,'defaultaxeslinewidth',.7,...
'defaultlinelinewidth',6,'defaultpatchlinewidth',3.7,...
'defaultaxesfontweight','bold')

Lx = 20; % period 2*pi * L
Nx = 16384; % number of harmonics
Nt = 1000; % number of time slices
dt = 0.25*pi/Nt; % time step
U=zeros(Nx,Nt/10);

Es = -1; % focusing or defocusing parameter

% initialise variables
x = (2*pi/Nx)*(-Nx/2:Nx/2 -1)'*Lx; % x coordinate
kx = 1i*[0:Nx/2-1 0 -Nx/2+1:-1]'/Lx; % wave vector
k2x = kx.^2; % square of wave vector
% initial conditions
t=0; tdata(1)=t;
u=4*exp(1i*t)*(cosh(3*x)+3*exp(8*1i*t)*cosh(x))...
./(cosh(4*x)+4*cosh(2*x)+3*cos(8*t));
v=fft(u);
figure(1); clf; plot(x,u);xlim([-2,2]); drawnow;
U(:,1)=u;

% mass
ma = fft(abs(u).^2);
ma0 = ma(1);

% solve pde and plot results
for n =2:Nt+1

vna=exp(0.5*1i*dt*k2x).*v;
una=ifft(vna);
pot=2*(una.*conj(una));
unb=exp(-1i*Es*dt*pot).*una;
vnb=fft(unb);
v=exp(0.5*1i*dt*k2x).*vnb;
t=(n-1)*dt;

if (mod(n,10)==0)
tdata(n/10)=t;
u=ifft(v);
U(:,n/10)=u;
uexact=4*exp(1i*t)*(cosh(3*x)+3*exp(8*1i*t)*cosh(x))...
./(cosh(4*x)+4*cosh(2*x)+3*cos(8*t));
figure(1); clf; plot(x,abs(u).^2); ...
xlim([-0.5,0.5]); title(num2str(t));
figure(2); clf; plot(x,abs(u-uexact).^2);...
xlim([-0.5,0.5]); title(num2str(t));
drawnow;
ma = fft(abs(u).^2);
ma = ma(1);
test = log10(abs(1-ma/ma0))
end
end
figure(3); clf; mesh(tdata(1:(n-1)/10),x,abs(U(:,1:(n-1)/10)).^2);


( Bp)

A Python program which uses Strang splitting to solve the one-dimensional nonlinear Schrödinger equation. Code download
"""
A program to solve the 1D Nonlinear Schrodinger equation using a
second order splitting method. The numerical solution is compared
to an exact soliton solution. The exact equation solved is
iu_t+u_{xx}+2|u|^2u=0

website, in particular:
http://github.enthought.com/mayavi/mayavi/mlab.html
which was last checked on 6 April 2012

"""

import math
import numpy
import matplotlib.pyplot as plt
import time

plt.ion()

# Grid
Lx=16.0 # Period 2*pi*Lx
Nx=8192 # Number of harmonics
Nt=1000 # Number of time slices
tmax=1.0 # Maximum time
dt=tmax/Nt # time step
plotgap=10 # time steps between plots
Es= -1.0 # focusing (+1) or defocusing (-1) parameter
numplots=Nt/plotgap # number of plots to make

x = [i*2.0*math.pi*(Lx/Nx) for i in xrange(-Nx/2,1+Nx/2)]
k_x = (1.0/Lx)*numpy.array([complex(0,1)*n for n in range(0,Nx/2) \
+  + range(-Nx/2+1,0)])

k2xm=numpy.zeros((Nx), dtype=float)
xx=numpy.zeros((Nx), dtype=float)

for i in xrange(Nx):
k2xm[i] = numpy.real(k_x[i]**2)
xx[i]=x[i]

# allocate arrays
usquared=numpy.zeros((Nx), dtype=float)
pot=numpy.zeros((Nx), dtype=float)
u=numpy.zeros((Nx), dtype=complex)
uexact=numpy.zeros((Nx), dtype=complex)
una=numpy.zeros((Nx), dtype=complex)
unb=numpy.zeros((Nx), dtype=complex)
v=numpy.zeros((Nx), dtype=complex)
vna=numpy.zeros((Nx), dtype=complex)
vnb=numpy.zeros((Nx), dtype=complex)
mass=numpy.zeros((Nx), dtype=complex)
test=numpy.zeros((numplots-1),dtype=float)
tdata=numpy.zeros((numplots-1), dtype=float)

t=0.0
u=4.0*numpy.exp(complex(0,1.0)*t)*\
(numpy.cosh(3.0*xx)+3.0*numpy.exp(8.0*complex(0,1.0)*t)*numpy.cosh(xx))\
/(numpy.cosh(4*xx)+4.0*numpy.cosh(2.0*xx)+3.0*numpy.cos(8.0*t))
uexact=u
v=numpy.fft.fftn(u)
usquared=abs(u)**2
fig =plt.figure()
ax.plot(xx,numpy.real(u),'b-')
plt.xlabel('x')
plt.ylabel('real u')
ax.plot(xx,numpy.imag(u),'b-')
plt.xlabel('x')
plt.ylabel('imaginary u')
ax.plot(xx,abs(u-uexact),'b-')
plt.xlabel('x')
plt.ylabel('error')
plt.show()

# initial mass
usquared=abs(u)**2
mass=numpy.fft.fftn(usquared)
ma=numpy.real(mass)
maO=ma
tdata=t
plotnum=0
#solve pde and plot results
for nt in xrange(numplots-1):
for n in xrange(plotgap):
vna=v*numpy.exp(complex(0,0.5)*dt*k2xm)
una=numpy.fft.ifftn(vna)
usquared=2.0*abs(una)**2
pot=Es*usquared
unb=una*numpy.exp(complex(0,-1)*dt*pot)
vnb=numpy.fft.fftn(unb)
v=vnb*numpy.exp(complex(0,0.5)*dt*k2xm)
u=numpy.fft.ifftn(v)
t+=dt
plotnum+=1
usquared=abs(u)**2
uexact = 4.0*numpy.exp(complex(0,1.0)*t)*\
(numpy.cosh(3.0*xx)+3.0*numpy.exp(8.0*complex(0,1.0)*t)*numpy.cosh(xx))\
/(numpy.cosh(4*xx)+4.0*numpy.cosh(2.0*xx)+3.0*numpy.cos(8.0*t))
plt.cla()
ax.plot(xx,numpy.real(u),'b-')
plt.title(t)
plt.xlabel('x')
plt.ylabel('real u')
plt.cla()
ax.plot(xx,numpy.imag(u),'b-')
plt.xlabel('x')
plt.ylabel('imaginary u')
plt.cla()
ax.plot(xx,abs(u-uexact),'b-')
plt.xlabel('x')
plt.ylabel('error')
plt.draw()
mass=numpy.fft.fftn(usquared)
ma=numpy.real(mass)
test[plotnum-1]=numpy.log(abs(1-ma/maO))
print(test[plotnum-1])
tdata[plotnum-1]=t

plt.ioff()
plt.show()


( C)

A Matlab program which uses Strang splitting to solve the two dimensional nonlinear Schrödinger equation Code download
% A program to solve the 2D nonlinear Schr\"{o}dinger equation using a
% splitting method

clear all; format compact; format short;
set(0,'defaultaxesfontsize',30,'defaultaxeslinewidth',.7,...
'defaultlinelinewidth',6,'defaultpatchlinewidth',3.7,'defaultaxesfontweight','bold')

% set up grid
tic
Lx = 20; % period 2*pi*L
Ly = 20; % period 2*pi*L
Nx = 2*256; % number of harmonics
Ny = 2*256; % number of harmonics
Nt = 100; % number of time slices
dt = 5.0/Nt; % time step

Es = 1.0;

% initialise variables
x = (2*pi/Nx)*(-Nx/2:Nx/2 -1)'*Lx; % x coordinate
kx = 1i*[0:Nx/2-1 0 -Nx/2+1:-1]'/Lx; % wave vector
y = (2*pi/Ny)*(-Ny/2:Ny/2 -1)'*Ly; % y coordinate
ky = 1i*[0:Ny/2-1 0 -Ny/2+1:-1]'/Ly; % wave vector
[xx,yy]=meshgrid(x,y);
[k2xm,k2ym]=meshgrid(kx.^2,ky.^2);
% initial conditions
u = exp(-(xx.^2+yy.^2));
v=fft2(u);
figure(1); clf; mesh(xx,yy,u); drawnow;
t=0; tdata(1)=t;

% mass
ma = fft2(abs(u).^2);
ma0 = ma(1,1);

% solve pde and plot results
for n =2:Nt+1
vna=exp(0.5*1i*dt*(k2xm + k2ym)).*v;
una=ifft2(vna);
pot=Es*((abs(una)).^2);
unb=exp(-1i*dt*pot).*una;
vnb=fft2(unb);
v=exp(0.5*1i*dt*(k2xm + k2ym)).*vnb;
u=ifft2(v);
t=(n-1)*dt;
tdata(n)=t;
if (mod(n,10)==0)
figure(2); clf; mesh(xx,yy,abs(u).^2); title(num2str(t));
drawnow;
ma = fft2(abs(u).^2);
ma = ma(1,1);
test = log10(abs(1-ma/ma0))
end
end
figure(4); clf; mesh(xx,yy,abs(u).^2);
toc


( Cp)

A Python program which uses Strang splitting to solve the two-dimensional nonlinear Schrödinger equation. Code download
"""
A program to solve the 2D Nonlinear Schrodinger equation using a
second order splitting method

website, in particular:
http://github.enthought.com/mayavi/mayavi/mlab.html
which was last checked on 6 April 2012

"""

import math
import numpy
from mayavi import mlab
import matplotlib.pyplot as plt
import time

# Grid
Lx=4.0 # Period 2*pi*Lx
Ly=4.0 # Period 2*pi*Ly
Nx=64 # Number of harmonics
Ny=64 # Number of harmonics
Nt=100 # Number of time slices
tmax=1.0 # Maximum time
dt=tmax/Nt # time step
plotgap=10 # time steps between plots
Es= 1.0 # focusing (+1) or defocusing (-1) parameter
numplots=Nt/plotgap # number of plots to make

x = [i*2.0*math.pi*(Lx/Nx) for i in xrange(-Nx/2,1+Nx/2)]
y = [i*2.0*math.pi*(Ly/Ny) for i in xrange(-Ny/2,1+Ny/2)]
k_x = (1.0/Lx)*numpy.array([complex(0,1)*n for n in range(0,Nx/2) \
+  + range(-Nx/2+1,0)])
k_y = (1.0/Ly)*numpy.array([complex(0,1)*n for n in range(0,Ny/2) \
+  + range(-Ny/2+1,0)])

k2xm=numpy.zeros((Nx,Ny), dtype=float)
k2ym=numpy.zeros((Nx,Ny), dtype=float)
xx=numpy.zeros((Nx,Ny), dtype=float)
yy=numpy.zeros((Nx,Ny), dtype=float)

for i in xrange(Nx):
for j in xrange(Ny):
k2xm[i,j] = numpy.real(k_x[i]**2)
k2ym[i,j] = numpy.real(k_y[j]**2)
xx[i,j]=x[i]
yy[i,j]=y[j]

# allocate arrays
usquared=numpy.zeros((Nx,Ny), dtype=float)
pot=numpy.zeros((Nx,Ny), dtype=float)
u=numpy.zeros((Nx,Ny), dtype=complex)
una=numpy.zeros((Nx,Ny), dtype=complex)
unb=numpy.zeros((Nx,Ny), dtype=complex)
v=numpy.zeros((Nx,Ny), dtype=complex)
vna=numpy.zeros((Nx,Ny), dtype=complex)
vnb=numpy.zeros((Nx,Ny), dtype=complex)
mass=numpy.zeros((Nx,Ny), dtype=complex)
test=numpy.zeros((numplots-1),dtype=float)
tdata=numpy.zeros((numplots-1), dtype=float)

u=numpy.exp(-(xx**2 + yy**2 ))
v=numpy.fft.fftn(u)
usquared=abs(u)**2
src = mlab.surf(xx,yy,usquared,colormap='YlGnBu',warp_scale='auto')
mlab.scalarbar()
mlab.xlabel('x',object=src)
mlab.ylabel('y',object=src)
mlab.zlabel('abs(u)^2',object=src)

# initial mass
usquared=abs(u)**2
mass=numpy.fft.fftn(usquared)
ma=numpy.real(mass[0,0])
print(ma)
maO=ma
t=0.0
tdata=t
plotnum=0
#solve pde and plot results
for nt in xrange(numplots-1):
for n in xrange(plotgap):
vna=v*numpy.exp(complex(0,0.5)*dt*(k2xm+k2ym))
una=numpy.fft.ifftn(vna)
usquared=abs(una)**2
pot=Es*usquared
unb=una*numpy.exp(complex(0,-1)*dt*pot)
vnb=numpy.fft.fftn(unb)
v=vnb*numpy.exp(complex(0,0.5)*dt*(k2xm+k2ym) )
u=numpy.fft.ifftn(v)
t+=dt
plotnum+=1
usquared=abs(u)**2
src.mlab_source.scalars = usquared
mass=numpy.fft.fftn(usquared)
ma=numpy.real(mass[0,0])
test[plotnum-1]=numpy.log(abs(1-ma/maO))
print(test[plotnum-1])
tdata[plotnum-1]=t

plt.figure()
plt.plot(tdata,test,'r-')
plt.title('Time Dependence of Change in Mass')
plt.show()


( D)

A Matlab program which uses Strang splitting to solve the three dimensional nonlinear Schrödinger equation Code download
% A program to solve the 3D nonlinear Schr\"{o}dinger equation using a
% splitting method
% updated by Abdullah Bargash , AbdulAziz Al-thiban
%
clear all; format compact; format short;
set(0,'defaultaxesfontsize',30,'defaultaxeslinewidth',.7,...
'defaultlinelinewidth',6,'defaultpatchlinewidth',3.7,...
'defaultaxesfontweight','bold')

% set up grid
tic
Lx = 4; % period 2*pi*L
Ly = 4; % period 2*pi*L
Lz = 4; % period 2*pi*L
Nx = 64; % number of harmonics
Ny = 64; % number of harmonics
Nz = 64; % number of harmonics
Nt = 100; % number of time slices
dt = 1.0/Nt; % time step

Es = 1.0; % focusing or defocusing parameter

% initialise variables
x = (2*pi/Nx)*(-Nx/2:Nx/2 -1)'*Lx; % x coordinate
kx = 1i*[0:Nx/2-1 0 -Nx/2+1:-1]'/Lx; % wave vector
y = (2*pi/Ny)*(-Ny/2:Ny/2 -1)'*Ly; % y coordinate
ky = 1i*[0:Ny/2-1 0 -Ny/2+1:-1]'/Ly; % wave vector
z = (2*pi/Nz)*(-Nz/2:Nz/2 -1)'*Lz; % y coordinate
kz = 1i*[0:Nz/2-1 0 -Nz/2+1:-1]'/Lz; % wave vector
[xx,yy,zz]=meshgrid(x,y,z);
[k2xm,k2ym,k2zm]=meshgrid(kx.^2,ky.^2,kz.^2);

% initial conditions
u = exp(-(xx.^2+yy.^2+zz.^2));
v=fftn(u);
figure(1); clf; UP = abs(u).^2;
H = vol3d('CData',UP,'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);

xlabel('x'); ylabel('y'); zlabel('z');
axis equal; axis square; view(3); drawnow;
t=0; tdata(1)=t;

% mass
ma = fftn(abs(u).^2);
ma0 = ma(1,1,1);

% solve pde and plot results

for n =2:Nt+1
vna=exp(0.5*1i*dt*(k2xm + k2ym + k2zm)).*v;
una=ifftn(vna);
pot=Es*((abs(una)).^2);
unb=exp(-1i*dt*pot).*una;
vnb=fftn(unb);
v=exp(0.5*1i*dt*(k2xm + k2ym + k2zm)).*vnb;
u=ifftn(v);
t=(n-1)*dt;
tdata(n)=t;
if (mod(n,10)==0)

figure(1); clf; UP = abs(u).^2;

H = vol3d('CData',UP,'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);
title(num2str(t));

xlabel('x'); ylabel('y'); zlabel('z');
axis equal; axis square; view(3); drawnow;
ma = fftn(abs(u).^2);
ma = ma(1,1,1); test = log10(abs(1-ma/ma0))

end

end

figure(1); clf; UP = abs(u).^2;

H = vol3d('CData',UP,'texture','3D','XData',x,'YData',y,'ZData',z);
xlabel('x'); ylabel('y'); zlabel('z'); colorbar;
axis equal; axis square; view(3);


## Example One-Dimensional Fortran Program for the Nonlinear Schrödinger Equation

Before considering parallel programs, we need to understand how to write a Fortran code for the one-dimensional nonlinear Schrödinger equation. Below is an example Fortran program followed by a Matlab plotting script to visualize the results. In compiling the Fortran program a standard Fortran compiler and the FFTW library are required. Since the commands required for this are similar to those in the makefile for the heat equation, we do not include them here.

( E)

A Fortran program to solve the 1D nonlinear Schrödinger equation using splitting Code download
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program solves nonlinear Schrodinger equation in 1 dimension
! i*u_t+Es*|u|^2u+u_{xx}=0
! using a second order time spectral splitting scheme
!
! The boundary conditions are u(0)=u(2*L*\pi)
! The initial condition is u=exp(-x^2)
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Nt = number of timesteps to take
! Tmax = maximum simulation time
! plotgap = number of timesteps between plots
! FFTW_IN_PLACE = value for FFTW input
! FFTW_MEASURE = value for FFTW input
! FFTW_EXHAUSTIVE = value for FFTW input
! FFTW_PATIENT = value for FFTW input
! FFTW_ESTIMATE = value for FFTW input
! FFTW_FORWARD = value for FFTW input
! FFTW_BACKWARD = value for FFTW input
! pi = 3.14159265358979323846264338327950288419716939937510d0
! L = width of box
! ES = +1 for focusing and -1 for defocusing
! .. Scalars ..
! i = loop counter in x direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! start = variable to record start time of program
! finish = variable to record end time of program
! count_rate = variable for clock count rate
! planfx = Forward 1d fft plan in x
! planbx = Backward 1d fft plan in x
! dt = timestep
! .. Arrays ..
! u = approximate solution
! v = Fourier transform of approximate solution
! .. Vectors ..
! una = temporary field
! unb = temporary field
! vna = temporary field
! pot = potential
! kx = fourier frequencies in x direction
! x = x locations
! time = times at which save data
! name_config = array to store filename for data to be saved
! fftfx = array to setup x Fourier transform
! fftbx = array to setup x Fourier transform
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! Check that the initial iterate is consistent with the
! boundary conditions for the domain specified
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! FFTW3 -- Fast Fourier Transform in the West Library
! (http://www.fftw.org/)

PROGRAM main

! Declare variables
IMPLICIT NONE
INTEGER(kind=4), PARAMETER :: Nx=8*256
INTEGER(kind=4), PARAMETER	:: Nt=200
REAL(kind=8), PARAMETER	&
:: pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER	:: L=5.0d0
REAL(kind=8), PARAMETER	:: Es=1.0d0
REAL(kind=8)	:: dt=2.0d0/Nt
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: kx
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: x
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE	:: u
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE	:: v
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: una,vn
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: unb,pot
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: time
INTEGER(kind=4)	:: i,j,k,n,modes,AllocateStatus
INTEGER(kind=4)	:: start, finish, count_rate
INTEGER(kind=4), PARAMETER	:: FFTW_IN_PLACE = 8, FFTW_MEASURE = 0, &
FFTW_EXHAUSTIVE = 8, FFTW_PATIENT = 32, FFTW_ESTIMATE = 64
INTEGER(kind=4), PARAMETER :: FFTW_FORWARD = -1, FFTW_BACKWARD=1
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: fftfx,fftbx
INTEGER(kind=8)	:: planfx,planbx
CHARACTER*100	:: name_config

CALL system_clock(start,count_rate)
ALLOCATE(kx(1:Nx),x(1:Nx),u(1:Nx,1:Nt+1),v(1:Nx,1:Nt+1),&
una(1:Nx),vn(1:Nx),unb(1:Nx),pot(1:Nx),time(1:Nt+1),&
fftfx(1:Nx),fftbx(1:Nx),stat=AllocateStatus)
IF (allocatestatus .ne. 0) STOP
! set up ffts
CALL dfftw_plan_dft_1d_(planfx,Nx,fftfx(1:Nx),fftbx(1:Nx),&
FFTW_FORWARD,FFTW_PATIENT)
CALL dfftw_plan_dft_1d_(planbx,Nx,fftbx(1:Nx),fftfx(1:Nx),&
FFTW_BACKWARD,FFTW_PATIENT)
PRINT *,'Setup FFTs'
! setup fourier frequencies
DO i=1,1+Nx/2
kx(i)= cmplx(0.0d0,1.0d0)*(i-1.0d0)/L
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
DO i=1,Nx
x(i)=(-1.0d0 + 2.0d0*REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)))*pi*L
END DO
PRINT *,'Setup grid and fourier frequencies'

DO i=1,Nx
u(i,1)=exp(-1.0d0*(x(i)**2))
END DO
! transform initial data
CALL dfftw_execute_dft_(planfx,u(1:Nx,1),v(1:Nx,1))
PRINT *,'Got initial data, starting timestepping'
time(1)=0.0d0
DO n=1,Nt
time(n+1)=n*dt
DO i=1,Nx
vn(i)=exp(0.5d0*dt*kx(i)*kx(i)*cmplx(0.0d0,1.0d0))*v(i,n)
END DO
CALL dfftw_execute_dft_(planbx,vn(1:Nx),una(1:Nx))
! normalize
DO i=1,Nx
una(i)=una(1:Nx)/REAL(Nx,kind(0d0))
pot(i)=Es*una(i)*conjg(una(i))
unb(i)=exp(cmplx(0.0d0,-1.0d0)*dt*pot(i))*una(i)
END DO
CALL dfftw_execute_dft_(planfx,unb(1:Nx),vn(1:Nx))
DO i=1,Nx
v(i,n+1)=exp(0.50d0*dt*kx(i)*kx(i)*cmplx(0.0d0,1.0d0))*vn(i)
END DO
CALL dfftw_execute_dft_(planbx,v(1:Nx,n+1),u(1:Nx,n+1))
! normalize
DO i=1,Nx
u(i,n+1)=u(i,n+1)/REAL(Nx,kind(0d0))
END DO
END DO
PRINT *,'Finished time stepping'
CALL system_clock(finish,count_rate)
PRINT*,'Program took ',&
REAL(finish-start,kind(0d0))/REAL(count_rate,kind(0d0)),'for execution'

name_config = 'u.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Nt
DO i=1,Nx
WRITE(11,*) abs(u(i,j))**2
END DO
END DO
CLOSE(11)

name_config = 'tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Nt
WRITE(11,*) time(j)
END DO
CLOSE(11)

name_config = 'xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)

PRINT *,'Saved data'

CALL dfftw_destroy_plan_(planbx)
CALL dfftw_destroy_plan_(planfx)
CALL dfftw_cleanup_()

DEALLOCATE(kx,x,u,v,una,vn,unb,&
pot,time,fftfx,fftbx,&
stat=AllocateStatus)
IF (allocatestatus .ne. 0) STOP
PRINT *,'deallocated memory'
PRINT *,'Program execution complete'
END PROGRAM main


( F)

A Matlab program which plots a numerical solution to a 1D nonlinear Schrödinger equation generated by listing E Code download
% A program to plot the computed results

clear all; format compact, format short,
set(0,'defaultaxesfontsize',18,'defaultaxeslinewidth',.9,...
'defaultlinelinewidth',3.5,'defaultpatchlinewidth',5.5);

Tsteps = length(tdata);

Nx = length(xcoord); Nt = length(tdata);

u = reshape(u,Nx,Nt);

% Plot data
figure(3); clf; mesh(tdata,xcoord,u); xlabel t; ylabel x; zlabel('|u|^2');


## Shared Memory Parallel: OpenMP

We recall that OpenMP is a set of compiler directives that can allow one to easily make a Fortran, C or C++ program run on a shared memory machine – that is a computer for which all compute processes can access the same globally addressed memory space. It allows for easy parallelization of serial programs which have already been written in one of the aforementioned languages.

We will demonstrate one form of parallelizm for the two dimensional nonlinear Schrödinger equation in which we will parallelize the loops using OpenMP commands, but will use the threaded FFTW library to parallelize the transforms for us. The example programs are in listing E . A second method to parallelize the loops and Fast Fourier transforms explicitly using OpenMP commands is outlined in the exercises.

( G)

An OpenMP Fortran program to solve the 2D nonlinear Schrödinger equation using splitting and threaded FFTW Code download
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program solves nonlinear Schrodinger equation in 2 dimensions
! i*u_t+Es*|u|^2u+u_{xx}+u_{yy}=0
! using a second order time spectral splitting scheme
!
! The boundary conditions are u(x=0,y)=u(2*Lx*\pi,y),
! u(x,y=0)=u(x,y=2*Ly*\pi)
! The initial condition is u=exp(-x^2-y^2)
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Ny = number of modes in y - power of 2 for FFT
! Nt = number of timesteps to take
! Tmax = maximum simulation time
! plotgap = number of timesteps between plots
! FFTW_IN_PLACE = value for FFTW input
! FFTW_MEASURE = value for FFTW input
! FFTW_EXHAUSTIVE = value for FFTW input
! FFTW_PATIENT = value for FFTW input
! FFTW_ESTIMATE = value for FFTW input
! FFTW_FORWARD = value for FFTW input
! FFTW_BACKWARD = value for FFTW input
! pi = 3.14159265358979323846264338327950288419716939937510d0
! Lx = width of box in x direction
! Ly = width of box in y direction
! ES = +1 for focusing and -1 for defocusing
! .. Scalars ..
! i = loop counter in x direction
! j = loop counter in y direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! ierr = error return code
! start = variable to record start time of program
! finish = variable to record end time of program
! count_rate = variable for clock count rate
! planfx = Forward 1d fft plan in x
! planbx = Backward 1d fft plan in x
! planfy = Forward 1d fft plan in y
! planby = Backward 1d fft plan in y
! dt = timestep
! .. Arrays ..
! u = approximate solution
! v = Fourier transform of approximate solution
! unax = temporary field
! vnax = temporary field
! vnbx = temporary field
! vnay = temporary field
! vnby = temporary field
! potx = potential
! .. Vectors ..
! kx = fourier frequencies in x direction
! ky = fourier frequencies in y direction
! x = x locations
! y = y locations
! time = times at which save data
! name_config = array to store filename for data to be saved
! fftfx = array to setup x Fourier transform
! fftbx = array to setup x Fourier transform
! fftfy = array to setup y Fourier transform
! fftby = array to setup y Fourier transform
!
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! Check that the initial iterate is consistent with the
! boundary conditions for the domain specified
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! FFTW3 -- Fast Fourier Transform in the West Library
! (http://www.fftw.org/)
! OpenMP library
PROGRAM main
USE omp_lib
IMPLICIT NONE
! Declare variables
INTEGER(kind=4), PARAMETER :: Nx=1024
INTEGER(kind=4), PARAMETER :: Ny=1024
INTEGER(kind=4), PARAMETER	:: Nt=20
INTEGER(kind=4), PARAMETER	:: plotgap=5
REAL(kind=8), PARAMETER	:: &
pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER	:: Lx=2.0d0
REAL(kind=8), PARAMETER	:: Ly=2.0d0
REAL(kind=8), PARAMETER	:: Es=1.0d0
REAL(kind=8)	:: dt=0.10d0/Nt
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: kx
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: ky
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: x
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: y
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE:: unax,vnax,vnbx,potx
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE:: vnay,vnby
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: time
INTEGER(kind=4)	:: i,j,k,n,allocatestatus,ierr
INTEGER(kind=4)	:: start, finish, count_rate, numthreads
INTEGER(kind=8), PARAMETER :: FFTW_IN_PLACE=8, FFTW_MEASURE=0,&
FFTW_EXHAUSTIVE=8, FFTW_PATIENT=32,&
FFTW_ESTIMATE=64
INTEGER(kind=8),PARAMETER :: FFTW_FORWARD=-1, FFTW_BACKWARD=1
INTEGER(kind=8)	:: planfxy,planbxy
CHARACTER*100	:: name_config,number_file

ALLOCATE(kx(1:Nx),ky(1:Nx),x(1:Nx),y(1:Nx),unax(1:Nx,1:Ny),&
vnax(1:Nx,1:Ny),potx(1:Nx,1:Ny),time(1:1+Nt/plotgap),&
stat=allocatestatus)
IF (allocatestatus .ne. 0) stop
PRINT *,'allocated memory'

PRINT *,'Inidicated number of threads to be used in planning'
CALL dfftw_plan_dft_2d_(planfxy,Nx,Ny,unax(1:Nx,1:Ny),vnax(1:Nx,1:Ny),&
FFTW_FORWARD,FFTW_ESTIMATE)
CALL dfftw_plan_dft_2d_(planbxy,Nx,Ny,vnax(1:Nx,1:Ny),unax(1:Nx,1:Ny),&
FFTW_BACKWARD,FFTW_ESTIMATE)
PRINT *,'Setup FFTs'

! setup fourier frequencies
!$OMP PARALLEL PRIVATE(i,j) !$OMP DO SCHEDULE(static)
DO i=1,1+Nx/2
kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx
END DO
!$OMP END DO kx(1+Nx/2)=0.0d0 !$OMP DO SCHEDULE(static)
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
!$OMP END DO !$OMP DO SCHEDULE(static)
DO i=1,Nx
x(i)=(-1.0d0+2.0d0*REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)) )*pi*Lx
END DO
!$OMP END DO !$OMP DO SCHEDULE(static)
DO j=1,1+Ny/2
ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly
END DO
!$OMP END DO ky(1+Ny/2)=0.0d0 !$OMP DO SCHEDULE(static)
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
!$OMP END DO !$OMP DO SCHEDULE(static)
DO j=1,Ny
y(j)=(-1.0d0+2.0d0*REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)) )*pi*Ly
END DO
!$OMP END DO PRINT *,'Setup grid and fourier frequencies' !$OMP DO SCHEDULE(static)
DO j=1,Ny
unax(1:Nx,j)=exp(-1.0d0*(x(1:Nx)**2 +y(j)**2))
END DO
!$OMP END DO !$OMP END PARALLEL
name_config = 'uinitial.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
DO i=1,Nx
WRITE(11,*) abs(unax(i,j))**2
END DO
END DO
CLOSE(11)
! transform initial data and do first half time step
CALL dfftw_execute_dft_(planfxy,unax(1:Nx,1:Ny),vnax(1:Nx,1:Ny))

PRINT *,'Got initial data, starting timestepping'
time(1)=0.0d0
CALL system_clock(start,count_rate)
DO n=1,Nt
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny DO i=1,Nx vnax(i,j)=exp(0.5d0*dt*(kx(i)*kx(i) + ky(j)*ky(j))& *cmplx(0.0d0,1.0d0))*vnax(i,j) END DO END DO !$OMP END PARALLEL DO
CALL dfftw_execute_dft_(planbxy,vnax(1:Nx,1:Ny),unax(1:Nx,1:Ny))
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny DO i=1,Nx unax(i,j)=unax(i,j)/REAL(Nx*Ny,kind(0d0)) potx(i,j)=Es*unax(i,j)*conjg(unax(i,j)) unax(i,j)=exp(cmplx(0.0d0,-1.0d0)*dt*potx(i,j))& *unax(i,j) END DO END DO !$OMP END PARALLEL DO
CALL dfftw_execute_dft_(planfxy,unax(1:Nx,1:Ny),vnax(1:Nx,1:Ny))
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO j=1,Ny DO i=1,Nx vnax(i,j)=exp(0.5d0*dt*(kx(i)*kx(i) + ky(j)*ky(j))& *cmplx(0.0d0,1.0d0))*vnax(i,j) END DO END DO !$OMP END PARALLEL DO
IF (mod(n,plotgap)==0) then
time(1+n/plotgap)=n*dt
PRINT *,'time',n*dt
CALL dfftw_execute_dft_(planbxy,vnax(1:Nx,1:Ny),unax(1:Nx,1:Ny))
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny DO i=1,Nx unax(i,j)=unax(i,j)/REAL(Nx*Ny,kind(0d0)) END DO END DO !$OMP END PARALLEL DO
name_config='./data/u'
WRITE(number_file,'(i0)') 10000000+1+n/plotgap
ind=index(name_config,' ') -1
name_config=name_config(1:ind)//numberfile
ind=index(name_config,' ') -1
name_config=name_config(1:ind)//'.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
DO i=1,Nx
WRITE(11,*) abs(unax(i,j))**2
END DO
END DO
CLOSE(11)
END IF
END DO
PRINT *,'Finished time stepping'
CALL system_clock(finish,count_rate)
PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),&
'for Time stepping'

name_config = 'tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,1+Nt/plotgap
WRITE(11,*) time(j)
END DO
CLOSE(11)

name_config = 'xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)

name_config = 'ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
PRINT *,'Saved data'

CALL dfftw_destroy_plan_(planbxy)
CALL dfftw_destroy_plan_(planfxy)

DEALLOCATE(unax,vnax,potx,stat=allocatestatus)
IF (allocatestatus .ne. 0) STOP
PRINT *,'Deallocated memory'

PRINT *,'Program execution complete'
END PROGRAM main


( H)

An example makefile for compiling the OpenMP program in listing G Code download
#define the complier
COMPILER = gfortran
# compilation settings, optimization, precision, parallelization
FLAGS = -O3 -fopenmp

# libraries
LIBS = -L/usr/local/lib -lfftw3 -lm
# source list for main program
SOURCES = NLSsplitting.f90

test: $(SOURCES)${COMPILER} -o NLSsplitting $(FLAGS)$(SOURCES) $(LIBS) clean: rm *.o clobber: rm NLSsplitting  The example assumes one is using Flux and has loaded environments for the GCC compiler as well as the GCC compiled version of FFTW. To use the Intel compiler to with this code, the OMP stack size needs to be explicitly set to be large enough. If one is using the PGI compilers instead of the GCC compilers, change the flag $-fopenmp$ to $-mp$ in the makefile. ( I) A Matlab program which plots a numerical solution to a 2D nonlinear Schrödinger equation generated by listing G Code download % A program to plot the computed results for the 2D NLS equation clear all; format compact, format short, set(0,'defaultaxesfontsize',18,'defaultaxeslinewidth',.9,... 'defaultlinelinewidth',3.5,'defaultpatchlinewidth',5.5); % Load data load('./ufinal.dat'); load('./tdata.dat'); load('./ycoord.dat'); load('./xcoord.dat'); Ny = length(ycoord); Nx = length(xcoord); Nt = length(tdata); ufinal = reshape(ufinal,Nx,Ny); % Plot data figure(3); clf; mesh(xcoord,ycoord,ufinal); xlabel x; ylabel y; zlabel('|u|^2');  ( J) An example submission script for use on Flux. Change {your_username} appropriately. Code download #!/bin/bash #PBS -N NLS #PBS -l nodes=1:ppn=2,walltime=00:03:00 #PBS -q flux #PBS -l qos=math471f11_flux #PBS -A math471f11_flux #PBS -M your_username@umich.edu #PBS -m abe #PBS -V # # Create a local directory to run and copy your files to local. # Let PBS handle your output cp${HOME}/parallelspectralintro/NLSsplitting /nobackup/your_username/NLSsplitting

./NLSsplitting


## Exercises

1. Download the example Matlab programs which accompany the pre-print by Klein, Muite and Roidot. Examine how the mass and energy for these Schrödinger like equations are computed. Add code to check conservation of mass and energy to the Matlab programs for the nonlinear Schrödinger equation.

2. The Gross-Pitaevskii equation is given by $i\psi _{t}+|\psi |^{2}\psi +V(\mathbf {x} )\psi =0$  where we will take $V(\mathbf {x} )=\lVert \mathbf {x} \rVert _{l^{2}}^{2}=\sum _{k=1}^{N}x_{k}^{2}$  in which $N$  is the space dimension. Show that this equation can be solved by splitting it into

$i\psi _{t}+\Delta \psi =0$

( 5)

and

$i\psi _{t}+|\psi |^{2}\psi +V(\mathbf {x} )\psi =0$ .

( 6)

Be sure to explain how eqs. 5 , 6 are solved.

3. Modify the Matlab codes to solve the Gross-Pitaevskii equation in one, two and three dimensions.

4. Modify the serial Fortran codes to solve the Gross-Pitaevskii equation in one, two and three dimensions.

5. Listings J and K give an alternate method of parallelizing an OpenMP program. Make the program in listing G as efficient as possible and as similar to that in J , but without changing the parallelization strategy. Compare the speed of the two different programs. Try to vary the number of grid points and cores used. Which code is faster on your system? Why do you think this is?

( J)

An OpenMP Fortran program to solve the 2D nonlinear Schrödinger equation using splitting Code download
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program solves nonlinear Schrodinger equation in 2 dimensions
! i*u_t+Es*|u|^2u+u_{xx}+u_{yy}=0
! using a second order time spectral splitting scheme
!
! The boundary conditions are u(x=0,y)=u(2*Lx*\pi,y),
! u(x,y=0)=u(x,y=2*Ly*\pi)
! The initial condition is u=exp(-x^2-y^2)
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Ny = number of modes in y - power of 2 for FFT
! Nt = number of timesteps to take
! Tmax = maximum simulation time
! plotgap = number of timesteps between plots
! FFTW_IN_PLACE = value for FFTW input
! FFTW_MEASURE = value for FFTW input
! FFTW_EXHAUSTIVE = value for FFTW input
! FFTW_PATIENT = value for FFTW input
! FFTW_ESTIMATE = value for FFTW input
! FFTW_FORWARD = value for FFTW input
! FFTW_BACKWARD = value for FFTW input
! pi = 3.14159265358979323846264338327950288419716939937510d0
! Lx = width of box in x direction
! Ly = width of box in y direction
! ES = +1 for focusing and -1 for defocusing
! .. Scalars ..
! i = loop counter in x direction
! j = loop counter in y direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! start = variable to record start time of program
! finish = variable to record end time of program
! count_rate = variable for clock count rate
! planfx = Forward 1d fft plan in x
! planbx = Backward 1d fft plan in x
! planfy = Forward 1d fft plan in y
! planby = Backward 1d fft plan in y
! dt = timestep
! .. Arrays ..
! u = approximate solution
! v = Fourier transform of approximate solution
! unax = temporary field
! vnax = temporary field
! vnbx = temporary field
! vnay = temporary field
! vnby = temporary field
! potx = potential
! .. Vectors ..
! kx = fourier frequencies in x direction
! ky = fourier frequencies in y direction
! x = x locations
! y = y locations
! time = times at which save data
! name_config = array to store filename for data to be saved
! fftfx = array to setup x Fourier transform
! fftbx = array to setup x Fourier transform
! fftfy = array to setup y Fourier transform
! fftby = array to setup y Fourier transform
!
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! Check that the initial iterate is consistent with the
! boundary conditions for the domain specified
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! FFTW3 -- Fast Fourier Transform in the West Library
! (http://www.fftw.org/)
! OpenMP library

PROGRAM main
USE omp_lib
IMPLICIT NONE
! Declare variables
INTEGER(kind=4), PARAMETER :: Nx=2**8
INTEGER(kind=4), PARAMETER :: Ny=2**8
INTEGER(kind=4), PARAMETER	:: Nt=20
INTEGER(kind=4), PARAMETER	:: plotgap=5
REAL(kind=8), PARAMETER	:: &
pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER	:: Lx=2.0d0
REAL(kind=8), PARAMETER	:: Ly=2.0d0
REAL(kind=8), PARAMETER	:: Es=0.0d0
REAL(kind=8)	:: dt=0.10d0/Nt
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE	:: kx,ky
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: x,y
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE:: unax,vnax,vnbx,potx
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE:: vnay,vnby
REAL(kind=8), DIMENSION(:), ALLOCATABLE	:: time
INTEGER(kind=4)	:: i,j,k,n,allocatestatus
INTEGER(kind=4)	:: start, finish, count_rate
INTEGER(kind=8), PARAMETER :: FFTW_IN_PLACE=8, FFTW_MEASURE=0,&
FFTW_EXHAUSTIVE=8, FFTW_PATIENT=32,&
FFTW_ESTIMATE=64
INTEGER(kind=8),PARAMETER :: FFTW_FORWARD=-1, FFTW_BACKWARD=1
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: fftfx,fftbx,fftfy,fftby
INTEGER(kind=8)	:: planfx,planbx,planfy,planby
CHARACTER*100	:: name_config

ALLOCATE(kx(1:Nx),ky(1:Nx),x(1:Nx),y(1:Nx),unax(1:Nx,1:Ny),&
vnax(1:Nx,1:Ny),vnbx(1:Nx,1:Ny),potx(1:Nx,1:Ny),fftfx(1:Nx),&
fftbx(1:Nx),fftfy(1:Nx),fftby(1:Nx),vnay(1:Ny,1:Nx),&
vnby(1:Ny,1:Nx),time(1:1+Nt/plotgap),stat=allocatestatus)
IF (allocatestatus .ne. 0) stop
PRINT *,'allocated memory'
! set up ffts
CALL dfftw_plan_dft_1d_(planfx,Nx,fftfx(1:Nx),fftbx(1:Nx),&
FFTW_FORWARD,FFTW_ESTIMATE)
CALL dfftw_plan_dft_1d_(planbx,Nx,fftbx(1:Nx),fftfx(1:Nx),&
FFTW_BACKWARD,FFTW_ESTIMATE)
CALL dfftw_plan_dft_1d_(planfy,Ny,fftfy(1:Ny),fftby(1:Ny),&
FFTW_FORWARD,FFTW_ESTIMATE)
CALL dfftw_plan_dft_1d_(planby,Ny,fftby(1:Ny),fftfy(1:Ny),&
FFTW_BACKWARD,FFTW_ESTIMATE)
PRINT *,'Setup FFTs'

! setup fourier frequencies
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO i=1,1+Nx/2 kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx END DO !$OMP END PARALLEL DO
kx(1+Nx/2)=0.0d0
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO i = 1,Nx/2 -1 kx(i+1+Nx/2)=-kx(1-i+Nx/2) END DO !$OMP END PARALLEL DO
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO i=1,Nx x(i)=(-1.0d0+2.0d0*REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)) )*pi*Lx END DO !$OMP END PARALLEL DO
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,1+Ny/2 ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly END DO !$OMP END PARALLEL DO
ky(1+Ny/2)=0.0d0
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j = 1,Ny/2 -1 ky(j+1+Ny/2)=-ky(1-j+Ny/2) END DO !$OMP END PARALLEL DO
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny y(j)=(-1.0d0+2.0d0*REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)) )*pi*Ly END DO !$OMP END PARALLEL DO
PRINT *,'Setup grid and fourier frequencies'
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny DO i=1,Nx unax(i,j)=exp(-1.0d0*(x(i)**2 +y(j)**2)) END DO END DO !$OMP END PARALLEL DO
name_config = 'uinitial.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
DO i=1,Nx
WRITE(11,*) abs(unax(i,j))**2
END DO
END DO
CLOSE(11)
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny DO i=1,Nx CALL dfftw_execute_dft_(planfx,unax(i,j),vnax(i,j)) END DO END DO !$OMP END PARALLEL DO
vnay(1:Ny,1:Nx)=TRANSPOSE(vnax(1:Nx,1:Ny))
! transform initial data and do first half time step
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO i=1,Nx CALL dfftw_execute_dft_(planfy,vnay(1:Ny,i),vnby(1:Ny,i)) DO j=1,Ny vnby(j,i)=exp(0.5d0*dt*(kx(i)*kx(i) + ky(j)*ky(j))& *cmplx(0.0d0,1.0d0))*vnby(j,i) END DO CALL dfftw_execute_dft_(planby,vnby(j,i),vnay(j,i)) END DO !$OMP END PARALLEL DO
PRINT *,'Got initial data, starting timestepping'
time(1)=0.0d0
CALL system_clock(start,count_rate)
DO n=1,Nt
vnbx(1:Nx,1:Ny)=TRANSPOSE(vnay(1:Ny,1:Nx))/REAL(Ny,kind(0d0))
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny CALL dfftw_execute_dft_(planbx,vnbx(1:Nx,j),unax(1:Nx,j)) DO i=1,Nx unax(i,j)=unax(1:Nx,j)/REAL(Nx,kind(0d0)) potx(i,j)=Es*unax(i,j)*conjg(unax(i,j)) unax(i,j)=exp(cmplx(0.0d0,-1.0d0)*dt*potx(i,j))& *unax(i,j) END DO CALL dfftw_execute_dft_(planfx,unax(1:Nx,j),vnax(1:Nx,j)) END DO !$OMP END PARALLEL DO
vnby(1:Ny,1:Nx)=TRANSPOSE(vnax(1:Nx,1:Ny))
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO i=1,Nx CALL dfftw_execute_dft_(planfy,vnby(1:Ny,i),vnay(1:Ny,i)) DO j=1,Ny vnby(j,i)=exp(dt*(kx(i)*kx(i) + ky(j)*ky(j))& *cmplx(0.0d0,1.0d0))*vnay(j,i) END DO CALL dfftw_execute_dft_(planby,vnby(1:Ny,i),vnay(1:Ny,i)) END DO !$OMP END PARALLEL DO
IF (mod(n,plotgap)==0) then
time(1+n/plotgap)=n*dt
PRINT *,'time',n*dt
END IF
END DO
PRINT *,'Finished time stepping'
CALL system_clock(finish,count_rate)
PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),&
'for Time stepping'

! transform back final data and do another half time step
vnbx(1:Nx,1:Ny)=transpose(vnay(1:Ny,1:Nx))/REAL(Ny,kind(0d0))
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny CALL dfftw_execute_dft_(planbx,vnbx(1:Nx,j),unax(1:Nx,j)) unax(1:Nx,j)=unax(1:Nx,j)/REAL(Nx,kind(0d0)) potx(1:Nx,j)=Es*unax(1:Nx,j)*conjg(unax(1:Nx,j)) unax(1:Nx,j)=exp(cmplx(0,-1)*dt*potx(1:Nx,j))*unax(1:Nx,j) CALL dfftw_execute_dft_(planfx,unax(1:Nx,j),vnax(1:Nx,j)) END DO !$OMP END PARALLEL DO
vnby(1:Ny,1:Nx)=TRANSPOSE(vnax(1:Nx,1:Ny))
!$OMP PARALLEL DO PRIVATE(i) SCHEDULE(static) DO i=1,Nx CALL dfftw_execute_dft_(planfy,vnby(1:Ny,i),vnay(1:Ny,i)) vnby(1:Ny,i)=exp(0.5d0*dt*(kx(i)*kx(i) + ky(1:Ny)*ky(1:Ny))& *cmplx(0,1))*vnay(1:Ny,i) CALL dfftw_execute_dft_(planby,vnby(1:Ny,i),vnay(1:Ny,i)) END DO !$OMP END PARALLEL DO
vnbx(1:Nx,1:Ny)=TRANSPOSE(vnay(1:Ny,1:Nx))/REAL(Ny,kind(0d0))
!$OMP PARALLEL DO PRIVATE(j) SCHEDULE(static) DO j=1,Ny CALL dfftw_execute_dft_(planbx,vnbx(1:Nx,j),unax(1:Nx,j)) unax(1:Nx,j)=unax(1:Nx,j)/REAL(Nx,kind(0d0)) END DO !$OMP END PARALLEL DO
name_config = 'ufinal.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
DO i=1,Nx
WRITE(11,*) abs(unax(i,j))**2
END DO
END DO
CLOSE(11)

name_config = 'tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,1+Nt/plotgap
WRITE(11,*) time(j)
END DO
CLOSE(11)

name_config = 'xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)

name_config = 'ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
PRINT *,'Saved data'

CALL dfftw_destroy_plan_(planbx)
CALL dfftw_destroy_plan_(planfx)
CALL dfftw_destroy_plan_(planby)
CALL dfftw_destroy_plan_(planfy)
CALL dfftw_cleanup_()

DEALLOCATE(unax,vnax,vnbx,potx, vnay,vnby,stat=allocatestatus)
IF (allocatestatus .ne. 0) STOP
PRINT *,'Deallocated memory'

PRINT *,'Program execution complete'
END PROGRAM main


( K)

An example makefile for compiling the OpenMP program in listing J Code download
#define the complier
COMPILER = gfortran
# compilation settings, optimization, precision, parallelization
FLAGS = -O0 -fopenmp

# libraries
LIBS = -L/usr/local/lib -lfftw3 -lm
# source list for main program
SOURCES = NLSsplitting.f90

test: $(SOURCES)${COMPILER} -o NLSsplitting $(FLAGS)$(SOURCES) $(LIBS) clean: rm *.o clobber: rm NLSsplitting  The example assumes one is using Flux and has loaded environments for the intel compiler as well as the Intel compiled version of FFTW. If one is using the freely available GCC compilers instead of the Intel compilers, change the flag $-openmp$ to $-fopenmp$ in the makefile. 6. Modify the OpenMP Fortran codes to solve the Gross-Pitaevskii equation in two and three dimensions. 7.  Some quantum hydrodynamic models for plasmas are very similar to the nonlinear Schrodinger equation and can also be numerically approximated using splitting methods. A model for a plasma used by Eliasson and Shukla is $i\Psi _{t}+\Delta \Psi +\phi \Psi -|\Psi |^{4/D}\Psi =0$ ( 8) and $\Delta \phi =|\Psi |^{2}-1$ ( 9) where $\Psi$ is the effective wave function, $\phi$ the electrostatic potential and $D$ the dimension, typically 1,2 or 3. This equation can be solved in a similar manner to the Davey-Stewartson equations in Klein, Muite and Roidot. Specifically, first solve $iP_{t}+\Delta P=0$ ( 10) using the Fourier transform so that $P(\delta t)=\exp \left(-i\Delta \delta t\right)P(0)$ where $P(0)=\Psi (0)$ . Then solve $\phi =\Delta ^{-1}\left(|P|^{2}-1\right)$ ( 11) using the Fourier transform. Finally, solve $iQ_{t}+\phi Q-|Q|^{4/D}Q=0$ ( 12) using the fact that at each grid point $\phi -|Q|^{4/D}$ is a constant, so the solution is $Q(\delta t)=\exp \left[i\left(\phi -|\Phi |^{4/D}\right)\delta t\right]Q(0)$ with $Q(0)=P(\delta t)$ and $\Psi (\delta t)\approx Q(\delta t)$ . 8. The operator splitting method can be used for equations other than the nonlinear Schrödinger equation. Another equation for which operator splitting can be used is the complex Ginzburg-Landau equation ${\frac {\partial A}{\partial t}}=A+(1+i\alpha )\Delta A-(1+i\beta )|A|^{2}A,$ where $A$ is a complex function, typically of one, two or three variables. An example one dimensional code is provided in listing L , based on an earlier finite difference code by Blanes, Casa, Chartier and Miura, using the methods described in Blanes et al.. By using complex coefficients, Blanes et al. can create high order splitting methods for parabolic equations. Previous attempts to do this have failed since if only real coefficients are used, a backward step which is required for methods higher than second order leads to numerical instability. Modify the example code to solve the complex Ginzburg-Landau equation in one, two and then in three spatial dimensions. The linear part ${\frac {\partial A}{\partial t}}=A+(1+i\alpha )\Delta A$ can be solved explicitly using the Fourier transform. To solve the nonlinear part, ${\frac {\partial A}{\partial t}}=-(1+i\beta )|A|^{2}A$ consider ${\frac {\partial |A|^{2}}{\partial t}}={\frac {\partial A}{\partial t}}A^{*}+{\frac {\partial A^{*}}{\partial t}}A=2|A|^{4}$ and solve this exactly for $|A|^{2}$ . To recover the phase, observe that ${\frac {\partial \log(A)}{\partial t}}=-(1+i\beta )|A|^{2}$ which can also be integrated explicitly since $|A|^{2}(t)$ is known. ( L) A Matlab program which uses 16th order splitting to solve the cubic nonlinear Schrödinger equation Code download % A program to solve the nonlinear Schr\"{o}dinger equation using a % splitting method. The numerical solution is compared to an exact % solution. % S. Blanes, F. Casas, P. Chartier and A. Murua % "Optimized high-order splitting methods for some classes of parabolic % equations" % ArXiv pre-print 1102.1622v2 % Forthcoming Mathematics of Computation clear all; format compact; format short; set(0,'defaultaxesfontsize',30,'defaultaxeslinewidth',.7,... 'defaultlinelinewidth',6,'defaultpatchlinewidth',3.7,... 'defaultaxesfontweight','bold') % set up grid Lx = 20; % period 2*pi * L Nx = 16384; % number of harmonics Nt = 2000; % number of time slices dt = 0.25*pi/Nt;% time step U=zeros(Nx,Nt/10); method=3; % splitting method: 1 Strang, 2 CCDV10, 3 Blanes et al 2012 % initialise variables x = (2*pi/Nx)*(-Nx/2:Nx/2 -1)'*Lx; % x coordinate kx = 1i*[0:Nx/2-1 0 -Nx/2+1:-1]'/Lx; % wave vector % initial conditions t=0; tdata(1)=t; u=4*exp(1i*t)*(cosh(3*x)+3*exp(8*1i*t)*cosh(x))... ./(cosh(4*x)+4*cosh(2*x)+3*cos(8*t)); v=fft(u); figure(1); clf; plot(x,u);xlim([-2,2]); drawnow; U(:,1)=u; % mass ma = fft(abs(u).^2); ma0 = ma(1); if method==1, % % Strang-Splitting % s=2; a=[1;0]; b=[1/2;1/2]; % elseif method==2, % % Method of Castella, Chartier, Descombes and Vilmart % BIT Numerical Analysis vol 49 pp 487-508, 2009 % s=5; a=[1/4;1/4;1/4;1/4;0]; b=[1/10-1i/30;4/15+2*1i/15;4/15-1i/5;4/15+2*1i/15;1/10-1i/30]; % elseif method==3, % % Method of Blanes, Casas, Chartier and Murua 2012 % s=17; a=1/16*[1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;0]; b=[0.028920177910074098791 - 0.005936580835725746103*1i; 0.056654351383649876160 + 0.020841963949772627119*1i; 0.067258385822722143569 - 0.039386393748812362460*1i; 0.070333980553260772061 + 0.058952097930307840316*1i; 0.077095100838099173580 - 0.038247636602014810025*1i; 0.042022140317231098258 - 0.033116379859951038579*1i; 0.050147397749937784280 + 0.061283684958324249562*1i; 0.047750191909146143447 - 0.032332468814362628262*1i; 0.119636547031757819706 + 0.015883426044923736862*1i; 0.047750191909146143447 - 0.032332468814362628262*1i; 0.050147397749937784280 + 0.061283684958324249562*1i; 0.042022140317231098258 - 0.033116379859951038579*1i; 0.077095100838099173580 - 0.038247636602014810025*1i; 0.070333980553260772061 + 0.058952097930307840316*1i; 0.067258385822722143569 - 0.039386393748812362460*1i; 0.056654351383649876160 + 0.020841963949772627119*1i; 0.028920177910074098791 - 0.005936580835725746103*1i]; end; % solve pde and plot results for n =2:Nt+1 for m=1:(s-1) vna=exp(b(m)*1i*dt*kx.*kx).*v; una=ifft(vna); pot=(2*una.*conj(una)); unb=exp(-1i*a(m)*(-1)*dt*pot).*una; v=fft(unb); end v=exp(b(s)*1i*dt*kx.*kx).*v; u=ifft(v); t=(n-1)*dt; if (mod(n,10)==0) tdata(n/10)=t; u=ifft(v); U(:,n/10)=u; uexact=... 4*exp(1i*t)*(cosh(3*x)+3*exp(8*1i*t)*cosh(x))... ./(cosh(4*x)+4*cosh(2*x)+3*cos(8*t)); figure(1); clf; plot(x,abs(u).^2); ... xlim([-0.5,0.5]); title(num2str(t)); figure(2); clf; loglog(abs(v(1:Nx/2))); ... title('Fourier Coefficients'); figure(3); clf; plot(x,abs(u-uexact).^2); ... xlim([-0.5,0.5]); title('error'); drawnow; ma = fft(abs(u).^2); ma = ma(1); test = log10(abs(1-ma/ma0)) end end figure(4); clf; mesh(tdata(1:(n-1)/10),x,abs(U(:,1:(n-1)/10)).^2); xlim([0,t]);  ## Distributed Memory Parallel: MPI For this section, we will use the library 2DECOMP&FFT available from http://www.2decomp.org/index.html. The website includes some examples which indicate how this library should be used, in particular the sample code at http://www.2decomp.org/case_study1.html is a very helpful indication of how one converts a code that uses FFTW to one that uses MPI and the aforementioned library. Before creating a parallel MPI code using 2DECOMP&FFT, we will generate a serial Fortran code that uses splitting to solve the 3D nonlinear Schrodinger equation. Rather than using loop-based parallelization to do a sequence of one dimensional fast Fourier transforms, we will use FFTW's three dimensional FFT, so that the serial version and MPI parallel version have the same structure. The serial version is in listing E . This file can be compiled in a similar manner to that in Listing A of the chapter Fortran Programs. ( M) A Fortran program to solve the 3D nonlinear Schrödinger equation using splitting and FFTW Code download !-------------------------------------------------------------------- ! ! ! PURPOSE ! ! This program solves nonlinear Schrodinger equation in 3 dimensions ! i*u_t+Es*|u|^2u+u_{xx}+u_{yy}+u_{zz}=0 ! using a second order time spectral splitting scheme ! ! The boundary conditions are u(x=0,y,z)=u(2*Lx*\pi,y,z), ! u(x,y=0,z)=u(x,y=2*Ly*\pi,z), u(x,y,z=0)=u(x,y,z=2*Lz*\pi) ! The initial condition is u=exp(-x^2-y^2) ! ! .. Parameters .. ! Nx = number of modes in x - power of 2 for FFT ! Ny = number of modes in y - power of 2 for FFT ! Nz = number of modes in z - power of 2 for FFT ! Nt = number of timesteps to take ! Tmax = maximum simulation time ! plotgap = number of timesteps between plots ! FFTW_IN_PLACE = value for FFTW input ! FFTW_MEASURE = value for FFTW input ! FFTW_EXHAUSTIVE = value for FFTW input ! FFTW_PATIENT = value for FFTW input ! FFTW_ESTIMATE = value for FFTW input ! FFTW_FORWARD = value for FFTW input ! FFTW_BACKWARD = value for FFTW input ! pi = 3.14159265358979323846264338327950288419716939937510d0 ! Lx = width of box in x direction ! Ly = width of box in y direction ! Lz = width of box in z direction ! ES = +1 for focusing and -1 for defocusing ! .. Scalars .. ! i = loop counter in x direction ! j = loop counter in y direction ! k = loop counter in z direction ! n = loop counter for timesteps direction ! allocatestatus = error indicator during allocation ! start = variable to record start time of program ! finish = variable to record end time of program ! count_rate = variable for clock count rate ! count = keep track of information written to disk ! iol = size of array to write to disk ! planfxyz = Forward 3d fft plan ! planbxyz = Backward 3d fft plan ! dt = timestep ! modescalereal = Number to scale after backward FFT ! ierr = error code ! .. Arrays .. ! unax = approximate solution ! vnax = Fourier transform of approximate solution ! potx = potential ! .. Vectors .. ! kx = fourier frequencies in x direction ! ky = fourier frequencies in y direction ! x = x locations ! y = y locations ! time = times at which save data ! name_config = array to store filename for data to be saved ! fftfxy = array to setup 2D Fourier transform ! fftbxy = array to setup 2D Fourier transform ! ! REFERENCES ! ! ACKNOWLEDGEMENTS ! ! ACCURACY ! ! ERROR INDICATORS AND WARNINGS ! ! FURTHER COMMENTS ! Check that the initial iterate is consistent with the ! boundary conditions for the domain specified !-------------------------------------------------------------------- ! External routines required ! ! External libraries required ! FFTW3 -- Fast Fourier Transform in the West Library ! (http://www.fftw.org/) PROGRAM main ! Declare variables IMPLICIT NONE INTEGER(kind=4), PARAMETER :: Nx=2**5 INTEGER(kind=4), PARAMETER :: Ny=2**5 INTEGER(kind=4), PARAMETER :: Nz=2**5 INTEGER(kind=4), PARAMETER :: Nt=50 INTEGER(kind=4), PARAMETER :: plotgap=10 REAL(kind=8), PARAMETER ::& pi=3.14159265358979323846264338327950288419716939937510d0 REAL(kind=8), PARAMETER :: Lx=2.0d0,Ly=2.0d0,Lz=2.0d0 REAL(kind=8), PARAMETER :: Es=1.0d0 REAL(kind=8) :: dt=0.10d0/Nt REAL(kind=8) :: modescalereal COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: kx,ky,kz REAL(kind=8), DIMENSION(:), ALLOCATABLE :: x,y,z COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: unax,vnax,potx REAL(kind=8), DIMENSION(:), ALLOCATABLE :: time INTEGER(kind=4) :: i,j,k,n,AllocateStatus,count,iol ! timing INTEGER(kind=4) :: start, finish, count_rate ! fftw variables INTEGER(kind=8), PARAMETER :: FFTW_IN_PLACE = 8, FFTW_MEASURE = 0, & FFTW_EXHAUSTIVE = 8, FFTW_PATIENT = 32, FFTW_ESTIMATE = 64 INTEGER(kind=8),PARAMETER :: FFTW_FORWARD = -1, FFTW_BACKWARD=1 INTEGER(kind=8) :: planfxyz,planbxyz CHARACTER*100 :: name_config, number_file CALL system_clock(start,count_rate) ALLOCATE(unax(1:Nx,1:Ny,1:Nz),vnax(1:Nx,1:Ny,1:Nz),potx(1:Nx,1:Ny,1:Nz),& kx(1:Nx),ky(1:Ny),kz(1:Nz),x(1:Nx),y(1:Ny),z(1:Nz),& time(1:1+Nt/plotgap),stat=AllocateStatus) IF (AllocateStatus .ne. 0) STOP PRINT *,'allocated space' modescalereal=1.0d0/REAL(Nx,KIND(0d0)) modescalereal=modescalereal/REAL(Ny,KIND(0d0)) modescalereal=modescalereal/REAL(Nz,KIND(0d0)) ! set up ffts CALL dfftw_plan_dft_3d_(planfxyz,Nx,Ny,Nz,unax(1:Nx,1:Ny,1:Nz),& vnax(1:Nx,1:Ny,1:Nz),FFTW_FORWARD,FFTW_ESTIMATE) CALL dfftw_plan_dft_3d_(planbxyz,Nx,Ny,Nz,vnax(1:Nx,1:Ny,1:Nz),& unax(1:Nx,1:Ny,1:Nz),FFTW_BACKWARD,FFTW_ESTIMATE) PRINT *,'Setup FFTs' ! setup fourier frequencies and grid points DO i=1,1+Nx/2 kx(i)= cmplx(0.0d0,1.0)*REAL(i-1,kind(0d0))/Lx END DO kx(1+Nx/2)=0.0d0 DO i = 1,Nx/2 -1 kx(i+1+Nx/2)=-kx(1-i+Nx/2) END DO DO i=1,Nx x(i)=(-1.0d0+2.0d0*REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)) )*pi*Lx END DO DO j=1,1+Ny/2 ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly END DO ky(1+Ny/2)=0.0d0 DO j = 1,Ny/2 -1 ky(j+1+Ny/2)=-ky(1-j+Ny/2) END DO DO j=1,Ny y(j)=(-1.0d0+2.0d0*REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)) )*pi*Ly END DO DO k=1,1+Nz/2 kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz END DO kz(1+Nz/2)=0.0d0 DO k = 1,Nz/2 -1 kz(k+1+Nz/2)=-kz(1-k+Nz/2) END DO DO k=1,Nz z(k)=(-1.0d0+2.0d0*REAL(k-1,kind(0d0))/REAL(Nz,kind(0d0)) )*pi*Lz END DO PRINT *,'Setup grid and fourier frequencies' DO k=1,Nz; DO j=1,Ny; DO i=1,Nx unax(i,j,k)=exp(-1.0d0*(x(i)**2 +y(j)**2+z(k)**2)) END DO; END DO; END DO name_config = 'uinitial.dat' INQUIRE(iolength=iol) unax(1,1,1) OPEN(unit=11,FILE=name_config,form="unformatted", & access="direct",recl=iol) count=1 DO k=1,Nz; DO j=1,Ny; DO i=1,Nx WRITE(11,rec=count) unax(i,j,k) count=count+1 END DO; END DO; END DO CLOSE(11) CALL dfftw_execute_dft_(planfxyz,unax(1:Nx,1:Ny,1:Nz),vnax(1:Nx,1:Ny,1:Nz)) PRINT *,'Got initial data, starting timestepping' time(1)=0 DO n=1,Nt DO k=1,Nz; DO j=1,Ny; DO i=1,Nx vnax(i,j,k)=exp(0.50d0*dt*& (kz(k)*kz(k) + kx(i)*kx(i) + ky(j)*ky(j))& *cmplx(0.0d0,1.0d0))*vnax(i,j,k) END DO; END DO; END DO CALL dfftw_execute_dft_(planbxyz,vnax(1:Nx,1:Ny,1:Nz),& unax(1:Nx,1:Ny,1:Nz)) DO k=1,Nz; DO j=1,Ny; DO i=1,Nx unax(i,j,k)=unax(i,j,k)*modescalereal potx(i,j,k)=Es*unax(i,j,k)*conjg(unax(i,j,k)) unax(i,j,k)=exp(cmplx(0.0d0,-1.0d0)*dt*potx(i,j,k))& *unax(i,j,k) END DO; END DO; END DO CALL dfftw_execute_dft_(planfxyz,unax(1:Nx,1:Ny,1:Nz),& vnax(1:Nx,1:Ny,1:Nz)) DO k=1,Nz; DO j=1,Ny; DO i=1,Nx vnax(i,j,k)=exp(0.5d0*dt*& (kx(i)*kx(i) + ky(j)*ky(j)+ kz(k)*kz(k))& *cmplx(0.0d0,1.0d0))*vnax(i,j,k) END DO; END DO; END DO IF (mod(n,plotgap)==0) THEN time(1+n/plotgap)=n*dt PRINT *,'time',n*dt CALL dfftw_execute_dft_(planbxyz,vnax(1:Nx,1:Ny,1:Nz),unax(1:Nx,1:Ny,1:Nz)) DO k=1,Nz; DO j=1,Ny; DO i=1,Nx unax(i,j,k)=unax(i,j,k)*modescalereal END DO; END DO; END DO name_config='./data/u' WRITE(number_file,'(i0)') 10000000+1+n/plotgap ind=index(name_config,' ') -1 name_config=name_config(1:ind)//numberfile ind=index(name_config,' ') -1 name_config=name_config(1:ind)//'.dat' OPEN(unit=11,FILE=name_config,status="UNKNOWN") REWIND(11) DO j=1,Ny DO i=1,Nx WRITE(11,*) abs(unax(i,j))**2 END DO END DO CLOSE(11) END IF END DO PRINT *,'Finished time stepping' ! transform back final data and do another half time step CALL system_clock(finish,count_rate) PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),'for execution' name_config = 'tdata.dat' OPEN(unit=11,FILE=name_config,status="UNKNOWN") REWIND(11) DO j=1,1+Nt/plotgap WRITE(11,*) time(j) END DO CLOSE(11) name_config = 'xcoord.dat' OPEN(unit=11,FILE=name_config,status="UNKNOWN") REWIND(11) DO i=1,Nx WRITE(11,*) x(i) END DO CLOSE(11) name_config = 'ycoord.dat' OPEN(unit=11,FILE=name_config,status="UNKNOWN") REWIND(11) DO j=1,Ny WRITE(11,*) y(j) END DO CLOSE(11) name_config = 'zcoord.dat' OPEN(unit=11,FILE=name_config,status="UNKNOWN") REWIND(11) DO k=1,Nz WRITE(11,*) z(k) END DO CLOSE(11) PRINT *,'Saved data' CALL dfftw_destroy_plan_(planbxyz) CALL dfftw_destroy_plan_(planfxyz) CALL dfftw_cleanup_() DEALLOCATE(unax,vnax,potx,& kx,ky,kz,x,y,z,& time,stat=AllocateStatus) IF (AllocateStatus .ne. 0) STOP PRINT *,'Program execution complete' END PROGRAM main  In comparison to the previous programs, the program in listing E writes out its final data as a binary file. This is often significantly faster than writing out a text file, and the resulting file is usually much smaller in size. This is important when many such files are written and/or if individual files are large. Due to the formatting change, the binary file also needs to be read in slightly differently. The Matlab script in listing N shows how to do this. ( N) Matlab program which plots a numerical solution to a 3D nonlinear Schrödinger equation generated by listings E or O Code download % A program to plot the computed results clear all; format compact, format short, set(0,'defaultaxesfontsize',18,'defaultaxeslinewidth',.9,... 'defaultlinelinewidth',3.5,'defaultpatchlinewidth',5.5); % Load data tdata=load('./tdata.dat'); x=load('./xcoord.dat'); y=load('./ycoord.dat'); z=load('./zcoord.dat'); Tsteps = length(tdata); Nx = length(x); Nt = length(tdata); Ny = length(y); Nz = length(z); fid=fopen('./ufinal.datbin','r'); [fname,mode,mformat]=fopen(fid); u=fread(fid,Nx*Ny*Nz,'double',mformat); u = reshape(u,Nx,Ny,Nz); % Plot data figure (1); clf ; UP = abs(u).^2; p1 = patch(isosurface(x,y,z,UP,.0025) ,... 'FaceColor','yellow','EdgeColor','none'); p2 = patch(isocaps(x,y,z,UP,.0025) ,... 'FaceColor','interp','EdgeColor','none'); isonormals(UP,p1); lighting phong; xlabel('x'); ylabel('y'); zlabel('z'); axis equal; axis square; view(3); drawnow;  We now modify the above code to use MPI and the library 2DECOMP&FFT. The library 2DECOMP&FFT hides most of the details of MPI although there are a few commands which it is useful for the user to understand. These commands are: • USE mpi or INCLUDE 'mpif.h' • MPI_INIT • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_FINALIZE The program is listed in listing O , please compare this to the serial code in E . The library 2DECOMP&FFT does a domain decomposition of the arrays so that separate parts of the arrays are on separate processors. The library can also perform a Fourier transform on the arrays even though they are stored on different processors – the library does all the necessary message passing and transpositions required to perform the Fourier transform. It should be noted that the order of the entries in the arrays after the Fourier transform is not necessarily the same as the order used by FFTW. However, the correct ordering of the entries is returned by the structure decomp and so this structure is used to obtain starting and stopping entries for the loops. We assume that the library 2DECOMP&FFT has been installed in an appropriate location. ( O) A Fortran program to solve the 3D nonlinear Schrödinger equation using splitting and 2DECOMP&FFT Code download !-------------------------------------------------------------------- ! ! ! PURPOSE ! ! This program solves nonlinear Schrodinger equation in 3 dimensions ! i*u_t+Es*|u|^2u+u_{xx}+u_{yy}+u_{zz}=0 ! using a second order time spectral splitting scheme ! ! The boundary conditions are u(x=0,y,z)=u(2*Lx*\pi,y,z), ! u(x,y=0,z)=u(x,y=2*Ly*\pi,z), u(x,y,z=0)=u(x,y,z=2*Lz*\pi) ! The initial condition is u=exp(-x^2-y^2) ! ! .. Parameters .. ! Nx = number of modes in x - power of 2 for FFT ! Ny = number of modes in y - power of 2 for FFT ! Nz = number of modes in z - power of 2 for FFT ! Nt = number of timesteps to take ! Tmax = maximum simulation time ! plotgap = number of timesteps between plots ! pi = 3.14159265358979323846264338327950288419716939937510d0 ! Lx = width of box in x direction ! Ly = width of box in y direction ! Lz = width of box in z direction ! ES = +1 for focusing and -1 for defocusing ! .. Scalars .. ! i = loop counter in x direction ! j = loop counter in y direction ! k = loop counter in z direction ! n = loop counter for timesteps direction ! allocatestatus = error indicator during allocation ! start = variable to record start time of program ! finish = variable to record end time of program ! count_rate = variable for clock count rate ! dt = timestep ! modescalereal = Number to scale after backward FFT ! myid = Process id ! ierr = error code ! p_row = number of rows for domain decomposition ! p_col = number of columns for domain decomposition ! filesize = total filesize ! disp = displacement to start writing data from ! ind = index in array to write ! plotnum = number of plot to save ! numberfile = number of the file to be saved to disk ! stat = error indicator when reading inputfile ! .. Arrays .. ! u = approximate solution ! v = Fourier transform of approximate solution ! pot = potential ! .. Vectors .. ! kx = fourier frequencies in x direction ! ky = fourier frequencies in y direction ! kz = fourier frequencies in z direction ! x = x locations ! y = y locations ! z = z locations ! time = times at which save data ! nameconfig = array to store filename for data to be saved ! InputFileName = name of the Input File ! .. Special Structures .. ! decomp = contains information on domain decomposition ! see http://www.2decomp.org/ for more information ! ! REFERENCES ! ! ACKNOWLEDGEMENTS ! ! ACCURACY ! ! ERROR INDICATORS AND WARNINGS ! ! FURTHER COMMENTS ! Check that the initial iterate is consistent with the ! boundary conditions for the domain specified !-------------------------------------------------------------------- ! External routines required ! ! External libraries required ! 2DECOMP&FFT -- Domain decomposition and Fast Fourier Library ! (http://www.2decomp.org/index.html) ! MPI library PROGRAM main USE decomp_2d USE decomp_2d_fft USE decomp_2d_io USE MPI ! Declare variables IMPLICIT NONE INTEGER(kind=4) :: Nx=2**5 INTEGER(kind=4) :: Ny=2**5 INTEGER(kind=4) :: Nz=2**5 INTEGER(kind=4) :: Nt=50 INTEGER(kind=4) :: plotgap=10 REAL(kind=8), PARAMETER ::& pi=3.14159265358979323846264338327950288419716939937510d0 REAL(kind=8) :: Lx=2.0d0,Ly=2.0d0,Lz=2.0d0 REAL(kind=8) :: Es=1.0d0 REAL(kind=8) :: dt=0.0010d0 COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: kx,ky,kz REAL(kind=8), DIMENSION(:), ALLOCATABLE :: x,y,z COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: u,v,pot REAL(kind=8), DIMENSION(:), ALLOCATABLE :: time INTEGER(KIND=4), DIMENSION(1:5) :: intcomm REAL(KIND=8), DIMENSION(1:5) :: dpcomm REAL(kind=8) :: modescalereal INTEGER(kind=4) :: i,j,k,n,AllocateStatus,stat INTEGER(kind=4) :: myid,numprocs,ierr TYPE(DECOMP_INFO) :: decomp INTEGER(kind=MPI_OFFSET_KIND) :: filesize, disp INTEGER(kind=4) :: p_row=0, p_col=0 INTEGER(kind=4) :: start, finish, count_rate, ind, plotnum CHARACTER*50 :: nameconfig CHARACTER*20 :: numberfile ! initialisation of MPI CALL MPI_INIT(ierr) CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr) CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr) ! initialisation of 2decomp ! do automatic domain decomposition CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col) ! get information about domain decomposition choosen CALL decomp_info_init(Nx,Ny,Nz,decomp) ! initialise FFT library CALL decomp_2d_fft_init ALLOCATE(u(decomp%xst(1):decomp%xen(1),& decomp%xst(2):decomp%xen(2),& decomp%xst(3):decomp%xen(3)),& v(decomp%zst(1):decomp%zen(1),& decomp%zst(2):decomp%zen(2),& decomp%zst(3):decomp%zen(3)),& pot(decomp%xst(1):decomp%xen(1),& decomp%xst(2):decomp%xen(2),& decomp%xst(3):decomp%xen(3)),& kx(1:Nx),ky(1:Ny),kz(1:Nz),& x(1:Nx),y(1:Ny),z(1:Nz),& time(1:1+Nt/plotgap),stat=AllocateStatus) IF (AllocateStatus .ne. 0) STOP IF (myid.eq.0) THEN PRINT *,'allocated space' END IF modescalereal=1.0d0/REAL(Nx,KIND(0d0)) modescalereal=modescalereal/REAL(Ny,KIND(0d0)) modescalereal=modescalereal/REAL(Nz,KIND(0d0)) ! setup fourier frequencies and grid points DO i=1,1+Nx/2 kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx END DO kx(1+Nx/2)=0.0d0 DO i = 1,Nx/2 -1 kx(i+1+Nx/2)=-kx(1-i+Nx/2) END DO DO i=1,Nx x(i)=(-1.0d0 + 2.0d0*REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)))*pi*Lx END DO DO j=1,1+Ny/2 ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly END DO ky(1+Ny/2)=0.0d0 DO j = 1,Ny/2 -1 ky(j+1+Ny/2)=-ky(1-j+Ny/2) END DO DO j=1,Ny y(j)=(-1.0d0 + 2.0d0*REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)))*pi*Ly END DO DO k=1,1+Nz/2 kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz END DO kz(1+Nz/2)=0.0d0 DO k = 1,Nz/2 -1 kz(k+1+Nz/2)=-kz(1-k+Nz/2) END DO DO k=1,Nz z(k)=(-1.0d0 + 2.0d0*REAL(k-1,kind(0d0))/REAL(Nz,kind(0d0)))*pi*Lz END DO IF (myid.eq.0) THEN PRINT *,'Setup grid and fourier frequencies' END IF DO k=decomp%xst(3),decomp%xen(3) DO j=decomp%xst(2),decomp%xen(2) DO i=decomp%xst(1),decomp%xen(1) u(i,j,k)=exp(-1.0d0*(x(i)**2 +y(j)**2+z(k)**2)) END DO END DO END DO ! write out using 2DECOMP&FFT MPI-IO routines nameconfig='./data/u' plotnum=0 WRITE(numberfile,'(i0)') 10000000+plotnum ind=index(nameconfig,' ') -1 nameconfig=nameconfig(1:ind)//numberfile ind=index(nameconfig,' ') -1 nameconfig=nameconfig(1:ind)//'.datbin' CALL decomp_2d_write_one(1,u,nameconfig) CALL decomp_2d_fft_3d(u,v,DECOMP_2D_FFT_FORWARD) IF (myid.eq.0) THEN PRINT *,'Got initial data, starting timestepping' END IF CALL system_clock(start,count_rate) time(1)=0 DO n=1,Nt ! Use Strang splitting DO k=decomp%zst(3),decomp%zen(3) DO j=decomp%zst(2),decomp%zen(2) DO i=decomp%zst(1),decomp%zen(1) v(i,j,k)=exp(0.50d0*dt*& (kz(k)*kz(k) + kx(i)*kx(i) + ky(j)*ky(j))& *cmplx(0.0d0,1.0d0))*v(i,j,k) END DO END DO END DO CALL decomp_2d_fft_3d(v,u,DECOMP_2D_FFT_BACKWARD) DO k=decomp%xst(3),decomp%xen(3) DO j=decomp%xst(2),decomp%xen(2) DO i=decomp%xst(1),decomp%xen(1) u(i,j,k)=u(i,j,k)*modescalereal pot(i,j,k)=Es*u(i,j,k)*conjg(u(i,j,k)) u(i,j,k)=exp(cmplx(0.0d0,-1.0d0)*dt*pot(i,j,k))*u(i,j,k) END DO END DO END DO CALL decomp_2d_fft_3d(u,v,DECOMP_2D_FFT_FORWARD) DO k=decomp%zst(3),decomp%zen(3) DO j=decomp%zst(2),decomp%zen(2) DO i=decomp%zst(1),decomp%zen(1) v(i,j,k)=exp(dt*0.5d0*& (kx(i)*kx(i) +ky(j)*ky(j) +kz(k)*kz(k))& *cmplx(0.0d0,1.0d0))*v(i,j,k) END DO END DO END DO IF (mod(n,plotgap)==0) THEN time(1+n/plotgap)=n*dt IF (myid.eq.0) THEN PRINT *,'time',n*dt END IF CALL decomp_2d_fft_3d(v,u,DECOMP_2D_FFT_BACKWARD) u=u*modescalereal nameconfig='./data/u' plotnum=plotnum+1 WRITE(numberfile,'(i0)') 10000000+plotnum ind=index(nameconfig,' ') -1 nameconfig=nameconfig(1:ind)//numberfile ind=index(nameconfig,' ') -1 nameconfig=nameconfig(1:ind)//'.datbin' ! write out using 2DECOMP&FFT MPI-IO routines CALL decomp_2d_write_one(1,u,nameconfig) END IF END DO IF (myid.eq.0) THEN PRINT *,'Finished time stepping' END IF CALL system_clock(finish,count_rate) IF (myid.eq.0) THEN PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),'for execution' END IF IF (myid.eq.0) THEN ! Save times at which output was made in text format nameconfig = './data/tdata.dat' OPEN(unit=11,FILE=nameconfig,status="UNKNOWN") REWIND(11) DO j=1,1+Nt/plotgap WRITE(11,*) time(j) END DO CLOSE(11) ! Save x grid points in text format nameconfig = './data/xcoord.dat' OPEN(unit=11,FILE=nameconfig,status="UNKNOWN") REWIND(11) DO i=1,Nx WRITE(11,*) x(i) END DO CLOSE(11) ! Save y grid points in text format nameconfig = './data/ycoord.dat' OPEN(unit=11,FILE=nameconfig,status="UNKNOWN") REWIND(11) DO j=1,Ny WRITE(11,*) y(j) END DO CLOSE(11) ! Save z grid points in text format nameconfig = './data/zcoord.dat' OPEN(unit=11,FILE=nameconfig,status="UNKNOWN") REWIND(11) DO k=1,Nz WRITE(11,*) z(k) END DO CLOSE(11) PRINT *,'Saved data' END IF ! clean up CALL decomp_2d_fft_finalize CALL decomp_2d_finalize DEALLOCATE(u,v,pot,& kx,ky,kz,x,y,z,& time,stat=AllocateStatus) IF (AllocateStatus .ne. 0) STOP IF (myid.eq.0) THEN PRINT *,'Program execution complete' END IF CALL MPI_FINALIZE(ierr) END PROGRAM main  To compile the code modify the makefile in listing P appropriately for your system. Type make and then you should be ready to run it after it compiles. It is assumed that FFTW is the underlying one dimensional fast Fourier transform library called by 2DECOMP&FFT, so this should be changed if another library is used. Note that you will need to rebuild the program if you want to change the problem size or the runtime. ( P) An example makefile for a parallel Fortran program to solve the 3D nonlinear Schrödinger equation using splitting and 2DECOMP&FFT Code download COMPILER = mpif90 decompdir=../2decomp_fft FLAGS = -O0 DECOMPLIB = -I${decompdir}/include -L${decompdir}/lib -l2decomp_fft LIBS = -L${FFTW_LINK} -lfftw3_threads -lfftw3 -lm
SOURCES = NLSsplitting.f90

NLSsplitting: $(SOURCES)${COMPILER} -o NLSsplitting $(FLAGS)$(SOURCES) $(LIBS)$(DECOMPLIB)

clean:
rm -f *.o
rm -f *.mod
clobber:
rm -f NLSsplitting


## Exercises

1. The ASCII character set requires 7 bits per character and so at least 7 bits are required to store a digit between 0 and 9. A double precision number in IEEE arithmetic requires 64 bits to store a double precision number with approximately 15 decimal digits and approximately a 3 decimal digit exponent. How many bits are required to store a IEEE double precision number? Suppose a file has $10^{6}$  double precision numbers. What is the minimum size of the file if the numbers are stored as IEEE double precision numbers? What is the minimum size of the file if the numbers are stored as characters?
2. Write an MPI code using 2DECOMP&FFT to solve the Gross-Pitaevskii equation in three dimensions.
3. Learn to use either VisIt (https://wci.llnl.gov/codes/visit/) or Paraview (http://www.paraview.org/) and write a script to visualize two and three dimensional output in a manner that is similar to the Matlab codes.