# Parallel Spectral Numerical Methods/Visualization with ParaView CoProcessing

The ParaView CoProcessing plugin allows an application code to be instrumented to connect to a ParaView server in order to execute a visualization pipeline. The pipeline can produce either images in a variety of formats or VTK XML parallel file format data sets.

A user creates a pipeline in the ParaView client using a sample, probably lower resolution, data set. This pipeline is then exported using the CoProcessor plugin as a Python script which will be loaded by the application.

The application developer also needs to write some Fortran/C++ code to convert the application’s data structures into a format ParaView can understand. In the case of completely regular grids, this is straightforward.

# Using ParaView CoProcessing on a Single Computation Node

## ParaView client setup

The first requirement is to have a ParaView client which can export the Python script. This requires building from source as of ParaView 3.14.1. The standard instructions apply with some additional instructions for building the script generator plugin. After building, the first time you launch ParaView, go to Tools, Manage Plugins and set the CoProcessingPlugin to load on startup. After the plugin is loaded, there should be two new menu items: Writers and CoProcessing

### Client build notes:

• Might have to install CMake
• Might have to build Qt
• Might have to turn off building the Manta plugin

## Existing code alterations

The following instructions pertain to the Navier-Stokes CUDA Fortran as tested on NCSA's Forge during the Summer of 2012.

### Simulation code

( A)

!--------------------------------------------------------------------
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution.
!
! Periodic free-slip boundary conditions and Initial conditions:
!       u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
!       v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution (subscript denote derivatives):
!       u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*t/Re)
!       v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!   u_y(x,y,t)=-2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!       v_x(x,y,t)=2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!       omega=v_x-u_y
!
! .. Parameters ..
!  Nx                           = number of modes in x - power of 2 for FFT
!  Ny                           = number of modes in y - power of 2 for FFT
!  nplots                       = number of plots produced
!  plotgap                      = number of timesteps inbetween plots
!  Re                           = dimensionless Renold's number
!  ReInv                        = 1/Re for optimization
!  dt                           = timestep size
!  dtInv                        = 1/dt for optimization
!  tol                          = determines when convergences is reached
!  scalemodes           = 1/(Nx*Ny), scaling after preforming FFTs
!  numthreads           = number of CPUs used in calculation
! .. Scalars ..
!  i                            = loop counter in x direction
!  j                            = loop counter in y direction
!  n                            = loop counter for timesteps direction
!  allocatestatus       = error indicator during allocation
!  time                         = times at which data is saved
!  chg                          = error at each iteration
!  temp                         = used for ordering saved omega
! .. Arrays (gpu) ..
!  u_d                          = velocity in x direction
!  uold_d                               = velocity in x direction at previous timestep
!  v_d                          = velocity in y direction
!  vold_d                               = velocity in y direction at previous timestep
!  omeg_d                               = vorticity     in real space
!  omeghat_d                    = 2D Fourier transform of vorticity
!                                               at next iterate
!  omegoldhat_d         = 2D Fourier transform of vorticity at previous
!                                               iterate
!  nloldhat_d                   = nonlinear term in Fourier space
!                                               at previous iterate
!  omegexact_d          = taylor-green vorticity at
!                                               at final step
!  psihat_d                     = 2D Fourier transform of streamfunction
!                                               at next iteration
!  omegcheck_d          = store of vorticity at previous iterate
!  temp1_d/temp2_d              = reusable complex/real space used for
!                                               calculations. This reduces number of
!                                               arrays stored.
! .. Vectors (gpu) ..
!  kx_d                         = fourier frequencies in x direction
!  ky_d                         = fourier frequencies in y direction
!  x_d                          = x locations
!  y_d                          = y locations
!  name_config          = array to store filename for data to be saved
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! This program has not been fully optimized.
!--------------------------------------------------------------------
module precision
! Precision control

integer, parameter, public :: Single = kind(0.0) ! Single precision
integer, parameter, public :: Double = kind(0.0d0) ! Double precision

integer, parameter, public :: fp_kind = Double
!integer, parameter, public :: fp_kind = Single

end module precision

module cufft

integer, public :: CUFFT_FORWARD = -1
integer, public :: CUFFT_INVERSE = 1
integer, public :: CUFFT_R2C = Z'2a' ! Real to Complex (interleaved)
integer, public :: CUFFT_C2R = Z'2c' ! Complex (interleaved) to Real
integer, public :: CUFFT_C2C = Z'29' ! Complex to Complex, interleaved
integer, public :: CUFFT_D2Z = Z'6a' ! Double to Double-Complex
integer, public :: CUFFT_Z2D = Z'6c' ! Double-Complex to Double
integer, public :: CUFFT_Z2Z = Z'69' ! Double-Complex to Double-Complex

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! cufftPlan2d(cufftHandle *plan, int nx,int ny, cufftType type)
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

interface cufftPlan2d
subroutine cufftPlan2d(plan, nx, ny, type) bind(C,name='cufftPlan2d')
use iso_c_binding
integer(c_int):: plan
integer(c_int),value:: nx, ny, type
end subroutine cufftPlan2d
end interface

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! cufftDestroy(cufftHandle plan)
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

interface cufftDestroy
subroutine cufftDestroy(plan) bind(C,name='cufftDestroy')
use iso_c_binding
integer(c_int),value:: plan
end subroutine cufftDestroy
end interface

interface cufftExecD2Z
subroutine cufftExecD2Z(plan, idata, odata) &
& bind(C,name='cufftExecD2Z')
use iso_c_binding
use precision
integer(c_int),  value  :: plan
real(fp_kind),   device :: idata(1:nx,1:ny)
complex(fp_kind),device :: odata(1:nx,1:ny)
end subroutine cufftExecD2Z
end interface

interface cufftExecZ2D
subroutine cufftExecZ2D(plan, idata, odata) &
& bind(C,name='cufftExecZ2D')
use iso_c_binding
use precision
integer(c_int),value:: plan
complex(fp_kind),device:: idata(1:nx,1:ny)
real(fp_kind),device :: odata(1:nx,1:ny)
end subroutine cufftExecZ2D
end interface

end module cufft

PROGRAM main
use precision
use cufft

! coprocessing:

! declare variables
IMPLICIT NONE
INTEGER(kind=4), PARAMETER            ::  Nx=1024
INTEGER(kind=4), PARAMETER            ::  Ny=1024
!INTEGER(kind=8)                                              ::
REAL(fp_kind), PARAMETER              ::  dt=0.00025d0
REAL(fp_kind), PARAMETER              ::  dtInv=1.0d0/REAL(dt,kind(0d0))
REAL(fp_kind), PARAMETER      &
::  pi=3.14159265358979323846264338327950288419716939937510d0
REAL(fp_kind), PARAMETER              ::  Re=5000.0d0
REAL(fp_kind), PARAMETER              ::      ReInv=1.0d0/REAL(Re,kind(0d0))
REAL(fp_kind), PARAMETER              ::      tol=0.1d0**10

!coprocessing: temp is used as an int, runtime crashes on
! floating temp used in write format string on line 376
!REAL(fp_kind)                                        ::  scalemodes,chg,temp=10000000
real(fp_kind) :: scalemodes, chg
integer(kind=4) :: temp=100000000

INTEGER(kind=4), PARAMETER            ::      nplots=400, plotgap=200
REAL(fp_kind),DIMENSION(:), ALLOCATABLE               ::  x,y
REAL(fp_kind),DIMENSION(:,:), ALLOCATABLE     ::      omeg,omegexact
INTEGER(kind=4)                               ::  i,j,n,t, AllocateStatus
INTEGER(kind=4)                               ::  planz2d,pland2z, kersize
!variables used for saving data and timing
INTEGER(kind=4)                               ::      start, finish, count_rate,count, iol
CHARACTER*100                                 ::  name_config
! Declare variables for GPU
REAL(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE            ::      u_d,v_d,&
omegcheck_d,omeg_d,&
nl_d, temp2_d, omegexact_d
COMPLEX(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE :: omegoldhat_d, nloldhat_d,&
omeghat_d, nlhat_d, psihat_d,&
temp1_d
COMPLEX(fp_kind), DEVICE, DIMENSION(:), ALLOCATABLE           ::  kx_d,ky_d
REAL(kind=8),DEVICE, DIMENSION(:), ALLOCATABLE                ::  x_d,y_d

kersize=min(Nx,256)
PRINT *,'Program starting'
PRINT *,'Grid:',Nx,'X',Ny
PRINT *,'dt:',dt
ALLOCATE(x(1:Nx),y(1:Ny),omeg(1:Nx,1:Ny),omegexact(1:Nx,1:Ny),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Allocated CPU arrays'
ALLOCATE(kx_d(1:Nx/2+1),ky_d(1:Ny),x_d(1:Nx),y_d(1:Ny),u_d(1:Nx,1:Ny),&
v_d(1:Nx,1:Ny),omeg_d(1:Nx,1:Ny),&
omegexact_d(1:Nx,1:Ny),temp2_d(1:Nx,1:Ny),&
omegoldhat_d(1:Nx/2+1,1:Ny),nloldhat_d(1:Nx/2+1,1:Ny),&
omegcheck_d(1:Nx,1:Ny),omeghat_d(1:Nx/2+1,1:Ny),nl_d(1:Nx,1:Ny),&
nlhat_d(1:Nx/2+1,1:Ny), psihat_d(1:Nx/2+1,1:Ny),temp1_d(1:Nx/2+1,1:Ny),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Allocated GPU arrays'

CALL cufftPlan2D(pland2z,nx,ny,CUFFT_D2Z)
CALL cufftPlan2D(planz2d,nx,ny,CUFFT_Z2D)
PRINT *,'Setup FFTs'

! setup fourier frequencies
!$cuf kernel do <<< *,* >>> DO i=1,Nx/2+1 kx_d(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind=fp_kind) END DO kx_d(1+Nx/2)=0.0d0 !$cuf kernel do <<< *,* >>>
DO i=1,Nx
x_d(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind=fp_kind)
END DO
!$cuf kernel do <<< *,* >>> DO j=1,Ny/2+1 ky_d(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind=fp_kind) END DO ky_d(1+Ny/2)=0.0d0 !$cuf kernel do <<< *,* >>>
DO j = 1,Ny/2 -1
ky_d(j+1+Ny/2)=-ky_d(1-j+Ny/2)
END DO
!$cuf kernel do <<< *, * >>> DO j=1,Ny y_d(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind=fp_kind) END DO scalemodes=1.0d0/REAL(Nx*Ny,kind=fp_kind) PRINT *,'Setup grid and fourier frequencies' !initial data !$cuf kernel do <<<  *,*  >>>
DO j=1,Ny
DO i=1,Nx
u_d(i,j)=sin(2.0d0*pi*x_d(i))*cos(2.0d0*pi*y_d(j))
END DO
END DO
!$cuf kernel do <<< *,* >>> DO j=1,Ny DO i=1,Nx v_d(i,j)=-cos(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j)) END DO END DO !$cuf kernel do <<<  *,*  >>>
DO j=1,Ny
DO i=1,Nx
omeg_d(i,j)=4.0d0*pi*sin(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))+0.01d0*cos(2.0d0*pi*y_d(j))
END DO
END DO

CALL cufftExecD2Z(pland2z,omeg_d,omeghat_d)

!$cuf kernel do <<< *,* >>> DO j=1,Ny DO i=1,Nx/2+1 temp1_d(i,j)=omeghat_d(i,j)*kx_d(i) END DO END DO CALL cufftExecZ2D(planz2d,temp1_d,temp2_d) !first part nonlinear term in real space !$cuf kernel do <<<  *,*  >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
END DO
END DO

!$cuf kernel do <<< *,* >>> DO j=1,Ny DO i=1,Nx/2+1 temp1_d(i,j)=omeghat_d(i,j)*ky_d(j) END DO END DO CALL cufftExecZ2D(planz2d,temp1_d,temp2_d) !$cuf kernel do <<<  *,*  >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
END DO
END DO

CALL cufftExecD2Z(pland2z,nl_d,nlhat_d)

omegcheck_d=omeg_d
PRINT *,'Got initial data, starting timestepping'
CALL system_clock(start,count_rate)

! coprocessing: simulation loop
! coprocessing: before entering, initialze the ParaView coprocessor
! coprocessing: this is from FortranAdaptorAPI.cxx
! coprocessing: it takes the name of the pipeline script you created
! coprocessing: in the ParaView client and the name's length in chars
call coprocessorinitialize("pipeline.py", 11 )
DO t=1,nplots
DO n=1,plotgap
chg=1.0d0
nloldhat_d=nlhat_d
omegoldhat_d=omeghat_d
DO WHILE (chg>tol)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>> DO j=1,Ny DO i=1,Nx/2+1 omeghat_d(i,j)=((dtInv+0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))& *omegoldhat_d(i,j) - 0.5d0*(nloldhat_d(i,j)+nlhat_d(i,j)))& /(dtInv-0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j))) END DO END DO !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
psihat_d(i,j)=-omeghat_d(i,j)/(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)+0.10d0**14)
END DO
END DO
CALL cufftExecZ2D(planz2d,omeghat_d,omeg_d)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>> DO j=1,Ny DO i=1,Nx/2+1 temp1_d(i,j)=-psihat_d(i,j)*kx_d(i)*scalemodes END DO END DO CALL cufftExecZ2D(planz2d,temp1_d,v_d) !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=psihat_d(i,j)*ky_d(j)*scalemodes
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,u_d)

!$cuf kernel do(2) <<< (2,*), (kersize,1) >>> DO j=1,Ny DO i=1,Nx/2+1 temp1_d(i,j)=omeghat_d(i,j)*kx_d(i) END DO END DO CALL cufftExecZ2D(planz2d,temp1_d,temp2_d) !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
END DO
END DO

!$cuf kernel do(2) <<< (2,*), (kersize,1) >>> DO j=1,Ny DO i=1,Nx/2+1 temp1_d(i,j)=omeghat_d(i,j)*ky_d(j) END DO END DO CALL cufftExecZ2D(planz2d,temp1_d,temp2_d) !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
END DO
END DO

CALL cufftExecD2Z(pland2z,nl_d,nlhat_d)

chg=0.0d0

## Run the simulation

Run as you would the uninstrumented version.

## Convert the images to an animation

There are a variety of ways to do this, if ffmpeg is available ffmpeg -sameq -i path/to/images/image%04d.png myanimation.mpg

or if writing out the VTK data sets, one can work with those in ParaView directly.

### dt and effect on animation time.

Smooth video is ~30 fps or dt = 0.033… If you write out a larger dt as VTK data sets, then one can do linear time interpolation in ParaView. If writing out a larger dt as images to stitch into an mpg, the options available created much reduced video quality.

## ParaView Server setup

• Forge did not have X support, so an offscreen Mesa (OSMesa) build was necessary.
• Nautilus supports X and so can render with the supplied OpenGL libraries.

# Summary of Changes for New API

The coprocessing API has had some minor changes since ParaView 3.14.1. The following applies for ParaView 3.98+ and Catalyst 1.0 alpha+. These were tested on Beacon in offload mode at NICS on August 20, 2013.

The code changes are:

• In the simulation code, coprocessorinitialize is now coprocessorinitializewithpython, the arguments are the same.
• In the C++ data set helper code, change from ParaViewCoprocessing namespace to vtkCPPythonAdaptorAPI namespace.

The libraries to link against have also changed, for a ParaView 4.0.1 build these are:

• -lvtkPVCatalystPython26D-pv4.0
• -lvtkPVCatalyst-pv4.0
• -lvtkCommonCore-pv4.0
• -lvtkCommonDataModel-pv4.0
• -lvtkPVPythonCatalyst-pv4.0

## Example with New API

( A)

Simulation: navierstokes.f90 -- only change is the initialization function name.

!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution.
!
! Periodic free-slip boundary conditions and Initial conditions:
! u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
! v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution:
! u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*nu*t)
! v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*nu*t)
!
! .. Parameters ..
!  Nx    = number of modes in x - power of 2 for FFT
!  Ny    = number of modes in y - power of 2 for FFT
!  Nt    = number of timesteps to take
!  Tmax    = maximum simulation time
!  FFTW_IN_PLACE  = value for FFTW input
!  FFTW_MEASURE  = value for FFTW input
!  FFTW_EXHAUSTIVE  = value for FFTW input
!  FFTW_PATIENT  = value for FFTW input
!  FFTW_ESTIMATE  = value for FFTW input
!  FFTW_FORWARD     = value for FFTW input
!  FFTW_BACKWARD = value for FFTW input
!  pi = 3.14159265358979323846264338327950288419716939937510d0
!  mu    = viscosity
!  rho    = density
! .. Scalars ..
!  i    = loop counter in x direction
!  j    = loop counter in y direction
!  n    = loop counter for timesteps direction
!  allocatestatus = error indicator during allocation
!  count   = keep track of information written to disk
!  iol    = size of array to write to disk
!  start   = variable to record start time of program
!  finish   = variable to record end time of program
!  count_rate  = variable for clock count rate
!  planfx   = Forward 1d fft plan in x
!  planbx   = Backward 1d fft plan in x
!  planfy   = Forward 1d fft plan in y
!  planby   = Backward 1d fft plan in y
!  dt    = timestep
! .. Arrays ..
!  u    = velocity in x direction
!  uold    = velocity in x direction at previous timestep
!  v    = velocity in y direction
!  vold    = velocity in y direction at previous timestep
!  u_y    = y derivative of velocity in x direction
!  v_x    = x derivative of velocity in y direction
!  omeg    = vorticity in real space
!  omegold   = vorticity in real space at previous
!      iterate
!  omegcheck  = store of vorticity at previous iterate
!  omegoldhat  = 2D Fourier transform of vorticity at previous
!      iterate
!  omegoldhat_x  = x-derivative of vorticity in Fourier space
!      at previous iterate
!  omegold_x  = x-derivative of vorticity in real space
!      at previous iterate
!  omegoldhat_y  = y-derivative of vorticity in Fourier space
!      at previous iterate
!  omegold_y  = y-derivative of vorticity in real space
!       at previous iterate
!  nlold   = nonlinear term in real space
!      at previous iterate
!  nloldhat   = nonlinear term in Fourier space
!      at previous iterate
!  omeghat   = 2D Fourier transform of vorticity
!      at next iterate
!  omeghat_x  = x-derivative of vorticity in Fourier space
!      at next timestep
!  omeghat_y  = y-derivative of vorticity in Fourier space
!      at next timestep
!  omeg_x   = x-derivative of vorticity in real space
!      at next timestep
!  omeg_y   = y-derivative of vorticity in real space
!      at next timestep
! .. Vectors ..
!  kx    = fourier frequencies in x direction
!  ky    = fourier frequencies in y direction
!  kxx    = square of fourier frequencies in x direction
!  kyy    = square of fourier frequencies in y direction
!  x    = x locations
!  y    = y locations
!  time    = times at which save data
!  name_config  = array to store filename for data to be saved
!  fftfx   = array to setup x Fourier transform
!  fftbx   = array to setup y Fourier transform
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! FFTW3  -- Fast Fourier Transform in the West Library
!   (http://www.fftw.org/)
! declare variables
PROGRAM main
IMPLICIT NONE
INTEGER(kind=4), PARAMETER  ::  Nx=256
INTEGER(kind=4), PARAMETER  ::  Ny=256
REAL(kind=8), PARAMETER  ::  dt=0.00125
REAL(kind=8), PARAMETER &
::  pi=3.14159265358979323846264338327950288419716939937510
REAL(kind=8), PARAMETER  ::  rho=1.0d0
REAL(kind=8), PARAMETER  ::  mu=1.0d0
REAL(kind=8), PARAMETER  :: tol=0.1d0**10
REAL(kind=8)    :: chg
INTEGER(kind=4), PARAMETER :: nplots=50
REAL(kind=8), DIMENSION(:), ALLOCATABLE   :: time
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE  ::  kx,kxx
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE  ::  ky,kyy
REAL(kind=8), DIMENSION(:), ALLOCATABLE   ::  x
REAL(kind=8), DIMENSION(:), ALLOCATABLE   ::  y
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE :: &
u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg,&
omegoldhat, omegoldhat_x, omegold_x,&
omegoldhat_y, omegold_y, nlold, nloldhat,&
omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y

real(kind=8), dimension(:,:), allocatable :: dump

REAL(kind=8),DIMENSION(:,:), ALLOCATABLE :: uexact_y,vexact_x,omegexact
INTEGER(kind=4)        ::  i,j,k,n, allocatestatus, count, iol
INTEGER(kind=4)         :: start, finish, count_rate
INTEGER(kind=4), PARAMETER         ::  FFTW_IN_PLACE = 8, FFTW_MEASURE = 0, &
FFTW_EXHAUSTIVE = 8, FFTW_PATIENT = 32,    &
FFTW_ESTIMATE = 64
INTEGER(kind=4),PARAMETER          ::  FFTW_FORWARD = -1, FFTW_BACKWARD=1
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE ::  fftfx,fftbx
INTEGER(kind=8)        ::  planfxy,planbxy
CHARACTER*100         ::  name_config

CALL system_clock(start,count_rate)
ALLOCATE(time(1:nplots),kx(1:Nx),kxx(1:Nx),ky(1:Ny),kyy(1:Ny),x(1:Nx),y(1:Ny),&
u(1:Nx,1:Ny),uold(1:Nx,1:Ny),v(1:Nx,1:Ny),vold(1:Nx,1:Ny),u_y(1:Nx,1:Ny),&
v_x(1:Nx,1:Ny),omegold(1:Nx,1:Ny),omegcheck(1:Nx,1:Ny), omeg(1:Nx,1:Ny),&
omegoldhat(1:Nx,1:Ny),omegoldhat_x(1:Nx,1:Ny), omegold_x(1:Nx,1:Ny), &
omegoldhat_y(1:Nx,1:Ny),omegold_y(1:Nx,1:Ny), nlold(1:Nx,1:Ny), nloldhat(1:Nx,1:Ny),&
omeghat(1:Nx,1:Ny), omeghat_x(1:Nx,1:Ny), omeghat_y(1:Nx,1:Ny), omeg_x(1:Nx,1:Ny),&
omeg_y(1:Nx,1:Ny), nl(1:Nx,1:Ny), nlhat(1:Nx,1:Ny), psihat(1:Nx,1:Ny), &
psihat_x(1:Nx,1:Ny), psi_x(1:Nx,1:Ny), psihat_y(1:Nx,1:Ny), psi_y(1:Nx,1:Ny),&
uexact_y(1:Nx,1:Ny), vexact_x(1:Nx,1:Ny), omegexact(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
fftbx(1:Nx,1:Ny),dump(1:Nx,1:Ny),stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'allocated space'

! set up ffts
CALL dfftw_plan_dft_2d_(planfxy,Nx,Ny,fftfx(1:Nx,1:Ny),fftbx(1:Nx,1:Ny),&
FFTW_FORWARD,FFTW_EXHAUSTIVE)
CALL dfftw_plan_dft_2d_(planbxy,Nx,Ny,fftbx(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
FFTW_BACKWARD,FFTW_EXHAUSTIVE)

! setup fourier frequencies in x-direction
DO i=1,1+Nx/2
kx(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
DO i=1,Nx
kxx(i)=kx(i)*kx(i)
END DO
DO i=1,Nx
x(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0))
END DO

! setup fourier frequencies in y-direction
DO j=1,1+Ny/2
ky(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))
END DO
ky(1+Ny/2)=0.0d0
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
DO j=1,Ny
kyy(j)=ky(j)*ky(j)
END DO
DO j=1,Ny
y(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0))
END DO
PRINT *,'Setup grid and fourier frequencies'

DO j=1,Ny
DO i=1,Nx
u(i,j)=sin(2.0d0*pi*x(i))*cos(2.0d0*pi*y(j))
v(i,j)=-cos(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
u_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
v_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
omeg(i,j)=v_x(i,j)-u_y(i,j)
END DO
END DO

! Vorticity to Fourier Space
CALL dfftw_execute_dft_(planfxy,omeg(1:Nx,1:Ny),omeghat(1:Nx,1:Ny))

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!Initial nonlinear term !!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! obtain \hat{\omega}_x^{n,k}
DO j=1,Ny
omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx)
END DO
! obtain \hat{\omega}_y^{n,k}
DO i=1,Nx
omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
END DO
! convert to real space
CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny))
CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
! compute nonlinear term in real space
DO j=1,Ny
nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO
CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
time(1)=0.0d0
PRINT *,'Got initial data, starting timestepping'

! New Coprocessor API:
call coprocessorinitializewithpython("pipeline.py", 11)

DO n=1,nplots
chg=1
! save old values
uold(1:Nx,1:Ny)=u(1:Nx,1:Ny)
vold(1:Nx,1:Ny)=v(1:Nx,1:Ny)
omegold(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)
omegcheck(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)
omegoldhat(1:Nx,1:Ny)=omeghat(1:Nx,1:Ny)
nloldhat(1:Nx,1:Ny)=nlhat(1:Nx,1:Ny)
DO WHILE (chg>tol)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!nonlinear fixed (n,k+1)!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! obtain \hat{\omega}_x^{n+1,k}
DO j=1,Ny
omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx)
END DO
! obtain \hat{\omega}_y^{n+1,k}
DO i=1,Nx
omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
END DO
! convert back to real space
CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny))
CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
! calculate nonlinear term in real space
DO j=1,Ny
nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO
! convert back to fourier
CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

! obtain \hat{\omega}^{n+1,k+1} with Crank Nicolson timestepping
DO j=1,Ny
omeghat(1:Nx,j)=( (1.0d0/dt+0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))&
*omegoldhat(1:Nx,j) - 0.5d0*(nloldhat(1:Nx,j)+nlhat(1:Nx,j)))/&
(1.0d0/dt-0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))
END DO

! calculate \hat{\psi}^{n+1,k+1}
DO j=1,Ny
psihat(1:Nx,j)=-omeghat(1:Nx,j)/(kxx(1:Nx)+kyy(j))
END DO
psihat(1,1)=0.0d0
psihat(Nx/2+1,Ny/2+1)=0.0d0
psihat(Nx/2+1,1)=0.0d0
psihat(1,Ny/2+1)=0.0d0

! obtain \psi_x^{n+1,k+1} and \psi_y^{n+1,k+1}
DO j=1,Ny
psihat_x(1:Nx,j)=psihat(1:Nx,j)*kx(1:Nx)
END DO
CALL dfftw_execute_dft_(planbxy,psihat_x(1:Nx,1:Ny),psi_x(1:Nx,1:Ny))
DO i=1,Nx
psihat_y(i,1:Ny)=psihat(i,1:Ny)*ky(1:Ny)
END DO
CALL dfftw_execute_dft_(planbxy,psihat_y(1:Ny,1:Ny),psi_y(1:Ny,1:Ny))
DO j=1,Ny
psi_x(1:Nx,j)=psi_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
psi_y(1:Nx,j)=psi_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO

! obtain \omega^{n+1,k+1}
CALL dfftw_execute_dft_(planbxy,omeghat(1:Nx,1:Ny),omeg(1:Nx,1:Ny))
DO j=1,Ny
omeg(1:Nx,j)=omeg(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO

! obtain u^{n+1,k+1} and v^{n+1,k+1} using stream function (\psi) in real space
DO j=1,Ny
u(1:Nx,j)=psi_y(1:Nx,j)
v(1:Nx,j)=-psi_x(1:Nx,j)
END DO

! check for convergence
chg=maxval(abs(omeg-omegcheck))
! saves {n+1,k+1} to {n,k} for next iteration
omegcheck=omeg
END DO
time(n+1)=time(n)+dt
PRINT *,'TIME ',time(n+1)
! not sure if there's a way to cleanly pass complex to C++ code
do j = 1, Ny
do i = 1, Nx
dump(i,j) = real(omeg(i,j), kind(0d0))
end do
end do
call nsadaptor(Nx, Ny, 1, n, time(n), dump)
END DO

call coprocessorfinalize()

DO j=1,Ny
DO i=1,Nx
uexact_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
exp(-8.0d0*mu*(pi**2)*nplots*dt)
vexact_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
exp(-8.0d0*mu*(pi**2)*nplots*dt)
omegexact(i,j)=vexact_x(i,j)-uexact_y(i,j)
END DO
END DO

name_config = 'omegafinal.datbin'
INQUIRE(iolength=iol) omegexact(1,1)
OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol)
count = 1
DO j=1,Ny
DO i=1,Nx
WRITE(11,rec=count) REAL(omeg(i,j),KIND(0d0))
count=count+1
END DO
END DO
CLOSE(11)

name_config = 'omegaexactfinal.datbin'
OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol)
count = 1
DO j=1,Ny
DO i=1,Nx
WRITE(11,rec=count) omegexact(i,j)
count=count+1
END DO
END DO
CLOSE(11)

name_config = 'xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)

name_config = 'ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)

CALL dfftw_destroy_plan_(planfxy)
CALL dfftw_destroy_plan_(planbxy)
CALL dfftw_cleanup_()

DEALLOCATE(time,kx,kxx,ky,kyy,x,y,&
u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg, &
omegoldhat, omegoldhat_x, omegold_x,&
omegoldhat_y, omegold_y, nlold, nloldhat,&
omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y,&
uexact_y,vexact_x,omegexact, &
fftfx,fftbx,stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Program execution complete'
END PROGRAM main


( A)

! Fortran module for interacting with the ParaView/Catalyst coprocessor
implicit none
public
contains
subroutine nsadaptor(nx, ny, nz, step, time, scalar)
! nx, ny, nz -- grid dimensions
! step       -- current time step
! time       -- current time
! scalar     -- the data
integer, intent(in) :: nx, ny, nz, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:,:), intent(in) :: scalar
integer :: flag

! check if coprocessing this step
if (flag .ne. 0) then
! coprocessing requested
! check if grid exists
call needtocreategrid(flag)

if (flag .ne. 0) then
! grid needed
! defined by application developer
call createcpimagedata(nx, ny, nz)
end if

! also defined by application dev

call coprocess()
end if

return


( A)

C++ VTK data helper: PointBasedDataSet.cxx -- Changes are a new header and a new namespace. Notice also the use of vtkSmartPointer instead of raw pointer handling. Code download

// VTK Data set creator for ParaView/Catalyst Coprocessing.
// Interfaces with a Fortran adaptor to get data from a Fortran
// simulation.

// Fortran specific header for API called from Fortran glue module

// Adaptor specific headers, will need to change if simulation data structure
// changes. This should available from the ParaView install include directory
// if you chose to install development files during configuration.
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"

#include <string>
#include <iostream>
// Names are mangled with trailing underscore for linker visibility, along with
// using extern "C"

// Creates the vtkDataSet container for the CoProcessor. The simulation data
// must be compatible.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
vtkGenericWarningMacro("Unable to access coprocessor data.");
return;
}

vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();

// Note: expects Fortran indexing
img->SetExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);

}

// Add data to the container.
extern "C" void addfield_(double* simdata, char* name)
{

vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}

std::string fieldPythonName = name;

if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetName(name);
field->SetArray(simdata, img->GetNumberOfPoints(), 1);
}
}


# Using ParaView CoProcessing with MPI and Domain Decomposition

The following notes developed while working on NS3D-MPI, NavierStokes3DfftIMR.f90, KgSemiImp3D.f90, and NSLsplitting.f90 on Nautilus at NICS.

## Nodal interpretation

When working with single CPU node, non-partioned data, the simulation arrays can be passed directly to ParaView and added to the pipeline using the vtkDataSet::GetPointData()->AddArray() method. The vtkImageData’s points then correspond exactly to the simulation’s mesh nodes. This vtkImageData will have an implied cellular structure, with each cell corner being one of the simulation nodes.

Without special altering for MPI this code will run, however ParaView won’t be able to create the cells whose corners are on differet ranks. The .pvti will give an error when loaded that the extents are incorrect. The individual .vtis can be loaded to see that this is indeed the case and that there will be cellular gaps between the pencils.

E.g., say the array is dimension(20,10,10) and has been decomposed into 2 pencils, (1:10,:,:) and (11:20,:,:). ParaView considers the data for (10:11,:,:) to be missing and a gap will be present.

## Working with point halos

Halos provide one way to fill in the gaps. The 2decomp&fft halo api creates arrays containing the pencil data surrounded by halo elements. E.g., say a pencil is (10:20,10:20,:), the halo array is then (9:21,9:21,:). Using the halos raises some issues:

• Halo elements can extend outside of the computational domain and these will contain garbage.
• Not every pencil needs to send the same halo elements to the coprocessor. This requires additional decision logic in the program.
• From the C++ point of view, a halo array section will not be densely packed, and therefore contain garbage.

### Code changes in simulation code

( D)

• Added 1D arrays for passing packed arrays to C++.
• Added branching for which halo elements to pass based on pencil extents.
1. If pencil furthest from index origin, send just the pencil, no halo elements.
2. Else if pencil is against either furthest extent, but not both, send the halo elements furthest in the other extent.
3. Else send the halo elements furthest in both extents.
• Added reshaping of the halo sections. Arrays are passed by reference from F90 to C++. C++ doesn’t understand F90 array sections and will just read incrementally from the reference (this is an issue with any array sectioning, not just halos). One way around this is to reshape the section as a 1D array, this will make the desired data contiguous in memory.
PROGRAM main
!-----------------------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 3D incompressible Navier-Stokes
! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
! Implicit Midpoint rule timestepping. The numerical solution is compared to
! an exact solution reported by Shapiro
!
! Analytical Solution:
!       u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
!       v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
!       w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
!
! .. Parameters ..
!  Nx                           = number of modes in x - power of 2 for FFT
!  Ny                           = number of modes in y - power of 2 for FFT
!  Nz                           = number of modes in z - power of 2 for FFT
!  Nt                           = number of timesteps to take
!  Tmax                         = maximum simulation time
!  FFTW_IN_PLACE        = value for FFTW input
!  FFTW_MEASURE         = value for FFTW input
!  FFTW_EXHAUSTIVE      = value for FFTW input
!  FFTW_PATIENT         = value for FFTW input
!  FFTW_ESTIMATE        = value for FFTW input
!  FFTW_FORWARD     = value for FFTW input
!  FFTW_BACKWARD        = value for FFTW input
!  pi = 3.14159265358979323846264338327950288419716939937510d0
!  Re                           = Reynolds number
! .. Scalars ..
!  i                            = loop counter in x direction
!  j                            = loop counter in y direction
!  k                            = loop counter in z direction
!  n                            = loop counter for timesteps direction
!  allocatestatus       = error indicator during allocation
!  count                        = keep track of information written to disk
!  iol                          = size of array to write to disk
!  start                        = variable to record start time of program
!  finish                       = variable to record end time of program
!  count_rate           = variable for clock count rate
!  planfxyz                     = Forward 3d fft plan
!  planbxyz                     = Backward 3d fft plan
!  dt                           = timestep
! .. Arrays ..
!  u                            = velocity in x direction
!  v                            = velocity in y direction
!  w                            = velocity in z direction
!  uold                         = velocity in x direction at previous timestep
!  vold                         = velocity in y direction at previous timestep
!  wold                         = velocity in z direction at previous timestep
!  ux                           = x derivative of velocity in x direction
!  uy                           = y derivative of velocity in x direction
!  uz                           = z derivative of velocity in x direction
!  vx                           = x derivative of velocity in y direction
!  vy                           = y derivative of velocity in y direction
!  vz                           = z derivative of velocity in y direction
!  wx                           = x derivative of velocity in z direction
!  wy                           = y derivative of velocity in z direction
!  wz                           = z derivative of velocity in z direction
!  uxold                        = x derivative of velocity in x direction
!  uyold                        = y derivative of velocity in x direction
!  uzold                        = z derivative of velocity in x direction
!  vxold                        = x derivative of velocity in y direction
!  vyold                        = y derivative of velocity in y direction
!  vzold                        = z derivative of velocity in y direction
!  wxold                        = x derivative of velocity in z direction
!  wyold                        = y derivative of velocity in z direction
!  wzold                        = z derivative of velocity in z direction
!  utemp                        = temporary storage of u to check convergence
!  vtemp                        = temporary storage of u to check convergence
!  wtemp                        = temporary storage of u to check convergence
!  temp_r                       = temporary storage for untransformed variables
!  uhat                         = Fourier transform of u
!  vhat                         = Fourier transform of v
!  what                         = Fourier transform of w
!  rhsuhatfix           = Fourier transform of righthand side for u for timestepping
!  rhsvhatfix           = Fourier transform of righthand side for v for timestepping
!  rhswhatfix           = Fourier transform of righthand side for w for timestepping
!  nonlinuhat           = Fourier transform of nonlinear term for u
!  nonlinvhat           = Fourier transform of nonlinear term for u
!  nonlinwhat           = Fourier transform of nonlinear term for u
!  phat                         = Fourier transform of nonlinear term for pressure, p
!  temp_c                       = temporary storage for Fourier transforms
!  realtemp                     = Real storage
!
! .. Vectors ..
!  kx                           = fourier frequencies in x direction
!  ky                           = fourier frequencies in y direction
!  kz                           = fourier frequencies in z direction
!  x                            = x locations
!  y                            = y locations
!  z                            = y locations
!  time                         = times at which save data
!  name_config          = array to store filename for data to be saved
!
! REFERENCES
!
! A. Shapiro " The use of an exact solution of the Navier-Stokes equations
! in a validation test of a three-dimensional nonhydrostatic numerical model"
! Monthly Weather Review vol. 121, 2420-2425, (1993).
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
!
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!
!--------------------------------------------------------------------------------
! External routines required
!
! External libraries required
! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
!                       (http://2decomp.org/)

!---------------------------------------------------------------------------------

USE decomp_2d
USE decomp_2d_fft
USE decomp_2d_io
USE MPI
!coprocessing:
IMPLICIT NONE
! declare variables
INTEGER(kind=4), PARAMETER              :: Nx=256
INTEGER(kind=4), PARAMETER              :: Ny=256
INTEGER(kind=4), PARAMETER              :: Nz=256
INTEGER(kind=4), PARAMETER              :: Lx=1
INTEGER(kind=4), PARAMETER              :: Ly=1
INTEGER(kind=4), PARAMETER              :: Lz=1
INTEGER(kind=4), PARAMETER              :: Nt=20
REAL(kind=8), PARAMETER                 :: dt=0.05d0/Nt
REAL(kind=8), PARAMETER                 :: Re=1.0d0
REAL(kind=8), PARAMETER                 :: tol=0.1d0**10
REAL(kind=8), PARAMETER                 :: theta=0.0d0

REAL(kind=8), PARAMETER &
::  pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER         ::      ReInv=1.0d0/REAL(Re,kind(0d0))
REAL(kind=8), PARAMETER         ::  dtInv=1.0d0/REAL(dt,kind(0d0))
REAL(kind=8)                            :: scalemodes,chg,factor
REAL(kind=8), DIMENSION(:), ALLOCATABLE                 :: x, y, z, time,mychg,allchg
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  :: u, v, w,&
ux, uy, uz,&
vx, vy, vz,&
wx, wy, wz,&
uold, uxold, uyold, uzold,&
vold, vxold, vyold, vzold,&
wold, wxold, wyold, wzold,&
utemp, vtemp, wtemp, temp_r

COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE      ::      kx, ky, kz
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  ::      uhat, vhat, what,&
rhsuhatfix, rhsvhatfix,&
rhswhatfix, nonlinuhat,&
nonlinvhat, nonlinwhat,&
phat,temp_c
!coprocessing: added x,y,z component arrays also halos
!coprocessing: halos get allocated inside call to update_halo
REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE     ::  realtemp, realtempx, &
realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo
!coprocessing: added 1D array for passing to C++
real(kind=8), dimension(:), allocatable :: xpass2c, ypass2c, zpass2c
! MPI and 2DECOMP variables
TYPE(DECOMP_INFO)                               ::  decomp
INTEGER(kind=MPI_OFFSET_KIND)                   ::  filesize, disp
INTEGER(kind=4)                                 ::  p_row=0, p_col=0, numprocs, myid, ierr

! variables used for saving data and timing
INTEGER(kind=4)                                 :: count, iol
INTEGER(kind=4)                                 :: i,j,k,n,t,allocatestatus
INTEGER(kind=4)                                 :: ind, numberfile
CHARACTER*100                                   :: name_config
INTEGER(kind=4)                                 :: start, finish, count_rate

! initialisation of 2DECOMP&FFT
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
! do automatic domain decomposition
CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
! get information about domain decomposition choosen
CALL decomp_info_init(Nx,Ny,Nz,decomp)
! initialise FFT library
CALL decomp_2d_fft_init
IF (myid.eq.0) THEN
PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
PRINT *,'dt:',dt
END IF

ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
u(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
v(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
w(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
ux(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
utemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
temp_r(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
kx(1:Nx),ky(1:Ny),kz(1:Nz),&
uhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
vhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
what(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsuhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsvhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhswhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinuhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinvhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinwhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
phat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
temp_c(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
realtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
realtempx(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempy(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempz(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'allocated space'
END IF

! setup fourier frequencies in x-direction
DO i=1,Nx/2+1
kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
ind=1
DO i=-Nx/2,Nx/2-1
x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in y-direction
DO j=1,Ny/2+1
ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly
END DO
ky(1+Ny/2)=0.0d0
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
ind=1
DO j=-Ny/2,Ny/2-1
y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in z-direction
DO k=1,Nz/2+1
kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz
END DO
kz(1+Nz/2)=0.0d0
DO k = 1,Nz/2 -1
kz(k+1+Nz/2)=-kz(1-k+Nz/2)
END DO
ind=1
DO k=-Nz/2,Nz/2-1
z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
ind=ind+1
END DO
scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
IF (myid.eq.0) THEN
PRINT *,'Setup grid and fourier frequencies'
END IF

!initial conditions for Taylor-Green vortex
!       factor=2.0d0/sqrt(3.0d0)
!       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
!               u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
!       END DO; END DO; END DO
!       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
!               v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
!       END DO ; END DO ; END DO
!       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
!               w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
!       END DO ; END DO ; END DO

time(1)=0.0d0
factor=sqrt(3.0d0)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO

CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)

! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)

! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)

! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)

! save initial data
n = 0
!coprocessing: removed savedata calls from coprocessing version
!coprocessing: could have this after the coprocessor initialize and
!coprocessing: followed by a call to the coprocessor to write out the initial data
!coprocessing: image. Otherwise, these are just wasted calls at this spot.
call coprocessorinitialize("pipeline.py", 11)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempx, tempxhalo, level=1)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempy, tempyhalo, level=1)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempz, tempzhalo, level=1)

!coprocessing: Four cases of sending halo data to the coprocessor.
if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
!coprocessing: case 1: pencil furthest from index origin, send just the pencil
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
!coprocessing: case 2: pencil furthest in z, send just upper y halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
!coprocessing: case 3: pencil furthest in y, send just upper z halo
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else
!coprocessing: case 4: send both upper y and upper z halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c,ypass2c,zpass2c)
end if

!start timer
CALL system_clock(start,count_rate)
! coprocessing: Simulation loop starts here
DO n=1,Nt
!fixed point
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
uold(i,j,k)=u(i,j,k)
uxold(i,j,k)=ux(i,j,k)
uyold(i,j,k)=uy(i,j,k)
uzold(i,j,k)=uz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vold(i,j,k)=v(i,j,k)
vxold(i,j,k)=vx(i,j,k)
vyold(i,j,k)=vy(i,j,k)
vzold(i,j,k)=vz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wold(i,j,k)=w(i,j,k)
wxold(i,j,k)=wx(i,j,k)
wyold(i,j,k)=wy(i,j,k)
wzold(i,j,k)=wz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k)
END DO; END DO ; END DO

chg=1
DO WHILE (chg .gt. tol)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
+ky(j)*nonlinvhat(i,j,k)&
+kz(k)*nonlinwhat(i,j,k))&
/(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
END DO; END DO ; END DO

DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO

! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)

! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)

! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)
END DO; END DO ; END DO

CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=u(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=v(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=w(i,j,k)*scalemodes
END DO; END DO ; END DO

mychg(1) =maxval(abs(utemp-u))
mychg(2) =maxval(abs(vtemp-v))
mychg(3) =maxval(abs(wtemp-w))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT *,'chg:',chg
END IF
END DO
time(n+1)=n*dt

!goto 5100
IF (myid.eq.0) THEN
PRINT *,'time',n*dt
END IF

!coprocessing: populating the arrays sent to the coprocessor
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempx, tempxhalo, level=1)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempy, tempyhalo, level=1)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempz, tempzhalo, level=1)

!coprocessing: same cases as for initial conditions.
if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
! case 1: send just the pencil
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
! case 2: send just upper y halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
! case 3: send just upper z halo
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else
! send both upper y and upper z halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c,ypass2c,zpass2c)
end if

END DO
!coprocessing:
call coprocessorfinalize()

CALL system_clock(finish,count_rate)

IF (myid.eq.0) then
PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
END IF

IF (myid.eq.0) THEN
name_config = './data/tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO n=1,1+Nt
WRITE(11,*) time(n)
END DO
CLOSE(11)

name_config = './data/xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)

name_config = './data/ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)

name_config = './data/zcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO k=1,Nz
WRITE(11,*) z(k)
END DO
CLOSE(11)
PRINT *,'Saved data'
END IF

! Calculate error in final numerical solution
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k) -&
(-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k) -&
(0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)-&
(cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
mychg(1) = maxval(abs(utemp))
mychg(2) = maxval(abs(vtemp))
mychg(3) = maxval(abs(wtemp))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT*,'The error at the final timestep is',chg
END IF

! clean up
CALL decomp_2d_fft_finalize
CALL decomp_2d_finalize

DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
realtemp,stat=AllocateStatus)
!coprocessing: deallocate Coprocessing specific arrays
deallocate(realtempx, realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'Program execution complete'
END IF
CALL MPI_FINALIZE(ierr)

END PROGRAM main


### Code changes in f90 adaptor glue code

( E)

• Changed the adaptor to accept the subextents of the current pencil.
! fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! -- Changed for SESE

! subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.
use iso_c_binding
implicit none
public
! Originally also had a version that accepted 3D arrays, but in all other
! respects was identical. Decided this could lead to too much confusion.
contains
subroutine NSadaptor1D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
step, time, omegax, omegay, omegaz)
! nx, ny, nz     -- grid dimensions or entire mesh
!                   used for setting whole extent
! xst, xen, etc. -- extents of current subdomain pencil
! step           -- current simulation time step
! time           -- current simulation time
! omega*          -- scalar array for the current time step
! flag           -- receives status from API calls
integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:), intent (in) :: omegax, omegay, omegaz
integer :: flag

! check if processing this time step

if (flag /= 0) then
! processing requested
! check if need to create grid
call needtocreategrid(flag)

if (flag /= 0) then
! grid needed
! defined in VTKPointBasedDataSet.cxx
! takes the size of the entire grid, and the extents of the
! sub grid.
call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
end if

! defined in VTKPointBasedDataSet.cxx
! remember to null-terminate strings for C/C++

call coprocess()
end if

return


### Code changes in the C++ wrapping code

( F)

• Specified the partitioned data extents relative to the entire data.
• Specified the relative spacing of the cells.
// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.

// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"

#include <string>

// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
int* yst, int* yen, int* zst, int* zen)
{
if (!ParaViewCoProcessing::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access CoProcessorData.");
return;
}

// The simulation grid is a 3-dimensional topologically and geometrically
// regular grid. In VTK/ParaView, this is considered an image data set.
vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
img->SetExtent(*xst - 1, *xen - 1, *yst - 1, *yen - 1, *zst - 1, *zen - 1);
img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);
}

// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Expects "name" to point to null-terminated character array,
// be sure to null-terminate in caller.
extern "C" void addfield_(double* a, char* name)
{

vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}

if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetName(name);
field->SetArray(a, img->GetNumberOfPoints(), 1);
}
}


### Pros

• Some VTK/ParaView filters work with ghost (halo) cells, code could be extended to handle those.

### Cons

• Much code added to the simulation code itself.
• Additional memory required for the halos and packed arrays.
• A different decomposition library might require changing the halo logic.
• Would not work as is with a 3D decomposition library.

## Reinterpreting simulation nodes as visualization cells

Another way to look at the data, is that each node of the simulation is a cell in the visualization domain. Then when the pencils are sent to ParaView, there aren’t any gaps of missing cells.

### Code changes in simulation code

( G)

• None, beyond the regular CoProcessing calls.
PROGRAM main
!-----------------------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 3D incompressible Navier-Stokes
! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
! Implicit Midpoint rule timestepping. The numerical solution is compared to
! an exact solution reported by Shapiro
!
! Analytical Solution:
!       u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
!       v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
!       w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
!
! .. Parameters ..
!  Nx                           = number of modes in x - power of 2 for FFT
!  Ny                           = number of modes in y - power of 2 for FFT
!  Nz                           = number of modes in z - power of 2 for FFT
!  Nt                           = number of timesteps to take
!  Tmax                         = maximum simulation time
!  FFTW_IN_PLACE        = value for FFTW input
!  FFTW_MEASURE         = value for FFTW input
!  FFTW_EXHAUSTIVE      = value for FFTW input
!  FFTW_PATIENT         = value for FFTW input
!  FFTW_ESTIMATE        = value for FFTW input
!  FFTW_FORWARD     = value for FFTW input
!  FFTW_BACKWARD        = value for FFTW input
!  pi = 3.14159265358979323846264338327950288419716939937510d0
!  Re                           = Reynolds number
! .. Scalars ..
!  i                            = loop counter in x direction
!  j                            = loop counter in y direction
!  k                            = loop counter in z direction
!  n                            = loop counter for timesteps direction
!  allocatestatus       = error indicator during allocation
!  count                        = keep track of information written to disk
!  iol                          = size of array to write to disk
!  start                        = variable to record start time of program
!  finish                       = variable to record end time of program
!  count_rate           = variable for clock count rate
!  planfxyz                     = Forward 3d fft plan
!  planbxyz                     = Backward 3d fft plan
!  dt                           = timestep
! .. Arrays ..
!  u                            = velocity in x direction
!  v                            = velocity in y direction
!  w                            = velocity in z direction
!  uold                         = velocity in x direction at previous timestep
!  vold                         = velocity in y direction at previous timestep
!  wold                         = velocity in z direction at previous timestep
!  ux                           = x derivative of velocity in x direction
!  uy                           = y derivative of velocity in x direction
!  uz                           = z derivative of velocity in x direction
!  vx                           = x derivative of velocity in y direction
!  vy                           = y derivative of velocity in y direction
!  vz                           = z derivative of velocity in y direction
!  wx                           = x derivative of velocity in z direction
!  wy                           = y derivative of velocity in z direction
!  wz                           = z derivative of velocity in z direction
!  uxold                        = x derivative of velocity in x direction
!  uyold                        = y derivative of velocity in x direction
!  uzold                        = z derivative of velocity in x direction
!  vxold                        = x derivative of velocity in y direction
!  vyold                        = y derivative of velocity in y direction
!  vzold                        = z derivative of velocity in y direction
!  wxold                        = x derivative of velocity in z direction
!  wyold                        = y derivative of velocity in z direction
!  wzold                        = z derivative of velocity in z direction
!  utemp                        = temporary storage of u to check convergence
!  vtemp                        = temporary storage of u to check convergence
!  wtemp                        = temporary storage of u to check convergence
!  temp_r                       = temporary storage for untransformed variables
!  uhat                         = Fourier transform of u
!  vhat                         = Fourier transform of v
!  what                         = Fourier transform of w
!  rhsuhatfix           = Fourier transform of righthand side for u for timestepping
!  rhsvhatfix           = Fourier transform of righthand side for v for timestepping
!  rhswhatfix           = Fourier transform of righthand side for w for timestepping
!  nonlinuhat           = Fourier transform of nonlinear term for u
!  nonlinvhat           = Fourier transform of nonlinear term for u
!  nonlinwhat           = Fourier transform of nonlinear term for u
!  phat                         = Fourier transform of nonlinear term for pressure, p
!  temp_c                       = temporary storage for Fourier transforms
!  realtemp                     = Real storage
!
! .. Vectors ..
!  kx                           = fourier frequencies in x direction
!  ky                           = fourier frequencies in y direction
!  kz                           = fourier frequencies in z direction
!  x                            = x locations
!  y                            = y locations
!  z                            = y locations
!  time                         = times at which save data
!  name_config          = array to store filename for data to be saved
!
! REFERENCES
!
! A. Shapiro " The use of an exact solution of the Navier-Stokes equations
! in a validation test of a three-dimensional nonhydrostatic numerical model"
! Monthly Weather Review vol. 121, 2420-2425, (1993).
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
!
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!
!--------------------------------------------------------------------------------
! External routines required
!
! External libraries required
! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
!                       (http://2decomp.org/)

!---------------------------------------------------------------------------------

USE decomp_2d
USE decomp_2d_fft
USE decomp_2d_io
USE MPI
!coprocessing:
IMPLICIT NONE
! declare variables
INTEGER(kind=4), PARAMETER              :: Nx=256
INTEGER(kind=4), PARAMETER              :: Ny=256
INTEGER(kind=4), PARAMETER              :: Nz=256
INTEGER(kind=4), PARAMETER              :: Lx=1
INTEGER(kind=4), PARAMETER              :: Ly=1
INTEGER(kind=4), PARAMETER              :: Lz=1
INTEGER(kind=4), PARAMETER              :: Nt=20
REAL(kind=8), PARAMETER                 :: dt=0.05d0/Nt
REAL(kind=8), PARAMETER                 :: Re=1.0d0
REAL(kind=8), PARAMETER                 :: tol=0.1d0**10
REAL(kind=8), PARAMETER                 :: theta=0.0d0

REAL(kind=8), PARAMETER &
::  pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER         ::      ReInv=1.0d0/REAL(Re,kind(0d0))
REAL(kind=8), PARAMETER         ::  dtInv=1.0d0/REAL(dt,kind(0d0))
REAL(kind=8)                            :: scalemodes,chg,factor
REAL(kind=8), DIMENSION(:), ALLOCATABLE                 :: x, y, z, time,mychg,allchg
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  :: u, v, w,&
ux, uy, uz,&
vx, vy, vz,&
wx, wy, wz,&
uold, uxold, uyold, uzold,&
vold, vxold, vyold, vzold,&
wold, wxold, wyold, wzold,&
utemp, vtemp, wtemp, temp_r

COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE      ::      kx, ky, kz
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  ::      uhat, vhat, what,&
rhsuhatfix, rhsvhatfix,&
rhswhatfix, nonlinuhat,&
nonlinvhat, nonlinwhat,&
phat,temp_c
REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE     ::  realtemp, realtempx, &
realtempy, realtempz
! MPI and 2DECOMP variables
TYPE(DECOMP_INFO)                               ::  decomp
INTEGER(kind=MPI_OFFSET_KIND)                   ::  filesize, disp
INTEGER(kind=4)                                 ::  p_row=0, p_col=0, numprocs, myid, ierr

! variables used for saving data and timing
INTEGER(kind=4)                                 :: count, iol
INTEGER(kind=4)                                 :: i,j,k,n,t,allocatestatus
INTEGER(kind=4)                                 :: ind, numberfile
CHARACTER*100                                   :: name_config
INTEGER(kind=4)                                 :: start, finish, count_rate

! initialisation of 2DECOMP&FFT
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
! do automatic domain decomposition
CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
! get information about domain decomposition choosen
CALL decomp_info_init(Nx,Ny,Nz,decomp)
! initialise FFT library
CALL decomp_2d_fft_init
IF (myid.eq.0) THEN
PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
PRINT *,'dt:',dt
END IF

ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
u(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
v(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
w(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
ux(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
utemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
temp_r(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
kx(1:Nx),ky(1:Ny),kz(1:Nz),&
uhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
vhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
what(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsuhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsvhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhswhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinuhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinvhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinwhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
phat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
temp_c(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
realtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
realtempx(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempy(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempz(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'allocated space'
END IF

! setup fourier frequencies in x-direction
DO i=1,Nx/2+1
kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
ind=1
DO i=-Nx/2,Nx/2-1
x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in y-direction
DO j=1,Ny/2+1
ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly
END DO
ky(1+Ny/2)=0.0d0
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
ind=1
DO j=-Ny/2,Ny/2-1
y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in z-direction
DO k=1,Nz/2+1
kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz
END DO
kz(1+Nz/2)=0.0d0
DO k = 1,Nz/2 -1
kz(k+1+Nz/2)=-kz(1-k+Nz/2)
END DO
ind=1
DO k=-Nz/2,Nz/2-1
z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
ind=ind+1
END DO
scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
IF (myid.eq.0) THEN
PRINT *,'Setup grid and fourier frequencies'
END IF

!initial conditions for Taylor-Green vortex
!       factor=2.0d0/sqrt(3.0d0)
!       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
!               u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
!       END DO; END DO; END DO
!       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
!               v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
!       END DO ; END DO ; END DO
!       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
!               w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
!       END DO ; END DO ; END DO

time(1)=0.0d0
factor=sqrt(3.0d0)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO

CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)

! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)

! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)

! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)

! save initial data
n = 0
!coprocessing: removed savedata calls from coprocessing version
!coprocessing: could have this after the coprocessor initialize and
!coprocessing: followed by a call to the coprocessor to write out the initial data
!coprocessing: image. Otherwise, these are just wasted calls at this spot.
call coprocessorinitialize("pipeline.py", 11)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO

decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
realtempx, realtempy, realtempz)

!start timer
CALL system_clock(start,count_rate)
! coprocessing: Simulation loop starts here
DO n=1,Nt
!fixed point
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
uold(i,j,k)=u(i,j,k)
uxold(i,j,k)=ux(i,j,k)
uyold(i,j,k)=uy(i,j,k)
uzold(i,j,k)=uz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vold(i,j,k)=v(i,j,k)
vxold(i,j,k)=vx(i,j,k)
vyold(i,j,k)=vy(i,j,k)
vzold(i,j,k)=vz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wold(i,j,k)=w(i,j,k)
wxold(i,j,k)=wx(i,j,k)
wyold(i,j,k)=wy(i,j,k)
wzold(i,j,k)=wz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k)
END DO; END DO ; END DO

chg=1
DO WHILE (chg .gt. tol)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
+ky(j)*nonlinvhat(i,j,k)&
+kz(k)*nonlinwhat(i,j,k))&
/(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
END DO; END DO ; END DO

DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO

! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)

! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)

! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)
END DO; END DO ; END DO

CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=u(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=v(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=w(i,j,k)*scalemodes
END DO; END DO ; END DO

mychg(1) =maxval(abs(utemp-u))
mychg(2) =maxval(abs(vtemp-v))
mychg(3) =maxval(abs(wtemp-w))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT *,'chg:',chg
END IF
END DO
time(n+1)=n*dt

!goto 5100
IF (myid.eq.0) THEN
PRINT *,'time',n*dt
END IF

!coprocessing: populating the arrays sent to the coprocessor
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO

DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO

decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
realtempx, realtempy, realtempz)
END DO
!coprocessing:
call coprocessorfinalize()

CALL system_clock(finish,count_rate)

IF (myid.eq.0) then
PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
END IF

IF (myid.eq.0) THEN
name_config = './data/tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO n=1,1+Nt
WRITE(11,*) time(n)
END DO
CLOSE(11)

name_config = './data/xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)

name_config = './data/ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)

name_config = './data/zcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO k=1,Nz
WRITE(11,*) z(k)
END DO
CLOSE(11)
PRINT *,'Saved data'
END IF

! Calculate error in final numerical solution
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k) -&
(-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k) -&
(0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)-&
(cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
mychg(1) = maxval(abs(utemp))
mychg(2) = maxval(abs(vtemp))
mychg(3) = maxval(abs(wtemp))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT*,'The error at the final timestep is',chg
END IF

! clean up
CALL decomp_2d_fft_finalize
CALL decomp_2d_finalize

DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
realtemp,stat=AllocateStatus)
!coprocessing: deallocate Coprocessing specific arrays
deallocate(realtempx, realtempy, realtempz)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'Program execution complete'
END IF
CALL MPI_FINALIZE(ierr)

END PROGRAM main


### Code changes in f90 adaptor glue code

( H)

• Accept the subextents of the current pencil.
! Fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! -- Changed for SESE

! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.
use iso_c_binding
implicit none
public
contains
subroutine NSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
step, time, omegax, omegay, omegaz)
! nx, ny, nz     -- grid dimensions of entire mesh
!                   used for setting whole extent
! xst, xen, etc. -- extents of current subdomain pencil
! step           -- current simulation time step
! time           -- current simulation time
! omega*         -- scalar arrays for the current time step
integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:,:,:), intent (in) :: omegax, omegay, omegaz
integer :: flag

! check if processing this time step

if (flag /= 0) then
! processing requested
! check if need to create grid
call needtocreategrid(flag)

if (flag /= 0) then
! grid needed
! defined in VTKCellBasedDataSet.cxx
! takes the size of the entire grid, and the extents of the
! sub grid.
call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
end if

! defined in VTKCellBasedDataSet.cxx
! call for each field of interest. Be sure to null-terminate the
! string for C++!

call coprocess()
end if

return


### Code changes in C++ wrapping code

( I)

• Changed how the extents are calculated.
• Changes point data calls to cell data calls.
// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.

// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"
#include <string>

// These will be called from the Fortran adaptor "glue" code.
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
int* yst, int* yen, int* zst, int* zen)
{
if (!ParaViewCoProcessing::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access CoProcessorData.");
return;
}

// The simulation grid is a 3-dimensional topologically and geometrically
// regular grid. In VTK/ParaView, this is considered an image data set.
vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
// Indexing based on change from F90 to C++, and also from nodal to cellular.
img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen);
img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz);

ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);
}

// Add field to the data container.
// Separate from above because this could be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Assumes "name" points to null-terminated array of chars.
// Easiest way to do that is to concatenate a terminal NULL in caller.
extern "C" void addfield_(double* a, char* name)
{

vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}

if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetName(name);
field->SetArray(a, img->GetNumberOfCells(), 1);
}
}


### Pros

• Minimal code changes, none in simulation code (beyond the coprocessing calls).
• No halo or packed arrays needed.
• Should work with different decomposition libraries without change.

### Cons

• Requires extra handling in the ParaView client to handle the differences between cell and point rendering.
• Won’t work as is with VTK/ParaView filters that require ghost cells.

## Additional notes on creating pipelines in the ParaView client

### Working with Fortran complex type

C++ does not have a native complex data type. This requires some additional handling in the ParaView client:

1. Run the original non-coprocessing simulation at a reduced mesh size.
2. Open this data in the ParaView client using the RAW binary reader.
3. Set the number of Scalar Components to 2
4. Create the pipeline, the real and imaginary parts’s default names are ImageFile_X and ImageFile_Y, respectively.

( J)

! Fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! -- Changed for SESE

! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.
use iso_c_binding
implicit none
public
! using an interface incase need arises to overload 1D, 2D versions
contains
subroutine NLSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
step, time, a)
! nx, ny, nz     -- grid dimensions of entire mesh
!                   used for setting whole extent
! xst, xen, etc. -- extents of current subdomain pencil
! step           -- current simulation time step
! time           -- current simulation time
! a              -- scalar array for the current time step
! flag           -- receives status from API calls
integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
real(kind=8), intent(in) :: time
complex(kind=8), dimension(:,:,:), intent (in) :: a
integer :: flag

flag = 0
! check if processing this time step

if (flag /= 0) then
! processing requested
! check if need to create grid
call needtocreategrid(flag)

if (flag /= 0) then
! grid needed
! takes the size of the entire grid, and the extents of the
! sub grid.
call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
end if

! Be sure to null-terminate the
! string for C++!

call coprocess()
end if

return


( K )

This also requires some minor code changes in the accompanying C++ file. Code Download

// CoProcessor specific headers // Routines that take the place of VTK dataset object creation. // Called from Fortran code which also calls the Fortran Adaptor API // supplied with ParaView source. // Note: names mangled with trailing underscores for linker visibility.

3. include "vtkCPProcessor.h"
4. include "vtkDoubleArray.h"
5. include "vtkPointData.h"
6. include "vtkSmartPointer.h"
7. include "vtkImageData.h"
8. include "vtkCellData.h"
9. include <string>

// These will be called from the Fortran "glue" code" // Completely dependent on data layout, structured vs. unstructured, etc. // since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor. // Takes the extents for both the global dataset and the particular subsection // visible to the current MPI rank. // Note: expects to receive Fortran base-1 indices. extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen, int* yst, int* yen, int* zst, int* zen) {

 if (!ParaViewCoProcessing::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access CoProcessorData.");
return;
}

 // The simulation grid is a 3-dimensional topologically and geometrically
// regular grid. In VTK/ParaView, this is considered an image data set.
vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();

 // For Fortram complex, need to set that there are 2 scalar components
img->SetNumberOfScalarComponents(2);

 // Indexing based on change from F90 to C++, and also from nodal to cellular.
// Extents are given in terms of nodes, not cells.
img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen);

// Setting spacing is important so that the camera position in the pipeline makes
// sense if using different sized meshes between setting up the pipeline and running
// the simulation. Origin can often be ignored.
img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz); // considering passing in as args.
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);


}

// Add field to the data container. // Separate from above because this could be dynamic, grid is static. // Might be an issue, VTK assumes row major C storage. // Underscore is by-hand name mangling for fortran linking. // Note: Expects the address of the data, has no way of determining // if the array is densely packed or not. // Note 2: Assumes "name" points to null-terminated array of chars. // Easiest way to do that is to concatenate in caller. extern "C" void addfield_(double* a, char* name) {

 vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

 if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}

 if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetNumberOfComponents(2);
field->SetName(name);
field->SetArray(a, 2* img->GetNumberOfCells(), 1);
}


}

### Working with cell data

• Some ParaView filters and representations will only work with point data, e.g. volume rendering. Use the Cell Data to Point Data filter to convert. Be sure to check the option to pass the cellular data through the filter. If a simulation runs with no errors and produces no output, a possible culprit is failing to have checked that option when creating the pipeline.

### vtkKDTreeGenerator warnings

• Pipelines that use volume rendering will generate warnings about Region IDs being 0. These can often be ignored.