Parallel Spectral Numerical Methods/Visualization with ParaView CoProcessing
The ParaView CoProcessing plugin allows an application code to be instrumented to connect to a ParaView server in order to execute a visualization pipeline. The pipeline can produce either images in a variety of formats or VTK XML parallel file format data sets.
A user creates a pipeline in the ParaView client using a sample, probably lower resolution, data set. This pipeline is then exported using the CoProcessor plugin as a Python script which will be loaded by the application.
The application developer also needs to write some Fortran/C++ code to convert the application’s data structures into a format ParaView can understand. In the case of completely regular grids, this is straightforward.
Using ParaView CoProcessing on a Single Computation Node
editParaView client setup
editThe first requirement is to have a ParaView client which can export the Python script. This requires building from source as of ParaView 3.14.1. The standard instructions apply with some additional instructions for building the script generator plugin. After building, the first time you launch ParaView, go to Tools, Manage Plugins and set the CoProcessingPlugin to load on startup. After the plugin is loaded, there should be two new menu items: Writers and CoProcessing
Client build notes:
edit- Might have to install CMake
- Might have to build Qt
- Might have to turn off building the Manta plugin
Existing code alterations
editThe following instructions pertain to the Navier-Stokes CUDA Fortran as tested on NCSA's Forge during the Summer of 2012.
Simulation code
edit
|
(
) |
navierstokes.cuf added 4 lines and removed .datbin writing. Code download
!--------------------------------------------------------------------
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution.
!
! Periodic free-slip boundary conditions and Initial conditions:
! u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
! v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution (subscript denote derivatives):
! u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*t/Re)
! v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
! u_y(x,y,t)=-2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
! v_x(x,y,t)=2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
! omega=v_x-u_y
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Ny = number of modes in y - power of 2 for FFT
! nplots = number of plots produced
! plotgap = number of timesteps inbetween plots
! Re = dimensionless Renold's number
! ReInv = 1/Re for optimization
! dt = timestep size
! dtInv = 1/dt for optimization
! tol = determines when convergences is reached
! scalemodes = 1/(Nx*Ny), scaling after preforming FFTs
! numthreads = number of CPUs used in calculation
! .. Scalars ..
! i = loop counter in x direction
! j = loop counter in y direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! time = times at which data is saved
! chg = error at each iteration
! temp = used for ordering saved omega
! .. Arrays (gpu) ..
! u_d = velocity in x direction
! uold_d = velocity in x direction at previous timestep
! v_d = velocity in y direction
! vold_d = velocity in y direction at previous timestep
! omeg_d = vorticity in real space
! omeghat_d = 2D Fourier transform of vorticity
! at next iterate
! omegoldhat_d = 2D Fourier transform of vorticity at previous
! iterate
! nloldhat_d = nonlinear term in Fourier space
! at previous iterate
! omegexact_d = taylor-green vorticity at
! at final step
! psihat_d = 2D Fourier transform of streamfunction
! at next iteration
! omegcheck_d = store of vorticity at previous iterate
! temp1_d/temp2_d = reusable complex/real space used for
! calculations. This reduces number of
! arrays stored.
! .. Vectors (gpu) ..
! kx_d = fourier frequencies in x direction
! ky_d = fourier frequencies in y direction
! x_d = x locations
! y_d = y locations
! name_config = array to store filename for data to be saved
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
! This program has not been fully optimized.
!--------------------------------------------------------------------
module precision
! Precision control
integer, parameter, public :: Single = kind(0.0) ! Single precision
integer, parameter, public :: Double = kind(0.0d0) ! Double precision
integer, parameter, public :: fp_kind = Double
!integer, parameter, public :: fp_kind = Single
end module precision
module cufft
integer, public :: CUFFT_FORWARD = -1
integer, public :: CUFFT_INVERSE = 1
integer, public :: CUFFT_R2C = Z'2a' ! Real to Complex (interleaved)
integer, public :: CUFFT_C2R = Z'2c' ! Complex (interleaved) to Real
integer, public :: CUFFT_C2C = Z'29' ! Complex to Complex, interleaved
integer, public :: CUFFT_D2Z = Z'6a' ! Double to Double-Complex
integer, public :: CUFFT_Z2D = Z'6c' ! Double-Complex to Double
integer, public :: CUFFT_Z2Z = Z'69' ! Double-Complex to Double-Complex
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! cufftPlan2d(cufftHandle *plan, int nx,int ny, cufftType type)
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
interface cufftPlan2d
subroutine cufftPlan2d(plan, nx, ny, type) bind(C,name='cufftPlan2d')
use iso_c_binding
integer(c_int):: plan
integer(c_int),value:: nx, ny, type
end subroutine cufftPlan2d
end interface
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! cufftDestroy(cufftHandle plan)
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
interface cufftDestroy
subroutine cufftDestroy(plan) bind(C,name='cufftDestroy')
use iso_c_binding
integer(c_int),value:: plan
end subroutine cufftDestroy
end interface
interface cufftExecD2Z
subroutine cufftExecD2Z(plan, idata, odata) &
& bind(C,name='cufftExecD2Z')
use iso_c_binding
use precision
integer(c_int), value :: plan
real(fp_kind), device :: idata(1:nx,1:ny)
complex(fp_kind),device :: odata(1:nx,1:ny)
end subroutine cufftExecD2Z
end interface
interface cufftExecZ2D
subroutine cufftExecZ2D(plan, idata, odata) &
& bind(C,name='cufftExecZ2D')
use iso_c_binding
use precision
integer(c_int),value:: plan
complex(fp_kind),device:: idata(1:nx,1:ny)
real(fp_kind),device :: odata(1:nx,1:ny)
end subroutine cufftExecZ2D
end interface
end module cufft
PROGRAM main
use precision
use cufft
! coprocessing:
use ns2dcnadaptor
! declare variables
IMPLICIT NONE
INTEGER(kind=4), PARAMETER :: Nx=1024
INTEGER(kind=4), PARAMETER :: Ny=1024
!INTEGER(kind=8) ::
REAL(fp_kind), PARAMETER :: dt=0.00025d0
REAL(fp_kind), PARAMETER :: dtInv=1.0d0/REAL(dt,kind(0d0))
REAL(fp_kind), PARAMETER &
:: pi=3.14159265358979323846264338327950288419716939937510d0
REAL(fp_kind), PARAMETER :: Re=5000.0d0
REAL(fp_kind), PARAMETER :: ReInv=1.0d0/REAL(Re,kind(0d0))
REAL(fp_kind), PARAMETER :: tol=0.1d0**10
!coprocessing: temp is used as an int, runtime crashes on
! floating temp used in write format string on line 376
!REAL(fp_kind) :: scalemodes,chg,temp=10000000
real(fp_kind) :: scalemodes, chg
integer(kind=4) :: temp=100000000
INTEGER(kind=4), PARAMETER :: nplots=400, plotgap=200
REAL(fp_kind),DIMENSION(:), ALLOCATABLE :: x,y
REAL(fp_kind),DIMENSION(:,:), ALLOCATABLE :: omeg,omegexact
INTEGER(kind=4) :: i,j,n,t, AllocateStatus
INTEGER(kind=4) :: planz2d,pland2z, kersize
!variables used for saving data and timing
INTEGER(kind=4) :: start, finish, count_rate,count, iol
CHARACTER*100 :: name_config
! Declare variables for GPU
REAL(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE :: u_d,v_d,&
omegcheck_d,omeg_d,&
nl_d, temp2_d, omegexact_d
COMPLEX(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE :: omegoldhat_d, nloldhat_d,&
omeghat_d, nlhat_d, psihat_d,&
temp1_d
COMPLEX(fp_kind), DEVICE, DIMENSION(:), ALLOCATABLE :: kx_d,ky_d
REAL(kind=8),DEVICE, DIMENSION(:), ALLOCATABLE :: x_d,y_d
kersize=min(Nx,256)
PRINT *,'Program starting'
PRINT *,'Grid:',Nx,'X',Ny
PRINT *,'dt:',dt
ALLOCATE(x(1:Nx),y(1:Ny),omeg(1:Nx,1:Ny),omegexact(1:Nx,1:Ny),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Allocated CPU arrays'
ALLOCATE(kx_d(1:Nx/2+1),ky_d(1:Ny),x_d(1:Nx),y_d(1:Ny),u_d(1:Nx,1:Ny),&
v_d(1:Nx,1:Ny),omeg_d(1:Nx,1:Ny),&
omegexact_d(1:Nx,1:Ny),temp2_d(1:Nx,1:Ny),&
omegoldhat_d(1:Nx/2+1,1:Ny),nloldhat_d(1:Nx/2+1,1:Ny),&
omegcheck_d(1:Nx,1:Ny),omeghat_d(1:Nx/2+1,1:Ny),nl_d(1:Nx,1:Ny),&
nlhat_d(1:Nx/2+1,1:Ny), psihat_d(1:Nx/2+1,1:Ny),temp1_d(1:Nx/2+1,1:Ny),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Allocated GPU arrays'
CALL cufftPlan2D(pland2z,nx,ny,CUFFT_D2Z)
CALL cufftPlan2D(planz2d,nx,ny,CUFFT_Z2D)
PRINT *,'Setup FFTs'
! setup fourier frequencies
!$cuf kernel do <<< *,* >>>
DO i=1,Nx/2+1
kx_d(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind=fp_kind)
END DO
kx_d(1+Nx/2)=0.0d0
!$cuf kernel do <<< *,* >>>
DO i=1,Nx
x_d(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind=fp_kind)
END DO
!$cuf kernel do <<< *,* >>>
DO j=1,Ny/2+1
ky_d(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind=fp_kind)
END DO
ky_d(1+Ny/2)=0.0d0
!$cuf kernel do <<< *,* >>>
DO j = 1,Ny/2 -1
ky_d(j+1+Ny/2)=-ky_d(1-j+Ny/2)
END DO
!$cuf kernel do <<< *, * >>>
DO j=1,Ny
y_d(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind=fp_kind)
END DO
scalemodes=1.0d0/REAL(Nx*Ny,kind=fp_kind)
PRINT *,'Setup grid and fourier frequencies'
!initial data
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx
u_d(i,j)=sin(2.0d0*pi*x_d(i))*cos(2.0d0*pi*y_d(j))
END DO
END DO
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx
v_d(i,j)=-cos(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))
END DO
END DO
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx
omeg_d(i,j)=4.0d0*pi*sin(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))+0.01d0*cos(2.0d0*pi*y_d(j))
END DO
END DO
CALL cufftExecD2Z(pland2z,omeg_d,omeghat_d)
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=omeghat_d(i,j)*kx_d(i)
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
!first part nonlinear term in real space
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
END DO
END DO
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=omeghat_d(i,j)*ky_d(j)
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
!$cuf kernel do <<< *,* >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
END DO
END DO
CALL cufftExecD2Z(pland2z,nl_d,nlhat_d)
omegcheck_d=omeg_d
PRINT *,'Got initial data, starting timestepping'
CALL system_clock(start,count_rate)
! coprocessing: simulation loop
! coprocessing: before entering, initialze the ParaView coprocessor
! coprocessing: this is from FortranAdaptorAPI.cxx
! coprocessing: it takes the name of the pipeline script you created
! coprocessing: in the ParaView client and the name's length in chars
call coprocessorinitialize("pipeline.py", 11 )
DO t=1,nplots
DO n=1,plotgap
chg=1.0d0
nloldhat_d=nlhat_d
omegoldhat_d=omeghat_d
DO WHILE (chg>tol)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
omeghat_d(i,j)=((dtInv+0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))&
*omegoldhat_d(i,j) - 0.5d0*(nloldhat_d(i,j)+nlhat_d(i,j)))&
/(dtInv-0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))
END DO
END DO
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
psihat_d(i,j)=-omeghat_d(i,j)/(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)+0.10d0**14)
END DO
END DO
CALL cufftExecZ2D(planz2d,omeghat_d,omeg_d)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=-psihat_d(i,j)*kx_d(i)*scalemodes
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,v_d)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=psihat_d(i,j)*ky_d(j)*scalemodes
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,u_d)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=omeghat_d(i,j)*kx_d(i)
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
END DO
END DO
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx/2+1
temp1_d(i,j)=omeghat_d(i,j)*ky_d(j)
END DO
END DO
CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx
nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
END DO
END DO
CALL cufftExecD2Z(pland2z,nl_d,nlhat_d)
chg=0.0d0
!$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
DO j=1,Ny
DO i=1,Nx
chg=chg+(omeg_d(i,j)-omegcheck_d(i,j))*(omeg_d(i,j)-omegcheck_d(i,j))&
*scalemodes*scalemodes
END DO
END DO
omegcheck_d=omeg_d
END DO
END DO
PRINT *, t*plotgap*dt
omeg=omeg_d
!temp=temp+1
!write(name_config,'(a,i0,a)') 'omega',temp,'.datbin'
!INQUIRE(iolength=iol) omeg(1,1)
!OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol)
!count = 1 !coprocessing - there's an intrinsic array function this shadows
!DO j=1,Ny
! DO i=1,Nx
! WRITE(11,rec=count) omeg(i,j)*scalemodes
! count=count+1
! END DO
!END DO
!CLOSE(11)
! coprocessing: do the coprocessing at the end of the sim loop
! grid dims, time step, time, scalar array
call navierStokesCoprocessor(Nx, Ny, 1, t, t*dt ,omeg)
END DO
! coprocessing: clean up
call coprocessorfinalize()
CALL system_clock(finish,count_rate)
PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),&
'for Time stepping'
! Copy results back to host
x=x_d
y=y_d
!!!!!!!!!!!!!!!!!!!!!!!!
!copy over data to disk!
!!!!!!!!!!!!!!!!!!!!!!!!
name_config = 'xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)
name_config = 'ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
!!!!!!!!!!!!!!!!!!!!!!!!
CALL cufftDestroy(planz2d)
CALL cufftDestroy(pland2z)
PRINT *,'Destroyed fft plan'
DEALLOCATE(x,y,omeg,omegexact,stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Deallocated CPU memory'
DEALLOCATE(kx_d,ky_d,x_d,y_d,u_d,&
v_d, omegcheck_d, omeg_d,omegoldhat_d,&
omegexact_d, nloldhat_d,omeghat_d,nl_d, nlhat_d,&
temp1_d,temp2_d, psihat_d,stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Deallocated GPU memory'
PRINT *,'Program execution complete'
END PROGRAM main
Developed files
editns2dcnadaptor.f90
edit
|
(
) |
! fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE
! subroutine determines if coprocessing needed during the
! current simulation step or not.
module ns2dcnadaptor
implicit none
public
contains
subroutine navierstokescoprocessor(nx, ny, nz, step, time, omeg)
! nx, ny, nz -- grid dimensions, nz will be 1 since this is a 2D problem.
! step -- current simulation time step
! time -- current simulation time
! omega -- scalar array for the current time step
integer, intent(in) :: nx, ny, nz, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:,:), intent (in) :: omeg
integer :: flag
! check if processing this time step
! defined in FortranAdaptorAPI.h
call requestdatadescription(step, time, flag)
if (flag .ne. 0) then
! processing requested
! check if need to create grid
! defined in FortranAdaptorAPI.h
call needtocreategrid(flag)
if (flag .ne. 0) then
! grid needed
! defined in ns2dcnVTKDataSet.cxx
call createcpimagedata(nx, ny, nz)
end if
! defined in ns2dcnVTKDataSet.cxx
call addfield(omeg)
! defined in FortranAdaptorAPI.h
call coprocess()
end if
return
end subroutine navierstokescoprocessor
end module ns2dcnadaptor
ns2dcnVTKDataSet.cxx
edit
|
(
) |
Defines functions that put simulation data into a format which the ParaView/VTK libs can understand. Very straightforward for completely regular grids. Expanding to 3D is simple. Unstructured meshes will require more effort. Code download
// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx
// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h"
// CoProcessor specific headers
// ParaView-3.14.1-Source/CoProcessing/CoProcessor/
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
// ParaView-3.14.1-Source/VTK/Filtering/vtkImageData.h
#include "vtkImageData.h"
// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.
// Creates the data container for the CoProcessor.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
if (!ParaViewCoProcessing::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access CoProcessorData.");
return;
}
// The simulation grid is a 2-dimensional topologically and geometrically
// regular grid. In VTK/ParaView, this is considered an image data set.
vtkImageData* Grid = vtkImageData::New();
// assuming dimZ == 1 for now
Grid->SetDimensions(*nx, *ny, *nz);
// Setting the Origin and Spacing are also options.
// Name should be consistent between here, Fortran and Python client script.
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(Grid);
}
// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK probably assumes row major, but
// omeg probably passed column major...
// by hand name mangling for fortran
extern "C" void addfield_(double* scalars)
{
vtkCPInputDataDescription *idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
vtkImageData* Image = vtkImageData::SafeDownCast(idd->GetGrid());
if (!Image) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}
// field name must match that in the fortran code.
if (idd->IsFieldNeeded("omeg")) {
vtkDoubleArray* omega = vtkDoubleArray::New();
omega->SetName("omeg");
omega->SetArray(scalars, Image->GetNumberOfPoints(), 1);
Image->GetPointData()->AddArray(omega);
omega->Delete();
}
}
Creating a visualization pipeline in the ParaView Client
editThe script pipeline is generated in the ParaView client by using the CoProcessing Script Generator Plugin.
- File > Open > browse to data set, the file should now be visible in the Pipeline Browser window.
- In the Properties window, click apply, the data should now appear in the layout window.
- To change the colormap, click on the toolbar button that’s a rainbow with a green circle.
- To display the colormap in the layout, click on the toolbar button that’s a vertical rainbow.
- Set the size of the viewport in Tools > Lock View Size Custom. This will help you arrange the items for the final image. Make a note of the dimensions, you’ll need them later for a bug workaround.
- If writing out VTK data sets, go to the Writers > Parallel Image Data Writer, this should add a writer to the pipeline
- To export the python pipeline go to CoProcessing > Export State, this should launch a wizard
- click Next
- Select the dataset you want to connect to your simulation, Add it, click Next
- The default Simulation Name is “input”. If you change this, be sure to update ns2dcnVTKDataSet.cxx, click Next
- Check whether or not to output images. Click Finish.
- Save the file. The file name will be referenced in navierstokes.cuf, so save as something like pipeline-test-??.py and then cp that to pipeline.py so to avoid recompiling for new pipelines.
Python script issues/bugs
editIf the pipeline is set to output images, then there are two issues that can be considered bugs in the script.
- No matter what viewport size is set in the client, it doesn’t get used by the script.
- Color map labels will be drawn with a box surrounding them.
The workaround for the first is to add a method call to ViewSize equal to the viewport window dimensions in the client. The second a simple edit. Both of these labeled in the sample pipeline.py with #mvm comments
Set the environment
editForge required loading the ParaView OSMesa module. $ module load paraview/3.14.1-OSMesa. On Nautilus the module is paraview/3.14.1
Run the simulation
editRun as you would the uninstrumented version.
Convert the images to an animation
editThere are a variety of ways to do this, if ffmpeg is available ffmpeg -sameq -i path/to/images/image%04d.png myanimation.mpg
or if writing out the VTK data sets, one can work with those in ParaView directly.
dt and effect on animation time.
editSmooth video is ~30 fps or dt = 0.033… If you write out a larger dt as VTK data sets, then one can do linear time interpolation in ParaView. If writing out a larger dt as images to stitch into an mpg, the options available created much reduced video quality.
ParaView Server setup
edit- Forge did not have X support, so an offscreen Mesa (OSMesa) build was necessary.
- Nautilus supports X and so can render with the supplied OpenGL libraries.
Summary of Changes for New API
editThe coprocessing API has had some minor changes since ParaView 3.14.1. The following applies for ParaView 3.98+ and Catalyst 1.0 alpha+. These were tested on Beacon in offload mode at NICS on August 20, 2013.
The code changes are:
- In the simulation code, coprocessorinitialize is now coprocessorinitializewithpython, the arguments are the same.
- In the C++ data set helper code, include vtkCPPythonAdaptorAPI.h instead of FortranAdaptorAPI.h.
- In the C++ data set helper code, change from ParaViewCoprocessing namespace to vtkCPPythonAdaptorAPI namespace.
The libraries to link against have also changed, for a ParaView 4.0.1 build these are:
- -lvtkPVCatalystPython26D-pv4.0
- -lvtkPVCatalyst-pv4.0
- -lvtkCommonCore-pv4.0
- -lvtkCommonDataModel-pv4.0
- -lvtkPVPythonCatalyst-pv4.0
Example with New API
edit
|
(
) |
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution.
!
! Periodic free-slip boundary conditions and Initial conditions:
! u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
! v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution:
! u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*nu*t)
! v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*nu*t)
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Ny = number of modes in y - power of 2 for FFT
! Nt = number of timesteps to take
! Tmax = maximum simulation time
! FFTW_IN_PLACE = value for FFTW input
! FFTW_MEASURE = value for FFTW input
! FFTW_EXHAUSTIVE = value for FFTW input
! FFTW_PATIENT = value for FFTW input
! FFTW_ESTIMATE = value for FFTW input
! FFTW_FORWARD = value for FFTW input
! FFTW_BACKWARD = value for FFTW input
! pi = 3.14159265358979323846264338327950288419716939937510d0
! mu = viscosity
! rho = density
! .. Scalars ..
! i = loop counter in x direction
! j = loop counter in y direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! count = keep track of information written to disk
! iol = size of array to write to disk
! start = variable to record start time of program
! finish = variable to record end time of program
! count_rate = variable for clock count rate
! planfx = Forward 1d fft plan in x
! planbx = Backward 1d fft plan in x
! planfy = Forward 1d fft plan in y
! planby = Backward 1d fft plan in y
! dt = timestep
! .. Arrays ..
! u = velocity in x direction
! uold = velocity in x direction at previous timestep
! v = velocity in y direction
! vold = velocity in y direction at previous timestep
! u_y = y derivative of velocity in x direction
! v_x = x derivative of velocity in y direction
! omeg = vorticity in real space
! omegold = vorticity in real space at previous
! iterate
! omegcheck = store of vorticity at previous iterate
! omegoldhat = 2D Fourier transform of vorticity at previous
! iterate
! omegoldhat_x = x-derivative of vorticity in Fourier space
! at previous iterate
! omegold_x = x-derivative of vorticity in real space
! at previous iterate
! omegoldhat_y = y-derivative of vorticity in Fourier space
! at previous iterate
! omegold_y = y-derivative of vorticity in real space
! at previous iterate
! nlold = nonlinear term in real space
! at previous iterate
! nloldhat = nonlinear term in Fourier space
! at previous iterate
! omeghat = 2D Fourier transform of vorticity
! at next iterate
! omeghat_x = x-derivative of vorticity in Fourier space
! at next timestep
! omeghat_y = y-derivative of vorticity in Fourier space
! at next timestep
! omeg_x = x-derivative of vorticity in real space
! at next timestep
! omeg_y = y-derivative of vorticity in real space
! at next timestep
! .. Vectors ..
! kx = fourier frequencies in x direction
! ky = fourier frequencies in y direction
! kxx = square of fourier frequencies in x direction
! kyy = square of fourier frequencies in y direction
! x = x locations
! y = y locations
! time = times at which save data
! name_config = array to store filename for data to be saved
! fftfx = array to setup x Fourier transform
! fftbx = array to setup y Fourier transform
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!--------------------------------------------------------------------
! External routines required
!
! External libraries required
! FFTW3 -- Fast Fourier Transform in the West Library
! (http://www.fftw.org/)
! declare variables
PROGRAM main
use nsadaptor_glue
IMPLICIT NONE
INTEGER(kind=4), PARAMETER :: Nx=256
INTEGER(kind=4), PARAMETER :: Ny=256
REAL(kind=8), PARAMETER :: dt=0.00125
REAL(kind=8), PARAMETER &
:: pi=3.14159265358979323846264338327950288419716939937510
REAL(kind=8), PARAMETER :: rho=1.0d0
REAL(kind=8), PARAMETER :: mu=1.0d0
REAL(kind=8), PARAMETER :: tol=0.1d0**10
REAL(kind=8) :: chg
INTEGER(kind=4), PARAMETER :: nplots=50
REAL(kind=8), DIMENSION(:), ALLOCATABLE :: time
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: kx,kxx
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: ky,kyy
REAL(kind=8), DIMENSION(:), ALLOCATABLE :: x
REAL(kind=8), DIMENSION(:), ALLOCATABLE :: y
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE :: &
u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg,&
omegoldhat, omegoldhat_x, omegold_x,&
omegoldhat_y, omegold_y, nlold, nloldhat,&
omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y
! added for coprocessing
real(kind=8), dimension(:,:), allocatable :: dump
REAL(kind=8),DIMENSION(:,:), ALLOCATABLE :: uexact_y,vexact_x,omegexact
INTEGER(kind=4) :: i,j,k,n, allocatestatus, count, iol
INTEGER(kind=4) :: start, finish, count_rate
INTEGER(kind=4), PARAMETER :: FFTW_IN_PLACE = 8, FFTW_MEASURE = 0, &
FFTW_EXHAUSTIVE = 8, FFTW_PATIENT = 32, &
FFTW_ESTIMATE = 64
INTEGER(kind=4),PARAMETER :: FFTW_FORWARD = -1, FFTW_BACKWARD=1
COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE :: fftfx,fftbx
INTEGER(kind=8) :: planfxy,planbxy
CHARACTER*100 :: name_config
CALL system_clock(start,count_rate)
ALLOCATE(time(1:nplots),kx(1:Nx),kxx(1:Nx),ky(1:Ny),kyy(1:Ny),x(1:Nx),y(1:Ny),&
u(1:Nx,1:Ny),uold(1:Nx,1:Ny),v(1:Nx,1:Ny),vold(1:Nx,1:Ny),u_y(1:Nx,1:Ny),&
v_x(1:Nx,1:Ny),omegold(1:Nx,1:Ny),omegcheck(1:Nx,1:Ny), omeg(1:Nx,1:Ny),&
omegoldhat(1:Nx,1:Ny),omegoldhat_x(1:Nx,1:Ny), omegold_x(1:Nx,1:Ny), &
omegoldhat_y(1:Nx,1:Ny),omegold_y(1:Nx,1:Ny), nlold(1:Nx,1:Ny), nloldhat(1:Nx,1:Ny),&
omeghat(1:Nx,1:Ny), omeghat_x(1:Nx,1:Ny), omeghat_y(1:Nx,1:Ny), omeg_x(1:Nx,1:Ny),&
omeg_y(1:Nx,1:Ny), nl(1:Nx,1:Ny), nlhat(1:Nx,1:Ny), psihat(1:Nx,1:Ny), &
psihat_x(1:Nx,1:Ny), psi_x(1:Nx,1:Ny), psihat_y(1:Nx,1:Ny), psi_y(1:Nx,1:Ny),&
uexact_y(1:Nx,1:Ny), vexact_x(1:Nx,1:Ny), omegexact(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
fftbx(1:Nx,1:Ny),dump(1:Nx,1:Ny),stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'allocated space'
! set up ffts
CALL dfftw_plan_dft_2d_(planfxy,Nx,Ny,fftfx(1:Nx,1:Ny),fftbx(1:Nx,1:Ny),&
FFTW_FORWARD,FFTW_EXHAUSTIVE)
CALL dfftw_plan_dft_2d_(planbxy,Nx,Ny,fftbx(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
FFTW_BACKWARD,FFTW_EXHAUSTIVE)
! setup fourier frequencies in x-direction
DO i=1,1+Nx/2
kx(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
DO i=1,Nx
kxx(i)=kx(i)*kx(i)
END DO
DO i=1,Nx
x(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0))
END DO
! setup fourier frequencies in y-direction
DO j=1,1+Ny/2
ky(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))
END DO
ky(1+Ny/2)=0.0d0
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
DO j=1,Ny
kyy(j)=ky(j)*ky(j)
END DO
DO j=1,Ny
y(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0))
END DO
PRINT *,'Setup grid and fourier frequencies'
DO j=1,Ny
DO i=1,Nx
u(i,j)=sin(2.0d0*pi*x(i))*cos(2.0d0*pi*y(j))
v(i,j)=-cos(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
u_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
v_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
omeg(i,j)=v_x(i,j)-u_y(i,j)
END DO
END DO
! Vorticity to Fourier Space
CALL dfftw_execute_dft_(planfxy,omeg(1:Nx,1:Ny),omeghat(1:Nx,1:Ny))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!Initial nonlinear term !!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! obtain \hat{\omega}_x^{n,k}
DO j=1,Ny
omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx)
END DO
! obtain \hat{\omega}_y^{n,k}
DO i=1,Nx
omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
END DO
! convert to real space
CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny))
CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
! compute nonlinear term in real space
DO j=1,Ny
nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO
CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
time(1)=0.0d0
PRINT *,'Got initial data, starting timestepping'
! New Coprocessor API:
call coprocessorinitializewithpython("pipeline.py", 11)
DO n=1,nplots
chg=1
! save old values
uold(1:Nx,1:Ny)=u(1:Nx,1:Ny)
vold(1:Nx,1:Ny)=v(1:Nx,1:Ny)
omegold(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)
omegcheck(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)
omegoldhat(1:Nx,1:Ny)=omeghat(1:Nx,1:Ny)
nloldhat(1:Nx,1:Ny)=nlhat(1:Nx,1:Ny)
DO WHILE (chg>tol)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!nonlinear fixed (n,k+1)!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! obtain \hat{\omega}_x^{n+1,k}
DO j=1,Ny
omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx)
END DO
! obtain \hat{\omega}_y^{n+1,k}
DO i=1,Nx
omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
END DO
! convert back to real space
CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny))
CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
! calculate nonlinear term in real space
DO j=1,Ny
nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO
! convert back to fourier
CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny))
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! obtain \hat{\omega}^{n+1,k+1} with Crank Nicolson timestepping
DO j=1,Ny
omeghat(1:Nx,j)=( (1.0d0/dt+0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))&
*omegoldhat(1:Nx,j) - 0.5d0*(nloldhat(1:Nx,j)+nlhat(1:Nx,j)))/&
(1.0d0/dt-0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))
END DO
! calculate \hat{\psi}^{n+1,k+1}
DO j=1,Ny
psihat(1:Nx,j)=-omeghat(1:Nx,j)/(kxx(1:Nx)+kyy(j))
END DO
psihat(1,1)=0.0d0
psihat(Nx/2+1,Ny/2+1)=0.0d0
psihat(Nx/2+1,1)=0.0d0
psihat(1,Ny/2+1)=0.0d0
! obtain \psi_x^{n+1,k+1} and \psi_y^{n+1,k+1}
DO j=1,Ny
psihat_x(1:Nx,j)=psihat(1:Nx,j)*kx(1:Nx)
END DO
CALL dfftw_execute_dft_(planbxy,psihat_x(1:Nx,1:Ny),psi_x(1:Nx,1:Ny))
DO i=1,Nx
psihat_y(i,1:Ny)=psihat(i,1:Ny)*ky(1:Ny)
END DO
CALL dfftw_execute_dft_(planbxy,psihat_y(1:Ny,1:Ny),psi_y(1:Ny,1:Ny))
DO j=1,Ny
psi_x(1:Nx,j)=psi_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
psi_y(1:Nx,j)=psi_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO
! obtain \omega^{n+1,k+1}
CALL dfftw_execute_dft_(planbxy,omeghat(1:Nx,1:Ny),omeg(1:Nx,1:Ny))
DO j=1,Ny
omeg(1:Nx,j)=omeg(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
END DO
! obtain u^{n+1,k+1} and v^{n+1,k+1} using stream function (\psi) in real space
DO j=1,Ny
u(1:Nx,j)=psi_y(1:Nx,j)
v(1:Nx,j)=-psi_x(1:Nx,j)
END DO
! check for convergence
chg=maxval(abs(omeg-omegcheck))
! saves {n+1,k+1} to {n,k} for next iteration
omegcheck=omeg
END DO
time(n+1)=time(n)+dt
PRINT *,'TIME ',time(n+1)
! call adaptor here
! not sure if there's a way to cleanly pass complex to C++ code
do j = 1, Ny
do i = 1, Nx
dump(i,j) = real(omeg(i,j), kind(0d0))
end do
end do
call nsadaptor(Nx, Ny, 1, n, time(n), dump)
END DO
call coprocessorfinalize()
DO j=1,Ny
DO i=1,Nx
uexact_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
exp(-8.0d0*mu*(pi**2)*nplots*dt)
vexact_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
exp(-8.0d0*mu*(pi**2)*nplots*dt)
omegexact(i,j)=vexact_x(i,j)-uexact_y(i,j)
END DO
END DO
name_config = 'omegafinal.datbin'
INQUIRE(iolength=iol) omegexact(1,1)
OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol)
count = 1
DO j=1,Ny
DO i=1,Nx
WRITE(11,rec=count) REAL(omeg(i,j),KIND(0d0))
count=count+1
END DO
END DO
CLOSE(11)
name_config = 'omegaexactfinal.datbin'
OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol)
count = 1
DO j=1,Ny
DO i=1,Nx
WRITE(11,rec=count) omegexact(i,j)
count=count+1
END DO
END DO
CLOSE(11)
name_config = 'xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)
name_config = 'ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
CALL dfftw_destroy_plan_(planfxy)
CALL dfftw_destroy_plan_(planbxy)
CALL dfftw_cleanup_()
DEALLOCATE(time,kx,kxx,ky,kyy,x,y,&
u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg, &
omegoldhat, omegoldhat_x, omegold_x,&
omegoldhat_y, omegold_y, nlold, nloldhat,&
omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y,&
uexact_y,vexact_x,omegexact, &
fftfx,fftbx,stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
PRINT *,'Program execution complete'
END PROGRAM main
|
(
) |
Fortran adaptor: NSadaptor.f90 -- No API changes, included for completeness. Code download
! Fortran module for interacting with the ParaView/Catalyst coprocessor
module nsadaptor_glue
implicit none
public
contains
subroutine nsadaptor(nx, ny, nz, step, time, scalar)
! nx, ny, nz -- grid dimensions
! step -- current time step
! time -- current time
! scalar -- the data
integer, intent(in) :: nx, ny, nz, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:,:), intent(in) :: scalar
integer :: flag
! check if coprocessing this step
! defined in adaptor header in ParaView src
call requestdatadescription(step, time, flag)
if (flag .ne. 0) then
! coprocessing requested
! check if grid exists
! also from ParaView adaptor header
call needtocreategrid(flag)
if (flag .ne. 0) then
! grid needed
! defined by application developer
call createcpimagedata(nx, ny, nz)
end if
! also defined by application dev
call addfield(scalar, "test")
! from ParaView adaptor header
call coprocess()
end if
return
end subroutine nsadaptor
end module nsadaptor_glue
|
(
) |
C++ VTK data helper: PointBasedDataSet.cxx -- Changes are a new header and a new namespace. Notice also the use of vtkSmartPointer instead of raw pointer handling. Code download
// VTK Data set creator for ParaView/Catalyst Coprocessing.
// Interfaces with a Fortran adaptor to get data from a Fortran
// simulation.
// Fortran specific header for API called from Fortran glue module
#include "vtkCPPythonAdaptorAPI.h"
// Adaptor specific headers, will need to change if simulation data structure
// changes. This should available from the ParaView install include directory
// if you chose to install development files during configuration.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"
#include <string>
#include <iostream>
// Names are mangled with trailing underscore for linker visibility, along with
// using extern "C"
// Creates the vtkDataSet container for the CoProcessor. The simulation data
// must be compatible.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
if (!vtkCPPythonAdaptorAPI::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access coprocessor data.");
return;
}
vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
// Note: expects Fortran indexing
img->SetExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);
vtkCPPythonAdaptorAPI::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
}
// Add data to the container.
extern "C" void addfield_(double* simdata, char* name)
{
vtkSmartPointer<vtkCPInputDataDescription> idd =
vtkCPPythonAdaptorAPI::GetCoProcessorData()->GetInputDescriptionByName("input");
vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());
if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}
std::string fieldPythonName = name;
if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetName(name);
field->SetArray(simdata, img->GetNumberOfPoints(), 1);
img->GetPointData()->AddArray(field);
}
}
Using ParaView CoProcessing with MPI and Domain Decomposition
editThe following notes developed while working on NS3D-MPI, NavierStokes3DfftIMR.f90, KgSemiImp3D.f90, and NSLsplitting.f90 on Nautilus at NICS.
Nodal interpretation
editWhen working with single CPU node, non-partioned data, the simulation arrays can be passed directly to ParaView and added to the pipeline using the vtkDataSet::GetPointData()->AddArray() method. The vtkImageData’s points then correspond exactly to the simulation’s mesh nodes. This vtkImageData will have an implied cellular structure, with each cell corner being one of the simulation nodes.
Without special altering for MPI this code will run, however ParaView won’t be able to create the cells whose corners are on differet ranks. The .pvti will give an error when loaded that the extents are incorrect. The individual .vtis can be loaded to see that this is indeed the case and that there will be cellular gaps between the pencils.
E.g., say the array is dimension(20,10,10) and has been decomposed into 2 pencils, (1:10,:,:) and (11:20,:,:). ParaView considers the data for (10:11,:,:) to be missing and a gap will be present.
Working with point halos
editHalos provide one way to fill in the gaps. The 2decomp&fft halo api creates arrays containing the pencil data surrounded by halo elements. E.g., say a pencil is (10:20,10:20,:), the halo array is then (9:21,9:21,:). Using the halos raises some issues:
- Halo elements can extend outside of the computational domain and these will contain garbage.
- Not every pencil needs to send the same halo elements to the coprocessor. This requires additional decision logic in the program.
- From the C++ point of view, a halo array section will not be densely packed, and therefore contain garbage.
Code changes in simulation code
edit
|
(
) |
- Added halo arrays.
- Added 1D arrays for passing packed arrays to C++.
- Added branching for which halo elements to pass based on pencil extents.
- If pencil furthest from index origin, send just the pencil, no halo elements.
- Else if pencil is against either furthest extent, but not both, send the halo elements furthest in the other extent.
- Else send the halo elements furthest in both extents.
- Added reshaping of the halo sections. Arrays are passed by reference from F90 to C++. C++ doesn’t understand F90 array sections and will just read incrementally from the reference (this is an issue with any array sectioning, not just halos). One way around this is to reshape the section as a 1D array, this will make the desired data contiguous in memory.
PROGRAM main
!-----------------------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 3D incompressible Navier-Stokes
! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
! Implicit Midpoint rule timestepping. The numerical solution is compared to
! an exact solution reported by Shapiro
!
! Analytical Solution:
! u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
! v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
! w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Ny = number of modes in y - power of 2 for FFT
! Nz = number of modes in z - power of 2 for FFT
! Nt = number of timesteps to take
! Tmax = maximum simulation time
! FFTW_IN_PLACE = value for FFTW input
! FFTW_MEASURE = value for FFTW input
! FFTW_EXHAUSTIVE = value for FFTW input
! FFTW_PATIENT = value for FFTW input
! FFTW_ESTIMATE = value for FFTW input
! FFTW_FORWARD = value for FFTW input
! FFTW_BACKWARD = value for FFTW input
! pi = 3.14159265358979323846264338327950288419716939937510d0
! Re = Reynolds number
! .. Scalars ..
! i = loop counter in x direction
! j = loop counter in y direction
! k = loop counter in z direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! count = keep track of information written to disk
! iol = size of array to write to disk
! start = variable to record start time of program
! finish = variable to record end time of program
! count_rate = variable for clock count rate
! planfxyz = Forward 3d fft plan
! planbxyz = Backward 3d fft plan
! dt = timestep
! .. Arrays ..
! u = velocity in x direction
! v = velocity in y direction
! w = velocity in z direction
! uold = velocity in x direction at previous timestep
! vold = velocity in y direction at previous timestep
! wold = velocity in z direction at previous timestep
! ux = x derivative of velocity in x direction
! uy = y derivative of velocity in x direction
! uz = z derivative of velocity in x direction
! vx = x derivative of velocity in y direction
! vy = y derivative of velocity in y direction
! vz = z derivative of velocity in y direction
! wx = x derivative of velocity in z direction
! wy = y derivative of velocity in z direction
! wz = z derivative of velocity in z direction
! uxold = x derivative of velocity in x direction
! uyold = y derivative of velocity in x direction
! uzold = z derivative of velocity in x direction
! vxold = x derivative of velocity in y direction
! vyold = y derivative of velocity in y direction
! vzold = z derivative of velocity in y direction
! wxold = x derivative of velocity in z direction
! wyold = y derivative of velocity in z direction
! wzold = z derivative of velocity in z direction
! utemp = temporary storage of u to check convergence
! vtemp = temporary storage of u to check convergence
! wtemp = temporary storage of u to check convergence
! temp_r = temporary storage for untransformed variables
! uhat = Fourier transform of u
! vhat = Fourier transform of v
! what = Fourier transform of w
! rhsuhatfix = Fourier transform of righthand side for u for timestepping
! rhsvhatfix = Fourier transform of righthand side for v for timestepping
! rhswhatfix = Fourier transform of righthand side for w for timestepping
! nonlinuhat = Fourier transform of nonlinear term for u
! nonlinvhat = Fourier transform of nonlinear term for u
! nonlinwhat = Fourier transform of nonlinear term for u
! phat = Fourier transform of nonlinear term for pressure, p
! temp_c = temporary storage for Fourier transforms
! realtemp = Real storage
!
! .. Vectors ..
! kx = fourier frequencies in x direction
! ky = fourier frequencies in y direction
! kz = fourier frequencies in z direction
! x = x locations
! y = y locations
! z = y locations
! time = times at which save data
! name_config = array to store filename for data to be saved
!
! REFERENCES
!
! A. Shapiro " The use of an exact solution of the Navier-Stokes equations
! in a validation test of a three-dimensional nonhydrostatic numerical model"
! Monthly Weather Review vol. 121, 2420-2425, (1993).
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!
!--------------------------------------------------------------------------------
! External routines required
!
! External libraries required
! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
! (http://2decomp.org/)
!---------------------------------------------------------------------------------
USE decomp_2d
USE decomp_2d_fft
USE decomp_2d_io
USE MPI
!coprocessing:
use NSadaptor_module
IMPLICIT NONE
! declare variables
INTEGER(kind=4), PARAMETER :: Nx=256
INTEGER(kind=4), PARAMETER :: Ny=256
INTEGER(kind=4), PARAMETER :: Nz=256
INTEGER(kind=4), PARAMETER :: Lx=1
INTEGER(kind=4), PARAMETER :: Ly=1
INTEGER(kind=4), PARAMETER :: Lz=1
INTEGER(kind=4), PARAMETER :: Nt=20
REAL(kind=8), PARAMETER :: dt=0.05d0/Nt
REAL(kind=8), PARAMETER :: Re=1.0d0
REAL(kind=8), PARAMETER :: tol=0.1d0**10
REAL(kind=8), PARAMETER :: theta=0.0d0
REAL(kind=8), PARAMETER &
:: pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER :: ReInv=1.0d0/REAL(Re,kind(0d0))
REAL(kind=8), PARAMETER :: dtInv=1.0d0/REAL(dt,kind(0d0))
REAL(kind=8) :: scalemodes,chg,factor
REAL(kind=8), DIMENSION(:), ALLOCATABLE :: x, y, z, time,mychg,allchg
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: u, v, w,&
ux, uy, uz,&
vx, vy, vz,&
wx, wy, wz,&
uold, uxold, uyold, uzold,&
vold, vxold, vyold, vzold,&
wold, wxold, wyold, wzold,&
utemp, vtemp, wtemp, temp_r
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: kx, ky, kz
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: uhat, vhat, what,&
rhsuhatfix, rhsvhatfix,&
rhswhatfix, nonlinuhat,&
nonlinvhat, nonlinwhat,&
phat,temp_c
!coprocessing: added x,y,z component arrays also halos
!coprocessing: halos get allocated inside call to update_halo
REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: realtemp, realtempx, &
realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo
!coprocessing: added 1D array for passing to C++
real(kind=8), dimension(:), allocatable :: xpass2c, ypass2c, zpass2c
! MPI and 2DECOMP variables
TYPE(DECOMP_INFO) :: decomp
INTEGER(kind=MPI_OFFSET_KIND) :: filesize, disp
INTEGER(kind=4) :: p_row=0, p_col=0, numprocs, myid, ierr
! variables used for saving data and timing
INTEGER(kind=4) :: count, iol
INTEGER(kind=4) :: i,j,k,n,t,allocatestatus
INTEGER(kind=4) :: ind, numberfile
CHARACTER*100 :: name_config
INTEGER(kind=4) :: start, finish, count_rate
! initialisation of 2DECOMP&FFT
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
! do automatic domain decomposition
CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
! get information about domain decomposition choosen
CALL decomp_info_init(Nx,Ny,Nz,decomp)
! initialise FFT library
CALL decomp_2d_fft_init
IF (myid.eq.0) THEN
PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
PRINT *,'dt:',dt
END IF
ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
u(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
v(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
w(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
ux(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
utemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
temp_r(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
kx(1:Nx),ky(1:Ny),kz(1:Nz),&
uhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
vhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
what(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsuhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsvhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhswhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinuhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinvhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinwhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
phat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
temp_c(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
realtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
realtempx(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempy(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempz(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'allocated space'
END IF
! setup fourier frequencies in x-direction
DO i=1,Nx/2+1
kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
ind=1
DO i=-Nx/2,Nx/2-1
x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in y-direction
DO j=1,Ny/2+1
ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly
END DO
ky(1+Ny/2)=0.0d0
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
ind=1
DO j=-Ny/2,Ny/2-1
y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in z-direction
DO k=1,Nz/2+1
kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz
END DO
kz(1+Nz/2)=0.0d0
DO k = 1,Nz/2 -1
kz(k+1+Nz/2)=-kz(1-k+Nz/2)
END DO
ind=1
DO k=-Nz/2,Nz/2-1
z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
ind=ind+1
END DO
scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
IF (myid.eq.0) THEN
PRINT *,'Setup grid and fourier frequencies'
END IF
!initial conditions for Taylor-Green vortex
! factor=2.0d0/sqrt(3.0d0)
! DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
! u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
! END DO; END DO; END DO
! DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
! v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
! END DO ; END DO ; END DO
! DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
! w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
! END DO ; END DO ; END DO
time(1)=0.0d0
factor=sqrt(3.0d0)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=0.5*( factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)
! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)
! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)
! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)
! save initial data
n = 0
!coprocessing: removed savedata calls from coprocessing version
!coprocessing: could have this after the coprocessor initialize and
!coprocessing: followed by a call to the coprocessor to write out the initial data
!coprocessing: image. Otherwise, these are just wasted calls at this spot.
call coprocessorinitialize("pipeline.py", 11)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempx, tempxhalo, level=1)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempy, tempyhalo, level=1)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempz, tempzhalo, level=1)
!coprocessing: Four cases of sending halo data to the coprocessor.
if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
!coprocessing: case 1: pencil furthest from index origin, send just the pencil
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
!coprocessing: case 2: pencil furthest in z, send just upper y halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
!coprocessing: case 3: pencil furthest in y, send just upper z halo
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else
!coprocessing: case 4: send both upper y and upper z halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c,ypass2c,zpass2c)
end if
!start timer
CALL system_clock(start,count_rate)
! coprocessing: Simulation loop starts here
DO n=1,Nt
!fixed point
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
uold(i,j,k)=u(i,j,k)
uxold(i,j,k)=ux(i,j,k)
uyold(i,j,k)=uy(i,j,k)
uzold(i,j,k)=uz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vold(i,j,k)=v(i,j,k)
vxold(i,j,k)=vx(i,j,k)
vyold(i,j,k)=vy(i,j,k)
vzold(i,j,k)=vz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wold(i,j,k)=w(i,j,k)
wxold(i,j,k)=wx(i,j,k)
wyold(i,j,k)=wy(i,j,k)
wzold(i,j,k)=wz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k)
END DO; END DO ; END DO
chg=1
DO WHILE (chg .gt. tol)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
+ky(j)*nonlinvhat(i,j,k)&
+kz(k)*nonlinwhat(i,j,k))&
/(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)
! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)
! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=u(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=v(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=w(i,j,k)*scalemodes
END DO; END DO ; END DO
mychg(1) =maxval(abs(utemp-u))
mychg(2) =maxval(abs(vtemp-v))
mychg(3) =maxval(abs(wtemp-w))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT *,'chg:',chg
END IF
END DO
time(n+1)=n*dt
!goto 5100
IF (myid.eq.0) THEN
PRINT *,'time',n*dt
END IF
!coprocessing: populating the arrays sent to the coprocessor
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempx, tempxhalo, level=1)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempy, tempyhalo, level=1)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO
call update_halo(realtempz, tempzhalo, level=1)
!coprocessing: same cases as for initial conditions.
if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
! case 1: send just the pencil
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
! case 2: send just upper y halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
! case 3: send just upper z halo
allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c, ypass2c, zpass2c)
else
! send both upper y and upper z halo
allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
xpass2c, ypass2c, zpass2c)
deallocate(xpass2c,ypass2c,zpass2c)
end if
END DO
!coprocessing:
call coprocessorfinalize()
CALL system_clock(finish,count_rate)
IF (myid.eq.0) then
PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
END IF
IF (myid.eq.0) THEN
name_config = './data/tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO n=1,1+Nt
WRITE(11,*) time(n)
END DO
CLOSE(11)
name_config = './data/xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)
name_config = './data/ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
name_config = './data/zcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO k=1,Nz
WRITE(11,*) z(k)
END DO
CLOSE(11)
PRINT *,'Saved data'
END IF
! Calculate error in final numerical solution
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k) -&
(-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k) -&
(0.5*( factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)-&
(cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
mychg(1) = maxval(abs(utemp))
mychg(2) = maxval(abs(vtemp))
mychg(3) = maxval(abs(wtemp))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT*,'The error at the final timestep is',chg
END IF
! clean up
CALL decomp_2d_fft_finalize
CALL decomp_2d_finalize
DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
realtemp,stat=AllocateStatus)
!coprocessing: deallocate Coprocessing specific arrays
deallocate(realtempx, realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'Program execution complete'
END IF
CALL MPI_FINALIZE(ierr)
END PROGRAM main
Code changes in f90 adaptor glue code
edit
|
(
) |
- Changed the adaptor to accept the subextents of the current pencil.
! fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaNSadaptor.f
! -- Changed for SESE
! subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.
module NSadaptor_module
use iso_c_binding
implicit none
public
interface NSadaptor
! Originally also had a version that accepted 3D arrays, but in all other
! respects was identical. Decided this could lead to too much confusion.
module procedure NSadaptor1D
end interface NSadaptor
contains
subroutine NSadaptor1D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
step, time, omegax, omegay, omegaz)
! nx, ny, nz -- grid dimensions or entire mesh
! used for setting whole extent
! xst, xen, etc. -- extents of current subdomain pencil
! step -- current simulation time step
! time -- current simulation time
! omega* -- scalar array for the current time step
! flag -- receives status from API calls
integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:), intent (in) :: omegax, omegay, omegaz
integer :: flag
! check if processing this time step
! defined in FortranAdaptorAPI.h
call requestdatadescription(step, time, flag)
if (flag /= 0) then
! processing requested
! check if need to create grid
! defined in FortranAdaptorAPI.h
call needtocreategrid(flag)
if (flag /= 0) then
! grid needed
! defined in VTKPointBasedDataSet.cxx
! takes the size of the entire grid, and the extents of the
! sub grid.
call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
end if
! defined in VTKPointBasedDataSet.cxx
! remember to null-terminate strings for C/C++
call addfield(omegax, "realtempx"//char(0));
call addfield(omegay, "realtempy"//char(0));
call addfield(omegaz, "realtempz"//char(0));
! defined in FortranAdaptorAPI.h
call coprocess()
end if
return
end subroutine NSadaptor1D
end module NSadaptor_module
Code changes in the C++ wrapping code
edit
|
(
) |
- Specified the partitioned data extents relative to the entire data.
- Specified the relative spacing of the cells.
// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx
// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h"
// CoProcessor specific headers
// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include <string>
// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.
// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
int* yst, int* yen, int* zst, int* zen)
{
if (!ParaViewCoProcessing::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access CoProcessorData.");
return;
}
// The simulation grid is a 3-dimensional topologically and geometrically
// regular grid. In VTK/ParaView, this is considered an image data set.
vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
img->SetExtent(*xst - 1, *xen - 1, *yst - 1, *yen - 1, *zst - 1, *zen - 1);
img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);
}
// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Expects "name" to point to null-terminated character array,
// be sure to null-terminate in caller.
extern "C" void addfield_(double* a, char* name)
{
vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());
if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}
if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetName(name);
field->SetArray(a, img->GetNumberOfPoints(), 1);
img->GetPointData()->AddArray(field);
}
}
Pros
edit- Some VTK/ParaView filters work with ghost (halo) cells, code could be extended to handle those.
Cons
edit- Much code added to the simulation code itself.
- Additional memory required for the halos and packed arrays.
- Additional overhead of reshaping arrays for C++.
- A different decomposition library might require changing the halo logic.
- Would not work as is with a 3D decomposition library.
Reinterpreting simulation nodes as visualization cells
editAnother way to look at the data, is that each node of the simulation is a cell in the visualization domain. Then when the pencils are sent to ParaView, there aren’t any gaps of missing cells.
Code changes in simulation code
edit
|
(
) |
- None, beyond the regular CoProcessing calls.
PROGRAM main
!-----------------------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 3D incompressible Navier-Stokes
! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
! Implicit Midpoint rule timestepping. The numerical solution is compared to
! an exact solution reported by Shapiro
!
! Analytical Solution:
! u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
! v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
! w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
!
! .. Parameters ..
! Nx = number of modes in x - power of 2 for FFT
! Ny = number of modes in y - power of 2 for FFT
! Nz = number of modes in z - power of 2 for FFT
! Nt = number of timesteps to take
! Tmax = maximum simulation time
! FFTW_IN_PLACE = value for FFTW input
! FFTW_MEASURE = value for FFTW input
! FFTW_EXHAUSTIVE = value for FFTW input
! FFTW_PATIENT = value for FFTW input
! FFTW_ESTIMATE = value for FFTW input
! FFTW_FORWARD = value for FFTW input
! FFTW_BACKWARD = value for FFTW input
! pi = 3.14159265358979323846264338327950288419716939937510d0
! Re = Reynolds number
! .. Scalars ..
! i = loop counter in x direction
! j = loop counter in y direction
! k = loop counter in z direction
! n = loop counter for timesteps direction
! allocatestatus = error indicator during allocation
! count = keep track of information written to disk
! iol = size of array to write to disk
! start = variable to record start time of program
! finish = variable to record end time of program
! count_rate = variable for clock count rate
! planfxyz = Forward 3d fft plan
! planbxyz = Backward 3d fft plan
! dt = timestep
! .. Arrays ..
! u = velocity in x direction
! v = velocity in y direction
! w = velocity in z direction
! uold = velocity in x direction at previous timestep
! vold = velocity in y direction at previous timestep
! wold = velocity in z direction at previous timestep
! ux = x derivative of velocity in x direction
! uy = y derivative of velocity in x direction
! uz = z derivative of velocity in x direction
! vx = x derivative of velocity in y direction
! vy = y derivative of velocity in y direction
! vz = z derivative of velocity in y direction
! wx = x derivative of velocity in z direction
! wy = y derivative of velocity in z direction
! wz = z derivative of velocity in z direction
! uxold = x derivative of velocity in x direction
! uyold = y derivative of velocity in x direction
! uzold = z derivative of velocity in x direction
! vxold = x derivative of velocity in y direction
! vyold = y derivative of velocity in y direction
! vzold = z derivative of velocity in y direction
! wxold = x derivative of velocity in z direction
! wyold = y derivative of velocity in z direction
! wzold = z derivative of velocity in z direction
! utemp = temporary storage of u to check convergence
! vtemp = temporary storage of u to check convergence
! wtemp = temporary storage of u to check convergence
! temp_r = temporary storage for untransformed variables
! uhat = Fourier transform of u
! vhat = Fourier transform of v
! what = Fourier transform of w
! rhsuhatfix = Fourier transform of righthand side for u for timestepping
! rhsvhatfix = Fourier transform of righthand side for v for timestepping
! rhswhatfix = Fourier transform of righthand side for w for timestepping
! nonlinuhat = Fourier transform of nonlinear term for u
! nonlinvhat = Fourier transform of nonlinear term for u
! nonlinwhat = Fourier transform of nonlinear term for u
! phat = Fourier transform of nonlinear term for pressure, p
! temp_c = temporary storage for Fourier transforms
! realtemp = Real storage
!
! .. Vectors ..
! kx = fourier frequencies in x direction
! ky = fourier frequencies in y direction
! kz = fourier frequencies in z direction
! x = x locations
! y = y locations
! z = y locations
! time = times at which save data
! name_config = array to store filename for data to be saved
!
! REFERENCES
!
! A. Shapiro " The use of an exact solution of the Navier-Stokes equations
! in a validation test of a three-dimensional nonhydrostatic numerical model"
! Monthly Weather Review vol. 121, 2420-2425, (1993).
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
!
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!
!--------------------------------------------------------------------------------
! External routines required
!
! External libraries required
! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
! (http://2decomp.org/)
!---------------------------------------------------------------------------------
USE decomp_2d
USE decomp_2d_fft
USE decomp_2d_io
USE MPI
!coprocessing:
use NSadaptor_module
IMPLICIT NONE
! declare variables
INTEGER(kind=4), PARAMETER :: Nx=256
INTEGER(kind=4), PARAMETER :: Ny=256
INTEGER(kind=4), PARAMETER :: Nz=256
INTEGER(kind=4), PARAMETER :: Lx=1
INTEGER(kind=4), PARAMETER :: Ly=1
INTEGER(kind=4), PARAMETER :: Lz=1
INTEGER(kind=4), PARAMETER :: Nt=20
REAL(kind=8), PARAMETER :: dt=0.05d0/Nt
REAL(kind=8), PARAMETER :: Re=1.0d0
REAL(kind=8), PARAMETER :: tol=0.1d0**10
REAL(kind=8), PARAMETER :: theta=0.0d0
REAL(kind=8), PARAMETER &
:: pi=3.14159265358979323846264338327950288419716939937510d0
REAL(kind=8), PARAMETER :: ReInv=1.0d0/REAL(Re,kind(0d0))
REAL(kind=8), PARAMETER :: dtInv=1.0d0/REAL(dt,kind(0d0))
REAL(kind=8) :: scalemodes,chg,factor
REAL(kind=8), DIMENSION(:), ALLOCATABLE :: x, y, z, time,mychg,allchg
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: u, v, w,&
ux, uy, uz,&
vx, vy, vz,&
wx, wy, wz,&
uold, uxold, uyold, uzold,&
vold, vxold, vyold, vzold,&
wold, wxold, wyold, wzold,&
utemp, vtemp, wtemp, temp_r
COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE :: kx, ky, kz
COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: uhat, vhat, what,&
rhsuhatfix, rhsvhatfix,&
rhswhatfix, nonlinuhat,&
nonlinvhat, nonlinwhat,&
phat,temp_c
!coprocessing: added x,y,z component arrays
REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE :: realtemp, realtempx, &
realtempy, realtempz
! MPI and 2DECOMP variables
TYPE(DECOMP_INFO) :: decomp
INTEGER(kind=MPI_OFFSET_KIND) :: filesize, disp
INTEGER(kind=4) :: p_row=0, p_col=0, numprocs, myid, ierr
! variables used for saving data and timing
INTEGER(kind=4) :: count, iol
INTEGER(kind=4) :: i,j,k,n,t,allocatestatus
INTEGER(kind=4) :: ind, numberfile
CHARACTER*100 :: name_config
INTEGER(kind=4) :: start, finish, count_rate
! initialisation of 2DECOMP&FFT
CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
! do automatic domain decomposition
CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
! get information about domain decomposition choosen
CALL decomp_info_init(Nx,Ny,Nz,decomp)
! initialise FFT library
CALL decomp_2d_fft_init
IF (myid.eq.0) THEN
PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
PRINT *,'dt:',dt
END IF
ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
u(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
v(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
w(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
ux(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wx(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wy(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wz(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
uzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wxold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wyold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wzold(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
utemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
vtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
wtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
temp_r(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
kx(1:Nx),ky(1:Ny),kz(1:Nz),&
uhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
vhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
what(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsuhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhsvhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
rhswhatfix(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinuhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinvhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
nonlinwhat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
phat(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
temp_c(decomp%zst(1):decomp%zen(1),&
decomp%zst(2):decomp%zen(2),&
decomp%zst(3):decomp%zen(3)),&
realtemp(decomp%xst(1):decomp%xen(1),&
decomp%xst(2):decomp%xen(2),&
decomp%xst(3):decomp%xen(3)),&
realtempx(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempy(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
realtempz(decomp%xst(1):decomp%xen(1), &
decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3)),&
stat=AllocateStatus)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'allocated space'
END IF
! setup fourier frequencies in x-direction
DO i=1,Nx/2+1
kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx
END DO
kx(1+Nx/2)=0.0d0
DO i = 1,Nx/2 -1
kx(i+1+Nx/2)=-kx(1-i+Nx/2)
END DO
ind=1
DO i=-Nx/2,Nx/2-1
x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in y-direction
DO j=1,Ny/2+1
ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly
END DO
ky(1+Ny/2)=0.0d0
DO j = 1,Ny/2 -1
ky(j+1+Ny/2)=-ky(1-j+Ny/2)
END DO
ind=1
DO j=-Ny/2,Ny/2-1
y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
ind=ind+1
END DO
! setup fourier frequencies in z-direction
DO k=1,Nz/2+1
kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz
END DO
kz(1+Nz/2)=0.0d0
DO k = 1,Nz/2 -1
kz(k+1+Nz/2)=-kz(1-k+Nz/2)
END DO
ind=1
DO k=-Nz/2,Nz/2-1
z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
ind=ind+1
END DO
scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
IF (myid.eq.0) THEN
PRINT *,'Setup grid and fourier frequencies'
END IF
!initial conditions for Taylor-Green vortex
! factor=2.0d0/sqrt(3.0d0)
! DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
! u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
! END DO; END DO; END DO
! DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
! v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
! END DO ; END DO ; END DO
! DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
! w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
! END DO ; END DO ; END DO
time(1)=0.0d0
factor=sqrt(3.0d0)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=0.5*( factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)
! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)
! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)
! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)
! save initial data
n = 0
!coprocessing: removed savedata calls from coprocessing version
!coprocessing: could have this after the coprocessor initialize and
!coprocessing: followed by a call to the coprocessor to write out the initial data
!coprocessing: image. Otherwise, these are just wasted calls at this spot.
call coprocessorinitialize("pipeline.py", 11)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
realtempx, realtempy, realtempz)
!start timer
CALL system_clock(start,count_rate)
! coprocessing: Simulation loop starts here
DO n=1,Nt
!fixed point
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
uold(i,j,k)=u(i,j,k)
uxold(i,j,k)=ux(i,j,k)
uyold(i,j,k)=uy(i,j,k)
uzold(i,j,k)=uz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vold(i,j,k)=v(i,j,k)
vxold(i,j,k)=vx(i,j,k)
vyold(i,j,k)=vy(i,j,k)
vzold(i,j,k)=vz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wold(i,j,k)=w(i,j,k)
wxold(i,j,k)=wx(i,j,k)
wyold(i,j,k)=wy(i,j,k)
wzold(i,j,k)=wz(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k)
END DO; END DO ; END DO
chg=1
DO WHILE (chg .gt. tol)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
+(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
+(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
+ky(j)*nonlinvhat(i,j,k)&
+kz(k)*nonlinwhat(i,j,k))&
/(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
(dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
END DO; END DO ; END DO
! derivative of u with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD)
! derivative of v with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)
! derivative of w with respect to x, y, and z
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)
END DO; END DO ; END DO
CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)
CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
u(i,j,k)=u(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
v(i,j,k)=v(i,j,k)*scalemodes
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
w(i,j,k)=w(i,j,k)*scalemodes
END DO; END DO ; END DO
mychg(1) =maxval(abs(utemp-u))
mychg(2) =maxval(abs(vtemp-v))
mychg(3) =maxval(abs(wtemp-w))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT *,'chg:',chg
END IF
END DO
time(n+1)=n*dt
!goto 5100
IF (myid.eq.0) THEN
PRINT *,'time',n*dt
END IF
!coprocessing: populating the arrays sent to the coprocessor
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
END DO; END DO ; END DO
call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
realtempx, realtempy, realtempz)
END DO
!coprocessing:
call coprocessorfinalize()
CALL system_clock(finish,count_rate)
IF (myid.eq.0) then
PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
END IF
IF (myid.eq.0) THEN
name_config = './data/tdata.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO n=1,1+Nt
WRITE(11,*) time(n)
END DO
CLOSE(11)
name_config = './data/xcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO i=1,Nx
WRITE(11,*) x(i)
END DO
CLOSE(11)
name_config = './data/ycoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO j=1,Ny
WRITE(11,*) y(j)
END DO
CLOSE(11)
name_config = './data/zcoord.dat'
OPEN(unit=11,FILE=name_config,status="UNKNOWN")
REWIND(11)
DO k=1,Nz
WRITE(11,*) z(k)
END DO
CLOSE(11)
PRINT *,'Saved data'
END IF
! Calculate error in final numerical solution
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
utemp(i,j,k)=u(i,j,k) -&
(-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
+sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
vtemp(i,j,k)=v(i,j,k) -&
(0.5*( factor*sin(x(i))*cos(y(j))*sin(z(k))&
-cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
wtemp(i,j,k)=w(i,j,k)-&
(cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
END DO; END DO ; END DO
mychg(1) = maxval(abs(utemp))
mychg(2) = maxval(abs(vtemp))
mychg(3) = maxval(abs(wtemp))
CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
chg=allchg(1)+allchg(2)+allchg(3)
IF (myid.eq.0) THEN
PRINT*,'The error at the final timestep is',chg
END IF
! clean up
CALL decomp_2d_fft_finalize
CALL decomp_2d_finalize
DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
realtemp,stat=AllocateStatus)
!coprocessing: deallocate Coprocessing specific arrays
deallocate(realtempx, realtempy, realtempz)
IF (AllocateStatus .ne. 0) STOP
IF (myid.eq.0) THEN
PRINT *,'Program execution complete'
END IF
CALL MPI_FINALIZE(ierr)
END PROGRAM main
Code changes in f90 adaptor glue code
edit
|
(
) |
- Accept the subextents of the current pencil.
! Fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE
! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.
module NSadaptor_module
use iso_c_binding
implicit none
public
interface NSadaptor
module procedure NSadaptor3D
end interface NSadaptor
contains
subroutine NSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
step, time, omegax, omegay, omegaz)
! nx, ny, nz -- grid dimensions of entire mesh
! used for setting whole extent
! xst, xen, etc. -- extents of current subdomain pencil
! step -- current simulation time step
! time -- current simulation time
! omega* -- scalar arrays for the current time step
integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
real(kind=8), intent(in) :: time
real(kind=8), dimension(:,:,:), intent (in) :: omegax, omegay, omegaz
integer :: flag
! check if processing this time step
! defined in FortranAdaptorAPI.h
call requestdatadescription(step, time, flag)
if (flag /= 0) then
! processing requested
! check if need to create grid
! defined in FortranAdaptorAPI.h
call needtocreategrid(flag)
if (flag /= 0) then
! grid needed
! defined in VTKCellBasedDataSet.cxx
! takes the size of the entire grid, and the extents of the
! sub grid.
call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
end if
! defined in VTKCellBasedDataSet.cxx
! call for each field of interest. Be sure to null-terminate the
! string for C++!
call addfield(omegax, "realtempx"//char(0))
call addfield(omegay, "realtempy"//char(0))
call addfield(omegaz, "realtempz"//char(0))
! defined in FortranAdaptorAPI.h
call coprocess()
end if
return
end subroutine NSadaptor3D
end module NSadaptor_module
Code changes in C++ wrapping code
edit
|
(
) |
- Changed how the extents are calculated.
- Changes point data calls to cell data calls.
// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx
// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h"
// CoProcessor specific headers
// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"
#include <string>
// These will be called from the Fortran adaptor "glue" code.
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.
// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
int* yst, int* yen, int* zst, int* zen)
{
if (!ParaViewCoProcessing::GetCoProcessorData()) {
vtkGenericWarningMacro("Unable to access CoProcessorData.");
return;
}
// The simulation grid is a 3-dimensional topologically and geometrically
// regular grid. In VTK/ParaView, this is considered an image data set.
vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
// Indexing based on change from F90 to C++, and also from nodal to cellular.
img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen);
img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);
}
// Add field to the data container.
// Separate from above because this could be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Assumes "name" points to null-terminated array of chars.
// Easiest way to do that is to concatenate a terminal NULL in caller.
extern "C" void addfield_(double* a, char* name)
{
vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());
if (!img) {
vtkGenericWarningMacro("No adaptor grid to attach field data to.");
return;
}
if (idd->IsFieldNeeded(name)) {
vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
field->SetName(name);
field->SetArray(a, img->GetNumberOfCells(), 1);
img->GetCellData()->AddArray(field);
}
}
Pros
edit- Minimal code changes, none in simulation code (beyond the coprocessing calls).
- No halo or packed arrays needed.
- Should work with different decomposition libraries without change.
Cons
edit- Requires extra handling in the ParaView client to handle the differences between cell and point rendering.
- Won’t work as is with VTK/ParaView filters that require ghost cells.
Additional notes on creating pipelines in the ParaView client
editWorking with Fortran complex type
editC++ does not have a native complex data type. This requires some additional handling in the ParaView client:
- Run the original non-coprocessing simulation at a reduced mesh size.
- Open this data in the ParaView client using the RAW binary reader.
- Set the number of Scalar Components to 2
- Create the pipeline, the real and imaginary parts’s default names are ImageFile_X and ImageFile_Y, respectively.
|
(
) |
See the code for NLSadaptor.f90 for details on passing a complex to the coprocessor. Code download
! Fortran module for interacting with the ParaView CoProcessor
! loosely based on:
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE
! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.
module NLSadaptor_module
use iso_c_binding
implicit none
public
! using an interface incase need arises to overload 1D, 2D versions
interface NLSadaptor
module procedure NLSadaptor3D
end interface NLSadaptor
contains
subroutine NLSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
step, time, a)
! nx, ny, nz -- grid dimensions of entire mesh
! used for setting whole extent
! xst, xen, etc. -- extents of current subdomain pencil
! step -- current simulation time step
! time -- current simulation time
! a -- scalar array for the current time step
! flag -- receives status from API calls
integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
real(kind=8), intent(in) :: time
complex(kind=8), dimension(:,:,:), intent (in) :: a
integer :: flag
flag = 0
! check if processing this time step
! defined in FortranAdaptorAPI.h
call requestdatadescription(step, time, flag)
if (flag /= 0) then
! processing requested
! check if need to create grid
! defined in FortranAdaptorAPI.h
call needtocreategrid(flag)
if (flag /= 0) then
! grid needed
! defined in adaptor.cxx
! takes the size of the entire grid, and the extents of the
! sub grid.
call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
end if
! defined in adaptor.cxx
! Be sure to null-terminate the
! string for C++!
call addfield(a, "u_complex"//char(0))
! defined in FortranAdaptorAPI.h
call coprocess()
end if
return
end subroutine NLSadaptor3D
end module NLSadaptor_module
|
(
) |
This also requires some minor code changes in the accompanying C++ file. Code Download
// Adaptor for getting fortran simulation code into ParaView CoProcessor. // Based on the PhastaAdaptor sample in the ParaView distribution. // ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx
// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
- include "FortranAdaptorAPI.h"
// CoProcessor specific headers // Routines that take the place of VTK dataset object creation. // Called from Fortran code which also calls the Fortran Adaptor API // supplied with ParaView source. // Note: names mangled with trailing underscores for linker visibility.
- include "vtkCPDataDescription.h"
- include "vtkCPInputDataDescription.h"
- include "vtkCPProcessor.h"
- include "vtkDoubleArray.h"
- include "vtkPointData.h"
- include "vtkSmartPointer.h"
- include "vtkImageData.h"
- include "vtkCellData.h"
- include <string>
// These will be called from the Fortran "glue" code" // Completely dependent on data layout, structured vs. unstructured, etc. // since VTK/ParaView uses different internal layouts for each.
// Creates the data container for the CoProcessor. // Takes the extents for both the global dataset and the particular subsection // visible to the current MPI rank. // Note: expects to receive Fortran base-1 indices. extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen, int* yst, int* yen, int* zst, int* zen) {
if (!ParaViewCoProcessing::GetCoProcessorData()) { vtkGenericWarningMacro("Unable to access CoProcessorData."); return; }
// The simulation grid is a 3-dimensional topologically and geometrically // regular grid. In VTK/ParaView, this is considered an image data set. vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
// For Fortram complex, need to set that there are 2 scalar components img->SetNumberOfScalarComponents(2);
// Indexing based on change from F90 to C++, and also from nodal to cellular. // Extents are given in terms of nodes, not cells. img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen); // Setting spacing is important so that the camera position in the pipeline makes // sense if using different sized meshes between setting up the pipeline and running // the simulation. Origin can often be ignored. img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz); // considering passing in as args. ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img); ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);
}
// Add field to the data container. // Separate from above because this could be dynamic, grid is static. // Might be an issue, VTK assumes row major C storage. // Underscore is by-hand name mangling for fortran linking. // Note: Expects the address of the data, has no way of determining // if the array is densely packed or not. // Note 2: Assumes "name" points to null-terminated array of chars. // Easiest way to do that is to concatenate in caller. extern "C" void addfield_(double* a, char* name) {
vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input"); vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());
if (!img) { vtkGenericWarningMacro("No adaptor grid to attach field data to."); return; }
if (idd->IsFieldNeeded(name)) { vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New(); field->SetNumberOfComponents(2); field->SetName(name); field->SetArray(a, 2* img->GetNumberOfCells(), 1); img->GetCellData()->AddArray(field); }
}
Working with cell data
edit- Some ParaView filters and representations will only work with point data, e.g. volume rendering. Use the Cell Data to Point Data filter to convert. Be sure to check the option to pass the cellular data through the filter. If a simulation runs with no errors and produces no output, a possible culprit is failing to have checked that option when creating the pipeline.
vtkKDTreeGenerator warnings
edit- Pipelines that use volume rendering will generate warnings about Region IDs being 0. These can often be ignored.
ParaView CoProcessing Resources
edit- ParaView Wiki Entry
- SC10 Coprocessing Tutorial
- The ParaView source code, in particular the PHASTA Fortran example.
- paraview-users mailing list