Parallel Spectral Numerical Methods/Visualization with ParaView CoProcessing

The ParaView CoProcessing plugin allows an application code to be instrumented to connect to a ParaView server in order to execute a visualization pipeline. The pipeline can produce either images in a variety of formats or VTK XML parallel file format data sets.

A user creates a pipeline in the ParaView client using a sample, probably lower resolution, data set. This pipeline is then exported using the CoProcessor plugin as a Python script which will be loaded by the application.

The application developer also needs to write some Fortran/C++ code to convert the application’s data structures into a format ParaView can understand. In the case of completely regular grids, this is straightforward.

Using ParaView CoProcessing on a Single Computation Node

edit

ParaView client setup

edit

The first requirement is to have a ParaView client which can export the Python script. This requires building from source as of ParaView 3.14.1. The standard instructions apply with some additional instructions for building the script generator plugin. After building, the first time you launch ParaView, go to Tools, Manage Plugins and set the CoProcessingPlugin to load on startup. After the plugin is loaded, there should be two new menu items: Writers and CoProcessing

Client build notes:

edit
  • Might have to install CMake
  • Might have to build Qt
  • Might have to turn off building the Manta plugin

Existing code alterations

edit

The following instructions pertain to the Navier-Stokes CUDA Fortran as tested on NCSA's Forge during the Summer of 2012.

Simulation code

edit

 

 

 

 

( A)

navierstokes.cuf added 4 lines and removed .datbin writing. Code download

!--------------------------------------------------------------------
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution. 
!
! Periodic free-slip boundary conditions and Initial conditions:
!       u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
!       v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution (subscript denote derivatives):
!       u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*t/Re)
!       v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!   u_y(x,y,t)=-2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!       v_x(x,y,t)=2*pi*sin(2*pi*x)sin(2*pi*y)exp(-8*pi^2*t/Re)
!       omega=v_x-u_y
!
! .. Parameters ..
!  Nx                           = number of modes in x - power of 2 for FFT
!  Ny                           = number of modes in y - power of 2 for FFT
!  nplots                       = number of plots produced
!  plotgap                      = number of timesteps inbetween plots
!  Re                           = dimensionless Renold's number
!  ReInv                        = 1/Re for optimization
!  dt                           = timestep size 
!  dtInv                        = 1/dt for optimization
!  tol                          = determines when convergences is reached
!  scalemodes           = 1/(Nx*Ny), scaling after preforming FFTs
!  numthreads           = number of CPUs used in calculation
! .. Scalars ..
!  i                            = loop counter in x direction
!  j                            = loop counter in y direction
!  n                            = loop counter for timesteps direction  
!  allocatestatus       = error indicator during allocation
!  time                         = times at which data is saved
!  chg                          = error at each iteration
!  temp                         = used for ordering saved omega 
! .. Arrays (gpu) ..
!  u_d                          = velocity in x direction
!  uold_d                               = velocity in x direction at previous timestep
!  v_d                          = velocity in y direction
!  vold_d                               = velocity in y direction at previous timestep
!  omeg_d                               = vorticity     in real space
!  omeghat_d                    = 2D Fourier transform of vorticity
!                                               at next iterate
!  omegoldhat_d         = 2D Fourier transform of vorticity at previous
!                                               iterate
!  nloldhat_d                   = nonlinear term in Fourier space
!                                               at previous iterate
!  omegexact_d          = taylor-green vorticity at
!                                               at final step
!  psihat_d                     = 2D Fourier transform of streamfunction
!                                               at next iteration
!  omegcheck_d          = store of vorticity at previous iterate
!  temp1_d/temp2_d              = reusable complex/real space used for
!                                               calculations. This reduces number of
!                                               arrays stored.
! .. Vectors (gpu) ..
!  kx_d                         = fourier frequencies in x direction
!  ky_d                         = fourier frequencies in y direction
!  x_d                          = x locations
!  y_d                          = y locations
!  name_config          = array to store filename for data to be saved                  
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!               
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
! This program has not been fully optimized.
!--------------------------------------------------------------------   
module precision
  ! Precision control
  
  integer, parameter, public :: Single = kind(0.0) ! Single precision
  integer, parameter, public :: Double = kind(0.0d0) ! Double precision
  
  integer, parameter, public :: fp_kind = Double
  !integer, parameter, public :: fp_kind = Single
  
end module precision

module cufft
  
  integer, public :: CUFFT_FORWARD = -1
  integer, public :: CUFFT_INVERSE = 1
  integer, public :: CUFFT_R2C = Z'2a' ! Real to Complex (interleaved)
  integer, public :: CUFFT_C2R = Z'2c' ! Complex (interleaved) to Real
  integer, public :: CUFFT_C2C = Z'29' ! Complex to Complex, interleaved
  integer, public :: CUFFT_D2Z = Z'6a' ! Double to Double-Complex
  integer, public :: CUFFT_Z2D = Z'6c' ! Double-Complex to Double
  integer, public :: CUFFT_Z2Z = Z'69' ! Double-Complex to Double-Complex
  
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !
  ! cufftPlan2d(cufftHandle *plan, int nx,int ny, cufftType type)
  !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  
  interface cufftPlan2d
     subroutine cufftPlan2d(plan, nx, ny, type) bind(C,name='cufftPlan2d')
       use iso_c_binding
       integer(c_int):: plan
       integer(c_int),value:: nx, ny, type
     end subroutine cufftPlan2d
  end interface
  
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !
  ! cufftDestroy(cufftHandle plan)
  !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  
  interface cufftDestroy
     subroutine cufftDestroy(plan) bind(C,name='cufftDestroy')
       use iso_c_binding
       integer(c_int),value:: plan
     end subroutine cufftDestroy
  end interface
  
  interface cufftExecD2Z
     subroutine cufftExecD2Z(plan, idata, odata) &
          & bind(C,name='cufftExecD2Z')
       use iso_c_binding
       use precision
       integer(c_int),  value  :: plan
       real(fp_kind),   device :: idata(1:nx,1:ny)
       complex(fp_kind),device :: odata(1:nx,1:ny)
     end subroutine cufftExecD2Z
  end interface
  
  interface cufftExecZ2D
     subroutine cufftExecZ2D(plan, idata, odata) &
          & bind(C,name='cufftExecZ2D')
       use iso_c_binding
       use precision
       integer(c_int),value:: plan
       complex(fp_kind),device:: idata(1:nx,1:ny)
       real(fp_kind),device :: odata(1:nx,1:ny)
     end subroutine cufftExecZ2D
  end interface
  
end module cufft

PROGRAM main
  use precision
  use cufft
  
  ! coprocessing:
  use ns2dcnadaptor

  ! declare variables
  IMPLICIT NONE
  INTEGER(kind=4), PARAMETER            ::  Nx=1024
  INTEGER(kind=4), PARAMETER            ::  Ny=1024     
  !INTEGER(kind=8)                                              ::  
  REAL(fp_kind), PARAMETER              ::  dt=0.00025d0
  REAL(fp_kind), PARAMETER              ::  dtInv=1.0d0/REAL(dt,kind(0d0))   
  REAL(fp_kind), PARAMETER      &
       ::  pi=3.14159265358979323846264338327950288419716939937510d0
  REAL(fp_kind), PARAMETER              ::  Re=5000.0d0                 
  REAL(fp_kind), PARAMETER              ::      ReInv=1.0d0/REAL(Re,kind(0d0))
  REAL(fp_kind), PARAMETER              ::      tol=0.1d0**10
  
  !coprocessing: temp is used as an int, runtime crashes on 
  ! floating temp used in write format string on line 376
  !REAL(fp_kind)                                        ::  scalemodes,chg,temp=10000000
  real(fp_kind) :: scalemodes, chg
  integer(kind=4) :: temp=100000000

  INTEGER(kind=4), PARAMETER            ::      nplots=400, plotgap=200                         
  REAL(fp_kind),DIMENSION(:), ALLOCATABLE               ::  x,y
  REAL(fp_kind),DIMENSION(:,:), ALLOCATABLE     ::      omeg,omegexact
  INTEGER(kind=4)                               ::  i,j,n,t, AllocateStatus
  INTEGER(kind=4)                               ::  planz2d,pland2z, kersize
  !variables used for saving data and timing            
  INTEGER(kind=4)                               ::      start, finish, count_rate,count, iol    
  CHARACTER*100                                 ::  name_config
  ! Declare variables for GPU   
  REAL(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE            ::      u_d,v_d,&
       omegcheck_d,omeg_d,&
       nl_d, temp2_d, omegexact_d
  COMPLEX(fp_kind), DEVICE, DIMENSION(:,:), ALLOCATABLE :: omegoldhat_d, nloldhat_d,&
       omeghat_d, nlhat_d, psihat_d,&
       temp1_d                                                          
  COMPLEX(fp_kind), DEVICE, DIMENSION(:), ALLOCATABLE           ::  kx_d,ky_d                                           
  REAL(kind=8),DEVICE, DIMENSION(:), ALLOCATABLE                ::  x_d,y_d
  
  kersize=min(Nx,256)
  PRINT *,'Program starting'
  PRINT *,'Grid:',Nx,'X',Ny
  PRINT *,'dt:',dt      
  ALLOCATE(x(1:Nx),y(1:Ny),omeg(1:Nx,1:Ny),omegexact(1:Nx,1:Ny),&
       stat=AllocateStatus)
  IF (AllocateStatus .ne. 0) STOP               
  PRINT *,'Allocated CPU arrays'
  ALLOCATE(kx_d(1:Nx/2+1),ky_d(1:Ny),x_d(1:Nx),y_d(1:Ny),u_d(1:Nx,1:Ny),&
       v_d(1:Nx,1:Ny),omeg_d(1:Nx,1:Ny),&
       omegexact_d(1:Nx,1:Ny),temp2_d(1:Nx,1:Ny),&
       omegoldhat_d(1:Nx/2+1,1:Ny),nloldhat_d(1:Nx/2+1,1:Ny),&
       omegcheck_d(1:Nx,1:Ny),omeghat_d(1:Nx/2+1,1:Ny),nl_d(1:Nx,1:Ny),&
       nlhat_d(1:Nx/2+1,1:Ny), psihat_d(1:Nx/2+1,1:Ny),temp1_d(1:Nx/2+1,1:Ny),&
       stat=AllocateStatus)
  IF (AllocateStatus .ne. 0) STOP 
  PRINT *,'Allocated GPU arrays'
  
  
  CALL cufftPlan2D(pland2z,nx,ny,CUFFT_D2Z)
  CALL cufftPlan2D(planz2d,nx,ny,CUFFT_Z2D)
  PRINT *,'Setup FFTs'
  
  ! setup fourier frequencies
  !$cuf kernel do <<< *,* >>>
  DO i=1,Nx/2+1
     kx_d(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind=fp_kind)  
  END DO
  kx_d(1+Nx/2)=0.0d0
  !$cuf kernel do <<< *,* >>>
  DO i=1,Nx
     x_d(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind=fp_kind) 
  END DO
  !$cuf kernel do <<< *,* >>>
  DO j=1,Ny/2+1
     ky_d(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind=fp_kind) 
  END DO
  ky_d(1+Ny/2)=0.0d0
  !$cuf kernel do <<< *,* >>>
  DO j = 1,Ny/2 -1
     ky_d(j+1+Ny/2)=-ky_d(1-j+Ny/2)
  END DO
  !$cuf kernel do <<< *, * >>>
  DO j=1,Ny
     y_d(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind=fp_kind) 
  END DO
  scalemodes=1.0d0/REAL(Nx*Ny,kind=fp_kind)
  PRINT *,'Setup grid and fourier frequencies'
  
  !initial data
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        u_d(i,j)=sin(2.0d0*pi*x_d(i))*cos(2.0d0*pi*y_d(j))
     END DO
  END DO
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        v_d(i,j)=-cos(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))
     END DO
  END DO
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        omeg_d(i,j)=4.0d0*pi*sin(2.0d0*pi*x_d(i))*sin(2.0d0*pi*y_d(j))+0.01d0*cos(2.0d0*pi*y_d(j))
     END DO
  END DO
  
  CALL cufftExecD2Z(pland2z,omeg_d,omeghat_d) 
  
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx/2+1
        temp1_d(i,j)=omeghat_d(i,j)*kx_d(i)
     END DO
  END DO
  
  CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
  !first part nonlinear term in real space
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
     END DO
  END DO
  
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx/2+1
        temp1_d(i,j)=omeghat_d(i,j)*ky_d(j)
     END DO
  END DO
  
  CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
  
  !$cuf kernel do <<<  *,*  >>>
  DO j=1,Ny
     DO i=1,Nx
        nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
     END DO
  END DO
  
  CALL cufftExecD2Z(pland2z,nl_d,nlhat_d) 
  
  omegcheck_d=omeg_d
  PRINT *,'Got initial data, starting timestepping'     
  CALL system_clock(start,count_rate)

  ! coprocessing: simulation loop
  ! coprocessing: before entering, initialze the ParaView coprocessor
  ! coprocessing: this is from FortranAdaptorAPI.cxx
  ! coprocessing: it takes the name of the pipeline script you created
  ! coprocessing: in the ParaView client and the name's length in chars
  call coprocessorinitialize("pipeline.py", 11 )
  DO t=1,nplots
     DO n=1,plotgap
        chg=1.0d0
        nloldhat_d=nlhat_d
        omegoldhat_d=omeghat_d
        DO WHILE (chg>tol)
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 omeghat_d(i,j)=((dtInv+0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))&
                      *omegoldhat_d(i,j) - 0.5d0*(nloldhat_d(i,j)+nlhat_d(i,j)))&
                      /(dtInv-0.5d0*ReInv*(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)))   
              END DO
           END DO
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 psihat_d(i,j)=-omeghat_d(i,j)/(kx_d(i)*kx_d(i)+ky_d(j)*ky_d(j)+0.10d0**14)
              END DO
           END DO
           CALL cufftExecZ2D(planz2d,omeghat_d,omeg_d)
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=-psihat_d(i,j)*kx_d(i)*scalemodes
              END DO
           END DO
           CALL cufftExecZ2D(planz2d,temp1_d,v_d) 
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>                                
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=psihat_d(i,j)*ky_d(j)*scalemodes
              END DO
           END DO
           CALL cufftExecZ2D(planz2d,temp1_d,u_d)
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=omeghat_d(i,j)*kx_d(i)
              END DO
           END DO
           
           CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx
                 nl_d(i,j)=u_d(i,j)*temp2_d(i,j)
              END DO
           END DO
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx/2+1
                 temp1_d(i,j)=omeghat_d(i,j)*ky_d(j)
              END DO
           END DO
           
           CALL cufftExecZ2D(planz2d,temp1_d,temp2_d)
           
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx
                 nl_d(i,j)=(nl_d(i,j)+v_d(i,j)*temp2_d(i,j))*scalemodes
              END DO
           END DO
           
           CALL cufftExecD2Z(pland2z,nl_d,nlhat_d) 
           
           chg=0.0d0
           !$cuf kernel do(2) <<< (2,*), (kersize,1) >>>
           DO j=1,Ny
              DO i=1,Nx
                 chg=chg+(omeg_d(i,j)-omegcheck_d(i,j))*(omeg_d(i,j)-omegcheck_d(i,j))&
                      *scalemodes*scalemodes
              END DO
           END DO
           omegcheck_d=omeg_d
        END DO
     END DO
     PRINT *, t*plotgap*dt
     omeg=omeg_d
     !temp=temp+1
     !write(name_config,'(a,i0,a)') 'omega',temp,'.datbin'
     !INQUIRE(iolength=iol) omeg(1,1)
     !OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol) 
     !count = 1 !coprocessing - there's an intrinsic array function this shadows
     !DO j=1,Ny
     !   DO i=1,Nx
     !      WRITE(11,rec=count) omeg(i,j)*scalemodes
     !      count=count+1
     !   END DO
     !END DO
     !CLOSE(11)

     ! coprocessing: do the coprocessing at the end of the sim loop
     ! grid dims, time step, time, scalar array
     call navierStokesCoprocessor(Nx, Ny, 1, t, t*dt  ,omeg)  
 
  END DO
 
  ! coprocessing: clean up
  call coprocessorfinalize()

 CALL system_clock(finish,count_rate)
  PRINT*,'Program took ',REAL(finish-start)/REAL(count_rate),&
       'for Time stepping'
  
  
  ! Copy results back to host
  x=x_d
  y=y_d         
  
!!!!!!!!!!!!!!!!!!!!!!!!
  !copy over data to disk!
!!!!!!!!!!!!!!!!!!!!!!!!
  
  name_config = 'xcoord.dat' 
  OPEN(unit=11,FILE=name_config,status="UNKNOWN")       
  REWIND(11)
  DO i=1,Nx
     WRITE(11,*) x(i)
  END DO
  CLOSE(11)
  
  name_config = 'ycoord.dat' 
  OPEN(unit=11,FILE=name_config,status="UNKNOWN")       
  REWIND(11)
  DO j=1,Ny
     WRITE(11,*) y(j)
  END DO
  CLOSE(11)
!!!!!!!!!!!!!!!!!!!!!!!!
  
  CALL cufftDestroy(planz2d)
  CALL cufftDestroy(pland2z)
  PRINT *,'Destroyed fft plan'
  
  DEALLOCATE(x,y,omeg,omegexact,stat=AllocateStatus)    
  IF (AllocateStatus .ne. 0) STOP
  PRINT *,'Deallocated CPU memory'
  
  DEALLOCATE(kx_d,ky_d,x_d,y_d,u_d,&
       v_d, omegcheck_d, omeg_d,omegoldhat_d,&
       omegexact_d, nloldhat_d,omeghat_d,nl_d, nlhat_d,&
       temp1_d,temp2_d, psihat_d,stat=AllocateStatus)   
  IF (AllocateStatus .ne. 0) STOP
  PRINT *,'Deallocated GPU memory'
  PRINT *,'Program execution complete'
END PROGRAM main

Developed files

edit

ns2dcnadaptor.f90

edit

 

 

 

 

(B)
Defines navierstokescoprocessor subroutine. This calls some functions from the ParaView supplied FortranAdaptorAPI.h and the user defined functions in ns2dcnVTKDataSet.cxx.

Code download

! fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE

! subroutine determines if coprocessing needed during the
! current simulation step or not.
      
module ns2dcnadaptor
  implicit none
  public
contains
  subroutine navierstokescoprocessor(nx, ny, nz, step, time, omeg)
    ! nx, ny, nz -- grid dimensions, nz will be 1 since this is a 2D problem.
    ! step       -- current simulation time step
    ! time       -- current simulation time
    ! omega      -- scalar array for the current time step
    integer, intent(in) :: nx, ny, nz, step
    real(kind=8), intent(in) :: time
    real(kind=8), dimension(:,:), intent (in) :: omeg
    integer :: flag

    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag .ne. 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag .ne. 0) then
          ! grid needed
          ! defined in ns2dcnVTKDataSet.cxx
          call createcpimagedata(nx, ny, nz)
       end if
          
       ! defined in ns2dcnVTKDataSet.cxx 
       call addfield(omeg)
       
       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine navierstokescoprocessor
end module ns2dcnadaptor

ns2dcnVTKDataSet.cxx

edit

 

 

 

 

( C)

Defines functions that put simulation data into a format which the ParaView/VTK libs can understand. Very straightforward for completely regular grids. Expanding to 3D is simple. Unstructured meshes will require more effort. Code download

// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h" 

// CoProcessor specific headers
// ParaView-3.14.1-Source/CoProcessing/CoProcessor/
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"

#include "vtkDoubleArray.h"
#include "vtkPointData.h"

// ParaView-3.14.1-Source/VTK/Filtering/vtkImageData.h
#include "vtkImageData.h"

// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
  if (!ParaViewCoProcessing::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access CoProcessorData.");
    return;
  }

  // The simulation grid is a 2-dimensional topologically and geometrically 
  // regular grid. In VTK/ParaView, this is considered an image data set.
  vtkImageData* Grid = vtkImageData::New();
  
  // assuming dimZ == 1 for now
  Grid->SetDimensions(*nx, *ny, *nz);
  
  // Setting the Origin and Spacing are also options.

  // Name should be consistent between here, Fortran and Python client script.
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(Grid);
}

// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK probably assumes row major, but
// omeg probably passed column major...
// by hand name mangling for fortran
extern "C" void addfield_(double* scalars)
{
  vtkCPInputDataDescription *idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");

  vtkImageData* Image = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!Image) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }


  // field name must match that in the fortran code.
  if (idd->IsFieldNeeded("omeg")) {
    vtkDoubleArray* omega = vtkDoubleArray::New();
    omega->SetName("omeg");
    omega->SetArray(scalars, Image->GetNumberOfPoints(), 1); 
    Image->GetPointData()->AddArray(omega);
    omega->Delete();
  }
}

Creating a visualization pipeline in the ParaView Client

edit

The script pipeline is generated in the ParaView client by using the CoProcessing Script Generator Plugin.

  • File > Open > browse to data set, the file should now be visible in the Pipeline Browser window.
  • In the Properties window, click apply, the data should now appear in the layout window.
  • To change the colormap, click on the toolbar button that’s a rainbow with a green circle.
  • To display the colormap in the layout, click on the toolbar button that’s a vertical rainbow.
  • Set the size of the viewport in Tools > Lock View Size Custom. This will help you arrange the items for the final image. Make a note of the dimensions, you’ll need them later for a bug workaround.
  • If writing out VTK data sets, go to the Writers > Parallel Image Data Writer, this should add a writer to the pipeline
  • To export the python pipeline go to CoProcessing > Export State, this should launch a wizard
    • click Next
    • Select the dataset you want to connect to your simulation, Add it, click Next
    • The default Simulation Name is “input”. If you change this, be sure to update ns2dcnVTKDataSet.cxx, click Next
    • Check whether or not to output images. Click Finish.
    • Save the file. The file name will be referenced in navierstokes.cuf, so save as something like pipeline-test-??.py and then cp that to pipeline.py so to avoid recompiling for new pipelines.

Python script issues/bugs

edit

If the pipeline is set to output images, then there are two issues that can be considered bugs in the script.

  1. No matter what viewport size is set in the client, it doesn’t get used by the script.
  2. Color map labels will be drawn with a box surrounding them.

The workaround for the first is to add a method call to ViewSize equal to the viewport window dimensions in the client. The second a simple edit. Both of these labeled in the sample pipeline.py with #mvm comments

Set the environment

edit

Forge required loading the ParaView OSMesa module. $ module load paraview/3.14.1-OSMesa. On Nautilus the module is paraview/3.14.1

Run the simulation

edit

Run as you would the uninstrumented version.

Convert the images to an animation

edit

There are a variety of ways to do this, if ffmpeg is available ffmpeg -sameq -i path/to/images/image%04d.png myanimation.mpg

or if writing out the VTK data sets, one can work with those in ParaView directly.

dt and effect on animation time.

edit

Smooth video is ~30 fps or dt = 0.033… If you write out a larger dt as VTK data sets, then one can do linear time interpolation in ParaView. If writing out a larger dt as images to stitch into an mpg, the options available created much reduced video quality.

ParaView Server setup

edit
  • Forge did not have X support, so an offscreen Mesa (OSMesa) build was necessary.
  • Nautilus supports X and so can render with the supplied OpenGL libraries.

Summary of Changes for New API

edit

The coprocessing API has had some minor changes since ParaView 3.14.1. The following applies for ParaView 3.98+ and Catalyst 1.0 alpha+. These were tested on Beacon in offload mode at NICS on August 20, 2013.

The code changes are:

  • In the simulation code, coprocessorinitialize is now coprocessorinitializewithpython, the arguments are the same.
  • In the C++ data set helper code, include vtkCPPythonAdaptorAPI.h instead of FortranAdaptorAPI.h.
  • In the C++ data set helper code, change from ParaViewCoprocessing namespace to vtkCPPythonAdaptorAPI namespace.

The libraries to link against have also changed, for a ParaView 4.0.1 build these are:

  • -lvtkPVCatalystPython26D-pv4.0
  • -lvtkPVCatalyst-pv4.0
  • -lvtkCommonCore-pv4.0
  • -lvtkCommonDataModel-pv4.0
  • -lvtkPVPythonCatalyst-pv4.0

Example with New API

edit

 

 

 

 

( A)
Simulation: navierstokes.f90 -- only change is the initialization function name.

Code download

   
!--------------------------------------------------------------------
!
!
! PURPOSE
!
! This program numerically solves the 2D incompressible Navier-Stokes
! on a Square Domain [0,1]x[0,1] using pseudo-spectral methods and
! Crank-Nicolson timestepping. The numerical solution is compared to
! the exact Taylor-Green Vortex Solution. 
!
! Periodic free-slip boundary conditions and Initial conditions:
! u(x,y,0)=sin(2*pi*x)cos(2*pi*y)
! v(x,y,0)=-cos(2*pi*x)sin(2*pi*y)
! Analytical Solution:
! u(x,y,t)=sin(2*pi*x)cos(2*pi*y)exp(-8*pi^2*nu*t)
! v(x,y,t)=-cos(2*pi*x)sin(2*pi*y)exp(-8*pi^2*nu*t)
!
! .. Parameters ..
!  Nx    = number of modes in x - power of 2 for FFT
!  Ny    = number of modes in y - power of 2 for FFT
!  Nt    = number of timesteps to take
!  Tmax    = maximum simulation time
!  FFTW_IN_PLACE  = value for FFTW input 
!  FFTW_MEASURE  = value for FFTW input
!  FFTW_EXHAUSTIVE  = value for FFTW input
!  FFTW_PATIENT  = value for FFTW input    
!  FFTW_ESTIMATE  = value for FFTW input
!  FFTW_FORWARD     = value for FFTW input
!  FFTW_BACKWARD = value for FFTW input 
!  pi = 3.14159265358979323846264338327950288419716939937510d0
!  mu    = viscosity
!  rho    = density
! .. Scalars ..
!  i    = loop counter in x direction
!  j    = loop counter in y direction
!  n    = loop counter for timesteps direction 
!  allocatestatus = error indicator during allocation
!  count   = keep track of information written to disk
!  iol    = size of array to write to disk
!  start   = variable to record start time of program
!  finish   = variable to record end time of program
!  count_rate  = variable for clock count rate
!  planfx   = Forward 1d fft plan in x
!  planbx   = Backward 1d fft plan in x
!  planfy   = Forward 1d fft plan in y
!  planby   = Backward 1d fft plan in y
!  dt    = timestep
! .. Arrays ..
!  u    = velocity in x direction
!  uold    = velocity in x direction at previous timestep
!  v    = velocity in y direction
!  vold    = velocity in y direction at previous timestep
!  u_y    = y derivative of velocity in x direction
!  v_x    = x derivative of velocity in y direction
!  omeg    = vorticity in real space
!  omegold   = vorticity in real space at previous
!      iterate
!  omegcheck  = store of vorticity at previous iterate
!  omegoldhat  = 2D Fourier transform of vorticity at previous
!      iterate
!  omegoldhat_x  = x-derivative of vorticity in Fourier space
!      at previous iterate
!  omegold_x  = x-derivative of vorticity in real space 
!      at previous iterate
!  omegoldhat_y  = y-derivative of vorticity in Fourier space 
!      at previous iterate
!  omegold_y  = y-derivative of vorticity in real space
!       at previous iterate
!  nlold   = nonlinear term in real space
!      at previous iterate
!  nloldhat   = nonlinear term in Fourier space
!      at previous iterate
!  omeghat   = 2D Fourier transform of vorticity
!      at next iterate
!  omeghat_x  = x-derivative of vorticity in Fourier space
!      at next timestep
!  omeghat_y  = y-derivative of vorticity in Fourier space
!      at next timestep
!  omeg_x   = x-derivative of vorticity in real space
!      at next timestep
!  omeg_y   = y-derivative of vorticity in real space
!      at next timestep
! .. Vectors ..
!  kx    = fourier frequencies in x direction
!  ky    = fourier frequencies in y direction
!  kxx    = square of fourier frequencies in x direction
!  kyy    = square of fourier frequencies in y direction
!  x    = x locations
!  y    = y locations
!  time    = times at which save data
!  name_config  = array to store filename for data to be saved      
!  fftfx   = array to setup x Fourier transform
!  fftbx   = array to setup y Fourier transform
! REFERENCES
!
! ACKNOWLEDGEMENTS
!
! ACCURACY
!  
! ERROR INDICATORS AND WARNINGS
!
! FURTHER COMMENTS
! This program has not been optimized to use the least amount of memory
! but is intended as an example only for which all states can be saved
!--------------------------------------------------------------------
! External routines required
! 
! External libraries required
! FFTW3  -- Fast Fourier Transform in the West Library
!   (http://www.fftw.org/)         
! declare variables
PROGRAM main
    use nsadaptor_glue
    IMPLICIT NONE  
    INTEGER(kind=4), PARAMETER  ::  Nx=256   
    INTEGER(kind=4), PARAMETER  ::  Ny=256   
    REAL(kind=8), PARAMETER  ::  dt=0.00125     
    REAL(kind=8), PARAMETER &
        ::  pi=3.14159265358979323846264338327950288419716939937510
    REAL(kind=8), PARAMETER  ::  rho=1.0d0   
    REAL(kind=8), PARAMETER  ::  mu=1.0d0  
    REAL(kind=8), PARAMETER  :: tol=0.1d0**10  
    REAL(kind=8)    :: chg  
    INTEGER(kind=4), PARAMETER :: nplots=50
    REAL(kind=8), DIMENSION(:), ALLOCATABLE   :: time
    COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE  ::  kx,kxx    
    COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE  ::  ky,kyy    
    REAL(kind=8), DIMENSION(:), ALLOCATABLE   ::  x     
    REAL(kind=8), DIMENSION(:), ALLOCATABLE   ::  y      
    COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE :: & 
        u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg,&
        omegoldhat, omegoldhat_x, omegold_x,& 
        omegoldhat_y, omegold_y, nlold, nloldhat,&
        omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
        nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y

    ! added for coprocessing
    real(kind=8), dimension(:,:), allocatable :: dump

    REAL(kind=8),DIMENSION(:,:), ALLOCATABLE :: uexact_y,vexact_x,omegexact
    INTEGER(kind=4)        ::  i,j,k,n, allocatestatus, count, iol
    INTEGER(kind=4)         :: start, finish, count_rate
    INTEGER(kind=4), PARAMETER         ::  FFTW_IN_PLACE = 8, FFTW_MEASURE = 0, &
                FFTW_EXHAUSTIVE = 8, FFTW_PATIENT = 32,    &
                                FFTW_ESTIMATE = 64
    INTEGER(kind=4),PARAMETER          ::  FFTW_FORWARD = -1, FFTW_BACKWARD=1 
    COMPLEX(kind=8), DIMENSION(:,:), ALLOCATABLE ::  fftfx,fftbx
    INTEGER(kind=8)        ::  planfxy,planbxy
    CHARACTER*100         ::  name_config
       
    CALL system_clock(start,count_rate)  
    ALLOCATE(time(1:nplots),kx(1:Nx),kxx(1:Nx),ky(1:Ny),kyy(1:Ny),x(1:Nx),y(1:Ny),&
        u(1:Nx,1:Ny),uold(1:Nx,1:Ny),v(1:Nx,1:Ny),vold(1:Nx,1:Ny),u_y(1:Nx,1:Ny),&
        v_x(1:Nx,1:Ny),omegold(1:Nx,1:Ny),omegcheck(1:Nx,1:Ny), omeg(1:Nx,1:Ny),&
        omegoldhat(1:Nx,1:Ny),omegoldhat_x(1:Nx,1:Ny), omegold_x(1:Nx,1:Ny), &
        omegoldhat_y(1:Nx,1:Ny),omegold_y(1:Nx,1:Ny), nlold(1:Nx,1:Ny), nloldhat(1:Nx,1:Ny),&
        omeghat(1:Nx,1:Ny), omeghat_x(1:Nx,1:Ny), omeghat_y(1:Nx,1:Ny), omeg_x(1:Nx,1:Ny),&
        omeg_y(1:Nx,1:Ny), nl(1:Nx,1:Ny), nlhat(1:Nx,1:Ny), psihat(1:Nx,1:Ny), &
        psihat_x(1:Nx,1:Ny), psi_x(1:Nx,1:Ny), psihat_y(1:Nx,1:Ny), psi_y(1:Nx,1:Ny),&
        uexact_y(1:Nx,1:Ny), vexact_x(1:Nx,1:Ny), omegexact(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
        fftbx(1:Nx,1:Ny),dump(1:Nx,1:Ny),stat=AllocateStatus) 
    IF (AllocateStatus .ne. 0) STOP 
    PRINT *,'allocated space'
  
    ! set up ffts
    CALL dfftw_plan_dft_2d_(planfxy,Nx,Ny,fftfx(1:Nx,1:Ny),fftbx(1:Nx,1:Ny),&
        FFTW_FORWARD,FFTW_EXHAUSTIVE)
    CALL dfftw_plan_dft_2d_(planbxy,Nx,Ny,fftbx(1:Nx,1:Ny),fftfx(1:Nx,1:Ny),&
    FFTW_BACKWARD,FFTW_EXHAUSTIVE)
  
     ! setup fourier frequencies in x-direction
    DO i=1,1+Nx/2
        kx(i)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))     
    END DO
    kx(1+Nx/2)=0.0d0
    DO i = 1,Nx/2 -1
        kx(i+1+Nx/2)=-kx(1-i+Nx/2)
    END DO
    DO i=1,Nx
        kxx(i)=kx(i)*kx(i)
    END DO
    DO i=1,Nx
        x(i)=REAL(i-1,kind(0d0))/REAL(Nx,kind(0d0)) 
    END DO
  
    ! setup fourier frequencies in y-direction
    DO j=1,1+Ny/2
        ky(j)= 2.0d0*pi*cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))     
    END DO
    ky(1+Ny/2)=0.0d0
    DO j = 1,Ny/2 -1
        ky(j+1+Ny/2)=-ky(1-j+Ny/2)
    END DO
    DO j=1,Ny
        kyy(j)=ky(j)*ky(j)
    END DO
    DO j=1,Ny
        y(j)=REAL(j-1,kind(0d0))/REAL(Ny,kind(0d0)) 
    END DO
    PRINT *,'Setup grid and fourier frequencies'
 
  
    DO j=1,Ny
        DO i=1,Nx
            u(i,j)=sin(2.0d0*pi*x(i))*cos(2.0d0*pi*y(j))
            v(i,j)=-cos(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
            u_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
            v_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))
            omeg(i,j)=v_x(i,j)-u_y(i,j)
        END DO
    END DO
 
    ! Vorticity to Fourier Space
    CALL dfftw_execute_dft_(planfxy,omeg(1:Nx,1:Ny),omeghat(1:Nx,1:Ny)) 
 
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!Initial nonlinear term !!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
    ! obtain \hat{\omega}_x^{n,k}
    DO j=1,Ny
        omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx)
    END DO
    ! obtain \hat{\omega}_y^{n,k}
    DO i=1,Nx
        omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
    END DO
    ! convert to real space 
    CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny))
    CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
    ! compute nonlinear term in real space
    DO j=1,Ny
        nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
        v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0)) 
    END DO
    CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny)) 
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
    time(1)=0.0d0
    PRINT *,'Got initial data, starting timestepping'

    ! New Coprocessor API:
    call coprocessorinitializewithpython("pipeline.py", 11)
    
    DO n=1,nplots
        chg=1
        ! save old values
        uold(1:Nx,1:Ny)=u(1:Nx,1:Ny)
        vold(1:Nx,1:Ny)=v(1:Nx,1:Ny)
        omegold(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)
        omegcheck(1:Nx,1:Ny)=omeg(1:Nx,1:Ny)  
        omegoldhat(1:Nx,1:Ny)=omeghat(1:Nx,1:Ny)
        nloldhat(1:Nx,1:Ny)=nlhat(1:Nx,1:Ny)
        DO WHILE (chg>tol) 
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            !!!!!!!!!!!!!!nonlinear fixed (n,k+1)!!!!!!!!!!!!!!!!!!!!!!!!!
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
            ! obtain \hat{\omega}_x^{n+1,k}
            DO j=1,Ny
                omeghat_x(1:Nx,j)=omeghat(1:Nx,j)*kx(1:Nx) 
            END DO
            ! obtain \hat{\omega}_y^{n+1,k}
            DO i=1,Nx
                omeghat_y(i,1:Ny)=omeghat(i,1:Ny)*ky(1:Ny)
            END DO
            ! convert back to real space 
            CALL dfftw_execute_dft_(planbxy,omeghat_x(1:Nx,1:Ny),omeg_x(1:Nx,1:Ny)) 
            CALL dfftw_execute_dft_(planbxy,omeghat_y(1:Nx,1:Ny),omeg_y(1:Nx,1:Ny))
            ! calculate nonlinear term in real space
            DO j=1,Ny
                nl(1:Nx,j)=u(1:Nx,j)*omeg_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))+&
                v(1:Nx,j)*omeg_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
            END DO
            ! convert back to fourier
            CALL dfftw_execute_dft_(planfxy,nl(1:Nx,1:Ny),nlhat(1:Nx,1:Ny)) 
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   
            ! obtain \hat{\omega}^{n+1,k+1} with Crank Nicolson timestepping
            DO j=1,Ny
                omeghat(1:Nx,j)=( (1.0d0/dt+0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))&
                *omegoldhat(1:Nx,j) - 0.5d0*(nloldhat(1:Nx,j)+nlhat(1:Nx,j)))/&
                (1.0d0/dt-0.5d0*(mu/rho)*(kxx(1:Nx)+kyy(j)))   
            END DO
   
            ! calculate \hat{\psi}^{n+1,k+1}
            DO j=1,Ny
                psihat(1:Nx,j)=-omeghat(1:Nx,j)/(kxx(1:Nx)+kyy(j))
            END DO
            psihat(1,1)=0.0d0
            psihat(Nx/2+1,Ny/2+1)=0.0d0
            psihat(Nx/2+1,1)=0.0d0
            psihat(1,Ny/2+1)=0.0d0
   
            ! obtain \psi_x^{n+1,k+1} and \psi_y^{n+1,k+1}
            DO j=1,Ny
                psihat_x(1:Nx,j)=psihat(1:Nx,j)*kx(1:Nx)
            END DO
            CALL dfftw_execute_dft_(planbxy,psihat_x(1:Nx,1:Ny),psi_x(1:Nx,1:Ny)) 
            DO i=1,Nx
                psihat_y(i,1:Ny)=psihat(i,1:Ny)*ky(1:Ny)
            END DO
            CALL dfftw_execute_dft_(planbxy,psihat_y(1:Ny,1:Ny),psi_y(1:Ny,1:Ny)) 
            DO j=1,Ny
                psi_x(1:Nx,j)=psi_x(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
                psi_y(1:Nx,j)=psi_y(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
            END DO 
   
            ! obtain \omega^{n+1,k+1}
            CALL dfftw_execute_dft_(planbxy,omeghat(1:Nx,1:Ny),omeg(1:Nx,1:Ny)) 
            DO j=1,Ny
                omeg(1:Nx,j)=omeg(1:Nx,j)/REAL(Nx*Ny,kind(0d0))
            END DO
   
            ! obtain u^{n+1,k+1} and v^{n+1,k+1} using stream function (\psi) in real space
            DO j=1,Ny
                u(1:Nx,j)=psi_y(1:Nx,j)
                v(1:Nx,j)=-psi_x(1:Nx,j)
            END DO 
   
            ! check for convergence 
            chg=maxval(abs(omeg-omegcheck)) 
            ! saves {n+1,k+1} to {n,k} for next iteration
            omegcheck=omeg  
        END DO
        time(n+1)=time(n)+dt
        PRINT *,'TIME ',time(n+1)
        ! call adaptor here
        ! not sure if there's a way to cleanly pass complex to C++ code
        do j = 1, Ny
            do i = 1, Nx
                dump(i,j) = real(omeg(i,j), kind(0d0))
            end do
        end do
        call nsadaptor(Nx, Ny, 1, n, time(n), dump)
    END DO
  
    call coprocessorfinalize()

    DO j=1,Ny
        DO i=1,Nx
            uexact_y(i,j)=-2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
            exp(-8.0d0*mu*(pi**2)*nplots*dt)
            vexact_x(i,j)=2.0d0*pi*sin(2.0d0*pi*x(i))*sin(2.0d0*pi*y(j))*&
            exp(-8.0d0*mu*(pi**2)*nplots*dt)
            omegexact(i,j)=vexact_x(i,j)-uexact_y(i,j)
        END DO
    END DO
  
    name_config = 'omegafinal.datbin' 
    INQUIRE(iolength=iol) omegexact(1,1)
    OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol) 
    count = 1 
    DO j=1,Ny
        DO i=1,Nx
            WRITE(11,rec=count) REAL(omeg(i,j),KIND(0d0))
            count=count+1
        END DO
    END DO
    CLOSE(11)
  
    name_config = 'omegaexactfinal.datbin' 
    OPEN(unit=11,FILE=name_config,form="unformatted", access="direct",recl=iol) 
    count = 1 
    DO j=1,Ny
        DO i=1,Nx
            WRITE(11,rec=count) omegexact(i,j)
            count=count+1
        END DO
    END DO
    CLOSE(11)

    name_config = 'xcoord.dat' 
    OPEN(unit=11,FILE=name_config,status="UNKNOWN")  
    REWIND(11)
    DO i=1,Nx
        WRITE(11,*) x(i)
    END DO
    CLOSE(11)

    name_config = 'ycoord.dat' 
    OPEN(unit=11,FILE=name_config,status="UNKNOWN")  
    REWIND(11)
    DO j=1,Ny
        WRITE(11,*) y(j)
    END DO
    CLOSE(11)
 
    CALL dfftw_destroy_plan_(planfxy)
    CALL dfftw_destroy_plan_(planbxy)
    CALL dfftw_cleanup_()
  
    DEALLOCATE(time,kx,kxx,ky,kyy,x,y,&
        u,uold,v,vold,u_y,v_x,omegold, omegcheck, omeg, &
        omegoldhat, omegoldhat_x, omegold_x,& 
        omegoldhat_y, omegold_y, nlold, nloldhat,&
        omeghat, omeghat_x, omeghat_y, omeg_x, omeg_y,&
        nl, nlhat, psihat, psihat_x, psi_x, psihat_y, psi_y,&
        uexact_y,vexact_x,omegexact, &
        fftfx,fftbx,stat=AllocateStatus) 
    IF (AllocateStatus .ne. 0) STOP 
    PRINT *,'Program execution complete'
END PROGRAM main

 

 

 

 

( A)

Fortran adaptor: NSadaptor.f90 -- No API changes, included for completeness. Code download

! Fortran module for interacting with the ParaView/Catalyst coprocessor
module nsadaptor_glue
    implicit none
    public
contains
    subroutine nsadaptor(nx, ny, nz, step, time, scalar)
        ! nx, ny, nz -- grid dimensions
        ! step       -- current time step
        ! time       -- current time
        ! scalar     -- the data
        integer, intent(in) :: nx, ny, nz, step
        real(kind=8), intent(in) :: time
        real(kind=8), dimension(:,:), intent(in) :: scalar
        integer :: flag
        
        ! check if coprocessing this step
        ! defined in adaptor header in ParaView src
        call requestdatadescription(step, time, flag)
        if (flag .ne. 0) then
            ! coprocessing requested
            ! check if grid exists
            ! also from ParaView adaptor header
            call needtocreategrid(flag)

            if (flag .ne. 0) then
                ! grid needed
                ! defined by application developer
                call createcpimagedata(nx, ny, nz)
            end if

            ! also defined by application dev
            call addfield(scalar, "test")
    
            ! from ParaView adaptor header
            call coprocess()
        end if

        return
    end subroutine nsadaptor
end module nsadaptor_glue

 

 

 

 

( A)

C++ VTK data helper: PointBasedDataSet.cxx -- Changes are a new header and a new namespace. Notice also the use of vtkSmartPointer instead of raw pointer handling. Code download

// VTK Data set creator for ParaView/Catalyst Coprocessing.
// Interfaces with a Fortran adaptor to get data from a Fortran
// simulation.

// Fortran specific header for API called from Fortran glue module
#include "vtkCPPythonAdaptorAPI.h"

// Adaptor specific headers, will need to change if simulation data structure
// changes. This should available from the ParaView install include directory
// if you chose to install development files during configuration.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"

#include <string>
#include <iostream>
// Names are mangled with trailing underscore for linker visibility, along with
// using extern "C"

// Creates the vtkDataSet container for the CoProcessor. The simulation data
// must be compatible.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz)
{
  if (!vtkCPPythonAdaptorAPI::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access coprocessor data.");
    return;
  }

  vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
	
  // Note: expects Fortran indexing
  img->SetExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);

  vtkCPPythonAdaptorAPI::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
}

// Add data to the container.
extern "C" void addfield_(double* simdata, char* name)
{
  vtkSmartPointer<vtkCPInputDataDescription> idd = 
    vtkCPPythonAdaptorAPI::GetCoProcessorData()->GetInputDescriptionByName("input");
  
  vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!img) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }

  std::string fieldPythonName = name;
	
  if (idd->IsFieldNeeded(name)) {
    vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
    field->SetName(name);
    field->SetArray(simdata, img->GetNumberOfPoints(), 1);
    img->GetPointData()->AddArray(field);
  }
}

Using ParaView CoProcessing with MPI and Domain Decomposition

edit

The following notes developed while working on NS3D-MPI, NavierStokes3DfftIMR.f90, KgSemiImp3D.f90, and NSLsplitting.f90 on Nautilus at NICS.

Nodal interpretation

edit

When working with single CPU node, non-partioned data, the simulation arrays can be passed directly to ParaView and added to the pipeline using the vtkDataSet::GetPointData()->AddArray() method. The vtkImageData’s points then correspond exactly to the simulation’s mesh nodes. This vtkImageData will have an implied cellular structure, with each cell corner being one of the simulation nodes.

Without special altering for MPI this code will run, however ParaView won’t be able to create the cells whose corners are on differet ranks. The .pvti will give an error when loaded that the extents are incorrect. The individual .vtis can be loaded to see that this is indeed the case and that there will be cellular gaps between the pencils.

E.g., say the array is dimension(20,10,10) and has been decomposed into 2 pencils, (1:10,:,:) and (11:20,:,:). ParaView considers the data for (10:11,:,:) to be missing and a gap will be present.

Working with point halos

edit

Halos provide one way to fill in the gaps. The 2decomp&fft halo api creates arrays containing the pencil data surrounded by halo elements. E.g., say a pencil is (10:20,10:20,:), the halo array is then (9:21,9:21,:). Using the halos raises some issues:

  • Halo elements can extend outside of the computational domain and these will contain garbage.
  • Not every pencil needs to send the same halo elements to the coprocessor. This requires additional decision logic in the program.
  • From the C++ point of view, a halo array section will not be densely packed, and therefore contain garbage.

Code changes in simulation code

edit

 

 

 

 

( D)
  • Added halo arrays.
  • Added 1D arrays for passing packed arrays to C++.
  • Added branching for which halo elements to pass based on pencil extents.
    1. If pencil furthest from index origin, send just the pencil, no halo elements.
    2. Else if pencil is against either furthest extent, but not both, send the halo elements furthest in the other extent.
    3. Else send the halo elements furthest in both extents.
  • Added reshaping of the halo sections. Arrays are passed by reference from F90 to C++. C++ doesn’t understand F90 array sections and will just read incrementally from the reference (this is an issue with any array sectioning, not just halos). One way around this is to reshape the section as a 1D array, this will make the desired data contiguous in memory.

Code download

PROGRAM main    
  !-----------------------------------------------------------------------------------
  !
  !
  ! PURPOSE
  !
  ! This program numerically solves the 3D incompressible Navier-Stokes
  ! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
  ! Implicit Midpoint rule timestepping. The numerical solution is compared to
  ! an exact solution reported by Shapiro 
  !
  ! Analytical Solution:
  !       u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
  !       v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
  !       w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
  !
  ! .. Parameters ..
  !  Nx                           = number of modes in x - power of 2 for FFT
  !  Ny                           = number of modes in y - power of 2 for FFT
  !  Nz                           = number of modes in z - power of 2 for FFT
  !  Nt                           = number of timesteps to take
  !  Tmax                         = maximum simulation time
  !  FFTW_IN_PLACE        = value for FFTW input 
  !  FFTW_MEASURE         = value for FFTW input
  !  FFTW_EXHAUSTIVE      = value for FFTW input
  !  FFTW_PATIENT         = value for FFTW input    
  !  FFTW_ESTIMATE        = value for FFTW input
  !  FFTW_FORWARD     = value for FFTW input
  !  FFTW_BACKWARD        = value for FFTW input  
  !  pi = 3.14159265358979323846264338327950288419716939937510d0
  !  Re                           = Reynolds number
  ! .. Scalars ..
  !  i                            = loop counter in x direction
  !  j                            = loop counter in y direction
  !  k                            = loop counter in z direction
  !  n                            = loop counter for timesteps direction  
  !  allocatestatus       = error indicator during allocation
  !  count                        = keep track of information written to disk
  !  iol                          = size of array to write to disk
  !  start                        = variable to record start time of program
  !  finish                       = variable to record end time of program
  !  count_rate           = variable for clock count rate
  !  planfxyz                     = Forward 3d fft plan 
  !  planbxyz                     = Backward 3d fft plan
  !  dt                           = timestep
  ! .. Arrays ..
  !  u                            = velocity in x direction
  !  v                            = velocity in y direction
  !  w                            = velocity in z direction
  !  uold                         = velocity in x direction at previous timestep
  !  vold                         = velocity in y direction at previous timestep
  !  wold                         = velocity in z direction at previous timestep
  !  ux                           = x derivative of velocity in x direction
  !  uy                           = y derivative of velocity in x direction
  !  uz                           = z derivative of velocity in x direction
  !  vx                           = x derivative of velocity in y direction
  !  vy                           = y derivative of velocity in y direction
  !  vz                           = z derivative of velocity in y direction
  !  wx                           = x derivative of velocity in z direction
  !  wy                           = y derivative of velocity in z direction
  !  wz                           = z derivative of velocity in z direction
  !  uxold                        = x derivative of velocity in x direction
  !  uyold                        = y derivative of velocity in x direction
  !  uzold                        = z derivative of velocity in x direction
  !  vxold                        = x derivative of velocity in y direction
  !  vyold                        = y derivative of velocity in y direction
  !  vzold                        = z derivative of velocity in y direction
  !  wxold                        = x derivative of velocity in z direction
  !  wyold                        = y derivative of velocity in z direction
  !  wzold                        = z derivative of velocity in z direction
  !  utemp                        = temporary storage of u to check convergence
  !  vtemp                        = temporary storage of u to check convergence
  !  wtemp                        = temporary storage of u to check convergence
  !  temp_r                       = temporary storage for untransformed variables
  !  uhat                         = Fourier transform of u
  !  vhat                         = Fourier transform of v
  !  what                         = Fourier transform of w
  !  rhsuhatfix           = Fourier transform of righthand side for u for timestepping
  !  rhsvhatfix           = Fourier transform of righthand side for v for timestepping
  !  rhswhatfix           = Fourier transform of righthand side for w for timestepping
  !  nonlinuhat           = Fourier transform of nonlinear term for u
  !  nonlinvhat           = Fourier transform of nonlinear term for u
  !  nonlinwhat           = Fourier transform of nonlinear term for u
  !  phat                         = Fourier transform of nonlinear term for pressure, p
  !  temp_c                       = temporary storage for Fourier transforms
  !  realtemp                     = Real storage
  !
  ! .. Vectors ..
  !  kx                           = fourier frequencies in x direction
  !  ky                           = fourier frequencies in y direction
  !  kz                           = fourier frequencies in z direction
  !  x                            = x locations
  !  y                            = y locations
  !  z                            = y locations
  !  time                         = times at which save data
  !  name_config          = array to store filename for data to be saved
  !               
  ! REFERENCES
  !
  ! A. Shapiro " The use of an exact solution of the Navier-Stokes equations 
  ! in a validation test of a three-dimensional nonhydrostatic numerical model"
  ! Monthly Weather Review vol. 121, 2420-2425, (1993).
  !
  ! ACKNOWLEDGEMENTS
  !
  ! ACCURACY
  !               
  ! ERROR INDICATORS AND WARNINGS
  !
  ! FURTHER COMMENTS
  !
  ! This program has not been optimized to use the least amount of memory
  ! but is intended as an example only for which all states can be saved
  !
  !--------------------------------------------------------------------------------
  ! External routines required
  ! 
  ! External libraries required
  ! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
  !                       (http://2decomp.org/)


  !---------------------------------------------------------------------------------

  USE decomp_2d
  USE decomp_2d_fft
  USE decomp_2d_io
  USE MPI
  !coprocessing:
  use NSadaptor_module 
  IMPLICIT NONE   
  ! declare variables
  INTEGER(kind=4), PARAMETER              :: Nx=256
  INTEGER(kind=4), PARAMETER              :: Ny=256
  INTEGER(kind=4), PARAMETER              :: Nz=256
  INTEGER(kind=4), PARAMETER              :: Lx=1
  INTEGER(kind=4), PARAMETER              :: Ly=1
  INTEGER(kind=4), PARAMETER              :: Lz=1
  INTEGER(kind=4), PARAMETER              :: Nt=20
  REAL(kind=8), PARAMETER                 :: dt=0.05d0/Nt
  REAL(kind=8), PARAMETER                 :: Re=1.0d0     
  REAL(kind=8), PARAMETER                 :: tol=0.1d0**10
  REAL(kind=8), PARAMETER                 :: theta=0.0d0

  REAL(kind=8), PARAMETER &
       ::  pi=3.14159265358979323846264338327950288419716939937510d0
  REAL(kind=8), PARAMETER         ::      ReInv=1.0d0/REAL(Re,kind(0d0))
  REAL(kind=8), PARAMETER         ::  dtInv=1.0d0/REAL(dt,kind(0d0)) 
  REAL(kind=8)                            :: scalemodes,chg,factor
  REAL(kind=8), DIMENSION(:), ALLOCATABLE                 :: x, y, z, time,mychg,allchg
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  :: u, v, w,&
       ux, uy, uz,&
       vx, vy, vz,&
       wx, wy, wz,&
       uold, uxold, uyold, uzold,&
       vold, vxold, vyold, vzold,&
       wold, wxold, wyold, wzold,&
       utemp, vtemp, wtemp, temp_r

  COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE      ::      kx, ky, kz                                              
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  ::      uhat, vhat, what,&
       rhsuhatfix, rhsvhatfix,&
       rhswhatfix, nonlinuhat,&
       nonlinvhat, nonlinwhat,&
       phat,temp_c
  !coprocessing: added x,y,z component arrays also halos
  !coprocessing: halos get allocated inside call to update_halo
  REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE     ::  realtemp, realtempx, &
       realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo
  !coprocessing: added 1D array for passing to C++
  real(kind=8), dimension(:), allocatable :: xpass2c, ypass2c, zpass2c
  ! MPI and 2DECOMP variables
  TYPE(DECOMP_INFO)                               ::  decomp
  INTEGER(kind=MPI_OFFSET_KIND)                   ::  filesize, disp
  INTEGER(kind=4)                                 ::  p_row=0, p_col=0, numprocs, myid, ierr      

  ! variables used for saving data and timing
  INTEGER(kind=4)                                 :: count, iol 
  INTEGER(kind=4)                                 :: i,j,k,n,t,allocatestatus
  INTEGER(kind=4)                                 :: ind, numberfile
  CHARACTER*100                                   :: name_config
  INTEGER(kind=4)                                 :: start, finish, count_rate 

  ! initialisation of 2DECOMP&FFT
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr) 
  ! do automatic domain decomposition
  CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
  ! get information about domain decomposition choosen
  CALL decomp_info_init(Nx,Ny,Nz,decomp)
  ! initialise FFT library
  CALL decomp_2d_fft_init
  IF (myid.eq.0) THEN
     PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
     PRINT *,'dt:',dt
  END IF

  ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
       u(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       v(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       w(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       ux(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       utemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       temp_r(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       kx(1:Nx),ky(1:Ny),kz(1:Nz),&
       uhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       vhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       what(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsuhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsvhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhswhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinuhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinvhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinwhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       phat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       temp_c(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       realtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       realtempx(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempy(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempz(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       stat=AllocateStatus)      
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'allocated space'
  END IF

  ! setup fourier frequencies in x-direction
  DO i=1,Nx/2+1
     kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx                        
  END DO
  kx(1+Nx/2)=0.0d0
  DO i = 1,Nx/2 -1
     kx(i+1+Nx/2)=-kx(1-i+Nx/2)
  END DO
  ind=1
  DO i=-Nx/2,Nx/2-1
     x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in y-direction
  DO j=1,Ny/2+1
     ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly                        
  END DO
  ky(1+Ny/2)=0.0d0
  DO j = 1,Ny/2 -1
     ky(j+1+Ny/2)=-ky(1-j+Ny/2)
  END DO
  ind=1
  DO j=-Ny/2,Ny/2-1
     y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in z-direction
  DO k=1,Nz/2+1
     kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz                        
  END DO
  kz(1+Nz/2)=0.0d0
  DO k = 1,Nz/2 -1
     kz(k+1+Nz/2)=-kz(1-k+Nz/2)
  END DO
  ind=1
  DO k=-Nz/2,Nz/2-1
     z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
     ind=ind+1
  END DO
  scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
  IF (myid.eq.0) THEN
     PRINT *,'Setup grid and fourier frequencies'
  END IF

  !initial conditions for Taylor-Green vortex
  !       factor=2.0d0/sqrt(3.0d0)
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
  !       END DO; END DO; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
  !       END DO ; END DO ; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
  !       END DO ; END DO ; END DO

  time(1)=0.0d0
  factor=sqrt(3.0d0)
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     v(i,j,k)=0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO

  CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)


  ! derivative of u with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

  ! derivative of v with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)         

  ! derivative of w with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)         

  ! save initial data
  n = 0
  !coprocessing: removed savedata calls from coprocessing version
  !coprocessing: could have this after the coprocessor initialize and
  !coprocessing: followed by a call to the coprocessor to write out the initial data
  !coprocessing: image. Otherwise, these are just wasted calls at this spot.
  call coprocessorinitialize("pipeline.py", 11)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
  END DO; END DO ; END DO
  call update_halo(realtempx, tempxhalo, level=1)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
  END DO; END DO ; END DO
  call update_halo(realtempy, tempyhalo, level=1)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
  END DO; END DO ; END DO
  call update_halo(realtempz, tempzhalo, level=1)

  !coprocessing: Four cases of sending halo data to the coprocessor.
  if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
     !coprocessing: case 1: pencil furthest from index origin, send just the pencil
     allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
     allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
     allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))     
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c, ypass2c, zpass2c)
  else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
     !coprocessing: case 2: pencil furthest in z, send just upper y halo
     allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
     allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
     allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c, ypass2c, zpass2c)
  else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
     !coprocessing: case 3: pencil furthest in y, send just upper z halo
     allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
     allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
     allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c, ypass2c, zpass2c)
  else
     !coprocessing: case 4: send both upper y and upper z halo
     allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
     allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
     allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
     xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
     ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
     zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
          xpass2c, ypass2c, zpass2c)
     deallocate(xpass2c,ypass2c,zpass2c)
  end if

  !start timer
  CALL system_clock(start,count_rate)
  ! coprocessing: Simulation loop starts here
  DO n=1,Nt
     !fixed point
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        uold(i,j,k)=u(i,j,k)
        uxold(i,j,k)=ux(i,j,k)
        uyold(i,j,k)=uy(i,j,k)
        uzold(i,j,k)=uz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        vold(i,j,k)=v(i,j,k)
        vxold(i,j,k)=vx(i,j,k)
        vyold(i,j,k)=vy(i,j,k)
        vzold(i,j,k)=vz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        wold(i,j,k)=w(i,j,k)
        wxold(i,j,k)=wx(i,j,k)
        wyold(i,j,k)=wy(i,j,k)
        wzold(i,j,k)=wz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k) 
     END DO; END DO ; END DO

     chg=1
     DO WHILE (chg .gt. tol)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
                +ky(j)*nonlinvhat(i,j,k)&
                +kz(k)*nonlinwhat(i,j,k))&
                /(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
        END DO; END DO ; END DO

        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO

        ! derivative of u with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of v with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of w with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD) 

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           utemp(i,j,k)=u(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           vtemp(i,j,k)=v(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           wtemp(i,j,k)=w(i,j,k)
        END DO; END DO ; END DO

        CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)    

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           u(i,j,k)=u(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           v(i,j,k)=v(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           w(i,j,k)=w(i,j,k)*scalemodes
        END DO; END DO ; END DO

        mychg(1) =maxval(abs(utemp-u))
        mychg(2) =maxval(abs(vtemp-v))
        mychg(3) =maxval(abs(wtemp-w))
        CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
        chg=allchg(1)+allchg(2)+allchg(3)
        IF (myid.eq.0) THEN
           PRINT *,'chg:',chg
        END IF
     END DO
     time(n+1)=n*dt

     !goto 5100
     IF (myid.eq.0) THEN     
        PRINT *,'time',n*dt
     END IF

     !coprocessing: populating the arrays sent to the coprocessor
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
     END DO; END DO ; END DO
     call update_halo(realtempx, tempxhalo, level=1)

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
     END DO; END DO ; END DO
     call update_halo(realtempy, tempyhalo, level=1)

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
     END DO; END DO ; END DO
     call update_halo(realtempz, tempzhalo, level=1)

     !coprocessing: same cases as for initial conditions. 
     if ((ny_global == decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
        ! case 1: send just the pencil
        allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
        allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
        allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*decomp%xsz(3)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(ypass2c)/))     
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c, ypass2c, zpass2c)
     else if ((ny_global > decomp%xen(2)) .and. (nz_global == decomp%xen(3))) then
        ! case 2: send just upper y halo
        allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
        allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
        allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*decomp%xsz(3)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(ypass2c)/))
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3), n, n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c, ypass2c, zpass2c)
     else if ((ny_global == decomp%xen(2)) .and. (nz_global > decomp%xen(3))) then
        ! case 3: send just upper z halo
        allocate(xpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
        allocate(ypass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
        allocate(zpass2c(decomp%xsz(1)*decomp%xsz(2)*(decomp%xsz(3)+1)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(ypass2c)/))
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2),1:decomp%xsz(3)+1),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3)+1, n, n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c, ypass2c, zpass2c)
     else
        ! send both upper y and upper z halo
        allocate(xpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
        allocate(ypass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
        allocate(zpass2c(decomp%xsz(1)*(decomp%xsz(2)+1)*(decomp%xsz(3)+1)))
        xpass2c = reshape(tempxhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(xpass2c)/))
        ypass2c = reshape(tempyhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(ypass2c)/))
        zpass2c = reshape(tempzhalo(:,1:decomp%xsz(2)+1,1:decomp%xsz(3)+1),(/size(zpass2c)/))
        call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2)+1,decomp%xst(3),decomp%xen(3)+1,n,n*dt, &
             xpass2c, ypass2c, zpass2c)
        deallocate(xpass2c,ypass2c,zpass2c)
     end if

  END DO
  !coprocessing:
  call coprocessorfinalize()

  CALL system_clock(finish,count_rate)

  IF (myid.eq.0) then
     PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
  END IF

  IF (myid.eq.0) THEN
     name_config = './data/tdata.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO n=1,1+Nt
        WRITE(11,*) time(n)
     END DO 
     CLOSE(11)

     name_config = './data/xcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO i=1,Nx
        WRITE(11,*) x(i)
     END DO
     CLOSE(11)       

     name_config = './data/ycoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO j=1,Ny
        WRITE(11,*) y(j)
     END DO
     CLOSE(11)

     name_config = './data/zcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO k=1,Nz
        WRITE(11,*) z(k)
     END DO
     CLOSE(11)
     PRINT *,'Saved data'
  END IF

  ! Calculate error in final numerical solution
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     utemp(i,j,k)=u(i,j,k) -&
          (-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     vtemp(i,j,k)=v(i,j,k) -&
          (0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     wtemp(i,j,k)=w(i,j,k)-&
          (cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  mychg(1) = maxval(abs(utemp))
  mychg(2) = maxval(abs(vtemp))
  mychg(3) = maxval(abs(wtemp))
  CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
  chg=allchg(1)+allchg(2)+allchg(3)
  IF (myid.eq.0) THEN
     PRINT*,'The error at the final timestep is',chg
  END IF

  ! clean up 
  CALL decomp_2d_fft_finalize
  CALL decomp_2d_finalize

  DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
       vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
       temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
       rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
       realtemp,stat=AllocateStatus)         
  !coprocessing: deallocate Coprocessing specific arrays
  deallocate(realtempx, realtempy, realtempz, tempxhalo, tempyhalo, tempzhalo)
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'Program execution complete'
  END IF
  CALL MPI_FINALIZE(ierr)         

END PROGRAM main

Code changes in f90 adaptor glue code

edit

 

 

 

 

( E)
  • Changed the adaptor to accept the subextents of the current pencil.

Code download

! fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaNSadaptor.f
! -- Changed for SESE

! subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.      
module NSadaptor_module 
  use iso_c_binding
  implicit none
  public
  interface NSadaptor
    ! Originally also had a version that accepted 3D arrays, but in all other
    ! respects was identical. Decided this could lead to too much confusion.
    module procedure NSadaptor1D 
  end interface NSadaptor
contains
  subroutine NSadaptor1D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
                                     step, time, omegax, omegay, omegaz)
    ! nx, ny, nz     -- grid dimensions or entire mesh
    !                   used for setting whole extent
    ! xst, xen, etc. -- extents of current subdomain pencil
    ! step           -- current simulation time step
    ! time           -- current simulation time
    ! omega*          -- scalar array for the current time step
    ! flag           -- receives status from API calls
    integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
    real(kind=8), intent(in) :: time
    real(kind=8), dimension(:), intent (in) :: omegax, omegay, omegaz
    integer :: flag

    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag /= 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag /= 0) then
          ! grid needed
          ! defined in VTKPointBasedDataSet.cxx
          ! takes the size of the entire grid, and the extents of the
          ! sub grid.
          call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
       end if
          
       ! defined in VTKPointBasedDataSet.cxx
       ! remember to null-terminate strings for C/C++
       call addfield(omegax, "realtempx"//char(0));
       call addfield(omegay, "realtempy"//char(0));
       call addfield(omegaz, "realtempz"//char(0));       

       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine NSadaptor1D 
end module NSadaptor_module

Code changes in the C++ wrapping code

edit

 

 

 

 

( F)
  • Specified the partitioned data extents relative to the entire data.
  • Specified the relative spacing of the cells.

Code download

// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h" 

// CoProcessor specific headers
// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"

#include <string>

// These will be called from the Fortran "glue" code"
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
	int* yst, int* yen, int* zst, int* zen)
{
  if (!ParaViewCoProcessing::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access CoProcessorData.");
    return;
  }

  // The simulation grid is a 3-dimensional topologically and geometrically 
  // regular grid. In VTK/ParaView, this is considered an image data set.
  vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
  img->SetExtent(*xst - 1, *xen - 1, *yst - 1, *yen - 1, *zst - 1, *zen - 1); 
  img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz); 
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx - 1, 0, *ny - 1, 0, *nz - 1);
}

// Add field(s) to the data container.
// Separate from above because this will be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Expects "name" to point to null-terminated character array, 
// be sure to null-terminate in caller.
extern "C" void addfield_(double* a, char* name)
{

  vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
  vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!img) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }


  if (idd->IsFieldNeeded(name)) {
    vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
    field->SetName(name);
    field->SetArray(a, img->GetNumberOfPoints(), 1); 
    img->GetPointData()->AddArray(field);
  }
}

Pros

edit
  • Some VTK/ParaView filters work with ghost (halo) cells, code could be extended to handle those.

Cons

edit
  • Much code added to the simulation code itself.
  • Additional memory required for the halos and packed arrays.
  • Additional overhead of reshaping arrays for C++.
  • A different decomposition library might require changing the halo logic.
  • Would not work as is with a 3D decomposition library.

Reinterpreting simulation nodes as visualization cells

edit

Another way to look at the data, is that each node of the simulation is a cell in the visualization domain. Then when the pencils are sent to ParaView, there aren’t any gaps of missing cells.

Code changes in simulation code

edit

 

 

 

 

( G)
  • None, beyond the regular CoProcessing calls.

Code download

PROGRAM main    
  !-----------------------------------------------------------------------------------
  !
  !
  ! PURPOSE
  !
  ! This program numerically solves the 3D incompressible Navier-Stokes
  ! on a Cubic Domain [0,2pi]x[0,2pi]x[0,2pi] using pseudo-spectral methods and
  ! Implicit Midpoint rule timestepping. The numerical solution is compared to
  ! an exact solution reported by Shapiro 
  !
  ! Analytical Solution:
  !       u(x,y,z,t)=-0.25*(cos(x)sin(y)sin(z)+sin(x)cos(y)cos(z))exp(-t/Re)
  !       v(x,y,z,t)= 0.25*(sin(x)cos(y)sin(z)-cos(x)sin(y)cos(z))exp(-t/Re)
  !       w(x,y,z,t)= 0.5*cos(x)cos(y)sin(z)exp(-t/Re)
  !
  ! .. Parameters ..
  !  Nx                           = number of modes in x - power of 2 for FFT
  !  Ny                           = number of modes in y - power of 2 for FFT
  !  Nz                           = number of modes in z - power of 2 for FFT
  !  Nt                           = number of timesteps to take
  !  Tmax                         = maximum simulation time
  !  FFTW_IN_PLACE        = value for FFTW input 
  !  FFTW_MEASURE         = value for FFTW input
  !  FFTW_EXHAUSTIVE      = value for FFTW input
  !  FFTW_PATIENT         = value for FFTW input    
  !  FFTW_ESTIMATE        = value for FFTW input
  !  FFTW_FORWARD     = value for FFTW input
  !  FFTW_BACKWARD        = value for FFTW input  
  !  pi = 3.14159265358979323846264338327950288419716939937510d0
  !  Re                           = Reynolds number
  ! .. Scalars ..
  !  i                            = loop counter in x direction
  !  j                            = loop counter in y direction
  !  k                            = loop counter in z direction
  !  n                            = loop counter for timesteps direction  
  !  allocatestatus       = error indicator during allocation
  !  count                        = keep track of information written to disk
  !  iol                          = size of array to write to disk
  !  start                        = variable to record start time of program
  !  finish                       = variable to record end time of program
  !  count_rate           = variable for clock count rate
  !  planfxyz                     = Forward 3d fft plan 
  !  planbxyz                     = Backward 3d fft plan
  !  dt                           = timestep
  ! .. Arrays ..
  !  u                            = velocity in x direction
  !  v                            = velocity in y direction
  !  w                            = velocity in z direction
  !  uold                         = velocity in x direction at previous timestep
  !  vold                         = velocity in y direction at previous timestep
  !  wold                         = velocity in z direction at previous timestep
  !  ux                           = x derivative of velocity in x direction
  !  uy                           = y derivative of velocity in x direction
  !  uz                           = z derivative of velocity in x direction
  !  vx                           = x derivative of velocity in y direction
  !  vy                           = y derivative of velocity in y direction
  !  vz                           = z derivative of velocity in y direction
  !  wx                           = x derivative of velocity in z direction
  !  wy                           = y derivative of velocity in z direction
  !  wz                           = z derivative of velocity in z direction
  !  uxold                        = x derivative of velocity in x direction
  !  uyold                        = y derivative of velocity in x direction
  !  uzold                        = z derivative of velocity in x direction
  !  vxold                        = x derivative of velocity in y direction
  !  vyold                        = y derivative of velocity in y direction
  !  vzold                        = z derivative of velocity in y direction
  !  wxold                        = x derivative of velocity in z direction
  !  wyold                        = y derivative of velocity in z direction
  !  wzold                        = z derivative of velocity in z direction
  !  utemp                        = temporary storage of u to check convergence
  !  vtemp                        = temporary storage of u to check convergence
  !  wtemp                        = temporary storage of u to check convergence
  !  temp_r                       = temporary storage for untransformed variables
  !  uhat                         = Fourier transform of u
  !  vhat                         = Fourier transform of v
  !  what                         = Fourier transform of w
  !  rhsuhatfix           = Fourier transform of righthand side for u for timestepping
  !  rhsvhatfix           = Fourier transform of righthand side for v for timestepping
  !  rhswhatfix           = Fourier transform of righthand side for w for timestepping
  !  nonlinuhat           = Fourier transform of nonlinear term for u
  !  nonlinvhat           = Fourier transform of nonlinear term for u
  !  nonlinwhat           = Fourier transform of nonlinear term for u
  !  phat                         = Fourier transform of nonlinear term for pressure, p
  !  temp_c                       = temporary storage for Fourier transforms
  !  realtemp                     = Real storage
  !
  ! .. Vectors ..
  !  kx                           = fourier frequencies in x direction
  !  ky                           = fourier frequencies in y direction
  !  kz                           = fourier frequencies in z direction
  !  x                            = x locations
  !  y                            = y locations
  !  z                            = y locations
  !  time                         = times at which save data
  !  name_config          = array to store filename for data to be saved
  !               
  ! REFERENCES
  !
  ! A. Shapiro " The use of an exact solution of the Navier-Stokes equations 
  ! in a validation test of a three-dimensional nonhydrostatic numerical model"
  ! Monthly Weather Review vol. 121, 2420-2425, (1993).
  !
  ! ACKNOWLEDGEMENTS
  !
  ! ACCURACY
  !               
  ! ERROR INDICATORS AND WARNINGS
  !
  ! FURTHER COMMENTS
  !
  ! This program has not been optimized to use the least amount of memory
  ! but is intended as an example only for which all states can be saved
  !
  !--------------------------------------------------------------------------------
  ! External routines required
  ! 
  ! External libraries required
  ! 2DECOMP&FFT -- Fast Fourier Transform in the West Library
  !                       (http://2decomp.org/)


  !---------------------------------------------------------------------------------

  USE decomp_2d
  USE decomp_2d_fft
  USE decomp_2d_io
  USE MPI
  !coprocessing:
  use NSadaptor_module 
  IMPLICIT NONE   
  ! declare variables
  INTEGER(kind=4), PARAMETER              :: Nx=256
  INTEGER(kind=4), PARAMETER              :: Ny=256
  INTEGER(kind=4), PARAMETER              :: Nz=256
  INTEGER(kind=4), PARAMETER              :: Lx=1
  INTEGER(kind=4), PARAMETER              :: Ly=1
  INTEGER(kind=4), PARAMETER              :: Lz=1
  INTEGER(kind=4), PARAMETER              :: Nt=20
  REAL(kind=8), PARAMETER                 :: dt=0.05d0/Nt
  REAL(kind=8), PARAMETER                 :: Re=1.0d0     
  REAL(kind=8), PARAMETER                 :: tol=0.1d0**10
  REAL(kind=8), PARAMETER                 :: theta=0.0d0

  REAL(kind=8), PARAMETER &
       ::  pi=3.14159265358979323846264338327950288419716939937510d0
  REAL(kind=8), PARAMETER         ::      ReInv=1.0d0/REAL(Re,kind(0d0))
  REAL(kind=8), PARAMETER         ::  dtInv=1.0d0/REAL(dt,kind(0d0)) 
  REAL(kind=8)                            :: scalemodes,chg,factor
  REAL(kind=8), DIMENSION(:), ALLOCATABLE                 :: x, y, z, time,mychg,allchg
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  :: u, v, w,&
       ux, uy, uz,&
       vx, vy, vz,&
       wx, wy, wz,&
       uold, uxold, uyold, uzold,&
       vold, vxold, vyold, vzold,&
       wold, wxold, wyold, wzold,&
       utemp, vtemp, wtemp, temp_r

  COMPLEX(kind=8), DIMENSION(:), ALLOCATABLE      ::      kx, ky, kz                                              
  COMPLEX(kind=8), DIMENSION(:,:,:), ALLOCATABLE  ::      uhat, vhat, what,&
       rhsuhatfix, rhsvhatfix,&
       rhswhatfix, nonlinuhat,&
       nonlinvhat, nonlinwhat,&
       phat,temp_c
  !coprocessing: added x,y,z component arrays 
  REAL(kind=8), DIMENSION(:,:,:), ALLOCATABLE     ::  realtemp, realtempx, &
       realtempy, realtempz
  ! MPI and 2DECOMP variables
  TYPE(DECOMP_INFO)                               ::  decomp
  INTEGER(kind=MPI_OFFSET_KIND)                   ::  filesize, disp
  INTEGER(kind=4)                                 ::  p_row=0, p_col=0, numprocs, myid, ierr      

  ! variables used for saving data and timing
  INTEGER(kind=4)                                 :: count, iol 
  INTEGER(kind=4)                                 :: i,j,k,n,t,allocatestatus
  INTEGER(kind=4)                                 :: ind, numberfile
  CHARACTER*100                                   :: name_config
  INTEGER(kind=4)                                 :: start, finish, count_rate 

  ! initialisation of 2DECOMP&FFT
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr) 
  ! do automatic domain decomposition
  CALL decomp_2d_init(Nx,Ny,Nz,p_row,p_col)
  ! get information about domain decomposition choosen
  CALL decomp_info_init(Nx,Ny,Nz,decomp)
  ! initialise FFT library
  CALL decomp_2d_fft_init
  IF (myid.eq.0) THEN
     PRINT *,'Grid:',Nx,'X',Ny,'Y',Nz,'Z'
     PRINT *,'dt:',dt
  END IF

  ALLOCATE(x(1:Nx),y(1:Ny),z(1:Nz),time(1:Nt+1),mychg(1:3),allchg(1:3),&
       u(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       v(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       w(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       ux(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wx(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wy(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wz(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       uzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wxold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wyold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wzold(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       utemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       vtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       wtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       temp_r(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),&
       kx(1:Nx),ky(1:Ny),kz(1:Nz),&
       uhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       vhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       what(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsuhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhsvhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       rhswhatfix(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinuhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinvhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       nonlinwhat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       phat(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       temp_c(decomp%zst(1):decomp%zen(1),&
       decomp%zst(2):decomp%zen(2),&
       decomp%zst(3):decomp%zen(3)),&
       realtemp(decomp%xst(1):decomp%xen(1),&
       decomp%xst(2):decomp%xen(2),&
       decomp%xst(3):decomp%xen(3)),& 
       realtempx(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempy(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       realtempz(decomp%xst(1):decomp%xen(1), &
       decomp%xst(2):decomp%xen(2), &
       decomp%xst(3):decomp%xen(3)),&
       stat=AllocateStatus)      
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'allocated space'
  END IF

  ! setup fourier frequencies in x-direction
  DO i=1,Nx/2+1
     kx(i)= cmplx(0.0d0,1.0d0)*REAL(i-1,kind(0d0))/Lx                        
  END DO
  kx(1+Nx/2)=0.0d0
  DO i = 1,Nx/2 -1
     kx(i+1+Nx/2)=-kx(1-i+Nx/2)
  END DO
  ind=1
  DO i=-Nx/2,Nx/2-1
     x(ind)=2.0d0*pi*REAL(i,kind(0d0))*Lx/REAL(Nx,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in y-direction
  DO j=1,Ny/2+1
     ky(j)= cmplx(0.0d0,1.0d0)*REAL(j-1,kind(0d0))/Ly                        
  END DO
  ky(1+Ny/2)=0.0d0
  DO j = 1,Ny/2 -1
     ky(j+1+Ny/2)=-ky(1-j+Ny/2)
  END DO
  ind=1
  DO j=-Ny/2,Ny/2-1
     y(ind)=2.0d0*pi*REAL(j,kind(0d0))*Ly/REAL(Ny,kind(0d0))
     ind=ind+1
  END DO
  ! setup fourier frequencies in z-direction
  DO k=1,Nz/2+1
     kz(k)= cmplx(0.0d0,1.0d0)*REAL(k-1,kind(0d0))/Lz                        
  END DO
  kz(1+Nz/2)=0.0d0
  DO k = 1,Nz/2 -1
     kz(k+1+Nz/2)=-kz(1-k+Nz/2)
  END DO
  ind=1
  DO k=-Nz/2,Nz/2-1
     z(ind)=2.0d0*pi*REAL(k,kind(0d0))*Lz/REAL(Nz,kind(0d0))
     ind=ind+1
  END DO
  scalemodes=1.0d0/REAL(Nx*Ny*Nz,kind(0d0))
  IF (myid.eq.0) THEN
     PRINT *,'Setup grid and fourier frequencies'
  END IF

  !initial conditions for Taylor-Green vortex
  !       factor=2.0d0/sqrt(3.0d0)
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               u(i,j,k)=factor*sin(theta+2.0d0*pi/3.0d0)*sin(x(i))*cos(y(j))*cos(z(k))
  !       END DO; END DO; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               v(i,j,k)=factor*sin(theta-2.0d0*pi/3.0d0)*cos(x(i))*sin(y(j))*cos(z(k))
  !       END DO ; END DO ; END DO
  !       DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
  !               w(i,j,k)=factor*sin(theta)*cos(x(i))*cos(y(j))*sin(z(k))
  !       END DO ; END DO ; END DO

  time(1)=0.0d0
  factor=sqrt(3.0d0)
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     u(i,j,k)=-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     v(i,j,k)=0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     w(i,j,k)=cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(1)/Re)
  END DO; END DO ; END DO

  CALL decomp_2d_fft_3d(u,uhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(v,vhat,DECOMP_2D_FFT_FORWARD)
  CALL decomp_2d_fft_3d(w,what,DECOMP_2D_FFT_FORWARD)


  ! derivative of u with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

  ! derivative of v with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD)         

  ! derivative of w with respect to x, y, and z 
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD)         
  DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
     temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
  END DO; END DO ; END DO
  CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD)         

  ! save initial data
  n = 0
  !coprocessing: removed savedata calls from coprocessing version
  !coprocessing: could have this after the coprocessor initialize and
  !coprocessing: followed by a call to the coprocessor to write out the initial data
  !coprocessing: image. Otherwise, these are just wasted calls at this spot.
  call coprocessorinitialize("pipeline.py", 11)

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
  END DO; END DO ; END DO

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
  END DO; END DO ; END DO

  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
  END DO; END DO ; END DO

  call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
          decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
          realtempx, realtempy, realtempz)

  !start timer
  CALL system_clock(start,count_rate)
  ! coprocessing: Simulation loop starts here
  DO n=1,Nt
     !fixed point
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        uold(i,j,k)=u(i,j,k)
        uxold(i,j,k)=ux(i,j,k)
        uyold(i,j,k)=uy(i,j,k)
        uzold(i,j,k)=uz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        vold(i,j,k)=v(i,j,k)
        vxold(i,j,k)=vx(i,j,k)
        vyold(i,j,k)=vy(i,j,k)
        vzold(i,j,k)=vz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        wold(i,j,k)=w(i,j,k)
        wxold(i,j,k)=wx(i,j,k)
        wyold(i,j,k)=wy(i,j,k)
        wzold(i,j,k)=wz(i,j,k)
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsuhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*uhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhsvhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*vhat(i,j,k) 
     END DO; END DO ; END DO
     DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
        rhswhatfix(i,j,k) = (dtInv+(0.5*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)))*what(i,j,k) 
     END DO; END DO ; END DO

     chg=1
     DO WHILE (chg .gt. tol)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(ux(i,j,k)+uxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(uy(i,j,k)+uyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(uz(i,j,k)+uzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinuhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(vx(i,j,k)+vxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(vy(i,j,k)+vyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(vz(i,j,k)+vzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinvhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           temp_r(i,j,k)=0.25d0*((u(i,j,k)+uold(i,j,k))*(wx(i,j,k)+wxold(i,j,k))&
                +(v(i,j,k)+vold(i,j,k))*(wy(i,j,k)+wyold(i,j,k))&
                +(w(i,j,k)+wold(i,j,k))*(wz(i,j,k)+wzold(i,j,k)))
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_r,nonlinwhat,DECOMP_2D_FFT_FORWARD)
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           phat(i,j,k)=-1.0d0*( kx(i)*nonlinuhat(i,j,k)&
                +ky(j)*nonlinvhat(i,j,k)&
                +kz(k)*nonlinwhat(i,j,k))&
                /(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k)+0.1d0**13)
        END DO; END DO ; END DO

        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           uhat(i,j,k)=(rhsuhatfix(i,j,k)-nonlinuhat(i,j,k)-kx(i)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           vhat(i,j,k)=(rhsvhatfix(i,j,k)-nonlinvhat(i,j,k)-ky(j)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           what(i,j,k)=(rhswhatfix(i,j,k)-nonlinwhat(i,j,k)-kz(k)*phat(i,j,k))/&
                (dtInv-(0.5d0*ReInv)*(kx(i)*kx(i)+ky(j)*ky(j)+kz(k)*kz(k))) !*scalemodes
        END DO; END DO ; END DO

        ! derivative of u with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,ux,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=uhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,uz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of v with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3); DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=vhat(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,vz,DECOMP_2D_FFT_BACKWARD) 

        ! derivative of w with respect to x, y, and z 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kx(i)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wx,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*ky(j)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wy,DECOMP_2D_FFT_BACKWARD) 
        DO k=decomp%zst(3),decomp%zen(3) ; DO j=decomp%zst(2),decomp%zen(2) ; DO i=decomp%zst(1),decomp%zen(1)
           temp_c(i,j,k)=what(i,j,k)*kz(k)*scalemodes
        END DO; END DO ; END DO
        CALL decomp_2d_fft_3d(temp_c,wz,DECOMP_2D_FFT_BACKWARD) 

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           utemp(i,j,k)=u(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           vtemp(i,j,k)=v(i,j,k)
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           wtemp(i,j,k)=w(i,j,k)
        END DO; END DO ; END DO

        CALL decomp_2d_fft_3d(uhat,u,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(vhat,v,DECOMP_2D_FFT_BACKWARD)    
        CALL decomp_2d_fft_3d(what,w,DECOMP_2D_FFT_BACKWARD)    

        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           u(i,j,k)=u(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           v(i,j,k)=v(i,j,k)*scalemodes
        END DO; END DO ; END DO
        DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
           w(i,j,k)=w(i,j,k)*scalemodes
        END DO; END DO ; END DO

        mychg(1) =maxval(abs(utemp-u))
        mychg(2) =maxval(abs(vtemp-v))
        mychg(3) =maxval(abs(wtemp-w))
        CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
        chg=allchg(1)+allchg(2)+allchg(3)
        IF (myid.eq.0) THEN
           PRINT *,'chg:',chg
        END IF
     END DO
     time(n+1)=n*dt

     !goto 5100
     IF (myid.eq.0) THEN     
        PRINT *,'time',n*dt
     END IF

     !coprocessing: populating the arrays sent to the coprocessor
     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempx(i,j,k)=REAL(wy(i,j,k)-vz(i,j,k),KIND=8)
     END DO; END DO ; END DO

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempy(i,j,k)=REAL(uz(i,j,k)-wx(i,j,k),KIND=8)
     END DO; END DO ; END DO

     DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
        realtempz(i,j,k)=REAL(vx(i,j,k)-uy(i,j,k),KIND=8)
     END DO; END DO ; END DO

     call NSadaptor(Nx,Ny,Nz,decomp%xst(1),decomp%xen(1), &
             decomp%xst(2),decomp%xen(2),decomp%xst(3),decomp%xen(3), n, n*dt, &
             realtempx, realtempy, realtempz) 
  END DO
  !coprocessing:
  call coprocessorfinalize()

  CALL system_clock(finish,count_rate)

  IF (myid.eq.0) then
     PRINT *, 'Program took', REAL(finish-start)/REAL(count_rate), 'for main timestepping loop'
  END IF

  IF (myid.eq.0) THEN
     name_config = './data/tdata.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO n=1,1+Nt
        WRITE(11,*) time(n)
     END DO 
     CLOSE(11)

     name_config = './data/xcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO i=1,Nx
        WRITE(11,*) x(i)
     END DO
     CLOSE(11)       

     name_config = './data/ycoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO j=1,Ny
        WRITE(11,*) y(j)
     END DO
     CLOSE(11)

     name_config = './data/zcoord.dat' 
     OPEN(unit=11,FILE=name_config,status="UNKNOWN")         
     REWIND(11)
     DO k=1,Nz
        WRITE(11,*) z(k)
     END DO
     CLOSE(11)
     PRINT *,'Saved data'
  END IF

  ! Calculate error in final numerical solution
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     utemp(i,j,k)=u(i,j,k) -&
          (-0.5*( factor*cos(x(i))*sin(y(j))*sin(z(k))&
          +sin(x(i))*cos(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     vtemp(i,j,k)=v(i,j,k) -&
          (0.5*(  factor*sin(x(i))*cos(y(j))*sin(z(k))&
          -cos(x(i))*sin(y(j))*cos(z(k)) )*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  DO k=decomp%xst(3),decomp%xen(3); DO j=decomp%xst(2),decomp%xen(2); DO i=decomp%xst(1),decomp%xen(1)
     wtemp(i,j,k)=w(i,j,k)-&
          (cos(x(i))*cos(y(j))*sin(z(k))*exp(-(factor**2)*time(Nt+1)/Re))
  END DO; END DO ; END DO
  mychg(1) = maxval(abs(utemp))
  mychg(2) = maxval(abs(vtemp))
  mychg(3) = maxval(abs(wtemp))
  CALL MPI_ALLREDUCE(mychg,allchg,3,MPI_DOUBLE_PRECISION,MPI_MAX,MPI_COMM_WORLD,ierr)
  chg=allchg(1)+allchg(2)+allchg(3)
  IF (myid.eq.0) THEN
     PRINT*,'The error at the final timestep is',chg
  END IF

  ! clean up 
  CALL decomp_2d_fft_finalize
  CALL decomp_2d_finalize

  DEALLOCATE(x,y,z,time,mychg,allchg,u,v,w,ux,uy,uz,vx,vy,vz,wx,wy,wz,uold,uxold,uyold,uzold,&
       vold,vxold,vyold,vzold,wold,wxold,wyold,wzold,utemp,vtemp,wtemp,&
       temp_r,kx,ky,kz,uhat,vhat,what,rhsuhatfix,rhsvhatfix,&
       rhswhatfix,phat,nonlinuhat,nonlinvhat,nonlinwhat,temp_c,&
       realtemp,stat=AllocateStatus)         
  !coprocessing: deallocate Coprocessing specific arrays
  deallocate(realtempx, realtempy, realtempz)
  IF (AllocateStatus .ne. 0) STOP
  IF (myid.eq.0) THEN
     PRINT *,'Program execution complete'
  END IF
  CALL MPI_FINALIZE(ierr)         

END PROGRAM main

Code changes in f90 adaptor glue code

edit

 

 

 

 

( H)
  • Accept the subextents of the current pencil.

Code download

! Fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE

! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.      
module NSadaptor_module 
  use iso_c_binding
  implicit none
  public
  interface NSadaptor 
    module procedure NSadaptor3D 
  end interface NSadaptor 
contains
  subroutine NSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
                                     step, time, omegax, omegay, omegaz)
    ! nx, ny, nz     -- grid dimensions of entire mesh
    !                   used for setting whole extent
    ! xst, xen, etc. -- extents of current subdomain pencil
    ! step           -- current simulation time step
    ! time           -- current simulation time
    ! omega*         -- scalar arrays for the current time step
    integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
    real(kind=8), intent(in) :: time
    real(kind=8), dimension(:,:,:), intent (in) :: omegax, omegay, omegaz
    integer :: flag

    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag /= 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag /= 0) then
          ! grid needed
          ! defined in VTKCellBasedDataSet.cxx
          ! takes the size of the entire grid, and the extents of the
          ! sub grid.
          call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
       end if
          
       ! defined in VTKCellBasedDataSet.cxx 
       ! call for each field of interest. Be sure to null-terminate the
       ! string for C++!
       call addfield(omegax, "realtempx"//char(0))
       call addfield(omegay, "realtempy"//char(0))
       call addfield(omegaz, "realtempz"//char(0))       


       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine NSadaptor3D 
end module NSadaptor_module

Code changes in C++ wrapping code

edit

 

 

 

 

( I)
  • Changed how the extents are calculated.
  • Changes point data calls to cell data calls.

Code download

// Adaptor for getting fortran simulation code into ParaView CoProcessor.
// Based on the PhastaAdaptor sample in the ParaView distribution.
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header
// ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/
#include "FortranAdaptorAPI.h" 

// CoProcessor specific headers
// Routines that take the place of VTK dataset object creation.
// Called from Fortran code which also calls the Fortran Adaptor API
// supplied with ParaView source.
// Note: names mangled with trailing underscores for linker visibility.
#include "vtkCPDataDescription.h"
#include "vtkCPInputDataDescription.h"
#include "vtkCPProcessor.h"
#include "vtkDoubleArray.h"
#include "vtkPointData.h"
#include "vtkSmartPointer.h"
#include "vtkImageData.h"
#include "vtkCellData.h"
#include <string>

// These will be called from the Fortran adaptor "glue" code.
// Completely dependent on data layout, structured vs. unstructured, etc.
// since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor.
// Takes the extents for both the global dataset and the particular subsection
// visible to the current MPI rank.
// Note: expects to receive Fortran base-1 indices.
extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen,
	int* yst, int* yen, int* zst, int* zen)
{
  if (!ParaViewCoProcessing::GetCoProcessorData()) {
    vtkGenericWarningMacro("Unable to access CoProcessorData.");
    return;
  }

  // The simulation grid is a 3-dimensional topologically and geometrically 
  // regular grid. In VTK/ParaView, this is considered an image data set.
  vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
  // Indexing based on change from F90 to C++, and also from nodal to cellular.
  img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen); 
  img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz);
 
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
  ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);
}

// Add field to the data container.
// Separate from above because this could be dynamic, grid is static.
// Might be an issue, VTK assumes row major C storage.
// Underscore is by-hand name mangling for fortran linking.
// Note: Expects the address of the data, has no way of determining
// if the array is densely packed or not.
// Note 2: Assumes "name" points to null-terminated array of chars.
// Easiest way to do that is to concatenate a terminal NULL in caller. 
extern "C" void addfield_(double* a, char* name)
{

  vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
  vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());

  if (!img) {
    vtkGenericWarningMacro("No adaptor grid to attach field data to.");
    return;
  }

  if (idd->IsFieldNeeded(name)) {
    vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
    field->SetName(name);
    field->SetArray(a, img->GetNumberOfCells(), 1); 
    img->GetCellData()->AddArray(field);
  }
}

Pros

edit
  • Minimal code changes, none in simulation code (beyond the coprocessing calls).
  • No halo or packed arrays needed.
  • Should work with different decomposition libraries without change.

Cons

edit
  • Requires extra handling in the ParaView client to handle the differences between cell and point rendering.
  • Won’t work as is with VTK/ParaView filters that require ghost cells.

Additional notes on creating pipelines in the ParaView client

edit

Working with Fortran complex type

edit

C++ does not have a native complex data type. This requires some additional handling in the ParaView client:

  1. Run the original non-coprocessing simulation at a reduced mesh size.
  2. Open this data in the ParaView client using the RAW binary reader.
  3. Set the number of Scalar Components to 2
  4. Create the pipeline, the real and imaginary parts’s default names are ImageFile_X and ImageFile_Y, respectively.

 

 

 

 

( J)

See the code for NLSadaptor.f90 for details on passing a complex to the coprocessor. Code download

! Fortran module for interacting with the ParaView CoProcessor
! loosely based on: 
! ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/phastaadaptor.f
! -- Changed for SESE

! Subroutine determines if coprocessing needed during the
! current simulation step or not, and if so, calls the coprocessor.
! Some of the subroutines are supplied by ParaView's FortranAdaptorAPI,
! others have to be supplied by the programmer.      
module NLSadaptor_module 
  use iso_c_binding
  implicit none
  public
  ! using an interface incase need arises to overload 1D, 2D versions
  interface NLSadaptor 
    module procedure NLSadaptor3D 
  end interface NLSadaptor 
contains
  subroutine NLSadaptor3D(nx, ny, nz, xst, xen, yst, yen, zst, zen, &
                                     step, time, a)
    ! nx, ny, nz     -- grid dimensions of entire mesh
    !                   used for setting whole extent
    ! xst, xen, etc. -- extents of current subdomain pencil
    ! step           -- current simulation time step
    ! time           -- current simulation time
    ! a              -- scalar array for the current time step
    ! flag           -- receives status from API calls
    integer, intent(in) :: nx, ny, nz, xst, xen, yst, yen, zst, zen, step
    real(kind=8), intent(in) :: time
    complex(kind=8), dimension(:,:,:), intent (in) :: a 
    integer :: flag

    flag = 0
    ! check if processing this time step
    ! defined in FortranAdaptorAPI.h
    call requestdatadescription(step, time, flag)
        
    if (flag /= 0) then
       ! processing requested
       ! check if need to create grid
       ! defined in FortranAdaptorAPI.h
       call needtocreategrid(flag)
       
       if (flag /= 0) then
          ! grid needed
          ! defined in adaptor.cxx
          ! takes the size of the entire grid, and the extents of the
          ! sub grid.
          call createcpimagedata(nx, ny, nz, xst, xen, yst, yen, zst, zen)
       end if
          
       ! defined in adaptor.cxx
       ! Be sure to null-terminate the
       ! string for C++!
       call addfield(a, "u_complex"//char(0))

       ! defined in FortranAdaptorAPI.h
       call coprocess()
    end if
    
    return
  end subroutine NLSadaptor3D 
end module NLSadaptor_module

 

 

 

 

 

( K )

This also requires some minor code changes in the accompanying C++ file. Code Download

// Adaptor for getting fortran simulation code into ParaView CoProcessor. // Based on the PhastaAdaptor sample in the ParaView distribution. // ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/PhastaAdaptor/PhastaAdaptor.cxx


// Fortran specific header // ParaView-3.14.1-Source/CoProcessing/Adaptors/FortranAdaptors/

  1. include "FortranAdaptorAPI.h"

// CoProcessor specific headers // Routines that take the place of VTK dataset object creation. // Called from Fortran code which also calls the Fortran Adaptor API // supplied with ParaView source. // Note: names mangled with trailing underscores for linker visibility.

  1. include "vtkCPDataDescription.h"
  2. include "vtkCPInputDataDescription.h"
  3. include "vtkCPProcessor.h"
  4. include "vtkDoubleArray.h"
  5. include "vtkPointData.h"
  6. include "vtkSmartPointer.h"
  7. include "vtkImageData.h"
  8. include "vtkCellData.h"
  9. include <string>

// These will be called from the Fortran "glue" code" // Completely dependent on data layout, structured vs. unstructured, etc. // since VTK/ParaView uses different internal layouts for each.

// Creates the data container for the CoProcessor. // Takes the extents for both the global dataset and the particular subsection // visible to the current MPI rank. // Note: expects to receive Fortran base-1 indices. extern "C" void createcpimagedata_(int* nx, int* ny, int* nz, int* xst, int* xen, int* yst, int* yen, int* zst, int* zen) {

 if (!ParaViewCoProcessing::GetCoProcessorData()) {
   vtkGenericWarningMacro("Unable to access CoProcessorData.");
   return;
 }
 // The simulation grid is a 3-dimensional topologically and geometrically 
 // regular grid. In VTK/ParaView, this is considered an image data set.
 vtkSmartPointer<vtkImageData> img = vtkSmartPointer<vtkImageData>::New();
 // For Fortram complex, need to set that there are 2 scalar components
 img->SetNumberOfScalarComponents(2);
 // Indexing based on change from F90 to C++, and also from nodal to cellular.
 // Extents are given in terms of nodes, not cells.
 img->SetExtent(*xst - 1, *xen, *yst - 1, *yen, *zst - 1, *zen); 
 
 // Setting spacing is important so that the camera position in the pipeline makes
 // sense if using different sized meshes between setting up the pipeline and running
 // the simulation. Origin can often be ignored.
 img->SetSpacing( 1.0 / *nx, 1.0 / *ny, 1.0 / *nz); // considering passing in as args. 
 ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetGrid(img);
 ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input")->SetWholeExtent(0, *nx, 0, *ny, 0, *nz);

}

// Add field to the data container. // Separate from above because this could be dynamic, grid is static. // Might be an issue, VTK assumes row major C storage. // Underscore is by-hand name mangling for fortran linking. // Note: Expects the address of the data, has no way of determining // if the array is densely packed or not. // Note 2: Assumes "name" points to null-terminated array of chars. // Easiest way to do that is to concatenate in caller. extern "C" void addfield_(double* a, char* name) {

 vtkSmartPointer<vtkCPInputDataDescription> idd = ParaViewCoProcessing::GetCoProcessorData()->GetInputDescriptionByName("input");
 vtkSmartPointer<vtkImageData> img = vtkImageData::SafeDownCast(idd->GetGrid());
 if (!img) {
   vtkGenericWarningMacro("No adaptor grid to attach field data to.");
   return;
 }
 if (idd->IsFieldNeeded(name)) {
   vtkSmartPointer<vtkDoubleArray> field = vtkSmartPointer<vtkDoubleArray>::New();
   field->SetNumberOfComponents(2);
   field->SetName(name);
   field->SetArray(a, 2* img->GetNumberOfCells(), 1); 
   img->GetCellData()->AddArray(field);
 }

}

Working with cell data

edit
  • Some ParaView filters and representations will only work with point data, e.g. volume rendering. Use the Cell Data to Point Data filter to convert. Be sure to check the option to pass the cellular data through the filter. If a simulation runs with no errors and produces no output, a possible culprit is failing to have checked that option when creating the pipeline.

vtkKDTreeGenerator warnings

edit
  • Pipelines that use volume rendering will generate warnings about Region IDs being 0. These can often be ignored.

ParaView CoProcessing Resources

edit