MINC/Tutorials/NumericPyminc

Numeric Pyminc, or how python made me love fortran, or eat my shorts, Matlab!

The previous tutorials taught some pyminc basics, but left the basic question of why use python and pyminc at all open. If C and perl were good enough for Peter Neelin, why care? The main reason is, in my opinion, that you can use python as a programming language that's a pleasure to use (unlike, gag, Matlab), is freely distributable (Matlab? Heh.), yet can give you speeds close to native C. And there's more than one way to do it. Read on.

Cortical thickness, or an ode to Pierre-Simon, the Marquis de Laplace

One of the ways we have of measuring cortical thickness is with Laplace's equation to create streamlines between the inside and the outside of the cortex. So, for the numerical examples below, I will take a core bit of that problem and solve it in all the different numerical ways accessible to python. In short, given defined boundaries of inside and outside (see labels.mnc in the link above) it creates isopotential lines between them. This is done through a simple iterative six-point averaging style. As a lookahead, the sample labels and the code can all be found here:

http://www.bic.mni.mcgill.ca/users/jason/pyminc-tutorial/

Standard python

The simple solution given the boundaries is the following:

# the simplest solver in plain old unfettered python
def singleSolveLaplace(grid, output):
    convergence = 0
    for v0 in range(grid.sizes[0]):
        for v1 in range(grid.sizes[1]):
            for v2 in range(grid.sizes[2]):
                if grid.data[v0,v1,v2] > innerValue and \
                       grid.data[v0,v1,v2] < outerValue:
                    oldvalue = output.data[v0,v1,v2]
                    output.data[v0,v1,v2] = (output.data[v0+1,v1,v2] + \
                                             output.data[v0-1,v1,v2] + \
                                             output.data[v0,v1+1,v2] + \
                                             output.data[v0,v1-1,v2] + \
                                             output.data[v0,v1,v2+1] + \
                                             output.data[v0,v1,v2-1]) / 6.0
                    convergence += (oldvalue - output.data[v0,v1,v2])
    return(convergence)

Iterate over every voxel, sum up all neighbours and divide by six. That's it. On the example grid, this takes about 7 seconds on my computer, and there be the rub. Python is an interpreted language, and simply not very efficient when iterating over voxels as described above.

Vectorized python

As any Matlab nerd will tell you, one solution is to vectorize the problem. That can be done in python too, resulting in the following code:

# vectorizing all the operations - much faster, not possible for all
# routines.
def vectorizedLaplace(grid, output):
    output.data[1:-1,1:-1,1:-1] = (output.data[0:-2,1:-1,1:-1] + \
                                   output.data[2:,1:-1,1:-1] + \
                                   output.data[1:-1,0:-2,1:-1] + \
                                   output.data[1:-1,2:,1:-1] + \
                                   output.data[1:-1,1:-1,0:-2] + \
                                   output.data[1:-1,1:-1,2:,]) / 6
    output.data[grid.data ==10] = 10
    output.data[grid.data == 0] = 0
    return(0)

Much faster - about 0.4 seconds. Uglier to read, however, and not every problem is amenable to these kinds of tricks.

Enter weave

So you could rewrite the whole shebang in C or C++. But then you'd have to deal with the pain of coding in either. So why not just take the inner loop and rewrite that? Simple in python - enter a nice extension known as weave. Here a C++ string can be embedded in python. The first time it sees it, it compiles it on the fly, and unless the code changes will from then on reuse the compiled output.

    convergence = 0
    nv0 = sizes[0]
    nv1 = sizes[1]
    nv2 = sizes[2]
    code = r"""
#line 45 "laplacian-python-test.py"
for (int v0=0; v0 < nv0; v0++) {
   for (int v1=0; v1 < nv1; v1++) {
      for (int v2=0; v2 < nv2; v2++) {
         if (grid(v0,v1,v2) > 0 && grid(v0,v1,v2) < 10) {
            double oldvalue = output(v0, v1, v2);
            output(v0,v1,v2) = (output(v0+1,v1,v2) +
                                output(v0-1,v1,v2) +
                                output(v0,v1+1,v2) +
                                output(v0,v1-1,v2) +
                                output(v0,v1,v2+1) +
                                output(v0,v1,v2-1))/6;
            convergence += oldvalue - output(v0,v1,v2);
}}}}
"""
    weave.inline(code, ['grid', 'output', 'convergence', 'nv0', 'nv1', 'nv2'],
                 type_converters = converters.blitz,
                 compiler='gcc')
    return(convergence)

Again takes about 0.35 seconds. Not bad.

do 300, v0=1, nv0 - rocking it old school

Along with that C++ interface in weave, there's a fantastic Fortran interface to python in f2py. Wouldn't you know it, python makes Fortran loveable again. While any Fortran dialect can be used, I stuck to F77 for the masochism of it all. Take the following Fortran code:

C FILE: laplace.f
      subroutine laplace (grid, output, nv0, nv1, nv2, convergence)
C     
C     single evaluation of laplace's function
C
      INTEGER grid(nv0, nv1, nv2)
      REAL output(nv0, nv1, nv2)
      double precision convergence
      double precision oldvalue

      INTEGER v0, v1, v2
cf2py intent(in) :: grid, n0, nv1, nv2
cf2py intent(inplace) :: output
cf2py intent(out) :: convergence

      do 300, v0 = 1, nv0
         do 200, v1 = 1, nv1
            do 100, v2 = 1, nv2
               if (output(v0,v1,v2) .lt. 10) then
                  if (output(v0,v1,v2) .gt. 0) then
                     oldvalue = output(v0,v1,v2)
                     
                     output(v0,v1,v2) = (output(v0-1, v1, v2) + 
     +                    output(v0+1,v1,v2) + 
     +                    output(v0,v1-1,v2) + 
     +                    output(v0,v1+1,v2) + 
     +                    output(v0,v1,v2-1) + 
     +                    output(v0,v1,v2+1))/6
                     convergence = convergence + 
     +                    (oldvalue - output(v0,v1,v2))
                  endif
               endif
 100        continue
 200     continue
 300  continue
      end

Then, on the command line, compile it with 'f2py -c laplace.f -m laplace', and call it like so in python:

# in fortran (F77, no less - rocking it old school!)
# actual code is in laplace.f
def flaplaceSolve(g,o):
    convergence = 0
    convergence = laplace.laplace(g,o, grid.sizes[0],
                                  grid.sizes[1], grid.sizes[2])
    return(convergence)

And done! 0.315 seconds now. Note that if you look at the full source code I had to add a transpose due to the joys of row versus column major arrays.

Purdy cython

But what if you wanted to write clean python code and not delve in Fortran? Yet another extension to the rescue - cython. It's a dialect of python which lets you add type information in order to get near native speeds. The code is the following:

# compile as follows:
# cython cython_laplace.pyx
# gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o cython_laplace.so cython_laplace.c

# note - there are smarter ways of doing this in distutils.

import numpy as np
cimport numpy as np

# some type definitions - first for the floats
FDTYPE = np.float64
ctypedef np.float64_t FDTYPE_t

# ... and then for the unsigned bytes
BDTYPE = np.uint8
ctypedef np.uint8_t BDTYPE_t

# the function definition - notice the types and dims, which are key for speed.
def cythonLaplaceStep(np.ndarray[BDTYPE_t, ndim=3] g,
                      np.ndarray[FDTYPE_t, ndim=3] o):

    # die a horrible flaming death if inputs had different types
    assert g.dtype == BDTYPE and o.dtype == FDTYPE

    # get the dimension info - again, notice the importance of defining types
    cdef int nv0 = g.shape[0]
    cdef int nv1 = g.shape[1]
    cdef int nv2 = g.shape[2]

    cdef int v0, v1, v2

    cdef double convergence = 0.0
    cdef double oldvalue

    # the actual loop - looks identical to the python code
    for v0 in range(nv0):
        for v1 in range(nv1):
            for v2 in range(nv2):
                if g[v0,v1,v2] > 0 and g[v0,v1,v2] < 10:
                    oldvalue = o[v0,v1,v2]
                    o[v0,v1,v2] = (o[v0+1,v1,v2] +
                                   o[v0-1,v1,v2] +
                                   o[v0,v1+1,v2] +
                                   o[v0,v1-1,v2] +
                                   o[v0,v1,v2+1] +
                                   o[v0,v1,v2-1])/6.0
                    convergence += oldvalue - o[v0,v1,v2]
    return(convergence)

Then this bit is compiled - notice how the code looks almost identical to the simple and slow python loop except for the addition of types - and called from the main python code as follows:

# in cython - the actual code is in cython_laplace.pyx
def cythonLaplace(grid, output):
    convergence = cython_laplace.cythonLaplaceStep(asarray(grid.data),
                                                   asarray(output.data))
    return(convergence)

Takes about 0.32 seconds.

Conclusions

Above are all the reasons to use pyminc - many options for how to write efficient code in a language that's a pleasure to work with. The full code used is below - have fun:

#!/usr/bin/env python

# lots of imports
from pyminc.volumes.factory import *
from numpy import *
import sys
from optparse import OptionParser
from scipy import weave
from scipy.weave import converters

import laplace # the fortran version
import cython_laplace # and the cython version

# the simplest solver in plain old unfettered python
def singleSolveLaplace(grid, output):
    convergence = 0
    for v0 in range(grid.sizes[0]):
        for v1 in range(grid.sizes[1]):
            for v2 in range(grid.sizes[2]):
                if grid.data[v0,v1,v2] > innerValue and \
                       grid.data[v0,v1,v2] < outerValue:
                    oldvalue = output.data[v0,v1,v2]
                    output.data[v0,v1,v2] = (output.data[v0+1,v1,v2] + \
                                             output.data[v0-1,v1,v2] + \
                                             output.data[v0,v1+1,v2] + \
                                             output.data[v0,v1-1,v2] + \
                                             output.data[v0,v1,v2+1] + \
                                             output.data[v0,v1,v2-1]) / 6.0
                    convergence += (oldvalue - output.data[v0,v1,v2])
    return(convergence)

# vectorizing all the operations - much faster, not possible for all
# routines.
def vectorizedLaplace(grid, output):
    output.data[1:-1,1:-1,1:-1] = (output.data[0:-2,1:-1,1:-1] + \
                                   output.data[2:,1:-1,1:-1] + \
                                   output.data[1:-1,0:-2,1:-1] + \
                                   output.data[1:-1,2:,1:-1] + \
                                   output.data[1:-1,1:-1,0:-2] + \
                                   output.data[1:-1,1:-1,2:,]) / 6
    output.data[grid.data ==10] = 10
    output.data[grid.data == 0] = 0
    return(0)

# in cython - the actual code is in cython_laplace.pyx
def cythonLaplace(grid, output):
    convergence = cython_laplace.cythonLaplaceStep(asarray(grid.data),
                                                   asarray(output.data))
    return(convergence)

# in weave - allows you to embed a C++ string in the code
def weaveLaplace(grid, output, sizes):
    convergence = 0
    nv0 = sizes[0]
    nv1 = sizes[1]
    nv2 = sizes[2]
    code = r"""
#line 45 "laplacian-python-test.py"
for (int v0=0; v0 < nv0; v0++) {
   for (int v1=0; v1 < nv1; v1++) {
      for (int v2=0; v2 < nv2; v2++) {
         if (grid(v0,v1,v2) > 0 && grid(v0,v1,v2) < 10) {
            double oldvalue = output(v0, v1, v2);
            output(v0,v1,v2) = (output(v0+1,v1,v2) +
                                output(v0-1,v1,v2) +
                                output(v0,v1+1,v2) +
                                output(v0,v1-1,v2) +
                                output(v0,v1,v2+1) +
                                output(v0,v1,v2-1))/6;
            convergence += oldvalue - output(v0,v1,v2);
}}}}
"""
    weave.inline(code, ['grid', 'output', 'convergence', 'nv0', 'nv1', 'nv2'],
                 type_converters = converters.blitz,
                 compiler='gcc')
    return(convergence)

# in fortran (F77, no less - rocking it old school!)
# actual code is in laplace.f
def flaplaceSolve(g,o):
    convergence = 0
    convergence = laplace.laplace(g,o, grid.sizes[0],
                                  grid.sizes[1], grid.sizes[2])
    return(convergence)

if __name__ == "__main__":

    usage = "%prog input-grid.mnc output-solved.mnc"
    description = "Simple solver of laplace's equation - designed to help teach people how to use numeric python"


    # argument handling - an option chooses which way to do the computation
    parser = OptionParser(usage=usage, description=description)
    parser.add_option("--slow-python", dest="mode", action="store_const",
                      const="slow-python")
    parser.add_option("--fortran", dest="mode", action="store_const",
                      const="fortran")
    parser.add_option("--vectorized-python", dest="mode", action="store_const",
                      const="vector-python")
    parser.add_option("--weave", dest="mode", action="store_const",
                      const="weave")
    parser.add_option("--cython", dest="mode", action="store_const",
                      const="cython")

    (options, args) = parser.parse_args()
    if len(args) != 2:
        parser.error("Incorrect number of arguments")
        
    gridfile = args[0]
    outputfile = args[1]

    # get the volumes
    grid = volumeFromFile(gridfile, dtype='ubyte')
    output = volumeLikeFile(gridfile, outputfile)
    # initialize the output to the boundaries
    output.data[:,:,:] = grid.data[:,:,:]

    innerValue = 0
    outerValue = 10

    convergence = 0

    # transpose for the fortran bits - the joys of column versus row major
    g = asfortranarray(grid.data)
    o = asfortranarray(output.data)

    for i in range(100):
        if options.mode == "slow-python":
            convergence = singleSolveLaplace(grid, output)
        elif options.mode == "fortran":
            convergence = flaplaceSolve(g,o)
        elif options.mode == "vector-python":
            convergence = vectorizedLaplace(grid, output)
        elif options.mode == "weave":
            # notice the cast to array - no speed penalty, but necessary
            # since weave does not understand the hyperslab subclass
            # that pyminc uses
            convergence = weaveLaplace(asarray(grid.data),
                                       asarray(output.data),
                                       grid.sizes)
        elif options.mode == "cython":
            convergence = cythonLaplace(grid, output)
    if options.mode == "fortran":
        output.data = ascontiguousarray(o)

    output.writeFile()
    output.closeVolume()
    grid.closeVolume()