Understanding Multitouch/Image Processing

Processing

This chapter presents the computational model used in the upcoming chapters. This matters because we may implement these algorithms in either software or hardware, in parallel or in serial. Before going further, you should read the chapter on Image Acquisition and understand our theoretical sensor model.

Software Models

As of 2007, computer hardware has become very sophisticated and includes features designed specifically for signal processing: Single Instruction, Multiple Data (SIMD) units on the one hand, and multiple cores with several hardware threads per core (simultaneous multithreading, SMT) on the other. Both of these influence how we will want to import data from our sensor in order to make the best use of our hardware.

One of the most important ideas to understand when programming for a SIMD machine is the difference between Arrays of Structures and Structures of Arrays. Programmers who are unfamiliar with SIMD hardware often miss the optimization opportunities it affords by storing data in memory in the traditional way. The Array of Structures idiom is thus widespread in older code, where it was adopted as a way of improving readability.

Example 1: Array of Structures

  #include <stdint.h>

  /* One structure per point: x, y and z are interleaved in memory. */
  struct Point3D {
     int32_t x, y, z;
  };
  struct Point3D OurData[20];

This makes inefficient use of our hardware. The x, y, and z fields of each point are interleaved in memory, so a SIMD instruction cannot load several values of the same field at once and we cannot process these structures in parallel. The 12-byte structures can also cause problems with processor cache loading and unloading due to (mis)alignment. Processor companies have implemented instructions to aid in the reinterpretation of this kind of data (often called "swizzling" and "deswizzling"), but by simply designing the code correctly in the first place, we can avoid these slower instructions.
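
The interleaving described above can be made concrete with a small C sketch. The names Point3D, aos_x_stride, and aos_sum_x are illustrative and not part of the original example, and the 12-byte stride assumes a typical compiler that adds no padding to three 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Array-of-Structures element, mirroring Example 1 (illustrative name). */
typedef struct {
    int32_t x, y, z;
} Point3D;

/* Distance in bytes between the x fields of two neighbouring points.
   With three 32-bit fields and no padding this is 12, not 4. */
size_t aos_x_stride(void) {
    return sizeof(Point3D);
}

/* Sum every x coordinate. Each iteration steps over the y and z fields
   it does not need, so the x values are never contiguous in memory. */
int64_t aos_sum_x(const Point3D *pts, size_t count) {
    int64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += pts[i].x;
    }
    return total;
}
```

Because consecutive x values sit 12 bytes apart, a 16-byte vector load starting at the first point would pick up x, y, z and the next x rather than four x values, which is the situation that swizzling instructions exist to repair.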

Example 2: Structure of Arrays

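  #include <stdint.h>

  /* One array per coordinate: all x values are contiguous, then y, then z. */
  struct Points3D {
     int32_t x[20], y[20], z[20];
  };
  struct Points3D OurData;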

The Structure of Arrays idiom takes advantage of our parallel hardware more efficiently. Because each field is stored in its own contiguous array, a single load brings several consecutive values of the same field into the processor's cache, so the data we need arrives earlier. Every element is now aligned to 4 bytes, so we can skip swizzling and deswizzling.
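
As a rough sketch of how this layout is used, the C code below translates every point by a fixed offset. The names Points3D, POINT_COUNT, and soa_translate are illustrative assumptions, and the code relies on plain loops rather than any particular SIMD instruction set. Whether a compiler actually vectorizes these loops depends on the compiler and its optimization settings.

```c
#include <stddef.h>
#include <stdint.h>

#define POINT_COUNT 20

/* Structure-of-Arrays layout, mirroring Example 2 (illustrative name). */
typedef struct {
    int32_t x[POINT_COUNT];
    int32_t y[POINT_COUNT];
    int32_t z[POINT_COUNT];
} Points3D;

/* Translate every point by (dx, dy, dz). Each loop walks one contiguous
   array and applies the same operation to every element, which is the
   access pattern SIMD hardware is built for. */
void soa_translate(Points3D *pts, int32_t dx, int32_t dy, int32_t dz) {
    for (size_t i = 0; i < POINT_COUNT; ++i) {
        pts->x[i] += dx;
    }
    for (size_t i = 0; i < POINT_COUNT; ++i) {
        pts->y[i] += dy;
    }
    for (size_t i = 0; i < POINT_COUNT; ++i) {
        pts->z[i] += dz;
    }
}
```

Splitting the work into one loop per field keeps each pass over a single array, so four neighbouring 32-bit values can be handled by one 16-byte operation without any rearrangement.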