Understanding Multitouch

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Understanding_Multitouch

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Image Acquisition

Acquisition

When thinking about a multitouch system, it is advantageous to break it into several components that stream together to form the entire system. As such, the first step in the pipeline is always acquisition. Later in the book we will describe the hardware from which we will acquire the data, but for now, we will talk about how to model that data in software.

Format

To start, we need to come up with a format to describe the system. We do this at acquisition because every component further down the pipeline will require knowledge of what this format is, and how to interact with it. In other systems used in digital signal processing, data is imported in the form of raw image packets or raw sound packets in well-known formats. For example, an image is often imported in the form of Y:Cr:Cb (seen in RAW-mode imagery) or RGB (in other devices such as webcams and cheaper digital cameras), and sound is often represented in the form of PCM data.

Neither PCM nor either image mode is directly applicable to multitouch as the data from the sensor in this case may not model neither sound nor an image. We tend to think of the data coming from our sensor as histogram data, intensities at a point on a grid. Imaging models best represent this but obviously neither RGB nor Y:Cr:Cb describe this properly. We will need to define a new format for our own usage here.

The first step is depicting the average resolution of our underlying device. This can vary wildly, with lower-end capacitive sensors and pressure sensors presenting a very low-resolution data source, to image sensors which produce data in a vast amount of detail, often too much detail to deal with. A good starting point for your system will be 8 bits of resolution (representing 2^8 = 256 different possible reference points), but your sensor may need a larger range.

Determining what is usable data and what is noise is the next part of our solution. A sensor can usually do this at startup, or with a calibration routine by collecting multiple samples of data from the sensor, averaging that data, and using the result as a reference for the platform's noise (alg 1). Again, as this is hardware dependent, it will be up to the implementer to decide how many samples at what rate is enough. One thing to keep in mind is that many sensors are temperature dependent and can possibly drift, so it may be advantageous to implement your calibration routine as a ring buffer and continuously calibrate with a running average.

Noise

As will be made abundantly clear later, the key enemy in any multitouch system is noise. Noise in our case is simply information that is incorrect. How it got to being incorrect (for example, a sensor-to-buffer resolution mismapping) is not as important as how incorrect it actually is (white noise verses pink or otherwise-colored noise). For example, Wiener Deconvolution can correct errors that often occur due to white noise. Other kinds of noise can be removed from the signal through dynamic programming algorithms.

Realize that at the acquisition-level, we do not need to be perfect. The more time we waste here, the less time we have further down the pipeline to make the device responsive to the user's input. While it is worthwhile to try to remove noise programmatically, the best way is to insure the noise never enters the system to begin with and should be dealt with in your hardware sensor's implementation as we will discuss later in the book.

An Abstract Model

For the purposes of this book, we will need to describe a model that will be universal to any programming examples. For simplicity's sake, the model will not describe a physical sensor, but instead a theoretical "perfect sensor" in which we can trust that all of the data is 100% correct. Error considerations will be noted in due course, but will not be taken into consideration when discussing our sensor model.

The basis for all multitouch as described in the Overview comes from the idea of having two-dimensional input sensor. The sensor does not need to be of high resolution, and as we will be sampling it often, we will want to throw away as much data as possible as early on as possible. Our theoretical model sensor will have a horizontal and vertical resolution of 8 elements consisting of 8-bits of resolution. Consider that the 8 elements on either axis are equally and evenly spaced apart; whether or not this spacing is a few millimeters or a centimeter apart (or more) is irrelevant to our model. With our model, we will call our 8-bits of depth the pressure resolution, and we will call our horizontal and vertical resolutions collectively spatial resolution.

Our pressure resolution will be modeled as an integer with a normalized and calibrated zero pressure indicated by 0, and the maximum amount of pressure we can register as 255. In a real system, it may be advantageous to reserve the lower two bits to indicate error states such as a sample overflowing the representable resolution, but in this model our sensor will always represent perfect data. Our spatial resolution will be modeled as integers from 0 to 7 on either axis, represented as a matrix.

Image Processing

Processing

This chapter is designed to represent the computational model for the upcoming chapters. This is important as we may implement these algorithms as either software or hardware, in parallel or in serial. Before we go further, you should read the chapter on Image Acquisition, and understand our theoretical sensor model.

Software Models

As of 2007, computer hardware has become very sophisticated, and has made inclusions specifically designed for signal processing, including both Single Instruction Multiple Data (SIMD) hardware and multiple cores and multiple threads in hardware (SMT). Both of these will influence the way we will want to import our data from our sensor, to make the best we can from our hardware.

One of the most important things to understand while implementing for an SIMD machine is the idea of Arrays of Structures verses Structures of Arrays. Inexperienced and older programmers will often miss optimization opportunities afforded to them by SIMD hardware by mistakenly using older concepts of storing the data in memory. The design idiom of Array of Structures is thusly rampant in older code as a way of improving readability.

Example 1: Array of Structures

  structure 3d_Point {
     int32 x, y, z
  }
  3d_Point OurData[20];

This makes inefficient use of our hardware because we cannot process these structures in parallel, and can cause problems with processor cache loading and unloading due to (mis)alignment. Processor companies have implemented instructions to aid in the reinterpretation of this kind of data (often called "swizzling" and "deswizzling"), but by simply designing the code correctly in the first place, we can avoid these slower instructions.

Example 2: Structure of Arrays

  structure 3d_Points {
     int32 x[20], y[20], z[20]
  }
  3d_Points OurData;

The Structures of Arrays idiom takes advantage of our parallel hardware more efficiently. Because our structure encapsulates more data, we can get all of the data into the processor's cache earlier on. The alignment is now fixed and aligned to 4 bytes, we can skip swizzling and deswizzling.

Convolution and Deconvolution

Convolution Overview

Convolution mathematically is described as an operator which takes two functions f and g and produces a third function that represents the amount of overlap between f and g. Convolution is used quite a bit in signal analysis because convolution operations are often very cheap to perform, can be performed in parallel, and are easy to adjust to fit the situation.

In our case, we will be talking about convolution in the 2D sense, where our image and a known convolution function (known as a filter) will be combined to make an image that better represents the data we are looking for. As you should already be familiar with these concepts, this will be a brief overview of how convolution works, and the specific convolution filters we are interested in.

If you recall, our theoretical sensor model inputs data into the system as an 8x8 matrix of 8-bit data, looking similar to the following:

            Horizontal Address 
          0  1  2  3  4  5  6  7   
 V A   0 00 00 00 00 00 00 00 00   
 e d   1 00 00 00 00 00 00 00 00   
 r d   2 00 00 00 00 00 00 00 00   
 t r   3 00 00 00 00 00 00 00 00   
 i e   4 00 00 00 00 00 00 00 00   
 c s   5 00 00 00 00 00 00 00 00   
 a s   6 00 00 00 00 00 00 00 00   
 l     7 00 00 00 00 00 00 00 00

Our convolution kernels, however, will be much smaller in size. For the algorithms we will be using on the theoretical model, a 3x3 convolution kernel is sufficient.

The important thing to remember when it comes to implementing a convolution kernel is that we are taking two functions and generating a third. The original functions can be discarded, or they can go back into the system for other purposes, such as decreasing noise, so when we go to implement our filters in software, we will want to generate a function which will return a filled buffer to the user the same size as the input buffer.

A convolution operation should look very similar to this pseudocode:

function Convolution ( in_buffer[8][8], out_buffer[8][8], divisor )
{
   define kernel[3][3];

   for (x from 0 to 7)
      for (y from 0 to 7)
         out_buffer[x][y]  = 0; //make sure the buffer is zeroed

         if (x-1 >= 0 && y-1 >= 0)
            out_buffer[x][y] += in_buffer[x-1][y-1] * kernel[0][0];
         if (x-1 >= 0)
            out_buffer[x][y] += in_buffer[x-1][y] * kernel[0][1];
         if (x-1 >= 0 && y+1 <= 7)
            out_buffer[x][y] += in_buffer[x-1][y+1] * kernel[0][2];

         if (y-1 >= 0)
            out_buffer[x][y] += in_buffer[x][y-1] * kernel[1][0];
         //this is our "free" operation, the only one that will always work.
            out_buffer[x][y] += in_buffer[x][y] * kernel[1][1];
         if (y+1 <= 7)
            out_buffer[x][y] += in_buffer[x][y+1] * kernel[1][2];

         if (x+1 <= 7 && y-1 >= 0)
            out_buffer[x][y] += in_buffer[x+1][y-1] * kernel[2][0];
         if (x+1 <= 7)
            out_buffer[x][y] += in_buffer[x+1][y] * kernel[2][1];
         if (x+1 <= 7 && y+1 <= 7)
            out_buffer[x][y] += in_buffer[x+1][y+1] * kernel[2][2];

         if (divisor && divisor > 0)
            out_buffer[x][y] /= divisor;
         else
            out_buffer[x][y] /= 9;

      end for
   end for
}

The above may be optimized on your hardware, as often you can avoid the branch statements by insuring your buffer is big enough to catch possible overruns without needing to do the bounds testing, trading memory for branch operations. Your algorithm will generally require no temporary data besides the constant kernel, and you will probably already have functions similar to this in your toolkits if you have worked with signal processing in the past.

Often the question is asked on what to do at the edges of the filter (known as Edge Conditions). There are many to choose from, such as picking a known "background intensity" and apply it for that element (in this case, the data was a zero, and thusly would not affect the kernel operation; this is the exact same as truncating the filter in this special case), but there are many other ways of dealing with this condition, including extending the data over the edge of the image, truncating the kernel and not running operations over the region, or simply not applying the kernel to points which extend over the edge (often known as "Copy in Place").

You will also notice that we specified no else situations during the pixel operations; in all cases where we aren't doing an operation, we're assuming the output of that operation would have been zero. For example, if the convolution kernel's center term is zero as is often the case, there is no point to do the multiply and yield zero. Simply skip that calculation and move to the next.

In certain cases, for example if our data is in integer format, we need to apply a divisor to satisfy the constraints of the number format (in this case, our buffer consists of 8-bit data). The divisor is typically the kernel's width times the kernel's height (9 for a 3x3, 25 for a 5x5, etc), but there are cases in which other divisors may be desirable.

Sobel Kernel

The first kernel we will look at is a Sobel Kernel, also known as an Edge Detection kernel. A Sobel Kernel emphasizes high frequency changes in the image function, by reducing the intensity of pixels of low frequencies as it increases the intensity of pixels with a higher frequency. Essentially, the Sobel Kernel is taking the second derivative of the image's intensity at every coordinate, however, to those of us less mathematically minded and to some of us who are, it's much easier to understand as a convolution filter:

 -1 0 1
 -2 0 2
 -1 0 1

We can think about this by looking at the one dimensional second derivative of a constant function f(x): $f^{\prime \prime }(x)=1*f(x-1)-2*f(x)+1*f(x+1)$ . By taking the constants and storing them in a matrix, we construct our one dimensional convolution kernel: ${\begin{bmatrix}1&&2&&1\end{bmatrix}}$ . We make this into a 2-dimensional convolution kernel by matrix multiplication with the vertical matrix ${\begin{bmatrix}-1\\0\\1\end{bmatrix}}$ , which is simply the absolute difference from the point we are convolving.

There are several variations of this relatively simple filter. For example, one often adjustment is to the direction of emphasis:

 1 0 -1
 2 0 -2
 1 0 -1

This adaptation will change the leading edge of the filter to the right side rather than the left as is emphasized in the former example. It is the same filter as above "flipped" by multiplying the function by -1.

  1  2  1        -1 -2 -1 
  0  0  0   or    0  0  0
 -1 -2 -1         1  2  1

These are rotations to the original filter, which emphasize upwards and downwards respectively.

Understanding Multitouch/Printable version

Contents