Understanding Multitouch/Image Acquisition

Acquisition

When thinking about a multitouch system, it is advantageous to break it into several components that stream together to form the entire system. As such, the first step in the pipeline is always acquisition. Later in the book we will describe the hardware from which we will acquire the data, but for now, we will talk about how to model that data in software.

Format

To start, we need to come up with a format to describe the system. We do this at acquisition because every component further down the pipeline will require knowledge of what this format is, and how to interact with it. In other systems used in digital signal processing, data is imported in the form of raw image packets or raw sound packets in well-known formats. For example, an image is often imported in the form of Y:Cr:Cb (seen in RAW-mode imagery) or RGB (in other devices such as webcams and cheaper digital cameras), and sound is often represented in the form of PCM data.

Neither PCM nor either image mode is directly applicable to multitouch, as the data from the sensor in this case may model neither sound nor an image. We tend to think of the data coming from our sensor as histogram-like data: an intensity at each point on a grid. Imaging models come closest to representing this, but neither RGB nor Y:Cr:Cb describes it properly. We will need to define a new format for our own usage here.

The first step is deciding on the resolution of the data we take from our underlying device. This can vary wildly, from lower-end capacitive and pressure sensors, which present a very low-resolution data source, to image sensors, which produce data in a vast amount of detail, often too much detail to deal with. A good starting point for your system will be 8 bits of resolution (2^8 = 256 distinct intensity levels), but your sensor may need a larger range.
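As a minimal sketch of reducing a sensor reading to 8 bits, assume a hypothetical sensor whose analog-to-digital converter delivers unsigned readings of a known bit width (10 bits in the usage below). The function name and parameters are illustrative only, and dropping low-order bits is just one simple way to reduce resolution.

```python
def quantize(raw, raw_bits, out_bits=8):
    """Reduce an unsigned raw reading of raw_bits bits to out_bits bits.

    The low-order bits are discarded. Readings that are already
    out_bits wide or narrower are returned unchanged.
    """
    if not 0 <= raw < (1 << raw_bits):
        raise ValueError("raw reading does not fit in raw_bits")
    if raw_bits <= out_bits:
        return raw
    return raw >> (raw_bits - out_bits)


# Usage with an assumed 10-bit reading (0..1023):
full_scale = quantize(1023, raw_bits=10)   # largest 10-bit reading
half_scale = quantize(512, raw_bits=10)    # mid-range reading
```

Discarding bits this early fits the goal of carrying as little data as possible down the pipeline, at the cost of losing fine intensity detail.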

Determining what is usable data and what is noise is the next part of our solution. This can usually be done at startup, or with a calibration routine, by collecting multiple samples of data from the sensor, averaging that data, and using the result as a reference for the platform's noise (alg 1). Again, as this is hardware dependent, it will be up to the implementer to decide how many samples, taken at what rate, are enough. One thing to keep in mind is that many sensors are temperature dependent and can drift, so it may be advantageous to implement your calibration routine as a ring buffer and continuously calibrate with a running average.
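The calibration idea can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: frames are assumed to be lists of rows of integer intensities, the class and method names are hypothetical, and only frames captured while nothing touches the sensor are assumed to be fed in as samples.

```python
from collections import deque


class NoiseFloor:
    """Running-average noise reference over the most recent idle frames."""

    def __init__(self, window):
        # A deque with maxlen acts as a ring buffer: once it is full,
        # appending a new frame silently drops the oldest one.
        self.frames = deque(maxlen=window)

    def add_sample(self, frame):
        """Record one idle frame (a list of rows of intensities)."""
        self.frames.append([list(row) for row in frame])

    def reference(self):
        """Return the per-cell mean of the buffered frames."""
        if not self.frames:
            raise ValueError("no calibration samples collected yet")
        count = len(self.frames)
        rows = len(self.frames[0])
        cols = len(self.frames[0][0])
        return [
            [sum(f[r][c] for f in self.frames) / count for c in range(cols)]
            for r in range(rows)
        ]

    def subtract(self, frame):
        """Remove the noise reference from a frame, clamping at zero."""
        ref = self.reference()
        return [
            [max(0, round(value - ref[r][c])) for c, value in enumerate(row)]
            for r, row in enumerate(frame)
        ]


# Usage: a window of two frames on a tiny 1x2 "sensor".
floor = NoiseFloor(window=2)
floor.add_sample([[2, 4]])
floor.add_sample([[4, 6]])
floor.add_sample([[6, 8]])          # the first sample has now been dropped
cleaned = floor.subtract([[10, 5]])
```

The window size trades responsiveness to drift against stability of the reference. As noted above, the right value depends on the hardware.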

Noise

As will be made abundantly clear later, the key enemy in any multitouch system is noise. Noise in our case is simply information that is incorrect. How it came to be incorrect (for example, a sensor-to-buffer resolution mismapping) is not as important as how incorrect it actually is (white noise versus pink or otherwise-colored noise). For example, Wiener deconvolution can correct errors that often occur due to white noise. Other kinds of noise can be removed from the signal through dynamic programming algorithms.

Realize that at the acquisition level, we do not need to be perfect. The more time we spend here, the less time we have further down the pipeline to make the device responsive to the user's input. While it is worthwhile to try to remove noise programmatically, the best approach is to ensure the noise never enters the system to begin with. That is a matter for your hardware sensor's implementation, as we will discuss later in the book.

An Abstract Model

For the purposes of this book, we will need to describe a model that is common to all of the programming examples. For simplicity's sake, the model will not describe a physical sensor, but instead a theoretical "perfect sensor" in which we can trust that all of the data is 100% correct. Error considerations will be noted in due course, but will not be taken into consideration when discussing our sensor model.

The basis for all multitouch, as described in the Overview, comes from the idea of having a two-dimensional input sensor. The sensor does not need to be of high resolution, and as we will be sampling it often, we will want to throw away as much data as possible as early as possible. Our theoretical model sensor will have a horizontal and vertical resolution of 8 elements each, with every element carrying 8 bits of resolution. Consider the 8 elements on either axis to be evenly spaced; whether this spacing is a few millimeters, a centimeter, or more is irrelevant to our model. With our model, we will call our 8 bits of depth the pressure resolution, and we will call our horizontal and vertical resolutions collectively the spatial resolution.

Our pressure resolution will be modeled as an integer, with a normalized and calibrated zero pressure indicated by 0 and the maximum amount of pressure we can register indicated by 255. In a real system, it may be advantageous to reserve the lower two bits to indicate error states, such as a sample overflowing the representable resolution, but in this model our sensor will always represent perfect data. Our spatial resolution will be modeled as integer coordinates from 0 to 7 on either axis, with the samples represented as an 8 by 8 matrix.
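A minimal sketch of this model in Python follows. The names (SIZE, MAX_PRESSURE, new_frame, set_pressure, get_pressure) are hypothetical, and storing the matrix as a list of rows indexed as frame[y][x] is an assumption made for illustration; the text does not prescribe a memory layout.

```python
SIZE = 8            # elements on each axis (coordinates 0..7)
MAX_PRESSURE = 255  # largest value representable in 8 bits


def new_frame():
    """Return an 8 by 8 frame with zero pressure everywhere."""
    return [[0] * SIZE for _ in range(SIZE)]


def _check_position(x, y):
    if not (0 <= x < SIZE and 0 <= y < SIZE):
        raise IndexError("coordinates must be in the range 0..7")


def set_pressure(frame, x, y, value):
    """Store a pressure value (0..255) at column x, row y."""
    _check_position(x, y)
    if not 0 <= value <= MAX_PRESSURE:
        raise ValueError("pressure must be in the range 0..255")
    frame[y][x] = value


def get_pressure(frame, x, y):
    """Read the pressure value at column x, row y."""
    _check_position(x, y)
    return frame[y][x]


# Usage: a single firm touch near the upper-left corner.
frame = new_frame()
set_pressure(frame, 1, 2, 200)
touched = get_pressure(frame, 1, 2)
untouched = get_pressure(frame, 0, 0)
```

Because the model sensor is assumed to be perfect, no bits are reserved for error states here; every stored value is a plain pressure reading.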