Concepts of Computer Graphics/Output Space/Representing The Digital Image

Now that we have covered the basics of how we represent images and colors, we can build a more detailed picture of how a digital image is represented. We noted before that a digital image is just an array of color values. Knowing how colors are represented as numbers in the RGB color space, we can now say exactly what that array contains.

Every pixel in a raster image using an RGB color system contains three values: one each for its red, green, and blue components.

As you might expect, the digital representation of an image is simply an array of these RGB triples. Specifically, an image can be thought of as a rectangular grid of color values. If we want an image that is 800 by 600 pixels, then we have an array with 800 RGB color values across the width of the image and 600 RGB color values down its height.
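The grid described above can be sketched in a few lines of Python. This is only an illustration: the names `make_image`, `WIDTH`, `HEIGHT`, and `BLACK` are hypothetical, and we assume each color component is an integer from 0 to 255. A tiny 4 by 3 image is used instead of 800 by 600 to keep the example small.

```python
# A tiny image stored as a list of rows; each entry is an (R, G, B) triple.
# Assumption: each component is an integer in the range 0..255.
WIDTH, HEIGHT = 4, 3
BLACK = (0, 0, 0)


def make_image(width, height, fill=BLACK):
    """Return a grid with `height` rows, each holding `width` RGB triples."""
    return [[fill for _ in range(width)] for _ in range(height)]


image = make_image(WIDTH, HEIGHT)

# Rows are indexed first, so the pixel at column x, row y is image[y][x].
image[1][2] = (255, 0, 0)  # set the pixel at x=2, y=1 to pure red
```

Note that the row index comes first when indexing the nested lists, so the pixel we would call (2,1) is reached as `image[1][2]`.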

It is convenient to reference any given pixel by its (x,y) coordinates. In this coordinate system, (0,0) is the upper-left-most pixel, and (width-1, height-1) is the lower-right-most pixel, where width and height are the width and height of the image in pixels. Using the above example, they would be 800 and 600 respectively, so the lower-right-most pixel is (799, 599). Thus the x values increase as you move right across the image, while the y values increase as you move down the image, which can be counterintuitive at first.

This is a fairly arbitrary organization (in fact, in some image formats, (0,0) is the lower-left-most pixel, as you might expect from the familiar Cartesian coordinate system), but it is the one we will assume throughout the book, and it is the most widely used organization in practice. There is a practical benefit to using this organization from the perspective of a computer designer, although the computer graphics programmer rarely sees that benefit directly. The benefit is this: a traditional cathode-ray-tube monitor displays an image by sweeping an electron beam left to right, top to bottom across all the pixels on the screen. Thus, as the image is being sent to the monitor, it makes the most sense to send the pixels in the order in which they are drawn.

It is also worth noting that it is easy to convert from one representation to another. For example, if you wish to think of the screen as having the familiar Cartesian coordinate system, you could simply write your code the way you like and then reverse the order of the rows before the image is displayed on the screen. Another alternative would be to replace each y coordinate you wish to access by (height-1-y). Thus, with our 800 by 600 image, when you try to access what you think of as the bottom row of the screen (y=0), you will instead access y=600-1-0=599, which is the bottom row of the screen as the computer monitor thinks of it. In the end, the pixels have to go in one direction or another, and this is a familiar and commonly used format.
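Both conversions can be sketched as follows. The function names `flip_y` and `flip_rows` are hypothetical, chosen only for this illustration. Because rows are numbered from 0 to height-1, the flipped index of row y is height - 1 - y.

```python
def flip_y(y, height):
    """Map a bottom-up row index to the top-down index of the same row.

    Rows are numbered 0..height-1 in both systems, so row 0 in one
    system is row height-1 in the other.
    """
    return height - 1 - y


def flip_rows(rows):
    """Return a copy of the image with its rows in reverse order."""
    return rows[::-1]


# In a 600-row image, the Cartesian bottom row (y=0) is stored as row 599.
bottom_row = flip_y(0, 600)
```

Applying either conversion twice returns the original row or image, so the same code converts in both directions.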

In the computer's memory, the image is laid out in this very direct fashion, as one contiguous block of memory. Each pixel is assigned enough space for the three numbers representing its color components, and the first pixel in the image (0,0) is the first pixel in the memory block. At the next highest memory address is the next pixel in the row (1,0), and so forth. When a row ends, the first pixel of the next row follows immediately. Eventually, the last pixel in the memory block is the one in the position (width-1, height-1).

As an example, let us say that we have an image in memory which is 800 by 600 pixels, the first pixel of the image is at memory address 100, and the size of each pixel (all three of its RGB components together) is r bytes. Then the r bytes at address 100 are the pixel at (0,0), the r bytes at address 100+r are the pixel at (1,0), and the r bytes at address 100+2*r are the pixel at (2,0). If we wish to access the pixel data for the pixel at (5,5), we need to access the r bytes at 100+5*width*r+5*r. That is, we need to add to the base address the number of bytes in the first 5 rows (5*width*r), plus the number of bytes from the start of the row to that pixel (5*r). In general, we can access the pixel (x,y) in memory by accessing baseAddress+y*width*r+x*r. This formula is not important to remember, but it is important to understand that the image data in memory is not two dimensional. It is one dimensional, and we give the data a two dimensional organization only by how we interpret it.
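To make the offset calculation concrete, here is a minimal sketch that stores an 800 by 600 image in a single one-dimensional `bytearray`. The names `pixel_offset`, `get_pixel`, and `set_pixel` are hypothetical, and we assume one byte per color component, so each pixel occupies 3 bytes. The offsets are measured from the start of the buffer, which plays the role of the base address.

```python
BYTES_PER_PIXEL = 3  # assumption: one byte each for red, green, and blue


def pixel_offset(x, y, width, bytes_per_pixel=BYTES_PER_PIXEL):
    """Byte offset of pixel (x, y) from the start of the image data."""
    return y * width * bytes_per_pixel + x * bytes_per_pixel


def get_pixel(buf, x, y, width):
    """Read the RGB triple stored for pixel (x, y)."""
    start = pixel_offset(x, y, width)
    return tuple(buf[start:start + BYTES_PER_PIXEL])


def set_pixel(buf, x, y, width, rgb):
    """Write an RGB triple (three integers in 0..255) to pixel (x, y)."""
    start = pixel_offset(x, y, width)
    buf[start:start + BYTES_PER_PIXEL] = bytes(rgb)


width, height = 800, 600
buf = bytearray(width * height * BYTES_PER_PIXEL)  # one contiguous block

set_pixel(buf, 5, 5, width, (255, 128, 0))
offset = pixel_offset(5, 5, width)  # 5*800*3 + 5*3
```

The buffer itself has no notion of rows or columns. The two dimensional structure exists only in `pixel_offset`, which is the same calculation as the formula in the text with the base address left out.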