OpenGL Programming/3D/Matrices

Understanding Transformation Matrices

The concepts behind building a transformation matrix are easy enough to understand, but what does that matrix actually represent? This section describes a mechanism of interpreting a composite transformation matrix with multiple translations, rotations, and scalings in terms of transitions between reference frames. It provides a model for organizing primitive transformations into a hierarchical system of nested reference frames.

The Reference Frame

We normally think of objects as moving "forward and backwards", "up and down", or "left and right"; we do not normally consider their motion with respect to an arbitrary reference frame. They "turn around" (yaw), "look up" (pitch), and "cartwheel" (roll); they "move forward", "strafe left", and "jump up". So how do we compose transformations to create objects that behave more like they do in the real world?

Let us define an object's reference frame as its position and orientation. This can be uniquely defined by two orthogonal vectors specifying the object's "forward" and "up" directions (implying a third, "right" == up cross forward), and a vertex specifying the object's position.

When the parameters of the child reference frame are specified by vectors/vertices in the parent reference frame's coordinates, we can create a matrix that transforms a vector from the child's reference frame to the parent's reference frame by creating the matrix out of the four column vectors (right, up, forward, position), where right == up cross forward.
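As a concrete sketch, such a frame matrix might be assembled as follows. This assumes OpenGL's column-major array layout, and the helper name is illustrative, not a standard API:

```c
/* Hypothetical helper: pack the right/up/forward direction vectors and
 * the position vertex into a 4x4 column-major matrix (OpenGL's layout).
 * Directions get w = 0; the position vertex gets w = 1. */
void build_frame(float m[16],
                 const float right[3], const float up[3],
                 const float forward[3], const float position[3])
{
    for (int i = 0; i < 3; ++i) {
        m[0  + i] = right[i];    /* column 0: "right"    */
        m[4  + i] = up[i];       /* column 1: "up"       */
        m[8  + i] = forward[i];  /* column 2: "forward"  */
        m[12 + i] = position[i]; /* column 3: "position" */
    }
    m[3] = m[7] = m[11] = 0.0f;  /* directions are vectors (w = 0) */
    m[15] = 1.0f;                /* position is a vertex  (w = 1)  */
}
```

A matrix built this way can be handed directly to glMultMatrixf(), since it uses the same column-major layout.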

Let us make this more concrete.

The Reference Frame of the Universe, or The Ether Revisited

We will first look at the identity matrix:

 | 1  0  0  0 |
 | 0  1  0  0 |
 | 0  0  1  0 |
 | 0  0  0  1 |

By our definition, we interpret the first column ({1,0,0,0}) as the "right" direction, the second ({0,1,0,0}) as "up", the third ({0,0,1,0}) as "forward" (or "backward", if you prefer), and the fourth ({0,0,0,1}) as the "position" vertex. Note that the position vertex has a w coordinate of 1, whereas the direction vectors (which are really the differences between two vertices) have a w coordinate of 0.

This example appears trivial, but it provides us with a foundation. We can call this the "universe" reference frame. Right is +x, up is +y, forward (or backward, if you prefer) is +z, and the origin is located at {0,0,0,1}.

Now, we can look at the example of a child object within the universe. We can specify this object's reference frame via the matrix:

              | rightx  upx  forwardx  positionx |
 framechild = | righty  upy  forwardy  positiony |
              | rightz  upz  forwardz  positionz |
              | rightw  upw  forwardw  positionw |

where right, up, and forward are the frame's normalized direction vectors and position is the frame's position vertex, specified in the frame of the parent (or 'universe') object. This is easiest understood when the 'w' coordinate of all directions is 0 and the 'w' coordinate of the position vertex is 1, as per the "fourth-dimensional hack" convention.

To demonstrate: We have vertex v = {2, 0, 0, 1} in the coordinates of framechild. Multiplying framechild by v, we get:

 | rightx  upx  forwardx  positionx |   | 2 |   | 2*rightx + positionx |
 | righty  upy  forwardy  positiony | * | 0 | = | 2*righty + positiony |
 | rightz  upz  forwardz  positionz |   | 0 |   | 2*rightz + positionz |
 | rightw  upw  forwardw  positionw |   | 1 |   | 2*rightw + positionw |

Note that 2*rightw + positionw = 1 when rightw = 0 and positionw = 1. We see that the vertex v in framechild is equivalent to 2*right + position, which is precisely the description of v in the parent reference frame. Also note that in the case of a direction-without-position (w = 0), the rotations are applied but the position is not accumulated; the direction is reoriented without falsely being assigned a position.
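A minimal sketch of this child-to-parent transform, again assuming column-major layout (the function name is my own):

```c
/* Multiply a 4x4 column-major frame matrix by a 4-component vector,
 * converting the vector from child coordinates to parent coordinates.
 * A vertex (w = 1) picks up the position column; a direction (w = 0)
 * is only reoriented. */
void frame_transform(const float m[16], const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row)
        out[row] = m[row]     * v[0] + m[4 + row]  * v[1]
                 + m[8 + row] * v[2] + m[12 + row] * v[3];
}
```

For example, a frame whose position is {5, 0, 0, 1} (identity orientation) maps the vertex {2, 0, 0, 1} to {7, 0, 0, 1}, but maps the direction {2, 0, 0, 0} to itself.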

Each "frame" matrix is a transformation from the reference frame that the coordinate is specified in to the reference frame of the parent object, so that ultimately we have:

 frameparent*framechild*framenested_child*vertex_in_nested_child_coords ==
 frameparent*framechild*vertex_in_child_coords ==
 frameparent*vertex_in_parent_coords ==
 vertex_in_universe_coords (when frameparent is a child of the universe)

This matrix (frameparent*framechild*framenested_child), when post-multiplied to the universe's reference frame matrix, will transform a coordinate from the nested_child's coordinates to the universe's coordinates. These reference frames may be nested indefinitely.

An example reference frame hierarchy:

 universe -> galaxy -> solar system -> earth -> position on earth

The motions of each reference frame may be considered "independent" of each other. When the galaxy moves, the solar system moves with it, and with that the earth, and with that your position on earth.
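Composing nested frames is just matrix multiplication. A sketch (column-major, illustrative name):

```c
/* Compose two frame matrices (column-major): out = parent * child.
 * The result converts coordinates from the child's frame directly into
 * whatever frame the parent's columns are expressed in, so a chain of
 * these multiplies implements the frame hierarchy above. */
void frame_compose(const float parent[16], const float child[16], float out[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            out[4*col + row] = 0.0f;
            for (int k = 0; k < 4; ++k)
                out[4*col + row] += parent[4*k + row] * child[4*col + k];
        }
}
```

In fixed-function OpenGL, the same composition happens implicitly when you glPushMatrix() and glMultMatrixf() each frame's matrix while descending the hierarchy.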

Scaling Operations

While it is this author's opinion that scaling operations should generally be avoided, since they butcher vertex normals, sometimes they are needed anyway. Fortunately, this is easy to accomplish. Let us consider the scaling transformation:

               | αx   0  0   0  |
 scale_xform = | 0   αy  0   0  |
               | 0   0   αz  0  |
               | 0   0   0   αw |

where αw == 1 unless you are insane.

Typically, we will want to scale the vertices as the first operation performed; when we "scale" along x by a factor of 3, we typically do not want to scale the object's orientation and position (i.e., the columns of the transformation matrix), since they are specified in the coordinates of the parent reference frame. As such, scaling is the last operation appended to the matrix in the "reference frame" transformation model.

When we multiply framechild by scale_xform, we get:

               | αx*rightx  αy*upx  αz*forwardx  αw*positionx |
 final_xform = | αx*righty  αy*upy  αz*forwardy  αw*positiony |
               | αx*rightz  αy*upz  αz*forwardz  αw*positionz |
               | αx*rightw  αy*upw  αz*forwardw  αw*positionw |

The scale factors simply show up as the magnitudes of the orientation vectors.
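Because a diagonal scale on the right only multiplies each column by its own factor, the scale can be appended in place without a full matrix multiply. A sketch (illustrative name, column-major layout):

```c
/* Append a scale as the last (right-most) operation on a frame matrix,
 * in place: each of the four columns is multiplied by its own factor.
 * Equivalent to frame * diag(ax, ay, az, aw), but without a full
 * matrix-matrix multiply. */
void frame_scale(float m[16], float ax, float ay, float az, float aw)
{
    const float a[4] = { ax, ay, az, aw };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            m[4*col + row] *= a[col];
}
```

With aw == 1, the position column is left untouched and only the magnitudes of the orientation vectors change, as described above.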

The "Camera" Transformation

FIXME: I have only proven this, and have not demonstrated whether it works.

The transformation from the reference frame of the universe to that of the camera is subtly but critically different from those described above. Unlike the previous discussion, the 'camera' (parent reference frame) is typically specified in 'universe' (child reference frame) coordinates. Using the conventional logic of a camera moving freely through a universe requires extra consideration.

To convert between the universe and camera reference frames, we need to create a set of "fixed" coordinates which represent the 'universe' as a child of the 'camera'; we need the universe's reference frame in camera coordinates.

FIXME: needs illustrations, badly. Thinking out this problem:

  1. If the camera is facing the same direction as the universe: No transformation.
  2. If the camera is cocked to the left, then the universe needs to be cocked to the right.
  3. If the camera is pitched upwards, then the universe needs to be pitched downwards.
  4. If the camera is facing the opposite direction from the universe, the universe needs to face the other way.
  5. If the camera is upside-down with respect to the universe, the universe needs to be flipped over.

These are characteristics of reflections; the entire "camera" reference frame matrix is the composition of the column vectors of the reflections of the camera's 'universe' coordinates about the universe's axes. Defining

 c = camera reference frame in universe coordinates
 u = universe reference frame (identity)
 refl(a, b) = reflection of a over b = 2*dot(a,b)*b - a:

we can build the camera's reference frame matrix as:

 framecamera = { refl(cright, uright), refl(cup, uup), refl(cforward, uforward), refl(cposition, uposition) }

Interestingly, this includes the position vertex, which is "reflected" through the origin.
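The refl() helper from the definition above might be sketched as follows, treating b as a unit vector. Note that, per the author's own FIXME, whether the full camera construction built from these reflections is correct has not been demonstrated; only the reflection formula itself is implemented here:

```c
/* refl(a, b) = 2*dot(a, b)*b - a: the reflection of a over the (unit)
 * vector b, as defined in the text. Operates on 3-component vectors. */
void refl(const float a[3], const float b[3], float out[3])
{
    float d = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    for (int i = 0; i < 3; ++i)
        out[i] = 2.0f * d * b[i] - a[i];
}
```

For example, reflecting {1, 2, 3} over the x axis {1, 0, 0} yields {1, -2, -3}: the x component is preserved and the others are negated, matching the "cocked left becomes cocked right" intuition in the list above.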

This applies to all reference frames where the 'parent' reference frame is stored in the coordinates of the 'child' reference frame. The author cannot think of any other examples of this scenario.

The Projection Transformation

The projection transformation should be used only to fit the camera into the viewport. glOrtho(), glFrustum(), and gluPerspective() are typically used to accomplish this goal.

The 'camera' frame -- either as specified above, or via gluLookAt() -- will normally be the "bottom" frame on the modelview matrix stack. This matrix converts from universe to camera coordinates. The projection matrix precedes the modelview matrix in multiplication order, so that the projection matrix may be considered the transformation to screen coordinates from camera coordinates.

Using the 'camera' frame is indistinguishable from using gluLookAt() (FIXME: Check for congruence), though the parameterization of the two operations is somewhat different.

Summary

By viewing the transformation matrix as a reference frame composed of four column vectors { c0, c1, c2, c3 }, we can decompose the matrix into more usable properties:

  • The "right" axis: c0.direction()
  • The "up" axis: c1.direction()
  • The "forward" axis: c2.direction()
  • The "position": c3
  • The "x" scale: c0.magnitude()
  • The "y" scale: c1.magnitude()
  • The "z" scale: c2.magnitude()
  • The "w" scale: c3.magnitude()

We can use these properties to implement motion more intuitively:

  • Move forward distance β: xform += { 0, 0, 0, β*c2 }
  • Move up distance γ: xform += {0, 0, 0, γ*c1 }
  • Turn to the left 30 degrees: xform *= rotation_matrix(axis = c1, angle = π/6.0)

and so forth.
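The motion operations above can be sketched directly in terms of the matrix columns; for example, "move forward" (illustrative name, column-major layout):

```c
/* "Move forward distance beta": add beta times the forward axis
 * (column 2) to the position vertex (column 3), in place. This is the
 * xform += { 0, 0, 0, beta*c2 } operation from the list above. */
void move_forward(float m[16], float beta)
{
    for (int i = 0; i < 3; ++i)
        m[12 + i] += beta * m[8 + i];
}
```

"Move up" is identical with column 1 in place of column 2.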

We can transform from the "current" reference frame (ie, the one represented by the Modelview matrix in the context of OpenGL) to a "child" reference frame by simply multiplying the reference frame matrix (constructed as above) to the current matrix. A complete drawing operation may be performed by: