Engineering Analysis/Print version



Vector Spaces

Vectors and Scalars

A scalar is a single number value, such as 3, 5, or 10. A vector is an ordered set of scalars.

A vector is typically described as a matrix with a single row or a single column. A matrix with a single row is a row vector, and a matrix with a single column is a column vector.


[Column Vector]

\begin{bmatrix}a \\ b\\ c\\ \vdots\end{bmatrix}


[Row Vector]

\begin{bmatrix}a & b & c &\cdots\end{bmatrix}

A "common vector" is another name for a column vector, and this book will simply use the word "vector" to refer to a common vector.

Vector Spaces

A vector space is a set of vectors and two operations (typically vector addition and scalar multiplication) that follow a number of specific rules. We will typically denote vector spaces with a capital-italic letter: V, for instance. A space V is a vector space if all the following requirements are met. We will use x and y as arbitrary vectors in V, and c and d as arbitrary scalar values. There are 10 requirements in all:

Given: x, y \in V

  1. There is an operation called "Addition" (signified with a "+" sign) between two vectors, x + y, such that if both the operands are in V, then the result is also in V.
  2. The addition operation is commutative for all elements in V.
  3. The addition operation is associative for all elements in V.
  4. There is a unique neutral element, φ, in V, such that x + φ = x. This is also called a zero element.
  5. For every x in V, there is a negative element -x in V such that -x + x = φ.
  6. cx \in V
  7. c(x + y) = cx + cy
  8. (c + d)x = cx + dx
  9. c(dx) = (cd)x
  10. 1 × x = x

Some of these rules may seem obvious, but that's only because they have been generally accepted, and have been taught to people since they were children.

Scalar Product

A scalar product is a special type of operation that acts on two vectors, and returns a scalar result. Scalar products are denoted as an ordered pair between angle-brackets: <x,y>. A scalar product between vectors must satisfy the following four rules:

  1. \langle x, x\rangle \ge 0, \quad \forall x \in V
  2. \langle x, x\rangle = 0 if and only if x = 0
  3. \langle x, y\rangle = \langle y, x\rangle
  4. \langle x, cy_1 + dy_2\rangle = c\langle x, y_1\rangle + d\langle x, y_2\rangle

If an operation satisfies all these requirements, then it is a scalar product.

Examples

One of the most common scalar products is the dot product, which is discussed at length in Linear Algebra.

Norm

The norm is an important scalar quantity that indicates the magnitude of the vector. Norms of a vector are typically denoted as \|x\|. To be a norm, an operation must satisfy the following four conditions:

  1. \|x\| \ge 0
  2. \|x\| = 0 if and only if x = 0.
  3. \|cx\| = |c|\|x\|
  4. \|x + y\| \le \|x\| + \|y\|

A vector is called normal if its norm is 1. A normal vector is sometimes also referred to as a unit vector. Both terms will be used in this book. To make a vector normal, but keep it pointing in the same direction, we can divide the vector by its norm:

\bar{x} = \frac{x}{\|x\|}

Examples

One of the most common norms is the Cartesian (Euclidean) norm, which is defined as the square root of the sum of the squares of the components:

\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
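As a minimal sketch of the Cartesian norm and of normalization, assuming vectors are stored as plain Python lists (the function names `euclidean_norm` and `normalize` are illustrative and not part of this book):

```python
import math

def euclidean_norm(x):
    """Square root of the sum of the squares of the components of x."""
    return math.sqrt(sum(xi * xi for xi in x))

def normalize(x):
    """Return x divided by its norm; x must not be the zero vector."""
    n = euclidean_norm(x)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [xi / n for xi in x]

x = [3.0, 4.0]
x_bar = normalize(x)
print(euclidean_norm(x))   # 5.0
print(x_bar)               # [0.6, 0.8]
```

The normalized vector points in the same direction as x, but its norm is 1.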

Unit Vector

A vector is said to be a unit vector if the norm of that vector is 1.

Orthogonality

Two vectors x and y are said to be orthogonal if the scalar product of the two is equal to zero:

\langle x,y\rangle = 0

Two vectors are said to be orthonormal if their scalar product is zero, and both vectors are unit vectors.

Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality is an important result that relates the norms of two vectors to their scalar product:

|\langle x,y\rangle | \leq \|x\|\|y\|
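The inequality can be checked numerically for a specific pair of vectors. The sketch below assumes the dot product as the scalar product and the norm derived from it; the vectors are arbitrary illustrative values.

```python
import math

def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the dot product."""
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]

lhs = abs(dot(x, y))       # |<x, y>| = |4 - 10 + 18| = 12
rhs = norm(x) * norm(y)    # sqrt(14) * sqrt(77), roughly 32.8
print(lhs <= rhs)          # True
```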

Metric (Distance)

The distance between two vectors in the vector space V, called the metric of the two vectors, is denoted by d(x, y). A metric operation must satisfy the following four conditions:

  1. d(x,y) \ge 0
  2. d(x,y) = 0 if and only if x = y
  3. d(x,y) = d(y, x)
  4. d(x,y) \le d(x,z) + d(z, y)

Examples

A common form of metric is the distance between points a and b in the cartesian plane:

d(a, b)_{{cartesian}} = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}

Linear Independence

A set of vectors V = {v_1, v_2, \cdots, v_n} is said to be linearly dependent if at least one vector in the set can be constructed from a linear combination of the other vectors in the set. Given the following linear equation:

a_1v_1 + a_2v_2 + \cdots + a_nv_n = 0

The set of vectors V is linearly independent if and only if the only solution to this equation is the one where all the a coefficients are zero. If we combine the v vectors together as the columns of a single matrix:

\hat{V} = [v_1 v_2 \cdots v_n]

And we combine all the a coefficients into a single column vector:

\hat{a} = [a_1 a_2 \cdots a_n]^T

We have the following linear equation:

\hat{V}\hat{a} = 0

For this equation to be satisfied only by \hat{a} = 0, the matrix \hat{V} must be invertible:

\hat{V}^{-1}\hat{V}\hat{a} = \hat{V}^{-1}0
\hat{a} = 0

Remember that for the matrix to be invertible, the determinant must be non-zero.
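The determinant test can be sketched for three vectors of length three. The helper names `det3` and `columns_to_matrix` are illustrative assumptions, and the cofactor expansion used here only covers the 3 × 3 case.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def columns_to_matrix(vectors):
    """Place the given vectors side by side as the columns of a matrix."""
    return [list(row) for row in zip(*vectors)]

independent = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
dependent = [[1, 2, 3], [4, 5, 6], [5, 7, 9]]   # third vector = first + second

print(det3(columns_to_matrix(independent)) != 0)   # True: linearly independent
print(det3(columns_to_matrix(dependent)) != 0)     # False: linearly dependent
```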

Non-Square Matrix V

If the matrix \hat{V} is not square, then the determinant cannot be taken, and therefore the matrix is not invertible. To solve this problem, we can premultiply by the transpose matrix:

\hat{V}^T\hat{V}\hat{a} = 0

And then the square matrix \hat{V}^T\hat{V} must be invertible:

(\hat{V}^T\hat{V})^{-1}\hat{V}^T\hat{V}\hat{a} = 0
\hat{a} = 0

Rank

The rank of a matrix is the largest number of linearly independent rows or columns in the matrix.

To determine the rank, the matrix is typically reduced to row-echelon form. The number of non-zero rows in the reduced form is the rank of the matrix.

If we multiply two matrices A and B, and the result is C:

AB = C

Then the rank of C is at most the smaller of the ranks of A and B:

\operatorname{Rank}(C) \le \operatorname{min}[\operatorname{Rank}(A), \operatorname{Rank}(B)]
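A short sketch of computing the rank by row reduction, and of comparing the rank of a product with the ranks of its factors. The functions `rank` and `matmul` are illustrative helpers; exact fractions are used so that no rounding tolerance is needed.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below r with a non-zero entry in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 0], [0, 0]]   # rank 1
B = [[0, 0], [0, 1]]   # rank 1
C = matmul(A, B)       # the zero matrix

print(rank(A), rank(B), rank(C))   # 1 1 0
```

In this example the rank of the product is strictly smaller than the rank of either factor.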

Span

The span of a set of vectors V is the set of all vectors that can be created by a linear combination of the vectors in V.

Basis

A basis is a set of linearly-independent vectors that span the entire vector space.

Basis Expansion

If we have a vector y \in V, and V has basis vectors {v_1 v_2 \cdots v_n}, by definition, we can write y in terms of a linear combination of the basis vectors:

a_1v_1 + a_2v_2 + \cdots + a_nv_n = y

or

\hat{V}\hat{a} = y

If \hat{V} is invertible, the answer is \hat{a} = \hat{V}^{-1}y. If \hat{V} is not invertible, we can instead use the following technique:

\hat{V}^T\hat{V}\hat{a} = \hat{V}^Ty
\hat{a} = (\hat{V}^T\hat{V})^{-1}\hat{V}^Ty

And we call the quantity (\hat{V}^T\hat{V})^{-1}\hat{V}^T the left-pseudoinverse of \hat{V}.

Change of Basis

Frequently, it is useful to change the basis vectors to a different set of vectors that span the set, but have different properties. If we have a space V, with basis vectors \hat{V} and a vector in V called x, we can use the new basis vectors \hat{W} to represent x:

x = \sum_{i = 1}^n a_iv_i = \sum_{j = 1}^n b_jw_j

or,

x = \hat{V}\hat{a} = \hat{W}\hat{b}

If \hat{W} is invertible, then the solution to this problem is simple: \hat{b} = \hat{W}^{-1}\hat{V}\hat{a}.

Gram-Schmidt Orthogonalization

If we have a set of basis vectors that are not orthogonal, we can use a process known as orthogonalization to produce a new set of basis vectors for the same space that are orthogonal:

Given: \hat{V} = {v_1 v_2 \cdots v_n}
Find the new basis \hat{W} = {w_1 w_2 \cdots w_n}
Such that \langle w_i, w_j\rangle = 0\quad\forall i \ne j

We can define the vectors as follows:

  1. w_1 = v_1
  2. w_m = v_m - \sum_{i = 1}^{m-1}\frac{\langle v_m, w_i\rangle }{\langle w_i, w_i\rangle }w_i

Notice that the vectors produced by this technique are orthogonal to each other, but they are not necessarily orthonormal. To make the w vectors orthonormal, you must divide each one by its norm:

\bar{w} = \frac{w}{\|w\|}
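The procedure can be sketched in a few lines, assuming the dot product as the scalar product and a linearly independent input set (otherwise a division by zero occurs). The function names are illustrative.

```python
import math

def dot(x, y):
    """Dot product, used here as the scalar product."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Return orthogonal vectors spanning the same space as the input vectors."""
    ortho = []
    for v in vectors:
        w = list(v)
        # Subtract the component of v along each previously computed vector.
        for u in ortho:
            coeff = dot(v, u) / dot(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

def normalize(w):
    """Divide a vector by its norm."""
    n = math.sqrt(dot(w, w))
    return [wi / n for wi in w]

V = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
W = gram_schmidt(V)                    # orthogonal, not yet unit length
W_bar = [normalize(w) for w in W]      # orthonormal

print(W)
print(W_bar)
```

Each scalar product between two different vectors of W is zero up to floating-point rounding, and every vector in W_bar has norm 1.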

Reciprocal Basis

A Reciprocal basis is a special type of basis that is related to the original basis. The reciprocal basis \hat{W} can be defined as:

\hat{W} = [\hat{V}^T]^{-1}


Linear Transformations

Linear Transformations

A linear transformation is a matrix M that operates on a vector in space V, and results in a vector in a different space W. We can define a transformation as such:

M:V\to W

In the above equation, we say that V is the domain space of the transformation, and W is the range space of the transformation. Also, we can use a "function notation" for the transformation, and write it as:

M(x) = Mx = y

Where x is a vector in V, and y is a vector in W. To be a linear transformation, the principle of superposition must hold for the transformation:

M(av_1 + bv_2) = aM(v_1) + bM(v_2)

Where a and b are arbitrary scalars.

Null Space

The nullspace of a matrix M is the set of all vectors x for which the following relationship holds:

Mx = 0

Where M is a linear transformation matrix. Depending on the size and rank of M, the nullspace may contain only the zero vector, or it may contain infinitely many vectors. Here are a few rules to remember:

  1. If the matrix M is invertible, then the nullspace contains only the zero vector.
  2. The number of linearly independent vectors in the nullspace (N) is the difference between the number of columns (C) of the matrix and the rank (R) of the matrix:
N = C - R

If the matrix is in reduced row-echelon form, the number of linearly independent vectors in the nullspace is given by the number of columns without a leading 1 on the diagonal. For every column where there is not a leading one on the diagonal, a nullspace vector can be obtained by placing a negative one in the diagonal position of that column vector.

We denote the nullspace of a matrix A as:

\mathcal{N}\{A\}
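As a small illustration, the sketch below takes a hypothetical 2 × 3 matrix that is already in reduced row-echelon form, builds a nullspace vector with the "negative one" rule, and checks that the matrix maps it to zero. The matrix and the helper `matvec` are illustrative assumptions.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

# Illustrative matrix, already in reduced row-echelon form.
M = [[1, 0, 1],
     [0, 1, 1]]

# Columns 1 and 2 hold leading ones; column 3 does not. Taking column 3,
# [1, 1], and placing -1 in its third (diagonal) position gives x.
# The matrix has 3 columns and rank 2, so one vector spans the nullspace.
x = [1, 1, -1]

print(matvec(M, x))                      # [0, 0]
print(matvec(M, [5 * xi for xi in x]))   # [0, 0]: multiples are in the nullspace too
```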

Linear Equations

If we have a set of linear equations in terms of variables x, scalar coefficients a, and a scalar result b, we can write the system in matrix notation as such:

Ax = b

Where x is a m × 1 vector, b is an n × 1 vector, and A is an n × m matrix. Therefore, this is a system of n equations with m unknown variables. There are 3 possibilities:

  1. If Rank(A) is not equal to Rank([A b]), there is no solution
  2. If Rank(A) = Rank([A b]) = m, there is exactly one solution
  3. If Rank(A) = Rank([A b]) < m, there are infinitely many solutions.

Complete Solution

The complete solution of a linear equation is given by the sum of the homogeneous solution, and the particular solution. The homogeneous solution is the nullspace of the transformation, and the particular solution is the values for x that satisfy the equation:

A(x) = b
A(x_h + x_p) = b

Where

x_h is the homogeneous solution, and is the nullspace of A that satisfies the equation A(x_h) = 0
x_p is the particular solution that satisfies the equation A(x_p) = b

Minimum Norm Solution

If Rank(A) = Rank([A b]) < m, then there are infinitely many solutions to the linear equation. In this situation, we often look for the solution with the smallest norm, called the minimum norm solution. To find the minimum norm solution, we must minimize the norm of x subject to the constraint of:

Ax - b = 0

There are a number of methods to minimize a value according to a given constraint, and we will talk about them later.

Least-Squares Curve Fit

If Rank(A) does not equal Rank([A b]), then the linear equation has no solution. However, we can find the value of x that comes closest to satisfying it. This "best fit" solution is known as the Least-Squares curve fit.

We define an error quantity E, such that:

E = Ax - b \ne 0

Our job then is to find the minimum value for the norm of E:

\|E\|^2 = \|Ax - b\|^2 = \langle Ax - b, Ax - b\rangle

We do this by differentiating with respect to x, and setting the result to zero:

\frac{\partial \|E\|^2}{\partial x} = 2A^T(Ax - b) = 0

Solving, we get our result:

x = (A^TA)^{-1}A^Tb
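As a worked sketch, consider fitting a straight line y = x_1 + x_2 t to data points, so that each row of A is [1, t_i] and b holds the measured y values. For two unknowns, A^TA is a 2 × 2 matrix that can be inverted by hand. The function name and the data below are illustrative assumptions.

```python
def least_squares_line(ts, ys):
    """Fit y = x1 + x2 * t by solving the normal equations (A^T A) x = A^T y."""
    n = len(ts)
    # A has rows [1, t_i], so A^T A and A^T y reduce to a few sums.
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))
    det = n * s_tt - s_t * s_t   # determinant of A^T A
    if det == 0:
        raise ValueError("A^T A is singular; the t values must not all be equal")
    x1 = (s_tt * s_y - s_t * s_ty) / det
    x2 = (n * s_ty - s_t * s_y) / det
    return x1, x2

# Four points that do not lie on one line, so Ax = b has no exact solution.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 2.0, 4.0]
intercept, slope = least_squares_line(ts, ys)
print(intercept, slope)
```

For this data both the intercept and the slope come out at approximately 0.9.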


Minimization

Kuhn-Tucker Theorem

The Kuhn-Tucker Theorem is a method for minimizing a function f(x) under the constraint g(x) = 0. We can define the theorem as follows:

L(x) = f(x) + \langle \Lambda, g(x)\rangle

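Where Λ is the Lagrangian vector, and < , > denotes the scalar product operation. Scalar products are discussed in the Vector Spaces chapter. For the minimum norm problem, we take f(x) = \frac{1}{2}\|x\|^2 and g(x) = Ax - b. If we differentiate L with respect to x first, and then with respect to Λ, and set both results to zero, we get the following two equations:

\frac{\partial L(x)}{\partial x} = x + A^T\Lambda = 0
\frac{\partial L(x)}{\partial \Lambda} = Ax - b = 0

Solving the first equation for x and substituting into the second, we have the final result: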

x = A^T[AA^T]^{-1}b
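For a single equation in several unknowns, A is one row and AA^T is a scalar, so the formula can be evaluated directly. The sketch below covers only that special case; the function name and numbers are illustrative.

```python
def minimum_norm_single_equation(a, b):
    """Minimum norm solution of the single equation a . x = b.

    With A equal to the 1 x n row a, A A^T is the scalar a . a,
    so x = A^T (A A^T)^{-1} b reduces to a * b / (a . a).
    """
    aa = sum(ai * ai for ai in a)
    if aa == 0:
        raise ValueError("a must be non-zero")
    return [ai * b / aa for ai in a]

a = [1.0, 2.0, 2.0]   # the equation x1 + 2*x2 + 2*x3 = 9
b = 9.0
x = minimum_norm_single_equation(a, b)
print(x)   # [1.0, 2.0, 2.0]
```

Other solutions exist, such as [9, 0, 0], but that one has norm 9, while the solution above has norm 3.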


Projections

Projection

The projection of a vector v \in V onto the vector space W \subseteq V is the vector in W that is closest to v. In other words, we need to minimize the distance between vector v, and an arbitrary vector w \in W:

\|w - v\|^2 = \|\hat{W}\hat{a} - v\|^2
\frac{\partial \|\hat{W} \hat{a} - v\|^2}{\partial \hat{a}} = \frac{\partial \langle \hat{W}\hat{a} - v, \hat{W}\hat{a} - v\rangle }{\partial \hat{a}} = 0


[Projection onto space W]

\hat{a} = (\hat{W}^T\hat{W})^{-1}\hat{W}^Tv

For every vector v \in V there exists a vector w \in W called the projection of v onto W such that \langle v - w, p\rangle = 0, where p is an arbitrary element of W.
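When W is spanned by a single vector, the projection formula reduces to a ratio of two scalar products. The following sketch covers only that one-dimensional case and uses the dot product as the scalar product; the names are illustrative.

```python
def dot(x, y):
    """Dot product, used here as the scalar product."""
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(v, w):
    """Project v onto the one-dimensional space spanned by w.

    With W equal to the single column w, (W^T W)^{-1} W^T v is the
    scalar <w, v> / <w, w>.
    """
    a = dot(w, v) / dot(w, w)
    return [a * wi for wi in w]

v = [3.0, 4.0]
w = [1.0, 0.0]
p = project_onto_line(v, w)
residual = [vi - pi for vi, pi in zip(v, p)]

print(p)                  # [3.0, 0.0]
print(dot(residual, w))   # 0.0: the residual is orthogonal to W
```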

Orthogonal Complement

W^\perp = \{x \in V: \langle x, y \rangle = 0, \forall y \in W\}

Distance between v and W

The distance between v \in V and the space W is given as the minimum distance between v and an arbitrary w \in W:

\frac{\partial d(v, w)}{\partial \hat{a}} = \frac{\partial\|v - \hat{W}\hat{a}\|}{\partial \hat{a}} = 0

Intersections

Given two vector spaces V and W, what is the overlapping area between the two? We define an arbitrary vector z that is a component of both V, and W:

z = \hat{V} \hat{a} = \hat{W} \hat{b}
\hat{V} \hat{a} - \hat{W} \hat{b} = 0
\begin{bmatrix}\hat{a} \\ \hat{b}\end{bmatrix}= \mathcal{N}([\hat{V} \;\; -\hat{W}])

Where N is the nullspace.


Linear Spaces



Matrices

Norms

Induced Norms

n-Norm

Frobenius Norm

Spectral Norm

Derivatives

Consider the following set of linear equations:

a = bx_1 + cx_2
d = ex_1 + fx_2

We can define the matrix A to represent the coefficients, the vector B as the results, and the vector x as the variables:

A = \begin{bmatrix}b &  c \\ e & f\end{bmatrix}
B = \begin{bmatrix}a \\ d\end{bmatrix}
x = \begin{bmatrix}x_1 \\ x_2\end{bmatrix}

And rewriting the equation in terms of the matrices, we get:

B = Ax

Now, let's say we want the derivative of this equation with respect to the vector x:

\frac{d}{dx}B = \frac{d}{dx}Ax

We know that the first term is constant, so the derivative of the left-hand side of the equation is zero. Analyzing the right side shows us:

Pseudo-Inverses

There are special matrices known as pseudo-inverses, which satisfy some of the properties of an inverse, but not others. To recap, if we have two square matrices A and B, that are both n × n, then if the following equation is true, we say that A is the inverse of B, and B is the inverse of A:

AB = BA = I

Right Pseudo-Inverse

Consider the following matrix:

R = A^T[AA^T]^{-1}

We call this matrix R the right pseudo-inverse of A, because:

AR = I

but

RA \ne I

We will denote the right pseudo-inverse of A as A^\dagger

Left Pseudo-Inverse

Consider the following matrix:

L = [A^TA]^{-1}A^T

We call L the left pseudo-inverse of A because

LA = I

but

AL \ne I

We will denote the left pseudo-inverse of A as A^\ddagger

Matrices that follow certain predefined formats are useful in a number of computations. We will discuss some of the common matrix formats here. Later chapters will show how these formats are used in calculations and analysis.

Diagonal Matrix

A diagonal matrix is a matrix such that:

a_{ij} = 0, i \ne j

In other words, all the elements off the main diagonal are zero, and the diagonal elements may be (but don't need to be) non-zero.

Companion Form Matrix

If we have the following characteristic polynomial for a matrix:

|A - \lambda I| = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda^1 + a_0

We can create a companion form matrix in one of two ways:

\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & -a_0 \\
                       1 & 0 & 0 & \cdots & 0 & -a_1 \\
                       0 & 1 & 0 & \cdots & 0 & -a_2 \\
                       0 & 0 & 1 & \cdots & 0 & -a_3 \\
                       \vdots & \vdots & \vdots &\ddots & \vdots & \vdots \\
                       0 & 0 & 0 & \cdots & 1 & -a_{n-1} 
       \end{bmatrix}

Or, we can also write it as:

\begin{bmatrix} -a_{n-1} & -a_{n-2} & -a_{n-3} & \cdots & -a_1 & -a_0 \\
                       1 & 0 & 0 & \cdots & 0 & 0 \\
                       0 & 1 & 0 & \cdots & 0 & 0 \\
                       0 & 0 & 1 & \cdots & 0 & 0 \\
                       \vdots & \vdots & \vdots &\ddots & \vdots & \vdots \\
                       0 & 0 & 0 & \cdots & 1 & 0 
       \end{bmatrix}

Jordan Canonical Form

To discuss the Jordan canonical form, we first need to introduce the idea of the Jordan Block:

Jordan Blocks

A Jordan block is a square matrix such that all the diagonal elements are equal, all the super-diagonal elements (the elements directly above the diagonal elements) are 1, and all other elements are zero. To illustrate this, here is an example of an n-dimensional Jordan block:

\begin{bmatrix} a & 1 & 0 & \cdots & 0 \\
                       0 & a & 1 & \cdots & 0 \\
                       0 & 0 & a & \cdots & 0 \\
                       \vdots & \vdots & \vdots &\ddots & \vdots \\
                       0 & 0 & 0 & \cdots & 1 \\
                       0 & 0 & 0 & \cdots & a 
       \end{bmatrix}

Canonical Form

A square matrix is in Jordan Canonical form, if it is a diagonal matrix, or if it has one of the following two block-diagonal forms:

\begin{bmatrix}D & 0 & \cdots & 0 \\
                      0 & J_1 & \cdots & 0 \\
                      \vdots & \vdots &\ddots & \vdots \\
                      0 & 0 & \cdots & J_n
       \end{bmatrix}

Or:

\begin{bmatrix}J_1 & 0 & \cdots & 0 \\
                      0 & J_2 & \cdots & 0 \\
                      \vdots & \vdots &\ddots & \vdots \\
                      0 & 0 & \cdots & J_n
       \end{bmatrix}

Where the D element is a diagonal block matrix, and the J blocks are in Jordan block form.

If we have an n × 1 vector x, and an n × n symmetric matrix M, we can write:

x^TMx = a

Where a is a scalar value. Equations of this form are called quadratic forms.

Matrix Definiteness

Based on the quadratic forms of a matrix, we can create a certain number of categories for special types of matrices:

  1. if x^TMx > 0 for all non-zero x, then the matrix is positive definite.
  2. if x^TMx \ge 0 for all x, then the matrix is positive semi-definite.
  3. if x^TMx < 0 for all non-zero x, then the matrix is negative definite.
  4. if x^TMx \le 0 for all x, then the matrix is negative semi-definite.

These classifications are used commonly in control engineering.


Eigenvalues and Eigenvectors

The Eigen Problem

This page is going to talk about the concept of Eigenvectors and Eigenvalues, which are important tools in linear algebra, and which play an important role in State-Space control systems. The "Eigen Problem" stated simply, is that given a square matrix A which is n × n, there exists a set of n scalar values λ and n corresponding non-trivial vectors v such that:

Av = \lambda v

We call λ the eigenvalues of A, and we call v the corresponding eigenvectors of A. We can rearrange this equation as:

(A - \lambda I)v = 0

For this equation to be satisfied so that v is non-trivial, the matrix (A - λI) must be singular. That is:

|A - \lambda I| = 0

Characteristic Equation

The characteristic equation of a square matrix A is given by:


[Characteristic Equation]

|A - \lambda I| = 0

Where I is the identity matrix, and λ is the set of eigenvalues of matrix A. From this equation we can solve for the eigenvalues of A, and then using the equations discussed above, we can calculate the corresponding eigenvectors.

In general, we can expand the characteristic equation as:


[Characteristic Polynomial]

|A - \lambda I| = (-1)^n(\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda^1 + c_0)

This equation satisfies the following properties:

  1. |A| = (-1)^n c_0
  2. A is nonsingular if and only if c_0 is non-zero.

Example: 2 × 2 Matrix

Let's say that X is a square matrix of order 2, as such:

X = \begin{bmatrix}a & b \\c & d\end{bmatrix}

Then we can use this value in our characteristic equation:

\begin{vmatrix}a - \lambda & b \\ c & d- \lambda\end{vmatrix} = 0
(a - \lambda)(d - \lambda) - (b)(c) = 0

The roots of the above equation (the values for λ that satisfy the equality) are the eigenvalues of X.
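Expanding the characteristic equation gives the quadratic λ^2 - (a + d)λ + (ad - bc) = 0, which can be solved with the quadratic formula. The sketch below does this for illustrative matrices; `cmath` is used so that complex roots are handled as well.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of (a - L)(d - L) - b*c = 0, i.e. L^2 - (a + d)*L + (a*d - b*c) = 0."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

print(eigenvalues_2x2(2, 0, 0, 3))    # real eigenvalues 3 and 2
print(eigenvalues_2x2(0, -1, 1, 0))   # complex conjugate pair +1j and -1j
```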

Eigenvalues

The solutions, λ, of the characteristic equation for matrix X are known as the eigenvalues of the matrix X.

Eigenvalues satisfy the following properties:

  1. If λ is an eigenvalue of A, then λ^n is an eigenvalue of A^n.
  2. If λ is a complex eigenvalue of a real matrix A, then λ* (the complex conjugate) is also an eigenvalue of A.
  3. If any of the eigenvalues of A are zero, then A is singular. If A is non-singular, all the eigenvalues of A are nonzero.

Eigenvectors

The eigenvalue problem for a matrix X can be written as:

Xv = \lambda v

Where X is the matrix under consideration, and λ are the eigenvalues for matrix X. For every eigenvalue, there is a non-zero solution vector v to the above equation, known as an eigenvector. The above equation can also be rewritten as:

(X - \lambda I)v = 0

Where each resulting non-zero v for an eigenvalue λ is an eigenvector of X. An eigenvector is only determined up to a non-zero scalar multiple. From this equation, we can see that the eigenvectors of X for the eigenvalue λ are the non-zero vectors of the nullspace:

v \in \mathcal{N}\{X - \lambda I\}

And therefore, we can find the eigenvectors through row-reduction of that matrix.

Eigenvectors satisfy the following properties:

  1. If v is a complex eigenvector of a real matrix A, then v* (the complex conjugate) is also an eigenvector of A.
  2. Eigenvectors of A that correspond to distinct eigenvalues are linearly independent.
  3. If A is n × n, and if there are n linearly independent eigenvectors, then the eigenvectors of A form a complete basis set for \mathcal{R}^n

Generalized Eigenvectors

Let's say that matrix A has the following characteristic polynomial:

|A - \lambda I| = (-1)^n(\lambda - \lambda_1)^{d_1}(\lambda - \lambda_2)^{d_2} \cdots (\lambda - \lambda_s)^{d_s}

Where d_1, d_2, ... , d_s are known as the algebraic multiplicities of the eigenvalues λ_i. Also note that d_1 + d_2 + ... + d_s = n, and s < n. In other words, some eigenvalues of A are repeated. Therefore, this matrix may not have n linearly independent eigenvectors. However, we can create vectors known as generalized eigenvectors to make up the missing eigenvectors by satisfying the following equations:

(A-\lambda I)^k v_k = 0
(A-\lambda I)^{k-1} v_k \ne 0

Right and Left Eigenvectors

The equation for determining eigenvectors is:

(A - \lambda I) v = 0

And because the eigenvector v is on the right, these are more appropriately called "right eigenvectors". However, if we rewrite the equation as follows:

u^T(A - \lambda I) = 0

The vectors u are called the "left eigenvectors" of matrix A.

Similarity

Matrices A and B are said to be similar to one another if there exists an invertible matrix T such that:

T^{-1}AT = B

If there exists such a matrix T, the matrices are similar. Similar matrices have the same eigenvalues. If A has eigenvectors v1, v2 ..., then B has eigenvectors u given by:

u_i = T^{-1}v_i

Matrix Diagonalization

Some matrices are similar to diagonal matrices using a transition matrix, T. We will say that matrix A is diagonalizable if the following equation can be satisfied:

T^{-1}AT = D

Where D is a diagonal matrix. An n × n square matrix is diagonalizable if and only if it has n linearly independent eigenvectors.

Transition Matrix

If an n × n square matrix A has n distinct eigenvalues λ, and therefore n linearly independent eigenvectors v, we can create a transition matrix T as:

T = [v_1 v_2 ... v_n]

And transforming matrix A gives us:

T^{-1}AT = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0 \\
                                 0 & \lambda_2 & \cdots & 0 \\
                                 \vdots & \vdots & \ddots & \vdots \\
                                 0 & 0 & \cdots & \lambda_n\end{bmatrix}

Therefore, if the matrix has n distinct eigenvalues, the matrix is diagonalizable, and the diagonal entries of the diagonal matrix are the corresponding eigenvalues of the matrix.
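A quick numerical check of this result for an illustrative 2 × 2 matrix, with its eigenvectors placed as the columns of T. The helper functions and the example matrix are assumptions chosen for the sketch.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix using the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4.0, 1.0],
     [2.0, 3.0]]
# A has eigenvalues 5 and 2 with eigenvectors [1, 1] and [1, -2],
# which are placed as the columns of the transition matrix T.
T = [[1.0, 1.0],
     [1.0, -2.0]]

D = matmul(inverse_2x2(T), matmul(A, T))
print(D)   # approximately [[5, 0], [0, 2]]
```

The diagonal entries appear in the same order as the eigenvectors were placed in T.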

Complex Eigenvalues

Consider the situation where a matrix A has 1 or more complex conjugate eigenvalue pairs. The eigenvectors of A will also be complex. The resulting diagonal matrix D will have the complex eigenvalues as the diagonal entries. In engineering situations, it is often not a good idea to deal with complex matrices, so other matrix transformations can be used to create matrices that are "nearly diagonal".

Generalized Eigenvectors

If the matrix A does not have a complete set of eigenvectors, that is, it has d eigenvectors and n - d generalized eigenvectors, then the matrix A is not diagonalizable. However, the next best thing is achieved, and matrix A can be transformed into a Jordan canonical matrix. Each set of generalized eigenvectors that is formed from a single eigenvector will create a Jordan block. All the distinct eigenvectors that do not spawn any generalized eigenvectors will form a diagonal block in the Jordan matrix.

If λ_i are the n distinct eigenvalues of matrix A, v_i are the corresponding n eigenvectors, and w_i are the corresponding n left-eigenvectors, scaled so that w_i^T v_i = 1, then the matrix A can be represented as a sum:

A = \sum_{i = 1}^n \lambda_i v_i w_i^T

This is known as the spectral decomposition of A.

Consider a scenario where the matrix representation of a system A differs from the actual implementation of the system by a factor of ΔA. In other words, our system uses the matrix:

A + \Delta A

From the study of Control Systems, we know that the values of the eigenvalues can affect the stability of the system. For that reason, we would like to know how a small error in A will affect the eigenvalues.

First off, we assume that ΔA is a small shift. The definition of "small" in this sense is arbitrary, and will remain open. Keep in mind that the techniques discussed here are more accurate the smaller ΔA is.

If ΔA is the error in the matrix A, then Δλ is the error in the eigenvalues and Δv is the error in the eigenvectors. The characteristic equation becomes:

(A + \Delta A)(v + \Delta v) = (\lambda + \Delta \lambda)(v + \Delta v)

We have an equation now with two unknowns: Δλ and Δv. In other words, we don't know how a small change in A will affect the eigenvalues and eigenvectors. If we multiply out both sides, we get:

Av + \Delta A v + A \Delta v + O(\Delta^2) = \lambda v + \lambda \Delta v + \Delta \lambda v + O(\Delta^2)

This situation seems hopeless, until we multiply both sides by the corresponding left-eigenvector w from the left:

w^TAv + w^T\Delta A v + w^TA \Delta v + O(\Delta^2) = \lambda w^Tv + \lambda w^T\Delta v + \Delta \lambda w^T v + O(\Delta^2)

Terms where two Δs (which are known to be small, by definition) are multiplied together, we can say are negligible, and ignore them. Also, we know from our left-eigenvector equation that:

w^TA = \lambda w^T

This means that w^TAv = \lambda w^Tv and w^TA\Delta v = \lambda w^T\Delta v, so these terms cancel from both sides. Another fact is that the left-eigenvector and the right-eigenvector that belong to the same non-repeated eigenvalue are not orthogonal to each other, so the following result holds:

w^T v \ne 0

Substituting these results, where necessary, into our long equation above, we get the following simplification:

w^T \Delta A v = \Delta \lambda w^T v

And solving for the change in the eigenvalue gives us:

\Delta \lambda = \frac{w^T \Delta A v}{w^T v}

This approximate result is only good for small values of ΔA, and the result is less precise as the error increases.


Functions of Matrices

If we have functions, and we use a matrix as the input to those functions, the output values are not always intuitive. For instance, if we have a function f(x), and as the input argument we use matrix A, the output matrix is not necessarily the function f applied to the individual elements of A.

Diagonal Matrix

In the special case of diagonal matrices, the result of f(A) is the function applied to each element of the diagonal matrix:

A = \begin{bmatrix} 
             a_{11} & 0 & \cdots & 0 \\
             0 & a_{22} & \cdots & 0 \\
             \vdots & \vdots & \ddots & \vdots \\
             0 & 0 & \cdots & a_{nn}
           \end{bmatrix}

Then the function f(A) is given by:

f(A) = \begin{bmatrix} 
             f(a_{11}) & 0 & \cdots & 0 \\
             0 & f(a_{22}) & \cdots & 0 \\
             \vdots & \vdots & \ddots & \vdots \\
             0 & 0 & \cdots & f(a_{nn})
           \end{bmatrix}

Jordan Canonical Form

Matrices in Jordan Canonical form also have an easy way to compute the functions of the matrix. However, this method is not nearly as easy as the diagonal matrices described above.

If we have a matrix in Jordan Block form, A, the function f(A) is given by:

f(A) = \begin{bmatrix} 
             \frac{f(a)}{0!} & \frac{f'(a)}{1!} & \cdots & \frac{f^{(r-1)}(a)}{(r-1)!} \\
             0 &  \frac{f(a)}{0!} & \cdots &  \frac{f^{(r-2)}(a)}{(r-2)!} \\
             \vdots & \vdots & \ddots & \vdots \\
             0 & 0 & \cdots &  \frac{f(a)}{0!}
           \end{bmatrix}

The matrix indices have been removed, because in Jordan block form, all the diagonal elements must be equal.

If the matrix is in Jordan canonical form, the value of the function is given as the function applied to the individual diagonal blocks.

If the characteristic equation of matrix A is given by:

\Delta(\lambda) = |A-\lambda I| = (-1)^n(\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0) = 0

Then the Cayley-Hamilton theorem states that the matrix A itself is also a valid solution to that equation:

\Delta(A) = (-1)^n(A^n + a_{n-1}A^{n-1} + \cdots + a_0I) = 0
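For a 2 × 2 matrix with entries a, b, c, d, the characteristic polynomial is λ^2 - (a + d)λ + (ad - bc), so a_1 is minus the trace and a_0 is the determinant. The sketch below verifies the theorem for one illustrative matrix; the function names are assumptions.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cayley_hamilton_residual(A):
    """Return A^2 + a1*A + a0*I for a 2 x 2 matrix A, where
    lambda^2 + a1*lambda + a0 is its characteristic polynomial."""
    (a, b), (c, d) = A
    a1 = -(a + d)         # minus the trace
    a0 = a * d - b * c    # the determinant
    A2 = matmul(A, A)
    identity = [[1, 0], [0, 1]]
    return [[A2[i][j] + a1 * A[i][j] + a0 * identity[i][j] for j in range(2)]
            for i in range(2)]

print(cayley_hamilton_residual([[1, 2], [3, 4]]))   # [[0, 0], [0, 0]]
```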

Another theorem worth mentioning here (and by "worth mentioning", we really mean "fundamental for some later topics") is stated as:

If λ are the eigenvalues of matrix A, and if there is a function f that is defined as a linear combination of powers of λ:

f(\lambda) = \sum_{i = 0}^\infty b_i \lambda^i

If this function has a radius of convergence S, and if all the eigenvalues of A have magnitudes less than S, then the matrix A itself is also a solution to that function:

f(A) = \sum_{i = 0}^\infty b_i A^i

Matrix Exponentials

If we have a matrix A, we can raise e to the power of that matrix as follows:

e^{A}

It is important to note that this is not necessarily (not usually) equal to e raised to the power of each individual element of A. Using the Taylor series expansion of the exponential, we can show that:

e^{A} = I + A + \frac{1}{2}A^2 + \frac{1}{6}A^3 + ...  = \sum_{k=0}^\infty{1 \over k!}A^k.

In other words, the matrix exponential can be reduced to a sum of powers of the matrix. This follows from both the Taylor series expansion of the exponential function, and the Cayley-Hamilton theorem discussed previously.

However, this infinite sum is expensive to compute, and because the sequence is infinite, there is no good cut-off point where we can stop computing terms and call the answer a "good approximation". To alleviate this point, we can turn to the Cayley-Hamilton Theorem. Solving the Theorem for A^n, we get:

A^n = -c_{n-1}A^{n-1} - c_{n-2}A^{n-2} - \cdots - c_1A - c_0I

Multiplying both sides of the equation by A, we get:

A^{n+1} = -c_{n-1}A^n - c_{n-2}A^{n-1} - \cdots - c_1A^2 - c_0A

We can substitute the first equation into the second equation, and the result will be A^{n+1} in terms of powers of A no higher than n - 1. In fact, we can repeat that process so that A^m, for any arbitrarily high power m, can be expressed as a linear combination of I, A, ..., A^{n-1}. Applying this result to our exponential problem:

e^A = \alpha_0I + \alpha_1A + \cdots + \alpha_{n-1}A^{n-1}

Where we can solve for the α terms, and have a finite polynomial that expresses the exponential.
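As a rough numerical sketch of the series definition (not of the Cayley-Hamilton reduction), the code below sums a fixed number of terms of the power series. The cut-off of 30 terms is an arbitrary assumption that happens to be adequate for the small matrices shown; it is not a general rule, and the function names are illustrative.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def expm_series(A, terms=30):
    """Approximate e^A by the first `terms` terms of sum_k A^k / k!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]   # current term A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = matmul(term, A)
        term = [[v / k for v in row] for row in term]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Nilpotent example: N^2 = 0, so e^N = I + N exactly.
N = [[0.0, 1.0], [0.0, 0.0]]
print(expm_series(N))   # [[1.0, 1.0], [0.0, 1.0]]

# Diagonal example: e^D has e^1 and e^2 on the diagonal.
D = [[1.0, 0.0], [0.0, 2.0]]
E = expm_series(D)
print(E[0][0], math.e)
```

Note that the result for N differs from taking the exponential of each element, which would put e^0 = 1 in the lower-left corner instead of 0.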

Inverse

The inverse of a matrix exponential is given by:

(e^{A})^{-1} = e^{-A}

Derivative

The derivative of a matrix exponential is:

\frac{d}{dx}e^{Ax} = Ae^{Ax} = e^{Ax}A

Notice that the exponential matrix is commutative with the matrix A. This is not the case with other functions, necessarily.

Sum of Matrices

If we have a sum of matrices in the exponent, we cannot separate them in general. Unless A and B commute (that is, AB = BA), we have:

e^{(A+B)x} \ne e^{Ax}e^{Bx}
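The following sketch checks this numerically for x = 1 with a pair of matrices that do not commute. The matrices and the helper names are assumptions chosen for the illustration; both A and B are nilpotent, so their individual exponentials are exact.

```python
import math

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp_series(A, terms=30):
    """Approximate e^A by a truncated Taylor series."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]       # AB != BA for this pair
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

exp_of_sum = mat_exp_series(A_plus_B)                              # e^(A+B)
product_of_exps = mat_mul(mat_exp_series(A), mat_exp_series(B))    # e^A e^B

print(exp_of_sum)        # entries are cosh(1) and sinh(1)
print(product_of_exps)   # a different matrix
```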

Differential Equations

If we have a first-order differential equation of the following form:

x'(t) = Ax(t) + f(t)

With initial conditions

x(t_0) = c

Then the solution to that equation is given in terms of the matrix exponential:

x(t) = e^{A(t - t_0)}c + \int_{t_0}^t e^{A(t - \tau)}f(\tau)d\tau

This equation shows up frequently in control engineering.
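As a small check of the solution formula, the sketch below evaluates it for the scalar case, where A is a single number a, and compares the result with a solution that can be found by hand. The test equation, the use of Simpson's rule for the integral, and the function names are assumptions made for this illustration.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def solve_linear_ode(a, f, c, t0, t):
    """x(t) = e^{a(t-t0)} c + integral from t0 to t of e^{a(t-tau)} f(tau) dtau."""
    homogeneous = math.exp(a * (t - t0)) * c
    forced = simpson(lambda tau: math.exp(a * (t - tau)) * f(tau), t0, t)
    return homogeneous + forced

# x'(t) = -2 x(t) + 1 with x(0) = 3 has the solution x(t) = 0.5 + 2.5 e^{-2t}.
x_at_1 = solve_linear_ode(-2.0, lambda tau: 1.0, 3.0, 0.0, 1.0)
expected = 0.5 + 2.5 * math.exp(-2.0)
print(x_at_1, expected)
```

The first term carries the initial condition forward in time, and the integral accumulates the effect of the forcing function f.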

Laplace Transform

As a matter of some interest, we will show the Laplace Transform of a matrix exponential function:

\mathcal{L}[e^{At}] = (sI - A)^{-1}

We will not use this result any further in this book, although other books on engineering might make use of it.


Function Spaces

Function Space

A function space is a linear space where all the elements of the space are functions. A function space that has a norm operation is known as a normed function space. The spaces we consider will all be normed.

Continuity

f(x) is continuous at x0 if, for every ε > 0 there exists a δ(ε) > 0 such that |f(x) - f(x0)| < ε when |x - x0| < δ(ε).

Common Function Spaces

Here is a listing of some common function spaces. This is not an exhaustive list.

C Space

The C function space is the set of all functions that are continuous.

The metric for C space is defined as:

\rho(f, g)_{C} = \max_x|f(x) - g(x)|

Consider the metric of sin(x) and cos(x):

\rho(\sin(x), \cos(x))_{C} = \sqrt{2}, \quad \text{attained at } x = \frac{3\pi}{4}
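The value above can be checked numerically by sampling the two functions on a fine grid. The grid size, the interval [0, 2π], and the function name sup_metric are assumptions of this sketch; a sampled maximum only approximates the true maximum.

```python
import math

def sup_metric(f, g, a, b, samples=8000):
    """Approximate max |f(x) - g(x)| on [a, b] by sampling an even grid."""
    step = (b - a) / samples
    return max(abs(f(a + k * step) - g(a + k * step))
               for k in range(samples + 1))

rho = sup_metric(math.sin, math.cos, 0.0, 2.0 * math.pi)
print(rho, math.sqrt(2.0))   # the two values agree closely
```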

Cp Space

The Cp space is the set of all continuous functions for which the first p derivatives are also continuous. If p = \infty, the function is called "infinitely continuous" (more commonly, infinitely differentiable or smooth). The set C^\infty is the set of all such functions. Some examples of functions that are infinitely continuous are exponentials, sinusoids, and polynomials.

L Space

The L space is the set of all functions that are finitely integrable over a given interval [a, b].

f(x) is in L(a, b) if:

\int_a^b |f(x)|dx < \infty

Lp Space

The Lp space is the set of all functions that are finitely integrable over a given interval [a, b] when raised to the power p:

\int_a^b |f(x)|^pdx < \infty

Most important for engineering is the L2 space, or the set of functions that are "square integrable".

The L2 space is very important to engineers, because functions in this space do not need to be continuous. Many discontinuous engineering functions, such as the unit step function and the square wave, are part of this space over any finite interval. The delta (impulse) function is a notable exception: it is not an ordinary function and is not square integrable, although it can be approximated by sequences of L2 functions.

L2 Functions

A large number of functions qualify as L2 functions, including uncommon, discontinuous, piece-wise, and other functions. A bounded function which, over a finite range, has a finite number of discontinuities is an L2 function. For example, a unit step over a finite interval is an L2 function. Also, other functions useful in signal analysis, such as square waves, triangle waves, and wavelets, are L2 functions.

In practice, most physical systems have a finite amount of noise associated with them. Noisy signals and random signals, if finite, are also L2 functions: this makes analysis of those functions using the techniques listed below easy.

Null Function

The null functions of L2 are the set of all functions φ in L2 that satisfy the equation:

\int_a^b |\phi(x)|^2dx = 0

for all a and b.

Norm

The L2 norm is defined as follows:


[L2 Norm]

\|f(x)\|_{L_2} = \sqrt{\int_a^b |f(x)|^2dx}

If the norm of the function is 1, the function is normal.

We can show that the derivative of the norm squared is:

\frac{\partial \|x\|^2}{\partial x} = 2x

Scalar Product

The scalar product in L2 space is defined as follows:


[L2 Scalar Product]

\langle f(x), g(x)\rangle_{L_2} = \int_a^bf(x)g(x)dx

If the scalar product of two functions is zero, the functions are orthogonal.
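The norm and scalar product defined above are easy to approximate numerically. The sketch below uses Simpson's rule on the interval [0, 2π]; the choice of interval, the test functions, and the helper names are assumptions made for this illustration.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def inner(f, g, a, b):
    """L2 scalar product of two real functions on [a, b]."""
    return simpson(lambda x: f(x) * g(x), a, b)

def norm(f, a, b):
    """L2 norm on [a, b], induced by the scalar product."""
    return math.sqrt(inner(f, f, a, b))

a, b = 0.0, 2.0 * math.pi
sin_cos = inner(math.sin, math.cos, a, b)
sin_norm = norm(math.sin, a, b)
print(sin_cos)     # close to 0, so sin and cos are orthogonal on [0, 2*pi]
print(sin_norm)    # close to sqrt(pi), so sin(x)/sqrt(pi) has norm 1
```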

We can show that given coefficient matrices A and B, and variable x, the derivative of the scalar product can be given as:

\frac{\partial}{\partial x}\langle Ax, Bx\rangle = A^TBx + B^TAx

We can recognize this as the product rule of differentiation. Generalizing, we can say that:

\frac{\partial}{\partial x}\langle f(x), g(x)\rangle = f'(x)g(x) + f(x)g'(x)

We can also say that the derivative of a matrix A times a vector x is:

\frac{d}{dx}Ax = A^T
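The formula for the derivative of the scalar product ⟨Ax, Bx⟩ can be checked against a finite-difference approximation. In this sketch x is a vector in R2, and the matrices, the step size h, and the function names are assumptions chosen for the illustration.

```python
def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def scalar_product(u, v):
    return sum(p * q for p, q in zip(u, v))

def phi(A, B, x):
    """The scalar function <Ax, Bx>."""
    return scalar_product(mat_vec(A, x), mat_vec(B, x))

def gradient_formula(A, B, x):
    """A^T B x + B^T A x, the formula stated in the text."""
    first = mat_vec(transpose(A), mat_vec(B, x))
    second = mat_vec(transpose(B), mat_vec(A, x))
    return [p + q for p, q in zip(first, second)]

def gradient_numeric(A, B, x, h=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for i in range(len(x)):
        up, down = x[:], x[:]
        up[i] += h
        down[i] -= h
        grad.append((phi(A, B, up) - phi(A, B, down)) / (2.0 * h))
    return grad

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [-1.0, 2.0]]
x = [1.0, -2.0]
formula = gradient_formula(A, B, x)
numeric = gradient_numeric(A, B, x)
print(formula, numeric)
```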

Metric

The metric of two functions (we will not call it the "distance" here, because that word has no meaning in a function space) will be denoted with ρ(f,g). We can define the metric of two L2 functions as follows:


[L2 Metric]

\rho(f, g)_{L_2} = \sqrt{\int_a^b|f(x) - g(x)|^2dx}

Cauchy-Schwarz Inequality

The Cauchy-Schwarz Inequality still holds for L2 functions, and is restated here:

|\langle f(x), g(x)\rangle| \le \|f\|\|g\|

Linear Independence

A set of functions f_1, f_2, ..., f_n in L2 is linearly independent if the equation

a_1f_1(x) + a_2f_2(x) + \cdots + a_nf_n(x) = 0

holds for all x if and only if all the a coefficients are 0.

Gram-Schmidt Orthogonalization

The Gram-Schmidt technique that we discussed earlier still works with functions, and we can use it to form a set of linearly independent, orthogonal functions in L2.

For a set of linearly independent functions φ, we can make a set of functions ψ that span the same space but are orthogonal to one another:


[Gram-Schmidt Orthogonalization]

\psi_1 = \phi_1
\psi_i = \phi_i - \sum_{n=1}^{i-1}\frac{\langle \psi_n, \phi_{i}\rangle}{\langle \psi_n, \psi_n\rangle}\psi_n
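To make the procedure concrete, the sketch below applies it to the monomials 1, x, x² on the interval [-1, 1], using exact rational arithmetic. The choice of starting functions and interval, and the representation of a polynomial as a list of coefficients, are assumptions made for this illustration.

```python
from fractions import Fraction

def inner(p, q, a=-1, b=1):
    """Exact integral of p(x) q(x) over [a, b].

    Polynomials are coefficient lists [c0, c1, c2, ...] meaning
    c0 + c1 x + c2 x^2 + ...
    """
    product = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            product[i + j] += Fraction(pi) * Fraction(qj)
    return sum(c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in enumerate(product))

def gram_schmidt(phis):
    """psi_i = phi_i - sum over n < i of <psi_n, phi_i>/<psi_n, psi_n> psi_n."""
    psis = []
    for phi in phis:
        psi = [Fraction(c) for c in phi]
        for prev in psis:
            coeff = inner(prev, phi) / inner(prev, prev)
            psi = [c - coeff * d for c, d in zip(psi, prev)]
        psis.append(psi)
    return psis

# 1, x, x^2 written with equal-length coefficient lists
monomials = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
psis = gram_schmidt(monomials)
for psi in psis:
    print([str(c) for c in psi])
```

The third function comes out as x² - 1/3, which is proportional to the second-degree Legendre polynomial.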

Basis

The L2 is an infinite-basis set, which means that any basis for the L2 set will require an infinite number of basis functions. To prove that an infinite set of orthogonal functions is a basis for the L2 space, we need to show that the null function is the only function in L2 that is orthogonal to all the basis functions. If the null function is the only function that satisfies this relationship, then the set is a basis set for L2.

By definition, we can express any function in L2 as a linear sum of the basis elements. If we have basis elements φ, we can define any other function ψ as a linear sum:

\psi(x) = \sum_{n = 1}^\infty a_n\phi_n(x)

We will explore this important result in the section on Fourier Series.

There are some special spaces known as Banach spaces, and Hilbert spaces.

Convergent Functions

Let's define the piece-wise function φn(x) as:

\phi_n(x) = \left\{\begin{matrix}0 &  x \le 0 \\
                                         nx & 0 < x \le \frac{1}{n} \\
                                         1 & \frac{1}{n} < x 
                           \end{matrix}\right.

We can see that as we set n \to \infty, this function becomes the unit step function. We can say that as n approaches infinity, that this function converges to the unit step function. Notice that this function only converges in the L2 space, because the unit step function does not exist in the C space (it is not continuous).

Convergence

We can say that a sequence of functions φn converges to a function φ* if:

\lim_{n \to \infty}\|\phi_n - \phi^*\| = 0

Every sequence that converges in this way is also a Cauchy sequence: its members become arbitrarily close to one another as n approaches infinity.
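The ramp functions φn defined earlier give a concrete case. The sketch below estimates the L2 distance between φn and the unit step on the interval [-1, 1], and also the largest pointwise difference. The interval, the grid size, and the function names are assumptions of this illustration.

```python
import math

def phi_n(n, x):
    """Ramp from 0 to 1 over the interval (0, 1/n]."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0 / n:
        return n * x
    return 1.0

def unit_step(x):
    return 1.0 if x > 0.0 else 0.0

def l2_distance(n, a=-1.0, b=1.0, cells=20000):
    """Midpoint-rule estimate of the L2 norm of phi_n - step on [a, b]."""
    h = (b - a) / cells
    total = 0.0
    for i in range(cells):
        x = a + (i + 0.5) * h
        total += (phi_n(n, x) - unit_step(x)) ** 2
    return math.sqrt(total * h)

def max_distance(n, a=-1.0, b=1.0, cells=20000):
    """Largest sampled value of |phi_n - step| on the same grid."""
    h = (b - a) / cells
    return max(abs(phi_n(n, a + (i + 0.5) * h) - unit_step(a + (i + 0.5) * h))
               for i in range(cells))

for n in (1, 10, 100):
    print(n, l2_distance(n), 1.0 / math.sqrt(3.0 * n), max_distance(n))
```

The L2 distance shrinks like 1/sqrt(3n), while the largest pointwise difference stays close to 1. The sequence therefore converges in the L2 norm but not in the maximum-based metric of the C space.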

Complete Function Spaces

A function space is called complete if every Cauchy sequence in that space converges to a function in that space.

Banach Space

A Banach Space is a complete normed function space.

Hilbert Space

A Hilbert Space is a Banach Space with respect to a norm induced by the scalar product. That is, if there is a scalar product in the space X, then we can say the norm is induced by the scalar product if we can write:

\|f\| = g(\langle f, f\rangle)

That is, that the norm can be written as a function of the scalar product. In the L2 space, we can define the norm as:

\|f\| = \sqrt{\langle f, f\rangle}

If the scalar product space is complete with respect to this induced norm, then it is a Banach Space under that norm, and therefore a Hilbert Space.

In a Hilbert Space, the Parallelogram rule holds for all members f and g in the function space:

\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2

The L2 space is a Hilbert Space. The C space, however, is not.
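The Parallelogram rule gives a quick way to see the difference between the two spaces. The sketch below evaluates both sides of the rule for f(x) = x and g(x) = 1 - x on [0, 1], once with the L2 norm and once with the maximum norm associated with the C space metric. The example functions, the interval, and the helper names are assumptions made for this illustration.

```python
import math

def l2_norm(f, a=0.0, b=1.0, n=2000):
    """L2 norm on [a, b] using composite Simpson's rule."""
    h = (b - a) / n
    total = f(a) ** 2 + f(b) ** 2
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h) ** 2
    return math.sqrt(total * h / 3.0)

def max_norm(f, a=0.0, b=1.0, samples=2000):
    """Sampled maximum of |f(x)| on [a, b]."""
    step = (b - a) / samples
    return max(abs(f(a + k * step)) for k in range(samples + 1))

def parallelogram_gap(norm, f, g):
    """Left side minus right side of the rule; zero when the rule holds."""
    total = lambda x: f(x) + g(x)
    difference = lambda x: f(x) - g(x)
    return (norm(total) ** 2 + norm(difference) ** 2
            - 2.0 * norm(f) ** 2 - 2.0 * norm(g) ** 2)

f = lambda x: x
g = lambda x: 1.0 - x
gap_l2 = parallelogram_gap(l2_norm, f, g)
gap_max = parallelogram_gap(max_norm, f, g)
print(gap_l2)    # essentially zero: the rule holds for the L2 norm
print(gap_max)   # about -2: the rule fails for the maximum norm
```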


Fourier Series

The L2 space is an infinite-dimensional function space, and a linear combination of a complete infinite set of orthogonal functions can be used to represent any single member of the L2 space. The decomposition of an L2 function in terms of an infinite basis set is a technique known as the Fourier Decomposition of the function, and produces a result called the Fourier Series.

Fourier Basis

Let's consider a set of L2 functions, \phi, as follows:

\phi = \{1, \sin(\pi x), \cos(\pi x), \sin(2\pi x), \cos(2\pi x), \sin(3\pi x), \cos(3\pi x), \ldots\}

We can prove that over the range [0, 2], which is one full period of \sin(\pi x) and \cos(\pi x), all of these functions are orthogonal for positive integers n and m:

\int_0^{2} 1 \cdot \cos(n\pi x) dx = 0
\int_0^{2} 1 \cdot \sin(n\pi x) dx = 0
\int_0^{2} \sin(n\pi x) \sin(m\pi x)dx = 0, n \ne m
\int_0^{2} \sin(n\pi x) \cos(m\pi x)dx = 0
\int_0^{2} \cos(n\pi x) \cos(m\pi x)dx = 0, n \ne m
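These orthogonality relations can be spot-checked numerically. The sketch below assumes the interval of integration is [0, 2], one full period of sin(πx) and cos(πx), and checks every distinct pair among the first seven basis functions. The helper names and the use of Simpson's rule are assumptions of the illustration.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def make_basis(harmonics):
    """Return [1, sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x), ...]."""
    basis = [lambda x: 1.0]
    for n in range(1, harmonics + 1):
        basis.append(lambda x, n=n: math.sin(n * math.pi * x))
        basis.append(lambda x, n=n: math.cos(n * math.pi * x))
    return basis

def inner(f, g, a=0.0, b=2.0):
    """Scalar product over one period (an assumed interval of [0, 2])."""
    return simpson(lambda x: f(x) * g(x), a, b)

basis = make_basis(3)
worst_overlap = max(abs(inner(basis[i], basis[j]))
                    for i in range(len(basis))
                    for j in range(i + 1, len(basis)))
print(worst_overlap)    # essentially zero for every distinct pair
```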

Because \phi is an infinite orthogonal set in L2, \phi is also a valid basis set in the L2 space. Therefore, we can decompose any function in L2 as the following sum:


[Classical Fourier Series]

\psi(x) = a_0(1) + \sum_{n=1}^\infty a_n \sin(n\pi x) + \sum_{m=1}^\infty b_m\cos(m\pi x)

However, the difficulty occurs when we need to calculate the a and b coefficients. We will show the method to do this below:

a0: The Constant Term

Calculation of a0 is the easiest, and therefore we will show how to calculate it first. We use the value of a0 which minimizes the error in approximating f(x) by the Fourier series.

First, define an error function, E, that is equal to half the squared norm of the difference between the function f(x) and the infinite sum above, taken over one period [0, 2] of the basis functions:

E = \frac{1}{2}\left\|f(x) - a_0(1) - \sum_{n=1}^\infty a_n \sin(n\pi x) - \sum_{m=1}^\infty b_m\cos(m\pi x)\right\|^2

For ease, we will write all the basis functions as the set φ, described above:

\sum_{i=0}^\infty a_i\phi_i = a_0 + \sum_{n=1}^\infty a_n \sin(n\pi x) + \sum_{m=1}^\infty b_m\cos(m\pi x)

Combining the last two functions together, and writing the norm as an integral, we can say:

E = \frac{1}{2}\int_0^{2}\left|f(x) - \sum_{i=0}^\infty a_i\phi_i(x)\right|^2dx

We attempt to minimize this error function with respect to the constant term. To do this, we differentiate both sides with respect to a0, and set the result to zero:

0 = \frac{\partial E}{\partial a_0} = \int_0^{2} (f(x) - \sum_{i=0}^\infty a_i\phi_i(x))(-\phi_0(x))dx

The φ0 term comes out of the sum because of the chain rule: it is the only term in the entire sum dependent on a0. We can separate out the integral above as follows:

\int_0^{2} (f(x) - \sum_{i=0}^\infty a_i\phi_i(x))(-\phi_0(x))dx = -\int_0^{2}f(x)\phi_0(x)dx + a_0\int_0^{2}\phi_0(x)\phi_0(x)dx

All the other terms drop out of the infinite sum because they are all orthogonal to φ0. Again, we can rewrite the above equation in terms of the scalar product:

0 = -\langle f(x), \phi_0(x)\rangle + a_0\langle \phi_0(x), \phi_0(x)\rangle

And solving for a0, we get our final result:

a_0 = \frac{\langle f(x), \phi_0(x)\rangle}{\langle \phi_0(x), \phi_0(x)\rangle}

Sin Coefficients

Using the above method, we can solve for the an coefficients of the sin terms:

a_n = \frac{\langle f(x), \sin(n\pi x)\rangle}{\langle \sin(n\pi x), \sin(n\pi x)\rangle}

Cos Coefficients

Also using the above method, we can solve for the bn coefficients of the cos terms:

b_n = \frac{\langle f(x), \cos(n\pi x)\rangle}{\langle \cos(n\pi x), \cos(n\pi x)\rangle}
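The three coefficient formulas can be evaluated numerically for a simple test function. The sketch below uses f(x) = x and assumes the scalar products are taken over the interval [0, 2], one full period of the basis functions. The test function, the interval, and the helper names are assumptions made for this illustration.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def inner(f, g, a=0.0, b=2.0):
    """Scalar product over the assumed interval [0, 2]."""
    return simpson(lambda x: f(x) * g(x), a, b)

def fourier_coefficients(f, harmonics):
    """Return (a0, sin coefficients a_n, cos coefficients b_n)."""
    one = lambda x: 1.0
    a0 = inner(f, one) / inner(one, one)
    sin_coeffs, cos_coeffs = [], []
    for n in range(1, harmonics + 1):
        s = lambda x, n=n: math.sin(n * math.pi * x)
        c = lambda x, n=n: math.cos(n * math.pi * x)
        sin_coeffs.append(inner(f, s) / inner(s, s))
        cos_coeffs.append(inner(f, c) / inner(c, c))
    return a0, sin_coeffs, cos_coeffs

a0, a_n, b_n = fourier_coefficients(lambda x: x, 3)
print(a0)     # the mean value of f over the interval
print(a_n)    # sin coefficients, equal to -2/(n*pi) for this f
print(b_n)    # cos coefficients, essentially zero for this f
```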

The classical Fourier series uses the following basis:

\phi(x) = \{1, \sin(n\pi x), \cos(n \pi x)\}, n = 1, 2, \ldots

However, we can generalize this concept to extend to any orthogonal basis set from the L2 space.

We can say that if we have our orthogonal basis set that is composed of an infinite set of arbitrary, orthogonal L2 functions:

\phi = \{\phi_1, \phi_2, \cdots\}

We can define any L2 function f(x) in terms of this basis set:


[Generalized Fourier Series]

f(x) = \sum_{n=1}^\infty a_n\phi_n(x)

Using the method from the previous chapter, we can solve for the coefficients as follows:


[Generalized Fourier Coefficient]

a_n = \frac{\langle f(x), \phi_n(x)\rangle}{\langle \phi_n(x), \phi_n(x)\rangle}

Bessel's inequality relates the original function to the Fourier coefficients an. For an orthonormal basis set (one in which every basis function has a norm of 1):


[Bessel's Inequality]

\sum_{n=1}^\infty a_n^2 \le \|f(x)\|^2

If the basis set is complete, so that an infinite sum of the basis functions perfectly reproduces the function f(x), then the above relation becomes an equality, known as Parseval's Theorem:


[Parseval's Theorem]

\sum_{n=1}^\infty a_n^2 = \|f(x)\|^2

Engineers may recognize this as a relationship between the energy of the signal, as represented in the time and frequency domains. However, Parseval's theorem applies not only to the classical Fourier series coefficients, but also to the generalized series coefficients.
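The energy relationship can be observed numerically with the classical basis. The sketch below uses f(x) = x on an assumed interval of [0, 2]. Because the classical basis functions are orthogonal but do not all have a norm of 1, each squared coefficient is weighted by the squared norm of its basis function, which is what the relation becomes for a basis that is not normalized. The test function and the helper names are assumptions of the illustration.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def inner(f, g, a=0.0, b=2.0):
    """Scalar product over the assumed interval [0, 2]."""
    return simpson(lambda x: f(x) * g(x), a, b)

def energy_captured(f, harmonics):
    """Sum of a_n^2 <phi_n, phi_n> over the constant term and the harmonics."""
    basis = [lambda x: 1.0]
    for n in range(1, harmonics + 1):
        basis.append(lambda x, n=n: math.sin(n * math.pi * x))
        basis.append(lambda x, n=n: math.cos(n * math.pi * x))
    total = 0.0
    for phi in basis:
        phi_norm_sq = inner(phi, phi)
        coefficient = inner(f, phi) / phi_norm_sq
        total += coefficient ** 2 * phi_norm_sq
    return total

f = lambda x: x
signal_energy = inner(f, f)            # squared norm of f, equal to 8/3
partial_5 = energy_captured(f, 5)
partial_40 = energy_captured(f, 40)
print(partial_5, partial_40, signal_energy)
```

Each partial sum stays below the squared norm of f, and the gap closes as more harmonics are included.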

The concept of the Fourier series can be expanded to include 2-dimensional and n-dimensional function decomposition as well. Let's say that we have a function in terms of independent variables x and y. We can decompose that function as a double-summation as follows:

f(x,y) = \sum_{i=1}^\infty\sum_{j=1}^\infty a_{ij}\phi_{ij}(x,y)

Where φij is a 2-dimensional set of orthogonal basis functions. We can define the coefficients as:

a_{ij} = \frac{\langle f(x,y), \phi_{ij}(x,y)\rangle}{\langle \phi_{ij}(x,y),\phi_{ij}(x,y)\rangle}

This same concept can be expanded to include series with n-dimensions.


Miscellany


[Lyapunov's Equation]

AM + MB = C

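Where A, B and C are constant square matrices, and M is the solution that we are trying to find. If A, B, and C are of the same order, and if A and -B have no eigenvalues in common, then the solution M is unique. If, in addition, all the eigenvalues of A and B have negative real parts, so that the integral below converges, then the solution can be given in terms of matrix exponentials: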

M = -\int_0^\infty e^{Ax}Ce^{Bx}dx

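Leibniz' rule allows us to take the derivative of an integral, where the derivative and the integral are performed using different variables:

\frac{d}{dt}\int_{a(t)}^{b(t)} f(x,t)dx = f(b(t),t)\frac{db}{dt} - f(a(t),t)\frac{da}{dt} + \int_{a(t)}^{b(t)}\frac{\partial f(x,t)}{\partial t}dx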

Wavelets are orthogonal basis functions that only exist for certain windows in time. This is in contrast to sinusoidal waves, which exist for all times t. A wavelet, because it is dependent on time, can be used as a basis function. A wavelet basis set gives rise to wavelet decomposition, which is a 2-variable decomposition of a 1-variable function. Wavelet analysis allows us to decompose a function in terms of time and frequency, while Fourier decomposition only allows us to decompose a function in terms of frequency.

Mother Wavelet

If we have a basic wavelet function ψ(t), known as the mother wavelet, we can write a two-parameter family of scaled and shifted wavelets as such:

\psi_{jk}(t) = 2^{j/2}\psi(2^jt - k)
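As a concrete case, the sketch below takes the basic wavelet ψ(t) to be the Haar wavelet, which is an assumption of this illustration and is not named in the text above. It builds the scaled and shifted functions ψjk and checks that they have unit norm and are orthogonal to one another on [0, 1]. The helper names are likewise assumptions.

```python
def haar(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def wavelet(j, k):
    """psi_jk(t) = 2^(j/2) psi(2^j t - k)."""
    scale = 2.0 ** (j / 2.0)
    return lambda t: scale * haar(2.0 ** j * t - k)

def inner(f, g, cells=1024):
    """Midpoint rule on [0, 1]; the cells line up with the wavelet breakpoints."""
    h = 1.0 / cells
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(cells)) * h

print(inner(wavelet(0, 0), wavelet(0, 0)))   # unit norm
print(inner(wavelet(1, 0), wavelet(1, 0)))   # unit norm, thanks to the 2^(j/2) factor
print(inner(wavelet(0, 0), wavelet(1, 0)))   # orthogonal across scales
print(inner(wavelet(1, 0), wavelet(1, 1)))   # orthogonal across shifts
```

The index j controls the width of the wavelet (the scale), and k controls its position in time.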

Wavelet Series

If we have our mother wavelet function, we can write out a Fourier-style series as a double-sum of all the wavelets:

f(t) = \sum_{j=0}^\infty\sum_{k=0}^\infty a_{jk}\psi_{jk}(t)

Scaling Function

Sometimes, we can add in an additional function, known as a scaling function:

f(t) = \sum_{i=0}^\infty c_i\phi_i + \sum_{j=0}^\infty\sum_{k=0}^\infty a_{jk}\psi_{jk}(t)

The idea is that the scaling function is larger than the wavelet functions, and occupies more time. In this case, the scaling function will show long-term changes in the signal, and the wavelet functions will show short-term changes in the signal.

Optimization

Optimization is an important concept in engineering. Finding any solution to a problem is not nearly as good as finding the one "optimal solution" to the problem. Optimization problems are typically reformatted so they become minimization problems, which are well-studied problems in the field of mathematics.

Typically, when optimizing a system, the costs and benefits of that system are arranged into a cost function. It is the engineer's job then to minimize this cost function (and thereby minimize the cost of the system). It is worth noting at this point that the word "cost" can have multiple meanings, depending on the particular problem. For instance, cost can refer to the actual monetary cost of a system (number of computer units to host a website, amount of cable needed to connect Philadelphia and New York), the delay of the system (loading time for a website, transmission delay for a communication network), the reliability of the system (number of dropped calls in a cellphone network, average lifetime of a car transmission), or any other types of factors that reduce the effectiveness and efficiency of the system.

Because optimization typically becomes a mathematical minimization problem, we are going to discuss minimization here.

Minimization

Minimization is the act of finding the numerically lowest point in a given function, or in a particular range of a given function. Students of mathematics and calculus may remember using the derivative of a function to find the maxima and minima of a function. If we have a function f(x), we can find the maxima, minima, or saddle-points (points where the function has zero slope, but is neither a maximum nor a minimum) by solving for x in the following equation:

\frac{df(x)}{dx} = 0

In other words, we are looking for the roots of the derivative of the function f plus those points where f has a corner. Once we have the so called critical points of the function (if any), we can test them to see if they are relatively high (maxima), or relatively low (minima). Some words to remember in this context are:

Global Minima
A global minimum of a function is the lowest value of that function anywhere. If the domain of the function is restricted, say A ≤ x ≤ B, then the minimum can also occur at the boundary, here A or B.
Local Minima
A local minimum of a function is the lowest value of that function within a small range. A value can thus be a local minimum even though there are smaller function values, but not in a small neighborhood.

Unconstrained Minimization

Unconstrained Minimization refers to the minimization of the given function without having to worry about any other rules or caveats. Constrained Minimization, on the other hand, refers to minimization problems where other relations called constraints must be satisfied at the same time.

Besides the method above (where we take the derivative of the function and set that equal to zero), there are several numerical methods that we can use to find the minima of a function. For these methods there are useful computational tools such as Matlab.

Hessian Matrix

The function has a local minimum at a point x if the gradient of f is zero at x and the Hessian matrix H(x) is positive definite:

H(x) = \frac{\partial^2 f(x)}{\partial x^2}

Where x is a vector of all the independent variables of the function. If x is a scalar variable, the Hessian matrix reduces to the second derivative of the function f.

Newton-Raphson Method

The Newton-Raphson Method of computing the minima of a function f uses an iterative computation. We can define the sequence:

x^{n+1} = x^n - \frac{f'(x^n)}{f''(x^n)}

Where

f'(x) = \frac{df(x)}{dx}
f''(x) = \frac{d^2f(x)}{dx^2}

As we repeat the above computation, plugging in consecutive values for n, our solution will converge on the true solution, provided the starting point is close enough to it. In general this process takes infinitely many iterations to converge exactly, but if an approximation of the true solution suffices, you can stop after only a few iterations, because the sequence converges rather quickly (quadratically).
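A minimal sketch of the iteration follows. The test function f(x) = x⁴ - 3x³ + 2, the starting point, the stopping tolerance, and the function name are assumptions chosen for the illustration; the derivatives are supplied by hand.

```python
def newton_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f'(x)/f''(x) until the step is smaller than tol."""
    x = x0
    for iteration in range(1, max_iter + 1):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            return x, iteration
    return x, max_iter

# f(x) = x^4 - 3x^3 + 2 has a local minimum at x = 9/4
df = lambda x: 4.0 * x ** 3 - 9.0 * x ** 2       # first derivative
d2f = lambda x: 12.0 * x ** 2 - 18.0 * x         # second derivative

x_min, iterations = newton_minimize(df, d2f, 3.0)
print(x_min, iterations)
```

Note that the iteration only looks for a point where the first derivative vanishes. Started near x = 0, where this f has a stationary point that is not a minimum, it would not find the minimum at 9/4.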

Steepest Descent Method

The Newton-Raphson method can be tricky because it relies on the second derivative of the function f, and this can oftentimes be difficult (if not impossible) to accurately calculate. The Steepest Descent Method, however, does not require the second derivative, but it does require the selection of an appropriate scalar quantity ε, which cannot be chosen arbitrarily (but which can also not be calculated using a set formula). The Steepest Descent method is defined by the following iterative computation:

x^{n+1} = x^n - \epsilon \left.\frac{df(x)}{dx}\right|_{x = x^n}

Where epsilon needs to be sufficiently small. If epsilon is too large, the iteration may diverge. If this happens, a new epsilon value needs to be chosen, and the process needs to be repeated.
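The effect of the choice of epsilon can be seen in a short experiment. The test function f(x) = x⁴ - 3x³ + 2 (with a minimum at x = 9/4), the starting point, the two epsilon values, and the function name are assumptions made for this illustration.

```python
def steepest_descent(df, x0, epsilon, steps):
    """Return the list of iterates x^0, x^1, ..., x^steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - epsilon * df(xs[-1]))
    return xs

# f(x) = x^4 - 3x^3 + 2, so f'(x) = 4x^3 - 9x^2
df = lambda x: 4.0 * x ** 3 - 9.0 * x ** 2

small_step = steepest_descent(df, 3.0, 0.01, 200)
large_step = steepest_descent(df, 3.0, 0.2, 3)

print(small_step[-1])   # settles at the minimum, x = 2.25
print(large_step)       # the iterates jump further away at every step
```

With the small epsilon the iterates approach the minimum steadily, although far more slowly than the Newton-Raphson method would. With the large epsilon the very first step overshoots, and the iteration diverges.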

Conjugate Gradient Method

Constrained Minimization

Constrained Minimization is the process of finding the minimum value of a function under a certain number of additional rules called constraints. For instance, we could say "Find the minimum value of f(x), but g(x) must equal 10". These kinds of problems are more difficult, but the Kuhn-Tucker theorem, and also the Karush-Kuhn-Tucker theorem help to solve them.

There are two different types of constraints: equality constraints and inequality constraints. We will consider them individually, and then mixed constraints.

Equality Constraints

The Kuhn-Tucker Theorem is a method for minimizing a function f(x) under the equality constraint g(x) = 0. The theorem reads as follows:

Given the cost function f, and an equality constraint g in the following form:

g(x) = 0,

Then we can convert this problem into an unconstrained minimization problem by constructing the Lagrangian function of f and g:

L(x) = f(x) + \langle \Lambda, g(x)\rangle

Where Λ is the Lagrange multiplier, and < , > denotes the scalar product of the vector space Rn (where n is the number of equality constraints). We will discuss scalar products in more detail later. If we differentiate the Lagrangian with respect to x and set the result to zero, and also require that the constraint holds, we obtain the conditions for a stationary point of the whole function L(x,Λ), and that will be the constrained minimum of our function f.

\frac{df(x)}{dx} + \left\langle\Lambda,\frac{dg(x)}{dx}\right\rangle = 0
g(x) = 0

This is a set of n+k equations with n+k unknown variables (n Λs and k xs).
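For a quadratic cost function with a linear constraint, these equations form a linear system that can be solved directly. The sketch below minimizes f(x, y) = x² + y² subject to x + y - 1 = 0, giving three equations in the three unknowns x, y and Λ. The example problem and the small exact solver are assumptions made for this illustration.

```python
from fractions import Fraction

def solve_linear(M, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc
                          for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Stationarity of L = x^2 + y^2 + Lambda (x + y - 1), plus the constraint:
#   2x      + Lambda = 0
#        2y + Lambda = 0
#    x +  y          = 1
system = [[2, 0, 1],
          [0, 2, 1],
          [1, 1, 0]]
x, y, lam = solve_linear(system, [0, 0, 1])
print(x, y, lam)
```

The result is the point on the line x + y = 1 that is closest to the origin. For cost functions or constraints that are not this simple, the equations are nonlinear and generally need an iterative solver.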


Inequality Constraints

Similar to the method above, let us say that we have a cost function f, and an inequality constraint in the following form:

g(x) \le 0

Then we can take the Lagrangian of this again:

L(x) = f(x) + \langle \Lambda, g(x)\rangle

But we now must use the following three equations/inequalities in determining our solution:

\frac{df(x)}{dx} + \left\langle\Lambda,\frac{dg(x)}{dx}\right\rangle = 0
\langle\Lambda , g(x)\rangle = 0
\Lambda \ge 0

The last two conditions can be interpreted in the following way:

if g(x) < 0, then \Lambda = 0
if g(x) = 0, then \Lambda \ge 0

Using these additional equations/inequalities, we can solve in a similar manner as above.
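With a single inequality constraint, the conditions split the problem into two cases: either the constraint is inactive and the multiplier is zero, or the constraint holds with equality and the multiplier must be non-negative. The sketch below works through both cases for f(x) = (x - 2)² with g(x) = x - 1 ≤ 0, taking the stationarity condition to be that of the Lagrangian, df/dx + Λ dg/dx = 0. The example problem and the function name are assumptions of the illustration.

```python
def kkt_candidates():
    """Check both complementary cases for f(x) = (x - 2)^2, g(x) = x - 1 <= 0."""
    candidates = []

    # Case 1: constraint inactive, Lambda = 0, so f'(x) = 2(x - 2) = 0.
    x = 2.0
    if x - 1.0 <= 0.0:                 # feasibility check, fails here
        candidates.append((x, 0.0))

    # Case 2: constraint active, g(x) = 0, so x = 1, and the stationarity
    # condition 2(x - 2) + Lambda * 1 = 0 gives the multiplier.
    x = 1.0
    lam = -2.0 * (x - 2.0)
    if lam >= 0.0:                     # sign check on the multiplier
        candidates.append((x, lam))

    return candidates

solutions = kkt_candidates()
print(solutions)
```

The unconstrained minimum at x = 2 violates the constraint, so the only candidate lies on the boundary x = 1, with a positive multiplier.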

Mixed Constraints

If we have a set of equality and inequality constraints

g(x) = 0
h(x) \le 0

we can combine them into a single Lagrangian, together with the following conditions:

L(x) = f(x) + \langle\Lambda, g(x)\rangle + \langle \mu, h(x)\rangle
g(x)=0
\langle\mu, h(x)\rangle = 0
\mu \ge 0

Infinite Dimensional Minimization

The above methods work well if the variables involved in the analysis are finite-dimensional vectors, like those in RN. However, when we are trying to minimize something that is more complex than a vector, i.e. a function, we need the following concept. We consider functions that live in a subspace of L2(RN), which is an infinite-dimensional vector space. We will define the term functional as follows:

Functional
A functional is a map that takes one or more functions as arguments, and which returns a scalar value.

Let us say that we consider functions x of time t (N=1). Suppose further we have a fixed function f in two variables. With that function, we can associate a cost functional J:

J[x] = \int_a^b f(x,t) dt

Where we are explicitly taking account of t in the definition of f. To minimize this functional, like all minimization problems, we need to take the derivative of the functional, and set the derivative to zero. However, we need a slightly more sophisticated version of the derivative, because x is a function. This is where the Gateaux Derivative enters the field.

Gateaux Derivative

We can define the Gateaux Derivative in terms of the following limit:

\delta F(x, h) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} [F(x + \epsilon h) - F(x)]

Which is similar to the classical definition of the derivative in the direction h. In plain words, we took the derivative of F with respect to x in the direction of h. Here h is an arbitrary function of time, in the same space as x (here we are talking about the space L2). Analogous to the one-dimensional case, a functional is Gateaux differentiable at x if and only if the above limit exists. We can use the Gateaux derivative to find the minimizer of our functional above.
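The limit can be approximated by using a small but finite ε. The sketch below does this for the functional F[x] = ∫₀¹ x(t)² dt, whose Gateaux derivative in the direction h works out by hand to 2∫₀¹ x(t)h(t) dt. The choice of functional, of x(t) = t and h(t) = sin(πt), of ε, and the helper names are assumptions made for this illustration.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

def F(x):
    """Example functional: the integral of x(t)^2 over [0, 1]."""
    return simpson(lambda t: x(t) ** 2, 0.0, 1.0)

def gateaux(functional, x, h, eps=1e-6):
    """Difference quotient [F(x + eps h) - F(x)] / eps for a small eps."""
    shifted = lambda t: x(t) + eps * h(t)
    return (functional(shifted) - functional(x)) / eps

x = lambda t: t
h = lambda t: math.sin(math.pi * t)

numeric = gateaux(F, x, h)
by_hand = 2.0 / math.pi        # 2 * integral of t sin(pi t) over [0, 1]
print(numeric, by_hand)
```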

Euler-Lagrange Equation

We will now use the Gateaux derivative, discussed above, to find the minimizer of the following type of functional:

J(x(t)) = \int_a^b f(x(t), x'(t), t)dt

We thus have to find the solutions to the equation:

\delta J(x) = 0

The solution is the Euler-Lagrange Equation:

\frac{\partial f}{\partial x} - \frac{d}{dt}\frac{\partial f}{\partial x'} = 0

The partial derivatives are done in an ordinary way ignoring the fact that x is a function of t. Solutions to this equation are either maxima, minima, or saddle points of the cost functional J.

Example: Shortest Distance

We've heard colloquially that the shortest distance between two points is a straight line. We can use the Euler-Lagrange equation to prove this rule.

If we have two points in R2, a and b, we would like to find the shortest curve (x, y(x)) that joins these two points. The line element ds reads:

ds = \sqrt{dx^2 + dy^2}

The functional that we are trying to minimize is then defined as:

J[y] = \int_a^b ds

or:

J[y] = \int_{a_x}^{b_x} \sqrt{1 + \left(\frac{dy}{dx}\right)^2}dx

We can take the Gateaux derivative of the functional J and set it equal to zero to find the minimizing function between these two points. Denoting the square root as f, we get

0=\frac{\partial f}{\partial y}-\frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)=-\frac{y''}{\left(1+y^{\prime2}\right)^{3/2}} \;.

Because the denominator is finite and nonzero, this boils down to the equation

\frac{d^2y}{dx^2}=0

with the well known solution

y(x)=mx+n=\frac{b_y-a_y}{b_x-a_x}(x-a_x)+a_y \;.
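The result can be checked numerically by comparing the length of the straight line with the lengths of nearby curves that share the same end points. In the sketch below the end points, the sinusoidal perturbation, and the function names are assumptions made for this illustration; the curve length is approximated by a fine polyline.

```python
import math

def arc_length(y, ax, bx, cells=2000):
    """Length of the polyline through samples of y(x), approximating J[y]."""
    h = (bx - ax) / cells
    return sum(math.hypot(h, y(ax + (i + 1) * h) - y(ax + i * h))
               for i in range(cells))

a = (0.0, 0.0)
b = (3.0, 4.0)
slope = (b[1] - a[1]) / (b[0] - a[0])

def line(x):
    """The straight line y(x) = m (x - a_x) + a_y through a and b."""
    return slope * (x - a[0]) + a[1]

def bumped(c):
    """The line plus a bump of height c that vanishes at both end points."""
    return lambda x: line(x) + c * math.sin(
        math.pi * (x - a[0]) / (b[0] - a[0]))

straight = arc_length(line, a[0], b[0])
print(straight)                                     # 5, the distance from a to b
for c in (-1.0, -0.1, 0.1, 1.0):
    print(c, arc_length(bumped(c), a[0], b[0]))     # always longer
```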

License

GNU Free Documentation License

Version 1.3, 3 November 2008 Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. <http://fsf.org/>

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

The "publisher" means any person or entity that distributes copies of the Document to the public.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

  1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
  2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
  3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
  4. Preserve all the copyright notices of the Document.
  5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
  6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
  7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
  8. Include an unaltered copy of this License.
  9. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
  10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
  11. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
  12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
  13. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.
  14. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
  15. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

11. RELICENSING

"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A "Massive Multiauthor Collaboration" (or "MMC") contained in the site means any set of copyrightable works thus published on the MMC site.

"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

"Incorporate" means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is "eligible for relicensing" if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

Copyright (c) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Last modified on 20 March 2007, at 01:35