Introduction to Numerical Methods/Regression

Objectives

  • define linear and non-linear regression models.
  • reason about goodness of fit criteria.
  • derive coefficients of linear regression models.
  • derive coefficients of non-linear regression models.

Resources

Regression is different from interpolation in that it allows us to approximate an overdetermined system, one that has more equations than unknowns. This is useful when an exact solution is too expensive to compute or is unnecessary because of errors in the data, such as measurement errors or random noise.

Linear Regression

Linear regression finds a linear function that most nearly passes through the given data points, i.e. the regression (function) line that best fits the data. We must first define a metric for measuring the goodness of fit. If all data points lie on the function the fit is perfect; otherwise the function represents the data with some error. We can measure the deviations (errors) of the data points from the function. As shown in the following example (source), neither the sum of errors nor the sum of absolute errors is a good metric. The data consist of four points (2, 4), (3, 6), (2, 6), and (3, 8), to which we fit a straight line. Two possible solutions are shown in the iPython notebook (Example 1).
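As an illustration of why the squared-error criterion is preferred, the small sketch below evaluates the three candidate metrics for two lines on these four points. The two lines, $y = 2x + 1$ and $y = 6$, are chosen here purely for illustration and are not necessarily the ones used in the notebook.

```python
# Compare goodness-of-fit metrics for two candidate lines on the example data.
points = [(2, 4), (3, 6), (2, 6), (3, 8)]

def metrics(predict):
    errors = [y - predict(x) for x, y in points]
    return (sum(errors),                    # sum of errors
            sum(abs(e) for e in errors),    # sum of absolute errors
            sum(e * e for e in errors))     # sum of squared errors

for name, line in [("y = 2x + 1", lambda x: 2 * x + 1),
                   ("y = 6",      lambda x: 6)]:
    print(name, metrics(line))

# Both lines give a zero sum of errors and the same sum of absolute errors (4),
# but the sum of squared errors distinguishes them (4 vs. 8).
```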

Straight Line (one variable)

Let's look at the example of fitting a straight line to data, i.e. finding a linear regression model with one variable that represents the data. The function is $f(x) = a_0 + a_1 x$ and the cost function, the sum of squares of the errors, to be minimized is

$S(a_0, a_1) = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2.$

We can minimize the function $S$ by setting its gradient to zero. Because the function has two parameters (variables), there are two gradient equations:

$\frac{\partial S}{\partial a_0} = -2\sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right) = 0$
$\frac{\partial S}{\partial a_1} = -2\sum_{i=1}^{n} x_i \left(y_i - a_0 - a_1 x_i\right) = 0$

The partial derivative represents the rate of change of the function value with respect to one independent variable while all other variables are held fixed. When a partial derivative becomes zero, the function has stopped changing in that direction. This implies that we have reached a minimum or a maximum; which of the two it is can be determined by checking the sign of the second derivative.
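For the sum-of-squares cost $S$ defined above, this check works out as follows (a quick verification):

$\frac{\partial^2 S}{\partial a_0^2} = 2n > 0, \qquad \frac{\partial^2 S}{\partial a_1^2} = 2\sum_{i=1}^{n} x_i^2 \ge 0,$

and the determinant of the Hessian, $4\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)$, is non-negative by the Cauchy-Schwarz inequality, so the stationary point is a minimum (strictly so whenever the $x_i$ are not all identical).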

Let's consider:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

and

$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$,

we can derive

$a_0 = \bar{y} - a_1 \bar{x}$.

Therefore, we can calculate $a_1$ and $a_0$ as follows:

$a_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
$a_0 = \bar{y} - a_1 \bar{x}$.

Given the following definitions:

$S_{xy} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right), \qquad S_{xx} = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,$

we get

$a_1 = \frac{S_{xy}}{S_{xx}}$
$a_0 = \bar{y} - a_1 \bar{x}$.
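A minimal sketch of these formulas applied to the four example points (plain Python, no libraries assumed):

```python
# Closed-form least-squares fit of y = a0 + a1*x to the example data.
xs = [2, 3, 2, 3]
ys = [4, 6, 6, 8]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# S_xy and S_xx as defined above
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

a1 = s_xy / s_xx
a0 = y_bar - a1 * x_bar
print(a0, a1)   # 1.0 2.0, i.e. the best-fit line is y = 1 + 2x
```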

Multi-linear Regression

A linear regression line models the relationship between independent variables (predictors) and a response variable. Once the model is built it can be used for prediction. In almost all real-world regression models, multiple predictors (independent variables) are involved to model the multiple factors that affect the response (dependent variable) of the system. Such models are known as multi-linear (multiple linear regression) models.
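As a rough sketch of what such a model looks like in practice, the snippet below fits $y = a_0 + a_1 x_1 + a_2 x_2$ by least squares; the data are made up purely for illustration.

```python
import numpy as np

# Multi-linear model y = a0 + a1*x1 + a2*x2 fitted by least squares.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([4.1, 5.9, 9.2, 10.8, 14.1])

# Each row of A holds the multipliers of the parameters [a0, a1, a2].
A = np.column_stack([np.ones_like(x1), x1, x2])
a0, a1, a2 = np.linalg.lstsq(A, y, rcond=None)[0]

# Once the model is built it can be used for prediction at new inputs.
print(a0 + a1 * 2.5 + a2 * 3.0)
```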

Normal Equation

Another way to find the parameters of a regression model that minimize the sum of squares of the errors is to solve the corresponding normal equation. Given a matrix equation $A\boldsymbol{x} = \boldsymbol{b}$, the normal equation is the equation whose solution minimizes the sum of the squared differences between the left and right sides:

$A^{T}A\boldsymbol{x} = A^{T}\boldsymbol{b}.$

Given a hypothesized regression model and a dataset we can construct the left side to express the values our model would predict from the data. $A$ should be an $m \times n$ matrix where each row represents one of the $m$ data points (samples) and each entry of a row is the multiplier of one of the $n$ parameters. $\boldsymbol{x}$ is an $n \times 1$ column vector of the unknown parameters. The right side $\boldsymbol{b}$ should be an $m \times 1$ column vector that stores the observed $y$ values for the data points.

Recall the example that fits a straight line (model) $y = a_0 + a_1 x$ to the following data points.

data points
x y
2 4
2 6
3 6
3 8

We can construct the left side of the matrix equation $A\boldsymbol{x}$ as:

$A\boldsymbol{x} = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix},$

which should result in a column vector of the values our model would predict:

$A\boldsymbol{x} = \begin{bmatrix} a_0 + 2a_1 \\ a_0 + 2a_1 \\ a_0 + 3a_1 \\ a_0 + 3a_1 \end{bmatrix}.$

The right side should be a vector of corresponding values from the data points:

$\boldsymbol{b} = \begin{bmatrix} 4 \\ 6 \\ 6 \\ 8 \end{bmatrix}.$

Minimizing the sum of the squared differences between the left and right sides is equivalent to solving the following normal equation:

$A^{T}A\boldsymbol{x} = A^{T}\boldsymbol{b}.$

The normal equation for our example is:

$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 3 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 3 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 6 \\ 6 \\ 8 \end{bmatrix},$

which simplifies to

$\begin{bmatrix} 4 & 10 \\ 10 & 26 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 24 \\ 62 \end{bmatrix}.$

If $A^{T}A$ is invertible, the solution vector is unique and gives us the values of the parameters $a_0$ and $a_1$ that minimize the difference between the model and the data, i.e. the model that best fits the data. For this example the solution is $a_0 = 1$, $a_1 = 2$, i.e. the line $y = 1 + 2x$.
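A minimal NumPy sketch of this calculation, solving the normal equation for the $A$ and $\boldsymbol{b}$ constructed above:

```python
import numpy as np

# Normal-equation solution of the straight-line fit y = a0 + a1*x.
A = np.array([[1, 2],
              [1, 2],
              [1, 3],
              [1, 3]], dtype=float)
b = np.array([4, 6, 6, 8], dtype=float)

# Solve (A^T A) x = A^T b; this requires A^T A to be invertible.
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)   # [1. 2.]  ->  y = 1 + 2x
```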

When we determine a regression line of the form $y = a_0 + a_1 x$ that fits the four data points, we have four equations that can be written as $A\boldsymbol{x} = \boldsymbol{b}$, or

$\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \\ 6 \\ 8 \end{bmatrix}.$

To solve the model we are effectively minimizing the squared length of the residual vector, $\lVert A\boldsymbol{x} - \boldsymbol{b} \rVert^2$:

$\lVert A\boldsymbol{x} - \boldsymbol{b} \rVert^2 = (a_0 + 2a_1 - 4)^2 + (a_0 + 2a_1 - 6)^2 + (a_0 + 3a_1 - 6)^2 + (a_0 + 3a_1 - 8)^2,$

which is the same sum of squared errors as before. Setting its partial derivatives with respect to $a_0$ and $a_1$ to zero,

$\frac{\partial}{\partial a_0} \lVert A\boldsymbol{x} - \boldsymbol{b} \rVert^2 = 0, \qquad \frac{\partial}{\partial a_1} \lVert A\boldsymbol{x} - \boldsymbol{b} \rVert^2 = 0,$

again leads to the normal equation $A^{T}A\boldsymbol{x} = A^{T}\boldsymbol{b}$.

The method introduced in the previous section boils down to solving a system of linear equations with two unknowns in the following form:

$\begin{cases} n\,a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\ a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \end{cases}$

It is not hard to imagine that with $k$ independent variables and $n$ data points we can derive a similar system of linear equations with $k+1$ unknowns. Numerical methods, such as Gaussian elimination, can then be used to solve for the parameters. We could also use the normal equation and matrix operations to solve for the parameters.
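For example, the $2 \times 2$ system obtained above for the four data points can be solved with a basic Gaussian elimination routine. The sketch below uses naive elimination without pivoting, which is enough for this small, well-behaved system; a library solver would normally be used instead.

```python
def gaussian_elimination(M, v):
    """Solve M x = v by forward elimination and back substitution (no pivoting)."""
    n = len(v)
    A = [row[:] for row in M]          # work on copies
    b = list(v)
    for k in range(n - 1):             # eliminate entries below the diagonal
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# Normal-equation system for the example:  4*a0 + 10*a1 = 24,  10*a0 + 26*a1 = 62
print(gaussian_elimination([[4.0, 10.0], [10.0, 26.0]], [24.0, 62.0]))   # [1.0, 2.0]
```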

Gradient Descent

Gradient descent is a method for finding a local minimum of a function. The method is based on the concept of the gradient, which generalizes the derivative of a function of one variable (the slope or tangent) to a function of several variables. For a function of $n$ variables, the gradient at a particular point is a vector whose components are the $n$ partial derivatives of the function. The following figure shows the gradient of a function.

[Figure: the gradient of the function f(x, y) = −(cos²x + cos²y)², depicted as a projected vector field on the bottom plane.]

Because the gradient points in the direction of the greatest rate of increase of the function, and its magnitude is the slope of the graph in that direction, we can start at a random point and take steps proportional to the negative of the gradient at the current point to find a local minimum. This method is called gradient descent, or steepest descent.

Recall that in multiple dimensions the components of the gradient are the partial derivatives of the cost function with respect to the parameters, one per dimension. For the straight-line model they are

$\frac{\partial S}{\partial a_0} = -2\sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)$

$\frac{\partial S}{\partial a_1} = -2\sum_{i=1}^{n} x_i \left(y_i - a_0 - a_1 x_i\right)$

and each gradient descent step moves the parameters against the gradient with a step size (learning rate) $\alpha$:

$a_0 := a_0 - \alpha \frac{\partial S}{\partial a_0}, \qquad a_1 := a_1 - \alpha \frac{\partial S}{\partial a_1}.$
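A small sketch of gradient descent applied to the straight-line fit of the four example points; the starting point, step size, and iteration count are arbitrary illustrative choices.

```python
# Gradient descent for the straight-line model y = a0 + a1*x on the example data.
xs = [2.0, 3.0, 2.0, 3.0]
ys = [4.0, 6.0, 6.0, 8.0]

a0, a1 = 0.0, 0.0        # arbitrary starting point
alpha = 0.01             # step size (learning rate)

for _ in range(20000):
    # Components of the gradient of S(a0, a1) = sum (y_i - a0 - a1*x_i)^2
    g0 = -2 * sum(y - a0 - a1 * x for x, y in zip(xs, ys))
    g1 = -2 * sum(x * (y - a0 - a1 * x) for x, y in zip(xs, ys))
    # Step in the direction of the negative gradient
    a0 -= alpha * g0
    a1 -= alpha * g1

print(a0, a1)   # approaches 1.0 and 2.0, the least-squares solution
```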

Gradient descent can be used to solve non-linear regression models.

Non-linear Regression

Non-linear regression models the relationship in observational data by a function that is a non-linear combination of the model parameters and depends on one or more independent variables. Some non-linear regression problems can be transformed into a linear domain. For example, fitting the exponential model $y = a e^{bx}$ is equivalent to fitting the linear model $\hat{y} = \hat{a} + b x$, where $\hat{y} = \ln y$ and $\hat{a} = \ln a$.
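A short sketch of such a log-transform, assuming the exponential model $y = a e^{bx}$; the data are synthetic, generated with $a = 2$ and $b = 0.5$ so the recovered parameters can be checked.

```python
import numpy as np

# Linearize y = a*exp(b*x): fit ln(y) = ln(a) + b*x with ordinary least squares.
x = np.linspace(0.0, 4.0, 9)
y = 2.0 * np.exp(0.5 * x)          # synthetic data with a = 2, b = 0.5

A = np.column_stack([np.ones_like(x), x])     # columns multiply [ln(a), b]
ln_a, b = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
print(np.exp(ln_a), b)             # recovers approximately 2.0 and 0.5
```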

Here is an example of a gradient descent solution to a non-linear regression model.
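The original notebook is not reproduced here; the following is a minimal sketch of the idea, fitting $y = a e^{bx}$ by gradient descent on the sum of squared errors. The synthetic data, starting guess, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

# Gradient descent on the non-linear model y = a*exp(b*x), minimizing
# S(a, b) = sum (y_i - a*exp(b*x_i))^2.
x = np.linspace(0.0, 1.0, 9)
y = 2.0 * np.exp(0.5 * x)          # synthetic data with a = 2, b = 0.5

a, b = 1.0, 0.0                    # starting guess
alpha = 0.002                      # step size; non-linear fits are sensitive to this

for _ in range(50000):
    r = y - a * np.exp(b * x)                          # residuals
    grad_a = -2.0 * np.sum(np.exp(b * x) * r)          # dS/da
    grad_b = -2.0 * np.sum(a * x * np.exp(b * x) * r)  # dS/db
    a -= alpha * grad_a
    b -= alpha * grad_b

print(a, b)   # approaches 2.0 and 0.5
```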