The methods in this section are for numerically solving equations of the form *f*(*x*) = 0 that cannot be solved analytically, or that are too difficult to solve analytically.

## Change of sign methods

### Introduction

If we have a continuous function *f*(*x*), and two *x* values *a* and *b*, then provided *f*(*a*) and *f*(*b*) have opposite signs, we know that the interval [*a*, *b*] contains a root of the equation *f*(*x*) = 0 (somewhere between *a* and *b* must be a value where *f*(*x*) = 0).

Graphically, if the curve y = *f*(*x*) is above the *x*-axis at one point and below it at another, it must have crossed the *x*-axis somewhere in between. Therefore, a root of *f*(*x*) = 0 lies somewhere in between the two points.

Change of sign methods use this information to progressively shrink an interval containing a change of sign, in order to find a root.

Note that:

- The function must be continuous within the interval for the above to hold true
- There may be more than one root in the interval. For example, take *f*(*x*) = *x*^{3} - *x* and solve *f*(*x*) = 0. The interval [-2, 2] has a change of sign and contains three roots: *x* = -1, *x* = 0 and *x* = 1.
- Repeated roots do not cause a change of sign. For example, solving *f*(*x*) = 0 where *f*(*x*) = (*x* - 1)^{2}: *f*(*x*) evaluates as positive at the endpoints of any interval, yet there is a repeated root at *x* = 1.
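The sign-change test can be sketched in a few lines of Python. This is an illustrative sketch, not from the text; the name `has_sign_change` is invented for the example:

```python
def has_sign_change(f, a, b):
    """True if f(a) and f(b) have opposite signs, so [a, b] brackets a root."""
    return f(a) * f(b) < 0

f = lambda x: x**3 - x

# [-2, 2] shows a change of sign even though it contains three roots.
print(has_sign_change(f, -2, 2))   # True

# A repeated root, as in (x - 1)^2, produces no sign change.
g = lambda x: (x - 1)**2
print(has_sign_change(g, 0, 2))    # False
```

Note that the test only confirms *at least one* root lies in the interval; it says nothing about how many.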

### Bisection

Bisection is a change of sign method. It requires an initial interval containing a change of sign. On each step (called an **iteration**), the bisection method involves doing the following:

- Bisect (divide into 2 equal parts) an interval in which a root is known to lie, hence giving two new intervals.
- Evaluate the function at the endpoints of the two new intervals.
- A change of sign indicates (provided the function is continuous) that there is a root in that interval. Deduce that a root lies in whichever new interval has a change of sign between its endpoints.
- Take the interval containing a root as your new starting interval in the next iteration.

Given an interval [*a*, *b*], the new estimate of the root, *x*, is the midpoint of the interval: *x* = (*a* + *b*)/2.

The function is then evaluated at *a*, *b* and *x*, and the interval containing a sign change - either [*a*, *x*] or [*x*, *b*] - is selected as the new interval.
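The iteration above can be sketched in Python. This is a minimal illustration, not from the text; the name `bisect` and the tolerance parameter `tol` are invented for the example:

```python
def bisect(f, a, b, tol=1e-6):
    """Bisection: repeatedly halve [a, b], keeping the half with a sign change."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        x = (a + b) / 2          # midpoint: the new estimate of the root
        if f(a) * f(x) < 0:      # sign change in [a, x]
            b = x
        else:                    # otherwise the root lies in [x, b]
            a = x
    return (a + b) / 2

# x^3 - x - 2 = 0 has a sign change over [1, 2]; the root is near 1.5213797
print(bisect(lambda x: x**3 - x - 2, 1, 2))
```

Because the interval halves each iteration, the final interval width (and hence the error bound) is known in advance from the number of iterations.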

[graph, example]

#### Advantages

- Bisection always converges - it will always find a root in the given starting interval.
- It carries a definite statement of the bounds in which the result must lie. Numerical work is of much more value if you know how accurate the answer you obtain is.
- It has a fixed rate of convergence - the interval halves in size on each iteration. Under certain conditions, other methods may converge more slowly.

#### Disadvantages

- It requires a starting interval containing a change of sign. Therefore it cannot find repeated roots.
- It has a fixed rate of convergence, which can be much slower than other methods, requiring more iterations to find the root to a given degree of precision.

### False position

False position expands on bisection by using information about the value of the function at the endpoints to make a more intelligent estimate of the root. If the value of *f*(*x*) at one endpoint is closer to zero than at the other endpoint, then you would expect the root to lie closer to that endpoint. In many cases, this will lead to faster convergence on the root than bisection, which always estimates that the root is directly in the middle of the interval.

The procedure is nearly identical to bisection. However, given an interval [*a*, *b*], the new estimate of the root, *x*, is instead:

*x* = (*a* *f*(*b*) - *b* *f*(*a*)) / (*f*(*b*) - *f*(*a*))

This can be seen as a weighted average of *a* and *b*, with each endpoint weighted by the size of *f*(*x*) at the *other* endpoint. We want the endpoint where *f*(*x*) is closer to zero to have the larger weight, so that the estimate lies closer to it; we therefore use the value of *f*(*x*) at the opposite endpoint as each endpoint's weighting.

The false position method is equivalent to constructing a line through the points on the curve at *x*=*a* and *x*=*b*, and using the intersection of this line with the *x*-axis as the new estimate.
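A Python sketch of the procedure, assuming the weighted-average formula above; the names `false_position`, `tol` and `max_iter` are invented for the example:

```python
def false_position(f, a, b, tol=1e-9, max_iter=100):
    """False position: intersect the chord through (a, f(a)), (b, f(b)) with the x-axis."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x = a
    for _ in range(max_iter):
        # weighted average of a and b: the x-intercept of the chord
        x = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if abs(f(x)) < tol:
            break
        if f(a) * f(x) < 0:      # sign change in [a, x]
            b = x
        else:                    # otherwise the root lies in [x, b]
            a = x
    return x
```

Unlike bisection, the interval width need not shrink to zero (one endpoint can stay fixed), so the stopping test here is on the size of *f*(*x*) rather than on the interval length.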

[graph, example, advantages, disadvantages]

## Other methods

### Iterative method notation

*x*_{r} means the value of *x* after *r* iterations. For example, the initial estimate of the root would be *x*_{0}, and the estimate obtained by performing one iteration of the method on this would be *x*_{1}.

We write down iterative methods in the form *x*_{r+1} (the next estimate of the root) in terms of *x*_{r} (the current estimate of the root). For example, the fixed point iteration method is:

*x*_{r+1} = *g*(*x*_{r})

If the new estimate is calculated from the previous two estimates, as in the case of the secant method, the new estimate of the root will be *x*_{r+2}, written in terms of *x*_{r+1} and *x*_{r}.

### Secant

This is similar in some ways to the false position method. It uses the co-ordinates of the previous two points on the curve to approximate the curve by a straight line. Like the false position method, it uses the place where this line crosses the axis as the new estimate of the root.
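The secant method can be sketched in Python; the names `secant`, `tol` and `max_iter` are invented for the example:

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant method: line through the previous two points on the curve."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:             # line is horizontal; cannot cross the axis
            break
        # x-intercept of the line through (x0, f0) and (x1, f1)
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2          # the two most recent estimates carry forward
    return x1
```

Note that, unlike false position, the two points used need not bracket the root, so convergence is not guaranteed.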

[graph, example, advantages, disadvantages]

### Newton-Raphson

The Newton-Raphson method is based on obtaining a new estimate of the root on each iteration by approximating *f* by its tangent and finding where this tangent crosses the *x*-axis.
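The tangent at *x*_{r} crosses the *x*-axis at *x*_{r} - *f*(*x*_{r})/*f*'(*x*_{r}), which gives the iteration below. The names `newton_raphson`, `f_prime`, `tol` and `max_iter` are invented for this sketch:

```python
def newton_raphson(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson: follow the tangent at the current estimate down to the x-axis."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)   # fails if f'(x) = 0 (horizontal tangent)
        x -= step                  # tangent crosses the axis at x - f(x)/f'(x)
        if abs(step) < tol:
            break
    return x

# Root of x^2 - 2 = 0 starting from x0 = 1 converges to sqrt(2)
print(newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))
```

The method needs the derivative *f*' as well as *f*, which the secant method avoids by approximating the tangent with a chord.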

[graph, example, advantages, disadvantages]

### Fixed point iteration

In this method, the equation *f*(*x*) = 0 is rearranged into the form *x* = *g*(*x*). We then take an initial estimate of *x* as the starting value, and calculate each new estimate by applying *g* to the previous one.

If this method converges, then provided g is continuous it will converge to a **fixed point** of *g* (where the input is the same as the output, giving *x* = *g*(*x*) ). This value of *x* will also satisfy *f*(*x*) = 0, hence giving a root of the equation.

Fixed point iteration will not always converge. There are infinitely many rearrangements of *f*(*x*) = 0 into *x* = *g*(*x*). Some rearrangements will only converge given a starting value very close to the root, and some will not converge at all. In all cases, it is the gradient of *g* that determines whether convergence occurs. Convergence will occur (given a suitable starting value) if, when near the root:

|*g*'(*x*)| < 1

where *g*' is the gradient of *g*. Convergence will be fastest when *g*'(*x*) is close to 0.
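A Python sketch of the iteration, using one possible rearrangement of *x*^{3} - *x* - 2 = 0 as *x* = (*x* + 2)^{1/3} (chosen for this example because |*g*'(*x*)| < 1 near the root); the names `fixed_point`, `tol` and `max_iter` are invented:

```python
def fixed_point(g, x0, tol=1e-10, max_iter=100):
    """Iterate x_{r+1} = g(x_r) until successive estimates agree."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:   # estimates have stopped changing
            return x_new
        x = x_new
    return x

# x^3 - x - 2 = 0 rearranged as x = (x + 2)^(1/3); converges to the root near 1.5213797
print(fixed_point(lambda x: (x + 2) ** (1 / 3), 1.0))
```

Rearranging the same equation as *x* = *x*^{3} - 2 instead gives |*g*'(*x*)| > 1 near the root, and the iteration diverges, illustrating why the choice of rearrangement matters.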

[staircase + cobweb diagrams, examples, advantages, disadvantages]