Solution of Algebraic and Transcendental Equations

An equation of the type $f(x)=0$ is either algebraic or transcendental.

E.g, these equations are algebraic.

2x=5;\quad x^{2}+x=1;\quad x^{7}=x(1+2x).

and these are transcendental

\sin x=0;\quad \quad e^{\sqrt {x}}=\pi ;\quad \tan x=x.

While roots can be found directly for algebraic equations of fourth order or lower, and for a few special transcendental equations, in practice we need to solve equations of higher order and also arbitrary transcendental equations.

As analytic solutions are often either too cumbersome or simply do not exist, we need to find an approximate method of solution. This is where numerical analysis comes into the picture.

Background

Some Useful Observations

The total number of roots an algebraic equation can have is the same as its degree.
An algebraic equation can have at most as many positive roots as the number of changes of sign in $f(x)$ .
An algebraic equation can have at most as many negative roots as the number of changes of sign in $f(-x)$ .
In an algebraic equation with real coefficients, complex roots occur in conjugate pairs
If $f(x)=a_{0}x^{n}+a_{1}x^{n-1}+a_{2}x^{n-2}+...+a_{n-1}x+a_{n}$ with roots $\alpha _{1},\alpha _{2},...,\alpha _{n}$ then the following hold good:
- $\sum _{i}\alpha _{i}={\frac {-a_{1}}{a_{0}}}$
- $\sum _{i<j}{\alpha _{i}\alpha _{j}}={\frac {a_{2}}{a_{0}}}$
- $\prod _{i}\alpha _{i}=(-1)^{n}{\frac {a_{n}}{a_{0}}}$
If $f(x)$ is continuous in the interval $[a,b]$ and $f(a)f(b)<0$ then a root must exist in the interval $(a,b)$

Initial Approximation

The last point about the interval is one of the most useful properties numerical methods use to find the roots. All of them have in common the requirement that we need to make an initial guess for the root. Practically, this is easy to do graphically. Simply plot the equation and make a rough estimate of the solution. Analytically, we can usually choose any point in an interval where a change of sign takes place. However, this is subject to certain conditions that vary from method to method.

Convergence

A numerical method to solve equations will be a long process. We would like to know, if the method will lead to a solution (close to the exact solution) or will lead us away from the solution. If the method, leads to the solution, then we say that the method is convergent. Otherwise, the method is said to be divergent.i.e, in case of linear and non linear interpolation convergence means tends to 0.

Rate of Convergence

Various methods converge to the root at different rates. That is, some methods are slow to converge and it takes a long time to arrive at the root, while other methods can lead us to the root faster. This is in general a compromise between ease of calculation and time.

For a computer program however, it is generally better to look at methods which converge quickly. The rate of convergence could be linear or of some higher order. The higher the order, the faster the method converges.

If $e_{i}$ is the magnitude of the error in the $i$ th iteration, ignoring sign, then the order is $n$ if ${\frac {e_{i+1}}{e_{i}^{n}}}$ is approximately constant.

It is also important to note that the chosen method will converge only if $e_{i+1}<e_{i}$ .

Methods

Bisection Method

This is one of the simplest methods and is strongly based on the property of intervals. To find a root using this method, the first thing to do is to find an interval $[a,b]$ such that $f(a)\cdot f(b)<0$ . Bisect this interval to get a point $(c,f(c))$ . Choose one of $a$ or $b$ so that the sign of $f(c)$ is opposite to the ordinate at that point. Use this as the new interval and proceed until you get the root within desired accuracy.

Example

Solve $e^{x}-2x^{2}+3x+1$ correct up to 2 decimal places.

f(x)=e^{x}-2x^{2}+3x+1

$\Rightarrow f(2)\cdot f(3)<0$ $\Rightarrow \epsilon ={\frac {1}{2}}10^{-2}$ $\Rightarrow i={\frac {2.3010}{0.3010}}=8$ $\Rightarrow x_{1}=2.5$

f(x_{1})=5.625>0

$\Rightarrow x_{2}=2.25$ $\Rightarrow ...x_{8}=2.09$

Error Analysis

The maximum error after the $i$ th iteration using this process will be given as

\epsilon _{i}={\frac {|b-a|}{2^{i}}}

$\Rightarrow i\geq {\frac {[\log(b-a)-\log \epsilon _{i}]}{\log 2}}$

As the interval at each iteration is halved, we have ${\frac {\epsilon _{i+1}}{\epsilon _{i}}}={\frac {1}{2}}$ . Thus this method converges linearly.

If we are interested in the number of iterations the Bisection Method needs to converge to a root within a certain tolerance than we can use the formula for the maximum error.

Example

How many iterations do you need to get the root if you start with a = 1 and b = 2 and the tolerance is 10⁻⁴?

The error $\epsilon _{i}$ needs to be smaller than 10⁻⁴. Use the formula for the maximum error:

\epsilon _{i}=2^{-i}\cdot (2-1)<10^{-4}

\epsilon _{i}=2^{-i}<10^{-4}

Solve for i using log rules

\displaystyle \log _{10}{2^{-i}}<\log _{10}{10^{-4}}

\displaystyle -i\cdot \log _{10}{2}<-4

\displaystyle i>{\frac {4}{\log _{10}{2}}}=13.29=14

Hence 14 iterations will ensure an approximation to the root accurate to $\displaystyle 10^{-4}$ . Note: the error analysis only gives a bound approximation to the error; the actual error may be much smaller.

False Position Method

The false position method (sometimes called the regula falsi method) is essentially same as the bisection method -- except that instead of bisecting the interval, we find where the chord joining the two points meets the X axis. The roots are calculated using the equation of the chord, i.e. putting $y=0$ in

y-f(x_{0})={\frac {f(x_{1})-f(x_{0})}{x_{1}-x_{0}}}(x-x_{0})

.

The rate of convergence is still linear but faster than that of the bisection method. Both these methods will fail if f has a double root.

Example

Consider f(x)=x²-1. We already know the roots of this equation, so we can easily check how fast the regula falsi method converges.

For our initial guess, we'll use the interval [0,2].

Since f is concave upwards and increasing, a quick sketch of the geometry shows that the chord will always intersect the x-axis to the left of the solution. This can be confirmed by a little algebra.

We'll call our n^th iteration of the interval [a_n, 2]

The chord intersects the x-axis when

-(a_{n}^{2}-1)={\frac {(2^{2}-1)-(a_{n}^{2}-1)}{2-a_{n}}}(x-a_{n})

Rearranging and simplifying gives

x={\frac {1+2a_{n}}{2+a_{n}}}

Since this is always less than the root, it is also a_n+1

The difference between a_n and the root is e_n=a_n-1, but

e_{n+1}={\frac {1+2a_{n}}{2+a_{n}}}-1={\frac {a_{n}-1}{a_{n}+2}}={\frac {1}{a_{n}+2}}e_{n}

This is always smaller than e_n when a_n is positive. When a_n approaches 1, each extra iteration reduces the error by two-thirds, rather than one-half as the bisection method would.

The order of convergence of this method is 2/3 and is linear.

In this case, the lower end of the interval tends to the root, and the minimum error tends to zero, but the upper limit and maximum error remain fixed. This is not uncommon.

Fixed Point Iteration (or Staircase method or x = g(x) method or Iterative method)

If we can write f(x)=0 in the form x=g(x), then the point x would be a fixed point of the function g (that is, the input of g is also the output). Then an obvious sequence to consider is

x_{n+1}=g(x_{n})\,

If we look at this on a graph we can see how this could converge to the intersection.

Any function can be written in this form if we define x=g(x), though in some cases other rearrangements may prove more useful(must satisfy the condition |g(x)|<1). This method is useful for finding a positive root to the infinite series also.

Fixed Point Theorem (Statement)

If g is continuous on[a,b] and a ≤ g(x) ≤ b for all x in [a,b], then g has at least a fixed point in [a,b].

Further, suppose g'(x) is continuous on (a,b) and that a positive constant c exists with |g'(x)| ≤ c <1, for all x in (a,b). Then there is a unique fixed point α of g in [a,b]. Also, the iterates x_n+1 = g(x_n) n≥0 will converge to α for any choice of x₀ in [a,b].

Error analysis

We define the error at the n^th step to be

e_{n}=x_{n}-x{\mbox{ where }}x=g(x)\,

Then we have

{\begin{matrix}e_{n+1}&=&x_{n+1}-x&\\&=&g(x_{n})-x&\\&=&g(x+e_{n})-x&\\&=&-x+g(x)+e_{n}g^{\prime }(x)+\cdots &\\&=&e_{n}g^{\prime }(x)+\cdots &{\mbox{since }}g(x)=x\end{matrix}}

So, when |g′(x)|<l, this sequence converges to a root, and the error will be approximately proportional to n.

Because the relationship between e_n+1 and e_n is linear, we say that this method converges linearly, if it converges at all.

When g(x)=f(x)+x this means that if

x_{n+1}=x_{n}+f(x_{n})\,

.

converges to a root, x, of f then

-2<f^{\prime }(x)<0\,

Note that this convergence will only happen for a certain range of x. If the first estimate is outside that range then no solution will be found.

Also note that although this is a necessary condition for convergence, it does not guarantee convergence. In the error analysis we neglected higher powers of e_n, but we can only do this if e_n is small. If our initial error is large, the higher powers may prevent convergence, even when the condition is satisfied.

If |g′(x)|<1 is true at the root, the iteration sequence will converge in some interval around the root, which may be smaller than the interval where |g′(x)|<1. If |g′(x)| isn't smaller than one at the root, the iteration will not converge to that root.

Example

Lets consider $f(x)=x^{3}+x-2$ , which we can see has a single root at x=1. There are several ways f(x)=0 can be written in the desired form, x=g(x).

The simplest is

1): $x_{n+1}=x_{n}+f(x_{n})=x_{n}^{3}+2x_{n}-2$

In this case, $g'(x)=3x^{2}+2$ , and the convergence condition is

1>|g'(x)|=3x^{2}+2,\qquad -1>3x^{2}

Since this is never true, this doesn't converge to the root.

2)An alternate rearrangement is

x_{n+1}=2-x_{n}^{3}

This converges when

1>|g'(x)|=|-3x^{2}|,\qquad x^{2}<{\frac {1}{3}},\qquad |x|<{\frac {1}{\sqrt {3}}}

Since this range does not include the root, this method won't converge either.

3)Another obvious rearrangement is

x_{n+1}={\sqrt[{3}]{2-x_{n}}}

In this case the convergence condition becomes

{\frac {1}{3}}\left|(2-x_{n})^{-{\frac {2}{3}}}\right|<1,\qquad \left(2-x_{n}\right)^{-2}<3^{3},\qquad |x_{n}-2|>{\sqrt {27}}

Again, this region excludes the root.

4)Another possibility is obtained by dividing by x²+1

x_{n+1}={\frac {2}{x_{n}^{2}+1}}

In this case the convergence condition becomes

{\frac {4|x|}{(1+x^{2})^{2}}}<1,\qquad 4|x|<(1+x^{2})^{2}

Consideration of this inequality shows it is satisfied if x>1, so if we start with such an x, this will converge to the root.

Clearly, finding a method of this type which converges is not always straightforwards.

Newton-Raphson

In numerical analysis, Newton's method (also known as the Newton–Raphson method or the Newton–Fourier method) is an efficient algorithm for finding approximations to the zeros (or roots) of a real-valued function. As such, it is an example of a root-finding algorithm.

Any zero-finding method (Bisection Method, False Position Method, Newton-Raphson, etc.) can also be used to find a minimum or maximum of such a function, by finding a zero in the function's first derivative, see Newton's method as an optimization algorithm.

Description of the method

The idea of the Newton-Raphson method is as follows: one starts with an initial guess which is reasonably close to the true root, then the function is approximated by its tangent line (which can be computed using the tools of calculus), and one computes the x-intercept of this tangent line (which is easily done with elementary algebra). This x-intercept will typically be a better approximation to the function's root than the original guess, and the method can be iterated. Suppose f : [a, b] → R is a differentiable function defined on the interval [a, b] with values in the real numbers R. The formula for converging on the root can be easily derived. Suppose we have some current approximation xn. Then we can derive the formula for a better approximation, xn+1 by referring to the diagram on the right. We know from the definition of the derivative at a given point that it is the slope of a tangent at that point.

We can get better convergence if we know about the function's derivatives. Consider the tangent to the function:

Tangent is in red; the function itself in blue.

Near any point, the tangent at that point is approximately the same as f('x) itself, so we can use the tangent to approximate the function.

The tangent through the point (x_n, f(x_n)) is

{\frac {y-f(x_{n})}{x-x_{n}}}=f'(x_{n})

The next approximation, x_n+1, is where the tangent line intersects the axis, so where y=0. Rearranging, we find

x_{n+1}=x_{n}-{\frac {f(x_{n})}{f'(x_{n})}}

Error analysis

Again, we define the root to be x, and the error at the n^th step to be

e_{n}=x_{n}-x\,

Then the error at the next step is

e_{n+1}=e_{n}-{\frac {f(x)+e_{n}f'(x)+{\frac {1}{2}}e_{n}^{2}f''(x)+\cdots }{f'(x)+e_{n}f''(x)+\cdots }}

(1)

where we've written f as a Taylor series round its root, x. Rearranging this, and using f(x)=0, we get

{\begin{matrix}e_{n+1}&=&e_{n}&-&e_{n}\left(f'(x)+{\frac {1}{2}}e_{n}f''(x)\right)/f'(x)+\cdots \\&=&-{\frac {f''(x)}{f'(x)}}e_{n}^{2}&+&\cdots \end{matrix}}

where we've neglected cubic and higher powers of the error, since they will be much smaller than the squared term, when the error itself is small.

Notice that the error is squared at each step. This means that the number of correct decimal places doubles with each step, much faster than linear convergence.

This sequence will converge if

\left|{\frac {f''(x)}{f'(x)}}e_{n}^{2}\right|<|e_{n}|,\qquad |e_{n}|<{\frac {f'(x)}{f''(x)}}

If f′ isn't zero at the root, then there will always be a range round the root where this method converges.

If f′ is zero at the root, then on looking again at (1) we see that we get

e_{n+1}=e_{n}/2\,

and the convergence becomes merely linear.

Overall, this method works well, provided f does not have a minimum near its root, but it can only be used if the derivative is known.

Example

Let's consider f(x)=x²-a. Here, we know the roots exactly, so we can see better just how well the method converges.

We have

x_{n+1}=x_{n}-{\frac {f(x_{n})}{f'(x_{n})}}=x_{n}-{\frac {x_{n}^{2}-a}{2x_{n}}}={\frac {x_{n}^{2}+a}{2x_{n}}}={\frac {1}{2}}\left(x_{n}+{\frac {a}{x_{n}}}\right)

This method is easily implemented, even with just pen and paper, and has been used to rapidly estimate square roots since long before Newton.

The n^th error is e_n=x_n-√a, so we have

{\begin{matrix}e_{n+1}&=&{\frac {(e_{n}+{\sqrt {a}})^{2}+a}{2({\sqrt {a}}+e_{n})}}-{\sqrt {a}}\\&=&{\frac {2a+2e_{n}{\sqrt {a}}+e_{n}^{2}}{2({\sqrt {a}}+e_{n})}}-{\sqrt {a}}\\&=&{\frac {2({\sqrt {a}}+e_{n}){\sqrt {a}}+e_{n}^{2}}{2({\sqrt {a}}+e_{n})}}-{\sqrt {a}}\\&=&{\frac {e_{n}^{2}}{2({\sqrt {a}}+e_{n})}}\end{matrix}}

If a=0, this simplifies to e_n/2, as expected.

If a>0, e_n+1 will be positive, provided e_n is greater than -√a, i.e provided x_n is positive. Thus, starting from any positive number, all the errors, except perhaps the first will be positive.

The method converges when,

|e_{n+1}|=\left|{\frac {e_{n}^{2}}{2({\sqrt {a}}+e_{n})}}\right|<|e_{n}|

so, assuming e_n is positive, it converges when

e_{n}<2\left(e_{n}+{\sqrt {a}}\right)

which is always true.

This method converges to the square root, starting from any positive number, and it does so quadratically.

Higher order methods

There are methods that converge even faster than Newton-Raphson. e.g

x_{n+1}=x_{n}-{\frac {f(x_{n})}{f'(x_{n})}}+{\frac {1}{2}}{\frac {f''(x_{n})f^{2}(x_{n})}{{f'}^{3}(x_{n})}}

which converge cubicly, tripling the number of correct digits at each iteration, which is 50% faster than Newton-Raphson. However, if iterating each step takes 50% longer, due to the more complex formula, there is no net gain in speed. For this reason, methods such as this are seldom used. Ostrowski's efficiency index describes the trade-off well: if each step takes $n$ units of work (e.g. evaluating a function) for an algorithm of convergence order $p$ , the "effective work" required for a given amount of convergence is $n 1/ p$ .^[1]

A quite honorable mention here is the ITP method, an improvement of the false position method. It provides a guaranteed (tunable, typically) ${\sqrt {2}}$ order of convergence without requiring a derivative.

Main Page - Mathematics bookshelf - Numerical Methods

↑ A. M. Ostrowski, “Solution of Equations and Systems of Equations”, Academic Press, New York, 1960.

[1] A. M. Ostrowski, “Solution of Equations and Systems of Equations”, Academic Press, New York, 1960.

[1]