BASIC Programming

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/BASIC_Programming

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Introduction

Normative BASIC

The BASIC programming language has been standardized, first in the United States of America (USA) by the American National Standards Institute (ANSI), and later in Europe by the European Computer Manufacturers Association (ECMA). ANSI issued the American National Standard (ANS) X3.60-1978 for Minimal BASIC and X3.113-1987 for Full BASIC, while ECMA issued Standard 55 for Minimal BASIC in 1978 and Standard 116 for Full BASIC in 1986.

The aim of the standards is to promote the interchangeability of BASIC programs among a variety of systems. Through close co-operation between both organizations, it was possible to maintain full compatibility between the respective ANSI and ECMA standards.

The standards establish, among other things:

the syntax of a program written in BASIC, and
the semantic rules for interpreting the meaning of a program written in BASIC.

Nowadays, only the ECMA standards are publicly available.

Minimal BASIC (ANS X3.60-1978, ECMA Standard 55)

Character Set

The set of allowable characters is given by:

the set of capital letters from A to Z,
the set of digits from 0 to 9,
the set of symbols !, #, $, %, &, (, ), +, -, *, /, ^, ., ,, ;, :, <, =, >, _, ?, ', "
the space character

List of Reserved Keywords

Reserved keywords in Minimal BASIC are (26 in total):

BASE
DATA
DEF
DIM
END
FOR
GO
GOSUB
GOTO
IF
INPUT
LET
NEXT
ON
OPTION
PRINT
RANDOMIZE
READ
REM
RESTORE
RETURN
STEP
STOP
SUB
THEN
TO

Their meaning will be explained in the following sections.

Convention for the Name of Variables

Variables are used in BASIC to hold either character strings or numeric values, the latter being either of scalar or vectorial nature.

In the case of variables for character strings, each variable name is composed of a single letter from A to Z followed by the dollar sign $. So, A$, B$, ..., Z$ are all valid variable names for character strings, while A# or Z% are not.

In the case of variables for numeric scalar values, each variable name is composed of a single letter from A to Z followed by an optional digit. So, A, B, C1, D2, etc., are valid variable names for scalar values, while A11, B22, etc., are not.

In the case of variables for numeric vectorial values, each variable name is composed of a single letter from A to Z followed by one subscript (for a one-dimensional array) or by two subscripts separated by a comma (for a two-dimensional array), enclosed within parentheses. So, A(1), B(2), C(1,1), D(2,2), etc., are valid variable names for vectorial values.

This convention makes the explicit declaration of variables unnecessary in BASIC, since a dollar sign serves to distinguish a character string from a numeric value, and the presence of subscripts distinguishes a vectorial from a scalar variable.
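
As a minimal sketch of this naming convention, the following program uses one string variable, two scalar variables, a one-dimensional array, and a two-dimensional array. The names and values are chosen only for illustration, and the let-statement and print-statement it relies on are explained in later sections.

```basic
10 REM "NAMING CONVENTION FOR VARIABLES"
20 LET N$ = "RADIUS"
30 LET R = 2
40 LET R1 = 3
50 LET V(1) = R
60 LET V(2) = R1
70 LET M(1,2) = R * R1
80 PRINT N$, R, R1
90 PRINT V(1), V(2), M(1,2)
100 END
```

Here N$ can only be a string, R and R1 can only be scalars, and V and M can only be arrays, without any declaration being written.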

Character Strings and Numeric Constants

Character strings are defined by any combination of characters from the allowable character set written within double quotation marks, the length of any character string being limited to 18 characters (with the exception of character strings in a print or remark-statement, to be seen later, which can be as long as line numbers and the line length limit permit). So, "", " ", "1 2 3 4 5 6 7 8 9", "A B C D E F G H I", "! # $ % & ... ' ", etc., are allowable character strings, while "1 2 3 4 5 6 7 8 9 0", "A B C D E F G H I J", "! # $ % & ( ) + - * / ^ . , ; : < = > _ ? ' ", etc., are not, since they exceed the 18-character limit.

Numeric constants denote scalar numeric values written in decimal positional notation. There are four general syntactic forms of optionally signed numeric constants:

implicit point representation (sd...d), like in the case of 1, 2, +1, -2, etc.,
explicit point unscaled representation (sd...drd...d), like in the case of 1.0, 2.0, +1.0, -2.0, etc.,
explicit point scaled representation (sd...drd...dEsd...d), like in the case of 1.0E1, 2.0E-1, +1.0E+1, -2.0E-2, etc.,
implicit point scaled representation (sd...dEsd...d), like in the case of 1E1, 2E-1, +1E+1, -2E-2, etc.,

where:

s is an optional sign (+ or -),
d is a decimal digit (0 - 9),
r is a period (.), and
E means "times 10 to the power of" the signed integer that follows it.

Numeric constants can be written with any number of digits, although an implementation only has to retain at least six significant decimal digits and to support magnitudes in a range of at least 1E-38 to 1E+38.

Numeric constants whose magnitude is less than machine infinitesimal are replaced by zero, while constants whose magnitude is larger than machine infinity are replaced by machine infinity with the appropriate sign.
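
The four syntactic forms can be compared in a short sketch that writes the same value, 25, once in each form. The variable names A, B, C, and D are arbitrary, and the let-statement and print-statement used here are explained in later sections.

```basic
10 REM "THE FOUR FORMS OF NUMERIC CONSTANTS"
20 LET A = 25
30 LET B = 25.0
40 LET C = 2.5E1
50 LET D = 25E0
60 PRINT A, B, C, D
70 END
```

Lines 20 to 50 use, in order, the implicit point, explicit point unscaled, explicit point scaled, and implicit point scaled representations, so all four variables hold the same value.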

General Program Structure

BASIC is a line-oriented language, in the sense that a BASIC program can be considered as a sequence of lines, the last of which is an end-line, and each of which contains a keyword. Moreover, each line begins with a unique line number, which serves as a label for the statement contained in that line.

So, in BASIC every program can be represented with the following Backus-Naur form (BNF):

program = block* end-line
block = line / for-block
line = line-number statement
line-number = digit digit? digit? digit?
end-line = line-number end-statement
end-statement = END
statement = data-statement / def-statement / dimension-statement / gosub-statement / goto-statement / if-then-statement / input-statement / let-statement / on-goto-statement / option-statement / print-statement / randomize-statement / read-statement / remark-statement / restore-statement / return-statement / stop-statement

So, the following simple examples are valid examples of a program in BASIC:

a two-line program (just a remark-statement, which serves to document the program and produces no output, and an end-statement, which terminates the program):

10 REM "REMARK STATEMENT"
20 END

a three-line program (just a remark-statement, a print-statement, which prints a character string, and an end-statement):

10 REM "HELLO WORLD PROGRAM"
20 PRINT "HELLO, WORLD!"
30 END

Program lines are executed in sequential order, starting with the first one, until:

some other action is dictated by a control statement, or
an exception condition occurs, which results in abnormal termination of the program, or
a stop-statement or end-statement is executed.

So, in the first example, the first line, 10 REM "REMARK STATEMENT", contains a non-control statement that produces no output or internal activity. Execution then passes to the second line, 20 END, which contains a control statement, the end-statement, which ends the program.

In the second example, there is an additional line between the remark and the end-statement lines. It contains a print-statement, also a non-control statement, which prints a character string.

Line-numbers are positive integers, with leading zeroes having no effect. So, 1, 01, 10, 010, etc., are all valid line-numbers. Normally, line-numbers are given as multiples of 5 or 10, e.g., 10, 20, 30, 40, etc., which leaves room in case an additional line must be inserted between existing lines.

Additionally, lines can be up to 72 characters long. Reserving 4 characters for the line-number and a blank space as a separator between the line-number and the keyword leaves 67 printable characters for the statement in a line.

Spaces may occur anywhere in a BASIC program without affecting the execution of that program and may be used to improve the readability of the program.

All keywords in a program must be preceded by at least one space and, if not at the end of a line, must also be followed by at least one space.

Spaces shall not appear:

at the beginning of a line
within line numbers
within keywords
within numeric constants
within function or variable names
within two-character relation symbols

Program Variables

Variables in BASIC are associated with either numeric or string values and, in the case of numeric values, may be either simple variables or references to elements of one or two-dimensional arrays, which are then called subscripted or compound variables.

As stated before, simple numeric variables are named by a single capital letter followed by an optional single digit, while subscripted variables are named by a single capital letter followed by one or two numbers, separated in this last case by a comma, enclosed within parentheses.

String variables are also to be named by a single capital letter followed by a dollar sign.

At any instant in the execution of a program, a numeric variable is associated with a single numeric value and a string variable is associated with a single string value, the value associated with the variable possibly being changed by program statements in the course of program execution.

The length of a character string associated with a string variable can vary during execution of the program between 0 characters (the empty string) and 18 characters.

Simple numeric variables and string variables are declared implicitly through their appearance in the program (so no type definitions are necessary, due to the given naming convention), although it is good programming practice to initialize or set them to meaningful values at the beginning of the program before their use in any statement.

A subscripted variable, on the other hand, refers to the element in the one or two-dimensional array selected by the value or values of the subscripts, the subscripts being integer values.

Unless explicitly declared in a dimension statement (to be seen later), subscripted variables are implicitly declared by their first appearance in a program, in which case the range of each subscript is to be understood from zero to ten, both inclusive, unless the presence of an option-statement indicates that the range is defined from one to ten, both also inclusive.

Care must be taken that the same single letter is not used both for the name of a simple variable and a subscripted variable, nor for the name of both a one-dimensional and a two-dimensional array.

By contrast, this restriction does not apply between a simple variable and a string variable, whose names may agree except for the dollar sign.

So, the following simple examples are valid examples of a program in BASIC:

the previous three-line program, with a somewhat different comment line and a new character string, which indicates the value of pi, being printed:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 PRINT "PI = 3.14159265"
30 END

a modified four-line program which makes use of a let-statement to assign the numeric constant 3.14159265 to the numeric variable P in the second line, and a print-statement with a string constant and a numeric variable as a comma-separated list of arguments in the third line:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET P = 3.14159265
30 PRINT "PI = ", P
40 END

a modified five-line program which, still making use of a let-statement to assign the numeric constant 3.14159265 to the numeric variable P in the second line, now also makes use of a let-statement to assign the character constant "PI = " to the string variable P$ in a third line -- the print-statement in the fourth line is now composed of the comma-separated list of arguments formed by the string variable and the numeric variable:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET P = 3.14159265
30 LET P$ = "PI = "
40 PRINT P$, P
50 END
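
The rules for subscripted variables described in this section can be sketched in the same way. The program below assumes an implementation that follows the standard: the option-statement sets the lower bound of all subscripts to one, the dimension-statement declares T explicitly with subscripts 1 to 3, and U is declared implicitly by its first appearance, with subscripts 1 to 10. The array names and values are illustrative only.

```basic
10 REM "EXPLICIT AND IMPLICIT DECLARATION OF ARRAYS"
20 OPTION BASE 1
30 DIM T(3)
40 LET T(1) = 10
50 LET T(2) = 20
60 LET T(3) = 30
70 LET U(10) = T(1) + T(2) + T(3)
80 LET S$ = "SUM = "
90 PRINT S$, U(10)
100 END
```

Note that the option-statement appears before the dimension-statement and before any reference to an array element, and that neither T nor U is also used as the name of a simple variable.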

Statements

Up to now we have seen how to declare/initialize simple numeric variables and string variables in the course of a program by means of the let-statement and how to print them with the help of the print-statement.

It is sometimes desirable not only to print the value of a variable, whether numeric or string, but also to supply a value as input to the program in order to compute a numerical value or to print a message depending on a condition. For those cases, one needs to make use of expressions, mathematical functions, and control statements, as we shall see in this section.

Input/Output, Mathematical Operators, Expressions

Expressions are normally classified as numeric expressions or string expressions.

In the case of numeric expressions, these are constructed from variables, constants, mathematical functions, and the mathematical operations of addition, subtraction, multiplication, division, and involution (exponentiation).

The formation and evaluation of numeric expressions follow the normal algebraic rules, and the circumflex accent, the asterisk, the solidus, the plus sign, and the minus sign symbols are used to represent the operations of involution, multiplication, division, addition, and subtraction, respectively.

Unless parentheses dictate otherwise, exponentiation is performed first, then multiplications and divisions, and finally additions and subtractions, where operations of the same precedence are associated from left to right. So, A - B - C is interpreted as (A - B) - C, A / B / C as (A / B) / C, and A - B / C as A - (B / C). In the first two expressions all the operators have the same precedence and are therefore evaluated from left to right, while in the last one the division has higher precedence than the subtraction and is therefore evaluated first.
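
The precedence rules can be checked with a small sketch. The values 8, 4, and 2 are chosen only so that the results are easy to verify by hand.

```basic
10 REM "OPERATOR PRECEDENCE AND ASSOCIATIVITY"
20 LET A = 8
30 LET B = 4
40 LET C = 2
50 PRINT "A - B - C = ", A - B - C
60 PRINT "A / B / C = ", A / B / C
70 PRINT "A - B / C = ", A - B / C
80 PRINT "A * B ^ C = ", A * B ^ C
90 END
```

With these values the four expressions evaluate to 2, 1, 6, and 128, respectively. In the last one, the exponentiation B ^ C is performed before the multiplication.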

The following examples illustrate in a simple way the concepts seen so far:

a program that prints the value of the numeric constant 1.4142 raised to the power of 2 (that is, the square of 1.4142), together with some text:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 PRINT "THE SQUARE OF 1.4142 IS ", 1.4142^2
30 END

a program that defines a numeric variable S with a value of 1.4142, and prints the value of the numeric variable raised to the power of 2 (that is, the square of 1.4142), together with some text:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET S = 1.4142
30 PRINT "THE SQUARE OF ", S, " IS ", S^2
40 END

a program that defines a numeric variable S with a value of 1.4142, calculates the product of S by S (that is, the square of 1.4142), assigns this value to a numeric variable S2, and prints the value of both S and S2 together with some text as string constants:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET S = 1.4142
30 LET S2 = S * S
40 PRINT "THE SQUARE OF ", S, " IS ", S2
50 END

a program that defines a numeric variable S with a value of 1.4142 and another one S2 with 2.0000, and prints the value of the operation of dividing S2 by S, together with some text as string constants:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET S = 1.4142
30 LET S2 = 2.0000
40 PRINT "THE SQUARE ROOT OF ", S2, " IS APPROXIMATELY ", (S2 / S)
50 END

a program similar to the previous one, in that it defines a numeric variable S with a value of 1.4142 and another one S2 with 2.0000, but prints the result of subtracting from S the quotient of S2 by S (which gives a measure of the accuracy of the approximation -- this is the nucleus of a numerical method that we will see later to calculate the square root of a number), together with some text as string constants:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET S = 1.4142
30 LET S2 = 2.0000
40 PRINT S, " AND THE QUOTIENT OF ", S2, " BY ", S, " DIFFER BY ", (S - S2 / S)
50 END

a program slightly different from the previous ones, in that it asks the user for a number, whose square is to be calculated (the input-statement of Minimal BASIC accepts only a list of variables, so the prompt is printed with a separate print-statement):

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET S = 0.0
30 PRINT "THIS PROGRAM CALCULATES THE SQUARE OF A NUMBER"
40 PRINT "PLEASE ENTER THE NUMBER WHOSE SQUARE IS TO BE CALCULATED: ";
50 INPUT S
60 PRINT "THE SQUARE OF ", S, " IS ", S^2
70 END

Mathematical Functions

Up to here we have seen how numeric and character variables are to be defined, the rules for writing lines, basic input and output, and the rules for simple arithmetic.

But what happens if one needs to calculate the square root of a number? For this purpose, a set of basic mathematical functions is provided by default. These are:

the absolute value of a number, ABS(X)
the arctangent of a number, ATN(X)
the cosine of a number expressed in radians, COS(X)
the exponential of a number, EXP(X)
the largest integer not greater than a number, INT(X)
the natural logarithm of a positive number, LOG(X)
the sign of a number, SGN(X)
the sine of a number expressed in radians, SIN(X)
the square root of a nonnegative number, SQR(X)
a uniformly distributed pseudo-random number in the interval [0,1), RND (which takes no argument)
the tangent of a number expressed in radians, TAN(X)
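
As a short sketch of three of these functions, the program below applies ABS, SGN, and INT to a negative argument. The value -2.5 is chosen only for illustration.

```basic
10 REM "ABS, SGN, AND INT OF A NEGATIVE NUMBER"
20 LET X = -2.5
30 PRINT "X      = ", X
40 PRINT "ABS(X) = ", ABS(X)
50 PRINT "SGN(X) = ", SGN(X)
60 PRINT "INT(X) = ", INT(X)
70 END
```

For this argument ABS(X) is 2.5, SGN(X) is -1, and INT(X) is -3, because INT returns the largest integer not greater than its argument rather than truncating towards zero.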

Let us see some examples:

a program that calculates the square root of 2 with the help of the SQR mathematical function provided:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET S2 = 2.0
30 LET S  = SQR(S2)
40 PRINT "THE SQUARE ROOT OF ", S2, " IS ", S
50 END

a program that asks the user for a number, whose cosine is to be calculated:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET X = 0.0
30 PRINT "PLEASE ENTER THE NUMBER WHOSE COSINE IS TO BE CALCULATED: ";
40 INPUT X
50 PRINT "THE COSINE OF ", X, " IS ", COS(X)
60 END

a program that asks the user for a number, whose sine is to be calculated:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET X = 0.0
30 PRINT "PLEASE ENTER THE NUMBER WHOSE SINE IS TO BE CALCULATED: ";
40 INPUT X
50 PRINT "THE SINE OF ", X, " IS ", SIN(X)
60 END

a program that asks the user for a number, whose tangent is to be calculated:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET X = 0.0
30 PRINT "PLEASE ENTER THE NUMBER WHOSE TAN IS TO BE CALCULATED: ";
40 INPUT X
50 PRINT "THE TAN OF ", X, " IS ", TAN(X)
60 END

a program that asks the user for a number, whose exponential is to be calculated:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET X = 0.0
30 PRINT "PLEASE ENTER THE NUMBER WHOSE EXP IS TO BE CALCULATED: ";
40 INPUT X
50 PRINT "THE EXP OF ", X, " IS ", EXP(X)
60 END

a program that asks the user for a positive number, whose natural logarithm is to be calculated:

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 LET X = 0.0
30 PRINT "PLEASE ENTER THE NUMBER WHOSE LOG IS TO BE CALCULATED: ";
40 INPUT X
50 PRINT "THE LOG OF ", X, " IS ", LOG(X)
60 END

a program to print a pseudo-random number uniformly distributed in the interval [0,1) (RND is written without parentheses, since it takes no argument):

10 REM "SIMPLE PROGRAM FOR DEMONSTRATION PURPOSES"
20 PRINT "PSEUDO-RANDOM NUMBER, 0 <= RND < 1: ", RND
30 END
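
A common use of RND together with INT is to draw a random integer from a given range. The following sketch simulates the throw of a six-sided die. It assumes that the randomize-statement, listed among the reserved keywords, is used to obtain a different pseudo-random sequence on each run; the variable name D is arbitrary.

```basic
10 REM "THROW OF A SIX-SIDED DIE"
20 RANDOMIZE
30 LET D = INT(6 * RND) + 1
40 PRINT "THE DIE SHOWS ", D
50 END
```

Since RND is at least 0 and less than 1, the product 6 * RND is at least 0 and less than 6, so INT(6 * RND) + 1 is always one of the integers 1 to 6.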

Control Statements

Sample Programs

Minimal BASIC sample programs can be found on the corresponding page.

Normative BASIC/Minimal BASIC

Sample Programs

Sample programs for Minimal BASIC will appear here.

Numerical Integration

Introduction

There exist two cases in which the computation of the value of a definite integral by numerical methods is needed. One of them is the calculation of the area below the curve defined by a set of experimental data, and the other is the calculation of the definite integral of a mathematical function for which no closed-form antiderivative is known. The former is often the case for response functions in experimental work in science and engineering, while the latter is normally the case in practical investigations in physics, mathematics, and engineering.

In either case, the development of numerical methods for integration, a field that belongs to applied mathematics, is based on one simple idea: if $y(x)=f(x)$ is a real-valued (the complex-valued case can be treated analogously, by separating it into its real and imaginary parts) continuous function of $x$ defined in an interval $(a,b)$ , its definite integral,

$\int _{a}^{b}f(x)dx=\sum _{x=a}^{x=b}\lim _{\delta x\rightarrow 0}f(x)\delta x=\lim _{\delta x\rightarrow 0}\sum _{x=a}^{x=b}f(x)\delta x\approx \sum _{x=a}^{x=b}f(x)\Delta x$ ,

can be calculated approximately as the finite sum of the product $f(x)\Delta x$ evaluated at some given points in the interval $(a,b)$ .

In the case of experimental data, the set of points at which the value of the function is measured is usually not regularly distributed (i.e., the points are not equispaced), so the value of the definite integral must be calculated in the form:

$\int _{a}^{b}f(x)dx=\int _{{x_{0}}=a}^{x_{1}}f(x)dx+\int _{x_{1}}^{x_{2}}f(x)dx+...+\int _{x_{n-2}}^{x_{n-1}}f(x)dx+\int _{x_{n-1}}^{x_{n}=b}f(x)dx$ ,

which can be approximated either as:

$\int _{a}^{b}f(x)dx\approx f({x_{0}}=a)({x_{1}}-{x_{0}})+f({x_{1}})({x_{2}}-{x_{1}})+...+f({x_{n-2}})({x_{n-1}}-{x_{n-2}})+f({x_{n-1}})({x_{n}}-{x_{n-1}})$ ,

or as:

$\int _{a}^{b}f(x)dx\approx f({x_{1}})({x_{1}}-{x_{0}})+f({x_{2}})({x_{2}}-{x_{1}})+...+f({x_{n-1}})({x_{n-1}}-{x_{n-2}})+f({x_{n}}=b)({x_{n}}-{x_{n-1}})$ .

In the first case, the value of the integral is underestimated (overestimated) for monotonically ascending (descending) functions, since the value of $f(x)$ taken in each evaluation is always the lowest (highest) in every subinterval, and hence constitutes an absolute lower (upper) bound to the value of the integral. In the second case, the value of the integral is overestimated (underestimated) for monotonically ascending (descending) functions, since the value of $f(x)$ taken in each evaluation is always the highest (lowest) in every subinterval, and hence constitutes an absolute upper (lower) bound to the value of the integral.
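
The two sums can be sketched in Minimal BASIC as follows. The data are hypothetical: four unequally spaced points sampled from $y=x^{2}$ at $x=0,0.5,1,2$ , chosen so that the exact value of the integral, $8/3$ , is known. The array names X and Y, the number of subintervals N, and the accumulators L and R are illustrative choices.

```basic
10 REM "LEFT AND RIGHT ENDPOINT SUMS FOR TABULATED DATA"
20 REM "HYPOTHETICAL DATA: Y = X * X AT X = 0, 0.5, 1, 2"
30 DIM X(3), Y(3)
40 LET N = 3
50 FOR I = 0 TO N
60 READ X(I), Y(I)
70 NEXT I
80 DATA 0, 0, 0.5, 0.25, 1, 1, 2, 4
90 LET L = 0
100 LET R = 0
110 FOR I = 0 TO N - 1
120 LET H = X(I + 1) - X(I)
130 LET L = L + Y(I) * H
140 LET R = R + Y(I + 1) * H
150 NEXT I
160 PRINT "LEFT-ENDPOINT SUM  = ", L
170 PRINT "RIGHT-ENDPOINT SUM = ", R
180 END
```

For these data the left-endpoint sum is 1.125 and the right-endpoint sum is 4.625, which enclose the exact value $8/3\approx 2.667$ , as expected for a monotonically ascending function.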

According to the Mean-Value Theorem of Calculus, the value of a definite integral can also be calculated as:

$\int _{a}^{b}f(x)dx=f(\gamma )(b-a)$ ,

for some value $\gamma$ in $(a,b)$ for which $f(\gamma )$ represents the mean value of $f(x)$ in $(a,b)$ , so it is then a better approximation to calculate the definite integral of a set of experimental data as:

$\int _{a}^{b}f(x)dx=\int _{{x_{0}}=a}^{x_{1}}f(x)dx+\int _{x_{1}}^{x_{2}}f(x)dx+...+\int _{x_{n-2}}^{x_{n-1}}f(x)dx+\int _{x_{n-1}}^{x_{n}=b}f(x)dx$ ,

$\int _{a}^{b}f(x)dx=f(\gamma _{(0,1)})({x_{1}}-{x_{0}})+f(\gamma _{(1,2)})({x_{2}}-{x_{1}})+...+f(\gamma _{(n-2,n-1)})({x_{n-1}}-{x_{n-2}})+f(\gamma _{(n-1,n)})({x_{n}}-{x_{n-1}})$ ,

$\int _{a}^{b}f(x)dx\approx {\frac {f({x_{0}})+f({x_{1}})}{2}}({x_{1}}-{x_{0}})+{\frac {f({x_{1}})+f({x_{2}})}{2}}({x_{2}}-{x_{1}})+...+{\frac {f({x_{n-2}})+f({x_{n-1}})}{2}}({x_{n-1}}-{x_{n-2}})+{\frac {f({x_{n-1}})+f({x_{n}})}{2}}({x_{n}}-{x_{n-1}})$ .

For reasons that we shall see later, this is equivalent to assuming a piecewise linear interpolating function between the different points. The value of the integral so calculated is exact for linear functions (i.e., functions whose slope is constant), it is overestimated for functions whose slope increases (i.e., whose second derivative is strictly positive in the considered interval), and it is underestimated for functions whose slope decreases (i.e., whose second derivative is strictly negative in the considered interval). The value so calculated constitutes a better approximation than the lower and upper bounds presented before, and in the case that the second derivative of the function (which can be estimated from the experimental data with the help of second or central differences) changes sign between subintervals, the value is expected to be close to the actual value due to the cancellation of the errors in the approximation of the mean values.
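
The averaged formula for unequally spaced data can be sketched in Minimal BASIC as follows. The data are hypothetical: four points sampled from $y=x^{2}$ at $x=0,0.5,1,2$ , for which the exact value of the integral is $8/3$ . The array names X and Y, the number of subintervals N, and the accumulator T are illustrative choices.

```basic
10 REM "TRAPEZOIDAL SUM FOR TABULATED DATA"
20 REM "HYPOTHETICAL DATA: Y = X * X AT X = 0, 0.5, 1, 2"
30 DIM X(3), Y(3)
40 LET N = 3
50 FOR I = 0 TO N
60 READ X(I), Y(I)
70 NEXT I
80 DATA 0, 0, 0.5, 0.25, 1, 1, 2, 4
90 LET T = 0
100 FOR I = 0 TO N - 1
110 LET T = T + (Y(I) + Y(I + 1)) / 2 * (X(I + 1) - X(I))
120 NEXT I
130 PRINT "TRAPEZOIDAL SUM = ", T
140 END
```

For these data the sum is 2.875, compared with the exact value $8/3\approx 2.667$ . The sampled function has a strictly positive second derivative, so the straight-line segments lie above the curve in every subinterval.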

In the case of mathematical functions, there is more information about the function, since it is possible not only to calculate the value of the function at any given point, but also to compute first, second, and higher-order derivatives with any degree of accuracy.

Let us elaborate some mathematical results, going from simple to more elaborate methods:

The main theme in the development of numerical methods, together with the study of the stability (i.e., if a method converges), is the rate of convergence of the method, which studies how many evaluations are needed and the error in the approximation, for non-iterative methods, or how many iterations are needed and how the error is minimized in each iteration, for iterative methods.

In the case of the study of the stability, and as we have seen before, the value of the definite integral of a function $f(x)$ defined in an interval $(a,b)$ ,

$\int _{a}^{b}f(x)dx=\sum _{x=a}^{x=b}\lim _{\delta x\rightarrow 0}f(x)\delta x=\lim _{\delta x\rightarrow 0}\sum _{x=a}^{x=b}f(x)\delta x\approx \sum _{x=a}^{x=b}f({x_{k}})\Delta {x_{k}}$ ,

can be calculated approximately as the finite sum of the product $f(x)\Delta x$ evaluated at some given points in the interval $(a,b)$ .

In the limit $\Delta x\rightarrow \delta x\rightarrow 0$ , the finite sum tends to the definite integral, and so convergence is assured.

In the case of the study of the rate of convergence, one is interested in increasing the accuracy of the approximation, while retaining the number of subintervals, with a minor increase in computational complexity.

The approach used consists normally in using a polynomial approximation for the evaluation of the function $f(x)$ in each subinterval, using the information provided by the value of the function at several points in the subinterval.

Let us consider the case of equally-spaced points (although this restriction can be easily lifted):

According to the definition,

$\int _{a}^{b}f(x)dx=\sum _{x=a}^{x=b}\lim _{\delta x\rightarrow 0}f(x)\delta x=\lim _{\delta x\rightarrow 0}\sum _{x=a}^{x=b}f(x)\delta x\approx \sum _{x=a}^{x=b}f({x_{k}})\Delta {x_{k}}=\sum _{x=a}^{x=b}f({x_{k}})({x_{j+1}}-{x_{j}})$ ,

with $k$ being indicative of the subinterval, ${x_{k}}$ being some arbitrary number in every subinterval $({x_{j}},{x_{j+1}})$ , with $j=0,1,2,...,n-1$ , and ${x_{0}}=a,{x_{n}}=b$ , and $\Delta {x_{k}}=({x_{j+1}}-{x_{j}})$ , with $k=1,2,...,n$ , and $j=k-1$ .

The Mean-Value Theorem of Calculus tells us, that if

$\int _{a}^{b}f(x)dx$

is the definite integral of $f(x)$ in $(a,b)$ , there exists a value $\gamma$ in $(a,b)$ , such that

$\int _{a}^{b}f(x)dx=f(\gamma )(b-a)$ .

Additionally, by definition, if

$\int _{a}^{b}f(x)dx$

is the definite integral of $f(x)$ in $(a,b)$ , this one can also be understood as being composed of the individual contributions

$\int _{a}^{b}f(x)dx=\int _{{x_{0}}=a}^{x_{1}}f(x)dx+\int _{x_{1}}^{x_{2}}f(x)dx+...+\int _{x_{n-2}}^{x_{n-1}}f(x)dx+\int _{x_{n-1}}^{x_{n}=b}f(x)dx$

for arbitrary values ${x_{0}}=a,{x_{1}},{x_{2}},...,{x_{n-2}},{x_{n-1}},{x_{n}}=b$ .

Now, applying the Mean-Value Theorem to each individual contribution yields the result:

$\int _{a}^{b}f(x)dx=f(\gamma _{({x_{0}},{x_{1}})})({x_{1}}-{x_{0}})+f(\gamma _{({x_{1}},{x_{2}})})({x_{2}}-{x_{1}})+...+f(\gamma _{({x_{n-2}},{x_{n-1}})})({x_{n-1}}-{x_{n-2}})+f(\gamma _{({x_{n-1}},{x_{n}})})({x_{n}}-{x_{n-1}})$ ,

which is exact.

In the particular case that every subinterval is of equal size, i.e., $({x_{1}}-{x_{0}})=({x_{2}}-{x_{1}})=...=({x_{n-1}}-{x_{n-2}})=({x_{n}}-{x_{n-1}})=\Delta x$ , then the expression reduces to

$\int _{a}^{b}f(x)dx=\left(f(\gamma _{({x_{0}},{x_{1}})})+f(\gamma _{({x_{1}},{x_{2}})})+...+f(\gamma _{({x_{n-2}},{x_{n-1}})})+f(\gamma _{({x_{n-1}},{x_{n}})})\right)\Delta x$ .

In this way, the calculation of the initial definite integral reduces to the calculation of the mean values

$f(\gamma _{({x_{0}},{x_{1}})}),f(\gamma _{({x_{1}},{x_{2}})}),...,f(\gamma _{({x_{n-2}},{x_{n-1}})}),f(\gamma _{({x_{n-1}},{x_{n}})})$ .

In a first approximation, with only one point,

$f(\gamma _{({x_{0}},{x_{1}})})\approx f({x_{i}}_{({x_{0}},{x_{1}})}),f(\gamma _{({x_{1}},{x_{2}})})\approx f({x_{i}}_{({x_{1}},{x_{2}})}),...,f(\gamma _{({x_{n-2}},{x_{n-1}})})\approx f({x_{i}}_{({x_{n-2}},{x_{n-1}})}),f(\gamma _{({x_{n-1}},{x_{n}})})\approx f({x_{i}}_{({x_{n-1}},{x_{n}})})$ ,

with ${x_{i}}$ being the value of $x$ in the middle of each subinterval, which leads to

$\int _{a}^{b}f(x)dx\approx \left(f({x_{i}}_{({x_{0}},{x_{1}})})+f({x_{i}}_{({x_{1}},{x_{2}})})+...+f({x_{i}}_{({x_{n-2}},{x_{n-1}})})+f({x_{i}}_{({x_{n-1}},{x_{n}})})\right)\Delta x$ .

In a second approximation, with only two points,

$f(\gamma _{({x_{0}},{x_{1}})})\approx {\frac {f({x_{0}})+f({x_{1}})}{2}},f(\gamma _{({x_{1}},{x_{2}})})\approx {\frac {f({x_{1}})+f({x_{2}})}{2}},...,f(\gamma _{({x_{n-2}},{x_{n-1}})})\approx {\frac {f({x_{n-2}})+f({x_{n-1}})}{2}},f(\gamma _{({x_{n-1}},{x_{n}})})\approx {\frac {f({x_{n-1}})+f({x_{n}})}{2}}$ ,

with ${x_{n-1}},{x_{n}}$ being the values of $x$ at the beginning and at the end of each subinterval, which leads to

$\int _{a}^{b}f(x)dx\approx \left(f({x_{0}})+2f({x_{1}})+...+2f({x_{n-1}})+f({x_{n}})\right)\Delta x/2$ .

In a third approximation, with only three points,

$f(\gamma _{({x_{0}},{x_{1}})})\approx {\frac {f({x_{0}})+f({x_{i}}_{({x_{0}},{x_{1}})})+f({x_{1}})}{3}},f(\gamma _{({x_{1}},{x_{2}})})\approx {\frac {f({x_{1}})+f({x_{i}}_{({x_{1}},{x_{2}})})+f({x_{2}})}{3}},...,f(\gamma _{({x_{n-2}},{x_{n-1}})})\approx {\frac {f({x_{n-2}})+f({x_{i}}_{({x_{n-2}},{x_{n-1}})})+f({x_{n-1}})}{3}},f(\gamma _{({x_{n-1}},{x_{n}})})\approx {\frac {f({x_{n-1}})+f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})}{3}}$ ,

with ${x_{n-1}},{x_{i}},{x_{n}}$ being the values of $x$ at the beginning, in the middle, and at the end of each subinterval, which leads to

$\int _{a}^{b}f(x)dx\approx \left(f({x_{0}})+f({x_{i}}_{({x_{0}},{x_{1}})})+2f({x_{1}})+f({x_{i}}_{({x_{1}},{x_{2}})})+2f({x_{2}})+...+2f({x_{n-2}})+f({x_{i}}_{({x_{n-2}},{x_{n-1}})})+2f({x_{n-1}})+f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})\right)\Delta x/3$ .

In a fourth approximation, while still making use of the evaluation of the function $f(x)$ at the beginning, in the middle, and at the end of each subinterval, one can use the result that if $f(\gamma ^{-})$ and $f(\gamma ^{+})$ are estimates for $f(\gamma )$ , then their arithmetic mean $\left(f(\gamma ^{-})+f(\gamma ^{+})\right)/2$ is another estimate that is at least as accurate as the worse of the two, and often better, in particular when the two errors have opposite signs.

So, adding the results for the first and second approximation, and dividing by two,

$f(\gamma _{({x_{n-1}},{x_{n}})})\approx {\frac {f({x_{i}}_{({x_{n-1}},{x_{n}})})+{\frac {f({x_{n-1}})+f({x_{n}})}{2}}}{2}}={\frac {f({x_{n-1}})+2f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})}{4}}$ ,

which leads to the result

$\int _{a}^{b}f(x)dx\approx \left(f({x_{0}})+2f({x_{i}}_{({x_{0}},{x_{1}})})+2f({x_{1}})+2f({x_{i}}_{({x_{1}},{x_{2}})})+2f({x_{2}})+...+2f({x_{n-2}})+2f({x_{i}}_{({x_{n-2}},{x_{n-1}})})+2f({x_{n-1}})+2f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})\right)\Delta x/4$ .

Adding the results for the first and third approximation, and dividing by two,

$f(\gamma _{({x_{n-1}},{x_{n}})})\approx {\frac {f({x_{i}}_{({x_{n-1}},{x_{n}})})+{\frac {f({x_{n-1}})+f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})}{3}}}{2}}={\frac {f({x_{n-1}})+4f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})}{6}}$ ,

which leads to the result

$\int _{a}^{b}f(x)dx\approx \left(f({x_{0}})+4f({x_{i}}_{({x_{0}},{x_{1}})})+2f({x_{1}})+4f({x_{i}}_{({x_{1}},{x_{2}})})+2f({x_{2}})+...+2f({x_{n-2}})+4f({x_{i}}_{({x_{n-2}},{x_{n-1}})})+2f({x_{n-1}})+4f({x_{i}}_{({x_{n-1}},{x_{n}})})+f({x_{n}})\right)\Delta x/6$ .

In practice, one can do no better with only these three evaluations in an interval (the two endpoints and the midpoint), but the results obtained are simple and accurate enough, even in the case of one single interval.
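
The first, second, and last of the approximations above can be sketched in Minimal BASIC as follows. The integrand FNF, the limits A and B, and the number of subintervals N are illustrative choices. The last sum is formed per subinterval as $\left(f({x_{n-1}})+4f({x_{i}})+f({x_{n}})\right)/6$ , which is the same as two thirds of the midpoint value plus one third of the two-point value.

```basic
10 REM "MIDPOINT, TWO-POINT, AND THREE-POINT SUMS"
20 REM "FNF, A, B, AND N ARE ILLUSTRATIVE CHOICES"
30 DEF FNF(X) = EXP(X)
40 LET A = 0
50 LET B = 1
60 LET N = 4
70 LET H = (B - A) / N
80 LET M = 0
90 LET T = 0
100 FOR J = 0 TO N - 1
110 LET X0 = A + J * H
120 LET M = M + FNF(X0 + H / 2)
130 LET T = T + (FNF(X0) + FNF(X0 + H)) / 2
140 NEXT J
150 LET M = M * H
160 LET T = T * H
170 LET S = (2 * M + T) / 3
180 PRINT "MIDPOINT SUM    = ", M
190 PRINT "TWO-POINT SUM   = ", T
200 PRINT "THREE-POINT SUM = ", S
210 END
```

Setting N to 1 in line 60 reduces the program to the case of one single interval, and replacing the expression in line 30 allows other integrands to be tried.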

Let us illustrate the case by means of an example:

Let us suppose, that we wish to calculate the definite integral of the function $f(x)=exp(x)$ in the interval $(0,1)$ , for which we know its exact value, $F(x)|_{0}^{1}=\int _{0}^{1}exp(x)dx=exp(x)|_{0}^{1}=exp(1)-exp(0)=2.71828-1.00000=1.71828$ .

Let us also keep the problem simple and do the calculations with a single interval, i.e., ${x_{0}}=a=0.0$ , ${x_{n}}=b=1.0$ , and $x_{i}=0.5$ .

So, we have:

First approximation:

$\int _{a}^{b}f(x)dx\approx f(x_{i})(b-a)=exp(0.5)(1.0-0.0)=1.64872$