Fundamentals of Data Representation: Rounding Errors

From the Specification: Fundamentals of Data Representation - Rounding Errors

Know and be able to explain why both fixed point and floating point representation of decimal numbers may be inaccurate.

For a real number to be represented exactly by the binary number system, it must be capable of being represented by a binary fraction in the given number of bits. Some values can never be represented exactly, for example 0.1₁₀, which is a recurring fraction in binary (0.000110011...₂).
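
The point can be checked with a short Python sketch. It is only an illustration: the helper name `nearest_fixed_point` and the choice of 4, 8 and 16 fractional bits are assumptions made for this example, not part of the specification. It finds the closest value to 0.1 that a fixed point format can hold, and then looks at the value that is actually stored when Python holds 0.1 as a floating point number.

```python
from fractions import Fraction

def nearest_fixed_point(value, fractional_bits):
    """Return the closest multiple of 2**-fractional_bits to value, as an exact Fraction."""
    scale = 2 ** fractional_bits
    return Fraction(round(Fraction(value) * scale), scale)

tenth = Fraction(1, 10)

# Fixed point: the stored value is always some whole number divided by a
# power of two, so 0.1 is approximated but never reached exactly.
for bits in (4, 8, 16):
    approx = nearest_fixed_point(tenth, bits)
    print(bits, "fractional bits:", float(approx), "error:", float(abs(approx - tenth)))

# Floating point: Fraction(0.1) recovers the exact value the float stores.
stored = Fraction(0.1)
print("float 0.1 stored exactly?", stored == tenth)
print("0.1 + 0.2 == 0.3?", 0.1 + 0.2 == 0.3)
```

Adding more bits makes the error smaller but never removes it, which is why both representations may be inaccurate.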

Arithmetic in a processor is normally performed on a fixed number of bits, for example adding one 8-bit number to another. This will often cause no problems at all:

 00110011 (51)
+00001010 (10)
 --------
 00111101 (61)

But what happens if we add the following numbers together:

 01110011 (115)
+01001010 (74)
 --------
 10111101 (189)

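This may appear to have gone OK, but we have a problem. If we are dealing with two's complement numbers, the answer from adding two positive numbers together is negative! The true answer, 189, is larger than +127, the largest value 8-bit two's complement can hold, so the result has spilled into the sign bit: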

 01110011 (115)
+01001010 (74)
 --------
 10111101 (-67!)
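
The same sum can be reproduced in Python as a rough sketch. Python integers do not have a fixed width, so the example masks the result down to 8 bits by hand; the function names `add_fixed_width` and `as_twos_complement` are made up for this illustration.

```python
def add_fixed_width(a, b, bits=8):
    """Add two bit patterns and keep only the lowest `bits` bits, as a register would."""
    mask = (1 << bits) - 1
    return (a + b) & mask

def as_twos_complement(pattern, bits=8):
    """Interpret an unsigned bit pattern as a two's complement signed integer."""
    if pattern & (1 << (bits - 1)):      # top bit set: the value is negative
        return pattern - (1 << bits)
    return pattern

result = add_fixed_width(0b01110011, 0b01001010)
# Prints the bit pattern, its unsigned value and its two's complement value.
print(format(result, "08b"), result, as_twos_complement(result))   # 10111101 189 -67
```

The bit pattern is identical in both readings; only the interpretation of the top bit changes the answer from 189 to -67.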

Overflow

Let's take a look at another example of the same problem, this time using 4-bit two's complement numbers.

Overflow - When the result of a calculation is too large to fit into a set number of bits.

    1010 (-6)
   +1010 (-6)
    --------
 (1)0100 (+4!)

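As you can see in the sum above, we have added two negative numbers together and the result is a positive number. The true answer, -12, is outside the range -8 to +7 that 4 bits can hold in two's complement, and the carry out of the top bit, shown in brackets, is lost.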
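
One way to see when overflow will happen is to compare the true answer with the range the register can hold. The Python sketch below does this; the names `signed_range` and `wrapped_sum` are hypothetical helpers written for this example, and a real processor detects overflow in hardware rather than by computing the true sum first.

```python
def signed_range(bits):
    """Smallest and largest values a two's complement number of this width can hold."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def wrapped_sum(a, b, bits):
    """Return (stored_result, overflowed) for signed a + b in a `bits`-wide register."""
    low, high = signed_range(bits)
    true_sum = a + b
    overflowed = not (low <= true_sum <= high)
    # Keep the low bits only, then reinterpret them as two's complement.
    pattern = true_sum & ((1 << bits) - 1)
    stored = pattern - (1 << bits) if pattern >= (1 << (bits - 1)) else pattern
    return stored, overflowed

print(signed_range(4))         # (-8, 7)
print(wrapped_sum(-6, -6, 4))  # (4, True): the stored answer is wrong
print(wrapped_sum(-6, 3, 4))   # (-3, False): the answer fits, so it is correct
```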

To detect the situations described above we use the status register.

Status Register - a register holding information about the state of the processor, such as whether the last result was zero, positive/negative or resulted in overflow.

The most common flags

Flag	Name	Description
Z	Zero flag	Indicates that the result of an arithmetic or logical operation (or, sometimes, a load) was zero.
C	Carry flag	Enables numbers larger than a single word (in the examples above 4 or 8 bits) to be added/subtracted by carrying a binary digit from a less significant word to the least significant bit of a more significant word as needed.
S / N	Sign flag / Negative flag	Indicates that the result of an operation was negative. On some processors S and N are separate flags: one indicates whether the result was negative, whereas the other indicates whether a subtraction or addition has taken place.
O	Overflow flag	Indicates that the signed result of an operation is too large (or too small) to fit in the register width using two's complement representation.
P	Parity flag	Indicates whether the number of set bits of the last result is odd or even.
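
The carry flag's role in adding numbers larger than a single word can be sketched in Python. This is a simplified model, not any particular processor's instruction set: it assumes 4-bit words, unsigned values that fit in two words, and the helper names `add_word` and `add_two_word` are invented for the example.

```python
def add_word(a, b, carry_in, bits=4):
    """Add two `bits`-wide words plus a carry in; return (result_word, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << bits) - 1), total >> bits

def add_two_word(a, b, bits=4):
    """Add two unsigned two-word numbers, low word first, passing the carry along."""
    mask = (1 << bits) - 1
    # Low words: any carry out here is held in the carry flag...
    low, carry = add_word(a & mask, b & mask, 0, bits)
    # ...and fed into the least significant bit of the high-word addition.
    high, carry = add_word(a >> bits, b >> bits, carry, bits)
    return (high << bits) | low, carry

print(add_word(0b1010, 0b1010, 0))               # (4, 1): result 0100, carry set
print(add_two_word(0b00111010, 0b00011010))      # (84, 0): 58 + 26
```

The low words here are 1010 and 1010, the same bit patterns as the 4-bit sum earlier. Treated as unsigned values, the carry that was lost there is exactly what the high word needs in order to produce the correct 8-bit total.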

Status register working

For the sum that we met earlier, we will take a look at how the status register can be used to detect the incorrect answer:

 01110011 (115)
+01001010 (74)
 --------
 10111101 (-67)

Status register: Z = False | C = False | N = True | O = True | P = Even

Using these flags you can see that the result is negative. If the original sum used only positive values, then we know we have an error, and the overflow flag confirms it.

Looking at the other sum:

    1010 (-6)
   +1010 (-6)
    ----
 (1)0100 (+4)

Status register: Z = False | C = True | N = False | O = True | P = Odd

Using these flags you can see that the result is positive even though the original sum used two negative numbers. The overflow flag confirms that overflow occurred.
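
The flag values in the two worked examples can be reproduced with a small Python sketch. It is a simplified model for revision purposes only: the function name `status_flags` is made up, parity is reported as the text "Even" or "Odd" to match the examples above, and real processors differ in exactly which flags they provide and how parity is defined.

```python
def status_flags(a, b, bits):
    """Add two bit patterns of width `bits` and return (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = a + b
    result = total & mask                       # what the register actually keeps
    flags = {
        "Z": result == 0,
        "C": total > mask,                      # a bit was carried out of the top
        "N": bool(result & sign),               # top bit of the result is set
        # Overflow: both operands share a sign and the result's sign differs.
        "O": bool((a ^ result) & (b ^ result) & sign),
        "P": "Even" if bin(result).count("1") % 2 == 0 else "Odd",
    }
    return result, flags

print(status_flags(0b01110011, 0b01001010, 8))
print(status_flags(0b1010, 0b1010, 4))
```

The overflow test relies on the rule used in the text: adding two numbers with the same sign should never give a result with the opposite sign.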

Exercise: Status register

What is the problem with the result of the following 4-bit sum?

    1011 (-5)
   +1011 (-5)
    ----

Answer:

The result would overflow, giving an incorrect answer. The true answer, -10, is outside the range -8 to +7 that 4-bit two's complement can hold:

    1011 (-5)
   +1011 (-5)
    ----
 (1)0110 (+6)

In the context of calculations, what is overflow?

Answer:

When the result of a calculation is too large to fit into a set number of bits.

What do we need the status register for?

Answer:

The status register holds flags that keep track of the results of sums. This helps us to see when there is an error in a result and correct it accordingly.

Name three flags in a status register:

Answer:

Any three of: Overflow, Carry, Negative (Sign), Zero, Parity

Show the status register for the following sum:

    1001 (-7)
   +1001 (-7)
    ----
 (1)0010 (+2)

Answer:

Status register: Z = False | C = True | N = False | O = True | P = Odd