Foundations of Computer Science/Information Representation

Information Representation

Introductory problem

Computers often represent colors as a red-green-blue (RGB) set of numbers, called a "triple", where each of the red, green, and blue components is an integer between 0 and 255. For example, the color (255, 0, 10) has full red, no green, and a small amount of blue. Write an algorithm that takes as input the RGB components for a color, and returns a message indicating the largest component or components. For example, if the input color is (100, 255, 0), the algorithm should output "Largest component(s): green". And if the input color is (255, 255, 255), then the algorithm should output "Largest component(s): red, green, blue".
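To preview the kind of algorithm being asked for, here is one possible solution sketched in Python. (This is just one sketch; the function name largest_components and the exact structure are our own choices, not part of the problem statement.)

def largest_components(red, green, blue):
    """Return a message naming the largest RGB component(s)."""
    # Pair each component value with its name so ties can be reported.
    components = [("red", red), ("green", green), ("blue", blue)]
    largest = max(value for _, value in components)
    # Keep every component that ties for the largest value.
    names = [name for name, value in components if value == largest]
    return "Largest component(s): " + ", ".join(names)

print(largest_components(100, 255, 0))    # Largest component(s): green
print(largest_components(255, 255, 255))  # Largest component(s): red, green, blue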

Overview of this chapter

One amazing aspect of computers is that they can store so many different types of data. Of course computers can store numbers. But unlike simple calculators they can also store text, colors, images, audio, video, and many other types of data. And not only can they store many different types of data, they can also analyze them and transmit them to other computers. This versatility is one reason why computers are so useful, and affect so many areas of our lives.

To understand computers and computer science, it is important to know something about how computers deal with different types of data. Let's return to colors. How are colors stored in a computer? The introductory problem states one way: as an RGB triple. This is not the only possible way. RGB is just one of many color systems. For example, sometimes colors are represented as an HSV triple: by hue, saturation, and value. However, RGB is the most common color representation in computer programs.

This leads to a deeper issue: how are numbers stored in a computer? And why is it important anyway that we understand how numbers, and other different types of data, are stored and processed in a computer? This chapter deals with these and related questions. In particular, we will look at the following:

  1. Why is this an important topic?
  2. How do computers represent numbers?
  3. How do computers represent text?
  4. How do computers represent other types of data such as images?
  5. What is the binary number system and why is it important in computer science?
  6. How do computers do basic operations such as addition and subtraction?

Goals

Upon completing this chapter, you should be able to do the following:

  1. Be able to explain how, on the lowest level, computers represent both numeric and text data, as well as other types of data such as color data.
  2. Be able to explain and use the basic terminology in this area: bit, byte, megabyte, RGB triple, ASCII, etc.
  3. Be able to convert numbers and text from one representation to another.
  4. Be able to convert integers from one representation to another, for example from decimal representation to two's complement representation.
  5. Be able to add and subtract numbers written in unsigned binary or in two's complement representation.
  6. Be able to explain how the number of bits used to represent data affects the range and precision of the representation.
  7. Be able to explain in general how computers represent different types of data such as images.
  8. Be able to do calculations involving amounts of memory or download times for certain datasets.

Data representation and mathematics

How is data representation related to liberal education and mathematics? As you might guess, there is a strong connection. Computers store all data in terms of binary (i.e., base 2) numbers. So to understand computers it is necessary to understand binary. Moreover, you need to understand not only binary basics, but also some of the complications such as the "two's complement" notation discussed below.

Binary representation is important not only because it is how computers represent data, but also because so much of computers and computing is based on it. For example, we will see it again in the chapter on machine organization.

Data representation and society and technology

The computer revolution. That is a phrase you often hear used to describe the many ways computers are affecting our lives. Another phrase you might hear is the digital revolution. What does the digital revolution mean?

Nowadays, many of our devices are digital. We have digital watches, digital phones, digital radio, digital TVs, etc. However, previously many devices were analog: "data ... represented by a continuously variable physical quantity".[1] Think, for example, of an old watch with second, minute, and hour hands that moved continuously (although very slowly for the minute and hour hands). Compare this with many modern-day watches that show a digital representation of the time, such as 2:03:23.

This example highlights a key difference between analog and digital devices: analog devices rely on a continuous phenomenon and digital devices rely on a discrete one. As a second example of this difference, an analog radio receives audio radio broadcast signals which are transmitted as radio waves, while a digital radio receives signals which are streams of numbers.[2]

The digital revolution refers to the many digital devices, their uses, and their effects. These devices include not only computers, but also other devices or systems that play a major role in our lives, such as communication systems.

Because digital devices usually store numbers using the binary number system, a major theme in this chapter is binary representation of data. Binary is fundamental to computers and computer science: to understand how computers work, and how computer scientists think, you need to understand binary. The first part of this chapter therefore covers binary basics. The second part then builds on the first and explains how computers store different types of data.

Representation basics

Introduction

Computing is fundamentally about information processes. Each computation is a certain manipulation of symbols, which can be done purely mechanically (blindly). If we can represent information using symbols and know how to process the symbols and interpret the results, we can access valuable new information. In this section we will study information representation in computing.

The algorithms chapters discuss ways to describe a sequence of operations. Computer scientists use algorithms to specify behavior of computers. But for these algorithms to be useful they need data, and so computers need ways to represent data.[3]

Information is conveyed as the content of messages, which, when interpreted and perceived by our senses, cause certain mental responses. Information is always encoded into some form for transmission and interpretation. We deal with information all the time. For example, we receive information when we read a book, listen to a story, watch a movie, or dream a dream. We give information when we write an email, draw a picture, act in a show, or give a speech. Information is abstract but it is conveyed through concrete media. For instance, a conversation on the phone communicates information, but the information is represented by sound waves and electronic signals along the way.

Information is abstract/virtual and the media that carry the information must be concrete/physical. Therefore before any information can be processed or communicated it must be quantified/digitized: a process that turns information into (data) representations using symbols.

People have many ways to represent even a very simple number. For example, the number four can be represented as 4 or IV or |||| or 2 + 2, and so on. How do computers represent numbers? (Or text? Or audio files?)

The way computers represent and work with numbers is different from how we do. Since early computer history, the standard has been the binary number system. Computers "like" binary because it is extremely easy for them. However, binary is not easy for humans. While most of the time people do not need to be concerned with the internal representations that computers use, sometimes they do.

Why binary?

Suppose you and some friends are spending the weekend at a cabin. The group will travel in two separate cars, and you all agree that the first group to arrive will leave the front light on to make it easier for the later group. When the car you are in arrives at the cabin you will be able to tell by the light if your car arrived first. The light therefore encodes two possibilities: on (the other group has already arrived) or off (the other group hasn't arrived yet).

To convey more information you could use two lights. For example, both off could mean the first group hasn't arrived yet; the first light off and the second on, that the first group has arrived but left to get supplies; the first on and the second off, that the group arrived but left to go fishing; and both on, that the group has arrived and hasn't left.

Note the two key ideas here: first, a light can be on or off (we don't allow different levels of light, multiple colors, or other options), just two possibilities. Second, if we want to represent more than two choices we can use more lights.

This "on or off" idea is a powerful one. There are two and only two distinct choices or states: on or off, 0 or 1, black or white, present or absent, large or small, rough or smooth, etc.—all of these are different ways of representing possibilities. One reason the two-choice idea is so powerful is it is easier to build objects—computers, cameras, CDs, and so on—where the data at the lowest level is in two possible states, either a 0 or a 1.[4]

In computer representation, a bit (i.e., a binary digit) can be a 0 or a 1. A collection of bits is called a bitstring. A bitstring that is 8 bits long is called a byte. Bits and bytes are important concepts in computer storage and data transmission, and later on we'll explain them further along with some related terminology and concepts. But first we will look at the basic question of how a computer represents numbers.

A brief historic aside

Claude Shannon is considered the father of information theory because he was the first person to study and build mathematical models for information and the communication of information. He also made many other significant contributions to computing. His seminal paper “A mathematical theory of communication” (1948) changed our view of information, laying the foundation for the information age. Shannon discovered that the fundamental unit of information is a yes-or-no answer to a question: one bit with two distinct states, which can be represented by only two symbols. He also founded the design theory of digital computers/circuits by proving that propositions of Boolean algebra can be used to build a "logic machine" capable of carrying out general computation (manipulation of two types of symbols). Data, another term closely related to information, is an abstract concept of representations of information. We will use "information representations" and "data" interchangeably.

External and internal information representation

Information can be represented on different levels. It is helpful to separate information representations into two categories: external representation and internal representation. External representation is used for communication between humans and computers. Everything we see on a computer monitor or screen, whether it is text, image, or motion picture, is a representation of certain information. Computers also represent information externally using sound and other media, such as touch pads for the blind to read text.

Internally all modern computers represent information as bits. We can think of a bit as a digit with two possible values. Since a bit is the fundamental unit of information it is sufficient to represent all information. It is also the simplest representation because only two symbols are needed to represent two distinct values. This makes it easy to represent bits physically - any device capable of having two distinct states works, e.g. a toggle switch. We will see later that modern computer processors are made up of tiny switches called transistors.

Review of the decimal number system

When bits are put together into sequences they can represent numbers. We are familiar with representing quantities with numbers. Numbers are concrete symbols representing abstract quantities. With ten fingers, humans conveniently adopted the base ten (decimal) numbering system, which requires ten different symbols. We all know decimal representation and use it every day. For instance, the Arabic numerals use 0 through 9. Each symbol represents a power of ten depending on the position the symbol is in.

So, for example, the number one hundred and twenty-four is 1×10^2 + 2×10^1 + 4×10^0. We can emphasize this by writing the powers of 10 over the digits in 124:

10^2  10^1 10^0
   1     2    4

So if we take what we know about base 10 and apply it to base 2 we can figure out binary. But first recall that a bit is a binary digit and a byte is 8 bits. In this chapter most of the binary numbers we talk about will be one byte long.

(Computers actually use more than one byte to represent most numbers. For example, most numbers are represented using 32 bits (4 bytes) or 64 bits (8 bytes). The more bits, the more different values you can represent: a single bit permits 2 values, 2 bits give 4 values, 3 bits give 8 values, ..., 8 bits give 256 values, and in general n bits give 2^n values. However, when looking at binary examples we'll usually use 8-bit numbers to keep the examples manageable.)

This base ten system used for numbering is somewhat arbitrary. In fact, we commonly use other base systems to represent quantities of different natures: base 7 for days in a week, base 60 for minutes in an hour, base 24 for hours in a day, base 16 for ounces in a pound, and so on. It is not hard to see that base 2 (two symbols) is the simplest base system, because with fewer than two symbols we cannot represent change (and therefore cannot represent any information).

Unsigned binary

When we talk about decimal, we deal with 10 digits—0 through 9 (that's where "decimal" comes from). In binary we only have two digits; that's why it's binary. The digits in binary are 0 and 1. You will never see any 2's or 3's, etc. If you do, something is wrong. A bit will always be a 0 or a 1.

Counting in binary proceeds as follows:

    0     (decimal 0) 
    1     (decimal 1) 
   10     (decimal 2) 
   11     (decimal 3) 
  100     (decimal 4) 
  101     (decimal 5) 
  ...

An old joke runs, "There are 10 types of people in the world. Those who understand binary and those who don't."

The next thing to think about is what values are possible in one byte. Let's write out the powers of two in a byte:

2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128  64   32   16   8    4    2    1 

As an example, the binary number 10011001 is 1×128 + 0×64 + 0×32 + 1×16 + 1×8 + 0×4 + 0×2 + 1×1 = 153. Note each of the 8 bits can be either a 0 or a 1. So there are two possibilities for the leftmost bit, two for the next bit, two for the bit after that, and so on: two choices for each of the 8 bits. Multiplying these possibilities together gives 2^8, or 256, possibilities. In unsigned binary these possibilities represent the integers from 0 (all bits 0) to 255 (all bits 1).

All base systems work in the same way: the rightmost digit represents the quantity of the base raised to the zeroth power (recall that anything raised to the 0th power is 1), and each digit to the left represents a quantity that is base times larger than the one represented by the digit immediately to its right. The binary number 1001 represents the quantity 9 in decimal, because the rightmost 1 represents 2^0 = 1, the zeroes contribute nothing at the 2^1 and 2^2 positions, and finally the leftmost 1 represents 2^3 = 8. When we use different base systems it is necessary to indicate the base as a subscript to avoid confusion. For example, we write 1001₂ to indicate the number 1001 in binary (which represents the quantity 9 in decimal). The subscript 2 means "binary": it tells the reader that the numeral does not represent a thousand and one in decimal. This example also shows us that representations have no intrinsic meaning. The same pattern of symbols, e.g. 1001, can represent different quantities depending on the way it is interpreted. There are many other ways to represent the quantity 9₁₀ (remember: read this as "nine in base 10/decimal"); for instance, the symbol 九 represents the same quantity in Chinese.

As the same quantity can be represented differently, we can often change the representation without changing the quantity it represents. As shown before, the binary representation 1001₂ is equivalent to the decimal representation 9₁₀: both represent exactly the same quantity. In studying computing we often need to convert between decimal representation, which we are most familiar with, and binary representation, which is used internally by computers.

Binary to decimal conversion

Converting the binary representation of a non-negative integer to its decimal representation is a straightforward process: summing up the quantities each binary digit represents yields the result.

1001₂ = 1×2^3 + 0×2^2 + 0×2^1 + 1×2^0 = 8 + 0 + 0 + 1 = 9₁₀
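This summing process is easy to express in code. Here is a small Python sketch (a hand-rolled loop rather than Python's built-in int(s, 2), so each step is visible; the function name binary_to_decimal is our own):

def binary_to_decimal(bits):
    """Convert a string of binary digits, e.g. "1001", to an integer."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        # Each digit contributes digit * 2^position, counting from the right.
        total += int(digit) * 2 ** position
    return total

print(binary_to_decimal("1001"))      # 9
print(binary_to_decimal("10011001"))  # 153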

Decimal to binary conversion

One task you will need to do in this book, and which computer scientists often need to do, is to convert a decimal number to or from a binary number. The last subsection showed how to convert binary to decimal: take each power of 2 whose corresponding bit is a 1, and add those powers together.

Suppose we want to do a decimal to binary conversion. As an example, let's convert the decimal value 75 to binary. Here's one technique that relies on successive division by 2:

75/2   quotient=37   remainder=1
37/2   quotient=18   remainder=1
18/2   quotient=9    remainder=0
9/2    quotient=4    remainder=1
4/2    quotient=2    remainder=0
2/2    quotient=1    remainder=0
1/2    quotient=0    remainder=1

We then take the remainders bottom-to-top to get 1001011. Since we usually work with groups of 8 bits, if the result doesn't fill all eight bits, we add zeroes at the front until it does. So we end up with 01001011.
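Here is a Python sketch of this repeated-division technique (the function name decimal_to_binary and the default 8-bit padding are our own choices):

def decimal_to_binary(n, width=8):
    """Convert a non-negative integer to a binary string, padded to width bits."""
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits  # the remainder becomes the next bit, right to left
        n = n // 2                # the quotient feeds the next division
    return bits.rjust(width, "0")  # add zeroes at the front to fill the byte

print(decimal_to_binary(75))  # 01001011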

Binary mathematics

Addition of binary numbers

In addition to storing data, computers also need to do operations such as addition of data. How do we add numbers in binary representation?

Addition of bits has four simple rules, shown here as four vertical columns:

  0      0      1      1
+ 0    + 1    + 0    + 1
=========================
  0      1      1      10

Now if we have a binary number consisting of multiple bits we use these four rules, plus "carrying". Here's an example:

  00110101
+ 10101100
==========
  11100001

Here's the same example, but with the carried bits listed explicitly, i.e., a 0 if there is no carry, and a 1 if there is. When 1+1=10, the 0 is kept in that column's solution and the 1 is carried over to be added to the next column to the left.

  0111100
  00110101
+ 10101100
==========
  11100001

We can check binary operations by converting each number to decimal: with both binary and decimal we're doing the same operations on the same numbers, but with different representations. If the representations and operations are correct the results should be consistent. Let's look one more time at the example addition problem we just solved above. Converting 00110101₂ to decimal produces 53₁₀ (do the conversion on your own to verify its accuracy), and converting 10101100₂ gives 172₁₀. Adding these yields 53 + 172 = 225₁₀, which, when converted back to binary, is indeed 11100001₂.
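The column-by-column procedure translates directly into code. Here is a Python sketch (the helper name add_binary is our own) that adds two equal-length binary strings using the four single-bit rules plus carrying:

def add_binary(a, b):
    """Add two equal-length binary strings; return (sum bits, final carry)."""
    result = ""
    carry = 0
    # Work right to left, one column at a time, just as by hand.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry
        result = str(column_sum % 2) + result  # the bit kept in this column
        carry = column_sum // 2                # the bit carried to the next column
    return result, carry

print(add_binary("00110101", "10101100"))  # ('11100001', 0)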

But binary addition doesn't always work quite right:

  01110100
+ 10011111
==========
 100010011  

Note there are 9 bits in the result, but there should only be 8 in a byte. Here is the sum in decimal:

  116
+ 159 
=====
  275

Note 275 is greater than 255, the maximum we can hold in an 8-bit number. This results in a condition called overflow. Overflow is not an issue if the computer can go to a 9-bit binary number; however, if the computer only has 8 bits set aside for the result, overflow means that a program might not run correctly or at all.
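A minimal Python illustration of overflow, using the built-in int(s, 2) conversion (the 255 limit applies to 8-bit unsigned values):

total = int("01110100", 2) + int("10011111", 2)  # 116 + 159
print(total)        # 275
print(total > 255)  # True: the sum no longer fits in 8 unsigned bits
print(bin(total))   # 0b100010011, a 9-bit result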

Subtraction of binary numbers

Once again, let's start by looking at single bits:

  0      0      1      1
- 0    - 1    - 0    - 1
========================
  0     -1      1      0

Notice that in the 0 - 1 case, what we actually do is get a 1 result by borrowing from the next column to the left. So let's apply this to an 8-bit problem:

  10011101
- 00100010
==========
  01111011

which is the same as (in base 10),

  157
-  34
======
  123

Here's the binary subtraction again with the borrowing shown:

  1100010
  10011101
- 00100010
==========
  01111011

Most people find binary subtraction significantly harder than binary addition.
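Here is a matching Python sketch of column-by-column subtraction with borrowing (the helper name subtract_binary is our own; it assumes the first operand is at least as large as the second):

def subtract_binary(a, b):
    """Subtract binary string b from a (assuming a >= b), column by column."""
    result = ""
    borrow = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column = int(bit_a) - int(bit_b) - borrow
        if column < 0:
            column += 2  # borrow 2 (binary 10) from the next column to the left
            borrow = 1
        else:
            borrow = 0
        result = str(column) + result
    return result

print(subtract_binary("10011101", "00100010"))  # 01111011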

Other representations related to binary

You might have had questions about the binary representation in the last section. For example, what about negative numbers? What about numbers with a fractional part? Aren't all those 0's and 1's difficult for humans to work with? These are good questions. In this and a couple of other sections we'll look at a few other representations that are used in computer science and are related to binary.

Hexadecimal

Computers are good at binary. Humans aren't. Binary is hard for humans to write, hard to read, and hard to understand. But what if we want a number system that is easier to read but still is closely tied to binary in some way, to preserve some of the advantages of binary?

One possibility is hexadecimal, i.e., base 16. But using a base greater than 10 immediately presents a problem. Specifically, we run out of digits after 0 to 9 — we can't use 10, 11, or greater because those have multiple digits within them. So instead we use letters: A is 10, B is 11, C is 12, D is 13, E is 14, and F is 15. So the digits we're using are 0 through F instead of 0 through 9 in decimal, or instead of 0 and 1 in binary.

We also have to reexamine the value of each place. In hexadecimal, each place represents a power of 16. A two-digit hexadecimal number has a 16's place and a 1's place. For example, D8 has D in the 16's place, and 8 in the 1's place:

16^1  16^0  <- hexadecimal places showing powers of 16
16    1     <- value of these places in decimal (base 10)
D     8     <- our sample hexadecimal number

So the hexadecimal number D8 equals 13×16 + 8 = 216 in decimal. Note, however, that any two-digit hexadecimal number can represent the same amount of information as one byte of binary. (That's because the largest two-digit hex number, FF, equals 255, the same maximum as 8 bits of binary.) So it's easier for us to read or write.

When working with a number, there are times when which representation is being used isn't clear. For example, does 10 represent the number ten (so the representation is decimal), the number two (the representation is binary), the number sixteen (hexadecimal), or some other number? Often, the representation is clear from the context. However, when it isn't, we use a subscript to clarify which representation is being used, for example 10₁₀ for decimal, versus 10₂ for binary, versus 10₁₆ for hexadecimal.

Hexadecimal numbers can have more hexadecimal digits than the two we've already seen. For example, consider FF0581A4₁₆, which uses the following powers of 16:

16^7  16^6  16^5  16^4  16^3  16^2  16^1  16^0
F     F     0     5     8     1     A     4

So in decimal this is: 15×16^7 + 15×16^6 + 0×16^5 + 5×16^4 + 8×16^3 + 1×16^2 + 10×16^1 + 4×16^0 = 4,278,550,948.

Hexadecimal doesn't appear often, but it is used in some places, for example sometimes to represent memory addresses (you'll see this in a future chapter) or colors. Why is it useful in such cases? Consider a 24-bit RGB color with 8 bits each for red, green, and blue. Since 8 bits requires 2 hexadecimal digits, a 24-bit color needs just 6 hexadecimal digits, rather than 24 binary digits. For example, FF0088 indicates a 24-bit color with a full red component, no green, and a mid-level blue.
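As a small illustration, here is a Python sketch that splits a 6-digit hexadecimal color into its RGB components (the color FF0088 is the example from the text):

color = "FF0088"
# Each pair of hex digits is one 8-bit component.
red = int(color[0:2], 16)
green = int(color[2:4], 16)
blue = int(color[4:6], 16)
print(red, green, blue)  # 255 0 136: full red, no green, mid-level blue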

Now there are additional types of conversion problems:

* Decimal to hexadecimal
* Hexadecimal to decimal
* Binary to hexadecimal
* Hexadecimal to binary

Here are a couple examples involving the last two of these.

Let's convert the binary number 00111100 to hexadecimal. To do this, break it into two 4-bit parts: 0011 and 1100. Now convert each part to decimal and get 3 and 12. The 3 is a hexadecimal digit, but 12 isn't. Instead recall that C is the hexadecimal representation for 12. So the hexadecimal representation for 00111100 is 3C.

Rather than going from binary to decimal (for each 4-bit segment) and then to hexadecimal digits, you could go from binary to hexadecimal directly.

Hexadecimal digits and their decimal and binary equivalents: first, base 16 (hexadecimal), then base 10 (decimal), then base 2 (binary).

16 10  2 <- bases
===========
0  0   0000
1  1   0001
2  2   0010
3  3   0011
4  4   0100
5  5   0101
6  6   0110
7  7   0111
8  8   1000
9  9   1001
A  10  1010
B  11  1011
C  12  1100
D  13  1101
E  14  1110
F  15  1111

Now let's convert the hexadecimal number D6 to binary. D is the hexadecimal representation for 13, which is 1101 in binary. 6 in binary is 0110. Put these two parts together to get 11010110. Again, we could skip the intermediate conversions by using the hexadecimal and binary columns above.
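Both directions of this 4-bit-group technique are easy to express in code. Here is a Python sketch (using the built-in int and string formatting rather than the lookup table above; the function names are our own):

def binary_to_hex(bits):
    """Convert a binary string (length a multiple of 4) to hexadecimal digits."""
    digits = ""
    for i in range(0, len(bits), 4):
        nibble = bits[i:i + 4]                 # one 4-bit group
        digits += "{:X}".format(int(nibble, 2))
    return digits

def hex_to_binary(digits):
    """Convert hexadecimal digits to a binary string, 4 bits per digit."""
    return "".join("{:04b}".format(int(d, 16)) for d in digits)

print(binary_to_hex("00111100"))  # 3C
print(hex_to_binary("D6"))        # 11010110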

Text representation

A piece of text can be viewed as a stream of symbols, each of which can be represented/encoded as a sequence of bits, resulting in a stream of bits for the whole text. Two common encoding schemes are ASCII and Unicode. ASCII uses one byte (8 bits) to represent each symbol and can represent up to 256 (2^8) different symbols, which include the English alphabet (in both lower and upper cases) and other commonly used symbols. Unicode extends ASCII to represent a much larger number of symbols, using multiple bytes. Unicode can represent any symbol from any written language and much more.
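A short Python illustration of text as numbers and bits (ord gives a character's code, and format(code, "08b") shows that code as one byte):

for character in "Hi!":
    code = ord(character)  # the character's ASCII/Unicode code point
    print(character, code, format(code, "08b"))
# H 72 01001000
# i 105 01101001
# ! 33 00100001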

Image, audio, and video files

Images, audio, and video are other types of data. How computers represent these types of data is fascinating but complex. For example, there are perceptual issues (e.g., what types of sounds can humans hear, and how does that affect how many numbers we need to store to reliably represent music?), size issues (as we'll see below, these types of data can result in large file sizes), standards issues (e.g., you might have heard of JPEG or GIF image formats), and other issues.

We won't be able to cover image, audio, and video representation in depth: the details are too complicated, and can get very sophisticated. For example, JPEG images can rely on an advanced mathematical technique called the discrete cosine transform. However, it is worth examining a few key high-level points about image, audio, and video files:

  1. Computers can represent not only basic numeric and text data, but also data such as music, images, and video.
  2. They do this by digitizing the data. At the lowest level the data is still represented in terms of bits, but there are higher-level representational constructs as well.
  3. There are numerous ways to encode such data, and so standard encoding techniques are useful.
  4. Audio, image, and video files can be large, which presents challenges in terms of storing, processing and transmitting these files. For this reason most encoding techniques use some sophisticated types of compression.

Images

A perceived image is the result of light beams physically coming into our eyes and triggering nerves to send signals to our brain. In computing, an image is simulated by a grid of dots (called pixels, for "picture element"), each of which has a particular color. This works because our eyes cannot tell the difference between the original image and the dot-based image if the resolution (number of dots used) is high enough. In fact, the computer screen itself uses such a grid of pixels to display images and text.

"The largest and most detailed photograph of our galaxy ever taken has been unveiled. The gigantic nine-gigapixel image captures more than 84 million stars at the core of the Milky Way. It was created with data gathered by the Visible and Infrared Survey Telescope for Astronomy (VISTA) at the European Southern Observatory's Paranal Observatory in Chile. If it was printed with the resolution of a newspaper it would stretch 30 feet long and 23 feet tall, the team behind it said, and has a resolution of 108,200 by 81,500 pixels."[5]

While this galaxy image is obviously an extreme example, it illustrates that images (even much smaller images) can take significant computer space. Here is a more mundane example. Suppose you have an image that is 1500 pixels wide, and 1000 pixels high. Each pixel is stored as a 24-bit color. How many bytes does it take to store this image?

This problem describes a straightforward but naive way to store the image: for each row, for each column, store the 24-bit color at that location. The answer is 1500 × 1000 = 1,500,000 pixels, multiplied by 24 bits/pixel, divided by 8 bits per byte = 4,500,000 bytes, or about 4.5 MB.
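The same calculation, written out in Python:

width, height = 1500, 1000     # pixels
bits_per_pixel = 24            # 8 bits each for red, green, and blue
total_bits = width * height * bits_per_pixel
total_bytes = total_bits // 8  # 8 bits per byte
print(total_bytes)             # 4500000, i.e., about 4.5 MB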

Note the file size. If you store a number of photographs or other images you know that images, and especially collections of images, can take up considerable storage space. You might also know that most images do not take 4.5MB. And you have probably heard of some image storage formats such as JPEG or GIF.

Why are most image sizes tens or hundreds of kilobytes rather than megabytes? Most images are stored not in a direct format, but using some compression technique. For example, suppose you have a night image where the entire top half of the image is black ((0,0,0) in RGB). Rather than storing (0,0,0) as many times as there are pixels in the upper half of the image, it is more efficient to use some "shorthand." For example, rather than having a file that has thousands of 0's in it, you could store (0,0,0) plus a number indicating how many pixels at the start of the image (if you read line by line from top to bottom) have color (0,0,0).

This leads to a compressed image: an image that contains all, or most, of the information in the original image, but in a more efficient representation. For example, if an original image would have taken 4MB, but the more efficient version takes 400KB, then the compression ratio is 4MB to 400KB, or about 10 to 1.
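The "shorthand" idea described above is known as run-length encoding. Here is a minimal Python sketch of the concept, applied to one row of pixel values (an illustration only, not the format any real image standard uses):

def run_length_encode(pixels):
    """Collapse runs of repeated values into [value, count] pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1         # extend the current run
        else:
            runs.append([pixel, 1])  # start a new run
    return runs

row = [(0, 0, 0)] * 6 + [(255, 0, 10)] * 2
print(run_length_encode(row))  # [[(0, 0, 0), 6], [(255, 0, 10), 2]]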

Complicated compression standards, such as JPEG, use a variety of techniques to compress images. The techniques can be quite sophisticated.

How much can an image be compressed? It depends on a number of factors. For many images, a compression ratio of, say, 10:1 is possible, but this depends on the image and on its use. For example, one factor is how complicated an image is. An uncomplicated image (say, as an extreme example, if every pixel is black[6]), can be compressed a very large amount. Richer, more complicated images can be compressed less. However, even complicated images can usually be compressed at least somewhat.

Another consideration is how faithful the compressed image is to the original. For example, many users will trade some small discrepancies between the original image and the compressed image for a smaller file size, as long as those discrepancies are not easily noticeable. A compression scheme that doesn't lose any image information is called a lossless scheme. One that does is called lossy. Lossy compression will give better compression than lossless, but with some loss of fidelity.[7]

In addition, the encoding of an image includes other metadata, such as the size of the image, the encoding standard, and the date and time when it was created.

Video

It is not hard to imagine that videos can be encoded as a series of image frames, with synchronized audio tracks also encoded using bits.

Suppose you have a 10 minute video, 256 x 256 pixels, 24 bits per pixel, and 30 frames of the video per second. You use an encoding that stores all bits for each pixel for each frame in the video. What is the total file size? And suppose you have a 500 kilobit per second download connection; how long will it take to download the file?

This problem highlights some of the challenges of video files. Note the answer to the file size question is (256×256) pixels × 24 bits/pixel × 10 minutes × 60 seconds/minute × 30 frames per second = approximately 28 Gb (Gb means gigabits). This is about 28/8 = 3.5 gigabytes. With a 500 kilobit per second download rate, this will take 28 Gb/500 Kbps, or about 56,000 seconds. This is over 15 hours, longer than many people would like to wait. And the time will only increase if the number of pixels per frame is larger (e.g., in a full screen display), or the video length is longer, or the download speed is slower.
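The same file-size and download-time estimate in Python:

pixels = 256 * 256
bits_per_pixel = 24
frames_per_second = 30
seconds = 10 * 60                  # a ten-minute video
file_bits = pixels * bits_per_pixel * frames_per_second * seconds
print(file_bits)                   # 28311552000 bits, about 28 Gb
print(file_bits / 8 / 10**9)       # about 3.5 gigabytes
print(file_bits / 500_000 / 3600)  # about 15.7 hours at 500 kilobits/second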

So video file size can be an issue. However, it does not take 15 hours to download a ten minute video; as with image files, there are ways to decrease the file size and transmission time. For example, standards such as MPEG make use not only of image compression techniques to decrease the storage size of a single frame, but also take advantage of the fact that a scene in one frame is usually quite similar to the scene in the next frame. There's a wealth of information online about various compression techniques and standards, storage media, etc.[8]

Audio

It might seem, at first, that audio files shouldn't take anywhere as much space as video. However, if you think about how complicated audio such as music can be, you probably won't be surprised that audio files can also be large.

Sound is essentially vibrations, or collections of sound waves travelling through the air. Humans can hear sound waves that have frequencies between 20 and 20,000 cycles per second.[9] To avoid certain undesirable artifacts, audio files need to use a sample rate of at least twice the highest frequency. So, for example, music on a CD is usually sampled at 44,100 Hz, or 44,100 times per second.[10] And if you want a stereo effect, you need to sample on two channels. For each sample you want to store the amplitude using enough bits to give a faithful representation. CDs usually use 16 bits per sample. So a minute of music takes 44,100 samples/second × 16 bits/sample × 2 channels × 60 seconds/minute ÷ 8 bits per byte = about 10.5 MB per minute. This means a 4-minute song will take about 40 MB, and an hour of music will take about 630 MB, which is (very) roughly the amount of memory a typical CD will hold.[11]

Note, however, that if you want to download a 40 MB song over a 1 Mbps connection, it will take 40 MB/1 Mbps, which comes to about 320 seconds. This is not a long time, but it would be desirable if it could be shorter. So, not surprisingly, there are compression schemes that reduce this considerably. For example, there is an MPEG audio compression standard that will compress 4-minute songs to about 4 MB, a considerable reduction.[12]
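The audio arithmetic in Python (note the text rounds the 4-minute figure down to 40 MB):

sample_rate = 44_100   # samples per second
bits_per_sample = 16
channels = 2
bytes_per_minute = sample_rate * bits_per_sample * channels * 60 // 8
print(bytes_per_minute)            # 10584000, about 10.5 MB per minute
song_bytes = bytes_per_minute * 4  # a 4-minute song, about 42 MB
print(song_bytes * 8 / 1_000_000)  # about 339 seconds to download at 1 Mbps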

Sizes and limits of representations

In the last section we saw that a page of text could take a few thousand bytes to store. Images files might take tens of thousands, hundreds of thousands, or even more bytes. Music files can take millions of bytes. Movie files can take billions. There are databases that consist of trillions or quadrillions of bytes of data.

Computer science has special terminology and notation for large numbers of bytes. Here is a table of memory amounts, their powers of two, and the approximate American English word for each.

1 kilobyte (KB) — 2^10 bytes — thousand bytes
1 megabyte (MB) — 2^20 bytes — million bytes
1 gigabyte (GB) — 2^30 bytes — billion bytes
1 terabyte (TB) — 2^40 bytes — trillion bytes
1 petabyte (PB) — 2^50 bytes — quadrillion bytes
1 exabyte (EB) — 2^60 bytes — quintillion bytes

There are terms for still larger (and for smaller) quantities of these types as well.[13]

Kilobytes, megabytes, and the other sizes are important enough for discussing file sizes, computer memory sizes, and so on, that you should know both the terminology and the abbreviations. One caution: file sizes are usually given in terms of bytes (or kilobytes, megabytes, etc.). However, some quantities in computer science are usually given in terms involving bits. For example, download speeds are often given in terms of bits per second. "Mbps" is an abbreviation for megabits (not megabytes) per second. Notice the 'b' in Mbps is lower case, while the 'B' in MB (megabytes) is capitalized.

In the context of computer memory, the usual definition of kilobytes, megabytes, etc. is a power of two. For example, a kilobyte is 2^10 = 1,024 bytes, not a thousand. In some other situations, however, a kilobyte is defined to be exactly a thousand bytes. This can obviously be confusing. For the purposes of this book, the difference will usually not matter. That is, in most problems we do, an approximation will be close enough. So, for example, if we do a calculation and find a file takes 6,536 bytes, then you can say this is approximately 6.5 KB, unless the problem statement says otherwise.[14]

All representations are limited in multiple ways. First, the number of different things we can represent is limited because the number of combinations of symbols we can use is always limited by the physical space available. For instance, if you were to represent a decimal number by writing it down on a piece of paper, the size of the paper and the size of the font limit how many digits you can put down. Similarly, in a computer the number of bits that can be stored physically is also limited. With three binary digits we can generate 2^3 = 8 different representations/patterns, namely 000, 001, 010, 011, 100, 101, 110, and 111, which conventionally represent 0 through 7 respectively. Keep in mind representations do not have intrinsic meanings, so three bits can potentially represent eight different things. With n bits we can represent 2^n different things, because each bit can be either one or zero and 2^n is the total number of combinations we can get; this limits the amount of information we can represent.
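A quick Python check of this counting argument, using itertools.product to enumerate every bit pattern:

from itertools import product

n = 3
patterns = ["".join(bits) for bits in product("01", repeat=n)]
print(patterns)       # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(patterns))  # 8, i.e., 2**n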

Another type of limit is due to the nature of the representations. For example, one third can never be represented precisely in a decimal format with a fractional part, because there would be an infinite number of threes after the decimal point. Similarly, one third cannot be represented precisely in binary format either. In other words, it is impossible to represent one third as the sum of a finite list of powers of two. However, in a base-three numbering system one third can be represented precisely as 0.1₃, because the digit after the point represents a power of three: 3^-1 = 1/3.

Notes and references

  1. Analog at Wiktionary.
  2. Actually, it's more complicated than that because some devices, including some digital radios, intermix digital and analog. For example, a digital radio broadcast might start in digital form, i.e., as a stream of numbers, then be converted into and transmitted as radio waves, then received and converted back into digital form. Technically speaking the signal was modulated and demodulated. If you have a modem (modulator-demodulator) on your computer, it fulfills a similar function.
  3. Actually we need not only data, but a way to represent the algorithms within the computer as well. How computers store algorithm instructions is discussed in another chapter.
  4. Of course how a 0 or 1 is represented varies according to the device. For example, in a computer the common way to differentiate a 0 from a 1 is by electrical properties, such as using different voltage levels. In a fiber optic cable, the presence or absence of a light pulse can differentiate 0's from 1's. Optical storage devices can differentiate 0's and 1's by the presence or absence of small "dents" that affect the reflectivity of locations on the disk surface.
  5. [1]
  6. You might have seen modern art paintings where the entire work is a single color.
  7. See, for example, [2] for examples of the interplay between compression rate and image fidelity.
  8. For example, see [3] and the links there.
  9. This is just a rough estimate since there is much individual variation as well as other factors that affect this range.
  10. Hz, or Hertz, is a measurement of frequency. It appears in a variety of places in computer science, computer engineering, and related fields such as electrical engineering. For example, a computer monitor might have a refresh rate of 60Hz, meaning it is redrawn 60 times per second. It is also used in many other fields. As an example, in most modern day concert music, A above middle C is taken to be 440 Hz.
  11. See, for example, [4] for more information about how CDs work. In general, there is a wealth of web sites about audio files, formats, storage media, etc.
  12. Remember there is also an MPEG video compression standard. MPEG actually has a collection of standards: see Moving Picture Experts Group on Wikipedia.
  13. See, for example, binary prefixes.
  14. The difference between "round" numbers, such as a million, and the corresponding powers of 2 is not as pronounced for smaller numbers of bytes as it is for larger. A kilobyte is 2^10 = 1,024 bytes, which is only 2.4% more than a thousand. A megabyte is 2^20 = 1,048,576 bytes, about 4.9% more than one million. A gigabyte is about 7.4% more bytes than a billion, and a terabyte is about 10.0% more bytes than a trillion. In most of the file size problems we do, we'll be interested in the approximate size, and being off by 2% or 5% or 10% won't matter. But of course there are real-world applications where it does matter, so when doing file size problems keep in mind we are doing approximations, not exact calculations.