Foundations of Computer Science/Algorithms and Programs

Algorithms and Programs

An algorithm can be defined as a set of steps used to solve a specific problem. For example, a cook may use a recipe when preparing a specific type of food. Similarly, in computer science, algorithms are the conceptual solutions used to create programs. It is important to distinguish an algorithm from a program. The implementation of an algorithm is known as a program.

Defining information processes

Computer is about information processes. Once information is represented concretely using different patterns of symbols it can be processed to derive new information. We learned that computers use the binary system internally to represent everything as sequence of bits - zeros and ones. Chapter 1 of the Blown to Bits book talks about the digital explosion of bits as the result of the innovations in computing and technologies, which enable us to turn information into bits and shared them with unprecedented speed.

Creating information processes is the topic of this chapter. We will learn that information processes start with conceptual solutions to problems and then can be implemented (coded) in ways understandable to machines. The conceptual solutions are called algorithms and the executable implementations are called programs.

What is an algorithm?

Algorithm is a rather fancy name for a simple idea: a step-by-step solution to a problem. Avi Wigderson once said algorithm is a common language for nature, human, and computer. The idea has been around for a long time. You are already familiar with many algorithms, such as tying your shoes, making coffee, send an email, and cooking a dish according to a recipe. Algorithms in computing are designed for computers to follow. Imagine we have built a machine that can perform the single digit addition procedure described in chapter one. Recall the procedure performs the addition using simple table lookup. If we give the machine two digits and ask it to perform the operation, it gives two digits back as the answer. Of course the numbers in the inputs and the output have to be represented (encoded) properly. Even though the machine doesn't understand addition, it should be able to perform the addition correctly. However, the machine will not perform the addition unless it is instructed to do so. The command (with input values) that signals the machine to perform an addition is called an instruction. We have imagined that it is not hard to use the addition procedure to create other more complex procedures that can perform more impressive activities. Before we can create such procedures we must identify a problem and find a conceptual solution to it. Often time the conceptual solution is one that can be carried out manually by a person. This conceptual solution is an algorithm.

Why study algorithms?

Algorithm is a center piece in the computer science discipline. As we discussed in chapter one, computing can be done blindly or purely mechanically by a simple device. The intelligence of any computation (information process) lies in the algorithm that defines it.

For an algorithm to be useful, it must be correct - the steps must be logical and specific for machines to carry out - and efficient - it must finish in a reasonable amount of time. The correctness and efficiency of algorithms are two key issues in the study of algorithms.

Programs are implemented algorithms

Studying algorithms allows us to solve problems conceptually regardless of the machines that carry out the solutions. An algorithm must communicate a conceptual solution in a unambiguous and human understandable fashion. A notational system for describing algorithms should allow us to describe and reason with ideas on paper. Once an algorithm's correctness is verified on paper, it can be implemented as a program understandable to a particular machine.

Formal definition of algorithm

Alan Turing is the first person who studied algorithms mathematically by creating a universal machine model, later called Turing machine. He also proved that computation is unavoidable in circumstances - we will have to perform the computation to the result, which separates computing from mathematics (the birth of computer science). The turing machine can represent/encode information in a standard form and interpret and updates the representation according to rules (algorithms), which is part of the representation. This machine model is simple yet powerful. In fact, it is the most powerful computing model known to computer scientist. The turing machine can perform any computation done by any machine we could ever build. Turing machine equivalents is defined based on this idea.

The turing machine model allows us to study algorithms in abstraction. For instance, we can view each algorithm as a state machine: an algorithm always starts with a state - represented by a data representation of the input and the internal information - and move through a number of states to its final state as the result of performing operations prescribed in the algorithm. When the number of possible initial states approach infinity the same algorithm can generate potentially infinite number of computation. This explains why it is hard to verify the correctness of an algorithm through testing as the initial states can be too many to exhaustively enumerate.

Define algorithms

An algorithm is simply a set of steps that allow us to solve a particular problem. A step is a unit of work that is unambiguous and can be done in a fixed amount of time. For example bring a pot of water to boil is a step in the tea-making process/algorithm. In computing we deal with representations of information (data), so a step can be adding two integers and storing the result to a variable. We will explain how to define and use variables later.

The definition of a unit of work depends on what the agent, who performs work, can do. Algorithms in computing are necessarily informed by the capability of the computing machines. Recall that algorithms must be implemented/described in a programming language understandable to a machine before the machine can perform the task. There are many different programming languages, therefore different ways to express the same algorithm. The only language understandable to a specific machine is called the machine language. Machine languages are written in instructions consisting of zeros and ones (binary bits) because computers are fundamentally machines that can manipulate two types of symbols. Each different type of machine is designed to understand its own native language—patterns of zeros and ones—because they can have very different computing hardware. As you can imagine writing programs in machine languages can be very hard. Normally we write programs to express our algorithms in high level languages - languages that are close to our natural language, e.g. English. Then we use tools (compilers and interpreters) to translate our programs in higher level languages to machine languages, which is analogous to using a personal interpreter when we travel abroad and don't understand the native language. To run the same program on a different machine we can simply recompile it or use a different interpreter. High level languages hides the differences between machines to allow us to write programs in a machine independent way, which is a huge time saver. When we write programs in high level languages we use an abstraction that is supported by all computers. For instance if a high level language allows the expression of an addition we assume it can be done by all computers.

Programming languages (high level or machine level) are tools for expressing algorithms to machines. When we create algorithms to solve problems conceptually we want to create them independent of the languages. A well-designed recipe ought to work for different cooks in different kitchens. So the steps or units of work must be defined in terms of a higher abstraction - a set of common data structure, operations and control structures that all languages support. By creating algorithms conceptually using abstractions allows us humans to think on a higher level closer to the problem domain we know. When an algorithm is implemented in a particular language the abstract steps can be mapped to the specific expression in the language. We are confident the chain of tools we have can translate the solution to the machine level executable code. The following diagram shows that the same algorithm can be implemented in a variety of programming languages and the result programs can be executed on different machine models (maybe after some translation).

This diagram shows that the same algorithm can be implemented in a variety of programming language and then result programs can be run on different machine models.

Here are the common operations and control structures we can assume all high level languages support:

data structures: variables of single value and list of values.
operations: arithmetic operations, comparisons, and relational operations (and, or, and not)
control structures: sequential (one after another), conditional (selective on a condition), and repetition

Here is an algorithm defined in pseudo code (natural language). This algorithm finds the largest number in a list of numbers with the following steps:

set max (a variable) to the value of the first number in the list (store and retrieve values).
go through the list one number at a time comparing each number to max, if the number is greater than max replace max with the number (conditional and repetition).
the value stored in max is the answer.

We know the solution is correct because it is so simple. We know how to carry it out manually, but we would certainly not solve the problem using the process expressed in psuedo code. It is necessary to design and express the algorithm in this detailed fashion for computers. Keep in mind that computers are machines that perform computation mechanically and therefore the instructions must be specific. The psuedo code, even though in natural language, must use the aforementioned constructs (data structures, operations, and control structures) that are common to computers. You may wonder why we can not ask the computer to look at the whole list and identify the largest number as we humans do. Computers are simple machines that can not think. They are designed to perform simple operations, e.g. adding two digits via symbol manipulation. Even we, as human beings, will not be able to scan a long list, say a million numbers, and find the largest number at a glance. The algorithm we write must be expressed in terms of what a computer can do and are scalable to inputs (data sets) of arbitrary sizes. The algorithm we just studied can deal with a list of any size. In fact, it makes little difference to a computer whether the list has three numbers or three million numbers.

Another way to express the same algorithm is to use a graphical notation called flow-chart.

This flow-chart shows the steps involved in finding the largest number in a list of numbers.

This chart shows the logic of the solution more clearly. There are two conditionals - the checking of a condition and corresponding actions taken as the result. The top-most conditional defines a repetition (loop) because their an unconditional branching back to the conditional as expressed in the arrow with no label.

Both the pseudo code and the flow chart describe the same solution to the same problem. They are different representations of the same idea. The following figure shows an implementation of the algorithm in Scratch.

This stack of blocks scratch finds the largest number in a list of numbers.

As you can see, a concrete implementation of an algorithm must use the building "blocks" available in the particular implementation language. What is not shown in the figure is the part where the list is populated with data from a file or user input. The structure of the code resembles that of the flow-chart.

In summary constructing and studying algorithms allows us to study (algorithmic) solution to problems in a language and computing environment neutral way. We could use the "finding largest number" algorithm as a single block to form more complex algorithms, but we need to keep in mind this block is not a unit of work as the number of steps involved depends on the input size (the length of the list). We will revisit this topic in the future when we talk about functional decomposition (to keep algorithms simple) and algorithm complexity analysis (count the cost of algorithms).

Programs Each software program boils down to two components - data (structure) and algorithm. We will study some fundamental algorithms and data structures in computer science.

Example Algorithms

Image enconding/representation

Follow this http://csunplugged.org/sites/default/files/activity_pdfs_full/unplugged-02-image_representation.pdf image representation activity to see how images are encoded, transmitted, and reproduced in fax machines.

Error Detection

Follow this http://⁰csunplugged.org/sites/default/files/activity_pdfs_full/unplugged-04-error_detection.pdf error detection activity to see the algorithm works to detect and also correct single bit errors.

A similar algorithm, Luhn algorithm, is used to validate credit card numbers.

Text Compression

Text compression is another important task in computing. The following activity demonstrates how a compression algorithm works: http://csunplugged.org/sites/default/files/activity_pdfs_full/unplugged-03-text_compression.pdf

Searching

Why is searching important? We do it on a daily basis. It is good business too. Google's mission is to organize the world's information and make it universally accessible and useful. Obviously to be able to find the information we need fast is very useful and profitable.

We can always find a piece of information by going through a list of them sequentially checking each one of them. Could you describe the algorithm using either the pseudo code or the flow-chart notation? Structure-wise this algorithm should resemble the "find largest number" algorithm. This algorithm needs two inputs: the list and the target item we are looking for. The repeated steps are fetching the next item and comparing it to target. The piece of information used in the comparison is also known as the key because it determines whether a search is successful or not. For instance, if we have a list of students we can search a student by last name, birthdate, or shoe size, which are search keys.

A sequential search is straight-forward, but it can be costly if we need to perform it very often. However, if the list is ordered by the search key we can use a much better algorithm by taking advantage of this ordered property of the data. Think about the index of a book, a phone book, or a dictionary. They are all ordered somehow. For instance, home phone numbers in a phone books are usually ordered by owner's last names and business phone numbers are ordered by business types. Entries in a dictionary or the index of a book are ordered alphabetically. How does this orderedness help us when we search for information? It allows us to guesstimate where the search target is located. The number guessing game illustrate the idea well. If the list of numbers are random but ordered increasingly we can always guess the target number is in the middle of the search range (initially the whole list). If we are not lucky, we can eliminate half of the candidates - if the number in the middle is greater than the search target the new search range will be the first half of the previous search range, otherwise the second half. Imagine the reduction in the number of comparisons! We will study this algorithm in much detail when we discuss algorithm complexities.

Social impact

Please watch the following video How algorithms shape our world. Could you explain in what ways algorithms are shaping our world?

In the world of computing, we store and process data (representations of information) by quantifying information. This quantification process reduces the world to what can be counted and measured and emphasize abstraction and efficiency (speed). We must not be fooled to believe the abstractions of reality are true reality as in abstractionism. As Frederick Brooks warns us “Models are intentional oversimplifications to help us with real-life problems that are frighteningly complicated. The map is not the terrain.”