Haskell/Preliminaries

Note

This page is not part of the main Haskell Wikibook. It overlaps with the primary introductory material but isn't as polished. It has some assumptions about imperative programming background. Because learning works best when experiencing things from many angles, reading this as a supplement to reinforce basic concepts may be more effective than simply re-reading other chapters. Still, this page is basically extraneous.

(All the examples in this chapter can be typed into a Haskell source file and evaluated by loading that file into GHC or Hugs.)

Variables that Don't and Functions that Do

One of the main ideas behind functional programming is referential transparency — the idea that a function and a parameter can always be replaced by the function's return value. For example, the expression 2+2 is interchangeable with the value 4. This facilitates a number of important concepts in Haskell including lazy evaluation and the ability to reorder expressions to execute them concurrently.

Unlike imperative languages such as C++, Haskell does not treat variables as memory locations. In Haskell, variables are bound to a particular value within whatever scope the variable exists. They are more like constants than variables in most cases. The concept of variable in math is actually the same as that in Haskell. In the following example, x is bound to 3; it can't possibly change.

x = 6 / 2

This may seem extremely weird if you are used to imperative programming. How can we write useful programs without actual variables? Well, Haskell's function parameters can serve the same purpose: once you pass a value to a function, that function can use it wherever it wants. To change the value of a parameter, just call the function again. So the answer is recursion.

Here are some Haskell declarations:

x = 3
y = x * 2

As you've probably guessed, by the end y = 6. Looks quite a bit like math doesn't it?

However, unlike a conventional language, the order doesn't matter. Haskell doesn't have a notion of "flow of control", so it doesn't start executing at the top and working its way down. So this means exactly the same as the above:

y = x * 2
x = 3

You can of course also define functions. This is where life starts to get a little more interesting.

double x = x * 2
z = double y

This says that the function "double" takes an argument "x" and the result is equal to twice x. Note the lack of parentheses: in Haskell "double y" means apply the function "double" to the value "y". Parentheses are only needed to determine evaluation priority, just like in math.

So far, this lets you do about as much as a spreadsheet would: variables in this world look a lot like cells in a spreadsheet, and you can set up functions that can "do" things with these variables.

Now look at this function:

factorial 0 = 1
factorial n = n * factorial (n-1)

This defines a new function called "factorial". The first line says that the factorial of 0 is 1, and the second line says that the factorial of any other number "n" is equal to "n" times the factorial of "n-1". Note the parentheses around the "n-1": without them this would have been parsed as "(factorial n) - 1"; function application (as this process is known) has higher precedence than any regular operator like addition or exponentiation.

This shows how you do "loops" in Haskell. You don't have a loop counter like "n = n - 1" because that would change the value of "n". So instead you use recursion.

Unlike our earlier example, the order of the two recursive declarations is important. Haskell matches function calls starting at the top and picking the first one that matches. So, if the two declarations were reversed then the compiler would conclude that factorial 0 equals 0 * factorial -1, and so on to infinity. Not what we want.

Tail Recursion

In case you're wondering about stack space, won't all this recursion eventually overflow the stack? The answer is that Haskell supports tail recursion (every decent functional language's compiler does). Tail recursion is the idea that if the last thing a function does is call itself, the interpreter may as well not bother putting things on the stack, and just "jump" to the beginning using the function's new arguments. In this case, though, the factorial function has to call itself with "(n-1)" and then multiply the result by "n".

So our factorial function isn't tail recursive (because it has to multiply result by n after calling itself). This one is, though:

factorial n = factorial' n 1

factorial' 0 a = a
factorial' n a = factorial' (n-1) (n*a)

Here are two functions. They have almost the same name, except the second has a tick mark. (Haskell identifiers are allowed to contain tick marks.) By convention, ticks are used to indicate a variation on something.

Here, the factorial' function takes an argument n as before, but it also takes a second "accumulator" argument. Because calling itself is the last thing it does, the tail recursion rule applies: nothing gets put on the stack; the factorial' function simply executes as if its arguments had always been n-1 and n*a.

(Note the similarity between this parameter and a variable. Parameters are, in a way, the variables of Haskell.)

A programmer using this function doesn't want to worry about the accumulator argument, so the plain factorial function is used to "wrap up" factorial' by supplying the initial value of the accumulator.

Exercises

Type the factorial function into a Haskell source file and load it into your favourite Haskell environment.

What is factorial 5?

What about factorial 1000? Is that what you expected?

What about factorial -1?

Types

You may have noticed something interesting above. None of your parameters or anything was declared to be of any particular type.

That's because Haskell uses a type system to infer the type of every expression, the Hindley-Milner type system. But Haskell is not weakly typed; it's strongly typed like C++, so you can't get runtime type errors. Any program that compiles is guaranteed to not have type errors in it. Another advantage is that when you make a mistake, it will almost always change the type of the resulting expression, and it'll be flagged by the compiler as such. (Your program simply won't compile.) You may still want to write types for your functions for a few reasons.

It helps to document your program.
When you get something wrong, the type declarations help to narrow down the source of the problem.
Most importantly, it will help you learn. Understanding the type system is fundamental to getting a good grasp on Haskell.

The syntax for type declarations looks like this:

x :: Integer
factorial :: Integer -> Integer

You should read the first declaration as saying "x is of type Integer" or "x has type Integer". The '::' tells us we're referring to a type. You'll note factorial looks a bit different. It's a function that takes an Integer and returns an Integer. Later on you'll see why we use -> in describing the types, and why describing the type of a function is so close syntactically to describing the type of a variable.

One final note: identifiers that start with uppercase letters are reserved for types, modules and data constructors (somewhat like namespaces, don't worry about these for now). If you try to declare a variable or function that begins with an uppercase letter, the compiler will complain.