Yet Another Haskell Tutorial/Type basics

Haskell

Yet Another Haskell Tutorial
Preamble
Introduction
Getting Started
Language Basics (Solutions)
Type Basics (Solutions)
IO (Solutions)
Modules (Solutions)
Advanced Language (Solutions)
Advanced Types (Solutions)
Monads (Solutions)
Advanced IO
Recursion
Complexity
This box: view • talk • edit

Haskell uses a system of static type checking. This means that every expression in Haskell is assigned a type. For instance 'a' would have type Char, for "character." Then, if you have a function which expects an argument of a certain type and you give it the wrong type, a compile-time error will be generated (that is, you will not be able to compile the program). This vastly reduces the number of bugs that can creep into your program.

Furthermore, Haskell uses a system of type inference. This means that you don't even need to specify the type of expressions. For comparison, in C, when you define a variable, you need to specify its type (for instance, int, char, etc.). In Haskell, you needn't do this -- the type will be inferred from context.

Note

If you want, you certainly are allowed to explicitly specify the type of an expression; this often helps debugging. In fact, it is sometimes considered good style to explicitly specify the types of outermost functions.

Both Hugs and GHCi allow you to apply type inference to an expression to find its type. This is done by using the :t command. For instance, start up your favorite shell and try the following:

Example:

Prelude> :t 'c'
'c' :: Char

This tells us that the expression 'c' has type Char (the double colon :: is used throughout Haskell to specify types).

Simple Types

There are a slew of built-in types, including Int (for integers, both positive and negative), Double (for floating point numbers), Char (for single characters), String (for strings), and others. We have already seen an expression of type Char; let's examine one of type String:

Example:

Prelude> :t "Hello"
"Hello" :: String

You can also enter more complicated expressions, for instance, a test of equality:

Example:

Prelude> :t 'a' == 'b'
'a' == 'b' :: Bool

You should note that even though this expression is false, it still has a type, namely the type Bool.

Note

Bool is short for Boolean (pronounced "boo-lee-uhn") and has two possible values: True and False.

You can observe the process of type checking and type inference by trying to get the shell to give you the type of an ill-typed expression. For instance, the equality operator requires that the type of both of its arguments are of the same type. We can see that Char and String are of different types by trying to compare a character to a string:

Example:

Prelude> :t 'a' == "a"
ERROR - Type error in application
*** Expression     : 'a' == "a"
*** Term           : 'a'
*** Type           : Char
*** Does not match : [Char]

The first line of the error (the line containing "Expression") tells us the expression in which the type error occurred. The second line tells us which part of this expression is ill-typed. The third line tells us the inferred type of this term and the fourth line tells us what it needs to have matched. In this case, it says that type Char doesn't match the type [Char] (a list of characters -- a string in Haskell is represented as a list of characters).

As mentioned before, you can explicitly specify the type of an expression using the :: operator. For instance, instead of "a" in the previous example, we could have written ("a"::String). In this case, this has no effect since there's only one possible interpretation of "a". However, consider the case of numbers. You can try:

Example:

Prelude> :t 5 :: Int
5 :: Int
Prelude> :t 5 :: Double
5 :: Double

Here, we can see that the number 5 can be instantiated as either an Int or a Double. What if we don't specify the type?

Example:

Prelude> :t 5
5 :: Num a => a

Not quite what you expected? What this means, briefly, is that if some type a is an instance of the Num class, then type of the expression 5 can be of type a. If that made no sense, that's okay for now. In Section Classes we talk extensively about type classes (which is what this is). The way to read this, though, is to say "a being an instance of Num implies a."

Exercises

Figure out for yourself, and then verify the types of the following expressions, if they have a type. Also note if the expression is a type error:

'h':'e':'l':'l':'o':[]
[5,'a']
(5,'a')
(5::Int) + 10
(5::Int) + (10::Double)

Polymorphic Types

Haskell employs a polymorphic type system. This essentially means that you can have type variables, which we have alluded to before. For instance, note that a function like tail doesn't care what the elements in the list are:

Example:

Prelude> tail [5,6,7,8,9]
[6,7,8,9]
Prelude> tail "hello"
"ello"
Prelude> tail ["the","man","is","happy"]
["man","is","happy"]

This is possible because tail has a polymorphic type: [ $\alpha$ ] -> [ $\alpha$ ]. That means it can take as an argument any list and return a value which is a list of the same type.

The same analysis can explain the type of fst:

Example:

Prelude> :t fst
fst :: (a,b) -> a

Here, GHCi has made explicit the universal quantification of the type values. That is, it is saying that for all types a and b, fst is a function from (a,b) to a.

Exercises

Figure out for yourself, and then verify the types of the following expressions, if they have a type. Also note if the expression is a type error:

snd
head
null
head . tail
head . head

Type Classes

We saw last section some strange typing having to do with the number five. Before we delve too deeply into the subject of type classes, let's take a step back and see some of the motivation.

Motivation

In many languages (C++, Java, etc.), there exists a system of overloading. That is, a function can be written that takes parameters of differing types. For instance, the canonical example is the equality function. If we want to compare two integers, we should use an integer comparison; if we want to compare two floating point numbers, we should use a floating point comparison; if we want to compare two characters, we should use a character comparison. In general, if we want to compare two things which have type $\alpha$ , we want to use an $\alpha -compare$ . We call $\alpha$ a type variable since it is a variable whose value is a type.

Note

In general, type variables will be written using the first part of the Greek alphabet: $\alpha ,\beta ,\gamma ,\delta ,...$ .

Unfortunately, this presents some problems for static type checking, since the type checker doesn't know which types a certain operation (for instance, equality testing) will be defined for. There are as many solutions to this problem as there are statically typed languages (perhaps a slight exaggeration, but not so much so). The one chosen in Haskell is the system of type classes. Whether this is the "correct" solution or the "best" solution of course depends on your application domain. It is, however, the one we have, so you should learn to love it.

Equality Testing

Returning to the issue of equality testing, what we want to be able to do is define a function == (the equality operator) which takes two parameters, each of the same type (call it $\alpha$ ), and returns a boolean. But this function may not be defined for every type; just for some. Thus, we associate this function == with a type class, which we call Eq. If a specific type $\alpha$ belongs to a certain type class (that is, all functions associated with that class are implemented for $\alpha$ ), we say that $\alpha$ is an instance of that class. For instance, Int is an instance of Eq since equality is defined over integers.

The Num Class

In addition to overloading operators like ==, Haskell has overloaded numeric constants (i.e., 1, 2, 3, etc.). This was done so that when you type in a number like 5, the compiler is free to say 5 is an integer or floating point number as it sees fit. It defines the Num class to contain all of these numbers and certain minimal operations over them (addition, for instance). The basic numeric types (Int, Double) are defined to be instances of Num.

We have only skimmed the surface of the power (and complexity) of type classes here. There will be much more discussion of them in Section Classes, but we need some more background before we can get there. Before we do that, we need to talk a little more about functions.

The Show Class

Another of the standard classes in Haskell is the Show class. Types which are members of the Show class have functions which convert values of that type to a string. This function is called show. For instance show applied to the integer 5 is the string "5"; show applied to the character 'a' is the three-character string "'a'" (the first and last characters are apostrophes). show applied to a string simply puts quotes around it. You can test this in the interpreter:

Example:

Prelude> show 5
"5"
Prelude> show 'a'
"'a'"
Prelude> show "Hello World"
"\"Hello World\""

Note

The reason the backslashes appear in the last line is because the interior quotes are "escaped", meaning that they are part of the string, not part of the interpreter printing the value. The actual string doesn't contain the backslashes.

Some types are not instances of Show; functions for example. If you try to show a function (like sqrt), the compiler or interpreter will give you some cryptic error message, complaining about a missing instance declaration or an illegal class constraint.

Function Types

In Haskell, functions are first class values, meaning that just as 1 or 'c' are values which have a type, so are functions like square or ++. Before we talk too much about functions, we need to make a short diversion into very theoretical computer science (don't worry, it won't be too painful) and talk about the lambda calculus.

Lambda Calculus

The name "Lambda Calculus", while perhaps daunting, describes a fairly simple system for representing functions. The way we would write a squaring function in lambda calculus is: $\lambda x.x*x$ , which means that we take a value, which we will call $x$ (that's what $\lambda x.$ means) and then multiply it by itself. The $\lambda$ is called "lambda abstraction." In general, lambdas can only have one parameter. If we want to write a function that takes two numbers, doubles the first and adds it to the second, we would write: $\lambda x\lambda y.2*x+y$ . When we apply a value to a lambda expression, we remove the outermost $\lambda$ and replace every occurrence of the lambda variable with the value. For instance, if we evaluate $(\lambda x.x*x)5$ , we remove the lambda and replace every occurrence of $x$ with $5$ , yielding $(5*5)$ which is $25$ .

In fact, Haskell is largely based on an extension of the lambda calculus, and these two expressions can be written directly in Haskell (we simply replace the $\lambda$ with a backslash(\) and the $.$ with (->); also we don't need to repeat the lambdas; and, of course, in Haskell we have to give them names if we're defining functions):

square = \x -> x*x
f = \x y -> 2*x + y

You can also evaluate lambda expressions in your interactive shell:

Example:

Prelude> (\x -> x*x) 5
25
Prelude> (\x y -> 2*x + y) 5 4
14

We can see in the second example that we need to give the lambda abstraction two arguments, one corresponding to x and the other corresponding to y.

Higher-Order Types

"Higher-Order Types" is the name given to those types whose elements are functions. The type given to functions mimicks the lambda calculus representation of the functions. For instance, the definition of square gives $\lambda x.x*x$ . To get the type of this, we first ask ourselves what the type of x is. Say we decide x is an Int. Then, we notice that the function square takes an Int and produces a value x*x. We know that when we multiply two Ints together, we get another Int, so the type of the results of square is also an Int. Thus, we say the type of square is Int -> Int.

We can apply a similar analysis to the function f above(\x y -> 2*x + y). The value of this function (remember, functions are values) is something which takes a value x and produces a new value, which takes a value y and produces 2*x+y. For instance, if we take f and apply only one number to it, we get $(\lambda x\lambda y.2x+y)5$ which becomes our new value $\lambda y.2(5)+y$ , where all occurrences of $x$ have been replaced with the applied value, $5$ .

So we know that f takes an Int and produces a value of some type, of which we're not sure. But we know the type of this value is the type of $\lambda y.2(5)+y$ . We apply the above analysis and find out that this expression has type Int -> Int. Thus, f takes an Int and produces something which has type Int -> Int. So the type of f is Int -> (Int -> Int).

Note

The parentheses are not necessary; in function types, if you have $\alpha \rightarrow \beta \rightarrow \gamma$ it is assumed that $\beta \rightarrow \gamma$ is grouped. If you want the other way, with $\alpha \rightarrow \beta$ grouped, you need to put parentheses around them.

This isn't entirely accurate. As we saw before, numbers like 5 aren't really of type Int, they are of type Num a => a.

We can easily find the type of Prelude functions using ":t" as before:

Example:

Prelude> :t head
head :: [a] -> a
Prelude> :t tail
tail :: [a] -> [a]
Prelude> :t null
null :: [a] -> Bool
Prelude> :t fst
fst :: (a,b) -> a
Prelude> :t snd
snd :: (a,b) -> b

We read this as: "head" is a function that takes a list containing values of type "a" and gives back a value of type "a"; "tail" takes a list of "a"s and gives back another list of "a"s; "null" takes a list of "a"s and gives back a boolean; "fst" takes a pair of type "(a,b)" and gives back something of type "a", and so on.

Note

Saying that the type of fst is (a,b) -> a does not necessarily mean that it simply gives back the first element; it only means that it gives back something with the same type as the first element.

We can also get the type of operators like + and * and ++ and :; however, in order to do this we need to put them in parentheses. In general, any function which is used infix (meaning in the middle of two arguments rather than before them) must be put in parentheses when getting its type.

Example:

Prelude> :t (+)
(+) :: Num a => a -> a -> a
Prelude> :t (*)
(*) :: Num a => a -> a -> a
Prelude> :t (++)
(++) :: [a] -> [a] -> [a]
Prelude> :t (:)
(:) :: a -> [a] -> [a]

The types of + and * are the same, and mean that + is a function which, for some type a which is an instance of Num, takes a value of type a and produces another function which takes a value of type a and produces a value of type a. In short hand, we might say that + takes two values of type a and produces a value of type a, but this is less precise.

The type of ++ means, in shorthand, that, for a given type a, ++ takes two lists of as and produces a new list of as. Similarly, : takes a value of type a and another value of type [a] (list of as) and produces another value of type [a].

That Pesky IO Type

You might be tempted to try getting the type of a function like putStrLn:

Example:

Prelude> :t putStrLn
putStrLn :: String -> IO ()
Prelude> :t readFile
readFile :: FilePath -> IO String

What in the world is that IO thing? It's basically Haskell's way of representing that these functions aren't really functions. They're called "IO Actions" (hence the IO). The immediate question which arises is: okay, so how do I get rid of the IO. Well, you can't maintain the pure functionality of your code while using a function with a type such as IO String -> String. Haskell's pure functions always give the same result when applied to the same argument, just as in mathematics. readFile, on the other hand, yields whatever the file content happens to be when it is called. Suppose x = "TABLE OF CONTENTS" at one point in an ongoing Haskell process, and later x = "TABLE OF CONTENTS {new line} "Chapter One". That wouldn't be a problem in languages with mutable variables but in Haskell, once a value has been assigned to x, mutating that value could result in bizarre behavior prior to a system crash. So, in almost every use case you can imagine, the way to use things with an IO type is to isolate them from pure parts of the program by combining them with other functions.

For example, if you're reading a file using readFile, presumably you want to do something with the string it returns (otherwise, why would you read the file in the first place). Suppose you have a function f which takes a String and produces an Int. You can't directly apply f to the result of readFile since the input to f is String and the output of readFile is IO String and these don't match. However, you can combine these as:

main = do
  s <- readFile "somefile"
  let i = f s
  putStrLn (show i)

Here, we use the arrow convention to "get the string out of the IO action" and then apply f to the string (called s). We then, for example, print i to the screen. Note that the let here doesn't have a corresponding in. This is because we are in a do block. Also note that we don't write i <- f s because f is just a normal function, not an IO action. Note: putStrLn (show i) can be simplified to print i if you want.

Explicit Type Declarations

It is sometimes desirable to explicitly specify the types of some elements or functions, for one (or more) of the following reasons:

Clarity
Speed
Debugging

Some people consider it good software engineering to specify the types of all top-level functions. If nothing else, if you're trying to compile a program and you get type errors that you cannot understand, if you declare the types of some of your functions explicitly, it may be easier to figure out where the error is.

Type declarations are written separately from the function definition. For instance, we could explicitly type the function square as in the following code (an explicitly declared type is called a type signature):

square :: Num a => a -> a
square x = x*x

These two lines do not even have to be next to each other. However, the type that you specify must match the inferred type of the function definition (or be more specific). In this definition, you could apply square to anything which is an instance of Num: Int, Double, etc. However, if you knew apriori that square were only going to be applied to value of type Int, you could refine its type as:

square :: Int -> Int
square x = x*x

Now, you could only apply square to values of type Int. Moreover, with this definition, the compiler doesn't have to generate the general code specified in the original function definition since it knows you will only apply square to Ints, so it may be able to generate faster code.

If you have extensions turned on ("-98" in Hugs or "-fglasgow-exts" in GHC(i)), you can also add a type signature to expressions and not just functions. For instance, you could write:

square (x :: Int) = x*x

which tells the compiler that x is an Int; however, it leaves the compiler alone to infer the type of the rest of the expression. What is the type of square in this example? Make your guess then you can check it either by entering this code into a file and loading it into your interpreter or by asking for the type of the expression:

Example:

Prelude> :t (\(x :: Int) -> x*x)

since this lambda abstraction is equivalent to the above function declaration.

Functional Arguments

In the section on Lists we saw examples of functions taking other functions as arguments. For instance, map took a function to apply to each element in a list, filter took a function that told it which elements of a list to keep, and foldl took a function which told it how to combine list elements together. As with every other function in Haskell, these are well-typed.

Let's first think about the map function. Its job is to take a list of elements and produce another list of elements. These two lists don't necessarily have to have the same types of elements. So map will take a value of type [a] and produce a value of type [b]. How does it do this? It uses the user-supplied function to convert. In order to convert an a to a b, this function must have type a -> b. Thus, the type of map is (a -> b) -> [a] -> [b], which you can verify in your interpreter with ":t".

We can apply the same sort of analysis to filter and discern that it has type (a -> Bool) -> [a] -> [a]. As we presented the foldr function, you might be tempted to give it type (a -> a -> a) -> a -> [a] -> a, meaning that you take a function which combines two as into another one, an initial value of type a, a list of as to produce a final value of type a. In fact, foldr has a more general type: (a -> b -> b) -> b -> [a] -> b. So it takes a function which turn an a and a b into a b, an initial value of type b and a list of as. It produces a b.

To see this, we can write a function count which counts how many members of a list satisfy a given constraint. You can of course use filter and length to do this, but we will also do it using foldr:

module Count
    where

import Char

count1 bTest list = length (filter bTest list)
count2 bTest list = foldr (\x cnt -> if bTest x then cnt+1 else cnt) 0 list

The functioning of count1 is simple. It filters the list list according to the predicate bTest, then takes the length of the resulting list. On the other hand, count2 uses the initial value (which is an integer) to hold the current count. For each element in the list list, it applies the lambda expression shown. This takes two arguments, cnt which holds the current count and x which is the current element in the list that we're looking at. It checks to see if bTest holds about x. If it does, it returns the new value cnt+1, increasing the count of elements for which the predicate holds. If it doesn't, it just returns cnt, the old count. With lambda defined as above, count2 Char.isLower "aBCde" expands to lambda 'a' (lambda 'B' (lambda 'C' (lambda 'd' (lambda 'e' 0)))).

Exercises

Figure out for yourself, and then verify the types of the following expressions, if they have a type. Also note if the expression is a type error:

\x -> [x]
\x y z -> (x,y:z:[])
\x -> x + 5
\x -> "hello, world"
\x -> x 'a'
\x -> x x
\x -> x + x

Data Types

Tuples and lists are nice, common ways to define structured values. However, it is often desirable to be able to define our own data structures and functions over them. So-called "datatypes" are defined using the data keyword.

Pairs

For instance, a definition of a pair of elements (much like the standard, built-in pair type) could be:

data Pair a b = Pair a b

Let's walk through this code one word at a time. First we say "data" meaning that we're defining a datatype. We then give the name of the datatype, in this case, "Pair." The "a" and "b" that follow "Pair" are unique type parameters, just like the "a" is the type of the function map. So up until this point, we've said that we're going to define a data structure called "Pair" which is parameterized over two types, a and b. Note that you can't have Pair a a = Pair a a — in this case write Pair a = Pair a a.

After the equals sign, we specify the constructors of this data type. In this case, there is a single constructor, "Pair" (this doesn't necessarily have to have the same name as the type, but in the case of a single constructor it seems to make more sense). After this pair, we again write "a b", which means that in order to construct a Pair we need two values, one of type a and one of type b.

This definition introduces a function, Pair :: a -> b -> Pair a b that you can use to construct Pairs. If you enter this code into a file and load it, you can see how these are constructed:

Example:

Datatypes> :t Pair
Pair :: a -> b -> Pair a b
Datatypes> :t Pair 'a'
Pair 'a' :: a -> Pair Char a
Datatypes> :t Pair 'a' "Hello"
Pair 'a' "Hello" :: Pair Char [Char]

So, by giving Pair two values, we have completely constructed a value of type Pair. We can write functions involving pairs as:

pairFst (Pair x y) = x
pairSnd (Pair x y) = y

In this, we've used the pattern matching capabilities of Haskell to look at a pair and extract values from it. In the definition of pairFst we take an entire Pair and extract the first element; similarly for pairSnd. We'll discuss pattern matching in much more detail in the section on Pattern matching.

Exercises

Write a data type declaration for Triple, a type which contains three elements, all of different types. Write functions tripleFst, tripleSnd and tripleThr to extract respectively the first, second and third.
Write a datatype Quadruple which holds four elements. However, the first two elements must be the same type and the last two elements must be the same type. Write a function firstTwo which returns a list containing the first two elements and a function lastTwo which returns a list containing the last two elements. Write type signatures for these functions.

Multiple Constructors

We have seen an example of the data type with one constructor: Pair. It is also possible (and extremely useful) to have multiple constructors.

Let us consider a simple function which searches through a list for an element satisfying a given predicate and then returns the first element satisfying that predicate. What should we do if none of the elements in the list satisfy the predicate? A few options are listed below:

Raise an error
Loop indefinitely
Write a check function
Return the first element
$...$

Raising an error is certainly an option (see the section on Exceptions to see how to do this). The problem is that it is difficult/impossible to recover from such errors. Looping indefinitely is possible, but not terribly useful. We could write a sister function which checks to see if the list contains an element satisfying a predicate and leave it up to the user to always use this function first. We could return the first element, but this is very ad-hoc and difficult to remember; and what if the list itself is empty?

The fact that there is no basic option to solve this problem simply means we have to think about it a little more. What are we trying to do? We're trying to write a function which might succeed and might not. Furthermore, if it does succeed, it returns some sort of value. Let's write a datatype:

data Maybe a = Nothing
             | Just a

This is one of the most common datatypes in Haskell and is defined in the Prelude.

Here, we're saying that there are two possible ways to create something of type Maybe a. The first is to use the nullary constructor Nothing, which takes no arguments (this is what "nullary" means). The second is to use the constructor Just, together with a value of type a.

The Maybe type is useful in all sorts of circumstances. For instance, suppose we want to write a function (like head) which returns the first element of a given list. However, we don't want the program to die if the given list is empty. We can accomplish this with a function like:

firstElement :: [a] -> Maybe a
firstElement []     = Nothing
firstElement (x:xs) = Just x

The type signature here says that firstElement takes a list of as and produces something with type Maybe a. In the first line of code, we match against the empty list []. If this match succeeds (i.e., the list is, in fact, empty), we return Nothing. If the first match fails, then we try to match against x:xs which must succeed. In this case, we return Just x.

For our findElement function, we represent failure by the value Nothing and success with value a by Just a. Our function might look something like this:

findElement :: (a -> Bool) -> [a] -> Maybe a
findElement p [] = Nothing
findElement p (x:xs) =
    if p x then Just x
    else findElement p xs

The first line here gives the type of the function. In this case, our first argument is the predicate (and takes an element of type a and returns True if and only if the element satisfies the predicate); the second argument is a list of as. Our return value is maybe an a. That is, if the function succeeds, we will return Just a and if not, Nothing.

Another useful datatype is the Either type, defined as:

data Either a b = Left a
                | Right b

This is a way of expressing alternation. That is, something of type Either a b is either a value of type a (using the Left constructor) or a value of type b (using the Right constructor).

Exercises

Write a datatype Tuple which can hold one, two, three or four elements, depending on the constructor (that is, there should be four constructors, one for each number of arguments). Also provide functions tuple1 through tuple4 which take a tuple and return Just the value in that position, or Nothing if the number is invalid (i.e., you ask for the tuple4 on a tuple holding only two elements).
Based on our definition of Tuple from the previous exercise, write a function which takes a Tuple and returns either the value (if it's a one-tuple), a Haskell-pair (i.e., ('a',5)) if it's a two-tuple, a Haskell-triple if it's a three-tuple or a Haskell-quadruple if it's a four-tuple. Remember: it is not possible to write a function which return different types. You will need to use the Either type to represent this.

Recursive Datatypes

We can also define recursive datatypes. These are datatypes whose definitions are based on themselves. For instance, we could define a list datatype as:

data List a = Nil
            | Cons a (List a)

In this definition, we have defined what it means to be of type List a. We say that a list is either empty (Nil) or it's the Cons of a value of type a and another value of type List a. This is almost identical to the actual definition of the list datatype in Haskell, except that uses special syntax where [] corresponds to Nil and : corresponds to Cons. We can write our own length function for our lists as:

listLength Nil = 0
listLength (Cons x xs) = 1 + listLength xs

This function is slightly more complicated and uses recursion to calculate the length of a List. The first line says that the length of an empty list (a Nil) is $0$ . This much is obvious. The second line tells us how to calculate the length of a non-empty list. A non-empty list must be of the form Cons x xs for some values of x and xs. We know that xs is another list and we know that whatever the length of the current list is, it's the length of its tail (the value of xs) plus one (to account for x). Thus, we apply the listLength function to xs and add one to the result. This gives us the length of the entire list.

Exercises
Write functions `listHead`, `listTail`, `listFoldl` and `listFoldr` which are equivalent to their Prelude twins, but function on our `List` datatype. Don't worry about exceptional conditions on the first two.

Binary Trees

We can define datatypes that are more complicated than lists. Suppose we want to define a structure that looks like a binary tree. A binary tree is a structure that has a single root node; each node in the tree is either a "leaf" or a "branch." If it's a leaf, it holds a value; if it's a branch, it holds a value and a left child and a right child. Each of these children is another node. We can define such a data type as:

data BinaryTree a
    = Leaf a
    | Branch (BinaryTree a) a (BinaryTree a)

In this datatype declaration we say that a BinaryTree of as is either a Leaf which holds an a, or it's a branch with a left child (which is a BinaryTree of as), a node value (which is an a), and a right child (which is also a BinaryTree of as). It is simple to modify the listLength function so that instead of calculating the length of lists, it calculates the number of nodes in a BinaryTree. Can you figure out how? We can call this function treeSize. The solution is given below:

treeSize (Leaf x) = 1
treeSize (Branch left x right) =
  1 + treeSize left + treeSize right

Here, we say that the size of a leaf is $1$ and the size of a branch is the size of its left child, plus the size of its right child, plus one.

Exercises

Write a function elements which returns the elements in a BinaryTree in a bottom-up, left-to-right manner (i.e., the first element returned is the left-most leaf, followed by its parent's value, followed by the other child's value, and so on). The result type should be a normal Haskell list.
Write a foldr function treeFoldr for BinaryTrees and rewrite elements in terms of it (call the new one elements2). The type of treeFoldr should be (a -> b -> b) -> b -> BinaryTree a -> b.
Write a foldl function treeFoldl for BinaryTrees and rewrite elements in terms of it (call the new one elements3).

Enumerated Sets

You can also use datatypes to define things like enumerated sets, for instance, a type which can only have a constrained number of values. We could define a color type:

data Color
    = Red
    | Orange
    | Yellow
    | Green
    | Blue
    | Purple
    | White
    | Black

This would be sufficient to deal with simple colors. Suppose we were using this to write a drawing program, we could then write a function to convert between a Color and a RGB triple. We can write a colorToRGB function, as:

colorToRGB Red    = (255,0,0)
colorToRGB Orange = (255,128,0)
colorToRGB Yellow = (255,255,0)
colorToRGB Green  = (0,255,0)
colorToRGB Blue   = (0,0,255)
colorToRGB Purple = (255,0,255)
colorToRGB White  = (255,255,255)
colorToRGB Black  = (0,0,0)

If we wanted also to allow the user to define his own custom colors, we could change the Color datatype to something like:

data Color
    = Red
    | Orange
    | Yellow
    | Green
    | Blue
    | Purple
    | White
    | Black
    | Custom Int Int Int -- R G B components

And add a final definition for colorToRGB:

colorToRGB (Custom r g b) = (r,g,b)

The Unit type

A final useful datatype defined in Haskell (from the Prelude) is the unit type. Its definition is:

data () = ()

The only true value of this type is (). This is essentially the same as a void type in a language like C or Java and will be useful when we talk about IO in the chapter Io.

We'll dwell much more on data types in the sections on Pattern matching and Datatypes.

Continuation Passing Style

There is a style of functional programming called "Continuation Passing Style" (also simply "CPS"). The idea behind CPS is to pass around as a function argument what to do next. I will handwave through an example which is too complex to write out at this point and then give a real example, though one with less motivation.

Consider the problem of parsing. The idea here is that we have a sequence of tokens (words, letters, whatever) and we want to ascribe structure to them. The task of converting a string of Java tokens to a Java abstract syntax tree is an example of a parsing problem. So is the task of parsing English sentences (though the latter is extremely difficult, even for native English users parsing sentences from the real world).

Suppose we're parsing something like C or Java where functions take arguments in parentheses. But for simplicity, assume they are not separated by commas. That is, a function call looks like myFunction(x y z). We want to convert this into something like a pair containing first the string "myFunction" and then a list with three string elements: "x", "y" and "z".

The general approach to solving this would be to write a function which parses function calls like this one. First it would look for an identifier ("myFunction"), then for an open parenthesis, then for zero or more identifiers, then for a close parenthesis.

One way to do this would be to have two functions:

parseFunction   ::
    [Token] -> Maybe ((String, [String]), [Token])

parseIdentifier ::
    [Token] -> Maybe (String, [Token])

The idea would be that if we call parseFunction, if it doesn't return Nothing, then it returns the pair described earlier, together with whatever is left after parsing the function. Similarly, parseIdentifier will parse one of the arguments. If it returns Nothing, then it's not an argument; if it returns Just something, then that something is the argument paired with the rest of the tokens.

What the parseFunction function would do is to parse an identifier. If this fails, it fails itself. Otherwise, it continues and tries to parse an open parenthesis. If that succeeds, it repeatedly calls parseIdentifier until that fails. It then tries to parse a close parenthesis. If that succeeds, then it's done. Otherwise, it fails.

There is, however, another way to think about this problem. The advantage to this solution is that functions no longer need to return the remaining tokens (which tends to get ugly). Instead of the above, we write functions:

parseFunction   ::
    [Token] -> ((String, [String]) -> [Token] -> a) ->
    ([Token] -> a) -> a

parseIdentifier ::
    [Token] -> (String -> [Token] -> a) ->
    ([Token] -> a) -> a

Let's consider parseIdentifier. This takes three arguments: a list of tokens and two continuations. The first continuation is what to do when you succeed. The second continuation is what to do if you fail. What parseIdentifier does, then, is try to read an identifier. If this succeeds, it calls the first continuation with that identifier and the remaining tokens as arguments. If reading the identifier fails, it calls the second continuation with all the tokens.

Now consider parseFunction. Recall that it wants to read an identifier, an open parenthesis, zero or more identifiers and a close parenthesis. Thus, the first thing it does is call parseIdentifier. The first argument it gives is the list of tokens. The first continuation (which is what parseIdentifier should do if it succeeds) is in turn a function which will look for an open parenthesis, zero or more arguments and a close parenthesis. The second continuation (the failure argument) is just going to be the failure function given to parseFunction.

Now, we simply need to define this function which looks for an open parenthesis, zero or more arguments and a close parenthesis. This is easy. We write a function which looks for the open parenthesis and then calls parseIdentifier with a success continuation that looks for more identifiers, and a "failure" continuation which looks for the close parenthesis (note that this failure doesn't really mean failure -- it just means there are no more arguments left).

I realize this discussion has been quite abstract. I would willingly give code for all this parsing, but it is perhaps too complex at the moment. Instead, consider the problem of folding across a list. We can write a CPS fold as:

cfold' f z [] = z
cfold' f z (x:xs) = f x z (\y -> cfold' f y xs)

In this code, cfold' takes a function f which takes three arguments, slightly different from the standard folds. The first is the current list element, x, the second is the accumulated element, z, and the third is the continuation: basically, what to do next.

We can write a wrapper function for cfold' that will make it behave more like a normal fold:

cfold f z l = cfold' (\x t g -> f x (g t)) z l

We can test that this function behaves as we desire:

Example:

CPS> cfold (+) 0 [1,2,3,4]
10
CPS> cfold (:) [] [1,2,3]
[1,2,3]

One thing that's nice about formulating cfold in terms of the helper function cfold' is that we can use the helper function directly. This enables us to change, for instance, the evaluation order of the fold very easily:

Example:

CPS> cfold' (\x t g -> (x : g t)) [] [1..10]
[1,2,3,4,5,6,7,8,9,10]
CPS> cfold' (\x t g -> g (x : t)) [] [1..10]
[10,9,8,7,6,5,4,3,2,1]

The only difference between these calls to cfold' is whether we call the continuation before or after constructing the list. As it turns out, this slight difference changes the behavior for being like foldr to being like foldl. We can evaluate both of these calls as follows (let f be the folding function):

     cfold' (\x t g -> (x : g t)) [] [1,2,3]
==>  cfold' f [] [1,2,3]
==>  f 1 [] (\y -> cfold' f y [2,3])
==>  1 : ((\y -> cfold' f y [2,3]) [])
==>  1 : (cfold' f [] [2,3])
==>  1 : (f 2 [] (\y -> cfold' f y [3]))
==>  1 : (2 : ((\y -> cfold' f y [3]) []))
==>  1 : (2 : (cfold' f [] [3]))
==>  1 : (2 : (f 3 [] (\y -> cfold' f y [])))
==>  1 : (2 : (3 : (cfold' f [] [])))
==>  1 : (2 : (3 : []))
==>  [1,2,3]

     cfold' (\x t g -> g (x:t)) [] [1,2,3]
==>  cfold' f [] [1,2,3]
==>  (\x t g -> g (x:t)) 1 [] (\y -> cfold' f y [2,3])
==>  (\g -> g [1]) (\y -> cfold' f y [2,3])
==>  (\y -> cfold' f y [2,3]) [1]
==>  cfold' f [1] [2,3]
==>  (\x t g -> g (x:t)) 2 [1] (\y -> cfold' f y [3])
==>  cfold' f (2:[1]) [3]
==>  cfold' f [2,1] [3]
==>  (\x t g -> g (x:t)) 3 [2,1] (\y -> cfold' f y [])
==>  cfold' f (3:[2,1]) []
==>  [3,2,1]

In general, continuation passing style is a very powerful abstraction, though it can be difficult to master. We will revisit the topic more thoroughly later in the book.

Exercises
Test whether the CPS-style fold mimics either of `foldr` and `foldl`. If not, where is the difference? Write `map` and `filter` using continuation passing style.