Yet Another Haskell Tutorial/Type basics
Haskell uses a system of static type checking. This means that every expression in Haskell is assigned a type. For instance 'a'
would have type Char
, for "character." Then, if you have a function which expects an argument of a certain type and you give it the wrong type, a compile-time error will be generated (that is, you will not be able to compile the program). This vastly reduces the number of bugs that can creep into your program.
Furthermore, Haskell uses a system of type inference. This means that you don't even need to specify the type of expressions. For comparison, in C, when you define a variable, you need to specify its type (for instance, int, char, etc.). In Haskell, you needn't do this -- the type will be inferred from context.
Note
If you want, you certainly are allowed to explicitly specify the type of an expression; this often helps debugging. In fact, it is sometimes considered good style to explicitly specify the types of outermost functions.
Both Hugs and GHCi allow you to apply type inference to an expression to find its type. This is done by using the :t
command. For instance, start up your favorite shell and try the following:
Example:
Prelude> :t 'c' 'c' :: Char
This tells us that the expression 'c'
has type Char
(the double colon ::
is used throughout Haskell to specify types).
Simple Types
editThere are a slew of built-in types, including Int
(for integers, both positive and negative), Double
(for floating point numbers), Char
(for single characters), String
(for strings), and others. We have already seen an expression of type Char
; let's examine one of type String
:
Example:
Prelude> :t "Hello" "Hello" :: String
You can also enter more complicated expressions, for instance, a test of equality:
Example:
Prelude> :t 'a' == 'b' 'a' == 'b' :: Bool
You should note that even though this expression is false, it still has a type, namely the type Bool
.
Note
Bool
is short for Boolean (pronounced "boo-lee-uhn") and has two possible values: True
and False
.
You can observe the process of type checking and type inference by trying to get the shell to give you the type of an ill-typed expression. For instance, the equality operator requires that the type of both of its arguments are of the same type. We can see that Char
and String
are of different types by trying to compare a character to a string:
Example:
Prelude> :t 'a' == "a" ERROR - Type error in application *** Expression : 'a' == "a" *** Term : 'a' *** Type : Char *** Does not match : [Char]
The first line of the error (the line containing "Expression
") tells us the expression in which the type error occurred. The second line tells us which part of this expression is ill-typed. The third line tells us the inferred type of this term and the fourth line tells us what it needs to have matched. In this case, it says that type Char
doesn't match the type [Char]
(a list of characters -- a string in Haskell is represented as a list of characters).
As mentioned before, you can explicitly specify the type of an expression using the :: operator. For instance, instead of "a"
in the previous example, we could have written ("a"::String)
. In this case, this has no effect since there's only one possible interpretation of "a"
. However, consider the case of numbers. You can try:
Example:
Prelude> :t 5 :: Int 5 :: Int Prelude> :t 5 :: Double 5 :: Double
Here, we can see that the number 5 can be instantiated as either an Int
or a Double
. What if we don't specify the type?
Example:
Prelude> :t 5 5 :: Num a => a
Not quite what you expected? What this means, briefly, is that if some type a
is an instance of the Num
class, then type of the expression 5
can be of type a
. If that made no sense, that's okay for now. In Section Classes we talk extensively about type classes (which is what this is). The way to read this, though, is to say "a being an instance of Num implies a."
Exercises |
---|
Figure out for yourself, and then verify the types of the following expressions, if they have a type. Also note if the expression is a type error:
|
Polymorphic Types
editHaskell employs a polymorphic type system. This essentially means that you can have type variables, which we have alluded to before. For instance, note that a function like tail
doesn't care what the elements in the list are:
Example:
Prelude> tail [5,6,7,8,9] [6,7,8,9] Prelude> tail "hello" "ello" Prelude> tail ["the","man","is","happy"] ["man","is","happy"]
This is possible because tail
has a polymorphic type: [
]
-> [
]
. That means it can take as an argument any list and return a value which is a list of the same type.
The same analysis can explain the type of fst
:
Example:
Prelude> :t fst fst :: (a,b) -> a
Here, GHCi has made explicit the universal quantification of the type values. That is, it is saying that for all types a
and b
, fst
is a function from (a,b)
to a
.
Exercises |
---|
Figure out for yourself, and then verify the types of the following expressions, if they have a type. Also note if the expression is a type error:
|
Type Classes
editWe saw last section some strange typing having to do with the number five. Before we delve too deeply into the subject of type classes, let's take a step back and see some of the motivation.
Motivation
editIn many languages (C++, Java, etc.), there exists a system of overloading. That is, a function can be written that takes parameters of differing types. For instance, the canonical example is the equality function. If we want to compare two integers, we should use an integer comparison; if we want to compare two floating point numbers, we should use a floating point comparison; if we want to compare two characters, we should use a character comparison. In general, if we want to compare two things which have type , we want to use an . We call a type variable since it is a variable whose value is a type.
Note
In general, type variables will be written using the first part of the Greek alphabet: .
Unfortunately, this presents some problems for static type checking, since the type checker doesn't know which types a certain operation (for instance, equality testing) will be defined for. There are as many solutions to this problem as there are statically typed languages (perhaps a slight exaggeration, but not so much so). The one chosen in Haskell is the system of type classes. Whether this is the "correct" solution or the "best" solution of course depends on your application domain. It is, however, the one we have, so you should learn to love it.
Equality Testing
editReturning to the issue of equality testing, what we want to be able to do is define a function ==
(the equality operator) which takes two parameters, each of the same type (call it ), and returns a boolean. But this function may not be defined for every type; just for some. Thus, we associate this function ==
with a type class, which we call Eq. If a specific type belongs to a certain type class (that is, all functions associated with that class are implemented for ), we say that is an instance of that class. For instance, Int is an instance of Eq since equality is defined over integers.
The Num Class
editIn addition to overloading operators like ==
, Haskell has
overloaded numeric constants (i.e., 1, 2, 3, etc.). This was done so that when you type in a number like 5, the compiler is free to say 5 is an integer or floating point number as it sees fit. It defines the Num class to contain all of these numbers and certain minimal operations over them (addition, for instance). The basic numeric types (Int, Double) are defined to be instances of Num.
We have only skimmed the surface of the power (and complexity) of type classes here. There will be much more discussion of them in Section Classes, but we need some more background before we can get there. Before we do that, we need to talk a little more about functions.
The Show Class
editAnother of the standard classes in Haskell is the Show
class. Types which are members of the Show
class have functions which convert values of that type to a string. This function is called show
. For instance show
applied to the integer 5 is the string "5"; show
applied to the character 'a' is the three-character string "'a'" (the first and last characters are apostrophes). show
applied to a string simply puts quotes around it. You can test this in the interpreter:
Example:
Prelude> show 5 "5" Prelude> show 'a' "'a'" Prelude> show "Hello World" "\"Hello World\""
Note
The reason the backslashes appear in the last line is because the interior quotes are "escaped", meaning that they are part of the string, not part of the interpreter printing the value. The actual string doesn't contain the backslashes.
Some types are not instances of Show; functions for example. If you try to show a function (like sqrt
), the compiler or interpreter will give you some cryptic error message, complaining about a missing instance declaration or an illegal class constraint.
Function Types
editIn Haskell, functions are first class values, meaning that just as 1
or 'c'
are values which have a type, so are functions like square
or ++
. Before we talk too much about functions, we need to make a short diversion into very theoretical computer science (don't worry, it won't be too painful) and talk about the lambda calculus.
Lambda Calculus
editThe name "Lambda Calculus", while perhaps daunting, describes a fairly simple system for representing functions. The way we would write a squaring function in lambda calculus is: , which means that we take a value, which we will call (that's what means) and then multiply it by itself. The is called "lambda abstraction." In general, lambdas can only have one parameter. If we want to write a function that takes two numbers, doubles the first and adds it to the second, we would write: . When we apply a value to a lambda expression, we remove the outermost and replace every occurrence of the lambda variable with the value. For instance, if we evaluate , we remove the lambda and replace every occurrence of with , yielding which is .
In fact, Haskell is largely based on an extension of the lambda calculus, and these two expressions can be written directly in Haskell (we simply replace the with a backslash(\) and the with (->); also we don't need to repeat the lambdas; and, of course, in Haskell we have to give them names if we're defining functions):
square = \x -> x*x f = \x y -> 2*x + y
You can also evaluate lambda expressions in your interactive shell:
Example:
Prelude> (\x -> x*x) 5 25 Prelude> (\x y -> 2*x + y) 5 4 14
We can see in the second example that we need to give the lambda abstraction two arguments, one corresponding to x
and the other corresponding to y
.
Higher-Order Types
edit"Higher-Order Types" is the name given to those types whose elements are functions. The type given to functions mimicks the lambda calculus representation of the functions. For instance, the definition of square gives . To get the type of this, we first ask ourselves what the type of x
is. Say we decide x
is an Int
. Then, we notice that the function square
takes an Int
and produces a value x*x
. We know that when we multiply two Int
s together, we get another Int
, so the type of the results of square
is also an Int
. Thus, we say the type of square
is Int -> Int
.
We can apply a similar analysis to the function f
above(\x y -> 2*x + y). The value of this function (remember, functions are values) is something which takes a value x
and produces a new value, which takes a value y
and produces 2*x+y
. For instance, if we take f
and apply only one number to it, we get which becomes our new value , where all occurrences of have been replaced with the applied value, .
So we know that f
takes an Int
and produces a value of some type, of which we're not sure. But we know the type of this value is the type of . We apply the above analysis and find out that this expression has type Int -> Int
. Thus, f
takes an Int
and produces something which has type Int -> Int
. So the type of f
is Int -> (Int -> Int)
.
Note
The parentheses are not necessary; in function types, if you have it is assumed that is grouped. If you want the other way, with grouped, you need to put parentheses around them.
This isn't entirely accurate. As we saw before, numbers like 5
aren't really of type Int
, they are of type Num a => a
.
We can easily find the type of Prelude functions using ":t" as before:
Example:
Prelude> :t head head :: [a] -> a Prelude> :t tail tail :: [a] -> [a] Prelude> :t null null :: [a] -> Bool Prelude> :t fst fst :: (a,b) -> a Prelude> :t snd snd :: (a,b) -> b
We read this as: "head" is a function that takes a list containing values of type "a" and gives back a value of type "a"; "tail" takes a list of "a"s and gives back another list of "a"s; "null" takes a list of "a"s and gives back a boolean; "fst" takes a pair of type "(a,b)" and gives back something of type "a", and so on.
Note
Saying that the type of fst
is (a,b) -> a
does not necessarily mean that it simply gives back the first element; it only means that it gives back something with the same type as the first element.
We can also get the type of operators like +
and *
and ++
and :
; however, in order to do this we need to put them in parentheses. In general, any function which is used infix (meaning in the middle of two arguments rather than before them) must be put in parentheses when getting its type.
Example:
Prelude> :t (+) (+) :: Num a => a -> a -> a Prelude> :t (*) (*) :: Num a => a -> a -> a Prelude> :t (++) (++) :: [a] -> [a] -> [a] Prelude> :t (:) (:) :: a -> [a] -> [a]
The types of +
and *
are the same, and mean that +
is a function which, for some type a
which is an instance of Num
, takes a value of type a
and produces another function which takes a value of type a
and produces a value of type a
. In short hand, we might say that +
takes two values of type a
and produces a value of type a
, but this is less precise.
The type of ++
means, in shorthand, that, for a given type a
, ++
takes two lists of a
s and produces a new list of a
s. Similarly, :
takes a value of type a
and another value of type [a]
(list of a
s) and produces another value of type [a]
.
That Pesky IO Type
editYou might be tempted to try getting the type of a function like putStrLn
:
Example:
Prelude> :t putStrLn putStrLn :: String -> IO () Prelude> :t readFile readFile :: FilePath -> IO String
What in the world is that IO
thing? It's basically Haskell's way of representing that these functions aren't really functions. They're called "IO Actions" (hence the IO
). The immediate question which arises is: okay, so how do I get rid of the IO
. Well, you can't maintain the pure functionality of your code while using a function with a type such as IO String -> String. Haskell's pure functions always give the same result when applied to the same argument, just as in mathematics. readFile
, on the other hand, yields whatever the file content happens to be when it is called. Suppose x = "TABLE OF CONTENTS"
at one point in an ongoing Haskell process, and later x = "TABLE OF CONTENTS {new line} "Chapter One"
. That wouldn't be a problem in languages with mutable variables but in Haskell, once a value has been assigned to x, mutating that value could result in bizarre behavior prior to a system crash. So, in almost every use case you can imagine, the way to use things with an IO
type is to isolate them from pure parts of the program by combining them with other functions.
For example, if you're reading a file using readFile
, presumably you want to do something with the string it returns (otherwise, why would you read the file in the first place). Suppose you have a function f
which takes a String
and produces an Int
. You can't directly apply f
to the result of readFile
since the input to f
is String
and the output of readFile
is IO String
and these don't match. However, you can combine these as:
main = do s <- readFile "somefile" let i = f s putStrLn (show i)
Here, we use the arrow convention to "get the string out of the IO action" and then apply f
to the string (called s
). We then, for example, print i
to the screen. Note that the let here doesn't have a corresponding in. This is because we are in a do block. Also note that we don't write i <- f s
because f
is just a normal function, not an IO action. Note: putStrLn (show i)
can be simplified to print i
if you want.
Explicit Type Declarations
editIt is sometimes desirable to explicitly specify the types of some elements or functions, for one (or more) of the following reasons:
- Clarity
- Speed
- Debugging
Some people consider it good software engineering to specify the types of all top-level functions. If nothing else, if you're trying to compile a program and you get type errors that you cannot understand, if you declare the types of some of your functions explicitly, it may be easier to figure out where the error is.
Type declarations are written separately from the function definition. For instance, we could explicitly type the function square
as in the following code (an explicitly declared type is called a type signature):
square :: Num a => a -> a square x = x*x
These two lines do not even have to be next to each other. However, the type that you specify must match the inferred type of the function definition (or be more specific). In this definition, you could apply square
to anything which is an instance of Num
: Int
, Double
, etc. However, if you knew apriori that square
were only going to be applied to value of type Int
, you could refine its type as:
square :: Int -> Int square x = x*x
Now, you could only apply square
to values of type Int
. Moreover, with this definition, the compiler doesn't have to generate the general code specified in the original function definition since it knows you will only apply square
to Int
s, so it may be able to generate faster code.
If you have extensions turned on ("-98" in Hugs or "-fglasgow-exts" in GHC(i)), you can also add a type signature to expressions and not just functions. For instance, you could write:
square (x :: Int) = x*x
which tells the compiler that x
is an Int
; however, it leaves the compiler alone to infer the type of the rest of the expression. What is the type of square
in this example? Make your guess then you can check it either by entering this code into a file and loading it into your interpreter or by asking for the type of the expression:
Example:
Prelude> :t (\(x :: Int) -> x*x)
since this lambda abstraction is equivalent to the above function declaration.
Functional Arguments
editIn the section on Lists we saw examples of functions taking other functions as arguments. For instance, map
took a function to apply to each element in a list, filter
took a function that told it which elements of a list to keep, and foldl
took a function which told it how to combine list elements together. As with every other function in Haskell, these are well-typed.
Let's first think about the map
function. Its job is to take a list of elements and produce another list of elements. These two lists don't necessarily have to have the same types of elements. So map
will take a value of type [a]
and produce a value of type [b]
. How does it do this? It uses the user-supplied function to convert. In order to convert an a
to a b
, this function must have type a -> b
. Thus, the type of map
is (a -> b) -> [a] -> [b]
, which you can verify in your interpreter with ":t".
We can apply the same sort of analysis to filter
and discern that it has type (a -> Bool) -> [a] -> [a]
. As we presented the foldr
function, you might be tempted to give it type (a -> a -> a) -> a -> [a] -> a
, meaning that you take a function which combines two a
s into another one, an initial value of type a
, a list of a
s to produce a final value of type a
. In fact, foldr
has a more general type: (a -> b -> b) -> b -> [a] -> b
. So it takes a function which turn an a
and a b
into a b
, an initial value of type b
and a list of a
s. It produces a b
.
To see this, we can write a function count
which counts how many members of a list satisfy a given constraint. You can of course use filter
and length
to do this, but we will also do it using foldr
:
module Count where import Char count1 bTest list = length (filter bTest list) count2 bTest list = foldr (\x cnt -> if bTest x then cnt+1 else cnt) 0 list
The functioning of count1
is simple. It filters the list list
according to the predicate bTest
, then takes the length of the resulting list. On the other hand, count2
uses the initial value (which is an integer) to hold the current count. For each element in the list list
, it applies the lambda expression shown. This takes two arguments, cnt
which holds the current count and x
which is the current element in the list that we're looking at. It checks to see if bTest
holds about x
. If it does, it returns the new value cnt+1
, increasing the count of elements for which the predicate holds. If it doesn't, it just returns cnt
, the old count. With lambda
defined as above, count2 Char.isLower "aBCde"
expands to lambda 'a' (lambda 'B' (lambda 'C' (lambda 'd' (lambda 'e' 0))))
.
Exercises |
---|
Figure out for yourself, and then verify the types of the following expressions, if they have a type. Also note if the expression is a type error:
|
Data Types
editTuples and lists are nice, common ways to define structured values. However, it is often desirable to be able to define our own data structures and functions over them. So-called "datatypes" are defined using the data keyword.
Pairs
editFor instance, a definition of a pair of elements (much like the standard, built-in pair type) could be:
data Pair a b = Pair a b
Let's walk through this code one word at a time. First we say "data" meaning that we're defining a datatype. We then give the name of the datatype, in this case, "Pair." The "a" and "b" that follow "Pair" are unique type parameters, just like the "a" is the type of the function map
. So up until this point, we've said that we're going to define a data structure called "Pair" which is parameterized over two types, a
and b
. Note that you can't have Pair a a = Pair a a
— in this case write Pair a = Pair a a
.
After the equals sign, we specify the constructors of this data type. In this case, there is a single constructor, "Pair" (this doesn't necessarily have to have the same name as the type, but in the case of a single constructor it seems to make more sense). After this pair, we again write "a b", which means that in order to construct a Pair
we need two values, one of type a
and one of type b
.
This definition introduces a function, Pair :: a -> b -> Pair a b
that you can use to construct Pair
s. If you enter this code into a file and load it, you can see how these are constructed:
Example:
Datatypes> :t Pair Pair :: a -> b -> Pair a b Datatypes> :t Pair 'a' Pair 'a' :: a -> Pair Char a Datatypes> :t Pair 'a' "Hello" Pair 'a' "Hello" :: Pair Char [Char]
So, by giving Pair
two values, we have completely constructed a value of type Pair
. We can write functions involving pairs as:
pairFst (Pair x y) = x pairSnd (Pair x y) = y
In this, we've used the pattern matching capabilities of Haskell to look at a pair and extract values from it. In the definition of pairFst
we take an entire Pair
and extract the first element; similarly for pairSnd
. We'll discuss pattern matching in much more detail in the section on Pattern matching.
Exercises |
---|
|
Multiple Constructors
editWe have seen an example of the data type with one constructor:
Pair
. It is also possible (and extremely useful) to have
multiple constructors.
Let us consider a simple function which searches through a list for an element satisfying a given predicate and then returns the first element satisfying that predicate. What should we do if none of the elements in the list satisfy the predicate? A few options are listed below:
- Raise an error
- Loop indefinitely
- Write a check function
- Return the first element
Raising an error is certainly an option (see the section on Exceptions to see how to do this). The problem is that it is difficult/impossible to recover from such errors. Looping indefinitely is possible, but not terribly useful. We could write a sister function which checks to see if the list contains an element satisfying a predicate and leave it up to the user to always use this function first. We could return the first element, but this is very ad-hoc and difficult to remember; and what if the list itself is empty?
The fact that there is no basic option to solve this problem simply means we have to think about it a little more. What are we trying to do? We're trying to write a function which might succeed and might not. Furthermore, if it does succeed, it returns some sort of value. Let's write a datatype:
data Maybe a = Nothing | Just a
This is one of the most common datatypes in Haskell and is defined in the Prelude.
Here, we're saying that there are two possible ways to create
something of type Maybe a
. The first is to use the nullary
constructor Nothing
, which takes no arguments (this is what
"nullary" means). The second is to use the constructor Just
,
together with a value of type a
.
The Maybe
type is useful in all sorts of circumstances. For
instance, suppose we want to write a function (like head
) which
returns the first element of a given list. However, we don't want the
program to die if the given list is empty. We can accomplish this
with a function like:
firstElement :: [a] -> Maybe a firstElement [] = Nothing firstElement (x:xs) = Just x
The type signature here says that firstElement
takes a list of
a
s and produces something with type Maybe a
. In the first
line of code, we match against the empty list []
. If this match
succeeds (i.e., the list is, in fact, empty), we return Nothing
.
If the first match fails, then we try to match against x:xs
which
must succeed. In this case, we return Just x
.
For our findElement
function, we represent failure by the value
Nothing
and success with value a
by Just a
. Our
function might look something like this:
findElement :: (a -> Bool) -> [a] -> Maybe a findElement p [] = Nothing findElement p (x:xs) = if p x then Just x else findElement p xs
The first line here gives the type of the function. In this case, our
first argument is the predicate (and takes an element of type a
and returns True
if and only if the element satisfies the
predicate); the second argument is a list of a
s. Our return
value is maybe an a
. That is, if the function succeeds,
we will return Just a
and if not, Nothing
.
Another useful datatype is the Either
type, defined as:
data Either a b = Left a | Right b
This is a way of expressing alternation. That is, something of type
Either a b
is either a value of type a
(using the
Left
constructor) or a value of type b
(using the Right
constructor).
Exercises |
---|
|
Recursive Datatypes
editWe can also define recursive datatypes. These are datatypes whose definitions are based on themselves. For instance, we could define a list datatype as:
data List a = Nil | Cons a (List a)
In this definition, we have defined what it means to be of type
List a
. We say that a list is either empty (Nil
) or it's
the Cons
of a value of type a
and another value of type
List a
. This is almost identical to the actual definition of the
list datatype in Haskell, except that uses special syntax where
[]
corresponds to Nil
and :
corresponds to Cons
.
We can write our own length
function for our lists as:
listLength Nil = 0 listLength (Cons x xs) = 1 + listLength xs
This function is slightly more complicated and uses recursion
to calculate the length of a List
. The first line says that the
length of an empty list (a Nil
) is . This much is obvious.
The second line tells us how to calculate the length of a non-empty
list. A non-empty list must be of the form Cons x xs
for some
values of x
and xs
. We know that xs
is another list
and we know that whatever the length of the current list is, it's the
length of its tail (the value of xs
) plus one (to account for
x
). Thus, we apply the listLength
function to xs
and
add one to the result. This gives us the length of the entire list.
Exercises |
---|
Write functions List datatype. Don't worry about exceptional conditions on the first two. |
Binary Trees
editWe can define datatypes that are more complicated than lists. Suppose we want to define a structure that looks like a binary tree. A binary tree is a structure that has a single root node; each node in the tree is either a "leaf" or a "branch." If it's a leaf, it holds a value; if it's a branch, it holds a value and a left child and a right child. Each of these children is another node. We can define such a data type as:
data BinaryTree a = Leaf a | Branch (BinaryTree a) a (BinaryTree a)
In this datatype declaration we say that a BinaryTree
of a
s
is either a Leaf
which holds an a
, or it's a branch with a
left child (which is a BinaryTree
of a
s), a node value
(which is an a
), and a right child (which is also a
BinaryTree
of a
s). It is simple to modify the
listLength
function so that instead of calculating the length of
lists, it calculates the number of nodes in a BinaryTree
. Can
you figure out how? We can call this function treeSize
. The
solution is given below:
treeSize (Leaf x) = 1 treeSize (Branch left x right) = 1 + treeSize left + treeSize right
Here, we say that the size of a leaf is and the size of a branch is the size of its left child, plus the size of its right child, plus one.
Exercises |
---|
|
Enumerated Sets
editYou can also use datatypes to define things like enumerated sets, for instance, a type which can only have a constrained number of values. We could define a color type:
data Color = Red | Orange | Yellow | Green | Blue | Purple | White | Black
This would be sufficient to deal with simple colors. Suppose we were
using this to write a drawing program, we could then write a function
to convert between a Color
and a RGB triple. We can write a
colorToRGB
function, as:
colorToRGB Red = (255,0,0) colorToRGB Orange = (255,128,0) colorToRGB Yellow = (255,255,0) colorToRGB Green = (0,255,0) colorToRGB Blue = (0,0,255) colorToRGB Purple = (255,0,255) colorToRGB White = (255,255,255) colorToRGB Black = (0,0,0)
If we wanted also to allow the user to define his own custom colors,
we could change the Color
datatype to something like:
data Color = Red | Orange | Yellow | Green | Blue | Purple | White | Black | Custom Int Int Int -- R G B components
And add a final definition for colorToRGB
:
colorToRGB (Custom r g b) = (r,g,b)
The Unit type
editA final useful datatype defined in Haskell (from the Prelude) is the unit type. Its definition is:
data () = ()
The only true value of this type is ()
. This is essentially the
same as a void type in a language like C or Java and will be
useful when we talk about IO in the chapter Io.
We'll dwell much more on data types in the sections on Pattern matching and Datatypes.
Continuation Passing Style
editThere is a style of functional programming called "Continuation Passing Style" (also simply "CPS"). The idea behind CPS is to pass around as a function argument what to do next. I will handwave through an example which is too complex to write out at this point and then give a real example, though one with less motivation.
Consider the problem of parsing. The idea here is that we have a sequence of tokens (words, letters, whatever) and we want to ascribe structure to them. The task of converting a string of Java tokens to a Java abstract syntax tree is an example of a parsing problem. So is the task of parsing English sentences (though the latter is extremely difficult, even for native English users parsing sentences from the real world).
Suppose we're parsing something like C or Java where functions take arguments in parentheses. But for simplicity, assume they are not separated by commas. That is, a function call looks like myFunction(x y z). We want to convert this into something like a pair containing first the string "myFunction" and then a list with three string elements: "x", "y" and "z".
The general approach to solving this would be to write a function which parses function calls like this one. First it would look for an identifier ("myFunction"), then for an open parenthesis, then for zero or more identifiers, then for a close parenthesis.
One way to do this would be to have two functions:
parseFunction :: [Token] -> Maybe ((String, [String]), [Token]) parseIdentifier :: [Token] -> Maybe (String, [Token])
The idea would be that if we call parseFunction
, if it doesn't
return Nothing
, then it returns the pair described earlier,
together with whatever is left after parsing the function. Similarly,
parseIdentifier
will parse one of the arguments. If it returns
Nothing
, then it's not an argument; if it returns Just
something, then that something is the argument paired with the rest of
the tokens.
What the parseFunction
function would do is to parse an
identifier. If this fails, it fails itself. Otherwise, it continues
and tries to parse an open parenthesis. If that succeeds, it
repeatedly calls parseIdentifier
until that fails. It then tries
to parse a close parenthesis. If that succeeds, then it's done.
Otherwise, it fails.
There is, however, another way to think about this problem. The advantage to this solution is that functions no longer need to return the remaining tokens (which tends to get ugly). Instead of the above, we write functions:
parseFunction :: [Token] -> ((String, [String]) -> [Token] -> a) -> ([Token] -> a) -> a parseIdentifier :: [Token] -> (String -> [Token] -> a) -> ([Token] -> a) -> a
Let's consider parseIdentifier
. This takes three arguments: a
list of tokens and two continuations. The first continuation
is what to do when you succeed. The second continuation is what to do
if you fail. What parseIdentifier
does, then, is try to read an
identifier. If this succeeds, it calls the first continuation with
that identifier and the remaining tokens as arguments. If reading the
identifier fails, it calls the second continuation with all the
tokens.
Now consider parseFunction
. Recall that it wants to read an
identifier, an open parenthesis, zero or more identifiers and a close
parenthesis. Thus, the first thing it does is call
parseIdentifier
. The first argument it gives is the list of
tokens. The first continuation (which is what parseIdentifier
should do if it succeeds) is in turn a function which will look for an
open parenthesis, zero or more arguments and a close parenthesis. The
second continuation (the failure argument) is just going to be the failure
function given to parseFunction
.
Now, we simply need to define this function which looks for an
open parenthesis, zero or more arguments and a close parenthesis. This
is easy. We write a function which looks for the open parenthesis and
then calls parseIdentifier
with a success continuation that looks
for more identifiers, and a "failure" continuation which looks for
the close parenthesis (note that this failure doesn't really mean
failure -- it just means there are no more arguments left).
I realize this discussion has been quite abstract. I would willingly give code for all this parsing, but it is perhaps too complex at the moment. Instead, consider the problem of folding across a list. We can write a CPS fold as:
cfold' f z [] = z cfold' f z (x:xs) = f x z (\y -> cfold' f y xs)
In this code, cfold'
takes a function f
which takes three
arguments, slightly different from the standard folds. The first is
the current list element, x
, the second is the accumulated
element, z
, and the third is the continuation: basically, what
to do next.
We can write a wrapper function for cfold'
that will make it
behave more like a normal fold:
cfold f z l = cfold' (\x t g -> f x (g t)) z l
We can test that this function behaves as we desire:
Example:
CPS> cfold (+) 0 [1,2,3,4] 10 CPS> cfold (:) [] [1,2,3] [1,2,3]
One thing that's nice about formulating cfold
in terms of the
helper function cfold'
is that we can use the helper function
directly. This enables us to change, for instance, the evaluation
order of the fold very easily:
Example:
CPS> cfold' (\x t g -> (x : g t)) [] [1..10] [1,2,3,4,5,6,7,8,9,10] CPS> cfold' (\x t g -> g (x : t)) [] [1..10] [10,9,8,7,6,5,4,3,2,1]
The only difference between these calls to cfold'
is whether we
call the continuation before or after constructing the list. As it
turns out, this slight difference changes the behavior for being like
foldr
to being like foldl
. We can evaluate both of these
calls as follows (let f
be the folding function):
cfold' (\x t g -> (x : g t)) [] [1,2,3] ==> cfold' f [] [1,2,3] ==> f 1 [] (\y -> cfold' f y [2,3]) ==> 1 : ((\y -> cfold' f y [2,3]) []) ==> 1 : (cfold' f [] [2,3]) ==> 1 : (f 2 [] (\y -> cfold' f y [3])) ==> 1 : (2 : ((\y -> cfold' f y [3]) [])) ==> 1 : (2 : (cfold' f [] [3])) ==> 1 : (2 : (f 3 [] (\y -> cfold' f y []))) ==> 1 : (2 : (3 : (cfold' f [] []))) ==> 1 : (2 : (3 : [])) ==> [1,2,3] cfold' (\x t g -> g (x:t)) [] [1,2,3] ==> cfold' f [] [1,2,3] ==> (\x t g -> g (x:t)) 1 [] (\y -> cfold' f y [2,3]) ==> (\g -> g [1]) (\y -> cfold' f y [2,3]) ==> (\y -> cfold' f y [2,3]) [1] ==> cfold' f [1] [2,3] ==> (\x t g -> g (x:t)) 2 [1] (\y -> cfold' f y [3]) ==> cfold' f (2:[1]) [3] ==> cfold' f [2,1] [3] ==> (\x t g -> g (x:t)) 3 [2,1] (\y -> cfold' f y []) ==> cfold' f (3:[2,1]) [] ==> [3,2,1]
In general, continuation passing style is a very powerful abstraction, though it can be difficult to master. We will revisit the topic more thoroughly later in the book.
Exercises |
---|
|