Yet Another Haskell Tutorial/Language advanced
Sections and Infix Operators
editWe've already seen how to double the values of elements in a list
using map
:
Example:
Prelude> map (\x -> x*2) [1,2,3,4] [2,4,6,8]
However, there is a more concise way to write this:
Example:
Prelude> map (*2) [1,2,3,4] [2,4,6,8]
This type of thing can be done for any infix function:
Example:
Prelude> map (+5) [1,2,3,4] [6,7,8,9] Prelude> map (/2) [1,2,3,4] [0.5,1.0,1.5,2.0] Prelude> map (2/) [1,2,3,4] [2.0,1.0,0.666667,0.5]
You might be tempted to try to subtract values from elements in a list
by mapping -2
across a list. This won't work, though, because
while the +
in +2
is parsed as the standard plus operator
(as there is no ambiguity), the -
in -2
is interpreted as
the unary minus, not the binary minus. Thus -2
here is the
number , not the function .
In general, these are called sections. For binary infix operators
(like +
), we can cause the function to become prefix by enclosing
it in parentheses. For example:
Example:
Prelude> (+) 5 3 8 Prelude> (-) 5 3 2
Additionally, we can provide either of its argument to make a section. For example:
Example:
Prelude> (+5) 3 8 Prelude> (/3) 6 2.0 Prelude> (3/) 6 0.5
Non-infix functions can be made infix by enclosing them in backquotes ("\`"). For example:
Example:
Prelude> (+2) `map` [1..10] [3,4,5,6,7,8,9,10,11,12] Prelude> (`map` [1..10]) (+2) [3,4,5,6,7,8,9,10,11,12]
Local Declarations
editRecall back from the section on Functions, there are many computations which require using the result of the same computation in multiple places in a function. There, we considered the function for computing the roots of a quadratic polynomial:
roots a b c = ((-b + sqrt(b*b - 4*a*c)) / (2*a), (-b - sqrt(b*b - 4*a*c)) / (2*a))
In addition to the let bindings introduced
there, we can do this using a where clause. where
clauses
come immediately after function definitions and introduce a new level
of layout (see the section on Layout). We write this
as:
roots a b c = ((-b + det) / (2*a), (-b - det) / (2*a)) where det = sqrt(b*b-4*a*c)
Any values defined in a where clause shadow any other values with the same name. For instance, if we had the following code block:
det = "Hello World" roots a b c = ((-b + det) / (2*a), (-b - det) / (2*a)) where det = sqrt(b*b-4*a*c) f _ = det
The value of roots
doesn't notice the top-level declaration of
det
, since it is shadowed by the local definition (the fact that
the types don't match doesn't matter either). Furthermore, since
f
cannot "see inside" of roots
, the only thing it knows
about det
is what is available at the top level, which is the
string "Hello World."
We could also pull out the
2*a
computation and get the following code:
roots a b c = ((-b + det) / (a2), (-b - det) / (a2)) where det = sqrt(b*b-4*a*c) a2 = 2*a
Sub-expressions in where clauses must come after function
definitions. Sometimes it is more convenient to put the local
definitions before the actual expression of the function. This can be
done by using let/in clauses. We have already seen let
clauses; where clauses are virtually identical to their let
clause cousins except for their placement. The same roots
function can be written using let as:
roots a b c = let det = sqrt (b*b - 4*a*c) a2 = 2*a in ((-b + det) / a2, (-b - det) / a2)
Using a where clause, it looks like:
roots a b c = ((-b + det) / a2, (-b - det) / a2) where det = sqrt (b*b - 4*a*c) a2 = 2*a
These two types of clauses can be mixed (i.e., you can write a function which has both a let cause and a where clause). This is strongly advised against, as it tends to make code difficult to read. However, if you choose to do it, values in the let clause shadow those in the where clause. So if you define the function:
f x = let y = x+1 in y where y = x+2
The value of f 5
is 6
, not 7
. Of course, I plead with
you to never ever write code that looks like this. No one should have
to remember this rule and by shadowing where-defined values in a
let clause only makes your code difficult to understand.
In general, whether you should use let clauses or where clauses is largely a matter of personal preference. Usually, the names you give to the subexpressions should be sufficiently expressive that without reading their definitions any reader of your code should be able to figure out what they do. In this case, where clauses are probably more desirable because they allow the reader to see immediately what a function does. However, in real life, values are often given cryptic names.
Partial Application
editPartial application is when you take a function which takes
arguments and you supply it with of them. When discussing
Sections, we saw a form of
"partial application" in which functions like +
were partially
applied. For instance, in the expression map (+1) [1,2,3]
, the
section (+1)
is a partial application of +
. This is because
+
really takes two arguments, but we've only given it one.
Partial application is very common in function definitions and
sometimes goes by the name "η (eta) reduction". For instance, suppose we are
writing a function lcaseString
which converts a whole string
into lower case. We could write this as:
lcaseString s = map toLower s
Here, there is no partial application (though you could argue that
applying no arguments to toLower
could be considered partial
application). However, we notice that the application of s
occurs at the end of both lcaseString
and of map toLower
.
In fact, we can remove it by performing eta reduction, to get:
lcaseString = map toLower
Now, we have a partial application of map
: it expects a function
and a list, but we've only given it the function.
This all is related to the type of map
, which is (a -> b)
-> ([a] -> [b])
, when all parentheses are included. In our case,
toLower
is of type Char -> Char
. Thus, if we supply this
function to map
, we get a function of type [Char] ->
[Char]
, as desired.
Now, consider the task of converting a string to lowercase and removing all non letter characters. We might write this as:
lcaseLetters s = map toLower (filter isAlpha s)
But note that we can actually write this in terms of function composition:
lcaseLetters s = (map toLower . filter isAlpha) s
And again, we're left with an eta reducible function:
lcaseLetters = map toLower . filter isAlpha
Writing functions in this style is very common among advanced Haskell
users. In fact it has a name: point-free
programming (not to be confused with pointless programming).
It is called point free because in the original definition of
lcaseLetters
, we can think of the value s
as a point on
which the function is operating. By removing the point from the
function definition, we have a point-free function.
A function similar to (.)
is
($)
. Whereas (.)
is function
composition, ($)
is function application. The definition of ($)
from the Prelude is very
simple:
f $ x = f x
However, this function is given very low fixity, which means that it can be used to replace parentheses. For instance, we might write a function:
foo x y = bar y (baz (fluff (ork x)))
However, using the function application function, we can rewrite this as:
foo x y = bar y $ baz $ fluff $ ork x
This moderately resembles the function composition syntax. The
($)
function is also useful when combined with other infix
functions. For instance, we cannot write:
Example:
Prelude> putStrLn "5+3=" ++ show (5+3)
because this is interpreted as (putStrLn "5+3=") ++ (show (5+3))
,
which makes no sense. However, we can fix this by writing instead:
Example:
Prelude> putStrLn $ "5+3=" ++ show (5+3)
Which works fine.
Consider now the task of extracting from a list of tuples all the ones whose first component is greater than zero. One way to write this would be:
fstGt0 l = filter (\ (a,b) -> a>0) l
We can first apply eta reduction to the whole function, yielding:
fstGt0 = filter (\ (a,b) -> a>0)
Now, we can rewrite the lambda function to use the fst
function
instead of the pattern matching:
fstGt0 = filter (\x -> fst x > 0)
Now, we can use function composition between fst
and >
to
get:
fstGt0 = filter (\x -> ((>0) . fst) x)
And finally we can eta reduce:
fstGt0 = filter ((>0).fst)
This definition is simultaneously shorter and easier to understand
than the original. We can clearly see exactly what it is doing: we're
filtering a list by checking whether something is greater than zero.
What are we checking? The fst
element.
While converting to point free style often results in clearer code, this is of course not always the case. For instance, converting the following map to point free style yields something nearly uninterpretable:
foo = map (\x -> sqrt (3+4*(x^2))) foo = map (sqrt . (3+) . (4*) . (^2))
There are a handful of combinators defined in the Prelude which are useful for point free programming:
uncurry
takes a function of typea -> b -> c
and converts it into a function of type(a,b) -> c
. This is useful, for example, when mapping across a list of pairs:
Example:
Prelude> map (uncurry (*)) [(1,2),(3,4),(5,6)] [2,12,30]
curry
is the opposite ofuncurry
and takes a function of type(a,b) -> c
and produces a function of typea -> b -> c
.
flip
reverse the order of the first two arguments to a function. That is, it takes a function of typea -> b -> c
and produces a function of typeb -> a -> c
. For instance, we can sort a list in reverse order by usingflip compare
:
Example:
Prelude> List.sortBy compare [5,1,8,3] [1,3,5,8] Prelude> List.sortBy (flip compare) [5,1,8,3] [8,5,3,1]
This is the same as saying:
Example:
Prelude> List.sortBy (\a b -> compare b a) [5,1,8,3] [8,5,3,1]
only shorter.
Of course, not all functions can be written in point free style. For instance:
square x = x*x
Cannot be written in point free style, without some other combinators. For instance, if we can define other functions, we can write:
pair x = (x,x) square = uncurry (*) . pair
But in this case, this is not terribly useful.
Exercises |
---|
Convert the following functions into point-free style, if possible. func1 x l = map (\y -> y*x) l func2 f g l = filter f (map g l) func3 f l = l ++ map f l func4 l = map (\y -> y+2) (filter (\z -> z `elem` [1..10]) (5:l)) func5 f l = foldr (\x y -> f (y,x)) 0 l |
Pattern Matching
editPattern matching is one of the most powerful features of Haskell (and
most functional programming languages). It is most commonly used in
conjunction with case expressions, which we have already seen in
the section on Functions. Let's return to our
Color
example from the section on Datatypes. I'll
repeat the definition we already had for the datatype:
data Color = Red | Orange | Yellow | Green | Blue | Purple | White | Black | Custom Int Int Int -- R G B components deriving (Show,Eq)
We then want to write a function that will convert between something
of type Color
and a triple of Int
s, which correspond to the
RGB values, respectively. Specifically, if we see a Color
which
is Red
, we want to return (255,0,0)
, since this is the RGB
value for red. So we write that (remember that piecewise function
definitions are just case statements):
colorToRGB Red = (255,0,0)
If we see a Color
which is Orange
, we want to return
(255,128,0)
; and if we see Yellow
, we want to return
(255,255,0)
, and so on. Finally, if we see a custom color, which
is comprised of three components, we want to make a triple out of
these, so we write:
colorToRGB Orange = (255,128,0) colorToRGB Yellow = (255,255,0) colorToRGB Green = (0,255,0) colorToRGB Blue = (0,0,255) colorToRGB Purple = (255,0,255) colorToRGB White = (255,255,255) colorToRGB Black = (0,0,0) colorToRGB (Custom r g b) = (r,g,b)
Then, in our interpreter, if we type:
Example:
Color> colorToRGB Yellow (255,255,0)
What is happening is this: we create a value, call it , which has
value Yellow
. We then apply this to colorToRGB
. We check to
see if we can "match" against Red
. This match fails
because according to the definition of Eq
{Color}, Red
is not
equal to Yellow
. We continue down the definitions of
colorToRGB
and try to match Yellow
against Orange
.
This fails, too. We the try to match Yellow
against Yellow
,
which succeeds, so we use this function definition, which simply
returns the value (255,255,0)
, as expected.
Suppose instead, we used a custom color:
Example:
Color> colorToRGB (Custom 50 200 100) (50,200,100)
We apply the same matching process, failing on all values from
Red
to Black
. We then get to try to match Custom 50 200
100
against Custom r g b
. We can see that the Custom
part
matches, so then we go see if the subelements match. In the matching,
the variables r
, g
and b
are essentially wild cards, so
there is no trouble matching r
with 50, g
with 200 and
b
with 100. As a "side-effect" of this matching, r
gets
the value 50, g
gets the value 200 and b
gets the value
100. So the entire match succeeded and we look at the definition of
this part of the function and bundle up the triple using the matched
values of r
, g
and b
.
We can also write a function to check to see if a Color
is a
custom color or not:
isCustomColor (Custom _ _ _) = True isCustomColor _ = False
When we apply a value to isCustomColor
it tries to match that
value against Custom _ _ _
. This match will succeed if the
value is Custom x y z
for any x
, y
and z
. The
_
(underscore) character is a "wildcard" and will match
anything, but will not do the binding that would happen if you put a
variable name there. If this match succeeds, the function returns
True
; however, if this match fails, it goes on to the next line,
which will match anything and then return False
.
For some reason we might want to define a function which tells us whether a given color is "bright" or not, where my definition of "bright" is that one of its RGB components is equal to 255 (admittedly an arbitrary definition, but it's simply an example). We could define this function as:
isBright = isBright' . colorToRGB where isBright' (255,_,_) = True isBright' (_,255,_) = True isBright' (_,_,255) = True isBright' _ = False
Let's dwell on this definition for a second. The isBright
function is the composition of our previously defined function
colorToRGB
and a helper function isBright'
, which tells us
if a given RGB value is bright or not. We could replace the first
line here with isBright c = isBright' (colorToRGB c)
but there is
no need to explicitly write the parameter here, so we don't. Again,
this function composition style of programming takes some getting used
to, so I will try to use it frequently in this tutorial.
The isBright'
helper function takes the RGB triple produced by
colorToRGB
. It first tries to match it against (255,_,_)
which succeeds if the value has 255 in its first position. If this
match succeeds, isBright'
returns True
and so does
isBright
. The second and third line of definition check for 255
in the second and third position in the triple, respectively. The
fourth line, the fallthrough, matches everything else and
reports it as not bright.
We might want to also write a function to convert between RGB triples
and Color
s. We could simply stick everything in a Custom
constructor, but this would defeat the purpose; we want to use the
Custom
slot only for values which don't match the predefined
colors. However, we don't want to allow the user to construct custom
colors like (600,-40,99) since these are invalid RGB values. We could
throw an error if such a value is given, but this can be difficult to
deal with. Instead, we use the Maybe
datatype. This is defined
(in the Prelude) as:
data Maybe a = Nothing | Just a
The way we use this is as follows: our rgbToColor
function
returns a value of type Maybe Color
. If the RGB value passed to
our function is invalid, we return Nothing
, which
corresponds to a failure. If, on the other hand, the RGB value is
valid, we create the appropriate Color
value and return Just
that. The code to do this is:
rgbToColor 255 0 0 = Just Red rgbToColor 255 128 0 = Just Orange rgbToColor 255 255 0 = Just Yellow rgbToColor 0 255 0 = Just Green rgbToColor 0 0 255 = Just Blue rgbToColor 255 0 255 = Just Purple rgbToColor 255 255 255 = Just White rgbToColor 0 0 0 = Just Black rgbToColor r g b = if 0 <= r && r <= 255 && 0 <= g && g <= 255 && 0 <= b && b <= 255 then Just (Custom r g b) else Nothing -- invalid RGB value
The first eight lines match the RGB arguments against the predefined
values and, if they match, rgbToColor
returns Just
the
appropriate color. If none of these matches, the last definition of
rgbToColor
matches the first argument against r
, the second
against g
and the third against b
(which causes the
side-effect of binding these values). It then checks to see if these
values are valid (each is greater than or equal to zero and less than
or equal to 255). If so, it returns Just (Custom r g b)
; if not,
it returns Nothing
corresponding to an invalid color.
Using this, we can write a function that checks to see if a right RGB value is valid:
rgbIsValid r g b = rgbIsValid' (rgbToColor r g b) where rgbIsValid' (Just _) = True rgbIsValid' _ = False
Here, we compose the helper function rgbIsValid'
with our
function rgbToColor
. The helper function checks to see if the
value returned by rgbToColor
is Just
anything (the
wildcard). If so, it returns True
. If not, it matches anything
and returns False
.
Pattern matching isn't magic, though. You can only match against datatypes; you cannot match against functions. For instance, the following is invalid:
f x = x + 1 g (f x) = x
Even though the intended meaning of g
is clear (i.e., g x = x
- 1
), the compiler doesn't know in general that f
has an
inverse function, so it can't perform matches like this.
Guards
editGuards can be thought of as an extension to the pattern matching facility. They enable you to allow piecewise function definitions to be taken according to arbitrary boolean expressions. Guards appear after all arguments to a function but before the equals sign, and are begun with a vertical bar. We could use guards to write a simple function which returns a string telling you the result of comparing two elements:
comparison x y | x < y = "The first is less" | x > y = "The second is less" | otherwise = "They are equal"
You can read the vertical bar as "such that." So we say that the
value of comparison x y
"such that" x is less than y is "The
first is less." The value such that x is greater than y is "The
second is less" and the value otherwise is "They are equal".
The keyword otherwise
is simply defined to be equal to True
and thus matches anything that falls through that far. So, we can see
that this works:
Example:
Guards> comparison 5 10 "The first is less" Guards> comparison 10 5 "The second is less" Guards> comparison 7 7 "They are equal"
Guards are applied in conjunction with pattern matching. When a pattern matches, all of its guards are tried, consecutively, until one matches. If none match, then pattern matching continues with the next pattern.
One nicety about guards is that where clauses are common to all
guards. So another possible definition for our isBright
function
from the previous section would be:
isBright2 c | r == 255 = True | g == 255 = True | b == 255 = True | otherwise = False where (r,g,b) = colorToRGB c
The function is equivalent to the previous version, but performs its
calculation slightly differently. It takes a color, c
, and
applies colorToRGB
to it, yielding an RGB triple which is matched
(using pattern matching!) against (r,g,b)
. This match succeeds
and the values r
, g
and b
are bound to their respective
values. The first guard checks to see if r
is 255 and, if so,
returns true. The second and third guard check g
and b
against 255, respectively and return true if they match. The last
guard fires as a last resort and returns False
.
Instance Declarations
editIn order to declare a type to be an instance of a class, you need to provide an instance declaration for it. Most classes provide what's called a "minimal complete definition." This means the functions which must be implemented for this class in order for its definition to be satisfied. Once you've written these functions for your type, you can declare it an instance of the class.
The Eq Class
editThe Eq
class has two members (i.e., two functions):
(==) :: Eq a => a -> a -> Bool (/=) :: Eq a => a -> a -> Bool
The first of these type signatures reads that the function ==
is
a function which takes two a
s which are members of Eq
and
produces a Bool
. The type signature of /=
(not equal) is
identical. A minimal complete definition for the Eq
class
requires that either one of these functions be defined (if you define
==
, then /=
is defined automatically by negating the result
of ==
, and vice versa). These declarations must be provided
inside the instance declaration.
This is best demonstrated by example. Suppose we have our color example, repeated here for convenience:
data Color = Red | Orange | Yellow | Green | Blue | Purple | White | Black | Custom Int Int Int -- R G B components
We can define Color
to be an instance of Eq
by the following
declaration:
instance Eq Color where Red == Red = True Orange == Orange = True Yellow == Yellow = True Green == Green = True Blue == Blue = True Purple == Purple = True White == White = True Black == Black = True (Custom r g b) == (Custom r' g' b') = r == r' && g == g' && b == b' _ == _ = False
The first line here begins with the keyword instance telling the
compiler that we're making an instance declaration. It then specifies
the class, Eq
, and the type, Color
which is going to be an
instance of this class. Following that, there's the where
keyword. Finally there's the method declaration.
The first eight lines of the method declaration are basically
identical. The first one, for instance, says that the value of the
expression Red == Red
is equal to True
. Lines two through
eight are identical. The declaration for custom colors is a bit
different. We pattern match Custom
on both sides of ==
. On
the left hand side, we bind r
, g
and b
to the
components, respectively. On the right hand side, we bind r'
,
g'
and b'
to the components. We then say that these two
custom colors are equal precisely when r == r'
, g == g'
and
b == b'
are all equal. The fallthrough says that any pair we
haven't previously declared as equal are unequal.
The Show Class
editThe Show
class is used to display arbitrary values as strings.
This class has three methods:
show :: Show a => a -> String showsPrec :: Show a => Int -> a -> String -> String showList :: Show a => [a] -> String -> String
A minimal complete definition is either show
or showsPrec
(we will talk about showsPrec
later -- it's in there for
efficiency reasons). We can define our Color
datatype to be an
instance of Show
with the following instance declaration:
instance Show Color where show Red = "Red" show Orange = "Orange" show Yellow = "Yellow" show Green = "Green" show Blue = "Blue" show Purple = "Purple" show White = "White" show Black = "Black" show (Custom r g b) = "Custom " ++ show r ++ " " ++ show g ++ " " ++ show b
This declaration specifies exactly how to convert values of type
Color
to String
s. Again, the first eight lines are
identical and simply take a Color
and produce a string. The last
line for handling custom colors matches out the RGB components and
creates a string by concatenating the result of show
ing the
components individually (with spaces in between and "Custom" at the
beginning).
Other Important Classes
editThere are a few other important classes which I will mention briefly because either they are commonly used or because we will be using them shortly. I won't provide example instance declarations; how you can do this should be clear by now.
The Ord Class
editThe ordering class, the functions are:
compare :: Ord a => a -> a -> Ordering (<=) :: Ord a => a -> a -> Bool (>) :: Ord a => a -> a -> Bool (>=) :: Ord a => a -> a -> Bool (<) :: Ord a => a -> a -> Bool min :: Ord a => a -> a -> a max :: Ord a => a -> a -> a
Almost any of the functions alone is a minimal complete
definition; it is recommended that you implement compare
if you
implement only one, though. This function returns a value of type
Ordering
which is defined as:
data Ordering = LT | EQ | GT
So, for instance, we get:
Example:
Prelude> compare 5 7 LT Prelude> compare 6 6 EQ Prelude> compare 7 5 GT
In order to declare a type to be an instance of Ord
you must
already have declared it an instance of Eq
(in other words,
Ord
is a subclass of Eq
-- more about this in
the section on Classes).
The Enum Class
editThe Enum
class is for enumerated types; that is, for types where
each element has a successor and a predecessor. Its methods are:
pred :: Enum a => a -> a succ :: Enum a => a -> a toEnum :: Enum a => Int -> a fromEnum :: Enum a => a -> Int enumFrom :: Enum a => a -> [a] enumFromThen :: Enum a => a -> a -> [a] enumFromTo :: Enum a => a -> a -> [a] enumFromThenTo :: Enum a => a -> a -> a -> [a]
The minimal complete definition contains both toEnum
and
fromEnum
, which converts from and to Int
s. The pred
and succ
functions give the predecessor and successor,
respectively. The enum
functions enumerate lists of elements.
For instance, enumFrom x
lists all elements after x
;
enumFromThen x step
lists all elements starting at x
in
steps of size step
. The To
functions end the enumeration at
the given element.
The Num Class
editThe Num
class provides the standard arithmetic operations:
(-) :: Num a => a -> a -> a (*) :: Num a => a -> a -> a (+) :: Num a => a -> a -> a negate :: Num a => a -> a signum :: Num a => a -> a abs :: Num a => a -> a fromInteger :: Num a => Integer -> a
All of these are obvious except for perhaps negate
which is the
unary minus. That is, negate x
means .
The Read Class
editThe Read
class is the opposite of the Show
class. It is a
way to take a string and read in from it a value of arbitrary type.
The methods for Read
are:
readsPrec :: Read a => Int -> String -> [(a, String)] readList :: String -> [([a], String)]
The minimal complete definition is readsPrec
. The most important
function related to this is read
, which uses readsPrec
as:
read s = fst (head (readsPrec 0 s))
This will fail if parsing the string fails. You could define a
maybeRead
function as:
maybeRead s = case readsPrec 0 s of [(a,_)] -> Just a _ -> Nothing
How to write and use readsPrec
directly will be discussed further
in the examples.
Class Contexts
editSuppose we are defining the Maybe
datatype from scratch. The
definition would be something like:
data Maybe a = Nothing | Just a
Now, when we go to write the instance declarations, for, say, Eq
,
we need to know that a
is an instance of Eq
otherwise we
can't write a declaration. We express this as:
instance Eq a => Eq (Maybe a) where Nothing == Nothing = True (Just x) == (Just x') = x == x'
This first line can be read "That a
is an instance of Eq
implies (=>
) that Maybe a
is an instance of
Eq
."
Deriving Classes
editWriting obvious Eq
, Ord
, Read
and Show
classes
like these is tedious and should be automated. Luckily for us, it
is. If you write a datatype that's "simple enough" (almost any
datatype you'll write unless you start writing fixed point types), the
compiler can automatically derive some of the most basic
classes. To do this, you simply add a deriving clause to after
the datatype declaration, as in:
data Color = Red | ... | Custom Int Int Int -- R G B components deriving (Eq, Ord, Show, Read)
This will automatically create instances of the Color
datatype of
the named classes. Similarly, the declaration:
data Maybe a = Nothing | Just a deriving (Eq, Ord, Show, Read)
derives these classes just when a
is appropriate.
All in all, you are allowed to derive instances of Eq
, Ord
,
Enum
, Bounded
, Show
and Read
. There is
considerable work in the area of "polytypic programming" or
"generic programming" which, among other things, would allow for
instance declarations for any class to be derived. This is
much beyond the scope of this tutorial; instead, I refer you to the
literature.
Datatypes Revisited
editI know by this point you're probably terribly tired of hearing about datatypes. They are, however, incredibly important, otherwise I wouldn't devote so much time to them. Datatypes offer a sort of notational convenience if you have, for instance, a datatype that holds many many values. These are called named fields.
Named Fields
editConsider a datatype whose purpose is to hold configuration settings. Usually when you extract members from this type, you really only care about one or possibly two of the many settings. Moreover, if many of the settings have the same type, you might often find yourself wondering "wait, was this the fourth or fifth element?" One thing you could do would be to write accessor functions. Consider the following made-up configuration type for a terminal program:
data Configuration = Configuration String -- user name String -- local host String -- remote host Bool -- is guest? Bool -- is super user? String -- current directory String -- home directory Integer -- time connected deriving (Eq, Show)
You could then write accessor functions, like (I've only listed a few):
getUserName (Configuration un _ _ _ _ _ _ _) = un getLocalHost (Configuration _ lh _ _ _ _ _ _) = lh getRemoteHost (Configuration _ _ rh _ _ _ _ _) = rh getIsGuest (Configuration _ _ _ ig _ _ _ _) = ig ...
You could also write update functions to update a single element. Of course, now if you add an element to the configuration, or remove one, all of these functions now have to take a different number of arguments. This is highly annoying and is an easy place for bugs to slip in. However, there's a solution. We simply give names to the fields in the datatype declaration, as follows:
data Configuration = Configuration { username :: String, localhost :: String, remotehost :: String, isguest :: Bool, issuperuser :: Bool, currentdir :: String, homedir :: String, timeconnected :: Integer }
This will automatically generate the following accessor functions for us:
username :: Configuration -> String localhost :: Configuration -> String ...
Moreover, it gives us very convenient update methods. Here is a short
example for a "post working directory" and "change directory" like
functions that work on Configuration
s:
changeDir :: Configuration -> String -> Configuration changeDir cfg newDir = -- make sure the directory exists if directoryExists newDir then -- change our current directory cfg{currentdir = newDir} else error "directory does not exist" postWorkingDir :: Configuration -> String -- retrieve our current directory postWorkingDir cfg = currentdir cfg
So, in general, to update the field x
in a datatype y
to
z
, you write y{x=z}
. You can change more than one; each
should be separated by commas, for instance, y{x=z, a=b, c=d}
.
You can of course continue to pattern match against
Configuration
s as you did before. The named fields are simply
syntactic sugar; you can still write something like:
getUserName (Configuration un _ _ _ _ _ _ _) = un
But there is little reason to. Finally, you can pattern match against named fields as in:
getHostData (Configuration {localhost=lh,remotehost=rh}) = (lh,rh)
This matches the variable lh
against the localhost
field on
the Configuration
and the variable rh
against the
remotehost
field on the Configuration
. These matches of
course succeed. You could also constrain the matches by putting
values instead of variable names in these positions, as you would for
standard datatypes.
You can create values of Configuration
in the old way as shown in
the first definition below, or in the named-field's type, as shown in
the second definition below:
initCFG = Configuration "nobody" "nowhere" "nowhere" False False "/" "/" 0 initCFG' = Configuration { username="nobody", localhost="nowhere", remotehost="nowhere", isguest=False, issuperuser=False, currentdir="/", homedir="/", timeconnected=0 }
Though the second is probably much more understandable unless you litter your code with comments.
More Lists
editto do: put something here
Standard List Functions
editRecall that the definition of the built-in Haskell list datatype is equivalent to:
data List a = Nil | Cons a (List a)
With the exception that Nil
is called []
and Cons x xs
is called x:xs
. This is simply to make pattern matching easier
and code smaller. Let's investigate how some of the standard list
functions may be written. Consider map
. A definition is given
below:
map _ [] = [] map f (x:xs) = f x : map f xs
Here, the first line says that when you map
across an empty list,
no matter what the function is, you get an empty list back. The
second line says that when you map
across a list with x
as
the head and xs
as the tail, the result is f
applied to
x
consed onto the result of mapping f
on xs
.
The filter
can be defined similarly:
filter _ [] = [] filter p (x:xs) | p x = x : filter p xs | otherwise = filter p xs
How this works should be clear. For an empty list, we return an empty
list. For a non empty list, we return the filter of the tail, perhaps
with the head on the front, depending on whether it satisfies the
predicate p
or not.
We can define foldr
as:
foldr _ z [] = z foldr f z (x:xs) = f x (foldr f z xs)
Here, the best interpretation is that we are replacing the empty list
([]
) with a particular value and the list constructor (:
)
with some function. On the first line, we can see the replacement of
[]
for z
. Using backquotes to make f
infix, we can
write the second line as:
foldr f z (x:xs) = x `f` (foldr f z xs)
From this, we can directly see how :
is being replaced by f
.
Finally, foldl
:
foldl _ z [] = z foldl f z (x:xs) = foldl f (f z x) xs
This is slightly more complicated. Remember, z
can be thought of
as the current state. So if we're folding across a list which is
empty, we simply return the current state. On the other hand, if the
list is not empty, it's of the form x:xs
. In this case, we get a
new state by applying f
to the current state z
and the
current list element x
and then recursively call foldl
on
xs
with this new state.
There is another class of functions: the zip
and unzip
functions, which respectively take multiple lists and make one or take
one lists and split them apart. For instance, zip
does the
following:
Example:
Prelude> zip "hello" [1,2,3,4,5] [('h',1),('e',2),('l',3),('l',4),('o',5)]
Basically, it pairs the first elements of both lists and makes that
the first element of the new list. It then pairs the second elements
of both lists and makes that the second element, etc. What if the
lists have unequal length? It simply stops when the shorter one
stops. A reasonable definition for zip
is:
zip [] _ = [] zip _ [] = [] zip (x:xs) (y:ys) = (x,y) : zip xs ys
The unzip
function does the opposite. It takes a zipped list and
returns the two "original" lists:
Example:
Prelude> unzip [('f',1),('o',2),('o',3)] ("foo",[1,2,3])
There are a whole slew of zip
and unzip
functions, named
zip3
, unzip3
, zip4
, unzip4
and so on; the
...3
functions use triples instead of pairs; the ...4
functions use 4-tuples, etc.
Finally, the function take
takes an integer and a list and
returns the first elements off the list. Correspondingly,
drop
takes an integer and a list and returns the result of
throwing away the first elements off the list. Neither of these
functions produces an error; if is too large, they both will just
return shorter lists.
List Comprehensions
editThere is some syntactic sugar for dealing with lists whose elements
are members of the Enum
class (see
the section on Instances), such as Int
or
Char
. If we want to create a list of all the elements from
to , we can simply write:
Example:
Prelude> [1..10] [1,2,3,4,5,6,7,8,9,10]
We can also introduce an amount to step by:
Example:
Prelude> [1,3..10] [1,3,5,7,9] Prelude> [1,4..10] [1,4,7,10]
These expressions are short hand for enumFromTo
and
enumFromThenTo
, respectively. Of course, you don't need to
specify an upper bound. Try the following (but be ready to hit
Control+C to stop the computation!):
Example:
Prelude> [1..] [1,2,3,4,5,6,7,8,9,10,11,12{Interrupted!}
Probably yours printed a few thousand more elements than this. As we said before, Haskell is lazy. That means that a list of all numbers from 1 on is perfectly well formed and that's exactly what this list is. Of course, if you attempt to print the list (which we're implicitly doing by typing it in the interpreter), it won't halt. But if we only evaluate an initial segment of this list, we're fine:
Example:
Prelude> take 3 [1..] [1,2,3] Prelude> take 3 (drop 5 [1..]) [6,7,8]
This comes in useful if, say, we want to assign an ID to each element in a list. Without laziness we'd have to write something like this:
assignID :: [a] -> [(a,Int)] assignID l = zip l [1..length l]
Which means that the list will be traversed twice. However, because of laziness, we can simply write:
assignID l = zip l [1..]
And we'll get exactly what we want. We can see that this works:
Example:
Prelude> assignID "hello" [('h',1),('e',2),('l',3),('l',4),('o',5)]
Finally, there is some useful syntactic sugar for map
and
filter
, based on standard set-notation in mathematics. In math,
we would write something like to
mean the set of all values of when applied to elements of
which satisfy . This is equivalent to the Haskell statement
map f (filter p s)
. However, we can also use more math-like
notation and write [f x | x <- s, p x]
. While in math the
ordering of the statements on the side after the pipe is free, it is
not so in Haskell. We could not have put p x
before x <- s
otherwise the compiler wouldn't know yet what x
was. We can use
this to do simple string processing. Suppose we want to take a
string, keep only the uppercase letters and convert those to lowercase.
We could do this in either of the following
two equivalent ways:
Example:
Prelude> map toLower (filter isUpper "Hello World") "hw" Prelude> [toLower x | x <- "Hello World", isUpper x] "hw"
These two are equivalent, and, depending on the exact functions you're using, one might be more readable than the other. There's more you can do here, though. Suppose you want to create a list of pairs, one for each point between (0,0) and (5,7) below the diagonal. Doing this manually with lists and maps would be cumbersome and possibly difficult to read. It couldn't be easier than with list comprehensions:
Example:
Prelude> [(x,y) | x <- [1..5], y <- [x..7]] [(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(2,2),(2,3), (2,4),(2,5),(2,6),(2,7),(3,3),(3,4),(3,5),(3,6),(3,7), (4,4),(4,5),(4,6),(4,7),(5,5),(5,6),(5,7)]
If you reverse the order of the x <-
and y <-
clauses, the
order in which the space is traversed will be reversed (of course, in
that case, y
could no longer depend on x
and you would need
to make x
depend on y
but this is trivial).
Arrays
editLists are nice for many things. It is easy to add elements to the beginning of them and to manipulate them in various ways that change the length of the list. However, they are bad for random access, having average complexity to access an arbitrary element (if you don't know what means, you can either ignore it or take a quick detour and read the appendix chapter Complexity, a two-page introduction to complexity theory). So, if you're willing to give up fast insertion and deletion because you need random access, you should use arrays instead of lists.
In order to use arrays you must import the Array
module. There
are a few methods for creating arrays, the array
function, the
listArray
function, and the accumArray
function. The
array
function takes a pair which is the bounds of the array, and
an association list which specifies the initial values of the array.
The listArray
function takes bounds and then simply a list of
values. Finally, the accumArray
function takes an accumulation
function, an initial value and an association list and accumulates
pairs from the list into the array. Here are some examples of arrays
being created:
Example:
Prelude> :m Array Prelude Array> array (1,5) [(i,2*i) | i <- [1..5]] array (1,5) [(1,2),(2,4),(3,6),(4,8),(5,10)] Prelude Array> listArray (1,5) [3,7,5,1,10] array (1,5) [(1,3),(2,7),(3,5),(4,1),(5,10)] Prelude Array> accumArray (+) 2 (1,5) [(i,i) | i <- [1..5]] array (1,5) [(1,3),(2,4),(3,5),(4,6),(5,7)]
When arrays are printed out (via the show function), they are printed with an association list. For instance, in the first example, the association list says that the value of the array at is , the value of the array at is , and so on.
You can extract an element of an array using the !
function,
which takes an array and an index, as in:
Example:
Prelude Array> (listArray (1,5) [3,7,5,1,10]) ! 3 5
Moreover, you can update elements in the array using the //
function. This takes an array and an association list and updates the
positions specified in the list:
Example:
Prelude Array> (listArray (1,5) [3,7,5,1,10]) // [(2,99),(3,-99)] array (1,5) [(1,3),(2,99),(3,-99),(4,1),(5,10)]
There are a few other functions which are of interest:
bounds |
returns the bounds of an array |
indices |
returns a list of all indices of the array |
elems |
returns a list of all the values in the array in order |
assocs |
returns an association list for the array |
If we define arr
to be listArray (1,5) [3,7,5,1,10]
, the
result of these functions applied to arr
are:
Example:
Prelude Array> bounds arr (1,5) Prelude Array> indices arr [1,2,3,4,5] Prelude Array> elems arr [3,7,5,1,10] Prelude Array> assocs arr [(1,3),(2,7),(3,5),(4,1),(5,10)]
Note that while arrays are access, they are not
update. They are in fact update, since in order to maintain
purity, the array must be copied in order to make an update.
Thus, functional arrays are pretty much only useful when you're
filling them up once and then only reading. If you need fast access
and update, you should probably use FiniteMap
s, which are
discussed in the section on Finitemaps and have
access and update.
Maps
editThis section is work in progress, please help improving it! |
The Map
datatype from the Data.Map
module is a purely functional implementation of balanced trees. Maps
can be compared to lists and arrays in terms of the time it takes to
perform various operations on those datatypes of a fixed size, . A
brief comparison is:
List | Array | Map | |
---|---|---|---|
insert | |||
update | |||
delete | |||
find | |||
map |
As we can see, lists provide fast insertion (but slow everything else), arrays provide fast lookup (but slow everything else) and maps provide moderately fast everything.
The type of a map is for the form Map k a
where
k
is the type of the keys and a
is the type of the
elements. That is, maps are lookup tables from type k
to type a
.
The basic map functions are:
empty :: Map k a insert :: k -> a -> Map k a -> Map k a delete :: k -> Map k a -> Map k a member :: k -> Map k a -> Bool lookup :: k -> Map k a -> a
In all these cases, the type k
must be an instance of Ord
(and hence also an instance of Eq
).
There are also function fromList
and toList
to convert
lists to and from maps. Try the following:
Example:
Prelude> :m Data.Map Prelude Data.Map> let mymap = fromList [('a',5),('b',10),('c',1),('d',2)] Prelude Data.Map> let othermap = insert 'e' 6 mymap Prelude Data.Map> toList mymap [('a',5),('b',10),('c',1),('d',2)] Prelude Data.Map> toList othermap [('a',5),('b',10),('c',1),('d',2),('e',6)] Prelude Data.Map> Data.Map.lookup 'e' othermap 6 Prelude Data.Map> Data.Map.lookup 'e' mymap *** Exception: user error (Data.Map.lookup: Key not found)
Layout
editThe Final Word on Lists
editYou are likely tired of hearing about lists at this point, but they are so fundamental to Haskell (and really all of functional programming) that it would be terrible not to talk about them some more.
It turns out that foldr
is actually quite a powerful function: it
can compute a primitive recursive function. A primitive
recursive function is essentially one which can be calculated using
only "for" loops, but not "while" loops.
In fact, we can fairly easily define map
in terms of foldr
:
map2 f = foldr (\a b -> f a : b) []
Here, b
is the accumulator (i.e., the result list) and a
is
the element being currently considered. In fact, we can simplify this
definition through a sequence of steps:
foldr (\a b -> f a : b) [] ==> foldr (\a b -> (:) (f a) b) [] ==> foldr (\a -> (:) (f a)) [] ==> foldr (\a -> ((:) . f) a) [] ==> foldr ((:) . f) []
This is directly related to the fact that foldr (:) []
is the
identity function on lists. This is because, as mentioned before,
foldr f z
can be thought of as replacing the []
in lists by
z
and the :
by f
. In this case, we're keeping both the
same, so it is the identity function.
In fact, you can convert any function of the following style into a
foldr
:
myfunc [] = z myfunc (x:xs) = f x (myfunc xs)
By writing the last line with f
in infix form, this should be
obvious:
myfunc [] = z myfunc (x:xs) = x `f` (myfunc xs)
Clearly, we are just replacing []
with z
and :
with
f
. Consider the filter
function:
filter p [] = [] filter p (x:xs) = if p x then x : filter p xs else filter p xs
This function also follows the form above. Based on the first line,
we can figure out that z
is supposed to be []
, just like in
the map
case. Now, suppose that we call the result of calling
filter p xs
simply b
, then we can rewrite this as:
filter p [] = [] filter p (x:xs) = if p x then x : b else b
Given this, we can transform filter
into a fold:
filter p = foldr (\a b -> if p a then a:b else b) []
Let's consider a slightly more complicated function: ++
. The
definition for ++
is:
(++) [] ys = ys (++) (x:xs) ys = x : (xs ++ ys)
Now, the question is whether we can write this in fold notation. First, we can apply eta reduction to the first line to give:
(++) [] = id
Through a sequence of steps, we can also eta-reduce the second line:
(++) (x:xs) ys = x : ((++) xs ys) ==> (++) (x:xs) ys = (x:) ((++) xs ys) ==> (++) (x:xs) ys = ((x:) . (++) xs) ys ==> (++) (x:xs) = (x:) . (++) xs
Thus, we get that an eta-reduced definition of ++
is:
(++) [] = id (++) (x:xs) = (x:) . (++) xs
Now, we can try to put this into fold notation. First, we notice that
the base case converts []
into id
. Now, if we assume
(++) xs
is called b
and x
is called a
, we can get
the following definition in terms of foldr
:
(++) = foldr (\a b -> (a:) . b) id
This actually makes sense intuitively. If we only think about
applying ++
to one argument, we can think of it as a function
which takes a list and creates a function which, when applied, will
prepend this list to another list. In the lambda function, we assume
we have a function b
which will do this for the rest of the list
and we need to create a function which will do this for b
as well
as the single element a
. In order to do this, we first apply
b
and then further add a
to the front.
We can further reduce this expression to a point-free style through the following sequence:
==> (++) = foldr (\a b -> (a:) . b) id ==> (++) = foldr (\a b -> (.) (a:) b) id ==> (++) = foldr (\a -> (.) (a:)) id ==> (++) = foldr (\a -> (.) ((:) a)) id ==> (++) = foldr (\a -> ((.) . (:)) a) id ==> (++) = foldr ((.) . (:)) id
This final version is point free, though not necessarily understandable. Presumbably the original version is clearer.
As a final example, consider concat
. We can write this as:
concat [] = [] concat (x:xs) = x ++ concat xs
It should be immediately clear that the z
element for the fold is
[]
and that the recursive function is ++
, yielding:
concat = foldr (++) []
Exercises |
---|
|