Haskell/Pattern matching

In the previous modules of this book we introduced and then made occasional reference to pattern matching, pointing out common uses. Now that we have developed some familiarity with the language, it is time to take a proper, deeper look at pattern matching. We will kick-start the discussion with a condensed description, which we will expanded upon throughout the chapter:

In pattern matching, we attempt to match values against patterns and, if so desired, bind variables to successful matches.

Note

Pattern matching on what?

Some languages like Perl and Python use the term pattern matching for matching regular expressions against strings. The pattern matching we are referring to in this chapter is something completely different. In fact, you're probably best off forgetting what you know about pattern matching for now.[1] Here, pattern matching is used in the same way as in other ML-like languages: to deconstruct values according to their type specification.


Analysing pattern matchingEdit

Pattern matching is virtually everywhere. To pick but one example, consider a definition like that of map:

map _ []     = []
map f (x:xs) = f x : map f xs

Here, at surface level, there are four different patterns involved, two per equation. Let's explore each one in turn, beginning with the second equation:

  • f is a pattern which matches anything at all, and binds the f variable to whatever is matched.
  • (x:xs) is a pattern that matches a non-empty list which is formed by something (which gets bound to the x variable) which was cons'd (by the (:) function) onto something else (which gets bound to xs).
  • [] is a pattern that matches the empty list. It doesn't bind any variables.
  • _ is the pattern which matches anything at all, but doesn't do any binding.

In the (x:xs) pattern, x and xs can be seen as sub-patterns used to match the parts of the list. Just like f, they match anything - though it is evident that if there is a successful match and x has type a, xs will have type [a]. Finally, these considerations imply that xs will also match an empty list, and so a one-element list matches (x:xs).

From the above dissection, we can say pattern matching gives us a way to:

  • recognize values. For instance, when map is called and the second argument matches [] the first equation for map is used instead of the second one.
  • bind variables to the recognized values. In this case, the variables f, x, and xs are assigned to the values passed as arguments to map when the second equation is used, and so we can use these values through the variables in the right-hand side of =. As _ and [] show, binding is not an essential part of pattern matching, but just a side effect of using variable names as patterns.
  • break down values into parts, as the (x:xs) pattern does by binding two variables to parts (head and tail) of a matched argument (the non-empty list).

The connection with constructorsEdit

Regardless of the detailed analysis above, the process of using (:) to break down a list, as if we were undoing the effects of the (:) operator, may look a little too magical. It will not, however, work with any arbitrary operator. For example, one might think, by analogy to how (:) is used to break down a list, of defining a function which uses (++) to chop off the first three elements of a list:

dropThree ([x,y,z] ++ xs) = xs

However, that will not work. The function (++) is not allowed in patterns, and in fact most other functions that act on lists are not allowed as well. Which functions, then, are allowed?

In one word, constructors – the functions used to build values of algebraic data types. Let us consider a random example:

data Foo = Bar | Baz Int

Here Bar and Baz are constructors for the type Foo. And so you can use them for pattern matching Foo values, and bind variables to the Int value contained in a Foo constructed with Baz:

f :: Foo -> Int
f Bar     = 1
f (Baz x) = x - 1

That was exactly what was going on back when we defined showAnniversary and showDate in the Type declarations module. For instance:

data Date = Date Int Int Int   -- Year, Month, Day
showDate :: Date -> String
showDate (Date y m d) = show y ++ "-" ++ show m ++ "-" ++ show d

The (Date y m d) pattern in the left-hand side of the showDate definition matches a Date (built with the Date constructor) and binds the variables y, m and d to the contents of the Date value.

Why does it work with lists?Edit

As for lists, they are not different from data-defined algebraic data types as far as pattern matching is concerned. It works as if lists were defined with this data declaration (note that the following isn't actually valid syntax: lists are in reality deeply ingrained into Haskell):

data [a] = [] | a : [a]

So the empty list, [] and the (:) function are in reality constructors of the list datatype, and so you can pattern match with them. [] takes no arguments, and therefore no variables can be bound when it is used for pattern matching. (:) takes two arguments, the list head and tail, which may then have variables bound to them when the pattern is recognized.

Prelude> :t []
[] :: [a]
Prelude> :t (:)
(:) :: a -> [a] -> [a]

Furthermore, since [x, y, z] is just syntactic sugar for x:y:z:[], we can solve the dropThree problem using pattern matching alone:

dropThree :: [a] -> [a]
dropThree (_:_:_:xs) = xs
dropThree _          = []

The first pattern will match any list with at least three elements. The catch-all second definition provides a reasonable default[2] when lists fail to match the main pattern, and thus prevents runtime crashes due to pattern match failure.

Note


From the fact that we could write a dropThree function with bare pattern matching it doesn't follow that we should do so! Even though the solution is simple, it is still a waste of effort to code something this specific when we could just use Prelude and settle it with drop 3 xs instead. Mirroring what was said before about baking bare recursive functions, we might say: don't get too excited about pattern matching either...


Tuple constructorsEdit

Analogous considerations are valid for tuples. Our access to their components via pattern matching...

fstPlusSnd :: (Num a) => (a, a) -> a
fstPlusSnd (x, y) = x + y
 
norm3D :: (Floating a) => (a, a, a) -> a
norm3D (x, y, z) = sqrt (x^2 + y^2 + z^2)

... is granted by the existence of tuple constructors. For pairs, the constructor is the comma operator, (,); for larger tuples there are (,,); (,,,) and so on. These operators are slightly unusual in that we can't use them infix in the regular way; so 5 , 3 is not a valid way to write (5, 3). All of them, however, can be used prefix, which is occasionally useful.

Prelude> (,) 5 3
(5,3)
Prelude> (,,,) "George" "John" "Paul" "Ringo"
("George","John","Paul","Ringo")

Matching literal valuesEdit

As discussed earlier in the book, a simple piece-wise function definition like this one

f :: Int -> Int
f 0 = 1
f 1 = 5
f 2 = 2
f _ = -1

is performing pattern matching as well, matching the argument of f with the Int literals 0, 1 and 2, and finally with _ . In general, numeric and character literals can be used in pattern matching[3]. They can also be used together with constructor patterns. For instance, this function

g :: [Int] -> Bool
g (0:[]) = False
g (0:xs) = True
g _ = False

will evaluate to False for the [0] list, to True if the list has 0 as first element and a non-empty tail and to False in all other cases. Also, lists with literal elements like [1,2,3], or even "abc" (which is equivalent to ['a','b','c']) can be used for pattern matching as well, since these forms are only syntactic sugar for the (:) constructor.

It costs nothing to emphasize that the above considerations are only valid for literal values, so the following will not work:

k = 1
--again, this won't work as expected
h :: Int -> Bool
h k = True
h _ = False
Exercises
  1. Test the flawed h function above in GHCi, with arguments equal to and different from 1. Then, explain what goes wrong.
  2. In this section about pattern matching with literal values we made no mention of the boolean values True and False, even though we can do pattern matching with them, as demonstrated in the Next steps chapter. Can you guess why we omitted them? (Hint: is there anything distinctive about the way we write boolean values?)


Syntax tricksEdit

As-patternsEdit

Sometimes, when matching a pattern with a value, it may be useful to bind a name also to the whole value being matched. As-patterns allow exactly this: they are of the form var@pattern and have the additional effect to bind the name var to the whole value being matched by pattern. For instance, here is a toy variation on the map theme:

contrivedMap :: ([a] -> a -> b) -> [a] -> [b]
contrivedMap f [] = []
contrivedMap f list@(x:xs) = f list x : contrivedMap f xs

contrivedMap passes to the parameter function f not only x but also the undivided list used as argument of each recursive call. Writing it without as-patterns would have been more cumbersome because we would have to needlessly reconstruct the original value of list, i.e. actually evaluate x:xs on the right side:

contrivedMap :: ([a] -> a -> b) -> [a] -> [b]
contrivedMap f [] = []
contrivedMap f (x:xs) = f (x:xs) x : contrivedMap f xs

Introduction to recordsEdit

For constructors with many elements, records provide useful syntactical help. Briefly, they are a way of naming values in a datatype, using the following syntax:

data Foo2 = Bar2 | Baz2 {bazNumber::Int, bazName::String}

Using records allows doing matching and binding only for the variables relevant to the function we're writing, making code much clearer:

h :: Foo2 -> Int
h Baz2 {bazName=name} = length name
h Bar2 {} = 0

Also, the {} pattern can be used for matching a constructor regardless of the datatype elements even if you don't use records in the data declaration:

data Foo = Bar | Baz Int
g :: Foo -> Bool
g Bar {} = True
g Baz {} = False

The function g does not have to be changed if we modify the number or the type of elements of the constructors Bar or Baz.

There are even more potential advantages in using record syntax. We will cover records in more detail further down the road; in case you want to start using them right now it may be useful to have a look at the Named fields section of the More on datatypes chapter.

Where we can use pattern matchingEdit

The short answer is that wherever you can bind variables, you can pattern match. Let us have a glance at such places we have seen before; a few more will be introduced in the following chapters.

EquationsEdit

The most obvious use case is the left-hand side of function definition equations, which were the subject of our examples so far.

map _ []     = []
map f (x:xs) = f x : map f xs

In the map definition we're doing pattern matching on the left hand side of both equations, and also binding variables on the second one.

let expressions and where clausesEdit

Both let and where are ways of doing local variable bindings. As such, you can also use pattern matching in them. A simple example:

y =
  let 
    (x:_) = map ((*) 2) [1,2,3]
  in x + 5

Or, equivalently,

y = x + 5
  where 
  (x:_) = map ((*) 2) [1,2,3]

Here, x will be bound to the first element of map ((*) 2) [1,2,3]. y, therefore, will evaluate to 2 + 5 = 7.

List comprehensionsEdit

After the | in list comprehensions you can pattern match. This is actually extremely useful, and adds a lot to the expressiveness of comprehensions. Let's see how that works with a slightly more sophisticated example. Prelude provides a Maybe type which has the following constructors:

data Maybe a = Nothing | Just a

It is typically used to hold values resulting from an operation which may or may not succeed; if the operation succeeds, the Just constructor is used and the value is passed to it; otherwise Nothing is used[4]. The utility function catMaybes (which is available from Data.Maybe library module) takes a list of Maybes (which may contain both "Just" and "Nothing" Maybes), and retrieves the contained values by filtering out the Nothing values and getting rid of the Just wrappers of the Just x. Writing it with list comprehensions is very straightforward:

catMaybes :: [Maybe a] -> [a]
catMaybes ms = [ x | Just x <- ms ]

Another nice thing about using a list comprehension for this task is that if the pattern match fails (that is, it meets a Nothing) it just moves on to the next element in ms, thus avoiding the need of explicitly handling constructors we are not interested in with alternate function definitions[5].

do blocksEdit

Within a do block like the ones we used in the Simple input and output chapter, we can pattern match with the left-hand side of the left arrow variable bindings:

putFirstChar = do
    (c:_) <- getLine
    putStrLn [c]

Furthermore, the let bindings in do blocks are, as far as pattern matching is concerned, just the same as the "real" let expressions.


NotesEdit

  1. If you came here looking for regex pattern matching, you might be interested in looking at the Haskell Text.Regex library wrapper.
  2. Reasonable for this particular task, and just because it makes sense to expect that dropThree will give [] when applied to a list of, say, two elements. With a different problem, perhaps it would not be reasonable to return any list if the first match failed. Later in the book we will consider one simple way of dealing with such cases.
  3. As perhaps could be expected, this kind of matching with literals is not constructor-based. Rather, there is an equality comparison behind the scenes.
  4. The canonical example of such an operation is looking up values in a dictionary - which might just be a [(a, b)] list with the tuples being key-value pairs, or a more sophisticated implementation. In any case, if we, given an arbitrary key, try to retrieve a value there is no guarantee we will actually find a value associated to the key.
  5. Note for the curious: The reason why it works this way instead of crashing out on a pattern matching failure has to do with the real nature of list comprehensions - namely, being nice wrappers for the list monad. Please don't lose sleep over this aside; we will explain what that means eventually.


Last modified on 30 March 2014, at 08:24