Haskell/More on datatypes
Enumerations
editOne special case of the data
declaration is the enumeration — a data type where none of the constructor functions have any arguments:
data Month = January | February | March | April | May | June | July
| August | September | October | November | December
You can mix constructors that do and do not have arguments, but then the result is not called an enumeration. The following example is not an enumeration because the last constructor takes three arguments:
data Colour = Black | Red | Green | Blue | Cyan
| Yellow | Magenta | White | RGB Int Int Int
As you will see further on when we discuss classes and derivation, there are practical reasons to distinguish between what is and isn't an enumeration.
Incidentally, the Bool
datatype is an enumeration:
data Bool = False | True
deriving (Bounded, Enum, Eq, Ord, Read, Show)
Named Fields (Record Syntax)
editConsider a datatype whose purpose is to hold configuration settings. Usually, when you extract members from this type, you really only care about one or two of the many settings. Moreover, if many of the settings have the same type, you might often find yourself wondering "wait, was this the fourth or fifth element?" One way to clarify is to write accessor functions. Consider the following made-up configuration type for a terminal program:
data Configuration = Configuration
String -- User name
String -- Local host
String -- Remote host
Bool -- Is guest?
Bool -- Is superuser?
String -- Current directory
String -- Home directory
Integer -- Time connected
deriving (Eq, Show)
You could then write accessor functions, such as:
getUserName (Configuration un _ _ _ _ _ _ _) = un
getLocalHost (Configuration _ lh _ _ _ _ _ _) = lh
getRemoteHost (Configuration _ _ rh _ _ _ _ _) = rh
getIsGuest (Configuration _ _ _ ig _ _ _ _) = ig
-- And so on...
You could also write update functions to update a single element. Of course, if you add or remove an element in the configuration later, all of these functions now have to take a different number of arguments. This is quite annoying and is an easy place for bugs to slip in. Thankfully, there's a solution: we simply give names to the fields in the datatype declaration, as follows:
data Configuration = Configuration
{ username :: String
, localHost :: String
, remoteHost :: String
, isGuest :: Bool
, isSuperuser :: Bool
, currentDir :: String
, homeDir :: String
, timeConnected :: Integer
}
This will automatically generate the following accessor functions for us:
username :: Configuration -> String
localHost :: Configuration -> String
-- etc.
This also gives us a convenient update method. Here is a short
example for a "post working directory" and "change directory"
functions that work on Configuration
s:
changeDir :: Configuration -> String -> Configuration
changeDir cfg newDir =
if directoryExists newDir -- make sure the directory exists
then cfg { currentDir = newDir }
else error "Directory does not exist"
postWorkingDir :: Configuration -> String
postWorkingDir cfg = currentDir cfg
So, in general, to update the field x
in a value y
to
z
, you write y { x = z }
. You can change more than one; each
should be separated by commas, for instance, y {x = z, a = b, c = d }
.
Note
Those of you familiar with object-oriented languages might be tempted, after all of this talk about "accessor functions" and "update methods", to think of the y{x=z}
construct as a setter method, which modifies the value of x in a pre-existing y. It is not like that – remember that in Haskell variables are immutable. Therefore, using the example above, if you do something like
conf2 = changeDir conf1 "/opt/foo/bar"
conf2
will be defined as a Configuration
which is just like conf1
except for having "/opt/foo/bar"
as its currentDir
, but conf1
will remain unchanged.
It's only sugar
editYou can, of course, continue to pattern match against
Configuration
s as you did before. The named fields are simply
syntactic sugar; you can still write something like:
getUserName (Configuration un _ _ _ _ _ _ _) = un
But there is no need to do this.
Finally, you can pattern match against named fields as in:
getHostData (Configuration { localHost = lh, remoteHost = rh }) = (lh, rh)
This matches the variable lh
against the localHost
field in
the Configuration
and the variable rh
against the
remoteHost
field. These matches will succeed, of course.
You could also constrain the matches by putting values instead of
variable names in these positions, as you would for standard datatypes.
If you are using GHC, then, with the language extension NamedFieldPuns
, it is also possible to use this form:
getHostData (Configuration { localHost, remoteHost }) = (localHost, remoteHost)
It can be mixed with the normal form like this:
getHostData (Configuration { localHost, remoteHost = rh }) = (localHost, rh)
(To use this language extension, enter :set -XNamedFieldPuns
in the interpreter, or use the {-# LANGUAGE NamedFieldPuns #-}
pragma at the beginning of a source file, or pass the -XNamedFieldPuns
command-line flag to the compiler.)
You can create values of Configuration
in the old way as shown in
the first definition below, or in the named field's type, as shown in
the second definition:
initCFG = Configuration "nobody" "nowhere" "nowhere" False False "/" "/" 0
initCFG' = Configuration
{ username = "nobody"
, localHost = "nowhere"
, remoteHost = "nowhere"
, isGuest = False
, isSuperuser = False
, currentDir = "/"
, homeDir = "/"
, timeConnected = 0
}
The first way is much shorter, but the second is much clearer.
WARNING: The second style will allow you to write code that omits fields but will still compile, such as:
cfgFoo = Configuration { username = "Foo" }
cfgBar = Configuration { localHost = "Bar", remoteHost = "Baz" }
cfgUndef = Configuration {}
Trying to evaluate the unspecified fields will then result in a runtime error!
Parameterized Types
editParameterized types are similar to "generic" or "template" types in other languages. A parameterized type takes one or more type parameters. For example, the Standard Prelude type Maybe
is defined as follows:
data Maybe a = Nothing | Just a
This says that the type Maybe
takes a type parameter a
. You can use this to declare, for example:
lookupBirthday :: [Anniversary] -> String -> Maybe Anniversary
The lookupBirthday
function takes a list of birthday records and a string and returns a Maybe Anniversary
. The usual interpretation of such a type is that if the name given through the string is found in the list of anniversaries the result will be Just
the corresponding record; otherwise, it will be Nothing
. Maybe
is the simplest and most common way of indicating failure in Haskell. It is also sometimes seen in the types of function arguments, as a way to make them optional (the intent being that passing Nothing
amounts to omitting the argument).
You can parameterize type
and newtype
declarations in exactly the same way. Furthermore you can combine parameterized types in arbitrary ways to construct new types.
More than one type parameter
editWe can also have more than one type parameter. An example of this is the Either
type:
data Either a b = Left a | Right b
For example:
pairOff :: Int -> Either String Int
pairOff people
| people < 0 = Left "Can't pair off negative number of people."
| people > 30 = Left "Too many people for this activity."
| even people = Right (people `div` 2)
| otherwise = Left "Can't pair off an odd number of people."
groupPeople :: Int -> String
groupPeople people =
case pairOff people of
Right groups -> "We have " ++ show groups ++ " group(s)."
Left problem -> "Problem! " ++ problem
In this example pairOff
indicates how many groups you would have if you paired off a certain number of people for your activity. It can also let you know if you have too many people for your activity or if somebody will be left out. So pairOff
will return either an Int representing the number of groups you will have, or a String describing the reason why you can't create your groups.
Kind Errors
editThe flexibility of Haskell parameterized types can lead to errors in type declarations that are somewhat like type errors, except that they occur in the type declarations rather than in the program proper. Errors in these "types of types" are known as "kind" errors. You don't program with kinds: the compiler infers them for itself. But if you get parameterized types wrong then the compiler will report a kind error.