# Yet Another Haskell Tutorial/Type advanced

As you've probably ascertained by this point, the type system is integral to Haskell. While this chapter is called "Advanced Types", you will probably find it to be more general than that and it must not be skipped simply because you're not interested in the type system.

## Type Synonyms

editType synonyms exist in Haskell simply for convenience: their removal would not make Haskell any less powerful.

Consider the case when you are constantly dealing with lists of
three-dimensional points. For instance, you might have a function
with type ```
[(Double,Double,Double)] -> Double ->
[(Double,Double,Double)]
```

. Since you are a good software engineer,
you want to place type signatures on all your top-level functions.
However, typing `[(Double,Double,Double)]`

all the time gets very
tedious. To get around this, you can define a type synonym:

type List3D = [(Double,Double,Double)]

Now, the type signature for your functions may be written ```
List3D ->
Double -> List3D
```

.

We should note that type synonyms cannot be self-referential. That is, you cannot have:

type BadType = Int -> BadType

This is because this is an "infinite type." Since Haskell removes
type synonyms very early on, any instance of `BadType`

will be
replaced by `Int -> BadType`

, which will result in an infinite loop.

To make a recursive type, one would use `newtype`

newtype GoodType = MakeGoodType (Int -> GoodType)

Type synonyms can also be parameterized. For instance, you might want
to be able to change the types of the points in the list of 3D
points. For this, you could define:

type List3D a = [(a,a,a)]

Then your references to `[(Double,Double,Double)]`

would become
`List3D Double`

.

## Newtypes

editConsider the problem in which you need to have a type which is very
much like `Int`

, but its ordering is defined differently. Perhaps
you wish to order `Int`

s first by even numbers then by odd numbers
(that is, all odd numbers are greater than any even number and within
the odd/even subsets, ordering is standard).

Unfortunately, you cannot define a new instance of `Ord`

for
`Int`

because then Haskell won't know which one to use. What you
want is to define a type which is *isomorphic* to `Int`

.

*Note*

"Isomorphic" is a common term in mathematics which basically means "structurally identical." For instance, in graph theory, if you have two graphs which are identical except they have different labels on the nodes, they are isomorphic. In our context, two types are isomorphic if they have the same underlying structure.

One way to do this would be to define a new datatype:

data MyInt = MyInt Int

We could then write appropriate code for this datatype. The problem
(and this is very subtle) is that this type is not truly isomorphic to
`Int`

: it has one more value. When we think of the type `Int`

,
we usually think that it takes all values of integers, but it really
has one more value: (pronounced "bottom"), which is used
to represent erroneous or undefined computations. Thus, `MyInt`

has not only values `MyInt 0`

, `MyInt 1`

and so on, but also
`MyInt `

. However, since datatypes can themselves be
undefined, it has an additional value: which differs from
`MyInt `

and this makes the types non-isomorphic. (See
the section on Bottom for more information on bottom.)

Disregarding that subtlety, there may be efficiency issues with this
representation: now, instead of simply storing an integer, we have to
store a pointer to an integer and have to follow that pointer whenever
we need the value of a `MyInt`

.

To get around these problems, Haskell has a ` newtype`
construction. A

`is a cross between a datatype and a type synonym: it has a constructor like a datatype, but it can have only one constructor and this constructor can have only one argument. For instance, we can define:`

**newtype**newtype MyInt = MyInt Int

But we cannot define any of:

newtype Bad1 = Bad1a Int | Bad1b Double newtype Bad2 = Bad2 Int Double

Of course, the fact that we cannot define `Bad2`

as above is not a
big issue: we simply use `type` instead:

type Good2 = Good2 Int Double

or (almost equivalently) declare a newtype alias to the existing tuple type:

newtype Good2 = Good2 (Int,Double)

Now, suppose we've defined `MyInt`

as a ` newtype`:

instance Ord MyInt where compare (MyInt i) (MyInt j) | odd i && odd j = compare i j | even i && even j = compare i j | even i = LT | otherwise = GT

Like datatype, we can still derive classes like `Show`

and `Eq`

over newtypes (in fact, I'm implicitly assuming we have derived
`Eq`

over `MyInt`

-- where is my assumption in the above code?).

Moreover, in recent versions of GHC (see
the section on Ghc), on newtypes, you are allowed to derive
*any* class of which the base type (in this case, `Int`

) is an
instance. For example, we could derive `Num`

on `MyInt`

to
provide arithmetic functions over it.

Pattern matching over newtypes is exactly as in datatypes. We can
write constructor and destructor functions for `MyInt`

as follows:

mkMyInt i = MyInt i unMyInt (MyInt i) = i

## Datatypes

editWe've already seen datatypes used in a variety of contexts. This section concludes some of the discussion and introduces some of the common datatypes in Haskell. It also provides a more theoretical underpinning to what datatypes actually are.

### Strict Fields

editOne of the great things about Haskell is that computation is performed lazily. However, sometimes this leads to inefficiencies. One way around this problem is to use datatypes with strict fields. Before we talk about the solution, let's spend some time to get a bit more comfortable with how bottom works in to the picture (for more theory, see the section on Bottom).

Suppose we've defined the unit datatype (this one of the simplest datatypes you can define):

data Unit = Unit

This datatype has exactly one constructor, `Unit`

, which takes no
arguments. In a strict language like ML, there would be exactly one
value of type `Unit`

: namely, `Unit`

. This is not quite so in
Haskell. In fact, there are *two* values of type `Unit`

. One
of them is `Unit`

. The other is bottom (written ).

You can think of bottom as representing a computation which won't halt. For instance, suppose we define the value:

foo = foo

This is perfectly valid Haskell code and simply says that when you
want to evaluate `foo`

, all you need to do is evaluate `foo`

.
Clearly this is an "infinite loop."

What is the type of `foo`

? Simply `a`

. We cannot say anything
more about it than that. The fact that `foo`

has type `a`

in
fact tells us that it must be an infinite loop (or some other such
strange value). However, since `foo`

has type `a`

and thus can
have any type, it can also have type `Unit`

. We could write, for
instance:

foo :: Unit foo = foo

Thus, we have found a second value with type `Unit`

. In fact, we
have found all values of type `Unit`

. Any other non-terminating
function or error-producing function will have exactly the same effect
as `foo`

(though Haskell provides some more utility with the
function `error`

).

This means, for instance, that there are actually *four* values
with type `Maybe Unit`

. They are: , `Nothing`

,
`Just `

and `Just Unit`

. However, it could be the fact
that you, as a programmer, know that you will never come across the
third of these. Namely, you want the argument to `Just`

to be
*strict*. This means that if the argument to `Just`

is
bottom, then the entire structure becomes bottom. You use an
exclamation point to specify a constructor as strict. We can define a
strict version of `Maybe`

as:

data SMaybe a = SNothing | SJust !a

There are now only three values of `SMaybe`

. We can see the
difference by writing the following program:

module Main where import System data SMaybe a = SNothing | SJust !a deriving Show main = do [cmd] <- getArgs case cmd of "a" -> printJust undefined "b" -> printJust Nothing "c" -> printJust (Just undefined) "d" -> printJust (Just ()) "e" -> printSJust undefined "f" -> printSJust SNothing "g" -> printSJust (SJust undefined) "h" -> printSJust (SJust ()) printJust :: Maybe () -> IO () printJust Nothing = putStrLn "Nothing" printJust (Just x) = do putStr "Just "; print x printSJust :: SMaybe () -> IO () printSJust SNothing = putStrLn "Nothing" printSJust (SJust x) = do putStr "Just "; print x

Here, depending on what command line argument is passed, we will do something different. The outputs for the various options are:

**Example:**

% ./strict a Fail: Prelude.undefined % ./strict b Nothing % ./strict c Just Fail: Prelude.undefined % ./strict d Just () % ./strict e Fail: Prelude.undefined % ./strict f Nothing % ./strict g Fail: Prelude.undefined % ./strict h Just ()

The thing worth noting here is the difference between cases "c" and
"g". In the "c" case, the `Just`

is printed, because this is
printed *before* the undefined value is evaluated. However, in
the "g" case, since the constructor is strict, as soon as you match
the `SJust`

, you also match the value. In this case, the value is
undefined, so the whole thing fails before it gets a chance to do
*anything*.

## Classes

editWe have already encountered type classes a few times, but only in the context of previously existing type classes. This section is about how to define your own. We will begin the discussion by talking about Pong and then move on to a useful generalization of computations.

### Pong

editThe discussion here will be motivated by the
construction of the game Pong (see the appendix on Pong for
the full code). In Pong, there are three things drawn on the screen:
the two paddles and the ball. While the paddles and the ball are
different in a few respects, they share many commonalities, such as
position, velocity, acceleration, color, shape, and so on. We can
express these commonalities by defining a class for Pong entities,
which we call `Entity`

. We make such a definition as follows:

class Entity a where getPosition :: a -> (Int,Int) getVelocity :: a -> (Int,Int) getAcceleration :: a -> (Int,Int) getColor :: a -> Color getShape :: a -> Shape

This code defines a typeclass `Entity`

. This class has five
methods: `getPosition`

, `getVelocity`

, `getAcceleration`

,
`getColor`

and `getShape`

with the corresponding types.

The first line here uses the keyword ` class` to introduce a new
typeclass. We can read this typeclass definition as "There is a
typeclass 'Entity'; a type 'a' is an instance of Entity if it provides
the following five functions: ...". To see how we can write an
instance of this class, let us define a player (paddle) datatype:

data Paddle = Paddle { paddlePosX, paddlePosY, paddleVelX, paddleVelY, paddleAccX, paddleAccY :: Int, paddleColor :: Color, paddleHeight :: Int, playerNumber :: Int }

Given this data declaration, we can define `Paddle`

to be an
instance of `Entity`

:

instance Entity Paddle where getPosition p = (paddlePosX p, paddlePosY p) getVelocity p = (paddleVelX p, paddleVelY p) getAcceleration p = (paddleAccX p, paddleAccY p) getColor = paddleColor getShape = Rectangle 5 . paddleHeight

The actual Haskell types of the class functions all have included the
context `Entity a =>`

. For example, `getPosition`

has type
`Entity a => a -> (Int,Int)`

. However, it will turn out that many
of our routines will need entities to also be instances of `Eq`

.
We can therefore choose to make `Entity`

a subclass of `Eq`

:
namely, you can only be an instance of `Entity`

if you are already
an instance of `Eq`

. To do this, we change the first line of the
class declaration to:

class Eq a => Entity a where

Now, in order to define `Paddle`

s to be instances of `Entity`

we
will first need them to be instances of `Eq`

-- we can do this by
deriving the class.

### Computations

editLet's think back to our original motivation for defining the
`Maybe`

datatype from the section on Datatypes-maybe. We
wanted to be able to express that functions (i.e., computations) can
fail.

Let us consider the case of performing search on a graph. Allow us to take a small aside to set up a small graph library:

data Graph v e = Graph [(Int,v)] [(Int,Int,e)]

The `Graph`

datatype takes two type arguments which correspond to
vertex and edge labels. The first argument to the `Graph`

constructor is a list (set) of vertices; the second is the list (set)
of edges. We will assume these lists are always sorted and that each
vertex has a unique id and that there is at most one edge between any
two vertices.

Suppose we want to search for a path between two vertices. Perhaps
there is no path between those vertices. To represent this, we will
use the `Maybe`

datatype. If it succeeds, it will return the list
of vertices traversed. Our search function could be written (naively)
as follows:

search :: Graph v e -> Int -> Int -> Maybe [Int] search g@(Graph vl el) src dst | src == dst = Just [src] | otherwise = search' el where search' [] = Nothing search' ((u,v,_):es) | src == u = case search g v dst of Just p -> Just (u:p) Nothing -> search' es | otherwise = search' es

This algorithm works as follows (try to read along): to search in a
graph `g`

from `src`

to `dst`

, first we check to see if these
are equal. If they are, we have found our way and just return the
trivial solution. Otherwise, we want to traverse the edge-list. If
we're traversing the edge-list and it is empty, we've failed, so we
return `Nothing`

. Otherwise, we're looking at an edge from `u`

to `v`

. If `u`

is our source, then we consider this step and
recursively search the graph from `v`

to `dst`

. If this fails,
we try the rest of the edges; if this succeeds, we put our current
position before the path found and return. If `u`

is not our
source, this edge is useless and we continue traversing the edge-list.

This algorithm is terrible: namely, if the graph contains cycles, it can loop indefinitely. Nevertheless, it is sufficient for now. Be sure you understand it well: things only get more complicated.

Now, there are cases where the `Maybe`

datatype is not
sufficient: perhaps we wish to include an error message together with
the failure. We could define a datatype to express this as:

data Failable a = Success a | Fail String

Now, failures come with a failure string to express what went wrong. We can rewrite our search function to use this datatype:

search2 :: Graph v e -> Int -> Int -> Failable [Int] search2 g@(Graph vl el) src dst | src == dst = Success [src] | otherwise = search' el where search' [] = Fail "No path" search' ((u,v,_):es) | src == u = case search2 g v dst of Success p -> Success (u:p) _ -> search' es | otherwise = search' es

This code is a straightforward translation of the above.

There is another option for this computation: perhaps we want not just one path, but all possible paths. We can express this as a function which returns a list of lists of vertices. The basic idea is the same:

search3 :: Graph v e -> Int -> Int -> [[Int]] search3 g@(Graph vl el) src dst | src == dst = [[src]] | otherwise = search' el where search' [] = [] search' ((u,v,_):es) | src == u = map (u:) (search3 g v dst) ++ search' es | otherwise = search' es

The code here has gotten a little shorter, thanks to the standard
prelude `map`

function, though it is essentially the same.

We may ask ourselves what all of these have in common and try to gobble up those commonalities in a class. In essence, we need some way of representing success and some way of representing failure. Furthermore, we need a way to combine two successes (in the first two cases, the first success is chosen; in the third, they are strung together). Finally, we need to be able to augment a previous success (if there was one) with some new value. We can fit this all into a class as follows:

class Computation c where success :: a -> c a failure :: String -> c a augment :: c a -> (a -> c b) -> c b combine :: c a -> c a -> c a

In this class declaration, we're saying that `c`

is an instance of
the class `Computation`

if it provides four functions:
`success`

, `failure`

, `augment`

and `combine`

. The
`success`

function takes a value of type `a`

and returns it
wrapped up in `c`

, representing a successful computation. The
`failure`

function takes a `String`

and returns a computation
representing a failure. The `combine`

function takes two previous
computations and produces a new one which is the combination of both.
The `augment`

function is a bit more complex.

The `augment`

function takes some previously given computation
(namely, `c a`

) and a function which takes the value of that
computation (the `a`

) and returns a `b`

and produces a `b`

inside of that computation. Note that in our current situation,
giving `augment`

the type `c a -> (a -> a) -> c a`

would have
been sufficient, since `a`

is always `[Int]`

, but we make it
more general this time just for generality.

How `augment`

works is probably best shown by example. We can
define `Maybe`

, `Failable`

and `[]`

to be instances of
`Computation`

as:

instance Computation Maybe where success = Just failure = const Nothing augment (Just x) f = f x augment Nothing _ = Nothing combine Nothing y = y combine x _ = x

Here, success is represented with `Just`

and `failure`

ignores
its argument and returns `Nothing`

. The `combine`

function
takes the first success we found and ignores the rest. The function
`augment`

checks to see if we succeeded before (and thus had a
`Just`

something) and, if we did, applies `f`

to it. If we
failed before (and thus had a `Nothing`

), we ignore the function
and return `Nothing`

.

instance Computation Failable where success = Success failure = Fail augment (Success x) f = f x augment (Fail s) _ = Fail s combine (Fail _) y = y combine x _ = x

These definitions are obvious. Finally:

instance Computation [] where success a = [a] failure = const [] augment l f = concat (map f l) combine = (++)

Here, the value of a successful computation is a singleton list containing that value. Failure is represented with the empty list and to combine previous successes we simply catenate them. Finally, augmenting a computation amounts to mapping the function across the list of previous computations and concatenate them. we apply the function to each element in the list and then concatenate the results.

Using these computations, we can express all of the above versions of search as:

searchAll g@(Graph vl el) src dst | src == dst = success [src] | otherwise = search' el where search' [] = failure "no path" search' ((u,v,_):es) | src == u = (searchAll g v dst `augment` (success . (u:))) `combine` search' es | otherwise = search' es

In this, we see the uses of all the functions from the class
`Computation`

.

If you've understood this discussion of computations, you are in a
very good position as you have understood the concept of
*monads*, probably the most difficult concept in Haskell. In
fact, the `Computation`

class is almost exactly the `Monad`

class, except that `success`

is called `return`

, `failure`

is
called `fail`

and `augment`

is called `>>=`

(read "bind").
The `combine`

function isn't actually required by monads, but is
found in the `MonadPlus`

class for reasons which will become
obvious later.

If you didn't understand everything here, read through it again and then wait for the proper discussion of monads in the chapter Monads.

## Instances

editWe have already seen how to declare instances of some simple classes;
allow us to consider some more advanced classes here. There is a
`Functor`

class defined in the `Functor`

module.

*Note*

The name "functor", like "monad" comes from category theory. There, a functor is like a function, but instead of mapping elements to elements, it maps structures to structures.

The definition of the functor class is:

class Functor f where fmap :: (a -> b) -> f a -> f b

The type definition for `fmap`

(not to mention its name) is very
similar to the function `map`

over lists. In fact, `fmap`

is
essentially a generalization of `map`

to arbitrary structures (and,
of course, lists are already instances of `Functor`

). However, we
can also define other structures to be instances of functors.
Consider the following datatype for binary trees:

data BinTree a = Leaf a | Branch (BinTree a) (BinTree a)

We can immediately identify that the `BinTree`

type essentially
"raises" a type `a`

into trees of that type. There is a
naturally associated functor which goes along with this raising. We
can write the instance:

instance Functor BinTree where fmap f (Leaf a) = Leaf (f a) fmap f (Branch left right) = Branch (fmap f left) (fmap f right)

Now, we've seen how to make something like `BinTree`

an instance of
`Eq`

by using the ` deriving` keyword, but here we will do it by
hand. We want to make

`BinTree a`

s instances of `Eq`

but
obviously we cannot do this unless `a`

is itself an instance of
`Eq`

. We can specify this dependence in the instance declaration:
instance Eq a => Eq (BinTree a) where Leaf a == Leaf b = a == b Branch l r == Branch l' r' = l == l' && r == r' _ == _ = False

The first line of this can be read "if `a`

is an instance of
`Eq`

, then `BinTree a`

is also an instance of `Eq`

". We
then provide the definitions. If we did not include the ```
Eq a
=>
```

part, the compiler would complain because we're trying to use
the `==`

function on `a`

s in the second line.

The "`Eq a =>`

" part of the definition is called the "context."
We should note that there are some restrictions on what can appear in
the context and what can appear in the declaration. For instance,
we're not allowed to have instance declarations that don't contain
type constructors on the right hand side. To see why, consider the
following declarations:

class MyEq a where myeq :: a -> a -> Bool instance Eq a => MyEq a where myeq = (==)

As it stands, there doesn't seem to be anything wrong with this definition. However, if elsewhere in a program we had the definition:

instance MyEq a => Eq a where (==) = myeq

In this case, if we're trying to establish if some type is an instance
of `Eq`

, we could reduce it to trying to find out if that type is
an instance of `MyEq`

, which we could in turn reduce to trying to
find out if that type is an instance of `Eq`

, and so on. The
compiler protects itself against this by refusing the first instance
declaration.

This is commonly known as the *closed-world assumption*. That
is, we're assuming, when we write a definition like the first one,
that there won't be any declarations like the second. However, this
assumption is invalid because there's nothing to prevent the second
declaration (or some equally evil declaration). The closed world
assumption can also bite you in cases like:

class OnlyInts a where foo :: a -> a -> Bool instance OnlyInts Int where foo = (==) bar :: OnlyInts a => a -> Bool bar = foo 5

We've again made the closed-world assumption: we've assumed that the
only instance of `OnlyInts`

is `Int`

, but there's no reason
another instance couldn't be defined elsewhere, ruining our definition
of `bar`

.

## Kinds

editLet us take a moment and think about what types are available in
Haskell. We have simple types, like `Int`

, `Char`

, `Double`

and so on. We then have type constructors like `Maybe`

which take
a type (like `Char`

) and produce a new type, `Maybe Char`

.
Similarly, the type constructor `[]`

(lists) takes a type (like
`Int`

) and produces `[Int]`

. We have more complex things like
`->`

(function arrow) which takes *two* types (say `Int`

and `Bool`

) and produces a new type `Int -> Bool`

.

In a sense, these types themselves have type. Types like `Int`

have some sort of basic type. Types like `Maybe`

have a type which
takes something of basic type and returns something of basic type.
And so forth.

Talking about the types of types becomes unwieldy and highly
ambiguous, so we call the types of types "kinds." What we have been
calling "basic types" have kind "`*`

". Something of kind
`*`

is something which can have an actual value. There is also a
single kind constructor, `->`

with which we can build more complex
kinds.

Consider `Maybe`

. This takes something of kind `*`

and produces
something of kind `*`

. Thus, the kind of `Maybe`

is `* -> *`

. Recall the definition of `Pair`

from
the section on Datatypes-pairs:

data Pair a b = Pair a b

Here, `Pair`

is a type constructor which takes two arguments, each
of kind `*`

and produces a type of kind `*`

. Thus, the kind of
`Pair`

is `* -> (* -> *)`

. However, we again assume
associativity so we just write `* -> * -> *`

.

Let us make a slightly strange datatype definition:

data Strange c a b = MkStrange (c a) (c b)

Before we analyze the kind of `Strange`

, let's think about what it
does. It is essentially a pairing constructor, though it doesn't pair
actual elements, but elements within another constructor. For
instance, think of `c`

as `Maybe`

. Then `MkStrange`

pairs
`Maybe`

s of the two types `a`

and `b`

. However, `c`

need not be `Maybe`

but could instead by `[]`

, or many other
things.

What do we know about `c`

, though? We know that it must have kind
`* -> *`

. This is because we have `c a`

on the right hand
side. The type variables `a`

and `b`

each have kind `*`

as
before. Thus, the kind of `Strange`

is `(* -> *) -> * -> * -> *`

. That is, it takes a constructor (`c`

) of kind `* -> *`

together with two types of kind `*`

and produces something of kind
`*`

.

A question may arise regarding how we know `a`

has kind `*`

and
not some other kind `k`

. In fact, the inferred kind for
`Strange`

is `(k -> *) -> k -> k -> *`

. However, this requires polymorphism on the kind level, which is too complex, so we make a
default assumption that `k = *`

.

*Note*

There are extensions to GHC which allow you to specify the kind of constructors directly. For instance, if you wanted a different kind, you could write this explicitly:

data Strange (c :: (* -> *) -> *) a b = MkStrange (c a) (c b)

to give a different kind to `Strange`

.

The notation of kinds suggests that we can perform partial application, as we can for functions. And, in fact, we can. For instance, we could have:

type MaybePair = Strange Maybe

The kind of `MaybePair`

is, not surprisingly, `* -> * -> *`

.

We should note here that all of the following definitions are acceptable:

type MaybePair1 = Strange Maybe type MaybePair2 a = Strange Maybe a type MaybePair3 a b = Strange Maybe a b

These all appear to be the same, but they are in fact not identical as far as Haskell's type system is concerned. The following are all valid type definitions using the above:

type MaybePair1a = MaybePair1 type MaybePair1b = MaybePair1 Int type MaybePair1c = MaybePair1 Int Double type MaybePair2b = MaybePair2 Int type MaybePair2c = MaybePair2 Int Double type MaybePair3c = MaybePair3 Int Double

But the following are *not* valid:

type MaybePair2a = MaybePair2 type MaybePair3a = MaybePair3 type MaybePair3b = MaybePair3 Int

This is because while it is possible to partially apply type
constructors on datatypes, it is not possible on type synonyms. For
instance, the reason `MaybePair2a`

is invalid is because
`MaybePair2`

is defined as a type synonym with one argument and we
have given it none. The same applies for the invalid `MaybePair3`

definitions.

## Class Hierarchies

edit## Default

editwhat is it?