Haskell/GADT
Generalized algebraic datatypes, or simply GADTs, are a generalization of the algebraic data types that you are familiar with. Basically, they allow you to explicitly write down the types of the constructors. In this chapter, you'll learn why this is useful and how to declare your own.
We begin with an example of building a simple embedded domain specific language (EDSL) for simple arithmetical expressions, which is put on a sounder footing with GADTs. This is followed by a review of the syntax for GADTs, with simpler illustrations, and a different application to construct a safe list type for which the equivalent of head []
fails to typecheck and thus does not give the usual runtime error: *** Exception: Prelude.head: empty list
.
Understanding GADTs
editSo, what are GADTs and what are they useful for? GADTs are mainly used to implement domain specific languages, and so this section will introduce them with a corresponding example.
Arithmetic expressions
editLet's consider a small language for arithmetic expressions, given by the data type
data Expr = I Int -- integer constants
| Add Expr Expr -- add two expressions
| Mul Expr Expr -- multiply two expressions
In other words, this data type corresponds to the abstract syntax tree, an arithmetic term like (5+1)*7
would be represented as (I 5 `Add` I 1) `Mul` I 7 :: Expr
.
Given the abstract syntax tree, we would like to do something with it; we want to compile it, optimize it and so on. For starters, let's write an evaluation function that takes an expression and calculates the integer value it represents. The definition is straightforward:
eval :: Expr -> Int
eval (I n) = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
Extending the language
editNow, imagine that we would like to extend our language with other types than just integers. For instance, let's say we want to represent equality tests, so we need booleans as well. We augment the `Expr` type to become
data Expr = I Int
| B Bool -- boolean constants
| Add Expr Expr
| Mul Expr Expr
| Eq Expr Expr -- equality test
The term 5+1 == 7
would be represented as (I 5 `Add` I 1) `Eq` I 7
.
As before, we want to write a function eval
to evaluate expressions. But this time, expressions can now represent either integers or booleans and we have to capture that in the return type
eval :: Expr -> Either Int Bool
The first two cases are straightforward
eval (I n) = Left n
eval (B b) = Right b
but now we get in trouble. We would like to write
eval (Add e1 e2) = eval e1 + eval e2 -- ???
but this doesn't type check: the addition function +
expects two integer arguments, but eval e1
is of type Either Int Bool
and we'd have to extract the Int
from that.
Even worse, what happens if e1
actually represents a boolean? The following is a valid expression
B True `Add` I 5 :: Expr
but clearly, it doesn't make any sense; we can't add booleans to integers! In other words, evaluation may return integers or booleans, but it may also fail because the expression makes no sense. We have to incorporate that in the return type:
eval :: Expr -> Maybe (Either Int Bool)
Now, we could write this function just fine, but that would still be unsatisfactory, because what we really want to do is to have Haskell's type system rule out any invalid expressions; we don't want to check types ourselves while deconstructing the abstract syntax tree.
Exercise: Despite our goal, it may still be instructional to implement the eval
function; do this.
Starting point:
data Expr = I Int
| B Bool -- boolean constants
| Add Expr Expr
| Mul Expr Expr
| Eq Expr Expr -- equality test
eval :: Expr -> Maybe (Either Int Bool)
-- Your implementation here.
Phantom types
editThe so-called phantom types are the first step towards our goal. The technique is to augment the Expr
with a type variable, so that it becomes
data Expr a = I Int
| B Bool
| Add (Expr a) (Expr a)
| Mul (Expr a) (Expr a)
| Eq (Expr a) (Expr a)
Note that an expression Expr a
does not contain a value a
at all; that's why a
is called a phantom type, it's just a dummy variable. Compare that with, say, a list [a]
which does contain a bunch of a
s.
The key idea is that we're going to use a
to track the type of the expression for us. Instead of making the constructor
Add :: Expr a -> Expr a -> Expr a
available to users of our small language, we are only going to provide a smart constructor with a more restricted type
add :: Expr Int -> Expr Int -> Expr Int
add = Add
The implementation is the same, but the types are different. Doing this with the other constructors as well,
i :: Int -> Expr Int
i = I
b :: Bool -> Expr Bool
b = B
the previously problematic expression
b True `add` i 5
no longer type checks! After all, the first argument has the type Expr Bool
while add
expects an Expr Int
. In other words, the phantom type a
marks the intended type of the expression. By only exporting the smart constructors, the user cannot create expressions with incorrect types.
As before, we want to implement an evaluation function. With our new marker a
, we might hope to give it the type
eval :: Expr a -> a
and implement the first case like this
eval (I n) = n
But alas, this does not work: how would the compiler know that encountering the constructor I
means that a = Int
? Granted, this will be the case for all expressions that were created by users of our language because they are only allowed to use the smart constructors. But internally, an expression like
I 5 :: Expr String
is still valid. In fact, as you can see, a
doesn't even have to be Int
or Bool
, it could be anything.
What we need is a way to restrict the return types of the constructors themselves, and that's exactly what generalized data types do.
GADTs
editTo enable this language feature, add {-# LANGUAGE GADTs #-}
to the beginning of the file.
The obvious notation for restricting the type of a constructor is to write down its type, and that's exactly how GADTs are defined:
data Expr a where
I :: Int -> Expr Int
B :: Bool -> Expr Bool
Add :: Expr Int -> Expr Int -> Expr Int
Mul :: Expr Int -> Expr Int -> Expr Int
Eq :: Eq a => Expr a -> Expr a -> Expr Bool
In other words, we simply list the type signatures of all the constructors. In particular, the marker type a
is specialised to Int
or Bool
according to our needs, just like we would have done with smart constructors.
And the great thing about GADTs is that we now can implement an evaluation function that takes advantage of the type marker:
eval :: Expr a -> a
eval (I n) = n
eval (B b) = b
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Eq e1 e2) = eval e1 == eval e2
In particular, in the first case
eval (I n) = n
the compiler is now able to infer that a=Int
when we encounter a constructor I
and that it is legal to return the n :: Int
; similarly for the other cases.
To summarise, GADTs allows us to restrict the return types of constructors and thus enable us to take advantage of Haskell's type system for our domain specific languages. Thus, we can implement more languages and their implementation becomes simpler.
Summary
editSyntax
editHere's a quick summary of how the syntax for declaring GADTs works.
First, consider the following ordinary algebraic datatypes: the familiar List
and Maybe
types, and a simple tree type, RoseTree
:
Maybe | List | Rose Tree |
---|---|---|
data Maybe a =
Nothing |
Just a
|
data List a =
Nil |
Cons a (List a)
|
data RoseTree a =
RoseTree a [RoseTree a]
|
Remember that the constructors introduced by these declarations can be used both for pattern matches to deconstruct values and as functions to construct values. (Nothing
and Nil
are functions with "zero arguments".) We can ask what the types of the latter are:
Maybe | List | Rose Tree |
---|---|---|
> :t Nothing Nothing :: Maybe a > :t Just Just :: a -> Maybe a |
> :t Nil Nil :: List a > :t Cons Cons :: a -> List a -> List a |
> :t RoseTree RoseTree :: a -> [RoseTree a] -> RoseTree a |
It is clear that this type information about the constructors for Maybe
, List
and RoseTree
respectively is equivalent to the information we gave to the compiler when declaring the datatype in the first place. In other words, it's also conceivable to declare a datatype by simply listing the types of all of its constructors, and that's exactly what the GADT syntax does:
Maybe | List | Rose Tree |
---|---|---|
data Maybe a where
Nothing :: Maybe a
Just :: a -> Maybe a
|
data List a where
Nil :: List a
Cons :: a -> List a -> List a
|
data RoseTree a where
RoseTree ::
a -> [RoseTree a] -> RoseTree a
|
This syntax is made available by the language option {-#LANGUAGE GADTs #-}
. It should be familiar to you in that it closely resembles the syntax of type class declarations. It's also easy to remember if you already like to think of constructors as just being functions. Each constructor is just defined by a type signature.
New possibilities
editNote that when we asked the GHCi
for the types of Nothing
and Just
it returned Maybe a
and a -> Maybe a
as the types. In these and the other cases, the type of the final output of the function associated with a constructor is the type we were initially defining - Maybe a
, List a
or RoseTree a
. In general, in standard Haskell, the constructor functions for Foo a
have Foo a
as their final return type. If the new syntax were to be strictly equivalent to the old, we would have to place this restriction on its use for valid type declarations.
So what do GADTs add for us? The ability to control exactly what kind of Foo
you return. With GADTs, a constructor for Foo a
is not obliged to return Foo a
; it can return any Foo blah
that you can think of. In the code sample below, for instance, the MkTrueGadtFoo
constructor returns a TrueGadtFoo Int
even though it is for the type TrueGadtFoo a
.
Example: GADT gives you more control:
data FooInGadtClothing a where
MkFooInGadtClothing :: a -> FooInGadtClothing a
--which is no different from: data Haskell98Foo a = MkHaskell98Foo a ,
--by contrast, consider:
data TrueGadtFoo a where
MkTrueGadtFoo :: a -> TrueGadtFoo Int
--This has no Haskell 98 equivalent.
But note that you can only push the generalization so far... if the datatype you are declaring is a Foo
, the constructor functions must return some kind of Foo
or another. Returning anything else simply wouldn't work.
Example: Try this out. It doesn't work:
data Bar where
BarNone :: Bar -- This is ok
data Foo where
MkFoo :: Bar Int-- This will not typecheck
Examples
editSafe Lists
edit- Prerequisite: We assume in this section that you know how a List tends to be represented in functional languages
- Note: The examples in this section additionally require the extensions EmptyDataDecls and KindSignatures to be enabled
We've now gotten a glimpse of the extra control given to us by the GADT syntax. The only thing new is that you can control exactly what kind of data structure you return. Now, what can we use it for? Consider the humble Haskell list. What happens when you invoke head []
? Haskell blows up. Have you ever wished you could have a magical version of head
that only accepts lists with at least one element, lists on which it will never blow up?
To begin with, let's define a new type, SafeList x y
. The idea is to have something similar to normal Haskell lists [x]
, but with a little extra information in the type. This extra information (the type variable y
) tells us whether or not the list is empty. Empty lists are represented as SafeList x Empty
, whereas non-empty lists are represented as SafeList x NonEmpty
.
-- we have to define these types
data Empty
data NonEmpty
-- the idea is that you can have either
-- SafeList a Empty
-- or SafeList a NonEmpty
data SafeList a b where
-- to be implemented
Since we have this extra information, we can now define a function safeHead
on only the non-empty lists! Calling safeHead
on an empty list would simply refuse to type-check.
safeHead :: SafeList a NonEmpty -> a
So now that we know what we want, safeHead
, how do we actually go about getting it? The answer is GADT. The key is that we take advantage of the GADT feature to return two different list-of-a types, SafeList a Empty
for the Nil
constructor, and SafeList a NonEmpty
for the Cons
constructor:
data SafeList a b where
Nil :: SafeList a Empty
Cons :: a -> SafeList a b -> SafeList a NonEmpty
This wouldn't have been possible without GADT, because all of our constructors would have been required to return the same type of list; whereas with GADT we can now return different types of lists with different constructors. Anyway, let's put this all together, along with the actual definition of SafeHead
:
Example: safe lists via GADT
{-#LANGUAGE GADTs, EmptyDataDecls #-}
-- (the EmptyDataDecls pragma must also appear at the very top of the module,
-- in order to allow the Empty and NonEmpty datatype declarations.)
data Empty
data NonEmpty
data SafeList a b where
Nil :: SafeList a Empty
Cons:: a -> SafeList a b -> SafeList a NonEmpty
safeHead :: SafeList a NonEmpty -> a
safeHead (Cons x _) = x
Copy this listing into a file and load in ghci -fglasgow-exts
. You should notice the following difference, calling safeHead
on a non-empty and an empty-list respectively:
Example: safeHead
is... safe
Prelude Main> safeHead (Cons "hi" Nil) "hi" Prelude Main> safeHead Nil <interactive>:1:9: Couldn't match `NonEmpty' against `Empty' Expected type: SafeList a NonEmpty Inferred type: SafeList a Empty In the first argument of `safeHead', namely `Nil' In the definition of `it': it = safeHead Nil
The complaint is a good thing: it means that we can now ensure during compile-time if we're calling safeHead
on an appropriate list. However, that also sets up a pitfall in potential. Consider the following function. What do you think its type is?
Example: Trouble with GADTs
silly False = Nil
silly True = Cons () Nil
Now try loading the example up in GHCi. You'll notice the following complaint:
Example: Trouble with GADTs - the complaint
Couldn't match `Empty' against `NonEmpty' Expected type: SafeList () Empty Inferred type: SafeList () NonEmpty In the application `Cons () Nil' In the definition of `silly': silly True = Cons () Nil
The cases in the definition of silly
evaluate to marked lists of different types, leading to a type error. The extra constraints imposed through the GADT make it impossible for a function to produce both empty and non-empty lists.
If we are really keen on defining silly
, we can do so by liberalizing the type of Cons
, so that it can construct both safe and unsafe lists.
Example: A different approach
{-#LANGUAGE GADTs, EmptyDataDecls, KindSignatures #-}
-- here we add the KindSignatures pragma,
-- which makes the GADT declaration a bit more elegant.
-- Note the subtle yet revealing change in the phantom type names.
data NotSafe
data Safe
data MarkedList :: * -> * -> * where
Nil :: MarkedList t NotSafe
Cons :: a -> MarkedList a b -> MarkedList a c
safeHead :: MarkedList a Safe -> a
safeHead (Cons x _) = x
-- This function will never produce anything that can be consumed by safeHead,
-- no matter that the resulting list is not necessarily empty.
silly :: Bool -> MarkedList () NotSafe
silly False = Nil
silly True = Cons () Nil
There is a cost to the fix above: by relaxing the constraint on Cons
we throw away the knowledge that it cannot produce an empty list. Based on our first version of the safe list we could, for instance, define a function which took a SafeList a Empty
argument and be sure anything produced by Cons
would not be accepted by it. That does not hold for the analogous MarkedList a NotSafe
; arguably, the type is less useful exactly because it is less restrictive. While in this example the issue may seem minor, given that not much can be done with an empty list, in general it is worth considering.
Exercises |
---|
|
A simple expression evaluator
edit- Insert the example used in Wobbly Types paper... I thought that was quite pedagogical
- This is already covered in the first part of the tutorial.
Discussion
edit- More examples, thoughts
- From FOSDEM 2006, I vaguely recall that there is some relationship between GADT and the below... what?
Phantom types
editSee Phantom types.
Existential types
editIf you like Existentially quantified types, you'd probably want to notice that they are now subsumed by GADTs. As the GHC manual says, the following two type declarations give you the same thing.
data TE a = forall b. MkTE b (b->a)
data TG a where { MkTG :: b -> (b->a) -> TG a }
Witness types
editThis page is a stub. You can help Haskell by expanding it. |
At least part of this page was imported from the Haskell wiki article Generalised algebraic datatype, in accordance to its Simple Permissive License. If you wish to modify this page and if your changes will also be useful on that wiki, you might consider modifying that source page instead of this one, as changes from that page may propagate here, but not the other way around. Alternately, you can explicitly dual license your contributions under the Simple Permissive License. |