F Sharp Programming/Mutable Collections

Previous: Arrays Index Next: Input and Output
F# : Mutable Collections

The .NET BCL comes with its own suite of mutable collections which are found in the System.Collections.Generic namespace. These built-in collections are very similar to their immutable counterparts in F#.

List<'T> Class

edit

The List<'T> class represents a strongly typed list of objects that can be accessed by index. Conceptually, this makes the List<'T> class similar to arrays. However, unlike arrays, Lists can be resized and don't need to have their size specified on declaration.

.NET lists are created using the new keyword and calling the list's constructor as follows:

> open System.Collections.Generic;;
> let myList = new List<string>();;

val myList : List<string>

> myList.Add("hello");;
val it : unit = ()
> myList.Add("world");;
val it : unit = ()
> myList.[0];;
val it : string = "hello"
> myList |> Seq.iteri (fun index item -> printfn "%i: %s" index myList.[index]);;
0: hello
1: world

It's easy to tell that .NET lists are mutable because their Add methods return unit rather than returning another list.

Underlying Implementation

edit

Behind the scenes, the List<'T> class is just a fancy wrapper for an array. When a List<'T> is constructed, it creates an 4-element array in memory. Adding the first 4 items is an O(1) operation. However, as soon as the 5th element needs to be added, the list doubles the size of the internal array, which means it has to reallocate new memory and copy elements in the existing list; this is a O(n) operation, where n is the number of items in the list.

The List<'T>.Count property returns the number of items currently held in the collection, the List<'T>.Capacity collection returns the size of the underlying array. This code sample demonstrates how the underlying array is resized:

open System
open System.Collections.Generic

let items = new List<string>()

let printList (l : List<_>) =
    printfn "l.Count: %i, l.Capacity: %i" l.Count l.Capacity
    printfn "Items:"
    l |> Seq.iteri (fun index item ->
        printfn "    l.[%i]: %s" index l.[index])
    printfn "-----------"

let main() =
    printList items
    
    items.Add("monkey")
    printList items
    
    items.Add("kitty")
    items.Add("bunny")
    printList items
    
    items.Add("doggy")
    items.Add("octopussy")
    items.Add("ducky")
    printList items
    
    printfn "Removing entry for \"doggy\"\n--------\n"
    items.Remove("doggy") |> ignore
    printList items
    
    printfn "Removing item at index 3\n--------\n"
    items.RemoveAt(3)
    printList items
    
    Console.ReadKey(true) |> ignore
            
main()
l.Count: 0, l.Capacity: 0
Items:
-----------
l.Count: 1, l.Capacity: 4
Items:
    l.[0]: monkey
-----------
l.Count: 3, l.Capacity: 4
Items:
    l.[0]: monkey
    l.[1]: kitty
    l.[2]: bunny
-----------
l.Count: 6, l.Capacity: 8
Items:
    l.[0]: monkey
    l.[1]: kitty
    l.[2]: bunny
    l.[3]: doggy
    l.[4]: octopussy
    l.[5]: ducky
-----------
Removing entry for "doggy"
--------

l.Count: 5, l.Capacity: 8
Items:
    l.[0]: monkey
    l.[1]: kitty
    l.[2]: bunny
    l.[3]: octopussy
    l.[4]: ducky
-----------
Removing item at index 3
--------

l.Count: 4, l.Capacity: 8
Items:
    l.[0]: monkey
    l.[1]: kitty
    l.[2]: bunny
    l.[3]: ducky
-----------

If you know the maximum size of the list beforehand, it is possible to avoid the performance hit by calling the List<'T>(size : int) constructor instead. The following sample demonstrates how to add 1000 items to a list without resizing the internal array:

> let myList = new List<int>(1000);;

val myList : List<int>

> myList.Count, myList.Capacity;;
val it : int * int = (0, 1000)
> seq { 1 .. 1000 } |> Seq.iter (fun x -> myList.Add(x));;
val it : unit = ()
> myList.Count, myList.Capacity;;
val it : int * int = (1000, 1000)

LinkedList<'T> Class

edit

A LinkedList<'T> represented a doubly-linked sequence of nodes which allows efficient O(1) inserts and removal, supports forward and backward traversal, but its implementation prevents efficient random access. Linked lists have a few valuable methods:

(* Prepends an item to the LinkedList *)
val AddFirst : 'T -> LinkedListNode<'T>

(* Appends an items to the LinkedList *)
val AddLast : 'T -> LinkedListNode<'T>

(* Adds an item before a LinkedListNode *)
val AddBefore : LinkedListNode<'T> -> 'T -> LinkedListNode<'T>

(* Adds an item after a LinkedListNode *)
val AddAfter : LinkedListNode<'T> -> 'T -> LinkedListNode<'T>

Note that these methods return a LinkedListNode<'T>, not a new LinkedList<'T>. Adding nodes actually mutates the LinkedList object:

> open System.Collections.Generic;;
> let items = new LinkedList<string>();;

val items : LinkedList<string>

> items.AddLast("AddLast1");;
val it : LinkedListNode<string>
= System.Collections.Generic.LinkedListNode`1[System.String]
    {List = seq ["AddLast1"];
     Next = null;
     Previous = null;
     Value = "AddLast1";}
> items.AddLast("AddLast2");;
val it : LinkedListNode<string>
= System.Collections.Generic.LinkedListNode`1[System.String]
    {List = seq ["AddLast1"; "AddLast2"];
     Next = null;
     Previous = System.Collections.Generic.LinkedListNode`1[System.String];
     Value = "AddLast2";}
> let firstItem = items.AddFirst("AddFirst1");;

val firstItem : LinkedListNode<string>

> let addAfter = items.AddAfter(firstItem, "addAfter");;

val addAfter : LinkedListNode<string>

> let addBefore = items.AddBefore(addAfter, "addBefore");;

val addBefore : LinkedListNode<string>

> items |> Seq.iter (fun x -> printfn "%s" x);;
AddFirst1
addBefore
addAfter
AddLast1
AddLast2

The Stack<'T> and Queue<'T> classes are special cases of a linked list (they can be thought of as linked lists with restrictions on where you can add and remove items).

Stack<'T> Class

edit

A Stack<'T> only allows programmers prepend/push and remove/pop items from the front of a list, which makes it a last in, first out (LIFO) data structure. The Stack<'T> class can be thought of as a mutable version of the F# list.

> stack.Push("First");; (* Adds item to front of the list *)
val it : unit = ()
> stack.Push("Second");;
val it : unit = ()
> stack.Push("Third");;
val it : unit = ()
> stack.Pop();; (* Returns and removes item from front of the list *)
val it : string = "Third"
> stack.Pop();;
val it : string = "Second"
> stack.Pop();;
val it : string = "First"

A stack of coins could be represented with this data structure. If we stacked coins one on top another, the first coin in the stack is at the bottom of the stack, and the last coin in the stack appears at the top. We remove coins from top to bottom, so the last coin added to the stack is the first one removed.

 
A simple representation of a stack

Queue<'T> Class

edit

A Queue<'T> only allows programmers to append/enqueue to the rear of a list and remove/dequeue from the front of a list, which makes it a first in, first out (FIFO) data structure.

> let queue = new Queue<string>();;

val queue : Queue<string>

> queue.Enqueue("First");; (* Adds item to the rear of the list *)
val it : unit = ()
> queue.Enqueue("Second");;
val it : unit = ()
> queue.Enqueue("Third");;
val it : unit = ()
> queue.Dequeue();; (* Returns and removes item from front of the queue *)
val it : string = "First"
> queue.Dequeue();;
val it : string = "Second"
> queue.Dequeue();;
val it : string = "Third"

A line of people might be represented by a queue: people add themselves to the rear of the line, and are removed from the front. The first person to stand in line is the first person to be served.

 

HashSet<'T>, and Dictionary<'TKey, 'TValue> Classes

edit

The HashSet<'T> and Dictionary<'TKey, 'TValue> classes are mutable analogs of the F# set and map data structures and contain many of the same functions.

Using the HashSet<'T>

open System
open System.Collections.Generic

let nums_1to10 = new HashSet<int>()
let nums_5to15 = new HashSet<int>()

let main() =
    let printCollection msg targetSet =
        printf "%s: " msg
        targetSet |> Seq.sort |> Seq.iter(fun x -> printf "%O " x)
        printfn ""
        
    let addNums min max (targetSet : ICollection<_>) =
        seq { min .. max } |> Seq.iter(fun x -> targetSet.Add(x))
        
    addNums 1 10 nums_1to10
    addNums 5 15 nums_5to15
    
    printCollection "nums_1to10 (before)" nums_1to10
    printCollection "nums_5to15 (before)" nums_5to15
    
    nums_1to10.IntersectWith(nums_5to15) (* mutates nums_1to10 *)
    printCollection "nums_1to10 (after)" nums_1to10
    printCollection "nums_5to15 (after)" nums_5to15
    
    Console.ReadKey(true) |> ignore
            
main()
nums_1to10 (before): 1 2 3 4 5 6 7 8 9 10
nums_5to15 (before): 5 6 7 8 9 10 11 12 13 14 15
nums_1to10 (after): 5 6 7 8 9 10
nums_5to15 (after): 5 6 7 8 9 10 11 12 13 14 15

Using the Dictionary<'TKey, 'TValue>

> open System.Collections.Generic;;
> let dict = new Dictionary<string, string>();;

val dict : Dictionary<string,string>

> dict.Add("Garfield", "Jim Davis");;
val it : unit = ()
> dict.Add("Farside", "Gary Larson");;
val it : unit = ()
> dict.Add("Calvin and Hobbes", "Bill Watterson");;
val it : unit = ()
> dict.Add("Peanuts", "Charles Schultz");;
val it : unit = ()
> dict.["Farside"];; (* Use the '.[key]' operator to retrieve items *)
val it : string = "Gary Larson"

Differences Between .NET BCL and F# Collections

edit

The major difference between the collections built into the .NET BCL and their F# analogs is, of course, mutability. The mutable nature of BCL collections dramatically affects their implementation and time-complexity:

.NET Data Structure Insert Remove Lookup F# Data Structure Insert Remove Lookup
List O(1) / O(n)* O(n) O(1) (by index) / O(n) (linear) No built-in equivalent
LinkedList O(1) O(1) O(n) No built-in equivalent
Stack O(1) O(1) O(n) List O(1) n/a O(n)
Queue O(1) O(1) O(n) No built-in equivalent
HashSet O(1) / O(n)* O(1) O(1) Set O(log n) O(log n) O(log n)
Dictionary O(1) / O(n)* O(1) O(1) Map O(log n) O(log n) O(log n)
* These classes are built on top of internal arrays. They may take a performance hit as the internal arrays are periodically resized when adding items.
Note: the Big-O notation above refers to the time-complexity of the insert/remove/retrieve operations relative to the number of items in the data structure, not the relative amount of time required to evaluate the operations relative to other data structures. For example, accessing arrays by index vs. accessing dictionaries by key have the same time complexity, O(1), but the operations do not necessarily occur in the same amount of time.
Previous: Arrays Index Next: Input and Output