F Sharp Programming/Mutable Collections
F# : Mutable Collections |
The .NET BCL comes with its own suite of mutable collections which are found in the System.Collections.Generic namespace. These built-in collections are very similar to their immutable counterparts in F#.
List<'T>
Class
edit
The List<'T>
class represents a strongly typed list of objects that can be accessed by index. Conceptually, this makes the List<'T>
class similar to arrays. However, unlike arrays, List
s can be resized and don't need to have their size specified on declaration.
.NET lists are created using the new
keyword and calling the list's constructor as follows:
> open System.Collections.Generic;;
> let myList = new List<string>();;
val myList : List<string>
> myList.Add("hello");;
val it : unit = ()
> myList.Add("world");;
val it : unit = ()
> myList.[0];;
val it : string = "hello"
> myList |> Seq.iteri (fun index item -> printfn "%i: %s" index myList.[index]);;
0: hello
1: world
It's easy to tell that .NET lists are mutable because their Add
methods return unit
rather than returning another list.
Underlying Implementation
editBehind the scenes, the List<'T>
class is just a fancy wrapper for an array. When a List<'T>
is constructed, it creates an 4-element array in memory. Adding the first 4 items is an O(1) operation. However, as soon as the 5th element needs to be added, the list doubles the size of the internal array, which means it has to reallocate new memory and copy elements in the existing list; this is a O(n) operation, where n is the number of items in the list.
The List<'T>.Count
property returns the number of items currently held in the collection, the List<'T>.Capacity
collection returns the size of the underlying array. This code sample demonstrates how the underlying array is resized:
open System
open System.Collections.Generic
let items = new List<string>()
let printList (l : List<_>) =
printfn "l.Count: %i, l.Capacity: %i" l.Count l.Capacity
printfn "Items:"
l |> Seq.iteri (fun index item ->
printfn " l.[%i]: %s" index l.[index])
printfn "-----------"
let main() =
printList items
items.Add("monkey")
printList items
items.Add("kitty")
items.Add("bunny")
printList items
items.Add("doggy")
items.Add("octopussy")
items.Add("ducky")
printList items
printfn "Removing entry for \"doggy\"\n--------\n"
items.Remove("doggy") |> ignore
printList items
printfn "Removing item at index 3\n--------\n"
items.RemoveAt(3)
printList items
Console.ReadKey(true) |> ignore
main()
l.Count: 0, l.Capacity: 0 Items: ----------- l.Count: 1, l.Capacity: 4 Items: l.[0]: monkey ----------- l.Count: 3, l.Capacity: 4 Items: l.[0]: monkey l.[1]: kitty l.[2]: bunny ----------- l.Count: 6, l.Capacity: 8 Items: l.[0]: monkey l.[1]: kitty l.[2]: bunny l.[3]: doggy l.[4]: octopussy l.[5]: ducky ----------- Removing entry for "doggy" -------- l.Count: 5, l.Capacity: 8 Items: l.[0]: monkey l.[1]: kitty l.[2]: bunny l.[3]: octopussy l.[4]: ducky ----------- Removing item at index 3 -------- l.Count: 4, l.Capacity: 8 Items: l.[0]: monkey l.[1]: kitty l.[2]: bunny l.[3]: ducky -----------
If you know the maximum size of the list beforehand, it is possible to avoid the performance hit by calling the List<'T>(size : int)
constructor instead. The following sample demonstrates how to add 1000 items to a list without resizing the internal array:
> let myList = new List<int>(1000);;
val myList : List<int>
> myList.Count, myList.Capacity;;
val it : int * int = (0, 1000)
> seq { 1 .. 1000 } |> Seq.iter (fun x -> myList.Add(x));;
val it : unit = ()
> myList.Count, myList.Capacity;;
val it : int * int = (1000, 1000)
LinkedList<'T>
Class
edit
A LinkedList<'T>
represented a doubly-linked sequence of nodes which allows efficient O(1) inserts and removal, supports forward and backward traversal, but its implementation prevents efficient random access. Linked lists have a few valuable methods:
(* Prepends an item to the LinkedList *)
val AddFirst : 'T -> LinkedListNode<'T>
(* Appends an items to the LinkedList *)
val AddLast : 'T -> LinkedListNode<'T>
(* Adds an item before a LinkedListNode *)
val AddBefore : LinkedListNode<'T> -> 'T -> LinkedListNode<'T>
(* Adds an item after a LinkedListNode *)
val AddAfter : LinkedListNode<'T> -> 'T -> LinkedListNode<'T>
Note that these methods return a LinkedListNode<'T>
, not a new LinkedList<'T>
. Adding nodes actually mutates the LinkedList
object:
> open System.Collections.Generic;;
> let items = new LinkedList<string>();;
val items : LinkedList<string>
> items.AddLast("AddLast1");;
val it : LinkedListNode<string>
= System.Collections.Generic.LinkedListNode`1[System.String]
{List = seq ["AddLast1"];
Next = null;
Previous = null;
Value = "AddLast1";}
> items.AddLast("AddLast2");;
val it : LinkedListNode<string>
= System.Collections.Generic.LinkedListNode`1[System.String]
{List = seq ["AddLast1"; "AddLast2"];
Next = null;
Previous = System.Collections.Generic.LinkedListNode`1[System.String];
Value = "AddLast2";}
> let firstItem = items.AddFirst("AddFirst1");;
val firstItem : LinkedListNode<string>
> let addAfter = items.AddAfter(firstItem, "addAfter");;
val addAfter : LinkedListNode<string>
> let addBefore = items.AddBefore(addAfter, "addBefore");;
val addBefore : LinkedListNode<string>
> items |> Seq.iter (fun x -> printfn "%s" x);;
AddFirst1
addBefore
addAfter
AddLast1
AddLast2
The Stack<'T>
and Queue<'T>
classes are special cases of a linked list (they can be thought of as linked lists with restrictions on where you can add and remove items).
Stack<'T>
Class
edit
A Stack<'T>
only allows programmers prepend/push and remove/pop items from the front of a list, which makes it a last in, first out (LIFO) data structure. The Stack<'T>
class can be thought of as a mutable version of the F# list.
> stack.Push("First");; (* Adds item to front of the list *)
val it : unit = ()
> stack.Push("Second");;
val it : unit = ()
> stack.Push("Third");;
val it : unit = ()
> stack.Pop();; (* Returns and removes item from front of the list *)
val it : string = "Third"
> stack.Pop();;
val it : string = "Second"
> stack.Pop();;
val it : string = "First"
A stack of coins could be represented with this data structure. If we stacked coins one on top another, the first coin in the stack is at the bottom of the stack, and the last coin in the stack appears at the top. We remove coins from top to bottom, so the last coin added to the stack is the first one removed.
Queue<'T>
Class
edit
A Queue<'T>
only allows programmers to append/enqueue to the rear of a list and remove/dequeue from the front of a list, which makes it a first in, first out (FIFO) data structure.
> let queue = new Queue<string>();;
val queue : Queue<string>
> queue.Enqueue("First");; (* Adds item to the rear of the list *)
val it : unit = ()
> queue.Enqueue("Second");;
val it : unit = ()
> queue.Enqueue("Third");;
val it : unit = ()
> queue.Dequeue();; (* Returns and removes item from front of the queue *)
val it : string = "First"
> queue.Dequeue();;
val it : string = "Second"
> queue.Dequeue();;
val it : string = "Third"
A line of people might be represented by a queue: people add themselves to the rear of the line, and are removed from the front. The first person to stand in line is the first person to be served.
HashSet<'T>
, and Dictionary<'TKey, 'TValue>
Classes
edit
The HashSet<'T>
and Dictionary<'TKey, 'TValue>
classes are mutable analogs of the F# set and map data structures and contain many of the same functions.
Using the HashSet<'T>
open System
open System.Collections.Generic
let nums_1to10 = new HashSet<int>()
let nums_5to15 = new HashSet<int>()
let main() =
let printCollection msg targetSet =
printf "%s: " msg
targetSet |> Seq.sort |> Seq.iter(fun x -> printf "%O " x)
printfn ""
let addNums min max (targetSet : ICollection<_>) =
seq { min .. max } |> Seq.iter(fun x -> targetSet.Add(x))
addNums 1 10 nums_1to10
addNums 5 15 nums_5to15
printCollection "nums_1to10 (before)" nums_1to10
printCollection "nums_5to15 (before)" nums_5to15
nums_1to10.IntersectWith(nums_5to15) (* mutates nums_1to10 *)
printCollection "nums_1to10 (after)" nums_1to10
printCollection "nums_5to15 (after)" nums_5to15
Console.ReadKey(true) |> ignore
main()
nums_1to10 (before): 1 2 3 4 5 6 7 8 9 10 nums_5to15 (before): 5 6 7 8 9 10 11 12 13 14 15 nums_1to10 (after): 5 6 7 8 9 10 nums_5to15 (after): 5 6 7 8 9 10 11 12 13 14 15
Using the Dictionary<'TKey, 'TValue>
> open System.Collections.Generic;;
> let dict = new Dictionary<string, string>();;
val dict : Dictionary<string,string>
> dict.Add("Garfield", "Jim Davis");;
val it : unit = ()
> dict.Add("Farside", "Gary Larson");;
val it : unit = ()
> dict.Add("Calvin and Hobbes", "Bill Watterson");;
val it : unit = ()
> dict.Add("Peanuts", "Charles Schultz");;
val it : unit = ()
> dict.["Farside"];; (* Use the '.[key]' operator to retrieve items *)
val it : string = "Gary Larson"
Differences Between .NET BCL and F# Collections
editThe major difference between the collections built into the .NET BCL and their F# analogs is, of course, mutability. The mutable nature of BCL collections dramatically affects their implementation and time-complexity:
.NET Data Structure | Insert | Remove | Lookup | F# Data Structure | Insert | Remove | Lookup |
---|---|---|---|---|---|---|---|
List
|
O(1) / O(n)* | O(n) | O(1) (by index) / O(n) (linear) | No built-in equivalent | |||
LinkedList
|
O(1) | O(1) | O(n) | No built-in equivalent | |||
Stack
|
O(1) | O(1) | O(n) | List
|
O(1) | n/a | O(n) |
Queue
|
O(1) | O(1) | O(n) | No built-in equivalent | |||
HashSet
|
O(1) / O(n)* | O(1) | O(1) | Set
|
O(log n) | O(log n) | O(log n) |
Dictionary
|
O(1) / O(n)* | O(1) | O(1) | Map
|
O(log n) | O(log n) | O(log n) |
- * These classes are built on top of internal arrays. They may take a performance hit as the internal arrays are periodically resized when adding items.
- Note: the Big-O notation above refers to the time-complexity of the insert/remove/retrieve operations relative to the number of items in the data structure, not the relative amount of time required to evaluate the operations relative to other data structures. For example, accessing arrays by index vs. accessing dictionaries by key have the same time complexity, O(1), but the operations do not necessarily occur in the same amount of time.