Introducing Julia/Dictionaries and sets
Dictionaries
Many of the functions introduced so far have been shown working on arrays (and tuples). But arrays are just one type of collection. Julia has others.
A simple look-up table is a useful way of organizing many types of data: given a single piece of information, such as a number, string, or symbol, called the key, what is the corresponding data value? For this purpose, Julia provides the Dictionary object, called Dict for short. It's an "associative collection" because it associates keys with values.
Creating dictionaries
You can create a simple dictionary using the following syntax:
julia> dict = Dict("a" => 1, "b" => 2, "c" => 3)
Dict{String,Int64} with 3 entries:
  "c" => 3
  "b" => 2
  "a" => 1
dict is now a dictionary. The keys are "a", "b", and "c", and the corresponding values are 1, 2, and 3. The => operator constructs a Pair: "a" => 1 is equivalent to Pair("a", 1). In a dictionary, keys are always unique: you can't have two keys with the same name.
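As a small sketch of the two points above (the variable names here are just for illustration), you can check that => builds a Pair, and see what happens when the same key is given twice:

```julia
# "a" => 1 is shorthand for Pair("a", 1)
p = "a" => 1
same = (p == Pair("a", 1))

# A Pair exposes its two halves as the fields first and second
k, v = p.first, p.second

# Repeating a key in the constructor does not create two entries;
# the later value replaces the earlier one.
d = Dict("a" => 1, "a" => 2)
```

Here d ends up with a single entry, whose value is the last one supplied.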
If you know the types of the keys and values in advance, you can (and probably should) specify them in curly braces after the Dict type name:
julia> dict = Dict{String,Integer}("a" => 1, "b" => 2)
Dict{String,Integer} with 2 entries:
  "b" => 2
  "a" => 1
You can also create dictionaries using the generator/comprehensions syntax:
julia> dict = Dict(string(i) => sind(i) for i = 0:5:360)
Dict{String,Float64} with 73 entries:
  "320" => -0.642788
  "65"  => 0.906308
  "155" => 0.422618
  ⋮     => ⋮
Use the following syntax to create a typed empty dictionary:
julia> dict = Dict{String,Int64}()
Dict{String,Int64} with 0 entries
or you can omit the types, and get an untyped dictionary:
julia> dict = Dict()
Dict{Any,Any} with 0 entries
It's sometimes useful to create dictionary entries using a for loop:
files = ["a.txt", "b.txt", "c.txt"]
fvars = Dict()
for (n, f) in enumerate(files)
    fvars["x_$(n)"] = f
end
This is one way you could create a set of 'variables' stored in a dictionary:
julia> fvars
Dict{Any,Any} with 3 entries:
  "x_1" => "a.txt"
  "x_2" => "b.txt"
  "x_3" => "c.txt"
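The same dictionary could probably be built without an explicit loop. The following is a sketch of two alternatives, a generator and zip(); the names labels and fvars2 are made up for this example. Unlike the untyped Dict() above, both forms let Julia infer String keys and values:

```julia
files = ["a.txt", "b.txt", "c.txt"]

# Generator form: one "x_n" => filename pair per file
fvars = Dict("x_$(n)" => f for (n, f) in enumerate(files))

# zip form: pair up two collections of equal length
labels = ["x_$(n)" for n in 1:length(files)]
fvars2 = Dict(zip(labels, files))
```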
Looking things up
To get a value, if you have the key:
julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);

julia> dict["a"]
1
That works when the keys are strings. If the keys are symbols:
julia> symdict = Dict(:x => 1, :y => 3, :z => 6)
Dict{Symbol,Int64} with 3 entries:
  :z => 6
  :x => 1
  :y => 3

julia> symdict[:x]
1
Or if the keys are integers:
julia> intdict = Dict(1 => "one", 2 => "two", 3 => "three")
Dict{Int64,String} with 3 entries:
  2 => "two"
  3 => "three"
  1 => "one"

julia> intdict[2]
"two"
You can instead use the get() function, and provide a fail-safe default value if there's no value for that particular key:
julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);

julia> get(dict, "a", 0)
1

julia> get(dict, "Z", 0)
0
If you don't want get() to provide a default value, use a try...catch block:
try
    dict["Z"]
catch err
    # a missing key raises a KeyError
    if isa(err, KeyError)
        println("sorry, I couldn't find anything")
    end
end

sorry, I couldn't find anything
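Two related forms may be worth knowing here. This sketch assumes a small example dictionary, and the fallback values 0 and -1 are arbitrary: get!() stores the default as well as returning it, and the do-block form of get() only evaluates the fallback when the key is missing.

```julia
dict = Dict("a" => 1, "b" => 2)

# get() leaves the dictionary unchanged when the key is missing
x = get(dict, "Z", 0)
unchanged = !haskey(dict, "Z")

# get!() returns the default and also stores it under the key
y = get!(dict, "Z", 0)
stored = haskey(dict, "Z")

# The do-block form computes the fallback only if the key is absent
z = get(dict, "Q") do
    -1   # arbitrary fallback value for this sketch
end
```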
To change a value assigned to an existing key (or assign a value to a hitherto unseen key):
julia> dict["a"] = 10
10
Keys
Keys must be unique for a dictionary. There's always only one key called "a" in this dictionary, so when you assign a value to a key that already exists, you're not creating a new one, just modifying an existing one.
To see if the dictionary contains a key, use haskey():
julia> haskey(dict, "Z")
false
To check for the existence of a key/value pair:
julia> in(("b" => 2), dict)
true
To add a new key and value to a dictionary, use this:
julia> dict["d"] = 4
4
You can delete a key from the dictionary, using delete!():
julia> delete!(dict, "d")
Dict{String,Int64} with 4 entries:
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 10
You'll notice that the dictionary doesn't seem to be sorted in any way — at least, the keys are in no particular order. This is due to the way they're stored, and you can't sort them in place. (But see Sorting, below.)
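Returning to removal for a moment: if you also want the value that was removed, pop!() is probably the function to reach for. A minimal sketch, using a throwaway dictionary:

```julia
dict = Dict("a" => 1, "b" => 2, "c" => 3)

# pop! removes the entry and returns its value
removed = pop!(dict, "b")

# With a third argument, a missing key returns that default instead of an error
fallback = pop!(dict, "Z", nothing)

# delete! returns the dictionary itself and quietly ignores absent keys
delete!(dict, "Z")
```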
To get all keys, use the keys() function:
julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);

julia> keys(dict)
Base.KeySet for a Dict{String,Int64} with 5 entries. Keys:
  "c"
  "e"
  "b"
  "a"
  "d"
The result is an iterator that has just one job: to iterate through a dictionary key by key:
julia> collect(keys(dict))
5-element Array{String,1}:
 "c"
 "e"
 "b"
 "a"
 "d"

julia> [uppercase(key) for key in keys(dict)]
5-element Array{String,1}:
 "C"
 "E"
 "B"
 "A"
 "D"
This uses the list comprehension form ([ new-element for loop-variable in iterator ]) and each new element is collected into an array. An alternative would be:
julia> map(uppercase, collect(keys(dict)))
5-element Array{String,1}:
 "C"
 "E"
 "B"
 "A"
 "D"
Values
To retrieve all the values, use the values() function:
julia> values(dict)
Base.ValueIterator for a Dict{String,Int64} with 5 entries. Values:
  3
  5
  2
  1
  4
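Because values() returns an iterator, you can usually pass it straight to functions that reduce a collection. A short sketch, using the same five-entry dictionary:

```julia
dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

total = sum(values(dict))            # add up every value
largest = maximum(values(dict))      # largest value
vals = sort(collect(values(dict)))   # values as a sorted array
```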
If you want to go through a dictionary and process each key/value, you can make use of the fact that dictionaries themselves are iterable objects:
julia> for kv in dict
           println(kv)
       end
"c"=>3
"e"=>5
"b"=>2
"a"=>1
"d"=>4
where kv is a Pair holding each key and its value in turn.
Or you could do:
julia> for k in keys(dict)
           println(k, " ==> ", dict[k])
       end
c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4
Even better, you can use a key/value tuple to simplify the iteration even more:
julia> for (key, value) in dict
           println(key, " ==> ", value)
       end
c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4
Here's another example:
for tuple in Dict("1"=>"Hydrogen", "2"=>"Helium", "3"=>"Lithium")
    println("Element $(tuple[1]) is $(tuple[2])")
end

Element 1 is Hydrogen
Element 2 is Helium
Element 3 is Lithium
(Notice the string interpolation operator, $. This allows you to use a variable's name in a string and get the variable's value when the string is printed. You can include any Julia expression in a string using $().)
Sorting a dictionary
Because dictionaries don't store the keys in any particular order, you might want to output the dictionary to a sorted array to obtain the items in order:
julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
Dict{String,Int64} with 6 entries:
  "f" => 6
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

julia> for key in sort(collect(keys(dict)))
           println("$key => $(dict[key])")
       end
a => 1
b => 2
c => 3
d => 4
e => 5
f => 6
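Sorting by value rather than by key follows the same pattern. In this sketch (the values are made up), the whole dictionary is collected into an array of Pairs and sorted with by = last, which compares the value half of each Pair:

```julia
dict = Dict("a" => 3, "b" => 1, "c" => 2)

# collect(dict) gives an array of key => value Pairs;
# last(pair) is the value, so this orders the entries by value
byvalue = sort(collect(dict), by = last)

for (key, value) in byvalue
    println("$key => $value")
end
```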
If you really need to have a dictionary that remains sorted all the time, you can use the SortedDict data type from the DataStructures.jl package (after having installed it).
julia> import DataStructures

julia> dict = DataStructures.SortedDict("b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6

julia> dict["a"] = 1
1

julia> dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
  "a" => 1
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6
Once the OrderedCollections.jl package is loaded (DataStructures.jl loads it for you), sort() also accepts an ordinary dictionary and returns a sorted OrderedDict:
julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
Dict{String,Int64} with 6 entries:
  "f" => 6
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

julia> sort(dict)
OrderedCollections.OrderedDict{String,Int64} with 6 entries:
  "a" => 1
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6
Simple example: counting words
A simple application of a dictionary is to count how many times each word appears in a piece of text. Each word is a key, and the value of the key is the number of times that word appears in the text.
Let's count the words in the Sherlock Holmes stories. I've downloaded the text from the excellent Project Gutenberg and stored it in a file "sherlock-holmes-canon.txt". To create a list of words from the text, we'll read the file line by line, split each line using a regular expression, and convert every word to lowercase. (There are probably faster methods.)
julia> f = open("sherlock-holmes-canon.txt")

julia> wordlist = String[]

julia> for line in eachline(f)
           words = split(line, r"\W")
           map(w -> push!(wordlist, lowercase(w)), words)
       end

julia> filter!(!isempty, wordlist)

julia> close(f)
wordlist is now an array of nearly 700,000 words, all in lowercase:
julia> wordlist[1:20]
20-element Array{String,1}:
 "the"
 "complete"
 "sherlock"
 "holmes"
 "arthur"
 "conan"
 "doyle"
 "table"
 "of"
 "contents"
 "a"
 "study"
 "in"
 "scarlet"
 "the"
 "sign"
 "of"
 "the"
 "four"
 "the"
To store the words and the word counts, we'll create a dictionary:
julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64} with 0 entries
To build the dictionary, loop through the list of words, and use get() to look up the current tally, if any. If the word has already been seen, the count is increased by one. If the word hasn't been seen before, the fall-back third argument of get() ensures that the absence doesn't cause an error: get() returns 0, so 1 is stored.
for word in wordlist
    wordcounts[word] = get(wordcounts, word, 0) + 1
end
Now you can look up words in the wordcounts dictionary and find out how many times they appear:
julia> wordcounts["watson"]
1040

julia> wordcounts["holmes"]
3057

julia> wordcounts["sherlock"]
415

julia> wordcounts["lestrade"]
244
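If you don't have the Sherlock Holmes file to hand, the same counting technique can be tried on a short string. This is only a sketch: the sample text is made up, and it uses the keepempty = false option of split() in place of the separate filter!() step.

```julia
text = "The cat and the hat and the bat"   # made-up sample text

wordcounts = Dict{String,Int}()
for w in split(lowercase(text), r"\W", keepempty = false)
    word = String(w)   # split returns substrings; store plain Strings as keys
    # get returns 0 for an unseen word, so the first sighting stores 1
    wordcounts[word] = get(wordcounts, word, 0) + 1
end
```

After the loop, wordcounts["the"] is 3 and wordcounts["and"] is 2.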
Dictionaries aren't sorted, but you can use the collect() and keys() functions on the dictionary to collect the keys and then sort them. In a loop you can work through the dictionary in alphabetical order:
for i in sort(collect(keys(wordcounts)))
    println("$i, $(wordcounts[i])")
end
000, 5
1, 8
10, 7
100, 4
1000, 9
104, 1
109, 1
10s, 2
10th, 1
11, 9
1100, 1
117, 2
117th, 2
11th, 1
12, 2
120, 2
126b, 3
⋮
zamba, 2
zeal, 5
zealand, 3
zealous, 3
zenith, 1
zeppelin, 1
zero, 2
zest, 3
zig, 1
zigzag, 3
zigzagged, 1
zinc, 3
zion, 2
zoo, 1
zoology, 2
zu, 1
zum, 2
â, 41
ã, 4
But how do you find out the most common words? One way is to use collect() to convert the dictionary to an array of key/value pairs, and then to sort the array by looking at the last value of each pair:
julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)
19171-element Array{Pair{String,Int64},1}:
 "the" => 36244
 "and" => 17593
 "i" => 17357
 "of" => 16779
 "to" => 16041
 "a" => 15848
 "that" => 11506
 ⋮
 "enrage" => 1
 "smuggled" => 1
 "lounges" => 1
 "devotes" => 1
 "reverberated" => 1
 "munitions" => 1
 "graybeard" => 1
To see only the top 20 words:
julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)[1:20]
20-element Array{Pair{String,Int64},1}:
 "the" => 36244
 "and" => 17593
 "i" => 17357
 "of" => 16779
 "to" => 16041
 "a" => 15848
 "that" => 11506
 "it" => 11101
 "in" => 10766
 "he" => 10366
 "was" => 9844
 "you" => 9688
 "his" => 7836
 "is" => 6650
 "had" => 6057
 "have" => 5532
 "my" => 5293
 "with" => 5256
 "as" => 4755
 "for" => 4713
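When only the top few entries are needed, sorting the whole array does more work than necessary. As a sketch with made-up counts, partialsort() from Base can pick out just the leading positions. Note too that by = last is a shorter way of writing by = tuple -> last(tuple).

```julia
wordcounts = Dict("the" => 5, "and" => 3, "of" => 2, "zinc" => 1)  # made-up counts

# Full sort, largest count first, then slice off the top two
ranked = sort(collect(wordcounts), by = last, rev = true)
top2 = ranked[1:2]

# partialsort orders only positions 1:2 and returns just those elements
top2_partial = partialsort(collect(wordcounts), 1:2, by = last, rev = true)
```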
In a similar way, you can use the filter() function to find, for example, all words that start with "k" and occur fewer than four times:
julia> filter(tuple -> startswith(first(tuple), "k") && last(tuple) < 4, collect(wordcounts))
73-element Array{Pair{String,Int64},1}:
 "keg" => 1
 "klux" => 2
 "knifing" => 1
 "keening" => 1
 "kansas" => 3
 ⋮
 "kaiser" => 1
 "kidnap" => 2
 "keswick" => 1
 "kings" => 2
 "kratides" => 3
 "ken" => 2
 "kindliness" => 2
 "klan" => 2
 "keepsake" => 1
 "kindled" => 2
 "kit" => 2
 "kicking" => 1
 "kramm" => 2
 "knob" => 1
More complex structures
A dictionary can hold many different types of values. Here for example is a dictionary where the keys are strings and the values are arrays of arrays of points (assuming that the Point type has been defined already). For example, this could be used to store graphical shapes describing the letters of the alphabet (some of which have two or more loops):
julia> p = Dict{String, Array{Array}}()
Dict{String,Array{Array{T,N},N}}

julia> p["a"] = Array[[Point(0,0), Point(1,1)], [Point(34, 23), Point(5,6)]]
2-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]

julia> push!(p["a"], [Point(34.0,23.0), Point(5.0,6.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
Or create a dictionary with some already-known values:
julia> d = Dict("shape1" => Array[[Point(0,0), Point(-20,57)], [Point(34, -23), Point(-10,12)]])
Dict{String,Array{Array{T,N},1}} with 1 entry:
  "shape1" => Array[[Point(0.0,0.0), Point(-20.0,57.0)], [Point(34.0,-23.0), Point(-10.0,12.0)]]
Add another array to the first one:
julia> push!(d["shape1"], [Point(-124.0, 37.0), Point(25.0,32.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(-20.0,57.0)]
 [Point(34.0,-23.0), Point(-10.0,12.0)]
 [Point(-124.0,37.0), Point(25.0,32.0)]
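The examples above assume a Point type already exists. To try them yourself, here is a self-contained sketch with a stand-in Point definition (hypothetical; your own type or a package's may differ) and a more tightly typed dictionary, where each value is a vector of loops and each loop is a vector of points:

```julia
# Hypothetical stand-in for the Point type the text assumes
struct Point
    x::Float64
    y::Float64
end

# Each key maps to a list of loops; each loop is a list of points
shapes = Dict{String,Vector{Vector{Point}}}()
shapes["a"] = [[Point(0, 0), Point(1, 1)], [Point(34, 23), Point(5, 6)]]

# Add a third loop to the shape stored under "a"
push!(shapes["a"], [Point(-124, 37), Point(25, 32)])

nloops = length(shapes["a"])
```

Spelling out the value type like this means Julia can reject anything that isn't a list of lists of points when it's assigned.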
Sets
A set is a collection of elements, just like an array or dictionary, but with no duplicated elements.
The two important differences between a set and other types of collection are that in a set you can have only one of each element, and that the order of elements isn't important (whereas an array can have multiple copies of an element and their order is remembered).
You can create an empty set using the Set constructor function:
julia> colors = Set()
Set(Any[])
As elsewhere in Julia, you can specify the type:
julia> primes = Set{Int64}()
Set(Int64[])
You can create and fill sets in one go:
julia> colors = Set{String}(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])
or you can let Julia "guess the type":
julia> colors = Set(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])
Quite a few of the functions that work with arrays also work with sets. Adding elements to sets, for example, is a bit like adding elements to arrays. You can use push!():
julia> push!(colors, "black")
Set(String["yellow","blue","green","black","red"])
But you can't use pushfirst!(), because that works only for things that have a concept of "first", like arrays.
What happens if you try to add something to the set that's already there? Absolutely nothing. You don't get a copy added, because it's a set, not an array, and sets don't store repeated elements.
To see if something is in the set, you can use in():
julia> in("green", colors)
true
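Here's a short sketch of both behaviours: pushing an element that's already present, and testing membership. The votes array is made up for the example. Converting an array to a set is also a quick way of discarding duplicates.

```julia
colors = Set(["red", "green", "blue"])

push!(colors, "green")          # already present, so nothing changes
n = length(colors)

# Converting an array to a set drops the repeated elements
votes = ["red", "blue", "red", "red", "green"]
distinct = Set(votes)

has_blue = "blue" in distinct   # infix form of in("blue", distinct)
```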
There are some standard operations you can do with sets, namely finding their union, intersection, and difference, with the functions union(), intersect(), and setdiff():
julia> rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set(String["indigo","yellow","orange","blue","violet","green","red"])
The union of two sets is the set of everything that is in either set. The result is another set, so you can't have two "yellow"s here, even though we've got a "yellow" in each set:
julia> union(colors, rainbow)
Set(String["indigo","yellow","orange","blue","violet","green","black","red"])
The intersection of two sets is the set that contains every element that belongs to both sets:
julia> intersect(colors, rainbow)
Set(String["yellow","blue","green","red"])
The difference between two sets is the set of elements that are in the first set, but not in the second. This time, the order in which you supply the sets matters. The setdiff() function finds the elements that are in the first set, colors, but not in the second set, rainbow:
julia> setdiff(colors, rainbow)
Set(String["black"])
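The same three operations are perhaps easier to check by eye on small sets of numbers. This sketch also swaps the arguments to setdiff() to show that the order matters, and uses issubset() to confirm that an intersection lies inside each of its inputs:

```julia
a = Set([1, 2, 3, 4])
b = Set([3, 4, 5])

u = union(a, b)         # elements in either set
i = intersect(a, b)     # elements in both sets
d_ab = setdiff(a, b)    # in a but not in b
d_ba = setdiff(b, a)    # in b but not in a
sub = issubset(i, a)    # the intersection is contained in a
```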
Other functions
Functions that work on arrays and sets sometimes work on dictionaries and other collections too. For example, some of the set operations can be applied to dictionaries, not just sets and arrays:
julia> d1 = Dict(1 => "a", 2 => "b")
Dict{Int64,String} with 2 entries:
  2 => "b"
  1 => "a"

julia> d2 = Dict(2 => "b", 3 => "c", 4 => "d")
Dict{Int64,String} with 3 entries:
  4 => "d"
  2 => "b"
  3 => "c"

julia> union(d1, d2)
4-element Array{Pair{Int64,String},1}:
 2=>"b"
 1=>"a"
 4=>"d"
 3=>"c"

julia> intersect(d1, d2)
1-element Array{Pair{Int64,String},1}:
 2=>"b"

julia> setdiff(d1, d2)
1-element Array{Pair{Int64,String},1}:
 1=>"a"
Notice that the results are returned as arrays of Pairs, rather than as Dictionaries.
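Functions such as filter() and collect(), which we've already seen being used with arrays, also work with dictionaries. (map() doesn't accept a dictionary directly; apply it to keys(dict), values(dict), or collect(dict) instead.) When you filter a dictionary, the function receives each key/value pair as a single Pair argument:
julia> filter(p -> p.first == 1, d1)
Dict{Int64,String} with 1 entry:
  1 => "a"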
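As a sketch of how this looks in Julia 1.0 and later (the names odd_keys, upper, and pairs_list are made up for the example): the function given to filter() receives each entry as one Pair, and a generator is a convenient way to build a transformed dictionary.

```julia
d1 = Dict(1 => "a", 2 => "b", 3 => "c")

# The predicate receives one Pair per entry
odd_keys = filter(p -> isodd(p.first), d1)

# To transform the entries, build a new dictionary from a generator
upper = Dict(k => uppercase(v) for (k, v) in d1)

# collect() turns the dictionary into an array of Pairs
pairs_list = collect(d1)
```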
There's a merge() function which can merge two dictionaries:
julia> merge(d1, d2)
Dict{Int64,String} with 4 entries:
  4 => "d"
  2 => "b"
  3 => "c"
  1 => "a"
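In the example above the shared key 2 has the same value in both dictionaries, so it doesn't show what happens when they disagree. A sketch with made-up stock figures: plain merge() keeps the value from the last dictionary, while passing a combining function as the first argument lets you decide how shared keys are resolved.

```julia
stock = Dict("apples" => 3, "pears" => 1)       # made-up data
delivery = Dict("apples" => 2, "plums" => 4)

# Plain merge: for a shared key, the value from the last dictionary wins
replaced = merge(stock, delivery)

# With a combining function, values for shared keys are combined instead
summed = merge(+, stock, delivery)
```

Neither call modifies stock or delivery; merge!() is the in-place variant.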
The findmin() function can find the minimum value in a dictionary, and return the value, and its key.
julia> d1 = Dict(:a => 1, :b => 2, :c => 0)
Dict{Symbol,Int64} with 3 entries:
  :a => 1
  :b => 2
  :c => 0

julia> findmin(d1)
(0, :c)
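There's a matching findmax() function. A brief sketch, using a dictionary of the same shape as d1, that unpacks the returned (value, key) tuple into two variables:

```julia
scores = Dict(:a => 1, :b => 2, :c => 0)

# Both functions return the value first, then the key it belongs to
lowest, lowkey = findmin(scores)
highest, highkey = findmax(scores)
```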