Python Programming/Sets
Starting with version 2.3, Python comes with an implementation of the mathematical set. Initially this implementation had to be imported from the standard module sets, but with Python 2.4 the types set and frozenset became built-in types. A set is an unordered collection of objects, unlike sequence objects such as lists and tuples, in which each element is indexed. Sets cannot have duplicate members: a given object appears in a set 0 or 1 times. All members of a set have to be hashable, just like dictionary keys. Integers, floating point numbers, strings, and tuples whose elements are all hashable are hashable; dictionaries, lists, and sets (but not frozensets) are not.
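The hashability rule can be checked directly, because hash() raises TypeError for unhashable objects. The short sketch below (Python 3) uses a helper named is_hashable. That name is hypothetical and is not part of the standard library. The sketch also shows that a tuple is hashable only when everything inside it is hashable.

```python
# Hashable objects can be set members; unhashable ones raise TypeError.
members = {42, 3.14, "cat", (1, 2)}  # int, float, str, tuple of hashables

def is_hashable(obj):
    """Hypothetical helper: True if obj could be stored in a set or used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2)))          # True
print(is_hashable((1, [2])))        # False: the tuple contains a list
print(is_hashable([1, 2]))          # False: lists are mutable
print(is_hashable({1, 2}))          # False: sets are mutable
print(is_hashable(frozenset({1})))  # True: frozensets are immutable
```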
Overview
Sets in Python at a glance:
set1 = set() # A new empty set
set1.add("cat") # Add a single member
set1.update(["dog", "mouse"]) # Add several members, like list's extend
set1 |= set(["doe", "horse"]) # Add several members via in-place union
if "cat" in set1: # Membership test
    set1.remove("cat")
#set1.remove("elephant") # Raises KeyError: "elephant" is not a member
set1.discard("elephant") # No error raised
print(set1)
for item in set1: # Iteration AKA for each element
    print(item)
print("Item count:", len(set1)) # Length AKA size AKA item count
#first_item = set1[0] # TypeError: sets do not support indexing
isempty = len(set1) == 0 # Test for emptiness
set1 = {"cat", "dog"} # Initialize set using braces; since Python 2.7
#set1 = {} # No way; this is a dict
set1 = set(["cat", "dog"]) # Initialize set from a list
set2 = set(["dog", "mouse"])
set3 = set1 & set2 # Intersection
set4 = set1 | set2 # Union
set5 = set1 - set3 # Set difference
set6 = set1 ^ set2 # Symmetric difference
issubset = set1 <= set2 # Subset test
issuperset = set1 >= set2 # Superset test
set7 = set1.copy() # A shallow copy
set7.remove("cat")
print(set7.pop()) # Remove an arbitrary element
set8 = set1.copy()
set8.clear() # Clear AKA empty AKA erase
set9 = {x for x in range(10) if x % 2} # Set comprehension; since Python 2.7
print(set1, set2, set3, set4, set5, set6, set7, set8, set9, issubset, issuperset)
Constructing Sets
One way to construct sets is by passing any iterable object, such as a list or a string, to the "set" constructor.
>>> set([0, 1, 2, 3])
set([0, 1, 2, 3])
>>> set("obtuse")
set(['b', 'e', 'o', 's', 'u', 't'])
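Lists and strings are not the only inputs the constructor accepts. This sketch (Python 3) builds sets from a tuple, a range, a dictionary, and a generator expression. Iterating a dictionary yields its keys, so only the keys end up in the set. The results are sorted before printing because set order is arbitrary.

```python
# set() accepts any iterable, not only lists and strings.
from_tuple = set((1, 2, 2, 3))                 # duplicates collapse
from_range = set(range(5))
from_dict = set({"a": 1, "b": 2})              # iterating a dict yields its keys
from_gen = set(n * n for n in [-2, -1, 1, 2])  # generator expression

print(sorted(from_tuple))  # [1, 2, 3]
print(sorted(from_dict))   # ['a', 'b']
print(sorted(from_gen))    # [1, 4]
```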
We can also add elements to sets one by one, using the "add" function.
>>> s = set([12, 26, 54])
>>> s.add(32)
>>> s
set([32, 26, 12, 54])
Note that since a set does not contain duplicate elements, if we add one of the members of s to s again, the add function will have no effect. This same behavior occurs in the "update" function, which adds a group of elements to a set.
>>> s.update([26, 12, 9, 14])
>>> s
set([32, 9, 12, 14, 54, 26])
Note that you can pass any iterable to the update function, regardless of what structure was used to initialize the set. A list, a tuple, a string, or even another set will all work.
Passing an existing set to the set constructor makes a copy of it, and the "copy" function does the same. However, remember that the copy is shallow: it copies the set, but not the individual elements.
>>> s2 = s.copy()
>>> s2
set([32, 9, 12, 14, 54, 26])
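A shallow copy means the new set refers to the very same element objects. Most hashable objects are immutable, so this rarely matters. Instances of an ordinary class are a counterexample: they are hashable by identity and still mutable. The Tag class below is hypothetical and exists only for this Python 3 sketch.

```python
# copy() is shallow: the new set holds the very same element objects.
class Tag:
    """Hypothetical mutable object; hashed by identity by default."""
    def __init__(self, label):
        self.label = label

t = Tag("draft")
s = {t}
s2 = s.copy()                 # a new set, equivalent to set(s)

print(s2 is s)                # False: a distinct set object
t.label = "final"             # mutate the element shared by both sets
print(next(iter(s2)).label)   # final: the copy sees the change
s2.add(Tag("other"))
print(len(s), len(s2))        # 1 2: adding to the copy leaves the original alone
```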
Membership Testing
We can check if an object is in the set using the same "in" operator as with sequential data types.
>>> 32 in s
True
>>> 6 in s
False
>>> 6 not in s
True
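A membership test on a set is a hash lookup, so on average it takes constant time. The "in" operator on a list has to scan the list instead. A common use is removing duplicates from a sequence while keeping the original order, which a set alone cannot do. The function name unique_in_order in this Python 3 sketch is hypothetical.

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each.

    The set `seen` answers "have we met this item?" quickly, while the
    list `result` preserves the original order.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```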
We can also test the membership of entire sets. Given two sets s1 and s2, we can check whether s1 is a subset or a superset of s2.
>>> s.issubset(set([32, 8, 9, 12, 14, -4, 54, 26, 19]))
True
>>> s.issuperset(set([9, 12]))
True
Note that "issubset" and "issuperset" also accept any iterable, such as a list, as their argument.
>>> s.issuperset([32, 9])
True
The <= and >= operators express the issubset and issuperset tests respectively. Unlike the functions, the operators require both operands to be sets or frozensets.
>>> set([4, 5, 7]) <= set([4, 5, 7, 9])
True
>>> set([9, 12, 15]) >= set([9, 12])
True
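The strict operators < and > test for a proper subset or superset, which excludes equality. This Python 3 sketch also shows the difference between the methods and the operators. issubset accepts a list, but <= with a list on one side raises TypeError.

```python
a = {4, 5}
b = {4, 5, 7}

print(a <= b, a < b)          # True True: a is a subset, and a proper subset
print(b <= b, b < b)          # True False: b is a subset, but not a proper subset, of itself
print(a.issubset([4, 5, 7]))  # True: the method accepts any iterable

try:
    a <= [4, 5, 7]            # the operator needs sets (or frozensets) on both sides
    operator_accepts_list = True
except TypeError as err:
    operator_accepts_list = False
    print("TypeError:", err)
```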
Like lists, tuples, and strings, sets work with the "len" function, which returns the number of items in the set.
Removing Items
There are three functions which remove individual items from a set: pop, remove, and discard. The first, pop, removes an arbitrary item from the set and returns it. Note that there is no defined behavior as to which element it chooses to remove.
>>> s = set([1,2,3,4,5,6])
>>> s.pop()
1
>>> s
set([2, 3, 4, 5, 6])
We also have the "remove" function to remove a specified element.
>>> s.remove(3)
>>> s
set([2, 4, 5, 6])
However, removing an item which isn't in the set raises a KeyError.
>>> s.remove(9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 9
If you wish to avoid this error, use "discard". It has the same functionality as remove, but simply does nothing if the element isn't in the set.
We also have another operation for removing elements from a set, clear, which simply removes all elements from the set.
>>> s.clear()
>>> s
set([])
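The three removal functions differ in how they handle a missing element. This Python 3 sketch records which calls raise KeyError. discard is silent, remove raises, and pop raises once the set is empty. The list name failures is only for illustration.

```python
s = {1, 2, 3}
failures = []             # records which calls raised KeyError

s.discard(9)              # 9 is absent: discard silently does nothing
try:
    s.remove(9)           # 9 is absent: remove raises KeyError
except KeyError:
    failures.append("remove")

while s:                  # pop() arbitrary elements until the set is empty
    print("popped", s.pop())

try:
    s.pop()               # popping an empty set also raises KeyError
except KeyError:
    failures.append("pop")

print(failures)           # ['remove', 'pop']
```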
Iteration Over Sets
We can also loop over each of the items in a set. However, since sets are unordered, the order of iteration is arbitrary.
>>> s = set("blerg")
>>> for n in s:
...     print(n, "", end="")
...
r b e l g
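If a predictable order is needed, for example in output that will be compared or read by people, one option is to iterate over sorted(s). sorted returns a new list and leaves the set unchanged. This is a Python 3 sketch.

```python
s = set("blerg")
for ch in sorted(s):    # sorted() returns a new list in ascending order
    print(ch, end=" ")
print()                 # the loop prints: b e g l r
```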
Set Operations
Python allows us to perform all the standard mathematical set operations, using the functions of set. Note that each of these set operations has several forms. One of these forms, s1.function(s2), returns a new set created by applying "function" to s1 and s2. The other form, s1.function_update(s2), changes s1 in place to be the set created by "function" of s1 and s2. Finally, some functions have equivalent special operators. For example, s1 & s2 is equivalent to s1.intersection(s2).
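The three forms can be compared side by side. This Python 3 sketch uses intersection. The same pattern applies to union, difference, and symmetric difference. The augmented operator &= is the in-place counterpart of intersection_update.

```python
s1 = {4, 6, 9}
s2 = {1, 6, 8}

new = s1.intersection(s2)        # returns a new set; s1 is unchanged
same = s1 & s2                   # operator form, same result
print(new == same == {6})        # True
print(s1 == {4, 6, 9})           # True

s1.intersection_update(s2)       # modifies s1 in place and returns None
print(s1)                        # {6}

s3 = {1, 2}
s3 &= {2, 3}                     # augmented operator: in place, like intersection_update
print(s3)                        # {2}

print(s2.intersection([6, 7]))   # {6}: the function form accepts any iterable
```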
Intersection
Any element which is in both s1 and s2 will appear in their intersection.
>>> s1 = set([4, 6, 9])
>>> s2 = set([1, 6, 8])
>>> s1.intersection(s2)
set([6])
>>> s1 & s2
set([6])
>>> s1.intersection_update(s2)
>>> s1
set([6])
Union
The union is the merger of two sets. Any element in s1 or s2 will appear in their union.
>>> s1 = set([4, 6, 9])
>>> s2 = set([1, 6, 8])
>>> s1.union(s2)
set([1, 4, 6, 8, 9])
>>> s1 | s2
set([1, 4, 6, 8, 9])
Note that union's update function is simply "update" above.
Symmetric Difference
The symmetric difference of two sets is the set of elements which are in exactly one of the two sets, but not in both (also called exclusive-or in logic).
>>> s1 = set([4, 6, 9])
>>> s2 = set([1, 6, 8])
>>> s1.symmetric_difference(s2)
set([8, 1, 4, 9])
>>> s1 ^ s2
set([8, 1, 4, 9])
>>> s1.symmetric_difference_update(s2)
>>> s1
set([8, 1, 4, 9])
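The definition can be checked against the other operations. The elements in exactly one of the two sets are those in s1 but not s2, together with those in s2 but not s1. Equivalently, they are the union minus the intersection. This is a Python 3 sketch.

```python
s1 = {4, 6, 9}
s2 = {1, 6, 8}

sym = s1 ^ s2
print(sym == ((s1 - s2) | (s2 - s1)))   # True
print(sym == ((s1 | s2) - (s1 & s2)))   # True
print(sorted(sym))                      # [1, 4, 8, 9]
```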
Set Difference
Python can also find the set difference of s1 and s2, which is the set of elements that are in s1 but not in s2.
>>> s1 = set([4, 6, 9])
>>> s2 = set([1, 6, 8])
>>> s1.difference(s2)
set([9, 4])
>>> s1 - s2
set([9, 4])
>>> s1.difference_update(s2)
>>> s1
set([9, 4])
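Unlike intersection, union, and symmetric difference, set difference is not commutative: swapping the operands generally gives a different result. This is a Python 3 sketch.

```python
s1 = {4, 6, 9}
s2 = {1, 6, 8}

print(sorted(s1 - s2))     # [4, 9]: in s1 but not in s2
print(sorted(s2 - s1))     # [1, 8]: in s2 but not in s1
print(s1 - s2 == s2 - s1)  # False
```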
Multiple sets
Starting with Python 2.6, "union", "intersection", and "difference" can take multiple inputs. For example, using "set.intersection()":
>>> s1 = set([3, 6, 7, 9])
>>> s2 = set([6, 7, 9, 10])
>>> s3 = set([7, 9, 10, 11])
>>> set.intersection(s1, s2, s3)
set([9, 7])
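When the sets are held in a list, argument unpacking with * passes them all at once. This Python 3 sketch assumes the list is non-empty. Called as set.intersection(*groups) with an empty list, intersection receives no set at all and raises TypeError. Starting union from an empty set avoids that problem for union. The method form also takes several arguments, and each one may be any iterable.

```python
groups = [{3, 6, 7, 9}, {6, 7, 9, 10}, {7, 9, 10, 11}]

common = set.intersection(*groups)   # unpack the list into separate arguments
everything = set().union(*groups)    # start from an empty set
print(sorted(common))                # [7, 9]
print(sorted(everything))            # [3, 6, 7, 9, 10, 11]

# Method form with several non-set iterables as arguments:
print(sorted(groups[0].difference([3], (6,))))  # [7, 9]
```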
frozenset
A frozenset is basically the same as a set, except that it is immutable: once it is created, its members cannot be changed. Since frozensets are immutable, they are also hashable, which means that they can be used as members of other sets and as dictionary keys. Frozensets have the same functions as normal sets, except that none of the functions that change the contents (update, remove, pop, etc.) are available.
>>> fs = frozenset([2, 3, 4])
>>> s1 = set([fs, 4, 5, 6])
>>> s1
set([4, frozenset([2, 3, 4]), 6, 5])
>>> fs.intersection(s1)
frozenset([4])
>>> fs.add(6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
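Because a frozenset is hashable, it can serve as a dictionary key whenever the key is naturally an unordered group. This Python 3 sketch uses the edges of an undirected graph as its example. The node names and weights are hypothetical illustration values. When the left operand of an operator such as | is a frozenset, the result is also a frozenset.

```python
# Hypothetical edge weights of an undirected graph, keyed by the
# unordered pair of endpoints: {"a", "b"} and {"b", "a"} are the same key.
weight = {
    frozenset({"a", "b"}): 3,
    frozenset({"b", "c"}): 5,
}
print(weight[frozenset({"b", "a"})])   # 3

fs = frozenset([2, 3, 4])
combined = fs | {5}                    # frozenset on the left: result is a frozenset
print(type(combined).__name__)         # frozenset
print(sorted(combined))                # [2, 3, 4, 5]
```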
Exercises
- Create the set {'cat', 1, 2, 3}, call it s.
- Create the set {'c', 'a', 't', '1', '2', '3'}.
- Create the frozen set {'cat', 1, 2, 3}, call it fs.
- Create a set containing the frozenset fs, it should look like {frozenset({'cat', 2, 3, 1})}.