Background:
The set type was introduced in Python 2.4 and provides a convenient and fast way to store unique, immutable elements with support for set-theoretic operations (union, intersection, etc.). Sets are implemented based on hash tables.
Problem: Many users do not understand the difference between set and list, the specifics of storing elements in a set (unordered, only hashable objects), and the nuances of using sets to optimize searching, uniqueness checks, or working with large datasets.
Solution:
A set is a mutable, unordered container of unique hashable objects. It supports fast membership testing, union, intersection, difference, and symmetric difference operations. Built-in methods include: add, remove, discard, update, intersection, difference, union, symmetric_difference, and others.
Code example:
nums = {1, 3, 5, 7} nums.add(9) nums.update([5, 10]) # 5 is already present, only 10 will be added other = {3, 9, 11} inter = nums & other # intersection {3, 9} # Membership test — faster than with list y = 11 if y in nums: print('Exists!')
Key features:
in operation is incredibly fast (O(1) on average), unlike list (O(n)).Can you add a list or another set to a set?
Answer: No, you cannot. Only hashable (immutable) objects are allowed: strings, numbers, tuples. Lists and sets are mutable, they cannot be added.
Code example:
s = set() s.add([1, 2]) # TypeError: unhashable type: 'list' s.add((1, 2)) # OK
What is the difference between the remove and discard methods of a set?
Answer: remove(value) raises a KeyError if the value is not found. discard(value) quietly does nothing if the element is not present.
Code example:
s = {1, 2, 3} s.remove(4) # KeyError s.discard(4) # No error
Is an empty set {} a set object?
Answer: No. The literal {} always represents an empty dict. To create an empty set, you need to use the set() function.
Code example:
empty_set = {} # This is a dict empty_set_real = set() # This is a set
Negative case
Trying to store unique objects in a list and checking for duplicates using "in" for large datasets.
Pros:
Cons:
Positive case
Using a set to find intersections and unique data in large arrays (for instance, email distributions), avoiding duplicates while performing operations quickly.
Pros:
Cons: