Sets: Store Unique Values Efficiently
Introduction
In this chapter, you will learn Python sets, a data structure designed for unique values. Sets are very useful when you need deduplication, fast membership checks, and mathematical set operations. Once you understand sets, many data-cleaning tasks become much easier.
Prerequisites
- Python 3.10+ installed
- Basic understanding of lists, dictionaries, loops, and conditions
- Ability to run .py files in a terminal or IDE
What Is a Set
A set is an unordered collection of unique elements.
Key points:
- Unordered: no fixed index position
- Unique: duplicate values are automatically removed
- Mutable: you can add and remove elements
# Create a set with duplicate values
numbers = {1, 2, 2, 3, 4, 4}
# Duplicates are removed automatically
print(numbers) # {1, 2, 3, 4}
1) Create Sets
Use {} with elements, or set() from another iterable.
# Create directly
fruits = {"apple", "banana", "orange"}
print(fruits)
# Create from list
scores = [90, 85, 90, 78, 85]
unique_scores = set(scores)
print(unique_scores)
An empty set must be created with set():
# Correct empty set
empty_set = set()
# This is an empty dictionary, not a set
empty_dict = {}
Warning
Do not use {} for an empty set.
{} creates an empty dictionary.
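One quick way to confirm the difference is to inspect the types. This is a minimal sketch; the variable names are illustrative only.

```python
# {} is a dict literal; set() is the only way to build an empty set
maybe_set = {}
real_set = set()

print(type(maybe_set).__name__)  # dict
print(type(real_set).__name__)   # set

# A non-empty brace literal without key: value pairs is a set
print(type({1, 2}).__name__)     # set
```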
2) Add and Remove Elements
Common methods:
- add(): add one element
- update(): add multiple elements from an iterable
- remove(): delete one element (raises KeyError if missing)
- discard(): delete one element (no error if missing)
- pop(): remove and return one arbitrary element
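The difference between remove() and discard() only shows up when the element is absent. The sketch below illustrates that, along with pop(); the tag values are hypothetical.

```python
# Hypothetical tag set used only for illustration
tags = {"python", "beginner", "tips"}

# discard() does nothing when the element is absent
tags.discard("missing")

# remove() raises KeyError when the element is absent
try:
    tags.remove("missing")
except KeyError as exc:
    print(f"remove() failed: {exc!r}")

# pop() removes and returns an arbitrary element; which one is not guaranteed
popped = tags.pop()
print(popped in tags)  # False
print(len(tags))       # 2
```

Prefer discard() when a missing element is not an error in your program, and remove() when it signals a bug you want to hear about.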
# Start set
tags = {"python", "beginner"}
# Add one element
tags.add("practice")
# Add multiple elements
tags.update(["coding", "tips"])
# Remove one element safely
tags.discard("beginner")
# Print current set
print(tags)
3) Membership Check
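Membership checks with in are fast on sets: lookups are hash-based, so they take roughly constant time on average regardless of the set's size.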
# Student IDs set
student_ids = {1001, 1002, 1003}
# Fast membership query
print(1002 in student_ids) # True
print(2001 in student_ids) # False
Tip
Performance Hint
For frequent "exists or not" checks, sets are usually faster than lists, because a list has to be scanned element by element.
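To see the difference on your own machine, you can time both containers with the standard timeit module. This is a rough sketch, not a rigorous benchmark: the data size and repeat count are arbitrary choices, and the exact numbers will vary between machines.

```python
import timeit

# Illustrative data: the same 100,000 integers stored both ways
values_list = list(range(100_000))
values_set = set(values_list)

# Worst case for the list: the target is the last element
target = 99_999

# Time 200 membership checks against each container
list_time = timeit.timeit(lambda: target in values_list, number=200)
set_time = timeit.timeit(lambda: target in values_set, number=200)

print(f"list: {list_time:.4f}s")
print(f"set:  {set_time:.6f}s")
```

On typical hardware the set lookup should be far faster, and the gap grows with the size of the collection.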
4) Set Math Operations
Sets support classic mathematical operations.
Union
Combine all unique elements from both sets.
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a | set_b) # {1, 2, 3, 4, 5}
Intersection
Get common elements.
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a & set_b) # {3}
Difference
Get elements in one set but not the other.
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a - set_b) # {1, 2}
Symmetric Difference
Get elements in either set, but not both.
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a ^ set_b) # {1, 2, 4, 5}
5) Real Mini Example: Email Deduplication
This example removes duplicate emails and checks subscription status.
# Raw email list with duplicates
emails = [
"a@example.com",
"b@example.com",
"a@example.com",
"c@example.com",
]
# Deduplicate
unique_emails = set(emails)
# Print unique result
print(f"Unique email count: {len(unique_emails)}")
for email in unique_emails:
    print(email)
# Membership check
target = "b@example.com"
print(f"Subscribed: {target in unique_emails}")
This pattern is common in user management and notification systems.
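The set operations from section 4 fit the same kind of task. As a sketch, assume two hypothetical sets of addresses, signed_up and unsubscribed (neither comes from the example above); difference and intersection then answer common questions directly.

```python
# Hypothetical data: everyone who signed up, and everyone who unsubscribed
signed_up = {"a@example.com", "b@example.com", "c@example.com"}
unsubscribed = {"b@example.com", "d@example.com"}

# Difference: signed up and still subscribed
active = signed_up - unsubscribed

# Intersection: addresses that appear in both sets
left = signed_up & unsubscribed

# Sort before printing so the output order is stable
print(sorted(active))  # ['a@example.com', 'c@example.com']
print(sorted(left))    # ['b@example.com']
```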
Common Beginner Mistakes
Mistake 1: Expecting Stable Order
Sets are unordered, so output order may vary.
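When you need predictable output, sort the elements first. A minimal sketch with illustrative values follows; note that for strings the raw iteration order can differ from one run of the program to the next.

```python
# Illustrative set of strings; iteration order is not guaranteed
colors = {"red", "green", "blue"}

# sorted() accepts any iterable and always returns a new list
ordered = sorted(colors)
print(ordered)  # ['blue', 'green', 'red']
```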
Mistake 2: Trying to Access by Index
my_set[0] is invalid because sets do not support indexing.
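The sketch below shows the error and two possible workarounds. Which one fits depends on whether you need a specific position or just any element.

```python
my_set = {10, 20, 30}

# Indexing a set raises TypeError
try:
    my_set[0]
except TypeError as exc:
    print(f"TypeError: {exc}")

# Option 1: convert to a sorted list, then index
as_list = sorted(my_set)
print(as_list[0])  # 10

# Option 2: grab one arbitrary element without removing it
any_item = next(iter(my_set))
print(any_item in my_set)  # True
```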
Mistake 3: Using Mutable Types as Set Elements
Set elements must be hashable, so lists and dictionaries cannot be set elements.
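A short sketch of what fails and what works instead. Tuples and frozenset are the usual immutable stand-ins for lists and sets; the values here are illustrative.

```python
# Lists are mutable and unhashable, so this raises TypeError
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    print(f"TypeError: {exc}")

# Tuples of hashable values work as set elements
points = {(1, 2), (3, 4), (1, 2)}
print(len(points))  # 2

# frozenset is the immutable counterpart of set, so it can be nested
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))  # 1
```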
Surprise Practice Challenge
Build a "Class Math Score Deduplicator":
- Let the user enter 8 math scores (duplicates allowed)
- Convert to set to deduplicate
- Convert back to list
- Sort from high to low
- Print final scores
Reference implementation:
# Collect 8 scores from user
raw_scores = []
for i in range(1, 9):
    score = float(input(f"Enter math score #{i}: "))
    raw_scores.append(score)
# Deduplicate using set
unique_scores = set(raw_scores)
# Convert to list and sort descending
sorted_scores = sorted(unique_scores, reverse=True)
# Print result
print("Deduplicated and sorted scores:")
for score in sorted_scores:
    print(score)
FAQ
When should I use a set instead of a list?
Use a set when uniqueness and fast membership checks matter more than order.
Can set elements be duplicated?
No. Duplicate values are removed automatically.
Why does set output order look random?
Because sets are hash-based and make no ordering guarantee. The order you see depends on the elements' hash values and can differ between runs.
How can I keep order and still deduplicate?
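One simple way is list(dict.fromkeys(items)): dictionaries preserve insertion order in Python 3.7+, so the first occurrence of each value keeps its original position. Alternatively, track already-seen values in a set while building a list.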
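Both approaches can be sketched as follows. The email list is illustrative, and seen and result are hypothetical helper names.

```python
# Illustrative input with duplicates
emails = ["b@example.com", "a@example.com", "b@example.com", "c@example.com"]

# dict keys are unique and, since Python 3.7, keep insertion order
unique_in_order = list(dict.fromkeys(emails))
print(unique_in_order)
# ['b@example.com', 'a@example.com', 'c@example.com']

# Alternative: track seen items in a set while building a list
seen = set()
result = []
for email in emails:
    if email not in seen:
        seen.add(email)
        result.append(email)
print(result == unique_in_order)  # True
```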