Sets: Store Unique Values Efficiently

Introduction

In this chapter, you will learn how to use Python sets, a data structure that stores only unique values. Sets are useful when you need to remove duplicates, check membership quickly, or perform mathematical set operations such as union and intersection. Once you understand sets, many data-cleaning tasks become much simpler.

Prerequisites

  • Python 3.10+ installed
  • Basic understanding of lists, dictionaries, loops, and conditions
  • Ability to run .py files in a terminal or an IDE

What Is a Set

A set is an unordered collection of unique elements.

Key points:

  • Unordered: no fixed index position
  • Unique: duplicate values are automatically removed
  • Mutable: you can add and remove elements

python
# Create a set with duplicate values
numbers = {1, 2, 2, 3, 4, 4}
 
# Duplicates are removed automatically
print(numbers)  # {1, 2, 3, 4}

1) Create Sets

Use curly braces {} with elements inside, or pass any iterable (such as a list) to set().

python
# Create directly
fruits = {"apple", "banana", "orange"}
print(fruits)
 
# Create from list
scores = [90, 85, 90, 78, 85]
unique_scores = set(scores)
print(unique_scores)

To create an empty set, you must use set():

python
# Correct empty set
empty_set = set()
 
# This is an empty dictionary, not a set
empty_dict = {}

Warning

Do not use {} for an empty set.
{} creates an empty dictionary.
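
A quick way to confirm this is to check the type of each value with the built-in type() function. The variable names below are illustrative only.

```python
# {} builds an empty dict, while set() builds an empty set
empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

# Braces that contain plain values (not key: value pairs) form a set literal
primes = {2, 3, 5}
print(type(primes))      # <class 'set'>
```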

2) Add and Remove Elements

Common methods:

  • add(): add one element
  • update(): add multiple elements from an iterable
  • remove(): delete one element (raises KeyError if it is missing)
  • discard(): delete one element (no error if it is missing)
  • pop(): remove and return an arbitrary element (raises KeyError if the set is empty)

python
# Start set
tags = {"python", "beginner"}
 
# Add one element
tags.add("practice")
 
# Add multiple elements
tags.update(["coding", "tips"])
 
# Remove one element safely
tags.discard("beginner")
 
# Print current set
print(tags)
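
The difference between remove(), discard(), and pop() shows up when an element is missing or the set is empty. The following is a small sketch that catches the KeyError these methods can raise. The tag values are made up for illustration.

```python
tags = {"python", "beginner", "practice"}

# discard() does nothing when the element is missing
tags.discard("missing-tag")

# remove() raises KeyError when the element is missing
try:
    tags.remove("missing-tag")
except KeyError as err:
    print(f"remove() failed with KeyError: {err}")

# pop() removes and returns an arbitrary element
removed = tags.pop()
print(f"Popped: {removed}")
print(f"Remaining: {len(tags)} elements")

# pop() on an empty set also raises KeyError
empty = set()
try:
    empty.pop()
except KeyError:
    print("pop() failed: the set is empty")
```

Use discard() when a missing element is normal. Use remove() when a missing element indicates a bug that you want to notice.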

3) Membership Check

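Checking membership with the in operator is very fast. A set is built on a hash table, so a lookup takes constant time on average. A list, by contrast, has to compare elements one by one.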

python
# Student IDs set
student_ids = {1001, 1002, 1003}
 
# Fast membership query
print(1002 in student_ids)  # True
print(2001 in student_ids)  # False

Tip

Performance Hint

If you check whether values exist many times, convert the collection to a set once and run the checks against the set instead of a list.
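
You can see the difference yourself with the standard-library timeit module. This is a rough sketch: the collection size (100,000 IDs) and the repetition count are arbitrary choices, and exact timings depend on your machine and Python version.

```python
import timeit

# The same 100,000 IDs stored two ways
ids_list = list(range(100_000))
ids_set = set(ids_list)

target = 99_999  # the last element, the slowest case for a list scan

# Both containers give the same answer
print(target in ids_list, target in ids_set)  # True True

# The list is scanned element by element; the set uses a hash lookup
list_time = timeit.timeit(lambda: target in ids_list, number=100)
set_time = timeit.timeit(lambda: target in ids_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```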

4) Set Math Operations

Sets support classic mathematical operations.

Union

Combine all unique elements from both sets.

python
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a | set_b)  # {1, 2, 3, 4, 5}

Intersection

Get common elements.

python
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a & set_b)  # {3}

Difference

Get elements that are in the first set but not in the second.

python
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a - set_b)  # {1, 2}
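
Unlike union and intersection, difference depends on operand order. Swapping the two sets gives a different result:

```python
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# Elements of set_a that are not in set_b
print(set_a - set_b)  # {1, 2}

# Elements of set_b that are not in set_a
print(set_b - set_a)  # {4, 5}
```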

Symmetric Difference

Get elements in either set, but not both.

python
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a ^ set_b)  # {1, 2, 4, 5}
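
Each operator also has a method equivalent. The methods accept any iterable, while the operators require both operands to be sets (or frozensets). Augmented operators such as &= update the left set in place. The following is a short sketch that reuses the sets above:

```python
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# Method equivalents of |, &, -, ^
print(set_a.union(set_b))                 # {1, 2, 3, 4, 5}
print(set_a.intersection(set_b))          # {3}
print(set_a.difference(set_b))            # {1, 2}
print(set_a.symmetric_difference(set_b))  # {1, 2, 4, 5}

# Methods accept any iterable, such as a list
print(set_a.union([5, 6]))                # {1, 2, 3, 5, 6}
# set_a | [5, 6] would raise TypeError

# &= keeps only the shared elements and modifies set_a itself
set_a &= set_b
print(set_a)                              # {3}
```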

5) Real Mini Example: Email Deduplication

This example removes duplicate emails and checks subscription status.

python
# Raw email list with duplicates
emails = [
    "a@example.com",
    "b@example.com",
    "a@example.com",
    "c@example.com",
]
 
# Deduplicate
unique_emails = set(emails)
 
# Print unique result
print(f"Unique email count: {len(unique_emails)}")
for email in unique_emails:
    print(email)
 
# Membership check
target = "b@example.com"
print(f"Subscribed: {target in unique_emails}")

This pattern is common in user management and notification systems.
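
A plain set() treats "a@example.com" and " A@Example.com" as different strings. The following is a minimal sketch that normalizes addresses before deduplicating. It assumes that addresses differing only in letter case or surrounding spaces belong to the same person. Strictly speaking, the email standard allows the part before @ to be case-sensitive, although most providers ignore case. The helper normalize_email is hypothetical and not part of any library.

```python
raw_emails = [
    "a@example.com",
    " A@Example.com",
    "b@example.com ",
    "c@example.com",
]

def normalize_email(email: str) -> str:
    """Hypothetical helper: trim surrounding spaces and lowercase the address."""
    return email.strip().lower()

# Set comprehension: normalize each address and keep one copy of each
unique_emails = {normalize_email(e) for e in raw_emails}
print(sorted(unique_emails))  # ['a@example.com', 'b@example.com', 'c@example.com']
```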

Common Beginner Mistakes

Mistake 1: Expecting Stable Order

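Sets are unordered, so the printed or iteration order may differ from the order in which you added elements. For sets of strings, the order can even change between program runs, because string hashing is randomized per process.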
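
When you need a predictable order, the built-in sorted() function returns a new sorted list from any iterable, including a set:

```python
colors = {"red", "green", "blue"}

# Iteration order of the set is arbitrary; sorted() gives a stable, ordered list
for color in sorted(colors):
    print(color)  # prints blue, green, red (one per line)
```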

Mistake 2: Trying to Access by Index

my_set[0] raises a TypeError because sets do not support indexing.
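
The following sketch shows the error and two common workarounds: sort the set into a list and index that list, or take one arbitrary element with next(iter(...)) without removing it.

```python
my_set = {10, 20, 30}

try:
    my_set[0]
except TypeError as err:
    print(f"TypeError: {err}")  # 'set' object is not subscriptable

# Option 1: convert to a sorted list, then index the list
smallest = sorted(my_set)[0]
print(smallest)  # 10

# Option 2: get an arbitrary element without removing it
any_element = next(iter(my_set))
print(any_element in my_set)  # True
```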

Mistake 3: Using Mutable Types as Set Elements

Set elements must be hashable, so mutable types such as lists, dictionaries, and other sets cannot be set elements. Use tuples or frozenset instead.
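
The following sketch shows the TypeError for a list element. It then shows tuples and frozenset, the immutable built-in set type, working as elements:

```python
# Lists are mutable and therefore unhashable
try:
    bad = {[1, 2], [3, 4]}
except TypeError as err:
    print(f"TypeError: {err}")

# Tuples of hashable values can be set elements (duplicates still collapse)
points = {(1, 2), (3, 4), (1, 2)}
print(len(points))  # 2

# frozenset is immutable and hashable, so it can be stored inside a set
groups = {frozenset({"a", "b"}), frozenset({"b", "a"})}
print(len(groups))  # 1
```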

Surprise Practice Challenge

Build a "Class Math Score Deduplicator":

  1. Let the user enter 8 math scores (duplicates allowed)
  2. Convert to set to deduplicate
  3. Convert back to list
  4. Sort from high to low
  5. Print final scores

Reference implementation:

python
# Collect 8 scores from user
raw_scores = []
for i in range(1, 9):
    score = float(input(f"Enter math score #{i}: "))
    raw_scores.append(score)
 
# Deduplicate using set
unique_scores = set(raw_scores)
 
# sorted() accepts the set directly and returns a new list (covers steps 3 and 4)
sorted_scores = sorted(unique_scores, reverse=True)
 
# Print result
print("Deduplicated and sorted scores:")
for score in sorted_scores:
    print(score)
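
To test the deduplication logic without typing scores each time, you can move it into a function and pass in sample data. The function name dedupe_scores_desc and the sample scores below are hypothetical:

```python
def dedupe_scores_desc(scores: list[float]) -> list[float]:
    """Hypothetical helper: drop duplicate scores and return them high to low."""
    return sorted(set(scores), reverse=True)

# Sample data standing in for the 8 scores a user would type
sample = [88.0, 92.5, 88.0, 75.0, 92.5, 100.0, 60.0, 75.0]
result = dedupe_scores_desc(sample)
print(result)  # [100.0, 92.5, 88.0, 75.0, 60.0]
```

Keeping input() outside the function makes the core logic easy to check with fixed data.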

FAQ

When should I use a set instead of a list?

Use a set when uniqueness and fast membership checks matter more than order.

Can set elements be duplicated?

No. Duplicate values are removed automatically.

Why does set output order look random?

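Sets are implemented as hash tables, so iteration order depends on the elements' hash values and the set's internal layout, not on the order in which you added them.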

How can I keep order and still deduplicate?

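Use dict.fromkeys(). Dictionary keys are unique and, since Python 3.7, keep insertion order, so list(dict.fromkeys(items)) keeps the first occurrence of each value in its original order. Another option is to loop over the items and track the values you have already seen in a set.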
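
Here are both order-preserving approaches side by side. The list items is sample data:

```python
items = ["b", "a", "b", "c", "a"]

# Option 1: dict keys are unique and keep insertion order (Python 3.7+)
ordered_unique = list(dict.fromkeys(items))
print(ordered_unique)  # ['b', 'a', 'c']

# Option 2: explicit loop with a set of values already seen
seen = set()
ordered_unique_loop = []
for item in items:
    if item not in seen:
        seen.add(item)
        ordered_unique_loop.append(item)
print(ordered_unique_loop)  # ['b', 'a', 'c']
```

The loop version is longer, but it is easy to extend. For example, you could compare normalized values while keeping the original ones.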