Deduplicate and Sort Scores in Python
Introduction
In this chapter, you will build a practical mini exercise: deduplicate student math scores and sort them from high to low. This task is simple but very useful in real-world data cleaning scenarios. You will combine lists, sets, and sorting in one complete workflow.
Prerequisites
- Python
3.10+installed - Basic understanding of lists, sets, loops, and numeric values
- Ability to run
.pyfiles in terminal or IDE
Project Goal
Implement this requirement:
- Read student math scores (with possible duplicates)
- Remove duplicate scores
- Sort final scores from high to low
- Print result clearly
This is a common data-processing pattern in reports and ranking systems.
1) Prepare Raw Scores
You can start with hardcoded sample data.
# Raw math scores with duplicates
scores = [95, 88, 95, 76, 88, 92, 100, 76]
# Print original list
print("Original scores:")
print(scores)2) Deduplicate with set
Convert the list to a set to remove duplicates.
# Deduplicate scores
unique_scores = set(scores)
# Print deduplicated data
print("Unique scores:")
print(unique_scores)Because sets only keep unique values, repeated scores are removed automatically.
Warning
Set output order is not guaranteed.
After deduplication, always sort if order matters.
3) Sort from High to Low
Use sorted(..., reverse=True) for descending order.
# Convert to sorted list (descending)
sorted_scores = sorted(unique_scores, reverse=True)
# Print final sorted result
print("Final scores (high to low):")
print(sorted_scores)4) Full Runnable Version
Here is the complete script:
# Raw math scores with duplicates
scores = [95, 88, 95, 76, 88, 92, 100, 76]
# Deduplicate scores
unique_scores = set(scores)
# Sort from high to low
sorted_scores = sorted(unique_scores, reverse=True)
# Print final output
print("Original scores:")
print(scores)
print("Deduplicated and sorted scores (high to low):")
print(sorted_scores)5) User Input Version
Now let users input scores manually.
# Create empty list for raw scores
scores = []
# Ask how many scores to input
count = int(input("How many scores will you enter? "))
# Collect scores
for i in range(1, count + 1):
score = float(input(f"Enter score #{i}: "))
scores.append(score)
# Deduplicate and sort
unique_scores = set(scores)
sorted_scores = sorted(unique_scores, reverse=True)
# Print output
print("Deduplicated and sorted scores (high to low):")
for score in sorted_scores:
print(score)Tip
Best Practice
For score data, validate the range (0-100) before storing values.
6) Enhanced Output with Ranking Numbers
You can add ranking labels for clearer display.
# Example sorted scores
sorted_scores = [100, 95, 92, 88, 76]
# Print ranking table
print("=== Math Score Ranking ===")
for idx, score in enumerate(sorted_scores, start=1):
print(f"{idx}. {score}")Common Beginner Mistakes
Mistake 1: Forgetting Descending Sort
Default sort is ascending. For high-to-low ranking, use reverse=True.
Mistake 2: Trying to Index a Set
Sets do not support indexing. Convert to list if positional access is needed.
Mistake 3: Skipping Input Validation
Without validation, invalid scores like -5 or 120 may enter your dataset.
Surprise Practice Challenge
Upgrade this script with grade levels:
- Deduplicate and sort scores
- For each score, print grade label:
Afor score >= 90Bfor score >= 80Cfor score >= 70Dotherwise
- Show count of each grade level
If you finish this, you are moving from simple sorting to small-scale score analytics.
FAQ
Why convert list to set first?
Because set automatically removes duplicates, which simplifies preprocessing.
Can I keep insertion order while deduplicating?
Yes. You can use dict.fromkeys(scores) to keep first-appearance order in modern Python.
Should scores be int or float?
Both are fine. Use int for whole-number scoring systems and float for decimal scores.
What is the time benefit of using set for deduplication?
Set-based deduplication is generally efficient and commonly used for large datasets.