Deduplicate and Sort Scores in Python

Introduction

In this chapter, you will build a practical mini exercise: deduplicate student math scores and sort them from high to low. This task is simple but very useful in real-world data cleaning scenarios. You will combine lists, sets, and sorting in one complete workflow.

Prerequisites

  • Python 3.10+ installed
  • Basic understanding of lists, sets, loops, and numeric values
  • Ability to run .py files in terminal or IDE

Project Goal

Implement this requirement:

  • Read student math scores (with possible duplicates)
  • Remove duplicate scores
  • Sort final scores from high to low
  • Print result clearly

This is a common data-processing pattern in reports and ranking systems.

1) Prepare Raw Scores

You can start with hardcoded sample data.

python
# Raw math scores with duplicates
scores = [95, 88, 95, 76, 88, 92, 100, 76]
 
# Print original list
print("Original scores:")
print(scores)

2) Deduplicate with set

Convert the list to a set to remove duplicates.

python
# Deduplicate scores
unique_scores = set(scores)
 
# Print deduplicated data
print("Unique scores:")
print(unique_scores)

Because sets only keep unique values, repeated scores are removed automatically.

Warning

Set output order is not guaranteed.
After deduplication, always sort if order matters.

3) Sort from High to Low

Use sorted(..., reverse=True) for descending order.

python
# Convert to sorted list (descending)
sorted_scores = sorted(unique_scores, reverse=True)
 
# Print final sorted result
print("Final scores (high to low):")
print(sorted_scores)

4) Full Runnable Version

Here is the complete script:

python
# Raw math scores with duplicates
scores = [95, 88, 95, 76, 88, 92, 100, 76]
 
# Deduplicate scores
unique_scores = set(scores)
 
# Sort from high to low
sorted_scores = sorted(unique_scores, reverse=True)
 
# Print final output
print("Original scores:")
print(scores)
print("Deduplicated and sorted scores (high to low):")
print(sorted_scores)

5) User Input Version

Now let users input scores manually.

python
# Create empty list for raw scores
scores = []
 
# Ask how many scores to input
count = int(input("How many scores will you enter? "))
 
# Collect scores
for i in range(1, count + 1):
    score = float(input(f"Enter score #{i}: "))
    scores.append(score)
 
# Deduplicate and sort
unique_scores = set(scores)
sorted_scores = sorted(unique_scores, reverse=True)
 
# Print output
print("Deduplicated and sorted scores (high to low):")
for score in sorted_scores:
    print(score)

Tip

Best Practice

For score data, validate the range (0-100) before storing values.

6) Enhanced Output with Ranking Numbers

You can add ranking labels for clearer display.

python
# Example sorted scores
sorted_scores = [100, 95, 92, 88, 76]
 
# Print ranking table
print("=== Math Score Ranking ===")
for idx, score in enumerate(sorted_scores, start=1):
    print(f"{idx}. {score}")

Common Beginner Mistakes

Mistake 1: Forgetting Descending Sort

Default sort is ascending. For high-to-low ranking, use reverse=True.

Mistake 2: Trying to Index a Set

Sets do not support indexing. Convert to list if positional access is needed.

Mistake 3: Skipping Input Validation

Without validation, invalid scores like -5 or 120 may enter your dataset.

Surprise Practice Challenge

Upgrade this script with grade levels:

  1. Deduplicate and sort scores
  2. For each score, print grade label:
    • A for score >= 90
    • B for score >= 80
    • C for score >= 70
    • D otherwise
  3. Show count of each grade level

If you finish this, you are moving from simple sorting to small-scale score analytics.

FAQ

Why convert list to set first?

Because set automatically removes duplicates, which simplifies preprocessing.

Can I keep insertion order while deduplicating?

Yes. You can use dict.fromkeys(scores) to keep first-appearance order in modern Python.

Should scores be int or float?

Both are fine. Use int for whole-number scoring systems and float for decimal scores.

What is the time benefit of using set for deduplication?

Set-based deduplication is generally efficient and commonly used for large datasets.