When working with Python lists, you’ll often need to remove duplicate values from a single list, remove duplicates found across multiple lists, or identify the duplicate or unique values in one or more lists.
Python includes a number of features that make deduplicating list values fairly simple. In this quick tutorial I’ll show you how to use three of them, dict.fromkeys(), set(), and intersection(), to identify and remove duplicate or unique values in Python lists.
First, let’s create a couple of lists that contain a few duplicate values, and some values that are unique to each list. As you can see, Macan and Cayenne are found in both lists, but Range Rover is only found in the first, and G Wagen and Defender are only found in the second.
first = ['Macan', 'Cayenne', 'Range Rover']
second = ['Macan', 'Cayenne', 'G Wagen', 'Defender']
To identify values only found in the first list we can use the Python set() function. We’ll pass the first list to set(), subtract the output of passing the second list to set(), and then cast the result to a list using list().
found_in_first_only = list(set(first) - set(second))
If you print found_in_first_only you’ll get back a list of the values that are unique to the first list.
found_in_first_only
['Range Rover']
To identify values only found in the second list we can use set() again, but this time subtract set(first) from set(second) to get the values that are unique to the second list.
found_in_second_only = list(set(second) - set(first))
Printing found_in_second_only reveals that the Defender and G Wagen values are only found in the second list and not the first. Because sets are unordered, the order of the values in the output may differ from one run to the next.
found_in_second_only
['Defender', 'G Wagen']
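If the order of the result matters, a set difference alone won't guarantee it. The following is a minimal sketch of two possible workarounds, not part of the original walkthrough: filtering with a list comprehension to keep the original list order, or sorting the set difference. The names first_lookup, second_only_ordered, and second_only_sorted are made up for illustration.

```python
first = ['Macan', 'Cayenne', 'Range Rover']
second = ['Macan', 'Cayenne', 'G Wagen', 'Defender']

# Build the lookup set once so each membership test is fast.
first_lookup = set(first)

# Keep items from `second` in their original order, skipping any found in `first`.
second_only_ordered = [car for car in second if car not in first_lookup]
print(second_only_ordered)  # ['G Wagen', 'Defender']

# Alternatively, sort the set difference to get a repeatable (alphabetical) order.
second_only_sorted = sorted(set(second) - set(first))
print(second_only_sorted)  # ['Defender', 'G Wagen']
```

The list comprehension keeps any repeated values that appear within the second list itself, so it only filters against the first list and does not dedupe.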
To identify items that are duplicated and found in both the first and second lists we can use the set method intersection(). Here we’ll append .intersection(set(second)) to set(first) and then cast the output to a list using list().
found_in_both = list(set(first).intersection(set(second)))
Printing found_in_both shows us that the values Cayenne and Macan are present in both the first and second lists.
found_in_both
['Cayenne', 'Macan']
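As a small aside, intersection() accepts any iterable, and sets also support the & operator for the same operation. The sketch below compares the two spellings; both_method and both_operator are illustrative names only, and sorted() is used just to make the printed order repeatable.

```python
first = ['Macan', 'Cayenne', 'Range Rover']
second = ['Macan', 'Cayenne', 'G Wagen', 'Defender']

# intersection() accepts any iterable, so the second list can be passed directly.
both_method = set(first).intersection(second)

# The & operator needs a set on both sides.
both_operator = set(first) & set(second)

print(both_method == both_operator)  # True
print(sorted(both_method))  # ['Cayenne', 'Macan']
```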
Another common problem you’ll encounter is identifying or removing duplicate values present in a single Python list. This is also pretty easy. First, we’ll use extend() to join the first and second lists together into one continuous list containing some duplicate values. Note that extend() modifies the first list in place.
first.extend(second)
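Next we’ll pass first to dict.fromkeys(), cast the output to a list using list(), and assign it to deduped. Dictionary keys must be unique, so this returns a Python list containing only the unique values with the duplicates removed. Since Python 3.7, dictionaries also preserve insertion order, so the values keep the order in which they first appeared.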
deduped = list(dict.fromkeys(first))
deduped
['Macan', 'Cayenne', 'Range Rover', 'G Wagen', 'Defender']
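If you'd rather leave the original lists untouched, one option is to concatenate them with + instead of calling extend(). This is a minimal sketch of that variation, assuming the two lists as they were first defined:

```python
first = ['Macan', 'Cayenne', 'Range Rover']
second = ['Macan', 'Cayenne', 'G Wagen', 'Defender']

# first + second builds a new list, so neither original list is changed.
deduped = list(dict.fromkeys(first + second))

print(deduped)  # ['Macan', 'Cayenne', 'Range Rover', 'G Wagen', 'Defender']
print(first)    # ['Macan', 'Cayenne', 'Range Rover']
```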
Here’s another example of that in action, showing a single list containing duplicate values that we’re deduping using dict.fromkeys(). It’s a quick and easy way to dedupe a Python list and find the unique values.
cars = ['Maserati', 'Ferrari', 'Porsche', 'Gilbern', 'Bitter', 'Bitter', 'Lotus', 'Lotus']
deduped = list(dict.fromkeys(cars))
deduped
['Maserati', 'Ferrari', 'Porsche', 'Gilbern', 'Bitter', 'Lotus']
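dict.fromkeys() gives you the unique values, but it doesn't tell you which values were repeated. If you need that, one possible approach (not covered above) is collections.Counter from the standard library. The sketch below is only an illustration, and the names counts and duplicated are hypothetical.

```python
from collections import Counter

cars = ['Maserati', 'Ferrari', 'Porsche', 'Gilbern', 'Bitter', 'Bitter', 'Lotus', 'Lotus']

# Count how many times each value appears in the list.
counts = Counter(cars)

# Keep only the values that appear more than once, in first-seen order.
duplicated = [car for car, count in counts.items() if count > 1]
print(duplicated)  # ['Bitter', 'Lotus']
```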
Matt Clarke, Saturday, April 23, 2022