Lists are one of the most widely used data types in Python and are used throughout its data science packages. Along with the dictionary, tuple, and set, the list allows you to store data and retrieve it quickly using a variety of techniques. Here are the basics you need to know.
Creating a list in Python is very simple. You assign a variable name, such as companies below, to a list of comma-separated values enclosed in a pair of square brackets. String values need to be enclosed in double or single quotes.
companies = ['Hooli', 'Pied Piper', 'Aviato']
print(companies)
['Hooli', 'Pied Piper', 'Aviato']
Numeric values, such as int or float values, don’t need to be enclosed in quotes. You can include as many values as you like, and you are free to repeat values and mix types if you wish.
numbers = [2, 6, 56, 4, 2.99, 53]
print(numbers)
[2, 6, 56, 4, 2.99, 53]
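To illustrate mixing types and repeating values, here is a minimal sketch. The mixed list and its contents are made up for this example and are not part of the lists used elsewhere in this post.

```python
# Hypothetical list mixing strings, ints, floats and a repeated value
mixed = ['Hooli', 42, 2.99, 42, True]

# Collect the type name of each item to show that types can differ
types = [type(item).__name__ for item in mixed]
print(types)  # ['str', 'int', 'float', 'int', 'bool']
```

Python places no restriction on what a list holds, though lists of a single type are usually easier to sort and process.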
To add an item to an existing list, call the list's append() method and pass the value you wish to add. For example, companies.append('Sliceline') will add the string “Sliceline” to the end of our companies list.
companies.append('Sliceline')
print(companies)
['Hooli', 'Pied Piper', 'Aviato', 'Sliceline']
The process for removing an item from a list is much the same. You just need to swap append() for remove(), and the value you pass will be removed from the list.
companies.remove('Sliceline')
companies
['Hooli', 'Pied Piper', 'Aviato']
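Two details of remove() are worth knowing: it removes only the first matching item, and it raises a ValueError if the value is not in the list. The sketch below shows both. The names list and the 'Gavin' value are chosen purely for illustration.

```python
# Hypothetical list containing a repeated value
names = ['Bob', 'Mike', 'Phil', 'Bob']

names.remove('Bob')   # removes only the first 'Bob'
print(names)          # ['Mike', 'Phil', 'Bob']

# Removing a value that is not present raises ValueError
missing = False
try:
    names.remove('Gavin')
except ValueError:
    missing = True
print(missing)        # True
```

Checking with in before calling remove(), or catching the ValueError as above, avoids an unexpected crash.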
If you want to remove several items from a list, you can put them in a second list, iterate over it with a for loop, and run remove() on each element. The little function below removes the items in remove_list, here ['Erlich', 'Jian Yang'], from target_list. Note that it modifies target_list in place and returns that same list.
def remove_from_list(target_list, remove_list):
    # Remove each unwanted item from target_list if it is present
    for item in remove_list:
        if item in target_list:
            target_list.remove(item)
    return target_list
target_list = ['Richard', 'Gilfoyle', 'Dinesh', 'Jared', 'Erlich', 'Jian Yang']
remove_list = ['Erlich', 'Jian Yang']
new_list = remove_from_list(target_list, remove_list)
new_list
['Richard', 'Gilfoyle', 'Dinesh', 'Jared']
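If you would rather leave the original list untouched, one alternative is a list comprehension that builds a new list. This is only a sketch: the without_items name is hypothetical, and it assumes the items are hashable (strings and numbers are) so they can be placed in a set.

```python
def without_items(target_list, remove_list):
    """Return a new list with every occurrence of the unwanted items dropped."""
    unwanted = set(remove_list)  # assumes hashable items, e.g. strings
    return [item for item in target_list if item not in unwanted]

team = ['Richard', 'Gilfoyle', 'Dinesh', 'Jared', 'Erlich', 'Jian Yang']
remaining = without_items(team, ['Erlich', 'Jian Yang'])

print(remaining)  # ['Richard', 'Gilfoyle', 'Dinesh', 'Jared']
print(len(team))  # 6, the original list is unchanged
```

Unlike remove(), this approach drops every occurrence of each unwanted value, not just the first.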
To merge or concatenate two or more lists, you can simply use the + operator. Here we’ll create a new list called technologies by concatenating the languages and databases lists.
languages = ['Python', 'PHP', 'JavaScript']
databases = ['MySQL', 'PostgreSQL', 'SQL Server']
technologies = languages + databases
technologies
['Python', 'PHP', 'JavaScript', 'MySQL', 'PostgreSQL', 'SQL Server']
The other way to join lists together is to use the extend() method. Below we’re passing the visualisation_packages list as an argument to extend() and using it to add its items onto the end of the existing packages list.
packages = ['Pandas', 'NumPy']
visualisation_packages = ['Matplotlib', 'Seaborn']
packages.extend(visualisation_packages)
packages
['Pandas', 'NumPy', 'Matplotlib', 'Seaborn']
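The + operator, extend(), and append() are easy to confuse, so here is a short sketch comparing them. The combined, result, and nested variable names are introduced only for this illustration.

```python
packages = ['Pandas', 'NumPy']
visualisation_packages = ['Matplotlib', 'Seaborn']

# + builds a new list and leaves both originals untouched
combined = packages + visualisation_packages
print(packages)   # ['Pandas', 'NumPy']

# extend() changes the list in place and returns None
result = packages.extend(visualisation_packages)
print(result)     # None
print(packages)   # ['Pandas', 'NumPy', 'Matplotlib', 'Seaborn']

# append() adds its argument as a single item, so a list becomes nested
nested = ['Pandas', 'NumPy']
nested.append(visualisation_packages)
print(nested)     # ['Pandas', 'NumPy', ['Matplotlib', 'Seaborn']]
```

Because extend() returns None, writing packages = packages.extend(...) would replace your list with None, which is a common slip.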
To iterate over a list, either to print the values or to perform some kind of calculation on the contents, you can use a for loop. Below we’re iterating over the technologies list with a for loop and printing the value assigned to technology on each pass.
for technology in technologies:
    print(technology)
Python
PHP
JavaScript
MySQL
PostgreSQL
SQL Server
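If you also need each item's position while looping, Python's built-in enumerate() supplies it alongside the value. The sketch below uses a shortened technologies list and a hypothetical lines list to collect the results.

```python
technologies = ['Python', 'PHP', 'JavaScript']

# enumerate() yields (position, value) pairs, counting from zero
lines = []
for position, technology in enumerate(technologies):
    lines.append(f"{position}: {technology}")

print(lines)  # ['0: Python', '1: PHP', '2: JavaScript']
```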
To sort a list you can use the sorted() function. By default, it returns a new list with strings in alphabetical order or numbers in ascending order, leaving the original list unchanged.
sorted_technologies = sorted(technologies)
sorted_technologies
['JavaScript', 'MySQL', 'PHP', 'PostgreSQL', 'Python', 'SQL Server']
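A few variations on sorting are sketched below: descending order with reverse=True, case-insensitive ordering with key=str.lower, and in-place sorting with the list's own sort() method. The mixed_case and numbers lists are made up for the example.

```python
technologies = ['Python', 'PHP', 'JavaScript', 'MySQL', 'PostgreSQL', 'SQL Server']

# reverse=True sorts in descending order
descending = sorted(technologies, reverse=True)
print(descending)  # ['SQL Server', 'Python', 'PostgreSQL', 'PHP', 'MySQL', 'JavaScript']

# Uppercase letters sort before lowercase ones by default;
# key=str.lower compares the values case-insensitively instead
mixed_case = ['pandas', 'NumPy', 'matplotlib']
print(sorted(mixed_case))                 # ['NumPy', 'matplotlib', 'pandas']
print(sorted(mixed_case, key=str.lower))  # ['matplotlib', 'NumPy', 'pandas']

# list.sort() sorts the existing list in place and returns None
numbers = [2, 6, 56, 4, 2.99, 53]
numbers.sort()
print(numbers)  # [2, 2.99, 4, 6, 53, 56]
```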
To reverse the order of a list you can use the reverse() method. Applying reverse() to our list containing ['A', 'B', 'C'] reverses it in place, so it becomes ['C', 'B', 'A'].
letters = ['A', 'B', 'C']
letters.reverse()
letters
['C', 'B', 'A']
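Since reverse() changes the list itself, you may sometimes prefer a reversed copy. Two common ways to get one are sketched here. The backwards and also_backwards names are just for illustration.

```python
letters = ['A', 'B', 'C']

# Slicing with a step of -1 returns a reversed copy
backwards = letters[::-1]

# reversed() returns an iterator, so wrap it in list() to get a list
also_backwards = list(reversed(letters))

print(backwards)       # ['C', 'B', 'A']
print(also_backwards)  # ['C', 'B', 'A']
print(letters)         # ['A', 'B', 'C'], the original is unchanged
```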
To check whether a given value is present in a list, you can use an if statement with the in operator or the not in operator.
if 'Python' in technologies:
    print("Python found!")
Python found!
if 'Go' not in technologies:
    print("Go not found!")
Go not found!
To find the length of a list, or the total number of items present, you can use len() and pass the list variable as the argument.
names = ['Bob', 'Mike', 'Phil', 'Bob']
len(names)
4
To find how many times a given item appears in a list, you can use count() and pass the item value as the argument. Below names.count('Bob') returns 2 because Bob is present twice.
names = ['Bob', 'Mike', 'Phil', 'Bob']
names.count('Bob')
2
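If you need the counts for every item at once, or the number of distinct values, the standard library can help. This sketch uses collections.Counter and set(). The tally variable name is hypothetical.

```python
from collections import Counter

names = ['Bob', 'Mike', 'Phil', 'Bob']

# Counter tallies how many times each value appears
tally = Counter(names)
print(tally['Bob'])          # 2
print(tally.most_common(1))  # [('Bob', 2)]

# A set keeps one copy of each value, so its length is the unique count
print(len(set(names)))       # 3
```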
To copy or duplicate a list you can call its copy() method.
names_again = names.copy()
names_again
['Bob', 'Mike', 'Phil', 'Bob']
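The reason to use copy() rather than plain assignment is that assignment only creates a second name for the same list. The sketch below shows the difference. The alias and duplicate names, and the 'Gavin' value, are made up for the example.

```python
names = ['Bob', 'Mike', 'Phil', 'Bob']

alias = names             # a second name for the same list object
duplicate = names.copy()  # a new list holding the same items

names.append('Gavin')

print(alias)      # ['Bob', 'Mike', 'Phil', 'Bob', 'Gavin']
print(duplicate)  # ['Bob', 'Mike', 'Phil', 'Bob']
```

Note that copy() makes a shallow copy: if the list contains other lists, those inner lists are still shared between the original and the copy.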
To clear or empty a list you can call its clear() method. This removes every item in place, leaving an empty list.
names_again.clear()
names_again
[]
Although you can’t see it when you view a list, each item has a numeric index. As in most programming languages, indices in Python start at zero, so the first item in the list below, “MySQL”, occupies index 0, not index 1. You can return the index for a given list value using index().
databases = ['MySQL', 'PostgreSQL', 'SQL Server']
index = databases.index('PostgreSQL')
index
1
Once you’ve got the index for a list item, you can access the item directly by placing the index value in square brackets after the list name. So databases[1] will return “PostgreSQL”, as this occupies index 1 in our list.
databases[1]
'PostgreSQL'
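Square brackets also accept negative indices, which count back from the end of the list, and slices, which return a range of items. Here is a brief sketch. The last, first_two, and out_of_range names are introduced only for illustration.

```python
databases = ['MySQL', 'PostgreSQL', 'SQL Server']

last = databases[-1]       # negative indices count from the end
first_two = databases[:2]  # a slice stops before the end index

print(last)       # SQL Server
print(first_two)  # ['MySQL', 'PostgreSQL']

# Asking for an index that does not exist raises IndexError
out_of_range = False
try:
    databases[3]
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```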
Matt Clarke, Monday, March 08, 2021