How to use Pandas concat() to concatenate dataframes

Picture by Jeshoots, Pexels.

The Pandas concat() function is used to concatenate (or join together) two or more Pandas objects such as dataframes or series. It can be used to join dataframes together vertically or horizontally, or to add rows or columns to an existing dataframe.

Pandas concat() is an important function to learn, since the function previously used for these tasks, the Pandas append() function, was deprecated in version 1.4.0 and removed entirely in Pandas 2.0.

This is mildly annoying, as I prefer append(), but concat() is usually quicker when you have several pieces to combine: it joins them in a single operation, whereas calling append() repeatedly creates a new copy of the data on every call. The difference is more noticeable on larger datasets than those I work on. It’s a bit fiddlier to use, but it’s the recommended way to concatenate data, so you need to learn it.
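As a rough sketch of the "single operation" pattern, the idea is to collect the pieces in a list first and call concat() once at the end, rather than growing a dataframe inside a loop. The `batches` data below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical batches of rows, e.g. one per file or API page (illustrative data only).
batches = [
    {'species': ['Pterophyllum altum'], 'age': [3]},
    {'species': ['Vieja synspila'], 'age': [2]},
    {'species': ['Altolamprologus calvus'], 'age': [1]},
]

# Build every piece first...
frames = [pd.DataFrame(batch) for batch in batches]

# ...then join them all in one call instead of once per loop iteration.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Because the frames are only copied once, this tends to scale better than repeatedly concatenating onto a growing dataframe.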

The concat() function

The only argument you must pass to concat() is objs, which defines the Pandas objects you want to concatenate, whether that’s Series or DataFrame objects. The remaining parameters are optional. Here’s a quick summary of what they do and why you might need to use them.

Parameter	Description
`objs`	The `objs` parameter is a sequence (usually a list) or mapping of Series or DataFrame objects to be concatenated together.
`axis`	The `axis` parameter is the axis to concatenate along. The default is `0`, which stacks the objects row-wise (one below the other), while passing `1` places them side by side as columns.
`join`	The `join` parameter controls how the labels on the other axis are handled. The default is `outer`, which keeps the union of the labels and fills any gaps with `NaN`. The other option is `inner`, which keeps only the labels shared by every object.
`ignore_index`	The `ignore_index` parameter is a boolean value that determines whether to discard the index labels along the concatenation axis. The default is `False`, which preserves the index of each object. If set to `True`, the result is labelled `0` to `n - 1` instead.
`keys`	The `keys` parameter is a sequence of labels used to build the outermost level of a hierarchical index, so you can tell which object each row came from. Pass one key per object being concatenated.
`levels`	The `levels` parameter is a list of sequences giving the specific unique values to use for the levels of the hierarchical index. If omitted, the levels are inferred from `keys`.
`names`	The `names` parameter is a list of names for the levels of the resulting hierarchical index.
`verify_integrity`	The `verify_integrity` parameter is a boolean value that determines whether to check the new concatenated axis for duplicate labels. The default is `False`, which does not check for duplicates. If set to `True`, a `ValueError` will be raised if there are duplicates.
`sort`	The `sort` parameter is a boolean value that determines whether to sort the non-concatenation axis (the column labels, when stacking rows) if the objects are not already aligned. The default is `False`. It does not sort the rows by their values.
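The less common parameters are easier to follow with a small example. The sketch below uses two made-up one-row dataframes (`tank_a` and `tank_b` are illustrative names, not part of Pandas) to show `keys` and `names`, `sort`, and `verify_integrity` in action.

```python
import pandas as pd

# Two small illustrative dataframes with partly different columns.
tank_a = pd.DataFrame({'species': ['Pterophyllum altum'], 'age': [3]})
tank_b = pd.DataFrame({'species': ['Vieja synspila'], 'length': [10.0]})

# keys adds an outer index level recording which input each row came from,
# and names labels the levels of the resulting hierarchical index.
keyed = pd.concat([tank_a, tank_b], keys=['tank_a', 'tank_b'], names=['tank', 'row'])
only_b = keyed.loc['tank_b']  # select just the rows that came from tank_b

# sort=True sorts the column labels because the inputs are not aligned.
sorted_cols = pd.concat([tank_a, tank_b], sort=True, ignore_index=True)

# verify_integrity=True raises ValueError here, since both inputs use index label 0.
try:
    pd.concat([tank_a, tank_b], verify_integrity=True)
    duplicate_error = None
except ValueError as error:
    duplicate_error = error

print(keyed)
print(list(sorted_cols.columns))
print(duplicate_error)
```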

Import Pandas

To get started, open a Jupyter notebook and import the Pandas library using the as pd naming convention.

import pandas as pd

Use concat() to concatenate two dataframes vertically

First, we’ll use concat() to concatenate or join together two dataframes vertically. We’ll create two dataframes that contain the same column names.

df1 = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3), 
     ('Pterophyllum scalare', 2, 10.0, 11.0),
     ('Pterophyllum leopoldi', 1, 8.0, 9.0)], 
    columns=['species', 'age', 'length', 'weight']
)
df1

	species	age	length	weight
0	Pterophyllum altum	3	12.5	13.3
1	Pterophyllum scalare	2	10.0	11.0
2	Pterophyllum leopoldi	1	8.0	9.0

df2 = pd.DataFrame(
    [('Vieja synspila', 2, 10.0, 11.0),
     ('Altolamprologus calvus', 1, 8.0, 9.0)],
    columns=['species', 'age', 'length', 'weight']
)
df2

	species	age	length	weight
0	Vieja synspila	2	10.0	11.0
1	Altolamprologus calvus	1	8.0	9.0

To concatenate the dataframes vertically (i.e. one below the other), we can use the concat() function with no additional arguments. We’ll pass the objs parameter a list of the dataframes we want to concatenate. Since the column names are the same, we end up with a neat dataframe with the rows of df1 stacked on top of the rows of df2.

df3 = pd.concat([df1, df2])
df3

	species	age	length	weight
0	Pterophyllum altum	3	12.5	13.3
1	Pterophyllum scalare	2	10.0	11.0
2	Pterophyllum leopoldi	1	8.0	9.0
0	Vieja synspila	2	10.0	11.0
1	Altolamprologus calvus	1	8.0	9.0
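When the column names don't match, the result is less neat. The sketch below uses two made-up dataframes (`angelfish` and `cichlids` are illustrative names) to show what the default `join='outer'` does compared with `join='inner'`.

```python
import pandas as pd

# Illustrative dataframes that only share the 'species' column.
angelfish = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5)],
    columns=['species', 'age', 'length']
)
cichlids = pd.DataFrame(
    [('Vieja synspila', 11.0)],
    columns=['species', 'weight']
)

# Default outer join: every column is kept and missing values become NaN.
outer = pd.concat([angelfish, cichlids])

# Inner join: only the columns present in every dataframe are kept.
inner = pd.concat([angelfish, cichlids], join='inner')

print(outer)
print(inner)
```

If you expect the columns to line up, a sudden appearance of NaN values after concatenating is usually a sign that a column name differs between the inputs.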

Use concat() with ignore_index=True

If you look closely at the example above, you’ll spot that the index doesn’t look quite right: the labels 0 and 1 each appear twice, because concat() keeps the original index of each dataframe. To fix this, you can set the ignore_index parameter to True to create a new index. Now we get back a neater dataframe in which the index runs from 0 to 4 with no repeats.

df4 = pd.concat([df1, df2], ignore_index=True)
df4

	species	age	length	weight
0	Pterophyllum altum	3	12.5	13.3
1	Pterophyllum scalare	2	10.0	11.0
2	Pterophyllum leopoldi	1	8.0	9.0
3	Vieja synspila	2	10.0	11.0
4	Altolamprologus calvus	1	8.0	9.0
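Duplicate index labels are more than a cosmetic issue, because label-based lookups can return more rows than you expect. Here is a minimal sketch using cut-down versions of the dataframes above; it also shows that calling reset_index(drop=True) afterwards gives the same result as ignore_index=True.

```python
import pandas as pd

df1 = pd.DataFrame({'species': ['Pterophyllum altum', 'Pterophyllum scalare']})
df2 = pd.DataFrame({'species': ['Vieja synspila']})

# Without ignore_index, label 0 appears twice, so .loc[0] returns two rows.
stacked = pd.concat([df1, df2])
rows_labelled_zero = stacked.loc[0]

# With ignore_index=True every label is unique, so .loc[0] returns a single row.
renumbered = pd.concat([df1, df2], ignore_index=True)
first_row = renumbered.loc[0]

# Resetting the index after the fact is equivalent.
same = stacked.reset_index(drop=True).equals(renumbered)

print(rows_labelled_zero)
print(first_row)
print(same)
```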

Use concat() to concatenate a new row

The concat() function can also be used to add a new row to a DataFrame. To do this, create a new single-row dataframe with the same column names, and then concatenate it vertically using the ignore_index=True argument so the new row gets the next index value.

new_row = pd.DataFrame(
    [('Pterophyllum altum', 4, 14.0, 15.0)],
    columns=['species', 'age', 'length', 'weight']
)

df5 = pd.concat([df4, new_row], ignore_index=True)
df5

	species	age	length	weight
0	Pterophyllum altum	3	12.5	13.3
1	Pterophyllum scalare	2	10.0	11.0
2	Pterophyllum leopoldi	1	8.0	9.0
3	Vieja synspila	2	10.0	11.0
4	Altolamprologus calvus	1	8.0	9.0
5	Pterophyllum altum	4	14.0	15.0
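If the new row starts life as a dictionary rather than a tuple, one option is to wrap it in a list and let Pandas take the column names from the keys. This is only a sketch, and `record` is a made-up example value.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3)],
    columns=['species', 'age', 'length', 'weight']
)

# A hypothetical new record held as a dictionary (illustrative values).
record = {'species': 'Vieja synspila', 'age': 2, 'length': 10.0, 'weight': 11.0}

# Wrapping the dict in a list creates a one-row dataframe whose columns come from the keys.
new_row = pd.DataFrame([record])

df = pd.concat([df, new_row], ignore_index=True)
print(df)
```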

Use concat() to add a new column to a dataframe

The concat() function can also be used to add a new column to a dataframe. To do this, we create a new series using pd.Series() and concatenate it to the dataframe with axis=1. The values are matched to rows by index label, so when calling pd.Series() you need to provide one value per row, and define the name parameter to give the new column a name.

df6 = pd.concat([df5, pd.Series([5, 9, 23, 4, 9, 23], name='price')], axis=1)
df6

	species	age	length	weight	price
0	Pterophyllum altum	3	12.5	13.3	5
1	Pterophyllum scalare	2	10.0	11.0	9
2	Pterophyllum leopoldi	1	8.0	9.0	23
3	Vieja synspila	2	10.0	11.0	4
4	Altolamprologus calvus	1	8.0	9.0	9
5	Pterophyllum altum	4	14.0	15.0	23
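One thing to watch for when adding a column this way is that concat() lines the values up by index label, not by position. The example above works because both objects use the default 0 to 5 index. The sketch below, which uses a made-up `fish` dataframe with a custom index, shows what happens when the labels differ and one way to avoid it.

```python
import pandas as pd

# Illustrative dataframe whose index does not start at 0.
fish = pd.DataFrame(
    {'species': ['Pterophyllum altum', 'Vieja synspila']},
    index=[10, 11]
)

# This series gets the default index (0, 1), which matches none of the labels in fish,
# so the result has four rows padded with NaN instead of two complete rows.
misaligned = pd.concat([fish, pd.Series([5, 9], name='price')], axis=1)

# Reusing the dataframe's index makes the values line up row for row.
price = pd.Series([5, 9], name='price', index=fish.index)
aligned = pd.concat([fish, price], axis=1)

print(misaligned)
print(aligned)
```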

Matt Clarke, Sunday, November 27, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.