The Pandas append() function is commonly used to add new rows to the bottom of an existing Pandas dataframe, or to concatenate dataframes vertically. However, while still in common use, it was deprecated in version 1.4.0 of Pandas, which means it’s eventually going to be removed from the Pandas library.
Pandas is currently on version 1.5.2, so its days are numbered. The official Pandas documentation recommends that you use the concat() function instead of append(). The concat() function isn’t quite the same and takes a bit of getting used to, especially when we’ve been using append() for so long, but it does the same job. It’s also more efficient when you combine many objects at once, because every call to append() returns a new copy of the data.
If you want to continue using append() in your code, you can still do so on versions where it’s available. Here’s how it works and how you can use it to append a single row, multiple rows, or even an entire dataframe to an existing dataframe.
One common use for the append() function is to combine or concatenate two dataframes vertically, i.e. one dataframe on top of the other. To see this in action, import pandas as pd and then create a couple of dataframes with identical column names and column orders.
import pandas as pd
df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3),
     ('Pterophyllum scalare', 2, 10.0, 11.0),
     ('Pterophyllum leopoldi', 1, 8.0, 9.0)],
    columns=['species', 'age', 'length', 'weight']
)
df
|   | species | age | length | weight |
|---|---|---|---|---|
| 0 | Pterophyllum altum | 3 | 12.5 | 13.3 |
| 1 | Pterophyllum scalare | 2 | 10.0 | 11.0 |
| 2 | Pterophyllum leopoldi | 1 | 8.0 | 9.0 |
df2 = pd.DataFrame(
    [('Vieja synspila', 2, 10.0, 11.0),
     ('Altolamprologus calvus', 1, 8.0, 9.0)],
    columns=['species', 'age', 'length', 'weight']
)
df2
|   | species | age | length | weight |
|---|---|---|---|---|
| 0 | Vieja synspila | 2 | 10.0 | 11.0 |
| 1 | Altolamprologus calvus | 1 | 8.0 | 9.0 |
To append one dataframe to another vertically, you call .append() on the first dataframe and pass in the second dataframe as the argument, i.e. df.append(df2). The append() function returns a new dataframe containing both dataframes stacked on top of each other. Each dataframe keeps its original index values unless you set ignore_index to True, which is why the index values 0 and 1 appear twice in the output.
df = df.append(df2)
df
|   | species | age | length | weight |
|---|---|---|---|---|
| 0 | Pterophyllum altum | 3 | 12.5 | 13.3 |
| 1 | Pterophyllum scalare | 2 | 10.0 | 11.0 |
| 2 | Pterophyllum leopoldi | 1 | 8.0 | 9.0 |
| 0 | Vieja synspila | 2 | 10.0 | 11.0 |
| 1 | Altolamprologus calvus | 1 | 8.0 | 9.0 |
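For comparison, here is a minimal sketch of the same vertical stack written with concat(), the function the Pandas documentation recommends in place of append(). The variable names stacked and renumbered are only illustrative and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3),
     ('Pterophyllum scalare', 2, 10.0, 11.0),
     ('Pterophyllum leopoldi', 1, 8.0, 9.0)],
    columns=['species', 'age', 'length', 'weight']
)
df2 = pd.DataFrame(
    [('Vieja synspila', 2, 10.0, 11.0),
     ('Altolamprologus calvus', 1, 8.0, 9.0)],
    columns=['species', 'age', 'length', 'weight']
)

# Counterpart of df.append(df2): the original index labels are kept
stacked = pd.concat([df, df2])

# Counterpart of df.append(df2, ignore_index=True): the index is renumbered
renumbered = pd.concat([df, df2], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 2, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

Note that concat() takes a list of dataframes rather than being called on one of them, so it can stack more than two in a single call.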
The append() function can also be used to add one or more rows to an existing dataframe. It’s quite versatile and there are several ways to do this, which is why it’s historically been such a popular technique with data scientists. The first approach we’ll cover is adding a new row to the dataframe using pd.Series().
First, we create a new Pandas Series by calling pd.Series() and assigning the object returned to a variable called row. When creating the Series, we need to pass a list of values as the first argument, and a list of column names to the index argument. Next, we call the append() function, pass in the row variable containing our Pandas Series, and set the ignore_index parameter to True. Setting ignore_index is required here because the Series has no name that could be used as the new row’s index label.
row = pd.Series(['Paracheirodon simulans', 1, 4.0, 3.0], index=['species', 'age', 'length', 'weight'])
df = df.append(row, ignore_index=True)
df
|   | species | age | length | weight |
|---|---|---|---|---|
| 0 | Pterophyllum altum | 3 | 12.5 | 13.3 |
| 1 | Pterophyllum scalare | 2 | 10.0 | 11.0 |
| 2 | Pterophyllum leopoldi | 1 | 8.0 | 9.0 |
| 3 | Vieja synspila | 2 | 10.0 | 11.0 |
| 4 | Altolamprologus calvus | 1 | 8.0 | 9.0 |
| 5 | Paracheirodon simulans | 1 | 4.0 | 3.0 |
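The Series approach can also be expressed with concat(). The sketch below assumes a shortened two-row dataframe for brevity, and the names row_df and result are illustrative. Because concat() would treat a bare Series as a column of values, the Series is first turned into a one-row dataframe.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3),
     ('Pterophyllum scalare', 2, 10.0, 11.0)],
    columns=['species', 'age', 'length', 'weight']
)
row = pd.Series(['Paracheirodon simulans', 1, 4.0, 3.0],
                index=['species', 'age', 'length', 'weight'])

# to_frame() yields a single column; .T transposes it into a single row.
# The transposed frame holds generic objects, so infer_objects() is used
# to recover numeric dtypes before concatenating.
row_df = row.to_frame().T.infer_objects()

result = pd.concat([df, row_df], ignore_index=True)
print(result)
```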
The other popular way to use append(), and the one I’ve tended to use the most, is to pass in the new row as a Python dictionary. We’ll create a dictionary called row whose keys match the dataframe’s column names, then we’ll call append(), pass in the row dictionary, and set ignore_index to True, which is required when appending a dictionary. Our dictionary row is appended to the bottom of the dataframe.
row = {'species': 'Tetraodon nigroviridis', 'age': 1, 'length': 8.0, 'weight': 9.0}
df = df.append(row, ignore_index=True)
df
|   | species | age | length | weight |
|---|---|---|---|---|
| 0 | Pterophyllum altum | 3 | 12.5 | 13.3 |
| 1 | Pterophyllum scalare | 2 | 10.0 | 11.0 |
| 2 | Pterophyllum leopoldi | 1 | 8.0 | 9.0 |
| 3 | Vieja synspila | 2 | 10.0 | 11.0 |
| 4 | Altolamprologus calvus | 1 | 8.0 | 9.0 |
| 5 | Paracheirodon simulans | 1 | 4.0 | 3.0 |
| 6 | Tetraodon nigroviridis | 1 | 8.0 | 9.0 |
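A dictionary row can be added with concat() as well, by wrapping the dictionary in a list and building a one-row dataframe from it. This is a minimal sketch on a shortened dataframe; the name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3),
     ('Pterophyllum scalare', 2, 10.0, 11.0)],
    columns=['species', 'age', 'length', 'weight']
)
row = {'species': 'Tetraodon nigroviridis', 'age': 1, 'length': 8.0, 'weight': 9.0}

# A list containing one dictionary becomes a dataframe with one row,
# with the dictionary keys used as column names
row_df = pd.DataFrame([row])

result = pd.concat([df, row_df], ignore_index=True)
print(result)
```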
With append() you’re not limited to passing in a single row or a single dataframe. You can also pass in a list of dictionaries containing multiple rows. To do this, you construct dictionaries whose keys match the column names of your original dataframe and put several of them in a Python list, instead of passing a single dictionary.
rows = [{'species': 'Tetraodon lineatus', 'age': 1, 'length': 8.0, 'weight': 9.0},
        {'species': 'Tetraodon fahaka', 'age': 1, 'length': 8.0, 'weight': 9.0}]
df = df.append(rows, ignore_index=True)
df
|   | species | age | length | weight |
|---|---|---|---|---|
| 0 | Pterophyllum altum | 3 | 12.5 | 13.3 |
| 1 | Pterophyllum scalare | 2 | 10.0 | 11.0 |
| 2 | Pterophyllum leopoldi | 1 | 8.0 | 9.0 |
| 3 | Vieja synspila | 2 | 10.0 | 11.0 |
| 4 | Altolamprologus calvus | 1 | 8.0 | 9.0 |
| 5 | Paracheirodon simulans | 1 | 4.0 | 3.0 |
| 6 | Tetraodon nigroviridis | 1 | 8.0 | 9.0 |
| 7 | Tetraodon lineatus | 1 | 8.0 | 9.0 |
| 8 | Tetraodon fahaka | 1 | 8.0 | 9.0 |
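The list-of-dictionaries case maps onto concat() in much the same way: the list is converted to a dataframe first and then concatenated. The following is a sketch on a shortened dataframe, with result as an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3)],
    columns=['species', 'age', 'length', 'weight']
)
rows = [{'species': 'Tetraodon lineatus', 'age': 1, 'length': 8.0, 'weight': 9.0},
        {'species': 'Tetraodon fahaka', 'age': 1, 'length': 8.0, 'weight': 9.0}]

# Each dictionary in the list becomes one row of the new dataframe
rows_df = pd.DataFrame(rows)

result = pd.concat([df, rows_df], ignore_index=True)
print(result)
```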
As stated above, while append() still works, you will get a FutureWarning from Pandas when running it on version 1.4.0 or later, because it’s deprecated and will eventually be removed. Pandas recommends that you switch to using the concat() function instead.
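When rows arrive one at a time, for example inside a loop, a common pattern with concat() is to collect them in a plain Python list and concatenate once at the end, rather than growing the dataframe on every iteration. The sketch below uses a hypothetical incoming list to stand in for whatever produces the rows.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 3, 12.5, 13.3)],
    columns=['species', 'age', 'length', 'weight']
)

# Hypothetical records standing in for data read from a file or an API
incoming = [
    ('Tetraodon lineatus', 1, 8.0, 9.0),
    ('Tetraodon fahaka', 1, 8.0, 9.0),
    ('Vieja synspila', 2, 10.0, 11.0),
]

# Collect dictionaries in a list first; this is list.append(), not DataFrame.append()
new_rows = []
for species, age, length, weight in incoming:
    new_rows.append({'species': species, 'age': age, 'length': length, 'weight': weight})

# Build one dataframe from the collected rows and concatenate a single time
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)
```

This way the existing data is copied once, instead of once per added row.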
Matt Clarke, Saturday, November 26, 2022