Have you ever needed to chop the top or bottom off a Pandas dataframe, or extract a specific section from the middle? If so, there’s a Pandas function called truncate()
to let you perform this task with ease.
There are technically two truncate()
functions in Pandas: pd.DataFrame.truncate()
works on the dataframe level, while pd.Series.truncate()
works on the series level.
In this quick tutorial I’ll show you some code examples showing how you can use the pd.DataFrame.truncate()
and pd.Series.truncate()
methods to truncate a Pandas dataframe or series by removing rows from before, after, or on either side of specific index values.
The Pandas pd.DataFrame.truncate()
functions can be used to truncate data from a specific section of a dataframe based on the value of its index. You can use it to select all rows after a specific index position, all rows before a specific index position, or all rows between two index positions.
As the name suggests, the pd.DataFrame.truncate()
method need to be appended to your Pandas dataframe object. When run, it will transform the dataframe by truncating the data, and it returns a new dataframe. As a result, it will work fine with the method chaining technique.
To get started, either import some data into a Pandas dataframe or create a dataframe containing some dummy data. Importantly, the pd.DataFrame.truncate()
method uses the index of the dataframe, so you’ll need to use the set_index()
function to change the date column to a date index.
import pandas as pd
df = pd.DataFrame({'date': ['2023-01-01', '2023-01-02', '2023-01-03',
'2023-01-04', '2023-01-05', '2023-01-06',
'2023-01-07', '2023-01-08', '2023-01-09'],
'orders': [3432, 3233, 3434, 3235, 3436,
3237, 3438, 3239, 3440]})
df = df.set_index('date')
df
orders | |
---|---|
date | |
2023-01-01 | 3432 |
2023-01-02 | 3233 |
2023-01-03 | 3434 |
2023-01-04 | 3235 |
2023-01-05 | 3436 |
2023-01-06 | 3237 |
2023-01-07 | 3438 |
2023-01-08 | 3239 |
2023-01-09 | 3440 |
The before
parameter takes an index value, which is a date in the case of our dataframe, and truncates all values before this one. If we pass before='2023-01-02
to truncate()
it will drop the 2023-01-01
row and return a subset dataframe with all rows after that one.
df1 = df.truncate(before='2023-01-02')
df1
orders | |
---|---|
date | |
2023-01-02 | 3233 |
2023-01-03 | 3434 |
2023-01-04 | 3235 |
2023-01-05 | 3436 |
2023-01-06 | 3237 |
2023-01-07 | 3438 |
2023-01-08 | 3239 |
2023-01-09 | 3440 |
Similarly, we can use the after
attribute to drop any rows after a given index position. So, if we enter after='2023-01-06
all dates after this position in the index will be dropped from the subset of the dataframe returned.
df2 = df.truncate(after='2023-01-06')
df2
orders | |
---|---|
date | |
2023-01-01 | 3432 |
2023-01-02 | 3233 |
2023-01-03 | 3434 |
2023-01-04 | 3235 |
2023-01-05 | 3436 |
2023-01-06 | 3237 |
You can combine the before
and after
parameters to select a subset of rows from within a Pandas dataframe based on their index values. If we pass before='2023-01-03'
and after='2023-01-05'
to the truncate()
function it will return the rows where the index values are between these two dates.
df3 = df.truncate(before='2023-01-03', after='2023-01-05')
df3
orders | |
---|---|
date | |
2023-01-03 | 3434 |
2023-01-04 | 3235 |
2023-01-05 | 3436 |
The examples above show how to truncate a dataframe that has a date index. However, you can use the same technique to truncate dataframes with a regular numeric index in the same way.
df_fish = pd.DataFrame(
[('Pterophyllum altum', 'Cichlidae', 15, 12.3),
('Pterophyllum scalare', 'Cichlidae', 10, 9.2),
('Pterophyllum leopoldi', 'Cichlidae', 8, 6.5),
('Corydoras paleatus', 'Callichthyidae', 5, 0.5),
('Corydoras aeneus', 'Callichthyidae', 7, 2.5),
('Corydoras trilineatus', 'Callichthyidae', 5, 1.5),
('Corydoras habrosus', 'Callichthyidae', 3, 0.5),
],
columns=['species', 'family', 'length', 'weight']
)
df_fish
species | family | length | weight | |
---|---|---|---|---|
0 | Pterophyllum altum | Cichlidae | 15 | 12.3 |
1 | Pterophyllum scalare | Cichlidae | 10 | 9.2 |
2 | Pterophyllum leopoldi | Cichlidae | 8 | 6.5 |
3 | Corydoras paleatus | Callichthyidae | 5 | 0.5 |
4 | Corydoras aeneus | Callichthyidae | 7 | 2.5 |
5 | Corydoras trilineatus | Callichthyidae | 5 | 1.5 |
6 | Corydoras habrosus | Callichthyidae | 3 | 0.5 |
To truncate the dataframe and remove all rows before index 3, we simply use before=3
and we’ll get back a subset dataframe containing rows 3 upwards.
df4 = df_fish.truncate(before=3)
df4
species | family | length | weight | |
---|---|---|---|---|
3 | Corydoras paleatus | Callichthyidae | 5 | 0.5 |
4 | Corydoras aeneus | Callichthyidae | 7 | 2.5 |
5 | Corydoras trilineatus | Callichthyidae | 5 | 1.5 |
6 | Corydoras habrosus | Callichthyidae | 3 | 0.5 |
You can also use the truncate()
method to truncate a dataframe horizontally and drop or remove columns from before, after, or on either side of specific column names. To do this you pass the column names to before
and after
and set the axis
parameter to axis='columns'
.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df.truncate(before='B', axis='columns')
B | C | |
---|---|---|
0 | 4 | 7 |
1 | 5 | 8 |
2 | 6 | 9 |
The pd.Series.truncate() function is slightly different to
pd.DataFrame.truncate(). This method is applied to a Pandas series or column and returns a new series with the rows removed based on your chosen
before and
after` parameter arguments.
For example, let’s say we wanted to return only the species
series from our dataframe where the index values were between 2 and 4. To do this we’d append the pd.Series.truncate()
method to our species
series, as df_fish['species'].truncate()
and then pass in before=2, after=4
as the arguments.
If you want to return a dataframe instead of a series, you can append the Pandas to_frame()
function to the end and you’ll get back a dataframe.
df_fish = pd.DataFrame(
[('Pterophyllum altum', 'Cichlidae', 15, 12.3),
('Pterophyllum scalare', 'Cichlidae', 10, 9.2),
('Pterophyllum leopoldi', 'Cichlidae', 8, 6.5),
('Corydoras paleatus', 'Callichthyidae', 5, 0.5),
('Corydoras aeneus', 'Callichthyidae', 7, 2.5),
('Corydoras trilineatus', 'Callichthyidae', 5, 1.5),
('Corydoras habrosus', 'Callichthyidae', 3, 0.5),
],
columns=['species', 'family', 'length', 'weight']
)
df_fish['species'].truncate(before=2, after=4)
2 Pterophyllum leopoldi
3 Corydoras paleatus
4 Corydoras aeneus
Name: species, dtype: object
df_fish['species'].truncate(before=2, after=4).to_frame()
species | |
---|---|
2 | Pterophyllum leopoldi |
3 | Corydoras paleatus |
4 | Corydoras aeneus |
Matt Clarke, Tuesday, January 17, 2023