Lambda functions are small anonymous functions that don’t need to be defined with a name. If you’re writing a function to solve a specific problem in Pandas and there’s little likelihood that you’ll need to reuse it, you may want to consider using a lambda function instead.
In this simple tutorial we’ll cover the basics of using lambda functions in Pandas and show how to run them with the Pandas apply() and assign() methods, both individually and via the Pandas method chaining technique.
Open a Jupyter notebook, import the Pandas library, and create a dataframe. I’ve created a dummy dataframe based on a series of dates and some ecommerce website metrics. We’ll use lambda functions to perform some calculations on this data next.
import pandas as pd
df = pd.DataFrame(
[
['2020-01-01', 1872, 20, 42822],
['2020-01-02', 1829, 21, 42389],
['2020-01-03', 1823, 24, 54352],
['2020-01-04', 1200, 26, 56789],
],
columns=['date', 'sessions', 'transactions', 'revenue']
)
df
|   | date | sessions | transactions | revenue |
|---|---|---|---|---|
| 0 | 2020-01-01 | 1872 | 20 | 42822 |
| 1 | 2020-01-02 | 1829 | 21 | 42389 |
| 2 | 2020-01-03 | 1823 | 24 | 54352 |
| 3 | 2020-01-04 | 1200 | 26 | 56789 |
One of the most common ways to run a lambda function in Pandas is via the apply() method. While apply() works fine, calling it with axis=1 isn’t vectorised: it runs the Python function once for every row, so it’s slower than operating on whole columns at once, and on larger dataframes you may wish to use a different method instead.
The lambda function itself is the section of code that reads lambda x: x['revenue'] / x['transactions']. The axis=1 argument is part of apply() and tells it to pass each row, rather than each column, to the function. This means x is a single row of the dataframe (a Pandas Series), and the lambda divides the value in that row’s revenue column by the value in its transactions column to give the average order value (AOV).
df['aov'] = df.apply(lambda x: x['revenue'] / x['transactions'], axis=1)
df
|   | date | sessions | transactions | revenue | aov |
|---|---|---|---|---|---|
| 0 | 2020-01-01 | 1872 | 20 | 42822 | 2141.100000 |
| 1 | 2020-01-02 | 1829 | 21 | 42389 | 2018.523810 |
| 2 | 2020-01-03 | 1823 | 24 | 54352 | 2264.666667 |
| 3 | 2020-01-04 | 1200 | 26 | 56789 | 2184.192308 |
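To make the row-by-row behaviour of apply() concrete, here’s a minimal sketch that rebuilds the same dummy data, checks what type of object the lambda receives when axis=1 is set, and compares the apply() result with a plain vectorised division of the two columns. The variable names row_types, aov_apply and aov_vectorised are just illustrative, and NumPy is assumed to be installed alongside Pandas.

```python
import numpy as np
import pandas as pd

# Same dummy data as above
df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
        ['2020-01-03', 1823, 24, 54352],
        ['2020-01-04', 1200, 26, 56789],
    ],
    columns=['date', 'sessions', 'transactions', 'revenue']
)

# With axis=1, apply() calls the lambda once per row, passing each row as a Series
row_types = df.apply(lambda x: type(x).__name__, axis=1)
print(row_types.tolist())  # ['Series', 'Series', 'Series', 'Series']

# Row-by-row apply() version used above
aov_apply = df.apply(lambda x: x['revenue'] / x['transactions'], axis=1)

# Vectorised version: divide the whole columns in a single operation
aov_vectorised = df['revenue'] / df['transactions']

# Both approaches produce the same AOV values
print(np.allclose(aov_apply, aov_vectorised))  # True
```

On four rows the speed difference is negligible, but because the vectorised version avoids a Python function call per row, it’s usually the better choice once the dataframe grows.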
The Pandas assign() method has some benefits over apply(). It can create several new columns in a single call, works well with method chaining, and avoids the dreaded SettingWithCopyWarning. It’s one of the best ways to add a new column to a Pandas dataframe.
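Since assign() returns a new dataframe rather than modifying the existing one, you need to save the output back to the original dataframe in order to keep the column you’ve added. Here, we’re running the same lambda function to create an aov column and then saving it back to the original df. This time, the lambda receives the whole dataframe as x, so x['revenue'] / x['transactions'] divides entire columns in one vectorised operation.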
df = df.assign(aov = lambda x: x['revenue'] / x['transactions'])
df
|   | date | sessions | transactions | revenue | aov |
|---|---|---|---|---|---|
| 0 | 2020-01-01 | 1872 | 20 | 42822 | 2141.100000 |
| 1 | 2020-01-02 | 1829 | 21 | 42389 | 2018.523810 |
| 2 | 2020-01-03 | 1823 | 24 | 54352 | 2264.666667 |
| 3 | 2020-01-04 | 1200 | 26 | 56789 | 2184.192308 |
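A quick way to convince yourself of both points, that assign() leaves the original dataframe untouched and that its callable receives the whole dataframe, is the sketch below. It uses a cut-down copy of the dummy data, and record_input is a hypothetical helper written as a named function (assign() accepts any callable; a lambda is just a shorter way of writing one) so that it can note what it was passed.

```python
import pandas as pd

# A cut-down copy of the dummy data above
df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
    ],
    columns=['date', 'sessions', 'transactions', 'revenue']
)

# assign() returns a new dataframe; df itself gains no column
result = df.assign(aov=lambda x: x['revenue'] / x['transactions'])
print('aov' in df.columns)      # False
print('aov' in result.columns)  # True

received = []

def record_input(x):
    # Hypothetical helper: record what assign() passes in, then compute AOV
    received.append(type(x).__name__)
    return x['revenue'] / x['transactions']

df.assign(aov=record_input)
print(received)  # ['DataFrame']
```

Because the callable is invoked once with the full dataframe rather than once per row, the division inside it works on whole columns, which is where the speed advantage over apply(axis=1) comes from.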
The other neat thing about the Pandas assign() method is that it can be used to run a series of lambda functions via a modern Pandas technique called method chaining.
To use method chaining, we first wrap the chain in parentheses so that we can spread it over several lines with whitespace to aid readability, and then we run a series of lambda functions on various columns within a single assign() call.
df = (df
.assign(
aov = lambda x: x['revenue'] / x['transactions'],
conversion_rate = lambda x: x['transactions'] / x['sessions']
)
)
df
|   | date | sessions | transactions | revenue | aov | conversion_rate |
|---|---|---|---|---|---|---|
| 0 | 2020-01-01 | 1872 | 20 | 42822 | 2141.100000 | 0.010684 |
| 1 | 2020-01-02 | 1829 | 21 | 42389 | 2018.523810 | 0.011482 |
| 2 | 2020-01-03 | 1823 | 24 | 54352 | 2264.666667 | 0.013165 |
| 3 | 2020-01-04 | 1200 | 26 | 56789 | 2184.192308 | 0.021667 |
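Method chaining gets more useful as the chain grows. In recent Pandas versions (0.23 onwards on Python 3.6+), the keyword arguments to assign() are evaluated in order, so a later lambda can use a column created earlier in the same call. The sketch below assumes that behaviour; revenue_per_session is a hypothetical extra metric added purely for illustration, and the trailing sort_values() step simply shows another method joining the chain.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
        ['2020-01-03', 1823, 24, 54352],
        ['2020-01-04', 1200, 26, 56789],
    ],
    columns=['date', 'sessions', 'transactions', 'revenue']
)

df = (df
    .assign(
        aov=lambda x: x['revenue'] / x['transactions'],
        conversion_rate=lambda x: x['transactions'] / x['sessions'],
        # Hypothetical metric built from the two columns created just above
        revenue_per_session=lambda x: x['aov'] * x['conversion_rate']
    )
    # Another step in the chain: best-performing days first
    .sort_values('revenue_per_session', ascending=False)
)

print(df[['date', 'aov', 'conversion_rate', 'revenue_per_session']])
```

Since AOV multiplied by conversion rate cancels out the transactions, revenue_per_session should match revenue divided by sessions, which is a handy sanity check that each lambda saw the columns you expected.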
Matt Clarke, Sunday, January 01, 2023