How to use lambda functions in Pandas

Learn how to use lambda functions in Pandas to run small anonymous custom functions with apply() and assign(), both on their own and as part of a method chain.


Lambda functions are small anonymous functions that don’t need to be defined with a name. If you’re writing a function to solve a specific problem in Pandas and you’re unlikely to re-use it, a lambda function is worth considering.
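As a quick illustration in plain Python, here is the same calculation written as a named function and as a lambda. The names average_order_value and aov are made up for this sketch, and the numbers are simply the first row of the dummy data used later on.

```python
# A named function: defined once with def and reusable anywhere.
def average_order_value(revenue, transactions):
    return revenue / transactions

# The same logic as a lambda: a single expression that creates the function in place.
# It is bound to a name here only so that we can call it and compare the results.
aov = lambda revenue, transactions: revenue / transactions

print(average_order_value(42822, 20))  # 2141.1
print(aov(42822, 20))                  # 2141.1
```

In practice you would rarely give a lambda a name like this. It is normally passed straight into another function or method, which is how we’ll use it with Pandas.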

In this simple tutorial we’ll cover the basics of using lambda functions in Pandas and show how to run them with the Pandas apply() and assign() methods, both individually and via the Pandas method chaining technique.

Create a Pandas dataframe

Open a Jupyter notebook, import the Pandas library, and create a dataframe. I’ve created a dummy dataframe based on a series of dates and some ecommerce website metrics. We’ll use lambda functions to perform some calculations on these data next.

import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
        ['2020-01-03', 1823, 24, 54352],
        ['2020-01-04', 1200, 26, 56789],
    ], 
    columns=['date', 'sessions', 'transactions', 'revenue']
)
df
         date  sessions  transactions  revenue
0  2020-01-01      1872            20    42822
1  2020-01-02      1829            21    42389
2  2020-01-03      1823            24    54352
3  2020-01-04      1200            26    56789

Use apply() to run a lambda function

One of the most common ways to run a lambda function is via the Pandas apply() method. While apply() works fine, it isn’t vectorised: with axis=1 it calls the function once for every row, so it’s slower than column-wise techniques on large dataframes, and you may wish to use a different method instead.

The lambda function itself is the section of code that reads lambda x: x['revenue'] / x['transactions']. This takes x (which, with axis=1, is a single row of the dataframe passed in as a Pandas Series), and then divides the value in the revenue column by the value in the transactions column. The axis=1 argument is part of apply() and tells it to pass the function one row at a time rather than one column at a time.

df['aov'] = df.apply(lambda x: x['revenue'] / x['transactions'], axis=1)
df
         date  sessions  transactions  revenue          aov
0  2020-01-01      1872            20    42822  2141.100000
1  2020-01-02      1829            21    42389  2018.523810
2  2020-01-03      1823            24    54352  2264.666667
3  2020-01-04      1200            26    56789  2184.192308
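To make the point about vectorisation concrete, here is a minimal sketch that computes the average order value twice on the same dummy data: once row by row with apply(), and once by dividing the two columns directly. The variable names aov_apply, aov_vectorised and same are just for illustration, and this is one possible alternative rather than the only one.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
        ['2020-01-03', 1823, 24, 54352],
        ['2020-01-04', 1200, 26, 56789],
    ],
    columns=['date', 'sessions', 'transactions', 'revenue']
)

# Row-wise: the lambda is called once per row, and x is that row as a Series.
aov_apply = df.apply(lambda x: x['revenue'] / x['transactions'], axis=1)

# Vectorised: the division is carried out on the two whole columns at once.
aov_vectorised = df['revenue'] / df['transactions']

# Check that both approaches agree (allowing for floating point rounding).
same = (aov_apply - aov_vectorised).abs().max() < 1e-9
print(same)
```

On four rows the difference in speed is negligible, but on a large dataframe the column-wise version is usually much quicker because it avoids calling a Python function for every row.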

Use assign() to run a lambda function on two columns

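The Pandas assign() method has some benefits over apply(). It can create several columns in one call, works well with chaining, and, because it returns a new dataframe rather than modifying one in place, avoids the dreaded SettingWithCopyWarning. The lambda you pass to assign() also receives the whole dataframe rather than one row at a time, so the calculation stays vectorised. It’s one of the best ways to add a new Pandas column.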

Since assign() returns a dataframe, you need to save the output back to the original dataframe in order to keep the column you’ve added. Here, we’re running the same lambda function to create an aov column and then saving it back to the original df.

df = df.assign(aov = lambda x: x['revenue'] / x['transactions'])
df
         date  sessions  transactions  revenue          aov
0  2020-01-01      1872            20    42822  2141.100000
1  2020-01-02      1829            21    42389  2018.523810
2  2020-01-03      1823            24    54352  2264.666667
3  2020-01-04      1200            26    56789  2184.192308
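If you want to convince yourself that assign() leaves the original dataframe alone, the short sketch below stores the result under a different name and checks both dataframes. The name with_aov is only used here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
        ['2020-01-03', 1823, 24, 54352],
        ['2020-01-04', 1200, 26, 56789],
    ],
    columns=['date', 'sessions', 'transactions', 'revenue']
)

# assign() returns a new dataframe; df itself is not modified.
with_aov = df.assign(aov=lambda x: x['revenue'] / x['transactions'])

print('aov' in df.columns)        # False
print('aov' in with_aov.columns)  # True
```

This is why the result is normally saved back to df, or passed straight on to the next method in a chain.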

Use method chaining to run multiple lambda functions

The other neat thing about the Pandas assign() method is that it can run several lambda functions at once, and it slots neatly into a modern Pandas technique called method chaining, where each method is called on the dataframe returned by the previous one.

To use method chaining we first wrap the chain in parentheses, which lets us spread it over several lines to aid readability, and then we’ll run a series of lambda functions on various columns within a single assign() method call.

df = (
    df
    .assign(
        aov=lambda x: x['revenue'] / x['transactions'],
        conversion_rate=lambda x: x['transactions'] / x['sessions'],
    )
)
df
         date  sessions  transactions  revenue          aov  conversion_rate
0  2020-01-01      1872            20    42822  2141.100000         0.010684
1  2020-01-02      1829            21    42389  2018.523810         0.011482
2  2020-01-03      1823            24    54352  2264.666667         0.013165
3  2020-01-04      1200            26    56789  2184.192308         0.021667
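Chains become more useful once they contain more than one method. The sketch below is one hypothetical extension of the example above: it converts the date column, adds both metrics, keeps only the days with more than 20 transactions, and sorts by average order value. The filter threshold, the sort order and the name summary are assumptions made purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01', 1872, 20, 42822],
        ['2020-01-02', 1829, 21, 42389],
        ['2020-01-03', 1823, 24, 54352],
        ['2020-01-04', 1200, 26, 56789],
    ],
    columns=['date', 'sessions', 'transactions', 'revenue']
)

summary = (
    df
    # Each lambda receives the dataframe as it stands at that point in the chain.
    .assign(date=lambda x: pd.to_datetime(x['date']))
    .assign(
        aov=lambda x: x['revenue'] / x['transactions'],
        conversion_rate=lambda x: x['transactions'] / x['sessions'],
    )
    # Illustrative filter: keep days with more than 20 transactions.
    .query('transactions > 20')
    # Illustrative ordering: highest average order value first.
    .sort_values('aov', ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Because every step returns a new dataframe, the original df is left unchanged and the whole transformation can be read from top to bottom.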

Matt Clarke, Sunday, January 01, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.