How to use Pandas std() to calculate standard deviation

Standard deviation, often abbreviated to STD or STDEV, is a descriptive statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance.

Pandas comes with a built-in method called std() that calculates the standard deviation of a DataFrame or Series, so it’s very easy to use. The standard deviation is also included in the output of the Pandas describe() method, which gives an overview of the descriptive statistics of a dataframe.
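As a quick illustration of the describe() point, here is a minimal sketch that checks the std value reported by describe() against a direct call to std(). The Series and its values are made up purely for illustration.

```python
import pandas as pd

# Illustrative data only; any numeric Series would work here
ages = pd.Series([18, 19, 20, 18, 20, 21], name='age')

summary = ages.describe()            # count, mean, std, min, quartiles, max
std_from_describe = summary['std']   # pick out the std entry by label
std_direct = ages.std()              # the same statistic, computed directly

print(std_from_describe, std_direct)
```

Both values should be identical, since describe() reports the same sample standard deviation that std() returns by default.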

In this quick tutorial, we’ll go over some simple code samples that show you how to calculate the standard deviation of all dataframe columns, specific dataframe columns, or a single dataframe column. I’ll also show you how to handle NaN values and how to set the delta degrees of freedom (ddof) used in the calculation.

Import the packages

To get started, open a Jupyter notebook and import the Pandas and NumPy libraries. Then, either import some data into a Pandas dataframe from an existing dataset, or execute the code below to create a dummy dataset to use.

import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                   'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan],
                   'weight': [69, 72, 73, 87, 89, 100, 98, 89, 72, np.nan]})
df

	age	height	weight
0	18	192.0	69.0
1	19	189.0	72.0
2	20	157.0	73.0
3	18	178.0	87.0
4	20	189.0	89.0
5	21	201.0	100.0
6	18	210.0	98.0
7	19	189.0	89.0
8	20	198.0	72.0
9	21	NaN	NaN

Calculate the standard deviation of all columns

First, we’ll use the std() method to calculate the standard deviation of all columns in the DataFrame. To do this, you simply call std() on the DataFrame object. It returns a Series, indexed by column name, containing the standard deviation of each column.

df.std()

age        1.173788
height    15.081261
weight    11.935009
dtype: float64
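Because the result is an ordinary Series, you can look up individual values by column name or round them for display. The sketch below assumes the same dummy dataset created earlier; the variable names stds, height_std and rounded are just illustrative.

```python
import pandas as pd
import numpy as np

# Same dummy dataset as in the tutorial
df = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                   'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan],
                   'weight': [69, 72, 73, 87, 89, 100, 98, 89, 72, np.nan]})

stds = df.std()                 # Series indexed by column name
height_std = stds['height']     # look up one column's value by label
rounded = stds.round(2)         # round every value for display

print(type(stds).__name__)
print(height_std)
print(rounded)
```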

Calculate the standard deviation of a single column

To calculate the standard deviation of a single column, you can call the std() method on the column itself. Running this on the age column gives a standard deviation of about 1.17 years, so the dispersion of the data is fairly low and the subjects are of roughly similar ages.

df['age'].std()

1.1737877907772671

Calculate the standard deviation of specific columns

To calculate the standard deviation of specific columns, select those columns by passing a list of column names to the DataFrame, then call the std() method on the result.

df[['age', 'height']].std()

age        1.173788
height    15.081261
dtype: float64
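If you want the standard deviation alongside another statistic for the same columns, one option is the agg() method. This is a small sketch rather than part of the original walkthrough, and it assumes the age and height columns from the dummy dataset above.

```python
import pandas as pd
import numpy as np

# Only the two columns of interest from the dummy dataset
df = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                   'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan]})

# agg() with a list of names returns a DataFrame with one row per statistic
stats = df[['age', 'height']].agg(['mean', 'std'])

print(stats)
```

The std row holds the same values that df[['age', 'height']].std() returns.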

Calculate the standard deviation and skip NaN values

By default, the std() method skips NaN values, so passing skipna=True is not necessary and returns the same result as calling std() with no arguments.

df['height'].std(skipna=True)

15.081261367818158
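To see what the skipna parameter actually changes, it helps to compare both settings. The sketch below uses the height values from the dummy dataset, which include one NaN; the variable names are illustrative.

```python
import pandas as pd
import numpy as np

# Height values from the dummy dataset, including one missing value
height = pd.Series([192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan])

with_skip = height.std()                  # NaN ignored (skipna=True is the default)
without_skip = height.std(skipna=False)   # NaN propagates, so the result is NaN

print(with_skip)
print(without_skip)
```

With skipna=False, a single missing value is enough to make the result NaN, which can be handy when you want missing data to be noticed rather than silently ignored.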

Set the delta degrees of freedom (ddof)

The std() method also accepts a ddof argument, short for delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of non-missing values. The default value in Pandas is 1, which gives the sample standard deviation. If you set ddof to 0, the sum of squared deviations is divided by N instead, which gives the population standard deviation. This is appropriate when your data covers the entire population rather than a sample of it.

df['age'].std(ddof=0)

1.1135528725660042

If you set ddof to 1, the sum of squared deviations is divided by N - 1 rather than N. This adjustment, known as Bessel’s correction, is useful when you want to calculate the standard deviation of a sample instead of a population. Since 1 is the default, the result is the same as calling std() with no arguments.

df['age'].std(ddof=1)

1.1737877907772671
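To make the role of ddof concrete, the sketch below recomputes both versions by hand from the age values and compares them with what std() returns. It also shows NumPy’s np.std(), whose default is ddof=0, so it will not match the Pandas default unless you pass ddof=1. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Age values from the dummy dataset
age = pd.Series([18, 19, 20, 18, 20, 21, 18, 19, 20, 21])

n = len(age)
squared_devs = ((age - age.mean()) ** 2).sum()   # sum of squared deviations

population_std = np.sqrt(squared_devs / n)        # divisor N, same as ddof=0
sample_std = np.sqrt(squared_devs / (n - 1))      # divisor N - 1, same as ddof=1

print(population_std, age.std(ddof=0))
print(sample_std, age.std())            # Pandas default is ddof=1
print(np.std(age.to_numpy()))           # NumPy default is ddof=0
print(age.var() ** 0.5)                 # square root of the variance
```

The manual results should agree with std() for the matching ddof value, and the square root of var() should agree with the default std().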

Matt Clarke, Sunday, November 27, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.