 Picture by Karolina Grabowska, Pexels.

Standard deviation, STD or STDEV, is a descriptive statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.
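
As a rough illustration of that definition, the sketch below computes a sample standard deviation by hand using only the Python standard library. The `ages` list and the variable names are assumptions made for illustration, and the `n - 1` divisor is chosen to match what Pandas uses by default.

```python
import math

# Hypothetical sample values, used for illustration only
ages = [18, 19, 20, 18, 20, 21, 18, 19, 20, 21]

n = len(ages)
mean = sum(ages) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in ages) / (n - 1)

# The standard deviation is the square root of the variance
std_dev = math.sqrt(variance)
print(round(std_dev, 4))  # 1.1738
```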

Pandas comes with a built-in method called `std()` that calculates the standard deviation of a DataFrame or Series, so it’s very easy to use. The same value also appears in the `std` row returned by the Pandas `describe()` method, which gives an overview of the descriptive statistics of a dataframe.
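
To make the link with `describe()` concrete, here is a minimal sketch that pulls the `std` row out of the summary. The `df_demo` dataframe and its values are made up for this example.

```python
import pandas as pd

# Small made-up DataFrame, for illustration only
df_demo = pd.DataFrame({'a': [1, 2, 3, 4],
                        'b': [10.0, 20.0, 30.0, 40.0]})

summary = df_demo.describe()

# The 'std' row of the summary holds each column's standard deviation
print(summary.loc['std'])
```

The values in that row should match what `df_demo.std()` returns for the same columns.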

In this quick tutorial, we’ll go over some simple code samples that show you how to calculate the standard deviation of all dataframe columns, specific dataframe columns, or a single dataframe column. I’ll also show you how to set the delta degrees of freedom (`ddof`) used in the calculation.

### Import the packages

To get started, open a Jupyter notebook and import the Pandas and NumPy libraries. Then, either import some data into a Pandas dataframe from an existing dataset, or execute the code below to create a dummy dataset to use.

```python
import pandas as pd
import numpy as np
```

```python
df = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                   'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan],
                   'weight': [69, 72, 73, 87, 89, 100, 98, 89, 72, np.nan]})
df
```
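
| | age | height | weight |
|---|---|---|---|
| 0 | 18 | 192.0 | 69.0 |
| 1 | 19 | 189.0 | 72.0 |
| 2 | 20 | 157.0 | 73.0 |
| 3 | 18 | 178.0 | 87.0 |
| 4 | 20 | 189.0 | 89.0 |
| 5 | 21 | 201.0 | 100.0 |
| 6 | 18 | 210.0 | 98.0 |
| 7 | 19 | 189.0 | 89.0 |
| 8 | 20 | 198.0 | 72.0 |
| 9 | 21 | NaN | NaN |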

### Calculate the standard deviation of all columns

First, we’ll use the `std()` method to calculate the standard deviation of all columns in the DataFrame. To do this, you simply append the `std()` method to the DataFrame object. It returns a Series object with the standard deviation of each column.

```python
df.std()
```

```
age        1.173788
height    15.081261
weight    11.935009
dtype: float64
```
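
The example dataframe only contains numeric columns. If yours also has text columns, then depending on your Pandas version `std()` may raise an error or drop them. A hedged sketch of one way to handle this is to pass `numeric_only=True`. The `people` dataframe below is a made-up example, not part of the dataset above.

```python
import pandas as pd

# Made-up DataFrame with one text column, for illustration only
people = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cat'],
    'age': [18, 19, 20],
    'height': [192.0, 189.0, 157.0],
})

# Restrict the calculation to numeric columns so 'name' is ignored
numeric_std = people.std(numeric_only=True)
print(numeric_std)
```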

### Calculate the standard deviation of a single column

To calculate the standard deviation of a single column, you can use the `std()` method on the column itself. Running this on the `age` column gives a standard deviation of about 1.17 years, which is small relative to the mean age of 19.4, so the dispersion is low and the subjects are of roughly similar ages.

```python
df['age'].std()
```

```
1.1737877907772671
```

### Calculate the standard deviation of specific columns

To calculate the standard deviation of specific columns, select them by passing a list of column names to the DataFrame, then call the `std()` method on the result.

```python
df[['age', 'height']].std()
```

```
age        1.173788
height    15.081261
dtype: float64
```
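
If you want the standard deviation alongside other statistics for the selected columns, one option is the `agg()` method. This is a minimal sketch that rebuilds two of the columns so it runs on its own. The `df_demo` and `stats` names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild two of the example columns so this block is self-contained
df_demo = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                        'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan]})

# Compute the mean and standard deviation of the selected columns in one call
stats = df_demo[['age', 'height']].agg(['mean', 'std'])
print(stats)
```

The result is a dataframe with one row per statistic and one column per selected column.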

### Calculate the standard deviation and skip NaN values

By default, the `std()` method skips NaN values, so passing `skipna=True` is not necessary and returns the same result as calling `std()` with no arguments. If you pass `skipna=False` instead, any column containing a NaN will return NaN.

```python
df['height'].std(skipna=True)
```

```
15.081261367818158
```
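
For contrast, this short sketch shows what happens when NaN values are not skipped. It rebuilds the `height` values as a standalone Series so it can run on its own. The `with_skip` and `without_skip` names are just for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the height column, including the trailing NaN
height = pd.Series([192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan])

with_skip = height.std()                 # NaN ignored (default behaviour)
without_skip = height.std(skipna=False)  # NaN propagates to the result

print(with_skip)
print(without_skip)
```

The second value is NaN because a single missing value makes the result undefined when `skipna=False`.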

### Set the delta degrees of freedom

The `std()` method also accepts a `ddof` argument, short for delta degrees of freedom. It does not return anything extra. Instead, it controls the divisor used in the calculation, which is N - `ddof`, where N is the number of non-missing values. In Pandas the default value is 1. Setting `ddof=0` divides by N and gives the population standard deviation, which treats the column as the entire population rather than a sample.

```python
df['age'].std(ddof=0)
```

```
1.1135528725660042
```

If you set `ddof` to 1, the divisor becomes N - 1, a correction known as Bessel's correction. All the values are still used. This gives the sample standard deviation, which is what you want when your data is a sample drawn from a larger population. Since 1 is the Pandas default, the result is the same as calling `std()` with no arguments.

```python
df['age'].std(ddof=1)
```

```
1.1737877907772671
```
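
One practical place where `ddof` matters is when comparing Pandas with NumPy, because `np.std()` defaults to `ddof=0` while the Pandas `std()` method defaults to `ddof=1`. The sketch below reuses the age values as a plain list. The variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

ages = [18, 19, 20, 18, 20, 21, 18, 19, 20, 21]

pandas_default = pd.Series(ages).std()  # ddof=1, sample standard deviation
numpy_default = np.std(ages)            # ddof=0, population standard deviation
numpy_sample = np.std(ages, ddof=1)     # matches the Pandas default

print(pandas_default, numpy_default, numpy_sample)
```

If the two libraries give you slightly different numbers for the same data, the `ddof` default is a likely cause.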

Matt Clarke, Sunday, November 27, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.