Standard deviation, STD or STDEV, is a descriptive statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.
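To make the definition concrete, here is a minimal sketch in plain Python that follows it step by step: take the mean, average the squared deviations to get the variance, then take the square root. The `values` list is made up for illustration, and this version divides by the number of values (the population form).

```python
import math

# Hypothetical values, chosen so the arithmetic is easy to follow
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Sum of the squared deviations from the mean
squared_deviations = sum((x - mean) ** 2 for x in values)

# Variance is the average squared deviation; standard deviation is its square root
population_variance = squared_deviations / len(values)
population_std = math.sqrt(population_variance)

print(population_std)  # 2.0
```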

Pandas comes with a built-in function called `std()` that calculates the standard deviation of a DataFrame or Series, so it’s very easy to use. It’s also returned when you use the Pandas `describe()` function to get an overview of the descriptive statistics of a dataframe.

In this quick tutorial, we’ll go over some simple code samples that show you how to calculate the standard deviation of all dataframe columns, specific dataframe columns, or a single dataframe column. I’ll also show you how to set the delta degrees of freedom (`ddof`) to switch between the sample and population standard deviation.

To get started, open a Jupyter notebook and import the Pandas and NumPy libraries. Then, either import some data into a Pandas dataframe from an existing dataset, or execute the code below to create a dummy dataset to use.

```
import pandas as pd
import numpy as np
```

```
# Dummy dataset; the last row has missing height and weight values
df = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                   'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan],
                   'weight': [69, 72, 73, 87, 89, 100, 98, 89, 72, np.nan]})
df
```

| | age | height | weight |
|---|---|---|---|
| 0 | 18 | 192.0 | 69.0 |
| 1 | 19 | 189.0 | 72.0 |
| 2 | 20 | 157.0 | 73.0 |
| 3 | 18 | 178.0 | 87.0 |
| 4 | 20 | 189.0 | 89.0 |
| 5 | 21 | 201.0 | 100.0 |
| 6 | 18 | 210.0 | 98.0 |
| 7 | 19 | 189.0 | 89.0 |
| 8 | 20 | 198.0 | 72.0 |
| 9 | 21 | NaN | NaN |

First, we’ll use the `std()` method to calculate the standard deviation of all columns in the DataFrame. To do this, you simply append the `std()` method to the DataFrame object. It returns a Series object with the standard deviation of each column.

```
df.std()
```

```
age 1.173788
height 15.081261
weight 11.935009
dtype: float64
```
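The same figures show up in the `std` row of `describe()`. The sketch below rebuilds the dummy data so it runs on its own, then pulls that row out of the summary; the variable name `summary` is just for illustration.

```python
import numpy as np
import pandas as pd

# Same dummy data as above, repeated so the snippet runs on its own
df = pd.DataFrame({'age': [18, 19, 20, 18, 20, 21, 18, 19, 20, 21],
                   'height': [192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan],
                   'weight': [69, 72, 73, 87, 89, 100, 98, 89, 72, np.nan]})

# describe() returns a DataFrame of summary statistics, one row per statistic
summary = df.describe()

# The 'std' row holds the same values that df.std() returns
print(summary.loc['std'])
```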

To calculate the standard deviation of a single column, you can use the `std()` method on the column itself. Running this on the `age` column shows we’ve got a low standard deviation of 1.17 years, so the dispersion of the data is fairly low and the subjects are of roughly similar ages.

```
df['age'].std()
```

```
1.1737877907772671
```

To calculate the standard deviation of specific columns, select them by passing a list of column names to the DataFrame, then call the `std()` method on the result.

```
df[['age', 'height']].std()
```

```
age 1.173788
height 15.081261
dtype: float64
```

By default, the `std()` function will skip NaN values, so passing `skipna=True` is not necessary. It returns the same result as calling `std()` with no arguments.

```
df['height'].std(skipna=True)
```

```
15.081261367818158
```
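For contrast, here is a small sketch of what happens when you turn that behaviour off. It rebuilds the `height` column as a standalone Series so the snippet runs on its own; with `skipna=False`, the missing value propagates and the result is NaN.

```python
import numpy as np
import pandas as pd

# The height column from the dummy data, including its missing value
height = pd.Series([192, 189, 157, 178, 189, 201, 210, 189, 198, np.nan])

with_skip = height.std()                 # NaN is ignored (the default)
without_skip = height.std(skipna=False)  # NaN propagates to the result

print(with_skip)
print(without_skip)  # nan
```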

The `std()` function also accepts a `ddof` argument, which stands for delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of non-missing values. In Pandas the default value is 1, which gives the sample standard deviation. Setting `ddof=0` divides by N instead and gives the population standard deviation, which is slightly smaller.

```
df['age'].std(ddof=0)
```

```
1.1135528725660042
```
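As a rough check on what `ddof` does, the sketch below recomputes the result by hand, dividing the sum of squared deviations by N - ddof. The `manual_std` helper is hypothetical and written only for this illustration; it is not part of Pandas.

```python
import math
import pandas as pd

age = pd.Series([18, 19, 20, 18, 20, 21, 18, 19, 20, 21])

def manual_std(values, ddof):
    """Hypothetical helper: sqrt of the squared deviations divided by (N - ddof)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))

# Each pair should agree to within floating point error
print(manual_std(age.tolist(), ddof=0), age.std(ddof=0))
print(manual_std(age.tolist(), ddof=1), age.std(ddof=1))
```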

If you set `ddof` to 1, which is the Pandas default, the sum of squared deviations is divided by N - 1 rather than N. This is the right choice when your data is a sample drawn from a larger population, and it matches the result of calling `std()` with no arguments.

```
df['age'].std(ddof=1)
```

```
1.1737877907772671
```
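One practical consequence is that NumPy and Pandas can disagree on the same data, because `np.std()` defaults to `ddof=0` while the Pandas `std()` method defaults to `ddof=1`. The sketch below, which converts the column to a NumPy array first, shows the two defaults side by side and how to make them match.

```python
import numpy as np
import pandas as pd

age = pd.Series([18, 19, 20, 18, 20, 21, 18, 19, 20, 21])
values = age.to_numpy()

pandas_default = age.std()             # Pandas default: ddof=1 (sample)
numpy_default = np.std(values)         # NumPy default: ddof=0 (population)
numpy_sample = np.std(values, ddof=1)  # pass ddof=1 to match Pandas

print(pandas_default, numpy_default, numpy_sample)
```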

Matt Clarke, Sunday, November 27, 2022