The Pandas library is so versatile that it provides several ways to create a dataframe. One of the most commonly used is the from_dict() method, which allows you to create a dataframe from a Python dictionary. In this tutorial, we will learn how to use the from_dict() method to create a dataframe and use the various optional arguments of the method.
The Pandas from_dict() method is used to create a new dataframe from a Python dictionary. The dictionary's values can be array-likes, such as lists, or nested dictionaries. There’s often no need to pass in any parameters apart from the dictionary itself, but there are several optional arguments that can be used to change specific behaviours.
Argument | Description |
---|---|
data | The first argument, data, does not need to be named explicitly. It accepts a dictionary whose values are either array-likes (such as lists) or nested dictionaries, from which a new Pandas dataframe is constructed. |
orient | The orient argument specifies the format of the input data. The default value is columns, which means that the keys of the dictionary are interpreted as column names and the values are interpreted as column values. The other possible values are index, which means that the keys of the dictionary are interpreted as row labels and the values are interpreted as row values, and tight (since Pandas 1.4.0), which assumes a dictionary with keys ['index', 'columns', 'data', 'index_names', 'column_names']. |
dtype | The dtype argument forces a data type on the resulting dataframe. The default value is None, which means that the data type of each column is inferred from the input data. Any dtype accepted by the DataFrame constructor can be passed, for example float, which converts all values to floating point numbers. |
columns | The columns argument specifies the column labels to use when orient='index'. The default value is None, which means that the column labels are taken from the input data. Otherwise, pass a list of column names. Using columns with orient='columns' or orient='tight' raises a ValueError. |
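As a rough sketch of the tight orientation described in the table, the dictionary below supplies the five expected keys. The day and period labels and the variable names are made up for illustration, and the example assumes Pandas 1.4.0 or later.

```python
import pandas as pd

# Hypothetical data laid out in the 'tight' format (needs Pandas >= 1.4.0).
tight_data = {
    'index': ['Monday', 'Tuesday'],        # row labels
    'columns': ['am', 'pm'],               # column labels
    'data': [[12.3, 13.4], [11.3, 12.4]],  # one inner list per row
    'index_names': ['day'],                # name given to the index
    'column_names': ['period'],            # name given to the columns axis
}

df_tight = pd.DataFrame.from_dict(tight_data, orient='tight')
print(df_tight)
```

This layout is mainly useful when the row and column labels need to travel alongside the values, since all three are stored explicitly.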
To get started, open a Jupyter notebook and import the Pandas library.
import pandas as pd
To create a Pandas dataframe from a Python dictionary using from_dict(), you first create a dictionary and then pass it to from_dict() using the format pd.DataFrame.from_dict(data).
We’ll create a simple dictionary in which the keys are the column names and the values are lists of column values. When it is passed to from_dict(), Pandas will return a new dataframe constructed from the dictionary.
data = {'Monday': [12.3, 13.4, 12.5, 11.9],
'Tuesday': [11.3, 12.4, 13.5, 12.9],
'Wednesday': [10.3, 11.4, 12.5, 11.9]}
df = pd.DataFrame.from_dict(data)
df
 | Monday | Tuesday | Wednesday |
---|---|---|---|
0 | 12.3 | 11.3 | 10.3 |
1 | 13.4 | 12.4 | 11.4 |
2 | 12.5 | 13.5 | 12.5 |
3 | 11.9 | 12.9 | 11.9 |
Because the default value of the orient parameter is columns, the keys of the dictionary are used as the column names and the values of the dictionary are used as the column values. Passing orient='columns' explicitly, as below, returns the same dataframe.
df = pd.DataFrame.from_dict(data, orient='columns')
df
 | Monday | Tuesday | Wednesday |
---|---|---|---|
0 | 12.3 | 11.3 | 10.3 |
1 | 13.4 | 12.4 | 11.4 |
2 | 12.5 | 13.5 | 12.5 |
3 | 11.9 | 12.9 | 11.9 |
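To check that the default and explicit orientations really do match, the short sketch below compares the results with DataFrame.equals(). It also compares against the plain pd.DataFrame() constructor, which for a dictionary of lists like this one builds the same dataframe. The variable names are illustrative only.

```python
import pandas as pd

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df_default = pd.DataFrame.from_dict(data)                      # orient defaults to 'columns'
df_explicit = pd.DataFrame.from_dict(data, orient='columns')   # same thing, spelled out
df_constructor = pd.DataFrame(data)                            # regular constructor

print(df_default.equals(df_explicit))     # True
print(df_default.equals(df_constructor))  # True
```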
The orient='index' argument will create a dataframe from a dictionary where the keys are the row labels (the index) and the values are the rows. This will flip our data so that we get all the data for Monday in one row, all the data for Tuesday in one row, and so on.
data = {'Monday': [12.3, 13.4, 12.5, 11.9],
'Tuesday': [11.3, 12.4, 13.5, 12.9],
'Wednesday': [10.3, 11.4, 12.5, 11.9]}
df = pd.DataFrame.from_dict(data, orient='index')
df
 | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Monday | 12.3 | 13.4 | 12.5 | 11.9 |
Tuesday | 11.3 | 12.4 | 13.5 | 12.9 |
Wednesday | 10.3 | 11.4 | 12.5 | 11.9 |
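The orient='index' option also accepts a dictionary of dictionaries, in which case the inner keys become the column names. The sketch below is an illustration only: the min and max labels are hypothetical, with values taken from the lowest and highest reading for each day in the data above.

```python
import pandas as pd

# Outer keys become row labels, inner keys become column names.
nested = {'Monday': {'min': 11.9, 'max': 13.4},
          'Tuesday': {'min': 11.3, 'max': 13.5},
          'Wednesday': {'min': 10.3, 'max': 12.5}}

df_nested = pd.DataFrame.from_dict(nested, orient='index')
print(df_nested)
```

Because the inner dictionaries carry their own labels, the columns are named rather than numbered 0 to 3 as they are with a dictionary of lists.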
The columns argument can be used to set the column names in the dataframe. It only works with orient='index', typically when creating a dataframe from a dictionary of lists; combining it with orient='columns' or orient='tight' raises a ValueError. To use it, you need to pass a list of column names to the columns argument.
data = {'Monday': [12.3, 13.4, 12.5, 11.9],
'Tuesday': [11.3, 12.4, 13.5, 12.9],
'Wednesday': [10.3, 11.4, 12.5, 11.9]}
df = pd.DataFrame.from_dict(data, orient='index', columns=['A', 'B', 'C', 'D'])
df
 | A | B | C | D |
---|---|---|---|---|
Monday | 12.3 | 13.4 | 12.5 | 11.9 |
Tuesday | 11.3 | 12.4 | 13.5 | 12.9 |
Wednesday | 10.3 | 11.4 | 12.5 | 11.9 |
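The sketch below shows what happens if columns is passed with the default orientation, and one possible workaround using rename(). The short names Mon, Tue and Wed are hypothetical labels chosen for the example.

```python
import pandas as pd

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

# With the default orient='columns', passing columns raises a ValueError.
try:
    pd.DataFrame.from_dict(data, columns=['Mon', 'Tue', 'Wed'])
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# One alternative: build the dataframe first, then rename the columns.
df_renamed = pd.DataFrame.from_dict(data).rename(
    columns={'Monday': 'Mon', 'Tuesday': 'Tue', 'Wednesday': 'Wed'}
)
print(df_renamed.columns.tolist())
```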
Sometimes when creating a dataframe, the values in your dictionary may be stored as the wrong data type or dtype. This means you’d have to convert the column values to the correct dtype in order to apply certain Pandas functions to the data.
To show this, we’ll first create a dataframe using from_dict() in which we’ve intentionally set the values for Monday as strings, which Pandas stores with the object dtype, instead of floats.
data = {'Monday': ['12.3', '13.4', '12.5', '11.9'],
'Tuesday': [11.3, 12.4, 13.5, 12.9],
'Wednesday': [10.3, 11.4, 12.5, 11.9]}
df = pd.DataFrame.from_dict(data)
df
 | Monday | Tuesday | Wednesday |
---|---|---|---|
0 | 12.3 | 11.3 | 10.3 |
1 | 13.4 | 12.4 | 11.4 |
2 | 12.5 | 13.5 | 12.5 |
3 | 11.9 | 12.9 | 11.9 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Monday 4 non-null object
1 Tuesday 4 non-null float64
2 Wednesday 4 non-null float64
dtypes: float64(2), object(1)
memory usage: 224.0+ bytes
To set all the values in the dataframe to the float dtype, we can run similar code but pass in the argument dtype=float. The dtype applies to every column, so the integer values used for Tuesday in this example are converted to floats as well. The dtype parameter is completely optional. If you don’t set it, Pandas will infer the dtype from the values in each column, but setting it can be useful for forcing the correct dtype when values have been supplied as the wrong type.
data = {'Monday': ['12.3', '13.4', '12.5', '11.9'],
'Tuesday': [11, 12, 13, 12],
'Wednesday': [10.3, 11.4, 12.5, 11.9]}
df = pd.DataFrame.from_dict(data, dtype=float)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Monday 4 non-null float64
1 Tuesday 4 non-null float64
2 Wednesday 4 non-null float64
dtypes: float64(3)
memory usage: 224.0 bytes
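Since dtype=float is applied to the whole dataframe, you may prefer to convert only the column that was stored incorrectly. The sketch below shows one way to do that with pd.to_numeric() after the dataframe has been created. This is an alternative to the dtype argument rather than part of from_dict() itself.

```python
import pandas as pd

data = {'Monday': ['12.3', '13.4', '12.5', '11.9'],
        'Tuesday': [11, 12, 13, 12],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data)

# Convert only the Monday column from strings to numbers.
df['Monday'] = pd.to_numeric(df['Monday'])

# Tuesday keeps its integer dtype, unlike with dtype=float.
print(df.dtypes)
```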
Matt Clarke, Thursday, January 05, 2023