How to use Pandas from_dict() to create a dataframe

The Pandas library is so versatile that it provides several ways to create a dataframe. One of the most commonly used is the from_dict() method, which allows you to create a dataframe from a Python dictionary. In this tutorial, we will learn how to use the from_dict() method to create a dataframe and use the various optional arguments of the method.

The Pandas from_dict() method is a class method of DataFrame, called as pd.DataFrame.from_dict(), that creates a new dataframe from a Python dictionary whose values are array-likes (such as lists) or nested dictionaries. There’s often no need to pass in any parameters apart from the dictionary itself, but there are several optional arguments that can be used to change specific behaviours.

Argument	Description
`data`	The first positional argument, `data`, does not need to be named explicitly and accepts a dictionary of the form `{field: array-like}` or `{field: dict}` from which to construct a new Pandas dataframe.
`orient`	The `orient` argument specifies the format of the input data. The default value is `columns`, which means that the keys of the dictionary are interpreted as column names and the values are interpreted as column values. The other possible values are `index`, which means that the keys of the dictionary are interpreted as row names and the values are interpreted as row values, and `tight` (since Pandas 1.4.0), which assumes a dictionary with keys `['index', 'columns', 'data', 'index_names', 'column_names']`.
`dtype`	The `dtype` argument specifies a single data type to force on the resulting dataframe. The default value is `None`, which means that the data type of each column is inferred from the input data. Any dtype that Pandas accepts can be passed, for example `float`, which converts all values to floating point numbers.
`columns`	The `columns` argument specifies the column names of the resulting dataframe and can only be used with `orient='index'`; passing it with `orient='columns'` or `orient='tight'` raises a `ValueError`. The default value is `None`, which means that the column names are inferred from the input data. Otherwise, pass a list of column names.
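
The `tight` format is the least obvious of the three orientations, so a minimal sketch may help. The dictionary below is purely illustrative (the `r1`/`r2` labels and the `reading` and `day` names are invented for this example) and it assumes Pandas 1.4.0 or later.

```python
import pandas as pd

# Illustrative 'tight' dictionary; the labels and names are invented for this sketch.
tight = {
    'index': ['r1', 'r2'],               # row labels
    'columns': ['Monday', 'Tuesday'],    # column labels
    'data': [[12.3, 11.3],               # one inner list per row
             [13.4, 12.4]],
    'index_names': ['reading'],          # name of the index
    'column_names': ['day'],             # name of the column axis
}

df_tight = pd.DataFrame.from_dict(tight, orient='tight')

# to_dict(orient='tight') produces the same kind of structure,
# so a dataframe can be exported and rebuilt with its axis names intact.
round_trip = df_tight.to_dict(orient='tight')
```

This orientation is mainly useful when the dictionary was itself produced by to_dict(orient='tight'), since it preserves the index and column names.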

Load Pandas

To get started, open a Jupyter notebook and import the Pandas library.

import pandas as pd

Create a Pandas dataframe using from_dict()

To create a Pandas dataframe from a Python dictionary using from_dict() you first create a dictionary and then pass it to from_dict() using the format pd.DataFrame.from_dict(data).

We’ll create a simple dictionary and will define the column names and lists of values. When these are passed to from_dict(), Pandas will return a new dataframe constructed from the dictionary.

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data)
df

	Monday	Tuesday	Wednesday
0	12.3	11.3	10.3
1	13.4	12.4	11.4
2	12.5	13.5	12.5
3	11.9	12.9	11.9

Because the default value of the orient parameter is 'columns', the keys of the dictionary are used as the column names and the values of the dictionary are used as the column values. Passing orient='columns' explicitly, as below, therefore returns exactly the same dataframe.

df = pd.DataFrame.from_dict(data, orient='columns')
df

	Monday	Tuesday	Wednesday
0	12.3	11.3	10.3
1	13.4	12.4	11.4
2	12.5	13.5	12.5
3	11.9	12.9	11.9
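
With the default orientation, from_dict() should give the same result as passing the dictionary straight to the pd.DataFrame() constructor. The short check below is a sketch of that equivalence, using pd.testing.assert_frame_equal() to compare the two dataframes.

```python
import pandas as pd

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df_from_dict = pd.DataFrame.from_dict(data)   # orient='columns' is the default
df_constructor = pd.DataFrame(data)           # plain constructor

# Raises an AssertionError if the two dataframes differ.
pd.testing.assert_frame_equal(df_from_dict, df_constructor)
```

In practice, from_dict() earns its place when you need one of the other orientations, which the plain constructor doesn’t offer.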

Using orient='index' to create a dataframe from a dictionary

The orient='index' argument will create a dataframe from a dictionary where the keys are used as the row labels (the index) and each list of values becomes a row. This will flip our data so that we get all the data for Monday in one row, all the data for Tuesday in one row, and so on.

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data, orient='index')
df

	0	1	2	3
Monday	12.3	13.4	12.5	11.9
Tuesday	11.3	12.4	13.5	12.9
Wednesday	10.3	11.4	12.5	11.9
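
orient='index' also accepts a dictionary of dictionaries, in which case the outer keys become the row labels and the inner keys become the column names. Here is a small sketch, using made-up `min` and `max` keys that are not part of the original example:

```python
import pandas as pd

# Hypothetical nested dictionary: outer keys are days, inner keys are measures.
nested = {'Monday': {'min': 11.9, 'max': 13.4},
          'Tuesday': {'min': 11.3, 'max': 13.5},
          'Wednesday': {'min': 10.3, 'max': 12.5}}

df_nested = pd.DataFrame.from_dict(nested, orient='index')

# Look up a single value by row label and column name.
tuesday_max = df_nested.loc['Tuesday', 'max']
```

Because the inner keys already name the columns, there is no need to supply column names separately in this case.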

Using columns to set the column names

The columns argument can be used to set the column names in the dataframe. It only works with orient='index', where the columns would otherwise be numbered 0, 1, 2, and so on. To use it, pass a list of column names, one for each column, to the columns argument.

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data, orient='index', columns=['A', 'B', 'C', 'D'])
df

	A	B	C	D
Monday	12.3	13.4	12.5	11.9
Tuesday	11.3	12.4	13.5	12.9
Wednesday	10.3	11.4	12.5	11.9
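
If you try to pass columns with the default orient='columns', Pandas raises a ValueError, because the dictionary keys already supply the column names. The sketch below captures that error and shows one possible alternative, renaming the columns afterwards with rename(); the names A, B, and C are placeholders.

```python
import pandas as pd

data = {'Monday': [12.3, 13.4, 12.5, 11.9],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

# columns is rejected when the keys already supply the column names.
try:
    pd.DataFrame.from_dict(data, orient='columns', columns=['A', 'B', 'C'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# One alternative: build the dataframe first, then rename the columns.
df_renamed = pd.DataFrame.from_dict(data).rename(
    columns={'Monday': 'A', 'Tuesday': 'B', 'Wednesday': 'C'})
```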

Using dtype to specify the data type of the columns

Sometimes when creating a dataframe, the values in your dictionary may be stored as the wrong data type or dtype. This means you’d have to convert the column values to the correct dtype in order to apply certain Pandas functions to the data.

To show this, we’ll first create a dataframe using from_dict() in which we’ve intentionally set the values for Monday as string or object dtype values instead of floats.

data = {'Monday': ['12.3', '13.4', '12.5', '11.9'],
        'Tuesday': [11.3, 12.4, 13.5, 12.9],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data)
df

	Monday	Tuesday	Wednesday
0	12.3	11.3	10.3
1	13.4	12.4	11.4
2	12.5	13.5	12.5
3	11.9	12.9	11.9

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Monday     4 non-null      object 
 1   Tuesday    4 non-null      float64
 2   Wednesday  4 non-null      float64
dtypes: float64(2), object(1)
memory usage: 224.0+ bytes

To set all the values in the dataframe to the float dtype, we can repeat the same code but pass in the argument dtype=float. This time Tuesday holds integers, so both the string values in Monday and the integer values in Tuesday are converted to floats. The dtype parameter is completely optional, and if you don’t set it Pandas will infer the dtype from the values in each column. However, it can be useful for forcing the correct dtype when the data has arrived in the wrong type. Note that the same dtype is applied to every column.

data = {'Monday': ['12.3', '13.4', '12.5', '11.9'],
        'Tuesday': [11, 12, 13, 12],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data, dtype=float)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Monday     4 non-null      float64
 1   Tuesday    4 non-null      float64
 2   Wednesday  4 non-null      float64
dtypes: float64(3)
memory usage: 224.0 bytes
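
Because dtype applies one type to the whole dataframe, it can’t be used to fix only the Monday column. One way to convert a single column, sketched here with the general-purpose astype() method rather than anything specific to from_dict(), is to build the dataframe first and cast afterwards:

```python
import pandas as pd

data = {'Monday': ['12.3', '13.4', '12.5', '11.9'],
        'Tuesday': [11, 12, 13, 12],
        'Wednesday': [10.3, 11.4, 12.5, 11.9]}

df = pd.DataFrame.from_dict(data)

# Cast only Monday to float; Tuesday keeps its integer dtype.
df = df.astype({'Monday': float})
dtypes = df.dtypes
```

Inspecting dtypes (or calling df.info()) afterwards is a quick way to confirm that only the intended column changed.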

Matt Clarke, Thursday, January 05, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.