How to use Pandas from_records() to create a dataframe

Picture by Elviss Railijs Bitāns, Pexels.



Pandas’ versatility means that there are loads of different ways to create a dataframe. The Pandas from_dict() function is one of the most common ways to create a dataframe from a dictionary. However, there are other ways to create a dataframe, including the from_records() function.

The Pandas from_records() function can be very useful if you need to create a dataframe from a list of records, such as a list of Python dictionaries or a list of tuples. It can also create dataframes directly from a structured NumPy array. In this tutorial we'll go over the basics of from_records() so you can see how it works.
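The key difference is how the data is organised. from_dict(), with its default orient='columns', expects one key per column, while from_records() expects one record per row. This minimal sketch uses a cut-down version of the fish data from the examples in this tutorial and builds the same dataframe both ways:

```python
import pandas as pd

# Column-oriented input: each key holds every value for that column
by_column = {'Species': ['Esox lucius', 'Salmo trutta'],
             'Family': ['Esocidae', 'Salmonidae']}

# Row-oriented input: each tuple is one complete record (row)
by_row = [('Esox lucius', 'Esocidae'),
          ('Salmo trutta', 'Salmonidae')]

df_from_dict = pd.DataFrame.from_dict(by_column)
df_from_records = pd.DataFrame.from_records(by_row, columns=['Species', 'Family'])

# Both approaches produce identical dataframes
print(df_from_dict.equals(df_from_records))
```

In practice, reach for from_records() when your data arrives row by row, for example from an API or a database query, and from_dict() when it is already grouped by column.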

Import the packages

To get started, open a Jupyter notebook and import the Pandas and NumPy libraries using the import pandas as pd and import numpy as np naming conventions. This will allow us to access Pandas and NumPy using the pd and np shorthands in the following steps.

import pandas as pd
import numpy as np

Create a dataframe from a list of dictionaries

First we'll use the Pandas from_records() function to create a new dataframe from a list of Python dictionaries. Each dictionary should ideally have the same keys. Ours contain Species, Genus, and Family. When you call pd.DataFrame.from_records() and pass in the list, Pandas will create a dataframe in which the dictionary keys become the column names and each dictionary becomes a row, giving you a neat dataframe you can work with.

data = [{'Species': 'Esox lucius',
         'Genus': 'Esox',
         'Family': 'Esocidae'},
        {'Species': 'Salmo trutta',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'},
        {'Species': 'Salmo salar',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'}]
df = pd.DataFrame.from_records(data)
df

	Species	Genus	Family
0	Esox lucius	Esox	Esocidae
1	Salmo trutta	Salmo	Salmonidae
2	Salmo salar	Salmo	Salmonidae
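Strictly speaking, the dictionaries don't all need identical keys. In this minimal sketch, the records are made-up variations on the example data: one record is missing Genus and another has an extra, hypothetical Habitat key. from_records() uses every key it finds as a column and fills the gaps with NaN:

```python
import pandas as pd

# Hypothetical records: the second lacks 'Genus', the third adds 'Habitat'
data = [{'Species': 'Esox lucius', 'Genus': 'Esox', 'Family': 'Esocidae'},
        {'Species': 'Salmo trutta', 'Family': 'Salmonidae'},
        {'Species': 'Salmo salar', 'Genus': 'Salmo', 'Family': 'Salmonidae',
         'Habitat': 'Anadromous'}]

df = pd.DataFrame.from_records(data)

# Columns follow the order in which keys are first seen across the records;
# a key missing from a record becomes NaN in that row
print(df)

# Count the missing values per column to spot incomplete records
print(df.isna().sum())
```

If your records come from a source that sometimes omits fields, a quick df.isna().sum() after loading shows which columns were incomplete.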

Exclude specific columns

Suppose you were working with a list of large dictionaries and only wanted to import specific columns. By passing a list of dictionary keys to the exclude parameter, we can tell from_records() not to import those columns into the dataframe.

data = [{'Species': 'Esox lucius',
         'Genus': 'Esox',
         'Family': 'Esocidae'},
        {'Species': 'Salmo trutta',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'},
        {'Species': 'Salmo salar',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'}]
df = pd.DataFrame.from_records(data, exclude=['Genus'])
df

	Species	Family
0	Esox lucius	Esocidae
1	Salmo trutta	Salmonidae
2	Salmo salar	Salmonidae
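The exclude list can name as many keys as you like. This sketch reuses the example data, leaves out two columns at once, and checks that the result matches building the full dataframe and then removing the same columns with drop():

```python
import pandas as pd

data = [{'Species': 'Esox lucius', 'Genus': 'Esox', 'Family': 'Esocidae'},
        {'Species': 'Salmo trutta', 'Genus': 'Salmo', 'Family': 'Salmonidae'},
        {'Species': 'Salmo salar', 'Genus': 'Salmo', 'Family': 'Salmonidae'}]

# Leave out two keys while the dataframe is being built
excluded = pd.DataFrame.from_records(data, exclude=['Genus', 'Family'])

# Alternative: build the full dataframe, then drop the same columns
dropped = pd.DataFrame.from_records(data).drop(columns=['Genus', 'Family'])

print(excluded)
print(excluded.equals(dropped))
```

Both routes give the same dataframe; exclude simply does it in a single call.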

Create a dataframe from a list of tuples

While they're less common, lists of Python tuples can also be used to create dataframes with from_records(). Since a tuple doesn't provide a name to be used as a column header, you need to use the columns parameter to pass in a list of column names, given in the same order as the values in each tuple.

data = [('Esox lucius', 'Llyn Brenig'), 
        ('Salmo trutta', 'River Dee'), 
        ('Phoxinus phoxinus', 'River Ceiriog')]

df = pd.DataFrame.from_records(data, columns=['Species', 'Location'])
df

	Species	Location
0	Esox lucius	Llyn Brenig
1	Salmo trutta	River Dee
2	Phoxinus phoxinus	River Ceiriog
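from_records() will also accept tuples from an iterator such as a generator, and when it does, the nrows parameter limits how many rows it reads. In this minimal sketch, survey_rows() is a hypothetical generator standing in for any lazily read source, such as rows streamed from a file or database:

```python
import pandas as pd

def survey_rows():
    """Hypothetical generator yielding (species, location) tuples one at a time."""
    yield ('Esox lucius', 'Llyn Brenig')
    yield ('Salmo trutta', 'River Dee')
    yield ('Phoxinus phoxinus', 'River Ceiriog')

# nrows only applies when data is an iterator; here it reads the first two tuples
df = pd.DataFrame.from_records(survey_rows(),
                               columns=['Species', 'Location'],
                               nrows=2)
print(df)
```

With an ordinary list, as in the example above, there's no need for nrows, because you can simply slice the list before passing it in.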

Create a dataframe from a NumPy array

Finally, from_records() can also be used to create a Pandas dataframe from a structured NumPy array. The field names defined in the array's dtype become the column names. Here's how it's done.

data = np.array([(43, 'a'), (35, 'b'), (27, 'c'), (13, 'd')],
                dtype=[('Score', 'i4'), ('Segment', 'U1')])

df = pd.DataFrame.from_records(data)
df

	Score	Segment
0	43	a
1	35	b
2	27	c
3	13	d
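The index parameter lets you promote one of the array's fields to the dataframe's index rather than keeping it as a column. A minimal sketch using the same structured array:

```python
import numpy as np
import pandas as pd

data = np.array([(43, 'a'), (35, 'b'), (27, 'c'), (13, 'd')],
                dtype=[('Score', 'i4'), ('Segment', 'U1')])

# Use the 'Segment' field as the row labels instead of as a column
df = pd.DataFrame.from_records(data, index='Segment')
print(df)

# Rows can now be looked up by segment label
print(df.loc['c', 'Score'])
```

The same parameter works with the list-of-tuples example, as long as the name you pass to index is one of the names in columns.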

Matt Clarke, Monday, January 09, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.