Pandas’ versatility means there are many different ways to create a dataframe. The Pandas from_dict() function is one of the most common ways to create a dataframe from a dictionary, but there are other options, including the from_records() function.
The Pandas from_records() function is useful when you need to create a dataframe from a list of records, such as a list of Python dictionaries or a list of tuples. It can also create dataframes directly from a NumPy structured array. In this tutorial we’ll go over the basics of from_records() so you can see how it works.
To get started, open a Jupyter notebook and import the Pandas and NumPy libraries using the import pandas as pd and import numpy as np naming conventions. This will allow us to access Pandas and NumPy using the pd and np shorthand in the following steps.
import pandas as pd
import numpy as np
First we’ll use the Pandas from_records() function to create a new dataframe from a list of Python dictionaries. Each dictionary should have the same keys, while the values will differ from record to record. Ours contain Species, Genus, and Family. When you call pd.DataFrame.from_records() and pass in the list of dictionaries, Pandas will create a dataframe in which the dictionary keys become the column names and each dictionary's values become one row, giving you a neat dataframe you can work with.
data = [{'Species': 'Esox lucius',
         'Genus': 'Esox',
         'Family': 'Esocidae'},
        {'Species': 'Salmo trutta',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'},
        {'Species': 'Salmo salar',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'}]
df = pd.DataFrame.from_records(data)
df
| | Species | Genus | Family |
|---|---|---|---|
| 0 | Esox lucius | Esox | Esocidae |
| 1 | Salmo trutta | Salmo | Salmonidae |
| 2 | Salmo salar | Salmo | Salmonidae |
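It can help to see what happens when the dictionaries don't all share the same keys. The short sketch below uses made-up records in which the second dictionary has no Family key. It is only an illustration of how from_records() treats the gap, not part of the original example.

```python
import pandas as pd

# Illustrative records: the second dictionary deliberately lacks 'Family'
data = [{'Species': 'Esox lucius', 'Genus': 'Esox', 'Family': 'Esocidae'},
        {'Species': 'Salmo trutta', 'Genus': 'Salmo'}]

df = pd.DataFrame.from_records(data)

# The columns are the union of all keys seen across the dictionaries
print(df.columns.tolist())

# The value missing from the second record shows up as a null (NaN)
print(df['Family'].isna().tolist())
```

The dataframe is still created, but the missing value is left null, so it's worth checking your records for consistent keys before importing them.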
Suppose you were working with a list of massive dictionaries and didn't want to import every column. By passing a list of dictionary keys to the exclude parameter, we can tell from_records() not to import those columns into the dataframe.
data = [{'Species': 'Esox lucius',
         'Genus': 'Esox',
         'Family': 'Esocidae'},
        {'Species': 'Salmo trutta',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'},
        {'Species': 'Salmo salar',
         'Genus': 'Salmo',
         'Family': 'Salmonidae'}]
df = pd.DataFrame.from_records(data, exclude=['Genus'])
df
| | Species | Family |
|---|---|---|
| 0 | Esox lucius | Esocidae |
| 1 | Salmo trutta | Salmonidae |
| 2 | Salmo salar | Salmonidae |
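For comparison, here is a small sketch that builds the same data two ways: once with exclude, and once by importing everything and then dropping the column with the standard drop() method. The variable names excluded and dropped are just for illustration. The two approaches are expected to give the same columns and rows; exclude simply saves you the extra step.

```python
import pandas as pd

data = [{'Species': 'Esox lucius', 'Genus': 'Esox', 'Family': 'Esocidae'},
        {'Species': 'Salmo trutta', 'Genus': 'Salmo', 'Family': 'Salmonidae'}]

# Skip the 'Genus' column while the dataframe is being built
excluded = pd.DataFrame.from_records(data, exclude=['Genus'])

# Build the full dataframe first, then drop the column afterwards
dropped = pd.DataFrame.from_records(data).drop(columns=['Genus'])

print(excluded.columns.tolist())
print(dropped.columns.tolist())
```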
While they’re not used that often, lists of Python tuples can also be used to create dataframes using from_records(). Since a tuple doesn’t provide a name to be used as a column header, you need to use the columns parameter to pass in a list of column names that map to the order of the values in each tuple.
data = [('Esox lucius', 'Llyn Brenig'),
        ('Salmo trutta', 'River Dee'),
        ('Phoxinus phoxinus', 'River Ceiriog')]
df = pd.DataFrame.from_records(data, columns=['Species', 'Location'])
df
| | Species | Location |
|---|---|---|
| 0 | Esox lucius | Llyn Brenig |
| 1 | Salmo trutta | River Dee |
| 2 | Phoxinus phoxinus | River Ceiriog |
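To see why the columns parameter matters with tuples, the sketch below builds a dataframe from a shortened version of the list both with and without it. The names unnamed and named are purely illustrative.

```python
import pandas as pd

data = [('Esox lucius', 'Llyn Brenig'),
        ('Salmo trutta', 'River Dee')]

# Without column names, Pandas labels the columns by position
unnamed = pd.DataFrame.from_records(data)
print(unnamed.columns.tolist())

# With column names, each tuple position maps to the matching name
named = pd.DataFrame.from_records(data, columns=['Species', 'Location'])
print(named.columns.tolist())
```

Without columns you get positional integer labels, which work but are much harder to read than proper names.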
Finally, from_records() can also be used to create a Pandas dataframe from a NumPy structured array, in which the field names defined in the dtype become the column names. Here’s how it’s done.
data = np.array([(43, 'a'), (35, 'b'), (27, 'c'), (13, 'd')],
                dtype=[('Score', 'i4'), ('Segment', 'U1')])
df = pd.DataFrame.from_records(data)
df
| | Score | Segment |
|---|---|---|
| 0 | 43 | a |
| 1 | 35 | b |
| 2 | 27 | c |
| 3 | 13 | d |
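If you're curious where the column names come from in the NumPy case, this short sketch inspects the array's dtype before and after the conversion. It uses a trimmed copy of the array above and is only meant as a quick check, assuming a structured array with named fields like this one.

```python
import numpy as np
import pandas as pd

data = np.array([(43, 'a'), (35, 'b')],
                dtype=[('Score', 'i4'), ('Segment', 'U1')])

# The field names live on the array's dtype
print(data.dtype.names)

df = pd.DataFrame.from_records(data)

# The same names are used as the dataframe's column labels
print(df.columns.tolist())

# The 4-byte integer type ('i4') declared for Score carries over
print(df['Score'].dtype)
```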
Matt Clarke, Monday, January 09, 2023