When working with Pandas dataframes, you may sometimes encounter a column that contains a list of Python dictionaries or JSON objects. While this format doesn’t take up much space and is convenient for developers to work with, it does make things a bit fiddly when you want to work with the data in a dataframe.
In this project, you’ll learn how to convert a Pandas column containing a list of Python dictionaries or JSON objects into a new Pandas dataframe using just a few lines of code.
To get started open a new Jupyter notebook and import the Pandas package. We’ll then create a Pandas dataframe that includes three columns: id
, sku
, and variants
. The variants
column will contain a list of Python dictionaries or JSON objects.
import pandas as pd
data = [
{
'id': 1,
'sku': 'ABC1',
'variants': [
{
'parent_sku': 'ABC1',
'variant_sku': 'ABC1-A5',
'price': 1.99
},
{
'parent_sku': 'ABC1',
'variant_sku': 'ABC1-A4',
'price': 2.99
},
{
'parent_sku': 'ABC1',
'variant_sku': 'ABC1-A3',
'price': 3.99
},
]
},
{
'id': 2,
'sku': 'ABC2',
'variants': [
{
'parent_sku': 'ABC2',
'variant_sku': 'ABC2-A5',
'price': 2.99
},
{
'parent_sku': 'ABC2',
'variant_sku': 'ABC2-A4',
'price': 3.99
},
]
},
{
'id': 3,
'sku': 'ABC3',
'variants': [
{
'parent_sku': 'ABC3',
'variant_sku': 'ABC3-A5',
'price': 3.99
},
]
},
]
We can now use the pd.DataFrame.from_records()
function to create a Pandas dataframe from the data
list. As you can see, the variants
column contains a list of Python dictionaries or JSON objects and is not easy to read or work with. Since it relates to data better represented as individual rows, we’ll write a function to convert it into a new dataframe.
df = pd.DataFrame.from_records(data)
df
id | sku | variants | |
---|---|---|---|
0 | 1 | ABC1 | [{'parent_sku': 'ABC1', 'variant_sku': 'ABC1-A... |
1 | 2 | ABC2 | [{'parent_sku': 'ABC2', 'variant_sku': 'ABC2-A... |
2 | 3 | ABC3 | [{'parent_sku': 'ABC3', 'variant_sku': 'ABC3-A... |
We’ll now write a function to convert the variants
column into a new dataframe. The function will take the df
dataframe as an argument and return a new dataframe with the variants
column converted into a new dataframe.
The function first creates a list called rows
that will hold each row of data we want to add to the dataframe. We’ll then use a for
loop to iterate over each row in the df
dataframe and each item in the list stores in the column.
When we get a row, we’ll append it to our rows
list. We’ll then use the pd.DataFrame()
function to create a new dataframe from the rows
list.
def dict_list_to_df(df, col):
"""Return a Pandas dataframe based on a column that contains a list of JSON objects or dictionaries.
Args:
df (Pandas dataframe): The dataframe to be flattened.
col (str): The name of the column that contains the JSON objects or dictionaries.
Returns:
Pandas dataframe: A new dataframe with the JSON objects or dictionaries expanded into columns.
"""
rows = []
for index, row in df[col].iteritems():
for item in row:
rows.append(item)
df = pd.DataFrame(rows)
return df
Now we can use our reusable function and pass it the df
dataframe and the name of the column that contains the list of dictionaries or JSON objects. We get back a nice neat dataframe containing only the contents of the variants
column.
df_variants = json_list_to_df(df, 'variants')
df_variants
parent_sku | variant_sku | price | |
---|---|---|---|
0 | ABC1 | ABC1-A5 | 1.99 |
1 | ABC1 | ABC1-A4 | 2.99 |
2 | ABC1 | ABC1-A3 | 3.99 |
3 | ABC2 | ABC2-A5 | 2.99 |
4 | ABC2 | ABC2-A4 | 3.99 |
5 | ABC3 | ABC3-A5 | 3.99 |
Matt Clarke, Thursday, October 06, 2022