How to convert a column list of dictionaries to a Pandas dataframe

Picture by Pixabay, Pexels.

6 minutes to read

Data Science Pandas

When working with Pandas dataframes, you may sometimes encounter a column that contains a list of Python dictionaries or JSON objects. While this format doesn’t take up much space and is convenient for developers to work with, it does make things a bit fiddly when you want to work with the data in a dataframe.

In this project, you’ll learn how to convert a Pandas column containing a list of Python dictionaries or JSON objects into a new Pandas dataframe using just a few lines of code.

Create a Pandas dataframe

To get started open a new Jupyter notebook and import the Pandas package. We’ll then create a Pandas dataframe that includes three columns: id, sku, and variants. The variants column will contain a list of Python dictionaries or JSON objects.

import pandas as pd

data = [
    {
        'id': 1, 
        'sku': 'ABC1', 
        'variants': [
            {
                'parent_sku': 'ABC1', 
                'variant_sku': 'ABC1-A5',
                'price': 1.99
            }, 
            {
                'parent_sku': 'ABC1', 
                'variant_sku': 'ABC1-A4',
                'price': 2.99
            },
            {
                'parent_sku': 'ABC1', 
                'variant_sku': 'ABC1-A3',
                'price': 3.99
            },  
        ] 
    }, 
    {
        'id': 2, 
        'sku': 'ABC2', 
        'variants': [
            {
                'parent_sku': 'ABC2', 
                'variant_sku': 'ABC2-A5',
                'price': 2.99
            }, 
            {
                'parent_sku': 'ABC2', 
                'variant_sku': 'ABC2-A4',
                'price': 3.99
            },
        ] 
    }, 
    {
        'id': 3, 
        'sku': 'ABC3', 
        'variants': [
            {
                'parent_sku': 'ABC3', 
                'variant_sku': 'ABC3-A5',
                'price': 3.99
            }, 
        ] 
    }, 
]

We can now use the pd.DataFrame.from_records() function to create a Pandas dataframe from the data list. As you can see, the variants column contains a list of Python dictionaries or JSON objects and is not easy to read or work with. Since it relates to data better represented as individual rows, we’ll write a function to convert it into a new dataframe.

df = pd.DataFrame.from_records(data)
df

	id	sku	variants
0	1	ABC1	[{'parent_sku': 'ABC1', 'variant_sku': 'ABC1-A...
1	2	ABC2	[{'parent_sku': 'ABC2', 'variant_sku': 'ABC2-A...
2	3	ABC3	[{'parent_sku': 'ABC3', 'variant_sku': 'ABC3-A...

Create a function to convert the list of dictionaries or JSON to a dataframe

We’ll now write a function to convert the variants column into a new dataframe. The function will take the df dataframe as an argument and return a new dataframe with the variants column converted into a new dataframe.

The function first creates a list called rows that will hold each row of data we want to add to the dataframe. We’ll then use a for loop to iterate over each row in the df dataframe and each item in the list stores in the column.

When we get a row, we’ll append it to our rows list. We’ll then use the pd.DataFrame() function to create a new dataframe from the rows list.

def dict_list_to_df(df, col):
    """Return a Pandas dataframe based on a column that contains a list of JSON objects or dictionaries.
    Args:
        df (Pandas dataframe): The dataframe to be flattened.
        col (str): The name of the column that contains the JSON objects or dictionaries.
    Returns:
        Pandas dataframe: A new dataframe with the JSON objects or dictionaries expanded into columns.
    """

    rows = []
    for index, row in df[col].iteritems():
        for item in row:
            rows.append(item)
    df = pd.DataFrame(rows)
    return df

Create a new dataframe from the list of dictionaries or JSON

Now we can use our reusable function and pass it the df dataframe and the name of the column that contains the list of dictionaries or JSON objects. We get back a nice neat dataframe containing only the contents of the variants column.

df_variants = json_list_to_df(df, 'variants')

df_variants

	parent_sku	variant_sku	price
0	ABC1	ABC1-A5	1.99
1	ABC1	ABC1-A4	2.99
2	ABC1	ABC1-A3	3.99
3	ABC2	ABC2-A5	2.99
4	ABC2	ABC2-A4	3.99
5	ABC3	ABC3-A5	3.99

Matt Clarke, Thursday, October 06, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.