How to split a Pandas column string or list into separate columns

Learn how to split or explode a string or Python list stored in a Pandas column into separate columns to make data analysis easier.

How to split a Pandas column string or list into separate columns
Picture by Nataliya Vaitkevich, Pexels.
6 minutes to read

When working in Pandas you’ll sometimes encounter data stored in a single column that would actually be better presented when split into separate columns. For example, a Pandas column might contain a comma separated list of values, or a Python list.

Data stored in a single column are much harder to work with, so splitting data up into distinct columns can make analysis or modeling much easier. There are a couple of ways to split a Pandas column string or list into separate columns. The Pandas explode() function is probably the best, but you can also use the str.split() function.

In this simple project, we’ll take a string of comma separated values stored in a single column of a Pandas dataframe, split the data up into a Python list stored in a single column, and then explode or split the data, so it’s present in its own specific column.

Import the packages

First, open a Jupyter notebook and import the Pandas package. Use the set_option() function to increase the maximum column width using max_colwidth so we can easily see longer column values.

import pandas as pd
pd.set_option('max_colwidth', 100)

Create a Pandas dataframe

Next we need to create a Pandas dataframe that contains some string columns that are separated in some way, for example, as a comma separated value list. To make things more challenging for ourselves, we’ll make these strings of differing length, so we can’t take an easy route.

data = {
    'brand': ['Jaguar', 
              'Land Rover', 
              'Lotus'],
    'models_string': ['XE, XF, E Pace, F Pace, I Pace, F Type', 
                      'Defender, Discovery, Discovery Sport, Range Rover, Range Rover Sport, Range Rover Velar',
                      'Emira, Evija']
}

df = pd.DataFrame(data, columns=['brand', 'models_string'])
df.head()
brand models_string
0 Jaguar XE, XF, E Pace, F Pace, I Pace, F Type
1 Land Rover Defender, Discovery, Discovery Sport, Range Rover, Range Rover Sport, Range Rover Velar
2 Lotus Emira, Evija

Use str.split() to split a string into a list

First, we’ll split or “explode” our string of comma separated values into a Python list using the str.split() function. We’ll append the str.split() function to the name of the original column and define the separator as a comma, then we’ll assign the output of the result to a new column called models_list. When executed this will split our string up after each comma and create a Python list of values.

df['models_list'] = df['models_string'].str.split(',')
df.head()
brand models_string models_list
0 Jaguar XE, XF, E Pace, F Pace, I Pace, F Type [XE, XF, E Pace, F Pace, I Pace, F Type]
1 Land Rover Defender, Discovery, Discovery Sport, Range Rover, Range Rover Sport, Range Rover Velar [Defender, Discovery, Discovery Sport, Range Rover, Range Rover Sport, Range Rover Velar]
2 Lotus Emira, Evija [Emira, Evija]

Split a column list into separate columns

Finally, we can take each of these list values and store them in their own Pandas column. Our lists are of unequal length, which makes this trickier. We’ll first create a new dataframe called df_models and will use tolist() to pass in the data from models_list. This returns a Pandas dataframe with each of the list values in their own column. We’ll then use fillna('') to replace any NaN values with an empty string and prefix each column with model_.

df_models = pd.DataFrame(df['models_list'].tolist()).fillna('').add_prefix('model_')

Next, we’ll use concat() to perform an outer join. We’ll pass in the two dataframes and join them on their index, returning a new dataframe that includes the original brand data with the models each in their own column.

df = pd.concat([df, df_models], axis=1)
df[['brand', 'model_0', 'model_1', 'model_2', 'model_3', 'model_4', 'model_5']]
brand model_0 model_1 model_2 model_3 model_4 model_5
0 Jaguar XE XF E Pace F Pace I Pace F Type
1 Land Rover Defender Discovery Discovery Sport Range Rover Range Rover Sport Range Rover Velar
2 Lotus Emira Evija

Matt Clarke, Thursday, August 18, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.