How to create a Pandas dataframe

The massive versatility of Pandas means that you can create dataframes from almost any type of raw data. Whether you have a list, a list of lists, a dictionary, a dictionary of lists, a list of dictionaries, some tuples, a NumPy array, or something else, you can turn your data into a Pandas dataframe. Here’s how it’s done.

Create a Pandas dataframe from scratch

There are numerous ways to create a Pandas dataframe from scratch. The most common is to create a dictionary containing a list of values for each column (or series) you want to add, then pass the dictionary to pd.DataFrame(). The dictionary keys become the column names. The optional columns argument takes a list of those names and controls which columns are included and in what order.

import pandas as pd

data = {'Model': ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ'],
        'Price from': [29635, 32585, 56020]}

df = pd.DataFrame(data, columns = ['Model', 'Price from'])
df

	Model	Price from
0	Jaguar XE	29635
1	Jaguar XF	32585
2	Jaguar XJ	56020
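
As a rough sketch of what the columns argument does when the data is a dictionary: the keys already supply the column names, so columns acts as a selection and ordering of those keys. The variable names df_default and df_reordered are just illustrative.

```python
import pandas as pd

data = {'Model': ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ'],
        'Price from': [29635, 32585, 56020]}

# Without columns, the dictionary keys become the column names.
df_default = pd.DataFrame(data)

# With columns, pandas selects and orders the matching keys.
df_reordered = pd.DataFrame(data, columns=['Price from', 'Model'])

print(list(df_default.columns))
print(list(df_reordered.columns))
```

In other words, for dictionary input you can usually leave columns out unless you want a different column order or a subset of the keys.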

Create a Pandas dataframe from a list

If you have a single list, you can pass it directly to pd.DataFrame(), along with a one-item list containing the column name in the columns argument, and Pandas will turn it into a dataframe with a single column.

import pandas as pd 
  
models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK'] 
df = pd.DataFrame(models, columns=['Models']) 
df 

	Models
0	Jaguar XE
1	Jaguar XF
2	Jaguar XJ
3	Jaguar F-Type
4	Jaguar XK
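
If it helps, here are two other ways to arrive at a single-column dataframe from a list, shown as a minimal sketch. One goes through a named Series and the other wraps the list in a dictionary. The variable names are illustrative only.

```python
import pandas as pd

models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK']

# A named Series becomes a one-column dataframe via to_frame().
df_from_series = pd.Series(models, name='Models').to_frame()

# Wrapping the list in a dictionary sets the column name from the key.
df_from_dict = pd.DataFrame({'Models': models})

print(df_from_series.shape)
print(df_from_dict.shape)
```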

Create a Pandas dataframe from two or more lists

If you have two or more lists, you can use the list(zip()) technique to pass them into Pandas, and then define their column names in a list passed to the columns argument.

import pandas as pd 

models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK'] 
prices = [29635, 32585, 56020, 67300, 75392]

df = pd.DataFrame(list(zip(models, prices)), columns=['Models', 'Prices']) 
df

	Models	Prices
0	Jaguar XE	29635
1	Jaguar XF	32585
2	Jaguar XJ	56020
3	Jaguar F-Type	67300
4	Jaguar XK	75392
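
One thing worth knowing about the list(zip()) technique is that zip() stops at the shortest list, so a missing value quietly drops a row rather than raising an error. The sketch below uses a deliberately shortened prices list to show this, plus a simple length check you might run first.

```python
import pandas as pd

models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ']
prices = [29635, 32585]  # deliberately one value short

# zip() stops at the shortest input, so the third model is silently dropped.
rows = list(zip(models, prices))
df_truncated = pd.DataFrame(rows, columns=['Models', 'Prices'])
print(len(df_truncated))

# A simple guard before zipping makes the mismatch visible.
lengths_match = len(models) == len(prices)
print(lengths_match)
```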

Create a Pandas dataframe from a multidimensional list

If you have a multidimensional list (a list of lists) in which each inner list holds one row of data, such as the model and torque figure in the example below, you can pass it as the first argument to pd.DataFrame() and pass the list of column names to the columns argument.

import pandas as pd  
    
cars = [['BMW 435d', '465 ft lb'], 
       ['BMW M3', '406 ft lb'], 
       ['BMW M4', '406 ft lb'], 
       ['BMW M5', '553 ft lb']] 
    
df = pd.DataFrame(cars, columns=['Model', 'Torque']) 
df 

	Model	Torque
0	BMW 435d	465 ft lb
1	BMW M3	406 ft lb
2	BMW M4	406 ft lb
3	BMW M5	553 ft lb
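
It's worth passing columns by keyword, as above, because the second positional parameter of pd.DataFrame() is index, which labels the rows rather than the columns. Here is a small sketch using both keywords. The row labels are made up for illustration.

```python
import pandas as pd

cars = [['BMW 435d', '465 ft lb'],
        ['BMW M3', '406 ft lb'],
        ['BMW M4', '406 ft lb'],
        ['BMW M5', '553 ft lb']]

# index labels the rows; columns labels the columns.
df = pd.DataFrame(cars,
                  index=['435d', 'm3', 'm4', 'm5'],  # hypothetical row labels
                  columns=['Model', 'Torque'])

# Rows can now be looked up by label instead of by position.
print(df.loc['m5', 'Torque'])
```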

Create a Pandas dataframe from a dictionary

Another quick way to create a Pandas dataframe from a dictionary is to use the from_dict() function. If you use this approach, Pandas will take the keys of the dictionary (i.e. Model and Wheelbase in this example) and use them as the column names.

import pandas as pd

data = {'Model': ['Defender 90', 'Defender 110', 'Defender 130'],
        'Wheelbase': ['90 inches', '110 inches', '130 inches'] 
       }

df = pd.DataFrame.from_dict(data)
df

	Model	Wheelbase
0	Defender 90	90 inches
1	Defender 110	110 inches
2	Defender 130	130 inches
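
from_dict() also accepts an orient argument. As a sketch, assuming your dictionary is laid out with one key per row instead of one key per column (a hypothetical rearrangement of the data above), orient='index' turns the keys into row labels and lets you name the columns separately.

```python
import pandas as pd

# Hypothetical layout: one dictionary key per row rather than per column.
data = {'Defender 90': ['90 inches'],
        'Defender 110': ['110 inches'],
        'Defender 130': ['130 inches']}

# orient='index' uses the keys as the row index; columns names the values.
df = pd.DataFrame.from_dict(data, orient='index', columns=['Wheelbase'])
print(df)
```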

Create a Pandas dataframe from a list of dictionaries

If you have a list containing one or more dictionaries with the same format, you can pass the list to the from_records() function. Like from_dict(), this is quite a time saver, because it automatically takes the dictionary keys and uses them as the column headers.

import pandas as pd

data = [{'Species': 'Esox lucius', 'Weight': 4272},
        {'Species': 'Perca fluviatilis', 'Weight': 1022},
        {'Species': 'Salmo trutta', 'Weight': 3832}]

df = pd.DataFrame.from_records(data)
df

	Species	Weight
0	Esox lucius	4272
1	Perca fluviatilis	1022
2	Salmo trutta	3832
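
A list of dictionaries can also go straight into pd.DataFrame(). The sketch below does that, and also shows what happens when one dictionary is missing a key: Pandas fills the gap with a missing value rather than failing. The incomplete record is invented for the example.

```python
import pandas as pd

data = [{'Species': 'Esox lucius', 'Weight': 4272},
        {'Species': 'Perca fluviatilis'},  # no Weight recorded (illustrative)
        {'Species': 'Salmo trutta', 'Weight': 3832}]

df = pd.DataFrame(data)

# The missing Weight shows up as a missing value in the second row.
print(df['Weight'].isna().tolist())
```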

Create a Pandas dataframe from a dictionary of lists

If your data is currently present within a number of lists, you can create a dictionary and pass the dict to pd.DataFrame. The key names assigned to the dictionary will be used to set the Pandas column names.

import pandas as pd  
  
species = ["Salmo trutta", "Thymallus thymallus", "Phoxinus phoxinus"] 
length = [91, 35, 6] 
  
# Use a name other than "dict" so the built-in dict type is not shadowed
data = {'Species': species, 'Length': length}

df = pd.DataFrame(data)
df

	Species	Length
0	Salmo trutta	91
1	Thymallus thymallus	35
2	Phoxinus phoxinus	6
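
Unlike the zip() approach, a dictionary of lists won't quietly drop data if the lists are different lengths. Here's a minimal sketch, with a deliberately shortened list, that catches the ValueError Pandas raises in that case.

```python
import pandas as pd

species = ["Salmo trutta", "Thymallus thymallus", "Phoxinus phoxinus"]
length = [91, 35]  # deliberately one value short

try:
    df = pd.DataFrame({'Species': species, 'Length': length})
    error_message = None
except ValueError as exc:
    # Pandas refuses to build a dataframe from columns of unequal length.
    error_message = str(exc)

print(error_message)
```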

Create a Pandas dataframe from tuples

Tuples are slightly less common than dictionaries and lists, but the approach to building a dataframe from them is just the same. You simply pass the list of tuples as the first argument of from_records() and pass a list of column names to the columns argument.

import pandas as pd

data = [('Esox lucius', 'Llyn Brenig'), 
        ('Salmo trutta', 'River Dee'), 
        ('Phoxinus phoxinus', 'River Ceiriog')]

df = pd.DataFrame.from_records(data, columns=['Species', 'Location'])
df

	Species	Location
0	Esox lucius	Llyn Brenig
1	Salmo trutta	River Dee
2	Phoxinus phoxinus	River Ceiriog
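
If your tuples are namedtuples, the field names can stand in for the columns list. This is a small sketch using a made-up Catch record type and the plain pd.DataFrame() constructor.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical record type; the field names become the column names.
Catch = namedtuple('Catch', ['Species', 'Location'])

data = [Catch('Esox lucius', 'Llyn Brenig'),
        Catch('Salmo trutta', 'River Dee'),
        Catch('Phoxinus phoxinus', 'River Ceiriog')]

df = pd.DataFrame(data)
print(list(df.columns))
```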

Create a Pandas dataframe from a CSV file

Perhaps the most common way to create a dataframe in Pandas is to create it by importing data from another source, such as a CSV file or Excel spreadsheet. Here’s a really simple example, but for more details on this technique please check out my other guide to importing data in Pandas.

import pandas as pd

df = pd.read_csv('../sitemap.csv')
df.head()

	url
0	http://flyandlure.org/
1	http://flyandlure.org/about
2	http://flyandlure.org/terms
3	http://flyandlure.org/privacy
4	http://flyandlure.org/copyright
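
To try read_csv() without a file on disk, you can feed it CSV text through io.StringIO. The sketch below does that and also uses the usecols and dtype arguments to pick columns and set a type. The priority column and its values are invented for the example.

```python
import io

import pandas as pd

# Stand-in for a file on disk; the priority column is hypothetical.
csv_text = (
    "url,priority\n"
    "http://flyandlure.org/,1.0\n"
    "http://flyandlure.org/about,0.5\n"
)

df = pd.read_csv(io.StringIO(csv_text),
                 usecols=['url', 'priority'],    # only load these columns
                 dtype={'priority': 'float64'})  # set the column type up front
print(df.shape)
```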

Create a Pandas dataframe from a NumPy array

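Structured NumPy arrays, which carry a named field for each column, are slightly more challenging to import. However, they can also be handled by the from_records() function, which uses the field names as the column names.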

import pandas as pd
import numpy as np

data = np.array([(43, 'a'), (35, 'b'), (27, 'c'), (13, 'd')],
                dtype=[('Score', 'i4'), ('Segment', 'U1')])

df = pd.DataFrame.from_records(data)
df

	Score	Segment
0	43	a
1	35	b
2	27	c
3	13	d
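
For an ordinary two-dimensional NumPy array, where every value shares one type and there are no field names, you can pass the array straight to pd.DataFrame() and supply the column names yourself. Here is a quick sketch with made-up numeric data.

```python
import numpy as np
import pandas as pd

# Plain 2D array: one row per record, one column per variable (illustrative values).
scores = np.array([[43, 1],
                   [35, 2],
                   [27, 3],
                   [13, 4]])

df = pd.DataFrame(scores, columns=['Score', 'Segment ID'])
print(df.shape)
```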

Matt Clarke, Tuesday, March 02, 2021

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.