How to create a Pandas dataframe

Pandas lets you create dataframes from almost any type of data, including lists, dictionaries, tuples, NumPy arrays, and files. Here’s how it’s done.

The massive versatility of Pandas means that you can create dataframes from almost any type of raw data. Whether you have a list, a list of lists, a dictionary, a dictionary of lists, a list of dictionaries, some tuples, a NumPy array, or something else, you can turn your data into a Pandas dataframe. Here’s how it’s done.

Create a Pandas dataframe from scratch

There are numerous ways to create a Pandas dataframe from scratch. The most commonly used is to create a dictionary that maps each column name to a list of values for that column (or series), then pass the dictionary to pd.DataFrame() as its first argument. The optional columns argument takes a list of column names and controls which dictionary keys are included and in what order.

import pandas as pd

data = {'Model': ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ'],
        'Price from': [29635, 32585, 56020]}

df = pd.DataFrame(data, columns = ['Model', 'Price from'])
df
       Model  Price from
0  Jaguar XE       29635
1  Jaguar XF       32585
2  Jaguar XJ       56020
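
As a minimal sketch of what the columns argument does when the data is a dictionary, the example below reuses the same data and shows that columns selects and orders the existing keys rather than renaming them. The variable names df_default, df_reordered, and df_subset are only for illustration.

```python
import pandas as pd

data = {'Model': ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ'],
        'Price from': [29635, 32585, 56020]}

# Without columns, the dictionary keys become the column names in insertion order
df_default = pd.DataFrame(data)

# With columns, Pandas picks out the matching keys in the order requested
df_reordered = pd.DataFrame(data, columns=['Price from', 'Model'])
df_subset = pd.DataFrame(data, columns=['Model'])

print(list(df_default.columns))
print(list(df_reordered.columns))
print(list(df_subset.columns))
```

Since the keys already carry the column names, the columns argument can be left out when you want every key in its original order.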

Create a Pandas dataframe from a list

If you have a single list you can pass it directly to pd.DataFrame(), along with a list containing the column name, and Pandas will turn it into a dataframe with a single column.

import pandas as pd 
  
models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK'] 
df = pd.DataFrame(models, columns=['Models']) 
df 
          Models
0      Jaguar XE
1      Jaguar XF
2      Jaguar XJ
3  Jaguar F-Type
4      Jaguar XK
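
A single list can also be wrapped in a named Series and then converted to a dataframe. This is only an alternative sketch and should give the same single-column result as the approach above. The variable name series is an assumption for illustration.

```python
import pandas as pd

models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK']

# Build a named Series first; the name becomes the column header
series = pd.Series(models, name='Models')

# Promote the Series to a one-column dataframe
df = series.to_frame()

print(df.shape)
print(list(df.columns))
```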

Create a Pandas dataframe from two or more lists

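If you have two or more lists of the same length, you can combine them with list(zip()), which pairs up the items at each position so that every resulting tuple becomes one row. You then pass the result to pd.DataFrame() and define the column names in a list passed to the columns argument.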

import pandas as pd 

models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK'] 
prices = [29635, 32585, 56020, 67300, 75392]

df = pd.DataFrame(list(zip(models, prices)), columns=['Models', 'Prices']) 
df
          Models  Prices
0      Jaguar XE   29635
1      Jaguar XF   32585
2      Jaguar XJ   56020
3  Jaguar F-Type   67300
4      Jaguar XK   75392
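
One thing worth knowing about zip() is that it stops at the shortest input, so lists of unequal length are silently truncated rather than raising an error. The sketch below shortens the prices list on purpose to show the effect, and adds a simple length check as one possible safeguard.

```python
import pandas as pd

models = ['Jaguar XE', 'Jaguar XF', 'Jaguar XJ', 'Jaguar F-Type', 'Jaguar XK']
prices = [29635, 32585, 56020]  # deliberately two items short, for illustration

# zip() stops when the shortest list runs out, so only three rows are produced
df = pd.DataFrame(list(zip(models, prices)), columns=['Models', 'Prices'])
print(len(df))

# A simple guard to catch mismatched lists before building the dataframe
lengths_match = len(models) == len(prices)
if not lengths_match:
    print('Warning: lists differ in length, so some rows were dropped')
```

On Python 3.10 or later, zip(models, prices, strict=True) raises a ValueError instead of truncating.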

Create a Pandas dataframe from a multidimensional list

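If you have a multidimensional list in which each inner list holds one row of data points, such as the model and torque figure in the example below, you can pass it as the first argument to pd.DataFrame() and supply the column names through the columns keyword argument. Use the keyword rather than position, because the second positional argument of pd.DataFrame() is index, not columns.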

import pandas as pd  
    
cars = [['BMW 435d', '465 ft lb'], 
       ['BMW M3', '406 ft lb'], 
       ['BMW M4', '406 ft lb'], 
       ['BMW M5', '553 ft lb']] 
    
df = pd.DataFrame(cars, columns=['Model', 'Torque']) 
df 
      Model     Torque
0  BMW 435d  465 ft lb
1    BMW M3  406 ft lb
2    BMW M4  406 ft lb
3    BMW M5  553 ft lb

Create a Pandas dataframe from a dictionary

Another quick way to create a Pandas dataframe from a dictionary is to use the from_dict() class method. If you use this approach, Pandas will take the keys of the dictionary (i.e. Model and Wheelbase in this example) and use them as the column names.

import pandas as pd

data = {'Model': ['Defender 90', 'Defender 110', 'Defender 130'],
        'Wheelbase': ['90 inches', '110 inches', '130 inches'] 
       }

df = pd.DataFrame.from_dict(data)
df
          Model   Wheelbase
0   Defender 90   90 inches
1  Defender 110  110 inches
2  Defender 130  130 inches
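
The from_dict() method also accepts an orient argument. As a rough sketch, if the dictionary is keyed by row rather than by column, passing orient='index' turns the keys into the index, and the columns argument names the values. The restructured dictionary below is an assumption made for illustration, reusing the wheelbase figures from above.

```python
import pandas as pd

# Same wheelbase figures, but keyed by model so each key describes one row
data = {'Defender 90': ['90 inches'],
        'Defender 110': ['110 inches'],
        'Defender 130': ['130 inches']}

# orient='index' treats the keys as row labels instead of column names
df = pd.DataFrame.from_dict(data, orient='index', columns=['Wheelbase'])

print(list(df.index))
print(list(df.columns))
```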

Create a Pandas dataframe from a list of dictionaries

If you have a list containing one or more dictionaries with the same keys, you can pass the list to the from_records() class method. Like from_dict(), this is quite a time saver, because it automatically uses the dictionary keys as the column headers.

import pandas as pd

data = [{'Species': 'Esox lucius', 'Weight': 4272},
        {'Species': 'Perca fluviatilis', 'Weight': 1022},
        {'Species': 'Salmo trutta', 'Weight': 3832}]

df = pd.DataFrame.from_records(data)
df
             Species  Weight
0        Esox lucius    4272
1  Perca fluviatilis    1022
2       Salmo trutta    3832
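
The plain pd.DataFrame() constructor accepts a list of dictionaries too. The sketch below uses it to show what happens when a record lacks a key: the gap is filled with NaN. The missing Weight in the last record is a deliberate change made for illustration.

```python
import pandas as pd

# The last record deliberately omits 'Weight' to show how gaps are handled
data = [{'Species': 'Esox lucius', 'Weight': 4272},
        {'Species': 'Perca fluviatilis', 'Weight': 1022},
        {'Species': 'Salmo trutta'}]

df = pd.DataFrame(data)

# The missing value appears as NaN in the Weight column
print(df['Weight'].isna().tolist())
```

Because NaN is a floating point value, a numeric column containing one is stored as floats rather than integers.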

Create a Pandas dataframe from a dictionary of lists

If your data is currently held in a number of lists, you can build a dictionary from them and pass it to pd.DataFrame(). The key names assigned to the dictionary will be used to set the Pandas column names.

import pandas as pd

species = ["Salmo trutta", "Thymallus thymallus", "Phoxinus phoxinus"]
length = [91, 35, 6]

# Named data rather than dict, to avoid shadowing Python's built-in dict type
data = {'Species': species, 'Length': length}

df = pd.DataFrame(data)
df
               Species  Length
0         Salmo trutta      91
1  Thymallus thymallus      35
2    Phoxinus phoxinus       6
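
With a dictionary of lists, every list needs to be the same length. As a small sketch, the example below shortens one list on purpose and catches the ValueError that Pandas raises. The variable name error is only for illustration.

```python
import pandas as pd

species = ["Salmo trutta", "Thymallus thymallus", "Phoxinus phoxinus"]
length = [91, 35]  # deliberately one item short, for illustration

error = None
try:
    df = pd.DataFrame({'Species': species, 'Length': length})
except ValueError as exc:
    # Pandas refuses to build a dataframe from columns of unequal length
    error = exc
    print(f'Could not build dataframe: {exc}')
```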

Create a Pandas dataframe from tuples

Tuples are slightly less common than dictionaries and lists, but the approach to building a dataframe from them is just the same. You pass the list of tuples as the first argument of from_records(), where each tuple becomes one row, and pass a list of column names to the columns argument.

import pandas as pd

data = [('Esox lucius', 'Llyn Brenig'), 
        ('Salmo trutta', 'River Dee'), 
        ('Phoxinus phoxinus', 'River Ceiriog')]

df = pd.DataFrame.from_records(data, columns=['Species', 'Location'])
df
             Species       Location
0        Esox lucius    Llyn Brenig
1       Salmo trutta      River Dee
2  Phoxinus phoxinus  River Ceiriog
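
If the tuples are named tuples, Pandas can take the column names from the field names, so the columns argument is not needed. The sketch below assumes a hypothetical namedtuple called Catch, which is not part of the original example, and passes the list to the plain pd.DataFrame() constructor.

```python
from collections import namedtuple

import pandas as pd

# Catch is a hypothetical record type, defined here only for illustration
Catch = namedtuple('Catch', ['Species', 'Location'])

data = [Catch('Esox lucius', 'Llyn Brenig'),
        Catch('Salmo trutta', 'River Dee'),
        Catch('Phoxinus phoxinus', 'River Ceiriog')]

# The namedtuple field names are used as the column names
df = pd.DataFrame(data)

print(list(df.columns))
```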

Create a Pandas dataframe from a CSV file

Perhaps the most common way to create a dataframe in Pandas is to create it by importing data from another source, such as a CSV file or Excel spreadsheet. Here’s a really simple example, but for more details on this technique please check out my other guide to importing data in Pandas.

import pandas as pd

df = pd.read_csv('../sitemap.csv')
df.head()
                               url
0           http://flyandlure.org/
1      http://flyandlure.org/about
2      http://flyandlure.org/terms
3    http://flyandlure.org/privacy
4  http://flyandlure.org/copyright
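
To try read_csv() without a file on disk, you can pass it any file-like object. The sketch below wraps a small CSV string in io.StringIO. The CSV text is an assumption that mimics the first few rows shown above, not the contents of the real sitemap.csv file.

```python
import io

import pandas as pd

# A stand-in for the file contents: a header row followed by one URL per line
csv_text = """url
http://flyandlure.org/
http://flyandlure.org/about
http://flyandlure.org/terms
"""

# read_csv() accepts file-like objects as well as file paths
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```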

Create a Pandas dataframe from a NumPy array

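NumPy arrays are slightly more challenging to import. However, a structured array, which stores a name and data type for each field, can be handled by the from_records() function, and Pandas will use the field names as the column names.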

import pandas as pd
import numpy as np

data = np.array([(43, 'a'), (35, 'b'), (27, 'c'), (13, 'd')],
                dtype=[('Score', 'i4'), ('Segment', 'U1')])

df = pd.DataFrame.from_records(data)
df
   Score  Segment
0     43        a
1     35        b
2     27        c
3     13        d
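
A plain two-dimensional NumPy array with a single data type can be passed straight to pd.DataFrame(), with the column names supplied through the columns argument. The sketch below arranges the four scores from above into a 2x2 array. The column names First and Second are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Each inner list is one row; all values share the same integer type
scores = np.array([[43, 35],
                   [27, 13]])

# A plain array has no field names, so the column names are given explicitly
df = pd.DataFrame(scores, columns=['First', 'Second'])

print(df.shape)
print(list(df.columns))
```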

Matt Clarke, Tuesday, March 02, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
