How to read Google Sheets data in Pandas with GSpread

The GSpread package makes it quick and easy to read Google Sheets spreadsheets from Google Drive and load them into Pandas dataframes. Here's a quick guide to using it.

How to read Google Sheets data in Pandas with GSpread
Picture by Kindel Media, Pexels.
5 minutes to read

GSpread is a Python package that makes it quick and easy to read and write data from Google Sheets spreadsheets stored in your Google Drive into Python. With a tiny bit of extra code, you can also load your Google Sheets data straight into a Pandas dataframe.

Once your GSheets data is in a Pandas dataframe, you can manipulate as you wish, connect it to other data sources, or push it into other systems, such as databases, or push it back into Google Sheets so you can access it via Google Data Studio and other tools. Here’s a quick guide to how it’s done.

Install the packages

First, open a Jupyter notebook and install the GSpread Python package by entering !pip3 install gspread into a cell and then executing it by pressing shift and enter. Then import the gspread and pandas packages. !pip3 install gspread

import gspread as gs
import pandas as pd

Authenticate with Google Sheets

Next, use the service_account() function from GSpread to authenticate on your Google Sheets account using a client secrets JSON keyfile for your service account. You can get one of these from the Google API Console.

gc = gs.service_account(filename='service_account.json')

Open the Google Sheet with Python

Now we have a connection to Google Sheets stored in the gc object, we can append the open_by_url() function to this and pass in the URL of the Google Sheets spreadsheet we want to load in Pandas. This will then use the authenticated connection to fetch the data and pull it into your Jupyter notebook as a Gspread spreadsheet object.

sh = gc.open_by_url('https://docs.google.com/spreadsheets/d/1vxyaK4dDdXzDJ5Axakrvik4CUxzGyHvgmmBQJ3Qo7dE/edit?usp=sharing')

Select the specific worksheet

Spreadsheets in Google Sheets and Excel are divided into worksheets. Whether your spreadsheet contains a single worksheet or multiple named worksheets, you’ll need to load the specific one. To do this we append the worksheet() function to the sh spreadsheet object and give it the name of the seet.

ws = sh.worksheet('Courses')

To access the content from the worksheet, we append the get_all_records() function to the object containing the worksheet. This returns a Python dictionary containing all the data in the worksheet. ws.get_all_records() Since we want to display the contents of our Google Sheet in a Pandas dataframe, we need to create a dataframe from that dictionary. This is actually really easy. A single line of code will do that for us.

df = pd.DataFrame(ws.get_all_records())
df.head()
course_title course_duration course_type course_categories
0 Recurrent Neural Networks for Language Modelin... 4 hours Course Machine Learning
1 Market Basket Analysis in Python 4 hours Course Machine Learning
2 Machine Learning for Everyone 4 hours Course Machine Learning
3 Machine Learning for Business 4 hours Course Machine Learning
4 Practicing Machine Learning Interview Question... 4 hours Course Machine Learning

Bring it all together

To wrap up, here’s the whole set of code in one neat block. To recap, all you need to do is import GSpread and Pandas, create a service account connection, load your spreadsheet, select the worksheet, and load the records into a Pandas dataframe. You can then manipulate

import gspread as gs
import pandas as pd

gc = gs.service_account(filename='service_account.json')
sh = gc.open_by_url('https://docs.google.com/spreadsheets/d/1vxyaK4dDdXzDJ5Axakrvik4CUxzGyHvgmmBQJ3Qo7dE/edit?usp=sharing')

ws = sh.worksheet('Courses')
df = pd.DataFrame(ws.get_all_records())
df.head()
course_title course_duration course_type course_categories
0 Recurrent Neural Networks for Language Modelin... 4 hours Course Machine Learning
1 Market Basket Analysis in Python 4 hours Course Machine Learning
2 Machine Learning for Everyone 4 hours Course Machine Learning
3 Machine Learning for Business 4 hours Course Machine Learning
4 Practicing Machine Learning Interview Question... 4 hours Course Machine Learning

Matt Clarke, Saturday, June 05, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Marketing Analytics in Spreadsheets

Learn how to ensure clean data entry and build dynamic dashboards to display your marketing data.

Start course for FREE

Comments