GSpread is a Python package that makes it quick and easy to read and write data from Google Sheets spreadsheets stored in your Google Drive into Python. With a tiny bit of extra code, you can also load your Google Sheets data straight into a Pandas dataframe.
Once your GSheets data is in a Pandas dataframe, you can manipulate as you wish, connect it to other data sources, or push it into other systems, such as databases, or push it back into Google Sheets so you can access it via Google Data Studio and other tools. Here’s a quick guide to how it’s done.
First, open a Jupyter notebook and install the GSpread Python package by entering !pip3 install gspread
into a cell and then executing it by pressing shift and enter. Then import the gspread
and pandas
packages.
!pip3 install gspread
import gspread as gs
import pandas as pd
Next, use the service_account()
function from GSpread to authenticate on your Google Sheets account using a client secrets JSON keyfile for your service account. You can get one of these from the Google API Console.
gc = gs.service_account(filename='service_account.json')
Now we have a connection to Google Sheets stored in the gc
object, we can append the open_by_url()
function to this and pass in the URL of the Google Sheets spreadsheet we want to load in Pandas. This will then use the authenticated connection to fetch the data and pull it into your Jupyter notebook as a Gspread spreadsheet object.
sh = gc.open_by_url('https://docs.google.com/spreadsheets/d/1vxyaK4dDdXzDJ5Axakrvik4CUxzGyHvgmmBQJ3Qo7dE/edit?usp=sharing')
Spreadsheets in Google Sheets and Excel are divided into worksheets. Whether your spreadsheet contains a single worksheet or multiple named worksheets, you’ll need to load the specific one. To do this we append the worksheet()
function to the sh
spreadsheet object and give it the name of the seet.
ws = sh.worksheet('Courses')
To access the content from the worksheet, we append the get_all_records()
function to the object containing the worksheet. This returns a Python dictionary containing all the data in the worksheet.
ws.get_all_records()
Since we want to display the contents of our Google Sheet in a Pandas dataframe, we need to create a dataframe from that dictionary. This is actually really easy. A single line of code will do that for us.
df = pd.DataFrame(ws.get_all_records())
df.head()
course_title | course_duration | course_type | course_categories | |
---|---|---|---|---|
0 | Recurrent Neural Networks for Language Modelin... | 4 hours | Course | Machine Learning |
1 | Market Basket Analysis in Python | 4 hours | Course | Machine Learning |
2 | Machine Learning for Everyone | 4 hours | Course | Machine Learning |
3 | Machine Learning for Business | 4 hours | Course | Machine Learning |
4 | Practicing Machine Learning Interview Question... | 4 hours | Course | Machine Learning |
To wrap up, here’s the whole set of code in one neat block. To recap, all you need to do is import GSpread and Pandas, create a service account connection, load your spreadsheet, select the worksheet, and load the records into a Pandas dataframe. You can then manipulate
import gspread as gs
import pandas as pd
gc = gs.service_account(filename='service_account.json')
sh = gc.open_by_url('https://docs.google.com/spreadsheets/d/1vxyaK4dDdXzDJ5Axakrvik4CUxzGyHvgmmBQJ3Qo7dE/edit?usp=sharing')
ws = sh.worksheet('Courses')
df = pd.DataFrame(ws.get_all_records())
df.head()
course_title | course_duration | course_type | course_categories | |
---|---|---|---|---|
0 | Recurrent Neural Networks for Language Modelin... | 4 hours | Course | Machine Learning |
1 | Market Basket Analysis in Python | 4 hours | Course | Machine Learning |
2 | Machine Learning for Everyone | 4 hours | Course | Machine Learning |
3 | Machine Learning for Business | 4 hours | Course | Machine Learning |
4 | Practicing Machine Learning Interview Question... | 4 hours | Course | Machine Learning |
Matt Clarke, Saturday, June 05, 2021