Google Data Studio has native support for a range of platforms, but there’s no reliable means of pushing data in from Python without going via another data source. Google BigQuery is one option, but the easiest for small datasets is to use Google Sheets, which is supported natively and is easy to use.
Here’s how you can create a simple data pipeline to take data from MySQL using Python and Pandas and then push it into Google Sheets so you can access it in Google Data Studio.
To get our data from Python to Google Data Studio we’re going to use the gspread Python package and some Google Sheets spreadsheets to store our data. You can install gspread easily from PyPI, and it lets you both write to and read from Google Sheets using Python.
However, you will need to go to the gspread documentation and follow the instructions on how to create a Google Service Account, then download the JSON keyfile to ~/.config/gspread/service_account.json so it can be used to authenticate your user.
pip3 install gspread
Now that gspread is installed and you have your Google Service Account key in the right place, you can import the gspread package and create a service account connection. We’ll then use the create() function to create a new spreadsheet called “Data: Monthly sales by channel”, share it with a user, and email them to let them know. As spreadsheets in gspread are created by the service account user, it’s vital that you share them, otherwise they won’t appear in your Google Drive!
import gspread

gc = gspread.service_account()
sh = gc.create('Data: Monthly sales by channel')
sh.share('firstname.lastname@example.org',
         perm_type='user',
         role='writer',
         email_message='Data: Monthly sales by channel')
There are three ways to open a spreadsheet: via the name, via the key, or via the URL. I’d recommend the open_by_key() approach, since the key is unique, unlike the name, and the reference won’t break if the sheet is renamed. Go to your Google Drive, find the spreadsheet and extract the key hash from the URL. It’s the long string of characters in the middle.
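If you’d rather not copy the key out by hand, it can be extracted from the URL programmatically. A minimal sketch, assuming the standard Google Sheets URL format (the key below is the placeholder used throughout this article, not a real one):

```python
# Sheets URLs follow the pattern .../spreadsheets/d/<key>/edit...
url = "https://docs.google.com/spreadsheets/d/kasd789yas98dyu89asud9au8sd98uas9d8uj89ass/edit#gid=0"

# The key is the path segment immediately after /d/
key = url.split("/d/")[1].split("/")[0]
print(key)  # kasd789yas98dyu89asud9au8sd98uas9d8uj89ass
```

Alternatively, gspread also provides open() for opening by name and open_by_url() for passing the full URL, if you’d rather skip this step.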
sh = gc.open_by_key('kasd789yas98dyu89asud9au8sd98uas9d8uj89ass')
Just like regular Google Sheets, you can add named worksheets to the spreadsheets you create. We’ll create a couple of named worksheets in which to store our data and then delete the default Sheet1. We need to provide the number of rows and columns when we create each new worksheet.
wks_all = sh.add_worksheet(title="All", rows=1000, cols=6)
wks_amazon = sh.add_worksheet(title="Amazon", rows=1000, cols=6)
sh.del_worksheet(sh.sheet1)
Now we’ll use SQLAlchemy to run some SQL queries against MySQL, fetch the data into Pandas, and then push that data into the sheets we created in our Google Sheets spreadsheet. Note that any literal % symbols in the query need to be escaped as %% when the query is passed through SQLAlchemy.
import gspread
import pandas as pd
from sqlalchemy import create_engine

gc = gspread.service_account()
sh = gc.open_by_key('kasd789yas98dyu89asud9au8sd98uas9d8uj89ass')

engine = create_engine('mysql+pymysql://root:SecretPassword@172.17.0.2:3306/database_name')

query = """
SELECT
    DATE_FORMAT(orders.date_created, '%%Y%%m') AS period,
    COUNT(DISTINCT(orders.id)) AS total_orders,
    COUNT(DISTINCT(orders.customer_id)) AS total_customers,
    SUM(orders.revenue) AS total_revenue,
    ROUND((SUM(orders.revenue) / COUNT(DISTINCT(orders.customer_id))), 2) AS aov,
    channels.title AS channel
FROM orders
LEFT JOIN channels ON orders.channel_id = channels.id
WHERE channels.title = 'Amazon'
GROUP BY DATE_FORMAT(orders.date_created, '%%Y%%m')
ORDER BY DATE_FORMAT(orders.date_created, '%%Y%%m') DESC
"""

df = pd.read_sql(query, con=engine)
wks_amazon.update([df.columns.values.tolist()] + df.values.tolist())
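The update() call takes a list of rows, so the expression above prepends the column names as a header row to the DataFrame’s values. A small standalone sketch, using made-up data in place of the SQL query results, shows the shape of the payload gspread receives:

```python
import pandas as pd

# Hypothetical data standing in for the query results
df = pd.DataFrame({
    "period": ["202101", "202102"],
    "total_orders": [10, 12],
})

# Header row first, then one list per DataFrame row
payload = [df.columns.values.tolist()] + df.values.tolist()
print(payload)
# [['period', 'total_orders'], ['202101', 10], ['202102', 12]]
```

Each inner list becomes one row in the worksheet, starting at cell A1, which is exactly the layout Google Data Studio expects from a Google Sheets source.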
You should now find your populated spreadsheet in your Google Drive. If you go to Google Data Studio and add the Google Sheets data source in the usual way, you’ll have access to your MySQL data, fetched via Python and Pandas, directly in GDS.
Matt Clarke, Thursday, March 04, 2021