Google Data Studio has native support for a range of platforms, but there’s no reliable means of pushing data in from Python without going via another data source. Google BigQuery is one option, but the easiest for small datasets is to use Google Sheets, which is supported natively and is easy to use.
Here’s how you can create a simple data pipeline to take data from MySQL using Python and Pandas and then push it into Google Sheets so you can access it in Google Data Studio.
To get our data from Python to Google Data Studio we’re going to use the gspread Python package and some Google Sheets spreadsheets to store our data. You can install gspread easily from PyPI, and it lets you both write to and read from Google Sheets using Python.
However, you will need to go to the gspread documentation and follow the instructions on how to create a Google Service Account, then download the JSON keyfile to ~/.config/gspread/service_account.json so it can be used to authenticate your user.
pip3 install gspread
Now that gspread is installed and you have your Google Service Account key in the right place, you can import the gspread package and create a service account connection. We’ll then use the create() function to create a new spreadsheet called “Data: Monthly sales by channel”, share it with a user, and email them to let them know. As spreadsheets in gspread are created by the service account user, it’s vital that you share them, otherwise they won’t appear in your Google Drive!
import gspread

gc = gspread.service_account()
sh = gc.create('Data: Monthly sales by channel')
sh.share('firstname.lastname@example.org',
         perm_type='user',
         role='writer',
         email_message='Data: Monthly sales by channel')
There are three ways to open a spreadsheet: via the name, via the key, or via the URL. I’d recommend the open_by_key() approach, since the key is unique, unlike the name, and the reference won’t break if the sheet is renamed. Go to your Google Drive, find the spreadsheet and extract the key hash from the URL. It’s the long string of characters in the middle.
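If you’d rather not copy the key out by hand, it can be extracted from the URL programmatically. A minimal sketch, assuming the standard Google Sheets URL format (the key below is the placeholder used throughout this article, not a real one):

```python
# Sheets URLs follow the pattern .../spreadsheets/d/<key>/edit...
url = "https://docs.google.com/spreadsheets/d/kasd789yas98dyu89asud9au8sd98uas9d8uj89ass/edit#gid=0"

# The key is the path segment immediately after /d/
key = url.split("/d/")[1].split("/")[0]
print(key)  # kasd789yas98dyu89asud9au8sd98uas9d8uj89ass
```

Alternatively, gspread also provides open() for opening by name and open_by_url() for passing the full URL, if you’d rather skip this step.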
sh = gc.open_by_key('kasd789yas98dyu89asud9au8sd98uas9d8uj89ass')
Just like regular Google Sheets, you can add named worksheets to the spreadsheets you create. We’ll create a couple of named worksheets in which to store our data and then delete the default Sheet1. We need to provide the number of rows and columns when we create each new worksheet.
wks_all = sh.add_worksheet(title="All", rows=1000, cols=6)
wks_amazon = sh.add_worksheet(title="Amazon", rows=1000, cols=6)
sh.del_worksheet(sh.sheet1)
Now we’ll use SQLAlchemy to run some SQL queries against MySQL, fetch the data into Pandas, and then push that data into the sheets we created in our Google Sheets spreadsheet. Note that any literal % symbols in the query need to be escaped as %% when the query is passed through SQLAlchemy.
import gspread
import pandas as pd
from sqlalchemy import create_engine

gc = gspread.service_account()
sh = gc.open_by_key('kasd789yas98dyu89asud9au8sd98uas9d8uj89ass')

engine = create_engine('mysql+pymysql://root:SecretPassword@172.17.0.2:3306/database_name')

query = """
SELECT
    DATE_FORMAT(orders.date_created, '%%Y%%m') AS period,
    COUNT(DISTINCT(orders.id)) AS total_orders,
    COUNT(DISTINCT(orders.customer_id)) AS total_customers,
    SUM(orders.revenue) AS total_revenue,
    ROUND((SUM(orders.revenue) / COUNT(DISTINCT(orders.customer_id))), 2) AS aov,
    channels.title AS channel
FROM orders
LEFT JOIN channels ON orders.channel_id = channels.id
WHERE channels.title = 'Amazon'
GROUP BY DATE_FORMAT(orders.date_created, '%%Y%%m')
ORDER BY DATE_FORMAT(orders.date_created, '%%Y%%m') DESC
"""

df = pd.read_sql(query, con=engine)
wks_amazon.update([df.columns.values.tolist()] + df.values.tolist())
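The update() call takes a list of rows, so the expression above prepends the column names as a header row to the DataFrame’s values. A small standalone sketch, using made-up data in place of the SQL query results, shows the shape of the payload gspread receives:

```python
import pandas as pd

# Hypothetical data standing in for the query results
df = pd.DataFrame({
    "period": ["202101", "202102"],
    "total_orders": [10, 12],
})

# Header row first, then one list per DataFrame row
payload = [df.columns.values.tolist()] + df.values.tolist()
print(payload)
# [['period', 'total_orders'], ['202101', 10], ['202102', 12]]
```

Each inner list becomes one row in the worksheet, starting at cell A1, which is exactly the layout Google Data Studio expects from a Google Sheets source.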
You should now find your populated spreadsheet in your Google Drive. If you go to Google Data Studio and add the Google Sheets data source in the usual way, you’ll have access to your MySQL data, fetched via Python and Pandas, directly in GDS.
Matt Clarke, Thursday, March 04, 2021