How to create a Shopify price tracker with Python

Learn how to create a Shopify price tracker by scraping a Shopify ecommerce store with Python and storing the data in a SQLite database.


In ecommerce, it’s very common for retailers to need to monitor the prices of their competitors. Prices make a big difference to sales, and if they’re set too high, customers will go elsewhere, so monitoring them is crucial to keep the sales coming in.

In this post, we’ll create a Shopify price tracker with Python. We’ll scrape a Shopify ecommerce store, extract the product prices, and store them on a daily basis in a SQLite database. We can then check the database to see if the prices have changed.

To save the hassle of writing a Shopify scraper from scratch, as I showed in my previous tutorial, we’ll instead use the ShopifyScraper package I created from that same code. It makes scraping a Shopify store possible in just a few lines of code.

Install ShopifyScraper

To install my ShopifyScraper package, run the following command in your terminal. This will install the latest version of the package from my GitHub repository using the pip package manager.

pip3 install git+https://github.com/practical-data-science/ShopifyScraper.git

Import the packages

Next, you’ll want to open PyCharm or a similar Python IDE, since this is a project best completed in a script rather than a Jupyter notebook. Create a script called price_tracker.py. At the top of the script, import the packages we’ll be using. You’ll likely already have Pandas installed, and datetime and sqlite3 are standard libraries, so they don’t need installing.

import datetime
import pandas as pd
from shopify_scraper import scraper
import sqlite3

Scrape the Shopify site

Next, we’ll define the URL of the Shopify ecommerce site we want to scrape and then call the two main functions in ShopifyScraper. The get_products() function extracts all the product page data from the site, while the get_variants() function goes through this data and extracts the product variants. Both the products and the variants are returned as Pandas dataframes.

url = "https://sunrayflyfish.com"
parents = scraper.get_products(url)
children = scraper.get_variants(parents)

Create a subset of price data to store

Since we only want to store a subset of columns for our price tracking application, we’ll create a new dataframe with just the columns we want, plus a column for the date and time the data was scraped. You could, of course, make this more efficient by storing only the id, price, and date_checked, and then using merge() to join back to the source data stored in parents or children, but for this simple example we’ll store some extra columns.

prices = children[['id', 'sku', 'parent_title', 'title', 'price', 'compare_at_price']].sort_values(by=['parent_title', 'price'], ascending=True)
prices['date_checked'] = datetime.datetime.now()

Here’s what that data looks like.

prices.head()
id sku parent_title title price compare_at_price date_checked
60 39367977009217 CLR-FLUORO- 2LB-SR 100% Fluorocarbon Tippet 2lb 0.15mm 4.99 13.99 2022-10-01 15:22:35.579469
61 39367977041985 CLR-FLUORO- 4LB-SR 100% Fluorocarbon Tippet 4lb 0.18mm 4.99 13.99 2022-10-01 15:22:35.579469
62 39367977074753 CLR-FLUORO- 6LB-SR 100% Fluorocarbon Tippet 6lb 0.20mm 4.99 13.99 2022-10-01 15:22:35.579469
63 39367977107521 CLR-FLUORO- 8LB-SR 100% Fluorocarbon Tippet 8lb 0.24mm 4.99 13.99 2022-10-01 15:22:35.579469
64 39367978352705 CLR-FLUORO- 12LB-SR 100% Fluorocarbon Tippet 12lb 0.28m 4.99 13.99 2022-10-01 15:22:35.579469
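As mentioned above, a leaner alternative is to store only the id, price, and date_checked columns and join back to the full variant data with merge() when it’s needed. Here’s a sketch of that approach, using a hypothetical miniature children dataframe in place of the real scraped data.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the scraped 'children' dataframe
children = pd.DataFrame({
    "id": [39367977009217, 39367977041985],
    "sku": ["CLR-FLUORO- 2LB-SR", "CLR-FLUORO- 4LB-SR"],
    "parent_title": ["100% Fluorocarbon Tippet"] * 2,
    "price": [4.99, 4.99],
})

# Store only the minimal columns needed for tracking
slim = children[["id", "price"]].copy()
slim["date_checked"] = datetime.datetime.now()

# Join back to the richer product data whenever it's needed
detail = slim.merge(children[["id", "sku", "parent_title"]], on="id", how="left")
print(detail.columns.tolist())
```

This keeps the database small, at the cost of an extra join each time you want the product names alongside the prices.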

Store the prices in a SQLite database

Next, we’ll create a SQLite database and store the price data in it. We’ll define the name of the database as price_tracker.db and create a connection to it. We’ll then use the to_sql() function to take our prices dataframe and store it in a table called prices using the conn connection we just created. Because if_exists is set to append, each run adds new rows rather than overwriting the table, so we end up with a historical record of prices.

conn = sqlite3.connect('price_tracker.db')
prices.to_sql('prices', conn, if_exists='append', index=False)
conn.close()

Create a cron job to run the script daily

Finally, we’ll create a cron job to run the price_tracker.py script daily, allowing us to track the prices of products on the site over time. Cron is the easiest way to create simple scheduled tasks like this on Linux workstations or servers, but it’s worth checking out the excellent Apache Airflow if you want something more robust and feature-rich.

You need to configure your cron job to run the script once every day. Each time it runs, it will fetch the products, variants, and prices and then store the prices in the SQLite database.
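For example, a crontab entry along these lines (added via crontab -e) would run the script at 7am every day. The interpreter and script paths here are assumptions you’d adjust for your own machine.

```shell
# Run the price tracker every day at 07:00 (adjust paths to suit your setup)
0 7 * * * /usr/bin/python3 /home/user/price_tracker.py
```

Note that cron runs with a minimal environment, so it’s safest to use absolute paths for both the Python interpreter and the script.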

Read the price tracker data

Now create a separate Python script, or a Jupyter notebook if you prefer. We’ll use this to read the price tracker data from the SQLite database and display it. You’ll need to import Pandas and sqlite3 again and then connect to the price_tracker.db database in which we’ve stored our prices.

To crudely select everything from the prices table, we can use SELECT * FROM prices and pass it to the read_sql_query() function to read the data into a dataframe. We’ll then use the head() function to display the first few rows.

import pandas as pd
import sqlite3
conn = sqlite3.connect('price_tracker.db')
stored_prices = pd.read_sql_query("SELECT * FROM prices", conn)
conn.close()
stored_prices.head()
id sku parent_title title price compare_at_price date_checked
0 39367977009217 CLR-FLUORO- 2LB-SR 100% Fluorocarbon Tippet 2lb 0.15mm 4.99 13.99 2022-10-01 15:21:57.240474
1 39367977041985 CLR-FLUORO- 4LB-SR 100% Fluorocarbon Tippet 4lb 0.18mm 4.99 13.99 2022-10-01 15:21:57.240474
2 39367977074753 CLR-FLUORO- 6LB-SR 100% Fluorocarbon Tippet 6lb 0.20mm 4.99 13.99 2022-10-01 15:21:57.240474
3 39367977107521 CLR-FLUORO- 8LB-SR 100% Fluorocarbon Tippet 8lb 0.24mm 4.99 13.99 2022-10-01 15:21:57.240474
4 39367978352705 CLR-FLUORO- 12LB-SR 100% Fluorocarbon Tippet 12lb 0.28m 4.99 13.99 2022-10-01 15:21:57.240474

Read the historical data on a single product

To fetch the historical data stored on a single product in the price tracker database, we can write a simple SQL query and execute it with the read_sql_query() function. We’ll use a WHERE clause to filter the data to include only the product with the sku CLR-FLUORO- 2LB-SR, and then use head() to display the first few rows.

query = """
SELECT * FROM prices 
WHERE sku = 'CLR-FLUORO- 2LB-SR'
"""
conn = sqlite3.connect('price_tracker.db')
specific_product = pd.read_sql_query(query, conn)
conn.close()
specific_product.head()
id sku parent_title title price compare_at_price date_checked
0 39367977009217 CLR-FLUORO- 2LB-SR 100% Fluorocarbon Tippet 2lb 0.15mm 4.99 13.99 2022-10-01 15:21:57.240474
143 39367977009217 CLR-FLUORO- 2LB-SR 100% Fluorocarbon Tippet 2lb 0.15mm 4.99 13.99 2022-10-01 15:22:35.579469
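Since the point of the tracker is to spot changes, a simple Pandas comparison will flag any SKU whose latest price differs from the previous check. This sketch uses a hypothetical two-check history dataframe, shaped like the query results above, rather than the real database.

```python
import pandas as pd

# Hypothetical history with two checks per SKU
history = pd.DataFrame({
    "sku": ["2LB-SR", "2LB-SR", "4LB-SR", "4LB-SR"],
    "price": [4.99, 5.99, 4.99, 4.99],
    "date_checked": pd.to_datetime(
        ["2022-10-01", "2022-10-02", "2022-10-01", "2022-10-02"]),
})

# Compare each row's price with the previous check for the same SKU
history = history.sort_values(["sku", "date_checked"])
history["previous_price"] = history.groupby("sku")["price"].shift(1)
changes = history.dropna(subset=["previous_price"])
changes = changes[changes["price"] != changes["previous_price"]]
print(changes[["sku", "previous_price", "price"]])
```

Running this on the toy data flags only the 2LB-SR SKU, whose price moved between the two checks.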

Next steps

Obviously, this is a really basic example. However, it shows how little code is required when working with a Shopify ecommerce site, since the ShopifyScraper package does all the heavy lifting for us.

With some very minor tweaks, this could be configured to loop through a list of Shopify stores and store the scraped data in the database. That would allow you to track the prices of products across multiple stores and compare them to decide whether you need to adjust your own prices to increase sales.
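A sketch of that multi-store loop might look like the following. The fetch_prices function here is a hypothetical stand-in for the scraper.get_products() and get_variants() calls, so the loop itself can be shown without hitting any real stores; the store URLs are placeholders too.

```python
import pandas as pd

def track_stores(stores, fetch_prices):
    """Collect price rows from each store; fetch_prices(url) returns a dataframe."""
    frames = []
    for url in stores:
        prices = fetch_prices(url).copy()
        prices["store"] = url  # record which store each row came from
        frames.append(prices)
    return pd.concat(frames, ignore_index=True)

# Hypothetical stub standing in for the ShopifyScraper calls
def fetch_prices(url):
    return pd.DataFrame({"id": [1], "price": [4.99]})

all_prices = track_stores(
    ["https://store-a.example", "https://store-b.example"], fetch_prices)
print(len(all_prices))  # one row per store in this toy example
```

In a real run you’d swap the stub for the actual scraper calls and write the combined dataframe to the prices table as before, with the extra store column telling you where each row came from.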

Matt Clarke, Saturday, October 01, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.