In ecommerce, it’s very common for retailers to need to monitor the prices of their competitors. Prices make a big difference to sales, and if they’re set too high, customers will go elsewhere, so monitoring them is crucial for ensuring the sales keep coming in.
In this post, we’ll create a Shopify price tracker with Python. We’ll scrape a Shopify ecommerce store, extract the product prices, and store them on a daily basis in a SQLite database. We can then check the database to see if the prices have changed.
To save the hassle of writing a Shopify scraper from scratch, as I showed in my previous tutorial, we’ll instead use the ShopifyScraper package I created from the same code. This makes it possible to scrape a Shopify store in just a few lines of code.
To install my ShopifyScraper package, run the following command in your terminal. This will install the latest version of the package from my GitHub repository using the Pip package management system.
pip3 install git+https://github.com/practical-data-science/ShopifyScraper.git
Next, you’ll want to open PyCharm or a similar Python IDE, since this project is best completed in a script rather than a Jupyter notebook. Create a script called price_tracker.py. At the top of the script, import the packages we’ll be using. You’ll likely have Pandas installed already, and datetime and sqlite3 are part of the standard library, so you won’t need to install them.
import datetime
import pandas as pd
from shopify_scraper import scraper
import sqlite3
Next, we’ll define the URL of the Shopify ecommerce site we want to scrape and then call the two main functions in ShopifyScraper. The get_products() function extracts all the product page data from the site, while the get_variants() function goes through this data and extracts the product variants. Both the products and variants are returned as Pandas dataframes.
url = "https://sunrayflyfish.com"
parents = scraper.get_products(url)
children = scraper.get_variants(parents)
Since we only want to store a subset of columns for our price tracking application, we’ll create a new dataframe with just the columns we want, plus a column recording the date and time the data was scraped. You could, of course, make this more efficient by storing only id, price, and date_checked, and then using merge() to join them back to the source data stored in parents or children, but for this simple example we’ll store some extra data.
prices = children[['id', 'sku', 'parent_title', 'title', 'price', 'compare_at_price']].sort_values(by=['parent_title', 'price'], ascending=True)
prices['date_checked'] = datetime.datetime.now()
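If you wanted the leaner approach mentioned above, the merge() join could look something like this sketch. Note that the children dataframe here is a small dummy stand-in with the same column names as the scraped data, so the example is self-contained:

```python
import datetime
import pandas as pd

# Dummy stand-in for the scraped variant data, using the same column names
children = pd.DataFrame({
    'id': [1, 2],
    'sku': ['A', 'B'],
    'parent_title': ['P', 'P'],
    'title': ['T1', 'T2'],
    'price': [4.99, 9.99],
    'compare_at_price': [13.99, 19.99],
})

# Store only the minimal columns needed for tracking
prices = children[['id', 'price']].copy()
prices['date_checked'] = datetime.datetime.now()

# Join back to the source data to recover the descriptive columns
enriched = prices.merge(
    children[['id', 'sku', 'parent_title', 'title']],
    on='id',
    how='left',
)
```

This keeps the stored table small while still letting you reconstruct the full product details when you need them.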
Here’s what that data looks like.
prices.head()
| | id | sku | parent_title | title | price | compare_at_price | date_checked |
|---|---|---|---|---|---|---|---|
| 60 | 39367977009217 | CLR-FLUORO- 2LB-SR | 100% Fluorocarbon Tippet | 2lb 0.15mm | 4.99 | 13.99 | 2022-10-01 15:22:35.579469 |
| 61 | 39367977041985 | CLR-FLUORO- 4LB-SR | 100% Fluorocarbon Tippet | 4lb 0.18mm | 4.99 | 13.99 | 2022-10-01 15:22:35.579469 |
| 62 | 39367977074753 | CLR-FLUORO- 6LB-SR | 100% Fluorocarbon Tippet | 6lb 0.20mm | 4.99 | 13.99 | 2022-10-01 15:22:35.579469 |
| 63 | 39367977107521 | CLR-FLUORO- 8LB-SR | 100% Fluorocarbon Tippet | 8lb 0.24mm | 4.99 | 13.99 | 2022-10-01 15:22:35.579469 |
| 64 | 39367978352705 | CLR-FLUORO- 12LB-SR | 100% Fluorocarbon Tippet | 12lb 0.28m | 4.99 | 13.99 | 2022-10-01 15:22:35.579469 |
Next, we’ll create a SQLite database and store the price data in it. We’ll name the database price_tracker.db and create a connection to it. We’ll then use the to_sql() function to take our prices dataframe above and store it in a new table called prices, using the conn connection we just created. By setting if_exists='append', each run adds new rows rather than overwriting old ones, so we end up with a historical record of prices.
conn = sqlite3.connect('price_tracker.db')
prices.to_sql('prices', conn, if_exists='append', index=False)
conn.close()
Finally, we’ll create a cron job to run the price_tracker.py script daily. This will allow us to track the prices of products on the site over time. Cron is the easiest way to create simple scheduled tasks like this on Linux workstations or servers, but it’s worth checking out the excellent Apache Airflow if you want something more robust and feature-rich.
You need to configure your cron job to run the script once every day. Each time it runs, it will fetch the products, variants, and prices and then store the prices in the SQLite database.
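A minimal crontab entry might look like the sketch below. The interpreter and script paths are examples, so adjust them to match your own environment:

```shell
# Open your crontab for editing with: crontab -e
# Run the tracker every day at 07:00 and append output to a log file
# (the paths below are placeholders for your own setup)
0 7 * * * /usr/bin/python3 /home/user/price_tracker.py >> /home/user/price_tracker.log 2>&1
```

Using an absolute path to both Python and the script avoids issues with cron's minimal environment.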
Now create a separate Python script, or a Jupyter notebook if you prefer. We’ll use this to read the price tracker data from the SQLite database and display it. You’ll need to import Pandas and sqlite3 again and then connect to the price_tracker.db database in which we’ve stored our prices.
To crudely select everything from the prices table, we can use SELECT * FROM prices and the read_sql_query() function to read the data into a dataframe. We’ll then use the head() function to display the first few rows of the data.
import pandas as pd
import sqlite3
conn = sqlite3.connect('price_tracker.db')
stored_prices = pd.read_sql_query("SELECT * FROM prices", conn)
conn.close()
stored_prices.head()
| | id | sku | parent_title | title | price | compare_at_price | date_checked |
|---|---|---|---|---|---|---|---|
| 0 | 39367977009217 | CLR-FLUORO- 2LB-SR | 100% Fluorocarbon Tippet | 2lb 0.15mm | 4.99 | 13.99 | 2022-10-01 15:21:57.240474 |
| 1 | 39367977041985 | CLR-FLUORO- 4LB-SR | 100% Fluorocarbon Tippet | 4lb 0.18mm | 4.99 | 13.99 | 2022-10-01 15:21:57.240474 |
| 2 | 39367977074753 | CLR-FLUORO- 6LB-SR | 100% Fluorocarbon Tippet | 6lb 0.20mm | 4.99 | 13.99 | 2022-10-01 15:21:57.240474 |
| 3 | 39367977107521 | CLR-FLUORO- 8LB-SR | 100% Fluorocarbon Tippet | 8lb 0.24mm | 4.99 | 13.99 | 2022-10-01 15:21:57.240474 |
| 4 | 39367978352705 | CLR-FLUORO- 12LB-SR | 100% Fluorocarbon Tippet | 12lb 0.28m | 4.99 | 13.99 | 2022-10-01 15:21:57.240474 |
To fetch the historical data stored for a single product in the price tracker database, we can write a really simple SQL query and execute it with the read_sql_query() function. We’ll use a WHERE clause to filter the data to the product with the sku of CLR-FLUORO- 2LB-SR and then use the head() function to display the first few rows of the data.
query = """
SELECT * FROM prices
WHERE sku = 'CLR-FLUORO- 2LB-SR'
"""
conn = sqlite3.connect('price_tracker.db')
specific_product = pd.read_sql_query(query, conn)
conn.close()
specific_product.head()
| | id | sku | parent_title | title | price | compare_at_price | date_checked |
|---|---|---|---|---|---|---|---|
| 0 | 39367977009217 | CLR-FLUORO- 2LB-SR | 100% Fluorocarbon Tippet | 2lb 0.15mm | 4.99 | 13.99 | 2022-10-01 15:21:57.240474 |
| 143 | 39367977009217 | CLR-FLUORO- 2LB-SR | 100% Fluorocarbon Tippet | 2lb 0.15mm | 4.99 | 13.99 | 2022-10-01 15:22:35.579469 |
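To actually spot products whose price has changed, one simple option is to group the stored history by SKU and count the distinct prices on record. The sketch below uses an in-memory SQLite database and dummy rows with the same schema as our prices table, so it runs standalone:

```python
import sqlite3
import pandas as pd

# Build a small in-memory example with the same schema as the prices table
conn = sqlite3.connect(':memory:')
history = pd.DataFrame({
    'sku': ['A', 'A', 'B', 'B'],
    'price': [4.99, 5.99, 9.99, 9.99],
    'date_checked': ['2022-10-01', '2022-10-02', '2022-10-01', '2022-10-02'],
})
history.to_sql('prices', conn, index=False)

stored = pd.read_sql_query("SELECT * FROM prices", conn)
conn.close()

# SKUs with more than one distinct price have changed at some point
distinct_prices = stored.groupby('sku')['price'].nunique()
changed_skus = distinct_prices[distinct_prices > 1].index.tolist()
```

Pointing the same logic at price_tracker.db would give you a daily list of SKUs worth investigating.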
Obviously, this is a really basic example. However, it shows how little code is required to get the job done when working with a Shopify ecommerce site, since the ShopifyScraper package does all the heavy lifting for us.
With some very minor tweaks, this could be configured to loop through a list of Shopify stores, scrape each one, and store the data in the database. This would allow you to track the prices of products across multiple stores and then compare them to see whether you need to adjust your own prices to increase sales.
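The multi-store loop could be sketched like this. The fetch_prices() function below is a placeholder standing in for the real scraper.get_products() and scraper.get_variants() calls (which need network access), and the second store URL is hypothetical:

```python
import datetime
import pandas as pd

# The first URL is from the tutorial; the second is a hypothetical example
STORES = ['https://sunrayflyfish.com', 'https://example-store.com']

def fetch_prices(url):
    """Placeholder for the real ShopifyScraper calls:
    parents = scraper.get_products(url)
    children = scraper.get_variants(parents)
    Returns a dummy dataframe here so the sketch runs offline."""
    return pd.DataFrame({'id': [1], 'sku': ['X'], 'price': [4.99]})

frames = []
for url in STORES:
    prices = fetch_prices(url)
    prices['store'] = url  # record which store each row came from
    prices['date_checked'] = datetime.datetime.now()
    frames.append(prices)

# One combined dataframe, ready to append to the prices table
all_prices = pd.concat(frames, ignore_index=True)
```

Adding a store column is the key tweak: it lets you filter and compare prices per store once the combined data is in SQLite.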
Matt Clarke, Saturday, October 01, 2022