How to use CASE for flow control in SQL statements

Picture by Daria Shevtsova, Pexels.

11 minutes to read

Data Science SQL

The SQL CASE statement is used for flow control, much like an if, then, else statement. If the statement finds a match on the chosen condition it will return the defined value, otherwise it will continue through the other options until it finds one that matches, or it will fail gracefully and trigger the final ELSE clause.

The CASE statement is one of the most useful and powerful features in MySQL, and other SQL dialects. Not only does it allow you to create new column labels or values based on data, but it also allows you to perform calculations based on the presence of values in one or more columns. It’s pretty easy to pick up and is something you’ll soon be using every day. Here’s how it works.

How the `CASE` statement works

In the simple example below, we’re querying the sku, and unit_price for some products, and we’re using the CASE statement to assign a label to a new column in our result set, based on the value of the unit_price.

SELECT 
    sku, 
    unit_price,
    CASE
        WHEN unit_price < 5 THEN 'Low'
        WHEN unit_price >= 5 AND unit_price <= 10 THEN 'Medium'
        ELSE 'High'
    END AS price_range
FROM order_items

We place the CASE statement in the list of columns we want to select, then we create a series of conditions that will be checked in order. First, we’ll check to see if the unit_price is less than 5, and if it is we use THEN to return the value Low. If the price is greater than 5, execution will continue to the next step.

In the next step, we check if the unit_price is greater than or equal to five, or less than or equal to 10. If it is, we’ll return the value Medium. If the condition fails, execution will continue. Since we know the price must be greater than 10, we can use ELSE to exit the flow control section and return the label High. Finally, we use END AS to assign the value to a column called price_range.

It’s pretty easy really, but let’s look at some other examples, using an actual database.

Setting up your MySQL server

First, you’ll need to create a MySQL database. I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset. I’ve created a couple of step-by-step guides that you can follow to create your own MySQL Docker container and import a MySQL database, so you can follow along on the very same data. It only takes a few minutes.

Install and load the packages

Once you’ve got your MySQL server setup, open up a Jupyter notebook and install the SQLAlchemy and PyMySQL Python packages. We’ll be using these to connect to MySQL and pass in database queries using Pandas. Once you’ve installed the packages using the Pip package manager, import pandas and the create_engine module from sqlalchemy.

!pip3 install sqlalchemy
!pip3 install pymysql

import pandas as pd
from sqlalchemy import create_engine

Create a database connection

Next, we need to configure the connection to our MySQL server by creating a connection string to pass to the create_engine() function. This will vary according to your own setup. I’m connecting to a MySQL server with the PyMySQL driver, using the username root and the password ***** on port 3306 of the IP address 172.17.0.3. The final argument tells create_engine() to connect to the ecommerce database.

engine = create_engine('mysql+pymysql://root:*****@172.17.0.2:3306/ecommerce')

SELECT FROM an SQL table

To test the connection to the MySQL server we’ll first write a simple MySQL SELECT statement to return everything from the order_items table.

df = pd.read_sql_query("SELECT * FROM order_items", con=engine)
df.head()

	invoice	sku	description	qty	order_date	unit_price	customer_id	country	line_price
0	536365	85123A	WHITE HANGING HEART T-LIGHT HOLDER	6	2010-12-01 08:26:00	2.55	17850.0	United Kingdom	15.30
1	536365	71053	WHITE METAL LANTERN	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom	20.34
2	536365	84406B	CREAM CUPID HEARTS COAT HANGER	8	2010-12-01 08:26:00	2.75	17850.0	United Kingdom	22.00
3	536365	84029G	KNITTED UNION FLAG HOT WATER BOTTLE	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom	20.34
4	536365	84029E	RED WOOLLY HOTTIE WHITE HEART.	6	2010-12-01 08:26:00	3.39	17850.0	United Kingdom	20.34

Create a CASE statement in MySQL

The simplest way to use the CASE statement is to construct a flow control section that works rather like a switch statement or if elseif else statement in order to form a new column value. For example, let’s say we wanted to perform some simple binning on our unit_price data and assign a label of Low, Medium, or High, depending on the unit_price value.

To do this, we’d create a CASE statement within our SQL SELECT with WHEN rules. If the data match the WHEN rule, we’d then apply a label using THEN. If the data doesn’t match the WHEN rule, execution continues up to the ELSE at the end of the CASE statement. We then use END AS to stop the CASE statement and assign the output to our new column, called price_range in this simple example.

query = """
SELECT 
    sku, 
    description, 
    unit_price,
    CASE
        WHEN unit_price < 5 THEN 'Low'
        WHEN unit_price >= 5 AND unit_price <= 10 THEN 'Medium'
        ELSE 'High'
    END AS price_range
FROM order_items
"""

df = pd.read_sql_query(query, con=engine)

df.sample(5)

	sku	description	unit_price	price_range
240242	22759	SET OF 3 NOTEBOOKS IN PARCEL	1.65	Low
203110	21033	JUMBO BAG CHARLIE AND LOLA TOYS	4.13	Low
78742	22482	BLUE TEA TOWEL CLASSIC DESIGN	1.25	Low
418823	21818	GLITTER HEART DECORATION	0.39	Low
152474	22570	FELTCRAFT CUSHION RABBIT	3.75	Low

One-hot encode data using `CASE`

You can also use CASE to one-hot encode data and record a binary value of 1 or 0 depending on whether the data in a given column or columns matches your rules. This can be quite a useful technique if you want to perform calculations on categorical variables stored within your database. Granted, it’s easier to do in Pandas, but sometimes you’ll need to have the data in SQL.

query = """
SELECT 
    invoice, 
    description,
    CASE
        WHEN description LIKE '%WATER BOTTLE%' THEN 1 
        ELSE 0
    END AS contains_water_bottle
FROM order_items
"""

df = pd.read_sql_query(query, con=engine)
df.head()

	invoice	description	contains_water_bottle
0	536365	WHITE HANGING HEART T-LIGHT HOLDER	0
1	536365	WHITE METAL LANTERN	0
2	536365	CREAM CUPID HEARTS COAT HANGER	0
3	536365	KNITTED UNION FLAG HOT WATER BOTTLE	1
4	536365	RED WOOLLY HOTTIE WHITE HEART.	0

Create aggregate calculations using CASE with GROUP BY

You can also use the CASE statement to perform calculations on aggregated data. For example, let’s take the simple example above where we one-hot encoded order lines to include a 1 if they contained the phrase “WATER BOTTLE”. If we wrap the calculation in a SUM() we can aggregate the data and return the total number of order lines that matched the criteria.

query = """
SELECT 
    COUNT(invoice) AS order_lines,
    COUNT(DISTINCT(invoice)) AS orders,
    SUM(CASE WHEN description LIKE '%WATER BOTTLE%' THEN 1 ELSE 0 END) AS water_bottles
FROM order_items
"""

df = pd.read_sql_query(query, con=engine)
df.head()

	order_lines	orders	water_bottles
0	541909	25900	9505.0

You can create much more complex CASE statements incorporating a whole range of different criteria, based on one or more columns, and either assign labels or values to fields, or perform calculations on the data. This makes CASE one of the most useful and powerful flow control features in SQL.

Matt Clarke, Wednesday, April 20, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.