CASE statement is used for flow control, much like an if, then, else statement. If the statement finds a match on the chosen condition it will return the defined value, otherwise it will continue through the other options until it finds one that matches, or it will fail gracefully and trigger the final
CASE statement is one of the most useful and powerful features in MySQL, and other SQL dialects. Not only does it allow you to create new column labels or values based on data, but it also allows you to perform calculations based on the presence of values in one or more columns. It’s pretty easy to pick up and is something you’ll soon be using every day. Here’s how it works.
In the simple example below, we’re querying the
unit_price for some products, and we’re using the
CASE statement to assign a label to a new column in our result set, based on the value of the
SELECT sku, unit_price, CASE WHEN unit_price < 5 THEN 'Low' WHEN unit_price >= 5 AND unit_price <= 10 THEN 'Medium' ELSE 'High' END AS price_range FROM order_items
We place the
CASE statement in the list of columns we want to select, then we create a series of conditions that will be checked in order. First, we’ll check to see if the
unit_price is less than 5, and if it is we use
THEN to return the value
Low. If the price is greater than 5, execution will continue to the next step.
In the next step, we check if the
unit_price is greater than or equal to five, or less than or equal to 10. If it is, we’ll return the value
Medium. If the condition fails, execution will continue. Since we know the price must be greater than 10, we can use
ELSE to exit the flow control section and return the label
High. Finally, we use
END AS to assign the value to a column called
It’s pretty easy really, but let’s look at some other examples, using an actual database.
First, you’ll need to create a MySQL database. I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset. I’ve created a couple of step-by-step guides that you can follow to create your own MySQL Docker container and import a MySQL database, so you can follow along on the very same data. It only takes a few minutes.
Once you’ve got your MySQL server setup, open up a Jupyter notebook and install the SQLAlchemy and PyMySQL Python packages. We’ll be using these to connect to MySQL and pass in database queries using Pandas. Once you’ve installed the packages using the Pip package manager, import
pandas and the
create_engine module from
!pip3 install sqlalchemy !pip3 install pymysql
import pandas as pd from sqlalchemy import create_engine
Next, we need to configure the connection to our MySQL server by creating a connection string to pass to the
create_engine() function. This will vary according to your own setup. I’m connecting to a MySQL server with the PyMySQL driver, using the username root and the password
***** on port 3306 of the IP address 172.17.0.3. The final argument tells
create_engine() to connect to the
engine = create_engine('mysql+pymysql://root:*****@172.17.0.2:3306/ecommerce')
To test the connection to the MySQL server we’ll first write a simple MySQL
SELECT statement to return everything from the
df = pd.read_sql_query("SELECT * FROM order_items", con=engine) df.head()
|0||536365||85123A||WHITE HANGING HEART T-LIGHT HOLDER||6||2010-12-01 08:26:00||2.55||17850.0||United Kingdom||15.30|
|1||536365||71053||WHITE METAL LANTERN||6||2010-12-01 08:26:00||3.39||17850.0||United Kingdom||20.34|
|2||536365||84406B||CREAM CUPID HEARTS COAT HANGER||8||2010-12-01 08:26:00||2.75||17850.0||United Kingdom||22.00|
|3||536365||84029G||KNITTED UNION FLAG HOT WATER BOTTLE||6||2010-12-01 08:26:00||3.39||17850.0||United Kingdom||20.34|
|4||536365||84029E||RED WOOLLY HOTTIE WHITE HEART.||6||2010-12-01 08:26:00||3.39||17850.0||United Kingdom||20.34|
The simplest way to use the
CASE statement is to construct a flow control section that works rather like a
switch statement or
if elseif else statement in order to form a new column value. For example, let’s say we wanted to perform some simple binning on our
unit_price data and assign a label of
High, depending on the
To do this, we’d create a
CASE statement within our SQL
WHEN rules. If the data match the
WHEN rule, we’d then apply a label using
THEN. If the data doesn’t match the
WHEN rule, execution continues up to the
ELSE at the end of the
CASE statement. We then use
END AS to stop the
CASE statement and assign the output to our new column, called
price_range in this simple example.
query = """ SELECT sku, description, unit_price, CASE WHEN unit_price < 5 THEN 'Low' WHEN unit_price >= 5 AND unit_price <= 10 THEN 'Medium' ELSE 'High' END AS price_range FROM order_items """
df = pd.read_sql_query(query, con=engine)
|240242||22759||SET OF 3 NOTEBOOKS IN PARCEL||1.65||Low|
|203110||21033||JUMBO BAG CHARLIE AND LOLA TOYS||4.13||Low|
|78742||22482||BLUE TEA TOWEL CLASSIC DESIGN||1.25||Low|
|418823||21818||GLITTER HEART DECORATION||0.39||Low|
|152474||22570||FELTCRAFT CUSHION RABBIT||3.75||Low|
You can also use
CASE to one-hot encode data and record a binary value of 1 or 0 depending on whether the data in a given column or columns matches your rules. This can be quite a useful technique if you want to perform calculations on categorical variables stored within your database. Granted, it’s easier to do in Pandas, but sometimes you’ll need to have the data in SQL.
query = """ SELECT invoice, description, CASE WHEN description LIKE '%WATER BOTTLE%' THEN 1 ELSE 0 END AS contains_water_bottle FROM order_items """
df = pd.read_sql_query(query, con=engine) df.head()
|0||536365||WHITE HANGING HEART T-LIGHT HOLDER||0|
|1||536365||WHITE METAL LANTERN||0|
|2||536365||CREAM CUPID HEARTS COAT HANGER||0|
|3||536365||KNITTED UNION FLAG HOT WATER BOTTLE||1|
|4||536365||RED WOOLLY HOTTIE WHITE HEART.||0|
You can also use the
CASE statement to perform calculations on aggregated data. For example, let’s take the simple example above where we one-hot encoded order lines to include a 1 if they contained the phrase “WATER BOTTLE”. If we wrap the calculation in a
SUM() we can aggregate the data and return the total number of order lines that matched the criteria.
query = """ SELECT COUNT(invoice) AS order_lines, COUNT(DISTINCT(invoice)) AS orders, SUM(CASE WHEN description LIKE '%WATER BOTTLE%' THEN 1 ELSE 0 END) AS water_bottles FROM order_items """
df = pd.read_sql_query(query, con=engine) df.head()
You can create much more complex
CASE statements incorporating a whole range of different criteria, based on one or more columns, and either assign labels or values to fields, or perform calculations on the data. This makes
CASE one of the most useful and powerful flow control features in SQL.
Matt Clarke, Wednesday, April 20, 2022