How to use BETWEEN in SQL statements to return data between two values

Selecting data between two values or dates is easy when you use create a BETWEEN AND expression in SQL. Here's how to do it.

How to use BETWEEN in SQL statements to return data between two values
Picture by Anna Nekrashevich, Pexels.
12 minutes to read

In SQL, when you want to SELECT data that lies between two values, there are a number of different SQL operators you can use to return the correct data. However, the BETWEEN operator is arguably the most logical one to read, which can make queries easier for data scientists to logically parse when reviewing code at a later date.

In this simple tutorial I’ll show you how you can use the BETWEEN opeator with AND to create an expression that returns values and dates that fall between a minimum and a maximum value. Once you know how to use it, you’ll likely find yourself using this SQL expression regularly, especially when selecting data that falls within a given date range.

Fire up your MySQL server

For this example we’ll be accessing a MySQL Docker container running the Online Retail Dataset database, but you can use any database to which you have access. It only takes a few minutes to spin up a MySQL Docker container and import an SQL database, so if you want to follow along, I’d recommend trying this approach.

Install and load the packages

You can write your SQL SELECT statements directly into the terminal or into a MySQL client, such as Beekeeper Studio, if you wish. However, to make it easier for you to follow I’m writing the code in Pandas. To do this you’ll first need to install the sqlalchemy and pmysql Python packages used to query a SQL database using Pandas. Then you’ll need to import pandas and the create_engine module from sqlalchemy.

!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine

Create a database connection

Next, we need to create a connection to our SQL server. I’m connecting to the ecommerce database on my MySQL Docker container which is located at IP address 172.17.0.3 on port 3306. I’m connecting using the username root and my password, which I’m hidden with asterisks.

engine = create_engine('mysql+pymysql://root:********@172.17.0.3:3306/ecommerce')

SELECT * FROM an SQL table

To test your connection and take a look at the data, let’s first write a simple SQL statement to SELECT all of the columns and rows from the order_items table. Since the Pandas read_sql_query() passes the SQL statement to the SQL server and returns a Pandas dataframe, we can run the statement and use head() to print the first five rows. As you can see, the order_date column contains the date each order was placed.

df = pd.read_sql_query("SELECT * FROM order_items", con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34

We can take a quick look at the data in the order_date column by using the Pandas min() and max() functions to determine the first and last order dates in the dataset.

df['order_date'].min()
Timestamp('2010-12-01 08:26:00')
df['order_date'].max()
Timestamp('2011-12-09 12:50:00')

SELECT data for a specific date

Next, we’ll select the data for a specific date. Since the data in order_date are expressed as datetime, if you attempt to select only those rows where the order_date is equal to a date, i.e. 2010-12-01, you’ll get no data back.

query = """
SELECT * FROM order_items 
WHERE order_date = '2010-12-01'
"""
df = pd.read_sql(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price

Instead, you’ll likely want to use LIKE and query part of the datetime value. The query below will return any order_date values that start with our chosen date. If you apply the min() and max() functions to the values in the dataframe you can sanity check that your query has returned valid results.

query = """
SELECT * FROM order_items 
WHERE order_date LIKE '2010-12-01 %'
"""
df = pd.read_sql(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
df['order_date'].min()
Timestamp('2010-12-01 08:26:00')
df['order_date'].max()
Timestamp('2010-12-01 17:35:00')

SELECT items between two dates using BETWEEN

To select all data that falls between two dates, or two values, you can use a BETWEENAND expression. Unlike the queries above, BETWEEN will work with datetime values, so there’s no need for a LIKE here. Instead, we simply define the column we want to query, i.e. order_date and then use BETWEEN and AND to separate the minimum value and the maximum value.

query = """
SELECT * FROM order_items 
WHERE order_date BETWEEN '2010-12-01' AND '2010-12-03' 
"""
df = pd.read_sql(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34

The important caveat with BETWEEN is that it is not functionally equivalent to the >= AND <= operator expression. For example, if you print the min() and max() values returned by the SQL statement above, you’ll notice that it returns values greater than or equal to the mininum, i.e. >=, but that are less than i.e. < the maximum value.

This means our WHERE order_date BETWEEN '2010-12-01' AND '2010-12-03' expression will return orders from 2010-12-01 to 2010-12-02, not the 2020-12-03 shown in the query. This is a common mistake when writing SQL statements that select date ranges or other values and can result in inaccurate results.

df['order_date'].min()
Timestamp('2010-12-01 08:26:00')
df['order_date'].max()
Timestamp('2010-12-02 19:59:00')

SELECT items between two values using operators

Here’s an alternative way to select the same data shown above using regular SQL operators. Whenever you’re selecting data between ranges, I’d highly recommend that you sanity check the results using min() and max() to double-check that the query is returning the data you expect.

query = """
SELECT * FROM order_items 
WHERE order_date >= '2010-12-01' 
AND order_date < '2010-12-03' 
"""
df = pd.read_sql(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
df['order_date'].min()
Timestamp('2010-12-01 08:26:00')
df['order_date'].max()
Timestamp('2010-12-02 19:59:00')

Matt Clarke, Wednesday, April 13, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.