How to use SELECT, FROM, WHERE, and AND in SQL statements

Learn how to query a MySQL database using SELECT, FROM, WHERE, and AND in simple SQL statements.

How to use SELECT, FROM, WHERE, and AND in SQL statements
Picture by Brett Sayles, Pexels.
18 minutes to read

The SELECT statement is the most simple of all SQL queries and allows you to retrieve the precise data you want from one or more tables, or even databases. In this simple tutorial I’ll explain how you can use SELECT with the FROM, WHERE and AND clauses to fetch a wide range of specific column values from the database tables in a MySQL database.

Fire up your database server

Before you can start, you’ll need to have access to a MySQL database that you can query. To make this easier to follow along, I’ve created a MySQL database dump file of the popular Online Retail Dataset that you can import into a Docker container.

To get started, you’ll first need to create a MySQL Docker container and then import the SQL database dump to create a new database. Although it sounds scary, the whole process should only take a few minutes and I’ve provided the code you need for every step.

Install and load the packages

To query the MySQL database on your Docker container we’ll need to install a couple of Python packages. To query a database with Pandas you’ll need an SQL driver - a bit of extra code that connects to the database and passes your SQL queries. We’ll be using sqlalchemy and pymysql, so import the packages into your Jupyter environment using the commands below.

!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine

Create a database connection

Next we need to configure our connection to the MySQL Docker container. This is covered in the tutorial on creating the database itself, but basically involves connecting to the IP address of the server, usually on port 3306, and then defining the ecommerce database to which you want to connect, along with your username and password.

engine = create_engine('mysql+pymysql://root:**********@172.17.0.3:3306/ecommerce')

SELECT * FROM a SQL table

Now you hopefully have a working database connection, we’ll write the most simple of all SQL queries - the SELECT statement. The statement SELECT * FROM customers uses the asterisk wildcard to to MySQL to return all columns in the customers table.

To execute the SQL statement or query, we’ll pass it to the Pandas read_sql_query() function along with the engine connection we created above. Pandas will use SQLAlchemy and PyMSQL to connect to the database, run the query, and return the results in a Pandas dataframe that we’re assigning to the variable df.

Since that’s just a regular Pandas dataframe, we can manipulate it in the same way we would an other dataframe, for example, by using head() to print the first five rows. The ecommerce database contains four database tables called customers, orders, order_items, and products. You can select the data from these tables simply by changing the database table parameter after FROM.

df = pd.read_sql_query("SELECT * FROM customers", con=engine)
df.head()
customer_id revenue orders skus units first_order_date last_order_date
0 14646.0 279489.02 77 703 196719 2010-12-20 10:09:00 2011-12-08 12:12:00
1 18102.0 256438.49 62 151 64122 2010-12-07 16:42:00 2011-12-09 11:50:00
2 17450.0 187482.17 55 127 69029 2010-12-07 09:23:00 2011-12-01 13:29:00
3 14911.0 132572.62 248 1794 77180 2010-12-01 14:05:00 2011-12-08 15:54:00
4 12415.0 123725.45 26 444 77242 2011-01-06 11:12:00 2011-11-15 14:22:00
df = pd.read_sql_query("SELECT * FROM orders", con=engine)
df.head()
invoice revenue skus items
0 536365 139.12 7 40
1 536366 22.20 2 12
2 536367 278.73 12 83
3 536368 70.05 4 15
4 536369 17.85 1 3
df = pd.read_sql_query("SELECT * FROM order_items", con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
df = pd.read_sql_query("SELECT * FROM products", con=engine)
df.head()
sku units avg_units revenue orders customers
0 10002 1037 14.205479 759.89 73 40
1 10080 495 20.625000 119.09 24 19
2 10120 193 6.433333 40.53 29 25
3 10123C -13 -3.250000 3.25 4 3
4 10123G -38 -38.000000 0.00 1 0

SELECT specific columns FROM an SQL table

The asterisk operator returns all columns from a database table, but you’ll often want to select only specific columns be returned by your query. To do this, you just pass a comma separated value list of column names instead of *. The order of the columns dictates the order in which they’ll appear in the SQL result set or dataframe.

df = pd.read_sql_query("SELECT customer_id, revenue, orders FROM customers", con=engine)
df.head()
customer_id revenue orders
0 14646.0 279489.02 77
1 18102.0 256438.49 62
2 17450.0 187482.17 55
3 14911.0 132572.62 248
4 12415.0 123725.45 26

Using WHERE in an SQL statement

The very simple SQL statement we wrote above doesn’t include any clauses to tell MySQL to filter the results. To select only specific rows from the database you need to use the WHERE clause. This takes a column name, i.e. invoice, and a column value to find, i.e. 536365. Running the statement SELECT * FROM order_items WHERE invoice = 536365 will, therefore, return all the order_items columns from invoice 536365.

df = pd.read_sql_query("SELECT * FROM order_items WHERE invoice = 536365", con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34

Using multiple operators in a WHERE clause using AND

If you want to create a more complex database query, you’ll often need to combine multiple clauses using AND. The below query will select all columns from the order_items table WHERE the country is United Kingdom and the order_date is greater than 2010-10-01 and the qty of units was greater than 10.

query = """
SELECT * 
FROM order_items 
WHERE country = 'United Kingdom'
AND order_date > '2010-10-01'
AND qty > 10
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536367 84879 ASSORTED COLOUR BIRD ORNAMENT 32 2010-12-01 08:34:00 1.69 13047.0 United Kingdom 54.08
1 536371 22086 PAPER CHAIN KIT 50'S CHRISTMAS 80 2010-12-01 09:00:00 2.55 13748.0 United Kingdom 204.00
2 536374 21258 VICTORIAN SEWING BOX LARGE 32 2010-12-01 09:09:00 10.95 15100.0 United Kingdom 350.40
3 536376 22114 HOT WATER BOTTLE TEA AND SYMPATHY 48 2010-12-01 09:32:00 3.45 15291.0 United Kingdom 165.60
4 536376 21733 RED HANGING HEART T-LIGHT HOLDER 64 2010-12-01 09:32:00 2.55 15291.0 United Kingdom 163.20

Using LIKE to return values containing a string

Sometimes, you might want to perform a fuzzy match and return only items that match a partial string. For example, we might want to look at all the products in the order_items table that contain the word white in their product name held in the description column. To do this we can use LIKE along with the % symbol. By putting a percentage symbol either side of the value we’ll return any products that contain the word white.

query = """
SELECT * 
FROM order_items 
WHERE description LIKE '%white%'
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
3 536373 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 09:02:00 2.55 17850.0 United Kingdom 15.30
4 536373 71053 WHITE METAL LANTERN 6 2010-12-01 09:02:00 3.39 17850.0 United Kingdom 20.34

Using LIKE to return values starting with a string

Wrapping two percentage symbols around a string will return all values that contain the string. However, sometimes you might only want to return values that start with a string. To do this, you simply place the percentage symbol at the end.

query = """
SELECT * 
FROM order_items 
WHERE description LIKE 'white%'
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536373 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 09:02:00 2.55 17850.0 United Kingdom 15.30
3 536373 71053 WHITE METAL LANTERN 6 2010-12-01 09:02:00 3.39 17850.0 United Kingdom 20.34
4 536375 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 09:32:00 2.55 17850.0 United Kingdom 15.30

Using LIKE to return values ending with a string

Similarly, to select values that end with a value, you can use LIKE and place the % symbol at the start of the string. This will return only those items that end with the provided value.

query = """
SELECT * 
FROM order_items 
WHERE description LIKE '%white'
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536378 85099C JUMBO BAG BAROQUE BLACK WHITE 10 2010-12-01 09:37:00 1.95 14688.0 United Kingdom 19.50
1 536386 85099C JUMBO BAG BAROQUE BLACK WHITE 100 2010-12-01 09:57:00 1.65 16029.0 United Kingdom 165.00
2 536464 85099C JUMBO BAG BAROQUE BLACK WHITE 1 2010-12-01 12:23:00 1.95 17968.0 United Kingdom 1.95
3 536488 22376 AIRLINE BAG VINTAGE JET SET WHITE 1 2010-12-01 12:31:00 4.25 17897.0 United Kingdom 4.25
4 536529 22222 CAKE PLATE LOVEBIRD WHITE 3 2010-12-01 13:20:00 4.95 14237.0 United Kingdom 14.85

Using NOT LIKE to return values that don’t match a string

To obtain items that do not contain a given value you can use NOT LIKE. Here’s a simple example that returns all products that don’t contain the word white in their product name. You can, of course, use the above approach and return items that start with a string, or end with a string, simply by using the percentage symbol at either end, rather than both ends.

query = """
SELECT * 
FROM order_items 
WHERE description NOT LIKE '%white%'
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
1 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 22752 SET 7 BABUSHKA NESTING BOXES 2 2010-12-01 08:26:00 7.65 17850.0 United Kingdom 15.30
3 536365 21730 GLASS STAR FROSTED T-LIGHT HOLDER 6 2010-12-01 08:26:00 4.25 17850.0 United Kingdom 25.50
4 536366 22633 HAND WARMER UNION JACK 6 2010-12-01 08:28:00 1.85 17850.0 United Kingdom 11.10

Matt Clarke, Tuesday, April 12, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.