When you create an SQL SELECT
statement, the data probably won’t be returned or sorted in the order you want, so the ORDER BY
clause is used to control the sort order of the results returned by the query.
The ORDER BY
clause is really easy to use and can allow you to sort an SQL resultset in ascending or descending order, using one or more columns. Here’s a quick guide to using it.
I’m connecting to a MySQL Docker container into which I’ve imported a MySQL SQL database dump based on the Online Retail Dataset. We’ll be using SQLAlchemy and PyMYSQL to connect to the database using Pandas.
Since we’re querying MySQL from within a Jupyter notebook, as many data scientists do in their daily work, you’ll first need to open Jupyter and install the required pymysql
and sqlalchemy
packages using the Pip Python package manager. Then, you’ll need to import pandas
and the create_engine
module from sqlalchemy
.
!pip3 install pymysql
!pip3 install sqlalchemy
import pandas as pd
from sqlalchemy import create_engine
Next we’ll construct a database connection string to pass to create_engine()
. This tells SQLAlchemy we’re connecting to a MySQL database using the PyMYSQL driver, and are using the username root
and the password *****
and are connecting to the MySQL server on port 3306 of IP address 172.17.0.3. The final argument tells create_engine()
to connect to the ecommerce
database.
engine = create_engine('mysql+pymysql://root:*****@172.17.0.3:3306/ecommerce')
First we’ll create a simple unordered query using a regular SQL SELECT
statement. This orders the data according to whatever index is present on the database, which might not match our preferred formatting.
df = pd.read_sql_query("SELECT * FROM order_items", con=engine)
df.head()
invoice | sku | description | qty | order_date | unit_price | customer_id | country | line_price | |
---|---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 15.30 |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 22.00 |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
To sort the SQL resultset we can append the ORDER BY
clause to the end of the SELECT
statement and define the column by which we want to sort and the sort order, which can either by DESC
for descending order, or ASC
for ascending order.
Our query SELECT * FROM order_items ORDER BY qty DESC
will sort the resultset by the qty
column in descending order, so the highest quantities are shown first and the values decrease down the list.
df = pd.read_sql_query("SELECT * FROM order_items ORDER BY qty DESC", con=engine)
df.head()
invoice | sku | description | qty | order_date | unit_price | customer_id | country | line_price | |
---|---|---|---|---|---|---|---|---|---|
0 | 581483 | 23843 | PAPER CRAFT , LITTLE BIRDIE | 80995 | 2011-12-09 09:15:00 | 2.08 | 16446.0 | United Kingdom | 168469.6 |
1 | 541431 | 23166 | MEDIUM CERAMIC TOP STORAGE JAR | 74215 | 2011-01-18 10:01:00 | 1.04 | 12346.0 | United Kingdom | 77183.6 |
2 | 578841 | 84826 | ASSTD DESIGN 3D PAPER STICKERS | 12540 | 2011-11-25 15:57:00 | 0.00 | 13256.0 | United Kingdom | 0.0 |
3 | 542504 | 37413 | None | 5568 | 2011-01-28 12:03:00 | 0.00 | NaN | United Kingdom | 0.0 |
4 | 573008 | 84077 | WORLD WAR 2 GLIDERS ASSTD DESIGNS | 4800 | 2011-10-27 12:26:00 | 0.21 | 12901.0 | United Kingdom | 1008.0 |
To sort items in ascending order, you simply replace the DESC
argument with ASC
. Our query below will sort items from the lowest unit_price
to the highest.
df = pd.read_sql_query("SELECT * FROM order_items ORDER BY unit_price ASC", con=engine)
df.head()
invoice | sku | description | qty | order_date | unit_price | customer_id | country | line_price | |
---|---|---|---|---|---|---|---|---|---|
0 | A563186 | B | Adjust bad debt | 1 | 2011-08-12 14:51:00 | -11062.06 | NaN | United Kingdom | -11062.06 |
1 | A563187 | B | Adjust bad debt | 1 | 2011-08-12 14:52:00 | -11062.06 | NaN | United Kingdom | -11062.06 |
2 | 536414 | 22139 | None | 56 | 2010-12-01 11:52:00 | 0.00 | NaN | United Kingdom | 0.00 |
3 | 536545 | 21134 | None | 1 | 2010-12-01 14:32:00 | 0.00 | NaN | United Kingdom | 0.00 |
4 | 536546 | 22145 | None | 1 | 2010-12-01 14:33:00 | 0.00 | NaN | United Kingdom | 0.00 |
The ORDER BY column_name ASC
statement also works for non-numeric or categorical data, such as strings, allowing you to put them in alphabetical order from A to Z, or vice versa if you use DESC
.
df = pd.read_sql_query("SELECT * FROM order_items ORDER BY description ASC", con=engine)
df.head()
invoice | sku | description | qty | order_date | unit_price | customer_id | country | line_price | |
---|---|---|---|---|---|---|---|---|---|
0 | 536414 | 22139 | None | 56 | 2010-12-01 11:52:00 | 0.0 | NaN | United Kingdom | 0.0 |
1 | 536545 | 21134 | None | 1 | 2010-12-01 14:32:00 | 0.0 | NaN | United Kingdom | 0.0 |
2 | 536546 | 22145 | None | 1 | 2010-12-01 14:33:00 | 0.0 | NaN | United Kingdom | 0.0 |
3 | 536547 | 37509 | None | 1 | 2010-12-01 14:33:00 | 0.0 | NaN | United Kingdom | 0.0 |
4 | 536549 | 85226A | None | 1 | 2010-12-01 14:34:00 | 0.0 | NaN | United Kingdom | 0.0 |
Finally, you can also sort by multiple columns by passing them as a list of comma separated values. The below query will sort the resultset first in ascending order of the description
name (i.e. in alphabetical order) and then in ascending order of the order_date
, so the earliest purchases are first.
query = """
SELECT
* FROM order_items
ORDER BY
description ASC,
order_date ASC
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice | sku | description | qty | order_date | unit_price | customer_id | country | line_price | |
---|---|---|---|---|---|---|---|---|---|
0 | 536414 | 22139 | None | 56 | 2010-12-01 11:52:00 | 0.0 | NaN | United Kingdom | 0.0 |
1 | 536545 | 21134 | None | 1 | 2010-12-01 14:32:00 | 0.0 | NaN | United Kingdom | 0.0 |
2 | 536546 | 22145 | None | 1 | 2010-12-01 14:33:00 | 0.0 | NaN | United Kingdom | 0.0 |
3 | 536547 | 37509 | None | 1 | 2010-12-01 14:33:00 | 0.0 | NaN | United Kingdom | 0.0 |
4 | 536549 | 85226A | None | 1 | 2010-12-01 14:34:00 | 0.0 | NaN | United Kingdom | 0.0 |
Matt Clarke, Thursday, April 14, 2022