How to use string functions in SQL statements

SQL string functions and operators allow you to manipulate strings in SQL statements and are very useful for data analysis. Here's how to use them.

How to use string functions in SQL statements
Picture by Susanne Jutzeler, Pexels.
19 minutes to read

MySQL includes a large number of string functions and operators that you can use in SQL statements to both query data and reformat column values. In this simple example, I’ll show you the main types of MySQL string functions and operators that you can use in SQL queries.

You can, of course, do this on any MySQL database to which you have access, and you can write your SQL statements directly into the MySQL terminal or into a SQL client, such as Beekeeper Studio. However, I’m doing what most data scientists will be doing, and passing my queries to MySQL using Pandas via a Jupyter notebook.

Setting up your MySQL server

I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset. I’ve created a couple of step-by-step guides that you can follow to create your own MySQL Docker container and import a MySQL database so you can follow along on the very same data. It only takes a few minutes.

Install and load the packages

Once you’ve got your MySQL server setup, open up a Jupyter notebook and install the SQLAlchemy and PyMySQL Python packages. We’ll be using these to connect to MySQL and pass in database queries using Pandas. Once you’ve installed the packages using the Pip package manager, import pandas and the create_engine module from sqlalchemy. !pip3 install sqlalchemy !pip3 install pymysql

import pandas as pd
from sqlalchemy import create_engine

Create a database connection

Next, we need to configure the connection to our MySQL server by creating a connection string to pass to the create_engine() function. This will vary according to your own setup. I’m connecting to a MySQL server with the PyMySQL driver, using the username root and the password ***** on port 3306 of the IP address 172.17.0.3. The final argument tells create_engine() to connect to the ecommerce database.

engine = create_engine('mysql+pymysql://root:*****@172.17.0.3:3306/ecommerce')

SELECT FROM an SQL table

To test the connection to the MySQL server we’ll first write a simple MySQL SELECT statement to return everything from the order_items table.

df = pd.read_sql_query("SELECT * FROM order_items", con=engine)
df.head()
invoice sku description qty order_date unit_price customer_id country line_price
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34

Convert a string to lowercase using LOWER()

Now we can move onto looking at the MySQL string functions and operators. The first one we’ll try is called LOWER(), which is a synonym of the LCASE() function. As the name suggests, it converts a string to lowercase. We’ll use AS to define the column name, otherwise MySQL will name the column LOWER(description).

query = """
SELECT 
    LOWER(description) AS description
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
description
0 white hanging heart t-light holder
1 white metal lantern
2 cream cupid hearts coat hanger
3 knitted union flag hot water bottle
4 red woolly hottie white heart.

Convert a string to uppercase using UPPER()

As you might imagine, there’s also a corresponding string function in MySQL called UPPER() which is a synonym of UCASE() that we can use to convert a string to uppercase characters.

query = """
SELECT 
    UPPER(country) AS country
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
country
0 UNITED KINGDOM
1 UNITED KINGDOM
2 UNITED KINGDOM
3 UNITED KINGDOM
4 UNITED KINGDOM

Start a string with a capital letter

Unfortunately, there is no MySQL function to convert a string to title case, where each word starts with a capital letter (rather like the ucwords() function of PHP). However, you can write custom functions to perform this. Converting a sentence to start with an uppercase letter followed by lowercase letters is slightly easier. In the example below, we’re using a mixture of CONCAT(), LEFT(), and SUBSTRING() to do this for us.

query = """
SELECT 
    CONCAT(UPPER(LEFT(description, 1)), 
           LOWER(SUBSTRING(description, 2))
           ) AS description
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
description
0 White hanging heart t-light holder
1 White metal lantern
2 Cream cupid hearts coat hanger
3 Knitted union flag hot water bottle
4 Red woolly hottie white heart.

Count the number of characters in a string

To count the number of characters in a string stored in a MySQL column you can use the CHAR_LENGTH() function. This will count the characters in the word and return the value in the new column.

query = """
SELECT 
    description, 
    CHAR_LENGTH(description) AS characters
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
description characters
0 WHITE HANGING HEART T-LIGHT HOLDER 34.0
1 WHITE METAL LANTERN 19.0
2 CREAM CUPID HEARTS COAT HANGER 30.0
3 KNITTED UNION FLAG HOT WATER BOTTLE 35.0
4 RED WOOLLY HOTTIE WHITE HEART. 30.0

Return the length of a string in bytes using LENGTH()

If you want to find the length of a string in bytes, you can use the MySQL LENGTH() function.

query = """
SELECT 
    description, 
    LENGTH(description) AS bytes
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
description bytes
0 WHITE HANGING HEART T-LIGHT HOLDER 34.0
1 WHITE METAL LANTERN 19.0
2 CREAM CUPID HEARTS COAT HANGER 30.0
3 KNITTED UNION FLAG HOT WATER BOTTLE 35.0
4 RED WOOLLY HOTTIE WHITE HEART. 30.0

Concatenate or join column values with CONCAT()

Another really common thing you’ll want to do in MySQL queries is join words together via a process known as concatenation. The CONCAT() function and the related CONCAT_WS() function, both allow you to concatenate one or more values together. These could be different column values that you want to combine in a single column, such as the firstname and lastname of a person. You can also include a separator with the values, so they don’t form a continuous string, if you wish.

query = """
SELECT 
    sku,
    description,
    CONCAT(sku, ' ', description) AS sku_description
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description sku_description
0 85123A WHITE HANGING HEART T-LIGHT HOLDER 85123A WHITE HANGING HEART T-LIGHT HOLDER
1 71053 WHITE METAL LANTERN 71053 WHITE METAL LANTERN
2 84406B CREAM CUPID HEARTS COAT HANGER 84406B CREAM CUPID HEARTS COAT HANGER
3 84029G KNITTED UNION FLAG HOT WATER BOTTLE 84029G KNITTED UNION FLAG HOT WATER BOTTLE
4 84029E RED WOOLLY HOTTIE WHITE HEART. 84029E RED WOOLLY HOTTIE WHITE HEART.

Replace a value in a string with REPLACE()

The REPLACE() function is used to replace one value with another, using a simple find and replace rule. For example, let’s say you want to replace occurrences of WHITE with White. Note that the REPLACE() function is case sensitive.

query = """
SELECT 
    sku,
    description,
    REPLACE(description, 'WHITE', 'White') AS description2
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description description2
0 85123A WHITE HANGING HEART T-LIGHT HOLDER White HANGING HEART T-LIGHT HOLDER
1 71053 WHITE METAL LANTERN White METAL LANTERN
2 84406B CREAM CUPID HEARTS COAT HANGER CREAM CUPID HEARTS COAT HANGER
3 84029G KNITTED UNION FLAG HOT WATER BOTTLE KNITTED UNION FLAG HOT WATER BOTTLE
4 84029E RED WOOLLY HOTTIE WHITE HEART. RED WOOLLY HOTTIE White HEART.

Perform a case insensitive REPLACE() using REGEX_REPLACE()

Since the default REPLACE() function is case sensitive, it’s often better to use the alternative REGEX_REPLACE() function instead, since this can catch all variations and it’s case insensitive. It has some additional features, but for a simple case insenstive find and replace you can use it just like REPLACE(). Here, we’re looking for the string white (which doesn’t occur, as our data is set to WHITE) and we’re replacing it with XXXXX.

query = """
SELECT 
    sku,
    description,
    REGEXP_REPLACE(description, 'white', 'XXXXX') AS description2
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description description2
0 85123A WHITE HANGING HEART T-LIGHT HOLDER XXXXX HANGING HEART T-LIGHT HOLDER
1 71053 WHITE METAL LANTERN XXXXX METAL LANTERN
2 84406B CREAM CUPID HEARTS COAT HANGER CREAM CUPID HEARTS COAT HANGER
3 84029G KNITTED UNION FLAG HOT WATER BOTTLE KNITTED UNION FLAG HOT WATER BOTTLE
4 84029E RED WOOLLY HOTTIE WHITE HEART. RED WOOLLY HOTTIE XXXXX HEART.

Remove trailing spaces from a string with RTRIM()

Sometimes, values stored in databases aren’t properly sanitised and may contain trailing or leading whitespace. MySQL has a couple of different string functions for handling this kind of data. The RTRIM() function removes any trailing whitespace from the right hand side of a string if it is present.

query = """
SELECT 
    sku,
    RTRIM(description) AS description
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description
0 85123A WHITE HANGING HEART T-LIGHT HOLDER
1 71053 WHITE METAL LANTERN
2 84406B CREAM CUPID HEARTS COAT HANGER
3 84029G KNITTED UNION FLAG HOT WATER BOTTLE
4 84029E RED WOOLLY HOTTIE WHITE HEART.

Remove leading spaces from a string with LTRIM()

The LTRIM() function is used to remove leading whitespace from the left hand side of a string if it exists.

query = """
SELECT 
    sku,
    LTRIM(description) AS description
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description
0 85123A WHITE HANGING HEART T-LIGHT HOLDER
1 71053 WHITE METAL LANTERN
2 84406B CREAM CUPID HEARTS COAT HANGER
3 84029G KNITTED UNION FLAG HOT WATER BOTTLE
4 84029E RED WOOLLY HOTTIE WHITE HEART.

Reverse the characters in a string with REVERSE()

The REVERSE() function isn’t one I use very often, but can reverse the orders of characters in a string.

query = """
SELECT 
    sku,
    REVERSE(description) AS description
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description
0 85123A REDLOH THGIL-T TRAEH GNIGNAH ETIHW
1 71053 NRETNAL LATEM ETIHW
2 84406B REGNAH TAOC STRAEH DIPUC MAERC
3 84029G ELTTOB RETAW TOH GALF NOINU DETTINK
4 84029E .TRAEH ETIHW EITTOH YLLOOW DER

Return the Soundex of a string

The SOUNDEX() string function returns the Soundex score or value for a string. Soundex is an algorithm that returns a hash value showing how the string sounds - strings that sound the same have identical Soundex values.

query = """
SELECT 
    sku,
    description,
    SOUNDEX(description) AS soundex_value
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
sku description soundex_value
0 85123A WHITE HANGING HEART T-LIGHT HOLDER W3525263423436
1 71053 WHITE METAL LANTERN W35345365
2 84406B CREAM CUPID HEARTS COAT HANGER C652136323526
3 84029G KNITTED UNION FLAG HOT WATER BOTTLE K53514236134
4 84029E RED WOOLLY HOTTIE WHITE HEART. R34363

Matt Clarke, Thursday, April 14, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.