SQL is one of the most widely used languages in data science so it’s important to know at least the basics required in order to fetch the data you’ll need to use in analyses or models.
In this example, I’ll show you how you can import a MySQL database into a MySQL Docker container and create a connection that allows you to start learning the SQL queries you’ll need to know for basic data science projects.
For this tutorial, I’ll assume you already have MySQL installed and setup. If you’ve not done this yet, please follow my simple step-by-step guide to getting MySQL up and running on Docker in under five minutes. It’s really easy to follow.
Use the docker exec -it docker_mysql bash
command to open up the Bash shell on your Docker machine in your terminal, where docker_mysql
is the name you gave your container. Once you’ve got your shell, enter mysql -uroot -p
and then login to MySQL using the password you created when you set it up.
docker exec -it docker_mysql bash
mysql -uroot -p
Next, in your MySQL terminal enter CREATE DATABASE ecommerce;
and hit enter. This will create a new MySQL database called ecommerce
into which we can now import our database. If the command works, you’ll get a message back saying Query OK, 1 row affected (0.00 sec)
, which means you now have an empty database.
CREATE DATABASE ecommerce;
You can create your own database tables if you wish, but a more common approach is to import an existing SQL database from an SQL database dump file. I’ve created a simple SQL database from the Online Retail Dataset that is ideal for learning SQL queries, so we’ll import that next.
First, download the file from my GitHub repository, then unzip it. You’ll find inside a file called ecommerce.sql
that includes the required queries to create the SQL database tables and insert some test data. Use the cd
command to change the directory to Downloads
(or wherever you saved the file to) and use ls
to list the files to check that ecommerce.sql
is present.
cd Downloads
ls
Next we’ll use the docker exec
command to run a command from our terminal on the docker_mysql
container. We’ll login to docker_mysql
, then login to mysql
with our password, tell it to use the ecommerce
database and then tell it to import the ecommerce.sql
file. After a few seconds of crunching the data, you should see a message telling you the import was successful.
docker exec -i docker_mysql mysql -uroot -pPASSWORD ecommerce < ecommerce.sql
To access the database we need to know it’s IP address so we can configure a database connection. Enter the command docker inspect docker_mysql
in your terminal and Docker will return a huge amount of variables on your Docker container and environment. The one you need to find is called IPAddress
and is located near the bottom of the file. My container has the IP address 172.17.0.3
. Find yours and make a note of it.
docker inspect docker_mysql "Gateway": "172.17.0.1",
"IPAddress": "172.17.0.3",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:03",
"DriverOpts": null
Finally, open a Jupyter notebook and install the sqlalchemy
and pymysql
packages using the Pip package manager Then import pandas
and the create_engine
module from sqlalchemy
. These will be used to connect to the MySQL server running on your Docker container.
!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine
Next we’ll create a connection to the MySQL Docker using create_engine()
. We’ll need to tell it that the database is mysql
and we want to use pymysql
for the connection, then we need to provide the username and password as root:PASSWORD
where PASSWORD
is the password you created. That’s followed by the IP address we obtained above, the standard port 3306 MySQL uses, and finally the name of the database ecommerce
.
engine = create_engine('mysql+pymysql://root:********@172.17.0.3:3306/ecommerce')
You can then use the Pandas read_sql()
function to run an SQL query on the database using the connection above that we’ve stored in engine
.
df = pd.read_sql("SELECT * FROM orders", con=engine)
df.head()
invoice | revenue | skus | items | |
---|---|---|---|---|
0 | 536365 | 139.12 | 7 | 40 |
1 | 536366 | 22.20 | 2 | 12 |
2 | 536367 | 278.73 | 12 | 83 |
3 | 536368 | 70.05 | 4 | 15 |
4 | 536369 | 17.85 | 1 | 3 |
Matt Clarke, Tuesday, April 12, 2022