SQL is one of the most widely used languages in data science, so it's important to know at least the basics needed to fetch the data you'll use in your analyses or models.
In this example, I’ll show you how you can import a MySQL database into a MySQL Docker container and create a connection that allows you to start learning the SQL queries you’ll need to know for basic data science projects.
For this tutorial, I'll assume you already have MySQL installed and set up. If you've not done this yet, please follow my simple step-by-step guide to getting MySQL up and running on Docker in under five minutes. It's really easy to follow.
First, enter the docker exec -it docker_mysql bash command to open up the Bash shell on your Docker container in your terminal, where docker_mysql is the name you gave your container. Once you've got your shell, enter mysql -uroot -p and then log in to MySQL using the password you created when you set it up.
docker exec -it docker_mysql bash
mysql -uroot -p
Next, in your MySQL terminal enter CREATE DATABASE ecommerce; and hit enter. This will create a new MySQL database called ecommerce into which we can import our data. If the command works, you'll get a message back saying Query OK, 1 row affected (0.00 sec), which means you now have an empty database.
CREATE DATABASE ecommerce;
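If you want to double-check that the database exists, you can list every database on the server with the standard SHOW DATABASES command:

SHOW DATABASES;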
You can create your own database tables if you wish, but a more common approach is to import an existing database from an SQL dump file. I've created a simple SQL database from the Online Retail Dataset that is ideal for learning SQL queries, so we'll import that next.
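For comparison, creating a table by hand looks something like the sketch below. This is purely a hypothetical example (the customers table and its columns aren't part of the dump we're about to import), just to show what the manual approach involves:

-- Hypothetical example table, not part of the ecommerce dump
CREATE TABLE customers (
    customer_id INT NOT NULL AUTO_INCREMENT,
    country VARCHAR(60),
    PRIMARY KEY (customer_id)
);

-- Insert a single test row
INSERT INTO customers (country) VALUES ('United Kingdom');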
First, download the file from my GitHub repository, then unzip it. Inside you'll find a file called ecommerce.sql that includes the queries required to create the SQL database tables and insert some test data. Use the cd command to change directory to Downloads (or wherever you saved the file) and use ls to list the files to check that ecommerce.sql is present.
cd Downloads
ls
Next we'll use the docker exec command to run a command from our terminal on the docker_mysql container. We'll log in to docker_mysql, then log in to mysql with our password, tell it to use the ecommerce database, and then tell it to import the ecommerce.sql file. Replace PASSWORD with the MySQL root password you created earlier. After a few seconds of crunching the data, you'll be returned to the prompt; if no errors are shown, the import was successful.
docker exec -i docker_mysql mysql -uroot -pPASSWORD ecommerce < ecommerce.sql
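To confirm the import worked, one option is to list the tables in the new database straight from your terminal. A quick check, assuming the same container name and root password as above:

docker exec -it docker_mysql mysql -uroot -p -e "SHOW TABLES IN ecommerce;"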
To access the database we need to know its IP address so we can configure a database connection. Enter the command docker inspect docker_mysql in your terminal and Docker will return a large amount of information about your Docker container and environment. The value you need to find is called IPAddress and is located near the bottom of the output. My container has the IP address 172.17.0.3. Find yours and make a note of it.
docker inspect docker_mysql

"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.3",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:03",
"DriverOpts": null
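If you'd rather not scroll through the full output, docker inspect also accepts a --format flag that extracts just the value you need. This assumes your container is on Docker's default bridge network, as in the output above:

docker inspect --format '{{ .NetworkSettings.IPAddress }}' docker_mysql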
Finally, open a Jupyter notebook and install the sqlalchemy and pymysql packages using the Pip package manager. Then import pandas and the create_engine function from sqlalchemy. These will be used to connect to the MySQL server running in your Docker container.
!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine
Next we'll create a connection to the MySQL Docker container using create_engine(). We'll need to tell it that the database is mysql and that we want to use pymysql for the connection, then we need to provide the username and password, where root is the username and PASSWORD is the password you created. That's followed by the IP address we obtained above, the standard port 3306 that MySQL uses, and finally the name of the database, ecommerce.
engine = create_engine('mysql+pymysql://root:********@172.17.0.3:3306/ecommerce')
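As a quick sanity check, you can run a simple query through the new connection to confirm the imported tables are visible. A minimal sketch, reusing the engine object created above:

# Uses the engine created above; SHOW TABLES returns one row per table in the ecommerce database
tables = pd.read_sql("SHOW TABLES", con=engine)
print(tables)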
You can then use the Pandas read_sql() function to run an SQL query on the database using the connection above that we've stored in engine.
df = pd.read_sql("SELECT * FROM orders", con=engine)
df.head()
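From there you can try more interesting queries. The below is a hypothetical sketch: the Country column is an assumption about the orders table's schema, so check the actual column names with SHOW COLUMNS FROM orders and adjust accordingly.

# Count the rows in the orders table
row_count = pd.read_sql("SELECT COUNT(*) AS total_rows FROM orders", con=engine)

# Group and aggregate - Country is an assumed column name, so adjust it to match your schema
top_countries = pd.read_sql(
    "SELECT Country, COUNT(*) AS total_orders FROM orders GROUP BY Country ORDER BY total_orders DESC LIMIT 10",
    con=engine
)
top_countries.head()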
Matt Clarke, Tuesday, April 12, 2022