Many MySQL databases are configured to accept connections from other servers on the local network and will reject connections from remote machines. Ordinarily, you could work around this by creating an SSH tunnel connection to login to the machine and then open a database connection and map the database port to the one on your local machine. But how can you do that from a Python script?
Well, thanks to the Python packages sshtunnel
and pymysql
you can create an SSH connection inside your Python script and connect to a remote MySQL server securely and return your query results in a Pandas dataframe. Here’s how it’s done.
Open a Python document or Jupyter notebook and load the pandas
, pymysql
, logging
, and sshtunnel
packages, including the SSHTunnelForwarder
module from sshtunnel
. You can install these using pip3 install package-name
if they’re not already on your machine.
import pandas as pd
import pymysql
import logging
import sshtunnel
from sshtunnel import SSHTunnelForwarder
Next, create some configuration variables to hold your credentials. You’ll need the IP address of your SSH server, plus the username and password. You’ll also need the MySQL username, password, and database name.
ssh_host = '1.2.3.4'
ssh_username = 'bob'
ssh_password = 'SecretPassword'
database_username = 'bob_smith'
database_password = 'SecretDBPassword'
database_name = 'ecommerce'
localhost = '127.0.0.1'
To make the final code cleaner and allow us to re-use it, we’ll create some functions to handle each task. Our first function uses the sshtunnel
package SSHTunnelForwarder
module to create an SSH tunnel and port forwards it to a port on the local machine, then it starts the connection. You can use the verbose=True
flag to turn on debugging to check everything is working.
def open_ssh_tunnel(verbose=False):
"""Open an SSH tunnel and connect using a username and password.
:param verbose: Set to True to show logging
:return tunnel: Global SSH tunnel connection
"""
if verbose:
sshtunnel.DEFAULT_LOGLEVEL = logging.DEBUG
global tunnel
tunnel = SSHTunnelForwarder(
(ssh_host, 22),
ssh_username = ssh_username,
ssh_password = ssh_password,
remote_bind_address = ('127.0.0.1', 3306)
)
tunnel.start()
The next function uses PyMySQL to create a connection to the MySQL server. This will be run after the SSH tunnel has
been opened. The MySQL port will be mapped to the local_bind_port
from the tunnel
object and stored in the global connection
.
def mysql_connect():
"""Connect to a MySQL server using the SSH tunnel connection
:return connection: Global MySQL database connection
"""
global connection
connection = pymysql.connect(
host='127.0.0.1',
user=database_username,
passwd=database_password,
db=database_name,
port=tunnel.local_bind_port
)
To run a database query, we can create a wrapper to the Pandas read_sql_query()
function. This takes a SQL query
and the global connection
and returns the MySQL result set in a Pandas dataframe.
def run_query(sql):
"""Runs a given SQL query via the global database connection.
:param sql: MySQL query
:return: Pandas dataframe containing results
"""
return pd.read_sql_query(sql, connection)
Finally, we need a couple of functions to tidy up when we’re done. The first one disconnects from the MySQL server by using the close()
function, while the second one closes the SSH tunnel.
def mysql_disconnect():
"""Closes the MySQL database connection.
"""
connection.close()
def close_ssh_tunnel():
"""Closes the SSH tunnel connection.
"""
tunnel.close
Finally, we can run the commands in the following order. First, we’ll connect to the remote server via SSH and create an SSH tunnel, then we’ll connect to the MySQL database on the remote server using the tunnel, then we’ll run the query and return a Pandas dataframe containing the result.
When that’s run, we’ll then disconnect from MySQL and close the SSH tunnel. You can also use similar code to insert data into a MySQL database using PyMySQL.
open_ssh_tunnel()
mysql_connect()
df = run_query("SELECT * FROM orders ORDER BY id DESC LIMIT 100")
df.head()
mysql_disconnect()
close_ssh_tunnel()
Matt Clarke, Thursday, March 04, 2021