How to use the Pip Python package manager


Pip is a command line application that lets you install, upgrade, and remove Python packages in your development environment using simple commands. It works much like the apt or Aptitude package managers you have probably used on your Ubuntu Linux data science workstation: it fetches packages from a remote package repository and installs them on your local machine, or into a specific Python virtual environment.

Using the Pip package management system means that you can use third party packages in your project and install and import packages and code written by others to speed up your development processes. It also means that you can share your code with colleagues and they’ll be able to install the same packages you used, without the need to visit individual websites hosting the code. Pip is quick and easy to use and a vital basic skill for all data scientists.

Check which version of Pip is installed

To check which version of Python and Pip you are running, call the application name with the --version flag. Most people are running Python 3 now, so the commands to use are usually python3 and pip3, but those on Python 2 will need to use python and pip instead. If you’re running your code in a Jupyter notebook, you’ll need to prefix your commands with ! so they are passed to the shell.

python3 --version

Python 3.7.10

pip3 --version

pip 21.1.2 from /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages/pip (python 3.7)
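If you would rather check the versions from inside Python, for example in a notebook cell, something like the sketch below should work. It assumes Python 3.8 or later, where importlib.metadata is part of the standard library; the function names python_version and pip_version are made up for this illustration.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onwards


def python_version():
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


def pip_version():
    """Return the installed pip version, or None if pip is not installed."""
    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        return None


print("Python", python_version())
print("pip", pip_version())
```

The values reported here describe the interpreter running the code, which may differ from whatever pip3 points to on your PATH.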

Upgrade Pip to the latest version

To upgrade Pip to the latest version you can use Pip itself. Running the command pip3 install --upgrade package-name will go to the PyPI repository, check for the latest version of the package, and install it if a newer version is available. You can check that the upgrade worked by re-running pip3 --version. Depending on your setup you may need to use python3 -m pip instead of just pip3.

pip3 install --upgrade pip

Requirement already satisfied: pip in /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages (21.3.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

pip3 --version

pip 21.3.1 from /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages/pip (python 3.7)
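When several Python installations live on one machine, running Pip as a module of a specific interpreter makes it clearer which environment is being changed. The sketch below is one way to do that from a script. The helper names pip_command and run_pip are hypothetical, and the example only prints the command rather than running it.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command that runs under the current Python interpreter."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip with the given arguments and raise an error if it fails."""
    return subprocess.run(pip_command(*args), check=True)


# Show the command that would upgrade pip, without executing it.
print(pip_command("install", "--upgrade", "pip"))
```

Calling run_pip("install", "--upgrade", "pip") would then perform the upgrade for the interpreter that is running the script.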

Install a Python package from PyPi using Pip

Pip installs Python packages. These are specially structured directories of Python code that Pip knows how to install, so you can import third-party code into your projects easily. Most Python packages are stored and distributed via PyPI, the Python Package Index, and can be installed simply by entering the package name after pip3 install. In the example below, we’ll install my gilfoyle PDF generator Python package and all its dependencies.

pip3 install gilfoyle
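After an install, you may want to confirm that the package can actually be imported. A minimal sketch using the standard library is shown below. The helper name is_importable is made up for this example, and it assumes the import name matches the package name, which is not always true (scikit-learn, for instance, is imported as sklearn).

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return find_spec(module_name) is not None


# Assumes the gilfoyle package is imported under the name "gilfoyle".
print(is_importable("gilfoyle"))
```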

Install a Python package from GitHub using Pip

If a Python package isn’t on PyPI, but the package author has still created it using the correct Python packaging format and hosted the code on GitHub, you can install the package directly from there. To do this, you need the repository’s clone URL (the one ending in .git), prefixed with git+ so Pip knows to fetch it using Git. In the example below, we’ll install GAPandas using its GitHub repository URL.

pip3 install git+https://github.com/flyandlure/gapandas.git
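The argument passed to pip3 install here follows a simple pattern: git+ followed by the clone URL, optionally followed by @ and a branch, tag, or commit. The sketch below builds such a string. The function name github_requirement is hypothetical, and the v1.0 tag is only a placeholder, not a release known to exist in that repository.

```python
def github_requirement(owner, repo, ref=None):
    """Build a pip-installable Git URL for a GitHub repository.

    ref is an optional branch, tag, or commit to install from.
    """
    url = f"git+https://github.com/{owner}/{repo}.git"
    if ref:
        url += f"@{ref}"
    return url


print(github_requirement("flyandlure", "gapandas"))
# "v1.0" is a placeholder tag used purely for illustration.
print(github_requirement("flyandlure", "gapandas", ref="v1.0"))
```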

Install a Python package from a source or wheel distribution using Pip

More rarely, you might need to install a Python package from a source distribution (sdist) or a wheel distribution (wheel). A source distribution is an archive of the package source, typically a .tar.gz file, while a wheel is a pre-built .whl file. If you’ve downloaded one of these files, you can install the package by passing the file name to Pip.

pip3 install mypackage-1.0.tar.gz
pip3 install mypackage-1.0-py3-none-any.whl
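The wheel file name itself tells you what the wheel is built for: it is made up of the package name, the version, an optional build tag, and then the Python, ABI, and platform tags. The sketch below splits a file name into those parts. It is a simplified illustration rather than a full validator, and the names WheelInfo and parse_wheel_filename are made up for this example.

```python
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"Unexpected wheel file name: {filename}")
    return WheelInfo(name, version, build, python_tag, abi_tag, platform_tag)


print(parse_wheel_filename("mypackage-1.0-py3-none-any.whl"))
```

For the example file, the tags py3, none, and any indicate a pure Python 3 wheel that is not tied to a particular ABI or platform.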

Upgrade a Python package using Pip

As we saw earlier when upgrading Pip itself, we can use the pip3 install --upgrade command to upgrade any installed Python package in the working environment. In the example below, I’m updating the gapandas package to the latest version.

pip3 install --upgrade gapandas

Uninstall a Python package using Pip

To uninstall or delete a package you can use the uninstall command. This will prompt you to enter y to confirm you wish to proceed before removing the package from your Python environment.

pip3 uninstall gapandas

Install packages from a requirements file

A requirements file, usually called requirements.txt, lists the packages that are required to run a project. Each line might just be the name of a package, like pandas, or it could pin a specific version number, e.g. pandas==1.5.0, or allow any version from a given version number upwards, e.g. pandas>=1.5.0. The requirements.txt file is created by the package author and usually sits in the root of the project directory. Here’s an example of a requirements.txt file.

dropbox
pandas
mailchimp-transactional
numpy

The above requirements.txt file states that the dropbox, pandas, mailchimp-transactional, and numpy packages are needed to allow the given package to run. When you install a package, Pip installs the dependencies declared in that package’s metadata, which some projects generate from requirements.txt. Pip does not read requirements.txt on its own, so during development and testing you will often need to install the listed packages manually. You can install the dependencies listed in the requirements file with the command below:

pip3 install -r requirements.txt
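To make the file format more concrete, the sketch below reads requirements-style text and splits each line into a package name and an optional version specifier. It is a deliberately simplified illustration that ignores extras, environment markers, and URLs, and it is not how Pip itself parses the file. The function name parse_requirements and the sample text are made up for this example.

```python
import re

# Simplified pattern: a package name followed by an optional version specifier.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*$")


def parse_requirements(text):
    """Return a list of (name, specifier) tuples from requirements-style text."""
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blank lines and options such as -r
            continue
        match = _REQUIREMENT.match(line)
        if match:
            requirements.append((match.group(1), match.group(2)))
    return requirements


sample = """
dropbox
pandas>=1.5.0
mailchimp-transactional
numpy
"""

for name, specifier in parse_requirements(sample):
    print(name, specifier or "(any version)")
```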

Matt Clarke, Thursday, December 23, 2021

Matt Clarke is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.