Pip is a command-line application that lets you install, upgrade, and remove Python packages in your development environment using simple commands. It works much like the Aptitude or apt package managers you have probably used on your Ubuntu Linux data science workstation: it fetches packages from a remote package repository and installs them on your local machine, or into a specific Python virtual environment.
Using the Pip package management system means you can install and import third-party packages and code written by others in your projects, which speeds up development. It also means you can share your code with colleagues and they’ll be able to install the same packages you used, without needing to visit the individual websites hosting the code. Pip is quick and easy to use, and a vital basic skill for all data scientists.
To check which version of Python and Pip you are running, call the application name with the --version flag. Most people are now running Python 3, so the commands to use are python3 and pip3, but those still on Python 2 will need to use python and pip instead. If you’re running the commands in a Jupyter notebook, you’ll need to prefix each one with an exclamation mark (!).
python3 --version
Python 3.7.10
pip3 --version
pip 21.1.2 from /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages/pip (python 3.7)
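If you’re working in a notebook, you can also read the same information from Python itself. The sketch below is a minimal example assuming Python 3.8 or later, where importlib.metadata is part of the standard library (the Python 3.7 environment shown above would need the importlib-metadata backport instead). The function name python_and_pip_versions is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def python_and_pip_versions():
    """Return the running interpreter's version and the pip version installed for it.

    Hypothetical helper: the pip version is None if pip isn't installed
    in this interpreter's environment.
    """
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    try:
        pip_version = version("pip")  # reads pip's installed package metadata
    except PackageNotFoundError:
        pip_version = None
    return python_version, pip_version


if __name__ == "__main__":
    py_ver, pip_ver = python_and_pip_versions()
    print(f"Python {py_ver}")
    print(f"pip {pip_ver}")
```

Because it inspects the interpreter that is actually running, this reports the environment your notebook kernel uses, which is not always the one your terminal’s pip3 points at.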
You can use Pip to upgrade Pip itself. Running pip3 install --upgrade package-name checks the Python Package Index (PyPI) for the latest version of the package and installs it if it is newer than the version you have, or if the package isn’t installed at all. You can check that Pip was upgraded correctly by re-running pip3 --version. Depending on your setup, you may need to use python3 -m pip instead of just pip3; on Windows, pip can only upgrade itself when run this way.
pip3 install --upgrade pip
Requirement already satisfied: pip in /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages (21.3.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
pip3 --version
pip 21.3.1 from /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages/pip (python 3.7)
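When you drive pip from a Python script or notebook rather than the terminal, running it as a module of the current interpreter (python3 -m pip, with sys.executable as the interpreter path) ensures packages are installed into the same environment your code is running in. This follows the pattern pip’s documentation suggests for calling pip from a program; the helper names pip_command and run_pip are hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command bound to the interpreter running this code (hypothetical helper)."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip in a subprocess; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_command(*args))


if __name__ == "__main__":
    # Show the command equivalent to: python3 -m pip install --upgrade pip
    print(" ".join(pip_command("install", "--upgrade", "pip")))
```

Calling run_pip("install", "--upgrade", "pip") performs the upgrade itself. Running pip in a subprocess, rather than importing it, keeps pip’s internals out of your own process.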
Pip installs Python packages. These are specially structured directories of Python code, packaged so that Pip can install them and you can easily import the third-party code into your projects. Most Python packages are stored and distributed via the Python Package Index (PyPI) and can be installed simply by entering the package name after pip3 install. In the example below, we’ll install my gilfoyle PDF generator Python package and all its dependencies.
pip3 install gilfoyle
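To confirm from Python that an install worked, you can look the package up in the environment’s installed metadata. This is a minimal sketch assuming Python 3.8 or later; note that it takes the distribution name you pass to pip3 install (such as gilfoyle), which is not always the same as the name you import. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string after `pip3 install gilfoyle`, or None beforehand.
    print(installed_version("gilfoyle"))
```

The same check is handy after pip3 install --upgrade or pip3 uninstall, where it should return the new version or None respectively.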
If a Python package isn’t on PyPI, but the author has still structured it in the standard Python packaging format and hosted the code on GitHub, you can install it directly from the repository. To do this, you need the repository’s Git URL (the clone URL ending in .git), prefixed with git+. In the example below, we’ll install GAPandas using its GitHub repository URL.
pip3 install git+https://github.com/flyandlure/gapandas.git
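The git+ prefix tells pip to fetch the code with Git (so Git must be installed), and pip also accepts an optional @ suffix naming a branch, tag, or commit to install instead of the default branch. The hypothetical helper below simply assembles that requirement string; the ref used in the usage line is a placeholder, not a real tag in the GAPandas repository.

```python
def github_requirement(owner, repo, ref=None):
    """Build a pip VCS requirement string for a GitHub repository (hypothetical helper).

    ref may be a branch, tag, or commit hash; when omitted, pip installs
    the repository's default branch.
    """
    url = f"git+https://github.com/{owner}/{repo}.git"
    return f"{url}@{ref}" if ref else url


if __name__ == "__main__":
    print(github_requirement("flyandlure", "gapandas"))
    # "some-branch" is a placeholder ref for illustration only.
    print(github_requirement("flyandlure", "gapandas", ref="some-branch"))
```

Either string can be passed straight to pip3 install, or written as a line in a requirements file.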
More rarely, you might need to install a Python package from a source distribution (sdist) or a wheel distribution (wheel). These are archive files built from a package’s source code: an sdist is usually a .tar.gz archive of the source, while a wheel is a pre-built .whl file. If you’ve downloaded one of these files, you can install the package by passing its path to pip.
pip3 install mypackage-1.0.tar.gz
pip3 install mypackage-1.0-py3-none-any.whl
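A wheel’s filename isn’t arbitrary: under the wheel format specification (PEP 427) it encodes the package name, version, and compatibility tags, so py3-none-any means any Python 3 interpreter, no specific ABI, and any platform (in other words, pure Python). The simplified sketch below, with a hypothetical function name, splits a filename into those parts; it skips the additional validation a real installer performs.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components (simplified sketch).

    Layout: {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, ver, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, ver, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename}")
    return {
        "name": name,
        "version": ver,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    print(parse_wheel_filename("mypackage-1.0-py3-none-any.whl"))
```

Splitting on hyphens works because the specification replaces hyphens in the name and version with underscores, so each hyphen separates one field from the next.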
As we saw earlier when upgrading Pip itself, the pip3 install --upgrade command upgrades any installed Python package in the working environment. In the example below, I’m upgrading the gapandas package to the latest version.
pip3 install --upgrade gapandas
To uninstall a package, use the uninstall command. Pip lists the files it will remove and prompts you to enter y to confirm before removing the package from your Python environment. Adding the -y flag skips this prompt.
pip3 uninstall gapandas
A requirements file, usually called requirements.txt, lists the packages a project needs in order to run. Each entry can be just the name of a package, like pandas, a specific version number, e.g. pandas==1.5.0, or a minimum version, e.g. pandas>=1.5.0, which accepts any version greater than or equal to 1.5.0. The requirements.txt file is created by the project’s author and is usually kept in the project’s root directory. Here’s an example of a requirements.txt file.
dropbox
pandas
mailchimp-transactional
numpy
The above requirements.txt file specifies that the dropbox, pandas, mailchimp-transactional, and numpy packages are needed for the project to run. When you install a published package, pip doesn’t read its requirements.txt; it installs the dependencies declared in the package’s metadata (which some authors generate from requirements.txt), so the dependencies are normally satisfied automatically. During package development and testing, or after cloning a project’s source code, you’ll need to install them yourself. You can install the dependencies listed in the requirements file with the command below:
pip3 install -r requirements.txt
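To see what a requirements file asks for, the sketch below reads the simple formats described above: a bare package name, or a name followed by a single version specifier such as == or >=. It is deliberately simplified and hypothetical; real requirements files can also contain extras, environment markers, URLs, multiple specifiers, and pip options, which pip handles with a much fuller parser, so treat this as an illustration rather than a replacement for pip3 install -r.

```python
import re

# A package name, optionally followed by ONE version specifier such as
# ==1.5.0 or >=1.5.0. Simplified: no extras, markers, URLs or comma lists.
REQUIREMENT = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:(==|>=|<=|~=|!=|>|<)\s*([^\s,;#]+))?$"
)


def parse_requirements(text):
    """Return (name, operator, version) tuples; operator and version are None if absent."""
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = REQUIREMENT.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {raw_line!r}")
        requirements.append(match.groups())
    return requirements


if __name__ == "__main__":
    example = "dropbox\npandas\nmailchimp-transactional\nnumpy\n"
    for name, op, required in parse_requirements(example):
        print(name, op or "", required or "")
```

Raising an error on anything outside the supported subset, rather than guessing, keeps the sketch from silently misreading a more complex requirement.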
Matt Clarke, Thursday, December 23, 2021