Pip is a command line application that allows you to install, upgrade, and remove Python packages from your development environment using simple commands. It works just like the Aptitude or apt package manager you have probably used on your Ubuntu Linux data science workstation in that it allows you to fetch remote packages from a package repository and install them to your local machine, or a specific Python virtual environment.
Using the Pip package management system means that you can use third party packages in your project and install and import packages and code written by others to speed up your development processes. It also means that you can share your code with colleagues and they’ll be able to install the same packages you used, without the need to visit individual websites hosting the code. Pip is quick and easy to use and a vital basic skill for all data scientists.
To check which version of Python and Pip you are running, call the application name with the --version flag. Most people are running Python 3 now, so the command to use is pip3, but those on Python 2 will need to use pip instead. If you’re running your code in a Jupyter notebook, you’ll need to prefix your commands with an exclamation mark (!).
pip3 --version
pip 21.1.2 from /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages/pip (python 3.7)
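The same check can be made from inside Python, which can be handy in a notebook. This is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is in the standard library); the helper name installed_version is made up for illustration.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version and the pip version visible to it.
print("Python:", sys.version.split()[0])
print("pip:", installed_version("pip"))
```

The helper returns None rather than raising an error, so it can also be used to test whether any other package is present in the current environment.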
To upgrade Pip to the latest version you can use Pip itself. Running the command pip3 install --upgrade package-name checks the PyPI repository for the latest release of the package and installs it if a newer version is available. You can check that the upgrade worked by re-running pip3 --version. Depending on your setup, you may need to use python3 -m pip instead of just pip3.
pip3 install --upgrade pip
Requirement already satisfied: pip in /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages (21.3.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
pip 21.3.1 from /conda/envs/data-science-stack-2.9.0/lib/python3.7/site-packages/pip (python 3.7)
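If you drive Pip from a script, calling it as a module of the running interpreter helps ensure packages land in the environment you expect. The sketch below is one possible way to do that; pip_command and upgrade are hypothetical helper names, and the upgrade itself is not executed here because it needs network access.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command that runs under the current Python interpreter."""
    return [sys.executable, "-m", "pip", *args]


def upgrade(package_name):
    """Upgrade one package; needs network access, so it is not called here."""
    return subprocess.run(
        pip_command("install", "--upgrade", package_name), check=True
    )


# Show the command that upgrade("pip") would run, without executing it.
print(" ".join(pip_command("install", "--upgrade", "pip")))
```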
Pip installs Python packages. These are specially structured directories of Python code, built so that Pip can install them and you can easily import third-party code into your projects. Most Python packages are stored and distributed via PyPI, the Python Package Index, and can be installed simply by entering the package name after pip3 install. In the example below, we’ll install my gilfoyle PDF generator Python package and all its dependencies.
pip3 install gilfoyle
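Once a package is installed, you can inspect the dependencies it declared, which are what Pip pulled in alongside it. A minimal sketch, assuming Python 3.8 or later; declared_dependencies is a hypothetical helper, and it reports the package as absent if gilfoyle has not been installed in the current environment.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings an installed package declares, or None if absent."""
    try:
        # metadata.requires returns None when a package declares no dependencies.
        return metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return None


deps = declared_dependencies("gilfoyle")
if deps is None:
    print("gilfoyle is not installed in this environment")
else:
    for requirement in deps:
        print(requirement)
```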
If a Python package isn’t on PyPI, but the author has still used the correct Python packaging format and hosted the code on GitHub, you can install the package directly from there. To do this, you need the repository’s .git clone URL from GitHub, prefixed with git+. In the example below, we’ll install GAPandas using its GitHub repository URL.
pip3 install git+https://github.com/flyandlure/gapandas.git
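Pip also accepts a branch, tag, or commit after an @ sign at the end of a Git URL, which is useful when you need a specific revision. The sketch below only builds the requirement string; git_requirement is a hypothetical helper, and the branch name "main" is an assumption rather than something confirmed for this repository.

```python
def git_requirement(repo_url, ref=None):
    """Turn a repository clone URL into a pip requirement, optionally pinned to a ref."""
    requirement = "git+" + repo_url
    if ref:
        # pip accepts a branch, tag, or commit hash after an @ sign.
        requirement += "@" + ref
    return requirement


print(git_requirement("https://github.com/flyandlure/gapandas.git"))
# The branch name "main" is an assumption; check the repository for its real refs.
print(git_requirement("https://github.com/flyandlure/gapandas.git", ref="main"))
```

The resulting string can be passed to pip3 install in place of a package name.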
More rarely, you might need to install a Python package from a source distribution (sdist) or a wheel distribution (wheel). These are distribution files built from the package’s source, so if you’ve downloaded one of them, you can use these commands to install the package.
pip3 install mypackage-1.0.tar.gz
pip3 install mypackage-1.0-py3-none-any.whl
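A wheel's filename tells you what it is compatible with: the name and version are followed by a Python tag, an ABI tag, and a platform tag. The sketch below splits a filename along those lines; parse_wheel_filename is a hypothetical helper and a simplification, not the parser Pip itself uses.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Five fields normally; six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


print(parse_wheel_filename("mypackage-1.0-py3-none-any.whl"))
```

For the example file, the tags py3, none, and any indicate a pure Python 3 wheel that should install on any platform.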
As we saw earlier when upgrading Pip itself, we can use the pip3 install --upgrade command to upgrade any installed Python package in the working environment. In the example below, I’m updating the gapandas package to the latest version.
pip3 install --upgrade gapandas
To uninstall or delete a package you can use the uninstall command. This will prompt you to enter y to confirm you wish to proceed before removing the package from your Python environment.
pip3 uninstall gapandas
A requirements file, usually called requirements.txt, lists the packages that a project needs in order to run. Each entry might just be the name of a package, like pandas, or it could pin a specific version, e.g. pandas==1.5.0, or allow any version from a given release upwards, e.g. pandas>=1.5.0. The requirements.txt file is created by the package provider and sits in the root of the package directory. Here’s an example of a requirements.txt file:
dropbox
pandas
mailchimp-transactional
numpy
This requirements.txt file states that the dropbox, pandas, mailchimp-transactional, and numpy packages are needed for the given package to run. Ordinarily, when the package is installed, pip will also install the dependencies the package declares (many projects read these from requirements.txt in their setup script), so all the project dependencies are satisfied. However, during package development and testing you may need to install them manually. You can install the dependencies listed in the requirements file with the command below:
pip3 install -r requirements.txt
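To see how the entries in a requirements file break down, the sketch below separates each line into a package name and an optional version specifier. It is a deliberately simplified illustration: parse_requirements is a hypothetical helper that handles only plain entries (no extras, environment markers, or URLs), and the numpy version bound in the sample is made up for the example.

```python
import re

# A package name followed by an optional version specifier such as ==1.5.0.
REQUIREMENT_PATTERN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requirements(text):
    """Return (name, specifier) pairs from simple requirements.txt content."""
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = REQUIREMENT_PATTERN.match(line)
        if match:
            requirements.append((match.group(1), match.group(2).strip()))
    return requirements


example = """
dropbox
pandas==1.5.0
mailchimp-transactional
numpy>=1.20  # this version bound is illustrative only
"""

for name, specifier in parse_requirements(example):
    print(name, specifier or "(any version)")
```

For anything beyond a quick inspection, a dedicated parser is a safer choice than a regular expression like this one.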
Matt Clarke, Thursday, December 23, 2021