How to install the NVIDIA Data Science Stack on Ubuntu 20.04

The NVIDIA Data Science Stack is the quickest way to set up the drivers and packages needed for GPU-accelerated data science. Here's how to install it.


One of the most annoying aspects of working with GPU-accelerated data science software, such as NVIDIA RAPIDS, TensorFlow, PyTorch and XGBoost, is that it can be complicated and time-consuming to get all of your drivers and packages working together properly.

Over the past year I've set up several Ubuntu data science workstations and have run into various issues getting everything configured correctly. I'd been hoping that NVIDIA might release its own Linux data science distro, but I recently found that it had already created the next best thing: the NVIDIA Data Science Stack.

NVIDIA Data Science Stack

The Data Science Stack is a shell script for Linux that sets up the right NVIDIA GPU drivers and GPU-accelerated data science packages, removing much of the hassle of the manual approach. It supports Ubuntu 18.04 LTS, Ubuntu 20.04 LTS, and Red Hat Enterprise Linux (RHEL) 7.5+ or 8.x.
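Before you start, it's worth confirming that your machine is on that list. The bash sketch below reads ID and VERSION_ID from /etc/os-release and compares them against the platforms above. is_supported_platform is a hypothetical helper name, and the check only mirrors the list in this post, not any checks the installer itself may perform.

```shell
#!/usr/bin/env bash
# Sketch: is this machine one of the platforms listed for the Data Science Stack
# (Ubuntu 18.04/20.04 LTS, RHEL 7.5+ or 8.x)?
# is_supported_platform is a hypothetical helper, not part of the stack.

# is_supported_platform ID VERSION_ID   (values as found in /etc/os-release)
is_supported_platform() {
  local id="$1" version="$2"
  case "$id" in
    ubuntu)
      [[ "$version" == "18.04" || "$version" == "20.04" ]]
      ;;
    rhel)
      local major="${version%%.*}" minor=0
      if [[ "$version" == *.* ]]; then
        minor="${version#*.}"
        minor="${minor%%.*}"
      fi
      # Any 8.x release, or 7.5 and later.
      [[ "$major" == "8" ]] || { [[ "$major" == "7" ]] && (( minor >= 5 )); }
      ;;
    *)
      return 1
      ;;
  esac
}

if [[ -r /etc/os-release ]]; then
  # /etc/os-release is shell-sourceable; read it in subshells so nothing leaks.
  os_id="$(. /etc/os-release && echo "$ID")"
  os_version="$(. /etc/os-release && echo "$VERSION_ID")"
  if is_supported_platform "$os_id" "$os_version"; then
    echo "Supported platform: $os_id $os_version"
  else
    echo "Not on the supported list: $os_id $os_version"
  fi
fi
```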

The Data Science Stack can be used to install and configure dozens of commonly used packages, including these:

  • NVIDIA GPU driver
  • NVIDIA CUDA
  • NVIDIA RAPIDS
  • NVIDIA Docker and Docker-CE
  • JupyterLab
  • Anaconda
  • PyTorch
  • fastai
  • TensorFlow

After recently taking delivery of a new Dell Precision 7750 portable data science workstation, I thought I’d give it a try. Here’s how you use it.

1. Clone the Git repository

After doing a fresh install of Ubuntu 20.04 LTS on my Precision 7750, I installed git and cloned the NVIDIA Data Science Stack repository into a directory called Data in my home directory.

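# Install git
sudo apt update
sudo apt install -y git
# Create ~/Data (no error if it already exists) and clone the repository into it
mkdir -p ~/Data
cd ~/Data
git clone https://github.com/NVIDIA/data-science-stack
cd data-science-stack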

2. Start the installer

To start the installation, run ./data-science-stack with the setup-system argument. This installs the correct NVIDIA graphics driver (version 455.23.04 at the time of writing) and the CUDA 11.0.228 libraries that allow data science software to utilise NVIDIA GPUs.

./data-science-stack setup-system
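Once setup-system has finished (and after a reboot, if one is needed), you can check that the driver is in place. This bash sketch reads the driver version with nvidia-smi's --query-gpu=driver_version query and uses GNU sort -V to compare it with 455.23.04, the version mentioned above. version_at_least is a hypothetical helper name, and since a newer release of the stack may install a newer driver, anything newer is treated as fine.

```shell
#!/usr/bin/env bash
# Sketch: confirm the NVIDIA driver is installed and not older than the
# version mentioned in the post. version_at_least is a hypothetical helper.

MIN_DRIVER="455.23.04"

# version_at_least ACTUAL MINIMUM: succeeds if ACTUAL >= MINIMUM (GNU sort -V).
version_at_least() {
  printf '%s\n' "$2" "$1" | sort -V -C
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per GPU; the driver version is the same for all, so take the first.
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
  # Only trust output that looks like a version number (not an error message).
  if [[ "$driver" =~ ^[0-9]+(\.[0-9]+)*$ ]] && version_at_least "$driver" "$MIN_DRIVER"; then
    echo "NVIDIA driver $driver is installed."
  else
    echo "Driver version '$driver' is older than $MIN_DRIVER or could not be read."
  fi
else
  echo "nvidia-smi not found: the driver does not appear to be installed yet."
fi
```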

3. Set up a user

The next step is to run the script with the setup-user argument to set up your user. Afterwards, running gnome-session-quit --no-prompt logs you out of your GNOME session immediately, without a confirmation prompt. Log back in so that the changes made by setup-user apply to your new session.

./data-science-stack setup-user
gnome-session-quit --no-prompt
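The post doesn't say exactly what setup-user changes. A common reason for logging out and back in is that new group memberships, such as the docker group that lets you run Docker without sudo, only apply to new login sessions. Assuming that's what's happening here, this bash sketch checks whether your current session is in the docker group. user_in_group is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check group membership after logging back in.
# Assumption: setup-user adds you to the "docker" group (the post does not say so).
# user_in_group is a hypothetical helper, not part of the stack.

# user_in_group GROUP [USER]
# With no USER, checks the groups of the *current session*, which is what
# changes when you log out and back in. With USER, checks the group database.
user_in_group() {
  local group="$1"
  if [[ $# -ge 2 ]]; then
    id -nG "$2"
  else
    id -nG
  fi | tr ' ' '\n' | grep -qxF -- "$group"
}

if user_in_group docker; then
  echo "This session is in the docker group."
elif user_in_group docker "$(id -un)"; then
  echo "You are in the docker group, but this session predates it: log out and back in."
else
  echo "You are not in the docker group."
fi
```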

4. Build the container

Next, we will build a containerised environment containing all the data science applications we need. To build the container, pass in the build-container argument.

This command will set up Docker CE and NVIDIA Docker 2 and install a huge number of packages for GPU-accelerated data science. If you prefer to use Conda, you can pass in the build-conda-env argument instead.

./data-science-stack build-container

As the build-container process is very intensive, it takes a long time to complete, perhaps 30 to 60 minutes. It runs without requiring any user input, so you can go for a walk while it's running.
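Because the build runs for so long, it can be handy to keep a log of it in case something fails while you're away. This bash sketch pipes a command through tee and returns the command's own exit status via PIPESTATUS, so a failed build isn't masked by tee succeeding. run_logged and the ~/dss-build.log path are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep a log of a long-running command while still watching its output.
# run_logged is a hypothetical helper, not part of the Data Science Stack.

# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, copies its stdout and stderr to the terminal and to LOGFILE,
# and returns COMMAND's exit status rather than tee's.
run_logged() {
  local logfile="$1"
  shift
  "$@" 2>&1 | tee "$logfile"
  return "${PIPESTATUS[0]}"
}

# Example, from inside the data-science-stack directory:
#   time run_logged ~/dss-build.log ./data-science-stack build-container
```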

5. Run the container

To run the NVIDIA Docker container, pass the run-container argument to the data-science-stack script. This starts the container and launches a JupyterLab environment, from which you can run your own Jupyter notebooks or some of the built-in GPU-accelerated example scripts.

./data-science-stack run-container

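To access JupyterLab running in the NVIDIA Docker container, visit http://localhost:8888/ in your web browser. The command-line output may give you a URL such as http://a3682313c15e:8888/, but that hostname is the container's own internal hostname, which your machine usually can't resolve. Because the container publishes port 8888 on your machine (the -p 8888:8888 option in its docker run command), localhost works instead.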
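JupyterLab may take a moment to start after run-container. If you want to know when it's reachable, this bash sketch polls a host and port using bash's /dev/tcp redirection until something accepts a connection. wait_for_port is a hypothetical helper name, and an open port only shows that something is listening, not that JupyterLab has finished loading.

```shell
#!/usr/bin/env bash
# Sketch: wait until something accepts TCP connections on HOST:PORT.
# Uses bash's /dev/tcp redirection, so it needs bash (not sh/dash).
# wait_for_port is a hypothetical helper, not part of the stack.

# wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local waited=0
  # The subshell opens a connection on fd 3 and closes it when it exits.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example, in a second terminal after ./data-science-stack run-container:
#   wait_for_port localhost 8888 120 && xdg-open http://localhost:8888/
```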


6. Edit the script to mount your volume

To get the NVIDIA Data Science Stack to map your local notebooks directory into the Docker container, you'll need to edit the data-science-stack shell script.

Look for the docker run command towards the bottom of the file and add -v ~/Data/notebooks:/notebooks immediately after docker run, where ~/Data/notebooks is the directory in your home directory that contains your Jupyter notebooks and /notebooks is where it will appear inside the container.

docker run -v ~/Data/notebooks:/notebooks --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 ${ENVIRONMENT_NAME}:${STACK_VERSION}

Finally, run ./data-science-stack run-container again, and the container will start with your local notebooks folder mounted at /notebooks inside it.
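If you'd rather not edit the file by hand, the change can be scripted. This bash sketch assumes that the unedited docker run line in data-science-stack contains the text "docker run --gpus all" (the line shown above, minus the -v option); that pattern is an assumption, so check the script first. It keeps a .bak backup, does nothing if the mount is already present, and creates ~/Data/notebooks beforehand, because if the host directory is missing, docker run -v creates it owned by root. add_notebook_mount is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: add the notebooks bind mount to the docker run line in data-science-stack.
# Assumption: the unedited line contains "docker run --gpus all".
# add_notebook_mount is a hypothetical helper, not part of the stack.

add_notebook_mount() {
  local script="${1:-./data-science-stack}"
  local mount='-v ~/Data/notebooks:/notebooks'   # literal ~, expanded when the script runs

  # Create the host directory first; otherwise Docker creates it owned by root.
  mkdir -p "$HOME/Data/notebooks"

  if grep -qF -- "$mount" "$script"; then
    echo "Mount already present in $script"
    return 0
  fi

  # Edit in place, keeping a backup copy as ${script}.bak.
  sed -i.bak "s|docker run --gpus all|docker run ${mount} --gpus all|" "$script"

  if ! grep -qF -- "$mount" "$script"; then
    echo "Pattern 'docker run --gpus all' not found in $script; edit it by hand." >&2
    return 1
  fi
}

# Example, from inside the data-science-stack directory:
#   add_notebook_mount ./data-science-stack && ./data-science-stack run-container
```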

Matt Clarke, Sunday, March 07, 2021

Matt Clarke is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
