Despite having used Linux for about 20 years, I still sometimes waste days trying to solve a seemingly simple problem. One such issue that seems to be hampering many data scientists at present is getting CUDA, cuDNN, Keras and TensorFlow up and running correctly on Ubuntu 20.04.
NVIDIA produce many fantastic things for the data science community, but unfortunately, getting them all to work together can sometimes be a challenge. It’s a shame that NVIDIA don’t produce a custom Ubuntu spin containing everything pre-configured so it just works.
I experienced a raft of issues trying to get everything to work together. When I eventually did get my setup working, the NVIDIA graphics card driver decided to send a signal to only one display from my two GeForce GTX 690 graphics cards (packing around 6000 CUDA cores between them). Switching to a newer GeForce RTX 2060 solved my problems, so perhaps much of my bad luck can be attributed to using older (but still very capable) graphics cards.
Here are the software versions that worked for me, plus a series of commands you can run in a Jupyter Notebook or your terminal to confirm that your machine is correctly configured.
To check the versions you have installed, run the commands below. The leading ! is only needed in a Jupyter Notebook cell; drop it if you are typing the commands directly into a terminal.
!python --version
Python 3.8.3
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
!nvidia-smi
Wed Aug 12 20:48:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    On   | 00000000:26:00.0  On |                  N/A |
|  0%   45C    P5    14W / 170W |    437MiB /  5931MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1600      G   /usr/lib/xorg/Xorg                 59MiB |
|    0   N/A  N/A      2175      G   /usr/lib/xorg/Xorg                145MiB |
|    0   N/A  N/A      2380      G   /usr/bin/gnome-shell              112MiB |
|    0   N/A  N/A      3594      G   ...AAAAAAAAA= --shared-files      104MiB |
+-----------------------------------------------------------------------------+
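One thing worth knowing about the output above: the "CUDA Version: 11.0" shown by nvidia-smi is the highest CUDA version the driver supports, not the toolkit you have installed, which is why it can differ from the 10.1 reported by nvcc. If you would rather pull the key driver details into Python than read the table by eye, here is a minimal sketch. It assumes nvidia-smi is on your PATH, and the helper names (parse_gpu_line and query_gpus) are my own, made up for illustration.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi via its --query-gpu option.
FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_line(line):
    """Turn one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(FIELDS, values))


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

On a machine without an NVIDIA driver, query_gpus() simply returns an empty list rather than raising an error, which makes it safe to drop into a notebook that may run elsewhere.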
!whereis cuda
cuda: /usr/local/cuda
!cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
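If you want the cuDNN version as a single string rather than three #define lines, the header can also be parsed from Python. This is a rough sketch rather than an official method: the function names are my own, and the list of paths is an assumption you should adjust to wherever cuDNN lives on your machine. The second file is included because cuDNN 8 and later moved these macros from cudnn.h to cudnn_version.h.

```python
import re
from pathlib import Path

# Candidate header locations (assumptions; adjust for your install).
# cuDNN 7 keeps the macros in cudnn.h, cuDNN 8+ in cudnn_version.h.
HEADER_PATHS = ["/usr/include/cudnn_version.h", "/usr/include/cudnn.h"]


def parse_cudnn_version(header_text):
    """Extract 'major.minor.patch' from cuDNN header text, or None if absent."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{macro}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def find_cudnn_version(paths=HEADER_PATHS):
    """Return the version from the first header that defines it, else None."""
    for path in paths:
        header = Path(path)
        if header.is_file():
            version = parse_cudnn_version(header.read_text(errors="ignore"))
            if version is not None:
                return version
    return None


print(find_cudnn_version())
```

With the header contents shown above, parse_cudnn_version would give 7.6.5.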
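The commands below confirm that TensorFlow and Keras are installed, that TensorFlow was built with CUDA support, and that it can see your GPU. The final one imports TensorFlow and prints its version number, which only works if it has been installed correctly.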
!pip list | grep tensorflow
tensorflow-estimator 2.3.0
tensorflow-gpu 2.3.0
!pip list | grep Keras
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
import tensorflow as tf
tf.test.is_built_with_cuda()
True
import tensorflow as tf
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
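Seeing the GPU listed is encouraging, but it doesn't prove that TensorFlow can actually run work on it. A small matrix multiplication pinned to the GPU makes a quick smoke test. The sketch below assumes TensorFlow 2.x running in its default eager mode; the function name gpu_smoke_test is hypothetical, and it returns None if TensorFlow isn't installed or no GPU is visible.

```python
def gpu_smoke_test(size=256):
    """Run a small matmul on the first GPU; return the device that held the result."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    if not tf.config.list_physical_devices("GPU"):
        return None  # TensorFlow cannot see a GPU
    with tf.device("/GPU:0"):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        product = tf.matmul(a, b)
    # In eager mode, .device reports where the result tensor was placed.
    return product.device


print(gpu_smoke_test())
```

On a working setup this should print a device string that ends in GPU:0. If it prints None, or raises an error about missing CUDA or cuDNN libraries, the version mismatch is still there.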
## Print TensorFlow version
import tensorflow as tf
print(tf.__version__)
2.3.0
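The version number alone doesn't tell you which CUDA and cuDNN releases your TensorFlow build expects. From TensorFlow 2.3 onwards, tf.sysconfig.get_build_info() reports this, so you can compare it with the nvcc and cudnn.h output from earlier. A minimal sketch, assuming the cuda_version and cudnn_version keys are present (they are read with .get() in case they are not, as may happen on CPU-only builds); the function name is my own.

```python
def tensorflow_build_versions():
    """Return the CUDA/cuDNN versions TensorFlow was built against, if known."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed
    # get_build_info() was added in TensorFlow 2.3; older releases lack it.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is None:
        return None
    info = get_build_info()
    return {
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


print(tensorflow_build_versions())
```

If the CUDA version reported here doesn't match the release shown by nvcc, that mismatch is a likely cause of TensorFlow failing to load its GPU libraries.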
Matt Clarke, Tuesday, March 02, 2021