Welcome to our in-depth guide on configuring NVIDIA GPUs with Docker on Ubuntu. This post is tailored for developers, data scientists, and IT professionals who are looking to leverage the power of NVIDIA's GPU acceleration within Docker containers.
Whether you're working on machine learning projects, scientific computations, or any GPU-intensive tasks, this guide will walk you through the process step-by-step.
This guide applies to any NVIDIA GPU with a supported Linux driver, such as the GTX, RTX and Tesla series. The Tesla series is the usual recommendation, since those cards have ECC memory and are more tailored to AI applications.
Typically, one physical GPU is used by one machine, whether that machine is physical or a VM with exclusive access to the card. NVIDIA provides tools that work around this limit by adding a layer between Docker and the NVIDIA driver, so that containers can share the host's GPU(s).
The older nvidia-docker2 package from NVIDIA lets multiple Docker containers use the underlying GPU(s) on the host, but it has been deprecated in favor of the newer NVIDIA Container Toolkit (the nvidia-container-toolkit package).
nvidia-container-toolkit is what you want, unless you are stuck on an older distro that cannot use it.
NVIDIA Container Toolkit official install guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
First, add the GPG key for the repo:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
Add the repo to sources.list.d:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' |
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
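To see what the sed step in that pipeline does, here is a small sketch that applies the same expression to a single sample line. The sample line is an assumption about what the upstream list file looks like, used only for illustration; the sed expression itself is copied from the command above.

```shell
# Illustrative sample of one entry from the upstream .list file (an assumption).
sample='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'

# Same sed expression as in the command above: it inserts a signed-by option
# so apt only trusts this repo when it is signed with the key we just stored.
rewritten=$(printf '%s\n' "$sample" |
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g')

printf '%s\n' "$rewritten"
```

The printed line should start with deb [signed-by=...] followed by the unchanged repo URL, which is what ends up in /etc/apt/sources.list.d/nvidia-container-toolkit.list.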
Update apt so it can see the required packages from the NVIDIA repo:
sudo apt update
Install the nvidia-container-toolkit package:
sudo apt-get install -y nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit-base
The following NEW packages will be installed:
  libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base
0 upgraded, 4 newly installed, 0 to remove and 690 not upgraded.
Need to get 4,194 kB of archives.
After this operation, 16.6 MB of additional disk space will be used.
Get:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 libnvidia-container1 1.14.3-1 [923 kB]
Get:2 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 libnvidia-container-tools 1.14.3-1 [19.3 kB]
Get:3 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 nvidia-container-toolkit-base 1.14.3-1 [2,336 kB]
Get:4 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 nvidia-container-toolkit 1.14.3-1 [917 kB]
Fetched 4,194 kB in 1s (3,150 kB/s)
Selecting previously unselected package libnvidia-container1:amd64.
(Reading database ... 467373 files and directories currently installed.)
Preparing to unpack .../libnvidia-container1_1.14.3-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.14.3-1) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../libnvidia-container-tools_1.14.3-1_amd64.deb ...
Unpacking libnvidia-container-tools (1.14.3-1) ...
Selecting previously unselected package nvidia-container-toolkit-base.
Preparing to unpack .../nvidia-container-toolkit-base_1.14.3-1_amd64.deb ...
Unpacking nvidia-container-toolkit-base (1.14.3-1) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../nvidia-container-toolkit_1.14.3-1_amd64.deb ...
Unpacking nvidia-container-toolkit (1.14.3-1) ...
Setting up nvidia-container-toolkit-base (1.14.3-1) ...
Setting up libnvidia-container1:amd64 (1.14.3-1) ...
Setting up libnvidia-container-tools (1.14.3-1) ...
Setting up nvidia-container-toolkit (1.14.3-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Configure Docker to use the NVIDIA runtime:
sudo nvidia-ctk runtime configure --runtime=docker
INFO[0000] Loading config from /etc/docker/daemon.json
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
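The last log line recommends restarting the Docker daemon so that it picks up the updated /etc/docker/daemon.json. Below is a minimal sketch of how you might confirm the change before restarting. The helper name has_nvidia_runtime is hypothetical, and the check is a simple text match for an "nvidia" key rather than a full JSON parse.

```shell
# Hypothetical helper: succeeds if the given daemon.json (default:
# /etc/docker/daemon.json) contains an "nvidia" key, which is what we
# expect to find after running nvidia-ctk runtime configure.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:' "${1:-/etc/docker/daemon.json}" 2>/dev/null
}

# Usage on a systemd-based host (an assumption):
#   has_nvidia_runtime && sudo systemctl restart docker
#   docker info | grep -i runtimes
```

After the restart, the Runtimes line printed by docker info should list nvidia alongside the default runtime.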
Finally, test that a container can see the GPU by running nvidia-smi inside one:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.154                Driver Version: 390.154                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:04:00.0 N/A |                  N/A |
| N/A   41C    P0    N/A /  N/A |      0MiB / 16160MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+