Can I run a Docker container with CUDA 10 when the host has CUDA 9?

Question

I'm deploying an application in a Docker container that requires CUDA 10. This is necessary to run some of the underlying PyTorch functionality that the application uses.

However, the host server is running Docker CE 17 and nvidia-docker v1.0 with CUDA version 9, and I will not be able to upgrade the host.

I’m under the impression that I’m handcuffed to the v1 nvidia docker runtime and CUDA version available on the host.

Is there a way to run CUDA 10 in the container so I can leverage the functionality of this toolkit?

Recommended answer

In the general case, any specific CUDA version requires a minimum GPU driver version. This is covered in NVIDIA's CUDA documentation, which tabulates the minimum driver for each toolkit release (Table 1 in the page originally linked). So to use CUDA 9.0 you need at least a GPU driver version that supports CUDA 9.0, such as an R384 driver. To use CUDA 10.0 you need at least a GPU driver version that supports CUDA 10.0, such as an R410 driver.
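
As a rough illustration of this rule, the Python sketch below encodes a minimum driver branch per CUDA version and checks a driver version against it. The 9.0 and 10.0 entries come from the answer text; the 10.1 entry (R418) and the names MIN_DRIVER_BRANCH and driver_supports are assumptions added for illustration, so NVIDIA's own table should be treated as the authority.

```python
# Minimum driver branch per CUDA toolkit version.
# 9.0 -> R384 and 10.0 -> R410 are taken from the answer text;
# 10.1 -> R418 is an added assumption, to be verified against NVIDIA's table.
MIN_DRIVER_BRANCH = {
    "9.0": 384,
    "10.0": 410,
    "10.1": 418,
}


def driver_supports(cuda_version: str, driver_version: str) -> bool:
    """Return True if the driver's branch meets the minimum for cuda_version.

    Only the branch (the part before the first dot) is compared. This is a
    simplification: the real minimum also has a minor component.
    """
    required = MIN_DRIVER_BRANCH[cuda_version]  # KeyError for unlisted versions
    branch = int(driver_version.split(".")[0])
    return branch >= required


# A CUDA 9-era driver cannot serve a CUDA 10.0 container...
print(driver_supports("10.0", "384.130"))
# ...while an R410 driver can.
print(driver_supports("10.0", "410.48"))
```

The hypothetical driver strings "384.130" and "410.48" are only sample inputs; any version string of the form branch.minor works.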

The usage of containers doesn't fundamentally change this. If you want to use a container that has CUDA 10 code in it, your base machine needs a driver that supports CUDA 10.
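
Because the container relies on the driver installed on the base machine, a first practical step is to find out which driver that machine actually has. The sketch below is one possible way to do that from Python by calling nvidia-smi; the function names host_driver_version and driver_branch are illustrative, and the code assumes nvidia-smi is on the PATH (it returns None otherwise).

```python
import shutil
import subprocess
from typing import Optional


def host_driver_version() -> Optional[str]:
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tooling visible in this environment
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    # One line per GPU; all GPUs share the same driver, so take the first.
    return lines[0].strip() if lines else None


def driver_branch(version: str) -> int:
    """Extract the branch number, e.g. '410.48' -> 410."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    version = host_driver_version()
    if version is None:
        print("No NVIDIA driver detected via nvidia-smi")
    else:
        print(f"Driver {version} (branch R{driver_branch(version)})")
```

Run on the host, or inside a GPU-enabled container, this reports the same driver, since the container does not bring its own.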

NVIDIA did start publishing compatibility libraries that allow modifications to the above statements. These compatibility libraries are available but not installed by default with a CUDA toolkit install. They only work in certain cases, and they have certain requirements to be usable. They are documented in NVIDIA's CUDA compatibility documentation.

One of the specific requirements for use of these compatibility libraries is that the GPU(s) in use must be Tesla-brand GPUs. GeForce, Quadro, Jetson, and Titan family GPUs are not supported by these compatibility libraries.

Furthermore, the libraries only work with certain combinations of CUDA toolkit version and GPU driver version installed on the base machine. This "compatibility matrix" is documented in Table 3 of NVIDIA's compatibility documentation. Only the specific combinations of CUDA toolkit version and installed driver version listed there are usable for compatibility. To pick one example, if you wish to use CUDA 10.0, and your base machine has a Tesla GPU with an R396 driver installed, there is no compatibility support. In the same setup, however, if you wish to use CUDA 10.1, there is compatibility support for that.
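
The two requirements so far (Tesla-brand GPU, and a listed toolkit/driver combination) can be sketched as a small lookup. This is only an illustration: COMPAT_MATRIX and compat_path_available are hypothetical names, and the matrix contains just the two combinations mentioned in this answer, not NVIDIA's full table.

```python
from typing import Optional

# Partial, illustrative matrix: (toolkit version, host driver branch) -> usable?
# Only the two combinations named in the answer are filled in; the real
# matrix lives in NVIDIA's documentation.
COMPAT_MATRIX = {
    ("10.0", 396): False,
    ("10.1", 396): True,
}


def compat_path_available(
    gpu_family: str, toolkit: str, driver_branch: int
) -> Optional[bool]:
    """Return True/False when the answer is known, or None when the
    combination is not in this partial matrix."""
    if gpu_family.strip().lower() != "tesla":
        # GeForce, Quadro, Jetson and Titan are excluded per the answer.
        return False
    return COMPAT_MATRIX.get((toolkit, driver_branch))


print(compat_path_available("Tesla", "10.1", 396))   # listed as supported
print(compat_path_available("Tesla", "10.0", 396))   # listed as unsupported
print(compat_path_available("Quadro", "10.1", 396))  # wrong GPU family
```

Returning None for unlisted combinations is a deliberate choice here: it keeps "not supported" separate from "not recorded in this sketch".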

If you have satisfied the requirements for compatibility usage, then the remaining step would be to install the compatibility libraries (or build your container from a base container that has the compatibility libraries already installed).

For a package manager CUDA install method, the method to install the compatibility libraries is simple (example on Ubuntu, installing the CUDA 10.1 compatibility to match CUDA 10.1 toolkit install):

sudo apt-get install cuda-compat-10.1

Make sure to match the version to the CUDA toolkit version that you are using (that you installed with the package manager method, or that was already installed in your container).

This compatibility "path" only began in the CUDA 9.0 timeframe. Systems that are equipped with drivers that predate CUDA 9.0 will not be usable in any way for this compatibility path. There are also various functional limitations and restrictions, which are covered in the documentation.

When this "compatibility path" is correctly installed and in use, the overall system configuration can "appear" to be violating the rules indicated at the top of this answer. For example, a CUDA 10.1 application could be running on a machine that has only an R396 driver installed.

For the specific question in view here, OP eventually indicated that the base machine had a Quadro GPU, so this "compatibility path" does not apply. The only way to run, for example, a CUDA 10.0 container would be to install a CUDA 10.0-capable driver on the base machine, i.e. an R410 or later driver.
