This article describes how I got CUDA working in the gitlab-ci docker executor; hopefully it is a useful reference for anyone solving the same problem.

Problem description

We are using gitlab continuous integration to build and test our projects. Recently, one of the projects added the requirement for CUDA to enable GPU acceleration. I do not want to change our pipeline (docker and gitlab-ci are working well for us), so I'd like to somehow give docker the ability to talk to an nvidia GPU.

Additional details:


  • Installing an nvidia GPU on our build servers is fine - we have some spare GPUs lying around to use for that purpose
  • We are not using ubuntu or centOS, so we cannot use nvidia's cuda containers directly
  • You can't supply the --runtime parameter to gitlab CI, so you can't use nvidia's suggested docker invocation. [edit: actually, you can now. See https://gitlab.com/gitlab-org/gitlab-runner/merge_requests/764 ]

Recommended answer

There are multiple steps:


  1. Install the nvidia driver on the host PC
  2. Install nvidia-docker2
  3. Build a docker image with CUDA
  4. Get it working in gitlab CI

Note that if you only want to compile CUDA code and don't need to run it, you don't need nvidia-docker2 or the nvidia driver on the host PC, and there are no special steps for getting it working in gitlab CI (i.e. you only have to do step 3).
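To make the compile-only case concrete, a Dockerfile along the following lines is all that step 3 requires; the image tag and file names here are illustrative assumptions, not taken from my setup:

```dockerfile
# Hypothetical compile-only image: nvcc does not need a GPU to compile,
# so neither the host driver nor nvidia-docker2 is required for this.
FROM nvidia/cuda:9.2-devel-ubuntu16.04

WORKDIR /src
COPY kernel.cu .

# Compilation happens at image build time; actually *running* the
# resulting binary would still need a GPU and the nvidia runtime.
RUN nvcc kernel.cu -o kernel
```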

I'm afraid I'm not too familiar with docker, so if I've mixed up container and image I apologize. If someone with more knowledge wants to fix any typos about docker, it would be greatly appreciated.

You have two options here. You can use your host OS's recommended procedure; this is easy, but means that the environment may differ across build servers. The other option is to download the installer directly from nVidia (i.e. https://www.nvidia.com/object/unix.html ) so that you can distribute it with your docker container.

My current test PC is archlinux, so this was a case of installing it from the AUR. nVidia provides repositories for several OSes, so see the quickstart guide on the nvidia-docker github page.

You should test your nvidia-docker installation as per the quickstart guide. Running the following command from your host PC:

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

should run and output something like:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.18       Driver Version: 415.18       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:02:00.0  On |                  N/A |
| 28%   39C    P0    24W / 120W |    350MiB /  6071MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Notice that although I've specified the 9.0-base image, nvidia-smi reports Cuda 10. I think this is because Cuda 10 is installed on the host PC. The nvidia-docker documentation says that it will use cuda from the docker image, so this shouldn't be a problem.

You should use the Nvidia dockerhub docker images directly unless you have a good reason not to. In my case, I wanted to use a docker image based on Debian, but Nvidia only provides images for Ubuntu and CentOS. Fortunately, Nvidia posts the dockerfiles for their images, so you can copy the relevant parts from them. I based mine on https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile

The magic part of the dockerfile included:

# Install cuda manually
RUN wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux

COPY install_cuda.exp install_cuda.exp
RUN mv cuda_* cuda_install_bin && \
    chmod +x cuda_install_bin && \
    expect install_cuda.exp && \
    rm cuda_*

# Magic copied from nvidia's cuda9.2 dockerfile at
# https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/9.2/base/Dockerfile
ENV CUDA_VERSION 9.2.148


LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"

RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.2"

The expect command lets you write a script to automatically accept the license agreement and answer the installer's other questions. Publishing my install_cuda.exp file is probably not a good idea (since I can't accept the agreement on your behalf), but in my case I accepted the eula, agreed to install on an unsupported OS, chose not to install the graphics driver, chose to install cuda, used the default path, installed the symlink to /usr/local/cuda, and did not install the samples. For more information about expect, see its man page.
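As a sketch only, an install_cuda.exp answering those questions might look like the following. The exact prompt strings vary between CUDA installer versions, so every expect pattern below is an assumption you would need to verify by running the installer manually once:

```tcl
#!/usr/bin/expect -f
# Hypothetical install_cuda.exp: answers the interactive prompts of the
# CUDA 9.2 runfile installer. All prompt texts below are assumptions.
# Note: some installer versions page the EULA through more(1) first; if
# so, you must send spaces (or "q") before the accept prompt appears.
set timeout -1
spawn ./cuda_install_bin

expect "Do you accept the previously read EULA?"
send "accept\r"
expect "*unsupported configuration*continue*"   ;# install on unsupported OS
send "y\r"
expect "Install NVIDIA Accelerated Graphics Driver*"
send "n\r"                                      ;# no graphics driver
expect "Install the CUDA 9.2 Toolkit?"
send "y\r"
expect "Enter Toolkit Location*"
send "\r"                                       ;# accept the default path
expect "*symbolic link at /usr/local/cuda?"
send "y\r"
expect "Install the CUDA 9.2 Samples?"
send "n\r"
expect eof
```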

When researching around the web, most sources tell you that you can't supply the --runtime flag from the gitlab runner configuration. Actually, according to this merge request, you can. To do so, you have to edit /etc/gitlab-runner/config.toml and add runtime = "nvidia" in the right place. For example, my runner configuration looks like:

[[runners]]
  name = "docker-runner-test"
  url = "<<REDACTED>>"
  token = "<<REDACTED>>"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "build_machine"
    privileged = false
    disable_cache = false
    runtime = "nvidia"
    volumes = ["/cache"]
    pull_policy = "never"
    shm_size = 0
  [runners.cache]
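With the runner configured this way, jobs get the nvidia runtime automatically; a minimal sanity-check job in .gitlab-ci.yml might look like the following (the job name and image are just examples, and the image must satisfy the runner's pull policy):

```yaml
# Hypothetical sanity-check job: fails if the container cannot see the GPU.
test-gpu:
  image: nvidia/cuda:9.2-base
  script:
    - nvidia-smi
```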

