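I have a Docker image that I want to deploy in Kubernetes. The image is based on nvidia/cuda:10.0-base. One of the entrypoint commands is rm -r /usr (yes, this command is what triggers the problem, but it is required).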

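When I run the container with Docker, it works fine, and I am sure the entrypoint executes correctly and completely. However, when I try to deploy the image on my k8s cluster, the container crashes with the following errors: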

rm: cannot remove '/usr/bin/nvidia-smi': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-persistenced': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-cuda-mps-server': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-cuda-mps-control': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-debugdump': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.410.104': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.410.104': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libcuda.so.410.104': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libcuda.so.430.26': Device or resource busy


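I successfully deployed the container with a different entrypoint and opened a shell in it using kubectl exec -it. When I try to delete, for example, /usr/bin/nvidia-smi, I get the same Device or resource busy error.

Neither lsof nor top shows a process that uses /usr/bin/nvidia-smi or any of the other files listed above.

top output: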

      1 root      20   0    4636    848    768 S   0.0  0.0   0:00.05 sh
     19 root      20   0   72304   5860   5096 S   0.0  0.0   0:00.00 sshd
     25 root      20   0   21540   4056   3456 S   0.0  0.0   0:00.09 bash
    447 root      20   0   39512   3740   3196 R   0.0  0.0   0:00.00 top


How does k8s affect the way the container works?

Best answer:

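Kubernetes had mounted the problematic files, the ones reported as busy, into the container. A path that is a mount point cannot be unlinked, so rm fails with Device or resource busy even though no process has the file open, which is why lsof and top show nothing.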
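One way to check this explanation from inside the container is to look for mount points under /usr in the process's mount table. The following is a minimal sketch, assuming a Linux container with /proc mounted; the function name mount_points_under is hypothetical and not part of any library, and the parser relies only on the documented field order of /proc/<pid>/mountinfo, where the fifth field is the mount point.

```python
def mount_points_under(prefix, mountinfo_text):
    """Return the mount points at or below `prefix`.

    `mountinfo_text` is the content of a /proc/<pid>/mountinfo file.
    (Hypothetical helper written for illustration.)
    """
    # Normalise "/usr/" to "/usr", but keep "/" itself intact.
    prefix = prefix.rstrip("/") or "/"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mount_point = fields[4]  # fifth field is the mount point
        if (
            prefix == "/"
            or mount_point == prefix
            or mount_point.startswith(prefix + "/")
        ):
            found.append(mount_point)
    return found


# Usage inside the container (assumes /proc is available):
#   with open("/proc/self/mountinfo") as f:
#       for path in mount_points_under("/usr", f.read()):
#           print(path)
```

If the paths from the rm error messages show up in that list, they are mount points rather than ordinary files in the image, which would be consistent with the answer above.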

Source: the Stack Overflow question "linux - Kubernetes implicitly uses Nvidia files in container": https://stackoverflow.com/questions/58949861/
