I need some advice on an issue with Kubernetes 1.14 and GitLab pipelines running on it. Many jobs are throwing exit code 137 errors, which I found means the container was terminated abruptly.

Cluster information:

Kubernetes version: 1.14
Cloud being used: AWS EKS
Node: c5.4xlarge

After digging in, I found the following logs:

**kubelet: I0114 03:37:08.639450**  4721 image_gc_manager.go:300] [imageGCManager]: Disk usage on image filesystem is at 95% which is over the high threshold (85%). Trying to free 3022784921 bytes down to the low threshold (80%).

**kubelet: E0114 03:37:08.653132**  4721 kubelet.go:1282] Image garbage collection failed once. Stats initialization may not have completed yet: failed to garbage collect required amount of images. Wanted to free 3022784921 bytes, but freed 0 bytes

**kubelet: W0114 03:37:23.240990**  4721 eviction_manager.go:397] eviction manager: timed out waiting for pods runner-u4zrz1by-project-12123209-concurrent-4zz892_gitlab-managed-apps(d9331870-367e-11ea-b638-0673fa95f662) to be cleaned up

**kubelet: W0114 00:15:51.106881**   4781 eviction_manager.go:333] eviction manager: attempting to reclaim ephemeral-storage

**kubelet: I0114 00:15:51.106907**   4781 container_gc.go:85] attempting to delete unused containers

**kubelet: I0114 00:15:51.116286**   4781 image_gc_manager.go:317] attempting to delete unused images

**kubelet: I0114 00:15:51.130499**   4781 eviction_manager.go:344] eviction manager: must evict pod(s) to reclaim ephemeral-storage

**kubelet: I0114 00:15:51.130648**   4781 eviction_manager.go:362] eviction manager: pods ranked for eviction:

 1. runner-u4zrz1by-project-10310692-concurrent-1mqrmt_gitlab-managed-apps(d16238f0-3661-11ea-b638-0673fa95f662)
 2. runner-u4zrz1by-project-10310692-concurrent-0hnnlm_gitlab-managed-apps(d1017c51-3661-11ea-b638-0673fa95f662)

 3. runner-u4zrz1by-project-13074486-concurrent-0dlcxb_gitlab-managed-apps(63d78af9-3662-11ea-b638-0673fa95f662)

 4. prometheus-deployment-66885d86f-6j9vt_prometheus(da2788bb-3651-11ea-b638-0673fa95f662)

 5. nginx-ingress-controller-7dcc95dfbf-ld67q_ingress-nginx(6bf8d8e0-35ca-11ea-b638-0673fa95f662)

The pods are then terminated, resulting in the exit code 137s.
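The eviction above is driven by the kubelet's disk-pressure handling, and the thresholds in the logs (high 85%, low 80%) are its defaults. The lasting fix is usually larger node disks or per-job `ephemeral-storage` limits, but for reference, a sketch of the kubelet settings the log refers to, assuming the standard `KubeletConfiguration` file format:

```yaml
# Sketch of the kubelet knobs behind the logged messages (values shown are
# the defaults; adjust for your environment).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85   # image GC starts above this disk usage
imageGCLowThresholdPercent: 80    # image GC tries to free down to this level
evictionHard:
  nodefs.available: "10%"         # evict pods when node disk drops below this
  imagefs.available: "15%"
```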

Can anyone help me understand the cause and a possible solution to overcome this?

Thank you :)

Best answer

Exit code 137 does not necessarily mean OOMKilled. It indicates that the container received a SIGKILL, whether from some interrupt or from the kernel's "oom-killer" (out-of-memory killer).
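The value itself is just 128 plus the signal number, and SIGKILL is signal 9, so 128 + 9 = 137. You can reproduce it locally with plain POSIX shell, no cluster needed:

```shell
# A process killed with SIGKILL exits with status 128 + 9 = 137.
# Simulate it by having a subshell SIGKILL itself:
sh -c 'kill -9 $$'
echo $?   # prints 137
```

Any SIGKILL produces this status, which is why 137 alone cannot tell you whether the OOM killer, the kubelet's eviction manager, or something else delivered the signal.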

If the pod was OOMKilled, you will see the following lines when describing the pod:

      State:        Terminated
      Reason:       OOMKilled

I was seeing a similar error and could not figure out the root cause, since the reason reported was simply: Error

A similar question about linux - Kubernetes pod termination with exit code 137 can be found on Stack Overflow: https://stackoverflow.com/questions/59729917/
