Kubernetes Pod fails with "Back-off restarting failed container"
Problem description
I am trying to set up Prometheus, and I am deploying the YAML files below, but the Pod fails with "Back-off restarting failed container".
Full describe output:
Name:           prometheus-75dd748df4-wrwlr
Namespace:      monitoring
Priority:       0
Node:           kbs-vm-02/172.16.1.8
Start Time:     Tue, 28 Apr 2020 06:13:22 +0000
Labels:         app=prometheus
                pod-template-hash=75dd748df4
Annotations:    <none>
Status:         Running
IP:             10.44.0.7
IPs:
  IP:           10.44.0.7
Controlled By:  ReplicaSet/prometheus-75dd748df4
Containers:
  prom:
    Container ID:  docker://50fb273836c5522bbbe01d8db36e18688e0f673bc54066f364290f0f6854a74f
    Image:         quay.io/prometheus/prometheus:v2.4.3
    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:8e0e85af45fc2bcc18bd7221b8c92fe4bb180f6bd5e30aa2b226f988029c2085
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/prometheus-cfg/prometheus.yml
      --storage.tsdb.path=/data
      --storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 Apr 2020 06:14:08 +0000
      Finished:     Tue, 28 Apr 2020 06:14:08 +0000
    Ready:          False
    Restart Count:  3
    Limits:
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  500Mi
    Environment Variables from:
      prometheus-config-flags  ConfigMap  Optional: false
    Environment:               <none>
    Mounts:
      /data from storage (rw)
      /prometheus-cfg from config-file (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-bt7dw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-file:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config-file
    Optional:  false
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-storage-claim
    ReadOnly:   false
  prometheus-token-bt7dw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-bt7dw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  76s (x3 over 78s)  default-scheduler   running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         73s                default-scheduler   Successfully assigned monitoring/prometheus-75dd748df4-wrwlr to kbs-vm-02
  Normal   Pulled            28s (x4 over 72s)  kubelet, kbs-vm-02  Container image "quay.io/prometheus/prometheus:v2.4.3" already present on machine
  Normal   Created           28s (x4 over 72s)  kubelet, kbs-vm-02  Created container prom
  Normal   Started           27s (x4 over 71s)  kubelet, kbs-vm-02  Started container prom
  Warning  BackOff           13s (x6 over 69s)  kubelet, kbs-vm-02  Back-off restarting failed container
Deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        fsGroup: 1000
      serviceAccountName: prometheus
      containers:
      - image: quay.io/prometheus/prometheus:v2.4.3
        name: prom
        args:
        - '--config.file=/prometheus-cfg/prometheus.yml'
        - '--storage.tsdb.path=/data'
        - '--storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)'
        envFrom:
        - configMapRef:
            name: prometheus-config-flags
        ports:
        - containerPort: 9090
          name: prom-port
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
        volumeMounts:
        - name: config-file
          mountPath: /prometheus-cfg
        - name: storage
          mountPath: /data
      volumes:
      - name: config-file
        configMap:
          name: prometheus-config-file
      - name: storage
        persistentVolumeClaim:
          claimName: prometheus-storage-claim
PV YAML:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  capacity:
    storage: 12Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
PVC YAML:
[vidya@KBS-VM-01 7-1_prometheus]$ cat prometheus/prom-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
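Do you know what the problem is and how to fix it? Please let me know if I need to share any more files.
My guess is that something is wrong with the storage configuration, judging by this entry in the event log:
Warning FailedScheduling 76s (x3 over 78s) default-scheduler running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
I am using local storage.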
[vidya@KBS-VM-01 7-1_prometheus]$ kubectl describe pvc prometheus-storage-claim -n monitoring
Name:          prometheus-storage-claim
Namespace:     monitoring
StorageClass:
Status:        Bound
Volume:        prometheus-storage
Labels:        app=prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      12Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    prometheus-75dd748df4-wrwlr
Events:
  Type    Reason         Age   From                         Message
  ----    ------         ----  ----                         -------
  Normal  FailedBinding  37m   persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
[vidya@KBS-VM-01 7-1_prometheus]$ kubectl logs prometheus-75dd748df4-zlncv -n monitoring
level=info ts=2020-04-28T07:49:07.885529914Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2020-04-28T07:49:07.885635014Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2020-04-28T07:49:07.885812014Z caller=main.go:240 host_details="(Linux 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 prometheus-75dd748df4-zlncv (none))"
level=info ts=2020-04-28T07:49:07.885833214Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-28T07:49:07.885849614Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-04-28T07:49:07.888695413Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2020-04-28T07:49:07.889017612Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2020-04-28T07:49:07.889033512Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2020-04-28T07:49:07.889041112Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2020-04-28T07:49:07.889048812Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889071612Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889083112Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2020-04-28T07:49:07.889098012Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-04-28T07:49:07.889109912Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-04-28T07:49:07.889124912Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2020-04-28T07:49:07.889137812Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2020-04-28T07:49:07.889169012Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2020-04-28T07:49:07.889653412Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"
Recommended answer
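The problem here is that the PVC is not bound to the PV, mainly because there is no storage class linking the PV with the PVC, and because the capacity of the PV (12Gi) and the request in the PVC (10Gi) do not match. In the end, Kubernetes could not work out which PV the PVC should be bound to. Note that the kubectl describe pvc output in the question shows the claim as Bound with a capacity of 12Gi, so the binding did eventually succeed; the container crash itself comes from the permission error covered in the update at the end of this answer.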
- Add storageClassName: manual to the spec of both the PV and the PVC.
- Make the capacity in the PV the same as the request in the PVC, i.e. 10Gi.
PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
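If the aim is to leave no doubt about which PV the claim should bind to, the claim can also name the volume explicitly. The following is only a sketch of that variant, not part of the original answer: it assumes the PV is still called prometheus-storage and uses the PVC's spec.volumeName field to reserve it.

```yaml
# Sketch: PVC pre-bound to a specific PV via spec.volumeName.
# Assumption: a PV named "prometheus-storage" with storageClassName "manual" exists.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  volumeName: prometheus-storage   # name of the PV this claim should bind to
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Most of a PVC's spec cannot be changed after creation, so trying this would likely mean deleting and recreating the existing claim.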
Update:
Running the Pod as root by adding runAsUser: 0 should resolve the open /data/lock: permission denied error.
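The update does not say where the field goes. A minimal sketch, assuming it is added to the pod-level securityContext that the Deployment in the question already defines (setting it on the prom container's own securityContext would be another option):

```yaml
# Fragment of the Deployment only, not a complete manifest.
# Assumption: runAsUser is placed next to the existing fsGroup setting.
spec:
  template:
    spec:
      securityContext:
        runAsUser: 0    # run the container processes as root so /data/lock can be opened
        fsGroup: 1000   # unchanged from the original Deployment
      serviceAccountName: prometheus
```

Running as root is a broad fix. Since the volume is a hostPath, another possibility would be to change the ownership or permissions of /data on the node so that the user the image normally runs as can write to it.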