Problem Description
I'm having difficulty understanding which approach would be best for my situation and how to actually implement it.
In a nutshell, the problem is this:
- I'm spinning up a DB (Postgres), BE (Django), and FE (React) deployment with Skaffold
- The BE comes up before the DB about 50% of the time
- The first thing Django does is connect to the DB
- It only tries once (by design, and that can't be changed); if it can't connect, it fails and the app is broken
- Thus, I need to make sure that every single time I spin up my deployments, the DB deployment is running before the BE deployment starts
I came across readiness, liveness, and startup probes. I've read through them a couple of times, and readiness probes sound like what I need: I don't want the BE deployment to start until the DB deployment is ready to accept connections.
I guess I'm not understanding how to set them up. This is what I've tried, but I still run into instances where one is loaded before the other.
postgres.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: postgres
  template:
    metadata:
      labels:
        component: postgres
    spec:
      containers:
        - name: postgres
          image: testappcontainers.azurecr.io/postgres
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGDATABASE
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGUSER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGPASSWORD
            - name: POSTGRES_INITDB_ARGS
              value: "-A md5"
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
              subPath: postgres
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: postgres-storage
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: postgres
  ports:
    - port: 1423
      targetPort: 5432
api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      component: api
  template:
    metadata:
      labels:
        component: api
    spec:
      containers:
        - name: api
          image: testappcontainers.azurecr.io/testapp-api
          ports:
            - containerPort: 5000
          env:
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGUSER
            - name: PGHOST
              value: postgres-cluster-ip-service
            - name: PGPORT
              value: "1423"
            - name: PGDATABASE
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGDATABASE
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: PGPASSWORD
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: SECRET_KEY
            - name: DEBUG
              valueFrom:
                secretKeyRef:
                  name: testapp-secrets
                  key: DEBUG
          readinessProbe:
            httpGet:
              host: postgres-cluster-ip-service
              port: 1423
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 2
---
apiVersion: v1
kind: Service
metadata:
  name: api-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: api
  ports:
    - port: 5000
      targetPort: 5000
client.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      component: client
  template:
    metadata:
      labels:
        component: client
    spec:
      containers:
        - name: client
          image: testappcontainers.azurecr.io/testapp-client
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: api-cluster-ip-service
              port: 5000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 2
---
apiVersion: v1
kind: Service
metadata:
  name: client-cluster-ip-service
spec:
  type: ClusterIP
  selector:
    component: client
  ports:
    - port: 3000
      targetPort: 3000
I don't think the ingress.yaml and skaffold.yaml will be helpful, but let me know if I should add those.
So what am I doing wrong?
So I've tried out a few things based on David Maze's response. This helped me better understand what is going on, but I'm still running into issues I don't quite understand how to resolve.
The first problem is that even with the default restartPolicy: Always, the Pods themselves don't fail when Django does. The Pods think they are perfectly healthy even though Django has failed.
The second problem is that the Pods apparently need to be made aware of Django's status. That's the part I can't quite wrap my brain around: should the probes be checking the status of other deployments, or of the Pods themselves?
Yesterday my thinking was the former, but today I'm thinking it's the latter: the Pod needs to know that the program it contains has failed. However, everything I've tried just results in a failed probe, connection refused, etc.:
# referring to itself
host: /health
port: 5000

host: /healthz
port: 5000

host: /api
port: 5000

host: /
port: 5000

host: /api-cluster-ip-service
port: 5000

host: /api-deployment
port: 5000

# referring to the DB deployment
host: /health
port: 1423 # or 5432

host: /healthz
port: 1423 # or 5432

host: /api
port: 1423 # or 5432

host: /
port: 1423 # or 5432

host: /postgres-cluster-ip-service
port: 1423 # or 5432

host: /postgres-deployment
port: 1423 # or 5432
So apparently I'm setting up the probes wrong, despite it being a "super easy" implementation (as a few blogs have described it). For example, the /health and /healthz routes: are these built into Kubernetes, or do they need to be set up? Rereading the docs to hopefully clarify this.
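From what I can tell so far, part of my mistake is in the fields themselves: host is an optional hostname that defaults to the Pod's own IP (so above I was putting URL paths where a hostname belongs), and path is where the route goes. If that's right, a correctly shaped httpGet probe would look something like the sketch below, where the /health route is hypothetical and would have to actually be served by my Django app:

readinessProbe:
  httpGet:
    path: /health # hypothetical route; Django has to serve it, Kubernetes doesn't provide it
    port: 5000    # the container's own port; host is omitted, so it defaults to the Pod IP
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2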
Recommended Answer
Actually, I think I might have sorted it out.
Part of the problem is that even though restartPolicy: Always is the default, the Pods are not aware that Django has failed, so they think they are healthy.
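As I currently understand it, restartPolicy sits at the Pod spec level and only takes effect once the kubelet actually sees a failure, i.e. the container process exits or a liveness probe fails. A sketch of where it lives (writing out Always is redundant, since it's the default):

spec:
  restartPolicy: Always # the default; only triggers once a container exits or a liveness probe fails
  containers:
    - name: api
      image: testappcontainers.azurecr.io/testapp-api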
My thinking was wrong in that I originally assumed I needed to refer to the DB deployment to see if it had started before starting the API deployment. Instead, I needed to check whether Django had failed, and redeploy it if it had.
Doing the following accomplished this for me:
livenessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
readinessProbe:
  tcpSocket:
    port: 5000
  initialDelaySeconds: 2
  periodSeconds: 2
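For context, these go alongside image and ports under the api container in api.yaml, roughly like this (trimmed to the relevant fields):

containers:
  - name: api
    image: testappcontainers.azurecr.io/testapp-api
    ports:
      - containerPort: 5000
    livenessProbe:
      tcpSocket:
        port: 5000 # fails if Django has crashed, so the Pod gets restarted
      initialDelaySeconds: 2
      periodSeconds: 2
    readinessProbe:
      tcpSocket:
        port: 5000 # keeps the Pod out of the Service until Django is listening
      initialDelaySeconds: 2
      periodSeconds: 2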
I'm still learning Kubernetes, so please correct me if there is a better way to do this, or if this is just plain wrong. I just know it accomplishes what I want.
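For completeness, another pattern I've seen suggested for the original ordering problem is an initContainer that blocks the API Pod until Postgres accepts connections. A rough sketch, untested on my end; it assumes an image that ships pg_isready and reuses my existing Service name and port:

initContainers:
  - name: wait-for-db
    image: postgres:13 # assumed tag; any image with pg_isready works
    # loop until Postgres accepts connections through the Service
    command: ["sh", "-c", "until pg_isready -h postgres-cluster-ip-service -p 1423; do sleep 2; done"]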