Question
I have an EKS cluster to which I've added support for working in hybrid mode (in other words, I've added a Fargate profile to it). My intention is to run only specific workloads on AWS Fargate while keeping the EKS worker nodes for other kinds of workloads.
To test this out, my Fargate profile is defined to be:
- limited to a specific namespace (let's say: mynamespace)
- carrying a specific label, so a Pod needs to match it in order to be scheduled on Fargate (the label is: fargate: myvalue)
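For reference, a profile with that namespace/label selector can be declared in an eksctl ClusterConfig. This is a minimal sketch under assumptions: the cluster name and region below are placeholders (only the profile name, namespace, and label come from the question):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster   # placeholder cluster name
  region: eu-west-1  # placeholder region
fargateProfiles:
  - name: myprofile
    selectors:
      # A Pod is matched (and scheduled on Fargate) only if it is in this
      # namespace AND carries all of these labels
      - namespace: mynamespace
        labels:
          fargate: myvalue
```

With this selector, a Pod in any other namespace, or one missing the fargate: myvalue label, falls through to the regular worker nodes.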
For testing k8s resources, I'm trying to deploy a simple nginx Deployment, which looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: mynamespace
  labels:
    fargate: myvalue
spec:
  selector:
    matchLabels:
      app: nginx
      version: 1.7.9
      fargate: myvalue
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
        version: 1.7.9
        fargate: myvalue
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
When I try to apply this resource, I get the following:
$ kubectl get pods -n mynamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-596c594988-x9s6n 0/1 Pending 0 10m <none> <none> 07c651ad2b-7cf85d41b2424e529247def8bda7bf38 <none>
The Pod stays in the Pending state and is never scheduled onto an AWS Fargate instance.
This is the pod describe output:
$ kubectl describe pod nginx-deployment-596c594988-x9s6n -n mynamespace
Name:                 nginx-deployment-596c594988-x9s6n
Namespace:            mynamespace
Priority:             2000001000
PriorityClassName:    system-node-critical
Node:                 <none>
Labels:               app=nginx
                      eks.amazonaws.com/fargate-profile=myprofile
                      fargate=myvalue
                      pod-template-hash=596c594988
                      version=1.7.9
Annotations:          kubernetes.io/psp: eks.privileged
Status:               Pending
IP:
Controlled By:        ReplicaSet/nginx-deployment-596c594988
NominatedNodeName:    9e418415bf-8259a43075714eb3ab77b08049d950a8
Containers:
  nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-784d2 (ro)
Volumes:
  default-token-784d2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-784d2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
One thing I can conclude from this output is that the correct Fargate profile was chosen:
eks.amazonaws.com/fargate-profile=myprofile
Also, I see that some value was added to the NOMINATED NODE field, but I'm not sure what it represents.
Any ideas, or common problems that might be worth troubleshooting in this case? Thanks
Answer
It turns out the problem was in the networking setup of the private subnets associated with the Fargate profile all along.
To give more info, here is what I initially had:
- An EKS cluster with several worker nodes, where I had assigned only public subnets to the EKS cluster itself
- When I tried to add a Fargate profile to the EKS cluster, it was not possible to associate the profile with public subnets because of a current limitation of Fargate. To solve this, I created private subnets with the same tags as the public ones, so that the EKS cluster is aware of them
What I had forgotten was that I needed to enable connectivity from the VPC's private subnets to the outside world (I was missing a NAT gateway). So I created a NAT gateway in a public subnet associated with the EKS cluster, and added to the private subnets' route table an additional entry that looks like this:
0.0.0.0/0 nat-xxxxxxxx
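For anyone reproducing this fix, the NAT gateway and the default route can be created with the AWS CLI. This is a hedged sketch of the steps described above, not the exact commands the author ran; every ID (subnet, allocation, route table, NAT gateway) is a placeholder for your own resources:

```shell
# Allocate an Elastic IP for the NAT gateway; note the AllocationId in the output
aws ec2 allocate-address --domain vpc

# Create the NAT gateway in a *public* subnet of the VPC
aws ec2 create-nat-gateway \
  --subnet-id subnet-PUBLIC_SUBNET_ID \
  --allocation-id eipalloc-ALLOCATION_ID

# Add the default route (0.0.0.0/0 -> NAT gateway) to the route table
# that the private subnets are associated with
aws ec2 create-route \
  --route-table-id rtb-PRIVATE_ROUTE_TABLE_ID \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-NAT_GATEWAY_ID
```

After the route is in place, pods launched in the private subnets can reach the outside world (e.g. to pull container images), which is what unblocked the Pending Fargate pod here.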
This solved the problem I had above, although I'm not sure of the real reason why an AWS Fargate profile needs to be associated only with private subnets.