K8s Node Affinity Scheduling
Preface
The previous post covered directed scheduling. It is convenient, but it has problems: it is inflexible and a hard constraint. If no matching Node exists, the Pod simply cannot run, so its use cases are limited.
To address this, Kubernetes provides affinity scheduling. It extends NodeSelector: you can configure the scheduler to prefer Nodes that satisfy the given conditions while still allowing the Pod to be scheduled onto other Nodes when none do, replacing the hard constraint of directed scheduling. As affinity scheduling increasingly covers everything NodeSelector can express, NodeSelector is expected to be deprecated eventually.
Affinity scheduling comes in two flavors: node affinity and Pod affinity. This post covers node affinity; Pod affinity will follow in a later post.
NodeAffinity
NodeAffinity is the node-affinity scheduling policy, the new strategy designed to replace NodeSelector.
nodeAffinity currently offers two configuration options.
requiredDuringSchedulingIgnoredDuringExecution
The Pod is scheduled onto a node only if the specified rules are satisfied; this is a hard constraint. Once scheduling completes, the conditions are not re-checked (the IgnoredDuringExecution part). This behaves much like NodeSelector, just with a different syntax, which is why NodeAffinity can replace NodeSelector.
The configuration options can be inspected with kubectl explain pods.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution:
requiredDuringSchedulingIgnoredDuringExecution
  nodeSelectorTerms     # <[]Object> -required- a list of node selector terms
  - matchFields         # <[]Object> node selector requirements listed by node fields
  - matchExpressions    # <[]Object> node selector requirements listed by node labels (recommended)
    - key               # the label key
      operator          # the operator: In, NotIn, Exists, DoesNotExist, Gt, Lt
      values            # the label values
The operators mean the following:
In            # the label's value must be in the given list
NotIn         # the opposite of In
Exists        # the label exists on the node
DoesNotExist  # the label does not exist on the node
Gt            # the label's value is greater than the given value
Lt            # the label's value is less than the given value
Pod affinity, covered later, comes with a built-in anti-affinity (mutual exclusion) counterpart. Node affinity has no anti-affinity syntax of its own, but NotIn and DoesNotExist achieve node exclusion just as well, as the sketch below shows.
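For example, a minimal sketch of node exclusion via NotIn (the Pod name and the disktype=hdd label here are hypothetical, used only for illustration): this Pod can be scheduled on any node except those labeled disktype=hdd.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-avoid-hdd                # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype              # hypothetical label
            operator: NotIn            # exclude nodes whose disktype value appears below
            values: ['hdd']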
Use kubectl label to tag node01 as the Beijing data center and node02 as the Shanghai data center:
kubectl label nodes node01 area=bj
kubectl label nodes node02 area=shanghai
# verify the labels were set
kubectl get nodes --show-labels
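As a side note, kubectl can also show a single label as its own column, which is handy when nodes carry many labels:

# show just the area label as a column
kubectl get nodes -L area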
Create nginx-nodeAffinity-required.yaml with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity-required
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['bj','changsha']
Start the Pod. Since values: ['bj','changsha'] is set, only node01 meets the requirement; check which node the Pod lands on:
# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-required.yaml
pod/nginx-node-affinity-required created
# check the Pod details: it landed on node01
[root@master affinity]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-required   1/1     Running   0          23s   10.244.1.71   node01   <none>           <none>
Change the yaml to values: ['shanghai','changsha'] (the modified fragment is shown below); now only node02 meets the requirement. Start the Pod and check which node it lands on.
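The only lines that change from the previous manifest:

        - matchExpressions:
          - key: area
            operator: In
            values: ['shanghai','changsha']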
# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-required.yaml
pod "nginx-node-affinity-required" deleted
# create it again
[root@master affinity]# kubectl create -f nginx-nodeAffinity-required.yaml
pod/nginx-node-affinity-required created
# check the Pod details: it landed on node02
[root@master affinity]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-required   1/1     Running   0          8s    10.244.2.24   node02   <none>           <none>
Change the yaml to values: ['hangzhou','changsha']; now no node meets the requirement. As noted above, requiredDuringSchedulingIgnoredDuringExecution is a hard constraint, so the Pod should fail to run. Start it and watch its status:
# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-required.yaml
pod "nginx-node-affinity-required" deleted
# create it again
[root@master affinity]# kubectl create -f nginx-nodeAffinity-required.yaml
pod/nginx-node-affinity-required created
# check the Pod details: the status is Pending, it never started
[root@master affinity]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-required   0/1     Pending   0          14s   <none>   <none>   <none>           <none>
# inspect the startup events: scheduling failed
[root@master affinity]# kubectl describe pod nginx-node-affinity-required|grep -A 100 Event
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  44s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler tries to place the Pod on a Node that satisfies the specified rules, but if no such node exists it does not insist; this is a soft constraint. When several rules are defined, each carries a weight (1-100): for every rule a node satisfies, the scheduler adds that rule's weight to the node's score, and nodes with higher totals are preferred.
The configuration options can be inspected with kubectl explain pods.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution:
preferredDuringSchedulingIgnoredDuringExecution
  - weight              # the weight, in the range 1-100
    preference          # a node selector term, associated with the weight
      matchFields       # <[]Object> node selector requirements listed by node fields
      matchExpressions  # <[]Object> node selector requirements listed by node labels (recommended)
        key             # the label key
        operator        # the operator
        values          # <[]string> the label values
Create nginx-nodeAffinity-preferred.yaml with the following content; it defines two rules whose weights determine the preference order:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['bj']
      - weight: 2
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['shanghai']
Start the Pod. Two rules are defined and shanghai carries a higher weight than bj, so the Pod should land on node02; check which node it lands on:
# create the Pod
[root@master affinity]# kubectl create -f nginx-nodeAffinity-preferred.yaml
pod/nginx-node-affinity-preferred created
# check the Pod details: it landed on node02
[root@master affinity]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-preferred   1/1     Running   0          12s   10.244.2.25   node02   <none>           <none>
Change the yaml to give the bj rule a weight of 3 (the modified fragment is shown below), then start the Pod again and check which node it lands on.
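The only change from the previous manifest is the weight on the bj rule:

      - weight: 3            # raised from 1; now higher than shanghai's weight of 2
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['bj']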
# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-preferred.yaml
pod "nginx-node-affinity-preferred" deleted
# create it again
[root@master affinity]# kubectl create -f nginx-nodeAffinity-preferred.yaml
pod/nginx-node-affinity-preferred created
# check the Pod details: it landed on node01
[root@master affinity]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-preferred   1/1     Running   0          8s    10.244.1.72   node01   <none>           <none>
Change the yaml as follows so that neither rule matches any node:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-affinity-preferred
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['changsha']
      - weight: 2
        preference:
          matchExpressions:
          - key: area
            operator: In
            values: ['hangzhou']
Start the Pod. As mentioned above, this scheduling is a soft constraint: if no Node meets the conditions, the scheduler settles for the next best option and picks a Node with sufficient resources.
# delete the previous Pod
[root@master affinity]# kubectl delete -f nginx-nodeAffinity-preferred.yaml
pod "nginx-node-affinity-preferred" deleted
# create it again
[root@master affinity]# kubectl create -f nginx-nodeAffinity-preferred.yaml
pod/nginx-node-affinity-preferred created
# it was scheduled to node01 anyway
[root@master affinity]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
nginx-node-affinity-preferred   1/1     Running   0          4s    10.244.1.74   node01   <none>           <none>
Questions
nodeSelector has not been deprecated yet, so what happens when nodeSelector and nodeAffinity are both set? How is the Pod scheduled?
The affinity configuration shown above tells us nodeSelectorTerms is an array and can hold several matchExpressions entries; how are multiple matchExpressions matched?
matchExpressions is itself an array and can hold several key/operator/value requirements; how are those matched?
Defining nodeSelector and nodeAffinity together
Create affinity-nodeselector.yaml with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: affinity-nodeselector
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['bj']
  nodeSelector:
    area: shanghai
Start the Pod. Both settings are hard constraints, and since their conditions differ, the Pod should be unschedulable.
# create the Pod
[root@master affinity]# kubectl create -f affinity-nodeselector.yaml
pod/affinity-nodeselector created
# check the Pod status: it did not start
[root@master affinity]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
affinity-nodeselector   0/1     Pending   0          9s
# inspect the startup events: a familiar error
[root@master affinity]# kubectl describe pod affinity-nodeselector | grep -A 100 Events
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  34s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.
Change nodeSelector so that its condition is also bj (the modified fragment is shown below); this time the Pod starts successfully:
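The modified fragment (the rest of the manifest stays the same):

  nodeSelector:
    area: bj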
[root@master affinity]# kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
affinity-nodeselector   1/1     Running   0          7s    10.244.1.75   node01   <none>           <none>
How does nodeAffinity behave with multiple matchExpressions entries?
Create multiple-nodeselectorterms.yaml with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: multiple-nodeselectorterms
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['changsha']
        - matchExpressions:
          - key: area
            operator: In
            values: ['shanghai']
Start the Pod. If it starts successfully and lands on node02, matching a single entry is enough; if it fails to start, every entry must match.
# create the Pod
[root@master affinity]# kubectl create -f multiple-nodeselectorterms.yaml
pod/multiple-nodeselectorterms created
# check the Pod details: it started and landed on node02
[root@master affinity]# kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
multiple-nodeselectorterms   1/1     Running   0          19s   10.244.2.26   node02   <none>           <none>
A matchExpressions entry with multiple requirements
Create multiple-matchexpressions.yaml with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: multiple-matchexpressions
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: area
            operator: In
            values: ['changsha']
          - key: area
            operator: In
            values: ['shanghai']
Start the Pod. If it starts successfully and lands on node02, matching a single requirement is enough; if it fails to start, every requirement must match.
# create the Pod
[root@master affinity]# kubectl create -f multiple-matchexpressions.yaml
pod/multiple-matchexpressions created
# check the Pod: it failed to start
[root@master affinity]# kubectl get pods -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
multiple-matchexpressions   0/1     Pending   0          11s   <none>   <none>   <none>           <none>
# inspect the startup events: the same familiar error
[root@master affinity]# kubectl describe pod multiple-matchexpressions | grep -A 100 Events
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  48s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity/selector.
Summary
If nodeSelector and nodeAffinity are both defined, both conditions must be satisfied before the Pod can be scheduled onto a Node.
If there are multiple matchExpressions entries, satisfying any single one is enough.
If one matchExpressions entry holds multiple requirements, all of them must be satisfied. A sketch combining these three rules follows below.
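To make the three rules concrete, here is a minimal sketch combining them (the Pod name and the disktype requirement are hypothetical, added only to illustrate the AND case):

apiVersion: v1
kind: Pod
metadata:
  name: affinity-semantics-demo        # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:                        # rule 1: must be satisfied in addition to nodeAffinity
    area: bj
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:             # rule 2: entries are ORed, matching any one is enough
        - matchExpressions:            # rule 3: requirements within one entry are ANDed
          - key: area
            operator: In
            values: ['bj']
          - key: disktype              # hypothetical label
            operator: Exists
        - matchExpressions:
          - key: area
            operator: In
            values: ['shanghai']

On the two-node cluster used above, this Pod would schedule onto node01 only if node01 also carried the hypothetical disktype label (first term); node02 satisfies the second term but fails the nodeSelector, so it is ruled out.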
That wraps up node affinity; upcoming posts will cover Pod affinity and anti-affinity scheduling.
Follow for more, and happy learning!