企业级Jenkins实践
- 1. 概述
- 2. Jenkins部署
- 2.1. Jenkins容器镜像构建
- 2.2. 部署
- 3. 其他组件集成
- 3.1. sonarqube
- 3.2. gitlab
- 3.3. harbor
- 4. 构建运行时
- 4.1. 动态slave
- 4.2. 定制化镜像
- 4.3. pod启动加速
- 5. CI构建
- 5.1. 构建改造
- 5.2. 缓存
- 5.3. DID
- 5.4. buildx
- 5.5. DOCKER_BUILDKIT
- 6. 落地规划
- 7. 附录
- 7.1. 附录1: buildx安装配置
- 7.1.1. 前提
- 7.1.2. 安装
- 7.1.3. 交叉编译支持
- 7.1.4. 创建新的builder
- 7.1.5. 验证
- 7.1.6. 构建命令
- 7.1.7. 构建脚本
- 7.1.8. 常见问题及解决方式
- 7.1.8.1. 编译arm64架构的nodejs项目失败
- 7.1.8.1.0.1. 现象
- 7.1.8.1.0.2. 解决
- 7.1.8.1.1. 参考
- 7.1.8.1. 编译arm64架构的nodejs项目失败
- 7.2. 附录2: k9s动态slave pod启动优化
- 7.2.1. 概述
- 7.2.2. 环境说明
- 7.2.3. 时间分析
- 7.2.4. 性能优化
- 7.2.4.1. etcd
- 7.2.4.2. 启动链分析
- 7.2.4.3. Jenkins remoting协议
- 7.2.4.3.1. jenkinsfile
- 7.2.4.3.2. 服务端日志
- 7.2.4.3.2.1. websocket模式
- 7.2.4.3.2.2. jnlp模式
- 7.2.4.3.3. 客户端agent日志
- 7.2.4.3.3.1. websocke连接
- 7.2.4.3.3.2. jnlp连接
- 7.2.5. 参考
- 7.1. 附录1: buildx安装配置
1. 概述
Jenkins作为开源的自动化CICD工具,自横空出世以来,为企业提供了稳定的CICD基础服务,也提供了丰富多彩的插件,用以包罗生态内各种软件、提供大量方面的快捷操作等。
云原生自出世以来大行其道,对计算服务提供了高效便捷的编排管理,成为当前计算服务生命周期管理方案的不二之选。Jenkins的软件特性与云原生相得益彰:
- Jenkins可以使用k8s的pod作为slave,达成了云计算按需分配的特点
- Jenkins可以通过k8s部署,实现高可用、与生命周期管理,提高服务交付效率、降低维护成本
但Jenkins本身是一个JAVA应用,且庞大,在和K8S集成过程中、大规模使用上仍有不少挑战, 下面从部署、集成、运行、构建、落地规划几个方面记录下Jenkins、Jenkins和K8s在企业实践中遇到的比较有意思的问题。
2. Jenkins部署
2.1. Jenkins容器镜像构建
- custom-war-packager 定制Jenkins WAR包
- jenkins-formulas 通过formula.yaml定制自己的jenkins,针对中国区优化
- formula kubersphere定制的jenkins镜像,方便好用
定制镜像时,需要预制几个服务
- blueocean 提供了jenkins的restful api
- kubernetes 提供了动态slave能力
- 其他常用服务
2.2. 部署
helm-charts
部署时最具多样性的操作是修改 JCasC(Jenkins Configuration as Code), JCasC为Jenkins提供了配置热加载功能、一切服务皆可通过JCasC配置,例如
- 配置Cloud,自动对接k8s集群作为动态slave、自定义pod template
- 配置sonarqube对接
- 初始化jdk工具
- 初始化maven工具
- 初始化git工具
- …
还可以通过修改configmap的挂载来动态修改Maven settings.xml,也可以通过groovy脚本进行用户名、密码、RBAC、邮箱等的初始化,这里的大部分现在可以通过JCasC做,但有一些高级功能仍需要这种传统方式。
3. 其他组件集成
整套DevOps系统要落地的话,从代码存储到容器镜像存储、代码扫描工具、maven仓库等等,需要在部署的时候集成gitlab、sonarqube、harbor、nexus, 这里除了sonarqube,其他的集成以helm charts形式集成部署即可
3.1. sonarqube
sonarqube不仅需要部署即成,还需要在jenkins中添加集成配置,并安装sonar-scanner, 安装sonar-scanner可以通过groovy初始化脚本,也可以通过JCasC,我们采用JCasC方式比较方便,配置如下
unclassified:
sonarGlobalConfiguration:
buildWrapperEnabled: false
installations:
- name: "sonar01"
serverUrl: "http://sonarqube:9000" # helm部署集成之后sonarqube的svc名称和端口,如果有命名空间的话需要加上命名空间
credentialsId: token-sonarqube # 等部署完之后创建
triggers:
skipScmCause: false
skipUpstreamCause: false
tool:
sonarRunnerInstallation:
installations:
- name: "sonar-scanner"
properties:
- installSource:
installers:
- sonarRunnerInstaller:
id: "4.2.0.1873"
3.2. gitlab
集成gitlab的时候需要考虑到部分场景可能没有域名,只能使用IP+PORT,且ssh+http不同,需要注意相关配置和环境变量
# 环境变量 http
EXTERNAL_URL
# 配置 ssh
gitlab_rails['gitlab_shell_ssh_port'] = 30750
3.3. harbor
helm-chart
helm安装harbor遇到耗时比较长的有两个问题:
externalURL
这个值需要配置成需要访问的url,否则docker无法login- 如果使用 nginx/haproxy(https) + ingress(http) + svc(http) 的方式则需要修改 configmap
harbor-registry
中relativeurls: true
否则docker 无法push
4. 构建运行时
4.1. 动态slave
使用k8s动态slave创建容器,首先需要Jenkins对接K8s,这里会用到kubernetes插件,实际操作为配置cloud,使用k8s serviceacount去连接k8s,这这里分为两步
- 使用groovy脚本在Jenkins上创建secret
// cat /var/jenkins_home/init.groovy.d/initK8sCredentials.groovy
import com.clouds.plugins.credentials.CredentialsScope
import com.clouds.plugins.credentials.SystemCredentialsProvider
import com.clouds.plugins.credentials.domains.Domain
import org.csanchez.jenkins.plugins.kubernetes.ServiceAccountCredential
def addKubeCredential(String credentialId) {
def kubeCredential = new ServiceAccountCredential(CredentialsScope.GLOBAL, credentialId, 'Kubernetes service account')
SystemCredentialsProvider.getInstance().getStore().addCredentials(Domain.global(), kubeCredential)
- 使用JCasC进行Cloud配置
jenkins:
mode: EXCLUSIVE
numExecutors: 0
scmCheckoutRetryCount: 2
disableRememberMe: true
clouds:
- kubernetes:
name: "kubernetes"
serverUrl: "https://kubernetes.default"
skipTlsVerify: true
namespace: "jenkins-worker" # worker pod的命名空间
credentialsId: "k8s-service-account"
jenkinsUrl: "http://jenkins-jenkins.kubesphere-devops-system:80" // jenkins svc地址
webSocket: true # master slave通信使用websocket协议
containerCapStr: "200"
connectTimeout: "60"
readTimeout: "60"
maxRequestsPerHostStr: "32"
templates:
- name: "base"
namespace: "jenkins-worker"
label: "base"
nodeUsageMode: "NORMAL"
idleMinutes: 0
containers:
- name: "base"
image: "harbor.com/kubesphere/builder-base:v3.2.2"
command: "cat"
args: ""
ttyEnabled: true
privileged: false
resourceRequestCpu: "100m"
resourceLimitCpu: "4000m"
resourceRequestMemory: "100Mi"
resourceLimitMemory: "4096Mi"
- name: "jnlp"
image: "reg.harbor.com/jenkins/inbound-agent:4.10-3"
args: "^${computer.jnlpmac} ^${computer.name}"
resourceRequestCpu: "50m"
resourceLimitCpu: "4000m"
resourceRequestMemory: "400Mi"
resourceLimitMemory: "4000Mi"
workspaceVolume:
emptyDirWorkspaceVolume:
memory: false
volumes:
- hostPathVolume:
hostPath: "/var/run/docker.sock"
mountPath: "/var/run/docker.sock"
- hostPathVolume:
hostPath: "/var/data/jenkins_sonar_cache"
mountPath: "/root/.sonar/cache"
- hostPathVolume:
hostPath: "/usr/bin/docker"
mountPath: "/usr/bin/docker"
yaml: |
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: node-role.kubernetes.io/worker
operator: In
values:
- ci
tolerations:
- key: "node.kubernetes.io/ci"
operator: "Exists"
effect: "NoSchedule"
- key: "node.kubernetes.io/ci"
operator: "Exists"
effect: "PreferNoSchedule"
containers:
- name: "base"
resources:
requests:
ephemeral-storage: "1Gi"
limits:
ephemeral-storage: "10Gi"
securityContext:
fsGroup: 1000
这里需要挂载 /usr/bin/docker.sock
目录, 使用DID构建容器镜像
4.2. 定制化镜像
针对不同的语言需要有不同的镜像,例如go、java、python、nodejs等等,对于不同的语言版本也需要不同的镜像,对于不同的jenkins任务也需要不同的镜像,例如需要在镜像中运行git命令,则需要安装git等等,除此之外还需要有一个agent镜像用于和master通信, 对于不同的镜像需要不同的Dockerfile, kubersphere
devops-agent
4.3. pod启动加速
jenkins动态slave从发起流水线请求到创建pod再到执行任务默认耗时很长,经过分析主要在以下几个地方:
- pod启动 – 有遇到由于etcd时间偏移导致启动慢的问题,校准时间
- agent启动 – 调整pod limit增大cpu内存
- agent和master建立联系, 使用jnlp协议需要9-14s,使用websocket仅需2s
- pod池 – pod在运行完任务之后不删除,等有相同配置的pod需要的时候运行在存量pod之上,限制存活pod数量,超过一定时间回收pod
这部分参考 附录2: k9s动态slave pod启动优化
5. CI构建
5.1. 构建改造
- 改造两阶段构建的Dockerfile,将二进制构建和dockerfile构建分开,这样可以使用缓存加速,利用语言特性进行交叉编译等等
5.2. 缓存
缓存,使用挂载共享存储的方式,对不同语言挂载不同缓存目录, 可以设置项目隔离缓存与集群隔离缓存多维度
nodejs npm缓存
/root/.yarn
/root/.npm
maven maven仓库
/root/.m2
golang golang缓存
/home/jenkins/go/pkg
pip缓存
/root/.cache/pip
/root/.local/share/virtualenvs
sonar缓存
/root/.sonar/cache
docker
5.3. DID
需要在容器中挂载母机的 docker.socket, 需要注意权限文件权限,可在母机的docker启动文件 /usr/lib/systemd/system/docker.service
编辑如下
ExecStartPost=/bin/chmod 666 /var/run/docker.sock
5.4. buildx
使用buildx可以达到编译多架构镜像的功能
需要安装buildx之后,安装buildx见附录1: buildx安装配置,挂载buildx目录到docker容器
/usr/libexec/docker
5.5. DOCKER_BUILDKIT
docker build时设置 DOCKER_BUILDKIT=1 可以加速
6. 落地规划
- etcd 使用ssd
- docker目录使用ssd
- 连接共享存储集群的网卡选择万兆
7. 附录
7.1. 附录1: buildx安装配置
7.1.1. 前提
https://www.docker.com/blog/multi-platform-docker-builds/
需要提前安装buildx
需要检查base 镜像是否也是多架构=
7.1.2. 安装
wget https://github.com/docker/buildx/releases/download/v0.9.1/buildx-v0.9.1.linux-amd64
mkdir -p /usr/libexec/docker/cli-plugins/
chmod +x buildx-v0.9.1.linux-amd64
mv buildx-v0.9.1.linux-amd64 /usr/libexec/docker/cli-plugins/docker-buildx
7.1.3. 交叉编译支持
- 升级操作系统内核到4.8以上
- docker启用 experimental
- 使用qemu进行架构仿真,否则不支持除本机架构以外的架构
使用binfmt镜像时,在x86平台上构建arm64的nodejs项目会报错,配置
qemu-user-static
可解决
docker run --privileged --rm docker/binfmt:a7996909642ee92942dcd6cff44b9b95f08dad64
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
7.1.4. 创建新的builder
docker buildx create --name mybuilder --driver-opt image=moby/buildkit:master,network=host --use
7.1.5. 验证
docker buildx ls
7.1.6. 构建命令
# --load 参数只适用于单个镜像,将oci load成本地docker镜像
# --push 参数适用于单个/多个镜像,将oci push到远端仓库
# x86
docker buildx build --platform linux/amd64 \
-f build/mysql/Dockerfile -t reg.myregistry.com/my project/mysql-amd64:v1.3.2.2.gd75a3c0 . --load
docker push reg.myregistry.com/myproject/mysql-amd64:v1.3.2.2.gd75a3c0
# arm
docker buildx build --platform linux/arm64 \
-f build/mysql/Dockerfile -t reg.myregistry.com/myproject/mysql-arm64:v1.3.2.2.gd75a3c0 . --load
docker push reg.myregistry.com/myproject/mysql-arm64:v1.3.2.2.gd75a3c0
# create manifest
docker manifest create reg.myregistry.com/myproject/mysql:v1.3.2.2.gd75a3c0 \
reg.myregistry.com/myproject/mysql-arm64:v1.3.2.2.gd75a3c0 \
reg.myregistry.com/myproject/mysql-amd64:v1.3.2.2.gd75a3c0
7.1.7. 构建脚本
# 抽象循环后的脚本
PLATFORMLIST="linux/amd64 linux/arm64 "
REGISTRY="reg.harbor.com"
REGISTRY_NAMESPACE="base"
TAG=$(git describe --dirty --always --tags | sed 's/-/./g')
SERVICENAME="storage-base"
DOCKERFILE_PATH="build/storage-base/Dockerfile"
MANIFEST_LIST="docker manifest create $REGISTRY/$REGISTRY_NAMESPACE/$SERVICENAME:$TAG"
for i in $PLATFORMLIST;
do {
ARCH=$(echo $i|awk -F '/' '{print $2}')
DOCKER_BUILDKIT=1 docker buildx build --progress plain --platform $i \
-f $DOCKERFILE_PATH -t $REGISTRY/$REGISTRY_NAMESPACE/$SERVICENAME-$ARCH:$TAG . --load
docker push $REGISTRY/$REGISTRY_NAMESPACE/$SERVICENAME-$ARCH:$TAG
}&
done
wait
for i in $PLATFORMLIST;
do {
ARCH=$(echo $i|awk -F '/' '{print $2}')
MANIFEST_LIST+=" $REGISTRY/$REGISTRY_NAMESPACE/$SERVICENAME-$ARCH:$TAG"
echo $MANIFEST_LIST
}
done
eval $MANIFEST_LIST
docker manifest push $REGISTRY/$REGISTRY_NAMESPACE/$SERVICENAME:$TAG
7.1.8. 常见问题及解决方式
7.1.8.1. 编译arm64架构的nodejs项目失败
7.1.8.1.0.1. 现象
Unknown host QEMU_IFLA type: 47
Unknown host QEMU_IFLA type: 48
Unknown host QEMU_IFLA type: 43
#12 89.59 warning compression-webpack-plugin > cacache > @npmcli/move-file@1.1.2: This functionality has n moved to @npmcli/fs
#12 95.21 [2/4] Fetching packages...
#12 128.0 Unknown QEMU_IFLA_INFO_KIND ipip
#12 128.1 info There appears to be trouble with your network connection. Retrying...
#12 162.6 Unknown QEMU_IFLA_INFO_KIND ipip
#12 162.6 info There appears to be trouble with your network connection. Retrying...
#12 196.2 Unknown QEMU_IFLA_INFO_KIND ipip
#12 196.2 info There appears to be trouble with your network connection. Retrying...
#12 230.2 Unknown QEMU_IFLA_INFO_KIND ipip
#12 230.2 info There appears to be trouble with your network connection. Retrying...
#12 263.6 Unknown QEMU_IFLA_INFO_KIND ipip
#12 266.1 error An unexpected error occurred: "http://nexus.myregistry.cn/repository/lls-npm/link-ui-web/-/link-ui-web-2.1.7.tgz: ESOCKETTIMEDOUT".
#12 266.1 info If you think this is a bug, please open a bug report with the information provided in "/opt/yarn-error.log".
#12 266.1 info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
#12 ERROR: executor failed running [/bin/sh -c yarn install]: exit code: 1
------
> [web-builder 4/6] RUN yarn install:
#12 162.6 Unknown QEMU_IFLA_INFO_KIND ipip
#12 162.6 info There appears to be trouble with your network connection. Retrying...
#12 196.2 Unknown QEMU_IFLA_INFO_KIND ipip
#12 196.2 info There appears to be trouble with your network connection. Retrying...
#12 230.2 Unknown QEMU_IFLA_INFO_KIND ipip
#12 230.2 info There appears to be trouble with your network connection. Retrying...
#12 263.6 Unknown QEMU_IFLA_INFO_KIND ipip
7.1.8.1.0.2. 解决
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
参照 emu-user-static issue
7.1.8.1.1. 参考
- qemu-user-static
- multi-platform-docker-builds
- binfmt
- how-to-upgrade-kernel-centos
7.2. 附录2: k9s动态slave pod启动优化
7.2.1. 概述
jenkins 使用 k8s 作为动态slave,使用jnlp进行slave和master之间的通信,从容器启动到拉取代码需要25s,体验较差
Jenkins 的调度模式是一种尽量能够节省资源的方式进行调度的。在一般的调度过程中 Jenkins 需要经历以下几个阶段: 检查有没有可用的 agent -> 如果没有的可用的agent,计算是否有 agent 预计将要运行完任务 -> 等待一段时间-> 启动动态的 agent -> agent 与 master建立连接。 这种方式使 CI/CD 任务被执行的太慢,我们往往都需要等待几十秒甚至更长时间来准备 CI/CD的执行环境。
7.2.2. 环境说明
组件 | 参数 | 版本 | 备注 |
---|---|---|---|
CPU芯片架构 | amd64 | ||
服务器配置 | 3 * 8C16G500G | ||
操作系统版本 | CentOS Linux release 7.8.2003 (Core) | ||
内核 | 5.4.224-1.el7.elrepo.x86_64 | ||
K8s | v1.22.10 | ||
K8s网络模式 | calico | ||
jenkins版本 | 2.319.1 | ||
jnlp agent版本 | 4.10-3 |
7.2.3. 时间分析
- 点击界面到jenkins打印日志2s
- 容器到running状态3s
- jnlp连接通信17s
- 拉取代码3s
7.2.4. 性能优化
7.2.4.1. etcd
k8s的多个服务不定时在running和terminal之间切换频繁,查看etcd日志,有使用偏移较大的报错,校准始终之后现象没有改变,直到重启etcd解决,这是导致偶尔容器启动慢甚至启动失败的原因。
日志
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.158:53604" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.76:40270" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.158:51594" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.85:35032" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.85:47684" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.76:40182" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.85:47604" (error "EOF", ServerName "")
Nov 30 14:22:50 node-76 etcd[1037]: rejected connection from "10.10.10.85:35028" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.76:40130" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.76:40226" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.76:40246" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.76:40230" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.158:53668" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.85:34996" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: rejected connection from "10.10.10.76:40258" (error "EOF", ServerName "")
Nov 30 14:22:51 node-76 etcd[1037]: raft2022/11/30 14:22:51 INFO: 496af9c69679c08a is starting a new election at term 1114
Nov 30 14:22:51 node-76 etcd[1037]: raft2022/11/30 14:22:51 INFO: 496af9c69679c08a became candidate at term 1115
Nov 30 14:22:51 node-76 etcd[1037]: raft2022/11/30 14:22:51 INFO: 496af9c69679c08a received MsgVoteResp from 496af9c69679c08a at term 1115
Nov 30 14:22:51 node-76 etcd[1037]: raft2022/11/30 14:22:51 INFO: 496af9c69679c08a [logterm: 9, index: 18853153] sent MsgVote request to 8e9e6d1fe6ff301d at term 1115
Nov 30 14:22:51 node-76 etcd[1037]: raft2022/11/30 14:22:51 INFO: 496af9c69679c08a [logterm: 9, index: 18853153] sent MsgVote request to dc6ed8a281b6886b at term 1115
Nov 30 14:22:52 node-76 etcd[1037]: read-only range request "key:\"/registry/health\" " with result "error:context deadline exceeded" took too long (2.000157124s) to execute
Nov 30 14:22:55 node-76 etcd[1037]: health check for peer dc6ed8a281b6886b could not connect: dial tcp 10.10.10.85:2380: i/o timeout
Nov 30 14:22:55 node-76 etcd[1037]: health check for peer dc6ed8a281b6886b could not connect: dial tcp 10.10.10.85:2380: i/o timeout
Nov 30 14:22:55 node-76 etcd[1037]: the clock difference against peer dc6ed8a281b6886b is too high [1m40.949394882s > 1s]
Nov 30 14:22:55 node-76 etcd[1037]: health check for peer 8e9e6d1fe6ff301d could not connect: dial tcp 10.10.10.158:2380: i/o timeout
Nov 30 14:22:55 node-76 etcd[1037]: health check for peer 8e9e6d1fe6ff301d could not connect: dial tcp 10.10.10.158:2380: i/o timeout
Nov 30 14:22:55 node-76 etcd[1037]: the clock difference against peer 8e9e6d1fe6ff301d is too high [1m35.440412268s > 1s]
Nov 30 14:22:55 node-76 etcd[1037]: raft2022/11/30 14:22:55 INFO: 496af9c69679c08a [term: 1115] ignored a MsgVoteResp message with lower term from dc6ed8a281b6886b [term: 1100]
Nov 30 14:22:58 node-76 etcd[1037]: raft2022/11/30 14:22:58 INFO: 496af9c69679c08a is starting a new election at term 1115
Nov 30 14:22:58 node-76 etcd[1037]: raft2022/11/30 14:22:58 INFO: 496af9c69679c08a became candidate at term 1116
Nov 30 14:22:58 node-76 etcd[1037]: raft2022/11/30 14:22:58 INFO: 496af9c69679c08a received MsgVoteResp from 496af9c69679c08a at term 1116
Nov 30 14:22:58 node-76 etcd[1037]: raft2022/11/30 14:22:58 INFO: 496af9c69679c08a [logterm: 9, index: 18853153] sent MsgVote request to 8e9e:
7.2.4.2. 启动链分析
- jenkins接口到jenkins slave处理模块的2s – 优化空间小,难度高
- k8s启动容器3s – 优化空间小,难度高
- jenkins slave模块到agent通信的17s – 优化空间大
- jenkins启动的pod都有两个容器,其中一个是用于与master通信的jnlp容器,jnlp容器主要做的事
- 启动shell脚本
- 启动java程序
- 与服务端交互 – 模拟jnlp java程序与服务端建立交互,需要5s
- 交互成功
- jenkins启动的pod都有两个容器,其中一个是用于与master通信的jnlp容器,jnlp容器主要做的事
其中,shell脚本比较简单,耗时比较少,剩下的就是java程序
/opt/java/openjdk/bin/java -cp /usr/share/jenkins/agent.jar hudson.remoting.jnlp.Main -headless -tunnel jenkins-jenkins-agent.kubesphere-devops-system:50000 -url http://jenkins-jenkins.kubesphere-devops-system:80/ -workDir /home/jenkins/agent 158e8612dcb51a9266d8579642ff4eb73a419967235729bb2e8294e0d5f80c5c base-hbrw8
从CPU、内存、磁盘IO分析,所用占系统总体都不高,Jenkins主程序经常由于处理很多并发任务,占用CPU资源比较多,一般应用启动瞬间,需要做很多检查、建立很多连接,所以耗计算资源较多,检查jnlp容器的资源确实limit很小,调整limit到4c8g之后有明显改观,整个过程从原来的25s降低到8s,整体体验上快了三倍。
7.2.4.3. Jenkins remoting协议
- JNLP4-connect – 当前Jenkins slave通信使用的主流协议
- 连接方式 – 服务端启动一个端口,等agent端回掉
- WebSocket – 推荐使用的协议,简单、高效
- 连接方式 – websocket
- 插件式协议 remoting kafka plugin – 使用kafka通信
7.2.4.3.1. jenkinsfile
pipeline {
agent {
node {
label 'base'
}
}
stages {
stage('Hello') {
steps {
echo 'Hello World'
}
}
}
}
7.2.4.3.2. 服务端日志
7.2.4.3.2.1. websocket模式
jenkins container log
pod从创建到执行任务再到销毁需要 9s, slave+echo hello world通信需要2s
- 从jenkins接收到请求到检索label
- 发送创建pod请求
- 创建pod到podrunning
- slave建立连接进行通信
2022-12-05 07:57:28.263+0000 [id=92860] INFO i.j.p.g.event.HttpEventSender#send: Skipped event sending due to receiver URL not set
2022-12-05 07:57:28.265+0000 [id=92860] INFO i.j.p.g.event.HttpEventSender#send: Skipped event sending due to receiver URL not set
2022-12-05 07:57:33.249+0000 [id=48] INFO hudson.slaves.NodeProvisioner#update: base-hdn7j provisioning successfully completed. We have now 8 computer(s)
2022-12-05 07:57:33.285+0000 [id=92845] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes kubesphere-devops-worker/base-hdn7j
2022-12-05 07:57:35.646+0000 [id=92845] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes kubesphere-devops-worker/base-hdn7j
2022-12-05 07:57:37.227+0000 [id=92833] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent base-hdn7j
2022-12-05 07:57:37.261+0000 [id=92834] INFO o.j.p.workflow.job.WorkflowRun#finish: wanggangfeng-test01/helloworld #19 completed: SUCCESS
7.2.4.3.2.2. jnlp模式
jenkins container log
pod从创建到执行任务再到销毁需要 14s slave通信+echo hello world需要6s
- 从jenkins接收到请求到检索label 6s 更换docker磁盘 加大内存 cpu 4s
- 发送创建pod请求 18ms
- 创建pod到podrunning 2s 2s
- slave建立连接进行通信 6s 减少为1s通过更换协议 前面增大limit前14s
2022-12-05 05:53:07.316+0000 [id=90051] INFO i.j.p.g.event.HttpEv entSender#send: Skipped event sending due to receiver URL not set
2022-12-05 05:53:13.255+0000 [id=55] INFO hudson.slaves.NodeProvisioner#update: base-1pfs8 provisioning successfully completed. We have now 8 computer(s)
2022-12-05 05:53:13.273+0000 [id=89997] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes kubesphere-devops-worker/base-1pfs8
2022-12-05 05:53:15.777+0000 [id=90058] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #302 from /10.222.122.126:39776
2022-12-05 05:53:15.913+0000 [id=89997] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes kubesphere-devops-worker/base-1pfs8
2022-12-05 05:53:21.261+0000 [id=89988] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent base-1pfs8
2022-12-05 05:53:21.283+0000 [id=89989] INFO o.j.p.workflow.job.WorkflowRun#finish: wanggangfeng-test01/helloworld #2 completed: SUCCESS
7.2.4.3.3. 客户端agent日志
7.2.4.3.3.1. websocke连接
Warning: SECRET is defined twice in command-line arguments and the environment variable
Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
Dec 02, 2022 9:17:36 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: base1-qxbh2
Dec 02, 2022 9:17:36 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Dec 02, 2022 9:17:36 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.10
Dec 02, 2022 9:17:36 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Dec 02, 2022 9:17:36 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Dec 02, 2022 9:17:37 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: WebSocket connection open
Dec 02, 2022 9:17:38 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Dec 02, 2022 9:17:42 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled agent engine reconnects.
7.2.4.3.3.2. jnlp连接
Warning: SECRET is defined twice in command-line arguments and the environment variable
Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
Dec 02, 2022 9:21:04 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: base-xjlhx
Dec 02, 2022 9:21:05 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Dec 02, 2022 9:21:05 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.10
Dec 02, 2022 9:21:05 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Dec 02, 2022 9:21:05 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Dec 02, 2022 9:21:05 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-jenkins.kubesphere-devops-system:80/]
Dec 02, 2022 9:21:05 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Dec 02, 2022 9:21:05 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Dec 02, 2022 9:21:05 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
Agent address: jenkins-jenkins-agent.kubesphere-devops-system
Agent port: 50000
Identity: f0:b3:19:11:68:7e:a7:9f:5c:0d:32:d9:f0:30:d2:45
Dec 02, 2022 9:21:05 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Dec 02, 2022 9:21:05 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-jenkins-agent.kubesphere-devops-system:50000
Dec 02, 2022 9:21:05 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Dec 02, 2022 9:21:06 AM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run
INFO: Waiting for ProtocolStack to start.
Dec 02, 2022 9:21:10 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: f0:b3:19:11:68:7e:a7:9f:5c:0d:32:d9:f0:30:d2:45
Dec 02, 2022 9:21:10 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Dec 02, 2022 9:21:15 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled agent engine reconnects.
7.2.5. 参考
- remoting-4.14
- configure-ports-jnlp-agents