AWS Image Builder: understanding Image Builder and using it to build a custom AMI for pcluster

References

  • ec2-image-builder-workshop

  • Troubleshoot EC2 Image Builder

Understanding Image Builder

The image builds described here use cinc-client for unified client-side configuration. CINC stands for "CINC Is Not Chef"; it is a free distribution of Chef.

https://cinc.sh/about/

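The overall logic of an Image Builder pipeline is as follows.

[Figure: pipeline]

The relationships between the core concepts are shown in the following figure.

[Figure: concept]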

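  • recipe: contains one parent image and one or more components

  • component: the building block of a recipe; it describes how to build, validate, and test the image

  • Infrastructure: defines the environment in which images are built and tested

  • distribution: a configuration that distributes the image to selected AWS Regions, accounts, or organizations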
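
As a rough orientation, each of these concepts corresponds to a resource type that can be listed with the AWS CLI. The sketch below only maps a concept name to the matching `aws imagebuilder` list subcommand and prints it. The function name `imagebuilder_list_cmd` is a hypothetical helper, and nothing is executed against AWS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an Image Builder concept to the CLI subcommand that lists it.
imagebuilder_list_cmd() {
  case "$1" in
    pipeline)       echo "aws imagebuilder list-image-pipelines" ;;
    recipe)         echo "aws imagebuilder list-image-recipes" ;;
    component)      echo "aws imagebuilder list-components" ;;
    infrastructure) echo "aws imagebuilder list-infrastructure-configurations" ;;
    distribution)   echo "aws imagebuilder list-distribution-configurations" ;;
    *)              echo "unknown concept: $1" >&2; return 1 ;;
  esac
}

# Print the mapping for every concept named in the list above.
for concept in pipeline recipe component infrastructure distribution; do
  printf '%-15s %s\n' "$concept" "$(imagebuilder_list_cmd "$concept")"
done
```

Running one of the printed commands with credentials configured should show the resources that a pcluster image build created in the account.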

For details on the commands that are run and the logs they produce, see Under the Hood.

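Building a custom AMI for pcluster

Using an official pcluster AMI as the source

An earlier pcluster article covered creating an AMI with the pcluster tool, which in fact uses Image Builder under the hood.

Image Builder uses SSM Automation to orchestrate image build operations. To see additional details that help troubleshoot a build failure, search the console for the execution ID that Image Builder provides, then inspect the Automation execution.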

Resource handler returned message: "Error occurred during operation 'SSM execution 'a13bc224-150b-47ae-8e9d-47f3bdc4dc48' failed for image arn: 'arn:aws-cn:imagebuilder:cn-north-1:xxxxxxx:image/parallelclusterimage-myubuntu1804/3.1.4/1' with status = 'Failed' in state = 'BUILDING' and failure message = 'Document arn:aws-cn:imagebuilder:cn-north-1:xxxxxxx:component/parallelclusterimage-de178710-9674-11ed-b264-0e2b2c28fce2/3.1.4/1 failed!''." (RequestToken: 273970de-d749-1216-1215-06466707ae47, HandlerErrorCode: GeneralServiceException)
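
The execution ID mentioned above is embedded in the error text, so it can be pulled out with a small filter instead of being copied by hand. This is a minimal sketch: `extract_execution_id` is a hypothetical helper, the sample message is shortened from the one above, and the `aws ssm` call is left as a comment because it needs credentials.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the SSM execution ID from an Image Builder failure message.
extract_execution_id() {
  # Captures the 36-character UUID that follows "SSM execution '".
  sed -nE "s/.*SSM execution '([0-9a-f-]{36})'.*/\1/p" <<<"$1"
}

msg="Error occurred during operation 'SSM execution 'a13bc224-150b-47ae-8e9d-47f3bdc4dc48' failed for image arn"
exec_id=$(extract_execution_id "$msg")
echo "$exec_id"

# With credentials configured, the Automation execution could then be inspected with:
# aws ssm get-automation-execution --automation-execution-id "$exec_id" --region cn-north-1
```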

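The error details shown in the Automation execution match the cfn error; the underlying cause has to be found in the error log of the corresponding document.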

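The error from building the custom AMI can be found in the document's CloudWatch Logs (the log is written by Image Builder).

The cause is a version mismatch: the pcluster command line is version 3.1.4, while the AMI corresponds to pcluster 3.2.1.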

================================================================================
Stdout: Recipe Compile Error in /etc/chef/local-mode-cache/cache/cookbooks/aws-parallelcluster/attributes/conditions.rb
Stdout: ================================================================================
Stdout: 
Stdout: RuntimeError
Stdout: ------------
Stdout: This AMI was created with aws-parallelcluster-cookbook-3.2.1, but is trying to be used with aws-parallelcluster-cookbook-3.1.4. Please either use an AMI created with aws-parallelcluster-cookbook-3.1.4 or change your ParallelCluster to aws-parallelcluster-cookbook-3.2.1
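
To make the mismatch explicit, the two versions can be extracted from the log line and compared. This is a sketch under the assumption that the message keeps the wording shown above; `parse_cookbook_versions` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<ami_version> <requested_version>" from the mismatch message.
parse_cookbook_versions() {
  sed -nE 's/.*created with aws-parallelcluster-cookbook-([0-9]+\.[0-9]+\.[0-9]+), but is trying to be used with aws-parallelcluster-cookbook-([0-9]+\.[0-9]+\.[0-9]+).*/\1 \2/p' <<<"$1"
}

log_line="Stdout: This AMI was created with aws-parallelcluster-cookbook-3.2.1, but is trying to be used with aws-parallelcluster-cookbook-3.1.4. Please either use an AMI created with aws-parallelcluster-cookbook-3.1.4 or change your ParallelCluster to aws-parallelcluster-cookbook-3.2.1"

# The first value is the version baked into the AMI, the second the version of the calling tool.
read -r ami_version cli_version < <(parse_cookbook_versions "$log_line")
if [ "$ami_version" != "$cli_version" ]; then
  echo "mismatch: AMI=$ami_version, pcluster=$cli_version"
fi
```

The locally installed version can be compared against the AMI side by running `pcluster version`.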

Once the versions match, the build succeeds, and the custom AMI can then be used to create a cluster.

Region: cn-north-1
Image:
  Os: ubuntu1804
  CustomAmi: ami-003819348308f4f4f
HeadNode:
  InstanceType: m5.large
...

Using a public AMI as the source

The previous build used an official pcluster AMI, aws-parallelcluster-3.2.1-ubuntu-1804-lts-hvm-x86_64-202209270835. The next test is whether a plain Ubuntu AMI can be built successfully.

Build:
  InstanceType: c5.4xlarge
  ParentImage: ami-07356f2da3fd22521
  SubnetId: subnet-xxxxxxxxx
  SecurityGroupIds:
    - sg-xxxxxxxx
  UpdateOsPackages:
    Enabled: true
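
A build configuration like this one is passed to `pcluster build-image`. The sketch below writes the same configuration to a file and wraps the call in a function. The file location, the function name `build_custom_ami`, and the image id `myubuntu1804raw` (inferred from the image ARN in the stack error) are assumptions for illustration, and the function is defined but not invoked.

```bash
#!/usr/bin/env bash
# Assumed location for the image configuration file.
config_file="${TMPDIR:-/tmp}/image-config.yaml"

cat > "$config_file" <<'EOF'
Build:
  InstanceType: c5.4xlarge
  ParentImage: ami-07356f2da3fd22521
  SubnetId: subnet-xxxxxxxxx
  SecurityGroupIds:
    - sg-xxxxxxxx
  UpdateOsPackages:
    Enabled: true
EOF

build_custom_ami() {
  # $1 = image id (assumed name), $2 = path to the image configuration
  pcluster build-image --image-id "$1" --image-configuration "$2" --region cn-north-1
}

# Usage (needs the pcluster CLI and AWS credentials):
# build_custom_ami myubuntu1804raw "$config_file"
echo "wrote $config_file"
```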

The cfn stack reports the following error:

Resource handler returned message: "Error occurred during operation 'SSM execution 'cb055f7d-7c07-471a-9d3a-06a900926f8e' failed for image arn: 'arn:aws-cn:imagebuilder:cn-north-1:xxxxxxx:image/parallelclusterimage-myubuntu1804raw/3.2.1/1' with status = 'Failed' in state = 'BUILDING' and failure message = 'Document arn:aws-cn:imagebuilder:cn-north-1:xxxxxxx:component/parallelclusterimage-f78ad100-9685-11ed-89e5-06b4c2e890aa/3.2.1/1 failed!''." (RequestToken: ea6df8f2-d076-43b7-8893-44c567a70a34, HandlerErrorCode: GeneralServiceException)

The cause is tracked down in the same way as before.

Command 9647e5df-dfe4-49f5-aab2-f6843bf55c16 returns unexpected invocation result: 
{Status=[Failed], ResponseCode=[1], Output=[{
    "executionId": "c0466b39-9686-11ed-8042-0651be0b5200",
    "status": "failed",
    "failedStepCount": 1,
    "executedStepCount": 24,
    "ignoredFailedStepCount": 0,
    "failureMessage": "Document arn:aws-cn:imagebuilder:cn-north-1:xxxxxxx:component/parallelclusterimage-f78ad100-9685-11ed-89e5-06b4c2e890aa/3.2.1/1 failed!",
    "logUrl": "/var/lib/amazon/toe/TOE_2023-01-17_16-48-21_UTC-0_c0466b39-9686-11ed-8042-0651be0b5200"
}
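
The string fields of this result, such as `logUrl`, can be extracted with a short filter when many failed builds have to be compared. This is a minimal sketch that assumes one field per line as in the output above; `toe_string_field` is a hypothetical helper, and the sample keeps only three of the fields. The `logUrl` value appears to be a directory on the build instance rather than a web URL.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the result text on stdin and print one quoted string field.
toe_string_field() {
  sed -nE "s/.*\"$1\": *\"([^\"]*)\".*/\1/p"
}

# Sample trimmed from the output above.
toe_output=$(cat <<'EOF'
{
    "executionId": "c0466b39-9686-11ed-8042-0651be0b5200",
    "status": "failed",
    "logUrl": "/var/lib/amazon/toe/TOE_2023-01-17_16-48-21_UTC-0_c0466b39-9686-11ed-8042-0651be0b5200"
}
EOF
)

echo "status: $(toe_string_field status <<<"$toe_output")"
echo "logs:   $(toe_string_field logUrl <<<"$toe_output")"
```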

The CloudWatch Logs show a rather awkward cause: the build instance cannot reach GitHub.

STDERR: fatal: unable to access 'https://github.com/pyenv/pyenv-virtualenv/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
Ran git ls-remote "https://github.com/pyenv/pyenv-virtualenv" "master*" returned 128

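I could not find anywhere to configure a proxy, so I gave up on this approach for now.

Analyzing the error through userdata

After a successful build, this is the userdata that runs when the pcluster head node starts, trimmed to its main logic: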

  • Check that the cookbook and pcluster versions match
  • Check whether the AMI is supported by pcluster
  • Run chef to configure the node

#!/bin/bash -x
...
function vendor_cookbook
{
  mkdir /tmp/cookbooks
  cd /tmp/cookbooks
  tar -xzf /etc/chef/aws-parallelcluster-cookbook.tgz
  HOME_BAK="${HOME}"
  export HOME="/tmp"
  for d in `ls /tmp/cookbooks`; do
    cd /tmp/cookbooks/$d
    LANG=en_US.UTF-8 /opt/cinc/embedded/bin/berks vendor /etc/chef/cookbooks --delete || error_exit 'Vendoring cookbook failed.'
  done;
  export HOME="${HOME_BAK}"
}
...
custom_cookbook=NONE
export _region=cn-north-1
s3_url=amazonaws.com.cn
if [ "${custom_cookbook}" != "NONE" ]; then
  if [[ "${custom_cookbook}" =~ ^s3://([^/]*)(.*) ]]; then
    bucket_region=$(aws s3api get-bucket-location --bucket ${BASH_REMATCH[1]} | jq -r '.LocationConstraint')
    if [[ "${bucket_region}" == null ]]; then
      bucket_region="us-east-1"
    fi
    cookbook_url=$(aws s3 presign "${custom_cookbook}" --region "${bucket_region}")
  else
    cookbook_url=${custom_cookbook}
  fi
fi
export parallelcluster_version=aws-parallelcluster-3.2.1
export cookbook_version=aws-parallelcluster-cookbook-3.2.1
export chef_version=17.2.29
export berkshelf_version=7.2.0
if [ -f /opt/parallelcluster/.bootstrapped ]; then
  installed_version=$(cat /opt/parallelcluster/.bootstrapped)
  if [ "${cookbook_version}" != "${installed_version}" ]; then
    error_exit "This AMI was created with ${installed_version}, but is trying to be used with ${cookbook_version}. Please either use an AMI created with ${cookbook_version} or change your ParallelCluster to ${installed_version}"
  fi
else
  error_exit "This AMI was not baked by ParallelCluster. Please use pcluster build-image command to create an AMI by providing your AMI as parent image."
fi
if [ "${custom_cookbook}" != "NONE" ]; then
  curl --retry 3 -v -L -o /etc/chef/aws-parallelcluster-cookbook.tgz ${cookbook_url}
  vendor_cookbook
fi

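The userdata shows where the earlier error comes from: the failure while building the custom AMI was raised by the version consistency check during the image test phase.

The /etc/chef/cookbooks directory holds the cookbooks, each of which is a collection of recipes: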
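
Two pieces of the userdata logic can be tried in isolation: the `.bootstrapped` version check and the handling of an `s3://` cookbook URL. The sketch below reimplements them as standalone functions; the function names, return codes, and the temporary marker file are assumptions for illustration, not part of the generated script.

```bash
#!/usr/bin/env bash
# Returns 0 if the marker file matches the expected cookbook version,
# 1 on a mismatch, 2 if there is no marker file (AMI not baked by ParallelCluster).
check_bootstrapped() {
  local marker=$1 expected=$2 installed
  if [ ! -f "$marker" ]; then
    echo "AMI was not baked by ParallelCluster" >&2
    return 2
  fi
  installed=$(cat "$marker")
  if [ "$installed" != "$expected" ]; then
    echo "AMI created with $installed, but used with $expected" >&2
    return 1
  fi
}

# Splits s3://bucket/key into "bucket key", using the same regex as the userdata.
parse_s3_url() {
  if [[ "$1" =~ ^s3://([^/]*)(.*) ]]; then
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]#/}"
  else
    return 1
  fi
}

# get-bucket-location reports a null LocationConstraint for us-east-1 buckets.
normalize_bucket_region() {
  if [[ -z "$1" || "$1" == "null" ]]; then echo "us-east-1"; else echo "$1"; fi
}

# Demo against a throwaway marker file instead of /opt/parallelcluster/.bootstrapped.
demo_dir=$(mktemp -d)
echo "aws-parallelcluster-cookbook-3.2.1" > "$demo_dir/.bootstrapped"
check_bootstrapped "$demo_dir/.bootstrapped" "aws-parallelcluster-cookbook-3.2.1" && echo "versions match"
check_bootstrapped "$demo_dir/.bootstrapped" "aws-parallelcluster-cookbook-3.1.4" || echo "mismatch detected"
```

The missing-marker case is what a plain Ubuntu AMI would hit if it were used directly as `CustomAmi` without going through `pcluster build-image`.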

$ tree -L 1
/etc/chef/cookbooks
├── apt
├── aws-parallelcluster
├── aws-parallelcluster-awsbatch
├── aws-parallelcluster-config
├── aws-parallelcluster-install
├── aws-parallelcluster-scheduler-plugin
├── aws-parallelcluster-slurm
├── aws-parallelcluster-test
├── iptables
├── line
├── nfs
├── openssh
├── pyenv
├── selinux
├── yum
└── yum-epel

Pinning down a specific error requires reading the Ruby code inside these cookbooks.
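
A practical starting point is to search the cookbooks for a fragment of the error message. The sketch below shows the idea on a throwaway directory; `find_in_cookbooks` is a hypothetical helper, and the Ruby line written into the demo file is a stand-in, not the real cookbook content.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list Ruby files under a directory that contain a message fragment.
find_in_cookbooks() {
  local dir=$1 fragment=$2
  grep -rlF --include='*.rb' -- "$fragment" "$dir"
}

# Demo layout that mimics the path reported in the compile error.
demo=$(mktemp -d)
mkdir -p "$demo/aws-parallelcluster/attributes"
echo 'raise "This AMI was created with ..."' > "$demo/aws-parallelcluster/attributes/conditions.rb"

find_in_cookbooks "$demo" "This AMI was created with"

# On a real node the search would target the vendored cookbooks:
# find_in_cookbooks /etc/chef/cookbooks "This AMI was created with"
```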
