Update Worker Node AMI in Launch Template with Automation

January 20, 2024


This blog post shows how to programmatically retrieve the ID of the latest Amazon EKS optimized AMI by querying the AWS Systems Manager Parameter Store API, and how to push that ID into the launch templates used by worker nodes. Because AWS publishes the IDs as public parameters, you no longer need to look them up manually.

The SNS notification ARN for the Amazon EKS optimized Linux AMI was deprecated. The Linux variant AMIs never had separate SNS topics; notifications were all tied to the Amazon Linux AMI. Instead, the AWS team maintains a separate SSM Parameter Store parameter for each variant, so each one can be queried at any time to retrieve the latest AMI ID.
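
As a minimal sketch of that lookup, the helper below builds the public parameter name and reads its value. The function names are illustrative, and `ssm_client` is assumed to be a boto3 SSM client (for example `boto3.client("ssm")`) whose credentials allow `ssm:GetParameter`.

```python
def eks_ami_parameter_name(k8s_version, ami_type="amazon-linux-2"):
    """Build the name of the public parameter holding the recommended AMI ID."""
    return (
        f"/aws/service/eks/optimized-ami/{k8s_version}/"
        f"{ami_type}/recommended/image_id"
    )


def fetch_recommended_ami_id(ssm_client, k8s_version, ami_type="amazon-linux-2"):
    """Return the AMI ID stored in the public parameter for this variant."""
    response = ssm_client.get_parameter(
        Name=eks_ami_parameter_name(k8s_version, ami_type)
    )
    return response["Parameter"]["Value"]
```

With real credentials, a call such as `fetch_recommended_ami_id(boto3.client("ssm"), "1.28", "amazon-linux-2-arm64")` would return the AMI ID for that variant in the client's Region. The version string here is only an example.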

Architecture



[Figure: architecture diagram]

Prerequisites

1. An IAM role that will be attached to the Lambda function. This role should have the following policies attached:
- AWS-managed IAM policy: AWSLambdaVPCAccessExecutionRole
- Inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "eks:DescribeCluster",
            "Resource": "arn:aws:eks:<Region>:<AWS account ID>:cluster/<EKS cluster name>"
        },
        {
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:<Region>:<AWS account ID>:<SNS topic name>"
        },
        {
            "Effect": "Allow",
            "Action": "sts:GetCallerIdentity",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter*"
            ],
            "Resource": [
                "arn:aws:ssm:<Region>::parameter/aws/service/eks/optimized-ami/*/amazon-linux-2-arm64/recommended/image_id",
                "arn:aws:ssm:<Region>:<AWS account ID>:parameter/<Parameter name>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:PutParameter"
            ],
            "Resource": "arn:aws:ssm:<Region>:<AWS account ID>:parameter/<Parameter name>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:ModifyLaunchTemplate",
                "ec2:CreateLaunchTemplateVersion",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*"
        }
    ]
}

2. Create a parameter in the Systems Manager Parameter Store. The Lambda code below expects the name /eks/ami/latest/<AMI type>/id (e.g., /eks/ami/latest/amazon-linux-2-arm64/id). Its initial value can be anything, as it will be overwritten by the next Lambda function execution.

3. Create an SNS topic. I subscribed my email address to this topic.

4. An EKS cluster with worker node groups.
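
Prerequisites 2 and 3 can also be created from code. The following is only a sketch: the function names are hypothetical, `ssm_client` and `sns_client` are assumed to be boto3 clients, and the credentials running it need `ssm:PutParameter`, `sns:CreateTopic`, and `sns:Subscribe`, which are not part of the Lambda role above.

```python
def self_managed_parameter_name(ami_type):
    """Name that the Lambda code in this post reads and overwrites."""
    return f"/eks/ami/latest/{ami_type}/id"


def create_prerequisites(ssm_client, sns_client, ami_type, topic_name, email):
    """Create the placeholder parameter and an SNS topic with an email subscriber."""
    # Any value works here; the first Lambda run replaces it
    ssm_client.put_parameter(
        Name=self_managed_parameter_name(ami_type),
        Value="placeholder",
        Type="String",
        Overwrite=True,
    )
    # create_topic returns the existing ARN if a topic with this name exists
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The recipient must confirm the subscription from the email AWS sends
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn
```

The topic name passed here is the value to use later for the SNS_TOPIC_NAME environment variable.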


Steps

1. Create a Lambda function. The following is the Python code I used:
##############################
# FUNCTION INIT PHASE START  #
##############################

import json
import logging
import os
import re

import boto3

from botocore.exceptions import ClientError


ec2_client = boto3.client('ec2')
ssm_client = boto3.client('ssm')
sns_client = boto3.client('sns')
eks_client = boto3.client('eks')

session = boto3.session.Session()
sts_client = session.client('sts')


EKS_ClUSTER_NAME = os.getenv('EKS_ClUSTER_NAME')
SNS_TOPIC_NAME = os.getenv('SNS_TOPIC_NAME')

# the type of AMI / EC2 instance needed
AMI_TYPE = [
    # "amazon-linux-2",  # x86 based instances
    "amazon-linux-2-arm64",  # ARM instances, such as AWS Graviton based instances
    # "amazon-linux-2-gpu"  # GPU accelerated instances
]


logger = logging.getLogger()
logger.setLevel(logging.INFO)


try:
    # Retrieve the current region
    REGION = session.region_name
    logger.info(f"Current Region: {REGION}")

    # Retrieve the account ID
    ACCOUNT_ID = sts_client.get_caller_identity()["Account"]
    logger.info(f"Account ID: {ACCOUNT_ID}")
except ClientError as error:
    logger.error(f"An error occurred: {error}")
    logger.error("Could not retrieve the region or account ID.")
    # Fail the init phase: SNS_TOPIC_ARN below cannot be built without these values
    raise


SNS_TOPIC_ARN = f"arn:aws:sns:{REGION}:{ACCOUNT_ID}:{SNS_TOPIC_NAME}"


try:
    # Retrieve information about the specified EKS cluster
    response = eks_client.describe_cluster(name=EKS_ClUSTER_NAME)
    cluster_info = response['cluster']

    # Get the Kubernetes version
    EKS_VER = cluster_info.get('version')
    logger.info(f"The Kubernetes version of the EKS cluster '{EKS_ClUSTER_NAME}' is: {EKS_VER}")
except ClientError as error:
    logger.error(f"An error occurred: {error}")
    logger.error(f"Could not retrieve the Kubernetes version for the cluster '{EKS_ClUSTER_NAME}'.")
    # Fail the init phase: the AWS parameter name depends on EKS_VER
    raise


##############################
#  FUNCTION INIT PHASE END   #
##############################
#     INVOKE PHASE START     #
##############################


def construct_aws_ami_param_names(ami_type):
    """
    Construct the name of the AWS-owned Parameter Store parameter that
    holds the EKS optimized AMI ID.
    """
    return f"/aws/service/eks/optimized-ami/{EKS_VER}/{ami_type}/recommended/image_id"


def construct_self_ami_param_names(ami_type):
    """
    Construct the name of the self-owned Parameter Store parameter that
    holds the EKS optimized AMI ID the next launched EC2 instances should use.
    """
    return f"/eks/ami/latest/{ami_type}/id"


def replace_ami_id(text, new_ami_id):
    # Define the pattern for an AMI ID
    ami_id_pattern = r"ami-[0-9a-fA-F]{17}"

    # Replace all occurrences of the AMI ID in the text
    updated_text = re.sub(ami_id_pattern, new_ami_id, text)

    return updated_text


def describe_launch_template_latest(launch_template_id):
    # Retrieve the latest version of the launch template
    try:
        return ec2_client.describe_launch_template_versions(
            LaunchTemplateId=launch_template_id,
            Versions=['$Latest']
        )['LaunchTemplateVersions'][0]
    except ClientError as error:
        logger.error(f"An error occurred: {error}")
        # Re-raise so callers never receive None instead of a template version
        raise


def sns_publish(msg):
    sns_client.publish(
        TopicArn=SNS_TOPIC_ARN,
        Message=msg
    )


def get_launch_template_description(launch_template_id):
    # Extract the version description
    latest_version_description = describe_launch_template_latest(
        launch_template_id).get('VersionDescription')
    return latest_version_description


def get_launch_template_ami(launch_template_id):
    ami_id = describe_launch_template_latest(
        launch_template_id)['LaunchTemplateData'].get('ImageId')
    
    if ami_id:
        logger.info(f"The AMI ID used in Launch Template '{launch_template_id}' is: {ami_id}")
        return ami_id
    else:
        logger.error(f"Could not retrieve the AMI ID for the Launch Template '{launch_template_id}'.")


def update_launch_template_ami(launch_template_id, new_ami_id):
    try:
        # Retrieve the description of the latest version
        description = get_launch_template_description(launch_template_id)

        version_kwargs = {
            'LaunchTemplateId': launch_template_id,
            'SourceVersion': '$Latest',
            'LaunchTemplateData': {
                'ImageId': new_ami_id
            },
        }

        if description:
            logger.info(f"Description of the latest version of Launch Template '{launch_template_id}': {description}")
            # Keep any AMI ID mentioned in the description in step with the new AMI
            description = replace_ami_id(description, new_ami_id)
            logger.info(f"Description will be updated to: {description}")
            version_kwargs['VersionDescription'] = description
        else:
            # re.sub() cannot handle None, so skip the description entirely
            logger.warning(f"Did not retrieve the description for the latest version of the Launch Template '{launch_template_id}'. The new version will be created without a description.")

        # Create a new launch template version based on the latest one
        response = ec2_client.create_launch_template_version(**version_kwargs)

        # Set the default version to the new version
        new_version_number = response['LaunchTemplateVersion']['VersionNumber']
        ec2_client.modify_launch_template(
            LaunchTemplateId=launch_template_id,
            DefaultVersion=str(new_version_number)
        )
        logger.info(f"Launch Template updated to version {new_version_number} with AMI {new_ami_id}")
        return new_version_number
    except ClientError as error:
        logger.error(f"An error occurred: {error}")
        logger.error("Did not update the Launch Template.")


def update_launch_template():
    final_notification = list()
    counter = 1
    
    for each_ami_type in AMI_TYPE:
        launch_template_ids = os.getenv(
            f'LAUNCH_TEMPLATE_ID_{each_ami_type}'.replace("-", "_")
        ).split(",")
        
        # Retrieve the self-managed AMI ID
        self_ami_param = construct_self_ami_param_names(each_ami_type)
        self_managed_ami_id = ssm_client.get_parameter(
            Name=self_ami_param)['Parameter']['Value']
            
        for each_launch_template in launch_template_ids:
            template_ami_id = get_launch_template_ami(each_launch_template)
            
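            if template_ami_id != self_managed_ami_id:
                # update_launch_template_ami returns None when the update failed
                new_version = update_launch_template_ami(each_launch_template, self_managed_ami_id)
                if new_version is not None:
                    result = f"The AMI ID {template_ami_id} in the launch template {each_launch_template} is different from the one ({self_managed_ami_id}) in the parameter {self_ami_param}. Updated the AMI ID setting in the launch template (new version: {new_version})."
                else:
                    result = f"The AMI ID {template_ami_id} in the launch template {each_launch_template} is different from the one ({self_managed_ami_id}) in the parameter {self_ami_param}, but the launch template update failed. Check the function logs."
            else:
                result = f"The AMI ID {template_ami_id} in the launch template {each_launch_template} is the same as the one ({self_managed_ami_id}) specified in the parameter {self_ami_param}."
            
            final_notification.append(f"{counter}. {result}\n")
            counter += 1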
    
    sns_publish("".join(final_notification))

    return "AMI synchronization completed."


def update_param():
    final_notification = list()
    counter = 1
    
    for each_ami_type in AMI_TYPE:
        # Retrieve the managed AMI ID
        managed_ami_id = ssm_client.get_parameter(
            Name=construct_aws_ami_param_names(each_ami_type)
        )['Parameter']['Value']
    
        # Retrieve the self-managed AMI ID
        self_ami_param = construct_self_ami_param_names(each_ami_type)
        self_managed_ami_id = ssm_client.get_parameter(
            Name=self_ami_param)['Parameter']['Value']
        
        # Compare the AMI IDs
        if managed_ami_id != self_managed_ami_id:
            # Update the self-managed parameter
            ssm_client.put_parameter(
                Name=self_ami_param,
                Value=managed_ami_id,
                Type='String',
                Overwrite=True
            )
            result = "AMI ID updated in parameter."
            logger.info(result)
        else:
            result = "AMI IDs are in sync between the latest AWS EKS optimized AMI and the one stored in my own parameter. Parameter does not need to be updated."
            logger.info(result)
        
        final_notification.append(f"{str(counter)}. {result} The latest EKS optimized AMI ID: {managed_ami_id}, previous AMI ID stored in my parameter: {self_managed_ami_id}.\n")
        
        counter += 1
    
    sns_publish("".join(final_notification))
    
    return "AMI synchronization completed."


def lambda_handler(event, context):
    logger.info(event)
    
    if "code" in event.keys():
        if 1 == event["code"]:
            return {
                'statusCode': 200,
                'body': json.dumps(update_param())
            }
        elif 2 == event["code"]:
            return {
                'statusCode': 200,
                'body': json.dumps(update_launch_template())
            }
        else:
            message = "Event code is unexpected! The code value should be either 1 or 2"
    else:
        message = "Event message is malformed! The code key does not exist"

    # Report invalid payloads to the caller instead of silently returning None
    logger.error(message)
    return {
        'statusCode': 400,
        'body': json.dumps(message)
    }

On this Lambda function, create the environment variables EKS_ClUSTER_NAME, SNS_TOPIC_NAME, and LAUNCH_TEMPLATE_ID_{AMI type}, and set their values according to your own environment. The value of LAUNCH_TEMPLATE_ID_{AMI type} takes one of the following formats:
lt-0fxxxa0, if only one launch template is to be updated.
lt-0fxxxa0,lt-04xxxx16, if multiple launch templates are to be updated (comma-separated, without spaces).

Replace {AMI type} with the type of AMI you need, with hyphens changed to underscores, because that is how the code looks the variable up (e.g., LAUNCH_TEMPLATE_ID_amazon_linux_2_arm64). Make sure the same type is also uncommented in the AMI_TYPE list in the code.
  • amazon-linux-2 is the most common value, for x86 based instances.
  • amazon-linux-2-arm64 is for ARM instances, such as AWS Graviton based instances.
  • amazon-linux-2-gpu is for GPU accelerated instances.
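
To make the naming rule concrete, here is a small sketch of how the variable name is derived from the AMI type and how its value is split into launch template IDs. The helper names are hypothetical. Unlike the Lambda code above, this version also trims stray spaces and returns an empty list when the variable is unset.

```python
import os


def launch_template_env_key(ami_type):
    """Hyphens in the AMI type become underscores in the variable name."""
    return f"LAUNCH_TEMPLATE_ID_{ami_type}".replace("-", "_")


def launch_template_ids_for(ami_type, environ=os.environ):
    """Split the comma-separated value into a list of launch template IDs."""
    raw = environ.get(launch_template_env_key(ami_type), "")
    return [item.strip() for item in raw.split(",") if item.strip()]
```

For example, `launch_template_env_key("amazon-linux-2-arm64")` gives `LAUNCH_TEMPLATE_ID_amazon_linux_2_arm64`, which is the key to create in the Lambda configuration.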

2. Create two EventBridge schedules under the default schedule group.
- The first schedule triggers the Lambda function to check the latest EKS optimized AMI and synchronize its ID to the parameter. Use a cron expression to set the frequency you want, e.g., 0 6 * * ? * (every day at 06:00). Set the Lambda function as its target and configure its payload as:
{
    "code": 1
}

- The second schedule triggers the Lambda function to check the launch templates and synchronize the AMI ID to them. Use a cron expression that runs after the first schedule, e.g., 30 6 * * ? * (every day at 06:30). Set the Lambda function as its target and configure its payload as:
{
    "code": 2
}
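
The two schedules can also be created with the EventBridge Scheduler API. The sketch below assumes `scheduler_client` is `boto3.client("scheduler")`. The schedule names are made up for illustration, and `scheduler_role_arn` is a hypothetical IAM role that EventBridge Scheduler can assume to invoke the function; it is separate from the Lambda execution role described in the prerequisites.

```python
import json


def build_schedule_request(name, cron_fields, lambda_arn, scheduler_role_arn, code):
    """Build the keyword arguments for one create_schedule call."""
    return {
        "Name": name,
        "GroupName": "default",
        "ScheduleExpression": f"cron({cron_fields})",
        # Run at the exact time instead of within a flexible window
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": scheduler_role_arn,
            # This JSON string becomes the Lambda event payload
            "Input": json.dumps({"code": code}),
        },
    }


def create_schedules(scheduler_client, lambda_arn, scheduler_role_arn):
    """Create the parameter-sync and launch-template-sync schedules."""
    requests = [
        build_schedule_request(
            "eks-ami-sync-parameter", "0 6 * * ? *",
            lambda_arn, scheduler_role_arn, 1),
        build_schedule_request(
            "eks-ami-sync-launch-templates", "30 6 * * ? *",
            lambda_arn, scheduler_role_arn, 2),
    ]
    for request in requests:
        scheduler_client.create_schedule(**request)
    return [request["Name"] for request in requests]
```

Running the launch template schedule 30 minutes after the parameter schedule gives the first invocation time to write the new AMI ID before the second one reads it.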


Update Worker Nodes to Use the Latest AMI


Finally, update the worker nodes to use the latest AMI. In the EKS console, open the node group and change its launch template version to the latest one.
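
For managed node groups, the same change can likely be made through the EKS API rather than the console. This is a sketch under assumptions: the function name is hypothetical, `eks_client` is assumed to be `boto3.client("eks")`, and the caller needs the `eks:UpdateNodegroupVersion` permission, which the Lambda role in this post does not include.

```python
def update_nodegroup_launch_template(eks_client, cluster_name, nodegroup_name,
                                     launch_template_id, version_number):
    """Start a node group update that rolls nodes onto a launch template version."""
    response = eks_client.update_nodegroup_version(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        launchTemplate={
            "id": launch_template_id,
            # The API expects the version number as a string
            "version": str(version_number),
        },
    )
    # The update runs asynchronously; the ID can be used to track its status
    return response["update"]["id"]
```

The version number to pass is the one logged by the Lambda function when it creates a new launch template version.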


References


Retrieving Amazon EKS optimized Amazon Linux AMI IDs

amazon-eks-ami/CHANGELOG.md


Category: container Tags: public
