Install Karpenter on an AWS EKS cluster
July 16, 2022
Note: Karpenter is designed to be cloud-provider agnostic but currently only supports AWS.
Karpenter deployment
Create an Amazon EKS cluster and node group. Then set up Karpenter and deploy Provisioner API.
1) Set the following environment variables:
export CLUSTER_NAME=karpenter-demo
export AWS_DEFAULT_REGION=us-west-2
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
2) Create a cluster with eksctl.
The following example configuration creates a basic cluster with one initial node and sets up an IAM OIDC provider for the cluster to enable IAM roles for Pods. Note: If you are using an existing EKS cluster, see Create an IAM OIDC provider for your cluster to determine whether you already have an OIDC provider or need to create one.
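For an existing cluster, one quick way to check whether an OIDC provider is already associated is sketched below (optional; not part of the original walkthrough, and it assumes CLUSTER_NAME is set as above):
# Show the cluster's OIDC issuer URL
aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text
# List the IAM OIDC providers in the account and look for a matching issuer ID
aws iam list-open-id-connect-providers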
eksctl create cluster -f - << EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "1.21"
managedNodeGroups:
  - instanceType: m5.large
    amiFamily: AmazonLinux2
    name: ${CLUSTER_NAME}-ng
    desiredCapacity: 1
    minSize: 1
    maxSize: 2
iam:
  withOIDC: true
EOF
This returns:
2022-07-17 05:35:00 [ℹ] eksctl version 0.105.0 2022-07-17 05:35:00 [ℹ] using region us-west-2 2022-07-17 05:35:01 [ℹ] setting availability zones to [us-west-2a us-west-2d us-west-2b] 2022-07-17 05:35:01 [ℹ] subnets for us-west-2a - public:192.168.0.0/19 private:192.168.96.0/19 2022-07-17 05:35:01 [ℹ] subnets for us-west-2d - public:192.168.32.0/19 private:192.168.128.0/19 2022-07-17 05:35:01 [ℹ] subnets for us-west-2b - public:192.168.64.0/19 private:192.168.160.0/19 2022-07-17 05:35:01 [ℹ] nodegroup "karpenter-demo-ng" will use "" [AmazonLinux2/1.21] 2022-07-17 05:35:01 [ℹ] using Kubernetes version 1.21 2022-07-17 05:35:01 [ℹ] creating EKS cluster "karpenter-demo" in "us-west-2" region with managed nodes 2022-07-17 05:35:01 [ℹ] 1 nodegroup (karpenter-demo-ng) was included (based on the include/exclude rules) 2022-07-17 05:35:01 [ℹ] will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s) 2022-07-17 05:35:01 [ℹ] will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s) 2022-07-17 05:35:01 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=karpenter-demo' 2022-07-17 05:35:01 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "karpenter-demo" in "us-west-2" 2022-07-17 05:35:01 [ℹ] CloudWatch logging will not be enabled for cluster "karpenter-demo" in "us-west-2" 2022-07-17 05:35:01 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-2 --cluster=karpenter-demo' 2022-07-17 05:35:01 [ℹ] 2 sequential tasks: { create cluster control plane "karpenter-demo", 2 sequential sub-tasks: { 4 sequential sub-tasks: { wait for control plane to become ready, associate IAM OIDC provider, 2 sequential sub-tasks: { create IAM role for serviceaccount "kube-system/aws-node", create serviceaccount "kube-system/aws-node", }, restart daemonset "kube-system/aws-node", }, create managed nodegroup "karpenter-demo-ng", } } 2022-07-17 05:35:01 [ℹ] building cluster stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:35:04 [ℹ] deploying stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:35:34 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:36:05 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:37:06 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:38:07 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:39:08 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:40:09 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:41:10 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:42:11 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:43:12 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:44:13 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:45:14 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-cluster" 2022-07-17 05:47:23 [ℹ] building iamserviceaccount stack "eksctl-karpenter-demo-addon-iamserviceaccount-kube-system-aws-node" 2022-07-17 05:47:24 [ℹ] deploying stack "eksctl-karpenter-demo-addon-iamserviceaccount-kube-system-aws-node" 2022-07-17 05:47:25 [ℹ] waiting for CloudFormation stack 
"eksctl-karpenter-demo-addon-iamserviceaccount-kube-system-aws-node" 2022-07-17 05:47:56 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-addon-iamserviceaccount-kube-system-aws-node" 2022-07-17 05:47:57 [ℹ] serviceaccount "kube-system/aws-node" already exists 2022-07-17 05:47:57 [ℹ] updated serviceaccount "kube-system/aws-node" 2022-07-17 05:47:59 [ℹ] daemonset "kube-system/aws-node" restarted 2022-07-17 05:48:01 [ℹ] building managed nodegroup stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:48:01 [ℹ] deploying stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:48:02 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:48:33 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:49:16 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:50:57 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:51:55 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-nodegroup-karpenter-demo-ng" 2022-07-17 05:51:55 [ℹ] waiting for the control plane availability... 2022-07-17 05:51:55 [✔] saved kubeconfig as "/Users/l***u/.kube/config" 2022-07-17 05:51:55 [ℹ] no tasks 2022-07-17 05:51:55 [✔] all EKS cluster resources for "karpenter-demo" have been created 2022-07-17 05:51:56 [ℹ] nodegroup "karpenter-demo-ng" has 1 node(s) 2022-07-17 05:51:56 [ℹ] node "ip-192-168-19-125.us-west-2.compute.internal" is ready 2022-07-17 05:51:56 [ℹ] waiting for at least 1 node(s) to become ready in "karpenter-demo-ng" 2022-07-17 05:51:56 [ℹ] nodegroup "karpenter-demo-ng" has 1 node(s) 2022-07-17 05:51:56 [ℹ] node "ip-192-168-19-125.us-west-2.compute.internal" is ready 2022-07-17 05:51:58 [ℹ] kubectl command should work with "/Users/l***u/.kube/config", try 'kubectl get nodes' 2022-07-17 05:51:58 [✔] EKS cluster "karpenter-demo" in "us-west-2" region is ready
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"
echo $CLUSTER_ENDPOINT
https://3402****AADD.gr7.us-west-2.eks.amazonaws.com
kubectl get serviceaccount -n kube-system aws-node -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/eksctl-karpenter-demo-addon-iamserviceaccoun-Role1-OUQZSEK2KI49
  ...
  labels:
    app.kubernetes.io/managed-by: eksctl
  name: aws-node
  namespace: kube-system
  ...
secrets:
- name: aws-node-token-kvtfr
3) Create subnet tags kubernetes.io/cluster/$CLUSTER_NAME.
Karpenter discovers subnets tagged kubernetes.io/cluster/$CLUSTER_NAME. Add this tag to associated subnets of your cluster. Retrieve the subnet IDs and tag them with the cluster name.
SUBNET_IDS=$(aws cloudformation describe-stacks \
    --stack-name eksctl-${CLUSTER_NAME}-cluster \
    --query 'Stacks[].Outputs[?OutputKey==`SubnetsPrivate`].OutputValue' \
    --output text)
aws ec2 create-tags \
    --resources $(echo $SUBNET_IDS | tr ',' '\n') \
    --tags Key="kubernetes.io/cluster/${CLUSTER_NAME}",Value=
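Optionally, you can confirm that the tag was applied to the expected subnets; a quick sanity check (not part of the original walkthrough) might look like this:
# List subnet IDs that carry the cluster discovery tag
aws ec2 describe-subnets \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/${CLUSTER_NAME}" \
  --query "Subnets[].SubnetId" --output text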
4) Create the Karpenter node's IAM role.
Kubernetes worker nodes launched by Karpenter must run with an EC2 instance profile that grants the permissions necessary to run containers and configure networking. Karpenter discovers the instance profile by the name KarpenterNodeInstanceProfile-${ClusterName}, which is backed by the KarpenterNodeRole-${ClusterName} IAM role created below.
TEMPOUT=$(mktemp)
curl -fsSL https://karpenter.sh/v0.13.2/getting-started/getting-started-with-eksctl/cloudformation.yaml > $TEMPOUT \
&& aws cloudformation deploy \
    --stack-name Karpenter-${CLUSTER_NAME} \
    --template-file ${TEMPOUT} \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides ClusterName=${CLUSTER_NAME}
This returns:
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - Karpenter-karpenter-demo
PS: cloudformation.yaml file content:
AWSTemplateFormatVersion: "2010-09-09"
Description: Resources used by https://github.com/aws/karpenter
Parameters:
  ClusterName:
    Type: String
    Description: "EKS cluster name"
Resources:
  KarpenterNodeInstanceProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      InstanceProfileName: !Sub "KarpenterNodeInstanceProfile-${ClusterName}"
      Path: "/"
      Roles:
        - Ref: "KarpenterNodeRole"
  KarpenterNodeRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: !Sub "KarpenterNodeRole-${ClusterName}"
      Path: /
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: !Sub "ec2.${AWS::URLSuffix}"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore"
  KarpenterControllerPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: !Sub "KarpenterControllerPolicy-${ClusterName}"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Resource: "*"
            Action:
              # Write Operations
              - ec2:CreateLaunchTemplate
              - ec2:CreateFleet
              - ec2:RunInstances
              - ec2:CreateTags
              - iam:PassRole
              - ec2:TerminateInstances
              - ec2:DeleteLaunchTemplate
              # Read Operations
              - ec2:DescribeLaunchTemplates
              - ec2:DescribeInstances
              - ec2:DescribeSecurityGroups
              - ec2:DescribeSubnets
              - ec2:DescribeInstanceTypes
              - ec2:DescribeInstanceTypeOfferings
              - ec2:DescribeAvailabilityZones
              - ec2:DescribeSpotPriceHistory
              - ssm:GetParameter
              - pricing:GetProducts
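If you want to confirm that the stack created the expected IAM resources, something like the following should work (an optional check; the resource names come from the template above):
# Inspect the instance profile and the role attached to it
aws iam get-instance-profile --instance-profile-name "KarpenterNodeInstanceProfile-${CLUSTER_NAME}"
aws iam get-role --role-name "KarpenterNodeRole-${CLUSTER_NAME}"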
5) Grant access to instances using the profile to connect to the cluster. Add the Karpenter node role to your aws-auth ConfigMap.
eksctl create iamidentitymapping \
  --username system:node:{{EC2PrivateDNSName}} \
  --cluster ${CLUSTER_NAME} \
  --arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME} \
  --group system:bootstrappers \
  --group system:nodes
This returns:
2022-07-16 21:49:52 [ℹ] adding identity "arn:aws:iam::111122223333:role/KarpenterNodeRole-karpenter-demo" to auth ConfigMap
This should update the aws-auth ConfigMap.
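The output below was captured with kubectl describe; assuming kubectl is pointed at this cluster, you can reproduce it with:
kubectl describe configmap aws-auth -n kube-system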
Name:         aws-auth
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
mapRoles:
----
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::123456789012:role/eksctl-karpenter-demo-nodegroup-k-NodeInstanceRole-9BVA46MVZMRO
  username: system:node:{{EC2PrivateDNSName}}
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::123456789012:role/KarpenterNodeRole-karpenter-demo
  username: system:node:{{EC2PrivateDNSName}}

mapUsers:
----
[]

BinaryData
====

Events:  <none>
6) Create the KarpenterController IAM role. Karpenter requires permissions such as launching and terminating instances. The following command creates an AWS IAM role for the Karpenter controller's Kubernetes service account and associates them using IRSA (IAM Roles for Service Accounts).
eksctl create iamserviceaccount \
  --cluster "${CLUSTER_NAME}" --name karpenter --namespace karpenter \
  --role-name "${CLUSTER_NAME}-karpenter" \
  --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
  --role-only \
  --approve
This returns:
2022-07-16 21:50:55 [ℹ] 1 existing iamserviceaccount(s) (kube-system/aws-node) will be excluded
2022-07-16 21:50:55 [ℹ] 1 iamserviceaccount (karpenter/karpenter) was included (based on the include/exclude rules)
2022-07-16 21:50:55 [!] serviceaccounts that exist in Kubernetes will be excluded, use --override-existing-serviceaccounts to override
2022-07-16 21:50:55 [ℹ] 1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2022-07-16 21:50:55 [ℹ] building iamserviceaccount stack "eksctl-karpenter-demo-addon-iamserviceaccount-karpenter-karpenter"
2022-07-16 21:50:55 [ℹ] deploying stack "eksctl-karpenter-demo-addon-iamserviceaccount-karpenter-karpenter"
2022-07-16 21:50:56 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-addon-iamserviceaccount-karpenter-karpenter"
2022-07-16 21:51:27 [ℹ] waiting for CloudFormation stack "eksctl-karpenter-demo-addon-iamserviceaccount-karpenter-karpenter"
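As an optional check (not shown in the original output), you can confirm the controller role was created and note its ARN, which is used in the next step:
aws iam get-role --role-name "${CLUSTER_NAME}-karpenter" --query "Role.Arn" --output text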
7) Install Karpenter Helm chart.
export KARPENTER_IAM_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
Add a chart repository, if you haven't done so.
helm repo add karpenter https://charts.karpenter.sh
"karpenter" has been added to your repositories
Update information of available charts locally from chart repositories, if you haven't done so.
helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "karpenter" chart repository
...Successfully got an update from the "cilium" chart repository
Update Complete. ⎈Happy Helming!⎈
helm upgrade karpenter karpenter/karpenter --install --namespace karpenter \
  --create-namespace --version v0.13.2 \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
  --set clusterName=${CLUSTER_NAME} \
  --set clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --wait # for the defaulting webhook to install before creating a Provisioner
This returns:
Release "karpenter" does not exist. Installing it now.
NAME: karpenter
LAST DEPLOYED: Sat Jul 16 22:01:24 2022
NAMESPACE: karpenter
STATUS: deployed
REVISION: 1
TEST SUITE: None
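Before moving on, it is worth confirming the Karpenter controller is actually running; a simple check is:
# The Karpenter Pods should be in a Running state in the karpenter namespace
kubectl get pods -n karpenter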
8) (Optional) Enable debug logging.
kubectl patch configmap config-logging -n karpenter --patch '{"data":{"loglevel.controller":"debug"}}'
configmap/config-logging patched
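To confirm the new log level took effect (optional; the key name follows the patch above):
kubectl get configmap config-logging -n karpenter -o yaml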
9) Deploy the provisioner by applying the following Karpenter Provisioner spec. It uses layered constraints, with requirements for architecture (arm64 and amd64) and capacity type (Spot and On-Demand).
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["arm64", "amd64"]
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector:
      kubernetes.io/cluster/$CLUSTER_NAME: '*'
    securityGroupSelector:
      kubernetes.io/cluster/$CLUSTER_NAME: '*'
  ttlSecondsAfterEmpty: 30
EOF
This returns:
provisioner.karpenter.sh/default created
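You can review the provisioner as stored in the cluster, including any defaults added by Karpenter's webhook, with:
kubectl get provisioner default -o yaml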
10) Run the application deployment on a specific capacity type, instance type, hardware architecture, and Availability Zone using Pod scheduling constraints.
Sample deployment
In the following sample deployment, we define a nodeSelector that uses topology.kubernetes.io/zone to choose an Availability Zone, karpenter.sh/capacity-type: on-demand and kubernetes.io/arch: arm64 to request an On-Demand arm64 instance, and node.kubernetes.io/instance-type to pin a specific instance type, so that Karpenter launches new nodes matching these Pod scheduling constraints.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: t4g.2xlarge
        karpenter.sh/capacity-type: on-demand
        topology.kubernetes.io/zone: us-west-2a
        kubernetes.io/arch: arm64
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.5
          resources:
            requests:
              cpu: 1
EOF
OR
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: r6gd.xlarge
        karpenter.sh/capacity-type: on-demand
        topology.kubernetes.io/zone: us-west-2a
        kubernetes.io/arch: arm64
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.5
          resources:
            requests:
              cpu: 1
EOF
This returns:
deployment.apps/inflate created
1) Scale the above deployment.
kubectl scale deployment inflate --replicas 3
deployment.apps/inflate scaled
2) Review the Karpenter Pod logs for events and more details.
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
2022-07-16T14:08:17.805Z DEBUG controller.provisioning Discovered subnets: [subnet-0b946c0c2c8c19b45 (us-west-2a) subnet-0417d1570d137d12e (us-west-2d) subnet-02fd9368eedb5e2e2 (us-west-2b)] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:17.913Z INFO controller.provisioning Computed packing of 1 node(s) for 3 pod(s) with instance type option(s) [r6gd.xlarge] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:18.091Z DEBUG controller.provisioning Discovered security groups: [sg-054cafe14bfa6f85d] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:18.094Z DEBUG controller.provisioning Discovered kubernetes version 1.22 {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:18.177Z DEBUG controller.provisioning Discovered ami-0bea0f4e1d10dcca7 for query /aws/service/eks/optimized-ami/1.22/amazon-linux-2-arm64/recommended/image_id {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:18.177Z DEBUG controller.provisioning Discovered caBundle, length 1099 {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:18.406Z DEBUG controller.provisioning Created launch template, Karpenter-karpenter-demo-9378935887162504259 {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:20.636Z INFO controller.provisioning Launched instance: i-02a1b104a0ad0356e, hostname: ip-192-168-139-148.us-west-2.compute.internal, type: r6gd.xlarge, zone: us-west-2a, capacityType: on-demand {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:20.676Z INFO controller.provisioning Bound 3 pod(s) to node ip-192-168-139-148.us-west-2.compute.internal {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:08:20.676Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:09:19.830Z DEBUG controller.provisioning Discovered 408 EC2 instance types {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:09:19.987Z DEBUG controller.provisioning Discovered subnets: [subnet-0b946c0c2c8c19b45 (us-west-2a) subnet-0417d1570d137d12e (us-west-2d) subnet-02fd9368eedb5e2e2 (us-west-2b)] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:09:20.141Z DEBUG controller.provisioning Discovered EC2 instance types zonal offerings {"commit": "fd19ba2", "provisioner": "default"} --- 2022-07-17T00:17:43.818Z DEBUG controller.events Normal {"commit": "062a029", "object": {"kind":"Pod","namespace":"default","name":"inflate-f9587f7c6-78l7k","uid":"6af76de8-ab74-4c32-8283-3aa38994982e","apiVersion":"v1","resourceVersion":"30364"}, "reason": "NominatePod", "message": "Pod should schedule on ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:17:43.818Z DEBUG controller.events Normal {"commit": "062a029", "object": {"kind":"Pod","namespace":"default","name":"inflate-f9587f7c6-f46tx","uid":"d66e590a-4692-448c-8e2d-07b246692d80","apiVersion":"v1","resourceVersion":"30370"}, "reason": "NominatePod", "message": "Pod should schedule on ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:17:43.818Z DEBUG controller.events Normal {"commit": "062a029", "object": {"kind":"Pod","namespace":"default","name":"inflate-f9587f7c6-4gpfr","uid":"9fd9ac38-cb52-4cd4-9aae-9e9103e7c426","apiVersion":"v1","resourceVersion":"30373"}, "reason": "NominatePod", "message": "Pod should schedule on ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:35.999Z INFO controller.node Added TTL to empty node {"commit": "062a029", "node": 
"ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:36.038Z INFO controller.node Added TTL to empty node {"commit": "062a029", "node": "ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:45.076Z DEBUG controller.events Normal {"commit": "062a029", "object": {"kind":"Pod","namespace":"default","name":"inflate-f9587f7c6-78l7k","uid":"6af76de8-ab74-4c32-8283-3aa38994982e","apiVersion":"v1","resourceVersion":"30691"}, "reason": "NominatePod", "message": "Pod should schedule on ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:45.076Z DEBUG controller.events Normal {"commit": "062a029", "object": {"kind":"Pod","namespace":"default","name":"inflate-f9587f7c6-f46tx","uid":"d66e590a-4692-448c-8e2d-07b246692d80","apiVersion":"v1","resourceVersion":"30693"}, "reason": "NominatePod", "message": "Pod should schedule on ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:45.076Z DEBUG controller.events Normal {"commit": "062a029", "object": {"kind":"Pod","namespace":"default","name":"inflate-f9587f7c6-4gpfr","uid":"9fd9ac38-cb52-4cd4-9aae-9e9103e7c426","apiVersion":"v1","resourceVersion":"30695"}, "reason": "NominatePod", "message": "Pod should schedule on ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:46.950Z INFO controller.node Removed emptiness TTL from node {"commit": "062a029", "node": "ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:18:46.970Z INFO controller.node Removed emptiness TTL from node {"commit": "062a029", "node": "ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:22:44.900Z DEBUG controller.node-state Discovered 539 EC2 instance types {"commit": "062a029", "node": "ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:22:45.078Z DEBUG controller.node-state Discovered subnets: [subnet-0f6ba29c24dd48d8c (us-west-2d) subnet-01b827461efe10187 (us-west-2a) subnet-044338d5111ed565b (us-west-2b)] {"commit": "062a029", "node": "ip-192-168-111-244.us-west-2.compute.internal"} 2022-07-17T00:22:45.228Z DEBUG controller.node-state Discovered EC2 instance types zonal offerings {"commit": "062a029", "node": "ip-192-168-111-244.us-west-2.compute.internal"}
3) Validate the application Pods with the following commands; they should be in the Running state.
kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type
NAME                                            STATUS   ROLES    AGE     VERSION                INSTANCE-TYPE   ARCH    CAPACITY-TYPE
ip-192-168-111-244.us-west-2.compute.internal   Ready    <none>   6m15s   v1.21.12-eks-5308cf7   t4g.2xlarge     arm64   on-demand
ip-192-168-19-125.us-west-2.compute.internal    Ready    <none>   153m    v1.21.12-eks-5308cf7   m5.large        amd64
kubectl get pods -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP                NODE                                            NOMINATED NODE   READINESS GATES
inflate-f9587f7c6-4gpfr   1/1     Running   0          8m59s   192.168.116.156   ip-192-168-111-244.us-west-2.compute.internal   <none>           <none>
inflate-f9587f7c6-78l7k   1/1     Running   0          8m59s   192.168.102.188   ip-192-168-111-244.us-west-2.compute.internal   <none>           <none>
inflate-f9587f7c6-f46tx   1/1     Running   0          8m59s   192.168.117.10    ip-192-168-111-244.us-west-2.compute.internal   <none>           <none>
We can see that Karpenter applied layered constraints to launch a node satisfying multiple scheduling constraints of the workload: instance type, a specific Availability Zone, and hardware architecture.
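Another way to confirm the layered constraints is to filter nodes by the labels requested in the deployment's nodeSelector (a sketch using the labels shown above):
# List only nodes that match the capacity type and architecture requested by the workload
kubectl get nodes \
  -l karpenter.sh/capacity-type=on-demand,kubernetes.io/arch=arm64 \
  -L topology.kubernetes.io/zone,node.kubernetes.io/instance-type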
Groupless Node upgrades
When using node groups (self-managed or managed) with an EKS cluster, upgrading worker nodes to a newer Kubernetes version requires either migrating to a new node group (self-managed) or rolling out a new Auto Scaling group of worker nodes (managed), as described in the managed node group update behavior documentation. With Karpenter's groupless autoscaling, node upgrades instead work through the node expiry time-to-live value.
The Karpenter Provisioner API supports node expiry: a node expires once it reaches the time-to-live value (ttlSecondsUntilExpired). The same mechanism can be used to upgrade nodes: they are terminated after the configured period and replaced with newer nodes.
Note: Karpenter supports using custom launch templates. When using a custom launch template, you are taking responsibility for maintaining the launch template, including updating which AMI is used (that is, for security updates). In the default configuration, Karpenter will use the latest version of the EKS optimized AMI, which is maintained by AWS.
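For reference, the default AMI Karpenter resolves comes from the public EKS optimized AMI SSM parameter, the same path that appears in the controller logs above; you can inspect it yourself (example for Kubernetes 1.22 on arm64):
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.22/amazon-linux-2-arm64/recommended/image_id \
  --query "Parameter.Value" --output text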
1) Validate the current EKS cluster Kubernetes version with the following command.
aws eks describe-cluster --name ${CLUSTER_NAME} | grep -i version
"version": "1.22",
"platformVersion": "eks.4",
"alpha.eksctl.io/eksctl-version": "0.105.0",
2) Deploy a PodDisruptionBudget (PDB) for your application deployment. A PDB limits the number of Pods of a replicated application that are down simultaneously due to voluntary disruptions.
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: inflate
EOF
This returns:
Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
poddisruptionbudget.policy/inflate-pdb created
kubectl get pdb
NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
inflate-pdb   2               N/A               1                     64s
kubectl get deploy inflate
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
inflate   3/3     3            3           8m37s
3) Upgrade the EKS cluster to a newer Kubernetes version. We can see that the cluster was upgraded successfully to 1.22.
aws eks describe-cluster --name ${CLUSTER_NAME} | grep -i version
"version": "1.22",
"platformVersion": "eks.4",
"alpha.eksctl.io/eksctl-version": "0.105.0",
4) Checking our workload and the node created by Karpenter earlier, we can see that Karpenter used the latest EKS optimized AMI for the EKS cluster version at the time the node was launched, which was the earlier cluster version 1.21.
kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type
NAME                                            STATUS   ROLES    AGE     VERSION               INSTANCE-TYPE   ARCH    CAPACITY-TYPE
ip-192-168-139-148.us-west-2.compute.internal   Ready    <none>   7m47s   v1.22.9-eks-810597c   r6gd.xlarge     arm64   on-demand
ip-192-168-28-196.us-west-2.compute.internal    Ready    <none>   35m     v1.22.9-eks-810597c   m5.large        amd64
kubectl get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP                NODE                                            NOMINATED NODE   READINESS GATES
inflate-599c98dd86-dfmts   1/1     Running   0          8m36s   192.168.148.13    ip-192-168-139-148.us-west-2.compute.internal   <none>           <none>
inflate-599c98dd86-jkjss   1/1     Running   0          8m36s   192.168.147.112   ip-192-168-139-148.us-west-2.compute.internal   <none>           <none>
inflate-599c98dd86-jzznj   1/1     Running   0          8m36s   192.168.140.10    ip-192-168-139-148.us-west-2.compute.internal   <none>           <none>
5) Now, let's reconfigure the Karpenter provisioner and append ttlSecondsUntilExpired. This adds node expiry, which lets existing nodes be terminated and replaced with new ones matching the current EKS cluster Kubernetes version, 1.22.
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["arm64", "amd64"]
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector:
      kubernetes.io/cluster/$CLUSTER_NAME: '*'
    securityGroupSelector:
      kubernetes.io/cluster/$CLUSTER_NAME: '*'
  ttlSecondsAfterEmpty: 30
  ttlSecondsUntilExpired: 1800
EOF
This returns:
provisioner.karpenter.sh/default configured
Note: If ttlSecondsUntilExpired is nil, the feature is disabled and nodes will never expire. As an example, you can set node expiry to 30 days with ttlSecondsUntilExpired: 2592000 (30 days = 60 * 60 * 24 * 30 seconds).
6) Review the Karpenter pod logs for events and more details.
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
2022-07-16T14:09:20.141Z DEBUG controller.provisioning Discovered EC2 instance types zonal offerings {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:11:45.805Z DEBUG controller.aws.launchtemplate Deleted launch template lt-01814ee28aa5a0998 {"commit": "fd19ba2"} 2022-07-16T14:14:21.078Z DEBUG controller.provisioning Discovered 408 EC2 instance types {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:14:21.189Z DEBUG controller.provisioning Discovered subnets: [subnet-0b946c0c2c8c19b45 (us-west-2a) subnet-0417d1570d137d12e (us-west-2d) subnet-02fd9368eedb5e2e2 (us-west-2b)] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:14:21.349Z DEBUG controller.provisioning Discovered EC2 instance types zonal offerings {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:17:39.163Z DEBUG controller.provisioning Discovered subnets: [subnet-0b946c0c2c8c19b45 (us-west-2a) subnet-0417d1570d137d12e (us-west-2d) subnet-02fd9368eedb5e2e2 (us-west-2b)] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:17:39.166Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:19:22.370Z DEBUG controller.provisioning Discovered 408 EC2 instance types {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:19:22.417Z DEBUG controller.provisioning Discovered subnets: [subnet-0b946c0c2c8c19b45 (us-west-2a) subnet-0417d1570d137d12e (us-west-2d) subnet-02fd9368eedb5e2e2 (us-west-2b)] {"commit": "fd19ba2", "provisioner": "default"} 2022-07-16T14:19:22.559Z DEBUG controller.provisioning Discovered EC2 instance types zonal offerings {"commit": "fd19ba2", "provisioner": "default"}
Note: In the preceding logs, we can see that the PodDisruptionBudget was respected by Karpenter; it then discovered Kubernetes version 1.22, used the latest EKS optimized AMI for 1.22, and launched a new node for the workload. The old node was later cordoned, drained, and deleted by Karpenter.
If we validate the application Pods with the following commands, we can see that the Karpenter-launched nodes have been upgraded to 1.22, the same as the EKS cluster Kubernetes version.
kubectl get node -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type
NAME                                            STATUS   ROLES    AGE   VERSION               INSTANCE-TYPE   ARCH    CAPACITY-TYPE
ip-192-168-139-148.us-west-2.compute.internal   Ready    <none>   13m   v1.22.9-eks-810597c   r6gd.xlarge     arm64   on-demand
ip-192-168-28-196.us-west-2.compute.internal    Ready    <none>   41m   v1.22.9-eks-810597c   m5.large        amd64
kubectl get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP                NODE                                            NOMINATED NODE   READINESS GATES
inflate-599c98dd86-dfmts   1/1     Running   0          15m   192.168.148.13    ip-192-168-139-148.us-west-2.compute.internal   <none>           <none>
inflate-599c98dd86-jkjss   1/1     Running   0          15m   192.168.147.112   ip-192-168-139-148.us-west-2.compute.internal   <none>           <none>
inflate-599c98dd86-jzznj   1/1     Running   0          15m   192.168.140.10    ip-192-168-139-148.us-west-2.compute.internal   <none>           <none>
In this demonstration, Karpenter respected the PDB and applied node expiry to upgrade the nodes it had launched. Node expiry can be used as a means of upgrading or repacking nodes so that they are retired and replaced with updated versions. See How Karpenter nodes are deprovisioned in the Karpenter documentation for more on ttlSecondsUntilExpired and ttlSecondsAfterEmpty.
Cleanup
Delete the Provisioner custom resources that were created.
kubectl delete provisioner default
provisioner.karpenter.sh "default" deleted
Remove Karpenter and delete the infrastructure from your AWS account.
helm uninstall karpenter --namespace karpenter
release "karpenter" uninstalled
eksctl delete iamserviceaccount --cluster ${CLUSTER_NAME} --name karpenter --namespace karpenter
2022-07-16 22:25:55 [ℹ] 1 iamserviceaccount (karpenter/karpenter) was included (based on the include/exclude rules)
2022-07-16 22:25:59 [ℹ] 1 task: { 2 sequential sub-tasks: { delete IAM role for serviceaccount "karpenter/karpenter" [async], delete serviceaccount "karpenter/karpenter", } }
2022-07-16 22:25:59 [ℹ] will delete stack "eksctl-karpenter-demo-addon-iamserviceaccount-karpenter-karpenter"
2022-07-16 22:26:00 [ℹ] serviceaccount "karpenter/karpenter" was already deleted
aws cloudformation delete-stack --stack-name Karpenter-${CLUSTER_NAME}
aws ec2 describe-launch-templates \
  | jq -r ".LaunchTemplates[].LaunchTemplateName" \
  | grep -i Karpenter-${CLUSTER_NAME} \
  | xargs -I{} aws ec2 delete-launch-template --launch-template-name {}
eksctl delete cluster --name ${CLUSTER_NAME}
Conclusion
Karpenter can scale nodes quickly and with very little latency. In this blog, we demonstrated how nodes can be scaled for different use cases with the Provisioner API, leveraging well-known Kubernetes labels and taints together with Pod scheduling constraints in the deployment so that Pods land on Karpenter-provisioned nodes. This shows that we can run different types of workloads, each with its own capacity and hardware requirements. We also saw the upgrade behavior for nodes launched by Karpenter by enabling the node expiry time ttlSecondsUntilExpired in the Provisioner API.
References
Managing Pod Scheduling Constraints and Groupless Node Upgrades with Karpenter in Amazon EKS
Introducing Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler
Kubernetes 节点弹性伸缩开源组件 Karpenter 实践:部署GPU推理应用
Chinese WeChat articles:
Karpenter:一个开源的高性能 Kubernetes 集群自动缩放器