TZ Weekly 24-6
December 17, 2023
This site is scheduled for a series of major/minor upgrades, including optimizations and architectural revisions. The entire process will be documented, and this post will serve as a comprehensive index. For detailed information, please refer to the respective individual posts.
Release Note of 2024Q1 5th Upgrade (Feb. 11, 2024)
Version upgrades
[Major upgrade]
- Upgraded from Elasticsearch version 7.10 to OpenSearch version 1.3.
[Minor upgrade]
- Updated EKS add-ons version to latest.
- Upgraded Aurora engine: v3.05.1 → v3.05.2.
Optimization
- Accelerated the instance transition speed of an ASG when the corresponding EKS node group gets updated.
FinOps
- Migrated from the Container Insights solution to a Fluent Bit + OpenSearch solution because of the considerable bills that Container Insights observations would generate.
- Migrated interruptible Pods from on-demand EC2 instances to spot instances. Traffic is forwarded to all Pods of the replica set. Refer to Schedule Kubernetes Pods across AWS EC2 Instances of Different Purchase Options.
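One way to picture the spot migration above is a node affinity rule on the interruptible workload. The sketch below prints a Deployment fragment that requires nodes labeled `eks.amazonaws.com/capacityType: SPOT`. It assumes the spot instances belong to an EKS managed node group, which sets that label; the function name `render_spot_affinity` is made up for illustration, and the linked post may use a different mechanism.

```shell
#!/usr/bin/env bash
# Sketch: print a Deployment fragment that pins interruptible Pods to spot nodes.
# Assumption: nodes come from an EKS managed node group, which labels them with
# eks.amazonaws.com/capacityType (SPOT or ON_DEMAND).

render_spot_affinity() {
  # Hypothetical helper; the capacity type defaults to SPOT.
  local capacity_type=${1:-SPOT}
  cat <<EOF
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/capacityType
                    operator: In
                    values: ["${capacity_type}"]
EOF
}

render_spot_affinity SPOT
```

The printed fragment could be saved to a file and merged into an existing Deployment as a patch. Because the rule is `required`, the Pods stay pending if no spot node is available, which is only acceptable for workloads that tolerate interruption.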
Re-architecture
[Major update]
- Migrated the search engine service from Elasticsearch to OpenSearch.
Release Note of 2024Q1 4th Upgrade (Jan. 27, 2024 - Jan. 28, 2024)
Version upgrades
[Major upgrade]
- N/A
[Minor upgrade]
- Updated EKS add-ons version to latest.
FinOps
- Replaced the Container Insights logging solution, which generated considerable cost.
Re-architecture
[Major update]
- Switched to sending logs to OpenSearch via Fluent Bit.
-- Deprecated the logging solution based on CloudWatch Container Insights.
-- Deprecated the logging solution that wrote logs to EFS for Fluentd to pick up.
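As a rough illustration of the Fluent Bit to OpenSearch path described above, the sketch below prints an `[OUTPUT]` section for Fluent Bit's `opensearch` output plugin. Several things here are assumptions rather than details from this post: that the domain is Amazon OpenSearch Service with IAM (SigV4) authentication, the host name, the index name, and the helper name `render_opensearch_output`.

```shell
#!/usr/bin/env bash
# Sketch: print a Fluent Bit [OUTPUT] section that ships logs to OpenSearch.
# Assumptions (not from the original post): Amazon OpenSearch Service with
# IAM (SigV4) auth; host, region, and index values are placeholders.

render_opensearch_output() {
  # Hypothetical helper: host and region are required, index is optional.
  local host=$1 region=$2 index=${3:-app-logs}
  cat <<EOF
[OUTPUT]
    Name        opensearch
    Match       *
    Host        ${host}
    Port        443
    Index       ${index}
    AWS_Auth    On
    AWS_Region  ${region}
    tls         On
EOF
}

render_opensearch_output "search-example.us-west-2.es.amazonaws.com" "us-west-2"
```

With this layout, Fluent Bit running on each node sends records straight to the search domain, so neither CloudWatch Container Insights nor an EFS staging volume is needed in the log path.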
Release Note of 2024Q1 3rd Upgrade (Jan. 20, 2024 - Jan. 21, 2024)
Version upgrades
[Major upgrade]
- N/A
[Minor upgrade]
- Updated EKS-optimized AMIs to latest.
- Updated Terraform version to latest.
- Updated EKS add-ons version to latest.
FinOps
- Added one spot instance to the EKS node group to run interruptible workloads.
- Used spot instances to build multi-architecture container images.
Re-architecture
[Major update]
- N/A
[Minor update]
- Automated the process to update EKS worker node AMI. Refer to Update Worker Node AMI in Launch Template with Automation.
Housekeeping
- Cleaned up EKS bootstrap parameters. Refer to Cleanup EKS bootstrap parameters.
Release Note of 2024Q1 2nd Upgrade (Jan. 13, 2024 - Jan. 14, 2024)
Version upgrades
[Major upgrade]
- N/A
[Minor upgrade]
- Updated EKS-optimized AMIs to latest.
Performance tuning
- N/A
FinOps
- Reduced the worker node fleet by two by restricting Karpenter workloads to the dev/test environment. After checking a few options for reducing this cost, I decided to cut this resource by scaling the Karpenter deployment down to zero:
% kubectl scale deploy -n karpenter karpenter --replicas=0
deployment.apps/karpenter scaled
Note
In my case, the compute cost of running Karpenter does not produce comparable benefits.
- Reduced one worker node by increasing the amount of available IP addresses for the EC2 nodes. Refer to Increasing the amount of available IP addresses for the EC2 nodes.
As a result, the number of worker nodes has been cut by 50%, which means a 50% cost saving on the compute part of the EKS node group.
Before cost optimization:
After cost optimization:
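The node reduction from the second FinOps item comes from pod density: with IP prefixes, each address slot on a network interface holds a /28 prefix (16 addresses) instead of a single secondary IP. The sketch below works through that arithmetic using the formulas from the AWS max-pods guidance. The instance limits (3 network interfaces, 6 addresses each) and the cap of 110 Pods are illustrative assumptions, not the values of the nodes in this post.

```shell
#!/usr/bin/env bash
# Sketch: Pod capacity per node under the two VPC CNI addressing modes.
# The ENI count and addresses-per-ENI are instance-type limits; the values
# used at the bottom are illustrative, not taken from the original post.

# Secondary-IP mode: each ENI offers (ips_per_eni - 1) Pod addresses,
# plus 2 for Pods that use the host network.
max_pods_secondary_ip() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# Prefix mode: each of those slots holds a /28 prefix (16 addresses).
# The result is capped (assumed default 110, the recommended limit for smaller instances).
max_pods_prefix() {
  local enis=$1 ips_per_eni=$2 cap=${3:-110}
  local raw=$(( enis * (ips_per_eni - 1) * 16 + 2 ))
  echo $(( raw < cap ? raw : cap ))
}

echo "secondary IPs: $(max_pods_secondary_ip 3 6) pods"   # prints 17
echo "IP prefixes:   $(max_pods_prefix 3 6) pods"         # prints 110
```

With these example limits, a node moves from 17 to 110 schedulable Pods, so the same workload fits on fewer nodes as long as CPU and memory allow it.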
Fix issues
- Automatically register newly launched worker nodes to the target group.
# Request an IMDSv2 session token (valid for 6 hours).
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Look up this instance's ID from the instance metadata service.
EC2InstanceID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
# Register the instance with the target group.
aws elbv2 register-targets --target-group-arn arn:aws:elasticloadbalancing:us-west-2:<111122223333>:targetgroup/<target-group-name>/c8***19 --targets Id=${EC2InstanceID},Port=31xxx
Re-architecture
- Changed Pod IP allocation from individual secondary IP addresses to IP prefixes. For more information, refer to the second item in the FinOps section above.
Release Note of 2024Q1 1st Upgrade (Jan. 6, 2024 - Jan. 7, 2024)
Version upgrades
[Major upgrade]
- Upgraded container base image Ubuntu v20.04 → v22.04 and Python (core application) v3.9 → v3.10.
- Upgraded Django: v4.2 → v5.0.
[Minor upgrade]
- Upgraded RDS Aurora MySQL from version 3.02.2 to 3.05.1.
Fix issues
- Fixed KubeCost Pod creation issue. Refer to Troubleshoot KubeCost Pod Creating Issue.
Re-architecture
- Built and used a multi-architecture container image for the application. Refer to Build Your-Own Multi-architecture container image with Your Own Infrastructure.
- Deployed CloudWatch Container Insights as the EKS observability solution. Refer to Deploy Container Insights within EKS cluster.
Release Note of 2023Q4 3rd Upgrade (Dec. 30, 2023 - Jan. 1, 2024)
Version upgrades
[Major upgrade]
- Upgraded Django from major version 3 to 4.2.
[Minor upgrade]
- Upgraded and pinned dependency packages.
FinOps
- Cost saving on WorkMail: a 92% cost reduction (theoretical value, based on the current AWS accounts in AWS Organizations).
- Cost saving on EKS worker nodes: decommissioned all x86_64 worker nodes. Savings Plans rates for ARM-architecture instance families are 20% (reference value) lower than those for x86-architecture ones.
Re-architecture
- Moved ALL of the containers to the Graviton architecture!
- Migrated Amazon EFS CSI driver: self-managed → Amazon-managed (EKS add-on).
Release Note of 2023Q4 2nd Upgrade (Dec. 23 - 25)
Version upgrades
[Major upgrade]
- Upgraded Knative: v1.10 → v1.12.
- Upgraded Istio: v1.18 → v1.19.
[Minor upgrade]
- N/A
Release Note of 2023Q4 1st Upgrade (Dec. 16 - 17)
Version upgrades
[Major upgrade]
- Upgraded K8s: v1.27 → v1.28.
- Updated AWS load balancer controller: v2.4.5 → v2.6.2.
- Updated worker node AMI for the new K8s version: v1.27 → v1.28.
[Minor upgrade]
- Updated Amazon EKS add-ons.