Automate AWS Aurora cluster start and stop triggered by time

January 30, 2019

Background

After my blog's database was migrated to Aurora, its performance improved considerably. Now I'm focusing on reducing the Aurora database cost.

Hello, everybody. I am Leo Du, a builder on AWS and Google Cloud. Today I will share how I reduced my Aurora database cost by about 30 percent compared to a no-upfront RI, and by about 22 percent compared to Aurora Serverless. Each comparison assumes the same instance type, size and region.

Objective

The overall objective is to reduce the Aurora cost. Because the database serves my personal blog, I can stop the Aurora cluster at night to avoid wasting money.

Before I move to the detailed design part, first let me list the objectives of the design and the deliverable.

The architecture should be secure, cost-efficient, easy to maintain, and so on.

Functionally, the design should allow the start and stop times of the Aurora cluster to be adjusted. This includes, but is not limited to, the ability to start and stop the Aurora cluster, minute-level control precision, the flexibility to specify time points for different conditions, and fully automated actions.

Because anything could change in the future, committing to cloud resources in advance is not an option here.

Last but not least, the implementation of this design should not introduce too much additional cost.

Constraint

A long time ago, when the blog application was hosted on RDS MySQL, I used AWS Instance Scheduler (version 2.2.2.0) to start and stop EC2 and RDS MySQL instances. That version, however, does not support actions against Aurora clusters, so I had to find a replacement solution to start and stop the Aurora cluster.

Solution

To meet the aforementioned requirements, the design combines Systems Manager - Automation, CloudWatch Events and IAM. The overall architecture is illustrated below.

Detailed Solution


The design uses Systems Manager - Automation, CloudWatch Events and IAM. The following paragraphs explain how each requirement is met.

To meet the security requirements, this design uses IAM features including, but not limited to, assuming roles and passing roles to grant access between service components. As always, I use policies to define permissions and narrow the access scope down to least privilege.

To simplify maintenance, the architecture should minimize maintenance tasks related to the underlying infrastructure. Both Systems Manager - Automation and CloudWatch Events are serverless, so there is no underlying infrastructure to maintain.

To operate the Aurora cluster, and specifically to start and stop it, the deliverable must be able to call the RDS APIs StartDBCluster and StopDBCluster. Systems Manager - Automation supports document definitions that can call these two APIs.

The deliverable should also have minute-level time precision and be flexible enough to support different conditions. CloudWatch Events supports cron expressions, which are flexible and can define time points at minute-level granularity.
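As a small illustration of the cron format, the sketch below converts a daily local time into the UTC cron fields that CloudWatch Events expects (minutes, hours, day-of-month, month, day-of-week, year). The function name `daily_cron_utc` is my own for this example and not part of any AWS SDK. The GMT+8 offset matches the time zone used later in this article.

```python
from datetime import datetime, timedelta, timezone


def daily_cron_utc(hour: int, minute: int, utc_offset_hours: int) -> str:
    """Return CloudWatch Events cron fields (in UTC) for a daily local time.

    The date used below is arbitrary; only the hour and minute matter
    for a schedule that fires every day.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime(2019, 1, 1, hour, minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    # Fields: minutes hours day-of-month month day-of-week year
    return f"{utc.minute} {utc.hour} * * ? *"


# 07:00 and 22:00 in GMT+8, expressed as UTC cron fields
print(daily_cron_utc(7, 0, 8))
print(daily_cron_utc(22, 0, 8))
```

When a rule is created through the API rather than the console, these fields are wrapped as `cron(...)` in the schedule expression.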

To make the deliverable fully automated, we let CloudWatch Events trigger the actions that start or stop the Aurora cluster. The start and stop actions are defined in Systems Manager - Automation, which is also responsible for executing them.

Implementation

Systems Manager - Automation

Create a new Automation document in Systems Manager that defines how to start an Aurora cluster. Name the document "my-AWS-StartRdsCluster". Below is the content of this document.
---
description: Start RDS Cluster
schemaVersion: "0.3"
assumeRole: "{{ AutomationAssumeRole }}"
parameters:
  ClusterId:
    type: String
    description: (Required) RDS Cluster Id to start
  AutomationAssumeRole:
    type: String
    description: (Optional) The ARN of the role that allows Automation to perform the actions on your behalf.
    default: ""
mainSteps:
  -
    name: AssertNotStartingOrAvailable
    action: aws:assertAwsResourceProperty
    isCritical: false
    onFailure: step:StartCluster
    nextStep: CheckStart
    inputs:
      Service: rds
      Api: DescribeDBClusters
      DBClusterIdentifier: "{{ClusterId}}"
      PropertySelector: "$.DBClusters[0].Status"
      DesiredValues: ["available", "starting"]
  -
    name: StartCluster
    action: aws:executeAwsApi
    inputs:
      Service: rds
      Api: StartDBCluster
      DBClusterIdentifier: "{{ClusterId}}"
  -
    name: CheckStart
    action: aws:waitForAwsResourceProperty
    onFailure: Abort
    maxAttempts: 10
    timeoutSeconds: 600
    inputs:
      Service: rds
      Api: DescribeDBClusters
      DBClusterIdentifier: "{{ClusterId}}"
      PropertySelector: "$.DBClusters[0].Status"
      DesiredValues: ["available"]
    isEnd: true
...
Set this document version as default.

Create another Automation document that defines how to stop an Aurora cluster. Name the document "my-AWS-StopRdsCluster". Below is the content of this document.
---
description: Stop RDS Cluster
schemaVersion: "0.3"
assumeRole: "{{ AutomationAssumeRole }}"
parameters:
  ClusterId:
    type: String
    description: (Required) RDS Cluster Id to stop
  AutomationAssumeRole:
    type: String
    description: (Optional) The ARN of the role that allows Automation to perform the actions on your behalf.
    default: ""
mainSteps:
  -
    name: AssertNotStopped
    action: aws:assertAwsResourceProperty
    isCritical: false
    onFailure: step:StopCluster
    nextStep: CheckStop
    inputs:
      Service: rds
      Api: DescribeDBClusters
      DBClusterIdentifier: "{{ClusterId}}"
      PropertySelector: "$.DBClusters[0].Status"
      DesiredValues: ["stopped", "stopping"]
  -
    name: StopCluster
    action: aws:executeAwsApi
    inputs:
      Service: rds
      Api: StopDBCluster
      DBClusterIdentifier: "{{ClusterId}}"
  -
    name: CheckStop
    action: aws:waitForAwsResourceProperty
    onFailure: Abort
    maxAttempts: 10
    timeoutSeconds: 600
    inputs:
      Service: rds
      Api: DescribeDBClusters
      DBClusterIdentifier: "{{ClusterId}}"
      PropertySelector: "$.DBClusters[0].Status"
      DesiredValues: ["stopped"]
...
Set this document version as default.
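
Before wiring up a schedule, it can help to run a document once by hand. The sketch below is one possible way to do that with boto3 (a third-party package that must be installed separately), using its `start_automation_execution` call. The helper names, the account ID "123456789012" and the cluster ID "my-aurora-cluster" are placeholders I made up for illustration. The role name is the one created in the IAM section of this article.

```python
def build_execution_request(document_name, cluster_id, automation_role_arn):
    """Build keyword arguments for SSM start_automation_execution.

    Automation parameters are passed as lists of strings, keyed by the
    parameter names declared in the document (ClusterId, AutomationAssumeRole).
    """
    return {
        "DocumentName": document_name,
        "Parameters": {
            "ClusterId": [cluster_id],
            "AutomationAssumeRole": [automation_role_arn],
        },
    }


def run_document(request):
    """Start the automation and return its execution ID (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(**request)
    return response["AutomationExecutionId"]


# Placeholder values; replace them with the ones from your environment.
request = build_execution_request(
    "my-AWS-StartRdsCluster",
    "my-aurora-cluster",
    "arn:aws:iam::123456789012:role/role_cwe_startStopRds",
)
print(request)
# run_document(request)  # uncomment to actually start the cluster
```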

IAM

CloudWatch Events needs to assume a role and then use the permissions granted to that role to execute SSM Automation with the Automation document and its parameters (the Aurora cluster ID and the role for Systems Manager - Automation to assume).

Create a role for the CloudWatch Events rule that starts the Aurora cluster. The rule assumes this role to execute SSM Automation actions.

Name the role "role_CweInvokeSsmAutomation_StartAurora". Below is its permissions policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "ssm:StartAutomationExecution",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ssm:<AwsRegionId>:<AwsAccountId>:automation-definition/my-AWS-StartRdsCluster:$DEFAULT"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::<AwsAccountId>:role/role_cwe_startStopRds",
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "ssm.amazonaws.com"
                }
            }
        }
    ]
}
Its trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Create a role for the CloudWatch Events rule that stops the Aurora cluster.

Name the role "role_CweInvokeSsmAutomation_StopAurora". Below is its permissions policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "ssm:StartAutomationExecution",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ssm:<AwsRegionId>:<AwsAccountId>:automation-definition/my-AWS-StopRdsCluster:$DEFAULT"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::<AwsAccountId>:role/role_cwe_startStopRds",
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "ssm.amazonaws.com"
                }
            }
        }
    ]
}
Its trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create a new IAM role that allows Systems Manager - Automation to perform the actions on your behalf. Name the role "role_cwe_startStopRds" and create the inline policy shown below. Also attach the IAM policies "CloudWatchEventsBuiltInTargetExecutionAccess" and "CloudWatchEventsInvocationAccess" to this role.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "rds:StartDBCluster",
                "rds:StopDBCluster",
                "rds:StopDBInstance",
                "rds:StartDBInstance",
                "rds:DescribeDBClusters"
            ],
            "Resource": "*"
        }
    ]
}
Its trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ssm.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
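
The two invoke policies above differ only in the Automation document they reference, so they can be generated from one template. The sketch below shows one way to do that. The function name `invoke_policy`, the account ID "123456789012" and the use of "ap-northeast-2" (Seoul) are assumptions for illustration.

```python
import json


def invoke_policy(region, account_id, document_name,
                  passed_role="role_cwe_startStopRds"):
    """Return the permissions policy that lets CloudWatch Events start
    one Automation document and pass the Automation role to SSM."""
    document_arn = (
        f"arn:aws:ssm:{region}:{account_id}:"
        f"automation-definition/{document_name}:$DEFAULT"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartAutomationExecution",
                "Resource": [document_arn],
            },
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": f"arn:aws:iam::{account_id}:role/{passed_role}",
                "Condition": {
                    "StringLikeIfExists": {
                        "iam:PassedToService": "ssm.amazonaws.com"
                    }
                },
            },
        ],
    }


# Placeholder region and account ID; one policy per document.
for name in ("my-AWS-StartRdsCluster", "my-AWS-StopRdsCluster"):
    print(json.dumps(invoke_policy("ap-northeast-2", "123456789012", name), indent=4))
```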

CloudWatch Events

For the CloudWatch Events rule that starts the Aurora cluster, the table below lists the necessary configuration.

| Setting | Value | Description |
| --- | --- | --- |
| Name | ScheduleStartRdsCluster | |
| Type | Time-based event | |
| Cron expression | 0 23 * * ? * | every day at 7:00 (GMT+8) |
| Target | SSM Automation document ("my-AWS-StartRdsCluster") | |
| AutomationAssumeRole | ARN of role "role_cwe_startStopRds" | |
| Role | role_CweInvokeSsmAutomation_StartAurora | |
| Aurora cluster ID | The Aurora cluster ID in your environment | |

For the CloudWatch Events rule that stops the Aurora cluster, the table below lists the necessary configuration.

| Setting | Value | Description |
| --- | --- | --- |
| Name | ScheduleStopRdsCluster | |
| Type | Time-based event | |
| Cron expression | 0 14 * * ? * | every day at 22:00 (GMT+8) |
| Target | SSM Automation document ("my-AWS-StopRdsCluster") | |
| AutomationAssumeRole | ARN of role "role_cwe_startStopRds" | |
| Role | role_CweInvokeSsmAutomation_StopAurora | |
| Aurora cluster ID | The Aurora cluster ID in your environment | |
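
The same settings can also be applied programmatically. The sketch below uses boto3 (a third-party package) and its `put_rule` and `put_targets` calls for the stop rule. The helper names, the account ID "123456789012", the region "ap-northeast-2" and the cluster ID "my-aurora-cluster" are placeholders. The shape of the target `Input` (a JSON object mapping each document parameter to a list of strings) is my understanding of how Automation targets receive parameters, so verify it against your environment before relying on it.

```python
import json


def build_schedule(rule_name, cron_fields, document_arn, events_role_arn,
                   cluster_id, automation_role_arn):
    """Return (rule, target) dictionaries for a scheduled SSM Automation run."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": f"cron({cron_fields})",
        "State": "ENABLED",
    }
    target = {
        "Id": f"{rule_name}-target",
        "Arn": document_arn,          # the Automation document to execute
        "RoleArn": events_role_arn,   # role that CloudWatch Events assumes
        # Assumption: parameters are passed as lists of strings.
        "Input": json.dumps({
            "ClusterId": [cluster_id],
            "AutomationAssumeRole": [automation_role_arn],
        }),
    }
    return rule, target


def apply_schedule(rule, target):
    """Create or update the rule and its target (needs AWS credentials)."""
    import boto3  # imported here so build_schedule works without boto3

    events = boto3.client("events")
    events.put_rule(**rule)
    events.put_targets(Rule=rule["Name"], Targets=[target])


# Placeholder account, region and cluster ID.
account = "123456789012"
rule, target = build_schedule(
    "ScheduleStopRdsCluster",
    "0 14 * * ? *",
    f"arn:aws:ssm:ap-northeast-2:{account}:automation-definition/my-AWS-StopRdsCluster:$DEFAULT",
    f"arn:aws:iam::{account}:role/role_CweInvokeSsmAutomation_StopAurora",
    "my-aurora-cluster",
    f"arn:aws:iam::{account}:role/role_cwe_startStopRds",
)
print(rule)
# apply_schedule(rule, target)  # uncomment to create the rule
```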

Based on the above parameters, the Aurora cluster is billed for 15 hours every day. The table below compares the annual Aurora MySQL cost for a "db.t2.small" instance in the Seoul region. Because AWS prices may change over time, note that these prices were obtained on Jan. 31, 2019. Running on demand (OD) for 15 hours a day costs about 30% less than the no-upfront RI.

| Option | Annual cost |
| --- | --- |
| OD with 15 hours daily running time | $344.93 |
| RI - No Upfront | $490.56 |
| RI - Partial Upfront | $415.24 |
| RI - All Upfront | $407.00 |

Because the smallest Aurora Serverless capacity is 2 vCPU and 4 GB of memory, the comparison with Aurora Serverless uses the t2.medium instance type and size to keep it even. The cost reduction is about 22%.

| Option | Annual cost |
| --- | --- |
| OD with 15 hours daily running time | $684.38 |
| Serverless | $876.00 |
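
The percentages quoted for both comparisons follow directly from the annual costs listed above. The short sketch below reproduces the arithmetic. The function name `reduction_percent` is mine.

```python
def reduction_percent(scheduled_cost: float, baseline_cost: float) -> float:
    """Percentage saved by the scheduled cluster relative to a baseline."""
    return (1 - scheduled_cost / baseline_cost) * 100


# db.t2.small: on demand for 15 hours a day vs. no-upfront RI
print(round(reduction_percent(344.93, 490.56), 1))  # 29.7

# t2.medium: on demand for 15 hours a day vs. Aurora Serverless
print(round(reduction_percent(684.38, 876.00), 1))  # 21.9
```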

With these configurations in place, the Aurora cluster should start and stop automatically according to the cron settings.

Nota bene
When using CloudWatch Events to trigger a Systems Manager - Automation execution, you may observe a delay between the time set in the cron expression and the status update in the Aurora console. If you check the Aurora events and compare them with the cron expression, you will find that a "db.t2.small" Aurora cluster becomes available around 8 minutes after the time point specified in the cron expression. For example:
GMT 04:02 scheduled to start in CloudWatch Events
GMT 04:10 RDS cluster is totally available
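
If the cluster must be available by a specific time, the trigger can be moved earlier by the observed delay. The sketch below computes the delay from the two timestamps above and shifts a target time accordingly. It relies on this single observation, so treat the 8 minutes as a rough figure rather than a guarantee. The function name `trigger_time` is my own.

```python
from datetime import datetime, timedelta

# Timestamps from the observation above (GMT)
scheduled = datetime.strptime("04:02", "%H:%M")
available = datetime.strptime("04:10", "%H:%M")
startup_delay = available - scheduled
print(startup_delay)  # 0:08:00


def trigger_time(needed_by: str, delay: timedelta) -> str:
    """Return the HH:MM at which to trigger the start so that the
    cluster is available by `needed_by`, given the startup delay."""
    moment = datetime.strptime(needed_by, "%H:%M") - delay
    return moment.strftime("%H:%M")


# To be available by 23:00 GMT (07:00 in GMT+8), trigger at 22:52 GMT,
# which corresponds to the cron fields "52 22 * * ? *".
print(trigger_time("23:00", startup_delay))  # 22:52
```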

Conclusion

In this article, we demonstrated how to use Systems Manager - Automation and CloudWatch Events to start and stop an Aurora cluster automatically. This solution gives fine-grained control over the time points at which the corresponding actions are executed.

Category: AWS Tags: public
