Troubleshoot AWS EKS application deploy failure due to insufficient privilege

March 9, 2020

This post first demonstrates that insufficient privileges can cause a pod to fail, and then shows how to troubleshoot the issue when a K8S pod falls into ERROR status.

Specifically, the EC2 instance profile associated with an EKS worker node group may lack the access the application needs to serve traffic, in which case the K8S pod will fail to run.

The environment runs on AWS and consists of a K8S cluster and a few related resources. Each worker node is an AWS EC2 instance with a default role attached to it.

The EKS worker nodes are deployed using an AWS CloudFormation template. To reproduce the issue, we use a public CloudFormation template (URL: https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2019-11-15/amazon-eks-nodegroup.yaml). Without adjusting the policies attached to the worker node role, the pod goes into ERROR. We then troubleshoot the failure and fix it.

Below is the relevant block of the template, showing the policies attached to the EC2 role.

...
Resources:
  NodeInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - !FindInMap [ServicePrincipals, !Ref "AWS::Partition", ec2]
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      Path: /
...

After the application has been deployed into the K8S cluster, view the deployment status.
$ kubectl get deployments
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
web-client-deployment      0/1     1            0           27s

List all pods in the namespace.
$ kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
...
web-client-deployment-7bfcfc9df#######     0/1     Error     3          73s

Then, we fetched details about that pod.
$ kubectl describe pods web-client-deployment-7###
...
    State:          Waiting
      Reason:       CrashLoopBackOff
...
The output above and below these lines has been omitted for brevity.
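
The waiting reason shown above can also be read programmatically instead of scanning the describe output by eye. The following is a minimal sketch, assuming the pod list has been exported with `kubectl get pods -o json`; the function name `find_waiting_containers` and the pod names in the sample are hypothetical and only serve as an illustration.

```python
import json


def find_waiting_containers(pods_json, reason="CrashLoopBackOff"):
    """Return (pod name, container name) pairs that are Waiting with the given reason."""
    pod_list = json.loads(pods_json)
    matches = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        # containerStatuses may be absent while a pod is still being scheduled
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") == reason:
                matches.append((pod_name, status["name"]))
    return matches


# Hypothetical, heavily trimmed sample of `kubectl get pods -o json` output
sample = json.dumps({
    "items": [
        {
            "metadata": {"name": "web-client-deployment-example"},
            "status": {"containerStatuses": [
                {"name": "web-client",
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
            ]},
        },
        {
            "metadata": {"name": "healthy-pod-example"},
            "status": {"containerStatuses": [
                {"name": "app",
                 "state": {"running": {"startedAt": "2020-03-09T12:00:00Z"}}}
            ]},
        },
    ]
})

print(find_waiting_containers(sample))
```

Against a real cluster, the input would come from something like `kubectl get pods -o json > pods.json`, and the pairs returned are the pod and container names to pass to `kubectl logs`.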

The container state is "Waiting" and the reason is "CrashLoopBackOff", meaning the container keeps crashing and is being restarted with a back-off delay. Next, we retrieve the logs from that pod.
$ kubectl logs web-client-deployment-7### web-client
ERROR: Unable to create or describe required table in DynamoDB { AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/EKS-WorkerNodes-NodeInstanceRole-1R710U#######/i-091b87da0b####### is not authorized to perform: dynamodb:CreateTable on resource: arn:aws:dynamodb:us-east-1:123456789012:table/s3-photos-bucket-id
    at Request.extractError (/app/node_modules/aws-sdk/lib/protocol/json.js:48:27)
    at Request.callListeners (/app/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/app/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/app/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/app/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/app/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /app/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/app/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/app/node_modules/aws-sdk/lib/request.js:685:12)
    at Request.callListeners (/app/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
  message: 'User: arn:aws:sts::123456789012:assumed-role/EKS-WorkerNodes-NodeInstanceRole-1R710U#######/i-091b87da0b####### is not authorized to perform: dynamodb:CreateTable on resource: arn:aws:dynamodb:us-east-1:123456789012:table/s3-photos-bucket-id',
  code: 'AccessDeniedException',
  time: 2020-03-09T12:06:59.714Z,
  requestId: 'L5GIC1M6CQJGCU3O56DRPJ5RFFVV4KQNSO5AEMVJF66Q9#######',
  statusCode: 400,
  retryable: false,
  retryDelay: 42.64570494712846 }

The error message states that the assumed role is not authorized to perform the dynamodb:CreateTable API call on the DynamoDB table s3-photos-bucket-id.
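
The three facts that matter in this message (who was denied, which action, and on which resource) can be pulled out mechanically. Below is a rough sketch, assuming the message keeps the wording seen in the log above; that wording is not a stable interface, and the helper name `parse_access_denied` is hypothetical.

```python
import re

# Pattern based on the wording of the message in the pod log above (an assumption,
# since AWS does not guarantee this text format).
ACCESS_DENIED_PATTERN = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def parse_access_denied(message):
    """Extract principal, role name, action, and resource, or return None if no match."""
    match = ACCESS_DENIED_PATTERN.search(message)
    if match is None:
        return None
    principal = match.group("principal")
    parts = principal.split("/")
    # Assumed-role ARNs look like ...:assumed-role/<role name>/<session name>
    role = parts[1] if ":assumed-role/" in principal and len(parts) >= 2 else None
    return {
        "principal": principal,
        "role": role,
        "action": match.group("action"),
        # Drop a trailing quote or comma if the message was copied from an object dump
        "resource": match.group("resource").rstrip("',"),
    }


# Message copied from the log above (identifiers are masked in the original post)
message = (
    "User: arn:aws:sts::123456789012:assumed-role/"
    "EKS-WorkerNodes-NodeInstanceRole-1R710U#######/i-091b87da0b####### "
    "is not authorized to perform: dynamodb:CreateTable on resource: "
    "arn:aws:dynamodb:us-east-1:123456789012:table/s3-photos-bucket-id"
)

info = parse_access_denied(message)
print(info["role"], info["action"], info["resource"])
```

The role name identifies which IAM role needs the extra permission, and the action and resource are the values that go into the policy statement.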

To fix this, we need to grant the necessary privileges to the role. Specifically, we added the policy below to the EC2 role as an inline policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dynamodb:CreateTable",
            "Resource": "*"
        }
    ]
}
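
The policy above allows the action on every resource ("*"). As a possible tightening, the statement could be limited to the table named in the error message. The sketch below builds such a document; the helper `build_inline_policy` is hypothetical, narrowing the resource is a suggestion rather than what was done in this walkthrough, and the application may turn out to need further actions.

```python
import json


def build_inline_policy(actions, resource_arn):
    """Build an IAM policy document that allows the given actions on a single resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": resource_arn,
            }
        ],
    }


# Action and table ARN taken from the AccessDeniedException in the pod log
policy = build_inline_policy(
    ["dynamodb:CreateTable"],
    "arn:aws:dynamodb:us-east-1:123456789012:table/s3-photos-bucket-id",
)

print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the IAM console as an inline policy, or passed to `aws iam put-role-policy` through its `--policy-document` option.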

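NB
A Deployment’s rollout is triggered if and only if the Deployment’s Pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout. Changing the IAM policy does not touch the Pod template, so no new rollout starts here; the fix takes effect the next time the crashing container is restarted.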
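
If waiting for the next automatic restart is not desirable, one option is to change the Pod template on purpose so that a rollout does start. The sketch below builds a patch that sets a timestamp annotation under .spec.template, which is the approach `kubectl rollout restart` takes; the helper name `build_restart_patch` is hypothetical, and this step was not part of the original walkthrough.

```python
import json
from datetime import datetime, timezone


def build_restart_patch(now=None):
    """Build a Deployment patch that changes .spec.template via a timestamp annotation."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Annotation key used by `kubectl rollout restart`
                        "kubectl.kubernetes.io/restartedAt": timestamp
                    }
                }
            }
        }
    }


print(json.dumps(build_restart_patch()))
```

The printed JSON could be applied with `kubectl patch deployment web-client-deployment -p '<json>'`. On kubectl 1.15 or later, `kubectl rollout restart deployment/web-client-deployment` has the same effect without building the patch by hand.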

After a while, view the deployment again. The output is similar to the following.
$ kubectl get deployment
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
web-client-deployment      1/1     1            1           2m44s
...

Category: container Tags: public
