Troubleshoot - error "InvalidIdentityToken - OpenIDConnect provider's HTTPS certificate doesn't match configured thumbprint"

July 19, 2022


Background
Prior to last weekend, my EKS cluster's CNI DaemonSet had been assuming the worker node's IAM role, which had been working well.

During last weekend's implementation of Karpenter, the aws-node service account was explicitly changed to use a new IAM role that is associated with the EKS cluster's OIDC provider.


Below are the details of this new role.
Attached policy: AWS managed policy - AmazonEKS_CNI_Policy
Trust relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F:aud": "sts.amazonaws.com",
                    "oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F:sub": "system:serviceaccount:kube-system:aws-node"
                }
            }
        }
    ]
}

After this change, the service account "aws-node" was annotated with "eks.amazonaws.com/role-arn", which specifies the IAM role that this service account should assume.

kubectl get sa aws-node -n kube-system -o yaml
...
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ROLE-NAME
...


Issue
After changing the VPC CNI add-on to use the new IAM role, the Pods of the aws-node DaemonSet entered a failed state.


Troubleshooting

While checking the Pod's log, I found the following error message:

kubectl exec -it aws-node-xxxx -n kube-system -- /bin/bash
Defaulted container "aws-node" out of: aws-node, aws-vpc-cni-init (init)
bash-4.2# cat /host/var/log/aws-routed-eni/ipamd.log
{"level":"error","ts":"2022-07-19T13:45:18.759Z","caller":"ipamd/ipamd.go:463","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0da3****c06b eni-019a****1250]: WebIdentityErr: failed to retrieve credentials\ncaused by: InvalidIdentityToken: OpenIDConnect provider's HTTPS certificate doesn't match configured thumbprint\n\tstatus code: 400, request id: eb0b****7b07"}
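
The log can be long, so it may help to filter it down to the entries that carry this error. The following is a minimal sketch, assuming each log entry is a single JSON line as in the sample above and that grep is available in the container; the function name thumbprint_errors is only illustrative.

```shell
# Hypothetical helper: print the ipamd log entries that carry the
# thumbprint-mismatch error. $1 is the path to an ipamd.log file.
thumbprint_errors() {
    grep -F 'InvalidIdentityToken' "$1" | grep -F 'thumbprint'
}

# Inside the aws-node container, using the log path from the session above:
#   thumbprint_errors /host/var/log/aws-routed-eni/ipamd.log | tail -n 5
```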

Then I realized that this OIDC provider, together with the EKS cluster, was created in April 2020, more than two years ago.

To troubleshoot this issue and obtain a thumbprint, the OpenSSL command line tool needs to be installed and configured beforehand.

To obtain a thumbprint for the OIDC provider, do the following:

1. Find the URL for the OIDC identity provider (IdP) by doing the following:
Open the Amazon EKS console.
In the navigation pane, choose Clusters.
Select the cluster that you want to check.
Select the Configuration tab.
Note the OIDC provider URL under the Details section. Append "/.well-known/openid-configuration" to the OIDC provider URL to form the URL for the IdP's configuration document. Example:
https://oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F/.well-known/openid-configuration

Access this URL in a web browser. The browser output looks similar to the following:
{"issuer":"https://oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F","jwks_uri":"https://oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F/keys","authorization_endpoint":"urn:kubernetes:programmatic_authorization","response_types_supported":["id_token"],"subject_types_supported":["public"],"claims_supported":["sub","iss"],"id_token_signing_alg_values_supported":["RS256"]}

Make a note of the value of jwks_uri from the output.
https://oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F/keys
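
Step 1 can also be done from a terminal instead of a browser. The following is a rough sketch, assuming the configuration document is the compact single-line JSON shown above; the function names are made up for illustration, and a JSON-aware tool such as jq would be more robust than sed.

```shell
# Hypothetical helpers; the names are illustrative, not part of any AWS tool.

# Build the configuration-document URL from the OIDC provider (issuer) URL.
oidc_config_url() {
    printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

# Pull the jwks_uri value out of a configuration document read from stdin.
# Assumes compact single-line JSON with no spaces around the colon.
jwks_uri() {
    sed -n 's/.*"jwks_uri":"\([^"]*\)".*/\1/p'
}

# Reduce a URL to its host name, which is what openssl connects to.
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|[/:].*$||'
}

# Example (requires network access and curl):
#   issuer=https://oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F
#   url_host "$(curl -s "$(oidc_config_url "$issuer")" | jwks_uri)"
```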

2. Use the OpenSSL command line tool to run the following command to display all the certificates used.
Note: Be sure to replace oidc.eks.us-west-2.amazonaws.com with the domain name returned in step 1.

openssl s_client -connect oidc.eks.us-west-2.amazonaws.com:443 -showcerts
The output looks similar to the following:
(Below content COPIED FROM AWS premium-support knowledge-center web page)
CONNECTED(00000003)
depth=4 C = US, O = "Starfield Technologies, Inc.", OU = Starfield Class 2 Certification Authority
verify return:1
depth=3 C = US, ST = Arizona, L = Scottsdale, O = "Starfield Technologies, Inc.", CN = Starfield Services Root Certificate Authority - G2
verify return:1
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
verify return:1
depth=0 CN = *.execute-api.us-east-2.amazonaws.com
verify return:1
---
Certificate chain
 0 s:/CN=*.execute-api.us-east-2.amazonaws.com
   i:/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon
-----BEGIN CERTIFICATE-----
CERTIFICATE Redacted
-----END CERTIFICATE-----
 1 s:/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon
   i:/C=US/O=Amazon/CN=Amazon Root CA 1
-----BEGIN CERTIFICATE-----
CERTIFICATE Redacted
-----END CERTIFICATE-----
 2 s:/C=US/O=Amazon/CN=Amazon Root CA 1
   i:/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2
-----BEGIN CERTIFICATE-----
CERTIFICATE Redacted
-----END CERTIFICATE-----
 3 s:/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./CN=Starfield Services Root Certificate Authority - G2
   i:/C=US/O=Starfield Technologies, Inc./OU=Starfield Class 2 Certification Authority
-----BEGIN CERTIFICATE-----
MIIEdTCCA12gAwIBAgIJAKcOSkw0grd/MA0GCSqGSIb3DQEBCwUAMGgxCzAJBgNV
...
VsyuLAOQ1xk4meTKCRlb/weWsKh/NEnfVqn3sF/tM+2MR7cEXAMPLE=
-----END CERTIFICATE-----
---
Server certificate
subject=/CN=*.execute-api.us-east-2.amazonaws.com
issuer=/C=US/O=Amazon/OU=Server CA 1B/CN=Amazon
---

If you see more than one certificate in the output, then look for the last certificate displayed at the end of the output. The last certificate is the root CA in the certificate authority chain.

3. Create a certificate file (example: certificate.crt), and then copy the contents of the last certificate to this file.

vim certificate.crt
-----BEGIN CERTIFICATE-----
MIIE
...
A4w=
-----END CERTIFICATE-----
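
Copying the certificate by hand is easy to get wrong, so the extraction can be scripted instead. The following is a small sketch, assuming the last PEM block in the s_client output is the one you want, as described above; last_cert is a hypothetical helper name.

```shell
# Hypothetical helper: read `openssl s_client -showcerts` output on stdin
# and print only the last PEM certificate block.
last_cert() {
    awk '
        /-----BEGIN CERTIFICATE-----/ { cert = ""; capture = 1 }
        capture                       { cert = cert $0 "\n" }
        /-----END CERTIFICATE-----/   { capture = 0; last = cert }
        END                           { printf "%s", last }
    '
}

# Example (requires network access; replace the host with your own):
#   host=oidc.eks.us-west-2.amazonaws.com
#   openssl s_client -servername "$host" -connect "$host:443" -showcerts \
#       </dev/null 2>/dev/null | last_cert > certificate.crt
```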
Run the following command:
openssl x509 -in certificate.crt -text
The output looks similar to the following:
(Below content COPIED FROM AWS premium-support knowledge-center web page)
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            a7:0e:4a:4c:34:82:b7:7f
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=Starfield Technologies, Inc., OU=Starfield Class 2 Certification Authority
        Validity
            Not Before: Sep  2 00:00:00 2009 GMT
            Not After : Jun 28 17:39:16 2034 GMT

You can check the validity of the certificate from the values in the Not Before and Not After fields. In this output, the root CA certificate is valid from 2009 to 2034, which is around 25 years.
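
The expiry check can also be done without reading the dates by eye. The following is a minimal sketch, assuming the certificate was saved as certificate.crt in step 3; cert_not_expired is a made-up helper name, while -checkend and -dates are standard openssl x509 options.

```shell
# Hypothetical helper: succeed if the certificate file in $1 has not expired.
# `-checkend 0` makes openssl exit non-zero when the certificate expires
# within the next 0 seconds, that is, when it has already expired.
cert_not_expired() {
    openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}

# Example:
#   openssl x509 -in certificate.crt -noout -dates
#   cert_not_expired certificate.crt && echo "still valid" || echo "expired"
```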

4. If the output indicates that the certificate is expired, then you must renew the certificate with your OIDC provider. After you renew the certificate, run the following OpenSSL command to get the latest thumbprint with the colons removed.
openssl x509 -in certificate.crt -fingerprint -noout | sed s/://g
SHA1 Fingerprint=9E99***7280
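
The output above still carries the "SHA1 Fingerprint=" label, while the update command in the next step takes only the hex string. The following sketch strips both the label and the colons; bare_thumbprint is a hypothetical helper name, and -sha1 is passed only to make the digest explicit.

```shell
# Hypothetical helper: turn a line such as
#   SHA1 Fingerprint=AA:BB:...:FF
# into the bare hex string by dropping the label before "=" and the colons.
bare_thumbprint() {
    cut -d= -f2 | tr -d ':'
}

# Example:
#   openssl x509 -in certificate.crt -noout -fingerprint -sha1 | bare_thumbprint
```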

5. If the thumbprint currently configured on the IAM OIDC provider is out of date or doesn't match, replace it with the latest thumbprint from step 4. You can do so from the IAM console or using the AWS Command Line Interface (AWS CLI).

To replace the thumbprint, run the following command:
aws iam update-open-id-connect-provider-thumbprint --open-id-connect-provider-arn arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/BFDB****D49F --thumbprint-list 9E99****7280
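
To confirm that the provider now carries the new thumbprint, the configured list can be read back and compared. This is only a sketch: thumbprint_in_list is a made-up helper, PROVIDER_ARN and NEW_THUMBPRINT are placeholders, and comparing the hex strings without regard to letter case is my own assumption.

```shell
# Hypothetical helper: succeed if thumbprint $1 appears in the
# whitespace-separated list $2, ignoring letter case (an assumption).
thumbprint_in_list() {
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    for have in $2; do
        [ "$(printf '%s' "$have" | tr 'A-F' 'a-f')" = "$want" ] && return 0
    done
    return 1
}

# Example (requires AWS credentials):
#   current=$(aws iam get-open-id-connect-provider \
#       --open-id-connect-provider-arn "$PROVIDER_ARN" \
#       --query 'ThumbprintList' --output text)
#   thumbprint_in_list "$NEW_THUMBPRINT" "$current" && echo "thumbprint is configured"
```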

6. After updating the thumbprint, restart the DaemonSet.
kubectl rollout restart daemonset aws-node -n kube-system

7. Check the Pod status.
kubectl get pod -n kube-system -l k8s-app=aws-node
NAME             READY   STATUS    RESTARTS   AGE
aws-node-2mxxx   1/1     Running   0          4m59s
aws-node-89xxx   1/1     Running   0          4m41s
aws-node-9vxxx   1/1     Running   0          5m38s
aws-node-cjxxx   1/1     Running   0          4m22s
aws-node-rbxxx   1/1     Running   0          5m16s
aws-node-wzxxx   1/1     Running   0          5m54s
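
On a larger cluster it may be easier to wait for the rollout and then count the Pods that are not Running than to scan the table by eye. The following is a small sketch, assuming the default table output shown above with STATUS in the third column; count_not_running is a hypothetical helper name.

```shell
# Hypothetical helper: count pods whose STATUS column is not "Running"
# in `kubectl get pod` table output read from stdin (header line skipped).
count_not_running() {
    awk 'NR > 1 && $3 != "Running" { n++ } END { print n + 0 }'
}

# Example (requires cluster access):
#   kubectl rollout status daemonset aws-node -n kube-system --timeout=5m
#   kubectl get pod -n kube-system -l k8s-app=aws-node | count_not_running
```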


References

How do I troubleshoot the error "InvalidIdentityToken - OpenIDConnect provider's HTTPS certificate doesn't match configured thumbprint" when I'm using the Amazon EKS IAM role to access the service account?


Category: container Tags: public
