A Hands-on Deep Dive into
AWS Auth and Kubernetes

An in-depth exploration of the primary options that you have to authenticate and authorize AWS resources to talk to your EKS Kubernetes cluster and vice versa.

Sean Kane
SuperOrbital Engineer
Flipping bits on the internet since 1992.

Published on April 25, 2024


Overview


In this article, we will take a hands-on approach to exploring the various options that AWS users can use to allow controlled access between AWS cloud resources and EKS Kubernetes cluster resources.

When using an EKS (Elastic Kubernetes Service) cluster in AWS, two scenarios often crop up regarding authentication and authorization that have nothing to do with human access. These are:

  1. How do I allow AWS cloud resources access to my Kubernetes cluster?
  2. How do I allow my Kubernetes Pods to perform AWS API calls?

For better or worse, there are at least two valid answers to each of these questions.

How do I allow AWS cloud resources access to my Kubernetes cluster?

The aws-auth ConfigMap was the original option for avoiding hard-coded credentials. This configuration is managed inside the Kubernetes cluster and contains a series of maps that define the AWS users, roles, and accounts that have access to the cluster, and which Kubernetes groups (and therefore RBAC Roles) they each should be bound to.

In late 2023, AWS introduced a new feature called EKS Access Entries, which is supported on EKS clusters running Kubernetes v1.23 and later. Using this new feature, it is now possible to fully manage which AWS entities have access to your EKS cluster through the AWS API, which means that you are no longer required to manage and apply Kubernetes manifests directly to the cluster.

Both of these methods are currently supported in new EKS clusters, but EKS Access Entries is expected to eventually replace the aws-auth ConfigMap entirely.

How do I allow my Kubernetes Pods to perform AWS API calls?

To enable software running inside our Kubernetes Pods to talk to the AWS APIs in a controlled manner, we have always had the option to use hard-coded credentials, but there are significant risks in doing this, since they are very hard to manage securely. The traditional approach to avoiding this issue is to leverage IAM roles for service accounts (IRSA), which uses an OpenID Connect issuer and provider together with AWS IAM roles to manage what access each Kubernetes ServiceAccount has to the AWS API. The primary downside of this approach is that it requires annotating each of your ServiceAccount manifests with the specific AWS role ARN (Amazon Resource Name) that the ServiceAccount is expected to use. IRSA can also be used to provide access to AWS APIs in other AWS accounts, something that we will cover near the end of the article.

On November 26th, 2023, AWS announced EKS Pod Identity, which provides a much simpler method for giving Pods access to AWS resources inside of the same account. With EKS Pod Identities, it is possible to give Pods access to AWS resources without requiring any changes to the manifests inside the Kubernetes cluster itself, beyond initially installing the Amazon EKS Pod Identity Agent add-on.

Preparation

In this article, we are going to explore each of the available options, discuss some of the tradeoffs, demonstrate how the various components need to be assembled, and show how you can test access before and after each method has been applied.

To make the requirements as clear as possible, we are going to avoid tools like Amazon’s eksctl, which often obfuscate the details, and instead rely on HashiCorp’s Terraform. This way, each component and its various connections are explicitly defined throughout the examples.

NOTE: This article assumes that you are using a Unix-style shell, like bash. If you are using PowerShell, you will need to adjust a few things, like how you set environment variables and anywhere that we use sub-shells via $(). On Windows, it will likely be much easier to use the Windows Subsystem for Linux (WSL) to follow along. Using Git Bash for Windows is NOT recommended, as it can behave in unexpected ways at times.

Spinning up an EKS Cluster

NOTE: This blog post is designed so that you can follow along and do all of this yourself; however, you do not have to. Spinning this up will cost you money, and there are no guarantees that this is secure enough for your environment, so use this code with appropriate caution.

The Terraform codebase that we are going to use can be found at github.com/superorbital/aws-eks-auth-examples.

git clone https://github.com/superorbital/aws-eks-auth-examples.git
cd aws-eks-auth-examples

As mentioned in the README.md for the superorbital/aws-eks-auth-examples repo, to do this all yourself, you will need at least one AWS account that you have full access to, and you will also need to have the AWS, kubectl, jq, and Terraform CLIs installed.

NOTE: Near the end of the article we will discuss how to allow Kubernetes Pods to make AWS API calls to other AWS accounts. To do the hands-on work for that section, you will need access to a second AWS account, which we will cover then.

You will also initially need one AWS profile, with admin access to your account, that looks very similar to this:

  • ~/.aws/config
[profile aws-auth-account-one]
region=us-west-2
output=yaml-stream
  • ~/.aws/credentials
[aws-auth-account-one]
aws_access_key_id=REDACTED_ACCESS_KEY
aws_secret_access_key=REDACTED_SECRET_ACCESS_KEY

Once this is all set up, then you should be able to spin up the core infrastructure, like so:

export AWS_PROFILE="aws-auth-account-one"

terraform init

terraform apply -var dev_role_id=$(aws --profile aws-auth-account-one iam get-user --output text --query 'User.UserName')

NOTE: It can easily take 15-20 minutes for the whole environment to spin up or down.

If all went well, you should have output from Terraform that looks something like this:

module.eks.data.aws_partition.current: Reading...
…
module.eks.aws_eks_addon.this["kube-proxy"]: Creation complete after 34s [id=aws-eks-auth-test:kube-proxy]

Apply complete! Resources: 79 added, 0 changed, 0 destroyed.

Outputs:

aws_iam_keys_user_one = <sensitive>
aws_iam_keys_user_three = <sensitive>
aws_iam_keys_user_two = <sensitive>
ec2_irsa_role_arn = []
…

Let’s confirm that we have access to the new EKS-based Kubernetes cluster:

$ aws --profile aws-auth-account-one eks update-kubeconfig --name aws-eks-auth-test

Added new context arn:aws:eks:us-west-2:123456789012:cluster/aws-eks-auth-test to ~/.kube/config
$ kubectl get all -A

NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
…

Accessing the Cluster with AWS Entities

In this section, we will explore how you can give entities inside AWS access to your EKS cluster, by looking at both the deprecated aws-auth ConfigMap and the new access entries.

At the time of this writing, the Terraform EKS module configures the cluster to support both the aws-auth ConfigMap and the new access entries by setting authentication_mode to "API_AND_CONFIG_MAP". In the future, there may be a point where API is the only supported option.
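If you are curious which mode a given cluster is currently using, you can query it directly (a quick, read-only check; the cluster name matches the one created by this article's Terraform code):

# Returns the cluster's authentication mode, e.g. "API_AND_CONFIG_MAP"
aws --profile aws-auth-account-one eks describe-cluster \
  --name aws-eks-auth-test \
  --query 'cluster.accessConfig.authenticationMode' \
  --output text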

The aws-auth ConfigMap

The aws-auth ConfigMap is created by default when you create a managed node group or when a node group is created using eksctl. If you have no node groups, you will not see this ConfigMap.

Traditionally, when you wanted to give IAM (Identity and Access Management) users, roles, and accounts access to an EKS cluster, your only option was to manage a ConfigMap in the kube-system namespace within the cluster. You can take a look at the one in your current cluster by running:

$ kubectl get cm -n kube-system aws-auth -o yaml

apiVersion: v1
data:
  mapAccounts: |
    []
  mapRoles: |
    - "groups":
      - "system:bootstrappers"
      - "system:nodes"
      "rolearn": "arn:aws:iam::123456789012:role/default_node_group-eks-node-group-2024042316563038110000000e"
      "username": "system:node:{{EC2PrivateDNSName}}"
  mapUsers: |
    null
    ...
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
…

At the moment, this is a default ConfigMap that has a single entry that allows the default node group role to access the Kubernetes API, so that new nodes can be registered with the cluster.

We are going to need to run terraform apply with the create_test_users variable set to true so that we can create a few additional users in our account and give some of them access to the cluster. This command will create our new IAM users (UserOne, UserTwo, and UserThree) and update the ConfigMap to reference the first two of these three new users.

$ terraform apply -var dev_role_id=$(aws --profile aws-auth-account-one iam get-user --output text --query 'User.UserName') -var create_test_users=true

…
  # aws_iam_access_key.user_one[0] will be created
  # aws_iam_access_key.user_three[0] will be created
  # aws_iam_access_key.user_two[0] will be created
  # aws_iam_policy.eks_users[0] will be created
  # aws_iam_user.user_one[0] will be created
  # aws_iam_user.user_three[0] will be created
  # aws_iam_user.user_two[0] will be created
  # aws_iam_user_policy_attachment.user_one[0] will be created
  # aws_iam_user_policy_attachment.user_three[0] will be created
  # aws_iam_user_policy_attachment.user_two[0] will be created
  # module.aws_auth.kubernetes_config_map_v1_data.aws_auth[0] will be updated in-place
…
Plan: 10 to add, 1 to change, 0 to destroy.
…

If we take a look at the ConfigMap again, we should see that two new IAM user ARNs have been added to it under mapUsers.

$ kubectl get cm -n kube-system aws-auth -o yaml

apiVersion: v1
data:
  mapAccounts: |
    []
  mapRoles: |
…
  mapUsers: |
    - "groups":
      - "system:masters"
      "userarn": "arn:aws:iam::123456789012:user/UserOne"
      "username": "UserOne"
    - "groups":
      - "eks-default"
      "userarn": "arn:aws:iam::123456789012:user/UserTwo"
      "username": "UserTwo"
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
…

Each entry in mapUsers takes an IAM user ARN and maps it to one or more Kubernetes groups. The "system:masters" group is referenced by the cluster-admin ClusterRoleBinding (kubectl get clusterrolebindings cluster-admin -o yaml) and gives that user complete access to the cluster. In the next step, we will create a new eks-default Role to manage the other users’ permissions inside the cluster.

If you want to discover what other RBAC subjects, like "system:masters", exist in your cluster, you can install a tool like rbac-lookup via krew and then run kubectl rbac-lookup to list all of them and the Roles/ClusterRoles they are bound to, as shown below.
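A rough sketch of installing and using it (assuming you already have krew installed) might look like this:

# Install the plugin via krew and then search for the system:masters subject
kubectl krew install rbac-lookup
kubectl rbac-lookup system:masters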

Now, we will create a Kubernetes manifest file called test-cm-role.yaml with the following contents, which will create a Role and RoleBinding that has very broad access to the default namespace, but nowhere else.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: eks-default
  namespace: default
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: eks-default
  namespace: default
subjects:
- kind: Group
  name: eks-default
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: eks-default
  apiGroup: rbac.authorization.k8s.io

and then we will apply this to the cluster:

$ kubectl apply -f ./test-cm-role.yaml

role.rbac.authorization.k8s.io/eks-default created
rolebinding.rbac.authorization.k8s.io/eks-default created

At this point, we should create three new AWS profiles that will allow us to easily test what sort of access each of these users has to our cluster.

To grab the credentials for each of the users, we can run the following commands. Each one will return a comma-separated value that includes the AWS Access Key and the AWS Secret Access Key (e.g. AAAAKN34NFH3425JDNM3,2-sadlkjasi867y3425hsedfhnf834-sdiohn-3a).

terraform output -raw aws_iam_keys_user_one

terraform output -raw aws_iam_keys_user_two

terraform output -raw aws_iam_keys_user_three
  • ~/.aws/config
[profile aws-auth-account-one-userone]
region=us-west-2
output=yaml-stream

[profile aws-auth-account-one-usertwo]
region=us-west-2
output=yaml-stream

[profile aws-auth-account-one-userthree]
region=us-west-2
output=yaml-stream
  • ~/.aws/credentials
    • Ensure that you set each aws_access_key_id and aws_secret_access_key with the keys for that specific user, as retrieved via terraform output.
[aws-auth-account-one-userone]
aws_access_key_id=REDACTED_ACCESS_KEY
aws_secret_access_key=REDACTED_SECRET_ACCESS_KEY

[aws-auth-account-one-usertwo]
aws_access_key_id=REDACTED_ACCESS_KEY
aws_secret_access_key=REDACTED_SECRET_ACCESS_KEY

[aws-auth-account-one-userthree]
aws_access_key_id=REDACTED_ACCESS_KEY
aws_secret_access_key=REDACTED_SECRET_ACCESS_KEY

Now, if we use UserOne’s credentials to access the cluster, we should see that this user can see everything in the cluster just fine.

$ AWS_PROFILE="aws-auth-account-one-userone" aws eks update-kubeconfig --name aws-eks-auth-test --user-alias userone

Updated context userone in ~/.kube/config
$ kubectl get all -A

NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
…

If we then do the same thing with UserTwo’s credentials, we should see that they can see things inside the default Namespace, but nowhere else.

$ AWS_PROFILE="aws-auth-account-one-usertwo" aws eks update-kubeconfig --name aws-eks-auth-test --user-alias usertwo

Updated context usertwo in ~/.kube/config
$ kubectl get all -n default

NAME                 TYPE        CLUSTER-IP          EXTERNAL-IP   PORT(S)   AGE
…
$ kubectl get all -A

Error from server (Forbidden): pods is forbidden: User "UserTwo" cannot list resource "pods" in API group "" at the cluster scope
…

And now, if we do the same thing with UserThree’s credentials, we should see that they cannot do anything at all, despite being allowed to run aws eks update-kubeconfig.

$ AWS_PROFILE="aws-auth-account-one-userthree" aws eks update-kubeconfig --name aws-eks-auth-test --user-alias userthree

Updated context userthree in ~/.kube/config
$ kubectl get all -n default

error: You must be logged in to the server (Unauthorized)
…

AWS Access Entries

The newer approach to managing this access no longer requires you to make changes to objects inside the Kubernetes cluster (e.g. the aws-auth ConfigMap). Instead, you can simply use the AWS API to define which IAM users and roles should have access to the cluster and what sort of access they should have.
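If you peeled away the Terraform that we are about to use, the underlying work boils down to roughly two AWS API calls. A hedged sketch of the equivalent AWS CLI commands (do not run these; the account ID is a placeholder and Terraform will manage these objects for us) looks like this:

# Sketch only: create an access entry for an IAM principal
aws eks create-access-entry \
  --cluster-name aws-eks-auth-test \
  --principal-arn arn:aws:iam::123456789012:user/UserThree

# Sketch only: associate an AWS-managed access policy with that entry,
# scoped to one or more namespaces
aws eks associate-access-policy \
  --cluster-name aws-eks-auth-test \
  --principal-arn arn:aws:iam::123456789012:user/UserThree \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
  --access-scope type=namespace,namespaces=default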

We can enable access for UserThree by creating an EKS Access Entry along with an associated access policy.

To accomplish this with our Terraform code, we can simply set the create_access_entries variable to true for our next Terraform run.

$ terraform apply -var dev_role_id=$(aws --profile aws-auth-account-one iam get-user --output text --query 'User.UserName') -var create_test_users=true -var create_access_entries=true

…
  # aws_eks_access_entry.user_three[0] will be created
  # aws_eks_access_policy_association.user_three[0] will be created
…
Plan: 2 to add, 0 to change, 0 to destroy.
…

NOTE: When the two conflict, EKS Access Entries take precedence over the aws-auth ConfigMap.

The Access Entry for UserThree gives that user view access to four namespaces ("default", "kube-node-lease", "kube-public", and "kube-system") via the AWS-managed arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy policy.
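You can also inspect the new entry entirely from the AWS side, without touching the Kubernetes API at all (the principal ARN below uses the placeholder account ID from earlier; substitute your own):

# List all access entries for the cluster
aws --profile aws-auth-account-one eks list-access-entries \
  --cluster-name aws-eks-auth-test

# Show which access policies are associated with UserThree's entry
aws --profile aws-auth-account-one eks list-associated-access-policies \
  --cluster-name aws-eks-auth-test \
  --principal-arn arn:aws:iam::123456789012:user/UserThree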

We can confirm that this is working as we expect by running a few commands against the cluster as this user.

$ AWS_PROFILE="aws-auth-account-one-userthree" aws eks update-kubeconfig --name aws-eks-auth-test --user-alias userthree

Updated context userthree in ~/.kube/config
$ kubectl get all -n default

NAME                 TYPE        CLUSTER-IP          EXTERNAL-IP   PORT(S)   AGE
…
$ kubectl get all -n kube-system

NAME                               READY   STATUS    RESTARTS   AGE
…
$ kubectl run -n default pause --image=k8s.gcr.io/pause:3.1

Error from server (Forbidden): pods is forbidden: User "UserThree" cannot create resource "pods" in API group "" in the namespace "default"
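If you want a fuller picture of what UserThree is allowed to do, kubectl can enumerate the permissions for the current context while the kubeconfig still points at the userthree alias:

# List the verbs and resources the current user may use in the default Namespace
kubectl auth can-i --list --namespace default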

Let’s go ahead and make sure that the default AWS_PROFILE that we are using is set back to the cluster creator.

export AWS_PROFILE="aws-auth-account-one"

Accessing the AWS API from the Cluster

In this section, we will explore how you can give Pods running inside your EKS cluster controlled access to the AWS API, first by utilizing the traditional IRSA approach and then by covering the more recent EKS Pod Identities approach.

IAM roles for service accounts (IRSA)

So, let’s imagine that we are writing some software that needs to make calls to the AWS API. This doesn’t need to be anything fancy; it might simply need to read a list of user names.

For a long time, the only option has been to either pass hard-coded credentials into your application as secrets, which is pretty broadly frowned upon, or to utilize IRSA.

Setting up IRSA requires a few pieces; one of the most important core components is an OIDC provider for your cluster, which in this case is automatically set up by the Terraform EKS module.
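If you would like to see what the module created, you can look up the cluster's OIDC issuer URL and confirm that a matching IAM OIDC provider exists in the account (purely informational; nothing here needs to be changed):

# Show the cluster's OIDC issuer URL
aws --profile aws-auth-account-one eks describe-cluster \
  --name aws-eks-auth-test \
  --query 'cluster.identity.oidc.issuer' \
  --output text

# One of the IAM OIDC providers in the account should match the URL above
aws --profile aws-auth-account-one iam list-open-id-connect-providers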

In our codebase, the rest of the components can be enabled by setting the variable setup_irsa to true and passing it in with our next terraform apply.

$ terraform apply -var dev_role_id=$(aws --profile aws-auth-account-one iam get-user --output text --query 'User.UserName') -var create_test_users=true -var create_access_entries=true -var setup_irsa=true

…
  # aws_iam_policy.ec2_list_instances[0] will be created
  # aws_iam_role.ec2_list_instances[0] will be created
  # aws_iam_role_policy_attachment.ec2_list_instances[0] will be created
…
Plan: 3 to add, 0 to change, 0 to destroy.
…

This is going to create an AWS role that has a policy that allows it to list EC2 (Elastic Compute Cloud) instances (virtual machines).

To assign this role to one of our Pods, we first need to get the AWS Role ARN that identifies it. We can grab this from our Terraform outputs, like so:

$ terraform output -json ec2_irsa_role_arn | jq .[0]

"arn:aws:iam::123456789012:role/eks-ec2-list-instances"

Now that we have this ARN, we need to create a Kubernetes ServiceAccount that references it.

Go ahead and create a new Kubernetes manifest file called test-irsa-sa.yaml with the following contents, which will create a ServiceAccount that is tied to the AWS Role that we just created via the eks.amazonaws.com/role-arn annotation.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eks-ec2-list
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/eks-ec2-list-instances"
    eks.amazonaws.com/audience: "sts.amazonaws.com"

NOTE: The sts.amazonaws.com value in the eks.amazonaws.com/audience annotation above refers to the AWS Security Token Service (STS).

and then we will apply this to the cluster:

$ aws --profile aws-auth-account-one eks update-kubeconfig --name aws-eks-auth-test

Updated context arn:aws:eks:us-west-2:123456789012:cluster/aws-eks-auth-test in ~/.kube/config
$ kubectl apply -f ./test-irsa-sa.yaml

serviceaccount/eks-ec2-list created

Now let’s create a Pod that uses that ServiceAccount and is capable of running AWS CLI commands and wait for it to be in a “Ready” state.

kubectl run -n default awscli --image=public.ecr.aws/aws-cli/aws-cli --overrides='{ "spec": { "serviceAccount": "eks-ec2-list" }  }' --command -- bash -c "sleep infinity"

kubectl wait --for=condition=ready pod/awscli

Now we can use this Pod to run some AWS commands and see what it is allowed to do.

If you get errors from all of these commands, the most likely cause is that the eks.amazonaws.com/role-arn annotation on the ServiceAccount is wrong.
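Before running them, one quick sanity check is to confirm that the IRSA webhook actually injected the role configuration into the Pod; it does this by setting a pair of environment variables:

# These variables are injected by the IRSA mutating webhook; if they are
# missing or point at the wrong role, revisit the ServiceAccount annotation
kubectl exec -n default pod/awscli -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'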

$ kubectl exec -ti -n default pod/awscli -- aws s3api list-buckets

An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
command terminated with exit code 254
$ kubectl exec -ti -n default pod/awscli -- aws ec2 describe-instances --max-items 1 --no-cli-pager

{
    "Reservations": [
...
    ],
    "NextToken": ...
}

From this example, we can see that our Pod is able to use aws ec2 describe-instances but gets an error when it tries to use aws s3api list-buckets, which is what we would expect based on the AWS policy attached to the role that this ServiceAccount assumes.
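We can also confirm exactly which identity the Pod ended up with by asking STS directly; the assumed role in the output should match the eks-ec2-list-instances role that we created earlier:

# Show the identity that the AWS CLI inside the Pod is using
kubectl exec -ti -n default pod/awscli -- aws sts get-caller-identity --no-cli-pager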

At this point, we can go ahead and delete the Pod that we were using for our testing.

kubectl delete pod -n default awscli

NOTE: This command may take a minute, due to the use of sleep infinity in the container.

Pod Identity

Now, let’s take a look at EKS Pod Identity, which, as we mentioned earlier, provides a much simpler method for giving Pods access to AWS resources inside of the same account.

There is one important AWS EKS add-on that must be installed for Pod Identities to work: the Amazon EKS Pod Identity Agent. We have already installed it via a tiny bit of code in our Terraform stack.
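If you want to double-check that the add-on is present and healthy before continuing, something like this should work (the DaemonSet name is the one the add-on deploys into kube-system):

# Confirm the add-on is installed and ACTIVE
aws --profile aws-auth-account-one eks describe-addon \
  --cluster-name aws-eks-auth-test \
  --addon-name eks-pod-identity-agent \
  --query 'addon.status' \
  --output text

# The agent itself runs as a DaemonSet in kube-system
kubectl get daemonset -n kube-system eks-pod-identity-agent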

We are going to run Terraform again and have it create a new role and EKS Pod Identity association that we can use for testing.

To do this, we will set the variable setup_pod_identity to true and pass it in with our next terraform apply.

$ terraform apply -var dev_role_id=$(aws --profile aws-auth-account-one iam get-user --output text --query 'User.UserName') -var create_test_users=true -var create_access_entries=true -var setup_irsa=true -var setup_pod_identity=true

…
  # aws_eks_pod_identity_association.pod_identity_test_pod[0] will be created
  # aws_iam_policy.eks_describe_cluster[0] will be created
  # aws_iam_role.pod_identity_test_pod[0] will be created
  # aws_iam_role_policy_attachment.pod_identity_test_pod[0] will be created
…
Plan: 4 to add, 0 to change, 0 to destroy.
…

Now that we have created a new role and EKS Pod Identity association, let’s test them out.

When using Pod Identity, versus IRSA, we no longer need to add any special annotations to the Kubernetes manifests that we install into the cluster, because all of that information is already contained in the code/API call that created the EKS Pod Identity association.

In this case, we have created an association for the ServiceAccount pod-identity in the Namespace pod-id in the cluster aws-eks-auth-test to the AWS role named pod_identity_test-aws-eks-auth-test, which only has permission to list and describe EKS clusters.
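You can confirm the association (and exactly which Namespace and ServiceAccount it is scoped to) with a single AWS CLI call:

# List the EKS Pod Identity associations that Terraform just created
aws --profile aws-auth-account-one eks list-pod-identity-associations \
  --cluster-name aws-eks-auth-test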

Go ahead and create a new Kubernetes manifest file called test-pod-identity-default.yaml with the following contents, which will create a ServiceAccount and a Deployment that uses that ServiceAccount in the default Namespace.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-identity
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-id-test
  namespace: default
spec:
  selector:
    matchLabels:
      app: pod-id-test
  template:
    metadata:
      labels:
        app: pod-id-test
    spec:
      serviceAccountName: pod-identity
      containers:
      - name: awscli
        image: public.ecr.aws/aws-cli/aws-cli:latest
        command: ["/bin/bash"]
        args: ["-c", "sleep infinity"]

Now apply the new manifests:

$ kubectl apply -f ./test-pod-identity-default.yaml

serviceaccount/pod-identity created
deployment.apps/pod-id-test created

We can test the permissions that our Deployment’s Pods have, by running an aws CLI command from inside one of their containers.

$ kubectl exec -ti -n default $(kubectl get pod -l app=pod-id-test -o=custom-columns=NAME:.metadata.name --no-headers) -- aws eks list-clusters --max-items 1 --no-cli-pager

An error occurred (AccessDeniedException) when calling the ListClusters operation: User: arn:aws:sts::123456789012:assumed-role/default_node_group-eks-node-group-2024042216363269390000000d/i-090669129a16e5408 is not authorized to perform: eks:ListClusters on resource: arn:aws:eks:us-west-2:123456789012:cluster/*
command terminated with exit code 254

We can see that this command fails (falling back to the node’s instance role, which is not authorized to list EKS clusters), which we should have expected since our EKS Pod Identity association is for the ServiceAccount pod-identity in the Namespace pod-id. Let’s go ahead and create another Kubernetes manifest called test-pod-identity-pod-id.yaml that creates the pod-id Namespace and a similar ServiceAccount and Deployment in that Namespace.

---
apiVersion: v1
kind: Namespace
metadata:
  name: pod-id
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-identity
  namespace: pod-id
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-id-test
  namespace: pod-id
spec:
  selector:
    matchLabels:
      app: pod-id-test
  template:
    metadata:
      labels:
        app: pod-id-test
    spec:
      serviceAccountName: pod-identity
      containers:
      - name: awscli
        image: public.ecr.aws/aws-cli/aws-cli:latest
        command: ["/bin/bash"]
        args: ["-c", "sleep infinity"]

Now we can apply these new manifests:

$ kubectl apply -f ./test-pod-identity-pod-id.yaml

namespace/pod-id created
serviceaccount/pod-identity created
deployment.apps/pod-id-test created

If we now test the permissions that our new Deployment Pods in the pod-id Namespace have, we should see that everything works as expected.

$ kubectl exec -ti -n pod-id $(kubectl get pod -n pod-id -l app=pod-id-test -o=custom-columns=NAME:.metadata.name --no-headers) -- aws eks list-clusters --max-items 1 --no-cli-pager

{
    "clusters": [
        "aws-eks-auth-test"
    ]
}
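As a peek behind the curtain, the Pod Identity Agent delivers credentials through the container credential provider rather than a web identity token, and you can see the pair of environment variables it injects into the matched Pods:

# Pod Identity injects a link-local credentials endpoint and a token file,
# which the AWS SDKs and the CLI pick up automatically
kubectl exec -n pod-id $(kubectl get pod -n pod-id -l app=pod-id-test -o=custom-columns=NAME:.metadata.name --no-headers) -- env | grep AWS_CONTAINER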

Cross-Account Access

The final situation that we are going to explore is how to handle the case where you want software that is running in one of your Pods to be able to make AWS API calls in a completely different AWS account. For this, we must fall back on the capabilities provided by IRSA.

To do the hands-on work for this section, you will need to add a second AWS account to your AWS CLI configuration. Make sure that these new credentials have admin access to a different AWS account and DO NOT point to the same account that you have been using up to this point.

  • ~/.aws/config
[profile aws-auth-account-two]
region=us-west-2
output=yaml-stream
  • ~/.aws/credentials
[aws-auth-account-two]
aws_access_key_id=REDACTED_ACCESS_KEY
aws_secret_access_key=REDACTED_SECRET_ACCESS_KEY

Next, you should uncomment all the Terraform code in the file second-account.tf.

NOTE: If you do not have the second profile set up correctly, any Terraform commands you run from this point on will fail.

Assuming that we have two AWS accounts to work with and have everything set up correctly, we can run terraform apply one last time to create all the components that we will need in both accounts, so that Pods in our cluster can make some AWS API calls into our second account.

To do this, we will set the variable setup_cross_account_sts to true and pass it in with our next terraform apply. And if everything is set up correctly, you should see output very similar to what is shown here.

$ terraform apply -var dev_role_id=$(aws --profile aws-auth-account-one iam get-user --output text --query 'User.UserName') -var create_test_users=true -var create_access_entries=true -var setup_irsa=true -var setup_pod_identity=true -var setup_cross_account_sts=true

…
  # data.aws_iam_policy_document.local_account_access[0] will be read during apply
  # aws_iam_policy.local_account_access[0] will be created
  # aws_iam_policy.remote_account_access[0] will be created
  # aws_iam_role.local_account_access_serviceaccount[0] will be created
  # aws_iam_role.remote_account_access[0] will be created
  # aws_iam_role_policy_attachment.local_account_access[0] will be created
  # aws_iam_role_policy_attachment.remote_account_access[0] will be created
…
Plan: 6 to add, 0 to change, 0 to destroy.
…
Outputs:
…
sts_local_account_role_arn = [
  "arn:aws:iam::123456789012:role/local-account-access",
]
sts_remote_account_role_arn = [
  "arn:aws:iam::012345678901:role/remote-account-access",
]

In the remote account, we are creating an AWS role that has a trust relationship with the local account and will allow the local account to use elasticloadbalancing:DescribeLoadBalancers within the remote account. In the local account, we are creating an AWS role that is set up to allow the Kubernetes ServiceAccount named local-access in the cross-account-sts Namespace to use STS to assume the new role in the remote account.
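If you want to see how this trust chain is wired up, you can inspect the assume-role (trust) policy documents on both roles; the role names below are taken from the Terraform outputs shown above:

# The local role trusts the cluster's OIDC provider for the local-access
# ServiceAccount in the cross-account-sts Namespace
aws --profile aws-auth-account-one iam get-role \
  --role-name local-account-access \
  --query 'Role.AssumeRolePolicyDocument'

# The remote role trusts the local account to assume it
aws --profile aws-auth-account-two iam get-role \
  --role-name remote-account-access \
  --query 'Role.AssumeRolePolicyDocument'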

To assign this role to one of our Pods, we first need to get the AWS Role ARN that identifies it. We can grab this from our Terraform outputs, like so:

$ terraform output -json sts_local_account_role_arn | jq .[0]

"arn:aws:iam::123456789012:role/local-account-access"

Let’s create a new Kubernetes manifest file called test-cross-account-working.yaml with the following contents, which will create a new Namespace, ServiceAccount and a Deployment that uses that ServiceAccount.

Since we are using IRSA again, you will need to update the eks.amazonaws.com/role-arn annotation for the ServiceAccount, so that it matches the role ARN that you retrieved from your Terraform outputs.

---
apiVersion: v1
kind: Namespace
metadata:
  name: cross-account-sts
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-access
  namespace: cross-account-sts
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/local-account-access"
    eks.amazonaws.com/audience: "sts.amazonaws.com"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cross-account-sts-test
  namespace: cross-account-sts
spec:
  selector:
    matchLabels:
      app: cross-account-sts-test
  template:
    metadata:
      labels:
        app: cross-account-sts-test
    spec:
      serviceAccountName: local-access
      containers:
      - name: awscli
        image: public.ecr.aws/aws-cli/aws-cli:latest
        command: ["/bin/bash"]
        args: ["-c", "sleep infinity"]

When we apply the new manifests, we should see something like this:

$ kubectl apply -f ./test-cross-account-working.yaml

namespace/cross-account-sts configured
serviceaccount/local-access configured
deployment.apps/cross-account-sts-test configured

Mimicking what an application would do is a bit tricky with the AWS CLI since it needs to make multiple calls, but we can do it by editing and then running this monstrosity:

$ kubectl exec -ti -n cross-account-sts $(kubectl get pod -n cross-account-sts -l app=cross-account-sts-test -o=custom-columns=NAME:.metadata.name --no-headers) -- /bin/env lrole=$(terraform output -json sts_local_account_role_arn | jq --raw-output .[0]) rrole=$(terraform output -json sts_remote_account_role_arn | jq --raw-output .[0]) /bin/bash -c 'export AWT=$(cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token) && export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" $(aws sts assume-role-with-web-identity --role-arn ${lrole} --role-session-name remote-sts-test --web-identity-token ${AWT} --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" --output text)) && export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" $(aws sts assume-role --role-arn ${rrole} --role-session-name remote-sts-test --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" --output text)) && aws elb describe-load-balancers --max-items=1 --no-cli-pager'

{
    "LoadBalancerDescriptions": []
}

Basically, this command uses some bash shell hackery to exec into the Pod and then run a chain of commands that assume the roles required to eventually let us try to list all the ELBs (Elastic Load Balancers) in the remote account. If it were torn apart and simplified, it would look something like this:

NOTE: Don’t try to run the individual commands in this list. They are just here to help explain the combined command that we used above.

  1. Exec into the pod that we just spun up.
    • kubectl exec -ti -n cross-account-sts $(kubectl get pod -n cross-account-sts -l app=cross-account-sts-test -o=custom-columns=NAME:.metadata.name --no-headers)
  2. Set a few environment variables that will be useful in the following commands.
    • export lrole=$(terraform output -json sts_local_account_role_arn | jq --raw-output .[0])
    • export rrole=$(terraform output -json sts_remote_account_role_arn | jq --raw-output .[0])
    • export AWT=$(cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token)
  3. Try to use assume-role-with-web-identity to get credentials for the local role.
    • aws sts assume-role-with-web-identity --role-arn ${lrole} --role-session-name remote-sts-test --web-identity-token ${AWT} --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" --output text
  4. Update the AWS environment variables with our new credentials from assume-role-with-web-identity.
    • Set the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN with the values we got from assuming the initial role in our local account.
  5. Try to use assume-role to get credentials for the remote role.
    • aws sts assume-role --role-arn ${rrole} --role-session-name remote-sts-test --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" --output text
  6. Update the AWS environment variables with our new credentials from assume-role.
    • Set the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN with the values we got from assuming the final role in our remote account.
  7. Then we finally run the following command using the temporary credentials that we received from the remote account.
    • aws elb describe-load-balancers --max-items=1 --no-cli-pager

Let’s make sure that this functionality is limited to the ServiceAccount and Namespace that we specified, by also testing this from the default Namespace.

Go ahead and create our final Kubernetes manifest file with the following contents, and name it test-cross-account-default.yaml. It will be very similar to the previous manifest but will put everything into the default Namespace.

Remember to update the eks.amazonaws.com/role-arn annotation for the ServiceAccount, so that it matches the role ARN that you retrieved from your Terraform outputs.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-access
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/local-account-access"
    eks.amazonaws.com/audience: "sts.amazonaws.com"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cross-account-sts-test
  namespace: default
spec:
  selector:
    matchLabels:
      app: cross-account-sts-test-default
  template:
    metadata:
      labels:
        app: cross-account-sts-test-default
    spec:
      serviceAccountName: local-access
      containers:
      - name: awscli
        image: public.ecr.aws/aws-cli/aws-cli:latest
        command: ["/bin/bash"]
        args: ["-c", "sleep infinity"]

Applying the new manifests should generate the following output:

$ kubectl apply -f ./test-cross-account-default.yaml

serviceaccount/local-access created
deployment.apps/cross-account-sts-test created

Finally, we can test this with a very small modification to our previous command (due to the namespace and deployment label change):

$ kubectl exec -ti -n default $(kubectl get pod -n default -l app=cross-account-sts-test-default -o=custom-columns=NAME:.metadata.name --no-headers) -- /bin/env lrole=$(terraform output -json sts_local_account_role_arn | jq --raw-output .[0]) rrole=$(terraform output -json sts_remote_account_role_arn | jq --raw-output .[0]) /bin/bash -c 'export AWT=$(cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token) && export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" $(aws sts assume-role-with-web-identity --role-arn ${lrole} --role-session-name remote-sts-test --web-identity-token ${AWT} --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" --output text)) && export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" $(aws sts assume-role --role-arn ${rrole} --role-session-name remote-sts-test --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" --output text)) && aws elb describe-load-balancers --max-items=1 --no-cli-pager'

An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity

An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity

An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation: Not authorized to perform sts:AssumeRoleWithWebIdentity
command terminated with exit code 254

As we expected, everything failed because we weren’t authorized to even make the first aws sts assume-role-with-web-identity call.

Tearing down the Infrastructure

When you are all done using the EKS cluster (and would like to stop paying for it), you can tear it down, as shown below. We do not need to pass any variables, since terraform destroy on its own will remove everything that is currently defined in the state file.

$ terraform destroy

…
Plan: 0 to add, 0 to change, 104 to destroy.
…
│ Warning: EC2 Default Network ACL (acl-0000d00000b0a000b) not deleted, removing from state
…
Destroy complete! Resources: 104 destroyed.

NOTE: Terraform cannot delete EC2 Default Network ACLs. Although this should not be an issue, you can, if desired, use the AWS console by navigating to the VPC service, selecting Network ACLs, and then searching for and removing any unnecessary ACLs and related components from there.

Conclusion

There are a lot of options when it comes to authentication and authorization between AWS and a Kubernetes cluster, but hopefully this walkthrough has helped you understand the differences between them and how each one is wired together. Some choices will be made for you depending on whether you are using EKS or a custom cluster, and others will simply be a matter of requirements and preferences. The one thing that you should always avoid, if possible, is passing hard-coded credentials into your applications, since all of the options discussed here make it possible to generate short-term credentials that are easy to manage (after the initial setup) and can only be used by a targeted audience.
