Cross-account image pulls with Amazon ECR pull-through cache
2023-December-15 • by David Norton
The problem
Amazon ECR's pull-through cache feature is a helpful tool to allow usage of public image repositories while buffering your system from unexpected downtime.
Our client saw quay.io have four outages in a week, and critical daemonset pods were unable to start on their Kubernetes cluster. This was just days before our client's busiest season, and we had to act fast to protect ourselves against further outages.
The solution? Implement a pull-through cache. However, we ran into an issue because our artifacts (ECR images) were stored in a different AWS account than our compute (Kubernetes nodes). For existing images and repositories, it worked fine (with a cross-account repository policy), but for repositories that didn't exist yet, there was a chicken-and-egg problem: a nonexistent repository has no policy that could grant cross-account image pulls.
The interim solution
AWS Support answered our ticket very helpfully: this was a currently unsupported use case, and they suggested we authenticate to the artifact account and pre-create the ECR repositories and policies. We did that for the critical images, and we were all set. Cross-account pulls started working.
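For reference, pre-creating a repository and its cross-account policy might look roughly like the Terraform sketch below. The repository name, the resource labels, and the `pulling_account_id` variable are hypothetical placeholders, not the images we actually used. `aws.ecr` is assumed to be a provider alias that authenticates to the artifact account.

```hcl
# Hypothetical input: the ID of the account whose nodes pull images.
variable "pulling_account_id" {
  type        = string
  description = "AWS account ID of the compute (pulling) account"
}

# Pre-create the repository so that a policy can be attached before the first pull.
# The name is a placeholder; it has to match the cache rule prefix plus the upstream path.
resource "aws_ecr_repository" "precreated" {
  provider = aws.ecr
  name     = "quay.io/example/critical-image"
}

# Grant the pulling account read access to the pre-created repository.
resource "aws_ecr_repository_policy" "precreated" {
  provider   = aws.ecr
  repository = aws_ecr_repository.precreated.name
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Sid    = "AllowCrossAccountPull",
        Effect = "Allow",
        Principal = {
          AWS = "arn:aws:iam::${var.pulling_account_id}:root"
        },
        Action = [
          "ecr:BatchCheckLayerAvailability",
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ]
      }
    ]
  })
}
```

The drawback of this approach is the one described above: every new upstream image needs its own repository and policy before anything can pull it.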
The real solution: repository creation templates
The very day after we implemented this workaround, a curious post appeared on the AWS blog: Amazon ECR adds ability to specify initial configuration for repositories created via pull through cache (Preview)
That sounded... promising. Being in preview, the creation templates don't yet have an API or Terraform support, but we were able to create them through the console, which we'll show below. This allowed us to pull images from another account, for a repository that did not yet exist.
We'll use Terraform for these examples. You can use whichever infrastructure tool you prefer, including the AWS console, the AWS CLI, CloudFormation, CDK, etc.
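The Terraform snippets in the following steps reference two provider aliases (`aws.ecr` and `aws.pulling_account`) and three data sources without showing their definitions. A minimal sketch of that scaffolding could look like this. The region is taken from the CLI examples later in the post, and the `profile` names are assumptions; substitute whatever credentials setup you use for each account.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Account where the ECR images live.
provider "aws" {
  alias   = "ecr"
  region  = "us-east-2"
  profile = "ecr-account" # assumption: a named CLI profile for the ECR account
}

# Account where the compute (Kubernetes nodes) runs.
provider "aws" {
  alias   = "pulling_account"
  region  = "us-east-2"
  profile = "pulling-account" # assumption: a named CLI profile for the pulling account
}

# Account IDs and region, looked up rather than hard-coded.
data "aws_caller_identity" "ecr_account" {
  provider = aws.ecr
}

data "aws_caller_identity" "pulling_account" {
  provider = aws.pulling_account
}

data "aws_region" "current" {
  provider = aws.ecr
}
```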
Step 1: set up pull through cache rule in ECR account
First we will set up a pull through cache rule, in the account where we want ECR images to live:
resource "aws_ecr_pull_through_cache_rule" "rule" {
  provider              = aws.ecr
  ecr_repository_prefix = "registry.k8s.io"
  upstream_registry_url = "registry.k8s.io"
}
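With this rule in place, an upstream image is addressed through the ECR registry host followed by the rule's prefix. As a small illustration, the output below builds the cached image reference for `busybox`. It assumes the `ecr_account` and `current` data sources that the later snippets also use; the local and output names are made up for this example.

```hcl
locals {
  # Registry host of the ECR account, e.g. <account_id>.dkr.ecr.<region>.amazonaws.com
  ecr_registry_host = "${data.aws_caller_identity.ecr_account.account_id}.dkr.ecr.${data.aws_region.current.name}.amazonaws.com"
}

# Image reference to use in place of registry.k8s.io/busybox:latest
output "cached_busybox_image" {
  value = "${local.ecr_registry_host}/${aws_ecr_pull_through_cache_rule.rule.ecr_repository_prefix}/busybox:latest"
}
```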
Step 2: create registry policy in ECR account
Next we'll set up a registry policy so that principals in the pulling account can create new repositories and import upstream images:
resource "aws_ecr_registry_policy" "policy" {
  provider = aws.ecr
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Sid    = "AllowPullThroughCacheFromOtherAccount",
        Effect = "Allow",
        Principal = {
          "AWS" : "arn:aws:iam::${data.aws_caller_identity.pulling_account.account_id}:root"
        },
        Action = [
          "ecr:CreateRepository",
          "ecr:BatchImportUpstreamImage"
        ],
        Resource = "arn:aws:ecr:${data.aws_region.current.name}:${data.aws_caller_identity.ecr_account.account_id}:repository/registry.k8s.io/*"
      }
    ]
  })
}
Step 3: add policy to role in pulling account
Now we'll create a role and policy in the pulling account that allow anyone assuming the role to pull images:
resource "aws_iam_role" "puller" {
  provider = aws.pulling_account
  name     = "ecr-puller"
  assume_role_policy = jsonencode({
    Version : "2012-10-17",
    Statement : {
      Action : ["sts:AssumeRole"],
      Effect : "Allow",
      Principal : {
        "AWS" : "arn:aws:iam::${data.aws_caller_identity.pulling_account.account_id}:root"
      }
    }
  })
}

resource "aws_iam_role_policy" "compute_pull_policy" {
  provider = aws.pulling_account
  name     = "ecr-pull-through-cache-${data.aws_region.current.name}"
  role     = aws_iam_role.puller.name
  policy = jsonencode({
    Version : "2012-10-17",
    Statement : [
      {
        Sid : "AllowPullThroughCacheInECRAccount",
        Effect : "Allow",
        Action : [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer",
          "ecr:CreateRepository",
          "ecr:BatchImportUpstreamImage"
        ],
        Resource : [
          "arn:aws:ecr:${data.aws_region.current.name}:${data.aws_caller_identity.ecr_account.account_id}:repository/registry.k8s.io/*"
        ]
      },
      {
        Sid : "AllowLogin",
        Effect : "Allow",
        Action : [
          "ecr:GetAuthorizationToken",
        ],
        Resource : [
          "*"
        ]
      },
    ]
  })
}
Step 4: it still can't pull!
This is where we were stuck before repository creation templates existed. If the repository doesn't exist yet, the pull fails:
$ aws sts assume-role --role-arn arn:aws:iam::_pulling_account_:role/ecr-puller --role-session-name puller
# this gives us some creds, which we then export as environment variables so we are authenticated as `role/ecr-puller`.
$ aws sts get-caller-identity
{
"UserId": "redacted:puller",
"Account": "redacted-pulling-account",
"Arn": "arn:aws:sts::_pulling_account_:assumed-role/ecr-puller/puller"
}
$ aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin _ecr_account_.dkr.ecr.us-east-2.amazonaws.com
Login Succeeded
$ docker pull _ecr_account_.dkr.ecr.us-east-2.amazonaws.com/registry.k8s.io/busybox:latest
Error response from daemon: pull access denied for redacted.dkr.ecr.us-east-2.amazonaws.com/registry.k8s.io/busybox,
repository does not exist or may require 'docker login': denied: User:
arn:aws:sts::redacted:assumed-role/ecr-puller/puller is not authorized to perform: ecr:BatchGetImage on resource:
arn:aws:ecr:us-east-2:redacted:repository/registry.k8s.io/busybox because no resource-based policy allows the ecr:BatchGetImage action
Step 5: create repository creation template in ECR account
- In AWS Console, search for Elastic Container Registry
- Click Private Registry -> Settings -> Creation templates
- Create a template with:
  - Prefix matching the pull through cache rule's prefix, e.g. registry.k8s.io
  - And the following policy under Repository permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::_pulling_account_:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:DescribeImages",
        "ecr:DescribeRepositories",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
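When we did this, the console was the only option. Newer releases of the Terraform AWS provider include an `aws_ecr_repository_creation_template` resource, so the same template can be sketched in Terraform, assuming your provider version ships that resource. The resource label and description below are made up for illustration; the policy mirrors the console JSON above.

```hcl
resource "aws_ecr_repository_creation_template" "pull_through" {
  provider    = aws.ecr
  description = "Cross-account pull policy for pull-through cache repositories"

  # Match the prefix of the pull through cache rule from Step 1.
  prefix = aws_ecr_pull_through_cache_rule.rule.ecr_repository_prefix

  # Only apply the template to repositories created by pull-through cache.
  applied_for = ["PULL_THROUGH_CACHE"]

  # Repository policy attached to every repository created from this template.
  repository_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Sid    = "AllowPull",
        Effect = "Allow",
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.pulling_account.account_id}:root"
        },
        Action = [
          "ecr:BatchCheckLayerAvailability",
          "ecr:BatchGetImage",
          "ecr:DescribeImages",
          "ecr:DescribeRepositories",
          "ecr:GetDownloadUrlForLayer"
        ]
      }
    ]
  })
}
```

Check the provider documentation for the version that introduced this resource before relying on it.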
Now we can pull from the other account:
$ docker pull _ecr_account_.dkr.ecr.us-east-2.amazonaws.com/registry.k8s.io/busybox:latest
latest: Pulling from registry.k8s.io/busybox
a3ed95caeb02: Pull complete
138cfc514ce4: Pull complete
Digest: sha256:cdedfa26285ad3ee2078003b8266eaf11e3678cf0a0002f17c7bea2d17abb5ce
Status: Downloaded newer image for redacted.dkr.ecr.us-east-2.amazonaws.com/registry.k8s.io/busybox:latest
redacted.dkr.ecr.us-east-2.amazonaws.com/registry.k8s.io/busybox:latest
The repository has been automatically created, and the desired cross-account policy automatically attached. Works great for us!
Summary
AWS is a complicated beast, with new features constantly rolling out and things changing. I suspect that by the time folks read this, something will have changed and made this even easier.
You can find the Terraform code for this post on GitHub.