AWS CodeArtifact with Maven — Further adventures with ServerLess

Robert Hook - @thebellman@hachyderm.io
Published in Level Up Coding · 10 min read · Mar 6, 2021

oval stone with the word “Tranquility” and a Chinese character engraved on it.
Tranquility — Photo by the author

If you’ve read any of my previous writing, you will have noticed I’m a fan of services that don’t involve me directly building and managing a server anywhere. Call them “Serverless” or call them “X-as-a-service”; the crux for me is that I can focus on using the service and not worry much about where and how it runs.

CodeArtifact falls squarely into that category for me. I recently wrote about building a serverless CI/CD pipeline for Go code. For Go, the ultimate output of the pipeline was an executable published into an S3 bucket. For Java code, this approach gets a bit clunky.

Like it or not, the de-facto standard for building and packaging Java code for many years has been Maven. Alternatives like Gradle chip away at its dominance, but Maven still wears the crown. I don’t think anyone on the planet can say that they like Maven, but for all of its many annoyances it gets the job done reliably and repeatably.

In the Java world, the de-facto development infrastructure for a team looks much the same as it has for over a decade: Eclipse or IntelliJ on the desktop, against a Maven project; some sort of code repository, usually GitLab or GitHub; Jenkins or Hudson to do the actual CI builds (even though there are much nicer solutions now); and an artifact repository to store built JARs in and allow them to be shared. Again, the de-facto standard for many years has been the well-regarded Artifactory from JFrog, and it’s this component in the stack that CodeArtifact replaces.

Abstract Code Pipeline

The role of a modern artefact repository is fairly straightforward: a managed place where compiled code artefacts can be catalogued and shared. One of the more important ideas in an artefact repository is that each version of the code artefact is distinct, and there is some way of tracing it back to the version of the source code that generated it. Most artefact repositories can also act as caching proxies to, or mirror copies of, external repositories. This feature is often used by enterprises to manage what external repositories are used by developers, and what external dependencies they can import.

In the case of Maven, each project is defined by one or more XML files usually named pom.xml and settings.xml, which (among many other things) define where Maven will publish built artefacts to, and where it will look to resolve dependencies. Very often these places are the same, but not always, which can lead to a certain amount of confusion!

As part of my exploration of how to assemble a serverless CI/CD pipeline, I finally sat down to wrap my head around how to get CodeBuild, CodeArtifact and Maven to collaborate on building and publishing Java JARs. I will work through some of the details below, although please refer to my previous notes on all the bits and pieces needed to sort out the CodeBuild/CodePipeline pieces to provide continuous integration and deployment. In this write-up, I will focus heavily on CodeArtifact and the Maven configuration.

Serverless CI/CD Pipeline

Note also that I will give examples using Terraform — if you have read anything else I’ve written, you already know that I’m fond of Terraform, and very fond of the idea of defining infrastructure on AWS using Terraform code.

The first thing that I tripped over is that at the time of writing (early 2021), CodeArtifact is only available in a limited number of AWS regions. Generally I build things into the London (eu-west-2) region, since that’s where I live, but to my slight annoyance CodeArtifact is only available in Ireland (eu-west-1). So, heavy sigh, step one was to add an additional named provider in my Terraform code:

provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
}

provider "aws" {
  alias   = "eu-west-1"
  region  = "eu-west-1"
  profile = var.aws_profile
}

Next step, I worked on the assumption that I will be building more pipelines in the mid-future for additional Java libraries, so I created a map of the projects (each project will have its own distinct pipeline and repository). This is pretty straightforward: the key is the common name I use for the project and all its bits and pieces, and the value is a description:

locals {
  projects = {
    iplib = "Small library for consuming AWS CIDR block data"
  }
}

CodeArtifact is organised around domains — a domain is any logical grouping suitable for you, and acts as a logical container for a set of artefacts. You might want to organise by product or project, or by team, or by organisational boundaries. It really doesn’t matter, other than that it gives you an opportunity to use IAM roles and policies to possibly constrain who can access given domains.

A quick aside on the question of roles and policies. In the examples below, I’ve been operating as a user with a very high level of access to the account. You may need to finesse permissions to use Terraform to create the assets.

Right, so, domains. Next step is to create a domain:

resource "aws_codeartifact_domain" "development" {
provider = aws.eu-west-1
domain = "development"
encryption_key = aws_kms_key.codeartifact.arn
tags = merge({ "Name" = "development" }, var.tags)
}

You might note that there’s an encryption key. That’s mandatory, and is another potential tool for constraining access. It also gives an increased guarantee that any artefacts in the repository are genuine: it would take quite a lot of effort for an external bad actor to subvert both the key and the CodeArtifact domain to be able to install spoofed artefacts (although cached artefacts from upstream repositories may still have been poisoned upstream).

Keys are simple to throw up, but make sure it’s in the same region as the domain:

resource "aws_kms_key" "codeartifact" {
  provider                = aws.eu-west-1
  deletion_window_in_days = 30
  enable_key_rotation     = true
  tags                    = merge({ "Name" = "codeartifact" }, var.tags)
}

Next, the repository itself. In general terms you will probably organise this around a project, with all closely related artefacts in one repository:

resource "aws_codeartifact_repository" "maven" {
  provider    = aws.eu-west-1
  domain      = aws_codeartifact_domain.development.domain
  repository  = "maven"
  description = "Repository to install maven artifacts into"
  external_connections {
    external_connection_name = "public:maven-central"
  }
  tags = merge({ "Name" = "maven" }, var.tags)
}

I’ve called it “maven” because in my head I will only use this repository for reading and writing artefacts via Maven, but every repository can be used for Maven, Gradle, pip, npm… pretty well all the standards you need, and growing over time.

The CodeBuild definition is a little simpler than the ones I’ve previously described — the key difference is that I don’t specify an artefact to be produced by the CodeBuild build. That’s because the job of publishing to CodeArtifact is done by Maven itself:

resource "aws_codebuild_project" "maven" {
  for_each       = local.projects
  name           = each.key
  description    = "project to build the ${each.key} project"
  service_role   = aws_iam_role.maven.arn
  build_timeout  = 15
  badge_enabled  = true
  source_version = "refs/heads/main"
  source {
    git_clone_depth     = 1
    insecure_ssl        = false
    location            = aws_codecommit_repository.maven[each.key].clone_url_http
    report_build_status = false
    type                = "CODECOMMIT"
    git_submodules_config {
      fetch_submodules = false
    }
  }
  artifacts {
    type = "NO_ARTIFACTS"
  }
  environment {
    compute_type                = "BUILD_GENERAL1_SMALL"
    image                       = "aws/codebuild/amazonlinux2-x86_64-standard:3.0"
    image_pull_credentials_type = "CODEBUILD"
    privileged_mode             = false
    type                        = "LINUX_CONTAINER"
  }
  logs_config {
    cloudwatch_logs {
      status     = "ENABLED"
      group_name = aws_cloudwatch_log_group.maven.name
    }
    s3_logs {
      encryption_disabled = false
      status              = "DISABLED"
    }
  }
  tags = merge({ "Name" = each.key }, var.tags)
}

If you’re not that familiar with Terraform, something subtle in that code snippet is easy to miss: the use of for_each means that a CodeBuild project will be created by the same bit of code for each of my defined projects. Adding a new project is as simple as adding an entry to the local map at the top, and re-running Terraform.

Another slight nuance may not be immediately obvious: the CodeBuild project (along with the CodePipeline pipeline and the CodeCommit git repository) is not in eu-west-1. There is no requirement for the CodeArtifact repository to be in the same region as the bits and pieces that build and publish the artifact.

The policy attached to the service_role is the same as the one I’ve previously published, other than adding permissions for CodeBuild to use CodeArtifact. This can be nuanced a little if you want to be more precise about the resources used, and I’ll be tweaking that a bit later myself. The principle of least-privilege is a very powerful security safety net.

data "aws_iam_policy_document" "codebuild" {
  .
  .
  .
  statement {
    actions = [
      "codeartifact:GetAuthorizationToken",
      "codeartifact:GetRepositoryEndpoint",
      "codeartifact:ReadFromRepository",
      "codeartifact:PublishPackageVersion",
      "codeartifact:PutPackageMetadata"
    ]
    resources = ["*"]
  }
  statement {
    actions   = ["sts:GetServiceBearerToken"]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "sts:AWSServiceName"
      values = [
        "codeartifact.amazonaws.com"
      ]
    }
  }
}

The only complicated bit is the use of STS to get an authentication token. This is the “glue” between the world of IAM principals and policies, and the old-school authentication world of (in this case) Maven.

So let’s transition to the content of the Java project itself. First, because this is a CodeBuild project, we need the buildspec.yml that tells CodeBuild what to do with the code:

version: 0.2

phases:
  install:
    runtime-versions:
      java: corretto11
  pre_build:
    commands:
      - export STAMP="1.0-`date +%Y%m%d.%H%M%S`"
      - export CODEARTIFACT_AUTH_TOKEN=`aws --region eu-west-1 codeartifact get-authorization-token --domain development --domain-owner 304388931199 --query authorizationToken --output text`
  build:
    commands:
      - mvn --no-transfer-progress versions:set -DnewVersion=$STAMP
      - mvn --no-transfer-progress -s settings.xml clean package deploy
artifacts:
  files:
    - target/iplib-$STAMP.jar
  name: iplib-$STAMP.jar

Working from top to bottom: I specify AWS’s Corretto because I know they’ve optimised the build environments for it. Simply put, builds with Corretto tend to be a bit faster and marginally cheaper. The produced artefact is entirely compatible with any Java 11 (or later) JDK. I also create a build version stamp using the current date and time, which you can see being used in the names of artefacts lower down.
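
As a rough sketch of how that stamp is assembled, the snippet below reproduces the date invocation from the buildspec as a standalone script. The make_stamp function name is mine, purely for illustration; the "1.0-" prefix and the iplib artefact name are taken from the buildspec above.

```shell
#!/bin/sh
# Sketch: build a version stamp the same way the buildspec's pre_build phase does.
# make_stamp is an illustrative helper name, not part of the original project.
make_stamp() {
  # Fixed "1.0-" prefix followed by the build host's local date and time.
  echo "1.0-$(date +%Y%m%d.%H%M%S)"
}

STAMP="$(make_stamp)"
echo "version:  ${STAMP}"
echo "artifact: target/iplib-${STAMP}.jar"
```

The stamp only has one-second resolution, so two builds of the same project starting within the same second would produce the same version. For a single pipeline that seems unlikely to matter, but it is worth knowing about.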

The only complicated piece is asking CodeBuild to get an authentication token to use in the build. Even though the invocation is aws codeartifact get-authorization-token, underneath the hood STS gets involved, which is why we needed to grant access to that service as well.

The build takes those bits we’ve configured, and then runs a plain vanilla Maven build to first set the version identifier we will use, then build and deploy.

From here on down, it’s all Maven configuration. First up, we must have a settings.xml at the top of the project to specify one or more servers for Maven to know about. This is where the authentication token in the environment comes into play. Note that the server id is based on the CodeArtifact domain and repository, but that’s really just a convenience so that consistent names are used throughout the project.

<settings>
  <servers>
    <server>
      <id>development--maven</id>
      <username>aws</username>
      <password>${env.CODEARTIFACT_AUTH_TOKEN}</password>
    </server>
  </servers>
</settings>
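
Since the password is read from the environment, a build started without the variable set will most likely be rejected by the repository with an authorisation error some way into the run. A small guard in a wrapper script can fail earlier and more clearly. This is only a sketch: require_token is a hypothetical helper name, and the check is simply "is the variable non-empty", not "is the token still valid".

```shell
#!/bin/sh
# Sketch: refuse to start a Maven build when the CodeArtifact token is missing.
# require_token is a hypothetical helper name used only for this example.
require_token() {
  if [ -z "${CODEARTIFACT_AUTH_TOKEN:-}" ]; then
    echo "CODEARTIFACT_AUTH_TOKEN is empty or unset; fetch a token first" >&2
    return 1
  fi
  return 0
}

# Typical use in a wrapper script:
#   require_token && mvn -s settings.xml clean package
```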

In the pom.xml itself, it’s a good convention to include a reference to the source of the code:

<?xml version="1.0" encoding="utf-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <name>IPLib</name>
  <groupId>net.parttimepolymath</groupId>
  <artifactId>iplib</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <description>Simple library to extract IP ranges from the IP range set published by AWS</description>
  <url>http://parttimepolymath.net</url>

  <scm>
    <url>scm:ssh://git-codecommit.eu-west-2.amazonaws.com/v1/repos/iplib</url>
    <connection>scm:ssh://git-codecommit.eu-west-2.amazonaws.com/v1/repos/iplib</connection>
    <developerConnection>scm:ssh://git-codecommit.eu-west-2.amazonaws.com/v1/repos/iplib</developerConnection>
    <tag>iplib-1.0-snapshot</tag>
  </scm>
  .
  .
  .

but the important part for publishing is to specify distributionManagement:

<distributionManagement>
  <repository>
    <id>development--maven</id>
    <name>development--maven</name>
    <url>https://development-304388931199.d.codeartifact.eu-west-1.amazonaws.com/maven/maven/</url>
  </repository>
</distributionManagement>
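
The repository URL and the server id both follow a pattern built from the domain, account id, region and repository name. The sketch below spells that pattern out. Both function names are illustrative, and the URL layout is inferred from the example URL above rather than from any specification, so treat it as an assumption.

```shell
#!/bin/sh
# Sketch: assemble the Maven endpoint URL and server id from their parts.
# Both function names are illustrative; the URL layout is inferred from the
# example URL in this article, so treat it as an assumption.
codeartifact_maven_url() {
  # $1 = domain, $2 = domain owner (account id), $3 = region, $4 = repository
  printf 'https://%s-%s.d.codeartifact.%s.amazonaws.com/maven/%s/\n' "$1" "$2" "$3" "$4"
}

codeartifact_server_id() {
  # $1 = domain, $2 = repository; matches the "development--maven" convention
  printf '%s--%s\n' "$1" "$2"
}

codeartifact_maven_url development 304388931199 eu-west-1 maven
codeartifact_server_id development maven
```

The AWS CLI can also report the endpoint directly, with aws codeartifact get-repository-endpoint --domain development --domain-owner 304388931199 --repository maven --format maven, which is the safer source if the two ever disagree.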

Assembling all of that, and pushing my Java code up, results very pleasingly in various versions bubbling into the repository, with no manual intervention, no fuss, and most importantly no servers to manage.

That same information can be retrieved from the command line, but this is definitely one of those cases where it’s far simpler to use the console:

% aws --profile XXX --region eu-west-1 codeartifact \
    list-package-versions \
    --namespace net.parttimepolymath --package iplib \
    --domain development --repository maven \
    --format maven
{
    "versions": [
        {
            "version": "1.0-20210306.104320",
            "revision": "IsZFITs4QUJ+NQ4f45Sipzxy6RAbM5GOs1DNPYUh0YY=",
            "status": "Published"
        },
        {
            "version": "1.0-20210306.104728",
            "revision": "hCdgGhXuuqVIqUWlSe3r9YKxQZDnWG3LnBAFR4mCqLw=",
            "status": "Published"
        },
        .
        .
        .
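
Because the build stamps are zero-padded dates and times, sorting the version strings as plain text also sorts them chronologically. Here is a minimal sketch of picking the newest one, assuming the version strings have already been pulled out of the JSON, one per line. The latest_version helper is a name I made up for the example.

```shell
#!/bin/sh
# Sketch: pick the newest timestamp-style version from a list of version strings.
# latest_version is a hypothetical helper; it reads one version per line on stdin.
latest_version() {
  # LC_ALL=C forces plain byte-order sorting, which is chronological for
  # zero-padded stamps of the form 1.0-YYYYMMDD.HHMMSS.
  LC_ALL=C sort | tail -n 1
}

# Prints 1.0-20210306.104728, the later of the two stamps shown above.
printf '%s\n' "1.0-20210306.104728" "1.0-20210306.104320" | latest_version
```

This only holds while every version shares the same prefix and stamp format; mixing in versions like 1.10 or 2.0-SNAPSHOT would need a proper version comparison.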

So all of the above gets us to the point of deploying Maven artifacts into CodeArtifact in our serverless pipeline. To close the loop, we need to be able to pull artifacts down in a build on the desktop, for instance to consume the iplib library in a different project. The great news is, this is really simple and straightforward.

First off, our client project needs the same settings.xml as above (alternatively, this can go into your local .m2/settings.xml, as discussed in the AWS documentation).

Next, the pom.xml needs the distribution management definition for deploying:

<distributionManagement>
  <repository>
    <id>development--maven</id>
    <name>development--maven</name>
    <url>https://development-304388931199.d.codeartifact.eu-west-1.amazonaws.com/maven/maven/</url>
  </repository>
</distributionManagement>

but we need one additional item to pull dependencies from CodeArtifact:

<repositories>
  <repository>
    <id>development--maven</id>
    <url>https://development-304388931199.d.codeartifact.eu-west-1.amazonaws.com/maven/maven/</url>
  </repository>
</repositories>

One caveat around that definition — without specifying other repositories, this will cause all artefact resolution to be routed through our CodeArtifact instance. That’s why in our configuration of CodeArtifact back at the beginning we specified an external connection: our instance is acting as a caching proxy in front of the Maven Central repository. Depending on your requirements, that may not be what you want, in which case you probably want to use multiple repositories.

Either way, doing a local build is easy:

$ export CODEARTIFACT_AUTH_TOKEN=`aws --region eu-west-1 codeartifact get-authorization-token --domain development --domain-owner 304388931199 --query authorizationToken --output text`
$ mvn -s settings.xml clean package
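
One detail that is easy to trip over: Maven resolves ${env.CODEARTIFACT_AUTH_TOKEN} from the environment of the mvn process, and a child process only inherits variables that the shell has exported. The sketch below shows the difference using a throwaway variable. DEMO_TOKEN is a made-up name standing in for the real token variable.

```shell
#!/bin/sh
# Sketch: a plain shell variable is invisible to child processes until exported.
# DEMO_TOKEN is a made-up name standing in for CODEARTIFACT_AUTH_TOKEN.
unset DEMO_TOKEN
DEMO_TOKEN="not-a-real-token"

# The child shell does not see the unexported variable, so this captures "".
before="$(sh -c 'printf "%s" "${DEMO_TOKEN:-}"')"

export DEMO_TOKEN
# After export the child shell inherits the value.
after="$(sh -c 'printf "%s" "${DEMO_TOKEN:-}"')"

echo "before export: [${before}]"
echo "after export:  [${after}]"
```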

and during the initial stages of the Maven build you should see the dependency being downloaded from our CodeArtifact repository:

[INFO] Scanning for projects...
[INFO]
[INFO] --------------------< net.parttimepolymath:awscidr >--------------------
[INFO] Building AWSCidr 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from development--maven: https://development-889199313043.d.codeartifact.eu-west-1.amazonaws.com/maven/maven/net/parttimepolymath/iplib/1.0-20210306.194311/iplib-1.0-20210306.194311.pom
.
.
.

On one final note: pricing. The cost of CodeArtifact is ridiculously low, and you would need to have a very active development environment before the costs rise into the “price of a cup of coffee” realm. More critically, the costs are orders of magnitude lower than the costs you would wear by running a server with Artifactory or similar installed, let alone the cost of manpower to manage that server.

Engineering Manager at Pleo, with a belief that technology can be simple, easy and fun. 35+ years building robust, secure data driven solutions.