Continuous Integration and Deployment using GitLab.com, ECS, ALB

Amazon’s EC2 Container Service (ECS) has interested me for a while, but I hadn’t had a chance to really test it out. I decided to set up a minimal project and walk through the steps of setting up Continuous Integration and Deployment.

I have been hearing good things about GitLab and I decided to give that a try as well. I’m very glad I did. It had many more features than I expected and it reduced the number of moving parts in this exercise.

The hosted version of GitLab offers unlimited free private repos and also includes continuous integration runners and even a private Docker registry for each repo. Thank you GitLab!!

The Plan

The idea is to set up continuous integration and deployment that deploys a simple web site to AWS infrastructure, namely Amazon EC2 Container Service (ECS).

[Diagram: GitLab ECS continuous integration architecture]

Starting in the top left of the diagram:

  1. Changes are pushed to a git repository.
  2. The build service automatically builds the docker image, pushes the image to a private registry and notifies ECS of the new application version.
  3. ECS deploys the new version to the EC2 instances and manages the ELB target group to add and remove instances as required.
  4. Internet requests are routed through the ELB Application Load Balancer to the active ECS instances.

Minimal Web App

To keep things simple, I used a minimal web application. It consists of one HTML file and one Dockerfile:


<html>
  <body>
    Hello world.
  </body>
</html>


FROM nginx:latest
COPY pub /usr/share/nginx/html

This Dockerfile uses Docker’s official nginx image and copies index.html and any other files in the pub directory to the resulting image.
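If you want to sanity-check the image before wiring up CI, you can build and run it locally (this assumes Docker is installed; `mantis-web-local` is just a throwaway local tag):

```shell
# Build the image from the project root and run it on a local port.
docker build -t mantis-web-local .
docker run -d -p 8080:80 mantis-web-local
# Then browse to http://localhost:8080 to see the page.
```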

Continuous Integration

The next step was to set up the Docker build in GitLab – this is accomplished using a simple YAML file that describes the build steps. Here’s the build file at this point:


stages:
  - build

build_image:
  stage: build
  only: [master]
  image: docker:git
  services:
    - docker:dind
  script:
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN registry.gitlab.com
    - docker build -t registry.gitlab.com/don.mcnamara/mantis-web .
    - docker tag registry.gitlab.com/don.mcnamara/mantis-web registry.gitlab.com/don.mcnamara/mantis-web:$CI_BUILD_REF
    - docker push registry.gitlab.com/don.mcnamara/mantis-web:$CI_BUILD_REF
    - docker push registry.gitlab.com/don.mcnamara/mantis-web

I am starting with one build stage at this point. It consists of one task: build_image. This stage uses “docker in docker” as a service to allow us to build the docker image.

The script for the build_image step is fairly straightforward. It builds the docker image and tags the image with the git revision hash – an alternative would be to use the build number. The script then pushes the image to the private docker registry. I push twice so that the tagged version is updated as well as the “latest” version.

Amazon Web Services

Now that the project builds and automatically publishes docker images, let’s think about the AWS architecture.

Inbound HTTP traffic will come through an ALB – this is Amazon’s Application Load Balancer. It will allow me to do path based routing later, if I want. It also integrates nicely with ECS. The ALB has one listener and one target group.

The application will be deployed to an ECS cluster and the ECS service will work directly with the ALB target group to add and remove service mappings appropriately. This gives us a great deployment story.

The biggest tripping point for me was making sure each AWS component had the appropriate security groups and roles to talk to each other. Also, the AWS command line is a little unhelpful when there is a problem.

Note: The process to create the AWS infrastructure could be simplified by using CloudFormation to build the infrastructure. I might address that in a follow-up post. For a first attempt I think it helped me to reason about each piece independently. Update: This is done! See my post on the CloudFormation change.

Docker authentication information

Because we are using an external private docker registry we will need to provide authentication information for the ECS instances. If we were using a public registry, no authentication would be required. If we were using AWS EC2 Container Registry, we could use an IAM role to connect.

To create the authentication information, run docker login on your local computer, use your GitLab credentials and then grab the authentication information from the docker config file. In a production application you should have a user just for this purpose.

docker login registry.gitlab.com
# enter your user name and password
cat ~/.docker/config.json

Look for a line similar to this in your docker config file:

"registry.gitlab.com": { "auth": "[ENCODED_AUTH_TOKEN]" }
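That auth value is nothing magic: it is just `username:password` base64-encoded. A quick local sketch, using placeholder credentials rather than real ones:

```shell
# The "auth" value in ~/.docker/config.json is base64("username:password").
# "user" and "pass" below are placeholder credentials for illustration only.
AUTH="$(printf '%s' 'user:pass' | base64)"
echo "$AUTH"   # dXNlcjpwYXNz
```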

Authentication Data in EC2 User Data

We pass the authentication information above in the user data to our EC2 instances. I went for simplicity here and appended directly to the config file. I think a better solution would be to store this configuration in S3 and copy it locally when your instance starts.


#!/bin/bash
echo ECS_CLUSTER=test >> /etc/ecs/ecs.config
echo ECS_ENGINE_AUTH_TYPE=dockercfg >> /etc/ecs/ecs.config
echo ECS_ENGINE_AUTH_DATA={\"registry.gitlab.com\": { \"auth\": \"your_auth_string_goes_here\" }} >> /etc/ecs/ecs.config

We set the ECS cluster name – otherwise the server will join the default cluster. We are also setting the authentication type and adding the authentication data into the ecs config file.

It can be a little tricky to get this to work with the escaped quotes. If your ECS instances have trouble authenticating with the docker registry you can run this script locally, or just cat out the config file at the end of your user data script and look for any problems.
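One way to catch quoting mistakes before launching an instance is to pipe the auth data through jq locally; jq exits non-zero on invalid JSON, so broken escaping shows up immediately:

```shell
# Placeholder auth string; substitute the value from your own docker config.
AUTH_DATA='{"registry.gitlab.com": { "auth": "your_auth_string_goes_here" }}'
# jq will fail loudly here if the quoting produced invalid JSON.
echo "$AUTH_DATA" | jq . > /dev/null && echo "auth data is valid JSON"
```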

Create the ECS task template

We will need a template to create our ECS task definitions. Here is what I used:


[
    {
        "name": "web-server",
        "image": "registry.gitlab.com/don.mcnamara/mantis-web:{version}",
        "cpu": 10,
        "memory": 10,
        "essential": true,
        "portMappings": [
            {
                "containerPort": 80
            }
        ]
    }
]

A script will swap the {version} text with the version we want to use.
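The swap is a plain sed substitution. Here is a local sketch using a hypothetical commit hash and a throwaway copy of the relevant template line:

```shell
# Recreate the relevant line of the task template, then substitute {version}
# the same way the deploy script will. abc1234 is a hypothetical commit hash.
VERSION=abc1234
echo '"image": "registry.gitlab.com/don.mcnamara/mantis-web:{version}"' > /tmp/containers_test.json
sed -i "s/{version}/$VERSION/g" /tmp/containers_test.json
cat /tmp/containers_test.json
```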

One interesting thing about this task definition is that we are only defining the container port, we are not defining a host port. This means docker will assign an available port on the host. This is new functionality afforded by the integration of ECS and ALBs. It also means we don’t have to run N+1 containers to allow blue-green deployments.

From what I recall, previously an ELB pointed at a static port. This meant each container instance could only have one version of a service running at a time. So, you had to provision N+1 ECS instances to allow for a rolling deploy. Now, different versions of a task can simply use different dynamic ports and everything is handled by the load balancer target group.

AWS Setup

Note: The following scripts use the AWS CLI. You might have to update to the latest version to get the ELB/ALB-related commands. Also, I am using jq to parse the output when I need to grab an ID for later use.

We will need some information about our AWS environment. I used an existing VPC and subnets. You might want to spin up a VPC just for this, it is up to you.

aws ec2 describe-subnets
VPC_ID=vpc-AAAAAAAA   # [copy/paste VPC ID]
SUBNET_1=subnet-BBBBBBBB  # [copy/paste value for one AZ]
SUBNET_2=subnet-CCCCCCCC  # [copy/paste value for a different AZ]

aws ec2 describe-security-groups --filters Name=vpc-id,Values=$VPC_ID
VPC_SECURITY_GROUP=sg-DDDDDDDD  # [find your default VPC security group]

I don’t know if there is any relation between the VPC and the default security group other than that they would have been created at the same time. Browse through your security groups and find the correct one. It should be one named “default VPC security group” or something along those lines.
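If scanning the list is tedious, you can also filter for it directly: every VPC’s default security group has the literal group name “default”, so a filter plus jq pulls out the ID:

```shell
# The default security group of a VPC always has the group name "default".
aws ec2 describe-security-groups \
  --filters Name=vpc-id,Values=$VPC_ID Name=group-name,Values=default \
  | jq -r '.SecurityGroups[0].GroupId'
```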

Create security group for inbound ELB traffic

aws ec2 create-security-group --group-name ecs-test-http-inbound --description "Allow inbound HTTP from everyone" --vpc-id $VPC_ID | tee inbound_security_group.out.json
INBOUND_SECURITY_GROUP="$(cat inbound_security_group.out.json | jq -r .GroupId)"
aws ec2 authorize-security-group-ingress --group-id $INBOUND_SECURITY_GROUP --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 create-tags --resources $INBOUND_SECURITY_GROUP --tags Key=Name,Value=ecs-test-http-inbound

Create the Application Load Balancer (ALB), target group and listener

aws elbv2 create-load-balancer --name test-ecs-service-load-balancer --subnets $SUBNET_1 $SUBNET_2 --security-groups $VPC_SECURITY_GROUP $INBOUND_SECURITY_GROUP | tee load_balancer.out.json
LOAD_BALANCER_ARN="$(cat load_balancer.out.json | jq -r .LoadBalancers[0].LoadBalancerArn)"
aws elbv2 create-target-group --name ecs-target-group --protocol HTTP --port 80 --vpc-id $VPC_ID | tee target_group.out.json
TARGET_GROUP_ARN="$(cat target_group.out.json | jq -r .TargetGroups[0].TargetGroupArn)"
aws elbv2 create-listener --load-balancer-arn $LOAD_BALANCER_ARN --protocol HTTP --port 80 --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN

It might take a few minutes for the ELB to become active, but you can proceed with the steps below while it becomes active.
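If you want to wait explicitly, you can poll the load balancer state until it reports `active` instead of `provisioning`:

```shell
# Check the ALB state; repeat until it reports "active".
aws elbv2 describe-load-balancers --load-balancer-arns $LOAD_BALANCER_ARN \
  | jq -r '.LoadBalancers[0].State.Code'
```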

Create the ECS cluster

aws ecs create-cluster --cluster-name test

Start your EC2 instance(s)

You should use the appropriate ECS-optimized AMI for your region.

aws ec2 run-instances --image-id ami-c17ce0d6 --instance-type t2.micro --subnet-id $SUBNET_1 --iam-instance-profile Name=ecsInstanceRole --user-data file://ecs_startup.txt | tee create1.out.json
INSTANCE_ID1="$(cat create1.out.json | jq -r .Instances[].InstanceId)"
aws ec2 create-tags --resources $INSTANCE_ID1 --tags Key=Name,Value=test-ecs-thing-1

Adding multiple instances in multiple availability zones is simple. Just make sure the load balancer’s security group can reach the instances.
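For example, a second instance in the other subnet is the same command pointed at `$SUBNET_2`, reusing the same user data file:

```shell
# Launch a second instance in the second availability zone's subnet.
aws ec2 run-instances --image-id ami-c17ce0d6 --instance-type t2.micro \
  --subnet-id $SUBNET_2 --iam-instance-profile Name=ecsInstanceRole \
  --user-data file://ecs_startup.txt | tee create2.out.json
INSTANCE_ID2="$(cat create2.out.json | jq -r .Instances[].InstanceId)"
aws ec2 create-tags --resources $INSTANCE_ID2 --tags Key=Name,Value=test-ecs-thing-2
```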

Define your new ECS task

You should manually update the containers.json file to have a valid version number.

aws ecs register-task-definition --family test-task-family --container-definitions file://containers.json

One quick heads up: ECS task definitions are defined independently of clusters and you can’t delete them. You can only deregister revisions, which marks them inactive and hides them in the AWS console UI.
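Deregistering a revision looks like this; the revision remains queryable but no longer shows as ACTIVE:

```shell
# Mark revision 1 of the family as INACTIVE (task definitions can't be deleted).
aws ecs deregister-task-definition --task-definition test-task-family:1
```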

Create the ECS service

Now that all the infrastructure is in place, it is time to create the ECS service using the task definition. This will cause your image to download and your service to start and register with the load balancer.

aws ecs create-service --cluster test --service-name test-task-web --task-definition test-task-family:1 --desired-count 1 --load-balancers targetGroupArn=$TARGET_GROUP_ARN,containerName=web-server,containerPort=80 --role ecsServiceRole

Deploy a new version manually

Let’s deploy a new version manually just to see what that is like. This is a two step process: you need to register a new task definition with the updated version number in containers.json and then you need to update the service to use the new task definition.

aws ecs register-task-definition --family test-task-family --container-definitions file://containers.json
aws ecs update-service --cluster test --service test-task-web --task-definition test-task-family:2 --desired-count 1

To roll back, just update the version number on the task definition parameter.

aws ecs update-service --cluster test --service test-task-web --task-definition test-task-family:1 --desired-count 1
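Either way, you can watch the rollout by describing the service; during a deploy you will briefly see two entries in the deployments array, the new PRIMARY revision and the old ACTIVE one being drained:

```shell
# Inspect in-flight deployments for the service.
aws ecs describe-services --cluster test --services test-task-web \
  | jq '.services[].deployments'
```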

Continuous Deployment

To enable continuous deployment, we need to do one more thing in AWS. We need to create an IAM user that the build/deployment process will use. Create an IAM user in AWS and give it the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1472009018000",
            "Effect": "Allow",
            "Action": [
                "ecs:RegisterTaskDefinition",
                "ecs:DescribeServices",
                "ecs:UpdateService"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Copy the user’s AWS key and secret and set them as variables on your build. You should set the following variables, which the AWS CLI reads from the environment:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
Now let’s return our attention to our build script. Here is the updated .gitlab-ci.yml file:


stages:
  - build
  - deploy

build_image:
  stage: build
  only: [master]
  image: docker:git
  services:
    - docker:dind
  script:
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN registry.gitlab.com
    - docker build -t registry.gitlab.com/don.mcnamara/mantis-web .
    - docker tag registry.gitlab.com/don.mcnamara/mantis-web registry.gitlab.com/don.mcnamara/mantis-web:$CI_BUILD_REF
    - docker push registry.gitlab.com/don.mcnamara/mantis-web:$CI_BUILD_REF
    - docker push registry.gitlab.com/don.mcnamara/mantis-web

deployment_production:
  stage: deploy
  only: [master]
  image: cgswong/aws:latest
  environment: production
  script:
    - aws --version
    - CONTAINER_DEF_FILE=deployment/containers_$CI_BUILD_REF.json
    - cp deployment/containers.json $CONTAINER_DEF_FILE
    - sed -i "s/{version}/$CI_BUILD_REF/g" $CONTAINER_DEF_FILE
    - NEW_REVISION="$(aws ecs register-task-definition --family test-task-family --container-definitions file://$CONTAINER_DEF_FILE | jq '.taskDefinition.revision')"
    - DESIRED_COUNT="$(aws ecs describe-services --cluster test --service test-task-web | jq '.services[].desiredCount')"
    - echo Desired Count = $DESIRED_COUNT
    - aws ecs update-service --cluster test --service test-task-web --task-definition test-task-family:$NEW_REVISION --desired-count $DESIRED_COUNT

I’ve added a deploy stage and a deployment_production step. This step uses an AWS CLI docker image so that we have the AWS CLI tools. If I were doing this on a production app, I might fork my own image of the AWS CLI tools for greater control.

The script makes a copy of the containers.json file and uses sed to find and replace {version} with the docker version tag which is also the commit hash.

Next the script calls AWS and creates the new task definition. It grabs the revision from the output. It then describes the current service to grab the desired count. Finally, it updates the service with the new task definition revision and the desired count.

One pleasant outcome is that ECS handles a full rolling deploy of your service. It creates the new service instances, updates the ALB to direct traffic to the new instances and drains connections to the old version. It is quite impressive to see it in action.

You should be aware that there is a window of time when both versions of your code are running. If you make a breaking change, it might cause errors while end users bounce between servers and therefore hit different deployed versions of your application.


Performance

To test the end-to-end performance, I made a small change and timed the build and deployment. The entire process took about 3.5 minutes: about 1.5 minutes for the build and deploy stages on the GitLab continuous integration server, and the remaining 2 minutes for the ECS rolling deployment. This feels a little slow to me and I think there are improvements that could be made.


Conclusion

This turned out to be a bit more complex than what I originally expected, but it has helped me learn some AWS concepts that I didn’t really have a complete grasp on previously. I am hugely impressed by GitLab and I will be using them for future projects.

The combination of Amazon ECS and ELB/ALB makes a very nice deployment story in this proof of concept and I look forward to working more with this in the future.

Next steps

There are some obvious next steps here:

  • I’d like to add an API service that is deployed in parallel and use ALB path based routing to split the traffic.
  • It might be useful to have additional environments in the deployment pipeline.
  • I think the end-to-end performance can be improved.
  • It would also make a lot of sense to convert this to a CloudFormation template. Update: This is done! See my post on the CloudFormation change.

If you’ve followed this far, don’t forget to clean up your AWS infrastructure. The ELB costs about $17/month and an EC2 t2.micro instance is about $10/month.
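A rough teardown sketch, assuming the IDs and ARNs captured earlier are still in your shell. The order matters: the service must be scaled down before it can be deleted, and the load balancer must go before its target group:

```shell
# Scale the service to zero, then delete it and the surrounding infrastructure.
aws ecs update-service --cluster test --service test-task-web --desired-count 0
aws ecs delete-service --cluster test --service test-task-web
aws ec2 terminate-instances --instance-ids $INSTANCE_ID1
aws ecs delete-cluster --cluster test
aws elbv2 delete-load-balancer --load-balancer-arn $LOAD_BALANCER_ARN
aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_ARN
aws ec2 delete-security-group --group-id $INBOUND_SECURITY_GROUP
```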