AWS DevOps Masterclass: Automating Deployments with AWS CodeDeploy
In modern enterprise cloud architectures, deploying applications reliably, securely, and with zero downtime is a fundamental operational requirement. Manual deployments or custom, fragile shell scripts are major sources of production outages. AWS CodeDeploy solves these challenges by providing a fully managed, highly flexible deployment service that automates application deployments to a variety of compute services, including Amazon EC2, AWS Fargate, AWS Lambda, and on-premises servers.
This masterclass provides an exhaustive, production-grade guide to AWS CodeDeploy. It is designed for senior DevOps engineers, enterprise architects, and systems administrators who need to design, implement, and maintain robust continuous delivery (CD) pipelines. We will explore CodeDeploy's internal mechanics, write complete configuration manifests, implement advanced deployment strategies (such as canary and blue/green deployments), establish monitoring and observability, and address complex enterprise scenarios like database migrations and multi-region rollouts.
What is AWS CodeDeploy?
AWS CodeDeploy is a fully managed deployment service that automates software deployments to Amazon EC2 instances, on-premises instances, serverless AWS Lambda functions, or Amazon ECS services. It eliminates the need for manual operations, scales dynamically with your infrastructure, prevents downtime through advanced traffic-shifting deployment strategies (such as Blue/Green, Canary, and Linear), and automatically rolls back deployments when errors or failed health checks are detected.
What You Will Learn
- The internal architectural components and execution lifecycle of AWS CodeDeploy.
- How to write syntactically perfect, production-grade appspec.yml files for EC2, ECS, and Lambda.
- How to configure and secure the CodeDeploy Host Agent on Linux and Windows.
- Advanced traffic-shifting strategies, including Blue/Green, Canary, and Linear deployments.
- How to write robust, idempotent lifecycle hook scripts with error handling and retry logic.
- How to provision CodeDeploy infrastructure using HashiCorp Terraform.
- How to handle database migrations in a Blue/Green environment using the Expand/Contract pattern.
- Enterprise observability, debugging techniques, security hardening, and troubleshooting strategies.
Prerequisites
To get the most out of this guide, you should have a solid understanding of:
- Core AWS services, particularly IAM, EC2, Auto Scaling, ECS, and Lambda.
- Basic continuous integration and continuous delivery (CI/CD) concepts.
- Infrastructure as Code (IaC) principles, specifically using Terraform.
- Linux systems administration, including shell scripting (Bash).
If you need to brush up on Infrastructure as Code, we highly recommend reviewing our previous module: Infrastructure as Code with Terraform.
CodeDeploy Core Architecture & Internal Mechanics
To build reliable deployment pipelines, you must understand how CodeDeploy coordinates deployments across various compute environments. CodeDeploy is not a push-based deployment tool like Ansible or Capistrano; instead, for EC2 and on-premises servers, it operates on a secure, pull-based architectural model managed by an agent.
Core Architectural Components
CodeDeploy relies on several logical entities that you must configure:
- Application: A logical container that ensures your deployment components (such as deployment groups and revisions) are uniquely identified. It points to the specific platform (Server, ECS, or Lambda) you are targeting.
- Deployment Configuration: A set of deployment rules and success/failure criteria. AWS provides predefined configurations (such as CodeDeployDefault.OneAtATime, CodeDeployDefault.HalfAtATime, or CodeDeployDefault.LambdaCanary10Percent5Minutes), or you can define custom ones.
- Deployment Group: The target environment configuration. This defines where the application will be deployed (e.g., specific EC2 instances targeted by tags, an Auto Scaling Group, an ECS Service, or a Lambda function) and how the deployment should behave (IAM service roles, load balancers, and rollback triggers).
- Revision: The specific version of your application code, files, container images, or serverless functions along with the appspec.yml configuration file. Revisions are typically stored as a .zip, .tar, or .tar.gz archive in Amazon S3, or pulled from a GitHub repository.
- Deployment: The execution of a deployment process, applying a specific Revision to a target Deployment Group using a designated Deployment Configuration.
The CodeDeploy Agent: Internal Pull Mechanism
For EC2 and on-premises deployments, the CodeDeploy Agent must be installed and running on the target operating system. The agent operates using a secure polling model:
- The agent initiates an outbound HTTPS connection (Port 443) to the AWS CodeDeploy service endpoint. It uses long polling to check for scheduled deployment commands.
- When a deployment is triggered, the agent receives a command payload containing instructions and the location of the deployment artifact (typically an S3 bucket).
- The agent downloads the application revision from S3, decrypts it (using AWS KMS if configured), and extracts it to a local workspace.
- The agent parses the appspec.yml file located at the root of the revision and executes the defined lifecycle scripts in the exact sequence specified by the deployment engine.
- After each lifecycle hook finishes, the agent reports the status (Success or Failure) back to the CodeDeploy service. If a hook fails, the agent stops execution and reports a deployment failure.
+------------------------------------------------------------------------------------------------+
| AWS Cloud Platform |
| |
| +--------------------+ +------------------------+ +---------------------+ |
| | | | | | | |
| | AWS CodePipeline | ----------> | AWS CodeDeploy | <----> | Amazon S3 Bucket | |
| | | | Service Control Plane | | (App Revisions) | |
| +--------------------+ +------------------------+ +---------------------+ |
| ^ ^ |
+-------------------------------------------------|--------------------------------|-------------+
| |
| HTTPS Polling | HTTPS Download
| (Port 443) | (Port 443)
v |
+----------------------------------------------------------------------------------|-------------+
| Target Compute Environment (EC2 / On-Premises Instance) | |
| | |
| +-------------------------------------------------------------------------------+----------+ |
| | CodeDeploy Host Agent Daemon | |
| | | |
| | 1. Polls CodeDeploy Service ---> 2. Receives Command ---> 3. Downloads S3 Revision | |
| | 4. Decrypts Artifacts ---> 5. Executes Hooks ---> 6. Reports Status / Logs | |
| +------------------------------------------------------------------------------------------+ |
| | |
| +---------------------------------------+---------------------------------------+ |
| | | | |
| v v v |
| [BeforeInstall] [ApplicationStart] [ValidateService] |
| (Stop services, clear cache) (Launch application) (Run integration tests) |
+------------------------------------------------------------------------------------------------+
ECS & Lambda Deployments (Agentless Architecture)
Unlike EC2, deployments to Amazon ECS and AWS Lambda are agentless. CodeDeploy communicates directly with the ECS and Lambda service control planes. It manages traffic shifting by updating AWS Lambda function aliases or, for ECS, by repointing the load balancer listeners in front of the service from one target group to the other. CodeDeploy executes validation hooks via AWS Lambda functions that run during the deployment lifecycle to confirm the health of the new version before routing production traffic to it.
Deep Dive: The AppSpec File Specification
The appspec.yml (Application Specification) file is the core configuration manifest for AWS CodeDeploy. It defines the files that should be copied, the permissions that should be applied, and the lifecycle event hooks to execute during the deployment. It must be located in the root directory of your application revision archive.
1. EC2 / On-Premises AppSpec Structure
The EC2 AppSpec file is divided into three primary sections: files, permissions, and hooks. The following example shows a production-grade appspec.yml for an Apache/PHP web application running on Amazon Linux 2:
version: 0.0
os: linux
files:
  - source: /src
    destination: /var/www/html/app
  # destination is a directory; httpd.conf is copied into it
  - source: /config/httpd.conf
    destination: /etc/httpd/conf
permissions:
  - object: /var/www/html/app
    pattern: "**"
    owner: apache
    group: apache
    mode: 644
    type:
      - file
  - object: /var/www/html/app/bin
    pattern: "*"
    owner: apache
    group: apache
    mode: 755
    type:
      - file
hooks:
  ApplicationStop:
    - location: scripts/stop_server.sh
      timeout: 120
      runas: root
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/configure_application.sh
      timeout: 180
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 120
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh
      timeout: 180
      runas: apache
2. ECS AppSpec Structure
For Amazon ECS, the appspec.yml file is written in YAML or JSON and defines the task definition to use, the container port to route traffic to, and the optional validation Lambda functions to run during lifecycle hooks:
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/my-ecs-app:5"
        LoadBalancerInfo:
          ContainerName: "web-container"
          ContainerPort: 8080
Hooks:
  - BeforeInstall: "arn:aws:lambda:us-east-1:123456789012:function:ValidateBeforeInstallHook"
  - AfterAllowTestTraffic: "arn:aws:lambda:us-east-1:123456789012:function:RunIntegrationTests"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:PostDeploymentSanityCheck"
3. Lambda AppSpec Structure
For serverless deployments, the AppSpec file is lightweight. It defines the name of the function, the alias to update, and the target version to route traffic to:
version: 0.0
Resources:
  - MyLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: "payment-processing-service"
        Alias: "production"
        CurrentVersion: "2"
        TargetVersion: "3"
# Lambda AppSpec hooks are named BeforeAllowTraffic and AfterAllowTraffic
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:ValidatePaymentGatewaySchema"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:VerifyPaymentProcessingMetrics"
Understanding the Deployment Lifecycle Hook Sequence
The sequence of lifecycle hooks is strictly enforced by the CodeDeploy engine. Understanding this order is critical for troubleshooting hanging or failing deployments.
| Hook Name | Execution Context | Typical Use Case |
|---|---|---|
| ApplicationStop | EC2/On-Premises | Gracefully shut down the currently running application, stop systemd services, or deregister from local load balancers. Runs the script from the *previously* deployed revision, so it is skipped on the first deployment to an instance. |
| BeforeInstall | EC2/On-Premises | Pre-deployment tasks. Decrypt secrets, clear disk caches, install system packages (e.g., yum install -y nginx), or pre-create application directories. |
| AfterInstall | EC2/On-Premises | Post-extraction configuration. Overwrite configuration templates with environment variables, run build steps, or configure log rotation. |
| ApplicationStart | EC2/On-Premises | Start application processes or services (e.g., systemctl start httpd or pm2 start server.js). |
| ValidateService | EC2/On-Premises | Verify the deployment succeeded. Run curl commands against localhost health check endpoints, verify database connections, or perform light integration testing. |
| BeforeInstall (ECS) / BeforeAllowTraffic (Lambda) | ECS / Lambda | Invoke validation Lambda functions before any production traffic is shifted to the new version (e.g., verify database schema migrations). |
| AfterAllowTestTraffic | ECS | Runs after traffic is routed to the ECS test listener. Ideal for running end-to-end integration tests against the new container tasks before production cutover. |
| AfterAllowTraffic | ECS / Lambda | Runs after all traffic has shifted to the new deployment. Used to trigger cleanups, notify chat channels, or run post-deployment smoke tests. |
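To make the ordering and fail-fast behavior concrete, here is a minimal Bash simulation of the EC2/on-premises rows of the table. It is only a sketch: the function name run_hooks and the HOOK_ORDER array are hypothetical, no real scripts are executed, and agent-reserved events such as DownloadBundle and Install are left out.

```shell
#!/usr/bin/env bash
# Hypothetical simulation: walk the EC2/on-premises hooks in order
# and stop at the first failure, as the agent does.

# Scriptable hook order for an in-place deployment, as listed in the table.
HOOK_ORDER=(ApplicationStop BeforeInstall AfterInstall ApplicationStart ValidateService)

# run_hooks [FAILING_HOOK]
# Prints one status line per hook. The hook named FAILING_HOOK (if any)
# is simulated as failing. Returns 0 when every hook succeeds and 1 as
# soon as one fails; later hooks are never reached.
run_hooks() {
  local failing_hook="${1:-}"
  local hook
  for hook in "${HOOK_ORDER[@]}"; do
    if [[ "$hook" == "$failing_hook" ]]; then
      echo "${hook}: Failed"
      return 1
    fi
    echo "${hook}: Succeeded"
  done
  return 0
}
```

Calling run_hooks AfterInstall prints three lines and returns 1. ApplicationStart and ValidateService never run, which is why a failing early hook can look like a deployment that "skipped" the later ones.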
Deployment Strategies & Traffic-Shifting Configurations
Choosing the right deployment strategy is essential for balancing deployment speed against application availability and business risk. CodeDeploy supports multiple deployment configurations, categorized into In-Place and Blue/Green deployments.
1. In-Place Deployments
In an In-Place deployment, the application on each instance in the deployment group is stopped, the new revision is installed, and the new version of the application is started and validated. This strategy is best for non-critical workloads or environments where compute costs must be kept to a minimum, as it does not require provisioning new hardware.
To prevent downtime during In-Place deployments, you must configure a deployment configuration that updates instances progressively:
- OneAtATime (CodeDeployDefault.OneAtATime): Deploys to only one instance at a time. If you have 10 instances, 9 remain online to handle traffic. This is the safest but slowest In-Place option.
- HalfAtATime (CodeDeployDefault.HalfAtATime): Deploys to up to half of the instances simultaneously. This reduces deployment time but temporarily cuts your serving capacity by 50%.
- AllAtOnce (CodeDeployDefault.AllAtOnce): Deploys to all instances at the same time. This results in complete downtime during the deployment but is useful for rapid updates in development or staging environments.
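The capacity trade-off behind these configurations is simple arithmetic, sketched below in Bash. The function names are hypothetical, and the round-up rule for percentages reflects our understanding of how CodeDeploy treats a FLEET_PERCENT minimum; confirm it against the current AWS documentation before relying on it for capacity planning.

```shell
#!/usr/bin/env bash
# min_healthy_hosts FLEET_SIZE PERCENT
# Smallest number of instances that must stay healthy.
# Assumption: fractional instances are rounded up.
min_healthy_hosts() {
  local fleet="$1" percent="$2"
  echo $(( (fleet * percent + 99) / 100 ))
}

# max_batch_size FLEET_SIZE PERCENT
# Largest number of instances that can be out of service at once.
max_batch_size() {
  local fleet="$1" percent="$2"
  echo $(( fleet - $(min_healthy_hosts "$fleet" "$percent") ))
}

min_healthy_hosts 10 50   # 50% minimum healthy on 10 instances
max_batch_size 10 75      # 75% minimum healthy on 10 instances
```

Run as-is, the two calls print 5 and 2: a 50% rule keeps 5 of 10 instances in service, while a 75% rule lets only 2 instances be updated at a time.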
2. Blue/Green Deployments
Blue/Green deployments mitigate deployment risk by provisioning a completely new set of instances (the "Green" environment) alongside the existing running instances (the "Blue" environment). The new revision is deployed to the Green environment. Once validated, traffic is shifted from Blue to Green, either instantly or gradually.
A. Blue/Green with Auto Scaling Groups (EC2)
When integrated with an Auto Scaling Group (ASG), CodeDeploy automates the Blue/Green lifecycle:
- CodeDeploy copies the configuration of the existing ASG (Blue) and creates a new ASG (Green) from it.
- It provisions new EC2 instances inside the Green ASG.
- The CodeDeploy agent installs the new application revision on the Green instances and executes the lifecycle hooks.
- Once the Green instances pass their ValidateService hooks and ALB target group health checks, CodeDeploy begins shifting traffic.
- Traffic is shifted by associating the Green ASG with the Application Load Balancer target group and disassociating the Blue ASG.
- By default, CodeDeploy keeps the Blue instances running for a configurable "termination wait time" (e.g., 1 hour). This allows you to roll back instantly to the Blue environment if errors are discovered post-deployment. If no errors occur, the Blue ASG and its instances are terminated automatically.
[Production Traffic]
|
v
+------------------+
| Application |
| Load Balancer |
+------------------+
/ \
/ (0% Traffic) \ (100% Traffic)
v v
+--------------+ +--------------+
| Blue ASG | | Green ASG |
| (Old Version)| | (New Version)|
| [Instance] | | [Instance] |
| [Instance] | | [Instance] |
+--------------+ +--------------+
(Kept online for (Serving live
rollback window) production)
B. ECS Blue/Green Deployments
For Amazon ECS, Blue/Green deployments rely on an Application Load Balancer with two listeners (Production and Test) and two target groups (Target Group 1 and Target Group 2). CodeDeploy shifts traffic between the target groups using one of three deployment configuration types:
- Canary: Shifts a specified percentage of traffic to the new task definition for a set period, then shifts the remaining traffic. Example: CodeDeployDefault.ECSCanary10Percent5Minutes.
- Linear: Shifts traffic in equal increments over equal time intervals. Examples: CodeDeployDefault.ECSLinear10PercentEvery1Minutes or CodeDeployDefault.ECSLinear10PercentEvery3Minutes.
- AllAtOnce: Immediately shifts 100% of traffic from the old target group to the new target group (CodeDeployDefault.ECSAllAtOnce).
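The difference between Canary and Linear is easiest to see as a timeline. The Bash sketch below prints the cumulative share of traffic on the new version at each shift. The function names are hypothetical, and we assume the first increment happens at minute 0 of the traffic-shifting phase.

```shell
#!/usr/bin/env bash
# linear_schedule STEP_PERCENT INTERVAL_MINUTES
# Prints "<minute> <cumulative percent on new version>" for each shift.
linear_schedule() {
  local step="$1" interval="$2"
  local minute=0 shifted="$1"
  if (( step <= 0 )); then
    echo "STEP_PERCENT must be positive" >&2
    return 1
  fi
  while true; do
    if (( shifted > 100 )); then
      shifted=100   # never report more than all traffic
    fi
    echo "${minute} ${shifted}"
    if (( shifted >= 100 )); then
      break
    fi
    shifted=$(( shifted + step ))
    minute=$(( minute + interval ))
  done
}

# canary_schedule FIRST_PERCENT WAIT_MINUTES
# Two shifts only: the canary slice, then everything else.
canary_schedule() {
  local first="$1" wait_minutes="$2"
  echo "0 ${first}"
  echo "${wait_minutes} 100"
}
```

Under these assumptions, linear_schedule 10 3 emits ten lines ending in "27 100", while canary_schedule 10 5 emits just "0 10" and "5 100". Linear gives alarms more checkpoints; Canary finishes faster once the initial slice looks healthy.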
Rollback Strategies
A deployment system is only as good as its recovery mechanisms. CodeDeploy provides automated rollbacks based on two primary triggers:
- Deployment Failure: If any host-level script fails, or if a validation hook returns a non-zero exit code, CodeDeploy stops the deployment. When automatic rollback is enabled on the deployment group, it then redeploys the last known good revision as a new deployment.
- CloudWatch Alarms: You can associate CloudWatch Alarms (such as HTTP 5XX error rates, application latency, or system CPU utilization) with a Deployment Group. If any of these alarms trigger during the deployment or during a post-deployment monitoring window, CodeDeploy stops the deployment and rolls back to the previous stable state.
Step-by-Step Production Implementation Guide (EC2 & Auto Scaling)
Let's build a production-grade, automated deployment pipeline for an Auto Scaling Group of EC2 instances running an Apache/PHP web application. We will configure IAM roles, install the host agent, write the deployment hooks, and provision the entire infrastructure using Terraform.
Step 1: Define IAM Roles and Policies
We need two distinct IAM roles: the CodeDeploy Service Role (which allows the CodeDeploy service to interact with ASGs, EC2, and ALBs) and the EC2 Instance Profile Role (which allows the EC2 instances to pull artifacts from S3 and communicate with the CodeDeploy service).
CodeDeploy Service Role Trust Policy (codedeploy-trust.json)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "codedeploy.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EC2 Instance Profile Trust Policy (ec2-trust.json)
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Step 2: Install and Configure the CodeDeploy Agent via User Data
To automate the installation of the CodeDeploy agent on our EC2 instances, we will use an EC2 Launch Template with a User Data script. This script installs the agent, configures it, and ensures it starts automatically on boot:
#!/bin/bash
# Production CodeDeploy Agent Bootstrap Script
set -euo pipefail
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
echo "Updating system packages..."
yum update -y
echo "Installing dependencies..."
yum install -y ruby wget httpd
echo "Fetching CodeDeploy Agent installer for region: us-east-1..."
cd /home/ec2-user
wget https://aws-codedeploy-us-east-1.s3.us-east-1.amazonaws.com/latest/install
echo "Setting executable permissions on installer..."
chmod +x ./install
echo "Executing CodeDeploy Agent installation..."
if ./install auto; then
echo "Installation successful."
else
echo "Installation failed. Retrying..."
sleep 10
./install auto
fi
echo "Verifying agent service status..."
systemctl enable codedeploy-agent
systemctl start codedeploy-agent
systemctl status codedeploy-agent
echo "Starting and enabling Apache Web Server..."
systemctl enable httpd
systemctl start httpd
Step 3: Write Production-Grade Lifecycle Hook Scripts
Next, we need to create the lifecycle hook scripts referenced in our appspec.yml file. These scripts must be robust, return correct exit codes (0 for success, non-zero for failure), and handle errors gracefully.
Script 1: Stop Server (scripts/stop_server.sh)
#!/bin/bash
# Gracefully stop Apache to prepare for the new deployment
set -e
echo "Running ApplicationStop hook..."
if systemctl is-active --quiet httpd; then
echo "Stopping Apache Web Server..."
systemctl stop httpd
else
echo "Apache Web Server is already stopped."
fi
exit 0
Script 2: Install Dependencies (scripts/install_dependencies.sh)
#!/bin/bash
# Ensure required runtime packages are installed
set -eo pipefail
echo "Running BeforeInstall hook..."
echo "Verifying PHP and extensions are installed..."
if ! command -v php &> /dev/null; then
echo "PHP not found. Installing PHP..."
yum install -y php php-cli php-common php-opcache php-mbstring php-xml php-gd
else
echo "PHP is already installed: $(php -v | head -n 1)"
fi
# Ensure target directories exist and have correct permissions
mkdir -p /var/www/html/app
chown -R apache:apache /var/www/html/app
exit 0
Script 3: Configure Application (scripts/configure_application.sh)
#!/bin/bash
# Configure application parameters and inject environment variables
set -eo pipefail
echo "Running AfterInstall hook..."
# Fetch configuration secrets from AWS Systems Manager Parameter Store
echo "Retrieving database credentials from SSM Parameter Store..."
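AWS_REGION="us-east-1" # Set explicitly in case the instance has no default CLI region configured
DB_HOST=$(aws ssm get-parameter --region "$AWS_REGION" --name "/prod/db/endpoint" --query "Parameter.Value" --output text)
DB_USER=$(aws ssm get-parameter --region "$AWS_REGION" --name "/prod/db/username" --query "Parameter.Value" --output text)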
# Inject parameters into application config file
cat <<EOF > /var/www/html/app/config.php
<?php
define('DB_HOST', '${DB_HOST}');
define('DB_USER', '${DB_USER}');
define('DEPLOY_TIME', '$(date -u +"%Y-%m-%dT%H:%M:%SZ")');
?>
EOF
echo "Application configurations generated successfully."
chown apache:apache /var/www/html/app/config.php
chmod 600 /var/www/html/app/config.php
exit 0
Script 4: Start Server (scripts/start_server.sh)
#!/bin/bash
# Start Apache Web Server
set -e
echo "Running ApplicationStart hook..."
echo "Starting Apache Web Server..."
systemctl start httpd
exit 0
Script 5: Validate Service (scripts/validate_service.sh)
The ValidateService hook is the most important hook in the deployment. It verifies that the application is running correctly and is ready to accept production traffic. We use a retry loop to poll our health check endpoint, allowing the application a short grace period to start up.
#!/bin/bash
# Robust, self-healing service validation script
set -eo pipefail
HEALTH_CHECK_URL="http://localhost/app/health.php"
MAX_ATTEMPTS=6
DELAY_SECONDS=10
echo "Running ValidateService hook..."
echo "Polling health check endpoint: ${HEALTH_CHECK_URL}"
for ((attempt=1; attempt<=MAX_ATTEMPTS; attempt++)); do
echo "Attempt ${attempt} of ${MAX_ATTEMPTS}..."
# Use curl with a short timeout to check the endpoint
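# --max-time bounds the whole request; a string comparison avoids errors if curl prints nothing
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "${HEALTH_CHECK_URL}" || true)
if [ "$HTTP_STATUS" = "200" ]; then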
echo "Health check passed with HTTP status 200."
exit 0
else
echo "Health check failed with HTTP status: ${HTTP_STATUS}. Retrying in ${DELAY_SECONDS} seconds..."
sleep $DELAY_SECONDS
fi
done
echo "Error: Service validation failed after ${MAX_ATTEMPTS} attempts. Aborting deployment."
exit 1
Step 4: Provision CodeDeploy Resources via Terraform
Now, let's write a complete, production-ready HashiCorp Terraform configuration to provision the CodeDeploy Application, Deployment Group, custom Deployment Configuration, and all required IAM policies and roles.
# Terraform Configuration for AWS CodeDeploy Enterprise Deployment
terraform {
required_version = ">= 1.3.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "us-east-1"
}
# 1. IAM Service Role for CodeDeploy
resource "aws_iam_role" "codedeploy_service_role" {
name = "codedeploy-service-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "codedeploy.amazonaws.com"
}
}
]
})
}
# Attach AWS Managed CodeDeploy policy
resource "aws_iam_role_policy_attachment" "codedeploy_role_attach" {
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole"
role = aws_iam_role.codedeploy_service_role.name
}
# 2. IAM Role and Instance Profile for EC2 Instances
resource "aws_iam_role" "ec2_instance_role" {
name = "ec2-instance-deployment-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
}
# IAM Policy to allow EC2 to read from deployment S3 bucket and Parameter Store
resource "aws_iam_policy" "ec2_deployment_policy" {
name = "ec2-deployment-permissions"
description = "Allows instances to pull deployment artifacts and access configuration parameters"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:ListBucket"
]
Resource = [
"arn:aws:s3:::enterprise-deployment-artifacts-prod/*",
"arn:aws:s3:::enterprise-deployment-artifacts-prod"
]
},
{
Effect = "Allow"
Action = [
"ssm:GetParameter",
"ssm:GetParameters"
]
Resource = "arn:aws:ssm:us-east-1:*:parameter/prod/*"
}
]
})
}
resource "aws_iam_role_policy_attachment" "ec2_policy_attach" {
policy_arn = aws_iam_policy.ec2_deployment_policy.arn
role = aws_iam_role.ec2_instance_role.name
}
# Attach SSM Core policy for Session Manager access (security best practice)
resource "aws_iam_role_policy_attachment" "ec2_ssm_attach" {
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
role = aws_iam_role.ec2_instance_role.name
}
resource "aws_iam_instance_profile" "ec2_instance_profile" {
name = "ec2-instance-deployment-profile"
role = aws_iam_role.ec2_instance_role.name
}
# 3. CodeDeploy Application
resource "aws_codedeploy_app" "web_app" {
compute_platform = "Server"
name = "enterprise-web-application"
}
# 4. Custom Deployment Configuration
resource "aws_codedeploy_deployment_config" "custom_rolling_config" {
deployment_config_name = "Custom.RollingTwoAtATime"
compute_platform = "Server"
minimum_healthy_hosts {
type = "FLEET_PERCENT"
value = 75
}
}
# 5. CodeDeploy Deployment Group
resource "aws_codedeploy_deployment_group" "web_app_group" {
app_name = aws_codedeploy_app.web_app.name
deployment_group_name = "production-fleet"
service_role_arn = aws_iam_role.codedeploy_service_role.arn
deployment_config_name = aws_codedeploy_deployment_config.custom_rolling_config.deployment_config_name
# Target instances associated with specific Auto Scaling Groups
autoscaling_groups = ["production-web-asg"]
# Configure automatic rollback on deployment failures or alarms
auto_rollback_configuration {
enabled = true
events = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
}
# Enable deployment status notifications via SNS
trigger_configuration {
trigger_events = ["DeploymentFailure", "DeploymentSuccess", "DeploymentRollback"]
trigger_name = "deployment-notifications-trigger"
trigger_target_arn = "arn:aws:sns:us-east-1:123456789012:deployment-alerts-topic"
}
# Stop the deployment if this CloudWatch alarm enters the ALARM state
alarm_configuration {
alarms = ["web-application-error-rate-high"]
enabled = true
ignore_poll_alarm_failure = false
}
}
Step-by-Step Production Implementation Guide (ECS Blue/Green)
Deploying microservices to Amazon ECS using Blue/Green strategies requires configuring an Application Load Balancer (ALB) to safely shift traffic between two Target Groups. Let's look at how to structure an ECS Blue/Green deployment using CodeDeploy.
The ECS Blue/Green Network Topology
An ECS Blue/Green deployment uses the following architecture:
- Production Listener (Port 443): Routes live customer traffic to the active Target Group (e.g., Target Group 1 / Blue).
- Test Listener (Port 8443): Routes test traffic to the replacement Target Group (e.g., Target Group 2 / Green) before the final cutover.
- Target Group 1 (Blue): Hosts the tasks running the currently active version of the container.
- Target Group 2 (Green): Hosts the tasks running the new version of the container during a deployment.
[Client Requests]
/ \
(Port 443) / \ (Port 8443)
Prod Listener/ \Test Listener
v v
+------------+ +------------+
| Target Grp | | Target Grp |
| 1 (Blue) | | 2 (Green) |
+------------+ +------------+
| |
v v
+------------+ +------------+
| ECS Tasks | | ECS Tasks |
| (v1.0.0) | | (v1.1.0) |
+------------+ +------------+
Initial State:
- Production Listener -> Target Group 1 (Blue)
- Test Listener -> Target Group 2 (Green)
After Validation:
- Production Listener -> Target Group 2 (Green)
- Blue Tasks retained for rollback window
ECS CodeDeploy Deployment Group Configuration (Terraform)
The following Terraform configuration creates an ECS deployment group configured for Blue/Green deployments with traffic shifting managed by CodeDeploy.
resource "aws_codedeploy_app" "ecs_app" {
name = "payment-processing-service"
compute_platform = "ECS"
}
resource "aws_codedeploy_deployment_group" "ecs_blue_green" {
app_name = aws_codedeploy_app.ecs_app.name
deployment_group_name = "production-ecs-bluegreen"
service_role_arn = aws_iam_role.codedeploy_service_role.arn
deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"
ecs_service {
cluster_name = aws_ecs_cluster.production_cluster.name
service_name = aws_ecs_service.payment_service.name
}
deployment_style {
deployment_type = "BLUE_GREEN"
deployment_option = "WITH_TRAFFIC_CONTROL"
}
blue_green_deployment_config {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
}
terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 60
}
}
load_balancer_info {
target_group_pair_info {
prod_traffic_route {
listener_arns = [
aws_lb_listener.production.arn
]
}
test_traffic_route {
listener_arns = [
aws_lb_listener.test.arn
]
}
target_group {
name = aws_lb_target_group.blue.name
}
target_group {
name = aws_lb_target_group.green.name
}
}
}
auto_rollback_configuration {
enabled = true
events = [
"DEPLOYMENT_FAILURE",
"DEPLOYMENT_STOP_ON_ALARM"
]
}
}
ECS Deployment Lifecycle Flow
- A new task definition revision is registered.
- CodePipeline triggers a CodeDeploy deployment.
- CodeDeploy launches replacement tasks in the Green target group.
- ALB health checks validate Green tasks.
- Traffic is routed to the Test Listener.
- Integration tests execute through AfterAllowTestTraffic hooks.
- Traffic shifts gradually according to the deployment configuration.
- CloudWatch alarms continuously monitor application health.
- If alarms remain healthy, Green becomes production.
- Blue tasks are retained for the configured rollback period.
Database Migration Strategies for Blue/Green Deployments
One of the most common causes of deployment failures is database schema incompatibility. Applications often deploy faster than database changes can be safely rolled back. Enterprise teams therefore use the Expand/Contract Pattern.
Phase 1: Expand
- Add new tables or columns.
- Do not remove existing structures.
- Ensure old and new application versions continue to function.
ALTER TABLE customers
ADD COLUMN preferred_language VARCHAR(10) NULL;
Phase 2: Application Deployment
- Deploy new application version.
- Write to both old and new schema structures if necessary.
- Monitor production traffic.
Phase 3: Contract
- Remove deprecated columns only after all environments run the new code.
- Execute cleanup migration in a later deployment cycle.
ALTER TABLE customers
DROP COLUMN legacy_language_code;
This strategy enables safe rollback because previous application versions remain compatible with the expanded schema.
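The rule in Phase 3 can be enforced with a small guard in the pipeline before the cleanup migration runs. The Bash sketch below is one hypothetical way to express it: safe_to_contract is an invented name, and how you collect the version reported by each environment is left to your own tooling. Strict equality is a simplification; a real check might accept any version at or above the required one.

```shell
#!/usr/bin/env bash
# safe_to_contract REQUIRED_VERSION VERSION...
# Returns 0 only when every reported version equals REQUIRED_VERSION.
safe_to_contract() {
  local required="$1"
  shift
  # With no reported versions there is nothing to prove, so refuse.
  if (( $# == 0 )); then
    echo "Blocking contract migration: no versions reported" >&2
    return 1
  fi
  local version
  for version in "$@"; do
    if [[ "$version" != "$required" ]]; then
      echo "Blocking contract migration: found ${version}, need ${required}" >&2
      return 1
    fi
  done
  return 0
}

# Example: one environment still runs the old code, so the guard refuses.
if safe_to_contract "v1.1.0" "v1.1.0" "v1.0.0"; then
  echo "OK to drop legacy_language_code"
else
  echo "Keep legacy_language_code for now"
fi
```

Placing a guard like this in front of the DROP COLUMN migration keeps the contract step from running while a rollback to the old version is still possible.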
Monitoring and Observability
Enterprise deployments require complete visibility into deployment health.
Critical CloudWatch Metrics
| Metric | Purpose |
|---|---|
| HTTPCode_Target_5XX_Count | Application failures |
| TargetResponseTime | Latency degradation detection |
| HealthyHostCount | Target health verification |
| CPUUtilization | Infrastructure stress monitoring |
| MemoryUtilization | ECS task resource consumption |
Recommended CloudWatch Alarm
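resource "aws_cloudwatch_metric_alarm" "high_5xx_rate" {
alarm_name = "production-http-5xx-errors"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "HTTPCode_Target_5XX_Count"
namespace = "AWS/ApplicationELB"
period = 60
statistic = "Sum"
threshold = 20
# No datapoints are published when there are no 5XX responses, so treat gaps as healthy
treat_missing_data = "notBreaching"
alarm_description = "Signals CodeDeploy to roll back when target 5XX errors spike"
# Assumption: aws_lb.production is a placeholder name for the ALB in front of the service
dimensions = {
LoadBalancer = aws_lb.production.arn_suffix
}
}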
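To build intuition for how evaluation_periods and threshold interact, the following Bash sketch mimics the alarm's core rule: it fires only when the most recent N one-minute sums all exceed the threshold. This is a deliberate simplification of CloudWatch's evaluation (it ignores missing-data handling and "M out of N" settings), and alarm_breached is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# alarm_breached THRESHOLD PERIODS DATAPOINT...
# Returns 0 when the most recent PERIODS datapoints are all strictly
# greater than THRESHOLD (mirrors GreaterThanThreshold), otherwise 1.
# Datapoints are integer sums, oldest first.
alarm_breached() {
  local threshold="$1" periods="$2"
  shift 2
  local -a points=("$@")
  local count="${#points[@]}"
  if (( count < periods )); then
    return 1   # not enough data to evaluate
  fi
  local i
  for (( i = count - periods; i < count; i++ )); do
    if (( points[i] <= threshold )); then
      return 1
    fi
  done
  return 0
}

# With threshold=20 and evaluation_periods=2:
if alarm_breached 20 2 3 25 31; then
  echo "ALARM: two consecutive minutes above 20 errors"
fi
```

A single bad minute (for example 3, 25, 4) does not trip the alarm, which is the point of using two evaluation periods: one transient spike during traffic shifting will not trigger a rollback on its own.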
Security Hardening Best Practices
- Use IAM least-privilege policies.
- Store secrets in AWS Secrets Manager or Parameter Store.
- Encrypt deployment artifacts using AWS KMS.
- Enable CloudTrail logging for deployment auditing.
- Use Session Manager instead of SSH access.
- Enable VPC endpoints for S3, CodeDeploy, and Systems Manager.
- Restrict deployment initiation permissions to CI/CD roles.
- Implement approval gates for production deployments.
Advanced Troubleshooting Guide
CodeDeploy Agent Not Reporting
sudo systemctl status codedeploy-agent
sudo journalctl -u codedeploy-agent
Review Agent Logs
tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
Review Deployment Script Logs
ls -ltr /opt/codedeploy-agent/deployment-root/
tail -n 100 /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log
Verify Deployment Status
aws deploy get-deployment \
--deployment-id d-ABCDEFGHIJK
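When scripting around this command, it helps to reduce the status string to a simple outcome. The sketch below is a hypothetical helper (classify_status is our own name) that maps CodeDeploy status values to success, failure, or pending. The commented wiring assumes working AWS credentials and reuses the placeholder deployment ID from above.

```shell
#!/usr/bin/env bash
# classify_status STATUS
# Prints "success", "failure", or "pending" for a CodeDeploy
# deployment status string such as Succeeded, Failed, or InProgress.
classify_status() {
  case "$1" in
    Succeeded)       echo "success" ;;
    Failed|Stopped)  echo "failure" ;;
    *)               echo "pending" ;;   # Created, Queued, InProgress, Baking, Ready
  esac
}

# Example wiring (commented out: needs AWS credentials and a real deployment ID):
# status="$(aws deploy get-deployment --deployment-id d-ABCDEFGHIJK \
#   --query 'deploymentInfo.status' --output text)"
# classify_status "$status"
```

A CI job can loop on the "pending" result with a sleep between polls and exit non-zero on "failure", which keeps the pipeline stage in step with the deployment's real outcome.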
Conclusion
AWS CodeDeploy is significantly more than a deployment utility. It is a deployment orchestration platform that enables highly reliable software delivery across EC2, ECS, Lambda, and hybrid environments. By combining Blue/Green deployment patterns, automated rollback mechanisms, CloudWatch-driven observability, and Infrastructure as Code, organizations can achieve near-zero-downtime releases while dramatically reducing operational risk.
Mastering CodeDeploy allows DevOps teams to move from fragile deployment processes to fully automated, auditable, scalable, and enterprise-grade continuous delivery pipelines.