AWS DevOps Masterclass: Automating Deployments with AWS CodeDeploy

In modern enterprise cloud architectures, deploying applications reliably, securely, and with zero downtime is a fundamental operational requirement. Manual deployments or custom, fragile shell scripts are major sources of production outages. AWS CodeDeploy solves these challenges by providing a fully managed, highly flexible deployment service that automates application deployments to a variety of compute services, including Amazon EC2, AWS Fargate, AWS Lambda, and on-premises servers.

This masterclass provides an exhaustive, production-grade guide to AWS CodeDeploy. It is designed for senior DevOps engineers, enterprise architects, and systems administrators who need to design, implement, and maintain robust continuous delivery (CD) pipelines. We will explore CodeDeploy's internal mechanics, write complete configuration manifests, implement advanced deployment strategies (such as canary and blue/green deployments), establish monitoring and observability, and address complex enterprise scenarios like database migrations and multi-region rollouts.

What is AWS CodeDeploy?

AWS CodeDeploy is a fully managed deployment service that automates software deployments to Amazon EC2 instances, on-premises instances, serverless AWS Lambda functions, or Amazon ECS services. It eliminates the need for manual operations, scales dynamically with your infrastructure, prevents downtime through advanced traffic-shifting deployment strategies (such as Blue/Green, Canary, and Linear), and automatically rolls back deployments when errors or failed health checks are detected.

What You Will Learn

The internal architectural components and execution lifecycle of AWS CodeDeploy.
How to write syntactically perfect, production-grade appspec.yml files for EC2, ECS, and Lambda.
How to configure and secure the CodeDeploy Host Agent on Linux and Windows.
Advanced traffic-shifting strategies, including Blue/Green, Canary, and Linear deployments.
How to write robust, idempotent lifecycle hook scripts with error handling and retry logic.
How to provision CodeDeploy infrastructure using HashiCorp Terraform.
How to handle database migrations in a Blue/Green environment using the Expand/Contract pattern.
Enterprise observability, debugging techniques, security hardening, and troubleshooting strategies.

Prerequisites

To get the most out of this guide, you should have a solid understanding of:

Core AWS services, particularly IAM, EC2, Auto Scaling, ECS, and Lambda.
Basic continuous integration and continuous delivery (CI/CD) concepts.
Infrastructure as Code (IaC) principles, specifically using Terraform.
Linux systems administration, including shell scripting (Bash).

If you need to brush up on Infrastructure as Code, we highly recommend reviewing our previous module: Infrastructure as Code with Terraform.

CodeDeploy Core Architecture & Internal Mechanics

To build reliable deployment pipelines, you must understand how CodeDeploy coordinates deployments across various compute environments. CodeDeploy is not a push-based deployment tool like Ansible or Capistrano; instead, for EC2 and on-premises servers, it operates on a secure, pull-based architectural model managed by an agent.

Core Architectural Components

CodeDeploy relies on several logical entities that you must configure:

Application: A logical container that ensures your deployment components (such as deployment groups and revisions) are uniquely identified. It points to the specific platform (Server, ECS, or Lambda) you are targeting.
Deployment Configuration: A set of deployment rules and success/failure criteria. AWS provides pre-defined configurations (such as CodeDeployDefault.OneAtATime, CodeDeployDefault.HalfAtATime, or CodeDeployDefault.LambdaCanary10Percent5Minutes), or you can define custom ones.
Deployment Group: The target environment configuration. This defines where the application will be deployed (e.g., specific EC2 instances targeted by tags, an Auto Scaling Group, an ECS Service, or a Lambda function) and how the deployment should behave (IAM service roles, load balancers, and rollback triggers).
Revision: The specific version of your application code, files, container images, or serverless functions along with the appspec.yml configuration file. Revisions are typically stored as a .zip, .tar, or .tar.gz archive in Amazon S3, or pulled from a GitHub repository.
Deployment: The execution of a deployment process, applying a specific Revision to a target Deployment Group using a designated Deployment Configuration.

The CodeDeploy Agent: Internal Pull Mechanism

For EC2 and on-premises deployments, the CodeDeploy Agent must be installed and running on the target operating system. The agent operates using a secure polling model:

The agent initiates an outbound HTTPS connection (Port 443) to the AWS CodeDeploy service endpoint. It uses long polling to check for scheduled deployment commands.
When a deployment is triggered, the agent receives a command payload containing instructions and the location of the deployment artifact (typically an S3 bucket).
The agent downloads the application revision from S3, decrypts it (using AWS KMS if configured), and extracts it to a local workspace.
The agent parses the appspec.yml file located at the root of the revision and executes the defined lifecycle scripts in the exact sequence specified by the deployment engine.
After each lifecycle hook finishes, the agent reports the status (Success or Failure) back to the CodeDeploy service. If a hook fails, the agent stops execution and reports a deployment failure.

+------------------------------------------------------------------------------------------------+
|                                      AWS Cloud Platform                                        |
|                                                                                                |
|  +--------------------+             +------------------------+        +---------------------+  |
|  |                    |             |                        |        |                     |  |
|  |  AWS CodePipeline  | ----------> |  AWS CodeDeploy        | <----> |  Amazon S3 Bucket   |  |
|  |                    |             |  Service Control Plane |        |  (App Revisions)    |  |
|  +--------------------+             +------------------------+        +---------------------+  |
|                                                 ^                                ^             |
+-------------------------------------------------|--------------------------------|-------------+
                                                  |                                |
                                                  | HTTPS Polling                  | HTTPS Download
                                                  | (Port 443)                     | (Port 443)
                                                  v                                |
+----------------------------------------------------------------------------------|-------------+
|  Target Compute Environment (EC2 / On-Premises Instance)                         |             |
|                                                                                  |             |
|  +-------------------------------------------------------------------------------+----------+  |
|  |  CodeDeploy Host Agent Daemon                                                            |  |
|  |                                                                                          |  |
|  |  1. Polls CodeDeploy Service  --->  2. Receives Command ---> 3. Downloads S3 Revision    |  |
|  |  4. Decrypts Artifacts        --->  5. Executes Hooks  ---> 6. Reports Status / Logs     |  |
|  +------------------------------------------------------------------------------------------+  |
|                                                 |                                              |
|         +---------------------------------------+---------------------------------------+      |
|         |                                       |                                       |      |
|         v                                       v                                       v      |
|  [BeforeInstall]                       [ApplicationStart]                       [ValidateService]      |
|  (Stop services, clear cache)          (Launch application)                     (Run integration tests) |
+------------------------------------------------------------------------------------------------+
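
When a hook fails, the agent's local files are usually the quickest way to see which of the steps above went wrong. The following is a minimal sketch, assuming the default Linux install locations under /opt/codedeploy-agent and /var/log/aws/codedeploy-agent. The helper names are illustrative and are not part of the agent itself.

```shell
#!/bin/bash
# Helpers for locating CodeDeploy agent output on a Linux instance.
# Paths assume a default agent installation; adjust if yours differs.

AGENT_LOG="/var/log/aws/codedeploy-agent/codedeploy-agent.log"
DEPLOYMENT_ROOT="/opt/codedeploy-agent/deployment-root"

# Print the path of the hook script log for one deployment.
# $1 = deployment group ID, $2 = deployment ID
scripts_log_path() {
    echo "${DEPLOYMENT_ROOT}/$1/$2/logs/scripts.log"
}

# Show the agent service status plus the tail of the agent and hook logs.
# Each step is allowed to fail so the remaining output is still printed.
show_agent_diagnostics() {
    systemctl status codedeploy-agent --no-pager || true
    tail -n 50 "${AGENT_LOG}" || true
    tail -n 50 "$(scripts_log_path "$1" "$2")" || true
}
```

Inside a hook script the agent exposes DEPLOYMENT_GROUP_ID and DEPLOYMENT_ID as environment variables, so a call such as show_agent_diagnostics "$DEPLOYMENT_GROUP_ID" "$DEPLOYMENT_ID" should work without looking the values up by hand.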

ECS & Lambda Deployments (Agentless Architecture)

Unlike EC2, deployments to Amazon ECS and AWS Lambda are agentless. CodeDeploy communicates directly with the ECS and Lambda service control planes. It manages traffic shifting by updating the AWS Lambda function aliases or modifying the Amazon ECS listener rules on an Application Load Balancer (ALB). CodeDeploy executes validation hooks via AWS Lambda functions that run during the deployment lifecycle to confirm the health of the new version before routing production traffic to it.
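
A validation hook only counts as finished once it reports a result back to CodeDeploy. Hook functions are normally written in a Lambda runtime language, so the CLI equivalent below is only a sketch of that callback. The function names are hypothetical; the deployment ID and hook execution ID are assumed to come from the hook invocation event.

```shell
#!/bin/bash
# Map a validation command's exit code to the status string CodeDeploy expects.
hook_status_for() {
    if [ "$1" -eq 0 ]; then
        echo "Succeeded"
    else
        echo "Failed"
    fi
}

# Report a hook result back to CodeDeploy.
# $1 = deployment ID, $2 = lifecycle event hook execution ID,
# $3 = exit code of the validation that was run.
# Requires the AWS CLI and credentials; nothing is called until you invoke it.
report_hook_result() {
    aws deploy put-lifecycle-event-hook-execution-status \
        --deployment-id "$1" \
        --lifecycle-event-hook-execution-id "$2" \
        --status "$(hook_status_for "$3")"
}
```

If the status is never reported, the hook is eventually treated as failed when it times out, which is a common reason for ECS and Lambda deployments that appear to hang.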

Deep Dive: The AppSpec File Specification

The appspec.yml (Application Specification) file is the core configuration manifest for AWS CodeDeploy. It defines the files that should be copied, the permissions that should be applied, and the lifecycle event hooks to execute during the deployment. It must be located in the root directory of your application revision archive.

1. EC2 / On-Premises AppSpec Structure

The EC2 AppSpec file is divided into three primary sections: files, permissions, and hooks. The following example shows a production-grade appspec.yml for an Apache/PHP web application running on Amazon Linux 2:

version: 0.0
os: linux
files:
  - source: /src
    destination: /var/www/html/app
  - source: /config/httpd.conf
    destination: /etc/httpd/conf/httpd.conf
permissions:
  - object: /var/www/html/app
    pattern: "**"
    owner: apache
    group: apache
    mode: 644
    type:
      - file
  - object: /var/www/html/app/bin
    pattern: "*"
    owner: apache
    group: apache
    mode: 755
    type:
      - file
hooks:
  ApplicationStop:
    - location: scripts/stop_server.sh
      timeout: 120
      runas: root
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/configure_application.sh
      timeout: 180
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 120
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh
      timeout: 180
      runas: apache
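
Because the agent fails the deployment when a referenced script is missing, it can help to check the bundle before uploading it. This is a rough pre-flight sketch, not an official validator: it assumes hook entries are written as "- location: path" on one line, as in the example above, and that script paths contain no spaces. Both function names are made up for illustration.

```shell
#!/bin/bash
# Pre-flight check for a revision bundle before it is uploaded.

# Print every "location:" value found in an AppSpec file, one per line.
list_hook_scripts() {
    sed -n 's/^[[:space:]]*-[[:space:]]*location:[[:space:]]*//p' "$1"
}

# Return 0 only if appspec.yml sits at the bundle root and every
# referenced hook script exists. Paths with spaces are not handled.
check_bundle() {
    bundle_dir="$1"
    missing=0
    if [ ! -f "${bundle_dir}/appspec.yml" ]; then
        echo "appspec.yml not found at bundle root: ${bundle_dir}" >&2
        return 1
    fi
    for script in $(list_hook_scripts "${bundle_dir}/appspec.yml"); do
        if [ ! -f "${bundle_dir}/${script}" ]; then
            echo "missing hook script: ${script}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Running check_bundle . from the repository root in a CI step, before the archive is created, would catch a renamed or forgotten script early.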

2. ECS AppSpec Structure

For Amazon ECS, the AppSpec file (written in YAML or JSON) defines the task definition to use, the container name and port to route traffic to, and the optional validation Lambda functions to run at each lifecycle hook:

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/my-ecs-app:5"
        LoadBalancerInfo:
          ContainerName: "web-container"
          ContainerPort: 8080
Hooks:
  - BeforeInstall: "arn:aws:lambda:us-east-1:123456789012:function:ValidateBeforeInstallHook"
  - AfterAllowTestTraffic: "arn:aws:lambda:us-east-1:123456789012:function:RunIntegrationTests"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:PostDeploymentSanityCheck"

3. Lambda AppSpec Structure

For serverless deployments, the AppSpec file is lightweight. It defines the name of the function, the alias to update, and the target version to route traffic to:

version: 0.0
Resources:
  - MyLambdaFunction:
      Type: AWS::Lambda::Function
      Properties:
        Name: "payment-processing-service"
        Alias: "production"
        CurrentVersion: "2"
        TargetVersion: "3"
Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:ValidatePaymentGatewaySchema"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:VerifyPaymentProcessingMetrics"

Understanding the Deployment Lifecycle Hook Sequence

The sequence of lifecycle hooks is strictly enforced by the CodeDeploy engine. Understanding this order is critical for troubleshooting hanging or failing deployments.

Hook Name	Execution Context	Typical Use Case
ApplicationStop	EC2/On-Premises	Gracefully shut down the currently running application, stop systemd services, or deregister from local load balancers. The script that runs comes from the previous successfully deployed revision, not from the revision being installed.
BeforeInstall	EC2/On-Premises	Pre-deployment tasks. Decrypt secrets, clear disk caches, install system packages (e.g., `yum install -y nginx`), or pre-create application directories.
AfterInstall	EC2/On-Premises	Post-extraction configuration. Overwrite configuration templates with environment variables, run build steps, or configure log rotation.
ApplicationStart	EC2/On-Premises	Start application processes or services (e.g., `systemctl start httpd` or `pm2 start server.js`).
ValidateService	EC2/On-Premises	Verify the deployment succeeded. Run curl commands against localhost health check endpoints, verify database connections, or perform light integration testing.
BeforeInstall / BeforeAllowTraffic	ECS / Lambda	Invoke validation Lambda functions before any production traffic is shifted to the new version (e.g., verify database schema migrations). BeforeInstall is ECS-only; BeforeAllowTraffic applies to both ECS and Lambda. AWS SAM exposes the Lambda hook as PreTraffic.
AfterAllowTestTraffic	ECS	Runs after traffic is routed to the ECS test listener. Ideal for running end-to-end integration tests against the new container tasks before production cutover.
AfterAllowTraffic	ECS / Lambda	Runs after all traffic has shifted to the new deployment. Used to trigger cleanups, notify chat channels, or run post-deployment smoke tests. AWS SAM exposes the Lambda hook as PostTraffic.
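
When an EC2 deployment stalls, it helps to know which events had already completed. The sketch below lists the in-place hook order for a deployment group without a load balancer, including DownloadBundle and Install, which the agent runs itself and which cannot be scripted. The variable and function names are illustrative.

```shell
#!/bin/bash
# In-place EC2/on-premises hook order, without a load balancer.
# DownloadBundle and Install are run by the agent and cannot be scripted.
IN_PLACE_HOOKS="ApplicationStop DownloadBundle BeforeInstall Install AfterInstall ApplicationStart ValidateService"

# Print the hooks that run before the named hook, one per line.
# Returns 1 if the name is not part of the in-place sequence.
hooks_before() {
    for hook in ${IN_PLACE_HOOKS}; do
        if [ "${hook}" = "$1" ]; then
            return 0
        fi
        echo "${hook}"
    done
    echo "unknown hook: $1" >&2
    return 1
}
```

Hook scripts receive the current event name in the LIFECYCLE_EVENT environment variable, so hooks_before "$LIFECYCLE_EVENT" can be logged at the top of a script to record how far the deployment got.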

Deployment Strategies & Traffic-Shifting Configurations

Choosing the right deployment strategy is essential for balancing deployment speed against application availability and business risk. CodeDeploy supports multiple deployment configurations, categorized into In-Place and Blue/Green deployments.

1. In-Place Deployments

In an In-Place deployment, the application on each instance in the deployment group is stopped, the new revision is installed, and the new version of the application is started and validated. This strategy is best for non-critical workloads or environments where compute costs must be kept to a minimum, as it does not require provisioning new hardware.

To prevent downtime during In-Place deployments, you must configure a deployment configuration that updates instances progressively:

OneAtATime (CodeDeployDefault.OneAtATime): Deploys to only one instance at a time. If you have 10 instances, 9 remain online to handle traffic. This is the safest but slowest In-Place option.
HalfAtATime (CodeDeployDefault.HalfAtATime): Deploys to up to half of the instances simultaneously. This reduces deployment time but temporarily cuts your serving capacity by 50%.
AllAtOnce (CodeDeployDefault.AllAtOnce): Deploys to all instances at the same time. This results in complete downtime during the deployment but is useful for rapid updates in development or staging environments.
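
The capacity impact of a percentage-based configuration can be worked out ahead of time. The arithmetic below is a sketch that assumes CodeDeploy rounds a fractional minimum-healthy count up to the next whole instance. The function names are hypothetical.

```shell
#!/bin/bash
# Minimum number of instances that must stay healthy.
# $1 = fleet size, $2 = minimum healthy percentage (fractions round up).
min_healthy_hosts() {
    echo $(( ($1 * $2 + 99) / 100 ))
}

# Largest batch that can be taken out of service at one time.
max_batch_size() {
    echo $(( $1 - $(min_healthy_hosts "$1" "$2") ))
}
```

For example, max_batch_size 9 50 gives 4 for a HalfAtATime-style rollout on nine instances, and max_batch_size 10 75 gives 2 for a custom configuration that keeps 75 percent of a ten-instance fleet healthy.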

2. Blue/Green Deployments

Blue/Green deployments mitigate deployment risk by provisioning a completely new set of instances (the "Green" environment) alongside the existing running instances (the "Blue" environment). The new revision is deployed to the Green environment. Once validated, traffic is shifted from Blue to Green, either instantly or gradually.

A. Blue/Green with Auto Scaling Groups (EC2)

When integrated with an Auto Scaling Group (ASG), CodeDeploy automates the Blue/Green lifecycle:

CodeDeploy copies the configuration of the existing ASG (Blue) and creates a new, temporary ASG (Green).
It provisions new EC2 instances inside the Green ASG.
The CodeDeploy agent installs the new application revision on the Green instances and executes the lifecycle hooks.
Once the Green instances pass their ValidateService hooks and ALB target group health checks, CodeDeploy begins shifting traffic.
Traffic is shifted by associating the Green ASG with the Application Load Balancer target group and disassociating the Blue ASG.
By default, CodeDeploy keeps the Blue instances running for a configurable "termination wait time" (e.g., 1 hour). This allows you to roll back instantly to the Blue environment if errors are discovered post-deployment. If no errors occur, the Blue ASG and its instances are terminated automatically.

    [Production Traffic]
             |
             v
   +------------------+
   |   Application    |
   |  Load Balancer   |
   +------------------+
     /              \
    / (0% Traffic)   \ (100% Traffic)
   v                  v
+--------------+    +--------------+
|  Blue ASG    |    |  Green ASG   |
| (Old Version)|    | (New Version)|
|  [Instance]  |    |  [Instance]  |
|  [Instance]  |    |  [Instance]  |
+--------------+    +--------------+
(Kept online for     (Serving live
 rollback window)     production)

B. ECS Blue/Green Deployments

For Amazon ECS, Blue/Green deployments rely on an Application Load Balancer with two listeners (Production and Test) and two target groups (Target Group 1 and Target Group 2). CodeDeploy shifts traffic between the target groups using one of three configuration types:

Canary: Shifts a specified percentage of traffic to the new task set in a first increment, waits for a set period, then shifts the remaining traffic. Example: CodeDeployDefault.ECSCanary10Percent5Minutes or CodeDeployDefault.ECSCanary10Percent15Minutes.
Linear: Shifts traffic in equal increments over equal time intervals. Example: CodeDeployDefault.ECSLinear10PercentEvery1Minutes or CodeDeployDefault.ECSLinear10PercentEvery3Minutes.
AllAtOnce: Immediately shifts 100% of traffic from the old target group to the new target group.
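
To compare how long each option exposes users to a partially shifted state, the timeline can be printed in advance. This is an illustrative sketch only: it assumes the first increment is applied as soon as traffic shifting starts, and the function names are not CodeDeploy terms.

```shell
#!/bin/bash
# Print the cumulative traffic percentage after each shift.

# Canary: $1 = first increment (percent), $2 = wait in minutes.
canary_schedule() {
    echo "t+0m ${1}%"
    echo "t+${2}m 100%"
}

# Linear: $1 = increment (percent), $2 = interval in minutes.
# Assumes the first increment is applied when traffic shifting starts.
linear_schedule() {
    if [ "$1" -le 0 ]; then
        echo "increment must be positive" >&2
        return 1
    fi
    shifted=0
    minute=0
    while [ "${shifted}" -lt 100 ]; do
        shifted=$(( shifted + $1 ))
        if [ "${shifted}" -gt 100 ]; then
            shifted=100
        fi
        echo "t+${minute}m ${shifted}%"
        minute=$(( minute + $2 ))
    done
}
```

Under these assumptions, linear_schedule 10 3 prints ten steps and reaches 100% at t+27m, while canary_schedule 10 5 reaches 100% at t+5m.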

Rollback Strategies

A deployment system is only as good as its recovery mechanisms. CodeDeploy provides automated rollbacks based on two primary triggers:

Deployment Failure: If any host-level script fails, or if a validation hook returns a non-zero exit code, CodeDeploy stops the deployment. When automatic rollback is enabled for this event, it then redeploys the last known good revision as a new deployment.
CloudWatch Alarms: You can associate CloudWatch Alarms (such as HTTP 5XX error rates, application latency, or system CPU utilization) with a Deployment Group. If any of these alarms trigger during the deployment or during a post-deployment monitoring window, CodeDeploy stops the deployment and rolls back to the previous stable state.

Step-by-Step Production Implementation Guide (EC2 & Auto Scaling)

Let's build a production-grade, automated deployment pipeline for an Auto Scaling Group of EC2 instances running an Apache/PHP web application. We will configure IAM roles, install the host agent, write the deployment hooks, and provision the entire infrastructure using Terraform.

Step 1: Define IAM Roles and Policies

We need two distinct IAM roles: the CodeDeploy Service Role (which allows the CodeDeploy service to interact with ASGs, EC2, and ALBs) and the EC2 Instance Profile Role (which allows the EC2 instances to pull artifacts from S3 and communicate with the CodeDeploy service).

CodeDeploy Service Role Trust Policy (`codedeploy-trust.json`)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "codedeploy.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

EC2 Instance Profile Trust Policy (`ec2-trust.json`)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
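
The two trust policies differ only in the service principal, so they can be generated from one template. The sketch below assumes the AWS CLI is installed and that the caller has IAM permissions. The role name matches the one used in the Terraform configuration in this guide; the function names are hypothetical.

```shell
#!/bin/bash
# Print a trust policy that lets one AWS service principal assume a role.
# $1 = service principal, e.g. codedeploy.amazonaws.com
trust_policy_json() {
    cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "$1" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the CodeDeploy service role from the generated policy.
# Requires the AWS CLI and IAM permissions; nothing runs until you call it.
create_codedeploy_service_role() {
    trust_policy_json "codedeploy.amazonaws.com" > codedeploy-trust.json
    aws iam create-role \
        --role-name codedeploy-service-role \
        --assume-role-policy-document file://codedeploy-trust.json
    aws iam attach-role-policy \
        --role-name codedeploy-service-role \
        --policy-arn arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole
}
```

Calling trust_policy_json "ec2.amazonaws.com" > ec2-trust.json produces the second file in the same way.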

Step 2: Install and Configure the CodeDeploy Agent via User Data

To automate the installation of the CodeDeploy agent on our EC2 instances, we will use an EC2 Launch Template with a User Data script. This script installs the agent, configures it, and ensures it starts automatically on boot:

#!/bin/bash
# Production CodeDeploy Agent Bootstrap Script
set -euo pipefail
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

echo "Updating system packages..."
yum update -y

echo "Installing dependencies..."
yum install -y ruby wget httpd

echo "Fetching CodeDeploy Agent installer for region: us-east-1..."
cd /home/ec2-user
wget https://aws-codedeploy-us-east-1.s3.us-east-1.amazonaws.com/latest/install

echo "Setting executable permissions on installer..."
chmod +x ./install

echo "Executing CodeDeploy Agent installation..."
if ./install auto; then
    echo "Installation successful."
else
    echo "Installation failed. Retrying..."
    sleep 10
    ./install auto
fi

echo "Verifying agent service status..."
systemctl enable codedeploy-agent
systemctl start codedeploy-agent
systemctl status codedeploy-agent

echo "Starting and enabling Apache Web Server..."
systemctl enable httpd
systemctl start httpd
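
The script above hardcodes us-east-1 in the installer URL. If the same launch template is reused in other regions, the region could be looked up at boot instead. This is a minimal sketch, assuming IMDSv2 is reachable from the instance; both function names are illustrative.

```shell
#!/bin/bash
# Build the regional CodeDeploy agent installer URL.
# $1 = AWS region, e.g. us-east-1
installer_url() {
    echo "https://aws-codedeploy-${1}.s3.${1}.amazonaws.com/latest/install"
}

# Ask the instance metadata service (IMDSv2) for the current region.
# Only works when run on an EC2 instance.
instance_region() {
    token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
    curl -s -H "X-aws-ec2-metadata-token: ${token}" \
        "http://169.254.169.254/latest/meta-data/placement/region"
}
```

In the bootstrap script, the wget line would then become wget "$(installer_url "$(instance_region)")".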

Step 3: Write Production-Grade Lifecycle Hook Scripts

Next, we need to create the lifecycle hook scripts referenced in our appspec.yml file. These scripts must be robust, return correct exit codes (0 for success, non-zero for failure), and handle errors gracefully.

Script 1: Stop Server (`scripts/stop_server.sh`)

#!/bin/bash
# Gracefully stop Apache to prepare for the new deployment
set -e

echo "Running ApplicationStop hook..."
if systemctl is-active --quiet httpd; then
    echo "Stopping Apache Web Server..."
    systemctl stop httpd
else
    echo "Apache Web Server is already stopped."
fi
exit 0

Script 2: Install Dependencies (`scripts/install_dependencies.sh`)

#!/bin/bash
# Ensure required runtime packages are installed
set -eo pipefail

echo "Running BeforeInstall hook..."
echo "Verifying PHP and extensions are installed..."
if ! command -v php &> /dev/null; then
    echo "PHP not found. Installing PHP..."
    yum install -y php php-cli php-common php-opcache php-mbstring php-xml php-gd
else
    echo "PHP is already installed: $(php -v | head -n 1)"
fi

# Ensure target directories exist and have correct permissions
mkdir -p /var/www/html/app
chown -R apache:apache /var/www/html/app
exit 0

Script 3: Configure Application (`scripts/configure_application.sh`)

#!/bin/bash
# Configure application parameters and inject environment variables
set -eo pipefail

echo "Running AfterInstall hook..."

# Fetch configuration secrets from AWS Systems Manager Parameter Store
echo "Retrieving database credentials from SSM Parameter Store..."
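# Pass the region explicitly; hook scripts run without an AWS CLI profile
DB_HOST=$(aws ssm get-parameter --region us-east-1 --name "/prod/db/endpoint" --query "Parameter.Value" --output text)
DB_USER=$(aws ssm get-parameter --region us-east-1 --name "/prod/db/username" --query "Parameter.Value" --output text)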

# Inject parameters into application config file
cat <<EOF > /var/www/html/app/config.php
<?php
define('DB_HOST', '${DB_HOST}');
define('DB_USER', '${DB_USER}');
define('DEPLOY_TIME', '$(date -u +"%Y-%m-%dT%H:%M:%SZ")');
?>
EOF

echo "Application configurations generated successfully."
chown apache:apache /var/www/html/app/config.php
chmod 600 /var/www/html/app/config.php
exit 0

Script 4: Start Server (`scripts/start_server.sh`)

#!/bin/bash
# Start Apache Web Server
set -e

echo "Running ApplicationStart hook..."
echo "Starting Apache Web Server..."
systemctl start httpd
exit 0

Script 5: Validate Service (`scripts/validate_service.sh`)

The ValidateService hook is the most important hook in the deployment. It verifies that the application is running correctly and is ready to accept production traffic. We use a retry loop to poll our health check endpoint, allowing the application a short grace period to start up.

#!/bin/bash
# Validate the service by polling its health check endpoint with retries
set -eo pipefail

HEALTH_CHECK_URL="http://localhost/app/health.php"
MAX_ATTEMPTS=6
DELAY_SECONDS=10

echo "Running ValidateService hook..."
echo "Polling health check endpoint: ${HEALTH_CHECK_URL}"

for ((attempt=1; attempt<=MAX_ATTEMPTS; attempt++)); do
    echo "Attempt ${attempt} of ${MAX_ATTEMPTS}..."
    
    # Use curl with short timeouts; curl prints 000 when no response is received
    HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "${HEALTH_CHECK_URL}" || true)
    
    if [ "${HTTP_STATUS}" = "200" ]; then
        echo "Health check passed with HTTP status 200."
        exit 0
    else
        echo "Health check failed with HTTP status: ${HTTP_STATUS}. Retrying in ${DELAY_SECONDS} seconds..."
        sleep $DELAY_SECONDS
    fi
done

echo "Error: Service validation failed after ${MAX_ATTEMPTS} attempts. Aborting deployment."
exit 1

Step 4: Provision CodeDeploy Resources via Terraform

Now, let's write a complete, production-ready HashiCorp Terraform configuration to provision the CodeDeploy Application, Deployment Group, custom Deployment Configuration, and all required IAM policies and roles.

# Terraform Configuration for AWS CodeDeploy Enterprise Deployment
terraform {
  required_version = ">= 1.3.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# 1. IAM Service Role for CodeDeploy
resource "aws_iam_role" "codedeploy_service_role" {
  name               = "codedeploy-service-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "codedeploy.amazonaws.com"
        }
      }
    ]
  })
}

# Attach AWS Managed CodeDeploy policy
resource "aws_iam_role_policy_attachment" "codedeploy_role_attach" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole"
  role       = aws_iam_role.codedeploy_service_role.name
}

# 2. IAM Role and Instance Profile for EC2 Instances
resource "aws_iam_role" "ec2_instance_role" {
  name               = "ec2-instance-deployment-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# IAM Policy to allow EC2 to read from deployment S3 bucket and Parameter Store
resource "aws_iam_policy" "ec2_deployment_policy" {
  name        = "ec2-deployment-permissions"
  description = "Allows instances to pull deployment artifacts and access configuration parameters"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:ListBucket"
        ]
        Resource = [
          "arn:aws:s3:::enterprise-deployment-artifacts-prod/*",
          "arn:aws:s3:::enterprise-deployment-artifacts-prod"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "ssm:GetParameter",
          "ssm:GetParameters"
        ]
        Resource = "arn:aws:ssm:us-east-1:*:parameter/prod/*"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ec2_policy_attach" {
  policy_arn = aws_iam_policy.ec2_deployment_policy.arn
  role       = aws_iam_role.ec2_instance_role.name
}

# Attach SSM Core policy for Session Manager access (security best practice)
resource "aws_iam_role_policy_attachment" "ec2_ssm_attach" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.ec2_instance_role.name
}

resource "aws_iam_instance_profile" "ec2_instance_profile" {
  name = "ec2-instance-deployment-profile"
  role = aws_iam_role.ec2_instance_role.name
}

# 3. CodeDeploy Application
resource "aws_codedeploy_app" "web_app" {
  compute_platform = "Server"
  name             = "enterprise-web-application"
}

# 4. Custom Deployment Configuration
resource "aws_codedeploy_deployment_config" "custom_rolling_config" {
  deployment_config_name = "Custom.RollingTwoAtATime"
  compute_platform       = "Server"

  minimum_healthy_hosts {
    type  = "FLEET_PERCENT"
    value = 75
  }
}

# 5. CodeDeploy Deployment Group
resource "aws_codedeploy_deployment_group" "web_app_group" {
  app_name              = aws_codedeploy_app.web_app.name
  deployment_group_name = "production-fleet"
  service_role_arn      = aws_iam_role.codedeploy_service_role.arn
  
  deployment_config_name = aws_codedeploy_deployment_config.custom_rolling_config.deployment_config_name

  # Target instances associated with specific Auto Scaling Groups
  autoscaling_groups = ["production-web-asg"]

  # Configure automatic rollback on deployment failures or alarms
  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

  # Enable deployment status notifications via SNS
  trigger_configuration {
    trigger_events     = ["DeploymentFailure", "DeploymentSuccess", "DeploymentRollback"]
    trigger_name       = "deployment-notifications-trigger"
    trigger_target_arn = "arn:aws:sns:us-east-1:123456789012:deployment-alerts-topic"
  }

  # Stop the deployment if this CloudWatch alarm enters the ALARM state
  alarm_configuration {
    alarms                    = ["web-application-error-rate-high"]
    enabled                   = true
    ignore_poll_alarm_failure = false
  }
}
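
Once these resources exist, a deployment can be started from a pipeline step or a terminal. The sketch below uses the application, deployment group, and bucket names from the configuration above. The object key is a hypothetical placeholder, and the AWS CLI with suitable credentials is assumed.

```shell
#!/bin/bash
# Build the --s3-location shorthand for "aws deploy create-deployment".
# $1 = bucket, $2 = object key, $3 = bundle type (zip, tar, or tgz)
s3_location() {
    echo "bucket=$1,key=$2,bundleType=$3"
}

# Start a deployment of an uploaded zip revision and wait for it to succeed.
# $1 = object key of the revision (hypothetical, e.g. releases/app.zip)
# The wait command exits non-zero if the deployment fails or is stopped.
deploy_revision() {
    deployment_id=$(aws deploy create-deployment \
        --application-name enterprise-web-application \
        --deployment-group-name production-fleet \
        --s3-location "$(s3_location enterprise-deployment-artifacts-prod "$1" zip)" \
        --query deploymentId --output text)
    echo "Started deployment ${deployment_id}"
    aws deploy wait deployment-successful --deployment-id "${deployment_id}"
}
```

Keeping the location string in its own function makes it easy to switch bundle types without touching the deployment call.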

Step-by-Step Production Implementation Guide (ECS Blue/Green)

Deploying microservices to Amazon ECS using Blue/Green strategies requires configuring an Application Load Balancer (ALB) to safely shift traffic between two Target Groups. Let's look at how to structure an ECS Blue/Green deployment using CodeDeploy.

The ECS Blue/Green Network Topology

An ECS Blue/Green deployment uses the following architecture:

Production Listener (Port 443): Routes live customer traffic to the active Target Group (e.g., Target Group 1 / Blue).
Test Listener (Port 8443): Routes test traffic to the replacement Target Group (e.g., Target Group 2 / Green) before the final cutover.
Target Group 1 (Blue): Hosts the tasks running the currently active version of the container.
Target Group 2 (Green): Hosts the tasks running the new version of the container during a deployment.

                                     [Client Requests]
                                      /            \
                       (Port 443)    /              \    (Port 8443)
                       Prod Listener/                \Test Listener
                                   v                  v
                            +------------+      +------------+
                            | Target Grp |      | Target Grp |
                            |  1 (Blue)  |      |  2 (Green) |
                            +------------+      +------------+
                                  |                    |
                                  v                    v
                            +------------+      +------------+
                            | ECS Tasks  |      | ECS Tasks  |
                            | (v1.0.0)   |      | (v1.1.0)   |
                            +------------+      +------------+

               Initial State:
               - Production Listener -> Target Group 1 (Blue)
               - Test Listener -> Target Group 2 (Green)

               After Validation:
               - Production Listener -> Target Group 2 (Green)
               - Blue Tasks retained for rollback window

ECS CodeDeploy Deployment Group Configuration (Terraform)

The following Terraform configuration creates an ECS deployment group configured for Blue/Green deployments with traffic shifting managed by CodeDeploy.

resource "aws_codedeploy_app" "ecs_app" {
  name             = "payment-processing-service"
  compute_platform = "ECS"
}

resource "aws_codedeploy_deployment_group" "ecs_blue_green" {
  app_name              = aws_codedeploy_app.ecs_app.name
  deployment_group_name = "production-ecs-bluegreen"
  service_role_arn      = aws_iam_role.codedeploy_service_role.arn

  deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"

  ecs_service {
    cluster_name = aws_ecs_cluster.production_cluster.name
    service_name = aws_ecs_service.payment_service.name
  }

  deployment_style {
    deployment_type   = "BLUE_GREEN"
    deployment_option = "WITH_TRAFFIC_CONTROL"
  }

  blue_green_deployment_config {

    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 60
    }
  }

  load_balancer_info {

    target_group_pair_info {

      prod_traffic_route {
        listener_arns = [
          aws_lb_listener.production.arn
        ]
      }

      test_traffic_route {
        listener_arns = [
          aws_lb_listener.test.arn
        ]
      }

      target_group {
        name = aws_lb_target_group.blue.name
      }

      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }

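  # DEPLOYMENT_STOP_ON_ALARM only has an effect when alarms are attached to the group.
  alarm_configuration {
    enabled = true
    alarms  = [aws_cloudwatch_metric_alarm.high_5xx_rate.alarm_name]
  }

  auto_rollback_configuration {
    enabled = true
    events = [
      "DEPLOYMENT_FAILURE",
      "DEPLOYMENT_STOP_ON_ALARM"
    ]
  }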
}
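
To make the effect of deployment_config_name concrete, the following is a minimal sketch that models how a canary or linear configuration moves production traffic to Green over time. The function name traffic_schedule and its parameters are illustrative assumptions, not part of any AWS SDK; the sketch only reproduces the arithmetic implied by configuration names such as CodeDeployDefault.ECSCanary10Percent5Minutes.

```python
def traffic_schedule(kind, percent, interval_minutes):
    """Return [(minute, green_percent), ...] for a canary or linear shift.

    Hypothetical helper for illustration only; CodeDeploy performs the
    actual shifting on the load balancer listener.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    if kind == "canary":
        # One small first step, then all remaining traffic after the interval.
        return [(0, percent), (interval_minutes, 100)]
    if kind == "linear":
        # Equal steps every interval until Green serves all traffic.
        steps = []
        minute, shifted = 0, 0
        while shifted < 100:
            shifted = min(100, shifted + percent)
            steps.append((minute, shifted))
            minute += interval_minutes
        return steps
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    # Models CodeDeployDefault.ECSCanary10Percent5Minutes
    print(traffic_schedule("canary", 10, 5))  # [(0, 10), (5, 100)]
```

With the configuration above, Green receives 10% of production traffic immediately and the remaining 90% five minutes later, which is the window in which an alarm can still stop the deployment before most users are affected.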

ECS Deployment Lifecycle Flow

A new task definition revision is registered.
CodePipeline triggers a CodeDeploy deployment.
CodeDeploy creates a replacement (Green) task set and registers its tasks with Target Group 2.
ALB health checks validate the Green tasks.
The Test Listener routes test traffic to the Green tasks.
Integration tests run in the AfterAllowTestTraffic hook.
Production traffic shifts to Green according to the deployment configuration (here, 10% first and the remaining 90% five minutes later).
CloudWatch alarms monitor application health throughout the shift.
If no alarm fires, Green serves all production traffic.
Blue tasks are retained for the configured termination wait time (60 minutes in the configuration above) and then terminated.
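
The AfterAllowTestTraffic step in the flow above is typically implemented as a Lambda function that tests Green through the Test Listener and reports the result back to CodeDeploy. The following is a minimal sketch under stated assumptions: the TEST_LISTENER_URL environment variable, its default URL, and the endpoint_healthy helper are hypothetical, and the codedeploy and check parameters exist only so the handler can be exercised without AWS access. The event keys and the put_lifecycle_event_hook_execution_status call are the ones CodeDeploy uses for Lambda hooks.

```python
import os
import urllib.request


def endpoint_healthy(url, timeout=5):
    """Return True if the URL answers with HTTP 200 (hypothetical smoke test)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, TLS errors, and HTTP error responses.
        return False


def handler(event, context, codedeploy=None, check=endpoint_healthy):
    """Lambda entry point for an AfterAllowTestTraffic hook."""
    if codedeploy is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        codedeploy = boto3.client("codedeploy")

    # Assumed configuration: the Test Listener on port 8443 from the diagram.
    url = os.environ.get("TEST_LISTENER_URL", "https://example.internal:8443/health")
    status = "Succeeded" if check(url) else "Failed"

    # Reporting "Failed" fails the deployment before production traffic shifts.
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

Because the hook runs before any production traffic shifts, a failed smoke test here stops the release while Blue is still serving all users.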

Database Migration Strategies for Blue/Green Deployments

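One of the most common causes of deployment failures is database schema incompatibility. During a Blue/Green deployment, both application versions run against the same database, and a schema change is much harder to roll back than application code. Enterprise teams therefore use the Expand/Contract pattern, which splits a breaking schema change into backward-compatible steps.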

Phase 1: Expand

Add new tables or columns.
Do not remove existing structures.
Ensure old and new application versions continue to function.

ALTER TABLE customers
ADD COLUMN preferred_language VARCHAR(10) NULL;

Phase 2: Application Deployment

Deploy new application version.
Write to both old and new schema structures if necessary.
Monitor production traffic.

Phase 3: Contract

Remove deprecated columns only after all environments run the new code.
Execute cleanup migration in a later deployment cycle.

ALTER TABLE customers
DROP COLUMN legacy_language_code;

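This strategy keeps rollback safe throughout Phases 1 and 2, because the previous application version remains compatible with the expanded schema. Once the contract migration runs, rolling back to a version that still reads legacy_language_code is no longer possible without restoring the column.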
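
The compatibility claim can be checked with a small experiment. The sketch below uses an in-memory SQLite database as a stand-in for the production database (an assumption made purely so the example is self-contained); the functions old_version_insert, new_version_insert, and read_language are hypothetical representations of the Blue and Green application code.

```python
import sqlite3


def old_version_insert(conn, name, code):
    """Blue (v1.0.0) code path: knows only the legacy column."""
    conn.execute(
        "INSERT INTO customers (name, legacy_language_code) VALUES (?, ?)",
        (name, code),
    )


def new_version_insert(conn, name, language):
    """Green (v1.1.0) code path: dual-writes so Blue can still read the row."""
    conn.execute(
        "INSERT INTO customers (name, legacy_language_code, preferred_language) "
        "VALUES (?, ?, ?)",
        (name, language, language),
    )


def read_language(conn, name):
    """Green read path: prefer the new column, fall back to the legacy one."""
    row = conn.execute(
        "SELECT COALESCE(preferred_language, legacy_language_code) "
        "FROM customers WHERE name = ?",
        (name,),
    ).fetchone()
    return row[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (name TEXT PRIMARY KEY, legacy_language_code VARCHAR(10))"
    )
    old_version_insert(conn, "alice", "en")
    # Phase 1 (Expand): additive change only.
    conn.execute("ALTER TABLE customers ADD COLUMN preferred_language VARCHAR(10)")
    old_version_insert(conn, "bob", "de")    # Blue still works after the expand
    new_version_insert(conn, "carol", "fr")  # Green runs side by side
    return {n: read_language(conn, n) for n in ("alice", "bob", "carol")}


if __name__ == "__main__":
    print(demo())
```

Both code paths succeed against the expanded schema, which is the property that allows CodeDeploy to route traffic back to Blue at any point before the contract migration.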

Monitoring and Observability

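Automated rollback is only as reliable as the signals behind it: CodeDeploy stops and rolls back a deployment when a lifecycle hook fails or when a CloudWatch alarm attached to the deployment group enters the ALARM state.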

Critical CloudWatch Metrics

Metric	Namespace	Purpose
HTTPCode_Target_5XX_Count	AWS/ApplicationELB	Application failures
TargetResponseTime	AWS/ApplicationELB	Latency degradation detection
HealthyHostCount	AWS/ApplicationELB	Target health verification
CPUUtilization	AWS/ECS or AWS/EC2	Infrastructure stress monitoring
MemoryUtilization	AWS/ECS	ECS task resource consumption

Recommended CloudWatch Alarm

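resource "aws_cloudwatch_metric_alarm" "high_5xx_rate" {
  alarm_name          = "production-http-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 20

  # ALB metrics are published per load balancer, so the alarm needs a dimension.
  # aws_lb.production is an assumed resource name; substitute your own ALB.
  dimensions = {
    LoadBalancer = aws_lb.production.arn_suffix
  }

  # No datapoint is reported when there are no 5XX responses; treat that as healthy.
  treat_missing_data = "notBreaching"

  alarm_description = "Triggers rollback when target 5XX responses exceed 20 per minute for 2 consecutive minutes"
}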
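
The interaction of period, threshold, and evaluation_periods in the alarm above can be illustrated with a simplified model. This is a sketch, not CloudWatch's actual evaluation algorithm: the function alarm_state is hypothetical, and it assumes that a missing datapoint (None) counts as not breaching, which corresponds to CloudWatch's notBreaching treatment of missing data.

```python
def alarm_state(sums_per_period, threshold=20, evaluation_periods=2):
    """Simplified alarm model over a list of per-period Sum values.

    Returns "ALARM" only when each of the most recent `evaluation_periods`
    datapoints is strictly greater than the threshold.
    """
    recent = sums_per_period[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = [value is not None and value > threshold for value in recent]
    return "ALARM" if all(breaching) else "OK"


if __name__ == "__main__":
    print(alarm_state([3, 25, 40]))  # ALARM: two consecutive minutes above 20
    print(alarm_state([25, 3]))      # OK: the latest minute recovered
```

A single noisy minute therefore does not roll back a deployment; two consecutive breaching minutes do. Lowering evaluation_periods makes rollback faster at the cost of more false positives.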

Security Hardening Best Practices

Use IAM least-privilege policies.
Store secrets in AWS Secrets Manager or Parameter Store.
Encrypt deployment artifacts using AWS KMS.
Enable CloudTrail logging for deployment auditing.
Use Session Manager instead of SSH access.
Enable VPC endpoints for S3, CodeDeploy, and Systems Manager.
Restrict deployment initiation permissions to CI/CD roles.
Implement approval gates for production deployments.
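
As one illustration of the least-privilege and restricted-initiation items above, the sketch below builds an IAM policy document that lets a CI/CD role start and inspect deployments for a single deployment group only. The helper ci_deploy_policy, the region, and the account ID are hypothetical; the exact set of actions a pipeline needs depends on how it is built, so treat the list as a starting point rather than a definitive policy.

```python
import json


def ci_deploy_policy(region, account_id, app, group, config):
    """Build an IAM policy document scoped to one CodeDeploy deployment group."""
    base = f"arn:aws:codedeploy:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["codedeploy:CreateDeployment", "codedeploy:GetDeployment"],
                "Resource": f"{base}:deploymentgroup:{app}/{group}",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "codedeploy:RegisterApplicationRevision",
                    "codedeploy:GetApplicationRevision",
                ],
                "Resource": f"{base}:application:{app}",
            },
            {
                "Effect": "Allow",
                "Action": ["codedeploy:GetDeploymentConfig"],
                "Resource": f"{base}:deploymentconfig:{config}",
            },
        ],
    }


if __name__ == "__main__":
    policy = ci_deploy_policy(
        "us-east-1",     # assumed region
        "123456789012",  # placeholder account ID
        "payment-processing-service",
        "production-ecs-bluegreen",
        "CodeDeployDefault.ECSCanary10Percent5Minutes",
    )
    print(json.dumps(policy, indent=2))
```

No statement uses a wildcard resource, so the role cannot trigger deployments for any other application or deployment group in the account.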

Advanced Troubleshooting Guide

CodeDeploy Agent Not Reporting

sudo systemctl status codedeploy-agent
sudo journalctl -u codedeploy-agent

Review Agent Logs

tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log

Review Deployment Script Logs

ls -ltr /opt/codedeploy-agent/deployment-root/
tail -n 100 /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log

Verify Deployment Status

aws deploy get-deployment \
  --deployment-id d-ABCDEFGHIJK
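
The JSON returned by get-deployment is verbose, so it can help to reduce it to a one-line summary in scripts or chat notifications. The following is a minimal sketch: summarize_deployment is a hypothetical helper, and the sample payload is illustrative rather than real output, although the deploymentInfo, status, and errorInformation fields are part of the documented response.

```python
def summarize_deployment(payload):
    """Condense an `aws deploy get-deployment` JSON response to one line."""
    info = payload["deploymentInfo"]
    line = f'{info["deploymentId"]}: {info["status"]}'
    error = info.get("errorInformation")
    if error:
        # errorInformation is only present when the deployment reported an error.
        line += f' ({error.get("code")}: {error.get("message")})'
    return line


if __name__ == "__main__":
    # Illustrative payload; a real response contains many more fields.
    sample = {
        "deploymentInfo": {
            "deploymentId": "d-ABCDEFGHIJK",
            "status": "Failed",
            "errorInformation": {
                "code": "HEALTH_CONSTRAINTS",
                "message": "sample message",
            },
        }
    }
    print(summarize_deployment(sample))
```

In practice the payload would come from parsing the CLI output with json.loads, or from the equivalent SDK call.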

Conclusion

AWS CodeDeploy is more than a deployment utility: it orchestrates deployments across EC2, ECS, Lambda, and on-premises servers. By combining Blue/Green deployment patterns, automated rollback, CloudWatch alarms, and Infrastructure as Code, organizations can achieve near-zero-downtime releases and reduce operational risk.

Mastering CodeDeploy allows DevOps teams to move from fragile deployment processes to fully automated, auditable, scalable, and enterprise-grade continuous delivery pipelines.