Auto Scaling: Automatically scaling EC2 with demand

Why do you need Auto Scaling?

Imagine you run an online store. On a normal day only a few hundred people visit, but on a sale day (such as Black Friday) traffic jumps 50-fold.

Without Auto Scaling:

You provision 50 servers for the sale day → you waste money on the other 364 days
Or you run only a few servers → the site crashes on sale day → you lose revenue

With Auto Scaling:

Normal days: 2 servers
When traffic rises: servers are added automatically
When traffic drops: surplus servers are removed automatically
You pay only for what you actually use

Auto Scaling components

To make Auto Scaling work, you need to understand three main components:

1. Launch Template

This is the “blueprint” for an EC2 instance. It defines:

AMI (the machine image, including the operating system)
Instance type (t3.micro, m5.large…)
Security Groups
Key pair
User data script

Whenever a new instance is needed, AWS creates it from this template.

2. Auto Scaling Group (ASG)

This is the “manager” of the instances. It decides:

Minimum number of instances (min)
Maximum number of instances (max)
Desired number of instances (desired)
Which VPC subnets the instances run in

3. Scaling Policies

These are the “rules” for adding or removing instances. There are several types:

Target Tracking: keeps a metric near a target value (e.g. average CPU around 70%)
Step Scaling: adds or removes capacity in steps, depending on how far the metric is past a threshold
Scheduled: planned in advance (e.g. scale up at 9 AM on Mondays)

Visual overview

                        Internet
                            │
                            ▼
         ┌─────────────────────────────────────────┐
         │        Application Load Balancer        │
         │   (distributes traffic to instances)    │
         └──────────────────┬──────────────────────┘
                            │
                            ▼
         ┌─────────────────────────────────────────┐
         │           Auto Scaling Group            │
         │                                         │
         │   min: 2    desired: 3    max: 10       │
         │                                         │
         │   ┌────────┐ ┌────────┐ ┌────────┐      │
         │   │ EC2-1  │ │ EC2-2  │ │ EC2-3  │      │
         │   └────────┘ └────────┘ └────────┘      │
         │        ▲          ▲          ▲          │
         │        │          │          │          │
         │   Created from Launch Template          │
         │                                         │
         └─────────────────────────────────────────┘

Hands-on: Creating an Auto Scaling Group with Terraform

Step 1: Launch Template

# launch_template.tf

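# Look up the latest Amazon Linux 2023 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    # "2023.*" keeps the standard image; a bare "*" would also match the
    # "al2023-ami-minimal-*" images
    values = ["al2023-ami-2023.*-x86_64"]
  }
}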

# Launch Template
resource "aws_launch_template" "app" {
  name_prefix   = "${var.project_name}-"
  description   = "Launch template for ${var.project_name} application servers"
  
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type

  # Network
  vpc_security_group_ids = [aws_security_group.app.id]

  # IAM role (so the app can access AWS services)
  iam_instance_profile {
    name = aws_iam_instance_profile.app.name
  }

  # Require IMDSv2 for security
  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
  }

  # EBS root volume
  block_device_mappings {
    device_name = "/dev/xvda"
    
    ebs {
      volume_size           = 20
      volume_type           = "gp3"
      encrypted             = true
      delete_on_termination = true
    }
  }

  # User data: script that runs when the instance first boots
  user_data = base64encode(<<-EOF
    #!/bin/bash
    set -e
    
    # Update the system
    dnf update -y
    
    # Install the web server
    dnf install -y nginx
    
    # Fetch instance info. IMDSv2 is required above, so get a session token first.
    IMDS=http://169.254.169.254/latest
    TOKEN=$(curl -s -X PUT "$IMDS/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
    INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "$IMDS/meta-data/instance-id")
    AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "$IMDS/meta-data/placement/availability-zone")
    
    # Simple page that shows the instance info
    cat > /usr/share/nginx/html/index.html <<HTML
    <!DOCTYPE html>
    <html>
    <head><title>App Server</title></head>
    <body>
      <h1>Hello from Auto Scaling!</h1>
      <p>Instance ID: $INSTANCE_ID</p>
      <p>Availability Zone: $AZ</p>
      <p>Time: $(date)</p>
    </body>
    </html>
    HTML
    
    # File served at /health so the ALB health check gets a 200
    echo "ok" > /usr/share/nginx/html/health
    
    # Start nginx
    systemctl enable nginx
    systemctl start nginx
    
    # Optional: signal that the instance is ready here
    # (for example, when using lifecycle hooks)
  EOF
  )

  # Tags for instances created from this template
  tag_specifications {
    resource_type = "instance"
    tags = {
      Name        = "${var.project_name}-app"
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }

  tag_specifications {
    resource_type = "volume"
    tags = {
      Name = "${var.project_name}-app-volume"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}
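
The launch template above references `aws_security_group.app` and `aws_iam_instance_profile.app`, and the load balancer later references `aws_security_group.alb`, but none of them is defined in this article. The following is a minimal sketch of what they might look like. The file name, the ingress/egress rules, and the attached SSM managed policy are assumptions for illustration, not part of the original setup.

```hcl
# security.tf (sketch; rules and attached policy are assumptions)

# ALB security group: accepts HTTP from the internet
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-alb-"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# App security group: accepts HTTP only from the ALB
resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = var.vpc_id

  ingress {
    description     = "HTTP from the ALB only"
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# IAM role that EC2 instances can assume
resource "aws_iam_role" "app" {
  name_prefix = "${var.project_name}-app-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Assumption: allow Session Manager access instead of opening SSH
resource "aws_iam_role_policy_attachment" "app_ssm" {
  role       = aws_iam_role.app.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "app" {
  name_prefix = "${var.project_name}-app-"
  role        = aws_iam_role.app.name
}
```

Allowing port 80 on the app group only from the ALB's security group means instances in the private subnets are not reachable directly, while the ALB health checks still get through.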

Step 2: Auto Scaling Group

# asg.tf

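resource "aws_autoscaling_group" "app" {
  # name_prefix instead of a fixed name, so create_before_destroy can build
  # the replacement group before the old one is deleted
  name_prefix = "${var.project_name}-asg-"

  # Use the launch template
  launch_template {
    id = aws_launch_template.app.id
    # Track the template's latest version number. With "$Latest", Terraform
    # sees no change on the group and the instance refresh below never starts.
    version = aws_launch_template.app.latest_version
  }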

  # Capacity settings
  min_size         = var.asg_min_size      # Minimum (e.g. 2)
  max_size         = var.asg_max_size      # Maximum (e.g. 10)
  desired_capacity = var.asg_desired_size  # Desired (e.g. 2)

  # Network: deploy into private subnets
  vpc_zone_identifier = var.private_subnet_ids

  # Attach to the load balancer
  target_group_arns = [aws_lb_target_group.app.arn]

  # Health check
  health_check_type         = "ELB"  # Also use the ALB health check
  health_check_grace_period = 300    # 5 minutes for the instance to boot

  # Instance refresh: rolling replacement when the launch template changes
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50  # Keep at least 50% of instances healthy
      instance_warmup        = 120 # 2-minute warmup
    }
  }

  # Termination policy: which instance is removed first?
  termination_policies = ["OldestInstance", "Default"]

  # Enabled metrics (visible in CloudWatch)
  enabled_metrics = [
    "GroupMinSize",
    "GroupMaxSize",
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
    "GroupTotalInstances"
  ]

  tag {
    key                 = "Name"
    value               = "${var.project_name}-asg-instance"
    propagate_at_launch = true
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [desired_capacity]  # Let the scaling policies manage it
  }
}
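
The resources in this article read several input variables (`var.project_name`, `var.asg_min_size`, and so on) that are never declared. A possible `variables.tf` is sketched below. The variable names come from the code in this article; the file name, descriptions, defaults, and the validation rule are assumptions chosen for illustration.

```hcl
# variables.tf (sketch; defaults are illustrative assumptions)

variable "project_name" {
  description = "Prefix used in resource names"
  type        = string
}

variable "environment" {
  description = "Deployment environment, e.g. dev or prod"
  type        = string
  default     = "dev"
}

variable "aws_region" {
  description = "Region shown in the CloudWatch dashboard widgets"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type for the app servers"
  type        = string
  default     = "t3.micro"
}

variable "vpc_id" {
  description = "VPC that hosts the ALB and the instances"
  type        = string
}

variable "public_subnet_ids" {
  description = "Public subnets for the ALB"
  type        = list(string)
}

variable "private_subnet_ids" {
  description = "Private subnets for the instances (ideally in two or more AZs)"
  type        = list(string)
}

variable "asg_min_size" {
  description = "Minimum number of instances"
  type        = number
  default     = 2

  validation {
    condition     = var.asg_min_size >= 1
    error_message = "asg_min_size must be at least 1."
  }
}

variable "asg_max_size" {
  description = "Maximum number of instances"
  type        = number
  default     = 10
}

variable "asg_desired_size" {
  description = "Initial desired capacity (scaling policies change it afterwards)"
  type        = number
  default     = 2
}
```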

Step 3: Scaling Policies

# scaling_policies.tf

# === Target Tracking Policy (recommended) ===
# Adds or removes instances to keep average CPU close to 70%

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "${var.project_name}-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 70.0  # Keep average CPU around 70%

    # Target tracking has no cooldown setting. To avoid flapping it relies on
    # instance warmup and scales in more gradually than it scales out.
  }
}

# Target tracking on the number of requests per instance
resource "aws_autoscaling_policy" "request_count" {
  name                   = "${var.project_name}-request-count"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.app.arn_suffix}"
    }
    target_value = 1000.0  # 1000 requests per minute per instance
  }
}

# === Step Scaling (for finer control) ===
# Interval bounds are offsets from the alarm threshold
# (80 for the scale-out alarm, 30 for the scale-in alarm)

# Scale OUT when CPU > 80%
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "${var.project_name}-scale-out"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "StepScaling"
  adjustment_type        = "ChangeInCapacity"

  step_adjustment {
    scaling_adjustment          = 1        # Add 1 instance
    metric_interval_lower_bound = 0        # CPU from 80%
    metric_interval_upper_bound = 10       # up to 90%
  }

  step_adjustment {
    scaling_adjustment          = 2        # Add 2 instances
    metric_interval_lower_bound = 10       # CPU at 90% or above
  }
}

# Scale IN when CPU < 30%
resource "aws_autoscaling_policy" "scale_in" {
  name                   = "${var.project_name}-scale-in"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "StepScaling"
  adjustment_type        = "ChangeInCapacity"

  step_adjustment {
    scaling_adjustment          = -1       # Remove 1 instance
    metric_interval_upper_bound = 0        # CPU < 30%
  }
}

# CloudWatch alarms that trigger step scaling
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "${var.project_name}-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 80

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}

resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "${var.project_name}-low-cpu"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 5  # Wait longer before scaling in
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 30

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.app.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_in.arn]
}

# === Scheduled Scaling ===
# For when you know in advance when traffic will be high

# Scale up at 8 AM, Monday to Friday
resource "aws_autoscaling_schedule" "morning_scale_up" {
  scheduled_action_name  = "morning-scale-up"
  autoscaling_group_name = aws_autoscaling_group.app.name
  
  min_size         = 4
  max_size         = 10
  desired_capacity = 4
  
  recurrence = "0 8 * * MON-FRI"  # Cron: 8:00 AM, Mon-Fri
  time_zone  = "Asia/Ho_Chi_Minh"
}

# Scale down at 8 PM, Monday to Friday
resource "aws_autoscaling_schedule" "evening_scale_down" {
  scheduled_action_name  = "evening-scale-down"
  autoscaling_group_name = aws_autoscaling_group.app.name
  
  min_size         = 2
  max_size         = 10
  desired_capacity = 2
  
  recurrence = "0 20 * * MON-FRI"  # 8:00 PM, Mon-Fri
  time_zone  = "Asia/Ho_Chi_Minh"
}
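
Scheduled actions do not have to be recurring. For a known one-off event such as the sale day from the introduction, a one-time action with `start_time` could raise capacity ahead of the traffic. The sketch below assumes a placeholder date and capacity numbers; both are hypothetical and need to be replaced with your own. `start_time` is given in UTC.

```hcl
# One-time scale-up before a sale day (sketch; date and sizes are placeholders)
resource "aws_autoscaling_schedule" "sale_day_scale_up" {
  scheduled_action_name  = "sale-day-scale-up"
  autoscaling_group_name = aws_autoscaling_group.app.name

  min_size         = 6
  max_size         = 10
  desired_capacity = 6

  # Hypothetical date, in UTC. 17:00 UTC is midnight in Asia/Ho_Chi_Minh.
  start_time = "2030-11-28T17:00:00Z"
}

# One-time return to normal capacity after the event
resource "aws_autoscaling_schedule" "sale_day_scale_down" {
  scheduled_action_name  = "sale-day-scale-down"
  autoscaling_group_name = aws_autoscaling_group.app.name

  min_size         = 2
  max_size         = 10
  desired_capacity = 2

  # Hypothetical date, in UTC, 24 hours after the scale-up.
  start_time = "2030-11-29T17:00:00Z"
}
```

Keep in mind that the recurring morning and evening actions above still fire during the event and would overwrite these sizes, so they may need to be adjusted or paused around the sale day.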

Step 4: Load Balancer

# alb.tf

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "${var.project_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids

  enable_deletion_protection = var.environment == "prod"

  access_logs {
    bucket  = aws_s3_bucket.alb_logs.id
    prefix  = "alb-logs"
    enabled = true
  }

  tags = {
    Name = "${var.project_name}-alb"
  }
}

# Target Group
resource "aws_lb_target_group" "app" {
  name     = "${var.project_name}-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # Health check
  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = "/health"
    matcher             = "200"
  }

  # Deregistration delay: wait for in-flight connections to drain
  deregistration_delay = 60

  tags = {
    Name = "${var.project_name}-tg"
  }
}

# Listener
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
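
To try the setup after `terraform apply`, it helps to expose the load balancer's address. A small sketch of an outputs file follows; the file name and output names are assumptions.

```hcl
# outputs.tf (sketch; output names are assumptions)

output "alb_dns_name" {
  description = "Public DNS name of the Application Load Balancer"
  value       = aws_lb.main.dns_name
}

output "asg_name" {
  description = "Name of the Auto Scaling Group, useful for the AWS CLI"
  value       = aws_autoscaling_group.app.name
}
```

With this in place, `curl http://$(terraform output -raw alb_dns_name)` should return the page generated by the user data script, and repeated requests should show different instance IDs as the ALB rotates between targets.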

Understanding Cooldown and Warmup

These two concepts matter a lot and are often overlooked:

Cooldown Period

After a scaling activity (out or in), the ASG waits before it starts another one. This avoids:

Scale out → the metric has not had time to drop → scale out again → wasted capacity
Scale in → the metric rises → scale in again → not enough capacity

# The default cooldown is 300 seconds (5 minutes) and applies to simple scaling policies
# It can be overridden per policy; target tracking and step scaling rely on instance warmup instead
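
As a sketch of how a per-policy override might look, the simple scaling policy below sets its own `cooldown`. The resource name and the 120-second value are assumptions for illustration; this policy type is not used elsewhere in the article.

```hcl
# Simple scaling policy with its own cooldown (sketch; values are assumptions)
resource "aws_autoscaling_policy" "simple_scale_out" {
  name                   = "${var.project_name}-simple-scale-out"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "SimpleScaling"
  adjustment_type        = "ChangeInCapacity"

  scaling_adjustment = 1    # Add 1 instance each time the alarm fires
  cooldown           = 120  # Wait 2 minutes instead of the group's default
}
```

The group-wide default can be changed with the `default_cooldown` argument on `aws_autoscaling_group`. Like the step scaling policies, a simple scaling policy needs a CloudWatch alarm whose `alarm_actions` points at its ARN.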

Instance Warmup

The time a new instance needs to “warm up”: to finish booting and start serving traffic. During this window, the new instance's metrics are not counted toward the group average.

target_tracking_configuration {
  target_value = 70.0
  # Without an explicit warmup, the default is 300 seconds (5 minutes)
}
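
If the instances take noticeably less (or more) than five minutes to become useful, the warmup can be set explicitly on the policy. The sketch below is a variant of the CPU target tracking policy with `estimated_instance_warmup`; the resource name and the 180-second value are assumptions and should be replaced by a measured boot time.

```hcl
# Target tracking with an explicit warmup (sketch; 180 s is an assumed boot time)
resource "aws_autoscaling_policy" "cpu_target_with_warmup" {
  name                   = "${var.project_name}-cpu-target-warmup"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  # New instances are left out of the aggregated metric for this many seconds
  estimated_instance_warmup = 180

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 70.0
  }
}
```

This would replace the `cpu_target` policy rather than run next to it. Recent versions of the AWS provider also offer a group-level `default_instance_warmup` argument on `aws_autoscaling_group`, which avoids repeating the value in every policy.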

Lifecycle Hooks

Sometimes you need to run custom scripts before an instance goes into service or before it is terminated:

# Hook when an instance is launching
resource "aws_autoscaling_lifecycle_hook" "launching" {
  name                   = "launching-hook"
  autoscaling_group_name = aws_autoscaling_group.app.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_LAUNCHING"
  
  # How long to wait for the script to finish
  heartbeat_timeout = 3600  # 1 hour
  
  # Default result on timeout
  default_result = "ABANDON"  # Abort the launch if the script never reports back
  
  # Send the notification to SNS/SQS so a Lambda can handle it
  notification_target_arn = aws_sns_topic.asg_hooks.arn
  role_arn                = aws_iam_role.asg_hooks.arn
}

# Hook when an instance is terminating
resource "aws_autoscaling_lifecycle_hook" "terminating" {
  name                   = "terminating-hook"
  autoscaling_group_name = aws_autoscaling_group.app.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  
  heartbeat_timeout = 600  # 10 minutes for a graceful shutdown
  default_result    = "CONTINUE"
  
  notification_target_arn = aws_sns_topic.asg_hooks.arn
  role_arn                = aws_iam_role.asg_hooks.arn
}

Use cases:

Pull the latest code at launch
Register with service discovery
Drain connections before terminating
Back up data before the instance is removed
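
Both hooks reference `aws_sns_topic.asg_hooks` and `aws_iam_role.asg_hooks`, which are not defined in this article. A minimal sketch of these two resources might look as follows. The topic name and role name prefix are assumptions; the role uses the AWS managed policy `AutoScalingNotificationAccessRole`, which lets Auto Scaling publish to SNS and SQS.

```hcl
# Supporting resources for the lifecycle hooks (sketch; names are assumptions)

resource "aws_sns_topic" "asg_hooks" {
  name = "${var.project_name}-asg-hooks"
}

# Role that the Auto Scaling service assumes to publish hook notifications
resource "aws_iam_role" "asg_hooks" {
  name_prefix = "${var.project_name}-hooks-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "autoscaling.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "asg_hooks" {
  role       = aws_iam_role.asg_hooks.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AutoScalingNotificationAccessRole"
}
```

Whatever handles the notification (a Lambda function, or a script on the instance itself) has to finish by calling `aws autoscaling complete-lifecycle-action`. Otherwise the instance stays in the wait state until `heartbeat_timeout` expires and `default_result` is applied.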

Monitoring Auto Scaling

CloudWatch Metrics

resource "aws_cloudwatch_dashboard" "asg" {
  dashboard_name = "${var.project_name}-asg"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "ASG Capacity"
          region = var.aws_region
          metrics = [
            ["AWS/AutoScaling", "GroupDesiredCapacity", "AutoScalingGroupName", aws_autoscaling_group.app.name],
            [".", "GroupInServiceInstances", ".", "."],
            [".", "GroupMinSize", ".", "."],
            [".", "GroupMaxSize", ".", "."]
          ]
          period = 60
          stat   = "Average"
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "Average CPU Utilization"
          region = var.aws_region
          metrics = [
            ["AWS/EC2", "CPUUtilization", "AutoScalingGroupName", aws_autoscaling_group.app.name]
          ]
          period = 60
          stat   = "Average"
          annotations = {
            horizontal = [
              { value = 70, label = "Target" },
              { value = 80, label = "Scale Out" },
              { value = 30, label = "Scale In" }
            ]
          }
        }
      }
    ]
  })
}
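
A dashboard only helps when someone is looking at it. As a complement, the group can push launch and terminate events to an SNS topic. The sketch below assumes a dedicated topic named `asg_events`, which is hypothetical; subscriptions (email, chat integration, and so on) are left out.

```hcl
# Scaling event notifications (sketch; topic name is an assumption)

resource "aws_sns_topic" "asg_events" {
  name = "${var.project_name}-asg-events"
}

resource "aws_autoscaling_notification" "asg_events" {
  group_names = [aws_autoscaling_group.app.name]
  topic_arn   = aws_sns_topic.asg_events.arn

  notifications = [
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
  ]
}
```

The two `*_ERROR` events are the most useful ones to alert on, since a failed launch usually means the group cannot reach its desired capacity.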

Best Practices

1. Don't scale too fast

# Fragments: only the relevant arguments are shown

# ❌ Wrong: evaluation_periods = 1 → scales on any momentary spike
resource "aws_cloudwatch_metric_alarm" "bad" {
  evaluation_periods = 1
  period             = 60
}

# ✅ Right: wait 2-3 periods to confirm the trend
resource "aws_cloudwatch_metric_alarm" "good" {
  evaluation_periods = 3
  period             = 60  # 3 minutes in total
}

2. Scale IN more slowly than scale OUT

# Scale OUT quickly to meet demand
resource "aws_cloudwatch_metric_alarm" "scale_out" {
  evaluation_periods = 2  # 2 minutes (with period = 60)
}

# Scale IN slowly to avoid thrashing
resource "aws_cloudwatch_metric_alarm" "scale_in" {
  evaluation_periods = 5  # 5 minutes (with period = 60): traffic has clearly dropped
}

3. Always keep min >= 2 in production

resource "aws_autoscaling_group" "prod" {
  min_size = 2  # Always keep at least 2 instances
  # Multi-AZ for high availability: pass subnets from at least two AZs
  vpc_zone_identifier = var.private_subnet_ids
}

4. Use predictive scaling if traffic follows a pattern

resource "aws_autoscaling_policy" "predictive" {
  name                   = "predictive-scaling"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "PredictiveScaling"

  predictive_scaling_configuration {
    metric_specification {
      target_value = 70
      predefined_load_metric_specification {
        predefined_metric_type = "ASGTotalCPUUtilization"
      }
      predefined_scaling_metric_specification {
        predefined_metric_type = "ASGAverageCPUUtilization"
      }
    }
    mode = "ForecastAndScale"  # Forecast and scale ahead of time
  }
}

Troubleshooting

Instances don't join the Target Group

Check:

Does the Security Group allow traffic from the ALB?
Does the health check path return 200?
Has the instance passed the health check grace period?

# Debug
aws autoscaling describe-auto-scaling-instances
aws elbv2 describe-target-health --target-group-arn xxx

Scaling activities don't happen

# View the scaling activity history
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-asg \
  --query 'Activities[*].[StartTime,Description,StatusCode,StatusMessage]'

Next article: EBS, EFS and S3 - comparing storage types.