ElastiCache: Redis, Memcached, and Caching Strategies

Why Caching?

On every user request, the application has to:

Parse request
Query the database (the slowest step!)
Process data
Return response

A database query typically takes 50-100 ms. With caching, the same data comes back in about 1-5 ms.

Without a cache:
User ──► App ──► Database (100ms) ──► App ──► User
                    ▲
               Bottleneck!

With a cache:
User ──► App ──► Cache (2ms) ──► App ──► User
                    │
                    └── Cache miss ──► Database ──► Cache
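As a rough back-of-the-envelope check using the illustrative 2 ms cache and 100 ms database figures from the diagrams (not measurements), the average read latency with hit ratio h is about cache + (1 - h) × database: every request pays for the cache check, and only misses also pay for the database query. The cost of writing the result back into the cache is ignored in this sketch.

```python
def average_latency_ms(hit_ratio: float, cache_ms: float = 2.0, db_ms: float = 100.0) -> float:
    """Expected read latency for a cache-aside read path.

    Every request checks the cache; only misses (1 - hit_ratio) also query
    the database. The cost of writing the result back to the cache is ignored.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_ms + (1.0 - hit_ratio) * db_ms


for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {h:.2f}: ~{average_latency_ms(h):.1f} ms")
```

With a 90% hit ratio the average drops from roughly 100 ms to roughly 12 ms. At a 0% hit ratio the cache only adds overhead, which is why the hit ratio is the number to watch.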

Amazon ElastiCache

ElastiCache is a managed in-memory caching service. The two engines compared here are:

Feature	Redis	Memcached
Data structures	Strings, Lists, Sets, Hashes, Sorted Sets	Strings only
Persistence	Yes (snapshots/backups)	No
Replication	Yes (Multi-AZ)	No
Pub/Sub	Yes	No
Transactions	Yes	No
Lua scripting	Yes	No
Threading	Single-threaded (command execution)	Multi-threaded
Use case	Complex data, sessions, leaderboards	Simple caching

Recommendation: use Redis for most use cases.

Caching Strategies

1. Lazy Loading (Cache-Aside)

Data is loaded into the cache only when it is requested.

Read:
1. Check cache
2. If MISS: query DB, store in cache, return
3. If HIT: return from cache

Write:
1. Write to database
2. Invalidate the cache entry (or do nothing and let the TTL expire it)

import json

# `redis` is a redis-py client (redis.Redis(...)) and `db` is the app's
# database helper; both are assumed to be created elsewhere.

def get_user(user_id):
    # 1. Check cache
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)

    if cached is not None:
        return json.loads(cached)  # Cache HIT

    # 2. Cache MISS - query database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Store in cache with TTL
    redis.setex(cache_key, 3600, json.dumps(user))  # 1 hour TTL

    return user
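To try the read path without a Redis server, the sketch below swaps in a tiny in-memory stand-in (`FakeRedis`, hypothetical) that mimics only the `get`/`setex` calls used above, and a dict in place of the database. It counts database queries to show the miss-then-hit behaviour.

```python
import json
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for the few redis-py calls used here."""

    def __init__(self):
        self._data = {}  # key -> (value_bytes, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        # redis-py returns bytes on get, so store the value encoded
        self._data[key] = (value.encode(), time.monotonic() + ttl_seconds)


cache = FakeRedis()
USERS_DB = {1: {"id": 1, "name": "Alice"}}  # stand-in for the users table
db_queries = 0


def get_user(user_id):
    global db_queries
    cache_key = f"user:{user_id}"
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)                   # cache HIT
    db_queries += 1                                 # cache MISS: go to the database
    user = USERS_DB.get(user_id)
    cache.setex(cache_key, 3600, json.dumps(user))  # populate with a 1 h TTL
    return user


first = get_user(1)   # miss: queries the "database"
second = get_user(1)  # hit: served from the cache
print(first, second, db_queries)
```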

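Pros:

Only data that is actually requested gets cached
Cache failures are not fatal: the app can fall back to the database

Cons:

Cache-miss penalty: the first request for each key is slow (cache check, database query, cache write)
Data can become stale (mitigate with a TTL)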

2. Write-Through

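Every write goes to both the database and the cache.

Write:
1. Write to the database
2. Write the same data to the cache (in the same code path; the cache and the database cannot share one transaction)

Read:
1. Read from the cache (kept fresh by every write); on a miss, for example after eviction, fall back to lazy loading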

import json

# `redis` (redis-py client) and `db` (database helper) are assumed to exist
def update_user(user_id, data):
    cache_key = f"user:{user_id}"

    # 1. Update database
    db.execute("UPDATE users SET name = ? WHERE id = ?", data['name'], user_id)

    # 2. Re-read the committed row and write it to the cache
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    
    return user
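A self-contained version of the write-through path, assuming Python's built-in `sqlite3` in-memory database in place of the real `db` helper and a plain dict in place of Redis (TTL handling omitted). The write updates the database first, then refreshes the cache from the committed row.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.commit()

cache = {}  # stand-in for Redis: key -> JSON string


def update_user(user_id, data):
    # 1. Write to the database (source of truth); `with conn` commits on success
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (data["name"], user_id))
    # 2. Write the committed row to the cache in the same code path
    row = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    user = {"id": row[0], "name": row[1]}
    cache[f"user:{user_id}"] = json.dumps(user)
    return user


def get_user(user_id):
    # Reads come from the cache; the last write already refreshed it
    return json.loads(cache[f"user:{user_id}"])


update_user(1, {"name": "Bob"})
print(get_user(1))
```

Here `get_user` raises `KeyError` on a miss to keep the sketch short; a production read path would fall back to lazy loading instead.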

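Pros:

Cached data stays fresh
Reads are fast (no miss penalty for recently written data)

Cons:

Higher write latency (every write touches both the database and the cache)
May cache data that is never read (pair with a TTL)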

3. Write-Behind (Write-Back)

Writes go to the cache first; the database is updated asynchronously later.

Write:
1. Write to cache
2. Return to user immediately
3. Background process writes to database

(Risk: data is lost if the cache crashes before the write is persisted to the database)
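A minimal write-behind sketch under simplifying assumptions: dicts stand in for the cache and the database, a `deque` of pending keys plays the role of the write queue, and `flush()` is called explicitly instead of by a background worker. It shows both the batching benefit and the window in which data exists only in the cache.

```python
from collections import deque

cache = {}         # stand-in for Redis
database = {}      # stand-in for the durable store
pending = deque()  # keys written to the cache but not yet persisted


def write(key, value):
    cache[key] = value   # 1. write to the cache
    pending.append(key)  # remember the key for the background flush
    return "ok"          # 2. return to the caller immediately


def flush(batch_size=100):
    """Persist up to batch_size distinct pending keys (a worker would call this)."""
    written = 0
    seen = set()
    while pending and written < batch_size:
        key = pending.popleft()
        if key in seen:
            continue  # coalesce repeated writes to the same key in this batch
        seen.add(key)
        database[key] = cache[key]  # 3. persist the latest cached value
        written += 1
    return written


write("user:1", "Alice")
write("user:1", "Alicia")  # second write to the same key
write("user:2", "Bob")
print(database)            # still empty: this is the data-loss window
n = flush()
print(n, database)
```

Three writes became two database writes, which is the batching gain; if the process died before `flush()`, all three would have been lost.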

Pros:

Lowest write latency
Writes can be batched

Cons:

Complex to implement
Risk of data loss

4. TTL (Time-To-Live)

The simplest approach: data expires automatically after a set period.

# Set with TTL
redis.setex("user:123", 3600, user_data)  # Expires after 1 hour

# TTL cho different data types
CACHE_TTL = {
    'user_profile': 3600,      # 1 hour
    'product_list': 300,       # 5 minutes
    'homepage': 60,            # 1 minute
    'real_time_data': 10,      # 10 seconds
}
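To see TTL expiry without waiting in real time, this hypothetical `TTLCache` takes an injectable clock function. Redis tracks expiry on the server, so this only illustrates the semantics. The `CACHE_TTL` table is redefined so the block runs on its own.

```python
class TTLCache:
    """Illustrative TTL cache; `clock` returns the current time in seconds."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        item = self._data.get(key)
        if item is None or self._clock() >= item[1]:
            self._data.pop(key, None)  # expired or missing: treat as a miss
            return None
        return item[0]


CACHE_TTL = {'user_profile': 3600, 'product_list': 300, 'homepage': 60, 'real_time_data': 10}

now = [0.0]  # fake clock that we advance manually
cache = TTLCache(clock=lambda: now[0])
cache.setex("homepage", CACHE_TTL['homepage'], "<html>...</html>")

now[0] = 59.0
before = cache.get("homepage")  # still within the 60 s TTL
now[0] = 60.0
after = cache.get("homepage")   # TTL elapsed: miss
print(before is not None, after)
```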

Terraform: ElastiCache Redis

Redis Replication Group (Production)

# Subnet Group
resource "aws_elasticache_subnet_group" "redis" {
  name       = "${var.project_name}-redis-subnet"
  subnet_ids = var.private_subnet_ids

  tags = {
    Name = "${var.project_name}-redis-subnet"
  }
}

# Security Group
resource "aws_security_group" "redis" {
  name        = "${var.project_name}-redis-sg"
  description = "Security group for ElastiCache Redis"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
    description     = "Redis from app servers"
  }

  tags = {
    Name = "${var.project_name}-redis-sg"
  }
}

# Redis Replication Group (Cluster Mode Disabled)
resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.project_name}-redis"
  description          = "Redis cluster for ${var.project_name}"

  # Engine
  engine               = "redis"
  engine_version       = "7.0"
  node_type            = "cache.t3.medium"
  port                 = 6379

  # Replication (1 primary + N replicas)
  num_cache_clusters         = 2  # 1 primary + 1 replica
  automatic_failover_enabled = true
  multi_az_enabled           = true

  # Network
  subnet_group_name  = aws_elasticache_subnet_group.redis.name
  security_group_ids = [aws_security_group.redis.id]

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_auth.result  # Redis AUTH

  # Maintenance
  maintenance_window       = "sun:05:00-sun:06:00"
  snapshot_window          = "04:00-05:00"
  snapshot_retention_limit = 7

  # Parameter group
  parameter_group_name = aws_elasticache_parameter_group.redis.name

  # Auto minor version upgrade
  auto_minor_version_upgrade = true

  tags = {
    Name = "${var.project_name}-redis"
  }
}

# Parameter Group
resource "aws_elasticache_parameter_group" "redis" {
  family = "redis7"
  name   = "${var.project_name}-redis-params"

  parameter {
    name  = "maxmemory-policy"
    value = "volatile-lru"  # When memory is full, evict least-recently-used keys that have a TTL
  }

  parameter {
    name  = "notify-keyspace-events"
    value = "Ex"  # Keyspace notifications: keyevent (E) for expired keys (x)
  }
}
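A simplified model of what `volatile-lru` means: when the cache is full, only keys that have a TTL are eviction candidates, least recently used first; if no key has a TTL, the write fails (Redis returns an OOM error). This sketch counts keys instead of bytes, uses exact LRU (Redis uses an approximated, sampled LRU), and does not model expiry itself, so treat it as an illustration of the policy rather than of Redis internals.

```python
from collections import OrderedDict


class VolatileLRUCache:
    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._data = OrderedDict()  # key -> (value, has_ttl); order = recency

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key][0]

    def set(self, key, value, ttl=None):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.max_keys:
            self._evict_one()
        self._data[key] = (value, ttl is not None)

    def _evict_one(self):
        # Scan from least to most recently used; only keys with a TTL qualify
        victim = next((k for k, (_, has_ttl) in self._data.items() if has_ttl), None)
        if victim is None:
            raise MemoryError("OOM: no keys with a TTL to evict")
        del self._data[victim]

    def keys(self):
        return list(self._data)


c = VolatileLRUCache(max_keys=3)
c.set("config", "v1")           # no TTL: never evicted under volatile-lru
c.set("user:1", "a", ttl=3600)
c.set("user:2", "b", ttl=3600)
c.get("user:1")                 # user:1 is now more recent than user:2
c.set("user:3", "c", ttl=3600)  # full: evicts user:2 (LRU among keys with a TTL)
print(c.keys())
```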

# Random AUTH token
resource "random_password" "redis_auth" {
  length  = 32
  special = false
}

# Store in Secrets Manager
resource "aws_secretsmanager_secret_version" "redis" {
  secret_id = aws_secretsmanager_secret.redis.id
  secret_string = jsonencode({
    host      = aws_elasticache_replication_group.redis.primary_endpoint_address
    port      = 6379
    auth      = random_password.redis_auth.result
    reader    = aws_elasticache_replication_group.redis.reader_endpoint_address
  })
}
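On the application side, the secret above can be turned into connection settings. The sketch below only parses the JSON written by `aws_secretsmanager_secret_version.redis` (keys `host`, `port`, `auth`, `reader`); fetching it would typically use boto3's Secrets Manager client (`get_secret_value(SecretId=...)["SecretString"]`), and the resulting dict can be passed to `redis.Redis(**kwargs)`. `ssl=True` is set because `transit_encryption_enabled = true`. The function name and the example endpoint values are hypothetical.

```python
import json


def redis_kwargs_from_secret(secret_string, use_reader=False):
    """Build redis-py connection kwargs from the Secrets Manager JSON payload."""
    secret = json.loads(secret_string)
    return {
        # Primary endpoint for writes; the reader endpoint spreads reads over replicas
        "host": secret["reader"] if use_reader else secret["host"],
        "port": int(secret["port"]),
        "password": secret["auth"],  # Redis AUTH token
        "ssl": True,                 # in-transit encryption is enabled on the cluster
    }


example = json.dumps({
    "host": "primary.example.cache.amazonaws.com",
    "port": 6379,
    "auth": "example-token",
    "reader": "replica.example.cache.amazonaws.com",
})
print(redis_kwargs_from_secret(example, use_reader=True))
```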

Redis Cluster Mode Enabled

For workloads that need horizontal scaling (more data, more throughput):

resource "aws_elasticache_replication_group" "redis_cluster" {
  replication_group_id = "${var.project_name}-redis-cluster"
  description          = "Redis cluster mode enabled"

  engine         = "redis"
  engine_version = "7.0"
  node_type      = "cache.r6g.large"

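  # Cluster mode enabled: sharding requires a parameter group with cluster-enabled = yes
  parameter_group_name    = "default.redis7.cluster.on"
  num_node_groups         = 3  # 3 shards
  replicas_per_node_group = 1  # 1 replica per shard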

  automatic_failover_enabled = true
  multi_az_enabled           = true

  subnet_group_name  = aws_elasticache_subnet_group.redis.name
  security_group_ids = [aws_security_group.redis.id]

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
}
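With cluster mode enabled, data is split across shards by hash slot: per the Redis Cluster specification, a key's slot is CRC16(key) mod 16384 (the CRC16-XMODEM variant), and each shard owns a range of the 16,384 slots. If a key contains a non-empty `{...}` hash tag, only the tag is hashed, which is how related keys are kept on the same shard for multi-key operations. A minimal sketch of the slot calculation:

```python
def crc16(data: bytes) -> int:
    """CRC16-XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key, honouring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # only a non-empty tag counts
            raw = raw[start + 1:end]
    return crc16(raw) % 16384


print(key_slot("user:1000"), key_slot("user:1001"))
print(key_slot("{user:1000}:profile") == key_slot("{user:1000}:cart"))
```

Keys sharing a hash tag always land in the same slot, and therefore on the same shard; keys without one are spread across shards.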

Session Management

Redis is a popular choice for session storage in distributed systems.

Why not store sessions on the EC2 instances?

User ──► ALB ──► EC2-1 (has session)
                    ↓
         Next request goes to EC2-2
                    ↓
         Session not found! Must log in again...

Sessions in Redis

User ──► ALB ──► EC2-1 ──► Redis (get session)
                    ↓
         Next request to EC2-2
                    ↓
         EC2-2 ──► Redis ──► Same session!
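The idea behind the second diagram can be sketched with a shared store: each "instance" below is just a function that looks up the client's session ID in the same store. A plain dict stands in for Redis and expiry is omitted; with a real cluster you would store the session with `setex` and a TTL. `SESSION_STORE`, `create_session`, and `handle_request` are hypothetical names.

```python
import json
import secrets

SESSION_STORE = {}  # stand-in for Redis, shared by every app instance


def create_session(user_id):
    session_id = secrets.token_urlsafe(32)  # unguessable ID, sent to the client as a cookie
    SESSION_STORE[f"session:{session_id}"] = json.dumps({"user_id": user_id})
    return session_id


def handle_request(instance_name, session_id):
    raw = SESSION_STORE.get(f"session:{session_id}")
    if raw is None:
        return f"{instance_name}: session not found, please log in"
    return f"{instance_name}: hello user {json.loads(raw)['user_id']}"


sid = create_session("12345")       # login handled by EC2-1
print(handle_request("EC2-1", sid))
print(handle_request("EC2-2", sid))  # next request lands on EC2-2: same session
```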

Python Flask Example

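import os

from flask import Flask, session
from flask_session import Session
import redis

app = Flask(__name__)
# Secret key used to sign the session cookie (SESSION_USE_SIGNER);
# FLASK_SECRET_KEY is an assumed environment variable name
app.secret_key = os.environ["FLASK_SECRET_KEY"]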

# Configure Redis session
app.config['SESSION_TYPE'] = 'redis'
app.config['SESSION_REDIS'] = redis.Redis(
    host='redis-cluster.xxx.cache.amazonaws.com',
    port=6379,
    password='...',
    ssl=True
)
app.config['SESSION_PERMANENT'] = False
app.config['SESSION_USE_SIGNER'] = True

Session(app)

@app.route('/login')
def login():
    session['user_id'] = '12345'
    session['email'] = 'user@example.com'
    return 'Logged in'

@app.route('/profile')
def profile():
    user_id = session.get('user_id')
    # Works regardless of which EC2 handles request!
    return f'User: {user_id}'

Cache Invalidation Patterns

Pattern 1: Delete on Write

def update_user(user_id, data):
    # Update database
    db.update(user_id, data)
    
    # Delete cache (next read will repopulate)
    redis.delete(f"user:{user_id}")
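Delete-on-write pairs naturally with the lazy-loading read path: the write removes the cached copy, and the next read repopulates it from the database. A runnable sketch with dicts standing in for Redis and the database (hypothetical names, no TTL):

```python
import json

database = {"1": {"id": "1", "name": "Alice"}}  # stand-in for the users table
cache = {}                                      # stand-in for Redis


def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:
        return json.loads(cache[key])  # hit
    user = database[user_id]           # miss: read the database
    cache[key] = json.dumps(user)
    return user


def update_user(user_id, data):
    database[user_id] = {**database[user_id], **data}  # 1. update the database
    cache.pop(f"user:{user_id}", None)                 # 2. delete the cached copy


get_user("1")                      # populates the cache with "Alice"
update_user("1", {"name": "Bob"})  # invalidates it
print(get_user("1"))               # miss again, reads fresh data
```

Deleting rather than rewriting the entry means the cache is refilled only with what the database holds at read time.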

Pattern 2: Event-Driven Invalidation

# Lambda trigger on DynamoDB stream
resource "aws_lambda_event_source_mapping" "cache_invalidator" {
  event_source_arn  = aws_dynamodb_table.users.stream_arn
  function_name     = aws_lambda_function.cache_invalidator.arn
  starting_position = "LATEST"
}

# Lambda function code (Python); `redis` is a redis-py client created at module load
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] in ['INSERT', 'MODIFY', 'REMOVE']:
            user_id = record['dynamodb']['Keys']['id']['S']
            redis.delete(f"user:{user_id}")
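The handler can be exercised locally with a hand-built event in the DynamoDB Streams record shape (`eventName`, `dynamodb.Keys`). Below, a small `FakeRedis` (hypothetical) records deletions instead of talking to ElastiCache; the handler body matches the one above, and the sample event values are made up.

```python
class FakeRedis:
    """Hypothetical stand-in that records deleted keys."""

    def __init__(self):
        self.deleted = []

    def delete(self, *keys):
        self.deleted.extend(keys)
        return len(keys)


redis = FakeRedis()  # the real function would create a redis-py client here


def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] in ['INSERT', 'MODIFY', 'REMOVE']:
            user_id = record['dynamodb']['Keys']['id']['S']
            redis.delete(f"user:{user_id}")


sample_event = {
    "Records": [
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"id": {"S": "123"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "456"}}}},
    ]
}
lambda_handler(sample_event, context=None)
print(redis.deleted)
```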

Practice Questions (SAA style)

1. A web application needs sessions to persist when users are routed to different EC2 instances. Which solution is the most suitable and scalable?

A. Enable sticky sessions trên ALB
B. Store sessions trong DynamoDB
C. Store sessions trong ElastiCache Redis
D. Store sessions trên EFS shared filesystem

Answer: C. ElastiCache Redis gives the lowest latency and is well suited to session storage; sticky sessions (A) lose the session if that instance fails and can unbalance load.

2. An e-commerce site needs to cache a product catalog that is updated frequently. Cached data must stay fresh, but performance still matters. Which caching strategy fits best?

A. Lazy loading với TTL ngắn
B. Write-through caching
C. Write-behind caching
D. Read-through caching

Answer: B. Write-through updates the cache on every write, so the cache stays up to date when data changes.

Next article: Data Analytics on AWS (Kinesis, Athena, Glue).