S3 Advanced: Storage Classes, Lifecycle, and Object Lock

A deep dive into S3 storage tiers, Intelligent-Tiering, Glacier, lifecycle policies, and compliance with Object Lock.

Prerequisites: Read S3 Basics first to understand the core concepts.

Storage Classes Overview

S3 offers several storage classes that trade off cost, availability, and access time.

┌─────────────────────────────────────────────────────────────────┐
│                    S3 Storage Classes                            │
├─────────────────────────────────────────────────────────────────┤
│ Frequent Access                                                  │
│ ├── S3 Standard              (99.99% availability)              │
│ └── S3 Intelligent-Tiering   (auto-optimize)                    │
│                                                                  │
│ Infrequent Access                                                │
│ ├── S3 Standard-IA           (99.9%, min 30 days)               │
│ └── S3 One Zone-IA           (99.5%, single AZ)                 │
│                                                                  │
│ Archive                                                          │
│ ├── S3 Glacier Instant       (milliseconds)                     │
│ ├── S3 Glacier Flexible      (minutes to hours)                 │
│ └── S3 Glacier Deep Archive  (12-48 hours, cheapest)            │
└─────────────────────────────────────────────────────────────────┘

Comparison Table

Class                 Use Case               Retrieval          Min Duration  Cost (USD per GB-month)
Standard              Frequently accessed    Instant            -             $0.023
Intelligent-Tiering   Unknown patterns       Instant            -             $0.023 + monitoring
Standard-IA           Infrequent but quick   Instant            30 days       $0.0125
One Zone-IA           Reproducible data      Instant            30 days       $0.01
Glacier Instant       Archive, quick access  Instant            90 days       $0.004
Glacier Flexible      Backup, compliance     Minutes to 12 hrs  90 days       $0.0036
Glacier Deep Archive  Long-term archive      12-48 hrs          180 days      $0.00099
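To make the table concrete, here is a minimal sketch that estimates the storage-only cost of each class and the charge for deleting an object before its minimum duration. It assumes the per-GB prices from the table above as illustrative constants (real prices vary by Region and change over time), treats a month as 30 days, and ignores request, retrieval, and monitoring fees. The names `PRICES`, `monthly_cost`, and `storage_charge` are made up for this example.

```python
# Illustrative (price per GB-month, minimum storage days), copied from the table above.
# Assumption: real prices vary by Region; requests and retrievals are billed separately.
PRICES = {
    "STANDARD": (0.023, 0),
    "STANDARD_IA": (0.0125, 30),
    "ONEZONE_IA": (0.01, 30),
    "GLACIER_IR": (0.004, 90),
    "GLACIER": (0.0036, 90),
    "DEEP_ARCHIVE": (0.00099, 180),
}


def monthly_cost(storage_class: str, size_gb: float) -> float:
    """Storage-only cost of keeping size_gb for one month."""
    price, _min_days = PRICES[storage_class]
    return size_gb * price


def storage_charge(storage_class: str, size_gb: float, days_stored: int) -> float:
    """Storage charge for an object deleted after days_stored days.

    Deleting before the minimum duration is billed as if the object had
    been stored for the full minimum (a 30-day month is assumed).
    """
    price, min_days = PRICES[storage_class]
    billable_days = max(days_stored, min_days)
    return size_gb * price * billable_days / 30


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name:<13} 1 TB for a month: ${monthly_cost(name, 1024):.2f}")
    # 100 GB deleted from Standard-IA after 10 days is still billed for 30 days.
    print(f"Early delete: ${storage_charge('STANDARD_IA', 100, 10):.2f}")
```

The early-delete case shows why short-lived data often costs less in Standard than in a cheaper class with a minimum duration.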

S3 Intelligent-Tiering

Automatically moves objects between tiers based on access patterns.

┌───────────────────────────────────────────────────────────────┐
│             S3 Intelligent-Tiering                             │
├───────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐     30 days      ┌──────────────┐           │
│  │  Frequent    │───────────────►  │  Infrequent  │           │
│  │  Access      │◄───────────────  │  Access      │           │
│  └──────────────┘   1 access       └──────────────┘           │
│         │                                │                    │
│         │                                │ 90 days            │
│         │                                ▼                    │
│         │                        ┌──────────────┐             │
│         │                        │   Archive    │             │
│         │                        │   Instant    │             │
│         │                        └──────────────┘             │
│         │                                │ 180 days           │
│         │                                ▼                    │
│         │                        ┌──────────────┐             │
│         │                        │ Deep Archive │             │
│         └────────────────────────│   (optional) │             │
│           1 access moves back     └──────────────┘             │
│           to Frequent Access                                   │
└───────────────────────────────────────────────────────────────┘

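Cost: a monitoring fee of $0.0025 per 1,000 objects per month. Objects smaller than 128 KB are not monitored and always stay in the Frequent Access tier. Objects in the optional Archive Access and Deep Archive Access tiers must be restored before they can be read; only then do they move back to Frequent Access.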
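The tiering rules and the monitoring fee can be sketched as two small functions. This is only a model of the diagram above, not an AWS API: it covers the three automatic tiers (the optional archive tiers are left out), and the function names are hypothetical.

```python
def access_tier(days_since_last_access: int) -> str:
    """Automatic tier for a monitored object, following the diagram above.

    Only the automatic tiers are modeled; the optional Archive Access and
    Deep Archive Access tiers are not included in this sketch.
    """
    if days_since_last_access < 30:
        return "Frequent Access"
    if days_since_last_access < 90:
        return "Infrequent Access"
    return "Archive Instant Access"


def monitoring_fee(object_count: int, fee_per_1000: float = 0.0025) -> float:
    """Monthly monitoring fee in USD for object_count monitored objects."""
    return object_count / 1000 * fee_per_1000


if __name__ == "__main__":
    for days in (0, 45, 120):
        print(days, "days idle ->", access_tier(days))
    print(f"10 million objects: ${monitoring_fee(10_000_000):.2f} per month")
```

Because the fee is charged per object rather than per GB, buckets with many small objects benefit least from Intelligent-Tiering.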

Terraform

resource "aws_s3_bucket" "data" {
  bucket = "${var.project_name}-data"
}

# Configure the optional archive tiers for all Intelligent-Tiering objects in the bucket
resource "aws_s3_bucket_intelligent_tiering_configuration" "main" {
  bucket = aws_s3_bucket.data.id
  name   = "EntireBucket"

  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }

  tiering {
    access_tier = "DEEP_ARCHIVE_ACCESS"
    days        = 180
  }
}

Lifecycle Policies

Automate transitions and expiration of objects.

Common Patterns

Pattern 1: Log Retention
┌────────────┐   30 days   ┌────────────┐   90 days   ┌────────────┐
│  Standard  │────────────►│ Standard-IA│────────────►│   Delete   │
└────────────┘             └────────────┘             └────────────┘

Pattern 2: Backup Archive
┌────────────┐   30 days   ┌────────────┐   365 days  ┌────────────┐
│  Standard  │────────────►│   Glacier  │────────────►│ Deep Arch  │
└────────────┘             │  Flexible  │             └────────────┘
                           └────────────┘

Pattern 3: Version Cleanup
Current Version: Keep indefinitely
Previous Versions: Move to IA after 30 days, delete after 90 days
Expired delete markers: Removed automatically once no older versions remain
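To check what a pattern does to an object over time, the sketch below replays Patterns 1 and 2 above for a given object age. It assumes all day counts are measured from object creation and that objects start in S3 Standard; real lifecycle actions run asynchronously, so the actual change can lag the configured day. The dictionaries and the `storage_class_at` function are hypothetical names for this illustration.

```python
from typing import Optional

# Each pattern: transitions as (age_in_days, storage_class) plus an optional expiration age.
LOG_RETENTION = {"transitions": [(30, "STANDARD_IA")], "expire_after": 90}
BACKUP_ARCHIVE = {
    "transitions": [(30, "GLACIER"), (365, "DEEP_ARCHIVE")],
    "expire_after": None,  # backups are never deleted in Pattern 2
}


def storage_class_at(age_days: int, rule: dict) -> Optional[str]:
    """Return the storage class of an object at age_days, or None once it has expired."""
    expire_after = rule["expire_after"]
    if expire_after is not None and age_days >= expire_after:
        return None
    current = "STANDARD"  # assumption: objects are uploaded to S3 Standard
    for threshold, storage_class in sorted(rule["transitions"]):
        if age_days >= threshold:
            current = storage_class
    return current


if __name__ == "__main__":
    for age in (10, 45, 90):
        print("log at day", age, "->", storage_class_at(age, LOG_RETENTION))
    for age in (10, 100, 400):
        print("backup at day", age, "->", storage_class_at(age, BACKUP_ARCHIVE))
```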

Terraform

resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.data.id

  # Rule 1: Log files
  rule {
    id     = "log-retention"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 60
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }

  # Rule 2: Backup files
  rule {
    id     = "backup-archival"
    status = "Enabled"

    filter {
      prefix = "backups/"
    }

    transition {
      days          = 7
      storage_class = "GLACIER"
    }

    transition {
      days          = 180
      storage_class = "DEEP_ARCHIVE"
    }
  }

  # Rule 3: Clean up old versions (only has an effect when bucket versioning is enabled)
  rule {
    id     = "version-cleanup"
    status = "Enabled"

    filter {
      prefix = ""  # All objects
    }

    noncurrent_version_transition {
      noncurrent_days = 30
      storage_class   = "STANDARD_IA"
    }

    noncurrent_version_expiration {
      noncurrent_days = 90
    }

    # Clean up incomplete multipart uploads
    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }

  # Rule 4: Delete markers cleanup
  rule {
    id     = "delete-marker-cleanup"
    status = "Enabled"

    filter {
      prefix = ""
    }

    expiration {
      expired_object_delete_marker = true
    }
  }
}

S3 Object Lock

Prevents objects from being deleted or overwritten, for compliance requirements (WORM: Write Once, Read Many).

Retention Modes

Mode        Description
Governance  Users with special permissions (s3:BypassGovernanceRetention) can override or remove the lock
Compliance  No one can delete or overwrite a protected object version, including the root account

Lock Types

┌───────────────────────────────────────────────────────────────┐
│                    Object Lock                                 │
├───────────────────────────────────────────────────────────────┤
│                                                                │
│  Retention Period              Legal Hold                      │
│  ┌────────────────┐            ┌────────────────┐              │
│  │ Protect until  │            │ Indefinite     │              │
│  │ specific date  │            │ until removed  │              │
│  │                │            │                │              │
│  │ Mode:          │            │ For litigation │              │
│  │ - Governance   │            │ or legal cases │              │
│  │ - Compliance   │            │                │              │
│  └────────────────┘            └────────────────┘              │
│                                                                │
└───────────────────────────────────────────────────────────────┘
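The interaction between retention modes and legal holds can be written down as a small decision function. This is a simplified model of the rules described above, not an AWS API. It considers permanent deletion of one specific object version (a plain DELETE on a versioned bucket only adds a delete marker), and the function name and parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def can_delete_version(
    mode: Optional[str],               # "GOVERNANCE", "COMPLIANCE", or None
    retain_until: Optional[datetime],  # end of the retention period, if any
    now: datetime,
    legal_hold: bool = False,
    bypass_governance: bool = False,   # caller holds s3:BypassGovernanceRetention
) -> bool:
    """Whether permanently deleting this object version would be allowed."""
    if legal_hold:
        return False  # a legal hold blocks deletion until it is removed
    if mode is None or retain_until is None or now >= retain_until:
        return True   # no retention set, or the retention period has ended
    if mode == "GOVERNANCE":
        return bypass_governance  # only privileged users can override
    return False      # COMPLIANCE: nobody can delete before retain_until


if __name__ == "__main__":
    until = datetime(2031, 1, 1, tzinfo=timezone.utc)
    today = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(can_delete_version("GOVERNANCE", until, today, bypass_governance=True))
    print(can_delete_version("COMPLIANCE", until, today, bypass_governance=True))
```

Note that a legal hold is independent of the retention period: it keeps blocking deletion even after `retain_until` has passed.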

Terraform

# Bucket with Object Lock enabled (this also enables versioning on the bucket)
resource "aws_s3_bucket" "compliance" {
  bucket = "${var.project_name}-compliance"
  
  object_lock_enabled = true
}

# Default retention
resource "aws_s3_bucket_object_lock_configuration" "compliance" {
  bucket = aws_s3_bucket.compliance.id

  rule {
    default_retention {
      mode  = "COMPLIANCE"
      years = 7  # 7-year retention for financial records
    }
  }
}

Per-Object Lock

# Upload with a retention period
aws s3api put-object \
  --bucket compliance-bucket \
  --key financial/report-2024.pdf \
  --body report.pdf \
  --object-lock-mode COMPLIANCE \
  --object-lock-retain-until-date "2031-01-01T00:00:00Z"

# Add legal hold
aws s3api put-object-legal-hold \
  --bucket compliance-bucket \
  --key financial/report-2024.pdf \
  --legal-hold Status=ON
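The `--object-lock-retain-until-date` value is an ISO 8601 timestamp. A minimal sketch for computing it from a start date and a retention period in years might look like this; the helper name `retain_until` is hypothetical, and it assumes a timezone-aware start time.

```python
from datetime import datetime, timezone


def retain_until(start: datetime, years: int) -> str:
    """Return start + years as a UTC ISO 8601 timestamp (start must be timezone-aware)."""
    try:
        end = start.replace(year=start.year + years)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year: fall back to Feb 28
        end = start.replace(year=start.year + years, day=28)
    return end.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(retain_until(start, 7))
```

For a start of 2024-01-01 and a 7-year period this yields the same date used in the command above.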

Glacier Retrieval Options

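Option     Time         Cost
Expedited  1-5 minutes  $$$
Standard   3-5 hours    $$
Bulk       5-12 hours   $

These times apply to S3 Glacier Flexible Retrieval. Glacier Deep Archive supports only Standard (within 12 hours) and Bulk (within 48 hours).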

AWS CLI - Restore Request

# Restore from Glacier
aws s3api restore-object \
  --bucket my-bucket \
  --key archive/old-data.zip \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'

# Check restore status
aws s3api head-object \
  --bucket my-bucket \
  --key archive/old-data.zip
# Look for: Restore: ongoing-request="false", expiry-date="..."
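When polling restore status from a script, the `Restore` value has to be parsed. The sketch below assumes the header looks like the comment above (quoted `ongoing-request` and, once the restore finishes, an `expiry-date` in HTTP date format); `parse_restore` is a hypothetical helper, not part of any AWS SDK.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def parse_restore(header: str) -> Tuple[bool, Optional[datetime]]:
    """Return (in_progress, expiry) parsed from a Restore header value."""
    # Collect key="value" pairs such as ongoing-request="false"
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', header))
    in_progress = fields.get("ongoing-request") == "true"
    expiry_text = fields.get("expiry-date")
    expiry = parsedate_to_datetime(expiry_text) if expiry_text else None
    return (in_progress, expiry)


if __name__ == "__main__":
    sample = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
    in_progress, expiry = parse_restore(sample)
    print("still restoring:", in_progress)
    print("temporary copy available until:", expiry)
```

While the restore is still running, the header carries only `ongoing-request="true"`, so the expiry comes back as `None`.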

S3 Replication

Cross-Region Replication (CRR)

┌──────────────┐                  ┌──────────────┐
│  US-East-1   │─────Replicate───►│ AP-Southeast │
│  (Primary)   │                  │  (Replica)   │
└──────────────┘                  └──────────────┘

Same-Region Replication (SRR)

┌──────────────┐                  ┌──────────────┐
│  Production  │─────Replicate───►│   Backup     │
│   Bucket     │                  │   Bucket     │
└──────────────┘                  └──────────────┘

Terraform

# Source bucket
resource "aws_s3_bucket" "source" {
  bucket = "${var.project_name}-source"
}

resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id
  versioning_configuration {
    status = "Enabled"  # Required for replication
  }
}

# Destination bucket (can be different region)
resource "aws_s3_bucket" "destination" {
  provider = aws.destination_region
  bucket   = "${var.project_name}-destination"
}

resource "aws_s3_bucket_versioning" "destination" {
  provider = aws.destination_region
  bucket   = aws_s3_bucket.destination.id
  versioning_configuration {
    status = "Enabled"
  }
}

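# Replication configuration
# aws_iam_role.replication and aws_kms_key.destination are assumed to be defined elsewhere.
resource "aws_s3_bucket_replication_configuration" "main" {
  # Versioning must be enabled on the source bucket before replication is configured
  depends_on = [aws_s3_bucket_versioning.source]

  bucket = aws_s3_bucket.source.id
  role   = aws_iam_role.replication.arn

  rule {
    id     = "full-replication"
    status = "Enabled"

    filter {
      prefix = ""  # All objects
    }

    # Needed together with encryption_configuration below:
    # opt in to replicating SSE-KMS encrypted objects
    source_selection_criteria {
      sse_kms_encrypted_objects {
        status = "Enabled"
      }
    }

    destination {
      bucket        = aws_s3_bucket.destination.arn
      storage_class = "STANDARD_IA"

      # KMS key used to encrypt replicas in the destination region
      encryption_configuration {
        replica_kms_key_id = aws_kms_key.destination.arn
      }

      # Optional: add replication_time and metrics blocks here for RTC (Replication Time Control)
    }

    # Also replicate delete markers
    delete_marker_replication {
      status = "Enabled"
    }
  }
}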

S3 Event Notifications

S3 Bucket ──► Event ──► Lambda / SQS / SNS / EventBridge

    └── Events: s3:ObjectCreated:* (Put, Post, Copy, CompleteMultipartUpload), s3:ObjectRemoved:*, ...

Terraform

resource "aws_s3_bucket_notification" "main" {
  bucket = aws_s3_bucket.data.id

  # Lambda trigger (S3 also needs an aws_lambda_permission that allows it to invoke the function)
  lambda_function {
    lambda_function_arn = aws_lambda_function.processor.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "uploads/"
    filter_suffix       = ".jpg"
  }

  # SQS trigger (the queue policy must allow s3.amazonaws.com to send messages)
  queue {
    queue_arn     = aws_sqs_queue.processing.arn
    events        = ["s3:ObjectCreated:*"]
    filter_prefix = "data/"
  }

  # EventBridge (recommended for complex routing)
  eventbridge = true
}
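On the receiving side, a Lambda function has to pull the bucket and key out of the event payload. The sketch below assumes the standard S3 notification structure (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the `handler` body and the sample event are hypothetical and only print what was uploaded.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space becomes "+"), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point: lists the objects named in the event."""
    for bucket, key in extract_objects(event):
        print(f"new object: s3://{bucket}/{key}")


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "demo-data"}, "object": {"key": "uploads/my+photo.jpg"}}}
        ]
    }
    handler(sample_event, None)
```

Forgetting to decode the key is a common cause of "NoSuchKey" errors when the handler later reads the object.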

S3 Access Points

Simplify access management for shared buckets.

                    ┌──────────────────┐
                    │    S3 Bucket     │
                    │  (shared data)   │
                    └────────┬─────────┘

        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Access Point  │    │ Access Point  │    │ Access Point  │
│   (Finance)   │    │  (Analytics)  │    │    (Dev)      │
│               │    │               │    │               │
│ /finance/*    │    │ /analytics/*  │    │ /dev/*        │
│ read-write    │    │ read-only     │    │ full access   │
└───────────────┘    └───────────────┘    └───────────────┘

Terraform

resource "aws_s3_access_point" "finance" {
  bucket = aws_s3_bucket.data.id
  name   = "finance-access-point"

  vpc_configuration {
    vpc_id = aws_vpc.main.id
  }

  public_access_block_configuration {
    block_public_acls       = true
    block_public_policy     = true
    ignore_public_acls      = true
    restrict_public_buckets = true
  }
}

resource "aws_s3control_access_point_policy" "finance" {
  access_point_arn = aws_s3_access_point.finance.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = aws_iam_role.finance.arn
        }
        Action = ["s3:GetObject", "s3:PutObject"]
        Resource = "${aws_s3_access_point.finance.arn}/object/finance/*"
      }
    ]
  })
}
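The `Resource` string in the policy above follows a fixed pattern: the access point ARN, then `/object/`, then the key pattern. As a small illustration, the helpers below build both strings; the Region and account ID are placeholder values, and the function names are hypothetical.

```python
def access_point_arn(region: str, account_id: str, name: str) -> str:
    """Build the ARN of an S3 access point."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def object_resource(ap_arn: str, key_pattern: str) -> str:
    """Build a policy Resource for objects reached through an access point."""
    return f"{ap_arn}/object/{key_pattern}"


if __name__ == "__main__":
    # Placeholder Region and account ID, for illustration only
    arn = access_point_arn("us-east-1", "123456789012", "finance-access-point")
    print(arn)
    print(object_resource(arn, "finance/*"))
```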

Best Practices

1. Use Lifecycle Policies

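# Always have a lifecycle policy for cost optimization
# (this rule goes inside an aws_s3_bucket_lifecycle_configuration resource)
rule {
  id     = "cleanup"
  status = "Enabled"

  filter {}  # Apply to all objects

  noncurrent_version_expiration {
    noncurrent_days = 30
  }
}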

2. Enable Versioning for critical data

# Goes inside an aws_s3_bucket_versioning resource
versioning_configuration {
  status = "Enabled"
}

3. Use Intelligent-Tiering for unknown access patterns

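# S3 has no bucket-level default storage class: set the class at upload time,
# or add a lifecycle transition to INTELLIGENT_TIERING
aws s3 cp data.csv s3://my-bucket/data.csv --storage-class INTELLIGENT_TIERING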

4. Block public access

resource "aws_s3_bucket_public_access_block" "main" {
  bucket = aws_s3_bucket.data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Practice Questions (SAA style)

1. Financial records must be kept for 7 years and no one may delete them. Which storage class and feature fit best?

A. S3 Standard with a bucket policy
B. S3 Glacier with Object Lock in Compliance mode
C. S3 Glacier with Object Lock in Governance mode
D. S3 Standard-IA with MFA Delete

Answer: B. Compliance mode guarantees that no one can delete the objects, including the root account.


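2. An application creates logs every day. The logs need fast access for 7 days, are rarely accessed afterwards, and must be kept for 1 year. Which solution is the most cost-effective?

A. S3 Standard with a lifecycle transition to Glacier after 7 days
B. S3 Intelligent-Tiering
C. S3 Standard → Standard-IA (7 days) → Glacier (30 days)
D. S3 One Zone-IA

Answer: A. Fast access is needed only for the first 7 days, so the logs can move to low-cost archive storage as soon as that window ends. Option C is not possible with a lifecycle rule: objects must stay in S3 Standard for at least 30 days before they can transition to Standard-IA.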


3. A data lake bucket is accessed by 50 different teams. Each team should see only its own data. Which solution is best for managing access?

A. 50 bucket policies
B. IAM policies for each team
C. S3 Access Points for each team
D. S3 Object ACLs

Answer: C. Access Points simplify access management for shared buckets.


Next lesson: VPC Advanced - Transit Gateway, PrivateLink, VPN.