Prerequisites: Read S3 Basics to understand the fundamental concepts.
Storage Classes Overview
S3 offers several storage classes that trade off cost, availability, and access time.
┌─────────────────────────────────────────────────────────────────┐
│                       S3 Storage Classes                        │
├─────────────────────────────────────────────────────────────────┤
│  Frequent Access                                                │
│  ├── S3 Standard (99.99% availability)                          │
│  └── S3 Intelligent-Tiering (auto-optimize)                     │
│                                                                 │
│  Infrequent Access                                              │
│  ├── S3 Standard-IA (99.9%, min 30 days)                        │
│  └── S3 One Zone-IA (99.5%, single AZ)                          │
│                                                                 │
│  Archive                                                        │
│  ├── S3 Glacier Instant (milliseconds)                          │
│  ├── S3 Glacier Flexible (minutes to hours)                     │
│  └── S3 Glacier Deep Archive (12-48 hours, cheapest)            │
└─────────────────────────────────────────────────────────────────┘
Comparison Table
| Class | Use Case | Retrieval | Min Duration | Cost (GB/month) |
|---|---|---|---|---|
| Standard | Frequently accessed | Instant | - | $0.023 |
| Intelligent-Tiering | Unknown patterns | Instant | - | $0.023 + monitoring |
| Standard-IA | Infrequent but quick | Instant | 30 days | $0.0125 |
| One Zone-IA | Reproducible data | Instant | 30 days | $0.01 |
| Glacier Instant | Archive, quick access | Instant | 90 days | $0.004 |
| Glacier Flexible | Backup, compliance | Minutes to 12 hrs | 90 days | $0.0036 |
| Glacier Deep Archive | Long-term archive | 12-48 hrs | 180 days | $0.00099 |
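As a rough worked example of the table, the sketch below multiplies the per-GB prices by a data size and applies the minimum storage duration. It assumes the prices listed in the table (real prices vary by region and change over time) and ignores request, retrieval, and monitoring fees. The function names are illustrative, not part of any AWS SDK.

```python
# Per-GB monthly prices and minimum durations copied from the comparison table.
# Assumption: illustrative figures only; real prices vary by region and over time.
STORAGE_CLASSES = {
    # API storage class name: (USD per GB-month, minimum storage duration in days)
    "STANDARD": (0.023, 0),
    "STANDARD_IA": (0.0125, 30),
    "ONEZONE_IA": (0.01, 30),
    "GLACIER_IR": (0.004, 90),
    "GLACIER": (0.0036, 90),
    "DEEP_ARCHIVE": (0.00099, 180),
}


def monthly_storage_cost(storage_class: str, size_gb: float) -> float:
    """Storage-only cost of keeping size_gb in a class for one month."""
    price_per_gb, _ = STORAGE_CLASSES[storage_class]
    return price_per_gb * size_gb


def billed_days(storage_class: str, days_stored: int) -> int:
    """Days actually charged: early deletion is billed up to the minimum duration."""
    _, min_days = STORAGE_CLASSES[storage_class]
    return max(days_stored, min_days)


if __name__ == "__main__":
    for name in STORAGE_CLASSES:
        print(f"{name:13s} 1000 GB/month: ${monthly_storage_cost(name, 1000):.2f}")
    # An object deleted from Glacier Flexible after 10 days is still billed for 90.
    print("Billed days:", billed_days("GLACIER", 10))
```

The minimum duration matters for short-lived data: a cheaper class can end up costing more than Standard if objects are deleted or transitioned early.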
S3 Intelligent-Tiering
Automatically moves objects between access tiers based on access patterns.
┌───────────────────────────────────────────────────────────────┐
│                    S3 Intelligent-Tiering                     │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌──────────────┐     30 days     ┌──────────────┐            │
│  │   Frequent   │───────────────► │  Infrequent  │            │
│  │    Access    │◄─────────────── │    Access    │            │
│  └──────────────┘    1 access     └──────────────┘            │
│         │                                │                    │
│         │                                │ 90 days            │
│         │                                ▼                    │
│         │                         ┌──────────────┐            │
│         │                         │   Archive    │            │
│         │                         │   Instant    │            │
│         │                         └──────────────┘            │
│         │                                │ 180 days           │
│         │                                ▼                    │
│         │                         ┌──────────────┐            │
│         │                         │ Deep Archive │            │
│         └─────────────────────────│  (optional)  │            │
│           1 access moves back     └──────────────┘            │
│           to Frequent Access                                  │
└───────────────────────────────────────────────────────────────┘
Cost: $0.0025 per 1,000 objects/month monitoring fee
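To make the thresholds and the monitoring fee concrete, here is a minimal sketch that maps "days since last access" to a tier and computes the monthly fee. It assumes the 30/90/180-day thresholds shown in the diagram and the fee quoted above; `access_tier` and `monitoring_fee` are hypothetical helper names, and the tier labels are illustrative rather than API values.

```python
MONITORING_FEE_PER_1000_OBJECTS = 0.0025  # USD per month, as quoted above


def monitoring_fee(object_count: int) -> float:
    """Monthly monitoring fee for a number of Intelligent-Tiering objects."""
    return object_count / 1000 * MONITORING_FEE_PER_1000_OBJECTS


def access_tier(days_since_last_access: int, deep_archive_opt_in: bool = False) -> str:
    """Tier an object sits in, following the thresholds in the diagram."""
    if deep_archive_opt_in and days_since_last_access >= 180:
        return "DEEP_ARCHIVE"  # Only reached when the optional tier is enabled
    if days_since_last_access >= 90:
        return "ARCHIVE_INSTANT"
    if days_since_last_access >= 30:
        return "INFREQUENT"
    return "FREQUENT"  # Any access resets the counter back to this tier


if __name__ == "__main__":
    print(f"10 million objects: ${monitoring_fee(10_000_000):.2f} per month")
    for days in (0, 45, 120, 365):
        print(days, access_tier(days), access_tier(days, deep_archive_opt_in=True))
```

For buckets with very many small objects, the monitoring fee can outweigh the storage savings, so it is worth estimating both before switching classes.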
Terraform
resource "aws_s3_bucket" "data" {
  bucket = "${var.project_name}-data"
}

# Enable the optional archive tiers for the entire bucket.
# This applies only to objects stored in the INTELLIGENT_TIERING storage class.
resource "aws_s3_bucket_intelligent_tiering_configuration" "main" {
  bucket = aws_s3_bucket.data.id
  name   = "EntireBucket"

  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }

  tiering {
    access_tier = "DEEP_ARCHIVE_ACCESS"
    days        = 180
  }
}
Lifecycle Policies
Automate object transitions and expiration.
Common Patterns
Pattern 1: Log Retention
┌────────────┐   30 days   ┌─────────────┐   90 days   ┌────────────┐
│  Standard  │────────────►│ Standard-IA │────────────►│   Delete   │
└────────────┘             └─────────────┘             └────────────┘
Pattern 2: Backup Archive
┌────────────┐   30 days   ┌────────────┐  365 days   ┌────────────┐
│  Standard  │────────────►│  Glacier   │────────────►│ Deep Arch  │
└────────────┘             │  Flexible  │             └────────────┘
                           └────────────┘
Pattern 3: Version Cleanup
Current Version: Keep indefinitely
Previous Versions: Move to IA after 30 days, delete after 90 days
Delete markers: Remove expired delete markers (those with no noncurrent versions left)
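The patterns above can be traced with a small simulation. The following is a sketch under stated assumptions: the rule is a plain dictionary in a made-up layout (not the S3 API format), ages are whole days since object creation, and the actual timing in S3 is approximate because lifecycle actions run asynchronously.

```python
from typing import Optional

# Pattern 1 (log retention): Standard -> Standard-IA at 30 days, delete at 90 days.
# Assumption: this dictionary layout is hypothetical and only used by this sketch.
LOG_RETENTION = {
    "transitions": [(30, "STANDARD_IA")],  # (object age in days, target class)
    "expiration_days": 90,
}


def storage_class_at(age_days: int, rule: dict, initial: str = "STANDARD") -> Optional[str]:
    """Return the storage class at a given age, or None once the object has expired."""
    expiration = rule.get("expiration_days")
    if expiration is not None and age_days >= expiration:
        return None
    current = initial
    # Apply every transition whose age threshold has been reached, in order.
    for days, target in sorted(rule["transitions"]):
        if age_days >= days:
            current = target
    return current


if __name__ == "__main__":
    for age in (10, 30, 89, 90):
        print(age, storage_class_at(age, LOG_RETENTION))
```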
Terraform
resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.data.id

  # Rule 1: Log files
  rule {
    id     = "log-retention"
    status = "Enabled"

    filter {
      prefix = "logs/"
    }

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 60
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }

  # Rule 2: Backup files
  rule {
    id     = "backup-archival"
    status = "Enabled"

    filter {
      prefix = "backups/"
    }

    transition {
      days          = 7
      storage_class = "GLACIER"
    }

    transition {
      days          = 180
      storage_class = "DEEP_ARCHIVE"
    }
  }

  # Rule 3: Cleanup old versions
  rule {
    id     = "version-cleanup"
    status = "Enabled"

    filter {
      prefix = "" # All objects
    }

    noncurrent_version_transition {
      noncurrent_days = 30
      storage_class   = "STANDARD_IA"
    }

    noncurrent_version_expiration {
      noncurrent_days = 90
    }

    # Clean up incomplete multipart uploads
    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }

  # Rule 4: Delete markers cleanup
  rule {
    id     = "delete-marker-cleanup"
    status = "Enabled"

    filter {
      prefix = ""
    }

    expiration {
      expired_object_delete_marker = true
    }
  }
}
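One constraint is easy to miss when choosing transition days: S3 does not accept lifecycle transitions to Standard-IA or One Zone-IA for objects younger than 30 days. The sketch below checks a list of transitions against that single rule before it goes into a lifecycle configuration. It is a simplified check, not a full validator, and `ia_transition_violations` is a hypothetical helper name.

```python
from typing import List, Tuple

# Assumption: only the 30-day minimum age for the Infrequent Access classes is checked.
IA_CLASSES = {"STANDARD_IA", "ONEZONE_IA"}
MIN_AGE_FOR_IA_DAYS = 30


def ia_transition_violations(transitions: List[Tuple[int, str]]) -> List[str]:
    """List transitions that move objects to an IA class before they are 30 days old."""
    return [
        f"{target} at day {days}: must be at least day {MIN_AGE_FOR_IA_DAYS}"
        for days, target in transitions
        if target in IA_CLASSES and days < MIN_AGE_FOR_IA_DAYS
    ]


if __name__ == "__main__":
    # The log-retention rule from the Terraform above passes the check.
    print(ia_transition_violations([(30, "STANDARD_IA"), (60, "GLACIER")]))
    # Moving to Standard-IA after only 7 days does not.
    print(ia_transition_violations([(7, "STANDARD_IA"), (30, "GLACIER")]))
```

Transitions straight to the Glacier classes are not subject to this 30-day minimum, which is why the backup rule can archive after 7 days.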
S3 Object Lock
Prevents objects from being deleted or overwritten, for compliance use cases (WORM: Write Once, Read Many).
Retention Modes
| Mode | Description |
|---|---|
| Governance | Users with special permissions (s3:BypassGovernanceRetention) can override |
| Compliance | NO ONE can delete, not even the root account |
Lock Types
┌───────────────────────────────────────────────────────────────┐
│                          Object Lock                          │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│       Retention Period                   Legal Hold           │
│      ┌────────────────┐              ┌────────────────┐       │
│      │ Protect until  │              │ Indefinite     │       │
│      │ specific date  │              │ until removed  │       │
│      │                │              │                │       │
│      │ Mode:          │              │ For litigation │       │
│      │ - Governance   │              │ or legal cases │       │
│      │ - Compliance   │              │                │       │
│      └────────────────┘              └────────────────┘       │
│                                                               │
└───────────────────────────────────────────────────────────────┘
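How retention modes and legal holds combine can be summarized as a small decision function. This is a simplified model of the rules described above, not S3's actual implementation: it covers only permanent deletion of a specific object version, and `can_delete_version` and its parameters are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Optional


def can_delete_version(
    now: datetime,
    mode: Optional[str],  # "GOVERNANCE", "COMPLIANCE", or None when no retention is set
    retain_until: Optional[datetime],
    legal_hold: bool,
    can_bypass_governance: bool = False,
) -> bool:
    """Decide whether a specific object version may be permanently deleted."""
    if legal_hold:
        return False  # A legal hold blocks deletion regardless of retention
    if mode is None or retain_until is None or now >= retain_until:
        return True  # No retention, or the retention period has ended
    if mode == "GOVERNANCE":
        return can_bypass_governance  # Needs the special bypass permission
    return False  # COMPLIANCE: nobody can delete before retain_until


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    until = datetime(2031, 1, 1, tzinfo=timezone.utc)
    print(can_delete_version(now, "COMPLIANCE", until, legal_hold=False))
    print(can_delete_version(now, "GOVERNANCE", until, False, can_bypass_governance=True))
```

Note that retention and legal hold are independent: a version stays protected while either one is in effect.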
Terraform
# Bucket with Object Lock enabled
resource "aws_s3_bucket" "compliance" {
  bucket              = "${var.project_name}-compliance"
  object_lock_enabled = true
}

# Default retention
resource "aws_s3_bucket_object_lock_configuration" "compliance" {
  bucket = aws_s3_bucket.compliance.id

  rule {
    default_retention {
      mode  = "COMPLIANCE"
      years = 7 # 7-year retention for financial records
    }
  }
}
Per-Object Lock
# Upload with retention
aws s3api put-object \
  --bucket compliance-bucket \
  --key financial/report-2024.pdf \
  --body report.pdf \
  --object-lock-mode COMPLIANCE \
  --object-lock-retain-until-date "2031-01-01T00:00:00Z"

# Add legal hold
aws s3api put-object-legal-hold \
  --bucket compliance-bucket \
  --key financial/report-2024.pdf \
  --legal-hold Status=ON
Glacier Retrieval Options
| Option | Time | Cost |
|---|---|---|
| Expedited | 1-5 minutes | $$$ |
| Standard | 3-5 hours | $$ |
| Bulk | 5-12 hours | $ |
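Choosing a retrieval option usually comes down to "the cheapest tier that still meets the deadline". The sketch below encodes that choice using the upper bound of each time window from the table. It assumes worst-case times equal those upper bounds and that cost ordering follows the table's `$` column. `cheapest_tier` is a hypothetical helper name.

```python
from typing import Optional

# Upper bound of each retrieval window from the table, in hours, cheapest tier first.
TIERS_CHEAPEST_FIRST = [
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5 / 60),
]


def cheapest_tier(deadline_hours: float) -> Optional[str]:
    """Cheapest tier whose worst-case retrieval time fits within the deadline."""
    for tier, max_hours in TIERS_CHEAPEST_FIRST:
        if max_hours <= deadline_hours:
            return tier
    return None  # No tier is guaranteed to finish in time


if __name__ == "__main__":
    for deadline in (24, 6, 1, 0.01):
        print(f"{deadline} h -> {cheapest_tier(deadline)}")
```

The returned names match the `Tier` values accepted in a restore request.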
AWS CLI - Restore Request
# Restore from Glacier
aws s3api restore-object \
  --bucket my-bucket \
  --key archive/old-data.zip \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'

# Check restore status
aws s3api head-object \
  --bucket my-bucket \
  --key archive/old-data.zip
# Look for: Restore: ongoing-request="false", expiry-date="..."
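When the status check is scripted, the `Restore` value has to be parsed. Here is a minimal sketch that does so with a regular expression, assuming the value has the `ongoing-request="...", expiry-date="..."` shape shown above with an HTTP-style date. `parse_restore_header` is a hypothetical helper, and the sample date in the usage is made up for illustration.

```python
import re
from email.utils import parsedate_to_datetime


def parse_restore_header(value: str) -> dict:
    """Parse the Restore value into {'ongoing': bool, 'expiry': datetime or None}."""
    ongoing = re.search(r'ongoing-request="(true|false)"', value)
    if ongoing is None:
        raise ValueError(f"unrecognized Restore value: {value!r}")
    expiry = re.search(r'expiry-date="([^"]+)"', value)
    return {
        "ongoing": ongoing.group(1) == "true",
        # expiry-date only appears once the restored copy is available
        "expiry": parsedate_to_datetime(expiry.group(1)) if expiry else None,
    }


if __name__ == "__main__":
    sample = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
    status = parse_restore_header(sample)
    if status["ongoing"]:
        print("Restore still in progress")
    else:
        print("Restored copy available until", status["expiry"])
```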
S3 Replication
Cross-Region Replication (CRR)
┌──────────────┐                  ┌──────────────┐
│  US-East-1   │─────Replicate───►│ AP-Southeast │
│  (Primary)   │                  │  (Replica)   │
└──────────────┘                  └──────────────┘
Same-Region Replication (SRR)
┌──────────────┐                  ┌──────────────┐
│  Production  │─────Replicate───►│    Backup    │
│    Bucket    │                  │    Bucket    │
└──────────────┘                  └──────────────┘
Terraform
# Source bucket
resource "aws_s3_bucket" "source" {
  bucket = "${var.project_name}-source"
}

resource "aws_s3_bucket_versioning" "source" {
  bucket = aws_s3_bucket.source.id
  versioning_configuration {
    status = "Enabled" # Required for replication
  }
}

# Destination bucket (can be in a different region)
resource "aws_s3_bucket" "destination" {
  provider = aws.destination_region
  bucket   = "${var.project_name}-destination"
}

resource "aws_s3_bucket_versioning" "destination" {
  provider = aws.destination_region
  bucket   = aws_s3_bucket.destination.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Replication configuration
resource "aws_s3_bucket_replication_configuration" "main" {
  # Versioning must be enabled on both buckets before replication is configured
  depends_on = [
    aws_s3_bucket_versioning.source,
    aws_s3_bucket_versioning.destination,
  ]

  bucket = aws_s3_bucket.source.id
  role   = aws_iam_role.replication.arn

  rule {
    id     = "full-replication"
    status = "Enabled"

    filter {
      prefix = "" # All objects
    }

    # Required when encryption_configuration is set on the destination
    source_selection_criteria {
      sse_kms_encrypted_objects {
        status = "Enabled"
      }
    }

    destination {
      bucket        = aws_s3_bucket.destination.arn
      storage_class = "STANDARD_IA"

      # Replicate encrypted objects
      encryption_configuration {
        replica_kms_key_id = aws_kms_key.destination.arn
      }

      # Add replication_time and metrics blocks here for RTC (Replication Time Control)
    }

    # Also replicate delete markers
    delete_marker_replication {
      status = "Enabled"
    }
  }
}
S3 Event Notifications
S3 Bucket ──► Event ──► Lambda / SQS / SNS / EventBridge
                │
                └── Events: PutObject, DeleteObject, CompleteMultipartUpload...
Terraform
resource "aws_s3_bucket_notification" "main" {
  bucket = aws_s3_bucket.data.id

  # Lambda trigger (the function also needs an aws_lambda_permission for s3.amazonaws.com)
  lambda_function {
    lambda_function_arn = aws_lambda_function.processor.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "uploads/"
    filter_suffix       = ".jpg"
  }

  # SQS trigger (the queue policy must allow S3 to send messages)
  queue {
    queue_arn     = aws_sqs_queue.processing.arn
    events        = ["s3:ObjectCreated:*"]
    filter_prefix = "data/"
  }

  # EventBridge (recommended for complex routing)
  eventbridge = true
}
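On the receiving side, a Lambda target gets a JSON event with one record per object. The following sketch of a handler extracts the bucket and key from each record. It relies on the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). The names `extract_objects` and `handler` are illustrative, and the actual processing step is left out.

```python
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list:
    """Return (event name, bucket, key) tuples from an S3 notification event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(s3["object"]["key"])
        results.append((record["eventName"], s3["bucket"]["name"], key))
    return results


def handler(event, context):
    """Lambda entry point: log each object (real processing is omitted here)."""
    for event_name, bucket, key in extract_objects(event):
        print(f"{event_name}: s3://{bucket}/{key}")


if __name__ == "__main__":
    # Hand-written sample event containing only the fields this sketch reads.
    sample = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "my-bucket"},
                    "object": {"key": "uploads/my+photo.jpg"},
                },
            }
        ]
    }
    handler(sample, None)
```

Decoding the key matters in practice: using the raw value for a follow-up `GetObject` fails for keys that contain spaces or special characters.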
S3 Access Points
Simplify access management for shared buckets.
                    ┌──────────────────┐
                    │    S3 Bucket     │
                    │  (shared data)   │
                    └────────┬─────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Access Point  │    │ Access Point  │    │ Access Point  │
│   (Finance)   │    │  (Analytics)  │    │     (Dev)     │
│               │    │               │    │               │
│  /finance/*   │    │ /analytics/*  │    │    /dev/*     │
│  read-write   │    │   read-only   │    │  full access  │
└───────────────┘    └───────────────┘    └───────────────┘
Terraform
resource "aws_s3_access_point" "finance" {
  bucket = aws_s3_bucket.data.id
  name   = "finance-access-point"

  vpc_configuration {
    vpc_id = aws_vpc.main.id
  }

  public_access_block_configuration {
    block_public_acls       = true
    block_public_policy     = true
    ignore_public_acls      = true
    restrict_public_buckets = true
  }
}

resource "aws_s3control_access_point_policy" "finance" {
  access_point_arn = aws_s3_access_point.finance.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = aws_iam_role.finance.arn
        }
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "${aws_s3_access_point.finance.arn}/object/finance/*"
      }
    ]
  })
}
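The `Resource` string in the policy above follows the access point ARN format, with `/object/` separating the access point from the key pattern. A small sketch of how those strings are assembled is shown below. The region and account ID in the usage are placeholders, and both function names are hypothetical.

```python
def access_point_arn(region: str, account_id: str, name: str) -> str:
    """ARN of an S3 access point."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def access_point_object_resource(ap_arn: str, key_pattern: str) -> str:
    """Policy Resource for objects reached through the access point."""
    return f"{ap_arn}/object/{key_pattern}"


if __name__ == "__main__":
    # Placeholder region and account ID, for illustration only.
    arn = access_point_arn("us-east-1", "123456789012", "finance-access-point")
    print(arn)
    print(access_point_object_resource(arn, "finance/*"))
```

Clients address the bucket through the access point ARN instead of the bucket name, so each team's permissions live in its own access point policy.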
Best Practices
1. Use Lifecycle Policies
# Always have a lifecycle policy for cost optimization
rule {
  id     = "cleanup"
  status = "Enabled"

  noncurrent_version_expiration {
    noncurrent_days = 30
  }
}
2. Enable Versioning for critical data
versioning_configuration {
  status = "Enabled"
}
3. Use Intelligent-Tiering for unknown access patterns
transition { # S3 has no bucket-level default class; use a day-0 lifecycle transition instead
  days          = 0
  storage_class = "INTELLIGENT_TIERING"
}
4. Block public access
resource "aws_s3_bucket_public_access_block" "main" {
  bucket = aws_s3_bucket.data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
Practice Questions (SAA style)
1. Financial records must be kept for 7 years and no one may delete them. Which storage class and feature fit best?
A. S3 Standard with a bucket policy
B. S3 Glacier with Object Lock Compliance mode
C. S3 Glacier with Object Lock Governance mode
D. S3 Standard-IA with MFA Delete
Answer: B - Compliance mode guarantees that no one can delete the objects, not even the root account.
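2. An application generates logs daily. The logs need fast access for 7 days, are rarely accessed after that, and must be kept for 1 year. Which solution is the most cost-effective?
A. S3 Standard with a lifecycle transition to Glacier after 7 days
B. S3 Intelligent-Tiering
C. S3 Standard → Standard-IA (7 days) → Glacier (30 days)
D. S3 One Zone-IA
Answer: A - Lifecycle rules cannot transition objects to Standard-IA before they are 30 days old, so option C is not a valid configuration. A direct transition to Glacier after 7 days is allowed and gives the lowest storage cost for rarely accessed logs; an expiration at 365 days enforces the 1-year retention.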
3. A data lake bucket is accessed by 50 different teams. Each team should only see its own data. Which solution is best for managing access?
A. 50 bucket policies
B. IAM policies for each team
C. S3 Access Points for each team
D. S3 Object ACLs
Answer: C - Access Points simplify access management for shared buckets.
Next lesson: VPC Advanced - Transit Gateway, PrivateLink, VPN.