by skunxicat

Cross-Account OpenSearch Snapshots

A Complete Guide to Multi-Account Data Backup and Recovery

When building distributed systems across multiple AWS accounts, one common challenge is moving data between OpenSearch clusters. Whether you’re migrating from development to production, creating cross-account backups, or synchronizing data between environments, OpenSearch snapshots provide a robust solution.

This guide walks through setting up cross-account OpenSearch snapshots using Terraform and CloudPosse modules, based on a real-world implementation that processes over €27 million in transactions.

Architecture Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Account A     │    │   Shared S3     │    │   Account B     │
│                 │    │   Repository    │    │                 │
│  ┌───────────┐  │    │                 │    │  ┌───────────┐  │
│  │OpenSearch │──┼────┼──► S3 Bucket ◄──┼────┼──│OpenSearch │  │
│  │ Cluster   │  │    │                 │    │  │ Cluster   │  │
│  └───────────┘  │    │                 │    │  └───────────┘  │
│                 │    └─────────────────┘    │                 │
└─────────────────┘                           └─────────────────┘
        │                                              │
        │              create snapshots                │
        └──────────────────────────────────────────────┘
                         restore indices

Use Cases

This cross-account snapshot setup addresses several critical scenarios:

  1. Data Migration: Moving data from development to production clusters across accounts
  2. Cross-Account Backups: Creating disaster recovery snapshots in separate AWS accounts
  3. Environment Synchronization: Keeping development environments updated with production data
  4. Compliance: Meeting regulatory requirements for data isolation and backup retention

Prerequisites

  • Two AWS accounts with OpenSearch clusters
  • Terraform with CloudPosse modules
  • Node.js for snapshot management utilities
  • IAM permissions to manage roles, policies, and S3 buckets in the accounts involved

Infrastructure Setup

1. S3 Bucket Configuration (Target Account)

First, create an S3 bucket in the target account that will store the snapshots:

# modules/analytics/main.tf
module "s3_bucket" {
  source = "cloudposse/s3-bucket/aws"
  version = "4.10.0"
  
  context = module.this
  name = join("-", [module.this.name, "opensearch"])
  
  s3_object_ownership = "BucketOwnerPreferred"  # Required for OpenSearch ACLs
  enabled = true
  versioning_enabled = false

  privileged_principal_actions = [
    "s3:GetObject",
    "s3:ListBucket", 
    "s3:GetBucketLocation",
    "s3:PutObject",
    "s3:PutObjectAcl",
    "s3:ListBucketMultipartUploads",
    "s3:AbortMultipartUpload",
    "s3:DeleteObject"
  ]

  privileged_principal_arns = [
    for p in var.source_principals : {
      (p.arn) = [""]
    }
  ]
}
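
To make the grant above more concrete, the following sketch builds a comparable bucket policy by hand in JavaScript. It is an illustration only: `buildSnapshotBucketPolicy` and the statement IDs are hypothetical names, and the policy the CloudPosse module actually generates may be structured differently.

```javascript
// Hypothetical helper: builds a bucket policy that lets the given principals
// use the bucket as a snapshot repository. Not the module's actual output.
const BUCKET_ACTIONS = [
  's3:ListBucket',
  's3:GetBucketLocation',
  's3:ListBucketMultipartUploads'
]
const OBJECT_ACTIONS = [
  's3:GetObject',
  's3:PutObject',
  's3:PutObjectAcl',
  's3:AbortMultipartUpload',
  's3:DeleteObject'
]

function buildSnapshotBucketPolicy(bucket, principalArns) {
  if (!bucket) throw new Error('bucket name is required')
  if (!Array.isArray(principalArns) || principalArns.length === 0) {
    throw new Error('at least one principal ARN is required')
  }
  const principal = { AWS: principalArns }
  return {
    Version: '2012-10-17',
    Statement: [
      {
        // Bucket-level actions apply to the bucket ARN itself
        Sid: 'SnapshotBucketLevel',
        Effect: 'Allow',
        Principal: principal,
        Action: BUCKET_ACTIONS,
        Resource: `arn:aws:s3:::${bucket}`
      },
      {
        // Object-level actions apply to the keys inside the bucket
        Sid: 'SnapshotObjectLevel',
        Effect: 'Allow',
        Principal: principal,
        Action: OBJECT_ACTIONS,
        Resource: `arn:aws:s3:::${bucket}/*`
      }
    ]
  }
}
```

Splitting the statement in two keeps bucket-level and object-level actions attached to the resource type they apply to, which makes the policy easier to audit.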

2. IAM Role Configuration (Source Account)

Create an IAM role in the source account that OpenSearch can assume:

# Source account IAM configuration
data "aws_iam_policy_document" "opensearch_snapshot_trust" {
  statement {
    actions = ["sts:AssumeRole", "sts:TagSession"]
    
    principals {
      type = "Service"
      identifiers = ["es.amazonaws.com"]  # Critical for OpenSearch service
    }
    
    principals {
      type = "AWS"
      identifiers = ["arn:aws:iam::${local.account_id}:user/admin"]
    }
    
    effect = "Allow"
  }
}

resource "aws_iam_role" "opensearch_snapshot" {
  name = "${module.label.id}-opensearch-snapshots"
  assume_role_policy = data.aws_iam_policy_document.opensearch_snapshot_trust.json
}

resource "aws_iam_policy" "opensearch_snapshot_policy" {
  name = "${module.label.id}-opensearch-snapshots"
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:PutObject",
          "s3:GetObject", 
          "s3:DeleteObject",
          "s3:GetBucketLocation",
          "s3:ListBucketMultipartUploads",
          "s3:AbortMultipartUpload"
        ]
        Resource = [
          "arn:aws:s3:::target-account-bucket",
          "arn:aws:s3:::target-account-bucket/*"
        ]
      },
      {
        Effect = "Allow"
        Action = ["s3:ListBucket"]
        Resource = ["arn:aws:s3:::target-account-bucket"]
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "opensearch_snapshot" {
  role = aws_iam_role.opensearch_snapshot.name
  policy_arn = aws_iam_policy.opensearch_snapshot_policy.arn
}

3. OpenSearch Cluster (Target Account)

Deploy the target OpenSearch cluster:

# modules/analytics/main.tf
module "opensearch" {
  source = "cloudposse/elasticsearch/aws"
  version = "0.47.0"

  name = "analytics-cluster"
  elasticsearch_version = "OpenSearch_2.7"
  instance_type = "t3.small.elasticsearch"
  instance_count = 1
  
  ebs_volume_size = 20
  ebs_volume_type = "gp3"
  
  vpc_id = var.vpc_id
  subnet_ids = var.private_subnet_ids
  security_groups = [var.vpc_default_security_group_id]
  
  encrypt_at_rest_enabled = true
  node_to_node_encryption_enabled = true
  domain_endpoint_options_enforce_https = true
}

Snapshot Management Utilities

Create a Node.js CLI tool for managing snapshots:

Helper Functions

// lib/es/helper.js
const es = require('./es')
const config = require('../config')

const createSnapshotRepository = async (name, params) => {
  const client = await es.getClient(config.ES_HOST, config.AWS_REGION)
  return await client.snapshot.createRepository({
    repository: name,
    body: params
  })
}

const createSnapshot = async (repository, snapshot, index = null) => {
  const client = await es.getClient(config.ES_HOST, config.AWS_REGION)
  
  let request = {
    repository,
    snapshot,
    wait_for_completion: true
  }

  if (index) {
    request.body = { indices: [index] }
  }

  return await client.snapshot.create(request)
}

const restoreIndexFromSnapshot = async (index, snapshot, repository) => {
  const client = await es.getClient(config.ES_HOST, config.AWS_REGION)
  
  // An index that already exists must be closed before it can be restored over
  const indexExists = await client.indices.exists({ index })
  if (indexExists.statusCode === 200) {
    await client.indices.close({ index })
  }
  
  const response = await client.snapshot.restore({
    repository,
    snapshot,
    wait_for_completion: true,  // finish the restore before reopening the index
    body: {
      indices: [index],
      ignore_unavailable: true,
      include_global_state: false
    }
  })

  // Make sure the index is open again once the restore has finished
  if (indexExists.statusCode === 200) {
    await client.indices.open({ index })
  }

  return response
}

const deleteSnapshot = async (repository, snapshot) => {
  const client = await es.getClient(config.ES_HOST, config.AWS_REGION)
  return await client.snapshot.delete({ repository, snapshot })
}

module.exports = {
  createSnapshotRepository,
  createSnapshot,
  restoreIndexFromSnapshot,
  deleteSnapshot
}
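
`wait_for_completion: true` keeps the HTTP request open until the snapshot finishes, which can run into client or proxy timeouts on larger clusters. One alternative, sketched here under assumptions, is to start the snapshot without waiting and poll its state. `waitForSnapshot` and its `intervalMs` and `maxAttempts` options are hypothetical names, and the sketch assumes a client whose `snapshot.get` resolves to `{ body: { snapshots: [...] } }`.

```javascript
// Hypothetical polling helper; `client` is whatever es.getClient() returns.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function waitForSnapshot(client, repository, snapshot, options = {}) {
  const { intervalMs = 5000, maxAttempts = 120 } = options

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { body } = await client.snapshot.get({ repository, snapshot })
    const info = body.snapshots && body.snapshots[0]
    const state = info ? info.state : 'MISSING'

    if (state === 'SUCCESS') return info

    // Anything other than IN_PROGRESS at this point is a terminal, non-successful state
    if (state !== 'IN_PROGRESS') {
      throw new Error(`Snapshot ${repository}/${snapshot} ended in state ${state}`)
    }

    await sleep(intervalMs)
  }

  throw new Error(`Snapshot ${repository}/${snapshot} still running after ${maxAttempts} checks`)
}
```

With this approach, `createSnapshot` would pass `wait_for_completion: false` and the caller would await `waitForSnapshot(client, repository, snapshot)` afterwards.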

CLI Interface

// bin/snapshot.js
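const args = require('args')
const helper = require('../lib/es/helper')

// Declare the flags the commands below read so they show up in `options`.
// Short flags are set explicitly because "repository" and "role-arn" share a first letter.
// --role-arn is read as options.roleArn.
args
  .option(['r', 'repository'], 'snapshot repository name')
  .option(['b', 'bucket'], 'S3 bucket that stores the snapshots')
  .option(['a', 'role-arn'], 'IAM role ARN that OpenSearch assumes')
  .option(['s', 'snapshot'], 'snapshot name')
  .option(['i', 'index'], 'index to snapshot or restore')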

args.command('create-repository', 'create a snapshot repository', async (name, sub, options) => {
  const result = await helper.createSnapshotRepository(options.repository, {
    type: "s3",
    settings: { 
      bucket: options.bucket,
      region: process.env.AWS_REGION,
      role_arn: options.roleArn,
      canned_acl: "private"  // Required for cross-account access
    }
  })
  console.log(result)
})

args.command('create-snapshot', 'create a snapshot', async (name, sub, options) => {
  const result = await helper.createSnapshot(
    options.repository, 
    options.snapshot,
    options.index || null
  )
  console.log(result)
})

args.command('restore-index', 'restore an index from snapshot', async (name, sub, options) => {
  const result = await helper.restoreIndexFromSnapshot(
    options.index,
    options.snapshot,
    options.repository
  )
  console.log(result)
})

args.parse(process.argv)
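
A malformed role ARN or an empty bucket name otherwise only surfaces as an error from OpenSearch at registration time, so it can help to validate the repository settings before sending them. The following is a minimal sketch: `buildRepositorySettings` is a hypothetical helper, and the ARN pattern is a deliberately loose check rather than a full validation.

```javascript
// Loose check: arn:<partition>:iam::<12-digit account>:role/<name>
const ROLE_ARN_PATTERN = /^arn:aws[a-z-]*:iam::\d{12}:role\/.+$/

// Hypothetical helper that builds the body passed to createSnapshotRepository
function buildRepositorySettings({ bucket, region, roleArn, cannedAcl = 'private' }) {
  if (!bucket) throw new Error('bucket is required')
  if (!region) throw new Error('region is required (is AWS_REGION set?)')
  if (!ROLE_ARN_PATTERN.test(roleArn || '')) {
    throw new Error(`not an IAM role ARN: ${roleArn}`)
  }
  return {
    type: 's3',
    settings: {
      bucket,
      region,
      role_arn: roleArn,
      canned_acl: cannedAcl
    }
  }
}
```

In the `create-repository` command, the object literal passed to `helper.createSnapshotRepository` could then be replaced with `buildRepositorySettings({ bucket: options.bucket, region: process.env.AWS_REGION, roleArn: options.roleArn })`.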

Step-by-Step Implementation

1. Deploy Infrastructure

# Deploy target account infrastructure
cd target-account
terraform init
terraform apply

# Deploy source account infrastructure
cd ../source-account
terraform init
terraform apply

2. Create Snapshot Repository (Source Account)

node bin/snapshot.js create-repository \
  --repository="cross-account-snapshots" \
  --bucket="target-account-opensearch-bucket" \
  --role-arn="arn:aws:iam::SOURCE-ACCOUNT:role/opensearch-snapshot-role"

3. Create Snapshot

# Snapshot all indices
node bin/snapshot.js create-snapshot \
  --repository="cross-account-snapshots" \
  --snapshot="migration-$(date +%Y%m%d)"

# Snapshot specific index
node bin/snapshot.js create-snapshot \
  --repository="cross-account-snapshots" \
  --snapshot="booking-data-$(date +%Y%m%d)" \
  --index="booking_jobs"
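
If snapshots are triggered from a Node.js script rather than a shell, the `$(date +%Y%m%d)` suffix can be generated in code as well. A small sketch, assuming UTC dates are acceptable (the shell command above uses local time); `snapshotName` is a hypothetical helper. It lowercases the prefix because snapshot names must be lowercase.

```javascript
// Hypothetical helper: builds "<prefix>-YYYYMMDD" from a Date, using UTC
function snapshotName(prefix, date = new Date()) {
  const yyyy = date.getUTCFullYear()
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0')
  const dd = String(date.getUTCDate()).padStart(2, '0')
  return `${prefix.toLowerCase()}-${yyyy}${mm}${dd}`
}

// Months are zero-based in Date.UTC, so 7 is August
console.log(snapshotName('migration', new Date(Date.UTC(2025, 7, 13)))) // migration-20250813
```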

4. Create Repository (Target Account)

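# Switch to the target account: use its credentials and its OpenSearch endpoint.
# The role below must exist in the target account, trust es.amazonaws.com,
# and be allowed to read the snapshot bucket.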
export ES_HOST="https://target-opensearch-domain.region.es.amazonaws.com"

node bin/snapshot.js create-repository \
  --repository="cross-account-snapshots" \
  --bucket="target-account-opensearch-bucket" \
  --role-arn="arn:aws:iam::TARGET-ACCOUNT:role/opensearch-snapshot-role"

5. Restore Indices

node bin/snapshot.js restore-index \
  --repository="cross-account-snapshots" \
  --snapshot="migration-20250813" \
  --index="booking_jobs"
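
If closing a live index is undesirable, the restore API also accepts `rename_pattern` and `rename_replacement`, which restore the data under a different index name. The sketch below only builds the request body; `buildRestoreBody`, `escapeRegex`, and the `restored_` prefix in the usage note are hypothetical choices, and the sketch assumes the new name contains no characters that are special in a regex replacement string.

```javascript
// Escape regex metacharacters so the index name is matched literally
function escapeRegex(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: builds the body for client.snapshot.restore()
function buildRestoreBody(index, { renameTo } = {}) {
  const body = {
    indices: [index],
    ignore_unavailable: true,
    include_global_state: false
  }
  if (renameTo) {
    // Restore under a new name instead of overwriting the existing index
    body.rename_pattern = '^' + escapeRegex(index) + '$'
    body.rename_replacement = renameTo
  }
  return body
}
```

For example, `client.snapshot.restore({ repository, snapshot, body: buildRestoreBody('booking_jobs', { renameTo: 'restored_booking_jobs' }) })` would leave the existing `booking_jobs` index untouched.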

Common Issues and Solutions

1. ACL Permission Errors

Error: AccessControlListNotSupported

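Solution: S3 returns this error when the bucket's object ownership is BucketOwnerEnforced, which disables ACLs. Set S3 bucket ownership to BucketOwnerPreferred: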

s3_object_ownership = "BucketOwnerPreferred"

And use canned_acl: "private" in repository settings.

2. Trust Policy Issues

Error: User is not authorized to perform: sts:AssumeRole

Solution: Ensure the IAM role trust policy includes es.amazonaws.com:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "es.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
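
To check a role before registering the repository, the trust policy can also be inspected programmatically. A minimal sketch, assuming the policy document has already been fetched and parsed into an object; `trustsService` is a hypothetical helper and it ignores any `Condition` blocks.

```javascript
// IAM allows single values or arrays in most fields; normalize to arrays
const asArray = (value) => (value === undefined ? [] : Array.isArray(value) ? value : [value])

// Hypothetical helper: true if some Allow statement lets `service` call sts:AssumeRole
function trustsService(trustPolicy, service = 'es.amazonaws.com') {
  return asArray(trustPolicy.Statement).some((statement) =>
    statement.Effect === 'Allow' &&
    asArray(statement.Action).includes('sts:AssumeRole') &&
    asArray(statement.Principal && statement.Principal.Service).includes(service)
  )
}
```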

3. Repository Verification Failures

Error: cannot delete test data

Solution: Add s3:DeleteObject permission to the IAM policy.
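
One way to catch this class of error early is to compare the role's policy against the S3 actions used in this guide before registering the repository. A sketch under assumptions: `missingSnapshotActions` is a hypothetical helper, it matches literal action names only (no wildcards such as `s3:*`), and it ignores `Resource` and `Condition`.

```javascript
// The S3 actions granted to the snapshot role earlier in this guide
const REQUIRED_ACTIONS = [
  's3:ListBucket',
  's3:GetBucketLocation',
  's3:GetObject',
  's3:PutObject',
  's3:DeleteObject',
  's3:ListBucketMultipartUploads',
  's3:AbortMultipartUpload'
]

// Hypothetical helper: returns the required actions that no Allow statement grants
function missingSnapshotActions(policy) {
  const allowed = new Set()
  for (const statement of [].concat(policy.Statement || [])) {
    if (statement.Effect !== 'Allow') continue
    for (const action of [].concat(statement.Action || [])) {
      allowed.add(action)
    }
  }
  return REQUIRED_ACTIONS.filter((action) => !allowed.has(action))
}
```

An empty result means every listed action appears in some Allow statement; it does not prove the statements target the right bucket.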

Performance Considerations

  • Same Region: Keep S3 bucket and OpenSearch clusters in the same region for optimal performance
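  • Incremental Snapshots: Each snapshot copies only the segments not already stored in the repository, so snapshots after the first are much faster
  • Parallel Processing: Multiple shards snapshot simultaneously
  • Compression: The repository's compress setting applies to metadata files such as index mappings and settings; index data files are copied as they are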

Security Best Practices

  1. Least Privilege: Grant minimal required permissions
  2. Cross-Account Roles: Use IAM roles instead of access keys
  3. Encryption: Enable encryption at rest and in transit
  4. VPC Deployment: Deploy OpenSearch in private subnets
  5. Access Logging: Enable S3 access logging for audit trails

Monitoring and Alerting

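Monitor snapshot operations with CloudWatch metrics. Amazon OpenSearch Service publishes AutomatedSnapshotFailure in the AWS/ES namespace. It reports on the service's automated snapshots only, so manual snapshots taken with the CLI above still need to be checked through the snapshot API:

resource "aws_cloudwatch_metric_alarm" "snapshot_failures" {
  alarm_name = "opensearch-snapshot-failures"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods = 1
  metric_name = "AutomatedSnapshotFailure"  # AWS/ES has no "SnapshotFailure" metric
  namespace = "AWS/ES"
  period = 300
  statistic = "Sum"
  threshold = 0
  alarm_description = "OpenSearch automated snapshot failures"

  # AWS/ES metrics are published per domain and account.
  # var.domain_name and var.account_id are placeholder names, not defined elsewhere in this guide.
  dimensions = {
    DomainName = var.domain_name
    ClientId = var.account_id
  }
}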
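
Snapshots taken manually through the `_snapshot` API can also be checked by reading their state from the repository. The sketch below filters the body of a `GET _snapshot/<repository>/_all` style response for snapshots that finished without succeeding; `unsuccessfulSnapshots` is a hypothetical helper, and the sample names in the usage note are invented for illustration.

```javascript
// Hypothetical helper: lists snapshots that finished in a state other than SUCCESS.
// Snapshots still IN_PROGRESS are skipped because their outcome is not known yet.
function unsuccessfulSnapshots(responseBody) {
  return (responseBody.snapshots || [])
    .filter((s) => s.state !== 'SUCCESS' && s.state !== 'IN_PROGRESS')
    .map((s) => ({
      snapshot: s.snapshot,
      state: s.state,
      failedShards: s.shards ? s.shards.failed : undefined
    }))
}
```

Given a response containing `migration-20250813` in state `SUCCESS` and `booking-data-20250813` in state `PARTIAL`, the function returns a single entry for the partial snapshot, which a scheduled job could log or forward to an alerting channel.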

Real-World Experience

This setup was battle-tested in a production environment processing over €27 million in transactions. Key learnings:

  • Documentation Matters: Previous experience with similar patterns accelerated implementation
  • Reusable Tools: CLI utilities built for one project directly helped solve cross-account challenges
  • Same Region Benefits: Keeping everything in the same region (Milan/eu-south-1) improved performance and reduced costs
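  • ACL Gotchas: The BucketOwnerEnforced vs BucketOwnerPreferred distinction is critical, because BucketOwnerEnforced disables ACLs and a repository that sends a canned ACL then fails with AccessControlListNotSupported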

Conclusion

Cross-account OpenSearch snapshots provide a robust solution for data migration, backup, and synchronization across AWS accounts. By combining Terraform infrastructure-as-code with purpose-built CLI tools, you can create a reliable, automated system for managing your OpenSearch data across environments.

The key to success is proper IAM configuration, understanding S3 ACL requirements, and building reusable tools that can be applied across different scenarios. With this foundation, you can confidently manage multi-account OpenSearch deployments at scale.
