TCP #43: Zero-ETL Patterns on AWS with DynamoDB and OpenSearch
Part 3: How to Achieve Zero-ETL Patterns on AWS with DynamoDB and OpenSearch
👋 Before we begin... a big thank you to today's sponsor, Lemon.io
How Much Does Your Hiring Delay Cost You?
When critical features sit in your backlog for months during traditional hiring cycles, you're losing more than time—you're losing market opportunity.
Lemon.io eliminates this risk.
Through our rigorous multi-step vetting process, we assess technical skills, problem-solving abilities, and communication—delivering top engineers in just 48 hours.
Need to scale quickly or find specialized talent? Our developers from Europe and Latin America become part of your team immediately—no lengthy hiring, no long-term commitments.
This is part 3 of the 3-part series on Zero-ETL Patterns on AWS.
You can find the other parts here:
Traditional ETL pipelines require significant engineering resources to build and maintain, introduce latency, and can become bottlenecks as your data needs grow.
Zero-ETL represents a paradigm shift in how we think about data movement.
Rather than treating data transfer as a separate, batch-oriented process, Zero-ETL aims to make data flow automatically and in real time between different systems.
When appropriately implemented, this approach can dramatically reduce data latency, simplify architecture, and allow your teams to focus on deriving insights rather than managing pipelines.
In this newsletter, I'll explore how to implement Zero-ETL patterns on AWS using DynamoDB as your operational database and OpenSearch as your search and analytics engine.
Let's dive in!
Understanding the Building Blocks
Before we get into implementation details, let's understand the core AWS services we'll be working with:
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
It excels at handling high-throughput, low-latency workloads and is an excellent operational database for many applications.
Amazon OpenSearch Service (the successor to Amazon Elasticsearch Service) is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters. It's ideal for log analytics, full-text search, application monitoring, and more.
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. In our Zero-ETL architecture, Lambda will serve as the connector that reacts to changes in DynamoDB and updates OpenSearch accordingly.
The Architecture: Building a Real-Time Data Pipeline
Our Zero-ETL architecture follows these principles:
Changes in DynamoDB are captured automatically
Data transformations happen in-flight
OpenSearch is updated in near real-time
The entire process is automatic and resilient
Here's how the components fit together:
DynamoDB Streams captures item-level changes in your DynamoDB table
Lambda function is triggered by these streams
The function transforms the data as needed
The transformed data is indexed in OpenSearch
Your applications can then search and analyze this data
Let's build this pipeline step by step.
Imagine Your Entire Backlog Cleared by Next Month
While other engineering teams still post job ads, you could ship those critical features. Lemon.io makes this possible by matching you with rigorously vetted engineers in just 48 hours.
Unlike platforms that merely check résumés, our multi-step vetting process thoroughly assesses technical skills, problem-solving abilities, and communication—delivering developers who make an immediate impact.
Need specialized talent that's hard to find locally?
Scale quickly with our pre-vetted developers from Europe and Latin America—no long hiring cycles, no long-term commitments.
Step 1: Setting Up DynamoDB Streams
DynamoDB Streams is the foundation of our Zero-ETL architecture. It provides a time-ordered sequence of item-level modifications in your DynamoDB table.
To enable DynamoDB Streams:
Open the DynamoDB console
Select your table
Choose the "Exports and streams" tab
Under "DynamoDB stream details," click "Enable"
Select "New and old images" for the view type
This configuration captures both the previous and new state of each modified item, giving you maximum flexibility for handling updates and deletes.
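If you'd rather script this than click through the console, the same setting can be applied with boto3. Here's a minimal sketch; the table name is a placeholder:
import boto3

dynamodb = boto3.client('dynamodb')

# Enable a stream that captures both the old and new image of each item
dynamodb.update_table(
    TableName='YourTable',  # placeholder table name
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES'
    }
)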
Step 2: Creating the Lambda Function
Now, let's create a Lambda function that will process the stream records and update OpenSearch. Here's a detailed implementation:
import json
import os
from decimal import Decimal

import boto3
# requests and requests_aws4auth are not included in the Lambda runtime by default;
# package them with your function or in a Lambda layer
import requests
from boto3.dynamodb.types import TypeDeserializer
from requests_aws4auth import AWS4Auth

# Configuration from environment variables
region = os.environ['AWS_REGION']
opensearch_endpoint = os.environ['OPENSEARCH_ENDPOINT']
opensearch_index = os.environ['OPENSEARCH_INDEX']

# Get credentials for signing OpenSearch requests
credentials = boto3.Session().get_credentials()
awsauth = AW4Auth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    'es',
    session_token=credentials.token
)

# Reusable clients and helpers, created once per container rather than per record
dynamodb = boto3.resource('dynamodb')
deserializer = TypeDeserializer()


class DecimalEncoder(json.JSONEncoder):
    """Serialize the Decimal values (and sets) DynamoDB produces into JSON-friendly types."""
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)


def unmarshal(ddb_image):
    """Convert a DynamoDB-typed image (e.g., {"name": {"S": "foo"}}) to plain Python values."""
    return {k: deserializer.deserialize(v) for k, v in ddb_image.items()}


def lambda_handler(event, context):
    success_count = 0
    failure_count = 0

    # Process each record in the DynamoDB stream batch
    for record in event['Records']:
        try:
            # The event source ARN looks like:
            # arn:aws:dynamodb:region:account:table/TableName/stream/timestamp
            table_name = record['eventSourceARN'].split('/')[1]

            # Look up the table's primary key configuration
            key_schema = dynamodb.Table(table_name).key_schema

            if record['eventName'] == 'REMOVE':
                # For deletions, the item state is in the old image
                ddb_image = record['dynamodb'].get('OldImage', {})
            else:
                # For inserts and updates, use the new image
                ddb_image = record['dynamodb'].get('NewImage', {})

            # Convert DynamoDB's typed JSON to standard JSON,
            # turning Decimals into floats along the way
            item = json.loads(json.dumps(unmarshal(ddb_image), cls=DecimalEncoder))

            # Build a deterministic document ID from the primary key
            document_id = build_document_id(key_schema, item)

            # Handle the different event types
            if record['eventName'] == 'REMOVE':
                response = delete_from_opensearch(document_id)
            else:
                response = index_in_opensearch(document_id, item)

            # Check whether the operation succeeded
            if 200 <= response.status_code < 300:
                success_count += 1
            else:
                failure_count += 1
                print(f"Error processing record: {response.text}")
        except Exception as e:
            failure_count += 1
            print(f"Error processing record: {str(e)}")

    # Return a summary of the batch
    return {
        'statusCode': 200,
        'body': json.dumps({
            'processedRecords': len(event['Records']),
            'successCount': success_count,
            'failureCount': failure_count
        })
    }


def build_document_id(key_schema, item):
    """Build a document ID from the primary key attributes (hash key, then range key)."""
    parts = []
    for key in key_schema:
        attr_name = key['AttributeName']
        if attr_name in item:
            parts.append(str(item[attr_name]))
    return "#".join(parts)


def index_in_opensearch(document_id, document):
    """Index (create or overwrite) a document in OpenSearch."""
    url = f"{opensearch_endpoint}/{opensearch_index}/_doc/{document_id}"
    headers = {"Content-Type": "application/json"}
    return requests.put(url, auth=awsauth, json=document, headers=headers)


def delete_from_opensearch(document_id):
    """Delete a document from OpenSearch."""
    url = f"{opensearch_endpoint}/{opensearch_index}/_doc/{document_id}"
    return requests.delete(url, auth=awsauth)
This Lambda function:
Processes each record from the DynamoDB stream
Handles different event types (INSERT, MODIFY, REMOVE)
Converts DynamoDB's JSON format to standard JSON
Creates a unique document ID based on the item's primary key
Indexes or deletes documents in OpenSearch as appropriate
Includes proper error handling and reporting
Step 3: Configuring Lambda Permissions and Environment Variables
For the Lambda function to work correctly, you need to:
Create an IAM role with these permissions:
DynamoDB: dynamodb:DescribeTable, dynamodb:GetItem
DynamoDB Streams: dynamodb:GetRecords, dynamodb:GetShardIterator, dynamodb:DescribeStream, dynamodb:ListStreams
OpenSearch: es:ESHttpPut, es:ESHttpDelete, es:ESHttpGet
CloudWatch Logs: logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents
Set these environment variables:
OPENSEARCH_ENDPOINT: The endpoint URL of your OpenSearch domain
OPENSEARCH_INDEX: The name of the index where you want to store the data
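Translated into a policy document, the permission list above might look like this minimal sketch; the account ID, region, and resource names are placeholders you'd replace with your own:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:DescribeTable", "dynamodb:GetItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/YourTable"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:DescribeStream",
        "dynamodb:ListStreams"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/YourTable/stream/*"
    },
    {
      "Effect": "Allow",
      "Action": ["es:ESHttpPut", "es:ESHttpDelete", "es:ESHttpGet"],
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/your-domain/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:us-east-1:123456789012:*"
    }
  ]
}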
Step 4: Creating the OpenSearch Index
Before data can flow into OpenSearch, you must create an index with the appropriate mapping. The mapping defines how documents and their fields are stored and indexed.
Here's how to create a basic index using the OpenSearch Dashboard's Dev Tools:
PUT /your-index-name
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "created_at": { "type": "date" },
      "tags": { "type": "keyword" }
    }
  }
}
Adjust the field names and types to match your DynamoDB table structure. The mapping should reflect the structure of your data after transformation in the Lambda function.
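Once records start flowing, a quick way to verify the end-to-end path is a search from Dev Tools; the field name here matches the example mapping above:
GET /your-index-name/_search
{
  "query": {
    "match": { "name": "example" }
  }
}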
Step 5: Setting Up the Lambda Trigger
Now, connect your Lambda function to the DynamoDB stream:
In the Lambda console, go to your function
Click on "Add trigger"
Select "DynamoDB" as the source
Choose your DynamoDB table
Set "Starting position" to "Latest"
Click "Add"
Your Lambda function will be invoked whenever changes occur in your DynamoDB table.
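The console steps above can also be scripted. Here's a boto3 sketch of the same trigger; the stream ARN and function name are placeholders:
import boto3

lambda_client = boto3.client('lambda')

# Equivalent to the console steps above
lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:dynamodb:us-east-1:123456789012:table/YourTable/stream/2025-01-01T00:00:00.000',
    FunctionName='your-zero-etl-function',
    StartingPosition='LATEST',
    BatchSize=100  # records per invocation; tune for your workload
)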
Handling Schema Evolution
One of the challenges in maintaining Zero-ETL pipelines is dealing with schema changes.
Here are strategies to make your pipeline resilient:
Dynamic Mapping in OpenSearch: Enable dynamic mapping in OpenSearch to adapt to new fields automatically.
Schema Version Tracking: Include a schema version with each item and handle different versions in your Lambda function.
Field Pruning: In the Lambda function, selectively include or exclude fields based on your needs.
Transform Layer: Implement a transformation layer in your Lambda function that can adapt to schema changes.
Here's how you might implement versioned transformations in your Lambda function:
def transform_item(item, version):
    if version == 1:
        return transform_v1(item)
    elif version == 2:
        return transform_v2(item)
    else:
        return item  # Default: pass the item through unchanged

def transform_v1(item):
    # Original transformation logic
    pass

def transform_v2(item):
    # Updated transformation logic
    pass
Advanced Pattern: Using SQS for Resilience
For improved resilience, you can introduce an SQS queue between the Lambda function and OpenSearch:
The Lambda function sends records to an SQS queue instead of directly to OpenSearch
A separate Lambda function processes messages from the queue and updates OpenSearch
Failed operations can be sent to a Dead Letter Queue for later processing
This pattern provides better buffering and retry capabilities, essential for handling OpenSearch service disruptions or throttling.
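Here's a minimal sketch of the forwarding side of this pattern, assuming the queue URL arrives in a QUEUE_URL environment variable (a name chosen for illustration):
import json
import os

import boto3

sqs = boto3.client('sqs')
queue_url = os.environ['QUEUE_URL']  # hypothetical environment variable

def lambda_handler(event, context):
    # Forward stream records to SQS in batches of 10 (the SQS batch maximum)
    records = event['Records']
    for i in range(0, len(records), 10):
        batch = records[i:i + 10]
        sqs.send_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {'Id': str(idx), 'MessageBody': json.dumps(rec)}
                for idx, rec in enumerate(batch)
            ]
        )
    return {'statusCode': 200}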
Optimizing for Performance and Cost
To ensure your Zero-ETL pipeline performs well while keeping costs under control:
Batch Processing: Modify your Lambda function to process multiple records in a single OpenSearch bulk operation (see the sketch after this list).
Error Handling: Implement robust error handling and consider using Dead Letter Queues for records that can't be processed.
Monitoring: Set up CloudWatch Alarms to monitor the performance and health of your pipeline.
Scaling: Configure appropriate provisioned capacity for DynamoDB and choose the right instance type and count for OpenSearch.
Data Lifecycle: Implement index lifecycle management in OpenSearch to handle aging data.
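Here's one way the batch-processing idea could look, reusing the opensearch_endpoint, opensearch_index, awsauth, and DecimalEncoder globals from the handler in Step 2:
def bulk_index(documents):
    """Index many documents in one request via the OpenSearch _bulk API.

    `documents` maps document_id -> document body. A sketch, not a drop-in
    replacement for the single-document helpers above.
    """
    lines = []
    for doc_id, doc in documents.items():
        # Each operation is an action line followed by the document source
        lines.append(json.dumps({'index': {'_index': opensearch_index, '_id': doc_id}}))
        lines.append(json.dumps(doc, cls=DecimalEncoder))
    payload = '\n'.join(lines) + '\n'  # the bulk API requires a trailing newline
    return requests.post(
        f"{opensearch_endpoint}/_bulk",
        auth=awsauth,
        data=payload,
        headers={"Content-Type": "application/x-ndjson"}
    )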
Monitoring Your Zero-ETL Pipeline
To ensure the health and performance of your Zero-ETL pipeline, monitor these key metrics:
Lambda Execution Metrics: Invocations, errors, duration, throttles
DynamoDB Metrics: Read/write capacity, throttled requests, stream records
OpenSearch Metrics: Cluster health, indexing rate, search latency
End-to-End Latency: Time from DynamoDB update to OpenSearch availability
Set up CloudWatch Dashboards to visualize these metrics and configure alarms for important thresholds.
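As a starting point, here's a boto3 sketch that alarms on any errors from the sync function; the alarm name, function name, and threshold are illustrative:
import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when the sync function reports any errors over a 5-minute window
cloudwatch.put_metric_alarm(
    AlarmName='zero-etl-lambda-errors',
    Namespace='AWS/Lambda',
    MetricName='Errors',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'your-zero-etl-function'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    TreatMissingData='notBreaching'
)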
Best Practices and Considerations
When implementing Zero-ETL patterns with DynamoDB and OpenSearch, keep these best practices in mind:
Security: Use IAM roles with least privilege, encrypt data in transit and at rest
Testing: Thoroughly test your pipeline with various data scenarios and failure modes
Documentation: Document your architecture, including error handling and recovery procedures
Versioning: Version your Lambda functions and OpenSearch mappings
Costs: Regularly review and optimize costs across all components
Final Thoughts
Implementing Zero-ETL patterns between DynamoDB and OpenSearch opens up new real-time search and analytics possibilities.
By eliminating traditional ETL processes, you can:
Reduce data latency from hours to seconds
Simplify your overall architecture
Improve developer productivity
Enable new real-time use cases
While setting up a Zero-ETL pipeline requires careful planning and consideration, the long-term benefits in terms of agility, performance, and reduced maintenance make it a worthwhile investment for many organizations.
As you embark on your Zero-ETL journey, remember that the goal is not just to move data more efficiently, but to enable your organization to make better, faster decisions based on the most current information available.
Next Steps
To get started with your own Zero-ETL implementation:
Evaluate your current data flow and identify opportunities for real-time processing
Start with a pilot project using non-critical data
Iterate and refine based on lessons learned
Gradually expand to more critical data flows
Hiring doesn’t have to slow you down.
Lemon.io delivers pre-vetted developers who integrate seamlessly into your team—fast.
That’s it for today!
Did you enjoy this newsletter issue?
Share with your friends, colleagues, and your favorite social media platform.
Until next week — Amrut
Whenever you’re ready, there are 4 ways I can help you:
NEW! Get certified as an AWS AI Practitioner in 2025. Sign up today to elevate your cloud skills. (link)
Are you thinking about getting certified as a Google Cloud Digital Leader?
Here’s a link to my Udemy course, which has helped 628+ students prepare for and pass the exam. Currently rated 4.37/5. (link)
Free guides and helpful resources: https://thecloudplaybook.gumroad.com/
Sponsor The Cloud Playbook Newsletter:
https://www.thecloudplaybook.com/p/sponsor-the-cloud-playbook-newsletter
Get in touch
You can find me on LinkedIn or X.
If you wish to request a topic you would like to read, you can contact me directly via LinkedIn or X.