Shell Functions as Lambda

The Complete Journey to Sub-25ms Cold Starts

What started as a simple question

Can we run shell scripts as Lambda functions?

became a deep dive into Lambda performance optimization. Through months of benchmarking and iteration, we discovered an architecture that delivers sub-25ms cold starts while maintaining the simplicity that makes shell scripting powerful.

This is the complete story of that journey.

The Problem: Shell Scripts Meet Serverless

Shell scripts are perfect for automation, data processing, and system integration. They’re readable, maintainable, and leverage decades of Unix tooling. But AWS Lambda doesn’t natively support shell scripts.

The challenge: How do you run shell functions in Lambda without sacrificing performance?

The Initial Solution: Custom Runtime

Our first approach was building a custom Lambda runtime specifically for shell functions. This involved:

Creating a container image with our custom runtime
Publishing it to AWS Lambda’s runtime ecosystem
Packaging shell functions with the runtime

The custom runtime worked well, delivering ~30ms cold starts. But it raised a question: “Do we really need a custom runtime, or can we achieve the same results with AWS’s provided runtimes?”

This question sparked the optimization journey that follows.

Chapter 1: Pure Bash (The Naive Approach)

Our first attempt used pure Bash for everything, including Lambda Runtime API communication:

#!/bin/bash
API="http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation"

while true; do
  headers=$(mktemp)

  # Get next invocation (blocks until an event arrives); headers go to a file, body to stdout
  event=$(curl -sS -D "$headers" "$API/next")
  request_id=$(grep -Fi lambda-runtime-aws-request-id "$headers" | tr -d '[:space:]' | cut -d: -f2)
  rm -f "$headers"

  # Process event
  result=$(process_event "$event")

  # Send response
  curl -sS -X POST "$API/$request_id/response" -d "$result"
done

Problems:

Heavy process spawning for every HTTP call
JSON parsing with shell tools
Significant overhead per invocation

Performance: ~80-100ms cold starts

Verdict: Functional but too slow for production.

Chapter 2: Hybrid Architecture (Go + Bash)

Instead of maintaining a custom runtime, we experimented with using AWS’s provided.al2023 runtime and bringing our own bootstrap. The breakthrough came from separating concerns:

Go binary: Handle Lambda Runtime API (fast HTTP client)
Shell functions: Handle business logic (simple scripting)

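// Go handles the heavy lifting: the Lambda Runtime API (fast HTTP client)
func (c *runtimeAPIClient) getNextInvocation() (string, []byte, error) {
    resp, err := c.httpClient.Get(c.baseURL + "next")
    if err != nil {
        return "", nil, err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    return resp.Header.Get("Lambda-Runtime-Aws-Request-Id"), body, err
}

// Shell handles the logic: source the handler file, call the handler function,
// and pass the event on stdin. File and function names are passed as positional
// parameters ($1, $2) instead of being spliced into the command string.
func executeShellHandler(handlerFile, handlerFunc string, eventData []byte) ([]byte, error) {
    cmd := exec.Command("bash", "-c", `source "$1" && "$2"`, "bash", handlerFile, handlerFunc)
    cmd.Stdin = bytes.NewReader(eventData)
    return cmd.Output()
}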

Performance: ~42ms cold starts

Verdict: Major improvement, but we can do better.

Chapter 3: Container Images vs ZIP Packages

Conventional wisdom says ZIP packages are always faster. We tested this assumption with identical 5.7MB Go bootstrap binaries.

The Benchmark Results

ZIP Package Performance:

{
  "init_average_ms": 42.61,
  "p20_ms": 40.41,
  "p80_ms": 47.30
}

Container Image Performance:

{
  "init_average_ms": 33.51,
  "p20_ms": 25.71,
  "p80_ms": 36.46
}

Key Finding: Container images were 21-36% faster on the average and on both reported percentiles (P20 and P80).
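The results are reported as an average plus percentiles of init time. The post doesn't say how they were collected; one plausible approach, sketched here as an assumption, reads the "Init Duration: X ms" field that Lambda appends to the REPORT log line of cold-start invocations, then computes the mean and nearest-rank percentiles. The function names are hypothetical.

```go
package main

import (
	"math"
	"regexp"
	"sort"
	"strconv"
)

// initDurationRe matches the Init Duration field of a Lambda REPORT log line.
var initDurationRe = regexp.MustCompile(`Init Duration: ([0-9.]+) ms`)

// parseInitDurations extracts cold-start init durations (ms) from REPORT lines.
// Warm invocations carry no Init Duration field and are skipped.
func parseInitDurations(lines []string) []float64 {
	var out []float64
	for _, line := range lines {
		m := initDurationRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		if v, err := strconv.ParseFloat(m[1], 64); err == nil {
			out = append(out, v)
		}
	}
	return out
}

// summarize returns the mean and the nearest-rank percentile for each p in ps.
func summarize(samples []float64, ps ...float64) (mean float64, pct map[float64]float64) {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	pct = make(map[float64]float64)
	if len(sorted) == 0 {
		return 0, pct
	}
	for _, v := range sorted {
		mean += v
	}
	mean /= float64(len(sorted))
	for _, p := range ps {
		// Nearest rank: smallest sample with at least p% of samples at or below it.
		rank := int(math.Ceil(p * float64(len(sorted)) / 100))
		if rank < 1 {
			rank = 1
		}
		pct[p] = sorted[rank-1]
	}
	return mean, pct
}
```

Fed REPORT lines exported from CloudWatch Logs, summarize(parseInitDurations(lines), 20, 40, 60, 80) yields numbers in the same shape as the JSON results above. Other percentile definitions (for example linear interpolation) give slightly different values.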

Why Container Images Won (likely explanations):

Pre-built layers avoid downloading and extracting a ZIP archive at cold start
Lambda's container infrastructure caches image layers efficiently
Less I/O during cold start initialization

Verdict: Container images outperform ZIP packages for larger runtimes (>2-3MB).

Chapter 4: Raw TCP Socket Optimization

The Go net/http package is robust but heavy (~3MB). What if we used raw TCP sockets for the Lambda Runtime API?

// Before: Heavy HTTP client
resp, err := c.httpClient.Get(c.baseURL + "next")

// After: Raw TCP socket
conn, err := net.Dial("tcp", c.host)
fmt.Fprintf(conn, "GET /2018-06-01/runtime/invocation/next HTTP/1.1\r\nHost: %s\r\n\r\n", c.host)

Results:

Binary size: 5.7MB → 2.3MB (60% reduction)
Cold start: 42ms → 21.31ms (50% faster)

Performance Breakdown:

{
  "init_average_ms": 21.31,
  "p20_ms": 19.99,
  "p40_ms": 20.24,
  "p60_ms": 20.82,
  "p80_ms": 23.74
}

Verdict: Raw TCP sockets delivered the biggest performance gain.

Chapter 5: The UPX Compression Trap

With a 2.3MB binary, UPX compression seemed like the obvious next step:

Binary size: 2.3MB → 676KB (70% reduction)
Expected result: Even faster cold starts

Actual Results:

{
  "init_average_ms": 56.01,
  "p20_ms": 51.51,
  "p80_ms": 60.24
}

The Trap: UPX made performance about 31% worse than the original HTTP client (56.01ms vs 42.61ms average init)!

Why UPX Backfired:

Decompression overhead: ~35ms CPU penalty per cold start
ARM64 architecture: Slower decompression on Lambda’s ARM processors
Memory pressure: Additional allocation during initialization
No caching benefit: Each container instance decompresses independently

Critical Insight: In Lambda’s execution model, a 70% file size reduction led to a 160% performance degradation.

Verdict: Runtime CPU cost matters more than file size; a smaller binary is not automatically a faster one.

Chapter 6: Runtime-as-a-Layer Pattern

The final optimization came from architectural thinking: What if we separate the runtime from the business logic entirely?

Architecture:

runtime-layer.zip (2.3MB)
└── bootstrap (Optimized Go runtime)

function.zip (~1KB)  
└── handler.sh (Shell business logic)

Implementation:

# Deploy runtime layer once
module "shell_runtime_layer" {
  source      = "git::https://github.com/ql4b/terraform-aws-lambda-layer.git?ref=v1.0.0"
  
  name       = "shell-runtime"
  source_dir = "./runtime/build"
  
  compatible_architectures = ["arm64"]
  compatible_runtimes      = ["provided.al2023"]
}

# Deploy multiple functions using the layer
module "my_function" {
  source = "git::https://github.com/ql4b/terraform-aws-lambda-function.git"
  
  name       = "my-shell-function"
  source_dir = "./app/src"
  
  layers = [module.shell_runtime_layer.layer_arn]
}

Performance Results:

{
  "init_average_ms": 21.80,
  "p20_ms": 20.28,
  "p40_ms": 20.56,
  "p60_ms": 21.06,
  "p80_ms": 23.88
}

The Surprise: The runtime layer added a negligible penalty compared to the monolithic function: 21.80ms versus 21.31ms average init, a difference of about 0.5ms.

The Complete Performance Journey

Approach	Binary Size	Avg Init	P80	Key Insight
Pure Bash	~100KB	~90ms	~120ms	Process spawning kills performance
Go + Bash (HTTP)	5.7MB	42.61ms	47.30ms	Hybrid architecture works
Container Image	5.7MB	33.51ms	36.46ms	Containers beat ZIP for large runtimes
Raw TCP	2.3MB	21.31ms	23.74ms	Code simplicity > package complexity
Raw TCP + UPX	676KB	56.01ms	60.24ms	Compression can backfire
Runtime Layer	2.3MB + 1KB	21.80ms	23.88ms	Perfect architecture
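The percentages quoted in the chapters are relative changes between rows of this table. A tiny helper, sketched here with hypothetical names, makes that arithmetic reproducible; for package sizes it assumes 1MB = 1000KB.

```go
package main

import "math"

// pctChange returns the relative change from before to after, in percent.
// Negative values mean "after" is smaller (faster init, smaller file).
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// roundTo rounds x to the given number of decimal places for display.
func roundTo(x float64, places int) float64 {
	scale := math.Pow(10, float64(places))
	return math.Round(x*scale) / scale
}
```

For example, roundTo(pctChange(42.61, 21.31), 1) is -50.0 (raw TCP vs the HTTP client), roundTo(pctChange(21.31, 56.01), 1) is 162.8 (UPX vs raw TCP), roundTo(pctChange(21.31, 21.80), 1) is 2.3 (runtime layer vs monolithic raw TCP), and roundTo(pctChange(2300, 1), 2) is -99.96 (a 1KB function package vs the 2.3MB runtime).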

The Final Architecture: Runtime-as-a-Layer

The optimal solution combines all our learnings:

Runtime Layer (Deployed Once)

Optimized Go bootstrap with raw TCP sockets
2.3MB binary with aggressive build flags
Shared across functions - deploy once, use everywhere

Function Packages (Deployed Per Function)

Pure shell scripts - just business logic
~1KB packages - 99.96% size reduction
Fast deployments - only redeploy when logic changes

Performance Characteristics

21.80ms average cold starts - sub-25ms consistently
Negligible layer overhead - within about 0.5ms of monolithic performance
Predictable latency - tight P20-P80 range (20-24ms)

Benefits of the Final Architecture

For Developers

Shell simplicity: Write functions in familiar Bash
Fast iteration: Deploy only business logic changes
Clean separation: Runtime and logic evolve independently

For Operations

Shared runtime: One layer serves multiple functions
Tiny packages: 99.96% reduction in function package size
Cost efficiency: Reduced storage and transfer costs
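Maintenance: Runtime updates don't require repackaging function code; because layer versions are immutable, functions only need to point at the new layer version ARN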

For Performance

Sub-25ms cold starts: Consistently fast initialization
Predictable latency: Tight performance variance
Scalable architecture: Negligible performance penalty for separation

Implementation Guide

1. Build the Runtime Layer

# Optimize Go bootstrap
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build \
  -ldflags="-s -w -extldflags '-static'" \
  -trimpath \
  -o bootstrap main.go

2. Deploy with Terraform

# Runtime layer (deploy once)
module "shell_runtime" {
  source = "git::https://github.com/ql4b/terraform-aws-lambda-layer.git"
  
  name       = "shell-runtime"
  source_dir = "./runtime/build"
}

# Shell function (deploy per function)
module "my_shell_function" {
  source = "git::https://github.com/ql4b/terraform-aws-lambda-function.git"
  
  name       = "data-processor"
  source_dir = "./src"
  
  runtime      = "provided.al2023"
  handler      = "handler.process"
  architecture = "arm64"
  
  layers = [module.shell_runtime.layer_arn]
}

3. Write Shell Functions

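#!/bin/bash
# handler.sh

process() {
  # The event arrives as $1 or, when no argument is given, on stdin
  # (the Go executeShellHandler shown earlier writes it to stdin)
  local event="${1:-$(cat)}"

  # Your shell logic here (jq is not bundled with provided.al2023;
  # ship it with the function or in a utility layer)
  printf '%s' "$event" | jq '.records | length'
}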
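With handler = "handler.process", the bootstrap (extracted from the layer into /opt) has to find handler.sh in the function package and call process. Lambda exposes the configured handler as the _HANDLER environment variable and the function's code directory as LAMBDA_TASK_ROOT (normally /var/task). A minimal sketch, assuming a <file>.<function> convention with a .sh suffix; resolveHandler is hypothetical and not taken from the author's runtime.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveHandler maps a handler string such as "handler.process" to the shell
// file to source and the function to call. Assumption: the text before the
// last dot names a .sh file in taskRoot; the text after it names the function.
func resolveHandler(handler, taskRoot string) (file, fn string, err error) {
	i := strings.LastIndex(handler, ".")
	if i <= 0 || i == len(handler)-1 {
		return "", "", fmt.Errorf("handler %q must look like <file>.<function>", handler)
	}
	return filepath.Join(taskRoot, handler[:i]+".sh"), handler[i+1:], nil
}
```

At init, a bootstrap could call resolveHandler(os.Getenv("_HANDLER"), os.Getenv("LAMBDA_TASK_ROOT")) once and reuse the result for every invocation.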

Lessons Learned

Performance Optimization

Measure everything - Assumptions about performance are often wrong
Code simplicity wins - Raw TCP beats heavy HTTP libraries
Compression can backfire - CPU overhead > size benefits
Architecture matters - Layer separation costs only about 0.5ms

Lambda Insights

Container images aren’t always slower - They win for larger runtimes
Layers are highly optimized - AWS handles layer loading efficiently
ARM64 is different - Decompression performance varies by architecture
Size thresholds exist - Different optimizations work at different scales

Development Philosophy

Systematic experimentation beats intuition
Benchmark real scenarios - Synthetic tests miss important details
Challenge conventional wisdom - “ZIP is always faster” isn’t always true
Separate concerns cleanly - Runtime vs business logic

When You Need More: Utilities and Custom Runtimes

While the runtime-as-a-layer pattern is optimal for pure shell logic, real-world shell functions often need additional utilities not included in provided.al2023.

Missing Utilities Strategy

When your shell functions need specific programs or utilities:

Option 1: Utility Layers

Package missing utilities as additional Lambda layers. This approach works well for:

Common tools like jq, curl, aws-cli
Compiled binaries that don’t require complex dependencies
Utilities that can be shared across multiple functions

module "my_function" {
  source = "./terraform-aws-lambda-function"
  # name, source_dir, runtime, handler, etc. as in the earlier example
  
  layers = [
    module.shell_runtime.layer_arn,    # Go bootstrap
    module.utilities.layer_arn         # jq, curl, etc.
  ]
}
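Layers are extracted under /opt, and per the Lambda layer documentation /opt/bin is on PATH, so a utility layer that ships bin/jq makes jq callable by name from the handler. A bootstrap could verify this once at init instead of failing mid-invocation. A minimal sketch with hypothetical helper names; the tool list is just an example.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// missingTools returns the names from required that cannot be found on PATH.
func missingTools(required []string) []string {
	var missing []string
	for _, name := range required {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

// checkTools turns missingTools into a single init-time error.
func checkTools(required ...string) error {
	if m := missingTools(required); len(m) > 0 {
		return fmt.Errorf("missing utilities (is the utility layer attached?): %s", strings.Join(m, ", "))
	}
	return nil
}
```

For example, checkTools("jq") during init; on error, a runtime could report it through the Runtime API's /runtime/init/error endpoint and exit.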

For a deep dive on this approach, see Lambda Layers Breakthrough

Option 2: Custom Runtime

When you need:

Complex system dependencies
Specific OS configurations
Tightly integrated toolchains
Full control over the runtime environment

In these cases, a custom runtime container image may be more convenient than managing multiple layers.

Decision Framework

Need	Solution	Trade-off
Pure shell logic	Runtime layer	Optimal performance
Common utilities	Utility layers	Shared, modular
Complex dependencies	Custom runtime	Full control, larger packages
System configuration	Custom runtime	Flexibility, deployment complexity

The runtime-as-a-layer pattern excels for shell logic, but don’t hesitate to use custom runtimes when your use case demands it.

The Cloudless Way

This journey embodies the cloudless philosophy:

Start simple - Pure Bash was the right first step
Measure and iterate - Each optimization was data-driven
Embrace constraints - Lambda’s limitations drove creative solutions
Build composable pieces - Runtime layers enable reuse
Optimize for clarity - Shell scripts remain readable and maintainable

Conclusion

What began as a custom Lambda runtime project became a masterclass in performance optimization and architectural design. Through systematic experimentation, we discovered that we could achieve better performance using AWS’s standard provided.al2023 runtime than our custom container image.

Key discoveries:

Hybrid architectures work - Go for performance, shell for logic
Container images can outperform ZIP packages - for the right use cases
Raw protocols beat abstractions - when performance matters
Compression isn’t always better - CPU overhead can dominate
Layer architecture is optimal - separation at negligible performance cost

The final runtime-as-a-layer pattern delivers:

Sub-25ms cold starts consistently
99.96% package size reduction for functions
Clean architectural separation between runtime and logic
Production-ready performance with development simplicity

Shell functions in Lambda aren’t just possible—they’re fast, efficient, and maintainable. Sometimes the best way to solve a complex problem is to systematically experiment your way to simplicity.

Want to implement shell functions in your Lambda architecture? Check out the terraform-aws-lambda-layer and terraform-aws-lambda-function modules, and explore the complete benchmark data from this journey.

A Terraform Module for Shell Functions on Lambda — the production-ready Terraform module
Lambda Container Images vs ZIP: The UPX Trap — deep dive into packaging benchmarks
Test Lambda Functions Locally — run functions on your laptop with the RIE
Lambda Custom Runtime for Shell Scripts — the original container image approach