by skunxicat

Shell Functions as Lambda

The Complete Journey to Sub-25ms Cold Starts

What started as a simple question, “Can we run shell scripts as Lambda functions?”, became a deep dive into Lambda performance optimization. Through months of benchmarking and iteration, we discovered an architecture that delivers sub-25ms cold starts while maintaining the simplicity that makes shell scripting powerful.

This is the complete story of that journey.

The Problem: Shell Scripts Meet Serverless

Shell scripts are perfect for automation, data processing, and system integration. They’re readable, maintainable, and leverage decades of Unix tooling. But AWS Lambda doesn’t natively support shell scripts.

The challenge: How do you run shell functions in Lambda without sacrificing performance?

The Initial Solution: Custom Runtime

Our first approach was building a custom Lambda runtime specifically for shell functions. This involved:

  • Creating a container image with our custom runtime
  • Publishing it to AWS Lambda’s runtime ecosystem
  • Packaging shell functions with the runtime

The custom runtime worked well, delivering ~30ms cold starts. But it raised a question: “Do we really need a custom runtime, or can we achieve the same results with AWS’s provided runtimes?”

This question sparked the optimization journey that follows.

Chapter 1: Pure Bash (The Naive Approach)

Our first attempt used pure Bash for everything, including Lambda Runtime API communication:

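#!/bin/bash
# process_event is assumed to be defined in handler.sh
source "$LAMBDA_TASK_ROOT/handler.sh"

API="http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation"
headers=$(mktemp)

while true; do
  # Get next invocation: body into $event, response headers into $headers
  event=$(curl -sS -D "$headers" "$API/next")
  # Header lines end in CRLF; keep only the value, without spaces or \r
  request_id=$(grep -i '^lambda-runtime-aws-request-id:' "$headers" | cut -d: -f2 | tr -d '[:space:]')
  
  # Process event
  result=$(process_event "$event")
  
  # Send response
  curl -sS -X POST "$API/$request_id/response" -d "$result" > /dev/null
done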

Problems:

  • Heavy process spawning for every HTTP call
  • JSON parsing with shell tools
  • Significant overhead per invocation
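Process-spawn overhead is straightforward to quantify. The loop above starts at least four external processes per invocation (`curl` twice, `grep`, `cut`), and each one pays a fork+exec+wait. A minimal sketch for measuring that per-process floor, assuming a Linux host with `true` on `PATH`; `avgSpawnCost` is a hypothetical helper, not part of any runtime:

```go
package main

import (
	"os/exec"
	"time"
)

// avgSpawnCost starts a trivial command n times and returns the mean wall-clock
// cost of one fork+exec+wait. Absolute numbers depend on the machine, memory
// size and CPU share, so only compare runs made in the same environment.
func avgSpawnCost(name string, n int) (time.Duration, error) {
	if n < 1 {
		n = 1
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := exec.Command(name).Run(); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}
```

Multiplying the measured cost by the number of processes the loop starts gives a lower bound on per-invocation overhead, before any network time is counted.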

Performance: ~80-100ms cold starts

Verdict: Functional but too slow for production.

Chapter 2: Hybrid Architecture (Go + Bash)

Instead of maintaining a custom runtime, we experimented with using AWS’s provided.al2023 runtime and bringing our own bootstrap. The breakthrough came from separating concerns:

  • Go binary: Handle Lambda Runtime API (fast HTTP client)
  • Shell functions: Handle business logic (simple scripting)
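
// Go handles the heavy lifting: long-poll the Runtime API for the next event
func (c *runtimeAPIClient) getNextInvocation() (string, []byte, error) {
    resp, err := c.httpClient.Get(c.baseURL + "next")
    if err != nil {
        return "", nil, err
    }
    defer resp.Body.Close()
    event, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", nil, err
    }
    // The request ID is needed to post this invocation's response
    return resp.Header.Get("Lambda-Runtime-Aws-Request-Id"), event, nil
}

// Shell handles the logic: source the handler file, call the function with the
// event on stdin, and return its stdout. File and function are passed as
// positional parameters ($1, $2) instead of being spliced into the script text.
func executeShellHandler(handlerFile, handlerFunc string, eventData []byte) ([]byte, error) {
    cmd := exec.Command("bash", "-c", `source "$1" && "$2"`, "bash", handlerFile, handlerFunc)
    cmd.Stdin = bytes.NewReader(eventData)
    return cmd.Output()
}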

Performance: ~42ms cold starts

Verdict: Major improvement, but we can do better.

Chapter 3: Container Images vs ZIP Packages

Conventional wisdom says ZIP packages are always faster. We tested this assumption with identical ~5.7MB Go bootstrap binaries.

The Benchmark Results

ZIP Package Performance:

{
  "init_average_ms": 42.61,
  "p20_ms": 40.41,
  "p80_ms": 47.30
}

Container Image Performance:

{
  "init_average_ms": 33.51,
  "p20_ms": 25.71,
  "p80_ms": 36.46
}

Key Finding: Container images were 21-36% faster on every reported metric (average, P20 and P80).

Why Container Images Won:

  • Pre-built layers eliminate S3 download + extraction overhead
  • Lambda’s container infrastructure efficiently caches layers
  • No runtime I/O during cold start initialization

Verdict: Container images outperform ZIP packages for larger runtimes (>2-3MB).

Chapter 4: Raw TCP Socket Optimization

The Go net/http package is robust but heavy (~3MB). What if we used raw TCP sockets for the Lambda Runtime API?

// Before: Heavy HTTP client
resp, err := c.httpClient.Get(c.baseURL + "next")

// After: Raw TCP socket
conn, err := net.Dial("tcp", c.host)
fmt.Fprintf(conn, "GET /2018-06-01/runtime/invocation/next HTTP/1.1\r\nHost: %s\r\n\r\n", c.host)

Results:

  • Binary size: 5.7MB → 2.3MB (60% reduction)
  • Cold start: 42ms → 21.31ms (50% faster)

Performance Breakdown:

{
  "init_average_ms": 21.31,
  "p20_ms": 19.99,
  "p40_ms": 20.24,
  "p60_ms": 20.82,
  "p80_ms": 23.74
}
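Init durations like these can be collected from the `REPORT` lines Lambda writes to CloudWatch Logs, which carry an `Init Duration: <n> ms` field only when the invocation included a cold start. A sketch, with `initDurations` as a hypothetical helper that works on raw log text:

```go
package main

import (
	"regexp"
	"strconv"
	"strings"
)

// initDurationRE captures the number in "Init Duration: 21.31 ms".
var initDurationRE = regexp.MustCompile(`Init Duration: ([0-9.]+) ms`)

// initDurations extracts every Init Duration (in milliseconds) from REPORT lines.
// Warm invocations have no Init Duration field and are skipped.
func initDurations(logs string) ([]float64, error) {
	var out []float64
	for _, line := range strings.Split(logs, "\n") {
		if !strings.HasPrefix(line, "REPORT") {
			continue
		}
		m := initDurationRE.FindStringSubmatch(line)
		if m == nil {
			continue // warm start
		}
		ms, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			return nil, err
		}
		out = append(out, ms)
	}
	return out, nil
}
```

Averaging and taking percentiles of the returned slice yields summaries like the breakdown above.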

Verdict: Raw TCP sockets delivered the biggest performance gain.

Chapter 5: The UPX Compression Trap

With a 2.3MB binary, UPX compression seemed like the obvious next step:

  • Binary size: 2.3MB → 676KB (70% reduction)
  • Expected result: Even faster cold starts

Actual Results:

{
  "init_average_ms": 56.01,
  "p20_ms": 51.51,
  "p80_ms": 60.24
}

The Trap: UPX made cold starts about 31% slower than even the original HTTP client (56.01ms vs 42.61ms)!

Why UPX Backfired:

  1. Decompression overhead: ~35ms CPU penalty per cold start
  2. ARM64 architecture: Slower decompression on Lambda’s ARM processors
  3. Memory pressure: Additional allocation during initialization
  4. No caching benefit: Each container instance decompresses independently

Critical Insight: In Lambda’s execution model, a 70% file size reduction led to a ~160% longer cold start.
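The arithmetic behind this insight, taking 2.3MB as roughly 2300KB:

```latex
\begin{aligned}
\text{size reduction} &= 1 - \frac{676\,\text{KB}}{2300\,\text{KB}} \approx 0.71 \\
\text{init penalty} &= 56.01\,\text{ms} - 21.31\,\text{ms} = 34.70\,\text{ms} \\
\text{relative slowdown} &= \frac{56.01 - 21.31}{21.31} \approx 1.63 \quad (\approx 163\%)
\end{aligned}
```

The 34.70ms penalty lines up with the ~35ms decompression overhead listed above.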

Verdict: Code simplicity beats file size optimization.

Chapter 6: Runtime-as-a-Layer Pattern

The final optimization came from architectural thinking: What if we separate the runtime from the business logic entirely?

Architecture:

runtime-layer.zip (2.3MB)
└── bootstrap (Optimized Go runtime)

function.zip (~1KB)  
└── handler.sh (Shell business logic)
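With this split, Lambda extracts the layer under `/opt` (so the layer's `bootstrap` lands at `/opt/bootstrap`) and the function package under `LAMBDA_TASK_ROOT`, normally `/var/task`. The bootstrap then has to turn the configured handler, exposed in the `_HANDLER` environment variable, into a script and a function name. A sketch, assuming the `<file>.<function>` convention suggested by `handler = "handler.process"` and `handler.sh`; `resolveHandler` is hypothetical:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveHandler maps a handler setting such as "handler.process" to the script
// to source and the shell function to call. Assumption: "<file>.<function>"
// refers to <file>.sh inside the function's task root.
func resolveHandler(taskRoot, handler string) (string, string, error) {
	base, fn, ok := strings.Cut(handler, ".")
	if !ok || base == "" || fn == "" {
		return "", "", fmt.Errorf("handler %q is not in <file>.<function> form", handler)
	}
	return filepath.Join(taskRoot, base+".sh"), fn, nil
}
```

Called as `resolveHandler(os.Getenv("LAMBDA_TASK_ROOT"), os.Getenv("_HANDLER"))` during init, a malformed handler string is caught before the first invocation.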

Implementation:

# Deploy runtime layer once
module "shell_runtime_layer" {
  source      = "git::https://github.com/ql4b/terraform-aws-lambda-layer.git?ref=v1.0.0"
  
  name       = "shell-runtime"
  source_dir = "./runtime/build"
  
  compatible_architectures = ["arm64"]
  compatible_runtimes      = ["provided.al2023"]
}

# Deploy multiple functions using the layer
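module "my_function" {
  source = "git::https://github.com/ql4b/terraform-aws-lambda-function.git"
  
  name       = "my-shell-function"
  source_dir = "./app/src"
  
  # Must match the layer: provided.al2023 on arm64
  runtime      = "provided.al2023"
  architecture = "arm64"
  
  layers = [module.shell_runtime_layer.layer_arn]
}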

Performance Results:

{
  "init_average_ms": 21.80,
  "p20_ms": 20.28,
  "p40_ms": 20.56,
  "p60_ms": 21.06,
  "p80_ms": 23.88
}

The Surprise: The runtime layer added only ~0.5ms (about 2%) over the monolithic raw TCP package (21.80ms vs 21.31ms), well inside the 20-24ms P20-P80 spread.

The Complete Performance Journey

Approach | Binary Size | Avg Init | P80 | Key Insight
--- | --- | --- | --- | ---
Pure Bash | ~100KB | ~90ms | ~120ms | Process spawning kills performance
Go + Bash (HTTP) | 5.7MB | 42.61ms | 47.30ms | Hybrid architecture works
Container Image | 5.7MB | 33.51ms | 36.46ms | Containers beat ZIP for large runtimes
Raw TCP | 2.3MB | 21.31ms | 23.74ms | Code simplicity > package complexity
Raw TCP + UPX | 676KB | 56.01ms | 60.24ms | Compression can backfire
Runtime Layer | 2.3MB + 1KB | 21.80ms | 23.88ms | Perfect architecture

The Final Architecture: Runtime-as-a-Layer

The optimal solution combines all our learnings:

Runtime Layer (Deployed Once)

  • Optimized Go bootstrap with raw TCP sockets
  • 2.3MB binary with aggressive build flags
  • Shared across functions - deploy once, use everywhere

Function Packages (Deployed Per Function)

  • Pure shell scripts - just business logic
  • ~1KB packages - 99.96% size reduction
  • Fast deployments - only redeploy when logic changes

Performance Characteristics

  • 21.80ms average cold starts - P80 of 23.88ms, so sub-25ms for the large majority of cold starts
  • Negligible layer overhead - within ~0.5ms of monolithic performance
  • Predictable latency - tight P20-P80 range (20-24ms)

Benefits of the Final Architecture

For Developers

  • Shell simplicity: Write functions in familiar Bash
  • Fast iteration: Deploy only business logic changes
  • Clean separation: Runtime and logic evolve independently

For Operations

  • Shared runtime: One layer serves multiple functions
  • Tiny packages: 99.96% reduction in function package size
  • Cost efficiency: Reduced storage and transfer costs
  • Maintenance: Runtime updates don’t require repackaging function code; functions only need to point at the new layer version

For Performance

  • Sub-25ms cold starts: Consistently fast initialization
  • Predictable latency: Tight performance variance
  • Scalable architecture: Negligible (~0.5ms) performance penalty for separation

Implementation Guide

1. Build the Runtime Layer

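# Optimize Go bootstrap: static linux/arm64 binary (CGO disabled),
# symbol table and DWARF stripped (-s -w), local paths removed (-trimpath)
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build \
  -ldflags="-s -w -extldflags '-static'" \
  -trimpath \
  -o bootstrap main.go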

2. Deploy with Terraform

# Runtime layer (deploy once)
module "shell_runtime" {
  source = "git::https://github.com/ql4b/terraform-aws-lambda-layer.git"
  
  name       = "shell-runtime"
  source_dir = "./runtime/build"
}

# Shell function (deploy per function)
module "my_shell_function" {
  source = "git::https://github.com/ql4b/terraform-aws-lambda-function.git"
  
  name       = "data-processor"
  source_dir = "./src"
  
  runtime      = "provided.al2023"
  handler      = "handler.process"
  architecture = "arm64"
  
  layers = [module.shell_runtime.layer_arn]
}

3. Write Shell Functions

#!/bin/bash
# handler.sh

process() {
  local event="$1"
  
  # Your shell logic here (jq is not part of provided.al2023; ship it in a utility layer)
  printf '%s\n' "$event" | jq '.records | length'
}
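This handler reads its event from `$1`, while the Chapter 2 runtime code feeds the event on stdin. A bootstrap that supports both could look like the sketch below; passing the event as an argument is an assumption made to match this handler, and `runHandler` is a hypothetical name. Linux caps a single command-line argument at 128 KiB, so very large events would have to use the stdin path only:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// runHandler sources handlerFile in a fresh bash process and calls handlerFunc,
// passing the event both as $1 and on stdin. Stdout becomes the response; on a
// non-zero exit, the captured stderr is folded into the returned error.
func runHandler(handlerFile, handlerFunc string, event []byte) ([]byte, error) {
	// With `bash -c script a0 a1 a2 a3`, a0 becomes $0 and the rest $1, $2, $3.
	cmd := exec.Command("bash", "-c", `source "$1" && "$2" "$3"`,
		"bootstrap", handlerFile, handlerFunc, string(event))
	cmd.Stdin = bytes.NewReader(event)
	out, err := cmd.Output() // Output fills ExitError.Stderr when cmd.Stderr is nil
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return nil, fmt.Errorf("%s exited with code %d: %s",
			handlerFunc, exitErr.ExitCode(), bytes.TrimSpace(exitErr.Stderr))
	}
	return out, err
}
```

Keeping stderr separate from stdout means log lines written with `>&2` never leak into the function response.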

Lessons Learned

Performance Optimization

  1. Measure everything - Assumptions about performance are often wrong
  2. Code simplicity wins - Raw TCP beats heavy HTTP libraries
  3. Compression can backfire - CPU overhead > size benefits
  4. Architecture matters - Layer separation costs only ~0.5ms of init time

Lambda Insights

  1. Container images aren’t always slower - They win for larger runtimes
  2. Layers are highly optimized - AWS handles layer loading efficiently
  3. ARM64 is different - Decompression performance varies by architecture
  4. Size thresholds exist - Different optimizations work at different scales

Development Philosophy

  1. Systematic experimentation beats intuition
  2. Benchmark real scenarios - Synthetic tests miss important details
  3. Challenge conventional wisdom - “ZIP is always faster” isn’t always true
  4. Separate concerns cleanly - Runtime vs business logic

When You Need More: Utilities and Custom Runtimes

While the runtime-as-a-layer pattern is optimal for pure shell logic, real-world shell functions often need additional utilities not included in provided.al2023.

Missing Utilities Strategy

When your shell functions need specific programs or utilities:

Option 1: Utility Layers

Package missing utilities as additional Lambda layers. This approach works well for:

  • Common tools like jq, curl, aws-cli
  • Compiled binaries that don’t require complex dependencies
  • Utilities that can be shared across multiple functions

module "my_function" {
  source = "./terraform-aws-lambda-function"
  
  layers = [
    module.shell_runtime.layer_arn,    # Go bootstrap
    module.utilities.layer_arn         # jq, curl, etc.
  ]
}

For a deep dive on this approach, see Lambda Layers Breakthrough.
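Before the first invocation, a bootstrap can check that the utilities its handlers rely on were actually delivered by the attached layers. Lambda extracts layers under `/opt` and includes `/opt/bin` in `PATH`, so a utility layer that ships `bin/jq` makes `jq` resolvable. A sketch with a hypothetical `missingTools` helper:

```go
package main

import "os/exec"

// missingTools reports which of the named executables cannot be found on PATH.
func missingTools(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}
```

If the list is non-empty, the bootstrap could report it through the Runtime API's `/runtime/init/error` endpoint instead of failing later inside a handler.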

Option 2: Custom Runtime

When you need:

  • Complex system dependencies
  • Specific OS configurations
  • Tightly integrated toolchains
  • Full control over the runtime environment

In these cases, a custom runtime container image may be more convenient than managing multiple layers.

Decision Framework

Need | Solution | Trade-off
--- | --- | ---
Pure shell logic | Runtime layer | Optimal performance
Common utilities | Utility layers | Shared, modular
Complex dependencies | Custom runtime | Full control, larger packages
System configuration | Custom runtime | Flexibility, deployment complexity

The runtime-as-a-layer pattern excels for shell logic, but don’t hesitate to use custom runtimes when your use case demands it.

The Cloudless Way

This journey embodies the cloudless philosophy:

  • Start simple - Pure Bash was the right first step
  • Measure and iterate - Each optimization was data-driven
  • Embrace constraints - Lambda’s limitations drove creative solutions
  • Build composable pieces - Runtime layers enable reuse
  • Optimize for clarity - Shell scripts remain readable and maintainable

Conclusion

What began as a custom Lambda runtime project became a masterclass in performance optimization and architectural design. Through systematic experimentation, we discovered that we could achieve better performance using AWS’s standard provided.al2023 runtime than our custom container image.

Key discoveries:

  1. Hybrid architectures work - Go for performance, shell for logic
  2. Container images can outperform ZIP packages - for the right use cases
  3. Raw protocols beat abstractions - when performance matters
  4. Compression isn’t always better - CPU overhead can dominate
  5. Layer architecture is optimal - separation for only ~0.5ms of extra init time

The final runtime-as-a-layer pattern delivers:

  • Sub-25ms cold starts consistently
  • 99.96% package size reduction for functions
  • Clean architectural separation between runtime and logic
  • Production-ready performance with development simplicity

Shell functions in Lambda aren’t just possible—they’re fast, efficient, and maintainable. Sometimes the best way to solve a complex problem is to systematically experiment your way to simplicity.


Want to implement shell functions in your Lambda architecture? Check out the terraform-aws-lambda-layer and terraform-aws-lambda-function modules, and explore the complete benchmark data from this journey.