Shell Functions as Lambda
The Complete Journey to Sub-25ms Cold Starts
What started as a simple question, “Can we run shell scripts as Lambda functions?”, became a deep dive into Lambda performance optimization. Through months of benchmarking and iteration, we discovered an architecture that delivers sub-25ms cold starts while maintaining the simplicity that makes shell scripting powerful.
This is the complete story of that journey.
The Problem: Shell Scripts Meet Serverless
Shell scripts are perfect for automation, data processing, and system integration. They’re readable, maintainable, and leverage decades of Unix tooling. But AWS Lambda doesn’t natively support shell scripts.
The challenge: How do you run shell functions in Lambda without sacrificing performance?
The Initial Solution: Custom Runtime
Our first approach was building a custom Lambda runtime specifically for shell functions. This involved:
- Creating a container image with our custom runtime
- Publishing it to AWS Lambda’s runtime ecosystem
- Packaging shell functions with the runtime
The custom runtime worked well, delivering ~30ms cold starts. But it raised a question: “Do we really need a custom runtime, or can we achieve the same results with AWS’s provided runtimes?”
This question sparked the optimization journey that follows.
Chapter 1: Pure Bash (The Naive Approach)
Our first attempt used pure Bash for everything, including Lambda Runtime API communication:
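#!/bin/bash
# process_event is the user-supplied handler, defined elsewhere
headers=$(mktemp)
while true; do
  # Get next invocation; -D saves the response headers to a file, because
  # plain `curl -s` prints only the body and the request ID is a header
  event=$(curl -sS -D "$headers" "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  request_id=$(grep -i '^lambda-runtime-aws-request-id:' "$headers" | cut -d: -f2 | tr -d ' \r')
  # Process event
  result=$(process_event "$event")
  # Send response
  curl -sS -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$request_id/response" \
    -d "$result"
done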
Problems:
- Heavy process spawning for every HTTP call
- JSON parsing with shell tools
- Significant overhead per invocation
Performance: ~80-100ms cold starts
Verdict: Functional but too slow for production.
Chapter 2: Hybrid Architecture (Go + Bash)
Instead of maintaining a custom runtime, we experimented with using AWS’s provided.al2023 runtime and bringing our own bootstrap. The breakthrough came from separating concerns:
- Go binary: Handle Lambda Runtime API (fast HTTP client)
- Shell functions: Handle business logic (simple scripting)
// Go handles the heavy lifting
func (c *runtimeAPIClient) getNextInvocation() (string, []byte, error) {
resp, err := c.httpClient.Get(c.baseURL + "next")
// ... fast HTTP processing
}
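// Shell handles the logic
func executeShellHandler(handlerFile, handlerFunc string, eventData []byte) ([]byte, error) {
	// Pass the file and function names as positional parameters ($1, $2)
	// so paths with spaces or shell metacharacters are not re-parsed as code.
	cmd := exec.Command("bash", "-c", `source "$1" && "$2"`, "bash", handlerFile, handlerFunc)
	cmd.Stdin = bytes.NewReader(eventData) // the event reaches the function on stdin
	return cmd.Output()
}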
Performance: ~42ms cold starts
Verdict: Major improvement, but we can do better.
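To make the division of labor concrete, here is a minimal sketch of one full loop iteration: fetch the event, run the shell function, post the result. It uses net/http as in this chapter. The names handleOne and runOnceAgainstFake, the request ID, and the shout handler are all made up for illustration, and the fake server only stands in for the Runtime API; this is not the actual bootstrap source.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

const invocationPath = "/2018-06-01/runtime/invocation/"

// handleOne runs one loop iteration: fetch the next event, run the shell
// function with the event on stdin, then post its stdout as the response.
func handleOne(client *http.Client, baseURL, handlerFile, handlerFunc string) error {
	resp, err := client.Get(baseURL + invocationPath + "next")
	if err != nil {
		return err
	}
	event, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return err
	}
	requestID := resp.Header.Get("Lambda-Runtime-Aws-Request-Id")

	// File and function go in as $1 and $2 so they are not re-parsed as code.
	cmd := exec.Command("bash", "-c", `source "$1" && "$2"`, "bash", handlerFile, handlerFunc)
	cmd.Stdin = bytes.NewReader(event)
	out, err := cmd.Output()
	if err != nil {
		return fmt.Errorf("handler failed: %w", err)
	}

	post, err := client.Post(baseURL+invocationPath+requestID+"/response", "application/json", bytes.NewReader(out))
	if err != nil {
		return err
	}
	return post.Body.Close()
}

// runOnceAgainstFake drives handleOne with an in-process fake Runtime API and
// a temporary handler.sh, returning the path and body the fake received.
func runOnceAgainstFake() (path, body string, err error) {
	type posted struct{ path, body string }
	received := make(chan posted, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasSuffix(r.URL.Path, "/next") {
			w.Header().Set("Lambda-Runtime-Aws-Request-Id", "req-123") // made-up ID
			io.WriteString(w, "hello")
			return
		}
		b, _ := io.ReadAll(r.Body)
		received <- posted{r.URL.Path, string(b)}
		w.WriteHeader(http.StatusAccepted)
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "handler")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	file := filepath.Join(dir, "handler.sh")
	// Hypothetical handler: upper-cases whatever arrives on stdin.
	if err := os.WriteFile(file, []byte("shout() { tr a-z A-Z; }\n"), 0o644); err != nil {
		return "", "", err
	}
	if err := handleOne(srv.Client(), srv.URL, file, "shout"); err != nil {
		return "", "", err
	}
	p := <-received
	return p.path, p.body, nil
}

func main() {
	path, body, err := runOnceAgainstFake()
	fmt.Println(path, body, err)
}
```

A real bootstrap would call handleOne in an endless loop against the address in AWS_LAMBDA_RUNTIME_API, and would report handler failures to the Runtime API's invocation error endpoint instead of returning them.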
Chapter 3: Container Images vs ZIP Packages
Conventional wisdom says ZIP packages are always faster. We tested this assumption with identical 5.7MB Go bootstrap binaries.
The Benchmark Results
ZIP Package Performance:
{
"init_average_ms": 42.61,
"p20_ms": 40.41,
"p80_ms": 47.30
}
Container Image Performance:
{
"init_average_ms": 33.51,
"p20_ms": 25.71,
"p80_ms": 36.46
}
Key Finding: Container images were 21-36% faster on every reported metric (about 21% on the average, 36% at P20, and 23% at P80).
Why Container Images Won:
- Pre-built layers eliminate S3 download + extraction overhead
- Lambda’s container infrastructure efficiently caches layers
- No runtime I/O during cold start initialization
Verdict: Container images outperform ZIP packages for larger runtimes (>2-3MB).
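The 21-36% range can be reproduced from the two result sets with a few lines of arithmetic. This is only a worked check of the numbers quoted above; the function name percentFaster is ours.

```go
package main

import "fmt"

// percentFaster returns how much lower `after` is than `before`,
// expressed as a percentage of `before`.
func percentFaster(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Values copied from the ZIP and container benchmark results above.
	metrics := []struct {
		name           string
		zip, container float64
	}{
		{"init_average_ms", 42.61, 33.51},
		{"p20_ms", 40.41, 25.71},
		{"p80_ms", 47.30, 36.46},
	}
	for _, m := range metrics {
		fmt.Printf("%-16s %.1f%% faster\n", m.name, percentFaster(m.zip, m.container))
	}
}
```

The three lines come out at 21.4%, 36.4%, and 22.9%, so the low end of the range is the average and the high end is P20.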
Chapter 4: Raw TCP Socket Optimization
The Go net/http package is robust but heavy (~3MB). What if we used raw TCP sockets for the Lambda Runtime API?
// Before: Heavy HTTP client
resp, err := c.httpClient.Get(c.baseURL + "next")
// After: Raw TCP socket
conn, err := net.Dial("tcp", c.host)
fmt.Fprintf(conn, "GET /2018-06-01/runtime/invocation/next HTTP/1.1\r\nHost: %s\r\n\r\n", c.host)
Results:
- Binary size: 5.7MB → 2.3MB (60% reduction)
- Cold start: 42ms → 21.31ms (50% faster)
Performance Breakdown:
{
"init_average_ms": 21.31,
"p20_ms": 19.99,
"p40_ms": 20.24,
"p60_ms": 20.82,
"p80_ms": 23.74
}
Verdict: Raw TCP sockets delivered the biggest performance gain.
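For readers who want to see what replaces net/http, here is a minimal sketch of the raw-socket request and a hand-written response parser. It assumes the Runtime API answers with a 200 status and a Content-Length header (no chunked encoding), which the post does not confirm. The helper fetchFromFakeRuntime, the request ID, and the event payload are invented so the example can run locally.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strconv"
	"strings"
)

// nextInvocation writes the GET by hand and parses the reply without net/http.
// Assumes a 200 response whose body length is given by Content-Length.
func nextInvocation(conn io.ReadWriter, host string) (requestID string, event []byte, err error) {
	if _, err = fmt.Fprintf(conn, "GET /2018-06-01/runtime/invocation/next HTTP/1.1\r\nHost: %s\r\n\r\n", host); err != nil {
		return "", nil, err
	}
	// A long-lived client would keep one reader per connection instead.
	r := bufio.NewReader(conn)
	status, err := r.ReadString('\n')
	if err != nil {
		return "", nil, err
	}
	if !strings.HasPrefix(status, "HTTP/1.1 200") {
		return "", nil, fmt.Errorf("unexpected status line %q", strings.TrimSpace(status))
	}
	length := 0
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return "", nil, err
		}
		line = strings.TrimRight(line, "\r\n")
		if line == "" {
			break // blank line marks the end of the headers
		}
		name, value, _ := strings.Cut(line, ":")
		value = strings.TrimSpace(value)
		switch strings.ToLower(name) {
		case "lambda-runtime-aws-request-id":
			requestID = value
		case "content-length":
			if length, err = strconv.Atoi(value); err != nil {
				return "", nil, err
			}
		}
	}
	event = make([]byte, length)
	_, err = io.ReadFull(r, event)
	return requestID, event, err
}

// fetchFromFakeRuntime runs nextInvocation against a local TCP listener that
// stands in for the Runtime API (hypothetical request ID and event).
func fetchFromFakeRuntime() (string, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", "", err
	}
	defer ln.Close()
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		br := bufio.NewReader(c)
		for { // drain the request up to its blank line
			if line, err := br.ReadString('\n'); err != nil || line == "\r\n" {
				break
			}
		}
		body := `{"records":[1,2,3]}`
		fmt.Fprintf(c, "HTTP/1.1 200 OK\r\nLambda-Runtime-Aws-Request-Id: req-123\r\nContent-Length: %d\r\n\r\n%s", len(body), body)
	}()
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", "", err
	}
	defer conn.Close()
	id, event, err := nextInvocation(conn, ln.Addr().String())
	return id, string(event), err
}

func main() {
	id, event, err := fetchFromFakeRuntime()
	fmt.Println(id, event, err)
}
```

The trade-off is visible in the parser: everything net/http would handle (status codes, header folding, transfer encodings) is either assumed away or checked by hand, which is acceptable only because the client talks to a single, well-known local endpoint.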
Chapter 5: The UPX Compression Trap
With a 2.3MB binary, UPX compression seemed like the obvious next step:
- Binary size: 2.3MB → 676KB (70% reduction)
- Expected result: Even faster cold starts
Actual Results:
{
"init_average_ms": 56.01,
"p20_ms": 51.51,
"p80_ms": 60.24
}
The Trap: UPX made performance about 31% worse than the original HTTP client (56.01ms vs 42.61ms average init)!
Why UPX Backfired:
- Decompression overhead: ~35ms CPU penalty per cold start
- ARM64 architecture: Slower decompression on Lambda’s ARM processors
- Memory pressure: Additional allocation during initialization
- No caching benefit: Each container instance decompresses independently
Critical Insight: In Lambda’s execution model, a 70% file size reduction came with a roughly 160% increase in cold start time (21.31ms → 56.01ms).
Verdict: Code simplicity beats file size optimization.
Chapter 6: Runtime-as-a-Layer Pattern
The final optimization came from architectural thinking: What if we separate the runtime from the business logic entirely?
Architecture:
runtime-layer.zip (2.3MB)
└── bootstrap (Optimized Go runtime)
function.zip (~1KB)
└── handler.sh (Shell business logic)
Implementation:
# Deploy runtime layer once
module "shell_runtime_layer" {
source = "git::https://github.com/ql4b/terraform-aws-lambda-layer.git?ref=v1.0.0"
name = "shell-runtime"
source_dir = "./runtime/build"
compatible_architectures = ["arm64"]
compatible_runtimes = ["provided.al2023"]
}
# Deploy multiple functions using the layer
module "my_function" {
source = "git::https://github.com/ql4b/terraform-aws-lambda-function.git"
name = "my-shell-function"
source_dir = "./app/src"
layers = [module.shell_runtime_layer.layer_arn]
}
Performance Results:
{
"init_average_ms": 21.80,
"p20_ms": 20.28,
"p40_ms": 20.56,
"p60_ms": 21.06,
"p80_ms": 23.88
}
The Surprise: Runtime layers added no meaningful performance penalty compared to monolithic functions (21.80ms vs 21.31ms average init).
The Complete Performance Journey
| Approach | Binary Size | Avg Init | P80 | Key Insight |
|---|---|---|---|---|
| Pure Bash | ~100KB | ~90ms | ~120ms | Process spawning kills performance |
| Go + Bash (HTTP) | 5.7MB | 42.61ms | 47.30ms | Hybrid architecture works |
| Container Image | 5.7MB | 33.51ms | 36.46ms | Containers beat ZIP for large runtimes |
| Raw TCP | 2.3MB | 21.31ms | 23.74ms | Code simplicity > package complexity |
| Raw TCP + UPX | 676KB | 56.01ms | 60.24ms | Compression can backfire |
| Runtime Layer | 2.3MB + 1KB | 21.80ms | 23.88ms | Separation adds no measurable overhead |
The Final Architecture: Runtime-as-a-Layer
The optimal solution combines all our learnings:
Runtime Layer (Deployed Once)
- Optimized Go bootstrap with raw TCP sockets
- 2.3MB binary built with size-reducing flags (-ldflags="-s -w", -trimpath)
- Shared across functions - deploy once, use everywhere
Function Packages (Deployed Per Function)
- Pure shell scripts - just business logic
- ~1KB packages - 99.96% smaller than the 2.3MB runtime they no longer bundle
- Fast deployments - only redeploy when logic changes
Performance Characteristics
- 21.80ms average cold starts - sub-25ms consistently
- Negligible layer overhead - within 0.5ms of monolithic performance
- Predictable latency - tight P20-P80 range (20-24ms)
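The summary fields reported throughout this post (init_average_ms, p20_ms, p80_ms) can be derived from a list of per-invocation init durations. The post does not say which percentile method the benchmarks used, so the sketch below assumes the nearest-rank definition; the sample values in main are made up and are not measurements from these benchmarks.

```go
package main

import (
	"fmt"
	"sort"
)

// summary uses the same field names as the benchmark output in this post.
type summary struct {
	InitAverageMs float64
	P20Ms         float64
	P80Ms         float64
}

// percentile returns the nearest-rank percentile: the smallest sample such
// that at least p percent of all samples are less than or equal to it.
// sorted must be ascending and non-empty; p must be in 1..100.
func percentile(sorted []float64, p int) float64 {
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p/100 * n)
	return sorted[rank-1]
}

// summarize computes the average and the P20/P80 values of the samples.
func summarize(samples []float64) summary {
	if len(samples) == 0 {
		return summary{}
	}
	sorted := append([]float64(nil), samples...) // leave the caller's slice untouched
	sort.Float64s(sorted)
	var sum float64
	for _, v := range sorted {
		sum += v
	}
	return summary{
		InitAverageMs: sum / float64(len(sorted)),
		P20Ms:         percentile(sorted, 20),
		P80Ms:         percentile(sorted, 80),
	}
}

func main() {
	// Made-up init durations in milliseconds, not measurements from this post.
	samples := []float64{20.1, 19.8, 23.9, 20.5, 21.0, 20.3, 24.2, 20.0, 20.9, 22.3}
	fmt.Printf("%+v\n", summarize(samples))
}
```

Reporting P20 and P80 next to the average is what makes the "tight range" claim checkable: a few slow outliers move the average but leave the percentiles mostly alone.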
Benefits of the Final Architecture
For Developers
- Shell simplicity: Write functions in familiar Bash
- Fast iteration: Deploy only business logic changes
- Clean separation: Runtime and logic evolve independently
For Operations
- Shared runtime: One layer serves multiple functions
- Tiny packages: 99.96% reduction in function package size
- Cost efficiency: Reduced storage and transfer costs
- Maintenance: Runtime updates don’t require function redeployment
For Performance
- Sub-25ms cold starts: Consistently fast initialization
- Predictable latency: Tight performance variance
- Scalable architecture: No performance penalty for separation
Implementation Guide
1. Build the Runtime Layer
# Optimize Go bootstrap
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build \
-ldflags="-s -w -extldflags '-static'" \
-trimpath \
-o bootstrap main.go
2. Deploy with Terraform
# Runtime layer (deploy once)
module "shell_runtime" {
source = "git::https://github.com/ql4b/terraform-aws-lambda-layer.git"
name = "shell-runtime"
source_dir = "./runtime/build"
}
# Shell function (deploy per function)
module "my_shell_function" {
source = "git::https://github.com/ql4b/terraform-aws-lambda-function.git"
name = "data-processor"
source_dir = "./src"
runtime = "provided.al2023"
handler = "handler.process"
architecture = "arm64"
layers = [module.shell_runtime.layer_arn]
}
3. Write Shell Functions
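#!/bin/bash
# handler.sh
process() {
  # Use the first argument if the runtime passes one; otherwise read the
  # event from stdin, which is how the Chapter 2 bootstrap delivers it.
  local event="${1:-$(cat)}"
  # Your shell logic here (jq is not part of provided.al2023; see the
  # utility layers section below)
  echo "$event" | jq '.records | length'
}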
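The Terraform above sets handler = "handler.process", while the script is handler.sh with a function named process. Lambda passes the configured handler string to custom runtimes in the _HANDLER environment variable. One way a bootstrap could map that string to a file and a function is sketched below; the splitting rule (last dot separates file from function, ".sh" is appended) is an assumption inferred from the example, not taken from the runtime's source.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHandler turns "handler.process" into ("handler.sh", "process").
// Assumption: the part before the last dot names the script (without ".sh")
// and the part after it names the shell function.
func splitHandler(handler string) (file, fn string, err error) {
	i := strings.LastIndex(handler, ".")
	if i <= 0 || i == len(handler)-1 {
		return "", "", fmt.Errorf("handler %q is not in file.function form", handler)
	}
	return handler[:i] + ".sh", handler[i+1:], nil
}

func main() {
	file, fn, err := splitHandler("handler.process")
	fmt.Println(file, fn, err)
}
```

A bootstrap following this convention would call splitHandler(os.Getenv("_HANDLER")) once at startup and reuse the result for every invocation.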
Lessons Learned
Performance Optimization
- Measure everything - Assumptions about performance are often wrong
- Code simplicity wins - Raw TCP beats heavy HTTP libraries
- Compression can backfire - CPU overhead > size benefits
- Architecture matters - Layer separation has zero performance cost
Lambda Insights
- Container images aren’t always slower - They win for larger runtimes
- Layers are highly optimized - AWS handles layer loading efficiently
- ARM64 is different - Decompression performance varies by architecture
- Size thresholds exist - Different optimizations work at different scales
Development Philosophy
- Systematic experimentation beats intuition
- Benchmark real scenarios - Synthetic tests miss important details
- Challenge conventional wisdom - “ZIP is always faster” isn’t always true
- Separate concerns cleanly - Runtime vs business logic
When You Need More: Utilities and Custom Runtimes
While the runtime-as-a-layer pattern is optimal for pure shell logic, real-world shell functions often need additional utilities not included in provided.al2023.
Missing Utilities Strategy
When your shell functions need specific programs or utilities:
Option 1: Utility Layers
Package missing utilities as additional Lambda layers. This approach works well for:
- Common tools like jq, curl, aws-cli
- Compiled binaries that don’t require complex dependencies
- Utilities that can be shared across multiple functions
module "my_function" {
source = "./terraform-aws-lambda-function"
layers = [
module.shell_runtime.layer_arn, # Go bootstrap
module.utilities.layer_arn # jq, curl, etc.
]
}
For a deep dive on this approach, see Lambda Layers Breakthrough
Option 2: Custom Runtime
When you need:
- Complex system dependencies
- Specific OS configurations
- Tightly integrated toolchains
- Full control over the runtime environment
In these cases, a custom runtime container image may be more convenient than managing multiple layers.
Decision Framework
| Need | Solution | Trade-off |
|---|---|---|
| Pure shell logic | Runtime layer | Optimal performance |
| Common utilities | Utility layers | Shared, modular |
| Complex dependencies | Custom runtime | Full control, larger packages |
| System configuration | Custom runtime | Flexibility, deployment complexity |
The runtime-as-a-layer pattern excels for shell logic, but don’t hesitate to use custom runtimes when your use case demands it.
The Cloudless Way
This journey embodies the cloudless philosophy:
- Start simple - Pure Bash was the right first step
- Measure and iterate - Each optimization was data-driven
- Embrace constraints - Lambda’s limitations drove creative solutions
- Build composable pieces - Runtime layers enable reuse
- Optimize for clarity - Shell scripts remain readable and maintainable
Conclusion
What began as a custom Lambda runtime project became a masterclass in performance optimization and architectural design. Through systematic experimentation, we discovered that we could achieve better performance using AWS’s standard provided.al2023 runtime than our custom container image.
Key discoveries:
- Hybrid architectures work - Go for performance, shell for logic
- Container images can outperform ZIP packages - for the right use cases
- Raw protocols beat abstractions - when performance matters
- Compression isn’t always better - CPU overhead can dominate
- Layer architecture is optimal - separation with zero performance cost
The final runtime-as-a-layer pattern delivers:
- Sub-25ms cold starts consistently
- 99.96% package size reduction for functions
- Clean architectural separation between runtime and logic
- Production-ready performance with development simplicity
Shell functions in Lambda aren’t just possible—they’re fast, efficient, and maintainable. Sometimes the best way to solve a complex problem is to systematically experiment your way to simplicity.
Want to implement shell functions in your Lambda architecture? Check out the terraform-aws-lambda-layer and terraform-aws-lambda-function modules, and explore the complete benchmark data from this journey.