Lambda Performance Deep Dive

Container Images, Raw TCP, and the UPX Trap

I built lambda-shell-runtime, a custom AWS Lambda runtime that lets you write serverless functions in Bash. It worked great, but I discovered something that challenged conventional wisdom about Lambda packaging formats.

The Journey: From Pure Bash to Hybrid Architecture

Initially, my shell runtime used pure Bash for everything - including communication with the Lambda Runtime API using curl. This worked, but the overhead was noticeable:

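# Pure Bash approach - lots of process spawning
# The request ID arrives as a response header, so dump the headers to a file (-D)
headers=$(mktemp)
response=$(curl -sS -D "$headers" "http://$AWS_LAMBDA_RUNTIME_API/2018-06-01/runtime/invocation/next")
request_id=$(grep -i '^lambda-runtime-aws-request-id:' "$headers" | tr -d '\r' | cut -d' ' -f2)
# ... more parsing and HTTP calls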

Every Lambda invocation spawned multiple processes for HTTP communication and JSON parsing. The performance impact was clear.

The Hybrid Solution: Go + Bash

To eliminate this overhead, I created a hybrid approach:

Go binary handles Lambda Runtime API communication (fast HTTP client)
Bash functions handle business logic (simple scripting)

// Fast Go HTTP client for Lambda API
func (c *runtimeAPIClient) getNextInvocation() (string, []byte, error) {
    resp, err := c.httpClient.Get(c.baseURL + "next")
    // ... handle response
}

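// Execute shell function for business logic
func executeShellHandler(handlerFile, handlerFunc string, eventData []byte) ([]byte, error) {
    // Pass the file and function name as positional parameters instead of
    // interpolating them into the script, so odd characters cannot break it.
    cmd := exec.Command("bash", "-c", `source "$1" && "$2"`, "bash", handlerFile, handlerFunc)
    cmd.Stdin = bytes.NewReader(eventData) // the event JSON arrives on the handler's stdin
    return cmd.Output()                    // the handler's stdout becomes the response
}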

This hybrid runtime delivered ~30ms cold start times when packaged as a container image.
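
To make the division of labour concrete, here is a minimal sketch of one iteration of the loop a bootstrap like this runs: fetch the next event, hand it to a handler, post the result back. It is not the actual lambda-shell-runtime code. The names processOne, handler, and newFakeRuntimeAPI are hypothetical, the handler is a plain Go function standing in for the Bash call, and the fake server exists only so the sketch runs outside Lambda. The endpoint paths and the Lambda-Runtime-Aws-Request-Id header are the ones the Runtime API defines.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// handler stands in for the Bash handler invocation (hypothetical name).
type handler func(event []byte) ([]byte, error)

// processOne runs a single iteration of the runtime loop: fetch the next
// event, run the handler, and post its output back for the same request ID.
// baseURL is expected to end in "/2018-06-01/runtime/invocation/".
func processOne(client *http.Client, baseURL string, handle handler) (string, error) {
	resp, err := client.Get(baseURL + "next") // blocks until an event is available
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	event, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	requestID := resp.Header.Get("Lambda-Runtime-Aws-Request-Id")

	result, err := handle(event)
	if err != nil {
		// A full runtime would report this to baseURL + requestID + "/error".
		return requestID, err
	}
	post, err := client.Post(baseURL+requestID+"/response", "application/json", bytes.NewReader(result))
	if err != nil {
		return requestID, err
	}
	defer post.Body.Close()
	return requestID, nil
}

// newFakeRuntimeAPI returns a local stand-in for the Runtime API that serves
// one fixed event and reports whatever gets posted back on the channel.
func newFakeRuntimeAPI() (*httptest.Server, <-chan string) {
	posted := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodGet {
			w.Header().Set("Lambda-Runtime-Aws-Request-Id", "req-1")
			io.WriteString(w, `{"name":"world"}`)
			return
		}
		body, _ := io.ReadAll(r.Body)
		posted <- r.URL.Path + " " + string(body)
		w.WriteHeader(http.StatusAccepted)
	}))
	return srv, posted
}

func main() {
	srv, posted := newFakeRuntimeAPI()
	defer srv.Close()

	echo := func(event []byte) ([]byte, error) { return event, nil }
	id, err := processOne(srv.Client(), srv.URL+"/2018-06-01/runtime/invocation/", echo)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, <-posted)
}
```

In a real bootstrap the base URL would be built from the AWS_LAMBDA_RUNTIME_API environment variable and processOne would run in an endless loop.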

The Experiment: Do We Even Need Custom Runtimes?

Then I had a thought: “What if I’m overcomplicating this?”

AWS provides provided.al2023 - an OS-only runtime where you can bring your own bootstrap. Instead of maintaining a custom runtime, I could:

Compile my Go bootstrap to a bootstrap binary
Package it with my shell handler in a ZIP file
Use the standard provided.al2023 runtime

This would eliminate the need for my custom runtime entirely. Time to test this theory.

The Surprising Results

I benchmarked two approaches with identical code:

Both variants used the exact same Go bootstrap binary, the same shell handler logic, identical memory settings, and were executed under the same cold-start testing conditions. The only variable was the packaging format.

Approach 1: ZIP Package + provided.al2023

5MB Go bootstrap binary (compiled with -ldflags="-w -s")
Shell handler script
Standard provided.al2023 runtime

Approach 2: Container Image

Same 5MB Go bootstrap
Same shell handler
Custom container image

The Benchmark Results

I ran controlled benchmarks with 60 cold start measurements for each approach. The results were striking:

ZIP Package Performance (provided.al2023)

{
  "init_count": 60,
  "init_total_ms": 2557.03,
  "init_average_ms": 42.61,
  "p20_ms": 40.41,
  "p40_ms": 40.92,
  "p60_ms": 41.16,
  "p80_ms": 47.30
}

Container Image Performance (Optimized)

{
  "init_average_ms": 27.68,
  "p20_ms": 20.91,
  "p40_ms": 24.01,
  "p60_ms": 27.74,
  "p80_ms": 32.50
}

Complete Performance Comparison

Metric	ZIP Package	Container (Original)	Container (Optimized)	Raw TCP	Raw TCP + UPX
Average Init	42.61ms	33.51ms	27.68ms	21.31ms	56.01ms
P20	40.41ms	25.71ms	20.91ms	19.99ms	51.51ms
P40	40.92ms	28.21ms	24.01ms	20.24ms	53.24ms
P60	41.16ms	31.88ms	27.74ms	20.82ms	56.53ms
P80	47.30ms	36.46ms	32.50ms	23.74ms	60.24ms

Key findings:

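Container images are 21-36% faster than ZIP on the average and at every reported percentile (original image; the optimized image widens this to 31-48%)
Consistently lower latency: the container's P20-P80 range (25-36ms) sits entirely below the ZIP range (40-47ms), although the container spread is slightly wider (about 11ms vs 7ms)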
Consistent advantage: No scenario where ZIP packages performed better
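
The percentages come straight from the comparison table. As a quick check, this small sketch recomputes them from the ZIP and Container (Original) columns; the function name percentFaster is mine, and the numbers are copied from the table above.

```go
package main

import "fmt"

// percentFaster returns how much lower candidate is than baseline, in percent.
func percentFaster(baseline, candidate float64) float64 {
	return (baseline - candidate) / baseline * 100
}

func main() {
	metrics := []string{"Average", "P20", "P40", "P60", "P80"}
	zip := []float64{42.61, 40.41, 40.92, 41.16, 47.30}       // "ZIP Package" column
	container := []float64{33.51, 25.71, 28.21, 31.88, 36.46} // "Container (Original)" column

	for i, m := range metrics {
		fmt.Printf("%-8s %.1f%% faster\n", m, percentFaster(zip[i], container[i]))
	}

	// P20-to-P80 spread for each packaging format.
	fmt.Printf("spread: ZIP %.2fms, container %.2fms\n", zip[4]-zip[1], container[4]-container[1])
}
```

The per-metric improvements range from about 21% (average) to about 36% (P20).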

Expected result: ZIP package should be faster (conventional wisdom)

Actual result: Container images consistently outperformed ZIP packages by a significant margin.

Why Container Images Won

This doesn’t mean container images are always faster — for very small deployments (typically under ~1MB), the difference is often negligible and ZIP packages may still be the simpler choice.

This result challenges the typical assumption that ZIP packages are always faster. Here’s what I believe is happening:

ZIP Package Overhead (provided.al2023)

Download phase: Lambda downloads 5MB bootstrap from S3 during cold start
Extraction phase: Unzip and extract files to container filesystem
Permission setup: Configure file permissions and execution context
Process startup: Launch the bootstrap binary

Container Image Advantages

Pre-built layers: Bootstrap is already in optimized image layers
No runtime I/O: No S3 download or extraction during cold start
Optimized filesystem: File permissions and structure pre-configured
Layer caching: Lambda’s container infrastructure efficiently caches layers

The 5MB Threshold Theory

The key insight: there’s a crossover point where container images become more efficient than ZIP packages.

For small deployments (< 1MB), ZIP extraction is negligible. But as your bootstrap grows:

ZIP: Linear increase in download + extraction time
Container: Constant startup time (layers are pre-cached)

My 5MB Go binary crossed this threshold. The network I/O and filesystem operations for ZIP extraction exceeded the container startup overhead.
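
One way to write the theory down is a simple linear model. This is an illustration, not something fitted to the benchmarks: s is the package size, a is a fixed ZIP setup cost, b is the download-plus-extraction cost per MB, and c is the constant container startup cost. None of a, b, or c were measured here.

```latex
T_{\text{zip}}(s) = a + b\,s, \qquad
T_{\text{img}}(s) = c, \qquad
T_{\text{zip}}(s) > T_{\text{img}}(s) \iff s > s^{*} = \frac{c - a}{b} \quad (b > 0,\ c > a)
```

Under this model the crossover size s* is where the two lines meet; the benchmarks above only show that a roughly 5MB binary sits on the container-favoured side of it.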

Implications for Lambda Architecture

When to Choose Container Images

Large custom runtimes (> 2-3MB)
Complex dependencies that benefit from pre-installation
Custom system configurations that can be baked into the image

When ZIP Packages Still Win

Small, simple functions (< 1MB)
Frequent code changes (faster deployment)
Standard runtime compatibility requirements

The Optimization Deep Dive: Raw TCP vs HTTP Client

After discovering container images outperformed ZIP packages, I wondered: “Can we optimize the runtime itself?”

The Go bootstrap was using Go’s standard net/http package for Lambda Runtime API communication. While robust, it’s heavy - the HTTP client alone adds ~3MB to the binary and significant initialization overhead.

The Raw TCP Socket Experiment

I replaced the HTTP client with raw TCP sockets:

// Before: Heavy HTTP client
resp, err := c.httpClient.Get(c.baseURL + "next")

// After: Raw TCP socket
conn, err := net.Dial("tcp", c.host)
fmt.Fprintf(conn, "GET /2018-06-01/runtime/invocation/next HTTP/1.1\r\nHost: %s\r\n\r\n", c.host)

This eliminated the entire net/http package dependency, reducing the binary from 5.7MB to 2.3MB.
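
The snippet above only shows the request side. Dropping net/http also means parsing the response by hand, so here is a minimal sketch of what that could look like. It is not the project's actual code: rawGet and serveOnce are hypothetical names, the sketch assumes the server replies with a Content-Length header (chunked responses are not handled), and serveOnce is a throwaway local server so the example runs outside Lambda.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strconv"
	"strings"
)

// rawGet sends a minimal HTTP/1.1 GET over a plain TCP connection and returns
// the response headers (names lower-cased) and the body.
func rawGet(host, path string) (map[string]string, []byte, error) {
	conn, err := net.Dial("tcp", host)
	if err != nil {
		return nil, nil, err
	}
	defer conn.Close()

	if _, err := fmt.Fprintf(conn, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", path, host); err != nil {
		return nil, nil, err
	}

	r := bufio.NewReader(conn)
	status, err := r.ReadString('\n')
	if err != nil {
		return nil, nil, err
	}
	if !strings.HasPrefix(status, "HTTP/1.1 200") {
		return nil, nil, fmt.Errorf("unexpected status line: %q", strings.TrimSpace(status))
	}

	// Header lines run until the first empty line.
	headers := map[string]string{}
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return nil, nil, err
		}
		line = strings.TrimRight(line, "\r\n")
		if line == "" {
			break
		}
		if name, value, ok := strings.Cut(line, ":"); ok {
			headers[strings.ToLower(name)] = strings.TrimSpace(value)
		}
	}

	n, err := strconv.Atoi(headers["content-length"])
	if err != nil {
		return nil, nil, fmt.Errorf("missing or invalid Content-Length: %w", err)
	}
	body := make([]byte, n)
	if _, err := io.ReadFull(r, body); err != nil {
		return nil, nil, err
	}
	return headers, body, nil
}

// serveOnce starts a local TCP server that answers a single request with resp.
func serveOnce(resp string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go func() {
		defer ln.Close()
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		r := bufio.NewReader(conn)
		for { // consume the request line and headers
			line, err := r.ReadString('\n')
			if err != nil || line == "\r\n" {
				break
			}
		}
		io.WriteString(conn, resp)
	}()
	return ln.Addr().String(), nil
}

func main() {
	addr, err := serveOnce("HTTP/1.1 200 OK\r\nLambda-Runtime-Aws-Request-Id: req-1\r\nContent-Length: 2\r\n\r\n{}")
	if err != nil {
		panic(err)
	}
	headers, body, err := rawGet(addr, "/2018-06-01/runtime/invocation/next")
	fmt.Println(headers["lambda-runtime-aws-request-id"], string(body), err)
}
```

The trade-off is visible in the sketch: the binary no longer links net/http, but status handling, header parsing, and body framing are now the runtime's responsibility.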

The UPX Compression Trap

With a smaller binary, I tried UPX compression to reduce it further:

Binary size: 2.3MB → 676KB (70% reduction)
Cold start performance: Actually got worse!

Performance Results

Approach	Binary Size	Avg Init Time	Performance
HTTP Client	5.7MB	~42ms	Baseline
Raw TCP	2.3MB	21ms	50% faster
Raw TCP + UPX	676KB	~56ms	Slower than baseline

Runtime Optimization Comparison

Metric	HTTP Client	Raw TCP	Raw TCP + UPX	Best Performance
Binary Size	5.7MB	2.3MB	676KB	Raw TCP + UPX
Average Init	~42ms	21.31ms	56.01ms	Raw TCP
P20	~40ms	19.99ms	51.51ms	Raw TCP
P40	~41ms	20.24ms	53.24ms	Raw TCP
P60	~42ms	20.82ms	56.53ms	Raw TCP
P80	~47ms	23.74ms	60.24ms	Raw TCP

Why UPX Backfired Dramatically

UPX compression didn’t just hurt performance - it made it 33% worse than the original HTTP client:

Average: 56.01ms vs 42ms baseline (33% slower)
P80: 60.24ms vs 47ms baseline (28% slower)
Decompression penalty: ~35ms overhead per cold start (56.01ms vs 21.31ms for the same binary uncompressed)

The performance penalty was much worse than expected because:

Heavy decompression cost: UPX decompression takes significant CPU time
ARM64 architecture: Lambda runs functions on either x86_64 or arm64, and on arm64 decompression may be slower
Memory pressure: Decompression requires additional memory allocation during init
No caching benefit: Each container instance must decompress independently

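Critical insight: In Lambda’s execution model, a 70% file size reduction (2.3MB to 676KB) came with a roughly 160% increase in average init time (21.31ms to 56.01ms) compared with the uncompressed Raw TCP binary.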

The Meta-Lesson

This experiment taught me something valuable: sometimes the best way to validate your architecture is to try to replace it.

I set out to prove my custom runtime was unnecessary, but instead discovered:

The hybrid Go+Bash approach has real performance benefits
Container images can outperform ZIP packages for larger deployments
Conventional wisdom doesn’t always survive a benchmark

Measuring Your Own Workloads

If you’re curious about your own Lambda performance characteristics:

# Get Lambda performance stats
aws logs tail --since 1h "/aws/lambda/your-function" \
  | grep "REPORT" \
  | grep -o -E 'Init Duration: (.+) ms' \
  | cut -d' ' -f 3

This extracts just the init duration values from CloudWatch logs, which you can then analyze for averages, percentiles, and trends.
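
For the analysis step, a small Go program along these lines can turn those values into the same kind of summary shown in the benchmark JSON. This is a sketch under assumptions: it reads one number per line from stdin, and it uses the nearest-rank percentile definition, which may not be the method behind the figures in this post. The names Summary, percentile, and summarize are mine.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// Summary mirrors the fields in the benchmark JSON shown earlier.
type Summary struct {
	Count              int
	Total, Average     float64
	P20, P40, P60, P80 float64
}

// percentile returns the nearest-rank percentile p (1..100) of an ascending,
// non-empty slice. Interpolating definitions give slightly different values.
func percentile(sorted []float64, p int) float64 {
	rank := (p*len(sorted) + 99) / 100 // ceil(p*n/100) in integer arithmetic
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize computes the summary for a set of init durations in milliseconds.
func summarize(values []float64) (Summary, error) {
	if len(values) == 0 {
		return Summary{}, fmt.Errorf("no init durations provided")
	}
	sorted := append([]float64(nil), values...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)

	var total float64
	for _, v := range sorted {
		total += v
	}
	return Summary{
		Count:   len(sorted),
		Total:   total,
		Average: total / float64(len(sorted)),
		P20:     percentile(sorted, 20),
		P40:     percentile(sorted, 40),
		P60:     percentile(sorted, 60),
		P80:     percentile(sorted, 80),
	}, nil
}

func main() {
	// Read one init duration per line, skipping anything that is not a number.
	var values []float64
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if v, err := strconv.ParseFloat(strings.TrimSpace(sc.Text()), 64); err == nil {
			values = append(values, v)
		}
	}

	s, err := summarize(values)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

Saved as, say, stats.go, it can sit at the end of the pipeline above: append `| go run stats.go` after the cut command.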

Benchmark both packaging approaches with your actual code. The results might surprise you.

Conclusion

This journey from pure Bash to optimized hybrid runtime revealed multiple performance insights that challenge conventional Lambda wisdom:

Container images can outperform ZIP packages for larger runtimes (>2-3MB)
Raw TCP sockets deliver 50% faster cold starts than HTTP clients
File compression can hurt performance - UPX made things worse, not better
Code simplicity often beats size optimization in Lambda’s execution model

The Lambda ecosystem is more nuanced than simple rules. As we build sophisticated serverless applications, understanding these performance characteristics becomes crucial.

My lambda-shell-runtime project started as a way to bring Bash scripting to serverless. It ended up revealing that the most valuable discoveries come from systematically challenging your assumptions with real benchmarks.

Sometimes the best way to validate your architecture is to try to replace it.

Want to experiment with hybrid Lambda runtimes? Check out the lambda-shell-runtime project and the benchmark code used in this analysis.
