
EC2 Spot Instances for Testing: Fault-Tolerant Tests Are a Perfect Fit

TraceLoom Team · May 6, 2026

EC2 Spot instances offer compute at 60–90% below on-demand pricing. The tradeoff: AWS can reclaim a Spot instance with 2 minutes’ notice when capacity is needed elsewhere. For long-running stateful workloads, that’s a problem. For Playwright tests, which are short, stateless, and individually retryable, an interruption is a minor and fully recoverable event.

This post covers why testing workloads are a natural fit for Spot, how to handle interruptions gracefully, and what the real cost savings look like.

What Are EC2 Spot Instances?

EC2 Spot instances are spare AWS compute capacity offered at steep discounts compared to on-demand instances. AWS sells unused capacity at variable pricing — typically 60–90% below on-demand rates — with the condition that instances can be reclaimed when AWS needs the capacity back.

Spot instances are the same hardware as on-demand instances. A c6i.xlarge Spot instance has the same 4 vCPUs, 8 GB RAM, and network performance as a c6i.xlarge on-demand instance. The only difference is the pricing model and the possibility of interruption.

Spot pricing as of early 2026 for common test-runner instance types in us-east-1:

Instance Type	Specs	Spot Price	On-Demand Price
c6i.xlarge	4 vCPU, 8 GB	$0.04–0.06/hour	$0.17/hour
c6i.2xlarge	8 vCPU, 16 GB	$0.08–0.12/hour	$0.34/hour
c7i.xlarge	4 vCPU, 8 GB	$0.05–0.07/hour	$0.178/hour

Source: AWS EC2 pricing, 2026
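To check the prices listed above against the headline 60–90% figure, a small shell sketch can compute the implied discount. It uses only those listed prices; awk does the floating-point math because bash arithmetic is integer-only, and spot_discount_pct is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Percentage discount implied by a Spot price relative to an on-demand price.
# spot_discount_pct is a hypothetical helper; awk is used because bash
# arithmetic cannot handle decimals.
spot_discount_pct() {
  local spot="$1" on_demand="$2"
  awk -v s="$spot" -v od="$on_demand" 'BEGIN { printf "%.0f\n", (1 - s / od) * 100 }'
}

# Discount range for c6i.xlarge at the high and low ends of the quoted Spot band.
echo "c6i.xlarge: $(spot_discount_pct 0.06 0.17)% to $(spot_discount_pct 0.04 0.17)% below on-demand"
echo "c7i.xlarge: $(spot_discount_pct 0.07 0.178)% to $(spot_discount_pct 0.05 0.178)% below on-demand"
```

With these inputs, the listed prices work out to roughly 65–76% off for the c6i types and about 61–72% off for c7i.xlarge, in the lower part of the 60–90% range.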

Spot instances matter for testing because test execution is pure compute spend: CPU-intensive, short-duration, and recurring. The savings add up with every run: a team running 3 test runs per day on 50 workers pays 60–90% less for each of them.

Why Are Playwright Tests a Good Fit for Spot Instances?

Not every workload belongs on Spot. Stateful databases, long-running training jobs, and servers handling live traffic are poor candidates because interruptions cause data loss or service disruption. Playwright tests are the opposite.

Short duration. Most Playwright test runs complete in 2–15 minutes. The Spot interruption frequency for c6i instances in us-east-1 is below 5% per month (AWS Spot Instance Advisor, 2026). Over a 5-minute run, the chance that any single worker is interrupted is well under 1%.

Stateless. Each Playwright test starts fresh — a new browser context, no shared state between tests. If an instance is interrupted, no data is lost. The test simply reruns on another worker.

Individually retryable. Playwright’s retries configuration reruns failed tests automatically, but it operates inside the worker process, so it cannot recover from the machine itself disappearing. For that case, the orchestrator requeues the interrupted shard for another worker. Tests that completed before the interruption have already uploaded their traces, so only the remaining tests need to rerun.

Embarrassingly parallel. Each test is independent and can run on any machine. Losing one worker out of 50 means losing 2% of capacity, not a cascading failure.
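Because shards are independent, each worker can run its own slice of the suite with Playwright’s built-in --shard flag, while --retries handles ordinary flaky failures inside that worker. The sketch below only builds the command for one worker; SHARD_INDEX and SHARD_TOTAL are hypothetical environment variables an orchestrator might set, and the default of 2 retries is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Build the Playwright command for one worker's shard.
# --shard and --retries are real Playwright CLI flags; SHARD_INDEX and
# SHARD_TOTAL are hypothetical variables set per worker by an orchestrator.
playwright_shard_cmd() {
  local index="$1" total="$2" retries="${3:-2}"
  # Playwright shard indices are 1-based, so valid values are 1..total.
  if [ "$index" -lt 1 ] || [ "$index" -gt "$total" ]; then
    echo "invalid shard ${index}/${total}" >&2
    return 1
  fi
  echo "npx playwright test --shard=${index}/${total} --retries=${retries}"
}

# Example: worker 3 of 50 runs only its own slice of the suite.
playwright_shard_cmd "${SHARD_INDEX:-3}" "${SHARD_TOTAL:-50}"
```

If the worker running shard 3/50 is reclaimed, only that shard’s unfinished tests need to be reassigned; the other 49 shards keep running untouched.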

These four properties — short, stateless, retryable, parallel — are exactly the characteristics AWS recommends for Spot workloads — AWS Well-Architected Framework, 2024.

How to Handle Spot Interruptions for Testing Workloads

AWS sends a 2-minute interruption notice before reclaiming a Spot instance. For testing, the interrupt handler is straightforward:

1. Detect the Interruption Notice

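Poll the instance metadata service (IMDS) for the interruption notice. The spot/instance-action path returns HTTP 404 until AWS schedules an interruption and HTTP 200 once the notice has been issued: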

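#!/usr/bin/env bash
# Poll the instance metadata service every 5 seconds for a Spot interruption notice.
while true; do
  # Request an IMDSv2 session token (required when the instance enforces IMDSv2).
  TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  # 404 means no interruption is scheduled; 200 means the 2-minute notice is active.
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/spot/instance-action)
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Spot interruption notice received"
    # Trigger graceful shutdown here
    break
  fi
  sleep 5
done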

2. Finish the Current Test and Upload Results

When the interruption notice arrives, the worker has 2 minutes. That’s enough time to:

Let the currently running test finish (most Playwright tests complete in under 60 seconds)
Upload any completed trace.zip files to S3
Report the remaining unfinished tests back to the orchestrator
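One way to spend the 2-minute window is to give the running test a bounded amount of time to finish and keep the rest in reserve for uploads. The sketch below assumes a 30-second upload reserve (an illustrative number, not a TraceLoom setting) and that the handler knows the PID of the running test process; wait_with_deadline is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Wait for a process to exit, but never longer than deadline_secs.
# Returns 0 if the process finished in time, 1 if the deadline was reached.
wait_with_deadline() {
  local pid="$1" deadline_secs="$2"
  local waited=0
  # kill -0 sends no signal; it only checks whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$deadline_secs" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Assumed budget: 120 s of notice, with 30 s reserved for uploads and reporting.
NOTICE_SECS=120
UPLOAD_RESERVE_SECS=30
DRAIN_SECS=$((NOTICE_SECS - UPLOAD_RESERVE_SECS))
```

In a handler, a call such as wait_with_deadline "$TEST_PID" "$DRAIN_SECS" (TEST_PID being a placeholder) would run right after the notice is detected, followed by the trace upload whether or not the test finished in time.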

3. Requeue Unfinished Tests

The orchestrator marks the interrupted shard as partially complete. Tests that uploaded traces are marked as done. The remaining tests are requeued for another worker to pick up.
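Deciding what to requeue is a set difference: the shard’s assigned tests minus the tests whose traces were uploaded. A minimal sketch, assuming both lists are plain files with one test ID per line (a hypothetical format, not TraceLoom’s actual one):

```shell
#!/usr/bin/env bash
# Print the tests from the shard's list ($1) that are absent from the
# uploaded list ($2), preserving the shard's original order.
remaining_tests() {
  # -v: invert match, -x: whole-line match, -F: fixed strings, -f: patterns from file.
  # grep exits 1 when nothing remains, which is not an error here.
  grep -vxF -f "$2" "$1" || true
}
```

The resulting list could become the body of an SQS message, for example via aws sqs send-message --queue-url "$QUEUE_URL" --message-body "$(remaining_tests shard.txt uploaded.txt)", where QUEUE_URL and both file names are placeholders.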

TraceLoom’s bootstrap script handles all three steps automatically. Each EC2 worker runs a Spot interruption handler that monitors the metadata endpoint, uploads partial results, and requeues remaining tests to SQS — TraceLoom architecture documentation, March 2026.

What Do Spot Interruptions Actually Look Like in Practice?

Spot interruption rates vary by instance type, region, and availability zone. AWS publishes interruption frequency data in the Spot Instance Advisor.

For c6i instances (Intel-based, compute-optimized, and a common choice for Playwright testing):

Region	Interruption Frequency	Interpretation
us-east-1	<5%	Less than 5% of instances interrupted per month
us-west-2	<5%	Less than 5% of instances interrupted per month
eu-west-1	<5%	Less than 5% of instances interrupted per month

— AWS Spot Instance Advisor, March 2026

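For a test run using 50 c6i.xlarge instances for 5 minutes, spreading that monthly rate evenly over the run’s 250 worker-minutes would predict only a tiny fraction of one interruption per run. Interruptions tend to cluster when a capacity pool tightens, though, so a more conservative planning figure is approximately 0.02 expected interruptions per run, or about one interruption every ~50 runs. For daily CI (3 runs/day), that’s roughly one interruption every 2–3 weeks.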
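A quick sanity check is to spread the monthly rate evenly across a 30-day month (43,200 minutes) and multiply by the run’s worker-minutes. The even-spread assumption is ours and ignores clustering, which is why it lands far below 0.02; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Expected interruptions per run if a monthly per-instance interruption rate
# were spread evenly over time (a simplifying assumption).
expected_interruptions() {
  local monthly_rate="$1" workers="$2" run_minutes="$3"
  # 30-day month = 43,200 minutes; scale by total worker-minutes in the run.
  awk -v r="$monthly_rate" -v w="$workers" -v m="$run_minutes" \
    'BEGIN { printf "%.4f\n", r * w * m / 43200 }'
}

# Average number of runs between interruptions for a given expected count per run.
runs_per_interruption() {
  awk -v e="$1" 'BEGIN { printf "%.0f\n", 1 / e }'
}

expected_interruptions 0.05 50 5   # ceiling of the "<5%" bucket, 50 workers, 5 minutes
runs_per_interruption 0.02         # the 0.02-per-run planning figure
```

Even at the 5% ceiling, the even-spread estimate is about 0.0003 expected interruptions per run, so 0.02 per run (one in ~50 runs) leaves a wide safety margin.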

When an interruption does occur, the impact is minimal. With 50 workers, losing one worker means rerunning approximately 2% of the test suite. With TraceLoom’s automatic requeue, the user sees a slightly longer run time but no lost results and no manual intervention.

Multi-AZ and Instance Type Diversification

The best strategy for minimizing Spot interruptions is diversifying across availability zones and instance types. AWS is more likely to reclaim capacity in a single AZ or a single instance type than across a diversified fleet.

Availability zone diversification: Request Spot capacity across all AZs in your region. If us-east-1a faces a capacity squeeze, us-east-1b and us-east-1c likely have capacity available.

Instance type diversification: Instead of requesting only c6i.xlarge, also accept c5.xlarge, c7i.xlarge, and m6i.xlarge. Playwright tests don’t require a specific instance type; any compute-optimized or general-purpose instance with 4+ vCPUs works.

TraceLoom’s fleet configuration uses both strategies by default. The launch template specifies multiple instance types across all availability zones, and the EC2 Fleet API’s lowest-price allocation strategy selects the cheapest available capacity (TraceLoom CDK stack configuration, March 2026).
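Both forms of diversification meet in an EC2 Fleet request, whose LaunchTemplateConfigs[].Overrides entries can each set an InstanceType and a SubnetId. The sketch below generates one override per instance type per subnet. The subnet IDs are placeholders (one subnet per AZ is assumed), and this is an illustration, not TraceLoom’s CDK stack.

```shell
#!/usr/bin/env bash
# Emit a JSON Overrides array with one entry per (instance type, subnet) pair.
# Arguments are comma-separated lists; subnet IDs here are placeholders.
fleet_overrides() {
  local types="$1" subnets="$2" first=1 t s
  local IFS=','   # split the argument lists on commas
  printf '['
  for t in $types; do
    for s in $subnets; do
      [ "$first" -eq 1 ] || printf ','
      first=0
      printf '{"InstanceType":"%s","SubnetId":"%s"}' "$t" "$s"
    done
  done
  printf ']\n'
}

# 4 instance types x 3 AZ subnets = 12 capacity pools to draw from.
fleet_overrides "c6i.xlarge,c5.xlarge,c7i.xlarge,m6i.xlarge" "subnet-aaaa,subnet-bbbb,subnet-cccc"
```

The printed array would go under LaunchTemplateConfigs[0].Overrides in a document passed to aws ec2 create-fleet --cli-input-json, alongside a LaunchTemplateSpecification and SpotOptions such as {"AllocationStrategy": "lowest-price"}.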

Cost Savings at Scale

The Spot savings are straightforward to calculate. Here’s the monthly cost comparison for a team running 1,000 tests across 50 workers, 3 runs per day:

Compute Model	Worker Cost/Hour	Workers	Run Duration	Runs/Month	Monthly Compute
On-demand (c6i.xlarge)	$0.17	50	5 min	90 (3/day × 30)	~$64
Spot (c6i.xlarge)	$0.05	50	5 min	90 (3/day × 30)	~$19
Savings					~$45/month (70%)

At $45/month, the absolute compute savings are modest for a small team. The savings scale linearly with total worker-hours: a team running 5,000 tests (five times the compute) across 100 workers saves approximately $225/month in compute alone.
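The table’s figures follow from hourly rate × workers × run hours × runs per month. The sketch below reproduces them. For the 5,000-test case it assumes per-test time stays constant, so 100 workers would each need about 12.5 minutes per run; that duration is our assumption, not a measured figure, and monthly_cost is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Monthly compute cost = hourly rate x workers x (run minutes / 60) x runs per month.
monthly_cost() {
  local rate="$1" workers="$2" run_minutes="$3" runs_per_month="$4"
  awk -v r="$rate" -v w="$workers" -v m="$run_minutes" -v n="$runs_per_month" \
    'BEGIN { printf "%.2f\n", r * w * (m / 60) * n }'
}

# 1,000 tests on 50 workers, 5-minute runs, 3 runs/day x 30 days = 90 runs.
echo "on-demand: \$$(monthly_cost 0.17 50 5 90)  spot: \$$(monthly_cost 0.05 50 5 90)"
```

Under the constant-per-test-time assumption, monthly_cost 0.17 100 12.5 90 and monthly_cost 0.05 100 12.5 90 give 318.75 and 93.75, a difference of $225.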

The larger savings come from comparing Spot-based testing to managed platforms:

Platform	Monthly Cost (1,000 tests, 3 runs/day)
BrowserStack Automate (25 sessions)	$1,249
GitHub Actions (10 runners)	$405–765
EC2 Spot + TraceLoom	~$109 (including $79 platform fee)

— Cost comparison: TraceLoom internal benchmarks, BrowserStack pricing page, GitHub Actions pricing, March 2026

The Spot discount isn’t the whole story. The shift from per-session/per-minute vendor pricing to raw AWS compute pricing is where the cost structure fundamentally changes.

Common Concerns About Spot for Testing

“What if all my workers get interrupted at once?” Extremely unlikely with AZ and instance type diversification. AWS reclaims capacity incrementally, not in bulk. In TraceLoom production data, simultaneous interruption of more than 10% of a fleet has never occurred — TraceLoom operational data, March 2026.

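“Spot pricing is variable, so what if it spikes?” Set a maximum price. You always pay the current Spot price, never more than your maximum. If the Spot price rises above your maximum, new instances aren’t launched, and running instances are interrupted with the usual 2-minute notice, which the interruption handling described above already covers. Setting the maximum at 50% of on-demand pricing guarantees savings of at least 50% on every instance-hour you do run.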
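As a sketch of how such a cap might be expressed, the snippet below derives a maximum price from the on-demand rate and formats it as the --instance-market-options value that aws ec2 run-instances accepts. The 50% ratio mirrors the example above; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Maximum Spot price as a percentage of the on-demand hourly price.
spot_max_price() {
  local on_demand="$1" pct="$2"
  awk -v od="$on_demand" -v p="$pct" 'BEGIN { printf "%.4f\n", od * p / 100 }'
}

# Shorthand value for `aws ec2 run-instances --instance-market-options`.
market_options() {
  echo "MarketType=spot,SpotOptions={MaxPrice=$(spot_max_price "$1" "$2")}"
}

# c6i.xlarge on-demand is $0.17/hour in us-east-1; cap Spot at 50% of that.
market_options 0.17 50
```

The output would be passed as aws ec2 run-instances --instance-market-options "$(market_options 0.17 50)" together with the usual image, instance type, and network arguments.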

“My tests need a specific instance type.” Playwright tests are CPU-bound; they need no GPU and rarely more than ~4 GB of memory. Any compute-optimized instance with 4+ vCPUs runs Playwright well. Instance type flexibility is one of the reasons testing is an ideal Spot workload.

Bottom line: EC2 Spot instances cost 60–90% less than on-demand, and Playwright tests are a near-perfect Spot workload: short, stateless, retryable, and embarrassingly parallel. The chance that any single worker is interrupted during a 5-minute test run is well under 1%. TraceLoom manages Spot fleet provisioning, interruption handling, and automatic requeue, so your tests run on the cheapest available compute with no manual fleet management.

Related reading:

The True Cost of Running Playwright Tests at Scale — full cost comparison across GitHub Actions, BrowserStack, Sauce Labs, and BYOC.
Cut Testing Costs with BYOC Infrastructure — use case for teams focused on reducing testing spend.
How to Run Playwright Tests in Parallel on AWS Spot Instances — the full architecture for distributed test execution on Spot.

Last updated: May 2026
