
Playwright Test Sharding: How to Split Tests Across Workers

TraceLoom Team

A 500-test Playwright suite takes 30–45 minutes running on a single CI runner. Split that same suite across 10 workers and the wall-clock time drops to 3–5 minutes. The technique is called sharding — distributing tests across multiple machines so they run in parallel.

Playwright ships a built-in --shard flag. It works. But it has limitations that matter at scale. This post covers how Playwright sharding works, where it breaks down, and how duration-aware sharding solves the imbalance problem.

What is Playwright Test Sharding?

Playwright test sharding splits a test suite into N roughly equal segments, each assigned to a separate worker machine. Each worker runs only its assigned segment, and results are merged after all workers finish.

Sharding is the standard approach to parallel test execution across multiple machines. It differs from Playwright’s built-in workers option, which parallelizes tests within a single machine’s CPU cores. Sharding distributes across machines; the workers option distributes across processes on one machine.
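For contrast, within-machine parallelism is configured in playwright.config.ts. A minimal sketch using Playwright's documented workers and fullyParallel options:

```typescript
// playwright.config.ts — minimal sketch of within-machine parallelism.
// `workers` controls parallel processes on ONE machine; it is unrelated
// to the --shard flag, which splits work across machines.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: 4,          // up to 4 parallel worker processes on this machine
  fullyParallel: true, // allow tests within a single file to run in parallel
});
```

On a sharded run, both mechanisms combine: each shard machine still runs its assigned files across its own local worker processes.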

Sharding matters because test suites grow with the product. Teams with 200+ Playwright tests routinely see 20–40 minute CI runs on a single runner — GitLab 2024 DevSecOps Survey. Sharding turns that into a parallelization problem with a straightforward solution.

How Does Playwright’s Built-In Sharding Work?

Playwright’s --shard CLI flag splits the test file list into N equal-count segments:

# Split the suite into 5 shards and run shard 1
npx playwright test --shard=1/5

# Run shard 2 of 5
npx playwright test --shard=2/5

Playwright assigns test files (not individual tests) to shards. If your suite has 50 test files and you request 5 shards, each shard gets 10 files. The assignment is deterministic — the same --shard=1/5 always gets the same files, given the same test list.
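Conceptually, the equal-count assignment amounts to slicing a deterministically ordered file list into contiguous segments. An illustrative sketch (not Playwright's actual source):

```typescript
// Illustrative sketch of equal-count sharding: split a deterministically
// sorted file list into N contiguous segments. Not Playwright's real
// implementation — just the idea behind --shard=index/total.
function equalCountShard(
  files: string[],
  shardIndex: number, // 1-based, as in --shard=1/5
  shardTotal: number,
): string[] {
  const sorted = [...files].sort(); // same input -> same order -> same shards
  const perShard = Math.ceil(sorted.length / shardTotal);
  const start = (shardIndex - 1) * perShard;
  return sorted.slice(start, start + perShard);
}
```

Note that file count is the only input here — nothing about how long each file takes to run, which is exactly the limitation discussed below.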

Setting Up Sharding in CI

In GitHub Actions, sharding uses a matrix strategy:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let other shards finish even if one fails
      matrix:
        shard: [1, 2, 3, 4, 5]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}/5

Each matrix job runs on a separate runner, executing one shard. The total suite completes when the slowest shard finishes.

Merging Results

After all shards complete, merge the results using Playwright’s blob reporter:

# Each shard produces a blob report (written to ./blob-report by default)
npx playwright test --shard=1/5 --reporter=blob

# After all shards finish, collect their blob reports into one directory
# and merge them into a single HTML report
npx playwright merge-reports --reporter html ./blob-reports

Playwright’s blob reporter outputs a binary file per shard. The merge-reports command combines them into a unified HTML report showing all tests across all shards.

Where Does Equal-Count Sharding Break Down?

Playwright’s --shard flag distributes by file count, not by test duration. This creates an imbalance problem when test files have different runtimes.

Consider a suite with 20 test files sharded across 4 workers:

| Worker | Files | Total Duration |
| --- | --- | --- |
| Worker 1 | 5 files (login, checkout, payments, subscriptions, admin) | 12 minutes |
| Worker 2 | 5 files (search, filters, sorting, pagination, empty-states) | 4 minutes |
| Worker 3 | 5 files (forms, validation, dropdowns, modals, tooltips) | 3 minutes |
| Worker 4 | 5 files (navigation, breadcrumbs, sidebar, footer, header) | 2 minutes |

Each worker gets 5 files — perfectly “equal.” But Worker 1 takes 12 minutes because it drew the slow, integration-heavy test files, while Worker 4 finishes in 2 minutes and sits idle for 10 minutes. The total run time is 12 minutes instead of the theoretical ~5 minutes with perfect balance.

This imbalance gets worse as suite complexity grows. Equal-count sharding works well for suites where all test files take roughly the same time. For suites with mixed file durations — which is most real-world suites — equal-count sharding wastes worker time.

What is Duration-Aware Sharding?

Duration-aware sharding assigns test files to workers based on their historical runtime, not their count. The goal is to balance the total elapsed time per worker, minimizing the critical path (the slowest worker’s finish time).

The algorithm works like a bin-packing problem:

  1. Collect timing data from previous runs. Store the duration of each test file in a database.
  2. Sort files by duration in descending order (longest first).
  3. Assign each file to the worker with the lowest accumulated time — a greedy approach to bin packing.
  4. Fall back to equal-count for new files with no timing history.
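The steps above can be sketched as a greedy longest-processing-time (LPT) pass. This is a hypothetical helper for illustration, not TraceLoom's actual code; durations and the fallback default are assumptions:

```typescript
// Greedy duration-aware sharding (longest-processing-time heuristic).
// Hypothetical illustration of the four steps above — not TraceLoom's code.
interface Shard {
  files: string[];
  total: number; // accumulated duration (minutes, in this example)
}

function durationAwareShard(
  files: string[],
  durations: Record<string, number>, // file -> duration from previous runs
  shardTotal: number,
  defaultDuration = 1, // step 4: fallback for files with no timing history
): Shard[] {
  const shards: Shard[] = Array.from({ length: shardTotal }, () => ({ files: [], total: 0 }));
  const cost = (f: string) => durations[f] ?? defaultDuration;
  // Step 2: longest first, so big files are placed while bins are still empty.
  const sorted = [...files].sort((a, b) => cost(b) - cost(a));
  for (const file of sorted) {
    // Step 3: assign to the worker with the least accumulated time so far.
    const target = shards.reduce((min, s) => (s.total < min.total ? s : min));
    target.files.push(file);
    target.total += cost(file);
  }
  return shards;
}
```

Greedy LPT is not optimal, but Graham's classic bound guarantees its makespan is within 4/3 of the optimum — more than good enough for balancing CI workers.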

Applying duration-aware sharding to a similar mixed-duration suite, with per-file durations shown:

| Worker | Files | Total Duration |
| --- | --- | --- |
| Worker 1 | payments (6 min), tooltips (0.5 min), header (0.3 min) | 6.8 minutes |
| Worker 2 | checkout (4 min), modals (1 min), footer (0.4 min), breadcrumbs (0.3 min) | 5.7 minutes |
| Worker 3 | subscriptions (3 min), forms (2 min), navigation (0.5 min) | 5.5 minutes |
| Worker 4 | admin (3 min), login (2 min), validation (1 min) | 6.0 minutes |

The slowest worker finishes in 6.8 minutes instead of 12 — a 43% reduction in wall-clock time from the same test suite on the same number of workers.

TraceLoom’s sharding algorithm uses this approach. Before launching EC2 workers, the orchestrator Lambda reads historical test durations from DynamoDB and computes shard assignments that minimize the longest worker runtime — TraceLoom architecture documentation, March 2026.

How to Choose the Right Shard Count

The optimal shard count depends on three factors: suite size, worker startup time, and the parallelism limit of your infrastructure.

Suite size: More shards reduce per-shard runtime, but each shard has a fixed overhead (machine boot, Playwright install, browser download). For most suites, the point of diminishing returns is roughly:

  • 100–300 tests: 5–10 shards
  • 300–1,000 tests: 10–30 shards
  • 1,000+ tests: 30–50 shards

Worker startup time: If each worker takes 90 seconds to boot, install dependencies, and download browsers, adding workers with fewer than 90 seconds of test work produces no net benefit. Track your startup time and ensure each shard has meaningfully more test work than startup overhead.

Infrastructure limits: GitHub Actions allows up to 256 concurrent jobs per workflow, but practical limits are lower — most organizations have 20–50 concurrent runners. AWS EC2 has higher limits but requires fleet management. TraceLoom manages fleets of 50+ EC2 Spot workers automatically.
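These three factors can be folded into a rough sizing heuristic. A sketch under assumed numbers — the 3x work-to-startup ratio is an arbitrary illustrative threshold, not a measured constant:

```typescript
// Rough heuristic for picking a shard count from the three factors above.
// Assumption: each shard should carry at least `minWorkRatio` times its
// startup overhead in actual test work, and never exceed the concurrency cap.
function suggestShardCount(
  totalTestSeconds: number, // summed duration of the whole suite
  startupSeconds: number,   // boot + install + browser download per worker
  maxConcurrency: number,   // infrastructure limit (runners, fleet size)
  minWorkRatio = 3,         // illustrative: >= 3x startup cost per shard
): number {
  const maxUseful = Math.floor(totalTestSeconds / (startupSeconds * minWorkRatio));
  return Math.max(1, Math.min(maxUseful, maxConcurrency));
}
```

For example, a 30-minute suite with 90-second worker startup and a 50-runner cap suggests about 6 shards; a tiny suite collapses to 1 shard rather than paying startup overhead many times over.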

Sharding with Trace Capture

When running sharded tests with trace: 'on', each worker produces trace.zip files for its assigned tests. These traces need to be collected and stored centrally for debugging.

The simplest approach is uploading traces to S3 after each shard completes:

# In each shard's CI step, after tests run
aws s3 sync ./test-results/ s3://your-bucket/runs/$RUN_ID/shard-$SHARD_INDEX/

Trace files are typically 0.5–5 MB per test. For a 1,000-test suite with traces enabled, total storage per run is 500 MB–5 GB. S3 Standard storage costs for that volume are under $0.12/month — AWS S3 pricing, 2026.

TraceLoom handles trace upload automatically — each EC2 worker uploads trace.zip files to your S3 bucket as tests complete, and the dashboard links directly to each trace.

Sharding Compared to Other Parallelization Approaches

| Approach | Parallelism | Trace Support | Cost Model | Best For |
| --- | --- | --- | --- | --- |
| Playwright --shard on CI | Limited by CI runner count (typically 5–20) | Manual S3 upload | Per-runner-minute pricing | Small-medium suites, existing CI |
| GitHub Actions matrix | Up to 256 jobs (practical: 20–50) | No built-in support | GitHub runner minutes | Teams already on GitHub Actions |
| TraceLoom on EC2 Spot | 50+ workers | Automatic trace upload to S3 | AWS Spot pricing (~$0.04–0.12/hr per worker) | Large suites, data sovereignty needs |
| BrowserStack Automate | Per-session pricing (5–25 sessions typical) | Platform-specific format | $249–1,249/month | Teams wanting managed infrastructure |

Bottom line: Playwright’s built-in --shard flag works for basic parallelization on any CI platform. For suites over 300 tests where equal-count sharding creates worker imbalance, duration-aware sharding cuts wall-clock time significantly. TraceLoom combines duration-aware sharding with EC2 Spot workers and automatic trace capture — no CI matrix management required.

Try duration-aware sharding on your test suite →


Last updated: April 2026