Selenium → Playwright Migration with CI/CD Improvements

Problem statement

Enterprise QA programmes accumulate Selenium suites over years. A five-year-old Selenium-Java framework typically carries the hallmarks: explicit-wait sprawl, XPath chains coupled tightly to DOM structure, self-hosted Grid or cloud-grid spend growing quarter-on-quarter, and a flake rate that quietly erodes gate trust. At some point the engineering leadership decides the maintenance cost now exceeds the migration cost — and a migration brief arrives on the desk of whoever owns the QA architecture.

The phased approach reduces release risk rather than eliminating it. Phase 0 baseline measurement is non-negotiable: it is the evidence base for the ROI case at Phase 3. Skipping it means the migration cannot defend itself to stakeholders.

The brief usually arrives framed as a tool swap. The correct architectural lens is wider. A migration from Selenium to Playwright is a framework redesign opportunity, not a library substitution. If the Selenium patterns (explicit waits, PageFactory locator binding, TestNG DataProviders, Grid session handling) are ported 1:1 into Playwright, the team inherits the same structural fragility in a new wrapping. The value of Playwright's auto-wait model, lazy locators, fixture system, worker-based parallelism, and storage-state auth patterns is only realised through a deliberate redesign.

Before solving, I ask the questions that shape the migration architecture: How many tests, and what proportion are real user-journey E2E versus backend logic dressed up as UI tests? Is the Grid self-hosted on VMs, cloud (Sauce/BrowserStack), or hybrid — because the infra-savings ROI shape differs materially. What is the team's language exposure — Java-only, or TypeScript-curious? What does the CI shape look like today? And critically — what can be deleted? Most Selenium suites of five-plus years carry 20–40% dead, wrong-layer, or low-value tests. Migration is the cheapest moment to delete them. For a real engagement where these questions shaped the full migration — five-year-old Selenium-Java, self-hosted Grid, cohort ramp, shadow mode, and atomic cutover — see the Enterprise Selenium → Playwright Migration case study, where this pattern was crystallised at scale.

Reference architecture diagram

Phase 0: Assessment + Cull (Weeks 1–3)

Inventory all tests → classify

keep-as-E2E | pull-down-to-API | pull-down-to-unit | delete

Baseline metrics (defend ROI later)

wall-clock · flake rate · CI cost/run · runs/day

Framework decision brief (ADR)

locator strategy · auth model · parallelism shape

Phase 1: Pilot + Pattern Lock (Weeks 3–7)

Scope

30–50 tests · one critical user journey · greenfield Playwright + TypeScript
POM with lazy locator getters (not PageFactory)
storageState auth fixtures replacing per-test login
Containerised CI: --shard=i/N across N runners

Exit criteria

flake < 2%
p95 wall-clock per test < 30s
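As a minimal sketch, the Phase 1 exit criteria can be checked mechanically from pilot-run data rather than argued from impressions. The `TestRunRecord` shape and its field names are assumptions, not a Playwright API; "flake" is taken here to mean a test that failed and then passed on retry within the same run, and p95 uses the nearest-rank method.

```typescript
// Hypothetical per-test result shape; field names are illustrative, not a Playwright API.
interface TestRunRecord {
  testId: string;
  durationMs: number;
  // true when the test failed at least once and then passed on retry in the same run
  passedOnRetry: boolean;
}

interface ExitCriteria {
  maxFlakeRate: number; // 0.02 for "flake < 2%"
  maxP95Ms: number;     // 30_000 for "p95 wall-clock per test < 30s"
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of an empty sample');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function evaluatePilotExit(runs: TestRunRecord[], criteria: ExitCriteria) {
  const flakeRate = runs.filter((r) => r.passedOnRetry).length / runs.length;
  const p95Ms = percentile(runs.map((r) => r.durationMs), 95);
  return {
    flakeRate,
    p95Ms,
    // Both criteria are strict upper bounds, matching "< 2%" and "< 30s"
    passed: flakeRate < criteria.maxFlakeRate && p95Ms < criteria.maxP95Ms,
  };
}
```

Feeding it a week of pilot results gives a yes/no answer plus the two numbers to report at the Phase 1 review.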

Phase 2: Parallel Suite + Ramp (Weeks 7–18)

Tracks

Selenium → release gate (active)
Playwright → shadow mode (informational)

Ramp

Per-week migration cohorts (40–80 tests each)
Cutover criterion: Playwright shadow ≥ 98% pass rate, 2 consecutive weeks
Team uplift: TS primer · trace-viewer training · pair weeks 1–4

Phase 3: Cutover + Decommission (Weeks 18–26)

Cutover

Playwright becomes release gate
Selenium shadow for 2–3 weeks (regression-on-regression)
Selenium Grid / cloud-grid subscription wound down
Repo tagged + archived (not deleted — institutional reference)
ROI re-measured against Phase 0 baseline; published

Design decisions

Tool selection: why Playwright over staying on Selenium

The case for migration rests on three compounding improvements. First, auto-wait removes most of the explicit-wait layer: Playwright waits for elements to be actionable before interacting, which eliminates the WebDriverWait.until(...) boilerplate and the lazy Thread.sleep() shortcuts that inflate Selenium runtime and conceal race conditions. Second, Playwright's worker-based parallelism removes the grid handshake overhead: browsers run locally inside each worker process, every test gets a fresh, isolated browser context without a Hub, and scaling is bounded only by CI runner count rather than grid licence. Third, the trace viewer fundamentally changes triage economics: with tracing enabled (for example trace: 'retain-on-failure'), a failed test ships a full DOM snapshot, network log, and console recording for every step, reducing the "reproduce-then-investigate" loop from 15–30 minutes to under 5.

The alternative, keeping Selenium and stabilising it, is the right call when the suite is within 12 months of decommission anyway, or when the organisation cannot absorb a TypeScript ramp. Playwright-Java exists but is a second-class citizen: the Playwright Test runner (fixtures, projects, sharding, built-in reporters) is Node.js-only, so Java teams rebuild those pieces on JUnit or TestNG, and community examples are far fewer. The TypeScript path is the only one that aligns with where the ecosystem invests.

Migration architecture: framework redesign over literal port

A literal port from Selenium to Playwright preserves the anti-patterns: XPath chains, explicit-wait wrappers, eagerly-resolved @FindBy fields, and TestNG DataProvider idioms that don't map to Playwright fixtures. The result is a Playwright suite that underperforms and inherits the old fragility.

The redesign decisions I lock in during the pilot phase: locator priority order (role-first via getByRole, then visible text via getByText, then test-id via getByTestId, then CSS; XPath is banned except for provable edge cases, and form fields with no ARIA role, such as password inputs, fall back to getByLabel); page objects expose locators as lazy getter properties, not constructor-resolved fields; auth is handled via storageState fixtures per role; test isolation is enforced per test via separate browser contexts, not shared sessions. The AI-Augmented Playwright Test Pipeline for a Large Australian Energy Company deploys these same structural decisions (role-first locators, storageState auth, sharded CI) on a greenfield programme with no Selenium legacy; it shows what the framework looks like when none of the migration constraints apply.

Each decision has a rejected alternative that is faster to ship and worse to live with. Locking them in before the bulk migration starts is what prevents the new suite from inheriting Selenium's structural fragility.
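The locator priority order is easier to hold in review if a migration PR can be audited for it. The sketch below is a hypothetical, regex-based helper (`auditLocators` is an illustrative name, not a Playwright or ESLint API) that counts locator calls by strategy. It relies on Playwright's rule that a `locator()` selector starting with `//` or `..`, or carrying an explicit `xpath=` prefix, is treated as XPath. Being regex-based, it is a heuristic, not a parser.

```typescript
type Strategy = 'role' | 'text' | 'testId' | 'css' | 'xpath';

// Preferred strategies, in the priority order the text locks in.
const PREFERRED: [Strategy, RegExp][] = [
  ['role', /\bgetByRole\(/g],
  ['text', /\bgetByText\(/g],
  ['testId', /\bgetByTestId\(/g],
];

// page.locator('...') / page.locator("...") / page.locator(`...`): capture the first string argument
const LOCATOR_CALL = /\blocator\(\s*(['"`])(.*?)\1/g;

function auditLocators(source: string): Record<Strategy, number> {
  const counts: Record<Strategy, number> = { role: 0, text: 0, testId: 0, css: 0, xpath: 0 };
  for (const [strategy, pattern] of PREFERRED) {
    counts[strategy] = source.match(pattern)?.length ?? 0;
  }
  for (const match of source.matchAll(LOCATOR_CALL)) {
    const selector = match[2];
    // Playwright treats selectors starting with // or .. as XPath, as well as explicit xpath=
    const isXPath =
      selector.startsWith('//') || selector.startsWith('..') || selector.startsWith('xpath=');
    counts[isXPath ? 'xpath' : 'css'] += 1;
  }
  return counts;
}
```

A CI step could fail the PR when `xpath` is non-zero and no exemption is recorded, and report the `css` count as a trend rather than a hard gate.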

Parallel-suite strategy: shadow mode before cutover

Big-bang cutover (Playwright lands and Selenium switches off the same week) carries unacceptable release risk when the Selenium suite is gating production. The shadow-mode approach runs Playwright in parallel, in informational mode, for 2–3 months: the same CI builds that Selenium gates are also executed by Playwright, and the results are surfaced as a separate, non-blocking check. Cutover happens only when the Playwright shadow pass rate sits at or above 98% for two consecutive weeks and coverage equivalence on critical user journeys is verified.
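The cutover criterion is simple enough to encode, which removes debate about whether "two consecutive weeks" has actually been met. A minimal sketch, assuming weekly shadow results are rolled up into a hypothetical `WeeklyShadowSummary` shape and supplied in chronological order; coverage equivalence on critical journeys stays a human sign-off.

```typescript
// Hypothetical weekly rollup of shadow-mode results; field names are illustrative.
interface WeeklyShadowSummary {
  weekStarting: string; // ISO date, e.g. '2024-05-06'
  passed: number;
  total: number;
}

// Cutover rule from the text: shadow pass rate >= 98% for two consecutive weeks.
// Only the trailing streak counts, so a bad most-recent week resets readiness.
function isReadyForCutover(
  weeks: WeeklyShadowSummary[],
  threshold = 0.98,
  consecutiveWeeksRequired = 2,
): boolean {
  let streak = 0;
  for (const week of weeks) {
    // A week with no shadow runs counts as a miss, not a pass
    const passRate = week.total === 0 ? 0 : week.passed / week.total;
    streak = passRate >= threshold ? streak + 1 : 0;
  }
  return streak >= consecutiveWeeksRequired;
}
```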

The dual-suite window has a real cost: two test suites run on every build. That cost is finite and bounded; the Selenium infrastructure cost is not. Model the window cost explicitly and include it in the total migration cost presented to stakeholders, because that is what makes the business case defensible.
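A minimal sketch of that window cost model. The input names and functions are hypothetical, and the model covers infrastructure only, not the feedback-loop dividend. It treats Selenium spend inside the window as already committed (it would have been paid anyway), so only the Playwright CI spend during the window is incremental, and payback is measured from cutover.

```typescript
// Illustrative inputs; replace every figure with Phase 0 baseline numbers.
interface CostInputs {
  seleniumInfraPerMonth: number; // Grid VMs or cloud-grid subscription
  playwrightCiPerMonth: number;  // CI minutes for the Playwright jobs
  dualSuiteMonths: number;       // length of the shadow window
  migrationEffortCost: number;   // one-off engineering time for the migration
}

// Selenium spend during the window would have been paid anyway, so only the
// Playwright CI spend inside the window is added to the one-off effort.
function totalMigrationCost(i: CostInputs): number {
  return i.migrationEffortCost + i.playwrightCiPerMonth * i.dualSuiteMonths;
}

// Months after cutover until the Selenium line item's removal pays back the migration.
function paybackMonths(i: CostInputs): number {
  const monthlySaving = i.seleniumInfraPerMonth - i.playwrightCiPerMonth;
  return monthlySaving > 0 ? totalMigrationCost(i) / monthlySaving : Infinity;
}
```

With placeholder figures of 10,000 per month for Selenium infrastructure, 2,000 per month for Playwright CI, a 3-month window, and 60,000 of migration effort, the model gives a total of 66,000 and a payback of 8.25 months after cutover.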

Shadow mode — how a build triggers both suites during Phase 2


During the dual-suite window, every CI build runs both job groups. Selenium failures block merge; Playwright shadow failures create a dashboard annotation only. The shadow job group is promoted to gate status once cutover criteria are met, and the Selenium job group is removed.
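One way to implement the gate/shadow split is a small wrapper that decides each job's exit code and emits annotations. The sketch assumes GitHub Actions, whose `::error` and `::warning` workflow commands render as annotations on the run; `ciVerdict` and `SuiteOutcome` are hypothetical names. Values containing `%` or newlines would need escaping under the workflow-command rules, which is omitted here.

```typescript
type SuiteRole = 'gate' | 'shadow';

interface SuiteOutcome {
  suite: string; // e.g. 'playwright' or 'selenium'
  role: SuiteRole;
  failedTests: string[];
}

// Gate failures fail the job (blocking merge); shadow failures only annotate.
function ciVerdict(o: SuiteOutcome): { exitCode: number; annotations: string[] } {
  if (o.failedTests.length === 0) return { exitCode: 0, annotations: [] };
  const level = o.role === 'gate' ? 'error' : 'warning';
  const annotations = o.failedTests.map(
    (t) => `::${level} title=${o.suite} (${o.role})::${t} failed`,
  );
  return { exitCode: o.role === 'gate' ? 1 : 0, annotations };
}
```

Promoting Playwright at cutover then becomes a one-word change from `'shadow'` to `'gate'`.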


Culling: migration as forcing function for suite hygiene

Most organisations do not want to admit they have 200 dead tests. Migration is the moment to force the question, because every dead test that gets migrated costs rewrite, review, and ongoing maintenance effort, while deleting it is a one-off decision. I run a per-test classification at Phase 0 (keep-as-E2E, pull-down-to-API, pull-down-to-unit, dead/duplicate, low-value) and target a 20–40% cull rate before migration starts.


The classification record becomes a stakeholder artefact: who signed off which tests were deleted and why. This is not optional; a team that skips it will migrate the bloat and wonder why the new suite is slow.
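A sketch of the classification record as data, assuming a hypothetical `ClassificationRecord` shape whose categories follow the Phase 0 classification. It counts a test as culled when it leaves the E2E layer (pulled down or deleted); adjust `cullRate` if your definition counts only deletions. `unsignedRecords` surfaces entries without a named sign-off or rationale, which should block the cull.

```typescript
type Disposition =
  | 'keep-as-E2E'
  | 'pull-down-to-API'
  | 'pull-down-to-unit'
  | 'dead/duplicate'
  | 'low-value';

// Hypothetical record shape; one entry per existing Selenium test.
interface ClassificationRecord {
  testId: string;
  disposition: Disposition;
  rationale: string;
  signedOffBy: string; // named owner, so the deletion decision is auditable
}

// Share of the inventory that will not be migrated as Playwright E2E.
function cullRate(records: ClassificationRecord[]): number {
  if (records.length === 0) return 0;
  const leavingE2E = records.filter((r) => r.disposition !== 'keep-as-E2E').length;
  return leavingE2E / records.length;
}

// Entries that cannot be acted on yet because nobody has signed off the reasoning.
function unsignedRecords(records: ClassificationRecord[]): ClassificationRecord[] {
  return records.filter((r) => r.signedOffBy.trim() === '' || r.rationale.trim() === '');
}
```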

ROI framing: the CFO question

The infrastructure savings are real but bounded. A self-hosted Selenium Grid on 8–12 VMs or an equivalent cloud-grid subscription is a line item that migration can eliminate. For a mid-size suite the infrastructure saving over a 12-month window is material and defensible.

The more important ROI story, and the harder one to tell to a CFO, is the feedback-loop dividend: faster tests run more frequently, which shortens defect escape windows and lowers the developer-hour cost of fixing bugs. Playwright's sharded pipeline lets a suite that ran nightly run several times per day, on every PR merge. The compound effect on defect escape cost is larger than the infrastructure saving, but it needs a credible baseline measurement to defend. That is why Phase 0 baseline measurement is non-negotiable.
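To make the Phase 0 baseline usable at Phase 3, the four baseline metrics from the reference architecture (wall-clock, flake rate, CI cost per run, runs per day) can be captured in one shape and compared directly. A minimal sketch; the field names and helpers are assumptions. `dailyCiSpend` makes the feedback-loop trade visible: running more often can raise daily CI spend even as per-run cost falls.

```typescript
// Phase 0 baseline metrics; field names are illustrative.
interface SuiteMetrics {
  wallClockMinutes: number; // end-to-end suite duration per run
  flakeRate: number;        // 0..1
  ciCostPerRun: number;     // currency units
  runsPerDay: number;
}

// Relative change per metric: -0.75 means the metric fell by 75%.
function relativeChange(
  baseline: SuiteMetrics,
  current: SuiteMetrics,
): Record<keyof SuiteMetrics, number> {
  const keys = Object.keys(baseline) as (keyof SuiteMetrics)[];
  const out = {} as Record<keyof SuiteMetrics, number>;
  for (const k of keys) {
    // A zero baseline has no meaningful relative change
    out[k] = baseline[k] === 0 ? NaN : (current[k] - baseline[k]) / baseline[k];
  }
  return out;
}

// Per-run cost times run frequency.
function dailyCiSpend(m: SuiteMetrics): number {
  return m.ciCostPerRun * m.runsPerDay;
}
```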

What not to do

Do not allow page.waitForTimeout() in migrated code. If a test needs a hard timeout to pass, the test is wrong, not Playwright. Auto-wait and web-first assertions such as expect(locator).toBeVisible() cover approximately 95% of cases; the remaining 5% are solved with page.waitForResponse, expect.poll, or page.waitForLoadState('networkidle') used with deliberate intent (Playwright's own documentation discourages 'networkidle' for testing, so treat it as a last resort). Any hard timeout in a migration PR should be a code-review rejection without exception: the cost of establishing that norm in the first 50 tests is far lower than the cost of re-establishing it after 500.
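A code-review norm holds better when CI enforces it. This hypothetical check (`findHardWaits` is an illustrative name) scans only the added lines of a unified diff for `waitForTimeout(`, so legacy code is not re-flagged on every PR. Teams already using eslint-plugin-playwright can get similar coverage from its `no-wait-for-timeout` rule.

```typescript
const HARD_WAIT = /\bwaitForTimeout\s*\(/;

interface HardWaitFinding {
  lineInDiff: number; // 1-based line index within the diff text
  text: string;
}

// Flags hard waits introduced by a PR; removed ('-') and context (' ') lines are ignored.
function findHardWaits(unifiedDiff: string): HardWaitFinding[] {
  const findings: HardWaitFinding[] = [];
  unifiedDiff.split('\n').forEach((line, index) => {
    // '+++' is the new-file header, not an added line
    if (line.startsWith('+') && !line.startsWith('+++') && HARD_WAIT.test(line)) {
      findings.push({ lineInDiff: index + 1, text: line.slice(1).trim() });
    }
  });
  return findings;
}
```

A non-empty result fails the check and points the reviewer at the exact lines to replace with a web-first assertion or an explicit response wait.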

Do not defer the team language transition. A Java-fluent team transitioning to TypeScript will be slower for 2–4 weeks. Planning for that ramp is not optional. The single biggest accelerant is the trace-viewer demo in week one: when an engineer sees triage drop from 20 minutes to 2 minutes, the language transition stops being the resistance point.

Code snippets

1. Page object with lazy locators (TypeScript)

// src/pages/LoginPage.ts
import { type Page } from '@playwright/test';

export class LoginPage {
  constructor(private readonly page: Page) {}

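  // Lazy getters: each access returns a fresh Locator that resolves at action time, not at construction
  get emailInput() { return this.page.getByRole('textbox', { name: 'Email' }); }
  // <input type="password"> has no implicit ARIA role, so fall back to its accessible label
  get passwordInput() { return this.page.getByLabel('Password'); }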
  get submitButton() { return this.page.getByRole('button', { name: 'Log in' }); }

  async login(email: string, password: string) {
    await this.emailInput.fill(email);
    await this.passwordInput.fill(password);
    await this.submitButton.click();
  }
}

2. Storage state auth via global setup (TypeScript)

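// global-setup.ts
// Registered via `globalSetup` in playwright.config.ts; tests then reuse the saved
// session with `use: { storageState: 'playwright/.auth/admin.json' }`.
import { chromium, type FullConfig } from '@playwright/test';
import path from 'path';

// Fail fast with a clear message instead of navigating to "undefined/login".
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

async function globalSetup(_config: FullConfig) {
  const baseUrl = requireEnv('BASE_URL');
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(`${baseUrl}/login`);
    await page.getByRole('textbox', { name: 'Email' }).fill(requireEnv('ADMIN_EMAIL'));
    await page.getByLabel('Password').fill(requireEnv('ADMIN_PASSWORD'));
    await page.getByRole('button', { name: 'Log in' }).click();
    // Wait for the post-login redirect before capturing cookies and localStorage
    await page.waitForURL('**/dashboard');

    await page.context().storageState({
      path: path.join('playwright', '.auth', 'admin.json'),
    });
  } finally {
    // Close the browser even if login fails, so CI does not leak processes
    await browser.close();
  }
}

export default globalSetup;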

3. GitHub Actions — sharded Playwright run

# .github/workflows/playwright.yml
name: Playwright Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - run: npm ci
      - run: npx playwright install --with-deps chromium

      - name: Run shard ${{ matrix.shard }}/4
        run: npx playwright test --shard=${{ matrix.shard }}/4
        env:
          BASE_URL: ${{ vars.BASE_URL }}
          ADMIN_EMAIL: ${{ secrets.ADMIN_EMAIL }}
          ADMIN_PASSWORD: ${{ secrets.ADMIN_PASSWORD }}

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report-shard-${{ matrix.shard }}
          path: playwright-report/

4. Azure DevOps — parallel Playwright run (YAML pipeline)

# azure-pipelines-playwright.yml
trigger:
  branches:
    include: [main, develop]

pool:
  vmImage: ubuntu-latest

strategy:
  parallel: 4

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '20.x'

  - script: npm ci && npx playwright install --with-deps chromium
    displayName: Install dependencies

  - script: |
      # Azure Pipelines expands the System.* macros before bash runs,
      # so no intermediate shell variable is needed for the shard index
      npx playwright test --shard=$(System.JobPositionInPhase)/$(System.TotalJobsInPhase)
    displayName: Run Playwright shard $(System.JobPositionInPhase)/$(System.TotalJobsInPhase)
    env:
      BASE_URL: $(BASE_URL)
      ADMIN_EMAIL: $(ADMIN_EMAIL)
      ADMIN_PASSWORD: $(ADMIN_PASSWORD)

  - task: PublishPipelineArtifact@1
    condition: always()
    inputs:
      targetPath: playwright-report
      artifact: playwright-report-$(System.JobPositionInPhase)

CI/CD integration

The pattern slots into a pipeline at two points: a per-PR fast-feedback run and a full nightly gate.

For the per-PR run I scope to the critical user journeys — login, checkout, the 3–5 flows that gate release confidence — and shard across 4 runners. This keeps per-PR feedback under 15 minutes even for a 300-test suite. For the nightly gate the full suite runs sharded across 8 runners, with HTML report artefacts and a Slack notification on failure.

The shard count is a tunable dial: 4 shards for daily rhythm, 8 for pre-release. Both GitHub Actions and Azure DevOps support matrix/parallel job strategies natively. The GitHub Actions pattern above uses a matrix strategy; the Azure DevOps pattern uses strategy: parallel with System.JobPositionInPhase to derive the shard index. Patterns #3 (Azure DevOps) and #6 (GitHub Actions) go deeper on CI/CD pipeline architecture; this pattern keeps the test layer in focus.
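The shard-count dial can be reasoned about with simple arithmetic before paying for runners. A rough sketch, assuming tests split evenly across shards and workers (real sharding only approximates this) and a fixed per-shard setup cost for checkout, `npm ci`, and browser install; the names are hypothetical and every input is a placeholder to measure on your own runners.

```typescript
interface ShardPlanInput {
  totalSerialMinutes: number;     // sum of individual test durations
  workersPerShard: number;        // Playwright `workers` on one runner
  setupMinutesPerShard: number;   // fixed cost every shard pays
  targetWallClockMinutes: number;
  maxShards: number;
}

// Idealised wall-clock: fixed setup plus perfectly balanced test time.
function estimateWallClock(input: ShardPlanInput, shards: number): number {
  return input.setupMinutesPerShard + input.totalSerialMinutes / (shards * input.workersPerShard);
}

// Smallest shard count that meets the target, or undefined if none within maxShards does.
function minShardsForTarget(input: ShardPlanInput): number | undefined {
  for (let shards = 1; shards <= input.maxShards; shards++) {
    if (estimateWallClock(input, shards) <= input.targetWallClockMinutes) return shards;
  }
  return undefined; // unreachable target: cut per-PR scope or add workers instead
}
```

Under these assumptions, 300 minutes of serial test time with 4 workers per shard and 3 minutes of setup needs 7 shards to land under a 15-minute target, while 4 shards gives about 21.75 minutes; the fixed setup cost is why adding shards shows diminishing returns.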

The dual-suite period — Selenium gating, Playwright in shadow — is CI-shaped as two separate job groups in the same pipeline. Selenium failures block merge; Playwright shadow failures create a dashboard annotation but do not block. The shadow job group is promoted to gate status once the cutover criteria are met, and the Selenium job group is removed.

Stack

Tool	Role	Version guidance
Playwright	E2E test runner	v1.44+ (use latest stable)
TypeScript	Test language	5.x
Node.js	Runtime	20 LTS
GitHub Actions	CI (cloud)	ubuntu-latest runners
Azure DevOps	CI (enterprise)	ubuntu-latest agent
Allure / PW HTML	Reporting	PW built-in HTML report or Allure 2.x

When I'd brief this

The brief only makes sense when the Selenium maintenance cost now exceeds the migration cost. If the engagement is framed as a tool swap rather than a framework redesign, the architectural lens is too narrow to capture the value.

This pattern fits when: the existing Selenium suite is the primary regression gate for a release cadence of daily or faster; the Grid or cloud-grid spend has become a visible line item; flake rate is above 10% and eroding CI gate trust; the team is engineering-led enough to absorb a TypeScript ramp with structured support. Enterprise SI delivery programmes, large-scale digital transformation engagements, and QA-led capability uplift engagements are the typical context — organisations where QA has been a function long enough to accumulate technical debt, not greenfield builds.

The pattern is also the right call when an organisation is evaluating Playwright for a new programme and has a parallel legacy Selenium suite they need to carry temporarily — the parallel-suite architecture applies in both directions.