D365 AI-Copilot Test Framework for an Enterprise CRM Modernisation Client

Engagement context

I was engaged as QE Architect and Consultant on a Microsoft Dynamics 365 Sales Hub programme for an enterprise CRM modernisation client. The engagement began as a proof-of-concept (POC) to determine whether a modern Playwright-based test framework could address D365's specific testing challenges, on a stack that many senior automation engineers have not worked with directly. D365's model-driven UI is significantly harder to test than a conventional web application: the interface is rendered dynamically through nested iframes, ribbon controls change with security role context, Business Process Flow (BPF) visualisations introduce timing-sensitive state transitions, and the underlying Dataverse data layer adds an API surface that conventional UI test frameworks do not cover. The POC demonstrated production-quality viability across all three test surfaces (UI, API, and hybrid) and converted into an ongoing engagement, with the D365 framework adopted as the client's production test automation standard. The engagement sits within the broader enterprise QA practice described in Enterprise QA Leadership (6-Year Multi-Programme Tenure), which established the reusable-spine, per-engagement-variance framework architecture that this engagement extends into the Microsoft stack.

The four layers address distinct surfaces but compose into single test flows — API-created data, UI-validated outcomes, and hybrid round-trips where both surfaces participate in the same scenario.

The win

"I designed and shipped a D365-specialised Playwright + TypeScript test framework end-to-end — D365-aware base pages, nested-iframe handling, Business Process Flow helpers, a Dataverse Web API service layer with OAuth and typed entity helpers, hybrid API→UI flow composition, a tag-driven parallel execution harness, and AI copilots grounded in the framework's own conventions for test generation and Tier 4 self-healing locator re-binding. The POC proved the viability of treating D365 as a properly-layered testable system rather than an opaque UI surface — and the framework patterns are reusable across subsequent D365 engagements."

Building the D365-aware framework (Layer 1–4 architecture)

Standard Playwright frameworks do not handle D365's quirks. The iframe nesting alone — content embedded two to three levels deep — breaks naïve locator strategies. Ribbon controls render differently depending on security role. BPF stage transitions are timing-sensitive, and the test data story is complicated by Dataverse's entity model: contacts, accounts, and opportunities are related objects with business rules enforced at the data layer, not just the UI. UI-based test data setup is slow and brittle; API-based setup requires OAuth, schema awareness, and typed helpers.
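To make the iframe problem concrete, here is a minimal sketch of frame-aware locator resolution. It is not the framework's code. It assumes a small structural interface, defined locally and modelled loosely on Playwright's `frameLocator()`/`locator()` chaining, so the traversal logic runs without a browser. `resolveInFrames` and `fakeScope` are hypothetical names, and the selectors are placeholders rather than D365's actual frame IDs.

```typescript
// Minimal local stand-ins for the part of Playwright's frame API used here.
// They are defined, not imported, so the sketch runs without a browser.
interface LocatorLike {
  readonly description: string;
}

interface FrameScope {
  frameLocator(selector: string): FrameScope;
  locator(selector: string): LocatorLike;
}

// Walk an ordered chain of iframe selectors, then resolve the target inside
// the innermost frame. Each hop scopes the next lookup to the child document.
// A single flat locator on the page cannot cross frame boundaries this way.
function resolveInFrames(
  root: FrameScope,
  frameChain: readonly string[],
  target: string,
): LocatorLike {
  let scope = root;
  for (const frameSelector of frameChain) {
    scope = scope.frameLocator(frameSelector);
  }
  return scope.locator(target);
}

// Fake scope that records the traversal path, standing in for a real page.
function fakeScope(path: string[] = []): FrameScope {
  return {
    frameLocator: (selector: string) => fakeScope([...path, selector]),
    locator: (selector: string) => ({ description: [...path, selector].join(" >> ") }),
  };
}
```

Against a real page, the equivalent chain is `page.frameLocator(outer).frameLocator(inner).locator(target)`. Keeping the chain in one helper means that when a release wave changes the frame structure, the fix is made in one place.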

Each design decision below rejects a simpler alternative for a reason grounded in D365's specific constraints: the rejected options are faster to build but harder to maintain on this stack.

I designed a four-layer architecture to address all of these surfaces in a single coherent framework.

D365 Test Framework (Playwright + TypeScript)

Layer 1: D365-Aware Base Pages

EntityPage — form, grid, view abstractions
Frame-aware locator resolution (nested iframe traversal)
BPF helpers (advance stage, assert stage, query active flow)
Ribbon + command bar interactions
Timeline + quick-create dialog handlers

Layer 2: Dataverse Web API Service Layer

OAuth client credentials flow (app registration in Entra)
Typed helpers: contacts, accounts, opportunities, custom entities
Schema validation against Dataverse metadata
Batch operations for multi-record setup

Layer 3: Fixtures + Data Factories

Playwright fixtures for D365 session (login, role-switching)
Data factories: API-create record → return typed reference
Teardown via API batch delete

Layer 4: Hybrid Flow Patterns

"API setup → UI validation" composition pattern
"UI action → API assertion" composition pattern
Cross-surface contact search, nested iframe handling
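BPF stage transitions are the main timing hazard in Layer 1. The following is a minimal sketch of an "assert stage" helper that tolerates the transition delay. It assumes the active stage can be read through an injected `readActiveStage` function. That reader is hypothetical: it could read the BPF control in the UI or query the record through the Web API. The timeout and interval defaults are illustrative, not the framework's values.

```typescript
// Reads the name of the currently active BPF stage, or null if none is visible yet.
type StageReader = () => Promise<string | null>;

interface WaitOptions {
  timeoutMs?: number;
  intervalMs?: number;
  // Injectable so tests can skip real delays.
  sleep?: (ms: number) => Promise<void>;
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the expected stage is active and returns the number of reads taken.
// On timeout it throws with the last observed stage, so triage can tell
// "slow transition" apart from "wrong stage".
async function waitForBpfStage(
  readActiveStage: StageReader,
  expectedStage: string,
  options: WaitOptions = {},
): Promise<number> {
  const timeoutMs = options.timeoutMs ?? 15000;
  const intervalMs = options.intervalMs ?? 500;
  const sleep = options.sleep ?? realSleep;
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    const seen = await readActiveStage();
    if (seen === expectedStage) return attempts;
    if (Date.now() >= deadline) {
      throw new Error(
        `BPF stage "${expectedStage}" not active after ${timeoutMs} ms (last seen: ${seen ?? "none"})`,
      );
    }
    await sleep(intervalMs);
  }
}
```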

The Dataverse Web API service layer was particularly consequential. By moving test data setup off the UI and onto the API — using OAuth-authenticated client credentials against the Dataverse endpoint — test setup became materially faster and substantially more reliable. Typed helpers for contacts, accounts, and opportunities enforced schema alignment at compile time, eliminating a category of data-mismatch failures that had previously surfaced only at runtime. Hybrid flow patterns — API-create followed by UI-validate, or UI-action followed by API-assert — gave test authors a clean composition model for the scenarios where both surfaces needed to participate.
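The two Layer 2 building blocks can be sketched as request builders. The first is the OAuth 2.0 client credentials token request to the Microsoft identity platform. The second is a typed contact create against the Dataverse Web API. The entity and field names (`contacts`, `firstname`, `lastname`, `emailaddress1`, `parentcustomerid_account`) are standard Dataverse ones. The helper names, the `HttpRequestSpec` shape and the `v9.2` API version are assumptions for illustration. This is not the framework's actual service layer.

```typescript
// A trimmed-down description of an HTTP call, so the sketch stays transport-agnostic.
interface HttpRequestSpec {
  url: string;
  method: "POST" | "GET" | "PATCH" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

const trimSlash = (url: string): string => url.replace(/\/+$/, "");

// Client credentials token request. The ".default" scope asks for the
// permissions already granted to the app registration for this environment.
function buildTokenRequest(
  tenantId: string,
  clientId: string,
  clientSecret: string,
  orgUrl: string,
): HttpRequestSpec {
  const form: Record<string, string> = {
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
    scope: `${trimSlash(orgUrl)}/.default`,
  };
  const body = Object.entries(form)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return {
    url: `https://login.microsoftonline.com/${encodeURIComponent(tenantId)}/oauth2/v2.0/token`,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// Typed input: a misspelt field name fails at compile time, not at runtime.
interface NewContact {
  firstname: string;
  lastname: string;
  emailaddress1?: string;
  parentAccountId?: string; // GUID of an existing account row
}

function buildCreateContactRequest(
  orgUrl: string,
  accessToken: string,
  contact: NewContact,
): HttpRequestSpec {
  const payload: Record<string, string> = {
    firstname: contact.firstname,
    lastname: contact.lastname,
  };
  if (contact.emailaddress1 !== undefined) {
    payload["emailaddress1"] = contact.emailaddress1;
  }
  if (contact.parentAccountId !== undefined) {
    // Associate the new contact with an account via a navigation-property bind.
    payload["parentcustomerid_account@odata.bind"] = `/accounts(${contact.parentAccountId})`;
  }
  return {
    url: `${trimSlash(orgUrl)}/api/data/v9.2/contacts`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
      Accept: "application/json",
      "OData-MaxVersion": "4.0",
      "OData-Version": "4.0",
      // Return the created row (including contactid) in the response body.
      Prefer: "return=representation",
    },
    body: JSON.stringify(payload),
  };
}
```

With `Prefer: return=representation`, Dataverse returns the created row in the response body. A data factory can therefore hand back a typed reference built from `contactid` without issuing a second read.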

The architectural choice to build D365-specific page objects rather than adopting EasyRepro (Microsoft's legacy Selenium-based library) was deliberate. EasyRepro is increasingly unmaintained; Playwright is faster, more capable, and naturally fits a TypeScript-native stack. The initial investment in D365-aware page objects amortises quickly when the same patterns are reused across subsequent customer engagements — each new engagement inherits the iframe handling, BPF helpers, and Dataverse service layer rather than rebuilding from scratch.

Tag-driven harness and high-value workflow coverage

The framework architecture addressed the surface. The execution discipline addressed how tests are organised, selected, and parallelised at scale.

I established a tag-driven harness with a structured taxonomy. Domain tags (@sales-hub, @dataverse) identify the product surface. Type tags (@smoke, @regression, @e2e) define execution policy — smoke runs fast against every deployment, regression runs nightly, e2e runs for full business-cycle scenarios. Priority tags (@p0, @p1, @p2) encode risk weighting so CI gates can run the highest-priority subset without executing the full suite on every commit. Feature tags (@contact-search, @opportunity-management, @bpf) allow targeted execution when a specific area is under change.

Tags are the execution contract between the framework and CI. @p0 @smoke runs fast against every deployment; the nightly @regression envelope picks up the rest. Feature tags mean a targeted area change does not require the full suite.
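The tag contract can be sketched in two parts: a check that every test carries one domain, one type and one priority tag, and a builder for CI selection patterns. Playwright's `grep` config option accepts a regular expression. Its documentation shows positive lookaheads as the way to require several tags at once. The function names here are hypothetical.

```typescript
// Required tag groups from the taxonomy; feature tags stay optional and open-ended.
const REQUIRED_TAG_GROUPS: Record<"domain" | "type" | "priority", readonly string[]> = {
  domain: ["@sales-hub", "@dataverse"],
  type: ["@smoke", "@regression", "@e2e"],
  priority: ["@p0", "@p1", "@p2"],
};

// Returns the names of required groups that a test's tags do not cover.
function missingTagGroups(tags: readonly string[]): string[] {
  return Object.entries(REQUIRED_TAG_GROUPS)
    .filter(([, allowed]) => !allowed.some((tag) => tags.includes(tag)))
    .map(([group]) => group);
}

const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// AND-combination of tags as lookaheads, so "@p0 + @smoke" matches in any order.
// The negative lookahead stops "@p1" from also matching "@p10" or "@p1-legacy".
function grepAllOf(...required: string[]): RegExp {
  return new RegExp(required.map((tag) => `(?=.*${escapeRegExp(tag)}(?![\\w-]))`).join(""));
}
```

A deployment-gate project in `playwright.config.ts` could then set `grep: grepAllOf("@p0", "@smoke")`. On the command line the equivalent is `--grep "(?=.*@p0)(?=.*@smoke)"`.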

All tests follow Arrange-Act-Assert structure enforced via Playwright fixture patterns. A lint rule catches tests that mix arrangement and assertion concerns in the same block — small overhead at authoring, significant benefit at maintenance.
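The production lint rule is not reproduced here. As an illustration only, the naive line-based check below captures its intent: flag assertions outside the Assert phase, and flag missing phases. It assumes tests mark their phases with `// Arrange`, `// Act` and `// Assert` comments, which is a convention invented for this sketch. A real rule would more likely be an ESLint rule working on the AST.

```typescript
type Phase = "arrange" | "act" | "assert";

interface AaaViolation {
  line: number; // 1-based; 0 means the whole test
  message: string;
}

const PHASE_MARKER = /^\/\/\s*(arrange|act|assert)\b/i;
const ASSERTION = /\bexpect\s*\(/;

function checkArrangeActAssert(testBody: string): AaaViolation[] {
  const violations: AaaViolation[] = [];
  const seen: Phase[] = [];
  let current: Phase | null = null;
  testBody.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    const marker = PHASE_MARKER.exec(line);
    if (marker) {
      current = marker[1].toLowerCase() as Phase;
      seen.push(current);
      return;
    }
    // An expect() call is only allowed once the Assert marker has been reached.
    if (ASSERTION.test(line) && current !== "assert") {
      violations.push({
        line: index + 1,
        message: `assertion outside the Assert phase (found in ${current ?? "no phase"})`,
      });
    }
  });
  for (const phase of ["arrange", "act", "assert"] as const) {
    if (!seen.includes(phase)) {
      violations.push({ line: 0, message: `missing // ${phase} phase marker` });
    }
  }
  return violations;
}
```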

Parallel execution required data isolation. Because D365 environments are expensive to provision and shared across test runs, test data needed to be isolated per test rather than per environment. The solution was UUID-suffixed record creation via the Dataverse Web API — each test creates its own contact or account with a unique identifier, runs against it, and the teardown batch-deletes by name pattern. Environments are shared safely; test-to-test contention is structurally prevented.
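Here is a minimal sketch of the isolation scheme. It assumes the UUID suffix goes into the contact's `lastname` and that teardown matches on that prefix; both are assumptions, and the helper names are hypothetical. `startswith` is a standard Dataverse Web API filter function. The delete body follows the OData `$batch` multipart format, with one DELETE request per part.

```typescript
const trimSlash = (url: string): string => url.replace(/\/+$/, "");

// Per-test unique name; in a real run the uuid would come from crypto.randomUUID().
function uniqueRecordName(prefix: string, uuid: string): string {
  return `${prefix}-${uuid}`;
}

// OData string literal: single quotes are escaped by doubling them.
function odataString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// GET query that finds every contact this test created, selecting only the ids.
function teardownQuery(orgUrl: string, namePrefix: string): string {
  const filter = `startswith(lastname,${odataString(namePrefix)})`;
  return `${trimSlash(orgUrl)}/api/data/v9.2/contacts?$select=contactid&$filter=${encodeURIComponent(filter)}`;
}

// Body for POST {org}/api/data/v9.2/$batch sent with
// Content-Type: multipart/mixed; boundary=<boundary>. One DELETE part per id.
function buildBatchDelete(orgUrl: string, contactIds: readonly string[], boundary: string): string {
  const base = trimSlash(orgUrl);
  const parts = contactIds.map((id) =>
    [
      `--${boundary}`,
      "Content-Type: application/http",
      "Content-Transfer-Encoding: binary",
      "",
      `DELETE ${base}/api/data/v9.2/contacts(${id}) HTTP/1.1`,
      "",
    ].join("\r\n"),
  );
  return [...parts, `--${boundary}--`, ""].join("\r\n");
}
```

Scoping the query to one test's own prefix is what makes parallel teardown safe. One test's cleanup can never match another test's records, even in a shared environment.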

Hybrid composition is the architecture's payoff: API setup is fast and schema-safe while UI validation confirms the user-facing outcome. The two patterns together cover the full business scenario without either surface carrying the full testing burden.
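The "API setup, then UI validation" pattern can be sketched as a small helper. It assumes each surface is wrapped in an async function; the names are hypothetical. The property that matters is that API teardown runs even when UI verification fails, which keeps shared environments clean. In Playwright itself, the same lifecycle would typically live in a fixture that creates the record, calls `use(record)`, and deletes it afterwards.

```typescript
interface HybridScenario<TRef> {
  arrange: () => Promise<TRef>; // API: create the record(s), return a typed reference
  verify: (ref: TRef) => Promise<void>; // UI: act and assert on the user-facing outcome
  teardown: (ref: TRef) => Promise<void>; // API: delete what arrange created
}

// Teardown always runs. Any UI failure still propagates to the test runner.
async function runApiSetupUiVerify<TRef>(scenario: HybridScenario<TRef>): Promise<void> {
  const ref = await scenario.arrange();
  try {
    await scenario.verify(ref);
  } finally {
    await scenario.teardown(ref);
  }
}
```

The longer round-trips (API-create, UI-verify, API-update, UI-verify) fit the same shape. The extra API and UI steps sit inside `verify`, and a single teardown still owns cleanup.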

The high-value workflow set covered the Sales Hub scenarios with the highest business risk: cross-surface contact search across Sales and Service Hub, nested-iframe BPF flows embedded three levels deep, end-to-end Dataverse-plus-UI validations (API-create, UI-verify, API-update, UI-verify), and the full opportunity lifecycle from creation through stages to close. The tag taxonomy meant CI gates could run the @p0 @smoke subset against every deployment — fast and cheap — while the nightly regression run executed the full @regression envelope.

AI copilots grounded in the D365 framework

Generic AI test generation produces D365-incompatible output. A model generating Playwright code without knowledge of the framework's conventions will not use EntityPage helpers, will not traverse iframes correctly, will not compose hybrid API→UI flows, and will produce code that fails immediately on a D365 surface. The value of AI-assisted generation depends entirely on the model knowing what it is generating into.

I integrated AI copilots into the framework at two points: test generation and failure triage.

Both copilots are grounded in the D365 framework's own conventions — generic AI output fails on D365 surfaces without this context. The generation copilot expands coverage; the triage copilot absorbs release-wave locator drift that page-object discipline alone cannot prevent.

For test generation, the copilot prompt context included the full framework API reference — Layer 1 through Layer 4 — so generated test code uses the D365-aware base pages, Dataverse service layer helpers, fixture patterns, and tag taxonomy correctly. User stories describing Sales Hub workflows are the input; structured Playwright test code that matches house conventions is the output. Authors review and curate rather than author from scratch. Test authoring throughput expanded significantly, and edge-case coverage — particularly for BPF state transition paths and security-role-specific form behaviour — improved meaningfully because the copilot applies the full framework surface to each scenario systematically.
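The grounding step can be sketched as prompt assembly. It assumes the framework reference is available as per-layer summaries. The `FrameworkContext` shape, the section headings and the wording are hypothetical; this is not the production prompt.

```typescript
interface FrameworkContext {
  // Layer name -> summary of the helpers authors are expected to use.
  layers: Record<string, string>;
  // Tag group -> allowed tags.
  tags: Record<string, readonly string[]>;
  // House conventions the generated code must follow.
  rules: readonly string[];
}

// Builds one prompt that carries the framework reference, tag taxonomy,
// house rules and the user story, so output targets the house API.
function buildGenerationPrompt(ctx: FrameworkContext, userStory: string): string {
  const layerSection = Object.entries(ctx.layers)
    .map(([layer, summary]) => `## ${layer}\n${summary.trim()}`)
    .join("\n\n");
  const tagSection = Object.entries(ctx.tags)
    .map(([group, allowed]) => `- ${group}: ${allowed.join(", ")}`)
    .join("\n");
  const ruleSection = ctx.rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return [
    "You write Playwright + TypeScript tests for a Dynamics 365 Sales Hub test framework.",
    "Use only the framework helpers listed below; never locate elements inside D365 iframes directly.",
    `# Framework reference\n\n${layerSection}`,
    `# Tags (one per group)\n${tagSection}`,
    `# House rules\n${ruleSection}`,
    `# User story\n${userStory.trim()}`,
    "# Output\nOne test file in Arrange-Act-Assert form.",
  ].join("\n\n");
}
```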

For failure triage, I extended the framework's Tier 4 self-healing pattern with D365-specific awareness. The generic reference architecture for this pattern (the tiered locator-recovery model, LLM guardrails, and confidence-scoring logic) is documented in the Agentic / Self-Healing Test Framework pattern; what follows describes the D365-specific extensions layered on top of it. When a locator fails, standard self-healing inspects the current DOM and proposes a corrected locator. D365-aware self-healing goes further: the LLM also has access to the Dataverse entity metadata for the form under test (field schema name, display name, and control type), so it proposes a corrected locator that is both visually accurate and schema-aligned. This addresses a D365-specific failure mode in which the visible label and the underlying schema name diverge, a common source of re-binding errors in other frameworks. D365's twice-yearly release waves introduce UI changes that cause locators to drift; the Tier 4 extension absorbed the residual drift that page-object discipline alone could not prevent.
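One way to enforce the schema-aligned requirement is to validate each LLM proposal against the entity metadata before applying it. The sketch assumes the model returns a proposed field name and a confidence score; the 0.8 threshold and all names are hypothetical. The `data-id` prefix used to build the selector is an assumption about the Unified Interface DOM and should be verified against the target org. In Dataverse terms, the lowercase name used here is the attribute's logical name.

```typescript
interface FieldMetadata {
  schemaName: string; // lowercase attribute name as used on the form, e.g. "emailaddress1"
  displayName: string; // label on the form, e.g. "Email"
  controlType: string; // e.g. "text", "lookup", "optionset"
}

interface LlmProposal {
  schemaName: string;
  confidence: number; // 0..1, as reported by the model
}

type HealDecision =
  | { accepted: true; selector: string; reason: string }
  | { accepted: false; reason: string };

const normalise = (s: string): string => s.trim().toLowerCase();

// The metadata, not the model, decides. The proposed field must exist, its
// display name must match the label the test was looking for, and the
// model's confidence must clear the threshold.
function checkProposal(
  proposal: LlmProposal,
  expectedLabel: string,
  fields: readonly FieldMetadata[],
  minConfidence = 0.8,
): HealDecision {
  const field = fields.find((f) => f.schemaName === proposal.schemaName);
  if (!field) {
    return { accepted: false, reason: `no field with schema name "${proposal.schemaName}"` };
  }
  if (normalise(field.displayName) !== normalise(expectedLabel)) {
    return {
      accepted: false,
      reason: `"${field.schemaName}" is labelled "${field.displayName}", not "${expectedLabel}"`,
    };
  }
  if (proposal.confidence < minConfidence) {
    return { accepted: false, reason: `confidence ${proposal.confidence} below ${minConfidence}` };
  }
  return {
    accepted: true,
    // Assumed UCI convention: field controls carry data-id="<name>.<suffix>".
    selector: `[data-id^="${field.schemaName}."]`,
    reason: `label "${expectedLabel}" bound to ${field.schemaName} (${field.controlType})`,
  };
}
```

Here a visually plausible guess such as `email` is rejected because no such field exists. The proposal is accepted only once it resolves to `emailaddress1`, the field whose display name is "Email".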

The triage rubric also extended the generic web-app failure taxonomy with five D365-specific categories: iframe latency (the test acted before the frame finished loading), Dataverse rate limiting (HTTP 429 responses to API setup batch operations), a BPF stage transition not yet visible in the UI, security role mismatch (the test ran under the wrong user role context), and confusion between a form field's display name and its schema name. With these categories in the rubric, D365 failure triage was significantly faster than generic web-app triage, because the model routed correctly on the first pass instead of escalating for human disambiguation.

These five categories do not exist in a generic web-app triage rubric. With them in place, the model routes correctly on the first pass — human escalation is reserved for genuine ambiguity, not D365-specific patterns a generic model would misclassify.
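A deterministic pre-router can label the unambiguous cases before the LLM rubric is consulted. The sketch below encodes the five categories with illustrative heuristics. The signal fields, the message patterns and the rule order are assumptions, not the production rubric.

```typescript
type D365FailureCategory =
  | "iframe-latency"
  | "dataverse-rate-limit"
  | "bpf-transition-pending"
  | "security-role-mismatch"
  | "display-vs-schema-name"
  | "unclassified";

interface FailureSignal {
  message: string;
  httpStatus?: number; // status of the failing API setup call, if any
  expectedRole?: string; // role the test declared
  actualRole?: string; // role the session actually ran under
}

// Ordered rules: the first match wins, so structured signals come before message matching.
const RULES: ReadonlyArray<readonly [D365FailureCategory, (s: FailureSignal) => boolean]> = [
  ["dataverse-rate-limit", (s) => s.httpStatus === 429],
  [
    "security-role-mismatch",
    (s) => s.expectedRole !== undefined && s.actualRole !== undefined && s.expectedRole !== s.actualRole,
  ],
  ["iframe-latency", (s) => /frame was detached|iframe.*not (yet )?loaded/i.test(s.message)],
  ["bpf-transition-pending", (s) => /\bstage\b/i.test(s.message) && /timeout|not (yet )?visible/i.test(s.message)],
  ["display-vs-schema-name", (s) => /schema name|logical name|data-id/i.test(s.message)],
];

// Anything no rule claims falls through to the LLM rubric or a human.
function classifyFailure(signal: FailureSignal): D365FailureCategory {
  const match = RULES.find(([, applies]) => applies(signal));
  return match ? match[0] : "unclassified";
}
```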

Tier 4 self-healing — D365-aware locator re-binding call sequence


The framework, not the LLM, is the source of truth. Dataverse entity metadata grounds the proposal in schema reality — not just visual DOM shape. Every heal is logged and reversible.
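"Logged and reversible" can be made concrete with a small heal ledger. This is a sketch: it assumes locators are looked up by a stable key and that healed selectors override a baseline map taken from the page objects. The `HealLedger` class and its methods are hypothetical.

```typescript
interface HealRecord {
  id: number;
  testId: string;
  locatorKey: string;
  previous: string;
  healed: string;
  confidence: number;
  at: string; // ISO timestamp
  reverted: boolean;
}

class HealLedger {
  private readonly records: HealRecord[] = [];
  private readonly overrides = new Map<string, string>();
  private readonly baseline: Readonly<Record<string, string>>;

  constructor(baseline: Readonly<Record<string, string>>) {
    this.baseline = baseline;
  }

  // Selector currently in force for a key: a heal override, else the page-object baseline.
  resolve(locatorKey: string): string {
    const selector = this.overrides.get(locatorKey) ?? this.baseline[locatorKey];
    if (selector === undefined) throw new Error(`unknown locator key "${locatorKey}"`);
    return selector;
  }

  // Records the heal and makes it the active selector for the key.
  apply(testId: string, locatorKey: string, healed: string, confidence: number, at = new Date().toISOString()): HealRecord {
    const record: HealRecord = {
      id: this.records.length + 1,
      testId,
      locatorKey,
      previous: this.resolve(locatorKey),
      healed,
      confidence,
      at,
      reverted: false,
    };
    this.records.push(record);
    this.overrides.set(locatorKey, healed);
    return record;
  }

  // Restores the selector in force before the heal (intended for the latest heal on a key).
  revert(id: number): void {
    const record = this.records.find((r) => r.id === id);
    if (!record || record.reverted) throw new Error(`no active heal with id ${id}`);
    this.overrides.set(record.locatorKey, record.previous);
    record.reverted = true;
  }

  history(): readonly HealRecord[] {
    return [...this.records];
  }
}
```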

Engagement summary

Field	Detail
Duration	Sep 2025 – current
Role	QE Architect and Consultant
Reporting line	Client QA Lead / Programme Delivery Manager
Team	Mixed-skill delivery team — functional consultants and QE Architects

Reference

A reference from the QA Lead / Programme Delivery Manager is available on request at the screening stage.