AI-Driven Quality Engineering Architect · Available for new engagements · Australia

SNK Digital
Enterprise / CRM · Sep 2025 – current

D365 AI-Copilot Test Framework for an Enterprise CRM Modernisation Client

D365-aware Playwright framework + Dataverse Web API service layer + AI copilots for test generation and self-healing

Playwright · TypeScript · D365 · Dataverse API · OAuth · Power Platform · AI copilots

Engagement context

I was engaged as QE Architect and Consultant on a Microsoft Dynamics 365 Sales Hub programme for an enterprise CRM modernisation client. The engagement began as a proof-of-concept (POC) to determine whether a modern Playwright-based test framework could address D365's unique testing challenges, on a stack most senior automation engineers have not worked against directly. D365's model-driven UI is significantly harder to test than a conventional web application: the interface is rendered dynamically through nested iframes, ribbon controls shift with security role context, Business Process Flow (BPF) visualisations introduce timing-sensitive state transitions, and the underlying Dataverse data layer adds an API surface that conventional test frameworks don't address. The POC demonstrated production-quality viability across all three test surfaces (UI, API, and hybrid) and converted to an ongoing engagement, with the D365 framework adopted as the client's production test automation standard. This D365 work sits within the broader enterprise QA practice covered in Enterprise QA Leadership — 6-Year Multi-Programme Tenure, which established the reusable-spine, per-engagement-variance framework architecture that this engagement extends into the Microsoft stack.

Four-layer D365 test framework architecture arranged left to right from foundation to composition. Layer 1 D365-Aware Base Pages covers EntityPage form and grid abstractions, frame-aware locator resolution, BPF helpers, ribbon interactions, and timeline dialogs. Layer 2 Dataverse Web API Service Layer covers OAuth client credentials via Entra, typed helpers for contacts, accounts, and opportunities, schema validation, and batch operations. Layer 3 Fixtures and Data Factories covers D365 session fixtures, API-create data factories with typed references, UUID-suffixed isolation, and teardown via batch delete. Layer 4 Hybrid Flow Patterns covers API setup to UI validation and UI action to API assertion composition patterns.

The four layers address distinct surfaces but compose into single test flows — API-created data, UI-validated outcomes, and hybrid round-trips where both surfaces participate in the same scenario.

The win

"I designed and shipped a D365-specialised Playwright + TypeScript test framework end-to-end — D365-aware base pages, nested-iframe handling, Business Process Flow helpers, a Dataverse Web API service layer with OAuth and typed entity helpers, hybrid API→UI flow composition, a tag-driven parallel execution harness, and AI copilots grounded in the framework's own conventions for test generation and Tier 4 self-healing locator re-binding. The POC proved the viability of treating D365 as a properly-layered testable system rather than an opaque UI surface — and the framework patterns are reusable across subsequent D365 engagements."

Building the D365-aware framework (Layer 1–4 architecture)

Standard Playwright frameworks do not handle D365's quirks. The iframe nesting alone — content embedded two to three levels deep — breaks naïve locator strategies. Ribbon controls render differently depending on security role. BPF stage transitions are timing-sensitive, and the test data story is complicated by Dataverse's entity model: contacts, accounts, and opportunities are related objects with business rules enforced at the data layer, not just the UI. UI-based test data setup is slow and brittle; API-based setup requires OAuth, schema awareness, and typed helpers.

Four architectural decisions shown as cards with rejected alternatives. Decision 1: Playwright over EasyRepro — EasyRepro is increasingly unmaintained while Playwright is faster and TypeScript-native. Decision 2: D365-specific page objects over generic locator helpers — iframe nesting 2 to 3 levels deep breaks naive strategies and BPF transitions need dedicated helpers. Decision 3: API-first test data setup over UI-driven setup — OAuth-authenticated Dataverse API is fast and typed helpers catch schema mismatches at compile time. Decision 4: UUID-suffixed per-test isolation over per-environment isolation — D365 environments are expensive to provision so shared environments with UUID record names prevent contention.

Each decision rejects the simpler alternative for a reason grounded in D365's specific constraints: the rejected options are faster to build but harder to maintain on this stack.

I designed a four-layer architecture to address all of these surfaces in a single coherent framework.

D365 Test Framework (Playwright + TypeScript)

Layer 1: D365-Aware Base Pages

  • EntityPage — form, grid, view abstractions
  • Frame-aware locator resolution (nested iframe traversal)
  • BPF helpers (advance stage, assert stage, query active flow)
  • Ribbon + command bar interactions
  • Timeline + quick-create dialog handlers

Layer 2: Dataverse Web API Service Layer

  • OAuth client credentials flow (app registration in Entra)
  • Typed helpers: contacts, accounts, opportunities, custom entities
  • Schema validation against Dataverse metadata
  • Batch operations for multi-record setup

Layer 3: Fixtures + Data Factories

  • Playwright fixtures for D365 session (login, role-switching)
  • Data factories: API-create record → return typed reference
  • Teardown via API batch delete

Layer 4: Hybrid Flow Patterns

  • "API setup → UI validation" composition pattern
  • "UI action → API assertion" composition pattern
  • Cross-surface contact search, nested iframe handling
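As an illustration of the Layer 1 frame-aware locator resolution, here is a minimal sketch rather than the framework's actual code. It assumes a hypothetical `FrameScope` interface that mirrors the `frameLocator()` and `locator()` methods Playwright exposes on `Page` and `FrameLocator`, and a hypothetical helper named `resolveInFrames`. The iframe selectors are placeholders, not real D365 frame IDs.

```typescript
// Minimal structural view of the two Playwright methods used here.
// Playwright's Page and FrameLocator both expose frameLocator() and locator().
interface FrameScope<L> {
  frameLocator(selector: string): FrameScope<L>;
  locator(selector: string): L;
}

// Walk an ordered chain of iframe selectors (outermost first), then resolve
// the target selector inside the innermost frame.
function resolveInFrames<L>(
  root: FrameScope<L>,
  frameChain: readonly string[],
  target: string,
): L {
  const innermost = frameChain.reduce(
    (scope, frameSelector) => scope.frameLocator(frameSelector),
    root,
  );
  return innermost.locator(target);
}

// Hypothetical three-level frame chain, for illustration only.
const EXAMPLE_FRAME_CHAIN = [
  "iframe#outer-host",
  "iframe#form-host",
  "iframe#embedded-control",
] as const;
```

Keeping the frame chain as data means a page object can declare where its controls live once, and a change in nesting depth becomes a one-line edit instead of a rewrite of every locator.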

The Dataverse Web API service layer was particularly consequential. By moving test data setup off the UI and onto the API — using OAuth-authenticated client credentials against the Dataverse endpoint — test setup became materially faster and substantially more reliable. Typed helpers for contacts, accounts, and opportunities enforced schema alignment at compile time, eliminating a category of data-mismatch failures that had previously surfaced only at runtime. Hybrid flow patterns — API-create followed by UI-validate, or UI-action followed by API-assert — gave test authors a clean composition model for the scenarios where both surfaces needed to participate.
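A rough sketch of what the OAuth and typed-helper pieces could look like follows. It builds, but does not send, a client-credentials token request for the Microsoft identity platform v2.0 endpoint and a contact-create request for the Dataverse Web API. The names `DataverseAuthConfig`, `buildTokenRequest`, `ContactCreate`, and `buildCreateContact` are hypothetical, the `v9.2` API version is an assumption, and the contact interface covers only a small illustrative subset of columns.

```typescript
// Settings for an Entra app registration; all values are placeholders.
interface DataverseAuthConfig {
  tenantId: string;
  clientId: string;
  clientSecret: string;
  orgUrl: string; // e.g. "https://contoso.crm.dynamics.com"
}

// Build the client-credentials token request. For Dataverse the scope is
// "<orgUrl>/.default".
function buildTokenRequest(cfg: DataverseAuthConfig): { url: string; body: URLSearchParams } {
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: cfg.clientId,
    client_secret: cfg.clientSecret,
    scope: `${cfg.orgUrl}/.default`,
  });
  return {
    url: `https://login.microsoftonline.com/${cfg.tenantId}/oauth2/v2.0/token`,
    body,
  };
}

// Typed subset of contact columns (logical names). A misspelt column name
// is a compile error instead of a runtime rejection from the API.
interface ContactCreate {
  firstname: string;
  lastname: string;
  emailaddress1?: string;
}

// Build (but do not send) the Web API request that creates a contact.
function buildCreateContact(orgUrl: string, token: string, contact: ContactCreate) {
  return {
    url: `${orgUrl}/api/data/v9.2/contacts`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
      "OData-Version": "4.0",
    },
    body: JSON.stringify(contact),
  };
}
```

Separating request construction from sending keeps the helpers unit-testable without a live environment; the actual HTTP call could then go through whatever client the fixtures already use.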

The architectural choice to build D365-specific page objects rather than adopting EasyRepro (Microsoft's legacy Selenium-based library) was deliberate. EasyRepro is increasingly unmaintained; Playwright is faster, more capable, and naturally fits a TypeScript-native stack. The initial investment in D365-aware page objects amortises quickly when the same patterns are reused across subsequent customer engagements — each new engagement inherits the iframe handling, BPF helpers, and Dataverse service layer rather than rebuilding from scratch.

Tag-driven harness and high-value workflow coverage

The framework architecture addressed the surface. The execution discipline addressed how tests are organised, selected, and parallelised at scale.

I established a tag-driven harness with a structured taxonomy. Domain tags (@sales-hub, @dataverse) identify the product surface. Type tags (@smoke, @regression, @e2e) define execution policy — smoke runs fast against every deployment, regression runs nightly, e2e runs for full business-cycle scenarios. Priority tags (@p0, @p1, @p2) encode risk weighting so CI gates can run the highest-priority subset without executing the full suite on every commit. Feature tags (@contact-search, @opportunity-management, @bpf) allow targeted execution when a specific area is under change.

Four tag dimensions in a card grid. Domain tags @sales-hub and @dataverse identify the product surface. Type tags @smoke, @regression, and @e2e define when tests run: smoke on every deployment, regression nightly. Priority tags @p0, @p1, and @p2 encode risk weighting so CI gates run the @p0 subset without the full suite on every commit. Feature tags @contact-search, @opportunity-management, and @bpf allow targeted execution when a specific area is under change.

Tags are the execution contract between the framework and CI. @p0 @smoke runs fast against every deployment; the nightly @regression envelope picks up the rest. Feature tags mean a targeted area change does not require the full suite.
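One way a CI gate might select a tag intersection such as @p0 @smoke is a lookahead-based pattern of the kind Playwright's `--grep` option and `grep` config setting accept. The sketch below is illustrative only; `allTagsPattern` and `escapeRegExp` are hypothetical helper names, and it assumes tags appear in the text that grep matches against.

```typescript
type Tag = `@${string}`;

// Escape regex metacharacters so a tag is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches only titles carrying ALL the given tags.
// The trailing (?![\w-]) stops "@p0" from matching inside "@p01".
function allTagsPattern(tags: readonly Tag[]): RegExp {
  const lookaheads = tags.map((tag) => `(?=.*${escapeRegExp(tag)}(?![\\w-]))`);
  return new RegExp(lookaheads.join(""));
}

// Pattern for the per-deployment gate described above.
const p0Smoke = allTagsPattern(["@p0", "@smoke"]);
```

The resulting `RegExp` could be assigned to `grep` in a Playwright config, or its `source` passed on the command line, so that the gate definition lives in one typed place rather than in hand-written regex strings scattered across pipeline files.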

All tests follow Arrange-Act-Assert structure enforced via Playwright fixture patterns. A lint rule catches tests that mix arrangement and assertion concerns in the same block — small overhead at authoring, significant benefit at maintenance.

Parallel execution required data isolation. Because D365 environments are expensive to provision and shared across test runs, test data needed to be isolated per test rather than per environment. The solution was UUID-suffixed record creation via the Dataverse Web API — each test creates its own contact or account with a unique identifier, runs against it, and the teardown batch-deletes by name pattern. Environments are shared safely; test-to-test contention is structurally prevented.
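The UUID-suffixed naming could be sketched as below. This is an assumption-laden illustration: the `qa-auto` prefix, the helper names `uniqueRecordName` and `teardownFilter`, and the choice of name column are all hypothetical. It relies on Node's built-in `randomUUID` and on the OData `startswith` query function.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical marker identifying records created by the test suite.
const TEST_PREFIX = "qa-auto";

// Produce a record name that is unique per call, for example
// "qa-auto-contact-<uuid>". The shared prefix is what teardown filters on.
function uniqueRecordName(entity: string): string {
  return `${TEST_PREFIX}-${entity}-${randomUUID()}`;
}

// OData $filter expression selecting every suite-created record of one
// entity; nameColumn is the column that holds the generated name.
function teardownFilter(entity: string, nameColumn: string): string {
  return `startswith(${nameColumn},'${TEST_PREFIX}-${entity}-')`;
}
```

In a shared environment with parallel runs, a per-test teardown would more likely filter on that test's own generated name (or delete by the ID returned at creation), with the prefix-wide filter reserved for a sweep of orphaned records.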

Two hybrid flow composition patterns shown side by side. Pattern A API setup then UI validation: inputs are an OAuth Dataverse API call and a typed entity reference; the framework fixture processes them and outputs UI assertions on the created record. Pattern B UI action then API assertion: input is a browser session with a UI interaction; the framework processes it and outputs Dataverse entity state assertions via API.

Hybrid composition is the architecture's payoff: API setup is fast and schema-safe while UI validation confirms the user-facing outcome. The two patterns together cover the full business scenario without either surface carrying the full testing burden.
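Pattern A can be expressed as a small composition function. The sketch below is not the framework's implementation; `RecordRef` and `apiSetupThenUiValidate` are hypothetical names, and the three steps are injected so the example stays independent of any particular API client or page object.

```typescript
// A typed reference returned by an API-backed data factory.
interface RecordRef<E extends string> {
  entity: E;
  id: string;
}

// Pattern A: create data through the API, validate through the UI, and
// always tear down, even when the UI validation throws.
async function apiSetupThenUiValidate<E extends string>(steps: {
  create: () => Promise<RecordRef<E>>;
  validateInUi: (ref: RecordRef<E>) => Promise<void>;
  remove: (ref: RecordRef<E>) => Promise<void>;
}): Promise<void> {
  const ref = await steps.create();
  try {
    await steps.validateInUi(ref);
  } finally {
    await steps.remove(ref);
  }
}
```

Pattern B would invert the first two steps: a UI action followed by an API read and an assertion on the returned entity state. In a Playwright suite the same guarantee is often provided by fixture teardown instead of an explicit `finally`.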

The high-value workflow set covered the Sales Hub scenarios with the highest business risk: cross-surface contact search across Sales and Service Hub, nested-iframe BPF flows embedded three levels deep, end-to-end Dataverse-plus-UI validations (API-create, UI-verify, API-update, UI-verify), and the full opportunity lifecycle from creation through stages to close. The tag taxonomy meant CI gates could run the @p0 @smoke subset against every deployment — fast and cheap — while the nightly regression run executed the full @regression envelope.

AI copilots grounded in the D365 framework

Generic AI test generation produces D365-incompatible output. A model generating Playwright code without knowledge of the framework's conventions will not use EntityPage helpers, will not traverse iframes correctly, will not compose hybrid API→UI flows, and will produce code that fails immediately on a D365 surface. The value of AI-assisted generation depends entirely on the model knowing what it is generating into.

I integrated AI copilots into the framework at two points: test generation and failure triage.

Two AI copilot integration points shown side by side. Integration point 1 test generation: inputs are user stories describing Sales Hub workflows; the copilot is given the full Layer 1 through Layer 4 framework API reference as context; output is Playwright TypeScript code matching house conventions. Integration point 2 Tier 4 self-healing triage: inputs are locator failures combined with Dataverse entity metadata including field schema name, display name, and control type for the form under test; the copilot proposes a corrected locator that is both visually accurate and schema-aligned.

Both copilots are grounded in the D365 framework's own conventions — generic AI output fails on D365 surfaces without this context. The generation copilot expands coverage; the triage copilot absorbs release-wave locator drift that page-object discipline alone cannot prevent.

For test generation, the copilot's prompt context includes the full framework API reference, Layer 1 through Layer 4, so generated test code uses the D365-aware base pages, Dataverse service layer helpers, fixture patterns, and tag taxonomy correctly. User stories describing Sales Hub workflows are the input; structured Playwright test code that matches house conventions is the output. Authors review and curate rather than write from scratch. Test authoring throughput expanded significantly, and edge-case coverage improved meaningfully, particularly for BPF state transition paths and security-role-specific form behaviour, because the copilot applies the full framework surface to each scenario systematically.
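Grounding of this kind usually comes down to how the prompt is assembled. The following is a hypothetical sketch of that assembly step, not the prompt used on the engagement; `GenerationContext`, `buildGenerationPrompt`, and the instruction wording are all illustrative assumptions.

```typescript
interface GenerationContext {
  frameworkApiReference: string; // Layer 1 through Layer 4 reference text
  tagTaxonomy: readonly string[]; // e.g. ["@sales-hub", "@smoke", "@p0"]
  userStory: string;
}

// Assemble a prompt that places the framework reference and the allowed
// tags ahead of the user story, so the model generates into known helpers.
function buildGenerationPrompt(ctx: GenerationContext): string {
  return [
    "You write Playwright + TypeScript tests for a D365 test framework.",
    "Use only the helpers documented in the API reference below.",
    "",
    "FRAMEWORK API REFERENCE",
    ctx.frameworkApiReference.trim(),
    "",
    "ALLOWED TAGS",
    ctx.tagTaxonomy.join(" "),
    "",
    "USER STORY",
    ctx.userStory.trim(),
  ].join("\n");
}
```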

For failure triage, I extended the framework's Tier 4 self-healing pattern with D365-specific awareness — the generic reference architecture for this pattern, covering the tiered locator-recovery model, LLM guardrails, and confidence-scoring logic, is documented in the Agentic / Self-Healing Test Framework pattern; what follows describes the D365-specific extensions layered on top of it. When a locator fails, standard self-healing inspects the current DOM and proposes a corrected locator. D365-aware self-healing goes further: the LLM has access to the Dataverse entity metadata for the form under test — field schema name, display name, control type — and proposes a corrected locator that is both visually accurate and schema-aligned. This addresses a specific D365 failure mode where the visual label and the underlying schema name diverge, a common source of re-binding errors in other frameworks. D365's twice-yearly release waves introduce UI changes that drift locators; the Tier 4 extension absorbed the residual drift that page-object discipline alone could not prevent.

The triage rubric also extended the generic web-app failure taxonomy with D365-specific categories: iframe latency (test acted before the frame completed loading), Dataverse rate limiting (429 from API setup batch operations), BPF stage transition not yet visible in the UI, security role mismatch (test ran under the wrong user role context), and form-field display-name versus schema-name confusion. With these categories in the rubric, D365 failure triage was significantly faster than generic web-app triage, because the model routes correctly on the first pass rather than escalating for human disambiguation.

Five D365-specific failure triage categories shown as a card row. Category 1 iframe latency: test acted before the nested frame completed loading, frames embed 2 to 3 levels deep. Category 2 Dataverse 429: rate limiting from API setup batch operations under parallel runs against a shared environment. Category 3 BPF transition: BPF stage not yet visible in the UI, state transitions are timing-sensitive. Category 4 role mismatch: test executed under the wrong security role context, ribbon controls shift with role. Category 5 display versus schema name: field display label and Dataverse schema name diverge, a common source of re-binding errors.

These five categories do not exist in a generic web-app triage rubric. With them in place, the model routes correctly on the first pass — human escalation is reserved for genuine ambiguity, not D365-specific patterns a generic model would misclassify.
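The five categories lend themselves to a typed encoding. The sketch below is a hypothetical illustration: the category identifiers, the `RUBRIC` map, and the `preClassify` helper are assumptions, and the only deterministic rule shown is the HTTP 429 case named above. Everything else is left to the model.

```typescript
type D365FailureCategory =
  | "iframe-latency"
  | "dataverse-rate-limit"
  | "bpf-transition-pending"
  | "role-mismatch"
  | "display-vs-schema-name";

// Rubric text supplied to the triage model. Record<...> makes the compiler
// reject the map if a category is added without a description.
const RUBRIC: Record<D365FailureCategory, string> = {
  "iframe-latency": "Test acted before the nested frame completed loading.",
  "dataverse-rate-limit": "HTTP 429 from API setup batch operations.",
  "bpf-transition-pending": "BPF stage not yet visible in the UI.",
  "role-mismatch": "Test executed under the wrong security role context.",
  "display-vs-schema-name": "Field display label and schema name diverge.",
};

interface FailureSignal {
  httpStatus?: number;
  message: string;
}

// Cheap deterministic pre-routing; undefined means "leave it to the model".
function preClassify(signal: FailureSignal): D365FailureCategory | undefined {
  if (signal.httpStatus === 429) return "dataverse-rate-limit";
  return undefined;
}
```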

Tier 4 self-healing — D365-aware locator re-binding call sequence

The framework, not the LLM, is the source of truth. Dataverse entity metadata grounds the proposal in schema reality — not just visual DOM shape. Every heal is logged and reversible.
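A guardrail of that shape might look like the following sketch. It is an assumption-based illustration, not the framework's code: `evaluateHeal`, the record types, the 0.8 confidence threshold, and the substring check used for schema alignment are all hypothetical simplifications.

```typescript
interface FieldMetadata {
  schemaName: string; // e.g. "emailaddress1"
  displayName: string; // e.g. "Email"
  controlType: string;
}

interface HealProposal {
  selector: string;
  confidence: number; // 0..1, as reported by the proposing step
}

interface HealRecord {
  field: string;
  previousSelector: string;
  newSelector: string;
  acceptedAt: string;
}

// Accept a proposal only when it is anchored on the schema name from
// Dataverse metadata AND clears the confidence threshold.
function evaluateHeal(
  previousSelector: string,
  proposal: HealProposal,
  field: FieldMetadata,
  minConfidence = 0.8,
): HealRecord | undefined {
  const schemaAligned = proposal.selector.includes(field.schemaName);
  if (!schemaAligned || proposal.confidence < minConfidence) return undefined;
  return {
    field: field.schemaName,
    previousSelector, // retained so the heal can be reverted
    newSelector: proposal.selector,
    acceptedAt: new Date().toISOString(),
  };
}
```

A proposal that only matches the visible label is rejected here even at high confidence, which reflects the display-name versus schema-name failure mode: the label can change in a release wave while the schema name stays stable.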

Engagement summary

Field: Detail
Duration: Sep 2025 – current
Role: QE Architect and Consultant
Reporting line: Client QA Lead / Programme Delivery Manager
Team: Mixed-skill delivery team (functional consultants and QE Architects)

Reference

Reference from QA Lead / Programme Delivery Manager available on request at screen stage.
