Where We Started 

Automated UI testing frameworks are critical for maintaining quality in modern web applications. But in practice, they often become too technical and brittle for the teams who rely on them most. In our case, an existing Selenium-based testing solution had gradually fallen out of use. It was difficult to maintain, hard to update as the UI evolved, and largely inaccessible to non-developers.

At RBA, we view quality as a shared responsibility across engineering, design, and product teams. That meant we needed a UI testing approach that was both powerful and approachable. As we evaluated alternatives, Playwright and Cypress quickly rose to the top. While Cypress remains a popular choice, Playwright stood out for one key reason: its MCP server and support for natural language-driven testing. That capability aligned directly with our goal of making UI testing more accessible without sacrificing rigor.

 

Our First Attempt with AI-Assisted Testing

Before Playwright introduced custom chat modes, we experimented with building our own AI-assisted workflow using long, manually written prompts. The concept was straightforward. We asked the AI to plan the test, generate the code, and then troubleshoot failures as they appeared.

It worked, but only partially. While the tests were functional, accuracy was inconsistent. In some cases, the AI would get stuck in loops or make unexpected decisions that required significant cleanup. The approach proved valuable as a learning exercise, but it was not yet reliable enough for long-term use in a production testing workflow.

 

The Shift to Playwright Custom Chat Modes

Everything changed when Playwright introduced its three custom chat modes: Planner, Generator, and Healer. These modes are purpose-built for test creation and maintenance, and they fundamentally improved how we worked with AI.

Instead of relying on massive prompts and hoping the AI stayed on track, we could now break the workflow into clear, discrete steps. Planner defines what needs to be tested. Generator writes the test code. Healer steps in when a test breaks due to UI changes or regressions.
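
To make that division of labor concrete, the Generator hands back ordinary Playwright test code. The sketch below is illustrative only; the URL, labels, and headings are hypothetical rather than pulled from our actual suite.

```ts
// sign-in.spec.ts — illustrative sketch only; the site, labels, and headings are hypothetical
import { test, expect } from '@playwright/test';

test('user can sign in from the landing page', async ({ page }) => {
  await page.goto('https://example.com');

  // Role- and label-based locators tend to survive markup changes better than raw CSS selectors
  await page.getByRole('link', { name: 'Sign in' }).click();
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('not-a-real-password');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Assert on a user-visible outcome rather than on implementation details
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

When a locator or assertion like this breaks after a UI change, that is exactly the point where Healer picks up.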

This structure dramatically improved accuracy and predictability. It also solved one of our biggest early challenges: runaway loops. With chat modes, the AI stops after each phase, waits for input, and only continues when prompted. That single change made the workflow far more controlled and far easier to trust.

 

Continuous Learning Through Evolution Files

One of the most valuable lessons we learned is that AI-driven UI testing should not be static. We wanted a system that improves over time, not one that simply generates tests on demand.

To support that goal, we introduced what we call an evolution file. Each time the AI encounters a recurring issue, such as a flaky selector, a common assertion failure, or inconsistent naming patterns, we capture that insight and feed it back into the system. Over time, this creates a growing library of lessons learned that the AI can reference when generating future tests.
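
The format of the evolution file is entirely up to the team. Ours is essentially a running plain-text list that gets pulled into the AI's context; the entries below are hypothetical, but they show the shape of what we capture.

```md
# evolution.md — hypothetical entries, for illustration only
- Avoid selectors tied to generated class names; prefer role- or label-based locators.
- The data grid loads asynchronously; wait for the first row to be visible before asserting row counts.
- Name spec files after the feature under test, not after the ticket number.
```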

UI testing is not just about code correctness. It is about adapting to real-world complexity and ongoing change. By giving the AI historical context and concrete examples of what failed before, we reduce repeated mistakes, improve consistency, and shift the workflow from constant oversight to true collaboration.

 

What Playwright Enables Beyond AI

Even without AI integration, Playwright has proven to be a strong foundation for modern UI testing. Features like automatic screenshots and full video recordings of test runs have been especially impactful. These artifacts make failures easier to understand, even for non-technical stakeholders. When a test fails, teams can literally watch what happened instead of deciphering logs.
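
Screenshots, video, and traces are built-in Playwright features that you switch on in the project configuration. A minimal sketch, assuming a standard playwright.config.ts; the retention choices shown here are preferences, not requirements.

```ts
// playwright.config.ts — minimal sketch; the retention settings are illustrative choices
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure', // capture a screenshot when a test fails
    video: 'retain-on-failure',    // keep the full video only for failed runs
    trace: 'on-first-retry',       // record a trace for debugging flaky failures
  },
});
```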

We have already used Playwright to cover most of the Infrastructure Manager section of our site and are now expanding into smoke tests and broader integration testing. The next step is integrating these tests into CI/CD pipelines so they run automatically before UI changes are merged, reinforcing quality earlier in the delivery lifecycle.
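
The same configuration file can also adapt its behavior once the tests run inside a pipeline. A minimal sketch, assuming the pipeline sets the conventional CI environment variable; the specific values are illustrative, not prescriptive.

```ts
// playwright.config.ts — CI-aware settings; the values shown are illustrative
import { defineConfig } from '@playwright/test';

export default defineConfig({
  forbidOnly: !!process.env.CI,            // fail the build if a stray test.only reaches a merge
  retries: process.env.CI ? 2 : 0,         // retry flaky tests in the pipeline, never locally
  workers: process.env.CI ? 1 : undefined, // run serially on shared CI agents
  reporter: process.env.CI ? 'html' : 'list',
});
```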

 

Lessons Learned Along the Way

If there is one consistent takeaway, it is that iteration matters. AI-driven UI testing is not something you configure once and walk away from. The tools are evolving, and so must your approach.

Segmenting work into planning, generating, and healing steps does more than clean up the workflow. It provides the AI with clearer context and improves the quality of its output. Continuous learning applies to the AI just as much as it does to the teams using it. Capturing history, patterns, and lessons learned makes each iteration stronger than the last.

 

Cost and Practicality

From a cost perspective, Playwright is free and the custom chat modes are included. The primary expense is token usage when generating tests through tools like Copilot or similar AI assistants. In terms of maintainability, the AI-generated tests are surprisingly readable. They can be slightly verbose, but they are far easier to understand and maintain than many legacy test suites.

 

What’s Next

We are continuing to expand test coverage, explore API-level testing, and deepen CI/CD integration. Longer term, we plan to share more of what we have learned so other teams can adopt AI-assisted UI testing with confidence.

 

Final Thought

AI is not replacing testers. It is making UI testing faster, smarter, and more accessible across teams. The key is to keep iterating, keep providing context, and remain open to new ways of working. When used thoughtfully, AI becomes a powerful partner in delivering high-quality digital experiences.

About the Author

Xander Schroeder

Software Engineer

Xander is a full stack developer at RBA with a passion for finding and applying new technologies across a variety of projects. With a focus on .NET projects and automated testing, Xander is a young developer looking forward to exploring all kinds of software problems. Outside of work, Xander enjoys spending time with his dog and partner running, rock climbing, cycling, and generally enjoying the outdoors.