
Every QA I know has tried it.
You paste the requirements into ChatGPT. You ask for test cases. You get a list. It looks complete. You move on. Then something breaks in production that was obviously testable if anyone had thought about it for more than thirty seconds. That is not AI-assisted QA. That is a faster way to feel like you did the work without actually doing it.
The problem is not the models. Claude, ChatGPT, Gemini: they can all reason about software behavior if you give them the right context and structure. The problem is how AI is being used in QA right now: one prompt, one model, one pass, no review. That is not how QA works. That has never been how QA works.
What the Current AI QA Landscape Actually Looks Like
To be clear, this is not a criticism of the tools that exist. Mabl, Katalon, Virtuoso, Testim, and others in that space are doing genuinely useful work. They handle self-healing tests, natural language test generation, and autonomous script creation. Tools like Playwright have seen massive adoption growth because they solve real automation problems that teams were dealing with daily.
But here is what all of those tools have in common: they are automation tools. They solve the execution layer, the scripting layer, the maintenance layer. That is valuable work. It is just not the whole job.
The most common use of AI in QA right now is generating test cases, optimizing them, and planning the testing process. Those save real time and nobody is arguing otherwise. The gap is not that AI is not helping at all. The gap is that none of it replicates how a QA team actually thinks and works together to catch what matters.
What Real QA Actually Looks Like
Before talking about what AI should do, it helps to be clear about what QA actually is. Not the textbook definition, but how it works on a real team under real sprint pressure.
A Jr QA picks up a ticket. They read the acceptance criteria (AC). They test the feature. They decide pass or fail. Along the way they are also catching what is in scope, what is out of scope, and what edge cases exist that nobody wrote down. If something fails, they raise a bug report with enough detail that a developer can reproduce it without a twenty-minute call.
Then a Sr QA looks at it. Not to redo the work but to validate it. Did they miss anything obvious? Is the bug report actually reproducible? Is what got flagged as out of scope actually out of scope, or did someone just not want to deal with it? They approve or they send it back.
That loop (Jr QA does the work, Sr QA challenges it, a human makes the final call) is where quality actually gets caught. Not in the individual test cases. In the back and forth between people who are not looking at the same thing the same way.
Current AI usage in QA skips that loop entirely. You get one answer from one model and you decide if it is good. There is no second pass, no pushback, nothing to catch what the first pass missed. That is a solo QA with no backup dressed up as AI assistance.
Why One Model Reviewing Its Own Work Does Not Work
This is not a technical limitation. It is a structural one, and it is worth understanding why before trying to fix it.
A model reviewing its own output will miss the same things it missed the first time. It generated the test cases based on its own assumptions, and asking it to review those same test cases means asking it to question assumptions it just made. It does not do that reliably. It smooths, it fills gaps with plausible-sounding output, and it looks thorough because it is internally consistent, not because it is actually covering the right things.
This is the same reason you do not have one person write code and review their own pull request. Not because they are incompetent, but because the same blind spots that existed during writing exist during review. A second set of eyes catches things the first set cannot see because they are too close to it.
Different models have different blind spots. One model catches what another smooths over. That friction between them is where real coverage comes from. The disagreement is the feature, not a problem to be solved by picking the best single model and hoping it covers everything.
This Is Not Test Automation and It Is Not SDET Either
Worth being explicit about this because the confusion shuts down the right conversation before it starts.
Test automation is code. You are writing deterministic scripts that run the same way every time against a known system. It includes:
- Playwright and Cypress suites for UI testing
- API testing frameworks like Postman and Newman
- Load testing tools like k6
- CI pipeline integration
That is engineering work. Valuable, necessary, a completely different skill set from what is being described here.
SDET sits at the intersection of development and testing. Framework design, tooling, infrastructure, maintaining the automation layer itself. Also valuable, also a completely different thing.
What a real AI QA team does is neither of those. It is the thinking layer, the judgment layer. Reading AC and deciding if it is actually testable. Testing a feature and deciding pass or fail. Catching edge cases that nobody specified because they require understanding how users actually behave, not just how the spec says they should. Writing a bug report that is useful instead of a vague description that wastes everyone’s time. Deciding what belongs in regression and what does not.
That judgment work is what no tool currently replicates end to end. The execution layer is increasingly automated. The thinking layer is still mostly one QA, one prompt, one model, hoping the output is good enough.
What an AI QA Team Actually Needs to Look Like
The mental model is straightforward. Stop thinking about AI as a tool you prompt and start thinking about it as a team you manage. The roles map directly to how a real QA team operates.
The Jr QA model does the work:
- Reads the AC and checks if it is actually testable
- Tests the feature in context, not in a vacuum
- Identifies what passes and what fails
- Catches in-scope and out-of-scope issues
- Flags edge cases
- Drafts bug reports with reproduction steps
The Sr QA model reviews it, not redoes it:
- Challenges the coverage, looks for what got missed
- Checks whether bug reports are reproducible as written
- Questions whether edge cases are genuinely in scope
- Approves or sends back with specific feedback
You are the QA Lead. You make the final call on what ships.
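The loop above can be sketched in code. This is a hedged illustration, not an implementation: `call_model` is a hypothetical stub standing in for whatever model API you actually use (ideally the Jr and Sr roles run on different models, per the blind-spot argument earlier), and its canned replies exist only to show the send-back/approve mechanics. The names `TicketReview` and `run_qa_loop` are made up for this sketch.

```python
from dataclasses import dataclass, field

def call_model(role: str, prompt: str) -> str:
    """Hypothetical stub for a real model API call (swap in an actual
    client). Canned replies exist only to demonstrate the loop mechanics."""
    if role == "jr":
        if "Prior feedback: []" in prompt:
            # First pass: the Jr model drafts a bug without reproduction steps.
            return "FAIL: empty password accepted"
        # After Sr feedback, the redone work includes reproduction steps.
        return ("FAIL: empty password accepted. "
                "Repro: 1) open /login 2) submit blank password 3) observe 200 OK")
    # Sr model approves only when the bug report is reproducible as written.
    return "APPROVE" if "Repro:" in prompt else "SEND BACK: add reproduction steps"

@dataclass
class TicketReview:
    ticket: str
    jr_findings: str = ""
    sr_feedback: list = field(default_factory=list)
    approved: bool = False  # stays False here: the human QA lead makes this call

def run_qa_loop(ticket: str, max_rounds: int = 3) -> TicketReview:
    review = TicketReview(ticket=ticket)
    for _ in range(max_rounds):
        # Jr QA model does the work: test against the AC, draft bug reports.
        review.jr_findings = call_model(
            "jr", f"Test against AC:\n{ticket}\nPrior feedback: {review.sr_feedback}")
        # Sr QA model challenges it -- ideally a different model, so the
        # blind spots of the first pass are not shared by the reviewer.
        verdict = call_model("sr", f"Challenge this QA work:\n{review.jr_findings}")
        if verdict.startswith("APPROVE"):
            break
        review.sr_feedback.append(verdict)
    return review

review = run_qa_loop("As a user I can log in with email and password")
print(review.jr_findings)
```

The point of the structure is that approval is gated on the Sr pass, and the Jr pass gets the Sr feedback on the next round, so the send-back loop converges instead of producing one unchecked answer. The final `approved` flag is deliberately left to a human.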
That structure needs to hold across different test types because functional testing does not work the same way as regression, and API testing has different considerations than UAT. Each test type has its own workflow underneath. The team structure stays the same across all of them.
It also needs to work whether you are a solo QA who wants backup or an actual team that wants to scale without adding headcount. Those are different use cases with the same underlying need: QA that thinks like a team, not a prompt box.
QA Needs to Be Integral to the AI Age, Not Just Adjacent to It
Upskilling in AI tools is not the same as having AI work the way QA actually works. Using ChatGPT to generate test cases faster does not make QA more integral to software development. It just makes the existing shallow workflow faster and gives everyone a false sense that the problem is solved.
What makes QA integral is when it operates the way it always should have: earlier in the process, embedded in the thinking, not just verifying at the end. An AI QA team that mirrors how a real team operates puts QA at the center of feature development, not as a final gate before release. That is the shift worth building toward.
Developers are not waiting for QA to tell them AI tools matter. They already have Copilot, Cursor, and a growing stack built around how they actually work. QA deserves the same, built around how QA actually works, not just a faster version of what already exists.
Where This Is Going
This is being built as an actual platform, not a concept doc. The same way AutoBlog AI started as a writing workflow on EngineeredAI and became a real system, this starts as a QA problem definition and becomes something you can actually run.
The QA thinking lives here on QAJourney because this is where the problem is understood. The build, the stack decisions, the model choices, what breaks and what survives, that gets documented on EngineeredAI as the experiment runs. Same approach, different domain.
If you are a solo QA who has tried every AI tool and found them shallow, this is being built for you. If you are running a small QA team that wants AI that works like a real team member and not just a faster prompt interface, same. More as it gets built.
Related: AI QA Workflow — Structured System for Testing with AI
The build: EngineeredAI — Multi-Model AI Writing Stack


