Most AI prompt guides for QA are backwards. They give you a list of commands to copy-paste, like AI is a vending machine where you insert tokens and get test cases.
That’s not how good testing works. And it’s definitely not how to get useful output from AI.
Here’s what actually works: treat the AI like a junior QA you’re training. Give it context, a role, and specific tasks. Then critically review and correct its work, because AI doesn’t understand user frustration, design inconsistency, or when something is technically correct but completely wrong from a UX perspective.
Setting Up Your AI Testing Partner
Before you ask AI to test anything, you need to give it a role and context. This isn’t corporate roleplay—it changes how the AI interprets your requests.
This builds on what we covered in AI-Assisted Manual Testing: AI works best when it’s positioned as your thinking assistant, not a replacement for your judgment.
Start every testing session like this:
You are a senior QA engineer with 5+ years of experience testing web applications.
Your job is to:
- Identify bugs (functional, UI, UX, edge cases)
- Write clear test cases following specific formats
- Test against acceptance criteria
- Think about user experience, not just technical correctness
Important: Technical accuracy doesn't mean good UX. Flag anything that would frustrate users, even if it "works" as coded.

This matters more than you’d think. Without this context, AI defaults to checking if code works logically. With it, AI considers the broader picture, though you’ll still need to catch what it misses.
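If you drive the model through an API instead of a chat window, the same role setup goes into the system prompt. Here’s a minimal sketch, assuming the OpenAI Python SDK and an API key in your environment; the model name is a placeholder, and any chat-capable model works the same way.

```python
# Minimal sketch: pin the QA role as a system prompt so every request in the
# session is interpreted through that lens. Assumes the openai Python SDK and
# an OPENAI_API_KEY environment variable; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

QA_ROLE = (
    "You are a senior QA engineer with 5+ years of experience testing web "
    "applications. Identify bugs (functional, UI, UX, edge cases), write clear "
    "test cases in the format I give you, test against acceptance criteria, and "
    "flag anything that would frustrate users, even if it 'works' as coded."
)

def ask_qa_assistant(task: str) -> str:
    """Send a testing task with the QA role attached as the system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: use whatever model you have access to
        messages=[
            {"role": "system", "content": QA_ROLE},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content
```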

Real Scenario: Testing From Screenshots
This is the workflow that actually happens. Someone drops a screenshot in Slack, or you’re reviewing a feature in staging, and you need to document what you’re seeing.
When you’re analyzing what you see, remember the principles from The Happy and Sad Path: you’re not just checking whether something works technically, but whether it handles both ideal and failure scenarios gracefully.
Here’s how to work with AI on this:
[Attach screenshot or describe what you're seeing]
As a senior QA engineer, analyze this screen and create:
1. Test cases following this format:
- Bug ID: [leave blank, you'll assign]
- Title: [clear, specific]
- Severity: [Critical/High/Medium/Low]
- Steps to Reproduce:
- Expected Result:
- Actual Result:
- Environment: [Browser, OS, build version]
- Additional Notes:
2. Acceptance criteria validation:
- [List your actual acceptance criteria here]
- For each criterion, state: Pass/Fail/Unclear
- Explain why
Focus on:
- Functional issues (does it work?)
- UI issues (layout, alignment, responsiveness, visual bugs)
- UX issues (is it confusing? frustrating? does it match user expectations?)
- Edge cases visible in this screen

The AI will analyze and give you a starting point. But here’s the critical part: you still need to review everything.
For more on structuring these bug reports properly, check out How to Write Effective Bug Reports; the format matters when developers need to act on your findings.
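If you’d rather script this step than paste screenshots into a chat window, the prompt above translates directly into an API call with the image attached. A minimal sketch, assuming the OpenAI Python SDK and a vision-capable model; the file path, model name, and abbreviated role string are placeholders.

```python
# Minimal sketch: send a local screenshot plus the analysis prompt to a
# vision-capable model. Assumes the openai Python SDK; the path and model
# name are placeholders.
import base64
from openai import OpenAI

client = OpenAI()
QA_ROLE = "You are a senior QA engineer with 5+ years of experience testing web applications."

def analyze_screenshot(image_path: str, prompt: str) -> str:
    """Base64-encode the screenshot and send it alongside the test-case prompt."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: any vision-capable model
        messages=[
            {"role": "system", "content": QA_ROLE},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            },
        ],
    )
    return response.choices[0].message.content
```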
What AI Misses (And Why You Still Matter)
AI will catch obvious stuff: broken layouts, missing buttons, clear functional issues. But it will miss things that require human judgment.
This is what we discussed in Debugging in QA: AI can point you toward issues, but it can’t replace the critical thinking that comes from actually understanding your system and users.
AI won’t feel user frustration. You show it a form with 15 required fields, AI says “form functions correctly.” You know users will abandon it. That’s your call, not the AI’s.
AI doesn’t understand design consistency. You show it a button that’s technically functional but uses a different shade of blue than every other button in the app. AI might not flag it. You know it breaks design language.
AI can’t gauge severity accurately. It might call a minor visual glitch “High” and a major workflow blocker “Medium” because it doesn’t understand your product’s priorities or user impact.
This is why the workflow is: AI generates → You review → You correct → You use the useful parts.
Testing Against Acceptance Criteria
This is where AI actually saves time. You have acceptance criteria from your ticket, and you need to methodically verify each one.
Here are the acceptance criteria for this feature:
1. User can upload images up to 5MB
2. Supported formats: JPG, PNG, GIF
3. Upload button is disabled until an image is selected
4. Error message appears for files over 5MB
5. Success message appears after successful upload
6. Uploaded image appears in preview immediately
I'm testing this feature. Based on [screenshot/description of current behavior], evaluate:
- Which acceptance criteria are met?
- Which are failing?
- What's unclear or untestable from what I'm seeing?
- What edge cases should I test that aren't in the acceptance criteria?
Be specific. Don't just say "looks good"—tell me exactly what you verified and what you couldn't verify from this view.

AI walks through each criterion and tells you what it can confirm. More importantly, it catches edge cases the acceptance criteria missed. What happens if someone tries to upload a 4.9MB file? What if they select an image, then try to select another before the first finishes uploading?
You still need to actually test these scenarios. AI just helps you think through what to test.
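When an edge case like the 4.9MB file is worth keeping around, it can also become a quick boundary test. Here’s a minimal sketch, assuming pytest; `validate_upload` is a hypothetical stand-in for whatever your app’s upload validation actually exposes.

```python
# Minimal sketch: the AI-suggested edge cases turned into boundary tests.
# `validate_upload` is a hypothetical stand-in for your app's upload validation;
# the 5MB limit and formats come from the acceptance criteria above.
import pytest

MAX_BYTES = 5 * 1024 * 1024  # criterion 1: uploads up to 5MB

@pytest.mark.parametrize(
    "size_bytes, extension, should_pass",
    [
        (MAX_BYTES - 1, "jpg", True),   # just under the limit (the 4.9MB case)
        (MAX_BYTES,     "png", True),   # exactly at the limit: is 5MB inclusive?
        (MAX_BYTES + 1, "gif", False),  # just over the limit: error message expected
        (1024,          "bmp", False),  # unsupported format
    ],
)
def test_upload_boundaries(size_bytes, extension, should_pass):
    result = validate_upload(size_bytes=size_bytes, extension=extension)  # hypothetical
    assert result.accepted == should_pass
```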
Real Example: The Logout Button That Worked Perfectly (And Was Completely Wrong)
Here’s where AI limitations become obvious.
You’re testing a new feature. There’s a logout button in the top-right corner. AI analyzes the screenshot:
Functional Test: ✓ Logout button present
Functional Test: ✓ Button clickable
Functional Test: ✓ User logged out on click
Acceptance Criteria: ✓ All met

Technically correct. But you, the human QA, notice: the logout button is the same size and color as the primary CTA button. Users will accidentally click logout when trying to complete their main task. That’s a UX disaster waiting to happen.
AI didn’t catch it because logically, the button works. You caught it because you understand user behavior and interface design patterns.
This is the loop: AI helps you document and structure, but you provide the judgment that comes from actually using software and watching users struggle.
Video Analysis: Testing Flows and Interactions
Screenshots are static. Video lets you test actual flows. If you’re screen recording a bug or testing a multi-step process, you can feed the recording to AI (Gemini handles video natively; for Claude or GPT-4, you’ll typically need to break the recording into sampled frames, as sketched below).
[Attach screen recording]
As a senior QA, watch this user flow and document:
1. Every action taken
2. Expected behavior vs. actual behavior at each step
3. UI issues (flickering, layout shifts, loading states)
4. UX issues (confusing flows, unclear feedback, unnecessary steps)
5. Performance concerns (slow loads, janky animations)
6. Edge cases or error states shown
Format findings as test cases using our bug report template:
[Insert your template]

AI watches the video and documents what happens. This is useful for capturing complex reproduction steps or when you’re testing a flow with 10+ steps and don’t want to manually write out every action.
But again: AI sees what happens. You interpret whether what happens is acceptable.
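If the model you’re using accepts images but not raw video, a practical workaround is to sample frames from the recording and send them as a sequence of screenshots. A minimal sketch, assuming opencv-python; the sampling interval and paths are placeholders.

```python
# Minimal sketch: sample frames from a screen recording so an image-only model
# can review the flow. Assumes opencv-python (cv2); interval is a placeholder.
import base64
import cv2

def sample_frames(video_path: str, every_n_seconds: int = 2) -> list[str]:
    """Return base64-encoded PNG frames sampled every N seconds of the recording."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if the FPS is unavailable
    step = int(fps * every_n_seconds)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            ok_enc, buf = cv2.imencode(".png", frame)
            if ok_enc:
                frames.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
        index += 1
    cap.release()
    return frames
```

Adjust the interval to match the flow: a couple of seconds is enough for most multi-step processes, but tighten it if you’re chasing a flicker or a loading state.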
Course Correcting Mid-Generation (The Most Underused Trick)
Here’s something most people don’t realize: you don’t have to wait for AI to finish generating a wrong answer before you correct it.
If AI starts going in the wrong direction, analyzing the wrong element, missing the point entirely, or generating a response that’s clearly not what you need, stop it immediately. Hit stop, then add more context to your original question.
This is faster and more effective than letting AI finish a useless 500-word response, then trying to correct it after.
Real example:
You ask AI to analyze a screenshot and it starts describing the header navigation in detail when you actually needed it to focus on the broken form validation below. Stop it. Add context:
Stop. I need you to focus specifically on the form validation errors in the center of the screen, not the navigation. The issue is with how error messages are displayed when users submit invalid data.

AI pivots immediately and analyzes what you actually need.
Why this works:
When AI gets it wrong from the start, it’s usually because:
- Your context was too vague (“test this screen” vs “test the password validation on this login form”)
- Your context was too complex (you dumped 10 different things to check without prioritizing)
- AI latched onto the wrong element or made an assumption you didn’t intend
Mid-correction forces you to clarify what you actually want. And AI course-corrects faster than if you let it finish the wrong path.
Handling AI’s Wrong Answers After It Finishes
Sometimes you let AI complete its response, and it’s wrong. Here’s how to handle it.
When AI says something passes that clearly fails: Don’t ignore it. Correct it in the same conversation:
That's incorrect. The button is not centered—it's 3px off to the left. Update the bug report to reflect this visual alignment issue as Medium severity.

AI adjusts. You’re training it within the session to be more accurate for your context.
When AI’s severity assessment is off: Override it immediately:
This is Critical, not Medium. Users cannot complete checkout without this functionality. Update severity and add business impact: blocks revenue.

When AI misses something obvious to you: Add it yourself, but also teach the AI:
You missed the error message displaying HTML tags instead of formatted text. This is a High severity UI bug. Add this to the bug report and remember: always check if error messages render correctly.

Over the course of a session, AI gets better at understanding your product and priorities. But this resets when you start a new conversation, so save your best prompts and corrections as templates.
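Under the hood, these in-session corrections are just extra turns in the same conversation history, which is why they stick until you start a new chat. Here’s a minimal sketch of keeping that history yourself when scripting against an API; the SDK, model name, and abbreviated role string are assumptions.

```python
# Minimal sketch: in-session corrections are extra turns appended to the same
# message history, so the model carries your feedback forward. Assumes the
# openai Python SDK; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()
QA_ROLE = "You are a senior QA engineer with 5+ years of experience testing web applications."
history = [{"role": "system", "content": QA_ROLE}]

def send(message: str) -> str:
    """Append a user turn, get the reply, and keep both in the running history."""
    history.append({"role": "user", "content": message})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

send("Review this bug report draft: ...")
send("This is Critical, not Medium. Users cannot complete checkout. Update the severity.")
# `history` is exactly what resets when you open a new conversation, so save the
# prompts and corrections that worked as templates for next time.
```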
The clarity principle:
If AI consistently misunderstands what you want, the problem is usually your prompt. Either you’re being too vague or too complex.
Too vague: “Test this feature”
Better: “Test the password reset flow, specifically checking if the email validation matches our regex pattern and if the success message displays correctly”
Too complex: “Test the entire checkout flow including payment processing, shipping calculations, tax logic, promo codes, guest checkout, saved addresses, and mobile responsiveness”
Better: Break it into separate focused prompts, or prioritize: “First, test the payment processing flow. We can check shipping and tax logic after.”
When AI gets it wrong, ask yourself: would a junior QA understand what I’m asking based on what I wrote? If not, add more context before AI wastes time generating the wrong thing.
Building Your Personal Testing Prompts
Here’s what you actually need to do this efficiently: save prompts that work for your specific workflow.
Create a doc with:
- Your standard role setup (“You are a senior QA…”)
- Your bug report template
- Your acceptance criteria validation format
- Your specific product context (tech stack, design system rules, common edge cases)
Then when you’re testing, you’re not writing prompts from scratch. You’re filling in variables: [screenshot], [acceptance criteria], [feature description].
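Here’s a minimal sketch of what that looks like as a reusable template, using Python’s string.Template; the variable names and template text are illustrative, not a prescribed format.

```python
# Minimal sketch: a saved prompt with placeholders, so testing sessions start
# from "fill in the variables" instead of a blank page. Template text and
# variable names are illustrative.
from string import Template

SCREENSHOT_ANALYSIS = Template(
    "You are a senior QA engineer with 5+ years of experience testing web applications.\n"
    "Analyze the attached screen and create test cases in our standard bug report format.\n\n"
    "Acceptance criteria:\n$criteria\n\n"
    "Product context: $product_context\n"
    "Focus on: $focus\n"
)

prompt = SCREENSHOT_ANALYSIS.substitute(
    criteria="1. User can upload images up to 5MB\n2. Supported formats: JPG, PNG, GIF",
    product_context="web app, desktop and mobile breakpoints, Chrome/Firefox/Safari",
    focus="functional issues, UI issues, UX issues, visible edge cases",
)
print(prompt)
```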
This ties into the test case templates we discussed in Efficient QA Test Case Design: whether you prefer simplified or detailed formats, AI can work with both as long as you’re consistent about the structure.
This is how guerrilla QA uses AI: not as a replacement for thinking, but as a faster way to document what you’re already thinking.
Which AI Platform for Testing?
For screenshot analysis: Claude or GPT-4. Both handle images well. Claude is better with detailed analysis, GPT-4 is faster.
For video analysis: Gemini, which takes video natively. With Claude or GPT-4, plan on sampling frames from the recording first (see the sketch above).
For generating test data or scripts: Any of them work. Use whatever you already have access to.
For real-time documentation while testing: Copilot or voice-to-text with ChatGPT on mobile. You’re clicking through the app, speaking your observations, and AI structures them into proper bug reports.
The platform matters less than knowing how to set it up properly and correct its mistakes.
The Uncomfortable Truth About AI in QA
AI won’t replace QA engineers. But QA engineers who use AI will replace those who don’t.
As we explored in QA and the Future: AI, Automation, and Trends You Need to Watch, the industry is moving fast. Testers who adapt and learn to leverage AI alongside their existing skills will thrive. Those who resist will struggle.
The value isn’t in AI doing your job. It’s in AI handling the tedious documentation and initial analysis so you can focus on the judgment calls that actually require experience: Is this a blocker? Will users understand this? Is this worth delaying release?
This is the evolution of Manual vs. Automated Testing: now we’re adding AI-assisted testing as a third dimension. You still need manual testing for exploratory work and automation for regression, but AI accelerates both.
Use AI to move faster. But never let it replace your understanding of what makes software actually usable, not just technically functional.
If you’re letting AI test for you without reviewing its work, you’re not doing QA. You’re just hoping the AI noticed what you would have noticed. That’s not testing—that’s gambling.


