14 min read · January 6, 2025 · Updated February 11, 2026

Playwright Codegen: When Automation Records Garbage (And What to Actually Do)

Generated tests fail. Codegen records what it sees, not what's maintainable. Learn the 5 problems with Playwright codegen, how to fix them, and when to skip codegen entirely and write tests by hand instead.


    Updated October 2024 with new examples, refactoring workflows, and real-world fixes for brittle selectors.

    The first time I used Playwright’s codegen, I was excited. Click a button, codegen records it, out pops a test. Revolutionary.

    Then I tried to run it again.

    The test failed. The selectors were brittle. The flow was bloated. I’d generated 47 lines of code to navigate three pages, and half of it was garbage I’d never use.

    I realized codegen wasn’t magic. It was a starting point. And if you don’t know how to fix what it generates, you’re building a test suite on sand.

    Most teams don’t talk about this part. They show you the happy path: launch codegen, record, run test, done. What they don’t show you is what happens when your generated tests become unmaintainable nightmares that break every time the UI changes.

    This is what actually using codegen looks like at scale. The problems you’ll hit. How to fix them. And when to stop using codegen and write it by hand.



    The Codegen Problem: Why Generated Tests Fail in Production

    Codegen is brilliant at one thing: recording user interactions.

    It’s terrible at everything else.

    When you record a click on a login button, codegen generates something like:

    javascript

    await page.click('text=Login');

    That’s fine until the button text changes. Or someone rebuilds the button with a new class. Or the page is translated to Spanish.

    The selector breaks. The test breaks. You’re back here fixing it.

    But here’s the real problem: you might not notice until you’re running it in CI, at 3 AM, wondering why 40 tests just failed.

    Why This Happens

    Codegen records what it sees in that exact moment. It doesn’t know about:

    • Selectors that will break when the DOM changes
    • Dynamic data that shouldn’t be hard-coded
    • The difference between “a thing that works” and “a thing that’s maintainable”
    • Business logic that the test should validate

    It just records clicks and fills and submits.

    As long as the UI doesn’t change, you’re fine. The second it does, you’re debugging test failures instead of testing features.

    The Real Cost

    A generated test might take 10 minutes to create. A brittle test costs you:

    • 15 minutes debugging when it fails
    • 5 minutes fixing the selector
    • The context switch that kills your focus
    • The frustration of maintaining someone else’s mess (usually the person who generated it three months ago)

    That 10 minutes of “automation” turns into hours of maintenance debt.

    Problem 1: Brittle Selectors That Break on Every UI Update

    What codegen generates:

    javascript

    await page.click('div > button:nth-child(3)');

    What happens: You hire a designer. The designer changes the button order. Everything breaks.

    Why this is a real problem: If your test suite is brittle, you stop trusting it. When a test fails, you assume the test is broken, not the feature. You start skipping tests. Tests go back to being busywork.

    How to Fix It

    Step 1: Replace Generated Selectors With Stable Ones

    After codegen records your test, before you commit it, audit every selector.

    For a login form, codegen might give you:

    javascript

    await page.fill('input[type="password"]', 'password123');

    That works until your form has a credit card input. Now you have two password fields.

    Replace it with something stable:

    javascript

    await page.fill('[data-testid="login-password"]', 'password123');

    If your developers haven’t added data-testids, ask them to. It’s the single most important thing you can do for test stability.

    If they say “it’s not production code,” remind them it’s testing infrastructure. It’s exactly as important.
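Once the team standardizes on a test-id attribute, Playwright can be pointed at it globally. A minimal config sketch (data-testid happens to be Playwright's default, so this line matters only if your team picks a different attribute name):

```javascript
// playwright.config.js — minimal sketch; set testIdAttribute to whatever
// attribute your team standardizes on (data-testid is the default)
module.exports = {
  use: {
    testIdAttribute: 'data-testid',
  },
};
```

With that in place, `page.getByTestId('login-password')` resolves to `[data-testid="login-password"]` without spelling the attribute out in every selector.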

    Step 2: Use Semantic Selectors When Possible

    If data-testid isn’t available, use role-based selectors:

    javascript

    await page.click('role=button[name="Login"]');

    This says “find the button labeled ‘Login’” instead of “find the third div that looks clickable.”

    It’s resilient. It’s human-readable. It’s close to how actual users interact with the page.

    Step 3: Build a Selector Library

    After you’ve cleaned up selectors from three tests, you realize you’re using the same ones everywhere.

    Create a selectors file:

    javascript

    // selectors.js
    export const LOGIN_FORM = {
      username: '[data-testid="login-username"]',
      password: '[data-testid="login-password"]',
      loginButton: 'role=button[name="Login"]',
      errorMessage: '[data-testid="login-error"]',
    };

    Now your test becomes:

    javascript

    import { LOGIN_FORM } from './selectors.js';
    
    await page.fill(LOGIN_FORM.username, 'testuser');
    await page.fill(LOGIN_FORM.password, 'password123');
    await page.click(LOGIN_FORM.loginButton);
    await expect(page.locator(LOGIN_FORM.errorMessage)).toBeVisible();

    If the selector changes, you fix it in one place, not 50 tests.
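You can even semi-automate the Step 1 audit. A rough sketch that scans generated test source for patterns that tend to break (the flagged patterns are heuristics, not an official Playwright lint):

```javascript
// Heuristic scan of generated test source for fragile selectors.
// The pattern list is a judgment call, not an official rule set.
const FRAGILE_PATTERNS = [
  /:nth-child\(/, // position-based: breaks when elements are reordered
  /text=/,        // text-based: breaks on copy changes and translation
  /div\s*>\s*/,   // structural CSS chains: break on layout refactors
];

function findFragileSelectors(source) {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => FRAGILE_PATTERNS.some((p) => p.test(text)));
}
```

Run it over a freshly generated spec and fix every hit before the first commit.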

    Problem 2: Hard-Coded Data That Only Works Once

    Codegen records the exact data you entered.

    javascript

    await page.fill('[data-testid="search-input"]', 'iPhone 15 Pro Max');
    await page.click('[data-testid="search-button"]');

    This test searches for one specific product. Forever.

    What happens when the product gets discontinued? When you need to test search with different terms? When your test data changes?

    You’re stuck hard-coding new test variations.

    How to Fix It

    Extract Your Test Data Into Parameters

    Instead of recording one search, parameterize it:

    javascript

    async function searchProduct(page, productName) {
      await page.fill('[data-testid="search-input"]', productName);
      await page.click('[data-testid="search-button"]');
    }
    
    // Run the same test with different data
    test('search for existing product', async ({ page }) => {
      await searchProduct(page, 'iPhone 15 Pro Max');
      await expect(page.locator('[data-testid="product-result"]')).toBeVisible();
    });
    
    test('search for non-existent product', async ({ page }) => {
      await searchProduct(page, 'Galaxy Brain 2000');
      await expect(page.locator('[data-testid="no-results"]')).toBeVisible();
    });

    Now you’re testing multiple scenarios with the same code. And when test data changes, you update it in one place.

    Use Fixtures for Common Setup Data

    Codegen might record logging in with hardcoded credentials. That breaks in production, CI, and everywhere else.

    Create a fixture:

    javascript

    // fixtures.js
    export const testUser = {
      email: '[email protected]',
      password: process.env.TEST_PASSWORD,
      firstName: 'QA',
      lastName: 'Tester',
    };
    
    // In your test
    import { testUser } from './fixtures.js';
    
    await page.fill('[data-testid="email"]', testUser.email);
    await page.fill('[data-testid="password"]', testUser.password);

    Now your credentials come from environment variables, not hard-coded strings in your test file.
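If you want setup to fail fast when the environment isn't configured, a small loader helps. A sketch — the variable names and the qa@example.com fallback are placeholders, not a Playwright convention:

```javascript
// Resolve test credentials from the environment, failing fast when the
// required secret is missing. Variable names here are hypothetical.
function loadTestUser(env = process.env) {
  if (!env.TEST_PASSWORD) {
    throw new Error('TEST_PASSWORD is not set; refusing to run with a dummy secret');
  }
  return {
    email: env.TEST_EMAIL || 'qa@example.com', // placeholder default
    password: env.TEST_PASSWORD,
  };
}
```

A missing secret now fails at setup with a clear message, instead of surfacing later as a confusing login timeout mid-test.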

    Problem 3: Tests That Record Fluff, Not Logic

    Codegen records everything you do. Including the mistakes.

    You misclick. You wait for something to load. You hover over something by accident. Codegen records all of it.

    Your generated test now has all this garbage in it:

    javascript

    await page.click('[data-testid="search-input"]');
    await page.waitForTimeout(500); // You were just waiting, not testing
    await page.fill('[data-testid="search-input"]', 'test');
    await page.hover('[data-testid="filter-button"]'); // Why is this here?
    await page.click('[data-testid="filter-button"]');
    await page.waitForTimeout(300); // Another wait
    await page.click('[data-testid="search-button"]');

    It works, but it’s bloated. It’s slow. It’s fragile (timeouts break on slow CI).

    Most importantly, it doesn’t validate anything. It just reproduces what you did.

    How to Fix It

    Remove the Fluff

    After codegen finishes, go through line by line:

    • Remove unnecessary waits. Playwright has built-in waits that are smarter than hard-coded timeouts.
    • Remove clicks that don’t do anything.
    • Remove hovers unless they’re actually testing hover behavior.

    Clean version:

    javascript

    await page.fill('[data-testid="search-input"]', 'test');
    await page.click('[data-testid="filter-button"]');
    await page.click('[data-testid="search-button"]');

    Add Assertions That Actually Validate

    Codegen records the flow. It doesn’t validate the outcome.

    After you click search, what should happen? Add that validation:

    javascript

    await page.fill('[data-testid="search-input"]', 'test');
    await page.click('[data-testid="search-button"]');
    
    // This is what you're actually testing
    await expect(page.locator('[data-testid="search-results"]')).toHaveCount(5);
    await expect(page.locator('text=Results for "test"')).toBeVisible();

    Now the test validates something. It’s not just “did this sequence of clicks work,” it’s “did the search return the right results.”

    Use Explicit Waits Only When Necessary

    Instead of:

    javascript

    await page.waitForTimeout(500);

    Use assertions:

    javascript

    // Wait for the results to appear
    await expect(page.locator('[data-testid="search-results"]')).toBeVisible();

    Playwright waits until this is true, with intelligent retries. If it times out, you have a real problem to investigate. If it passes immediately, you don’t waste time.

    Problem 4: Monolithic Tests That Do Too Much

    Codegen encourages recording long flows. Sign in, navigate, fill form, submit, verify. All in one test.

    The problem: if step 3 fails, you don’t know if it’s step 3’s fault or if something earlier broke the state.

    javascript

    test('complete checkout', async ({ page }) => {
      // Step 1: Login
      await page.goto('https://playground.qajourney.net/login');
      await page.fill('[data-testid="email"]', '[email protected]');
      await page.fill('[data-testid="password"]', 'password');
      await page.click('[data-testid="login-button"]');
      
      // Step 2: Browse
      await page.click('[data-testid="search-button"]');
      await page.fill('[data-testid="search-input"]', 'Shoes');
      
      // Step 3: Add to cart
      await page.click('[data-testid="product-result"]:first-child');
      await page.click('[data-testid="add-to-cart"]');
      
      // Step 4: Checkout
      await page.click('[data-testid="cart-button"]');
      await page.click('[data-testid="checkout"]');
      
      // Step 5: Payment
      await page.fill('[data-testid="card-number"]', '4242 4242 4242 4242');
      await page.fill('[data-testid="expiry"]', '12/25');
      await page.click('[data-testid="complete-purchase"]');
      
      // Verify
      await expect(page.locator('text=Thank you for your order')).toBeVisible();
    });

    If this test fails on the payment step, you’ve wasted time verifying login, search, and cart. You already know those work.

    How to Fix It

    Break Monolithic Tests Into Reusable Flows

    Instead of one massive test, create helper functions:

    javascript

    // flows.js
    export async function loginAs(page, email, password) {
      await page.goto('https://playground.qajourney.net/login');
      await page.fill('[data-testid="email"]', email);
      await page.fill('[data-testid="password"]', password);
      await page.click('[data-testid="login-button"]');
      await expect(page).toHaveURL(/\/dashboard/);
    }
    
    export async function searchAndAddToCart(page, query, productIndex = 0) {
      await page.click('[data-testid="search-button"]');
      await page.fill('[data-testid="search-input"]', query);
      await page.locator('[data-testid="product-result"]').nth(productIndex).click();
      await page.click('[data-testid="add-to-cart"]');
    }
    
    export async function checkout(page, cardNumber, expiry) {
      await page.click('[data-testid="cart-button"]');
      await page.click('[data-testid="checkout"]');
      await page.fill('[data-testid="card-number"]', cardNumber);
      await page.fill('[data-testid="expiry"]', expiry);
      await page.click('[data-testid="complete-purchase"]');
    }
    
    // tests.js
    import { loginAs, searchAndAddToCart, checkout } from './flows.js';
    
    test('user can complete checkout', async ({ page }) => {
      await loginAs(page, '[email protected]', 'password123');
      await searchAndAddToCart(page, 'Shoes', 0);
      await checkout(page, '4242 4242 4242 4242', '12/25');
      
      await expect(page.locator('text=Thank you for your order')).toBeVisible();
    });
    
    // Now you can test individual flows
    test('login flow', async ({ page }) => {
      await loginAs(page, '[email protected]', 'password123');
      await expect(page).toHaveURL(/\/dashboard/);
    });
    
    test('search and add to cart', async ({ page }) => {
      await loginAs(page, '[email protected]', 'password123');
      await searchAndAddToCart(page, 'Shoes', 0);
      await expect(page.locator('[data-testid="cart-count"]')).toContainText('1');
    });

    Now if checkout fails, you already know login and search work. You debug faster. You reuse code. Your tests are shorter and easier to read.
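A side benefit of extracting flows into plain functions: you can smoke-test the call sequence against a stub page object, with no browser at all. A sketch using a simplified flow (the stub only covers the methods this flow touches):

```javascript
// A simplified flow helper plus a stub page that records calls. This checks
// only the sequence of page interactions, not real browser behavior.
async function addToCartFlow(page, query) {
  await page.click('[data-testid="search-button"]');
  await page.fill('[data-testid="search-input"]', query);
  await page.click('[data-testid="add-to-cart"]');
}

function makeStubPage(log) {
  return {
    click: async (selector) => log.push(`click ${selector}`),
    fill: async (selector, value) => log.push(`fill ${selector}=${value}`),
  };
}
```

A check like this catches refactoring mistakes (a renamed selector constant, a dropped step) in milliseconds, long before a full browser run.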

    Problem 5: Knowing When NOT to Use Codegen

    Here’s what nobody tells you about codegen: sometimes it’s faster to just write the test yourself.

    As a QA lead, I still prefer manual testing in many cases. The same logic applies to test generation—just because you can generate it doesn’t mean you should.

    Codegen makes sense for:

    • Complex user flows that are hard to script by hand
    • Creating initial scaffolding for critical paths (then refactor it)
    • Testing visual regressions (record, screenshot, compare)
    • Prototyping tests quickly for exploratory testing

    Codegen doesn’t make sense for:

    • Any test with dynamic data
    • Error handling or edge cases
    • Form validation
    • Multi-step workflows that need to be broken into functions
    • Tests that need to run across multiple environments

    The truth is: automation isn’t the goal—quality is. And great QA engineers know when to use the right testing method for the right scenario.

    If you’re spending 40 minutes refactoring a generated test, you might have spent 10 minutes writing it from scratch. The overhead isn’t always worth it.

    When Codegen Creates More Work Than It Saves

    Think about your actual testing workflow. I used Playwright for what mattered: UI regression checks on staging. When I felt too lazy to test manually, automation picked up the slack. It became a tool—not a religion.

    The key insight: If you’re automating every test case just to feel productive, congratulations—you’ve just invented flaky hell.

    This is exactly why many teams pause CI/CD or scale back automation. The maintenance cost exceeds the benefit. For teams without fully implemented CI/CD pipelines, automation should be targeted rather than exhaustive. Instead of aiming for 100% automation, prioritize API automation over UI automation for faster validation.
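For those API-first checks, you often don't need Playwright at all. A sketch of a lightweight validator — the /api/products endpoint and response shape are hypothetical, and the fetch implementation is passed in so it's trivial to stub:

```javascript
// Hit a (hypothetical) search endpoint and validate the response shape.
// fetchImpl is injected so the check can be exercised without a network.
async function checkProductSearch(fetchImpl, baseUrl, term) {
  const res = await fetchImpl(`${baseUrl}/api/products?q=${encodeURIComponent(term)}`);
  if (!res.ok) throw new Error(`search API returned ${res.status}`);
  const body = await res.json();
  if (!Array.isArray(body.results)) throw new Error('malformed response: no results array');
  return body.results.length;
}
```

An API check like this runs in milliseconds and fails with a status code, not a screenshot of a half-rendered page.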

    The Real Workflow: Codegen + Refactoring + Judgment

    This is how I actually use codegen now:

    Step 1: Decide if codegen makes sense (2 minutes)

    • Is this a repetitive flow? Is it stable? Would writing it by hand be faster?
    • If it’s repetitive and stable, continue. If not, write it manually.

    Step 2: Record the flow (5 minutes)

    bash

    npx playwright codegen https://playground.qajourney.net/

    Step 3: Clean up the selectors (10 minutes)

    Replace nth-child and text= selectors with data-testids or roles.

    Step 4: Extract into functions (10 minutes)

    Move the flow into a reusable helper so you can use it across multiple tests.

    Step 5: Parameterize the data (5 minutes)

    Replace hard-coded values with variables or fixtures.

    Step 6: Add assertions (10 minutes)

    Don’t just record actions. Validate outcomes.

    Total: roughly 40 minutes to a maintainable test

    But notice: you only do this for critical, stable paths. For everything else, you write the test from scratch in 10 minutes and it’s clean from the start.

    The Hard Truth

    I’m not a formally trained automation engineer. I didn’t come from a Selenium bootcamp or spend months perfecting test pipelines. I came from real-world QA. And in real-world QA, you learn fast: the best tests aren’t generated. They’re written by someone who understands what they’re testing, how to structure it, and what will break it.

    Codegen is a tool to accelerate that process, not replace it.

    If you’re using codegen and your tests are becoming unmaintainable, you’re not using it wrong. You’re realizing that recording user interactions and writing good automated tests are two completely different skills.

    The best QA approach isn’t choosing between manual and automated testing. It’s using each where it’s strongest. That includes knowing when automation overhead isn’t worth the ROI.

    Quick Reference: When to Use Codegen vs. Write By Hand

    Use Codegen If:

    • The flow is stable and unlikely to change for 6+ months
    • It’s a critical path you’ll run dozens of times
    • You’re willing to spend time refactoring it
    • The test is too complex to script quickly

    Write By Hand If:

    • You can write it faster than codegen + refactoring
    • It’s a one-off or short-term test
    • It involves dynamic data or error scenarios
    • You need it maintainable from day one


    Do the work. Your future self will thank you when the test passes six months later without modification.

    Jaren Cudilla
    QA Overlord

    Learned automation wasn’t magic when 47-line generated tests started failing in production.
    Built refactoring workflows that actually save time, trained teams to know when codegen helps vs. when it hurts, and still debug flaky selectors so you don’t have to at 3 AM.
    Real QA needs real judgment calls not just more scripts.
