14 min read · January 6, 2025 · Updated February 11, 2026

Playwright Codegen: When Automation Records Garbage (And What to Actually Do)

Generated tests fail. Codegen records what it sees, not what's maintainable. Learn the 5 problems with Playwright codegen, how to fix them, and when to skip codegen entirely and write tests by hand instead.


    Updated October 2024 with new examples, refactoring workflows, and real-world fixes for brittle selectors.

    The first time I used Playwright’s codegen, I was excited. Click a button, codegen records it, out pops a test. Revolutionary.

    Then I tried to run it again.

    The test failed. The selectors were brittle. The flow was bloated. I’d generated 47 lines of code to navigate three pages, and half of it was garbage I’d never use.

    I realized codegen wasn’t magic. It was a starting point. And if you don’t know how to fix what it generates, you’re building a test suite on sand.

    Most teams don’t talk about this part. They show you the happy path: launch codegen, record, run test, done. What they don’t show you is what happens when your generated tests become unmaintainable nightmares that break every time the UI changes.

    This is what actually using codegen looks like at scale. The problems you’ll hit. How to fix them. And when to stop using codegen and write it by hand.



    The Codegen Problem: Why Generated Tests Fail in Production

    Codegen is brilliant at one thing: recording user interactions.

    It’s terrible at everything else.

    When you record a click on a login button, codegen generates something like:

    javascript

    await page.click('text=Login');

    That’s fine until the button text changes. Or someone rebuilds the button with a new class. Or the page is translated to Spanish.

    The selector breaks. The test breaks. You’re back here fixing it.

    But here’s the real problem: you might not notice until you’re running it in CI, at 3 AM, wondering why 40 tests just failed.

    Why This Happens

    Codegen records what it sees in that exact moment. It doesn’t know about:

    • Selectors that will break when the DOM changes
    • Dynamic data that shouldn’t be hard-coded
    • The difference between “a thing that works” and “a thing that’s maintainable”
    • Business logic that the test should validate

    It just records clicks and fills and submits.

    As long as the UI doesn’t change, you’re fine. The second it does, you’re debugging test failures instead of testing features.

    The Real Cost

    A generated test might take 10 minutes to create. A brittle test costs you:

    • 15 minutes debugging when it fails
    • 5 minutes fixing the selector
    • The context switch that kills your focus
    • The frustration of maintaining someone else’s mess (usually the person who generated it three months ago)

    That 10 minutes of “automation” turns into hours of maintenance debt.

    Problem 1: Brittle Selectors That Break on Every UI Update

    What codegen generates:

    javascript

    await page.click('div > button:nth-child(3)');

    What happens: You hire a designer. The designer changes the button order. Everything breaks.

    Why this is a real problem: If your test suite is brittle, you stop trusting it. When a test fails, you assume the test is broken, not the feature. You start skipping tests. Tests go back to being busywork.

    How to Fix It

    Step 1: Replace Generated Selectors With Stable Ones

    After codegen records your test, before you commit it, audit every selector.

    For a login form, codegen might give you:

    javascript

    await page.fill('input[type="password"]', 'password123');

    That works until your form has a credit card input. Now you have two password fields.

    Replace it with something stable:

    javascript

    await page.fill('[data-testid="login-password"]', 'password123');

    If your developers haven’t added data-testids, ask them to. It’s the single most important thing you can do for test stability.

    If they say “it’s not production code,” remind them it’s testing infrastructure. It’s exactly as important.
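Once the team standardizes on a test-id attribute, Playwright can be pointed at it globally. A minimal config sketch (data-testid happens to be Playwright's default, so this line matters only if your team picks a different attribute name):

```javascript
// playwright.config.js — minimal sketch; set testIdAttribute to whatever
// attribute your team standardizes on (data-testid is the default)
module.exports = {
  use: {
    testIdAttribute: 'data-testid',
  },
};
```

With that in place, `page.getByTestId('login-password')` resolves to `[data-testid="login-password"]` without spelling the attribute out in every selector.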

    Step 2: Use Semantic Selectors When Possible

    If data-testid isn’t available, use role-based selectors:

    javascript

    await page.click('role=button[name="Login"]');

    This says “find the button labeled ‘Login’” instead of “find the third div that looks clickable.”

    It’s resilient. It’s human-readable. It’s close to how actual users interact with the page.

    Step 3: Build a Selector Library

    After you’ve cleaned up selectors from three tests, you realize you’re using the same ones everywhere.

    Create a selectors file:

    javascript

    // selectors.js
    export const LOGIN_FORM = {
      username: '[data-testid="login-username"]',
      password: '[data-testid="login-password"]',
      loginButton: 'role=button[name="Login"]',
      errorMessage: '[data-testid="login-error"]',
    };

    Now your test becomes:

    javascript

    import { LOGIN_FORM } from './selectors.js';
    
    await page.fill(LOGIN_FORM.username, 'testuser');
    await page.fill(LOGIN_FORM.password, 'password123');
    await page.click(LOGIN_FORM.loginButton);
    await expect(page.locator(LOGIN_FORM.errorMessage)).toBeVisible();

    If the selector changes, you fix it in one place, not 50 tests.
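You can even semi-automate the Step 1 audit. A rough sketch that scans generated test source for patterns that tend to break (the flagged patterns are heuristics, not an official Playwright lint):

```javascript
// Heuristic scan of generated test source for fragile selectors.
// The pattern list is a judgment call, not an official rule set.
const FRAGILE_PATTERNS = [
  /:nth-child\(/, // position-based: breaks when elements are reordered
  /text=/,        // text-based: breaks on copy changes and translation
  /div\s*>\s*/,   // structural CSS chains: break on layout refactors
];

function findFragileSelectors(source) {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => FRAGILE_PATTERNS.some((p) => p.test(text)));
}
```

Run it over a freshly generated spec and fix every hit before the first commit.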

    Problem 2: Hard-Coded Data That Only Works Once

    Codegen records the exact data you entered.

    javascript

    await page.fill('[data-testid="search-input"]', 'iPhone 15 Pro Max');
    await page.click('[data-testid="search-button"]');

    This test searches for one specific product. Forever.

    What happens when the product gets discontinued? When you need to test search with different terms? When your test data changes?

    You’re stuck hard-coding new test variations.

    How to Fix It

    Extract Your Test Data Into Parameters

    Instead of recording one search, parameterize it:

    javascript

    async function searchProduct(page, productName) {
      await page.fill('[data-testid="search-input"]', productName);
      await page.click('[data-testid="search-button"]');
    }
    
    // Run the same test with different data
    test('search for existing product', async ({ page }) => {
      await searchProduct(page, 'iPhone 15 Pro Max');
      await expect(page.locator('[data-testid="product-result"]')).toBeVisible();
    });
    
    test('search for non-existent product', async ({ page }) => {
      await searchProduct(page, 'Galaxy Brain 2000');
      await expect(page.locator('[data-testid="no-results"]')).toBeVisible();
    });

    Now you’re testing multiple scenarios with the same code. And when test data changes, you update it in one place.

    Use Fixtures for Common Setup Data

    Codegen might record logging in with hardcoded credentials. That breaks in production, CI, and everywhere else.

    Create a fixture:

    javascript

    // fixtures.js
    export const testUser = {
      email: '[email protected]',
      password: process.env.TEST_PASSWORD,
      firstName: 'QA',
      lastName: 'Tester',
    };
    
    // In your test
    import { testUser } from './fixtures.js';
    
    await page.fill('[data-testid="email"]', testUser.email);
    await page.fill('[data-testid="password"]', testUser.password);

    Now your credentials come from environment variables, not hard-coded strings in your test file.
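If you want setup to fail fast when the environment isn't configured, a small loader helps. A sketch — the variable names and the qa@example.com fallback are placeholders, not a Playwright convention:

```javascript
// Resolve test credentials from the environment, failing fast when the
// required secret is missing. Variable names here are hypothetical.
function loadTestUser(env = process.env) {
  if (!env.TEST_PASSWORD) {
    throw new Error('TEST_PASSWORD is not set; refusing to run with a dummy secret');
  }
  return {
    email: env.TEST_EMAIL || 'qa@example.com', // placeholder default
    password: env.TEST_PASSWORD,
  };
}
```

A missing secret now fails at setup with a clear message, instead of surfacing later as a confusing login timeout mid-test.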

    Problem 3: Tests That Record Fluff, Not Logic

    Codegen records everything you do. Including the mistakes.

    You misclick. You wait for something to load. You hover over something by accident. Codegen records all of it.

    Your generated test now has all this garbage in it:

    javascript

    await page.click('[data-testid="search-input"]');
    await page.waitForTimeout(500); // You were just waiting, not testing
    await page.fill('[data-testid="search-input"]', 'test');
    await page.hover('[data-testid="filter-button"]'); // Why is this here?
    await page.click('[data-testid="filter-button"]');
    await page.waitForTimeout(300); // Another wait
    await page.click('[data-testid="search-button"]');

    It works, but it’s bloated. It’s slow. It’s fragile (timeouts break on slow CI).

    Most importantly, it doesn’t validate anything. It just reproduces what you did.

    How to Fix It

    Remove the Fluff

    After codegen finishes, go through line by line:

    • Remove unnecessary waits. Playwright has built-in waits that are smarter than hard-coded timeouts.
    • Remove clicks that don’t do anything.
    • Remove hovers unless they’re actually testing hover behavior.

    Clean version:

    javascript

    await page.fill('[data-testid="search-input"]', 'test');
    await page.click('[data-testid="filter-button"]');
    await page.click('[data-testid="search-button"]');

    Add Assertions That Actually Validate

    Codegen records the flow. It doesn’t validate the outcome.

    After you click search, what should happen? Add that validation:

    javascript

    await page.fill('[data-testid="search-input"]', 'test');
    await page.click('[data-testid="search-button"]');
    
    // This is what you're actually testing
    await expect(page.locator('[data-testid="search-results"]')).toHaveCount(5);
    await expect(page.locator('text=Results for "test"')).toBeVisible();

    Now the test validates something. It’s not just “did this sequence of clicks work,” it’s “did the search return the right results.”

    Use Explicit Waits Only When Necessary

    Instead of:

    javascript

    await page.waitForTimeout(500);

    Use assertions:

    javascript

    // Wait for the results to appear
    await expect(page.locator('[data-testid="search-results"]')).toBeVisible();

    Playwright waits until this is true, with intelligent retries. If it times out, you have a real problem to investigate. If it passes immediately, you don’t waste time.

    Problem 4: Monolithic Tests That Do Too Much

    Codegen encourages recording long flows. Sign in, navigate, fill form, submit, verify. All in one test.

    The problem: if step 3 fails, you don’t know if it’s step 3’s fault or if something earlier broke the state.

    javascript

    test('complete checkout', async ({ page }) => {
      // Step 1: Login
      await page.goto('https://playground.qajourney.net/login');
      await page.fill('[data-testid="email"]', '[email protected]');
      await page.fill('[data-testid="password"]', 'password');
      await page.click('[data-testid="login-button"]');
      
      // Step 2: Browse
      await page.click('[data-testid="search-button"]');
      await page.fill('[data-testid="search-input"]', 'Shoes');
      
      // Step 3: Add to cart
      await page.click('[data-testid="product-result"]:first-child');
      await page.click('[data-testid="add-to-cart"]');
      
      // Step 4: Checkout
      await page.click('[data-testid="cart-button"]');
      await page.click('[data-testid="checkout"]');
      
      // Step 5: Payment
      await page.fill('[data-testid="card-number"]', '4242 4242 4242 4242');
      await page.fill('[data-testid="expiry"]', '12/25');
      await page.click('[data-testid="complete-purchase"]');
      
      // Verify
      await expect(page.locator('text=Thank you for your order')).toBeVisible();
    });

    If this test fails on the payment step, you’ve wasted time verifying login, search, and cart. You already know those work.

    How to Fix It

    Break Monolithic Tests Into Reusable Flows

    Instead of one massive test, create helper functions:

    javascript

    // flows.js
    export async function loginAs(page, email, password) {
      await page.goto('https://playground.qajourney.net/login');
      await page.fill('[data-testid="email"]', email);
      await page.fill('[data-testid="password"]', password);
      await page.click('[data-testid="login-button"]');
      await expect(page).toHaveURL(/\/dashboard/);
    }
    
    export async function searchAndAddToCart(page, query, productIndex = 0) {
      await page.click('[data-testid="search-button"]');
      await page.fill('[data-testid="search-input"]', query);
      await page.locator('[data-testid="product-result"]').nth(productIndex).click();
      await page.click('[data-testid="add-to-cart"]');
    }
    
    export async function checkout(page, cardNumber, expiry) {
      await page.click('[data-testid="cart-button"]');
      await page.click('[data-testid="checkout"]');
      await page.fill('[data-testid="card-number"]', cardNumber);
      await page.fill('[data-testid="expiry"]', expiry);
      await page.click('[data-testid="complete-purchase"]');
    }
    
    // tests.js
    import { loginAs, searchAndAddToCart, checkout } from './flows.js';
    
    test('user can complete checkout', async ({ page }) => {
      await loginAs(page, '[email protected]', 'password123');
      await searchAndAddToCart(page, 'Shoes', 0);
      await checkout(page, '4242 4242 4242 4242', '12/25');
      
      await expect(page.locator('text=Thank you for your order')).toBeVisible();
    });
    
    // Now you can test individual flows
    test('login flow', async ({ page }) => {
      await loginAs(page, '[email protected]', 'password123');
      await expect(page).toHaveURL(/\/dashboard/);
    });
    
    test('search and add to cart', async ({ page }) => {
      await loginAs(page, '[email protected]', 'password123');
      await searchAndAddToCart(page, 'Shoes', 0);
      await expect(page.locator('[data-testid="cart-count"]')).toContainText('1');
    });

    Now if checkout fails, you already know login and search work. You debug faster. You reuse code. Your tests are shorter and easier to read.
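A side benefit of extracting flows into plain functions: you can smoke-test the call sequence against a stub page object, with no browser at all. A sketch using a simplified flow (the stub only covers the methods this flow touches):

```javascript
// A simplified flow helper plus a stub page that records calls. This checks
// only the sequence of page interactions, not real browser behavior.
async function addToCartFlow(page, query) {
  await page.click('[data-testid="search-button"]');
  await page.fill('[data-testid="search-input"]', query);
  await page.click('[data-testid="add-to-cart"]');
}

function makeStubPage(log) {
  return {
    click: async (selector) => log.push(`click ${selector}`),
    fill: async (selector, value) => log.push(`fill ${selector}=${value}`),
  };
}
```

A check like this catches refactoring mistakes (a renamed selector constant, a dropped step) in milliseconds, long before a full browser run.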

    Problem 5: Knowing When NOT to Use Codegen

    Here’s what nobody tells you about codegen: sometimes it’s faster to just write the test yourself.

    As a QA lead, I still prefer manual testing in many cases. The same logic applies to test generation—just because you can generate it doesn’t mean you should.

    Codegen makes sense for:

    • Complex user flows that are hard to script by hand
    • Creating initial scaffolding for critical paths (then refactor it)
    • Testing visual regressions (record, screenshot, compare)
    • Prototyping tests quickly for exploratory testing

    Codegen doesn’t make sense for:

    • Any test with dynamic data
    • Error handling or edge cases
    • Form validation
    • Multi-step workflows that need to be broken into functions
    • Tests that need to run across multiple environments

    The truth is: automation isn’t the goal—quality is. And great QA engineers know when to use the right testing method for the right scenario.

    If you’re spending 40 minutes refactoring a generated test, you might have spent 10 minutes writing it from scratch. The overhead isn’t always worth it.

    When Codegen Creates More Work Than It Saves

    Think about your actual testing workflow. I used Playwright for what mattered: UI regression checks on staging. When I felt too lazy to test manually, automation picked up the slack. It became a tool—not a religion.

    The key insight: If you’re automating every test case just to feel productive, congratulations—you’ve just invented flaky hell.

    This is exactly why many teams pause CI/CD or scale back automation. The maintenance cost exceeds the benefit. For teams without fully implemented CI/CD pipelines, automation should be targeted rather than exhaustive. Instead of aiming for 100% automation, prioritize API automation over UI automation for faster validation.
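For those API-first checks, you often don't need Playwright at all. A sketch of a lightweight validator — the /api/products endpoint and response shape are hypothetical, and the fetch implementation is passed in so it's trivial to stub:

```javascript
// Hit a (hypothetical) search endpoint and validate the response shape.
// fetchImpl is injected so the check can be exercised without a network.
async function checkProductSearch(fetchImpl, baseUrl, term) {
  const res = await fetchImpl(`${baseUrl}/api/products?q=${encodeURIComponent(term)}`);
  if (!res.ok) throw new Error(`search API returned ${res.status}`);
  const body = await res.json();
  if (!Array.isArray(body.results)) throw new Error('malformed response: no results array');
  return body.results.length;
}
```

An API check like this runs in milliseconds and fails with a status code, not a screenshot of a half-rendered page.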

    The Real Workflow: Codegen + Refactoring + Judgment

    This is how I actually use codegen now:

    Step 1: Decide if codegen makes sense (2 minutes)

    • Is this a repetitive flow? Is it stable? Would writing it by hand be faster?
    • If it’s repetitive and stable, continue. If not, write it manually.

    Step 2: Record the flow (5 minutes)

    bash

    npx playwright codegen https://playground.qajourney.net/

    Step 3: Clean up the selectors (10 minutes)

    Replace nth-child and text= selectors with data-testids or roles.

    Step 4: Extract into functions (10 minutes)

    Move the flow into a reusable helper so you can use it across multiple tests.

    Step 5: Parameterize the data (5 minutes)

    Replace hard-coded values with variables or fixtures.

    Step 6: Add assertions (10 minutes)

    Don’t just record actions. Validate outcomes.

    Total: roughly 40 minutes to a maintainable test

    But notice: you only do this for critical, stable paths. For everything else, you write the test from scratch in 10 minutes and it’s clean from the start.

    The Hard Truth

    I’m not a formally trained automation engineer. I didn’t come from a Selenium bootcamp or spend months perfecting test pipelines. I came from real-world QA. And in real-world QA, you learn fast: the best tests aren’t generated. They’re written by someone who understands what they’re testing, how to structure it, and what will break it.

    Codegen is a tool to accelerate that process, not replace it.

    If you’re using codegen and your tests are becoming unmaintainable, you’re not using it wrong. You’re realizing that recording user interactions and writing good automated tests are two completely different skills.

    The best QA approach isn’t choosing between manual and automated testing. It’s using each where it’s strongest. That includes knowing when automation overhead isn’t worth the ROI.

    Quick Reference: When to Use Codegen vs. Write By Hand

    Use Codegen If:

    • The flow is stable and unlikely to change for 6+ months
    • It’s a critical path you’ll run dozens of times
    • You’re willing to spend time refactoring it
    • The test is too complex to script quickly

    Write By Hand If:

    • You can write it faster than codegen + refactoring
    • It’s a one-off or short-term test
    • It involves dynamic data or error scenarios
    • You need it maintainable from day one


    Do the work. Your future self will thank you when the test passes six months later without modification.

    Jaren Cudilla
    QA Overlord

    Learned automation wasn’t magic when 47-line generated tests started failing in production.
    Built refactoring workflows that actually save time, trained teams to know when codegen helps vs. when it hurts, and still debug flaky selectors so you don’t have to at 3 AM.
    Real QA needs real judgment calls not just more scripts.
