How to Bypass CAPTCHA Using Playwright

Idowu Omisola
November 20, 2024 · 4 min read

Have CAPTCHAs been blocking your web scraper? They're one of the most frustrating obstacles in web scraping. Luckily, you can use Playwright to bypass CAPTCHA, and we'll walk you through three methods:

  1. Base Playwright and 2Captcha.
  2. Playwright with the Stealth plugin.
  3. Request masking with ZenRows.

Read on if you're tired of dealing with CAPTCHA interruptions while scraping.

Why Playwright Alone Isn't Enough to Bypass CAPTCHA

Playwright is a valuable web scraping tool, as it can handle dynamic websites and mimic human users. Unfortunately, it has bot-like attributes that most websites detect quickly. Consequently, it can't bypass CAPTCHA challenges on its own.

For instance, Playwright presents bot-like fingerprints like the presence of an automated WebDriver, a HeadlessChrome parameter in the headless User Agent string, missing plugins like the PDF Viewer, misconfigured renderers, etc.
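To make these signals concrete, here's a minimal sketch of how a detection script might check a few of them. The `bot_signals` function and the fingerprint dictionary keys are hypothetical, made up for illustration. Real anti-bot systems inspect many more properties.

```python
def bot_signals(fingerprint):
    """Return the bot-like signals found in a (hypothetical) fingerprint dict."""
    signals = []
    # automation tools typically expose navigator.webdriver as true
    if fingerprint.get("webdriver"):
        signals.append("navigator.webdriver is true")
    # headless Chromium announces itself in the User Agent string
    if "HeadlessChrome" in fingerprint.get("user_agent", ""):
        signals.append("HeadlessChrome in User Agent")
    # regular browsers usually report plugins such as the PDF Viewer
    if not fingerprint.get("plugins"):
        signals.append("no browser plugins (e.g., PDF Viewer)")
    return signals


# sample values for illustration only
headless = {
    "webdriver": True,
    "user_agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36"
    ),
    "plugins": [],
}
print(bot_signals(headless))
```

A fingerprint that trips several of these checks at once is a likely candidate for a CAPTCHA challenge.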

All these factors indicate to the target site that you're trying to gain automated access to extract data. The purpose of CAPTCHAs like the one below is to be challenging for automated bots but easy for humans to solve.

Captcha Example

However, while these limitations make Playwright detectable by anti-bot systems, you can make it more effective for web scraping tasks by fortifying it with the correct Playwright CAPTCHA bypass techniques. This approach typically involves pairing Playwright with complementary tools to bypass CAPTCHAs.

Although you can attempt to solve the CAPTCHA test when it appears, it's better to prevent it from appearing at all.

Solving a CAPTCHA requires a Playwright CAPTCHA solver, which can be slow and expensive, making it unsuitable for large-scale scraping. Bypassing it instead requires your scraper to simulate human behavior well enough to stay under the radar.
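To see why solving gets slow and expensive, here's a rough back-of-the-envelope sketch. The `estimate_solver_overhead` function is hypothetical, and every input in the example call is a placeholder chosen to illustrate the arithmetic, not a real price or solve time from any service.

```python
def estimate_solver_overhead(pages, captcha_rate, price_per_1000_usd, seconds_per_solve):
    """Return (cost_usd, added_hours) for solving CAPTCHAs one after another."""
    # number of pages expected to show a CAPTCHA
    solves = pages * captcha_rate
    # solver services commonly bill per 1,000 solved CAPTCHAs
    cost_usd = solves / 1000 * price_per_1000_usd
    # total time spent waiting for solutions, if solved sequentially
    added_hours = solves * seconds_per_solve / 3600
    return cost_usd, added_hours


# placeholder inputs (assumptions, not real rates)
cost, hours = estimate_solver_overhead(
    pages=100_000,
    captcha_rate=0.5,
    price_per_1000_usd=2.0,
    seconds_per_solve=30,
)
print(f"~${cost:.2f} in solver fees, ~{hours:.1f} hours of added waiting")
```

Plug in your own crawl size and your provider's actual pricing to get a realistic estimate.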

Let's see how to implement these solutions.


Method #1: Use 2Captcha for Playwright CAPTCHA Solving

The first method you'll learn is using Playwright with 2Captcha, a service that employs human workers to solve CAPTCHAs on your behalf.

Let's see how it works using a reCAPTCHA demo page as the target.

Google reCAPTCHA Demo

To get started with Playwright CAPTCHA solving, install the 2captcha-python library (this assumes you already have Playwright installed).

Terminal
pip3 install 2captcha-python 

Add the 2captcha-python library to your imports and specify the target site. Start a browser in headless mode and instantiate the CAPTCHA solver with your 2Captcha API key (create a 2Captcha account to obtain one):

Example
# pip3 install playwright 2captcha-python
from playwright.sync_api import sync_playwright
from twocaptcha import TwoCaptcha

# target URL with reCAPTCHA
url = "https://patrickhlauke.github.io/recaptcha/"

# run Playwright
with sync_playwright() as p:
    # launch the browser in headless mode
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    solver = TwoCaptcha("<YOUR_API_KEY>")

Open the target URL and obtain the iFrame containing the CAPTCHA box. Extract the site key from the iFrame's src attribute, then switch to the iFrame's content and click the CAPTCHA checkbox to trigger the image puzzle:

Example
# ...

# run Playwright
with sync_playwright() as p:
    # ...
    # open the target URL
    page.goto(url)

    # obtain the iFrame containing the CAPTCHA box
    captcha_frame = page.wait_for_selector("iframe[src*='recaptcha']")

    # switch to the content of the CAPTCHA iframe
    captcha_frame_content = captcha_frame.content_frame()

    # extract site key for the CAPTCHA
    site_key = captcha_frame.get_attribute("src").split("k=")[-1].split("&")[0]

    # get the CAPTCHA checkbox element
    captcha_checkbox = captcha_frame_content.wait_for_selector("#recaptcha-anchor")

    # click the CAPTCHA checkbox
    captcha_checkbox.click()

Call the solver object with the site key and retrieve the token from the result. Enter the token into the response field to solve the on-page CAPTCHA. Log the input value to confirm token generation:

Example
# ...

# run Playwright
with sync_playwright() as p:
    # ...

    # solve CAPTCHA
    captcha_response = solver.recaptcha(sitekey=site_key, url=url)
    # extract the reCAPTCHA token from the response
    captcha_token = captcha_response["code"]

    if captcha_token:
        # fill in the CAPTCHA response in the hidden input
        token_value = page.evaluate(
            f'document.querySelector("#g-recaptcha-response").value="{captcha_token}"'
        )
        # check if the token has been entered correctly
        print(token_value)

    # ... further actions (e.g., trigger form submission or specific action)

    # wait to observe the result
    page.wait_for_timeout(5000)
    browser.close()
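Interpolating a value into a JavaScript string can break if that value ever contains quotes. As a hedged alternative, the sketch below passes the token as an argument to `page.evaluate` instead. The `inject_recaptcha_token` helper is a hypothetical name, and it assumes the page contains the same `#g-recaptcha-response` field used above.

```python
def inject_recaptcha_token(page, token):
    """Set the hidden reCAPTCHA response field and return its new value."""
    return page.evaluate(
        """(token) => {
            const field = document.querySelector("#g-recaptcha-response");
            field.value = token;
            return field.value;
        }""",
        token,
    )
```

You'd call it as `inject_recaptcha_token(page, captcha_token)` and print the returned value to confirm the token was set.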

Merge the snippets. Here's the complete code:

Example
# pip3 install playwright 2captcha-python
from playwright.sync_api import sync_playwright
from twocaptcha import TwoCaptcha

# target URL with reCAPTCHA
url = "https://patrickhlauke.github.io/recaptcha/"

# run Playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    solver = TwoCaptcha("<YOUR_API_KEY>")

    # open the target URL
    page.goto(url)

    # obtain the iFrame containing the CAPTCHA box
    captcha_frame = page.wait_for_selector("iframe[src*='recaptcha']")

    # switch to the content of the CAPTCHA iframe
    captcha_frame_content = captcha_frame.content_frame()

    # extract site key for the CAPTCHA
    site_key = captcha_frame.get_attribute("src").split("k=")[-1].split("&")[0]

    # get the CAPTCHA checkbox element
    captcha_checkbox = captcha_frame_content.wait_for_selector("#recaptcha-anchor")

    # click the CAPTCHA checkbox
    captcha_checkbox.click()

    # solve CAPTCHA
    captcha_response = solver.recaptcha(sitekey=site_key, url=url)
    # extract the reCAPTCHA token from the response
    captcha_token = captcha_response["code"]

    if captcha_token:
        # fill in the CAPTCHA response in the hidden input
        token_value = page.evaluate(
            f'document.querySelector("#g-recaptcha-response").value="{captcha_token}"'
        )
        # check if the token has been entered correctly
        print(token_value)

        page.screenshot(path="screengrab.png")

    # ... further actions (e.g., trigger form submission or specific action)

    # wait to observe the result
    page.wait_for_timeout(5000)
    browser.close()

Amazing! You've built your first Playwright CAPTCHA solver.

However, while 2Captcha can be a useful solution for small-scale data extraction, it becomes slow and costly at scale and can't solve every CAPTCHA type. As mentioned earlier, the best approach is to prevent the challenge from being triggered in the first place.

Method #2: Bypass CAPTCHAs With Playwright Stealth Plugin

The Playwright Stealth plugin is a handy solution for bypassing CAPTCHAs. It's an open-source package, ported from the Puppeteer Extra Stealth plugin, that strengthens Playwright with various evasion techniques to mimic human behavior during web scraping.

For example, the Stealth plugin patches the Playwright User Agent, spoofs a real browser's runtime to mimic an actual browser, turns off WebRTC to prevent IP address identification, changes the WebDriver navigator field from true to false, etc.
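To give a feel for what such patches look like, here's a simplified sketch of two of them. This is not the plugin's actual source code: the `patch_user_agent` helper, the `WEBDRIVER_PATCH` constant, and the sample User Agent are assumptions made for illustration.

```python
def patch_user_agent(user_agent):
    """Replace the headless marker so the User Agent reads like regular Chrome."""
    return user_agent.replace("HeadlessChrome", "Chrome")


# JavaScript that could be injected before any page script runs,
# e.g., with Playwright's page.add_init_script(WEBDRIVER_PATCH)
WEBDRIVER_PATCH = (
    "Object.defineProperty(navigator, 'webdriver', { get: () => false });"
)

# sample headless User Agent for illustration
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36"
)
print(patch_user_agent(headless_ua))
```

The real plugin bundles many more evasions than these two, so treat this only as a mental model of the approach.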

Let's make the example concrete by testing it against the ScrapingCourse Anti-bot Challenge page (https://www.scrapingcourse.com/antibot-challenge).

Before getting started, install the required dependencies by running this command inside your project folder:

Terminal
pip3 install playwright-stealth

Import the Stealth package, launch a new headless browser instance, and add the plugin to Playwright by calling stealth_sync:

Example
# pip3 install playwright playwright-stealth
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

# launch the Playwright instance
with sync_playwright() as playwright:
    # launch the browser
    browser = playwright.chromium.launch(headless=True)

    # create a new page
    page = browser.new_page()

    # apply stealth settings to the page
    stealth_sync(page)

Open the protected target site and take its screenshot:

Example
# ...

# launch the Playwright instance
with sync_playwright() as playwright:
    # ...

    # navigate to the desired URL
    page.goto("https://www.scrapingcourse.com/antibot-challenge")

    # wait for any dynamic content to load
    page.wait_for_load_state("networkidle")

    # take a screenshot of the page
    page.screenshot(path="screenshot.png")

    # close the browser
    browser.close()

Here's the complete code after combining both snippets:

Example
# pip3 install playwright playwright-stealth
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

# launch the Playwright instance
with sync_playwright() as playwright:
    # launch the browser
    browser = playwright.chromium.launch(headless=True)

    # create a new page
    page = browser.new_page()

    # apply stealth settings to the page
    stealth_sync(page)

    # navigate to the desired URL
    page.goto("https://www.scrapingcourse.com/antibot-challenge")

    # wait for any dynamic content to load
    page.wait_for_load_state("networkidle")

    # take a screenshot of the page
    page.screenshot(path="screenshot.png")

    # close the browser
    browser.close()

However, running the above code generates a screenshot showing that the Stealth plugin couldn't bypass the Turnstile CAPTCHA:

scrapingcourse cloudflare blocked screenshot

The Playwright Stealth plugin failed because it doesn't work against advanced CAPTCHA technologies like Cloudflare Turnstile.

Although we expected the plugin to help us avoid triggering the Turnstile CAPTCHA checkbox, it didn't work because it still leaks some bot-like properties that Cloudflare doesn't overlook. That said, the current scraper can still work with simpler CAPTCHA protections, but not at scale.

There aren't many effective CAPTCHA-bypass options for Playwright. The ultimate solution for such cases is ZenRows. Let's learn more about it!

Method #3: Best CAPTCHA Bypass With ZenRows

ZenRows is the best solution for bypassing CAPTCHAs automatically. It provides a complete toolkit for successful web crawling and scraping, including premium proxy rotation, request header management, JavaScript rendering support, CAPTCHA auto-bypass, and more.

It bypasses even the most complex challenges posed by top-tier security systems, such as Cloudflare (used by about one-fifth of websites) and DataDome. ZenRows also helps you handle advanced fingerprinting with headless browsing, allowing you to simulate human interactions while scraping. As a result, it can serve as a substitute for browser automation tools like Playwright.

Let's try scraping the previous Anti-bot Challenge page with ZenRows to see how it works.

Sign up to open the ZenRows Request Builder. Paste the target URL in the link box and activate Premium Proxies and JS Rendering.

Next, select your programming language (Python, in this case) and choose the API connection mode. Copy and paste the generated code into your Python script:

building a scraper with zenrows

The generated code should look like this:

Example
# pip install requests
import requests

url = 'https://www.scrapingcourse.com/antibot-challenge'
apikey = '<YOUR_ZENROWS_API_KEY>'

params = {
    'url': url,
    'apikey': apikey,
    'js_render': 'true',
    'premium_proxy': 'true',
}

response = requests.get('https://api.zenrows.com/v1/', params=params)
print(response.text)

The above scraper accesses the protected website and scrapes its full-page HTML, as shown:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Bravo! 💪 You just bypassed a CAPTCHA challenge with ZenRows.
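If you'd rather confirm the bypass in code than by reading the HTML, here's a minimal sketch that looks for the success heading shown in the output above. The `bypassed_challenge` helper is hypothetical, and the regex-based check is a simplification that only suits this demo page.

```python
import re

# success message shown by the demo page once the challenge is passed
SUCCESS_TEXT = "You bypassed the Antibot challenge!"


def bypassed_challenge(html):
    """Return True if any <h2> in the HTML contains the success message."""
    headings = re.findall(
        r"<h2[^>]*>(.*?)</h2>", html, flags=re.DOTALL | re.IGNORECASE
    )
    # collapse whitespace so line breaks inside the heading don't matter
    return any(SUCCESS_TEXT in " ".join(h.split()) for h in headings)


sample_html = """<html lang="en">
<body>
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
</body>
</html>"""
print(bypassed_challenge(sample_html))  # True
```

With the scraper above, you'd call `bypassed_challenge(response.text)` and retry or log a failure when it returns `False`.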

Conclusion

Bypassing CAPTCHAs with Playwright can be tricky, as this popular challenge is designed to prevent automated website access. There are only a few solutions to bypass CAPTCHA with Playwright. Solving CAPTCHAs isn't sustainable, and the Playwright Stealth plugin also falls short against complex CAPTCHA challenges.

Fortunately, ZenRows is a reliable option to bypass even the toughest CAPTCHA and anti-bot challenges. All it takes is a single API request.

Try ZenRows for free now, no credit card required!
