How to Bypass CAPTCHA Using Playwright

Idowu Omisola
Updated: November 20, 2024 · 4 min read

Are CAPTCHAs blocking your web scraper? CAPTCHA challenges are a common frustration in web scraping. Luckily, Playwright CAPTCHA bypass techniques can help, and we'll walk you through three methods:

  1. Base Playwright and 2Captcha.
  2. Playwright with the Stealth plugin.
  3. Request masking with ZenRows.

Read on if you're tired of dealing with CAPTCHA interruptions while scraping.

Why Playwright Alone Isn't Enough to Bypass CAPTCHA

Playwright is a valuable web scraping tool, as it can handle dynamic websites and mimic human users. Unfortunately, it has bot-like attributes that most websites detect quickly. Consequently, Playwright can't bypass CAPTCHAs on its own.

For instance, Playwright presents bot-like fingerprints like the presence of an automated WebDriver, a HeadlessChrome parameter in the headless User Agent string, missing plugins like the PDF Viewer, misconfigured renderers, etc.
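
To make these signals concrete, here is a minimal sketch of how a detection script might score a fingerprint. The `find_bot_signals` function and the dictionary keys are hypothetical names chosen for illustration; real anti-bot systems inspect many more properties than the three mentioned above.

```python
def find_bot_signals(fingerprint):
    """Return the bot-like traits found in a browser fingerprint dict."""
    signals = []
    # an automated browser reports navigator.webdriver as true
    if fingerprint.get("webdriver"):
        signals.append("webdriver flag is set")
    # headless Chrome advertises itself in the User Agent string
    if "HeadlessChrome" in fingerprint.get("user_agent", ""):
        signals.append("HeadlessChrome in User Agent")
    # a regular desktop browser usually exposes plugins such as the PDF Viewer
    if not fingerprint.get("plugins"):
        signals.append("no browser plugins")
    return signals


# hypothetical fingerprint resembling a default headless Playwright session
headless = {
    "webdriver": True,
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36",
    "plugins": [],
}
print(find_bot_signals(headless))
```

A fingerprint with the WebDriver flag off, a regular User Agent, and at least one plugin would return an empty list under this simplified check.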

All these factors indicate to the target site that you're trying to gain automated access to extract data. The purpose of CAPTCHAs like the one below is to be challenging for automated bots but easy for humans to solve.

Captcha Example

However, while these limitations make Playwright detectable by anti-bot systems, you can make it more effective for web scraping tasks by fortifying it with the correct Playwright CAPTCHA bypass techniques. This approach typically involves pairing Playwright with complementary tools to bypass CAPTCHAs.

Although you can attempt to solve the CAPTCHA test when it appears, it's better to prevent it from appearing at all.

Solving the CAPTCHA requires a Playwright CAPTCHA solver, which can be slow and expensive, making it unsuitable for large-scale scraping. Bypassing the CAPTCHA, by contrast, requires your scraper to simulate human behavior well enough to stay under the radar.

Let's see how to implement these solutions.

Method #1: Use 2Captcha for Playwright CAPTCHA Solving

The first method you'll learn is using Playwright with 2Captcha, a service that solves CAPTCHAs by employing humans on your behalf. 

Let's see how it works using a reCAPTCHA demo page as the target.

Google reCAPTCHA Demo

To get started with Playwright CAPTCHA solving, install the 2captcha-python library:

Terminal
pip3 install 2captcha-python 

Add the 2captcha-python library to your imports and specify the target site. Start a browser in headless mode and instantiate the CAPTCHA solver with your 2Captcha API key (create a 2Captcha account to obtain one):

Example
# pip3 install playwright 2captcha-python
from playwright.sync_api import sync_playwright
from twocaptcha import TwoCaptcha

# target URL with reCAPTCHA
url = "https://patrickhlauke.github.io/recaptcha/"

# run Playwright
with sync_playwright() as p:
    # launch the browser in headless mode
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    solver = TwoCaptcha("<YOUR_API_KEY>")

Open the target URL and obtain the iFrame containing the CAPTCHA box. Switch to the iFrame and extract the site key from its src attribute. Click the CAPTCHA checkbox to trigger the image puzzle:

Example
# ...

# run Playwright
with sync_playwright() as p:
    # ...
    # open the target URL
    page.goto(url)

    # obtain the iFrame containing the CAPTCHA box
    captcha_frame = page.wait_for_selector("iframe[src*='recaptcha']")

    # switch to the content of the CAPTCHA iframe
    captcha_frame_content = captcha_frame.content_frame()

    # extract site key for the CAPTCHA
    site_key = captcha_frame.get_attribute("src").split("k=")[-1].split("&")[0]

    # get the CAPTCHA checkbox element
    captcha_checkbox = captcha_frame_content.wait_for_selector("#recaptcha-anchor")

    # click the CAPTCHA checkbox
    captcha_checkbox.click()
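
The `split("k=")` approach works for the demo page, but it could pick up the wrong value if another query parameter happens to end in `k`. As a hedged alternative, the sketch below parses the query string with Python's standard library instead. The `extract_site_key` helper and the sample URL are hypothetical and only illustrate the idea.

```python
from urllib.parse import urlparse, parse_qs


def extract_site_key(iframe_src):
    """Return the value of the `k` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(iframe_src).query)
    keys = query.get("k")
    return keys[0] if keys else None


# hypothetical iframe src with a placeholder site key
src = "https://www.google.com/recaptcha/api2/anchor?ar=1&k=HYPOTHETICAL_SITE_KEY&hl=en"
print(extract_site_key(src))
```

In the scraper, you could call it as `extract_site_key(captcha_frame.get_attribute("src"))`.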

Call the solver object with the site key and retrieve the token from the result. Enter the token into the response field to solve the on-page CAPTCHA. Log the input value to confirm token generation:

Example
# ...

# run Playwright
with sync_playwright() as p:
    # ...

    # solve CAPTCHA
    captcha_response = solver.recaptcha(sitekey=site_key, url=url)
    # extract the reCAPTCHA token from the response
    captcha_token = captcha_response["code"]

    if captcha_token:
        # fill in the CAPTCHA token in the hidden input
        page.evaluate(
            f'document.querySelector("#g-recaptcha-response").value="{captcha_token}"'
        )

    # ... further actions (e.g., trigger form submission or specific action)

    # wait to observe the result
    page.wait_for_timeout(5000)
    browser.close()
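
Interpolating the token directly into a JavaScript string assumes it never contains quotes or backslashes. A slightly more defensive sketch, shown below, builds the script with `json.dumps`, which produces a properly escaped string literal. The `build_token_script` helper is a hypothetical name, not part of Playwright or 2Captcha.

```python
import json


def build_token_script(token):
    """Build a JavaScript snippet that writes the token into the response field."""
    # json.dumps escapes quotes and backslashes, yielding a valid JS string literal
    return (
        'document.querySelector("#g-recaptcha-response").value = '
        + json.dumps(token)
        + ";"
    )


print(build_token_script("sample-token"))
```

You could then run `page.evaluate(build_token_script(captcha_token))`. Playwright's `page.evaluate` also accepts an argument as its second parameter, which avoids string building altogether.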

Merge the snippets. Here's the complete code:

Example
# pip3 install playwright 2captcha-python
from playwright.sync_api import sync_playwright
from twocaptcha import TwoCaptcha

# target URL with reCAPTCHA
url = "https://patrickhlauke.github.io/recaptcha/"

# run Playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    solver = TwoCaptcha("<YOUR_API_KEY>")

    # open the target URL
    page.goto(url)

    # obtain the iFrame containing the CAPTCHA box
    captcha_frame = page.wait_for_selector("iframe[src*='recaptcha']")

    # switch to the content of the CAPTCHA iframe
    captcha_frame_content = captcha_frame.content_frame()

    # extract site key for the CAPTCHA
    site_key = captcha_frame.get_attribute("src").split("k=")[-1].split("&")[0]

    # get the CAPTCHA checkbox element
    captcha_checkbox = captcha_frame_content.wait_for_selector("#recaptcha-anchor")

    # click the CAPTCHA checkbox
    captcha_checkbox.click()

    # solve CAPTCHA
    captcha_response = solver.recaptcha(sitekey=site_key, url=url)
    # extract the reCAPTCHA token from the response
    captcha_token = captcha_response["code"]

    if captcha_token:
        # fill in the CAPTCHA token in the hidden input
        token_value = page.evaluate(
            f'document.querySelector("#g-recaptcha-response").value="{captcha_token}"'
        )
        # check if the token has been entered correctly
        print(token_value)

        page.screenshot(path="screengrab.png")

    # ... further actions (e.g., trigger form submission or specific action)

    # wait to observe the result
    page.wait_for_timeout(5000)
    browser.close()

Amazing! You've built your first Playwright CAPTCHA solver.

However, while 2Captcha can be a useful solution for small-scale data extraction, it doesn't work at scale and isn't suitable for solving all CAPTCHA types. As mentioned earlier, the best approach is to prevent the challenge from being triggered in the first place.

Method #2: Bypass CAPTCHAs With Playwright Stealth Plugin

The Playwright Stealth plugin is a handy solution for bypassing CAPTCHAs. The Python package, playwright-stealth, is an open-source port of the Puppeteer Extra Stealth plugin that strengthens Playwright with various evasion techniques to mimic human behavior during web scraping.

For example, the Stealth plugin patches the Playwright User Agent, spoofs a real browser's runtime to mimic an actual browser, turns off WebRTC to prevent IP address identification, changes the WebDriver navigator field from true to false, etc.
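
As a simplified illustration of the User Agent patch, the sketch below removes the headless marker from a User Agent string. It is not the plugin's actual code; `patch_headless_user_agent` and the sample string are hypothetical.

```python
def patch_headless_user_agent(user_agent):
    """Replace the headless marker so the string resembles a regular Chrome browser."""
    return user_agent.replace("HeadlessChrome", "Chrome")


# hypothetical User Agent as a headless Chromium session might report it
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36"
)
print(patch_headless_user_agent(headless_ua))
```

Without the plugin, you could apply a patched string yourself through `browser.new_context(user_agent=...)`, though the User Agent is only one of several signals anti-bot systems check.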

Let's make this concrete by testing the plugin against the Anti-bot Challenge page (https://www.scrapingcourse.com/antibot-challenge).

Before getting started, install the required dependencies by running this command inside your project folder:

Terminal
pip3 install playwright-stealth

Import the Stealth package, launch a new headless browser instance, and add the plugin to Playwright by calling stealth_sync:

Example
# pip3 install playwright playwright-stealth
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

# launch the Playwright instance
with sync_playwright() as playwright:
    # launch the browser
    browser = playwright.chromium.launch(headless=True)

    # create a new page
    page = browser.new_page()

    # apply stealth settings to the page
    stealth_sync(page)

Open the protected target site and take its screenshot:

Example
# ...

# launch the Playwright instance
with sync_playwright() as playwright:
    # ...

    # navigate to the desired URL
    page.goto("https://www.scrapingcourse.com/antibot-challenge")

    # wait for any dynamic content to load
    page.wait_for_load_state("networkidle")

    # take a screenshot of the page
    page.screenshot(path="screenshot.png")

    # close the browser
    browser.close()

Here's the complete code after combining both snippets:

Example
# pip3 install playwright playwright-stealth
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

# launch the Playwright instance
with sync_playwright() as playwright:
    # launch the browser
    browser = playwright.chromium.launch(headless=True)

    # create a new page
    page = browser.new_page()

    # apply stealth settings to the page
    stealth_sync(page)

    # navigate to the desired URL
    page.goto("https://www.scrapingcourse.com/antibot-challenge")

    # wait for any dynamic content to load
    page.wait_for_load_state("networkidle")

    # take a screenshot of the page
    page.screenshot(path="screenshot.png")

    # close the browser
    browser.close()

However, running the above code generates a screenshot showing that the Stealth plugin couldn't bypass the Turnstile CAPTCHA:

scrapingcourse cloudflare blocked screenshot

The Playwright Stealth plugin failed because it still leaks bot-like properties that advanced CAPTCHA technologies like Cloudflare Turnstile detect. That said, the current scraper can still work against simpler CAPTCHA protections, but not at scale.

There aren't many effective CAPTCHA-bypass options for Playwright. The ultimate solution for such cases is ZenRows. Let's learn more about it!

Method #3: Best CAPTCHA Bypass With ZenRows

The best solution to bypass all CAPTCHAs on any website is to use ZenRows' Universal Scraper API. It provides everything you need to avoid CAPTCHA challenges, including JavaScript rendering capabilities, automatic header management, premium proxy rotation, and more.

Let's see how ZenRows performs against a protected page like the anti-bot challenge page.

Start by signing up for a new account, and you'll get to the Request Builder.

building a scraper with zenrows

Paste the target URL, enable JS Rendering, and activate Premium Proxies.

Next, select Python and click on the API connection mode. Then, copy the generated code and paste it into your script.

scraper.py
# pip3 install requests
import requests

url = "https://www.scrapingcourse.com/antibot-challenge"
apikey = "<YOUR_ZENROWS_API_KEY>"
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
}
response = requests.get("https://api.zenrows.com/v1/", params=params)
print(response.text)

The generated code uses Python's Requests library as the HTTP client. You can install this library using pip:

Terminal
pip3 install requests
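
To see what Requests sends under the hood, the sketch below builds the same request URL with only the standard library. The endpoint and parameter names come from the generated code above; the `build_zenrows_url` helper is a hypothetical name used for illustration.

```python
from urllib.parse import urlencode

ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/"


def build_zenrows_url(target_url, api_key):
    """Return the full request URL with the target URL percent-encoded."""
    params = {
        "url": target_url,
        "apikey": api_key,
        "js_render": "true",
        "premium_proxy": "true",
    }
    # urlencode percent-encodes the target URL so it survives as a single parameter
    return f"{ZENROWS_ENDPOINT}?{urlencode(params)}"


print(
    build_zenrows_url(
        "https://www.scrapingcourse.com/antibot-challenge",
        "<YOUR_ZENROWS_API_KEY>",
    )
)
```

Passing `params=params` to `requests.get` performs this encoding for you, so the helper is only useful for debugging or for HTTP clients that expect a ready-made URL.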

Run the code, and you'll successfully access the page:

File
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>
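
If you want the script to confirm the result rather than reading the HTML by eye, a minimal check could look for the success heading shown above. The `is_challenge_bypassed` helper is a hypothetical name, and the check assumes the page keeps this exact heading text.

```python
SUCCESS_MARKER = "You bypassed the Antibot challenge!"


def is_challenge_bypassed(html):
    """Return True if the response HTML contains the success heading."""
    return SUCCESS_MARKER in html


# trimmed sample of the successful response
sample_html = """
<h2>
    You bypassed the Antibot challenge! :D
</h2>
"""
print(is_challenge_bypassed(sample_html))
```

In the scraper, you could call `is_challenge_bypassed(response.text)` after the request completes.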

Congratulations! 🎉 You've successfully bypassed the anti-bot challenge page using ZenRows. This works for any website.

Conclusion

In this guide, you've learned the different ways to handle CAPTCHAs in web scraping:

  • Why Playwright alone isn't enough to bypass CAPTCHA.
  • How to use 2Captcha for Playwright CAPTCHA solving.
  • How to bypass CAPTCHAs with Playwright Stealth plugin.
  • How to bypass all CAPTCHAs on any website.

Basic Playwright alone is not enough, and while 2Captcha and the Stealth plugin help in some cases, both have significant limitations. ZenRows is the most reliable solution to effectively bypass any CAPTCHA. Try ZenRows for free.
