
How to Bypass CAPTCHA Using Playwright

March 22, 2023 · 7 min read

Have you encountered any CAPTCHAs blocking your web scraper? These challenges can be a headache when automating data collection. Luckily, you can use Playwright to bypass CAPTCHA, and we'll walk you through three methods:

  1. Base Playwright and 2Captcha.
  2. Playwright with the Stealth plugin.
  3. Request masking with ZenRows.

If you're tired of dealing with those annoying tests, read on.

Can Playwright Solve CAPTCHA?

The purpose of CAPTCHAs is to be challenging for bots but easy for humans. However, we'll see that you can use Playwright together with complementary tools to get rid of them.

CAPTCHA

There are two ways to deal with a CAPTCHA: A) solve the test when it appears, or B) prevent it from appearing and retry if it still shows up.

In the first case, you'll need a Playwright CAPTCHA solver, which can get expensive at scale. In the second, your scraper needs to simulate human behavior well enough to stay under the radar. We'll see both approaches, but the second one is the better foundation.
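Approach B can be sketched as a small retry loop. This is a minimal illustration, not a definitive implementation: `fetchWithRetry`, `looksLikeCaptcha` and the marker list are hypothetical names chosen for this example, and `fetchPage` stands for any async function that returns the page HTML (in Playwright, for instance, one that calls `page.goto()` and then `page.content()`).

```javascript
// Hypothetical helper: retry a page load when the HTML looks like a CAPTCHA page.

// Markers assumed for illustration; real sites use many different ones.
const CAPTCHA_MARKERS = ["g-recaptcha", "h-captcha"];

function looksLikeCaptcha(html) {
  const lower = html.toLowerCase();
  return CAPTCHA_MARKERS.some((marker) => lower.includes(marker));
}

async function fetchWithRetry(fetchPage, { maxAttempts = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const html = await fetchPage(attempt);
    if (!looksLikeCaptcha(html)) {
      return { html, attempts: attempt };
    }
    if (attempt < maxAttempts) {
      // Exponential backoff: wait 1x, 2x, 4x... the base delay before retrying
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw new Error(`Still seeing a CAPTCHA after ${maxAttempts} attempts`);
}
```

Waiting longer between attempts keeps the retries themselves from looking like bot traffic. In a real scraper, each retry would usually also change something, such as the proxy or the browser context.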

Now, let's see how you can implement them!

Method #1: Bypass CAPTCHA with Base Playwright and 2Captcha

The first method we'll discuss is using Playwright with 2Captcha, a service that solves CAPTCHAs by employing humans on your behalf.

2Captcha

To get started with Playwright CAPTCHA bypassing, install the library.

Terminal
npm install playwright

Then, sign up for a 2Captcha account to obtain your API key and install the package.

Terminal
npm install 2captcha

Now, go to your code editor, import both libraries and create an async function that launches a headless Chromium browser (with headless: true, as in production).

scraper.js
// Start with calling both Playwright and 2captcha
const { chromium } = require('playwright');
const Captcha = require("2captcha");

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

Pass your API key into a Captcha.Solver class to gain access to 2Captcha services later in the code.

scraper.js
// Insert your API key here
  const solver = new Captcha.Solver("<Your 2Captcha API key>");

Navigate to a demo page containing a reCAPTCHA task, wait for the test iframe to load and retrieve its content through captchaFrame.contentFrame(). That'll enable you to locate and manipulate the elements required to solve the challenge.

scraper.js
// Open the reCAPTCHA demo page
  const websiteUrl = "https://patrickhlauke.github.io/recaptcha/";
  await page.goto(websiteUrl);

  // Wait for the CAPTCHA element to load
  const captchaFrame = await page.waitForSelector("iframe[src*='recaptcha/api2']");

  // Switch to the CAPTCHA iframe
  const captchaFrameContent = await captchaFrame.contentFrame();

  // Wait for the CAPTCHA checkbox to appear
  const captchaCheckbox = await captchaFrameContent.waitForSelector("#recaptcha-anchor");

  // Click the CAPTCHA checkbox
  await captchaCheckbox.click();

Great! You're just a few steps away from solving it.

To get the answer, call the solver.recaptcha() method, which sends a request to 2Captcha's API and returns a response containing the solved token. It's crucial to pass the CAPTCHA's data-sitekey value (here, 6Ld2sf4SAAAAAKSgzs0Q13IZhY02Pyo31S2jgOB5), the public key that identifies the website's reCAPTCHA registration, together with the page URL.

Once you have the answer, click the "Submit" button.

scraper.js
  // Ask 2Captcha to solve the challenge; the token is in the `data` field of the result
  const captchaResponse = await solver.recaptcha("6Ld2sf4SAAAAAKSgzs0Q13IZhY02Pyo31S2jgOB5", websiteUrl);

  // The hidden response textarea lives in the main page, not inside the iframe,
  // so wait for it to be attached rather than visible
  const captchaInput = await page.waitForSelector("#g-recaptcha-response", { state: "attached" });
  await captchaInput.evaluate((input, token) => {
    input.value = token;
  }, captchaResponse.data);

  // Submit the form (assumes the page has a submit button)
  await page.click("button[type='submit']");

  // Wait for the page to navigate to the next page
  await page.waitForNavigation();

  console.log("CAPTCHA solved successfully!");

  await browser.close();
})();

Amazing! You've solved your first CAPTCHA with Playwright.
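The example above hard-codes the site key. As a minimal sketch, assuming the page embeds reCAPTCHA through an element with a data-sitekey attribute, you could read the key from the page HTML instead. `extractSiteKey` is a hypothetical helper written for this illustration; in Playwright, the HTML could come from `await page.content()`.

```javascript
// Sketch: pull the reCAPTCHA site key out of page HTML instead of hard-coding it.
// Returns null when no data-sitekey attribute is found.
function extractSiteKey(html) {
  const match = html.match(/data-sitekey=["']([^"']+)["']/i);
  return match ? match[1] : null;
}

// Usage with a snippet shaped like a typical reCAPTCHA container
const sampleHtml =
  '<div class="g-recaptcha" data-sitekey="6Ld2sf4SAAAAAKSgzs0Q13IZhY02Pyo31S2jgOB5"></div>';
console.log(extractSiteKey(sampleHtml));
```

Some sites render the widget from JavaScript without a data-sitekey attribute, in which case this lookup returns null and the key has to be found another way.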

However, while 2Captcha can be a useful solution for testing and small-scale data extraction, it isn't the most cost-effective option for large-scale web scraping or solving all CAPTCHA types. The best approach is to prevent the challenge from being prompted.


Method #2: Use Playwright with the Stealth Plugin

The previous Playwright setup won't work if you need to scrape data from a website that uses more complex CAPTCHA challenges, but the Stealth plugin is a handy solution. It's an open-source project that strengthens Playwright with additional features to mimic human web traffic:

  • It overrides the User-Agent so it no longer advertises a headless browser.
  • It patches automation giveaways, such as the navigator.webdriver flag.
  • It fills in features a regular browser has but headless Chrome lacks, such as plugins and the window.chrome object, so your requests appear more natural.
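To see why this masking matters, here is an illustrative sketch of the kind of checks an anti-bot script might run in the browser. It isn't taken from any real detection product: `automationSignals` and the sample objects are assumptions made for this example, and `nav` only mimics a few fields of the browser's `navigator` object so the code runs in plain Node.js.

```javascript
// Illustrative only: a few signals that commonly give away an automated browser.
function automationSignals(nav) {
  const signals = [];
  if (nav.webdriver === true) {
    signals.push("navigator.webdriver is true");
  }
  if (/HeadlessChrome/.test(nav.userAgent || "")) {
    signals.push("User-Agent contains HeadlessChrome");
  }
  if (!nav.plugins || nav.plugins.length === 0) {
    signals.push("no browser plugins");
  }
  return signals;
}

// A navigator shaped like an unpatched headless browser (assumed values)
const headlessNavigator = {
  webdriver: true,
  userAgent: "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/110.0.0.0 Safari/537.36",
  plugins: [],
};

// The same fields after stealth-style patching (assumed values)
const patchedNavigator = {
  webdriver: false,
  userAgent: "Mozilla/5.0 (X11; Linux x86_64) Chrome/110.0.0.0 Safari/537.36",
  plugins: ["PDF Viewer"],
};

console.log(automationSignals(headlessNavigator));
console.log(automationSignals(patchedNavigator));
```

Real anti-bot systems combine many more signals, including IP reputation and behavior, which is why patching the browser alone doesn't always suffice.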

Let's test it on a real target: Astra, a website with basic Cloudflare protection.

Before getting started, install the required dependencies by running this command inside your project folder:

Terminal
npm install playwright playwright-extra puppeteer-extra-plugin-stealth

Note: The Stealth plugin ships as puppeteer-extra-plugin-stealth, and playwright-extra can load it too.

Supercharge Playwright by launching a headless Chromium browser through playwright-extra and enabling puppeteer-extra-plugin-stealth with chromium.use(pluginStealth). This combination makes it harder for websites to detect your web scraper.

scraper.js
const { chromium } = require('playwright-extra')
// Load the stealth plugin and use defaults (all tricks to hide playwright usage)
const pluginStealth = require("puppeteer-extra-plugin-stealth")(); // the module exports a factory, so call it

// Use stealth
chromium.use(pluginStealth)

// That's it, the rest is playwright usage as normal 😊
chromium.launch({ headless: true }).then(async browser => {

  // Create a new page 
  const page = await browser.newPage()

  // Go to the website 
  await page.goto('https://www.getastra.com/')

  // Give the page a moment to finish loading
  await page.waitForTimeout(1000);

  // Take screenshot 
  await page.screenshot({ path: 'screen.png'})

  // Close the browser 
  console.log('All done, check the screenshot. ✨')
  await browser.close()
})

With a fresh page opened through browser.newPage() and the target loaded through page.goto(), the website is ready to be scraped.

Your script is now fully functional and can capture a screenshot, as shown below:

Screenshot

Playwright with the Stealth plugin makes bypassing CAPTCHAs easier and more reliable than the previous method. However, some CAPTCHA systems may still detect and block your bot.

For example, when attempting to scrape websites with tougher Cloudflare protection, like G2, you may encounter an Access denied message when using the Stealth plugin.

Access Denied
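A scraper can notice this kind of block page on its own instead of saving it as data. The sketch below is a heuristic under stated assumptions: `isBlocked`, the status codes and the phrases are chosen for illustration and would need tuning per site. In Playwright, the inputs could come from the response returned by `page.goto()` (its `status()` method) and from `page.content()`.

```javascript
// Heuristic sketch: decide whether a response is a block page.
// Status codes and phrases are assumptions for illustration only.
const BLOCK_STATUSES = new Set([403, 429]);
const BLOCK_PHRASES = ["access denied", "verify you are human"];

function isBlocked(status, body) {
  if (BLOCK_STATUSES.has(status)) {
    return true;
  }
  const text = body.toLowerCase();
  return BLOCK_PHRASES.some((phrase) => text.includes(phrase));
}

console.log(isBlocked(403, "<h1>Access denied</h1>"));
console.log(isBlocked(200, "<h1>Best Software Reviews</h1>"));
```

Checking the body as well as the status matters because some protection layers return a block page with a 200 status.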

The ultimate solution for such cases is ZenRows. Let's learn about it!

Method #3: Best CAPTCHA Bypass with ZenRows

Unlike Playwright and other web automation frameworks, ZenRows is specifically designed for web crawling. It can solve even the most complex challenges of top-tier security systems, like Cloudflare (used by 1/5 of internet sites) and DataDome. You'll scrape G2 with it next to see that it works.

To try ZenRows, sign up to get your free API key and install it by running the following command:

Terminal
npm install zenrows

Then, use the following code, which sends an API request with js_render, antibot and premium_proxy enabled.

scraper.js
const { ZenRows } = require("zenrows");

(async () => {
    const client = new ZenRows("<Your api key>");
    const url = "https://www.g2.com/";

    try {
        const { data } = await client.get(url, {
            "js_render": "true",
            "antibot": "true",
            "premium_proxy": "true"
        });
        console.log(data);
    } catch (error) {
        console.error(error.message);
        if (error.response) {
            console.error(error.response.data);
        }
    }
})();

Note: Remember to add your API key.
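The SDK call above is a wrapper around a plain HTTP request. As a rough sketch, assuming the API is reachable at https://api.zenrows.com/v1/ and takes the key and target as `apikey` and `url` query parameters (check the ZenRows documentation before relying on either), the request URL could be built like this. `buildZenRowsUrl` is a hypothetical helper, not part of the SDK.

```javascript
// Assumed endpoint and parameter names; verify them against the ZenRows docs.
const ZENROWS_ENDPOINT = "https://api.zenrows.com/v1/";

// Hypothetical helper: build the request URL with properly encoded query parameters.
function buildZenRowsUrl(apiKey, targetUrl, options = {}) {
  const params = new URLSearchParams({ apikey: apiKey, url: targetUrl, ...options });
  return `${ZENROWS_ENDPOINT}?${params.toString()}`;
}

const requestUrl = buildZenRowsUrl("<Your api key>", "https://www.g2.com/", {
  js_render: "true",
  antibot: "true",
  premium_proxy: "true",
});
console.log(requestUrl);
```

URLSearchParams takes care of encoding the target URL, which is easy to get wrong when concatenating strings by hand.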

Run it and wait for beautiful success. 😌

Bypass with ZenRows

Conclusion

Bypassing CAPTCHA with Playwright can be a hard task, as this popular challenge is designed to prevent automated access to websites. However, by using the right tools and libraries, you'll be able to scrape the data you want.

In this article, we saw three different methods to deal with CAPTCHAs:

  • Using base Playwright and 2Captcha.
  • Using Playwright with the Stealth plugin.
  • Masking requests with ZenRows.

The best solution depends on your specific needs, but ZenRows is a reliable option able to bypass even the toughest anti-bot challenges. Make the most of its free trial and make your first requests with it.
