How to Bypass CAPTCHA With Puppeteer

May 31, 2024 · 12 min read

Does your Puppeteer web scraper struggle to bypass CAPTCHA? You've come to the right place for the solution!

In this tutorial, you'll learn the four best ways to deal with CAPTCHA while using Puppeteer and scrape without obstacles.

Can Puppeteer Solve CAPTCHA?

The short answer is yes, but only if you give Puppeteer a boost. That's because Puppeteer alone can't automate CAPTCHA-clicking.

For instance, try scraping G2 Reviews, a website protected by Cloudflare CAPTCHA, with vanilla Puppeteer:

scraper.js
// import the required library
const puppeteer = require("puppeteer");

(async () => {
  
    // start Puppeteer in headless mode and open the target website
    const browser = await puppeteer.launch({ headless: "new" });
    const page = await browser.newPage();
    const response = await page.goto("https://www.g2.com/products/asana/reviews");
    
    // wait for the content to load
    await page.waitForSelector("body");
  
    // get the content of the page
    const content = await page.content();
    console.log(response.status(), content);
    
    // close the browser instance
    await browser.close();
})();

The code outputs the following HTML, indicating that Cloudflare has blocked your Puppeteer scraper:

Output
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
    <!--  ...    -->
  
    <title>Attention Required! | Cloudflare</title>
</head>
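
If you want your scraper to notice this kind of block on its own, a small heuristic check can help. The sketch below is only an illustration: the helper names (extractTitle, looksBlocked) are hypothetical, and the status codes and the "Just a moment" title pattern are assumptions about how block pages commonly look, not something guaranteed by Cloudflare or shown in the output above.

```javascript
// Heuristic check for a Cloudflare block page.
// Assumptions (not from the article): blocked responses often use HTTP 403 or 503,
// and the page title contains "Attention Required!" or "Just a moment".
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}

function looksBlocked(status, html) {
  const blockedStatus = status === 403 || status === 503;
  const blockedTitle = /attention required!|just a moment/i.test(extractTitle(html));
  return blockedStatus || blockedTitle;
}

// example using the title from the blocked output
const sampleHtml = "<html><head><title>Attention Required! | Cloudflare</title></head></html>";
console.log(looksBlocked(403, sampleHtml)); // true
```

In the scraper above, you could call looksBlocked(response.status(), content) before parsing the page and retry with a different technique when it returns true.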

The CAPTCHA typically looks like this in the GUI (non-headless mode):


Generally, there are two ways to handle CAPTCHAs:

  • Solve the CAPTCHA once it triggers.
  • Bypass the CAPTCHA completely.

Puppeteer can solve CAPTCHAs only if supported with external CAPTCHA-solving tools. The vanilla version of Puppeteer is an automation library, not designed to solve CAPTCHAs.

The most efficient way of handling CAPTCHA is bypassing it by preventing it from appearing. While Puppeteer's headless browser capability may help you bypass CAPTCHAs, it still requires backup from plugins like Puppeteer Stealth.

Another way to bypass CAPTCHAs is by optimizing Puppeteer's request headers to mimic a real user or setting up a proxy with Puppeteer to switch IPs.
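
As a rough sketch of the header and proxy idea, the snippet below sets a user agent, extra request headers, and a proxy for a Puppeteer session. The proxy address, header values, user agent string, and helper names are all placeholders chosen for illustration, so swap in your own. The puppeteer module is passed in as an argument so the same helper works with puppeteer or puppeteer-extra.

```javascript
// Sketch: custom headers plus a proxy. All values below are placeholders.
const PROXY_SERVER = "http://203.0.113.10:8080"; // hypothetical proxy address

const USER_AGENT =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"; // example desktop Chrome UA

const EXTRA_HEADERS = {
  "Accept-Language": "en-US,en;q=0.9",
  "Referer": "https://www.google.com/",
};

function buildLaunchOptions(proxyServer) {
  return {
    headless: true,
    // --proxy-server is a Chromium command-line switch
    args: [`--proxy-server=${proxyServer}`],
  };
}

async function openWithHeadersAndProxy(puppeteer, url) {
  const browser = await puppeteer.launch(buildLaunchOptions(PROXY_SERVER));
  try {
    const page = await browser.newPage();
    // apply the user agent and headers before navigating
    await page.setUserAgent(USER_AGENT);
    await page.setExtraHTTPHeaders(EXTRA_HEADERS);
    const response = await page.goto(url);
    return response.status();
  } finally {
    // always release the browser, even if navigation fails
    await browser.close();
  }
}
```

You would call it as openWithHeadersAndProxy(require("puppeteer"), "https://www.g2.com/products/asana/reviews"). Headers and a single proxy alone rarely defeat advanced protection, which is why the methods below go further.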

Let's look at the four best techniques to handle CAPTCHAs with Puppeteer.


Method #1: Supercharge Puppeteer With Stealth to Bypass CAPTCHA

Puppeteer Stealth is a plugin featuring various evasion techniques for bypassing anti-bot detection during web scraping. It patches bot-like attributes of the Chrome browser that Puppeteer controls, such as the navigator.webdriver flag, making it appear as a legitimate browser.

The Stealth plugin requires some technical setup, but it's a free method of bypassing CAPTCHA with Puppeteer.
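
To get a feel for what such evasions change, you can read a few browser properties that detection scripts commonly inspect. This is a minimal sketch: the helper names and the choice of properties are illustrative assumptions, and real anti-bot systems check far more than this.

```javascript
// Sketch: read a few properties that anti-bot scripts commonly inspect.
// readFingerprint and looksAutomated are hypothetical helper names.
async function readFingerprint(page) {
  // page.evaluate runs the callback inside the browser page
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
    pluginCount: navigator.plugins.length,
  }));
}

// Simple heuristic: an automation flag or a headless marker in the user agent
function looksAutomated(fingerprint) {
  return fingerprint.webdriver === true || /HeadlessChrome/.test(fingerprint.userAgent);
}
```

Calling looksAutomated(await readFingerprint(page)) once with vanilla Puppeteer and once with the Stealth plugin enabled lets you compare the two setups on your own machine.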

Let's see how it works by scraping OpenSea, an anti-bot-protected website that presents CAPTCHAs when a request doesn't meet its security criteria.

To get started, you have to install puppeteer-extra and puppeteer-extra-plugin-stealth:

Terminal
npm install puppeteer-extra puppeteer-extra-plugin-stealth

Once installed, import the modules and enable the stealth plugin:

scraper.js
const puppeteer = require("puppeteer-extra"); 
const pluginStealth = require("puppeteer-extra-plugin-stealth"); 
 
// import executablePath to locate the browser that Puppeteer downloaded
const { executablePath } = require("puppeteer"); 
 
// use stealth 
puppeteer.use(pluginStealth());

The next steps include setting the viewport, navigating to the page URL, waiting for it to load, and taking screenshots to track the process.

scraper.js
// ...

// launch puppeteer-stealth 
puppeteer.launch({ executablePath: executablePath() }).then(async browser => { 
	// create a new page 
	const page = await browser.newPage(); 
 
	// set page view 
	await page.setViewport({ width: 1280, height: 720 }); 
 
	// navigate to the website 
	await page.goto("https://www.opensea.io/"); 
 
	// wait for page to load (plain timer instead of the deprecated page.waitForTimeout)
	await new Promise(resolve => setTimeout(resolve, 1000)); 
 
	// take a screenshot 
	await page.screenshot({ path: "image.png" }); 
 
	// close the browser 
	await browser.close(); 
});

Here's the complete code:

scraper.js
const puppeteer = require("puppeteer-extra"); 
 
// add stealth plugin and use defaults 
const pluginStealth = require("puppeteer-extra-plugin-stealth"); 
const { executablePath } = require("puppeteer"); 
 
// use stealth 
puppeteer.use(pluginStealth()); 
 
// launch puppeteer-stealth 
puppeteer.launch({ executablePath: executablePath() }).then(async browser => { 
	// create a new page 
	const page = await browser.newPage(); 
 
	// set page view 
	await page.setViewport({ width: 1280, height: 720 }); 
 
	// navigate to the website 
	await page.goto("https://www.opensea.io/"); 
 
	// wait for page to load (plain timer instead of the deprecated page.waitForTimeout)
	await new Promise(resolve => setTimeout(resolve, 1000)); 
 
	// take a screenshot 
	await page.screenshot({ path: "image.png" }); 
 
	// close the browser 
	await browser.close(); 
});

The above code outputs the following screenshot:

[Image: bypass-captcha-puppeteer]

Congrats! You've just made your scraper harder to detect.

However, more advanced website protections can still detect Puppeteer Stealth. Confirm that by running the same script on a G2 product page. Here's the output:

[Image: bypass-g2]

Puppeteer with the Stealth plugin couldn't get past the CAPTCHA. Let's see a solution that works in the next section.

Method #2: Best CAPTCHA Bypass With ZenRows

As mentioned, the best way to handle CAPTCHA is to avoid it. That's where ZenRows, an all-in-one web scraping API, comes in. It modifies your request headers, auto-rotates premium proxies, and bypasses CAPTCHAs and other anti-bot measures at scale in a single API call.

ZenRows also features JavaScript instructions, allowing it to act as a headless browser for extracting content from dynamic websites like those using infinite scrolling. Thanks to this feature, you can replace Puppeteer with ZenRows and focus on scraping your target content without getting blocked.

Let's see ZenRows in action by scraping the G2 page where Puppeteer Stealth got blocked.

Sign up to open the ZenRows Request Builder. Paste the target URL in the link box, set the Boost mode to JS Rendering, and activate Premium Proxies. Select Node.js as your programming language and choose the API connection mode. Then copy the generated code into your JavaScript file:

[Image: building a scraper with zenrows]

Here's a slightly modified version of the generated code:

scraper.js
// npm install axios
const axios = require("axios");

// define your request parameters and make an axios request
axios({
    url: "https://api.zenrows.com/v1/",
    method: "GET",
    params: {
        "url": "https://www.g2.com/products/asana/reviews",
        "apikey": "<YOUR_ZENROWS_API_KEY>",
        "js_render": "true",
        "premium_proxy": "true",
    },
})
    .then(response => console.log(response.data))
    .catch(error => console.log(error));

The code extracts the full-page HTML of the Cloudflare-protected website. The result below shows the page title, with the rest of the content omitted:

Output
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <link href="https://www.g2.com/images/favicon.ico" rel="shortcut icon" type="image/x-icon" />
    <title>Asana Reviews, Pros + Cons, and Top Rated Features</title>
</head>
<body>
    <!-- other content omitted for brevity -->
</body>

You're all set! ZenRows makes bypassing CAPTCHAs and advanced anti-bot measures quick and easy.

Would you rather try solving the CAPTCHA manually instead? Let's go through two more methods.

Method #3: Implement a Free Solver Plugin

The puppeteer-extra-plugin-recaptcha module is free and open source, and it automates solving reCAPTCHA and hCaptcha, two of the most popular anti-bot technologies on the market. It also supports a 2Captcha integration that you can use when the free module proves insufficient.

Let's use 2Captcha's demo page to illustrate how to integrate the solver with Puppeteer.

[Image: bypass-2captcha-puppeteer]

To get started, install puppeteer, puppeteer-extra, and puppeteer-extra-plugin-recaptcha.

Terminal
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-recaptcha

Import the libraries and provide your 2Captcha API key as a token.

script.js
const puppeteer = require('puppeteer-extra')
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
 
// use the RecaptchaPlugin with the specified provider (2captcha) and token
puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'XXXXXXX' 
    },
    visualFeedback: true // enable visual feedback (colorize reCAPTCHAs)
  })
)

Next, navigate to your target webpage and initialize the solving with the page.solveRecaptchas() method.

script.js
// launch a headless browser instance
puppeteer.launch({ headless: true }).then(async browser => {
  // create a new page
  const page = await browser.newPage()
 
  // navigate to a page containing a reCAPTCHA challenge
  await page.goto('https://2captcha.com/demo/recaptcha-v2')
 
  // automatically solve the reCAPTCHA challenge
  await page.solveRecaptchas()
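
Before submitting the form, it can be worth confirming that a CAPTCHA was actually solved. The sketch below assumes that page.solveRecaptchas() resolves to an object exposing a `solved` array and an `error` field; treat those field names as an assumption to verify against the plugin version you have installed. The helper name solveOrThrow is hypothetical.

```javascript
// Sketch: confirm that a CAPTCHA was solved before submitting the form.
// Assumption: solveRecaptchas() resolves to an object with `solved` (array) and `error`.
async function solveOrThrow(page) {
  const result = await page.solveRecaptchas();
  if (result.error) {
    throw new Error(`CAPTCHA solving failed: ${result.error}`);
  }
  const solvedCount = Array.isArray(result.solved) ? result.solved.length : 0;
  if (solvedCount === 0) {
    throw new Error("No CAPTCHA was solved on this page");
  }
  return solvedCount;
}
```

Replacing the bare page.solveRecaptchas() call with await solveOrThrow(page) makes the script stop early with a clear message instead of clicking submit on an unsolved challenge.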

Now, wait for the solution and click on the submit button.

script.js
// wait for the navigation and click the submit button
await Promise.all([
  page.waitForNavigation(),
  page.click(`#recaptcha-demo-submit`)
])

The complete code should be:

script.js
const puppeteer = require('puppeteer-extra')
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
 
// use the RecaptchaPlugin with the specified provider (2captcha) and token
puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'XXXXXXX' 
    },
    visualFeedback: true // enable visual feedback (colorize reCAPTCHAs)
  })
)
 
// launch a headless browser instance
puppeteer.launch({ headless: true }).then(async browser => {
  // create a new page
  const page = await browser.newPage()
 
  // navigate to a page containing a reCAPTCHA challenge
  await page.goto('https://2captcha.com/demo/recaptcha-v2')
 
  // automatically solve the reCAPTCHA challenge
  await page.solveRecaptchas()
 
  // wait for the navigation and click the submit button
  await Promise.all([
    page.waitForNavigation(),
    page.click(`#recaptcha-demo-submit`)
  ])
 
  // take a screenshot of the response page
  await page.screenshot({ path: 'response.png', fullPage: true })
 
  // close the browser
  await browser.close()
})

Here's the outcome:

[Image: bypass-recaptcha]

Great! You've just successfully solved the CAPTCHA.

However, free CAPTCHA solvers are unreliable because they're fully automated. If they fail, consider a paid solver: because these services employ humans, they can handle a wider range of CAPTCHA types.

Let's imagine you encounter a CAPTCHA-protected form while scraping and need to solve it. Here, we'll use 2Captcha, an API-based service that employs humans to solve the challenge.

Let's use 2Captcha's demo site again.

First, sign up on 2Captcha to get an API key. Then, install Puppeteer and the request module.

Terminal
npm install puppeteer request

Now, let's write a script that opens the website you want to scrape, takes a screenshot of the CAPTCHA, and sends it to the service.

script.js
const puppeteer = require('puppeteer');
const request = require('request');
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
 
  // navigate to the page with the CAPTCHA
  await page.goto('https://2captcha.com/demo/normal');
 
  // take a screenshot of the CAPTCHA
  const screenshot = await page.screenshot();
 
  // convert the screenshot to a base64 encoded string
  const image = Buffer.from(screenshot).toString('base64');
 
  // send the image to the 2Captcha API
  request.post({
    url: 'http://2captcha.com/in.php',
    formData: {
      key: 'your_2captcha_api_key',
      method: 'base64',
      body: image
    }
  }, async (error, response, body) => {
    if (error) {
      console.error(error);
    } 

The API responds with a CAPTCHA ID. Use that ID to request the solution, as shown below:

script.js
// get the CAPTCHA ID from the 2Captcha API response
const captchaId = body.split('|')[1];

// request the CAPTCHA solution from the 2Captcha API
request.get({
  url: `http://2captcha.com/res.php?key=your_2captcha_api_key&action=get&id=${captchaId}`
}, (error, response, body) => {
  if (error) {
    console.error(error);
  }
});
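
Human solvers need some time, so a single request for the result will often arrive before the answer is ready. A common pattern is to poll until the solution appears. The sketch below assumes the result endpoint replies with "CAPCHA_NOT_READY" while the task is pending and "OK|<answer>" once it's done; the helper name, polling interval, and attempt limit are illustrative choices, not values from 2Captcha.

```javascript
// Sketch: poll until the solver returns an answer.
// Assumptions: a pending reply is "CAPCHA_NOT_READY" and a finished one is "OK|<answer>".
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// fetchResult is any async function that returns the raw response body as text
async function pollForSolution(fetchResult, { intervalMs = 5000, maxAttempts = 24 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = String(await fetchResult()).trim();
    if (body.startsWith("OK|")) {
      return body.slice(3); // everything after "OK|"
    }
    if (body !== "CAPCHA_NOT_READY") {
      throw new Error(`Solver returned an error: ${body}`);
    }
    // not ready yet: wait before asking again
    await sleep(intervalMs);
  }
  throw new Error("Timed out waiting for the CAPTCHA solution");
}
```

With Node.js 18 or later, you could pass the built-in fetch as the fetcher, for example pollForSolution(() => fetch(resultUrl).then(res => res.text())), where resultUrl is the res.php URL built from your API key and the CAPTCHA ID.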

Once we get the solution, we can put it on the page to solve the test.

script.js
// get the CAPTCHA solution from the 2Captcha API response
const captchaSolution = body.split('|')[1];

// use the CAPTCHA solution in your Puppeteer script
await page.type('#captcha-input', captchaSolution);
await page.click('#submit-button');

This is what the full script will look like:

script.js
const puppeteer = require('puppeteer');
const request = require('request');
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
 
  // navigate to the page with the CAPTCHA
  await page.goto('https://2captcha.com/demo/normal');
 
  // take a screenshot of the CAPTCHA
  const screenshot = await page.screenshot();
 
  // convert the screenshot to a base64 encoded string
  const image = Buffer.from(screenshot).toString('base64');
 
  // send the image to the 2Captcha API
  request.post({
    url: 'http://2captcha.com/in.php',
    formData: {
      key: 'your_2captcha_api_key',
      method: 'base64',
      body: image
    }
  }, async (error, response, body) => {
    if (error) {
      console.error(error);
    } else {
      // get the CAPTCHA ID from the 2Captcha API response
      const captchaId = body.split('|')[1];
 
      // request the CAPTCHA solution from the 2Captcha API
      request.get({
        url: `http://2captcha.com/res.php?key=your_2captcha_api_key&action=get&id=${captchaId}`
      }, async (error, response, body) => {
        if (error) {
          console.error(error);
        } else {
          // get the CAPTCHA solution from the 2Captcha API response
          const captchaSolution = body.split('|')[1];
 
          // use the CAPTCHA solution in your Puppeteer script
          await page.type('#captcha-input', captchaSolution);
          await page.click('#submit-button');
        }
        await browser.close();
      });
    }
  });
})();

Here are the results:

[Image: bypass-captcha]

Keep in mind that using CAPTCHA solvers with Puppeteer works mostly for testing purposes rather than large-scale scraping, as they can quickly become too expensive and slow. Additionally, the screenshot-based approach above only suits image CAPTCHAs; types such as reCAPTCHA or GeeTest can't be solved this way.

Conclusion

In this article, you've learned several ways to bypass CAPTCHA with Puppeteer. Methods like integrating a solver or masking the browser are sometimes effective, but they fail against more complex CAPTCHAs and don't scale well for big web scraping projects.

For successful data extraction, you need a scalable and efficient solution like ZenRows, which handles all anti-bot bypasses for you in a single API call. Get your API key now and enjoy 1,000 requests for free.
