How to Bypass CAPTCHA With Puppeteer

May 31, 2024 · 12 min read

Does your Puppeteer web scraper struggle to bypass CAPTCHA? You've come to the right place for the solution!

In this tutorial, you'll learn the four best ways to deal with CAPTCHAs using Puppeteer so you can scrape without obstacles.

Can Puppeteer Solve CAPTCHA?

The short answer is yes, but only if you give Puppeteer a boost. That's because Puppeteer alone can't automate CAPTCHA-clicking.

For instance, try scraping G2 Reviews, a website protected by Cloudflare CAPTCHA, with vanilla Puppeteer:

scraper.js
// import the required library
const puppeteer = require("puppeteer");

(async () => {
  
    // start Puppeteer in headless mode and open the target website
    const browser = await puppeteer.launch({ headless: "new" });
    const page = await browser.newPage();
    const response = await page.goto("https://www.g2.com/products/asana/reviews");
    
    // wait for the content to load
    await page.waitForSelector("body");
  
    // get the content of the page
    const content = await page.content();
    console.log(response.status(), content);
    
    // close the browser instance
    await browser.close();
})();

The code outputs the following HTML, indicating that Cloudflare has blocked your Puppeteer scraper:

Output
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
    <!--  ...    -->
  
    <title>Attention Required! | Cloudflare</title>
</head>
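To react to a block like this in code rather than by reading the HTML, you could check the response programmatically. The sketch below is a heuristic, not an official Cloudflare check: `looksBlocked`, the list of status codes, and the second title marker ("Just a moment...") are assumptions added for illustration; only the "Attention Required!" title comes from the output above.

```javascript
// Title fragments that usually indicate a Cloudflare block or challenge page.
// Assumption: markers differ between sites and may change over time.
const BLOCK_TITLE_MARKERS = ["Attention Required!", "Just a moment..."];

// Status codes commonly returned with block pages (assumed, not exhaustive).
const BLOCK_STATUS_CODES = [403, 429, 503];

function looksBlocked(status, html) {
  // extract the text of the <title> element, if there is one
  const match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const title = match ? match[1].trim() : "";

  const hasBlockTitle = BLOCK_TITLE_MARKERS.some((marker) => title.includes(marker));

  // blocked when either the status code or the page title points to a block page
  return BLOCK_STATUS_CODES.includes(status) || hasBlockTitle;
}
```

In the script above, you would call it as `looksBlocked(response.status(), content)` right after `page.content()` and retry or switch strategy when it returns `true`.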

The CAPTCHA typically looks like this in the GUI (non-headless mode):

This is a common scenario when trying to bypass Cloudflare, which uses various techniques like TLS fingerprinting to detect and block automated requests. Generally, there are two ways to deal with CAPTCHAs:

  • Solve the CAPTCHA once it triggers.
  • Bypass the CAPTCHA completely.

Puppeteer can solve CAPTCHAs only if supported with external CAPTCHA-solving tools. The vanilla version of Puppeteer is an automation library, not designed to solve CAPTCHAs.

The most efficient way of handling CAPTCHA is bypassing it by preventing it from appearing. Because Puppeteer drives a real browser, it can help you bypass CAPTCHAs, but it still exposes automation signals and requires backup from plugins like Puppeteer Stealth.

Another way to bypass CAPTCHAs is by optimizing Puppeteer's request headers to mimic a real user or setting up a proxy with Puppeteer to switch IPs.
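As a rough illustration of the header and proxy approach, here is a minimal sketch. The proxy address, the header values, and the User-Agent string are placeholders chosen for the example, and `buildLaunchOptions` and `openPage` are hypothetical helper names. Authenticated proxies would additionally need `page.authenticate()`.

```javascript
// Placeholder proxy address (assumption): replace with your own proxy.
const PROXY_URL = "http://proxy.example.com:8080";

// Example header values (assumptions): mimic what a regular browser sends.
const EXTRA_HEADERS = {
  "Accept-Language": "en-US,en;q=0.9",
  Referer: "https://www.google.com/",
};

// Example desktop Chrome User-Agent string (assumption).
const USER_AGENT =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36";

function buildLaunchOptions(proxyUrl) {
  return {
    headless: true,
    // route all browser traffic through the proxy when one is given
    args: proxyUrl ? [`--proxy-server=${proxyUrl}`] : [],
  };
}

async function openPage(url, proxyUrl = PROXY_URL) {
  // required lazily so the helper above works without Puppeteer installed
  const puppeteer = require("puppeteer");

  const browser = await puppeteer.launch(buildLaunchOptions(proxyUrl));
  const page = await browser.newPage();

  // override the default User-Agent and add extra request headers
  await page.setUserAgent(USER_AGENT);
  await page.setExtraHTTPHeaders(EXTRA_HEADERS);

  await page.goto(url, { waitUntil: "domcontentloaded" });
  return { browser, page };
}
```

Calling `openPage("https://www.g2.com/products/asana/reviews")` returns the browser and page so you can read the content and close the browser yourself. Headers and a single proxy alone are unlikely to defeat advanced protections, which is why the methods below go further.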

Let's look at the four best techniques to handle CAPTCHAs with Puppeteer.

Method #1: Supercharge Puppeteer With Stealth to Bypass CAPTCHA

Puppeteer Stealth is a plugin featuring various evasion techniques for bypassing anti-bot detection during web scraping. It removes bot-like attributes from the Chrome instance that Puppeteer controls (for example, the navigator.webdriver flag), making it appear as a legitimate browser.
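To see which bot-like attributes a page script could observe, you might compare a run with and without the plugin. The following is a small sketch under assumptions: `collectFingerprint` and `findBotSignals` are hypothetical helpers, and the three signals checked are only a sample of what real anti-bot systems inspect.

```javascript
// Read a few browser properties from inside the page (runs in the page context).
async function collectFingerprint(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
    pluginCount: navigator.plugins.length,
  }));
}

// Return a list of human-readable automation signals found in a fingerprint.
function findBotSignals(fingerprint) {
  const signals = [];

  if (fingerprint.webdriver === true) {
    signals.push("navigator.webdriver is true");
  }
  if (/HeadlessChrome/.test(fingerprint.userAgent)) {
    signals.push("user agent contains HeadlessChrome");
  }
  if (fingerprint.pluginCount === 0) {
    signals.push("no browser plugins reported");
  }

  return signals;
}
```

With an open page, `findBotSignals(await collectFingerprint(page))` gives you a quick list to log. An empty list only means these particular checks passed, not that the browser is undetectable.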

The Stealth plugin requires some technical setup, but it's a free method of bypassing CAPTCHA with Puppeteer.

Let's see how it works by scraping OpenSea, an anti-bot-protected website that presents CAPTCHAs when a request doesn't meet its security criteria.

To get started, you have to install puppeteer-extra and puppeteer-extra-plugin-stealth:

Terminal
npm install puppeteer-extra puppeteer-extra-plugin-stealth

Once installed, import the modules and enable the stealth plugin:

scraper.js
const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth");

// get the path of the browser binary that ships with Puppeteer
const { executablePath } = require("puppeteer");

// use stealth
puppeteer.use(pluginStealth());

The next steps include setting the viewport, navigating to the page URL, waiting for it to load, and taking screenshots to track the process.

scraper.js
// ...

// launch puppeteer-stealth
puppeteer.launch({ executablePath: executablePath() }).then(async browser => {
	// create a new page
	const page = await browser.newPage();

	// set page view
	await page.setViewport({ width: 1280, height: 720 });

	// navigate to the website
	await page.goto("https://www.opensea.io/");

	// wait for page to load (page.waitForTimeout was removed in Puppeteer v22)
	await new Promise(resolve => setTimeout(resolve, 1000));

	// take a screenshot
	await page.screenshot({ path: "image.png" });

	// close the browser
	await browser.close();
});

Here's the complete code:

scraper.js
const puppeteer = require("puppeteer-extra");

// add stealth plugin and use defaults
const pluginStealth = require("puppeteer-extra-plugin-stealth");
const { executablePath } = require("puppeteer");

// use stealth
puppeteer.use(pluginStealth());

// launch puppeteer-stealth
puppeteer.launch({ executablePath: executablePath() }).then(async browser => {
	// create a new page
	const page = await browser.newPage();

	// set page view
	await page.setViewport({ width: 1280, height: 720 });

	// navigate to the website
	await page.goto("https://www.opensea.io/");

	// wait for page to load (page.waitForTimeout was removed in Puppeteer v22)
	await new Promise(resolve => setTimeout(resolve, 1000));

	// take a screenshot
	await page.screenshot({ path: "image.png" });

	// close the browser
	await browser.close();
});

The above code outputs the following screenshot:

bypass-captcha-puppeteer

Congrats! You've just made your scraper more undetectable.

However, more advanced website protections can detect Puppeteer Stealth. Confirm that by running the same script on a G2 product page. Here's the output:

bypass-g2

Puppeteer and the Stealth plugin couldn't get past the CAPTCHA. Let's see a solution that works in the next section.

Method #2: Best CAPTCHA Bypass With ZenRows

The best way to handle CAPTCHA is to use ZenRows' Universal Scraper API. It modifies your request headers, auto-rotates premium proxies, and bypasses CAPTCHAs and other anti-bot measures at scale in a single API call.

ZenRows also features JavaScript instructions, allowing it to act as a headless browser for extracting content from dynamic websites like those using infinite scrolling. Thanks to this feature, you can replace Puppeteer with ZenRows and focus on scraping your target content without getting blocked.

Let's see ZenRows in action by scraping the ScrapingCourse Antibot Challenge page, a test page protected by anti-bot measures.

Start by signing up for a new account to get to the Request Builder.

building a scraper with zenrows

Insert the target URL into the link box, activate Premium Proxies, and enable JS Rendering.

Next, choose Node.js and then click on the API connection mode. After that, copy the generated code and paste it into your script.

scraper.js
const axios = require('axios');

const params = {
  url: 'https://www.scrapingcourse.com/antibot-challenge',
  apikey: '<YOUR_ZENROWS_API_KEY>',
  js_render: 'true',
  premium_proxy: 'true'
};

axios.get('https://api.zenrows.com/v1/', { params })
  .then(({ data }) => console.log(data))
  .catch(error => console.error(error));

When you run this code, you'll successfully access the page:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>

Congratulations! 🎉 You've accessed a protected page without any complex User Agent setup.
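If you'd rather not add axios as a dependency, the same request can be sketched with the `fetch` function built into Node.js 18 and later. The endpoint and the four query parameters are the ones from the generated code above. `buildZenRowsUrl` and `fetchWithZenRows` are hypothetical helper names, and the status check is an addition for illustration.

```javascript
// Build the API URL with the same parameters as the axios example.
function buildZenRowsUrl(targetUrl, apiKey) {
  const params = new URLSearchParams({
    url: targetUrl,
    apikey: apiKey,
    js_render: "true",
    premium_proxy: "true",
  });
  return `https://api.zenrows.com/v1/?${params.toString()}`;
}

// Fetch the rendered HTML of targetUrl through the API (requires Node.js 18+).
async function fetchWithZenRows(targetUrl, apiKey) {
  const response = await fetch(buildZenRowsUrl(targetUrl, apiKey));

  // fail loudly on non-2xx responses instead of parsing an error page
  if (!response.ok) {
    throw new Error(`ZenRows request failed with status ${response.status}`);
  }
  return response.text();
}
```

You would call it as `fetchWithZenRows('https://www.scrapingcourse.com/antibot-challenge', '<YOUR_ZENROWS_API_KEY>')` and log or parse the returned HTML.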

Would you rather try solving the CAPTCHA manually instead? Let's go through two more methods.

Method #3: Implement a Free Solver Plugin

Puppeteer-extra-plugin-recaptcha is a free and open-source plugin that automates the solving of reCAPTCHA, one of the most popular anti-bot technologies on the market. It works through a solving provider: the example below uses its 2Captcha integration, which requires a 2Captcha API key.

Let's use 2Captcha's demo page to illustrate how to integrate the solver with Puppeteer.

bypass-2captcha-puppeteer

To get started, install puppeteer, puppeteer-extra, and puppeteer-extra-plugin-recaptcha.

Terminal
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-recaptcha

Import the libraries and provide your 2Captcha API key as a token.

script.js
const puppeteer = require('puppeteer-extra')
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
 
// use the RecaptchaPlugin with the specified provider (2captcha) and token
puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'XXXXXXX' 
    },
    visualFeedback: true // enable visual feedback (colorize reCAPTCHAs)
  })
)

Next, navigate to your target webpage and initialize the solving with the page.solveRecaptchas() method.

script.js
// launch a headless browser instance
puppeteer.launch({ headless: true }).then(async browser => {
  // create a new page
  const page = await browser.newPage()
 
  // navigate to a page containing a reCAPTCHA challenge
  await page.goto('https://2captcha.com/demo/recaptcha-v2')
 
  // automatically solve the reCAPTCHA challenge
  await page.solveRecaptchas()

Now, wait for the solution and click on the submit button.

script.js
// wait for the navigation and click the submit button
await Promise.all([
  page.waitForNavigation(),
  page.click(`#recaptcha-demo-submit`)
])

The complete code should be:

script.js
const puppeteer = require('puppeteer-extra')
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
 
// use the RecaptchaPlugin with the specified provider (2captcha) and token
puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'XXXXXXX' 
    },
    visualFeedback: true // enable visual feedback (colorize reCAPTCHAs)
  })
)
 
// launch a headless browser instance
puppeteer.launch({ headless: true }).then(async browser => {
  // create a new page
  const page = await browser.newPage()
 
  // navigate to a page containing a reCAPTCHA challenge
  await page.goto('https://2captcha.com/demo/recaptcha-v2')
 
  // automatically solve the reCAPTCHA challenge
  await page.solveRecaptchas()
 
  // wait for the navigation and click the submit button
  await Promise.all([
    page.waitForNavigation(),
    page.click(`#recaptcha-demo-submit`)
  ])
 
  // take a screenshot of the response page
  await page.screenshot({ path: 'response.png', fullPage: true })
 
  // close the browser
  await browser.close()
})

Here's the outcome:

bypass-recaptcha

Great! You've just successfully solved the CAPTCHA.
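Before clicking submit in a real scraper, it can help to confirm that every detected challenge was actually solved. The sketch below assumes, based on the plugin's README, that `page.solveRecaptchas()` resolves to an object with `captchas`, `solved`, and `error` fields, where each `solved` entry carries an `isSolved` flag. Treat that shape as an assumption and verify it against the version you install. `summarizeSolveResult` is a hypothetical helper.

```javascript
// Summarize the object returned by page.solveRecaptchas().
// Assumed shape: { captchas: [...], solved: [{ isSolved: boolean, ... }], error }
function summarizeSolveResult(result) {
  const captchas = result.captchas || [];
  const solved = (result.solved || []).filter((entry) => entry.isSolved);

  return {
    found: captchas.length,
    solved: solved.length,
    error: result.error || null,
    // true when no error occurred and no detected CAPTCHA is left unsolved
    allSolved: !result.error && solved.length === captchas.length,
  };
}
```

In the script above, you could replace the bare call with `const summary = summarizeSolveResult(await page.solveRecaptchas())` and only continue to the submit step when `summary.allSolved` is `true`.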

However, free CAPTCHA solvers can be unreliable because they rely entirely on automation. If they fail, look into paid solvers: since they employ human workers, they can handle a wider range of CAPTCHA types.

Method #4: Use a Paid CAPTCHA Solver

Let's imagine you encounter a CAPTCHA-protected form while scraping and need to solve it. Here, we'll use 2Captcha, an API-based service that employs humans to solve the challenge.

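Let's use 2Captcha's demo site again, this time its normal (image) CAPTCHA page.

First, sign up on 2Captcha to get an API key. Then, install Puppeteer and the request module (note that request is deprecated, so consider a maintained HTTP client for new projects).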

Terminal
npm install puppeteer request

Now, let's write a script that opens the website you want to scrape, takes a screenshot of the CAPTCHA, and sends it to the service.

script.js
const puppeteer = require('puppeteer');
const request = require('request');
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
 
  // navigate to the page with the CAPTCHA
  await page.goto('https://2captcha.com/demo/normal');
 
  // take a screenshot of the page that shows the CAPTCHA
  const screenshot = await page.screenshot();

  // convert the screenshot to a base64 encoded string
  // (Buffer.from replaces the deprecated new Buffer constructor)
  const image = Buffer.from(screenshot).toString('base64');

  // send the image to the 2Captcha API
  request.post({
    url: 'https://2captcha.com/in.php',
    formData: {
      key: 'your_2captcha_api_key',
      method: 'base64',
      body: image
    }
  }, async (error, response, body) => {
    if (error) {
      console.error(error);
    } 

Next, extract the CAPTCHA ID from the API response and use it to request the solution, as shown below:

script.js
// get the CAPTCHA ID from the 2Captcha API response
const captchaId = body.split('|')[1];

// request the CAPTCHA solution from the 2Captcha API
request.get({
  url: `https://2captcha.com/res.php?key=your_2captcha_api_key&action=get&id=${captchaId}`
}, (error, response, body) => {
  if (error) {
    console.error(error);
  }
});
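One caveat with a single request to res.php: a human worker usually needs several seconds, and until the answer is ready the API responds with `CAPCHA_NOT_READY` (the spelling is 2Captcha's own) instead of `OK|<solution>`. A minimal polling sketch follows. `parseTwoCaptchaResponse` and `pollForSolution` are hypothetical helpers, and the 5-second interval and 24-attempt limit are arbitrary assumptions.

```javascript
// Interpret a plain-text 2Captcha response body.
function parseTwoCaptchaResponse(body) {
  const text = String(body).trim();

  if (text.startsWith("OK|")) {
    // keep everything after the first "|" in case the answer contains one
    return { status: "ok", value: text.slice(3) };
  }
  if (text === "CAPCHA_NOT_READY") {
    return { status: "pending" };
  }
  // anything else is treated as an error code, e.g. ERROR_WRONG_USER_KEY
  return { status: "error", code: text };
}

// Call fetchResult() repeatedly until the solution is ready.
// fetchResult is any async function that returns the res.php response body.
async function pollForSolution(fetchResult, { intervalMs = 5000, maxAttempts = 24 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = parseTwoCaptchaResponse(await fetchResult());

    if (parsed.status === "ok") return parsed.value;
    if (parsed.status === "error") throw new Error(`2Captcha error: ${parsed.code}`);

    // still pending: wait before asking again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for the CAPTCHA solution");
}
```

Here, `fetchResult` would wrap the GET request to res.php shown above and resolve with its `body`. The same parser also works for the in.php response, where the value after `OK|` is the CAPTCHA ID.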

Once we get the solution, we can enter it on the page to pass the challenge.

script.js
// get the CAPTCHA solution from the 2Captcha API response
const captchaSolution = body.split('|')[1];

// use the CAPTCHA solution in your Puppeteer script
await page.type('#captcha-input', captchaSolution);
await page.click('#submit-button');

This is what the full script will look like:

script.js
const puppeteer = require('puppeteer');
const request = require('request');
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
 
  // navigate to the page with the CAPTCHA
  await page.goto('https://2captcha.com/demo/normal');

  // take a screenshot of the page that shows the CAPTCHA
  const screenshot = await page.screenshot();

  // convert the screenshot to a base64 encoded string
  // (Buffer.from replaces the deprecated new Buffer constructor)
  const image = Buffer.from(screenshot).toString('base64');

  // send the image to the 2Captcha API
  request.post({
    url: 'https://2captcha.com/in.php',
    formData: {
      key: 'your_2captcha_api_key',
      method: 'base64',
      body: image
    }
  }, async (error, response, body) => {
    if (error) {
      console.error(error);
      // close the browser so the process can exit after a failed upload
      await browser.close();
    } else {
      // get the CAPTCHA ID from the 2Captcha API response
      const captchaId = body.split('|')[1];
 
      // request the CAPTCHA solution from the 2Captcha API
      request.get({
        url: `https://2captcha.com/res.php?key=your_2captcha_api_key&action=get&id=${captchaId}`
      }, async (error, response, body) => {
        if (error) {
          console.error(error);
        } else {
          // get the CAPTCHA solution from the 2Captcha API response
          const captchaSolution = body.split('|')[1];
 
          // use the CAPTCHA solution in your Puppeteer script
          await page.type('#captcha-input', captchaSolution);
          await page.click('#submit-button');
        }
        await browser.close();
      });
    }
  });
})();

Here are the results:

bypass-captcha

Keep in mind that using CAPTCHA solvers with Puppeteer works mostly for testing purposes rather than large-scale scraping, as they can quickly become too expensive and slow. Additionally, not every solver supports every CAPTCHA type, so advanced challenges such as reCAPTCHA or Geetest may still fail.
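To get a feel for how cost and waiting time grow with volume, you can run a quick back-of-the-envelope estimate. The figures used in the example call (price per 1,000 solves, seconds per solve, share of pages that show a CAPTCHA) are hypothetical placeholders, not real 2Captcha pricing, and `estimateSolverCost` is an illustrative helper.

```javascript
// Estimate total solves, cost, and sequential waiting time for a scraping job.
function estimateSolverCost({ pages, captchaRate, pricePerThousandUsd, secondsPerSolve }) {
  // number of pages expected to trigger a CAPTCHA
  const solves = Math.ceil(pages * captchaRate);

  return {
    solves,
    costUsd: (solves / 1000) * pricePerThousandUsd,
    // time spent waiting if CAPTCHAs are solved one after another
    hoursWaiting: (solves * secondsPerSolve) / 3600,
  };
}

// Hypothetical inputs: 100,000 pages, half of them showing a CAPTCHA,
// 2 USD per 1,000 solves, and 20 seconds per solve.
const estimate = estimateSolverCost({
  pages: 100000,
  captchaRate: 0.5,
  pricePerThousandUsd: 2,
  secondsPerSolve: 20,
});
console.log(estimate);
```

With these placeholder inputs, the job needs 50,000 solves, costs 100 USD, and spends roughly 278 hours waiting if the solves run sequentially. Plug in your provider's actual rates to get a realistic number.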

Conclusion

We've explored four ways to handle CAPTCHAs with Puppeteer: hiding automation indicators with the Stealth plugin, bypassing the challenge with ZenRows, automating reCAPTCHA with a solver plugin, and sending challenges to a paid solving service. Each technique can help, but implementing and maintaining them yourself can be complex, and solvers become slow and expensive at scale. ZenRows offers a complete solution that handles everything automatically. Try ZenRows for free!
