How to Bypass Cloudflare With Puppeteer: 2 Working Methods

Rubén del Campo
October 4, 2024 · 3 min read

Can you really bypass Cloudflare with Puppeteer? The short answer is yes, but using Puppeteer alone is insufficient. So, how can you supercharge your Puppeteer script to scrape Cloudflare-protected sites?  

In this article, we'll show you how Cloudflare works and two tested methods to bypass it with Puppeteer. We'll demonstrate these methods using CoinTracker, a site with simple protection, and the Cloudflare Challenge page, which uses an advanced security measure. Let's begin!

How Cloudflare Detects Bots

Cloudflare uses various techniques to guard against threats, unauthorized data access, and scrapers. These include botnet detection, IP address reputation checks, TLS fingerprinting, CAPTCHAs, Canvas fingerprinting, HTTP request header analysis, and event tracking.

When a Puppeteer web scraper visits a Cloudflare-protected website, it undergoes these security checks on an interstitial challenge page, referred to in this article as Cloudflare's waiting room. If the web scraper passes, it's granted access. Otherwise, it gets blocked.

To learn more about Cloudflare detection techniques, check out our guide on how to bypass Cloudflare.

Why Puppeteer Can't Bypass Cloudflare

Puppeteer exposes obvious bot signals, such as the navigator.webdriver property, the HeadlessChrome flag in the User-Agent, and missing browser fingerprints. These allow Cloudflare to identify it as an automated browser.

While some minor configurations, such as WebDriver patching, can mitigate those limitations, Puppeteer still leaves subtle traces in its browser fingerprint that make it easily detectable.
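To make these signals concrete, here is a minimal sketch of how a detection script might flag them. The `findBotSignals` helper and the sample User-Agent string are hypothetical and only illustrate the two signals named above. Cloudflare's actual checks are not public and go well beyond this.

```javascript
// Hypothetical helper: lists which of the bot signals named above are
// present in a snapshot of browser properties.
function findBotSignals(snapshot) {
    const signals = [];
    if (snapshot.webdriver === true) {
        signals.push('navigator.webdriver is true');
    }
    if (/HeadlessChrome/.test(snapshot.userAgent || '')) {
        signals.push('HeadlessChrome in User-Agent');
    }
    return signals;
}

// Illustrative snapshot of what a default headless session might report.
// In a Puppeteer script, it could be collected with:
//   const snapshot = await page.evaluate(() => ({
//       webdriver: navigator.webdriver,
//       userAgent: navigator.userAgent,
//   }));
const headlessSnapshot = {
    webdriver: true,
    userAgent:
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36',
};

console.log(findBotSignals(headlessSnapshot));
```

An empty result from a check like this does not mean the browser is undetectable. It only means these two specific signals are absent.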

Let's scrape this Cloudflare Challenge page with the script below to determine if Puppeteer can effectively bypass Cloudflare.

Before running the code, ensure you install Puppeteer if you've not done so already:

Terminal
npm install puppeteer

Now, try accessing the challenge page using the following Puppeteer script:

Example
// npm install puppeteer
const puppeteer = require('puppeteer');

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
        waitUntil: 'load',
    });

    // take page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

Here's what we got:

scrapingcourse cloudflare blocked screenshot

So, Cloudflare detected our script as a bot and kept us on the interstitial page. As mentioned earlier, Puppeteer can't bypass Cloudflare by itself.
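A screenshot has to be inspected by eye. As a rough sketch, the script could also check the page title to decide whether it is still stuck on the interstitial page. This assumes the challenge page's title contains "Just a moment", which is commonly the case but is not guaranteed and may change. `looksLikeChallenge` is a hypothetical helper, not part of Puppeteer.

```javascript
// Assumption: the Cloudflare interstitial's <title> contains "Just a moment".
const CHALLENGE_TITLE = /just a moment/i;

// Hypothetical helper: returns true if the title looks like the challenge page.
function looksLikeChallenge(title) {
    return CHALLENGE_TITLE.test(title || '');
}

console.log(looksLikeChallenge('Just a moment...'));
console.log(looksLikeChallenge('Some other page title'));
```

In a Puppeteer script, you could call it as `looksLikeChallenge(await page.title())` before taking the screenshot and log a warning when it returns true.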

How about we optimize Puppeteer with stealth evasions and free our scraper from the waiting room? Let's see the two ways to achieve this.

Method #1: Bypass Cloudflare With puppeteer-extra-plugin-stealth

puppeteer-extra-plugin-stealth is a plugin for puppeteer-extra that masks Puppeteer's automated browser properties, making it appear like an actual browser.

For example, the Stealth plugin overrides the WebDriver property and replaces the HeadlessChrome flag with Chrome to mask automation signals. It also mocks other legitimate browser properties, such as chrome.runtime, which makes it appear headful even in headless mode.
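The idea behind these overrides can be sketched in a few lines. This is a conceptual illustration only, not the plugin's actual implementation: `fakeNavigator` stands in for the browser's `navigator` object so the snippet runs in Node.js, and `applyEvasions` is a hypothetical name.

```javascript
// Stand-in for the browser's navigator object (assumption, for illustration).
const fakeNavigator = {
    webdriver: true,
    userAgent:
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36',
};

// Hypothetical sketch of two evasions: hide the webdriver flag and
// replace the HeadlessChrome token in the User-Agent with Chrome.
function applyEvasions(nav) {
    const patchedUserAgent = nav.userAgent.replace('HeadlessChrome', 'Chrome');
    Object.defineProperty(nav, 'webdriver', {
        get: () => undefined,
        configurable: true,
    });
    Object.defineProperty(nav, 'userAgent', {
        get: () => patchedUserAgent,
        configurable: true,
    });
    return nav;
}

applyEvasions(fakeNavigator);
console.log(fakeNavigator.webdriver);
console.log(fakeNavigator.userAgent);
```

In a real browser, overrides like these have to run before any page script does, which in Puppeteer is typically done through `page.evaluateOnNewDocument`. The plugin bundles many such evasions so you don't have to maintain them yourself.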

The Puppeteer Stealth plugin exposes the same API as base Puppeteer, so there's no learning curve for developers already using Puppeteer.

Let's bypass CoinTracker, a website with simple Cloudflare protection, to see how Puppeteer Stealth works.

First, install the plugin:

Terminal
npm install puppeteer-extra puppeteer-extra-plugin-stealth

Now, import the required libraries and add the Stealth plugin. Then, request the protected website and take a screenshot of its homepage:

Example
// npm install puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.cointracker.io/', {
        waitUntil: 'load',
    });

    // take page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

The Puppeteer Stealth plugin bypasses Cloudflare and screenshots the website's homepage, as shown:

CoinTracker Puppeteer Stealth screenshot

Awesome! The plugin worked, and you successfully avoided Cloudflare detection. 

If you got the same result on your target site, you're done. If not, the site is likely protected by Cloudflare's more advanced security.

The current target website was easy to access because it doesn't enforce any complex detection techniques. Can the Puppeteer Stealth plugin handle a more advanced security measure? That brings us to its limitations.

Limitations of puppeteer-extra-plugin-stealth

Some websites use more advanced Cloudflare security checks than others. In such cases, masking Puppeteer's automation properties using the Stealth plugin is insufficient to get through. 

For example, Puppeteer Stealth got blocked when attempting to access the Cloudflare Challenge page. 

Try it out yourself by replacing the previous target URL with the challenge page URL:

Example
// npm install puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
        waitUntil: 'networkidle0',
    });

    // wait for the challenge to resolve
    await new Promise(function (resolve) {
        setTimeout(resolve, 10000);
    });

    // take page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

The Stealth plugin got blocked, as shown:

scrapingcourse cloudflare blocked screenshot

The result indicates that Cloudflare's more advanced anti-bot checks still identified the Stealth-patched browser as a bot. The Stealth plugin leaves some detectable traits, such as inconsistent WebGL or Canvas rendering, that give it away.

How can you solve these limitations and extract data from complicated websites? The answer is ZenRows.

Method #2: Bypass Cloudflare With ZenRows and Puppeteer

The easiest way to avoid the limitations of Puppeteer and its Stealth plugin is to integrate the library with the ZenRows Scraping Browser. With the ZenRows Scraping Browser, your Puppeteer scraper gets fortified with advanced evasions to appear as a human and bypass anti-bot detection. 

All you have to do is add a single line of code to your existing Puppeteer script, and the Scraping Browser will help you handle core browser fingerprinting, add missing plugins and extensions, manage residential proxy rotation, and more. 

The Scraping Browser also runs in the cloud, preventing the memory overhead of running local browser instances. This feature makes it highly scalable.

Let's quickly see how to integrate the Scraping Browser with Puppeteer. 

ZenRows requires puppeteer-core, a Puppeteer version that doesn't download the Chrome binary. So, ensure you install it:

Terminal
npm install puppeteer-core

Sign up to load the Request Builder. Then, go to the Scraping Browser Builder and copy the browser URL.

ZenRows scraping browser
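The browser URL embeds your API key, so you may prefer not to hardcode it in the script. As a minimal sketch, the URL could be assembled from a key you pass in at runtime. `buildConnectionURL` is a hypothetical helper, and the sample key below is a placeholder, not a real credential. The host and the `apikey` parameter are taken from the connection URL used in this article.

```javascript
// Hypothetical helper: builds the Scraping Browser connection URL from an API key.
function buildConnectionURL(apiKey) {
    if (!apiKey) {
        throw new Error('Missing ZenRows API key');
    }
    // encodeURIComponent guards against characters that are unsafe in a query string
    return `wss://browser.zenrows.com?apikey=${encodeURIComponent(apiKey)}`;
}

// Placeholder key for illustration only.
console.log(buildConnectionURL('demo-key-123'));
```

In a real script, you could pass in a value read from an environment variable, for example `process.env.ZENROWS_API_KEY` (the variable name is an assumption), and use the result as the `browserWSEndpoint`.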

Integrate the copied browser URL into your Puppeteer script like so:

Example
// npm install puppeteer-core
const puppeteer = require('puppeteer-core');
// define your connection URL
const connectionURL = 'wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>';

(async () => {
    // set up browser environment
    const browser = await puppeteer.connect({
        browserWSEndpoint: connectionURL,
    });

    // create a new page
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
        waitUntil: 'networkidle0',
    });

    // wait for the challenge to resolve
    await new Promise(function (resolve) {
        setTimeout(resolve, 10000);
    });

    // take page screenshot
    await page.screenshot({ path: 'screenshot.png' });
    // close the browser instance
    await browser.close();
})();

The above code accesses and screenshots the protected page. See the result below:

cloudflare-challenge-passed

Congratulations 🎉! You've successfully bypassed Cloudflare using Puppeteer and ZenRows.
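The fixed 10-second pause used in the scripts above is simple, but it can be longer than necessary on fast loads and too short on slow ones. As a sketch of an alternative, the script could poll a condition until it passes or a timeout expires. `pollUntil` is a hypothetical helper written for this example, and the counter-based check only simulates a challenge that resolves on the third attempt.

```javascript
// Hypothetical helper: repeatedly runs `check` until it returns a truthy
// value or `timeoutMs` elapses. Resolves to true on success, false on timeout.
async function pollUntil(check, { timeoutMs = 30000, intervalMs = 500 } = {}) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (await check()) {
            return true;
        }
        // pause before the next attempt
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    return false;
}

// Simulated condition: becomes true on the third call.
let attempts = 0;
pollUntil(() => ++attempts >= 3, { timeoutMs: 1000, intervalMs: 10 }).then((resolved) => {
    console.log(`resolved: ${resolved} after ${attempts} attempts`);
});
```

In a Puppeteer script, `check` could be an async function that inspects the page, for example one that reads `await page.title()` and returns true once the challenge title is gone. Puppeteer also ships built-in waits such as `page.waitForSelector` and `page.waitForFunction`, which may be a better fit when you know an element that only appears on the target page.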

Conclusion

In this article, you've learned how Cloudflare works and two ways to bypass Cloudflare while scraping with Puppeteer. While an open-source solution, such as Puppeteer Stealth, fixes some bot-like signals, it's insufficient against advanced anti-bot measures.

We recommend integrating ZenRows with your Puppeteer scraper to bypass anti-bot measures at scale while retaining all the automation features of Puppeteer.

Try ZenRows for free now without a credit card.
