Can you really bypass Cloudflare with Puppeteer? The short answer is yes, but using Puppeteer alone is insufficient. So, how can you supercharge your Puppeteer script to scrape Cloudflare-protected sites?
In this article, we'll show you how Cloudflare works and two tested methods to bypass it with Puppeteer. We'll demonstrate these methods using CoinTracker, a site with simple protection, and the Cloudflare Challenge page, which uses an advanced security measure. Let's begin!
How Cloudflare Detects Bots
Cloudflare uses various techniques to guard against threats, unauthorized data access, and scrapers. These include botnet detection, IP address reputation checks, TLS fingerprinting, CAPTCHAs, Canvas fingerprinting, HTTP request header analysis, and event tracking.
For instance, when a Puppeteer web scraper visits a Cloudflare-protected website, it undergoes the above security checks on an interstitial page called Cloudflare's waiting room. If the web scraper passes, it's granted access. Otherwise, it gets blocked.
To learn more about Cloudflare detection techniques, check out our guide on how to bypass Cloudflare.
Why Puppeteer Can't Bypass Cloudflare
Puppeteer's obvious bot signals, like the presence of the navigator.webdriver property, the HeadlessChrome User Agent flag, and other missing browser fingerprints, allow Cloudflare to identify it as an automated browser.
While some minor configurations, such as WebDriver patching, can mitigate those limitations, Puppeteer still leaves subtle traces in its browser fingerprint that make it easily detectable.
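To make these signals concrete, here is a minimal sketch of the kind of check a detection script could run against a browser. It is an illustration only and not Cloudflare's actual logic. The names findBotSignals and collectFingerprint are hypothetical helpers introduced for this example, and collectFingerprint assumes it receives a Puppeteer page object.

```javascript
// Illustrative only: a simplified view of two automation signals.
// This is NOT Cloudflare's real detection logic.
function findBotSignals(fingerprint) {
  const signals = [];
  // navigator.webdriver is true in browsers driven by automation tools
  if (fingerprint.webdriver === true) {
    signals.push('navigator.webdriver is true');
  }
  // headless Chrome typically advertises itself in its default User Agent
  if (/HeadlessChrome/.test(fingerprint.userAgent || '')) {
    signals.push('User Agent contains HeadlessChrome');
  }
  return signals;
}

// Hypothetical helper: read the same two values from a Puppeteer page.
async function collectFingerprint(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
  }));
}
```

Inside a Puppeteer script, logging findBotSignals(await collectFingerprint(page)) shows which of these two signals the current browser exposes. Real anti-bot systems combine many more checks than this.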
Let's scrape this Cloudflare Challenge page with the script below to determine if Puppeteer can effectively bypass Cloudflare.
Before running the code, ensure you install Puppeteer if you've not done so already:
npm install puppeteer
Now, try accessing the challenge page using the following Puppeteer script:
// npm install puppeteer
const puppeteer = require('puppeteer');

(async () => {
  // set up browser environment
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // navigate to a URL
  await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
    waitUntil: 'load',
  });

  // take page screenshot
  await page.screenshot({ path: 'screenshot.png' });

  // close the browser instance
  await browser.close();
})();
Here's what we got:
So, Cloudflare detected our script as a bot and locked us in the interstitial page. As mentioned earlier, Puppeteer can't bypass Cloudflare by itself.
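A screenshot is one way to confirm a block. You can also check for it in code. The sketch below assumes the interstitial page carries the title "Just a moment...", which is common for Cloudflare challenge pages but may differ for your target. The helpers isChallengeTitle and isStuckOnChallenge are hypothetical names for this example.

```javascript
// Assumption: Cloudflare's interstitial page usually carries the title
// "Just a moment..."; adjust the marker if your target differs.
const CHALLENGE_TITLE_MARKER = 'just a moment';

// Returns true when a page title looks like the Cloudflare interstitial.
function isChallengeTitle(title) {
  return String(title || '').toLowerCase().includes(CHALLENGE_TITLE_MARKER);
}

// Hypothetical helper: check a Puppeteer page after navigation.
async function isStuckOnChallenge(page) {
  return isChallengeTitle(await page.title());
}
```

Calling isStuckOnChallenge(page) after page.goto() lets a scraper decide whether to retry, wait longer, or give up instead of parsing the challenge page by mistake.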
How about we optimize Puppeteer with stealth evasions and free our scraper from the waiting room? Let's see the two ways to achieve this.
Method #1: Bypass Cloudflare With puppeteer-extra-plugin-stealth
The puppeteer-extra-plugin-stealth is a patch that masks Puppeteer's automated browser properties, making it appear like an actual browser.
For example, the Stealth plugin overrides the WebDriver property and replaces the HeadlessChrome flag with Chrome to mask automation signals. It also mocks other legitimate browser properties, such as chrome.runtime, which makes it appear headful even in headless mode.
The Puppeteer Stealth plugin exposes the same API as base Puppeteer, so there's no learning curve for developers already using Puppeteer.
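To give a feel for what such patches do, here is a simplified sketch of two evasions written by hand with plain Puppeteer calls. It is not the plugin's source code, and the real plugin applies many more evasions. The helper names stripHeadlessFlag, hideWebdriver, and applyBasicEvasions are hypothetical and exist only for this example.

```javascript
// Simplified sketch of two evasions; the real plugin applies many more.

// Remove the headless marker from a User Agent string.
function stripHeadlessFlag(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// Meant to run in the page context: make navigator.webdriver report undefined.
function hideWebdriver(nav) {
  Object.defineProperty(nav, 'webdriver', {
    get: () => undefined,
    configurable: true,
  });
  return nav;
}

// Hypothetical helper: apply both patches to a Puppeteer page
// before navigating to the target URL.
async function applyBasicEvasions(browser, page) {
  await page.setUserAgent(stripHeadlessFlag(await browser.userAgent()));
  // the string form is evaluated in each new document before its own scripts
  await page.evaluateOnNewDocument(`(${hideWebdriver.toString()})(navigator)`);
}
```

Hand-rolled patches like these are easy to get subtly wrong and tend to lag behind detection updates, which is why the plugin is the more practical choice.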
Let's bypass CoinTracker, a website with simple Cloudflare protection, to see how Puppeteer Stealth works.
First, install the plugin:
npm install puppeteer-extra puppeteer-extra-plugin-stealth
Now, import the required libraries and add the Stealth plugin. Then, request the protected website and take a screenshot of its homepage:
// npm install puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

(async () => {
  // set up browser environment
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // navigate to a URL
  await page.goto('https://www.cointracker.io/', {
    waitUntil: 'load',
  });

  // take page screenshot
  await page.screenshot({ path: 'screenshot.png' });

  // close the browser instance
  await browser.close();
})();
The Puppeteer Stealth plugin bypasses Cloudflare and screenshots the website's homepage, as shown:
Awesome! The plugin worked, and you successfully avoided Cloudflare detection.
If your target site behaves the same way, you can celebrate. Otherwise, you're up against Cloudflare's more advanced security.
The current target website was easy to access because it doesn't enforce any complex detection techniques. Can the Puppeteer Stealth plugin handle a more advanced security measure? That brings us to its limitations.
Limitations of Puppeteer-extra-plugin-stealth
Some websites use more advanced Cloudflare security checks than others. In such cases, masking Puppeteer's automation properties using the Stealth plugin is insufficient to get through.
For example, Puppeteer Stealth got blocked when attempting to access the Cloudflare Challenge page.
Try it out yourself by replacing the previous target URL with the challenge page URL:
// npm install puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

(async () => {
  // set up browser environment
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // navigate to a URL
  await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
    waitUntil: 'networkidle0',
  });

  // wait 10 seconds for the challenge to resolve
  await new Promise((resolve) => setTimeout(resolve, 10000));

  // take page screenshot
  await page.screenshot({ path: 'screenshot.png' });

  // close the browser instance
  await browser.close();
})();
The Stealth plugin got blocked, as shown:
The results indicate that a more advanced Cloudflare anti-bot system detected the Stealth plugin as a bot. The Stealth plugin still has some detectable traits, such as inconsistent WebGL or Canvas rendering, giving it away as a bot.
How can you solve these limitations and extract data from complicated websites? The answer is ZenRows.
Method #2: Bypass Cloudflare With ZenRows and Puppeteer
The easiest way to avoid the limitations of Puppeteer and its Stealth plugin is to integrate the library with the ZenRows Scraping Browser. With the ZenRows Scraping Browser, your Puppeteer scraper gets fortified with advanced evasions to appear as a human and bypass anti-bot detection.
All you have to do is add a single line of code to your existing Puppeteer script, and the Scraping Browser will help you handle core browser fingerprinting, add missing plugins and extensions, manage residential proxy rotation, and more.
The Scraping Browser also runs in the cloud, preventing the memory overhead of running local browser instances. This feature makes it highly scalable.
Let's quickly see how to integrate the Scraping Browser with Puppeteer.
ZenRows requires puppeteer-core, a Puppeteer version that doesn't download the Chrome binary. So, ensure you install it:
npm install puppeteer-core
Sign up to load the Request Builder. Then, go to the Scraping Browser Builder and copy the browser URL.
Integrate the copied browser URL into your Puppeteer script like so:
// npm install puppeteer-core
const puppeteer = require('puppeteer-core');

// define your connection URL
const connectionURL = 'wss://browser.zenrows.com?apikey=<YOUR_ZENROWS_API_KEY>';

(async () => {
  // set up browser environment
  const browser = await puppeteer.connect({
    browserWSEndpoint: connectionURL,
  });

  // create a new page
  const page = await browser.newPage();

  // navigate to a URL
  await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
    waitUntil: 'networkidle0',
  });

  // wait 10 seconds for the challenge to resolve
  await new Promise((resolve) => setTimeout(resolve, 10000));

  // take page screenshot
  await page.screenshot({ path: 'screenshot.png' });

  // close the browser instance
  await browser.close();
})();
The above code accesses and screenshots the protected page. See the result below:
Congratulations 🎉! You've successfully bypassed Cloudflare using Puppeteer and ZenRows.
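In a real project, you may prefer not to hardcode the API key in the connection URL. The sketch below builds the same URL from an environment variable. The variable name ZENROWS_API_KEY and the helpers buildConnectionURL and connectToScrapingBrowser are assumptions made for this example, not names defined by ZenRows.

```javascript
// Assumption: the API key is provided via a ZENROWS_API_KEY environment
// variable (a name chosen for this example).
function buildConnectionURL(apiKey) {
  if (!apiKey) {
    throw new Error('Missing ZenRows API key');
  }
  // same endpoint format as the connection URL used above
  return `wss://browser.zenrows.com?apikey=${encodeURIComponent(apiKey)}`;
}

// Hypothetical helper: connect using the puppeteer-core module passed in.
async function connectToScrapingBrowser(puppeteer, apiKey = process.env.ZENROWS_API_KEY) {
  return puppeteer.connect({ browserWSEndpoint: buildConnectionURL(apiKey) });
}
```

With this in place, const browser = await connectToScrapingBrowser(require('puppeteer-core')) replaces the puppeteer.connect() call, and the key stays out of version control.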
If you're still getting blocked by advanced anti-bot measures, turn to the ZenRows Scraper API, an all-in-one solution boasting a 98.7% average success rate of bypassing Cloudflare and any other anti-bot at scale.
Conclusion
In this article, you've learned how Cloudflare works and two ways to bypass Cloudflare while scraping with Puppeteer. While an open-source solution, such as Puppeteer Stealth, fixes some bot-like signals, it's insufficient against advanced anti-bot measures.
We recommend integrating ZenRows with your Puppeteer scraper to bypass anti-bot measures at scale while retaining all the automation features of Puppeteer.
Try ZenRows for free now without a credit card.