How to Use Puppeteer Stealth: A Plugin for Scraping

Idowu Omisola
Updated: December 18, 2024 · 4 min read

Puppeteer is a fantastic headless browser library, yet it can easily be detected and blocked by anti-scraping measures. This is where Puppeteer Extra, with the help of the Puppeteer Stealth plugin, plays a key role.

This tutorial introduces Puppeteer Stealth and how to scrape web pages with it. Let's dive in!

What Is Puppeteer Extra?

Puppeteer Extra is an open-source library built to extend the functionality of the popular Puppeteer headless browser.

Here's a list of some of the main plugins you can use with Puppeteer Extra and what they do:

  • The Stealth plugin hides Puppeteer's automation properties by masking the subtle differences between headless and regular Chrome browsers.
  • The AdBlocker plugin blocks ads and trackers.
  • The User Data Dir plugin maintains consistent browser data and settings between sessions.
  • The reCAPTCHA plugin solves reCAPTCHAs automatically.
  • The Block Resources plugin intercepts and blocks unwanted resources, such as images, fonts, and CSS.
  • The DevTools plugin creates a secure tunnel to the Chrome DevTools APIs to allow debugging and custom profiling from anywhere.
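As a quick illustration, plugins register through Puppeteer Extra's chainable use() method. The package names below are the real npm packages for the Stealth and AdBlocker plugins; the option shown is a minimal sketch rather than a full configuration:

```javascript
// Sketch: registering multiple Puppeteer Extra plugins via the chainable use() method.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');

puppeteer
    .use(StealthPlugin())
    .use(AdblockerPlugin({ blockTrackers: true })); // also block trackers, not just ads
```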

We'll focus on how to avoid detection with Puppeteer.

What Is Puppeteer Stealth?

Puppeteer Stealth, also known as puppeteer-extra-plugin-stealth, is an extension built on top of Puppeteer Extra that uses different techniques to hide properties that would otherwise flag your request as a bot. That makes it harder for websites to detect your scraper.

To further enhance stealth, developers often pair the Puppeteer Stealth plugin with additional anti-detection measures.

Let's see it in action.

What Does Puppeteer Stealth Do?

While web scraping with a headless browser gives you browser-like access, websites also get code execution access. That means they can leverage various browser fingerprinting scripts to gather data to identify your automated browser.

The Puppeteer Stealth plugin is crucial here. Its goal is to mask default headless giveaways, such as navigator.webdriver: true and headless-specific request headers, so your scraper flies below the radar.

That's possible, thanks to the extension modules.
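To make that concrete, here's a simplified, hypothetical sketch of the kind of check a fingerprinting script might run. The mock objects stand in for real browser values, and the signals shown (navigator.webdriver, an empty plugin list, a HeadlessChrome user agent) are among those the Stealth plugin patches:

```javascript
// Hypothetical fingerprinting check; "nav" mocks the browser's navigator object.
function looksAutomated(nav) {
    // headless Chrome sets navigator.webdriver to true by default
    if (nav.webdriver === true) return true;
    // a human-operated Chrome normally reports at least one plugin
    if (!nav.plugins || nav.plugins.length === 0) return true;
    // older headless builds identify themselves in the user agent
    if (/HeadlessChrome/.test(nav.userAgent)) return true;
    return false;
}

// default headless values vs. the kind of values Stealth patches in
const headlessDefaults = {
    webdriver: true,
    plugins: [],
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0.0.0',
};
const stealthPatched = {
    webdriver: false,
    plugins: [{ name: 'Chrome PDF Viewer' }],
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0',
};

console.log(looksAutomated(headlessDefaults)); // true
console.log(looksAutomated(stealthPatched)); // false
```

Real fingerprinting scripts check far more signals than this, but the principle is the same: compare observed browser properties against what a human-driven Chrome would report.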

Built-in Evasion Modules

Built-in evasion modules are pre-packaged plugins that drive the Puppeteer Stealth functionality. As stated earlier, base Puppeteer has leaks or properties that flag it as a bot, which the Stealth plugin aims to fix.

Each Puppeteer Stealth evasion module is designed to plug a particular leak. Take a look below:

  • iframe.contentWindow fixes the HEADCHR_IFRAME detection by modifying window.top and window.frameElement.
  • media.codecs modifies codec support to match what real Chrome reports.
  • navigator.hardwareConcurrency sets the number of logical processors to four.
  • navigator.languages modifies the languages property to allow custom languages.
  • navigator.plugins emulates navigator.mimeTypes and navigator.plugins with functional mocks to match a standard, human-operated Chrome.
  • navigator.permissions masks the permissions property to pass permissions tests.
  • navigator.vendor makes it possible to customize the navigator.vendor property.
  • navigator.webdriver masks navigator.webdriver.
  • sourceurl hides the automation-revealing sourceURL attribute of Puppeteer scripts.
  • user-agent-override modifies the user agent components.
  • webgl.vendor changes the WebGL Vendor/Renderer properties from the "Google" defaults used by headless Puppeteer.
  • window.outerdimensions adds the missing window.outerWidth and window.outerHeight properties.
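All evasion modules are enabled by default, but if you only need a subset, the plugin instance exposes an enabledEvasions set that you can prune before registering it. This is a minimal sketch, assuming the puppeteer-extra packages are installed:

```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

const stealth = StealthPlugin();
// drop a single evasion module by name; all others stay active
stealth.enabledEvasions.delete('user-agent-override');

puppeteer.use(stealth);
```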

How to Web Scrape With Puppeteer Stealth

Before we dive into Puppeteer in stealth mode, it's essential to explore web scraping with the base headless browser. For this test, we'll use the anti-bot challenge page to evaluate the plugin's capability.

Let's begin!

  1. With Node.js installed, add Puppeteer to your project using the following command:
Terminal
npm install puppeteer
  2. Import Puppeteer and open an async function where you'll write your code.
scraper.js
const puppeteer = require('puppeteer'); 
 
(async () => {
	//…
})();
  3. Launch a browser, create a new page, and navigate to your target URL.
scraper.js
(async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();

	//navigate to target URL
	await page.goto('https://www.scrapingcourse.com/antibot-challenge');
	
})();
  4. Set the screen size, wait for the page to load, take a screenshot, and close your browser.
scraper.js
(async () => {
	//…
 
	// Set screen size
	await page.setViewport({width: 1280, height: 720});
 
	// wait for the page to load (waitForTimeout was removed in newer Puppeteer versions)
	await new Promise((resolve) => setTimeout(resolve, 30000));
 
	// Take screenshot 
	await page.screenshot({ path: 'screenshot.png', fullPage: true }); 
 
	// Closes the browser and all of its pages 
	await browser.close(); 
})();

Putting all of it together, here's your complete code:

scraper.js
const puppeteer = require('puppeteer'); 
 
(async () => {
	const browser = await puppeteer.launch();
	const page = await browser.newPage();

	//navigate to target URL
	await page.goto('https://www.scrapingcourse.com/antibot-challenge');
 
	// Set screen size
	await page.setViewport({width: 1280, height: 720});
 
	// wait for the page to load (waitForTimeout was removed in newer Puppeteer versions)
	await new Promise((resolve) => setTimeout(resolve, 30000));
 
	// Take screenshot 
	await page.screenshot({ path: 'screenshot.png', fullPage: true }); 
 
	// Closes the browser and all of its pages 
	await browser.close(); 
})();
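A note on the 30-second pause: page.waitForTimeout() was removed in recent Puppeteer releases, so a small promise-based helper is a version-safe way to pause a script:

```javascript
// waitForTimeout() was removed in newer Puppeteer versions;
// this promise-based helper is a drop-in replacement for fixed pauses
function delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// usage inside the async function:
// await delay(30000); // pause for 30 seconds while the page loads
```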

And this is the screenshot of the web page:

scrapingcourse cloudflare blocked screenshot

The result above shows that our Puppeteer script got blocked since we couldn't bypass anti-bot detection.

Now, let's try scraping the same website using Puppeteer Stealth.

Here are the steps you must take:

Step 1: Install Puppeteer-Stealth

As mentioned earlier, we need the Puppeteer Extra library to use Puppeteer Stealth. Install both libraries using the following command.

Terminal
npm install puppeteer-extra puppeteer-extra-plugin-stealth

Step 2: Configure Puppeteer-Stealth

To configure Puppeteer Stealth, start by importing Puppeteer Extra. Then add the Stealth plugin and use it in default mode, which ensures your script uses all evasion modules:

scraper.js
// npm install puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

Open an async function and launch Puppeteer Stealth in headless mode:

scraper.js
// ...
(async () => {
    // set up browser environment
    const browser = await puppeteer.launch({ headless: 'new' });
})();

Step 3: Take a Screenshot

Like in our base Puppeteer script, create a new page and navigate to the target website.

scraper.js
// ...
(async () => {
    // ...
    const page = await browser.newPage();

    // navigate to the target URL
    await page.goto('https://www.scrapingcourse.com/antibot-challenge', {
        waitUntil: 'networkidle0',
    });
})();

Lastly, wait for the page to load and take a screenshot.

scraper.js
// ...
(async () => {
    // ...

    // wait for the challenge to resolve
    await new Promise(function (resolve) {
        setTimeout(resolve, 10000);
    });

    // take page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

Combining the snippets gives the following complete code:

scraper.js
// npm install puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the stealth plugin
puppeteer.use(StealthPlugin());

(async () => {
    // set up browser environment
    const browser = await puppeteer.launch({ headless: 'new' });
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/antibot-challenge', {
        waitUntil: 'networkidle0',
    });

    // wait for the challenge to resolve
    await new Promise(function (resolve) {
        setTimeout(resolve, 10000);
    });

    // take page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

Unfortunately, Puppeteer Stealth also got blocked, as shown below, which means its evasions alone no longer work against advanced anti-bots:

scrapingcourse cloudflare blocked screenshot

That's not the end! In the next section, we'll show the reason for the above result and the best alternative for Puppeteer Stealth.

Limitations of puppeteer-extra-plugin-stealth and a Solution

While Puppeteer Stealth can help you avoid detection, it has several limitations that prevent it from bypassing anti-bot measures like Cloudflare:

  • Open-Source Nature: Since the library is open-source and not actively maintained, anti-bot developers can easily block its evasion mechanisms.
  • Leaks Bot-Like Fingerprints: Despite featuring several evasion patches, Puppeteer Stealth still leaks bot-like fingerprints, such as missing plugins.
  • Performance Challenges: Due to the memory-demanding browser instance, the library can become extremely slow, making scaling to larger projects challenging.

As you've seen earlier, your script will easily get detected and blocked if you use Puppeteer Stealth to try to bypass an anti-bot like Cloudflare. 

The best way to keep up with the evolving anti-bot landscape and scrape without getting blocked is to use a web scraping API like the ZenRows Scraper API. 

ZenRows provides all the toolkits required for bypassing CAPTCHAs and scraping smoothly at scale. It features premium proxy rotation, request header management, advanced evasion mechanisms, JavaScript rendering, anti-bot auto-bypass, and more.

Let's see ZenRows in action by scraping the anti-bot challenge page that blocked Puppeteer Stealth previously.

Sign up to open the ZenRows Request Builder. Paste the target URL in the link box and activate Premium Proxies and JS Rendering.

Select Node.js as your programming language and choose the API connection mode. Copy and paste the generated code into your Node.js script.

building a scraper with zenrows

The generated code looks like this:

scraper.js
// npm install axios
const axios = require('axios');

const url = 'https://www.scrapingcourse.com/antibot-challenge';
const apikey = '<YOUR_ZENROWS_API_KEY>';
axios({
    url: 'https://api.zenrows.com/v1/',
    method: 'GET',
    params: {
        url: url,
        apikey: apikey,
        js_render: 'true',
        premium_proxy: 'true',
    },
})
    .then((response) => console.log(response.data))
    .catch((error) => console.log(error));

Here's the result:

Output
<html lang="en">
<head>
    <!-- ... -->
    <title>Antibot Challenge - ScrapingCourse.com</title>
    <!-- ... -->
</head>
<body>
    <!-- ... -->
    <h2>
        You bypassed the Antibot challenge! :D
    </h2>
    <!-- other content omitted for brevity -->
</body>
</html>
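If you'd rather avoid the axios dependency, the same request can be issued with Node 18+'s built-in fetch. The API key below is the same placeholder used in the generated snippet:

```javascript
// Build the ZenRows request URL with Node's standard URLSearchParams,
// then issue it with the built-in fetch (Node 18+). The API key is a placeholder.
const params = new URLSearchParams({
    url: 'https://www.scrapingcourse.com/antibot-challenge',
    apikey: '<YOUR_ZENROWS_API_KEY>',
    js_render: 'true',
    premium_proxy: 'true',
});
const endpoint = `https://api.zenrows.com/v1/?${params}`;

async function scrape() {
    const response = await fetch(endpoint);
    return response.text();
}

// scrape().then((html) => console.log(html));
```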

How does it feel knowing you can scrape just about any website? Awesome, right?

Conclusion

Puppeteer is a popular web scraping and automation tool. But its default properties make it easy for websites to detect and block your bot. Fortunately, Puppeteer Stealth lets you leverage its evasion modules to stay below the radar.

Yet, Puppeteer Stealth can't keep up with the frequently evolving anti-bot measures. Thus, it doesn't work against advanced obstacles. For these cases, consider solutions like ZenRows and use its free trial for your next project.

Frequently Asked Questions

Does puppeteer-extra-plugin-stealth Still Work?

The puppeteer-extra-plugin-stealth is no longer effective against modern anti-bots. The plugin hasn't been meaningfully updated since 2022, and its open-source nature makes it easy for anti-bot vendors to study and block its evasion techniques. The most reliable alternative for scraping the web without getting blocked is ZenRows, which provides all the tools required to scrape without limitations.
