How to Patch Puppeteer Stealth to Improve Its Anti-bot Bypass Power

July 15, 2024 · 8 min read

Are you looking for a better way to bypass anti-bot protection while scraping with Puppeteer in JavaScript? Good news: the Puppeteer Stealth plugin lets you create Chrome patches that boost your scraper's evasion power.

In this tutorial, you'll explore how the Puppeteer Stealth evasions work and learn how to improve the tool to reduce blocks and bans. Let's go!

What Is Puppeteer Stealth?

Puppeteer Stealth is a plugin for Puppeteer Extra, a Node.js library that extends Puppeteer with extra plugin functionalities. The Stealth plugin patches the base Puppeteer library to mimic a real browser, improving its ability to evade anti-bot detection during web scraping.

Stealth is quite popular, with an average of 450k weekly downloads and 36.1k GitHub repositories depending on it.

Puppeteer Stealth Weekly Stats

While the Stealth plugin already simplifies anti-bot bypass with built-in evasion techniques, you can further improve them with custom ones.

Why Use the Puppeteer Stealth Plugin?

The vanilla Puppeteer library is a useful web scraping tool. However, it leaks obvious bot-like signals that expose you to being flagged while scraping. Fortifying it with the Stealth plugin patches those loopholes and reduces the likelihood of anti-bot detection.

We compared the fingerprints of vanilla Puppeteer and the Stealth plugin on CreepJS to understand their evasion capabilities better. Let's discuss the result, starting with vanilla Puppeteer.

The result showed that base Puppeteer is 33% headless. With this information alone, anti-bots will likely detect you as an automated browser, increasing your chances of getting blocked.

Vanilla Puppeteer Headless Detected

The test also revealed that the Puppeteer browser instance includes the WebDriver property (webDriverIsOn: true). That's a red flag because a legitimate browser doesn't include an automated WebDriver.

Besides, many anti-bot systems, such as Cloudflare and Akamai, effectively fingerprint the browser's navigator property for the presence of a WebDriver. Once discovered, it becomes easy for a protected website to flag your request as a potential bot.

See the result below:

Output
webDriverIsOn: true
hasHeadlessUA: false
hasHeadlessWorkerUA: false

Puppeteer Stealth, on the other hand, shows a higher chance of evading detection with a headless score of 0%. This means that the Stealth plugin is more likely to pass an anti-bot fingerprinting test that checks the WebDriver parameter, making a block less likely.

Puppeteer Stealth Headless Values

An inspection of the WebDriver property also reveals that Puppeteer Stealth patches Puppeteer's WebDriver behavior (webDriverIsOn: false). This modification reduces your likelihood of bot detection:

Output
webDriverIsOn: false
hasHeadlessUA: false
hasHeadlessWorkerUA: false

The above tests only scratched the surface of the Puppeteer Stealth plugin's evasion strategies. The tool has many more, e.g., the Chrome runtime, outer dimensions, User Agent override, or media codecs.

Still, the Stealth plugin only works for websites with basic protection. Let's learn how this plugin works behind the scenes so you can improve it with custom evasions.


Puppeteer Stealth Evasions You Should Know

To understand how Puppeteer Stealth works, you must learn the workings of a few critical evasions. Let's take a look at them.

Chrome.runtime

Running Puppeteer in headless mode adds more bot-like properties, such as the HeadlessChrome flag in the User Agent request header. The plugin's chrome.runtime evasion ensures your scraper mocks the real Chrome runtime object. 

So, even if you're scraping in headless mode, this patch makes it look like you're running a regular Chrome browser with a GUI.

Outer Dimensions

The window outer dimensions are your browser's outerHeight and outerWidth properties. They return the browser window's total height and width, including the title bar, address bar, and scroll bars.

However, these properties are typically missing or set to default bot-like values in Puppeteer's headless mode.

Puppeteer Stealth uses an evasion called window.outerdimensions to set these properties while scraping in headless mode. This patch also ensures that the headless viewport matches the window size.
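To see why these values matter, here's a minimal sketch of a check a detection script could run. The function name looksHeadlessByDimensions and the rule it applies are assumptions made for this example, not the logic of any specific anti-bot product.

```javascript
// Hypothetical detector-side check (illustrative only).
// `win` is any object exposing the same four fields as the browser's window.
function looksHeadlessByDimensions(win) {
    const { outerWidth, outerHeight, innerWidth, innerHeight } = win;

    // missing or zero outer dimensions are a common headless giveaway
    if (!outerWidth || !outerHeight) return true;

    // at the default zoom level, the outer window is normally
    // at least as large as the viewport it contains
    return outerWidth < innerWidth || outerHeight < innerHeight;
}

// a headless-like window with no outer dimensions
console.log(looksHeadlessByDimensions({ outerWidth: 0, outerHeight: 0, innerWidth: 800, innerHeight: 600 })); // true

// a desktop-like window whose outer size includes the browser UI
console.log(looksHeadlessByDimensions({ outerWidth: 1280, outerHeight: 800, innerWidth: 1280, innerHeight: 680 })); // false
```

To try it against a real browser, paste the function into the console and call it with window as the argument.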

User Agent Override

The User Agent Override evasion patches Puppeteer's default User Agent information, including the platform data. This modification works with other evasions, such as chrome.runtime and chrome.app. While in headless mode, this technique ensures that the User Agent browser string changes to Chrome instead of the default HeadlessChrome.
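The core idea can be sketched in a few lines. This is a simplified illustration, not the plugin's actual source, and normalizeUserAgent is a hypothetical helper name.

```javascript
// Illustrative helper: rewrite a headless User Agent so it reads like regular Chrome.
function normalizeUserAgent(userAgent) {
    return userAgent.replace("HeadlessChrome/", "Chrome/");
}

const headlessUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36";

console.log(normalizeUserAgent(headlessUA));
// Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
```

In a Puppeteer script, you could pass the result to page.setUserAgent() so the request header and the navigator value stay in sync.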

Navigator WebDriver

The navigator.webdriver evasion modifies Puppeteer's WebDriver property. Some websites use the presence of a WebDriver as a pointer to bot-like activities. This evasion reduces the likelihood of getting blocked by changing the WebDriver property from true to false in the Puppeteer browser instance.

Media Codecs

Some media codecs are only available in the Chrome browser, not the open-source Chromium. For instance, in Chromium, the canPlayType method might return maybe instead of Chrome's standard probably.

The media codec evasion modifies how Chromium reports codec support in Puppeteer, allowing you to mimic native Google Chrome. It enhances your likelihood of passing the codec support test during an anti-bot browser fingerprinting.
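Here's a rough sketch of how such a patch could work: wrap the original method and force a Chrome-like answer for selected codecs. The wrapper name, the codec string, and the stand-in function are assumptions for illustration, not the plugin's real code.

```javascript
// Illustrative wrapper: answer "probably" for selected codecs,
// and defer to the original method for everything else.
function wrapCanPlayType(originalCanPlayType, forcedTypes) {
    return function (type) {
        if (forcedTypes.includes(type)) return "probably";
        return originalCanPlayType.call(this, type);
    };
}

// stand-in for Chromium's answer in this example (an assumption, not measured output)
const chromiumLike = () => "maybe";

const h264 = 'video/mp4; codecs="avc1.42E01E"';
const patchedCanPlayType = wrapCanPlayType(chromiumLike, [h264]);

console.log(patchedCanPlayType(h264));         // probably
console.log(patchedCanPlayType("video/webm")); // maybe
```

In a browser, the original method would be HTMLMediaElement.prototype.canPlayType.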

These are the common evasions that the Stealth plugin deals with. Depending on your project requirement, you can add more custom evasions to improve Puppeteer Stealth. That takes us to the critical part of patching the navigator fields.

Critical Object to Patch: The Navigator Fields

The navigator object is the most significant parameter that anti-bots check during browser fingerprinting. It provides details about the browser's identity, properties, and environment.

In this section, we'll explore Puppeteer Stealth evasions further, focusing on the navigator object and how to patch it.

Let's quickly browse seven essential navigator object fields to understand them better.

1. UserAgent

The userAgent navigator outputs the User Agent header string sent by the browser. It describes the browser's version, name, host platform, and rendering engine.

Here's a sample output on Chrome:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

2. Language/Languages

The language and languages fields describe the user's language preferences. The language field returns a single preferred language (for example, en-US), while the languages field returns an array of the user's preferred languages in order of priority.

3. Platform

The platform navigator field represents the operating system (OS) where the browser operates. Depending on the OS, its value can be Win32, MacIntel, Linux i686, or Linux x86_64. Note that Chrome on Windows reports Win32 even on 64-bit systems.

4. WebDriver

The webdriver field returns a boolean to indicate whether or not a WebDriver automation instance controls the browser window. Its value is usually true for the Puppeteer driver but false for a legitimate browser.

5. Hardware Concurrency and Device Memory

The hardwareConcurrency field returns the number of logical processor cores on the user's machine, while the deviceMemory field reports the device's approximate Random Access Memory (RAM) in gigabytes.

6. Plugins and mimeTypes

The plugins navigator returns an array-like list of the plugins installed on the browser, including their names, supported MIME types, and descriptions. The mimeTypes field lists the MIME types those plugins handle. In current Chrome versions, it contains application/pdf and text/pdf.

7. UserAgentData

The userAgentData navigator field returns an object with information about the browser User Agent, including an array of brand names and versions (brands), a mobile flag, and the platform that runs the browser.
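To inspect all seven fields at once, you could use a small helper like the one below. It's a sketch: collectNavigatorFields is a hypothetical name, and the sample object only holds illustrative values so the snippet also runs outside a browser.

```javascript
// Collect the navigator fields discussed above into one plain object.
// `nav` is the browser's navigator, or any object with the same shape.
function collectNavigatorFields(nav) {
    return {
        userAgent: nav.userAgent,
        language: nav.language,
        languages: Array.from(nav.languages || []),
        platform: nav.platform,
        webdriver: nav.webdriver,
        hardwareConcurrency: nav.hardwareConcurrency,
        deviceMemory: nav.deviceMemory,
        // plugins and mimeTypes are array-like, so reduce them to plain strings
        plugins: Array.from(nav.plugins || [], (plugin) => plugin.name),
        mimeTypes: Array.from(nav.mimeTypes || [], (mime) => mime.type),
        // userAgentData only exists in Chromium-based browsers
        userAgentData: nav.userAgentData
            ? {
                  brands: nav.userAgentData.brands,
                  mobile: nav.userAgentData.mobile,
                  platform: nav.userAgentData.platform,
              }
            : null,
    };
}

// sample values for illustration only
const sampleNavigator = {
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    language: "en-US",
    languages: ["en-US", "en"],
    platform: "Win32",
    webdriver: false,
    hardwareConcurrency: 8,
    deviceMemory: 8,
    plugins: [{ name: "PDF Viewer" }],
    mimeTypes: [{ type: "application/pdf" }, { type: "text/pdf" }],
};

console.log(collectNavigatorFields(sampleNavigator));
```

In a real browser, paste the function into the console and call collectNavigatorFields(navigator).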

How Navigator Varies in Different Browsers and Platforms

The navigator fields vary across browsers and platforms. Let's see how they compare. 

Firefox vs. Google Chrome

The navigator object presents different values for Firefox and Chrome. For example, running navigator.userAgent via the Firefox console returns the following:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0

The same command on Google Chrome outputs this value:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

Additionally, the userAgentData field is only available on Google Chrome and not on Firefox.

Execute navigator.userAgentData via your Chrome browser's console, and you might see the following result. It describes the User Agent version, host platform, and brand name.

Chrome UserAgent Navigator Property

Chrome's vendor navigator field returns Google Inc., showing that the browser is Google's product. The same command outputs an empty string in the Firefox console. So, pairing a Firefox User Agent with Chrome's vendor value (or the other way around) is another mismatch that can flag you as a bot.

The oscpu navigator (navigator.oscpu) field also behaves differently depending on the browser. While Google Chrome says its value is undefined, it logs the following on Firefox:

Output
Windows NT 10.0; Win64; x64

This disparity means you don't need to patch the oscpu field if you're using a Chrome User Agent since it's not a navigator requirement on the Chrome browser.
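The differences above can be turned into a simple consistency check. The sketch below is a hypothetical detector-side helper (findBrowserMismatches is not part of any library) that compares the User Agent family with the vendor, oscpu, and userAgentData fields.

```javascript
// Illustrative consistency check between the User Agent family
// and the browser-specific navigator fields described above.
function findBrowserMismatches(nav) {
    const mismatches = [];

    if (nav.userAgent.includes("Firefox/")) {
        // Firefox: empty vendor, oscpu is a string, no userAgentData
        if (nav.vendor !== "") mismatches.push("vendor");
        if (typeof nav.oscpu !== "string") mismatches.push("oscpu");
        if (nav.userAgentData !== undefined) mismatches.push("userAgentData");
    } else if (nav.userAgent.includes("Chrome/")) {
        // Chrome: vendor is "Google Inc." and oscpu is undefined
        if (nav.vendor !== "Google Inc.") mismatches.push("vendor");
        if (nav.oscpu !== undefined) mismatches.push("oscpu");
    }

    return mismatches;
}

// a consistent Chrome profile
console.log(findBrowserMismatches({
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    vendor: "Google Inc.",
})); // []

// a Firefox User Agent on top of Chrome's other navigator values
console.log(findBrowserMismatches({
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0",
    vendor: "Google Inc.",
})); // [ 'vendor', 'oscpu' ]
```

The second call shows why swapping only the User Agent string isn't enough: the remaining fields still describe Chrome.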

Brave vs. Google Chrome

Brave and Google Chrome are Chromium browsers with many navigator fields in common. However, even small pieces of information, such as their brand names in the navigator field, can be used to track you.

On Google Chrome, the navigator.userAgentData.brands command outputs Google Chrome as the brand name:

Chrome UserAgent Data Brands Navigator

This command returns Brave on the Brave browser:

Brave UserAgent Data Brands Navigator

The above examples are limited to the Windows operating system. Let's see how platform differences affect the Chrome navigator object.

Google Chrome on macOS vs. Windows vs. Linux

The platform where the browser operates also determines the navigator field properties. For instance, Google Chrome's platform navigator varies across operating systems, including macOS, Windows, and Linux.

Running the navigator.platform command in Chrome on macOS returns MacIntel. If you execute this command on Windows, you'll get Win32. The Chrome browser on Linux might output Linux x86_64.

The platform string in the userAgent and userAgentData navigators also varies across these operating systems.

Running navigator.userAgent via Chrome's console on each operating system outputs different platform strings. See the User Agent values for each platform below:

Linux:

Output
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

Windows:

Output
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

macOS:

Output
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

You can also check the userAgentData across these OSes. You'll see the correct platform name for each in the returned value. However, note that changing the User Agent's platform doesn't affect the platform name in the userAgentData.

For example, using a Linux Chrome User-Agent on a Windows OS doesn't change the userAgentData and platform navigators for the Windows OS. To avoid detection during scraping, you must maintain consistency across all navigator fields.
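As a sketch of what "consistency" means here, the helper below derives the expected platform values from a desktop Chrome User Agent and compares them with the other navigator fields. The function names are hypothetical, and the mapping only covers the three desktop cases shown above.

```javascript
// Map a desktop Chrome User Agent to the platform values that usually go with it.
function expectedPlatforms(userAgent) {
    if (userAgent.includes("Windows NT")) return { platform: "Win32", uaDataPlatform: "Windows" };
    if (userAgent.includes("Macintosh")) return { platform: "MacIntel", uaDataPlatform: "macOS" };
    if (userAgent.includes("Linux")) return { platform: "Linux x86_64", uaDataPlatform: "Linux" };
    return null; // unknown OS token: nothing to compare against
}

// Check that navigator.platform and userAgentData.platform agree with the User Agent.
function isPlatformConsistent(nav) {
    const expected = expectedPlatforms(nav.userAgent);
    if (!expected) return false;

    // userAgentData may be missing (for example, on non-Chromium browsers)
    const uaDataPlatform = nav.userAgentData ? nav.userAgentData.platform : expected.uaDataPlatform;

    return nav.platform === expected.platform && uaDataPlatform === expected.uaDataPlatform;
}

const linuxUA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36";

// Linux User Agent on a Windows machine: the other fields still say Windows
console.log(isPlatformConsistent({ userAgent: linuxUA, platform: "Win32", userAgentData: { platform: "Windows" } })); // false

// all three fields patched to Linux
console.log(isPlatformConsistent({ userAgent: linuxUA, platform: "Linux x86_64", userAgentData: { platform: "Linux" } })); // true
```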

How Anti-Bot Systems Detect Vanilla Puppeteer via the Navigator

Vanilla Puppeteer leaks bot-like information through navigator properties like the User Agent and the WebDriver. 

Let's access both fields using the code below to see what Puppeteer returns. Before you begin, ensure you install Puppeteer and the Stealth plugin with npm:

Terminal
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Now, check the webdriver and userAgent fields with the following code.

Example
// import the required library
const puppeteer = require('puppeteer');

(async () => {

    // run Puppeteer in headless mode
    const browser = await puppeteer.launch({ headless: 'new' });
    const page = await browser.newPage();

    const navigatorWebdriverValue = await page.evaluate(() => {
        // return the values of the navigator fields
        return {
            webdriver: navigator.webdriver,
            userAgent: navigator.userAgent,
        }
    });

    // output the navigator values
    console.log(navigatorWebdriverValue);

    // launch the target web page
    await page.goto('https://abrahamjuliot.github.io/creepjs/');
   
    await browser.close();
})();

Vanilla Puppeteer returns true for the WebDriver, and the User Agent string shows the HeadlessChrome flag. Anti-bots will detect your request as automated software with these values:

Output
{
    webdriver: true,
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36'
}

A legitimate Chrome browser shows false for the WebDriver and features the mainstream Chrome in the user agent string. To confirm, open your browser console and run the following navigator commands individually, as shown:

Terminal
> navigator.webdriver
// false
> navigator.userAgent
// 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
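To make the comparison concrete, here's a minimal sketch of how a detection script could score those two fields. The automationSignals function is hypothetical and far simpler than what real anti-bot systems do.

```javascript
// Illustrative detector: list the automation hints found in two navigator fields.
function automationSignals({ webdriver, userAgent }) {
    const signals = [];

    if (webdriver === true) signals.push("navigator.webdriver is true");
    if (userAgent.includes("HeadlessChrome/")) signals.push("User Agent contains HeadlessChrome");

    return signals;
}

// values returned by vanilla Puppeteer in headless mode
const vanillaPuppeteer = {
    webdriver: true,
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36",
};

// values returned by a regular Chrome browser
const regularChrome = {
    webdriver: false,
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
};

console.log(automationSignals(vanillaPuppeteer).length); // 2
console.log(automationSignals(regularChrome).length);    // 0
```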

How can you patch the navigator object?

How to Patch the Navigator

Even with the Stealth plugin enabled, some navigator details can still trigger anti-bot actions. Adding custom evasions can further boost your chances of avoiding detection with Puppeteer.

You've seen that the navigator fields vary across browsers and platforms. In this section, you'll learn to patch some properties, including the WebDriver, User-Agent, platform, and userAgentData fields (platform and brands).

Let's make some custom evasions!

Patch the WebDriver Property

Let's start with the WebDriver, which returns true, indicating automation. You'll patch that by changing its value to false so your scraper doesn't appear as an automated browser.

First, import the required library, start the browser instance in non-headless mode, and create a new page. Define a document modifier function and redefine the WebDriver property as false:

Example
// import the required libraries
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// add the Stealth plugin
puppeteer.use(StealthPlugin());

(async () => {

    // start the browser in non-headless mode
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
 
    await page.evaluateOnNewDocument(() => {

        // check if navigator.webdriver is null or false
        if (navigator.webdriver === null || navigator.webdriver === false) {
           
            // already patched or not detected, no action needed
           
        } else {
           
            // patch navigator.webdriver to false
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
                configurable: true,
            });
        }
     
    });
 
})();

Run the above code and open the browser console. Then execute the navigator.webdriver command. It returns false, indicating that you've patched the WebDriver anti-bot property.

Let's patch the User Agent and platform navigator fields in the next section.

Modify the User Agent and Platform Navigators 

Assume you want to use Chrome version 126 for Linux as your custom User Agent. The User Agent OS (Linux) must match the platform navigator. That's why you should patch both fields (User Agent and platform) at the same time.

Expand the document modifier function to patch the User-Agent and platform navigators. Then, open a target URL, keep the browser open long enough to inspect the console, and close it:

Example
(async () => {

    // ...
 
    await page.evaluateOnNewDocument(() => {
       
        // ...
       
        // customize the navigator.userAgent value
        Object.defineProperty(navigator, "userAgent", {

            value: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            configurable: true,
        });

        // customize the navigator.platform value
        Object.defineProperty(navigator, "platform", {

            value: "Linux x86_64",
            configurable: true,
        });

    });
   
    // visit a target page    
    await page.goto("https://abrahamjuliot.github.io/creepjs/");

    // keep the browser open (about 16.7 minutes) so you can inspect the console
    await new Promise(resolve => setTimeout(resolve, 1000000));
   
    // close the browser
    await browser.close();
})();

Your final code should look like this after combining both snippets:

Example
// import the required libraries
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

(async () => {
   
    // start the browser in non-headless mode
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
 
    await page.evaluateOnNewDocument(() => {

        // check if navigator.webdriver is null or false
        if (navigator.webdriver === null || navigator.webdriver === false) {

            // already patched or not detected, no action needed

        } else {

            // patch navigator.webdriver to false
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
                configurable: true,
            });
        }

        // customize the navigator.userAgent value
        Object.defineProperty(navigator, "userAgent", {

            value: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            configurable: true,
        });

        // customize the navigator.platform value
        Object.defineProperty(navigator, "platform", {

            value: "Linux x86_64",
            configurable: true,
        });

    });

    // visit a target page    
    await page.goto("https://abrahamjuliot.github.io/creepjs/");

    // keep the browser open (about 16.7 minutes) so you can inspect the console
    await new Promise(resolve => setTimeout(resolve, 1000000));
   
    // close the browser
    await browser.close();
})();

To confirm the changes, execute the above code and open the browser console. Then, run each of the following commands individually:

Terminal
> navigator.webdriver
// false
> navigator.userAgent
// 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
> navigator.platform
// 'Linux x86_64'

Nice job! You've patched the WebDriver, platform, and User Agent navigators.
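One caveat, offered as a general JavaScript observation rather than a claim about any particular anti-bot vendor: in a regular Chrome session, these fields are getters on Navigator.prototype, so Object.defineProperty(navigator, ...) creates own properties that a script can notice. The sketch below simulates this with a plain object, and ownOverrides is a hypothetical helper.

```javascript
// Illustrative check: list which of the given fields are defined directly on the object.
function ownOverrides(nav, fields) {
    return fields.filter((field) => Object.prototype.hasOwnProperty.call(nav, field));
}

// simulate an unpatched navigator: the getters live on the prototype
const navigatorPrototype = {
    get webdriver() { return true; },
    get platform() { return "Win32"; },
};
const fakeNavigator = Object.create(navigatorPrototype);

console.log(ownOverrides(fakeNavigator, ["webdriver", "platform"])); // []

// apply the same style of patch used in this tutorial
Object.defineProperty(fakeNavigator, "platform", {
    value: "Linux x86_64",
    configurable: true,
});

console.log(fakeNavigator.platform);                                 // Linux x86_64
console.log(ownOverrides(fakeNavigator, ["webdriver", "platform"])); // [ 'platform' ]
```

The patched value is returned as expected, but the override itself is visible. Keep that in mind when deciding how much to rely on instance-level patches.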

Patch the userAgentData Navigator Property

Although you've patched the OS to Linux in the User-Agent and platform navigator fields, this modification doesn't affect the userAgentData properties. 

To check, execute the previous code and run the navigator.userAgentData command. 

The Chrome version and the platform name in the userAgentData navigator don't match the User Agent and platform navigators you patched previously. Here's the result:

Stealth User Agent Data in Console

To patch the platform and brand version in the userAgentData with matching values, add the following to the previous code:

Example
(async () => {
   
    // ...
 
    await page.evaluateOnNewDocument(() => {

        // ...
     
        // redefine navigator.userAgentData to return a custom value
        if (navigator.userAgentData) {
            Object.defineProperty(navigator, "userAgentData", {
                get: () => ({
                    brands: [
                        { brand: "Chromium", version: "126" },
                        { brand: "Not(A:Brand", version: "8" },
                        { brand: "Google Chrome", version: "126" },
                    ],
                    mobile: false,
                    platform: "Linux",
                }),
                configurable: true,
            });
        }
    });

    // ...
})();

Your new full code should look like this:

Example
// import the required libraries
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

puppeteer.use(StealthPlugin());

(async () => {
   
    // start the browser in non-headless mode
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
 
    await page.evaluateOnNewDocument(() => {

        // check if navigator.webdriver is null or false
        if (navigator.webdriver === null || navigator.webdriver === false) {

            // already patched or not detected, no action needed

        } else {
            // patch navigator.webdriver to false
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
                configurable: true,
            });
        }

        // customize the navigator.userAgent value
        Object.defineProperty(navigator, "userAgent", {

            value: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            configurable: true,
        });

        // customize the navigator.platform value
        Object.defineProperty(navigator, "platform", {

            value: "Linux x86_64",
            configurable: true,
        });

        // redefine navigator.userAgentData to return a custom value
        if (navigator.userAgentData) {
            Object.defineProperty(navigator, "userAgentData", {
                get: () => ({
                    brands: [
                        { brand: "Chromium", version: "126" },
                        { brand: "Not(A:Brand", version: "8" },
                        { brand: "Google Chrome", version: "126" },
                    ],
                    mobile: false,
                    platform: "Linux",
                }),
                configurable: true,
            });
        }
    });

    // visit a target page    
    await page.goto("https://abrahamjuliot.github.io/creepjs/");

    // keep the browser open (about 16.7 minutes) so you can inspect the console
    await new Promise(resolve => setTimeout(resolve, 1000000));
   
    // close the browser
    await browser.close();
})();

Now, execute the code and run navigator.userAgentData via the browser console. The output reflects your changes, showing that you've patched the userAgentData navigator field:

Patched UserAgent Data in Console

You just modified Puppeteer Stealth to improve its evasion capabilities. Congratulations! 
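Hard-coding the version in several places makes mismatches easy to introduce. One option, sketched here under assumptions, is to derive the userAgentData values from the User Agent string itself. The buildUserAgentData helper is hypothetical, and its default third brand entry simply reuses the placeholder from this tutorial.

```javascript
// Derive userAgentData-style values from a desktop Chrome User Agent string.
// `greaseBrand` is the extra placeholder brand entry; its default is an assumption
// copied from the example above.
function buildUserAgentData(userAgent, greaseBrand = { brand: "Not(A:Brand", version: "8" }) {
    const versionMatch = userAgent.match(/Chrome\/(\d+)/);
    if (!versionMatch) throw new Error("Not a Chrome User Agent");
    const majorVersion = versionMatch[1];

    // map the OS token in the User Agent to the userAgentData platform name
    let platform = "Unknown";
    if (userAgent.includes("Windows NT")) platform = "Windows";
    else if (userAgent.includes("Macintosh")) platform = "macOS";
    else if (userAgent.includes("Linux")) platform = "Linux";

    return {
        brands: [
            { brand: "Chromium", version: majorVersion },
            greaseBrand,
            { brand: "Google Chrome", version: majorVersion },
        ],
        mobile: false,
        platform,
    };
}

const customUA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36";
const uaData = buildUserAgentData(customUA);

console.log(uaData.platform, uaData.brands[2].version); // Linux 126
```

Because functions passed to page.evaluateOnNewDocument() run in the page context, you'd compute this object in Node.js and pass it in as an argument rather than calling the helper inside the page.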

However, you can still get detected despite patching these loopholes. That's because most anti-bots check several other bot-like signals beyond your control. 

The most effective solution to bypass any anti-bot system, regardless of the complexity, is to use a web scraping API like ZenRows. It's the fastest way to bypass advanced and evolving anti-bot measures without tiresome manual approaches.

Conclusion

You've just explored Puppeteer Stealth evasions, including navigator fields relevant to web scraping. Now, you know how they differ across browsers and User Agents. You've also learned to patch the WebDriver, User Agent, platform, and userAgentData navigator fields to improve your scraper's ability to bypass blocks.

Still, the only foolproof way to scrape without limitation is to use a web scraping API, such as ZenRows. Try ZenRows for free.
