Are you looking for a better way to bypass anti-bot protection while scraping with Puppeteer in JavaScript? Good news: the Puppeteer Stealth plugin lets you create Chrome patches that boost your scraper's evasion power.
In this tutorial, you'll explore how the Puppeteer Stealth evasions work and learn how to improve the tool to avoid all blocks and bans. Let's go!
What Is Puppeteer Stealth?
Puppeteer Stealth is a plugin for Puppeteer Extra, a Node.js library that extends Puppeteer with extra plugin functionalities. The Stealth plugin patches the base Puppeteer library to mimic a real browser, improving its ability to evade anti-bot detection during web scraping.
Stealth is quite popular, with an average of 450k weekly downloads and 36.1k GitHub repositories depending on it.
While the Stealth plugin already simplifies anti-bot bypass with built-in evasion techniques, you can further improve them with custom ones.
Why Use the Puppeteer Stealth Plugin?
The vanilla Puppeteer library is a useful web scraping tool. However, it leaks obvious bot-like signals that expose you to being flagged while scraping. Fortifying it with the Stealth plugin patches those loopholes and reduces the likelihood of anti-bot detection.
We compared the fingerprints of vanilla Puppeteer and the Stealth plugin on CreepJS to understand their evasion capabilities better. Let's discuss the result, starting with vanilla Puppeteer.
The result showed that base Puppeteer is 33% headless. With this information alone, anti-bots will likely detect you as an automated browser, increasing your chances of getting blocked.
The test also revealed that the Puppeteer browser instance exposes the WebDriver property (webDriverIsOn: true). That's a red flag because a legitimate, user-driven browser doesn't report itself as controlled by WebDriver.
Besides, many anti-bot systems, such as Cloudflare and Akamai, effectively fingerprint the browser's navigator property for the presence of a WebDriver. Once discovered, it becomes easy for a protected website to flag your request as a potential bot.
See the result below:
webDriverIsOn: true
hasHeadlessUA: false
hasHeadlessWorkerUA: false
Puppeteer Stealth, on the other hand, shows a higher chance of evading detection with a headless score of 0%. This means that the Stealth plugin will pass an anti-bot fingerprinting test that checks the WebDriver parameter and is unlikely to get blocked.
An inspection of the WebDriver property also reveals that Puppeteer Stealth patches Puppeteer's WebDriver behavior (webDriverIsOn: false). This modification reduces your likelihood of bot detection:
webDriverIsOn: false
hasHeadlessUA: false
hasHeadlessWorkerUA: false
The above tests only scratched the surface of the Puppeteer Stealth plugin's evasion strategies. The tool has many more, e.g., the Chrome runtime, outer dimensions, User Agent override, or media codecs.
Still, the Stealth plugin only works for websites with basic protection. Let's learn how this plugin works behind the scenes so you can improve it with custom evasions.
Puppeteer Stealth Evasions You Should Know
To understand how Puppeteer Stealth works, you must learn the workings of a few critical evasions. Let's take a look at them.
Chrome.runtime
Running Puppeteer in headless mode adds more bot-like properties, such as the HeadlessChrome flag in the User Agent request header. The plugin's chrome.runtime evasion ensures your scraper mocks the real Chrome runtime object.
So, even if you're scraping in the headless mode, this patch makes it look like you're running the real Chrome browser in the GUI mode. This evasion is handy for bypassing anti-bots in headless mode.
Outer Dimensions
The window outer dimensions are your browser's outerHeight and outerWidth properties. They return the browser window's total height and width, including the title bar, address bar, and scroll bars.
However, these properties are typically missing or set to default bot-like values in Puppeteer's headless mode.
Puppeteer Stealth uses an evasion called window.outerdimensions to set these properties while scraping in headless mode. This patch also ensures that the headless viewport value matches your original window viewport.
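As a rough illustration of why these values matter, the sketch below shows the kind of check a detection script could run on the window dimensions. The function name and the heuristics are assumptions made for this example, not the logic of the Stealth plugin or of any specific anti-bot product.

```javascript
// Hypothetical helper: flags window dimensions that look like a headless default.
// Assumption: a visible browser window is at least as large as its viewport.
function hasSuspiciousDimensions(win) {
  const { innerWidth, innerHeight, outerWidth, outerHeight } = win;

  // missing or zero outer dimensions suggest there is no real window
  if (!outerWidth || !outerHeight) return true;

  // the outer window (with toolbars) should not be smaller than the viewport
  return outerWidth < innerWidth || outerHeight < innerHeight;
}

// mock values standing in for the browser's window object
const headlessLike = { innerWidth: 800, innerHeight: 600, outerWidth: 0, outerHeight: 0 };
const desktopLike = { innerWidth: 1280, innerHeight: 720, outerWidth: 1280, outerHeight: 805 };

console.log(hasSuspiciousDimensions(headlessLike)); // true
console.log(hasSuspiciousDimensions(desktopLike)); // false
```

In a real page, you could pass the window object itself to a function like this from the browser console.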
User Agent Override
The User Agent Override evasion patches Puppeteer's default User Agent information, including the platform data. This modification works with other evasions, such as chrome.runtime and chrome.app. While in headless mode, this technique ensures that the User Agent browser string changes to Chrome instead of the default HeadlessChrome.
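The string rewrite behind this idea can be sketched in a few lines. The helper name below is hypothetical, and the snippet only illustrates the HeadlessChrome-to-Chrome substitution; it is not the plugin's actual implementation, which also adjusts related fields such as the platform data.

```javascript
// Hypothetical helper: rewrites a headless User Agent so it reads like regular Chrome.
function normalizeUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

const headlessUA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36';

console.log(normalizeUserAgent(headlessUA));
// Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
```

If you handle this yourself rather than through the plugin, a string like this could be passed to Puppeteer's page.setUserAgent() method.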
Navigator.webdriver
The navigator.webdriver evasion modifies Puppeteer's WebDriver property. Some websites use the presence of a WebDriver as a pointer to bot-like activities. This evasion reduces the likelihood of getting blocked by changing the WebDriver property from true to false in the Puppeteer browser instance.
Media Codecs
Some media codecs are only available in the Chrome browser, not the open-source Chromium. For instance, in Chromium, the canPlayType method might return "maybe" instead of Chrome's standard "probably".
The media codec evasion modifies how Chromium reports codec support in Puppeteer, allowing you to mimic native Google Chrome. It enhances your likelihood of passing the codec support test during an anti-bot browser fingerprinting.
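To make the idea concrete, here is a minimal sketch of wrapping canPlayType so that selected MIME types report a fixed answer. It runs against a mock object rather than a real video element, and both the helper name and the override table are assumptions for illustration; the Stealth plugin's own evasion is implemented differently.

```javascript
// Hypothetical helper: wraps canPlayType so selected MIME types report a fixed answer.
function patchCanPlayType(mediaElement, overrides) {
  const original = mediaElement.canPlayType.bind(mediaElement);

  mediaElement.canPlayType = (type) => {
    // compare only the MIME part, ignoring any "; codecs=..." parameter
    const mime = String(type).split(';')[0].trim().toLowerCase();
    return Object.prototype.hasOwnProperty.call(overrides, mime)
      ? overrides[mime]
      : original(type);
  };
}

// mock element standing in for a <video> element in Chromium
const fakeVideo = {
  canPlayType: (type) => (type.startsWith('video/webm') ? 'probably' : 'maybe'),
};

patchCanPlayType(fakeVideo, { 'video/mp4': 'probably' });

console.log(fakeVideo.canPlayType('video/mp4; codecs="avc1.42E01E"')); // probably
console.log(fakeVideo.canPlayType('video/ogg')); // maybe
```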
These are the common evasions that the Stealth plugin deals with. Depending on your project requirement, you can add more custom evasions to improve Puppeteer Stealth. That takes us to the critical part of patching the navigator fields.
Critical Object to Patch: The Navigator Fields
The navigator object is the most significant parameter that anti-bots check during browser fingerprinting. It provides details about the browser's identity, properties, and environment.
In this section, we'll explore Puppeteer Stealth evasions further, focusing on the navigator object and how to patch it.
Navigator Fields
Let's quickly browse seven essential navigator object fields to understand them better.
1. UserAgent
The userAgent navigator outputs the User Agent header string sent by the browser. It describes the browser's version, name, host platform, and rendering engine.
Here's a sample output on Chrome:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
2. Language/Languages
The language and languages fields describe the user's language preference for their browser. However, both fields are different in terms of output. While the language field returns a specific language, the languages field outputs an array of the user's preferred languages.
3. Platform
The platform navigator field represents the operating system (OS) where the browser operates. Depending on the OS, its value can be Win32, Win64, MacIntel, Linux i686, or Linux x86_64.
4. WebDriver
The webdriver field returns a boolean to indicate whether or not a WebDriver automation instance controls the browser window. Its value is usually true for the Puppeteer driver but false for a legitimate browser.
5. Hardware Concurrency and Device Memory
The hardwareConcurrency field returns the number of logical processor cores on the user's machine, while the deviceMemory field reports the device's approximate Random Access Memory (RAM) in gigabytes.
6. Plugins and mimeTypes
The plugins field returns an array-like list of the plugins installed in the browser, including their names, supported MIME types, and descriptions. The mimeTypes field lists the MIME types those plugins handle, such as application/pdf and text/pdf.
7. UserAgentData
The userAgentData field returns an object with information about the browser's User Agent, including the brand names, their versions, and the platform that runs the browser.
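To see these fields side by side, you could collect them into one plain object. The sketch below uses a hypothetical helper and a mock navigator with illustrative values so it also runs outside a browser; in a browser console, you would pass the real navigator object instead.

```javascript
// Hypothetical helper: copies the navigator fields listed above into a plain object.
function snapshotNavigator(nav) {
  return {
    userAgent: nav.userAgent,
    language: nav.language,
    languages: Array.from(nav.languages || []),
    platform: nav.platform,
    webdriver: nav.webdriver,
    hardwareConcurrency: nav.hardwareConcurrency,
    deviceMemory: nav.deviceMemory,
    plugins: Array.from(nav.plugins || [], (plugin) => plugin.name),
    mimeTypes: Array.from(nav.mimeTypes || [], (mime) => mime.type),
    // userAgentData is not available in every browser, so guard the access
    userAgentData: nav.userAgentData
      ? {
          brands: nav.userAgentData.brands,
          mobile: nav.userAgentData.mobile,
          platform: nav.userAgentData.platform,
        }
      : null,
  };
}

// mock object standing in for the browser's navigator (values are illustrative)
const mockNavigator = {
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
  language: 'en-US',
  languages: ['en-US', 'en'],
  platform: 'Win32',
  webdriver: false,
  hardwareConcurrency: 8,
  deviceMemory: 8,
  plugins: [{ name: 'PDF Viewer' }],
  mimeTypes: [{ type: 'application/pdf' }, { type: 'text/pdf' }],
  userAgentData: {
    brands: [{ brand: 'Google Chrome', version: '122' }],
    mobile: false,
    platform: 'Windows',
  },
};

const snapshot = snapshotNavigator(mockNavigator);
console.log(snapshot);
```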
How Navigator Varies in Different Browsers and Platforms
The navigator fields vary across browsers and platforms. Let's see how they compare.
Firefox vs. Google Chrome
The navigator object presents different values for Firefox and Chrome. For example, running navigator.userAgent via the Firefox console returns the following:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0
The same command on Google Chrome outputs this value:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
Additionally, the userAgentData field is only available on Google Chrome and not on Firefox.
Execute navigator.userAgentData in your Chrome browser's console, and you'll get an object describing the User Agent's brand names, versions, and host platform.
Don't patch the userAgentData field if you're using a Firefox User Agent. Since the Firefox navigator object doesn't feature that field, defining it can get you flagged as a bot.
Chrome's vendor navigator field returns Google Inc., showing that the browser is Google's product. The same command outputs an empty string in the Firefox console. So, setting a vendor value while using a Firefox User Agent is another mismatch that can flag you as a bot.
The oscpu field (navigator.oscpu) also behaves differently depending on the browser. While Google Chrome reports its value as undefined, Firefox logs the following:
Windows NT 10.0; Win64; x64
This disparity means you don't need to patch the oscpu field if you're using a Chrome User Agent since it's not a navigator requirement on the Chrome browser.
Brave vs. Google Chrome
Brave and Google Chrome are Chromium browsers with many navigator fields in common. However, even small pieces of information, such as their brand names in the navigator field, can be used to track you.
On Google Chrome, the navigator.userAgentData.brands command includes Google Chrome as a brand name, while the same command returns Brave on the Brave browser.
The above examples are limited to the Windows operating system. Let's see how platform differences affect the Chrome navigator object.
Google Chrome on macOS vs. Windows vs. Linux
The platform where the browser operates also determines the navigator field properties. For instance, Google Chrome's platform navigator varies across operating systems, including macOS, Windows, and Linux.
Running the navigator.platform command in a Chrome browser on macOS returns MacIntel. If you execute this command on Windows, you'll get Win32. The Chrome browser on Linux might output Linux x86_64.
The platform strings in the userAgent and userAgentData fields also vary across these operating systems.
Running navigator.userAgent via Chrome's console on each operating system outputs different platform strings. See the User Agent values for each platform below:
Linux:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
Windows:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
macOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
You can also check the userAgentData field across these OSes. You'll see the correct platform name for each in the returned value. However, note that changing the User Agent's platform doesn't affect the platform name in the userAgentData field.
For example, using a Linux Chrome User Agent on Windows doesn't change the userAgentData and platform fields, which still report Windows. To avoid detection during scraping, you must maintain consistency across all navigator fields.
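One way to reason about this consistency requirement is a small checker that compares the three fields. The sketch below is an assumption-laden illustration: the lookup table covers only the desktop Chrome values mentioned in this section, and the function name is hypothetical.

```javascript
// Hypothetical lookup: the OS token in a desktop Chrome User Agent mapped to the
// navigator.platform and userAgentData.platform values expected with it.
const EXPECTED_PLATFORMS = [
  { uaToken: 'Windows NT', platform: 'Win32', uaDataPlatform: 'Windows' },
  { uaToken: 'Macintosh', platform: 'MacIntel', uaDataPlatform: 'macOS' },
  { uaToken: 'Linux', platform: 'Linux x86_64', uaDataPlatform: 'Linux' },
];

// Returns a list of mismatches between navigator-like fields (empty when consistent).
function findNavigatorMismatches({ userAgent, platform, userAgentData }) {
  const expected = EXPECTED_PLATFORMS.find((entry) => userAgent.includes(entry.uaToken));
  if (!expected) return ['unknown platform in userAgent'];

  const mismatches = [];
  if (platform !== expected.platform) {
    mismatches.push(`platform is "${platform}", expected "${expected.platform}"`);
  }
  // only Chromium browsers expose userAgentData, so skip the check when it is absent
  if (userAgentData && userAgentData.platform !== expected.uaDataPlatform) {
    mismatches.push(
      `userAgentData.platform is "${userAgentData.platform}", expected "${expected.uaDataPlatform}"`
    );
  }
  return mismatches;
}

const linuxUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36';

// a Linux User Agent on a Windows machine: both platform fields still say Windows
const spoofedOnWindows = findNavigatorMismatches({
  userAgent: linuxUA,
  platform: 'Win32',
  userAgentData: { platform: 'Windows' },
});
console.log(spoofedOnWindows.length); // 2

// the same User Agent with matching platform fields
const consistent = findNavigatorMismatches({
  userAgent: linuxUA,
  platform: 'Linux x86_64',
  userAgentData: { platform: 'Linux' },
});
console.log(consistent.length); // 0
```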
How Anti-Bot Systems Detect Vanilla Puppeteer via the Navigator
Vanilla Puppeteer leaks bot-like information through navigator properties like the User Agent and the WebDriver.
Let's access both fields using the code below to see what Puppeteer returns. Before you begin, ensure you install Puppeteer and the Stealth plugin with npm:
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Now, check the webdriver and userAgent fields with the following code.
// import the required library
const puppeteer = require('puppeteer');
(async () => {
    // run Puppeteer in headless mode
    const browser = await puppeteer.launch({ headless: 'new' });
    const page = await browser.newPage();
    const navigatorWebdriverValue = await page.evaluate(() => {
        // return the values of the navigator fields
        return {
            webdriver: navigator.webdriver,
            userAgent: navigator.userAgent,
        };
    });
    // output the navigator values
    console.log(navigatorWebdriverValue);
    // launch the target web page
    await page.goto('https://abrahamjuliot.github.io/creepjs/');
    await browser.close();
})();
Vanilla Puppeteer returns true for the WebDriver, and the User Agent string shows the HeadlessChrome flag. Anti-bots will detect your request as automated software with these values:
{
webdriver: true,
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36'
}
A legitimate Chrome browser shows false for the WebDriver and features the regular Chrome token in the User Agent string. To confirm, open your browser console and run the following navigator commands individually, as shown:
> navigator.webdriver
// false
> navigator.userAgent
// 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
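The two signals above are simple enough that a detection script could test for them directly. The following is a simplified sketch of such a check, using a hypothetical function name; real anti-bot systems combine many more signals than these two.

```javascript
// Hypothetical check: collects the two bot-like signals discussed above
// from navigator-like values.
function findBotSignals({ webdriver, userAgent }) {
  const signals = [];
  if (webdriver === true) signals.push('navigator.webdriver is true');
  if (/HeadlessChrome/.test(userAgent)) signals.push('User Agent contains HeadlessChrome');
  return signals;
}

// values reported by vanilla Puppeteer in headless mode
const vanillaPuppeteer = {
  webdriver: true,
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/121.0.0.0 Safari/537.36',
};

// values reported by a regular Chrome browser
const regularChrome = {
  webdriver: false,
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
};

console.log(findBotSignals(vanillaPuppeteer).length); // 2
console.log(findBotSignals(regularChrome).length); // 0
```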
How can you patch the navigator object?
How to Patch the Navigator
The previous test conducted on CreepJS showed that the Stealth plugin still leaks some details that can trigger anti-bot actions. Adding custom evasions can further boost your chances of avoiding detection with Puppeteer.
You've seen that the navigator fields vary across browsers and platforms. In this section, you'll learn to patch some properties, including the WebDriver, User-Agent, platform, and userAgentData fields (platform and brands).
Let's make some custom evasions!
Patch the WebDriver Property
Let's start with the WebDriver, which returns true, indicating automation. You'll patch that by changing its value to false so your scraper doesn't appear as an automated browser.
First, import the required library, start the browser instance in non-headless mode, and create a new page. Define a document modifier function and redefine the WebDriver property as false:
// import the required libraries
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// add the Stealth plugin
puppeteer.use(StealthPlugin());
(async () => {
    // start the browser in non-headless mode
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.evaluateOnNewDocument(() => {
        // check if navigator.webdriver is null or false
        if (navigator.webdriver === null || navigator.webdriver === false) {
            // already patched or not detected, no action needed
        } else {
            // patch navigator.webdriver to false
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
                configurable: true,
            });
        }
    });
})();
Run the above code and open the browser console. Then execute the navigator.webdriver command. It returns false, indicating that you've patched the WebDriver anti-bot property.
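If you're wondering why Object.defineProperty changes the reported value, the sketch below reproduces the mechanism with a mock class in place of the browser's navigator. It assumes, as in real browsers, that the original property is a getter on the prototype; the class and variable names are made up for this example.

```javascript
// mock class that exposes webdriver through a prototype getter,
// similar to how browsers expose it on the navigator object
class FakeNavigator {
  get webdriver() {
    return true;
  }
}

const fakeNavigator = new FakeNavigator();
console.log(fakeNavigator.webdriver); // true

// define an own property on the instance; it takes precedence over the prototype getter
Object.defineProperty(fakeNavigator, 'webdriver', {
  get: () => false,
  configurable: true,
});
console.log(fakeNavigator.webdriver); // false

// the prototype getter itself is untouched and still reports the original value
const prototypeGetter = Object.getOwnPropertyDescriptor(
  FakeNavigator.prototype,
  'webdriver'
).get;
console.log(prototypeGetter.call(fakeNavigator)); // true
```

The last lines hint at a limitation of this approach: a script that calls the prototype getter directly could, in principle, still read the original value.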
Let's patch the User Agent and platform navigator fields in the next section.
Modify the User Agent and Platform Navigators
Assume you want to use Chrome version 126 for Linux as your custom User Agent. The User Agent OS (Linux) must match the platform navigator. That's why you should patch both fields (User Agent and platform) at the same time.
Expand the document modifier function to patch the User-Agent and platform navigators. Then, open a target URL, wait for the page to load, and close the browser:
(async () => {
    // ...
    await page.evaluateOnNewDocument(() => {
        // ...
        // customize the navigator.userAgent value
        Object.defineProperty(navigator, "userAgent", {
            value: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            configurable: true,
        });
        // customize the navigator.platform value
        Object.defineProperty(navigator, "platform", {
            value: "Linux x86_64",
            configurable: true,
        });
    });
    // visit a target page
    await page.goto("https://abrahamjuliot.github.io/creepjs/");
    // keep the browser open (about 17 minutes) so you can inspect the console
    await new Promise(resolve => setTimeout(resolve, 1000000));
    // close the browser
    await browser.close();
})();
Your final code should look like this after combining both snippets:
// import the required libraries
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// add the Stealth plugin
puppeteer.use(StealthPlugin());
(async () => {
    // start the browser in non-headless mode
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.evaluateOnNewDocument(() => {
        // check if navigator.webdriver is null or false
        if (navigator.webdriver === null || navigator.webdriver === false) {
            // already patched or not detected, no action needed
        } else {
            // patch navigator.webdriver to false
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
                configurable: true,
            });
        }
        // customize the navigator.userAgent value
        Object.defineProperty(navigator, "userAgent", {
            value: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            configurable: true,
        });
        // customize the navigator.platform value
        Object.defineProperty(navigator, "platform", {
            value: "Linux x86_64",
            configurable: true,
        });
    });
    // visit a target page
    await page.goto("https://abrahamjuliot.github.io/creepjs/");
    // keep the browser open (about 17 minutes) so you can inspect the console
    await new Promise(resolve => setTimeout(resolve, 1000000));
    // close the browser
    await browser.close();
})();
To confirm the changes, execute the above code and open the browser console. Then, run each of the following commands individually:
> navigator.webdriver
// false
> navigator.userAgent
// 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
> navigator.platform
// 'Linux x86_64'
Nice job! You've patched the WebDriver, platform, and User Agent navigators.
Patch the userAgentData Navigator Property
Although you've patched the OS to Linux in the User Agent and platform navigator fields, this modification doesn't affect the userAgentData properties.
To check, execute the previous code and run the navigator.userAgentData command.
The Chrome version and the platform name in the userAgentData field won't match the User Agent and platform values you patched previously.
To patch the platform and brand version in the userAgentData field with matching values, add the following to the previous code:
(async () => {
    // ...
    await page.evaluateOnNewDocument(() => {
        // ...
        // redefine navigator.userAgentData to return a custom value
        if (navigator.userAgentData) {
            Object.defineProperty(navigator, "userAgentData", {
                get: () => ({
                    brands: [
                        { brand: "Chromium", version: "126" },
                        { brand: "Not(A:Brand", version: "8" },
                        { brand: "Google Chrome", version: "126" },
                    ],
                    mobile: false,
                    platform: "Linux",
                }),
                enumerable: true,
                configurable: true,
            });
        }
    });
    // ...
})();
Your new full code should look like this:
// import the required libraries
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// add the Stealth plugin
puppeteer.use(StealthPlugin());
(async () => {
    // start the browser in non-headless mode
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.evaluateOnNewDocument(() => {
        // check if navigator.webdriver is null or false
        if (navigator.webdriver === null || navigator.webdriver === false) {
            // already patched or not detected, no action needed
        } else {
            // patch navigator.webdriver to false
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
                configurable: true,
            });
        }
        // customize the navigator.userAgent value
        Object.defineProperty(navigator, "userAgent", {
            value: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            configurable: true,
        });
        // customize the navigator.platform value
        Object.defineProperty(navigator, "platform", {
            value: "Linux x86_64",
            configurable: true,
        });
        // redefine navigator.userAgentData to return a custom value
        if (navigator.userAgentData) {
            Object.defineProperty(navigator, "userAgentData", {
                get: () => ({
                    brands: [
                        { brand: "Chromium", version: "126" },
                        { brand: "Not(A:Brand", version: "8" },
                        { brand: "Google Chrome", version: "126" },
                    ],
                    mobile: false,
                    platform: "Linux",
                }),
                enumerable: true,
                configurable: true,
            });
        }
    });
    // visit a target page
    await page.goto("https://abrahamjuliot.github.io/creepjs/");
    // keep the browser open (about 17 minutes) so you can inspect the console
    await new Promise(resolve => setTimeout(resolve, 1000000));
    // close the browser
    await browser.close();
})();
Now, execute the code and run navigator.userAgentData via the browser console. The output reflects your changes, showing that you've patched the userAgentData navigator field.
You just modified Puppeteer Stealth to improve its evasion capabilities. Congratulations!
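Hard-coding the version in two places makes it easy for the User Agent and userAgentData values to drift apart. One possible way to keep them in sync is to derive the userAgentData object from the User Agent string. The helper below is a hypothetical sketch: it assumes a desktop Chrome User Agent and reuses the placeholder brand entry from the code above unchanged.

```javascript
// Hypothetical helper: builds a userAgentData-like object from a Chrome User Agent string.
function buildUserAgentData(userAgent) {
  const versionMatch = userAgent.match(/Chrome\/(\d+)/);
  if (!versionMatch) throw new Error('Not a Chrome User Agent');
  const majorVersion = versionMatch[1];

  // map the OS token in the User Agent to the userAgentData platform name
  let platform = 'Unknown';
  if (userAgent.includes('Windows NT')) platform = 'Windows';
  else if (userAgent.includes('Macintosh')) platform = 'macOS';
  else if (userAgent.includes('Linux')) platform = 'Linux';

  return {
    brands: [
      { brand: 'Chromium', version: majorVersion },
      // placeholder brand copied from the tutorial code
      { brand: 'Not(A:Brand', version: '8' },
      { brand: 'Google Chrome', version: majorVersion },
    ],
    mobile: false,
    platform,
  };
}

const customUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36';
const customUAData = buildUserAgentData(customUA);

console.log(customUAData.platform); // Linux
console.log(customUAData.brands[2].version); // 126
```

Since page.evaluateOnNewDocument() accepts serializable arguments, an object like this could be computed in Node.js and passed into the document modifier function instead of being written out by hand.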
However, you can still get detected despite patching these loopholes. That's because most anti-bots check several other bot-like signals beyond your control.
The most effective solution to bypass any anti-bot system, regardless of the complexity, is to use a web scraping API like ZenRows. It's the fastest way to bypass advanced and evolving anti-bot measures without tiresome manual approaches.
Conclusion
You've just explored Puppeteer Stealth evasions, including navigator fields relevant to web scraping. Now, you know how they differ across browsers and User Agents. You've also learned to patch the WebDriver, User Agent, platform, and userAgentData navigator fields to improve your scraper's ability to bypass blocks.
Still, the only foolproof way to scrape without limitation is to use a web scraping API, such as ZenRows. Try ZenRows for free.