Are you scraping with Puppeteer but getting the "Press & Hold to confirm you're a human (and not a bot)" message on your screen or console? You're getting blocked by the PerimeterX anti-bot!
Unfortunately, base Puppeteer can't bypass this block. But no worries, there are ways to deal with this problem.
This article explains how PerimeterX works and offers three tested and trusted ways to bypass it with Puppeteer: the Puppeteer Stealth plugin, premium proxies, and optimized request headers.
Let's go!
How Does PerimeterX Work (and Why Does It Detect Puppeteer)?
PerimeterX is a cybersecurity company that offers advanced protective measures against bots and malicious activities, including account takeovers, DDoS attacks, ad fraud, and client-side threats. Sadly, it doesn't spare web scrapers while defending a website against these threats, preventing you from getting your desired data.
PerimeterX uses detection techniques such as IP monitoring, browser fingerprinting, request header analysis, behavioral analysis, and machine learning models to differentiate between humans and bots. Once the anti-bot detects bot-like activities, it triggers a CAPTCHA to block your request.
Headless browsers like Puppeteer can't bypass these detection techniques because they contain bot-like information. For instance, Puppeteer sends a HeadlessChrome flag via its default User Agent and displays the WebDriver parameter in its navigator property, which is likely to cause detection and blocks.
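To make these two signals concrete, here's a minimal sketch of the kind of check a detection script could run. The `findAutomationSignals` helper and the sample profile are hypothetical illustrations, not PerimeterX's actual logic, which isn't public:

```javascript
// Hypothetical helper (not part of Puppeteer or PerimeterX): lists the
// automation signals found in a browser profile.
function findAutomationSignals({ userAgent = '', webdriver = false }) {
  const signals = [];
  // the default headless User Agent carries the HeadlessChrome token
  if (userAgent.includes('HeadlessChrome')) {
    signals.push('HeadlessChrome in User Agent');
  }
  // navigator.webdriver is true when the browser is under automation
  if (webdriver === true) {
    signals.push('navigator.webdriver is true');
  }
  return signals;
}

// sample profile resembling a default headless session (illustrative values)
const defaultHeadless = {
  userAgent:
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/126.0.0.0 Safari/537.36',
  webdriver: true,
};

console.log(findAutomationSignals(defaultHeadless));
```

In a live Puppeteer session, you could collect the same two values with `page.evaluate(() => ({ userAgent: navigator.userAgent, webdriver: navigator.webdriver }))` and pass the result to the helper.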
What's more, even avoiding one or two of these detection measures with manual methods doesn't guarantee success. You need to bypass them all at once.
To confirm, let's see how Puppeteer performs against a PerimeterX-protected website like Zillow. Try it out with the following Puppeteer scraper that screenshots the target website:
const puppeteer = require('puppeteer');

(async () => {
    // launch a new browser instance
    const browser = await puppeteer.launch();
    // open a new page
    const page = await browser.newPage();
    // navigate to the Zillow homepage
    await page.goto('https://www.zillow.com/');
    // wait 5 seconds for the page to load fully
    await new Promise((r) => setTimeout(r, 5000));
    // take a screenshot of the page
    await page.screenshot({ path: 'zillow_homepage.png' });
    // close the browser
    await browser.close();
})();
The above scraper gets blocked by PerimeterX: instead of the Zillow homepage, the saved screenshot shows the "Press & Hold" challenge.
If you don't find a way to bypass PerimeterX, your scraper can get stuck. In the next section, we'll find out how to avoid this block.
Best Ways to Bypass PerimeterX With Puppeteer
PerimeterX is a formidable anti-bot against headless browsers like Puppeteer. But there are a few ways to bypass it. Let's explore the most popular ones.
Method #1: Use Puppeteer Stealth Plugin
The Puppeteer Stealth plugin is an extension that modifies Puppeteer to bypass anti-bot detection. It patches the base Puppeteer browser instance with various evasion strategies to simulate an actual browser environment.
Specifically, the plugin hides the WebDriver navigator property and overrides the User Agent header to show Chrome instead of HeadlessChrome. It also modifies properties such as chrome.runtime, chrome.app, and many more, allowing the browser instance to run as if it's in the GUI, even if you use the headless mode.
These patches enhance your scraper's ability to bypass fingerprinting tests and request header analysis.
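Here's a minimal sketch of how the plugin is typically wired in, assuming you've installed the `puppeteer`, `puppeteer-extra`, and `puppeteer-extra-plugin-stealth` packages. The `getStealthPuppeteer` and `screenshotWithStealth` helper names are our own, not part of the plugin:

```javascript
// Assumes: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
let stealthPuppeteer = null;

// Hypothetical helper: loads puppeteer-extra once and registers the plugin.
function getStealthPuppeteer() {
  if (!stealthPuppeteer) {
    // puppeteer-extra wraps Puppeteer and adds plugin support
    stealthPuppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    // register the stealth evasions before launching any browser
    stealthPuppeteer.use(StealthPlugin());
  }
  return stealthPuppeteer;
}

// Hypothetical helper: opens a URL with the patched browser and screenshots it.
async function screenshotWithStealth(url, path) {
  const browser = await getStealthPuppeteer().launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    // always release the browser, even if navigation fails
    await browser.close();
  }
}
```

You could then call `screenshotWithStealth('https://www.zillow.com/', 'zillow_stealth.png')` and inspect the saved image to see whether the challenge still appears.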
Check out our detailed tutorial on web scraping with Puppeteer Stealth for more info on how to use it.
However, the plugin's bypass strategies still aren't enough for all of PerimeterX's detection techniques, especially since the anti-bot is becoming increasingly difficult to bypass with its consistent security updates.
Method #2: Get Premium Proxies
A proxy is a service that forwards requests on your behalf, making it look like you're requesting from another location. Since proxies route requests through another IP address, they help avoid IP bans due to rate limiting and geo-restrictions during scraping.
For web scraping, you should avoid free proxies since they're unreliable and short-lived. The best choice is auto-rotating premium web scraping proxies, which guarantee a high success rate. These proxies rotate your IP from a large pool of residential IPs assigned to daily internet users by network providers, allowing you to mimic a different user per request during web scraping.
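To illustrate the idea of rotation, here's a minimal round-robin sketch. The pool addresses are placeholders from a documentation-only IP range, and `createProxyRotator` and `launchArgsForNextProxy` are hypothetical helper names. An auto-rotating proxy service handles this for you behind a single endpoint, so treat this as a conceptual example only:

```javascript
// Hypothetical pool; the addresses are documentation-range placeholders.
const proxyPool = [
  'http://203.0.113.10:8080',
  'http://203.0.113.11:8080',
  'http://203.0.113.12:8080',
];

// Returns a function that yields the next proxy on every call (round-robin).
function createProxyRotator(pool) {
  if (pool.length === 0) throw new Error('proxy pool is empty');
  let index = 0;
  return () => {
    const proxy = pool[index];
    // wrap around to the start once the end of the pool is reached
    index = (index + 1) % pool.length;
    return proxy;
  };
}

const nextProxy = createProxyRotator(proxyPool);

// Builds the Chromium launch args for the next proxy in the pool.
function launchArgsForNextProxy() {
  return [`--proxy-server=${nextProxy()}`];
}

console.log(launchArgsForNextProxy());
console.log(launchArgsForNextProxy());
```

Because `--proxy-server` is a browser launch flag, this approach changes the IP once per browser launch, for example with `puppeteer.launch({ args: launchArgsForNextProxy() })`, rather than once per request.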
One of the best premium proxy providers on the market is ZenRows. It offers residential IP auto-rotation and geo-targeting out of the box. Under the same proxy plan, you also gain access to all the tools you need to avoid getting blocked, including anti-CAPTCHA, anti-bot auto-bypass, a headless browser, and more. All with a single API call.
You can easily integrate ZenRows with Puppeteer. Here's a template script demonstrating authenticated proxy integration with Puppeteer:
const puppeteer = require('puppeteer');

(async () => {
    // launch a new browser instance with a proxy option
    const browser = await puppeteer.launch({
        args: ['--proxy-server=<PROXY_IP_ADDRESS>:<PROXY_PORT>'],
    });
    // open a new page
    const page = await browser.newPage();
    // authenticate the proxy
    await page.authenticate({
        username: '<YOUR_USERNAME>',
        password: '<YOUR_PASSWORD>',
    });
    // navigate to the target page
    await page.goto('https://httpbin.io/ip');
    // wait 5 seconds for the page to load fully
    await new Promise((r) => setTimeout(r, 5000));
    // print the page result
    console.log(await page.content());
    // close the browser
    await browser.close();
})();
Replace the placeholders in the above code with your proxy credentials, and you're all set.
Also, read our detailed guide on using proxies with Puppeteer to learn more!
Method #3: Optimize Your Request Headers
The request headers provide more information about the HTTP client or browser sending a request to a server. They detail the client's User Agent, platform version, accepted encoding, language, content type, and more. This information contributes to how the server responds to your request.
PerimeterX scans your request headers against a database of disallowed ones to determine whether your scraper should access a protected web page. So, any inconsistency in your scraper's request headers can expose you as a bot, resulting in a potential ban.
One way to optimize Puppeteer's request headers is to spoof those of an actual browser to mimic a real user. You should also avoid conflicting header fields.
For instance, if you've used a Chrome version 126 User Agent for Linux, the Secure Client Hint User Agent (Sec-Ch-Ua) must bear the same Chrome version (126). Similarly, the Secure Client Hint User Agent Platform header (Sec-Ch-Ua-Platform) must be Linux. Otherwise, PerimeterX will flag your scraper as a bot and may even block it from subsequently accessing protected websites.
Rotating some header fields, such as the User Agent and platform, can also reduce your chances of being blocked by PerimeterX.
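One way to keep these fields from conflicting is to derive the Client Hint headers from the User Agent string itself, then rotate whole User Agents instead of individual fields. Below is a minimal sketch for desktop Chrome only. The `buildConsistentHeaders` and `pickRandomHeaders` helpers are hypothetical names, and the brand list in `Sec-Ch-Ua` is illustrative, since real Chrome varies it between versions:

```javascript
// Hypothetical helper: derives Client Hint headers from a desktop Chrome
// User Agent so the version and platform cannot drift apart.
function buildConsistentHeaders(userAgent) {
  const versionMatch = userAgent.match(/Chrome\/(\d+)/);
  if (!versionMatch) throw new Error('not a Chrome User Agent');
  const major = versionMatch[1];

  // map the platform token in the User Agent to a Sec-Ch-Ua-Platform value
  let platform;
  if (userAgent.includes('Windows NT')) platform = 'Windows';
  else if (userAgent.includes('Macintosh')) platform = 'macOS';
  else if (userAgent.includes('Linux')) platform = 'Linux';
  else throw new Error('unknown platform');

  return {
    'User-Agent': userAgent,
    // illustrative brand list; only the version numbers are derived
    'Sec-Ch-Ua': `"Not/A)Brand";v="8", "Chromium";v="${major}", "Google Chrome";v="${major}"`,
    'Sec-Ch-Ua-Platform': `"${platform}"`,
  };
}

// Small sample pool of desktop Chrome 126 User Agents (assumed values).
const userAgents = [
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
];

// Picks a random User Agent and returns a matching, conflict-free header set.
function pickRandomHeaders() {
  const ua = userAgents[Math.floor(Math.random() * userAgents.length)];
  return buildConsistentHeaders(ua);
}

console.log(buildConsistentHeaders(userAgents[0]));
```

In a Puppeteer script, you could apply the result with `page.setUserAgent()` for the User Agent and `page.setExtraHTTPHeaders()` for the two Client Hint headers before calling `page.goto()`.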
Check out our guide on how to set Puppeteer's request headers for a detailed tutorial.
Conclusion
We've shown you three proven ways to bypass PerimeterX while scraping with Puppeteer. Premium proxies are the most efficient solution, with the highest success rate. That's because they receive frequent upkeep that helps them withstand evolving PerimeterX detection techniques.
You can also try other headless browsers like Playwright to bypass PerimeterX. For the best experience, we recommend using ZenRows, an all-in-one web scraping toolkit for bypassing any anti-bot at scale.
Try ZenRows now and scrape any website without limitations!